embl-hentze-group / htseq-clip

A toolset designed for the processing and analysis of eCLIP/iCLIP datasets

License: MIT License

Python 100.00%

bioinformatics eclip ngs-analysis

htseq-clip's Issues

How to set the -e/--mate parameter for non-strand-specific or single-end sequencing libraries?

Hi,

I have a question regarding the -e/--mate parameter for the extract command.

Sometimes a sequencing library is non-strand-specific or single-end. According to the documentation, the -e/--mate parameter selects the read/mate from which crosslink sites are extracted in paired-end sequencing, with choices 1 or 2 (1 for the first mate, 2 for the second mate).

Could you please provide guidance on how to set this parameter for:

Non-strand-specific libraries
Single-end sequencing libraries

Thank you for your help!

Best regards,

Duplicates in unique_id column from mapToId function

A follow-up to the issue opened by @connorrogerson in the DEWSeq repo, reopened here because it turned out to be an htseq-clip issue.

Issue: duplicates in unique_id column from mapToId output, which caused DEWSeq to crash. The files causing this issue were generated as follows:

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gff3.gz

I then just followed these steps:
htseq-clip annotation -g gencode.v27.annotation.gff3 -o gencode.v27.annotation.bed
htseq-clip createSlidingWindows -i gencode.v27.annotation.bed -w 100 -s 50 -o SLBP_w100s50.txt
htseq-clip mapToId -a SLBP_w100s50.txt -o SLBP_w100s50_annotation.txt.gz

subset of the file SLBP_w100s50_annotation.subset.txt.gz
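
A quick way to confirm the duplicates before handing the file to DEWSeq is to count the values in the unique_id column. The sketch below assumes the mapToId output is tab-separated with a header row that contains unique_id; the helper names and the demo rows are made up for illustration and are not part of htseq-clip.

```python
import csv
import gzip
import io
from collections import Counter


def find_duplicate_ids(handle, column="unique_id"):
    """Return {id: count} for every value of `column` seen more than once."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumption: tab-separated with header
    counts = Counter(row[column] for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


def find_duplicate_ids_in_file(path, column="unique_id"):
    """Same check for a file on disk; reads .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return find_duplicate_ids(handle, column)


# Tiny made-up table, only to show the behaviour
demo = io.StringIO("unique_id\tchromosome\nA\t1\nB\t1\nA\t2\n")
duplicates = find_duplicate_ids(demo)
print(duplicates)
```

On the real data, find_duplicate_ids_in_file("SLBP_w100s50_annotation.txt.gz") would list the offending ids, if any.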

OSError: [Errno 28] No space left on device while running htseq-clip count

I'm working through the worked example from https://link.springer.com/protocol/10.1007%2F978-1-0716-1851-6_10 but I'm stuck at counting crosslinks due to an error.

Command:
htseq-clip count -i ../extract_xlink/1_xlink.bed -a /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/annotation/SLBP_gencode_w50s20.txt -o 1_counts.csv

The job fails and the output is:
[INFO] run started at 2021-12-07 14:25
[INFO] Count crosslink sites
[INFO] Annotation file /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/annotation/SLBP_gencode_w50s20.txt crosslink sites file ../extract_xlink/1_trimmed.bam.csv
[DEBUG] {} format: sliding window annotation file
Traceback (most recent call last):
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/clip.py", line 245, in main
_count(args)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/clip.py", line 102, in _count
countC.count(stranded)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/countCLIP.py", line 228, in count
with TempBed(self.annotation) as ta:
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/countCLIP.py", line 27, in __enter__
copyfile(self.bedfile,self.tmpBed)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/shutil.py", line 122, in copyfile
copyfileobj(fsrc, fdst)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/shutil.py", line 82, in copyfileobj
fdst.write(buf)
OSError: [Errno 28] No space left on device

A quick check of df -h shows plenty of space so I'm a bit lost. Any help will be appreciated!
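
The traceback fails inside copyfile while a temporary copy of the annotation is being written, so the full filesystem may be the one holding temporary files rather than the working directory. The sketch below reports free space for both. It assumes the temporary copy lands in the directory Python's tempfile module reports, which the log does not confirm.

```python
import shutil
import tempfile


def free_space_report(paths):
    """Map each directory to its free space in GiB."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        report[path] = usage.free / 2**30
    return report


tmp_dir = tempfile.gettempdir()  # honours TMPDIR, TEMP and TMP
report = free_space_report([tmp_dir, "."])
for path, free_gib in report.items():
    print(f"{path}: {free_gib:.2f} GiB free")
```

If the temporary directory turns out to be the small one, pointing the TMPDIR environment variable at a larger filesystem before running the job is a common workaround on HPC systems.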

Issue in extraction step

The .bam files ENCFF218ZEI, ENCFF511HSJ, ENCFF879UID are in my 'bam' folder in SLBP_analysis.
When I try to run the command

htseq-clip extract -i bam/ENCFF218ZEI.bam -e 2 -s s -g -1 --primary -o sites/ENCFF218ZEI.bed

after successfully doing

cd /path/to/SLBP_analysis

This error message occurs:

[INFO]  run started at 2022-03-23 13:12
 [INFO]  Extracting start sites
 [INFO]  Bam file : bam/ENCFF218ZEI.bam, output file: sites/ENCFF218ZEI.bed, offset: -1
 [INFO]  Using sites/3jgbrnwq as tmp folder
[E::idx_find_and_load] Could not retrieve index file for 'bam/ENCFF218ZEI.bam'
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/clip.py", line 257, in main
    _extract(args)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/clip.py", line 71, in _extract
    with bamCLIP(args) as bh:
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/bamCLIP.py", line 48, in __enter__
    self._bam_checker() # find all chromosomes in the given bam file
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/bamCLIP.py", line 86, in _bam_checker
    with pysam.AlignmentFile(self.fInput,mode='rb',check_sq=True,check_header=True,require_index=True) as _bh:
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 1016, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] No such file or directory

Do you have any advice? Thanks
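
The log line "Could not retrieve index file" together with require_index=True in the traceback suggests the BAM file is being opened without an index next to it. A minimal check, assuming the index would sit beside the BAM under one of the usual names (.bam.bai, .bai or .bam.csi); the helper name is made up for this sketch:

```python
import os


def find_bam_index(bam_path):
    """Return the path of an existing index next to `bam_path`, or None."""
    root, _ = os.path.splitext(bam_path)
    candidates = [bam_path + ".bai", root + ".bai", bam_path + ".csi"]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


index = find_bam_index("bam/ENCFF218ZEI.bam")
print(index if index else "no index found; try `samtools index bam/ENCFF218ZEI.bam`")
```

If no index is found, creating one with samtools index (the BAM must be coordinate-sorted) and rerunning the extract command is the likely next step.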

Not working on MacOSX

When running some commands e.g.

htseq-clip annotation -g gencode.v27.annotation.gff3 -o gencode.v27.annotation.bed
htseq-clip createSlidingWindows -i gencode.v27.annotation.bed -w 50 -s 20 -o SLBP_K562_w50s20.txt

The terminal comes out with the following error:

ImportError: cannot import name 'sched_getaffinity' from 'os' (/PATH/opt/anaconda3/envs/htseq-clip/lib/python3.7/os.py)

On inspection, 'os.py' is a file, and there is no 'os' folder within python3.7. From searching online, it appears that 'sched_getaffinity' is not supported on MacOSX and is available on Linux only. It therefore seems that MacOSX users are unable to run this code.

Please may someone correct me if I am wrong (I am not an expert at Python), or recommend what can be done to edit this code for Mac users? Thank you.
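
os.sched_getaffinity is indeed only available on some Unix platforms, Linux among them, and importing it fails on macOS. A common portable pattern is to fall back to os.cpu_count() when it is missing. The sketch below shows that pattern in isolation; the function name is hypothetical and this is not the actual htseq-clip code.

```python
import os


def available_cpus():
    """Number of CPUs this process may use, with a portable fallback."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity masks, e.g. those set by a cluster scheduler
        return len(os.sched_getaffinity(0))
    # macOS, Windows: total logical CPUs; cpu_count() can return None
    return os.cpu_count() or 1


n_cpus = available_cpus()
print(n_cpus)
```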

Does htseq-clip count respect splicing pattern?

Thank you for the tool!

I was wondering if htseq-clip createMatrix takes into account the splicing pattern of reads when counting crosslink sites?

How does the flag (0, 1, 2, 3) in htseq-clip annotation influence downstream processing in:
annotation -> createSlidingWindows -> mapToId -> extract -> count?
Is this used for filtering at any step?

After installing htseq-clip, it cannot be called from the terminal

Thank you very much for maintaining this tool.

This is the error I get when I run it:

Traceback (most recent call last):
File "/home/nicolas/.local/bin/htseq-clip", line 11, in <module>
sys.exit(main())
File "/home/nicolas/.local/lib/python3.6/site-packages/clip/command_line.py", line 4, in main
clip.main()
File "/home/nicolas/.local/lib/python3.6/site-packages/clip/clip.py", line 136, in main
subps = parser.add_subparsers(help='Need positional arguments',dest='subparser',required=True)
File "/usr/lib/python3.6/argparse.py", line 1716, in add_subparsers
action = parsers_class(option_strings=[], **kwargs)
TypeError: __init__() got an unexpected keyword argument 'required'

Thank you very much. I am not an informatician by training, so apologies in advance.

Best

Nicolas
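
The required keyword of add_subparsers was added in Python 3.7, while the paths in the traceback point to Python 3.6, so running htseq-clip under Python 3.7 or later should avoid this error. For reference, the sketch below shows how the same behaviour can be written so that it also works on 3.6, by setting the attribute after creation. It is an illustration only, not the htseq-clip source; the prog name is made up.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    # No required=True here: that keyword only exists from Python 3.7 onwards
    subps = parser.add_subparsers(help="Need positional arguments", dest="subparser")
    subps.required = True  # works on Python 3.6 and later
    subps.add_parser("count")
    return parser


args = build_parser().parse_args(["count"])
print(args.subparser)
```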

Error while trying to run count

Thanks a lot for making this tool available!

I am trying to run the count function, but I run into the following error:

[INFO]  run started at 2024-03-20 20:20
 [INFO]  Count crosslink sites
 [INFO]  Annotation file gencode.vM10.annotation.gff3.windows.txt.gz crosslink sites file SRR5335818Aligned.bed output file SRR5335818Aligned_count.txt
Traceback (most recent call last):
  File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/clip.py", line 784, in main
    _count(args)
  File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/clip.py", line 143, in _count
    countC = countCLIP(args)
  File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/countCLIP.py", line 94, in __init__
    self._annotationSanityCheck()
  File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/countCLIP.py", line 141, in _annotationSanityCheck
    raise ValueError("BED file line contains more than 9 fields")
ValueError: BED file line contains more than 9 fields

As input I am using the bed file that was generated using the extract function. The first 10 entries look like this:

head SRR5335818Aligned.bed
1 3340178 3340179 SRR5335818_BARCODE_ATCG_UMI_GACCACTCTT|25 1 -
1 4129998 4129999 SRR5335818_BARCODE_ATCG_UMI_TTGCTCTCCT|24 1 +
1 4599340 4599341 SRR5335818_BARCODE_ATCG_UMI_TTATTTTCAT|61 1 +
1 4599341 4599342 SRR5335818_BARCODE_ATCG_UMI_TCTGCGACAC|61 1 +
1 4777581 4777582 SRR5335818_BARCODE_ATCG_UMI_ATCGTGCCCT|28 1 +
1 4793900 4793901 SRR5335818_BARCODE_ATCG_UMI_ACATACCTCC|35 1 -
1 4825254 4825255 SRR5335818_BARCODE_ATCG_UMI_GGTATACCTA|36 1 -
1 4825358 4825359 SRR5335818_BARCODE_ATCG_UMI_CTTTTTCCAC|23 1 -
1 4827099 4827100 SRR5335818_BARCODE_ATCG_UMI_TACGCACCCA|31 1 -
1 4832471 4832472 SRR5335818_BARCODE_ATCG_UMI_GCTGTAGTTT|42 1 -

Any help would be greatly appreciated!

Nordin
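
The exception is raised from _annotationSanityCheck, so it may be the annotation file passed to count, rather than the crosslink sites BED, that has too many columns. One way to look is to tally the number of fields per line in that file. This sketch assumes tab-separated text and skips blank lines and lines starting with "#"; the helper names and demo rows are made up.

```python
import gzip
import io
from collections import Counter


def field_count_histogram(lines, delimiter="\t"):
    """Count how many non-empty, non-comment lines have each number of fields."""
    histogram = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        histogram[len(line.split(delimiter))] += 1
    return histogram


def field_count_histogram_for_file(path):
    """Same tally for a file on disk; reads .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return field_count_histogram(handle)


# Made-up rows: one with 6 fields, one with 7
demo = io.StringIO("1\t10\t20\tname\t0\t+\n1\t30\t40\tname\t0\t-\textra\n")
histogram = field_count_histogram(demo)
print(dict(histogram))
```

Running field_count_histogram_for_file on gencode.vM10.annotation.gff3.windows.txt.gz would show whether any lines exceed the expected column count.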

Error activating htseq-clip.simg

Hi,

Thanks for the useful tool! I ran into a problem when running the code: activating the .simg file fails (/vol/hpctest is not available).

The error info is as below:
Activating singularity image /home/global/tools/htseq-clip/htseq-clip.simg
WARNING: skipping mount of /vol/hpctest: no such file or directory
Error in rule do_extract_sites:
(check log file(s) for error message)

When trying 'singularity test htseq-clip.simg', I got the same error:
WARNING: skipping mount of /vol/hpctest: no such file or directory

Since I'm using an HPC server, I don't have administrator rights and cannot access it. Do you have any suggestions on how to modify or rebuild the .simg file? Or is there another way I can run this tool?

Sincerely thank you,
Iris

May I ask why htseq-clip uses the first/second mate for paired-end crosslink site extraction?

Hi,

I am new to iCLIP/eCLIP analysis.

I have read the paper and the documentation for htseq-clip, and it will be very helpful for CLIP data analysis.

But I do not understand why htseq-clip uses the first/second mate for paired-end crosslink site extraction.
May I ask for the reason?

What would be the disadvantage of using both mates for paired-end data?

Thanks

Empty counts file

I followed the steps mentioned here to generate the counts file from my input bam file; however, the counts file generated is completely empty.

Here is the sample code which I used:

htseq-clip annotation -g gencode.v21.annotation.gff3 -o gencode.v21.annotation.bed 
htseq-clip createSlidingWindows -i gencode.v21.annotation.bed -w 100 -s 20 v21_w100s20.txt
htseq-clip extract -i Merged_Gal_dedup.bam -e 1 -s e --primary -o Gal_R2_sites.bed
htseq-clip count -i Gal_R2_sites.bed -a v21_w100s20.txt

The output file generated is completely empty, with just the column names present. I tried to tweak some parameters in createSlidingWindows, using window sizes 25, 50, 100 and 300, and I also tried different parameters in the extract step; however, I still get an empty file as output.

Can you please help me in determining the right parameters for my use case? Thanks
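
One thing worth ruling out, purely as a guess since the issue does not show the files, is a mismatch in chromosome naming between the crosslink sites and the sliding windows (for example "1" versus "chr1"), which would leave nothing to count. The sketch below compares the first column of two files; the helper name and demo rows are made up.

```python
import io


def chromosome_names(lines):
    """Collect the first whitespace-separated field of each data line."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split(None, 1)[0])
    return names


# Made-up rows showing a naming mismatch
sites = io.StringIO("1\t100\t101\tread\t1\t+\n")
windows = io.StringIO("chr1\t50\t150\twin\t0\t+\n")
shared = chromosome_names(sites) & chromosome_names(windows)
print(shared or "no chromosome names in common")
```

With real data, the two StringIO objects would be replaced by open handles on Gal_R2_sites.bed and v21_w100s20.txt.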
