embl-hentze-group / htseq-clip
A toolset designed for the processing and analysis of eCLIP/iCLIP datasets
License: MIT License
Hi,
I have a question regarding the -e/--mate parameter for the extract command.
Sometimes a sequencing library is non-strand-specific or single-end. According to the documentation, the -e/--mate parameter selects the read/mate from which crosslink sites are extracted in paired-end sequencing, with choices 1 or 2 (1 for the first mate and 2 for the second mate).
Could you please provide guidance on how to set this parameter for:
Non-strand-specific libraries
Single-end sequencing libraries
Thank you for your help!
Best regards,
A follow-up to the issue opened by @connorrogerson in the DEWSeq repo, reopened here as it turned out to be an htseq-clip issue.
Issue: duplicates in the unique_id column of the mapToId output, which caused DEWSeq to crash. The files causing this issue were generated as follows:
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gff3.gz
I then just followed these steps:
htseq-clip annotation -g gencode.v27.annotation.gff3 -o gencode.v27.annotation.bed
htseq-clip createSlidingWindows -i gencode.v27.annotation.bed -w 100 -s 50 -o SLBP_w100s50.txt
htseq-clip mapToId -a SLBP_w100s50.txt -o SLBP_w100s50_annotation.txt.gz
subset of the file SLBP_w100s50_annotation.subset.txt.gz
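A quick way to confirm the duplicates before handing the file to DEWSeq is to count the values in the unique_id column. The sketch below assumes the mapToId output is tab-separated with a header row that contains a column named unique_id; the function name duplicated_ids is hypothetical and not part of htseq-clip.

```python
import csv
import gzip
from collections import Counter


def duplicated_ids(path, column="unique_id"):
    """Return {value: count} for every value of `column` seen more than once.

    Assumes a tab-separated file with a header row; gzip is detected
    from the ".gz" suffix.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        counts = Counter(row[column] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}
```

Calling duplicated_ids("SLBP_w100s50_annotation.txt.gz") returns an empty dictionary when every identifier is unique.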
I'm working through the worked example from https://link.springer.com/protocol/10.1007%2F978-1-0716-1851-6_10 but I'm stuck at counting crosslinks due to an error.
Command:
htseq-clip count -i ../extract_xlink/1_xlink.bed -a /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/annotation/SLBP_gencode_w50s20.txt -o 1_counts.csv
The job fails and the output is:
[INFO] run started at 2021-12-07 14:25
[INFO] Count crosslink sites
[INFO] Annotation file /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/annotation/SLBP_gencode_w50s20.txt crosslink sites file ../extract_xlink/1_trimmed.bam.csv
[DEBUG] {} format: sliding window annotation file
Traceback (most recent call last):
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/clip.py", line 245, in main
_count(args)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/clip.py", line 102, in _count
countC.count(stranded)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/countCLIP.py", line 228, in count
with TempBed(self.annotation) as ta:
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/site-packages/clip/countCLIP.py", line 27, in __enter__
copyfile(self.bedfile,self.tmpBed)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/shutil.py", line 122, in copyfile
copyfileobj(fsrc, fdst)
File "/home/cjr78/miniconda3/envs/seq/lib/python3.7/shutil.py", line 82, in copyfileobj
fdst.write(buf)
OSError: [Errno 28] No space left on device
A quick check of df -h shows plenty of space so I'm a bit lost. Any help will be appreciated!
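The traceback fails inside copyfile while a temporary copy of the annotation is being written, so the full filesystem may be the one holding the temporary location rather than the working directory. The sketch below reports free space for a few candidate locations. It assumes, without confirmation from the htseq-clip source, that the temporary copy lands in Python's default temporary directory; the helper names are hypothetical.

```python
import shutil
import tempfile


def free_space_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def report(paths):
    """Map each path to its free space in GiB, rounded to two decimals."""
    return {path: round(free_space_gib(path), 2) for path in paths}


if __name__ == "__main__":
    # Assumption: the temporary BED copy is written to the default temp dir.
    for path, free in report([tempfile.gettempdir(), "."]).items():
        print(f"{path}: {free} GiB free")
```

If the temporary directory turns out to be the small one, note that tempfile.gettempdir() honours the TMPDIR environment variable; whether htseq-clip uses that location is not shown in the log above.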
The .bam files ENCFF218ZEI, ENCFF511HSJ, ENCFF879UID are in my 'bam' folder in SLBP_analysis.
When I try to run the command
htseq-clip extract -i bam/ENCFF218ZEI.bam -e 2 -s s -g -1 --primary -o sites/ENCFF218ZEI.bed
after successfully doing
cd /path/to/SLBP_analysis
This error message occurs:
[INFO] run started at 2022-03-23 13:12
[INFO] Extracting start sites
[INFO] Bam file : bam/ENCFF218ZEI.bam, output file: sites/ENCFF218ZEI.bed, offset: -1
[INFO] Using sites/3jgbrnwq as tmp folder
[E::idx_find_and_load] Could not retrieve index file for 'bam/ENCFF218ZEI.bam'
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/clip.py", line 257, in main
_extract(args)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/clip.py", line 71, in _extract
with bamCLIP(args) as bh:
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/bamCLIP.py", line 48, in __enter__
self._bam_checker() # find all chromosomes in the given bam file
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/clip/bamCLIP.py", line 86, in _bam_checker
with pysam.AlignmentFile(self.fInput,mode='rb',check_sq=True,check_header=True,require_index=True) as _bh:
File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 1016, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] No such file or directory
Do you have any advice? Thanks
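The log shows the BAM file being opened with require_index=True, right after a message that no index file could be retrieved, so a missing index next to the BAM file is a likely cause. The sketch below only checks the conventional index file names; find_bam_index is a hypothetical helper, not an htseq-clip function.

```python
import os


def find_bam_index(bam_path):
    """Return the path of an existing index for `bam_path`, or None.

    Checks the conventional names only: "x.bam.bai", "x.bai" and "x.bam.csi".
    """
    stem, _ = os.path.splitext(bam_path)
    candidates = [bam_path + ".bai", stem + ".bai", bam_path + ".csi"]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

If find_bam_index("bam/ENCFF218ZEI.bam") returns None, an index can usually be created with samtools index bam/ENCFF218ZEI.bam, provided the BAM file is coordinate-sorted.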
When running some commands e.g.
htseq-clip annotation -g gencode.v27.annotation.gff3 -o gencode.v27.annotation.bed
htseq-clip createSlidingWindows -i gencode.v27.annotation.bed -w 50 -s 20 -o SLBP_K562_w50s20.txt
The terminal reports the following error:
ImportError: cannot import name 'sched_getaffinity' from 'os' (/PATH/opt/anaconda3/envs/htseq-clip/lib/python3.7/os.py)
On inspection, 'os.py' is a file, and no 'os' folder exists within python3.7. From what I have read online, 'sched_getaffinity' is not supported on macOS and is available on Linux only. It therefore appears that macOS users are unable to run this code.
Please may someone correct me if I am wrong (I am not an expert at Python), or recommend what can be done to edit this code for Mac users? Thank you.
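For context, os.sched_getaffinity exists only on some Unix platforms, which is why the import fails on macOS. A portable pattern is to test for the attribute and fall back to os.cpu_count(). The sketch below illustrates that pattern in isolation; available_cpus is a hypothetical helper, and where htseq-clip performs the import is not shown in the error above.

```python
import os


def available_cpus():
    """Number of CPUs this process may use, with a fallback for macOS."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity masks set by schedulers or containers.
        return len(os.sched_getaffinity(0))
    # Elsewhere: total CPU count, or 1 if it cannot be determined.
    return os.cpu_count() or 1
```

The attribute check avoids importing the name directly, so the module loads on any platform.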
Thank you for the tool!
I was wondering if htseq-clip createMatrix takes into account the splicing pattern of reads when counting crosslink sites?
How does the flag (0, 1, 2, 3) in htseq-clip annotation influence downstream processing in:
annotation -> createSlidingWindows -> mapToId -> extract -> count?
Is this used for filtering at any step?
Thank you very much for the maintenance.
This is the error I got when I run
Traceback (most recent call last):
File "/home/nicolas/.local/bin/htseq-clip", line 11, in <module>
sys.exit(main())
File "/home/nicolas/.local/lib/python3.6/site-packages/clip/command_line.py", line 4, in main
clip.main()
File "/home/nicolas/.local/lib/python3.6/site-packages/clip/clip.py", line 136, in main
subps = parser.add_subparsers(help='Need positional arguments',dest='subparser',required=True)
File "/usr/lib/python3.6/argparse.py", line 1716, in add_subparsers
action = parsers_class(option_strings=[], **kwargs)
TypeError: __init__() got an unexpected keyword argument 'required'
Thank you very much. I am not a computer scientist by training, so sorry in advance.
Best
Nicolas
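The required keyword of add_subparsers was added in Python 3.7, and the traceback shows Python 3.6, so running the tool under Python 3.7 or later is the simplest way around this error. For illustration only, the sketch below shows the attribute form that also works on 3.6; the program name "demo" and the single "count" subcommand are placeholders, not the real htseq-clip parser.

```python
import argparse


def build_parser():
    """Build a parser whose subcommand is mandatory on Python 3.6 and later."""
    parser = argparse.ArgumentParser(prog="demo")
    # No `required=` keyword here: it is only accepted from Python 3.7 on.
    subparsers = parser.add_subparsers(
        help="Need positional arguments", dest="subparser"
    )
    subparsers.required = True  # attribute form, also valid on Python 3.6
    subparsers.add_parser("count")
    return parser
```

With this parser, build_parser().parse_args(["count"]) succeeds, while an empty argument list exits with a usage error.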
Thanks a lot for making this tool available!
I am trying to run the count function, but I run into the following error:
[INFO] run started at 2024-03-20 20:20
[INFO] Count crosslink sites
[INFO] Annotation file gencode.vM10.annotation.gff3.windows.txt.gz crosslink sites file SRR5335818Aligned.bed output file SRR5335818Aligned_count.txt
Traceback (most recent call last):
File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/clip.py", line 784, in main
_count(args)
File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/clip.py", line 143, in _count
countC = countCLIP(args)
File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/countCLIP.py", line 94, in __init__
self._annotationSanityCheck()
File "/Users/nordin/.pyenv/versions/3.7.17/envs/iCount_project/lib/python3.7/site-packages/clip/countCLIP.py", line 141, in _annotationSanityCheck
raise ValueError("BED file line contains more than 9 fields")
ValueError: BED file line contains more than 9 fields
As input I am using the BED file that was generated using the extract function. The top 10 entries look like this:
head SRR5335818Aligned.bed
1 3340178 3340179 SRR5335818_BARCODE_ATCG_UMI_GACCACTCTT|25 1 -
1 4129998 4129999 SRR5335818_BARCODE_ATCG_UMI_TTGCTCTCCT|24 1 +
1 4599340 4599341 SRR5335818_BARCODE_ATCG_UMI_TTATTTTCAT|61 1 +
1 4599341 4599342 SRR5335818_BARCODE_ATCG_UMI_TCTGCGACAC|61 1 +
1 4777581 4777582 SRR5335818_BARCODE_ATCG_UMI_ATCGTGCCCT|28 1 +
1 4793900 4793901 SRR5335818_BARCODE_ATCG_UMI_ACATACCTCC|35 1 -
1 4825254 4825255 SRR5335818_BARCODE_ATCG_UMI_GGTATACCTA|36 1 -
1 4825358 4825359 SRR5335818_BARCODE_ATCG_UMI_CTTTTTCCAC|23 1 -
1 4827099 4827100 SRR5335818_BARCODE_ATCG_UMI_TACGCACCCA|31 1 -
1 4832471 4832472 SRR5335818_BARCODE_ATCG_UMI_GCTGTAGTTT|42 1 -
Any help would be greatly appreciated!
Nordin
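The exception is raised in _annotationSanityCheck, that is, while the file passed as annotation is being read, so it may be worth inspecting that file rather than the crosslink sites file. The sketch below tallies how many fields each line has. It assumes a tab-separated file and skips lines starting with "#"; field_count_histogram is a hypothetical helper.

```python
import gzip
from collections import Counter


def field_count_histogram(path, delimiter="\t", limit=100000):
    """Count how many of the first `limit` data lines have N fields."""
    opener = gzip.open if path.endswith(".gz") else open
    histogram = Counter()
    seen = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip comment and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            histogram[len(line.rstrip("\n").split(delimiter))] += 1
            seen += 1
            if seen >= limit:
                break
    return dict(histogram)
```

Any key above 9 in the result of field_count_histogram("gencode.vM10.annotation.gff3.windows.txt.gz") would point to the lines that trigger the check.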
Hi,
Thanks for the useful tool! I ran into a problem when running the code: activating the .simg file fails (/vol/hpctest is not available).
The error output is below:
Activating singularity image /home/global/tools/htseq-clip/htseq-clip.simg
WARNING: skipping mount of /vol/hpctest: no such file or directory
Error in rule do_extract_sites:
(check log file(s) for error message)
When trying 'singularity test htseq-clip.simg', I got the same error
WARNING: skipping mount of /vol/hpctest: no such file or directory
Since I'm using an HPC server, I don't have administrator rights and cannot access it. Do you have any suggestions on how to modify or rebuild the .simg file? Or is there another way I can run this tool?
Sincerely thank you,
Iris
Hi,
I am new to iCLIP/eCLIP analysis.
I have read the paper and documentation for htseq-clip, and it will be very helpful for CLIP data analysis.
But I do not understand why htseq-clip uses only the first or the second mate when extracting crosslink sites from paired-end data.
May I ask the reason?
What would be the disadvantage of using both mates for paired-end data?
Thanks
I followed the steps mentioned here to generate the counts file from my input bam file, however the counts file generated is completely empty.
Here is the sample code which I used:
htseq-clip annotation -g gencode.v21.annotation.gff3 -o gencode.v21.annotation.bed
htseq-clip createSlidingWindows -i gencode.v21.annotation.bed -w 100 -s 20 v21_w100s20.txt
htseq-clip extract -i Merged_Gal_dedup.bam -e 1 -s e --primary -o Gal_R2_sites.bed
htseq-clip count -i Gal_R2_sites.bed -a v21_w100s20.txt
The output file is completely empty, with only the column names present. I tried tweaking some parameters in createSlidingWindows, using window sizes 25, 50, 100 and 300, and I also tried different parameters in the extract step; however, I still get an empty file as output.
Can you please help me in determining the right parameters for my use case? Thanks
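One possible cause of an all-empty count table, offered here as something to rule out rather than a diagnosis, is that no crosslink site overlaps any window because the two files name chromosomes differently (for example "chr1" versus "1"). The sketch below compares the first column of both files. It assumes both are tab-separated with the chromosome in the first field; the helper names are hypothetical.

```python
import gzip


def chromosome_names(path, delimiter="\t"):
    """Collect the distinct values of the first field of a BED-like file."""
    opener = gzip.open if path.endswith(".gz") else open
    names = set()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip comment and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.rstrip("\n").split(delimiter, 1)[0])
    return names


def shared_chromosomes(sites_path, annotation_path):
    """Chromosome names present in both files."""
    return chromosome_names(sites_path) & chromosome_names(annotation_path)
```

An empty result from shared_chromosomes("Gal_R2_sites.bed", "v21_w100s20.txt") would mean the two files cannot overlap as named.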