Comments (3)
Hi Daehwan:
This does occur when mapping reads to the entire human genome, but I don't
observe this strange splicing pattern in other loci (though I'm not sure
I'd know).
ChrUn_gl000220 contains the 45S pre-ribosomal RNA loci. Transcripts from these
loci can make up as much as 90% of all transcripts in the cell, and individuals
can carry many repeats of these loci. Both GRCh38 and GRCh37 represent the 45S
rRNA loci only in an unplaced contig containing 2 repeats, so reads from many
other loci are probably compressed onto these two repeats.
I will try the --no-temp-splicesite and --dta-cufflinks options to see what I get.
Right now I'm happy with mapping these reads via bowtie and doing the rest
via Hisat, but it's a bit of a cumbersome workflow.
I wonder if HISAT could do a better job by increasing the cost of gap
extension (or whatever the equivalent is for HISAT). That might get it to use
the nearest possible splice site instead of the most closely matching one.
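For anyone wanting to repeat the test with the two options mentioned above, here is a minimal sketch of how the command line could be assembled. It only builds and prints the command; it does not run HISAT2. The index name `hg19_index` and the output file name are hypothetical placeholders, and the read files are the ones attached to this thread.

```python
import shlex


def build_hisat2_command(index, reads_1, reads_2, out_sam, conservative=True):
    """Return an argv list for a paired-end HISAT2 run.

    When `conservative` is True, the two options discussed in this thread
    (--no-temp-splicesite and --dta-cufflinks) are appended.
    """
    cmd = ["hisat2", "-x", index, "-1", reads_1, "-2", reads_2, "-S", out_sam]
    if conservative:
        cmd += ["--no-temp-splicesite", "--dta-cufflinks"]
    return cmd


# "hg19_index" and "conservative.sam" are hypothetical names for illustration.
cmd = build_hisat2_command(
    "hg19_index",
    "undepleted_chrUn_gl000220.1.fastq.gz",
    "undepleted_chrUn_gl000220.2.fastq.gz",
    "conservative.sam",
)
print(shlex.join(cmd))
```

Running the default and conservative variants on the same reads and comparing the junctions reported on chrUn_gl000220 would show whether the options help here.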
from hisat2.
Here is a small set of reads that illustrates the problem.
I baited pairs using mirabait and http://hgdownload.cse.ucsc.edu/goldenpath/hg19/snp138Mask/chrUn_gl000220.fa (requiring six 31-mers in either read), then sampled 50k pairs with ngs-tools.
Then I aligned the reads to hg19, using the attached RefSeq GTF as a guide and directional mode wherever I could specify these.
undepleted_chrUn_gl000220.1.fastq.gz
undepleted_chrUn_gl000220.2.fastq.gz
encode_and_RM_rRNA.merged.interval.gz
hg19_genes.gtf.gz
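To make the baiting criterion above concrete, here is a rough sketch of the idea: keep a pair if either read shares at least six 31-mers with the bait sequence. This is not mirabait's implementation, only an illustration. The function names are hypothetical, and checking both strands via the reverse complement is an assumption on my part.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k=31):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bait_index(reference, k=31):
    """Index the k-mers of the bait sequence on both strands (an assumption)."""
    return kmers(reference, k) | kmers(reverse_complement(reference), k)


def count_hits(read, bait_index, k=31):
    """Count the positions in the read whose k-mer occurs in the bait index."""
    read = read.upper()
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in bait_index)


def keep_pair(read_1, read_2, bait_index, k=31, min_hits=6):
    """Keep the pair if either mate reaches the hit threshold."""
    return (count_hits(read_1, bait_index, k) >= min_hits
            or count_hits(read_2, bait_index, k) >= min_hits)
```

With k = 31 and a threshold of six, a read needs at least 36 consecutive bases matching the bait exactly (or several shorter matching stretches of 31 or more bases) to be kept.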
Compared with the larger dataset above, the TopHat "false" splice junctions are less obvious, and STAR distributes the reads more evenly across the two loci (I don't understand this, but that's a different rabbit hole).
Hi @bwlang,
Thank you for your detailed information (and sorry for the late response). It looks like chrUn_gl000220.fa contains many repeats (as shown in lowercase) and sequences of low complexity, which could mainly explain why HISAT2 reports many spliced alignments. Does this issue generally happen across the whole human genome or on this particular sequence (chrUn_gl000220)?
With HISAT2, we can be more conservative regarding spliced alignment by using the --no-temp-splicesite and --dta-cufflinks options, but they could have a negative impact on the alignment of non-ribosomal RNA reads.
Thanks,
Daehwan