Comments (14)
Thank you for your suggestion, @ewels
The suggested format looks great to me! I'll try to incorporate it in the next version of HISAT2. I'll be out of the country for the rest of this week and next week, so sorry for the brief response.
from hisat2.
Ok brilliant - thanks for the log changes and explanation 👍
Regarding the input filenames - no problem, I'll just take the log filename for now and hope that people name their logs after their samples :)
Phil
from hisat2.
Thank you again for your suggestion, and for developing MultiQC, a very powerful tool! :-)
from hisat2.
Hi @infphilo,
I've just written the new HISAT2 MultiQC module to work with the output from --new-summary, so it's now available in v1.1dev.
Thanks again,
Phil
from hisat2.
Thank you for your great work, @ewels !
from hisat2.
Fantastic, thanks! Looking forward to it..
from hisat2.
+1 ^^
from hisat2.
It took me such a long time to implement your suggested output format due to multiple (very exciting) projects, job hunting, grant writing, etc.
How about the following summary output format?
-- single-end reads --
Summary stats:
Total reads: 1000000
Aligned 0 time: 956 (0.10%)
Aligned 1 time: 957987 (95.80%)
Aligned >1 times: 41057 (4.11%)
Overall alignment rate: 99.90%
-- paired-end reads --
Summary stats:
Total pairs: 1000000
Aligned concordantly 0 time: 1116 (0.11%)
Aligned concordantly 1 time: 965412 (96.54%)
Aligned concordantly >1 times: 33472 (3.35%)
Aligned discordantly 1 time: 51 (4.57%)
Total unpaired reads: 2130
Aligned 0 time: 1057 (49.62%)
Aligned 1 time: 1057 (49.62%)
Aligned >1 times: 16 (0.75%)
Overall alignment rate: 99.95%
I also implemented a new option, --summary-file, to output the summary to a file (in addition to stderr).
from hisat2.
Hi @infphilo,
No problem - I know the feeling! Thanks for looking into this.
The output you suggest looks great... A couple of minor suggestions:
- Could you change "Summary stats:" to "HISAT2 Summary stats:"? The addition of the specific "HISAT2" string makes the output a lot easier to find programmatically.
- If it's possible to print the input filenames that would be great. Some users concatenate stderr from multiple samples, then it's nice to have the input sample associated with the summary stats.
Cheers,
Phil
from hisat2.
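The point about a program-specific header can be illustrated with a short sketch. It assumes the header line reads "HISAT2 summary stats:" (the form adopted later in this thread) and that stderr from several samples has been concatenated into one string. The function name `split_hisat2_summaries` and both regular expressions are hypothetical; this is not MultiQC's actual parser.

```python
import re

# Header line; matched case-insensitively because both "HISAT2 Summary stats:"
# and "HISAT2 summary stats:" appear in this thread.
HEADER = re.compile(r"^\s*HISAT2 summary stats:\s*$", re.IGNORECASE)

# A statistics line: "<label>: <number or percent>" with an optional "(x.xx%)".
STAT = re.compile(r"^\s*(.+?):\s*(\d+(?:\.\d+)?%?)(?:\s+\((\d+(?:\.\d+)?)%\))?\s*$")


def split_hisat2_summaries(log_text):
    """Return one list of statistics lines per summary block in log_text."""
    blocks = []
    current = None  # list being filled, or None when outside a block
    for line in log_text.splitlines():
        if HEADER.match(line):
            current = []
            blocks.append(current)
        elif current is not None:
            if STAT.match(line):
                current.append(line.strip())
            else:
                current = None  # any other line ends the current block
    return blocks
```

Because the header names the program, unrelated lines from other tools in the same log are skipped without any per-tool special cases.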
ps. A question - one of the plots I'd like to make for MultiQC is a stacked bargraph showing how all of the input read pairs are aligned (eg. like this one). So what proportion are not aligned at all, what proportion have > 1 alignment and so on. However, it's not entirely clear to me how the numbers from your paired-end output can be summed:
Category | Number of Reads | Running Total |
---|---|---|
Total pairs | 1000000 | |
Aligned concordantly 0 time | 1116 | 1116 |
Aligned concordantly 1 time | 965412 | 966528 |
Aligned concordantly >1 times | 33472 | 1000000 |
Aligned discordantly 1 time | 51 | 1000051 |
Total unpaired reads | 2130 | |
Aligned 0 time | 1057 | 1001108 |
Aligned 1 time | 1057 | 1002165 |
Aligned >1 times | 16 | 1002181 |
I assume that this is because read pairs can be assigned to multiple categories. Or are some numbers sub-categories of others (eg. unpaired reads)? Is there a way to put this together into a stacked bar plot that you recommend?
Cheers,
Phil
from hisat2.
Ok, looking at this a little longer. I guess Aligned discordantly 1 time is part of Aligned concordantly 0 time, which is why the top part doesn't sum to 1000000. So I can subtract one from the other to make a new category and everything should add up.
Still a bit confused about where the 2130 Total unpaired reads come from though. Are they part of the 1000000 read pair input? Or did the input FastQ files somehow have 1000000 paired-end reads and 2130 single-end reads mixed together?
How does the Overall alignment rate take this into account? Presumably you have to come to a total number of aligned reads to calculate this.
Apologies if I'm being slow here..
Phil
from hisat2.
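The subtraction described above can be checked numerically. This is a minimal sketch using the paired-end numbers from the first proposed format. It assumes, as guessed in the comment, that discordant pairs are counted inside "Aligned concordantly 0 time". The variable names are illustrative only.

```python
# Counts copied from the first proposed paired-end summary.
total_pairs = 1000000
concordant_0 = 1116
concordant_1 = 965412
concordant_multi = 33472
discordant_1 = 51

# Assumption: discordant pairs are a subset of the pairs that aligned
# concordantly 0 times, so removing them leaves pairs not aligned as a pair.
not_aligned_as_pair = concordant_0 - discordant_1

categories = {
    "Not aligned as a pair": not_aligned_as_pair,
    "Aligned concordantly 1 time": concordant_1,
    "Aligned concordantly >1 times": concordant_multi,
    "Aligned discordantly 1 time": discordant_1,
}

# With the derived category, the pair-level counts partition the input.
assert sum(categories.values()) == total_pairs
```

Under that assumption the four categories add up to exactly 1000000 pairs.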
Thank you - I just changed the log a bit as follows:
HISAT2 summary stats:
Total pairs: 1000000
Aligned concordantly or discordantly 0 time: 1065 (0.11%)
Aligned concordantly 1 time: 965412 (96.54%)
Aligned concordantly >1 times: 33472 (3.35%)
Aligned discordantly 1 time: 51 (0.01%)
Total unpaired reads: 2130
Aligned 0 time: 1057 (49.62%)
Aligned 1 time: 1057 (49.62%)
Aligned >1 times: 16 (0.75%)
Overall alignment rate: 99.95%
Below is a breakdown of some numbers. Note that Total unpaired reads is twice the number of unaligned pairs.
Total pairs (1000000) = Aligned concordantly or discordantly 0 time + Aligned concordantly 1 time + Aligned concordantly >1 times + Aligned discordantly 1 time
Total unpaired reads (2130) = 2 * Aligned concordantly or discordantly 0 time
Overall alignment rate is number of aligned reads / number of total reads
= (2 * (Aligned concordantly 1 time + Aligned concordantly >1 times + Aligned discordantly 1 time) + Aligned 1 time + Aligned >1 times) / (2 * Total pairs)
from hisat2.
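The breakdown above can be verified against the revised log with a few lines of arithmetic. The sketch below hard-codes the counts from that log. The per-read grouping in `bar` is only one possible way to lay out a stacked bar plot, not something prescribed in the thread, and all names are illustrative.

```python
# Counts copied from the revised paired-end summary.
total_pairs = 1000000
pair_0 = 1065          # Aligned concordantly or discordantly 0 time
conc_1 = 965412        # Aligned concordantly 1 time
conc_multi = 33472     # Aligned concordantly >1 times
disc_1 = 51            # Aligned discordantly 1 time
unpaired_total = 2130  # Total unpaired reads
unpaired_0 = 1057
unpaired_1 = 1057
unpaired_multi = 16

# Identities stated in the breakdown.
assert pair_0 + conc_1 + conc_multi + disc_1 == total_pairs
assert unpaired_total == 2 * pair_0
assert unpaired_0 + unpaired_1 + unpaired_multi == unpaired_total

# Overall alignment rate = aligned reads / total reads (two reads per pair).
aligned_reads = 2 * (conc_1 + conc_multi + disc_1) + unpaired_1 + unpaired_multi
overall_rate = aligned_reads / (2 * total_pairs)
print(f"Overall alignment rate: {overall_rate:.2%}")

# One possible stacked-bar layout, counted in reads so the segments
# sum to the total number of input reads (an assumption, not from the thread).
bar = {
    "Concordant, 1 time": 2 * conc_1,
    "Concordant, >1 times": 2 * conc_multi,
    "Discordant, 1 time": 2 * disc_1,
    "Unpaired, 1 time": unpaired_1,
    "Unpaired, >1 times": unpaired_multi,
    "Not aligned": unpaired_0,
}
assert sum(bar.values()) == 2 * total_pairs
```

Counting every segment in reads rather than pairs keeps the paired and unpaired categories on the same scale.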
I think adding input file names is also a good idea, but we'd like to have minimal summary output for now. We might add the additional info. in a later version of HISAT2.
from hisat2.
No problem at all, thanks for HISAT2! Open-source bioinformatics is great 😁 🌟
from hisat2.