Hi there, I'm a big fan of HISAT2 but I struggle with the summary lo

Suggestion: Improved logging about hisat2 HOT 14 CLOSED

daehwankimlab commented on August 15, 2024 3

Suggestion: Improved logging

from hisat2.

Comments (14)

infphilo commented on August 15, 2024 1

Thank you for your suggestion, @ewels

The suggested format looks great to me! I'll try to incorporate it in the next version of HISAT2. I'll be out of the country for the rest of this week and next week, so sorry for the brief response.

ewels commented on August 15, 2024 1

Ok brilliant - thanks for the log changes and explanation 👍

Regarding the input filenames - no problem, I'll just take the log filename for now and hope that people name their logs after their samples :)

Phil

infphilo commented on August 15, 2024 1

Thank you again for your suggestion, and for developing MultiQC, a very powerful tool! :-)

ewels commented on August 15, 2024 1

Hi @infphilo,

I've just written the new HISAT2 MultiQC module to work with the output from --new-summary, so it's now available in v1.1dev.

Thanks again,

Phil

infphilo commented on August 15, 2024 1

Thank you for your great work, @ewels !

ewels commented on August 15, 2024

Fantastic, thanks! Looking forward to it..

lcolladotor commented on August 15, 2024

+1 ^^

infphilo commented on August 15, 2024

It took me such a long time to implement your suggested output format due to multiple (very exciting) projects, job hunting, grant writing, etc.

How about the following summary output format?
-- single-end reads --
Summary stats:
Total reads: 1000000
Aligned 0 time: 956 (0.10%)
Aligned 1 time: 957987 (95.80%)
Aligned >1 times: 41057 (4.11%)
Overall alignment rate: 99.90%

-- paired-end reads --
Summary stats:
Total pairs: 1000000
Aligned concordantly 0 time: 1116 (0.11%)
Aligned concordantly 1 time: 965412 (96.54%)
Aligned concordantly >1 times: 33472 (3.35%)
Aligned discordantly 1 time: 51 (4.57%)
Total unpaired reads: 2130
Aligned 0 time: 1057 (49.62%)
Aligned 1 time: 1057 (49.62%)
Aligned >1 times: 16 (0.75%)
Overall alignment rate: 99.95%

I also implemented a new option, --summary-file, to output the summary to a file (in addition to stderr).

ewels commented on August 15, 2024

Hi @infphilo,

No problem - I know the feeling! Thanks for looking into this.

The output you suggest looks great... A couple of minor suggestions:

Could you change Summary stats: to HISAT2 Summary stats:? The addition of the specific HISAT2 string makes the output a lot easier to find programmatically.
If it's possible to print the input filenames, that would be great. Some users concatenate stderr from multiple samples, so it's useful to have the input sample associated with the summary stats.
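
To illustrate why a tool-specific header string helps, here is a minimal Python sketch of how a downstream parser might pull summary blocks out of concatenated stderr. The function name `find_summaries` and both regular expressions are assumptions made for illustration; they are not taken from HISAT2 or MultiQC. The header is matched case-insensitively, and a block is assumed to end at the first line that does not look like a `Label: value` statistic.

```python
import re

# Header string proposed in this thread, matched case-insensitively.
HEADER = re.compile(r"^\s*HISAT2 summary stats:\s*$", re.IGNORECASE)
# Matches "Label: 123", "Label: 123 (4.56%)" or "Label: 99.95%".
STAT = re.compile(
    r"^\s*(?P<label>[^:]+):\s*(?P<value>[\d.]+%?)"
    r"(?:\s*\((?P<pct>[\d.]+)%\))?\s*$"
)


def find_summaries(log_text):
    """Return one list of (label, value, pct) tuples per summary block."""
    blocks = []
    current = None  # the block being filled, or None when outside a block
    for line in log_text.splitlines():
        if HEADER.match(line):
            current = []
            blocks.append(current)
            continue
        if current is None:
            continue  # ignore anything before or between summary blocks
        m = STAT.match(line)
        if m:
            current.append((m["label"].strip(), m["value"], m["pct"]))
        else:
            current = None  # first non-statistic line closes the block
    return blocks
```

Values are kept as strings so that the caller can decide how to convert counts and percentages.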

Cheers,

Phil

ewels commented on August 15, 2024

P.S. A question: one of the plots I'd like to make for MultiQC is a stacked bar graph showing how all of the input read pairs are aligned (e.g. like this one), i.e. what proportion are not aligned at all, what proportion have >1 alignment, and so on. However, it's not entirely clear to me how the numbers from your paired-end output can be summed:

Category	Number of Reads	Running Total
Total pairs	1000000
Aligned concordantly 0 time	1116	1116
Aligned concordantly 1 time	965412	966528
Aligned concordantly >1 times	33472	1000000
Aligned discordantly 1 time	51	1000051
Total unpaired reads	2130
Aligned 0 time	1057	1001108
Aligned 1 time	1057	1002165
Aligned >1 times	16	1002181

I assume that this is because read pairs can be assigned to multiple categories. Or are some numbers sub-categories of others (e.g. unpaired reads)? Is there a way to put this together into a stacked bar plot that you recommend?

Cheers,

Phil

ewels commented on August 15, 2024

Ok, looking at this a little longer. I guess Aligned discordantly 1 time is part of Aligned concordantly 0 time, which is why the top part doesn't sum to 1000000. So I can subtract one from the other to make a new category and everything should add up.
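
The subtraction described above can be checked against the paired-end example. This is only a sketch under the assumption stated in this comment, namely that the discordant pairs are a subset of the "Aligned concordantly 0 time" pairs; the variable names and category labels are made up for illustration.

```python
# Pair counts copied from the paired-end example in this thread.
total_pairs = 1000000
conc_0 = 1116       # Aligned concordantly 0 time
conc_1 = 965412     # Aligned concordantly 1 time
conc_multi = 33472  # Aligned concordantly >1 times
disc_1 = 51         # Aligned discordantly 1 time

# Assumption: discordant pairs are counted inside conc_0, so subtracting
# them yields four non-overlapping categories for a stacked bar.
categories = {
    "Concordant, 1 time": conc_1,
    "Concordant, >1 times": conc_multi,
    "Discordant, 1 time": disc_1,
    "Neither concordant nor discordant": conc_0 - disc_1,
}

# With that assumption the categories account for every input pair.
assert sum(categories.values()) == total_pairs
```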

Still a bit confused about where the 2130 Total unpaired reads come from though. Are they part of the 1000000 read pair input? Or did the input FastQ files somehow have 1000000 paired-end reads and 2130 single-end reads mixed together?

How does the Overall alignment rate take this into account? Presumably you have to come to a total number of aligned reads to calculate this.

Apologies if I'm being slow here..

Phil

infphilo commented on August 15, 2024

Thank you - I just changed the log a bit as follows:

HISAT2 summary stats:
Total pairs: 1000000
Aligned concordantly or discordantly 0 time: 1065 (0.11%)
Aligned concordantly 1 time: 965412 (96.54%)
Aligned concordantly >1 times: 33472 (3.35%)
Aligned discordantly 1 time: 51 (0.01%)
Total unpaired reads: 2130
Aligned 0 time: 1057 (49.62%)
Aligned 1 time: 1057 (49.62%)
Aligned >1 times: 16 (0.75%)
Overall alignment rate: 99.95%

Below is a breakdown of the numbers. Note that Total unpaired reads is twice the number of pairs that aligned neither concordantly nor discordantly.

Total pairs (1000000) = Aligned concordantly or discordantly 0 time + Aligned concordantly 1 time + Aligned concordantly >1 times + Aligned discordantly 1 time

Total unpaired reads (2130) = 2 * Aligned concordantly or discordantly 0 time

Overall alignment rate is number of aligned reads / number of total reads
= (2 * (Aligned concordantly 1 time + Aligned concordantly >1 times + Aligned discordantly 1 time) + Aligned 1 time + Aligned >1 times) / (2 * Total pairs)
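
As a worked check, the three relations above can be evaluated with the numbers from the revised log. This is a small Python sketch; the function and variable names are chosen here for illustration and do not come from the HISAT2 source.

```python
def overall_alignment_rate(total_pairs, conc_1, conc_multi, disc_1,
                           unpaired_1, unpaired_multi):
    """Aligned reads divided by total reads, following the formula above."""
    # Each aligned pair contributes two reads; unpaired hits contribute one.
    aligned_reads = 2 * (conc_1 + conc_multi + disc_1) + unpaired_1 + unpaired_multi
    return aligned_reads / (2 * total_pairs)


# Values copied from the revised summary in this comment.
total_pairs = 1000000
unaligned_pairs = 1065   # Aligned concordantly or discordantly 0 time
conc_1, conc_multi, disc_1 = 965412, 33472, 51
unpaired_total = 2130    # Total unpaired reads
unpaired_1, unpaired_multi = 1057, 16

# Total pairs is the sum of the four pair categories.
assert unaligned_pairs + conc_1 + conc_multi + disc_1 == total_pairs
# Each unaligned pair is retried as two unpaired reads.
assert unpaired_total == 2 * unaligned_pairs

rate = overall_alignment_rate(total_pairs, conc_1, conc_multi, disc_1,
                              unpaired_1, unpaired_multi)
print(f"{rate:.2%}")  # 99.95%
```

The printed value agrees with the Overall alignment rate line in the log.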

infphilo commented on August 15, 2024

I think adding input file names is also a good idea, but we'd like to keep the summary output minimal for now. We might add the additional information in a later version of HISAT2.

ewels commented on August 15, 2024

No problem at all, thanks for HISAT2! Open-source bioinformatics is great 😁 🌟
