Dear CoverM developers, We're happy users of CoverM, which we use in

Feature request: Handle secondary alignments and new release (coverm issue, CLOSED)

jakobnissen commented on July 30, 2024

Feature request: Handle secondary alignments and new release


Comments (5)

wwood commented on July 30, 2024

Thanks for having a deep look @jakobnissen

When you say no material difference, do you mean specifically for VAMB?

Definitely think that secondary alignments have their uses - would be good to include them for a better relative_abundance calculation for instance.


wwood commented on July 30, 2024

Hi,

Thanks for the kind words, glad it is useful.

Without answering your question fully, I'm wondering whether this is a practical or just a theoretical issue. The BAM format states that exactly 1 mapping for each (mapped) read has to be marked as primary. So mapping algorithms tend to use your option (A) there, i.e. assign one as a primary alignment.

Obviously non-primary alignments hold information, but are you seeing some specific case where it is definitely the root cause?

Thanks, ben


jakobnissen commented on July 30, 2024

Dear @wwood

We have now investigated more closely. At least when using minimap2 to map short reads, there is no material difference in the relative abundances between contigs reported by CoverM, whether we include up to 20 secondary hits or have CoverM ignore all the non-primary hits.
The ratio remains more or less the same, even though the presence of secondary alignments causes a large difference in the absolute abundance values that CoverM outputs.
This is probably because, in the face of multiple equally good alignments, minimap2 assigns the primary alignment randomly. Hence, when contigs are long and depth is high, the large number of reads per contig means that the sampling effect from minimap2 randomly selecting one of the alignments as primary is small, and the computed depth is proportional to the true depth.

Notably, this might not be the case for other mappers than minimap2, nor necessarily with contigs created by fewer, longer reads, where the sampling effect is significant. It may also not necessarily be the case when you have multiple reads of differing identity - in our case, we had lots of hits with 100% identity. Nonetheless, I'm closing the issue.
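To illustrate the sampling argument in this comment, here is a minimal simulation sketch. It assumes, as suggested above, that each multimapping read has its primary alignment picked uniformly at random among equally good contigs. The function names, read counts, and number of contig copies are hypothetical and chosen only for illustration; this is not how minimap2 or CoverM are implemented.

```python
import random


def simulate_primary_counts(n_reads, n_copies, seed=0):
    """Count primary alignments per contig when every read matches
    n_copies contigs equally well and one is chosen uniformly at random
    (an assumption made for this sketch)."""
    rng = random.Random(seed)
    counts = [0] * n_copies
    for _ in range(n_reads):
        counts[rng.randrange(n_copies)] += 1
    return counts


def max_relative_deviation(counts):
    """Largest relative departure of any contig from the even split."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) / expected for c in counts)


# Many reads per contig: the random choice should average out.
many = simulate_primary_counts(n_reads=100_000, n_copies=4)
# Few reads per contig: the sampling effect can be substantial.
few = simulate_primary_counts(n_reads=40, n_copies=4)

print("many reads:", max_relative_deviation(many))
print("few reads: ", max_relative_deviation(few))
```

With many reads, each contig receives close to an equal share of the primary alignments, so primary-only depth stays roughly proportional to the depth counted over all alignments. With few reads the shares can differ noticeably, which matches the caveat about contigs covered by fewer, longer reads.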


jakobnissen commented on July 30, 2024

Yes, to go into a little more detail, Vamb (and other binners) derive much of their signal from co-abundance. I wanted to investigate how multimapping reads might throw off co-abundance, e.g. if aligners aggregated multimapping reads to a few select references, skewing the abundance signal.
We can compute the co-abundance signal by measuring the Pearson correlation of abundances across all samples for two contigs. We then find that the presence or absence of secondary alignments in the BAM files makes little (essentially no) difference to the co-abundance signal when the abundances are computed by CoverM. To trick CoverM into not ignoring secondary alignments, we simply unset the "secondary alignment" flag for all the secondary alignments in the BAM file.

This is despite the actual abundance values changing quite a bit. So what I think is happening is this: because minimap2 randomly assigns one of multiple alignments to be primary when they have the same score, and CoverM only measures the primary alignments, the combination works effectively like a random sub-sampling of the alignments.
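The co-abundance measure described in this comment can be sketched as below. The abundance vectors are made-up numbers, not results from the thread, and `pearson` is a hand-written helper rather than anything from Vamb or CoverM. The sketch also shows why a uniform rescaling of abundances, such as one caused by counting secondary alignments, would leave the correlation unchanged.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length abundance vectors,
    one value per sample."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical abundances of two contigs across four samples.
contig_a = [1.0, 2.0, 4.0, 8.0]
contig_b = [2.0, 3.0, 9.0, 15.0]

# Same contigs with every abundance inflated by a constant factor,
# standing in for the effect of also counting secondary alignments.
contig_a_scaled = [3.0 * x for x in contig_a]
contig_b_scaled = [2.0 * y for y in contig_b]

print(pearson(contig_a, contig_b))
print(pearson(contig_a_scaled, contig_b_scaled))
```

Both calls give the same value up to floating-point rounding, because Pearson correlation is invariant to multiplying either vector by a positive constant. Real data would only behave this way to the extent that secondary alignments inflate each contig's abundance by a roughly constant factor across samples.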


ilnamkang commented on July 30, 2024

Dear @jakobnissen

I'd also like to trick CoverM into not ignoring secondary alignments.

Would you let me know how I can unset the "secondary alignment" flag in the BAM file?

Thanks.
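One possible way to do what is asked here, as a sketch only: in the SAM/BAM FLAG field, bit 0x100 marks a secondary alignment, so clearing that bit on every record makes secondary alignments look primary. The example below works on SAM text lines using only the standard library. The helper name `clear_secondary` and the example record are hypothetical, and this is not necessarily how the flag was unset in the experiments described in this thread.

```python
SECONDARY = 0x100  # SAM FLAG bit meaning "secondary alignment"


def clear_secondary(sam_line):
    """Return the SAM line with the secondary bit cleared in its FLAG.

    Header lines (starting with '@') and blank lines are returned
    unchanged. Other flag bits, including supplementary (0x800),
    are left as they are.
    """
    if sam_line.startswith("@") or not sam_line.strip():
        return sam_line
    fields = sam_line.split("\t")
    fields[1] = str(int(fields[1]) & ~SECONDARY)  # FLAG is column 2
    return "\t".join(fields)


# Hypothetical record: FLAG 272 = 256 (secondary) + 16 (reverse strand).
example = "read1\t272\tcontigB\t100\t0\t50M\t*\t0\t0\t*\t*\n"
print(clear_secondary(example), end="")
```

Applied line by line to the output of `samtools view -h`, and converted back with `samtools view -b`, this would produce a BAM file in which former secondary alignments carry no secondary flag. The result has several "primary" records per read, so it is worth checking that any downstream tool tolerates that.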
