Giter VIP home page Giter VIP logo

Comments (5)

wwood avatar wwood commented on July 30, 2024 1

Thanks for having a deep look @jakobnissen

When you say no material difference, do you mean specifically for VAMB?

Definitely think that secondary alignments have their uses - would be good to include them for a better relative_abundance calculation for instance.

from coverm.

wwood avatar wwood commented on July 30, 2024

Hi,

Thanks for the kind words, glad it is useful.

Without answering your question fully, I'm wondering whether this is a practical or just theoretical issue. The BAM format states that exactly 1 mapping for each (mapped) read has to marked as primary. So mapping algorithms tend to use your option (A) there i.e. assign one as a primary alignment.

Obviously non-primary alignments hold information, but are you seeing some specific case where it is definitely the root cause?

Thanks, ben

from coverm.

jakobnissen avatar jakobnissen commented on July 30, 2024

Dear @wwood

We have now investigated more closely, and it appears that at least when using minimap2 to map short reads, there is no material difference in the relative abundances between contigs reported by CoverM when including up to 20 secondary hits using CoverM, and having CoverM ignore all the non-primary hits, respectively.
The ratio remains more or less the same despite the fact that the presence of secondary reads cause a large difference in the absolute values of the CoverM output abundances.
This is probably because, in the face of multiple equally good alignments, minimap2 assigns the primary alignment randomly. Hence, when contigs are long and depth is high, the large number of reads per contig means that the sampling effect from minimap2 randomly selecting a subset of alignments as being primary alignments is small, and the computed depth is proportial to the true depth.

Notably, this might not be the case for other mappers than minimap2, nor necessarily with contigs created by fewer, longer reads, where the sampling effect is significant. It may also not necessarily be the case when you have multiple reads of differing identity - in our case, we had lots of hits with 100% identity. Nonetheless, I'm closing the issue.

from coverm.

jakobnissen avatar jakobnissen commented on July 30, 2024

Yes, to go a little more in detail, Vamb (and other binners) derive much of their signal from co-abundance. I wanted to investigate how multimapping reads might throw off co-abundance e.g. if aligners aggregated multimapping reads to a few select references, skewing the abundance signal.
We can compute the co-abundance signal by measuring Pearson correlation of abundances across all samples for two contigs, and then we find that the presence or absence of secondary alignments in the BAM files makes little (essentially, no) difference to the co-abundance signal, when the abundances are computed by CoverM. To trick CoverM to not ignore secondary alignments, we've simply unset the "secondary alignment" flag for all the secondary alignments in the BAM file.

This is despite the actual abundance values changing quite a bit. So what I think is happening is that minimap2, by randomly assigning one of multiple alignments to be primary if they have the same score, and CoverM only measuring the primary alignments, it works effectively like a random sub-sampling of the alignments.

from coverm.

ilnamkang avatar ilnamkang commented on July 30, 2024

Dear @jakobnissen

I'd also like to trick CoverM not to ignore secondary alignments.

Would you let me know how I can unset the "secondary alignment" flag in the BAM file?

Thanks.

from coverm.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.