Comments (5)
Thanks for having a deep look @jakobnissen
When you say no material difference, do you mean specifically for VAMB?
Definitely think that secondary alignments have their uses - would be good to include them for a better relative_abundance calculation for instance.
from coverm.
Hi,
Thanks for the kind words, glad it is useful.
Without answering your question fully, I'm wondering whether this is a practical or just theoretical issue. The BAM format states that exactly 1 mapping for each (mapped) read has to marked as primary. So mapping algorithms tend to use your option (A) there i.e. assign one as a primary alignment.
Obviously non-primary alignments hold information, but are you seeing some specific case where it is definitely the root cause?
Thanks, ben
from coverm.
Dear @wwood
We have now investigated more closely, and it appears that at least when using minimap2 to map short reads, there is no material difference in the relative abundances between contigs reported by CoverM when including up to 20 secondary hits using CoverM, and having CoverM ignore all the non-primary hits, respectively.
The ratio remains more or less the same despite the fact that the presence of secondary reads cause a large difference in the absolute values of the CoverM output abundances.
This is probably because, in the face of multiple equally good alignments, minimap2 assigns the primary alignment randomly. Hence, when contigs are long and depth is high, the large number of reads per contig means that the sampling effect from minimap2 randomly selecting a subset of alignments as being primary alignments is small, and the computed depth is proportial to the true depth.
Notably, this might not be the case for other mappers than minimap2, nor necessarily with contigs created by fewer, longer reads, where the sampling effect is significant. It may also not necessarily be the case when you have multiple reads of differing identity - in our case, we had lots of hits with 100% identity. Nonetheless, I'm closing the issue.
from coverm.
Yes, to go a little more in detail, Vamb (and other binners) derive much of their signal from co-abundance. I wanted to investigate how multimapping reads might throw off co-abundance e.g. if aligners aggregated multimapping reads to a few select references, skewing the abundance signal.
We can compute the co-abundance signal by measuring Pearson correlation of abundances across all samples for two contigs, and then we find that the presence or absence of secondary alignments in the BAM files makes little (essentially, no) difference to the co-abundance signal, when the abundances are computed by CoverM. To trick CoverM to not ignore secondary alignments, we've simply unset the "secondary alignment" flag for all the secondary alignments in the BAM file.
This is despite the actual abundance values changing quite a bit. So what I think is happening is that minimap2, by randomly assigning one of multiple alignments to be primary if they have the same score, and CoverM only measuring the primary alignments, it works effectively like a random sub-sampling of the alignments.
from coverm.
Dear @jakobnissen
I'd also like to trick CoverM not to ignore secondary alignments.
Would you let me know how I can unset the "secondary alignment" flag in the BAM file?
Thanks.
from coverm.
Related Issues (20)
- Read minimum length & small RNA mapping HOT 3
- A question on RPKM calculation HOT 2
- Checkm2 tsv for `coverm cluster --checkm-tab-table` HOT 1
- Coverm filter inverse paired reads HOT 4
- Question about TPM HOT 2
- Calculations of min-read-percent-identity and min-read-aligned-length HOT 2
- Zero covered bases and mean coverage but non-zero reads mapping
- --contig-end-exclusion doesn't work with -m not set to mean HOT 2
- CoverM reports near 0 reads mapped despite almost all reads seemingly being mapped HOT 5
- Extracting bam files HOT 1
- CoverM to assess genomes relative proportion in Metatranscriptomics data?
- How can I provide extra parameters to BWA using the "--bwa-params" option? HOT 3
- [2023-06-04T07:45:16Z INFO coverm::contig] In sample 'cdhit_rep_seq.fna/SRR13083091_1.fq.gz', found 0 reads mapped out of 0 total (NaN%) HOT 3
- libtinfow.so.6: no version information available (required by samtools) HOT 2
- Q: coverM compatible with bowtie2 HOT 1
- Usage of --sharded
- default rep seq picking method HOT 2
- Unable to find BAM file when file exists HOT 2
- thread 'main' panicked at 'index out of bounds: the len is 6564 but the index is 6586' HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from coverm.