Comments (3)
Thanks for using KMCP.
If the output table contains more than one reference per species, which parameter should we use to choose the best hit?
The real genome in a sample may match more than one reference, and we can't tell which one is the true match. However, the similarity score (column `score`, the 90th percentile of k-mer coverage of all uniquely matched reads) may indicate which reference is most similar to the real genome.
According to your manual, the `percentage` column refers to the relative abundance of the reference. However, we are not sure how this value is calculated. Could you give us more details about this metric?
First, the coverage (column `coverage`) of each matched reference genome is computed by dividing the total bases of matched reads by the genome size (the total bases of either a complete genome or an unfinished genome such as a MAG, with plasmid sequences filtered out). Then the relative abundance of a species is computed by dividing the sum of the genome coverages of that species by the sum of the genome coverages of all genomes. Finally, the relative abundance of a taxon at each rank is the sum of the percentages of all its child taxa.
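The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not KMCP's actual code: the function names, the tuple layout, and the example numbers are all made up, and scaling the result to a percentage is an assumption based on the column name.

```python
from collections import defaultdict

def genome_coverage(matched_bases, genome_size):
    # Coverage of one reference: total bases of matched reads / genome size.
    return matched_bases / genome_size

def relative_abundance(records):
    """records: iterable of (species, matched_bases, genome_size) tuples,
    one tuple per matched reference genome (hypothetical layout)."""
    coverage_by_species = defaultdict(float)
    for species, matched_bases, genome_size in records:
        coverage_by_species[species] += genome_coverage(matched_bases, genome_size)
    total = sum(coverage_by_species.values())
    # Assumption: the reported value is expressed as a percentage.
    return {s: 100.0 * c / total for s, c in coverage_by_species.items()}

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
records = [
    ("species A", 2_000_000, 1_000_000),  # coverage 2.0
    ("species A", 1_000_000, 1_000_000),  # coverage 1.0
    ("species B", 5_000_000, 5_000_000),  # coverage 1.0
]
abundance = relative_abundance(records)
print(abundance)
```

Here species A has two matched references with a summed coverage of 3.0, species B has one with a coverage of 1.0, so A accounts for three quarters of the total. Abundances at higher ranks would then be obtained by summing the values of the child taxa.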
from kmcp.
Thank you for the swift reply. If we have several references with a score of 100, what would be the second metric to use to filter them? Would coverage be a good one to use?
I think so.
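For anyone who wants to apply this, here is a minimal sketch of picking one reference per species by `score` first and `coverage` as the tie-breaker. The rows are hypothetical in-memory dictionaries; the `species` key and all values are assumptions for illustration, so adapt the keys to the columns in your own profiling table.

```python
def best_hit_per_species(rows):
    """Keep, for each species, the row with the highest (score, coverage)."""
    best = {}
    for row in rows:
        current = best.get(row["species"])
        key = (row["score"], row["coverage"])
        # Tuples compare element by element, so coverage only matters
        # when the scores are equal.
        if current is None or key > (current["score"], current["coverage"]):
            best[row["species"]] = row
    return best

# Hypothetical rows; "species" stands in for whatever column groups the references.
rows = [
    {"ref": "ref1", "species": "species A", "score": 100.0, "coverage": 3.2},
    {"ref": "ref2", "species": "species A", "score": 100.0, "coverage": 5.1},
    {"ref": "ref3", "species": "species A", "score": 98.5, "coverage": 9.0},
    {"ref": "ref4", "species": "species B", "score": 91.0, "coverage": 0.4},
]
best = best_hit_per_species(rows)
for species, row in best.items():
    print(species, row["ref"])
```

Note that `ref3` has the highest coverage but a lower score, so it loses to `ref2`, which ties with `ref1` on score and wins on coverage.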