Comments (13)
I am currently benchmarking this; hopefully that will shed some light on it.
from ampliseq.
Hi there,
I actually never looked into that. Therefore I am not aware of a solution or whether it could be implemented. Maybe someone else could chime in.
For more context, could you describe why that would help you and what you would gain? What's your use case? Might it be of more general interest as well?
Thanks for the swift reply! The reason for the request is that when comparing 16S results based on databases like SILVA with whole genome-based methods (which use NCBI Taxonomy), one runs into outdated taxonomic labels in SILVA and discrepancies between the OTU and genomic taxonomies.
Since SILVA stores information on 16S gene primary accession in GenBank (and from what I see some of the ampliseq database files do that too), it is possible to use it for finding what's the NCBI taxonomy assigned to the gene - which is likely more up to date and in line with whole genomes' taxonomy.
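As a rough illustration of the lookup described above, the sketch below pulls the primary accession out of a SILVA-style sequence identifier and builds an NCBI E-utilities `efetch` URL for the matching GenBank record. It assumes identifiers of the form `accession.start.stop` (for example `AB001438.1.1662`); the function names are made up for this example, and fetching the record and reading its taxonomy are left out.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for fetching sequence records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def primary_accession(silva_id: str) -> str:
    """Return the GenBank accession from an ID like 'AB001438.1.1662'.

    Assumption: the last two dot-separated fields are start and stop
    coordinates, and everything before them is the accession.
    """
    # Drop a leading '>' and any description after the first whitespace.
    parts = silva_id.lstrip(">").split()[0].split(".")
    if len(parts) < 3 or not (parts[-1].isdigit() and parts[-2].isdigit()):
        raise ValueError(f"unexpected identifier format: {silva_id!r}")
    return ".".join(parts[:-2])


def efetch_url(accession: str) -> str:
    """Build a URL that requests the GenBank flat file for one accession."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    acc = primary_accession(">AB001438.1.1662 Bacteria;Firmicutes")
    print(acc)
    print(efetch_url(acc))
```

The returned GenBank record carries the organism name and a `taxon:` cross-reference in its source feature, which is what would then be compared against the whole-genome taxonomy.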
I see. The outdated taxonomies could probably be improved that way.
Regarding the comparison with whole genome-based methods, one could either use GTDB via `--dada_ref_taxonomy gtdb` (the corresponding shotgun metagenomics assembly classifier can be used in e.g. nf-core/mag) or Kraken2 with the standard database using `--kraken2_ref_taxonomy standard` (which seems to work just fine in preliminary benchmarks).
Great recommendations!
I've launched GTDB-based classification with DADA2; however, I see that some crucial taxa are not resolved beyond the phylum/class/family level, even though the same ASV gets a genus/species-level assignment with SILVA. It seems to me that the GTDB database has different content at the sequence level than the default SILVA, and hence the classification results differ?
Would it be the same case for Kraken2 database?
Databases are non-trivial to compare, so if you do not find a "crucial" taxon in one database, try another one.
Another reason could be that classification with DADA2 is not always identical between runs: taxa close to the cutoff can fall below or rise above it because of tiny numerical differences (these are outside of my control; a fixed seed doesn't help). So running the same taxonomic classification with DADA2 multiple times can lead to missing or added taxonomic levels when the support is around the confidence threshold.
> Would it be the same case for Kraken2 database?

I don't know, you would need to test.
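To make the threshold effect mentioned above concrete, here is a toy simulation rather than DADA2's actual algorithm. It assumes a taxonomic level is kept when at least `min_boot` of `n_boot` bootstrap replicates agree with the label (50 of 100, which to my knowledge matches the defaults of DADA2's `assignTaxonomy`), and it models each replicate as an independent coin flip with probability `p_match`. All names in the code are hypothetical.

```python
import random


def bootstrap_support(p_match: float, n_boot: int, rng: random.Random) -> int:
    """Count bootstrap replicates (out of n_boot) that agree with the label."""
    return sum(rng.random() < p_match for _ in range(n_boot))


def assigned_fraction(p_match: float, min_boot: int = 50, n_boot: int = 100,
                      runs: int = 1000, seed: int = 0) -> float:
    """Fraction of repeated runs in which the level passes the cutoff.

    Toy model: every run draws a fresh set of bootstrap replicates, so a
    taxon whose true support sits near min_boot passes in some runs only.
    """
    rng = random.Random(seed)
    kept = sum(
        bootstrap_support(p_match, n_boot, rng) >= min_boot for _ in range(runs)
    )
    return kept / runs


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(f"p_match={p}: kept in {assigned_fraction(p):.0%} of runs")
```

In this model, a taxon with support well above or below the cutoff is assigned or dropped consistently, while one near the cutoff is kept in only part of the runs, which is the kind of run-to-run difference described above.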
IMO, "sbdi-gtdb" is better than "gtdb", as we know there are rRNA sequences in the GTDB collection that are assigned to the wrong species. "sbdi-gtdb" is phylogenetically vetted to remove these.
Thanks for all the suggestions. I gave `--kraken2_ref_taxonomy standard` and `--dada_ref_taxonomy sbdi-gtdb` a go, and while the latter definitely brought some of the results closer at the genus level, phylum-level taxonomy still seems to be a jungle in comparison to whole-genome GTDB: sometimes I see Bacteroidota, but sometimes I see Firmicutes. Proteobacteria should be Pseudomonadota, so this label is probably also not up to date.
I didn't expect benchmarking the technologies to be such a challenge. It looks like it requires a lot of manual research to map corresponding taxonomic labels onto each other; otherwise, when plotted side by side, the data look as if the results were completely different.
Thank you so much Daniel!
Now that I think about it, it may be a matter of different GTDB versions? For full-genome methods, we use the latest 214 release. As far as I can see, SBDI is tied to 207 release, which means I see Bacillota_A in shotgun, but Firmicutes_A in ampliseq results - the major phyla names change is probably not accounted for in v 207?
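For side-by-side plots, one stopgap is to harmonize the phylum labels before comparing. The sketch below covers only the two renames mentioned in this thread (Firmicutes to Bacillota, Proteobacteria to Pseudomonadota) and assumes that GTDB-style suffixes such as `_A` carry over unchanged, as in the Firmicutes_A/Bacillota_A example above. The table and function name are hypothetical, and the mapping is not a complete list of renamed phyla.

```python
# Old -> new phylum names. Only the pairs discussed in this thread;
# extend the table after checking the release notes of the GTDB version used.
PHYLUM_RENAMES = {
    "Firmicutes": "Bacillota",
    "Proteobacteria": "Pseudomonadota",
}


def harmonize_phylum(name: str) -> str:
    """Map an older phylum label to its newer name, keeping suffixes like '_A'."""
    # 'Firmicutes_A' -> base 'Firmicutes', sep '_', suffix 'A'
    base, sep, suffix = name.partition("_")
    return PHYLUM_RENAMES.get(base, base) + sep + suffix


if __name__ == "__main__":
    for label in ("Firmicutes_A", "Proteobacteria", "Bacteroidota"):
        print(label, "->", harmonize_phylum(label))
```

Labels that are not in the table, such as Bacteroidota, pass through unchanged, so the function can be applied to a whole phylum column without special-casing.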
> Thank you so much Daniel!
> Now that I think about it, it may be a matter of different GTDB versions? For full-genome methods, we use the latest 214 release. As far as I can see, SBDI is tied to 207 release, which means I see Bacillota_A in shotgun, but Firmicutes_A in ampliseq results - the major phyla names change is probably not accounted for in v 207?

That's it. I'm working on SBDI-GTDB 08RS214, and soon on release 09 (once that's released, likely in late April).
That's precious. If I set up the repository to track the releases, will it be enough to be notified when it becomes available?
> That's precious. If I set up the repository to track the releases, will it be enough to be notified when it becomes available?

New releases of databases are included in new releases of the pipeline itself, so yes. Hopefully, I'll be done with the next release in time for Ampliseq 2.10.
That would be fantastic - thanks for all the work, I'm hitting the Watch button then :) Good luck!