santosrac / conekt_grasses_microbiome
License: MIT License
In the otu_classification model, the following function fails to find the taxonomic path using filter_by, even though the paths are the same in the uploaded files (the source of GG taxonomic information and the file with associations between OTU names and their GG taxon paths).
This query returns None: GGTaxon.query.filter_by(taxon_path=path).first()
Report associated with commit 04cc611
Currently, the function is associating all OTUs with a single GG record (pk 1).
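One thing worth ruling out for the failed lookup is an invisible difference between the two files, such as spacing around the `;` separators or a trailing separator. This is an assumption; the report does not confirm the cause. A minimal sketch of a normalization helper follows, where `normalize_gg_path` is a hypothetical name and not part of the code base:

```python
def normalize_gg_path(path: str) -> str:
    """Return a canonical form of a Greengenes-style taxonomic path.

    Hypothetical helper: trims whitespace around each rank, drops empty
    pieces left by a trailing ';', and joins the ranks with '; '.
    """
    parts = [piece.strip() for piece in path.strip().split(";")]
    parts = [piece for piece in parts if piece]
    return "; ".join(parts)


# Two spellings of the same path compare equal after normalization.
a = normalize_gg_path("k__Bacteria;p__Proteobacteria ;\n")
b = normalize_gg_path("k__Bacteria; p__Proteobacteria")
print(a == b)
```

If this turns out to be the cause, the same helper would have to be applied both when GG records are imported and when the path is passed to `filter_by`, so that stored and queried values share one form.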
Is it useful to enable the creation of studies involving multiple species? Currently, we have a study in maize leaves (Wallace et al. 2018) and another one in rice roots (Santos-Medellin et al. 2022) that have paired microbiome and transcriptome sequencing data.
In order to adapt CoNekT Grasses Microbiome to incorporate/accept different types of expression normalization methods in expression profiles, a column was added (normalization_method). However, the JSON string still has a 'tpm' field and several functions use this information. Changes must be made to all functions using expression profiles to adapt to this change.
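One way to make this transition less invasive is to route every read of profile values through a single accessor. The sketch below assumes a JSON layout with a `data` object keyed by normalization method (for example `{"data": {"tpm": {...}}}`); that layout and the name `get_profile_values` are assumptions for illustration, not the confirmed schema.

```python
import json


def get_profile_values(profile_json: str, normalization_method: str = "tpm") -> dict:
    """Return the per-sample values of an expression profile.

    Assumed layout: {"data": {"<method>": {sample: value, ...}}}.
    Falls back to the legacy 'tpm' key when the profile was stored
    before the normalization_method column existed.
    """
    data = json.loads(profile_json).get("data", {})
    key = normalization_method.lower()
    if key in data:
        return data[key]
    if "tpm" in data:  # legacy profiles keep their values under 'tpm'
        return data["tpm"]
    raise KeyError(f"no values stored for method {normalization_method!r}")


legacy = json.dumps({"data": {"tpm": {"leaf_1": 10.5, "leaf_2": 0.0}}})
print(get_profile_values(legacy, "cpm"))
```

With an accessor like this, functions that currently index the `tpm` field directly could be migrated one at a time.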
ASV methods can be associated with literature, but this is not mandatory, since ASVs can be independent of a single experiment.
For datasets like Wallace et al. (2018), GreenGenes is used for the taxonomic classification of units (e.g., OTUs). However, older GG releases do not have an NCBI taxonomy ID associated with each record. If we need to connect SILVA/GG or even Phytozome records (e.g., grass species), we need a function that generates a mapping between NCBI taxids and GreenGenes records.
Operational Taxonomic Units (or OTUs) will be mostly analyzed with their profiles (counts or transformed counts). Two models (tables) will store information associated with OTU profiles:
CoNekT Grasses Microbiome will have fewer transcriptome functions and many more microbiome and microbiome-transcriptome integration functions (eventually there will be integration between microbiome and other omics in grasses).
The features page must be updated to reflect all functions that were implemented during @SantosRAC's internship in the US.
Currently, CoNekT Grasses Microbiome inherits InterPro and Gene Ontology from CoNekT, as well as CAZymes from CoNekT Grasses (which I implemented recently). However, other functional information might be important when analyzing microbiomes. This is something we definitely need to discuss, @khidalgo85.
I am currently implementing NCBI taxonomy in CoNekT Grasses Microbiome. I believe this is something we should keep synchronized (Plant Cell Wall Knowledgebase, PAGED and CoNekT Grasses Microbiome). @khidalgo85 and @JoaoNovoletti, let's talk about it during the JavaScript meeting.
This initial version of NCBI taxonomy will include only Bacteria, Plants and Fungi, Unassigned and Environmental samples. This is done by filtering NCBI divisions (IDs: 0, 4, 8, 11)
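The division filter can be sketched as below. It assumes the input is the `nodes.dmp` file from the NCBI taxonomy dump, where fields are separated by `|` and the division ID is the fifth field; the function name `filter_nodes_by_division` is hypothetical.

```python
# NCBI division IDs kept in the initial version:
# 0 = Bacteria, 4 = Plants and Fungi, 8 = Unassigned, 11 = Environmental samples
KEPT_DIVISIONS = {0, 4, 8, 11}


def filter_nodes_by_division(lines, kept=KEPT_DIVISIONS):
    """Yield (tax_id, parent_tax_id, rank, division_id) for kept divisions.

    `lines` is any iterable of nodes.dmp-style lines.
    """
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5 or not fields[0]:
            continue  # skip blank or malformed lines
        division_id = int(fields[4])
        if division_id in kept:
            yield int(fields[0]), int(fields[1]), fields[2], division_id


sample = [
    "2\t|\t131567\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\n",
    "9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|\t1\t|\n",
    "3193\t|\t131221\t|\tclade\t|\t\t|\t4\t|\t1\t|\n",
]
for node in filter_nodes_by_division(sample):
    print(node)
```

In practice the iterable would be an open file handle on `nodes.dmp`, so the dump is streamed rather than loaded into memory.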
I am currently not sure we need a table with associations between OTU profiles and sequencing runs (profiles already have this information in JSON).
Revision required.
Implement a class that will handle release information (e.g., SILVA 138 or Greengenes 13_5).
Entries in NCBI, SILVA and GreenGenes tables (currently NCBITaxon, SILVATaxon and GGTaxon) must have an associated method id.
To enable end users to exploit OTU taxonomic information, the following models must be implemented:
A connection between these tables and NCBI/SILVA taxonomy must exist as well.
When adding taxonomic classification to OTUs in CoNekT Grasses Microbiome for GreenGenes, however, imported OTUs may not have a full path that exactly matches how it appears in the GG table. And, as far as I know, there is no mapping file between GreenGenes and NCBI Taxonomy IDs.
Possible solution for the problem with GreenGenes: create a function that gets the lowest rank available for an imported OTU and then tries to partially match records in GreenGenes (to make sure a valid assignment is made). In addition, it would be nice to have an NCBI taxon ID associated with these records (like SILVA has). This function could therefore also find the NCBI taxon ID for the lowest-rank taxon (e.g., genus or species, whenever possible).
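The proposed solution could be sketched as follows. The code assumes GreenGenes-style paths such as `k__Bacteria; p__Proteobacteria; g__; s__`, where an empty name after `__` means the rank is unassigned. Both function names are hypothetical, and the NCBI taxon ID lookup is left out.

```python
def lowest_assigned_rank(path: str):
    """Return (prefix, name) of the deepest rank with a non-empty name, or None."""
    lowest = None
    for part in path.split(";"):
        prefix, sep, name = part.strip().partition("__")
        if sep and name:
            lowest = (prefix, name)
    return lowest


def find_partial_matches(otu_path: str, gg_paths):
    """Return the GG paths that contain the OTU's lowest assigned rank."""
    lowest = lowest_assigned_rank(otu_path)
    if lowest is None:
        return []
    token = f"{lowest[0]}__{lowest[1]}"
    return [
        gg_path
        for gg_path in gg_paths
        if token in (piece.strip() for piece in gg_path.split(";"))
    ]


gg = [
    "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria",
    "k__Bacteria; p__Firmicutes; c__Bacilli",
]
print(lowest_assigned_rank("k__Bacteria; p__Firmicutes; c__; o__"))
print(find_partial_matches("k__Bacteria; p__Firmicutes; c__; o__", gg))
```

Matching on whole rank tokens rather than substrings avoids accidental hits, for example a genus name that is a prefix of another one.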
Add functions to internally compute correlations between OTU/ASV counts (transformed) and gene expression (transformed). After importing expression profiles and ASV/OTU counts, admins must be able to compute the correlations for pairs of gene - ASV/OTU.
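A minimal sketch of the pairwise computation follows. The issue does not name a correlation method, so Pearson is used here as an assumption; the function names and the dictionary-based profile layout are also illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def correlate_pairs(gene_profiles, otu_profiles, samples):
    """Return (gene, otu, r) for every gene - OTU/ASV pair over shared samples."""
    results = []
    for gene, gene_values in gene_profiles.items():
        gx = [gene_values[s] for s in samples]
        for otu, otu_values in otu_profiles.items():
            oy = [otu_values[s] for s in samples]
            r = pearson(gx, oy)
            if r is not None:
                results.append((gene, otu, r))
    return results


genes = {"gene_a": {"s1": 1.0, "s2": 2.0, "s3": 3.0}}
otus = {"otu_1": {"s1": 2.0, "s2": 4.0, "s3": 6.0}}
print(correlate_pairs(genes, otus, ["s1", "s2", "s3"]))
```

The `samples` list makes explicit that both profiles must be aligned on the same paired samples before any coefficient is computed.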
Implementation involves:
Studies could be created using different literature, plant species, etc. What about considering multiple taxonomic units (ASVs and OTUs)? It seems that, depending on the sequencing quality, we are not able to use ASVs (e.g., Wallace et al. 2018).
All functions implemented in CoNekT Grasses Microbiome must have associated test data.
Current plans involve including rice and maize data from two manuscripts, respectively:
Details about test data are being included in the Sphinx docs folder and the actual data are being included in tests/data.
Create:
Some links:
Admins could either select species from a list (autocomplete must be enabled in the Flask form), or a verification step must be included to check whether the species name matches a "scientific name" in the NCBI taxonomy table.
Users must be able to access gene annotation and expression data (e.g., expression profiles and networks) for a grass species in CoNekT Grasses. For this purpose, gene names will be common between both platforms and derive from common sources (e.g. Phytozome).
Scripts are used to build CoNekT Grasses Microbiome without the need for someone to add data or run builds through the admin panel. This enables the development of automated pipelines to fill the platform with data (we have done this for CoNekT Grasses). Scripts are located in the scripts/ dir.
All SILVA records must have an associated NCBI taxid. Therefore,
An initial version of a custom network of genes and OTUs is available (option "Draw Network of Correlations").
However, it still needs some improvements:
Moreover, the form that generates the network still needs to be changed to allow multiple genes and/or multiple OTUs, and to warn users if genes or microbes (OTUs/ASVs) entered in the form are not available in the correlation pairs.
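The check behind such a warning could look like the sketch below. It assumes the form field arrives as one string with names separated by commas or whitespace; `parse_names` and `split_available` are hypothetical helpers, not existing functions.

```python
import re


def parse_names(raw: str):
    """Split a form field into unique names, keeping the order entered."""
    names = [n for n in re.split(r"[,\s]+", raw.strip()) if n]
    return list(dict.fromkeys(names))


def split_available(requested, available):
    """Return (found, missing) relative to the names present in correlation pairs."""
    available = set(available)
    found = [name for name in requested if name in available]
    missing = [name for name in requested if name not in available]
    return found, missing


in_pairs = {"gene_a", "gene_b", "otu_1"}
found, missing = split_available(parse_names("gene_a, gene_x\notu_1"), in_pairs)
print(found, missing)
```

The `missing` list is what the form would report back to the user, while the network is drawn from `found` only.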
OTU tables (counts) will be permitted in CoNekT Grasses Microbiome.
Main models in mind:
Models and methods will serve as the basis for implementing further functions (e.g., generating/importing profiles, OTU classification, associations/correlations between transcriptome and microbiome).
Preliminary list of classes and functions to be updated/reviewed:
Taxonomy-related classes and functions:
Sequencing run/experiment-related classes and functions:
Metataxonomics-related classes and functions:
Implement Solr search engine
All admin functionalities for adding data (e.g., sequences, species, OTUs) and building (e.g., running correlations between OTUs and gene expression) must be well documented. This will make maintenance easier for LabBCES members (or for other projects derived from CoNekT Grasses Microbiome).
Well-documented admin pages will also make it easier to write scripts that fill CoNekT Grasses Microbiome with information and compute correlations and other results internally in automated pipelines (e.g., Snakemake or Nextflow).
GreenGenes provides an internal id and taxonomic paths for all their records (e.g., version 13_5).