
conekt_grasses_microbiome's People

Contributors

santosrac

conekt_grasses_microbiome's Issues

Add classification method cannot find GG record by taxon path

In the otu_classification model, the function that adds classifications fails to find the taxonomic path using filter_by, even though the paths are the same in both uploaded files (the source of GG taxonomic information and the file associating OTU names with their GG taxon paths).

The query GGTaxon.query.filter_by(taxon_path=path).first() returns None.

Report associated with commit 04cc611

Currently, the function is associating all OTUs with a single GG record (pk 1).
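
One possible cause, not confirmed in this report, is that the two files differ only in whitespace around the `;` separators or in a trailing newline, so that an exact-match `filter_by` finds nothing. A minimal sketch of normalizing paths before storing and querying them follows; `normalize_taxon_path` is a hypothetical helper, not part of the code base, and the example paths are made up:

```python
def normalize_taxon_path(path: str) -> str:
    """Return a canonical form of a GreenGenes-style taxon path.

    Strips surrounding whitespace (including newlines) and re-joins the
    ranks with '; ' so that paths differing only in spacing compare equal.
    """
    ranks = [rank.strip() for rank in path.strip().split(";")]
    # Drop empty pieces produced by a trailing ';'
    ranks = [rank for rank in ranks if rank]
    return "; ".join(ranks)


# Two spellings of the same path, as they might appear in different files
from_taxonomy_file = "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria\n"
from_otu_file = "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"

assert normalize_taxon_path(from_taxonomy_file) == normalize_taxon_path(from_otu_file)
```

If this turns out to be the cause, the same normalization would have to be applied both when importing GG records and before calling `filter_by(taxon_path=...)`.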

Adapt expression field in expression profile JSON text

In order to adapt CoNekT Grasses Microbiome to incorporate/accept different types of expression normalization methods in expression profiles, a column was added (normalization_method). However, the JSON string still has a 'tpm' field and several functions use this information. Changes must be made to all functions using expression profiles to adapt to this change.
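
A single accessor could let the affected functions stop reading the 'tpm' field directly. The sketch below is only an illustration: the JSON layout (`order` and `data` keys), the helper name `get_expression_values` and the fallback rule are assumptions, not the current implementation.

```python
import json

# Key used by profiles stored before normalization_method was added
LEGACY_KEY = "tpm"


def get_expression_values(profile_json: str, normalization_method: str) -> dict:
    """Return the per-condition values of an expression profile.

    Looks for a key named after the normalization method first and falls
    back to the legacy 'tpm' key, so old and new profiles can coexist.
    """
    data = json.loads(profile_json)["data"]
    key = normalization_method.lower()
    if key in data:
        return data[key]
    if LEGACY_KEY in data:
        return data[LEGACY_KEY]
    raise KeyError(f"no values stored under '{key}' or '{LEGACY_KEY}'")


# Hypothetical legacy profile: values still stored under 'tpm'
legacy = json.dumps({"order": ["leaf", "root"],
                     "data": {"tpm": {"leaf": [1.0, 2.0], "root": [0.5]}}})
values = get_expression_values(legacy, "cpm")
```

Routing every read through one function like this would mean the 'tpm' key only has to be handled in a single place while existing profiles are migrated.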

Improve ASV definition

ASV methods can be associated with literature, but this is not mandatory, since ASVs can be independent of a single experiment.

Implement function to get NCBI taxid from GreenGenes taxonomy path

For datasets like Wallace et al. (2018), GreenGenes is used for taxonomic classification of units (e.g., OTUs). However, in the older GG releases no NCBI taxonomy ID is associated with each record. If we need to connect SILVA/GG or even Phytozome records (e.g., grass species), we need a function that generates a mapping between NCBI taxids and GreenGenes records.
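
One way to approach this, sketched below under assumptions, is to look up the taxon name taken from a GG path among the scientific names in the NCBI taxdump `names.dmp` file, whose fields are separated by `\t|\t`. The function names are hypothetical, and names shared by several taxids are resolved naively (first one wins), which a real implementation would need to handle.

```python
def parse_names_dmp(lines):
    """Build a {scientific name: taxid} dict from names.dmp lines.

    Each line has the fields: tax_id | name | unique name | name class |
    Only rows whose name class is 'scientific name' are kept.
    """
    name_to_taxid = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4 or fields[3] != "scientific name":
            continue
        # Naive handling of duplicated names: keep the first taxid seen
        name_to_taxid.setdefault(fields[1], int(fields[0]))
    return name_to_taxid


def taxid_for_gg_rank(gg_rank, name_to_taxid):
    """Map a single GG rank such as 'g__Pseudomonas' to a taxid, or None."""
    _, _, name = gg_rank.strip().partition("__")
    return name_to_taxid.get(name)


sample = [
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n",
    "2\t|\teubacteria\t|\t\t|\tgenbank common name\t|\n",
    "286\t|\tPseudomonas\t|\t\t|\tscientific name\t|\n",
]
names = parse_names_dmp(sample)
```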

Implement OTU profiles

Operational Taxonomic Units (or OTUs) will be mostly analyzed with their profiles (counts or transformed counts). Two models (tables) will store information associated with OTU profiles:

  • OTUProfileMethod - unit (e.g., raw counts or cpm), method used to generate values, study (in CoNekT Grasses Microbiome, a study is defined by admins and corresponds to the set of samples/runs used in downstream analyses and that will be exploited by end users), information about any applied filtering steps etc.
  • OTUProfile - linked to method (above), consists of values and their associated sequencing runs

Update the features page

CoNekT Grasses Microbiome will have fewer transcriptome functions and many more functions for microbiome analysis and microbiome-transcriptome integration (eventually there will be integration between microbiome and other omics in grasses).

The features page must be updated to reflect all functions that were implemented during @SantosRAC's internship in the US.

Implement NCBI taxonomy

I am currently implementing NCBI taxonomy in CoNekT Grasses Microbiome. I believe this is something we should keep synchronized (Plant Cell Wall Knowledgebase, PAGED and CoNekT Grasses Microbiome). @khidalgo85 and @JoaoNovoletti, let's talk about it during the JavaScript meeting.

This initial version of NCBI taxonomy will include only four NCBI divisions: Bacteria (ID 0), Plants and Fungi (ID 4), Unassigned (ID 8) and Environmental samples (ID 11). This is done by filtering records on their NCBI division ID.
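
The division filter can be sketched as follows, assuming the standard NCBI taxdump `nodes.dmp` layout in which the division ID is the fifth `|`-separated field. The function name and the sample rows are illustrative only; note that parents of kept nodes may belong to other divisions and would need separate handling.

```python
# Division IDs kept in the initial version (from the issue text)
KEPT_DIVISIONS = {0, 4, 8, 11}


def filter_nodes_by_division(lines, kept=KEPT_DIVISIONS):
    """Yield (tax_id, parent_id, rank, division_id) for kept divisions.

    Expects nodes.dmp lines: tax_id | parent tax_id | rank | embl code |
    division id | ...
    """
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        division_id = int(fields[4])
        if division_id in kept:
            yield int(fields[0]), int(fields[1]), fields[2], division_id


sample_nodes = [
    "2\t|\t131567\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\n",
    "9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|\t1\t|\t1\t|\n",  # Primates, dropped
    "4577\t|\t4575\t|\tspecies\t|\tZM\t|\t4\t|\t1\t|\t1\t|\n",
]
kept_nodes = list(filter_nodes_by_division(sample_nodes))
```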

Revise OTU profile and run associations

I am currently not sure we need a table with associations between OTU profiles and sequencing runs (profiles already have this information in their JSON).

Revision required.

Implement Generic Taxonomy Method class

Implement a class that will handle release information (e.g., SILVA 138 or Greengenes 13_5).

Entries in NCBI, SILVA and GreenGenes tables (currently NCBITaxon, SILVATaxon and GGTaxon) must have an associated method id.

Implement OTU classification

To enable end users to exploit OTU taxonomic information, the following models must be implemented:

  • OTUClassificationMethod - method used for classification, reference database and release information, additional information if necessary
  • OTUClassification - association between a particular OTU and NCBI/SILVA/GreenGenes record, by a particular classification method (described in the above model)

A connection between these tables and NCBI/SILVA taxonomy must exist as well.

When adding taxonomic classification to OTUs in CoNekT Grasses Microbiome for GreenGenes, however, imported OTUs may not have a full path that exactly matches how it appears in the GG table. And, as far as I know, there is no mapping file between GreenGenes and NCBI Taxonomy IDs.

Possible solution for the problem with GreenGenes: create a function that gets the lowest rank possible for an imported OTU, then tries to partially match records in GreenGenes (to make sure a valid assignment will be made). In addition, it would be nice to have an NCBI taxon ID associated with these records (like what SILVA has) - therefore, this function could also find the NCBI taxon ID for this lowest-rank taxon (e.g., genus or species whenever possible).
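
The lowest-rank idea could look roughly like the sketch below. It assumes GG paths use the usual `k__`, `p__`, ... `s__` prefixes separated by `;`, with unassigned ranks left empty (e.g., `g__`). Both function names are hypothetical.

```python
def lowest_assigned_rank(path):
    """Return (prefix, name) of the deepest non-empty rank in a GG path.

    Returns None if no rank carries a name.
    """
    for rank in reversed([r.strip() for r in path.split(";")]):
        prefix, sep, name = rank.partition("__")
        if sep and name:
            return prefix, name
    return None


def partial_matches(path, gg_paths):
    """Return the GG paths that contain the lowest assigned rank of `path`."""
    lowest = lowest_assigned_rank(path)
    if lowest is None:
        return []
    target = "__".join(lowest)
    return [p for p in gg_paths
            if target in [r.strip() for r in p.split(";")]]


otu_path = "k__Bacteria; p__Proteobacteria; c__; o__; f__; g__; s__"
gg_table = [
    "k__Bacteria; p__Proteobacteria",
    "k__Bacteria; p__Firmicutes",
]
candidates = partial_matches(otu_path, gg_table)
```

A partial match can return several GG records (every path passing through that rank), so a rule for picking one of them, for example the shortest path, would still have to be decided.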

Add functions to internally compute correlations (integration)

Add functions to internally compute correlations between OTU/ASV counts (transformed) and gene expression (transformed). After importing expression profiles and ASV/OTU counts, admins must be able to compute the correlations for pairs of gene - ASV/OTU.
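
As an illustration of the pairwise computation, the sketch below uses Pearson correlation in plain Python. The issue does not say which coefficient will be used, so Pearson is an assumption here, as are the function names and the example values; the value lists are assumed to follow the same sample order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences.

    Returns None when either sequence has zero variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlate_pairs(gene_profiles, otu_profiles):
    """Return {(gene, otu): correlation} for every gene - OTU/ASV pair."""
    return {(gene, otu): pearson(gene_values, otu_values)
            for gene, gene_values in gene_profiles.items()
            for otu, otu_values in otu_profiles.items()}


# Made-up transformed values for three shared samples
genes = {"gene_a": [1.0, 2.0, 3.0]}
otus = {"otu_1": [2.0, 4.0, 6.0], "otu_2": [5.0, 5.0, 5.0]}
pairs = correlate_pairs(genes, otus)
```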

Implementation involves:

Implement GTDB database and associated functions

Create:

  • tables for the GTDB database
  • tables associating OTUs and ASVs in the platform with their GTDB classification
  • functions for classification of representative sequences for OTUs and ASVs (instead of importing the classification from studies, an alternative could be to have GTDB classify those sequences)

Some links:

Implement connections to data in CoNekT Grasses

Users must be able to access gene annotation and expression data (e.g., expression profiles and networks) for a grass species in CoNekT Grasses. For this purpose, gene names will be common between both platforms and derive from common sources (e.g. Phytozome).

Create scripts to add/build as alternative to admin panel

Scripts are used to build CoNekT Grasses Microbiome without anyone having to add data or run builds through the admin panel. This enables the development of automated pipelines to fill the platform with data (we have done this for CoNekT Grasses). Scripts are located in the scripts/ dir.

Ensure all SILVA records have valid NCBI taxid

All SILVA records must have an associated NCBI taxid. Therefore:

  • The SILVA table needs a foreign key to the NCBI table (NCBI taxid)
  • The foreign key must have ondelete "SET NULL" to allow deletion of NCBI data when a new taxonomy is added, BUT the import of SILVA records must ensure that every record has an NCBI taxid
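
The behaviour of such a foreign key can be shown with the standard-library `sqlite3` module. This is only a sketch of the SQL semantics, not the platform's models: the table and column names are hypothetical, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def demo_set_null():
    """Delete NCBI rows and return the SILVA row's ncbi_id afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE ncbi_taxon (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        """CREATE TABLE silva_taxon (
               id INTEGER PRIMARY KEY,
               path TEXT,
               ncbi_id INTEGER REFERENCES ncbi_taxon(id) ON DELETE SET NULL
           )"""
    )
    conn.execute("INSERT INTO ncbi_taxon VALUES (2, 'Bacteria')")
    conn.execute("INSERT INTO silva_taxon VALUES (1, 'Bacteria;', 2)")
    # Replacing the NCBI taxonomy: the SILVA row survives, its link is nulled
    conn.execute("DELETE FROM ncbi_taxon")
    row = conn.execute("SELECT ncbi_id FROM silva_taxon WHERE id = 1").fetchone()
    conn.close()
    return row


def records_missing_taxid(silva_records):
    """Return imported SILVA records (dicts) that lack an NCBI taxid."""
    return [rec for rec in silva_records if rec.get("ncbi_id") is None]


row_after_delete = demo_set_null()
```

Because the column has to stay nullable for SET NULL to work, the "every record has a taxid" rule cannot be a NOT NULL constraint; a check during import, like `records_missing_taxid`, is one way to enforce it.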

Enable multigene and multimicrobe networks in custom transcriptome-microbiome network

An initial version of the custom network of genes and OTUs is available (option "Draw Network of Correlations").

However, it still needs some improvements:

  • Fix legend (currently not being displayed correctly)
  • Enable dynamic change of what is being shown in network edges
  • Add more information to nodes (links to profiles and additional information the group believes is important)

Moreover, we still need to change the form that generates the network so that it accepts multiple genes and/or multiple OTUs (and warns users if genes or microbes [OTUs/ASVs] entered in the form are not available in correlation pairs).
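
The availability check behind that warning could be as simple as the sketch below. The function name, the identifiers and the shape of the stored pairs (a set of `(gene, otu)` tuples) are assumptions made for illustration.

```python
def split_by_availability(requested, available):
    """Split requested identifiers into (found, missing), keeping order."""
    found = [name for name in requested if name in available]
    missing = [name for name in requested if name not in available]
    return found, missing


# Hypothetical correlation pairs already computed by admins
correlation_pairs = {("gene_a", "otu_1"), ("gene_b", "otu_1")}
genes_in_pairs = {gene for gene, _ in correlation_pairs}
otus_in_pairs = {otu for _, otu in correlation_pairs}

found_genes, missing_genes = split_by_availability(
    ["gene_a", "gene_x"], genes_in_pairs)
found_otus, missing_otus = split_by_availability(
    ["otu_1", "otu_9"], otus_in_pairs)

# Text the form could show to the user
warnings = [f"{name} has no correlation pairs"
            for name in missing_genes + missing_otus]
```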

Implement OTU (model) and associated methods

OTU tables (counts) will be permitted in CoNekT Grasses Microbiome.

Main models in mind:

  • OperationalTaxonomicUnitMethod - will describe methods used for generation of OTUs (e.g., clustering algorithms, threshold, reference database if an open- or closed-reference method is used, reference db release to enable reproducible analyses). It will also link to the associated literature, if available (e.g., Wallace et al. 2018 is the first to be included with OTUs).
  • OperationalTaxonomicUnit - will mainly store the representative sequence for the OTU, defined by method (OperationalTaxonomicUnitMethod above)

Models and methods will serve as basis for implementation of further functions (e.g., generate/import profiles, OTU classification, associations/correlations between transcriptome and microbiome).

  • ⚠ Importantly, using OTUs will only be enabled in studies based on a single publication!

Update description in admin pages

All admin functionalities for adding data (e.g., sequences, species, OTUs etc.) and for building (e.g., running correlations between OTUs and gene expression) must be well documented. This will make maintenance easier for LabBCES members (or for other projects derived from CoNekT Grasses Microbiome).

Well-documented admin pages will also make it easier to write scripts that fill CoNekT Grasses Microbiome with data and compute correlations and other results internally in automated pipelines (e.g., Snakemake or Nextflow).
