tripal_genetic's Introduction

Tripal Genetic

This repository is meant to eventually house a best-practices, API-focused Tripal extension module. For now it signals intent and provides a place to begin discussing what would be most helpful for this module to contain.

Possible scopes / goals

Each of the following would be an optional submodule housed in this repository. In this way, we can combine maintenance efforts and work together to build a more cohesive set of tools, while reducing bloat by ensuring that no Tripal site needs to enable more functionality than it requires.

1. Provide an API for managing the genotype_call chado table.

There are multiple groups currently using the genotype_call table to manage their genotypic data (e.g. KnowPulse, TreeGenes/CartograPlant, MainLab consortium of Tripal sites). Each group has its own set of needs and materialized views for interacting with this table. The proposed API would let each module declare its needs, and this module would then provide materialized view sync functionality optimized for the large datasets typically stored in the genotype_call table.

2. Provide a Tripal importer for genotypic data stored in VCF files.

There is currently a Genotypes Loader module developed and maintained by the Pulse Bioinformatics group at the University of Saskatchewan. That module could potentially move into this repository, and the original developers are happy to take different data storage conventions into account when upgrading it to Tripal 4.
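To make the importer's core task concrete, here is a minimal sketch of how one VCF data line expands into one genotype call per sample. Python is used for illustration only (Tripal itself is written in PHP), and this is not the Genotypes Loader's code: the function name `parse_vcf_line` and the keys of the returned records are hypothetical, and the sketch assumes the FORMAT column includes a GT entry.

```python
from typing import Dict, List


def parse_vcf_line(line: str, samples: List[str]) -> List[Dict[str, str]]:
    """Expand one VCF data line into one record per sample (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, then samples.
    chrom, pos, _variant_id, ref, alt = fields[:5]
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")  # raises ValueError if GT is absent
    # Allele index 0 is REF; 1..n are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")

    calls = []
    for sample, column in zip(samples, fields[9:]):
        gt = column.split(":")[gt_index]
        # Allele indices are separated by "/" (unphased) or "|" (phased);
        # "." marks a missing allele.
        indices = gt.replace("|", "/").split("/")
        bases = [alleles[int(i)] if i != "." else "." for i in indices]
        calls.append({
            "chrom": chrom,
            "pos": pos,
            "sample": sample,
            "gt": gt,
            "alleles": "/".join(bases),
        })
    return calls
```

Each returned record corresponds to one potential row in the genotype_call table; how variants, samples, and alleles map onto Chado records is exactly the storage convention the importer would need to make configurable.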

3. Support for Genetic Maps and associated data.

There is a beautiful Tripal Mapviewer module developed and maintained by Main Lab Bioinformatics. While this specific module is likely too complex for inclusion as a submodule here, it would be nice to provide a set of Tripal importers, following data storage best practices, that are compatible with that tool. This submodule could also house Tripal fields related to genetic maps that are beyond the scope of TripalMap.

4. Create content types and provide Tripal Fields to enhance pages.

According to Tripal Issue #281, core Tripal is considering moving the existing content type creation and data-type-specific fields out to community-driven extensions focused on each data realm. This would be one such module. The core Tripal content types whose creation we would look to take over are:

Category | Label                       | Term Name                   | Term Accession | Ontology
-------- | --------------------------- | --------------------------- | -------------- | --------
Genetic  | Genetic Map                 | Genetic map                 | data:1278      | EDAM
Genetic  | QTL                         | QTL                         | SO:0000771     | Sequence
Genetic  | Sequence Variant            | sequence_variant            | SO:0001060     | Sequence
Genetic  | Genetic Marker              | genetic_marker              | SO:0001645     | Sequence
Genetic  | Heritable Phenotypic Marker | heritable_phenotypic_marker | SO:0001500     | Sequence

tripal_genetic's People

Contributors

laceysanderson


tripal_genetic's Issues

Genotypes API: genotype_call table management

There was a discussion in Slack suggesting that a general API for managing materialized views related to the genotype_call table would be useful.

In the discussion I suggested that an API to manage the following could reduce duplicated effort and allow us all to benefit from optimizations for large data sets.

  • Create materialized view tables and indices (supporting partitioning in a number of different ways and allowing other tools to define the materialized views they use).
  • Create the genotype_call table itself (not yet included in core Chado) so that its definition and indices are consistent.
  • Provide new Tripal 4 BioTasks for syncing these materialized views. BioTasks have more fine-grained control over locks and provide the flexibility needed to optimize this process (e.g. multiple queries, chunking of data to be added, truncating existing data, etc.).
  • Extend the new Tripal DBX to provide simplified, partition-aware querying.
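The chunking and truncating ideas in the list above can be sketched as follows. This is an illustration under stated assumptions, not the Tripal BioTasks API: Python and an in-memory SQLite database stand in for PHP and PostgreSQL/Chado, the table columns are a simplified subset, and `create_demo_tables`, `sync_mview`, and `mview_calls` are hypothetical names.

```python
import sqlite3


def create_demo_tables(conn: sqlite3.Connection) -> None:
    """Create simplified stand-ins for genotype_call and one materialized view."""
    conn.execute(
        "CREATE TABLE genotype_call ("
        "genotype_call_id INTEGER PRIMARY KEY, "
        "variant_id INTEGER, stock_id INTEGER, genotype_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE mview_calls ("
        "genotype_call_id INTEGER, "
        "variant_id INTEGER, stock_id INTEGER, genotype_id INTEGER)"
    )
    conn.commit()


def sync_mview(conn: sqlite3.Connection, chunk_size: int = 1000) -> int:
    """Rebuild the view table in chunks and return the number of rows copied."""
    # Truncate existing data; SQLite has no TRUNCATE, so DELETE stands in for it.
    conn.execute("DELETE FROM mview_calls")
    conn.commit()

    copied = 0
    last_id = 0
    while True:
        # Keyset pagination: each chunk starts after the last id already copied.
        rows = conn.execute(
            "SELECT genotype_call_id, variant_id, stock_id, genotype_id "
            "FROM genotype_call WHERE genotype_call_id > ? "
            "ORDER BY genotype_call_id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT INTO mview_calls "
            "(genotype_call_id, variant_id, stock_id, genotype_id) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
        # Commit per chunk so each transaction stays short.
        conn.commit()
        copied += len(rows)
        last_id = rows[-1][0]
    return copied
```

Committing after every chunk keeps individual transactions small, which is the kind of control over locking the list attributes to BioTasks; a real implementation would also need to decide how readers see the view while it is being rebuilt.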

In order to support the very different needs of different tools using the genotype_call table, this API would provide a means for tools to describe the materialized view name, columns, composition, queries, indices, optimization approaches, etc. to be used by the API. We recognize that no single best-practice materialized view is possible for this type of data: with large datasets, good performance depends on catering to the specific composition of the data and the needs of the tool.
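As one possible shape for such a description, the sketch below models a tool's materialized view as a small declarative record that the API could validate and turn into DDL. Nothing here is an existing Tripal interface: Python is used for illustration only, and `MviewDefinition`, `register_mview`, `REGISTRY`, and the field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MviewDefinition:
    """What a tool might tell the API about its materialized view (hypothetical)."""
    name: str
    columns: Dict[str, str]             # column name -> SQL type
    query: str                          # SELECT used to populate the view
    indices: List[List[str]] = field(default_factory=list)
    partition_by: Optional[str] = None  # column to partition on, if any

    def create_statements(self) -> List[str]:
        """Build generic SQL for the table and each of its indices."""
        cols = ", ".join(f"{col} {sql_type}" for col, sql_type in self.columns.items())
        statements = [f"CREATE TABLE {self.name} ({cols})"]
        for index_cols in self.indices:
            index_name = f"{self.name}_{'_'.join(index_cols)}_idx"
            statements.append(
                f"CREATE INDEX {index_name} ON {self.name} ({', '.join(index_cols)})"
            )
        return statements


# Definitions registered by tools, keyed by materialized view name.
REGISTRY: Dict[str, MviewDefinition] = {}


def register_mview(definition: MviewDefinition) -> None:
    """Validate a definition and record it; reject references to unknown columns."""
    for index_cols in definition.indices:
        missing = [c for c in index_cols if c not in definition.columns]
        if missing:
            raise ValueError(f"index uses undefined columns: {missing}")
    if definition.partition_by and definition.partition_by not in definition.columns:
        raise ValueError(f"partition column is undefined: {definition.partition_by}")
    REGISTRY[definition.name] = definition
```

Keeping the description declarative would let the API choose how to create, partition, and sync each view without every tool reimplementing that logic, while each tool still controls what its view contains.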

This API is NOT trying to force us all to use the same materialized views or even the same optimization approaches. Rather, it aims to provide all tools with a set of optimization approaches to select from, so that each tool can be supported optimally.
