cellbase's People

Contributors

antonior26, dapregi, frasator, imedina, j-coll, javild, jperflo, jtarraga, juanfesanahuja, juanrizetta, julie-sullivan, kevinpetersavage, marnau, marrobi, mbleda, mbsimonovic, melsiddieg, pabarcgar, pfurio, phamidko, swaathik, wbari

cellbase's Issues

QueryOptions functionality to be enabled in the variant annotation WS

Several functionalities are required.

Filters:

  • geneset=gencode_basic: annotate only against GENCODE basic genes. GENCODE basic genes can be identified by looking at the GENCODE GTF:

ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

Each GENCODE basic transcript is marked with the tag "basic". Tasks:
1.- Download the GENCODE GTF.
2.- Load the list of GENCODE basic transcript ids (ENSTxxx) into a HashSet within the GeneParser.
3.- Have GeneParser set a new annotationFlag "basic" on every parsed GENCODE basic transcript.
4.- Have getAllConsequenceTypesByVariantList in VariantAnnotationMongoDBAdaptor check that flag before annotating the variant.

  • so=term1,term2,term3: getAnnotationByVariantList shall return annotations only for variants that have at least one of these SO terms.
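
For the geneset=gencode_basic filter, tasks 1 and 2 could be prototyped as below. This is a minimal Python sketch, not the GeneParser code: the function name, the sample records (made up, not real GENCODE entries) and the choice to drop the version suffix from transcript ids are all assumptions.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def basic_transcript_ids(gtf_lines):
    """Return the ids of transcripts whose attributes carry tag "basic"."""
    basic = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript records are of interest
        attributes = fields[8]
        if 'tag "basic"' not in attributes:
            continue
        match = TRANSCRIPT_ID.search(attributes)
        if match:
            # Assumption: ids are stored without the version suffix
            # (ENSTxxx.2 -> ENSTxxx).
            basic.add(match.group(1).split(".")[0])
    return basic


# Made-up sample records in GTF layout, for illustration only.
sample = [
    "##description: hypothetical sample\n",
    'chr1\tHAVANA\ttranscript\t100\t200\t.\t+\t.\t'
    'gene_id "ENSG00000000001.1"; transcript_id "ENST00000000001.1"; tag "basic";\n',
    'chr1\tHAVANA\ttranscript\t300\t400\t.\t+\t.\t'
    'gene_id "ENSG00000000001.1"; transcript_id "ENST00000000002.1";\n',
]
print(basic_transcript_ids(sample))  # {'ENST00000000001'}
```

For the real file, the lines can be streamed with `gzip.open(path, "rt")` so the GTF never has to be decompressed on disk.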

Includes:

  • include={variation,clinical,consequence,conservation}: to allow enabling only certain annotation types.
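
The so and include options could behave roughly as in the following Python sketch. The annotation layout (a dict with an "id" plus one key per annotation type, and a "soTerms" list inside each consequence entry) is an assumption made for illustration, not the CellBase data model.

```python
def parse_csv_option(value):
    """Split a comma-separated query parameter such as so=term1,term2."""
    return [item.strip() for item in value.split(",") if item.strip()]


def filter_annotations(annotations, so_terms=None, include=None):
    """Apply the 'so' and 'include' query options to a list of annotations.

    Assumed layout: each annotation is a dict with an "id" and optional
    "variation", "clinical", "consequence" and "conservation" entries;
    "consequence" is a list of dicts that each hold a "soTerms" list.
    """
    wanted = set(so_terms or [])
    result = []
    for annotation in annotations:
        if wanted:
            present = {
                term
                for consequence in annotation.get("consequence", [])
                for term in consequence.get("soTerms", [])
            }
            if not (present & wanted):
                continue  # none of the requested SO terms: drop the variant
        if include:
            # "id" is always kept so the variant can still be identified.
            annotation = {
                key: value
                for key, value in annotation.items()
                if key == "id" or key in include
            }
        result.append(annotation)
    return result
```

The SO filter runs before the include projection, so `so=...` still works when `consequence` is not among the included annotation types.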

Consequence Type calculation

A new method is needed to calculate the consequence type of SNV variants. This will be part of the new Variant Annotation functionality.
The behaviour must be as similar to Ensembl VEP as possible.

New species web service

It would be great to implement a WS that returns the information for all species.
Below is an example of the response. More information can be added.

{
    "taxonomies":[
        {
            "name":"vertebrates",
            "species":[
                {
                    "text":"Homo Sapiens",
                    "assembly":"GRCh38",
                    "chromosomes":[
                        {
                            "name": "5",
                            "isCircular": 0,
                            "size": 180915260,
                            "end": 180915260,
                            "start": 1,
                            "cytobands": [
                                {
                                    "stain": "acen",
                                    "name": "p11.1",
                                    "end": 17600000,
                                    "start": 16100001
                                }
                            ]
                        }
                    ]
                }
            ]
        },
        {
            "name":"metazoa",
            "species":[

            ]
        }
    ]
}
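
A client could walk a response of this shape as in the Python sketch below. The function name is hypothetical, and the field names are taken from the example response, which is itself only a proposal.

```python
import json


def chromosome_sizes(response_text):
    """Map (species, assembly, chromosome name) to chromosome size."""
    response = json.loads(response_text)
    sizes = {}
    for taxonomy in response["taxonomies"]:
        for species in taxonomy["species"]:
            # "chromosomes" is treated as optional in case a species has none yet.
            for chromosome in species.get("chromosomes", []):
                key = (species["text"], species["assembly"], chromosome["name"])
                sizes[key] = chromosome["size"]
    return sizes


# Trimmed copy of the example response above.
example = """
{"taxonomies": [
  {"name": "vertebrates",
   "species": [{"text": "Homo Sapiens", "assembly": "GRCh38",
                "chromosomes": [{"name": "5", "isCircular": 0,
                                 "size": 180915260, "start": 1,
                                 "end": 180915260, "cytobands": []}]}]},
  {"name": "metazoa", "species": []}
]}
"""
print(chromosome_sizes(example))
# {('Homo Sapiens', 'GRCh38', '5'): 180915260}
```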

Fix or remove outdated tests

Some unit tests do not pass because they are outdated or depend on local paths (/home/.... ). Fix those tests, or remove them if they are no longer relevant.

Create DB Loaders

Create a "load" interface in cellbase-core module. This interface will define the operations to load the data models, created by cellbase-app 'build' command, into a database.

A MongoDB implementation of this interface should be provided in the cellbase-mongodb module.
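
The split between the interface and its backend could look roughly like the Python sketch below. Every name in it is hypothetical, the in-memory class only stands in for a real MongoDB implementation, and the assumption that the 'build' command writes one JSON document per line is not confirmed by this issue.

```python
import json
from abc import ABC, abstractmethod


class Loader(ABC):
    """Hypothetical 'load' interface: the operations every backend must offer."""

    @abstractmethod
    def load(self, collection, records):
        """Store the records under the collection name; return how many were stored."""

    @abstractmethod
    def create_indexes(self, collection, fields):
        """Create one index per listed field."""


class InMemoryLoader(Loader):
    """Stand-in backend; a MongoDB loader would implement the same methods."""

    def __init__(self):
        self.data = {}
        self.indexes = {}

    def load(self, collection, records):
        stored = self.data.setdefault(collection, [])
        before = len(stored)
        stored.extend(records)
        return len(stored) - before

    def create_indexes(self, collection, fields):
        self.indexes.setdefault(collection, set()).update(fields)


def load_json_lines(loader, collection, lines):
    """Parse one JSON document per non-empty line and hand them to the loader."""
    records = [json.loads(line) for line in lines if line.strip()]
    return loader.load(collection, records)
```

Code that drives the loading only depends on `Loader`, so swapping the backend does not touch the callers.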

Fixes for ExonWSServer

The following Exon WS should be implemented (currently not working):

  • /{version}/{species}/feature/exon/{exonId}/info
  • /{version}/{species}/feature/exon/{exonId}/region
  • /{version}/{species}/feature/exon/{exonId}/sequence
  • /{version}/{species}/feature/exon/{exonId}/transcript

Interesting but not urgent:

  • /{version}/{species}/feature/exon/{exonId}/aminos

/{version}/{species}/feature/exon/{exonId}/bysnp should not be there and has been marked as Deprecated.
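
When testing these endpoints, the path templates can be filled in with a small helper such as the Python sketch below. The helper name and the example values ("v3", "hsapiens", "ENSE00000000001") are placeholders, not values taken from this issue.

```python
from urllib.parse import quote

# Path templates copied from the list above. "bysnp" is left out on purpose
# because it is deprecated; "aminos" is left out because it is not urgent.
EXON_TEMPLATES = {
    "info": "/{version}/{species}/feature/exon/{exonId}/info",
    "region": "/{version}/{species}/feature/exon/{exonId}/region",
    "sequence": "/{version}/{species}/feature/exon/{exonId}/sequence",
    "transcript": "/{version}/{species}/feature/exon/{exonId}/transcript",
}


def exon_path(resource, version, species, exon_id):
    """Build the request path for one exon resource, escaping each segment."""
    try:
        template = EXON_TEMPLATES[resource]
    except KeyError:
        raise ValueError(f"unsupported exon resource: {resource}") from None
    return template.format(
        version=quote(version, safe=""),
        species=quote(species, safe=""),
        exonId=quote(exon_id, safe=""),
    )


print(exon_path("info", "v3", "hsapiens", "ENSE00000000001"))
# /v3/hsapiens/feature/exon/ENSE00000000001/info
```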

New Variant Effect model

Until we complete the implementation of the new Variant Effect classes, commit 3b92d0c (7th May, branch ebi-develop) breaks the compilation of both OpenCGA and EVA.

Commit 5da5e51 must be used until then.

CLI should return database stats

This new option must return the collections installed for one species, together with the indexes created and the number of documents in each collection. Other information may also be useful to return.
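
The data behind such an option could be gathered as in the Python sketch below. It assumes a pymongo-style database object, using only `list_collection_names()`, `count_documents()` and `index_information()`; the function name and the output layout are hypothetical.

```python
def database_stats(db):
    """Summarise every collection of a pymongo-style database object.

    Returns {collection name: {"documents": count, "indexes": [index names]}}.
    """
    stats = {}
    for name in sorted(db.list_collection_names()):
        collection = db[name]
        stats[name] = {
            # An empty filter counts every document in the collection.
            "documents": collection.count_documents({}),
            # index_information() is keyed by index name.
            "indexes": sorted(collection.index_information()),
        }
    return stats
```

With pymongo installed, `db` would be something like `MongoClient(...)[database_name]`; how CellBase names one database per species is not specified here.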

New cellbase-mongodb module

Currently the MySQL-Hibernate implementation lives in cellbase-core. To offer a more modular implementation and a plugin-oriented framework, the interfaces (cellbase-core) must be implemented in separate modules, so a cellbase-mongodb module must be created for MongoDB.

ClinVar WS should query the clinical collection

The ClinVar WS currently query the ClinVar collection. ClinVar is also loaded within the clinical collection. Only one ClinVar copy will remain, the one within the clinical collection, and all queries should point to it.

Fixes for GeneWSServer

The following Gene WS should be implemented (currently not working):

  • /{version}/{species}/feature/gene/{geneId}/tfbs
  • /{version}/{species}/feature/gene/{geneId}/mirna_target
  • /{version}/{species}/feature/gene/{geneId}/reactome

/{version}/{species}/feature/gene/{geneId}/protein returns the PPIs for the specified gene. We should rename this WS to ppi or protein_interaction.

It would also be interesting to create a proper /{version}/{species}/feature/gene/{geneId}/protein WS returning the UniProt information for this gene.

Improve documentation

Documentation needs to be significantly improved, in particular for building, architecture and REST calls.

Migration to MongoDB

NoSQL databases offer higher performance and scalability. MongoDB, a document-oriented database, fits CellBase's needs very well. A new implementation based on MongoDB needs to be done.

New Variant Annotation functionality

New variant annotation functionality can be implemented. It will return all the known information about a variant in CellBase: consequence type #26 , conservation, ...
The data models must be added to biodata-models.

RefSeq parser

Add a RefSeq parser method to GeneParser. The RefSeq data must be loaded together with the Ensembl gene set.

Fixes for SnpWSServer

The following SNP WS should be implemented (currently not working):

  • /{version}/{species}/feature/snp/{snpId}/consequence_type
  • /{version}/{species}/feature/snp/{snpId}/population_frequency
  • /{version}/{species}/feature/snp/{snpId}/xref

Interesting but not urgent:

  • /{version}/{species}/feature/snp/{snpId}/sequence
  • /{version}/{species}/feature/snp/{snpId}/regulatory

List of deprecated WS:

  • /{version}/{species}/feature/snp/{snpId}/consequence_types
  • /{version}/{species}/feature/snp/{snpId}/phenotypes

Swagger integration

To provide better documentation, Swagger must be integrated and configured.

Reimplement parsers to follow a more general ETL model

Some parsers will be reimplemented so that they generate a general data model stored as a JSON object. 'Loaders' will then be implemented to transform the data into an appropriate and efficient format for the specific DBMS (e.g. MongoDB) and to load it into the DB. The objective is to obtain a data model which contains all the information regardless of the specific implementation for a given DBMS.
