opencb / cellbase Goto Github PK

High-performance NoSQL database and RESTful web services to access the most relevant biological data

License: Apache License 2.0

Perl 1.62% Shell 0.23% Python 2.70% HTML 0.16% CSS 1.40% JavaScript 37.04% Java 55.74% R 0.76% Jupyter Notebook 0.23% Dockerfile 0.07% Mustache 0.05% Smarty 0.01%

cellbase's Issues

QueryOptions functionality to be enabled in the variant annotation WS

Several features are required.

Filters:

geneset=gencode_basic: annotate only against GENCODE basic genes. GENCODE basic transcripts can be identified from the GENCODE GTF:

ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

Each GENCODE basic transcript carries tag "basic" in the GTF attribute column. Tasks:
1.- The GENCODE GTF has to be downloaded.
2.- The list of GENCODE basic transcript IDs (ENSTxxx) must be loaded into a HashSet within the GeneParser.
3.- GeneParser will add a new annotationFlag "basic" to every parsed GENCODE basic transcript.
4.- getAllConsequenceTypesByVariantList in VariantAnnotationMongoDBAdaptor will check that flag before annotating the variant.
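Task 2 could look roughly like the following Java sketch. It assumes the standard nine-column GTF layout (feature type in column 3, attributes in column 9), keeps only `transcript` records whose attributes contain `tag "basic"`, and strips the version suffix from IDs such as ENST00000456328.2, on the assumption that GeneParser compares unversioned ENSTxxx identifiers. The class and method names are hypothetical, not part of CellBase.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

class GencodeBasicReader {
    // Matches e.g. transcript_id "ENST00000456328.2"; GENCODE IDs carry a version suffix
    private static final Pattern TRANSCRIPT_ID = Pattern.compile("transcript_id \"([^\"]+)\"");

    /** Returns the unversioned IDs of transcript records tagged "basic". */
    static Set<String> basicTranscriptIds(Iterable<String> gtfLines) {
        Set<String> ids = new HashSet<>();
        for (String line : gtfLines) {
            if (line.startsWith("#")) {
                continue; // "##" header lines
            }
            String[] fields = line.split("\t");
            // Column 3 is the feature type, column 9 the attribute list
            if (fields.length < 9 || !fields[2].equals("transcript")) {
                continue;
            }
            String attributes = fields[8];
            if (!attributes.contains("tag \"basic\"")) {
                continue;
            }
            Matcher m = TRANSCRIPT_ID.matcher(attributes);
            if (m.find()) {
                String id = m.group(1);
                int dot = id.indexOf('.');
                // Assumption: GeneParser compares against unversioned ENSTxxx IDs
                ids.add(dot >= 0 ? id.substring(0, dot) : id);
            }
        }
        return ids;
    }

    /** Streams a gzipped GTF (such as gencode.v19.annotation.gtf.gz) through basicTranscriptIds. */
    static Set<String> readGzip(Path gtfGz) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gtfGz)), StandardCharsets.UTF_8))) {
            Iterable<String> lines = reader.lines()::iterator;
            return basicTranscriptIds(lines);
        }
    }
}
```

GeneParser could then test `basicIds.contains(transcriptId)` for each parsed Ensembl transcript to decide whether to set the "basic" flag.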

so=term1,term2,term3: getAnnotationByVariantList shall only return the annotation of variants that have at least one of these SO terms.
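A minimal sketch of the `so` filter, assuming each annotation exposes the SO term names of its consequence types. The `Annotation` record is a hypothetical stand-in, not CellBase's VariantAnnotation model, and treating an empty term set as "no filter" is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SoTermFilter {
    // Hypothetical, simplified annotation: variant id plus the SO term names it was annotated with
    record Annotation(String variantId, List<String> soTerms) {}

    /** Parses "term1,term2,term3" into a set, ignoring blanks and surrounding spaces. */
    static Set<String> parseSoParam(String value) {
        Set<String> terms = new HashSet<>();
        for (String t : value.split(",")) {
            if (!t.isBlank()) {
                terms.add(t.trim());
            }
        }
        return terms;
    }

    /** Keeps annotations having at least one requested SO term; an empty request keeps everything. */
    static List<Annotation> filterBySoTerms(List<Annotation> annotations, Set<String> requested) {
        if (requested.isEmpty()) {
            return annotations;
        }
        return annotations.stream()
                .filter(a -> a.soTerms().stream().anyMatch(requested::contains))
                .collect(Collectors.toList());
    }
}
```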

Includes:

include={variation,clinical,consequence,conservation}: enables only the selected annotation types.
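One way to handle `include` is to parse it into an EnumSet of annotation types. The enum and the rule that a missing parameter enables every type are assumptions for illustration; unknown names are rejected by `Enum.valueOf` with an IllegalArgumentException, which the WS layer could turn into an error response.

```java
import java.util.EnumSet;
import java.util.Locale;

class AnnotationIncludes {
    // Annotation types named in the issue; the enum itself is hypothetical
    enum AnnotationType { VARIATION, CLINICAL, CONSEQUENCE, CONSERVATION }

    /** Parses e.g. "variation,consequence"; a null or blank value enables every type. */
    static EnumSet<AnnotationType> parseInclude(String value) {
        if (value == null || value.isBlank()) {
            return EnumSet.allOf(AnnotationType.class);
        }
        EnumSet<AnnotationType> types = EnumSet.noneOf(AnnotationType.class);
        for (String token : value.split(",")) {
            String t = token.trim();
            if (!t.isEmpty()) {
                // valueOf throws IllegalArgumentException for unknown names
                types.add(AnnotationType.valueOf(t.toUpperCase(Locale.ROOT)));
            }
        }
        return types;
    }
}
```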

ChromosomeMongoDBAdaptor runtime errors

Several errors are raised because of the new 'datastore' library integration.

CellBase server WAR file should not be deployed to the Central Repository

Deploying WAR files to the Central Repository should be avoided; the Maven pom.xml files need to be configured accordingly.

Consequence Type calculation

A new method is needed to calculate the consequence type of SNVs. This will be part of the new Variant Annotation functionality.
The behaviour must be as similar to Ensembl VEP as possible.
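To illustrate the kind of logic involved, here is a deliberately coarse Java sketch that assigns one SO term to an SNV relative to a single transcript. It borrows VEP's default 5,000 bp upstream/downstream window and handles strand, but the transcript model is hypothetical and the sketch ignores coding, UTR and splice-site consequences, which a VEP-like implementation must also compute. Returning intergenic_variant per transcript is a simplification too: VEP reports it only when no transcript lies within the window.

```java
import java.util.List;

class SnvConsequence {
    // Hypothetical minimal models: 1-based inclusive genomic coordinates
    record Exon(int start, int end) {}
    record Transcript(int start, int end, char strand, List<Exon> exons) {}

    // Same default distance VEP uses for upstream/downstream consequences
    static final int FLANK = 5000;

    /** Returns one coarse SO term for an SNV at 'position' relative to transcript 't'. */
    static String classify(int position, Transcript t) {
        if (position < t.start() || position > t.end()) {
            boolean before = position < t.start();
            int distance = before ? t.start() - position : position - t.end();
            if (distance > FLANK) {
                return "intergenic_variant";
            }
            // On the plus strand lower coordinates are upstream; on the minus strand they are downstream
            boolean upstream = (t.strand() == '+') == before;
            return upstream ? "upstream_gene_variant" : "downstream_gene_variant";
        }
        for (Exon e : t.exons()) {
            if (position >= e.start() && position <= e.end()) {
                return "exon_variant";
            }
        }
        return "intron_variant";
    }
}
```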

Improve VariationParser performance

VariationParser is too slow; a new strategy is needed to improve its performance.

New species web service

It would be great to implement a WS returning information about all species.
Below is an example response; more information can be added.

{
    "taxonomies":[
        {
            "name":"vertebrates",
            "species":[
                {
                    "text":"Homo Sapiens",
                    "assembly":"GRCh38",
                    "chromosomes":[
                        {
                            "name": "5",
                            "isCircular": 0,
                            "size": 180915260,
                            "end": 180915260,
                            "start": 1,
                            "cytobands": [
                                {
                                    "stain": "acen",
                                    "name": "p11.1",
                                    "end": 17600000,
                                    "start": 16100001
                                }
                            ]
                        }
                    ]
                }
            ]
        },
        {
            "name":"metazoa",
            "species":[

            ]
        }
    ]
}

Fix or remove outdated tests

Some unit tests fail because they are outdated or depend on local paths (/home/....). These tests must be fixed.

Write query tutorial

Write clear guidelines for using the query command of the CLI

Create DB Loaders

Create a "load" interface in the cellbase-core module. This interface will define the operations for loading the data models created by the cellbase-app 'build' command into a database.

A MongoDB implementation of this interface should be added to the cellbase-mongodb module.
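A possible shape for that contract, sketched in Java. The interface name, its three methods and the batch-of-JSON-strings input are all hypothetical. The in-memory implementation only exercises the contract; a MongoDB implementation in cellbase-mongodb would insert the documents into a collection and create real indexes instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical "load" contract for cellbase-core; names and signatures are illustrative only. */
interface CellBaseLoader extends AutoCloseable {
    /** Prepares the target, e.g. opens a connection and creates the collection. */
    void init(String collection) throws Exception;

    /** Loads one batch of JSON documents produced by the 'build' command; returns how many were loaded. */
    int load(List<String> jsonDocuments) throws Exception;

    /** Creates indexes once all batches are loaded. */
    void createIndexes() throws Exception;
}

/** In-memory stand-in used to exercise the contract; init must be called before load. */
class InMemoryLoader implements CellBaseLoader {
    final Map<String, List<String>> store = new HashMap<>();
    private String collection;

    @Override
    public void init(String collection) {
        this.collection = collection;
        store.computeIfAbsent(collection, c -> new ArrayList<>());
    }

    @Override
    public int load(List<String> jsonDocuments) {
        store.get(collection).addAll(jsonDocuments);
        return jsonDocuments.size();
    }

    @Override
    public void createIndexes() {
        // Nothing to index in memory
    }

    @Override
    public void close() {
        // No resources to release
    }
}
```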

Species CLI parameter should not be a List

There is no need to pass several species here:
https://github.com/opencb/cellbase/blob/develop/cellbase-app/src/main/java/org/opencb/cellbase/app/cli/CliOptionsParser.java#L96

Different species can be processed in separate executions. This will make the code a bit simpler without losing any real functionality.

Add population frequencies to Variation collection

Variation documents must contain population frequencies, which can be obtained from EVA datasets.

Fixes for ExonWSServer

The following Exon WS should be implemented (currently not working):

/{version}/{species}/feature/exon/{exonId}/info
/{version}/{species}/feature/exon/{exonId}/region
/{version}/{species}/feature/exon/{exonId}/sequence
/{version}/{species}/feature/exon/{exonId}/transcript

Interesting but not urgent:

/{version}/{species}/feature/exon/{exonId}/aminos

/{version}/{species}/feature/exon/{exonId}/bysnp should not be there and has been marked as Deprecated.

Return Uniprot's functional description of variants with variant annotation

UniProt data is already integrated in CellBase. Link the functional description of variants to the variant annotation WS.

New Variant Effect model

Until the implementation of the new Variant Effect classes is complete, commit 3b92d0c (7th May, branch ebi-develop) breaks compilation of both OpenCGA and EVA.

Commit 5da5e51 must be used until then.

Ensembl Perl scripts should not use registry.conf

The Ensembl Perl API provides a mechanism that avoids passing a huge registry file. Using it removes the need to maintain this file and simplifies the CLI, since no registry-file parameter is needed.

CLI should return database stats

This new option must return the collections installed for a species, together with their indexes and number of documents. Other information may also be useful to return.

Move cellbase to a module architecture

CellBase must use Maven modules to improve modularity and reduce the number of dependencies loaded.

CellbaseClient should be able to call the POST WS for variant annotation

Currently, CellbaseClient can only call the GET WS for variant annotation. Include an option to call the POST WS, so that larger variant batches can be sent.
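A sketch of the client side, using the JDK 11 `java.net.http` client as a stand-in for whatever HTTP library CellbaseClient actually uses. The endpoint URL and the body format (plain text, one variant per line) are assumptions; the idea is that splitting the variant list into batches and sending each batch in a POST body avoids the URL-length limits of the GET WS.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

class VariantAnnotationPost {
    /** Splits variants into consecutive batches of at most batchSize elements. */
    static List<List<String>> batches(List<String> variants, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < variants.size(); i += batchSize) {
            out.add(variants.subList(i, Math.min(i + batchSize, variants.size())));
        }
        return out;
    }

    /** Builds (but does not send) a POST request carrying one batch; body format is an assumption. */
    static HttpRequest buildRequest(URI endpoint, List<String> batch) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(String.join("\n", batch)))
                .build();
    }
}
```

Each request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.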

New cellbase-mongodb module

Currently the MySQL-Hibernate implementation lives in cellbase-core. To offer a more modular, plugin-oriented framework, the interfaces defined in cellbase-core must be implemented in separate modules, so a cellbase-mongodb module must be created for MongoDB.

ClinVar WS should query the clinical collection

The ClinVar WS currently query the ClinVar collection, but ClinVar is also loaded into the clinical collection. Only one copy of ClinVar will be kept, the one in the clinical collection, and all queries should point to it.

Many third party dependencies need to be upgraded

Some dependencies, such as Jackson, SQLite and Jersey, are on old versions; these need to be upgraded and tested.

Fixes for GeneWSServer

The following Gene WS should be implemented (currently not working):

/{version}/{species}/feature/gene/{geneId}/tfbs
/{version}/{species}/feature/gene/{geneId}/mirna_target
/{version}/{species}/feature/gene/{geneId}/reactome

/{version}/{species}/feature/gene/{geneId}/protein returns the PPIs for the specified gene. We should rename this WS to ppi or protein_interaction.

It would also be interesting to create a proper /{version}/{species}/feature/gene/{geneId}/protein WS returning UniProt information for this gene.

ChromosomeMongoDBAdaptor method uses aggregation instead of elemMatch

Method 'getAllByIdList' uses a complex aggregation where a much simpler elemMatch could be used. In addition, 'supercontigs' are currently returned as well:

http://www.ebi.ac.uk/cellbase/webservices/rest/v3/hsapiens/genomic/chromosome/13/info?of=json

Improve documentation

Documentation needs significant improvement in three areas: building, architecture and REST calls.

Use the new NIO API from Java 7+

Java 7 introduced a new NIO API for creating directories and other file system operations; it must be used, e.g. here:

https://github.com/opencb/cellbase/blob/develop/cellbase-app/src/main/java/org/opencb/cellbase/app/cli/DownloadCommandParser.java#L363
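For example, `Files.createDirectories` creates any missing parent directories, succeeds silently if the directory already exists and throws an IOException explaining any failure, whereas `java.io.File.mkdirs()` only returns a boolean. The species/assembly directory layout below is hypothetical and not taken from DownloadCommandParser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class NioDirs {
    /** Creates outputDir/species/assembly and any missing parents; no error if it already exists. */
    static Path ensureDownloadDir(Path outputDir, String species, String assembly) throws IOException {
        Path dir = outputDir.resolve(species).resolve(assembly);
        return Files.createDirectories(dir);
    }
}
```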

New DisGeNET data source to be included

The DisGeNET database needs to be downloaded and included:

http://www.disgenet.org/web/DisGeNET/v2.1

A new collection gene_disease_association must be created.

Write a Download and Build wiki tutorial

A tutorial for downloading data sources and building the data models is needed:

https://github.com/opencb/cellbase/wiki/Download-and-Build-Data-Models

Migration to MongoDB

NoSQL databases offer higher performance and scalability. The document-oriented database MongoDB fits CellBase's needs very well, so a new MongoDB-based implementation is needed.

Implementation of new CLI using JCommander

The new CLI must be implemented using JCommander; the available commands are download, build, load and query.

Module cellbase-build must be renamed to cellbase-app

The new cellbase-app module will accept different commands such as download, build and query.

cellbase-server starts_with webservice not working

http://www.ebi.ac.uk/cellbase/webservices/rest/latest/hsa/feature/id/BRCA2/starts_with?of=json

This call returns null instead of a JSON-serialized QueryResponse object.

Make CellBase DBAdaptors use datastore repository

Remove direct uses of the MongoDB driver from the CellBase adaptors and use the datastore functionality instead.

New ClinVar WS to query by gene symbol (HGNC)

The WS must use the ClinicalMongoDBAdaptor and query the

referenceClinVarAssertion.measureSet.measure.measureRelationship.symbol

field within the ClinVar record.

Add a WS for biological interactions

Reorganize configuration info (properties files)

Move the species list and DB configuration to the cellbase-server application.properties.
Create a Config object able to hold all the configuration needed by cellbase-mongodb.
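A minimal sketch of such a Config object, built on `java.util.Properties`. The property keys (`db.host`, `db.port`, `species`) and the defaults are hypothetical, not CellBase's actual ones; 27017 is simply MongoDB's default port.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical configuration holder; property keys are illustrative only. */
class CellBaseConfig {
    final String dbHost;
    final int dbPort;
    final List<String> species;

    private CellBaseConfig(String dbHost, int dbPort, List<String> species) {
        this.dbHost = dbHost;
        this.dbPort = dbPort;
        this.species = species;
    }

    /** Reads application.properties-style key=value pairs, falling back to defaults. */
    static CellBaseConfig load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        String host = p.getProperty("db.host", "localhost");
        int port = Integer.parseInt(p.getProperty("db.port", "27017").trim());
        List<String> species = new ArrayList<>();
        // Comma-separated species list, e.g. "hsapiens, mmusculus"
        for (String s : p.getProperty("species", "").split(",")) {
            if (!s.isBlank()) {
                species.add(s.trim());
            }
        }
        return new CellBaseConfig(host, port, List.copyOf(species));
    }
}
```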

IntAct database integration

PPIs from IntAct must be added; the data models must be created in biodata-models.

CLI fails when executed from outside the root directory

cellbase.sh should work when executed from any directory in the system.

New Variant Annotation functionality

A new variant annotation functionality will be implemented, returning all the information known in CellBase about a variant: consequence type (#26), conservation, ...
Data models must be added to biodata-models.

RefSeq parser

Add a RefSeq parser method to GeneParser; the RefSeq data must be loaded together with the Ensembl gene set.

HGVS shall be returned as part of the variant annotation

Transcript HGVS shall be calculated and included within the VariantAnnotation object
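As an illustration, an HGVS coding substitution has the form transcript:c.{position}{ref}>{alt} (e.g. ENST00000000001:c.5A>G), where c.1 is the first base of the ATG start codon. The sketch below computes that position for a plus-strand transcript from its coding segments. The segment model and names are hypothetical, and minus-strand transcripts (which need reverse-complemented bases and reversed counting), intronic offsets (c.N+M) and UTR positions are not handled.

```java
import java.util.List;

class HgvsSketch {
    // Hypothetical: coding parts of exons, 1-based inclusive genomic coordinates, sorted, plus strand only
    record CdsSegment(int start, int end) {}

    /** Returns e.g. "T1:c.5A>G", or null when the position is not inside a coding segment. */
    static String substitution(String transcriptId, List<CdsSegment> cds, int position, char ref, char alt) {
        int offset = 0; // number of coding bases in the segments before the current one
        for (CdsSegment s : cds) {
            if (position >= s.start() && position <= s.end()) {
                int cPos = offset + (position - s.start()) + 1;
                return transcriptId + ":c." + cPos + ref + ">" + alt;
            }
            offset += s.end() - s.start() + 1;
        }
        return null; // intronic and UTR positions need c.N+M, c.-N or c.*N notation
    }
}
```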

Fixes for SnpWSServer

The following SNP WS should be implemented (currently not working):

/{version}/{species}/feature/snp/{snpId}/consequence_type
/{version}/{species}/feature/snp/{snpId}/population_frequency
/{version}/{species}/feature/snp/{snpId}/xref

Interesting but not urgent:

/{version}/{species}/feature/snp/{snpId}/sequence
/{version}/{species}/feature/snp/{snpId}/regulatory

List of deprecated WS:

/{version}/{species}/feature/snp/{snpId}/consequence_types
/{version}/{species}/feature/snp/{snpId}/phenotypes

UniProt integration

The UniProt database needs to be integrated into CellBase.

cellbase-server latest/species not working correctly

The web service:
http://www.ebi.ac.uk/cellbase/webservices/rest/latest/species?of=json
does not list the species correctly: it returns repeated species in different formats.

Swagger integration

To improve the documentation, Swagger must be integrated and configured.

Generate JSON schemas with Jackson

Some schemas should be defined; using JSON Schema seems the simplest approach.

New Gene Expression Atlas data source to be included

Add Gene Expression Atlas data to the knowledgebase. Implement the corresponding code for the:

Downloader
Builder
Loader
WSs

Reimplement parsers to follow a more general ETL model

Some parsers will be reimplemented so that they generate a general data model serialized as JSON. 'Loaders' will then be implemented to transform the data into an appropriate and efficient format for the specific DBMS (e.g. MongoDB) and to load it into the DB. The objective is a data model that contains all the information, independently of the implementation for any given DBMS.
