
server's Introduction


GA4GH Reference Implementation

Join the chat at https://gitter.im/ga4gh/server

This is the development version of the GA4GH reference implementation. If you would like to install the stable version of the server, please see the instructions on the PyPI page.

The server is currently under heavy development, and many aspects of the layout and APIs will change as requirements are better understood. If you would like to help, please check out our list of issues!

The latest bleeding-edge documentation is available at read-the-docs.org.

  • For a quick start with the GA4GH API, please see our demo.
  • To configure and deploy the GA4GH server in production please see the installation page.
  • If you would like to contribute to the project, please see the development page.

server's People

Contributors

adamnovak, afirth, almussel, andrewjesaitis, andrewyatz, bjea, bwalsh, cassiedoll, david4096, dcolligan, gabrielsaldana, hershman, jeromekelleher, jmarshall, kerrydc, kozbo, macieksmuga, melaniedc, mollyzhang, naburimannu, ohsu-machineuser, palfrey, pashields, pcingola, peterolph, saupchurch, sguthrie, shajoezhu, sidrahussain, skeenan


server's Issues

update and test

Recently squashed commits - can you update github and re-run all nosetests to ensure all is well?

G2P '/genotypephenotype/search' experiences


Summary

We extended the GA4GH Reference server to include the '/genotypephenotype/search' endpoint. This document describes the experience and makes some targeted suggestions for improvements, primarily for the request payload.

Approach

We based our work on the model captured in ga4gh/schemas commit of Jul 30, 2015. This version of the schema predates the separated genotype to phenotype files from baseline.

The code was based on a branch set up for this purpose by the server team.
No major refactoring of the server was needed; additional code was added to ga4gh/backend.py, ga4gh/frontend.py and test/unit/test_views.py.

Data

The cancer genome database Clinical Genomics Knowledge Base published by the Monarch project was the source of Evidence.


API

The GA4GH schemas define a single endpoint, /genotypephenotype/search, which accepts a POST whose request body contains one or more of Feature, PhenotypeInstance, EnvironmentalContext, and Evidence. The supplied terms are combined as a logical AND to query the underlying datastore. A missing type is treated as a wildcard that matches all data. Matching data are returned as a list of FeaturePhenotypeAssociation. All types rely heavily on OntologyTerm.
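The AND-with-wildcard behaviour described above can be illustrated with a small in-memory sketch. The dictionary keys and sample values below are placeholders invented for illustration; they are not field names from the GA4GH schemas or data from the server.

```python
# Synthetic records standing in for stored associations.
# The keys are illustrative assumptions, not GA4GH schema field names.
ASSOCIATIONS = [
    {"feature": "feature-1", "phenotype": "phenotype-1",
     "environment": "drug-1", "evidence": "evidence-1"},
    {"feature": "feature-1", "phenotype": "phenotype-1",
     "environment": "drug-2", "evidence": "evidence-1"},
    {"feature": "feature-2", "phenotype": "phenotype-2",
     "environment": "drug-1", "evidence": "evidence-2"},
]


def search(associations, feature=None, phenotype=None,
           environment=None, evidence=None):
    """Return associations matching every supplied term.

    A term left as None is a wildcard and does not restrict the result.
    """
    criteria = {"feature": feature, "phenotype": phenotype,
                "environment": environment, "evidence": evidence}
    supplied = {k: v for k, v in criteria.items() if v is not None}
    return [a for a in associations
            if all(a.get(k) == v for k, v in supplied.items())]
```

Calling `search(ASSOCIATIONS)` with no terms returns every record, while each additional term narrows the result.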

Request

http://yuml.me/edit/bf06b90a

Response


http://yuml.me/edit/25343da1


Implementation


http://yuml.me/c97fada2


Issues

Query by example

There are four datatypes for each entity [string, external identifier, ontology identifier and 'entity'].
Currently the implementation handles queries of [string, external identifier and ontology identifier].

The 'entity' query, which is a type of query-by-example, has been deferred. Challenges that arose:

  • schema constraints: there are several fields within the schemas that are defined as non-null. This may be fine when creating an entity from a data store; however, it is problematic when creating an entity to be used in a query.
  • additional discussions are needed to determine which properties of an existing entity will be used for the query and which will be ignored. For example, a Feature has [id, parentIds, featureSetId, referenceName, start, end, strand, featureType, attributes]; we need to specify exactly what the query's expectations are.
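One way to make the query's expectations explicit is to declare which properties take part in matching and to treat unset properties as "don't care". The sketch below is only an illustration of that idea: the choice of `QUERYABLE_FIELDS` is an assumption, not an agreed decision, and entities are modelled as plain dictionaries rather than schema classes.

```python
# Hypothetical: the subset of Feature properties a query-by-example matches on.
# Properties outside this tuple (for example id) are ignored.
QUERYABLE_FIELDS = ("referenceName", "start", "end", "strand", "featureType")


def matches_example(candidate, example, fields=QUERYABLE_FIELDS):
    """Return True if candidate agrees with example on every set, queryable field."""
    for name in fields:
        wanted = example.get(name)
        # A missing or None value in the example does not constrain the match.
        if wanted is not None and candidate.get(name) != wanted:
            return False
    return True
```

Under this design an example entity may leave non-null schema fields unset, which sidesteps the schema constraint problem at the cost of validating query entities differently from stored ones.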

Ontology Queries

  • The 'ontologySource' is assumed to be equivalent to an ontology's 'prefix'. However, no agreement or mechanism exists to align ontologySource to a specific ontology. Recommend collapsing ontologySource and identifier into a single URI.
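A minimal sketch of the recommended collapse, assuming a registry that maps each prefix to a URI base. The `PREFIX_BASES` table is a hypothetical stand-in for whatever registry is eventually agreed; its single entry follows the form of the OMIM URI used in the SPARQL queries later in this page.

```python
# Assumed prefix-to-base registry; not defined anywhere in the GA4GH schemas.
PREFIX_BASES = {
    "OMIM": "http://purl.obolibrary.org/obo/OMIM_",
}


def to_uri(ontology_source, identifier):
    """Collapse an (ontologySource, identifier) pair into a single URI."""
    if ontology_source not in PREFIX_BASES:
        raise ValueError("no URI base registered for prefix %r" % ontology_source)
    return PREFIX_BASES[ontology_source] + identifier


def from_uri(uri):
    """Recover the (prefix, identifier) pair from a URI built by to_uri."""
    for prefix, base in PREFIX_BASES.items():
        if uri.startswith(base):
            return prefix, uri[len(base):]
    raise ValueError("unrecognised ontology URI %r" % uri)
```

With a single URI, the server can compare terms directly instead of guessing how a free-text ontologySource relates to a prefix.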

Name collision (SearchFeaturesResponse)

That schema contains two definitions of the class SearchFeaturesResponse. How are these handled in the generated code in _protocol_definitions.py? (Currently I only see one)

The current server is based on version '0.6.be171b00' of the schema project.
Snippets from this commit follow.

  • One in the file genotypephenotypemethods.avdl, protocol GenotypePhenotypeMethods:
    /** This is the response from `POST /genotypephenotype/search` expressed as JSON. */
    record SearchFeaturesResponse {
      /**
      The list of matching FeaturePhenotypeAssociation.
      */
      array<org.ga4gh.models.FeaturePhenotypeAssociation> associations = [];

      ...
  • The second one is found in sequenceAnnotationmethods.avdl:
    /** This is the response from `POST /features/search` expressed as JSON. */
    record SearchFeaturesResponse {
      /**
      The list of matching annotations, sorted by start position. Annotations which
      share a start position are returned in a deterministic order.
      */
      array<org.ga4gh.models.Feature> features = [];

      ...
  • The generated code only has the class associated with sequenceAnnotationmethods.avdl:
    def __init__(self):
        self.features = []
        self.nextPageToken = None

Both sequenceAnnotationmethods.avdl and genotypephenotypemethods.avdl share the same namespace, @namespace("org.ga4gh.methods"), and each file defines an enclosing protocol.

The names section of the Avro spec says:

A name only is specified, i.e., a name that contains no dots. In this case the namespace is taken from the most tightly enclosing schema or protocol. For example, if "name": "X" is specified, and this occurs within a field of the record definition of org.foo.Y, then the fullname is org.foo.X. If there is no enclosing namespace then the null namespace is used.

I'm assuming that the schemas pass validation...

A schema or protocol may not contain multiple definitions of a fullname. Further, a name must be defined before it is used ("before" in the depth-first, left-to-right traversal of the JSON parse tree, where the types attribute of a protocol is always deemed to come "before" the messages attribute.)
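A collision like this can be caught before code generation by scanning the IDL files for records whose fullnames coincide. The following is a rough sketch under simplifying assumptions: it uses regular expressions rather than a real Avro IDL parser, assumes one `@namespace` per file, and would be fooled by the word "record" appearing inside a comment. The function name is hypothetical.

```python
import re
from collections import defaultdict

NAMESPACE_RE = re.compile(r'@namespace\("([^"]+)"\)')
RECORD_RE = re.compile(r'\brecord\s+(\w+)')


def find_duplicate_fullnames(sources):
    """Map each colliding fullname to the files that define it.

    sources is a dict of {file name: IDL text}.
    """
    seen = defaultdict(list)
    for filename, text in sources.items():
        match = NAMESPACE_RE.search(text)
        namespace = match.group(1) if match else ""
        for name in RECORD_RE.findall(text):
            fullname = namespace + "." + name if namespace else name
            seen[fullname].append(filename)
    # Keep only the fullnames defined more than once.
    return {full: files for full, files in seen.items() if len(files) > 1}
```

Run over the two files above, this kind of check would flag org.ga4gh.methods.SearchFeaturesResponse as defined twice.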

TODO

Pull Request Prep

General clean up.  Additional Tests.

MS Literome adapter

Create a facade to interact with MS:Literome.  See http://literome.azurewebsites.net 

CIViC Client

angular UI and node reverse proxy

Literome Feedback

Allow API to accept optional diseaseOrDrug, return first 100 potential associations

http://literome.azurewebsites.net/gwas/get?snporgene=BRCA2
{"ClassName":"System.ArgumentException","Message":"'diseaseOrDrug' cannot be empty.",...}


Accept dbSNP ids on par with gene name

http://literome.azurewebsites.net/gwas/get?snporgene=rs80359550&diseaseordrug=Breast%20Diseases
{"Associations":[],"Abstracts":[]}

Allow disease name flexibility

http://literome.azurewebsites.net/gwas/get?snporgene=BRCA2&diseaseordrug=Breast%20Diseases
{"Associations":[{"SnpOrGeneType":....}
http://literome.azurewebsites.net/gwas/get?snporgene=BRCA2&diseaseordrug=Breast%20Disease
{"Associations":[],"Abstracts":[]}

Accept entrez id for gene

http://literome.azurewebsites.net/gwas/get?snporgene=675&diseaseordrug=Breast%20Diseases
{"Associations":[{"SnpOrGeneType":....}

Use Drug ontology ids

DiseaseOrDrugId: "PA443559"

PA443559 equivalent_to http://www.ncbi.nlm.nih.gov/mesh/D001941
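For the facade mentioned in the TODO, the request URLs shown above can be assembled by a small helper. This is a sketch only: the function name is hypothetical, and the only facts taken from the examples are the base URL and the `snporgene` and `diseaseordrug` query parameters. Leaving out `diseaseordrug` reflects the behaviour requested in the feedback; the service as shown rejects such requests.

```python
from urllib.parse import quote, urlencode

LITEROME_GWAS_URL = "http://literome.azurewebsites.net/gwas/get"


def build_gwas_url(snp_or_gene, disease_or_drug=None):
    """Build a Literome GWAS lookup URL; disease_or_drug is optional."""
    params = {"snporgene": snp_or_gene}
    if disease_or_drug is not None:
        params["diseaseordrug"] = disease_or_drug
    # quote_via=quote encodes spaces as %20, matching the URLs shown above.
    return LITEROME_GWAS_URL + "?" + urlencode(params, quote_via=quote)
```

For example, `build_gwas_url("BRCA2", "Breast Diseases")` reproduces the third URL in the feedback above.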

Problems setting up project

I'm trying to set up a development environment with Docker following the current documentation at http://ga4gh-reference-implementation.readthedocs.org/en/latest/installation.html#deployment-on-docker but ran into some issues:

  1. After building the Docker image, running docker run myimage outputs an error: OSError: [Errno 2] No such file or directory: '/ga4gh-example-data/referenceSets'
  2. The link for the example data at http://www.well.ox.ac.uk/~jk/ga4gh-example-data.tar.gz is broken.
  3. Trying another command, docker run -d -p 8000:80 --name ga4gh_demo afirth/ga4gh-server:develop-demo, gave the error: Error pulling image (develop-demo) from docker.io/afirth/ga4gh-server, Untar re-exec error: exit status 1: output: write /data/ga4gh-example-data/reads/low-coverage/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam: no space left on device

Am I missing something from the documentation, or do the docs need more details or an update?

client code

See
ga4gh#607 (comment)

Client code

There is no client API provided. This should be added and tested, so we can run queries from the command line without using curl or whatever.

use case

As a researcher, in order to use the G2P api in a python notebook, I need a client side api to integrate it.


Please read the code and create a design on how a client api might be used.

See

https://github.com/ohsu-computational-biology/server/blob/develop/ga4gh/client.py
https://github.com/ohsu-computational-biology/server/blob/develop/tests/unit/test_client.py
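As a starting point for the design discussion, one possible shape for a notebook-friendly client is sketched below. Everything about it is an assumption rather than the existing ga4gh/client.py API: the class name, the injected `post` callable, and the idea that search terms are passed straight through as the request body. The `associations`, `nextPageToken` and `pageToken` names come from the schema snippet and test output elsewhere on this page.

```python
import json


class G2PClient(object):
    """Hypothetical G2P client.

    post is any callable taking (url, body_text) and returning the response
    text, so the HTTP library stays swappable and tests need no network.
    """

    def __init__(self, base_url, post):
        self._url = base_url.rstrip("/") + "/genotypephenotype/search"
        self._post = post

    def search(self, **terms):
        """Yield associations from every page by following nextPageToken."""
        request = dict(terms)
        while True:
            response = json.loads(self._post(self._url, json.dumps(request)))
            for association in response.get("associations", []):
                yield association
            token = response.get("nextPageToken")
            if token is None:
                break
            request["pageToken"] = token
```

In a notebook, `post` could wrap an HTTP library call, and `list(client.search(...))` would then collect all pages without the user handling tokens.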

Analysis required: how would sparql queries translate to REST endpoints

existing sparql queries

(it would be useful to have these mapped to scigraph rest endpoint)

Note: these were produced via $nosetests tests.unit.test_views:TestFrontend.testGenotypePhenotypeSearchFeature --nocapture

lookup by location label

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 100 OFFSET 0 

lookup by location label & drug label


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild") && regex(?drug_label, "imatinib"))
            }
            LIMIT 100 OFFSET 0 

lookup by location label, drug label & disease label

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild") && regex(?drug_label, "imatinib") && regex(?disease_label, "GIST"))
            }
            LIMIT 100 OFFSET 0 
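The three label queries above differ only in their FILTER line, so that line can be generated from whichever labels the request supplies. The helper below is an illustrative sketch, not the server's code; its name and parameters are hypothetical, and it does not escape quotes in the patterns, which a real implementation would need to do.

```python
def build_label_filter(location=None, drug=None, disease=None):
    """Build the SPARQL FILTER line for the supplied label patterns.

    Returns an empty string when no pattern is given (no restriction).
    Patterns are inserted verbatim; callers must escape embedded quotes.
    """
    terms = [("location_label", location),
             ("drug_label", drug),
             ("disease_label", disease)]
    clauses = ['regex(?%s, "%s")' % (var, pattern)
               for var, pattern in terms if pattern is not None]
    if not clauses:
        return ""
    return "FILTER (%s)" % " && ".join(clauses)
```

Joining the clauses with `&&` mirrors the logical AND that the endpoint applies to its request terms.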

lookup by location ontology id

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ))
            }
            LIMIT 100 OFFSET 0 


lookup by location ontology id & disease ontology id



            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ) && (?disease = <http://purl.obolibrary.org/obo/OMIM_606764> ))
            }
            LIMIT 100 OFFSET 0 

lookup by location, disease & drug ontology id(s)


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ) && (?drug = <http://www.drugbank.ca/drugs/DB00619> ) && (?disease = <http://purl.obolibrary.org/obo/OMIM_606764> ))
            }
            LIMIT 100 OFFSET 0 

lookup by location drug & disease id


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ) && (?drug = <FOODB00619> ) && (?disease = <http://purl.obolibrary.org/obo/OMIM_606764> ))
            }
            LIMIT 100 OFFSET 0 

simple label lookup (with paging)


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 0 
starting query
ending query len(rows)=1
_pickUpIteration
pageToken 0:1
offset 1

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 1 
starting query
ending query len(rows)=1
_pickUpIteration
pageToken 0:2
offset 2

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 2 
starting query
ending query len(rows)=1
_pickUpIteration
pageToken 0:3
offset 3

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 3 
starting query
ending query len(rows)=1
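The paging trace above suggests that the number after the colon in the page token becomes the SPARQL OFFSET of the next query. The sketch below encodes that reading; it is an inference from the log output, not the server's implementation. The meaning of the leading "0" is not shown in the trace, so it is carried through unchanged, and the function names are hypothetical.

```python
def parse_page_token(token):
    """Return the row offset encoded in a token such as '0:3' (0 if no token)."""
    if token is None:
        return 0
    _, offset = token.split(":")
    return int(offset)


def paging_clause(page_size, token=None):
    """Build the LIMIT/OFFSET line for the page identified by token."""
    return "LIMIT %d OFFSET %d" % (page_size, parse_page_token(token))


def next_page_token(token, rows_returned):
    """Advance the token by the number of rows just returned."""
    # Assumption: the first component stays "0", as in every token in the trace.
    return "0:%d" % (parse_page_token(token) + rows_returned)
```

With a page size of 1, successive calls yield offsets 0, 1, 2 and 3, matching the four queries in the trace.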
