ohsu-comp-bio / server

This project forked from ga4gh/ga4gh-server

A reference implementation of the APIs defined in the schemas repository.

License: Apache License 2.0

Shell 0.08% Python 96.10% HTML 0.57% Jupyter Notebook 3.25%

server's Introduction

GA4GH Reference Implementation

Join the chat at https://gitter.im/ga4gh/server

This is the development version of the GA4GH reference implementation. If you would like to install the stable version of the server, please see the instructions on the PyPI page.

The server is currently under heavy development, and many aspects of the layout and APIs will change as requirements are better understood. If you would like to help, please check out our list of issues!

The latest bleeding-edge documentation is available at read-the-docs.org.

For a quick start with the GA4GH API, please see our demo.
To configure and deploy the GA4GH server in production please see the installation page.
If you would like to contribute to the project, please see the development page.

server's Issues

update and test

Recently squashed commits - can you update github and re-run all nosetests to ensure all is well?

G2P '/genotypephenotype/search' experiences

Summary

We extended the GA4GH Reference server to include the '/genotypephenotype/search' endpoint. This document describes the experience and makes some targeted suggestions for improvements, primarily for the request payload.

Approach

We based our work on the model captured in the ga4gh/schemas commit of Jul 30, 2015. This version of the schema predates the separation of the genotype to phenotype files from baseline.

The code was based on a branch setup for this purpose by the server team.
No major refactoring of the server was needed. Additional code was added to ga4gh/backend.py, ga4gh/frontend.py and test/unit/test_views.py.

Data

The cancer genome database Clinical Genomics Knowledge Base published by the Monarch project was the source of Evidence.

API

The GA4GH schemas define a single endpoint, /genotypephenotype/search, which accepts a POST of a request body containing one or more of Feature, PhenotypeInstance, EnvironmentalContext, and Evidence. These are combined as a logical AND to query the underlying datastore. A missing type is treated as a wildcard that matches all data. Matching data is returned as a list of FeaturePhenotypeAssociation. All types rely heavily on OntologyTerm.
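The AND-with-wildcard behaviour described above can be illustrated with a minimal sketch. The flat dictionaries, the field names and the `search` function are hypothetical stand-ins chosen for illustration; they are not the schema types or the server's implementation.

```python
# Hypothetical, flattened stand-ins for FeaturePhenotypeAssociation records.
ASSOCIATIONS = [
    {"feature": "KIT wild", "phenotype": "GIST", "environment": "imatinib", "evidence": "e1"},
    {"feature": "KIT wild", "phenotype": "GIST", "environment": "drug-b", "evidence": "e2"},
    {"feature": "feature-2", "phenotype": "phenotype-2", "environment": "drug-c", "evidence": "e3"},
]


def search(associations, feature=None, phenotype=None, environment=None, evidence=None):
    """Return associations matching every supplied criterion (logical AND).

    A criterion left as None acts as a wildcard and is not tested.
    """
    criteria = {
        "feature": feature,
        "phenotype": phenotype,
        "environment": environment,
        "evidence": evidence,
    }
    active = {key: value for key, value in criteria.items() if value is not None}
    return [a for a in associations if all(a.get(k) == v for k, v in active.items())]


everything = search(ASSOCIATIONS)  # no criteria: wildcard on all types
narrowed = search(ASSOCIATIONS, feature="KIT wild", environment="imatinib")
```

With no criteria every record is returned; each added criterion can only narrow the result.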

Request

http://yuml.me/edit/bf06b90a

Response

http://yuml.me/edit/25343da1

Implementation

http://yuml.me/c97fada2

Issues

Query by example

Each entity can be queried with one of four data types: [string, external identifier, ontology identifier and 'entity'].
Currently the implementation handles queries of [string, external identifier and ontology identifier].

The 'entity' query, a type of query-by-example, has been deferred. Challenges that arose:

Schema constraints: several fields within the schemas are defined as non-null. This may be fine when creating an entity from a data store, but it is problematic when creating an entity to be used in a query.
Additional discussion is needed to determine which properties of an existing entity will be used for the query and which will be ignored. For example, a Feature has [id, parentIds, featureSetId, referenceName, start, end, strand, featureType, attributes]; we need to specify exactly what the query's expectations are.
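One possible shape for the deferred 'entity' query is sketched below, under two assumptions that the text above leaves open: every field of the example may be null, and only an explicit whitelist of fields takes part in matching. `FeatureExample`, `QUERY_FIELDS` and `matches` are hypothetical names, and the choice of whitelisted fields is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureExample:
    """Query-side Feature in which every field is nullable (an assumption)."""
    id: Optional[str] = None
    featureSetId: Optional[str] = None
    referenceName: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    strand: Optional[str] = None
    featureType: Optional[str] = None


# Hypothetical whitelist: only these fields are compared; id and
# featureSetId are ignored even when the example carries them.
QUERY_FIELDS = ("referenceName", "start", "end", "strand", "featureType")


def matches(example: FeatureExample, candidate: dict) -> bool:
    """True when the candidate agrees with every non-null whitelisted field."""
    for name in QUERY_FIELDS:
        wanted = getattr(example, name)
        if wanted is not None and candidate.get(name) != wanted:
            return False
    return True


example = FeatureExample(id="ignored-id", referenceName="chr4", featureType="gene")
stored = {"id": "f1", "referenceName": "chr4", "start": 100, "end": 200, "featureType": "gene"}
is_match = matches(example, stored)
```

Making the whitelist explicit is one way to document "exactly what the query's expectations are" without relaxing the non-null constraints on stored entities.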

Ontology Queries

The 'ontologySource' is assumed to be equivalent to an ontology's 'prefix'. However, no agreement or mechanism exists to align ontologySource to specific. Recommend collapsing ontologySource and identifier into a single URI.
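A minimal sketch of the recommended collapse follows. The prefix table and the `to_uri` helper are hypothetical; the two base URIs are taken from the identifiers that appear in the SPARQL queries later on this page, and a real deployment would need an agreed registry of prefixes.

```python
# Hypothetical prefix table; no agreed registry exists yet.
PREFIX_TO_BASE = {
    "OMIM": "http://purl.obolibrary.org/obo/OMIM_",
    "DrugBank": "http://www.drugbank.ca/drugs/",
}


def to_uri(ontology_source: str, identifier: str) -> str:
    """Collapse an (ontologySource, identifier) pair into a single URI."""
    try:
        base = PREFIX_TO_BASE[ontology_source]
    except KeyError:
        raise ValueError("unknown ontologySource: %r" % ontology_source)
    return base + identifier


disease_uri = to_uri("OMIM", "606764")
drug_uri = to_uri("DrugBank", "DB00619")
```

Failing loudly on an unknown prefix keeps the lack of alignment visible instead of producing a URI that silently matches nothing.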

Name collision (SearchFeaturesResponse)

That schema contains two definitions of the class SearchFeaturesResponse. How are these handled in the generated code in _protocol_definitions.py? (Currently I only see one)

The schema project the current server is based on is version = '0.6.be171b00'
Snippets from this commit follow

One in the file genotypephenotypemethods.avdl, protocol GenotypePhenotypeMethods

/** This is the response from `POST /genotypephenotype/search` expressed as JSON. */
record SearchFeaturesResponse {
  /**
  The list of matching FeaturePhenotypeAssociation.
  */
  array<org.ga4gh.models.FeaturePhenotypeAssociation> associations = [];

  ...

The second one is found in sequenceAnnotationmethods.avdl

  /** This is the response from `POST /features/search` expressed as JSON. */
  record SearchFeaturesResponse {
    /**
    The list of matching annotations, sorted by start position. Annotations which
    share a start position are returned in a deterministic order.
    */
    array<org.ga4gh.models.Feature> features = [];

    ...

The generated code only has the class associated with sequenceAnnotationmethods.avdl

    def __init__(self):
        self.features = []
        self.nextPageToken = None

Both sequenceAnnotationmethods.avdl and genotypephenotypemethods.avdl share the same namespace, @namespace("org.ga4gh.methods"), and each file defines an enclosing protocol.

From the names section of the Avro specification:

A name only is specified, i.e., a name that contains no dots. In this case the namespace is taken from the most tightly enclosing schema or protocol. For example, if "name": "X" is specified, and this occurs within a field of the record definition of org.foo.Y, then the fullname is org.foo.X. If there is no enclosing namespace then the null namespace is used.

I'm assuming that the schemas pass validation...

A schema or protocol may not contain multiple definitions of a fullname. Further, a name must be defined before it is used ("before" in the depth-first, left-to-right traversal of the JSON parse tree, where the types attribute of a protocol is always deemed to come "before" the messages attribute.)
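The collision can be checked mechanically. The sketch below is not the server's code generator: it assumes the definitions have already been extracted as (namespace, name, source file) tuples, and it shows one plausible reason only a single class survives, namely a registry keyed by name where the last definition wins. Whether the real generator behaves this way is not confirmed here.

```python
from collections import defaultdict

# The two definitions quoted above, as (namespace, name, source file).
DEFINITIONS = [
    ("org.ga4gh.methods", "SearchFeaturesResponse", "genotypephenotypemethods.avdl"),
    ("org.ga4gh.methods", "SearchFeaturesResponse", "sequenceAnnotationmethods.avdl"),
]


def find_duplicate_fullnames(definitions):
    """Map each fullname defined more than once to the files defining it."""
    sources = defaultdict(list)
    for namespace, name, source in definitions:
        fullname = namespace + "." + name if namespace else name
        sources[fullname].append(source)
    return {full: files for full, files in sources.items() if len(files) > 1}


duplicates = find_duplicate_fullnames(DEFINITIONS)

# A registry keyed by bare name silently keeps only the last definition.
registry = {}
for _namespace, name, source in DEFINITIONS:
    registry[name] = source
```

Running a check like this over all .avdl files before code generation would surface the conflict as an error instead of a missing class.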

TODO

Pull Request Prep

General clean up.  Additional Tests.

MS Literome adapter

Create a facade to interact with MS:Literome.  See http://literome.azurewebsites.net

CIViC Client

angular UI and node reverse proxy

Literome Feedback

Allow API to accept optional diseaseOrDrug, return first 100 potential associations

http://literome.azurewebsites.net/gwas/get?snporgene=BRCA2
{"ClassName":"System.ArgumentException","Message":"'diseaseOrDrug' cannot be empty.",...}

Accept dbSNP ids on par with gene name

http://literome.azurewebsites.net/gwas/get?snporgene=rs80359550&diseaseordrug=Breast%20Diseases
{"Associations":[],"Abstracts":[]}

Allow disease name flexibility

http://literome.azurewebsites.net/gwas/get?snporgene=BRCA2&diseaseordrug=Breast%20Diseases
{"Associations":[{"SnpOrGeneType":....}

http://literome.azurewebsites.net/gwas/get?snporgene=BRCA2&diseaseordrug=Breast%20Disease
{"Associations":[],"Abstracts":[]}

Accept entrez id for gene

http://literome.azurewebsites.net/gwas/get?snporgene=675&diseaseordrug=Breast%20Diseases
{"Associations":[{"SnpOrGeneType":....}

Use Drug ontology ids

DiseaseOrDrugId: "PA443559"

PA443559 equivalent_to http://www.ncbi.nlm.nih.gov/mesh/D001941
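For the facade, the request URLs shown in the feedback items above could be assembled as in the sketch below. Only the base URL and the two parameter names come from those examples; `gwas_url` is a hypothetical helper, and no request is sent because the availability of the service is not assumed.

```python
from urllib.parse import quote, urlencode

# Base URL and parameter names as they appear in the examples above.
BASE_URL = "http://literome.azurewebsites.net/gwas/get"


def gwas_url(snp_or_gene, disease_or_drug=None):
    """Build a query URL; diseaseordrug is omitted when not supplied."""
    params = {"snporgene": snp_or_gene}
    if disease_or_drug is not None:
        params["diseaseordrug"] = disease_or_drug
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = gwas_url("BRCA2", "Breast Diseases")
```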

uncomment debug flag, exercise git workflow

See ga4gh#607 (comment)

In order to have a simple dry run of our git workflow, I thought it would be good to exercise it with a simple 1 line change, please:

fork this repo https://github.com/ohsu-computational-biology/server
uncomment the debug flag in tests/unit/test_views.py
run tests, ensure they pass
create pull request

CLI needs to be updated to conform with ga4gh/server PR #643

The CLI will need to be updated to conform to the new standards outlined in: ga4gh#643

Problems setting up project

I'm trying to set up a development environment with Docker following the current documentation at http://ga4gh-reference-implementation.readthedocs.org/en/latest/installation.html#deployment-on-docker but ran into some issues:

After building the Docker image, running "docker run myimage" fails with: OSError: [Errno 2] No such file or directory: '/ga4gh-example-data/referenceSets'
The link for the example data at http://www.well.ox.ac.uk/~jk/ga4gh-example-data.tar.gz is broken.
Trying another command, "docker run -d -p 8000:80 --name ga4gh_demo afirth/ga4gh-server:develop-demo", fails with: Error pulling image (develop-demo) from docker.io/afirth/ga4gh-server, Untar re-exec error: exit status 1: output: write /data/ga4gh-example-data/reads/low-coverage/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam: no space left on device

Am I missing something from the documentation, or do the docs need more details or an update?

Client code

There is no client API provided. This should be added and tested, so we can run queries from the command line without using curl or whatever.

use case

As a researcher, in order to use the G2P api in a python notebook, I need a client side api to integrate it.

Please read the code and create a design on how a client api might be used.

See

https://github.com/ohsu-computational-biology/server/blob/develop/ga4gh/client.py
https://github.com/ohsu-computational-biology/server/blob/develop/tests/unit/test_client.py
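As a starting point for the design discussion, here is one possible shape for a notebook-friendly client. Everything in it is hypothetical: `G2PClient`, its `search` method and the payload keys are illustrative and are not taken from ga4gh/client.py. The transport is injected so the sketch runs without a server.

```python
import json
from typing import Callable


class G2PClient:
    """Hypothetical client for POST /genotypephenotype/search."""

    def __init__(self, base_url: str, post: Callable[[str, str], str]):
        self._url = base_url.rstrip("/") + "/genotypephenotype/search"
        self._post = post  # callable(url, body) -> response text

    def search(self, feature=None, phenotype=None, evidence=None, page_token=None):
        """Send only the criteria that were supplied; return the decoded JSON."""
        criteria = {
            "feature": feature,
            "phenotype": phenotype,
            "evidence": evidence,
            "pageToken": page_token,
        }
        payload = {k: v for k, v in criteria.items() if v is not None}
        return json.loads(self._post(self._url, json.dumps(payload, sort_keys=True)))


# Stand-in transport that records requests instead of using the network.
sent = []


def fake_post(url, body):
    sent.append((url, body))
    return json.dumps({"associations": [], "nextPageToken": None})


client = G2PClient("http://localhost:8000/", fake_post)
result = client.search(feature="KIT *wild", phenotype="GIST")
```

Injecting the transport keeps the client testable in unit tests and lets a notebook user swap in any HTTP library.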

Analysis required: how would sparql queries translate to REST endpoints

existing sparql queries

(it would be useful to have these mapped to scigraph rest endpoint)

Note: these were produced via $nosetests tests.unit.test_views:TestFrontend.testGenotypePhenotypeSearchFeature --nocapture

lookup by location label

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 100 OFFSET 0

lookup by location label & drug label


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild") && regex(?drug_label, "imatinib"))
            }
            LIMIT 100 OFFSET 0

lookup by location label , drug label & disease label

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild") && regex(?drug_label, "imatinib") && regex(?disease_label, "GIST"))
            }
            LIMIT 100 OFFSET 0
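The three label lookups above differ only in their FILTER line, so a REST layer could derive that line from optional request parameters. The sketch below assumes this is the only part that varies; `build_label_filter` is a hypothetical helper, not the function used in the server.

```python
def build_label_filter(location=None, drug=None, disease=None):
    """Build the FILTER line for label lookups; None means 'do not filter'."""
    variables = (
        ("?location_label", location),
        ("?drug_label", drug),
        ("?disease_label", disease),
    )
    clauses = []
    for variable, pattern in variables:
        if pattern is None:
            continue
        # Escape backslashes and double quotes so the pattern stays one literal.
        escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('regex(%s, "%s")' % (variable, escaped))
    if not clauses:
        return ""
    return "FILTER (" + " && ".join(clauses) + ")"


two_labels = build_label_filter(location="KIT *wild", drug="imatinib")
```

Escaping matters here because the patterns would come from request parameters and are spliced into the query text.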

lookup by location ontology id

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ))
            }
            LIMIT 100 OFFSET 0

lookup by location ontology id & disease ontology id



            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ) && (?disease = <http://purl.obolibrary.org/obo/OMIM_606764> ))
            }
            LIMIT 100 OFFSET 0

lookup by location , disease & drug ontology id(s)


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ) && (?drug = <http://www.drugbank.ca/drugs/DB00619> ) && (?disease = <http://purl.obolibrary.org/obo/OMIM_606764> ))
            }
            LIMIT 100 OFFSET 0

lookup by location drug & disease id


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .
                    ?l  faldo:location ?location .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER ((?location = <http://www.monarchinitiative.org/_CGD:d8c2d551UniProtKB:P10721#P10721-1Region> ) && (?drug = <FOODB00619> ) && (?disease = <http://purl.obolibrary.org/obo/OMIM_606764> ))
            }
            LIMIT 100 OFFSET 0

simple label lookup (with paging)


            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 0 
starting query
ending query len(rows)=1
_pickUpIteration
pageToken 0:1
offset 1

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 1 
starting query
ending query len(rows)=1
_pickUpIteration
pageToken 0:2
offset 2

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 2 
starting query
ending query len(rows)=1
_pickUpIteration
pageToken 0:3
offset 3

            PREFIX OBAN: <http://purl.org/oban/>
            PREFIX OBO: <http://purl.obolibrary.org/obo/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX faldo: <http://biohackathon.org/resource/faldo#>
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT distinct ?s  ?location ?location_label ?disease ?disease_label ?drug  ?drug_label
                WHERE {
                    ?s    a OBAN:association .
                    ?s  OBAN:association_has_subject ?l .
                    ?l rdfs:label ?location_label  .

                    ?s  OBO:RO_has_environment  ?drug .
                    ?drug  rdfs:label ?drug_label  .
                    ?s  OBAN:association_has_object  ?d .
                    ?d  rdfs:label ?disease_label  .
                    ?d rdf:type ?disease .
                    ?s  OBAN:association_has_object_property  ?evidence .
                    OPTIONAL {  ?evidence  rdfs:label ?evidence_label } .
              FILTER (regex(?location_label, "KIT *wild"))
            }
            LIMIT 1 OFFSET 3 
starting query
ending query len(rows)=1
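The log above suggests that the number after the colon in the page token is the row offset of the next page. The sketch below reproduces the LIMIT/OFFSET sequence under that reading. The meaning of the leading "0" is not stated on this page, so it is simply carried through as a constant; all function names are hypothetical.

```python
def parse_page_token(page_token):
    """Return the row offset encoded in a token such as '0:2' (0 if no token)."""
    if page_token is None:
        return 0
    _prefix, _sep, offset = page_token.partition(":")
    return int(offset)


def paging_clause(page_size, page_token=None):
    """Build the LIMIT/OFFSET line appended to the SPARQL query."""
    return "LIMIT %d OFFSET %d" % (page_size, parse_page_token(page_token))


def next_page_token(page_token, rows_returned):
    """Advance the offset by the rows just returned; prefix assumed constant."""
    return "0:%d" % (parse_page_token(page_token) + rows_returned)


# Replay the four pages from the log with a page size of 1.
token = None
clauses = []
for _ in range(4):
    clauses.append(paging_clause(1, token))
    token = next_page_token(token, 1)
```

OFFSET paging of this kind is only stable when the query has a deterministic ordering; the queries above have no ORDER BY, which may be worth confirming.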
