
ncbi-drs

An implementation of GA4GH's Data Repository Service (DRS) on top of NCBI's SRA Data Locator (SDL).

Running the service:

Build the image.

docker build -t ncbi-drs .

Run the container.

docker run --detach --publish 8080:80 --name ncbi-drs --rm ncbi-drs

Note: The host port (8080 in this example) is up to you; the container itself listens on port 80.

Test that the service is running.

curl http://localhost:8080/ga4gh/drs/v1/objects/SRR000000

This produces a canned response, so it only tests that the service is running. (SRR000000 is not a real SRA accession.)

Test that the service is running and talking to NCBI's SDL service.

curl http://localhost:8080/ga4gh/drs/v1/objects/SRR000001

This should produce a real response, which confirms that the service is running and can talk to NCBI to perform lookups. (SRR000001 is a real SRA accession containing public data.)
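The same two checks can be scripted. The sketch below uses only the Python standard library and the `/ga4gh/drs/v1/objects/{id}` path shown in the curl commands. It assumes the service returns JSON bodies, as DRS specifies. `BASE_URL`, `drs_object_url`, and `get_drs_object` are illustrative names, not part of ncbi-drs.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the locally running container; 8080 matches the
# --publish 8080:80 example (adjust if you chose another host port).
BASE_URL = "http://localhost:8080"


def drs_object_url(base_url: str, object_id: str) -> str:
    """Build the DRS v1 object endpoint URL for an ID such as SRR000001."""
    # Quote the ID so unexpected characters cannot alter the request path.
    quoted = urllib.parse.quote(object_id, safe="")
    return f"{base_url.rstrip('/')}/ga4gh/drs/v1/objects/{quoted}"


def get_drs_object(base_url: str, object_id: str, timeout: float = 30.0) -> dict:
    """Fetch an object and decode its JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    failed lookup surfaces as an exception rather than a silent empty dict.
    """
    with urllib.request.urlopen(drs_object_url(base_url, object_id), timeout=timeout) as resp:
        return json.load(resp)
```

With the container running, `get_drs_object(BASE_URL, "SRR000000")` exercises the canned response, and `get_drs_object(BASE_URL, "SRR000001")` should return the same document as the second curl command, provided the container can reach NCBI.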

This is as far as you can test with public data.

Accessing protected dbGaP data

The details of accessing protected dbGaP data through this service are in flux; this section is for illustration purposes only.

You can access protected dbGaP data by providing a dbGaP access token as a bearer token in the HTTP Authorization header. For example:

curl -H "Authorization: Bearer $(cat /path/to/token)" http://localhost:8080/ga4gh/drs/v1/objects/SRR000001

dbGaP has a test project consisting of publicly available data that anyone can get access to; see the dbGaP project "1000 Genomes Used for Cloud Testing".
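An authorized request can also be built from Python. This is a minimal sketch that assumes the token file contains only the token string. Surrounding whitespace, such as a trailing newline, is stripped so it does not end up inside the header value. `bearer_header` and `authorized_request` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path
from typing import Dict


def bearer_header(token_path: str) -> Dict[str, str]:
    """Read an access token from a file and build an Authorization header."""
    # Assumption: the file holds just the token; strip the trailing newline
    # that editors and shells usually add.
    token = Path(token_path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"token file {token_path!r} is empty")
    return {"Authorization": f"Bearer {token}"}


def authorized_request(base_url: str, object_id: str, token_path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a protected object."""
    url = f"{base_url.rstrip('/')}/ga4gh/drs/v1/objects/{object_id}"
    return urllib.request.Request(url, headers=bearer_header(token_path))
```

To send it, pass the request to `urllib.request.urlopen`, for example `urlopen(authorized_request("http://localhost:8080", "SRR000001", "/path/to/token"))`.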

A Note about the DRS IDs used by this service

This service recognizes SRA run accessions (SRR, ERR, or DRR followed by a string of digits, e.g. SRR000001) as bundle IDs and returns a contents array. The IDs in that array can then be used to request access URLs for the individual files.
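A client might pre-check IDs and walk a bundle's contents before requesting individual files. This is a sketch under two assumptions. First, the accession pattern is taken from the description above. Second, each contents entry follows the DRS ContentsObject schema, where `id` is optional and `name` is required. `is_run_accession` and `content_ids` are illustrative names.

```python
import re
from typing import List

# Run accessions: SRR, ERR, or DRR followed by one or more digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def is_run_accession(candidate: str) -> bool:
    """True if the whole string looks like an SRA run accession."""
    return RUN_ACCESSION.fullmatch(candidate) is not None


def content_ids(bundle: dict) -> List[str]:
    """Return the IDs listed in a bundle's contents array, in order.

    Entries without an "id" field (it is optional in the DRS schema)
    are skipped.
    """
    return [entry["id"] for entry in bundle.get("contents", []) if "id" in entry]
```

Each ID returned by `content_ids` can then be requested from the same `/ga4gh/drs/v1/objects/{id}` endpoint to obtain its access information.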

Why only run accessions?

Not every SRA accession is suitable as a DRS ID. SRA accessions represent objects such as submissions, experiments, studies, and projects. These objects are long-lived and their contents can change over time, which makes them unsuitable as DRS IDs. Usually these changes are additive, but a researcher can submit new data to replace a previous submission. Such changes mostly happen in recent (or active) projects, while older ones remain stable, but this is not guaranteed. So, at this time, we cannot properly support these accessions as DRS IDs, and we do not wish to issue invalid IDs. Run accessions, however, let us come very close to something that mostly works as expected.

Some technical details

Protected dbGaP files are available from the SRA only via signed URLs to cloud storage buckets (or from NCBI as encrypted files). For security reasons, and because of egress charges, this service does not hand out signed URLs directly. When it encounters a signed URL, it returns a URL pointing to itself instead; that URL leads to a transparent proxy that performs the retrieval. As a result, any egress charges accrue to the account running this service, not to the account that owns the bucket.
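The substitution can be illustrated with a small registry that hides each signed URL behind an opaque token. This is a hypothetical sketch, not how ncbi-drs implements it. The `/proxy/{token}` path, the class name, and the signature-parameter heuristic are all assumptions. Using a random token rather than embedding the signed URL in the returned link keeps the signed URL from reaching the client.

```python
import secrets
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# Query parameters that commonly mark a pre-signed cloud storage URL
# (AWS SigV4, Google Cloud Storage V4, and older "Signature"-style URLs).
# An illustrative heuristic, not necessarily what ncbi-drs checks.
SIGNATURE_PARAMS = {"X-Amz-Signature", "X-Goog-Signature", "Signature"}


def is_signed_url(url: str) -> bool:
    """True if the URL's query string carries a known signature parameter."""
    query = parse_qs(urlsplit(url).query)
    return any(name in query for name in SIGNATURE_PARAMS)


class ProxyRegistry:
    """Maps opaque tokens to signed URLs so clients never see the signed URL."""

    def __init__(self, service_base: str, prefix: str = "/proxy") -> None:
        self._base = service_base.rstrip("/") + prefix
        self._targets: Dict[str, str] = {}

    def public_url(self, url: str) -> str:
        """Return the URL to hand out: unsigned URLs pass through, signed ones are wrapped."""
        if not is_signed_url(url):
            return url
        token = secrets.token_urlsafe(16)  # URL-safe, contains no "/"
        self._targets[token] = url
        return f"{self._base}/{token}"

    def resolve(self, token: str) -> str:
        """Look up the signed URL the proxy should fetch (KeyError if unknown)."""
        return self._targets[token]
```

In this design, the proxy endpoint would call `resolve` and stream the bytes back to the client. The download is therefore performed, and billed, by the account running the service. A production version would likely also need entries to expire along with the signed URLs they wrap.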

This Dockerfile builds for Ubuntu's configuration of Apache and is particular about, and sensitive to, that configuration.

This service is implemented in Python and Flask.

Hopefully useful information:

If you are running on Amazon Linux 2, install Docker first (see Docker Basics for Amazon ECS).

Pinning

It is possible to generate a version-pinned Dockerfile with the pinner.py script. For example:

python3 pinner.py > Dockerfile.pinned

Or:

python3 pinner.py | docker build -f - -t ncbi-drs:pinned .

Contributors

aboshkin, durbrow, kwrodarmer, vartanianmh
