HuBMAP Ingest Pipeline

About

This repository implements the internals of the HuBMAP data repository processing pipeline. This code is independent of the UI but works in response to requests from the data-ingest UI backend.

Using the devtest assay type

devtest is a mock assay for use by developers. It provides a testing tool controlled by a simple YAML file, allowing a developer to simulate execution of a full ingest pipeline without the need for real data. To do a devtest run, follow this procedure.

  1. Create an input dataset, for example using the ingest UI.
     • It must have a valid Source ID.
     • Its datatype must be Other -> devtest.
  2. Insert a control file named test.yml into the top-level directory of the dataset. The file format is described below. You may include any other files in the directory, as long as test.yml exists.
  3. Submit the dataset.

Ingest operations will proceed normally from that point:

  1. The state of the original dataset will change from New through Processing to QA.
  2. A secondary dataset will be created, and will move through Processing to QA after an adjustable delay (delay_sec in test.yml, see below).
  3. Files listed under files_to_copy in test.yml will be copied into the dataset directory of the secondary dataset.
  4. All normal metadata will be returned, including any extra metadata specified under metadata_to_return in test.yml (see below).

The format for test.yml is:

{
  # The following line is required for the submission to be properly identified as assay 'devtest'
  collectiontype: devtest,
  
  # The pipeline_exec stage will delay for this many seconds before returning (default 30 seconds)
  delay_sec: 120,
  
  # If this list is present, the listed files will be copied from the submission directory to the derived dataset.
  files_to_copy: ["file_068.bov", "file_068.doubles"],
  
  # If present, the given metadata will be returned as dataset metadata for the derived dataset.
  metadata_to_return: {
    mymessage: 'hello world',
    othermessage: 'and also this'
  }
}
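As a minimal sketch of step 2 of the procedure, the Python below writes a test.yml control file into a dataset directory. The helper name write_devtest_control and its check that the listed files exist are illustrative assumptions, not part of the pipeline; only the key names come from the format above. It emits JSON, which is also YAML flow syntax (the same brace style as the example), on the assumption that the pipeline's YAML loader accepts it.

```python
import json
from pathlib import Path


def write_devtest_control(dataset_dir, delay_sec=30, files_to_copy=None,
                          metadata_to_return=None):
    """Write a devtest control file (test.yml) into dataset_dir.

    Hypothetical helper: only the key names are taken from the README.
    """
    dataset_dir = Path(dataset_dir)
    control = {
        "collectiontype": "devtest",  # required to select the devtest assay
        "delay_sec": int(delay_sec),
    }
    if files_to_copy:
        # Assumption: fail early if a listed file is not in the dataset directory.
        missing = [f for f in files_to_copy if not (dataset_dir / f).is_file()]
        if missing:
            raise FileNotFoundError(f"not in {dataset_dir}: {missing}")
        control["files_to_copy"] = list(files_to_copy)
    if metadata_to_return:
        control["metadata_to_return"] = dict(metadata_to_return)

    path = dataset_dir / "test.yml"
    # JSON output is also YAML flow syntax, so no YAML library is needed.
    path.write_text(json.dumps(control, indent=2) + "\n")
    return path
```

For example, write_devtest_control("/path/to/dataset", delay_sec=120, metadata_to_return={"mymessage": "hello world"}) would produce a control file equivalent to the one shown above, minus the files_to_copy entry; the dataset path here is a placeholder.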

API

API Test

  Description: Test that the API is available
  HTTP Method: GET
  Example URL: /api/hubmap/test
  URL Parameters: None
  Data Parameters: None
  Success Response:
    Code: 200
    Content: {"api_is_alive":true}
  Error Responses: None
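
The liveness check above could be called from Python roughly as follows. This is a sketch using only the standard library; the function name api_is_alive, the base_url argument, and the injectable opener (there so the function can be exercised without a network) are assumptions, and a real deployment may also require authentication headers that are not shown here.

```python
import json
import urllib.request


def api_is_alive(base_url, opener=urllib.request.urlopen, timeout=10):
    """Return True if GET {base_url}/api/hubmap/test reports the API alive."""
    url = base_url.rstrip("/") + "/api/hubmap/test"
    try:
        with opener(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Network failures (URLError is an OSError) and non-JSON bodies
        # are both treated as "not alive".
        return False
    return isinstance(body, dict) and body.get("api_is_alive") is True
```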

Get Process Strings

  Description: Get a list of valid process identifier keys
  HTTP Method: GET
  Example URL: /api/hubmap/get_process_strings
  URL Parameters: None
  Data Parameters: None
  Success Response:
    Code: 200
    Content: {"process_strings":[...list of keys...]}
  Error Responses: None
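
One plausible use of this endpoint is to check a process name on the client side before requesting an ingest. The sketch below parses a response body of the shape shown above; the function names are hypothetical, and exact, case-sensitive matching is an assumption about how the server compares names.

```python
import json


def known_processes(response_text):
    """Extract the set of process keys from a get_process_strings body."""
    payload = json.loads(response_text)
    return set(payload["process_strings"])


def require_known_process(response_text, process):
    """Raise ValueError if process is not among the returned keys."""
    # Assumption: names must match exactly, including case.
    if process not in known_processes(response_text):
        raise ValueError(f"{process} is not a known ingestion process")
    return process
```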

Get Version Information

  Description: Get API version information
  HTTP Method: GET
  Example URL: /api/hubmap/version
  URL Parameters: None
  Data Parameters: None
  Success Response:
    Code: 200
    Content: {"api":API version, "build":build version}
  Error Responses: None

Request Ingest

  Description: Cause a workflow to be applied to a dataset in the LZ. The full dataset path is computed from the data parameters.
  HTTP Method: POST
  Example URL: /api/hubmap/request_ingest
  URL Parameters: None
  Data Parameters:
    provider : one of a known set of providers, e.g. 'Vanderbilt'
    submission_id : unique identifier string for the data submission
    process : one of a known set of process names, e.g. 'MICROSCOPY.IMS.ALL'
  Success Response:
    Code: 200
    Content: {
      "ingest_id":"some_unique_string",
      "run_id":"some_other_unique_string"
    }
  Error Responses:
    Bad Request:
      Code: 400
      Content strings:
        "Must specify provider to request data be ingested"
        "Must specify sample_id to request data be ingested"
        "Must specify process to request data be ingested"
        "NAME is not a known ingestion process"
    Unauthorized:
      Code: 401
      Content strings:
        "You are not authorized to use this resource"
    Not Found:
      Code: 404
      Content strings:
        "Resource not found"
        "Dag id DAG_ID not found"
    Server Error:
      Code: 500
      Content strings:
        "An unexpected problem occurred"
        "The request happened twice?"
        "Attempt to trigger run produced an error: ERROR"
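
A client for the ingest request might look like the sketch below. Several details are assumptions because the description above does not specify them: that the data parameters are sent as a JSON body, that authorization uses a bearer token, and that error responses carry their message as plain text. The names build_ingest_request, parse_ingest_response, and IngestRequestError are hypothetical.

```python
import json
import urllib.request


class IngestRequestError(Exception):
    """Raised for any non-200 response; keeps the status and message."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def build_ingest_request(base_url, provider, submission_id, process, token=None):
    """Build (but do not send) a POST to /api/hubmap/request_ingest."""
    payload = {
        "provider": provider,
        "submission_id": submission_id,
        "process": process,
    }
    headers = {"Content-Type": "application/json"}  # assumed encoding
    if token:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/hubmap/request_ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def parse_ingest_response(status, body_text):
    """Return (ingest_id, run_id) on 200, else raise IngestRequestError."""
    if status == 200:
        body = json.loads(body_text)
        return body["ingest_id"], body["run_id"]
    raise IngestRequestError(status, body_text.strip())
```

When the request is sent with urllib.request.urlopen, 4xx and 5xx responses surface as urllib.error.HTTPError; its code attribute and the text from read() can be passed to parse_ingest_response so that all documented error cases end up as IngestRequestError.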

Contributors

jswelling, sunset666, mruffalo, gesinaphillips, jpuerto-psc, icaoberg, ilan-gold, yuanzhou, keller-mark, austinhartman, gesmira, julianx, frdougal, jpuerto96, derekfurstpitt, chuckkollar, daghis, mccalluc, pennycuda
