o2r-project / api

Reproducibility service RESTful web API specification and documentation

Home Page: https://o2r.info/api

License: Other

api's Introduction

o2r Web API

Project description: https://o2r.info

Basics

We're a research project, so everything in this API and its documentation is subject to change. The "working" state should always be in the master branch, which is published online at https://o2r.info/api, and open pull requests reflect features under development.

API docs

This specification follows the OpenAPI Specification, version 3.0.3. It is written in YAML and deployed automatically using ReDoc. See the ReDoc documentation for details.

View

The docs are built dynamically from openapi.yml when index.html is opened in a browser. You can do this locally by starting a web server that serves the /docs directory:

docker run --rm -it -p 80:80 -v $(pwd)/docs:/usr/share/nginx/html:ro nginx

Then open http://localhost/index.html.

Build

You can render the openapi.yml in this repository with the redoc-cli tool. The output is a zero-dependency static HTML file in your current directory.

#npm i -g redoc-cli

redoc-cli bundle docs/openapi.yml

⚠️ This will not include our style changes!

Our combination of openapi.yml and ReDoc's redoc.standalone.js renders an HTML page, which is then deployed via the /docs folder. Our script redoc_theme.js contains the actual ReDoc initialization command and makes a few style changes through callback functions to match our project. The CSS rules that extend the core ReDoc style are in the openapi_style.css file.

Web pages build

The pages at https://o2r.info/api/ are rendered client side (API docs) or are built locally by developers on relevant changes (load test docs). The website is served from the directory /docs, which must be configured in the repository settings.

Develop locally

You can serve the HTML page (without style changes!) and automatically re-render it on changes with

redoc-cli serve --watch docs/openapi.yml

PDF Generation

Note that for every commit on the master branch a new PDF document will be generated. This can quickly lead to many commits. So it is best to develop new features on other branches.

Load testing

This repository contains a collection of R Markdown documents that can be used to evaluate the performance of the o2r reproducibility service. See the directory docs/evaluation for R code and documentation on running load tests against the API and the user interface. The current load test report can be rendered with make loadtest_buld.

License

The o2r Web API specification is licensed under Creative Commons CC0 1.0 Universal License, see file LICENSE. To the extent possible under law, the people who associated CC0 with this work have waived all copyright and related or neighboring rights to this work. This work is published from: Germany.

api's People

nuest jansule lukaslohoff rehans516 timmimim tekraft fmazin

api's Issues

Do not create PDF on forks

Let's not run the action on forks, so we do not create each PDF twice.

For testing, one can run the code from the action manually.

Add stability levels to each endpoint

See https://nodejs.org/api/documentation.html#documentation_stability_index

Transform bindings spec to OpenAPI

See #82 (now closed)

@Fmazin please check with @NJaku01 if there is anything missing.

Add HTML output

It would be helpful to also publish the API as an HTML document, ideally as a GH-page.

Attribute Username

The ORCID iD has already been added to the metadata of a single compendium. It would be nice if the username could be added as well. Maybe something like this?

"User": {
    "name": "xy",
    "orcid": "1234"
}

Create OpenAPI / Swagger 2.0 documentation

https://en.wikipedia.org/wiki/OpenAPI_Specification

https://www.openapis.org/

Add body parameter to job creation for disabling all caches when executing a compendium

Could be a build cache for image, a download cache, ... let's disable them all with one flag

cache: false

Update ERC creation workflow and clarify in API

Situation

See comment below for alternative solution!

ERC creation roughly has these steps:

1. submit workspace (upload, or provide pointer)
2. process upload (create container, extract metadata etc.)
3. user reviews metadata
4. user clicks "save" button (and saves the metadata, which starts the brokering)

This process must be communicated clearly to the user, especially in the metadata edit form. The buttons should effectively convey the messages "Finish ERC creation" or "Abort ERC creation", so that it is clear that not doing anything in the metadata review (e.g. closing the browser) will actually not create the ERC.

@7048730 Imho this clarification of the process means that we do not need brokering during the first processing (i.e. in the loader), but only need to do it during metadata update, because the metadata update will always be done.

Also, step 2 might take quite a long time and therefore the upload must support asynchronous communication, which is crucial to integrate this into larger architectures. See also http://farazdagi.com/blog/2014/rest-long-running-jobs/ [outcome of discussion with publisher architect]

Here we have two approaches, both of which we should try out (see also this SO answer!):

queue + request/response (polling, easier? for us to implement)
callback (preferable for integration in larger architectures)

Tasks

API docs reflect the process by providing an intermediate "workspace" resource
- GET /api/v1/workspace lists all uncompleted workspaces of the current user, or all (including completed ones) if the user has admin level
- GET /api/v1/workspace/<id> provides workspace information, most importantly status
  - property status
    - processing
    - reviewable > processing is completed, metadata is ready for review
    - completed > makes a redirect to ERC with HTTP status 303 See Other
  - property eta, the estimated time to finish. Default eta is the average of all completed workspaces in the database
  - property cancel - a link where the workspace processing can be stopped (see below)
  - property lastUpdate - the time of the last update, which is also the completion time once processing has finished
  - property compendium - if (and only if) the compendium is created
- DELETE /api/v1/workspace/<id> endpoint (only admins and creating user)
  - should have two variants: really delete, or set deleted: true in the compendium object. The API never returns deleted objects, so no need to document this. To retrieve them direct database access is needed.
- two variants for creating a compendium
  - POST /api/v1/compendium immediately returns a response with status code HTTP 202 and a Location header field pointing to the respective workspace (see also the notes on the Location header)
  - POST /api/v1/compendium?callback=http://callback.org/endpoint also replies with HTTP 202 and the Location header, but additionally registers a callback which is called once the workspace processing is completed, see below
implementation in loader and muncher updated
issue for UI to implement this created
- loader only creates workspace
- muncher makes ERC out of workspace (this is what takes long)
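
The request/response (polling) variant of this workflow could look roughly like the following client-side sketch. It only assumes the status values listed above (processing, reviewable, completed). The function name wait_for_workspace, the fetch callable standing in for GET /api/v1/workspace/<id>, and the sample responses are hypothetical and not part of the API.

```python
import time
from typing import Callable, Dict


def wait_for_workspace(fetch: Callable[[], Dict],
                       poll_seconds: float = 0.0,
                       max_polls: int = 10) -> Dict:
    """Poll a workspace until its status is no longer 'processing'.

    `fetch` is a hypothetical stand-in for GET /api/v1/workspace/<id>
    and must return the decoded JSON document as a dict.
    """
    for _ in range(max_polls):
        workspace = fetch()
        if workspace["status"] != "processing":
            return workspace
        time.sleep(poll_seconds)
    raise TimeoutError("workspace still processing after max_polls attempts")


# Canned responses instead of real HTTP calls, so the sketch runs offline.
_responses = iter([
    {"status": "processing", "eta": 30},
    {"status": "processing", "eta": 10},
    {"status": "reviewable",
     "compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234"},
])

result = wait_for_workspace(lambda: next(_responses))
print(result["status"])
```

A real client would probably derive poll_seconds from the eta property instead of using a fixed interval.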

The callback

If a callback endpoint is provided on creating a new compendium, e.g. POST /api/v1/compendium?callback=http://publisher.com/publication/100/appendix/1, the endpoint is called with the following operation after the workspace processing is completed:

PUT publisher.com/publication/100/appendix/1
# content of GET /api/v1/workspace/<id>
{
  "status": "reviewable",
  "compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234",
  "lastUpdate": "2017..."
}
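
As a rough illustration of the callback, the service side could build the PUT request as sketched below. This uses only the Python standard library and does not send anything. The helper name build_callback_request and the JSON content type are assumptions, and the elided lastUpdate value is left out.

```python
import json
import urllib.request


def build_callback_request(callback_url: str, workspace: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that notifies a callback endpoint."""
    body = json.dumps(workspace).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


callback_request = build_callback_request(
    "http://publisher.com/publication/100/appendix/1",
    {
        "status": "reviewable",
        "compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234",
    },
)
print(callback_request.get_method(), callback_request.full_url)
```

Sending the request would be a call to urllib.request.urlopen(callback_request), with whatever retry policy the service decides on.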

Should we also notify the endpoint if the status changes from reviewable to completed?

Our own UI / websockets

It is currently unclear how we can also provide the notification via websockets... tbc

Search or filter via doi

Ideally an ERC is connected to a single publication, which has a DOI.

If that is a goal and often enough the case, we can try to support this in search or filter operations

/api/v1/compendium?doi=doi%3A10.1038%2Fnphys1170
/api/v1/search?doi=doi%3A10.1038%2Fnphys1170
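
A client has to percent-encode the DOI before putting it into the query string. The sketch below shows one way to do that with the Python standard library. The helper name doi_filter_url is hypothetical, the doi parameter is still only a proposal in this issue, and 10.1038/nphys1170 is used as an assumed example value.

```python
from urllib.parse import urlencode


def doi_filter_url(path: str, doi: str) -> str:
    """Append a percent-encoded `doi` query parameter to an API path."""
    return f"{path}?{urlencode({'doi': 'doi:' + doi})}"


filter_url = doi_filter_url("/api/v1/compendium", "10.1038/nphys1170")
print(filter_url)  # /api/v1/compendium?doi=doi%3A10.1038%2Fnphys1170
```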

Should this go into filtering or into search?

Check if timestamps are correctly documented

See branch https://github.com/o2r-project/api/tree/timestamps (last two commits).

If everything is in the OpenAPI spec, then tell @nuest the branch timestamps can be deleted.

RDA Collections WG Specification

How does the RDA Collections WG relate to the "collections" our API provides?

https://github.com/RDACollectionsWG/specification
https://rdacollectionswg.github.io/apidocs/#/

Use different ORCID for examples

https://orcid.org/0000-0002-1825-0097

Add entry point

See https://restful-api-design.readthedocs.org/en/latest/urls.html#entry-point

Should contain version, list of resources etc.

Add sub-resources links to compendium

Retrieve the public link for a compendium at ../api/v1/compendium/ABC12/link.

This only concerns editors/admins, so the current listing of all links is probably fine for now.

reconsider license

I am not sure Apache is a license that works for a specification, which essentially is a document, not software/code.

I'd suggest to switch to http://creativecommons.org/licenses/by/4.0/ but am open for opinions and links to useful resources on this matter.

(Note to self: OpenSearch uses a CC license: http://www.opensearch.org/Specifications/License)

Add page number for documents

PDF document metadata should also have the number of pages, not just the file size. This would allow the client to request a preview or the whole file, potentially better than the mere file size.

Clarify content types of the API and extend the microservice's tests

Right now we don't check the content types of requests, because the whole API is all JSON. Nevertheless, we should use the correct content types and return errors if wrong content types are used.

This could be part of a "general" section, because it applies to all APIs.
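
A content type check along these lines might be shared by all microservices. This is only a sketch: the function name and the shape of the error body are assumptions, while 415 (Unsupported Media Type) is the standard HTTP status for this case.

```python
from typing import Optional, Tuple


def check_json_content_type(content_type_header: Optional[str]) -> Tuple[int, dict]:
    """Return (status, body) for a request based on its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type_header or "").split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return 200, {}
    # Hypothetical error document; the real error format may differ.
    return 415, {"error": "unsupported content type, expected application/json"}


print(check_json_content_type("application/json; charset=utf-8"))
print(check_json_content_type("text/plain"))
```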

Add endpoint for supported/available computing environments

api/v1/environment returns

{
  "architecture": [
    "amd64"
  ],
  "os": [
    {
      "name": "linux",
      "version": "5.4.0-48-generic"
    }
  ],
  "container_runtimes": [
    {
      "name": "Docker Engine - Community",
      "api_version": "1.40",
      "version": "19.03.13"
    }
  ],
  "erc": {
    "manifest": {
      "capture_image": "o2rproject/containerit:geospatial-0.6.0.9003",
      "base_image": "rocker/geospatial:3.6.2",
      "memory": 2147483648
    },
    "execution": {
      "memory": 4294967296
    }
  }
}
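
The memory values in this response appear to be byte counts (an assumption based on their magnitude). A client could convert them for display as in the following sketch, which embeds only the erc part of the response above; the helper name bytes_to_gib is hypothetical.

```python
import json

# Subset of the example response above.
environment = json.loads("""
{
  "erc": {
    "manifest": {"memory": 2147483648},
    "execution": {"memory": 4294967296}
  }
}
""")


def bytes_to_gib(value: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return value / 2**30


manifest_gib = bytes_to_gib(environment["erc"]["manifest"]["memory"])
execution_gib = bytes_to_gib(environment["erc"]["execution"]["memory"])
print(manifest_gib, execution_gib)  # 2.0 4.0
```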

Implemented in o2r-project/o2r-muncher#123, but not yet documented here. Will catch up with that after switching to the OpenAPI spec.

API does not reflect session requirements

User authentication is now implemented with authenticated sessions, which makes the current X-API-Key header obsolete. Endpoints requiring authenticated sessions should reflect this; as of now, that is only the POST /api/v1/compendium endpoint. This also needs further implementation in the o2r muncher service.

Remove X-API-Key from documentation
Implement authenticated session for /api/v1/compendium in o2r muncher - see o2r-project/o2r-muncher#21
Add note to API docs that an authenticated session is required for /api/v1/compendium

API endpoint for all publications of one author

After logging in, an author will be redirected to their landing page, where all of their own publications will be listed. Therefore an API endpoint is needed that lists all publications of one author (including the metadata for each publication).

Explicitly mention POST MIME type/content type

See https://stackoverflow.com/a/4073451/261210 for background.

In the examples for direct upload, we use curl -F, which implies the content type multipart/form-data:

-F, --form CONTENT  Specify HTTP multipart POST data (H)

Clarify which operations require which certain user level

@MarkusKonk you were asking recently what levels we have here: o2r-project/o2r-platform#73. The API docs would be a good place to keep the levels in one place across all microservices.

create a user level page collecting all different levels from the microservices
add formatted notes to all requests that they do require a certain user level, and link to the level page

Document metadata property of compendium

The metadata part of the platform is still very much under development. Instead of putting an intermediate documentation into the API we will keep an updated version of the documentation in this issue, and also discuss it here.

Eventually this should go into a file docs/compendium-metadata.md

Tasks

describe each field in o2r metadata <- edit: see schema for full description

Compendium metadata

o2r provides separate metadata (MD) for different purposes, making use of their translatability:

{
   "id":"XyZ19",
   "metadata":{
      "third_party":{},
      "o2r":{
         "license":{}
      },
      "raw":{}
   },
   "created":"2016-12-15T08:22:27.029Z",
   "user":"0000-0002-0024-5046",
   "files":{}
}

License metadata

edit: need license in main for mappings!
Licensing information is provided separately for the main parts of a compendium, i.e. data, text, and code. Cases such as different licenses for different files or sub-projects are not covered directly.

License MD can contain free text or a list of licenses.

A license must be provided for each part of the compendium (code, data, text). The license might be identical. The license string is based on Open Licenses Service names.

edit: need to look at repository requirements, cf. trello card
- ~~for code, the list of OSI licenses is recommended~~
  - ~~an alternative list is SPDX~~
- ~~for text, the list of Open Definition licenses is recommended~~
- ~~for data, the list of CKAN licenses is recommended~~

{
   "metadata":{
      "license":{
         "data":"Against-DRM",
         "text":"CC-BY-4.0",
         "code":"AAL"
      }
   }
}
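
The rule that every part of a compendium needs a license could be checked as in this minimal sketch. It assumes the structure shown above (a license object with the keys data, text and code); the function name missing_license_parts is hypothetical, and the sketch does not verify that a value is a known Open Licenses Service name.

```python
REQUIRED_PARTS = ("data", "text", "code")


def missing_license_parts(metadata: dict) -> list:
    """Return the compendium parts that have no (or an empty) license entry."""
    licenses = metadata.get("license", {})
    return [part for part in REQUIRED_PARTS if not licenses.get(part)]


complete = {"license": {"data": "Against-DRM", "text": "CC-BY-4.0", "code": "AAL"}}
incomplete = {"license": {"text": "CC-BY-4.0"}}

print(missing_license_parts(complete))    # []
print(missing_license_parts(incomplete))  # ['data', 'code']
```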

o2r metadata

This subset contains a core set of metadata attributes. They are refined from automatic extraction ("raw MD") to comply with the o2r-schema. Within the workflow, the user of the o2r platform is to review the raw MD and provide additions or modifications. The MD broker will translate the corrected raw MD to o2r MD.

{
   "metadata":{
      "o2r":{   "title":"ERC title", ... }
   }
}

edit: we don't need anything beyond this point, e.g. zenodo MD is one subset of the metadata json key.

Shipping metadata

~~This element contains recipient-specific metadata for shipments. It is derived from core metadata and updated after manual edits. It can directly be used for shipment purposes.~~

{
   "metadata":{
      "shipping":{
         "zenodo":{

         },
         "orcid":{

         },
         "codemeta":{

         },
         "datacite":{

         }
      }
   }
}

Switch from ISO-8601 to RFC-3339

Instead of ISO 8601, we should refer to (and use) only RFC 3339 (tools.ietf.org/html/rfc3339), which is open/free.

This only requires changing the name of the standard (search project for "8601").
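
For reference, timestamps like the created value used elsewhere in these docs (2016-12-15T08:22:27.029Z) are valid under RFC 3339. The sketch below parses and re-creates that form with the Python standard library. The function names are hypothetical, and the parser deliberately handles only UTC timestamps with fractional seconds and a trailing Z, not the full RFC 3339 grammar.

```python
from datetime import datetime, timezone


def parse_rfc3339_utc(value: str) -> datetime:
    """Parse a UTC timestamp of the form YYYY-MM-DDTHH:MM:SS.fffZ."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def format_rfc3339_utc(moment: datetime) -> str:
    """Format an aware datetime as UTC with millisecond precision and a Z suffix."""
    text = moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


created = parse_rfc3339_utc("2016-12-15T08:22:27.029Z")
print(format_rfc3339_utc(created))  # 2016-12-15T08:22:27.029Z
```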

Mime type for files

Hello,
According to the API, a file has the attributes path, name and size. Please also add the attribute type, containing a string with the file's MIME type. Folders should not have this attribute.

Best,
Jan

Rename "job" resource to "reproduction"

Right now, a user clicks "Run analysis" to execute a job for a compendium, and there is a "Currently running analysis" and a "Last finished analysis". In our original poster (2016) we used "one-click reproduce", and the Web API has a resource /job. One compendium can have multiple jobs, /compendium/<compendium id>/jobs.

UI screenshot

Steps of a "job"

Suggestion

IMHO it would make our API and tools easier to understand if we use the word "reproduction" instead of "job" in the API, and align the UI with this wording. We should add a short note to the API docs and user interface that this is "computational" or "methods reproduction".

@MarkusKonk @edzer @chriskray @7048730 What do you think?

URL template

Does it make sense to use http://tools.ietf.org/html/rfc6570 for describing the URLs?
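
To give an idea of what this would mean in practice, here is a minimal sketch of RFC 6570 Level 1 expansion (simple {var} substitution with percent-encoding). The function name expand_simple and the template string are illustrative assumptions. Unlike the RFC, which expands undefined variables to an empty string, this sketch raises a KeyError for them.

```python
import re
from urllib.parse import quote


def expand_simple(template: str, variables: dict) -> str:
    """Expand {name} expressions, percent-encoding everything but unreserved characters."""
    def substitute(match: re.Match) -> str:
        return quote(str(variables[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


job_list_url = expand_simple("/api/v1/compendium/{id}/jobs", {"id": "XyZ19"})
print(job_list_url)  # /api/v1/compendium/XyZ19/jobs
```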

Add API docs for bindings

@MarkusKonk Can you take an hour and update the API docs for the current state of the bindings, please?

Improve PDF rendering

Just some ideas for improving the PDF output. Not urgent.

Fix title page (e.g., o2r logo):

Add page breaks before new chapters

Keep code blocks on one page

Reduce font size for code chunks

page numbers
page header (spec name, small logo, link)

Add documentation about job live update API with WebSockets

See https://github.com/o2r-project/o2r-informer/blob/master/README.md

Add more fine-grained check levels

As of now, the check is binary. This requires the outputs to be pixel-perfect, whereas reality might require a human reviewer to make that call. Therefore we should discuss to loosen the binary nature of the check and...

consider adding different result levels, better distinguishing
consider not requiring images to be pixel perfect - can the checker distinguish between grayscale changes and completely different curves in a graph?
consider allowing reviewers to override a result (transparently), maybe with a specific status check_overruled ?

Add documentation for error responses when download fails in transportar

When there is no job, and consequently no image tarball that can be downloaded, the transportar returns an HTTP 500 with a JSON error message in the body. This is not documented in the API, so please add it.

Add option to delete jobs

Useful because jobs can be run even before publication, cf. o2r-project/o2r-UI#151

file listings need MIME type for files

return MIME type in attribute type for each file
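
One way a service could fill the type attribute is to guess the MIME type from the file name, as in this sketch based on Python's mimetypes module. The helper name describe_file and the fallback application/octet-stream for unknown extensions are assumptions; directories get no type attribute, as requested.

```python
import mimetypes


def describe_file(path: str, size: int, is_directory: bool = False) -> dict:
    """Build a file listing entry with path, name, size and (for files only) type."""
    entry = {"path": path, "name": path.rsplit("/", 1)[-1], "size": size}
    if not is_directory:
        mime_type, _encoding = mimetypes.guess_type(path)
        # Assumed fallback when the extension is unknown.
        entry["type"] = mime_type or "application/octet-stream"
    return entry


print(describe_file("data/notes.txt", 12))
print(describe_file("data", 0, is_directory=True))
```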

Document Security Scheme in OpenAPI

https://swagger.io/specification/#security-scheme-object

We use an apiKey that is stored as a cookie parameter.

It could even work to reference the OAuth2 endpoint or docs from ORCID?

@Fmazin Is there really no way to get the content under ## User authentication into the "Authentication" headline right after "About" ?

Support direct file upload for substitution

See #60

A draft to support direct file upload during substitution, i.e. a user selects a file from the browser instead of from the overlay ERC.

More external sources for submissions

As proposed by @nuest:

External source
- something generic, e.g. /upload/remote with POST and then we figure out from the provided JSON payload what it is (e.g. git URL)

clarify files page

The files page should mention that it is not a separate API function, but a description of a subset of the response when viewing a single job or compendium.

Add option to save without publish, add action for publishing

Change the process so that users can Save an ERC without it being published, see https://o2r.info/api/compendium/candidate/#metadata-review-and-saving

Add API feature to publish ERC.

Implement JSON API or add HAL documents or change to HAPI

HAL standardizes links between API responses, see http://stateless.co/hal_specification.html

Re-add PDF generation

https://stackoverflow.com/questions/54259816/how-to-generate-a-pdf-or-markup-from-openapi-3-0

This is the old make target for PDF generation of the mkdocs-based site:

pdf: build
	wkhtmltopdf --version;
	# fix protocol relative URLs, see https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2713
	find site/ -type f -name '*.html' | xargs sed -i 's|href="//|href="https://|g'
	find site/ -type f -name '*.html' | xargs sed -i 's|src="//|src="https://|g'
	wkhtmltopdf --margin-top 20mm --no-background --javascript-delay 5000 \
	file://$(shell pwd)/site/index.html \
	file://$(shell pwd)/site/compendium/view/index.html \
	file://$(shell pwd)/site/compendium/candidate/index.html \
	file://$(shell pwd)/site/compendium/files/index.html \
	file://$(shell pwd)/site/compendium/delete/index.html \
	file://$(shell pwd)/site/compendium/upload/index.html \
	file://$(shell pwd)/site/compendium/public_share/index.html \
	file://$(shell pwd)/site/compendium/download/index.html \
	file://$(shell pwd)/site/compendium/metadata/index.html \
	file://$(shell pwd)/site/compendium/substitute/index.html \
	file://$(shell pwd)/site/compendium/link/index.html \
	file://$(shell pwd)/site/job/index.html \
	file://$(shell pwd)/site/search/index.html \
	file://$(shell pwd)/site/shipment/index.html \
	file://$(shell pwd)/site/user/index.html \
	o2r-web-api.pdf

Evaluate other APIs with "jobs"

Identify some APIs that do similar things (Amazon compute, Travis, Drone) and see which useful patterns can be reused.

Rename cookie or replace with Authorization: Bearer header

See o2r-project/o2r-bouncer#14

Important: must update all microservices for this!

Therefore it might be preferable to switch to a non-cookie based way to provide the token, for example with an Authorization: Bearer header.
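
From a client's point of view, the non-cookie variant could look like the sketch below. Note that the standard HTTP header for bearer tokens is named Authorization. The function name authorized_request and the token value are placeholders, and the request is only constructed, not sent.

```python
import urllib.request


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build a request that carries the token in an Authorization: Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


request = authorized_request(
    "https://o2r.uni-muenster.de/api/v1/compendium",
    "example-token",  # placeholder, not a real token
)
print(request.get_header("Authorization"))  # Bearer example-token
```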

API endpoint for search

Where [FileDescriptor] allows overriding files from the ERC with files from a different execution Job or a different ERC.
[what? This functionality is new to me.]
[jk: O3,4, User Stories 53-55]
[MK: OK, if I take the user stories into account, this is clearer, but then you would need to be more specific here.]

API endpoint for user management

Endpoints

api/v1/user > list all orcids
api/v1/user/<orcid> > show orcid and name, if logged in as admin also show level

Features

list all users
change user level (admin only) via PATCH request

Clarification questions and comments

in 01-API.md
- are "groups" equal to "service endpoints", or RESTish "resources"?
02-upload.md
- URC
  - suggest to just use ../compendium, or do we want to distinguish URC, ERC, PERC? I am against modelling these states in the URLs
  - why not /upload/urc/:id, but content with JSON?
- Workspace
  - should say something like "archived workspace", and support not just zip, but also tar.gz.
- External source
  - something generic, e.g. /upload/remote with POST and then we figure out from the provided JSON payload what it is (e.g. git URL)
03-execution.md
- execute_now is very limited, why not a delay_seconds which is 0 by default? or easier, a job_start_time, which can be in the past (execute now) or in the future (the client has to do the math)
- FileDescriptor should be marked as a potential later feature
- why not /compendium/<compendium ID>/jobs?
- I find the create part in the URL confusing - isn't that what POST and PUT are for?
- similar for ../run - should this not be POST to /jobs/:id?
- /jobs/view/:id should simply be GET /jobs/:id
04-ERC.md
- I am for using "compendium" in the text.
- /erc/view/:id > no need for /view
05-user.md
- good this is not started yet, we don't need users for now.
General
- add a version element to the URL, i.e. /v1/upload/... etc.
- introduce pagination to all responses that can contain more than one resource
