o2r Web API

Project description: https://o2r.info

Basics

We're a research project, so everything in this API and its documentation is subject to change. The "working" state should always be in the master branch, which is published online at https://o2r.info/api, and open pull requests reflect features under development.

API docs

This specification follows the Open API 3.0.3 specification. It is written in YAML and deployed automatically using ReDoc. See the ReDoc documentation for details.

View

The docs are built dynamically from openapi.yml when index.html is opened in a browser. You can do this locally by starting a web server that serves the /docs directory:

docker run --rm -it -p 80:80 -v $(pwd)/docs:/usr/share/nginx/html:ro nginx

Then open http://localhost/index.html.

Build

You can render the openapi.yml in this repository with the redoc-cli tool. The output is a zero-dependency static HTML file in your current directory.

#npm i -g redoc-cli

redoc-cli bundle docs/openapi.yml

⚠️ This will not include our style changes!

Our combination of openapi.yml and ReDoc's redoc.standalone.js renders an HTML page, which is then deployed via the /docs folder. Our script redoc_theme.js contains the actual ReDoc initialization command and makes a few style changes through callback functions to match our project. The CSS rules that extend the core ReDoc style are in the openapi_style.css file.

Web pages build

The pages at https://o2r.info/api/ are rendered client side (API docs) or are built locally by developers on relevant changes (load test docs). The website is served from the directory /docs, which must be configured in the repository settings.

Develop locally

You can serve the HTML page (without style changes!) and have it re-rendered automatically on changes with

redoc-cli serve --watch docs/openapi.yml

PDF Generation

Note that a new PDF document is generated for every commit on the master branch. This can quickly lead to many commits, so it is best to develop new features on other branches.

Load testing

This repository contains a collection of R Markdown documents that can be used to evaluate the performance of the o2r reproducibility service. See the directory docs/evaluation for R code and documentation and for running load tests on the API and the user interface. The current load test report can be rendered with make loadtest_buld.

License

The o2r Web API specification is licensed under Creative Commons CC0 1.0 Universal License, see file LICENSE. To the extent possible under law, the people who associated CC0 with this work have waived all copyright and related or neighboring rights to this work. This work is published from: Germany.

api's People

Contributors

fmazin, jankoppe, jansule, lukaslohoff, nuest, o2r-user, sbastiangarzon, tekraft, timmimim

api's Issues

Do not create PDF on forks

Let's not run the action on forks, so we do not create each PDF twice.

For testing, one can run the code from the action manually.

Add HTML output

It would be helpful to also publish the API as an HTML document, ideally as a GH-page.

Attribute Username

The orcid id has already been added to the metadata of a single compendium. It would be nice if also the username could be added. Maybe something like this?

"User": {
    "name": "xy",
    "orcid": "1234"
}

Update ERC creation workflow and clarify in API

Situation

See comment below for alternative solution!

ERC creation roughly has these steps:

  1. submit workspace (upload, or provide pointer)
  2. process upload (create container, extract metadata etc.)
  3. user reviews metadata
  4. user clicks "save" button (and saves the metadata, which starts the brokering)

This process must be communicated clearly to the user, especially in the metadata edit form. The buttons should effectively convey the messages "Finish ERC creation" or "Abort ERC creation", so that it is clear that not doing anything in the metadata review (e.g. closing the browser) will actually not create the ERC.

@7048730 Imho this clarification of the process means that we do not need brokering during the first processing (i.e. in the loader), but only need to do it during metadata update, because the metadata update will always be done.

Also, step 2 might take quite a long time and therefore the upload must support asynchronous communication, which is crucial to integrate this into larger architectures. See also http://farazdagi.com/blog/2014/rest-long-running-jobs/ [outcome of discussion with publisher architect]

Here we have two approaches, both of which we should try out (see also this SO answer!):

  • queue + request/response (polling, easier? for us to implement)
  • callback (preferable for integration in larger architectures)

Tasks

  • API docs reflect the process by providing an intermediate "workspace" resource
    • GET /api/v1/workspace lists all uncompleted workspaces of the current user, or all (including completed ones) if the user has admin level
    • GET /api/v1/workspace/<id> provides workspace information, most importantly status
      • property status
        • processing
        • reviewable > processing is completed, metadata is ready for review
        • completed > makes a redirect to ERC with HTTP status 303 See Other
      • property eta, the estimated time to finish. Default eta is the average of all completed workspaces in the database
      • property cancel - a link where the workspace processing can be stopped (see below)
      • property lastUpdate - the last update time = also completion time
      • property compendium - if (and only if) the compendium is created
    • DELETE /api/v1/workspace/<id> endpoint (only admins and creating user)
      • should have two variants: really delete, or set deleted: true in the compendium object. The API never returns deleted objects, so no need to document this. To retrieve them direct database access is needed.
    • two variants for creating a compendium
      • POST /api/v1/compendium immediately returns a response with status code HTTP 202 and a Location header field pointing to the respective workspace, see also the Location header
      • POST /api/v1/compendium?callback=http://callback.org/endpoint also replies with HTTP 202 and the Location header, but it additionally registers a callback which is called once the workspace processing is completed, see below
  • implementation in loader and muncher updated
  • issue for UI to implement this created
    • loader only creates workspace
    • muncher makes ERC out of workspace (this is what takes long)
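
The polling variant from the task list above could look roughly like this on the client side. This is only a sketch: the `fetch` callable stands in for `GET /api/v1/workspace/<id>`, and the function name, the `interval_s` and `max_attempts` parameters are assumptions for illustration, not part of the API.

```python
import time
from typing import Callable, Dict


def poll_workspace(fetch: Callable[[], Dict], interval_s: float = 1.0,
                   max_attempts: int = 60) -> Dict:
    """Poll a workspace until it leaves the 'processing' state.

    `fetch` is a hypothetical stand-in for GET /api/v1/workspace/<id>
    and must return the parsed JSON document of the workspace.
    """
    for _ in range(max_attempts):
        workspace = fetch()
        if workspace.get("status") in ("reviewable", "completed"):
            return workspace
        time.sleep(interval_s)
    raise TimeoutError("workspace still processing after max_attempts polls")


# Simulated responses instead of real HTTP calls.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "reviewable",
     "compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234"},
])
result = poll_workspace(lambda: next(responses), interval_s=0)
print(result["status"])
```

A real client would probably use the proposed eta property to choose the polling interval instead of a fixed value.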

The callback

If a callback endpoint is provided on creating a new compendium, e.g. POST /api/v1/compendium?callback=http://publisher.com/publication/100/appendix/1, the endpoint is called with the following operation after the workspace processing is completed:

PUT publisher.com/publication/100/appendix/1
# content of GET /api/v1/workspace/<id>
{
  "status": "reviewable",
  "compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234",
  "lastUpdate": "2017..."
}

Should we also notify the endpoint if the status changes from reviewable to completed?
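
On the receiving side, a publisher endpoint might process the PUT body along these lines. This is a sketch under assumptions: the function name `handle_workspace_callback` is hypothetical, and treating `completed` like `reviewable` is just one possible answer to the open question above.

```python
import json
from typing import Optional


def handle_workspace_callback(body: str) -> Optional[str]:
    """Return the compendium URL once the workspace is ready, else None.

    Hypothetical handler for the PUT request sent to the callback endpoint.
    """
    workspace = json.loads(body)
    if workspace.get("status") in ("reviewable", "completed"):
        return workspace.get("compendium")
    return None


# Payload taken from the example above (lastUpdate is elided there, too).
payload = json.dumps({
    "status": "reviewable",
    "compendium": "https://o2r.uni-muenster.de/api/v1/compendium/1234",
    "lastUpdate": "2017...",
})
print(handle_workspace_callback(payload))
```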

Our own UI / websockets

It is currently unclear how we can also provide the notification via websockets... tbc

Search or filter via doi

Ideally an ERC is connected to a single publication, which has a doi.

If that is a goal and often enough the case, we can try to support this in search or filter operations

  • /api/v1/compendium?doi=doi%3A10.1038%2Fnphys1170
  • /api/v1/search?doi=doi%3A10.1038%2Fnphys1170

Should this go into filtering or into search?
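
Whichever endpoint is chosen, the DOI has to be percent-encoded because it contains `:` and `/`. A small sketch of how a client could build such a query with Python's standard library, using `doi:10.1038/nphys1170` as an example value:

```python
from urllib.parse import quote, urlencode

doi = "doi:10.1038/nphys1170"  # example value

# safe="" makes sure the slash is encoded as well.
encoded = quote(doi, safe="")
print(encoded)

# urlencode encodes ':' and '/' in the value, too.
query = urlencode({"doi": doi})
print("/api/v1/compendium?" + query)
```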

Add sub-resources links to compendium

Retrieve the public link for a compendium at ../api/v1/compendium/ABC12/link.

This only concerns editors/admins, so the current listing of all links is probably fine for now.

Add page number for documents

PDF document metadata should also have the number of pages, not just the file size. This would allow the client to request a preview or the whole file, potentially better than the mere file size.

Clarify content types of the API and extend the microservice's tests

Right now we don't check the content types of requests, because the whole API is all JSON. Nevertheless, we should use the correct content types and return errors if wrong content types are used.

This could be part of a "general" section, because it applies to all APIs.
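
A content type check could be as small as the following sketch. The function name is hypothetical, and answering with HTTP 415 (Unsupported Media Type) is an assumption about which error we would return; the issue itself does not fix the status code.

```python
def check_content_type(headers: dict) -> int:
    """Return 200 if the request declares a JSON body, otherwise 415."""
    content_type = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return 200 if media_type == "application/json" else 415


print(check_content_type({"Content-Type": "application/json; charset=utf-8"}))
print(check_content_type({"Content-Type": "text/plain"}))
```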

Add endpoint for supported/available computing environments

api/v1/environment returns

{
  "architecture": [
    "amd64"
  ],
  "os": [
    {
      "name": "linux",
      "version": "5.4.0-48-generic"
    }
  ],
  "container_runtimes": [
    {
      "name": "Docker Engine - Community",
      "api_version": "1.40",
      "version": "19.03.13"
    }
  ],
  "erc": {
    "manifest": {
      "capture_image": "o2rproject/containerit:geospatial-0.6.0.9003",
      "base_image": "rocker/geospatial:3.6.2",
      "memory": 2147483648
    },
    "execution": {
      "memory": 4294967296
    }
  }
}

Implemented in o2r-project/o2r-muncher#123, but not yet documented here. Will catch up with that after switching to the Open API spec.
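
For readers of the response, the memory values are easier to interpret when converted. The sketch below assumes the numbers are bytes, which the example does not state explicitly; the helper name `to_gib` is made up for illustration.

```python
import json

# Excerpt of the example response above.
response = json.loads("""
{
  "erc": {
    "manifest": {"memory": 2147483648},
    "execution": {"memory": 4294967296}
  }
}
""")


def to_gib(value: int) -> float:
    """Convert a byte count to GiB (assumes the API reports bytes)."""
    return value / 1024 ** 3


manifest_gib = to_gib(response["erc"]["manifest"]["memory"])
execution_gib = to_gib(response["erc"]["execution"]["memory"])
print(manifest_gib, execution_gib)
```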

API does not reflect session requirements

The new user authentication with authenticated sessions makes the current X-API-Key header obsolete. Endpoints requiring authenticated sessions should reflect this, which (as of now) is only the POST /api/v1/compendium endpoint. This also needs further implementation in the o2r muncher service.

  • Remove X-API-Key from documentation
  • Implement authenticated session for /api/v1/compendium in o2r muncher - see o2r-project/o2r-muncher#21
  • Add note to API docs that an authenticated session is required for /api/v1/compendium

API endpoint for all publications of one author

After logging in, an author will be redirected to their landing page, where all of their own publications will be listed. Therefore an API endpoint is needed to list all publications of one author (including metadata for each publication).

Document metadata property of compendium

The metadata part of the platform is still very much under development. Instead of putting an intermediate documentation into the API we will keep an updated version of the documentation in this issue, and also discuss it here.

Eventually this should go into a file docs/compendium-metadata.md

Tasks

  • describe each field in o2r metadata <- edit: see schema for full description

Compendium metadata

o2r provides separate metadata (MD) for different purposes, making use of their translatability:

{
   "id":"XyZ19",
   "metadata":{
      "third_party": {},
      "o2r": {
         "license": {}
      },
      "raw": {}
   },
   "created":"2016-12-15T08:22:27.029Z",
   "user":"0000-0002-0024-5046",
   "files":{
   }
}

License metadata

edit: need license in main for mappings!
Licensing information is provided separately for the main parts of a compendium, i.e. data, text, and code. Cases such as different licenses for different files or sub-projects are not covered directly.

License MD can contain free text or a list of licenses.

A license must be provided for each part of the compendium (code, data, text). The license might be identical. The license string is based on Open Licenses Service names.

edit: need to look at repository requirements, cf. trello card
- for code, the list of OSI licenses is recommended

{
   "metadata":{
      "license":{
         "data":"Against-DRM",
         "text":"CC-BY-4.0",
         "code":"AAL"
      }
   }
}
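
The rule that each part needs a license could be checked as in this sketch. The function name and the `REQUIRED_PARTS` constant are illustrative only; checking the license strings against the Open Licenses Service names is left out.

```python
REQUIRED_PARTS = ("data", "text", "code")


def missing_license_parts(metadata: dict) -> list:
    """List the compendium parts that have no license assigned."""
    licenses = metadata.get("license", {})
    return [part for part in REQUIRED_PARTS if not licenses.get(part)]


example = {"license": {"data": "Against-DRM", "text": "CC-BY-4.0", "code": "AAL"}}
print(missing_license_parts(example))
print(missing_license_parts({"license": {"text": "CC-BY-4.0"}}))
```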

o2r metadata

This subset contains a core set of metadata attributes. They are refined from automatic extraction ("raw MD") to comply with the o2r-schema. Within the workflow, the user of the o2r platform reviews the raw MD and provides additions or modifications. The MD broker then translates the corrected raw MD to o2r MD.

{
   "metadata":{
      "o2r":{   "title":"ERC title", ... }
   }
}

edit: we don't need anything beyond this point, e.g. zenodo MD is one subset of the metadata json key.

~~

Shipping metadata

This element contains recipient-specific metadata for shipments. It is derived from core metadata and updated after manual edits. It can directly be used for shipment purposes.

{
   "metadata":{
      "shipping":{
         "zenodo":{

         },
         "orcid":{

         },
         "codemeta":{

         },
         "datacite":{

         }
      }
   }
}

Switch from ISO-8601 to RFC-3339

Instead of ISO 8601, we should refer to (and use) only RFC 3339 (tools.ietf.org/html/rfc3339), which is open/free.

This only requires changing the name of the standard (search project for "8601").
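
Timestamps such as the `created` value used in the compendium examples (`2016-12-15T08:22:27.029Z`) are valid RFC 3339 date-times. A sketch of parsing and writing this shape with Python's standard library follows; the helper names are hypothetical, and the fixed format string only covers UTC timestamps with fractional seconds.

```python
from datetime import datetime, timezone

RFC3339_UTC = "%Y-%m-%dT%H:%M:%S.%fZ"  # only the "...sss Z" shape used here


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp like 2016-12-15T08:22:27.029Z."""
    return datetime.strptime(value, RFC3339_UTC).replace(tzinfo=timezone.utc)


def format_timestamp(moment: datetime) -> str:
    """Format a datetime as UTC with millisecond precision and a Z suffix."""
    text = moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


created = parse_timestamp("2016-12-15T08:22:27.029Z")
print(format_timestamp(created))
```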

Mime type for files

Hello,
referring to the API a file has the attributes path, name and size. Please also add the attribute type containing a string with the file's mime type. Folders should not have this attribute.

Best,
Jan
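
One way the requested attribute could be derived is by guessing from the file name, as in the sketch below. The function name, the `is_folder` flag, and the fallback to `application/octet-stream` for unknown extensions are assumptions, not something the API defines.

```python
import mimetypes


def describe_entry(path: str, name: str, size: int, is_folder: bool = False) -> dict:
    """Build a file listing entry; folders get no 'type' attribute."""
    entry = {"path": path, "name": name, "size": size}
    if not is_folder:
        mime_type, _encoding = mimetypes.guess_type(name)
        # Assumed fallback for extensions the standard library does not know.
        entry["type"] = mime_type or "application/octet-stream"
    return entry


file_entry = describe_entry("/data/paper.pdf", "paper.pdf", 1024)
folder_entry = describe_entry("/data", "data", 0, is_folder=True)
print(file_entry["type"])
print("type" in folder_entry)
```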

Rename "job" resource to "reproduction"

Right now, a user clicks "Run analysis" to execute a job for a compendium, and there is a "Currently running analysis" and a "Last finished analysis". In our original poster (2016) we used "one-click reproduce", and the Web API has a resource /job. One compendium can have multiple jobs, /compendium/<compendium id>/jobs.

UI screenshot

image

Steps of a "job"

image

Suggestion

IMHO it would make our API and tools easier to understand if we use the word "reproduction" instead of "job" in the API, and align the UI with this wording. We should add a short note to the API docs and user interface that this is "computational" or "methods reproduction".

@MarkusKonk @edzer @chriskray @7048730 What do you think?

Improve PDF rendering

Just some ideas for improving the PDF output. Not urgent.

Fix title page (e.g., o2r logo):

image

Add page breaks before new chapters

image

Keep code blocks on one page

image

Reduce font size for code chunks

image

More

  • page numbers
  • page header (spec name, small logo, link)

Add more fine-grained check levels

As of now, the check is binary. This requires the outputs to be pixel-perfect, whereas reality might require a human reviewer to make that call. Therefore we should discuss loosening the binary nature of the check and...

  • consider adding different result levels, better distinguishing
  • consider not requiring images to be pixel perfect - can the checker distinguish between grayscale changes and completely different curves in a graph?
  • consider allowing reviewers to override a result (transparently), maybe with a specific status check_overruled ?

clarify files page

The files page should mention that it is not a separate API function, but a description of a subset of the response for viewing a single job/compendium.

Re-add PDF generation

https://stackoverflow.com/questions/54259816/how-to-generate-a-pdf-or-markup-from-openapi-3-0

This is the old make target for PDF generation of the mkdocs-based site:

pdf: build
	wkhtmltopdf --version;
	# fix protocol relative URLs, see https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2713
	find site/ -type f -name '*.html' | xargs sed -i 's|href="//|href="https://|g'
	find site/ -type f -name '*.html' | xargs sed -i 's|src="//|src="https://|g'
	wkhtmltopdf --margin-top 20mm --no-background --javascript-delay 5000 \
	file://$(shell pwd)/site/index.html \
	file://$(shell pwd)/site/compendium/view/index.html \
	file://$(shell pwd)/site/compendium/candidate/index.html \
	file://$(shell pwd)/site/compendium/files/index.html \
	file://$(shell pwd)/site/compendium/delete/index.html \
	file://$(shell pwd)/site/compendium/upload/index.html \
	file://$(shell pwd)/site/compendium/public_share/index.html \
	file://$(shell pwd)/site/compendium/download/index.html \
	file://$(shell pwd)/site/compendium/metadata/index.html \
	file://$(shell pwd)/site/compendium/substitute/index.html \
	file://$(shell pwd)/site/compendium/link/index.html \
	file://$(shell pwd)/site/job/index.html \
	file://$(shell pwd)/site/search/index.html \
	file://$(shell pwd)/site/shipment/index.html \
	file://$(shell pwd)/site/user/index.html \
	o2r-web-api.pdf

API endpoint for user management

Endpoints

  • api/v1/user > list all orcids
  • api/v1/user/<orcid> > show orcid and name, if logged in as admin also show level

Features

  • list all users
  • change user level (admin only) via PATCH request

Clarification questions and comments

  • in 01-API.md
    • are "groups" equal to "service endpoints", or RESTish "resources"?
  • 02-upload.md
    • URC
      • suggest to just use ../compendium, or do we want to distinguish URC, ERC, PERC? I am against modelling these states in the URLs
      • why not /upload/urc/:id, but content with JSON?
    • Workspace
      • should say something like "archived workspace", and support not just zip, but also tar.gz.
    • External source
      • something generic, e.g. /upload/remote with POST and then we figure out from the provided JSON payload what it is (e.g. git URL)
  • 03-execution.md
    • execute_now is very limited, why not a delay_seconds which is 0 by default? or easier, a job_start_time, which can be in the past (execute now) or in the future (the client has to do the math)
    • FileDescriptor should be marked as a potential later feature
    • why not /compendium/<compendium ID>/jobs?
    • I find the create part in the URL confusing - isn't that what POST and PUT are for?
    • similar for ../run - should this not be POST to /jobs/:id?
    • /jobs/view/:id should simply be GET /jobs/:id
  • 04-ERC.md
    • I am for using "compendium" in the text.
    • /erc/view/:id > no need for /view
  • 05-user.md
    • good this is not started yet, we don't need users for now.
  • General
    • add a version element to the URL, i.e. /v1/upload/... etc.
    • introduce pagination to all responses that can contain more than one resource
