
ontohub-backend's Introduction


ontohub-backend

The main Ontohub service that serves the data for the frontend and other clients via the GraphQL API.

Run the backend in development mode

You need to have the postgres and rabbitmq services started. This guide assumes that you have the hets-agent checked out at ../hets-agent relative to the ontohub-backend repository.

With invoker

We use the invoker process manager during development to run all the needed processes with one command. You can start it with either bundle exec rails invoker or bundle exec invoker start invoker.ini (the former only calls the latter, but consumes more memory).

Manually

Alternatively, you can start the processes yourself by running each of these commands in a separate terminal:

  • bundle exec rails sneakers:run to run the sneakers workers
  • bundle exec rails server to run the HTTP server of the backend in development mode
  • pushd ../hets-agent && bin/hets_agent; popd to run the HetsAgent

Access the backend

The backend is then reachable from the browser at http://localhost:3000. The interactive GraphiQL console can be accessed at http://localhost:3000/graphiql.
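With the server running, you can also query the GraphQL API programmatically. The sketch below builds such a request in Ruby; the /graphql endpoint path and the `version` field are assumptions for illustration — explore the real schema in GraphiQL.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not yet send) a GraphQL POST request against the
# development server. Endpoint path and query field are assumptions.
def graphql_request(query, variables = {})
  uri = URI('http://localhost:3000/graphql')
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(query: query, variables: variables)
  [uri, request]
end

uri, request = graphql_request('{ version }')
# With the server running, send it with:
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }.body
```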

Dependencies

The backend is implemented in Ruby on Rails. First, install the Ruby version referenced in the file .ruby-version, as well as the bundler gem and git. Running bundle install in the directory of this repository will then install all dependencies of the backend.

Set up a development environment

In order to set up a complete environment, please refer to the wiki page Setting up the development environment.

Build the REST API documentation

We maintain API documentation with a JSON schema description. The schemas are located at spec/support/api/schemas. You can build an HTML representation of them with doca. This requires the tools npm and yarn to be installed on your system and available in your PATH.

First, you need to install doca with npm. We created a Rake task for this:

rails apidoc:prepare

Next, you need to create the documentation server files:

rails apidoc:init

This initialization must be run whenever new schema files are created.

And finally, you can run the API documentation server (the default port is 3002):

rails apidoc:run
# or to change the port:
PORT=8001 rails apidoc:run

Then, visit http://localhost:3002 to see the REST API documentation. This server watches the JSON schema files for changes and updates the documentation accordingly.

ontohub-backend's People

Contributors

dependabot[bot], derprofessor, ebolloff, eugenk, phyrog, stickler-ci, tillmo


ontohub-backend's Issues

Version check fails on Travis

The current version check introduced in #47 uses git describe --long --tags to get the latest git tag. Since Travis only clones the last 50 commits and we now have more than 50 commits on some new branches, the tag is not cloned anymore and thus the version check fails.

The main problem is that the command is run at startup time and not during the tests (during which the code is stubbed). So basically, if we're in test mode, we don't want to run the command at all during startup.

Go through the security checklist

See https://github.com/brunofacca/zen-rails-security-checklist.
Some of this is only applicable to apps with HTML views, but some of it is very important for the API side as well.

Update (16 Nov 2017)

I went through the checklist and these are the points which we need to pay a little bit more attention to:

Authentication

  • Expire the session at log out and expire old sessions at every successful login. Mitigates CSRF, session hijacking and session fixation attacks by reducing their time-frame.

    Backend: We would need to have a column session_key for a User that is rewritten on every sign-in. A sign-out would delete the session_key. This requires a signOutMutation. Authentication only happens if the correct session_key is inside the JWT.

    Frontend: Sign-out not only deletes the JWT in the browser, but also invalidates the session_key

  • Expire sessions after a period of inactivity (e.g., 30 minutes). Mitigates CSRF, session hijacking and session fixation attacks by reducing their time-frame.

    Backend: We do have an expiration time in the JWT but, currently, we don't check it during authentication.

  • Consider using two-factor authentication (2FA) as provided by Authy. Provides a highly effective extra layer of authentication security.
    Devise: see the devise-two-factor and authy-devise gems.
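The two checks discussed above (a session_key that is rotated on sign-in/sign-out, plus an enforced exp claim) can be sketched with a minimal stdlib-only HS256 JWT. This is an illustration, not the backend's actual implementation; the claim names and the secret handling are assumptions.

```ruby
require 'base64'
require 'json'
require 'openssl'

SECRET = 'development-only-secret' # assumption: real key is persisted elsewhere

def b64(data)
  Base64.urlsafe_encode64(data, padding: false)
end

def issue_token(payload)
  header = b64(JSON.generate(typ: 'JWT', alg: 'HS256'))
  body = b64(JSON.generate(payload))
  signature = b64(OpenSSL::HMAC.digest('SHA256', SECRET, "#{header}.#{body}"))
  "#{header}.#{body}.#{signature}"
end

# Authentication succeeds only if the signature is valid, the token has
# not expired, and the embedded session_key matches the one stored on the
# User row (which sign-out deletes and sign-in rewrites).
def authenticated?(token, stored_session_key, now: Time.now.to_i)
  header, body, signature = token.split('.')
  expected = b64(OpenSSL::HMAC.digest('SHA256', SECRET, "#{header}.#{body}"))
  return false unless OpenSSL.secure_compare(signature, expected)
  payload = JSON.parse(Base64.urlsafe_decode64(body))
  payload['exp'].to_i > now && payload['session_key'] == stored_session_key
end

token = issue_token(session_key: 'abc123', exp: Time.now.to_i + 3600)
authenticated?(token, 'abc123')  # => true
authenticated?(token, 'revoked') # => false (session_key rotated at sign-out)
```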

HTTP & TLS

Security-related headers

Security Tools

  • Run Brakeman before each deploy. If using an automated code review tool like Code Climate, enable the Brakeman engine.

  • Consider using a continuous security service such as Detectify.

    Detectify automatically scans the application for several hundred security vulnerabilities. It even has a free plan for nonprofit organisations (but only one domain/subdomain). They need to be contacted by mail in this case.

  • Consider using a Web Application Firewall (WAF) such as
    NAXSI for Nginx,
    ModSecurity for Apache and Nginx.
    Mitigates XSS, SQL Injection, DoS, and many other attacks.

CSRF protection

We should implement some sort of Cross-Site Request Forgery protection. While Rails offers the protect_from_forgery method on controllers, combining this with an independent frontend is a bit trickier (since the forms are not rendered by Rails, it cannot inject the CSRF token).

http://www.blaketidwell.com/2015/01/27/ember-rails-csrf-handling.html describes how we could implement this with Ember and Rails. https://nvisium.com/blog/2014/09/10/understanding-protectfromforgery/ describes a bit more how CSRF protection works in Rails itself.

Git: Conflict Resolution

Resolve conflicts when committing. Currently, committing is prohibited if the branch has changed since checking out a file (see the previous_head_sha variable all over lib/git/committing.rb).

To resolve minor conflicts:

  1. Create a new branch from the previous_head_sha
  2. Commit to the branch
  3. Merge the branch back to the original branch
  4. Delete the created branch

If the merging fails, respond with a merge conflict, but delete the created branch anyway.
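The four steps could be sketched as follows. This is pseudocode: the `git` wrapper and its method names are assumptions, not the real Bringit/gitlab_git API.

```ruby
# Pseudocode sketch of the conflict-resolution flow described above.
def commit_with_conflict_resolution(git, branch, previous_head_sha, commit_info)
  tmp = "conflict-resolution-#{SecureRandom.hex(8)}"
  git.create_branch(tmp, previous_head_sha) # 1. branch off the old HEAD
  git.commit(tmp, commit_info)              # 2. commit to the new branch
  merge = git.merge(branch, tmp)            # 3. merge it back
  raise MergeConflictError if merge.conflict? # respond with a merge conflict
ensure
  git.delete_branch(tmp)                    # 4. delete the branch either way
end
```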

draw diagram with components and tools

We use so many tools (e.g. mirage, ember_json_schema, gitlab_git, ...) that an overview diagram would be very useful. Standard gems can be omitted; list only the most important ones.

Add Settings for Host

Currently, url_for always returns http://localhost:3000/.... The host must instead come from a configurable setting.

Outsource the git layer to gitlab_git

The git layer may be interesting for other people as well. Since gitlab does not maintain gitlab_git any more, we should move the library functionality of our git layer to our fork of the gitlab_git gem.

Persist JWT key

Persist the JWT key such that it is still used if the server is restarted.

Remove the JWT key secrets from the config/secrets.yml because they are not used.

Login with JWT and Devise

We need

  • SessionsController (Actions: create) with tests
  • Authentication strategy warden/devise (See Tutorial)
  • Token with UserID, Expiration

Ontohub services

  • Translate formula along signature morphism (hets)
  • Flatten a theory (hets)
  • Compute colimit of a network (hets)
  • Compute normal form of a structured OMS/graph (hets)
  • Syntax highlighting, autocomplete, resolve origin of symbol, correction, includes (hets)
  • Show all open proof obligations in OMS graph (hets)
  • OMS graph transformation (hets; proof rule, apply comorphism)
  • Show and inspect refinement tree (hets)
  • Parse OMS (hets)
  • Prove proof obligation (hets)
  • Parse string in context (hets)
  • QMT (hets; parse, check, simplify, infer type, present, map, filter, sparql-like)

  • List all unproven theorems of a flat theory
  • List all indirect imports of a theory (recursive database)
  • GET URI
  • POST content
  • GET SVG (multiple graphs)
  • Search
  • Admin commands

  • Subterm selection (client side)
  • Folding (client side)
  • Visibility changes (client side; inferred types, redundant brackets, implicit arguments)
  • Tooltip (client side)

Git: Mutex git actions

We use two different systems to write to git repositories: git-shell and gitlab_git. The write actions need to happen in a critical section, exclusively for one process. Use an appropriate mutex mechanism to ensure that only one process has write access at a time. Lock files may be a good option.

Add API documentation

Although there are tests and JSON schema definitions, we could use human-readable API documentation. I suggest creating a doc/api directory and putting the API documentation there.

Authorisation

Add a notion of an admin and set up authorisation. Search for the best library for the job. In legacy-Ontohub, we wanted to switch to pundit because policies are just Ruby objects and they can be used more easily in the git-shell, which we want to be lightweight.

Solve devise/migrate issue

rails db:drop
rails db:create
rails db:migrate

Fails because of the devise_for in the routes. Fix it.

Roadmap: The next few steps

The next few big steps of the overall implementation should be



These blocks contain tasks that can be done more or less in parallel.

Create a controller for Namespace

Only for read actions. Namespaces should be created/updated/deleted in the console only at first. Later on, namespaces should be closely tied to the OrganizationalUnit (whose children will get controllers).

Create API for commits and files

Needed routes:

/repositories/:id/commits # List of commits in that repository
/repositories/:id/commits/:hash # One specific commit
/repositories/:id/commits/:hash/files # List of files at one specific commit
/repositories/:id/commits/:hash/files/:file_path # One specific file at one specific commit

The repository should include a relationship to the list of commits. In the list of commits, each commit should have a relationship to the list of files, and each file in this list should link to the file URL.
The repository should also include some information about the branches of the repository, especially which one is the default branch (needs to be implemented in the models first, see ontohub/ontohub-models#52).
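In Rails routing terms, the four routes above might be declared like this config sketch (read-only; the nested-resource parameter names and the path constraint are assumptions):

```ruby
# config/routes.rb sketch for the commits/files routes above.
resources :repositories, only: [] do
  resources :commits, only: [:index, :show], param: :hash do
    resources :files, only: [:index, :show],
              param: :file_path,
              constraints: { file_path: %r{[^\0]+} } # allow slashes in paths
  end
end
```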

central features needed for going productive

Here is a list of central features that are needed for going productive (i.e. replacing the productive Ontohub 1.0). The list should include the necessary things but should be kept as short as possible so that we can go online as quickly as possible:

  • manage repositories
  • manage ontologies and their sentences and symbols, and errors
  • theorem proving
    Needed only later:
  • displays of graphs
  • mappings
  • search
  • metadata
  • list of logics

Git: Refs and Tags

Implement the following refs/tags functionality:

  • List refs (struck out: see below)
    /:user/:repository/refs   GET index
    
  • List tags
    /:user/:repository/tags   GET index
    
  • Create a tag
    /:user/:repository/tags   POST create
    
  • Show a tag
    /:user/:repository/tags/:tag   GET show (with additional information like release notes)
    
  • Delete a tag
    /:user/:repository/tags/:tag   DELETE destroy
    

These functions are already implemented in GraphQL by #165.

The struck out bullets are not needed because we only want to have read actions in the REST API.

Also, I vote to drop support for the refs route because it is only the union of tags and branches. If it is really needed to show both on a web page, the frontend should make two queries and put them together itself.

Git: Branches

Implement the following branching functionality

  • List branches
    /:user/:repository/branches   GET index
    
  • Show branch
    /:user/:repository/branches/:branch   GET show (identical with /:user/:repository/ref/:branch/commits)
    
  • Create a branch
    /:user/:repository/branches   POST create
    
  • Delete a branch
    /:user/:repository/branches/:branch   DELETE destroy
    
  • Get the default branch
    /:user/:repository/branches/default   GET show
    
  • Set the default branch
    /:user/:repository/branches/default   PATCH update
    

These functions are already implemented in GraphQL by #165.

The struck out bullets are not needed because we only want to have read actions in the REST API.

Git: Cloning

When creating a repository, allow cloning a remote git/svn repository, either as a fork that is writable or as a mirror that is synchronised periodically and write-protected. When creating such a mirror/fork, allow specifying a list of UrlMappings.

This depends on ontohub/ontohub-models#114 and #250

Use GitHelper.exclusively(repository) { repository.git.pull } to wrap the synchronisation process in a mutex. See #85 and #304.

Implementation Hints

Cloning

  • Create a GraphQL mutation
    mutation ($newRepository: NewRepository!, $remoteAddress: String!, $remoteType: RepositoryRemoteTypeEnum!, $urlMappings: [UrlMapping!]!) {
      cloneRepository(data: $newRepository, remoteAddress: $remoteAddress, type: $remoteType, urlMappings: $urlMappings)
    }
    that creates a Repository (not a RepositoryCompound) similar to the createRepositoryMutation but also sets the additional fields appropriately. After creating, it schedules an asynchronous background job with RepositoryCloningJob.perform_later(repository.id).
  • Edit: Before creating the Repository, call Bringit::Wrapper.valid_remote?(remote_address) to check whether or not the remote is a supported repository. If it returns false, add an error to the GraphQL context.
  • Create a RepositoryCloningJob with a public method perform(repository_id) that fetches the Repository from the database and clones the git repository with gitlab_git. Right after cloning, the synchronized_at timestamp should be set to the current time.
  • Create a RepositoryCloningWorker that only sets the options (queue, threads, prefetch, timeout) appropriately. See, for instance, the PostProcessHetsWorker as an example.
  • Add a simple, small bare git repository and an svn repository to db/seeds/fixtures/repositories. Clone these two repositories at the end of db/seeds/030_repository_seeds.rb the same way you clone them in the createRepositoryMutation.
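Putting the hints together, the cloning job might look like this pseudocode sketch. The `Bringit::Wrapper.clone` call and the column/method names are assumptions, not the actual API.

```ruby
# Pseudocode sketch of the background job described above.
class RepositoryCloningJob < ApplicationJob
  def perform(repository_id)
    repository = Repository.first(id: repository_id)
    Bringit::Wrapper.clone(repository.local_path, repository.remote_address)
    # Right after cloning, record the synchronisation time:
    repository.update(synchronized_at: Time.now.utc)
  end
end
```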

Pulling

  • Create a RepositoryPullingWorker that only sets the appropriate options.
  • Create a RepositoryPullingJob with a public method perform(repository_id) that fetches the RepositoryCompound from the database and pulls the git repository with gitlab_git inside a mutex. Use GitHelper.exclusively(repository) { repository.git.pull } to wrap the synchronisation process in a mutex.
  • Pulling is supposed to happen periodically on mirrors (not forks), e.g. once a day. Create a RepositoryPullingPeriodicallyJob that uses the :async adapter (see the second listing in RailsGuides: ActiveJob Basics: 4.2 Setting the Backend for the configuration of a single job). The RepositoryPullingPeriodicallyJob has a perform method (no arguments) that fetches every mirror repository from the database and schedules a RepositoryPullingJob for each of them. In the end, the RepositoryPullingPeriodicallyJob schedules a RepositoryPullingPeriodicallyJob again with RepositoryPullingPeriodicallyJob.perform_in(OntohubBackend::Application.config.mirror_repository_synchronization_interval)
  • Add a configuration config.mirror_repository_synchronization_interval = 6.hours to config/application.rb.
  • Call RepositoryPullingPeriodicallyJob.perform_in(OntohubBackend::Application.config.mirror_repository_synchronization_interval) from the after_initialize block at the end of config/application.rb.
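The periodic rescheduling loop described above could be sketched as follows. This is pseudocode: the model scope and the `perform_in` helper are taken from the issue text, but their exact signatures are assumptions.

```ruby
# Pseudocode sketch: schedule a pull for every mirror, then re-schedule
# itself after the configured interval.
class RepositoryPullingPeriodicallyJob < ApplicationJob
  def perform
    Repository.where(remote_type: 'mirror').each do |repository|
      RepositoryPullingJob.perform_later(repository.id)
    end
    RepositoryPullingPeriodicallyJob.perform_in(
      OntohubBackend::Application.config.mirror_repository_synchronization_interval
    )
  end
end
```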

Policies

Edit the RepositoryPolicy#write? policy such that it is not permitted to write to a mirror repository.

Explanation of the Control Flow

A mirror/fork repository is created synchronously by the mutation, but it is not yet cloned.
Cloning is supposed to happen in the background because it may take a lot of time and we want to keep the response delays to a minimum. Calling SomeJob.perform_later pushes a job to a queue. The serialised arguments of such jobs should also be as small as possible. The SomeWorker runs in a separate process (maybe even on a different machine) and listens to the queue. Whenever a new item appears on the queue, it calls the SomeJob and passes the arguments.

Unfortunately, our default backend for ActiveJob does not support delayed jobs, which are needed for the periodic pulling of mirror repositories. We work around this shortcoming by using the ActiveJob Async adapter for the periodic feature. This one, on the other hand, does not persist jobs, so they are lost when the ontohub-backend reboots.

Git: Diff

  • Diff of a commit

    /:user/:repository/diff/:revision   GET
    

    The :revision in the URL is the target commit. If it is not specified, the HEAD of the default branch is supposed to be the target.

  • Diff of a commit range

    /:user/:repository/diff/:revision_from..:revision_to   GET
    

These functions are already implemented in GraphQL by #165.

Create dummy search controller

It would be good for the minimal deployment to have some kind of way to access all repositories. The nicest way IMHO would be to have a dummy search page, that always shows all repositories. That way, we can just add the search functionality later and keep the frontend view.

Git: Commit Info

Implement the following commit info functionality:

  • Show info of a commit (HEAD of the default branch if no ref is given)
    /:user/:repository[/ref/:reference]/commit   GET show (the commit)
    
  • Git log a directory (from the HEAD of the default branch if no ref is given)
    /:user/:repository[/ref/:reference]/commits/:path_to_the_directory   GET
    
  • Git log a file (from the HEAD of the default branch if no ref is given)
    /:user/:repository[/ref/:reference]/commits/:path_to_the_file   GET
    

These functions are already implemented in GraphQL by #165.

Parse relationships object of a POST request

Ember sends data as

{
  "data": {
    "attributes": {
      "name": "foobar",
      "description": "barfoo"
    },
    "relationships": {
      "namespace": {
        "data": {
          "type": "namespaces",
          "id": "ada"
        }
      }
    },
    "type": "repositories"
  }
}

and not as

{
  "data": {
    "attributes": {
      "name": "foobar",
      "description": "barfoo",
      "namespace_id": "ada"
    },
    "type": "repositories"
  }
}

Both of them are compatible with the JSON API specification. The upper one is currently not parsed, which we don't want.

Parse the Ember JSON body.
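A minimal sketch of the desired normalisation: move every relationship's id into the attributes hash as `<name>_id`, so both request shapes above end up in the second form. The helper name and its placement in the params layer are hypothetical.

```ruby
require 'json'

# Sketch: flatten a JSON API `relationships` object into `<name>_id`
# attributes, leaving attribute-only payloads unchanged.
def flatten_relationships(payload)
  data = payload.fetch('data')
  attributes = (data['attributes'] || {}).dup
  (data['relationships'] || {}).each do |name, relationship|
    attributes["#{name}_id"] = relationship.dig('data', 'id')
  end
  { 'data' => { 'attributes' => attributes, 'type' => data['type'] } }
end
```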

Config option to generate links with http or https

Because the Rails app will sit behind a reverse proxy that might have HTTPS configured, it would be nice to tell the Rails app to generate links with the correct protocol. This will also help with the hets communication, since redirections from http to https have been a big problem with hets.

Structure json schema files

The JSON schema files that document the API need to be structured strictly for ontohub/ontohub-frontend#41 to work.

I suggest the following file names and directory tree:

  • spec/support/api/schemas/: The JSON Schema directory
    • controllers/<controller_name>/: All the actions of the controllers are defined in this place. There is one file per action, for example, v2/repositories/get_show.json. These may contain $refs to files of the following bullet.
    • models/: Definitions of the models with their attributes, links and relationships. For instance, a repository_model.json would be here. These will be needed in the frontend to generate models there. The relationships themselves are just $refs to files of the following bullet.
    • relationships/: Definitions of the structure that the models have inside a relationships object. For example, the definition of properties.relationships.owner of a repository would be placed in the organizational_unit_relationship.json in this directory.

Every file should have the following header:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "<title for the documentation>",
  "description": "<description for the documentation>",

  // the actual content of this schema goes here...
}

Note that there is no id of the schema itself any more.
Also, schemas that are referenced via $ref are defined in the root object and not nested in a definitions object.

Use gitlab_git

Use gitlab_git to

  • create a git repository
  • commit new files to master
  • commit changed files to master
  • commit deleted files to master

This shall be done in service objects that use the models File and Commit (ontohub/ontohub-models#13).
