
ontohub-backend's Introduction


ontohub-backend

The main Ontohub service that serves the data for the frontend and other clients via the GraphQL API.

Run the backend in development mode

You need to have the postgres and rabbitmq services started. This guide assumes that you have the hets-agent checked out at ../hets-agent relative to the ontohub-backend repository.

With invoker

We use the invoker process manager during development to run all the needed processes with one command. You can start it with either bundle exec rails invoker or bundle exec invoker start invoker.ini (the former only calls the latter, but consumes more memory).

Manually

Alternatively, you can start the processes yourself by running each of these commands in a separate terminal:

  • bundle exec rails sneakers:run to run the sneakers workers
  • bundle exec rails server to run the HTTP server of the backend in development mode
  • pushd ../hets-agent && bin/hets_agent; popd to run the HetsAgent

Access the backend

The backend is then reachable from the browser at http://localhost:3000. The interactive GraphiQL console can be accessed at http://localhost:3000/graphiql.
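With the server running, you can also query the GraphQL API programmatically. The sketch below builds such a request in Ruby; the /graphql endpoint path and the `version` field are assumptions for illustration — explore the real schema in GraphiQL.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not yet send) a GraphQL POST request against the
# development server. Endpoint path and query field are assumptions.
def graphql_request(query, variables = {})
  uri = URI('http://localhost:3000/graphql')
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(query: query, variables: variables)
  [uri, request]
end

uri, request = graphql_request('{ version }')
# With the server running, send it with:
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }.body
```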

Dependencies

The backend is implemented in Ruby on Rails. First, install the Ruby version referenced in the file .ruby-version, as well as the bundler gem and git. Running bundle install in the directory of this repository will then install all dependencies of the backend.

Set up a development environment

In order to set up a complete environment, please refer to the wiki page Setting up the development environment.

Build the REST API documentation

We maintain API documentation with a JSON schema description. The schemas are located at spec/support/api/schemas. You can build an HTML representation of them with doca. This requires the tools npm and yarn to be installed on your system and available in your PATH.

First, you need to install doca with npm. We created a Rake task for this:

rails apidoc:prepare

Next, you need to create the documentation server files:

rails apidoc:init

This initialization must be run whenever new schema files are created.

And finally, you can run the API documentation server (the default port is 3002):

rails apidoc:run
# or to change the port:
PORT=8001 rails apidoc:run

Then, visit http://localhost:3002 to see the REST API documentation. This server watches the JSON schema files for changes and updates the documentation accordingly.

ontohub-backend's People

Contributors

dependabot[bot], derprofessor, ebolloff, eugenk, phyrog, stickler-ci, tillmo


ontohub-backend's Issues

Version check fails on Travis

The current version check introduced in #47 uses git describe --long --tags to get the latest git tag. Since Travis only clones the last 50 commits and we now have more than 50 commits on some new branches, the tag is not cloned anymore and thus the version check fails.

The main problem is that the command is run at startup time and not during the tests (during which the code is stubbed). So basically, if we're in test mode, we don't want to run the command at all during startup.

Go through the security checklist

See https://github.com/brunofacca/zen-rails-security-checklist.
Some of this is only applicable to apps with HTML views, but some of it is very important for the API side as well.

Update (16 Nov 2017)

I went through the checklist and these are the points which we need to pay a little bit more attention to:

Authentication

  • Expire the session at log out and expire old sessions at every successful login. Mitigates CSRF, session hijacking and session fixation attacks by reducing their time-frame.

    Backend: We would need to have a column session_key for a User that is rewritten on every sign-in. A sign-out would delete the session_key. This requires a signOutMutation. Authentication only happens if the correct session_key is inside the JWT.

    Frontend: Sign-out not only deletes the JWT in the browser, but also invalidates the session_key

  • Expire sessions after a period of inactivity (e.g., 30 minutes). Mitigates CSRF, session hijacking and session fixation attacks by reducing their time-frame.

    Backend: We do have an expiration time in the JWT but, currently, we don't check it during authentication.

  • Consider using two-factor authentication (2FA) as provided by Authy. Provides a highly effective extra layer of authentication security.
    Devise: see the devise-two-factor and authy-devise gems.
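The two checks discussed above (a session_key that is rotated on sign-in/sign-out, plus an enforced exp claim) can be sketched with a minimal stdlib-only HS256 JWT. This is an illustration, not the backend's actual implementation; the claim names and the secret handling are assumptions.

```ruby
require 'base64'
require 'json'
require 'openssl'

SECRET = 'development-only-secret' # assumption: real key is persisted elsewhere

def b64(data)
  Base64.urlsafe_encode64(data, padding: false)
end

def issue_token(payload)
  header = b64(JSON.generate(typ: 'JWT', alg: 'HS256'))
  body = b64(JSON.generate(payload))
  signature = b64(OpenSSL::HMAC.digest('SHA256', SECRET, "#{header}.#{body}"))
  "#{header}.#{body}.#{signature}"
end

# Authentication succeeds only if the signature is valid, the token has
# not expired, and the embedded session_key matches the one stored on the
# User row (which sign-out deletes and sign-in rewrites).
def authenticated?(token, stored_session_key, now: Time.now.to_i)
  header, body, signature = token.split('.')
  expected = b64(OpenSSL::HMAC.digest('SHA256', SECRET, "#{header}.#{body}"))
  return false unless OpenSSL.secure_compare(signature, expected)
  payload = JSON.parse(Base64.urlsafe_decode64(body))
  payload['exp'].to_i > now && payload['session_key'] == stored_session_key
end

token = issue_token(session_key: 'abc123', exp: Time.now.to_i + 3600)
authenticated?(token, 'abc123')  # => true
authenticated?(token, 'revoked') # => false (session_key rotated at sign-out)
```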

HTTP & TLS

Security-related headers

Security Tools

  • Run Brakeman before each deploy. If using an automated code review tool like Code Climate, enable the Brakeman engine.

  • Consider using a continuous security service such as Detectify.

    Detectify automatically scans the application for several hundred security vulnerabilities. It even has a free plan for nonprofit organisations (but only one domain/subdomain). They need to be contacted by mail in this case.

  • Consider using a Web Application Firewall (WAF) such as
    NAXSI for Nginx,
    ModSecurity for Apache and Nginx.
    Mitigates XSS, SQL Injection, DoS, and many other attacks.

CSRF protection

We should implement some sort of Cross-Site Request Forgery protection. While Rails offers the protect_from_forgery method on controllers, combining this with an independent frontend is a bit trickier (since the forms are not rendered by Rails, it cannot inject the CSRF token).

http://www.blaketidwell.com/2015/01/27/ember-rails-csrf-handling.html describes how we could implement this with Ember and Rails. https://nvisium.com/blog/2014/09/10/understanding-protectfromforgery/ describes a bit more how CSRF protection works in Rails itself.

Git: Conflict Resolution

Resolve conflicts when committing. Currently, committing is prohibited if the branch has changed since checking out a file (see the previous_head_sha variable all over lib/git/committing.rb).

To resolve minor conflicts:

  1. Create a new branch from the previous_head_sha
  2. Commit to the branch
  3. Merge the branch back to the original branch
  4. Delete the created branch

If the merging fails, respond with a merge conflict, but delete the created branch anyway.
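The four steps could be sketched as follows. This is pseudocode: the `git` wrapper and its method names are assumptions, not the real Bringit/gitlab_git API.

```ruby
# Pseudocode sketch of the conflict-resolution flow described above.
def commit_with_conflict_resolution(git, branch, previous_head_sha, commit_info)
  tmp = "conflict-resolution-#{SecureRandom.hex(8)}"
  git.create_branch(tmp, previous_head_sha) # 1. branch off the old HEAD
  git.commit(tmp, commit_info)              # 2. commit to the new branch
  merge = git.merge(branch, tmp)            # 3. merge it back
  raise MergeConflictError if merge.conflict? # respond with a merge conflict
ensure
  git.delete_branch(tmp)                    # 4. delete the branch either way
end
```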

draw diagram with components and tools

We use so many tools (e.g. mirage, ember_json_schema, gitlab_git, ...) that an overview diagram would be very useful. Standard gems can be omitted; list only the most important ones.

Add Settings for Host

Currently, url_for always returns http://localhost:3000/.... The host must instead come from a configurable setting.

Outsource the git layer to gitlab_git

The git layer may be interesting for other people as well. Since gitlab does not maintain gitlab_git any more, we should move the library functionality of our git layer to our fork of the gitlab_git gem.

Persist JWT key

Persist the JWT key such that it is still used if the server is restarted.

Remove the JWT key secrets from the config/secrets.yml because they are not used.

Login with JWT and Devise

We need

  • SessionsController (Actions: create) with tests
  • Authentication strategy warden/devise (See Tutorial)
  • Token with UserID, Expiration

Ontohub services

  • Translate formula along signature morphism (hets)
  • Flatten a theory (hets)
  • Compute colimit of a network (hets)
  • Compute normal form of a structured OMS/graph (hets)
  • Syntax highlighting, autocomplete, resolve origin of symbol, correction, includes (hets)
  • Show all open proof obligations in OMS graph (hets)
  • OMS graph transformation (hets; proof rule, apply comorphism)
  • Show and inspect refinement tree (hets)
  • Parse OMS (hets)
  • Prove proof obligation (hets)
  • Parse string in context (hets)
  • QMT (hets; parse, check, simplify, infer type, present, map, filter, sparql-like)

  • List all unproven theorems of a flat theory
  • List all indirect imports of a theory (recursive database)
  • GET URI
  • POST content
  • GET SVG (multiple graphs)
  • Search
  • Admin commands

  • Subterm selection (client side)
  • Folding (client side)
  • Visibility changes (client side; inferred types, redundant brackets, implicit arguments)
  • Tooltip (client side)

Git: Mutex git actions

We use two different systems to write to git repositories: git-shell and gitlab_git. The write actions need to happen in a critical section, exclusively for one process. Use an appropriate mutex mechanism to ensure that only one process has write access at a time. Lock files may be a good option.

Add API documentation

Although there are tests and JSON schema definitions, we could use human-readable API documentation. I suggest creating a doc/api directory and putting the API documentation there.

Authorisation

Add a notion of an admin and set up authorisation. Search for the best library for the job. In legacy-Ontohub, we wanted to switch to pundit because policies are just Ruby objects and they can be used more easily in the git-shell, which we want to be lightweight.

Solve devise/migrate issue

rails db:drop
rails db:create
rails db:migrate

Fails because of the devise_for in the routes. Fix it.

Roadmap: The next few steps

The next few big steps of the overall implementation should be



These blocks contain tasks that can be done more or less in parallel.

Create a controller for Namespace

Only for read actions. Namespaces should be created/updated/deleted in the console only at first. Later on, namespaces should be closely tied to the OrganizationalUnit (whose children will get controllers).

Create API for commits and files

Needed routes:

/repositories/:id/commits # List of commits in that repository
/repositories/:id/commits/:hash # One specific commit
/repositories/:id/commits/:hash/files # List of files at one specific commit
/repositories/:id/commits/:hash/files/:file_path # One specific file at one specific commit

The repository should include a relationship to the list of commits. In the list of commits, each commit should have a relationship to the list of files, and each file in this list should link to the file URL.
The repository should also include some information about the branches of the repository, especially which one is the default branch (needs to be implemented in the models first, see ontohub/ontohub-models#52).
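In Rails routing terms, the four routes above might be declared like this config sketch (read-only; the nested-resource parameter names and the path constraint are assumptions):

```ruby
# config/routes.rb sketch for the commits/files routes above.
resources :repositories, only: [] do
  resources :commits, only: [:index, :show], param: :hash do
    resources :files, only: [:index, :show],
              param: :file_path,
              constraints: { file_path: %r{[^\0]+} } # allow slashes in paths
  end
end
```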

central features needed for going productive

Here is a list of central features that are needed for going productive (i.e. replacing the productive Ontohub 1.0). The list should include the necessary things but should be kept as short as possible so that we can go online as quickly as possible:

  • manage repositories
  • manage ontologies and their sentences and symbols, and errors
  • theorem proving
    Needed only later:
  • displays of graphs
  • mappings
  • search
  • metadata
  • list of logics

Git: Refs and Tags

Implement the following refs/tags functionality:

  • List refs (struck out: see below)
    /:user/:repository/refs   GET index
    
  • List tags
    /:user/:repository/tags   GET index
    
  • Create a tag
    /:user/:repository/tags   POST create
    
  • Show a tag
    /:user/:repository/tags/:tag   GET show (with additional information like release notes)
    
  • Delete a tag
    /:user/:repository/tags/:tag   DELETE destroy
    

These functions are already implemented in GraphQL by #165.

The struck out bullets are not needed because we only want to have read actions in the REST API.

Also, I vote to drop support for the refs route because it is only the union of tags and branches. If it is really needed to show both on a web page, the frontend should make two queries and put them together itself.

Git: Branches

Implement the following branching functionality

  • List branches
    /:user/:repository/branches   GET index
    
  • Show branch
    /:user/:repository/branches/:branch   GET show (identical with /:user/:repository/ref/:branch/commits)
    
  • Create a branch
    /:user/:repository/branches   POST create
    
  • Delete a branch
    /:user/:repository/branches/:branch   DELETE destroy
    
  • Get the default branch
    /:user/:repository/branches/default   GET show
    
  • Set the default branch
    /:user/:repository/branches/default   PATCH update
    

These functions are already implemented in GraphQL by #165.

The struck out bullets are not needed because we only want to have read actions in the REST API.

Git: Cloning

When creating a repository, allow cloning a remote git/svn repository, either as a fork that is writable or as a mirror that is synchronised periodically and write-protected. When creating such a mirror/fork, allow specifying a list of UrlMappings.

This depends on ontohub/ontohub-models#114 and #250

Use GitHelper.exclusively(repository) { repository.git.pull } to wrap the synchronisation process in a mutex. See #85 and #304.

Implementation Hints

Cloning

  • Create a GraphQL mutation
    mutation ($newRepository: NewRepository!, $remoteAddress: String!, $remoteType: RepositoryRemoteTypeEnum!, $urlMappings: [UrlMapping!]!) {
      cloneRepository(data: $newRepository, remoteAddress: $remoteAddress, type: $remoteType, urlMappings: $urlMappings)
    }
    that creates a Repository (not a RepositoryCompound) similar to the createRepositoryMutation but also sets the additional fields appropriately. After creating, it schedules an asynchronous background job with RepositoryCloningJob.perform_later(repository.id).
  • Edit: Before creating the Repository, call Bringit::Wrapper.valid_remote?(remote_address) to check whether or not the remote is a supported repository. If it returns false, add an error to the GraphQL context.
  • Create a RepositoryCloningJob with a public method perform(repository_id) that fetches the Repository from the database and clones the git repository with gitlab_git. Right after cloning, the synchronized_at timestamp should be set to the current time.
  • Create a RepositoryCloningWorker that only sets the options (queue, threads, prefetch, timeout) appropriately. See, for instance, the PostProcessHetsWorker as an example.
  • Add a simple, small bare git repository and an svn repository to db/seeds/fixtures/repositories. Clone these two repositories at the end of db/seeds/030_repository_seeds.rb the same way you clone them in the createRepositoryMutation.
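Putting the hints together, the cloning job might look like this pseudocode sketch. The `Bringit::Wrapper.clone` call and the column/method names are assumptions, not the actual API.

```ruby
# Pseudocode sketch of the background job described above.
class RepositoryCloningJob < ApplicationJob
  def perform(repository_id)
    repository = Repository.first(id: repository_id)
    Bringit::Wrapper.clone(repository.local_path, repository.remote_address)
    # Right after cloning, record the synchronisation time:
    repository.update(synchronized_at: Time.now.utc)
  end
end
```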

Pulling

  • Create a RepositoryPullingWorker that only sets the appropriate options.
  • Create a RepositoryPullingJob with a public method perform(repository_id) that fetches the RepositoryCompound from the database and pulls the git repository with gitlab_git inside a mutex. Use GitHelper.exclusively(repository) { repository.git.pull } to wrap the synchronisation process in a mutex.
  • Pulling is supposed to happen periodically on mirrors (not forks), e.g. once a day. Create a RepositoryPullingPeriodicallyJob that uses the :async adapter (see the second listing in RailsGuides: ActiveJob Basics: 4.2 Setting the Backend for the configuration of a single job). The RepositoryPullingPeriodicallyJob has a perform method (no arguments) that fetches every mirror repository from the database and schedules a RepositoryPullingJob for each of them. In the end, the RepositoryPullingPeriodicallyJob schedules a RepositoryPullingPeriodicallyJob again with RepositoryPullingPeriodicallyJob.perform_in(OntohubBackend::Application.config.mirror_repository_synchronization_interval)
  • Add a configuration config.mirror_repository_synchronization_interval = 6.hours to config/application.rb.
  • Call RepositoryPullingPeriodicallyJob.perform_in(OntohubBackend::Application.config.mirror_repository_synchronization_interval) from the after_initialize block at the end of config/application.rb.
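The periodic rescheduling loop described above could be sketched as follows. This is pseudocode: the model scope and the `perform_in` helper are taken from the issue text, but their exact signatures are assumptions.

```ruby
# Pseudocode sketch: schedule a pull for every mirror, then re-schedule
# itself after the configured interval.
class RepositoryPullingPeriodicallyJob < ApplicationJob
  def perform
    Repository.where(remote_type: 'mirror').each do |repository|
      RepositoryPullingJob.perform_later(repository.id)
    end
    RepositoryPullingPeriodicallyJob.perform_in(
      OntohubBackend::Application.config.mirror_repository_synchronization_interval
    )
  end
end
```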

Policies

Edit the RepositoryPolicy#write? policy such that it is not permitted to write to a mirror repository.

Explanation of the Control Flow

A mirror/fork repository is created synchronously by the mutation, but it is not yet cloned.
Cloning is supposed to happen in the background because it may take a lot of time and we want to keep the response delays to a minimum. Calling SomeJob.perform_later pushes a job to a queue. The serialised arguments of such jobs should also be as small as possible. The SomeWorker runs in a separate process (maybe even on a different machine) and listens to the queue. Whenever a new item appears on the queue, it calls the SomeJob and passes the arguments.

Unfortunately, our default backend for ActiveJob does not support delayed jobs, which are needed for the periodic pulling of mirror repositories. We work around this shortcoming by using the ActiveJob Async adapter for the periodic feature. This one, on the other hand, does not persist jobs, so they are lost when the ontohub-backend reboots.

Git: Diff

  • Diff of a commit

    /:user/:repository/diff/:revision   GET
    

    The :revision in the URL is the target commit. If it is not specified, the HEAD of the default branch is supposed to be the target.

  • Diff of a commit range

    /:user/:repository/diff/:revision_from..:revision_to   GET
    

These functions are already implemented in GraphQL by #165.

Create dummy search controller

It would be good for the minimal deployment to have some kind of way to access all repositories. The nicest way IMHO would be to have a dummy search page, that always shows all repositories. That way, we can just add the search functionality later and keep the frontend view.

Git: Commit Info

Implement the following commit info functionality:

  • Show info of a commit (HEAD of the default branch if no ref is given)
    /:user/:repository[/ref/:reference]/commit   GET show (the commit)
    
  • Git log a directory (from the HEAD of the default branch if no ref is given)
    /:user/:repository[/ref/:reference]/commits/:path_to_the_directory   GET
    
  • Git log a file (from the HEAD of the default branch if no ref is given)
    /:user/:repository[/ref/:reference]/commits/:path_to_the_file   GET
    

These functions are already implemented in GraphQL by #165.

Parse relationships object of a POST request

Ember sends data as

{
  "data": {
    "attributes": {
      "name": "foobar",
      "description": "barfoo"
    },
    "relationships": {
      "namespace": {
        "data": {
          "type": "namespaces",
          "id": "ada"
        }
      }
    },
    "type": "repositories"
  }
}

and not as

{
  "data": {
    "attributes": {
      "name": "foobar",
      "description": "barfoo",
      "namespace_id": "ada"
    },
    "type": "repositories"
  }
}

Both of them are compatible with the JSON API specification. The upper one is currently not parsed, which we don't want.

Parse the Ember JSON body.
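A minimal sketch of the desired normalisation: move every relationship's id into the attributes hash as `<name>_id`, so both request shapes above end up in the second form. The helper name and its placement in the params layer are hypothetical.

```ruby
require 'json'

# Sketch: flatten a JSON API `relationships` object into `<name>_id`
# attributes, leaving attribute-only payloads unchanged.
def flatten_relationships(payload)
  data = payload.fetch('data')
  attributes = (data['attributes'] || {}).dup
  (data['relationships'] || {}).each do |name, relationship|
    attributes["#{name}_id"] = relationship.dig('data', 'id')
  end
  { 'data' => { 'attributes' => attributes, 'type' => data['type'] } }
end
```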

Config option to generate links with http or https

Because the Rails app will sit behind a reverse proxy that might have HTTPS configured, it would be nice to tell the Rails app to generate links with the correct protocol. This will also help with the hets communication, since redirections from http to https have been a big problem with hets.

Structure json schema files

The JSON schema files that document the API need to be structured strictly for ontohub/ontohub-frontend#41 to work.

I suggest the following file names and directory tree:

  • spec/support/api/schemas/: The JSON Schema directory
    • controllers/<controller_name>/: All the actions of the controllers are defined in this place. There is one file per action, for example, v2/repositories/get_show.json. These may contain $refs to files of the following bullet.
    • models/: Definitions of the models with their attributes, links and relationships. For instance, a repository_model.json would be here. These will be needed in the frontend to generate models there. The relationships themselves are just $refs to files of the following bullet.
    • relationships/: Definitions of the structure that the models have inside a relationships object. For example, the definition of properties.relationships.owner of a repository would be placed in the organizational_unit_relationship.json in this directory.

Every file should have the following header:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "<title for the documentation>",
  "description": "<description for the documentation>",

  // the actual content of this schema goes here...
}

Note that there is no id of the schema itself any more.
Also, schemas that are referenced via $ref are defined in the root object and not nested in a definitions object.

Use gitlab_git

Use gitlab_git to

  • create a git repository
  • commit new files to master
  • commit changed files to master
  • commit deleted files to master

This shall be done in service objects that use the models File and Commit (ontohub/ontohub-models#13).
