

Entities Service

REST API service for serving entities

This is a FastAPI-based REST API service running on onto-ns.com. Its purpose is to serve entities from an underlying database.

Besides the REST API service, the repository also contains a CLI for validating and uploading entities to the service, as well as for creating and manipulating the service's configuration file. See the CLI documentation for more information.

The repository also contains a pre-commit hook, validate-entities, which may be used externally to validate entities before committing them to a repository. See the pre-commit hook documentation for more information.

Install the service

First, download and install the Python package from GitHub:

# Download (git clone)
git clone https://github.com/SINTEF/entities-service.git
cd entities-service

# Install (using pip)
python -m pip install -U pip
pip install -U -e .

Important: If using this service locally alongside DLite, note that issues may occur if NumPy v2 is used. There are no known issues with NumPy v1. This is a registered issue for DLite v0.5.16.

Run the service

The service requires a MongoDB server to be running and reachable. It also requires a valid X.509 certificate in order to connect to the MongoDB server.

The MongoDB server could be MongoDB Atlas, a local MongoDB server, or a Docker container running MongoDB.

Using the local environment and MongoDB Atlas

First, create a MongoDB Atlas cluster, and a user with read-only access to the entities database.

Set the necessary environment variables:

ENTITIES_SERVICE_MONGO_URI=<your MongoDB Atlas URI>
ENTITIES_SERVICE_X509_CERTIFICATE_FILE=<your X.509 certificate file>
ENTITIES_SERVICE_MONGO_USER=<your MongoDB Atlas user with read-only access (default: 'guest')>
ENTITIES_SERVICE_MONGO_PASSWORD=<your MongoDB Atlas user's password with read-only access (default: 'guest')>

Run the service:

uvicorn entities_service.main:APP --host localhost --port 8000 --no-server-header --header "Server:EntitiesService"

Finally, go to localhost:8000/docs and try out retrieving an entity.

Add --log-level debug to the uvicorn command for more verbose logging, and --reload to enable auto-reloading of the service when any files change.

Note, the environment variables can be set in a .env file; see the section on using a file for environment variables.

Using Docker and a local MongoDB server

First, we need to create self-signed certificates for the service to use. This is done by running the following command:

mkdir docker_security
cd docker_security
../.github/docker_init/setup_mongo_security.sh

Note, this requires openssl to be installed on your system and a Linux/Unix-based OS.

For development, start a local MongoDB server, e.g., through another Docker image:

docker run --rm -d \
  --env "IN_DOCKER=true" \
  --env "HOST_USER=${USER}" \
  --env "MONGO_INITDB_ROOT_USERNAME=root" \
  --env "MONGO_INITDB_ROOT_PASSWORD=root" \
  --name "mongodb" \
  -p "27017:27017" \
  -v "${PWD}/.github/docker_init/create_x509_user.js:/docker-entrypoint-initdb.d/0_create_x509_user.js" \
  -v "${PWD}/docker_security:/mongo_tls" \
  mongo:7 \
  --tlsMode allowTLS --tlsCertificateKeyFile /mongo_tls/test-server1.pem --tlsCAFile /mongo_tls/

Then build and run the Entities Service Docker image:

docker build --pull -t entities-service --target development .
docker run --rm -d \
  --env "ENTITIES_SERVICE_MONGO_URI=mongodb://localhost:27017" \
  --env "ENTITIES_SERVICE_X509_CERTIFICATE_FILE=docker_security/test-client.pem" \
  --env "ENTITIES_SERVICE_CA_FILE=docker_security/test-ca.pem" \
  --name "entities-service" \
  -u "$(id -ur):$(id -gr)" \
  -p "8000:80" \
  entities-service

Now, populate the entities collection in the entities_service database of the MongoDB with valid entities.

Then go to localhost:8000/docs and try out retrieving an entity.


For production, use a public MongoDB and follow the same instructions above for building and running the Entities Service Docker image, but change the --target value to production and set proper values for the ENTITIES_SERVICE_MONGO_URI and ENTITIES_SERVICE_X509_CERTIFICATE_FILE environment variables. If needed, also set the ENTITIES_SERVICE_MONGO_USER, ENTITIES_SERVICE_MONGO_PASSWORD, and ENTITIES_SERVICE_CA_FILE environment variables.

Using Docker Compose

Run the following commands:

docker compose pull
docker compose --env-file=.env up --build

By default, the development target will be built. To change this, set the ENTITIES_SERVICE_DOCKER_TARGET environment variable accordingly, e.g.:

ENTITIES_SERVICE_DOCKER_TARGET=production docker compose --env-file=.env up --build

Furthermore, the localhost port used can be changed via the PORT environment variable.

The --env-file argument is optional, but if used, it should point to a file containing the environment variables needed by the service. See the section on using a file for environment variables for more information.

Using a file for environment variables

The service supports a "dot-env" file, i.e., a .env file with a list of (secret) environment variables.

To use this, create a new file named .env. This file will never be committed, as it is listed in the repository's .gitignore file.

Fill up the .env file with (secret) environment variables.

For local use, no changes are needed: the service automatically checks for a .env file and loads it, using it to set the service app configuration.

For using it with Docker, use the --env-file .env argument when calling docker run or docker compose up.
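As an example, a minimal .env file for the MongoDB Atlas setup described earlier might look like the following (all values are placeholders to be replaced with your own):

```
ENTITIES_SERVICE_MONGO_URI=mongodb+srv://<your-atlas-cluster-address>
ENTITIES_SERVICE_X509_CERTIFICATE_FILE=docker_security/test-client.pem
ENTITIES_SERVICE_MONGO_USER=guest
ENTITIES_SERVICE_MONGO_PASSWORD=guest
```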

CLI

The CLI is a command-line interface for interacting with the Entities Service. It can be used to validate and upload entities to the service, as well as to create and manipulate the service's configuration file.

To see the available commands and options, run:

entities-service --help

A complete list of commands and options can be found in the CLI documentation.

pre-commit hook validate-entities

The validate-entities pre-commit hook runs the CLI command entities-service validate on all files that are about to be committed. This is to ensure that all entities are valid before committing them to the repository.

By default it runs with the --verbose flag, which will print out detailed differences if an entity already exists externally and differs in its content. Furthermore, it will run such that all supported file formats (currently JSON and YAML/YML) will be validated.

Important: Add the .[cli] entry to the additional_dependencies argument.

It is also advisable to limit which directories or files the hook runs on by adding the files argument.

Here is an example of how to add the validate-entities pre-commit hook to your .pre-commit-config.yaml file, given a repository that contains entities in the entities directory in the root of the repository:

repos:
# ...
- repo: https://github.com/SINTEF/entities-service
  rev: v0.6.0
  hooks:
  - id: validate-entities
    additional_dependencies: [".[cli]"]
    files: ^entities/.*\.(json|yaml|yml)$

This will run for all JSON, YAML, and YML files in the entities directory and its subdirectories.

Note, you can add the --no-external-calls argument if you wish to not make external calls to the Entities Service when validating entities. This is useful when running the pre-commit hook in an environment where the Entities Service is not available, or when you wish to only validate the entities locally.

# ...
    args: ['--no-external-calls']
# ...

Testing

The repository code is tested using pytest. For the service, it can be tested against a local MongoDB server and Entities Service instance or against a mock MongoDB server and Entities Service instance utilizing Starlette's TestClient.

To run the tests, first install the test dependencies:

pip install -U -e .[testing]

Then run the tests (for mock MongoDB (mongomock) and Entities Service):

pytest

To run the tests against a live backend, you can pull, build, and run the Docker Compose file:

docker compose pull
docker compose build

Before running the services, the self-signed certificates need to be created. See the section on using Docker and a local MongoDB server for more information.

Then run (up) the Docker Compose file and subsequently the tests:

docker compose up -d
pytest --live-backend

Remember to set the ENTITIES_SERVICE_X509_CERTIFICATE_FILE and ENTITIES_SERVICE_CA_FILE environment variables to docker_security/test-server1.pem and docker_security/test-ca.pem, respectively. Note, these environment variables are already specified in the docker-compose.yml file, however, one should still check that they are set correctly.

Test uploading entities using the CLI

To test uploading entities using the CLI, one must note that validation of the entities happens twice: First by the CLI, and then by the service. The validation that is most tricky when testing locally is the namespace validation, as the service will validate the namespace against the ENTITIES_SERVICE_BASE_URL environment variable set when starting the service, which defaults to http://onto-ns.com/meta. However, if using this namespace in the CLI, the CLI will connect to the publicly running service at http://onto-ns.com/meta, which will not work when testing locally.

So to make all this work together, one should start the service with the ENTITIES_SERVICE_BASE_URL environment variable set to http://localhost:8000 (which is done through the locally available environment variable ENTITIES_SERVICE_HOST), and then use the CLI to upload entities to the service running at http://localhost:8000.

In practice, this will look like this:

# Set the relevant environment variables
export ENTITIES_SERVICE_BASE_URL=http://localhost:8000
export ENTITIES_SERVICE_HOST=${ENTITIES_SERVICE_BASE_URL}

# Start the service
docker compose up -d

# Upload entities using the CLI
entities-service upload my_entities.yaml --format=yaml

The my_entities.yaml file should contain one or more entities with uri values of the form http://localhost:8000/....
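The namespace validation described above boils down to checking that an entity's uri lives under the service's base URL. The following is a minimal sketch of that idea; uri_matches_base_url is a hypothetical helper for illustration, not part of the service's actual code:

```python
from urllib.parse import urlsplit


def uri_matches_base_url(entity_uri: str, base_url: str) -> bool:
    """Return True if the entity's uri lives under the service's base URL.

    Illustrative only: mirrors the namespace check against
    ENTITIES_SERVICE_BASE_URL described above.
    """
    uri, base = urlsplit(entity_uri), urlsplit(base_url)
    # Scheme and host must match, and the uri path must extend the base path.
    return (uri.scheme, uri.netloc) == (base.scheme, base.netloc) and uri.path.startswith(base.path)


# With ENTITIES_SERVICE_BASE_URL=http://localhost:8000, only localhost URIs pass:
print(uri_matches_base_url("http://localhost:8000/0.1/MyEntity", "http://localhost:8000"))  # True
print(uri_matches_base_url("http://onto-ns.com/meta/0.1/MyEntity", "http://localhost:8000"))  # False
```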

Extra pytest markers

There are some custom pytest markers:

  • skip_if_live_backend: skips the test if the --live-backend flag is set. Add this marker to tests that should not be run against a live backend, either because they are not relevant for a live backend, or because they are currently impossible to replicate within a live backend.

    A reason can be specified as an argument to the marker, e.g.:

    @pytest.mark.skip_if_live_backend(reason="Cannot force an HTTP error")
    def test_something():
        ...

    Availability: This marker is available for all tests.

  • skip_if_not_live_backend: skips the test if the --live-backend flag is not set. Add this marker to tests that should only be run against a live backend, mainly because the mock backend does not support the test.

    A reason can be specified as an argument to the marker, e.g.:

    @pytest.mark.skip_if_not_live_backend(reason="Indexing is not supported by mongomock")
    def test_something():
        ...

    Availability: This marker is available for all tests.

Extra pytest fixtures

There is one fixture that may be difficult to locate: the parameterized_entity fixture. It can be invoked to automatically parameterize a test, iterating over all the valid entities in the valid_entities.yaml static test file. It returns one of these entities as a parsed dictionary for each iteration, i.e., within each test.

The fixture is available for all tests.

Licensing & copyright

All files in this repository are MIT licensed.
Copyright by Casper Welzel Andersen, SINTEF.

Acknowledgements

This project is made possible by funding from:

  • MEDIATE (2022-2025), which receives funding from the RCN, Norway; FNR, Luxembourg; and SMWK, Germany via the M-ERA.NET programme, project 9557. M-ERA.NET 2 and M-ERA.NET 3 have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements No 685451 and No 958174.
  • MatCHMaker (2022-2026) that receives funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101091687.


entities-service's Issues

πŸ”€ Split CLI into separate repository

The CLI is becoming a larger and more independent part of the repository. As such, it should have its own repository, separating it from its ties to, and development around, the onto-ns.com-specific entities service.

πŸ”§ `dimensions` should always be returned

If dimensions is not defined for an entity, the key is left out altogether from the returned entity. When using the service together with DLite, this currently results in an error, as DLite cannot handle entities where dimensions is not defined (even if it would be empty).
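The fix amounts to normalizing entities on the way out of the service. A minimal sketch of the idea (the helper name is illustrative, not the service's actual code; an empty dict is assumed here, as in SOFT7-style entities):

```python
def ensure_dimensions(entity: dict) -> dict:
    """Return the entity with a `dimensions` key, adding an empty one if missing.

    Illustrative only: sketches the normalization described in this issue.
    """
    # Using an empty dict; SOFT5-style entities would use an empty list instead.
    entity.setdefault("dimensions", {})
    return entity


print(ensure_dimensions({"uri": "http://onto-ns.com/meta/0.1/X"}))
# {'uri': 'http://onto-ns.com/meta/0.1/X', 'dimensions': {}}
```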

Release summary v0.3.0

CI/CD-friendly CLI

The CLI now supports setting an access token to upload entities - this avoids the need for manual interaction when authenticating using GitLab. The access token should preferably be case-specific and created as a group access token (see the GitLab documentation for more information about group access tokens).

Note, the minimum access level for the token should still be Developer for it to be allowed to create entities.
Beware that this minimum access level may change in the future.

Only deploy the service if changes to the service are detected

To minimize new deployments, which have the potential to disrupt the runtime, it would be good if deployments only happened when the introduced changes actually touch the service; changes to the CLI, for example, should not result in a new deployment.

Separate entity URI/identity from hosting URL

(...) [F]or features I think it will be good in future update to separate the uri/identity of the entity and the url of the hosting service, as we will need to host entities created elsewhere with different namespaces. (...)

Originally posted by @quaat in #67 (review)

Add a release workflow

While this package is not planned to be released on a public package index (like PyPI), it would still be nice to create GitHub releases and bump the version accordingly, keeping a changelog and such.

πŸ“– Consolidate documentation

There is a lot of documentation currently in the README.
Also, PR #114 will add CLI API documentation.

All of this documentation should be consolidated in a proper browser-able documentation site.

The sections of the README should also be split down to more manageable portions.

Release summary v0.2.0

Support specific namespaced URIs

Support specific namespaced URIs according to #7. If one has write rights for the entity backend, setting the namespace and/or uri value in the entity files will make them use the relevant namespace, either the core namespace or a specific one.

UX/DX updates

Otherwise, the code has had some clean-up related to entity model definitions, separating out the SOFT flavorings. Some fixes have been implemented after the first tests were run "in production".

Release summary v0.5.0

Use the CLI in CI/CD

With the latest updates to the entities-service CLI it can be used as intended in CI/CD workflows.

To make this happen, this release further upgrades the CLI, mainly by deprecating the --file/-f and --dir/-d inputs for the upload and validate commands in favor of a SOURCE... argument, i.e., one can supply (relative or absolute) paths to files and directories multiple times, separating them by a space (or wrapping them in quotation marks; either " or ' will work).

In addition, this new SOURCE... argument is utilized in two cases:

  • One can supply the arguments via stdin.
  • The validate command has been wrapped as a pre-commit hook.

The stdin possibility allows one to do something like:

git diff --name-only | entities-service validate -

This will supply the validate command with a list of files that are different between the current git working directory and the previous commit.
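The SOURCE... handling described above can be sketched as follows; resolve_sources is a hypothetical helper, not the CLI's actual implementation, which treats a lone - as "read whitespace-separated paths from stdin":

```python
import io
from typing import IO


def resolve_sources(args: list[str], stdin: IO[str]) -> list[str]:
    """Expand a SOURCE... argument list, reading paths from stdin when `-` is given.

    Illustrative only: sketches the stdin behaviour described above.
    """
    sources: list[str] = []
    for arg in args:
        if arg == "-":
            # Piped input: split on any whitespace (one path per line from git diff).
            sources.extend(stdin.read().split())
        else:
            sources.append(arg)
    return sources


# Mimics: git diff --name-only | entities-service validate -
piped = io.StringIO("entities/a.json\nentities/b.yaml\n")
print(resolve_sources(["-"], piped))  # ['entities/a.json', 'entities/b.yaml']
```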

pre-commit hook validate-entities

When using the hook, one should limit the files it runs on via the files hook argument. One must also supply the additional_dependencies argument with the value '.[cli]' (note, this argument expects a list, so this value should be one of the values in that list).

The hook will automatically run on all implemented formats (currently JSON and YAML/YML).

The hook will automatically use the --verbose flag, printing detailed differences should there be any between the local entities and their externally existing counterparts.

✨ Support piping in SOURCE's (filepaths and directories)

It would be great if one could do something like:

git diff --name-only | entities-service validate

This being thought to work best with the updated input method introduced in #126.

If an option is necessary (like --stdin) this might be a better way of doing it, to ensure the user actively chooses to use the piped in text.

Change target for the CLI upload

Instead of targeting the backend directly (MongoDB in this case) a POST endpoint in the REST API service should be implemented that can be targeted by the CLI instead, and then the service deals with the backend.

This should help support #7 in the future as well as minimize the implementation overhead in this repository.

✨ New option `--strict` for the `validate` command

This option would ensure a non-zero error/return code is emitted from running the command if one or more entities are not "fully" valid - meaning, they do not already exist externally. I.e., this option should most likely only be used whenever --no-external-calls is not used.

Release summary v0.4.0

New validate CLI command

A new CLI command (validate) has been added to make it possible to validate entities.
This is convenient both as a split of the bloated upload command implementation, and as separate functionality for data documentation repositories to ensure any changes to entities still result in valid entities.

Furthermore, a new --auto-confirm/-y option has been added to the upload command as an extension on the --quiet/-q option. It will still ensure print statements occur, but will use defaults and "Yes" responses whenever it is needed.

Release summary v0.1.0

This is the first "proper" release (after v0.0.1).

It introduces several steps up compared to the bare-bones REST API service that started as a way to return entities from the URI/URL they have defined, specifically on the onto-ns.com domain under the /meta path.

The main upgrade revolves around the CLI, implemented to facilitate uploading entities.
It does so by connecting to the REST API service, authenticating via SINTEF's GitLab OAuth2 flow.

A PyPI release is not planned yet. To install this package run (in a virtual environment):

pip install "entities-service[cli] @ git+https://github.com/SINTEF/entities-service.git"

Then run:

entities-service --help

to learn more about the CLI.

Improve logging

There's a lot of unnecessary logging concerning which backend client is used, etc.
It would also be nice to log the user name of whoever is uploading or performing other actions.

Supporting URI values not being a URL

URIs do not need to be a valid URL. The current core assumption of the entities service is that they are - specifically with a domain matching where the service is exposed and running.

Instead, the service should be changed to be more of a data catalogue type service, where it can run on an "agnostic" domain, and one simply uses URL query parameters (or similar) to ask for an entity with a specific URI/identity. The database backend can then be non-specific, or even explicitly specified via another query parameter, if so desired.

This is another interpretation of the issue outlined originally by @sygout in #7.

Multiple database acting as source of entities

Define the modifications of the service to handle multiple databases where entities are stored.

The assumption is that different communities might use different databases to store their entities.

The Onto-ns service could then explore and recover these entities and check whether the uri exists in one or more of the databases: if not, return an error; if it exists in multiple but exactly equivalent copies, return the entity (but inform the database managers about the duplication); and if it exists in multiple, differing copies, return an error and inform the database managers.

Minimize code repeats in SOFT models

Almost all of the code for the SOFT5 and SOFT7 pydantic models is the same. This should be minimized by either extracting the common parts in a "base" SOFT model or have the oldest versioned model (SOFT5) be the "base", and the SOFT7 model then sub-classes SOFT5, making the necessary changes on top.

✨ Add a new `validate` command

This is useful for CI/CD workflows.
This may be used also as a pre-commit hook.

It should do everything the upload command does, without actually uploading anything.

πŸ“„ Document the CLI API

Typer has a possibility to do automatic CLI documentation, which should be exploited to get a CLI API documentation.

Help script for uploading entities

It would be helpful to have a script for uploading entities to be served at http://onto-ns.com/meta/<entity.version>/<entity.name>.

It could also be used in a dedicated upload endpoint; however, that is considered out-of-scope of this issue, as it is out-of-scope for this service at this point.
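The serving-URL pattern above is straightforward to compose. A minimal sketch (the helper is illustrative, not part of the repository):

```python
def entity_url(version: str, name: str, base: str = "http://onto-ns.com/meta") -> str:
    """Compose the URL an entity is served at: <base>/<version>/<name>.

    Illustrative only: mirrors the pattern described in this issue.
    """
    return f"{base}/{version}/{name}"


print(entity_url("0.1", "MyEntity"))  # http://onto-ns.com/meta/0.1/MyEntity
```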

✨ New option `--strategy` for the `upload` command

An upload strategy option would be good to be more explicit and specific when wanting to pre-determine answers to how entities should be handled if they already exist.

This would make it more explicit than using the --auto-confirm to simply use the current defaults.

An access token is not tested properly in the service

Due to some last-minute (bad) corrections in #110, the verify_user_access_token() function in the entities_service.service.security module is not invoked at the proper time. The call to it should be moved "up" to after calling the OpenID userinfo endpoint and getting a non-successful response.

Release summary v0.6.0

Support identity and ensure dimensions is present

identity is now allowed as an alias for uri. This is in accord with regular SOFT schemas.

The dimensions key is now always returned when retrieving an entity, even if empty. This is done mainly to support DLite usage, since DLite cannot handle entities that do not explicitly define the dimensions key, even though it may be empty.

pre-commit.ci

The DX has been optimized by using pre-commit.ci to run the pre-commit hooks on PRs, as well as to auto-update the hooks weekly as part of the repository's CI/CD.

Rely on soft7 package for pydantic models

Once the soft7 package has matured a bit more, the pydantic models there should be imported and utilized here in favor of the existing models.

This mainly reduces code duplication across repositories and collects the "truth" at a more reasonable place.

πŸ” Support supplying access token to upload

This ensures the CLI can be used in CI/CD workflows, where opening and authorizing a user through a browser window is not possible.

The idea here is to allow passing an access token or similar to the CLI that represents a user, who has already authorized the access for the app on the authorization platform (GitLab @ SINTEF as of the time of writing).
