
substra-tests's Introduction



Substra


Substra is an open source federated learning (FL) software. It enables the training and validation of machine learning models on distributed datasets. It provides a flexible Python interface and a web application to run federated learning training at scale. This specific repository is the low-level Python library used to interact with a Substra network.

Substra's main usage is in production environments. It has already been deployed and used by hospitals and biotech companies (see the MELLODDY project for instance). Substra can also be used on a single machine to perform FL simulations and debug code.

Substra was originally developed by Owkin and is now hosted by the Linux Foundation for AI and Data. Today Owkin is the main contributor to Substra.

Join the discussion on Slack and subscribe to our newsletter here.

To start using Substra

Have a look at our documentation.

Try out our MNIST example.

Support

If you need support, please either raise an issue on Github or ask on Slack.

Contributing

Substra warmly welcomes any contribution. Feel free to fork the repo and create a pull request.

Setup

To set up the project in development mode, run:

pip install -e ".[dev]"

To run all tests, use the following command:

make test

Some of the tests require Docker to be running on your machine.

Code formatting

You can opt into auto-formatting of code on pre-commit using Black.

This relies on hooks managed by pre-commit, which you can set up as follows.

Install pre-commit, then run:

pre-commit install

Documentation generation

To generate the documentation for the command line interface, the SDK and the schemas, Python 3.8 is required. Run the following command:

make doc

Documentation will be available in the references/ directory.

Changelog generation

The changelog is managed with towncrier. To add a new entry to the changelog, add a file in the changes folder. The file name must follow the structure <unique_id>.<change_type>, where unique_id is a unique identifier (we currently use the PR number) and change_type is one of: added, changed, removed, fixed.
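
For example, to record a change introduced by a hypothetical PR #123, create a file changes/123.added containing a one-line description of the change:

echo "Add a fast mode to the test suite." > changes/123.added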

To generate the changelog (for example during a release), use the following command (you must have the dev dependencies installed):

towncrier build --version=<x.y.z>

You can use the --draft option to preview what would be generated without actually writing to the changelog (and without removing the fragments).
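
For example:

towncrier build --version=<x.y.z> --draft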

substra-tests's People

Contributors

acellard, alexandrepicosson, aureliengasser, camillemarinisonos, cboillet-dev, clairephi, clementgautier, dependabot[bot], esadruhn, fabien-gelus, grouane, guilhem-barthes, guillaumecisco, hamdyd, inalgnu, jmorel, kelvin-m, louishulot, maeldebon, mblottiere, milouu, oleobal, olivierdehaene, samlesu, sdgjlbl, sergebouchut2, thbcmlowk, thibaultfy, thibaultrobert, thibowk

substra-tests's Issues

Improve default algo / metrics / opener

The current implementations of the algo, the metrics and the opener return hard-coded values that do not depend on the input values.

This makes it impossible to check that the output models, predictions and scores are computed as expected.

It would be better if the implementations of these assets made sense from a machine learning point of view.

Tasks:

  • Define the structure to use (must be simple to implement and simple to use, i.e. the expected results should be easy to compute on the test side)
  • Implement it
  • Improve the existing tests to ensure the task outputs are correct
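
As a minimal sketch of what an ML-meaningful default metric could look like (the function below is purely illustrative, not the actual interface expected by Substra), a plain accuracy score depends on its inputs and its expected value is easy to recompute on the test side:

import numpy as np

def score(y_true, y_pred):
    # Plain accuracy: depends on the inputs, and the expected value is
    # trivial to recompute by hand in the test.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())

# e.g. score([0, 1, 1, 0], [0, 1, 0, 0]) == 0.75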

Factory: validate factory content

This would help validate the data before sending it to the server.
We also need an option to disable the validation, so that invalid objects can be sent for testing purposes.

Extra: investigate how these classes could be used to generate dynamic documentation of the input structures. This documentation is currently written manually and has not always been kept up to date.
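
One possible shape for this, sketched with hypothetical names (DatasetSpec and create_dataset_spec are illustrations, not the current factory API): each spec validates itself, and the factory exposes a flag to skip validation so that deliberately invalid objects can still be produced:

import dataclasses

@dataclasses.dataclass
class DatasetSpec:
    name: str
    data_opener: str

    def validate(self):
        # Minimal example rules; real rules would mirror the server-side checks.
        if not self.name:
            raise ValueError("name must not be empty")
        if not self.data_opener.endswith(".py"):
            raise ValueError("data_opener must point to a Python file")

def create_dataset_spec(name, data_opener, validate=True):
    spec = DatasetSpec(name=name, data_opener=data_opener)
    if validate:
        spec.validate()
    return spec

# validate=False lets a test send an invalid object on purpose.
invalid = create_dataset_spec("", "opener.txt", validate=False)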

Google Cloud disk deletion does not work

It seems that the disk deletion logic stopped working as of today: see this line in today's cron action vs yesterday's.

My best guess is that the disk naming scheme changed: the filter no longer finds any disks related to the task, and the next command fails.

Failure on test tests/test_execution_compute_plan.py::test_compute_plan

Test traceback:

        # check all traintuples are done and check they have been executed on the expected
        # node
        for t in traintuples:
            assert t.status == assets.Status.done

        traintuple_1, traintuple_2, traintuple_3 = traintuples

        assert len(traintuple_3.in_models) == 2

>       assert traintuple_1.dataset.worker == session_1.node_id
E       AssertionError: assert 'MyOrg2MSP' == 'MyOrg1MSP'
E         - MyOrg2MSP
E         ?      ^
E         + MyOrg1MSP
E         ?      ^

The traintuples list may not be ordered as expected.

This is a random failure, I cannot reproduce it.
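
If the ordering is indeed the cause, one possible fix is to stop relying on the order returned by the API; a sketch, assuming traintuples expose a rank attribute:

# Sort by rank before unpacking, so traintuple_1/2/3 match the specs they
# were created from even if the API returns them in a different order.
traintuples = sorted(traintuples, key=lambda t: t.rank)
traintuple_1, traintuple_2, traintuple_3 = traintuples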

Not compatible with python 3.6

It works with Python 3.7 but not with Python 3.6.

To reproduce:

Create a foo.yaml file with the following content:

algo:
  hash: a261b4558a46c0f8a5285b08316298609aaea29d0f30e1e2f51dfacaa27aa857
  name: cc6f5e17ce9542d3b41f02c9fbbefe38_global - Algo 0
  storageAddress: http://substra-backend.node-1.com/algo/a261b4558a46c0f8a5285b08316298609aaea29d0f30e1e2f51dfacaa27aa857/file/
computePlanID: ''
creator: MyOrg1MSP
dataset:
  keys:
  - 252a74f10f897b38a267bdc8a3d53d2a2738df689ecf244f046d046c43ba97f3
  - 4c6df43a64ced52e00b3fe702530ab1c2403d050dcc0939bf01d4149d15452dd
  - bf2329236d0e7156c464ed9bb3352af4dc5c0b929d9307fb4cb7771306140976
  - ec8c7f6e823cf46d49f066c7b74e30012b69c381213e0826935fa13d8d69bd3a
  openerHash: 109a87c1872558fcd3a01bc17e3b530c0de8ce592575fa06b5108b4162cee76b
  worker: MyOrg1MSP
inModels:
- hash: 72328140d2926398580faf7cf2322e851376d7f2ea5cc75a8e50c8e217695c0b
  storageAddress: http://substra-backend.node-1.com/model/72328140d2926398580faf7cf2322e851376d7f2ea5cc75a8e50c8e217695c0b/file/
  traintupleKey: 283e9ccd74bccb12413d50ad9035dcae7732e74601acd86da7dc0d23d3cbb7bf
key: 6ef5f35a529f6af84162163b90f7df4f1203ed23cc202a730e5140411d553249
log: ''
outModel:
  hash: 13b9e485072de1e36ff575661b52160231393ff23122172cd544fcac44ca5d31
  storageAddress: http://substra-backend.node-1.com/model/13b9e485072de1e36ff575661b52160231393ff23122172cd544fcac44ca5d31/file/
permissions:
  process:
    authorizedIDs: []
    public: true
rank: 0
status: done
tag: ''

Create a foo.py file with the following content:

import yaml

import substratest as sbt

with open('foo.yaml') as f:
    d = yaml.safe_load(f)

a = sbt.assets.Traintuple.load(d)
print(a)

Create the following Dockerfile:

FROM python:3.6

RUN pip install pyyaml substra

ADD requirements.txt .
RUN pip install -r requirements.txt


COPY substratest substratest/

ADD foo.yaml .
ADD foo.py .

ENV PYTHONPATH .

RUN python3 foo.py

Run the following command:

docker build -t substratest .

Once this is fixed, we will have to update the main README.

SUBSTRAT_* global variables

This project uses SUBSTRAT as a prefix for global variables (note the extra T).
All other projects in the Substra ecosystem use the SUBSTRA prefix.

For consistency and to avoid confusion, this project should use the SUBSTRA_* prefix for its global variables.

Split session object into client and state

Currently each session has its own state which contains the assets created (and owned) by the session during the current test run.

The objective of the session is to contain all the assets necessary for the execution of a test. For example, the global_execution_env fixture returns sessions that contain all the assets necessary to run test_aggregatetuple.

In order to better reflect this, we'd like to separate the session object into two independent objects:

  • Client (replaces Session) is a wrapper around substra.Client to handle serialization
  • State is an independent object with private properties (state._dataset) and public methods to access assets owned by a given node (state.get_dataset(node_id))

Fixtures such as global_execution_env would then not return a factory and a network but instead a factory and a state containing all assets created by the fixture.

Individual tests such as test_aggregatetuple would therefore need to take as arguments the global_execution_env fixture and a clients fixture (or multiple individual fixtures client_1, client_2, etc.).

This will lead to a pretty big renaming throughout the tests (no more sessions, only clients and states instead).
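
A rough sketch of what this split could look like (all names and methods below are illustrative only, not a final design):

class Client:
    """Wrapper around substra.Client handling (de)serialization (sketch)."""

    def __init__(self, node_id, substra_client):
        self.node_id = node_id
        self._client = substra_client  # an instance of substra.Client

    def add_dataset(self, spec):
        # Serialize the spec, call the SDK, return the created asset.
        return self._client.add_dataset(spec.to_dict())


class State:
    """Assets created by the fixtures, with public per-node accessors (sketch)."""

    def __init__(self):
        self._datasets = []  # private storage

    def add_dataset(self, dataset):
        self._datasets.append(dataset)

    def get_dataset(self, node_id):
        return next(d for d in self._datasets if d.owner == node_id)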

Substratest: Asset objects: add validation

Currently, asset objects are defined as dataclasses with type hints: this does not check that each field has the correct type.

We could implement a validation layer to ensure that the objects returned by the SDK are as expected.

The first step is to do some research and propose a solution (look at mypy, pydantic, marshmallow, ...).

It should be easy, with the chosen solution, to convert camelCase fields to snake_case fields and to rename some fields (as is currently done with the Meta class for a few assets).
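
As an illustration, here is a minimal sketch with pydantic (one of the candidates above, using its v1-style Config API): an alias generator handles the camelCase/snake_case conversion, and explicit aliases can rename individual fields:

import pydantic


def to_camel(name: str) -> str:
    # "storage_address" -> "storageAddress"
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


class OutModel(pydantic.BaseModel):
    hash_: str = pydantic.Field(alias="hash")  # explicit rename of a single field
    storage_address: str

    class Config:
        alias_generator = to_camel              # camelCase input from the SDK
        allow_population_by_field_name = True   # snake_case still accepted in tests


m = OutModel.parse_obj({"hash": "abc123", "storageAddress": "http://example.org/model/abc123/file/"})
print(m.storage_address)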

Tests: add a fast mode

We could have two ways to run the tests:

  • fast: a quick development loop that runs only a few relevant tests (should complete in less than X minutes); see the sketch after this list
  • full: run all the tests
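
One way to implement the split, sketched with a hypothetical slow marker (not an existing convention in this repo), is to tag the long-running tests and deselect them in fast mode:

# conftest.py (sketch): register the marker so pytest does not warn about it
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: long-running end-to-end test")

# test module (sketch): tag the tests that only the full mode should run
import pytest

@pytest.mark.slow
def test_compute_plan_end_to_end():
    ...

Running pytest -m "not slow" would then be the fast mode, while make test stays the full run.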

Test fails: test_compute_plan_aggregate_composite_traintuples

The test fails with all repos on master.


Investigation

I dug a little bit, without getting to the bottom of it.

    tuples = (cp.list_traintuple(sessions[0]) +
              cp.list_composite_traintuple(sessions[0]) +
              cp.list_aggregatetuple(sessions[0]) +
              cp.list_testtuple(sessions[0]))

    print(len(cp.list_traintuple(sessions[0])))
    print(len(cp.list_composite_traintuple(sessions[0])))
    print(len(cp.list_aggregatetuple(sessions[0])))
    print(len(cp.list_testtuple(sessions[0])))
    print("---")
    print(len(cp.list_traintuple(sessions[1])))
    print(len(cp.list_composite_traintuple(sessions[1])))
    print(len(cp.list_aggregatetuple(sessions[1])))
    print(len(cp.list_testtuple(sessions[1])))
    print("---")
    print(len(tuples))
    for t in tuples:
        print(t.key + " / " + t.status)
        #assert t.status == 'done'

gives

0
0
0
2
---
0
0
0
2
---
2
534095158798302969a0cd657f90800b32f6a5f5b32569c4aeedec56abfbbed0 / failed
aa41f9171727357845258e449fd86e1f2b332916e2f3157191a9e2e304b0520a / failed

which looks wrong.

In the backend, I printed the result of data = query_ledger(fcn='queryCompositeTraintuples', args=[]) and the tuples are there (with the right compute plan id). So I suspect there's an issue with the filtering.

cp.list_traintuple doesn't return tuple in their creation order

In test_execution_compute_plan.py, multiple tuple specs are generated (e.g. tuple_spec_1 and tuple_spec_2) and, once created, the matching tuples are retrieved using:

tuple_1, tuple_2 = cp.list_tuple()

However, there is no guarantee that tuple_1 matches tuple_spec_1 here. Our tests work because the tuples are chained and therefore have ever-increasing ranks, and because the list_tuple methods all sort the tuples by rank. Without that, our checks would fail.
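
A more robust pattern would be to look the created tuples up by key instead of relying on any ordering; a sketch, assuming the keys returned at creation time are kept around (traintuple_key_1 and traintuple_key_2 below are placeholders):

# Index the tuples by key so each one can be matched to the spec it came from,
# regardless of the order in which list_traintuple returns them.
tuples_by_key = {t.key: t for t in cp.list_traintuple()}

tuple_1 = tuples_by_key[traintuple_key_1]
tuple_2 = tuples_by_key[traintuple_key_2]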

CI: Kaniko cache doesn't always work as expected

The cache of Kaniko (the Docker image builder) doesn't always work as expected.

I had an example where we changed the Dockerfile of celeryworker. The build of the new Docker image worked fine in gcloud build. However, pulling the image led to a Docker error:

failed to register layer: Error processing tar file(exit status 1): failed to mknod("/usr/share/doc/adduser", S_IFCHR, 0): file exists

Full event log for the pod

 Normal   Scheduled         3m34s                  default-scheduler                                             Successfully assigned org-1/backend-org-1-substra-backend-worker-6696b5f5c-x9z94 to gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n
 Normal   BackOff           87s                    kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Back-off pulling image "eu.gcr.io/substra-208412/celeryworker:ci-0fffa243c58436a4d12e2b3ab971f8023ed0ee50"
 Warning  Failed            87s                    kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Error: ImagePullBackOff
 Normal   Pulling           75s (x2 over 3m24s)    kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Pulling image "eu.gcr.io/substra-208412/celeryworker:ci-0fffa243c58436a4d12e2b3ab971f8023ed0ee50"
 Warning  Failed            3s (x2 over 88s)       kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Failed to pull image "eu.gcr.io/substra-208412/celeryworker:ci-0fffa243c58436a4d12e2b3ab971f8023ed0ee50": rpc error: code = Unknown desc = failed to register layer: Error processing tar file(exit status 1): failed to mknod("/usr/share/doc/adduser", S_IFCHR, 0): file exists
 Warning  Failed            3s (x2 over 88s)       kubelet, gke-substra-tests-i7lyc8-default-pool-4ef6ad12-zz2n  Error: ErrImagePull

Running run-ci.py with the --no-cache option fixed it, which shows that the issue had to do with reusing cache layers. Note that subsequent runs of run-ci.py without the --no-cache option also succeeded: the cache layers were now sane.

Proposed fix:

  • Fix nightly test retries in .travis.yaml:
    • Only the first attempt does a full rebuild
    • The second attempt uses the cache
    • No third attempt (the Travis max build time is 50 minutes: not enough time for three attempts)
