
substra-tools's Introduction



Substra


Substra is open source federated learning (FL) software. It enables the training and validation of machine learning models on distributed datasets. It provides a flexible Python interface and a web application to run federated learning training at scale. This specific repository is the low-level Python library used to interact with a Substra network.

Substra's main usage is in production environments. It has already been deployed and used by hospitals and biotech companies (see the MELLODDY project for instance). Substra can also be used on a single machine to perform FL simulations and debug code.

Substra was originally developed by Owkin and is now hosted by the Linux Foundation for AI and Data. Today Owkin is the main contributor to Substra.

Join the discussion on Slack and subscribe to our newsletter.

To start using Substra

Have a look at our documentation.

Try out our MNIST example.

Support

If you need support, please either raise an issue on GitHub or ask on Slack.

Contributing

Substra warmly welcomes any contribution. Feel free to fork the repo and create a pull request.

Setup

To set up the project in development mode, run:

pip install -e ".[dev]"

To run all tests, use the following command:

make test

Some of the tests require Docker to be running on your machine.

Code formatting

You can opt into auto-formatting of code on pre-commit using Black.

This relies on hooks managed by pre-commit, which you can set up as follows.

Install pre-commit, then run:

pre-commit install

Documentation generation

Generating the command line interface, SDK and schemas documentation requires Python 3.8. Run the following command:

make doc

Documentation will be available in the references/ directory.

Changelog generation

The changelog is managed with towncrier. To add a new entry to the changelog, add a file to the changes folder. The file name should have the following structure: <unique_id>.<change_type>. The unique_id is a unique identifier; we currently use the PR number. The change_type must be one of the following: added, changed, removed, fixed.
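As a rough illustration of the naming rule above, the following Python sketch writes a fragment file named <unique_id>.<change_type> into a changes directory. The helper name write_fragment, its parameters and the entry text are assumptions made for this example; they are not part of towncrier or of this repository.

```python
from pathlib import Path

# Change types listed in this README.
ALLOWED_TYPES = {"added", "changed", "removed", "fixed"}


def write_fragment(changes_dir, unique_id, change_type, text):
    """Create <unique_id>.<change_type> in changes_dir and return its path.

    Hypothetical helper: it only mirrors the file naming convention
    described above and does not call towncrier itself.
    """
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown change type: {change_type!r}")
    directory = Path(changes_dir)
    directory.mkdir(parents=True, exist_ok=True)
    fragment = directory / f"{unique_id}.{change_type}"
    # End the entry with exactly one trailing newline.
    fragment.write_text(text.rstrip("\n") + "\n", encoding="utf-8")
    return fragment
```

For instance, write_fragment("changes", 123, "fixed", "Describe the fix here.") would create changes/123.fixed, where 123 stands in for the PR number.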

To generate the changelog (for example during a release), use the following command (you must have the dev dependencies installed):

towncrier build --version=<x.y.z>

You can use the --draft option to see what would be generated without actually writing to the changelog (and without removing the fragments).


substra-tools's Issues

rename the execute function

In function.py we have two different execute functions. When reading a call such as function.execute(...), it is really difficult to know which one is used: we have to track the type of the function object.
I propose renaming one of them, for example execute_function for the one in the wrapper or execute_cli for the one running the CLI.

Implement model serializers to simplify algo definition

We could define a few serializers in substra-tools so that users do not have to write their own.

We could create numpy, json and pickle serializers to start with.

Proposed API:

class MyAlgo(tools.Algo):
    model_serializer = tools.serializers.NUMPY

    def train(self):
        ...

Note: we should not break the existing API; it should still be possible to define the load and save methods manually.
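To make the proposal more concrete, here is a minimal, self-contained sketch of how such serializers could plug into an algo class. Everything in it is an assumption for illustration: the Algo class is a stand-in for tools.Algo, and the names JsonSerializer, PickleSerializer, save_model and load_model are hypothetical, not the substra-tools API. Only the standard library is used, so a numpy serializer is left out.

```python
import json
import pickle


class JsonSerializer:
    """Hypothetical serializer storing the model as a JSON document."""

    @staticmethod
    def save(model, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(model, f)

    @staticmethod
    def load(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)


class PickleSerializer:
    """Hypothetical serializer using pickle (only load files you trust)."""

    @staticmethod
    def save(model, path):
        with open(path, "wb") as f:
            pickle.dump(model, f)

    @staticmethod
    def load(path):
        with open(path, "rb") as f:
            return pickle.load(f)


class Algo:
    """Stand-in for tools.Algo (assumption, not the real base class)."""

    model_serializer = None  # subclasses may set a serializer class here

    def save_model(self, model, path):
        if self.model_serializer is None:
            raise NotImplementedError("set model_serializer or override save_model")
        self.model_serializer.save(model, path)

    def load_model(self, path):
        if self.model_serializer is None:
            raise NotImplementedError("set model_serializer or override load_model")
        return self.model_serializer.load(path)


class MyAlgo(Algo):
    # New style: pick a ready-made serializer instead of writing load/save.
    model_serializer = JsonSerializer

    def train(self, X, y):
        # Placeholder "model": one zero weight per input sample.
        return {"weights": [0.0] * len(X)}


class ManualAlgo(Algo):
    # Existing style keeps working: override the two methods by hand.
    def save_model(self, model, path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(str(model))

    def load_model(self, path):
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
```

In this sketch the base class only falls back to the serializer when a subclass does not override the methods itself, which is one way to keep manually defined load and save methods working.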

Get rid of substratools docker images

As substratools is now open source, we no longer need to build and provide Docker images.
End users could install substratools directly through pip, without creating a Dockerfile that inherits from substratools.

It will allow them to choose the base image.

One of the issues is that substra-backend currently relies on the working directory inside the container being /sandbox (hardcoded value).

That is why, before getting rid of the substratools images, we must first ensure that:

  • all paths can be defined in substratools command line interfaces
  • substra-backend is passing the expected paths through the command line interface

We will then be able to update all the Dockerfiles and stop inheriting from substratools images.

EDIT:

⚠️ The substra-tools image is really useful for a quick development loop: update the substratools code, rebuild the image and launch the tests locally. If substratools is installed from PyPI in the end-to-end tests, we won't be able to test new substratools changes. So I think we should continue to publish substratools Docker images to the Docker registry and use them for the end-to-end tests.

Add documentation on how to test manually algo / opener scripts

This is an alternative to the run-local command.

Example of a Python script to test an algo/opener:

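import tensorflow as tf  # only needed if your algo relies on TensorFlow

import algo.algo as algo
import dataset.opener as opener

# Load the features and labels through the opener.
o = opener.Opener()
X = o.get_X(["dataset/train/train1"])
y = o.get_y(["dataset/train/train1"])

# Instantiate the algo and run one training call on the loaded data.
a = algo.ModelComp(local_folder='./sandbox/local/')
pred, model = a.train(X, y, None, None, 0)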

No package description on pypi.org

The description of the substratools package is not fetched from the setup.py file, whereas this works for the substra package. On both projects, the uploader had a Travis token added beforehand.
Any idea why this is happening?
It would be great to have description='Python tools to submit algo on the Substra platform' displayed to the world!

allow creation of Dockerfile without substratools base image

One of the issues is that substra-backend currently relies on the working directory inside the container being /sandbox (hardcoded value).

Tasks to be done:

  • all paths can be defined in substratools command line interfaces (no more hardcoded paths)
  • substra-backend is passing the expected paths through the command line interface
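The two tasks above could look roughly like the following Python sketch, in which every path is a command line option that defaults to a location under /sandbox. The option names (--data-path, --models-path, --output-path), the subdirectory names and the function names are hypothetical and chosen only for this example; they are not the actual substratools command line interface.

```python
import argparse
import posixpath

# The working directory that is currently hardcoded, according to the issue.
DEFAULT_ROOT = "/sandbox"


def build_parser():
    """Build a parser where each path is configurable (hypothetical options)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a CLI without hardcoded paths"
    )
    # Defaults keep the current /sandbox layout, so callers that pass no
    # options see the same behaviour as before.
    parser.add_argument("--data-path", default=posixpath.join(DEFAULT_ROOT, "data"))
    parser.add_argument("--models-path", default=posixpath.join(DEFAULT_ROOT, "model"))
    parser.add_argument("--output-path", default=posixpath.join(DEFAULT_ROOT, "pred"))
    return parser


def resolve_paths(argv=None):
    """Parse argv (sys.argv[1:] when None) and return the paths as a dict."""
    args = build_parser().parse_args(argv)
    return {
        "data": args.data_path,
        "models": args.models_path,
        "output": args.output_path,
    }
```

With this shape, a backend would pass the paths it expects, for example resolve_paths(["--data-path", "/work/data"]), and a Dockerfile could then use any base image and working directory.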
