central_model_repo's Introduction

Central Model Registry

To use this project, you need Python 3.x and either pip or conda for package management.

Installing project requirements

pip install -r unit-requirements.txt

Unit testing

For local unit testing, please use pytest:

pytest tests/unit

CI/CD pipeline settings

Set the following secrets or environment variables in your CI/CD system, following the documentation for GitHub Actions or Azure DevOps Pipelines:

  • QA_HOST
  • QA_TOKEN
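
As a rough illustration of how a pipeline step or integration test might consume these settings, the sketch below reads both variables and fails early if either one is missing. Only the names QA_HOST and QA_TOKEN come from the list above; the helper `load_qa_settings` is hypothetical and not part of this project.

```python
import os


def load_qa_settings(env=None):
    """Return (host, token) for the QA workspace, or raise if one is unset.

    Hypothetical helper for illustration; not part of this project.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("QA_HOST", "QA_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing CI/CD settings: " + ", ".join(missing))
    # Strip a trailing slash so the host can be joined with API paths safely.
    return env["QA_HOST"].rstrip("/"), env["QA_TOKEN"]
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in unit tests.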

Creating the distribution

To create the distribution:

python setup.py sdist bdist_wheel
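
Since the module version lives in central_model_registry/__init__.py, a build script may want to read it from there rather than duplicate it. The following is a minimal sketch, assuming the version is stored as a `__version__ = "x.y.z"` assignment; that variable name and both helper functions are assumptions, not something this project is confirmed to use.

```python
import re
from pathlib import Path

# Assumption: the version is written as `__version__ = "x.y.z"` on its own line.
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def parse_version(source):
    """Extract the version string from the text of an __init__.py file."""
    match = _VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def read_package_version(init_path="central_model_registry/__init__.py"):
    """Read the file at init_path and return the version it declares."""
    return parse_version(Path(init_path).read_text(encoding="utf-8"))
```

Parsing the file as text avoids importing the package, and therefore its dependencies, at build time.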

Saving the distribution to DBFS

With the Databricks CLI:

databricks fs cp /local/path/to/your/wheel dbfs:/path/to/your/wheel --profile <dev-workspace-profile-name>

You can then install the wheel on your cluster by specifying the path dbfs:/path/to/your/wheel in the Libraries tab of the cluster.
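
If you prefer to script the upload, the copy command shown above can be wrapped in Python. This is only a sketch: it assumes the databricks CLI is installed and on the PATH, and the function names are hypothetical. It builds exactly the `databricks fs cp` invocation from this section and adds no other flags.

```python
import subprocess


def build_dbfs_copy_command(local_wheel, dbfs_path, profile):
    """Return the argument list for `databricks fs cp` (hypothetical helper)."""
    if not dbfs_path.startswith("dbfs:/"):
        raise ValueError("destination must start with dbfs:/")
    return ["databricks", "fs", "cp", local_wheel, dbfs_path, "--profile", profile]


def upload_wheel(local_wheel, dbfs_path, profile):
    """Run the copy; requires the databricks CLI and a configured profile."""
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_dbfs_copy_command(local_wheel, dbfs_path, profile), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to verify the arguments without touching a workspace.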

Secret scopes

According to the documentation, a personal access token must be created in the central model registry workspace, and secrets must be created in each dev/qa/prod workspace so that it can access the central model registry. Example with the Databricks CLI:

databricks secrets create-scope --scope <scope> --initial-manage-principal users --profile <my_env_profile>
databricks secrets put --scope <scope> --key <key>-host --string-value <workspace_url> --profile <my_env_profile>
databricks secrets put --scope <scope> --key <key>-token --string-value <personal_access_token> --profile <my_env_profile>
databricks secrets put --scope <scope> --key <key>-workspace-id --string-value <workspace_id> --profile <my_env_profile>
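
Once the three secrets exist, a notebook can point MLflow at the remote registry. The sketch below assumes the Databricks convention for multi-workspace registries, where the registry URI has the form `databricks://<scope>:<key>` and MLflow resolves the `-host`, `-token`, and `-workspace-id` secrets from that prefix. The function names are hypothetical, and `<scope>` and `<key>` must match the values used in the commands above.

```python
def central_registry_uri(scope, key_prefix):
    """Build the registry URI that refers to the secrets created above."""
    if not scope or not key_prefix:
        raise ValueError("both scope and key_prefix are required")
    return f"databricks://{scope}:{key_prefix}"


def use_central_registry(scope, key_prefix):
    """Point MLflow at the central registry (run inside a Databricks notebook)."""
    # Imported lazily so central_registry_uri works without MLflow installed.
    import mlflow

    mlflow.set_registry_uri(central_registry_uri(scope, key_prefix))
```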

Training and tracking a new model

Here is the process to train and track a new model version:

  • if this is the first time you are training a model for this project, or if you changed parts of the module that the training pipeline uses:
    • update the module's version in central_model_registry/__init__.py
    • rebuild the distribution via python setup.py sdist bdist_wheel
    • copy the wheel to DBFS in the dev environment (replace the paths): databricks fs cp /path/to/new_wheel_version dbfs:/your/user/path/new_wheel_version --profile <dev-workspace-profile>
  • import the training notebook from central_model_registry/notebooks/training.py into your dev environment
  • if it's the first time you're training a model for this project, create the parent experiment directory in the dev environment: /experiments/central-model-registry
  • attach the training notebook to an interactive cluster
  • install the wheel on the cluster, either as a cluster library or as a notebook-scoped library. If you're using a cluster library, make sure you remove existing (previous) versions of your wheel
  • train models as you wish (this is the part you can modify)
  • register the best model if its performance is good enough
  • check the new model version at the end of the notebook
  • update the model version in the model.json file to match the new version
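
The last step of the process above can be scripted. The following is a minimal sketch that assumes model.json contains a single JSON object; the key name `model_version` and the helper itself are hypothetical, so adjust them to the actual layout of the file.

```python
import json
from pathlib import Path


def update_model_version(new_version, path="model.json", key="model_version"):
    """Set `key` to new_version in the JSON file, keeping all other fields.

    Assumption: the file holds one JSON object; the key name is hypothetical.
    """
    json_path = Path(path)
    data = json.loads(json_path.read_text(encoding="utf-8"))
    data[key] = new_version
    json_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data
```

For example, `update_model_version(3)` would record version 3 after the notebook reports it as the newly registered version.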

central_model_repo's People

Contributors

frenchlam
