mlops's Introduction

README


This repo can be used as a starter kit to set up a fully Git-integrated Machine Learning Operations (MLOps) environment using Cloud Pak for Data and (in the future) watsonx. As an example, it uses a simple "credit score prediction" use case that is split into 4 Jupyter notebooks and can easily be adapted to your business problem.

It tries to be as simple as possible while showing the basic concepts of MLOps using IBM tools. The intended use is that, after you have set everything up and familiarized yourself with the concepts, you throw out all the "credit score prediction" code and replace it with whatever problem you are trying to solve.

high level overview using three stages

Setup Instructions

These instructions will guide you through the setup of a simple MLOps environment that uses just two stages ("dev" and "prod"). The setup can be easily extended to more stages if needed.

It is assumed that you have a "Cloud Pak for Data" (CP4D) instance available and that you have admin rights to it. (This will not work with the cloud-based "as a Service" offering.)

detailed view using two stages

1. Fork this repo

Click the "Fork" button in the upper right corner of this repo. IMPORTANT: uncheck the "only fork the master branch" checkbox. This will create a copy of this repo in your own GitHub account. We will be using this copy in the following steps.

2. Create one git-enabled project called "00-datascience-playground"

Overview

This is the project that we are creating in this step.

Step by step

Navigate to "All projects".

Create a project that is "integrated with Git". The next window asks for the GitHub repo address and a personal access token, so let's create that token first.

Navigate to https://github.com/settings/tokens and choose "Generate new token". Give it a name and select the "repo" scope.

Copy the generated token to your clipboard. You will not be able to see it again after you close the window.

Make this token available within your CP4D instance by choosing "New Token" and pasting the token you just created. Once it is created, use the dropdown to select it.

Add the repo URL (don't forget the .git at the end ;-) and choose the main branch. Then hit "Create".
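The ".git" suffix is easy to forget when pasting the repo URL. The following is a minimal sketch of a check you could run before pasting it. The helper name normalize_repo_url is hypothetical and not part of this repo, and it assumes that you use an https:// URL together with the token.

```python
def normalize_repo_url(url: str) -> str:
    """Return the repo URL with surrounding whitespace removed and '.git' appended if missing.

    Hypothetical helper for illustration; assumes an https:// repository URL.
    """
    url = url.strip().rstrip("/")
    if not url.startswith("https://"):
        raise ValueError("expected an https:// repository URL")
    if not url.endswith(".git"):
        url += ".git"
    return url


if __name__ == "__main__":
    # Placeholder account and repo name, replace with your own fork.
    print(normalize_repo_url("https://github.com/YOUR_ACCOUNT/mlops"))
```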

Use the GitHub repo address and your personal access token. You can alter the notebooks to your needs if you want to, but it is important that you keep the names of the notebooks.

3. Create one git-enabled project called "01-staging-area"

Overview

This is the project that we are creating in this step.

Step by step

Navigate to "All projects": in your CP4D instance you access the project overview by clicking on the "Projects" icon in the upper left corner.

Then click on "New Project" and select "Create a project integrated with a Git repository". Give it the name "01-staging-area" and select "Create".

Use the same GitHub repo address and personal access token as in step 2.

4. Configure custom environment in "01-staging-area"

TODO: Add description here! (use custom_env.yaml)

5. Configure Jobs in "01-staging-area"

Overview

This is the project that we are creating in this step.

Step by step

Navigate to "View local branch".

Click "New code job".

Choose the first notebook "00-git-pull.ipynb" and click "Configure job".

Give it the same name as the notebook and click "Next". TODO: choose the correct environment for every job. Accept all the defaults and click "Next" until you can click "Create job". TODO: add the "was_successful" output to every job. Repeat those steps for all six notebooks.

Once you are done, the job list should contain one job per notebook.

We also need to create a .env file within the "01-staging-area" project. This file will contain the credentials that the pipeline will use to pull the code from github.

Click "Launch IDE" and then "JupyterLab" to get access to the JupyterLab environment.

You will be greeted by a tab called "Terminal 1". There you copy the following commands and hit enter:

# Single quotes keep the shell from expanding characters such as $ or ! in tokens and passwords.
echo 'repo_adresse=PUT_YOUR_REPO_ADDRESS_HERE' > .env
echo 'personal_access_token=PUT_YOUR_TOKEN_HERE' >> .env
echo 'project_id=PUT_YOUR_PROJECT_ID_HERE' >> .env
echo 'branch_name=main' >> .env
echo 'cpd_technical_user=PUT_USERNAME_HERE' >> .env
echo 'cpd_technical_user_password=PUT_PASSWORD_HERE' >> .env
echo 'cpd_url=PUT_URL_HERE' >> .env

cpd_technical_user is a user that was created only to be used as a proxy in those scripts. If no such user is available, you can also use a personal user (i.e. the credentials you use to log in), even though this is not best practice.

You can check if everything worked by typing

cat .env

If that command displays the content of the .env file you are good to go.
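Beyond looking at the file, you could also check from a notebook or the terminal that every expected key is present and non-empty. The following is a minimal sketch, not the loader the notebooks actually use: the function names parse_env and missing_keys are hypothetical, and the key list is simply taken from the echo commands above (including the spelling "repo_adresse").

```python
from pathlib import Path

# Keys written by the echo commands in this step; spelling kept exactly as in the file.
REQUIRED_KEYS = [
    "repo_adresse",
    "personal_access_token",
    "project_id",
    "branch_name",
    "cpd_technical_user",
    "cpd_technical_user_password",
    "cpd_url",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("missing keys:", missing_keys(parse_env(env_path.read_text())))
    else:
        print(".env not found in", Path.cwd())
```

An empty list means all seven entries were written. The sketch only checks that values exist, not that the credentials are valid.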

5. Create a NON-git-enabled project called "02-automation-area"

Overview

This is the project that we are creating in this step.

Step by step

Repeat the same steps as in 2 and 3, but choose "Create an empty project" to create a NON-git-enabled project. Name it "02-automation-area".

6. Configure pipeline in "02-automation-area"

Overview

These are the pieces we are creating in this step.

Step by step

TODO: add global parameters

Click "New Asset" and choose "Pipeline". Name the pipeline "mlops_pipeline".

Go to "Run" > "Run Notebook Job" and drag it onto the canvas. Then double-click this newly created node and click "Select Job".

Choose "01-staging-area", then the first notebook "00-git-pull.ipynb", and click "Choose" and then "Save".

TODO: choose environment. TODO: add pipeline params.

Repeat those steps for all notebooks until every notebook job has its own node in the pipeline.

Click "Run Pipeline" and then "Create job". Give it a name like "mlops_pipeline_job". IMPORTANT: The GitHub action assumes that you only have ONE job in this project. If you have more than one job, you will need to change the GitHub action accordingly.
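To illustrate why a second job matters, here is a minimal sketch of how a script could pick "the" job from a job listing and fail loudly when the project contains more than one. This is not the code of the GitHub action in this repo. The function name pick_single_job_id is hypothetical, and the response shape (a "results" list whose entries carry "metadata" with "asset_id" and "name") is an assumption about the jobs listing, so check it against the response of your own cluster.

```python
def pick_single_job_id(jobs_response: dict) -> str:
    """Return the asset ID of the only job in the listing.

    Assumed (not confirmed by this repo) response shape:
    {"results": [{"metadata": {"asset_id": "...", "name": "..."}}]}
    """
    results = jobs_response.get("results", [])
    if len(results) != 1:
        names = [job.get("metadata", {}).get("name", "?") for job in results]
        raise ValueError(f"expected exactly one job, found {len(results)}: {names}")
    return results[0]["metadata"]["asset_id"]


if __name__ == "__main__":
    # Made-up listing for illustration only.
    listing = {"results": [{"metadata": {"asset_id": "abc-123", "name": "mlops_pipeline_job"}}]}
    print(pick_single_job_id(listing))
```

Failing with an explicit error is a deliberate choice here: silently taking the first job would trigger whichever job happens to be listed first.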

7. Setup Github Actions

Overview

This is the piece that we are creating in this step.

Step by step

We need a set of secrets to be able to run the github actions. Those secrets are:

  • API_KEY
  • USER_NAME
  • CLUSTER_URL
  • PROJECT_ID
  • PERSONAL_ACCESS_TOKEN_GITHUB

We will now go through all those step by step:

Navigate to your fork of the GitHub repo, then "Settings" > "Secrets and variables" > "Actions" > "New repository secret".

7.1. Retrieving your CP4D API_KEY and USER_NAME

Go to the "Profile and settings" tab in your CP4D instance.

Copy the API key to your clipboard (and write it down somewhere; you will not be able to see it again after you close the window).

Go back to GitHub and create a new repository secret called "API_KEY".

Also create the repository secret "USER_NAME" using the username that you use to log in to your CP4D instance.

7.2. Retrieving your CP4D CLUSTER_URL

This one is simple :-)

Just take the URL of the cluster that you have been working on and use it to create a secret called "CLUSTER_URL".
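With API_KEY, USER_NAME and CLUSTER_URL in place, a script can request a bearer token from the cluster. The following is a minimal sketch that only builds the request. It assumes the Cloud Pak for Data platform endpoint /icp4d-api/v1/authorize with a JSON body containing "username" and "api_key". Whether the GitHub action in this repo authenticates exactly this way is an assumption, and the function name build_authorize_request is hypothetical.

```python
import json
import urllib.request


def build_authorize_request(cluster_url: str, user_name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a token request for a CP4D cluster.

    Assumes the platform endpoint /icp4d-api/v1/authorize; verify against your CP4D version.
    """
    url = cluster_url.rstrip("/") + "/icp4d-api/v1/authorize"
    body = json.dumps({"username": user_name, "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; in a workflow these would come from the repository secrets.
    request = build_authorize_request("https://cpd.example.com/", "PUT_USERNAME_HERE", "PUT_API_KEY_HERE")
    print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, because it needs a reachable cluster and possibly custom TLS settings.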

7.3. Retrieving your CP4D PROJECT_ID

(Screenshot of the project ID not available.) Use the project ID to create a secret called "PROJECT_ID".

7.4. Retrieving your GitHub PERSONAL_ACCESS_TOKEN_GITHUB

You can use the same token you used in step 2. If you don't have it anymore, you can create a new one by following the steps in 2.
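Once all five secrets exist, a workflow step could verify them before doing any real work. The following is a minimal sketch under the assumption that the workflow exposes each secret as an environment variable of the same name; the function name missing_secrets is hypothetical and not part of this repo.

```python
import os

# Secret names as listed at the beginning of step 7.
REQUIRED_SECRETS = [
    "API_KEY",
    "USER_NAME",
    "CLUSTER_URL",
    "PROJECT_ID",
    "PERSONAL_ACCESS_TOKEN_GITHUB",
]


def missing_secrets(env=None) -> list:
    """Return the names of required secrets that are unset or empty.

    `env` defaults to the process environment; pass a dict to test the logic.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("missing secrets:", ", ".join(missing))
    else:
        print("all secrets are set")
```

GitHub passes an unset secret to the workflow as an empty string, which is why empty values are treated as missing.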

8. Create deployment space

TODO: describe how to create deployment space

9. Setup monitoring using open scale

TODO: describe how to set up open scale

10. Try it out :-)

11. Future Work and known issues

  • Future Work:

    • Put AI Fact sheets back into the "03-train_model" notebook
    • Figure out what is wrong with the deployments and fix it
    • Figure out what is wrong with monitoring (probably issue with the cluster we use)
    • Finish Documentation of 8. Create deployment space and 9. Setup monitoring using open scale
    • Delete all projects and set everything up again according to the documentation to find what is missing (~ one day of work)
    • Describe how good user management can work (e.g. normal users can only see the "00-datascience-playground" project)
    • integrate Model Inventory/ model versioning
  • Known Issues

Contributors

dominikkreuzberger, ibm-open-source-bot, iiias, max-jesch
