
vertex-ai-mlops's Introduction






Vertex AI for Machine Learning Operations

👋 I'm Mike

I want to share and enable Vertex AI from Google Cloud with you. The goal here is to share a comprehensive set of end-to-end machine learning workflows that each cover the range from data, to model, to serving and managing - even automating the flow. Regardless of your data type, skill level, or framework preferences, you will find something helpful here. You can even ask for what you need and I might be able to work it into updates!

Click to watch on YouTube

Click here to see the current playlist for this repository


Tracking

To better understand which content is most helpful to users, this repository uses tracking pixels in each markdown (.md) and notebook (.ipynb) file. No user or location data is collected. The only information captured is that the content was rendered/viewed, which gives a daily count of usage. Please share any concerns you have with this on the repository's discussion board, and I am happy to also provide a branch without the tracking.

A script is provided to remove this tracking from your local copy of this repository in the file pixel_remove.py in the folder pixel. This readme also has the complete code for creating the tracking in case you want to replicate it or just understand it in greater detail.


Approach Used In This Repository

This repository is presented as workflows using, primarily, interactive Python notebooks (.ipynb). Why? These are easy to review, share, and move. They contain elements for both code and narrative. The narrative can be written with plain text, Markdown, and/or HTML, which makes providing visual explanations easy. This reinforces the goal of this repository: information that is easily accessible, portable, and a great starting point for your own work.

In notebooks, execution is driven from the locally attached compute. In this repository, that means the Python code is running in the notebook's compute. The code here leans heavily on orchestrating services in GCP rather than doing data computation in the environment local to the notebook. That means these notebooks are designed to run on minimal machine sizes, even as small as n1-standard-2. The heavy work of training and serving is done on Vertex AI, BigQuery, and other Google Cloud services. You will even find notebooks that author code and then deploy that code in services like Vertex AI Custom Training and Vertex AI Pipelines.
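As a quick illustration of this orchestration pattern, here is a minimal sketch of handing training off to Vertex AI from a notebook (the project, bucket, script name, and container image are placeholders, not taken from a specific notebook here):

from google.cloud import aiplatform

# Illustrative placeholders - swap in your own project, region, bucket, script, and image
aiplatform.init(project = 'your-project', location = 'us-central1', staging_bucket = 'gs://your-bucket')

# Package a local training script and run it as a Vertex AI Custom Job
job = aiplatform.CustomJob.from_local_script(
    display_name = 'example-training-job',
    script_path = 'train.py',  # a local .py file, possibly authored by the notebook itself
    container_uri = 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest',
    machine_type = 'n1-standard-4',
)
job.run()  # the heavy compute runs on Vertex AI, not on the notebook's VM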

There are sections that use other languages, like R, as well as sections that create files external to the notebooks: Dockerfiles, .py scripts and modules, etc.

The code in this repository is opinionated. It is neither completely production-ready nor simply ad-hoc exploration. It aims toward the right of the continuum from exploration to deployment: 'hello-world' to CI/CD/CT. In our daily data science work we might think of the process as:

In explore, everything is code as you go. At some point in this exploration ideas find value and need to be developed.

In develop, the approach is usually something like:

  • make it work
    • get a working end to end flow
  • clean it up
    • revisit the code and remove parts that are no longer needed and reorder based on what is learned
  • generalize it (see the sketch after this list)
    • parameterize
    • use functions
    • control flow: start using logic to check for out-of-bound conditions
  • optimize it
    • better use of data structures to handle data usage during execution
    • consider execution timing and optimize for the simultaneous goals of readability (= maintainability) and compute time
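As a toy illustration of the "generalize it" step (hypothetical code, not from any workflow here), an ad-hoc calculation becomes a parameterized function with a simple out-of-bound check:

# "generalize it" - a hypothetical ad-hoc line like `rate = total / 30`
# becomes a parameterized function with a guard for bad input
def daily_rate(total: float, days: int = 30) -> float:
    if days <= 0:  # control flow: check an out-of-bound condition
        raise ValueError(f'days must be positive, got {days}')
    return total / days

print(daily_rate(930.0))            # uses the default parameter
print(daily_rate(930.0, days = 7))  # parameterized call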

In many cases, getting from development to deployment is simple:

  • schedule a notebook - a lot like skipping the develop stage
  • deploy a pipeline
  • create a cloud function

But, inevitably, as a workflow proves value it requires more effort before you deploy:

  • error handling
  • unit testing
  • move from specialized code to generalized code:
    • use classes
    • control environment handling

So where does the code in this repository fall? In the late develop phase, with strong readability and adaptability.


Table of Contents


Considerations

Data Type

  • Tables: Tabular, structured data in rows and columns
  • Language: Text for translation and/or understanding
  • Vision: Images
  • Video

Convenience Level

  • Use Pre-Trained APIs
  • Automate building Custom Models
  • End-to-end Custom ML with core tools in the framework of your choice

Framework Preferences


Overview

This is a series of workflow demonstrations that use the same data source to build and deploy the same machine learning model with different frameworks and automation. These are meant to help you get started with understanding and learning Vertex AI and to provide starting points for new projects.

The demonstrations focus on workflows and don't delve into the specifics of ML frameworks other than how to integrate and automate with Vertex AI. Let me know if you have ideas for more workflows or details to include!

To understand the contents of this repository, the following charts outline how the content is grouped.

Direction

Pre-Trained APIs

| Data Type | Pre-Trained Model | Prediction Types | Related Solutions | AutoML |
|---|---|---|---|---|
| Text | Cloud Translation API | Detect, Translate | Cloud Text-to-Speech | AutoML Translation |
| Text | Cloud Natural Language API | Entities (identify and label), Sentiment, Entity Sentiment, Syntax, Content Classification | Healthcare Natural Language API | AutoML Text |
| Image | Cloud Vision API | Crop Hint, OCR, Face Detect, Image Properties, Label Detect, Landmark Detect, Logo Detect, Object Localization, Safe Search, Web Detect | Document AI, Visual Inspection AI | AutoML Image |
| Audio | Cloud Media Translation API | Real-time speech translation | Cloud Speech-to-Text | |
| Video | Cloud Video Intelligence API | Label Detect*, Shot Detect*, Explicit Content Detect*, Speech Transcription, Object Tracking*, Text Detect, Logo Detect, Face Detect, Person Detect, Celebrity Recognition | Vertex AI Vision | AutoML Video |

AutoML

| Data Type | AutoML | Prediction Types |
|---|---|---|
| Table | AutoML Tables | Classification (Binary, Multi-class), Regression, Forecasting |
| Image | AutoML Image | Classification (Single-label, Multi-label), Object Detection |
| Video | AutoML Video | Classification, Object Detection, Action Recognition |
| Text | AutoML Text | Classification (Single-label, Multi-label), Entity Extraction, Sentiment Analysis |
| Text | AutoML Translation | Translation |

With Training Data

This work focuses on cases where you have training data:

Overview
AutoML BigQuery ML Vertex AI Forecasting with AutoML, BigQuery ML, OSS Prophet

Vertex AI For ML Training



Vertex AI

Vertex AI is a platform for end-to-end model development. It consists of core components that make the processes of MLOps possible for design patterns of all types.

Components | Console


Interacting with Vertex AI

Many Vertex AI resources can be viewed and monitored directly in the GCP Console. Vertex AI resources are primarily created and modified with the Vertex AI API.

The API is accessible from several clients.

The notebooks in this repository primarily use the Python client aiplatform. There is occasional use of aiplatform.gapic, aiplatform_v1 and aiplatform_v1beta1.

For the full details on the APIs versions and layers and how/when to use each, see this helpful note.

Install the Vertex AI Python Client

pip install google-cloud-aiplatform

Example Usage: Listing all Models in Vertex AI Model Registry

PROJECT = 'statmike-mlops-349915'
REGION = 'us-central1'

# List all models for project in region with: aiplatform
from google.cloud import aiplatform
aiplatform.init(project = PROJECT, location = REGION)

model_list = aiplatform.Model.list()
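A couple of follow-on usage sketches (the printed fields and regional endpoint below are assumptions based on the client libraries, not taken from a specific notebook): iterate the returned list, or do the same listing with the versioned aiplatform_v1 client when request-level control is needed.

# Print basic identifying information for each model returned above
for model in model_list:
    print(model.display_name, model.resource_name)

# The same listing with the versioned client, aiplatform_v1
from google.cloud import aiplatform_v1
client = aiplatform_v1.ModelServiceClient(
    client_options = {'api_endpoint': f'{REGION}-aiplatform.googleapis.com'}
)
for model in client.list_models(parent = f'projects/{PROJECT}/locations/{REGION}'):
    print(model.display_name, model.name)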

Setup

The demonstrations are presented in a series of notebooks that are best run in JupyterLab. These can be reviewed directly in this repository on GitHub or cloned to your Jupyter instance on Vertex AI Workbench Instances.

Option 1: Review And Use Individual Files

Select the files and review them directly in the browser or IDE of your choice. This can be helpful for general understanding and for selecting sections to copy/paste into your project. Some options to get a local copy of this repository's content:

  • use git: git clone https://github.com/statmike/vertex-ai-mlops
  • use wget to copy individual files directly from GitHub:
    • Go to the notebook on GitHub.com and right-click the download link. Then select copy link address.
    • Alternatively, click the Raw button on GitHub and then copy the URL that loads.
    • Run the following from a notebook cell or directly from a terminal (without the !). Note the slightly different address that points directly to raw content on GitHub.
      • !wget "https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/<path and filename>.ipynb"
  • Use Colab (and soon Vertex AI Enterprise Colab) to open the notebooks. Many of the notebooks have a section at the top with buttons for opening directly in Colab. Some notebooks don't yet have this feature, and some use local Docker, which is not available on Colab.

Option 2: Run These Notebooks in a Vertex AI Workbench based Notebook

TL;DR

In Google Cloud Console, Select/Create a Project then go to Vertex AI > Workbench > Instances

  • Create a new notebook and Open JupyterLab
  • Clone this repository using the Git menu, then open and run 00 - Environment Setup.ipynb
  1. Create a Project
    1. Link. Alternatively, go to: Console > IAM & Admin > Manage Resources
    2. Click "+ Create Project"
    3. Provide: name, billing account, organization, location
    4. Click "Create"
  2. Enable the APIs: Vertex AI API and Notebooks API
    1. Link
      1. Alternatively, go to:
        1. Console > Vertex AI, then enable API
        2. Then Console > Vertex AI > Workbench, then enable API
  3. Create A Notebook with Vertex AI Workbench Instances:
    1. Go to: Console > Vertex AI > Workbench > Instances - direct link
    2. Create a new instance - instructions
    3. Once it is started, click the Open JupyterLab link.
    4. Clone this repository to the JupyterLab instance:
      1. Either:
        1. Go to the Git menu and choose Clone a Repository
        2. Choose the Git icon on the left toolbar and click Clone a Repository
      2. Provide the Clone URI of this repository: https://github.com/statmike/vertex-ai-mlops.git
      3. In the File Browser you will now have the folder "vertex-ai-mlops" that contains the files from this repository
  4. Setup the Notebook Environment for these workflows
    1. Open the notebook vertex-ai-mlops/00 - Environment Setup
    2. Follow the instructions and run the cells

Resources on these items:


Helpful Sections

  • Learning Machine Learning
    • I often get asked "How do I learn about ML?". There are lots of good answers. ....
  • Explorations
    • This is a series of projects for exploring new, new-to-me, and emerging tools in the ML world!
  • Tips
    • Tips for using the repository and notebooks with examples of core skills like building containers, parameterizing jobs and interacting with other GCP services. These tips help with scaling jobs and developing them with a focus on CI/CD.

More Resources Like This Repository

This is my personal repository of demonstrations I use for learning and sharing Vertex AI. There are many more resources available. Within each notebook I have included a resources section and a related training section.

vertex-ai-mlops's People

Contributors

goodrules, karticn-google, pavelpetukhov, statmike


vertex-ai-mlops's Issues

Notebook 05g hyperparameters are not being used.

In the cell where the {DIR}/train.py is being written you have declared the arguments --lr (learning_rate) and --m (momentum) in the parser.

However, those variables are not being used in the optimizer (SGD).

I think the correct definition of the optimizer will be:

opt = tf.keras.optimizers.SGD(learning_rate = args.learning_rate, momentum = args.momentum)

Am I correct? Or how can the hyperparameters be connected to the training of the model?
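For context, a minimal sketch of how parsed arguments are typically wired into the optimizer (the flag names mirror the ones described above; everything else is hypothetical):

import argparse
import tensorflow as tf

# Hypothetical parser matching the flags described above (--lr and --m)
parser = argparse.ArgumentParser()
parser.add_argument('--lr', dest = 'learning_rate', type = float, default = 0.01)
parser.add_argument('--m', dest = 'momentum', type = float, default = 0.9)
args = parser.parse_args()

# The parsed values only take effect if they are passed to the optimizer
opt = tf.keras.optimizers.SGD(learning_rate = args.learning_rate, momentum = args.momentum)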

Thank you for your work.

Error on Docker Run Command

Hi Steve,
I am getting the errors below on the Docker run command:
!docker run {IMAGE_URI} --PROJECT_ID {PROJECT_ID} --DATANAME {DATANAME} --NOTEBOOK {NOTEBOOK} --horizon {14} --no-yearly

Errors -
model-stock-management applied_forecasting 04f
0%| | 0/12 [00:00<?, ?it/s]23:01:57 - cmdstanpy - INFO - Chain [1] start processing
23:01:57 - cmdstanpy - INFO - Chain [1] start processing
23:01:57 - cmdstanpy - INFO - Chain [1] start processing
23:01:57 - cmdstanpy - INFO - Chain [1] done processing
23:01:57 - cmdstanpy - ERROR - Chain [1] error: terminated by signal 11 Unknown error -11
0%| | 0/12 [00:00<?, ?it/s]23:01:57 - cmdstanpy - INFO - Chain [1] done processing
23:01:57 - cmdstanpy - ERROR - Chain [1] error: terminated by signal 11 Unknown error -11
23:01:57 - cmdstanpy - INFO - Chain [1] done processing
23:01:57 - cmdstanpy - ERROR - Chain [1] error: terminated by signal 11 Unknown error -11

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/fit/prophet.py", line 48, in run_prophet
p.fit(series)
File "/opt/conda/lib/python3.7/site-packages/prophet/forecaster.py", line 1181, in fit
self.params = self.stan_backend.fit(stan_init, dat, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/prophet/models.py", line 94, in fit
raise e
File "/opt/conda/lib/python3.7/site-packages/prophet/models.py", line 90, in fit
self.stan_fit = self.model.optimize(**args)
File "/opt/conda/lib/python3.7/site-packages/cmdstanpy/model.py", line 738, in optimize
raise RuntimeError(msg)
RuntimeError: Error during optimization! Command '/opt/conda/lib/python3.7/site-packages/prophet/stan_model/prophet_model.bin random seed=52644 data file=/tmp/tmptt1na6uf/ouw39fzc.json init=/tmp/tmptt1na6uf/q6k325gk.json output file=/tmp/tmptt1na6uf/prophet_modeli11498bg/prophet_model-20230627230157.csv method=optimize algorithm=newton iter=10000' failed:
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/fit/prophet.py", line 55, in
predictions = list(tqdm(pool.imap(run_prophet, seriesFrames), total = len(seriesFrames)))
File "/opt/conda/lib/python3.7/site-packages/tqdm/std.py", line 1178, in iter
for obj in iterable:
File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
RuntimeError: Error during optimization! Command '/opt/conda/lib/python3.7/site-packages/prophet/stan_model/prophet_model.bin random seed=52644 data file=/tmp/tmptt1na6uf/ouw39fzc.json init=/tmp/tmptt1na6uf/q6k325gk.json output file=/tmp/tmptt1na6uf/prophet_modeli11498bg/prophet_model-20230627230157.csv method=optimize algorithm=newton iter=10000' failed:

Appreciate your help in this
Thanks,
Varsha

suggested updates on Prediction section in 03g

  1. In the first cell under Retrieve Records for Prediction, add backticks to qualify the BigQuery dataset in the query, as shown below.
    Before changes:
    pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT {n}").to_dataframe()
    After changes:
    pred = bq.query(query = f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits='TEST' LIMIT {n}").to_dataframe()

MLB notebook fails on API enablement when run in Vertex AI Workbench

MLB example is brilliant! However, running the notebook (https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Vertex%20AI%20GenAI%20For%20Document%20Q%26A%20-%20MLB%20Rules%20For%20Baseball.ipynb) in Vertex AI Workbench fails on this cell with authorization error:

# Enable Document AI For This Project
!gcloud services enable documentai.googleapis.com
# Enable Vertex AI For This Project
!gcloud services enable aiplatform.googleapis.com

Using the same approach used for setting the project ID corrects the problem and allows the APIs to be enabled without errors when the notebook is run in the Vertex AI Workbench:

try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    # Enable Document AI For This Project
    !gcloud services enable documentai.googleapis.com
    # Enable Vertex AI For This Project
    !gcloud services enable aiplatform.googleapis.com
except Exception:
    pass

02c: change of region causes problems.

Hi Mike, and thanks for this excellent series of tutorials.

I got as far as 02b when I started having resource availability issues in us-central1. After several days of not being able to open the notebook I decided to start again in a new region (europe-west6). I began with 00-setup and got as far as 02b without any issues. However, several pipeline tasks in 02c failed because if a region is not explicitly specified in a task the location defaults to us-central1. Hence the executors were looking for artifacts in the wrong region.

To cut a long story short, I only managed to complete the pipeline successfully by explicitly specifying a location in each task:

   # dataset
    dataset = TabularDatasetCreateOp(
        location = REGION,
        project = project,
        ....
    )
    
    # training
    model = AutoMLTabularTrainingJobRunOp(
        location = REGION,
        project = project,
        ...
    )
    
    # Endpoint: Creation
    endpoint = EndpointCreateOp(
        location = REGION,
        project = project,
        ...
    )

At first I tried explicitly setting the location in the top-level pipeline definition in the hope that this would cause the location to be inherited by the underlying tasks, but this didn't work. Perhaps there is another way of provoking this behavior....

As an aside, the solution above involved running the pipeline several times because I had to wait for each task to complete before I could verify that the next one was ok. This meant I trained the model (with identical data) three times, with a 2 hour wait each time. It was only later that I realized that Vertex AI Pipelines have a cache which can be used to skip over repeated invocations of the same task. The reason the cache was not used is that you use a timestamp as part of the pipeline id. This is the equivalent of a "cache-buster" and excludes the cache by default. I would suggest using a less volatile pipeline id (e.g. an explicit version number) and adding a note to explain how the timestamp can be used to force a complete recalculation of the pipeline if necessary.
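For reference, a minimal sketch of the caching point (illustrative names only, not the notebook's actual parameters): Vertex AI Pipelines can reuse cached task results when enable_caching is on and a task's inputs are unchanged, so a stable display name avoids the timestamp "cache-buster" effect.

from google.cloud import aiplatform

# Illustrative values only - a stable display name avoids timestamp "cache-busting"
pipeline = aiplatform.PipelineJob(
    display_name = 'example-pipeline-v1',
    template_path = 'pipeline.json',
    pipeline_root = 'gs://your-bucket/pipeline-root',
    enable_caching = True,  # let tasks with unchanged inputs reuse cached results
)
pipeline.run()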

Thanks again for the work you have done here, it's super useful!

getting error in creating custom job from local script for 05a Notebook

Hi Mike,

I am getting an error while creating a custom job from a local script for 05a - Vertex AI Custom Model - TensorFlow - Custom Job With Python File.

Getting this error:

File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/compat/__init__.py", line 18, in <module>
    from google.cloud.aiplatform.compat import services
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/compat/services/__init__.py", line 18, in <module>
    from google.cloud.aiplatform_v1beta1.services.dataset_service import (
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/__init__.py", line 21, in <module>
    from .services.dataset_service import DatasetServiceClient
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/services/dataset_service/__init__.py", line 16, in <module>
    from .client import DatasetServiceClient
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/services/dataset_service/client.py", line 51, in <module>
    from google.cloud.aiplatform_v1beta1.services.dataset_service import pagers
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/services/dataset_service/pagers.py", line 27, in <module>
    from google.cloud.aiplatform_v1beta1.types import annotation
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/types/__init__.py", line 184, in <module>
    from .feature_online_store_admin_service import (
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/types/feature_online_store_admin_service.py", line 26, in <module>
    from google.cloud.aiplatform_v1beta1.types import (
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/types/feature_view_sync.py", line 24, in <module>
    from google.type import interval_pb2  # type: ignore
ImportError: cannot import name 'interval_pb2' from 'google.type' (/opt/conda/lib/python3.7/site-packages/google/type/__init__.py)

Vertex AI Experiment Error (notebook 05i)

Getting an error while logging the run name in Vertex AI Experiments when running the hyperparameter notebook.

google.api_core.exceptions.AlreadyExists: 409 Context with name projects/1234/locations/us-central1/metadataStores/default/contexts/experiment-05-05i-tf-classification-dnn-run-20230202120231-1 already exists

I am using the same code and it's giving an error.
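For context, a minimal sketch of two common ways around a 409 on an experiment run name (hedged; the experiment and run names are illustrative, and resume is a parameter of aiplatform.start_run in recent SDK versions):

from google.cloud import aiplatform
from datetime import datetime

aiplatform.init(project = 'your-project', location = 'us-central1', experiment = 'example-experiment')

# Option 1: make the run name unique per execution
run_name = 'run-' + datetime.now().strftime('%Y%m%d%H%M%S')
aiplatform.start_run(run = run_name)

# Option 2: resume the existing run instead of recreating it
# aiplatform.start_run(run = 'existing-run-name', resume = True)

aiplatform.end_run()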

Error while Configuring Local Docker to Use GCLOUD CLI

Hi Mike,
I am working on notebook "Vertex AI Custom Model - Prophet - Custom Job With Custom Container".
The below command throws an error:

!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

Error message -
WARNING: docker not in system PATH.
docker and docker-credential-gcloud need to be in the same PATH in order to work correctly together.
gcloud's Docker credential helper can be configured but it will not work until this is corrected.
Adding credentials for: us-west1-docker.pkg.dev
Docker configuration file updated.

Any idea, how to go about this?

Tabular-dataset-create_notebook_02c

Hello Mike, hope everything is going well. I tried to run notebook 02c, but it looks like there is a problem when running the pipeline: the creation of the Tabular-dataset-create throws an error and stops the process. I tried to debug, but the error logs are not very friendly. I even tried to run the pipeline with only the tabular-dataset-create component to check where the problem is, but no luck. I also checked the IAM roles and everything looks ok.

I hope you can help me, thanks in advance (:

Bad Request: 400 Syntax Error : Missing whitespace between literal and alias at [1:36] in 04 - Vertex AI Custom Model - scikit-learn - in Notebook

hey Mike!
Firstly, a big thank you for teaching people like me about Vertex AI and sharing your code. Really appreciate it and will be forever grateful for the awesome stuff.

Regarding the error in code block [15], for which I got the above error: Bad Request: 400 Syntax Error: Missing whitespace between literal and alias at [1:36]

query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{BQ_TABLE}'"
schema = bq.query(query).to_dataframe()
schema

I think it only needs:

query = f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.INFORMATION_SCHEMA.COLUMNS` WHERE TABLE_NAME = '{BQ_TABLE}'"
schema = bq.query(query).to_dataframe()
schema

this worked!

Explanations error in Notebook 2a

When executing the line of code explanation = endpoint.explain(instances=instances, parameters=parameters), I encounter a FailedPrecondition error with the message "Deployed model 8593450830185627648 does not support explanation."

Error TraceBack:

	status = StatusCode.FAILED_PRECONDITION
	details = "Deployed model 8593450830185627648 does not support explanation."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:64.233.183.95:443 {grpc_message:"Deployed model 8593450830185627648 does not support explanation.", grpc_status:9, created_time:"2023-06-03T03:17:19.874069042+00:00"}"
>
FailedPrecondition: 400 Deployed model 8593450830185627648 does not support explanation.

Any help is appreciated. Thanks.

FR: project structure skeleton

Feature request: create the skeleton of a project structure that integrates the following:

  • jupyter workbench development environment
  • trainer - code for custom training
  • predictor - code for custom prediction
  • shared featurizer (preprocessing module for trainer and predictor)
  • Dockerfile(s) for both a custom trainer and predictor along with cloudbuild.yaml(s)

Key goals:

  • avoid code duplication (e.g. copying featurizer b/t trainer and predictor)
  • simple project structure
  • follow / demo ML ops best practices

You might also create a video that demonstrates the development workflow.

Motivation: there are many guides that, in isolation, provide good coverage of a component, but they are always outside the context of an actual practical development workflow.

It might look something like this (?):

├── README.md
├── setup.py
├── notebooks
│   ├── explore.ipynb
│   └── prototype_model.ipynb
├── predictor
│   ...
├── common
│   ├── utils.py
└── trainer
    ├── Dockerfile
    ├── build.ipynb
    ├── cloudbuild.yaml
    └── src
        ├── __init__.py
        ├── dev get training data bq.ipynb
        ├── features.py
        ├── requirements.txt
        ├── sql
        │   └── train_data_gen.sql
        ├── train.py

MLB notebook fails on permission issue until compute engine default service account granted Vertex AI User permission on project

When you run the MLB notebook (https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Vertex%20AI%20GenAI%20For%20Document%20Q%26A%20-%20MLB%20Rules%20For%20Baseball.ipynb) in Vertex Workbench, the get_embeddings statement generates a PermissionDenied error.

Here's the statement that generates the error:

embedding_model.get_embeddings([question])[0].values[0:5]

Here's the PermissionDenied error you get when you run that cell:

PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/genai-test-project-may28/locations/us-central1/publishers/google/models/textembedding-gecko@001' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
  key: "permission"
  value: "aiplatform.endpoints.predict"
}
metadata {
  key: "resource"
  value: "projects/genai-test-project-may28/locations/us-central1/publishers/google/models/textembedding-gecko@001"
}
]

You can get around this by granting the compute engine default service account (e.g. for a project with project number 388500005335, the compute engine default service account is 388500005335-compute@developer.gserviceaccount.com) the Vertex AI User role.

https://screenshot.googleplex.com/7qVMQNFxxHbs6ah

After granting the compute engine default service account this role in the console, the get_embeddings statement runs without error:

https://screenshot.googleplex.com/5F4h7oSHDk6u9EY

Suggest adding a note to the intro to this notebook to indicate that the user has to grant the Vertex AI User role to the compute engine default service account for their project. Adding code to do this automatically would be ideal, but I was only able to get as far as getting the compute engine default service account, not the command to grant this service account Vertex AI user role programmatically.

PROJECT_NUMBER_LIST = !gcloud projects list \
--filter="$(gcloud config get-value project)" \
--format="value(PROJECT_NUMBER)"
PROJECT_NUMBER = PROJECT_NUMBER_LIST[0]
compute_engine_default_service_account = PROJECT_NUMBER + '-compute@developer.gserviceaccount.com'
print(compute_engine_default_service_account)
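For what it's worth, one possible way to grant the role from a notebook cell (a hedged sketch using the standard gcloud IAM binding command; roles/aiplatform.user is the Vertex AI User role) might be:

!gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
  --member="serviceAccount:{compute_engine_default_service_account}" \
  --role="roles/aiplatform.user"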

Trying to run Notebook 02c with Pipeline; it fails to create the endpoint with a strange error.

Hello Mike,
First up, thank you for the great videos. I am trying to understand vertex and implement pipelines.

Following through, but the pipeline run stage is failing with the following. I don't even see a spec for the endpoint.description.

Thanks in advance.

Eric.

"'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'endpoint.labels', 'description': 'There can be no more than 64 labels attached to a single resource. Label keys and values can only contain lowercase letters, numbers, dashes and underscores."

{
"insertId": "15csmzsfcvnaaq",
"jsonPayload": {
"levelname": "ERROR",
"message": "ValueError: Failed to create the resource. Error: {'code': 400, 'message': 'List of found errors:\t1.Field: endpoint.labels; Message: There can be no more than 64 labels attached to a single resource. Label keys and values can only contain lowercase letters, numbers, dashes and underscores. Label keys must start with a letter or number, must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8. Label values must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8.\t', 'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'endpoint.labels', 'description': 'There can be no more than 64 labels attached to a single resource. Label keys and values can only contain lowercase letters, numbers, dashes and underscores. Label keys must start with a letter or number, must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8. Label values must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8.'}]}]}\n"
},
"resource": {
"type": "ml_job",
"labels": {
"job_id": "9200781745828397056",
"task_name": "workerpool0-0",
"project_id": "vertextest-358106"
}
},
"timestamp": "2022-08-01T07:43:29.864480671Z",
"severity": "ERROR",
"labels": {
"compute.googleapis.com/zone": "us-central1-f",
"compute.googleapis.com/resource_name": "gke-cml-0801-073321--e2-standard-4-43-45039984-4q9l",
"ml.googleapis.com/trial_type": "",
"ml.googleapis.com/job_id/log_area": "root",
"ml.googleapis.com/trial_id": "",
"ml.googleapis.com/tpu_worker_id": "",
"compute.googleapis.com/resource_id": "6639669956991139615"
},
"logName": "projects/vertextest-358106/logs/workerpool0-0",
"receiveTimestamp": "2022-08-01T07:43:35.499563906Z"
}

Could you utilize MMR for UmpireBot - MLB Rules For Baseball

Would it be possible to update the document search to utilize MMR (Maximal Marginal Relevance) instead of this?
relevant_context = search_elements(question, k = 1 + 3*int(10*(1-closest_match)))

I want to say it will be better at grabbing more context for more ambiguous questions, but will leave the final decision to you.

Relevant notebook: https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Vertex%20AI%20GenAI%20For%20Document%20Q%26A%20v2%20-%20MLB%20Rules%20For%20Baseball.ipynb

Details on MMR (https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/mmr)
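For reference, a minimal MMR selection sketch over precomputed embedding vectors (illustrative only, not taken from the notebook; k and lambda_mult are hypothetical parameters):

import numpy as np

def mmr_select(query_vec, doc_vecs, k = 5, lambda_mult = 0.5):
    """Pick k documents, balancing relevance to the query against redundancy."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -np.inf
        for i in candidates:
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default = 0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected  # indices of the chosen documents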

Review data section in 03g - Switch cell type from markdown to code

  1. Switch cell type from Markdown to code for this cell:
    query = f"""
    WITH COUNTS as (SELECT splits, {VAR_TARGET}, count(*) as n FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} GROUP BY splits, {VAR_TARGET})
    SELECT *,
        SAFE_DIVIDE(n, SUM(n) OVER(PARTITION BY {VAR_TARGET})) as n_pct_class,
        SAFE_DIVIDE(n, SUM(n) OVER(PARTITION BY splits)) as n_pct_split,
        SAFE_DIVIDE(SUM(n) OVER(PARTITION BY {VAR_TARGET}), SUM(n) OVER()) as class_pct_to SUM(n) OVER() as total,tal
    FROM COUNTS
    """
    review = bq.query(query = query).to_dataframe()
    review

  2. Add comma after to SUM(n) OVER()) as class_pct_to

  3. Remove the last attribute in the query tal
