
vertex-ai-mlops's Introduction






Vertex AI for Machine Learning Operations

👋 I'm Mike

I want to share and enable Vertex AI from Google Cloud with you. The goal here is to share a comprehensive set of end-to-end machine learning workflows that each cover the range from data, to model, to serving and managing - even automating the flow. Regardless of your data type, skill level, or framework preferences, you will find something helpful here. You can even ask for what you need and I might be able to work it into updates!

Click to watch on YouTube

Click here to see the current playlist for this repository


Tracking

To better understand which content is most helpful to users, this repository uses tracking pixels in each markdown (.md) and notebook (.ipynb) file. No user or location data is collected. The only information captured is that the content was rendered/viewed, which gives a daily count of usage. Please share any concerns you have with this on the repository's discussion board, and I am happy to also provide a branch without the tracking.

A script is provided to remove this tracking from your local copy of this repository in the file pixel_remove.py in the folder pixel. This readme also has the complete code for creating the tracking in case you want to replicate it or just understand it in greater detail.


Approach Used In This Repository

This repository is presented as workflows using, primarily, interactive Python notebooks (.ipynb). Why? These are easy to review, share, and move. They contain elements for both code and narrative. The narrative can be written with plain text, Markdown, and/or HTML, which makes providing visual explanations easy. This reinforces the goal of this repository: information that is easily accessible, portable, and a great starting point for your own work.

In notebooks, execution is driven from the locally attached compute. In this repository, that means the Python code is running in the notebook's compute. The code here leans heavily on orchestrating services in GCP rather than doing data computation in the environment local to the notebook. That means these notebooks are designed to run on minimal machine sizes, even as small as n1-standard-2. The heavy work of training and serving is done on Vertex AI, BigQuery, and other Google Cloud services. You will even find notebooks that author code and then deploy that code in services like Vertex AI Custom Training and Vertex AI Pipelines.
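As a quick illustration of this orchestration pattern, here is a minimal sketch of handing training off to Vertex AI from a notebook (the project, bucket, script name, and container image are placeholders, not taken from a specific notebook here):

from google.cloud import aiplatform

# Illustrative placeholders - swap in your own project, region, bucket, script, and image
aiplatform.init(project = 'your-project', location = 'us-central1', staging_bucket = 'gs://your-bucket')

# Package a local training script and run it as a Vertex AI Custom Job
job = aiplatform.CustomJob.from_local_script(
    display_name = 'example-training-job',
    script_path = 'train.py',  # a local .py file, possibly authored by the notebook itself
    container_uri = 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest',
    machine_type = 'n1-standard-4',
)
job.run()  # the heavy compute runs on Vertex AI, not on the notebook's VM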

There are sections that use other languages, like R, as well as sections that create files external to the notebooks: Dockerfiles, .py scripts and modules, etc.

The code in this repository is opinionated. It is neither completely production-ready nor simply ad-hoc exploration. It aims toward the right of the continuum from exploration to deployment: 'hello-world' to CI/CD/CT. In our daily data science work we might think of the process as:

In explore, everything is code as you go. At some point in this exploration ideas find value and need to be developed.

In develop, the approach is usually something like:

  • make it work
    • get a working end to end flow
  • clean it up
    • revisit the code and remove parts that are no longer needed and reorder based on what is learned
  • generalize it (see the sketch after this list)
    • parameterize
    • use functions
    • control flow: start using logic to check for out-of-bound conditions
  • optimize it
    • better use of data structures to handle data usage during execution
    • consider execution timing and optimize for the simultaneous goals of readability (= maintainability) and compute time
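As a toy illustration of the "generalize it" step (hypothetical code, not from any workflow here), an ad-hoc calculation becomes a parameterized function with a simple out-of-bound check:

# "generalize it" - a hypothetical ad-hoc line like `rate = total / 30`
# becomes a parameterized function with a guard for bad input
def daily_rate(total: float, days: int = 30) -> float:
    if days <= 0:  # control flow: check an out-of-bound condition
        raise ValueError(f'days must be positive, got {days}')
    return total / days

print(daily_rate(930.0))            # uses the default parameter
print(daily_rate(930.0, days = 7))  # parameterized call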

In many cases, getting from development to deployment is simple:

  • schedule a notebook - a lot like skipping the develop stage
  • deploy a pipeline
  • create a cloud function

But, inevitably, as a workflow proves value it requires more effort before you deploy:

  • error handling
  • unit testing
  • move from specialized code to generalized code:
    • use classes
    • control environment handling

So where does the code in this repository fall? In the late develop phase, with strong readability and adaptability.


Table of Contents


Considerations

Data Type

  • Tables: Tabular, structured data in rows and columns
  • Language: Text for translation and/or understanding
  • Vision: Images
  • Video

Convenience Level

  • Use Pre-Trained APIs
  • Automate building Custom Models
  • End-to-end Custom ML with core tools in the framework of your choice

Framework Preferences


Overview

This is a series of workflow demonstrations that use the same data source to build and deploy the same machine learning model with different frameworks and automation. These are meant to help you get started with understanding and learning Vertex AI and to provide starting points for new projects.

The demonstrations focus on workflows and don't delve into the specifics of ML frameworks other than how to integrate and automate with Vertex AI. Let me know if you have ideas for more workflows or details to include!

To understand the contents of this repository, the following charts outline how the content is grouped.

Direction

Pre-Trained APIs

| Data Type | Pre-Trained Model | Prediction Types | Related Solutions | AutoML |
|---|---|---|---|---|
| Text | Cloud Translation API | Detect, Translate | Cloud Text-to-Speech | AutoML Translation |
| Text | Cloud Natural Language API | Entities (identify and label), Sentiment, Entity Sentiment, Syntax, Content Classification | Healthcare Natural Language API | AutoML Text |
| Image | Cloud Vision API | Crop Hint, OCR, Face Detect, Image Properties, Label Detect, Landmark Detect, Logo Detect, Object Localization, Safe Search, Web Detect | Document AI, Visual Inspection AI | AutoML Image |
| Audio | Cloud Media Translation API | Real-time speech translation | Cloud Speech-to-Text | |
| Video | Cloud Video Intelligence API | Label Detect*, Shot Detect*, Explicit Content Detect*, Speech Transcription, Object Tracking*, Text Detect, Logo Detect, Face Detect, Person Detect, Celebrity Recognition | Vertex AI Vision | AutoML Video |

AutoML

| Data Type | AutoML | Prediction Types |
|---|---|---|
| Table | AutoML Tables | Classification (Binary, Multi-class), Regression, Forecasting |
| Image | AutoML Image | Classification (Single-label, Multi-label), Object Detection |
| Video | AutoML Video | Classification, Object Detection, Action Recognition |
| Text | AutoML Text | Classification (Single-label, Multi-label), Entity Extraction, Sentiment Analysis |
| Text | AutoML Translation | Translation |

With Training Data

This work focuses on cases where you have training data:

Overview
AutoML BigQuery ML Vertex AI Forecasting with AutoML, BigQuery ML, OSS Prophet

Vertex AI For ML Training



Vertex AI

Vertex AI is a platform for end-to-end model development. It consists of core components that make the processes of MLOps possible for design patterns of all types.

Components | Console


Interacting with Vertex AI

Many Vertex AI resources can be viewed and monitored directly in the GCP Console. Vertex AI resources are primarily created and modified with the Vertex AI API.

The API is accessible from several clients.

The notebooks in this repository primarily use the Python client aiplatform. There is occasional use of aiplatform.gapic, aiplatform_v1 and aiplatform_v1beta1.

For the full details on the APIs versions and layers and how/when to use each, see this helpful note.

Install the Vertex AI Python Client

pip install google-cloud-aiplatform

Example Usage: Listing all Models in Vertex AI Model Registry

PROJECT = 'statmike-mlops-349915'
REGION = 'us-central1'

# List all models for project in region with: aiplatform
from google.cloud import aiplatform
aiplatform.init(project = PROJECT, location = REGION)

model_list = aiplatform.Model.list()
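A couple of follow-on usage sketches (the printed fields and regional endpoint below are assumptions based on the client libraries, not taken from a specific notebook): iterate the returned list, or do the same listing with the versioned aiplatform_v1 client when request-level control is needed.

# Print basic identifying information for each model returned above
for model in model_list:
    print(model.display_name, model.resource_name)

# The same listing with the versioned client, aiplatform_v1
from google.cloud import aiplatform_v1
client = aiplatform_v1.ModelServiceClient(
    client_options = {'api_endpoint': f'{REGION}-aiplatform.googleapis.com'}
)
for model in client.list_models(parent = f'projects/{PROJECT}/locations/{REGION}'):
    print(model.display_name, model.name)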

Setup

The demonstrations are presented in a series of notebooks that are best run in JupyterLab. These can be reviewed directly in this repository on GitHub or cloned to your Jupyter instance on Vertex AI Workbench Instances.

Option 1: Review And Use Individual Files

Select the files and review them directly in the browser or IDE of your choice. This can be helpful for general understanding and for selecting sections to copy/paste into your project. Some options to get a local copy of this repository's content:

  • use git: git clone https://github.com/statmike/vertex-ai-mlops
  • use wget to copy individual files directly from GitHub:
    • Go to the notebook on GitHub.com and right-click the download link. Then select copy link address.
    • Alternatively, click the Raw button on GitHub and then copy the URL that loads.
    • Run the following from a notebook cell or directly from a terminal (without the !). Note the slightly different address that points directly to raw content on GitHub.
      • !wget "https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/<path and filename>.ipynb"
  • Use Colab (and soon Vertex AI Enterprise Colab) to open the notebooks. Many of the notebooks have a section at the top with buttons for opening directly in Colab. Some notebooks don't yet have this feature, and some use local Docker, which is not available on Colab.

Option 2: Run These Notebooks in a Vertex AI Workbench based Notebook

TL;DR

In Google Cloud Console, Select/Create a Project then go to Vertex AI > Workbench > Instances

  • Create a new notebook and Open JupyterLab
  • Clone this repository using the Git menu, then open and run 00 - Environment Setup.ipynb
  1. Create a Project
    1. Link. Alternatively, go to: Console > IAM & Admin > Manage Resources
    2. Click "+ Create Project"
    3. Provide: name, billing account, organization, location
    4. Click "Create"
  2. Enable the APIs: Vertex AI API and Notebooks API
    1. Link
      1. Alternatively, go to:
        1. Console > Vertex AI, then enable API
        2. Then Console > Vertex AI > Workbench, then enable API
  3. Create A Notebook with Vertex AI Workbench Instances:
    1. Go to: Console > Vertex AI > Workbench > Instances - direct link
    2. Create a new instance - instructions
    3. Once it is started, click the Open JupyterLab link.
    4. Clone this repository to the JupyterLab instance:
      1. Either:
        1. Go to the Git menu and choose Clone a Repository
        2. Choose the Git icon on the left toolbar and click Clone a Repository
      2. Provide the Clone URI of this repository: https://github.com/statmike/vertex-ai-mlops.git
      3. In the File Browser you will now have the folder "vertex-ai-mlops" that contains the files from this repository
  4. Setup the Notebook Environment for these workflows
    1. Open the notebook vertex-ai-mlops/00 - Environment Setup
    2. Follow the instructions and run the cells

Resources on these items:


Helpful Sections

  • Learning Machine Learning
    • I often get asked "How do I learn about ML?". There are lots of good answers. ....
  • Explorations
    • This is a series of projects for exploring new, new-to-me, and emerging tools in the ML world!
  • Tips
    • Tips for using the repository and notebooks with examples of core skills like building containers, parameterizing jobs and interacting with other GCP services. These tips help with scaling jobs and developing them with a focus on CI/CD.

More Resources Like This Repository

This is my personal repository of demonstrations I use for learning and sharing Vertex AI. There are many more resources available. Within each notebook I have included a resources section and a related training section.

vertex-ai-mlops's People

Contributors

goodrules, karticn-google, pavelpetukhov, statmike


vertex-ai-mlops's Issues

Notebook 05g hyperparameters are not being used.

In the cell where the {DIR}/train.py is being written you have declared the arguments --lr (learning_rate) and --m (momentum) in the parser.

However, those variables are not being used in the optimizer (SGD).

I think the correct definition of the optimizer will be:

opt = tf.keras.optimizers.SGD(learning_rate = args.learning_rate, momentum = args.momentum)

Am I correct? Or how can the hyperparameters be connected to the training of the model?
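For context, a minimal sketch of how parsed arguments are typically wired into the optimizer (the flag names mirror the ones described above; everything else is hypothetical):

import argparse
import tensorflow as tf

# Hypothetical parser matching the flags described above (--lr and --m)
parser = argparse.ArgumentParser()
parser.add_argument('--lr', dest = 'learning_rate', type = float, default = 0.01)
parser.add_argument('--m', dest = 'momentum', type = float, default = 0.9)
args = parser.parse_args()

# The parsed values only take effect if they are passed to the optimizer
opt = tf.keras.optimizers.SGD(learning_rate = args.learning_rate, momentum = args.momentum)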

Thank you for your work.

Error on Docker Run Command

Hi Steve,
I am getting the errors below on the Docker run command:
!docker run {IMAGE_URI} --PROJECT_ID {PROJECT_ID} --DATANAME {DATANAME} --NOTEBOOK {NOTEBOOK} --horizon {14} --no-yearly

Errors -
model-stock-management applied_forecasting 04f
0%| | 0/12 [00:00<?, ?it/s]23:01:57 - cmdstanpy - INFO - Chain [1] start processing
23:01:57 - cmdstanpy - INFO - Chain [1] start processing
23:01:57 - cmdstanpy - INFO - Chain [1] start processing
23:01:57 - cmdstanpy - INFO - Chain [1] done processing
23:01:57 - cmdstanpy - ERROR - Chain [1] error: terminated by signal 11 Unknown error -11
0%| | 0/12 [00:00<?, ?it/s]23:01:57 - cmdstanpy - INFO - Chain [1] done processing
23:01:57 - cmdstanpy - ERROR - Chain [1] error: terminated by signal 11 Unknown error -11
23:01:57 - cmdstanpy - INFO - Chain [1] done processing
23:01:57 - cmdstanpy - ERROR - Chain [1] error: terminated by signal 11 Unknown error -11

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/fit/prophet.py", line 48, in run_prophet
p.fit(series)
File "/opt/conda/lib/python3.7/site-packages/prophet/forecaster.py", line 1181, in fit
self.params = self.stan_backend.fit(stan_init, dat, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/prophet/models.py", line 94, in fit
raise e
File "/opt/conda/lib/python3.7/site-packages/prophet/models.py", line 90, in fit
self.stan_fit = self.model.optimize(**args)
File "/opt/conda/lib/python3.7/site-packages/cmdstanpy/model.py", line 738, in optimize
raise RuntimeError(msg)
RuntimeError: Error during optimization! Command '/opt/conda/lib/python3.7/site-packages/prophet/stan_model/prophet_model.bin random seed=52644 data file=/tmp/tmptt1na6uf/ouw39fzc.json init=/tmp/tmptt1na6uf/q6k325gk.json output file=/tmp/tmptt1na6uf/prophet_modeli11498bg/prophet_model-20230627230157.csv method=optimize algorithm=newton iter=10000' failed:
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/fit/prophet.py", line 55, in
predictions = list(tqdm(pool.imap(run_prophet, seriesFrames), total = len(seriesFrames)))
File "/opt/conda/lib/python3.7/site-packages/tqdm/std.py", line 1178, in iter
for obj in iterable:
File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
RuntimeError: Error during optimization! Command '/opt/conda/lib/python3.7/site-packages/prophet/stan_model/prophet_model.bin random seed=52644 data file=/tmp/tmptt1na6uf/ouw39fzc.json init=/tmp/tmptt1na6uf/q6k325gk.json output file=/tmp/tmptt1na6uf/prophet_modeli11498bg/prophet_model-20230627230157.csv method=optimize algorithm=newton iter=10000' failed:

Appreciate your help in this
Thanks,
Varsha

suggested updates on Prediction section in 03g

  1. In the first cell under Retrieve Records for Prediction, add backticks to qualify the BigQuery dataset in the query, as shown below.
    Before changes:
    pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT {n}").to_dataframe()
    After changes:
    pred = bq.query(query = f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits='TEST' LIMIT {n}").to_dataframe()

MLB notebook fails on API enablement when run in Vertex AI Workbench

MLB example is brilliant! However, running the notebook (https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Vertex%20AI%20GenAI%20For%20Document%20Q%26A%20-%20MLB%20Rules%20For%20Baseball.ipynb) in Vertex AI Workbench fails on this cell with authorization error:

# Enable Document AI For This Project
!gcloud services enable documentai.googleapis.com
# Enable Vertex AI For This Project
!gcloud services enable aiplatform.googleapis.com

Using the same approach used for setting the project ID corrects the problem and allows the APIs to be enabled without errors when the notebook is run in the Vertex AI Workbench:

try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    # Enable Document AI For This Project
    !gcloud services enable documentai.googleapis.com
    # Enable Vertex AI For This Project
    !gcloud services enable aiplatform.googleapis.com
except Exception:
    pass

02c: change of region causes problems.

Hi Mike, and thanks for this excellent series of tutorials.

I got as far as 02b when I started having resource availability issues in us-central1. After several days of not being able to open the notebook I decided to start again in a new region (europe-west6). I began with 00-setup and got as far as 02b without any issues. However, several pipeline tasks in 02c failed because if a region is not explicitly specified in a task the location defaults to us-central1. Hence the executors were looking for artifacts in the wrong region.

To cut a long story short, I only managed to complete the pipeline successfully by explicitly specifying a location in each task:

   # dataset
    dataset = TabularDatasetCreateOp(
        location = REGION,
        project = project,
        ....
    )
    
    # training
    model = AutoMLTabularTrainingJobRunOp(
        location = REGION,
        project = project,
        ...
    )
    
    # Endpoint: Creation
    endpoint = EndpointCreateOp(
        location = REGION,
        project = project,
        ...
    )

At first I tried explicitly setting the location in the top-level pipeline definition in the hope that this would cause the location to be inherited by the underlying tasks, but this didn't work. Perhaps there is another way of provoking this behavior....

As an aside, the solution above involved running the pipeline several times because I had to wait for each task to complete before I could verify that the next one was ok. This meant I trained the model (with identical data) three times, with a 2 hour wait each time. It was only later that I realized that Vertex AI Pipelines have a cache which can be used to skip over repeated invocations of the same task. The reason the cache was not used is that you use a timestamp as part of the pipeline id. This is the equivalent of a "cache-buster" and excludes the cache by default. I would suggest using a less volatile pipeline id (e.g. an explicit version number) and adding a note to explain how the timestamp can be used to force a complete recalculation of the pipeline if necessary.
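For reference, a minimal sketch of the caching point (illustrative names only, not the notebook's actual parameters): Vertex AI Pipelines can reuse cached task results when enable_caching is on and a task's inputs are unchanged, so a stable display name avoids the timestamp "cache-buster" effect.

from google.cloud import aiplatform

# Illustrative values only - a stable display name avoids timestamp "cache-busting"
pipeline = aiplatform.PipelineJob(
    display_name = 'example-pipeline-v1',
    template_path = 'pipeline.json',
    pipeline_root = 'gs://your-bucket/pipeline-root',
    enable_caching = True,  # let tasks with unchanged inputs reuse cached results
)
pipeline.run()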

Thanks again for the work you have done here, it's super useful!

getting error in creating custom job from local script for 05a Notebook

Hi Mike,

I am getting an error while creating a custom job from a local script for 05a - Vertex AI Custom Model - TensorFlow - Custom Job With Python File.

Getting this error:

File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/compat/__init__.py", line 18, in <module>
    from google.cloud.aiplatform.compat import services
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/compat/services/__init__.py", line 18, in <module>
    from google.cloud.aiplatform_v1beta1.services.dataset_service import (
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/__init__.py", line 21, in <module>
    from .services.dataset_service import DatasetServiceClient
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/services/dataset_service/__init__.py", line 16, in <module>
    from .client import DatasetServiceClient
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/services/dataset_service/client.py", line 51, in <module>
    from google.cloud.aiplatform_v1beta1.services.dataset_service import pagers
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/services/dataset_service/pagers.py", line 27, in <module>
    from google.cloud.aiplatform_v1beta1.types import annotation
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/types/__init__.py", line 184, in <module>
    from .feature_online_store_admin_service import (
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/types/feature_online_store_admin_service.py", line 26, in <module>
    from google.cloud.aiplatform_v1beta1.types import (
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1beta1/types/feature_view_sync.py", line 24, in <module>
    from google.type import interval_pb2  # type: ignore
ImportError: cannot import name 'interval_pb2' from 'google.type' (/opt/conda/lib/python3.7/site-packages/google/type/__init__.py)

Vertex AI Experiment Error (notebook 05i)

Getting an error while logging the run name in Vertex AI Experiments when running the hyperparameter notebook.

google.api_core.exceptions.AlreadyExists: 409 Context with name projects/1234/locations/us-central1/metadataStores/default/contexts/experiment-05-05i-tf-classification-dnn-run-20230202120231-1 already exists

I am using the same code and it's giving an error.
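For context, a minimal sketch of two common ways around a 409 on an experiment run name (hedged; the experiment and run names are illustrative, and resume is a parameter of aiplatform.start_run in recent SDK versions):

from google.cloud import aiplatform
from datetime import datetime

aiplatform.init(project = 'your-project', location = 'us-central1', experiment = 'example-experiment')

# Option 1: make the run name unique per execution
run_name = 'run-' + datetime.now().strftime('%Y%m%d%H%M%S')
aiplatform.start_run(run = run_name)

# Option 2: resume the existing run instead of recreating it
# aiplatform.start_run(run = 'existing-run-name', resume = True)

aiplatform.end_run()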

Error while Configuring Local Docker to Use GCLOUD CLI

Hi Mike,
I am working on notebook "Vertex AI Custom Model - Prophet - Custom Job With Custom Container".
The below command throws an error:

!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

Error message -
WARNING: docker not in system PATH.
docker and docker-credential-gcloud need to be in the same PATH in order to work correctly together.
gcloud's Docker credential helper can be configured but it will not work until this is corrected.
Adding credentials for: us-west1-docker.pkg.dev
Docker configuration file updated.

Any idea, how to go about this?

Tabular-dataset-create_notebook_02c

Hello Mike, hope everything is going well. I tried to run notebook 02c, but it looks like there is a problem when running the pipeline: the creation of the Tabular-dataset-create throws an error and stops the process. I tried to debug, but the error logs are not very friendly. I even tried to run the pipeline with only the tabular-dataset-create component to check where the problem is, but no luck. I also checked the IAM roles and everything looks ok.

I hope you can help me, thanks in advance (:

Bad Request: 400 Syntax Error : Missing whitespace between literal and alias at [1:36] in 04 - Vertex AI Custom Model - scikit-learn - in Notebook

hey Mike!
Firstly, a big thank you for teaching people like me about Vertex AI and sharing your code. Really appreciate it and will be forever grateful for the awesome stuff.

Regarding the error in code block [15], for which I got the above error: Bad Request: 400 Syntax Error: Missing whitespace between literal and alias at [1:36]

query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{BQ_TABLE}'"
schema = bq.query(query).to_dataframe()
schema

I think it only needs:

query = f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.INFORMATION_SCHEMA.COLUMNS` WHERE TABLE_NAME = '{BQ_TABLE}'"
schema = bq.query(query).to_dataframe()
schema

this worked!

Explanations error in Notebook 2a

When executing the line of code explanation = endpoint.explain(instances=instances, parameters=parameters), I encounter a FailedPrecondition error with the message "Deployed model 8593450830185627648 does not support explanation."

Error TraceBack:

	status = StatusCode.FAILED_PRECONDITION
	details = "Deployed model 8593450830185627648 does not support explanation."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:64.233.183.95:443 {grpc_message:"Deployed model 8593450830185627648 does not support explanation.", grpc_status:9, created_time:"2023-06-03T03:17:19.874069042+00:00"}"
>
FailedPrecondition: 400 Deployed model 8593450830185627648 does not support explanation.

Any help is appreciated. Thanks.

FR: project structure skeleton

Feature request: create the skeleton of a project structure that integrates the following:

  • jupyter workbench development environment
  • trainer - code for custom training
  • predictor - code for custom prediction
  • shared featurizer (preprocessing module for trainer and predictor)
  • Dockerfile(s) for both a custom trainer and predictor along with cloudbuild.yaml(s)

Key goals:

  • avoid code duplication (e.g. copying featurizer b/t trainer and predictor)
  • simple project structure
  • follow / demo ML ops best practices

You might also create a video that demonstrates the development workflow.

Motivation: there are many guides that, in isolation, provide good coverage of a component, but they are always outside the context of an actual practical development workflow.

It might look something like this (?):

├── README.md
├── setup.py
├── notebooks
│   ├── explore.ipynb
│   └── prototype_model.ipynb
├── predictor
│   ...
├── common
│   ├── utils.py
└── trainer
    ├── Dockerfile
    ├── build.ipynb
    ├── cloudbuild.yaml
    └── src
        ├── __init__.py
        ├── dev get training data bq.ipynb
        ├── features.py
        ├── requirements.txt
        ├── sql
        │   └── train_data_gen.sql
        ├── train.py

MLB notebook fails on permission issue until compute engine default service account granted Vertex AI User permission on project

When you run the MLB notebook (https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Vertex%20AI%20GenAI%20For%20Document%20Q%26A%20-%20MLB%20Rules%20For%20Baseball.ipynb) in Vertex Workbench, the get_embeddings statement generates a PermissionDenied error.

Here's the statement that generates the error:

embedding_model.get_embeddings([question])[0].values[0:5]

Here's the PermissionDenied error you get when you run that cell:

PermissionDenied: 403 Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/genai-test-project-may28/locations/us-central1/publishers/google/models/textembedding-gecko@001' (or it may not exist). [reason: "IAM_PERMISSION_DENIED"
domain: "aiplatform.googleapis.com"
metadata {
  key: "permission"
  value: "aiplatform.endpoints.predict"
}
metadata {
  key: "resource"
  value: "projects/genai-test-project-may28/locations/us-central1/publishers/google/models/textembedding-gecko@001"
}
]

You can get around this by granting the compute engine default service account (e.g. for a project with project number 388500005335, the compute engine default service account is 388500005335-compute@developer.gserviceaccount.com) the Vertex AI User role.

https://screenshot.googleplex.com/7qVMQNFxxHbs6ah

After granting the compute engine default service account this role in the console, the get_embeddings statement runs without error:

https://screenshot.googleplex.com/5F4h7oSHDk6u9EY

Suggest adding a note to the intro to this notebook to indicate that the user has to grant the Vertex AI User role to the compute engine default service account for their project. Adding code to do this automatically would be ideal, but I was only able to get as far as getting the compute engine default service account, not the command to grant this service account Vertex AI user role programmatically.

PROJECT_NUMBER_LIST = !gcloud projects list \
--filter="$(gcloud config get-value project)" \
--format="value(PROJECT_NUMBER)"
PROJECT_NUMBER = PROJECT_NUMBER_LIST[0]
compute_engine_default_service_account = PROJECT_NUMBER + '-compute@developer.gserviceaccount.com'
print(compute_engine_default_service_account)
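For what it's worth, one possible way to grant the role from a notebook cell (a hedged sketch using the standard gcloud IAM binding command; roles/aiplatform.user is the Vertex AI User role) might be:

!gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
  --member="serviceAccount:{compute_engine_default_service_account}" \
  --role="roles/aiplatform.user"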

Trying to run Notebook 02c with Pipeline; it fails to create the endpoint with a strange error.

Hello Mike,
First up, thank you for the great videos. I am trying to understand vertex and implement pipelines.

Following through, but the pipeline run stage is failing with the following. I don't even see a spec for the endpoint.description.

Thanks in advance.

Eric.

"'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'endpoint.labels', 'description': 'There can be no more than 64 labels attached to a single resource. Label keys and values can only contain lowercase letters, numbers, dashes and underscores."

{
"insertId": "15csmzsfcvnaaq",
"jsonPayload": {
"levelname": "ERROR",
"message": "ValueError: Failed to create the resource. Error: {'code': 400, 'message': 'List of found errors:\t1.Field: endpoint.labels; Message: There can be no more than 64 labels attached to a single resource. Label keys and values can only contain lowercase letters, numbers, dashes and underscores. Label keys must start with a letter or number, must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8. Label values must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8.\t', 'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'endpoint.labels', 'description': 'There can be no more than 64 labels attached to a single resource. Label keys and values can only contain lowercase letters, numbers, dashes and underscores. Label keys must start with a letter or number, must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8. Label values must be less than 64 characters in length, and must be less that 128 bytes in length when encoded in UTF-8.'}]}]}\n"
},
"resource": {
"type": "ml_job",
"labels": {
"job_id": "9200781745828397056",
"task_name": "workerpool0-0",
"project_id": "vertextest-358106"
}
},
"timestamp": "2022-08-01T07:43:29.864480671Z",
"severity": "ERROR",
"labels": {
"compute.googleapis.com/zone": "us-central1-f",
"compute.googleapis.com/resource_name": "gke-cml-0801-073321--e2-standard-4-43-45039984-4q9l",
"ml.googleapis.com/trial_type": "",
"ml.googleapis.com/job_id/log_area": "root",
"ml.googleapis.com/trial_id": "",
"ml.googleapis.com/tpu_worker_id": "",
"compute.googleapis.com/resource_id": "6639669956991139615"
},
"logName": "projects/vertextest-358106/logs/workerpool0-0",
"receiveTimestamp": "2022-08-01T07:43:35.499563906Z"
}

Could you utilize MMR for UmpireBot - MLB Rules For Baseball

Would it be possible to update the document search to utilize MMR (Maximal Marginal Relevance) instead of this?
relevant_context = search_elements(question, k = 1 + 3*int(10*(1-closest_match)))

I want to say it will be better at grabbing more context for more ambiguous questions, but will leave the final decision to you.

Relevant notebook: https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Vertex%20AI%20GenAI%20For%20Document%20Q%26A%20v2%20-%20MLB%20Rules%20For%20Baseball.ipynb

Details on MMR (https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/mmr)
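For reference, a minimal MMR selection sketch over precomputed embedding vectors (illustrative only, not taken from the notebook; k and lambda_mult are hypothetical parameters):

import numpy as np

def mmr_select(query_vec, doc_vecs, k = 5, lambda_mult = 0.5):
    """Pick k documents, balancing relevance to the query against redundancy."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -np.inf
        for i in candidates:
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default = 0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected  # indices of the chosen documents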

Review data section in 03g - Switch cell type from markdown to code

  1. Switch cell type from Markdown to code for this cell:
    query = f"""
    WITH COUNTS as (SELECT splits, {VAR_TARGET}, count(*) as n FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} GROUP BY splits, {VAR_TARGET})
    SELECT *,
        SAFE_DIVIDE(n, SUM(n) OVER(PARTITION BY {VAR_TARGET})) as n_pct_class,
        SAFE_DIVIDE(n, SUM(n) OVER(PARTITION BY splits)) as n_pct_split,
        SAFE_DIVIDE(SUM(n) OVER(PARTITION BY {VAR_TARGET}), SUM(n) OVER()) as class_pct_to SUM(n) OVER() as total,tal
    FROM COUNTS
    """
    review = bq.query(query = query).to_dataframe()
    review

  2. Add comma after to SUM(n) OVER()) as class_pct_to

  3. Remove the last attribute in the query tal
