
googlecloudplatform / vertex-ai-samples


Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud

Home Page: https://cloud.google.com/vertex-ai

License: Apache License 2.0

Languages: Python 1.90%, Dockerfile 0.26%, Shell 0.04%, Jupyter Notebook 97.79%
Topics: samples, gcp, google-cloud-platform, vertex-ai, notebook, python, ai, ml, data-science, mlops

vertex-ai-samples's Introduction

Google Cloud Vertex AI Samples


Welcome to the Google Cloud Vertex AI sample repository.

Overview

The repository contains notebooks and community content that demonstrate how to develop and manage ML workflows using Google Cloud Vertex AI.

Repository structure

├── community-content - Sample code and tutorials contributed by the community
├── notebooks
│   ├── community - Notebooks contributed by the community
│   ├── official - Notebooks demonstrating use of each Vertex AI service
│   │   ├── automl
│   │   ├── custom
│   │   ├── ...

Contributing

Contributions welcome! See the Contributing Guide.

Getting help

Please use the issues page to provide feedback or submit a bug report.

Disclaimer

This is not an officially supported Google product. The code in this repository is for demonstration purposes only.

Feedback

Please feel free to fill out our survey to give us feedback on the repo and its content.

vertex-ai-samples's People

Contributors

aarondietz234, abcdefgs0324, andrewferlitsch, btrinh69, connor-mccarthy, dependabot[bot], dstnluong-google, genquan9, gericdong, huguensjean, inardini, ivanmkc, kathyyu-google, katiemn, kcfindstr, kittyabs, krishr2d2, kweinmeister, mco-gh, minwoo33park, morgandu, renovate-bot, reznitskii, soheilazangeneh, sudarshan-springml, telpirion, themichaelhu, udaypunna, weigary, xiangxu-google


vertex-ai-samples's Issues

Build timeout errors: Explainability GAPIC notebooks

gapic-custom_image_classification_batch_explain.ipynb and gapic-custom_image_classification_online_explain.ipynb both had timeout errors in the build. We should investigate whether this is a one-off or a recurring issue.

Update: this issue has also been seen in gapic-custom_tabular_regression_batch_explain.ipynb and gapic-custom_tabular_regression_online_explain.ipynb

Step #3: RetryError: Deadline of 180.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7949f60460>>), last exception: 
Step #3: 
Step #3: During handling of the above exception, another exception occurred:
Step #3: 
Step #3: TimeoutError                              Traceback (most recent call last)
Step #3: /tmp/ipykernel_19/629009188.py in <module>
Step #3:      20 
Step #3:      21 
Step #3: ---> 22 model_to_deploy_id = upload_model(
Step #3:      23     "cifar10-" + TIMESTAMP, IMAGE_URI, model_path_to_deploy
Step #3:      24 )
Step #3: 
Step #3: /tmp/ipykernel_19/629009188.py in upload_model(display_name, image_uri, model_uri)
Step #3:      14     response = clients["model"].upload_model(parent=PARENT, model=model)
Step #3:      15     print("Long running operation:", response.operation.name)
Step #3: ---> 16     upload_model_response = response.result(timeout=180)
Step #3:      17     print("upload_model_response")
Step #3:      18     print(" model:", upload_model_response.model)
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/api_core/future/polling.py in result(self, timeout, retry)
Step #3:     130         """
Step #3:     131         kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
Step #3: --> 132         self._blocking_poll(timeout=timeout, **kwargs)
Step #3:     133 
Step #3:     134         if self._exception is not None:
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/api_core/future/polling.py in _blocking_poll(self, timeout, retry)
Step #3:     110             retry_(self._done_or_raise)(**kwargs)
Step #3:     111         except exceptions.RetryError:
Step #3: --> 112             raise concurrent.futures.TimeoutError(
Step #3:     113                 "Operation did not complete within the designated " "timeout."
Step #3:     114             )
Step #3: 
Step #3: TimeoutError: Operation did not complete within the designated timeout.
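A likely mitigation, sketched here under the assumption that the notebooks keep using the GAPIC model-service client shown in the traceback, is to raise the long-running-operation deadline passed to response.result() so slow model uploads do not hit the 180-second default:

# Hedged sketch: give the upload LRO more time before the client-side poller
# gives up (the build currently fails at the 180 s deadline).
response = clients["model"].upload_model(parent=PARENT, model=model)
print("Long running operation:", response.operation.name)
upload_model_response = response.result(timeout=1800)  # 30 minutes instead of 180 s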

notebooks/official/explainable_ai/gapic-custom_tabular_regression_batch_explain.ipynb doesn't install tensorflow

Step #3: / [0 files][    0.0 B/135.8 KiB]                                                
/ [1 files][135.8 KiB/135.8 KiB]                                                
Step #3: Operation completed over 1 objects/135.8 KiB.                                    
Step #3: Uploaded output to: gs://cloud-build-notebooks-presubmit/executed_notebooks/PR_120/BUILD_831c33e0-bc70-4cd3-a3cf-a6d3e9b3c4ae/gapic-custom_tabular_regression_batch_explain.ipynb
Step #3: Traceback (most recent call last):
Step #3:   File "/workspace/.cloud-build/execute_notebook_cli.py", line 34, in <module>
Step #3:     ExecuteNotebook.execute_notebook(
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 84, in execute_notebook
Step #3:     raise execution_exception
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 53, in execute_notebook
Step #3:     pm.execute_notebook(
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 122, in execute_notebook
Step #3:     raise_for_execution_errors(nb, output_path)
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
Step #3:     raise error
Step #3: papermill.exceptions.PapermillExecutionError: 
Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [33]":
Step #3: ---------------------------------------------------------------------------
Step #3: ModuleNotFoundError                       Traceback (most recent call last)
Step #3: /tmp/ipykernel_15/818968412.py in <module>
Step #3: ----> 1 import tensorflow as tf
Step #3:       2 
Step #3:       3 model = tf.keras.models.load_model(MODEL_DIR)
Step #3: 
Step #3: ModuleNotFoundError: No module named 'tensorflow'
Step #3: 
Finished Step #3
ERROR
ERROR: build step 3 "gcr.io/cloud-devrel-public-resources/python-samples-testing-docker:latest" failed: step exited with non-zero status: 1

Logs: https://console.cloud.google.com/cloud-build/builds/a09a6f51-c0f9-473d-9233-929a8bb0ccda?project=1012616486416
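A minimal fix sketch, assuming the notebook's package-installation cell is the right place for it, is to install TensorFlow explicitly before the cell that loads the model:

# Hedged sketch: install TensorFlow in the notebook environment so the later
# "import tensorflow as tf" cell can succeed.
! pip3 install --user --quiet tensorflow

import tensorflow as tf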

Add BigQuery Admin permission to service account for notebook execution tests

Due to a recent Policy Bot change, the service account [email protected] no longer has BigQuery write permissions.

Exception encountered at "In [23]":
Step #3: ---------------------------------------------------------------------------
Step #3: Forbidden                                 Traceback (most recent call last)
Step #3: Input In [23], in <module>
Step #3:      15 dataset_region = "US"  # @param {type : "string"}
Step #3:      16 bq_dataset.location = dataset_region
Step #3: ---> 17 bq_dataset = client.create_dataset(bq_dataset)
Step #3:      18 print(
Step #3:      19     "Created bigquery dataset {} in {}".format(
Step #3:      20         batch_predict_bq_output_dataset_path, dataset_region
Step #3:      21     )
Step #3:      22 )
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/cloud/bigquery/client.py:632, in Client.create_dataset(self, dataset, exists_ok, retry, timeout)
Step #3:     629 try:
Step #3:     630     span_attributes = {"path": path}
Step #3: --> 632     api_response = self._call_api(
Step #3:     633         retry,
Step #3:     634         span_name="BigQuery.createDataset",
Step #3:     635         span_attributes=span_attributes,
Step #3:     636         method="POST",
Step #3:     637         path=path,
Step #3:     638         data=data,
Step #3:     639         timeout=timeout,
Step #3:     640     )
Step #3:     641     return Dataset.from_api_repr(api_response)
Step #3:     642 except core_exceptions.Conflict:
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/cloud/bigquery/client.py:782, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
Step #3:     778 if span_name is not None:
Step #3:     779     with create_span(
Step #3:     780         name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
Step #3:     781     ):
Step #3: --> 782         return call()
Step #3:     784 return call()
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/api_core/retry.py:283, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
Step #3:     279 target = functools.partial(func, *args, **kwargs)
Step #3:     280 sleep_generator = exponential_sleep_generator(
Step #3:     281     self._initial, self._maximum, multiplier=self._multiplier
Step #3:     282 )
Step #3: --> 283 return retry_target(
Step #3:     284     target,
Step #3:     285     self._predicate,
Step #3:     286     sleep_generator,
Step #3:     287     self._deadline,
Step #3:     288     on_error=on_error,
Step #3:     289 )
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/api_core/retry.py:190, in retry_target(target, predicate, sleep_generator, deadline, on_error)
Step #3:     188 for sleep in sleep_generator:
Step #3:     189     try:
Step #3: --> 190         return target()
Step #3:     192     # pylint: disable=broad-except
Step #3:     193     # This function explicitly must deal with broad exceptions.
Step #3:     194     except Exception as exc:
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/cloud/_http/__init__.py:480, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout)
Step #3:     469 response = self._make_request(
Step #3:     470     method=method,
Step #3:     471     url=url,
Step #3:    (...)
Step #3:     476     timeout=timeout,
Step #3:     477 )
Step #3:     479 if not 200 <= response.status_code < 300:
Step #3: --> 480     raise exceptions.from_http_response(response)
Step #3:     482 if expect_json and response.content:
Step #3:     483     return response.json()
Step #3: 
Step #3: Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/python-docs-samples-tests/datasets?prettyPrint=false: Access Denied: Project python-docs-samples-tests: User does not have bigquery.datasets.create permission in project python-docs-samples-tests.
Step #3: 

Solution:

Write an exemption CL like this: cl/427208423

Affected notebooks:

https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/sdk_automl_tabular_forecasting_batch.ipynb
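If an exemption CL is not an option, a hedged alternative (assuming the redacted address above is the notebook-test service account and that project-level BigQuery Admin is acceptable for the test project) is to grant the role directly, for example from a notebook cell:

# Hedged sketch: grant BigQuery Admin to the test service account so
# client.create_dataset() stops returning 403. SERVICE_ACCOUNT_EMAIL is a
# placeholder for the redacted account in the issue description.
! gcloud projects add-iam-policy-binding python-docs-samples-tests \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.admin"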

pipelines_intro_kfp.ipynb: Failing at kfp import

Expected Behavior

The following import should succeed, but it fails:

from kfp.v2.google.client import AIPlatformClient  # noqa: F811

Error message

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
   3015         try:
-> 3016             return self.__dep_map
   3017         except AttributeError:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
   2812         if attr.startswith('_'):
-> 2813             raise AttributeError(attr)
   2814         return getattr(self._provider, attr)

AttributeError: _DistInfoDistribution__dep_map

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
   3006         try:
-> 3007             return self._pkg_info
   3008         except AttributeError:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
   2812         if attr.startswith('_'):
-> 2813             raise AttributeError(attr)
   2814         return getattr(self._provider, attr)

AttributeError: _pkg_info

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_19/3716807617.py in <module>
----> 1 from kfp.v2.google.client import AIPlatformClient  # noqa: F811
      2 
      3 api_client = AIPlatformClient(project_id=PROJECT_ID, region=REGION)
      4 
      5 # adjust time zone and cron schedule as necessary

/usr/local/lib/python3.9/site-packages/kfp/v2/google/client/__init__.py in <module>
     13 # limitations under the License.
     14 
---> 15 from kfp.v2.google.client.client import AIPlatformClient

/usr/local/lib/python3.9/site-packages/kfp/v2/google/client/client.py in <module>
     27 from google.oauth2 import credentials
     28 from google.protobuf import json_format
---> 29 from googleapiclient import discovery
     30 
     31 from kfp.v2.google.client import client_utils

/usr/local/lib/python3.9/site-packages/googleapiclient/discovery.py in <module>
     66 from googleapiclient.errors import UnknownApiNameOrVersion
     67 from googleapiclient.errors import UnknownFileType
---> 68 from googleapiclient.http import build_http
     69 from googleapiclient.http import BatchHttpRequest
     70 from googleapiclient.http import HttpMock

/usr/local/lib/python3.9/site-packages/googleapiclient/http.py in <module>
     65 from googleapiclient.errors import UnexpectedBodyError
     66 from googleapiclient.errors import UnexpectedMethodError
---> 67 from googleapiclient.model import JsonModel
     68 
     69 

/usr/local/lib/python3.9/site-packages/googleapiclient/model.py in <module>
     34 from googleapiclient.errors import HttpError
     35 
---> 36 _LIBRARY_VERSION = pkg_resources.get_distribution("google-api-python-client").version
     37 _PY_VERSION = platform.python_version()
     38 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_distribution(dist)
    464         dist = Requirement.parse(dist)
    465     if isinstance(dist, Requirement):
--> 466         dist = get_provider(dist)
    467     if not isinstance(dist, Distribution):
    468         raise TypeError("Expected string, Requirement, or Distribution", dist)

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
    340     """Return an IResourceProvider for the named module or requirement"""
    341     if isinstance(moduleOrReq, Requirement):
--> 342         return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
    343     try:
    344         module = sys.modules[moduleOrReq]

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in require(self, *requirements)
    884         included, even if they were already activated in this working set.
    885         """
--> 886         needed = self.resolve(parse_requirements(requirements))
    887 
    888         for dist in needed:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
    778 
    779             # push the new requirements onto the stack
--> 780             new_requirements = dist.requires(req.extras)[::-1]
    781             requirements.extend(new_requirements)
    782 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in requires(self, extras)
   2732     def requires(self, extras=()):
   2733         """List of Requirements needed for this distro if `extras` are used"""
-> 2734         dm = self._dep_map
   2735         deps = []
   2736         deps.extend(dm.get(None, ()))

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
   3016             return self.__dep_map
   3017         except AttributeError:
-> 3018             self.__dep_map = self._compute_dependencies()
   3019             return self.__dep_map
   3020 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _compute_dependencies(self)
   3025         reqs = []
   3026         # Including any condition expressions
-> 3027         for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
   3028             reqs.extend(parse_requirements(req))
   3029 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
   3007             return self._pkg_info
   3008         except AttributeError:
-> 3009             metadata = self.get_metadata(self.PKG_INFO)
   3010             self._pkg_info = email.parser.Parser().parsestr(metadata)
   3011             return self._pkg_info

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_metadata(self, name)
   1405             return ""
   1406         path = self._get_metadata_path(name)
-> 1407         value = self._get(path)
   1408         try:
   1409             return value.decode('utf-8')

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _get(self, path)
   1609 
   1610     def _get(self, path):
-> 1611         with open(path, 'rb') as stream:
   1612             return stream.read()
   1613 

FileNotFoundError: [Errno 2] No such file or directory: '/builder/home/.local/lib/python3.9/site-packages/google_auth-2.3.3.dist-info/METADATA'
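The final frame points at a missing google_auth-2.3.3.dist-info/METADATA file, which usually indicates a partially upgraded package rather than a kfp bug. A hedged mitigation sketch is to force-reinstall google-auth so its dist-info directory is rebuilt before retrying the import (a kernel restart may be needed in between):

# Hedged sketch: repair the broken google-auth installation that pkg_resources
# trips over while resolving google-api-python-client.
! pip3 install --user --force-reinstall google-auth

from kfp.v2.google.client import AIPlatformClient  # noqa: F811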

trainer.save_model("gs://*****/") doesn't save the model in the GCS bucket?

Expected Behavior

The model should be saved to the GCS bucket by the Trainer's save_model call.

Actual Behavior

The model is not saved.

Steps to Reproduce the Problem

train.py inside Custom Container

training_args = tr.TrainingArguments(
    output_dir="gs://****/results_mlm_exp2",
    logging_dir="gs://****/logs_mlm_exp2",  # directory for storing logs
    save_strategy="epoch",
    learning_rate=2e-5,
    logging_steps=2000,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=4,
    prediction_loss_only=True,
    gradient_accumulation_steps=16,
)

trainer = tr.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_data,
)
# print("training to start without bf16")
trainer.train()
trainer.save_model("gs://****/model_mlm_exp2")
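Hugging Face's Trainer.save_model() writes with ordinary filesystem calls, so a gs:// URI is not a path it can write to directly. A common workaround, sketched here with placeholder names and under the assumption that the google-cloud-storage client is available in the container, is to save locally and then copy the artifacts to the bucket:

import os

from google.cloud import storage

# Hedged sketch: save to a local directory, then upload every file to GCS.
# LOCAL_DIR, BUCKET_NAME and DEST_PREFIX are placeholders, not values from the issue.
LOCAL_DIR = "/tmp/model_mlm_exp2"
BUCKET_NAME = "your-bucket"
DEST_PREFIX = "model_mlm_exp2"

trainer.save_model(LOCAL_DIR)

bucket = storage.Client().bucket(BUCKET_NAME)
for root, _, files in os.walk(LOCAL_DIR):
    for name in files:
        local_path = os.path.join(root, name)
        blob_name = os.path.join(DEST_PREFIX, os.path.relpath(local_path, LOCAL_DIR))
        bucket.blob(blob_name).upload_from_filename(local_path)

On Vertex AI custom training, writing under the /gcs/<bucket> Cloud Storage FUSE mount is another option.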



"Failed to parse the container spec json payload to requested prototype" within CustomTrainingJobOp

Expected Behavior

  1. Submit a custom training job op within a VPC peering and associated reserved ip ranges with pipeline params passed as args.
  2. Component runs successfully

Actual Behavior

  1. The compiler fails, complaining that the PipelineParam is not JSON serializable:
    TypeError: Object of type PipelineParam is not JSON serializable, as seen here. For a parameter being passed to a training operation, this really doesn't make any sense.

If all pipeline params are removed before compilation, the Vertex component fails with the following error.

The full redacted object dump is here.

{
    "display_name": "SOMEOP",
    "job_spec": {
        "worker_pool_specs": [
            {
                "containerSpec": {
                    "args": [
                        "-A",
                        "AAAA",
                        "-B",
                        "BBBB",
                        "-C",
                        "CCCCC",
                        "-D",
                        "SOME_GS_URL"
                    ],
                    "env": [
                        {
                            "name": "AIP_MODEL_DIR",
                            "value": "SOME_GS_URL"
                        }
                    ],
                    "imageUri": "SOME_CONTAINER_IMAGE"
                },
                "replicaCount": "1",
                "machineSpec": {
                    "machineType": "n1-standard-8"
                }
            }
        ],
        "scheduling": {
            "timeout": "15m",
            "restart_job_on_worker_restart": "false"
        },
        "service_account": "[email protected]",
        "tensorboard": "TENSORBOARD_ID",
        "enable_web_access": "false",
        "network": "NETWORK_ID",
        "reserved_ip_ranges": [
            "google-reserved-range"
        ],
        "base_output_directory": {
            "output_uri_prefix": "SOME_GS_URL"
        }
    },
    "labels": {},
    "encryption_spec": {
        "kms_key_name": ""
    }
}

This is despite my following the guide described here, which seems a little outdated in places. Any help would be greatly appreciated. Cheers!

Steps to Reproduce the Problem

  1. google-cloud-pipeline-components = ^1.0.1
  2. kfp ^1.8.11
  3. Compile pipeline and upload to vertex
  4. Training component fails

Specifications

  • Version: 1.0.1
  • Platform: Vertex AI on GCP
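For reference, a minimal invocation sketch, assuming google-cloud-pipeline-components 1.x and the KFP v2 SDK (the display name, image URI and args below are placeholders from the redacted dump): static values inside worker_pool_specs serialize cleanly, while embedding pipeline parameters in that nested dict is the usual trigger for the JSON-serialization error above.

from kfp.v2 import dsl

from google_cloud_pipeline_components.v1.custom_job import CustomTrainingJobOp


@dsl.pipeline(name="custom-training-demo")
def pipeline(project: str = "MY_PROJECT", location: str = "us-central1"):
    # Static strings here serialize cleanly; nesting dsl pipeline parameters inside
    # this dict raises "Object of type PipelineParam is not JSON serializable".
    worker_pool_specs = [
        {
            "machine_spec": {"machine_type": "n1-standard-8"},
            "replica_count": 1,
            "container_spec": {
                "image_uri": "SOME_CONTAINER_IMAGE",
                "args": ["-A", "AAAA", "-B", "BBBB"],
            },
        }
    ]
    CustomTrainingJobOp(
        project=project,
        location=location,
        display_name="SOMEOP",
        worker_pool_specs=worker_pool_specs,
    )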

[Policy Bot] found one or more issues with this repository.

Policy Bot found one or more issues with this repository.

  • Default branch is 'main'
  • Branch protection is enabled
  • Renovate bot is enabled
  • Merge commits disabled
  • There is a CODEOWNERS file
  • There is a valid LICENSE.md
  • There is a CODE_OF_CONDUCT.md
  • There is a CONTRIBUTING.md
  • There is a SECURITY.md

Custom model deployed with a docker container but requests are not working as expected

Context

I have been using the Ludwig AI library to create TensorFlow models. The library includes a serve tool to serve a model via HTTP, much like PyTorch's TorchServe.

I'm attempting to use a model trained with Ludwig and serve it with Ludwig serve from a custom Docker container, deployed as a custom model in Vertex AI and attached to an endpoint.

I've described the context in more detail in a discussion in the Ludwig GitHub repository.

Expected Behavior

Using the following JSON format for a request:

{
  "instances": [
    {
      "textfeature": "Words to be classified"
    }
  ]
}

The endpoint should return a JSON object with predictions from Ludwig serve.

Actual Behavior

I get an error from Ludwig serve:

{"error":"entry must contain all input features"}

Steps to Reproduce the Problem

This is a bit tricky, but I will explain at a high level. If need be, I can provide a complete notebook with the whole procedure.

  1. Train a text classification model with ludwig.
  2. Create a custom docker container to use Ludwig serve and the trained model.
  3. Push the container image to Container Registry.
  4. Deploy a custom model.
  5. Attach the model to an endpoint.

Any ideas?
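One hedged explanation (not confirmed in this thread) is that Vertex AI forwards the whole {"instances": [...]} envelope to the serving container, while Ludwig serve expects the input features as top-level form fields, which would produce exactly the "entry must contain all input features" error. A thin adapter in front of Ludwig serve is a common pattern; the route, port and URL below are hypothetical:

# Hedged sketch: unwrap Vertex AI's {"instances": [...]} request envelope and
# forward each instance's fields to a locally running Ludwig serve process.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
LUDWIG_SERVE_URL = "http://127.0.0.1:8000/predict"  # hypothetical local address

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    predictions = []
    for instance in payload.get("instances", []):
        # Ludwig serve expects feature values as individual form fields.
        resp = requests.post(LUDWIG_SERVE_URL, data=instance)
        predictions.append(resp.json())
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)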

Regression test failures: Explainability notebooks have GAPIC API retries that are too short

Build log: https://pantheon2.corp.google.com/cloud-build/builds;region=global/3d191399-2605-44d7-9422-c6f5768e10e2?project=python-docs-samples-tests

  • gapic-custom_image_classification_batch_explain.ipynb
  • gapic-custom_image_classification_online_explain.ipynb
  • gapic-custom_tabular_regression_batch_explain.ipynb
  • gapic-custom_tabular_regression_online_explain.ipynb

Likely all are failing at the model upload step.

Check official CODEOWNERS for references to non-existent Github handles

notebooks/community/CODEOWNERS is referencing internal usernames instead of Github handles.

CODEOWNERS errors
Unknown owner on line 6: make sure @aferlitsch exists and has write access to the repository
/sdk/sdk_* @aferlitsch
Unknown owner on line 7: make sure @aferlitsch exists and has write access to the repository
/gapic @aferlitsch
Unknown owner on line 8: make sure @aferlitsch exists and has write access to the repository
/ml_ops @aferlitsch
Unknown owner on line 9: make sure @mco exists and has write access to the repository
/model_monitoring/* @mco
Unknown owner on line 12: make sure @notebooks-team exists and has write access to the repository
/managed_notebooks/ @notebooks-team
Unknown owner on line 15: make sure @thehardikv exists and has write access to the repository
…_Model_Training_Example.ipynb @thehardikv
Unknown owner on line 16: make sure @thehardikv exists and has write access to the repository
…ting_evaluating_a_model.ipynb @thehardikv
Unknown owner on line 19: make sure @benofben exists and has write access to the repository
/neo4j @benofben @htappen
Unknown owner on line 19: make sure @htappen exists and has write access to the repository
/neo4j @benofben @htappen
Unknown owner on line 21: make sure @wattli exists and has write access to the repository
/tensorboard @yfang1 @wattli
Unknown owner on line 22: make sure @inardini exists and has write access to the repository
…store @nayaknishant @morgandu @inardini
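A hedged fix sketch for the CODEOWNERS entries, mapping the internal aliases to GitHub handles that do appear in this repository's contributor list (the exact mapping still needs to be confirmed by the owners):

# Hypothetical CODEOWNERS entries: every owner must be an existing GitHub handle
# with write access, e.g. @aferlitsch -> @andrewferlitsch, @mco -> @mco-gh.
/sdk/sdk_*          @andrewferlitsch
/gapic              @andrewferlitsch
/ml_ops             @andrewferlitsch
/model_monitoring/* @mco-gh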

intro-swivel.ipynb fails on TF import

Logs: https://pantheon.corp.google.com/cloud-build/builds/ebea2843-3345-4a7e-a4e4-5c625613d42d?project=python-docs-samples-tests

Error in intro-swivel.ipynb when importing tensorflow:

Step #3: Traceback (most recent call last):
Step #3:   File "/workspace/.cloud-build/execute_notebook_cli.py", line 34, in <module>
Step #3:     ExecuteNotebook.execute_notebook(
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 84, in execute_notebook
Step #3:     raise execution_exception
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 53, in execute_notebook
Step #3:     pm.execute_notebook(
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 122, in execute_notebook
Step #3:     raise_for_execution_errors(nb, output_path)
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
Step #3:     raise error
Step #3: papermill.exceptions.PapermillExecutionError: 
Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [14]":
Step #3: ---------------------------------------------------------------------------
Step #3: AttributeError                            Traceback (most recent call last)
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
Step #3:    3015         try:
Step #3: -> 3016             return self.__dep_map
Step #3:    3017         except AttributeError:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
Step #3:    2812         if attr.startswith('_'):
Step #3: -> 2813             raise AttributeError(attr)
Step #3:    2814         return getattr(self._provider, attr)
Step #3: 
Step #3: AttributeError: _DistInfoDistribution__dep_map
Step #3: 
Step #3: During handling of the above exception, another exception occurred:
Step #3: 
Step #3: AttributeError                            Traceback (most recent call last)
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
Step #3:    3006         try:
Step #3: -> 3007             return self._pkg_info
Step #3:    3008         except AttributeError:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
Step #3:    2812         if attr.startswith('_'):
Step #3: -> 2813             raise AttributeError(attr)
Step #3:    2814         return getattr(self._provider, attr)
Step #3: 
Step #3: AttributeError: _pkg_info
Step #3: 
Step #3: During handling of the above exception, another exception occurred:
Step #3: 
Step #3: FileNotFoundError                         Traceback (most recent call last)
Step #3: /tmp/ipykernel_15/2420721388.py in <module>
Step #3:       1 import pandas as pd
Step #3: ----> 2 import tensorflow as tf
Step #3:       3 from google.cloud import aiplatform
Step #3:       4 from kfp.v2.google import client
Step #3:       5 from sklearn.metrics.pairwise import cosine_similarity
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/__init__.py in <module>
Step #3:      39 import sys as _sys
Step #3:      40 
Step #3: ---> 41 from tensorflow.python.tools import module_util as _module_util
Step #3:      42 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
Step #3:      43 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/__init__.py in <module>
Step #3:      47 from tensorflow.python import distribute
Step #3:      48 # from tensorflow.python import keras
Step #3: ---> 49 from tensorflow.python.feature_column import feature_column_lib as feature_column
Step #3:      50 # from tensorflow.python.layers import layers
Step #3:      51 from tensorflow.python.module import module
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/feature_column/feature_column_lib.py in <module>
Step #3:      20 
Step #3:      21 # pylint: disable=unused-import,line-too-long,wildcard-import,g-bad-import-order
Step #3: ---> 22 from tensorflow.python.feature_column.feature_column import *
Step #3:      23 from tensorflow.python.feature_column.feature_column_v2 import *
Step #3:      24 from tensorflow.python.feature_column.sequence_feature_column import *
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/feature_column/feature_column.py in <module>
Step #3:     145 from tensorflow.python.framework import sparse_tensor as sparse_tensor_lib
Step #3:     146 from tensorflow.python.framework import tensor_shape
Step #3: --> 147 from tensorflow.python.layers import base
Step #3:     148 from tensorflow.python.ops import array_ops
Step #3:     149 from tensorflow.python.ops import check_ops
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/layers/base.py in <module>
Step #3:      18 from __future__ import print_function
Step #3:      19 
Step #3: ---> 20 from tensorflow.python.keras.legacy_tf_layers import base
Step #3:      21 
Step #3:      22 InputSpec = base.InputSpec
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/__init__.py in <module>
Step #3:      23 
Step #3:      24 # See b/110718070#comment18 for more details about this import.
Step #3: ---> 25 from tensorflow.python.keras import models
Step #3:      26 
Step #3:      27 from tensorflow.python.keras.engine.input_layer import Input
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/models.py in <module>
Step #3:      18 from tensorflow.python.framework import ops
Step #3:      19 from tensorflow.python.keras import backend
Step #3: ---> 20 from tensorflow.python.keras import metrics as metrics_module
Step #3:      21 from tensorflow.python.keras import optimizer_v1
Step #3:      22 from tensorflow.python.keras.engine import functional
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/metrics.py in <module>
Step #3:      32 from tensorflow.python.framework import ops
Step #3:      33 from tensorflow.python.framework import tensor_shape
Step #3: ---> 34 from tensorflow.python.keras import activations
Step #3:      35 from tensorflow.python.keras import backend
Step #3:      36 from tensorflow.python.keras.engine import base_layer
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/activations.py in <module>
Step #3:      16 
Step #3:      17 from tensorflow.python.keras import backend
Step #3: ---> 18 from tensorflow.python.keras.layers import advanced_activations
Step #3:      19 from tensorflow.python.keras.utils.generic_utils import deserialize_keras_object
Step #3:      20 from tensorflow.python.keras.utils.generic_utils import serialize_keras_object
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/layers/__init__.py in <module>
Step #3:      20 # pylint: disable=g-bad-import-order
Step #3:      21 # pylint: disable=g-import-not-at-top
Step #3: ---> 22 from tensorflow.python.keras.engine.input_layer import Input
Step #3:      23 from tensorflow.python.keras.engine.input_layer import InputLayer
Step #3:      24 from tensorflow.python.keras.engine.input_spec import InputSpec
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/engine/input_layer.py in <module>
Step #3:      22 from tensorflow.python.keras import backend
Step #3:      23 from tensorflow.python.keras.distribute import distributed_training_utils
Step #3: ---> 24 from tensorflow.python.keras.engine import base_layer
Step #3:      25 from tensorflow.python.keras.engine import keras_tensor
Step #3:      26 from tensorflow.python.keras.engine import node as node_module
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py in <module>
Step #3:      46 from tensorflow.python.keras import initializers
Step #3:      47 from tensorflow.python.keras import regularizers
Step #3: ---> 48 from tensorflow.python.keras.engine import base_layer_utils
Step #3:      49 from tensorflow.python.keras.engine import input_spec
Step #3:      50 from tensorflow.python.keras.engine import keras_tensor
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer_utils.py in <module>
Step #3:      29 from tensorflow.python.keras.utils import control_flow_util
Step #3:      30 from tensorflow.python.keras.utils import tf_inspect
Step #3: ---> 31 from tensorflow.python.keras.utils import tf_utils
Step #3:      32 from tensorflow.python.ops import array_ops
Step #3:      33 from tensorflow.python.ops import variables as tf_variables
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py in <module>
Step #3:      20 
Step #3:      21 from tensorflow.python.data.experimental.ops import cardinality
Step #3: ---> 22 from tensorflow.python.distribute.coordinator import cluster_coordinator as coordinator_lib
Step #3:      23 from tensorflow.python.eager import context
Step #3:      24 from tensorflow.python.framework import composite_tensor
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/coordinator/cluster_coordinator.py in <module>
Step #3:      32 from six.moves import queue
Step #3:      33 
Step #3: ---> 34 from tensorflow.python.distribute import parameter_server_strategy_v2
Step #3:      35 from tensorflow.python.distribute.coordinator import coordinator_context
Step #3:      36 from tensorflow.python.distribute.coordinator import metric_utils
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy_v2.py in <module>
Step #3:      32 from tensorflow.python.distribute import mirrored_run
Step #3:      33 from tensorflow.python.distribute import multi_worker_util
Step #3: ---> 34 from tensorflow.python.distribute import parameter_server_strategy
Step #3:      35 from tensorflow.python.distribute import ps_values
Step #3:      36 from tensorflow.python.distribute import sharded_variable
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy.py in <module>
Step #3:      32 from tensorflow.python.distribute import ps_values
Step #3:      33 from tensorflow.python.distribute import values
Step #3: ---> 34 from tensorflow.python.distribute.cluster_resolver import SimpleClusterResolver
Step #3:      35 from tensorflow.python.distribute.cluster_resolver import TFConfigClusterResolver
Step #3:      36 from tensorflow.python.eager import context
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/__init__.py in <module>
Step #3:      29 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import SimpleClusterResolver
Step #3:      30 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import UnionClusterResolver
Step #3: ---> 31 from tensorflow.python.distribute.cluster_resolver.gce_cluster_resolver import GCEClusterResolver
Step #3:      32 from tensorflow.python.distribute.cluster_resolver.kubernetes_cluster_resolver import KubernetesClusterResolver
Step #3:      33 from tensorflow.python.distribute.cluster_resolver.slurm_cluster_resolver import SlurmClusterResolver
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/gce_cluster_resolver.py in <module>
Step #3:      26 _GOOGLE_API_CLIENT_INSTALLED = True
Step #3:      27 try:
Step #3: ---> 28   from googleapiclient import discovery  # pylint: disable=g-import-not-at-top
Step #3:      29   from oauth2client.client import GoogleCredentials  # pylint: disable=g-import-not-at-top
Step #3:      30 except ImportError:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/googleapiclient/discovery.py in <module>
Step #3:      66 from googleapiclient.errors import UnknownApiNameOrVersion
Step #3:      67 from googleapiclient.errors import UnknownFileType
Step #3: ---> 68 from googleapiclient.http import build_http
Step #3:      69 from googleapiclient.http import BatchHttpRequest
Step #3:      70 from googleapiclient.http import HttpMock
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/googleapiclient/http.py in <module>
Step #3:      65 from googleapiclient.errors import UnexpectedBodyError
Step #3:      66 from googleapiclient.errors import UnexpectedMethodError
Step #3: ---> 67 from googleapiclient.model import JsonModel
Step #3:      68 
Step #3:      69 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/googleapiclient/model.py in <module>
Step #3:      34 from googleapiclient.errors import HttpError
Step #3:      35 
Step #3: ---> 36 _LIBRARY_VERSION = pkg_resources.get_distribution("google-api-python-client").version
Step #3:      37 _PY_VERSION = platform.python_version()
Step #3:      38 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_distribution(dist)
Step #3:     464         dist = Requirement.parse(dist)
Step #3:     465     if isinstance(dist, Requirement):
Step #3: --> 466         dist = get_provider(dist)
Step #3:     467     if not isinstance(dist, Distribution):
Step #3:     468         raise TypeError("Expected string, Requirement, or Distribution", dist)
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
Step #3:     340     """Return an IResourceProvider for the named module or requirement"""
Step #3:     341     if isinstance(moduleOrReq, Requirement):
Step #3: --> 342         return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
Step #3:     343     try:
Step #3:     344         module = sys.modules[moduleOrReq]
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in require(self, *requirements)
Step #3:     884         included, even if they were already activated in this working set.
Step #3:     885         """
Step #3: --> 886         needed = self.resolve(parse_requirements(requirements))
Step #3:     887 
Step #3:     888         for dist in needed:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
Step #3:     778 
Step #3:     779             # push the new requirements onto the stack
Step #3: --> 780             new_requirements = dist.requires(req.extras)[::-1]
Step #3:     781             requirements.extend(new_requirements)
Step #3:     782 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in requires(self, extras)
Step #3:    2732     def requires(self, extras=()):
Step #3:    2733         """List of Requirements needed for this distro if `extras` are used"""
Step #3: -> 2734         dm = self._dep_map
Step #3:    2735         deps = []
Step #3:    2736         deps.extend(dm.get(None, ()))
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
Step #3:    3016             return self.__dep_map
Step #3:    3017         except AttributeError:
Step #3: -> 3018             self.__dep_map = self._compute_dependencies()
Step #3:    3019             return self.__dep_map
Step #3:    3020 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _compute_dependencies(self)
Step #3:    3025         reqs = []
Step #3:    3026         # Including any condition expressions
Step #3: -> 3027         for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
Step #3:    3028             reqs.extend(parse_requirements(req))
Step #3:    3029 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
Step #3:    3007             return self._pkg_info
Step #3:    3008         except AttributeError:
Step #3: -> 3009             metadata = self.get_metadata(self.PKG_INFO)
Step #3:    3010             self._pkg_info = email.parser.Parser().parsestr(metadata)
Step #3:    3011             return self._pkg_info
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_metadata(self, name)
Step #3:    1405             return ""
Step #3:    1406         path = self._get_metadata_path(name)
Step #3: -> 1407         value = self._get(path)
Step #3:    1408         try:
Step #3:    1409             return value.decode('utf-8')
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _get(self, path)
Step #3:    1609 
Step #3:    1610     def _get(self, path):
Step #3: -> 1611         with open(path, 'rb') as stream:
Step #3:    1612             return stream.read()
Step #3:    1613 
Step #3: 
Step #3: FileNotFoundError: [Errno 2] No such file or directory: '/builder/home/.local/lib/python3.9/site-packages/google_auth-2.3.3.dist-info/METADATA'


Having trouble in running Two-Tower built-in algorithm

Hello, I was trying to run this code and to understand the input schema, which seems like an invalid JSON schema format.
I ran the code with this schema and got an error.
I am unsure what the valid input schema format is here.
Here is the error, downloaded from the logs.

 {
    "textPayload": "The replica workerpool0-0 exited with a non-zero status of 1. To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=1004927726779&resource=ml_job%2Fjob_id%2F1464827664639459328&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%221464827664639459328%22",
    "insertId": "1calpzlc593",
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "service",
        "project_id": "trell-staging",
        "job_id": "1464827664639459328"
      }
    },
    "timestamp": "2022-01-18T11:16:31.582429920Z",
    "severity": "ERROR",
    "labels": {
      "ml.googleapis.com/endpoint": ""
    },
    "logName": "projects/trell-staging/logs/ml.googleapis.com%2F1464827664639459328",
    "receiveTimestamp": "2022-01-18T11:16:32.678545724Z"
  },
  {
    "insertId": "t523mkflys0mw",
    "jsonPayload": {
      "message": "json.decoder.JSONDecodeError: Extra data: line 1 column 8 (char 7)\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582601640Z",
    "severity": "ERROR",
    "labels": {
      "ml.googleapis.com/trial_type": "",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_id": "439373723071094359"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mv",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "    raise JSONDecodeError(\"Extra data\", s, end)\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "workerpool0-0",
        "project_id": "trell-staging",
        "job_id": "1464827664639459328"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582595651Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_type": "",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_id": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mu",
    "jsonPayload": {
      "message": "  File \"/usr/lib/python3.8/json/decoder.py\", line 340, in decode\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "project_id": "trell-staging",
        "task_name": "workerpool0-0"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582584385Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/zone": "us-central1-c"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mt",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "    return _default_decoder.decode(s)\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "project_id": "trell-staging",
        "task_name": "workerpool0-0"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582578628Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0ms",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "  File \"/usr/lib/python3.8/json/__init__.py\", line 357, in loads\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "project_id": "trell-staging",
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582572352Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mr",
    "jsonPayload": {
      "message": "    return loads(fp.read(),\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "workerpool0-0",
        "job_id": "1464827664639459328",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582566745Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/trial_id": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mq",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "  File \"/usr/lib/python3.8/json/__init__.py\", line 293, in load\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "workerpool0-0",
        "project_id": "trell-staging",
        "job_id": "1464827664639459328"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582560508Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_type": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mp",
    "jsonPayload": {
      "message": "    input_schema = json.load(tf.io.gfile.GFile(args.input_schema_path))\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582554286Z",
    "severity": "ERROR",
    "labels": {
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_id": "",
      "ml.googleapis.com/trial_type": "",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "compute.googleapis.com/zone": "us-central1-c"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mo",
    "jsonPayload": {
      "message": "  File \"/root/two_tower/task.py\", line 89, in main\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582548477Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_type": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mn",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "    main()\n"
    }

Build failure: Model deploy failure on google_cloud_pipeline_components_automl_tabular.ipynb

Investigate whether this is a one-off deployment error or whether something in this notebook needs to be fixed.

Notebook

Build details

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_19/3131610175.py in <module>
      7 )
      8 
----> 9 job.run()
     10 
     11 get_ipython().system(' rm tabular_regression_pipeline.json')

~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py in wrapper(*args, **kwargs)
    673                 if self:
    674                     VertexAiResourceNounWithFutureManager.wait(self)
--> 675                 return method(*args, **kwargs)
    676 
    677             # callbacks to call within the Future (in same Thread)

~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in run(self, service_account, network, sync)
    250         self.submit(service_account=service_account, network=network)
    251 
--> 252         self._block_until_complete()
    253 
    254     def submit(

~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in _block_until_complete(self)
    347         # JOB_STATE_FAILED or JOB_STATE_CANCELLED.
    348         if self._gca_resource.state in _PIPELINE_ERROR_STATES:
--> 349             raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
    350         else:
    351             _LOGGER.log_action_completed_against_resource("run", "completed", self)

RuntimeError: Job failed with:
code: 9
message: "The DAG failed because some tasks failed. The failed tasks are: [model-deploy].; Job (project_id = python-docs-samples-tests, job_id = ...) is failed due to the above error.; Failed to handle the job: {project_number = ..., job_id = ...}"

intro-swivel downloads external code

It looks like the https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/matching_engine/intro-swivel.ipynb notebook downloads code from a Cloud Bucket

gsutil cp gs://cloud-samples-data/vertex-ai/matching-engine/swivel/pipeline/* .

It isn't standard practice to download external code to run: that code might not be tested or reviewed and could change at any time.
Notebooks in this repo must be self-contained, with all code inside the notebook.

Preferred solution: Rewrite the .sh files as code inside the notebook.
Alternative solution: Move your notebook to "community-content"

No module named 'google.cloud.aiplatform' in sdk_custom_image_classification_online_explain.ipynb

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

  1. Cloned this git repository into a Managed Notebooks Environment in Vertex AI
  2. Tried running the notebook sample sdk_custom_image_classification_online_explain.ipynb
  3. The cell "import google.cloud.aiplatform as aip" errors out with ModuleNotFoundError: No module named 'google.cloud.aiplatform'.
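A minimal fix sketch, assuming the managed notebook kernel simply lacks the Vertex AI SDK, is to install it from the notebook before the import (a kernel restart may be needed afterwards):

# Hedged sketch: install the Vertex AI SDK into the notebook kernel,
# then retry the failing import.
! pip3 install --user --quiet google-cloud-aiplatform

import google.cloud.aiplatform as aip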

Specifications

  • Version:
  • Platform:

ModelUploadOp from "Vertex AI Pipelines: model upload using google-cloud-pipeline-components" does not work

Expected Behavior

The code example from "Vertex AI Pipelines: model train, upload, and deploy using google-cloud-pipeline-components" should work as intended.

Actual Behavior

The code example below, from "Vertex AI Pipelines: model train, upload, and deploy using google-cloud-pipeline-components", has an issue and does not work:

from google_cloud_pipeline_components import aiplatform as gcc_aip
from google.cloud import aiplatform

aiplatform.init(project=project, location=region)

# THIS IS THE METHOD THAT DOESN'T APPEAR TO WORK
model_upload_op = gcc_aip.ModelUploadOp(
    project=project,
    location=region,
    display_name=model_display_name,
    artifact_uri=model.uri,
    serving_container_image_uri=serving_container_image_uri,
)

On the other hand, the method below worked:

# THIS METHOD DOES WORK
# aiplatform.Model.upload(
#     display_name=model_display_name,
#     artifact_uri=model.uri,
#     serving_container_image_uri=serving_container_image_uri,
# )

I'm using Vertex AI Pipelines to train a model and upload it to Vertex AI. In the pipeline, I'm attempting to use the ModelUploadOp component to upload a custom model to Vertex AI Models. The logs show the job succeeding, but the model never actually gets uploaded.
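
One way to check whether the upload actually happened (a debugging sketch, not a confirmed fix; project, region, and model_display_name are the same placeholders used above) is to list models by display name after the pipeline run:

from google.cloud import aiplatform

aiplatform.init(project=project, location=region)

# If ModelUploadOp really succeeded, the uploaded model should appear here.
for m in aiplatform.Model.list(filter=f'display_name="{model_display_name}"'):
    print(m.resource_name, m.create_time)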

Steps to Reproduce the Problem

Specifications

Version:

  • Pipeline SDK (Kubeflow Pipelines/TFX) Version: kfp
  • Pipelines Version: kfp==1.8.11
  • Platform: Google Cloud Vertex AI

custom_model_training_and_batch_prediction: Failed at tensorflow import

Command

import tensorflow as tf

Error


ContextualVersionConflict Traceback (most recent call last)
/tmp/ipykernel_20/3384745563.py in
----> 1 import tensorflow as tf
2 from google_cloud_pipeline_components import aiplatform as gcc_aip
3 from google_cloud_pipeline_components.experimental.custom_job import utils
4 from kfp.v2 import compiler, dsl
5 from kfp.v2.dsl import component

~/.local/lib/python3.9/site-packages/tensorflow/__init__.py in
39 import sys as _sys
40
---> 41 from tensorflow.python.tools import module_util as _module_util
42 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
43

~/.local/lib/python3.9/site-packages/tensorflow/python/__init__.py in
46 from tensorflow.python import data
47 from tensorflow.python import distribute
---> 48 from tensorflow.python import keras
49 from tensorflow.python.feature_column import feature_column_lib as feature_column
50 from tensorflow.python.layers import layers

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/__init__.py in
23
24 # See b/110718070#comment18 for more details about this import.
---> 25 from tensorflow.python.keras import models
26
27 from tensorflow.python.keras.engine.input_layer import Input

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/models.py in
18 from tensorflow.python.framework import ops
19 from tensorflow.python.keras import backend
---> 20 from tensorflow.python.keras import metrics as metrics_module
21 from tensorflow.python.keras import optimizer_v1
22 from tensorflow.python.keras.engine import functional

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/metrics.py in
35 from tensorflow.python.framework import tensor_shape
36 from tensorflow.python.framework import tensor_spec
---> 37 from tensorflow.python.keras import activations
38 from tensorflow.python.keras import backend
39 from tensorflow.python.keras.engine import base_layer

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/activations.py in
16
17 from tensorflow.python.keras import backend
---> 18 from tensorflow.python.keras.layers import advanced_activations
19 from tensorflow.python.keras.utils.generic_utils import deserialize_keras_object
20 from tensorflow.python.keras.utils.generic_utils import serialize_keras_object

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/layers/__init__.py in
20 # pylint: disable=g-bad-import-order
21 # pylint: disable=g-import-not-at-top
---> 22 from tensorflow.python.keras.engine.input_layer import Input
23 from tensorflow.python.keras.engine.input_layer import InputLayer
24 from tensorflow.python.keras.engine.input_spec import InputSpec

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/engine/input_layer.py in
22 from tensorflow.python.keras import backend
23 from tensorflow.python.keras.distribute import distributed_training_utils
---> 24 from tensorflow.python.keras.engine import base_layer
25 from tensorflow.python.keras.engine import keras_tensor
26 from tensorflow.python.keras.engine import node as node_module

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py in
47 from tensorflow.python.keras import initializers
48 from tensorflow.python.keras import regularizers
---> 49 from tensorflow.python.keras.engine import base_layer_utils
50 from tensorflow.python.keras.engine import input_spec
51 from tensorflow.python.keras.engine import keras_tensor

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer_utils.py in
29 from tensorflow.python.keras.utils import control_flow_util
30 from tensorflow.python.keras.utils import tf_inspect
---> 31 from tensorflow.python.keras.utils import tf_utils
32 from tensorflow.python.ops import array_ops
33 from tensorflow.python.ops import variables as tf_variables

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py in
20
21 from tensorflow.python.data.experimental.ops import cardinality
---> 22 from tensorflow.python.distribute.coordinator import cluster_coordinator as coordinator_lib
23 from tensorflow.python.eager import context
24 from tensorflow.python.framework import composite_tensor

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/coordinator/cluster_coordinator.py in
34
35 from tensorflow.python.distribute import input_lib
---> 36 from tensorflow.python.distribute import parameter_server_strategy_v2
37 from tensorflow.python.distribute.coordinator import metric_utils
38 from tensorflow.python.eager import cancellation

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy_v2.py in
31 from tensorflow.python.distribute import mirrored_run
32 from tensorflow.python.distribute import multi_worker_util
---> 33 from tensorflow.python.distribute import parameter_server_strategy
34 from tensorflow.python.distribute import sharded_variable
35 from tensorflow.python.distribute import values

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy.py in
32 from tensorflow.python.distribute import ps_values
33 from tensorflow.python.distribute import values
---> 34 from tensorflow.python.distribute.cluster_resolver import SimpleClusterResolver
35 from tensorflow.python.distribute.cluster_resolver import TFConfigClusterResolver
36 from tensorflow.python.eager import context

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/__init__.py in
29 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import SimpleClusterResolver
30 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import UnionClusterResolver
---> 31 from tensorflow.python.distribute.cluster_resolver.gce_cluster_resolver import GCEClusterResolver
32 from tensorflow.python.distribute.cluster_resolver.kubernetes_cluster_resolver import KubernetesClusterResolver
33 from tensorflow.python.distribute.cluster_resolver.slurm_cluster_resolver import SlurmClusterResolver

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/gce_cluster_resolver.py in
26 _GOOGLE_API_CLIENT_INSTALLED = True
27 try:
---> 28 from googleapiclient import discovery # pylint: disable=g-import-not-at-top
29 from oauth2client.client import GoogleCredentials # pylint: disable=g-import-not-at-top
30 except ImportError:

~/.local/lib/python3.9/site-packages/googleapiclient/discovery.py in
66 from googleapiclient.errors import UnknownApiNameOrVersion
67 from googleapiclient.errors import UnknownFileType
---> 68 from googleapiclient.http import build_http
69 from googleapiclient.http import BatchHttpRequest
70 from googleapiclient.http import HttpMock

~/.local/lib/python3.9/site-packages/googleapiclient/http.py in
65 from googleapiclient.errors import UnexpectedBodyError
66 from googleapiclient.errors import UnexpectedMethodError
---> 67 from googleapiclient.model import JsonModel
68
69

~/.local/lib/python3.9/site-packages/googleapiclient/model.py in
34 from googleapiclient.errors import HttpError
35
---> 36 _LIBRARY_VERSION = pkg_resources.get_distribution("google-api-python-client").version
37 _PY_VERSION = platform.python_version()
38

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_distribution(dist)
464 dist = Requirement.parse(dist)
465 if isinstance(dist, Requirement):
--> 466 dist = get_provider(dist)
467 if not isinstance(dist, Distribution):
468 raise TypeError("Expected string, Requirement, or Distribution", dist)

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
340 """Return an IResourceProvider for the named module or requirement"""
341 if isinstance(moduleOrReq, Requirement):
--> 342 return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
343 try:
344 module = sys.modules[moduleOrReq]

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in require(self, *requirements)
884 included, even if they were already activated in this working set.
885 """
--> 886 needed = self.resolve(parse_requirements(requirements))
887
888 for dist in needed:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
775 # Oops, the "best" so far conflicts with a dependency
776 dependent_req = required_by[req]
--> 777 raise VersionConflict(dist, req).with_context(dependent_req)
778
779 # push the new requirements onto the stack

ContextualVersionConflict: (google-api-core 2.3.2 (/builder/home/.local/lib/python3.9/site-packages), Requirement.parse('google-api-core<2dev,>=1.21.0'), {'google-api-python-client'})
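
A workaround that typically resolves this class of conflict (stated as an assumption, since the exact package set in the build image isn't pinned in this report) is to make google-api-core and google-api-python-client mutually compatible and restart the kernel before importing TensorFlow:

# Option A (assumption): pin google-api-core to the range google-api-python-client expects.
! pip3 install --user "google-api-core>=1.21.0,<2.0.0"

# Option B (assumption): upgrade google-api-python-client to a release that accepts google-api-core 2.x.
! pip3 install --user --upgrade google-api-python-client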

Can't cancel LRO for batch ingestion after deleting entity type / featurestore

Expected Behavior

Feature Store Batch Ingestion: I would expect to be able to cancel a batch ingestion job, either through the web UI or through the REST API.

Actual Behavior

I tried to cancel the operations through the REST API and got the response:

{
  "error": {
    "code": 400,
    "message": "Operation projects/<PROJECT_ID>/locations/us-central1/operations/<OPERATION_ID> is not cancellable.",
    "status": "FAILED_PRECONDITION"
  }
}

I tried deleting the entity type and then the featurestore to force the ingestion job to stop, but it kept running. The only way I could get the LRO to stop was to delete the featurestore entirely, at which point it fails with the following error:

Online serving currently unavailable, Please retry the request shortly or reach out to support upon continued failure.

Steps to Reproduce the Problem

  1. Create a featurestore and entity
  2. start an LRO for batch ingest
  3. Try and cancel it with REST API
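
For reference, step 3 was attempted along these lines (a sketch of the REST call; PROJECT_ID, REGION, and OPERATION_ID are placeholders, and the service currently rejects it with the FAILED_PRECONDITION response shown above):

import google.auth
import google.auth.transport.requests
import requests

credentials, _ = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{REGION}/operations/{OPERATION_ID}:cancel"
)
response = requests.post(url, headers={"Authorization": f"Bearer {credentials.token}"})
print(response.status_code, response.text)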

mlops_pipeline_tf_agents_bandits_movie_recommendation.ipynb

Expected Behavior

In the "Author and run the RL pipeline" section, once the pipeline is submitted, the training step should complete without errors.

Actual Behavior

No such object: gs://<bucket>/pipeline/<>/movielens-pipeline-startup-<>/train-reinforcement-learning-policy_<>/training_artifacts_dir; Failed to read GCS file: gs://<bucket>/pipeline/<>/movielens-pipeline-startup-<>/train-reinforcement-learning-policy_<>/training_artifacts_dir.; Failed to read output parameter training_artifacts_dir with spec type: STRING ; Failed to get and update task output.; Failed to refresh external task state

The pipeline fails during the training step (see the attached screenshot).

I am wondering whether this might be related to #19 (comment).

@KathyFeiyang, if you are still maintaining this, is this something you have run into? After checking the discussion above, I tried using Str; it does not throw an error, but it still does not seem to work, as I get an error on the next step.

Steps to Reproduce the Problem

  1. Ran the notebook as is with required parameters

Specifications

  • Platform: Vertex AI workbench notebooks

Deployment Prediction Error

Hi, I've been following the tutorial but have been making minor adjustments to fit a GCN model rather than a text-classification one. I have gotten up to the section where we send a POST request for prediction.

When I send the request shown in the tutorial:

curl -s -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @./predictor/data_object.json \
  http://localhost:7080/predictions/test_model/

I get the error:

Error 1: 

{
  "code": 500,
  "type": "InternalServerException",
  "message": "Worker died."
}

I modified the curl command and sent:

curl -s -X POST \
  -d @./predictor/data_object.json \
  http://localhost:7080/predictions/test_model/

which got me this error:

Error 2: 
{
  "code": 503,
  "type": "InternalServerException",
  "message": "Prediction failed"
}

I think I've pinpointed in the Docker logs what the issues might be, but I can't seem to find a solution.

Error 1 logs:

2022-01-26T17:29:44,750 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/model_service_worker.py", line 189, in <module>
2022-01-26T17:29:44,751 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     worker.run_server()
2022-01-26T17:29:44,752 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/model_service_worker.py", line 161, in run_server
2022-01-26T17:29:44,753 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2022-01-26T17:29:44,755 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/model_service_worker.py", line 116, in handle_connection
2022-01-26T17:29:44,756 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     cmd, msg = retrieve_msg(cl_socket)
2022-01-26T17:29:44,756 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 36, in retrieve_msg
2022-01-26T17:29:44,757 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     msg = _retrieve_inference_msg(conn)
2022-01-26T17:29:44,757 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 226, in _retrieve_inference_msg
2022-01-26T17:29:44,758 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     request = _retrieve_request(conn)
2022-01-26T17:29:44,759 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 261, in _retrieve_request
2022-01-26T17:29:44,759 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     input_data = _retrieve_input_data(conn)
2022-01-26T17:29:44,760 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 314, in _retrieve_input_data
2022-01-26T17:29:44,760 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     model_input["value"] = json.loads(value.decode("utf-8"))
2022-01-26T17:29:44,772 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
2022-01-26T17:29:44,772 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     return _default_decoder.decode(s)
2022-01-26T17:29:44,772 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
2022-01-26T17:29:44,773 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2022-01-26T17:29:44,773 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
2022-01-26T17:29:44,774 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     raise JSONDecodeError("Expecting value", s, err.value) from None
2022-01-26T17:29:44,775 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG - json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


Error 2 Logs:

2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/service.py", line 102, in predict
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     ret = self._entry_point(input_batch, self.context)
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/base.py", line 26, in handle
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     data = self.parse_input(data)
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 19, in parse_input
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     lengths, batch = self._batch_from_json(data)
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 30, in _batch_from_json
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0 ACCESS_LOG - /172.17.0.1:51646 "POST /predictions/test_dict/ HTTP/1.1" 503 6
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     mini_batches = [self._from_json(data_row) for data_row in data_rows]
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0 TS_METRICS - Requests5XX.Count:1|#Level:Host|#hostname:6c7c4274b7e0,timestamp:null
2022-01-26T17:33:27,449 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 30, in <listcomp>
2022-01-26T17:33:27,449 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     mini_batches = [self._from_json(data_row) for data_row in data_rows]
2022-01-26T17:33:27,449 [DEBUG] W-9001-test_dict_1.0 org.pytorch.serve.job.Job - Waiting time ns: 162058, Inference time ns: 6406818
2022-01-26T17:33:27,451 [INFO ] W-9001-test_dict_1.0 TS_METRICS - WorkerThreadTime.ms:6|#Level:Host|#hostname:6c7c4274b7e0,timestamp:null
2022-01-26T17:33:27,452 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 39, in _from_json
2022-01-26T17:33:27,452 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     rows = (data.get('data') or data.get('body') or data)['instances']
2022-01-26T17:33:27,452 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG - KeyError: 'instances'

My json object:

{
    "instances": [
        {
            "data": {
                "x": [[-1.0], [0.0], [1.0]],
                "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],
                "y": [1, 2, 3]
            }
        }
    ]
}

I've tested my JSON object many times to make sure it loads properly, and I've followed the structure shown in the tutorial, making sure that "instances" is a key in the JSON. So I'm really stuck as to what's happening. Could someone please help me?
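
One debugging step that may narrow this down (a sketch, not a confirmed fix; the path and port are the same ones used in the curl commands above): load the file in Python to prove it parses and contains the "instances" key, then post it with requests so the Content-Type header and the exact request body are controlled in one place. The KeyError: 'instances' in the second log suggests the body that reached TorchServe's JSON request envelope was not the dictionary shown above.

import json
import requests

# Confirm the file on disk is valid JSON and has the "instances" key.
with open("./predictor/data_object.json") as f:
    payload = json.load(f)
assert "instances" in payload

# Send exactly that object with an explicit application/json Content-Type.
response = requests.post(
    "http://localhost:7080/predictions/test_model/",
    json=payload,
)
print(response.status_code, response.text)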

google_cloud_pipeline_components Compiling Error in metadata Parameter in ModelDeployOp()

I am trying to write something to the metadata of a Vertex AI model,
and my code is like this:

import google.cloud.aiplatform as aip 
from google_cloud_pipeline_components import aiplatform as gcc_aip

@dsl.pipeline(
    name="test-vertext-ai-02",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline():
    ds_op = gcc_aip.TextDatasetCreateOp(
        project=PROJECT_ID,
        display_name=dataset_display_name,
        gcs_source=gcs_source,
        import_schema_uri=aip.schema.dataset.ioformat.text.extraction,
    )

    training_job_run_op = gcc_aip.CustomContainerTrainingJobRunOp(
        project=PROJECT_ID,
        display_name='pipeline_test_0112_1030',
        dataset=ds_op.outputs["dataset"],
        annotation_schema_uri=aip.schema.dataset.annotation.text.extraction,
        model_display_name=model_display_name,
        base_output_dir=base_output_dir,
        container_uri='us-central1-docker.pkg.dev/fr-dev-piiworker/ml-training/test_01_image:latest',
        staging_bucket='gs://vertex_ai_test_dev',
        model_serving_container_image_uri=model_serving_container_image_uri,
    )

    gcc_aip.ModelDeployOp(
        model=training_job_run_op.outputs["model"],
        metadata=None
    )

then i compile this code:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=pipeline, package_path="intro_pipeline.json".replace(" ", "_")
)

but I get this error message:


TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_280/3730446987.py in <module>
  2 
  3 compiler.Compiler().compile(
----> 4     pipeline_func=pipeline, package_path="intro_pipeline.json".replace(" ", "_")
  5 )

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, 
pipeline_name, pipeline_parameters, type_check)
   1275                 pipeline_func=pipeline_func,
   1276                 pipeline_name=pipeline_name,
-> 1277                 pipeline_parameters_override=pipeline_parameters)
   1278             self._write_pipeline(pipeline_job, package_path)
   1279         finally:

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, 
pipeline_name, pipeline_parameters_override)
   1194 
   1195         with dsl.Pipeline(pipeline_name) as dsl_pipeline:
-> 1196             pipeline_func(*args_list)
   1197 
   1198         if not dsl_pipeline.ops:

/tmp/ipykernel_280/786563907.py in pipeline()
 25     gcc_aip.ModelDeployOp(
 26         model=training_job_run_op.outputs["model"],
---> 27         metadata=None
 28     )

TypeError: model_deploy() got an unexpected keyword argument 'metadata'

How can I fix this error?
Thanks!
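
For what it's worth, the TypeError just means the installed google_cloud_pipeline_components release exposes no metadata parameter on ModelDeployOp. A minimal change that lets the pipeline compile (a sketch; whether the deploy step also needs an endpoint or machine resources depends on the GCPC version in use) is to drop the unsupported argument:

    # Inside the pipeline function: omit the unsupported `metadata` argument.
    gcc_aip.ModelDeployOp(
        model=training_job_run_op.outputs["model"],
    )

If the goal was to attach extra metadata to the model itself, that information generally belongs on the upload/training step (for example, as labels) rather than on the deploy step; check the component reference for the exact parameter names in your version.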

Vertex-AI Feature Store can't do batch serving from BigQuery as stated in documentation

I'm following this documentation and this example notebook to try to do batch serving using the Vertex AI Feature Store.

My mapping between label and entity_id + timestamp exists in a BigQuery table. The documentation says the following, which suggests that it should be possible to do batch serving with a mapping in BigQuery:

The read-instance list specifies the entities and timestamps for the feature values that you want to retrieve. The CSV file or BigQuery table must contain the following columns

However, I can't find any way to do this. The featurestore_service_pb2.BatchReadFeatureValuesRequest class has a required field csv_read_instances (link), but doesn't accept a BigQuery source.

Describe the solution you'd like

Make it possible to do batch serving with a mapping between features and labels in BigQuery, for example by letting the BatchReadFeatureValuesRequest class accept a BigQuerySource instead of a CsvSource.
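
For anyone hitting the same limitation: newer releases of the API surface a BigQuery read-instance option on this request (worth verifying against the installed client version, since the report above uses the v1beta1 pb2 classes). A sketch of what that might look like with the v1 client; all resource names and table URIs below are placeholders:

from google.cloud import aiplatform_v1

client = aiplatform_v1.FeaturestoreServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)

request = aiplatform_v1.BatchReadFeatureValuesRequest(
    featurestore=f"projects/{PROJECT_ID}/locations/{REGION}/featurestores/{FEATURESTORE_ID}",
    # Read instances (entity IDs + timestamps) straight from a BigQuery table.
    bigquery_read_instances=aiplatform_v1.BigQuerySource(
        input_uri="bq://my-project.my_dataset.read_instances"
    ),
    destination=aiplatform_v1.FeatureValueDestination(
        bigquery_destination=aiplatform_v1.BigQueryDestination(
            output_uri="bq://my-project.my_dataset.batch_serving_output"
        )
    ),
    entity_type_specs=[
        aiplatform_v1.BatchReadFeatureValuesRequest.EntityTypeSpec(
            entity_type_id="users",
            feature_selector=aiplatform_v1.FeatureSelector(
                id_matcher=aiplatform_v1.IdMatcher(ids=["*"])
            ),
        )
    ],
)

lro = client.batch_read_feature_values(request=request)
print(lro.result())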

Google cloud logging type ERROR for regular training progress output

Expected Behavior

Google Cloud Logging for a custom training job should report training progress as type INFO.

Actual Behavior

Logs show type ERROR

Steps to Reproduce the Problem

I've been using the notebook in this repo: Training, Tuning and Deploying a PyTorch Text Classification Model on Vertex AI

In the section, "Run Custom Job on Vertex Training with a pre-built container" a package is created and uploaded to a bucket and the job is run. However when monitoring the logs, I get the following:

ERROR 2022-01-26 14:19:34 +1100 workerpool0-0 31%|??? | 5974/19120 [31:21<59:25, 3.69it/s]
ERROR 2022-01-26 14:19:41 +1100 workerpool0-0 Configuration saved in /tmp/xlm-roberta-large/checkpoint-6000/config.json

For some events, it shows:

INFO 2022-01-26 14:19:41 +1100 workerpool0-0 {'loss': 5.3862, 'learning_rate': 1.3723849372384938e-05, 'epoch': 3.14}
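
A likely explanation (an assumption; nothing in this report confirms it) is that the training libraries write progress bars and checkpoint messages to stderr, and Cloud Logging records anything a training container emits on stderr with ERROR severity regardless of content. A minimal sketch for the training script that keeps Python logging on stdout:

import logging
import sys

# Route Python logging output to stdout so Cloud Logging ingests it as INFO
# instead of classifying stderr lines as ERROR.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

Progress bars such as tqdm still default to stderr, so disabling them in the trainer configuration (or redirecting stderr explicitly) would be needed to remove the remaining ERROR-severity lines.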

Failed to pick channel; failed to connect to all addresses

Expected Behavior

In "Test your query" cell of notebook, a Match is expected to be returned

Actual Behavior


_InactiveRpcError Traceback (most recent call last)
/tmp/ipykernel_19870/467153318.py in
108 request.float_val.append(val)
109
--> 110 response = stub.Match(request)
111 response

~/.local/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
944 state, call, = self._blocking(request, timeout, metadata, credentials,
945 wait_for_ready, compression)
--> 946 return _end_unary_response_blocking(state, call, False, None)
947
948 def with_call(self,

~/.local/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
847 return state.response
848 else:
--> 849 raise _InactiveRpcError(state)
850
851

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1635211606.895231791","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3158,"referenced_errors":[{"created":"@1635211606.895230354","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":147,"grpc_status":14}]}"

Steps to Reproduce the Problem

  1. Run https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/community/matching_engine/matching_engine_for_indexing.ipynb - with appropriate changes for your project, region, etc.
  2. Get all the way to "Test your query"
  3. Observe that this code leads to the above error:
import match_service_pb2
import match_service_pb2_grpc

channel = grpc.insecure_channel("{}:10000".format(DEPLOYED_INDEX_SERVER_IP))
stub = match_service_pb2_grpc.MatchServiceStub(channel)

Specifications

  • Version:
  • Platform: JupyterLab on Google Vertex Notebook

automl_tabular_classification_beans: Failed at KFP DAG run

Command

job.run

Error

Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [28]":
Step #3: ---------------------------------------------------------------------------
Step #3: RuntimeError                              Traceback (most recent call last)
Step #3: /tmp/ipykernel_19/2758967286.py in <module>
Step #3:       8 )
Step #3:       9 
Step #3: ---> 10 job.run()
Step #3:      11 
Step #3:      12 get_ipython().system(' rm tabular_classification_pipeline.json')
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py in wrapper(*args, **kwargs)
Step #3:     728                 if self:
Step #3:     729                     VertexAiResourceNounWithFutureManager.wait(self)
Step #3: --> 730                 return method(*args, **kwargs)
Step #3:     731 
Step #3:     732             # callbacks to call within the Future (in same Thread)
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in run(self, service_account, network, sync)
Step #3:     250         self.submit(service_account=service_account, network=network)
Step #3:     251 
Step #3: --> 252         self._block_until_complete()
Step #3:     253 
Step #3:     254     def submit(
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in _block_until_complete(self)
Step #3:     347         # JOB_STATE_FAILED or JOB_STATE_CANCELLED.
Step #3:     348         if self._gca_resource.state in _PIPELINE_ERROR_STATES:
Step #3: --> 349             raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
Step #3:     350         else:
Step #3:     351             _LOGGER.log_action_completed_against_resource("run", "completed", self)
Step #3: 
Step #3: RuntimeError: Job failed with:
Step #3: code: 9
Step #3: message: "The DAG failed because some tasks failed. The failed tasks are: [condition-deploy-decision-1].; Job (project_id = python-docs-samples-tests, job_id = 4820017083611873280) is failed due to the above error.; Failed to handle the job: {project_number = 1012616486416, job_id = 4820017083611873280}"

lightweight_functions_component_io_kfp fails

Step #4: lightweight_functions_component_io_kfp.ipynb                      FAILED    00:00:18    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [20]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         KeyError                                  Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_32486/4032352611.py in <module>
Step #4:                                                                                               1 from kfp.v2 import compiler
Step #4:                                                                                               2
Step #4:                                                                                         ----> 3 compiler.Compiler().compile(
Step #4:                                                                                               4     pipeline_func=pipeline, package_path="component_io_job.json"
Step #4:                                                                                               5 )
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
Step #4:                                                                                            1175             kfp.TYPE_CHECK = type_check
Step #4:                                                                                            1176             kfp.COMPILING_FOR_V2 = True
Step #4:                                                                                         -> 1177             pipeline_job = self._create_pipeline_v2(
Step #4:                                                                                            1178                 pipeline_func=pipeline_func,
Step #4:                                                                                            1179                 pipeline_name=pipeline_name,
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
Step #4:                                                                                            1106
Step #4:                                                                                            1107         with dsl.Pipeline(pipeline_name) as dsl_pipeline:
Step #4:                                                                                         -> 1108             pipeline_func(*args_list)
Step #4:                                                                                            1109
Step #4:                                                                                            1110         self._validate_exit_handler(dsl_pipeline)
Step #4: 
Step #4:                                                                                         /tmp/ipykernel_32486/2896638886.py in pipeline(message)
Step #4:                                                                                              14     train_task = train(
Step #4:                                                                                              15         dataset_one=preprocess_task.outputs["output_dataset_one"],
Step #4:                                                                                         ---> 16         dataset_two=preprocess_task.outputs["output_dataset_two"],
Step #4:                                                                                              17         imported_dataset=importer.output,
Step #4:                                                                                              18         message=preprocess_task.outputs["output_parameter"],
Step #4: 
Step #4:                                                                                         KeyError: 'output_dataset_two'

In notebooks/official/explainable_ai/gapic-custom_tabular_regression_online_explain.ipynb Can't create custom job

Expected Behavior

Start custom training job in Vertex.

Code

def create_custom_job(custom_job):
    response = clients["job"].create_custom_job(parent=PARENT, custom_job=custom_job)
    print("name:", response.name)
    print("display_name:", response.display_name)
    print("state:", response.state)
    print("create_time:", response.create_time)
    print("update_time:", response.update_time)
    return response

response = create_custom_job(custom_job)

Actual Behavior

Got this error:

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     65         try:
---> 66             return callable_(*args, **kwargs)
     67         except grpc.RpcError as exc:

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    945                                       wait_for_ready, compression)
--> 946         return _end_unary_response_blocking(state, call, False, None)
    947 

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    848     else:
--> 849         raise _InactiveRpcError(state)
    850 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNIMPLEMENTED
	details = "Received http2 header with status: 404"
	debug_error_string = "{"created":"@1644551451.237575140","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":131,"grpc_message":"Received http2 header with status: 404","grpc_status":12,"value":"404"}"
>

The above exception was the direct cause of the following exception:

MethodNotImplemented                      Traceback (most recent call last)
/tmp/ipykernel_12135/2660296843.py in <module>
      8     return response
      9 
---> 10 response = create_custom_job(custom_job)

/tmp/ipykernel_12135/2660296843.py in create_custom_job(custom_job)
      1 def create_custom_job(custom_job):
----> 2     response = clients["job"].create_custom_job(parent=PARENT, custom_job=custom_job)
      3     print("name:", response.name)
      4     print("display_name:", response.display_name)
      5     print("state:", response.state)

/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/job_service/client.py in create_custom_job(self, request, parent, custom_job, retry, timeout, metadata)
    648 
    649         # Send the request.
--> 650         response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
    651 
    652         # Done; return the response.

/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
    152             kwargs["metadata"] = metadata
    153 
--> 154         return wrapped_func(*args, **kwargs)
    155 
    156 

/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     66             return callable_(*args, **kwargs)
     67         except grpc.RpcError as exc:
---> 68             raise exceptions.from_grpc_error(exc) from exc
     69 
     70     return error_remapped_callable

MethodNotImplemented: 501 Received http2 header with status: 404
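
One common cause of this UNIMPLEMENTED / HTTP 404 combination (an assumption here, since the notebook's client setup isn't shown in the report) is a JobServiceClient created without the regional API endpoint, so requests go to the global host instead of the region in PARENT. A sketch of the intended client construction; REGION and PROJECT_ID are placeholders:

from google.cloud import aiplatform_v1

# The api_endpoint region must match the region used in PARENT.
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
clients = {"job": aiplatform_v1.JobServiceClient(client_options=client_options)}

PARENT = f"projects/{PROJECT_ID}/locations/{REGION}"

With a matching regional endpoint, create_custom_job should no longer return a 404 from the transport.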

Notebook fails: lightweight_functions_component_io_kfp.ipynb

Expected Behavior

Notebook doesn't fail

Actual Behavior

Step #4: lightweight_functions_component_io_kfp.ipynb                      FAILED    00:00:16    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [20]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         KeyError                                  Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_31031/4032352611.py in <module>
Step #4:                                                                                               1 from kfp.v2 import compiler
Step #4:                                                                                               2
Step #4:                                                                                         ----> 3 compiler.Compiler().compile(
Step #4:                                                                                               4     pipeline_func=pipeline, package_path="component_io_job.json"
Step #4:                                                                                               5 )
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
Step #4:                                                                                            1134       kfp.TYPE_CHECK = type_check
Step #4:                                                                                            1135       kfp.COMPILING_FOR_V2 = True
Step #4:                                                                                         -> 1136       pipeline_job = self._create_pipeline_v2(
Step #4:                                                                                            1137           pipeline_func=pipeline_func,
Step #4:                                                                                            1138           pipeline_name=pipeline_name,
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
Step #4:                                                                                            1070
Step #4:                                                                                            1071     with dsl.Pipeline(pipeline_name) as dsl_pipeline:
Step #4:                                                                                         -> 1072       pipeline_func(*args_list)
Step #4:                                                                                            1073
Step #4:                                                                                            1074     self._validate_exit_handler(dsl_pipeline)
Step #4: 
Step #4:                                                                                         /tmp/ipykernel_31031/2896638886.py in pipeline(message)
Step #4:                                                                                              14     train_task = train(
Step #4:                                                                                              15         dataset_one=preprocess_task.outputs["output_dataset_one"],
Step #4:                                                                                         ---> 16         dataset_two=preprocess_task.outputs["output_dataset_two"],
Step #4:                                                                                              17         imported_dataset=importer.output,
Step #4:                                                                                              18         message=preprocess_task.outputs["output_parameter"],
Step #4: 
Step #4:                                                                                         KeyError: 'output_dataset_two'
Step #4: 
Step #4: === END RESULTS===
Step #4: 

google_cloud_pipeline_components_automl_images.ipynb fails due to AutoMLImageClassificationDeployedModelNodes quota exceeded

Logs: https://pantheon.corp.google.com/vertex-ai/locations/us-central1/pipelines/runs/automl-image-training-v2-20220202211108?project=python-docs-samples-tests

RuntimeError: Failed to create the resource. Error: {'code': 8, 'message': 'The following quotas are exceeded: AutoMLImageClassificationDeployedModelNodes'}

Solutions

  • See why cleanup script doesn't solve this.
  • Increase quota.

How to reuse an existing Endpoint in pipeline?

Hi Experts

Expected Behavior

I hope to reuse an existing endpoint in the pipeline.

I also see that google_cloud_pipeline_components.v1.endpoint has only an EndpointCreateOp method.
https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.0/google_cloud_pipeline_components.v1.endpoint.html

Is there a way to reuse an existing Endpoint, and how? This would save costs.

Actual Behavior

I have to create a new Endpoint in the pipeline.
The EndpointCreateOp step is included in all the sample code.

endpoint_op = EndpointCreateOp(
    project=project,
    location=region,
    display_name="train-automl-flowers",
)

ModelDeployOp(
    model=training_run_task.outputs["model"],
    endpoint=endpoint_op.outputs["endpoint"],
    automatic_resources_min_replica_count=1,
    automatic_resources_max_replica_count=1,
)

Steps to Reproduce the Problem

N/A

Specifications

  • Version:
    google-cloud-pipeline-components-1.0.0
  • Platform:
    Google Cloud Vertex AI
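
One pattern that may work (a sketch only; it assumes a KFP release whose dsl.importer accepts a metadata argument and a GCPC release that ships artifact_types.VertexEndpoint, so verify both against the installed versions) is to import the existing endpoint as an artifact and pass it to ModelDeployOp instead of calling EndpointCreateOp:

from kfp.v2 import dsl
from google_cloud_pipeline_components.types import artifact_types
from google_cloud_pipeline_components.v1.endpoint import ModelDeployOp

# Placeholders: project, region, endpoint_id refer to an endpoint created outside the pipeline.
ENDPOINT_RESOURCE = f"projects/{project}/locations/{region}/endpoints/{endpoint_id}"

@dsl.pipeline(name="reuse-existing-endpoint")
def pipeline():
    # Import the already-created endpoint instead of creating a new one.
    endpoint_importer = dsl.importer(
        artifact_uri=f"https://{region}-aiplatform.googleapis.com/v1/{ENDPOINT_RESOURCE}",
        artifact_class=artifact_types.VertexEndpoint,
        metadata={"resourceName": ENDPOINT_RESOURCE},
    )

    ModelDeployOp(
        model=training_run_task.outputs["model"],  # training step as in the snippet above
        endpoint=endpoint_importer.output,
        automatic_resources_min_replica_count=1,
        automatic_resources_max_replica_count=1,
    )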

Problem in the preprocessing function while defining the serving_default signature for a model in a community notebook

Problem

In the section Deployment > Upload the model for serving > Serving function for image data of this notebook, the preprocess function is defined as,

def _preprocess(bytes_input):
    decoded = tf.io.decode_jpeg(bytes_input, channels=3)
    decoded = tf.image.convert_image_dtype(decoded, tf.float32)
    resized = tf.image.resize(decoded, size=(32, 32))
    rescale = tf.cast(resized / 255.0, tf.float32)
    return rescale

For this, the author has written,

resized / 255.0 - Rescales (normalization) the pixel data between 0 and 1.

But if we check the working of tf.image.convert_image_dtype as per the current TensorFlow version 2.7.0, we see that this function not only converts the dtype, but also normalizes/rescales the image according to the target dtype.

In the documentation, it is mentioned,

Images that are represented using floating-point values are expected to have values in the range [0,1). Image data stored in integer data types are expected to have values in the range [0, MAX], where MAX is the largest positive representable number for the data type.

This means that while converting to tf.float32 it also rescales the decoded image into the range [0, 1]. Dividing by 255 again would squeeze the values into [0, 1/255], which is not the range the model saw during training. Therefore, the line rescale = tf.cast(resized / 255.0, tf.float32) must be omitted.
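
For clarity, the corrected serving preprocessing function would then be the notebook's version with the extra division removed:

def _preprocess(bytes_input):
    decoded = tf.io.decode_jpeg(bytes_input, channels=3)
    # convert_image_dtype already rescales uint8 pixel values into [0, 1] floats.
    decoded = tf.image.convert_image_dtype(decoded, tf.float32)
    resized = tf.image.resize(decoded, size=(32, 32))
    return resized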

Specifications

  • Version: Tensorflow Version: 2.7.0

model_monitoring notebook is flaky

Describe the bug
model_monitoring notebook is flaky

Step #4: model_monitoring.ipynb                                            FAILED    00:28:43    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [13]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         _InactiveRpcError                         Traceback (most recent call last)
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
Step #4:                                                                                              66         try:
Step #4:                                                                                         ---> 67             return callable_(*args, **kwargs)
Step #4:                                                                                              68         except grpc.RpcError as exc:
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
Step #4:                                                                                             945                                       wait_for_ready, compression)
Step #4:                                                                                         --> 946         return _end_unary_response_blocking(state, call, False, None)
Step #4:                                                                                             947
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
Step #4:                                                                                             848     else:
Step #4:                                                                                         --> 849         raise _InactiveRpcError(state)
Step #4:                                                                                             850
Step #4: 
Step #4:                                                                                         _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
Step #4:                                                                                         	status = StatusCode.INVALID_ARGUMENT
Step #4:                                                                                         	details = "List of found errors:	1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.	"
Step #4:                                                                                         	debug_error_string = "{"created":"@1629366729.915949502","description":"Error received from peer ipv4:74.125.20.95:443","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"List of found errors:\t1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.\t","grpc_status":3}"
Step #4:                                                                                         >
Step #4: 
Step #4:                                                                                         The above exception was the direct cause of the following exception:
Step #4: 
Step #4:                                                                                         InvalidArgument                           Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_31500/3164745998.py in <module>
Step #4:                                                                                              20 objective_configs = set_objectives(model_ids, objective_template)
Step #4:                                                                                              21
Step #4:                                                                                         ---> 22 monitoring_job = create_monitoring_job(objective_configs)
Step #4: 
Step #4:                                                                                         /tmp/ipykernel_31500/1942246749.py in create_monitoring_job(objective_configs)
Step #4:                                                                                              56     client = JobServiceClient(client_options=options)
Step #4:                                                                                              57     parent = f"projects/{PROJECT_ID}/locations/{REGION}"
Step #4:                                                                                         ---> 58     response = client.create_model_deployment_monitoring_job(
Step #4:                                                                                              59         parent=parent, model_deployment_monitoring_job=job
Step #4:                                                                                              60     )
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/cloud/aiplatform_v1beta1/services/job_service/client.py in create_model_deployment_monitoring_job(self, request, parent, model_deployment_monitoring_job, retry, timeout, metadata)
Step #4:                                                                                            2294
Step #4:                                                                                            2295         # Send the request.
Step #4:                                                                                         -> 2296         response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
Step #4:                                                                                            2297
Step #4:                                                                                            2298         # Done; return the response.
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py in __call__(self, *args, **kwargs)
Step #4:                                                                                             143             kwargs["metadata"] = metadata
Step #4:                                                                                             144
Step #4:                                                                                         --> 145         return wrapped_func(*args, **kwargs)
Step #4:                                                                                             146
Step #4:                                                                                             147
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
Step #4:                                                                                              67             return callable_(*args, **kwargs)
Step #4:                                                                                              68         except grpc.RpcError as exc:
Step #4:                                                                                         ---> 69             six.raise_from(exceptions.from_grpc_error(exc), exc)
Step #4:                                                                                              70
Step #4:                                                                                              71     return error_remapped_callable
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/six.py in raise_from(value, from_value)
Step #4: 
Step #4:                                                                                         InvalidArgument: 400 List of found errors:	1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.
Step #4: metrics_viz_run_compare_kfp.ipynb                                 FAILED    00:01:55    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [28]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         ValueError                                Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_1835/3492064536.py in <module>
Step #4:                                                                                         ----> 1 df = pd.DataFrame(pipeline_df["metric.confidenceMetrics"][0])
Step #4:                                                                                               2 auc = np.trapz(df["recall"], df["falsePositiveRate"])
Step #4:                                                                                               3 plt.plot(df["falsePositiveRate"], df["recall"], label="auc=" + str(auc))
Step #4:                                                                                               4 plt.legend(loc=4)
Step #4:                                                                                               5 plt.show()
Step #4: 
Step #4:                                                                                         ~/.local/lib/python3.9/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
Step #4:                                                                                             728         else:
Step #4:                                                                                             729             if index is None or columns is None:
Step #4:                                                                                         --> 730                 raise ValueError("DataFrame constructor not properly called!")
Step #4:                                                                                             731
Step #4:                                                                                             732             # Argument 1 to "ensure_index" has incompatible type "Collection[Any]";
Step #4: 
Step #4:                                                                                         ValueError: DataFrame constructor not properly called!

Additional context
Please fix (it may be hard to reproduce as it doesn't happen all the time) or move to unofficial.

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: renovate.json
Error type: Invalid JSON (parsing failed)
Message: `Syntax error near ,
],

}
`

notebooks/official/explainable_ai/gapic-custom_tabular_regression_online_explain.ipynb doesn't install tensorflow

Logs: https://pantheon.corp.google.com/cloud-build/builds/98afe3ac-1589-4c10-bb9d-f28e1841558c?project=python-docs-samples-tests

Step #3: Copying file://notebooks/official/explainable_ai/gapic-custom_tabular_regression_online_explain.ipynb [Content-Type=application/octet-stream]...
Step #3: / [0 files][    0.0 B/139.4 KiB]                                                
/ [1 files][139.4 KiB/139.4 KiB]                                                
Step #3: Operation completed over 1 objects/139.4 KiB.                                    
Step #3: Uploaded output to: gs://cloud-build-notebooks-presubmit/executed_notebooks/PR_120/BUILD_831c33e0-bc70-4cd3-a3cf-a6d3e9b3c4ae/gapic-custom_tabular_regression_online_explain.ipynb
Step #3: Traceback (most recent call last):
Step #3:   File "/workspace/.cloud-build/execute_notebook_cli.py", line 34, in <module>
Step #3:     ExecuteNotebook.execute_notebook(
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 84, in execute_notebook
Step #3:     raise execution_exception
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 53, in execute_notebook
Step #3:     pm.execute_notebook(
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 122, in execute_notebook
Step #3:     raise_for_execution_errors(nb, output_path)
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
Step #3:     raise error
Step #3: papermill.exceptions.PapermillExecutionError: 
Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [33]":
Step #3: ---------------------------------------------------------------------------
Step #3: ModuleNotFoundError                       Traceback (most recent call last)
Step #3: /tmp/ipykernel_15/818968412.py in <module>
Step #3: ----> 1 import tensorflow as tf
Step #3:       2 
Step #3:       3 model = tf.keras.models.load_model(MODEL_DIR)
Step #3: 
Step #3: ModuleNotFoundError: No module named 'tensorflow'
Step #3: 
Finished Step #3
ERROR
ERROR: build step 3 "gcr.io/cloud-devrel-public-resources/python-samples-testing-docker:latest" failed: step exited with non-zero status: 1
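
A minimal sketch of one possible fix: add a TensorFlow installation step to the notebook's setup cell so that the later import tensorflow / tf.keras.models.load_model(MODEL_DIR) cell can run. The install flags below are assumptions, not the notebook's actual setup cell.

# Hypothetical setup cell for gapic-custom_tabular_regression_online_explain.ipynb:
# install TensorFlow up front to avoid the ModuleNotFoundError in cell [33].
! pip3 install --upgrade --user tensorflow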

Using model.resource_name and endpoint.resource_name to instantiate the model and endpoint resources

Expected Behavior

In notebooks/community/managed_notebooks/fraud_detection/fraud-detection-model.ipynb, the model and endpoint resources should be instantiated from their resource names, as shown below:

model = aiplatform.Model(model_name=model.resource_name)
endpoint = aiplatform.Endpoint(endpoint_name=endpoint.resource_name)
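
A slightly fuller sketch of the intended flow (the training and deployment code is assumed to already exist earlier in the notebook):

from google.cloud import aiplatform

# model and endpoint are assumed to have been created earlier in the notebook,
# e.g. via aiplatform.Model.upload(...) and model.deploy(...).
model_resource_name = model.resource_name        # projects/.../locations/.../models/...
endpoint_resource_name = endpoint.resource_name  # projects/.../locations/.../endpoints/...

# Later (or in a fresh session), rebuild the objects from those names instead
# of looking them up by display name, which is not guaranteed to be unique.
model = aiplatform.Model(model_name=model_resource_name)
endpoint = aiplatform.Endpoint(endpoint_name=endpoint_resource_name)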

Actual Behavior

Instantiating the model and endpoint by their display names fails.

Steps to Reproduce the Problem

  1. Create a managed notebook in Vertex AI Workbench.
  2. Open the terminal and clone the repository.
  3. Open notebooks/community/managed_notebooks/fraud_detection/fraud-detection-model.ipynb.
  4. Run the cells up to the "Deploy the model to the created Endpoint" cell.
  5. Run that cell and observe the error.

Specifications

(Screenshot attached: "Screen Shot 2021-11-03 at 5 40 37 PM".)


Unable to run custom job in sdk-custom-image-classification-batch.ipynb

In vertex-ai-samples/notebooks/official/custom/sdk-custom-image-classification-batch.ipynb, when I run the following cell in a Vertex AI Workbench notebook:

job = aiplatform.CustomTrainingJob(
    display_name=JOB_NAME,
    script_path="task.py",
    container_uri=TRAIN_IMAGE,
    requirements=["tensorflow_datasets==1.3.0"],
    model_serving_container_image_uri=DEPLOY_IMAGE,
)

MODEL_DISPLAY_NAME = "cifar10-" + TIMESTAMP

if TRAIN_GPU:
    model = job.run(
        model_display_name=MODEL_DISPLAY_NAME,
        args=CMDARGS,
        replica_count=1,
        machine_type=TRAIN_COMPUTE,
        accelerator_type=TRAIN_GPU.name,
        accelerator_count=TRAIN_NGPU,
    )
else:
    model = job.run(
        model_display_name=MODEL_DISPLAY_NAME,
        args=CMDARGS,
        replica_count=1,
        machine_type=TRAIN_COMPUTE,
        accelerator_count=0,
    )

it fails with the following error:
InvalidArgument: 400 Accelerator "NVIDIA_TESLA_K80" is not supported for machine type "n1-standard-4".

My understanding is that the Tesla K80 is compatible with 4-vCPU machine types, as listed here: https://cloud.google.com/ai-platform/training/docs/using-gpus
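
One workaround, purely as a sketch: try an accelerator/machine-type pairing that the error does not reject, for example a T4 on the same n1-standard-4 machine, or check whether the chosen region offers K80s at all. The specific combination below is an assumption and should be verified against the current Vertex AI compatibility table.

# Hypothetical override before calling job.run(): request a T4 instead of a K80,
# keeping the notebook's n1-standard-4 machine type.
model = job.run(
    model_display_name=MODEL_DISPLAY_NAME,
    args=CMDARGS,
    replica_count=1,
    machine_type=TRAIN_COMPUTE,          # "n1-standard-4" in the notebook
    accelerator_type="NVIDIA_TESLA_T4",  # assumed available in the selected region
    accelerator_count=1,
)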

Regression test failures: Various KFP failures
