
googlecloudplatform / vertex-ai-samples


Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud

Home Page: https://cloud.google.com/vertex-ai

License: Apache License 2.0

Languages: Python 1.90%, Dockerfile 0.26%, Shell 0.04%, Jupyter Notebook 97.79%
Topics: samples, gcp, google-cloud-platform, vertex-ai, notebook, python, ai, ml, data-science, mlops

vertex-ai-samples's Introduction

Google Cloud Vertex AI Samples


Welcome to the Google Cloud Vertex AI sample repository.

Overview

The repository contains notebooks and community content that demonstrate how to develop and manage ML workflows using Google Cloud Vertex AI.

Repository structure

├── community-content - Sample code and tutorials contributed by the community
├── notebooks
│   ├── community - Notebooks contributed by the community
│   ├── official - Notebooks demonstrating use of each Vertex AI service
│   │   ├── automl
│   │   ├── custom
│   │   ├── ...

Contributing

Contributions welcome! See the Contributing Guide.

Getting help

Please use the issues page to provide feedback or submit a bug report.

Disclaimer

This is not an officially supported Google product. The code in this repository is for demonstration purposes only.

Feedback

Please feel free to fill out our survey to give us feedback on the repo and its content.

vertex-ai-samples's People

Contributors

aarondietz234, abcdefgs0324, andrewferlitsch, btrinh69, connor-mccarthy, dependabot[bot], dstnluong-google, genquan9, gericdong, huguensjean, inardini, ivanmkc, kathyyu-google, katiemn, kcfindstr, kittyabs, krishr2d2, kweinmeister, mco-gh, minwoo33park, morgandu, renovate-bot, reznitskii, soheilazangeneh, sudarshan-springml, telpirion, themichaelhu, udaypunna, weigary, xiangxu-google


vertex-ai-samples's Issues

Build timeout errors: Explainability GAPIC notebooks

gapic-custom_image_classification_batch_explain.ipynb and gapic-custom_image_classification_online_explain.ipynb both had timeout errors in the build. We should investigate whether this is a one-off or a recurring issue.

Update: this issue has also been seen in gapic-custom_tabular_regression_batch_explain.ipynb and gapic-custom_tabular_regression_online_explain.ipynb

Step #3: RetryError: Deadline of 180.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7949f60460>>), last exception: 
Step #3: 
Step #3: During handling of the above exception, another exception occurred:
Step #3: 
Step #3: TimeoutError                              Traceback (most recent call last)
Step #3: /tmp/ipykernel_19/629009188.py in <module>
Step #3:      20 
Step #3:      21 
Step #3: ---> 22 model_to_deploy_id = upload_model(
Step #3:      23     "cifar10-" + TIMESTAMP, IMAGE_URI, model_path_to_deploy
Step #3:      24 )
Step #3: 
Step #3: /tmp/ipykernel_19/629009188.py in upload_model(display_name, image_uri, model_uri)
Step #3:      14     response = clients["model"].upload_model(parent=PARENT, model=model)
Step #3:      15     print("Long running operation:", response.operation.name)
Step #3: ---> 16     upload_model_response = response.result(timeout=180)
Step #3:      17     print("upload_model_response")
Step #3:      18     print(" model:", upload_model_response.model)
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/api_core/future/polling.py in result(self, timeout, retry)
Step #3:     130         """
Step #3:     131         kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
Step #3: --> 132         self._blocking_poll(timeout=timeout, **kwargs)
Step #3:     133 
Step #3:     134         if self._exception is not None:
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/api_core/future/polling.py in _blocking_poll(self, timeout, retry)
Step #3:     110             retry_(self._done_or_raise)(**kwargs)
Step #3:     111         except exceptions.RetryError:
Step #3: --> 112             raise concurrent.futures.TimeoutError(
Step #3:     113                 "Operation did not complete within the designated " "timeout."
Step #3:     114             )
Step #3: 
Step #3: TimeoutError: Operation did not complete within the designated timeout.
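A likely mitigation, sketched here under the assumption that the notebooks keep using the GAPIC model-service client shown in the traceback, is to raise the long-running-operation deadline passed to response.result() so slow model uploads do not hit the 180-second default:

# Hedged sketch: give the upload LRO more time before the client-side poller
# gives up (the build currently fails at the 180 s deadline).
response = clients["model"].upload_model(parent=PARENT, model=model)
print("Long running operation:", response.operation.name)
upload_model_response = response.result(timeout=1800)  # 30 minutes instead of 180 s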

notebooks/official/explainable_ai/gapic-custom_tabular_regression_batch_explain.ipynb doesn't install tensorflow

Step #3: / [0 files][    0.0 B/135.8 KiB]                                                
/ [1 files][135.8 KiB/135.8 KiB]                                                
Step #3: Operation completed over 1 objects/135.8 KiB.                                    
Step #3: Uploaded output to: gs://cloud-build-notebooks-presubmit/executed_notebooks/PR_120/BUILD_831c33e0-bc70-4cd3-a3cf-a6d3e9b3c4ae/gapic-custom_tabular_regression_batch_explain.ipynb
Step #3: Traceback (most recent call last):
Step #3:   File "/workspace/.cloud-build/execute_notebook_cli.py", line 34, in <module>
Step #3:     ExecuteNotebook.execute_notebook(
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 84, in execute_notebook
Step #3:     raise execution_exception
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 53, in execute_notebook
Step #3:     pm.execute_notebook(
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 122, in execute_notebook
Step #3:     raise_for_execution_errors(nb, output_path)
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
Step #3:     raise error
Step #3: papermill.exceptions.PapermillExecutionError: 
Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [33]":
Step #3: ---------------------------------------------------------------------------
Step #3: ModuleNotFoundError                       Traceback (most recent call last)
Step #3: /tmp/ipykernel_15/818968412.py in <module>
Step #3: ----> 1 import tensorflow as tf
Step #3:       2 
Step #3:       3 model = tf.keras.models.load_model(MODEL_DIR)
Step #3: 
Step #3: ModuleNotFoundError: No module named 'tensorflow'
Step #3: 
Finished Step #3
ERROR
ERROR: build step 3 "gcr.io/cloud-devrel-public-resources/python-samples-testing-docker:latest" failed: step exited with non-zero status: 1

Logs: https://console.cloud.google.com/cloud-build/builds/a09a6f51-c0f9-473d-9233-929a8bb0ccda?project=1012616486416
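A minimal fix sketch, assuming the notebook's package-installation cell is the right place for it, is to install TensorFlow explicitly before the cell that loads the model:

# Hedged sketch: install TensorFlow in the notebook environment so the later
# "import tensorflow as tf" cell can succeed.
! pip3 install --user --quiet tensorflow

import tensorflow as tf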

Add BigQuery Admin permission to service account for notebook execution tests

Due to a recent Policy Bot change, the service account [email protected] no longer has BigQuery write permissions.

Exception encountered at "In [23]":
Step #3: ---------------------------------------------------------------------------
Step #3: Forbidden                                 Traceback (most recent call last)
Step #3: Input In [23], in <module>
Step #3:      15 dataset_region = "US"  # @param {type : "string"}
Step #3:      16 bq_dataset.location = dataset_region
Step #3: ---> 17 bq_dataset = client.create_dataset(bq_dataset)
Step #3:      18 print(
Step #3:      19     "Created bigquery dataset {} in {}".format(
Step #3:      20         batch_predict_bq_output_dataset_path, dataset_region
Step #3:      21     )
Step #3:      22 )
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/cloud/bigquery/client.py:632, in Client.create_dataset(self, dataset, exists_ok, retry, timeout)
Step #3:     629 try:
Step #3:     630     span_attributes = {"path": path}
Step #3: --> 632     api_response = self._call_api(
Step #3:     633         retry,
Step #3:     634         span_name="BigQuery.createDataset",
Step #3:     635         span_attributes=span_attributes,
Step #3:     636         method="POST",
Step #3:     637         path=path,
Step #3:     638         data=data,
Step #3:     639         timeout=timeout,
Step #3:     640     )
Step #3:     641     return Dataset.from_api_repr(api_response)
Step #3:     642 except core_exceptions.Conflict:
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/cloud/bigquery/client.py:782, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
Step #3:     778 if span_name is not None:
Step #3:     779     with create_span(
Step #3:     780         name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
Step #3:     781     ):
Step #3: --> 782         return call()
Step #3:     784 return call()
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/api_core/retry.py:283, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
Step #3:     279 target = functools.partial(func, *args, **kwargs)
Step #3:     280 sleep_generator = exponential_sleep_generator(
Step #3:     281     self._initial, self._maximum, multiplier=self._multiplier
Step #3:     282 )
Step #3: --> 283 return retry_target(
Step #3:     284     target,
Step #3:     285     self._predicate,
Step #3:     286     sleep_generator,
Step #3:     287     self._deadline,
Step #3:     288     on_error=on_error,
Step #3:     289 )
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/api_core/retry.py:190, in retry_target(target, predicate, sleep_generator, deadline, on_error)
Step #3:     188 for sleep in sleep_generator:
Step #3:     189     try:
Step #3: --> 190         return target()
Step #3:     192     # pylint: disable=broad-except
Step #3:     193     # This function explicitly must deal with broad exceptions.
Step #3:     194     except Exception as exc:
Step #3: 
Step #3: File ~/.local/lib/python3.9/site-packages/google/cloud/_http/__init__.py:480, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout)
Step #3:     469 response = self._make_request(
Step #3:     470     method=method,
Step #3:     471     url=url,
Step #3:    (...)
Step #3:     476     timeout=timeout,
Step #3:     477 )
Step #3:     479 if not 200 <= response.status_code < 300:
Step #3: --> 480     raise exceptions.from_http_response(response)
Step #3:     482 if expect_json and response.content:
Step #3:     483     return response.json()
Step #3: 
Step #3: Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/python-docs-samples-tests/datasets?prettyPrint=false: Access Denied: Project python-docs-samples-tests: User does not have bigquery.datasets.create permission in project python-docs-samples-tests.
Step #3: 

Solution:

Write an exemption CL like this: cl/427208423

Affected notebooks:

https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/sdk_automl_tabular_forecasting_batch.ipynb
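If an exemption CL is not an option, a hedged alternative (assuming the redacted address above is the notebook-test service account and that project-level BigQuery Admin is acceptable for the test project) is to grant the role directly, for example from a notebook cell:

# Hedged sketch: grant BigQuery Admin to the test service account so
# client.create_dataset() stops returning 403. SERVICE_ACCOUNT_EMAIL is a
# placeholder for the redacted account in the issue description.
! gcloud projects add-iam-policy-binding python-docs-samples-tests \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.admin"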

pipelines_intro_kfp.ipynb: Failing at kfp import

Expected Behavior

The following import should succeed, but it fails:

from kfp.v2.google.client import AIPlatformClient  # noqa: F811

Error message

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
   3015         try:
-> 3016             return self.__dep_map
   3017         except AttributeError:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
   2812         if attr.startswith('_'):
-> 2813             raise AttributeError(attr)
   2814         return getattr(self._provider, attr)

AttributeError: _DistInfoDistribution__dep_map

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
   3006         try:
-> 3007             return self._pkg_info
   3008         except AttributeError:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
   2812         if attr.startswith('_'):
-> 2813             raise AttributeError(attr)
   2814         return getattr(self._provider, attr)

AttributeError: _pkg_info

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_19/3716807617.py in <module>
----> 1 from kfp.v2.google.client import AIPlatformClient  # noqa: F811
      2 
      3 api_client = AIPlatformClient(project_id=PROJECT_ID, region=REGION)
      4 
      5 # adjust time zone and cron schedule as necessary

/usr/local/lib/python3.9/site-packages/kfp/v2/google/client/__init__.py in <module>
     13 # limitations under the License.
     14 
---> 15 from kfp.v2.google.client.client import AIPlatformClient

/usr/local/lib/python3.9/site-packages/kfp/v2/google/client/client.py in <module>
     27 from google.oauth2 import credentials
     28 from google.protobuf import json_format
---> 29 from googleapiclient import discovery
     30 
     31 from kfp.v2.google.client import client_utils

/usr/local/lib/python3.9/site-packages/googleapiclient/discovery.py in <module>
     66 from googleapiclient.errors import UnknownApiNameOrVersion
     67 from googleapiclient.errors import UnknownFileType
---> 68 from googleapiclient.http import build_http
     69 from googleapiclient.http import BatchHttpRequest
     70 from googleapiclient.http import HttpMock

/usr/local/lib/python3.9/site-packages/googleapiclient/http.py in <module>
     65 from googleapiclient.errors import UnexpectedBodyError
     66 from googleapiclient.errors import UnexpectedMethodError
---> 67 from googleapiclient.model import JsonModel
     68 
     69 

/usr/local/lib/python3.9/site-packages/googleapiclient/model.py in <module>
     34 from googleapiclient.errors import HttpError
     35 
---> 36 _LIBRARY_VERSION = pkg_resources.get_distribution("google-api-python-client").version
     37 _PY_VERSION = platform.python_version()
     38 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_distribution(dist)
    464         dist = Requirement.parse(dist)
    465     if isinstance(dist, Requirement):
--> 466         dist = get_provider(dist)
    467     if not isinstance(dist, Distribution):
    468         raise TypeError("Expected string, Requirement, or Distribution", dist)

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
    340     """Return an IResourceProvider for the named module or requirement"""
    341     if isinstance(moduleOrReq, Requirement):
--> 342         return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
    343     try:
    344         module = sys.modules[moduleOrReq]

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in require(self, *requirements)
    884         included, even if they were already activated in this working set.
    885         """
--> 886         needed = self.resolve(parse_requirements(requirements))
    887 
    888         for dist in needed:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
    778 
    779             # push the new requirements onto the stack
--> 780             new_requirements = dist.requires(req.extras)[::-1]
    781             requirements.extend(new_requirements)
    782 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in requires(self, extras)
   2732     def requires(self, extras=()):
   2733         """List of Requirements needed for this distro if `extras` are used"""
-> 2734         dm = self._dep_map
   2735         deps = []
   2736         deps.extend(dm.get(None, ()))

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
   3016             return self.__dep_map
   3017         except AttributeError:
-> 3018             self.__dep_map = self._compute_dependencies()
   3019             return self.__dep_map
   3020 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _compute_dependencies(self)
   3025         reqs = []
   3026         # Including any condition expressions
-> 3027         for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
   3028             reqs.extend(parse_requirements(req))
   3029 

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
   3007             return self._pkg_info
   3008         except AttributeError:
-> 3009             metadata = self.get_metadata(self.PKG_INFO)
   3010             self._pkg_info = email.parser.Parser().parsestr(metadata)
   3011             return self._pkg_info

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_metadata(self, name)
   1405             return ""
   1406         path = self._get_metadata_path(name)
-> 1407         value = self._get(path)
   1408         try:
   1409             return value.decode('utf-8')

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _get(self, path)
   1609 
   1610     def _get(self, path):
-> 1611         with open(path, 'rb') as stream:
   1612             return stream.read()
   1613 

FileNotFoundError: [Errno 2] No such file or directory: '/builder/home/.local/lib/python3.9/site-packages/google_auth-2.3.3.dist-info/METADATA'
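The final frame points at a missing google_auth-2.3.3.dist-info/METADATA file, which usually indicates a partially upgraded package rather than a kfp bug. A hedged mitigation sketch is to force-reinstall google-auth so its dist-info directory is rebuilt before retrying the import (a kernel restart may be needed in between):

# Hedged sketch: repair the broken google-auth installation that pkg_resources
# trips over while resolving google-api-python-client.
! pip3 install --user --force-reinstall google-auth

from kfp.v2.google.client import AIPlatformClient  # noqa: F811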

trainer.save_model("gs://*****/") doesn't save the model in the GCS bucket?

Expected Behavior

The model should be saved to the GCS bucket by the Trainer's save_model call.

Actual Behavior

The model is not saved.

Steps to Reproduce the Problem

train.py inside Custom Container

training_args = tr.TrainingArguments(
    output_dir="gs://****/results_mlm_exp2",
    logging_dir="gs://****/logs_mlm_exp2",  # directory for storing logs
    save_strategy="epoch",
    learning_rate=2e-5,
    logging_steps=2000,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=4,
    prediction_loss_only=True,
    gradient_accumulation_steps=16,
)

trainer = tr.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_data,
)
# print("training to start without bf16")
trainer.train()
trainer.save_model("gs://****/model_mlm_exp2")
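Hugging Face's Trainer.save_model() writes with ordinary filesystem calls, so a gs:// URI is not a path it can write to directly. A common workaround, sketched here with placeholder names and under the assumption that the google-cloud-storage client is available in the container, is to save locally and then copy the artifacts to the bucket:

import os

from google.cloud import storage

# Hedged sketch: save to a local directory, then upload every file to GCS.
# LOCAL_DIR, BUCKET_NAME and DEST_PREFIX are placeholders, not values from the issue.
LOCAL_DIR = "/tmp/model_mlm_exp2"
BUCKET_NAME = "your-bucket"
DEST_PREFIX = "model_mlm_exp2"

trainer.save_model(LOCAL_DIR)

bucket = storage.Client().bucket(BUCKET_NAME)
for root, _, files in os.walk(LOCAL_DIR):
    for name in files:
        local_path = os.path.join(root, name)
        blob_name = os.path.join(DEST_PREFIX, os.path.relpath(local_path, LOCAL_DIR))
        bucket.blob(blob_name).upload_from_filename(local_path)

On Vertex AI custom training, writing under the /gcs/<bucket> Cloud Storage FUSE mount is another option.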



"Failed to parse the container spec json payload to requested prototype" within CustomTrainingJobOp

Expected Behavior

  1. Submit a custom training job op within a VPC peering and associated reserved ip ranges with pipeline params passed as args.
  2. Component runs successfully

Actual Behavior

  1. The compiler fails, complaining that the PipelineParam is not JSON serializable:
    TypeError: Object of type PipelineParam is not JSON serializable, as seen here. For a parameter being passed to a training operation, this really doesn't make any sense.

If all pipeline params are removed before compilation, the Vertex component fails with the following error.

The full redacted object dump is here.

{
    "display_name": "SOMEOP",
    "job_spec": {
        "worker_pool_specs": [
            {
                "containerSpec": {
                    "args": [
                        "-A",
                        "AAAA",
                        "-B",
                        "BBBB",
                        "-C",
                        "CCCCC",
                        "-D",
                        "SOME_GS_URL"
                    ],
                    "env": [
                        {
                            "name": "AIP_MODEL_DIR",
                            "value": "SOME_GS_URL"
                        }
                    ],
                    "imageUri": "SOME_CONTAINER_IMAGE"
                },
                "replicaCount": "1",
                "machineSpec": {
                    "machineType": "n1-standard-8"
                }
            }
        ],
        "scheduling": {
            "timeout": "15m",
            "restart_job_on_worker_restart": "false"
        },
        "service_account": "[email protected]",
        "tensorboard": "TENSORBOARD_ID",
        "enable_web_access": "false",
        "network": "NETWORK_ID",
        "reserved_ip_ranges": [
            "google-reserved-range"
        ],
        "base_output_directory": {
            "output_uri_prefix": "SOME_GS_URL"
        }
    },
    "labels": {},
    "encryption_spec": {
        "kms_key_name": ""
    }
}

This is despite my following the guide described here, which seems a little outdated in places. Any help would be greatly appreciated. Cheers!

Steps to Reproduce the Problem

  1. google-cloud-pipeline-components = ^1.0.1
  2. kfp ^1.8.11
  3. Compile pipeline and upload to vertex
  4. Training component fails

Specifications

  • Version: 1.0.1
  • Platform: Vertex AI on GCP
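For reference, a minimal invocation sketch, assuming google-cloud-pipeline-components 1.x and the KFP v2 SDK (the display name, image URI and args below are placeholders from the redacted dump): static values inside worker_pool_specs serialize cleanly, while embedding pipeline parameters in that nested dict is the usual trigger for the JSON-serialization error above.

from kfp.v2 import dsl

from google_cloud_pipeline_components.v1.custom_job import CustomTrainingJobOp


@dsl.pipeline(name="custom-training-demo")
def pipeline(project: str = "MY_PROJECT", location: str = "us-central1"):
    # Static strings here serialize cleanly; nesting dsl pipeline parameters inside
    # this dict raises "Object of type PipelineParam is not JSON serializable".
    worker_pool_specs = [
        {
            "machine_spec": {"machine_type": "n1-standard-8"},
            "replica_count": 1,
            "container_spec": {
                "image_uri": "SOME_CONTAINER_IMAGE",
                "args": ["-A", "AAAA", "-B", "BBBB"],
            },
        }
    ]
    CustomTrainingJobOp(
        project=project,
        location=location,
        display_name="SOMEOP",
        worker_pool_specs=worker_pool_specs,
    )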

[Policy Bot] found one or more issues with this repository.

Policy Bot found one or more issues with this repository.

  • Default branch is 'main'
  • Branch protection is enabled
  • Renovate bot is enabled
  • Merge commits disabled
  • There is a CODEOWNERS file
  • There is a valid LICENSE.md
  • There is a CODE_OF_CONDUCT.md
  • There is a CONTRIBUTING.md
  • There is a SECURITY.md

Custom model deployed with a docker container but requests are not working as expected

Context

I have been using the Ludwig AI library to create TensorFlow models. The library includes a serve tool to serve a model via HTTP, much like PyTorch's TorchServe.

I'm attempting to use a model trained with Ludwig and serve it with Ludwig serve from a custom Docker container, deployed as a custom model in Vertex AI and attached to an endpoint.

I've described the context in more detail in a discussion in the Ludwig GitHub repository.

Expected Behavior

Using the following JSON format for a request:

{
  "instances": [
    {
      "textfeature": "Words to be classified"
    }
  ]
}

The endpoint should return a JSON object with predictions from Ludwig serve.

Actual Behavior

I get an error from Ludwig serve:

{"error":"entry must contain all input features"}

Steps to Reproduce the Problem

This is a bit tricky, but I will explain at a high level. If need be, I can provide a complete notebook with the whole procedure.

  1. Train a text classification model with ludwig.
  2. Create a custom docker container to use Ludwig serve and the trained model.
  3. Push the container image to Container Registry.
  4. Deploy a custom model.
  5. Attach the model to an endpoint.

Any ideas?
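One hedged explanation (not confirmed in this thread) is that Vertex AI forwards the whole {"instances": [...]} envelope to the serving container, while Ludwig serve expects the input features as top-level form fields, which would produce exactly the "entry must contain all input features" error. A thin adapter in front of Ludwig serve is a common pattern; the route, port and URL below are hypothetical:

# Hedged sketch: unwrap Vertex AI's {"instances": [...]} request envelope and
# forward each instance's fields to a locally running Ludwig serve process.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
LUDWIG_SERVE_URL = "http://127.0.0.1:8000/predict"  # hypothetical local address

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    predictions = []
    for instance in payload.get("instances", []):
        # Ludwig serve expects feature values as individual form fields.
        resp = requests.post(LUDWIG_SERVE_URL, data=instance)
        predictions.append(resp.json())
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)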

Regression test failures: Explainability notebooks have GAPIC API retries that are too short

Build log: https://pantheon2.corp.google.com/cloud-build/builds;region=global/3d191399-2605-44d7-9422-c6f5768e10e2?project=python-docs-samples-tests

  • gapic-custom_image_classification_batch_explain.ipynb
  • gapic-custom_image_classification_online_explain.ipynb
  • gapic-custom_tabular_regression_batch_explain.ipynb
  • gapic-custom_tabular_regression_online_explain.ipynb

Likely all are failing at the model upload step.

Check official CODEOWNERS for references to non-existent Github handles

notebooks/community/CODEOWNERS is referencing internal usernames instead of Github handles.

CODEOWNERS errors
Unknown owner on line 6: make sure @aferlitsch exists and has write access to the repository
/sdk/sdk_* @aferlitsch
Unknown owner on line 7: make sure @aferlitsch exists and has write access to the repository
/gapic @aferlitsch
Unknown owner on line 8: make sure @aferlitsch exists and has write access to the repository
/ml_ops @aferlitsch
Unknown owner on line 9: make sure @mco exists and has write access to the repository
/model_monitoring/* @mco
Unknown owner on line 12: make sure @notebooks-team exists and has write access to the repository
/managed_notebooks/ @notebooks-team
Unknown owner on line 15: make sure @thehardikv exists and has write access to the repository
…_Model_Training_Example.ipynb @thehardikv
Unknown owner on line 16: make sure @thehardikv exists and has write access to the repository
…ting_evaluating_a_model.ipynb @thehardikv
Unknown owner on line 19: make sure @benofben exists and has write access to the repository
/neo4j @benofben @htappen
Unknown owner on line 19: make sure @htappen exists and has write access to the repository
/neo4j @benofben @htappen
Unknown owner on line 21: make sure @wattli exists and has write access to the repository
/tensorboard @yfang1 @wattli
Unknown owner on line 22: make sure @inardini exists and has write access to the repository
…store @nayaknishant @morgandu @inardini
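A hedged fix sketch for the CODEOWNERS entries, mapping the internal aliases to GitHub handles that do appear in this repository's contributor list (the exact mapping still needs to be confirmed by the owners):

# Hypothetical CODEOWNERS entries: every owner must be an existing GitHub handle
# with write access, e.g. @aferlitsch -> @andrewferlitsch, @mco -> @mco-gh.
/sdk/sdk_*          @andrewferlitsch
/gapic              @andrewferlitsch
/ml_ops             @andrewferlitsch
/model_monitoring/* @mco-gh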

intro-swivel.ipynb fails on TF import

Logs: https://pantheon.corp.google.com/cloud-build/builds/ebea2843-3345-4a7e-a4e4-5c625613d42d?project=python-docs-samples-tests

Error in intro-swivel.ipynb when importing tensorflow:

Step #3: Traceback (most recent call last):
Step #3:   File "/workspace/.cloud-build/execute_notebook_cli.py", line 34, in <module>
Step #3:     ExecuteNotebook.execute_notebook(
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 84, in execute_notebook
Step #3:     raise execution_exception
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 53, in execute_notebook
Step #3:     pm.execute_notebook(
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 122, in execute_notebook
Step #3:     raise_for_execution_errors(nb, output_path)
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
Step #3:     raise error
Step #3: papermill.exceptions.PapermillExecutionError: 
Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [14]":
Step #3: ---------------------------------------------------------------------------
Step #3: AttributeError                            Traceback (most recent call last)
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
Step #3:    3015         try:
Step #3: -> 3016             return self.__dep_map
Step #3:    3017         except AttributeError:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
Step #3:    2812         if attr.startswith('_'):
Step #3: -> 2813             raise AttributeError(attr)
Step #3:    2814         return getattr(self._provider, attr)
Step #3: 
Step #3: AttributeError: _DistInfoDistribution__dep_map
Step #3: 
Step #3: During handling of the above exception, another exception occurred:
Step #3: 
Step #3: AttributeError                            Traceback (most recent call last)
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
Step #3:    3006         try:
Step #3: -> 3007             return self._pkg_info
Step #3:    3008         except AttributeError:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in __getattr__(self, attr)
Step #3:    2812         if attr.startswith('_'):
Step #3: -> 2813             raise AttributeError(attr)
Step #3:    2814         return getattr(self._provider, attr)
Step #3: 
Step #3: AttributeError: _pkg_info
Step #3: 
Step #3: During handling of the above exception, another exception occurred:
Step #3: 
Step #3: FileNotFoundError                         Traceback (most recent call last)
Step #3: /tmp/ipykernel_15/2420721388.py in <module>
Step #3:       1 import pandas as pd
Step #3: ----> 2 import tensorflow as tf
Step #3:       3 from google.cloud import aiplatform
Step #3:       4 from kfp.v2.google import client
Step #3:       5 from sklearn.metrics.pairwise import cosine_similarity
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/__init__.py in <module>
Step #3:      39 import sys as _sys
Step #3:      40 
Step #3: ---> 41 from tensorflow.python.tools import module_util as _module_util
Step #3:      42 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
Step #3:      43 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/__init__.py in <module>
Step #3:      47 from tensorflow.python import distribute
Step #3:      48 # from tensorflow.python import keras
Step #3: ---> 49 from tensorflow.python.feature_column import feature_column_lib as feature_column
Step #3:      50 # from tensorflow.python.layers import layers
Step #3:      51 from tensorflow.python.module import module
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/feature_column/feature_column_lib.py in <module>
Step #3:      20 
Step #3:      21 # pylint: disable=unused-import,line-too-long,wildcard-import,g-bad-import-order
Step #3: ---> 22 from tensorflow.python.feature_column.feature_column import *
Step #3:      23 from tensorflow.python.feature_column.feature_column_v2 import *
Step #3:      24 from tensorflow.python.feature_column.sequence_feature_column import *
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/feature_column/feature_column.py in <module>
Step #3:     145 from tensorflow.python.framework import sparse_tensor as sparse_tensor_lib
Step #3:     146 from tensorflow.python.framework import tensor_shape
Step #3: --> 147 from tensorflow.python.layers import base
Step #3:     148 from tensorflow.python.ops import array_ops
Step #3:     149 from tensorflow.python.ops import check_ops
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/layers/base.py in <module>
Step #3:      18 from __future__ import print_function
Step #3:      19 
Step #3: ---> 20 from tensorflow.python.keras.legacy_tf_layers import base
Step #3:      21 
Step #3:      22 InputSpec = base.InputSpec
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/__init__.py in <module>
Step #3:      23 
Step #3:      24 # See b/110718070#comment18 for more details about this import.
Step #3: ---> 25 from tensorflow.python.keras import models
Step #3:      26 
Step #3:      27 from tensorflow.python.keras.engine.input_layer import Input
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/models.py in <module>
Step #3:      18 from tensorflow.python.framework import ops
Step #3:      19 from tensorflow.python.keras import backend
Step #3: ---> 20 from tensorflow.python.keras import metrics as metrics_module
Step #3:      21 from tensorflow.python.keras import optimizer_v1
Step #3:      22 from tensorflow.python.keras.engine import functional
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/metrics.py in <module>
Step #3:      32 from tensorflow.python.framework import ops
Step #3:      33 from tensorflow.python.framework import tensor_shape
Step #3: ---> 34 from tensorflow.python.keras import activations
Step #3:      35 from tensorflow.python.keras import backend
Step #3:      36 from tensorflow.python.keras.engine import base_layer
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/activations.py in <module>
Step #3:      16 
Step #3:      17 from tensorflow.python.keras import backend
Step #3: ---> 18 from tensorflow.python.keras.layers import advanced_activations
Step #3:      19 from tensorflow.python.keras.utils.generic_utils import deserialize_keras_object
Step #3:      20 from tensorflow.python.keras.utils.generic_utils import serialize_keras_object
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/layers/__init__.py in <module>
Step #3:      20 # pylint: disable=g-bad-import-order
Step #3:      21 # pylint: disable=g-import-not-at-top
Step #3: ---> 22 from tensorflow.python.keras.engine.input_layer import Input
Step #3:      23 from tensorflow.python.keras.engine.input_layer import InputLayer
Step #3:      24 from tensorflow.python.keras.engine.input_spec import InputSpec
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/engine/input_layer.py in <module>
Step #3:      22 from tensorflow.python.keras import backend
Step #3:      23 from tensorflow.python.keras.distribute import distributed_training_utils
Step #3: ---> 24 from tensorflow.python.keras.engine import base_layer
Step #3:      25 from tensorflow.python.keras.engine import keras_tensor
Step #3:      26 from tensorflow.python.keras.engine import node as node_module
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py in <module>
Step #3:      46 from tensorflow.python.keras import initializers
Step #3:      47 from tensorflow.python.keras import regularizers
Step #3: ---> 48 from tensorflow.python.keras.engine import base_layer_utils
Step #3:      49 from tensorflow.python.keras.engine import input_spec
Step #3:      50 from tensorflow.python.keras.engine import keras_tensor
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer_utils.py in <module>
Step #3:      29 from tensorflow.python.keras.utils import control_flow_util
Step #3:      30 from tensorflow.python.keras.utils import tf_inspect
Step #3: ---> 31 from tensorflow.python.keras.utils import tf_utils
Step #3:      32 from tensorflow.python.ops import array_ops
Step #3:      33 from tensorflow.python.ops import variables as tf_variables
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py in <module>
Step #3:      20 
Step #3:      21 from tensorflow.python.data.experimental.ops import cardinality
Step #3: ---> 22 from tensorflow.python.distribute.coordinator import cluster_coordinator as coordinator_lib
Step #3:      23 from tensorflow.python.eager import context
Step #3:      24 from tensorflow.python.framework import composite_tensor
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/coordinator/cluster_coordinator.py in <module>
Step #3:      32 from six.moves import queue
Step #3:      33 
Step #3: ---> 34 from tensorflow.python.distribute import parameter_server_strategy_v2
Step #3:      35 from tensorflow.python.distribute.coordinator import coordinator_context
Step #3:      36 from tensorflow.python.distribute.coordinator import metric_utils
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy_v2.py in <module>
Step #3:      32 from tensorflow.python.distribute import mirrored_run
Step #3:      33 from tensorflow.python.distribute import multi_worker_util
Step #3: ---> 34 from tensorflow.python.distribute import parameter_server_strategy
Step #3:      35 from tensorflow.python.distribute import ps_values
Step #3:      36 from tensorflow.python.distribute import sharded_variable
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy.py in <module>
Step #3:      32 from tensorflow.python.distribute import ps_values
Step #3:      33 from tensorflow.python.distribute import values
Step #3: ---> 34 from tensorflow.python.distribute.cluster_resolver import SimpleClusterResolver
Step #3:      35 from tensorflow.python.distribute.cluster_resolver import TFConfigClusterResolver
Step #3:      36 from tensorflow.python.eager import context
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/__init__.py in <module>
Step #3:      29 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import SimpleClusterResolver
Step #3:      30 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import UnionClusterResolver
Step #3: ---> 31 from tensorflow.python.distribute.cluster_resolver.gce_cluster_resolver import GCEClusterResolver
Step #3:      32 from tensorflow.python.distribute.cluster_resolver.kubernetes_cluster_resolver import KubernetesClusterResolver
Step #3:      33 from tensorflow.python.distribute.cluster_resolver.slurm_cluster_resolver import SlurmClusterResolver
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/gce_cluster_resolver.py in <module>
Step #3:      26 _GOOGLE_API_CLIENT_INSTALLED = True
Step #3:      27 try:
Step #3: ---> 28   from googleapiclient import discovery  # pylint: disable=g-import-not-at-top
Step #3:      29   from oauth2client.client import GoogleCredentials  # pylint: disable=g-import-not-at-top
Step #3:      30 except ImportError:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/googleapiclient/discovery.py in <module>
Step #3:      66 from googleapiclient.errors import UnknownApiNameOrVersion
Step #3:      67 from googleapiclient.errors import UnknownFileType
Step #3: ---> 68 from googleapiclient.http import build_http
Step #3:      69 from googleapiclient.http import BatchHttpRequest
Step #3:      70 from googleapiclient.http import HttpMock
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/googleapiclient/http.py in <module>
Step #3:      65 from googleapiclient.errors import UnexpectedBodyError
Step #3:      66 from googleapiclient.errors import UnexpectedMethodError
Step #3: ---> 67 from googleapiclient.model import JsonModel
Step #3:      68 
Step #3:      69 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/googleapiclient/model.py in <module>
Step #3:      34 from googleapiclient.errors import HttpError
Step #3:      35 
Step #3: ---> 36 _LIBRARY_VERSION = pkg_resources.get_distribution("google-api-python-client").version
Step #3:      37 _PY_VERSION = platform.python_version()
Step #3:      38 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_distribution(dist)
Step #3:     464         dist = Requirement.parse(dist)
Step #3:     465     if isinstance(dist, Requirement):
Step #3: --> 466         dist = get_provider(dist)
Step #3:     467     if not isinstance(dist, Distribution):
Step #3:     468         raise TypeError("Expected string, Requirement, or Distribution", dist)
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
Step #3:     340     """Return an IResourceProvider for the named module or requirement"""
Step #3:     341     if isinstance(moduleOrReq, Requirement):
Step #3: --> 342         return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
Step #3:     343     try:
Step #3:     344         module = sys.modules[moduleOrReq]
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in require(self, *requirements)
Step #3:     884         included, even if they were already activated in this working set.
Step #3:     885         """
Step #3: --> 886         needed = self.resolve(parse_requirements(requirements))
Step #3:     887 
Step #3:     888         for dist in needed:
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
Step #3:     778 
Step #3:     779             # push the new requirements onto the stack
Step #3: --> 780             new_requirements = dist.requires(req.extras)[::-1]
Step #3:     781             requirements.extend(new_requirements)
Step #3:     782 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in requires(self, extras)
Step #3:    2732     def requires(self, extras=()):
Step #3:    2733         """List of Requirements needed for this distro if `extras` are used"""
Step #3: -> 2734         dm = self._dep_map
Step #3:    2735         deps = []
Step #3:    2736         deps.extend(dm.get(None, ()))
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _dep_map(self)
Step #3:    3016             return self.__dep_map
Step #3:    3017         except AttributeError:
Step #3: -> 3018             self.__dep_map = self._compute_dependencies()
Step #3:    3019             return self.__dep_map
Step #3:    3020 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _compute_dependencies(self)
Step #3:    3025         reqs = []
Step #3:    3026         # Including any condition expressions
Step #3: -> 3027         for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
Step #3:    3028             reqs.extend(parse_requirements(req))
Step #3:    3029 
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
Step #3:    3007             return self._pkg_info
Step #3:    3008         except AttributeError:
Step #3: -> 3009             metadata = self.get_metadata(self.PKG_INFO)
Step #3:    3010             self._pkg_info = email.parser.Parser().parsestr(metadata)
Step #3:    3011             return self._pkg_info
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_metadata(self, name)
Step #3:    1405             return ""
Step #3:    1406         path = self._get_metadata_path(name)
Step #3: -> 1407         value = self._get(path)
Step #3:    1408         try:
Step #3:    1409             return value.decode('utf-8')
Step #3: 
Step #3: /usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in _get(self, path)
Step #3:    1609 
Step #3:    1610     def _get(self, path):
Step #3: -> 1611         with open(path, 'rb') as stream:
Step #3:    1612             return stream.read()
Step #3:    1613 
Step #3: 
Step #3: FileNotFoundError: [Errno 2] No such file or directory: '/builder/home/.local/lib/python3.9/site-packages/google_auth-2.3.3.dist-info/METADATA'


Having trouble in running Two-Tower built-in algorithm

Hello, I was trying to run this code and to understand the input schema, which seems like an invalid JSON schema format.
I ran the code with this schema and got an error.
I am unsure what the valid input schema format is here.
Here is the error, downloaded from the logs.

 {
    "textPayload": "The replica workerpool0-0 exited with a non-zero status of 1. To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=1004927726779&resource=ml_job%2Fjob_id%2F1464827664639459328&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%221464827664639459328%22",
    "insertId": "1calpzlc593",
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "service",
        "project_id": "trell-staging",
        "job_id": "1464827664639459328"
      }
    },
    "timestamp": "2022-01-18T11:16:31.582429920Z",
    "severity": "ERROR",
    "labels": {
      "ml.googleapis.com/endpoint": ""
    },
    "logName": "projects/trell-staging/logs/ml.googleapis.com%2F1464827664639459328",
    "receiveTimestamp": "2022-01-18T11:16:32.678545724Z"
  },
  {
    "insertId": "t523mkflys0mw",
    "jsonPayload": {
      "message": "json.decoder.JSONDecodeError: Extra data: line 1 column 8 (char 7)\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582601640Z",
    "severity": "ERROR",
    "labels": {
      "ml.googleapis.com/trial_type": "",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_id": "439373723071094359"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mv",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "    raise JSONDecodeError(\"Extra data\", s, end)\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "workerpool0-0",
        "project_id": "trell-staging",
        "job_id": "1464827664639459328"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582595651Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_type": "",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_id": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mu",
    "jsonPayload": {
      "message": "  File \"/usr/lib/python3.8/json/decoder.py\", line 340, in decode\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "project_id": "trell-staging",
        "task_name": "workerpool0-0"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582584385Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/zone": "us-central1-c"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mt",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "    return _default_decoder.decode(s)\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "project_id": "trell-staging",
        "task_name": "workerpool0-0"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582578628Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0ms",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "  File \"/usr/lib/python3.8/json/__init__.py\", line 357, in loads\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "project_id": "trell-staging",
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582572352Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mr",
    "jsonPayload": {
      "message": "    return loads(fp.read(),\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "workerpool0-0",
        "job_id": "1464827664639459328",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582566745Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_type": "",
      "ml.googleapis.com/trial_id": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mq",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "  File \"/usr/lib/python3.8/json/__init__.py\", line 293, in load\n"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "task_name": "workerpool0-0",
        "project_id": "trell-staging",
        "job_id": "1464827664639459328"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582560508Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_type": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mp",
    "jsonPayload": {
      "message": "    input_schema = json.load(tf.io.gfile.GFile(args.input_schema_path))\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582554286Z",
    "severity": "ERROR",
    "labels": {
      "ml.googleapis.com/job_id/log_area": "root",
      "ml.googleapis.com/trial_id": "",
      "ml.googleapis.com/trial_type": "",
      "compute.googleapis.com/resource_id": "439373723071094359",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "compute.googleapis.com/zone": "us-central1-c"
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mo",
    "jsonPayload": {
      "message": "  File \"/root/two_tower/task.py\", line 89, in main\n",
      "levelname": "ERROR"
    },
    "resource": {
      "type": "ml_job",
      "labels": {
        "job_id": "1464827664639459328",
        "task_name": "workerpool0-0",
        "project_id": "trell-staging"
      }
    },
    "timestamp": "2022-01-18T11:16:10.582548477Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_id": "439373723071094359",
      "ml.googleapis.com/trial_id": "",
      "compute.googleapis.com/zone": "us-central1-c",
      "ml.googleapis.com/job_id/log_area": "root",
      "compute.googleapis.com/resource_name": "cmle-training-5612951348054016528",
      "ml.googleapis.com/trial_type": ""
    },
    "logName": "projects/trell-staging/logs/workerpool0-0",
    "receiveTimestamp": "2022-01-18T11:16:42.042793190Z"
  },
  {
    "insertId": "t523mkflys0mn",
    "jsonPayload": {
      "levelname": "ERROR",
      "message": "    main()\n"
    }

Build failure: Model deploy failure on google_cloud_pipeline_components_automl_tabular.ipynb

Investigate whether this is a one-off deployment error or whether something in this notebook needs to be fixed.

Notebook

Build details

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_19/3131610175.py in <module>
      7 )
      8 
----> 9 job.run()
     10 
     11 get_ipython().system(' rm tabular_regression_pipeline.json')

~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py in wrapper(*args, **kwargs)
    673                 if self:
    674                     VertexAiResourceNounWithFutureManager.wait(self)
--> 675                 return method(*args, **kwargs)
    676 
    677             # callbacks to call within the Future (in same Thread)

~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in run(self, service_account, network, sync)
    250         self.submit(service_account=service_account, network=network)
    251 
--> 252         self._block_until_complete()
    253 
    254     def submit(

~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in _block_until_complete(self)
    347         # JOB_STATE_FAILED or JOB_STATE_CANCELLED.
    348         if self._gca_resource.state in _PIPELINE_ERROR_STATES:
--> 349             raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
    350         else:
    351             _LOGGER.log_action_completed_against_resource("run", "completed", self)

RuntimeError: Job failed with:
code: 9
message: "The DAG failed because some tasks failed. The failed tasks are: [model-deploy].; Job (project_id = python-docs-samples-tests, job_id = ...) is failed due to the above error.; Failed to handle the job: {project_number = ..., job_id = ...}"

intro-swivel downloads external code

It looks like the https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/matching_engine/intro-swivel.ipynb notebook downloads code from a Cloud Bucket

gsutil cp gs://cloud-samples-data/vertex-ai/matching-engine/swivel/pipeline/* .

It isn't standard practice to download external code to run: that code might not be tested or reviewed and could change at any time.
Notebooks in this repo must be self-contained, with all code inside the notebook.

Preferred solution: Rewrite the .sh files as code inside the notebook.
Alternative solution: Move your notebook to "community-content"

No module named 'google.cloud.aiplatform' in sdk_custom_image_classification_online_explain.ipynb

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

  1. Cloned this git repository into a Managed Notebooks Environment in Vertex AI
  2. Tried running the notebook sample sdk_custom_image_classification_online_explain.ipynb
  3. The cell "import google.cloud.aiplatform as aip" errors out with ModuleNotFoundError: No module named 'google.cloud.aiplatform'.
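A minimal fix sketch, assuming the managed notebook kernel simply lacks the Vertex AI SDK, is to install it from the notebook before the import (a kernel restart may be needed afterwards):

# Hedged sketch: install the Vertex AI SDK into the notebook kernel,
# then retry the failing import.
! pip3 install --user --quiet google-cloud-aiplatform

import google.cloud.aiplatform as aip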

Specifications

  • Version:
  • Platform:

ModelUploadOp from "Vertex AI Pipelines: model upload using google-cloud-pipeline-components" does not work

Expected Behavior

The code example from "Vertex AI Pipelines: model train, upload, and deploy using google-cloud-pipeline-components" should work as intended.

Actual Behavior

The code example below, from "Vertex AI Pipelines: model train, upload, and deploy using google-cloud-pipeline-components", has an issue and does not work:

from google_cloud_pipeline_components import aiplatform as gcc_aip
from google.cloud import aiplatform

aiplatform.init(project=project, location=region)

# THIS IS THE METHOD THAT DOESN'T APPEAR TO WORK
model_upload_op = gcc_aip.ModelUploadOp(
    project=project,
    location=region,
    display_name=model_display_name,
    artifact_uri=model.uri,
    serving_container_image_uri=serving_container_image_uri,
)

On the other hand, the method below worked:

# THIS METHOD DOES WORK
# aiplatform.Model.upload(
#     display_name=model_display_name,
#     artifact_uri=model.uri,
#     serving_container_image_uri=serving_container_image_uri,
# )

I'm using Vertex AI Pipelines to train a model and upload it to Vertex AI. In the pipeline, I'm attempting to use the ModelUploadOp component to upload a custom model to Vertex AI Models. The logs show the job succeeding, but the model never actually gets uploaded.
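
One way to check whether the upload actually happened (a debugging sketch, not a confirmed fix; project, region, and model_display_name are the same placeholders used above) is to list models by display name after the pipeline run:

from google.cloud import aiplatform

aiplatform.init(project=project, location=region)

# If ModelUploadOp really succeeded, the uploaded model should appear here.
for m in aiplatform.Model.list(filter=f'display_name="{model_display_name}"'):
    print(m.resource_name, m.create_time)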

Steps to Reproduce the Problem

Specifications

Version:

  • Pipeline SDK (Kubeflow Pipelines/TFX) Version: kfp
  • Pipelines Version: kfp==1.8.11
  • Platform: Google Cloud Vertex AI

custom_model_training_and_batch_prediction: Failed at tensorflow import

Command

import tensorflow as tf

Error


ContextualVersionConflict Traceback (most recent call last)
/tmp/ipykernel_20/3384745563.py in
----> 1 import tensorflow as tf
2 from google_cloud_pipeline_components import aiplatform as gcc_aip
3 from google_cloud_pipeline_components.experimental.custom_job import utils
4 from kfp.v2 import compiler, dsl
5 from kfp.v2.dsl import component

~/.local/lib/python3.9/site-packages/tensorflow/__init__.py in
39 import sys as _sys
40
---> 41 from tensorflow.python.tools import module_util as _module_util
42 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
43

~/.local/lib/python3.9/site-packages/tensorflow/python/__init__.py in
46 from tensorflow.python import data
47 from tensorflow.python import distribute
---> 48 from tensorflow.python import keras
49 from tensorflow.python.feature_column import feature_column_lib as feature_column
50 from tensorflow.python.layers import layers

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/__init__.py in
23
24 # See b/110718070#comment18 for more details about this import.
---> 25 from tensorflow.python.keras import models
26
27 from tensorflow.python.keras.engine.input_layer import Input

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/models.py in
18 from tensorflow.python.framework import ops
19 from tensorflow.python.keras import backend
---> 20 from tensorflow.python.keras import metrics as metrics_module
21 from tensorflow.python.keras import optimizer_v1
22 from tensorflow.python.keras.engine import functional

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/metrics.py in
35 from tensorflow.python.framework import tensor_shape
36 from tensorflow.python.framework import tensor_spec
---> 37 from tensorflow.python.keras import activations
38 from tensorflow.python.keras import backend
39 from tensorflow.python.keras.engine import base_layer

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/activations.py in
16
17 from tensorflow.python.keras import backend
---> 18 from tensorflow.python.keras.layers import advanced_activations
19 from tensorflow.python.keras.utils.generic_utils import deserialize_keras_object
20 from tensorflow.python.keras.utils.generic_utils import serialize_keras_object

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/layers/__init__.py in
20 # pylint: disable=g-bad-import-order
21 # pylint: disable=g-import-not-at-top
---> 22 from tensorflow.python.keras.engine.input_layer import Input
23 from tensorflow.python.keras.engine.input_layer import InputLayer
24 from tensorflow.python.keras.engine.input_spec import InputSpec

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/engine/input_layer.py in
22 from tensorflow.python.keras import backend
23 from tensorflow.python.keras.distribute import distributed_training_utils
---> 24 from tensorflow.python.keras.engine import base_layer
25 from tensorflow.python.keras.engine import keras_tensor
26 from tensorflow.python.keras.engine import node as node_module

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py in
47 from tensorflow.python.keras import initializers
48 from tensorflow.python.keras import regularizers
---> 49 from tensorflow.python.keras.engine import base_layer_utils
50 from tensorflow.python.keras.engine import input_spec
51 from tensorflow.python.keras.engine import keras_tensor

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer_utils.py in
29 from tensorflow.python.keras.utils import control_flow_util
30 from tensorflow.python.keras.utils import tf_inspect
---> 31 from tensorflow.python.keras.utils import tf_utils
32 from tensorflow.python.ops import array_ops
33 from tensorflow.python.ops import variables as tf_variables

~/.local/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py in
20
21 from tensorflow.python.data.experimental.ops import cardinality
---> 22 from tensorflow.python.distribute.coordinator import cluster_coordinator as coordinator_lib
23 from tensorflow.python.eager import context
24 from tensorflow.python.framework import composite_tensor

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/coordinator/cluster_coordinator.py in
34
35 from tensorflow.python.distribute import input_lib
---> 36 from tensorflow.python.distribute import parameter_server_strategy_v2
37 from tensorflow.python.distribute.coordinator import metric_utils
38 from tensorflow.python.eager import cancellation

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy_v2.py in
31 from tensorflow.python.distribute import mirrored_run
32 from tensorflow.python.distribute import multi_worker_util
---> 33 from tensorflow.python.distribute import parameter_server_strategy
34 from tensorflow.python.distribute import sharded_variable
35 from tensorflow.python.distribute import values

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/parameter_server_strategy.py in
32 from tensorflow.python.distribute import ps_values
33 from tensorflow.python.distribute import values
---> 34 from tensorflow.python.distribute.cluster_resolver import SimpleClusterResolver
35 from tensorflow.python.distribute.cluster_resolver import TFConfigClusterResolver
36 from tensorflow.python.eager import context

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/__init__.py in
29 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import SimpleClusterResolver
30 from tensorflow.python.distribute.cluster_resolver.cluster_resolver import UnionClusterResolver
---> 31 from tensorflow.python.distribute.cluster_resolver.gce_cluster_resolver import GCEClusterResolver
32 from tensorflow.python.distribute.cluster_resolver.kubernetes_cluster_resolver import KubernetesClusterResolver
33 from tensorflow.python.distribute.cluster_resolver.slurm_cluster_resolver import SlurmClusterResolver

~/.local/lib/python3.9/site-packages/tensorflow/python/distribute/cluster_resolver/gce_cluster_resolver.py in
26 _GOOGLE_API_CLIENT_INSTALLED = True
27 try:
---> 28 from googleapiclient import discovery # pylint: disable=g-import-not-at-top
29 from oauth2client.client import GoogleCredentials # pylint: disable=g-import-not-at-top
30 except ImportError:

~/.local/lib/python3.9/site-packages/googleapiclient/discovery.py in
66 from googleapiclient.errors import UnknownApiNameOrVersion
67 from googleapiclient.errors import UnknownFileType
---> 68 from googleapiclient.http import build_http
69 from googleapiclient.http import BatchHttpRequest
70 from googleapiclient.http import HttpMock

~/.local/lib/python3.9/site-packages/googleapiclient/http.py in
65 from googleapiclient.errors import UnexpectedBodyError
66 from googleapiclient.errors import UnexpectedMethodError
---> 67 from googleapiclient.model import JsonModel
68
69

~/.local/lib/python3.9/site-packages/googleapiclient/model.py in
34 from googleapiclient.errors import HttpError
35
---> 36 _LIBRARY_VERSION = pkg_resources.get_distribution("google-api-python-client").version
37 _PY_VERSION = platform.python_version()
38

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_distribution(dist)
464 dist = Requirement.parse(dist)
465 if isinstance(dist, Requirement):
--> 466 dist = get_provider(dist)
467 if not isinstance(dist, Distribution):
468 raise TypeError("Expected string, Requirement, or Distribution", dist)

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
340 """Return an IResourceProvider for the named module or requirement"""
341 if isinstance(moduleOrReq, Requirement):
--> 342 return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
343 try:
344 module = sys.modules[moduleOrReq]

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in require(self, *requirements)
884 included, even if they were already activated in this working set.
885 """
--> 886 needed = self.resolve(parse_requirements(requirements))
887
888 for dist in needed:

/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
775 # Oops, the "best" so far conflicts with a dependency
776 dependent_req = required_by[req]
--> 777 raise VersionConflict(dist, req).with_context(dependent_req)
778
779 # push the new requirements onto the stack

ContextualVersionConflict: (google-api-core 2.3.2 (/builder/home/.local/lib/python3.9/site-packages), Requirement.parse('google-api-core<2dev,>=1.21.0'), {'google-api-python-client'})
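
A workaround that typically resolves this class of conflict (stated as an assumption, since the exact package set in the build image isn't pinned in this report) is to make google-api-core and google-api-python-client mutually compatible and restart the kernel before importing TensorFlow:

# Option A (assumption): pin google-api-core to the range google-api-python-client expects.
! pip3 install --user "google-api-core>=1.21.0,<2.0.0"

# Option B (assumption): upgrade google-api-python-client to a release that accepts google-api-core 2.x.
! pip3 install --user --upgrade google-api-python-client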

Can't cancel LRO for batch ingestion after deleting entity type / featurestore

Expected Behavior

Feature Store Batch Ingestion: I would expect to be able to cancel a batch ingestion job, either through the web UI or through the REST API.

Actual Behavior

I tried to cancel the operations through the REST API and got the response:

{
  "error": {
    "code": 400,
    "message": "Operation projects/<PROJECT_ID>/locations/us-central1/operations/<OPERATION_ID> is not cancellable.",
    "status": "FAILED_PRECONDITION"
  }
}

I tried deleting the entity type and then the featurestore to force the ingestion job to stop, but it kept running. The only way I could get the LRO to stop was to delete the featurestore entirely, at which point it fails with the following error:

Online serving currently unavailable, Please retry the request shortly or reach out to support upon continued failure.

Steps to Reproduce the Problem

  1. Create a featurestore and entity
  2. start an LRO for batch ingest
  3. Try and cancel it with REST API
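
For reference, step 3 was attempted along these lines (a sketch of the REST call; PROJECT_ID, REGION, and OPERATION_ID are placeholders, and the service currently rejects it with the FAILED_PRECONDITION response shown above):

import google.auth
import google.auth.transport.requests
import requests

credentials, _ = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{REGION}/operations/{OPERATION_ID}:cancel"
)
response = requests.post(url, headers={"Authorization": f"Bearer {credentials.token}"})
print(response.status_code, response.text)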

mlops_pipeline_tf_agents_bandits_movie_recommendation.ipynb

Expected Behavior

In the "Author and run the RL pipeline" section, once the pipeline is submitted, the training step should complete without errors.

Actual Behavior

No such object: gs://<bucket>/pipeline/<>/movielens-pipeline-startup-<>/train-reinforcement-learning-policy_<>/training_artifacts_dir; Failed to read GCS file: gs://<bucket>/pipeline/<>/movielens-pipeline-startup-<>/train-reinforcement-learning-policy_<>/training_artifacts_dir.; Failed to read output parameter training_artifacts_dir with spec type: STRING ; Failed to get and update task output.; Failed to refresh external task state

The pipeline fails during the training step (see the attached screenshot).

I am wondering whether this might be related to #19 (comment).

@KathyFeiyang, if you are still maintaining this, is this something you have run into? After checking the discussion above, I tried using Str; it does not throw an error, but it still does not seem to work, as I get an error on the next step.

Steps to Reproduce the Problem

  1. Ran the notebook as is with required parameters

Specifications

  • Platform: Vertex AI workbench notebooks

Deployment Prediction Error

Hi, I've been following the tutorial but have been making minor adjustments to fit a GCN model rather than a text-classification one. I have gotten up to the section where we send a POST request for prediction.

When I send the request shown in the tutorial:

curl -s -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @./predictor/data_object.json \
  http://localhost:7080/predictions/test_model/

I get the error:

Error 1: 

{
  "code": 500,
  "type": "InternalServerException",
  "message": "Worker died."
}

I modified the curl command and sent:

curl -s -X POST \
  -d @./predictor/data_object.json \
  http://localhost:7080/predictions/test_model/

which got me this error:

Error 2: 
{
  "code": 503,
  "type": "InternalServerException",
  "message": "Prediction failed"
}

I think I've pinpointed in the Docker logs what the issues might be, but I can't seem to find a solution.

Error 1 logs:

2022-01-26T17:29:44,750 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/model_service_worker.py", line 189, in <module>
2022-01-26T17:29:44,751 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     worker.run_server()
2022-01-26T17:29:44,752 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/model_service_worker.py", line 161, in run_server
2022-01-26T17:29:44,753 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2022-01-26T17:29:44,755 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/model_service_worker.py", line 116, in handle_connection
2022-01-26T17:29:44,756 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     cmd, msg = retrieve_msg(cl_socket)
2022-01-26T17:29:44,756 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 36, in retrieve_msg
2022-01-26T17:29:44,757 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     msg = _retrieve_inference_msg(conn)
2022-01-26T17:29:44,757 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 226, in _retrieve_inference_msg
2022-01-26T17:29:44,758 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     request = _retrieve_request(conn)
2022-01-26T17:29:44,759 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 261, in _retrieve_request
2022-01-26T17:29:44,759 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     input_data = _retrieve_input_data(conn)
2022-01-26T17:29:44,760 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/protocol/otf_message_handler.py", line 314, in _retrieve_input_data
2022-01-26T17:29:44,760 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     model_input["value"] = json.loads(value.decode("utf-8"))
2022-01-26T17:29:44,772 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
2022-01-26T17:29:44,772 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     return _default_decoder.decode(s)
2022-01-26T17:29:44,772 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
2022-01-26T17:29:44,773 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2022-01-26T17:29:44,773 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
2022-01-26T17:29:44,774 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG -     raise JSONDecodeError("Expecting value", s, err.value) from None
2022-01-26T17:29:44,775 [INFO ] W-9000-test_dict_1.0-stdout MODEL_LOG - json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


Error 2 Logs:

2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/service.py", line 102, in predict
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     ret = self._entry_point(input_batch, self.context)
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/base.py", line 26, in handle
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     data = self.parse_input(data)
2022-01-26T17:33:27,447 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 19, in parse_input
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     lengths, batch = self._batch_from_json(data)
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 30, in _batch_from_json
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0 ACCESS_LOG - /172.17.0.1:51646 "POST /predictions/test_dict/ HTTP/1.1" 503 6
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     mini_batches = [self._from_json(data_row) for data_row in data_rows]
2022-01-26T17:33:27,448 [INFO ] W-9001-test_dict_1.0 TS_METRICS - Requests5XX.Count:1|#Level:Host|#hostname:6c7c4274b7e0,timestamp:null
2022-01-26T17:33:27,449 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 30, in <listcomp>
2022-01-26T17:33:27,449 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     mini_batches = [self._from_json(data_row) for data_row in data_rows]
2022-01-26T17:33:27,449 [DEBUG] W-9001-test_dict_1.0 org.pytorch.serve.job.Job - Waiting time ns: 162058, Inference time ns: 6406818
2022-01-26T17:33:27,451 [INFO ] W-9001-test_dict_1.0 TS_METRICS - WorkerThreadTime.ms:6|#Level:Host|#hostname:6c7c4274b7e0,timestamp:null
2022-01-26T17:33:27,452 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.6/site-packages/ts/torch_handler/request_envelope/json.py", line 39, in _from_json
2022-01-26T17:33:27,452 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG -     rows = (data.get('data') or data.get('body') or data)['instances']
2022-01-26T17:33:27,452 [INFO ] W-9001-test_dict_1.0-stdout MODEL_LOG - KeyError: 'instances'

My json object:

{
    "instances": [
        {
            "data": {
                "x": [[-1.0], [0.0], [1.0]],
                "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],
                "y": [1, 2, 3]
            }
        }
    ]
}

I've tested my JSON object many times to make sure it loads properly, and I've followed the structure shown in the tutorial, making sure that "instances" is a key in the JSON. So I'm really stuck as to what's happening. Could someone please help me?
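
One debugging step that may narrow this down (a sketch, not a confirmed fix; the path and port are the same ones used in the curl commands above): load the file in Python to prove it parses and contains the "instances" key, then post it with requests so the Content-Type header and the exact request body are controlled in one place. The KeyError: 'instances' in the second log suggests the body that reached TorchServe's JSON request envelope was not the dictionary shown above.

import json
import requests

# Confirm the file on disk is valid JSON and has the "instances" key.
with open("./predictor/data_object.json") as f:
    payload = json.load(f)
assert "instances" in payload

# Send exactly that object with an explicit application/json Content-Type.
response = requests.post(
    "http://localhost:7080/predictions/test_model/",
    json=payload,
)
print(response.status_code, response.text)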

google_cloud_pipeline_components Compiling Error in metadata Parameter in ModelDeployOp()

I am trying to write something to the metadata of a Vertex AI model,
and my code is like this:

import google.cloud.aiplatform as aip 
from google_cloud_pipeline_components import aiplatform as gcc_aip

@dsl.pipeline(
    name="test-vertext-ai-02",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline():
    ds_op = gcc_aip.TextDatasetCreateOp(
        project=PROJECT_ID,
        display_name=dataset_display_name,
        gcs_source=gcs_source,
        import_schema_uri=aip.schema.dataset.ioformat.text.extraction,
    )

    training_job_run_op = gcc_aip.CustomContainerTrainingJobRunOp(
        project=PROJECT_ID,
        display_name='pipeline_test_0112_1030',
        dataset=ds_op.outputs["dataset"],
        annotation_schema_uri=aip.schema.dataset.annotation.text.extraction,
        model_display_name=model_display_name,
        base_output_dir=base_output_dir,
        container_uri='us-central1-docker.pkg.dev/fr-dev-piiworker/ml-training/test_01_image:latest',
        staging_bucket='gs://vertex_ai_test_dev',
        model_serving_container_image_uri=model_serving_container_image_uri,
    )

    gcc_aip.ModelDeployOp(
        model=training_job_run_op.outputs["model"],
        metadata=None
    )

then i compile this code:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=pipeline, package_path="intro_pipeline.json".replace(" ", "_")
)

but I get this error message:


TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_280/3730446987.py in <module>
  2 
  3 compiler.Compiler().compile(
----> 4     pipeline_func=pipeline, package_path="intro_pipeline.json".replace(" ", "_")
  5 )

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, 
pipeline_name, pipeline_parameters, type_check)
   1275                 pipeline_func=pipeline_func,
   1276                 pipeline_name=pipeline_name,
-> 1277                 pipeline_parameters_override=pipeline_parameters)
   1278             self._write_pipeline(pipeline_job, package_path)
   1279         finally:

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, 
pipeline_name, pipeline_parameters_override)
   1194 
   1195         with dsl.Pipeline(pipeline_name) as dsl_pipeline:
-> 1196             pipeline_func(*args_list)
   1197 
   1198         if not dsl_pipeline.ops:

/tmp/ipykernel_280/786563907.py in pipeline()
 25     gcc_aip.ModelDeployOp(
 26         model=training_job_run_op.outputs["model"],
---> 27         metadata=None
 28     )

TypeError: model_deploy() got an unexpected keyword argument 'metadata'

How can I fix this error?
Thanks!
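
For what it's worth, the TypeError just means the installed google_cloud_pipeline_components release exposes no metadata parameter on ModelDeployOp. A minimal change that lets the pipeline compile (a sketch; whether the deploy step also needs an endpoint or machine resources depends on the GCPC version in use) is to drop the unsupported argument:

    # Inside the pipeline function: omit the unsupported `metadata` argument.
    gcc_aip.ModelDeployOp(
        model=training_job_run_op.outputs["model"],
    )

If the goal was to attach extra metadata to the model itself, that information generally belongs on the upload/training step (for example, as labels) rather than on the deploy step; check the component reference for the exact parameter names in your version.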

Vertex-AI Feature Store can't do batch serving from BigQuery as stated in documentation

I'm following this documentation and this example notebook to try to do batch serving using the Vertex AI Feature Store.

My mapping between label and entity_id + timestamp exists in a BigQuery table. The documentation says the following, which suggests that it should be possible to do batch serving with a mapping in BigQuery:

The read-instance list specifies the entities and timestamps for the feature values that you want to retrieve. The CSV file or BigQuery table must contain the following columns

However, I can't find any way to do this. The featurestore_service_pb2.BatchReadFeatureValuesRequest class has a required field csv_read_instances (link), but doesn't accept a BigQuery source.

Describe the solution you'd like

Make it possible to do batch serving with a mapping between features and labels in BigQuery, for example by letting the BatchReadFeatureValuesRequest class accept a BigQuerySource instead of a CsvSource.
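
For anyone hitting the same limitation: newer releases of the API surface a BigQuery read-instance option on this request (worth verifying against the installed client version, since the report above uses the v1beta1 pb2 classes). A sketch of what that might look like with the v1 client; all resource names and table URIs below are placeholders:

from google.cloud import aiplatform_v1

client = aiplatform_v1.FeaturestoreServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)

request = aiplatform_v1.BatchReadFeatureValuesRequest(
    featurestore=f"projects/{PROJECT_ID}/locations/{REGION}/featurestores/{FEATURESTORE_ID}",
    # Read instances (entity IDs + timestamps) straight from a BigQuery table.
    bigquery_read_instances=aiplatform_v1.BigQuerySource(
        input_uri="bq://my-project.my_dataset.read_instances"
    ),
    destination=aiplatform_v1.FeatureValueDestination(
        bigquery_destination=aiplatform_v1.BigQueryDestination(
            output_uri="bq://my-project.my_dataset.batch_serving_output"
        )
    ),
    entity_type_specs=[
        aiplatform_v1.BatchReadFeatureValuesRequest.EntityTypeSpec(
            entity_type_id="users",
            feature_selector=aiplatform_v1.FeatureSelector(
                id_matcher=aiplatform_v1.IdMatcher(ids=["*"])
            ),
        )
    ],
)

lro = client.batch_read_feature_values(request=request)
print(lro.result())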

Google cloud logging type ERROR for regular training progress output

Expected Behavior

Google Cloud Logging for a custom training job should report training progress as type INFO.

Actual Behavior

Logs show type ERROR

Steps to Reproduce the Problem

I've been using the notebook in this repo: Training, Tuning and Deploying a PyTorch Text Classification Model on Vertex AI

In the section, "Run Custom Job on Vertex Training with a pre-built container" a package is created and uploaded to a bucket and the job is run. However when monitoring the logs, I get the following:

ERROR 2022-01-26 14:19:34 +1100 workerpool0-0 31%|??? | 5974/19120 [31:21<59:25, 3.69it/s]
ERROR 2022-01-26 14:19:41 +1100 workerpool0-0 Configuration saved in /tmp/xlm-roberta-large/checkpoint-6000/config.json

For some events, it shows:

INFO 2022-01-26 14:19:41 +1100 workerpool0-0 {'loss': 5.3862, 'learning_rate': 1.3723849372384938e-05, 'epoch': 3.14}
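
A likely explanation (an assumption; nothing in this report confirms it) is that the training libraries write progress bars and checkpoint messages to stderr, and Cloud Logging records anything a training container emits on stderr with ERROR severity regardless of content. A minimal sketch for the training script that keeps Python logging on stdout:

import logging
import sys

# Route Python logging output to stdout so Cloud Logging ingests it as INFO
# instead of classifying stderr lines as ERROR.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

Progress bars such as tqdm still default to stderr, so disabling them in the trainer configuration (or redirecting stderr explicitly) would be needed to remove the remaining ERROR-severity lines.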

Failed to pick channel; failed to connect to all addresses

Expected Behavior

In "Test your query" cell of notebook, a Match is expected to be returned

Actual Behavior


_InactiveRpcError Traceback (most recent call last)
/tmp/ipykernel_19870/467153318.py in
108 request.float_val.append(val)
109
--> 110 response = stub.Match(request)
111 response

~/.local/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
944 state, call, = self._blocking(request, timeout, metadata, credentials,
945 wait_for_ready, compression)
--> 946 return _end_unary_response_blocking(state, call, False, None)
947
948 def with_call(self,

~/.local/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
847 return state.response
848 else:
--> 849 raise _InactiveRpcError(state)
850
851

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1635211606.895231791","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3158,"referenced_errors":[{"created":"@1635211606.895230354","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":147,"grpc_status":14}]}"

Steps to Reproduce the Problem

  1. Run https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/community/matching_engine/matching_engine_for_indexing.ipynb - with appropriate changes for your project, region, etc.
  2. Get all the way to "Test your query"
  3. Observe that this code leads to the above error:
import match_service_pb2
import match_service_pb2_grpc

channel = grpc.insecure_channel("{}:10000".format(DEPLOYED_INDEX_SERVER_IP))
stub = match_service_pb2_grpc.MatchServiceStub(channel)

Specifications

  • Version:
  • Platform: JupyterLab on Google Vertex Notebook

automl_tabular_classification_beans: Failed at KFP DAG run

Command

job.run

Error

Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [28]":
Step #3: ---------------------------------------------------------------------------
Step #3: RuntimeError                              Traceback (most recent call last)
Step #3: /tmp/ipykernel_19/2758967286.py in <module>
Step #3:       8 )
Step #3:       9 
Step #3: ---> 10 job.run()
Step #3:      11 
Step #3:      12 get_ipython().system(' rm tabular_classification_pipeline.json')
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py in wrapper(*args, **kwargs)
Step #3:     728                 if self:
Step #3:     729                     VertexAiResourceNounWithFutureManager.wait(self)
Step #3: --> 730                 return method(*args, **kwargs)
Step #3:     731 
Step #3:     732             # callbacks to call within the Future (in same Thread)
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in run(self, service_account, network, sync)
Step #3:     250         self.submit(service_account=service_account, network=network)
Step #3:     251 
Step #3: --> 252         self._block_until_complete()
Step #3:     253 
Step #3:     254     def submit(
Step #3: 
Step #3: ~/.local/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py in _block_until_complete(self)
Step #3:     347         # JOB_STATE_FAILED or JOB_STATE_CANCELLED.
Step #3:     348         if self._gca_resource.state in _PIPELINE_ERROR_STATES:
Step #3: --> 349             raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
Step #3:     350         else:
Step #3:     351             _LOGGER.log_action_completed_against_resource("run", "completed", self)
Step #3: 
Step #3: RuntimeError: Job failed with:
Step #3: code: 9
Step #3: message: "The DAG failed because some tasks failed. The failed tasks are: [condition-deploy-decision-1].; Job (project_id = python-docs-samples-tests, job_id = 4820017083611873280) is failed due to the above error.; Failed to handle the job: {project_number = 1012616486416, job_id = 4820017083611873280}"

lightweight_functions_component_io_kfp fails

Step #4: lightweight_functions_component_io_kfp.ipynb                      FAILED    00:00:18    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [20]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         KeyError                                  Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_32486/4032352611.py in <module>
Step #4:                                                                                               1 from kfp.v2 import compiler
Step #4:                                                                                               2
Step #4:                                                                                         ----> 3 compiler.Compiler().compile(
Step #4:                                                                                               4     pipeline_func=pipeline, package_path="component_io_job.json"
Step #4:                                                                                               5 )
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
Step #4:                                                                                            1175             kfp.TYPE_CHECK = type_check
Step #4:                                                                                            1176             kfp.COMPILING_FOR_V2 = True
Step #4:                                                                                         -> 1177             pipeline_job = self._create_pipeline_v2(
Step #4:                                                                                            1178                 pipeline_func=pipeline_func,
Step #4:                                                                                            1179                 pipeline_name=pipeline_name,
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
Step #4:                                                                                            1106
Step #4:                                                                                            1107         with dsl.Pipeline(pipeline_name) as dsl_pipeline:
Step #4:                                                                                         -> 1108             pipeline_func(*args_list)
Step #4:                                                                                            1109
Step #4:                                                                                            1110         self._validate_exit_handler(dsl_pipeline)
Step #4: 
Step #4:                                                                                         /tmp/ipykernel_32486/2896638886.py in pipeline(message)
Step #4:                                                                                              14     train_task = train(
Step #4:                                                                                              15         dataset_one=preprocess_task.outputs["output_dataset_one"],
Step #4:                                                                                         ---> 16         dataset_two=preprocess_task.outputs["output_dataset_two"],
Step #4:                                                                                              17         imported_dataset=importer.output,
Step #4:                                                                                              18         message=preprocess_task.outputs["output_parameter"],
Step #4: 
Step #4:                                                                                         KeyError: 'output_dataset_two'

In notebooks/official/explainable_ai/gapic-custom_tabular_regression_online_explain.ipynb Can't create custom job

Expected Behavior

Start custom training job in Vertex.

Code

def create_custom_job(custom_job):
    response = clients["job"].create_custom_job(parent=PARENT, custom_job=custom_job)
    print("name:", response.name)
    print("display_name:", response.display_name)
    print("state:", response.state)
    print("create_time:", response.create_time)
    print("update_time:", response.update_time)
    return response

response = create_custom_job(custom_job)

Actual Behavior

Got this error:

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     65         try:
---> 66             return callable_(*args, **kwargs)
     67         except grpc.RpcError as exc:

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    945                                       wait_for_ready, compression)
--> 946         return _end_unary_response_blocking(state, call, False, None)
    947 

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    848     else:
--> 849         raise _InactiveRpcError(state)
    850 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNIMPLEMENTED
	details = "Received http2 header with status: 404"
	debug_error_string = "{"created":"@1644551451.237575140","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":131,"grpc_message":"Received http2 header with status: 404","grpc_status":12,"value":"404"}"
>

The above exception was the direct cause of the following exception:

MethodNotImplemented                      Traceback (most recent call last)
/tmp/ipykernel_12135/2660296843.py in <module>
      8     return response
      9 
---> 10 response = create_custom_job(custom_job)

/tmp/ipykernel_12135/2660296843.py in create_custom_job(custom_job)
      1 def create_custom_job(custom_job):
----> 2     response = clients["job"].create_custom_job(parent=PARENT, custom_job=custom_job)
      3     print("name:", response.name)
      4     print("display_name:", response.display_name)
      5     print("state:", response.state)

/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/job_service/client.py in create_custom_job(self, request, parent, custom_job, retry, timeout, metadata)
    648 
    649         # Send the request.
--> 650         response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
    651 
    652         # Done; return the response.

/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
    152             kwargs["metadata"] = metadata
    153 
--> 154         return wrapped_func(*args, **kwargs)
    155 
    156 

/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     66             return callable_(*args, **kwargs)
     67         except grpc.RpcError as exc:
---> 68             raise exceptions.from_grpc_error(exc) from exc
     69 
     70     return error_remapped_callable

MethodNotImplemented: 501 Received http2 header with status: 404
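
One common cause of this UNIMPLEMENTED / HTTP 404 combination (an assumption here, since the notebook's client setup isn't shown in the report) is a JobServiceClient created without the regional API endpoint, so requests go to the global host instead of the region in PARENT. A sketch of the intended client construction; REGION and PROJECT_ID are placeholders:

from google.cloud import aiplatform_v1

# The api_endpoint region must match the region used in PARENT.
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
clients = {"job": aiplatform_v1.JobServiceClient(client_options=client_options)}

PARENT = f"projects/{PROJECT_ID}/locations/{REGION}"

With a matching regional endpoint, create_custom_job should no longer return a 404 from the transport.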

Notebook fails: lightweight_functions_component_io_kfp.ipynb

Expected Behavior

Notebook doesn't fail

Actual Behavior

Step #4: lightweight_functions_component_io_kfp.ipynb                      FAILED    00:00:16    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [20]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         KeyError                                  Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_31031/4032352611.py in <module>
Step #4:                                                                                               1 from kfp.v2 import compiler
Step #4:                                                                                               2
Step #4:                                                                                         ----> 3 compiler.Compiler().compile(
Step #4:                                                                                               4     pipeline_func=pipeline, package_path="component_io_job.json"
Step #4:                                                                                               5 )
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
Step #4:                                                                                            1134       kfp.TYPE_CHECK = type_check
Step #4:                                                                                            1135       kfp.COMPILING_FOR_V2 = True
Step #4:                                                                                         -> 1136       pipeline_job = self._create_pipeline_v2(
Step #4:                                                                                            1137           pipeline_func=pipeline_func,
Step #4:                                                                                            1138           pipeline_name=pipeline_name,
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
Step #4:                                                                                            1070
Step #4:                                                                                            1071     with dsl.Pipeline(pipeline_name) as dsl_pipeline:
Step #4:                                                                                         -> 1072       pipeline_func(*args_list)
Step #4:                                                                                            1073
Step #4:                                                                                            1074     self._validate_exit_handler(dsl_pipeline)
Step #4: 
Step #4:                                                                                         /tmp/ipykernel_31031/2896638886.py in pipeline(message)
Step #4:                                                                                              14     train_task = train(
Step #4:                                                                                              15         dataset_one=preprocess_task.outputs["output_dataset_one"],
Step #4:                                                                                         ---> 16         dataset_two=preprocess_task.outputs["output_dataset_two"],
Step #4:                                                                                              17         imported_dataset=importer.output,
Step #4:                                                                                              18         message=preprocess_task.outputs["output_parameter"],
Step #4: 
Step #4:                                                                                         KeyError: 'output_dataset_two'
Step #4: 
Step #4: === END RESULTS===
Step #4: 

google_cloud_pipeline_components_automl_images.ipynb fails due to AutoMLImageClassificationDeployedModelNodes quota exceeded

Logs: https://pantheon.corp.google.com/vertex-ai/locations/us-central1/pipelines/runs/automl-image-training-v2-20220202211108?project=python-docs-samples-tests

RuntimeError: Failed to create the resource. Error: {'code': 8, 'message': 'The following quotas are exceeded: AutoMLImageClassificationDeployedModelNodes'}

Solutions

  • See why cleanup script doesn't solve this.
  • Increase quota.

How to reuse an existing Endpoint in pipeline?

Hi Experts

Expected Behavior

I hope to reuse an existing endpoint in the pipeline.

I also see that google_cloud_pipeline_components.v1.endpoint has only an EndpointCreateOp method.
https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.0/google_cloud_pipeline_components.v1.endpoint.html

Is there a way to reuse an existing Endpoint, and how? This would save costs.

Actual Behavior

I have to create a new Endpoint in the pipeline.
The EndpointCreateOp step is included in all the sample code.

endpoint_op = EndpointCreateOp(
    project=project,
    location=region,
    display_name="train-automl-flowers",
)

ModelDeployOp(
    model=training_run_task.outputs["model"],
    endpoint=endpoint_op.outputs["endpoint"],
    automatic_resources_min_replica_count=1,
    automatic_resources_max_replica_count=1,
)

Steps to Reproduce the Problem

N/A

Specifications

  • Version:
    google-cloud-pipeline-components-1.0.0
  • Platform:
    Google Cloud Vertex AI
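
One pattern that may work (a sketch only; it assumes a KFP release whose dsl.importer accepts a metadata argument and a GCPC release that ships artifact_types.VertexEndpoint, so verify both against the installed versions) is to import the existing endpoint as an artifact and pass it to ModelDeployOp instead of calling EndpointCreateOp:

from kfp.v2 import dsl
from google_cloud_pipeline_components.types import artifact_types
from google_cloud_pipeline_components.v1.endpoint import ModelDeployOp

# Placeholders: project, region, endpoint_id refer to an endpoint created outside the pipeline.
ENDPOINT_RESOURCE = f"projects/{project}/locations/{region}/endpoints/{endpoint_id}"

@dsl.pipeline(name="reuse-existing-endpoint")
def pipeline():
    # Import the already-created endpoint instead of creating a new one.
    endpoint_importer = dsl.importer(
        artifact_uri=f"https://{region}-aiplatform.googleapis.com/v1/{ENDPOINT_RESOURCE}",
        artifact_class=artifact_types.VertexEndpoint,
        metadata={"resourceName": ENDPOINT_RESOURCE},
    )

    ModelDeployOp(
        model=training_run_task.outputs["model"],  # training step as in the snippet above
        endpoint=endpoint_importer.output,
        automatic_resources_min_replica_count=1,
        automatic_resources_max_replica_count=1,
    )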

Problem in the preprocessing function while defining the serving_default signature for a model in a community notebook

Problem

In the section Deployment > Upload the model for serving > Serving function for image data of this notebook, the preprocess function is defined as,

def _preprocess(bytes_input):
    decoded = tf.io.decode_jpeg(bytes_input, channels=3)
    decoded = tf.image.convert_image_dtype(decoded, tf.float32)
    resized = tf.image.resize(decoded, size=(32, 32))
    rescale = tf.cast(resized / 255.0, tf.float32)
    return rescale

For this, the author has written,

resized / 255.0 - Rescales (normalization) the pixel data between 0 and 1.

But if we check the working of tf.image.convert_image_dtype as per the current TensorFlow version 2.7.0, we see that this function not only converts the dtype, but also normalizes/rescales the image according to the target dtype.

In the documentation, it is mentioned,

Images that are represented using floating-point values are expected to have values in the range [0,1). Image data stored in integer data types are expected to have values in the range [0, MAX], where MAX is the largest positive representable number for the data type.

This means that while converting to tf.float32 it also rescales the decoded image into the range [0, 1]. Dividing by 255 again would squeeze the values into [0, 1/255], which is not the range the model saw during training. Therefore, the line rescale = tf.cast(resized / 255.0, tf.float32) must be omitted.
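
For clarity, the corrected serving preprocessing function would then be the notebook's version with the extra division removed:

def _preprocess(bytes_input):
    decoded = tf.io.decode_jpeg(bytes_input, channels=3)
    # convert_image_dtype already rescales uint8 pixel values into [0, 1] floats.
    decoded = tf.image.convert_image_dtype(decoded, tf.float32)
    resized = tf.image.resize(decoded, size=(32, 32))
    return resized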

Specifications

  • Version: Tensorflow Version: 2.7.0

model_monitoring notebook is flaky

Describe the bug
model_monitoring notebook is flaky

Step #4: model_monitoring.ipynb                                            FAILED    00:28:43    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [13]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         _InactiveRpcError                         Traceback (most recent call last)
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
Step #4:                                                                                              66         try:
Step #4:                                                                                         ---> 67             return callable_(*args, **kwargs)
Step #4:                                                                                              68         except grpc.RpcError as exc:
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
Step #4:                                                                                             945                                       wait_for_ready, compression)
Step #4:                                                                                         --> 946         return _end_unary_response_blocking(state, call, False, None)
Step #4:                                                                                             947
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
Step #4:                                                                                             848     else:
Step #4:                                                                                         --> 849         raise _InactiveRpcError(state)
Step #4:                                                                                             850
Step #4: 
Step #4:                                                                                         _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
Step #4:                                                                                         	status = StatusCode.INVALID_ARGUMENT
Step #4:                                                                                         	details = "List of found errors:	1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.	"
Step #4:                                                                                         	debug_error_string = "{"created":"@1629366729.915949502","description":"Error received from peer ipv4:74.125.20.95:443","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"List of found errors:\t1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.\t","grpc_status":3}"
Step #4:                                                                                         >
Step #4: 
Step #4:                                                                                         The above exception was the direct cause of the following exception:
Step #4: 
Step #4:                                                                                         InvalidArgument                           Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_31500/3164745998.py in <module>
Step #4:                                                                                              20 objective_configs = set_objectives(model_ids, objective_template)
Step #4:                                                                                              21
Step #4:                                                                                         ---> 22 monitoring_job = create_monitoring_job(objective_configs)
Step #4: 
Step #4:                                                                                         /tmp/ipykernel_31500/1942246749.py in create_monitoring_job(objective_configs)
Step #4:                                                                                              56     client = JobServiceClient(client_options=options)
Step #4:                                                                                              57     parent = f"projects/{PROJECT_ID}/locations/{REGION}"
Step #4:                                                                                         ---> 58     response = client.create_model_deployment_monitoring_job(
Step #4:                                                                                              59         parent=parent, model_deployment_monitoring_job=job
Step #4:                                                                                              60     )
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/cloud/aiplatform_v1beta1/services/job_service/client.py in create_model_deployment_monitoring_job(self, request, parent, model_deployment_monitoring_job, retry, timeout, metadata)
Step #4:                                                                                            2294
Step #4:                                                                                            2295         # Send the request.
Step #4:                                                                                         -> 2296         response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
Step #4:                                                                                            2297
Step #4:                                                                                            2298         # Done; return the response.
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py in __call__(self, *args, **kwargs)
Step #4:                                                                                             143             kwargs["metadata"] = metadata
Step #4:                                                                                             144
Step #4:                                                                                         --> 145         return wrapped_func(*args, **kwargs)
Step #4:                                                                                             146
Step #4:                                                                                             147
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
Step #4:                                                                                              67             return callable_(*args, **kwargs)
Step #4:                                                                                              68         except grpc.RpcError as exc:
Step #4:                                                                                         ---> 69             six.raise_from(exceptions.from_grpc_error(exc), exc)
Step #4:                                                                                              70
Step #4:                                                                                              71     return error_remapped_callable
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/six.py in raise_from(value, from_value)
Step #4: 
Step #4:                                                                                         InvalidArgument: 400 List of found errors:	1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.
Step #4: metrics_viz_run_compare_kfp.ipynb                                 FAILED    00:01:55    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [28]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         ValueError                                Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_1835/3492064536.py in <module>
Step #4:                                                                                         ----> 1 df = pd.DataFrame(pipeline_df["metric.confidenceMetrics"][0])
Step #4:                                                                                               2 auc = np.trapz(df["recall"], df["falsePositiveRate"])
Step #4:                                                                                               3 plt.plot(df["falsePositiveRate"], df["recall"], label="auc=" + str(auc))
Step #4:                                                                                               4 plt.legend(loc=4)
Step #4:                                                                                               5 plt.show()
Step #4: 
Step #4:                                                                                         ~/.local/lib/python3.9/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
Step #4:                                                                                             728         else:
Step #4:                                                                                             729             if index is None or columns is None:
Step #4:                                                                                         --> 730                 raise ValueError("DataFrame constructor not properly called!")
Step #4:                                                                                             731
Step #4:                                                                                             732             # Argument 1 to "ensure_index" has incompatible type "Collection[Any]";
Step #4: 
Step #4:                                                                                         ValueError: DataFrame constructor not properly called!

Additional context
Please fix (it may be hard to reproduce as it doesn't happen all the time) or move to unofficial.

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: renovate.json
Error type: Invalid JSON (parsing failed)
Message: `Syntax error near ,
],

}
`

notebooks/official/explainable_ai/gapic-custom_tabular_regression_online_explain.ipynb doesn't install tensorflow

Logs: https://pantheon.corp.google.com/cloud-build/builds/98afe3ac-1589-4c10-bb9d-f28e1841558c?project=python-docs-samples-tests

Step #3: Copying file://notebooks/official/explainable_ai/gapic-custom_tabular_regression_online_explain.ipynb [Content-Type=application/octet-stream]...
Step #3: / [0 files][    0.0 B/139.4 KiB]                                                
/ [1 files][139.4 KiB/139.4 KiB]                                                
Step #3: Operation completed over 1 objects/139.4 KiB.                                    
Step #3: Uploaded output to: gs://cloud-build-notebooks-presubmit/executed_notebooks/PR_120/BUILD_831c33e0-bc70-4cd3-a3cf-a6d3e9b3c4ae/gapic-custom_tabular_regression_online_explain.ipynb
Step #3: Traceback (most recent call last):
Step #3:   File "/workspace/.cloud-build/execute_notebook_cli.py", line 34, in <module>
Step #3:     ExecuteNotebook.execute_notebook(
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 84, in execute_notebook
Step #3:     raise execution_exception
Step #3:   File "/workspace/.cloud-build/ExecuteNotebook.py", line 53, in execute_notebook
Step #3:     pm.execute_notebook(
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 122, in execute_notebook
Step #3:     raise_for_execution_errors(nb, output_path)
Step #3:   File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
Step #3:     raise error
Step #3: papermill.exceptions.PapermillExecutionError: 
Step #3: ---------------------------------------------------------------------------
Step #3: Exception encountered at "In [33]":
Step #3: ---------------------------------------------------------------------------
Step #3: ModuleNotFoundError                       Traceback (most recent call last)
Step #3: /tmp/ipykernel_15/818968412.py in <module>
Step #3: ----> 1 import tensorflow as tf
Step #3:       2 
Step #3:       3 model = tf.keras.models.load_model(MODEL_DIR)
Step #3: 
Step #3: ModuleNotFoundError: No module named 'tensorflow'
Step #3: 
Finished Step #3
ERROR
ERROR: build step 3 "gcr.io/cloud-devrel-public-resources/python-samples-testing-docker:latest" failed: step exited with non-zero status: 1
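
A minimal sketch of one possible fix: add a TensorFlow installation step to the notebook's setup cell so that the later import tensorflow / tf.keras.models.load_model(MODEL_DIR) cell can run. The install flags below are assumptions, not the notebook's actual setup cell.

# Hypothetical setup cell for gapic-custom_tabular_regression_online_explain.ipynb:
# install TensorFlow up front to avoid the ModuleNotFoundError in cell [33].
! pip3 install --upgrade --user tensorflow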

Using model.resource_name and endpoint.resource_name to instantiate the model and endpoint resources

Expected Behavior

In notebooks/community/managed_notebooks/fraud_detection/fraud-detection-model.ipynb, the model and endpoint resources should be instantiated from their resource names, as shown below:

model = aiplatform.Model(model_name=model.resource_name)
endpoint = aiplatform.Endpoint(endpoint_name=endpoint.resource_name)
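
A slightly fuller sketch of the intended flow (the training and deployment code is assumed to already exist earlier in the notebook):

from google.cloud import aiplatform

# model and endpoint are assumed to have been created earlier in the notebook,
# e.g. via aiplatform.Model.upload(...) and model.deploy(...).
model_resource_name = model.resource_name        # projects/.../locations/.../models/...
endpoint_resource_name = endpoint.resource_name  # projects/.../locations/.../endpoints/...

# Later (or in a fresh session), rebuild the objects from those names instead
# of looking them up by display name, which is not guaranteed to be unique.
model = aiplatform.Model(model_name=model_resource_name)
endpoint = aiplatform.Endpoint(endpoint_name=endpoint_resource_name)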

Actual Behavior

Instantiating the model and endpoint by their display names fails.

Steps to Reproduce the Problem

  1. Create a managed notebook in Vertex AI Workbench.
  2. Open the terminal and clone the repository.
  3. Open notebooks/community/managed_notebooks/fraud_detection/fraud-detection-model.ipynb.
  4. Run the cells up to the "Deploy the model to the created Endpoint" cell.
  5. Run that cell and observe the error.

Specifications

(Screenshot attached: "Screen Shot 2021-11-03 at 5 40 37 PM".)


Unable to run custom job in sdk-custom-image-classification-batch.ipynb

In vertex-ai-samples/notebooks/official/custom/sdk-custom-image-classification-batch.ipynb, when I run the following cell in a Vertex AI Workbench notebook:

job = aiplatform.CustomTrainingJob(
    display_name=JOB_NAME,
    script_path="task.py",
    container_uri=TRAIN_IMAGE,
    requirements=["tensorflow_datasets==1.3.0"],
    model_serving_container_image_uri=DEPLOY_IMAGE,
)

MODEL_DISPLAY_NAME = "cifar10-" + TIMESTAMP

if TRAIN_GPU:
    model = job.run(
        model_display_name=MODEL_DISPLAY_NAME,
        args=CMDARGS,
        replica_count=1,
        machine_type=TRAIN_COMPUTE,
        accelerator_type=TRAIN_GPU.name,
        accelerator_count=TRAIN_NGPU,
    )
else:
    model = job.run(
        model_display_name=MODEL_DISPLAY_NAME,
        args=CMDARGS,
        replica_count=1,
        machine_type=TRAIN_COMPUTE,
        accelerator_count=0,
    )

it fails with the following error:
InvalidArgument: 400 Accelerator "NVIDIA_TESLA_K80" is not supported for machine type "n1-standard-4".

My understanding is that the Tesla K80 is compatible with 4-vCPU machine types, as listed here: https://cloud.google.com/ai-platform/training/docs/using-gpus
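
One workaround, purely as a sketch: try an accelerator/machine-type pairing that the error does not reject, for example a T4 on the same n1-standard-4 machine, or check whether the chosen region offers K80s at all. The specific combination below is an assumption and should be verified against the current Vertex AI compatibility table.

# Hypothetical override before calling job.run(): request a T4 instead of a K80,
# keeping the notebook's n1-standard-4 machine type.
model = job.run(
    model_display_name=MODEL_DISPLAY_NAME,
    args=CMDARGS,
    replica_count=1,
    machine_type=TRAIN_COMPUTE,          # "n1-standard-4" in the notebook
    accelerator_type="NVIDIA_TESLA_T4",  # assumed available in the selected region
    accelerator_count=1,
)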

Regression test failures: Various KFP failures
