
kale's Introduction

Kale Logo



KALE (Kubeflow Automated pipeLines Engine) is a project that aims to simplify the Data Science experience of deploying Kubeflow Pipelines workflows.

Kubeflow is a great platform for orchestrating complex workflows on top of Kubernetes, and Kubeflow Pipelines provides the means to create reusable components that can be executed as part of workflows. The self-service nature of Kubeflow makes it extremely appealing for Data Science use, as it provides easy access to advanced distributed job orchestration, reusable components, Jupyter Notebooks, rich UIs and more. Still, developing and maintaining Kubeflow workflows can be hard for data scientists, who may not be experts in workflow orchestration platforms and their related SDKs. Additionally, data science often involves data exploration, iterative modelling and interactive environments (mostly Jupyter notebooks).

Kale bridges this gap by providing a simple UI to define Kubeflow Pipelines workflows directly from your JupyterLab interface, without the need to change a single line of code.

Read more about Kale and how it works in this Medium post: Automating Jupyter Notebook Deployments to Kubeflow Pipelines with Kale

Getting started

Install the Kale backend from PyPI and the JupyterLab extension. You can find a set of curated notebooks in the examples repository.

# install kale
pip install kubeflow-kale

# install jupyter lab
pip install "jupyterlab>=2.0.0,<3.0.0"

# install the extension
jupyter labextension install kubeflow-kale-labextension
# verify extension status
jupyter labextension list

# run
jupyter lab

Kale JupyterLab Extension

To build images to be used as a NotebookServer in Kubeflow, refer to the Dockerfile in the docker folder.

FAQ

Head over to FAQ to read about some known issues and some of the limitations imposed by the Kale data marshalling model.

Resources

Contribute

Backend

Create a new Python virtual environment with Python >= 3.6. Then:

cd backend/
pip install -e .[dev]

# run tests
pytest -x -vv

Labextension

The JupyterLab Python package comes with its own yarn wrapper, called jlpm. While using the previously installed venv, install JupyterLab by running:

pip install "jupyterlab>=2.0.0,<3.0.0"

You can then run the following to install the Kale extension:

cd labextension/

# install dependencies (from package.json / yarn.lock)
jlpm install
# build extension
jlpm run build

# list installed jp extensions
jlpm labextension list
# install Kale extension
jlpm labextension install .

# for development:
# build and watch
jlpm run watch

# in another shell, run JupyterLab in watch mode
jupyter lab --no-browser --watch

Git Hooks

This repository uses husky to set up git hooks.

For husky to function properly, you need to have yarn installed and in your PATH. This is required because husky is installed via jlpm install, and jlpm is a yarn wrapper. (Similarly, if it were installed using the npm package manager, then npm would have to be in the PATH.)

Currently installed git hooks:

  • pre-commit: Run a prettier check on staged files, using pretty-quick

kale's People

Contributors

cspavlou, davidspek, dependabot[bot], dpoulopoulos, elikatsis, klolos, shangdibufashi, stefanofioravanzo


kale's Issues

Titanic example pipeline fails

I followed the example pipeline:

# install kale
pip install kubeflow-kale

# download a tagged example notebook
wget https://raw.githubusercontent.com/kubeflow-kale/examples/master/titanic-ml-dataset/titanic_dataset_ml.ipynb
# convert the notebook to a python script that defines a kfp pipeline
kale --nb titanic_dataset_ml.ipynb

The Kubeflow Pipeline UI says: This step is in Pending state with this message: ContainerCreating

but no container is being created.

Here are the logs from the ml-pipeline container:

I0212 19:25:31.311310       1 error.go:218] Failed to update run
github.com/kubeflow/pipelines/backend/src/apiserver/storage.(*RunStore).UpdateRun
        backend/src/apiserver/storage/run_store.go:405
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).ReportWorkflowResource
        backend/src/apiserver/resource/resource_manager.go:538
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ReportServer).ReportWorkflow
        backend/src/apiserver/server/report_server.go:39
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler.func1
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:174
main.apiServerInterceptor
        backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
        external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
        external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
        external/org_golang_google_grpc/server.go:685
runtime.goexit
        GOROOT/src/runtime/asm_amd64.s:1333
InternalServerError: Failed to update run 3e04659f-2c59-4f1e-ba2d-ebd36d2c2620. Row not found
github.com/kubeflow/pipelines/backend/src/common/util.NewInternalServerError
        backend/src/common/util/error.go:142
github.com/kubeflow/pipelines/backend/src/apiserver/storage.(*RunStore).UpdateRun
        backend/src/apiserver/storage/run_store.go:405
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).ReportWorkflowResource
        backend/src/apiserver/resource/resource_manager.go:538
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ReportServer).ReportWorkflow
        backend/src/apiserver/server/report_server.go:39
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler.func1
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:174
main.apiServerInterceptor
        backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
        external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
        external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
        external/org_golang_google_grpc/server.go:685
runtime.goexit
        GOROOT/src/runtime/asm_amd64.s:1333
Failed to update the run.
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrap
        backend/src/common/util/error.go:211
github.com/kubeflow/pipelines/backend/src/common/util.Wrap
        backend/src/common/util/error.go:244
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).ReportWorkflowResource
        backend/src/apiserver/resource/resource_manager.go:540
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ReportServer).ReportWorkflow
        backend/src/apiserver/server/report_server.go:39
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler.func1
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:174
main.apiServerInterceptor
        backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
        external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
        external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
        external/org_golang_google_grpc/server.go:685
runtime.goexit
        GOROOT/src/runtime/asm_amd64.s:1333
Report workflow failed.
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrap
        backend/src/common/util/error.go:211
github.com/kubeflow/pipelines/backend/src/common/util.Wrap
        backend/src/common/util/error.go:244
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ReportServer).ReportWorkflow
        backend/src/apiserver/server/report_server.go:41
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler.func1
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:174
main.apiServerInterceptor
        backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
        external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
        external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
        external/org_golang_google_grpc/server.go:685
runtime.goexit
        GOROOT/src/runtime/asm_amd64.s:1333
/api.ReportService/ReportWorkflow call failed
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrapf
        backend/src/common/util/error.go:206
github.com/kubeflow/pipelines/backend/src/common/util.Wrapf
        backend/src/common/util/error.go:231
main.apiServerInterceptor
        backend/src/apiserver/interceptor.go:32
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
        external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
        external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
        external/org_golang_google_grpc/server.go:685
runtime.goexit
        GOROOT/src/runtime/asm_amd64.s:1333
I0212 19:25:32.951929       1 interceptor.go:29] /api.ExperimentService/ListExperiment handler starting
I0212 19:25:32.953426       1 interceptor.go:37] /api.ExperimentService/ListExperiment handler finished
I0212 19:25:32.955800       1 interceptor.go:29] /api.ExperimentService/GetExperiment handler starting
I0212 19:25:32.956559       1 interceptor.go:37] /api.ExperimentService/GetExperiment handler finished
I0212 19:25:33.801737       1 interceptor.go:29] /api.RunService/CreateRun handler starting
I0212 19:25:34.015397       1 interceptor.go:37] /api.R

The pipeline doesn't generate any new events in the Kubeflow notebook namespace.

kale.log

02-12 19:25 | kubeflow-kale |  DEBUG: ------------- Kale Start Run -------------
02-12 19:25 | kubeflow-kale |  INFO: Pipeline code saved at titanic-ml-0myxe.kale.py

Kubeflow version: 0.7.0
kale version: 0.4.0
Running on Azure AKS (Kubernetes 1.15.7)

Move randomness from pipeline name to pipeline version

Instead of appending 5 random characters to the pipeline's name:

run_name = kfp_utils.generate_run_name(

add them to the pipeline's version like
https://www.kubeflow.org/docs/pipelines/tutorials/sdk-examples/#creating-a-pipeline-and-a-pipeline-version-using-the-sdk

Otherwise, with every compile & run, Kale generates a new "single-version" lineage of pipelines and clutters the pipeline overview with single-version pipelines.
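
A sketch of the proposal using the KFP SDK (method names follow the linked tutorial; exact signatures may vary across kfp versions, and the suffix below is illustrative):

import kfp

client = kfp.Client()
# Upload the pipeline once, under a stable name:
pipeline = client.upload_pipeline("pipeline.yaml", pipeline_name="titanic-ml")
# On every subsequent compile & run, upload a new *version* of that same
# pipeline instead of a brand new, randomly-suffixed pipeline:
client.upload_pipeline_version(
    "pipeline.yaml",
    pipeline_version_name="titanic-ml-" + "0myxe",  # illustrative random suffix
    pipeline_id=pipeline.id,
)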

Tests to be implemented

In this issue we document all the tests that need to be implemented.


  • Test that multiple untagged cells are merged together and appended to their parent, where the parent is the earliest cell with a step_name
  • Add an e2e test with pipeline parameters and pipeline metrics
  • Add an e2e test with katib metadata
  • capture_streams in utils.jupyter_utils.py
  • Add more configurations to metadata validation

add_node_selector_constraint with Kale

Hello,

I would like to specify that one component (pipeline step) has to be run in a specified node of the cluster. This can be achieved by doing op.add_node_selector_constraint when working with kfp.dsl directly, but I would like to know if there is an automatic way of declaring it with Kale.

If there is not, I guess I have to manually edit either the .py file that generates the yaml of the pipeline, or edit the yaml directly, but neither of these options is optimal.

Also, if there is not, do you have any plans to add this feature in the future? Maybe as an extra option when declaring the cell metadata in JupyterLab?

Thank you
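
For reference, a hedged sketch of the manual workaround in the generated .py file, using the stock KFP DSL method (the step handle and label values are illustrative):

# Inside the auto-generated pipeline function, after a step's ContainerOp
# is created from its factory:
loaddata_task = loaddata_op()
# Pin this step to nodes carrying a specific label:
loaddata_task.add_node_selector_constraint("disktype", "ssd")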

Kale uploaded pipeline run with ModuleNotFoundError error in k8s kubeflow

I tried to use Kale to convert the notebook titanic_dataset_ml to a pipeline in my k8s Kubeflow. The pipeline uploaded successfully, but the loaddata step fails with the following error:

Traceback (most recent call last):
File "", line 59, in
File "", line 16, in loaddata
ModuleNotFoundError: No module named 'seaborn'
I guess that is because I did not enable 'Use this notebook's volumes'. I don't know why I cannot enable it in my k8s Kubeflow; is it because the PVC I used has no storageclass? Is there any way to install the missing python package 'seaborn'?

Variables referenced inside functions not passed between pipeline stages

if you have:

def predict_fn(x):
    return clf.predict(preprocessor.transform(x))

Where clf is defined earlier, and you use predict_fn in a future pipeline stage, it will fail because clf is not passed to that stage.
I need this because I need to pass a function/lambda to another package's class.

My current code has

predict_fn = lambda x: clf.predict(preprocessor.transform(x))

This will also fail for the same reason.
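
A possible workaround, assuming Kale's static analysis only tracks names referenced at the top level of a cell (an assumption about Kale internals, not confirmed behavior): mention the objects at the cell's top level and bind them as default arguments.

# Top-level references so the dependency analysis can see `clf` and
# `preprocessor` and marshal them into this step:
clf
preprocessor

# Bind them as defaults so the function body no longer relies on closures:
def predict_fn(x, clf=clf, preprocessor=preprocessor):
    return clf.predict(preprocessor.transform(x))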

Marshal Volume: mount point and permissions

Currently Kale creates a Volume as the first step of a pipeline and mounts it under /marshal in every pipeline step. This volume is used to serialize data that needs to be passed between pipeline steps.

This mount point can be problematic if the docker base image used by the pipeline steps does not run as root.

Possible solutions

1. Mount the Volume in the home directory

Pros

  • Removes any issue related to write permissions

Cons

  • Need a way to know what the home directory of the executing user in the pipeline step will be. If the pipeline steps run on the same docker image as the notebook server, this information can be retrieved when running Kale on the notebook server.
    Otherwise, if the user defines a custom image, we should also ask for the HOME path.

2. Do not use VolumeOp

By default, we could save all marshal data under ~/.kale-marshal/.

Pros

  • No need to define a Volume. Data is just written to HOME regardless of the user or docker image

Cons

  • Data would be saved in the workspace volume, increasing its size.

"!pip install " embedding can't be compiled to pipeline

When I ran the titanic example on Kubeflow deployed on a K8s cluster, I hit "module not found" errors for seaborn, pandas and sklearn during the compile-and-run phase. I solved this by adding the pip install commands to my notebook server Dockerfile; these three libraries are quite common and can reasonably be built into the image. But when I ran the Taxicab example, the "apache_beam" module wasn't installed. That module isn't common, so it's not a good choice to bake into the Dockerfile. Instead I added "!pip3 install apache_beam --user" to the notebook. Running this line in the notebook server succeeds, but when I try to compile and run the pipeline, the line can't be parsed. The error message is,
"An RPC Error has occurred
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36

Type: RPC

Method: nb.compile_notebook()

Code: 6 (UnhandledError)

Transaction ID: s514h7chi8

Message: invalid syntax (, line 53)

Details: You can find more information under /home/jovyan/kale.log"

When I view kale.log, it says,
"Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/kale/rpc/run.py", line 92, in run
result = func(request, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kale/rpc/nb.py", line 61, in compile_notebook
pipeline_graph, pipeline_parameters = instance.notebook_to_graph()
File "/usr/local/lib/python3.6/dist-packages/kale/core.py", line 231, in notebook_to_graph
ignore_symbols=set(pipeline_parameters_dict.keys()))
File "/usr/local/lib/python3.6/dist-packages/kale/static_analysis/dep_analysis.py", line 61, in variables_dependencies_detection
all_names = inspector.get_all_names(block_data['source'])
File "/usr/local/lib/python3.6/dist-packages/kale/static_analysis/inspector.py", line 193, in get_all_names
tree = ast.parse(code)
File "/usr/lib/python3.6/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "", line 2
!pip3 install apache_beam --user
^
SyntaxError: invalid syntax"

Is there any way other than adding all the install library commands to the docker file?
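
One possible workaround, given that Kale parses cell sources with Python's ast module (which cannot handle IPython's ! shell magic): perform the install in plain Python, which parses cleanly.

import subprocess
import sys

# Equivalent of `!pip3 install apache_beam --user`, but valid Python syntax
# that ast.parse can process:
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--user", "apache_beam"])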

Incompatible kfp-server-api version and python version requirements incorrect

While trying to incorporate kubeflow-kale into my notebook image I consistently get the error:
ERROR: kfp 0.3.0 has requirement kfp-server-api<0.4.0,>=0.2.5, but you'll have kfp-server-api 0.1.18.3 which is incompatible.
It seems there is a discrepancy in the setup.py requirements.

Also, the image I am using has python 3.8 installed by default, which should be supported since python_requires='>=3.6.0' is specified. However, the requirement 'ml_metadata > 0.21, < 0.22' causes an error with python 3.8, as ml_metadata is not available for python 3.8.

Titanic Example Freezes on Load Data

kubeflow titanic-ml-o56ks-44tt9-3397020455 0/1 Completed 0 23s
kubeflow titanic-ml-o56ks-44tt9-4161189240 0/2 Pending 0 20s
kubeflow titanic-ml-o56ks-44tt9-687348209 0/1 Completed 0 23s

When I run the titanic-ML example, it freezes on the loaddata stage.

I suspect the problem is that my Kale environment doesn't allow me to connect my existing storage from my notebook; the toggle is grayed out. Any ideas why?

When I try tailing the logs for the container, I get the option wait | main. I tried tailing both, and they're empty. Does anyone know what's going on?

$ kubectl logs --namespace kubeflow titanic-ml-o56ks-44tt9-4161189240
Error from server (BadRequest): a container name must be specified for pod titanic-ml-o56ks-44tt9-4161189240, choose one of: [wait main]

TF2.0 Kale in MiniKF

Hi, I've installed MiniKF by following the tutorials on kubeflow.org. When creating a new notebook server, there are arrikto images (that include jupyterlab+kale) for TF 1.15, but not for 2.0. Is there any way to use Kale for TF 2.0+? Apologies if this is the wrong place to ask, or if the answer is really simple. Thank you~

%%writefile causes error

if you have a pipeline cell

%%writefile sa.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: minio-sa
secrets:
  - name: container-secret

Compiling this with current master produces an error.

Message: Flakes reported the following error:
	kale:6:10: invalid syntax
	metadata:	
	         ^

import tensorflow causes kernel crash

When importing the tensorflow module, the following dialog appears:

An unexpected error has occurred
Browser: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0

Type: JS

Message: Error: Canceled future for execute_request message before replies were done

Details: Please see the console for more information

On the browser console I see this

Kernel: connected (a72fce90-4d70-47d3-8fe1-bf0d33aaf164) default.js:1487:17
Kernel: connected (4d5f461a-4dde-4d37-908c-54aa1b2415e3) default.js:1487:17
Kernel: restarting (a72fce90-4d70-47d3-8fe1-bf0d33aaf164) default.js:1487:17
Error: Canceled future for execute_request message before replies were done 2 future.js:142:30
Kernel: starting (a72fce90-4d70-47d3-8fe1-bf0d33aaf164)

The issue doesn't seem to be related to the browser, since the same happens when running on Chromium:

An unexpected error has occurred
Browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) snap Chromium/83.0.4103.106 Chrome/83.0.4103.106 Safari/537.36

Type: JS

Message: Error: Canceled future for execute_request message before replies were done

Details: Please see the console for more information

I'm running a notebook server with an image built from a Dockerfile based on Dockerfile.rok. The error also happens with the original starting image, gcr.io/kubeflow-images-public/tensorflow-1.14.0-notebook-cpu:v0.7.0

FROM gcr.io/kubeflow-images-public/tensorflow-2.1.0-notebook-cpu:1.0.0
USER root

# Install basic dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates bash-completion tar less \
        python-pip python-setuptools build-essential python-dev \
        python3-pip python3-wheel && \
    rm -rf /var/lib/apt/lists/*

ENV SHELL /bin/bash
COPY bashrc /etc/bash.bashrc
RUN echo "set background=dark" >> /etc/vim/vimrc.local

# Install latest KFP SDK & Kale & JupyterLab Extension
RUN pip3 install --upgrade pip && \
    pip3 install --upgrade "jupyterlab<2.0.0" && \
    pip3 install https://storage.googleapis.com/ml-pipeline/release/latest/kfp.tar.gz --upgrade && \
    pip3 install -U kubeflow-kale && \
    jupyter labextension install kubeflow-kale-labextension

RUN echo "jovyan ALL=(ALL:ALL) NOPASSWD:ALL" > /etc/sudoers.d/jovyan
WORKDIR /home/jovyan
USER jovyan

CMD ["sh", "-c", \
     "jupyter lab --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser \
      --allow-root --port=8888 --LabApp.token='' --LabApp.password='' \
      --LabApp.allow_origin='*' --LabApp.base_url=${NB_PREFIX}"]

All the other imports seem to work

JupyterLab Markdown and Raw Cell Error

When a new markdown or raw cell is added to a notebook, an error repeatedly pops up every time you try to interact with the cell:


I've tried this with 3 images: minimal-notebook-cpu, machine-learning-notebook-cpu, and geomatics-notebook-cpu, and they all had the same problem. This only seems to be a problem with newly created notebook servers: it was tested with a notebook from ~2 weeks ago and a machine-learning-notebook-cpu from 43 days ago, and no errors came up.

This is how we currently add Kale:

StatCan/aaw-kubeflow-containers@746d058

Create pipeline error in Kale deployment panel: The connection to the server 10.96.0.1:443 was refused

env:

fx@user:~/docs/helm3/harbor$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

kubeflow version 0.7.0

usage: titanic-ml-dataset.ipynb
ipynb source: https://codelabs.developers.google.com/codelabs/cloud-kubeflow-minikf-kale/index.html?index=../..index#3

error info:

time="2020-01-16T02:43:17Z" level=info msg="Creating a docker executor"
time="2020-01-16T02:43:17Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/titanic-ml-4jphd-2517916513) with template:\n{\"name\":\"kale-marshal-volume\",\"inputs\":{},\"outputs\":{\"parameters\":[{\"name\":\"kale-marshal-volume-manifest\",\"valueFrom\":{\"jsonPath\":\"{}\"}},{\"name\":\"kale-marshal-volume-name\",\"valueFrom\":{\"jsonPath\":\"{.metadata.name}\"}},{\"name\":\"kale-marshal-volume-size\",\"valueFrom\":{\"jsonPath\":\"{.status.capacity.storage}\"}}]},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: v1\\nkind: PersistentVolumeClaim\\nmetadata:\\n  name: 'titanic-ml-4jphd-kale-marshal-pvc'\\nspec:\\n  accessModes:\\n  - ReadWriteMany\\n  resources:\\n    requests:\\n      storage: 1Gi\\n\"}}"
time="2020-01-16T02:43:17Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
time="2020-01-16T02:43:17Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o json"
time="2020-01-16T02:43:17Z" level=error msg="executor error: The connection to the server 10.96.0.1:443 was refused - did you specify the right host or port?\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).ExecResource\n\t/go/src/github.com/argoproj/argo/workflow/executor/resource.go:62\ngithub.com/argoproj/argo/cmd/argoexec/commands.execResource\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:44\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewResourceCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:21\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2020-01-16T02:43:17Z" level=fatal msg="The connection to the server 10.96.0.1:443 was refused - did you specify the right host or port?\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).ExecResource\n\t/go/src/github.com/argoproj/argo/workflow/executor/resource.go:62\ngithub.com/argoproj/argo/cmd/argoexec/commands.execResource\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:44\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewResourceCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:21\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

reproduce

If you need additional information, please leave a message and I will provide it immediately. Thanks!

Kernel Error: ModuleNotFoundError: No module named 'kale.rpc'

Hi there,

after I updated my Dockerfile with the latest release:

RUN pip install kubeflow-kale
RUN jupyter labextension install kubeflow-kale-launcher

my connection to the notebook server is lost, as I get a 404: not found error.

So I tested locally by installing kale and kale extension:

jupyter labextension list        
                  
JupyterLab v1.1.4
Known labextensions:
   app dir: /Users/hong/.pyenv/versions/3.6.5/share/jupyter/lab
        kubeflow-kale-launcher v1.4.0  enabled  OK

However, I got the error below after running jupyter lab:

A Kernel Error has occurred
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:72.0) Gecko/20100101 Firefox/72.0
Type: Kernel
Method: log.setup_logging()
Message: ModuleNotFoundError: No module named 'kale.rpc'
Details: [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m<ipython-input-1-2f8b27adf4e0>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mkale\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrpc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mrun\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0m__kale_rpc_run\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0m__kale_rpc_result\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m__kale_rpc_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"log.setup_logging\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'e30='\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'e30='\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'kale.rpc'" ]

Any ideas how could this happen? Thanks!

Make `rok_gw_client` optional

Currently, Kale fails to execute if rok_gw_client is not installed. We should make sure that when this library is not present, the Rok integration is disabled.
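
A minimal sketch of the intended behavior (the flag name is illustrative):

try:
    import rok_gw_client  # noqa: F401
    _rok_available = True
except ImportError:
    # Library not installed: disable the Rok integration instead of failing.
    _rok_available = False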

Mounting existing PVC not possible.

Hi all,

First of all, thank you for this great project. I think it really adds a lot of value to the whole Kubeflow experience. While playing around a bit with Kale, I experienced an issue while trying something out:

The idea was to see pipeline outputs as files directly in JupyterLab. Therefore I mounted an RWX volume data-volume to my notebook pod at /home/jovyan/data. Then I wanted to pass the name of the volume to Kale and define the mount point of the volume as /data.

The code in my pipeline was prepared to write some output to /data/filename. As the volume is RWX I assumed that the output would be directly visible in my jupyterlab File Browser. I think this workflow would really add to the Data Science experience and the convenience of using pipelines.

However, running this pipeline I encountered an error in the first pipeline step. The step failed with the following message:

This step is in Error state with this message: Pod "rwx-test2-et0i4-tkphz-2706597555" is invalid: [spec.volumes[2].name: Invalid value: "pvolume-ca6c4cec0854efe7746ed49b8661abc2a924aef77f23218ad5c43fc8258b0dfe": must be no more than 63 characters, spec.containers[0].volumeMounts[2].name: Not found: "pvolume-ca6c4cec0854efe7746ed49b8661abc2a924aef77f23218ad5c43fc8258b0dfe", spec.containers[1].volumeMounts[0].name: Not found: "pvolume-ca6c4cec0854efe7746ed49b8661abc2a924aef77f23218ad5c43fc8258b0dfe"]

I was wondering why this happens, as the volume name pvolume-ca6c4cec0854efe7746ed49b8661abc2a924aef77f23218ad5c43fc8258b0dfe was obviously not the one I specified. I tried to track down the problem and came across the part in the pipeline.py file where you create the volume in the pipeline:

def auto_generated_pipeline(vol_data='data-volume'):
    pvolumes_dict = OrderedDict()

    annotations = {}

    volume = dsl.PipelineVolume(pvc=vol_data)
    ...

This looked fine to me, so I wondered whether something goes wrong in dsl.PipelineVolume, and indeed there seems to be a problem with how you call dsl.PipelineVolume. In https://github.com/kubeflow/pipelines/blob/df4bc2365e9bfe01e06fb12ce407130ec598d7ce/sdk/python/kfp/dsl/_pipeline_volume.py#L71 there is a check for a kwargs parameter name; if this parameter is not given, a new one with a hash is generated.

Right now I am not exactly sure whether this is a kale issue or a pipelines issue, but I guess this is not the behavior you expected, right?
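
If PipelineVolume indeed falls back to a hash-generated name when no name kwarg is given, a hedged workaround in the generated pipeline code could be to pass the name explicitly:

# Passing an explicit `name` avoids the generated hash-based name that
# exceeds the Kubernetes 63-character limit (assumption based on the
# _pipeline_volume.py check linked above):
volume = dsl.PipelineVolume(pvc=vol_data, name="data-volume")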

Automatic Volume detection and Snapshot

To streamline the deployment of a pipeline when working in MiniKF, it would be great to automate the process of snapshotting the volumes used in the current Notebook Server, retrieving the associated Rok URLs and using this information to mount the snapshots in the pipeline steps.

In this way the user does not have to worry about manually creating snapshots and adding Volume entries in the Jupyter Kale UI.

Proposed execution steps:

  1. Use kubectl to query the pods in the current namespace and retrieve the necessary volume mounts
  2. Use ROK APIs to snapshot these volumes
  3. Retrieve ROK URLs for the created snapshots
  4. Make Kale create VolumeOps using the rok annotation from step 3.

This Python code snippet retrieves the volume mounts of the currently running Notebook Server:

import os
import json
import subprocess

pod_name = os.getenv("HOSTNAME")
container_name = os.getenv("NB_PREFIX").split('/')[-1]
home = os.getenv("HOME")

def getProcessOutput(cmd):
    process = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE)
    process.wait()
    data, err = process.communicate()
    if process.returncode == 0:
        return data.decode('utf-8')
    else:
        print("Error:", err)
    return ""

pods = json.loads(getProcessOutput("kubectl get pods -o json"))

pod_spec = list(filter(lambda x: x['metadata']['name'] == pod_name, pods['items']))[0]['spec']
container_spec = list(filter(lambda x: x['name'] == container_name, pod_spec['containers']))
current_docker_image = container_spec[0]['image']
current_volume_mounts = container_spec[0]['volumeMounts']

Potential Issues

We are interested in snapshotting just the volumes mounted by the user (i.e. Workspace, Data volumes). We need a way to recognize these in the list of volumeMounts. When using MiniKF this could be easy, as the server name is a substring of the volume names of interest. But this might not be the case outside MiniKF.
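
Under the MiniKF assumption above (the server name is a substring of the interesting volume names), an illustrative filter on top of the snippet's output could be:

# `container_name` and `current_volume_mounts` come from the snippet above;
# this heuristic only holds where the server name appears in volume names.
user_mounts = [m for m in current_volume_mounts
               if container_name in m['name']]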

Support for kfp.dsl.ExitHandler

I love being able to use the Kale CLI to compile a pipeline, but I don't think Kale currently supports ExitHandler operations. As we look to ship pipelines into more stable environments, being able to alert on failure scenarios is a key concern, and we cannot find a solution for it in Kale.
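
For context, this is what the requested feature looks like in the plain KFP v1 DSL (standard kfp API, shown here only to illustrate what Kale would need to generate; images and commands are placeholders):

import kfp.dsl as dsl

@dsl.pipeline(name="with-exit-handler")
def pipeline():
    # This op runs on pipeline exit, whether the run succeeded or failed,
    # so it is a natural place to hook up alerting:
    alert_op = dsl.ContainerOp(
        name="alert", image="alpine",
        command=["sh", "-c", "echo pipeline finished"])
    with dsl.ExitHandler(alert_op):
        dsl.ContainerOp(
            name="train", image="alpine",
            command=["sh", "-c", "echo training"])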

Azure KeyVault Support

Hey,

We have started using Kale recently, and we are very happy with it.

We are mounting all of our secrets (and some general configuration) via Azure KeyVault (AKV) in order to keep everything secure.

Currently, we have a PodDefault defined for AKV which works perfectly.

We have managed to manually integrate AKV with Kale by adding the following code to the .py file:

def attachAKV(phase):
    return phase.add_volume(k8s_client.V1Volume(name='akv', flex_volume=k8s_client.V1FlexVolumeSource(
        driver="azure/kv",
        secret_ref=k8s_client.V1LocalObjectReference(name="keyvaultcreds"),
        options={
            "keyvaultname": "keyvault-name-in-azure",
            "keyvaultobjectnames": "kubeflow-config;trains-config",
            "keyvaultobjectaliases": "secrets.json;trains.conf",
            "keyvaultobjecttypes": "secret;secret",
            "tenantid": "<tenantId>"
        })
    )).add_volume_mount(k8s_client.V1VolumeMount(
        mount_path="/secrets",
        name="akv",
        read_only=True
    ))

Then we just call attachAKV(THE_TASK) once for each of the pipeline's tasks.

We are thinking to make a PR that will add an official support, so that we won't need to manually add this to the .py file.

One idea was to have an external plugin that allows users to define their AKV settings (you can have different pipelines mounting different AKV settings), and then maybe add an option to the Kale UI to choose AKV settings from a list. Would that make sense?

Failed to enable Kale in tutorial

During the Kubecon tutorial, clicking enable in the Kale tab produced a blank side pane instead of enabling Kale.

Below are the logs from Chrome:

Kubeflow metadata:
LeftPanelWidget.js:318 {docker_image: "gcr.io/arrikto-public/tensorflow-1.14.0-notebook-cpu:kubecon-workshop", experiment: {…}, experiment_name: "Titanic", pipeline_description: "Predict which passengers survived the Titanic shipwreck", pipeline_name: "titanic-ml", …}
NotebookUtils.js:272 Executing command: from kale.rpc.run import run as __kale_rpc_run
__kale_rpc_result = __kale_rpc_run("nb.explore_notebook", 'eyJzb3VyY2Vfbm90ZWJvb2tfcGF0aCI6InRpdGFuaWNfZGF0YXNldF9tbC5pcHluYiJ9')
NotebookUtils.js:272 Executing command: from kale.rpc.run import run as __kale_rpc_run
__kale_rpc_result = __kale_rpc_run("kfp.list_experiments", 'e30=')
NotebookUtils.js:272 Executing command: from kale.rpc.run import run as __kale_rpc_run
__kale_rpc_result = __kale_rpc_run("nb.list_volumes", 'e30=')
NotebookUtils.js:272 Executing command: from kale.rpc.run import run as __kale_rpc_run
__kale_rpc_result = __kale_rpc_run("nb.get_base_image", 'e30=')
react-dom.production.min.js:198 TypeError: Cannot read property 'style' of null
    at m (InlineMetadata.js:67)
    at InlineMetadata.js:15
    at Oo (react-dom.production.min.js:199)
    at ss (react-dom.production.min.js:218)
    at ls (react-dom.production.min.js:218)
    at fs (react-dom.production.min.js:233)
    at Zs (react-dom.production.min.js:249)
    at Js (react-dom.production.min.js:248)
    at Us (react-dom.production.min.js:245)
    at gs (react-dom.production.min.js:243)
To @ react-dom.production.min.js:198
Ho.n.callback @ react-dom.production.min.js:210
wo @ react-dom.production.min.js:193
_o @ react-dom.production.min.js:193
os @ react-dom.production.min.js:217
cs @ react-dom.production.min.js:220
(anonymous) @ react-dom.production.min.js:250
t.unstable_runWithPriority @ scheduler.production.min.js:18
Qs @ react-dom.production.min.js:250
Zs @ react-dom.production.min.js:249
Js @ react-dom.production.min.js:248
Us @ react-dom.production.min.js:245
gs @ react-dom.production.min.js:243
enqueueSetState @ react-dom.production.min.js:130
x.setState @ react.production.min.js:13
addMetadataInfo @ InlineCellMetadata.js:96
handleChange @ InlineCellMetadata.js:157
onChange @ InlineCellMetadata.js:167
t.T @ react-switch.min.js:1
t.S @ react-switch.min.js:1
t.a @ react-switch.min.js:1
InlineMetadata.js:67 Uncaught TypeError: Cannot read property 'style' of null
    at m (InlineMetadata.js:67)
    at InlineMetadata.js:15
    at Oo (react-dom.production.min.js:199)
    at ss (react-dom.production.min.js:218)
    at ls (react-dom.production.min.js:218)
    at fs (react-dom.production.min.js:233)
    at Zs (react-dom.production.min.js:249)
    at Js (react-dom.production.min.js:248)
    at Us (react-dom.production.min.js:245)
    at gs (react-dom.production.min.js:243)
m @ InlineMetadata.js:67
(anonymous) @ InlineMetadata.js:15
Oo @ react-dom.production.min.js:199
ss @ react-dom.production.min.js:218
ls @ react-dom.production.min.js:218
fs @ react-dom.production.min.js:233
Zs @ react-dom.production.min.js:249
Js @ react-dom.production.min.js:248
Us @ react-dom.production.min.js:245
gs @ react-dom.production.min.js:243
enqueueSetState @ react-dom.production.min.js:130
x.setState @ react.production.min.js:13
addMetadataInfo @ InlineCellMetadata.js:96
handleChange @ InlineCellMetadata.js:157
onChange @ InlineCellMetadata.js:167
t.T @ react-switch.min.js:1
t.S @ react-switch.min.js:1
t.a @ react-switch.min.js:1

Jupyterlab 2.0.1 support?

Hi,

Just wanted to know: are there any plans for kubeflow-kale-launcher for JupyterLab version 2.0.1?

Got error when running basic example

Compiled and submitted the basic numpy example (base_example_numpy).
Got this error while running the pipeline:
"This step is in Pending state with this message: Unschedulable: persistentvolumeclaim "pipelines-persistence-volume-claim" not found"

Instructions for setting up Kale on Kubeflow

I watched the session from Kubecon and want to use Kale for auto-generation of KF pipelines.

This is my setup currently.

  • Azure Kubernetes Cluster (AKS) with two nodes
  • Kubeflow 0.7.0 installed on top of AKS
  • Created Notebook server on Kubeflow

I have a couple of follow-up questions:
A) Will it be possible to use Kale under this setup? What additional steps should I follow?
B) Do I need to get JupyterLab instead of the KF Notebook? If yes, how do I connect that to Kubeflow to run the pipelines?

PVs stuck in pending state

Hi, first of all, congrats on this great initiative. Loving it so far.
I'm experiencing some problems after running the automatically generated pipelines: the pods seem to be unschedulable.


I checked the GCP console and found the PVs in Pending state.


Hope you can help
Thanks
Luis

Volumes & VolumeSnapshots: Rok Annotations & Rok Chooser

Volumes

Add a special (optional) annotation field for the Rok URL, adding the following annotation to the VolumeOp:

rok/origin: <RokURL>

VolumeSnapshots

Add a special (optional) annotation field for the Rok URL, adding the following annotation to the VolumeSnapshotOp:

rok/register: <RokURL>

Rok Chooser

Integrate Rok Chooser on both of these fields.

Run Kale on separate Kubernetes cluster from Kubeflow

My team has an existing Jupyterhub cluster we use for our org-wide notebooks. We also have a separate Kubeflow instance. Is it possible to run Kale in our Jupyterhub images, but have them send their jobs over to the Kubeflow cluster, even if they're not running within it?

cannot import name 'mlmd_utils' from 'kale.utils'

All,

I just noticed: when I generate a pipeline on the latest Kale 0.5.0 using the following command:

kale --nb xxx.ipynb --kfp_host 'hostname/pipeline' --pipeline_name --upload_pipeline --run_pipeline --experiment_name

the experiment failed with the following error:

Traceback (most recent call last):
File "", line 188, in
File "", line 2, in data_import
ImportError: cannot import name 'mlmd_utils' from 'kale.utils' (/opt/conda/lib/python3.7/site-packages/kale/utils/init.py)

If I switch back to kubeflow-kale==0.4.0, it works OK.

Is this a known / reported issue?

Getting Started fails: TypeError: code() takes at least 14 arguments (13 given)

Hi,
I'm trying to run an example from the Getting Started section of the README, and it seems there's a bug in the code:

$ pip install kubeflow-kale
...
$ wget https://raw.githubusercontent.com/kubeflow-kale/examples/master/titanic-ml-dataset/titanic_dataset_ml.ipynb
...

$ kale --nb titanic_dataset_ml.ipynb
03-03 22:35 | kubeflow-kale |  INFO: Pipeline code saved at titanic-ml-dq7ht.kale.py
Traceback (most recent call last):
  File "/home/ay/venv/bin/kale", line 10, in <module>
    sys.exit(main())
  File "/home/ay/venv/lib/python3.8/site-packages/kale/command_line.py", line 65, in main
    pipeline_package_path = kfp_utils.compile_pipeline(script_path, kale.pipeline_metadata['pipeline_name'])
  File "/home/ay/venv/lib/python3.8/site-packages/kale/utils/kfp_utils.py", line 52, in compile_pipeline
    spec.loader.exec_module(foo)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/tmp/tmpbaqy973a/pipeline_code.py", line 903, in <module>
    loaddata_op = comp.func_to_container_op(loaddata)
  File "/home/ay/venv/lib/python3.8/site-packages/kfp/components/_python_op.py", line 694, in func_to_container_op
    return _create_task_factory_from_component_spec(component_spec)
  File "/home/ay/venv/lib/python3.8/site-packages/kfp/components/_components.py", line 276, in _create_task_factory_from_component_spec
    task_factory = _dynamic.create_function_from_parameters(
  File "/home/ay/venv/lib/python3.8/site-packages/kfp/components/_dynamic.py", line 47, in create_function_from_parameters
    modified_code = types.CodeType(
TypeError: code() takes at least 14 arguments (13 given)

$ python --version
Python 3.8.1

$ pip freeze | grep kale
kubeflow-kale==0.4.0

Pipeline steps base image

Currently Kale uses the KFP default base image when executing pipeline steps.

The alternative could be to use the NotebookServer image.

A user running a notebook and testing some code is working in the environment defined by the Notebook Server image. If we snapshot and mount the workspace to preserve the user environment, then it is reasonable to run the steps on the same image where the user develops the notebook.

It is not enough to use the default Kale Notebook Server image, as users might extend it with custom images that add extra packages.

The approach explained in #3 could be used to retrieve the image of the current container when Kale is executed in the NotebookServer.

Resource conversion Error while executing Kale Pipeline

Hey all, hope all is well~~
So after using the Kale Deployment Panel to "Compile and Run" a Jupyter notebook, I get this error while the pipeline is executing.


Saving general object: input_fn
Traceback (most recent call last):
  File "<string>", line 886, in <module>
  File "<string>", line 870, in build_model
  File "/usr/local/lib/python3.6/dist-packages/kale/marshal/dispatchers.py", line 63, in __call__
    return self.dispatch(s)(s, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kale/marshal/resource_save.py", line 17, in resource_all
    dill.dump(o, f)
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 259, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 445, in dump
    StockPickler.dump(self, obj)
  File "/usr/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 1413, in save_function
    obj.__dict__, fkwdefaults), obj=obj)
  File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 1147, in save_cell
    pickler.save_reduce(_create_cell, (f,), obj=obj)
  File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 912, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 912, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
  File "/home/jovyan/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 884, in __reduce__
    return convert_to_tensor, (self._numpy(),)
**ValueError: Cannot convert a Tensor of dtype resource to a NumPy array.**

I believe the first line number is referencing pipeline.yaml?

 883             if "epoch_cnt" in locals(): 
 884                 _kale_resource_save(epoch_cnt, os.path.join( 
 885                     _kale_data_directory, "epoch_cnt")) 
 886             else: 
 887                 print("_kale_resource_save: `epoch_cnt` not found.")

Notes

  • The code does execute properly in jupyter notebook
  • Possible causes (grasping at straws):
    • tf.lookup.StaticVocabularyTable(tf.lookup.KeyValueTensorInitializer(
  • ran "pip3 install --user tensorflow=2.0.0rc0" in the notebook terminal prior to compiling

Been stuck on this forever. Any help would be greatly appreciated. Thanks~~

Updates
Found the cause

vocab_table = tf.lookup.StaticVocabularyTable(
      tf.lookup.KeyValueTensorInitializer(
          [""] + vocab, range(0,vocab_rows_cnt + 1), key_dtype=tf.string, value_dtype=tf.int64), num_oov_buckets)

... later, this line that calls .lookup() is what causes the error
_items = vocab_table.lookup(_items)

Still digging around more to figure out a solution~~

Support for leveraging reusable components

Following this guide allows a user of the KFP SDK to use the kfp.components.load_component_from_url API to pull in reusable components as part of a pipeline. Maybe I've missed this feature in Kale, but it'd be awesome to handle it, perhaps via notebook cell annotations?
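
For reference, the KFP SDK feature being requested (the URL is a placeholder for any published component.yaml):

import kfp.components as comp

# Load a reusable component from a published component.yaml:
my_op = comp.load_component_from_url(
    "https://example.com/path/to/component.yaml")
# `my_op` can then be invoked inside a pipeline like any other task factory.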

Delete marshal PVC after run

When Kale creates a pipeline, it includes a step that creates a marshal PVC for marshalling data between pipeline steps. However, this PVC doesn't seem to be deleted after the run, which leaves a lot of stale volumes behind. Would it be an idea to add a step that removes the marshal PVC as the final step of the pipeline?
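
One possible shape of such a step, assuming the kfp v1 VolumeOp/ResourceOp API (the op names and the last_step handle are illustrative):

marshal_vop = dsl.VolumeOp(
    name="kale-marshal-volume",
    resource_name="kale-marshal-pvc",
    size="1Gi",
    modes=dsl.VOLUME_MODE_RWM)
# ... pipeline steps mount marshal_vop.volume ...
# As the final step, emit a ResourceOp that deletes the PVC:
delete_op = marshal_vop.delete()
delete_op.after(last_step)  # hypothetical handle to the last pipeline step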

Add support for Pipeline parameters

Add support for defining pipeline parameters from a notebook cell. This could be useful when the notebook code depends on global variables.

Current Behavior

When the user needs to change one (or multiple) variables (e.g. training steps, batch size, layer size, ...) that produce different execution behavior, they need to run Kale over the notebook every time.

Desired Behavior

The user can define a number of variables as pipeline parameters (with default values), so that they can upload the pipeline (instead of running it) and then create runs from the KFP UI, applying the desired parameter values.

Implementation

Add a global cell tag called parameters, similar to imports and functions.

When detecting a cell with the parameters tag, Kale will go through the following steps:

  • Verify that the cell code consists only of variable definitions, and that every variable has a default value (using the static analysis module)
  • Merge the code with other previously found parameters cells.
  • Convert these variable assignments into a dictionary of <var_name>:<default_value>.
  • Pass dictionary to pipeline template renderer
  • Render the pipeline function adding the pipeline arguments
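
As a sketch of the intended usage (the cell contents and rendered signature below are illustrative, not actual generated code):

# A cell tagged `parameters`: only plain assignments with literal defaults.
TRAINING_STEPS = 1000
BATCH_SIZE = 64

# Kale would collect {"TRAINING_STEPS": 1000, "BATCH_SIZE": 64} and render
# the pipeline function with matching arguments, overridable from the KFP UI:
def auto_generated_pipeline(TRAINING_STEPS=1000, BATCH_SIZE=64):
    ...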

Remove Kale marshal volume whenever possible

We want to remove Kale marshal volume and use workspace volume whenever possible.

Kale tries to use a marshal directory next to the notebook file, say <nb_name>.marshal.
Kale then checks whether that path is a subpath of some volume's mount point [1].

  • If it is, use that as marshal path
  • If it is not, fallback to /marshal and create a new volume

[1]:

https://stackoverflow.com/questions/3812849/how-to-check-whether-a-directory-is-a-sub-directory-of-another-directory
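
A minimal sketch of the subpath check, following the approach in the linked question (nb_path and volume_mount_points are hypothetical names for values Kale would already know):

import os

def is_subpath(path, parent):
    # Resolve symlinks, then compare the paths component-wise.
    path = os.path.realpath(path)
    parent = os.path.realpath(parent)
    return os.path.commonpath([path, parent]) == parent

marshal_dir = os.path.splitext(nb_path)[0] + ".marshal"  # <nb_name>.marshal
if not any(is_subpath(marshal_dir, mp) for mp in volume_mount_points):
    marshal_dir = "/marshal"  # fallback: create a new marshal volume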

Deploying models for inference?

First off, thank you for your work on this project! I was wondering if there is a roadmap for the kale project? I'm specifically interested in how it might be possible to deploy models for inference using kale and Kubeflow. Are there any active development discussion groups that are open to the public?

ModuleNotFoundError: No module named 'kale'

Is it mandatory to install Kale in the images which we tag to the cells of the ipynb?
I see the code below embedded into the functions which are mapped with comp.func_to_container_op:

       from kale.utils import pod_utils
       from kale.marshal import resource_save as _kale_resource_save
       from kale.marshal import resource_load as _kale_resource_load 

Compilation Error

I was trying to create a pipeline using Kale, but encountered this error, and I don't understand what the problem is.
I have successfully run the Titanic example.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/kale/rpc/run.py", line 92, in run
    result = func(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kale/rpc/nb.py", line 61, in compile_notebook
    pipeline_graph, pipeline_parameters = instance.notebook_to_graph()
  File "/usr/local/lib/python3.6/dist-packages/kale/core.py", line 225, in notebook_to_graph
    pipeline_graph, pipeline_parameters_code_block = parser.parse_notebook(self.notebook)
  File "/usr/local/lib/python3.6/dist-packages/kale/nbparser/parser.py", line 240, in parse_notebook
    assert current_block is not None
AssertionError

Can't input an experiment name or generate an experiment name

Hi there,
I created a Kubernetes cluster, installed Kubeflow and Kale, manually did some PV mounting work, and got the whole Kale process working.
Now I need to test Kale in a Jupyter Hub + Enterprise Gateway environment. I built a new Jupyter Hub image with Kale installed:

RUN pip3 install https://storage.googleapis.com/ml-pipeline/release/latest/kfp.tar.gz --upgrade
RUN pip3 install -Iv kubeflow-kale==0.4.0

But I can't input the experiment name. The .py file is generated. The error message is:
An RPC Error has occurred
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36

Type: RPC

Method: kfp.run_pipeline()

Code: 6 (UnhandledError)

Transaction ID: x0bqopwxvn

Message: (400) Reason: Bad Request HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Trailer': 'Grpc-Trailer-Content-Type', 'Date': 'Wed, 01 Jul 2020 19:27:39 GMT', 'Transfer-Encoding': 'chunked'}) HTTP response body: {"error":"Validate experiment request failed.: Invalid input error: Experiment name is empty. Please specify a valid experiment name.","message":"Validate experiment request failed.: Invalid input error: Experiment name is empty. Please specify a valid experiment name.","code":3,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Experiment name is empty. Please specify a valid experiment name.","error_details":"Validate experiment request failed.: Invalid input error: Experiment name is empty. Please specify a valid experiment name."}]}

Details: You can find more information under /home/jovyan/kale.log

(Screenshot of the Kale panel attached.)

Kale mlmd_utils fails to access the metadata on the Kubeflow ml-pipeline

Hello,

Thank you for upgrading Kale (v0.5.0)!!
I was surprised that Kale is able to work with Katib.

I would like to ask for your help.
I'm trying to compile/run the official base-candies example, but I have the issue that Kale's mlmd_utils fails to access the metadata on the Kubeflow ml-pipeline.
(I have Kale in a notebook server on Kubeflow and am not using MiniKF/Rok. I tried mounting a PVC that contains the data using the Volumes form of the Kale panel.)


If I switch back to kubeflow-kale==0.4.0 & kubeflow-kale-launcher, it works.
Is this a known issue? What happened?

Best regards.

Cell Metadata and Experiments not showing in Kale extension

I'm installing Kale in jupyter/tensorflow-notebook. The cell tags are not showing in the Kale UI. Kale's behaviour is quite inconsistent: sometimes kale commands work just fine, and other times they fail with the traceback below, perhaps leading to experiments not being shown in the dropdown:

Traceback (most recent call last):
  File "/usr/local/bin/kale", line 7, in <module>
    from kale.command_line import main
  File "/usr/local/lib/python3.6/dist-packages/kale/command_line.py", line 2, in <module>
    import nbformat as nb
  File "/usr/local/lib/python3.6/dist-packages/nbformat/__init__.py", line 9, in <module>
    from ipython_genutils import py3compat
ModuleNotFoundError: No module named 'ipython_genutils'

pip list shows ipython-genutils 0.2.0 in its output. Note the hyphen vs. underscore difference.
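
On the hyphen/underscore point: this is just pip's name normalization (the distribution is ipython-genutils, the import name is ipython_genutils; they are the same package). A quick check of whether the interpreter running the kale CLI can actually see it:

```python
import importlib.util

# None here means the module is not importable on *this* interpreter,
# even if `pip list` shows ipython-genutils for a different one.
spec = importlib.util.find_spec("ipython_genutils")
print("importable" if spec else "not importable on this interpreter")
```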

I'm trying to set up Kale in MicroK8s locally. The Jupyter versions in the notebook image are:
jupyter core : 4.6.1
jupyter-notebook : 6.0.0
qtconsole : not installed
ipython : 7.10.1
ipykernel : 5.1.3
jupyter client : 5.3.3
jupyter lab : 1.2.1
nbconvert : 5.6.1
ipywidgets : 7.5.1
nbformat : 4.4.0
traitlets : 4.3.3

(Screenshot attached.)

Failed to define a PVC size using kale deployment panel

Hi there,

first of all thanks for developing such a great and useful tool!

I installed Kale in my Kubeflow notebook server based on GKE (with snapshot created) and cloned the Titanic example to give it a try.

The pipeline compiles and uploads successfully; however, the loaddata component cannot complete, and shows this warning:
This step is in Pending state with this message: Unschedulable: pod has unbound immediate PersistentVolumeClaims (repeated 2 times).

(Screenshot of the pending step attached.)

And here's the log file

11-20 17:50 | kubeflow-kale |  DEBUG: ------------- Kale Start Run -------------
11-20 17:50 | kubeflow-kale |  INFO: Pipeline code saved at kfp_titanic-ml-pipeline.kfp.py
11-20 17:50 | kubeflow-kale |  INFO: Deployment Successful. Pipeline run at None/#/runs/details/52a84d9e-b1bc-41cf-af40-cc87785c5b7f

It would be much appreciated if someone could kindly point out whether I configured the volumes correctly. Thanks!

Titanic Example: loaddata fails

Hello,

I'm trying to compile/execute the official titanic-ml example, but I have the issue that the Python packages I import under the tag "imports" cannot be found when I run the pipeline. For example, in the loaddata step, I get

ModuleNotFoundError: No module named 'seaborn'

So does that mean that Kale does not take care of making sure that all imported packages are actually installed, and that I have to do it myself, by either

  • using subprocess inside my code (as sketched below), or
  • referring to a Docker image that has the package preinstalled? In that case, what if my Docker image is in a private repository: do I then need to manually edit the generated .kale.py script and add image pull secrets?
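
To make the first option concrete, a minimal sketch (assuming the step's container has pip available and network access) of installing a missing package at step run time, since pipeline pods do not inherit the notebook's environment:

```python
import subprocess
import sys

# Install the package into the step's container before importing it.
subprocess.run([sys.executable, "-m", "pip", "install", "seaborn"], check=True)

import seaborn as sns  # now resolvable inside the step
```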

(I also noticed that in the Kale panel, the option "Use this notebook's volumes" is disabled and I cannot enable it in the UI. Could these issues be related to one another?)

I would be very grateful for some advice! I'm using Kubeflow 0.7, and my notebook server is based on the image gcr.io/arrikto-public/tensorflow-1.14.0-notebook-cpu:kubecon-workshop. The kubeflow-kale version is 0.3.4. If you need any additional info, I'd be happy to provide it.

Help setting up volumes to access data

I have Kale in a custom notebook image on Kubeflow and am not using MiniKF/Rok.
There is no documentation, so I'm a little stuck on how to set things up so that a pipeline run can access my data.
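
In case it helps, a minimal sketch of the plain-KFP v1 way to give a pipeline step access to data (this is not Kale's UI flow; the names and size below are made up): create a PVC with dsl.VolumeOp and mount it into a step via pvolumes.

```python
import kfp.dsl as dsl

@dsl.pipeline(name="data-access-demo")
def data_pipeline():
    # Create a PVC that the steps can share.
    vop = dsl.VolumeOp(
        name="create-data-volume",
        resource_name="data-pvc",
        size="1Gi",
        modes=dsl.VOLUME_MODE_RWO,
    )
    # Mount it at /data inside the step's container.
    dsl.ContainerOp(
        name="consume-data",
        image="python:3.7",
        command=["ls", "/data"],
        pvolumes={"/data": vop.volume},
    )
```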

Development Docker Image Bugs: Bashrc error and Incorrect Pipeline Step Base Image

Hi, I would like to run the latest version of the labextension (with the metrics support), but it is not yet available in the NPM registry that stores the extension. I also understand that I need the latest version of the Kale backend in order to run everything with the metrics feature.

To try this out before the latest version is released and updated in pip/NPM, I simply built a dev image from Dockerfile.dev, which includes the latest development versions of the Kale backend and labextension (now in this repo). But there were some issues with the resulting image:

  • The /etc/bash.bashrc file seems to have some syntax errors that cause issues when opening a terminal in Jupyterlab. Specifically:
bash: /etc/bash.bashrc: line 40: syntax error near unexpected token `elif'
bash: /etc/bash.bashrc: line 40: `  elif [ -f /etc/bash_completion ]; then'
  • The Kale extension seems to assume a default Kubeflow image (not the custom-built Kale dev image) in the compiled YAML file. There seems to be a closed issue for this; perhaps I am missing something?

Thanks for this great project. I've tried everything with the gcr.io/arrikto-public/tensorflow-1.14.0-notebook-cpu:kubecon-workshop image and I am beyond impressed with how easy Kale makes it to build KF pipelines.

Configurable storageClass name

It would be great if you could configure, from the UI, the storageClass name used for the marshal volume.
At present it uses the cluster default, e.g. "standard", which causes issues if you have set up a new storageClass, say for NFS, in your cluster.
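
At the KFP v1 level this is already expressible, so the ask is essentially to surface it in Kale's UI. A minimal sketch (the class name "nfs-client" is made up):

```python
import kfp.dsl as dsl

# VolumeOp accepts a storage_class argument instead of the cluster default.
vop = dsl.VolumeOp(
    name="marshal-volume",
    resource_name="kale-marshal-pvc",
    size="1Gi",
    storage_class="nfs-client",  # hypothetical NFS storage class
    modes=dsl.VOLUME_MODE_RWO,
)
```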

Support data sharing between notebook server and Kubeflow pipeline in k8s

Problem

In the current version, the data of the notebook server can't be passed to the Kubeflow pipeline directly, because the notebook server lives in the user-profile namespace while the pipeline run is in the kubeflow namespace. Currently only MiniKF, which has Rok, can handle the data passing; in a plain Kubernetes cluster we don't have Rok. I solved this problem by connecting two PVCs in the two namespaces manually, or by using a VolumeSnapshot, but both methods need Kubernetes administrator rights.

Proposed change

Integrate the Rok-style data-sharing mechanism into Kale so that it can be used on plain Kubernetes.

Alternative options

NA

Who would use this feature?

Anyone who wants to use Kale in a normal Kubernetes cluster.

Suggest a solution

Support this together with multi-user Kubeflow Pipelines. Each user defined in a user profile has a corresponding namespace; run the notebook server and the pipeline both in that user namespace, rather than one in the user namespace and one in kubeflow. Then data can be shared between the notebook and the Kubeflow pipeline.

Cannot access a Google Cloud bucket from the pipeline

Hi there,

I got a permission error when trying to write results to a GCS bucket from within a pipeline function using Kale:

Traceback (most recent call last):
  File "<string>", line 93, in <module>
  File "<string>", line 83, in generate_data
  File "<string>", line 74, in generate_data
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/storage/client.py", line 305, in get_bucket
    bucket.reload(client=self)
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 140, in reload
    _target_object=self,
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/_http.py", line 421, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/bucket-data?projection=noAcl: Primary: /namespaces/kube.svc.id.goog with additional claims does not have storage.buckets.get access to bucket-data.

This feels like a missing gcp-secret. I'm using Kubeflow 0.7 on GCP, and Workload Identity works fine in my notebook pod; however, the pipeline components do not run in that pod.
Does Kale support the kfp.gcp module now? Or is there a way to set the private key as a pipeline parameter?
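
For reference, a minimal sketch of the plain-KFP v1 workaround (assuming the standard user-gcp-sa secret that GCP Kubeflow deployments create exists in the cluster): attach the service-account secret to a step so its GCS calls are authorized.

```python
import kfp.dsl as dsl
from kfp.gcp import use_gcp_secret

@dsl.pipeline(name="gcs-access-demo")
def gcs_pipeline():
    op = dsl.ContainerOp(
        name="write-results",
        image="google/cloud-sdk:slim",
        command=["gsutil", "ls", "gs://bucket-data"],
    )
    # Mounts the key file and sets GOOGLE_APPLICATION_CREDENTIALS.
    op.apply(use_gcp_secret("user-gcp-sa"))
```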

Thanks in advance!
