
functions' Introduction


Using MLRun

MLRun is an open MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications, significantly reducing engineering efforts, time to production, and computation resources. With MLRun, you can choose any IDE on your local machine or on the cloud. MLRun breaks the silos between data, ML, software, and DevOps/MLOps teams, enabling collaboration and fast continuous improvements.

Get started with MLRun Tutorials and Examples, Installation and setup guide, or read about MLRun Architecture.

This page explains how MLRun addresses the MLOps Tasks and the MLRun core components.

MLOps tasks

(Diagram: MLOps tasks)


The MLOps development workflow section describes the different tasks and stages in detail. MLRun can be used to automate and orchestrate all the different tasks or just specific tasks (and integrate them with what you have already deployed).

Project management and CI/CD automation

In MLRun the assets, metadata, and services (data, functions, jobs, artifacts, models, secrets, etc.) are organized into projects. Projects can be imported/exported as a whole, mapped to git repositories or IDE projects (in PyCharm, VSCode, etc.), which enables versioning, collaboration, and CI/CD. Project access can be restricted to a set of users and roles.

See: Docs: Projects and Automation, CI/CD Integration, Tutorials: Quick start, Automated ML Pipeline, Video: Quick start.
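
A minimal sketch of this flow with the MLRun SDK (project, file, and workflow names here are illustrative, and the exact API may vary by MLRun version):

import mlrun

# Create (or load) a project mapped to a local context directory; the same
# directory can be a git repo, which gives versioning and CI/CD integration.
project = mlrun.get_or_create_project(name="my-project", context="./")

# Register a function and a multi-step workflow with the project, then save
# the project spec so it can be committed to git and shared.
project.set_function("trainer.py", name="trainer", kind="job", image="mlrun/mlrun")
project.set_workflow("main", "workflow.py")
project.save()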

Ingest and process data

MLRun provides abstract interfaces to various offline and online data sources, supports batch or realtime data processing at scale, data lineage and versioning, structured and unstructured data, and more. In addition, the MLRun Feature Store automates the collection, transformation, storage, catalog, serving, and monitoring of data features across the ML lifecycle and enables feature reuse and sharing.

See: Docs: Ingest and process data, Feature Store, Data & Artifacts; Tutorials: Quick start, Feature Store.
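
A minimal Feature Store sketch (entity and feature names are illustrative; on some MLRun versions ingestion is quotes_set.ingest(df) rather than fstore.ingest):

import pandas as pd
import mlrun.feature_store as fstore

# Define a feature set keyed on a "ticker" entity.
quotes_set = fstore.FeatureSet("stock-quotes", entities=[fstore.Entity("ticker")])

# Ingest a batch dataframe; MLRun infers the schema, computes statistics,
# and writes the features to the configured offline/online targets.
df = pd.DataFrame({"ticker": ["AAPL", "MSFT"], "price": [190.1, 410.5]})
fstore.ingest(quotes_set, df)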

Develop and train models

MLRun allows you to easily build ML pipelines that take data from various sources or the Feature Store and process it, train models at scale with multiple parameters, test models, track each experiment, and register, version, and deploy models. MLRun provides scalable built-in or custom model training services that integrate with any framework and can work with 3rd-party training/AutoML services. You can also bring your own pre-trained model and use it in the pipeline.

See: Docs: Develop and train models, Model Training and Tracking, Batch Runs and Workflows; Tutorials: Train, compare, and register models, Automated ML Pipeline; Video: Train and compare models.
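
For example, a hedged sketch of a tracked hyperparameter run (file, handler, dataset URL, and metric names are illustrative):

import mlrun

# Wrap local training code as an MLRun job.
trainer = mlrun.code_to_function(
    "trainer", filename="trainer.py", kind="job", image="mlrun/mlrun", handler="train"
)

# Run a hyperparameter grid: every combination is tracked as a child run,
# and the best run (per the selector) is marked automatically.
run = trainer.run(
    inputs={"dataset": "store://artifacts/my-project/cleaned-data"},
    hyperparams={"n_estimators": [10, 100], "max_depth": [3, 6]},
    selector="max.accuracy",
)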

Deploy models and applications

MLRun rapidly deploys and manages production-grade real-time or batch application pipelines using elastic and resilient serverless functions. MLRun addresses the entire ML application: intercepting application/user requests, running data processing tasks, inferencing using one or more models, driving actions, and integrating with the application logic.

See: Docs: Deploy models and applications, Realtime Pipelines, Batch Inference, Tutorials: Realtime Serving, Batch Inference, Advanced Pipeline; Video: Serving pre-trained models.
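
A sketch of deploying a real-time model endpoint (file, class, and model names are illustrative):

import mlrun

# Build a real-time serving function from model-server code.
serving_fn = mlrun.code_to_function(
    "serving", filename="serving.py", kind="serving", image="mlrun/mlrun"
)
serving_fn.add_model("my-model", model_path="store://models/my-project/my-model",
                     class_name="ClassifierModel")

# Deploy as an auto-scaling serverless (Nuclio) endpoint and send a test request.
serving_fn.deploy()
serving_fn.invoke("/v2/models/my-model/infer", body={"inputs": [[5.1, 3.5, 1.4, 0.2]]})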

Monitor and alert

Observability is built into the different MLRun objects (data, functions, jobs, models, pipelines, etc.), eliminating the need for complex integrations and code instrumentation. With MLRun, you can observe the application/model resource usage and model behavior (drift, performance, etc.), define custom app metrics, and trigger alerts or retraining jobs.

See: Docs: Monitor and alert, Model Monitoring Overview, Tutorials: Model Monitoring & Drift Detection.
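
A sketch of enabling monitoring on a serving function (assuming the set_tracking() flow; names and paths are illustrative):

import mlrun

serving_fn = mlrun.code_to_function(
    "serving", filename="serving.py", kind="serving", image="mlrun/mlrun"
)
serving_fn.add_model("my-model", model_path="store://models/my-project/my-model")

# Turn on model monitoring: served requests/responses are sampled, logged,
# and analyzed for drift against the model's training-set statistics.
serving_fn.set_tracking()
serving_fn.deploy()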

MLRun core components

(Diagram: MLRun core components)


MLRun includes the following major components:

Project Management: A service (API, SDK, DB, UI) that manages the different project assets (data, functions, jobs, workflows, secrets, etc.) and provides a central control and metadata layer.

Functions: Automatically deployed software packages with one or more methods and runtime-specific attributes (such as image, libraries, command, arguments, resources, etc.).

Data & Artifacts: Glueless connectivity to various data sources, metadata management, catalog, and versioning for structured/unstructured artifacts.

Feature Store: Automatically collects, prepares, catalogs, and serves production data features for development (offline) and real-time (online) deployment using minimal engineering effort.

Batch Runs & Workflows: Execute one or more functions with specific parameters and collect, track, and compare all their results and artifacts.

Real-Time Serving Pipeline: Rapid deployment of scalable data and ML pipelines using real-time serverless technology, including API handling, data preparation/enrichment, model serving, ensembles, driving and measuring actions, etc.

Real-Time Monitoring: Monitors data, models, resources, and production components and provides a feedback loop for exploring production data, identifying drift, alerting on anomalies or data quality issues, triggering retraining jobs, measuring business impact, etc.

functions' People

Contributors

alxzed, assaf758, aviaiguazio, daniels290813, davesh0812, eliyahu77, eyal-danieli, gilad-shaham, giladshapira94, guy1992l, hedingber, idan707, jond01, katyakats, marcelonyc, michaelliv, nirse, nschenone, pengwei715, rokatyy, royeis, thesaarco, tomerm-iguazio, uriblum46, xsqian, yaronha, yonishelach, yuribros1974, zeevrispler, zilbermanor


functions' Issues

describe cpu function slow

describe is slow mostly due to the seaborn pairplot. The seaborn source code has a comment suggesting plot set-up is the time-consuming part.

A new parameter is needed that limits the number of items plotted (see the sketch after this list):

  • only those where the distributions are significantly different, plus
  • an upper limit
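
One possible shape for such a parameter, sketched below: rank features by how different their per-class distributions are and plot only the top ones (limited_pairplot and max_plots are hypothetical names, and a binary label column is assumed):

import pandas as pd
import seaborn as sns
from scipy.stats import ks_2samp

def limited_pairplot(table: pd.DataFrame, label_column: str, max_plots: int = 5):
    # Split on the (assumed binary) label and rank numeric features by how
    # different their per-class distributions are (Kolmogorov-Smirnov statistic).
    classes = table[label_column].unique()[:2]
    a = table[table[label_column] == classes[0]]
    b = table[table[label_column] == classes[1]]
    features = [c for c in table.select_dtypes("number").columns if c != label_column]
    scores = {c: ks_2samp(a[c], b[c]).statistic for c in features}
    # Keep only the most different features, capped at max_plots.
    top = sorted(scores, key=scores.get, reverse=True)[:max_plots]
    return sns.pairplot(table[top + [label_column]], hue=label_column)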

tf2_serving_v2 Notebook update

I was going through the tf2_serving_v2 notebook and noticed that the methods mentioned in the text cell above were actually from the tf2_serving notebook. This led to some confusion at the start; once I had gone through both notebooks I could tell the difference. Maybe you can look into this and update it. I will add some snippets from the tf2_serving_v2 notebook for your reference.

(The original issue attached screenshots of the tf2_serving_v2 notebook here.)

Create "Live" Test-Model function (vs. Running endpoint)

We would like to be able to test a running model endpoint.

The function should receive:

  • Test dataset
  • Endpoint

Tests:

  • Is the endpoint working? (i.e., returning 200)

    • If we get errors - what are they? (stacktrace)
  • Quality: Model evaluation metrics....

  • Performance: Latency, request time, etc...

All inputs should support MLRun DataItem

We need to verify that all our functions support receiving tables and configurations as MLRun's DataItem (input).

Some functions still support receiving tables/files via a filepath string only.

This is especially important since we are adding more capabilities to MLRun's artifact store, particularly the project/DB-level artifacts, which will be saved under store://.
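
A sketch of the pattern each handler could adopt (load_table is a hypothetical helper; parquet is assumed as the on-disk format):

import pandas as pd
from mlrun.datastore import DataItem

def load_table(table) -> pd.DataFrame:
    # Accept either an MLRun DataItem (including store:// URLs) or a plain
    # filepath string, so callers aren't forced into one input style.
    if isinstance(table, DataItem):
        return table.as_df()
    return pd.read_parquet(str(table))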

Fix describe function

When running the describe function with mlrun/ml-models:0.4.5 the following error occurs:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/mlrun/runtimes/local.py", line 179, in exec_from_params
    val = handler(*args_list)
  File "main.py", line 53, in describe
    snsplt = sns.pairplot(table, hue=label_column)
  File "/opt/conda/lib/python3.7/site-packages/seaborn/axisgrid.py", line 2121, in pairplot
    grid.map_diag(kdeplot, **diag_kws)
  File "/opt/conda/lib/python3.7/site-packages/seaborn/axisgrid.py", line 1490, in map_diag
    func(data_k, label=label_k, color=color, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/seaborn/distributions.py", line 705, in kdeplot
    cumulative=cumulative, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/seaborn/distributions.py", line 295, in _univariate_kdeplot
    cumulative=cumulative)
  File "/opt/conda/lib/python3.7/site-packages/seaborn/distributions.py", line 367, in _statsmodels_univariate_kde
    kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
  File "/opt/conda/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py", line 140, in fit
    clip=clip, cut=cut)
  File "/opt/conda/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py", line 453, in kdensityfft
    bw = bandwidths.select_bandwidth(X, bw, kern) # will cross-val fit this pattern?
  File "/opt/conda/lib/python3.7/site-packages/statsmodels/nonparametric/bandwidths.py", line 174, in select_bandwidth
    raise RuntimeError(err)
RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.

Function yaml:

kind: job
metadata:
  name: describe
  tag: ''
  hash: 5eb3219b3dd8c944bf6427ad37de31511c07d944
  project: network-operations
  labels:
    author: yjb
    stage: development
  categories:
  - models
  - graphics
spec:
  command: ''
  args: []
  image: mlrun/ml-models:0.4.5
  volumes:
  - flexVolume:
      driver: v3io/fuse
      options:
        accessKey: <...>
        container: users
        subPath: /admin
    name: v3io
  volume_mounts:
  - mountPath: /User
    name: v3io
  env:
  - name: V3IO_API
    value: v3io-webapi.default-tenant.svc:8081
  - name: V3IO_USERNAME
    value:  <...>
  - name: V3IO_ACCESS_KEY
    value: <...>
  default_handler: describe
  description: ''
  build:
    functionSourceCode: IyBHZW5lcmF0ZWQgYnkgbnVjbGlvLmV4cG9ydC5OdWNsaW9FeHBvcnRlciBvbiAyMDIwLTAzLTE3IDE1OjQzCgppbXBvcnQgb3MKaW1wb3J0IG51bXB5IGFzIG5wCmltcG9ydCBwYW5kYXMgYXMgcGQKaW1wb3J0IG1hdHBsb3RsaWIucHlwbG90IGFzIHBsdAppbXBvcnQgc2VhYm9ybiBhcyBzbnMKCmZyb20gbWxydW4uZXhlY3V0aW9uIGltcG9ydCBNTENsaWVudEN0eApmcm9tIG1scnVuLmRhdGFzdG9yZSBpbXBvcnQgRGF0YUl0ZW0KZnJvbSBtbHJ1bi5hcnRpZmFjdHMgaW1wb3J0IFBsb3RBcnRpZmFjdCwgVGFibGVBcnRpZmFjdAoKZnJvbSBza2xlYXJuLnByZXByb2Nlc3NpbmcgaW1wb3J0IFN0YW5kYXJkU2NhbGVyCmZyb20geWVsbG93YnJpY2sgaW1wb3J0IENsYXNzQmFsYW5jZQoKZnJvbSB0eXBpbmcgaW1wb3J0IElPLCBBbnlTdHIsIFVuaW9uLCBMaXN0LCBPcHRpb25hbAoKcGQuc2V0X29wdGlvbigiZGlzcGxheS5mbG9hdF9mb3JtYXQiLCBsYW1iZGEgeDogIiUuMmYiICUgeCkKCmRlZiBfZ2NmX2NsZWFyKHBsdCk6CiAgICBwbHQuY2xhKCkKICAgIHBsdC5jbGYoKQogICAgcGx0LmNsb3NlKCkgCgpkZWYgZGVzY3JpYmUoCiAgICBjb250ZXh0OiBNTENsaWVudEN0eCwKICAgIHRhYmxlOiBVbmlvbltEYXRhSXRlbSwgc3RyXSwKICAgIGxhYmVsX2NvbHVtbjogc3RyLAogICAgY2xhc3NfbGFiZWxzOiBMaXN0W3N0cl0gPSBOb25lLAogICAga2V5OiBzdHIgPSAidGFibGUtc3VtbWFyeSIsCiAgICBwbG90X2hpc3Q6IGJvb2wgPSBUcnVlLAogICAgcGxvdHNfZGVzdDogc3RyID0gJ3Bsb3RzJwopIC0+IE5vbmU6CiAgICAiIiJTdW1tYXJpemUgYSB0YWJsZQoKICAgIFRPRE86IG1lcmdlIHdpdGggZGFzayB2ZXJzaW9uCgogICAgOnBhcmFtIGNvbnRleHQ6ICAgICAgICAgdGhlIGZ1bmN0aW9uIGNvbnRleHQKICAgIDpwYXJhbSB0YWJsZTogICAgICAgICAgIHBhbmRhcyBkYXRhZnJhbWUKICAgIDpwYXJhbSBsYWJlbF9jb2x1bW46ICAgIGdyb3VuZCB0cnV0aCBjb2x1bW4gbGFiZWwKICAgIDpwYXJhbSBrZXk6ICAgICAgICAgICAgIGtleSBvZiB0YWJsZSBzdW1tYXJ5IGluIGFydGlmYWN0IHN0b3JlCiAgICA6cGFyYW0gcGxvdF9oaXN0OiAgICAgICAoVHJ1ZSkgc2V0IHRoaXMgdG8gRmFsc2UgZm9yIGxhcmdlIHRhYmxlcwogICAgOnBhcmFtIHBsb3RzX2Rlc3Q6ICAgICAgZGVzdGluYXRpb24gZm9sZGVyIG9mIHN1bW1hcnkgcGxvdHMgKHJlbGF0aXZlIHRvIGFydGlmYWN0X3BhdGgpCiAgICAiIiIKICAgIGJhc2VfcGF0aCA9IGNvbnRleHQuYXJ0aWZhY3RfcGF0aAogICAgb3MubWFrZWRpcnMoYmFzZV9wYXRoLCBleGlzdF9vaz1UcnVlKQogICAgb3MubWFrZWRpcnMob3MucGF0aC5qb2luKGJhc2VfcGF0aCwgcGxvdHNfZGVzdCksIGV4aXN0X29rPVRydWUpCiAgICAKICAgIHRhYmxlID0gcGQucmVhZF9wYXJxdWV0KHN0cih0YWJsZSkpCiAgICBoZWFkZXIgPSB0YWJsZS5jb2x1bW5zLnZhbHVlcwoKICAgIF9nY2ZfY2xlYXIocGx0KQogICAgc25zcGx0ID0gc25zLnBhaXJwbG90KHRhYmxlLCBodWU9bGFiZWxfY29sdW1uKQogICAgc25zcGx0LnNhdmVmaWcob3MucGF0aC5qb2luKGJhc2VfcGF0aCwgZiJ7cGxvdHNfZGVzdH0vaGlzdC5wbmciKSkKICAgIGNvbnRleHQubG9nX2FydGlmYWN0KFBsb3RBcnRpZmFjdCgiaGlzdG9ncmFtcyIsICBib2R5PXBsdC5nY2YoKSksIGxvY2FsX3BhdGg9ZiJ7cGxvdHNfZGVzdH0vaGlzdC5odG1sIikKICAgIAogICAgc3VtdGJsID0gdGFibGUuZGVzY3JpYmUoKQogICAgc3VtdGJsID0gc3VtdGJsLmFwcGVuZChsZW4odGFibGUuaW5kZXgpLXRhYmxlLmNvdW50KCksIGlnbm9yZV9pbmRleD1UcnVlKQogICAgc3VtdGJsLmluc2VydCgwLCAibWV0cmljIiwgWyJjb3VudCIsICJtZWFuIiwgInN0ZCIsICJtaW4iLCIyNSUiLCAiNTAlIiwgIjc1JSIsICJtYXgiLCAibmFucyJdKQogICAgCiAgICBzdW10YmwudG9fY3N2KG9zLnBhdGguam9pbihiYXNlX3BhdGgsIGtleSsiLmNzdiIpLCBpbmRleD1GYWxzZSkKICAgIGNvbnRleHQubG9nX2FydGlmYWN0KGtleSwgbG9jYWxfcGF0aD1rZXkrIi5jc3YiKQoKICAgIF9nY2ZfY2xlYXIocGx0KQogICAgCiAgICBsYWJlbHMgPSB0YWJsZS5wb3AobGFiZWxfY29sdW1uKQogICAgY2xhc3NfYmFsYW5jZV9tb2RlbCA9IENsYXNzQmFsYW5jZShsYWJlbHM9Y2xhc3NfbGFiZWxzKQogICAgY2xhc3NfYmFsYW5jZV9tb2RlbC5maXQobGFiZWxzKQogICAgCiAgICBzY2FsZV9wb3Nfd2VpZ2h0ID0gY2xhc3NfYmFsYW5jZV9tb2RlbC5zdXBwb3J0X1swXS9jbGFzc19iYWxhbmNlX21vZGVsLnN1cHBvcnRfWzFdCiAgICBjb250ZXh0LmxvZ19hcnRpZmFjdCgic2NhbGVfcG9zX3dlaWdodCIsIGYie3NjYWxlX3Bvc193ZWlnaHQ6MC4yZn0iKQoKICAgIGNsYXNzX2JhbGFuY2VfbW9kZWwuc2hvdyhvdXRwYXRoPW9zLnBhdGguam9pbihiYXNlX3BhdGgsIGYie3Bsb3RzX2Rlc3R9L2ltYmFsYW5jZS5wbmciKSkKICAgIGNvbnRleHQubG9nX2FydGlmYWN0KFBsb3RBcnRpZmFjdCgiaW1iYWxhbmNlIiwgYm9keT1wbHQuZ2NmKCkpLCBsb2NhbF9wYXRoPWYie3Bsb3RzX2Rlc3R9L2ltYmFsYW5jZS5odG1sIikKICAgIAogICAgX2djZl9jbGVhcihwbHQpCiAgICB0Ymxjb3JyID0gdGFibGUuY29ycigpCiA
gICBheCA9IHBsdC5heGVzKCkKICAgIHNucy5oZWF0bWFwKHRibGNvcnIsIGF4PWF4LCBhbm5vdD1GYWxzZSwgY21hcD1wbHQuY20uUmVkcykKICAgIGF4LnNldF90aXRsZSgiZmVhdHVyZXMgY29ycmVsYXRpb24iKQogICAgcGx0LnNhdmVmaWcob3MucGF0aC5qb2luKGJhc2VfcGF0aCwgZiJ7cGxvdHNfZGVzdH0vY29yci5wbmciKSkKICAgIGNvbnRleHQubG9nX2FydGlmYWN0KFBsb3RBcnRpZmFjdCgiY29ycmVsYXRpb24iLCAgYm9keT1wbHQuZ2NmKCkpLCBsb2NhbF9wYXRoPWYie3Bsb3RzX2Rlc3R9Y29yci5odG1sIikKICAgIAogICAKICAgIF9nY2ZfY2xlYXIocGx0KQoK
    commands: []
    code_origin: https://github.com/mlrun/functions.git#b0ba922e10fa0af2fcf2e04cce51eb7d9243bf25:describe/describe.ipynb

And the following task:

summ_task = NewTask(
    "sum", 
    handler="describe",  
    params={"key": "summary", 
            "label_column": "is_error", 
            'class_labels': ['0', '1'],
            'plot_hist': True,
            'plot_dest': 'plots'},
    inputs={"table": os.path.join(project_dir, 'data', 'aggregate.pq')},
    artifact_path=ARTIFACT_PATH)
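
For what it's worth, statsmodels raises "Selected KDE bandwidth is 0" when a column is constant (or constant within one hue group), so one possible workaround inside describe, not a confirmed fix, is to drop zero-variance columns and use histogram diagonals (paths and column names below are illustrative):

import pandas as pd
import seaborn as sns

table = pd.read_parquet("aggregate.pq")  # illustrative local path
label_column = "is_error"

# Drop constant numeric columns, whose KDE bandwidth collapses to 0, and use
# histogram diagonals, which don't need a bandwidth at all.
numeric = table.select_dtypes("number")
varying = [c for c in numeric.columns if numeric[c].nunique() > 1 and c != label_column]
sns.pairplot(table[varying + [label_column]], hue=label_column, diag_kind="hist")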

open_archive only top folder for zipfile

  1. tar.gz and zip files quite often contain nested archives. tarfile extracts files recursively, but zipfile doesn't and requires an extra step (see the sketch after this list). This leaves the artifact contents incompletely extracted.
  2. In either case, the folder and final file locations aren't clear by default.
  3. We may want to log all the nested file contents as artifacts, particularly if they are tables; if they are layered images (image_blue, image_red, image_green), we may want to generate a metadata description summarizing these findings...
    (idea: if 3. is the case, and the files represent keys and data, i.e., normalized database tables, the extract process might also recommend possible joins)
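
For item 1, a sketch of the extra step that zipfile needs (extract_nested_zip is a hypothetical helper and doesn't cover the tar case):

import os
import zipfile

def extract_nested_zip(path: str, dest: str) -> None:
    # Extract the top-level archive, then walk the output and recursively
    # extract any zip files found inside it.
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest)
    for root, _dirs, files in os.walk(dest):
        for name in files:
            inner = os.path.join(root, name)
            if zipfile.is_zipfile(inner):
                extract_nested_zip(inner, os.path.splitext(inner)[0])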

Batch Inference V2 tensor flow model

I am trying to find the drift detection using batch inference V2 for the tensor flow model. Facing issue while passing the data in np.ndarry . So i am saving my data in .npy artifact and mapping it to input data set (also tried passing it into param ) . It is always throwing error .

I am passing:

run = project.run_function(
    batch_inference_function,
    inputs={"dataset": trainer_run.outputs['prediction_set']},
    params={
        "model_path": trainer_run.outputs['model'],
    },
    local=True,
)

# here trainer_run.outputs['prediction_set'] is in .npy format

File "/opt/conda/lib/python3.9/site-packages/mlrun/runtimes/local.py", line 441, in exec_from_params
val = mlrun.handler(
File "/opt/conda/lib/python3.9/site-packages/mlrun/package/init.py", line 140, in wrapper
func_outputs = func(*args, **kwargs)
File "/tmp/tmpqwz0zi74.py", line 209, in infer
x, label_columns = mlrun.model_monitoring.api.read_dataset_as_dataframe(
File "/opt/conda/lib/python3.9/site-packages/mlrun/model_monitoring/api.py", line 599, in read_dataset_as_dataframe
dataset = dataset.as_df()
File "/opt/conda/lib/python3.9/site-packages/mlrun/datastore/base.py", line 510, in as_df
df = self._store.as_df(
File "/opt/conda/lib/python3.9/site-packages/mlrun/datastore/base.py", line 283, in as_df
raise Exception(f"file type unhandled {url}")
Exception: file type unhandled s3://mlrun/projects/temp-pipeline-jovyan/artifacts/trainer-train/0/prediction_set.npy

2024-03-09 01:42:04,268 [error] exec error - file type unhandled s3://mlrun/projects/temp-pipeline-jovyan/artifacts/trainer-train/0/prediction_set.npy

Please provide a demo for TensorFlow models as well.
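
One possible workaround until .npy inputs are handled (unverified against this MLRun version): inside the trainer handler, where context is the MLRun run context, convert the array to a DataFrame and log it as a parquet dataset, which as_df() can read:

import numpy as np
import pandas as pd

# `context` is the MLRun run context available inside the handler; the file
# and key names are illustrative.
predictions = np.load("prediction_set.npy")
df = pd.DataFrame(predictions)
context.log_dataset("prediction_set", df=df, format="parquet")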
