
sayakpaul / deploy-hf-tf-vision-models

This repository shows various ways of deploying a vision model (TensorFlow) from 🤗 Transformers.

License: Apache License 2.0

Jupyter Notebook 95.67% Shell 1.73% Python 2.46% Dockerfile 0.13%
huggingface keras kubernetes tensorflow tfserving transformers vertex-ai vision-transformers autoscaling docker

deploy-hf-tf-vision-models's Introduction

Deploying Vision Models (TensorFlow) from 🤗 Transformers

By Chansung Park and Sayak Paul

This repository shows various ways of deploying a vision model (TensorFlow) from 🤗 Transformers using the TensorFlow ecosystem. In particular, we use TensorFlow Serving (for local deployment), Vertex AI (for serverless deployment), and Kubernetes on GKE (for more controlled deployment) with TensorFlow Serving and ONNX.

For this project, we leverage Google Cloud Platform for its managed services, Vertex AI and GKE.

Methods covered

  • Local TensorFlow Serving | Blog post from 🤗

    • We cover how to locally deploy a Vision Transformer (ViT) model from 🤗 Transformers with TensorFlow Serving (a minimal export-and-query sketch follows this list).
    • With this, you will be able to serve your own machine learning models in a standalone Python application.
  • TensorFlow Serving on Kubernetes (GKE) | Blog post from 🤗

    • We cover how to build a custom TensorFlow Serving Docker image with a Vision Transformer (ViT) model from 🤗 Transformers, provision a Google Kubernetes Engine (GKE) cluster, and deploy the Docker image to that cluster.
    • In particular, we cover Kubernetes-specific topics such as creating Deployment/Service/HPA objects for scalable deployment of the Docker image to the nodes (VMs) and exposing them as a service to clients.
    • With this, you will be able to serve and scale your own machine learning models according to the CPU utilization of the deployment as a whole.
    • We also provide utilities to perform load tests with Locust and a notebook to visualize the results (a minimal locustfile sketch follows this list). Refer here for more details.
  • ONNX on Kubernetes (GKE)

    • The workflow here is similar to the one above, but it uses an ONNX-optimized version of the ViT model (see the ONNX conversion sketch after this list).
    • ONNX is particularly useful when you're deploying models on x86 CPUs.
    • This workflow doesn't require you to build any custom TF Serving image.
    • One important thing to keep in mind is to generate the ONNX model on the same machine type you will use for deployment. For example, if you're going to deploy on the n1-standard-8 machine type, generate the ONNX model on an n1-standard-8 machine so that the ONNX optimizations are relevant to the target hardware.
  • Vertex AI Prediction | Blog post from 🤗

    • We cover how to deploy a Vision Transformer (ViT) model from 🤗 Transformers to Google Cloud's fully managed machine learning deployment service, Vertex AI Prediction.
    • Under the hood, Vertex AI Prediction leverages technologies from GKE, TensorFlow Serving, and more.
    • That means you can deploy and scale machine learning models without worrying about building a custom Docker image, writing Kubernetes-specific manifests, or setting up model-monitoring capabilities.
    • With this, you will be able to serve and scale your own machine learning model by calling APIs from the google-cloud-aiplatform SDK to interact with Vertex AI (a short SDK sketch follows this list).
    • We provide utilities to perform load tests with Locust. Refer here for more details.
  • Vertex AI Prediction (w/ optimized TFRT)

    • TBD
    • Learn more about the optimized TFRT (TensorFlow Runtime) here.
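
A few sketches of the methods above follow. First, for local TensorFlow Serving: a minimal sketch (ours, not the repository's exact notebook code) of exporting a ViT model as a SavedModel and querying it over REST. The checkpoint name, directory layout, and port are assumptions, and the repository's notebooks may fold preprocessing/postprocessing into the serving signature, which this sketch skips.

```python
import json

import numpy as np
import requests
import tensorflow as tf
from transformers import TFViTForImageClassification

# 1. Export the model in the layout TF Serving expects: <model_name>/<version>/.
model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
tf.saved_model.save(model, "vit/1")

# 2. Start TF Serving separately, e.g. with Docker (8501 is the default REST port):
#    docker run -p 8501:8501 -v "$PWD/vit:/models/vit" -e MODEL_NAME=vit tensorflow/serving

# 3. Query the REST endpoint with a dummy image. HF's TF ViT takes channels-first
#    float pixel_values of shape (batch, 3, 224, 224).
pixel_values = np.random.rand(1, 3, 224, 224).astype("float32")
payload = json.dumps({"instances": pixel_values.tolist()})
response = requests.post("http://localhost:8501/v1/models/vit:predict", data=payload)

# The exact structure of "predictions" depends on the exported signature
# (a plain list of logits here, or a list of dicts keyed by output name).
logits = np.array(response.json()["predictions"])
print("Predicted class index:", logits.argmax(-1))
```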
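
Next, for load-testing the GKE deployment: a minimal locustfile sketch (ours, not the repository's Locust utilities), assuming the GKE Service exposes TF Serving's REST port and the model is named vit.

```python
import json

from locust import HttpUser, between, task

# One dummy 3x224x224 image reused for every request; a real load test should
# use representative images.
PIXEL_VALUES = [[[0.5] * 224 for _ in range(224)] for _ in range(3)]
PAYLOAD = json.dumps({"instances": [PIXEL_VALUES]})


class ViTUser(HttpUser):
    # The target host is passed on the command line, e.g.:
    #   locust -f locustfile.py --host http://<EXTERNAL_IP>:8501
    wait_time = between(0.5, 2.0)

    @task
    def predict(self):
        self.client.post(
            "/v1/models/vit:predict",
            data=PAYLOAD,
            headers={"Content-Type": "application/json"},
        )
```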
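
For the ONNX path, a rough sketch of the conversion plus a local ONNX Runtime check. Using tf2onnx with opset 13 and the input signature below are our assumptions, not necessarily what the repository's workflow does; the point illustrated is the advice above about converting on the deployment machine type.

```python
import numpy as np
import onnxruntime as ort
import tensorflow as tf
import tf2onnx
from transformers import TFViTForImageClassification

model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

# Convert the Keras model to ONNX. Run this on the same machine type you plan to
# deploy on (e.g. n1-standard-8) so the optimizations match the target CPU.
spec = (tf.TensorSpec((None, 3, 224, 224), tf.float32, name="pixel_values"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13, output_path="vit.onnx")

# Sanity-check the exported model with ONNX Runtime on the CPU.
session = ort.InferenceSession("vit.onnx", providers=["CPUExecutionProvider"])
pixel_values = np.random.rand(1, 3, 224, 224).astype("float32")
outputs = session.run(None, {"pixel_values": pixel_values})
logits = outputs[0]  # assuming logits is the first (only) output
print("Predicted class index:", logits.argmax(-1))
```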
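
Finally, for Vertex AI Prediction, the google-cloud-aiplatform calls look roughly like the following. The project, bucket, machine type, and serving container tag are placeholders; pick the prebuilt TensorFlow prediction image that matches your TF version.

```python
import numpy as np
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

# Upload a SavedModel (e.g. the one exported above, copied to GCS) as a Vertex AI
# Model resource backed by a prebuilt TensorFlow serving container.
model = aiplatform.Model.upload(
    display_name="vit-base",
    artifact_uri="gs://my-bucket/vit/1",
    # Placeholder image tag; use the prebuilt container matching your TF version.
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest",
)

# Deploy to an autoscaling endpoint; Vertex AI handles the GKE/TF Serving details.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    min_replica_count=1,
    max_replica_count=3,
)

# Online prediction: each instance must match the exported serving signature.
instance = np.random.rand(3, 224, 224).tolist()
prediction = endpoint.predict(instances=[instance])
print(np.array(prediction.predictions).argmax(-1))
```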

Acknowledgements

We're thankful to the ML Developer Programs team at Google for providing GCP support.

deploy-hf-tf-vision-models's People

Contributors

deep-diver, sayakpaul


deploy-hf-tf-vision-models's Issues

methodology to load-test on GKE

This issue is to brainstorm how to load-test the TF Serving deployment on GKE.

To start, I think it is good to sweep over the specs/configs of the VM, TF Serving itself, and batch inference. For instance:

  • number of pods (nodes): 1, 2, 4, 8, ...
  • number of vCPUs: 16
  • RAM capacity: 64 GB
  • TF Serving
    • intra_op_parallelism_threads: equal to the number of vCPUs
    • inter_op_parallelism_threads: really high number (128) -> ... -> 4
  • batch inference
    • max_batch_size: really high number (1024) -> ... -> 16
    • batch_timeout_micros: 1000000 (1 second) -> 100000 -> ... -> 10
    • num_batch_threads: equal to the number of vCPUs
    • max_enqueued_batches: really high number (128) -> ... -> equal to num_batch_threads

We can gradually reduce the VM spec and the value of each configuration, and then figure out which combination is optimal.

The values above were chosen somewhat arbitrarily, without any solid principle; some of them are referenced from this official doc, though. A rough sketch of how these knobs map onto TF Serving's flags and batching configuration is included below.
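
To make the knobs above concrete: a sketch, not a recommendation, of one grid point expressed for TF Serving. The batching parameters go into the text protobuf read via --batching_parameters_file; the flag names come from the TF Serving docs and should be verified against the version baked into the serving image.

```python
# One point in the grid above, written as the text protobuf that
# tensorflow_model_server reads via --batching_parameters_file.
BATCHING_CONFIG = """
max_batch_size { value: 64 }
batch_timeout_micros { value: 100000 }  # 100 ms
num_batch_threads { value: 16 }         # match the vCPU count
max_enqueued_batches { value: 16 }
"""

with open("batching_parameters.txt", "w") as f:
    f.write(BATCHING_CONFIG)

# The server is then started with something like:
#
#   tensorflow_model_server \
#     --rest_api_port=8501 \
#     --model_name=vit \
#     --model_base_path=/models/vit \
#     --tensorflow_intra_op_parallelism=16 \
#     --tensorflow_inter_op_parallelism=4 \
#     --enable_batching=true \
#     --batching_parameters_file=/models/batching_parameters.txt
```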
