
sayakpaul / deploy-hf-tf-vision-models

This repository shows various ways of deploying a vision model (TensorFlow) from 🤗 Transformers.

License: Apache License 2.0

Jupyter Notebook 95.67% Shell 1.73% Python 2.46% Dockerfile 0.13%
huggingface keras kubernetes tensorflow tfserving transformers vertex-ai vision-transformers autoscaling docker

deploy-hf-tf-vision-models's Introduction

Deploying Vision Models (TensorFlow) from 🤗 Transformers

By Chansung Park and Sayak Paul

This repository shows various ways of deploying a vision model (TensorFlow) from 🤗 Transformers using the TensorFlow ecosystem. In particular, we use TensorFlow Serving (for local deployment), Vertex AI (for serverless deployment), and Kubernetes on GKE (for more controlled deployment) with TensorFlow Serving and ONNX.

For this project, we leverage Google Cloud Platform for its managed services, Vertex AI and GKE.

Methods covered

  • Local TensorFlow Serving | Blog post from 🤗

    • We cover how to locally deploy a Vision Transformer (ViT) model from 🤗 Transformers with TensorFlow Serving (a minimal export-and-query sketch follows this list).
    • With this, you will be able to serve your own machine learning models in a standalone Python application.
  • TensorFlow Serving on Kubernetes (GKE) | Blog post from 🤗

    • We cover how to build a custom TensorFlow Serving Docker image with a Vision Transformer (ViT) model from 🤗 Transformers, provision a Google Kubernetes Engine (GKE) cluster, and deploy the Docker image to that cluster.
    • In particular, we cover Kubernetes-specific topics such as creating Deployment/Service/HPA objects for scalable deployment of the Docker image to the nodes (VMs) and exposing them as a service to clients.
    • With this, you will be able to serve and scale your own machine learning models according to the CPU utilization of the deployment as a whole.
    • We also provide utilities to perform load tests with Locust and a notebook to visualize the results (a minimal locustfile sketch follows this list). Refer here for more details.
  • ONNX on Kubernetes (GKE)

    • The workflow here is similar to the one above, but it uses an ONNX-optimized version of the ViT model (see the ONNX conversion sketch after this list).
    • ONNX is particularly useful when you're deploying models on x86 CPUs.
    • This workflow doesn't require you to build any custom TF Serving image.
    • One important thing to keep in mind is to generate the ONNX model on the same machine type you will use for deployment. For example, if you're going to deploy on the n1-standard-8 machine type, generate the ONNX model on an n1-standard-8 machine so that the ONNX optimizations are relevant to the target hardware.
  • Vertex AI Prediction | Blog post from 🤗

    • We cover how to deploy a Vision Transformer (ViT) model from 🤗 Transformers to Google Cloud's fully managed machine learning deployment service, Vertex AI Prediction.
    • Under the hood, Vertex AI Prediction leverages technologies from GKE, TensorFlow Serving, and more.
    • That means you can deploy and scale machine learning models without worrying about building a custom Docker image, writing Kubernetes-specific manifests, or setting up model-monitoring capabilities.
    • With this, you will be able to serve and scale your own machine learning model by calling APIs from the google-cloud-aiplatform SDK to interact with Vertex AI (a short SDK sketch follows this list).
    • We provide utilities to perform load tests with Locust. Refer here for more details.
  • Vertex AI Prediction (w/ optimized TFRT)

    • TBD
    • Learn more about the optimized TFRT (TensorFlow Runtime) here.
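
A few sketches of the methods above follow. First, for local TensorFlow Serving: a minimal sketch (ours, not the repository's exact notebook code) of exporting a ViT model as a SavedModel and querying it over REST. The checkpoint name, directory layout, and port are assumptions, and the repository's notebooks may fold preprocessing/postprocessing into the serving signature, which this sketch skips.

```python
import json

import numpy as np
import requests
import tensorflow as tf
from transformers import TFViTForImageClassification

# 1. Export the model in the layout TF Serving expects: <model_name>/<version>/.
model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
tf.saved_model.save(model, "vit/1")

# 2. Start TF Serving separately, e.g. with Docker (8501 is the default REST port):
#    docker run -p 8501:8501 -v "$PWD/vit:/models/vit" -e MODEL_NAME=vit tensorflow/serving

# 3. Query the REST endpoint with a dummy image. HF's TF ViT takes channels-first
#    float pixel_values of shape (batch, 3, 224, 224).
pixel_values = np.random.rand(1, 3, 224, 224).astype("float32")
payload = json.dumps({"instances": pixel_values.tolist()})
response = requests.post("http://localhost:8501/v1/models/vit:predict", data=payload)

# The exact structure of "predictions" depends on the exported signature
# (a plain list of logits here, or a list of dicts keyed by output name).
logits = np.array(response.json()["predictions"])
print("Predicted class index:", logits.argmax(-1))
```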
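
Next, for load-testing the GKE deployment: a minimal locustfile sketch (ours, not the repository's Locust utilities), assuming the GKE Service exposes TF Serving's REST port and the model is named vit.

```python
import json

from locust import HttpUser, between, task

# One dummy 3x224x224 image reused for every request; a real load test should
# use representative images.
PIXEL_VALUES = [[[0.5] * 224 for _ in range(224)] for _ in range(3)]
PAYLOAD = json.dumps({"instances": [PIXEL_VALUES]})


class ViTUser(HttpUser):
    # The target host is passed on the command line, e.g.:
    #   locust -f locustfile.py --host http://<EXTERNAL_IP>:8501
    wait_time = between(0.5, 2.0)

    @task
    def predict(self):
        self.client.post(
            "/v1/models/vit:predict",
            data=PAYLOAD,
            headers={"Content-Type": "application/json"},
        )
```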
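
For the ONNX path, a rough sketch of the conversion plus a local ONNX Runtime check. Using tf2onnx with opset 13 and the input signature below are our assumptions, not necessarily what the repository's workflow does; the point illustrated is the advice above about converting on the deployment machine type.

```python
import numpy as np
import onnxruntime as ort
import tensorflow as tf
import tf2onnx
from transformers import TFViTForImageClassification

model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

# Convert the Keras model to ONNX. Run this on the same machine type you plan to
# deploy on (e.g. n1-standard-8) so the optimizations match the target CPU.
spec = (tf.TensorSpec((None, 3, 224, 224), tf.float32, name="pixel_values"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13, output_path="vit.onnx")

# Sanity-check the exported model with ONNX Runtime on the CPU.
session = ort.InferenceSession("vit.onnx", providers=["CPUExecutionProvider"])
pixel_values = np.random.rand(1, 3, 224, 224).astype("float32")
outputs = session.run(None, {"pixel_values": pixel_values})
logits = outputs[0]  # assuming logits is the first (only) output
print("Predicted class index:", logits.argmax(-1))
```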
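
Finally, for Vertex AI Prediction, the google-cloud-aiplatform calls look roughly like the following. The project, bucket, machine type, and serving container tag are placeholders; pick the prebuilt TensorFlow prediction image that matches your TF version.

```python
import numpy as np
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

# Upload a SavedModel (e.g. the one exported above, copied to GCS) as a Vertex AI
# Model resource backed by a prebuilt TensorFlow serving container.
model = aiplatform.Model.upload(
    display_name="vit-base",
    artifact_uri="gs://my-bucket/vit/1",
    # Placeholder image tag; use the prebuilt container matching your TF version.
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest",
)

# Deploy to an autoscaling endpoint; Vertex AI handles the GKE/TF Serving details.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    min_replica_count=1,
    max_replica_count=3,
)

# Online prediction: each instance must match the exported serving signature.
instance = np.random.rand(3, 224, 224).tolist()
prediction = endpoint.predict(instances=[instance])
print(np.array(prediction.predictions).argmax(-1))
```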

Acknowledgements

We're thankful to the ML Developer Programs team at Google for providing GCP support.

deploy-hf-tf-vision-models's People

Contributors

deep-diver, sayakpaul


deploy-hf-tf-vision-models's Issues

methodology to load-test on GKE

This issue is to brainstorm how to load-test the TF Serving deployment on GKE.

To start, I think it is good to sweep over the specs/configs of the VM, TF Serving itself, and batch inference. For instance:

  • number of pods (nodes): 1, 2, 4, 8, ...
  • number of vCPUs: 16
  • RAM capacity: 64 GB
  • TF Serving
    • intra_op_parallelism_threads: equal to the number of vCPUs
    • inter_op_parallelism_threads: really high number (128) -> ... -> 4
  • batch inference
    • max_batch_size: really high number (1024) -> ... -> 16
    • batch_timeout_micros: 1000000 (1 second) -> 100000 -> ... -> 10
    • num_batch_threads: equal to the number of vCPUs
    • max_enqueued_batches: really high number (128) -> ... -> equal to num_batch_threads

We can gradually reduce the VM spec and the value of each configuration, and then figure out which combination is optimal.

The values above were chosen somewhat arbitrarily, without any solid principle; some of them are referenced from this official doc, though. A rough sketch of how these knobs map onto TF Serving's flags and batching configuration is included below.
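
To make the knobs above concrete: a sketch, not a recommendation, of one grid point expressed for TF Serving. The batching parameters go into the text protobuf read via --batching_parameters_file; the flag names come from the TF Serving docs and should be verified against the version baked into the serving image.

```python
# One point in the grid above, written as the text protobuf that
# tensorflow_model_server reads via --batching_parameters_file.
BATCHING_CONFIG = """
max_batch_size { value: 64 }
batch_timeout_micros { value: 100000 }  # 100 ms
num_batch_threads { value: 16 }         # match the vCPU count
max_enqueued_batches { value: 16 }
"""

with open("batching_parameters.txt", "w") as f:
    f.write(BATCHING_CONFIG)

# The server is then started with something like:
#
#   tensorflow_model_server \
#     --rest_api_port=8501 \
#     --model_name=vit \
#     --model_base_path=/models/vit \
#     --tensorflow_intra_op_parallelism=16 \
#     --tensorflow_inter_op_parallelism=4 \
#     --enable_batching=true \
#     --batching_parameters_file=/models/batching_parameters.txt
```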
