



Welcome to x-stable-diffusion by Stochastic!

This project is a compilation of acceleration techniques for the Stable Diffusion model to help you generate images faster and more efficiently, saving you both time and money.

With example images and a comprehensive benchmark, you can choose the technique that best fits your needs. When you're ready to deploy, our CLI, stochasticx, makes it easy to get started on your local machine. Try x-stable-diffusion and see how much it can improve your image generation performance and reduce your costs.

🚀 Installation

Quickstart

Make sure you have Python and Docker installed on your system.

  1. Install the latest version of the stochasticx library:
pip install stochasticx
  2. Deploy the Stable Diffusion model:
stochasticx stable-diffusion deploy --type aitemplate

If you don't have a Stochastic account, the CLI will prompt you to create one. It is free and takes about a minute. Sign up →

Alternatively, you can deploy Stable Diffusion without our CLI by following the steps here.

  3. To run inference with the deployed model:
stochasticx stable-diffusion inference --prompt "Riding a horse"

To see all options of the inference command:

stochasticx stable-diffusion inference --help
  4. To view the logs of the deployment, run:
stochasticx stable-diffusion logs
  5. To stop and remove the deployment, run:
stochasticx stable-diffusion stop
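If you would rather drive the quickstart from a Python script, the CLI commands can be wrapped with the standard subprocess module. This is a minimal sketch, not part of stochasticx: the helper names build_command and run_stochasticx are hypothetical, and the helpers only pass through the subcommands and flags shown in the steps (deploy --type, inference --prompt, stop). They do not validate options against the real CLI, so check stochasticx stable-diffusion inference --help for the full list.

```python
import subprocess
from typing import Dict, List, Optional


def build_command(action: str, options: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the argument list for `stochasticx stable-diffusion <action>`.

    Hypothetical helper: it only turns {"prompt": "..."} into ["--prompt", "..."]
    and does not check that a flag is accepted by the real CLI.
    """
    cmd = ["stochasticx", "stable-diffusion", action]
    for flag, value in (options or {}).items():
        cmd.extend([f"--{flag}", value])
    return cmd


def run_stochasticx(action: str, options: Optional[Dict[str, str]] = None) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        build_command(action, options),
        check=True,           # raise if the CLI exits with an error status
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str rather than bytes
    )
    return result.stdout
```

For example, run_stochasticx("deploy", {"type": "aitemplate"}) followed by run_stochasticx("inference", {"prompt": "Riding a horse"}) mirrors steps 2 and 3, and run_stochasticx("stop") mirrors step 5. Passing a list to subprocess.run (instead of a shell string) avoids quoting problems with prompts that contain spaces.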

How to get less than 1s latency?

Set num_inference_steps to 30. With this setting, an image can be generated in 0.88 seconds.

{
  'max_seq_length': 64,
  'num_inference_steps': 30, 
  'image_size': (512, 512) 
}

You can also experiment with reducing the image_size.
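Both knobs work on the same quantity: the UNet runs once per denoising step on a latent that is 8 times smaller than the image in each dimension. The sketch below turns that into a rough relative-cost proxy (steps x latent area). It assumes the standard Stable Diffusion v1/v2 VAE (8x spatial downsampling, 4 latent channels); the helper names are hypothetical, and measured latency will not scale exactly with this number because of fixed costs (text encoder, VAE decoder) and the super-linear cost of self-attention.

```python
from typing import Any, Dict, Tuple

# Assumption: standard Stable Diffusion v1/v2 VAE, which downsamples each
# spatial dimension by 8 and produces 4 latent channels.
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4


def latent_shape(image_size: Tuple[int, int]) -> Tuple[int, int, int]:
    """Return (channels, height, width) of the latent that the UNet denoises."""
    height, width = image_size
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError(f"image_size {image_size} must be divisible by {VAE_DOWNSAMPLE}")
    return LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE


def relative_unet_work(config: Dict[str, Any], baseline: Dict[str, Any]) -> float:
    """Rough proxy for UNet cost: denoising steps x latent area, relative to a baseline.

    Ignores fixed costs (text encoder, VAE decoder) and the fact that
    self-attention grows faster than linearly with latent area.
    """
    def work(cfg: Dict[str, Any]) -> int:
        _, h, w = latent_shape(cfg["image_size"])
        return cfg["num_inference_steps"] * h * w

    return work(config) / work(baseline)


benchmark_config = {'max_seq_length': 64, 'num_inference_steps': 50, 'image_size': (512, 512)}
fast_config = {'max_seq_length': 64, 'num_inference_steps': 30, 'image_size': (512, 512)}
small_config = {'max_seq_length': 64, 'num_inference_steps': 30, 'image_size': (384, 384)}

print(latent_shape((512, 512)))                            # (4, 64, 64)
print(relative_unet_work(fast_config, benchmark_config))   # 0.6
print(relative_unet_work(small_config, benchmark_config))  # 0.3375
```

Going from 50 to 30 steps cuts this proxy to 0.6 of the benchmark configuration; also dropping to 384x384 brings it to about a third, which is why reducing image_size is worth trying when latency matters more than resolution.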

How to run on Google Colab?

Each folder will include a Google Colab notebook that lets you test the full flow, including inference, on a T4 GPU.

Manual deployment

Check the README.md of the following directories:

🔥 Optimizations

Benchmarks

Setup

Hardware: a single 40 GB A100 GPU with CUDA 11.6. All results are averaged over 50 runs.

The following arguments were used for image generation for all the benchmarks:

{
  'max_seq_length': 64,
  'num_inference_steps': 50, 
  'image_size': (512, 512) 
}
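The numbers below are averages over 50 runs. A generic harness for that kind of measurement might look like the following sketch. It is not the project's benchmark script: generate stands in for whatever produces one image (or one batch), and the warm-up count is an assumption, since the setup does not say whether warm-up runs were discarded. Because CUDA kernels launch asynchronously, a PyTorch-based generate should call torch.cuda.synchronize() before returning; otherwise the timer can stop before the GPU has finished.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(generate: Callable[[], object], runs: int = 50, warmup: int = 3) -> Dict[str, float]:
    """Call `generate` repeatedly and return latency statistics in seconds.

    The first `warmup` calls are not timed, so one-off costs (kernel
    compilation, memory allocation, caches) do not inflate the average.
    If `generate` launches asynchronous GPU work, it must block until that
    work finishes (e.g. torch.cuda.synchronize()) before returning.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        generate()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        generate()
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "stdev_s": statistics.stdev(timings) if runs > 1 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in workload for illustration only; replace with a real image-generation call.
    stats = benchmark(lambda: time.sleep(0.01), runs=5, warmup=1)
    print({name: round(value, 4) for name, value in stats.items()})
```

Reporting the minimum and standard deviation alongside the mean makes it easier to spot noisy runs that an average alone would hide.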

Online results

For batch_size 1, these are the latency results:

A100 GPU

[Graph: A100 GPU latency]

Project              Latency (s)  GPU VRAM (GB)
PyTorch fp16         5.77         10.3
nvFuser fp16         3.15         ---
FlashAttention fp16  2.80         7.5
TensorRT fp16        1.68         8.1
AITemplate fp16      1.38         4.83
ONNX (CUDA)          7.26         13.3
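To compare the techniques directly, the A100 table can be turned into speedup and memory ratios against the PyTorch fp16 baseline. This is a small derived computation on the numbers above, not additional benchmark data; the function names are illustrative.

```python
from typing import Dict, Optional, Tuple

# (latency in s, GPU VRAM in GB) at batch size 1 on one A100, copied from the table above.
# None stands for the VRAM cell reported as "---" for nvFuser.
A100_RESULTS: Dict[str, Tuple[float, Optional[float]]] = {
    "PyTorch fp16": (5.77, 10.3),
    "nvFuser fp16": (3.15, None),
    "FlashAttention fp16": (2.80, 7.5),
    "TensorRT fp16": (1.68, 8.1),
    "AITemplate fp16": (1.38, 4.83),
    "ONNX (CUDA)": (7.26, 13.3),
}


def speedups(results, baseline: str = "PyTorch fp16") -> Dict[str, float]:
    """Baseline latency / latency per project; values above 1 are faster than the baseline."""
    base_latency = results[baseline][0]
    return {name: base_latency / latency for name, (latency, _) in results.items()}


def vram_ratios(results, baseline: str = "PyTorch fp16") -> Dict[str, float]:
    """GPU VRAM relative to the baseline; projects without a VRAM figure are skipped."""
    base_vram = results[baseline][1]
    return {name: vram / base_vram for name, (_, vram) in results.items() if vram is not None}


if __name__ == "__main__":
    ratios = vram_ratios(A100_RESULTS)
    for name, factor in sorted(speedups(A100_RESULTS).items(), key=lambda item: -item[1]):
        vram = f"{ratios[name]:.2f}x VRAM" if name in ratios else "VRAM n/a"
        print(f"{name:20s} {factor:.2f}x speed  {vram}")
```

With these numbers, AITemplate is about 4.18x faster than the PyTorch fp16 baseline while using roughly 0.47x of its VRAM, TensorRT is about 3.43x faster, and ONNX (CUDA) is the only entry slower than the baseline (about 0.79x).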

T4 GPU

Note: AITemplate may not support the T4 GPU yet. Check support here.

[Graph: T4 GPU latency]

Project              Latency (s)
PyTorch fp16         16.2
nvFuser fp16         19.3
FlashAttention fp16  13.7
TensorRT fp16        9.3
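Putting the A100 and T4 tables side by side shows that the ranking of techniques is not the same on both GPUs. The sketch below computes, from those two tables only, how much slower each technique is on the T4 and whether it still beats the PyTorch fp16 baseline there; the function names are illustrative.

```python
from typing import Dict

# Batch-size-1 latency in seconds, copied from the A100 and T4 tables above.
A100_LATENCY = {"PyTorch fp16": 5.77, "nvFuser fp16": 3.15,
                "FlashAttention fp16": 2.80, "TensorRT fp16": 1.68}
T4_LATENCY = {"PyTorch fp16": 16.2, "nvFuser fp16": 19.3,
              "FlashAttention fp16": 13.7, "TensorRT fp16": 9.3}


def t4_slowdown(a100: Dict[str, float], t4: Dict[str, float]) -> Dict[str, float]:
    """How many times slower each technique runs on the T4 than on the A100."""
    return {name: t4[name] / a100[name] for name in t4 if name in a100}


def faster_than_baseline(latency: Dict[str, float], baseline: str = "PyTorch fp16") -> Dict[str, bool]:
    """True where a technique has lower latency than the baseline on the same GPU."""
    return {name: value < latency[baseline] for name, value in latency.items() if name != baseline}


if __name__ == "__main__":
    for name, ratio in t4_slowdown(A100_LATENCY, T4_LATENCY).items():
        print(f"{name:20s} {ratio:.2f}x slower on T4")
    print(faster_than_baseline(T4_LATENCY))
```

On these numbers, nvFuser is the one technique that is slower than plain PyTorch on the T4 (19.3 s vs 16.2 s) even though it is about 1.8x faster on the A100, so results measured on one GPU do not necessarily carry over to another.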

Batched results - A100 GPU

The following results were obtained by varying batch_size from 1 to 24. Each cell shows latency (s) / GPU VRAM (GB); OOM means the run ran out of GPU memory.

[Graph: A100 GPU latency vs. batch size]

Project \ batch size  1             4             8             16           24
PyTorch fp16          5.77s/10.3GB  19.2s/18.5GB  36s/26.7GB    OOM
FlashAttention fp16   2.80s/7.5GB   9.1s/17GB     17.7s/29.5GB  OOM
TensorRT fp16         1.68s/8.1GB   OOM
AITemplate fp16       1.38s/4.83GB  4.25s/8.5GB   7.4s/14.5GB   15.7s/25GB   23.4s/36GB
ONNX (CUDA)           7.26s/13.3GB  OOM           OOM           OOM          OOM

Note: TensorRT fails to convert the UNet model from ONNX to TensorRT due to memory issues.
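Latency alone hides how much work each batch does. The sketch below parses the batched-table cells and converts them into throughput (images per second). It assumes each latency is the time to generate the whole batch, which the table does not state explicitly; the parsing helpers are illustrative, and the cells are copied from the table with "" where no value is reported.

```python
from typing import Dict, List, Optional, Tuple

BATCH_SIZES = [1, 4, 8, 16, 24]

# Cells copied from the batched A100 table: "<latency>s/<VRAM>GB",
# "OOM" for out of memory, "" where no value is reported.
BATCHED_A100: Dict[str, List[str]] = {
    "PyTorch fp16": ["5.77s/10.3GB", "19.2s/18.5GB", "36s/26.7GB", "OOM", ""],
    "FlashAttention fp16": ["2.80s/7.5GB", "9.1s/17GB", "17.7s/29.5GB", "OOM", ""],
    "TensorRT fp16": ["1.68s/8.1GB", "OOM", "", "", ""],
    "AITemplate fp16": ["1.38s/4.83GB", "4.25s/8.5GB", "7.4s/14.5GB", "15.7s/25GB", "23.4s/36GB"],
    "ONNX (CUDA)": ["7.26s/13.3GB", "OOM", "OOM", "OOM", "OOM"],
}


def parse_cell(cell: str) -> Optional[Tuple[float, float]]:
    """Turn "1.38s/4.83GB" into (1.38, 4.83); return None for "OOM" or empty cells."""
    if cell in ("", "OOM"):
        return None
    latency, vram = cell.split("/")
    return float(latency.rstrip("s")), float(vram.rstrip("GB"))


def throughput(row: List[str]) -> Dict[int, float]:
    """Images per second per batch size, assuming each latency covers the whole batch."""
    result = {}
    for batch_size, cell in zip(BATCH_SIZES, row):
        parsed = parse_cell(cell)
        if parsed is not None:  # skip OOM and unreported cells
            result[batch_size] = batch_size / parsed[0]
    return result


if __name__ == "__main__":
    for name, row in BATCHED_A100.items():
        rates = ", ".join(f"bs{bs}: {rate:.2f} img/s" for bs, rate in throughput(row).items())
        print(f"{name:20s} {rates}")
```

Under that assumption, the AITemplate row works out to about 0.72 images/s at batch size 1 and about 1.08 images/s at batch size 8, then stays roughly flat (about 1.02 to 1.03 images/s) at 16 and 24, so beyond batch size 8 the extra memory buys little additional throughput on this setup.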

Sample images generated

Click here to view the complete list of generated images.

Each optimization (PyTorch fp16, nvFuser fp16, FlashAttention fp16, TensorRT fp16 and AITemplate fp16) generated one image per prompt:

  • Super Mario learning to fly in an airport, Painting by Leonardo Da Vinci
  • The Easter bunny riding a motorcycle in New York City
  • Drone flythrough of a tropical jungle covered in snow

[Image grid: sample images per optimization and prompt]


🌎 Join our community

Team and contributors

x-stable-diffusion is a community-driven project with several AI systems engineers and researchers contributing to it.

It is currently maintained by Toan Do, Marcos Rivera, Sarthak Langde, Subhash GN, Riccardo Romagnoli, Roman Ageev, and Glenn Ko.

✅ Stochastic

Stochastic was founded with a vision to make deep learning optimization and deployment effortless. With our cloud platform, you can easily optimize and deploy your deep learning models with confidence, knowing that you are getting the best performance possible. Our platform automatically optimizes your models, benchmarking them on various evaluation metrics to ensure they are running at their peak.

And when it comes time to deploy, Stochastic has you covered with auto-scaling accelerated inference for models like BLOOM 176B, Stable Diffusion, and GPT-J. Plus, our platform is cloud-agnostic, supporting AWS, GCP, Azure, and Kubernetes clusters.

[Image: Stochastic X dashboard]

For a fully managed solution hosted on Stochastic: Sign up →
For private hosting on your cloud or on-premises: Contact us →
