Giter VIP home page Giter VIP logo

clip-onnx's Introduction

CLIP-ONNX

It is a simple library to speed up CLIP inference up to 3x (K80 GPU)!

Open In Colab Open AI CLIP

Open In Colab RuCLIP Example

Open In Colab RuCLIP tiny Example

Usage

Install clip-onnx module and requirements first. Use this trick

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git
!pip install git+https://github.com/openai/CLIP.git
!pip install onnxruntime-gpu

Example in 3 steps

  1. Download CLIP image from repo
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
  1. Load standard CLIP model, image, text on cpu
import clip
from PIL import Image
import numpy as np

# onnx cannot work with cuda
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)

# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
image_onnx = image.detach().cpu().numpy().astype(np.float32)

# batch first
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
text_onnx = text.detach().cpu().numpy().astype(np.int32)
  1. Create CLIP-ONNX object to convert model to onnx
from clip_onnx import clip_onnx

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

onnx_model = clip_onnx(model, visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model.start_sessions(providers=["CPUExecutionProvider"]) # cpu mode
  1. Use for standard CLIP API. Batch inference
image_features = onnx_model.encode_image(image_onnx)
text_features = onnx_model.encode_text(text_onnx)

logits_per_image, logits_per_text = onnx_model(image_onnx, text_onnx)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()

print("Label probs:", probs)  # prints: [[0.9927937  0.00421067 0.00299571]]

Enjoy the speed

Load saved model

Example for ViT-B/32 from Model Zoo

!wget https://clip-as-service.s3.us-east-2.amazonaws.com/models/onnx/ViT-B-32/visual.onnx
!wget https://clip-as-service.s3.us-east-2.amazonaws.com/models/onnx/ViT-B-32/textual.onnx
onnx_model = clip_onnx(None)
onnx_model.load_onnx(visual_path="visual.onnx",
                     textual_path="textual.onnx",
                     logit_scale=100.0000) # model.logit_scale.exp()
onnx_model.start_sessions(providers=["CPUExecutionProvider"])

Model Zoo

Models of the original CLIP can be found on this page.
They are not part of this library but should work correctly.

If something doesn't work

It happens that onnx does not convert the model the first time, in these cases it is worth trying to run it again.

If it doesn't help, it makes sense to change the export settings.

Model export options in onnx looks like this:

DEFAULT_EXPORT = dict(input_names=['input'], output_names=['output'],
                      export_params=True, verbose=False, opset_version=12,
                      do_constant_folding=True,
                      dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})

You can change them pretty easily.

from clip_onnx.utils import DEFAULT_EXPORT

DEFAULT_EXPORT["opset_version"] = 15

Alternative option (change only visual or textual):

from clip_onnx import clip_onnx
from clip_onnx.utils import DEFAULT_EXPORT

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

textual_export_params = DEFAULT_EXPORT.copy()
textual_export_params["dynamic_axes"] = {'input': {1: 'batch_size'},
                                         'output': {0: 'batch_size'}}
textual_export_params["opset_version"] = 12

Textual = lambda x: x

onnx_model = clip_onnx(model.cpu(), visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(dummy_input_image, dummy_input_text, verbose=True,
                        textual_wrapper=Textual,
                        textual_export_params=textual_export_params)

Best practices

See benchmark.md

Examples

See examples folder for more details
Some parts of the code were taken from the post. Thank you neverix for this notebook.

clip-onnx's People

Contributors

lednik7 avatar blahblahhhj avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.