Model description: https://huggingface.co/jinaai/jina-clip

jinaai/jina-clip-v1: support for model names with prefixes (transformers.js, open issue)

do-me commented on July 29, 2024

jinaai/jina-clip-v1: support for model names with prefixes


Comments (3)

xenova commented on July 29, 2024

You can specify model_file_name as one of the options: .from_pretrained(model_id, { model_file_name: 'model' }) :)
Although, do note that the weights I uploaded only work for Transformers.js v3 (unless you manually override the onnxruntime-web/node version to >= 1.16.0).
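As a minimal sketch of the option mentioned above: the helper below simply forwards model_file_name to from_pretrained. The helper name loadWithFileName is hypothetical (not part of Transformers.js), and the value 'model' is taken from the comment above rather than verified against the repository's actual file names.

```javascript
// Sketch only: `loadWithFileName` is a hypothetical helper, not a Transformers.js API.
// `ModelClass` is any class exposing a static `from_pretrained(model_id, options)`,
// e.g. CLIPTextModelWithProjection from the README code.
function loadWithFileName(ModelClass, modelId, fileName) {
    // Forward the file name through the `model_file_name` option.
    return ModelClass.from_pretrained(modelId, { model_file_name: fileName });
}

// Usage (inside an async context, with the library imported as in the README):
// const text_model = await loadWithFileName(CLIPTextModelWithProjection, 'jinaai/jina-clip-v1', 'model');
```

Passing the class in as an argument keeps the sketch independent of which package version is installed.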

See the README for example Transformers.js code:

import { AutoTokenizer, CLIPTextModelWithProjection, AutoProcessor, CLIPVisionModelWithProjection, RawImage, cos_sim } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('jinaai/jina-clip-v1');
const text_model = await CLIPTextModelWithProjection.from_pretrained('jinaai/jina-clip-v1');

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch32');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('jinaai/jina-clip-v1');

// Run tokenization
const texts = ['A blue cat', 'A red cat'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute text embeddings
const { text_embeds } = await text_model(text_inputs);

// Read images and run processor
const urls = [
    'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
    'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg'
];
const image = await Promise.all(urls.map(url => RawImage.read(url)));
const image_inputs = await processor(image);

// Compute vision embeddings
const { image_embeds } = await vision_model(image_inputs);

//  Compute similarities
console.log(cos_sim(text_embeds[0].data, text_embeds[1].data)) // text embedding similarity
console.log(cos_sim(text_embeds[0].data, image_embeds[0].data)) // text-image cross-modal similarity
console.log(cos_sim(text_embeds[0].data, image_embeds[1].data)) // text-image cross-modal similarity
console.log(cos_sim(text_embeds[1].data, image_embeds[0].data)) // text-image cross-modal similarity
console.log(cos_sim(text_embeds[1].data, image_embeds[1].data)) // text-image cross-modal similarity
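For readers unfamiliar with the metric used above, here is a rough sketch of what a cosine similarity between two embedding vectors computes. The function name cosineSimilarity is a hypothetical stand-in for illustration; the README code uses the library's own cos_sim, whose internals may differ.

```javascript
// Hypothetical stand-in for illustration; not the library's implementation.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error('Vectors must have the same length');
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; ++i) {
        dot += a[i] * b[i];     // accumulate the dot product
        normA += a[i] * a[i];   // squared length of a
        normB += b[i] * b[i];   // squared length of b
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (same direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```

Values closer to 1 mean the two embeddings point in more similar directions, which is how the text-text and text-image pairs above are compared.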


wujohns commented on July 29, 2024

I'm really frustrated with the v3 requirement, because there is no v3 release for Node.js and the new ONNX package only works with v3.


wujohns commented on July 29, 2024

The code does not work at all, and when I try to use optimum-cli to build the ONNX model, optimum does not support the nomic-bert model type (nomic-embed-text-v1.5 can be built, but nomic-embed-vision-v1.5 fails).
So there is no way to run the demo code in transformers.js, even with the stable version.
If v3 is not ready, please do not release ONNX weights that only work with v3.
