Comments (8)

jwyang commented on May 13, 2024

Hi @zhangxgu @drahmad89, I have added a script for extracting concept embeddings for a customized concept pool; please take a look. Also, @zhangxgu, your code for extracting concept embeddings looks correct, except that you may want to use all the prompt templates and normalize the output text features.
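
The two suggestions above (use several prompt templates, normalize the text features) could be sketched roughly as follows. This is a minimal illustration, not the script from the repo: the three templates are only a sample, `encode_prompts` is a hypothetical callable standing in for the real text encoder, and normalizing both before and after averaging follows the usual CLIP zero-shot recipe, which is an assumption here.

```python
import torch
import torch.nn.functional as F

# A small sample of prompt templates (assumption: the real template list is longer).
TEMPLATES = [
    "a photo of a {}.",
    "a cropped photo of a {}.",
    "a close-up photo of a {}.",
]


def build_concept_embeddings(concepts, encode_prompts, templates=TEMPLATES):
    """Return an (N, D) tensor holding one unit-norm embedding per concept.

    encode_prompts is a hypothetical callable: list[str] -> (T, D) tensor.
    """
    feats = []
    with torch.no_grad():
        for concept in concepts:
            prompts = [t.format(concept) for t in templates]
            per_template = encode_prompts(prompts).float()   # (T, D)
            per_template = F.normalize(per_template, dim=-1)  # unit norm per template
            mean_feat = per_template.mean(dim=0)              # average over templates
            feats.append(F.normalize(mean_feat, dim=-1))      # renormalize the mean
    return torch.stack(feats, dim=0).cpu()                    # (N, D)


def dummy_encoder(prompts, dim=8):
    # Stand-in encoder for demonstration only; it ignores the prompt text.
    generator = torch.Generator().manual_seed(0)
    return torch.randn(len(prompts), dim, generator=generator)


embeddings = build_concept_embeddings(["cat", "dog"], dummy_encoder)
print(embeddings.shape)
```

With the OpenAI CLIP package from the snippet further down this thread, the stand-in could be replaced by something like `lambda p: model.encode_text(clip.tokenize(p).to(device))`.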

from regionclip.

jwyang commented on May 13, 2024

Hi, @drahmad89

You can refer to our demo code here:
https://huggingface.co/spaces/CVPR/regionclip-demo/blob/main/detectron2/modeling/meta_arch/clip_rcnn.py#L755

And you can build a text encoder here:
https://huggingface.co/spaces/CVPR/regionclip-demo/blob/main/detectron2/modeling/meta_arch/clip_rcnn.py#L593


drahmad89 commented on May 13, 2024

@jwyang
I want to get bounding boxes only for specific classes.
The documentation says: "put it in the folder ./datasets/lvis/lvis_v1_val.json. The file is used to specify object class names."
I replaced lvis_v1_val.json with a custom annotation JSON file that contains only 6 classes. I then ran zero-shot detection, but I always get classes from the LVIS dataset (section: Visualization on custom images).


jwyang commented on May 13, 2024

@drahmad89, visualization on custom images currently does not support user-specified queries. We have built a Hugging Face demo, shared above, which takes one category as the query. Per your request, we will add a customized demo to this repo that supports user-specified queries for a given image. Please stay tuned!


zhangxgu commented on May 13, 2024

Nice work, @jwyang!
I also need code to generate text embeddings for my own dataset. So far I have written the following, based on CLIP:

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

# Replace the placeholders with your own class names.
categories = ['XXX', 'XXX', 'XXX']

# One prompt per category, tokenized for the CLIP text encoder.
text = clip.tokenize(["a photo of a %s" % c for c in categories]).to(device)

with torch.no_grad():
    text_features = model.encode_text(text).cpu()

torch.save(text_features, 'xxx.pth')

Hope you can give me some advice on this code.


Jawing commented on May 13, 2024

The script's final output had shape (class_len, 1, emb_size) and was still on the GPU. I suggest adding torch.squeeze(concept_feats).cpu() after

concept_feats = torch.stack(concept_feats, 0)

I wasn't sure what format concepts.txt expects for the classes. If they are listed one per line, shouldn't the loop

for line in f:
    concept = line.strip()

replace the single call below?

concept = f.readline().strip()

Just some suggestions :)
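
To make the line-by-line reading concrete, here is a small standard-library sketch. It assumes concepts.txt holds one concept per line, which the thread does not confirm; `read_concepts` is a hypothetical helper name, and the file contents are made up for the demo.

```python
import os
import tempfile


def read_concepts(path):
    """Return the non-empty, stripped lines of a text file as a list of names."""
    with open(path, "r", encoding="utf-8") as f:
        # Iterating over the file yields every line, whereas a single
        # f.readline() call returns only the first one.
        return [line.strip() for line in f if line.strip()]


# Demo on a throwaway file (assumed format: one concept per line).
with tempfile.TemporaryDirectory() as tmp:
    concept_file = os.path.join(tmp, "concepts.txt")
    with open(concept_file, "w", encoding="utf-8") as f:
        f.write("traffic light\nfire hydrant\n\nstop sign\n")
    concepts = read_concepts(concept_file)

print(concepts)
```

Skipping blank lines is a design choice of this sketch; it avoids producing an embedding for an empty string when the file ends with extra newlines.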


jwyang commented on May 13, 2024

Hi, @Jawing, good suggestion!


Jawing commented on May 13, 2024

There seems to be a bug in how extract_concept_features.py generates concept embeddings.

The generated embeddings seem to correspond to the first letter of each concept, which can produce duplicates and isn't what we want. I propose the correction below.

    concept_feats = []
    with open(concept_file, 'r') as f:
        concepts = []
        for line in f:
            concept = line.strip()
            concepts.append(concept)
        with torch.no_grad():
            token_embeddings_concepts = pre_tokenize(concepts).to(model.device)
            for token_embeddings in token_embeddings_concepts:
                text_features = model.lang_encoder.encode_text(token_embeddings)
                # average over all templates
                text_features = text_features.mean(0, keepdim=True)
                concept_feats.append(text_features)
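
Because the loop above appends tensors of shape (1, emb_size), stacking them gives (class_len, 1, emb_size), which is the shape issue raised earlier in the thread. The sketch below shows one way to flatten the result, using random stand-in tensors instead of real text features; the sizes are arbitrary assumptions.

```python
import torch

emb_size = 4
num_templates = 3
num_concepts = 5

# Stand-ins for the per-concept results of the loop: each is (1, emb_size)
# because mean(0, keepdim=True) keeps the template axis as a singleton.
concept_feats = [
    torch.randn(num_templates, emb_size).mean(0, keepdim=True)
    for _ in range(num_concepts)
]

stacked = torch.stack(concept_feats, 0)   # (num_concepts, 1, emb_size)
concept_emb = stacked.squeeze(1).cpu()    # (num_concepts, emb_size), on the CPU

# torch.cat along dim 0 reaches the same 2-D layout without the extra axis.
concatenated = torch.cat(concept_feats, 0)

print(stacked.shape, concept_emb.shape, concatenated.shape)
```

Passing the axis explicitly, `squeeze(1)`, is deliberate: a bare `torch.squeeze(...)` would also drop the first dimension when there is only one concept.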

