
Comments (5)

ecoxial2007 commented on September 3, 2024

Thank you for your interest in my work. I have uploaded the code for processing Next-QA with GLIP, which can be found in src/tools/README.md and src/tools/extract_glip_bboxes.py.

The item_dict['video_features'] refers to CLIP's [cls] token for each frame, and item_dict['bbox_features'] holds the features extracted by CLIP from the regions produced by GLIP's bbox extraction.

It's a bit cumbersome, but necessary since GLIP's ROI features cannot be directly extracted. Furthermore, even if extracted, they wouldn't align with CLIP's BERT representation.

from lgva_videoqa.

SWXxing commented on September 3, 2024

Thank you for your reply. I see that extract_embedding.py loads the pre-trained CLIP model to obtain image features for a whole image, but it does not show how to extract region features for the bboxes generated by GLIP.
Could you provide details on how to use CLIP to extract feature representations of the regions for given bboxes?

ecoxial2007 commented on September 3, 2024

Using CLIP to extract features from a bbox region is straightforward: crop the region out of the original image with OpenCV or Pillow, then run CLIP on the cropped image.
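A minimal sketch of the cropping step with NumPy and Pillow (the frame array and box coordinates here are made up for illustration; the resulting PIL image would then go through CLIP's preprocessing transform and model.encode_image, as shown later in this thread):

```python
import numpy as np
from PIL import Image

# Hypothetical frame and GLIP box (x1, y1, x2, y2) in pixel coordinates.
image_np = np.zeros((224, 224, 3), dtype=np.uint8)
bbox = (10, 20, 110, 170)

x1, y1, x2, y2 = map(int, bbox)
region_np = image_np[y1:y2, x1:x2]       # NumPy images are indexed [row, col] = [y, x]
region_pil = Image.fromarray(region_np)  # PIL image, ready for CLIP's preprocess transform

print(region_pil.size)                   # PIL reports (width, height)
```

Note the axis order: the y range selects rows and the x range selects columns, which is an easy place to introduce a silent bug.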

bxwldljh commented on September 3, 2024

Hi, could you provide the code for extracting region features with CLIP after GLIP's bbox extraction?

ecoxial2007 commented on September 3, 2024

Sorry for the late response, I've been quite busy lately. Here is a simple code example:

import torch
from PIL import Image

x1, y1, x2, y2 = map(int, bbox)                         # GLIP box in pixel coordinates
cropped_image = image_np[y1:y2, x1:x2]                  # NumPy frames are indexed [y, x]
cropped_image_pil = Image.fromarray(cropped_image)
tensor = val_transform(cropped_image_pil).unsqueeze(0)  # add a batch dimension
with torch.no_grad():
    output = model.encode_image(tensor)                 # CLIP features for the region

If I find the time, I will organize and release the complete code.
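To build per-frame bbox features, the same crop would be repeated for every GLIP box in a frame. A small sketch of that loop with only NumPy and Pillow (the frame and boxes are hypothetical; each returned crop would then be passed through val_transform and model.encode_image from the snippet above, and the results stacked, e.g. with torch.stack):

```python
import numpy as np
from PIL import Image

def crop_boxes(image_np, boxes):
    """Crop every (x1, y1, x2, y2) box out of a frame; returns a list of PIL images."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
        crops.append(Image.fromarray(image_np[y1:y2, x1:x2]))  # rows = y, cols = x
    return crops

# Hypothetical frame and two GLIP boxes.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
crops = crop_boxes(frame, [(0, 0, 100, 120), (50, 60, 200, 180)])
print([c.size for c in crops])
```

Encoding the crops one by one works, but batching all crops of a frame into a single tensor before encode_image is usually faster on GPU.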
