
Comments (5)

ecoxial2007 commented on September 3, 2024

Thank you for your interest in my work. I have uploaded the code for processing Next-QA with GLIP, which can be found in src/tools/README.md and src/tools/extract_glip_bboxes.py.

The item_dict['video_features'] refers to CLIP's [cls] token for each frame, and item_dict['bbox_features'] holds the features extracted by CLIP from the regions produced by GLIP's bbox extraction.

It's a bit cumbersome, but necessary since GLIP's ROI features cannot be directly extracted. Furthermore, even if extracted, they wouldn't align with CLIP's BERT representation.

from lgva_videoqa.

SWXxing commented on September 3, 2024

Thank you for your reply. I see that extract_embedding.py loads the pre-trained CLIP model to obtain image features for a whole image, but it does not show how to extract region features for the bboxes generated by GLIP.
Could you provide details on how to use CLIP to extract feature representations of the regions for given bboxes?

ecoxial2007 commented on September 3, 2024

Using CLIP to extract features from a bbox region is straightforward: crop the region out of the original image with OpenCV or Pillow, then run CLIP on the cropped image.
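A minimal sketch of the cropping step with NumPy and Pillow (the frame array and box coordinates here are made up for illustration; the resulting PIL image would then go through CLIP's preprocessing transform and model.encode_image, as shown later in this thread):

```python
import numpy as np
from PIL import Image

# Hypothetical frame and GLIP box (x1, y1, x2, y2) in pixel coordinates.
image_np = np.zeros((224, 224, 3), dtype=np.uint8)
bbox = (10, 20, 110, 170)

x1, y1, x2, y2 = map(int, bbox)
region_np = image_np[y1:y2, x1:x2]       # NumPy images are indexed [row, col] = [y, x]
region_pil = Image.fromarray(region_np)  # PIL image, ready for CLIP's preprocess transform

print(region_pil.size)                   # PIL reports (width, height)
```

Note the axis order: the y range selects rows and the x range selects columns, which is an easy place to introduce a silent bug.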

bxwldljh commented on September 3, 2024

Hi, could you provide the code for extracting region features with CLIP after GLIP's bbox extraction?

ecoxial2007 commented on September 3, 2024

Sorry for the late response, I've been quite busy lately. Here is a simple code example:

import torch
from PIL import Image

x1, y1, x2, y2 = map(int, bbox)                         # GLIP box in pixel coordinates
cropped_image = image_np[y1:y2, x1:x2]                  # NumPy frames are indexed [y, x]
cropped_image_pil = Image.fromarray(cropped_image)
tensor = val_transform(cropped_image_pil).unsqueeze(0)  # add a batch dimension
with torch.no_grad():
    output = model.encode_image(tensor)                 # CLIP features for the region

If I find the time, I will organize and release the complete code.
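To build per-frame bbox features, the same crop would be repeated for every GLIP box in a frame. A small sketch of that loop with only NumPy and Pillow (the frame and boxes are hypothetical; each returned crop would then be passed through val_transform and model.encode_image from the snippet above, and the results stacked, e.g. with torch.stack):

```python
import numpy as np
from PIL import Image

def crop_boxes(image_np, boxes):
    """Crop every (x1, y1, x2, y2) box out of a frame; returns a list of PIL images."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
        crops.append(Image.fromarray(image_np[y1:y2, x1:x2]))  # rows = y, cols = x
    return crops

# Hypothetical frame and two GLIP boxes.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
crops = crop_boxes(frame, [(0, 0, 100, 120), (50, 60, 200, 180)])
print([c.size for c in crops])
```

Encoding the crops one by one works, but batching all crops of a frame into a single tensor before encode_image is usually faster on GPU.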
