
vltvg's People

Contributors

yangli18


vltvg's Issues

Split of Flickr30K dataset

Hi, I found that the split file flickr_train.pth contains 427193 entries, whereas the training split is supposed to have 29,783 samples. Is this a mistake in data.tar? If so, how can we get the correct split files? Thanks!

How to show discriminative features

Hi,
I only want to inspect the discriminative feature, but the tensor I obtained has size (400,16,256). How can I display it as an image, as shown in your article?
Thanks a lot.
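
One possible way to turn such a tensor into an image is sketched below. It rests on assumptions that the repository does not confirm here: the shape (400, 16, 256) is read as (H*W, batch, channels) with a 20x20 spatial grid in row-major order, and the channel-wise L2 norm is used as the scalar shown per location. The function name `feature_to_heatmap` is hypothetical, and the paper's figures may have been produced differently.

```python
import numpy as np

def feature_to_heatmap(feat, batch_index=0, grid_hw=(20, 20)):
    """Collapse a (H*W, B, C) feature array into a normalized (H, W) map.

    Assumptions (not taken from the repository): the first axis holds
    flattened spatial positions in row-major order, and the L2 norm over
    channels is an acceptable per-position summary.
    """
    h, w = grid_hw
    if feat.shape[0] != h * w:
        raise ValueError("first axis must equal H*W")
    # One scalar per spatial position for the selected batch element.
    per_pos = np.linalg.norm(feat[:, batch_index, :], axis=-1)  # shape (H*W,)
    heat = per_pos.reshape(h, w)
    span = heat.max() - heat.min()
    # Min-max normalize to [0, 1]; a constant map becomes all zeros.
    return (heat - heat.min()) / span if span > 0 else np.zeros_like(heat)

# Random stand-in with the shape from the question, not real model output.
rng = np.random.default_rng(0)
feat = rng.standard_normal((400, 16, 256)).astype(np.float32)
heatmap = feature_to_heatmap(feat)
```

The resulting 20x20 array can then be upsampled to the input resolution and overlaid on the image, for example with matplotlib's `imshow`.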

Inference API for REC

Hi, thanks for sharing this nice work!
Could you please provide an inference API so that, for example, the user only needs to provide the path to the image and the corresponding description?

Question about Eq.1

Regarding Eq.1, I noticed the phrase "compute their semantic correlation as the verification score".
My questions:

  1. Does it work as a similarity function?
  2. Could it be replaced by other similarity functions, such as cosine ...?

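Hello

Hello, I tried to retrain your code on two 3090 GPUs with batchsize=32, leaving all other parameters unchanged. I trained, validated, and tested on the referit dataset, and my best results are val: 71.31, test: 69.18. In the paper, the best test result you report with resnet50 is 71.60, which is a fairly large gap from mine. How should I adjust the settings, or is there a hidden trick?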

How to visualize the attention map

Hi, thanks for your great work. I am trying to reproduce it. Could you tell me how you visualized the attention maps for the pictures in your paper?

Doubts about the hyperparameter args.box_xyxy

Thanks for your excellent work and releasing your code!
I found that if args.box_xyxy is not set to True, then when computing the GIoU loss the code treats the boxes as cxcywh and converts them to xyxy.

VLTVG/models/VLTVG.py

Lines 119 to 121 in e5be26b

if not self.box_xyxy:
    src_boxes = box_ops.box_cxcywh_to_xyxy(src_boxes)
    target_boxes = box_ops.box_cxcywh_to_xyxy(target_boxes)

But when the dataset is built, the target box format is always xyxy:

VLTVG/datasets/dataset.py

Lines 111 to 116 in e5be26b

if not (self.dataset == 'referit' or self.dataset == 'flickr'):  # for refcoco, etc
    # xywh to xyxy
    for bbox in self.bboxs:
        bbox = np.array(bbox, dtype=np.float32)
        bbox[2:] += bbox[:2]
        self.covert_bbox.append(bbox)

Is this a problem?
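
To make the concern concrete, here is a small sketch of the three box formats involved. The helpers are standalone re-implementations written for illustration, not the repository's `box_ops`, and the sample box is made up. The quoted lines do not show whether a later transform in the data pipeline converts the xyxy targets to cxcywh before the loss is computed, so this only illustrates what would go wrong if no such conversion happened.

```python
def xywh_to_xyxy(box):
    """(x, y, w, h) with a top-left corner -> (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

def xyxy_to_cxcywh(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h) with the box center."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def cxcywh_to_xyxy(box):
    """(cx, cy, w, h) with the box center -> (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

raw = (10.0, 20.0, 30.0, 40.0)          # hypothetical annotation in xywh
target_xyxy = xywh_to_xyxy(raw)         # what the dataset code above stores

# If the target is converted to cxcywh somewhere before the loss,
# the round trip recovers the original corners.
round_trip = cxcywh_to_xyxy(xyxy_to_cxcywh(target_xyxy))

# If an xyxy target reached the loss unchanged and were read as cxcywh,
# the corners would be wrong.
misread = cxcywh_to_xyxy(target_xyxy)

print(target_xyxy)  # (10.0, 20.0, 40.0, 60.0)
print(round_trip)   # (10.0, 20.0, 40.0, 60.0)
print(misread)      # (-10.0, -10.0, 30.0, 50.0)
```

Checking which of the two cases applies comes down to finding where, if anywhere, the targets are converted between dataset construction and the loss.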

Question about the verification score

I cannot understand why S(x,y) in Eq.1 can be seen as the relevance score. Also, the code computes verify_score by element-wise multiplication without a transpose, which is a little different from Eq.1. Could you explain this further? Thanks a lot!

text_embed = self.text_proj(text_info)
img_embed = self.img_proj(img_feat)
verify_score = (F.normalize(img_embed, p=2, dim=-1) *
                F.normalize(text_embed, p=2, dim=-1)).sum(dim=-1, keepdim=True)
verify_score = self.tf_scale * \
               torch.exp(-(1 - verify_score).pow(self.tf_pow)
                         / (2 * self.tf_sigma**2))
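
Reading only the snippet quoted above, the element-wise product of the two L2-normalized embeddings followed by a sum over the last axis is a per-location dot product, that is, the cosine similarity between each visual feature and the text embedding. With a single text vector it yields the same numbers as a matrix product with the transposed text embedding, which may account for the missing transpose. The NumPy sketch below checks this on random data. The shapes and the values of `scale`, `sigma`, and `power` are placeholders chosen for illustration, not the repository's settings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Divide by the L2 norm along the last axis, guarding against zero."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def verify_score(img_embed, text_embed, scale=1.0, sigma=0.5, power=2.0):
    """Cosine similarity per location, then a Gaussian-shaped modulation.

    scale, sigma and power are illustrative placeholders.
    """
    cos = (l2_normalize(img_embed) * l2_normalize(text_embed)).sum(
        axis=-1, keepdims=True)
    cos = np.clip(cos, -1.0, 1.0)  # guard against rounding past +/-1
    score = scale * np.exp(-(1.0 - cos) ** power / (2.0 * sigma ** 2))
    return score, cos

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 3))    # 4 visual locations, 3 channels
text = rng.standard_normal((1, 3))   # one text vector, broadcast over locations

score, cos = verify_score(img, text)

# The same cosine values written as a matrix product with a transpose.
cos_matrix_form = l2_normalize(img) @ l2_normalize(text).T  # shape (4, 1)
same = np.allclose(cos, cos_matrix_form)

# A feature identical to the text vector has cosine 1, so score == scale.
score_identical, _ = verify_score(text, text, scale=1.0)
```

Because the modulation is a monotonically increasing function of the cosine similarity on [-1, 1], higher similarity always maps to a higher score, peaking at `scale` when the two embeddings point in the same direction.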

The ReferItGame and Flickr30k datasets are unavailable

Thank you for your solid work. I followed your repository and ran into some problems.

First, the link to the ReferItGame dataset in the download_data.sh script is unavailable, so I cannot download the file from it.
Second, no way to download the Flickr30k dataset is provided.

Can you give some suggestions?

About the verification score

Hello, I have some doubts about how the verification score is calculated in formula 1. Can you explain why it is calculated this way?
Can you also provide the code for visualizing the verification score?
Thank you very much!
