yangli18 / vltvg


85.0 2.0 4.0 617 KB

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022

Python 97.42% Shell 2.58%

visual-grounding vision-language visual-linguistic cross-modal

vltvg's People


sunyuxi z-w-wang uason-chen lparolari

vltvg's Issues

What GPU model and CUDA version did you use?

Split of Flickr30K dataset

Hi, I found that the split file flickr_train.pth contains 427,193 entries, whereas the training set is supposed to have 29,783 images. Is this a mistake in data.tar, or how can we get the correct split files? Thanks!
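One possible explanation, not confirmed by the authors, is that the split file stores one entry per phrase-box pair rather than one per image, so many entries share the same image. A minimal sketch for checking this, assuming each entry is a tuple whose first element is the image filename (an assumption about the file layout); `count_split` is a hypothetical helper and the toy entries are made up:

```python
from collections import Counter

def count_split(entries):
    """Return (number of entries, number of distinct images).

    Assumes each entry is a sequence whose first element is the image
    filename; adjust the index if the split file is laid out differently.
    """
    images = Counter(entry[0] for entry in entries)
    return len(entries), len(images)

# Toy split: three phrase-box entries that refer to only two images.
toy = [
    ("1000092795.jpg", [10, 20, 50, 60], "a man"),
    ("1000092795.jpg", [30, 40, 90, 120], "a red shirt"),
    ("10002456.jpg", [5, 5, 100, 80], "a dog"),
]
print(count_split(toy))  # (3, 2)
```

With PyTorch installed, `entries = torch.load("flickr_train.pth")` followed by `count_split(entries)` would show whether the number of distinct images is close to 29,783 (depending on the PyTorch version, `weights_only=False` may be needed to load non-tensor data).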


Are pretrained weights available for inference testing?

Are there pretrained weights available for running inference tests?

How to show discriminative features

Hi,
I only want to view the discriminative feature, but the tensor I obtained has shape (400, 16, 256). How can I display it as an image, as shown in your paper?
Thanks a lot.
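A common way to turn such a tensor into an image is to reduce the channel dimension to one value per spatial position and reshape. The sketch below assumes, without confirmation from the authors, that the (400, 16, 256) tensor is laid out as (H*W, batch, channels), with 400 = 20 × 20 positions (a 640-pixel input downsampled by 32), batch size 16, and 256 channels. Using the channel L2 norm as the per-position value is an arbitrary choice, and `feature_heatmap` is a hypothetical helper:

```python
import numpy as np

def feature_heatmap(feat, index=0, h=20, w=20):
    """Collapse a (H*W, B, C) feature tensor into an (h, w) map in [0, 1].

    Assumes the first axis enumerates spatial positions in row-major
    order; uses the L2 norm over channels as the per-position strength.
    """
    assert feat.shape[0] == h * w, "first axis must equal h * w"
    strength = np.linalg.norm(feat[:, index, :], axis=-1)  # shape (h*w,)
    heat = strength.reshape(h, w)
    heat = heat - heat.min()
    peak = heat.max()
    return heat / peak if peak > 0 else heat

feat = np.random.rand(400, 16, 256).astype(np.float32)  # stand-in for the real tensor
heat = feature_heatmap(feat, index=0)
print(heat.shape)  # (20, 20)
```

The resulting map can then be upsampled to the input resolution and shown with `matplotlib.pyplot.imshow`, optionally over the image. When starting from the PyTorch tensor, call `.detach().cpu().numpy()` first.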

Inference API for REC

Hi, thanks for sharing this nice work!
Could you please provide an inference API so that, for example, the user only needs to provide the image path and the corresponding description?
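Whatever form such an API takes, its last step has to map the predicted box from the network's padded input back to the original image. A minimal sketch, assuming (not confirmed from the repository) that the image is resized so its longer side is 640 pixels, padded symmetrically to a 640 × 640 square, and that the model outputs a normalized (cx, cy, w, h) box. `box_to_original` is a hypothetical helper, and the model call itself is omitted:

```python
def box_to_original(box, img_w, img_h, size=640):
    """Map a normalized (cx, cy, w, h) box on the padded square input
    back to pixel (x1, y1, x2, y2) coordinates in the original image.

    Assumes longer-side resize to `size` followed by centered padding.
    """
    scale = size / max(img_w, img_h)
    new_w, new_h = img_w * scale, img_h * scale
    pad_x, pad_y = (size - new_w) / 2, (size - new_h) / 2
    cx, cy, bw, bh = (v * size for v in box)  # to padded-input pixels
    x1, y1 = cx - bw / 2 - pad_x, cy - bh / 2 - pad_y
    x2, y2 = cx + bw / 2 - pad_x, cy + bh / 2 - pad_y
    return tuple(v / scale for v in (x1, y1, x2, y2))  # undo the resize

print(box_to_original((0.5, 0.5, 0.5, 0.25), img_w=1280, img_h=640))
# (320.0, 160.0, 960.0, 480.0)
```

If the actual preprocessing pads only on the right and bottom, the padding offsets would be zero instead of half the slack.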

Hello, the download link for the "ReferitGame" dataset no longer works. Could you share the file another way? Thank you very much!

Hello, thanks for your work. Unfortunately, the download link for the "ReferitGame" dataset no longer works. Could you share the file another way? Thank you very much!

About GFLOPs and parameter counts

Hello, does the project include scripts for computing these two metrics reported in the paper? Thanks!
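As general background, independent of any script in this repository: for a PyTorch model the parameter count is usually obtained with `sum(p.numel() for p in model.parameters())`, while FLOPs are typically estimated by summing per-layer multiply-accumulate (MAC) counts, often with a third-party profiler. The per-layer arithmetic for a 2-D convolution is sketched below. `conv2d_cost` is a hypothetical helper. Note that some papers report MACs as FLOPs while others report 2 × MACs.

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True, groups=1):
    """Return (parameters, MACs) for a 2-D convolution layer.

    Each output element needs (c_in / groups) * k * k multiply-accumulates.
    """
    weights = c_out * (c_in // groups) * k * k
    params = weights + (c_out if bias else 0)
    macs = weights * h_out * w_out
    return params, macs

# ResNet-50 stem: 7x7 conv, 3 -> 64 channels, stride 2,
# 224x224 input -> 112x112 output, no bias.
params, macs = conv2d_cost(3, 64, 7, 112, 112, bias=False)
print(params, macs)  # 9408 118013952
```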

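Has anyone reproduced the accuracy reported in the paper?

Has anyone reproduced the accuracy reported in the paper? My results are still far below it. Feel free to contact me to discuss (WeChat: 13330411658).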


Question about Eq. 1

Regarding Eq. 1, I noticed the phrase "compute their semantic correlation as the verification score".
My questions:

Does it work as a similarity function?
Could it be replaced by other similarity functions, such as cosine similarity?
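For reference, the `verify_score` snippet quoted in another issue on this page computes the score as follows. Here x and y are the projected visual and linguistic features at one position, and α, β, σ correspond to `tf_scale`, `tf_pow`, and `tf_sigma`. This is a reading of that code, not the paper's own notation:

```latex
s(x, y) = \alpha \exp\!\left(-\frac{\left(1 - \hat{x}^{\top}\hat{y}\right)^{\beta}}{2\sigma^{2}}\right),
\qquad \hat{x} = \frac{x}{\lVert x \rVert_2}, \quad \hat{y} = \frac{y}{\lVert y \rVert_2}
```

The inner product of the normalized vectors is the cosine similarity, so 1 − cos lies in [0, 2]. For α > 0 and β > 0 the score therefore increases monotonically with cosine similarity. Under this reading, plain cosine similarity would rank positions in the same order; the Gaussian mapping mainly reshapes the range and sharpness of the scores.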

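Hello

Hello, I tried retraining your code on two RTX 3090 GPUs with batchsize=32, leaving all other parameters unchanged. I trained, validated, and tested on the ReferIt dataset, and my best results are val: 71.31, test: 69.18. Your paper reports a best test accuracy of 71.60 for ResNet-50, so the gap is fairly large. How should I adjust the settings, or are there any hidden tricks?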

How to visualize the attention map

Hi, thanks for your great work. I am trying to reproduce it. Could you tell me how you visualized the attention maps for the pictures in your paper?
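Here is a generic recipe, not necessarily the one used for the paper's figures. Take a low-resolution attention or score map (for example 20 × 20), upsample it to the image size, and alpha-blend it with the image. The sketch below uses nearest-neighbor upsampling via `np.kron` and assumes the image size is an integer multiple of the map size. `overlay` is a hypothetical helper:

```python
import numpy as np

def overlay(image, attn, alpha=0.5):
    """Blend a low-resolution map into an RGB image.

    image: (H, W, 3) float array in [0, 1]; attn: (h, w) non-negative map,
    where H and W are integer multiples of h and w.
    """
    H, W, _ = image.shape
    h, w = attn.shape
    assert H % h == 0 and W % w == 0, "image size must be a multiple of map size"
    attn = attn / attn.max() if attn.max() > 0 else attn  # normalize to [0, 1]
    up = np.kron(attn, np.ones((H // h, W // w)))  # nearest-neighbor upsampling
    heat = np.stack([up, np.zeros_like(up), 1.0 - up], axis=-1)  # red = high, blue = low
    return (1 - alpha) * image + alpha * heat

image = np.full((640, 640, 3), 0.5)
attn = np.random.rand(20, 20)
blended = overlay(image, attn)
print(blended.shape)  # (640, 640, 3)
```

`matplotlib.pyplot.imshow(blended)` displays the result. For a smoother look, bilinear resizing (for example with `cv2.resize`) and a colormap can replace the nearest-neighbor upsampling and the fixed red/blue encoding.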

Doubts about the hyperparameter args.box_xyxy

Thanks for your excellent work and for releasing your code!
I noticed that if args.box_xyxy is not set to True, the code treats the box format as cxcywh when computing the GIoU loss and converts it to xyxy.

VLTVG/models/VLTVG.py

Lines 119 to 121 in e5be26b

if not self.box_xyxy:
    src_boxes = box_ops.box_cxcywh_to_xyxy(src_boxes)
    target_boxes = box_ops.box_cxcywh_to_xyxy(target_boxes)

However, when the dataset is built, the target box format is always xyxy:

VLTVG/datasets/dataset.py

Lines 111 to 116 in e5be26b

if not (self.dataset == 'referit' or self.dataset == 'flickr'):  # for refcoco, etc
    # xywh to xyxy
    for bbox in self.bboxs:
        bbox = np.array(bbox, dtype=np.float32)
        bbox[2:] += bbox[:2]
        self.covert_bbox.append(bbox)

Is this a problem?
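The sketch below shows what is at stake by applying the usual cxcywh-to-xyxy conversion (as in DETR-style `box_cxcywh_to_xyxy`) to a box that is already in xyxy form. Whether this actually happens during training depends on whether a later data transform converts the targets to cxcywh before the loss, which the two snippets above do not show. `cxcywh_to_xyxy` here is a plain-Python stand-in, not the repository's function:

```python
def cxcywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

box_xyxy = (10.0, 20.0, 50.0, 60.0)  # a valid 40 x 40 box in xyxy form

# Consistent: convert the box to cxcywh first, then back.
as_cxcywh = ((10 + 50) / 2, (20 + 60) / 2, 50 - 10, 60 - 20)
print(cxcywh_to_xyxy(as_cxcywh))  # (10.0, 20.0, 50.0, 60.0)

# Mismatch: treating the xyxy box as if it were cxcywh.
print(cxcywh_to_xyxy(box_xyxy))  # (-15.0, -10.0, 35.0, 50.0)
```

If xyxy targets did reach this conversion unchanged, the GIoU loss would be computed against distorted target boxes like the second one.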

Visualize the attention map for a point

Thank you for your great work.
Could you tell me how to visualize the attention map for a point in Fig.4, or share its code?
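For a single point, the usual approach is to take the attention weights that the query at that position assigns to every key position and reshape them to the feature-map grid. This is a minimal sketch under several assumptions: the weights are available as an (H*W, H*W) array for one image with row-major flattening, the input is 640 × 640, and the feature map is 20 × 20. For instance, `torch.nn.MultiheadAttention` returns weights of shape (batch, target length, source length), averaged over heads by default. `point_attention` is a hypothetical helper:

```python
import numpy as np

def point_attention(attn, x, y, img_size=640, grid=20):
    """Return the (grid, grid) attention map for the query nearest (x, y).

    attn: (grid*grid, grid*grid) weights, rows = queries, columns = keys,
    both flattened row-major; x, y are pixel coordinates in the input.
    """
    stride = img_size / grid
    col = min(int(x // stride), grid - 1)
    row = min(int(y // stride), grid - 1)
    query = row * grid + col  # row-major index of the query position
    return attn[query].reshape(grid, grid)

# Toy weights: each query attends only to itself.
attn = np.eye(400)
amap = point_attention(attn, x=100, y=300)
print(np.argwhere(amap == 1.0))  # [[9 3]]
```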


The cloud drive has no files

There are no files in the cloud drive. Could you re-share them?

Question about the verification score

I cannot understand why S(x, y) in Eq. 1 can be regarded as a relevance score. Also, the code computes verify_score by element-wise multiplication without a transpose, which differs slightly from Eq. 1. Could you explain this further? Thanks a lot!

text_embed = self.text_proj(text_info)
img_embed = self.img_proj(img_feat)
# cosine similarity between the visual and text features at each position
verify_score = (F.normalize(img_embed, p=2, dim=-1) *
                F.normalize(text_embed, p=2, dim=-1)).sum(dim=-1, keepdim=True)
# Gaussian-shaped mapping of (1 - cosine similarity)
verify_score = self.tf_scale * torch.exp(
    - (1 - verify_score).pow(self.tf_pow) / (2 * self.tf_sigma**2))
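On the transpose question: when features are stored along the last axis, multiplying element-wise and summing over that axis gives, at every position, the same value as the inner product xᵀy, so no explicit transpose is needed. The NumPy check below uses random stand-ins for the normalized `img_embed` and `text_embed` with illustrative shapes. The values for `tf_scale`, `tf_pow`, and `tf_sigma` are also illustrative, not the repository's defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(a):
    return a / np.linalg.norm(a, axis=-1, keepdims=True)

img = l2_normalize(rng.standard_normal((400, 2, 256)))   # (H*W, batch, channels)
text = l2_normalize(rng.standard_normal((400, 2, 256)))

# Element-wise product summed over channels, as in the quoted code.
score = (img * text).sum(axis=-1, keepdims=True)          # (400, 2, 1)

# Same thing written as an explicit per-position inner product x^T y.
dot = np.einsum("pbc,pbc->pb", img, text)[..., None]
assert np.allclose(score, dot)

# Gaussian mapping from the quoted code, with illustrative parameter values.
tf_scale, tf_pow, tf_sigma = 1.0, 2.0, 0.5
verify = tf_scale * np.exp(-(1 - score) ** tf_pow / (2 * tf_sigma ** 2))
print(verify.shape)  # (400, 2, 1)
```

Because both inputs are unit vectors, `score` is a cosine similarity in [-1, 1]. The mapping sends a cosine of 1 to `tf_scale` and decays as the cosine falls.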

The ReferItGame and Flickr30k datasets are unavailable.

Thank you for your solid work. I followed your repository and ran into some problems.

First, the ReferItGame link in the download_data.sh script is broken; I can't download the file from it.
Second, no way is provided to download the Flickr30k dataset.

Could you give some suggestions?

About the verification score

Hello, I have some doubts about how the verification score in Eq. 1 is calculated. Could you explain why it is computed this way?
Could you also provide the code for visualizing the verification score?
Thank you very much!

