leaplabthu / pseudo-q

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Home Page: https://arxiv.org/abs/2203.08481

License: Apache License 2.0

Python 93.73% Shell 6.27%

computer-vision cvpr2022 deep-learning multimodal-deep-learning pytorch vision-and-language visual-grounding

pseudo-q's Issues

can't find build_model

Hey, I can't find the code for build_model in the models folder. Which .py file is it in?

How to change the number of iterations? How to set the epoch?

Where is the number of epochs set? The original 13643 iterations are too many, and a run takes several days on my computer. I would like to reduce the number of epochs. Thanks!
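For orientation, here is a minimal sketch of how an iteration count relates to dataset size, batch size, and number of epochs. It assumes 13643 is the number of iterations in one epoch, which the post does not state explicitly. The formula is the generic length of a PyTorch DataLoader, not something taken from this repository, and the sample count and batch sizes are hypothetical.

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches produced by one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def total_iterations(num_samples, batch_size, epochs):
    """Total number of optimizer steps for a full training run."""
    return epochs * iterations_per_epoch(num_samples, batch_size)


# Hypothetical dataset size, for illustration only.
num_samples = 100_000
for batch_size in (8, 32, 128):
    print(batch_size, iterations_per_epoch(num_samples, batch_size))
```

Under these assumptions, the iterations per epoch are fixed by the data and batch size, so total training time shrinks either by lowering the number of epochs or by raising the batch size.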

statistics of datasets

Hi, thank you for your excellent work! I found that /data/statistic/ did not have the .txt split files of other datasets. Is there any way to access these files?

Inference API

Hi, this is nice work!
Could you please provide an inference API so that, for example, the user only needs to provide the path to an image and the corresponding description?

Evaluation on RefCOCO

Hi. Thanks for sharing your nice work!

When I run eval.sh on RefCOCO testA, I get the error "No such file unc_testA.pth".
Why is the unc_testA.pth file needed during evaluation?

I also ran generate_pseudo_data_unc.sh before evaluation and got unc_train_pseudo_split.pth files, but not unc_pseudo_val.pth or unc_pseudo_testA.pth.

Thanks.

A work (RSVG) that is very similar to yours but did not cite your work.

Hi, I found a work (https://github.com/ZhanYang-nwpu/RSVG-pytorch) that is very similar to yours but did not cite your work.

the method of constructing pseudo labels of remote sensing data
MLCF (Multi-level Cross-modal Fusion) mentioned in RSVG

You may want to take a look at it.

Hello, has the script for visualizing the pseudo samples been added to GitHub? About visualizing the pseudo samples.

About getting detection and attribute results on your own

Hi, thank you for sharing your nice work!
I have a question about getting detection results when generating pseudo queries.
The repositories you mentioned (MILVLG/bottom-up-attention.pytorch and shilrley6/Faster-R-CNN-with-model-pretrained-on-Visual-Genome) yield numpy files as output, but your code reads them as ".pth" files here. I think there must be a preprocessing step that makes them compatible. Can you share more details? Thanks!

Clarification about the loss

TL;DR

Can you explain what is loaded in the dataset along with image data? I would like to understand especially the content of bbox.

Dear authors,

I'm trying to figure out how the training of your model works.

In particular, from this line

Pseudo-Q/engine.py

Line 40 in ce1688f

loss_dict = loss_utils.trans_vg_loss(output, target)

I noticed that the target is used to compute the loss. The function trans_vg_loss confirms it:

Pseudo-Q/utils/loss_utils.py

Lines 107 to 125 in ce1688f

    def trans_vg_loss(batch_pred, batch_target):
        """Compute the losses related to the bounding boxes,
           including the L1 regression loss and the GIoU loss
        """
        batch_size = batch_pred.shape[0]
        # world_size = get_world_size()
        num_boxes = batch_size

        loss_bbox = F.l1_loss(batch_pred, batch_target, reduction='none')
        loss_giou = 1 - torch.diag(generalized_box_iou(
            xywh2xyxy(batch_pred),
            xywh2xyxy(batch_target)
        ))

        losses = {}
        losses['loss_bbox'] = loss_bbox.sum() / num_boxes
        losses['loss_giou'] = loss_giou.sum() / num_boxes

        return losses
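To make the two loss terms concrete, here is a minimal pure-Python sketch of what the quoted function appears to compute: the i-th prediction is paired with the i-th target, and both losses are summed and divided by the batch size. It is not the repository's code. It assumes boxes are given as (cx, cy, w, h) with positive width and height, and the helper names `box_area`, `giou`, and `box_losses` are hypothetical.

```python
def xywh2xyxy(box):
    """Convert (cx, cy, w, h) to corner format (x1, y1, x2, y2).

    The center-based input format is an assumption of this sketch.
    """
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def box_area(b):
    """Area of a corner-format box; zero if the box is empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou(a, b):
    """Generalized IoU of two corner-format boxes with positive area."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    # Smallest axis-aligned box that encloses both inputs.
    enclose = box_area((min(a[0], b[0]), min(a[1], b[1]),
                        max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (enclose - union) / enclose


def box_losses(preds, targets):
    """Average the per-pair L1 and GIoU losses over a batch of box pairs."""
    n = len(preds)
    l1 = sum(abs(p - t)
             for pb, tb in zip(preds, targets)
             for p, t in zip(pb, tb))
    g = sum(1.0 - giou(xywh2xyxy(pb), xywh2xyxy(tb))
            for pb, tb in zip(preds, targets))
    return {"loss_bbox": l1 / n, "loss_giou": g / n}


# Two disjoint boxes, values chosen for illustration only.
example = box_losses([(0.25, 0.25, 0.5, 0.5)], [(0.75, 0.75, 0.5, 0.5)])
print(example)
```

In this sketch the GIoU loss lies between 0 for identical boxes and 2 for boxes that are far apart, so it still varies when the boxes do not overlap at all.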

I tried to understand what target is, and from this line

Pseudo-Q/engine.py

Line 30 in ce1688f

img_data, text_data, target = batch

I checked the collate_fn used in dataloader:

Pseudo-Q/utils/misc.py

Lines 294 to 308 in ce1688f

    def collate_fn(raw_batch):
        raw_batch = list(zip(*raw_batch))

        img = torch.stack(raw_batch[0])
        img_mask = torch.tensor(raw_batch[1])
        img_data = NestedTensor(img, img_mask)
        word_id = torch.tensor(raw_batch[2])
        word_mask = torch.tensor(raw_batch[3])
        text_data = NestedTensor(word_id, word_mask)
        bbox = torch.tensor(raw_batch[4])
        if len(raw_batch) == 7:
            batch = [img_data, text_data, bbox, raw_batch[5], raw_batch[6]]
        else:
            batch = [img_data, text_data, bbox]
        return tuple(batch)
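The first line of the quoted function, `list(zip(*raw_batch))`, transposes a list of per-sample tuples into per-field tuples, which is why `raw_batch[4]` ends up holding one bbox per sample. Below is a small sketch of that step alone. The sample contents are hypothetical, and plain Python values stand in for the tensors and arrays the dataset would return.

```python
# Hypothetical samples in the order (img, img_mask, word_id, word_mask, bbox).
samples = [
    ("img0", [0, 0], [101, 7, 102], [1, 1, 1], [0.1, 0.2, 0.3, 0.4]),
    ("img1", [0, 1], [101, 9, 102], [1, 1, 1], [0.5, 0.5, 0.2, 0.2]),
]


def transpose_batch(raw_batch):
    """Turn a list of per-sample tuples into a list of per-field tuples."""
    return list(zip(*raw_batch))


fields = transpose_batch(samples)
bboxes = fields[4]  # one bbox per sample, in batch order
print(len(fields))
print(bboxes)
```

So `target` is whatever the dataset's `__getitem__` returns in the fifth position. The collate function itself does not determine whether that box is a ground-truth box or a pseudo box.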

Is this using the ground truth bounding box from the dataset?

I checked the __getitem__ function of the dataset and ended up with these three lines

Pseudo-Q/datasets/data_loader.py

Lines 219 to 221 in ce1688f

    imgset_file = '{0}_{1}.pth'.format(self.dataset, split)
    imgset_path = osp.join(dataset_path, imgset_file)
    self.images += torch.load(imgset_path)

Here a .pth file is loaded, and along with image data something else is loaded. Can you explain what the loaded bbox exactly contains?
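As a small sketch of how the quoted lines build the file name: the dataset name and the split are joined with an underscore and given a .pth extension. The directory `data/unc` and the wrapper function `build_imgset_path` are hypothetical; only the format string and the `osp.join` call come from the quoted code.

```python
import os.path as osp


def build_imgset_path(dataset_path, dataset, split):
    """Compose the split file path the same way the quoted lines do."""
    imgset_file = '{0}_{1}.pth'.format(dataset, split)
    return osp.join(dataset_path, imgset_file)


# Hypothetical directory; 'unc' and 'testA' are names used elsewhere on this page.
path = build_imgset_path("data/unc", "unc", "testA")
print(path)
```

With dataset `unc` and split `testA`, this pattern produces the file name `unc_testA.pth`, the same name that appears in the error message in the RefCOCO evaluation issue above.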

Thank you,
Luca

Problem about the training.

Recently, several researchers asked me questions about training. The symptom is that the training loss does not decrease or the validation accuracy is very low.

The reason might be that they adopted a smaller batch size, e.g., 96, but did not change the learning rate.

First of all, I strongly recommend using the same batch size to reproduce our work. Secondly, if you use a smaller batch size, please try to use a smaller learning rate.
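One common heuristic for "a smaller learning rate" is the linear scaling rule, which scales the learning rate in proportion to the batch size. The post does not say the authors use this rule, so treat the sketch below as an assumption. The base learning rate and base batch size are hypothetical placeholders, not values from this repository.

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Scale the learning rate in proportion to the batch size."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical reference setting; replace with the values from your own config.
base_lr = 1e-4
base_batch_size = 256
print(scaled_lr(base_lr, base_batch_size, 96))
```

This gives a starting point only. The resulting value usually still needs a short tuning run to confirm that the loss decreases.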

If you have any new problems with training, please post your questions in this issue or open a new one. Please provide as much information as you can, so that I can understand your question more quickly.

More details about the chosen pre-trained detector and attribute classifier on the Visual Genome dataset.

Good job. Can you provide more details about the chosen pre-trained detector and attribute classifier on the Visual Genome dataset, for example a download link?

Hello authors, could you describe the dataset directory in more detail? Thanks. Could the authors provide a more detailed description of the dataset folder?

|-- image_data
   |-- data
      |-- flickr
      |-- gref
      |-- gref_umd
      |-- referit
      |-- unc
      |-- unc+
   |-- Flickr30k
      |-- flickr30k-images
   |-- other
      |-- images
      |-- refcoco
      |-- refcoco+
      |-- refcocog
   |-- referit
      |-- images
      |-- mask
      |-- splits

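The dataset directory structure given in the README, shown above, is only a bare listing, and many discrepancies are left unexplained. Does the ''other'' directory refer to the files from the original ''refer'' repository? Also, does the ''data'' directory refer to the downloaded ''pseudo_samples''? The ''referit'' directory is confusing: the original ''refer'' repository contains ''ReCLEF'', but I could not find the ''images, mask, splits'' subdirectories there. How should this referit directory be built?
Thank you!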
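While the layout is being clarified, a small check like the following can report which directories from the tree above are missing under a local data root. It is only a sketch: the directory names are copied from the tree quoted in this issue, and the function name `missing_dirs` is hypothetical. The check says nothing about what each directory should contain.

```python
import tempfile
from pathlib import Path

# Sub-directories copied from the tree quoted in this issue.
EXPECTED_DIRS = [
    "data/flickr", "data/gref", "data/gref_umd",
    "data/referit", "data/unc", "data/unc+",
    "Flickr30k/flickr30k-images",
    "other/images", "other/refcoco", "other/refcoco+", "other/refcocog",
    "referit/images", "referit/mask", "referit/splits",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Demo on a temporary directory: create everything except the last entry.
with tempfile.TemporaryDirectory() as tmp:
    for d in EXPECTED_DIRS[:-1]:
        (Path(tmp) / d).mkdir(parents=True)
    demo_missing = missing_dirs(tmp)
print(demo_missing)
```

On a real setup, call `missing_dirs("image_data")` with the path to your own data root.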

In data_loader, bbox tensor got wrong format.

Fixed

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Sorry to bother you. When I run train.py, something goes wrong. Here is the output:

E:\Users\JayLee\anaconda3\envs\myenv\python.exe E:/Pseudo-Q-main/train.py
Not using distributed mode
git:
sha: N/A, status: clean, branch: N/A

INFO ### torch.backends.cudnn.benchmark = False

number of params: 155559940
Missing keys when loading detr model:
[]
Start training
E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
Traceback (most recent call last):
File "E:\Pseudo-Q-main\train.py", line 310, in <module>
main(args)
File "E:\Pseudo-Q-main\train.py", line 265, in main
train_stats = train_one_epoch(
File "E:\Pseudo-Q-main\engine.py", line 38, in train_one_epoch
output = model(img_data, text_data)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\trans_vg_mlcma.py", line 36, in forward
visu_mask, visu_src = self.visumodel(img_data)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\detr.py", line 72, in forward
out = self.transformer(self.input_proj(src), mask, pos[-1], query_embed=None)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 56, in forward
memory = self.encoder(src, src_key_padding_mask=mask, pos=pos_embed)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 118, in forward
output = layer(output, src_mask=mask,
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 225, in forward
return self.forward_post(src, src_mask, src_key_padding_mask, pos)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 196, in forward_post
src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\activation.py", line 1031, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py", line 4969, in multi_head_attention_forward
q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py", line 4734, in _in_projection_packed
return linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

Process finished with exit code 1

I don't know how to fix it. Could you please help and give me some ideas? Thank you!

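Hello, I would like to study this paper, but I ran into some problems running the code. I would appreciate your guidance.

Hello, I would like to study this paper, but I ran into some problems when running the training code. I would appreciate your guidance.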

Pseudo-Q trains an epoch much faster than TransVG. Why is the training time shorter?

Hello, I have a question. While reproducing Pseudo-Q, we found that on the same fully supervised dataset (exactly the same one), Pseudo-Q trains an epoch much faster than TransVG (about 5 to 6 times faster). I compared the two codebases but could not find the reason. Why is the training time shorter?

Regarding the generation of pseudo-labels

Could you please tell me whether you used different methods or hyperparameters when generating pseudo-labels for different datasets? I noticed that the number of pseudo-labels generated for refcoco and refcoco+ differs significantly, but the number of images in refcoco and refcoco+ seems to be quite similar.

Unable to download the faster RCNN results

Hi, thank you for your work!

I cannot download the 'detection_results.tar.gz' file following the instructions here. Is that a server issue? Can you please provide other download sources?

Best,
Yunzhong
