leaplabthu / pseudo-q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Home Page: https://arxiv.org/abs/2203.08481
License: Apache License 2.0
Hey, I can't find the code for build_model in the models folder. Which .py file is it in?
Where do I set the number of epochs? The original 13643 iterations are too many; running them would take several days on my computer. I would like to reduce the number of epochs. Thanks!
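For a rough sense of how an iteration count relates to batch size and epochs, the arithmetic can be sketched as below. This is only an illustration: the sample count and batch sizes are made-up placeholders, not values from this repository, and it assumes that 13643 is the number of iterations in one epoch, which the question does not confirm.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def total_iterations(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total number of optimizer steps over the whole training run."""
    return iterations_per_epoch(num_samples, batch_size) * epochs


# Hypothetical dataset size, chosen only to make the arithmetic easy to follow.
num_samples = 100_000

print(iterations_per_epoch(num_samples, batch_size=8))    # 12500
print(iterations_per_epoch(num_samples, batch_size=32))   # 3125
print(total_iterations(num_samples, batch_size=32, epochs=10))  # 31250
```

Under that assumption, the number of iterations per epoch is fixed by the dataset size and the batch size, so reducing the number of epochs shortens the whole run but not each individual epoch.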
Hi, thank you for your excellent work! I found that /data/statistic/ did not have the .txt split files of other datasets. Is there any way to access these files?
Hi, this is nice work!
Could you please provide an inference API so that, for example, the user only needs to provide the path to the image and the corresponding description?
Hi. Thanks for sharing your nice work!
When I run eval.sh on RefCOCO testA, I get the error "No such file unc_testA.pth".
I wonder why the unc_testA.pth file is needed during evaluation.
I also ran generate_pseudo_data_unc.sh before evaluation and got unc_train_pseudo_split.pth files, not unc_pseudo_val.pth or unc_pseudo_testA.pth.
Thanks.
Hi, I found a work (https://github.com/ZhanYang-nwpu/RSVG-pytorch) that is very similar to yours but does not cite your work.
You may want to take a look at it.
Hi, thank you for sharing this nice work!
I have a question about getting detection results when generating pseudo queries.
The repositories you mentioned (MILVLG/bottom-up-attention.pytorch and shilrley6/Faster-R-CNN-with-model-pretrained-on-Visual-Genome) yield numpy files as output, but your code reads them as ".pth" files here. I think there should be some preprocessing code to make them compatible. Can you share more details? Thanks!
TL;DR
Can you explain what is loaded in the dataset along with the image data? I would especially like to understand the content of bbox.
Dear authors,
I'm trying to figure out how the training of your model works.
In particular, from this line (Line 40 in ce1688f), target is used to compute the loss. The function trans_vg_loss confirms it (Lines 107 to 125 in ce1688f).
I tried to understand what target is, starting from this line (Line 30 in ce1688f) and the collate_fn used in the dataloader (Lines 294 to 308 in ce1688f).
Is this using the ground truth bounding box from the dataset?
I checked the __getitem__ function of the dataset and ended up with these three lines (Pseudo-Q/datasets/data_loader.py, Lines 219 to 221 in ce1688f).
Here a .pth file is loaded, and something else is loaded along with the image data. Can you explain what exactly the loaded bbox contains?
Thank you,
Luca
Recently, several researchers asked me questions about training. The phenomenon is that the training loss did not decrease or the validation acc was very low.
The reason might be that they adopted a smaller batch size, e.g., 96, but did not change the learning rate.
First of all, I strongly recommend using the same batch size to reproduce our work. Secondly, if you use a smaller batch size, please try to use a smaller learning rate.
If you have any new problems with training, please post your questions inside this issue or open a new one. It would be better to provide as much information as you can, which can help me understand your question quicker.
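One common heuristic for choosing the smaller learning rate is the linear scaling rule, which scales the learning rate in proportion to the batch size. The sketch below illustrates that heuristic only; it is not confirmed as the authors' recommendation, and the reference batch size and learning rate are hypothetical placeholders that should be replaced with the values from the training script being reproduced.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale a learning rate linearly with the batch size (a heuristic, not a guarantee)."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical reference settings; replace them with the ones you are reproducing.
REFERENCE_BATCH_SIZE = 256
REFERENCE_LR = 1e-4

# 96 is the smaller batch size mentioned above as an example.
new_lr = scale_learning_rate(REFERENCE_LR, REFERENCE_BATCH_SIZE, 96)
print(f"{new_lr:.2e}")  # 3.75e-05
```

The scaled value is best treated as a starting point: if the loss still does not decrease, trying an even smaller learning rate is a reasonable next step.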
Good job. Can you provide more details about the chosen pre-trained detector and attribute classifier on the Visual Genome dataset, such as a download link?
|-- image_data
   |-- data
      |-- flickr
      |-- gref
      |-- gref_umd
      |-- referit
      |-- unc
      |-- unc+
   |-- Flickr30k
      |-- flickr30k-images
   |-- other
      |-- images
      |-- refcoco
      |-- refcoco+
      |-- refcocog
   |-- referit
      |-- images
      |-- mask
      |-- splits
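The dataset directory structure given in the README section above is only a bare listing, and many of its discrepancies are not explained. Does the "other" directory refer to the files from the original "refer" repository? Does the "data" directory refer to the downloaded "pseudo_samples"? The "referit" directory is also puzzling: the original "refer" contains "ReCLEF", but I could not find the "images, mask, splits" folders listed under it. How should this referit directory be built?
Thanks!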
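While building the directories, a small script can report which folders are still missing. The sketch below assumes that the listing nests as data, Flickr30k, other and referit directly under image_data, with the remaining entries one level deeper; that nesting is an interpretation of the listing, and the helper name find_missing_dirs is made up for illustration.

```python
from pathlib import Path

# Assumed nesting of the README listing, relative to image_data.
# Adjust these paths if your layout differs.
EXPECTED_DIRS = [
    "data/flickr", "data/gref", "data/gref_umd",
    "data/referit", "data/unc", "data/unc+",
    "Flickr30k/flickr30k-images",
    "other/images", "other/refcoco", "other/refcoco+", "other/refcocog",
    "referit/images", "referit/mask", "referit/splits",
]


def find_missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


# Report what is missing under ./image_data (everything, if it does not exist yet).
for rel in find_missing_dirs("image_data"):
    print(f"missing: image_data/{rel}")
```

An empty report only means the folders exist; it does not check that they contain the right files.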
Fixed
Sorry to bother you. When I run train.py, something goes wrong. Here is the output information:
E:\Users\JayLee\anaconda3\envs\myenv\python.exe E:/Pseudo-Q-main/train.py
Not using distributed mode
git:
sha: N/A, status: clean, branch: N/A
number of params: 155559940
Missing keys when loading detr model:
[]
Start training
E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
Traceback (most recent call last):
File "E:\Pseudo-Q-main\train.py", line 310, in <module>
main(args)
File "E:\Pseudo-Q-main\train.py", line 265, in main
train_stats = train_one_epoch(
File "E:\Pseudo-Q-main\engine.py", line 38, in train_one_epoch
output = model(img_data, text_data)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\trans_vg_mlcma.py", line 36, in forward
visu_mask, visu_src = self.visumodel(img_data)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\detr.py", line 72, in forward
out = self.transformer(self.input_proj(src), mask, pos[-1], query_embed=None)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 56, in forward
memory = self.encoder(src, src_key_padding_mask=mask, pos=pos_embed)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 118, in forward
output = layer(output, src_mask=mask,
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 225, in forward
return self.forward_post(src, src_mask, src_key_padding_mask, pos)
File "E:\Pseudo-Q-main\models\visual_model\transformer.py", line 196, in forward_post
src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\activation.py", line 1031, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py", line 4969, in multi_head_attention_forward
q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py", line 4734, in _in_projection_packed
return linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)
File "E:\Users\JayLee\anaconda3\envs\myenv\lib\site-packages\torch\nn\functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
Process finished with exit code 1
I don't know how to fix it. Could you please help and give me some ideas? Thank you!
Hello, I want to ask you a question. While reproducing Pseudo-Q, we found that on the same fully supervised dataset (yes, exactly the same one), Pseudo-Q trains an epoch much faster than TransVG (about 5 to 6 times faster). I compared the code of the two but did not find the reason. Why is the training time shortened?
Could you please tell me whether you used different methods or hyperparameters when generating pseudo-labels for different datasets? I noticed that the number of pseudo-labels generated for refcoco and refcoco+ differs significantly, while the number of images in refcoco and refcoco+ seems to be quite similar.
Hi, thank you for your work!
I cannot download the 'detection_results.tar.gz' file following the instructions here. Is that a server issue? Can you please provide other download sources?
Best,
Yunzhong