anosorae / irra
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)
License: MIT License
Thank you for writing such an excellent paper. I am very interested in your work. As a beginner, while reproducing the code on this dataset I noticed that mlm is set to false; after changing it to true, a new problem appeared. When execution reaches `if 'mlm' in self.current_task` in the build file, the dimensions go wrong while computing q, k, v for the input x: q has shape (128, 77, 512) while k and v have shape (128, 193, 512). My understanding is that the transformer transposes k internally before multiplying it with q, but I get the error: shape '[-1, 616, 64]' is invalid for input of size 12648448. When reproducing on this dataset, do I need to modify the dimensions myself? The computation seems to happen inside built-in functions; if a modification is needed, should I adjust the q/k/v dimensions before they enter cross_former?
In the paper, the calculation of the IRR loss confused me.
We can treat this as a CE loss: |M| can be treated as the batch size and |V| as the number of classes.
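Under that reading, the IRR objective can be sketched as a plain cross-entropy over the masked positions. This is a hypothetical NumPy sketch, not the authors' implementation; `scores` and `targets` are assumed names:

```python
import numpy as np

def mlm_ce_loss(scores, targets):
    # scores: (|M|, |V|) prediction scores for the masked tokens;
    # targets: (|M|,) ground-truth vocabulary ids.
    # Treating |M| as the batch size and |V| as the number of classes,
    # the objective is exactly a cross-entropy loss.
    logits = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())
```

With confident, correct scores the loss approaches 0; with uniform scores over |V| classes it equals log |V|, exactly as for a standard classifier.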
Hello,
In the module utils.metrics, there is the Evaluator class and its private method _compute_embedding for computing the features and IDs for texts and images over the whole test dataset. On lines L60 and L70 we must add to(device) at the end of the concatenated tensor of IDs.
If we do not send those IDs to the GPU, we get an error from computing with tensors that are not on the same device. In the method eval, we use the helper function rank to compute the metrics. It operates on the similarity matrix (GPU) and the IDs (CPU), and we get an error if we do not move the IDs to the GPU. Also, on the next line (L85) we see that the metrics are sent to the CPU, meaning they were expected to be on the GPU. Thus the IDs should have gone to the GPU.
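A minimal sketch of the fix described above (tensor names are assumed, this is not the repository's exact code):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Concatenate the per-batch ID tensors, then move them to the same device
# as the similarity matrix so the ranking comparison does not mix devices.
qids = torch.cat([torch.tensor([0, 1]), torch.tensor([2])], dim=0).to(device)
gids = torch.cat([torch.tensor([0, 2]), torch.tensor([1])], dim=0).to(device)

similarity = torch.randn(len(qids), len(gids), device=device)
indices = similarity.argsort(dim=1, descending=True)
matches = gids[indices].eq(qids.unsqueeze(1))  # no device-mismatch error now
```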
I tried to give some details even if I think this is a very tiny fix / error. I can do a PR if you want with the fix aforementioned. :)
Best,
Mathias.
Hi, I'm running into a GPU out-of-memory problem during the validation step while training on the ICFG-PEDES dataset. I'm using a 12 GB TITAN with the training batch size set to 32 and the test batch size set to 128, but there still seems to be a memory shortage.
I checked the training logs and found that there seems to be no val dataset, which may be the cause. So I'm asking whether ICFG-PEDES loads a validation dataset, or rather, is there a json file with the split data?
Thank you very much for open sourcing the code and I look forward to seeing the correct link to your paper, thanks!
Hello,
It seems that the method resize_text_pos_embed is missing from your CLIP model. Can you provide it, please?
Or is it perhaps some dead code left over, because we never enter this condition?
Thanks again for your time and your hard work for this paper.
Mathias.
Thanks for providing such great work on TBPS. I have become very interested in your project, and I would like to ask why, when reproducing the ablation experiments under otherwise identical conditions, adding only the IRR module often reduces the final accuracy of the model.
For example, when the 'loss_names' parameter is set to 'sdm+id', the resulting Rank-1 value is about 0.8% higher than with 'sdm+mlm+id'. All other parameters are the same as in the README.
Looking forward to receiving your reply, thank you very much!
I'm really confused: in the code you initialize the layers using a normal distribution, but what I understood from the paper is that you use the CLIP model.
Your answer will really help me understand.
Thanks
When I run train.py, the following error appears:
Traceback (most recent call last):
File "f:\Workspace\0research0\IRRA-main\train.py", line 77, in
do_train(start_epoch, args, model, train_loader, evaluator, optimizer, scheduler, checkpointer)
File "f:\Workspace\0research0\IRRA-main\processor\processor.py", line 50, in do_train
ret = model(batch)
^^^^^^^^^^^^
File "F:\env\pytorch\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\env\pytorch\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "f:\Workspace\0research0\IRRA-main\model\build.py", line 126, in forward
mlm_ids = batch['mlm_ids']
~~~~~^^^^^^^^^^^
KeyError: 'mlm_ids'
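One possible guard for this error (hypothetical names mirroring the snippet from model/build.py): only read the MLM fields when the loader was actually built with MLM enabled, since a dataloader created without MLM never puts 'mlm_ids' into the batch:

```python
def get_mlm_inputs(batch, current_task):
    # Return the MLM ids/labels only when the MLM task is active AND the
    # dataloader actually produced them; otherwise skip the MLM branch.
    if 'mlm' in current_task and 'mlm_ids' in batch:
        return batch['mlm_ids'], batch['mlm_labels']
    return None, None
```

The underlying fix is most likely to rebuild the dataloader with MLM enabled, so that every batch actually contains 'mlm_ids' and 'mlm_labels'.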
Hello, the link to the paper seems to be broken?
First, thank you all for this amazing work.
During testing, I ran into the following problem. First, I trained the model with an 80:10:10 split on my dataset, which gave 5103 training classes. However, when I tested with another configuration, a 2:1 train-test split only, test.py returned this error:
#test.py
RuntimeError: Error(s) in loading state_dict for IRRA:
size mismatch for classifier.weight: copying a param with shape torch.Size([5103, 512]) from checkpoint, the shape in current model is torch.Size([4200, 512]).
size mismatch for classifier.bias: copying a param with shape torch.Size([5103]) from checkpoint, the shape in current model is torch.Size([4200]).
So, from my observation, the classifier weights partly depend on the initial dataset. In cases where the dataset is updated frequently, I would therefore have to retrain on the whole dataset. Am I missing something? I hope I can get an answer from you all. Thank you a lot!
t_feats = text_feats[torch.arange(text_feats.shape[0]), caption_ids.argmax(dim=-1)].float()
This seems inconsistent with the description in your paper. How should I understand this part of the code? Which part of the text is selected as the global text feature?
Thanks!
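For what it's worth, this line follows the standard CLIP convention: the end-of-text (EOT) token has the highest id in CLIP's vocabulary, so `caption_ids.argmax(dim=-1)` locates the EOT position in each caption, and the feature at that position is taken as the global text feature. A plain-Python illustration, with token ids assumed to follow CLIP's tokenizer (`<|endoftext|>` = 49407):

```python
# <|startoftext|> = 49406, <|endoftext|> = 49407, 0 = padding
caption_ids = [49406, 320, 1125, 539, 49407, 0, 0]

# argmax over the ids finds the EOT position, because 49407 is the largest id
eot_index = max(range(len(caption_ids)), key=lambda i: caption_ids[i])
print(eot_index)  # -> 4: the feature at index 4 becomes the global text feature
```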
A 'nan' error occurs when I change the ViT backbone to the ResNet-50 backbone.
$ bash run_irra.sh
File "./IRRA-main/model/build.py", line 148, in build_model
model = IRRA(args, num_classes)
File "./IRRA-main/model/build.py", line 27, in __init__
self.cross_attn = nn.MultiheadAttention(self.embed_dim,
TypeError: __init__() got an unexpected keyword argument 'batch_first'
File "C:\Users\14439\Documents\Code\Pycharm\IRRA-main\utils\checkpoint.py", line 150, in load_state_dict
model.load_state_dict(model_state_dict,strict=False)
File "C:\Users\14439\Documents\Jetbrains\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for IRRA:
size mismatch for classifier.weight: copying a param with shape torch.Size([3701, 512]) from checkpoint, the shape in current model is torch.Size([11003, 512]).
size mismatch for classifier.bias: copying a param with shape torch.Size([3701]) from checkpoint, the shape in current model is torch.Size([11003]).
In model/build.py, line 136, there is no method named "compute_mcm_or_mlm".
At datasets/bases.py#L157, you pass caption_tokens directly to the function _build_random_masked_tokens_and_labels, and caption_tokens is modified in place inside that function. Thus the masked captions are also used in the sdm and id tasks, which is inconsistent with the description in the paper.
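A sketch of the fix implied here (a hypothetical, simplified masking function, not the repository's code): copy the tokens before masking, so the caller's caption tokens stay unmasked for the sdm and id branches:

```python
import random

def build_random_masked_tokens_and_labels(tokens, mask_token=103, prob=0.15, rng=None):
    # Work on a copy so the caller's caption tokens are left untouched and
    # can still feed the sdm / id branches unmasked.
    rng = rng or random.Random(0)
    masked = list(tokens)
    labels = [0] * len(tokens)          # 0 = position not masked
    for i, t in enumerate(masked):
        if rng.random() < prob:
            labels[i] = t               # remember the original token
            masked[i] = mask_token      # replace it with the mask token
    return masked, labels
```

The same effect can be had at the call site with `caption_tokens.clone()` (for a tensor) before passing it in.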
In visualize.py, line 44: `gt_img_paths = test_dataset['gt_img_paths']`. But there is no 'gt_img_paths' key in test_dataset. How can I fix this bug?
Your answer will really help me understand.
Thanks!
The id loss in the table is not a separate loss, but is trained jointly with the itc loss in the baseline.
Originally posted by @anosorae in #16 (comment)
So in other words, all of your losses actually have the itc_loss added on top?
Thanks for your contribution, as a newbie I would like to ask how to specify a certain GPU to train the model?
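A common way to do this (general CUDA practice, not specific to this repository) is to restrict device visibility before CUDA initializes, either on the command line (`CUDA_VISIBLE_DEVICES=1 python train.py ...`) or at the top of the script:

```python
import os

# Must run before the first CUDA call / before importing code that
# initializes CUDA; GPU 1 then appears to the process as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
```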
Sorry, a small question: most baseline methods in Table 1 use RN50 or ViT as the backbone.
I think it would be better to also report performance with RN50 or ViT, to rule out the extra benefit that a large pre-trained model brings.
Hello,
Thanks for providing amazing work on text-based person search. I am interested in reproducing it. However, while reading the paper I found that the introduced sdm loss is similar to the cmpm loss, and even after checking the implementation you provided, I found nothing different apart from a logit_scale. Could you please explain the difference between them and why sdm is superior to cmpm?
Thank you so much.
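As a rough illustration of the shared structure (my own NumPy sketch, not the authors' code): both losses minimize a KL divergence between the softmax-normalized image-text similarities and the normalized ground-truth match distribution, and in this simplified form the visible difference reduces to the temperature `logit_scale`:

```python
import numpy as np

def sdm_like_loss(sim, labels, logit_scale=50.0, eps=1e-8):
    # sim:    (B, B) cosine similarities between image and text embeddings
    # labels: (B, B) ground-truth match indicators (1 where IDs match)
    logits = logit_scale * sim
    logits = logits - logits.max(axis=1, keepdims=True)       # stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    q = labels / labels.sum(axis=1, keepdims=True)            # target distribution
    # KL(p || q), averaged over the batch
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))
```

With `logit_scale = 1` this reduces to a cmpm-style objective; a larger learnable temperature sharpens the predicted distribution, which is one plausible source of the reported gain.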
I would like to test qualitative results like those in Figure 5.
In the data processing part, we found that when using mlm, the caption tokens after processing are identical to the mlm tokens, which differs from the description in the paper.
Hello,
Good job, your paper is very interesting. I am happy to see papers leveraging VLP for Text ReID.
I am interested in the fine-tuned CLIP model on PEDES. Do you plan to share the training script and/or the weights of this model, please? I could redo it on my own, but it would save some time for me (and probably for some other people too).
Thanks again for the contribution to the community.
How can I train with multiple GPUs? The server has four cards (0,1,2,3), and I want to use cards 1 and 2. How can I do that? Looking at the source code, the multi-GPU training part seems to have some issues — has anyone else run into this?
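One approach that usually works regardless of how the launcher is written (assuming the script respects CUDA device visibility) is to expose only cards 1 and 2 to the process; inside the program they then appear as cuda:0 and cuda:1:

```shell
# Only GPUs 1 and 2 are visible to the training process.
# (run_irra.sh is the script name used elsewhere in this thread;
# substitute your own launch command.)
CUDA_VISIBLE_DEVICES=1,2 bash run_irra.sh
```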
Hello, I ran training on a single 3090 using the code you provided, but the logs show that id_loss does not converge and stays around 7.847 the whole time. What could be the reason?
Thank you for your great work. I retrained the model according to the code in the README, but I found that the default value of "--val_dataset" is "test" on line 12 of utils/options.py. Will this affect the results?
How can I download the CUHK-PEDES dataset? I sent an email but got no reply.
Thank you for your excellent work! I am very interested in it and am currently using multiple GPUs for distributed training. As a beginner, I would like to ask: is it normal for the number of iterations per epoch not to decrease when using multiple GPUs?
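For reference, a plain-Python sketch of the usual DistributedSampler behavior (not this repository's code): with DDP, each of the N processes should see roughly 1/N of the dataset, so the iterations per epoch shrink when the per-GPU batch size stays fixed. If the iteration count does not change, the loader is likely not using a distributed sampler:

```python
def shard_indices(dataset_len, world_size, rank):
    # Each rank takes every world_size-th index, so the per-epoch iteration
    # count drops by a factor of world_size at a fixed per-GPU batch size.
    return list(range(rank, dataset_len, world_size))
```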
In your paper, you mentioned that all parameters in the multimodal interaction encoder are randomly initialized. In fact, there are a lot of parameters in this part. I would like to ask if you have considered using the parameters of the CLIP encoder for initialization, as this may have an impact on the performance of the model. Also, I would like to ask what is the approximate accuracy of the mlm task (mlm_acc) in the end? I didn't run the entire code because my graphics card and PyTorch version are not supported.