anosorae / irra
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)
License: MIT License
Thank you for writing such an excellent paper. I am very interested in your work. As a beginner, while reproducing the code on this dataset I noticed that mlm is set to false; after changing it to true, a new problem appeared. When execution reaches `if 'mlm' in self.current_task` in the build file, the dimensions go wrong while computing q, k, v for the input x: q has shape (128, 77, 512) while k and v have shape (128, 193, 512). My understanding is that the transformer transposes k internally before multiplying it with q, but I get the error: shape '[-1, 616, 64]' is invalid for input of size 12648448. When reproducing on this dataset, do I need to modify the dimensions myself? The computation seems to happen inside built-in functions; if a modification is needed, should I adjust the q/k/v dimensions before they enter cross_former?
In the paper, the calculation of the IRR loss confused me.
We can treat this as a CE loss: |M| can be treated as the batch size and |V| as the number of classes.
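Under that reading, the IRR objective can be sketched as a plain cross-entropy over the masked positions. This is a hypothetical NumPy sketch, not the authors' implementation; `scores` and `targets` are assumed names:

```python
import numpy as np

def mlm_ce_loss(scores, targets):
    # scores: (|M|, |V|) prediction scores for the masked tokens;
    # targets: (|M|,) ground-truth vocabulary ids.
    # Treating |M| as the batch size and |V| as the number of classes,
    # the objective is exactly a cross-entropy loss.
    logits = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())
```

With confident, correct scores the loss approaches 0; with uniform scores over |V| classes it equals log |V|, exactly as for a standard classifier.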
Hello,
In the module utils.metrics, there is the Evaluator class and its private method _compute_embedding for computing the features and IDs for texts and images over the whole test dataset. On lines L60 and L70 we must add to(device) at the end of the concatenated tensor of IDs.
If we do not send those IDs to the GPU, we get an error from computing with tensors that are not on the same device. In the method eval, we use the helper function rank to compute the metrics. It operates on the similarity matrix (GPU) and the IDs (CPU), and we get an error if we do not move the IDs to the GPU. Also, on the next line (L85) we see that the metrics are sent to the CPU, meaning they were expected to be on the GPU. Thus the IDs should have gone to the GPU.
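A minimal sketch of the fix described above (tensor names are assumed, this is not the repository's exact code):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Concatenate the per-batch ID tensors, then move them to the same device
# as the similarity matrix so the ranking comparison does not mix devices.
qids = torch.cat([torch.tensor([0, 1]), torch.tensor([2])], dim=0).to(device)
gids = torch.cat([torch.tensor([0, 2]), torch.tensor([1])], dim=0).to(device)

similarity = torch.randn(len(qids), len(gids), device=device)
indices = similarity.argsort(dim=1, descending=True)
matches = gids[indices].eq(qids.unsqueeze(1))  # no device-mismatch error now
```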
I tried to give some details even if I think this is a very tiny fix / error. I can do a PR if you want with the fix aforementioned. :)
Best,
Mathias.
Hi, I'm running into a GPU out-of-memory problem during the validation step while training on the ICFG-PEDES dataset. I'm using a 12 GB TITAN with the training batch size set to 32 and the test batch size set to 128, but there still seems to be a memory shortage.
I checked the training logs and found that there seems to be no val dataset, which may be the cause. So I'm asking whether ICFG-PEDES loads a validation dataset, or rather, is there a json file with the split data?
Thank you very much for open sourcing the code and I look forward to seeing the correct link to your paper, thanks!
Hello,
It seems that the method resize_text_pos_embed is missing from your CLIP model. Can you provide it, please?
Or is it perhaps some dead code left over, because we never enter this condition?
Thanks again for your time and your hard work for this paper.
Mathias.
Thanks for providing such great work on TBPS. I have become very interested in your project, and I would like to ask why, when reproducing the ablation experiments under otherwise identical conditions, adding only the IRR module often reduces the final accuracy of the model.
For example, when the 'loss_names' parameter is set to 'sdm+id', the resulting Rank-1 value is about 0.8% higher than with 'sdm+mlm+id'. All other parameters are the same as in the README.
Looking forward to receiving your reply, thank you very much!
I'm really confused: in the code you initialize the layers using a normal distribution, but what I understood from the paper is that you use the CLIP model.
Your answer will really help me understand.
Thanks
When I run train.py, the following error appears:
Traceback (most recent call last):
File "f:\Workspace\0research0\IRRA-main\train.py", line 77, in
do_train(start_epoch, args, model, train_loader, evaluator, optimizer, scheduler, checkpointer)
File "f:\Workspace\0research0\IRRA-main\processor\processor.py", line 50, in do_train
ret = model(batch)
^^^^^^^^^^^^
File "F:\env\pytorch\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\env\pytorch\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "f:\Workspace\0research0\IRRA-main\model\build.py", line 126, in forward
mlm_ids = batch['mlm_ids']
~~~~~^^^^^^^^^^^
KeyError: 'mlm_ids'
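One possible guard for this error (hypothetical names mirroring the snippet from model/build.py): only read the MLM fields when the loader was actually built with MLM enabled, since a dataloader created without MLM never puts 'mlm_ids' into the batch:

```python
def get_mlm_inputs(batch, current_task):
    # Return the MLM ids/labels only when the MLM task is active AND the
    # dataloader actually produced them; otherwise skip the MLM branch.
    if 'mlm' in current_task and 'mlm_ids' in batch:
        return batch['mlm_ids'], batch['mlm_labels']
    return None, None
```

The underlying fix is most likely to rebuild the dataloader with MLM enabled, so that every batch actually contains 'mlm_ids' and 'mlm_labels'.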
Hello, the link to the paper seems to be broken?
First, thank you all for this amazing work.
During testing, I ran into the following problem. First, I trained the model with an 80:10:10 split on my dataset, which gave 5103 training classes. However, when I tested with another configuration, a 2:1 train-test split only, test.py returned this error:
#test.py
RuntimeError: Error(s) in loading state_dict for IRRA:
size mismatch for classifier.weight: copying a param with shape torch.Size([5103, 512]) from checkpoint, the shape in current model is torch.Size([4200, 512]).
size mismatch for classifier.bias: copying a param with shape torch.Size([5103]) from checkpoint, the shape in current model is torch.Size([4200]).
So, from my observation, the classifier weights partly depend on the initial dataset. In cases where the dataset is updated frequently, I would therefore have to retrain on the whole dataset. Am I missing something? I hope I can get an answer from you all. Thank you a lot!
t_feats = text_feats[torch.arange(text_feats.shape[0]), caption_ids.argmax(dim=-1)].float()
This seems inconsistent with the description in your paper. How should I understand this part of the code? Which part of the text is selected as the global text feature?
Thanks!
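For what it's worth, this line follows the standard CLIP convention: the end-of-text (EOT) token has the highest id in CLIP's vocabulary, so `caption_ids.argmax(dim=-1)` locates the EOT position in each caption, and the feature at that position is taken as the global text feature. A plain-Python illustration, with token ids assumed to follow CLIP's tokenizer (`<|endoftext|>` = 49407):

```python
# <|startoftext|> = 49406, <|endoftext|> = 49407, 0 = padding
caption_ids = [49406, 320, 1125, 539, 49407, 0, 0]

# argmax over the ids finds the EOT position, because 49407 is the largest id
eot_index = max(range(len(caption_ids)), key=lambda i: caption_ids[i])
print(eot_index)  # -> 4: the feature at index 4 becomes the global text feature
```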
A 'nan' error occurs when I change the ViT backbone to the ResNet-50 backbone.
$ bash run_irra.sh
File "./IRRA-main/model/build.py", line 148, in build_model
model = IRRA(args, num_classes)
File "./IRRA-main/model/build.py", line 27, in __init__
self.cross_attn = nn.MultiheadAttention(self.embed_dim,
TypeError: __init__() got an unexpected keyword argument 'batch_first'
File "C:\Users\14439\Documents\Code\Pycharm\IRRA-main\utils\checkpoint.py", line 150, in load_state_dict
model.load_state_dict(model_state_dict,strict=False)
File "C:\Users\14439\Documents\Jetbrains\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for IRRA:
size mismatch for classifier.weight: copying a param with shape torch.Size([3701, 512]) from checkpoint, the shape in current model is torch.Size([11003, 512]).
size mismatch for classifier.bias: copying a param with shape torch.Size([3701]) from checkpoint, the shape in current model is torch.Size([11003]).
In model/build.py, line 136, there is no method named "compute_mcm_or_mlm".
At datasets/bases.py#L157, you pass caption_tokens directly to the function _build_random_masked_tokens_and_labels, and caption_tokens is modified in place inside that function. Thus the masked captions are also used in the sdm and id tasks, which is inconsistent with the description in the paper.
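A sketch of the fix implied here (a hypothetical, simplified masking function, not the repository's code): copy the tokens before masking, so the caller's caption tokens stay unmasked for the sdm and id branches:

```python
import random

def build_random_masked_tokens_and_labels(tokens, mask_token=103, prob=0.15, rng=None):
    # Work on a copy so the caller's caption tokens are left untouched and
    # can still feed the sdm / id branches unmasked.
    rng = rng or random.Random(0)
    masked = list(tokens)
    labels = [0] * len(tokens)          # 0 = position not masked
    for i, t in enumerate(masked):
        if rng.random() < prob:
            labels[i] = t               # remember the original token
            masked[i] = mask_token      # replace it with the mask token
    return masked, labels
```

The same effect can be had at the call site with `caption_tokens.clone()` (for a tensor) before passing it in.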
In visualize.py, line 44: `gt_img_paths = test_dataset['gt_img_paths']`. But there is no 'gt_img_paths' key in test_dataset. How can I fix this bug?
Your answer will really help me understand.
Thanks!
The id loss in the table is not a separate loss, but is trained jointly with the itc loss in the baseline.
Originally posted by @anosorae in #16 (comment)
So in other words, all of your losses actually have the itc_loss added on top?
Thanks for your contribution, as a newbie I would like to ask how to specify a certain GPU to train the model?
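A common way to do this (general CUDA practice, not specific to this repository) is to restrict device visibility before CUDA initializes, either on the command line (`CUDA_VISIBLE_DEVICES=1 python train.py ...`) or at the top of the script:

```python
import os

# Must run before the first CUDA call / before importing code that
# initializes CUDA; GPU 1 then appears to the process as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
```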
Sorry, a small question: most baseline methods in Table 1 use RN50 or ViT as the backbone.
I think it would be better to also report performance with RN50 or ViT, to rule out the extra benefit that a large pre-trained model brings.
Hello,
Thanks for providing amazing work on text-based person search. I am interested in reproducing it. However, while reading the paper I found that the introduced sdm loss is similar to the cmpm loss, and even after checking the implementation you provided, I found nothing different apart from a logit_scale. Could you please explain the difference between them and why sdm is superior to cmpm?
Thank you so much.
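As a rough illustration of the shared structure (my own NumPy sketch, not the authors' code): both losses minimize a KL divergence between the softmax-normalized image-text similarities and the normalized ground-truth match distribution, and in this simplified form the visible difference reduces to the temperature `logit_scale`:

```python
import numpy as np

def sdm_like_loss(sim, labels, logit_scale=50.0, eps=1e-8):
    # sim:    (B, B) cosine similarities between image and text embeddings
    # labels: (B, B) ground-truth match indicators (1 where IDs match)
    logits = logit_scale * sim
    logits = logits - logits.max(axis=1, keepdims=True)       # stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    q = labels / labels.sum(axis=1, keepdims=True)            # target distribution
    # KL(p || q), averaged over the batch
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))
```

With `logit_scale = 1` this reduces to a cmpm-style objective; a larger learnable temperature sharpens the predicted distribution, which is one plausible source of the reported gain.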
I would like to test qualitative results like those in Figure 5.
In the data processing part, we found that when using mlm, the caption tokens after processing are identical to the mlm tokens, which differs from the description in the paper.
Hello,
Good job, your paper is very interesting. I am happy to see papers leveraging VLP for Text ReID.
I am interested in the fine-tuned CLIP model on PEDES. Do you plan to share the training script and/or the weights of this model, please? I could redo it on my own, but it would save some time for me (and probably for some other people too).
Thanks again for the contribution to the community.
How can I train with multiple GPUs? The server has four cards (0,1,2,3), and I want to use cards 1 and 2. How can I do that? Looking at the source code, the multi-GPU training part seems to have some issues — has anyone else run into this?
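One approach that usually works regardless of how the launcher is written (assuming the script respects CUDA device visibility) is to expose only cards 1 and 2 to the process; inside the program they then appear as cuda:0 and cuda:1:

```shell
# Only GPUs 1 and 2 are visible to the training process.
# (run_irra.sh is the script name used elsewhere in this thread;
# substitute your own launch command.)
CUDA_VISIBLE_DEVICES=1,2 bash run_irra.sh
```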
Hello, I ran training on a single 3090 using the code you provided, but the logs show that id_loss does not converge and stays around 7.847 the whole time. What could be the reason?
Thank you for your great work. I retrained the model according to the code in the README, but I found that the default value of "--val_dataset" is "test" on line 12 of utils/options.py. Will this affect the results?
How can I download the CUHK-PEDES dataset? I sent an email but got no reply.
Thank you for your excellent work! I am very interested in it and am currently using multiple GPUs for distributed training. As a beginner, I would like to ask: is it normal for the number of iterations per epoch not to decrease when using multiple GPUs?
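For reference, a plain-Python sketch of the usual DistributedSampler behavior (not this repository's code): with DDP, each of the N processes should see roughly 1/N of the dataset, so the iterations per epoch shrink when the per-GPU batch size stays fixed. If the iteration count does not change, the loader is likely not using a distributed sampler:

```python
def shard_indices(dataset_len, world_size, rank):
    # Each rank takes every world_size-th index, so the per-epoch iteration
    # count drops by a factor of world_size at a fixed per-GPU batch size.
    return list(range(rank, dataset_len, world_size))
```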
In your paper, you mentioned that all parameters in the multimodal interaction encoder are randomly initialized. In fact, there are a lot of parameters in this part. I would like to ask if you have considered using the parameters of the CLIP encoder for initialization, as this may have an impact on the performance of the model. Also, I would like to ask what is the approximate accuracy of the mlm task (mlm_acc) in the end? I didn't run the entire code because my graphics card and PyTorch version are not supported.