irra's People

Contributors

anosorae, cottoncandyz

irra's Issues

a small bug about RSTPReid dataset

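Thank you for writing such an excellent paper; I am very interested in it. As a beginner, I used this dataset when reproducing the code and found that MLM is set to false. After changing it to true, a new problem appeared: when execution reaches `if 'mlm' in self.current_task` in the build file, the dimensions go wrong when computing q, k, and v for the input x. q has shape (128, 77, 512), while k and v have shape (128, 193, 512). My understanding is that the transformer internally transposes k and multiplies it with q, but I get the error shape '[-1,616,64]' is invalid for input of size 12648448. When reproducing on this dataset, do I need to modify the dimensions myself? The computation seems to happen entirely inside built-in functions. If changes are needed, should I modify the q, k, v dimensions before entering cross_former?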
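
A back-of-the-envelope check of the numbers in this report, as a sketch only. It assumes 8 attention heads (512 / 64) and that the attention layer is reading its inputs as (seq_len, batch, embed_dim) rather than batch-first; neither assumption is confirmed by the report.

```python
# Shapes quoted in the issue, read as (batch, seq_len, embed_dim).
q_shape = (128, 77, 512)
k_shape = (128, 193, 512)

head_dim = 64                        # last dimension in the failing reshape
num_heads = q_shape[2] // head_dim   # 512 / 64 = 8 heads (assumed)

k_numel = k_shape[0] * k_shape[1] * k_shape[2]

# If the layer reads inputs as (seq_len, batch, embed_dim), it takes the
# "batch size" from dimension 1 of q, i.e. 77 instead of 128.
assumed_batch = q_shape[1]
rows = assumed_batch * num_heads     # 77 * 8 = 616

# Reshaping k to (-1, rows, head_dim) needs numel to be divisible by rows * head_dim.
reshape_ok = k_numel % (rows * head_dim) == 0
print(k_numel, rows, reshape_ok)     # 12648448 616 False
```

Under that reading, 616 is 77 x 8, meaning the text length was taken as the batch size, which would explain the failing reshape. Checking whether the attention layer really runs batch-first in the installed PyTorch version may be a reasonable first step.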

Maybe an error in the original paper?

In the paper, the calculation of the IRR loss confused me.
[image: the loss equation from the paper]
We can treat this as a CE loss: |M| can be treated as the batch size and |V| as the number of classes. Since $y^i$ is one-hot, only a single $y^i_j$ is set to 1 and the others are 0. Thus the loss should not be divided by |V|.
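
The argument can be written out as follows, assuming the usual cross-entropy form. Here $p^i_j$ denotes the predicted probability of vocabulary entry $j$ for masked token $i$, and $c_i$ is the index of the true token; both symbols are introduced for illustration and are not taken from the paper.

$$
-\sum_{j=1}^{|V|} y^i_j \log p^i_j = -\log p^i_{c_i},
\qquad
\mathcal{L} = -\frac{1}{|M|} \sum_{i \in M} \log p^i_{c_i}.
$$

Because the inner sum has only one non-zero term, an extra division by $|V|$ rescales the loss by the constant factor $1/|V|$. That changes its weight relative to the other loss terms, but not its minimizer.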

fix device for ids in _compute_embedding in Evaluator

Hello,

In module utils.metrics, there is the Evaluator class and its private method _compute_embedding for computing the features and IDs for texts and images on the whole test dataset.

On lines L60 and L70, we must add to(device) at the end of the concatenated tensor of IDs.

If we do not send those IDs to the GPU, we end up computing with tensors that are not on the same device. In the eval method, the helper function rank computes the metrics from the similarity matrix (on GPU) and the IDs (on CPU), which raises an error unless the IDs are moved to the GPU. In addition, further down (L85) the metrics are sent to the CPU, which implies they were expected to be on the GPU, so the IDs should have been on the GPU as well.

I have tried to give some detail even though I think this is a very small fix. I can open a PR with the fix if you want. :)

Best,
Mathias.

About ICFG-PEDES dataset

Hi, I'm running out of GPU memory during validation while training on the ICFG-PEDES dataset. I'm using a 12 GB TITAN GPU with the training batch size set to 32 and the test batch size set to 128, but the GPU still runs out of memory.
I checked the training logs and found that there seems to be no val dataset, which may be the cause of the memory shortage. So my question is: does ICFG-PEDES have a validation set that gets loaded, or is there a JSON file with the data split?

Thank you very much for open sourcing the code and I look forward to seeing the correct link to your paper, thanks!

Confusion about the IRR module

Thanks for providing such great work on TBPS. I have been very interested in your project recently, and I would like to ask why, when I reproduce the ablation experiment and add only the IRR module under otherwise identical conditions, the final accuracy of the model is often lower.
For example, when 'loss_names' is set to 'sdm+id' in the training parameters, the Rank-1 value is about 0.8% higher than with 'sdm+mlm+id'. The other parameters are the same as in the README.
Looking forward to receiving your reply, thank you very much!

KeyError: 'mlm_ids'

When I run train.py, the following error appears:

Traceback (most recent call last):
  File "f:\Workspace\0research0\IRRA-main\train.py", line 77, in <module>
    do_train(start_epoch, args, model, train_loader, evaluator, optimizer, scheduler, checkpointer)
  File "f:\Workspace\0research0\IRRA-main\processor\processor.py", line 50, in do_train
    ret = model(batch)
          ^^^^^^^^^^^^
  File "F:\env\pytorch\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\env\pytorch\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:\Workspace\0research0\IRRA-main\model\build.py", line 126, in forward
    mlm_ids = batch['mlm_ids']
              ~~~~~^^^^^^^^^^^
KeyError: 'mlm_ids'
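
One plausible reading of this traceback, stated as an assumption: the 'mlm' task is enabled in the model while the batches were built without masked-token fields. A hypothetical guard like the one below would surface such a mismatch with a clearer message. The function name, the '+'-separated `loss_names` format, and the example batch keys are illustrative and not taken from the repository.

```python
def check_batch_for_tasks(batch, loss_names):
    """Raise a descriptive error if a task needs a key the batch lacks.

    Hypothetical helper: `loss_names` is assumed to be a '+'-separated
    string such as 'sdm+mlm+id'. Only the 'mlm' -> 'mlm_ids' requirement
    seen in the traceback is encoded here.
    """
    required = {"mlm": ["mlm_ids"]}
    for task in loss_names.split("+"):
        for key in required.get(task, []):
            if key not in batch:
                raise KeyError(
                    f"task '{task}' needs batch['{key}'], but the batch only has "
                    f"{sorted(batch)}; check that the dataset was built with masking enabled"
                )


# Example batch with made-up keys and no masked-token fields.
plain_batch = {"images": None, "caption_ids": None, "pids": None}

# It passes when only 'sdm' and 'id' are requested ...
check_batch_for_tasks(plain_batch, "sdm+id")

# ... but is rejected as soon as 'mlm' is requested.
try:
    check_batch_for_tasks(plain_batch, "sdm+mlm+id")
    rejected = False
except KeyError:
    rejected = True
print(rejected)  # True
```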

PROBLEMS REGARDING DIFFERENT SPLIT OF DATASET

First, thank you all for this amazing work.

During testing, I ran into the following problem. I first trained the model with an 80:10:10 split of my dataset, which resulted in a training set of 5103. However, when I tested with another configuration, a 2:1 train-test split, test.py returned this error:

#test.py
RuntimeError: Error(s) in loading state_dict for IRRA:
	size mismatch for classifier.weight: copying a param with shape torch.Size([5103, 512]) from checkpoint, the shape in current model is torch.Size([4200, 512]).
	size mismatch for classifier.bias: copying a param with shape torch.Size([5103]) from checkpoint, the shape in current model is torch.Size([4200]).

So, as far as I can tell, the weights partly depend on the dataset used for training. In cases where the dataset is updated frequently, I would have to retrain on the whole dataset each time. Am I missing something? I hope to get an answer from you all. Thank you very much!
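
Since only classifier.weight and classifier.bias are reported as mismatched, one common workaround is to drop shape-mismatched entries before loading and load the rest non-strictly. The sketch below shows only the filtering logic, on plain shape tuples. `filter_matching` is a hypothetical helper, and 'encoder.proj' is a made-up entry; with a real checkpoint the shapes would come from each tensor's `.shape`.

```python
def filter_matching(checkpoint_shapes, model_shapes):
    """Split checkpoint entries into loadable and skipped parameter names.

    Both arguments map parameter names to shape tuples. An entry is kept
    only if the model has a parameter of that name with an identical shape.
    """
    keep, skip = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            keep.append(name)
        else:
            skip.append(name)
    return keep, skip


# Classifier shapes are taken from the error message above.
checkpoint = {
    "classifier.weight": (5103, 512),
    "classifier.bias": (5103,),
    "encoder.proj": (512, 512),
}
model = {
    "classifier.weight": (4200, 512),
    "classifier.bias": (4200,),
    "encoder.proj": (512, 512),
}

keep, skip = filter_matching(checkpoint, model)
print(keep)  # ['encoder.proj']
print(skip)  # ['classifier.weight', 'classifier.bias']
```

Whether the skipped classifier matters at test time depends on how the test script uses the model; if it only serves an identity-classification loss during training, retrieval may not need it, but that would have to be checked against the code.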

How is the global text feature obtained?

t_feats = text_feats[torch.arange(text_feats.shape[0]), caption_ids.argmax(dim=-1)].float()
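This does not seem consistent with the description in your paper. How should I understand this part of the code, and which part of the text is selected as the global text feature?
Thanks!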
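
One way to read that line, assuming the captions are tokenized with CLIP's BPE tokenizer: the end-of-text token has the largest id in the vocabulary, so argmax over each row of caption_ids returns the position of that token, and the feature at that position is taken as the global text feature. The pure-Python sketch below mimics the indexing on toy data; the middle token ids and the feature values are made up.

```python
SOT, EOT, PAD = 49406, 49407, 0  # CLIP start/end-of-text ids; padding is 0

# Two toy captions padded to length 6 (the middle token ids are invented).
caption_ids = [
    [SOT, 320, 1125, EOT, PAD, PAD],
    [SOT, 518, 2533, 539, 1237, EOT],
]

# One toy scalar "feature" per token position: 10 * row + position.
text_feats = [[10 * r + p for p in range(6)] for r in range(2)]


def argmax(row):
    """Index of the largest element, like tensor.argmax(dim=-1) on one row."""
    return max(range(len(row)), key=row.__getitem__)


# Position of the end-of-text token in each caption.
eot_positions = [argmax(row) for row in caption_ids]

# Same selection as text_feats[torch.arange(N), caption_ids.argmax(dim=-1)].
t_feats = [text_feats[r][p] for r, p in enumerate(eot_positions)]

print(eot_positions)  # [3, 5]
print(t_feats)        # [3, 15]
```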

The nan error

A 'nan' error occurs when I change the ViT backbone to the ResNet-50 backbone.

Unexpected keyword argument 'batch_first'

$ bash run_irra.sh

File "./IRRA-main/model/build.py", line 148, in build_model
    model = IRRA(args, num_classes)
  File "./IRRA-main/model/build.py", line 27, in __init__
    self.cross_attn = nn.MultiheadAttention(self.embed_dim,
TypeError: __init__() got an unexpected keyword argument 'batch_first'
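
The batch_first argument of nn.MultiheadAttention was added in PyTorch 1.9, so this error usually points to an older installation. A small version check could look like the sketch below; the helper name is hypothetical.

```python
def supports_batch_first(version):
    """Return True if a PyTorch version string is at least 1.9.

    Only the leading 'major.minor' part is inspected, so local suffixes
    such as '+cu111' are ignored.
    """
    major, minor = version.split("+")[0].split(".")[:2]
    return (int(major), int(minor)) >= (1, 9)


print(supports_batch_first("1.8.1+cu111"))  # False
print(supports_batch_first("1.13.0"))       # True
print(supports_batch_first("2.1.0"))        # True
```

In practice the string would come from torch.__version__.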

Can't use best.pth

  File "C:\Users\14439\Documents\Code\Pycharm\IRRA-main\utils\checkpoint.py", line 150, in load_state_dict
    model.load_state_dict(model_state_dict,strict=False)
  File "C:\Users\14439\Documents\Jetbrains\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for IRRA:
	size mismatch for classifier.weight: copying a param with shape torch.Size([3701, 512]) from checkpoint, the shape in current model is torch.Size([11003, 512]).
	size mismatch for classifier.bias: copying a param with shape torch.Size([3701]) from checkpoint, the shape in current model is torch.Size([11003]).

A small bug

In model/build.py, line 136, there is no method named "compute_mcm_or_mlm".

There seems to be a bug at datasets/bases.py#L157.

At datasets/bases.py#L157, caption_tokens is passed directly to the function _build_random_masked_tokens_and_labels, and caption_tokens is modified inside that function. As a result, the masked captions are also used in the sdm and id tasks, which is inconsistent with the description in the paper.
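
The aliasing problem described here can be reproduced without the repository code. In the sketch below, `mask_in_place`, `MASK_ID`, and the token ids are hypothetical stand-ins for the masking routine; the point is only that passing a copy keeps the original caption intact for the other tasks.

```python
import random

MASK_ID = 99999  # placeholder mask-token id, not the real tokenizer value


def mask_in_place(tokens, prob, rng):
    """Randomly replace tokens with MASK_ID, mutating `tokens`.

    Returns the labels: the original id where a token was masked, else 0.
    """
    labels = []
    for i, tok in enumerate(tokens):
        if rng.random() < prob:
            tokens[i] = MASK_ID
            labels.append(tok)
        else:
            labels.append(0)
    return labels


rng = random.Random(0)
caption_tokens = [320, 1125, 518, 2533, 539, 1237]

# Buggy pattern: the same list is masked, so the other losses see masks too.
aliased = caption_tokens
mask_in_place(aliased, prob=1.0, rng=rng)
print(caption_tokens == [MASK_ID] * 6)  # True: the original was overwritten

# Fixed pattern: mask a copy and keep the original caption untouched.
caption_tokens = [320, 1125, 518, 2533, 539, 1237]
mlm_tokens = list(caption_tokens)
mlm_labels = mask_in_place(mlm_tokens, prob=1.0, rng=rng)
print(caption_tokens)  # [320, 1125, 518, 2533, 539, 1237]
```

With tensors the analogous step would be a `.clone()` before masking, or `.copy()` for NumPy arrays.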

visualize.py

In visualize.py, line 44: "gt_img_paths=test_dataset['gt_img_paths']". But there is no 'gt_img_paths' in test_dataset. How can I solve this bug?
Your answer will really help me understand.
Thanks!

Confusion about cmpm loss and the introduced sdm loss.

Hello,

Thanks for providing such amazing work on text-based person search. I am interested in reproducing it. While reading the paper, however, I found that the proposed sdm loss is similar to the cmpm loss. Even after checking the implementation you provided, I found no difference other than a logit_scale. Could you please explain the difference between them in more detail, and why sdm is superior to cmpm?

Thank you so much.

Code for straight forward CLIP fine-tuned to PEDES

Hello,

Good job, your paper is very interesting. I am happy to see papers leveraging VLP for Text ReID.

I would like to get the CLIP model fine-tuned on PEDES. Do you plan to share the training script and/or the weights of this model? I can redo it on my own, but it would save me (and probably some other people) some time.

Thanks again for the contribution to the community.

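How to train with multiple GPUs?

How do I train with multiple GPUs? The server has four cards (0, 1, 2, 3), and I want to use cards 1 and 2. What should I do? Any help would be appreciated. From reading the source code, the multi-GPU training part seems to have some problems. Has anyone else run into this?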
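
A common, framework-independent way to restrict a process to cards 1 and 2 is the CUDA_VISIBLE_DEVICES environment variable, set before any CUDA library is initialized. This is a generic sketch and does not address whatever issues the repository's own multi-GPU code may have.

```python
import os

# Must run before importing torch (or anything else that initializes CUDA).
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

# Inside this process the two cards are then renumbered as devices 0 and 1.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(len(visible))  # 2
```

The same effect can be had from the shell by prefixing the launch command with CUDA_VISIBLE_DEVICES=1,2.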

Confusion about the setting of validation dataset

Thank you for your great work. I retrained the model following the commands in the README, but I found that the default value of "--val_dataset" is "test" (line 12 of utils/options.py). Will it affect the results?

CUHK-PEDES

How can the CUHK-PEDES dataset be downloaded? I sent an email but got no response.

Multi gpu training problem

Thank you for your excellent work! I am very interested in it and am currently using multiple GPUs for distributed training. As a beginner, I would like to ask whether it is normal for the number of iterations per epoch not to decrease when using multiple GPUs.
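
As a rough reference, when the data is sharded across processes (for example with a distributed sampler), each process sees about 1/world_size of the samples, so the iterations per epoch should shrink accordingly. The helper and the dataset size below are hypothetical.

```python
import math


def iters_per_epoch(num_samples, batch_size_per_gpu, world_size):
    """Iterations per epoch when samples are split evenly across processes."""
    samples_per_process = math.ceil(num_samples / world_size)
    return math.ceil(samples_per_process / batch_size_per_gpu)


num_samples = 64000  # hypothetical dataset size
print(iters_per_epoch(num_samples, 64, 1))  # 1000
print(iters_per_epoch(num_samples, 64, 4))  # 250
```

If the count stays the same with more GPUs, the sampler may not be sharding the data, or the per-GPU batch size may have been divided by the number of GPUs. Both are guesses that would need checking against the training script.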

Confusion about the multimodal interaction encoder and mlm task

In your paper, you mentioned that all parameters in the multimodal interaction encoder are randomly initialized. In fact, there are a lot of parameters in this part. I would like to ask if you have considered using the parameters of the CLIP encoder for initialization, as this may have an impact on the performance of the model. Also, I would like to ask what is the approximate accuracy of the mlm task (mlm_acc) in the end? I didn't run the entire code because my graphics card and PyTorch version are not supported.
