Comments (11)
Please provide more detailed evidence to support your claim.
from irra.
The caption_tokens in the diagram is a list that is passed directly to the function, so any in-place change made to it inside the function also changes the caller's list.
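The aliasing described above can be reproduced in plain Python. This is a minimal sketch, not the repository's code: `mask_in_place`, the `MASK_ID` value, and the token ids are hypothetical and chosen only for illustration.

```python
# Hypothetical mask id and token ids, for illustration only.
MASK_ID = 0


def mask_in_place(tokens, positions):
    """Overwrite the given positions with MASK_ID, mutating `tokens` itself."""
    for pos in positions:
        tokens[pos] = MASK_ID
    return tokens


caption_tokens = [101, 7, 8, 9, 102]
mlm_tokens = mask_in_place(caption_tokens, positions=[2])

# Both names refer to the same list object, so the caller's list is masked too.
print(mlm_tokens is caption_tokens)
print(caption_tokens)
```

Because the function never creates a new list, there is only one list object in play, and every name bound to it sees the masked value.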
from irra.
We also confirmed this by printing the values.
from irra.
I have confirmed the issue you mentioned. It was caused by an oversight in my code, and I am evaluating its impact on the final retrieval results.
Thank you for pointing this out; I will provide more explanation after the evaluation.
from irra.
OK, thanks.
from irra.
The two lists share the same memory, which may be the cause. Switching to deepcopy will fix it. However, will it affect performance?
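A minimal sketch of the deepcopy fix suggested here, using a plain Python list. The function `mask_copy`, the `MASK_ID` value, and the token ids are hypothetical; the repository's actual function may differ.

```python
import copy

# Hypothetical mask id, for illustration only.
MASK_ID = 0


def mask_copy(tokens, positions):
    """Return a masked copy and leave the caller's `tokens` untouched."""
    masked = copy.deepcopy(tokens)
    for pos in positions:
        masked[pos] = MASK_ID
    return masked


caption_tokens = [101, 7, 8, 9, 102]
mlm_tokens = mask_copy(caption_tokens, positions=[2])

print(caption_tokens)  # the original tokens stay unmasked
print(mlm_tokens)      # only the copy carries the mask
```

For a flat list of integers, `list(tokens)` or `tokens.copy()` would give the same isolation; `copy.deepcopy` only becomes necessary when the list contains nested mutable objects.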
from irra.
When I changed the shallow copy to a deep copy, i.e. calculating the SDM loss between the image feature and the unmasked text feature, there was some performance loss. Here are the results:
| task | R1 | R5 | R10 | mAP | mINP |
|---|---|---|---|---|---|
| t2i | 71.069 | 87.606 | 92.122 | 64.879 | 50.267 |
When I calculate the SDM loss between the image feature and both the masked text feature and the text feature, the performance loss is smaller:
| task | R1 | R5 | R10 | mAP | mINP |
|---|---|---|---|---|---|
| t2i | 72.840 | 88.678 | 93.324 | 65.407 | 49.512 |
In my view, using a shallow copy instead of a deep copy, as raised in this issue, is equivalent to applying an augmentation to the input text, similar to the data augmentation applied to the input image. This leads to a difference in the final results, but it does not affect the conclusions of the paper.
Thank you for your attention to our work. I hope my response addresses your concern.
from irra.
Were the results in the tables above obtained under the id+mlm+sdm setting?
from irra.
Yes. @lzfff12
from irra.
In the build_random_masked_tokens_and_labels function, caption_tokens is not deep-copied before mlm_tokens is produced, so the two end up completely identical. This means the caption_tokens used to calculate SDM_loss are also masked, whereas the paper calculates it with unmasked caption_tokens. Could this be a bug?
from irra.
@Zplusdragon When I was writing the code, I thought the .numpy() function would convert from tensor to a new ndarray, not realising that the two would actually share memory. So, you could say it's a bug.
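The memory sharing mentioned here can be checked with a short PyTorch sketch. The token ids and `MASK_ID` are hypothetical, and this is not the repository's code; it only shows that `.numpy()` on a CPU tensor returns an array backed by the same memory, and that cloning first avoids this.

```python
import torch

# Hypothetical mask id and token ids, for illustration only.
MASK_ID = 0

# Case 1: .numpy() on a CPU tensor returns an ndarray sharing the tensor's memory.
aliased_tokens = torch.tensor([101, 7, 8, 9, 102])
shared_view = aliased_tokens.numpy()
shared_view[2] = MASK_ID
print(aliased_tokens)  # position 2 is now masked in the tensor as well

# Case 2: cloning before converting gives the ndarray its own memory.
safe_tokens = torch.tensor([101, 7, 8, 9, 102])
independent = safe_tokens.clone().numpy()
independent[2] = MASK_ID
print(safe_tokens)  # the tensor keeps its original values
```

Either `tensor.clone().numpy()` or copying the resulting array (for example with `ndarray.copy()`) breaks the link between the masked tokens and the original caption tokens.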
from irra.