sense-x / mixmim



MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning

License: MIT License

Python 98.74% Shell 1.26%

masked-image-modeling transformer

mixmim's Issues

Hardware & training time for pretraining

What's the hardware & training time? Specifically, I'm interested in the statistics below:

Arch & Epoch: [e.g. ViT-B, 300 ep]
Hardware: [e.g. single 8 V100 node]
Batch time: [e.g. 0.8s]
Epoch time: [e.g. 12 mins 30s ]
Training time: [e.g. 28 h]

About 'Mix Embedding' in paper.

In your paper, you mentioned that you 'add two mix embeddings to the visual tokens to implicitly differentiate the two mixing groups' and 'use different mix embeddings for the 4 stages of the encoder', which is in section 2.3. However, it appears to me that there isn't such mix embedding in the function forward_encoder of models_mixmim.py. So, if it's my silly mistake that I didn't find it, would you kindly point it out to me? Thanks!

About the input for mixed-image training

Looking forward to the Conv version of MixMIM.

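Slow convergence on downstream tasks?

Is it normal for convergence on downstream tasks to be much slower than when starting from ImageNet weights?
Also, judging from my reproduction, there does not seem to be an obvious gap compared with SimMIM.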

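About GPU memory

Hello, which GPU model, and with how much memory, were the 16 GPUs you used? I only have two 48 GB GPUs; can I reproduce the results with them?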

Is the CNN part also implemented?

Pretrained Weights

Hello, great work!
Can you release the pretrained weights of your MixMIM-B?

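About the absolute_pos_embed positional encoding

When switching to another backbone, is the absolute_pos_embed positional encoding still needed? I ask because SimMIM also uses Swin but does not add absolute_pos_embed, and some recent transformer designs already use convolutions in place of absolute positional encoding.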

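Question about the mixed loss

In the code, I see that when computing the loss you only count the masked regions of the two reconstructed images, and I have no objection to that. For the target, however, it looks like you directly use the original single images as the target labels rather than the original mixed image. I am not sure whether my description is correct; could you please clarify?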
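
The behaviour described in this issue can be sketched as follows. This is a minimal illustration written for this thread, not the repository's code: the function names, the 1-D "token" lists, and the mask convention (`mask[i] == 1` means token `i` of the mixed input came from image B, so image A's token is hidden) are all assumptions.

```python
def masked_mse(pred, target, hidden):
    """Mean squared error over the positions where hidden[i] is True."""
    errs = [(p - t) ** 2 for p, t, h in zip(pred, target, hidden) if h]
    return sum(errs) / len(errs) if errs else 0.0


def dual_reconstruction_loss(pred_a, pred_b, img_a, img_b, mask):
    """Sum of two masked losses; each target is an original (unmixed) image.

    Assumed convention: mask[i] == 1 means the mixed input used img_b's token,
    so img_a's token i was hidden, and the reverse for mask[i] == 0.
    """
    hidden_a = [m == 1 for m in mask]
    hidden_b = [m == 0 for m in mask]
    return masked_mse(pred_a, img_a, hidden_a) + masked_mse(pred_b, img_b, hidden_b)


img_a = [1.0, 2.0, 3.0, 4.0]
img_b = [10.0, 20.0, 30.0, 40.0]
mask = [1, 0, 1, 0]
pred_a = [0.0, 2.0, 3.0, 99.0]   # the error at index 3 is ignored (A was visible there)
pred_b = [10.0, 20.0, 0.0, 41.0]  # the error at index 2 is ignored (B was visible there)
loss = dual_reconstruction_loss(pred_a, pred_b, img_a, img_b, mask)
```

In this sketch the mixed image is never used as a target; it only determines which positions of each original image count toward the loss.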

Integrating other models

Love the work. I would like to run MixMAE with PVT-V2. Is there any way you could release the code for training other encoders (e.g. PVT-L in your paper)?

Cannot resolve import "petrel_client.client"

On Windows, from petrel_client.client import Client reports that the import "petrel_client.client" cannot be resolved. How can I solve this?
I also cannot find this library to install.
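
One possible workaround, sketched under assumptions: if `petrel_client` is only needed for reading data from remote storage (the issue does not confirm this), the import can be made optional and local files read directly. `read_bytes` and its `remote_get` parameter are hypothetical names introduced here; they are not part of the repository or of `petrel_client`.

```python
from pathlib import Path

try:
    # May be missing on machines where the package is not installed.
    from petrel_client.client import Client
except ImportError:
    Client = None  # fall back to local file access


def read_bytes(path, remote_get=None):
    """Return the raw bytes at `path`.

    `remote_get` is a hypothetical callable (path -> bytes) that wraps whatever
    remote client is available; when it is None, the local disk is used.
    """
    if remote_get is not None:
        return remote_get(path)
    return Path(path).read_bytes()
```

With this guard, code paths that only touch local files keep working when the import fails, and `Client is None` can be checked before attempting any remote access.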

grad_norm always appears as Inf?

When training with the default settings, the gradient norm always appears as inf. Is this normal?

When will the code be available?

Checkpoint of pre-trained model

Thank you for this great model!

Can you publish the self-sup pre-trained model on Imagenet?

Small code errors

While running the code I found two small errors:
add_weight_decay -> param_groups_weight_decay
module -> _modules
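
The issue does not say which module these names belong to. Assuming the first one is a helper that was renamed between versions of a dependency, a small lookup that accepts either name avoids pinning a version. `first_available` and the stand-in namespace below are hypothetical and exist only to make the sketch runnable.

```python
from types import SimpleNamespace


def first_available(module, *names):
    """Return the first attribute of `module` found among `names`."""
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(f"none of {names} found on {module!r}")


# Stand-in for a library module that only exposes the newer name (illustration only).
fake_optim_factory = SimpleNamespace(
    param_groups_weight_decay=lambda model, weight_decay: ("groups", weight_decay)
)

build_param_groups = first_available(
    fake_optim_factory, "param_groups_weight_decay", "add_weight_decay"
)
result = build_param_groups(None, 0.05)
```

In real code the stand-in would be replaced by the actual imported module; the lookup order decides which name wins when both exist.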

Pretrained weights

Thank you very much for your research. I am studying it in depth; could you provide the pretrained weights? Thanks!

Visualization Demo

Is there a visualization demo using pre-trained MixMIM models? Thanks.

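Pretrained model weights

Hello, could you share the weights obtained during MixMIM pretraining with Swin-B as the backbone, including both the Swin-B encoder weights and the Transformer decoder weights? Thank you very much!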

When will the ConvNet version of MixMIM be released?

Attention masking bug?

It seems that your attention masking part is wrong, as the image patches from different images should have a different mask, while you use a single mask definition for all patches regardless of where they are from.

MixMIM/models_mixmim.py

Line 130 in 9da3eee

if attn_mask is not None:

BTW, how do you cope with mixing more images? I see only mixing 2 images case in the code.
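
To make the concern concrete, here is a minimal sketch of a pairwise attention mask built from per-token source labels, so that a token attends only to tokens from the same image. It is not the repository's implementation; `build_attn_mask` and the `token_source` list are assumptions made for illustration. Because the labels are plain integers, the same construction covers more than two mixed images.

```python
def build_attn_mask(token_source):
    """token_source[i] is the id of the image that token i comes from.

    Returns an N x N list of booleans where entry [i][j] is True when
    token i may attend to token j (both come from the same image).
    """
    n = len(token_source)
    return [[token_source[i] == token_source[j] for j in range(n)] for i in range(n)]


# Two images mixed: tokens 0 and 2 from image 0, token 1 from image 1.
two_way = build_attn_mask([0, 1, 0])

# Three images mixed: only same-id pairs are allowed.
three_way = build_attn_mask([0, 1, 2, 1])
```

In an attention layer, positions where the mask is False would typically be set to a large negative value before the softmax.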

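How can this be deployed with the original Swin?

Hi, I noticed that the attention part of the model includes a masked self-attention computation. If this is applied to the original Swin, its shifted windows also involve a mask computation. Would the two overlap and make deployment difficult?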

About the mixed images

Hello, I can't find the implementation of the mixed-image input. Can you tell me where it is?
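
For readers with the same question, this is a rough sketch of what token-level mixing of two images could look like. It is an illustration under assumptions, not the location or form of the repository's code: `mix_tokens`, the flat token lists, and the convention that `mask[i] == 1` selects image B are all hypothetical.

```python
import random


def mix_tokens(tokens_a, tokens_b, mask_ratio=0.5, seed=None):
    """Build a mixed token sequence from two equally long token lists.

    A random subset of positions (a `mask_ratio` fraction) takes its token from
    `tokens_b`; the rest come from `tokens_a`. Returns (mixed, mask), where
    mask[i] == 1 marks positions taken from `tokens_b`.
    """
    if len(tokens_a) != len(tokens_b):
        raise ValueError("both images must have the same number of tokens")
    rng = random.Random(seed)
    n = len(tokens_a)
    from_b = set(rng.sample(range(n), int(n * mask_ratio)))
    mask = [1 if i in from_b else 0 for i in range(n)]
    mixed = [tokens_b[i] if m else tokens_a[i] for i, m in enumerate(mask)]
    return mixed, mask


mixed, mask = mix_tokens(["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"], seed=0)
```

Each position of `mixed` holds exactly one of the two source tokens, and `mask` records which one, so the hidden tokens of both images can be identified later.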
