sense-x / mixmim
MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning
License: MIT License
What's the hardware & training time? Specifically, I'm interested in the statistics below:
In your paper, you mentioned that you 'add two mix embeddings to the visual tokens to implicitly differentiate the two mixing groups' and 'use different mix embeddings for the 4 stages of the encoder', which is in section 2.3. However, it appears to me that there isn't such a mix embedding in the function forward_encoder of models_mixmim.py. So, if it's my silly mistake that I didn't find it, would you kindly point it out to me? Thanks!
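For readers following this question, the quoted description of mix embeddings can be sketched as below. This is only an illustration of the sentence quoted from the paper, not code from models_mixmim.py; the function name `add_mix_embeddings`, the array shapes, and the use of NumPy instead of PyTorch are all assumptions made for the example.

```python
import numpy as np

def add_mix_embeddings(tokens, mask, embed_a, embed_b):
    """Add one of two embeddings to each token, chosen by its mixing group.

    tokens:  (L, C) array of visual tokens of the mixed image (assumed layout)
    mask:    (L,) array, 1 where the token comes from image A, 0 for image B
    embed_a: (C,) embedding added to tokens of image A (hypothetical parameter)
    embed_b: (C,) embedding added to tokens of image B (hypothetical parameter)
    """
    m = mask[:, None].astype(tokens.dtype)  # (L, 1), broadcasts over channels
    return tokens + m * embed_a + (1.0 - m) * embed_b

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
mask = np.array([1, 0, 1, 1, 0, 0])
embed_a = np.full(4, 0.5)
embed_b = np.full(4, -0.5)
out = add_mix_embeddings(tokens, mask, embed_a, embed_b)
```

Using a separate pair of embeddings per encoder stage, as the quoted text says, would amount to calling such a function once per stage with that stage's own `embed_a` and `embed_b`.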
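Is it normal for convergence on downstream tasks to be much slower than with ImageNet weights?
Also, judging from my reproduction, there doesn't seem to be an obvious gap compared with SimMIM.
Hello, what model and memory size are the 16 GPUs you used? I only have two 48 GB GPUs. Can I reproduce the results with them?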
Hello, great work!
Can you release the pretrained weights of your MixMIM-B?
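When switching to another backbone, do I still need to add the absolute_pos_embed positional encoding? I ask because SimMIM also uses Swin but does not add absolute_pos_embed, and some recent transformer designs already use convolutions in place of absolute positional encoding.
In the code, I see that when computing the loss you only compute it over the masked regions of the two reconstructed images. I have no objection to that. For the target, however, I see that you directly use the original single images as the target labels rather than the source mixed image. I'm not sure whether I have described this correctly, so I would appreciate your clarification.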
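The loss described in this question can be sketched as follows, under the assumption that the reading in the question is right: each reconstruction is compared against its own original image, and only on the patches of that image that were hidden in the mixed input. The function name `dual_reconstruction_loss`, the patch layout, and the NumPy formulation are illustrative assumptions, not the repository's code.

```python
import numpy as np

def dual_reconstruction_loss(rec1, rec2, x1, x2, mask):
    """Masked MSE for two reconstructions against the two original images.

    rec1, rec2: (L, D) reconstructed patches for image 1 and image 2
    x1, x2:     (L, D) patches of the original (unmixed) images
    mask:       (L,) array, 1 where the mixed input shows x1, 0 where it shows x2
    Assumes both groups are non-empty, otherwise a division by zero occurs.
    """
    m = mask.astype(float)
    err1 = ((rec1 - x1) ** 2).mean(axis=1)  # per-patch error vs. original image 1
    err2 = ((rec2 - x2) ** 2).mean(axis=1)  # per-patch error vs. original image 2
    loss1 = (err1 * (1.0 - m)).sum() / (1.0 - m).sum()  # patches of x1 hidden in the mix
    loss2 = (err2 * m).sum() / m.sum()                  # patches of x2 hidden in the mix
    return loss1 + loss2

x1 = np.zeros((4, 2))
x2 = np.ones((4, 2))
mask = np.array([1, 0, 1, 0])
rec1 = x1.copy()
rec1[0] += 10.0  # error on a visible patch of x1, ignored by the loss
rec1[1] += 1.0   # error on a hidden patch of x1, counted by the loss
rec2 = x2.copy()
loss = dual_reconstruction_loss(rec1, rec2, x1, x2, mask)
```

In this toy example the large error on the visible patch does not contribute, which is the behavior the question refers to.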
Love the work. I would like to run MixMAE with PVT-V2. Is there any way you could release the code for training other encoders (e.g. PVT-L in your paper)?
On Windows, `from petrel_client.client import Client` fails with the message that the import "petrel_client.client" cannot be resolved. How can I fix this?
I also cannot find this library to install.
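One common workaround for an optional dependency like this is to guard the import and fall back to local file reads. The sketch below assumes that the training data is available on the local disk and that petrel_client is only needed for remote storage. The helper `read_bytes` and the flag `HAS_PETREL` are hypothetical names, not part of the repository.

```python
try:
    # Optional storage client; may be unavailable on some machines.
    from petrel_client.client import Client
except ImportError:
    Client = None

# True only when the optional client could be imported.
HAS_PETREL = Client is not None

def read_bytes(path):
    """Read a file from the local disk (hypothetical fallback helper)."""
    with open(path, "rb") as f:
        return f.read()
```

With this guard in place, code paths that need the client can check `HAS_PETREL` first and use `read_bytes` otherwise.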
When training with the default settings, the gradients always show up as inf. Is this normal?
Thank you for this great model!
Can you publish the self-sup pre-trained model on Imagenet?
Thank you very much for your research. I am studying your work in depth. Could you provide the pre-trained weights? Thanks!
Is there a visualization demo using pre-trained MixMIM models? Thanks.
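Hello, could you share the weights obtained from pre-training MixMIM with Swin-B as the backbone, including both the Swin-B encoder weights and the Transformer decoder weights? Thank you very much!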
It seems that your attention masking part is wrong, as the image patches from different images should have a different mask, while you use a single mask definition for all patches regardless of where they are from.
Line 130 in 9da3eee
BTW, how do you cope with mixing more images? I see only mixing 2 images case in the code.
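To make the point about per-image masks concrete, here is a minimal sketch of an attention mask in which each token may attend only to tokens from the same source image. It uses integer group ids, so it is not limited to two images. The function names and the NumPy formulation are assumptions for illustration and do not reproduce the line referenced above.

```python
import numpy as np

def group_attention_mask(group_ids):
    """Return an (L, L) boolean mask, True where query i and key j share a source image."""
    g = np.asarray(group_ids)
    return g[:, None] == g[None, :]

def masked_softmax(scores, allowed):
    """Softmax over the last axis with disallowed positions given zero weight."""
    scores = np.where(allowed, scores, -np.inf)
    # Every row allows at least its own position, so the row maximum is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

group_ids = [0, 1, 0, 2, 1]  # token i comes from image group_ids[i]
allowed = group_attention_mask(group_ids)
weights = masked_softmax(np.zeros((5, 5)), allowed)
```

Because the mask depends on which image each query token belongs to, tokens from different images see different sets of keys even though a single (L, L) matrix is used.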
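Hi, I noticed that the attention part of the model includes a masked self-attention computation. If this is applied to the original Swin, whose shifted windows also involve a mask computation, wouldn't the two masks overlap and make deployment difficult?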