
mixmim's Introduction

PyTorch implementation of MixMAE (CVPR 2023)


This repo is the official implementation of the paper MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers.

@article{MixMAE,
  author  = {Liu, Jihao and Huang, Xin and Zheng, Jinliang and Liu, Yu and Li, Hongsheng},
  journal = {arXiv preprint arXiv:2205.13137},
  title   = {MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers},
  year    = {2022},
}

Available pretrained models

| Models | Params (M) | FLOPs (G) | Pretrain Epochs | Top-1 Acc. (%) | Pretrain ckpt | Finetune ckpt |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-B/W14 | 88 | 16.3 | 600 | 85.1 | base_600ep | base_600ep_ft |
| Swin-B/W16-384x384 | 89.6 | 52.6 | 600 | 86.3 | base_600ep | base_600ep_ft_384x384 |
| Swin-L/W14 | 197 | 35.9 | 600 | 85.9 | large_600ep | large_600ep_ft |
| Swin-L/W16-384x384 | 199 | 112 | 600 | 86.9 | large_600ep | large_600ep_ft_384x384 |

Training and evaluation

We use Slurm for multi-node distributed pretraining and finetuning.

Pretrain

sh exp/base_600ep/pretrain.sh partition 16 /path/to/imagenet
  • Training with 16 GPUs on your partition.
  • Batch size is 128 * 16 = 2048.
  • Default setting is to train for 600 epochs with a mask ratio of 0.5.
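If Slurm is unavailable, a single-node run can usually be launched with torchrun instead. The sketch below is an assumption, not the repo's exact interface: the entry-point name main_pretrain.py and its flags are hypothetical, so mirror whatever exp/base_600ep/pretrain.sh actually invokes.

```sh
# Hypothetical single-node launch (8 GPUs); script name and flags are
# assumptions -- check exp/base_600ep/pretrain.sh for the real arguments.
torchrun --nproc_per_node=8 main_pretrain.py \
    --batch_size 128 --epochs 600 --mask_ratio 0.5 \
    --data_path /path/to/imagenet
```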

Finetune

sh exp/base_600ep/finetune.sh partition 8 /path/to/imagenet
  • Training with 8 GPUs on your partition.
  • Batch size is 128 * 8 = 1024.
  • Default setting is to finetune for 100 epochs.
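Finetuning starts from a pretrained checkpoint. A minimal sketch of loading one, assuming an MAE-style checkpoint that stores the weights under a "model" key (an assumption, not the repo's documented format):

```python
import torch

def load_pretrained(model: torch.nn.Module, ckpt_path: str) -> None:
    # Assumption: the checkpoint is either a raw state_dict or a dict that
    # nests the weights under a "model" key (common in MAE-style repos).
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt)
    # strict=False drops decoder-only keys absent from the finetune model
    msg = model.load_state_dict(state_dict, strict=False)
    print("missing:", msg.missing_keys, "unexpected:", msg.unexpected_keys)
```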

mixmim's People

Contributors

jihaonew


mixmim's Issues

Cannot resolve the import "petrel_client.client"

On Windows, from petrel_client.client import Client fails with "unable to resolve import petrel_client.client". How can this be solved? I also cannot find this package anywhere to install.
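For context, petrel_client is an internal storage (Ceph) SDK and is not published on PyPI. A common workaround, sketched under the assumption that the repo only uses Client.get to read raw bytes, is to fall back to local file I/O:

```python
try:
    from petrel_client.client import Client
except ImportError:
    class Client:  # local-filesystem stand-in (assumption: only .get is used)
        def __init__(self, *args, **kwargs):
            pass

        def get(self, path):
            # Read the file from the local disk instead of the Ceph cluster.
            with open(path, "rb") as f:
                return f.read()
```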

Attention masking bug?

It seems that the attention masking part is wrong: image patches from different source images should receive different masks, but a single mask definition is used for all patches regardless of which image they come from.

if attn_mask is not None:

By the way, how do you handle mixing more than two images? I only see the two-image mixing case in the code.
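For readers following this thread, here is a minimal sketch of group-wise attention masking with hypothetical names (not the repo's code): each token should attend only to tokens that came from the same source image.

```python
import torch

def group_attention_mask(mix_mask: torch.Tensor) -> torch.Tensor:
    # mix_mask: (B, N) of 0/1 marking which source image each token came from
    same_group = mix_mask.unsqueeze(1) == mix_mask.unsqueeze(2)  # (B, N, N)
    # additive mask: 0 where attention is allowed, -inf where it is blocked;
    # add this to the attention logits before the softmax
    attn_mask = torch.zeros_like(same_group, dtype=torch.float32)
    return attn_mask.masked_fill(~same_group, float("-inf"))
```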

Pretrained weights

Thank you very much for this research; I am studying it in depth. Could you please provide the pretrained weights? Thanks!

Question about the mixing loss

In the code, I see that when computing the loss you only compute the reconstruction loss over the masked regions of the two reconstructed images, which I have no issue with. However, for the target you seem to use the original single images as the reconstruction targets rather than the mixed image. Is my understanding correct? I would appreciate a clarification.
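For reference, a minimal sketch of the loss being discussed, with hypothetical names: each reconstructed image is compared against its original (unmixed) image, and only on the tokens that were masked, i.e. replaced by the other image's tokens.

```python
import torch

def mixmim_loss(pred_a, pred_b, target_a, target_b, mask):
    # pred/target: (B, N, D) per-token predictions and unmixed targets;
    # mask: (B, N) float, 1 where image A's token was replaced by image B's
    loss_a = ((pred_a - target_a) ** 2).mean(dim=-1)  # per-token MSE, (B, N)
    loss_b = ((pred_b - target_b) ** 2).mean(dim=-1)
    # each image is supervised only on its masked-out tokens
    return (loss_a * mask).sum() / mask.sum() \
         + (loss_b * (1 - mask)).sum() / (1 - mask).sum()
```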

About the absolute_pos_embed positional encoding

When swapping in another backbone, is the absolute_pos_embed positional encoding still required? SimMIM also uses Swin but does not add absolute_pos_embed, and many recent Transformer designs have replaced absolute positional encoding with convolutions.

Integrating other models

Love the work. I would like to run MixMAE with PVT-V2; is there any way you could release the code for training other encoders (e.g., PVT-L in your paper)?

Slow convergence on downstream tasks?

Is it normal for convergence on downstream tasks to be much slower than with ImageNet-supervised weights? Also, from my reproduction, the gap with SimMIM does not seem significant.

How can this be deployed with the original Swin?

Hi, I noticed that the model's attention includes a masked self-attention computation. The original Swin's shifted windows also involve a mask computation; if MixMIM is applied to the original Swin, would the two masks overlap and make deployment difficult?
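For what it's worth, a sketch under the assumption that both masks are additive (0 = attend, a large negative value = block): additive masks compose by summation before the softmax, so shifted-window masking and mix-group masking can coexist.

```python
import torch

def combine_attn_masks(shift_window_mask: torch.Tensor,
                       mix_group_mask: torch.Tensor) -> torch.Tensor:
    # Additive attention masks compose by summation; the combined mask is
    # then added to the attention logits before the softmax.
    return shift_window_mask + mix_group_mask
```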

Minor code bugs

While running the code I found two small bugs:
  • add_weight_decay → param_groups_weight_decay
  • module → _modules
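The first rename matches a helper that changed names across timm versions. A hedged compatibility shim (an assumption about the cause, not a confirmed fix for this repo) that works with either timm version:

```python
# Newer timm renamed add_weight_decay to param_groups_weight_decay;
# import whichever name the installed version provides.
try:
    from timm.optim.optim_factory import add_weight_decay
except ImportError:
    from timm.optim.optim_factory import param_groups_weight_decay as add_weight_decay
```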

About 'Mix Embedding' in paper.

In section 2.3 of your paper, you mention that you 'add two mix embeddings to the visual tokens to implicitly differentiate the two mixing groups' and 'use different mix embeddings for the 4 stages of the encoder'. However, I cannot find such a mix embedding in the function forward_encoder of models_mixmim.py. If I simply missed it, would you kindly point it out to me? Thanks!

About the mixed images

Hello, I can't find the implementation of the mixed-image input. Could you point me to where it is?
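For reference, a minimal sketch of the mixing step as described in the paper, with hypothetical names (not a pointer into the repo): the token embeddings of two images are combined with a binary mask, so one forward pass carries the visible tokens of both images.

```python
import torch

def mix_tokens(x_a: torch.Tensor, x_b: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    # x_a, x_b: (B, N, C) token embeddings of two images;
    # mask: (B, N, 1) binary with roughly 50% ones. Tokens masked out of
    # image A are filled with the corresponding tokens of image B.
    return x_a * (1 - mask) + x_b * mask
```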

About GPU memory

Hi, what GPU model and memory size did you use for the 16 GPUs? I only have two 48 GB GPUs; can I reproduce the results with them?

Model pretraining weights

Hi, could you share the weights obtained during pretraining of MixMIM with Swin-B as the backbone, including both the Swin-B encoder weights and the Transformer decoder weights? Thank you very much!

Hardware & training time for pretraining

What's the hardware & training time? Specifically, I'm interested in the statistics below:

  • Arch & Epoch: [e.g. ViT-B, 300 ep]
  • Hardware: [e.g. single 8 V100 node]
  • Batch time: [e.g. 0.8s]
  • Epoch time: [e.g. 12 mins 30s]
  • Training time: [e.g. 28 h]

Pretrained Weights

Hello, great work!
Can you release the pretrained weights of your MixMIM-B?
