
MAE

A PyTorch implementation of the Masked Autoencoder (MAE).

Due to limited resources, I have only tested my own ViT-Tiny design on the CIFAR10 dataset. The goal is not to reproduce MAE perfectly, but to keep this implementation aligned with the official MAE as closely as possible, so that users can learn MAE quickly and accurately.
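MAE's core operation is to mask a large, random subset of image patches and train the encoder only on the visible ones. The sketch below is a minimal NumPy illustration of that random masking, not code from this repository; the function name and the 75% default mask ratio follow the original MAE paper:

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Randomly keep a subset of patches, as in MAE pretraining.

    patches: (num_patches, dim) array of flattened image patches.
    Returns the visible patches, their indices, and a binary mask
    (1 = masked / to be reconstructed, 0 = visible).
    """
    rng = np.random.default_rng(seed)
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))

    # Shuffle patch indices and keep the first subset as "visible".
    ids_shuffle = rng.permutation(num_patches)
    ids_keep = np.sort(ids_shuffle[:num_keep])

    mask = np.ones(num_patches, dtype=np.int64)
    mask[ids_keep] = 0
    return patches[ids_keep], ids_keep, mask

# Example: a 32x32 image with patch_size 2 gives (32/2)^2 = 256 patches,
# each flattened to 2*2*3 = 12 values.
patches = np.zeros((256, 2 * 2 * 3))
visible, ids_keep, mask = random_masking(patches)
print(visible.shape)  # (64, 12): 25% of the 256 patches remain visible
```

The decoder then reconstructs the full patch sequence, and the loss is computed only on the masked positions (where `mask == 1`).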

1. Pretrain

We provide the bash script train_pretrain.sh for pretraining. You can modify the hyperparameters in the script to suit your needs.

bash train_pretrain.sh

2. Finetune

We provide the bash script train_finetune.sh for finetuning. You can modify the hyperparameters in the script to suit your needs.

bash train_finetune.sh

3. Scratch

We provide the bash script train_scratch.sh for training from scratch. You can modify the hyperparameters in the script to suit your needs.

bash train_scratch.sh

4. Evaluate

  • Evaluate the top-1 & top-5 accuracy of ViT-Tiny on the CIFAR10 dataset:
python train_finetune.py --dataset cifar10 -m vit_tiny --batch_size 256 --img_size 32 --patch_size 2 --eval --resume path/to/vit_tiny_cifar10.pth
  • Evaluate the top-1 & top-5 accuracy of ViT-Tiny on the ImageNet-1K dataset:
python train_finetune.py --dataset imagenet_1k -m vit_tiny --batch_size 256 --img_size 224 --patch_size 16 --eval --resume path/to/vit_tiny_imagenet_1k.pth
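Top-1 and top-5 accuracy count a prediction as correct when the true label appears among the k highest-scoring classes. The helper below is an illustrative NumPy sketch of that metric, not the repository's evaluation code:

```python
import numpy as np

def topk_accuracy(logits, labels, ks=(1, 5)):
    """Fraction of samples whose true label is among the top-k scores.

    logits: (N, num_classes) array of class scores.
    labels: (N,) array of integer class labels.
    """
    # Class indices sorted by descending score for each sample.
    order = np.argsort(-logits, axis=1)
    accs = {}
    for k in ks:
        hits = (order[:, :k] == labels[:, None]).any(axis=1)
        accs[f"top{k}"] = hits.mean()
    return accs

# Toy example with 3 classes: top-1 misses the second sample,
# but top-2 recovers it.
logits = np.array([[0.1, 0.9, 0.0],
                   [0.8, 0.05, 0.15],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 2, 2])
print(topk_accuracy(logits, labels, ks=(1, 2)))
```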

5. Visualize Image Reconstruction

  • Visualize MAE-ViT-Tiny reconstructions on the CIFAR10 dataset:
python train_pretrain.py --dataset cifar10 -m mae_vit_tiny --resume path/to/mae_vit_tiny_cifar10.pth --img_size 32 --patch_size 2 --eval --batch_size 1
  • Visualize MAE-ViT-Tiny reconstructions on the ImageNet-1K dataset:
python train_pretrain.py --dataset imagenet_1k -m mae_vit_tiny --resume path/to/mae_vit_tiny_imagenet_1k.pth --img_size 224 --patch_size 16 --eval --batch_size 1
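Visualizing a reconstruction requires converting between images and flattened patch sequences. Below is a minimal NumPy round-trip sketch under the CIFAR10 settings above (img_size 32, patch_size 2); the function names mirror the ones commonly used in MAE codebases but are illustrative, not this repository's code:

```python
import numpy as np

def patchify(img, patch_size):
    """Split an (H, W, C) image into (N, patch_size*patch_size*C) patches."""
    h, w, c = img.shape
    gh, gw = h // patch_size, w // patch_size
    x = img.reshape(gh, patch_size, gw, patch_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return x.reshape(gh * gw, patch_size * patch_size * c)

def unpatchify(patches, patch_size, h, w, c=3):
    """Inverse of patchify: reassemble a patch sequence into an image."""
    gh, gw = h // patch_size, w // patch_size
    x = patches.reshape(gh, gw, patch_size, patch_size, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
patches = patchify(img, patch_size=2)       # (256, 12) with CIFAR10 settings
restored = unpatchify(patches, 2, 32, 32)
print(np.allclose(img, restored))  # True
```

Feeding the decoder's predicted patches through `unpatchify` produces the reconstructed image shown in the visualizations below.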

6. Experiments

6.1 MAE pretrain

  • Visualization on the CIFAR10 validation set

Masked Image | Original Image | Reconstructed Image


  • Visualization on the ImageNet validation set

...

6.2 Finetune

  • On CIFAR10

| Model    | MAE pretrained | Epoch | Top-1 | Weight | MAE weight |
|----------|----------------|-------|-------|--------|------------|
| ViT-Tiny | No             | 300   | 86.8  | ckpt   | -          |
| ViT-Tiny | Yes            | 100   | 91.8  | ckpt   | ckpt       |

  • On ImageNet-1K

| Model    | MAE pretrained | Epoch | Top-1 | Weight | MAE weight |
|----------|----------------|-------|-------|--------|------------|
| ViT-Tiny | No             | 300   |       |        |            |
| ViT-Tiny | Yes            | 100   |       |        |            |

Since ImageNet-1K is a sufficiently large-scale dataset, we recommend using the code's default training hyperparameters to pretrain MAE and to finetune ViT from the MAE pretrained weights.

7. Acknowledgment

Thank you to Kaiming He for his inspiring work on MAE, which effectively elucidates the differences between vision and language modeling and offers valuable insights for subsequent vision research. I would also like to express my gratitude for the official MAE source code, and to thank IcarusWizard for their MAE reproduction.

Contributors

  • yjh0410
