Thanks for releasing the code of EfficientTrain! We have some questions about the experimental results for DeiT-Tiny on ImageNet-1K.
We tried to reproduce the results of Table 8(a) in the original paper, using the script from the README to train DeiT-Tiny:
```bash
result_dir=/result/et_train/deit/imagenet_tiny_run1
dataset_dir=/home/data/imagenet/image-net.org/data/ILSVRC/2012
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python ET_training.py \
  --data_path $dataset_dir \
  --output_dir $result_dir \
  --model deit_tiny_patch16_224 \
  --final_bs 256 --epochs 300 \
  --num_gpus 8 --num_workers 8
```
```
{"train_lr": 0.0032559395282436713, "train_min_lr": 0.0032559395282436713, "train_loss": 4.354069483824647, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.4368886798620224, "test_loss": 1.8930224837625729, "test_acc1": 57.76200146789551, "test_acc5": 81.58200245544434, "epoch": 99, "n_parameters": 5698984}
{"train_lr": 0.0032384000923775754, "train_min_lr": 0.0032384000923775754, "train_loss": 4.347171298299845, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.4361276876850006, "test_loss": 1.8822389315156376, "test_acc1": 57.42200148498535, "test_acc5": 81.62800247436523, "epoch": 100, "n_parameters": 5698984}
...
{"train_lr": 0.0011431063586592564, "train_min_lr": 0.0011431063586592564, "train_loss": 4.027055377188401, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.5856632233048097, "test_loss": 1.4776335459421663, "test_acc1": 66.34400211059571, "test_acc5": 87.6380025302124, "epoch": 199, "n_parameters": 5707432}
{"train_lr": 0.001122893755989195, "train_min_lr": 0.001122893755989195, "train_loss": 4.016480489228016, "train_weight_decay": 0.05000000000000049, "train_grad_norm": Infinity, "test_loss": 1.449809268116951, "test_acc1": 66.3800021170044, "test_acc5": 87.57600238250733, "epoch": 200, "n_parameters": 5707432}
{"train_lr": 0.0011027916320893176, "train_min_lr": 0.0011027916320893176, "train_loss": 4.003306985140229, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.5942679251997899, "test_loss": 1.461431296870989, "test_acc1": 66.61200170288086, "test_acc5": 87.78600267547607, "epoch": 201, "n_parameters": 5707432}
...
{"train_lr": 6.328153925039405e-06, "train_min_lr": 6.328153925039405e-06, "train_loss": 3.7115134861893377, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.8702810922494302, "test_loss": 1.167566333185224, "test_acc1": 72.78800256469727, "test_acc5": 91.56800273162841, "epoch": 293, "n_parameters": 5717416}
{"train_lr": 4.8186317113912905e-06, "train_min_lr": 4.8186317113912905e-06, "train_loss": 3.6995124538930564, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.870861506423889, "test_loss": 1.1652020443888271, "test_acc1": 72.78800224243165, "test_acc5": 91.58000273162841, "epoch": 294, "n_parameters": 5717416}
...
{"train_lr": 1.2942618680829815e-06, "train_min_lr": 1.2942618680829815e-06, "train_loss": 3.7039048994580903, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.8682293318785154, "test_loss": 1.1661551656091915, "test_acc1": 72.76000192993165, "test_acc5": 91.56800231323243, "epoch": 298, "n_parameters": 5717416}
{"train_lr": 1.042153755247833e-06, "train_min_lr": 1.042153755247833e-06, "train_loss": 3.705186177713749, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.8685083528741812, "test_loss": 1.1662813990431673, "test_acc1": 72.74400226898193, "test_acc5": 91.56600253082276, "epoch": 299, "n_parameters": 5717416}
```
The best accuracy we obtain for DeiT-Tiny is 72.8%, so we cannot reproduce the reported 73.3%. Furthermore, the best accuracies at epochs 100 and 200 are 57.4% and 66.4%, respectively, rather than the reported 68.1% and 71.8%.
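For reference, this is how we compute the "best accuracy" numbers above from the training log. It is a minimal sketch assuming the log is written as one JSON object per epoch (as in the excerpts); the `log.txt` filename under `$result_dir` is an assumption on our side:

```python
import json

def best_acc1(log_lines):
    """Return the highest test_acc1 seen across all epoch records.

    Each element of log_lines is one JSON object (one epoch), as in the
    log excerpts above. Blank lines and "..." elisions are skipped.
    """
    best = float("-inf")
    for line in log_lines:
        line = line.strip()
        if not line or line == "...":
            continue
        record = json.loads(line)
        best = max(best, record["test_acc1"])
    return best

if __name__ == "__main__":
    # In practice we read f"{result_dir}/log.txt" line by line;
    # the two records here are copied from the log excerpt above.
    sample = [
        '{"test_acc1": 72.78800256469727, "epoch": 293}',
        '{"test_acc1": 72.74400226898193, "epoch": 299}',
    ]
    print(round(best_acc1(sample), 1))  # 72.8
```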