Thanks for releasing the code of EfficientTrain! We have some questions about the experimental results for DeiT-Tiny on ImageNet-1K.
We tried to reproduce the results of Table 8(a) in the original paper, using the script from the README to train DeiT-Tiny:
```bash
result_dir=/result/et_train/deit/imagenet_tiny_run1
dataset_dir=/home/data/imagenet/image-net.org/data/ILSVRC/2012
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python ET_training.py \
  --data_path $dataset_dir \
  --output_dir $result_dir \
  --model deit_tiny_patch16_224 \
  --final_bs 256 --epochs 300 \
  --num_gpus 8 --num_workers 8
```
```
{"train_lr": 0.0032559395282436713, "train_min_lr": 0.0032559395282436713, "train_loss": 4.354069483824647, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.4368886798620224, "test_loss": 1.8930224837625729, "test_acc1": 57.76200146789551, "test_acc5": 81.58200245544434, "epoch": 99, "n_parameters": 5698984}
{"train_lr": 0.0032384000923775754, "train_min_lr": 0.0032384000923775754, "train_loss": 4.347171298299845, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.4361276876850006, "test_loss": 1.8822389315156376, "test_acc1": 57.42200148498535, "test_acc5": 81.62800247436523, "epoch": 100, "n_parameters": 5698984}
...
{"train_lr": 0.0011431063586592564, "train_min_lr": 0.0011431063586592564, "train_loss": 4.027055377188401, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.5856632233048097, "test_loss": 1.4776335459421663, "test_acc1": 66.34400211059571, "test_acc5": 87.6380025302124, "epoch": 199, "n_parameters": 5707432}
{"train_lr": 0.001122893755989195, "train_min_lr": 0.001122893755989195, "train_loss": 4.016480489228016, "train_weight_decay": 0.05000000000000049, "train_grad_norm": Infinity, "test_loss": 1.449809268116951, "test_acc1": 66.3800021170044, "test_acc5": 87.57600238250733, "epoch": 200, "n_parameters": 5707432}
{"train_lr": 0.0011027916320893176, "train_min_lr": 0.0011027916320893176, "train_loss": 4.003306985140229, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.5942679251997899, "test_loss": 1.461431296870989, "test_acc1": 66.61200170288086, "test_acc5": 87.78600267547607, "epoch": 201, "n_parameters": 5707432}
...
{"train_lr": 6.328153925039405e-06, "train_min_lr": 6.328153925039405e-06, "train_loss": 3.7115134861893377, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.8702810922494302, "test_loss": 1.167566333185224, "test_acc1": 72.78800256469727, "test_acc5": 91.56800273162841, "epoch": 293, "n_parameters": 5717416}
{"train_lr": 4.8186317113912905e-06, "train_min_lr": 4.8186317113912905e-06, "train_loss": 3.6995124538930564, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.870861506423889, "test_loss": 1.1652020443888271, "test_acc1": 72.78800224243165, "test_acc5": 91.58000273162841, "epoch": 294, "n_parameters": 5717416}
...
{"train_lr": 1.2942618680829815e-06, "train_min_lr": 1.2942618680829815e-06, "train_loss": 3.7039048994580903, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.8682293318785154, "test_loss": 1.1661551656091915, "test_acc1": 72.76000192993165, "test_acc5": 91.56800231323243, "epoch": 298, "n_parameters": 5717416}
{"train_lr": 1.042153755247833e-06, "train_min_lr": 1.042153755247833e-06, "train_loss": 3.705186177713749, "train_weight_decay": 0.05000000000000049, "train_grad_norm": 0.8685083528741812, "test_loss": 1.1662813990431673, "test_acc1": 72.74400226898193, "test_acc5": 91.56600253082276, "epoch": 299, "n_parameters": 5717416}
```
The best accuracy we obtain for DeiT-Tiny is 72.8%, so we cannot reproduce the reported 73.3%. Furthermore, the best accuracies at epochs 100 and 200 are 57.4% and 66.4%, respectively, rather than the reported 68.1% and 71.8%.
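For reference, this is how we compute the "best accuracy" numbers above from the training log. It is a minimal sketch assuming the log is written as one JSON object per epoch (as in the excerpts); the `log.txt` filename under `$result_dir` is an assumption on our side:

```python
import json

def best_acc1(log_lines):
    """Return the highest test_acc1 seen across all epoch records.

    Each element of log_lines is one JSON object (one epoch), as in the
    log excerpts above. Blank lines and "..." elisions are skipped.
    """
    best = float("-inf")
    for line in log_lines:
        line = line.strip()
        if not line or line == "...":
            continue
        record = json.loads(line)
        best = max(best, record["test_acc1"])
    return best

if __name__ == "__main__":
    # In practice we read f"{result_dir}/log.txt" line by line;
    # the two records here are copied from the log excerpt above.
    sample = [
        '{"test_acc1": 72.78800256469727, "epoch": 293}',
        '{"test_acc1": 72.74400226898193, "epoch": 299}',
    ]
    print(round(best_acc1(sample), 1))  # 72.8
```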