I'm trying to reproduce this part of the main table.
However, my results always seem to be off, especially the ImageNet-A top-1 score, which consistently lands around 20-22.
Here are the results from 3 seeds, using the same versions of the given dependencies (Python 3.8):

| Experiment | val_top1 | val_top5 | imagenet-a_top1 | imagenet-a_top5 | imagenet-r_top1 | imagenet-r_top5 | sketch_top1 | sketch_top5 | imagenetv2-matched-frequency-format-val_top1 | imagenetv2-matched-frequency-format-val_top5 | imagenet-style_top1 | imagenet-style_top5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1 | 81.69 | 96.078 | 20.787 | 43.987 | 35.233 | 50.2 | 35.788 | 57.684 | 71.17 | 90.49 | 17.842 | 31.726 |
| vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27 | 81.586 | 96.066 | 21.147 | 44.227 | 35.053 | 49.967 | 35.56 | 57.399 | 71.28 | 90.45 | 17.656 | 31.644 |
| vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42 | 81.598 | 96.088 | 20.653 | 43.933 | 35.26 | 49.923 | 35.825 | 57.682 | 71.41 | 90.29 | 17.78 | 31.634 |

Here are the results from the same 3 seeds, using different versions of the dependencies (similar to the results above):

| Experiment | val_top1 | val_top5 | imagenet-a_top1 | imagenet-a_top5 | imagenet-r_top1 | imagenet-r_top5 | sketch_top1 | sketch_top5 | imagenetv2-matched-frequency-format-val_top1 | imagenetv2-matched-frequency-format-val_top5 | imagenet-style_top1 | imagenet-style_top5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1 | 81.676 | 96.13 | 18.36 | 41.08 | 34.863 | 49.9 | 35.803 | 57.893 | 71.31 | 90.37 | 17.388 | 31.048 |
| vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27 | 81.63 | 96.108 | 20.56 | 43.587 | 35.21 | 50.053 | 35.827 | 57.661 | 71.35 | 90.35 | 17.676 | 31.65 |
| vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42 | 81.66 | 96.108 | 20.013 | 42.84 | 35.27 | 49.93 | 35.837 | 57.832 | 71.2 | 90.29 | 17.708 | 31.476 |

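
To make the seed-to-seed spread easier to compare, here is a minimal sketch that averages the ImageNet-A top-1 numbers from the two runs above. The values are copied from the results; the dictionary keys and the `summarize` helper are just illustrative names, not part of the repository.

```python
from statistics import mean, stdev

# ImageNet-A top-1 per seed (1, 27, 42), copied from the two result sets above.
# The labels are my own shorthand for the two dependency setups.
imagenet_a_top1 = {
    "same dependencies": [20.787, 21.147, 20.653],
    "different dependencies": [18.36, 20.56, 20.013],
}


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to 3 decimals."""
    return round(mean(values), 3), round(stdev(values), 3)


summary = {name: summarize(vals) for name, vals in imagenet_a_top1.items()}

for name, (avg, std) in summary.items():
    print(f"{name}: mean={avg:.3f}, std={std:.3f}")
```

The same helper can be pointed at any other column (for example `imagenet-r_top1`) by swapping in the corresponding three values.
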
Here are the settings I used:
{
"data": "Dataset/CV/imagenet/train",
"seg_data": "work/data/general/imagenet-s/ImageNetS919/train-semi-segmentation",
"workers": 4,
"epochs": 50,
"start_epoch": 0,
"batch_size": 8,
"lr": 3e-06,
"momentum": 0.9,
"weight_decay": 0.0001,
"print_freq": 10,
"resume": "",
"evaluate": false,
"pretrained": false,
"world_size": -1,
"rank": -1,
"dist_url": "tcp://224.66.41.62:23456",
"dist_backend": "nccl",
"gpu": 1,
"save_interval": 20,
"num_samples": 3,
"multiprocessing_distributed": false,
"lambda_seg": 0.8,
"lambda_acc": 0.2,
"experiment_folder": "experiment/vitb_robustvit_environment_seed/lr_3e-06_seg_0.8_acc_0.2_bckg_2.0_fgd_0.3_num_epochs_50_seed_1",
"dilation": 0,
"lambda_background": 2.0,
"lambda_foreground": 0.3,
"num_classes": 500,
"temperature": 1.0,
"class_seed": 1, # or 27, 42
"folder_name": "vitb_robustvit_environment_seed"
}
I used `model_best.pth.tar` for the evaluation. Is there anything I should do or try to bring the results closer to the paper?