Comments (2)
Hi,
1/ The initial learning rate is set to 0.15. That is to say, the weight-decay argument (args.wd / args.weight_decay) is set to 1e-4 / 0.15 on ImageNet. Is that right?
-- Yes, this is how we set the weight decay.
2/ Two lr schedules have been studied in this paper...
-- The accuracy we got with step decay is higher than AdamW's but lower than AdaHessian's with plateau decay. The reason is that step decay (i.e., decaying the lr by a factor of 10 at epochs 30 and 60) is heavily tuned for the SGD optimizer.
3/ Could you further share the hyperparameter settings of the plateau-based schedule?
-- Sure. We set the patience to 3 (we have not tuned it yet, and we believe that tuning this parameter may give a better result). The exact call we use is: torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=3, verbose=True, threshold=0.001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
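A quick arithmetic sketch of the coupling asked about in 1/ (hedged: this assumes an AdamW-style decoupled update in which the per-step shrink on each weight is lr * wd, and the variable names are illustrative, not from the repo):

```python
# Sketch, assuming decoupled (AdamW-style) weight decay, where each step
# shrinks a weight by lr * wd. Choosing the command-line value as
# wd_target / lr then keeps the effective per-step decay at wd_target,
# independent of the initial learning rate.

lr = 0.15                 # initial learning rate from the question
wd_target = 1e-4          # effective weight decay aimed for on ImageNet
wd_arg = wd_target / lr   # value passed as the weight-decay argument

effective_decay = lr * wd_arg
print(round(wd_arg, 6))   # 0.000667
print(effective_decay)
```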
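The ReduceLROnPlateau settings above can be read as the following torch-free sketch of the plateau rule (plateau_step is my own illustrative mimic, not code from the repo; it covers only mode='max' with a relative threshold, factor=0.5, patience=3, threshold=0.001):

```python
# Minimal mimic of ReduceLROnPlateau(mode='max', factor=0.5, patience=3,
# threshold=0.001, threshold_mode='rel'): an epoch counts as an
# improvement only if the metric beats the best seen so far by a relative
# margin of 0.1%; after more than `patience` consecutive non-improving
# epochs the learning rate is halved.

def plateau_step(state, metric, factor=0.5, patience=3, threshold=0.001):
    """Update {'lr', 'best', 'bad'} in place; return the current lr."""
    if metric > state["best"] * (1 + threshold):  # relative improvement
        state["best"] = metric
        state["bad"] = 0
    else:
        state["bad"] += 1
        if state["bad"] > patience:               # plateau detected
            state["lr"] *= factor
            state["bad"] = 0
    return state["lr"]

state = {"lr": 0.15, "best": float("-inf"), "bad": 0}
# Validation top-1 accuracy per epoch: two gains, then a plateau.
for acc in [70.0, 70.2, 70.2, 70.2, 70.2, 70.2]:
    lr = plateau_step(state, acc)
print(lr)  # 0.075 -- halved after four non-improving epochs
```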
Please let us know if you have any other questions.
Best,
from adahessian.
Great! Many thanks!
from adahessian.
Related Issues (20)
- AdaHessian in tensorflow 1 version
- Alpha unused
- Optimizer is not respecting "trainable" attribute of variables.
- Replace numpy power by TF pow
- Help using adahessian in TensorFlow
- Error using adahessian in PyTorch
- About how to group my params
- Use of AdaHessian with batched training data?
- Reasonable learning rate range for adahessian?
- Use of FP16 in backward with create_graph = True?
- Is Hutch++ applicable to improve AdaHessian?
- Scalability Question
- Inconsistence between paper and training scripts on NMT tasks
- Images
- Object Detection
- Possible to use with PyTorch Lightning?
- Pre-trained model not available anymore (google drive link expired)
- Can this deal with complex numbers?
- Performance issue about tf.function
- I get this error when I use the AdaHessian. Is it a bug?