Comments (2)
Hi,
1/ The initial learning rate is set to 0.15. That is to say, the weight-decay argument (args.wd / args.weight_decay) is set to 1e-4 / 0.15 on ImageNet. Is that right?
-- Yes, this is how we set the weight decay.
2/ Two lr schedules have been studied in this paper...
-- The accuracy we got with step decay is higher than AdamW's but lower than AdaHessian's with plateau decay. The reason is that step decay (i.e., decaying the lr by a factor of 10 at epochs 30 and 60) is heavily tuned for the SGD optimizer.
3/ Could you further share the hyperparameter settings of the plateau-based schedule?
-- Sure. We set the patience to 3 (we have not tuned it yet, and we believe that tuning this parameter may give a better result). The exact call we use is: torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=3, verbose=True, threshold=0.001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
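A quick arithmetic sketch of the coupling asked about in 1/ (hedged: this assumes an AdamW-style decoupled update in which the per-step shrink on each weight is lr * wd, and the variable names are illustrative, not from the repo):

```python
# Sketch, assuming decoupled (AdamW-style) weight decay, where each step
# shrinks a weight by lr * wd. Choosing the command-line value as
# wd_target / lr then keeps the effective per-step decay at wd_target,
# independent of the initial learning rate.

lr = 0.15                 # initial learning rate from the question
wd_target = 1e-4          # effective weight decay aimed for on ImageNet
wd_arg = wd_target / lr   # value passed as the weight-decay argument

effective_decay = lr * wd_arg
print(round(wd_arg, 6))   # 0.000667
print(effective_decay)
```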
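The ReduceLROnPlateau settings above can be read as the following torch-free sketch of the plateau rule (plateau_step is my own illustrative mimic, not code from the repo; it covers only mode='max' with a relative threshold, factor=0.5, patience=3, threshold=0.001):

```python
# Minimal mimic of ReduceLROnPlateau(mode='max', factor=0.5, patience=3,
# threshold=0.001, threshold_mode='rel'): an epoch counts as an
# improvement only if the metric beats the best seen so far by a relative
# margin of 0.1%; after more than `patience` consecutive non-improving
# epochs the learning rate is halved.

def plateau_step(state, metric, factor=0.5, patience=3, threshold=0.001):
    """Update {'lr', 'best', 'bad'} in place; return the current lr."""
    if metric > state["best"] * (1 + threshold):  # relative improvement
        state["best"] = metric
        state["bad"] = 0
    else:
        state["bad"] += 1
        if state["bad"] > patience:               # plateau detected
            state["lr"] *= factor
            state["bad"] = 0
    return state["lr"]

state = {"lr": 0.15, "best": float("-inf"), "bad": 0}
# Validation top-1 accuracy per epoch: two gains, then a plateau.
for acc in [70.0, 70.2, 70.2, 70.2, 70.2, 70.2]:
    lr = plateau_step(state, acc)
print(lr)  # 0.075 -- halved after four non-improving epochs
```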
Please let us know if you have any other questions.
Best,
from adahessian.
Great! Many thanks!
from adahessian.
Related Issues (20)
- AdaHessian in tensorflow 1 version
- Alpha unused
- Optimizer is not respecting "trainable" attribute of variables.
- Replace numpy power by TF pow
- Help using adahessian in TensorFlow
- Error using adahessian in PyTorch
- About how to group my params
- Use of AdaHessian with batched training data?
- Reasonable learning rate range for adahessian?
- Use of FP16 in backward with create_graph = True?
- Is Hutch++ applicable to improve AdaHessian?
- Scalability Question
- Inconsistence between paper and training scripts on NMT tasks
- Images
- Object Detection
- Possible to use with PyTorch Lightning?
- Pre-trained model not available anymore (google drive link expired)
- Can this deal with complex numbers?
- Performance issue about tf.function
- I get this error when I use the AdaHessian. Is it a bug?