
pytorch-gradual-warmup-lr's People

Contributors

1adrianb, ildoonet, ir1d, shuoyangd

pytorch-gradual-warmup-lr's Issues

It seems you got one learning rate per epoch.

If I read your code correctly, the learning rate is computed from the epoch count, not the iteration (mini-batch) count, so it changes only once per epoch. According to the paper, it seems the learning rate should be computed per iteration.
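A minimal sketch of what per-iteration warmup could look like, assuming the linear formula quoted in the issues below (`base_lr * ((multiplier - 1) * progress + 1)`). The function name and the `iters_per_epoch` parameter are hypothetical and not part of the library; this only illustrates the idea of measuring progress in iterations rather than whole epochs.

```python
def warmup_lr_at_iteration(base_lr, multiplier, iteration, iters_per_epoch, total_epoch):
    """Linear warmup evaluated per iteration instead of per epoch (illustrative only)."""
    total_iters = iters_per_epoch * total_epoch
    # Fraction of the warmup phase completed, capped at 1.0 once warmup is over.
    progress = min(iteration / total_iters, 1.0)
    return base_lr * ((multiplier - 1.0) * progress + 1.0)


# With 100 iterations per epoch and 5 warmup epochs, the rate changes every step.
lr_start = warmup_lr_at_iteration(0.1, 8.0, 0, 100, 5)
lr_mid = warmup_lr_at_iteration(0.1, 8.0, 250, 100, 5)
lr_end = warmup_lr_at_iteration(0.1, 8.0, 500, 100, 5)
lr_after = warmup_lr_at_iteration(0.1, 8.0, 1000, 100, 5)
```

With an epoch-based scheduler, a similar effect might be obtained by passing a fractional epoch (iteration divided by iterations per epoch) to `step`, though whether the library handles that correctly is not confirmed here.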

multiplier works weird

When I modify the example code like this:

import torch
from torch.optim.lr_scheduler import StepLR, ExponentialLR
from torch.optim.sgd import SGD

from warmup_scheduler import GradualWarmupScheduler


if __name__ == '__main__':
    model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
    optim = SGD(model, 0.0001)

    # scheduler_warmup is chained with scheduler_steplr
    scheduler_steplr = StepLR(optim, step_size=10, gamma=0.1)
    scheduler_warmup = GradualWarmupScheduler(optim, multiplier=10, total_epoch=5, after_scheduler=scheduler_steplr)

    # this zero gradient update is needed to avoid a warning message, issue #8.
    optim.zero_grad()
    optim.step()

    for epoch in range(1, 20):
        scheduler_warmup.step(epoch)
        print(epoch, optim.param_groups[0]['lr'])

        optim.step()    # backward pass (update network)

I get an unexpected result: the learning rate at the sixth epoch is strange.

1 0.00028
2 0.00045999999999999996
3 0.00064
4 0.00082
5 0.001
6 0.0001    
7 0.001
8 0.001
9 0.001
10 0.001
11 0.001
12 0.001
13 0.001
14 0.001
15 0.0001
16 0.0001
17 0.0001
18 0.0001
19 0.0001
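To pin down exactly where the output deviates, here is a rough check in plain Python. It assumes (this is inferred from epochs 7 to 19 of the output above, not from the library source) that after warmup the StepLR stage should start from `base_lr * multiplier` and count its epochs from the end of warmup. The helper `expected_lr` is hypothetical.

```python
import math


def expected_lr(epoch, base_lr=0.0001, multiplier=10.0, total_epoch=5,
                step_size=10, gamma=0.1):
    """Expected rate under the assumptions stated above (illustrative only)."""
    if epoch <= total_epoch:
        # Linear warmup from base_lr towards base_lr * multiplier.
        return base_lr * ((multiplier - 1.0) * epoch / total_epoch + 1.0)
    # StepLR stage, counted from the end of warmup.
    return base_lr * multiplier * gamma ** ((epoch - total_epoch) // step_size)


# Values copied from the printed output above (rounded where the print was noisy).
observed = {1: 0.00028, 2: 0.00046, 3: 0.00064, 4: 0.00082, 5: 0.001, 6: 0.0001}
observed.update({e: 0.001 for e in range(7, 15)})
observed.update({e: 0.0001 for e in range(15, 20)})

# Epochs whose observed rate disagrees with the expected schedule.
mismatches = [e for e in sorted(observed)
              if not math.isclose(observed[e], expected_lr(e), rel_tol=1e-9)]
print(mismatches)
```

Under these assumptions only epoch 6 disagrees: it falls back to the base rate for a single step before the chained scheduler takes over.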

`warmup_lr` is computed incorrectly in `step_ReduceLROnPlateau`

I wonder whether you forgot to update the line shown below in `step_ReduceLROnPlateau`:

warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]

+ warmup_lr = self.get_lr()
- warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]

Here are the details:

  1. When I use ReduceLROnPlateau as the after_scheduler of GradualWarmupScheduler, the warm-up fails. I read the learning rate from optim.param_groups[0]['lr']. When I read it with get_lr instead, the value is correct.
  2. When I use StepLR as the after_scheduler, there is no exception and no error.

Therefore, I think the optimizer's learning rate was not warmed up correctly.

Why do I get this error when the warmup epochs end?

  File "train_mesh.py", line 266, in main
    scheduler_warmup.step()
  File "/home/liuziming/anaconda3/lib/python3.6/site-packages/warmup_scheduler/scheduler.py", line 39, in step
    return super(GradualWarmupScheduler, self).step(epoch)
  File "/home/liuziming/anaconda3/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 52, in step
    for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
  File "/home/liuziming/anaconda3/lib/python3.6/site-packages/warmup_scheduler/scheduler.py", line 30, in get_lr
    return self.after_scheduler.get_lr()
AttributeError: 'ReduceLROnPlateau' object has no attribute 'get_lr'

I get this error when the warmup epochs end. I used the example as you show it and ran a simple resnet50 from torchvision.

LR does not work when the PyTorch version is under 1.2.0

Hi, ildoonet!
Thanks for your code. But I found that it does not work when the PyTorch version is <= 1.2.0.
To save others time, I think it would be better to specify the required PyTorch version.

Question of run.py

Hello!
Within each epoch, shouldn't we first zero the optimizer's gradients? I think we should call "optim.zero_grad()" at the start of each loop iteration. Is that right?

Set Starting learning rate

This is not an issue per se, but maybe a modification/extension.
I feel there should be an argument to set the learning rate at epoch zero, which is then gradually increased to the target learning rate over some number of epochs (5 in the paper).

Let me know what you think.
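One way the proposed extension might look, as a standalone sketch rather than a patch to the library. The function name and the `start_lr`, `target_lr`, and `warmup_epochs` parameters are hypothetical; the rate is interpolated linearly from an explicit starting value to the target.

```python
def warmup_from_start(start_lr, target_lr, epoch, warmup_epochs=5):
    """Linearly interpolate from start_lr (epoch 0) to target_lr (epoch warmup_epochs)."""
    if epoch >= warmup_epochs:
        # Warmup finished: hold the target rate.
        return target_lr
    return start_lr + (target_lr - start_lr) * epoch / warmup_epochs


# Ramp from 0.0 to 0.1 over 5 epochs, then stay at 0.1.
schedule = [warmup_from_start(0.0, 0.1, e) for e in range(7)]
```

Making the starting rate an explicit argument avoids having to encode it indirectly through a base rate and a multiplier.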

StepLR and CosineAnnealingLR have no function like ".get_last_lr()"

optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9, nesterov=True, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, max_epoch, eta_min=0, last_epoch=-1)
scheduler_warmup = GradualWarmupScheduler(optimizer, multiplier=8, total_epoch=5, after_scheduler=scheduler)
AttributeError: 'CosineAnnealingLR' object has no attribute 'get_last_lr'
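Both this error and the `ReduceLROnPlateau` one reported above come from calling a getter that the wrapped scheduler does not provide. A defensive helper could look roughly like the sketch below. The name `current_lrs` is hypothetical, and the objects used to exercise it are plain stand-ins, not real PyTorch schedulers; only the `param_groups[i]['lr']` layout is taken from the examples in this thread.

```python
from types import SimpleNamespace


def current_lrs(scheduler, optimizer):
    """Return the current learning rates, tolerating schedulers without getters."""
    for name in ("get_last_lr", "get_lr"):
        getter = getattr(scheduler, name, None)
        if callable(getter):
            return list(getter())
    # Fall back to the values stored on the optimizer itself.
    return [group["lr"] for group in optimizer.param_groups]


# Stand-in objects for illustration only.
optimizer = SimpleNamespace(param_groups=[{"lr": 0.01}, {"lr": 0.001}])
no_getter = SimpleNamespace()                           # exposes neither method
old_style = SimpleNamespace(get_lr=lambda: [0.02, 0.002])

lrs_fallback = current_lrs(no_getter, optimizer)
lrs_old = current_lrs(old_style, optimizer)
```

Reading from the optimizer is the most conservative choice, since it reflects whatever rate will actually be applied on the next update.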

Why must multiplier be greater than 1.0?

I want to use this linear warmup to gradually increase the learning rate from 0 to base_lr, which requires multiplier to be 1.0.

However, this code forces us to use a multiplier greater than 1.0.
Do we really need this restriction?

Math is wrong for multiplier=1

Here

warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
It should have the same special case as a few lines above:

if self.multiplier == 1.0:
    warmup_lr = [base_lr * (float(self.last_epoch) / self.total_epoch) for base_lr in self.base_lrs]
else:
    warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]

Otherwise the calculation will always be:

warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr * (0 * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr * (1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr for base_lr in self.base_lrs]
# <=>
warmup_lr = self.base_lrs
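The two branches can be compared outside the scheduler with a small standalone function. This is only a sketch of the formulas quoted above; `warmup_factor` is a hypothetical name, and the result is the factor that multiplies `base_lr`.

```python
def warmup_factor(multiplier, last_epoch, total_epoch):
    """Factor applied to base_lr during warmup, with the multiplier == 1.0 special case."""
    if multiplier == 1.0:
        # Ramp from 0 up to base_lr.
        return float(last_epoch) / total_epoch
    # Ramp from base_lr up to base_lr * multiplier.
    return (multiplier - 1.0) * last_epoch / total_epoch + 1.0


# With the special case, multiplier=1.0 gives a real ramp over 5 epochs.
guarded = [warmup_factor(1.0, e, 5) for e in range(6)]

# Without it, the general formula collapses to a constant factor of 1.0.
unguarded = [(1.0 - 1.0) * e / 5 + 1.0 for e in range(6)]
```

The guarded version rises from 0.0 to 1.0, while the unguarded one stays at 1.0 for every epoch, which matches the chain of equivalences above.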

WARNING: Did not find branch or tag '08f7d5e', assuming revision or ref

Hello, thank you very much for your code. I am running into a minor issue: while installing pytorch-gradual-warmup-lr from the Anaconda Prompt, git reports: WARNING: Did not find branch or tag '08f7d5e', assuming revision or ref. I am not sure whether an address or a module name has changed, and I hope you can help me clarify this.

Target optimizer not set properly when loading from state dict

When loading the GradualWarmupScheduler from a state dict to resume training, the optimizer attribute of the nested after_scheduler is also loaded from the state_dict. This causes a static learning rate after resuming, because the after_scheduler tries to update the learning rate of an optimizer that is not the one used by the resumed training. Setting self.after_scheduler.optimizer = self.optimizer as part of the load_state_dict() method should probably suffice to fix this.
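The mechanism can be reproduced without PyTorch using small stand-in classes. The sketch assumes the scheduler saves every attribute except its own `optimizer` and restores by updating `__dict__`, which is how the problem is described above; `MiniScheduler` and `FixedScheduler` are hypothetical names, not library classes.

```python
import copy


class MiniScheduler:
    """Stand-in mimicking a scheduler that wraps a nested after_scheduler."""

    def __init__(self, optimizer, after_scheduler=None):
        self.optimizer = optimizer
        self.after_scheduler = after_scheduler
        self.last_epoch = 0

    def state_dict(self):
        # Own optimizer is skipped, but the nested scheduler object (and the
        # optimizer it references) is saved along with everything else.
        return {k: v for k, v in self.__dict__.items() if k != "optimizer"}

    def load_state_dict(self, state):
        self.__dict__.update(state)


class FixedScheduler(MiniScheduler):
    def load_state_dict(self, state):
        super().load_state_dict(state)
        if self.after_scheduler is not None:
            # Proposed fix: rebind the nested scheduler to the live optimizer.
            self.after_scheduler.optimizer = self.optimizer


old_optim = {"lr": 0.1}
saved = MiniScheduler(old_optim, MiniScheduler(old_optim)).state_dict()
saved = copy.deepcopy(saved)  # simulates a save/load round trip

new_optim = {"lr": 0.1}
plain = MiniScheduler(new_optim, MiniScheduler(new_optim))
plain.load_state_dict(copy.deepcopy(saved))
fixed = FixedScheduler(new_optim, MiniScheduler(new_optim))
fixed.load_state_dict(copy.deepcopy(saved))

stale = plain.after_scheduler.optimizer is not new_optim
rebound = fixed.after_scheduler.optimizer is new_optim
```

In the unfixed case the nested scheduler ends up pointing at a stale copy of the optimizer, so any rate it sets never reaches the optimizer used for training.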

The initial lr value is higher than the target lr value

Thank you for sharing this great work!

The initial LR value seems to be larger than I expected.

code

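import torch
from warmup_scheduler import GradualWarmupScheduler

v = torch.zeros(10)
optim = torch.optim.SGD([v], lr=1e-2)
scheduler = GradualWarmupScheduler(optim, multiplier=8, total_epoch=10)

# Step the scheduler once per epoch and print the resulting learning rate.
for epoch in range(1, 20):
    scheduler.step(epoch)
    print(epoch, optim.param_groups[0]['lr'])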

printed results

1 0.017
2 0.024
3 0.031000000000000003
4 0.038
5 0.045
6 0.052000000000000005
7 0.059000000000000004
8 0.066
9 0.073
10 0.08
11 0.08
12 0.08
13 0.08
14 0.08
15 0.08
16 0.08
17 0.08
18 0.08
19 0.08

As you can see, the initial lr value (0.017) is higher than the target lr value (0.01).

Is this result correct?
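The printed values are consistent with the warmup formula quoted in the other issues on this page, in which the rate passed to the optimizer is the starting point of the ramp and `base_lr * multiplier` is the end point. A short sketch of that reading follows; `warmup_lr` is a hypothetical standalone function, and this is an interpretation of the output, not a statement about the intended design.

```python
def warmup_lr(base_lr, multiplier, epoch, total_epoch):
    """Linear ramp from base_lr (epoch 0) to base_lr * multiplier (epoch total_epoch)."""
    epoch = min(epoch, total_epoch)  # hold the final value after warmup
    return base_lr * ((multiplier - 1.0) * epoch / total_epoch + 1.0)


# Same settings as the snippet above: lr=1e-2, multiplier=8, total_epoch=10.
start = warmup_lr(1e-2, 8.0, 0, 10)    # value before any warmup step
first = warmup_lr(1e-2, 8.0, 1, 10)    # first printed value
end = warmup_lr(1e-2, 8.0, 10, 10)     # plateau after warmup
```

Under this reading, 0.01 is where the ramp begins and 0.08 is the target, so to end at 0.01 the optimizer would need to start from a smaller rate.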
