ildoonet / pytorch-gradual-warmup-lr

Gradually-Warmup Learning Rate Scheduler for PyTorch

License: MIT License

Python 100.00%

pytorch deep-learning learning-rate-decay pytorch-extension multinode multigpu large-scale-learning


pytorch-gradual-warmup-lr's Issues

It seems you get one learning rate per epoch.

If I read your code correctly, the learning rate is computed from the number of epochs, not the number of iterations. According to the paper, the learning rate should be computed from the number of iterations.
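To make the distinction in this issue concrete, here is a minimal sketch in plain Python that compares a warmup value updated once per epoch with one updated at every training step. It reuses the linear warmup expression quoted elsewhere on this page; the helper names and the `iters_per_epoch` parameter are assumptions made for illustration and are not part of the library.

```python
def warmup_factor(multiplier, progress):
    """Linear warmup factor; progress is the fraction of warmup completed."""
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return (multiplier - 1.0) * progress + 1.0


def lr_per_epoch(base_lr, multiplier, epoch, total_epoch):
    """Learning rate when the schedule advances once per epoch."""
    return base_lr * warmup_factor(multiplier, epoch / total_epoch)


def lr_per_iteration(base_lr, multiplier, iteration, iters_per_epoch, total_epoch):
    """Learning rate when the schedule advances at every training step.

    iters_per_epoch is a hypothetical parameter: the number of batches per epoch.
    """
    total_iters = iters_per_epoch * total_epoch
    return base_lr * warmup_factor(multiplier, iteration / total_iters)


# With 4 batches per epoch, the per-iteration schedule changes inside each epoch,
# while the per-epoch schedule stays flat until the epoch counter advances.
for iteration in range(0, 9):
    epoch = iteration // 4
    print(iteration,
          lr_per_epoch(0.1, 10, epoch, 5),
          lr_per_iteration(0.1, 10, iteration, 4, 5))
```

Both variants agree at epoch boundaries; they differ only in how finely the ramp is sampled in between.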

multiplier behaves strangely

When I modify the example code like this:

import torch
from torch.optim.lr_scheduler import StepLR, ExponentialLR
from torch.optim.sgd import SGD

from warmup_scheduler import GradualWarmupScheduler


if __name__ == '__main__':
    model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
    optim = SGD(model, 0.0001)

    # scheduler_warmup is chained with scheduler_steplr
    scheduler_steplr = StepLR(optim, step_size=10, gamma=0.1)
    scheduler_warmup = GradualWarmupScheduler(optim, multiplier=10, total_epoch=5, after_scheduler=scheduler_steplr)

    # this zero gradient update is needed to avoid a warning message, issue #8.
    optim.zero_grad()
    optim.step()

    for epoch in range(1, 20):
        scheduler_warmup.step(epoch)
        print(epoch, optim.param_groups[0]['lr'])

        optim.step()    # backward pass (update network)

I get an unexpected result: the learning rate at the sixth epoch is strange.

1 0.00028
2 0.00045999999999999996
3 0.00064
4 0.00082
5 0.001
6 0.0001
7 0.001
8 0.001
9 0.001
10 0.001
11 0.001
12 0.001
13 0.001
14 0.001
15 0.0001
16 0.0001
17 0.0001
18 0.0001
19 0.0001
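As a cross-check on the output above, the following sketch evaluates the warmup expression quoted from scheduler.py elsewhere on this page, using the values from this report (initial learning rate 0.0001, multiplier=10, total_epoch=5). The function name is an assumption for illustration; only the arithmetic comes from the quoted line.

```python
def gradual_warmup_lr(base_lr, multiplier, epoch, total_epoch):
    """Warmup expression as quoted from scheduler.py (function name is hypothetical)."""
    return base_lr * ((multiplier - 1.0) * epoch / total_epoch + 1.0)


# Values from the report: SGD lr of 0.0001, multiplier=10, total_epoch=5.
for epoch in range(1, 6):
    print(epoch, gradual_warmup_lr(0.0001, 10, epoch, 5))
```

Epochs 1 to 5 in the report match this expression, ending at 0.0001 * 10 = 0.001. The value printed for epoch 6 equals the optimizer's initial learning rate (0.0001) rather than the 0.001 reached at epoch 5, which is the anomaly being reported.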

`warmup_lr` is computed incorrectly in `step_ReduceLROnPlateau`

I wonder whether you forgot to update the line shown below:

pytorch-gradual-warmup-lr/warmup_scheduler/scheduler.py

Line 44 in 6b5e895

 warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs] 

- warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
+ warmup_lr = self.get_lr()

Here are the details:

When I use ReduceLROnPlateau as the after_scheduler of GradualWarmupScheduler, the warm-up fails. I read the learning rate from optim.param_groups[0]['lr']. When I use get_lr to read the learning rate instead, the value is correct.
When I use StepLR as the after_scheduler, there is no exception and no error.

Therefore, I think the optimizer's learning rate is not being warmed up correctly.

When to call scheduler.step?

Should scheduler.step() be called after each batch or after each epoch?

Why do I get this error when the warmup epochs end?

  File "train_mesh.py", line 266, in main
    scheduler_warmup.step()
  File "/home/liuziming/anaconda3/lib/python3.6/site-packages/warmup_scheduler/scheduler.py", line 39, in step
    return super(GradualWarmupScheduler, self).step(epoch)
  File "/home/liuziming/anaconda3/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 52, in step
    for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
  File "/home/liuziming/anaconda3/lib/python3.6/site-packages/warmup_scheduler/scheduler.py", line 30, in get_lr
    return self.after_scheduler.get_lr()
AttributeError: 'ReduceLROnPlateau' object has no attribute 'get_lr'

I get this error when the warmup epochs end. I used the example as shown in the README and ran a simple resnet50 from torchvision.

LR does not work when the PyTorch version is under 1.2.0

Hi, ildoonet!
Thanks for your code. However, I found that it does not work when the PyTorch version is <= 1.2.0.
To save others from wasting time, I think it would be better to specify the required PyTorch version.

Question about run.py

Hello!
Within each epoch, shouldn't we first set the optimizer's gradients to zero? I think we should call "optim.zero_grad()" at the start of each loop iteration. Is that right?

Set Starting learning rate

This is not an issue per se, but maybe a modification/extension.
I feel there should be an argument to set the learning rate at epoch zero, which is then gradually increased to the target learning rate over some number of epochs (5 in the paper).

Let me know what you think.
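A minimal sketch of what this request could look like, assuming a simple linear interpolation between a chosen starting value and the target. The function and its `start_lr` argument are hypothetical; the library as discussed on this page exposes `multiplier` and `total_epoch` rather than an explicit starting learning rate.

```python
def linear_warmup(start_lr, target_lr, epoch, warmup_epochs):
    """Interpolate linearly from start_lr (epoch 0) to target_lr (epoch warmup_epochs).

    start_lr is a hypothetical argument illustrating the request above.
    """
    if epoch >= warmup_epochs:
        return target_lr  # warmup finished: hold the target value
    return start_lr + (target_lr - start_lr) * epoch / warmup_epochs


# Ramp from 0.0 to 0.1 over 5 epochs, then stay at 0.1.
for epoch in range(0, 8):
    print(epoch, linear_warmup(0.0, 0.1, epoch, 5))
```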

What is the meaning of base_lrs?

I can't find where base_lrs is initialized. Is it initialized to 0?

StepLR and CosineAnnealingLR have no function like ".get_last_lr()"

optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9, nesterov=True, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, max_epoch, eta_min=0, last_epoch=-1)
scheduler_warmup = GradualWarmupScheduler(optimizer, multiplier=8, total_epoch=5, after_scheduler=scheduler)
AttributeError: 'CosineAnnealingLR' object has no attribute 'get_last_lr'

Why must multiplier be greater than 1.0?

I want to use this linear warmup to gradually increase the learning rate from 0 to base_lr, which requires multiplier to be 1.0.

However, this code forces us to use a multiplier greater than 1.0.
Do we really need this restriction?

Math is wrong for multiplier=1

Here

pytorch-gradual-warmup-lr/warmup_scheduler/scheduler.py

Line 44 in 6b5e895

 warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs] 

it should have the same special case from a few lines above:

if self.multiplier == 1.0:
    warmup_lr = [base_lr * (float(self.last_epoch) / self.total_epoch) for base_lr in self.base_lrs]
else:
    warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]

Otherwise the calculation will always be:

warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr * (0 * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr * (1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr for base_lr in self.base_lrs]
# <=>
warmup_lr = self.base_lrs
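The derivation above can be checked numerically. The sketch below wraps the two expressions from this issue in standalone functions (the function names are assumptions for illustration) and evaluates them for multiplier=1.

```python
def warmup_without_special_case(base_lr, multiplier, epoch, total_epoch):
    """The single expression quoted from line 44."""
    return base_lr * ((multiplier - 1.0) * epoch / total_epoch + 1.0)


def warmup_with_special_case(base_lr, multiplier, epoch, total_epoch):
    """The same expression with the multiplier == 1.0 branch proposed above."""
    if multiplier == 1.0:
        return base_lr * (float(epoch) / total_epoch)
    return base_lr * ((multiplier - 1.0) * epoch / total_epoch + 1.0)


# With multiplier=1 the first version never leaves base_lr,
# while the second ramps from 0 up to base_lr.
for epoch in range(0, 6):
    print(epoch,
          warmup_without_special_case(0.1, 1.0, epoch, 5),
          warmup_with_special_case(0.1, 1.0, epoch, 5))
```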

optimizer.step() and lr_scheduler.step()

UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

When I use your README code, it has a bug?

AttributeError: 'StepLR' object has no attribute 'get_last_lr'

WARNING: Did not find branch or tag '08f7d5e', assuming revision or ref

Hello, thank you very much for your code. I am currently running into a minor issue. While using the Anaconda Prompt to install pytorch-gradual-warmup-lr, git printed the following: WARNING: Did not find branch or tag '08f7d5e', assuming revision or ref. I am not sure whether this is due to an address change or a module name change, and I hope you can help me clarify this.

Target optimizer not set properly when loading from state dict

When loading the GradualWarmupScheduler from a state dict to resume training, the optimizer attribute of the nested after_scheduler is loaded from the state_dict. This causes a static learning rate after resuming, because the after_scheduler tries to update the learning rate of an optimizer that doesn't match the one used by the resumed training. Setting self.after_scheduler.optimizer = self.optimizer as part of the load_state_dict() method should probably suffice to fix this.

Usage mandatory metric

Is the metric parameter mandatory in the usage example?
https://github.com/ildoonet/pytorch-gradual-warmup-lr#usage

the initial lr value is higher than target lr value

Thank you for sharing this great work!

The initial LR value seems to be larger than I expected.

code

import torch
from warmup_scheduler import GradualWarmupScheduler

v = torch.zeros(10)
optim = torch.optim.SGD([v], lr=1e-2)
scheduler = GradualWarmupScheduler(optim, multiplier=8, total_epoch=10)

for epoch in range(1, 20):
    scheduler.step(epoch)
    print(epoch, optim.param_groups[0]['lr'])

printed results

1 0.017
2 0.024
3 0.031000000000000003
4 0.038
5 0.045
6 0.052000000000000005
7 0.059000000000000004
8 0.066
9 0.073
10 0.08
11 0.08
12 0.08
13 0.08
14 0.08
15 0.08
16 0.08
17 0.08
18 0.08
19 0.08

As you can see, the initial lr value (0.017) is higher than the target lr value (0.01).

Is this result correct?
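The printed values are consistent with the warmup expression quoted from scheduler.py elsewhere on this page: the optimizer's lr acts as the starting point and the schedule ends at lr * multiplier (0.01 * 8 = 0.08). Under that reading, one way to finish warmup at a desired target is to give the optimizer target / multiplier as its lr. The sketch below only reproduces the arithmetic; the function name is an assumption for illustration.

```python
def gradual_warmup_lr(base_lr, multiplier, epoch, total_epoch):
    """Warmup expression as quoted from scheduler.py (function name is hypothetical)."""
    return base_lr * ((multiplier - 1.0) * epoch / total_epoch + 1.0)


multiplier = 8
total_epoch = 10

# Settings from the report: the optimizer lr of 0.01 is the start, not the end.
reported = [gradual_warmup_lr(0.01, multiplier, e, total_epoch)
            for e in range(1, total_epoch + 1)]

# Assumed workaround: divide the desired target by the multiplier.
target_lr = 0.01
base_lr = target_lr / multiplier
adjusted = [gradual_warmup_lr(base_lr, multiplier, e, total_epoch)
            for e in range(1, total_epoch + 1)]

print(reported[0], reported[-1])   # first and last warmup values as reported
print(adjusted[0], adjusted[-1])   # first and last values with the scaled base lr
```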