Comments (14)
Forcing `optimizer_config` to `None` may not be a good idea; for example, it's not easy to adapt to fp16 training when setting `optimizer_config` to `None`.
Why not just comment out all the `.backward()` and `.step()` calls in https://github.com/open-mmlab/mmrazor/blob/master/mmrazor/models/algorithms/autoslim.py#L226? I tried and it works.
Thank you for your issue. It does work to comment out all the `.backward()` and `.step()` calls; we use them here just to save GPU memory. It is hacky, as it's not easy to adapt to fp16 training. But just using `OptimizerHook` may lead to CUDA out-of-memory errors when training on a bigger dataset, such as ImageNet. We are still working on it.
Maybe I didn't make it clear.
code 1:
```python
loss1 = forward_loss()
loss2 = forward_loss()
loss3 = forward_loss()
loss = loss1 + loss2 + loss3  # all three autograd graphs stay alive
loss.backward()
optimizer.step()
```
code 2:
```python
loss1 = forward_loss()
loss1.backward()  # frees the first graph's activations right away
loss2 = forward_loss()
loss2.backward()
loss3 = forward_loss()
loss3.backward()
optimizer.step()
```
The peak memory cost of code 1 is roughly three times that of code 2: code 1 keeps all three autograd graphs alive until the single `backward()` call, while code 2 frees each graph's activations as soon as its `backward()` returns. If we use `OptimizerHook`, the effect is similar to code 1.
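To make the comparison concrete, here is a minimal sketch that measures the peak memory of both patterns; the toy model, its sizes, and `forward_loss` are made up for illustration, and a CUDA device is assumed:
```python
import torch
import torch.nn as nn

# Toy stand-in for the real forward_loss(); sizes are arbitrary.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(512, 4096, device="cuda")

def forward_loss():
    return model(data).mean()

def peak_mib(step_fn):
    optimizer.zero_grad(set_to_none=True)
    torch.cuda.reset_peak_memory_stats()
    step_fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

def code1():
    # Sum first, one backward: all three graphs alive at once.
    loss = forward_loss() + forward_loss() + forward_loss()
    loss.backward()
    optimizer.step()

def code2():
    # Backward per loss: each graph is freed before the next forward.
    for _ in range(3):
        forward_loss().backward()
    optimizer.step()

print(f"code 1 peak: {peak_mib(code1):.0f} MiB")
print(f"code 2 peak: {peak_mib(code2):.0f} MiB")
```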
I ran into this issue just now, and I solved it by setting `optimizer_config = None`, as in the sketch below. The setting avoids the second call of `backward()` because no `OptimizerHook` is registered.
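A minimal sketch of the relevant lines in an mmcv-style config; the `optimizer` values are placeholders rather than the exact settings from my config, and only the `optimizer_config = None` line is the actual fix:
```python
# mmcv-style training config (placeholder optimizer values).
optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=1e-4)
# Register no OptimizerHook: the algorithm calls backward()/step() itself.
optimizer_config = None
```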
> Forcing `optimizer_config` to `None` may not be a good idea; for example, it's not easy to adapt to fp16 training when setting `optimizer_config` to `None`.
> Why not just comment out all the `.backward()` and `.step()` calls in https://github.com/open-mmlab/mmrazor/blob/master/mmrazor/models/algorithms/autoslim.py#L226? I tried and it works.
I am wondering why the current implementation would result in OOM. I reimplemented it with SlimConv2D from https://github.com/JiahuiYu/slimmable_networks/blob/master/models/slimmable_ops.py#L25, and no OOM occurs.
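For context, a simplified, from-memory sketch of the idea behind that layer (a shared weight tensor sliced to the currently active width); the class name and details here may differ from the repo's actual code:
```python
import torch
import torch.nn as nn

class SlimmableConv2d(nn.Conv2d):
    """Conv whose active in/out channel counts can be switched at runtime.

    Sketch of the slimmable_ops idea: allocate the widest weight tensor
    once, then slice out the active channels in forward().
    """

    def __init__(self, in_channels_list, out_channels_list, kernel_size, **kwargs):
        super().__init__(max(in_channels_list), max(out_channels_list),
                         kernel_size, bias=False, **kwargs)
        self.in_channels_list = in_channels_list
        self.out_channels_list = out_channels_list
        self.width_idx = -1  # which width is currently active

    def forward(self, x):
        in_c = self.in_channels_list[self.width_idx]
        out_c = self.out_channels_list[self.width_idx]
        weight = self.weight[:out_c, :in_c, :, :]  # slice the shared weights
        return nn.functional.conv2d(x, weight, None, self.stride,
                                    self.padding, self.dilation, self.groups)

conv = SlimmableConv2d([8, 16], [16, 32], kernel_size=3, padding=1)
conv.width_idx = 0  # narrowest width
y = conv(torch.randn(1, 8, 32, 32))
print(y.shape)  # torch.Size([1, 16, 32, 32])
```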
> I am wondering why the current implementation would result in OOM. I reimplemented it with SlimConv2D from https://github.com/JiahuiYu/slimmable_networks/blob/master/models/slimmable_ops.py#L25, and no OOM occurs.
It is better to make a new issue to discuss this question. @twmht
> I am wondering why the current implementation would result in OOM. I reimplemented it with SlimConv2D from https://github.com/JiahuiYu/slimmable_networks/blob/master/models/slimmable_ops.py#L25, and no OOM occurs.
Look here. In the official code, the gradient of the loss is computed immediately for each `width_mult`; that is, `loss.backward()` is executed immediately after `forward_loss`, so no OOM occurs.
But in our implementation, if we use `OptimizerHook`, the pseudocode will be roughly as follows:
```python
loss = 0.
for i, subnet in enumerate(self.channel_cfg):
    ...
    model_loss, _ = self._parse_losses(model_losses)
    loss += model_loss
loss.backward()
```
It will use much more memory, because every subnet's autograd graph stays alive until the final `backward()`.
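For comparison, a sketch of the memory-friendlier variant in the same pseudocode style (`self.channel_cfg` and `self._parse_losses` are taken from the snippet above; `optimizer` is assumed to be available):
```python
for i, subnet in enumerate(self.channel_cfg):
    ...
    model_loss, _ = self._parse_losses(model_losses)
    model_loss.backward()  # frees this subnet's graph immediately
optimizer.step()
optimizer.zero_grad()
```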
But you can also do it the same way with mmrazor; where does the OOM come from if you do it the same way?
Interesting. To the best of my knowledge, I thought both used the same amount of memory. Thank you for pointing that out.
> Interesting. To the best of my knowledge, I thought both used the same amount of memory. Thank you for pointing that out.
I'm sorry to bother you again. To some degree, code 2 above has the same effect as gradient accumulation, so I think an implementation like code 2 may save a lot of memory. If it turns out that I'm wrong, please let me know. Thanks.
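For what it's worth, a small self-contained check (toy tensors, nothing from mmrazor) that per-loss `backward()` accumulates the same gradients as a single `backward()` on the summed loss:
```python
import torch

def make_losses(w):
    # Three toy losses sharing one parameter, standing in for the
    # three forward_loss() calls above.
    return [(w * c).pow(2).sum() for c in (1.0, 2.0, 3.0)]

# code 1 style: one backward on the summed loss.
w1 = torch.ones(3, requires_grad=True)
sum(make_losses(w1)).backward()

# code 2 style: backward per loss; gradients accumulate in w2.grad.
w2 = torch.ones(3, requires_grad=True)
for loss in make_losses(w2):
    loss.backward()

assert torch.allclose(w1.grad, w2.grad)  # identical gradients
```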
Yup. I think you are right, since we need to keep the autograd graph for the backward pass when training.
Well, I found that it also works to comment out all the `backward` and `step` calls when training object detection; no OOM happens.
It's weird, since the memory consumption is almost the same as in normal object detection training.
My PyTorch version is 1.9.
What were your batch size and number of GPUs when training on ImageNet?
`samples_per_gpu=128` and 8 GPUs.