Even after reducing the batch size, the fusion model doesn't fit into a 16 GB GPU.

RuntimeError: CUDA out of memory about highres-net HOT 7 CLOSED

servicenow commented on August 20, 2024

RuntimeError: CUDA out of memory

from highres-net.

Comments (7)

ilkarman commented on August 20, 2024 5

Freddie, also note that create_patches = false in the config naturally increases memory usage.

With create_patches = true, I can run batch_size = 32 with min_L = n_views = 16, which matches your memory usage predictions table. So the config below works on our tiny 16 GB GPUs:

  "training": {
    "num_epochs": 400,
    "batch_size": 32,

    "min_L": 16,
    "n_views": 16,
    "n_workers": 4,
    "crop": 3,

    "lr": 0.0007,
    "lr_step": 2,
    "lr_decay": 0.97,

    "load_lr_maps": false,
    "beta": 50.0,

    "create_patches": true,
    "patch_size": 64,
    "val_proportion": 0.10,
    "lambda": 0.000001
  }

}


alkalait commented on August 20, 2024 3

That's right.

min_L acts as a lower bound on the number of views always present. If the input has fewer views than min_L, it is padded to min_L with dummy views (0-padding).
n_views: the number of LR views randomly* selected from the LR pool of a site.

For instance, what you want to avoid is n_views = 16 with min_L = 32, as that would always 0-pad your views to 32.

* Views with larger clearance are more likely to be selected.

If I were you, I'd try min_L = n_views = 16 first, then min_L = n_views = 8.
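
To make the interaction between min_L and n_views concrete, here is a minimal sketch of the view-count bookkeeping described above. The function name `padded_view_count` and the pool size of 20 views are hypothetical and not taken from the HighRes-net code; the sketch only counts views and does not model the clearance-weighted selection.

```python
def padded_view_count(n_available, n_views, min_L):
    """Return (real_views, dummy_views) for one site.

    Hypothetical helper, not part of HighRes-net: it mirrors the rule
    described above, where at most n_views real views are selected and
    the stack is zero-padded up to min_L views.
    """
    real = min(n_available, n_views)  # views actually taken from the LR pool
    dummy = max(min_L - real, 0)      # zero-filled views added as padding
    return real, dummy


# Assumed pool of 20 LR views for a site (illustrative value only).
for n_views, min_L in [(16, 32), (16, 16), (8, 8), (4, 32)]:
    real, dummy = padded_view_count(n_available=20, n_views=n_views, min_L=min_L)
    total = real + dummy
    print(f"n_views={n_views:2d} min_L={min_L:2d} -> {real} real + {dummy} dummy = {total} views")
```

Under this rule, a setting such as n_views = 4 with min_L = 32 would produce stacks of 32 views of which 28 are padding, so the memory cost is driven by min_L rather than by n_views.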


alkalait commented on August 20, 2024

Have you tried reducing the number of views?

Have a look at the table in the README.

Increasing the batch size will buy you more performance, whereas 32 views gives diminishing returns compared to 16.


jqueguiner commented on August 20, 2024

Yes, I already tried that:

  "training": {
    "num_epochs": 400,
    "batch_size": 16,

    "min_L": 32,
    "n_views": 4,
    "n_workers": 4,
    "crop": 3,

    "lr": 0.0007,
    "lr_step": 2,
    "lr_decay": 0.97,

    "load_lr_maps": false,
    "beta": 50.0,

    "create_patches": false,
    "patch_size": 64,
    "val_proportion": 0.10,
    "lambda": 0.000001
  }

gives

root@8b530a18a524:/data# python src/train.py --config config/config.json
  0%|          | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):          | 0/66 [00:00<?, ?it/s]
  File "src/train.py", line 307, in <module>
    main(config)
  File "src/train.py", line 293, in main
    trainAndGetBestModel(fusion_model, regis_model, optimizer, dataloaders, baseline_cpsnrs, config)
  File "src/train.py", line 174, in trainAndGetBestModel
    srs = fusion_model(lrs, alphas)  # fuse multi frames (B, 1, 3*W, 3*H)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/src/DeepNetworks/HRNet.py", line 207, in forward
    layer1 = self.encode(stacked_input) # encode input tensor
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/src/DeepNetworks/HRNet.py", line 74, in forward
    x = self.res_layers(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/src/DeepNetworks/HRNet.py", line 34, in forward
    residual = self.block(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 961, in forward
    return F.prelu(input, self.weight)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1121, in prelu
    return torch.prelu(input, weight)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 15.75 GiB total capacity; 14.27 GiB already allocated; 433.56 MiB free; 1.16 MiB cached)

even though GPU memory was flushed just before the experiment.
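
As a rough back-of-the-envelope check on the allocation size in the error above, the sketch below computes the size of a single float32 activation tensor. The 64 feature channels and the 128x128 input size are assumptions that are not stated in this thread, and `activation_bytes` is a hypothetical helper, so treat the numbers as illustrative only.

```python
def activation_bytes(batch_size, n_stacked_views, channels, height, width,
                     bytes_per_value=4):
    """Bytes needed for one float32 activation tensor of shape
    (batch_size * n_stacked_views, channels, height, width)."""
    return batch_size * n_stacked_views * channels * height * width * bytes_per_value


GIB = 1024 ** 3

# batch_size = 16 and min_L = 32 come from the config above (stacks are
# padded to min_L views). Assumed, not stated in the thread: 64 feature
# channels and 128x128 inputs when create_patches is false.
full = activation_bytes(batch_size=16, n_stacked_views=32,
                        channels=64, height=128, width=128)

# Same settings, but with 64x64 patches (patch_size = 64 in the config).
patched = activation_bytes(batch_size=16, n_stacked_views=32,
                           channels=64, height=64, width=64)

print(f"full images:   {full / GIB:.2f} GiB per activation tensor")
print(f"64x64 patches: {patched / GIB:.2f} GiB per activation tensor")
```

Under these assumptions the full-image case comes to 2.00 GiB for one tensor, which is the same size as the failed allocation, and the patched case to 0.50 GiB. The agreement is suggestive rather than proof that the assumed shapes are the real ones, but it shows why both min_L and create_patches matter far more than n_views here.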


ilkarman commented on August 20, 2024

Could you also decrease min_L (the minimum number of views) to match n_views? On a 16 GB P100 I can just about squeeze into memory with batch_size = 8, min_L = 16, n_views = 16 (but not if min_L = 32).
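
A small sanity check along these lines could flag the two settings discussed in this thread before a training run starts. This is only a sketch: `check_view_settings` is a hypothetical helper, not something provided by HighRes-net, and the inline JSON is a trimmed copy of the config posted above.

```python
import json


def check_view_settings(training):
    """Return a list of warning strings for a 'training' config dict.

    Hypothetical helper: it only encodes the two observations from this
    thread and does not inspect the GPU or the model.
    """
    warnings = []
    if training["min_L"] > training["n_views"]:
        warnings.append(
            f"min_L={training['min_L']} exceeds n_views={training['n_views']}: "
            "every stack will be zero-padded up to min_L views"
        )
    if not training.get("create_patches", False):
        warnings.append(
            "create_patches is false: full-size images are used, "
            "which increases memory usage"
        )
    return warnings


# Trimmed copy of the failing config from this thread.
config = json.loads(
    '{"training": {"batch_size": 16, "min_L": 32, "n_views": 4, '
    '"create_patches": false, "patch_size": 64}}'
)

for message in check_view_settings(config["training"]):
    print("warning:", message)
```

For the failing config this reports both issues; with min_L = n_views = 16 and create_patches = true it returns an empty list.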


alkalait commented on August 20, 2024

Indeed. That flag flew under our radar!


jqueguiner commented on August 20, 2024

GG @ilkarman
