
ilkarman commented on August 20, 2024

Freddie, also note that create_patches = false in the config naturally increases memory usage.

With create_patches = true, I can run batch size 32 with min_L = n_views = 16, which matches your memory-usage predictions table. So the config below works on our tiny 16 GB GPUs:

  {
    "training": {
      "num_epochs": 400,
      "batch_size": 32,

      "min_L": 16,
      "n_views": 16,
      "n_workers": 4,
      "crop": 3,

      "lr": 0.0007,
      "lr_step": 2,
      "lr_decay": 0.97,

      "load_lr_maps": false,
      "beta": 50.0,

      "create_patches": true,
      "patch_size": 64,
      "val_proportion": 0.10,
      "lambda": 0.000001
    }
  }
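To make the create_patches point concrete, here is a minimal sketch of what patch-based training might look like: one random window is cut at the same location from every LR view, together with the matching window from the HR target. The function name `random_aligned_patch`, the assumption that `patch_size` is measured in LR pixels, and the ×3 scale factor (taken from the `3*W, 3*H` comment in the traceback further down) are illustrative assumptions, not HighRes-net's actual implementation.

```python
import numpy as np


def random_aligned_patch(lrs, hr, patch_size=64, scale=3, rng=None):
    """Hypothetical sketch of patch cropping for multi-frame super-resolution.

    lrs: array of shape (L, H, W) holding L low-resolution views.
    hr:  array of shape (scale*H, scale*W) holding the high-resolution target.
    Returns an (L, patch_size, patch_size) LR patch and the aligned HR patch.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = lrs.shape
    # Pick one top-left corner and reuse it for every view so patches stay aligned.
    top = int(rng.integers(0, h - patch_size + 1))
    left = int(rng.integers(0, w - patch_size + 1))
    lr_patch = lrs[:, top:top + patch_size, left:left + patch_size]
    # The HR window covers the same ground area, scaled by the upsampling factor.
    hr_patch = hr[scale * top:scale * (top + patch_size),
                  scale * left:scale * (left + patch_size)]
    return lr_patch, hr_patch
```

Cropping 128×128 LR frames down to 64×64 patches keeps a quarter of the pixels per view, so the encoder's per-view activations should shrink roughly in proportion, which is why turning patches off costs so much memory.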

from highres-net.

alkalait commented on August 20, 2024

That's right.

  • min_L acts as a lower bound on the number of views that are always present. If an input has fewer than min_L views, it is padded up to min_L with dummy views (zero-padding).

  • n_views: the number of low-resolution (LR) views randomly* selected from a site's LR pool.

For instance, you want to avoid n_views = 16 with min_L = 32, as that would always zero-pad your views to 32.

* Views with higher clearance are more likely to be selected.
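The two bullets and the footnote can be sketched roughly as follows. This is not HighRes-net's data loader: the function name, sampling without replacement with probability proportional to clearance, and the 0/1 `alphas` mask (modelled on the `alphas` argument that `fusion_model` receives in the traceback further down) are assumptions made for illustration. Clearances are assumed to be positive.

```python
import numpy as np


def select_and_pad_views(lr_pool, clearances, n_views, min_l, rng=None):
    """Hypothetical sketch: pick up to n_views LR views, favouring higher
    clearance, then zero-pad the stack so it holds at least min_l views.

    lr_pool:    array of shape (N, H, W), all LR views available for a site.
    clearances: N positive scores; higher means more likely to be picked.
    Returns (views, alphas): views has shape (max(k, min_l), H, W) where
    k = min(n_views, N); alphas is 1.0 for real views, 0.0 for padding.
    """
    rng = np.random.default_rng() if rng is None else rng
    clearances = np.asarray(clearances, dtype=float)
    n_available = lr_pool.shape[0]
    k = min(n_views, n_available)

    # Assumption: selection probability proportional to clearance.
    probs = clearances / clearances.sum()
    idx = rng.choice(n_available, size=k, replace=False, p=probs)
    views = lr_pool[idx]
    alphas = np.ones(k, dtype=np.float32)

    # Pad with all-zero dummy views until the stack reaches min_l.
    n_pad = max(min_l - k, 0)
    if n_pad > 0:
        pad = np.zeros((n_pad,) + views.shape[1:], dtype=views.dtype)
        views = np.concatenate([views, pad], axis=0)
        alphas = np.concatenate([alphas, np.zeros(n_pad, dtype=np.float32)])
    return views, alphas
```

With n_views = 16 and min_L = 32, half of every stack returned here is zeros, yet those dummy views still occupy GPU memory once batched; setting min_L = n_views removes that padding whenever the pool has enough views.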

If I were you, I'd try min_L = n_views = 16 first, then min_L = n_views = 8.


alkalait commented on August 20, 2024

Have you tried reducing the number of views?

Have a look at the table in the README.

A larger batch size will buy you more performance, whereas 32 views gives diminishing returns compared to 16.
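As a rough way to read that trade-off, here is a back-of-envelope proxy (not the README's method) that counts LR pixels entering the encoder per batch. It assumes activation memory grows linearly with batch size, stack depth (padded views included) and frame area. The 128×128 full-frame size is an assumption (the PROBA-V LR frame size); the other numbers come from the two configs posted in this thread.

```python
def relative_activation_cost(batch_size, min_l, n_views, height, width):
    """Back-of-envelope proxy: LR pixels pushed through the encoder per batch.

    Assumes cost is linear in batch size, stack depth and spatial area, and
    that the site pool holds at least n_views views. Padded views count too,
    so the stack depth is the larger of min_l and n_views.
    """
    stack_depth = max(min_l, n_views)
    return batch_size * stack_depth * height * width


# ilkarman's config: batch 32, min_L = n_views = 16, 64x64 patches.
working = relative_activation_cost(batch_size=32, min_l=16, n_views=16, height=64, width=64)
# jqueguiner's config: batch 16, min_L = 32, n_views = 4, full frames (assumed 128x128).
failing = relative_activation_cost(batch_size=16, min_l=32, n_views=4, height=128, width=128)
print(f"failing / working = {failing / working:.1f}x")  # prints "failing / working = 4.0x"
```

Under these assumptions the failing config pushes about four times as many pixels through the encoder as the one that fits on 16 GB, which is consistent with the OOM; real usage also depends on weights, optimizer state and allocator behaviour.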


jqueguiner commented on August 20, 2024

Yes, I already tried that:

  {
    "training": {
      "num_epochs": 400,
      "batch_size": 16,

      "min_L": 32,
      "n_views": 4,
      "n_workers": 4,
      "crop": 3,

      "lr": 0.0007,
      "lr_step": 2,
      "lr_decay": 0.97,

      "load_lr_maps": false,
      "beta": 50.0,

      "create_patches": false,
      "patch_size": 64,
      "val_proportion": 0.10,
      "lambda": 0.000001
    }
  }

This gives:

root@8b530a18a524:/data# python src/train.py --config config/config.json
  0%|          | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):          | 0/66 [00:00<?, ?it/s]
  File "src/train.py", line 307, in <module>
    main(config)
  File "src/train.py", line 293, in main
    trainAndGetBestModel(fusion_model, regis_model, optimizer, dataloaders, baseline_cpsnrs, config)
  File "src/train.py", line 174, in trainAndGetBestModel
    srs = fusion_model(lrs, alphas)  # fuse multi frames (B, 1, 3*W, 3*H)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/src/DeepNetworks/HRNet.py", line 207, in forward
    layer1 = self.encode(stacked_input) # encode input tensor
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/src/DeepNetworks/HRNet.py", line 74, in forward
    x = self.res_layers(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/src/DeepNetworks/HRNet.py", line 34, in forward
    residual = self.block(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 961, in forward
    return F.prelu(input, self.weight)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1121, in prelu
    return torch.prelu(input, weight)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 15.75 GiB total capacity; 14.27 GiB already allocated; 433.56 MiB free; 1.16 MiB cached)

even though GPU memory was flushed just before the experiment.


ilkarman commented on August 20, 2024

Could you also decrease min_L (the minimum number of views) to match n_views? On a 16 GB P100 I can just about squeeze into memory with batch=8, min_L=16, n_views=16 (but not with min_L=32).
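A small pre-flight check along these lines could catch the min_L/n_views mismatch before a run starts. Only the key names (`training`, `min_L`, `n_views`, `create_patches`) come from the configs in this thread; the helper names and warning wording are hypothetical, and `train.py` does not ship such a check as far as this thread shows.

```python
import json


def check_memory_settings(config):
    """Hypothetical helper: return warnings about memory-hungry training settings."""
    training = config["training"]
    min_l, n_views = training["min_L"], training["n_views"]
    warnings = []
    if min_l > n_views:
        # Every stack is padded up to min_L, so at least this many views are dummies.
        warnings.append(
            f"min_L={min_l} > n_views={n_views}: every stack gets at least "
            f"{min_l - n_views} zero-padded views; consider min_L = n_views."
        )
    if not training.get("create_patches", False):
        warnings.append("create_patches is false: full frames are used, which needs more memory.")
    return warnings


def check_config_file(path="config/config.json"):
    """Load a JSON config from disk and run the checks on it."""
    with open(path) as f:
        return check_memory_settings(json.load(f))
```

Running `check_config_file()` against the failing config above would flag both issues (28 padded views per stack, and no patches), while the config that fits on 16 GB passes cleanly.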


alkalait commented on August 20, 2024

Indeed. That flag flew under our radar!


jqueguiner commented on August 20, 2024

GG @ilkarman
