
victorca25 / traiNNer

Stars: 279 · Watchers: 13 · Forks: 39 · Size: 43.16 MB

traiNNer: Deep learning framework for image and video super-resolution, restoration and image-to-image translation, for training and testing.

License: Apache License 2.0

Python 95.08% MATLAB 0.73% Jupyter Notebook 4.19%
convolutional-neural-networks esrgan super-resolution upscale pix2pix cyclegan denoising deblurring cartoonization srflow

traiNNer's People

Contributors

blueamulet, cugtyt, deepsourcebot, henrymai, hityzy1122, jbohnslav, joeyballentine, kim2091, nikronic, oceanlib, rlaphoenix, sazoji, songyi1999, victorca25, xinntao, yuvalbahat, zestloveheart


traiNNer's Issues

"EOFError: Ran out of input" "AttributeError: Can't pickle local object 'get_totensor.<locals>.<lambda>'"

21-08-20 15:26:58.678 - INFO: Dataset [SingleDataset - seta] is created.
21-08-20 15:26:58.678 - INFO: Number of test_1 images in [seta]: 100
21-08-20 15:26:58.678 - INFO: Dataset [SingleDataset - setb] is created.
21-08-20 15:26:58.678 - INFO: Number of test_2 images in [setb]: 100
21-08-20 15:26:58.709 - INFO: AMP library available
21-08-20 15:27:03.014 - INFO: Loading pretrained model for G [C:\Users\User\Desktop\traiNNer-master\traiNNer-master\codes\experiments\pretrained_models\4x_RRDB_ESRGAN.pth]
21-08-20 15:27:03.400 - INFO: Network G structure: DataParallel - RRDBNet, with parameters: 16,697,987
21-08-20 15:27:03.400 - INFO: Model [SRModel] created.
21-08-20 15:27:03.400 - INFO:
Testing [seta]...
Traceback (most recent call last):
  File "test.py", line 253, in <module>
    main()
  File "test.py", line 249, in main
    test_loop(model, opt, dataloaders, data_params)
  File "test.py", line 120, in test_loop
    for data in dataloader:
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_totensor.<locals>.<lambda>'

C:\Users\User\Desktop\traiNNer-master\traiNNer-master\codes>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input


The above error message comes up when I run "python test.py -opt options/sr/test_sr.yml".
I modified the yml to specify the image and model paths, but the error still appeared.
How can I run test.py? Why does this error appear? I don't know how to run traiNNer...
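
For what it's worth, this looks like the usual Windows multiprocessing pitfall: worker processes are started with spawn, so the dataset (including its transforms) must be picklable, and a lambda defined inside get_totensor is not. A minimal sketch of the difference (names are illustrative, not the repo's exact code); setting n_workers: 0 in the dataset options should also sidestep it by avoiding worker processes entirely:

import pickle

def get_totensor_local():
    # A lambda (or nested function) is a local object: not picklable,
    # so spawn-based DataLoader workers cannot receive it.
    return lambda x: x

def _to_tensor(x):
    return x

def get_totensor_toplevel():
    # A module-level function pickles by qualified name, so this works.
    return _to_tensor

pickle.dumps(get_totensor_toplevel())  # ok
try:
    pickle.dumps(get_totensor_local())
except AttributeError as e:
    print(e)  # Can't pickle local object 'get_totensor_local.<locals>.<lambda>'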

AttributeError: module 'collections' has no attribute 'Iterable'

Traceback (most recent call last):
  File "D:\PycharmProjects\traiNNer\codes\train.py", line 500, in <module>
    main()
  File "D:\PycharmProjects\traiNNer\codes\train.py", line 496, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "D:\PycharmProjects\traiNNer\codes\train.py", line 224, in fit
    for n, train_data in enumerate(dataloaders['train'], start=1):
  File "D:\venvs\AI\Lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
  File "D:\venvs\AI\Lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "D:\venvs\AI\Lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
    data.reraise()
  File "D:\venvs\AI\Lib\site-packages\torch\_utils.py", line 722, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\venvs\AI\Lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\venvs\AI\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\venvs\AI\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\PycharmProjects\traiNNer\codes\data\aligned_dataset.py", line 126, in __getitem__
    A_transform = get_transform(
  File "D:\PycharmProjects\traiNNer\codes\dataops\augmentations.py", line 573, in get_transform
    transform_list.append(transforms.Resize(osize, method))
  File "D:\PycharmProjects\traiNNer\codes\dataops\augmennt\augmennt\transforms.py", line 175, in __init__
    elif isinstance(size, collections.Iterable) and len(size) == 2:
AttributeError: module 'collections' has no attribute 'Iterable'
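
Not the maintainer, but this looks like the Python 3.10 removal of the deprecated collections aliases; the ABCs have lived in collections.abc since Python 3.3. A minimal sketch of the spelling the check in transforms.py presumably needs:

import collections.abc

size = (128, 128)
# `collections.Iterable` raises AttributeError on Python 3.10+;
# `collections.abc.Iterable` works on all supported versions.
if isinstance(size, collections.abc.Iterable) and len(size) == 2:
    print("two-element size accepted")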

Video learning rate too high

By default the video learning rate is 0.001, which is far too high when replacing the SR component with RRDB (for sofvsr). In general this is still pretty high. I recommend setting the LR to 0.0001 and increasing the ofr weights instead.

I will make a PR for this.

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB

On Colab restarting training again results in following error:

File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in call_impl
result = self.forward(*input, **kwargs)
File "/content/BasicSR/codes/models/modules/architectures/block.py", line 428, in forward
sampled_noise = self.noise.repeat(*x.size()).normal
() * scale
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 14.32 GiB already allocated; 20.75 MiB free; 14.46 GiB reserved in total by PyTorch)

The only way to restart training was to reduce the batch size all the way from 64 down to 3.

I've tried running the following commands, to no avail:
gc.collect()
torch.cuda.empty_cache()

There seems to be a resolution here:
https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27
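
The gist of that thread, as I read it: empty_cache() can only return cached blocks that no live tensor references, so the references have to be dropped first. A rough sketch (the names 'model' and 'optimizer' are placeholders for whatever still holds GPU tensors in the notebook):

import gc
import torch

# Drop every reference that still points at GPU tensors from the crashed run
# ('model' and 'optimizer' are placeholder names, not traiNNer's variables).
for name in ('model', 'optimizer'):
    if name in globals():
        del globals()[name]
gc.collect()               # collect the now-unreferenced Python objects
torch.cuda.empty_cache()   # then return the freed blocks to the CUDA driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())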

train_ppon.py import errors

I tried train_ppon.py and got an import error on

from models.modules.LPIPS import compute_dists as lpips

I tried commenting out the import and the usages of lpips, but there seemed to be other import errors.

I've not tried test_ppon.py, but maybe that has errors too.

"CUDA out of memory. Tried to allocate 1.48 GiB" when trying to validate

It uses about 3 GB when training, then about 5.2 GB when it starts validation, and then it crashes.
My training data is 512x512 jpg files; here's one frame: https://imgur.com/a/MulGiTE

GPU: 2070 super
CPU: 5600x
cuda 11 installed
torch==1.9.1+cu111 torchvision==0.10.1+cu111

Complete PowerShell output:

export CUDA_VISIBLE_DEVICES=0
Path already exists. Rename it to [D:\Code\GitHub\BasicSR\experiments\debug_001_template_archived_210928-091440]
21-09-28 09:14:40.677 - INFO:   name: debug_001_template
  use_tb_logger: True
  model: srragan
  scale: 4
  gpu_ids: [0]
  use_amp: False
  use_swa: False
  datasets:[
    train:[
      name: DIV2K
      mode: LRHRC
      dataroot_HR: ..\..\train\hr
      dataroot_LR: ..\..\train\lr
      subset_file: None
      use_shuffle: True
      znorm: False
      n_workers: 6
      batch_size: 8
      virtual_batch_size: 8
      HR_size: 128
      image_channels: 3
      dataroot_kernels: ../training/kernels/results/
      lr_downscale: True
      lr_downscale_types: [1, 2, 777]
      use_flip: True
      use_rot: True
      hr_rrot: False
      lr_blur: False
      lr_blur_types: ['gaussian', 'clean', 'clean', 'clean']
      noise_data: ../noise_patches/normal/
      lr_noise: False
      lr_noise_types: ['gaussian', 'JPEG', 'clean', 'clean', 'clean', 'clean']
      lr_noise2: False
      lr_noise_types2: ['dither', 'dither', 'clean', 'clean']
      hr_noise: False
      hr_noise_types: ['gaussian', 'clean', 'clean', 'clean', 'clean']
      phase: train
      scale: 4
      data_type: img
    ]
    val:[
      name: val_set14_part
      mode: LRHROTF
      dataroot_HR: ..\..\val\hr
      dataroot_LR: ..\..\val\lr
      znorm: False
      lr_downscale: False
      lr_downscale_types: [1, 2]
      phase: val
      scale: 4
      data_type: img
    ]
  ]
  path:[
    strict: False
    root: D:\Code\GitHub\BasicSR
    pretrain_model_G: ..\experiments\pretrained_models\1xPSNR.pth
    experiments_root: D:\Code\GitHub\BasicSR\experiments\debug_001_template
    models: D:\Code\GitHub\BasicSR\experiments\debug_001_template\models
    training_state: D:\Code\GitHub\BasicSR\experiments\debug_001_template\training_state
    log: D:\Code\GitHub\BasicSR\experiments\debug_001_template
    val_images: D:\Code\GitHub\BasicSR\experiments\debug_001_template\val_images
  ]
  network_G:[
    strict: False
    which_model_G: RRDB_net
    norm_type: None
    mode: CNA
    nf: 64
    nb: 23
    nr: 3
    in_nc: 3
    out_nc: 3
    gc: 32
    group: 1
    convtype: Conv2D
    net_act: leakyrelu
    gaussian: True
    plus: False
    scale: 4
  ]
  network_D:[
    strict: True
    which_model_D: discriminator_vgg
    norm_type: batch
    act_type: leakyrelu
    mode: CNA
    nf: 64
    in_nc: 3
    nlayer: 3
    num_D: 3
  ]
  train:[
    lr_G: 0.0001
    weight_decay_G: 0
    beta1_G: 0.9
    lr_D: 0.0001
    weight_decay_D: 0
    beta1_D: 0.9
    lr_scheme: MultiStepLR
    lr_gamma: 0.5
    swa_start_iter: 375000
    swa_lr: 0.0001
    swa_anneal_epochs: 10
    swa_anneal_strategy: cos
    pixel_criterion: l1
    pixel_weight: 0.01
    feature_criterion: l1
    feature_weight: 1
    gan_type: vanilla
    gan_weight: 0.005
    manual_seed: 0
    niter: 500000.0
    val_freq: 8
    metrics: psnr,ssim,lpips
    overwrite_val_imgs: None
    val_comparison: None
    lr_decay_iter: 10
    lr_steps: [50000, 100000, 200000, 300000]
  ]
  logger:[
    print_freq: 2
    save_checkpoint_freq: 8
    overwrite_chkp: False
  ]
  is_train: True

21-09-28 09:14:40.678 - INFO: Random seed: 0
21-09-28 09:14:41.321 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-09-28 09:14:41.322 - INFO: Number of train images: 63,792, iters: 7,974
21-09-28 09:14:41.323 - INFO: Total epochs needed: 63 for iters 500,000
21-09-28 09:14:41.324 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-09-28 09:14:41.324 - INFO: Number of val images in [val_set14_part]: 5
21-09-28 09:14:41.558 - INFO: AMP library available
21-09-28 09:14:42.583 - INFO: Initialization method [kaiming]
21-09-28 09:14:42.799 - INFO: Initialization method [kaiming]
21-09-28 09:14:42.891 - INFO: Loading pretrained model for G [..\experiments\pretrained_models\1xPSNR.pth] ...
21-09-28 09:14:43.753 - INFO: Network G structure: DataParallel - RRDBNet, with parameters: 16,697,987
21-09-28 09:14:43.754 - INFO: Network D structure: DataParallel - Discriminator_VGG, with parameters: 14,502,281
21-09-28 09:14:43.756 - INFO: Model [SRRaGANModel] is created.
21-09-28 09:14:43.757 - INFO: Start training from epoch: 0, iter: 0
21-09-28 09:14:52.560 - INFO: <epoch:  0, iter:       2, lr:1.000e-04, t:-1.0000s, td:3.0840s, eta:0.0000h> pix-l1: 1.6838e-03 fea-vgg19-l1: 1.5493e+00 l_g_gan: 6.9997e-03 l_d_real: 3.2938e-01 l_d_fake: 3.4658e-01 D_real: 5.9246e-01 D_fake: -4.6950e-01
21-09-28 09:14:53.462 - INFO: <epoch:  0, iter:       4, lr:1.000e-04, t:-1.0000s, td:0.0000s, eta:0.0000h> pix-l1: 2.4982e-03 fea-vgg19-l1: 1.7615e+00 l_g_gan: 1.9201e-02 l_d_real: 5.7227e-02 l_d_fake: 6.6596e-02 D_real: 1.0418e+00 D_fake: -2.7365e+00
21-09-28 09:14:54.274 - INFO: <epoch:  0, iter:       6, lr:1.000e-04, t:0.9020s, td:0.0000s, eta:125.2761h> pix-l1: 2.0472e-03 fea-vgg19-l1: 1.7822e+00 l_g_gan: 3.1084e-02 l_d_real: 6.8650e-03 l_d_fake: 3.2773e-03 D_real: 1.3466e+00 D_fake: -4.8651e+00
21-09-28 09:14:55.233 - INFO: <epoch:  0, iter:       8, lr:1.000e-04, t:0.8125s, td:0.0000s, eta:112.8456h> pix-l1: 2.5441e-03 fea-vgg19-l1: 1.4662e+00 l_g_gan: 2.8835e-02 l_d_real: 1.3615e-02 l_d_fake: 5.1962e-03 D_real: 1.5751e+00 D_fake: -4.1826e+00
21-09-28 09:14:55.669 - INFO: Models and training states saved.
Setting up Perceptual loss...
Loading model from: J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\models\modules\LPIPS\lpips_weights\v0.1\squeeze.pth
...[net-lin [squeeze]] initialized
...Done
Traceback (most recent call last):
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\train.py", line 416, in <module>
    main()
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\train.py", line 412, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\train.py", line 289, in fit
    model.test()  # run inference
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\models\SRRaGAN_model.py", line 387, in test
    self.forward(CEM_net=CEM_net)
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\models\SRRaGAN_model.py", line 254, in forward
    self.fake_H = self.netG(self.var_L)  # G(LR)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\parallel\data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\models\modules\architectures\RRDBNet_arch.py", line 49, in forward
    x = self.model(x)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\models\modules\architectures\block.py", line 195, in forward
    output = x + self.sub(x)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\models\modules\architectures\RRDBNet_arch.py", line 93, in forward
    out = self.RDB3(out)
  File "C:\Program Files\Python39\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "J:\Videos\ESRGAN\DATASET\traiNNer-2.0\codes\models\modules\architectures\RRDBNet_arch.py", line 159, in forward
    x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
RuntimeError: CUDA out of memory. Tried to allocate 1.48 GiB (GPU 0; 8.00 GiB total capacity; 2.59 GiB already allocated; 332.74 MiB free; 5.41 GiB reserved in total by PyTorch)
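
A guess at the memory jump: training runs on 128px crops (HR_size: 128 above), while validation feeds whole 512x512 images through the generator, which costs far more activation memory. If test() is not already running under torch.no_grad(), a sketch along these lines (reusing the method and attribute names from the traceback) should cut validation memory substantially:

import torch

def test(self, CEM_net=None):
    self.netG.eval()
    with torch.no_grad():  # skip building the autograd graph for inference
        self.fake_H = self.netG(self.var_L)  # G(LR), as in SRRaGAN_model.py
    self.netG.train()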

lam < 0 or lam contains NaNs

I got the error ('lam < 0 or lam contains NaNs') on some images at https://github.com/victorca25/BasicSR/blob/14aced7d1049a283761c145f3cf300a94c6ac4b9/codes/dataops/augmentations.py#L786
I modified the code to fall back to JPEG compression noise on the images that raise the error:

try:
    noise_img = np.random.poisson(img_LR * vals) / float(vals)
except ValueError:
    print("Poisson noise failed, falling back to JPEG compression noise")
    compression = np.random.uniform(10, 50)  # randomize JPEG quality between 10 and 50
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), int(compression)]  # quality must be an int
    # encode to JPEG in memory, then decode to bake in the compression artifacts
    is_success, encimg = cv2.imencode('.jpg', img_LR, encode_param)
    noise_img = cv2.imdecode(encimg, 1)
    noise_img = noise_img.astype(np.uint8)

PPON Error when moving to Phase 2

I was training a model with PPON (192) + MultiScale + Diffaug, and I received the following error when moving to Phase 2:
I have AMP disabled because my GPU doesn't support it.
error.log

21-01-27 11:26:52.449 - INFO: Random seed: 0
21-01-27 11:26:52.647 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-01-27 11:26:52.647 - INFO: Number of train images: 37,933, iters: 2,371
21-01-27 11:26:52.647 - INFO: Total epochs needed: 43 for iters 100,000
21-01-27 11:26:52.648 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-01-27 11:26:52.648 - INFO: Number of val images in [val_set14_part]: 1
21-01-27 11:26:52.650 - INFO: AMP library available
21-01-27 11:26:52.827 - INFO: Initialization method [kaiming]
21-01-27 11:26:54.127 - INFO: Initialization method [kaiming]
21-01-27 11:26:54.185 - INFO: Loading pretrained model for G [../experiments/pretrained_models/PPON_G.pth] ...
21-01-27 11:26:55.276 - INFO: Network G structure: DataParallel - PPON, with parameters: 17,267,657
21-01-27 11:26:55.277 - INFO: Network D structure: DataParallel - MultiscaleDiscriminator, with parameters: 8,296,899
21-01-27 11:26:55.277 - INFO: Model [PPONModel] is created.
21-01-27 11:26:55.277 - INFO: Start training from epoch: 0, iter: 0
21-01-27 11:26:55.991 - INFO: Switching to phase: p2, step: 1
Traceback (most recent call last):
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 382, in <module>
    main()
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 378, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 221, in fit
    model.optimize_parameters(virtual_step)  # calculate loss functions, get gradients, update network weights
  File "/mnt/ext4-storage/Training/BasicSR/codes/models/ppon_model.py", line 199, in optimize_parameters
    l_g_total.backward()
AttributeError: 'float' object has no attribute 'backward'
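
Not a fix, but a guess at the mechanism: l_g_total is presumably initialized to the float 0.0 and only becomes a tensor once a loss term is accumulated, so if no generator loss is active at the first p2 step, .backward() is called on a plain float. A defensive sketch:

import torch

l_g_total = 0.0  # stays a plain float if no loss term is accumulated this phase
# ... loss terms would be added here during phase p2 ...
if torch.is_tensor(l_g_total):
    l_g_total.backward()
else:
    # nothing contributed a differentiable loss; skip the update this step
    pass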

[Suggestion]: Relativistic GAN Type

I was reading about the GAN types (Vanilla, LSGAN, and WGAN-GP) already included in BasicSR, and I found a new type that may bring a sizable performance increase to the discriminators used in upscaling methods like ESRGAN and PPON.

https://arxiv.org/abs/1807.00734

This paper outlines the idea behind a relativistic discriminator and showcases new variants of existing GANs that were created to use this approach.
There is also source code available:
https://www.github.com/AlexiaJM/RelativisticGAN

The one that stood out to me was RaLSGAN.

It performs better than the other variants in most tests involving image generation at 128x128 or smaller, and it outperforms the standard GAN (SGAN) variant by a large margin.
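
For reference, the relativistic average LSGAN losses from the paper reduce to a few lines; a sketch in PyTorch, where c_real and c_fake are the raw (pre-sigmoid) critic outputs:

import torch

def ralsgan_d_loss(c_real: torch.Tensor, c_fake: torch.Tensor) -> torch.Tensor:
    # E[(C(x_r) - mean C(x_f) - 1)^2] + E[(C(x_f) - mean C(x_r) + 1)^2]
    return (((c_real - c_fake.mean() - 1.0) ** 2).mean()
            + ((c_fake - c_real.mean() + 1.0) ** 2).mean())

def ralsgan_g_loss(c_real: torch.Tensor, c_fake: torch.Tensor) -> torch.Tensor:
    # Same form with the roles of real and fake swapped.
    return (((c_fake - c_real.mean() - 1.0) ** 2).mean()
            + ((c_real - c_fake.mean() + 1.0) ** 2).mean())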

Interested to hear your thoughts on this,

N0man

NameError: name 'log_dir' is not defined

Hey,
When trying to train, I got the following error. Does anyone know how to fix it?

Traceback (most recent call last):
  File "train.py", line 65, in configure_loggers
    tb_logger = SummaryWriter(log_dir=log_dir)
NameError: name 'log_dir' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 379, in <module>
    main()
  File "train.py", line 351, in main
    loggers = configure_loggers(opt)
  File "train.py", line 69, in configure_loggers
    tb_logger = SummaryWriter(logdir=log_dir)
NameError: name 'log_dir' is not defined
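
A guess at a workaround, not a confirmed fix: log_dir is never assigned before the writer is created, so defining it first should clear the NameError. The sketch below assumes the intended directory matches the path.log entry the option parser builds (as seen in the config dumps elsewhere on this page):

from torch.utils.tensorboard import SummaryWriter

def configure_loggers(opt):
    # assumption: use the experiment's log path from the parsed options
    log_dir = opt['path']['log']
    return SummaryWriter(log_dir=log_dir)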


GPU usage at 0% during training

So I can see it is taking up VRAM, but Windows 10 shows 0% GPU usage.
I am using a 4090, CUDA V11.8.89, PyTorch 1.12.1, Python 3.9.


Multifolder dataset limitation

From what I can tell, when using multiple folders to specify training data, they must share the same prefix path. If they don't, training fails with a confusing "image too large" error.

If the prefix requirement is intended, perhaps it would be good to document it in the example file and/or detect it and emit a more specific error message?

FutureWarning and UserWarning

D:\traiNNer\codes\models\base_model.py:921: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
self.grad_clip(
C:\Python39\lib\site-packages\torch\optim\lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "

How do I fix this?

ETA is grossly over-estimated.

I think your ETA calculation is off by a factor of 100...

Through both my own manual calculations and visually inspecting the ETA while training, it seems that the decimal separator is placed in the wrong spot.

Example:
23-07-26 02:16:57.959 - INFO: <epoch: 33, iter: 167,900, lr:6.250e-06, t:31.6040s, td:0.0001s, eta:281.8024h> pix-l1: 4.2061e-04 fea-vgg19-l1: 7.0643e-01 l_g_gan: 1.4697e-02 l_d_real: 5.6621e-02 l_d_fake: 5.6456e-02 D_real: 4.0469e+01 D_fake: 3.7594e+01

Here we see that the ETA is 281.8024h(rs), when in fact it is much closer to 2.81hrs. This is consistent with every training run I've done so far, no matter how big or small the ETA actually is.

Even though I've gotten used to it now, I thought I might raise an issue here and let you know 👍

lr not updated upon niter change

I've noticed that if you update niter mid-training, it displays the correct new 'lr steps', but it does not correct the current 'lr rate' in accordance with the new steps. It looks like it just keeps the old lr rate from the 'latest.state' file.

Example:
When I changed niter from 100,000 to 500,000, these were the logs:

  • Updating lr_steps from [10000, 20000, 40000, 60000] to [50000, 100000, 200000, 300000]
  • INFO: <epoch:432, iter: 46,600, lr:1.250e-05, ......

Here are the settings in the config file:
lr_steps_rel: [0.1, 0.2, 0.4, 0.6]
lr_G: 0.0001
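
For reference, with MultiStepLR the current lr is lr_G times gamma raised to the number of milestones already passed, so recomputing it against the new schedule shows the discrepancy (lr_gamma: 0.5 assumed, as in the configs elsewhere on this page):

lr_G, gamma, it = 1e-4, 0.5, 46_600
old_steps = [10_000, 20_000, 40_000, 60_000]
new_steps = [50_000, 100_000, 200_000, 300_000]

# lr = lr_G * gamma ** (number of milestones already passed)
print(lr_G * gamma ** sum(s <= it for s in old_steps))  # 1.25e-05, what the log shows
print(lr_G * gamma ** sum(s <= it for s in new_steps))  # 1e-04, what the new schedule implies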

Video dataloader crashes at 1x scale

Caused by the loader assuming that, since the LR and HR datasets are the same size, it should generate LR on the fly.

This should be an easy fix, I'll make a PR for it later.

Correct usage of lmdb

I used create_lmdb.py to create both my LR and HR datasets, and I was wondering how I should configure my options file.
Do the settings differ from using HR/LR image folders?

Add lr_crop_size in config

I think crop size is a bit misleading for people that are new to training. Many times I've had to explain how training a 1x model with a crop size of 128 is equivalent to training a 4x model at 512. It seems silly to set the crop size to 32 for a 1x model, but it might seem less silly if every model by default used the same LR crop size of 32 regardless of scale.
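
To spell out the equivalence for newcomers (the function name is mine, not the config's): the network always trains on the LR-side crop, which is the HR crop divided by the scale.

def lr_crop(hr_crop_size: int, scale: int) -> int:
    # The generator's actual input patch is the HR crop divided by the scale.
    return hr_crop_size // scale

print(lr_crop(128, 1))  # 1x model, crop 128 -> network sees 128px patches
print(lr_crop(512, 4))  # 4x model, crop 512 -> network sees the same 128px patches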

At the very least I think a comment or something explaining this concept would suffice. This might be a case of us just needing better documentation rather than adding extra hand holding.

Thoughts?

Pix2Pix 3->1 channel

Hi!
Is it possible to initialize pix2pix to work with a 3-channel input image and a 1-channel output?

I'm trying with the following options:

name: 001_pix2pix_test
use_tb_logger: true
model: pix2pix
scale: 1
gpu_ids: [0]
use_amp: true
use_swa: false

# Dataset options:
datasets:
  train:
    name: test
    mode: aligned
    outputs: AB
    dataroot_B: '../datasets/test/B'
    dataroot_A: '../datasets/test/A'
    
    use_shuffle: true
    n_workers: 8
    batch_size: 2
    virtual_batch_size: 2
    preprocess: none
    crop_size: 256 
    input_nc: 3
    output_nc: 1
    image_channels: 3

# Generator options:
network_G:
    strict: false
    which_model_G: unet_net

# Discriminator options:
network_D:
    strict: true
    which_model_D: patchgan
    in_nc: 4

And I got this a few seconds after training started:

2024-01-03 11:07:26.963466: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
24-01-03 11:07:34.462 - WARNING: From c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.
24-01-03 11:07:35.310 - INFO: Random seed: 0
24-01-03 11:07:35.677 - INFO: Dataset [AlignedDataset - test] is created.
24-01-03 11:07:35.677 - INFO: Number of train images: 3,159, epoch iters: 1,579
24-01-03 11:07:35.678 - INFO: Total epochs needed: 32 for iters 50,000
24-01-03 11:07:35.910 - INFO: AMP library available
24-01-03 11:07:36.229 - INFO: Initialization method [kaiming]
24-01-03 11:07:36.900 - INFO: Initialization method [kaiming]
24-01-03 11:07:36.927 - INFO: GAN enabled
24-01-03 11:07:36.929 - INFO: AMP enabled
24-01-03 11:07:36.930 - INFO: Network G structure: DataParallel - UnetGenerator, with parameters: 54,413,955
24-01-03 11:07:36.930 - INFO: Network D structure: DataParallel - NLayerDiscriminator, with parameters: 2,766,657
24-01-03 11:07:36.931 - INFO: Model [Pix2PixModel] created.
24-01-03 11:07:36.931 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "f:\GIT\traiNNer\codes\train.py", line 500, in <module>
    main()
  File "f:\GIT\traiNNer\codes\train.py", line 496, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "f:\GIT\traiNNer\codes\train.py", line 238, in fit
    model.optimize_parameters(virtual_step)  # calculate loss functions, get gradients, update network weights
  File "f:\GIT\traiNNer\codes\models\pix2pix_model.py", line 219, in optimize_parameters
    self.backward_D()
  File "f:\GIT\traiNNer\codes\models\pix2pix_model.py", line 146, in backward_D
    self.log_dict = self.backward_D_Basic(
  File "f:\GIT\traiNNer\codes\models\base_model.py", line 871, in backward_D_Basic
    l_d_total, gan_logs = self.adversarial(
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "f:\GIT\traiNNer\codes\models\losses.py", line 595, in forward
    return self.conditional_discriminator(
  File "f:\GIT\traiNNer\codes\models\losses.py", line 530, in conditional_discriminator
    return self.regular_discriminator(
  File "f:\GIT\traiNNer\codes\models\losses.py", line 536, in regular_discriminator
    pred_d_fake, pred_d_real = self.get_predictions_dis(
  File "f:\GIT\traiNNer\codes\models\losses.py", line 475, in get_predictions_dis
    pred_d_fake = netD(fake.detach())
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\parallel\data_parallel.py", line 183, in forward
    return self.module(*inputs[0], **module_kwargs[0])
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "f:\GIT\traiNNer\codes\models\modules\architectures\discriminators.py", line 579, in forward
    return self.model(x)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
    input = module(input)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 4, 4, 4], expected input[2, 6, 1024, 1024] to have 4 channels, but got 6 channels instead

I've double-checked: all A images have 3 channels and all B images have only 1 grayscale channel.
What's going on, and is this possible at all?
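
A sketch of what the channel counts imply (not a fix, just the arithmetic): the conditional PatchGAN sees A and B concatenated, so in_nc: 4 expects 3 + 1 channels, while the reported 6 means B is still arriving with 3 channels, i.e. the grayscale images are presumably being expanded to 3 channels on load.

import torch

a = torch.zeros(2, 3, 1024, 1024)   # 3-channel input A
b1 = torch.zeros(2, 1, 1024, 1024)  # intended 1-channel target B
b3 = torch.zeros(2, 3, 1024, 1024)  # B as apparently loaded (3 channels)

print(torch.cat([a, b1], 1).shape)  # [2, 4, 1024, 1024] -> matches in_nc: 4
print(torch.cat([a, b3], 1).shape)  # [2, 6, 1024, 1024] -> the reported error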

Pixel Unshuffle is broken

Training a 1x model with Pixel Unshuffle (using the supplied pretrained model) yields this error:

[Python] RuntimeError: Given groups=1, weight of size [64, 48, 3, 3], expected input[1, 4, 297, 397] to have 48 channels, but got 4 channels instead
[ESRGAN] Upscaling Error: Index was outside the bounds of the array. at Cupscale.PreviewMerger.Merge() at Cupscale.Main.Upscale.<Run>d__8.MoveNext()
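
The shapes in the error tell the story: with pixel unshuffle at scale 4, a 3-channel image becomes 3 * 4^2 = 48 channels before the first convolution (hence the [64, 48, 3, 3] weight), but the input arrived as 4 channels (RGBA?) with a size that is not a multiple of the scale. A quick illustration:

import torch
import torch.nn.functional as F

x = torch.zeros(1, 3, 296, 396)       # H and W must be divisible by the scale
print(F.pixel_unshuffle(x, 4).shape)  # [1, 48, 74, 99]: 3 * 4**2 = 48 channels

# The reported input [1, 4, 297, 397] is 4-channel, and 297x397 is not a
# multiple of 4, so it could never have gone through the unshuffle step.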

[Feature Request] Curriculum Training for Augmentations

The ability to change which augmentation preset is being used at different points in training would be great. For example, at 10k iterations, resrgan_blur could be used, but at 30k it's automatically switched to bsrgan_blur.

This was discussed in the #trainner channel on the GU Discord server

Edit: A possible expansion on this idea, augmentation preset strengths. I'm not sure how it'd function, but I figured I'd bring it up

I Have No Idea What I Am Doing to Cause This:

python3 train.py -opt train_sr.yml
Traceback (most recent call last):
  File "/home/nickdbts2022/Desktop/traiNNer/codes/train.py", line 500, in 
    main()
  File "/home/nickdbts2022/Desktop/traiNNer/codes/train.py", line 466, in main
    opt = parse_options()
  File "/home/nickdbts2022/Desktop/traiNNer/codes/train.py", line 25, in parse_options
    opt = options.parse(args.opt, is_train=is_train)
  File "/home/nickdbts2022/Desktop/traiNNer/codes/options/options.py", line 552, in parse
    raise ValueError("Configuration file {} not found.".format(opt_path))
ValueError: Configuration file options/train/train_sr.yml not found.

"cv2.error: OpenCV(4.7.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\resize.cpp:4065: error: (-215:Assertion failed) inv_scale_x > 0 in function 'cv::resize'"

This error shows up after starting train.py with the configuration that came with traiNNer. Fresh install with nothing modified (except the train_sr.yml).

23-05-31 19:09:48.945 - INFO: Random seed: 0
23-05-31 19:09:49.264 - INFO: Dataset [AlignedDataset - DIV2K] is created.
23-05-31 19:09:49.266 - INFO: Number of train images: 14,361, epoch iters: 1,795
23-05-31 19:09:49.266 - INFO: Total epochs needed: 279 for iters 500,000
23-05-31 19:09:49.266 - INFO: Dataset [AlignedDataset - val_set14_part] is created.
23-05-31 19:09:49.267 - INFO: Number of val images in [val_set14_part]: 1
23-05-31 19:09:49.624 - INFO: AMP library available
23-05-31 19:09:51.252 - INFO: Initialization method [kaiming]
23-05-31 19:09:51.547 - INFO: Initialization method [kaiming]
23-05-31 19:09:51.670 - INFO: Loading pretrained model for G [..\experiments\pretrained_models\RRDB_PSNR_x4.pth]
23-05-31 19:09:52.916 - INFO: GAN enabled
23-05-31 19:09:52.922 - INFO: AMP enabled
23-05-31 19:09:52.923 - INFO: norm gradient clip enabled. Clip value: 0.1.
23-05-31 19:09:52.935 - INFO: Network G structure: DataParallel - RRDBNet, with parameters: 16,697,987
23-05-31 19:09:52.936 - INFO: Network D structure: DataParallel - Discriminator_VGG, with parameters: 14,502,281
23-05-31 19:09:52.936 - INFO: Model [SRModel] created.
23-05-31 19:09:52.936 - INFO: Start training from epoch: 0, iter: 0
E:\nn\trainner\codes\models\base_model.py:921: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
self.grad_clip(
C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
Traceback (most recent call last):
  File "E:\nn\trainner\codes\train.py", line 500, in <module>
    main()
  File "E:\nn\trainner\codes\train.py", line 496, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "E:\nn\trainner\codes\train.py", line 224, in fit
    for n, train_data in enumerate(dataloaders['train'], start=1):
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 1229, in _process_data
    data.reraise()
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_utils.py", line 425, in reraise
    raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\nn\trainner\codes\data\aligned_dataset.py", line 117, in __getitem__
    img_A, img_B = paired_imgs_check(
  File "E:\nn\trainner\codes\dataops\augmentations.py", line 1388, in paired_imgs_check
    img_A, img_B = shape_change_fn(
  File "E:\nn\trainner\codes\dataops\augmentations.py", line 1141, in shape_change_fn
    img_A = transforms.Resize((int(h/scale), int(w/scale)),
  File "E:\nn\trainner\codes\dataops\augmennt\augmennt\transforms.py", line 192, in __call__
    return F.resize(img, self.size, self.interpolation)
  File "E:\nn\trainner\codes\dataops\augmennt\augmennt\common.py", line 211, in wrapped_function
    result = func(img, *args, **kwargs)
  File "E:\nn\trainner\codes\dataops\augmennt\augmennt\functional.py", line 187, in resize
    output = cv2.resize(img, dsize=(size[1], size[0]), interpolation=_cv2_str2interpolation[interpolation])
cv2.error: OpenCV(4.7.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\resize.cpp:4065: error: (-215:Assertion failed) inv_scale_x > 0 in function 'cv::resize'
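
The assertion is OpenCV's way of saying the target size is degenerate: shape_change_fn asks for (int(h/scale), int(w/scale)), so any image with a side smaller than the scale yields a 0 dimension. A minimal reproduction of the same error (my guess at the trigger, not a confirmed diagnosis):

import cv2
import numpy as np

img = np.zeros((3, 3, 3), dtype=np.uint8)  # an image smaller than scale=4
h, w, scale = img.shape[0], img.shape[1], 4
size = (int(h / scale), int(w / scale))    # -> (0, 0)
cv2.resize(img, dsize=(size[1], size[0]))  # cv2.error: ... inv_scale_x > 0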

TypeError: 'NoneType' object cannot be interpreted as an integer

Hello. I am training on Colab and I get the following error.

export CUDA_VISIBLE_DEVICES=0
20-12-30 03:59:48.658 - INFO: name: ftrainer
use_tb_logger: False
model: srragan
scale: 8
batch_multiplier: 1
gpu_ids: [0]
datasets:[
  train:[
    name: Dataset
    mode: LRHROTF
    dataroot_HR: ['/content/datasets/set0/train/hr', '/content/datasets/set1/train/hr', '/content/datasets/set2/train/hr']
    dataroot_LR: ['/content/datasets/set0/train/lr', '/content/datasets/set1/train/lr', '/content/datasets/set2/train/lr']
    subset_file: None
    use_shuffle: True
    n_workers: 4
    batch_size: 100
    HR_size: 128
    phase: train
    scale: 8
    data_type: img
    virtual_batch_size: 100
  ]
  val:[
    name: Validation
    mode: LRHROTF
    dataroot_HR: ['/content/datasets/set0/val/hr', '/content/datasets/set1/val/hr', '/content/datasets/set2/val/hr']
    dataroot_LR: ['/content/datasets/set0/val/lr', '/content/datasets/set1/val/lr', '/content/datasets/set2/val/lr']
    phase: val
    scale: 8
    data_type: img
  ]
]
path:[
  root: /content/BasicSR/
  pretrain_model_G: ../experiments/pretrained_models/Restart.pth
  experiments_root: /content/BasicSR/experiments/ftrainer
  models: /content/BasicSR/experiments/ftrainer/models
  training_state: /content/BasicSR/experiments/ftrainer/training_state
  log: /content/BasicSR/experiments/ftrainer
  val_images: /content/BasicSR/experiments/ftrainer/val_images
]
network_G:[
  which_model_G: RRDB_net
  norm_type: None
  mode: CNA
  nf: 64
  nb: 23
  in_nc: 3
  out_nc: 3
  gc: 32
  group: 1
  convtype: Conv2D
  net_act: leakyrelu
  scale: 8
]
network_D:[
  which_model_D: discriminator_vgg
  norm_type: batch
  act_type: leakyrelu
  mode: CNA
  nf: 64
  in_nc: 3
]
train:[
  lr_G: 0.0001
  lr_D: 0.0001
  use_frequency_separation: False
  lr_scheme: MultiStepLR
  lr_steps: [50000, 100000, 200000, 300000]
  lr_gamma: 0.5
  pixel_criterion: l1
  pixel_weight: 0.01
  feature_criterion: l1
  feature_weight: 1
  gan_type: vanilla
  gan_weight: 0.005
  manual_seed: 0
  niter: 500000.0
  val_freq: 100
  overwrite_val_imgs: None
  val_comparison: None
]
logger:[
  print_freq: 100
  save_checkpoint_freq: 100.0
  backup_freq: 100
  overwrite_chkp: None
]
is_train: True

20-12-30 03:59:48.658 - INFO: Random seed: 0
20-12-30 03:59:48.716 - INFO: Dataset [LRHRDataset - Dataset] is created.
20-12-30 03:59:48.716 - INFO: Number of train images: 1,307, iters: 14
20-12-30 03:59:48.716 - INFO: Total epochs needed: 35715 for iters 500,000
20-12-30 03:59:48.719 - INFO: Dataset [LRHRDataset - Validation] is created.
20-12-30 03:59:48.719 - INFO: Number of val images in [Validation]: 358
20-12-30 03:59:48.752 - INFO: AMP library available
Traceback (most recent call last):
  File "train.py", line 256, in <module>
    main()
  File "train.py", line 98, in main
    model = create_model(opt)
  File "/content/BasicSR/codes/models/__init__.py", line 26, in create_model
    m = M(opt)
  File "/content/BasicSR/codes/models/SRRaGAN_model.py", line 51, in __init__
    self.netG = networks.define_G(opt).to(self.device)  # G
  File "/content/BasicSR/codes/models/networks.py", line 160, in define_G
    finalact=opt_net['finalact'], gaussian_noise=opt_net['gaussian'], plus=opt_net['plus'], nr=opt_net['nr'])
  File "/content/BasicSR/codes/models/modules/architectures/RRDBNet_arch.py", line 26, in __init__
    gaussian_noise=gaussian_noise, plus=plus) for _ in range(nb)]
  File "/content/BasicSR/codes/models/modules/architectures/RRDBNet_arch.py", line 26, in <listcomp>
    gaussian_noise=gaussian_noise, plus=plus) for _ in range(nb)]
  File "/content/BasicSR/codes/models/modules/architectures/RRDBNet_arch.py", line 86, in __init__
    gaussian_noise=gaussian_noise, plus=plus) for _ in range(nr)]
TypeError: 'NoneType' object cannot be interpreted as an integer
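
The traceback points at range(nr) with nr coming from opt_net['nr'], and the network_G block in this config has no nr entry (compare the working config earlier on this page, which sets nr: 3), so the option resolves to None. A two-line reproduction; adding nr: 3 under network_G should presumably clear it:

nr = None  # what opt_net['nr'] yields when the yml has no `nr` key
try:
    blocks = [block for block in range(nr)]
except TypeError as e:
    print(e)  # 'NoneType' object cannot be interpreted as an integer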

Results are blue

I'm using test.py as per the instructions, and my results are blue. How can I run upscaling to produce proper results?
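
Not a confirmed diagnosis, but a blue cast is the classic symptom of an RGB/BGR channel swap (OpenCV stores images as BGR, while tensors are usually RGB). A quick way to test that theory on one result:

import cv2

img = cv2.imread('result.png')  # loaded as BGR
cv2.imwrite('swapped.png', cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
# If swapped.png has correct colours, a channel swap is happening somewhere
# in the save/load path.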

Feature request/bug fix: Perform scaling and other operations in linear light

I was looking to try this out to train an upscaling model but thought to try one of my test images first, and found that downscaling was being done in srgb gamma. Most images are encoded as srgb (~188 is half as bright as 255) but downscaling algorithms, where it's especially relevant, assume they're taking linear rgb as input (~127 is half as bright as 255).

I used this image as my input (this isn't a good image for training an upscaler but it does demonstrate the problem) and manually ran it through resize from imresize.py, the same way it is done in generate_mod_LR_bic.py. It's best to open this image in a program that does not perform any scaling, since your browser might be doing some.
gamma

What I got out of it at 1/4 scale was a uniform grey square. rlt_srgb

But I can fix this by converting to and from linear RGB using the methods you already have in colors.py (the functions are named incorrectly: rgb2srgb should be srgb2rgb and vice versa):

    # imports added for a self-contained run; module paths may differ locally
    import cv2
    import numpy as np
    import torch
    import torchvision
    from dataops.colors import rgb2srgb, srgb2rgb
    from dataops.imresize import resize

    img = cv2.imread('gamma.jpg')
    img = img * 1.0 / 255
    img = torch.from_numpy(np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))).float()
    img = rgb2srgb(img)  # despite the name, this converts srgb -> linear

    rlt = resize(img, 1/4)
    rlt = srgb2rgb(rlt)  # and this converts linear -> srgb

    torchvision.utils.save_image(
        (rlt * 255).round() / 255, 'rlt.png', nrow=1, padding=0, normalize=False)

This code snippet gives me the expected result: rlt

While this is an artificial example that exaggerates the effect, the colour distortion is going to happen to a varying degree on any images that are transformed in non-linear gamma. I believe this is decreasing the accuracy of the trained models, since they'll be learning to attempt to reverse this colour distortion, which can cause a noticeable colour shift when upscaling images that were not produced from srgb downscaling.

lmdb has no valid image file

I get this error:

  File "C:\ManduScale\Train\codes\train.py", line 500, in <module>
    main()
  File "C:\ManduScale\Train\codes\train.py", line 487, in main
    dataloaders, data_params = get_dataloaders(opt)
  File "C:\ManduScale\Train\codes\train.py", line 134, in get_dataloaders
    dataset = create_dataset(dataset_opt)
  File "C:\ManduScale\Train\codes\data\__init__.py", line 79, in create_dataset
    dataset = D(dataset_opt)
  File "C:\ManduScale\Train\codes\data\aligned_dataset.py", line 41, in __init__
    self.A_paths, self.B_paths = get_dataroots_paths(self.opt, strict=False, keys_ds=self.keys_ds)
  File "C:\ManduScale\Train\codes\data\base_dataset.py", line 235, in get_dataroots_paths
    paths_A, paths_B = read_dataroots(opt, keys_ds=keys_ds)
  File "C:\ManduScale\Train\codes\data\base_dataset.py", line 168, in read_dataroots
    paths_A, paths_B = paired_dataset_validation(A_images_paths, B_images_paths,
  File "C:\ManduScale\Train\codes\data\base_dataset.py", line 99, in paired_dataset_validation
    A_paths = get_image_paths(data_type, paths[0], max_dataset_size)  # get image paths
  File "C:\ManduScale\Train\codes\dataops\common.py", line 82, in get_image_paths
    paths = sorted(_get_paths_from_images(dataroot, max_dataset_size=max_dataset_size))
  File "C:\ManduScale\Train\codes\dataops\common.py", line 43, in _get_paths_from_images
    assert images, '{:s} has no valid image file'.format(path)
AssertionError: C:\ManduScale\OPScale\DataSet\FourthSet\LR1.lmdb has no valid image file

My config is:

dataroot_HR: ['C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR',  
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    'C:\ManduScale\OPScale\DataSet\FourthSet\HR', 
    ]
    dataroot_LR: ['C:\ManduScale\OPScale\DataSet\FourthSet\LR1.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR2.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR3.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR4.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR5.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR6.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR7.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR8.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR9.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR10.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR11.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR12.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR13.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR14.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR15.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR16.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR17.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR18.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR19.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR20.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR22.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR23.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR24.lmdb',
    'C:\ManduScale\OPScale\DataSet\FourthSet\LR25.lmdb',
    ]

Can anyone help me fix this issue?
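
For what it's worth, the traceback shows get_image_paths() taking the image-folder branch (_get_paths_from_images) for an .lmdb path, which suggests the dataset block is still using data_type: img (the default in the other configs on this page). A paraphrased sketch of that dispatch, not the repo's exact code; setting data_type: lmdb may be all that's missing:

def paths_from_lmdb(dataroot):
    raise NotImplementedError  # stand-in for the real lmdb key reader

def paths_from_images(dataroot):
    raise NotImplementedError  # stand-in; the real one asserts '... has no valid image file'

def get_image_paths(data_type, dataroot):
    # Only the 'lmdb' branch treats dataroot as a database; anything else
    # scans the path for loose image files and trips the assertion seen above.
    if data_type == 'lmdb':
        return paths_from_lmdb(dataroot)
    return paths_from_images(dataroot)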

Forcing image size to a multiple of 4 even when 'scale' is 1

  • The first error is that it forces the image size to a multiple of 4 even when the scale is 1.

  • Secondly, even though it has cropped/expanded the image, it still gives the error below, because it does not scale the corresponding LR image.

LOGS:
The image size needs to be a multiple of 4. The loaded image size was (817, 398), so it was adjusted to (816, 400). This adjustment will be done to all images whose sizes are not multiples of 4.
The image size needs to be a multiple of 4. The loaded image size was (476, 485), so it was adjusted to (476, 484). This adjustment will be done to all images whose sizes are not multiples of 4.
Traceback (most recent call last):
  File "train.py", line 417, in <module>
    main()
  File "train.py", line 413, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "train.py", line 215, in fit
    for n, train_data in enumerate(dataloaders['train'], start=1):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 73, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 73, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 400, 816] at entry 0 and [3, 560, 464] at entry 1

`nearest_aligned` is not aligned

When using nearest_aligned, the output is noticeably shifted down and to the right. This affects my models severely and causes noticeable warping in their output.

I used this code in augmentations.py to produce the output images:

if __name__ == '__main__':
    img = cv2.imread('test.png')
    img_A, _ = Scale(img=img, scale=4, algo=997, ds_kernel=None, img_type='cv2')
    cv2.imwrite('output.png', img_A)

original image

Output from nearest_aligned as per the above code:
output

Output from convert test.png -interpolate Average -filter point -resize 25% magick-nearest.png:
magick-nearest

Explicitly sampling the top left corner closely matches the offset from nearest_aligned:
convert test.png -define sample:offset=0%x0% -sample 25% magick-sampled-top-left.png
magick-sampled-top-left
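
For comparison, a centered nearest sample picks the middle of each block instead of the top-left corner, which is what produces the down-and-right offset shown above. A minimal sketch of my own, not the repo's implementation:

import numpy as np

def nearest_centered(img: np.ndarray, scale: int) -> np.ndarray:
    # Take the pixel at the centre of every scale x scale block.
    off = scale // 2
    return img[off::scale, off::scale]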
