esrgan-tf2's Issues

Loss is always nan

Hello PeteryuX,

Thanks a lot for sharing your implementation of ESRGAN.

I have been testing some GAN-based super-resolution networks recently. I have a large set of HR/LR training images and would like to train the ESRGAN network (PSNR pre-training followed by ESRGAN training) using your training code.

I followed your instructions on data preparation: I converted my 1,825,587 pairs of LR/HR samples to *bin.tfrecord, verified them with dataset_checker (no problems; the LR/HR images displayed correctly), modified a few lines of your code for the hard-coded paths, etc., and started PSNR training on an RTX 3090 GPU. However, the printed "loss" is "nan" in every iteration, and even after PSNR training "successfully" finishes, loss_D and loss_G during ESRGAN training are also "nan".

In PSNR training:
...
Training [>> ] 20004/600000, loss=nan, lr=2.0e-04 2.0 step/sec
...

In ESRGAN training:
...
Training [>>> ] 40000/285240, loss_G=nan, loss_D=nan, lr_G=1.0e-04, lr_D=1.0e-04 1.4 step/sec
[*] save ckpt file at ./checkpoints/esrgan/ckpt-32
Training [>>>> ] 47877/285240, loss_G=nan, loss_D=nan, lr_G=1.0e-04, lr_D=1.0e-04 1.4 step/sec
...

Do you have any suggestions on this issue?
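
In case it helps rule out bad data, here is the minimal finiteness check I could run over the first batches (a sketch; `train_dataset` stands for the `(lr, hr)` tf.data pipeline built from the tfrecord by the repo's loader):

```python
import tensorflow as tf

# Sketch: scan the first batches of the (lr, hr) pipeline for NaN/Inf.
# `train_dataset` is assumed to be the tf.data.Dataset built from the
# *bin.tfrecord file by the repo's data loader.
for step, (lr_img, hr_img) in enumerate(train_dataset.take(100)):
    tf.debugging.check_numerics(lr_img, f"non-finite LR values at batch {step}")
    tf.debugging.check_numerics(hr_img, f"non-finite HR values at batch {step}")
```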

I attach the PSNR and ESRGAN parameter files here:

psnr.yaml:

```yaml
batch_size: 64
input_size: 32
gt_size: 128
ch_size: 3
scale: 4
sub_name: 'psnr_pretrain'
pretrain_name: null

network_G:
    nf: 64
    nb: 23

train_dataset:
    path: '/data/EOSC/EOSC_sub_bin.tfrecord'
    num_samples: 1825587
    using_bin: True
    using_flip: True
    using_rot: True

test_dataset:
    EOSC_path: '/data2/EOSC_test'

niter: 600000
lr: !!float 2e-4
lr_steps: [200000, 300000, 400000, 500000]
lr_rate: 0.5

adam_beta1_G: 0.9
adam_beta2_G: 0.99

w_pixel: 1.0
pixel_criterion: l1

save_steps: 20000
```

esrgan.yaml:

```yaml
batch_size: 64
input_size: 32
gt_size: 128
ch_size: 3
scale: 4
sub_name: 'esrgan'
pretrain_name: 'psnr_pretrain'

network_G:
    nf: 64
    nb: 23

network_D:
    nf: 64

train_dataset:
    path: '/data/EOSC/EOSC_sub_bin.tfrecord'
    num_samples: 1825587
    using_bin: True
    using_flip: False
    using_rot: False

test_dataset:
    EOSC_path: '/data2/EOSC_test'

niter: 285240
lr_G: !!float 1e-4
lr_D: !!float 1e-4
lr_steps: [60000, 120000, 180000, 240000]
lr_rate: 0.5

adam_beta1_G: 0.9
adam_beta2_G: 0.99
adam_beta1_D: 0.9
adam_beta2_D: 0.99

w_pixel: !!float 1e-2
pixel_criterion: l1

w_feature: 1.0
feature_criterion: l1

w_gan: !!float 5e-3
gan_type: ragan  # gan | ragan

save_steps: 20000
```

Any help would be much appreciated! Thank you!

loss_maybe_wrong

Hi, I'm here again, because I may have found a bug in the code:

```python
def generator_loss(hr, sr):
    return cross_entropy(tf.ones_like(sr), sigma(sr))
```

My question: hr is passed in but never used. Should it perhaps be tf.ones_like(hr), or should hr enter the loss in some other way?
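
For context, here is my understanding of the two generator-loss variants (a sketch based on the ESRGAN formulation, not the repo's exact file): in the plain GAN loss only the SR logits appear, while the relativistic average (ragan) variant does use hr:

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=False)
sigma = tf.sigmoid

# Plain GAN generator loss: only the discriminator's output on the fake
# (sr) samples matters, so an unused hr argument can be intentional.
def generator_loss(hr, sr):
    return cross_entropy(tf.ones_like(sr), sigma(sr))

# Relativistic average GAN (ragan) generator loss: hr does appear here.
def generator_loss_ragan(hr, sr):
    return 0.5 * (
        cross_entropy(tf.ones_like(sr), sigma(sr - tf.reduce_mean(hr))) +
        cross_entropy(tf.zeros_like(hr), sigma(hr - tf.reduce_mean(sr))))
```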

When updating the gradients, don't we need to freeze one of the networks?

When the gradients are updated, neither G nor D is frozen in the code:

```python
@tf.function
def train_step(lr, hr):
    with tf.GradientTape(persistent=True) as tape:
        sr = generator(lr, training=True)
        hr_output = discriminator(hr, training=True)
        sr_output = discriminator(sr, training=True)

        losses_G = {}
        losses_D = {}
        losses_G['reg'] = tf.reduce_sum(generator.losses)
        losses_D['reg'] = tf.reduce_sum(discriminator.losses)
        losses_G['pixel'] = cfg['w_pixel'] * pixel_loss_fn(hr, sr)
        losses_G['feature'] = cfg['w_feature'] * fea_loss_fn(hr, sr)
        losses_G['gan'] = cfg['w_gan'] * gen_loss_fn(hr_output, sr_output)
        losses_D['gan'] = dis_loss_fn(hr_output, sr_output)
        total_loss_G = tf.add_n([l for l in losses_G.values()])
        total_loss_D = tf.add_n([l for l in losses_D.values()])
    grads_G = tape.gradient(total_loss_G, generator.trainable_variables)
    grads_D = tape.gradient(total_loss_D, discriminator.trainable_variables)
    optimizer_G.apply_gradients(zip(grads_G, generator.trainable_variables))
    optimizer_D.apply_gradients(zip(grads_D, discriminator.trainable_variables))

    return total_loss_G, total_loss_D, losses_G, losses_D
```
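
For comparison, would something like this two-tape sketch (mine, not the repo's code) be equivalent? My understanding is that `tape.gradient` only differentiates with respect to the variable list it is given, so the other network's weights would never be updated in either version:

```python
@tf.function
def train_step_two_tapes(lr, hr):
    # Generator update: D's weights receive no update because gradients
    # are taken and applied only w.r.t. generator.trainable_variables.
    with tf.GradientTape() as tape_G:
        sr = generator(lr, training=True)
        hr_output = discriminator(hr, training=True)
        sr_output = discriminator(sr, training=True)
        total_loss_G = tf.add_n([
            tf.reduce_sum(generator.losses),
            cfg['w_pixel'] * pixel_loss_fn(hr, sr),
            cfg['w_feature'] * fea_loss_fn(hr, sr),
            cfg['w_gan'] * gen_loss_fn(hr_output, sr_output)])
    grads_G = tape_G.gradient(total_loss_G, generator.trainable_variables)
    optimizer_G.apply_gradients(zip(grads_G, generator.trainable_variables))

    # Discriminator update: stop_gradient treats sr as fixed data, so G
    # is untouched here as well.
    with tf.GradientTape() as tape_D:
        sr = tf.stop_gradient(generator(lr, training=True))
        hr_output = discriminator(hr, training=True)
        sr_output = discriminator(sr, training=True)
        total_loss_D = tf.add_n([
            tf.reduce_sum(discriminator.losses),
            dis_loss_fn(hr_output, sr_output)])
    grads_D = tape_D.gradient(total_loss_D, discriminator.trainable_variables)
    optimizer_D.apply_gradients(zip(grads_D, discriminator.trainable_variables))

    return total_loss_G, total_loss_D
```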

output SR images saved in comparison mode.

I am able to test images, but only in comparison mode, where the bicubic, ESRGAN, and HR outputs are stitched together into one image.
I am not able to get a single up-scaled image for an LR input.
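
What I am after is something like this (a sketch; it assumes the repo's /255 normalization, `generator` stands for the restored model, and the file names are hypothetical):

```python
import cv2
import numpy as np

# Sketch: run the restored generator on one LR image and save only the
# SR result, without the bicubic/HR stitching. File names are hypothetical.
lr_img = cv2.imread('input_lr.png').astype(np.float32) / 255.0
sr = generator(lr_img[np.newaxis, ...], training=False)[0].numpy()
sr_img = (np.clip(sr, 0.0, 1.0) * 255.0).astype(np.uint8)
cv2.imwrite('output_sr.png', sr_img)
```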

Speed

I have an SRResNet model, and it is about 6x faster than ESRGAN.
How can I make the network faster, accepting some loss in quality?
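
One option I am considering is shrinking the generator through the existing config fields, trading quality for speed (a sketch; the exact values would need experimenting):

```yaml
network_G:
    nf: 32   # feature channels, down from the default 64
    nb: 10   # number of RRDB blocks, down from the default 23
```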

Quality

After full training, my results are not good. An SRGAN trained by a colleague produces better output than the ESRGAN I trained. Why do I see artifacts in the images? What should I change?

Transfer learning

Hello @peteryuX,
It's nice work, and the conversion from PyTorch to TensorFlow is much appreciated. I have a question: can we perform transfer learning with this ESRGAN model on a custom dataset, to work around training issues?
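
Concretely, I was imagining something like restoring the released weights and then fine-tuning on the custom data. A minimal sketch (the checkpoint path is hypothetical, and `generator` stands for the RRDB model built from the repo's config):

```python
import tensorflow as tf

# Sketch: warm-start from an existing ESRGAN checkpoint, then fine-tune.
# The checkpoint directory is hypothetical; `generator` is assumed to be
# the already-built RRDB generator model.
ckpt = tf.train.Checkpoint(model=generator)
status = ckpt.restore(tf.train.latest_checkpoint('./checkpoints/esrgan'))
status.expect_partial()  # optimizer slots etc. may not be present/needed
# ...then run the usual training loop on the custom (lr, hr) dataset.
```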
Thanks in advance.

NOT an issue: it works amazingly well, and using Spectral Normalization makes it converge VERY fast and nicely!

Your code works amazingly well, and using Spectral Normalization makes it converge VERY fast and nicely!

That is all. I changed two things: (1) added Spectral Normalization to the Conv2D layers of the discriminator network (except the first Conv2D layer), and (2) replaced VGG19 with VGGFace for my problem of up-scaling faces that are not very clear, for better face recognition / triplet loss.
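
For reference, the wrapping can look roughly like this (a sketch using TensorFlow Addons, not my exact code; the layer hyper-parameters are placeholders):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Sketch: a spectrally-normalized Conv2D for the discriminator blocks.
# The very first Conv2D of the discriminator is left un-normalized.
def sn_conv(filters, kernel_size=3, strides=1):
    return tfa.layers.SpectralNormalization(
        tf.keras.layers.Conv2D(filters, kernel_size, strides=strides,
                               padding='same', use_bias=False))
```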

Thanks mate.
Steve

