
Comments (4)

lifelongeek commented on May 30, 2024

Oops, the paper already addresses Question 1:

"However, we do not claim that this is a new method to quantitatively evaluate
generative models yet. The constant scaling factor that depends on the critic’s
architecture means it’s hard to compare models with different critics."
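The caveat quoted here (the unknown constant scaling factor) is easy to see in a toy example: rescaling the critic rescales the loss without changing anything qualitative, so loss values are only comparable between runs that share the same critic. A minimal sketch, with made-up data and critics purely for illustration:

```python
# Toy illustration of the quoted caveat: the critic loss estimates the
# Wasserstein distance only up to a constant that depends on the critic.
real = [2.1, 1.8, 2.4]   # made-up "real" samples
fake = [0.1, -0.2, 0.3]  # made-up "fake" samples

def loss(f):
    # Critic loss surrogate: E[f(real)] - E[f(fake)]
    return sum(map(f, real)) / len(real) - sum(map(f, fake)) / len(fake)

f1 = lambda x: x        # a 1-Lipschitz critic
f2 = lambda x: 10 * x   # same shape, 10-Lipschitz: a "bigger" critic
print(loss(f2) / loss(f1))  # same ranking of models, ~10x the scale
```

Both critics agree on which distribution is closer to the real one; only the scale of the reported number changes.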

from wassersteingan.

gwern commented on May 30, 2024

Question 2. Could you share which hyperparameters WGAN is sensitive to? For example: weight initialization scale (0.02), clamping threshold (0.01), batch size (64), model size, #D/G steps (5), lr (0.00005)

I can comment a bit on this: @FeepingCreature and I have been trying out WGAN for modeling anime images (64px cropped faces specifically, because attempts at larger images and more diverse datasets failed totally even with increased learning rates/discriminator steps).

So far we've found that batch size doesn't seem especially important; model size and image size are very important (64px works great but 128px struggles to get anywhere, and we've had better results enlarging the model while keeping it at 64px); learning rate is important, and higher than the defaults doesn't seem to work well; and #D/G steps, or --Diters, can be useful to tweak and definitely must be increased if the learning rate is increased.

We haven't tried changing the weight initialization or the clamping, but we have tried adding 4 fully connected layers to the generator (between the latent z vector input and the convolutional layers) and to the discriminator (at the top, before the final state output) to try to encourage more global coherency. This currently seems very promising, but we haven't run any of the FC models to convergence yet, so it may not wind up helping.

The Loss_D is reasonably helpful but hasn't turned out to be a panacea: there are long stretches where it bounces up and down despite the apparent image quality increasing. Overfitting has not been a problem so far, and expanding our face dataset, cleaning out non-faces (using a modified version of main.py to score image files with the discriminator and find & delete non-faces), and aggressive data augmentation have not helped: the WGANs heavily underfit.
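The hyperparameters under discussion slot into the WGAN critic update roughly as follows. This is a toy, pure-Python 1-D sketch: the linear critic and Gaussian "data" are my own stand-ins, not the repo's DCGAN, and it exists only to show where the clamp, lr, Diters, and batch-size values enter:

```python
import random

random.seed(0)

# WGAN-paper defaults discussed in the thread; the 1-D linear "critic"
# and Gaussian toy data below are illustrative, not the repo's code.
CLAMP = 0.01     # clamping threshold for critic weights
LR = 0.00005     # learning rate (the repo uses RMSProp; plain SGD here)
D_ITERS = 5      # critic updates per generator update (#D/G steps)
BATCH = 64       # batch size

w, b = 0.0, 0.0  # toy critic f(x) = w*x + b

def sample_real():
    return [random.gauss(2.0, 1.0) for _ in range(BATCH)]

def sample_fake():
    return [random.gauss(0.0, 1.0) for _ in range(BATCH)]

for _ in range(D_ITERS):
    real, fake = sample_real(), sample_fake()
    # The critic ascends E[f(real)] - E[f(fake)]; for f(x) = w*x + b the
    # gradient w.r.t. w is mean(real) - mean(fake) (and 0 w.r.t. b).
    grad_w = sum(real) / BATCH - sum(fake) / BATCH
    w += LR * grad_w
    # Weight clipping: the crude Lipschitz constraint the paper relies on.
    w = max(-CLAMP, min(CLAMP, w))
    b = max(-CLAMP, min(CLAMP, b))

# Loss_D as discussed above is (up to sign) this surrogate distance:
wd_estimate = (sum(w * x + b for x in sample_real())
               - sum(w * x + b for x in sample_fake())) / BATCH
```

Raising LR without also raising D_ITERS means the clipped critic is further from optimal at each generator step, which matches the observation that the two must be increased together.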

Personally, I'm still wondering what it will take to get unsupervised GANs to generate really diverse scenes on the level of StackGAN. I thought regular DCGANs might be able to, except that they diverged before learning; and while, very impressively, none of my WGANs have diverged (they just plateau and stop learning), they're still limited to highly homogeneous image sets.


LukasMosser commented on May 30, 2024

@gmkim90 I also have some experience here to share:
Increasing the learning rate required me to also raise the Diters parameter, from 5 to 20.
That produced a curve somewhat similar to the one shown in the paper, but flattening at a much higher loss.

I have also observed improvements in image quality with no decrease in the discriminator loss, though I am still checking whether I need to increase Diters further.

Leaving the learning rate and Diters at their defaults never lowers my loss for the datasets I'm running (possibly due to not enough Diters).
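One thing worth checking on the Diters point: if I recall correctly, the reference main.py already boosts the critic-iteration count early in training and periodically thereafter, roughly as sketched below (the exact thresholds are from memory, so verify against the repo before relying on them):

```python
def critic_iters(gen_iterations, base_diters=5):
    # Train the critic extra hard at the start and periodically afterwards,
    # as (if I recall correctly) the reference WGAN main.py does.
    # Thresholds (25, 500, 100) are from memory; check the repo.
    if gen_iterations < 25 or gen_iterations % 500 == 0:
        return 100
    return base_diters

print(critic_iters(0), critic_iters(100), critic_iters(1000))  # 100 5 100
```

So a flat loss curve at default settings may partly reflect that the critic is only heavily trained in these bursts, not every generator step.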


Kaede93 commented on May 30, 2024

I have similar questions, but mine are about the Wasserstein distance itself.

I trained WGAN, WGAN-GP, and WGAN-DIV on the CelebA dataset with a DCGAN architecture at 64×64 image size (with the default hyperparameters recommended in each paper).

  1. The WD estimate in WGAN stays in the range [0, 2], while in WGAN-GP and WGAN-DIV it can reach hundreds at the beginning of training and converges to the range [0, 10]. The images generated by WGAN-DIV are much better than WGAN's (using the same noise), so why is the "distance" between fake and real so much higher in WGAN-DIV?

  2. I replaced the LSGAN loss in the CycleGAN model with the WGAN loss; the WD is extremely small (about 1e-4) at the beginning of training and then goes to NaN (does that mean the gradients vanished?). So I am wondering whether the WGAN loss works in GANs other than DCGAN.
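One hedged note on both points: weight clipping interacts badly with some architectures (which is the WGAN-GP paper's motivation, and one common cause of vanishing gradients or NaNs like those described above), and the gradient penalty is the usual replacement. A toy 1-D sketch of why the penalty pins the critic near 1-Lipschitz rather than at a clamp boundary; everything here is illustrative and is not the CycleGAN code in question:

```python
# Toy 1-D critic f(x) = w*x, so its gradient w.r.t. x is just w.
# WGAN-GP penalizes (|grad| - 1)^2 instead of clipping weights.
LAMBDA = 10.0  # gradient-penalty weight suggested in the WGAN-GP paper

def gp_loss(w, real_mean, fake_mean):
    wasserstein = w * real_mean - w * fake_mean
    penalty = LAMBDA * (abs(w) - 1.0) ** 2
    return -wasserstein + penalty  # the critic minimizes this

# Crude scan over w: the penalized optimum sits near |w| = 1,
# not jammed against a tiny clamp value like 0.01.
best_w = min((w / 100 for w in range(-300, 301)),
             key=lambda w: gp_loss(w, real_mean=2.0, fake_mean=0.0))
print(best_w)
```

Because the penalized critic is not confined to tiny weights, its outputs (and hence the reported loss) live on a much larger scale than a clipped WGAN critic's, which is consistent with the scale gap described in point 1.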

Please contact me if you have any advice. Thank you!

