
Comments (4)

lifelongeek commented on May 30, 2024

Oops, the paper already addresses Question 1:

"However, we do not claim that this is a new method to quantitatively evaluate
generative models yet. The constant scaling factor that depends on the critic’s
architecture means it’s hard to compare models with different critics."
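The caveat quoted here (the unknown constant scaling factor) is easy to see in a toy example: rescaling the critic rescales the loss without changing anything qualitative, so loss values are only comparable between runs that share the same critic. A minimal sketch, with made-up data and critics purely for illustration:

```python
# Toy illustration of the quoted caveat: the critic loss estimates the
# Wasserstein distance only up to a constant that depends on the critic.
real = [2.1, 1.8, 2.4]   # made-up "real" samples
fake = [0.1, -0.2, 0.3]  # made-up "fake" samples

def loss(f):
    # Critic loss surrogate: E[f(real)] - E[f(fake)]
    return sum(map(f, real)) / len(real) - sum(map(f, fake)) / len(fake)

f1 = lambda x: x        # a 1-Lipschitz critic
f2 = lambda x: 10 * x   # same shape, 10-Lipschitz: a "bigger" critic
print(loss(f2) / loss(f1))  # same ranking of models, ~10x the scale
```

Both critics agree on which distribution is closer to the real one; only the scale of the reported number changes.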

from wassersteingan.

gwern commented on May 30, 2024

Question 2. Could you share which hyperparameters WGAN is sensitive to? For example: weight initialization scale (0.02), clamping threshold (0.01), batch size (64), model size, #D/G steps (5), lr (0.00005)

I can comment a bit on this: @FeepingCreature and I have been trying out WGAN for modeling anime images (64px cropped faces specifically, because attempts at larger images and more diverse datasets failed totally even with increased learning rates/discriminator steps).

So far we've found that batch size doesn't seem especially important; model size and image size are very important (64px works great but 128px struggles to get anywhere, and we've had better results enlarging the model while keeping it at 64px); learning rate is important, and higher than the defaults doesn't seem to work well; and #D/G steps, or --Diters, can be useful to tweak and definitely must be increased if the learning rate is increased.

We haven't tried changing the weight initialization or the clamping, but we have tried adding 4 fully connected layers to the generator (between the latent z vector input and the convolutional layers) and to the discriminator (at the top, before the final state output) to try to encourage more global coherency. This currently seems very promising, but we haven't run any of the FC models to convergence yet, so it may not wind up helping.

The Loss_D is reasonably helpful but hasn't turned out to be a panacea: there are long stretches where it bounces up and down despite the apparent image quality increasing. Overfitting has not been a problem so far, and expanding our face dataset, cleaning out non-faces (using a modified version of main.py to score image files with the discriminator and find & delete non-faces), and aggressive data augmentation have not helped: the WGANs heavily underfit.
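The hyperparameters under discussion slot into the WGAN critic update roughly as follows. This is a toy, pure-Python 1-D sketch: the linear critic and Gaussian "data" are my own stand-ins, not the repo's DCGAN, and it exists only to show where the clamp, lr, Diters, and batch-size values enter:

```python
import random

random.seed(0)

# WGAN-paper defaults discussed in the thread; the 1-D linear "critic"
# and Gaussian toy data below are illustrative, not the repo's code.
CLAMP = 0.01     # clamping threshold for critic weights
LR = 0.00005     # learning rate (the repo uses RMSProp; plain SGD here)
D_ITERS = 5      # critic updates per generator update (#D/G steps)
BATCH = 64       # batch size

w, b = 0.0, 0.0  # toy critic f(x) = w*x + b

def sample_real():
    return [random.gauss(2.0, 1.0) for _ in range(BATCH)]

def sample_fake():
    return [random.gauss(0.0, 1.0) for _ in range(BATCH)]

for _ in range(D_ITERS):
    real, fake = sample_real(), sample_fake()
    # The critic ascends E[f(real)] - E[f(fake)]; for f(x) = w*x + b the
    # gradient w.r.t. w is mean(real) - mean(fake) (and 0 w.r.t. b).
    grad_w = sum(real) / BATCH - sum(fake) / BATCH
    w += LR * grad_w
    # Weight clipping: the crude Lipschitz constraint the paper relies on.
    w = max(-CLAMP, min(CLAMP, w))
    b = max(-CLAMP, min(CLAMP, b))

# Loss_D as discussed above is (up to sign) this surrogate distance:
wd_estimate = (sum(w * x + b for x in sample_real())
               - sum(w * x + b for x in sample_fake())) / BATCH
```

Raising LR without also raising D_ITERS means the clipped critic is further from optimal at each generator step, which matches the observation that the two must be increased together.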

Personally, I'm still wondering what it will take to get unsupervised GANs to generate really diverse scenes on the level of StackGAN. I thought regular DCGANs might be able to, except that they diverged before learning; and while, very impressively, none of my WGANs have diverged (they just plateau and stop learning), they're still limited to highly homogeneous image sets.


LukasMosser commented on May 30, 2024

@gmkim90 I also have some experience here to share:
Increasing the learning rate required me to also raise the Diters parameter, from 5 to 20.
That produced a curve somewhat similar to the one shown in the paper, but flattening at a much higher loss.

I have also observed improvements in image quality with no decrease in the discriminator loss, though I am still checking whether I need to increase Diters further.

Leaving the learning rate and Diters at their defaults never lowers my loss for the datasets I'm running (possibly due to not enough Diters).
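One thing worth checking on the Diters point: if I recall correctly, the reference main.py already boosts the critic-iteration count early in training and periodically thereafter, roughly as sketched below (the exact thresholds are from memory, so verify against the repo before relying on them):

```python
def critic_iters(gen_iterations, base_diters=5):
    # Train the critic extra hard at the start and periodically afterwards,
    # as (if I recall correctly) the reference WGAN main.py does.
    # Thresholds (25, 500, 100) are from memory; check the repo.
    if gen_iterations < 25 or gen_iterations % 500 == 0:
        return 100
    return base_diters

print(critic_iters(0), critic_iters(100), critic_iters(1000))  # 100 5 100
```

So a flat loss curve at default settings may partly reflect that the critic is only heavily trained in these bursts, not every generator step.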


Kaede93 commented on May 30, 2024

I have similar questions, but mine are about the Wasserstein distance itself.

I trained WGAN, WGAN-GP, and WGAN-DIV on the CelebA dataset with a DCGAN architecture at 64×64 image size (with the default hyperparameters recommended in each paper).

  1. The WD estimate in WGAN stays in the range [0, 2], while in WGAN-GP and WGAN-DIV it can reach hundreds at the beginning of training and converges to the range [0, 10]. The images generated by WGAN-DIV are much better than WGAN's (using the same noise), so why is the "distance" between fake and real so much higher in WGAN-DIV?

  2. I replaced the LSGAN loss in the CycleGAN model with the WGAN loss; the WD is extremely small (about 1e-4) at the beginning of training and then goes to NaN (does that mean the gradients vanished?). So I am wondering whether the WGAN loss works in GANs other than DCGAN.
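One hedged note on both points: weight clipping interacts badly with some architectures (which is the WGAN-GP paper's motivation, and one common cause of vanishing gradients or NaNs like those described above), and the gradient penalty is the usual replacement. A toy 1-D sketch of why the penalty pins the critic near 1-Lipschitz rather than at a clamp boundary; everything here is illustrative and is not the CycleGAN code in question:

```python
# Toy 1-D critic f(x) = w*x, so its gradient w.r.t. x is just w.
# WGAN-GP penalizes (|grad| - 1)^2 instead of clipping weights.
LAMBDA = 10.0  # gradient-penalty weight suggested in the WGAN-GP paper

def gp_loss(w, real_mean, fake_mean):
    wasserstein = w * real_mean - w * fake_mean
    penalty = LAMBDA * (abs(w) - 1.0) ** 2
    return -wasserstein + penalty  # the critic minimizes this

# Crude scan over w: the penalized optimum sits near |w| = 1,
# not jammed against a tiny clamp value like 0.01.
best_w = min((w / 100 for w in range(-300, 301)),
             key=lambda w: gp_loss(w, real_mean=2.0, fake_mean=0.0))
print(best_w)
```

Because the penalized critic is not confined to tiny weights, its outputs (and hence the reported loss) live on a much larger scale than a clipped WGAN critic's, which is consistent with the scale gap described in point 1.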

Please contact me if you have any advice. Thank you!

