Comments (4)
Oops, the paper already addresses Question 1:
"However, we do not claim that this is a new method to quantitatively evaluate
generative models yet. The constant scaling factor that depends on the critic’s
architecture means it’s hard to compare models with different critics."
from wassersteingan.
Question 2: Could you share which hyperparameters WGAN is sensitive to? For example: weight initialization scale (0.02), clamping threshold (0.01), batch size (64), model size, #D/G steps (5), learning rate (0.00005).
I can comment a bit on this: @FeepingCreature and I have been trying out WGAN for modeling anime images (64px cropped faces specifically, because attempts at larger images and more diverse datasets failed totally even with increased learning rates/discriminator steps). So far we've found:

- Batch size doesn't seem especially important.
- Model size and image size are very important: 64px works great, but 128px struggles to get anywhere; we've had better results enlarging the model while keeping it at 64px.
- Learning rate is important, and higher than the default doesn't seem to work well.
- The #D/G steps setting (`--Diters`) can be useful to tweak, and definitely must be increased if the learning rate is increased.

We haven't tried changing the weight initialization or the clamping, but we have tried adding 4 fully connected layers to the generator (between the latent z vector input and the convolutional layers) and to the discriminator (at the top, before the final output) to encourage more global coherency. This currently seems very promising, but we haven't run any of the FC models to convergence yet, so maybe it won't wind up helping. The Loss_D curve is reasonably helpful but hasn't turned out to be a panacea: there are long stretches where it bounces up and down despite the apparent image quality increasing. Overfitting has not been a problem so far, and expanding our face dataset, cleaning out non-faces (using a modified version of main.py to score image files with the discriminator and delete non-faces), and aggressive data augmentation have not helped: the WGANs heavily underfit.
Personally, I'm still wondering what it'll take to get unsupervised GANs to generate really diverse scenes on the level of StackGAN. I thought perhaps regular DCGANs could do it, except that they diverged before learning; but while, very impressively, none of my WGANs have diverged (they just plateaued and stopped learning), they're still limited to highly homogeneous image sets.
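For readers less familiar with the training loop being tuned here, a minimal sketch of the WGAN critic update, assuming the paper's defaults from Question 2 (clamping threshold 0.01, lr 5e-5, Diters 5, batch size 64). The "critic" is a toy linear scorer on 1-D data purely for illustration; the actual repo uses a DCGAN critic, so treat this as structure, not the implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.5, 0.0])          # toy critic parameters [slope, bias]
c, lr, diters = 0.01, 5e-5, 5     # clamp, learning rate, #D/G steps

def critic(x, w):
    return w[0] * x + w[1]        # unbounded score: no sigmoid in WGAN

for _ in range(diters):           # Diters critic steps per generator step
    real = rng.normal(loc=1.0, size=64)
    fake = rng.normal(loc=0.0, size=64)
    # The critic maximizes E[D(real)] - E[D(fake)] (gradient ascent on w).
    w[0] += lr * (real.mean() - fake.mean())
    # Weight clipping is the paper's (crude) Lipschitz constraint.
    w = np.clip(w, -c, c)

# The critic objective value doubles as the Wasserstein distance estimate.
wd_estimate = critic(real, w).mean() - critic(fake, w).mean()
```

Note how the clamp caps every parameter at 0.01: this is why the clamping threshold, Diters, and learning rate interact so strongly when tuned together.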
@gmkim90 I also have some experience here to share:
Increasing the learning rate required me to also raise the Diters parameter from 5 to 20.
That produced a somewhat similar curve to the one shown in the paper, but it flattened out at a much higher loss.
I have also observed improvements in image quality with no decrease in the discriminator loss, although I am still checking whether I need to increase Diters further.
Leaving the learning rate and Diters at their defaults never lowers my loss on the datasets I'm running (possibly due to not enough Diters).
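The lr/Diters coupling reported in these comments can be written as a simple heuristic. The proportional factor `k = 4` below is my assumption, chosen only so Diters lands at the commenter's value of 20; nothing in the thread states the exact scaling rule.

```python
# Heuristic from the comments above: raising the learning rate by a factor
# k seems to require raising Diters roughly in proportion. k = 4 is a
# hypothetical multiplier, not a value reported in the thread.
base_lr, base_diters = 5e-5, 5    # WGAN paper defaults
k = 4
lr = base_lr * k                  # higher learning rate
diters = base_diters * k          # 20, the value reported to work
```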
I have similar questions, but mine concern the Wasserstein distance itself.
I trained WGAN, WGAN-GP, and WGAN-DIV on the CelebA dataset with a DCGAN architecture at 64*64 image size (with the default hyperparameters recommended in each paper).
-
The WD estimate in WGAN stays in the range [0, 2], while WGAN-GP and WGAN-DIV can reach hundreds at the beginning of training and converge to the range [0, 10]. The images generated by WGAN-DIV are much better than those from WGAN (using the same noise), so why is the "distance" between fake and real so much higher in WGAN-DIV?
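One plausible explanation, following the paper quote at the top of this thread rather than anything established here: the critic only estimates the Wasserstein distance up to a constant scaling factor K that depends on how the Lipschitz constraint is enforced. Weight clipping at 0.01 caps the critic's slope far below the unit gradient norm that a gradient penalty targets, so GP/DIV losses live on a much larger scale without the samples being worse. A toy 1-D illustration:

```python
import numpy as np

# Same real/fake gap, two different Lipschitz constraints on a linear critic.
real = np.full(64, 1.0)
fake = np.full(64, 0.0)

slope_clipped = 0.01   # max slope with weights clamped to [-0.01, 0.01]
slope_gp = 1.0         # gradient penalty drives the gradient norm toward 1

wd_clipped = slope_clipped * (real.mean() - fake.mean())
wd_gp = slope_gp * (real.mean() - fake.mean())
# Both estimate K * W(P_r, P_g), but with K differing by ~100x here.
```

So the raw loss values are not comparable across WGAN variants; only trends within one run are meaningful.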
-
I replaced the LSGAN loss in the CycleGAN model with the WGAN loss; the WD is extremely small (about 1e-4) at the beginning of training and then goes to NaN (does that mean the gradients vanished?). So I am wondering whether the WGAN loss works in GANs other than DCGAN.
Please contact me if you have any advice, thank you!
Related Issues (20)
- After training the model, how do I generate test samples using the generator?
- Where can I find the BibTeX for Wasserstein GAN and related works?
- Why pass a tensor of 1 or -1 to loss.backward()?
- Problems with the optimization of the loss.
- CIFAR-10 results not as good as expected
- How to train on a 256*128 image dataset and output 256*128 results?
- Results on CIFAR-10 are very bad even after training for over 1000 epochs
- The parameter 'db_path' of the LSUN setting in main.py should be changed to 'root'
- Results cannot be reproduced.
- Module name can't contain "."
- Inconsistent loss function from the paper?
- No convergence on my own dataset
- No sigmoid activation for G in the MLP?
- Should the gamma and beta of the batch normalization layers be clipped?
- Some problems when running WassersteinGAN
- Interpreting generator and critic loss
- How can I use the loss as a stopping criterion in Wasserstein GAN?
- Why isn't the label given to the discriminator?
- Generator update
- I cannot find the calculation or estimation of the Wasserstein distance!