martinarjovsky / WassersteinGAN
License: BSD 3-Clause "New" or "Revised" License
Dear authors,
I used the default parameters and trained on the CIFAR-10 dataset for over 1000 epochs, but the results do not look very good. The images are blurry and the categories are hardly recognizable. Did you use a different set of parameters for CIFAR-10?
Looking forward to your reply. Thanks.
How do I run WassersteinGAN on my own data? Can you give some advice? Thanks.
The Wasserstein GAN I'm training is presenting mode collapse behavior. There are 5 conditions and the samples associated with each condition each have one single mode.
The critic loss graph shows wild oscillations.
Parameters and an example of the generated text after 10 epochs, where each epoch has 100*128 examples, are below.
Namespace(boolean=0, bs=128, clip=0.01, epoch_size=100, lr=5e-05, n_epochs=100)
Dataset shape (129822,)
Building model and compiling functions...
('Generator output:', (None, 1, 128, 128))
('Critic output:', (None, 1))
Suggestions are very welcome!
0, (aueldablecarbld damubkeckecolait astir thin in bowpbor siry le ty therandurcing day anat yale beain ghckvincqundg"bdxk'ntqxw8v'
1, bueldeblecarbcdsdamuqfackeckbalt astar than in tiwpgor sury ye th thetandurting kellanat lale beain ghapvincquod)"bdak'nthxw�v'
2, bueldafredawuckodelreficha kbalv astar ;hathin o0 wor cure ey rh th stodeutine sellavet lale eain chapvinckunvewb'az'xthg\�d�
3, (bueldablecarbcdndamuqlabkecobalv artar , an in tin bor ciry ef rh thettndluting dellavat lale eain jhapvincquid)wb'av'nthxlDy'
4, bumldeblecawbcdsdaluqfackackbalt astar that in tow gor cure ey rh thestndurting kellanet lale eain ghapvinckuod)"b'sz'xthxw(y�
5, bueldabrecawuclodeldadbchanksaltursdarithit os tmn #39;s re lerperyo raouetcane key tne. <ate begad ghakmfgelunc)"bdayex hgr8v'
6, raiedlibldrisblx grjucambngcoln-tursdiait in os worg ba sicr he pe co ma-iuld berday tia. .ate began ghcuzid qunkg"zlxk'n wejxvd
7, bueldifredawuckopelredicha qualt rsdarithathon ton toxZs(xelyerre to saouestene sel ane? late eaid chapadgelunve(bdaz'x hxr8v�
8, pbceldefrldrwuqlefetdadibh, qual; rsdar tsathis, or wor ct wl f re to stouluteve sevltve? late eaid chapadgckjnve(bzaz'x hg\�d�
9, bumldabkecawbcdsdalumfichacksait astarithin in tonpgor s rellertertherandetcing key anet <ate beain ghakmidclund)"bdak'xtqxr8v'
10, rauedjabldrispld dajucllongcolait artir t an in win bor sicg lf th themandlrning day aiat yale beain jhckvincqvidg"zlxk'n wxwDv'
Hey,
I noticed that the dcgan generator ends with a Tanh non-linearity whereas the mlp has no non-linearity at the end. Is this done on purpose? Am I missing something?
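A minimal sketch of the observation (toy layer sizes, not the repo's exact architecture): a Tanh tail bounds generator outputs to (-1, 1), matching images normalized with mean 0.5 / std 0.5, while a plain Linear tail is unbounded.

```python
import torch
import torch.nn as nn

# Hypothetical tail layers, not the repo's exact models: the DCGAN generator
# ends with Tanh, bounding samples to (-1, 1) to match images normalized by
# transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)).
dcgan_tail = nn.Sequential(nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())
mlp_tail = nn.Linear(512, 3 * 64 * 64)  # plain Linear: unbounded output

z = torch.randn(2, 64, 8, 8)
out = dcgan_tail(z)                     # every value lies in (-1, 1)
```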
Thanks for sharing easy-to-follow code.
I am currently applying WGAN to learning text distribution.
Here are my questions regarding WGAN.
Question 1. In Figure 3, the losses of the MLP and DCGAN seem comparable. However, I think the scale of the loss can vary depending on the weight initialization scale and the model size. (Please correct me if I am wrong.) In that case, what would be the right way to compare the learning results of two different models?
Question 2. Could you share which hyperparameters WGAN is sensitive to?
For example: weight initialization scale (0.02), clamping threshold (0.01), batch size (64), model size, #D/G steps (5), lr (0.00005).
Thank you
In my experiments, WGAN on CelebA generates lower-quality images after 100000 iterations compared to DCGAN.
I don't know why.
How do I set up the [lsun-train-folder] for python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda? Thank you!
What happens if we get rid of the mone and one in the .backward() calls for the losses? What are they for?
When I was trying to train WGAN on ImageNet, I got this error.
[0/25][1/5631] Loss_D: -1.322320 Loss_G: 0.666666 Loss_D_real: -0.665913 Loss_D_fake 0.656407
[0/25][2/5631] Loss_D: -1.471963 Loss_G: 0.730293 Loss_D_real: -0.750265 Loss_D_fake 0.721698
[0/25][3/5631] Loss_D: -1.504353 Loss_G: 0.745277 Loss_D_real: -0.767033 Loss_D_fake 0.737320
[0/25][4/5631] Loss_D: -1.522664 Loss_G: 0.754508 Loss_D_real: -0.776120 Loss_D_fake 0.746544
[0/25][5/5631] Loss_D: -1.534124 Loss_G: 0.758739 Loss_D_real: -0.782926 Loss_D_fake 0.751198
[0/25][6/5631] Loss_D: -1.539814 Loss_G: 0.761637 Loss_D_real: -0.785868 Loss_D_fake 0.753947
[0/25][7/5631] Loss_D: -1.541982 Loss_G: 0.763071 Loss_D_real: -0.786177 Loss_D_fake 0.755805
[0/25][8/5631] Loss_D: -1.544353 Loss_G: 0.764505 Loss_D_real: -0.787565 Loss_D_fake 0.756788
[0/25][9/5631] Loss_D: -1.545964 Loss_G: 0.764762 Loss_D_real: -0.788965 Loss_D_fake 0.756999
[0/25][10/5631] Loss_D: -1.547822 Loss_G: 0.765548 Loss_D_real: -0.789867 Loss_D_fake 0.757955
[0/25][11/5631] Loss_D: -1.542095 Loss_G: 0.765476 Loss_D_real: -0.784480 Loss_D_fake 0.757615
[0/25][12/5631] Loss_D: -1.547668 Loss_G: 0.766164 Loss_D_real: -0.789187 Loss_D_fake 0.758481
[0/25][13/5631] Loss_D: -1.548513 Loss_G: 0.766813 Loss_D_real: -0.789454 Loss_D_fake 0.759059
[0/25][14/5631] Loss_D: -1.549405 Loss_G: 0.767250 Loss_D_real: -0.790068 Loss_D_fake 0.759336
[0/25][15/5631] Loss_D: -1.548578 Loss_G: 0.767710 Loss_D_real: -0.788655 Loss_D_fake 0.759923
IOError: image file is truncated
Traceback (most recent call last):
File "main.py", line 178, in
data = data_iter.next()
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 162, in next
idx, batch = self.data_queue.get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 378, in get
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
return recv()
File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/queue.py", line 21, in recv
buf = self.recv_bytes()
First of all thank you for the great article and for the code!
I was looking through the code and found that you do not use bias in the convolutional layers. Does this have an influence on the algorithm? I tried to find the reason in the paper, but I might have missed it.
Best Regards!
My name is Wangchao, nice to meet you.
I ran https://github.com/martinarjovsky/WassersteinGAN on my machine and found a problem:
[jianglin@JULYEDU-GPU1 WassersteinGAN-master]$ python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda
Namespace(Diters=5, adam=False, batchSize=64, beta1=0.5, clamp_lower=-0.01, clamp_upper=0.01, cuda=True, dataroot='[lsun-train-folder]', dataset='lsun', experiment=None, imageSize=64, lrD=5e-05, lrG=5e-05, mlp_D=False, mlp_G=False, n_extra_layers=0, nc=3, ndf=64, netD='', netG='', ngf=64, ngpu=1, niter=25, noBN=False, nz=100, workers=2)
mkdir: cannot create directory ‘samples’: File exists
Random Seed: 2745
Traceback (most recent call last):
File "main.py", line 78, in
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
File "/usr/lib/python2.7/site-packages/torchvision/datasets/lsun.py", line 99, in __init__
transform=transform))
File "/usr/lib/python2.7/site-packages/torchvision/datasets/lsun.py", line 20, in __init__
readahead=False, meminit=False)
lmdb.Error: [lsun-train-folder]/bedroom_train_lmdb: No such file or directory
Please give me some advice.
I believe that the parameter clamping also reduces the batchnorm scaling factor to near zero, when (as far as I understand) it should stay near 1 (where it was initialized).
# clamp parameters to a cube
for p in netD.parameters():
    p.data.clamp_(opt.clamp_lower, opt.clamp_upper)
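One way to test this hypothesis is to exclude BatchNorm parameters from the clamp. A sketch under the assumption that the critic is an ordinary nn.Module (the bounds below are the repo's default ±0.01; the toy network is mine, not DCGAN_D):

```python
import torch
import torch.nn as nn

# Toy critic standing in for DCGAN_D: a conv layer followed by BatchNorm.
netD = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2, inplace=True),
)

# Clamp every parameter except those of BatchNorm layers, so the BN scale
# (gamma) stays near its initialization of 1 instead of collapsing to 0.01.
for m in netD.modules():
    if isinstance(m, nn.BatchNorm2d):
        continue                        # leave gamma/beta untouched
    for p in m.parameters(recurse=False):
        p.data.clamp_(-0.01, 0.01)
```

Whether skipping BN parameters helps training is exactly the open question raised here; the sketch only shows how to try it.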
Do you have hyperparameters for bigger batchSize, like 256 and 512?
When I trained my WGAN using the DCGAN topology on the MNIST dataset, I observed that at the very beginning G's loss decreased to near zero and stayed very small, but D's loss did not decrease even after 20000 iterations of training. What does that mean? Please advise.
=============
Reading other people's comments, it seems that the G loss does not carry much meaning. But if I remove the training of the generator and train only the discriminator:
Iteration 0 complete. Discriminator avg loss: 6.67944732413e-06 Generator avg loss: 0
Iteration 100 complete. Discriminator avg loss: 1082.13891602 Generator avg loss: 0
Iteration 200 complete. Discriminator avg loss: 993.882385254 Generator avg loss: 0
Iteration 300 complete. Discriminator avg loss: 1047.83435059 Generator avg loss: 0
Iteration 400 complete. Discriminator avg loss: 1081.64208984 Generator avg loss: 0
Iteration 500 complete. Discriminator avg loss: 1024.39331055 Generator avg loss: 0
Iteration 600 complete. Discriminator avg loss: 936.895996094 Generator avg loss: 0
Iteration 700 complete. Discriminator avg loss: 1028.81530762 Generator avg loss: 0
Iteration 800 complete. Discriminator avg loss: 1029.08447266 Generator avg loss: 0
Iteration 900 complete. Discriminator avg loss: 1047.0847168 Generator avg loss: 0
Iteration 1000 complete. Discriminator avg loss: 995.261230469 Generator avg loss: 0
Iteration 1100 complete. Discriminator avg loss: 1048.86035156 Generator avg loss: 0
Iteration 1200 complete. Discriminator avg loss: 1040.19873047 Generator avg loss: 0
Iteration 1300 complete. Discriminator avg loss: 1020.10339355 Generator avg loss: 0
Iteration 1400 complete. Discriminator avg loss: 1075.15576172 Generator avg loss: 0
What does this mean?
Excellent work!!!
I want to implement WGAN in Caffe. I am confused by the gradient computation of the WGAN loss.
Could you give some details of the mathematical formulas?
Thank you!
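As a sketch of the math (my reading of the paper, not the authors' Caffe recipe): the critic loss is L = -(mean f(x_real) - mean f(x_fake)), so the gradient of L with respect to the critic outputs is just -1/N on real scores and +1/N on fake scores, and everything below that is ordinary backprop:

```python
import torch
import torch.nn as nn

# Stand-in critic; any network works since the loss is linear in its outputs.
f = nn.Linear(2, 1)
x_real = torch.randn(8, 2)
x_fake = torch.randn(8, 2)

# WGAN critic objective: maximize E[f(real)] - E[f(fake)],
# i.e. minimize its negative.
loss = -(f(x_real).mean() - f(x_fake).mean())
loss.backward()
# For a manual (layer-by-layer) implementation, the "top" gradient fed into
# backprop is -1/N for each real sample's score and +1/N for each fake one's.
```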
Thanks for your great work!
I'd like to implement a WGAN critic for Keras, but I'm new to Torch and having a hard time figuring out a couple of things in your code.
I want to double-check what nonlinearity you use for your critic output. I see that you use a linear output for MLP_D here, but I don't see it in DCGAN_D or DCGAN_D_nobn.
Why linear, if f_w is a distance metric? Why not relu?
Am I right that this is the loss for the critic? (y_true is 0/1 for generated/training samples)
def wgan_loss(y_true, y_pred):
    return K.mean(((-1) ** y_true) * y_pred)
What n_critic do you use? (Line 73 in f81eafd)
Same as mentioned in #62 and #64, the results on the CIFAR10 dataset cannot be reproduced!
I trained with the default settings for 100 epochs (which I think is enough), but the results still don't make sense.
Here are example fake images generated by the generator:
Has anybody reproduced the results on the CIFAR10 (or LSUN bedroom) dataset? If so, can you please share the training details?
Hi! @martinarjovsky
I have some questions about the loss curve. When I don't use the median filter operation, the loss curve still oscillates, which suggests the training is unstable. Am I right?
Please help me! Thank you!
I am confused about the "one" and "mone" used in backward(). The goal of the discriminator D is to maximize the output of netD for real data but minimize the output of netD for fake data (as I understand it), while backward() and optimizerD minimize a loss function. Therefore, I think errD_real.backward(one) should be errD_real.backward(mone), and errD_fake.backward(mone) should be errD_fake.backward(one). That is my opinion. Could you give some explanation? Thank you!
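For what it's worth, here is a minimal sketch of the PyTorch semantics involved (not a claim about the authors' intent): y.backward(g) accumulates the gradient of g * y, so backward(mone) flips the sign and turns a descent step into ascent on y.

```python
import torch

one = torch.tensor(1.0)
mone = one * -1

x = torch.tensor(2.0, requires_grad=True)
y = x * 3                              # stand-in for a scalar critic score

y.backward(one, retain_graph=True)     # accumulates d(+y)/dx = +3
y.backward(mone)                       # accumulates d(-y)/dx = -3
# The two calls cancel: backward(mone) amounts to gradient ascent on y
# when the optimizer afterwards performs descent.
```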
Hello, I'm very sorry to disturb you. I would like to submit a paper to arXiv.org, but it requires an endorsement, so I take the liberty of asking: could you help me confirm it? Here is the link; can you open it? Thank you very much.
Hi!
Thank you for your amazing work with this repository.
I am experimenting with unsupervised learning. I would like to use the G / D of the trained GAN as encoders (there are ways to use G via knowledge distillation into a simpler CNN, or papers like this one). I would like to build a model that can recognize 2 images of the same room.
The problem is that standard techniques (hashing, CNN => dimensionality reduction => distance, Siamese networks) will either find only very similar images / clustered images or image replicas. If the image is shot from a slightly different angle, these methods fail. Ideally I need to build an index of images by such variables as couches / TVs / carpets. It looks like DCGAN / WGAN learns something like this as a latent variable. If we have 100+ latent variables that represent this, they may work for different images.
I have a dataset of ~300-500k similar images that mostly resemble LSUN living rooms, but are shot in Russia. So far I have tried variations of hyperparameters similar to these (please refer to the PR to see what the new parameters mean):
python3 main.py \
--dataset fastfolder --dataroot ../data/data_for_wgan/all_imgs \
--imgList ../data/generator_imgs_wgan \
--ngf 64 --ndf 64 \
--workers 6 --batchSize 256 --imageSize 256 \
--nz 100 --niter 50 \
--cuda --ngpu 2 \
--n_extra_layers 0 \
--experiment flat_dataset_224_0extra \
--adam \
--tensorboard_images --tensorboard
So far I have not achieved any success. Can you please add some guidelines to the README on training at larger resolutions / batch sizes?
Hi, when displaying the images here, don't we need to convert back to the original image value range before storing them to disk?
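If the inputs were normalized with transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), a sketch of the inverse map back to [0, 1] looks like this:

```python
import torch

def denormalize(img: torch.Tensor) -> torch.Tensor:
    """Map a tensor from the Normalize((0.5, ...), (0.5, ...)) range [-1, 1]
    back to [0, 1] before saving to disk."""
    return img.mul(0.5).add(0.5).clamp(0, 1)
```

For reference, I believe torchvision's save_image also accepts a normalize=True flag that rescales similarly, which may be why saved samples can look correct without an explicit conversion.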
Hi, do you plan to provide a pytorch implementation of the recent paper on "Improved Training of Wasserstein GANs"?
Is there an easy way to compute the gradient (w.r.t. the weights) of the gradient norm?
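One way, a sketch of the gradient-penalty term from the "Improved Training" paper rather than anything in this repo, is torch.autograd.grad with create_graph=True, which makes the gradient norm itself differentiable:

```python
import torch

def gradient_penalty(netD, real, fake):
    """Sketch of the WGAN-GP penalty ((||grad_x D(x_hat)||_2 - 1)^2 at random
    interpolates); netD/real/fake are placeholders for the user's own objects."""
    eps = torch.rand(real.size(0), 1, 1, 1)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    out = netD(x_hat)
    grads, = torch.autograd.grad(
        outputs=out, inputs=x_hat,
        grad_outputs=torch.ones_like(out),
        create_graph=True)             # keep the graph so the norm is differentiable
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((norms - 1) ** 2).mean()
```

The returned penalty is then added to the critic loss, and a later loss.backward() differentiates through the norm automatically.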
I've got a question for the discriminator loss.
It seems that when training with WGAN you can end up with increased image quality alongside increasing loss.
I have plotted here -log D vs. generator iterations, smoothed using a median filter of length 101.
Are there any guidelines on how to diagnose these losses?
The initial dip has significantly lower image quality than the most recent peak.
Thanks!
D's output for real images is always negative; for generated images, the output is always positive.
Why does this happen?
Also, I find that after training for a long time, D's loss tends to 0, and G's loss also tends to 0.
I think if D and G reach a game balance, G's loss should tend to 0.5.
What do you think about this question?
Thanks very much
The clip operation here is done before optimizerD.step()
. So the discriminator used in updating G is the un-clamped version, which is not consistent with the paper.
Can anyone tell me what changes to make for the MNIST dataset?
For WGAN, we should maximize the discriminator loss and minimize the negative generator loss. However, the code seems to do just the opposite. Am I wrong?
I think it should be like this:
errD_real.backward(mone) in 189.
errD_fake.backward(one) in 197.
errG.backward(mone) in 213.
In the current implementation, the parameters of batchnorm layers will also be clamped. Is this a desired behavior?
I ran your code on CIFAR10, but the result does not seem as good as expected.
system: debian 8
python: python2
pytorch: torch==0.3.1
$python main.py --dataset cifar10 --dataroot ~/.torch/datasets --cuda
[24/25][735/782][3335] Loss_D: -1.287177 Loss_G: 0.642245 Loss_D_real: -0.651701 Loss_D_fake 0.635477
[24/25][740/782][3336] Loss_D: -1.269792 Loss_G: 0.621307 Loss_D_real: -0.657210 Loss_D_fake 0.612582
[24/25][745/782][3337] Loss_D: -1.250543 Loss_G: 0.636843 Loss_D_real: -0.667046 Loss_D_fake 0.583497
[24/25][750/782][3338] Loss_D: -1.196252 Loss_G: 0.589907 Loss_D_real: -0.606480 Loss_D_fake 0.589772
[24/25][755/782][3339] Loss_D: -1.189609 Loss_G: 0.564263 Loss_D_real: -0.612895 Loss_D_fake 0.576714
[24/25][760/782][3340] Loss_D: -1.178156 Loss_G: 0.586755 Loss_D_real: -0.600268 Loss_D_fake 0.577888
[24/25][765/782][3341] Loss_D: -1.087157 Loss_G: 0.508717 Loss_D_real: -0.522565 Loss_D_fake 0.564592
[24/25][770/782][3342] Loss_D: -1.092081 Loss_G: 0.674212 Loss_D_real: -0.657483 Loss_D_fake 0.434598
[24/25][775/782][3343] Loss_D: -0.937950 Loss_G: 0.209016 Loss_D_real: -0.310877 Loss_D_fake 0.627073
[24/25][780/782][3344] Loss_D: -1.316574 Loss_G: 0.653665 Loss_D_real: -0.693675 Loss_D_fake 0.622899
[24/25][782/782][3345] Loss_D: -1.222763 Loss_G: 0.558372 Loss_D_real: -0.567426 Loss_D_fake 0.655337
When I just run the command
python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda
it exits with this error and no other info.
The error info comes from line 152 in main.py, data_iter = iter(dataloader)
parser.add_argument('--imageSize', type=int, default=64, help='the height / width of the input image to network')
The default image size is 64*64? How do I define a new image size like m*n?
Can someone explain why this is done: https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L70
I have been studying your paper, "Wasserstein GAN". Could you share with me the code corresponding to Figure 2, page 9 of the paper?
Thanks very much! WGAN is a masterpiece in my mind!
I am not sure if I am missing something.
Looking at the source gives:
f(real) - f(g(prior samples))
f(g(prior samples))
Looking at the paper gives:
f(real) - f(g(prior samples)) in Algorithm 1 line 5
- f(g(prior samples)) in Algorithm 1 line 10
Is the minus sign correct in Algorithm 1 line 10?
Hi, have you tried to apply WGAN for conditional image generation?
Say, in the simplest scenario of conditioning on the class label.
I'm trying to do that, but observe some weird behavior:
Any suggestions?
I'm new to GANs. What I understand is that netG and netD should be optimized alternately, but I cannot find netG.parameters.requires_grad = False when updating netD. I am confused about this.
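For what it's worth, my reading of main.py is that the freezing happens on netD rather than netG: before the generator update the critic's parameters get requires_grad = False to avoid unnecessary gradient computation. A sketch of that pattern (toy module, not the repo's classes):

```python
import torch.nn as nn

netD = nn.Linear(4, 1)   # toy stand-in for the critic

# Critic update phase: critic parameters receive gradients.
for p in netD.parameters():
    p.requires_grad = True

# Generator update phase: freeze the critic so backprop through netD(fake)
# only produces gradients for netG, saving computation.
for p in netD.parameters():
    p.requires_grad = False
```

Note that freezing netG during the critic update is not strictly needed when the critic's input is fake.detach(), since the detach already cuts the graph back to the generator.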
Hello! I ran your WGAN code and got this error:
input.resize_as_(real_cpu).copy_(real_cpu)
TypeError: resize_as_ received an invalid combination of arguments - got (!torch.FloatTensor!), but expected (torch.cuda.FloatTensor template)
How to solve it?
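The message suggests the two tensors live on different devices (one a CPU FloatTensor, the other CUDA). A hedged sketch of the usual fix, keeping both on one device before resize_as_/copy_ (variable names borrowed from main.py, shapes hypothetical):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

real_cpu = torch.randn(64, 3, 64, 64)            # batch from the dataloader
input = torch.empty(64, 3, 64, 64, device=device)

# Move the batch onto input's device first, so resize_as_/copy_ see
# matching tensor types and the TypeError goes away.
batch = real_cpu.to(device)
input.resize_as_(batch).copy_(batch)
```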
Hello, your method of computing the D loss uses
errD_real.backward(one)
errD_fake.backward(mone)
But why not use
errD = errD_real - errD_fake
errD.backward()
It attenuates to 0 faster than the previous approach.
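A quick check (toy scalar in place of the critic; errD_real/errD_fake mimic the repo's names) suggests the two formulations produce identical gradients, so any difference is in style and in when the backward passes run, not in the result:

```python
import torch

one = torch.tensor(1.0)
mone = one * -1

w = torch.tensor(1.0, requires_grad=True)

# Single combined backward, as proposed in the question.
errD_real, errD_fake = w * 2, w * 5
(errD_real - errD_fake).backward()
g_single = w.grad.clone()              # d(2w - 5w)/dw = -3

# Two backward calls with one/mone, as in the repo.
w.grad = None
errD_real, errD_fake = w * 2, w * 5
errD_real.backward(one)
errD_fake.backward(mone)
assert torch.equal(w.grad, g_single)   # same gradients either way
```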
I tried WGAN with Tensorflow and found that when batch normalization is not used in the generator (MLP network), then there is no mode collapse. However, when I added batch normalization between hidden layers, partial mode collapse occurred. I read your code and found that your MLP_G doesn't use any batch norm layers so I am wondering whether you have tried to add batch normalization in the generator only to see whether they will cause partial mode collapse.
Or does anyone already have some experience/ideas on this?
lmdb module is missing in the Requirements section.
conda create -n py3.6 python=3.6 anaconda
conda install pytorch torchvision cuda80 -c soumith
python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda
...
ModuleNotFoundError: No module named 'lmdb'
In continuation with the discussion in issue #9, I'm still missing something about the critic objective.
Isn't Loss_D supposed to be the approximation of the Wasserstein distance? Then, if the critic is trained to optimality, the distance should at least be positive.
Related to this, the actual objective of the critic goes up during training. Does this mean that the estimate of the Wasserstein distance is not improving? Should the picture be the same if we train the critic to optimality at each iteration?
Hi, I'm new to computer vision and am trying to get up to speed with everything. When I run the
python main.py --dataset lsun --dataroot [dataroot] --cuda
command, there is an error saying lmdb.Error: [dataroot]/bedroom_train_lmdb: no such file or directory.
It looks like I need to download the dataset myself. Can you provide a pointer to where I can get it? Thank you very much!
Would you suggest having fixed noise and conditions when looking at outputs of the generator to evaluate progress?
In the paper, you report a negative result that
WGAN training becomes unstable at times when one uses a momentum based optimizer such as Adam [8] (with B1 > 0) on the critic, or when one uses high learning rates
You advocate using RMSProp for the discriminator instead. Yet in the implementation, although RMSProp is the default, there is an option to use Adam (line 144). Is this included for consistency with your evaluation, or have you found settings for which Adam is effective with the WGAN?
I have a few questions:
1. The critic is supposed to maximize errD, but the code seems to be minimizing it.
2. The generator is supposed to minimize -errG, but the code seems to be minimizing errG instead.
Maybe I'm missing something about how the losses are computed and optimized.