martinarjovsky / WassersteinGAN
License: BSD 3-Clause "New" or "Revised" License
Dear authors,
I used the default parameters and trained on the CIFAR-10 dataset for over 1000 epochs, but the results do not look very good. The images are blurry and the categories are hardly recognizable. Did you use a different set of parameters for CIFAR-10?
Looking forward to your reply. Thanks.
How do I run WassersteinGAN on my own data? Can you give some advice? Thanks.
The Wasserstein GAN I'm training is presenting mode collapse behavior. There are 5 conditions and the samples associated with each condition each have one single mode.
The critic loss graph shows wild oscillations.
Parameters and an example of the generated text after 10 epochs, where each epoch has 100*128 examples, are below.
Namespace(boolean=0, bs=128, clip=0.01, epoch_size=100, lr=5e-05, n_epochs=100)
Dataset shape (129822,)
Building model and compiling functions...
('Generator output:', (None, 1, 128, 128))
('Critic output:', (None, 1))
Suggestions are very welcome!
0, (aueldablecarbld damubkeckecolait astir thin in bowpbor siry le ty therandurcing day anat yale beain ghckvincqundg"bdxk'ntqxw8v'
1, bueldeblecarbcdsdamuqfackeckbalt astar than in tiwpgor sury ye th thetandurting kellanat lale beain ghapvincquod)"bdak'nthxw�v'
2, bueldafredawuckodelreficha kbalv astar ;hathin o0 wor cure ey rh th stodeutine sellavet lale eain chapvinckunvewb'az'xthg\�d�
3, (bueldablecarbcdndamuqlabkecobalv artar , an in tin bor ciry ef rh thettndluting dellavat lale eain jhapvincquid)wb'av'nthxlDy'
4, bumldeblecawbcdsdaluqfackackbalt astar that in tow gor cure ey rh thestndurting kellanet lale eain ghapvinckuod)"b'sz'xthxw(y�
5, bueldabrecawuclodeldadbchanksaltursdarithit os tmn #39;s re lerperyo raouetcane key tne. <ate begad ghakmfgelunc)"bdayex hgr8v'
6, raiedlibldrisblx grjucambngcoln-tursdiait in os worg ba sicr he pe co ma-iuld berday tia. .ate began ghcuzid qunkg"zlxk'n wejxvd
7, bueldifredawuckopelredicha qualt rsdarithathon ton toxZs(xelyerre to saouestene sel ane? late eaid chapadgelunve(bdaz'x hxr8v�
8, pbceldefrldrwuqlefetdadibh, qual; rsdar tsathis, or wor ct wl f re to stouluteve sevltve? late eaid chapadgckjnve(bzaz'x hg\�d�
9, bumldabkecawbcdsdalumfichacksait astarithin in tonpgor s rellertertherandetcing key anet <ate beain ghakmidclund)"bdak'xtqxr8v'
10, rauedjabldrispld dajucllongcolait artir t an in win bor sicg lf th themandlrning day aiat yale beain jhckvincqvidg"zlxk'n wxwDv'
Hey,
I noticed that the dcgan generator ends with a Tanh non-linearity whereas the mlp has no non-linearity at the end. Is this done on purpose? Am I missing something?
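A minimal sketch of the observation (toy layer sizes, not the repo's exact architecture): a Tanh tail bounds generator outputs to (-1, 1), matching images normalized with mean 0.5 / std 0.5, while a plain Linear tail is unbounded.

```python
import torch
import torch.nn as nn

# Hypothetical tail layers, not the repo's exact models: the DCGAN generator
# ends with Tanh, bounding samples to (-1, 1) to match images normalized by
# transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)).
dcgan_tail = nn.Sequential(nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())
mlp_tail = nn.Linear(512, 3 * 64 * 64)  # plain Linear: unbounded output

z = torch.randn(2, 64, 8, 8)
out = dcgan_tail(z)                     # every value lies in (-1, 1)
```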
Thanks for sharing easy-to-follow code.
I am currently applying WGAN to learning text distribution.
Here are my questions regarding WGAN.
Question 1. In Figure 3, the losses of the MLP and DCGAN seem comparable. However, I think the scale of the loss can vary depending on the weight initialization scale and the model size. (Please correct me if I am wrong.) In that case, what would be the right way to compare the learning results of two different models?
Question 2. Could you share which hyperparameters WGAN is sensitive to?
For example: weight initialization scale (0.02), clamping threshold (0.01), batch size (64), model size, #D/G steps (5), lr (0.00005).
Thank you
In my experiments, WGAN on CelebA generates lower-quality images after 100000 iterations compared to DCGAN.
I don't know why.
How do I set up the [lsun-train-folder] for python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda? Thank you!
What happens if we get rid of the mone and one in the .backward() calls for the losses? What are they for?
When I was trying to train WGAN on ImageNet, I got this error.
[0/25][1/5631] Loss_D: -1.322320 Loss_G: 0.666666 Loss_D_real: -0.665913 Loss_D_fake 0.656407
[0/25][2/5631] Loss_D: -1.471963 Loss_G: 0.730293 Loss_D_real: -0.750265 Loss_D_fake 0.721698
[0/25][3/5631] Loss_D: -1.504353 Loss_G: 0.745277 Loss_D_real: -0.767033 Loss_D_fake 0.737320
[0/25][4/5631] Loss_D: -1.522664 Loss_G: 0.754508 Loss_D_real: -0.776120 Loss_D_fake 0.746544
[0/25][5/5631] Loss_D: -1.534124 Loss_G: 0.758739 Loss_D_real: -0.782926 Loss_D_fake 0.751198
[0/25][6/5631] Loss_D: -1.539814 Loss_G: 0.761637 Loss_D_real: -0.785868 Loss_D_fake 0.753947
[0/25][7/5631] Loss_D: -1.541982 Loss_G: 0.763071 Loss_D_real: -0.786177 Loss_D_fake 0.755805
[0/25][8/5631] Loss_D: -1.544353 Loss_G: 0.764505 Loss_D_real: -0.787565 Loss_D_fake 0.756788
[0/25][9/5631] Loss_D: -1.545964 Loss_G: 0.764762 Loss_D_real: -0.788965 Loss_D_fake 0.756999
[0/25][10/5631] Loss_D: -1.547822 Loss_G: 0.765548 Loss_D_real: -0.789867 Loss_D_fake 0.757955
[0/25][11/5631] Loss_D: -1.542095 Loss_G: 0.765476 Loss_D_real: -0.784480 Loss_D_fake 0.757615
[0/25][12/5631] Loss_D: -1.547668 Loss_G: 0.766164 Loss_D_real: -0.789187 Loss_D_fake 0.758481
[0/25][13/5631] Loss_D: -1.548513 Loss_G: 0.766813 Loss_D_real: -0.789454 Loss_D_fake 0.759059
[0/25][14/5631] Loss_D: -1.549405 Loss_G: 0.767250 Loss_D_real: -0.790068 Loss_D_fake 0.759336
[0/25][15/5631] Loss_D: -1.548578 Loss_G: 0.767710 Loss_D_real: -0.788655 Loss_D_fake 0.759923
IOError: image file is truncated
Traceback (most recent call last):
File "main.py", line 178, in
data = data_iter.next()
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 162, in next
idx, batch = self.data_queue.get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 378, in get
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
return recv()
File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/queue.py", line 21, in recv
buf = self.recv_bytes()
First of all thank you for the great article and for the code!
I was looking through the code and found that you do not use bias in the convolutional layers. Does this have an influence on the algorithm? I tried to find the reason in the paper, but I might have missed it.
Best Regards!
My name is Wangchao, nice to meet you.
I ran https://github.com/martinarjovsky/WassersteinGAN on my machine and found a problem:
[jianglin@JULYEDU-GPU1 WassersteinGAN-master]$ python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda
Namespace(Diters=5, adam=False, batchSize=64, beta1=0.5, clamp_lower=-0.01, clamp_upper=0.01, cuda=True, dataroot='[lsun-train-folder]', dataset='lsun', experiment=None, imageSize=64, lrD=5e-05, lrG=5e-05, mlp_D=False, mlp_G=False, n_extra_layers=0, nc=3, ndf=64, netD='', netG='', ngf=64, ngpu=1, niter=25, noBN=False, nz=100, workers=2)
mkdir: cannot create directory ‘samples’: File exists
Random Seed: 2745
Traceback (most recent call last):
File "main.py", line 78, in
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
File "/usr/lib/python2.7/site-packages/torchvision/datasets/lsun.py", line 99, in __init__
transform=transform))
File "/usr/lib/python2.7/site-packages/torchvision/datasets/lsun.py", line 20, in __init__
readahead=False, meminit=False)
lmdb.Error: [lsun-train-folder]/bedroom_train_lmdb: No such file or directory
Please give me some advice.
I believe that the parameter clamping also reduces the batchnorm scaling factor to near zero, when (as far as I understand) it should stay near 1 (where it was initialized).
# clamp parameters to a cube
for p in netD.parameters():
    p.data.clamp_(opt.clamp_lower, opt.clamp_upper)
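One way to test this hypothesis is to exclude BatchNorm parameters from the clamp. A sketch under the assumption that the critic is an ordinary nn.Module (the bounds below are the repo's default ±0.01; the toy network is mine, not DCGAN_D):

```python
import torch
import torch.nn as nn

# Toy critic standing in for DCGAN_D: a conv layer followed by BatchNorm.
netD = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2, inplace=True),
)

# Clamp every parameter except those of BatchNorm layers, so the BN scale
# (gamma) stays near its initialization of 1 instead of collapsing to 0.01.
for m in netD.modules():
    if isinstance(m, nn.BatchNorm2d):
        continue                        # leave gamma/beta untouched
    for p in m.parameters(recurse=False):
        p.data.clamp_(-0.01, 0.01)
```

Whether skipping BN parameters helps training is exactly the open question raised here; the sketch only shows how to try it.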
Do you have hyperparameters for bigger batchSize, like 256 and 512?
When I trained my WGAN using the DCGAN topology on the MNIST dataset, I observed that at the very beginning G's loss decreased to near zero and stayed very small, but D's loss did not decrease even after 20000 iterations of training. What does that mean? Please advise.
=============
Reading other people's comments, it seems that the G loss does not carry much meaning. But if I remove the training of the generator and train only the discriminator:
Iteration 0 complete. Discriminator avg loss: 6.67944732413e-06 Generator avg loss: 0
Iteration 100 complete. Discriminator avg loss: 1082.13891602 Generator avg loss: 0
Iteration 200 complete. Discriminator avg loss: 993.882385254 Generator avg loss: 0
Iteration 300 complete. Discriminator avg loss: 1047.83435059 Generator avg loss: 0
Iteration 400 complete. Discriminator avg loss: 1081.64208984 Generator avg loss: 0
Iteration 500 complete. Discriminator avg loss: 1024.39331055 Generator avg loss: 0
Iteration 600 complete. Discriminator avg loss: 936.895996094 Generator avg loss: 0
Iteration 700 complete. Discriminator avg loss: 1028.81530762 Generator avg loss: 0
Iteration 800 complete. Discriminator avg loss: 1029.08447266 Generator avg loss: 0
Iteration 900 complete. Discriminator avg loss: 1047.0847168 Generator avg loss: 0
Iteration 1000 complete. Discriminator avg loss: 995.261230469 Generator avg loss: 0
Iteration 1100 complete. Discriminator avg loss: 1048.86035156 Generator avg loss: 0
Iteration 1200 complete. Discriminator avg loss: 1040.19873047 Generator avg loss: 0
Iteration 1300 complete. Discriminator avg loss: 1020.10339355 Generator avg loss: 0
Iteration 1400 complete. Discriminator avg loss: 1075.15576172 Generator avg loss: 0
What does this mean?
Excellent work!!!
I want to implement WGAN in Caffe. I am confused by the gradient computation of the WGAN loss.
Could you give some details of the mathematical formulas?
Thank you!
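As a sketch of the math (my reading of the paper, not the authors' Caffe recipe): the critic loss is L = -(mean f(x_real) - mean f(x_fake)), so the gradient of L with respect to the critic outputs is just -1/N on real scores and +1/N on fake scores, and everything below that is ordinary backprop:

```python
import torch
import torch.nn as nn

# Stand-in critic; any network works since the loss is linear in its outputs.
f = nn.Linear(2, 1)
x_real = torch.randn(8, 2)
x_fake = torch.randn(8, 2)

# WGAN critic objective: maximize E[f(real)] - E[f(fake)],
# i.e. minimize its negative.
loss = -(f(x_real).mean() - f(x_fake).mean())
loss.backward()
# For a manual (layer-by-layer) implementation, the "top" gradient fed into
# backprop is -1/N for each real sample's score and +1/N for each fake one's.
```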
Thanks for your great work!
I'd like to implement a WGAN critic for Keras, but I'm new to Torch and having a hard time figuring out a couple of things in your code.
I want to double-check what nonlinearity you use for your critic output. I see that you use a linear output for MLP_D here, but I don't see it in DCGAN_D or DCGAN_D_nobn.
Why linear, if f_w is a distance metric? Why not relu?
Am I right that this is the loss for the critic? (y_true is 0/1 for generated/training samples)
def wgan_loss(y_true, y_pred):
    return K.mean(((-1) ** y_true) * y_pred)
What n_critic do you use? (Line 73 in f81eafd)
Same as mentioned in #62 and #64, the results on the CIFAR10 dataset cannot be reproduced!
I trained with the default settings for 100 epochs (which I think is enough), but the results still don't make sense.
Here are example fake images generated by the generator:
Has anybody reproduced the results on the CIFAR10 (or LSUN bedroom) dataset? If so, can you please share the training details?
Hi! @martinarjovsky
I have some questions about the loss curve. When I don't use the median filter operation, the loss curve still oscillates, which suggests the training is unstable. Am I right?
Please help me! Thank you!
I am confused about the "one" and "mone" used in backward(). The goal of the discriminator D is to maximize the output of netD for real data but minimize the output of netD for fake data (as I understand it), while backward() and optimizerD minimize a loss function. Therefore, I think errD_real.backward(one) should be errD_real.backward(mone), and errD_fake.backward(mone) should be errD_fake.backward(one). That is my opinion. Could you give some explanation? Thank you!
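For what it's worth, here is a minimal sketch of the PyTorch semantics involved (not a claim about the authors' intent): y.backward(g) accumulates the gradient of g * y, so backward(mone) flips the sign and turns a descent step into ascent on y.

```python
import torch

one = torch.tensor(1.0)
mone = one * -1

x = torch.tensor(2.0, requires_grad=True)
y = x * 3                              # stand-in for a scalar critic score

y.backward(one, retain_graph=True)     # accumulates d(+y)/dx = +3
y.backward(mone)                       # accumulates d(-y)/dx = -3
# The two calls cancel: backward(mone) amounts to gradient ascent on y
# when the optimizer afterwards performs descent.
```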
Hello, I'm very sorry to disturb you. I would like to submit a paper to arXiv.org, but it requires an endorsement, so I take the liberty of asking: could you help me confirm it? Here is the link; can you open it? Thank you very much.
Hi!
Thank you for your amazing work with this repository.
I am experimenting with unsupervised learning. I would like to use the G / D of the trained GAN as encoders (there are ways to use G via knowledge distillation into a simpler CNN, or papers like this one). I would like to build a model that can recognize 2 images of the same room.
The problem is that standard techniques (hashing, CNN => dimensionality reduction => distance, Siamese networks) will either find only very similar images / clustered images or image replicas. If the image is shot from a slightly different angle, these methods fail. Ideally I need to build an index of images by such variables as couches / TVs / carpets. It looks like DCGAN / WGAN learns something like this as a latent variable. If we have 100+ latent variables that represent this, they may work for different images.
I have a dataset of ~300-500k similar images that mostly resemble LSUN living rooms, but are shot in Russia. So far I have tried variations of hyperparameters similar to these (please refer to the PR to see what the new parameters mean):
python3 main.py \
--dataset fastfolder --dataroot ../data/data_for_wgan/all_imgs \
--imgList ../data/generator_imgs_wgan \
--ngf 64 --ndf 64 \
--workers 6 --batchSize 256 --imageSize 256 \
--nz 100 --niter 50 \
--cuda --ngpu 2 \
--n_extra_layers 0 \
--experiment flat_dataset_224_0extra \
--adam \
--tensorboard_images --tensorboard
So far I have not achieved any success. Can you please add some guidelines to the README on training at larger resolutions / batch sizes?
Hi, when displaying the images here, don't we need to convert back to the original image value range before storing them to disk?
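If the inputs were normalized with transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), a sketch of the inverse map back to [0, 1] looks like this:

```python
import torch

def denormalize(img: torch.Tensor) -> torch.Tensor:
    """Map a tensor from the Normalize((0.5, ...), (0.5, ...)) range [-1, 1]
    back to [0, 1] before saving to disk."""
    return img.mul(0.5).add(0.5).clamp(0, 1)
```

For reference, I believe torchvision's save_image also accepts a normalize=True flag that rescales similarly, which may be why saved samples can look correct without an explicit conversion.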
Hi, do you plan to provide a pytorch implementation of the recent paper on "Improved Training of Wasserstein GANs"?
Is there an easy way to compute the gradient (w.r.t. the weights) of the gradient norm?
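One way, a sketch of the gradient-penalty term from the "Improved Training" paper rather than anything in this repo, is torch.autograd.grad with create_graph=True, which makes the gradient norm itself differentiable:

```python
import torch

def gradient_penalty(netD, real, fake):
    """Sketch of the WGAN-GP penalty ((||grad_x D(x_hat)||_2 - 1)^2 at random
    interpolates); netD/real/fake are placeholders for the user's own objects."""
    eps = torch.rand(real.size(0), 1, 1, 1)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    out = netD(x_hat)
    grads, = torch.autograd.grad(
        outputs=out, inputs=x_hat,
        grad_outputs=torch.ones_like(out),
        create_graph=True)             # keep the graph so the norm is differentiable
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((norms - 1) ** 2).mean()
```

The returned penalty is then added to the critic loss, and a later loss.backward() differentiates through the norm automatically.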
I've got a question for the discriminator loss.
It seems that when training with WGAN you can end up with increased image quality alongside increasing loss.
I have plotted here -log D vs. generator iterations, smoothed using a median filter of length 101.
Are there any guidelines on how to diagnose these losses?
The initial dip has significantly lower image quality than the most recent peak.
Thanks!
D's output for real images is always negative; for generated images, the output is always positive.
Why does this happen?
Also, I find that after training for a long time, D's loss tends to 0, and G's loss also tends to 0.
I think if D and G reach a game balance, G's loss should tend to 0.5.
What do you think about this question?
Thanks very much
The clip operation here is done before optimizerD.step()
. So the discriminator used in updating G is the un-clamped version, which is not consistent with the paper.
Can anyone tell me what changes to make for the MNIST dataset?
For WGAN, we should maximize the discriminator loss and minimize the negative generator loss. However, the code seems to do just the opposite. Am I wrong?
I think it should be like this:
errD_real.backward(mone) in 189.
errD_fake.backward(one) in 197.
errG.backward(mone) in 213.
In the current implementation, the parameters of batchnorm layers will also be clamped. Is this a desired behavior?
I ran your code on CIFAR10, but the result does not seem as good as expected.
system: debian 8
python: python2
pytorch: torch==0.3.1
$python main.py --dataset cifar10 --dataroot ~/.torch/datasets --cuda
[24/25][735/782][3335] Loss_D: -1.287177 Loss_G: 0.642245 Loss_D_real: -0.651701 Loss_D_fake 0.635477
[24/25][740/782][3336] Loss_D: -1.269792 Loss_G: 0.621307 Loss_D_real: -0.657210 Loss_D_fake 0.612582
[24/25][745/782][3337] Loss_D: -1.250543 Loss_G: 0.636843 Loss_D_real: -0.667046 Loss_D_fake 0.583497
[24/25][750/782][3338] Loss_D: -1.196252 Loss_G: 0.589907 Loss_D_real: -0.606480 Loss_D_fake 0.589772
[24/25][755/782][3339] Loss_D: -1.189609 Loss_G: 0.564263 Loss_D_real: -0.612895 Loss_D_fake 0.576714
[24/25][760/782][3340] Loss_D: -1.178156 Loss_G: 0.586755 Loss_D_real: -0.600268 Loss_D_fake 0.577888
[24/25][765/782][3341] Loss_D: -1.087157 Loss_G: 0.508717 Loss_D_real: -0.522565 Loss_D_fake 0.564592
[24/25][770/782][3342] Loss_D: -1.092081 Loss_G: 0.674212 Loss_D_real: -0.657483 Loss_D_fake 0.434598
[24/25][775/782][3343] Loss_D: -0.937950 Loss_G: 0.209016 Loss_D_real: -0.310877 Loss_D_fake 0.627073
[24/25][780/782][3344] Loss_D: -1.316574 Loss_G: 0.653665 Loss_D_real: -0.693675 Loss_D_fake 0.622899
[24/25][782/782][3345] Loss_D: -1.222763 Loss_G: 0.558372 Loss_D_real: -0.567426 Loss_D_fake 0.655337
When I just run the command
python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda
it exits with this error and no other info.
The error info comes from line 152 in main.py, data_iter = iter(dataloader)
parser.add_argument('--imageSize', type=int, default=64, help='the height / width of the input image to network')
The default image size is 64*64? How do I define a new image size like m*n?
Can someone explain why this is done: https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L70
I have been studying your paper, "Wasserstein GAN". Could you share with me the code corresponding to Figure 2, page 9 of the paper?
Thanks very much! WGAN is a masterpiece in my mind!
I am not sure if I am missing something.
Looking at the source gives:
f(real) - f(g(prior samples))
f(g(prior samples))
Looking at the paper gives:
f(real) - f(g(prior samples)) in Algorithm 1 line 5
- f(g(prior samples)) in Algorithm 1 line 10
Is the minus sign correct in Algorithm 1 line 10?
Hi, have you tried to apply WGAN for conditional image generation?
Say, in the simplest scenario of conditioning on the class label.
I'm trying to do that, but observe some weird behavior:
Any suggestions?
I'm new to GANs. What I understand is that netG and netD should be optimized alternately, but I cannot find netG.parameters.requires_grad = False when updating netD. I am confused about this.
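For what it's worth, my reading of main.py is that the freezing happens on netD rather than netG: before the generator update the critic's parameters get requires_grad = False to avoid unnecessary gradient computation. A sketch of that pattern (toy module, not the repo's classes):

```python
import torch.nn as nn

netD = nn.Linear(4, 1)   # toy stand-in for the critic

# Critic update phase: critic parameters receive gradients.
for p in netD.parameters():
    p.requires_grad = True

# Generator update phase: freeze the critic so backprop through netD(fake)
# only produces gradients for netG, saving computation.
for p in netD.parameters():
    p.requires_grad = False
```

Note that freezing netG during the critic update is not strictly needed when the critic's input is fake.detach(), since the detach already cuts the graph back to the generator.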
Hello! I ran your WGAN code and got this error:
input.resize_as_(real_cpu).copy_(real_cpu)
TypeError: resize_as_ received an invalid combination of arguments - got (!torch.FloatTensor!), but expected (torch.cuda.FloatTensor template)
How to solve it?
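The message suggests the two tensors live on different devices (one a CPU FloatTensor, the other CUDA). A hedged sketch of the usual fix, keeping both on one device before resize_as_/copy_ (variable names borrowed from main.py, shapes hypothetical):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

real_cpu = torch.randn(64, 3, 64, 64)            # batch from the dataloader
input = torch.empty(64, 3, 64, 64, device=device)

# Move the batch onto input's device first, so resize_as_/copy_ see
# matching tensor types and the TypeError goes away.
batch = real_cpu.to(device)
input.resize_as_(batch).copy_(batch)
```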
Hello, your method of computing the D loss uses
errD_real.backward(one)
errD_fake.backward(mone)
But why not use
errD = errD_real - errD_fake
errD.backward()
It attenuates to 0 faster than the previous approach.
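A quick check (toy scalar in place of the critic; errD_real/errD_fake mimic the repo's names) suggests the two formulations produce identical gradients, so any difference is in style and in when the backward passes run, not in the result:

```python
import torch

one = torch.tensor(1.0)
mone = one * -1

w = torch.tensor(1.0, requires_grad=True)

# Single combined backward, as proposed in the question.
errD_real, errD_fake = w * 2, w * 5
(errD_real - errD_fake).backward()
g_single = w.grad.clone()              # d(2w - 5w)/dw = -3

# Two backward calls with one/mone, as in the repo.
w.grad = None
errD_real, errD_fake = w * 2, w * 5
errD_real.backward(one)
errD_fake.backward(mone)
assert torch.equal(w.grad, g_single)   # same gradients either way
```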
I tried WGAN with Tensorflow and found that when batch normalization is not used in the generator (MLP network), then there is no mode collapse. However, when I added batch normalization between hidden layers, partial mode collapse occurred. I read your code and found that your MLP_G doesn't use any batch norm layers so I am wondering whether you have tried to add batch normalization in the generator only to see whether they will cause partial mode collapse.
Or does anyone already have some experience/ideas on this?
lmdb module is missing in the Requirements section.
conda create -n py3.6 python=3.6 anaconda
conda install pytorch torchvision cuda80 -c soumith
python main.py --dataset lsun --dataroot [lsun-train-folder] --cuda
...
ModuleNotFoundError: No module named 'lmdb'
In continuation with the discussion in issue #9, I'm still missing something about the critic objective.
Isn't Loss_D supposed to be the approximation of the Wasserstein distance? Then, if the critic is trained to optimality, the distance should at least be positive.
Related to this, the actual objective of the critic goes up during training. Does this mean that the estimate of the Wasserstein distance is not improving? Should the picture be the same if we train the critic to optimality at each iteration?
Hi, I'm new to computer vision and am trying to get up to speed with everything. When I run the
python main.py --dataset lsun --dataroot [dataroot] --cuda
command, there is an error saying lmdb.Error: [dataroot]/bedroom_train_lmdb: no such file or directory.
It looks like I need to download the dataset myself. Can you provide a pointer to where I can get it? Thank you very much!
Would you suggest having fixed noise and conditions when looking at outputs of the generator to evaluate progress?
In the paper, you report a negative result that
WGAN training becomes unstable at times when one uses a momentum based optimizer such as Adam [8] (with B1 > 0) on the critic, or when one uses high learning rates
You advocate using RMSProp for the discriminator instead. Yet in the implementation, although RMSProp is the default, there is an option to use Adam (line 144). Is this included for consistency with your evaluation, or have you found settings for which Adam is effective with the WGAN?
I have a few questions:
1. The critic is supposed to maximize errD, but the code seems to be minimizing it.
2. The generator is supposed to minimize -errG, but the code seems to be minimizing errG instead.
Maybe I'm missing something about how the losses are computed and optimized.