google / compare_gan
Compare GAN code.
License: Apache License 2.0
I am training DCGAN on a TPU in Colab, but when I try to evaluate on a GPU I get this error:
TypeError: '<=' not supported between instances of 'int' and 'str'
This happens with both a fake and real dataset.
Command used:
!python compare_gan/main.py --gin_config example_configs/dcgan_celeba64.gin --data_fake_dataset true --model_dir 'gs://***/models' --tfds_data_dir 'gs://***/' --schedule=continuous_eval --eval_every_steps=0
Here's the tail of the log:
2019-06-03 05:03:32.856155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-03 05:03:32.856235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-03 05:03:32.856252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-03 05:03:32.856264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-03 05:03:32.856460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14115 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
I0603 05:03:49.514711 140045070399360 fid_score.py:54] Frechet Inception Distance: 433.464.
I0603 05:03:49.515237 140045070399360 eval_gan_lib.py:209] Computed results for task <compare_gan.metrics.fid_score.FIDScoreTask object at 0x7f5e5e6a1eb8>: {'fid_score_mean': 433.68958, 'fid_score_std': 0.2974496, 'fid_score_list': '434.10986_433.4947_433.46417'}
I0603 05:03:49.515923 140045070399360 runner_lib.py:276] Evaluation result for checkpoint gs://***/models/model.ckpt-0: {'inception_score_mean': 1.0062603, 'inception_score_std': 0.00025255934, 'inception_score_list': '1.0059446_1.0065628_1.0062735', 'fid_score_mean': 433.68958, 'fid_score_std': 0.2974496, 'fid_score_list': '434.10986_433.4947_433.46417'} (default value: -1.0)
Traceback (most recent call last):
File "compare_gan/main.py", line 133, in <module>
app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "compare_gan/main.py", line 127, in main
eval_every_steps=FLAGS.eval_every_steps)
File "/content/gdrive/My Drive/compare_gan/compare_gan/runner_lib.py", line 354, in run_with_schedule
num_averaging_runs=num_eval_averaging_runs)
File "/content/gdrive/My Drive/compare_gan/compare_gan/runner_lib.py", line 277, in _run_eval
task_manager.add_eval_result(checkpoint_path, result_dict, default_value)
File "/content/gdrive/My Drive/compare_gan/compare_gan/runner_lib.py", line 209, in add_eval_result
config = self._get_config_for_step(step)
File "/content/gdrive/My Drive/compare_gan/compare_gan/runner_lib.py", line 202, in _get_config_for_step
last_config_step = sorted([s for s in config_steps if s <= step])[-1]
File "/content/gdrive/My Drive/compare_gan/compare_gan/runner_lib.py", line 202, in <listcomp>
last_config_step = sorted([s for s in config_steps if s <= step])[-1]
TypeError: '<=' not supported between instances of 'int' and 'str'
in gilbo.py change:
outdir = os.path.join(outdir, checkpoint_path.replace('/', '_'))
to
outdir = os.path.join(outdir, checkpoint_path.replace('/', '').replace('\\', ''))
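For anyone applying this by hand: a backslash inside a Python string literal has to be escaped, so the Windows separator is written '\\'. Below is a minimal sketch of the same idea as a stand-alone helper. The name checkpoint_dir_name and the sample path are hypothetical and not part of compare_gan, and the replacement character is a free choice ('_' as in the original line, or '' as in the suggestion).

```python
import os


def checkpoint_dir_name(checkpoint_path, sep_replacement="_"):
    """Turn a checkpoint path into a single directory-name component.

    Hypothetical helper (not from compare_gan): both the POSIX separator '/'
    and the Windows separator '\\' are replaced, so the result never contains
    a path separator on either platform.
    """
    return (checkpoint_path
            .replace("/", sep_replacement)
            .replace("\\", sep_replacement))


# Example with mixed separators (made-up path).
name = checkpoint_dir_name(r"models\run1/model.ckpt-0")
outdir = os.path.join("gilbo_out", name)
print(outdir)
```

Note that this only handles separators; other characters that are invalid in file names on some platforms (for example ':') would need separate handling.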
No matter what my inputs are, InfoGAN produces a huge model (12 GB+), which causes the TPU to close its socket. Changing settings (batch size, image size, number of samples, etc.) did not seem to help.
Samples used were 256x256 RGB images with 4 labels
From google bucket:
model.ckpt-0.data-00000-of-00001 | 12.02 GB
Log:
I0606 12:01:05.467655 140082657560448 estimator.py:1111] Calling model_fn.
I0606 12:01:05.468298 140082657560448 datasets.py:210] Running with 1 hosts, modifying dataset seed for host 0 to 547.
I0606 12:01:05.468435 140082657560448 datasets.py:311] train_input_fn(): params={'batch_size': 16, 'context': <tensorflow.contrib.tpu.python.tpu.tpu_context.TPUContext object at 0x7f6752a6f5c0>} seed=547
I0606 12:01:05.512352 140082657560448 modular_gan.py:396] _preprocess_fn(): images=Tensor("arg0:0", shape=(256, 256, 3), dtype=float32, device=/job:worker/task:0/device:CPU:0), labels=Tensor("arg1:0", shape=(4,), dtype=int32, device=/job:worker/task:0/device:CPU:0), seed=547
I0606 12:01:05.526810 140082657560448 tpu_random.py:71] Passing random offset: Tensor("Cast:0", shape=(), dtype=int32, device=/job:worker/task:0/device:CPU:0) with data ({'images': <tf.Tensor 'arg1:0' shape=(256, 256, 3) dtype=float32>, 'z': <tf.Tensor 'arg2:0' shape=(14,) dtype=float32>}, <tf.Tensor 'arg3:0' shape=(4,) dtype=int32>).
I0606 12:01:05.617208 140082657560448 modular_gan.py:529] model_fn(): features={'images': <tf.Tensor 'InfeedQueue/dequeue:1' shape=(2, 256, 256, 3) dtype=float32>, 'z': <tf.Tensor 'InfeedQueue/dequeue:2' shape=(2, 14) dtype=float32>, '_RANDOM_OFFSET': <tf.Tensor 'InfeedQueue/dequeue:0' shape=(2,) dtype=int32>}, labels=Tensor("InfeedQueue/dequeue:3", shape=(2, 4), dtype=int32, device=/device:TPU_REPLICATED_CORE:0),mode=train, params={'batch_size': 2, 'use_tpu': True, 'context': <tensorflow.contrib.tpu.python.tpu.tpu_context.TPUContext object at 0x7f67521429e8>}
W0606 12:01:05.617559 140082657560448 modular_gan.py:537] Graph will be unrolled.
...
Generator variables:
+--------------------------+-----------------+-------------+---------+
| Name | Shape | Size | Type |
+--------------------------+-----------------+-------------+---------+
| generator/g_fc1/kernel:0 | (14, 1024) | 14,336 | float32 |
| generator/g_fc1/bias:0 | (1024,) | 1,024 | float32 |
| generator/g_bn1/gamma:0 | (1024,) | 1,024 | float32 |
| generator/g_bn1/beta:0 | (1024,) | 1,024 | float32 |
| generator/g_fc2/kernel:0 | (1024, 524288) | 536,870,912 | float32 |
| generator/g_fc2/bias:0 | (524288,) | 524,288 | float32 |
| generator/g_bn2/gamma:0 | (524288,) | 524,288 | float32 |
| generator/g_bn2/beta:0 | (524288,) | 524,288 | float32 |
| generator/g_dc3/kernel:0 | (4, 4, 64, 128) | 131,072 | float32 |
| generator/g_dc3/bias:0 | (64,) | 64 | float32 |
| generator/g_bn3/gamma:0 | (64,) | 64 | float32 |
| generator/g_bn3/beta:0 | (64,) | 64 | float32 |
| generator/g_dc4/kernel:0 | (4, 4, 3, 64) | 3,072 | float32 |
| generator/g_dc4/bias:0 | (3,) | 3 | float32 |
+--------------------------+-----------------+-------------+---------+
Total: 538,595,523
I0606 12:01:08.306834 140082657560448 utils.py:174]
Discriminator variables:
+--------------------------------+-----------------+-------------+---------+
| Name | Shape | Size | Type |
+--------------------------------+-----------------+-------------+---------+
| discriminator/d_conv1/kernel:0 | (4, 4, 3, 64) | 3,072 | float32 |
| discriminator/d_conv1/bias:0 | (64,) | 64 | float32 |
| discriminator/d_conv2/kernel:0 | (4, 4, 64, 128) | 131,072 | float32 |
| discriminator/d_conv2/bias:0 | (128,) | 128 | float32 |
| discriminator/d_fc3/kernel:0 | (524288, 1024) | 536,870,912 | float32 |
| discriminator/d_fc3/bias:0 | (1024,) | 1,024 | float32 |
| discriminator/d_fc4/kernel:0 | (1024, 1) | 1,024 | float32 |
| discriminator/d_fc4/bias:0 | (1,) | 1 | float32 |
...
I0606 12:09:19.918299 140082657560448 tpu_estimator.py:504] Init TPU system
I0606 12:09:27.281899 140082657560448 tpu_estimator.py:510] Initialized TPU in 7 seconds
I0606 12:09:27.785771 140081668916992 tpu_estimator.py:463] Starting infeed thread controller.
I0606 12:09:27.786308 140081652131584 tpu_estimator.py:482] Starting outfeed thread controller.
I0606 12:09:27.928345 140082657560448 tpu_estimator.py:536] Enqueue next (1000) batch(es) of data to infeed.
I0606 12:09:27.928739 140082657560448 tpu_estimator.py:540] Dequeue next (1000) batch(es) of data from outfeed.
I0606 12:09:34.653408 140081652131584 error_handling.py:70] Error recorded from outfeed: Socket closed
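A rough size check may help explain the checkpoint. The sketch below only multiplies out the shapes printed in the variable tables above. Two things in it are assumptions rather than facts from the log: that the 524288-wide layers come from (256/4) x (256/4) x 128 for 256x256 inputs, and that the checkpoint stores two extra optimizer slots per weight (as Adam would). The discriminator count covers only the rows shown before the "..." in the log, so it is a lower bound.

```python
from math import prod

# Shapes copied from the generator table in the log.
generator_shapes = [
    (14, 1024), (1024,), (1024,), (1024,),
    (1024, 524288), (524288,), (524288,), (524288,),
    (4, 4, 64, 128), (64,), (64,), (64,),
    (4, 4, 3, 64), (3,),
]
# Only the discriminator rows visible before the "..." in the log.
discriminator_shapes = [
    (4, 4, 3, 64), (64,), (4, 4, 64, 128), (128,),
    (524288, 1024), (1024,), (1024, 1), (1,),
]


def count_params(shapes):
    """Number of scalar parameters for a list of variable shapes."""
    return sum(prod(shape) for shape in shapes)


g_params = count_params(generator_shapes)
d_params = count_params(discriminator_shapes)

# Assumption: the dense layer width is (H/4) * (W/4) * 128 for 256x256 images.
fc_width = (256 // 4) * (256 // 4) * 128

BYTES_PER_FLOAT32 = 4
weights_gib = (g_params + d_params) * BYTES_PER_FLOAT32 / 2**30
# Assumption: two optimizer slot variables per weight are saved as well.
with_slots_gib = 3 * weights_gib

print(g_params, d_params, fc_width)
print(round(weights_gib, 2), round(with_slots_gib, 2))
```

If these assumptions hold, almost all of the size comes from the two (1024 x 524288) dense kernels, whose width grows with the square of the image resolution. That would fit an architecture intended for much smaller images.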
And if yes could you please specify how.
Hi, I trained TPU-accelerated GANs from https://github.com/tensorflow/gan
without any issues, but can't seem to get compare_gan examples to run on GCP TPUs.
Here is the general error, which appears whether I use ctpu, gcloud, or the online GUI to set up compute resources.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation input_pipeline_task0/TensorSliceDataset: node input_pipeline_task0/TensorSliceDataset (defined at /usr/local/lib/python3.5/dist-packages/tensorflow_core/python/framework/ops.py:1748) was explicitly assigned to /job:worker/task:0/device:CPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
Any thoughts here?
Is there a specific python/tensorflow version I should use for running compare_gan?
Thanks!
I looked into the SSGAN model in tfhub, it seems that the discriminator is the default one in resnet_biggan.py, with no parameter in rotation loss.
Thank you for your work.
When I ran eval_utils.py, I got "no module named tensorflow_gan".
What is wrong?
I'm trying to train a 128x128 image dataset with the BigGAN implementation here using a v2-128 pod, but am encountering several changing errors (highlights listed below) after the first "Dequeue next (500) batch(es) of data from outfeed". These remain even when I change the batch size from 2048 to 1024 and reduce iterations per run, etc. These don't occur when training on v2-8 or v3-8 TPUs. Have you ever encountered these while trying to train on pods instead, if that is the issue? Thanks!
Session::Close()
Hi, it seems that the SSGAN model is unavailable on tfhub. I have consulted the tfhub team, but have had no reply yet. You say the colab code still works for you, but when I search for 'SSGAN' on tfhub, nothing comes up.
You may need to recheck whether the SSGAN model is still on tfhub. I'm looking forward to your reply.
By the way, the unsupervised version of S3GAN (clustering) also seems unavailable; could you be so kind as to upload it?
Hello, when I run "bash compare_gan_prepare_datasets.sh", I get the error "tensor2tensor not found!". Clearly, nothing called t2t-datagen exists in the folder $HOME/.local/bin. So how can I get this file?
@Marvin182 Hi, the tfhub team has just uploaded your SSGAN module. It's wonderful, but it does not seem to have a trainable version.
I set m = hub.Module(spec_name, name="gen_module", tags={"gen", "bsNone"}, trainable=True), but the module provides no gradients when the optimizer is applied.
Below is part of my code.
`
import tensorflow as tf
import tensorflow_hub as hub


class Generator(object):
    def __init__(self, module_spec, trainable=True):
        self._module_spec = module_spec
        self._trainable = trainable
        self._module = hub.Module(self._module_spec, name="gen_module",
                                  tags={"gen", "bsNone"}, trainable=self._trainable)
        self.input_info = self._module.get_input_info_dict()

    def build_graph(self, input_dict):
        """
        Build tensorflow graph for Generator
        :param input_dict: {'z_': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>,
                            'labels': None or (?,)}
        :return:{'generated': <hub.ParsedTensorInfo shape=(?, 128, 128, 3) dtype=float32 is_sparse=False>}
        """
        inv_input = {}
        # G_mapping_ND is defined elsewhere in the poster's code (not shown here).
        inv_input['z'] = G_mapping_ND(input_dict['z_'], 120, 120)
        # inv_input['labels'] = input_dict.get('labels', None)
        self.samples = self._module(inputs=inv_input, as_dict=True)['generated']
        return self.samples

    @property
    def trainable_variables(self):
        return [var for var in tf.trainable_variables() if 'generator' in var.name]
`
I wonder whether my implementation is wrong or the module itself is not trainable.
compare_gan/compare_gan/architectures/sndcgan.py
Lines 109 to 121 in 19922d3
Spectral Normalization paper uses 64-64-128-128-256-256-512 conv blocks for D. However, this repo's D is using 64-128-128-256-256-512-512 conv blocks, which is larger.
Will this project be updated to TF 2.0 once its release comes?
First of all, thanks for the code which is very complete and well documented. Nonetheless, I have an issue to use the modules created in the evaluation to generate samples.
I trained a BigGAN on the Cifar10 dataset, with the same config file as the one provided for BigGAN on ImageNet. I visualized the generated images on Tensorboard, and the GAN seemed to be learning correctly. Thus, I evaluated it to check the metrics and create the modules corresponding to different checkpoints.
The problem is that, once I loaded a module from the file system, it was generating images only composed of 0s and 1s. I'm not able to generate similar pictures to the ones generated during training and displayed on Tensorboard.
Do you have an idea of how to fix this issue ?
Thanks in advance
I got the following error message when I ran 'python main.py':
File "main.py", line 30, in <module>
from compare_gan import datasets
ModuleNotFoundError: No module named 'compare_gan'
INCEPTION_URL = "http://download.tensorflow.org/models/frozen_inception_v1_2015_12_05.tar.gz"
tfgan.eval.get_graph_def_from_url_tarball(url=INCEPTION_URL,filename=INCEPTION_FROZEN_GRAPH,tar_filename=os.path.basename(INCEPTION_URL))
Running the code above produces HTTP Error 409: Conflict.
should change
scripts=[ 'bin/compare_gan_generate_tasks', 'bin/compare_gan_prepare_datasets.sh', 'bin/compare_gan_run_one_task', 'bin/compare_gan_run_test.sh', ],
to
scripts=[ 'compare_gan/bin/compare_gan_generate_tasks', 'compare_gan/bin/compare_gan_prepare_datasets.sh', 'compare_gan/bin/compare_gan_run_one_task', 'compare_gan/bin/compare_gan_run_test.sh', ],
First of all, thank you for this thorough study! There is a slight bug carried over from the dcgan repo to here:
compare_gan/compare_gan/src/gans/GAN.py
Line 36 in 615bdc6
Here we are passing is_training as a boolean which is true throughout training. This results in batch norms getting updated when they shouldn't be:
when we are updating the discriminator weights, we want the batch norms of the generator to not get updated, and likewise for the discriminator's batch norms when we are doing generator updates.
A separate is_training flag for the discriminator and generator, which is fed in, seems to be the way to do it.
Might want to mention this in the README at some point
I've cloned this repo and ran python -m pip install -e . --user in the directory, but when I prepare the dataset, "tensor2tensor not found!" is printed.
Does anyone have any ideas?
compare_gan$ compare_gan_prepare_datasets.sh
tensor2tensor not found!
compare_gan$ bash compare_gan/bin/compare_gan_prepare_datasets.sh
tensor2tensor not found!
So I uninstalled compare-gan and installed tensor2tensor:
compare_gan$ pip uninstall compare-gan
Uninstalling compare-gan-1.0:
Would remove:
/home/mp/miniconda3/lib/python3.6/site-packages/compare-gan.egg-link
Proceed (y/n)? y
Successfully uninstalled compare-gan-1.0
compare_gan$ pip install tensor2tensor
Requirement already satisfied: pyasn1<0.5.0,>=0.4.1 in /home/mp/miniconda3/lib/python3.6/site-packages (from pyasn1-modules>=0.2.1->google-auth>=1.4.1->google-api-python-client->tensor2tensor) (0.4.3)
Installing collected packages: tensor2tensor
Successfully installed tensor2tensor-1.6.3
And again using pip install -e . or python -m pip install -e . --user and executing compare_gan_prepare_datasets.sh:
$ compare_gan_prepare_datasets.sh
tensor2tensor not found!
Testing t2t on its own succeeds:
t2t-trainer \
--generate_data \
--data_dir=~/t2t_data \
--output_dir=~/t2t_train/mnist \
--problem=image_mnist \
--model=shake_shake \
--hparams_set=shake_shake_quick \
--train_steps=1000 \
--eval_steps=100
INFO:tensorflow:Finished evaluation at 2018-06-15-05:36:40
INFO:tensorflow:Saving dict for global step 1000: global_step = 1000, loss = 0.047610797, metrics-image_mnist/targets/accuracy = 0.9956, metrics-image_mnist/targets/accuracy_per_sequence = 0.9956, metrics-image_mnist/targets/accuracy_top5 = 1.0, metrics-image_mnist/targets/neg_log_perplexity = -0.015870268
INFO:tensorflow:Stop training model as max steps reached
Thank you very much for your good work.
I have some questions about SSGAN: where is the discriminator used in SSGAN, and where is the loss used?
Thanks again.
Hi, I've read your paper "High-Fidelity Image Generation With Fewer Labels". It's very fascinating work, but I have one question about the pretrained feature extractor F.
In the paper you say you use ResNet50 V2 with widening factor 16, which is abnormally large, with 4 times more parameters than the BigGAN discriminator. My computer cannot even construct it on CPU.
My resnet v2 code is from pytorch official repo https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py. I wonder whether I am misunderstanding your paper.
When I try to run the example config sndcgan_celebahq128.gin
like
python -m compare_gan.main --model_dir=hqtest --gin_config example_configs/sndcgan_celebahq128.gin
I get the error:
ValueError: Dataset celeb_a_hq_128 is not available.
I don't see reference to CelebaHQ in the datasets code either, although it does exist in some of the branches (ie "v2").
My understanding is that spectral_norm_update_ops in compare_gan/src/gans/ops.py is not used anywhere, and perhaps never was. Is my understanding correct?
Currently there are no baseline results published. It would really help me as a developer as it provides a head start into extending your code base:
Are you considering to provide those after an external event in the near future, e.g. after approving review of a conference submission?
My current tensorflow, cuda and cudnn are 1.13.2, 10.0 and 7.6.5, respectively. I also tried other versions (1.14 and 1.15 for tensorflow), but I got the same error messages. Details are described below.
When training SSGAN, I used the following gin configuration, which is slightly modified from examples/resnet_cifar10.gin:
dataset.name = "cifar10"
options.architecture = "resnet_cifar_arch"
options.batch_size = 64
options.gan_class = @SSGAN
options.lamba = 1
options.training_steps = 40000
options.z_dim = 128
# Generator
G.batch_norm_fn = @batch_norm
standardize_batch.decay = 0.9
standardize_batch.epsilon = 1e-5
# Discriminator
options.disc_iters = 5
D.spectral_norm = True
# Loss and optimizer
loss.fn = @non_saturating
penalty.fn = @no_penalty
SSGAN.g_lr = 0.0002
SSGAN.g_optimizer_fn = @tf.train.AdamOptimizer
SSGAN.rotated_batch_size = 64
tf.train.AdamOptimizer.beta1 = 0.5
tf.train.AdamOptimizer.beta2 = 0.999
Then the following error occurred:
Traceback (most recent call last):
File "main.py", line 133, in <module>
app.run(main)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "main.py", line 127, in main
eval_every_steps=FLAGS.eval_every_steps)
File "/home/hankook/Codes/compare_gan/compare_gan/runner_lib.py", line 337, in run_with_schedule
hooks=train_hooks)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2457, in train
rendezvous.raise_errors()
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
six.reraise(typ, value, traceback)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train
saving_listeners=saving_listeners)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model_default
saving_listeners)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1407, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 676, in run
run_metadata=run_metadata)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1171, in run
run_metadata=run_metadata)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
raise six.reraise(*original_exc_info)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
return self._sess.run(*args, **kwargs)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
run_metadata=run_metadata)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
return self._sess.run(*args, **kwargs)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/hankook/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Retval[7] does not have value
When using examples/resnet_cifar10.gin, the training code worked successfully. How can I fix this issue? Are there any gin configuration examples for SSGAN?
Seems that biggan can only be used unconditionally. This might be the case for infogan too. As far as I can tell, compare_gan does not make use of labels at all.
Here is my parse function:
def parse_function(self, proto):
    feature_map = tf.parse_single_example(
        proto,
        features = {'image': tf.FixedLenFeature([], tf.string),
                    'label': tf.FixedLenSequenceFeature([NUM_CLASSES], tf.int64, allow_missing=1, default_value=[0])
                    }
    )
    image = tf.decode_raw(feature_map['image'], tf.uint8)
    image = tf.reshape(image, (IMAGE_SIZE,IMAGE_SIZE,3))
    image = tf.cast(image, tf.float32) / 255.0
    label = feature_map['label']
    label = tf.cast(label, tf.int32)
    label = tf.reshape(label, (-1,NUM_CLASSES))
    return image, label
This starts throwing errors when the layers are being set up. It will only work if I return label as an integer (not a tensor).
Is architectures/infogan.py the architecture (with only two conv layers) used for all datasets in "Are GANs Created Equal"? If not, what are the architectures for cifar10 and celeba? Thanks.
When gradient penalty is combined with spectral normalization, both the generator and discriminator losses become NaN.
If I drop the gradient penalty and use only spectral normalization with the loss $\mathbb{E}_{x \sim q_{\text{data}}}[D(x)] - \mathbb{E}_{z \sim p(z)}[D(G(z))]$, both losses are normal.
Why is that? I am new to deep learning.
Thanks!
Traceback (most recent call last):
File "compare_gan_run_one_task", line 71, in <module>
tf.app.run()
File "C:\anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "compare_gan_run_one_task", line 65, in main
"inceptionv1_for_inception_score.pb")))
File "c:\users\e\python\compare_gan\compare_gan\src\eval_gan_lib.py", line 888, in RunTaskEval
tasks_to_run)
File "c:\users\e\python\compare_gan\compare_gan\src\eval_gan_lib.py", line 778, in RunCheckpointEval
result_dict.update(task.RunInSession(options, sess, gan, real_images))
File "c:\users\e\python\compare_gan\compare_gan\src\eval_gan_lib.py", line 524, in RunInSession
return ComputeAccuracyLoss(options, sess, gan, real_images)
File "c:\users\e\python\compare_gan\compare_gan\src\eval_gan_lib.py", line 310, in ComputeAccuracyLoss
train_accuracy = sum(train_predictions) / float(len(train_predictions))
ZeroDivisionError: float division by zero
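The last frame divides by len(train_predictions), so this error means that list was empty when ComputeAccuracyLoss ran; why it was empty is not visible from the traceback. As a minimal sketch of a guard (a hypothetical helper, not the repo's code), the division can be skipped when there is nothing to average:

```python
def mean_accuracy(predictions):
    """Return the fraction of correct predictions, or None if there are none.

    Hypothetical helper: `predictions` is assumed to be a sequence of 0/1
    values, as the sum(...) / float(len(...)) expression in the traceback
    suggests.
    """
    if len(predictions) == 0:
        # Nothing was evaluated; report "no value" instead of dividing by zero.
        return None
    return sum(predictions) / float(len(predictions))


print(mean_accuracy([1, 0, 1, 1]))
print(mean_accuracy([]))
```

A guard like this only hides the symptom; the real question is why no predictions were collected.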
As tf-nightly-gpu versions below 2 no longer exist on PyPI, I am using:
tensorboard==1.12.2
tensorflow==1.12.0
tensorflow-datasets==1.0.1
tensorflow-estimator==1.14.0
tensorflow-gan==0.0.0.dev0
tensorflow-gpu==1.12.0
tensorflow-hub==0.2.0
tensorflow-metadata==0.21.1
I am able to train using the BigGan network, however when it tries to evaluate I get the error
Traceback (most recent call last):
File "compare_gan/main.py", line 134, in <module>
app.run(main)
File "/home/odak/.conda/envs/tester_venv2/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/odak/.conda/envs/tester_venv2/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "compare_gan/main.py", line 128, in main
eval_every_steps=FLAGS.eval_every_steps)
File "/home/odak/compare_gan/compare_gan/runner_lib.py", line 349, in run_with_schedule
gan.as_module_spec(),
File "/home/odak/compare_gan/compare_gan/gans/modular_gan.py", line 310, in as_module_spec
self._module_fn, tags_and_args=tags_and_args)
File "/home/odak/.conda/envs/tester_venv2/lib/python3.6/site-packages/tensorflow_hub/native_module.py", line 189, in create_module_spec
if err: raise ValueError(err)
ValueError: A state-holding node x of a module's graph (e.g., a Variable op) must not be subject to a tf.colocate_with(y) constraint unless y is also a state-holding node.
Details: in the graph for tags set(), node 'generator/embed_y/kernel/ExponentialMovingAverage' has op 'VarHandleOp', which counts as state-holding, but Operation.colocation_groups() == [b'loc:@generator/embed_y/kernel/ExponentialMovingAverage/Read/ReadVariableOp']
Thank you for the extensive experiments and reliable implementations, which are hard to find these days!
I have a few questions on CelebA-HQ 128x128 dataset preprocessing, which was mentioned in "A Large-Scale Study on Regularization and Normalization in GANs, Kurach et al., ICML 2019"
In the section 2.6 of the paper, authors mention that the images were preprocessed by running the 128x128x3 version of the code provided from PGGAN repository.
Can you give some detail on how exactly the "128x128x3 version" was implemented?
Two possibilities in my mind are that
(a) replace all the 1024's of the code with 128, or
(b) resize preprocessed 1024x1024x3 images (original CelebA-HQ images) to 128x128x3
My questions are:
Thanks again for your invaluable contribution!
After installing all packages I get the error:
compare_gan/gans/modular_gan.py:400 _preprocess_fn *
features = {
gin/config.py:407 wrapper *
operative_parameter_values = _get_default_configurable_parameter_values(
gin/config.py:738 _get_default_configurable_parameter_values *
representable = _is_literally_representable(arg_vals[k])
gin/config.py:537 _is_literally_representable *
return _format_value(value) is not None
gin/config.py:520 _format_value *
if parse_value(literal) == value:
gin/config.py:1480 parse_value *
return config_parser.ConfigParser(value, ParserDelegate()).parse_value()
gin/config_parser.py:250 parse_value *
self._raise_syntax_error('Unable to parse value.')
gin/config_parser.py:287 _raise_syntax_error *
raise SyntaxError(msg, location)
tensorflow_core/python/autograph/impl/api.py:396 converted_call
return py_builtins.overload_of(f)(*args)
TypeError: 'NoneType' object is not iterable
Is there a place where I can access the triangle dataset mentioned in the paper, or its generator? Thanks
Thanks for open sourcing the code for this awesome paper!
I’m wondering if you used distributed training of the different GAN models during experimentation. If so, could you share an example of how to launch a distributed training job using compare_gan code?
Fashion-MNIST images are grayscale, 28x28x1, but Inception-v3 requires 3-channel input. Do you convert the Fashion-MNIST images to 3 channels? If so, how is it done in your code?
Thanks.
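One common way to feed single-channel images to a 3-channel network is to repeat the gray channel three times. Whether compare_gan does exactly this is not confirmed here. The sketch below uses plain nested lists and a hypothetical helper name, just to show the shape change from HxWx1 to HxWx3:

```python
def gray_to_rgb(image):
    """Repeat the single channel of an HxWx1 image three times -> HxWx3.

    Hypothetical helper: `image` is a nested list indexed as
    image[row][col][channel] with exactly one channel.
    """
    return [[[pixel[0]] * 3 for pixel in row] for row in image]


gray = [[[0.0], [0.5]],
        [[1.0], [0.25]]]          # a tiny 2x2x1 "image"
rgb = gray_to_rgb(gray)
print(len(rgb), len(rgb[0]), len(rgb[0][0]))
```

With array libraries the same thing is usually a single tile or repeat call along the channel axis.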
flake8 testing of https://github.com/google/compare_gan on Python 3.6.3
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./compare_gan/src/datasets.py:86:16: F821 undefined name 'xrange'
for i in xrange(range_start, range_end)
^
./compare_gan/src/gan_lib.py:198:38: E999 SyntaxError: invalid syntax
print " [*] Training started!"
^
./compare_gan/src/image_similarity.py:296:14: F821 undefined name 'xrange'
for k in xrange(len(power_factors)):
^
./compare_gan/src/simple_task_pb2.py:46:46: F821 undefined name 'unicode'
has_default_value=False, default_value=unicode("", "utf-8"),
^
./compare_gan/src/simple_task_pb2.py:53:46: F821 undefined name 'unicode'
has_default_value=False, default_value=unicode("", "utf-8"),
^
./compare_gan/src/gans/WGAN.py:42:35: E999 SyntaxError: invalid syntax
print "Using Adam optimizer."
^
./compare_gan/src/gans/gans_with_penalty.py:73:35: E999 SyntaxError: invalid syntax
print "Using Adam optimizer."
^
./compare_gan/src/gans/ops.py:141:35: E999 SyntaxError: invalid syntax
print " [*] Spectral norm layers"
^
4 E999 SyntaxError: invalid syntax
4 F821 undefined name 'xrange'
8
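All of the flagged lines are Python 2 constructs. A minimal sketch of the Python 3 equivalents follows; the function name is made up for illustration. Since simple_task_pb2.py is generated protobuf code, regenerating it with a newer protoc is probably preferable to editing it by hand.

```python
# Python 2 (as flagged)              Python 3 equivalent
#   print " [*] Training started!"     print(" [*] Training started!")
#   for i in xrange(a, b): ...         for i in range(a, b): ...
#   unicode("", "utf-8")               b"".decode("utf-8")  (or simply "")


def indices(range_start, range_end):
    # range() in Python 3 is lazy, like xrange() was in Python 2.
    return [i for i in range(range_start, range_end)]


# str is already Unicode in Python 3, so decoding bytes replaces unicode().
default_value = b"".decode("utf-8")

print(" [*] Training started!")
print(indices(2, 5), repr(default_value))
```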
I tried instantiating a simple WGAN_CP (based on your task 3), but got this error:
NotImplementedError: _model_fn() must be implemented in subclasses of AbstractGAN.
Looking at the code, there's only one file (compare_gan/src/gans/gans_with_penalty.py) which implements this function. I wonder how this worked for you – is this related to TF version or so?
I uploaded a gist with a testcase: https://gist.github.com/hmeine/d7ef0c38790dc885f3f73e621e14ff31
There's a button opening the notebook on colab, where you can see that the dependency versions are:
tensor2tensor 1.11.0
tensorflow 1.13.0rc1
In penalty_lib.py line 75, interpolates = x + alpha * (x_fake - x). In Algorithm 1 of the paper 'Improved Training of Wasserstein GANs', the interpolation is written as interpolates = alpha * x + (1-alpha)*x_fake. Are these two formulations equivalent, or is there some trick I missed?
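Expanding the first form gives x + alpha * (x_fake - x) = (1 - alpha) * x + alpha * x_fake, which is the paper's form with alpha replaced by 1 - alpha. If alpha is drawn uniformly from [0, 1] (assumed here, as in the paper's Algorithm 1), then 1 - alpha has the same distribution, so the two sample the same set of interpolated points. A small numeric check of the identity, with function names made up for illustration:

```python
import random


def interp_code(x, x_fake, alpha):
    # Form quoted from penalty_lib.py in the question.
    return x + alpha * (x_fake - x)


def interp_paper(x, x_fake, alpha):
    # Form quoted from Algorithm 1 of the paper in the question.
    return alpha * x + (1 - alpha) * x_fake


rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    x, x_fake, alpha = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.random()
    # Compare the code form at alpha with the paper form at 1 - alpha.
    gap = abs(interp_code(x, x_fake, alpha) - interp_paper(x, x_fake, 1 - alpha))
    max_gap = max(max_gap, gap)

print(max_gap)
```

The gap stays at floating-point rounding level, so the two differ only in which endpoint alpha = 0 corresponds to.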
Traceback (most recent call last):
File "compare_gan_run_one_task", line 71, in <module>
tf.app.run()
File "C:\anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "compare_gan_run_one_task", line 57, in main
gan_lib.run_with_options(options, task_workdir)
File "c:\users\e\python\compare_gan\compare_gan\src\gan_lib.py", line 255, in run_with_options
gan.train(sess, progress_reporter)
File "c:\users\e\python\compare_gan\compare_gan\src\gans\abstract_gan.py", line 441, in train
d_loss, g_loss = self.run_single_train_step(features, step, g_loss, sess)
TypeError: run_single_train_step() missing 1 required positional argument: 'sess'
Getting this error despite doing pip install -e .
$ python main.py --help
Traceback (most recent call last):
File "main.py", line 31, in <module>
from compare_gan import runner_lib
File "/home/maths/dual/mt6160653/biggan/compare_gan-master/compare_gan/runner_lib.py", line 30, in <module>
from compare_gan import eval_gan_lib
File "/home/maths/dual/mt6160653/biggan/compare_gan-master/compare_gan/eval_gan_lib.py", line 27, in <module>
from compare_gan import eval_utils
File "/home/maths/dual/mt6160653/biggan/compare_gan-master/compare_gan/eval_utils.py", line 35, in <module>
import tensorflow_gan as tfgan
File "/home/maths/dual/mt6160653/.local/lib/python3.6/site-packages/tensorflow_gan/__init__.py", line 72, in <module>
from tensorflow_gan.python import * # pylint: disable=wildcard-import
File "/home/maths/dual/mt6160653/.local/lib/python3.6/site-packages/tensorflow_gan/python/__init__.py", line 36, in <module>
from tensorflow_gan.python import estimator
File "/home/maths/dual/mt6160653/.local/lib/python3.6/site-packages/tensorflow_gan/python/estimator/__init__.py", line 24, in <module>
from .gan_estimator import *
File "/home/maths/dual/mt6160653/.local/lib/python3.6/site-packages/tensorflow_gan/python/estimator/gan_estimator.py", line 28, in <module>
from tensorflow_gan.python import contrib_utils as contrib
File "/home/maths/dual/mt6160653/.local/lib/python3.6/site-packages/tensorflow_gan/python/contrib_utils.py", line 48, in <module>
collection=tf.GraphKeys.GLOBAL_VARIABLES):
AttributeError: module 'tensorflow' has no attribute 'GraphKeys'
I have read this but I cannot alter the package tensorflow_gan.
Hello, if I want to run training on the GPU, how should I modify the program?
Trying to run the code with an example config and the described command line options, I get the following error when doing eval:
last_config_step = sorted([s for s in config_steps if s <= step])[-1]
TypeError: '<=' not supported between instances of 'int' and 'str'
Adding an explicit cast, int(step), fixed the problem. I assume this is a problem with my versions/libraries and not the code, but I can't make any sense of it.
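A minimal reproduction of the failing comparison and of the cast described above. The values of config_steps and step are invented for illustration; the report does not show where the string-typed step comes from in the actual run.

```python
config_steps = [0, 1000, 2000]   # hypothetical steps at which configs were saved
step = "1500"                    # hypothetical: a step value that arrived as a string

try:
    sorted([s for s in config_steps if s <= step])
    failed = False
except TypeError:
    failed = True                # int <= str is not defined in Python 3

step = int(step)                 # the explicit cast from the report
last_config_step = sorted([s for s in config_steps if s <= step])[-1]
print(failed, last_config_step)
```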
I notice that the shuffle buffer size is 10000 and is not exposed in the gin configurations. Since the original ImageNet dataset is sorted by class, this means that the images are fed to the unconditional GANs with class information. Could that contradict unconditional training, especially for SSGAN and S3GAN?
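To make the concern concrete, here is a pure-Python simulation of a streaming shuffle buffer, the general technique behind a buffered dataset shuffle. It is not the repository's input pipeline, and the dataset and buffer sizes are made up and much smaller than the real ones.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in the order a streaming shuffle buffer would emit them."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # fill phase
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # and replace it with the new one
    rng.shuffle(buffer)                  # drain what is left at the end
    yield from buffer

# Hypothetical class-sorted dataset: 3 classes, 1000 examples each.
labels = [c for c in range(3) for _ in range(1000)]
out = list(buffered_shuffle(labels, buffer_size=100))
first_classes = set(out[:500])
print(first_classes)
```

When the buffer is smaller than one class's run of examples, the first batches contain a single class. Whether a buffer of 10000 is small in that sense depends on how the records are actually ordered on disk, which this sketch does not check.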
Hello. After roughly one year and 200+ training runs, I finally narrowed down why compare_gan fails to achieve any kind of reasonable result for BigGAN-Deep, and why it fails to achieve the same FID for vanilla BigGAN.
The answer is that it's missing a + 1 in the conditional batch norm function. Specifically, you must add 1 to gamma so that it's centered around 1. Without this, the model is basically multiplied by zero to start with.
Good luck to whoever finds this, and godspeed. (Twitter thread with proof that BigGAN-Deep works now: https://twitter.com/theshawwn/status/1342684798905688065 feel free to DM me with questions or whatever.)
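A minimal NumPy sketch of the idea, not the repository's implementation: the function and weight names are hypothetical, and the projection weights are set to exactly zero to show the extreme case of "multiplied by zero to start with".

```python
import numpy as np

def conditional_batch_norm(x, y, w_gamma, w_beta, add_one=True, eps=1e-5):
    """Normalize x over the batch, then scale and shift per class.

    x: [batch, features]; y: [batch, classes] one-hot labels.
    w_gamma, w_beta: [classes, features] projection weights (hypothetical names).
    """
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    gamma = y @ w_gamma
    if add_one:
        gamma = gamma + 1.0   # center the scale around 1 instead of 0
    beta = y @ w_beta
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = np.eye(2)[rng.integers(0, 2, size=8)]
w_gamma = np.zeros((2, 4))   # extreme case: projections start at exactly zero
w_beta = np.zeros((2, 4))

with_one = conditional_batch_norm(x, y, w_gamma, w_beta, add_one=True)
without_one = conditional_batch_norm(x, y, w_gamma, w_beta, add_one=False)
print(np.abs(with_one).max() > 0, np.abs(without_one).max())
```

Without the offset, the activations vanish whenever the projected gamma is near zero; with it, the layer starts out close to plain batch normalization.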
Will you provide the pretrained models?
Hello TF team! Greatly appreciate the work, robust documentation, and examples. THANK YOU in advance!
I think the bug I've identified is related to either:
tensorboard 1.13.1
tensorflow 1.13.1
tensorflow-estimator 1.13.0
tensorflow-hub 0.5.0
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import IPython
from IPython.display import display
import numpy as np
import PIL.Image
import pandas as pd
import six
import tensorflow as tf
import tensorflow_hub as hub
def imgrid(imarray, cols=8, pad=1):
  # Arrange a batch of images [N, H, W, C] into a single grid image.
  pad = int(pad)
  assert pad >= 0
  cols = int(cols)
  assert cols >= 1
  N, H, W, C = imarray.shape
  rows = int(np.ceil(N / float(cols)))
  batch_pad = rows * cols - N
  assert batch_pad >= 0
  post_pad = [batch_pad, pad, pad, 0]
  pad_arg = [[0, p] for p in post_pad]
  imarray = np.pad(imarray, pad_arg, 'constant')
  H += pad
  W += pad
  grid = (imarray
          .reshape(rows, cols, H, W, C)
          .transpose(0, 2, 1, 3, 4)
          .reshape(rows * H, cols * W, C))
  # Trim the trailing padding; with pad == 0 the slice [:-0] would be empty.
  return grid[:-pad, :-pad] if pad else grid
def imshow(a, format='png', jpeg_fallback=True):
  a = np.asarray(a, dtype=np.uint8)
  if six.PY3:
    str_file = six.BytesIO()
  else:
    str_file = six.StringIO()
  PIL.Image.fromarray(a).save(str_file, format)
  png_data = str_file.getvalue()
  try:
    disp = display(IPython.display.Image(png_data))
  except IOError:
    if jpeg_fallback and format != 'jpeg':
      # Format the message before printing; calling .format() on the result
      # of print() fails on Python 3.
      print('Warning: image was too large to display in format "{}"; '
            'trying jpeg instead.'.format(format))
      return imshow(a, format='jpeg')
    else:
      raise
  return disp
class Generator(object):

  def __init__(self, module_spec):
    self._module_spec = module_spec
    self._sess = None
    self._graph = tf.Graph()
    self._load_model()

  @property
  def z_dim(self):
    return self._z.shape[-1].value

  @property
  def conditional(self):
    return self._labels is not None

  def _load_model(self):
    with self._graph.as_default():
      self._generator = hub.Module(self._module_spec, name="gen_module",
                                   tags={"gen", "bsNone"})
      input_info = self._generator.get_input_info_dict()
      inputs = {k: tf.placeholder(v.dtype, v.get_shape().as_list(), k)
                for k, v in self._generator.get_input_info_dict().items()}
      self._samples = self._generator(inputs=inputs, as_dict=True)["generated"]
      print("Inputs:", inputs)
      print("Outputs:", self._samples)
      self._z = inputs["z"]
      self._labels = inputs.get("labels", None)

  def _init_session(self):
    if self._sess is None:
      self._sess = tf.Session(graph=self._graph)
      self._sess.run(tf.global_variables_initializer())

  def get_noise(self, num_samples, seed=None):
    if np.isscalar(seed):
      np.random.seed(seed)
      return np.random.normal(size=[num_samples, self.z_dim])
    z = np.empty(shape=(len(seed), self.z_dim), dtype=np.float32)
    for i, s in enumerate(seed):
      np.random.seed(s)
      z[i] = np.random.normal(size=[self.z_dim])
    return z

  def get_samples(self, z, labels=None):
    with self._graph.as_default():
      self._init_session()
      feed_dict = {self._z: z}
      if self.conditional:
        assert labels is not None
        assert labels.shape[0] == z.shape[0]
        feed_dict[self._labels] = labels
      samples = self._sess.run(self._samples, feed_dict=feed_dict)
      return np.uint8(np.clip(256 * samples, 0, 255))
class Discriminator(object):

  def __init__(self, module_spec):
    self._module_spec = module_spec
    self._sess = None
    self._graph = tf.Graph()
    self._load_model()

  @property
  def conditional(self):
    return "labels" in self._inputs

  @property
  def image_shape(self):
    return self._inputs["images"].shape.as_list()[1:]

  def _load_model(self):
    with self._graph.as_default():
      self._discriminator = hub.Module(self._module_spec, name="disc_module",
                                       tags={"disc", "bsNone"})
      input_info = self._discriminator.get_input_info_dict()
      self._inputs = {k: tf.placeholder(v.dtype, v.get_shape().as_list(), k)
                      for k, v in input_info.items()}
      self._outputs = self._discriminator(inputs=self._inputs, as_dict=True)
      print("Inputs:", self._inputs)
      print("Outputs:", self._outputs)

  def _init_session(self):
    if self._sess is None:
      self._sess = tf.Session(graph=self._graph)
      self._sess.run(tf.global_variables_initializer())

  def predict(self, images, labels=None):
    with self._graph.as_default():
      self._init_session()
      feed_dict = {self._inputs["images"]: images}
      if "labels" in self._inputs:
        assert labels is not None
        assert labels.shape[0] == images.shape[0]
        feed_dict[self._inputs["labels"]] = labels
      return self._sess.run(self._outputs, feed_dict=feed_dict)
model_name = "SSGAN 128x128 (FID 20.6, IS 24.9)"
models = {
    "SSGAN 128x128": "https://tfhub.dev/google/compare_gan/ssgan_128x128/1",
}
NotFoundError Traceback (most recent call last)
in
11 tf.reset_default_graph()
12 print("Loading model...")
---> 13 sampler = Generator(module_spec)
14 print("Model loaded.")
in __init__(self, module_spec)
64 self._sess = None
65 self._graph = tf.Graph()
---> 66 self._load_model()
67
68 @property
in _load_model(self)
77 with self._graph.as_default():
78 self._generator = hub.Module(self._module_spec, name="gen_module",
---> 79 tags={"gen", "bsNone"})
80 input_info = self._generator.get_input_info_dict()
81 inputs = {k: tf.placeholder(v.dtype, v.get_shape().as_list(), k)
~/anaconda3/envs/base_ml/lib/python3.7/site-packages/tensorflow_hub/module.py in __init__(self, spec, trainable, name, tags)
168 name=self._name,
169 trainable=self._trainable,
--> 170 tags=self._tags)
171 # pylint: enable=protected-access
172
~/anaconda3/envs/base_ml/lib/python3.7/site-packages/tensorflow_hub/native_module.py in _create_impl(self, name, trainable, tags)
338 trainable=trainable,
339 checkpoint_path=self._checkpoint_variables_path,
--> 340 name=name)
341
342 def _export(self, path, variables_saver):
~/anaconda3/envs/base_ml/lib/python3.7/site-packages/tensorflow_hub/native_module.py in __init__(self, spec, meta_graph, trainable, checkpoint_path, name)
380
381 register_ops_if_needed({
--> 382 op.name for op in self._meta_graph.meta_info_def.stripped_op_list.op})
383
384 if _is_tpu_graph_function():
~/anaconda3/envs/base_ml/lib/python3.7/site-packages/tensorflow_hub/native_module.py in register_ops_if_needed(graph_ops)
820 "Graph ops missing from the python registry (%s) are also absent from "
821 "the c++ registry."
--> 822 % missing_ops.difference(set(cpp_registry_ops.keys())))
823
824
NotFoundError: Graph ops missing from the python registry ({'BatchMatMulV2'}) are also absent from the c++ registry.
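The report lists tensorflow 1.13.1. To my knowledge the BatchMatMulV2 op first shipped in TensorFlow 1.14, so a module exported with a newer TensorFlow would likely fail to load this way on 1.13.x. Treat that as an assumption to verify rather than a confirmed diagnosis. A small dependency-free sketch of the corresponding version check, where the helper name and the 1.14 threshold are my own:

```python
def version_at_least(version, minimum):
    """Return True if a dotted version string is >= a (major, minor) tuple."""
    parts = []
    for piece in version.split(".")[:2]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= tuple(minimum)

installed = "1.13.1"        # version listed in the report
assumed_minimum = (1, 14)   # assumption: first release that has BatchMatMulV2
ok = version_at_least(installed, assumed_minimum)
print(ok)
```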
Hi, I find that there are some details in the implementation of BigGAN worth paying attention to.
First, I notice that the default moments used for batchnorm during inference are the accumulated values:
compare_gan/compare_gan/architectures/arch_ops.py
Lines 299 to 304 in e0b739f
Does this mean that the decay hyperparameter for batchnorm is not used at all?
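For context, a sketch of the general difference between the two ways of obtaining inference-time statistics. This is illustrative NumPy with invented numbers, not the code from arch_ops.py: an exponential moving average depends on a decay hyperparameter, while moments accumulated as a plain average over a set of batches do not involve one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-batch means of one feature over 50 steps.
batch_means = rng.normal(loc=2.0, scale=0.5, size=50)

# Exponential moving average: the result depends on the decay value.
decay = 0.9
ema = 0.0
for m in batch_means:
    ema = decay * ema + (1.0 - decay) * m

# Accumulated moments: sum over the collected batches divided by their count.
accumulated = batch_means.sum() / len(batch_means)

print(ema, accumulated)
```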
Second, I also notice that the shortcuts are added only when in_channels != out_channels, whereas BigGAN-pytorch adds them when in_channels != out_channels or when the block is an upsampling or downsampling block.
Third, I find that BigGAN-pytorch omits the first ReLU activation in the first DBlock by setting preactivation=False, which is consistent with the implementation of WGAN-GP (I guess that, since the range you use for the input of D is [0, 1] instead of [-1, 1], the first ReLU does no harm). Also, in the shortcut connection of the first DBlock in WGAN-GP and BigGAN-pytorch, pooling comes before convolution, while in this repo convolution comes before pooling, as in the other DBlocks.
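On the ordering point: if the shortcut uses a 1x1 convolution and average pooling (both are assumptions here, not checked against either codebase), the two orders compute the same function because both operations are linear, and pooling first is simply cheaper. A NumPy check under those assumptions:

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 on an [H, W, C] array (H, W even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def conv_1x1(x, kernel, bias):
    """1x1 convolution: a per-pixel linear map from C_in to C_out channels."""
    return x @ kernel + bias

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))
kernel = rng.normal(size=(3, 5))
bias = rng.normal(size=(5,))

pool_then_conv = conv_1x1(avg_pool_2x2(x), kernel, bias)
conv_then_pool = avg_pool_2x2(conv_1x1(x, kernel, bias))
same = np.allclose(pool_then_conv, conv_then_pool)
print(same)
```

With a larger kernel or with max pooling, the two orders would generally differ.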
Do you think these discrepancies would have a significant influence on the performance of BigGAN?
Thanks
In resnet_biggan_deep.py, line 260, z has already been set to tf.concat([z, y], 1), while at line 285, when the parameters are fed to the ResNet block, z is treated as the original latent code and y is fed into the block again, which means that the ResNet block in fact receives y twice. This seems inconsistent with the setup in the BigGAN paper.