mit-han-lab / data-efficient-gans

[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training

Home Page: https://arxiv.org/abs/2006.10738

License: BSD 2-Clause "Simplified" License

Python 94.93% Cuda 3.32% Shell 0.89% Dockerfile 0.06% C++ 0.80%

gans pytorch tensorflow data-efficient generative-adversarial-network image-generation neurips-2020

data-efficient-gans's Issues

Training seems to be stuck?

I am currently training on Colab using DiffAugment-StyleGAN2 with this command:

!python data-efficient-gans/DiffAugment-stylegan2/run_low_shot.py --dataset=images --num-gpus=1 \
  --DiffAugment=color,translation,cutout --resolution=256 \
  --total-kimg=100 --batch-size=8

The training seems to be stuck. It has been an hour and the output is still this:

16x16/Conv1_down     2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
16x16/Skip           262144    (?, 512, 8, 8)      (1, 1, 512, 512)
8x8/Conv0            2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
8x8/Conv1_down       2359808   (?, 512, 4, 4)      (3, 3, 512, 512)
8x8/Skip             262144    (?, 512, 4, 4)      (1, 1, 512, 512)
4x4/MinibatchStddev  -         (?, 513, 4, 4)      -               
4x4/Conv             2364416   (?, 512, 4, 4)      (3, 3, 513, 512)
4x4/Dense0           4194816   (?, 512)            (8192, 512)     
Output               513       (?,)                (512, 1)        
scores_out           -         (?,)                -               
---                  ---       ---                 ---             
Total                28864129                                      

Building TensorFlow graph...
Initializing logs...
Training for 100 kimg...

tick 0     kimg 0.0      lod 0.00  minibatch 8    time 34s          sec/tick 33.8    sec/kimg 1055.01 maintenance 0.0    gpumem 8.8
Downloading http://d36zk2xti64re0.cloudfront.net/stylegan1/networks/metrics/inception_v3_features.pkl ... done
network-snapshot-000000        time 4m 11s       fid5k-train 414.4864

Or is this normal?
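
For a rough sense of scale, the `sec/kimg` figure in the tick line above can be turned into a time estimate. The sketch below is only arithmetic on the values printed in that log (`sec/kimg 1055.01`) and in the command (`--total-kimg=100`). The helper name `estimate_hours` is made up for illustration, and tick 0 usually includes startup cost, so the real rate may well be faster.

```python
def estimate_hours(sec_per_kimg: float, total_kimg: float) -> float:
    """Projected training time in hours, assuming a constant sec/kimg rate."""
    return sec_per_kimg * total_kimg / 3600.0

# Values copied from the tick-0 line and the command line above.
hours = estimate_hours(sec_per_kimg=1055.01, total_kimg=100)
print(f"about {hours:.1f} hours")
```

At that rate the run would take on the order of a day, so an hour without a new tick line is not by itself a sign that training has hung.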

AttributeError: module 'tensorflow.compat.v2' has no attribute 'contrib'

When trying to train the model with a custom dataset, I got the error in the title. From what I could find, it means the TensorFlow version needs to be upgraded, but version 1.14 is required. Is this a contradiction? My installed version is 1.14.

Running Cifar10 stylegan experiment gives broadcasting error

I am running the command given in the GitHub repo:
python run_cifar.py --dataset=cifar10 --num-gpus=4 --DiffAugment=color,cutout

I am getting the following error log:

Dataset shape = [3, 32, 32]
Dynamic range = [0, 255]
Label size = 0
Traceback (most recent call last):
  File "run_cifar.py", line 166, in <module>
    main()
  File "run_cifar.py", line 160, in main
    run(**vars(args))
  File "run_cifar.py", line 90, in run
    dnnlib.submit_run(**kwargs)
  File "/home/msingh/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/msingh/DiffAugment-stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/msingh/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/msingh/DiffAugment-stylegan2/training/training_loop.py", line 148, in training_loop
    grid_size, grid_reals, grid_labels = misc.setup_snapshot_image_grid(training_set, **grid_args)
  File "/home/msingh/DiffAugment-stylegan2/training/misc.py", line 134, in setup_snapshot_image_grid
    reals[:], labels[:] = training_set.get_minibatch_np(gw * gh)
ValueError: could not broadcast input array from shape (1024) into shape (1024,0)
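
The final line of this traceback can be reproduced with NumPy alone, which may help narrow down where the shapes disagree. The log reports `Label size = 0`, so the sketch assumes the label buffer has shape `(1024, 0)` while the value assigned to it is a flat array of length 1024. Both arrays are hypothetical stand-ins, not the repository's actual variables.

```python
import numpy as np

# Hypothetical stand-ins for the two sides of the failing assignment.
labels = np.zeros((1024, 0), dtype=np.float32)  # 1024 samples, label size 0
returned = np.zeros(1024, dtype=np.float32)     # flat array of length 1024

try:
    labels[:] = returned  # trailing dimensions 1024 and 0 cannot be broadcast
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

If this matches the situation in the issue, the dataset is returning labels with a different rank than the snapshot grid expects for an unlabeled dataset.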

Training with custom datasets

When trying to train the model with a custom dataset, I specified the path to the dataset in "run_few_shot.py" with the argument "dataset", but I get an AssertionError in create_dataset at the line assert os.path.isdir(data_dir), even though I specified the right folder path.

can't pickle _thread.RLock objects

Hey there :)

I tried to follow the colab tutorial, and when I try to generate more images with increasing rows and cols to 25, like

generate('mit-han-lab:DiffAugment-stylegan2-100-shot-obama.pkl', num_rows=25, num_cols=25, seed=1000)

it gives me the error of

MaybeEncodingError                        Traceback (most recent call last)
<ipython-input-8-310a7f1bf6ff> in <module>()
----> 1 generate('mit-han-lab:DiffAugment-stylegan2-100-shot-obama.pkl', num_rows=25, num_cols=25, seed=1000)

2 frames
/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7eff39dd3128>'. Reason: 'TypeError("can't pickle _thread.RLock objects",)'

Is there a way to solve this?

Thanks
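
The last line of the error says that something being sent back from the worker process held a lock, and locks cannot be pickled. The sketch below reproduces only that underlying `TypeError` with the standard library. It does not identify which object in the notebook carries the lock, and the exact wording of the message differs between Python versions.

```python
import pickle
import threading

lock = threading.RLock()

try:
    pickle.dumps(lock)  # multiprocessing pickles results the same way
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

Because the `MaybeEncodingError` wraps an exception raised inside the worker, the original failure (for example one triggered by the larger 25 x 25 grid) is hidden behind this pickling problem.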

Better FID with smaller batch size

Hi, thanks for the quick release of the code. The following is not an issue, but an observation I made while playing around with the code. If we keep everything the same, and simply reduce the batch size to 16 (default is 32), the FID for the Obama dataset improves from 54.39 (reported in the paper) to 47.0032. Was there a trend with variation in batch size that the authors observed in the scenario of few-shot generation?

comparison against stylegan2-ada

How does the quality of this work compare with stylegan2-ada?

[Error] How do you add your custom dataset to train?

Sorry, I am fairly new to this. Based on my experience with StyleGAN2, you convert your images to tfrecords files, so I assumed the same applies to DiffAugment-StyleGAN2. I ran this in Colab:

!python run_cifar.py --metrics=none --mirror-augment=True --total-kimg=5000 --result-dir='/content/drive/My Drive/storage/mydataset' --num-gpus=1 --resume="/content/drive/My Drive/pretrainednetworksnapshot-0XXXX.pkl" --DiffAugment=color,cutout,translation --resolution=512 --dataset='/content/drive/My Drive/mydataset'

where mydataset is the folder that contains the tfrecords file. However, I am getting an error:

dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Traceback (most recent call last):
  File "run_cifar.py", line 171, in <module>
    main()
  File "run_cifar.py", line 165, in main
    run(**vars(args))
  File "run_cifar.py", line 95, in run
    dnnlib.submit_run(**kwargs)
  File "/content/data-efficient-gans-master/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/content/data-efficient-gans-master/DiffAugment-stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/content/data-efficient-gans-master/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/content/data-efficient-gans-master/DiffAugment-stylegan2/training/training_loop.py", line 158, in training_loop
    training_set = dataset.load_dataset(verbose=True, **dataset_args)
  File "/content/data-efficient-gans-master/DiffAugment-stylegan2/training/dataset.py", line 245, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**kwargs)
  File "/content/data-efficient-gans-master/DiffAugment-stylegan2/training/dataset.py", line 113, in __init__
    dset, info = tfds.load(name=self.name, data_dir=tfds_data_dir, split=split, with_info=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/registered.py", line 292, in load
    name, name_builder_kwargs = _dataset_name_and_kwargs_from_name_str(name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/registered.py", line 339, in _dataset_name_and_kwargs_from_name_str
    raise ValueError(_NAME_STR_ERR.format(name_str))
ValueError: Parsing builder name string /content/drive/My Drive/mydataset failed.
The builder name string must be of the following format:
  dataset_name[/config_name][:version][/kwargs]

  Where:

    * dataset_name and config_name are string following python variable naming.
    * version is of the form x.y.z where {x,y,z} can be any digit or *.
    * kwargs is a comma list separated of arguments and values to pass to
      builder.

  Examples:
    my_dataset
    my_dataset:1.2.*
    my_dataset/config1
    my_dataset/config1:1.*.*
    my_dataset/config1/arg1=val1,arg2=val2
    my_dataset/config1:1.2.3/right=True,foo=bar,rate=1.2

Then I read the readme for DiffAugment-StyleGAN2 which says

... After putting all images into a single folder, pass it to WHICH_DATASET, the images will be resized to the specified resolution if necessary, and then enjoy the outputs!

So I thought you would just put your images in that folder and pass it to WHICH_DATASET, and they would automatically be converted to a tfrecords file. I did just that, but I am still getting the same error as above. Please help.
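
The error text says that `dataset_name` must follow Python variable naming. As a rough approximation of that rule (this is not the TensorFlow Datasets parser itself), `str.isidentifier()` shows why a filesystem path is rejected while a plain name is accepted. The candidate strings are taken from the error message and the command above.

```python
# Approximate the "python variable naming" rule from the error message.
candidates = [
    "my_dataset",
    "cifar10",
    "/content/drive/My Drive/mydataset",
]

results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {'looks like a dataset name' if ok else 'not a valid name'}")
```

Since the traceback passes through `tfds.load`, the path was apparently treated as a registered dataset name rather than as a folder of images.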

How to change how often checkpoints are saved?

How can I change how often a checkpoint is saved, e.g. every 30 minutes or every hour?

OOM and CUDA errors

Hi,

I am getting an OOM error on Colab (P-100 16 GB RAM) with the following:

cd DiffAugment-stylegan2
python run_few_shot.py --dataset=100-shot-obama --num-gpus=1

Traceback (most recent call last):
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[32,128,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node GPU0/loss/D_1/256x256/Conv0/FusedBiasAct}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[TrainG/Apply0/cond_111/pred_id/_2541]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[32,128,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node GPU0/loss/D_1/256x256/Conv0/FusedBiasAct}}]]

So I tried it on 8x V100s.
With my dataset at 1024 it gave an OOM error, while the Obama dataset got as far as the output below and then gave another error.

tick 0     kimg 0.1      lod 0.00  minibatch 32   time 49s          sec/tick 49.1    sec/kimg 383.77  maintenance 0.0    gpumem 6.3
Downloading http://d36zk2xti64re0.cloudfront.net/stylegan1/networks/metrics/inception_v3_features.pkl ... done
network-snapshot-000000        time 3m 09s       fid5k-train 396.6058
2020-07-02 13:34:00.935725: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-07-02 13:34:00.935780: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Aborted (core dumped)

CUDA details:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

What is the max resolution supported on 16 GB RAM? Sorry to mix two issues. I can open a separate issue for the CUDA error if needed.
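
For reference, the size of the single tensor named in the OOM message can be worked out from its shape. The sketch below uses the shape `[32, 128, 256, 256]` from the log and assumes 4 bytes per element, since TensorFlow's `float` type is 32-bit. The batch size of 8 in the second calculation is only an illustrative alternative.

```python
from math import prod

BYTES_PER_FLOAT32 = 4

def tensor_gib(shape) -> float:
    """Size of a float32 tensor of the given shape, in GiB."""
    return prod(shape) * BYTES_PER_FLOAT32 / 2**30

reported = tensor_gib((32, 128, 256, 256))  # shape from the OOM message
smaller = tensor_gib((8, 128, 256, 256))    # same layer with a batch of 8

print(reported, smaller)
```

One activation of that layer is 1 GiB at batch size 32, and training keeps many such activations alive at once, so lowering the per-GPU batch size is the most direct way to reduce memory use.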

Testing samples for calculating FID on 100-shot tasks.

Hi,
Can you please share the testing samples (5k) used to calculate the FID on few-shot generation (AnimalFaces: cat and dog)? The Animal Faces dataset contains only 160 examples of cat and 389 examples of dog.
Thanks

DiffAug with other GAN architecture

Thank you for your wonderful work! I was wondering whether you have tried DiffAugment on other GAN architectures such as SNGAN. Thank you in advance!

DiffAugment-biggan-imagenet missing 'mock' dependency

I've started trying to run the ImageNet code on my TPU pod to validate that it works before trying higher resolutions with my anime datasets. The requirements appear to be missing the mock dependency, so it currently has to be pip-installed separately:

cd ./data-efficient-gans/DiffAugment-biggan-imagenet/
pip install -e .
pip install mock

On a side note, I am curious if you guys have any thoughts about why this fork works while the original compare_gan appears to not match the BigGAN numbers? Your paper clearly indicates that you trained the baseline BigGAN 3 times and got good FID/IS numbers (which are not copied from the original BigGAN paper), but our ImageNet runs with compare_gan never worked well. I copied over all the original files to look at the diff, and the only thing that seems meaningfully different is that all of your runs, data-augmented or not, use image-zooming rather than the original cropping code. I can't imagine how that could fix compare_gan but I also don't see what else could be the difference.

Custom Dataset of 800x600 or 600x450 resolution

Is it possible to use a custom dataset with this input for training and output?
How do you do this?

Wrong illustration

On the project page, there is a wrong illustration for "LSUN-Cat Generation".

Indeed, these two images are the same:

Images in [-0.5, 0.5] range expected, right?

Incompatible PKL's?

Hi. Is the PKL generated by run_few_shot not compatible with regular StyleGAN2? I tried to transfer learn from a PKL generated by run_few_shot using a regular StyleGAN2 repo and got the following error:

Building TensorFlow graph...
Traceback (most recent call last):
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 642, in set_shape
    unknown_shape)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shapes must be equal rank, but are 2 and 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/run_training.py", line 192, in <module>
    main()
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/run_training.py", line 187, in main
    run(**vars(args))
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/run_training.py", line 120, in run
    dnnlib.submit_run(**kwargs)
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/training/training_loop.py", line 220, in training_loop
    G_loss, G_reg = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_gpu_in, **G_loss_args)
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/dnnlib/util.py", line 256, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/training/loss.py", line 152, in G_logistic_ns_pathreg
    fake_images_out, fake_dlatents_out = G.get_output_for(latents, labels, is_training=True, return_dlatents=True)
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/dnnlib/tflib/network.py", line 221, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "<string>", line 206, in G_main
  File "/content/drive/My Drive/Nvidia_Stylegan2/stylegan2/dnnlib/tflib/network.py", line 221, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "<string>", line 281, in G_mapping
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 645, in set_shape
    raise ValueError(str(e))
ValueError: Shapes must be equal rank, but are 2 and 1

Generating images

Hey there,

I once trained on my custom dataset of 100 images with the configuration below (resolution 64):

!python3 run_100_shot.py --dataset=100-custom --resolution=64

The sample in the result directory is a single concatenated image consisting of 1024 small images. I could easily separate them and get roughly 1000 images. That is good!

The problem is that when I set --resolution=256, it samples only 120 small images. Is there any way to sample more images, for example 1024 again?

Thanks for your response

ImportError: This version of TensorFlow Datasets requires TensorFlow version >= 2.1.0; Detected an installation of version 1.15.0. Please upgrade TensorFlow to proceed.

Traceback (most recent call last):
  File "run_low_shot.py", line 16, in <module>
    from metrics import metric_base
  File "/content/data-efficient-gans/DiffAugment-stylegan2/metrics/metric_base.py", line 18, in <module>
    from training import dataset
  File "/content/data-efficient-gans/DiffAugment-stylegan2/training/dataset.py", line 14, in <module>
    import tensorflow_datasets as tfds
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/__init__.py", line 51, in <module>
    from tensorflow_datasets import __init__py3 as api
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/__init__py3.py", line 43, in <module>
    from tensorflow_datasets.core import tf_compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/__init__.py", line 21, in <module>
    tf_compat.ensure_tf_install()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/tf_compat.py", line 59, in ensure_tf_install
    "This version of TensorFlow Datasets requires TensorFlow "
ImportError: This version of TensorFlow Datasets requires TensorFlow version >= 2.1.0; Detected an installation of version 1.15.0. Please upgrade TensorFlow to proceed.

Error when train on Colab

I get 'ImportError: This version of TensorFlow Datasets requires TensorFlow version >= 2.1.0;' when I try to train in the Colab notebook.
But after upgrading, I got another error: 'no module named tensorflow.contrib'.

Do you have a sample Colab example with pytorch or tensorflow?

Do you have a sample Colab example with pytorch or tensorflow? That would be great!

overfitting issues

"The training length (default to 300k images) may be increased for larger datasets; note that there may be overfitting issues if the training is too long."

Regarding training for too long: is there a metric or rule of thumb for this based on the dataset size?

raise RuntimeError('No GPU devices found')

DiffAugment-stylegan2/dnnlib/tflib/custom_ops.py", line 52, in _get_cuda_gpu_arch_string
raise RuntimeError('No GPU devices found')
RuntimeError: No GPU devices found
My PC has three RTX 2080 Ti GPUs.
What is going wrong?

training with grayscale images

Hi there,

I wanted to train the network with grayscale images. To do so, after converting my images to grayscale, I changed num_channels to 1 in training.networks_stylegan2. However, I still observe the same number of network parameters during training as when my images were RGB. May I ask why? What did I do wrong?

Best Regards

Problem in training with partial data in DiffAugment-stylegan2

Thanks for your contribution!
I got errors when I ran the following command.

python run_cifar.py --dataset=cifar10 --num-samples=5000 --num-gpus=2 --DiffAugment=color,translation,cutout

I also found that this error only appears when num-samples is set.

The details are as follows:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
         [[{{node GPU0/DataFetch/IteratorGetNext}}]]
  (1) Out of range: End of sequence
         [[{{node GPU0/DataFetch/IteratorGetNext}}]]
         [[GPU0/DataFetch/IteratorGetNext/_3169]]
0 successful operations.
1 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_cifar.py", line 167, in <module>
    main()
  File "run_cifar.py", line 161, in main
    run(**vars(args))
  File "run_cifar.py", line 91, in run
    dnnlib.submit_run(**kwargs)
  File "/tf/code/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/tf/code/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/tf/code/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/tf/code/training/training_loop.py", line 291, in training_loop
    tflib.run(D_train_op, feed_dict)
  File "/tf/code/dnnlib/tflib/tfutil.py", line 31, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
         [[node GPU0/DataFetch/IteratorGetNext (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Out of range: End of sequence
         [[node GPU0/DataFetch/IteratorGetNext (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[GPU0/DataFetch/IteratorGetNext/_3169]]
0 successful operations.
1 derived errors ignored.

Original stack trace for 'GPU0/DataFetch/IteratorGetNext':
  File "run_cifar.py", line 167, in <module>
    main()
  File "run_cifar.py", line 161, in main
    run(**vars(args))
  File "run_cifar.py", line 91, in run
    dnnlib.submit_run(**kwargs)
  File "/tf/code/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/tf/code/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/tf/code/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/tf/code/training/training_loop.py", line 202, in training_loop
    reals_read, labels_read = training_set.get_minibatch_tf()
  File "/tf/code/training/dataset.py", line 171, in get_minibatch_tf
    return self._tf_iterator.get_next()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Envs:
Python 3.6
CUDA 10.0
tensorflow-gpu: 1.15.2
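
"Out of range: End of sequence" is what TensorFlow reports when an iterator has no more elements to return. The plain-Python sketch below is only an analogy for that behaviour: a finite iterator over 5,000 samples runs dry after one pass, while a cycled one never does. Whether the repository's pipeline stops repeating when `--num-samples` is set is an assumption, not something the log confirms, and the batch size of 32 is a made-up value.

```python
from itertools import cycle, islice

def count_batches(samples, batch_size, max_batches):
    """Pull up to max_batches full batches; stop early if the iterator runs dry."""
    pulled = 0
    for _ in range(max_batches):
        batch = list(islice(samples, batch_size))
        if len(batch) < batch_size:
            break  # analogous to "End of sequence"
        pulled += 1
    return pulled

num_samples, batch_size = 5000, 32  # batch size is hypothetical

finite = iter(range(num_samples))      # one pass over the subset
repeating = cycle(range(num_samples))  # the subset repeated indefinitely

finite_batches = count_batches(finite, batch_size, max_batches=1000)
repeating_batches = count_batches(repeating, batch_size, max_batches=1000)
print(finite_batches, repeating_batches)
```

In `tf.data` terms, the analogous thing to check would be whether the subset is followed by a `repeat()` before batching.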

Question about IS and FID score reported.

Hi.

Thank you for your wonderful work on GANs with DiffAugment!

I am wondering about the evaluation details of your work: the dataloader and the IS and FID values for the ImageNet 128-resolution experiments.

In the BigGAN paper and the authors' PyTorch implementation, the authors state that images are cropped along the long edge and then resized (please refer to lines 453-476 of https://github.com/ajbrock/BigGAN-PyTorch/blob/master/utils.py).

Also, for evaluation:

they use the 1,281,167 training images to calculate the moments (mean and covariance) for FID;
they split the 50,000 generated images into ten folds (5,000 samples each) and calculate IS on each fold.

However, your implementation of BigGAN on ImageNet:

uses the ImageNet validation images (50,000 in total) to calculate the moments for FID (as CRGAN and ICRGAN do);
uses the 50,000 generated images to calculate IS without splitting them into 10 folds.
As can be seen in "data-efficient-gans/DiffAugment-biggan-imagenet/compare_gan/datasets.py", lines 476-491, your pre-processing seems to use bounding box information (tf.image.sample_distorted_bounding_box).

Since all values reported in your paper are calculated with your implementation, this is not a problem.
I am just curious about the details of your evaluation.

Thank you:)

Sincerely,

Minguk Kang
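
To make the two IS protocols in this question concrete, here is a small sketch of Inception Score computed once over all samples versus averaged over folds. It uses the standard definition, exp of the mean KL(p(y|x) || p(y)), on toy class probabilities rather than real Inception outputs. The function names and the toy data are illustrative only and are not taken from either codebase.

```python
import math
from statistics import mean

def inception_score(probs):
    """exp of the mean KL(p(y|x) || p(y)) over the given predictions."""
    num_classes = len(probs[0])
    marginal = [mean(p[k] for p in probs) for k in range(num_classes)]
    kls = [
        sum(pk * math.log(pk / marginal[k]) for k, pk in enumerate(p) if pk > 0)
        for p in probs
    ]
    return math.exp(mean(kls))

def split_inception_score(probs, num_splits):
    """Score each contiguous fold separately, then average the fold scores."""
    fold = len(probs) // num_splits
    scores = [inception_score(probs[i * fold:(i + 1) * fold]) for i in range(num_splits)]
    return mean(scores)

# Toy predictions: 4 classes, fully confident, 5 samples per class, sorted by class.
toy = [[1.0 if k == c else 0.0 for k in range(4)] for c in range(4) for _ in range(5)]

whole = inception_score(toy)           # all 20 samples share one marginal
folds = split_inception_score(toy, 4)  # each fold sees a single class
print(whole, folds)
```

The sorted toy data is an extreme case chosen to make the difference visible. With shuffled samples the gap is much smaller, but the two protocols still do not produce identical numbers.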

TPU support

Hi, are you planning to add TPU support only for DiffAugment-biggan-imagenet and only on the PyTorch backend?

policy="translation" fails

Hi, I tested your code with two frameworks and two data formats, and in both cases the "translation" policy fails.

Envs:
Python: 3.7
Ubuntu: 1804
CUDA 10.1
Pytorch: 1.4.0

Test1:

I used your code in the BMSG-GAN repo and trained on a 3-channel dataset (102-flower), using only 755 samples as a check. When the policy included "translation", training failed:

RuntimeError: CuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input

But if I use only "color,cut" as the policy, it works fine!

Test2:

In my own GAN with a 1-channel dataset, the behavior is the same: without the "translation" policy it works fine; otherwise it fails.

FID Score improvement

Hi there,

I have a dataset of 600 images. I trained the styleGAN-1 with all of them (resolution = 256) and got the maximum FID = 86 at kimg = 7726.
To train your data-efficient-gans network, I used 110 images (resolution = 256); at kimg = 201 I got FID = 114, and at kimg = 301, FID = 119.
May I ask what I should do to get a better FID score? The tutorial says the data-efficient network is supposed to get a 1.7 times better FID than the normal StyleGAN. Thanks for your support and response.

Best

kimg meaning

What does kimg mean?
When I summarize a pkl, the kimg starts from 0.
Does it matter which kimg value training starts from?
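
As far as we understand the StyleGAN2-style training loop, kimg counts training progress in thousands of real images shown to the discriminator. Under that assumption, converting it to a number of minibatches is simple arithmetic. The helper name below is made up, and the values 100 kimg and batch size 8 are example settings like those passed as `--total-kimg=100 --batch-size=8` in a command elsewhere on this page.

```python
def kimg_to_minibatches(kimg: float, batch_size: int) -> int:
    """Number of minibatches needed to show kimg * 1000 images."""
    return int(kimg * 1000 / batch_size)

steps = kimg_to_minibatches(kimg=100, batch_size=8)
print(steps)
```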

nvcc fails on windows

op.h does not exist anywhere in my Python environment. Where do I get it?

(tf115py37) c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2>python run_low_shot.py --dataset=100-shot-obama --resolution=64
2020-12-20 02:40:10.999131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Local submit - run_dir: results\00018-DiffAugment-stylegan2-100-shot-obama-64-batch16-1gpu-color-translation-cutout
dnnlib: Running training.training_loop.training_loop() on localhost...
2020-12-20 02:40:13.518280: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-12-20 02:40:13.523440: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-12-20 02:40:13.862532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.8
pciBusID: 0000:2d:00.0
2020-12-20 02:40:13.862964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: GeForce GTX 970 major: 5 minor: 2 memoryClockRate(GHz): 1.342
pciBusID: 0000:23:00.0
2020-12-20 02:40:13.864667: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-12-20 02:40:13.867997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-12-20 02:40:13.870859: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-12-20 02:40:13.872201: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-12-20 02:40:13.876274: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-12-20 02:40:13.878809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-12-20 02:40:13.885505: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-12-20 02:40:13.885683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-12-20 02:40:17.775727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-20 02:40:17.775907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1
2020-12-20 02:40:17.777511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N
2020-12-20 02:40:17.777982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N
2020-12-20 02:40:17.778962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8550 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:2d:00.0, compute capability: 8.6)
2020-12-20 02:40:17.781185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3000 MB memory) -> physical GPU (device: 1, name: GeForce GTX 970, pci bus id: 0000:23:00.0, compute capability: 5.2)
Streaming data using training.dataset.TFRecordDataset...
Dataset shape = [3, 64, 64]
Dynamic range = [0, 255]
Label size = 0
Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Failed!
Traceback (most recent call last):
File "run_low_shot.py", line 171, in <module>
main()
File "run_low_shot.py", line 165, in main
run(**vars(args))
File "run_low_shot.py", line 94, in run
dnnlib.submit_run(**kwargs)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
return run_wrapper(submit_config)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\training\training_loop.py", line 155, in training_loop
G = tflib.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **G_args)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\network.py", line 97, in __init__
self._init_graph()
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\network.py", line 154, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\training\networks_stylegan2.py", line 195, in G_main
components.synthesis = tflib.Network('G_synthesis', func_name=globals()[synthesis_func], **kwargs)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\network.py", line 97, in __init__
self._init_graph()
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\network.py", line 154, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\training\networks_stylegan2.py", line 396, in G_synthesis_stylegan2
x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\training\networks_stylegan2.py", line 358, in layer
x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv, impl=impl)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\training\networks_stylegan2.py", line 106, in modulated_conv2d_layer
s = apply_bias_act(s, bias_var=mod_bias_var, impl=impl) + 1 # [BI] Add bias (initially 1).
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\training\networks_stylegan2.py", line 72, in apply_bias_act
return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, alpha=alpha, gain=gain, impl=impl)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\ops\fused_bias_act.py", line 68, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\ops\fused_bias_act.py", line 122, in _fused_bias_act_cuda
cuda_kernel = _get_plugin().fused_bias_act
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\ops\fused_bias_act.py", line 16, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\custom_ops.py", line 115, in get_plugin
_run_cmd(_prepare_nvcc_cli('"%s" --preprocess -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir)))
File "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\custom_ops.py", line 61, in _run_cmd
raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
RuntimeError: NVCC returned an error. See below for full command line and output log:

nvcc "c:\Programming\PythonNotebooks\DataEfficientGans\data-efficient-gans\DiffAugment-stylegan2\dnnlib\tflib\ops\fused_bias_act.cu" --preprocess -o "C:\Users\Tasha\AppData\Local\Temp\tmpjd6q1ev3\fused_bias_act_tmp.cu" --keep --keep-dir "C:\Users\Tasha\AppData\Local\Temp\tmpjd6q1ev3" --disable-warnings --include-path "C:\Users\Tasha.conda\envs\tf115py37\lib\site-packages\tensorflow_core\include" --include-path "C:\Users\Tasha.conda\envs\tf115py37\lib\site-packages\tensorflow_core\include\external\protobuf_archive\src" --include-path "C:\Users\Tasha.conda\envs\tf115py37\lib\site-packages\tensorflow_core\include\external\com_google_absl" --include-path "C:\Users\Tasha.conda\envs\tf115py37\lib\site-packages\tensorflow_core\include\external\eigen_archive" --include-path "C:\Users\Tasha\AppData\Local\Programs\Python\Python37\Lib\site-packages\tensorflow_core\include\tensorflow_core\core\framework" --compiler-bindir "C:/Program Files (x86)/Microsoft Visual Studio 14.0/vc/bin" 2>&1

fused_bias_act.cu
c:/Programming/PythonNotebooks/DataEfficientGans/data-efficient-gans/DiffAugment-stylegan2/dnnlib/tflib/ops/fused_bias_act.cu(9): fatal error C1083: Cannot open include file: 'tensorflow_core/core/framework/op.h': No such file or directory

Training on TPU for StyleGan2

Is it possible to use a Google Colab TPU to train StyleGAN2?

Code question about rand_cutout

In the rand_cutout method, I'm not able to follow what this line is doing:

data-efficient-gans/DiffAugment-biggan-cifar/DiffAugment_pytorch.py

Line 67 in 96d6d87

mask[grid_batch, grid_x, grid_y] = 0

Is it just indexing into the mask? grid_batch, grid_x, and grid_y each have multiple dimensions, which is throwing me off a bit.

Any help clarifying this would be much appreciated.
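
A small sketch of what that kind of assignment does, written with NumPy so it runs without PyTorch (PyTorch follows the same integer-array indexing rules for this pattern). The toy sizes, the hole size `cut`, and the per-sample offsets are made up for illustration and are not taken from `DiffAugment_pytorch.py`. When all three index arrays have the same shape, they are read elementwise as (batch, x, y) triples, and each triple addresses one element of the mask.

```python
import numpy as np

batch, height, width = 2, 4, 4      # toy sizes, chosen for illustration
cut = 2                             # side length of the square hole (assumed)
mask = np.ones((batch, height, width), dtype=np.float32)

# Three index arrays, all of shape (batch, cut, cut).
grid_batch, grid_x, grid_y = np.meshgrid(
    np.arange(batch), np.arange(cut), np.arange(cut), indexing="ij")

# Shift each sample's hole by its own (made-up) offset.
grid_x = grid_x + np.array([0, 2]).reshape(batch, 1, 1)
grid_y = grid_y + np.array([1, 2]).reshape(batch, 1, 1)

# One assignment zeroes mask[b, x, y] for every aligned triple (b, x, y).
mask[grid_batch, grid_x, grid_y] = 0

# The same operation written as explicit loops over the index arrays.
looped = np.ones_like(mask)
for idx in np.ndindex(grid_batch.shape):
    looped[grid_batch[idx], grid_x[idx], grid_y[idx]] = 0

print(np.array_equal(mask, looped))
print(mask[0])
```

So the multi-dimensional index arrays are just a vectorized way of listing every pixel that belongs to each sample's cutout square.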

Why wasn't random flip augmentation included?

Hi, I want to add random flip augmentation, but I am not sure whether it would help,
and I am not sure how to implement it.

Did you try random flip augmentation? I think it is the easiest augmentation to implement.
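
For reference, a minimal sketch of a per-sample random horizontal flip on a channels-first batch. `rand_flip` is a hypothetical helper, not a function from this repository, and it is written in NumPy only to show the indexing. A version meant to sit alongside the other DiffAugment operations would need the same logic in the framework's own tensor ops. Whether adding it improves results is not something this sketch can answer.

```python
import numpy as np

def rand_flip(x, rng):
    """Flip each image of an NCHW batch left-right with probability 0.5.

    Hypothetical helper for illustration; not part of the repository.
    """
    flip = rng.random(x.shape[0]) < 0.5           # one decision per sample
    flipped = x[:, :, :, ::-1]                    # reverse the width axis
    return np.where(flip[:, None, None, None], flipped, x)

rng = np.random.default_rng(0)
x = np.arange(2 * 1 * 2 * 3, dtype=np.float32).reshape(2, 1, 2, 3)
y = rand_flip(x, rng)
print(y.shape)
```

Note that the argument list quoted in the "Training time" issue includes a `--mirror-augment` option, which may already cover flipping of the training images.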

AssertionError when training my own dataset

When I try to train the model on a custom dataset, passing the dataset path to "run_low_shot.py" through the "dataset" argument, I get an AssertionError in create_dataset at assert os.path.isdir(data_dir), even though I specified the right folder path.
What is the right folder path? Thank you!
(My command is python run_low_shot.py --dataset=''z:\GraduateWork\data-efficient-gans-master\DiffAugment-stylegan2\datasets\opera'')

Error Report

Traceback (most recent call last):
File "z:/GraduateWork/data-efficient-gans-master/DiffAugment-stylegan2/run_low_shot.py", line 171, in <module>
main()
File "z:/GraduateWork/data-efficient-gans-master/DiffAugment-stylegan2/run_low_shot.py", line 165, in main
run(**vars(args))
File "z:/GraduateWork/data-efficient-gans-master/DiffAugment-stylegan2/run_low_shot.py", line 37, in run
dataset = dataset_tool.create_dataset(dataset, resolution)
File "z:\GraduateWork\data-efficient-gans-master\DiffAugment-stylegan2\training\dataset_tool.py", line 194, in create_dataset
assert os.path.isdir(data_dir)
AssertionError
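
One thing worth ruling out, as a guess rather than a confirmed diagnosis: the doubled single quotes around the path in the quoted command. cmd.exe does not treat single quotes as quoting characters, so if the command was run there, the quotes would reach Python as part of the argument and `os.path.isdir` would return False. The sketch below reproduces that effect with a temporary directory standing in for the dataset folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # What the script would receive if the shell passed the quotes through.
    quoted = "''" + data_dir + "''"
    plain_ok = os.path.isdir(data_dir)
    quoted_ok = os.path.isdir(quoted)

print(plain_ok, quoted_ok)
```

If that is the cause, passing the path without quotes (or with double quotes) should get past the assertion.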

OOM when allocating tensor with shape[4,32,1024,1024] where is the 32 coming from?

I have access to a V100 on which I successfully trained on a dataset of 512x512 images. However, when I moved up to 1024x1024, I had to reduce the batch size to 2 to avoid OOM errors. This resulted in monochromatic-looking images. I'm wondering where the 32 in:

(0) Resource exhausted: OOM when allocating tensor with shape[4,32,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

is coming from, and whether it can be reduced to 16 so that the batch size can be 4, which will hopefully result in colorful images.

Thank you for your help!
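
A quick way to see how much memory that one tensor needs, and what halving the second dimension would save. This assumes the shape is laid out as [batch, channels, height, width] and that "type float" means 4 bytes per element. `tensor_bytes` is a helper written for this note.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element

MIB = 2 ** 20
as_reported = tensor_bytes((4, 32, 1024, 1024))
half_channels = tensor_bytes((4, 16, 1024, 1024))
print(as_reported / MIB, half_channels / MIB)
```

Under that reading, the 32 is the number of feature maps at the 1024x1024 layers. The argument list quoted in the "Training time" issue has `--fmap-base` and `--fmap-max` options that look like the knobs for it, though that has not been verified here.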

Q: Why not use adjust_brightness of torchvision?

Why not use functions from torchvision instead of writing your own? Is it because adjust_brightness, adjust_saturation and adjust_contrast of torchvision are not differentiable? Because I thought they were. Thanks for your answer. :)

Replicating FFHQ architecture

The paper mentions that the "# feature maps at shallow layers is halved to match the architecture of ADA". I was skimming over the code but couldn't find where this was implemented. Thanks.

Training time

Hey there :)

Do you have any suggestions for reducing training time? For example, could we reduce the size of the network for images less complex than human portraits, or change any of the arguments below?

parser.add_argument('--dataset', help='Training dataset path', required=True)
parser.add_argument('--resolution', help='Specifies resolution', default=256, type=int)
parser.add_argument('--result-dir', help='Root directory for run results (default: %(default)s)', default='results', metavar='DIR')
parser.add_argument('--DiffAugment', help='Comma-separated list of DiffAugment policy', default='color,translation,cutout')
parser.add_argument('--num-gpus', help='Number of GPUs (default: %(default)s)', default=1, type=int, metavar='N')
parser.add_argument('--batch-size', help='Batch size', default=16, type=int, metavar='N')
parser.add_argument('--total-kimg', help='Training length in thousands of images (default: %(default)s)', metavar='KIMG', default=300, type=int)
parser.add_argument('--ema-kimg', help='Half-life of exponential moving average in thousands of images', metavar='KIMG', default=None, type=int)
parser.add_argument('--num-samples', help='Number of samples', default=None, type=int)
parser.add_argument('--gamma', help='R1 regularization weight', default=None, type=float)
parser.add_argument('--fmap-base', help='Number of feature maps', default=None, type=int)
parser.add_argument('--fmap-max', help='Maximum number of feature maps', default=None, type=int)
parser.add_argument('--latent-size', help='Latent size', default=None, type=int)
parser.add_argument('--mirror-augment', help='Mirror augment (default: %(default)s)', default=True, metavar='BOOL', type=_str_to_bool)
parser.add_argument('--impl', help='Custom op implementation (default: %(default)s)', default='cuda')
parser.add_argument('--metrics', help='Comma-separated list of metrics or "none" (default: %(default)s)', default='fid5k-train', type=_parse_comma_sep)
parser.add_argument('--resume', help='Resume checkpoint path', default=None)
parser.add_argument('--resume-kimg', help='Resume training length', default=0, type=int)
parser.add_argument('--num-repeats', help='Repeats of evaluation runs (default: %(default)s)', default=1, type=int, metavar='N')
parser.add_argument('--eval', help='Evalulate mode?', action='store_true')

I know that a more powerful GPU or a lower resolution reduces the training time, but I'm not sure about the other arguments.

Best

Comparison with few-shot GAN

Will there be an update comparing performance with few-shot GAN?
https://github.com/e-271/few-shot-gan

Loading just the discriminator model

Hi,

I tried loading the discriminator model and evaluating on some images but was running into some issues (don't have the error message handy right now but will try to update when I rerun the code), but I'm wondering if there's anything amiss with how I'm loading it:

`with tf.device('/gpu:0'):
        print('Constructing networks...')

        D_neg = tflib.Network('D', num_channels=num_channels, resolution=resolution, label_size=label_size, **D_args)
        resume_networks = misc.load_pkl(resume_pkl_neg)
        rG, rD, rGs = resume_networks
        D_neg.copy_vars_from(rD)

        dl=torch.utils.data.DataLoader(neg_dataset, batch_size=batch_size)
        d_scores_neg=[]
        for batch_index, (img, label) in enumerate(dl):
            # d_out_neg = D_neg.run(img.numpy(), is_training=False, minibatch_size=batch_size)
            with sess.as_default():
                d_out_neg = D_neg.get_output_for(tf.convert_to_tensor(img.numpy()), is_training=False).eval()
            d_scores_neg.extend(list(d_out_neg))
        d_scores_neg=np.array(d_scores_neg)`

Thanks.

Training won't start

Good day,
After building .tfrecords from an image set (1024x1024), I am trying to start training (TensorFlow version: 1.15.2; GPU: Tesla P100-PCIE-16GB):

!python3 run_ffhq.py \
--num-gpus=1 --resolution=1024 --latent-size 512 --DiffAugment="" \
--total-kimg 50 --mirror-augment=true \
--result-dir="/path/to/dir/" --dataset="/path/to/images/with/tfrecords/aligned/" \
--resume="/path/to/stylegan2-ffhq-config-f.pkl"  --fmap-base=8192 \

Training fails to start with the following error:

dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Dataset shape = [3, 1024, 1024]
Dynamic range = [0, 255]
Label size    = 0
Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
Traceback (most recent call last):
  File "run_ffhq.py", line 171, in <module>
    main()
  File "run_ffhq.py", line 165, in main
    run(**vars(args))
  File "run_ffhq.py", line 94, in run
    dnnlib.submit_run(**kwargs)
  File "/content/data-efficient-gans/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/content/data-efficient-gans/DiffAugment-stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/content/data-efficient-gans/DiffAugment-stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/content/data-efficient-gans/DiffAugment-stylegan2/training/training_loop.py", line 162, in training_loop
    G.copy_vars_from(rG)
  File "/content/data-efficient-gans/DiffAugment-stylegan2/dnnlib/tflib/network.py", line 324, in copy_vars_from
    tfutil.set_vars(tfutil.run({self.vars[name]: src_net.vars[name] for name in names}))
  File "/content/data-efficient-gans/DiffAugment-stylegan2/dnnlib/tflib/tfutil.py", line 217, in set_vars
    run(ops, feed_dict)
  File "/content/data-efficient-gans/DiffAugment-stylegan2/dnnlib/tflib/tfutil.py", line 31, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (3, 3, 512, 512) for Tensor 'G_synthesis/64x64/Conv0_up/weight/new_value:0', which has shape '(3, 3, 512, 256)'

I've tried different resolutions (256, 512) with the same error. The dataset is verified to be correct (unmodified StyleGAN2 runs on it and was successfully fine-tuned from it), and the FFHQ pkl is from the original source.

Can you please address the issue?
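
A possible explanation, sketched under assumptions: the shapes in the error match what the StyleGAN2 feature-map rule would give if the checkpoint was built with `fmap_base=16384` while the command above requests `--fmap-base=8192`. The rule used here (`fmap_base / 2**stage`, capped at `fmap_max=512`) and the 16384 value for the config-f checkpoint are recalled from the original StyleGAN2 code, not confirmed against this repository. `nf` and `conv0_up_shape` are helpers written for this note.

```python
import math

def nf(stage, fmap_base, fmap_max=512):
    """Feature maps at a stage: fmap_base / 2**stage, capped at fmap_max (assumed rule)."""
    return min(fmap_base // (2 ** stage), fmap_max)

def conv0_up_shape(resolution, fmap_base):
    """Assumed weight shape [kernel, kernel, in, out] of the Conv0_up layer."""
    res_log2 = int(math.log2(resolution))
    return (3, 3, nf(res_log2 - 2, fmap_base), nf(res_log2 - 1, fmap_base))

checkpoint = conv0_up_shape(64, fmap_base=16384)   # assumed checkpoint setting
requested = conv0_up_shape(64, fmap_base=8192)     # value from the command above
print(checkpoint, requested)

# First resolution at which the two settings disagree.
first_mismatch = next(
    res for res in (8, 16, 32, 64, 128, 256, 512, 1024)
    if conv0_up_shape(res, 16384) != conv0_up_shape(res, 8192))
print(first_mismatch)
```

If these assumptions hold, the mismatch starts at the 64x64 layer named in the error and does not depend on `--resolution`, which would fit the same error appearing at 256 and 512. Dropping `--fmap-base=8192` or setting it to match the checkpoint may be worth trying.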

Question about Generator Update

Hello, I tried to train a conditional SNGAN with and without DiffAugment on my own dataset, and got identical results when I ran inference. The FID and IS seem to be pretty close as well.

Based on this, I think the gradients from DiffAugment are not being properly backpropagated. Would you have any suggestions on how to check for gradient flow or other ideas to see whether DiffAugment is working or not?

How do you ensure that the results of two random operations are consistent?

data-efficient-gans/DiffAugment-stylegan2/training/loss.py

Lines 19 to 20 in ce3c9c4

 real_scores = D.get_output_for(DiffAugment(reals, policy=policy, channels_first=True), is_training=True) 

 fake_scores = D.get_output_for(DiffAugment(fakes, policy=policy, channels_first=True), is_training=True)

In your code, DiffAugment is applied twice. But the DiffAugment function calls torch.rand(), and its value is not the same across the two calls!

I tested this on one image by applying DiffAugment twice, and the results were different.

Comparison with the 3 other DiffAugment papers published

The core insight of this paper, doing data augmentation on both the reals and fakes while training D, has recently been published by (at least) 3 other papers: Zhao, Tran, and Karras (in that chronological order). A comparison and contrast with the differing results in those papers would be very useful for the README and future versions of this paper.

In particular, I would like to know: did you simply disable path length regularization in StyleGAN2 rather than work around the higher-order gradient issues? Why do you think your D-only augmentation diverged when Zhao (the first) does all their experiments with only D augmentation without any issue at all? Did you experiment with stronger or weaker settings for each data augmentation to understand if the stack of multiple data augmentations is collectively too weak or too strong? Also, one part of the paper seems ambiguous: how exactly are the data augmentations done - does it pick one augmentation at random per batch, one augmentation per image, or does it apply all 1/2/3 augmentations to each image as a stack? The paper seems to suggest, given the emphasis on strong augmentation, that it's applying as a stack, but it never actually seems to say (and looking at the source code didn't help).

No such file or directory: inception_moments.pkl

Hi, I am running differentiable augmentation with the BigGAN model and get this error:

File "/vf/users/duongdb/data-efficient-gans/DiffAugment-biggan-cifar/dnnlib/util.py", line 422, in open_file_or_url return open(file_or_url, 'rb') FileNotFoundError: [Errno 2] No such file or directory: 'SomeDataSetName_inception_moments.pkl'

where SomeDataSetName is my own dataset. Do I need to run calculate_inception_moments.py from ajbrock's BigGAN GitHub repository?

Thanks.

I would like to train your model on this fruits dataset: https://www.kaggle.com/moltean/fruits

It has about 80K images, 100 classes.

How long do you think it would take to train GANs on Colab for all classes? Does the model train one class at a time, or does it train on multiple classes at once?

Thanks for the help and great work!
