
stylegan-encoder's People

Contributors

aydao, oneiroid, pbaylies, puzer, shawwn, tkarras, yotamnitzan


stylegan-encoder's Issues

Is the ResNet trained on FFHQ?

Hi @pbaylies, sorry to ask such a naive question, but I'm a little confused about the procedure for training an encoder network. Here are my detailed questions.

  1. Can the encoder be trained in minibatches, or should it be optimized for each image?
    --- I found that if I set batch_size=8, the generated images look broken, so I wonder if the encoder should be optimized for every single image.

  2. Is the pretrained ResNet you provide trained on ImageNet or another dataset? Could you briefly explain the training details, such as which dataset it was trained on and which losses you adopted?

Error running encode_images.py on Colab

As in the earlier issue, I am getting the same error when trying to run the StyleGAN pickle file. My computer doesn't have a GPU and I intend to complete the job on Colab. Can you suggest a way forward?

OSError: Google Drive quota exceeded

I'm trying to run train_resnet.py but I keep getting this:

OSError: Google Drive quota exceeded

I tried to work around it by downloading the file from the default URL and uploading it to my own personal Google Drive, but then I get this error:

Downloading https://drive.google.com/open?id=1CeLTgZHwnwXG7rc0rXSxsf2uumBwRga6 ... done
Traceback (most recent call last):
  File "train_resnet.py", line 242, in <module>
    generator_network, discriminator_network, Gs_network = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '<'.

I looked this up and could not find anything useful.
Any help on how to move forward and be able to train a ResNet / use the pretrained one?
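For context, an invalid load key of '<' almost always means the downloaded "pickle" is actually an HTML page (e.g. a Google Drive quota or confirmation page). A minimal sketch to check this (the path is an example):

    path = 'karras2019stylegan-ffhq-1024x1024.pkl'  # example path, adjust to yours
    with open(path, 'rb') as f:
        head = f.read(16)
    if head.lstrip().startswith(b'<'):
        print('This file is HTML, not a pickle; re-download it.')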

Inverse network output shape

Picking up from here...

My understanding of the code in train_effnet.py is that you generate a training set in which the targets are the dlatent outputs of the StyleGAN mapping network, and the inputs are the images synthesized from those dlatents with the StyleGAN synthesis network.

The thing that confuses me is that the StyleGAN mapping network outputs a single [1, 512] vector that is then tiled up to [18, 512], so that all 18 layers are identical. But the effnet's architecture doesn't constrain its output similarly. It outputs an [18, 512] tensor whose layers aren't constrained to be identical to one another, and in practice it doesn't learn to make them so. (Example: target image, the composite image it generates, and each of the 18 layers synthesized individually.)

Am I understanding it correctly? If so, wouldn't you normally constrain the architecture of a network to the same rough domain as the targets in the training set? For example, if you were training a GAN with a 512x512 grayscale training set, wouldn't you set its output to 512x512, and not 512x512x3?
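For reference, a minimal sketch of the tiling the mapping output receives, plus a hypothetical way to impose the same constraint on the encoder head (Dense/RepeatVector are standard Keras layers; this is not the repo's actual architecture):

    import numpy as np

    # StyleGAN tiles one (1, 512) w across all 18 synthesis layers:
    w = np.random.randn(1, 512).astype(np.float32)  # stand-in for a mapping-network output
    dlatents = np.tile(w, (18, 1))                  # shape (18, 512), all rows identical

    # A hypothetical encoder head that enforces identical layers:
    # predict a single 512-vector and repeat it, instead of 18*512 free values.
    #   x = Dense(512)(features)
    #   x = RepeatVector(18)(x)   # (18, 512), layers forced to be identical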

train new feature classifier on face dataset problem

I'm trying to train a face feature classifier and then get a feature axis as in TL-GAN, but it seems that the dlib face detector does not work very well; it can fail on some facial images. I checked some face detection models and think MTCNN may be a better replacement for dlib in perceptual_model.py, as MTCNN is more accurate and faster.
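For anyone exploring this, a sketch of the proposed swap using the `mtcnn` pip package (untested against this repo; note MTCNN returns only 5 keypoints, while the alignment code expects dlib's 68 landmarks):

    # pip install mtcnn
    import numpy as np
    from PIL import Image
    from mtcnn import MTCNN

    detector = MTCNN()
    img = np.asarray(Image.open('face.jpg').convert('RGB'))  # example path
    for face in detector.detect_faces(img):
        x, y, w, h = face['box']    # bounding box
        points = face['keypoints']  # left_eye, right_eye, nose, mouth_left, mouth_right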

identity and age

When using the Learn_direction_in_latent_space notebook for a specific facial attribute like glasses, the identity and age also seem to change in the outputs generated from the trained direction vector.

Render npy not matching the generated images

@pbaylies , thanks for the awesome work.
Do you have any ideas why the saved latent representations are not matching the generated images? I tried to regenerate images from your saved latent vectors, but the new generated images are different.

dlatent_avg explain

Hi pbaylies,

Just want to say thank you for your wonderful repo. However, as I can see from the NVlabs original README, they say the truncation trick is disabled when we use the sub-networks of G directly, and we would have to apply it manually. Since your code uses the dlatents and only makes use of the synthesis network, I think the dlatent_avg part is missing the truncation trick.
Please correct me if I'm wrong.

Question about non human face data

Hi @pbaylies

I know that the faces model is the highest quality among the several models that exist, but I was interested in the cars model. It generates good images, and I was hoping to encode some existing images into the car model's latent space and interpolate between them. However, using the existing pre-trained cars model and training a ResNet a bit, the results did not do well at embedding images.
Any advice?
Thanks!

OSError: Unable to open file (file signature not found) when loading finetuned_resnet.h5

Hi @pbaylies ,

I'm trying to train the ResNet or EfficientNet in order to generate an estimate of the latent vector for an input image of a bedroom.
In my case, I'm using the StyleGAN trained on the LSUN Bedroom dataset at 256×256 to create a dataset of 10,000 synthesized bedroom images for training the ResNet.

Training Step:
Using your train_resnet.py code, the model is saved every 2 epochs in the following directory:
save_path = 'data/finetuned_resnet.h5'
model.save(save_path)

Testing Step:
When I try to load the above model in order to generate a latent vector for a specific bedroom image:
load_resnet='data/finetuned_resnet.h5'
ff_model = load_model(args.load_resnet)

The following error occurs:
File "/home/usuaris/imatge/mgrau/stylegan_env/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid fid = h5f.open(name, flags, fapl=fapl) File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper File "h5py/h5f.pyx", line 88, in h5py.h5f.open OSError: Unable to open file (file signature not found)

It seems like h5py doesn't recognize my file/model.
I've searched and tried some solutions without success:

  1. Downloading your model and trying to load it. (Same error)
  2. Changing the filename and directory. (Same error)

Is anyone else having the same problem?
Could you suggest a way to fix this bug?
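One quick check (a sketch, not specific to this repo): this "file signature not found" error usually means the file on disk isn't valid HDF5 at all, e.g. a truncated download or an HTML error page saved under the .h5 name.

    import h5py

    path = 'data/finetuned_resnet.h5'
    print(h5py.is_hdf5(path))  # False => truncated download or not HDF5 at all
    with open(path, 'rb') as f:
        print(f.read(8))       # a valid HDF5 file starts with b'\x89HDF\r\n\x1a\n'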

Thank you,

Stochastic Weight Averaging bug

I believe there is a bug in the implementation of stochastic weight averaging. Specifically, inside the apply_swa function for the network code, the scaling appears incorrect because the new model weights are scaled up by the epoch:

        tfutil.set_vars(tfutil.run({self.vars[name]: (src_net.vars[name] * epoch + self.vars[name])/(epoch + 1) for name in names}))

The result is that, regardless of which models swa.py reads in, the last pkl it reads is scaled so much that it overwrites pretty much all of the weights in the current model. For example, the tenth model will be scaled massively (i.e., by epoch=10) relative to the ones that come before it.

I believe the correct implementation would be:

        scale_new_data = 1.0 / (epoch + 1)
        scale_moving_average = (1.0 - scale_new_data)
        tfutil.set_vars(tfutil.run({self.vars[name]: (src_net.vars[name] * scale_new_data + self.vars[name] * scale_moving_average) for name in names}))

This is derived from the swa authors' repo, with the relevant portions here and here. I'd be happy to submit this fix in a pull request, and wanted to raise the issue here first in case you'd like to handle it differently.
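A quick numeric check of the two update rules (a standalone sketch, independent of the repo's tensor code):

    vals = [1.0, 5.0, 9.0]  # stand-ins for the weights of three pkls, read in order

    avg_buggy = avg_fixed = vals[0]
    for epoch, v in enumerate(vals[1:], start=1):
        avg_buggy = (v * epoch + avg_buggy) / (epoch + 1)  # current code: NEW weights scaled up
        avg_fixed = (v + avg_fixed * epoch) / (epoch + 1)  # running mean: OLD average scaled up

    print(avg_buggy)  # 7.0 -- pulled hard toward the last model (9.0)
    print(avg_fixed)  # 5.0 -- the true mean of 1, 5, 9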

Reproducing the bug should be straightforward. I encountered this problem while experimenting with running swa.py on wildly different models (as a kind of cheap transfer or regularization). For example, I was averaging together gwern's anime model with ak9250's fine art portrait model, among several others, and noticed that network_avg.pkl always produced samples matching whichever model was last among the input pkl files it read. That led me to inspect the code more closely and find the original SWA code. With the changes above, the output network_avg.pkl now works as expected, producing an average across the input models that appears close to what transfer learning would yield after a few ticks. And as you might expect, applying SWA to anime and fine art portraits creates some nightmare material: painterly people with weird cartoony anime eyes :)

Also, I'd like to say thanks for the excellent implementation and updates here. It's really clean and has been quite nice to work with.

Dlats stochastic clipping range

Hey! I've checked the distribution of the dlatent components (W space): they don't seem to be in the [-2, 2] range, and they're not normally distributed at all, contrary to what's said in the README.
It looks more like a gamma distribution in the range [-0.25, 1.5]. I hardcoded the latter range in your repo, and quality seems to improve.
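A sketch for checking this yourself (assumes a loaded StyleGAN network Gs, as used elsewhere in this repo):

    import numpy as np

    # Assumes Gs is a loaded StyleGAN network.
    Z = np.random.randn(1000, Gs.input_shape[1])
    W = Gs.components.mapping.run(Z, None, minibatch_size=32)  # (1000, 18, 512)
    w = W[:, 0, :].ravel()  # all 18 layers are identical straight out of the mapping net
    print(np.percentile(w, [0.1, 1, 50, 99, 99.9]))  # empirical range of W components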

Two bugs found

First, hats off for the amazing repo !
Second, I've identified two bugs you might want to fix.

  • Running with "--tile_dlatents=true" doesn't work for me. The shape of the dlatents seems wrong: it should be (batch_size, 1, 512) instead of (batch_size, 512) (see the sketch after this list).

  • After fixing this, I still wasn't able to run the code with lower StyleGAN resolutions (e.g with "--model_res=256"). The "create_variable_for_generator" function should get as input the correct "model_scale" instead of using default value 18.
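For the first bug, a sketch of the kind of fix meant (variable names are illustrative; the actual code in generator_model.py may differ):

    import numpy as np

    batch_size, model_scale = 4, 18
    dlatents = np.random.rand(batch_size, 512).astype(np.float32)  # tiled-dlatent estimate
    dlatents = dlatents[:, np.newaxis, :]              # (batch_size, 1, 512), as expected
    dlatents = np.tile(dlatents, (1, model_scale, 1))  # broadcast to (batch_size, 18, 512)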

These two bugs are very simple to resolve, I've fixed both locally and it works for me.
I'll be happy to contribute, please let me know if I should submit a PR.

Error while running encode_images.py

I am getting this error while trying to load the drive file in Google Colab and I can't find a solution! Is it because I am rate-limited and the file is too big? Is there any solution?

Downloading https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ ............ failed
Traceback (most recent call last):
  File "encode_images.py", line 241, in <module>
    main()
  File "encode_images.py", line 115, in main
    with dnnlib.util.open_url(args.model_url, cache_dir=config.cache_dir) as f:
  File "/content/stylegan-encoder/dnnlib/util.py", line 381, in open_url
    raise IOError("Google Drive quota exceeded")
OSError: Google Drive quota exceeded

Runs out of memory and crashes when encoding an image sequence.

When encoding a large number of images the encoding time will slowly increase until it becomes 2x-3x the time it took to encode the first image, then the script encode_images.py will crash. On my system it always crashes on the 56th image.

The culprit appears to be these lines in perceptual_model.py

self.sess.run(tf.assign(self.features_weight, weight_mask))
self.sess.run(tf.assign(self.ref_img_features, image_features))
self.sess.run(tf.assign(self.ref_weight, image_mask))
self.sess.run(tf.assign(self.ref_img, loaded_image))
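For context: calling tf.assign inside the per-image loop creates new graph ops on every call, so the TF1 graph (and memory use) grows without bound. A common fix, sketched here with illustrative names (this is not the repo's actual patch), is to build placeholder-fed assign ops once and only run them in the loop:

    # Build once, at model-construction time (one pair per variable):
    self.ref_img_ph = tf.placeholder(tf.float32, shape=self.ref_img.shape)
    self.assign_ref_img = tf.assign(self.ref_img, self.ref_img_ph)

    # In the per-image loop, run the pre-built op; no new graph nodes are added:
    self.sess.run(self.assign_ref_img, feed_dict={self.ref_img_ph: loaded_image})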

I posted a pull request on Puzer's original stylegan-encoder: Puzer#4

but I'm not familiar enough with your changes to know how to fix it. There is more information here: Puzer#3

The changes you have made and collected are a fantastic step forward and actually make frame to frame stylegan animations possible. A fix for this bug would go a long way to helping encode image sequences.

Confusing description of stochastic clipping in the CLI

Hey!
First of all thank you for all the brilliant work on this project!

One thing that confused me was the description of the clipping_threshold in your encode_images.py CLI

Stochastic clipping of gradient values outside of this threshold

However, this technique does not clip gradients; it clips the values of the optimized variable.
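For readers, a NumPy sketch of the idea (applied to the optimized dlatent variable after each step, not to gradients; this is a sketch of the technique, not the repo's exact code):

    import numpy as np

    def stochastic_clip(dlatents, threshold=2.0):
        # Resample (rather than clamp) variable components outside the threshold.
        mask = np.abs(dlatents) > threshold
        dlatents[mask] = np.random.uniform(-threshold, threshold, size=mask.sum())
        return dlatents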

age change is not precise

Hi, thank you for sharing your awesome repo; I've had lots of fun with it. While playing with it, I found that it's not precise when generating older faces. For example,

here is the original image:
[image: 00559]

and the generated aged face with age coeff=-2.0 is:
[image: 00559-1]

RuntimeError: cannot join current thread

Encoding process crashes when trying to exit the tqdm() loop:

[00:51, 51.92s/it]
Exception ignored in: <bound method tqdm.__del__ of img_01: loss 84.5358; lr 0.0064:   9% 18/200 [00:25<01:46,  1.70it/s]>

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 931, in __del__
    self.close()
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 1133, in close
    self._decr_instances(self)
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 496, in _decr_instances
    cls.monitor.exit()
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/usr/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

************ Latent code optimization finished! ***************

This might be related to this tqdm issue, but I'm not entirely sure; it might be another problem that I can't see from the traceback. I tried for an hour to fix it without luck.

To reproduce you can run this Colab Notebook.

StopIteration exception in encode_images.py

I'm using the StyleGAN_Encoder_Tutorial notebook in Google Colab, but when running the cell

!python encode_images.py aligned_images/ generated_images/ latent_representations/ \
  --vgg_url=https://rolux.org/media/stylegan/vgg16_zhang_perceptual.pkl \
  --batch_size=2

I get the following error:
Using TensorFlow backend.
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
Traceback (most recent call last):
  File "encode_images.py", line 243, in <module>
    main()
  File "encode_images.py", line 118, in main
    generator = Generator(Gs_network, args.batch_size, randomize_noise=args.randomize_noise)
  File "/content/stylegan2/encoder/generator_model.py", line 62, in __init__
    self.dlatent_variable = next(v for v in tf.global_variables() if 'learnable_dlatents' in v.name)
StopIteration

Encode only coarse dlatents

Hey @pbaylies thanks for this repo! :)
You suggested that, to predict pose, I could train a ResNet to predict just the coarse dlatents (Puzer#15).

Would it be something to do with this reshape?

x = Reshape((model_scale, 512))(x) # train against all dlatent values

Or changing the size of W?

W = Gs.components.mapping.run(Z, None, minibatch_size=minibatch_size) # Use mapping network to get unique dlatents for more variation.
dlatent_avg = Gs.get_var('dlatent_avg') # [component]
W = (W[np.newaxis] - dlatent_avg) * np.reshape([truncation, -truncation], [-1, 1, 1, 1]) + dlatent_avg # truncation trick and add negative image pair
W = np.append(W[0], W[1], axis=0)
W = W[:, :mod_r]
W = W.reshape((n*2, model_scale, 512))
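A sketch building on the variables in the snippet above (coarse_layers=4 is an arbitrary example; this is one possible approach, not a confirmed recipe):

    # Keep only the coarsest layers as regression targets:
    coarse_layers = 4
    W_coarse = W.reshape((n * 2, model_scale, 512))[:, :coarse_layers, :]

    # ...and size the model head to match, instead of all model_scale layers:
    #   x = Dense(coarse_layers * 512)(x)
    #   x = Reshape((coarse_layers, 512))(x)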

Thanks for your help, sorry I'm not from a ML background.

Conditional training

Thank you for open-sourcing this. Could you please add documentation on how to train the conditional model? Thanks!

Colab incompatibility with layer name

Current versions of Colab seem to expect a layer named G_synthesis_1/_Run/concat/concat:0 instead of G_synthesis_1/_Run/concat:0 in the generator. Thanks to @xsteenbrugge on Twitter and his following for discovering and reporting this issue!
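A hedged workaround sketch: look up the old tensor name first and fall back to the new one (TF1's get_tensor_by_name raises KeyError for a missing name):

    import tensorflow as tf

    graph = tf.get_default_graph()
    try:
        concat = graph.get_tensor_by_name('G_synthesis_1/_Run/concat:0')
    except KeyError:
        concat = graph.get_tensor_by_name('G_synthesis_1/_Run/concat/concat:0')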

W latent vector prediction model

Is it true that the model that predicts the W latent vector (i.e., train_resnet.py) can only be used as an initial estimate of W in the optimization process? I.e., is it not possible to use this model directly to predict W without the optimization process? (It looks like using it that way changes the person's identity.)

Encoding process stops too early

Sorry if this seems to be a stupid question.
I'm running this project in google colab.
When encode_images.py is running, it always stops at about 20 iterations even though I set it to run 200 iterations.
This is the command I used:
!python encode_images.py --batch_size=2 --output_video=True --load_resnet='data/finetuned_resnet.h5' --lr=0.01 --decay_rate=0.2 --iterations=200 --use_l1_penalty=0.1 aligned_images/ generated_images/ latent_representations/

reasons for having 18 (8) identical dlat vectors

Hey, very cool implementation; respect for the masking.
I think I got why Karras et al. map latents to 18 identical dlatent vectors; wondering if this is valid.
10 of the 18 are for noise, so we have 8 dlatents. If we imagine each dlatent is 1D, then StyleGAN maps the FFHQ face space to the line in 8D space that makes a 45° angle with all axes. So by forcing all FFHQ face embeddings onto this line, they make StyleGAN learn the whole 8D space.
I wonder if this means we should truncate not toward the average dlatent, but toward the dlatent that represents the closest point on that line.

Finding latent representation in other domain

First of all, thanks Pbaylies for sharing this great repo! Hopefully you can help me with the following.

How can I find a latent representation of myself in a portrait-art StyleGAN model? Right now, I can find a good latent representation of myself, but the generated image is not in portrait-art style: it is in the style of the original FFHQ StyleGAN model, just similar to my own image. How can I achieve what e.g. aiportraits.com does, showing the most similar representation of a person in a portrait-art StyleGAN model? I have tried playing with parameters and crossing over latent codes, but still without success.

Generator Error

Thank you very much for sharing the good code.

I keep getting the following error while running the example in colab.
I'm almost there, and I would appreciate your help.


ValueError                                Traceback (most recent call last)
in <module>()
     16 generator_network, discriminator_network, Gs_network = pickle.load(f)
     17
---> 18 generator = Generator(Gs_network, batch_size=1, randomize_noise=False)
     19
     20 model_res = 1024

(8 frames elided)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
866 tb = [x for x in tb if "tensorflow/python" not in x[0]][:5]
867 raise ValueError("%s Originally defined at:\n\n%s" %
--> 868 (err_msg, "".join(traceback.format_list(tb))))
869 found_var = self._vars[name]
870 if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable learnable_dlatents already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in init
self._traceback = tf_stack.extract_stack()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
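For reference, this usually happens when the cell that constructs Generator runs more than once in the same session, so the learnable_dlatents variable already exists in the default graph. A common Colab workaround (a sketch; assumes dnnlib.tflib is on the path as in the StyleGAN examples):

    import tensorflow as tf
    import dnnlib.tflib as tflib

    tf.reset_default_graph()  # discard the old graph, including learnable_dlatents
    tflib.init_tf()           # re-initialize TF as in the StyleGAN examples
    # ...then re-load the pickle and construct Generator again.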

VGG loss alone

Hi! I have a small question: how do I use the original VGG loss only with your code? I set the VGG loss weight to 1 and all the others to 0, but there is still some difference from Puzer's results, for example. Below are the results:

[images: Original (Original_1024), Puzer (Adam, 500 it), Pbaylies (VGG, 100 it)]

For some reason the scarf is not reproduced. I don't use a face mask. Can you please advise?

StyleGAN2 support

Hi,
Really like what you did here. I was wondering if you're planning to train a new ResNet model for StyleGAN2, or could guide me through the process; it would be interesting to see how it compares :)

feature axes correlated with each other

I once edited an image in the age or gender direction and got bad results. For example:

  • if I want to make a female older, increasing the age direction coefficient may make her look male

  • if I make a male older, glasses appear on his face

Obviously, the feature directions are correlated with each other. We could borrow ideas from TL-GAN, such as feature disentanglement (see the sketch after this paragraph). There are two scripts there which I found very helpful: feature_axis.py and script_label_regression.py.
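For reference, the disentanglement in TL-GAN's feature_axis.py is essentially orthogonalization of the direction vectors; a minimal sketch of the idea (not the script's exact code):

    import numpy as np

    def disentangle(axes):
        # axes: (n_features, latent_dim), one feature direction per row.
        # Gram-Schmidt: remove from each axis the components along the others,
        # so that, e.g., moving along 'age' no longer also moves along 'gender'.
        out = axes.astype(np.float64).copy()
        for i in range(len(out)):
            for j in range(i):
                out[i] -= out[i].dot(out[j]) * out[j]  # remove component along axis j
            out[i] /= np.linalg.norm(out[i])           # re-normalize
        return out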

purpose of train_resnet

Hi and thanks for your contribution.
What is the purpose of training or using the ResNet/EffNet model in encode_images.py if you already have the VGG perceptual model?

image alignment corrupts 32-bit images

When padding is needed, the code in face_alignment will corrupt images with an alpha channel. Fixed by:

img = PIL.Image.open(src_file).convert('RGBA').convert('RGB')

about efficientnet

Hi P,

In your effnet.py, the import from efficientnet import * is missing the .keras; it should be from efficientnet.keras import * or from efficientnet.tfkeras import *.

It may be that the original author changed the package structure.

I wish you smooth progress and look forward to further work.

learning directions

How would you get something like 'emotion'/'happiness' for the linear model? Something like:
y_emotion_data = np.array([x['faceAttributes']['emotion']['happiness'] for x in labels_data])?
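A sketch of fitting such a direction (assumes dlatent_data of shape (N, 18, 512) and Microsoft Face API-style labels_data, as in the Learn_direction notebook; the thresholding here is an illustrative choice):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    y = np.array([x['faceAttributes']['emotion']['happiness'] for x in labels_data])
    X = dlatent_data.reshape((len(y), -1))             # flatten to (N, 18*512)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(X, (y > 0.5).astype(int))                  # binarize the happiness score
    happiness_direction = clf.coef_.reshape((18, 512)) # direction to add to dlatents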
