
stylegan-encoder's People

Contributors

aydao, oneiroid, pbaylies, puzer, shawwn, tkarras, yotamnitzan


stylegan-encoder's Issues

Is the ResNet trained on FFHQ?

Hi @pbaylies, sorry to ask such a naive question, but I'm a little confused about the procedure for training an encoder network. Here are my detailed questions.

  1. Can the encoder be trained in minibatches, or should it be optimized for each image?
    --- I found that if I set batch_size=8, the generated images look broken, so I wonder if the encoder should be optimized for every single image.

  2. Is the pretrained ResNet you provide trained on ImageNet or another dataset? Could you briefly explain the training details, such as which dataset it was trained on and which losses you adopted?

Error running encode_images.py on Colab

As in the earlier issue, I am getting the same error when trying to run the StyleGAN pickle file. My computer doesn't have a GPU and I intend to complete the job on Colab. Can you suggest a way forward?

OSError: Google Drive quota exceeded

I'm trying to run train_resnet.py but I keep getting this:

OSError: Google Drive quota exceeded

I tried to work around it by downloading the file from the default URL and uploading it to my own personal Google Drive, but then I get this error:

Downloading https://drive.google.com/open?id=1CeLTgZHwnwXG7rc0rXSxsf2uumBwRga6 ... done
Traceback (most recent call last):
  File "train_resnet.py", line 242, in <module>
    generator_network, discriminator_network, Gs_network = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '<'.

I looked this up and could not find anything useful.
Any help on how to move forward and be able to train a ResNet / use the pretrained one?
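For context, an invalid load key of '<' almost always means the downloaded "pickle" is actually an HTML page (e.g. a Google Drive quota or confirmation page). A minimal sketch to check this (the path is an example):

    path = 'karras2019stylegan-ffhq-1024x1024.pkl'  # example path, adjust to yours
    with open(path, 'rb') as f:
        head = f.read(16)
    if head.lstrip().startswith(b'<'):
        print('This file is HTML, not a pickle; re-download it.')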

Inverse network output shape

Picking up from here...

My understanding of the code in train_effnet.py is that you generate a training set in which the targets are the dlatent outputs of the StyleGAN mapping network, and the inputs are the images synthesized from those dlatents with the StyleGAN synthesis network.

The thing that confuses me is that the StyleGAN mapping network outputs a single [1, 512] vector that is then tiled up to [18, 512], so that all 18 layers are identical. But the effnet's architecture doesn't constrain its output similarly. It outputs an [18, 512] tensor whose layers aren't constrained to be identical to one another, and in practice it doesn't learn to make them so. (Example: target image, the composite image it generates, and each of the 18 layers synthesized individually.)

Am I understanding it correctly? If so, wouldn't you normally constrain the architecture of a network to the same rough domain as the targets in the training set? For example, if you were training a GAN with a 512x512 grayscale training set, wouldn't you set its output to 512x512, and not 512x512x3?
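For reference, a minimal sketch of the tiling the mapping output receives, plus a hypothetical way to impose the same constraint on the encoder head (Dense/RepeatVector are standard Keras layers; this is not the repo's actual architecture):

    import numpy as np

    # StyleGAN tiles one (1, 512) w across all 18 synthesis layers:
    w = np.random.randn(1, 512).astype(np.float32)  # stand-in for a mapping-network output
    dlatents = np.tile(w, (18, 1))                  # shape (18, 512), all rows identical

    # A hypothetical encoder head that enforces identical layers:
    # predict a single 512-vector and repeat it, instead of 18*512 free values.
    #   x = Dense(512)(features)
    #   x = RepeatVector(18)(x)   # (18, 512), layers forced to be identical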

train new feature classifier on face dataset problem

I'm trying to train a face feature classifier and then get a feature axis as in TL-GAN, but it seems that the dlib face detector does not work very well; it can fail on some facial images. I checked some face detection models and think MTCNN may be a better replacement for dlib in perceptual_model.py, as MTCNN is more accurate and faster.
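For anyone exploring this, a sketch of the proposed swap using the `mtcnn` pip package (untested against this repo; note MTCNN returns only 5 keypoints, while the alignment code expects dlib's 68 landmarks):

    # pip install mtcnn
    import numpy as np
    from PIL import Image
    from mtcnn import MTCNN

    detector = MTCNN()
    img = np.asarray(Image.open('face.jpg').convert('RGB'))  # example path
    for face in detector.detect_faces(img):
        x, y, w, h = face['box']    # bounding box
        points = face['keypoints']  # left_eye, right_eye, nose, mouth_left, mouth_right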

identity and age

When using the Learn_direction_in_latent_space notebook for a specific facial attribute like glasses, the identity and age also seem to change in the outputs generated from the trained direction vector.

Render npy not matching the generated images

@pbaylies , thanks for the awesome work.
Do you have any ideas why the saved latent representations are not matching the generated images? I tried to regenerate images from your saved latent vectors, but the new generated images are different.

dlatent_avg explain

Hi pbaylies,

Just want to say thank you for your wonderful repo. However, as I can see from the NVlabs original README, they say the truncation trick is disabled when we use the sub-networks of G directly, and we would have to apply it manually. Since your code uses the dlatents and only makes use of the synthesis network, I think the dlatent_avg part is missing the truncation trick.
Please correct me if I'm wrong.

Question about non human face data

Hi @pbaylies

I know that the faces model is the highest quality among the several models that exist, but I was interested in the cars model. It generates good images, and I was hoping to encode some existing images into the car model's latent space and interpolate between them. However, using the existing pre-trained cars model and training a ResNet a bit, the results did not do well at embedding images.
Any advice?
Thanks!

OSError: Unable to open file (file signature not found) when loading finetuned_resnet.h5

Hi @pbaylies ,

I'm trying to train the ResNet or EfficientNet in order to generate an estimate of the latent vector for an input image of a bedroom.
In my case, I'm using the StyleGAN trained on the LSUN Bedroom dataset at 256×256 to create a dataset of 10,000 synthesized bedroom images for training the ResNet.

Training Step:
Using your train_resnet.py code, the model is saved every 2 epochs in the following directory:
save_path = 'data/finetuned_resnet.h5'
model.save(save_path)

Testing Step:
When I try to load the above model in order to generate a latent vector for a specific bedroom image:
load_resnet='data/finetuned_resnet.h5'
ff_model = load_model(args.load_resnet)

The following error occurs:
File "/home/usuaris/imatge/mgrau/stylegan_env/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid fid = h5f.open(name, flags, fapl=fapl) File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper File "h5py/h5f.pyx", line 88, in h5py.h5f.open OSError: Unable to open file (file signature not found)

It seems like h5py doesn't recognize my file/model.
I've searched and tried some solutions without success:

  1. Downloading your model and trying to load it. (Same error)
  2. Changing the filename and directory. (Same error)

Is anyone else having the same problem?
Could you suggest a way to fix this bug?
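One quick check (a sketch, not specific to this repo): this "file signature not found" error usually means the file on disk isn't valid HDF5 at all, e.g. a truncated download or an HTML error page saved under the .h5 name.

    import h5py

    path = 'data/finetuned_resnet.h5'
    print(h5py.is_hdf5(path))  # False => truncated download or not HDF5 at all
    with open(path, 'rb') as f:
        print(f.read(8))       # a valid HDF5 file starts with b'\x89HDF\r\n\x1a\n'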

Thank you,

Stochastic Weight Averaging bug

I believe there is a bug in the implementation of stochastic weight averaging. Specifically, inside the apply_swa function for the network code, the scaling appears incorrect because the new model weights are scaled up by the epoch:

        tfutil.set_vars(tfutil.run({self.vars[name]: (src_net.vars[name] * epoch + self.vars[name])/(epoch + 1) for name in names}))

The result is that, regardless of which models swa.py reads in, the last pkl it reads is scaled so much that it overwrites pretty much all of the weights in the current model. For example, the tenth model will be scaled massively (i.e., by epoch=10) relative to the ones that come before it.

I believe the correct implementation would be:

        scale_new_data = 1.0 / (epoch + 1)
        scale_moving_average = (1.0 - scale_new_data)
        tfutil.set_vars(tfutil.run({self.vars[name]: (src_net.vars[name] * scale_new_data + self.vars[name] * scale_moving_average) for name in names}))

This is derived from the swa authors' repo, with the relevant portions here and here. I'd be happy to submit this fix in a pull request, and wanted to raise the issue here first in case you'd like to handle it differently.
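A quick numeric check of the two update rules (a standalone sketch, independent of the repo's tensor code):

    vals = [1.0, 5.0, 9.0]  # stand-ins for the weights of three pkls, read in order

    avg_buggy = avg_fixed = vals[0]
    for epoch, v in enumerate(vals[1:], start=1):
        avg_buggy = (v * epoch + avg_buggy) / (epoch + 1)  # current code: NEW weights scaled up
        avg_fixed = (v + avg_fixed * epoch) / (epoch + 1)  # running mean: OLD average scaled up

    print(avg_buggy)  # 7.0 -- pulled hard toward the last model (9.0)
    print(avg_fixed)  # 5.0 -- the true mean of 1, 5, 9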

Reproducing the bug should be straightforward. I encountered this problem while experimenting with running swa.py on wildly different models (as a kind of cheap transfer or regularization). For example, I was averaging together gwern's anime model with ak9250's fine art portrait model, among several others, and noticed that network_avg.pkl always produced samples matching whichever model was last among the input pkl files it read. That led me to inspect the code more closely and find the original SWA code. With the changes above, the output network_avg.pkl now works as expected, producing an average across the input models that appears close to what transfer learning would yield after a few ticks. And as you might expect, applying SWA to anime and fine art portraits creates some nightmare material: painterly people with weird cartoony anime eyes :)

Also, I'd like to say thanks for the excellent implementation and updates here. It's really clean and has been quite nice to work with.

Dlats stochastic clipping range

Hey! I've checked the distribution of the dlatent components (W space): they don't seem to be in the [-2, 2] range, and they're not normally distributed at all, contrary to what's said in the README.
It looks more like a gamma distribution in the range [-0.25, 1.5]. I hardcoded the latter range in your repo, and quality seems to improve.
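A sketch for checking this yourself (assumes a loaded StyleGAN network Gs, as used elsewhere in this repo):

    import numpy as np

    # Assumes Gs is a loaded StyleGAN network.
    Z = np.random.randn(1000, Gs.input_shape[1])
    W = Gs.components.mapping.run(Z, None, minibatch_size=32)  # (1000, 18, 512)
    w = W[:, 0, :].ravel()  # all 18 layers are identical straight out of the mapping net
    print(np.percentile(w, [0.1, 1, 50, 99, 99.9]))  # empirical range of W components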

Two bugs found

First, hats off for the amazing repo !
Second, I've identified two bugs you might want to fix.

  • Running with "--tile_dlatents=true" doesn't work for me. The shape of the dlatents seems wrong: it should be (batch_size, 1, 512) instead of (batch_size, 512) (see the sketch after this list).

  • After fixing this, I still wasn't able to run the code with lower StyleGAN resolutions (e.g with "--model_res=256"). The "create_variable_for_generator" function should get as input the correct "model_scale" instead of using default value 18.
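For the first bug, a sketch of the kind of fix meant (variable names are illustrative; the actual code in generator_model.py may differ):

    import numpy as np

    batch_size, model_scale = 4, 18
    dlatents = np.random.rand(batch_size, 512).astype(np.float32)  # tiled-dlatent estimate
    dlatents = dlatents[:, np.newaxis, :]              # (batch_size, 1, 512), as expected
    dlatents = np.tile(dlatents, (1, model_scale, 1))  # broadcast to (batch_size, 18, 512)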

These two bugs are very simple to resolve, I've fixed both locally and it works for me.
I'll be happy to contribute, please let me know if I should submit a PR.

Error while running encode_images.py

I am getting this error while trying to load the drive file in Google Colab and I can't find a solution! Is it because I am rate-limited and the file is too big? Is there any solution?

Downloading https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ ............ failed
Traceback (most recent call last):
  File "encode_images.py", line 241, in <module>
    main()
  File "encode_images.py", line 115, in main
    with dnnlib.util.open_url(args.model_url, cache_dir=config.cache_dir) as f:
  File "/content/stylegan-encoder/dnnlib/util.py", line 381, in open_url
    raise IOError("Google Drive quota exceeded")
OSError: Google Drive quota exceeded

Runs out of memory and crashes when encoding an image sequence.

When encoding a large number of images the encoding time will slowly increase until it becomes 2x-3x the time it took to encode the first image, then the script encode_images.py will crash. On my system it always crashes on the 56th image.

The culprit appears to be these lines in perceptual_model.py

self.sess.run(tf.assign(self.features_weight, weight_mask))
self.sess.run(tf.assign(self.ref_img_features, image_features))
self.sess.run(tf.assign(self.ref_weight, image_mask))
self.sess.run(tf.assign(self.ref_img, loaded_image))
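For context: calling tf.assign inside the per-image loop creates new graph ops on every call, so the TF1 graph (and memory use) grows without bound. A common fix, sketched here with illustrative names (this is not the repo's actual patch), is to build placeholder-fed assign ops once and only run them in the loop:

    # Build once, at model-construction time (one pair per variable):
    self.ref_img_ph = tf.placeholder(tf.float32, shape=self.ref_img.shape)
    self.assign_ref_img = tf.assign(self.ref_img, self.ref_img_ph)

    # In the per-image loop, run the pre-built op; no new graph nodes are added:
    self.sess.run(self.assign_ref_img, feed_dict={self.ref_img_ph: loaded_image})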

I posted a pull request on Puzer's original stylegan-encoder: Puzer#4

but I'm not familiar enough with your changes to know how to fix it. There is more information here: Puzer#3

The changes you have made and collected are a fantastic step forward and actually make frame to frame stylegan animations possible. A fix for this bug would go a long way to helping encode image sequences.

Confusing description of stochastic clipping in the CLI

Hey!
First of all thank you for all the brilliant work on this project!

One thing that confused me was the description of the clipping_threshold in your encode_images.py CLI

Stochastic clipping of gradient values outside of this threshold

However, this technique does not clip gradients; it clips the values of the optimized variable.
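For readers, a NumPy sketch of the idea (applied to the optimized dlatent variable after each step, not to gradients; this is a sketch of the technique, not the repo's exact code):

    import numpy as np

    def stochastic_clip(dlatents, threshold=2.0):
        # Resample (rather than clamp) variable components outside the threshold.
        mask = np.abs(dlatents) > threshold
        dlatents[mask] = np.random.uniform(-threshold, threshold, size=mask.sum())
        return dlatents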

age change is not precise

Hi, thank you for sharing your awesome repo; I've had lots of fun with it. While playing with it, I found that it's not precise when generating older faces. For example,

here is the original image:
[image: 00559]

and the generated aged face with age coeff=-2.0 is:
[image: 00559-1]

RuntimeError: cannot join current thread

Encoding process crashes when trying to exit the tqdm() loop:

[00:51, 51.92s/it]
Exception ignored in: <bound method tqdm.__del__ of img_01: loss 84.5358; lr 0.0064:   9% 18/200 [00:25<01:46,  1.70it/s]>

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 931, in __del__
    self.close()
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 1133, in close
    self._decr_instances(self)
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 496, in _decr_instances
    cls.monitor.exit()
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/usr/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

************ Latent code optimization finished! ***************

This might be related to this tqdm issue, but I'm not entirely sure; it might be another problem that I can't see from the traceback. I tried for an hour to fix it without luck.

To reproduce you can run this Colab Notebook.

StopIteration exception in encode_images.py

I'm using the StyleGAN_Encoder_Tutorial notebook in Google Colab, but when running the cell

!python encode_images.py aligned_images/ generated_images/ latent_representations/ \
  --vgg_url=https://rolux.org/media/stylegan/vgg16_zhang_perceptual.pkl \
  --batch_size=2

I get the following error:
Using TensorFlow backend.
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
Traceback (most recent call last):
  File "encode_images.py", line 243, in <module>
    main()
  File "encode_images.py", line 118, in main
    generator = Generator(Gs_network, args.batch_size, randomize_noise=args.randomize_noise)
  File "/content/stylegan2/encoder/generator_model.py", line 62, in __init__
    self.dlatent_variable = next(v for v in tf.global_variables() if 'learnable_dlatents' in v.name)
StopIteration

Encode only coarse dlatents

Hey @pbaylies thanks for this repo! :)
You suggested that, to predict pose, I could train a ResNet to predict just the coarse dlatents (Puzer#15).

Would it be something to do with this reshape?

x = Reshape((model_scale, 512))(x) # train against all dlatent values

Or changing the size of W?

W = Gs.components.mapping.run(Z, None, minibatch_size=minibatch_size) # Use mapping network to get unique dlatents for more variation.
dlatent_avg = Gs.get_var('dlatent_avg') # [component]
W = (W[np.newaxis] - dlatent_avg) * np.reshape([truncation, -truncation], [-1, 1, 1, 1]) + dlatent_avg # truncation trick and add negative image pair
W = np.append(W[0], W[1], axis=0)
W = W[:, :mod_r]
W = W.reshape((n*2, model_scale, 512))
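A sketch building on the variables in the snippet above (coarse_layers=4 is an arbitrary example; this is one possible approach, not a confirmed recipe):

    # Keep only the coarsest layers as regression targets:
    coarse_layers = 4
    W_coarse = W.reshape((n * 2, model_scale, 512))[:, :coarse_layers, :]

    # ...and size the model head to match, instead of all model_scale layers:
    #   x = Dense(coarse_layers * 512)(x)
    #   x = Reshape((coarse_layers, 512))(x)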

Thanks for your help, sorry I'm not from a ML background.

Conditional training

Thank you for open-sourcing this. Could you please add documentation on how to train the conditional model? Thanks!

Colab incompatibility with layer name

Current versions of Colab seem to expect a layer named G_synthesis_1/_Run/concat/concat:0 instead of G_synthesis_1/_Run/concat:0 in the generator. Thanks to @xsteenbrugge on Twitter and his following for discovering and reporting this issue!
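A hedged workaround sketch: look up the old tensor name first and fall back to the new one (TF1's get_tensor_by_name raises KeyError for a missing name):

    import tensorflow as tf

    graph = tf.get_default_graph()
    try:
        concat = graph.get_tensor_by_name('G_synthesis_1/_Run/concat:0')
    except KeyError:
        concat = graph.get_tensor_by_name('G_synthesis_1/_Run/concat/concat:0')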

W latent vector prediction model

Is it true that the model that predicts the W latent vector (i.e., train_resnet.py) can only be used as an initial estimate of W in the optimization process? I.e., is it not possible to use this model directly to predict W without the optimization process? (It looks like using it that way changes the person's identity.)

Encoding process stops too early

Sorry if this seems to be a stupid question.
I'm running this project in google colab.
When encode_images.py is running, it always stops at about 20 iterations even though I set it to run 200 iterations.
This is the command I used:
!python encode_images.py --batch_size=2 --output_video=True --load_resnet='data/finetuned_resnet.h5' --lr=0.01 --decay_rate=0.2 --iterations=200 --use_l1_penalty=0.1 aligned_images/ generated_images/ latent_representations/

reasons for having 18 (8) identical dlat vectors

Hey, very cool implementation; respect for the masking.
I think I got why Karras et al. map latents to 18 identical dlatent vectors; wondering if this is valid.
10 of the 18 are for noise, so we have 8 dlatents. If we imagine each dlatent is 1D, then StyleGAN maps the FFHQ face space to the line in 8D space that makes a 45° angle with all axes. So by forcing all FFHQ face embeddings onto this line, they make StyleGAN learn the whole 8D space.
I wonder if this means we should truncate not toward the average dlatent, but toward the dlatent that represents the closest point on that line.

Finding latent representation in other domain

First of all, thanks Pbaylies for sharing this great repo! Hopefully you can help me with the following.

How can I find a latent representation of myself in a portrait-art StyleGAN model? Right now, I can find a good latent representation of myself, but the generated image is not in portrait-art style: it is in the style of the original FFHQ StyleGAN model, just similar to my own image. How can I achieve what e.g. aiportraits.com does, showing the most similar representation of a person in a portrait-art StyleGAN model? I have tried playing with parameters and crossing over latent codes, but still without success.

Generator Error

Thank you very much for sharing the good code.

I keep getting the following error while running the example in colab.
I'm almost there, and I would appreciate your help.


ValueError                                Traceback (most recent call last)
in <module>()
     16 generator_network, discriminator_network, Gs_network = pickle.load(f)
     17
---> 18 generator = Generator(Gs_network, batch_size=1, randomize_noise=False)
     19
     20 model_res = 1024

(8 frames elided)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
866 tb = [x for x in tb if "tensorflow/python" not in x[0]][:5]
867 raise ValueError("%s Originally defined at:\n\n%s" %
--> 868 (err_msg, "".join(traceback.format_list(tb))))
869 found_var = self._vars[name]
870 if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable learnable_dlatents already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in init
self._traceback = tf_stack.extract_stack()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
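For reference, this usually happens when the cell that constructs Generator runs more than once in the same session, so the learnable_dlatents variable already exists in the default graph. A common Colab workaround (a sketch; assumes dnnlib.tflib is on the path as in the StyleGAN examples):

    import tensorflow as tf
    import dnnlib.tflib as tflib

    tf.reset_default_graph()  # discard the old graph, including learnable_dlatents
    tflib.init_tf()           # re-initialize TF as in the StyleGAN examples
    # ...then re-load the pickle and construct Generator again.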

VGG loss alone

Hi! I have a small question: how do I use the original VGG loss only with your code? I set the VGG loss weight to 1 and all the others to 0, but there is still some difference from Puzer's results, for example. Below are the results:

[images: Original (Original_1024), Puzer (Adam, 500 it), Pbaylies (VGG, 100 it)]

For some reason the scarf is not reproduced. I don't use a face mask. Can you please advise?

StyleGAN2 support

Hi,
Really like what you did here. I was wondering if you're planning to train a new ResNet model for StyleGAN2, or could guide me through the process; it would be interesting to see how it compares :)

feature axes correlated with each other

I once edited an image in the age or gender direction and got bad results. For example:

  • if I want to make a female older, increasing the age direction coefficient may make her look male

  • if I make a male older, glasses appear on his face

Obviously, the feature directions are correlated with each other. We could borrow ideas from TL-GAN, such as feature disentanglement (see the sketch after this paragraph). There are two scripts there which I found very helpful: feature_axis.py and script_label_regression.py.
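For reference, the disentanglement in TL-GAN's feature_axis.py is essentially orthogonalization of the direction vectors; a minimal sketch of the idea (not the script's exact code):

    import numpy as np

    def disentangle(axes):
        # axes: (n_features, latent_dim), one feature direction per row.
        # Gram-Schmidt: remove from each axis the components along the others,
        # so that, e.g., moving along 'age' no longer also moves along 'gender'.
        out = axes.astype(np.float64).copy()
        for i in range(len(out)):
            for j in range(i):
                out[i] -= out[i].dot(out[j]) * out[j]  # remove component along axis j
            out[i] /= np.linalg.norm(out[i])           # re-normalize
        return out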

purpose of train_resnet

Hi and thanks for your contribution.
What is the purpose of training or using the ResNet/EffNet model in encode_images.py if you already have the VGG perceptual model?

image alignment corrupts 32-bit images

When padding is needed, the code in face_alignment will corrupt images with an alpha channel. Fixed by:

img = PIL.Image.open(src_file).convert('RGBA').convert('RGB')

about efficientnet

Hi P,

In your effnet.py, the import from efficientnet import * is missing the .keras; it should be from efficientnet.keras import * or from efficientnet.tfkeras import *.

It may be that the original author changed the package structure.

I wish you smooth progress and look forward to further work.

learning directions

How would you get something like 'emotion'/'happiness' for the linear model? Something like:
y_emotion_data = np.array([x['faceAttributes']['emotion']['happiness'] for x in labels_data])?
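A sketch of fitting such a direction (assumes dlatent_data of shape (N, 18, 512) and Microsoft Face API-style labels_data, as in the Learn_direction notebook; the thresholding here is an illustrative choice):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    y = np.array([x['faceAttributes']['emotion']['happiness'] for x in labels_data])
    X = dlatent_data.reshape((len(y), -1))             # flatten to (N, 18*512)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(X, (y > 0.5).astype(int))                  # binarize the happiness score
    happiness_direction = clf.coef_.reshape((18, 512)) # direction to add to dlatents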
