encoder4editing's Introduction

Designing an Encoder for StyleGAN Image Manipulation (SIGGRAPH 2021)

Open In Colab

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.

Description

Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation" paper for both training and evaluation. The e4e encoder is specifically designed to complement existing image manipulation techniques performed over StyleGAN's latent space.

Recent Updates

2021.08.17: Add single style code encoder (use --encoder_type SingleStyleCodeEncoder).
2021.03.25: Add pose editing direction.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
  • Python 3

Installation

  • Clone the repository:
git clone https://github.com/omertov/encoder4editing.git
cd encoder4editing
  • Dependencies:
    We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/e4e_env.yaml (for example, create the environment with conda env create -f environment/e4e_env.yaml).

Inference Notebook

We provide a Jupyter notebook found in notebooks/inference_playground.ipynb that allows one to encode and perform several editings on real images using StyleGAN.

Pretrained Models

Please download the pre-trained models from the following links. Each e4e model contains the entire pSp framework architecture, including the encoder and decoder weights.

Path              Description
FFHQ Inversion    FFHQ e4e encoder.
Cars Inversion    Cars e4e encoder.
Horse Inversion   Horse e4e encoder.
Church Inversion  Church e4e encoder.

If you wish to use one of the pretrained models for training or inference, you may do so using the flag --checkpoint_path.

In addition, we provide various auxiliary models needed for training your own e4e model from scratch.

Path            Description
FFHQ StyleGAN   StyleGAN model pretrained on FFHQ, taken from rosinality, with 1024x1024 output resolution.
IR-SE50 Model   Pretrained IR-SE50 model, taken from TreB1eN, used in our ID loss during training.
MOCOv2 Model    Pretrained ResNet-50 model trained with MOCOv2, used in our similarity loss for domains other than human faces during training.

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/paths_config.py.
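For orientation, the relevant entries might look roughly like the sketch below. The key names and file names here are illustrative assumptions, so check configs/paths_config.py itself for the exact ones.

model_paths = {
    'stylegan_ffhq': 'pretrained_models/stylegan2-ffhq-config-f.pt',
    'ir_se50': 'pretrained_models/model_ir_se50.pth',
    'moco': 'pretrained_models/moco_v2_800ep_pretrain.pth.tar'
}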

Training

To train the e4e encoder, make sure the paths to the required models, as well as the training and testing data, are configured in configs/paths_config.py and configs/data_configs.py.

Training the e4e Encoder

python scripts/train.py \
--dataset_type cars_encode \
--exp_dir new/experiment/directory \
--start_from_latent_avg \
--use_w_pool \
--w_discriminator_lambda 0.1 \
--progressive_start 20000 \
--id_lambda 0.5 \
--val_interval 10000 \
--max_steps 200000 \
--stylegan_size 512 \
--stylegan_weights path/to/pretrained/stylegan.pt \
--workers 8 \
--batch_size 8 \
--test_batch_size 4 \
--test_workers 4 

Training on your own dataset

In order to train the e4e encoder on a custom dataset, perform the following adjustments:

  1. Insert the paths to your train and test data into the dataset_paths variable defined in configs/paths_config.py:
dataset_paths = {
    'my_train_data': '/path/to/train/images/directory',
    'my_test_data': '/path/to/test/images/directory'
}
  2. Configure a new dataset under the DATASETS variable defined in configs/data_configs.py:
DATASETS = {
   'my_data_encode': {
        'transforms': transforms_config.EncodeTransforms,
        'train_source_root': dataset_paths['my_train_data'],
        'train_target_root': dataset_paths['my_train_data'],
        'test_source_root': dataset_paths['my_test_data'],
        'test_target_root': dataset_paths['my_test_data']
    }
}

Refer to configs/transforms_config.py for the transformations applied to the train and test images during training.

  3. Finally, run a training session with --dataset_type my_data_encode.

Inference

Having trained your model, you can use scripts/inference.py to apply it to a set of images.
For example,

python scripts/inference.py \
--images_dir=/path/to/images/directory \
--save_dir=/path/to/saving/directory \
path/to/checkpoint.pt 

Latent Editing Consistency (LEC)

As described in the paper, we suggest a new metric, Latent Editing Consistency (LEC), for evaluating the encoder's performance. We provide an example for calculating the metric over the FFHQ StyleGAN using the aging editing direction in metrics/LEC.py.

To run the example:

cd metrics
python LEC.py \
--images_dir=/path/to/images/directory \
path/to/checkpoint.pt 

Acknowledgments

This code borrows heavily from pixel2style2pixel.

Citation

If you use this code for your research, please cite our paper Designing an Encoder for StyleGAN Image Manipulation:

@article{tov2021designing,
  title={Designing an Encoder for StyleGAN Image Manipulation},
  author={Tov, Omer and Alaluf, Yuval and Nitzan, Yotam and Patashnik, Or and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2102.02766},
  year={2021}
}

encoder4editing's Issues

Pretrained discriminator model

Thank you for the nice work.

I was wondering whether you have, and could share, the pretrained discriminator .pt file.

Thank you.

resume_training_from_ckpt is missing some parameters

First of all, it's a great library

coach.py

    def load_from_train_checkpoint(self, ckpt):
        print('Loading previous training data...')
        self.global_step = ckpt['global_step'] + 1
        self.best_val_loss = ckpt['best_val_loss']
        self.net.load_state_dict(ckpt['state_dict'])

        if self.opts.keep_optimizer:
            self.optimizer.load_state_dict(ckpt['optimizer'])
        if self.opts.w_discriminator_lambda > 0:
            self.discriminator.load_state_dict(ckpt['discriminator_state_dict'])
            self.discriminator_optimizer.load_state_dict(ckpt['discriminator_optimizer_state_dict'])
        if self.opts.progressive_steps:
            self.check_for_progressive_training_update(is_resume_from_ckpt=True)
        print(f'Resuming training from step {self.global_step}')

The keys global_step, best_val_loss, discriminator_state_dict, and discriminator_optimizer_state_dict are not stored in the saved checkpoint.
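For reference, below is a minimal sketch of what load_from_train_checkpoint() expects the checkpoint dict to contain. Whether and where the training loop actually saves these fields is exactly what this issue is about, and the variable names are placeholders.

import torch

checkpoint = {
    'state_dict': net.state_dict(),                  # e4e / pSp weights
    'global_step': global_step,                      # training step counter
    'best_val_loss': best_val_loss,                  # best validation loss so far
    'optimizer': optimizer.state_dict(),             # read when --keep_optimizer is set
    'discriminator_state_dict': discriminator.state_dict(),
    'discriminator_optimizer_state_dict': discriminator_optimizer.state_dict()
}
torch.save(checkpoint, 'checkpoint_with_resume_fields.pt')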

Error when trying to use encoder trained on own dataset

After training the encoder on my own dataset and trying to use it for inference, I get the following error:

Loading e4e over the pSp framework from checkpoint: e4e_ffhq_encode.pt
Traceback (most recent call last):
  File "scripts/train.py", line 88, in <module>
    main()
  File "scripts/train.py", line 28, in main
    coach = Coach(opts, previous_train_ckpt)
  File "./training/coach.py", line 39, in __init__
    self.net = pSp(self.opts).to(self.device)
  File "./models/psp.py", line 28, in __init__
    self.load_weights()
  File "./models/psp.py", line 43, in load_weights
    self.encoder.load_state_dict(get_keys(ckpt, 'encoder'), strict=True)
  File "/opt/conda/envs/e4e_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Encoder4Editing:
  Unexpected key(s) in state_dict: "styles.16.convs.0.weight", "styles.16.convs.0.bias", "styles.16.convs.2.weight", "styles.16.convs.2.bias", "styles.16.convs.4.weight", "styles.16.convs.4.bias", "styles.16.convs.6.weight", "styles.16.convs.6.bias", "styles.16.convs.8.weight", "styles.16.convs.8.bias", "styles.16.convs.10.weight", "styles.16.convs.10.bias", "styles.16.linear.weight", "styles.16.linear.bias", "styles.17.convs.0.weight", "styles.17.convs.0.bias", "styles.17.convs.2.weight", "styles.17.convs.2.bias", "styles.17.convs.4.weight", "styles.17.convs.4.bias", "styles.17.convs.6.weight", "styles.17.convs.6.bias", "styles.17.convs.8.weight", "styles.17.convs.8.bias", "styles.17.convs.10.weight", "styles.17.convs.10.bias", "styles.17.linear.weight", "styles.17.linear.bias".

Did anyone face the same problem, or does anyone have any hints that may help solve it?

Other directions from InterfaceGAN trained on StyleGAN2

Thank you for an awesome repo!

Since InterfaceGAN is originally trained on the StyleGAN1 generator, it seems you retrained it using StyleGAN2 (for ease of comparison with your method). Did you generate directions other than age and smile (for example pose)? If so, could you please provide them as part of the repo?

Thanks

train ffhq

Hi, thanks for the excellent work. I need to train on my own face dataset; can you provide the training scripts (hyperparameter settings) for FFHQ?
In addition, my face images are 256x256; does that affect the model?

Styleflow editings

Thank you so much for your work.
Could you explain the next point in more detail? Step-by-step, please.

  1. Note that for StyleFlow editings,
    one needs to save the output latent codes and load them over the official StyleFlow repository:
    torch.save(latents, 'latents.pt')
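For reference, a minimal sketch of the round trip being described; latents is assumed to be the tensor of W+ codes produced by the e4e encoder, and the exact format StyleFlow expects should be taken from the official StyleFlow repository.

import torch

# latents: inverted W+ codes from e4e, e.g. shape (N, 18, 512)
torch.save(latents, 'latents.pt')

# later, inside the StyleFlow workflow:
latents = torch.load('latents.pt')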

How to train on my own data?

Hi guys,

Really impressive work here!

I am wondering whether I can train on my own data and obtain a well-designed encoder for my in-domain issue instead of standard datasets of cats, horses, etc. Could you please give some instructions?

Inconsistent device when loading pSp model

The pSp model uses the device stored in the opts field of the checkpoint when loading latent_avg:

self.latent_avg = ckpt['latent_avg'].to(self.opts.device)

In the currently saved checkpoint this defaults to cuda:0, so when loading the model onto another device this causes an error, since the model and latent_avg will be on different devices.

Adding opts['device'] = device immediately after the line opts = ckpt['opts'] should do the trick.
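Put together, a sketch of the patched loading code, assuming (as the snippet above suggests) that ckpt['opts'] is a plain dict:

from argparse import Namespace
import torch

device = 'cpu'                                   # the device you actually want to load onto
ckpt = torch.load('e4e_ffhq_encode.pt', map_location='cpu')
opts = ckpt['opts']
opts['device'] = device                          # override the device stored at training time
opts = Namespace(**opts)
latent_avg = ckpt['latent_avg'].to(opts.device)  # now lands on the intended device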

About W space?

Hi authors:
Many thanks for your excellent work. However, I have some questions about your novel definition of the W spaces: what is the difference between them? (The expression I am asking about was attached as an image.)
Also, in Figure 3, I do not understand the meaning of the red/blue arrows. Could you kindly help me resolve this question in your spare time?

Best wishes.

How to get other interfacegan_direction files?

Your work is great! You provided three interfacegan_direction files: age.pt, pose.pt and smile.pt.
However, if I want results for other directions, how can I obtain additional interfacegan_direction files?
Also, I compared InterfaceGAN's own direction files with yours, and the results seem to be different. Can't we use their files directly?

sample_real_and_fake_latents

Thanks for your great work! Sorry for my English.
At line 428 in coach.py, shouldn't the latent_avg of the StyleGAN be added? Is that right, or is it my mistake?

about ffhq_encode performance

I trained a model for ffhq_encode, but the performance is bad in some scenes; the background is particularly difficult to learn. What should I do to improve the performance? My training data is 5000 pictures. Should I add more training data? Also, my loss is the ID loss; should I use the MoCo loss instead?

train on own dataset

Hi, thanks for the amazing work. I am still confused about the training dataset: do I need to assign class labels to the images, or not? Looking forward to your reply. Thanks.

Save Interval Checkpoints?

Hi there! I'm having trouble getting the --save_interval flag to work. I set it to 1000 as a test for my last round of training, but no checkpoints were saved. Am I right in assuming the repo is set up to only save a single checkpoint at the end of max_steps?

How to preprocess StyleGAN's latent code of size (1, 18, 512) to (1, 512) to get interfacegan_direction.

Hi thanks for your work!

I would like to ask how did you get the interfacegan_direction of size (1, 512) that is used in the colab notebook.

When we invert the image to the latent space using your code, the resulting latent code is of size (1, 18, 512). However, the method train_boundary() in the InterfaceGAN github receives input latent code of size (1, 512). What did you do to preprocess the latent code from size (1, 18, 512) to (1, 512)?

Thank you very much for your help!
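Not an authoritative answer, but one simple reduction that is sometimes used (not necessarily what the authors did) is to average the W+ code over its 18 per-layer entries:

import torch

latent_wplus = torch.randn(1, 18, 512)   # stand-in for a latent code inverted by e4e
latent_w = latent_wplus.mean(dim=1)      # shape (1, 512)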

interface gan pt

Hello,

Thanks for the amazing work! I'm running your code and I wondered where age.pt, smile.pt and pose.pt come from. I checked the InterfaceGAN repo and they do not provide them. Could you help me with this issue? Thank you very much!

Inverted Images Exhibit Droopy Eyes

Hi,

I am trying to use the provided e4e model pre-trained on the FFHQ dataset to invert CelebA images into latent codes and generate them back. However, I found that all reconstructed face images exhibit droopy eyes, unlike the originals. Do you know what the cause is and, if possible, how to fix it?

Thank you very much!

Examples:

(Pairs of original and inverted example images were attached.)

How could you get directions for age, smile and pose?

Hi. Thanks for your great work!

Is it possible to obtain other directions (for faces)?
You said you used StyleFlow, but StyleFlow needs latent codes and facial attributes (MS-API),
and it uses the W latent space (18 x 512) rather than the Z space.
How did you do that? Am I missing something about StyleFlow?

How to train on other animal dataset

If I want to train this model on another dataset, such as animal faces, but I don't have a corresponding pretrained StyleGAN model, how can I train the StyleGAN model myself?

StyleGAN2-ada-pytorch models supported?

Hello and thanks for sharing your code!

I am trying to train an encoder on a stylegan2-ada-pytorch model that I have pretrained. I am using a .pkl file (as is the default output of that repo) and I am getting errors (related to the ada repo in fact). Is it possible to use that model here, or do I need to convert it somehow?

Thanks in advance!

About some details of the PsP_encoder.py

Hi, thanks for your great work. I want to know why you set 'i == self.coarse_ind' rather than '>='; this way, only one style vector follows the FPN's middle features (and the same applies to the fine features below).

if i == self.coarse_ind:
    p2 = _upsample_add(c3, self.latlayer1(c2))  # FPN's middle features
    features = p2
elif i == self.middle_ind:
    p1 = _upsample_add(p2, self.latlayer2(c1))  # FPN's fine features
    features = p1

About pretrained model

Hi, I notice that the pretrained IR-SE50 model used for the ID loss during training was trained on a face dataset. If I want to train the model on a car dataset, whose identity features are very different from face IDs, will the ID loss still work well? Or would it be better to replace the pretrained model with one trained on a car dataset? Thank you!

How to train for paired data?

Hi, sorry if this is a silly question.
I am a bit confused: is this just the encoder?

How can we use it to train on a paired dataset, for example male to female, human to toon, etc.?

Applying encoder for StyleSpace

Hi sir,

I recently read a paper called StyleSpace.

  1. Can we apply the GAN inversion technique to invert an image into the latent space of StyleSpace? It seems that this is the S space (not W or W+).
  2. Should I retrain the pSp model with the StyleSpace generator, because the StyleSpace generator seems to be a little different from the StyleGAN2 generator?

Is it possible to run on Windows?

Greetings!~

I'm just wondering if there is a way to run e4e on Windows.

Currently I keep getting this error when trying (Win10 with CUDA 10.2):

File "C:\Tools\Code\anaconda3\envs\StyleCLIP\lib\site-packages\torch\utils\cpp_extension.py", line 1539, in _run_ninja_build
env=env)
File "C:\Tools\Code\anaconda3\envs\StyleCLIP\lib\subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

About the number of iterations

Hi, thank you for the nice work.

I was wondering how many iterations you trained for the provided "FFHQ Inversion Pretrained Model".

Thank you.

How to sample latent codes from the W space?

Hi, as mentioned here and here, we need to randomly sample w vectors in order to learn the interfacegan_direction.

However, I have no idea how to randomly sample the w vectors. Would you kindly tell me how to do it (probably using some function in this repo)?

Thank you very much!
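In case it helps, here is a sketch of one way to do it with a rosinality-style StyleGAN2 generator (for example the decoder inside the loaded e4e model); get_latent() simply runs the mapping network, but check your generator for the exact API.

import torch

# generator: a rosinality StyleGAN2 Generator, e.g. net.decoder from a loaded e4e model
z = torch.randn(1000, 512, device='cuda')     # random codes in Z
with torch.no_grad():
    w = generator.get_latent(z)               # codes in W, shape (1000, 512)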

GANSpace

Hi, what other parameters besides the ones below can I change, and where can I find the complete list?
['eye_openness', 'smile', 'trimmed_beard', 'white_hair', 'lipstick']
How do I properly manipulate these values?

Invert Images to W space

Hi, thanks for your code!

I need to invert images into their latent representations of size (1, 512) each. However, I notice that each latent representation produced by your code is of size (1, 18, 512) (I suppose this is the dimension of W+ space).

Is there a way to get a latent representation of size (1, 512) using your code? (probably the representation in the W space)
Or do you think one of the layers in the (1, 18, 512) tensor is reasonable to use as the image representation for further editing in the latent space?

Thank you very much!

Training Encoder On Xray Dataset

Hello,

First of all, thanks for your great work. I am trying to train the encoder on a chest X-ray dataset. Although the results seem good, some details are missing, which matters especially in a medical setting. As can be seen from the example below, important details such as cables are not recovered, and this is absolutely undesirable. By the way, the results may look pretty good to you, but medical experts totally disagree :)

(Example inversion image attached.)

The parameters are:

{
"batch_size": 8,
"board_interval": 50,
"checkpoint_path": null,
"d_reg_every": 16,
"dataset_type": "xray_encode",
"delta_norm": 2,
"delta_norm_lambda": 0.0002,
"encoder_type": "Encoder4Editing",
"exp_dir": "/path/to/experiment/dir",
"id_lambda": 0.5,
"image_interval": 100,
"keep_optimizer": false,
"l2_lambda": 1.0,
"learning_rate": 0.0001,
"lpips_lambda": 0.8,
"lpips_type": "alex",
"max_steps": 200000,
"optim_name": "ranger",
"progressive_start": 20000,
"progressive_step_every": 2000,
"progressive_steps": [
0,
20000,
22000,
24000,
26000,
28000,
30000,
32000,
34000,
36000,
38000,
40000,
42000,
44000
],
"r1": 10,
"resume_training_from_ckpt": null,
"save_interval": null,
"save_training_data": false,
"start_from_latent_avg": true,
"stylegan_size": 256,
"stylegan_weights": "/path/to/stylegan2.pt",
"sub_exp_dir": null,
"test_batch_size": 4,
"test_workers": 4,
"train_decoder": false,
"update_param_list": null,
"use_w_pool": true,
"val_interval": 10000,
"w_discriminator_lambda": 0.1,
"w_discriminator_lr": 2e-05,
"w_pool_size": 50,
"workers": 8
}

In order to get better inversion for this kind of dataset, which parameters should I tune? How can I improve my results?

Thanks in advance
