vinairesearch / hyperinverter

HyperInverter: Improving StyleGAN Inversion via Hypernetwork (CVPR 2022)

Home Page: https://di-mi-ta.github.io/HyperInverter/

License: Apache License 2.0

Python 86.20% C++ 2.81% Cuda 10.07% Shell 0.92%
cvpr2022 gan-inversion generative-adversarial-network image-interpolation image-manipulation stylegan-encoder stylegan-inversion stylegan2 stylegan2-ada

hyperinverter's Introduction

Table of contents
  1. Getting Started
  2. Experiments
  3. Acknowledgments
  4. Contacts

HyperInverter: Improving StyleGAN Inversion via Hypernetwork

Tan M. Dinh, Anh Tran, Rang Nguyen, Binh-Son Hua
VinAI Research, Vietnam

Abstract: Real-world image manipulation has achieved fantastic progress in recent years as a result of the exploration and utilization of GAN latent spaces. GAN inversion is the first step in this pipeline, which aims to map the real image to the latent code faithfully. Unfortunately, the majority of existing GAN inversion methods fail to meet at least one of the three requirements listed below: high reconstruction quality, editability, and fast inference. We present a novel two-phase strategy in this research that fits all requirements at the same time. In the first phase, we train an encoder to map the input image to StyleGAN2 W space, which was proven to have excellent editability but lower reconstruction quality. In the second phase, we supplement the reconstruction ability in the initial phase by leveraging a series of hypernetworks to recover the missing information during inversion. These two steps complement each other to yield high reconstruction quality thanks to the hypernetwork branch and excellent editability due to the inversion done in the W space. Our method is entirely encoder-based, resulting in extremely fast inference. Extensive experiments on two challenging datasets demonstrate the superiority of our method.

teaser.png
Our method significantly outperforms other encoder-based methods (pSp, e4e, ReStyle) while having comparable inference time. Compared with optimization-based approaches (SG2-W+, PTI), our work is on par with SG2-W+ in quality and slightly below PTI. However, it is worth noting that our method runs much faster, enabling interactive applications (roughly 3000 and 1100 times faster than SG2-W+ and PTI, respectively).

Details of the model architecture and experimental results can be found in our paper:

@inproceedings{dinh2021hyperinverter,
    title={HyperInverter: Improving StyleGAN Inversion via Hypernetwork},
    author={Tan M. Dinh and Anh Tuan Tran and Rang Nguyen and Binh-Son Hua},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}

Please CITE our paper whenever our model implementation is used to help produce published results or incorporated into other software.

Getting Started

The codebase is tested on

  • Ubuntu
  • CUDA 10.0, CuDNN 7

Installation

  • Clone this repo:
git clone https://github.com/VinAIResearch/HyperInverter.git
cd HyperInverter
  • Install dependencies:
conda create -p ./envs python=3.7.3
conda activate ./envs
pip install -r requirements.txt

Datasets

  • Human Faces: We use 70,000 images from the FFHQ dataset for training and 2,824 images from the CelebA-HQ dataset for testing. The images have 1024 x 1024 resolution and are cropped and aligned to the center. Refer to FFHQ for more details about the pre-processing step.

  • Churches: We use all 126,227 images from the official LSUN Church train set for training and 300 images from the official test set for evaluation. The images are resized to 256 x 256 resolution.

Please download the corresponding datasets and unzip them into the data folder. Then, open configs/paths_config.py and modify it to point to the data correctly:

dataset_paths = {
    "ffhq": "/path/to/ffhq/train_img",
    "celeba_test": "/path/to/CelebA-HQ/test_img",
    "church_train": "/path/to/lsun-church/train_img",
    "church_test": "/path/to/lsun-church/test_img",
}

If you want to try your own dataset, make the necessary modifications in: (i) data_configs.py to define your data paths; (ii) transforms_configs.py to define your data transformations. See the sketch below.
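
For illustration only, a custom dataset entry might look like the following sketch. It follows the pSp/e4e convention that this codebase builds on; the names my_dataset_encode, my_dataset_train, and my_dataset_test are hypothetical, so check the actual structure of data_configs.py in your checkout before copying it.

# Hypothetical sketch: registering a custom dataset in data_configs.py.
# All dataset names and path keys below are placeholders.
from configs import transforms_configs
from configs.paths_config import dataset_paths

DATASETS = {
    "my_dataset_encode": {
        "transforms": transforms_configs.EncodeTransforms,  # or your own transform class
        "train_source_root": dataset_paths["my_dataset_train"],
        "train_target_root": dataset_paths["my_dataset_train"],
        "test_source_root": dataset_paths["my_dataset_test"],
        "test_target_root": dataset_paths["my_dataset_test"],
    },
}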

Auxiliary pre-trained models

Run the command below to automatically download the auxiliary pre-trained models needed for the experiments.

python scripts/download_auxiliary_pretrained_models.py

Basically, these paths are already set correctly for the training and inference processes. If you want to change them, open configs/paths_config.py and modify the corresponding entries in the model_paths dict, for example as sketched below.
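
For example, a modified entry might look like this minimal sketch; the key names shown here are illustrative guesses, and the model_paths dict in configs/paths_config.py defines the authoritative ones.

# Hypothetical excerpt of configs/paths_config.py. Only the values should
# change; keep the key names exactly as they appear in your checkout.
model_paths = {
    "ir_se50": "/path/to/pretrained_models/model_ir_se50.pth",
    "stylegan2_ada_ffhq": "/path/to/pretrained_models/ffhq.pkl",
    # ... remaining auxiliary models
}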

Experiments

Pre-trained Models

See the Model Zoo for our official pre-trained models. Please download our pre-trained models (both the W Encoder and HyperInverter) and put them in the pretrained_models folder.

Training

Phase I: W Encoder

Please follow pSp or e4e to train the W encoder, which encodes images to latent codes in the W space (512 dimensions).

To save time, we release our pre-trained W encoders for churches and human faces in Model Zoo.

Phase II: HyperInverter

We provide the default training scripts below. If you have time, please tune the hyper-parameters further to get the best results. Note that the --hidden_dim argument has a large effect on model quality: increasing it yields a bigger model that tends to produce better results. In the paper we use hidden_dim=256; however, we recommend hidden_dim=128 to balance model size and performance. In addition, our code is easy to modify for predicting different StyleGAN2 layer weights: refer to weight_shapes and modify it if you want to try other layers, then extend the get_target_shapes function in hyper_inverter to add your option, as sketched below.
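
As a rough illustration only (the real function lives in hyper_inverter and may be structured differently), adding a new option could look like the sketch below; the conv_with_torgb name and its shape list are hypothetical placeholders.

# Hypothetical sketch of extending get_target_shapes to predict a different
# set of StyleGAN2 layer weights, dispatching on --target_shape_name.
def get_target_shapes(self):
    if self.opts.target_shape_name == "conv_without_bias":
        return conv_without_bias_shapes  # shipped option used in the paper
    elif self.opts.target_shape_name == "conv_with_torgb":
        return conv_with_torgb_shapes  # your new entry defined alongside weight_shapes
    else:
        raise ValueError(f"Unknown target shape: {self.opts.target_shape_name}")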

  • Human Faces
EXPERIMENT_DIR=""
W_ENCODER_PATH=""
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python scripts/train.py \
--dataset_type=ffhq_encode \
--encoder_type=LayerWiseEncoder \
--w_encoder_path="$W_ENCODER_PATH" \
--output_size=1024 \
--exp_dir="$EXPERIMENT_DIR" \
--batch_size=8 \
--batch_size_used_with_adv_loss=4 \
--workers=4 \
--val_interval=1000 \
--save_interval=5000 \
--encoder_optim_name=adam \
--discriminator_optim_name=adam \
--encoder_learning_rate=1e-4 \
--discriminator_learning_rate=1e-4 \
--hyper_lpips_lambda=0.8 \
--hyper_l2_lambda=1.0 \
--hyper_id_lambda=0.1 \
--hyper_adv_lambda=0.005 \
--hyper_d_reg_every=16 \
--hyper_d_r1_gamma=10.0 \
--step_to_add_adversarial_loss=200000 \
--target_shape_name=conv_without_bias  \
--max_steps=500000 \
--hidden_dim=128 \
--num_cold_steps=20000 \
--save_checkpoint_for_resuming_training \
--use_wandb
  • Churches
EXPERIMENT_DIR=""
W_ENCODER_PATH=""
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python scripts/train.py \
--dataset_type=church_encode \
--encoder_type=ResNetLayerWiseEncoder \
--w_encoder_path="$W_ENCODER_PATH" \
--output_size=256 \
--exp_dir="$EXPERIMENT_DIR" \
--batch_size=8 \
--batch_size_used_with_adv_loss=4 \
--workers=4 \
--val_interval=1000 \
--save_interval=5000 \
--encoder_optim_name=adam \
--discriminator_optim_name=adam \
--encoder_learning_rate=1e-4 \
--discriminator_learning_rate=1e-4 \
--hyper_lpips_lambda=0.8 \
--hyper_l2_lambda=1.0 \
--hyper_id_lambda=0.5 \
--hyper_adv_lambda=0.15 \
--hyper_d_reg_every=16 \
--hyper_d_r1_gamma=100.0 \
--step_to_add_adversarial_loss=100000 \
--target_shape_name=conv_without_bias \
--max_steps=500000 \
--hidden_dim=128 \
--num_cold_steps=10000 \
--save_checkpoint_for_resuming_training \
--use_wandb

Inference

  • Place the input images in a folder.

  • Pre-processing (if needed). For the human face domain, if the input images have not been cropped and aligned, please run the script below to preprocess the data.

RAW_IMAGE_DIR=""
PROCESSED_IMAGE_DIR=""

python scripts/align_all_parallel.py \
--raw_dir "$RAW_IMAGE_DIR" \
--saved_dir "$PROCESSED_IMAGE_DIR" \
--num_threads 8 

The descriptions of the arguments are shown below.

Args                  Descriptions
RAW_IMAGE_DIR         Path to folder containing raw input images
PROCESSED_IMAGE_DIR   Path to folder saving processed input images
  • Run inference

Set the arguments correctly before running the script below.

INPUT_DATA_DIR=""
RESULT_DIR=""
MODEL_PATH=""
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python scripts/inference.py \
--exp_dir="$RESULT_DIR" \
--checkpoint_path="$MODEL_PATH" \
--data_path="$INPUT_DATA_DIR" \
--batch_size=4 \
--workers=4

The descriptions of the arguments are shown below.

Args             Descriptions
RESULT_DIR       The directory where the inference results are saved
MODEL_PATH       Path to the HyperInverter model
INPUT_DATA_DIR   Path to folder containing processed input images

Finally, the reconstructed images can be found in the RESULT_DIR/inference_results folder.

Quantitative Evaluation

We have prepared scripts for quantitative reconstruction evaluation of the human faces and churches models. Please set the arguments in these files to be compatible with your model, then run the commands below to conduct the evaluation; a sketch of the kind of metric involved follows the commands.

  • Human Faces
sh sample_scripts/human_faces_reconstruction_quantitative_evaluation.sh
  • Churches
sh sample_scripts/church_reconstruction_quantitative_evaluation.sh
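
For intuition about what these scripts measure, below is a minimal sketch of one standard reconstruction metric (PSNR). This is not the repository's evaluation code, which computes its own metric suite; it only illustrates the kind of image comparison involved.

# Minimal PSNR sketch between an input image and its reconstruction.
# `real` and `recon` are uint8 RGB numpy arrays of identical shape.
import numpy as np

def psnr(real: np.ndarray, recon: np.ndarray) -> float:
    mse = np.mean((real.astype(np.float64) - recon.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(255.0 ** 2 / mse)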

Qualitative Comparison

For the following experiments, please open configs/paths_config.py and update the model_paths dict with the paths to the pre-trained models of HyperInverter and the other inversion methods. For the other inversion methods, please visit their GitHub repositories to download their pre-trained weights.

Reconstruction

The sample script for qualitative reconstruction comparison is:

DOMAIN=""
METHODS=""
INPUT_DATA_DIR=""
SAVED_RESULTS_DIR_NAME=""
MAX_NUM_IMAGES=100
SAVED_SIZE=1024
GPU_ID=0


CUDA_VISIBLE_DEVICES="$GPU_ID" \
python evaluation/reconstruction_comparison.py \
--methods="$METHODS" \
--domain="$DOMAIN" \
--input_data_dir="$INPUT_DATA_DIR"  \
--input_data_id="$SAVED_RESULTS_DIR_NAME" \
--output_dir=outputs \
--saved_embedding_dir=embeddings \
--max_num_images="$MAX_NUM_IMAGES" \
--resize="$SAVED_SIZE"

The descriptions of the arguments are shown below.

Args                     Descriptions
DOMAIN                   The input domain; options are {human_faces, churches}
METHODS                  The inversion methods, separated by commas; supported methods are {hyper_inverter, psp, e4e, SG2_plus, SG2, w_encoder, restyle_e4e}
INPUT_DATA_DIR           Path to folder containing processed input images
SAVED_RESULTS_DIR_NAME   The name of the folder saving the results
MAX_NUM_IMAGES           The maximum number of images to process
SAVED_SIZE               The size of the saved images for each method

The results can be found in the outputs/SAVED_RESULTS_DIR_NAME folder.

Editing

The sample script for editing comparison is:

DOMAIN=""
METHODS=""
DIRECTION=""
INPUT_DATA_DIR=""
SAVED_RESULTS_DIR_NAME=""
MAX_NUM_IMAGES=10
SAVED_SIZE=1024
MIN_MAG=-30  
MAX_MAG=30
STEP=5 
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python evaluation/editing_inference.py \
--methods="$METHODS" \
--domain="$DOMAIN" \
--input_data_dir="$INPUT_DATA_DIR" \
--input_data_id="$SAVED_RESULTS_DIR_NAME" \
--output_dir=outputs \
--saved_embedding_dir=embeddings \
--direction="$DIRECTION" \
--min_factor="$MIN_MAG" \
--max_factor="$MAX_MAG" \
--step="$STEP" \
--max_num_images="$MAX_NUM_IMAGES" \
--resize="$SAVED_SIZE" \
--save_edited_images \
--gif_speed=4 

The results can be found in the outputs/SAVED_RESULTS_DIR_NAME folder. Please try different values of MIN_MAG and MAX_MAG to get the best result.

The descriptions of the arguments are shown below.

Args                     Descriptions
DOMAIN                   The input domain; options are {human_faces, churches}
METHODS                  The inversion methods, separated by commas; supported methods are {hyper_inverter, psp, e4e, SG2_plus, SG2, w_encoder, restyle_e4e}, for example: hyper_inverter,psp,e4e
DIRECTION                The editing direction; the supported directions are shown in the tables below
INPUT_DATA_DIR           Path to folder containing processed input images
MIN_MAG                  The minimum editing magnitude; please tune this argument to get the best result
MAX_MAG                  The maximum editing magnitude; please tune this argument to get the best result
STEP                     The step used to move from the minimum magnitude to the maximum magnitude
SAVED_RESULTS_DIR_NAME   The name of the folder saving the results
MAX_NUM_IMAGES           The maximum number of images to process
SAVED_SIZE               The size of the saved images for each method

The supported editing directions are listed below.

  • Human Faces
Method        Editing Directions
GANSpace      eye_openness, trimmed_beard, lipstick, face_roundness, nose_length, eyebrow_thickness, head_angle_up, displeased
InterFaceGAN  age, smile, rotation
StyleCLIP     surprised, afro, angry, beyonce, bobcut, bowlcut, curly_hair, hilary_clinton, depp, mohawk, purple_hair, taylor_swift, trump, zuckerberg
  • Churches
Method    Editing Directions
GANSpace  clouds, vibrant, blue_skies, trees
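
Conceptually, each edit moves the inverted latent code along a precomputed direction by a chosen magnitude, which is what the MIN_MAG/MAX_MAG/STEP sweep above controls. Below is a minimal sketch, assuming w is an inverted W-space code and direction is a direction vector from GANSpace/InterFaceGAN/StyleCLIP (both torch tensors); it illustrates the idea, not the script's actual implementation.

# Conceptual sketch of the editing-magnitude sweep.
import torch

def sweep_edit(w: torch.Tensor, direction: torch.Tensor,
               min_mag: int = -30, max_mag: int = 30, step: int = 5):
    """Return an edited latent code for each magnitude in the sweep."""
    return [w + m * direction for m in range(min_mag, max_mag + 1, step)]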

Applications: Real-world image interpolation

The sample script for the interpolation of two real images is:

DOMAIN=""
METHODS=""
PATH_TO_INPUT_IMAGE_1=""
PATH_TO_INPUT_IMAGE_2=""
SAVED_RESULTS_DIR=""
SAVED_RESULTS_FILE_NAME=""
SAVED_SIZE=1024
NUM_STEPS=100
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python evaluation/real_image_interpolation.py \
--domain="$DOMAIN" \
--method="$METHODS" \
--left_image_path="$PATH_TO_INPUT_IMAGE_1" \
--right_image_path="$PATH_TO_INPUT_IMAGE_2" \
--steps="$NUM_STEPS" \
--saved_image_size="$SAVED_SIZE" \
--saved_dir="$SAVED_RESULTS_DIR" \
--saved_file_name="$SAVED_RESULTS_FILE_NAME" \
--save_interpolated_images \
--gif_speed=2

The descriptions of the arguments are shown below.

Args                      Descriptions
DOMAIN                    The input domain; options are {human_faces, churches}
METHODS                   The inversion methods, separated by commas; supported methods are {hyper_inverter, psp, e4e, SG2_plus, SG2, w_encoder, restyle_e4e}, for example: hyper_inverter,psp,e4e
PATH_TO_INPUT_IMAGE_1     Path to the first input image
PATH_TO_INPUT_IMAGE_2     Path to the second input image
NUM_STEPS                 The number of interpolation steps
SAVED_RESULTS_DIR         The path to the folder saving the results
SAVED_RESULTS_FILE_NAME   The name of the resulting GIF file
SAVED_SIZE                The size of the saved images for each method
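
For intuition, real-image interpolation amounts to inverting both images and blending the recovered representations. A minimal latent-only sketch follows, assuming w1 and w2 are the two inverted W-space codes as torch tensors; it illustrates the idea, not the actual API of evaluation/real_image_interpolation.py (which, given HyperInverter's design, plausibly also blends the hypernetwork-predicted weights).

# Linear interpolation between two inverted latent codes.
import torch

def interpolate_latents(w1: torch.Tensor, w2: torch.Tensor, steps: int = 100):
    """Yield latent codes linearly interpolated from w1 to w2."""
    for t in torch.linspace(0.0, 1.0, steps):
        yield (1.0 - t) * w1 + t * w2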

Acknowledgments

Our source code is developed based on the codebase of a great series of StyleGAN inversion works from the Tel Aviv University group: pSp, e4e, ReStyle, and PTI.

For auxiliary pre-trained models, we specifically thank TreB1eN, MoCov2, CurricularFace, and MTCNN. For editing directions, we thank the authors of GANSpace, InterFaceGAN, and StyleCLIP.

We leverage the PyTorch implementation of StyleGAN2-ADA for the StyleGAN model. All pre-trained StyleGAN models are from the official release of StyleGAN2. We convert the original weights exported by the TensorFlow code to be compatible with the PyTorch version of StyleGAN2-ADA using the author's official script.

Overall, we sincerely thank the authors for their great work and for releasing their source code and pre-trained weights.

Contacts

If you have any questions, please drop an email to [email protected] or open an issue in this repository.


hyperinverter's Issues

Confused by some code

Hi, sorry to bother you again. I found some code in the WeightRegressor class like this:

  out = torch.matmul(out, self.w1) + self.b1
  out = out.view(bs, self.in_channels, self.hidden_dim)
  out = torch.matmul(out, self.w2) + self.b2
  kernel = out.view(bs, self.out_channels, self.in_channels, self.kernel_size, self.kernel_size)  # like bz, 512, 512, 3, 3

Why not use a linear layer instead, like:

w1 = nn.Linear(128, 65536)
w2 = nn.Linear(65536, 2359296)
out = w1(out)
out = w2(out)
kernel = out.view(bs, self.out_channels, self.in_channels, self.kernel_size, self.kernel_size)

Is it possible to convert images to a different domain?

Thank you for your great work.
Is it possible to convert an image to a different domain without using latent directions to change the latent codes?
For example, pSp can generate a front-facing face from a given input image, and generate photo-realistic face images from ambiguous sketch images.

I am wondering whether HyperInverter can do the same thing?

If it is possible, should I modify your code, or do I just need to change the data path?

Looking forward to your reply.


Problem testing inference

Trying to run inference on Windows:
I installed all requirements without any error, but I get one when trying to run inference:

(HyperInverter) H:\AI\HyperInverter>python scripts/inference.py --exp_dir=./results --checkpoint_path=./pretrained_models/hyper_inverter_e4e_ffhq_encode_large.pt --data_path=./data --batch_size= 4 --workers=4
C:\Users\GIN\.conda\envs\HyperInverter\lib\site-packages\torch\utils\cpp_extension.py:287: UserWarning: Error checking compiler version for cl: [WinError 2] Impossibile trovare il file specificato
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
INFORMAZIONI: impossibile trovare file corrispondenti ai
criteri di ricerca indicati.
Traceback (most recent call last):
  File "scripts/inference.py", line 14, in <module>
    from models.hyper_inverter import HyperInverter  # noqa: E402
  File ".\models\hyper_inverter.py", line 10, in <module>
    from models.encoders import fpn_encoders
  File ".\models\encoders\fpn_encoders.py", line 5, in <module>
    from models.stylegan2.model import EqualLinear
  File ".\models\stylegan2\model.py", line 5, in <module>
    from models.stylegan2.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File ".\models\stylegan2\op\__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File ".\models\stylegan2\op\fused_act.py", line 14, in <module>
    os.path.join(module_path, "fused_bias_act_kernel.cu"),
  File "C:\Users\GIN\.conda\envs\HyperInverter\lib\site-packages\torch\utils\cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "C:\Users\GIN\.conda\envs\HyperInverter\lib\site-packages\torch\utils\cpp_extension.py", line 1202, in _jit_compile
    with_cuda=with_cuda)
  File "C:\Users\GIN\.conda\envs\HyperInverter\lib\site-packages\torch\utils\cpp_extension.py", line 1293, in _write_ninja_file_and_build_library
    with_cuda=with_cuda)
  File "C:\Users\GIN\.conda\envs\HyperInverter\lib\site-packages\torch\utils\cpp_extension.py", line 1689, in _write_ninja_file_to_build_library
    with_cuda=with_cuda)
  File "C:\Users\GIN\.conda\envs\HyperInverter\lib\site-packages\torch\utils\cpp_extension.py", line 1791, in _write_ninja_file
    'cl']).decode().split('\r\n')
  File "C:\Users\GIN\.conda\envs\HyperInverter\lib\subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "C:\Users\GIN\.conda\envs\HyperInverter\lib\subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

ModuleNotFoundError: No module named 'dnnlib.tflib'

Hi, I am getting the following error when trying to run the inference script, even though I completed all the installation steps without errors.

Traceback (most recent call last):
  File "scripts/inference.py", line 91, in <module>
    run()
  File "scripts/inference.py", line 34, in run
    net = HyperInverter(opts)
  File "./models/hyper_inverter.py", line 58, in __init__
    self.load_weights()
  File "./models/hyper_inverter.py", line 99, in load_weights
    ckpt = pickle.load(f)
ModuleNotFoundError: No module named 'dnnlib.tflib'

Girls have beards after editing

You really did a great job, thanks for sharing. When testing some images, I found that when a person is edited to become old, the person gets glasses, and some girls get beards. How can I solve that?

It seems there is a lot of noise on the face in some images

Hi, thank you for your great work. When testing on many images, I found several defects, as follows:

  1. There seems to be a lot of noise on the face in some images (visible when zooming in), comparing the original image with the inverted image.

  2. It seems sensitive to hair on the face, comparing the original image with the inverted image.

Can you give me some advice on how to solve this?

Loss not going down as low as in the given INVERTME.png when trying different images

Hi, thank you for the great tutorial about GAN inversion; I am new to this topic.

I just tried inverting and rebuilding a different 256x256 image from the FFHQ dataset, but it stops at 0.5 MSE loss. What may be the reason for that, and how can I avoid it and generate results like the given INVERTME.png image?

Thanks in advance
