
3D GAN Inversion with Pose Optimization

Official PyTorch implementation of the WACV 2023 paper

Jaehoon Ko*, Kyusun Cho*, Daewon Choi, Kwangrok Ryoo, Seungryong Kim

*equal contribution

With the recent advances in NeRF-based 3D aware GANs quality, projecting an image into the latent space of these 3D-aware GANs has a natural advantage over 2D GAN inversion: not only does it allow multi-view consistent editing of the projected image, but it also enables 3D reconstruction and novel view synthesis when given only a single image. However, the explicit viewpoint control acts as a main hindrance in the 3D GAN inversion process, as both camera pose and latent code have to be optimized simultaneously to reconstruct the given image. Most works that explore the latent space of the 3D-aware GANs rely on ground-truth camera viewpoint or deformable 3D model, thus limiting their applicability. In this work, we introduce a generalizable 3D GAN inversion method that infers camera viewpoint and latent code simultaneously to enable multi-view consistent semantic image editing. The key to our approach is to leverage pre-trained estimators for better initialization and utilize the pixel-wise depth calculated from NeRF parameters to better reconstruct the given image. We conduct extensive experiments on image reconstruction and editing both quantitatively and qualitatively, and further compare our results with 2D GAN-based editing to demonstrate the advantages of utilizing the latent space of 3D GANs.

For more information, check out the paper on arXiv or the project page.

Requirements

NVIDIA GPUs. All testing was done on an RTX 3090 GPU.

64-bit Python 3.9, PyTorch 1.11.0 + CUDA toolkit 11.3

conda env create -f environment.yml
conda activate 3dganinv
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
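
After installation, a quick sanity check (not part of the official scripts) confirms that PyTorch sees the GPU and the expected CUDA build:

    import torch

    # Expect PyTorch 1.11.0 built against CUDA 11.3 and at least one visible GPU.
    print(torch.__version__)          # e.g. '1.11.0+cu113'
    print(torch.version.cuda)         # e.g. '11.3'
    print(torch.cuda.is_available())  # should be True on an RTX 3090 machine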

Pre-trained Networks

Download the pre-trained weights from the provided Google Drive links.

Place the initializer and generator weights as follows:

└── root
    ├── initializer
    │   ├── pose_estimator.pt
    │   ├── pose_estimator_quat.pt
    │   ├── pose_estimator_afhq.pt
    │   ├── e4e_ffhq.pt
    │   └── e4e_afhq.pt
    └── pretrained_models
        ├── afhqcats512-128.pkl
        └── ffhqrebalanced512-128.pkl

Image Alignment

We refer users to the preprocessing code from the EG3D repository.

We also provide an easy-to-use image alignment notebook.

In addition, we manually cropped the facial areas for inverting images of cats.
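
If you need a similar crop yourself, a simple center crop with PIL is one option (a minimal sketch; the crops for the paper's cat images were chosen by hand, and the box below is a hypothetical region you would pick per image):

    from PIL import Image

    img = Image.open('cat.jpg')
    # Hypothetical facial bounding box (left, upper, right, lower), chosen by hand.
    box = (120, 80, 632, 592)
    img.crop(box).resize((512, 512), Image.LANCZOS).save('cat_aligned.png')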

Inversion

Run the inversion process:

python scripts/run_pti.py

You can edit the input and output directories and the GPU number in configs/paths_config.py.
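
For reference, the relevant entries in configs/paths_config.py look roughly like the following (an illustrative excerpt, not verbatim; check the file itself for the exact variable names):

    # configs/paths_config.py (illustrative; field names may differ)
    input_data_path = './data/aligned_images'   # directory of aligned input images
    output_data_path = './output'               # where inversion results are written
    gpu_id = 0                                  # CUDA device index to run on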

Credits

EG3D model and implementation:
https://github.com/NVlabs/eg3d Copyright (c) 2021-2022, NVIDIA Corporation & affiliates. License (NVIDIA) https://github.com/NVlabs/eg3d/blob/main/LICENSE.txt

PTI implementation:
https://github.com/danielroich/PTI Copyright (c) 2021 Daniel Roich
License (MIT) https://github.com/danielroich/PTI/blob/main/LICENSE

GANSPACE implementation:
https://github.com/harskish/ganspace Copyright (c) 2020 harskish
License (Apache License 2.0) https://github.com/harskish/ganspace/blob/master/LICENSE

Acknowledgement

This code implementation borrows heavily from the official implementations of EG3D and PTI. We sincerely appreciate all of these projects.

Bibtex

@inproceedings{ko20233d,
  author    = {Ko, Jaehoon and Cho, Kyusun and Choi, Daewon and Ryoo, Kwangrok and Kim, Seungryong},
  title     = {3D GAN Inversion with Pose Optimization},
  booktitle = {WACV},
  year      = {2023},
}


Issues

Training with custom dataset

Hello,

Could you provide a train.py for training on a custom dataset, as eg3d does here? I guess I should follow the eg3d training scheme and specify the trained eg3d model in PTI's config file, but I would be happy to hear your suggestions.

Also in the paper it mentioned that:

We also use the pre-trained weights on the AFHQ dataset [8] for cat faces and evaluate on the AnimalFace10 dataset [31].

Does that mean you feed the pre-trained model into PTI? Could you give more details or explain the pipeline a bit? Thanks so much for the help!

Encoder and Pose Estimator pre-trained models

From which work did you borrow the pre-trained encoder and pose estimator models? Could you please clarify this?
Or, if you trained your own encoder and pose estimator, could you provide the training scripts for them?

Image Alignment

Thanks for your excellent work.

When I use the provided image alignment notebook, I get an incorrect output image with a black border.

I need more help with this.

Thanks

Editing

Hi,

Are there any reasons for using the following editing directions?

    ganspace_directions = {
            'bright hair': (2, 7, 7, 4), #positive (direction)
            'smile': (12, 0, 5, 2), #positive 
            'age' : (5, 0, 5, 3.5), #negative: young
            'short hair': (2, 0, 5, 4), #negative
            'glass': (4, 0, 5, 4), #negative
            'gender': (0, 0, 5, 4) #negative(female -> male)
    }

Is there any literature that helped you find these directions for eg3d, or are they from your own experiments? If so, how did you find them?

Thanks!
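
For context, the tuples appear to encode a GANSpace-style edit. A minimal sketch of how such a direction could be applied, assuming (this is an interpretation, not confirmed by the repo) the format is (PCA component index, start layer, end layer, strength), with the strength negated for the "#negative" directions:

    import torch

    # ws: (1, num_layers, 512) W+ latent; pca_components: (K, 512) PCA basis
    # found by GANSpace on the generator's latent space.
    def apply_ganspace_edit(ws, pca_components, direction):
        comp_idx, start_layer, end_layer, strength = direction
        ws_edit = ws.clone()
        # Shift only the chosen layer range along the chosen principal direction.
        ws_edit[:, start_layer:end_layer + 1] += strength * pca_components[comp_idx]
        return ws_edit

    # e.g. ws_smile = apply_ganspace_edit(ws, pca_components, (12, 0, 5, 2))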

Error when running the inversion code

I downloaded the models and placed them in the corresponding folders, and I ran the code on an RTX 3090 without changing anything, but the following error appeared.

Loading ResNet ArcFace
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /home/ubuntu/anaconda3/envs/latent3d/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
0%| | 0/4 [00:00<?, ?it/s]Setting up PyTorch plugin "bias_act_plugin"... Done.
/home/ubuntu/Documents/ruihua/StyleGAN/Code/3DGAN-Inversion-main/./training/projectors/w_projector.py:115: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
w_opt = torch.tensor(mean_w + start_w, dtype=torch.float32, device=device,
/home/ubuntu/Documents/ruihua/StyleGAN/Code/3DGAN-Inversion-main/./training/projectors/w_projector.py:118: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
translation_opt = torch.tensor(start_translation, dtype=torch.float32, device=device,
Setting up PyTorch plugin "upfirdn2d_plugin"... Done. | 0/400 [00:00<?, ?it/s]
0%| | 0/400 [00:01<?, ?it/s]
0%| | 0/4 [00:10<?, ?it/s]
Traceback (most recent call last):
File "scripts/run_pti.py", line 60, in
run_PTI(run_name='', use_wandb=False, use_multi_id_training=False)
File "scripts/run_pti.py", line 54, in run_PTI
coach.train(P, E)
File "/home/ubuntu/Documents/ruihua/StyleGAN/Code/3DGAN-Inversion-main/./training/coaches/single_id_coach.py", line 50, in train
w_pivot, freezed_cam = self.calc_inversions(image, image_name, cam_encoder, e4e_encoder, folder_dir)
File "/home/ubuntu/Documents/ruihua/StyleGAN/Code/3DGAN-Inversion-main/./training/coaches/base_coach.py", line 86, in calc_inversions
ws, cam = w_projector.project(self.G, id_image, device=torch.device(global_config.device), w_avg_samples=5000,
File "/home/ubuntu/Documents/ruihua/StyleGAN/Code/3DGAN-Inversion-main/./training/projectors/w_projector.py", line 204, in project
warp_loss, test_img = calc_warping_loss(ws_clone, canonical_cam_clone, pred_ext, init_ext, intrinsic, pred_depths, target_images_contiguous,
File "/home/ubuntu/Documents/ruihua/StyleGAN/Code/3DGAN-Inversion-main/./training/warping_loss.py", line 35, in calc_warping_loss
torch_target_features = get_features(target_images, torch_vgg, layers)
File "/home/ubuntu/Documents/ruihua/StyleGAN/Code/3DGAN-Inversion-main/./training/warping_loss.py", line 79, in get_features
x1 = layer_list0
File "/home/ubuntu/anaconda3/envs/latent3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/latent3d/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/ubuntu/anaconda3/envs/latent3d/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 3, 3], but got 3-dimensional input of size [3, 512, 512] instead
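
The traceback indicates that the VGG feature extractor in training/warping_loss.py received an unbatched image of shape [3, 512, 512] where a 4-D batched tensor was expected. A common workaround for this class of error (a guess at a fix, not an official patch) is to add the batch dimension before extracting features:

    # In get_features (training/warping_loss.py), guard against unbatched inputs:
    if target_images.dim() == 3:
        target_images = target_images.unsqueeze(0)  # [3, H, W] -> [1, 3, H, W]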

Request for generated data

Congratulations on such excellent work. Could you please share a copy of the generated data? If you can, we would be very grateful, and we will cite this excellent work in our future research.

Generating a video

Hi, I would like to know how to generate a video from the output of PTI.
I saw some pipelines here: NVlabs/eg3d#66, and it seems PTI outputs only the .pt network, while gen_videos.py requires the .pkl network format.
I am wondering how you saved the network as a .pkl and used it in gen_videos.py. Thanks!
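
One approach known to work with EG3D-style checkpoints (a hedged sketch, not the authors' confirmed pipeline; the .pt path and dictionary keys below are assumptions) is to load the original .pkl, swap in the tuned generator from the PTI output, and re-pickle the result for gen_videos.py. This relies on the repo's modules being importable so the pickled classes resolve:

    import pickle
    import torch

    # Load the original EG3D network pickle (a dict holding 'G', 'D', 'G_ema').
    with open('pretrained_models/ffhqrebalanced512-128.pkl', 'rb') as f:
        data = pickle.load(f)

    # Hypothetical path to the tuned generator saved by PTI via torch.save.
    data['G_ema'] = torch.load('output/model_tuned.pt')

    with open('tuned_generator.pkl', 'wb') as f:
        pickle.dump(data, f)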

3D shape and depth map extraction

First, congratulations on your interesting work, but I have a question:

How can I get the 3D shape as an .mrc file from a photo that has been inverted? So far the output is only an .mp4 video. How can I get the corresponding 3D shape (like the ones generated with gen_samples from the FFHQ .pkl)?

On the other hand, is it possible to obtain a depth map video like the one displayed on your project website alongside the image video?

Any help would be much appreciated.

Thanks.
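
On the .mrc question: EG3D's gen_samples.py builds shapes by querying the generator's density on a voxel grid and writing the volume with the mrcfile package. Given such a density volume from the tuned generator, the save step looks roughly like this (a sketch; sigma_grid is a placeholder for the densities you would query):

    import numpy as np
    import mrcfile

    # Placeholder: an (N, N, N) float32 density volume queried from the generator.
    sigma_grid = np.zeros((256, 256, 256), dtype=np.float32)

    # mrc_mode=2 stores float32 voxels.
    with mrcfile.new_mmap('shape.mrc', overwrite=True,
                          shape=sigma_grid.shape, mrc_mode=2) as mrc:
        mrc.data[:] = sigma_grid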

Details smoothed out

A lot of detail is smoothed out in the inversion result. How can I avoid that?
The hairstyle also changes a lot.

CelebA-HQ dataset consultation

Could I ask you for the CelebA-HQ dataset with pose annotations used for evaluating the models? It would be even better if you could share the corresponding download link. Thanks!
