DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

Overview

This repository contains the official PyTorch implementation of the paper:

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation, CVPR 2023

News

  • [2023-03-11] Uploaded the training and inference code for the facial domain (◍•ڡ•◍).

To be continued...

We will release the training and inference code for the LSUN cat, church, and horse domains later : )

Dependencies

  • Install CLIP:

    conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
    pip install ftfy regex tqdm gdown
    pip install git+https://github.com/openai/CLIP.git
  • Download pre-trained models:

    • The code relies on the rosinality PyTorch implementation of StyleGAN2.
    • Download the pre-trained StyleGAN2 generator model for the facial domain from here, and then place it into the folder ./models/pretrained_models.
    • Download the pre-trained StyleGAN2 generator models for the LSUN cat, church, and horse domains from here, and then place them into the folders ./models/pretrained_models/stylegan2-{cat/church/horse}.
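Before moving on, it can be useful to confirm that the downloaded models ended up in the folders named above. The following is a minimal sketch, not part of the repository: the helper name `missing_model_dirs` is hypothetical, and it only checks that each folder exists and is non-empty, since the checkpoint file names are not stated here.

```python
from pathlib import Path

# Folders named in the download instructions above.
EXPECTED_DIRS = [
    "models/pretrained_models",
    "models/pretrained_models/stylegan2-cat",
    "models/pretrained_models/stylegan2-church",
    "models/pretrained_models/stylegan2-horse",
]


def missing_model_dirs(root="."):
    """Return the expected folders that do not exist or are empty.

    Hypothetical helper: this is a rough check on folders only and does
    not verify any particular checkpoint file.
    """
    missing = []
    for rel in EXPECTED_DIRS:
        folder = Path(root) / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_model_dirs():
        print(f"missing or empty: {rel}")
```

If only the facial domain is needed, the three stylegan2-* entries can be dropped from the list.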

Training

Data preparation

  • DeltaEdit is trained on latent vectors.

  • For the facial domain, 58,000 real images are randomly selected from the FFHQ dataset, and 200,000 fake images are sampled from the z space of StyleGAN for training. All real images are inverted with the e4e encoder.

  • Download the provided FFHQ latent vectors from here and then place all numpy files into the folder ./latent_code/ffhq.

  • Generate the 200,000 sampled latent vectors by running the following commands for each specific domain:

    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname ffhq --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname cat --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname church --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname horse --samples 200000
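Once the latent vectors are downloaded or generated, a quick look at the files can catch a misplaced or truncated download before training starts. The sketch below assumes only what is stated above, namely that the latent codes are NumPy files under ./latent_code/ffhq; the function name `summarize_latents` is hypothetical, and no particular file names or array shapes are assumed.

```python
from pathlib import Path

import numpy as np


def summarize_latents(folder="latent_code/ffhq"):
    """Map each .npy file name in `folder` to the shape of the array it stores."""
    shapes = {}
    for path in sorted(Path(folder).glob("*.npy")):
        # mmap_mode="r" reads the header and maps the data lazily,
        # so large latent files are not loaded fully into memory.
        arr = np.load(path, mmap_mode="r")
        shapes[path.name] = arr.shape
    return shapes


if __name__ == "__main__":
    for name, shape in summarize_latents().items():
        print(name, shape)
```

An empty result means no .npy files were found in the folder, which usually points to a wrong path.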

Usage

  • The main training script is ./scripts/train.py.
  • Training arguments are defined in ./options/train_options.py.

To train, run the following command:

CUDA_VISIBLE_DEVICES=0 python scripts/train.py

Inference

  • The main inference script is ./scripts/inference.py.
  • Inference arguments are defined in ./options/test_options.py.
  • Download the pretrained DeltaMapper model for editing human faces from here, and then place it into the folder ./checkpoints.
  • Some inference data are provided in ./examples.

To produce editing results, run the following command:

CUDA_VISIBLE_DEVICES=1 python scripts/inference.py --target "chubby face","face with eyeglasses","face with smile","face with pale skin","face with tanned skin","face with big eyes","face with black clothes","face with blue suit","happy face","face with bangs","face with red hair","face with black hair","face with blond hair","face with curly hair","face with receding hairline","face with bowlcut hairstyle"

The produced results are shown below.

You can also specify your own target attributes with the --target flag.
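In the command above, the shell joins the adjacent quoted strings into a single argument, so --target receives one comma-separated string of descriptions. The following sketch builds the same argument list from a Python list. The helper name `build_inference_command` is hypothetical, and how the script itself splits the value is an assumption based on the command format shown above.

```python
def build_inference_command(targets, script="scripts/inference.py"):
    """Build an argv list that passes `targets` as one comma-separated --target value.

    Hypothetical helper; it mirrors the command-line format shown above.
    """
    cleaned = [t.strip() for t in targets if t.strip()]
    if not cleaned:
        raise ValueError("at least one target description is required")
    for t in cleaned:
        # A comma inside a description would be indistinguishable
        # from the separator between descriptions.
        if "," in t:
            raise ValueError(f"target must not contain a comma: {t!r}")
    return ["python", script, "--target", ",".join(cleaned)]


if __name__ == "__main__":
    cmd = build_inference_command(["face with smile", "face with red hair"])
    print(cmd)
```

The resulting list can be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment, which avoids shell quoting issues for descriptions that contain spaces.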

Inference for real images

  • The main inference script is ./scripts/inference_real.py.
  • Inference arguments are defined in ./options/test_options.py.
  • Download the pretrained DeltaMapper model for editing human faces from here, and then place it into the folder ./checkpoints.
  • Download the pretrained e4e encoder e4e_ffhq_encode.pt from e4e.
  • One test image is provided in ./test_imgs.

To produce editing results, run the following command:

CUDA_VISIBLE_DEVICES=1 python scripts/inference_real.py --target "chubby face","face with eyeglasses","face with smile","face with pale skin","face with tanned skin","face with big eyes","face with black clothes","face with blue suit","happy face","face with bangs","face with red hair","face with black hair","face with blond hair","face with curly hair","face with receding hairline","face with bowlcut hairstyle"

Results

[Figure: results]

Acknowledgements

This code is developed based on the code of orpatashnik/StyleCLIP by Or Patashnik et al.

Citation

If you use this code for your research, please cite our paper:

@InProceedings{lyu2023deltaedit,
    author    = {Lyu, Yueming and Lin, Tianwei and Li, Fu and He, Dongliang and Dong, Jing and Tan, Tieniu},
    title     = {DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023},
}
