pnp-diffusers's Introduction

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation (CVPR 2023)

Links: arXiv | Hugging Face Spaces | TI2I

teaser

To apply plug-and-play diffusion features, follow these steps:

  1. Setup
  2. Latent extraction
  3. Running PnP

Setup

Create the environment and install the dependencies by running:

conda create -n pnp-diffusers python=3.9
conda activate pnp-diffusers
pip install -r requirements.txt
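
After creating the environment, a quick sanity check can confirm it is usable. The snippet below is illustrative and not part of the repository; it simply reports the installed diffusers version and whether PyTorch sees a GPU, since Stable Diffusion inference is impractical without one.

import torch
import diffusers

print("diffusers version:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))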

Latent Extraction

We first compute the intermediate noisy latents of the structure guidance image. To do that, run:

python preprocess.py --data_path <path_to_guidance_image> --inversion_prompt <inversion_prompt>

where <inversion_prompt> should describe the content of the guidance image. The intermediate noisy latents will be saved under the path latents_forward/<image_name>, where <image_name> is the filename of the provided guidance image.
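
Conceptually, this step performs a deterministic DDIM inversion of the guidance image and records the noisy latent at every timestep. The sketch below only illustrates that idea; the model ID, prompt, image path, number of steps, latent scaling constant, and save layout are assumptions made for illustration, and the actual implementation lives in preprocess.py.

import os
import torch
from PIL import Image
from torchvision import transforms as T
from diffusers import StableDiffusionPipeline, DDIMScheduler

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler.set_timesteps(50)  # assumed number of inversion steps

# Encode the guidance image into the VAE latent space (0.18215 is the SD latent scale).
image = Image.open("guidance.png").convert("RGB").resize((512, 512))
x = T.ToTensor()(image).unsqueeze(0).to(device) * 2 - 1
with torch.no_grad():
    latent = pipe.vae.encode(x).latent_dist.mean * 0.18215

# Embed the inversion prompt (placeholder text standing in for <inversion_prompt>).
tokens = pipe.tokenizer("a photo of a horse", padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        truncation=True, return_tensors="pt")
with torch.no_grad():
    text_emb = pipe.text_encoder(tokens.input_ids.to(device))[0]

# Deterministic DDIM inversion: walk the clean latent toward higher noise levels,
# saving the intermediate noisy latent at every timestep.
os.makedirs("latents_forward/guidance", exist_ok=True)
timesteps = list(reversed(pipe.scheduler.timesteps))  # small t -> large t
alphas = pipe.scheduler.alphas_cumprod
with torch.no_grad():
    for i, t in enumerate(timesteps):
        eps = pipe.unet(latent, t, encoder_hidden_states=text_emb).sample
        a_prev = alphas[timesteps[i - 1]].item() if i > 0 else pipe.scheduler.final_alpha_cumprod.item()
        a_t = alphas[t].item()
        # Inverted DDIM step: recover the predicted clean latent, then re-noise it to level t.
        x0 = (latent - (1 - a_prev) ** 0.5 * eps) / a_prev ** 0.5
        latent = a_t ** 0.5 * x0 + (1 - a_t) ** 0.5 * eps
        torch.save(latent.cpu(), f"latents_forward/guidance/noisy_latent_{int(t)}.pt")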

Running PnP

Run the following command to apply PnP to the structure guidance image:

python pnp.py --config_path <pnp_config_path>

where <pnp_config_path> is the path to a YAML config file. The config includes fields for the guidance image path, the PnP output path, the translation prompt, the guidance scale, the PnP feature and self-attention injection thresholds, and additional hyperparameters. See an example config in config_pnp.yaml.
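
For reference, such a config can also be written programmatically. In the sketch below, only pnp_attn_t and pnp_f_t are field names confirmed by the shipped example config; every other key and value is an illustrative placeholder, so compare against config_pnp.yaml before use.

import yaml

# Illustrative config; pnp_attn_t / pnp_f_t are confirmed field names,
# the rest are placeholders -- check config_pnp.yaml for the real schema.
config = {
    "image_path": "data/horse.png",       # structure guidance image (placeholder key)
    "output_path": "PNP-results/horse",   # where the translated image is written (placeholder key)
    "prompt": "a photo of a zebra",       # translation prompt (placeholder key)
    "guidance_scale": 7.5,                # classifier-free guidance scale (placeholder key)
    "pnp_attn_t": 0.5,                    # self-attention injection threshold (from the example config)
    "pnp_f_t": 0.8,                       # feature injection threshold (from the example config)
}

with open("my_pnp_config.yaml", "w") as f:
    yaml.safe_dump(config, f)

The resulting file can then be passed to the script as python pnp.py --config_path my_pnp_config.yaml.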

Citation

@InProceedings{Tumanyan_2023_CVPR,
    author    = {Tumanyan, Narek and Geyer, Michal and Bagon, Shai and Dekel, Tali},
    title     = {Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {1921-1930}
}

pnp-diffusers's Issues

Way of obtaining the latent embeddings

Great work. I see in your paper that the latents are extracted during the denoising process, but in this code you set 'latents_forward' as the default. Can you explain this? Is there any difference between the two settings, and which one do you think is better?

Diffusers with a newer version

Hi, the code no longer works with newer diffusers versions (e.g. 0.25.0) because the UNet now expects a different input.
Is there a quick fix for handling this?

Injection thresholds

Hello, in your paper I found the following information on the injection threshold values:
We set our default injection thresholds to: τA = 25, τf = 40 out of the 50 sampling steps; for primitive guidance image, we found that τA = τf = 25 to work better.
I'm not quite sure what you mean by a primitive guidance image, and in your config you have pnp_attn_t: 0.5 and pnp_f_t: 0.8, so I'm trying to understand the implications of these values without having to try them all. Do you have more information on the subject?
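
One plausible reading, consistent with the defaults quoted from the paper, is that pnp_attn_t and pnp_f_t are fractions of the total number of sampling steps; under that assumption the config values map to the quoted step counts:

n_steps = 50                      # sampling steps used in the paper's defaults
pnp_attn_t, pnp_f_t = 0.5, 0.8    # values from the example config

# Assuming the thresholds are fractions of the total sampling steps:
tau_A = int(pnp_attn_t * n_steps)  # 25 of 50 steps -> tau_A = 25
tau_f = int(pnp_f_t * n_steps)     # 40 of 50 steps -> tau_f = 40
print(tau_A, tau_f)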
