
plug-and-play's Introduction

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation (CVPR 2023)

[arXiv] [Hugging Face Spaces] [TI2I]

[teaser figure]

Updates:

19/06/23 🧨 Diffusers implementation of Plug-and-Play is available here.

TODO:

  • Diffusers support and pipeline integration
  • Gradio demo
  • Release TI2I Benchmarks

Usage

To use plug-and-play diffusion features, follow these steps:

  1. Setup
  2. Feature extraction
  3. Running PnP
  4. TI2I Benchmarks

Setup

Our codebase is built on CompVis/stable-diffusion and shares its dependencies and model architecture.

Creating a Conda Environment

conda env create -f environment.yaml
conda activate pnp-diffusion

Downloading StableDiffusion Weights

Download the Stable Diffusion weights from the CompVis organization on Hugging Face (the sd-v1-4.ckpt file) and link them:

mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 

Setting Experiment Root Path

All experiment data is stored under a root directory. The path of this directory is specified in configs/pnp/setup.yaml, under the config.exp_path_root key.
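
For reference, a minimal sketch of what configs/pnp/setup.yaml might look like is shown below; only the config.exp_path_root key is documented in this README, so any other keys in your copy of the file should be left as they are:

# sketch of configs/pnp/setup.yaml -- only config.exp_path_root is documented here;
# keep the remaining keys from the repository version of the file unchanged
config:
  exp_path_root: ./experiments   # all experiment folders are created under this directory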

Feature Extraction

To generate or invert an image and extract its features, first set the parameters for the translation in a yaml config file. Example extraction configs can be found in configs/pnp/feature-extraction-generated.yaml (for generated images) and configs/pnp/feature-extraction-real.yaml (for real images). Once the arguments are set, run:

python run_features_extraction.py --config <extraction_config_path>

For real images, the timesteps at which features are saved are determined by the save_feature_timesteps argument. Note that to run PnP with T sampling steps on a real image, you need to run the extraction with save_feature_timesteps = T: since the real image is reconstructed by sampling with 999 steps, the timesteps at which features are saved have to be specified explicitly.
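
As a concrete illustration of this relationship for a real image, a hypothetical config fragment might look as follows (only ddim_steps and save_feature_timesteps are taken from this README; the remaining key name is an illustrative placeholder, so check feature-extraction-real.yaml for the actual fields):

# hypothetical fragment of a real-image extraction config -- the init_img key is a
# placeholder name; ddim_steps and save_feature_timesteps are the documented parameters
init_img: ./data/source.png    # placeholder: path to the real guidance image
ddim_steps: 999                # the real image is reconstructed by sampling with 999 steps
save_feature_timesteps: 50     # set this to T if you plan to run PnP with T sampling steps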

After running the extraction script, an experiment folder is created at <exp_path_root>/<source_experiment_name>, where source_experiment_name is specified in the config file. The experiment directory has the following structure:

- <source_experiment_name>
    - feature_maps         # contains the extracted features
    - predicted_samples    # predicted clean images for each sampling timestep
    - samples              # contains the generated/inverted image
    - translations         # PnP translation results
    - z_enc.pt             # the initial noisy latent code
    - args.json            # the config arguments of the experiment

For visualizing the extracted features, see the Feature Visualization section.

Running PnP

To run PnP, first set the parameters for the translation in a yaml config file. Example PnP configs can be found in configs/pnp/pnp-generated.yaml (for generated images) and configs/pnp/pnp-real.yaml (for real images). Once the arguments are set, run:

python run_pnp.py --config <pnp_config_path>

Through the config parameters, you can control the following aspects of the translation:

  • Structure preservation is controlled by the feature_injection_threshold parameter: a higher value preserves structure better but can also leak details from the source image; setting it to ~80% of the total sampling steps generally gives a good tradeoff.
  • Deviation from the guidance image is controlled through the scale, negative_prompt_alpha and negative_prompt_schedule parameters (see the sample config files for details). The effect of negative prompting is minor for realistic guidance images, but it can help significantly for minimalistic and abstract guidance images (e.g. segmentation maps).

Note that you can run a batch of translations by providing multiple target prompts in the prompts parameter, as in the sketch below.
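
To make the parameters above concrete, here is an illustrative sketch of the relevant fragment of a PnP config; the parameter names appear in the sample configs referenced above, but the values are only examples and the files contain additional keys (such as the source experiment name) that are not shown:

# illustrative PnP-config fragment -- values are examples, not recommendations;
# see configs/pnp/pnp-generated.yaml and configs/pnp/pnp-real.yaml for the full set of keys
prompts:                           # one translation is run per target prompt
  - "a photo of a bronze horse statue"
  - "a watercolor painting of a horse"
scale: 10                          # unconditional guidance scale; higher values encourage deviation from the source
feature_injection_threshold: 40    # with 50 sampling steps, ~80% of the steps (40) generally gives a good tradeoff
negative_prompt_alpha: 1.0         # negative-prompting strength parameter (see the sample configs for details)
negative_prompt_schedule: linear   # example schedule name; consult the sample configs for the supported values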

Feature Visualization

ResBlock Features Visualization

For running PCA visualizations of the extracted ResBlock features (Figure 3 in the paper), first set the parameters for the visualization in a yaml config file. An example visualization config can be found in configs/pnp/feature-pca-vis.yaml. Once the arguments are set, run:

python run_features_pca.py --config "<pca_vis_config_path>"

The feature visualizations are saved under the <config.exp_path_root>/PCA_features_vis/<experiment_name> directory, where <experiment_name> is specified in the visualization config file.

Self-Attention Visualization

To visualize the self-attention maps of a generated/inverted image (Figure 6 in the paper), run:

python run_self_attn_pca.py --block "<visualization_module_name>" --experiment "<experiment_name>"

The self-attention visualizations are saved under the <config.exp_path_root>/PCA_self_attention_vis/<experiment_name> directory.

TI2I Benchmarks

You can find the Wild-TI2I, ImageNetR-TI2I and ImageNetR-Fake-TI2I benchmarks in this dropbox folder. The translation prompts and all the necessary configs (e.g. seed, generation prompt, guidance image path) are provided in a yaml file in each benchmark folder.
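
For orientation, a purely hypothetical benchmark entry could look like the following; the actual field names and layout are defined by the yaml files in the benchmark folders, and only the kinds of information listed above (seed, generation prompt, guidance image path, translation prompts) are taken from this README:

# hypothetical benchmark entry -- all field names are illustrative placeholders
- guidance_image: ./imagenetr-ti2i/images/cat_sketch.png   # placeholder path to the guidance image
  generation_prompt: "a sketch of a cat"                   # prompt used to generate/invert the guidance image
  seed: 42
  translation_prompts:
    - "a photo of a cat"
    - "an origami of a cat"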

Citation

@InProceedings{Tumanyan_2023_CVPR,
    author    = {Tumanyan, Narek and Geyer, Michal and Bagon, Shai and Dekel, Tali},
    title     = {Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {1921-1930}
}

plug-and-play's People

Contributors

arielreplicate, eltociear, hysts, michalgeyer, tnarek


plug-and-play's Issues

OSError

OSError: Can't load feature extractor for 'CompVis/stable-diffusion-safety-checker'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'CompVis/stable-diffusion-safety-checker' is the correct path to a directory containing a preprocessor_config.json file

transformers

Is the specified version of transformers correct? An error is reported that the installed package has no CLIPTextModelWithProjection.

How can I link sd-v1-4.ckpt file on windows 11?

I downloaded the sd-v1-4.ckpt file from Hugging Face and put it in models/ldm/stable-diffusion-v1. I then did not know how to link the file, because the ln -s command is not recognized in the Windows command prompt. Instead, I ran:

mklink "models/ldm/stable-diffusion-v1/" "models/ldm/stable-diffusion-v1/model.ckpt"

It then reports Access Denied. I am not sure whether my operation is correct.

High memory demand

Hi there, thanks for sharing this code. I am currently trying to make it work on an NVIDIA GeForce RTX 3060 with 12 GB of memory. When I run "run_features_extraction.py", the call "z_enc, _ = sampler.encode_ddim(...)" finishes, but right afterwards, when "samples_ddim, _ = sampler.sample()" is called, I run into a "RuntimeError: CUDA out of memory" error. Is there some problem, or does the model really need that much memory?
I would appreciate any help.

how to batch convert real images + fast feature extraction

Hello, I want to use your model to batch-convert real images into different styles, but the intermediate files generated by your model take up too much space. Is there any way to avoid this? Also, the feature extraction stage in the first step is quite slow. Are there any related options to make it faster?

self attention Pca map

Hello,
Thanks for the wonderful work! When I visualize the PCA of the self-attention map, I cannot achieve the same effect as in the PnP paper; that is, the visualization does not share the spatial distribution of the input image.

[image: PCA visualization of the self-attention map]

If I directly average the resulting self-attention map (over the 8 attention heads) and visualize it, the result is shown below, with some highlighting along the diagonal.
[image: averaged self-attention map]

As I understand it, each row of the self-attention map is the query of one feature of q against the entire k map. The diagonal of the self-attention map corresponds to a feature querying the same position in q and k; because the values are the same, the similarity is high, which gives a white dot on the diagonal. I would like to ask how a map like this can have the same spatial distribution as the input image.
Maybe I have misunderstood something; please give me some advice!

Looking forward to your reply.

OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like CompVis/stable-diffusion-safety-checker is not the path to a directory containing a preprocessor_config.json file.

python run_features_extraction.py --config configs/pnp/feature-extraction-real.yaml
Traceback (most recent call last):
  File "/data1/wz/anaconda3/envs/pnp-diffusion/lib/python3.8/site-packages/transformers/feature_extraction_utils.py", line 403, in get_feature_extractor_dict
    resolved_feature_extractor_file = cached_path(
  File "/data1/wz/anaconda3/envs/pnp-diffusion/lib/python3.8/site-packages/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/data1/wz/anaconda3/envs/pnp-diffusion/lib/python3.8/site-packages/transformers/utils/hub.py", line 545, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_features_extraction.py", line 14, in <module>
    from pnp_utils import check_safety
  File "/data1/wz/research/plug-and-play-main/pnp_utils.py", line 14, in <module>
    safety_feature_extractor = AutoFeatureExtractor.from_pretrained(safety_model_id)
  File "/data1/wz/anaconda3/envs/pnp-diffusion/lib/python3.8/site-packages/transformers/models/auto/feature_extraction_auto.py", line 270, in from_pretrained
    config_dict, _ = FeatureExtractionMixin.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
  File "/data1/wz/anaconda3/envs/pnp-diffusion/lib/python3.8/site-packages/transformers/feature_extraction_utils.py", line 436, in get_feature_extractor_dict
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like CompVis/stable-diffusion-safety-checker is not the path to a directory containing a preprocessor_config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

I would like to ask how this problem can be solved. I downloaded the sd-v1-4.ckpt file directly from the official CompVis organization page on Hugging Face and then put this file directly into models/ldm/stable-diffusion-v1/.
I don't know whether this is sufficient?

Exact steps for running end-to-end editing

It would be great if you provided the exact steps to follow for running this editing code end to end; it is very confusing to see both real and generated configs without knowing when to run which. Please update.

Effect of 'scale' and negative prompt

Hi,

I'm trying to understand how the scale parameter affects the translation output.
The only information I found was in the config file: "unconditional guidance scale. Note that a higher value encourages deviation from the source image".

Would you mind explaining how this parameter affects the translation and how it should be combined with the other structure-preserving control parameters, such as 'feature_injection_threshold' and the negative prompt parameters?

Prompting

Hello! Thanks for sharing your excellent work. This is a question rather than an issue: did you follow specific rules for prompt writing? For example, if I want to change the viewpoint or the lighting of a photo, is that possible via suitable prompting, or does your method not support this?
Thanks in advance.

Controlling the number of inversion steps

Hi, it seems the number of DDIM steps used for inversion is fixed:

ddim_inversion_steps = 999

Is there a specific reason why you only exposed the number of steps used for sampling the features (ddim_steps: 999 in the feature extraction .yaml config files)?

How related are the two parameters (i.e. ddim_inversion_steps and exp_config.config.ddim_steps)? Does a change in one require changing the other?
I think the inversion could be good enough with 50 ddim_steps?

A question about the model choice

I'm currently writing a Google Colab demo for this code, and I was wondering whether it will run as expected with custom-trained or other third-party Stable Diffusion models, since this repo uses an outdated version of SD.

Thank you and have a nice day :)
