Blended Diffusion for Text-driven Editing of Natural Images [CVPR 2022]

Omri Avrahami, Dani Lischinski, Ohad Fried

Abstract: Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, ability to preserve the background and matching the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation.
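The blending step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names `noise_image` and `blend_step`, the `alpha_bar` parameter (the cumulative signal rate of the noise schedule at the current step), and the array shapes are assumptions chosen for illustration. The idea shown is that, at each noise level, the text-guided latent is kept inside the mask while a correspondingly noised copy of the input image fills the rest.

```python
import numpy as np

def noise_image(x0, alpha_bar, rng):
    """Forward-diffuse the clean image x0 to the noise level given by alpha_bar.

    alpha_bar is assumed to be the cumulative product of the schedule's alphas:
    1.0 means no noise, values near 0.0 mean almost pure noise.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def blend_step(x_fg, x0, mask, alpha_bar, rng):
    """Blend the guided latent (inside the mask) with the noised input (outside).

    x_fg : latent produced by the text-guided diffusion step (hypothetical input)
    x0   : the original input image
    mask : 1.0 inside the region to edit, 0.0 elsewhere
    """
    x_bg = noise_image(x0, alpha_bar, rng)
    return mask * x_fg + (1.0 - mask) * x_bg
```

Calling `blend_step` once per denoising step, with `alpha_bar` following the schedule, keeps the background tied to the input image at every noise level while only the masked region is free to follow the prompt.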

News

You may be interested in the follow-up project Blended Latent Diffusion, which produces better results with a significant speed-up. Code is available here.

Getting Started

Installation

  1. Create the virtual environment:
$ conda create --name blended-diffusion python=3.9
$ conda activate blended-diffusion
$ pip3 install ftfy regex matplotlib lpips kornia opencv-python torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  2. Create a checkpoints directory and download the pretrained diffusion model from here to this folder.
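After installation, a quick check that the packages are importable can save a failed run later. The sketch below is not part of the repository; the helper name `missing_packages` is hypothetical. The module names are the import names of the packages installed above (note that opencv-python is imported as `cv2`).

```python
import importlib.util

# Import names of the packages from the pip3 install command above.
REQUIRED_MODULES = [
    "ftfy", "regex", "matplotlib", "lpips", "kornia", "cv2", "torch", "torchvision",
]

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated blended-diffusion environment; an empty result means every listed package was found.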

Image generation

The following command generates multiple synthesis results for a text prompt:

$ python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output"

The generation results are saved in the output/ranked folder, ordered by CLIP similarity rank. For best results, generate a large number of results (at least 64) and pick the best ones.
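To make the ranking idea concrete, here is a minimal sketch of ordering results by similarity to the prompt. It is an illustration under assumptions, not the repository's ranking code: it assumes each result and the prompt have already been embedded as plain lists of floats (for example by CLIP), and the function names and file names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(text_embedding, image_embeddings):
    """Return (name, score) pairs sorted from most to least similar to the text.

    image_embeddings maps a result name to its embedding vector.
    """
    scored = [
        (name, cosine_similarity(text_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With many generated candidates, taking the first few entries of the returned list corresponds to keeping only the results that best match the prompt.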

To generate multiple results in a single diffusion process, we use batch processing. If you get a CUDA out-of-memory error, first try lowering the batch size by setting --batch_size 1.

Applications

Multiple synthesis results for the same prompt

Synthesis results for different prompts

Altering part of an existing object

Background replacement

Scribble-guided editing

Text-guided extrapolation

Composing several applications

Acknowledgments

This code borrows from CLIP, Guided-diffusion and CLIP-Guided Diffusion.

Citation

If you use this code for your research, please cite the following:

@InProceedings{Avrahami_2022_CVPR,
    author    = {Avrahami, Omri and Lischinski, Dani and Fried, Ohad},
    title     = {Blended Diffusion for Text-Driven Editing of Natural Images},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {18208-18218}
}
