michaelgira23 / compositional-visual-generation-with-composable-diffusion-models-pytorch

This project is forked from energy-based-model/compositional-visual-generation-with-composable-diffusion-models-pytorch.

[ECCV 2022] Compositional Generation using Diffusion Models (PyTorch)

Home Page: https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/

License: Other

Composable Diffusion

We propose using conjunction and negation (negative prompt) operators for compositional generation with conditional diffusion models (e.g., Stable Diffusion, Point-E).


This is the official codebase for Compositional Visual Generation with Composable Diffusion Models.

Compositional Visual Generation with Composable Diffusion Models
Nan Liu¹*, Shuang Li²*, Yilun Du²*, Antonio Torralba², Joshua B. Tenenbaum²
* Equal contribution
¹UIUC, ²MIT CSAIL
ECCV 2022 / MIT News / MIT CSAIL News

News

  • 12/22/22: Now you can use our code to apply compositional operators to Point-E!
  • 12/13/22: stabilityai/stable-diffusion-2-1-base and other updated versions can now be used for compositional generation.
  • 10/10/22: Our proposed operators have been added into stable-diffusion-webui-conjunction!
  • 09/08/22: Our paper is on MIT News and MIT CSAIL News!
  • You can now compose prompts with the Stable Diffusion model to sample 512x512 images.

  • The codebase is built upon GLIDE and Improved-Diffusion.
  • This codebase provides both training and inference code.
  • The codebase can be used to train text-conditioned diffusion models in a similar manner to GLIDE.

Composed 2D Image Results using Stable-Diffusion.

Image | Positive Prompts (AND Operator) | Negative Prompts (NOT Operator)
Left | ["A stone castle surrounded by lakes and trees, fantasy, wallpaper, concept art, extremely detailed", "Black and white"] | None
Right | ["A stone castle surrounded by lakes and trees, fantasy, wallpaper, concept art, extremely detailed"] | ["Black and white"]

Image | Positive Prompts (AND Operator) | Negative Prompts (NOT Operator)
Left | ["mystical trees", "A magical pond", "Dark"] | None
Right | ["mystical trees", "A magical pond"] | ["Dark"]
  1. Samples generated by Stable-Diffusion using our compositional generation operator.
  2. More discussions and results about our proposed methods can be found in Reddit Post 1, Reddit Post 2 and Reddit Post 3!
  3. Some prompts are borrowed from Lexica!

Composed 3D Mesh Results using Point-E.

A green avocado AND A chair | A chair AND NOT Chair legs | A toilet AND A chair
A couch AND A boat | A monitor AND A brown couch | A chair AND A cake

Setup

Run the following to create and activate a conda environment:

conda create -n compose_diff python=3.8
conda activate compose_diff

To install this package, clone this repository and then run:

pip install -e .
pip install diffusers==0.10.2
pip install open3d==0.16.0
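Because the install pins exact versions of diffusers and open3d, it can help to confirm that the active environment actually matches them. A minimal sketch, assuming only the standard library's importlib.metadata (available from Python 3.8, the version used above); the helper name check_pins is hypothetical and not part of this repository:

```python
"""Check that installed package versions match the pins in the Setup section."""
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Pins copied from the pip commands above; update them if you change the install.
PINNED: Dict[str, str] = {"diffusers": "0.10.2", "open3d": "0.16.0"}


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every pinned package that is missing or mismatched."""
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    if not issues:
        print("All pinned packages match.")
```

An empty result means every pinned package is installed at the expected version.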

Inference

Google Colab

The demo notebook shows how to compose natural language descriptions and CLEVR objects for image generation.

Python

Compose natural language descriptions to generate 3D meshes using Point-E:

python scripts/txt2pointclouds_compose_pointe.py --prompts "a cake" "a house" --weights 3 3

Compose natural language descriptions using Stable-Diffusion:

# Conjunction (AND) by specifying positive weights
# Weights can be adjusted; if they are not specified, each weight equals --scale
python scripts/image_sample_compose_stable_diffusion.py --prompts "mystical trees | A magical pond | dark" --weights "7.5 | 7.5 | 7.5" --scale 7.5 --steps 50 --seed 2
# Negation (NOT) by specifying negative weights
python scripts/image_sample_compose_stable_diffusion.py --prompts "mystical trees | A magical pond | dark" --weights "7.5 | 7.5 | -7.5" --scale 7.5 --steps 50 --seed 2
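The signed weights decide how each prompt steers sampling. A minimal sketch of the arithmetic, assuming the conjunction rule from the Composable Diffusion paper, ε̂ = ε(x_t) + Σ_i w_i (ε(x_t | c_i) − ε(x_t)), where a negative w_i pushes the sample away from prompt i (NOT). Real noise predictions are tensors from the denoising network; plain Python lists stand in for them here. Both function names are hypothetical and this is not the script's actual code; the "|" splitting and the fallback to --scale are inferred from the comments in the commands above.

```python
from typing import List, Optional, Sequence, Tuple


def parse_prompts_and_weights(prompts: str, weights: Optional[str], scale: float) -> Tuple[List[str], List[float]]:
    """Split 'a | b | c' style arguments; if no weights are given, every prompt gets `scale`."""
    prompt_list = [p.strip() for p in prompts.split("|")]
    if weights is None:
        weight_list = [scale] * len(prompt_list)
    else:
        # float() tolerates the spaces around each '|'-separated value.
        weight_list = [float(w) for w in weights.split("|")]
    if len(weight_list) != len(prompt_list):
        raise ValueError("expected exactly one weight per prompt")
    return prompt_list, weight_list


def compose_noise(uncond: Sequence[float], conds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """eps = eps_uncond + sum_i w_i * (eps_i - eps_uncond); w_i > 0 acts as AND, w_i < 0 as NOT."""
    if len(conds) != len(weights):
        raise ValueError("expected exactly one weight per conditional prediction")
    out = list(uncond)
    for cond, w in zip(conds, weights):
        # Each prompt adds its own guidance direction, scaled by its (possibly negative) weight.
        for k, (c, u) in enumerate(zip(cond, uncond)):
            out[k] += w * (c - u)
    return out
```

For the negation command above, parse_prompts_and_weights("mystical trees | A magical pond | dark", "7.5 | 7.5 | -7.5", 7.5) yields the three prompts with weights [7.5, 7.5, -7.5], so the guidance direction of "dark" is subtracted. With a single prompt whose weight equals --scale, the rule reduces to ordinary classifier-free guidance.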

Compose natural language descriptions using pretrained GLIDE:

# Conjunction (AND) 
python scripts/image_sample_compose_glide.py --prompts "a camel" "a forest" --weights 7.5 7.5 --steps 100

Compose CLEVR objects:

# Conjunction (AND) 
MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma False --use_scale_shift_norm False --num_classes 2 --dataset clevr_pos --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
python scripts/image_sample_compose_clevr_pos.py $MODEL_FLAGS $DIFFUSION_FLAGS --ckpt_path $YOUR_CHECKPOINT_PATH

Compose object relational descriptions (CLEVR Relations):

# Conjunction (AND) 
MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False --num_classes 4,3,9,3,3,7 --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
python scripts/image_sample_compose_clevr_rel.py $MODEL_FLAGS $DIFFUSION_FLAGS --ckpt_path $YOUR_CHECKPOINT_PATH

Training

To train a model on CLEVR Objects, first set the hyperparameters as follows:

MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False --num_classes 2  --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 1e-5 --batch_size 16 --use_kl False --schedule_sampler loss-second-moment --microbatch -1"

Then run the training script:

python scripts/image_train.py --data_dir ./dataset/ --dataset clevr_pos $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
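The --schedule_sampler loss-second-moment flag controls which diffusion timesteps are drawn during training. Since the codebase is built on Improved-Diffusion, the sketch below reproduces, as we understand it, the idea behind that library's loss-second-moment resampler: sample uniformly until every timestep has a full loss history, then sample each timestep in proportion to the root-mean-square of its recent losses, with a small uniform share mixed in and importance weights to keep the loss estimate unbiased. The class name and the defaults (10 losses per timestep, uniform share 0.001) are our recollection and should be checked against the library's resample.py.

```python
import math
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple


class SecondMomentSampler:
    """Sample timesteps in proportion to the RMS of each timestep's recent losses."""

    def __init__(self, num_timesteps: int, history_per_term: int = 10, uniform_prob: float = 0.001):
        self.num_timesteps = num_timesteps
        self.history_per_term = history_per_term
        self.uniform_prob = uniform_prob
        # One bounded loss history per timestep; appending beyond maxlen drops the oldest loss.
        self.history: List[Deque[float]] = [deque(maxlen=history_per_term) for _ in range(num_timesteps)]

    def warmed_up(self) -> bool:
        # Stay uniform until every timestep has a full history.
        return all(len(h) == self.history_per_term for h in self.history)

    def probs(self) -> List[float]:
        uniform = [1.0 / self.num_timesteps] * self.num_timesteps
        if not self.warmed_up():
            return uniform
        rms = [math.sqrt(sum(x * x for x in h) / len(h)) for h in self.history]
        total = sum(rms)
        if total == 0:
            return uniform
        # Mix in a small uniform share so no timestep gets zero probability.
        return [(1 - self.uniform_prob) * r / total + self.uniform_prob / self.num_timesteps for r in rms]

    def sample(self, batch_size: int, rng: random.Random) -> Tuple[List[int], List[float]]:
        p = self.probs()
        ts = rng.choices(range(self.num_timesteps), weights=p, k=batch_size)
        # Importance weights 1 / (T * p_t) keep the expected weighted loss equal to the uniform average.
        ws = [1.0 / (self.num_timesteps * p[t]) for t in ts]
        return ts, ws

    def update(self, ts: Sequence[int], losses: Sequence[float]) -> None:
        # Record the latest per-example loss for each sampled timestep.
        for t, loss in zip(ts, losses):
            self.history[t].append(loss)
```

Timesteps whose losses stay large (hard to denoise) are visited more often, while the weights returned by sample() rescale their contributions so the training objective is unchanged in expectation.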

Similarly, use the following commands to train a model on CLEVR Relations:

MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False --num_classes 4,3,9,3,3,7 --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 1e-5 --batch_size 16 --use_kl False --schedule_sampler loss-second-moment --microbatch -1"
python scripts/image_train.py --data_dir ./dataset/ --dataset clevr_rel $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

We also provide code for training a text-conditioned GLIDE model on the MS-COCO dataset.
First, specify the image root directory and the corresponding caption JSON file in the image_dataset file.
Then use the following command to train a model on MS-COCO captions:

MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 1e-5 --batch_size 16 --use_kl False --schedule_sampler loss-second-moment --microbatch -1"
python scripts/image_train.py --data_dir ./dataset/ --dataset coco $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
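The caption file referenced in image_dataset is presumably a standard MS-COCO captions annotation JSON, which holds an "images" list (each entry with "id" and "file_name") and an "annotations" list (each entry with "image_id" and "caption"). A minimal sketch that pairs each captioned image with its captions, assuming that layout; the function name is hypothetical and the repository's own loader may organize the data differently:

```python
import json
import os
from collections import defaultdict
from typing import Dict, List


def load_coco_captions(image_root: str, caption_json: str) -> Dict[str, List[str]]:
    """Map the path of every captioned image under image_root to its list of captions."""
    with open(caption_json, "r", encoding="utf-8") as f:
        data = json.load(f)
    # COCO stores captions separately from images and links them by image id.
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}
    captions: Dict[str, List[str]] = defaultdict(list)
    for ann in data["annotations"]:
        path = os.path.join(image_root, id_to_file[ann["image_id"]])
        captions[path].append(ann["caption"].strip())
    return dict(captions)
```

COCO typically provides several captions per image, so a training loader can pick one at random per step to add text variety.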

Dataset

Training datasets for both CLEVR Objects and CLEVR Relations will be downloaded automatically when running the script above.

If you need to download them manually, the datasets used for training our models can be found at:

Dataset | Link
CLEVR Objects | https://www.dropbox.com/s/5zj9ci24ofo949l/clevr_pos_data_128_30000.npz?dl=0
CLEVR Relations | https://www.dropbox.com/s/urd3zgimz72aofo/clevr_training_data_128.npz?dl=0
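The Dropbox links above end in ?dl=0, which opens a preview page; setting dl=1 requests the file itself. A small standard-library sketch that rewrites such a link and lists the arrays inside a downloaded .npz file (an .npz is a zip archive with one <name>.npy member per array). The function names are hypothetical, and since the array names inside each file are not documented here, the sketch lists them rather than assuming them:

```python
import zipfile
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def direct_download_url(share_url: str) -> str:
    """Turn a Dropbox share link (?dl=0) into a direct-download link (?dl=1)."""
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


def npz_array_names(path: str) -> List[str]:
    """List the arrays stored in an .npz file without importing NumPy."""
    with zipfile.ZipFile(path) as zf:
        # Each array is stored as a '<name>.npy' member of the archive.
        return [n[:-4] for n in zf.namelist() if n.endswith(".npy")]
```

The rewritten URL can then be fetched with urllib.request.urlretrieve(url, "clevr_pos_data_128_30000.npz"), after which npz_array_names shows what the file contains.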

Citing our Paper

If you find our code useful for your research, please consider citing:

@article{liu2022compositional,
  title={Compositional Visual Generation with Composable Diffusion Models},
  author={Liu, Nan and Li, Shuang and Du, Yilun and Torralba, Antonio and Tenenbaum, Joshua B},
  journal={arXiv preprint arXiv:2206.01714},
  year={2022}
}
