InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

Project Page | Arxiv | Web Demo | QuickStart | Training | Acknowledge | Citation

This is the PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. Our code is based on Instruct-pix2pix and CompVis/stable_diffusion.

QuickStart

Follow the steps below to quickly edit your own images. The inference code in this repository requires one GPU with more than 9 GB of memory to process images at a resolution of 512.

  1. Clone this repo.

  2. Set up the conda environment:

    conda env create -f environment.yaml
    conda activate instructdiff
    
  3. We provide a well-trained checkpoint and a checkpoint that has undergone human alignment. Download them into the checkpoints folder and try both.

  4. You can edit your own images:

python edit_cli.py --input example.jpg --edit "Transform it to van Gogh, starry night style."

# Optionally, customize the parameters with the following options:
# --resolution 512 --steps 50 --config configs/instruct_diffusion.yaml --ckpt YOUR_CHECKPOINT --cfg-text 3.5 --cfg-image 1.25

# The input can also be an image URL, for example:
python edit_cli.py --input "https://wallup.net/wp-content/uploads/2016/01/207131-animals-nature-lion.jpg" \
   --edit "Transform it to van Gogh, starry night style." \
   --resolution 512 --steps 50 \
   --config configs/instruct_diffusion.yaml \
   --ckpt checkpoints/v1-5-pruned-emaonly-adaption-task-humanalign.ckpt \
   --outdir logs/

For other tasks, we provide recommended parameter settings in scripts/inference_example.sh.

  5. (Optional) You can launch your own interactive editing Gradio app:
python edit_app.py 

# You can also specify the path to the checkpoint
# The default checkpoint is checkpoints/v1-5-pruned-emaonly-adaption-task-humanalign.ckpt
python edit_app.py --ckpt checkpoints/v1-5-pruned-emaonly-adaption-task-humanalign.ckpt
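
The `--cfg-text` and `--cfg-image` options in the QuickStart command set two guidance scales. The sketch below shows how such scales are combined in the Instruct-pix2pix formulation of classifier-free guidance, which this code base builds on. It is an illustration under that assumption, not a copy of this repository's sampler. The function name `combine_guidance` and the use of plain Python lists in place of tensors are hypothetical.

```python
from typing import List, Sequence


def combine_guidance(
    out_cond: Sequence[float],      # prediction given image AND text instruction
    out_img_cond: Sequence[float],  # prediction given the image only
    out_uncond: Sequence[float],    # prediction given neither
    cfg_text: float = 3.5,          # value shown for --cfg-text in the QuickStart
    cfg_image: float = 1.25,        # value shown for --cfg-image in the QuickStart
) -> List[float]:
    """Instruct-pix2pix style dual classifier-free guidance (assumed here)."""
    return [
        u + cfg_text * (c - i) + cfg_image * (i - u)
        for c, i, u in zip(out_cond, out_img_cond, out_uncond)
    ]


# With both scales set to 1 the result reduces to the fully conditioned prediction.
print(combine_guidance([1.0], [0.5], [0.0], cfg_text=1.0, cfg_image=1.0))
```

Under this formulation, a larger `cfg_text` pushes the output further toward the instruction, while a larger `cfg_image` keeps it closer to the input image.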

Training

The code was developed with Python 3.8 on Ubuntu 18.04 and tested on 48 NVIDIA V100 GPUs, each with 32 GB of memory. Other platforms are not fully tested.

Installation

  1. Clone this repo.
  2. Set up the conda environment:
    conda env create -f environment.yaml
    conda activate instructdiff
    

Pre-trained Model Preparation

You can download the model trained by our pretraining adaptation process from OneDrive and put it into the folder stable_diffusion/models/ldm/stable-diffusion-v1/, or you can download the official pre-trained Stable Diffusion model with the following command:

bash scripts/download_pretrained_sd.sh

Data Preparation

You can refer to the dataset to prepare your data.

Training Command

For multi-GPU training on a single machine, you can use the following command:

python -m torch.distributed.launch --nproc_per_node=8 main.py --name v0 --base configs/instruct_diffusion.yaml --train --logdir logs/instruct_diffusion

For multi-GPU training on multiple machines, you can use the following command (assuming 6 machines as an example):

bash run_multinode.sh instruct_diffusion v0 6
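
As a rough check on the scale of a multi-machine run, the sketch below computes the total process count and a global rank using the usual torch.distributed convention. It assumes each machine launches 8 processes, as in the single-machine command above. The README does not describe what run_multinode.sh does internally, so the helper names `world_size` and `global_rank` are hypothetical.

```python
def world_size(num_nodes: int, nproc_per_node: int) -> int:
    """Total number of training processes across all machines."""
    return num_nodes * nproc_per_node


def global_rank(node_rank: int, local_rank: int, nproc_per_node: int) -> int:
    """Conventional global rank: processes are numbered machine by machine."""
    return node_rank * nproc_per_node + local_rank


# Assumption: 6 machines with 8 processes each.
print(world_size(6, 8))
print(global_rank(node_rank=5, local_rank=7, nproc_per_node=8))
```

Under that assumption the 6-machine example corresponds to 48 processes, the same number of GPUs as the test setup mentioned above.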

Convert EMA-Model

You can get the final EMA checkpoint for inference using the command below:

python convert_ckpt.py --ema-ckpt logs/instruct_diffusion/checkpoint/ckpt_epoch_200/state.pth --out-ckpt checkpoints/v1-5-pruned-emaonly-adaption-task.ckpt
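
For background, an EMA (exponential moving average) checkpoint holds a smoothed copy of the weights that is updated alongside training. The sketch below shows the standard update rule on a plain dictionary of floats. It is not the logic of convert_ckpt.py, and the function name `ema_update` and the decay value are hypothetical choices for illustration.

```python
from typing import Dict


def ema_update(
    ema: Dict[str, float],
    params: Dict[str, float],
    decay: float = 0.999,  # hypothetical decay, not taken from this repository
) -> None:
    """Update the EMA weights in place: ema = decay * ema + (1 - decay) * param."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


# Toy example: the EMA copy moves slowly toward the current weights.
ema_weights = {"w": 1.0}
ema_update(ema_weights, {"w": 0.0}, decay=0.9)
print(ema_weights)
```

The smoothed weights are typically the ones used for inference, which is why a separate conversion step produces the final checkpoint.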

Acknowledge

Thanks to

Citation

@article{Geng23instructdiff,
  author       = {Zigang Geng and
                  Binxin Yang and
                  Tiankai Hang and
                  Chen Li and
                  Shuyang Gu and
                  Ting Zhang and
                  Jianmin Bao and
                  Zheng Zhang and
                  Han Hu and
                  Dong Chen and
                  Baining Guo},
  title        = {InstructDiffusion: {A} Generalist Modeling Interface for Vision Tasks},
  journal      = {CoRR},
  volume       = {abs/2309.03895},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2309.03895},
  doi          = {10.48550/arXiv.2309.03895},
}
