
OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models

Zhe Kong · Yong Zhang* · Tianyu Yang · Tao Wang· Kaihao Zhang

Bizhu Wu · Guanying Chen · Wei Liu · Wenhan Luo*

*Corresponding Authors


TL;DR: OMG is a framework for multi-concept image generation that supports character and style LoRAs from Civitai.com. It can also be combined with InstantID to generate multiple identities, using a single reference image for each identity.

Introduction of OMG: A tool for high-quality multi-character image generation.


Trailer Demo: A short trailer, "Home Defense", created with OMG + SVD.


🏷️ Change Log

🔆 Introduction

1. OMG + LoRA (ID with multiple images)

2. OMG + InstantID (ID with single image)

3. OMG + ControlNet (Layout Control)

4. OMG + style LoRAs (Style Control)

🔧 Dependencies and Installation

  1. The code requires python==3.10.6, pytorch==2.0.1, and torchvision==0.15.2. Follow the official PyTorch installation instructions to install PyTorch and TorchVision. Installing both with CUDA support is strongly recommended.
conda create -n OMG python=3.10.6
conda activate OMG
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/segment-anything.git
  2. For visual comprehension, choose either YoloWorld + EfficientViT SAM or GroundingDINO + SAM.
    1. (Recommended) YoloWorld + EfficientViT SAM:
pip install "inference[yolo-world]==0.9.13"
pip install onnxsim==0.4.35
    2. (Optional) If you cannot install inference[yolo-world], install GroundingDINO for visual comprehension instead.

GroundingDINO requires manual installation.

Set the CUDA_HOME environment variable in the current shell:

export CUDA_HOME=/path/to/cuda-11.3

In this example, /path/to/cuda-11.3 should be replaced with the path where your CUDA toolkit is installed.

git clone https://github.com/IDEA-Research/GroundingDINO.git

cd GroundingDINO/

pip install -e .

More installation details can be found in the GroundingDINO repository.
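As a quick sanity check after installation, the pinned versions and the CUDA_HOME setting can be verified from Python. This is a minimal sketch, not part of OMG: the helper names (`installed_version`, `find_mismatches`, `cuda_home_problem`) are hypothetical, and it only compares version strings and checks that CUDA_HOME points at an existing directory.

```python
import os
from importlib import metadata
from pathlib import Path

# Version pins copied from the pip command above.
PINNED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (found, wanted)} for every package that does not match its pin."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # CUDA wheels can report a local suffix such as "2.0.1+cu118"; ignore it.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (found, wanted)
    return mismatches


def cuda_home_problem(environ=os.environ):
    """Return a description of what is wrong with CUDA_HOME, or None if it looks usable."""
    cuda_home = environ.get("CUDA_HOME")
    if not cuda_home:
        return "CUDA_HOME is not set"
    if not Path(cuda_home).is_dir():
        return f"CUDA_HOME is not a directory: {cuda_home}"
    return None


if __name__ == "__main__":
    for package, (found, wanted) in find_mismatches(PINNED).items():
        print(f"{package}: found {found}, expected {wanted}")
    problem = cuda_home_problem()
    if problem:
        print(problem)  # only relevant when building GroundingDINO
```

An empty result from `find_mismatches(PINNED)` means the three packages match the versions listed above.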

⏬ Pretrained Model Preparation

1) Download Models

1. Required download:

Download stable-diffusion-xl-base-1.0, controlnet-openpose-sdxl-1.0.

For InstantID + OMG, download: InstantID, antelopev2.

2. For visual comprehension, choose either "YoloWorld + EfficientViT SAM" or "GroundingDINO + SAM".

For YoloWorld + EfficientViT SAM: EfficientViT-SAM-XL1, yolo-world.

For GroundingDINO + SAM: GroundingDINO, SAM.

3. For character LoRAs, download at least one male character and one female character.

Male character LoRAs: Chris Evans, Gleb Savchenko, Harry Potter, Jordan Torres.

Female character LoRAs: Taylor Swift, Jennifer Lawrence, Hermione Granger, Keira Knightley.

4. (Optional) If using ControlNet, download:

ControlNet, controlnet-canny-sdxl-1.0, controlnet-depth-sdxl-1.0, dpt-hybrid-midas.

5. (Optional) If using Style LoRAs, download:

Anime Sketch Style, Oil Painting Style, Cinematic Photography Style.

2) Preparation

Place the models under the checkpoint directory as follows:

OMG
├── checkpoint
│   ├── antelopev2
│   ├── ControlNet
│   ├── controlnet-openpose-sdxl-1.0
│   ├── controlnet-canny-sdxl-1.0
│   ├── controlnet-depth-sdxl-1.0
│   ├── dpt-hybrid-midas
│   ├── style
│   │   ├── EldritchPaletteKnife.safetensors
│   │   ├── Cinematic Hollywood Film.safetensors
│   │   └── Anime_Sketch_SDXL.safetensors
│   ├── InstantID
│   ├── GroundingDINO
│   ├── lora
│   │   ├── chris-evans.safetensors
│   │   ├── Harry_Potter.safetensors
│   │   ├── Hermione_Granger.safetensors
│   │   ├── jordan_torres_v2_xl.safetensors
│   │   ├── keira_lora_sdxl_v1-000008.safetensors
│   │   ├── lawrence_dh128_v1-step00012000.safetensors
│   │   ├── Gleb-Savchenko_Liam-Hemsworth.safetensors
│   │   └── TaylorSwiftSDXL.safetensors
│   ├── sam
│   │   ├── sam_vit_h_4b8939.pth
│   │   └── xl1.pt
│   └── stable-diffusion-xl-base-1.0
├── gradio_demo
├── src
├── inference_instantid.py
└── inference_lora.py
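Before running inference, it can help to confirm that the files are where the tree above expects them. The following is a rough sketch under assumptions: the `EXPECTED` list is a hand-picked subset of the tree (adjust it to the models you actually downloaded), and `missing_entries` is a hypothetical helper, not part of the OMG code.

```python
from pathlib import Path

# A subset of the tree above, relative to the checkpoint directory.
# Which entries you need depends on the features you use; this selection is an assumption.
EXPECTED = [
    "stable-diffusion-xl-base-1.0",
    "controlnet-openpose-sdxl-1.0",
    "lora/chris-evans.safetensors",
    "lora/TaylorSwiftSDXL.safetensors",
    "sam/xl1.pt",
]


def missing_entries(checkpoint_dir, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [relative for relative in expected if not (root / relative).exists()]


if __name__ == "__main__":
    for relative in missing_entries("./checkpoint"):
        print(f"missing: checkpoint/{relative}")
```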

Download ViT-B-32.pt from OpenAI and place it at ~/.cache/clip/ViT-B-32.pt. If you use YoloWorld, place yolo-world.pt at /tmp/cache/yolo_world/l/yolo-world.pt.

Or you can manually set the checkpoint path as follows:

python inference_lora.py  \
--pretrained_sdxl_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_checkpoint <path to controlnet-openpose-sdxl-1.0> \
--efficientViT_checkpoint <path to efficientViT-SAM-XL1> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
--lora_path <LoRA path for character 1|LoRA path for character 2> \
--style_lora <Path to style LoRA>

For OMG + InstantID:

python inference_instantid.py  \
--pretrained_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_path <path to InstantID controlnet> \
--face_adapter_path <path to InstantID face adapter> \
--efficientViT_checkpoint <path to efficientViT-SAM-XL1> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
--antelopev2_path <path to antelopev2> \
--style_lora <Path to style LoRA>
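When the checkpoint paths are set programmatically, the commands above can be assembled as an argument list instead of a shell string. The sketch below is illustrative only: `build_command` is a hypothetical helper, the flag names are the ones shown above, and the paths come from the directory tree in the Preparation step.

```python
import shlex
import sys


def build_command(script, options):
    """Turn {"flag_name": value} into [python, script, "--flag_name", value, ...]."""
    command = [sys.executable, script]
    for flag, value in options.items():
        command += [f"--{flag}", str(value)]
    return command


if __name__ == "__main__":
    command = build_command(
        "inference_lora.py",
        {
            "pretrained_sdxl_model": "./checkpoint/stable-diffusion-xl-base-1.0",
            "controlnet_checkpoint": "./checkpoint/controlnet-openpose-sdxl-1.0",
            "lora_path": "./checkpoint/lora/chris-evans.safetensors"
                         "|./checkpoint/lora/TaylorSwiftSDXL.safetensors",
        },
    )
    # Print a shell-safe rendering; the "|" separator is quoted automatically.
    print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` bypasses the shell, so the `|` separator in `--lora_path` reaches the script as a literal character rather than being treated as a pipe.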

💻 Usage

1: OMG + LoRA

The trigger token (<TOK>) for Harry_Potter.safetensors is "Harry Potter", and for Hermione_Granger.safetensors it is "Hermione Granger".

For visual comprehension, you can set --segment_type 'yoloworld' for YoloWorld + EfficientViT SAM, or --segment_type 'GroundingDINO' for GroundingDINO + SAM.

python inference_lora.py \
    --prompt <prompt for the two people> \
    --negative_prompt <negative prompt> \
    --prompt_rewrite "[<prompt for person 1>]-*-[<negative prompt>]|[<prompt for person 2>]-*-[<negative prompt>]" \
    --lora_path "<LoRA path for character 1>|<LoRA path for character 2>"

For example:

python inference_lora.py \
    --prompt "Close-up photo of the happy smiles on the faces of the cool man and beautiful woman as they leave the island with the treasure, sail back to the vacation beach, and begin their love story, 35mm photograph, film, professional, 4k, highly detailed." \
    --negative_prompt 'noisy, blurry, soft, deformed, ugly' \
    --prompt_rewrite '[Close-up photo of the Chris Evans in surprised expressions as he wear Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]|[Close-up photo of the TaylorSwift in surprised expressions as she wear Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]' \
    --lora_path './checkpoint/lora/chris-evans.safetensors|./checkpoint/lora/TaylorSwiftSDXL.safetensors'
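The `--prompt_rewrite` value in the example above follows a fixed pattern: each person contributes `[prompt]-*-[negative prompt]`, and the per-person entries are joined with `|`. A small helper can build that string and avoid hand-editing long quoted arguments. This is a sketch of the format as shown in this README; `build_prompt_rewrite` is a hypothetical name and not a function from the OMG code.

```python
def build_prompt_rewrite(people):
    """Build a --prompt_rewrite value from (prompt, negative_prompt) pairs, one per person."""
    entries = []
    for prompt, negative in people:
        for text in (prompt, negative):
            # The separators are reserved by the format, so reject them inside the text.
            if "|" in text or "-*-" in text:
                raise ValueError(f"text must not contain '|' or '-*-': {text!r}")
        entries.append(f"[{prompt}]-*-[{negative}]")
    return "|".join(entries)


if __name__ == "__main__":
    negative = "noisy, blurry, soft, deformed, ugly"
    print(build_prompt_rewrite([
        ("Close-up photo of the Chris Evans, 35mm photograph", negative),
        ("Close-up photo of the TaylorSwift, 35mm photograph", negative),
    ]))
```

The `--lora_path` value uses the same `|` separator, so `"|".join(paths)` produces it, with the paths in the same order as the people.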

For OMG + LoRA + ControlNet:

python inference_lora.py \
    --prompt "Close-up photo of the happy smiles on the faces of the cool man and beautiful woman as they leave the island with the treasure, sail back to the vacation beach, and begin their love story, 35mm photograph, film, professional, 4k, highly detailed." \
    --negative_prompt 'noisy, blurry, soft, deformed, ugly' \
    --prompt_rewrite '[Close-up photo of the Chris Evans in surprised expressions as he wear Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]|[Close-up photo of the TaylorSwift in surprised expressions as she wear Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]' \
    --lora_path './checkpoint/lora/chris-evans.safetensors|./checkpoint/lora/TaylorSwiftSDXL.safetensors' \
    --spatial_condition './example/pose.png' \
    --controlnet_checkpoint './checkpoint/controlnet-openpose-sdxl-1.0'

For OMG + LoRA + Style:

python inference_lora.py \
    --prompt "Close-up photo of the happy smiles on the faces of the cool man and beautiful woman as they leave the island with the treasure, sail back to the vacation beach, and begin their love story, 35mm photograph, film, professional, 4k, highly detailed, Pencil_Sketch:1.2, messy lines, greyscale, traditional media, sketch." \
    --negative_prompt 'noisy, blurry, soft, deformed, ugly' \
    --prompt_rewrite '[Close-up photo of the Chris Evans in surprised expressions as he wear Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed, Pencil_Sketch:1.2, messy lines, greyscale, traditional media, sketch.]-*-[noisy, blurry, soft, deformed, ugly]|[Close-up photo of the TaylorSwift in surprised expressions as she wear Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed, Pencil_Sketch:1.2, messy lines, greyscale, traditional media, sketch.]-*-[noisy, blurry, soft, deformed, ugly]' \
    --lora_path './checkpoint/lora/chris-evans.safetensors|./checkpoint/lora/TaylorSwiftSDXL.safetensors' \
    --style_lora './checkpoint/style/Anime_Sketch_SDXL.safetensors' 

2: OMG + InstantID

python inference_instantid.py \
    --prompt <prompt for the two people> \
    --negative_prompt <negative prompt> \
    --prompt_rewrite "[<prompt for person 1>]-*-[<negative prompt>]-*-<path to reference image1>|[<prompt for person 2>]-*-[<negative prompt>]-*-<path to reference image2>"

For example:

python inference_instantid.py \
    --prompt 'Close-up photo of the happy smiles on the faces of the cool man and beautiful woman as they leave the island with the treasure, sail back to the vacation beach, and begin their love story, 35mm photograph, film, professional, 4k, highly detailed.' \
    --negative_prompt 'noisy, blurry, soft, deformed, ugly' \
    --prompt_rewrite '[Close-up photo of the a man, 35mm photograph, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]-*-./example/chris-evans.jpg|[Close-up photo of the a woman, 35mm photograph, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]-*-./example/TaylorSwift.png'
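For InstantID, each `--prompt_rewrite` entry carries a third field, the reference image path, after a second `-*-` separator. The sketch below shows how such a value decomposes. It illustrates the format shown above and is not the parser used by inference_instantid.py; the function names are hypothetical.

```python
def _unbracket(field):
    """Remove one pair of surrounding square brackets, if present."""
    field = field.strip()
    if field.startswith("[") and field.endswith("]"):
        return field[1:-1]
    return field


def parse_instantid_rewrite(value):
    """Split an InstantID-style --prompt_rewrite value into one dict per person."""
    people = []
    for entry in value.split("|"):
        fields = entry.split("-*-")
        if len(fields) != 3:
            raise ValueError(
                f"expected prompt, negative prompt and image path, got: {entry!r}"
            )
        prompt, negative, image = fields
        people.append({
            "prompt": _unbracket(prompt),
            "negative_prompt": _unbracket(negative),
            "reference_image": image.strip(),
        })
    return people


if __name__ == "__main__":
    value = (
        "[a man]-*-[blurry]-*-./example/chris-evans.jpg"
        "|[a woman]-*-[blurry]-*-./example/TaylorSwift.png"
    )
    for person in parse_instantid_rewrite(value):
        print(person)
```

Running a check like this before launching generation catches a missing separator or image path early.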

3: Local Gradio demo with OMG + LoRA

If you choose YoloWorld + EfficientViT SAM:

python gradio_demo/app.py --segment_type yoloworld

For GroundingDINO + SAM:

python gradio_demo/app.py --segment_type GroundingDINO

Connect to the public URL displayed after the startup process is completed.

