⛹️‍♀️🏀 Stable-Diffusion-Playground ⚽⛹️

License: MIT

An application that generates images or videos using Stable Diffusion models.

Description 📜

What is the term "diffusion"?

From Wikipedia, "Diffusion is the net movement of anything (for example, atoms, ions, molecules, energy) generally from a region of higher concentration to a region of lower concentration."

In the same spirit, diffusion models add noise to an image step by step in the forward pass, which effectively diffuses the pixels. In the backward pass, the noisy image is denoised over the same number of steps. Because generation is a gradual, sequential process, mode collapse (a common problem with GANs) is less likely to occur.
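The forward (noising) pass described above can be sketched in a few lines. This is a minimal illustration, assuming the DDPM-style closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise with a linear beta schedule. The schedule values and all function names are illustrative assumptions, not taken from this repository.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, for illustration)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Sample x_t directly from x_0: scale the signal, then add Gaussian noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_betas(1000))
x0 = [0.5, -0.25, 1.0, 0.0]                    # a toy "image" of four pixel values
x_early = add_noise(x0, 10, alpha_bars, rng)   # early step: mostly signal
x_late = add_noise(x0, 999, alpha_bars, rng)   # last step: almost pure noise
```

The denoising network is trained to undo this corruption one step at a time, which is the backward pass mentioned above.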

Most diffusion models use a UNet architecture, whose output has the same dimensions as its input. Diffusion models usually apply diffusion in pixel space, but Stable Diffusion models apply it in latent space, hence the term "latent diffusion model (LDM)". An encoder and a decoder convert between pixel space and latent space. This approach is more memory efficient than earlier methods and still produces highly detailed images.
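A rough size comparison shows where the memory saving comes from. The sketch below assumes the commonly cited Stable Diffusion v1 setup, in which the autoencoder downsamples each spatial dimension by a factor of 8 and the latent has 4 channels. Those numbers are assumptions about the model, not values read from this repository, and the function names are hypothetical.

```python
def tensor_elements(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels


def latent_shape(height, width, downsample=8, latent_channels=4):
    """Latent tensor shape under the assumed downsampling factor and channel count."""
    return height // downsample, width // downsample, latent_channels


pixel_count = tensor_elements(512, 512, 3)       # RGB image in pixel space
lat_h, lat_w, lat_c = latent_shape(512, 512)
latent_count = tensor_elements(lat_h, lat_w, lat_c)
ratio = pixel_count / latent_count

print(pixel_count, latent_count, ratio)  # 786432 16384 48.0
```

Under these assumptions the UNet works on about 48 times fewer values per image than a pixel-space model would.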

Read through the paper for more details. Big-ups to the researchers/creators for the work and for open-sourcing it.

General Requirements 🧙‍♂️

  • At least 6 GB of VRAM is required to generate a single 512x512 image.
  • For better image generation, use a descriptive and detailed prompt.

Code Requirements 🧙‍♀️

Use Python 3.8.13. Set up a conda environment, clone the repo, and run the commands below:

pip install -r requirements.txt
python setup.py
mkdir models
mkdir pretrained
cd animation_mode
python setup.py
cd ..

How to run 🏃‍♂️

Command line arguments:

| Argument | Required | Default | Choices | Description |
|---|---|---|---|---|
| --mode / -m | True | - | "txt2img", "img2img", "inpaint", "dream", "animate" | Mode of application. |
| --local / -l | False | False | True / False | If the argument is provided, use local model files. Otherwise, download from Hugging Face. |
| --device / -d | False | "cpu" | "cpu", "gpu" | Run on target device. |
| --num / -n | False | 1 | integer number | Number of images to generate. |
| --save / -s | False | False | True / False | If the argument is provided, save generated images. |
| --limit / -limit | False | True | True / False | If the argument is provided, limit memory usage. |
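The arguments above could be declared with argparse roughly as follows. This is a sketch that mirrors the table, not the actual contents of run.py; the function name build_parser and the help strings are assumptions.

```python
import argparse

MODES = ["txt2img", "img2img", "inpaint", "dream", "animate"]


def build_parser():
    """Build a parser that mirrors the argument table (illustrative only)."""
    parser = argparse.ArgumentParser(description="Stable Diffusion playground")
    parser.add_argument("--mode", "-m", required=True, choices=MODES,
                        help="Mode of application.")
    parser.add_argument("--local", "-l", action="store_true",
                        help="Use local model files instead of downloading.")
    parser.add_argument("--device", "-d", default="cpu", choices=["cpu", "gpu"],
                        help="Run on target device.")
    parser.add_argument("--num", "-n", type=int, default=1,
                        help="Number of images to generate.")
    parser.add_argument("--save", "-s", action="store_true",
                        help="Save generated images.")
    # The table lists True as the default for this flag; mirrored here as-is.
    parser.add_argument("--limit", "-limit", action="store_true", default=True,
                        help="Limit memory usage.")
    return parser


args = build_parser().parse_args(["--mode", "txt2img", "--device", "gpu", "--save"])
print(args.mode, args.device, args.num, args.save)  # txt2img gpu 1 True
```

Because --mode is declared as required with a fixed set of choices, a missing or misspelled mode fails at parse time rather than later in the run.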

The application can be run in five different modes:

  • Text to Image (txt2img)
  • Image to Image (img2img)
  • Inpaint (inpaint)
  • Dream (dream)
  • Animate (animate) - sub-modes: 2D, 3D, Video Input

Mode: Text to Image

python run.py --mode txt2img --device gpu --save

Mode: Image to Image

python run.py --mode img2img --device gpu --save

Mode: Inpaint

python run.py --mode inpaint --device gpu --save

Mode: Dream

python run.py --mode dream --device gpu --save --num <number of frames>

Mode: Animate

python run.py --mode animate --device gpu --save

Note:

  • For each mode, run the command and follow the CLI prompts to provide the Hugging Face user token, the prompt, and the size (height, width) of the image.
  • Generated images or videos are saved to the $PWD/images directory. For animate mode, the video is saved to the $PWD/out_video directory.
  • Generating a single 512x512 image takes ~12 seconds on an NVIDIA GeForce RTX 3060 with 6 GB of VRAM.
  • Dream mode generates --num image frames based on the input prompt and combines them into a video.
  • Image to Image mode generates a new image from an initial image and an input prompt. Inpaint mode regenerates the masked part of an image from the initial image, a mask image, and an input prompt. The strength value entered in the CLI sets how much the output may differ from the initial image. It lies in the range [0, 1], where 0 means no change and 1 means a complete change from the original image.
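To make the strength value described above concrete: one common convention (used, for example, by the img2img pipelines in Hugging Face diffusers) is to noise the initial image part of the way and then run only the corresponding fraction of the denoising steps. Whether this application follows exactly that rule is an assumption, and the function name below is hypothetical.

```python
def steps_to_run(num_inference_steps, strength):
    """Number of denoising steps actually executed for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in the range [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


print(steps_to_run(50, 0.0))  # 0: the initial image is returned unchanged
print(steps_to_run(50, 0.8))  # 40
print(steps_to_run(50, 1.0))  # 50: the initial image is fully replaced by noise first
```

Under this convention, lower strength values are also faster, because fewer denoising steps are executed.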

Hugging Face access token:

  • Create an account on huggingface.co. Go to Settings -> Access Tokens and create an access token with read permission.

How to use Animate mode 🖌️

This implementation is an optimized version of DeforumStableDiffusionLocal and Deforum_Stable_Diffusion.ipynb. Thanks to their authors for the work.

Animate mode is quite different from the other modes of the app. Animate mode can generate "2D" or "3D" videos from input prompts. Also, it can perform Video-to-Video conversion of a "Video Input" based on input prompts.

To use this mode, follow the steps below.

Requirements

Clone the repo and run the following commands:

pip install -r requirements.txt
python setup.py
mkdir models
mkdir pretrained
cd animation_mode
python setup.py
cd ..

Next, manually download the models,

Animate mode reads its settings from ./animation_mode/config.py. Specify the video generation settings in this file. Refer to animation_mode/README.md for details on how the parameters in config.py are used.

Run command

python run.py --mode animate --save

The generated video is saved to the ./out_video directory.

Results 📊

Text to Image

python run.py --mode txt2img --device gpu --num 1 --limit --save

Image to Image

python run.py --mode img2img --device gpu --num 1 --limit --save

CLI inputs:

Enter Hugging face user access token: <user access token>

Loading model...

Model loaded successfully

Enter initial image path: flower.png

Enter prompt: beautiful red flower, vibrant, realistic, smooth, bokeh, highly detailed, 4k

Enter strength in [0, 1] range: 0.8

Running Image to Image generation...

Inpaint

python run.py --mode inpaint --device gpu --num 1 --limit --save

CLI inputs:

Enter Hugging face user access token: <user access token>

Loading model...

Model loaded successfully

Enter initial image path: rose.png

Enter mask image path: mask_rose.png

Enter prompt: beautiful blue butterfly on a rose, glossy, detailed, sharp, 4k

Enter strength in [0, 1] range: 0.8

Running Inpaint...
[Images: initial image, mask, inpainted image]
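The role of the mask in the example above can be illustrated with a simple compositing rule: keep the initial image where the mask is 0 and take newly generated content where the mask is 1. This is a conceptual sketch on flat lists of pixel values. The mask polarity and the function name are assumptions, and the application's model may blend in latent space rather than pixel space.

```python
def composite(initial, generated, mask):
    """Blend per pixel: mask 1 takes the generated value, mask 0 keeps the initial one."""
    if not (len(initial) == len(generated) == len(mask)):
        raise ValueError("initial, generated and mask must have the same length")
    return [m * g + (1.0 - m) * i for i, g, m in zip(initial, generated, mask)]


initial = [0.1, 0.2, 0.3]
generated = [0.9, 0.8, 0.7]
mask = [0.0, 1.0, 0.0]          # only the middle pixel is regenerated

print(composite(initial, generated, mask))  # [0.1, 0.8, 0.3]
```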

Dream

python run.py --mode dream --device gpu --num 780 --limit --save

CLI inputs:

Enter Hugging face user access token: <user access token>

Loading model...

Model loaded successfully

Enter prompt: highly detailed bowl of lucrative ramen, stephen bliss, unreal engine, fantasy art by greg rutkowski, loish, rhads and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, alphonse mucha, global illumination, detailed and intricate environment

Enter height and width of image: 512 512

Dreaming...
ramen.mp4
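A back-of-the-envelope estimate helps when choosing --num for dream mode. The sketch below assumes each frame costs about as much as the single 512x512 image timing quoted in the notes (~12 seconds on an RTX 3060) and ignores model loading and video encoding. The function name is hypothetical.

```python
def estimate_generation_time(num_frames, seconds_per_frame=12.0):
    """Return (total seconds, total hours) for generating num_frames frames."""
    total_seconds = num_frames * seconds_per_frame
    return total_seconds, total_seconds / 3600


seconds, hours = estimate_generation_time(780)
print(f"{seconds:.0f} s, about {hours:.1f} h")  # 9360 s, about 2.6 h
```

Actual run time depends on the GPU, the image size, and the number of inference steps.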

Animate

| 2D | 3D |
|---|---|
| TODO | boat_in_storm |

References 📄

Happy Learning! 😄


