
vqgan-clip-app's Introduction

VQGAN-CLIP web app & CLIP guided diffusion web app


Link to repo: tnwei/vqgan-clip-app.

Intro to VQGAN-CLIP

VQGAN-CLIP has been in vogue for generating art using deep learning; searching the r/deepdream subreddit for VQGAN-CLIP yields quite a number of results. In short, VQGAN can generate high-fidelity images, while CLIP can judge how well an image matches a text description. Combined, VQGAN-CLIP takes a text prompt from human input and iteratively refines an image until it fits the prompt.
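
The core loop is gradient descent on the VQGAN latent, with CLIP's image-text similarity as the objective. Below is a minimal sketch of the idea, with illustrative names (vqgan.decode, clip_model.encode_image) standing in for the actual model calls rather than reproducing this repo's code:

import torch

def vqgan_clip_sketch(vqgan, clip_model, text_features, z, steps=300, lr=0.05):
    # z is the VQGAN latent being optimized; text_features is the CLIP
    # embedding of the prompt. Both are assumed to be prepared elsewhere.
    z = z.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = vqgan.decode(z)                          # latent -> image
        image_features = clip_model.encode_image(image)  # CLIP image embedding
        # Maximize cosine similarity between image and prompt embeddings
        loss = -torch.cosine_similarity(image_features, text_features, dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return vqgan.decode(z)

In practice the decoded image is also cut into multiple crops, resized, and normalized before being passed to CLIP, which is what make_cutouts and normalize in this repo's logic.py handle.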

Thanks to the generosity of creators sharing notebooks on Google Colab, the VQGAN-CLIP technique has seen widespread circulation. However, for regular usage across multiple sessions, I prefer a local setup that can be started up rapidly. Hence this simple Streamlit app for generating VQGAN-CLIP images in a local environment. A screenshot of the UI is shown below:

Screenshot of the UI

Be advised that you need a beefy GPU with lots of VRAM to generate images large enough to be interesting (hello Quadro owners!). For reference, an RTX 2060 can barely manage a 300x300 image. Otherwise, you are best served using the notebooks on Colab.

The reference implementation is this Colab notebook, originally by Katherine Crowson. The notebook can also be found in this repo hosted by EleutherAI.

Intro to CLIP guided diffusion

In mid-2021, OpenAI released Diffusion Models Beat GANs on Image Synthesis, with the corresponding source code and model checkpoints released on GitHub. The cadre of people that brought us VQGAN-CLIP worked their magic and shared CLIP guided diffusion notebooks for public use. CLIP guided diffusion uses more GPU VRAM, runs slower, and has fixed output sizes depending on the trained model checkpoint, but is capable of producing more breathtaking images.

Here are a few examples using the prompt "Flowery fragrance intertwined with the freshness of the ocean breeze by Greg Rutkowski", run on the 512x512 HQ Uncond model:

Example output for CLIP guided diffusion

The implementation of CLIP guided diffusion in this repo is based on notebooks from the same EleutherAI/vqgan-clip repo.

Setup

  1. Install the required Python libraries. Using conda, run conda env create -f environment.yml
  2. Git clone this repo. After that, cd into the repo and run:
    • git clone https://github.com/CompVis/taming-transformers (Update to pip install if either of these two PRs is merged)
    • git clone https://github.com/crowsonkb/guided-diffusion (Update to pip install if this PR is merged)
  3. Download the pretrained weights and config files using the links provided in the files listed below. Note that all of the links are commented out by default. Downloading them one at a time is recommended, as some of the downloads can take a while.
    • For VQGAN-CLIP: download-weights.sh. You'll want at least both ImageNet weights, which are used in the reference notebook.
    • For CLIP guided diffusion: download-diffusion-weights.sh.

Usage

  • VQGAN-CLIP: streamlit run app.py launches the web app on localhost:8501 if available
  • CLIP guided diffusion: streamlit run diffusion_app.py launches the web app on localhost:8501 if available
  • Image gallery: python gallery.py launches a gallery viewer on localhost:5000. More on this below.

In the web app, select settings in the sidebar, key in the text prompt, and click Run to generate images using VQGAN-CLIP. When done, the web app will display the output image as well as a video compilation showing the progression of the image generation. You can save them directly through the browser's right-click menu.

A one-time download of additional pretrained weights will occur before generating the first image. This might take a few minutes depending on your internet connection.

If you have multiple GPUs, specify the GPU you want to use by adding -- --gpu X. An extra double dash is required to bypass Streamlit argument parsing. Example commands:

# Use 2nd GPU
streamlit run app.py -- --gpu 1

# Use 3rd GPU
streamlit run diffusion_app.py -- --gpu 2
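
The extra double dash works because Streamlit stops parsing at -- and forwards everything after it to the script's own sys.argv. Below is a minimal, illustrative sketch of how a script could consume the flag; it is not the exact code in app.py:

import argparse
import torch

# Arguments placed after the extra "--" on the streamlit command line end up
# in the script's sys.argv, so a plain argparse parser can pick them up.
parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=int, default=0, help="Index of the GPU to use")
args = parser.parse_args()

device = torch.device(f"cuda:{args.gpu}" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")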

See: tips and tricks

Output and gallery viewer

Each run's metadata and output are saved to the output/ directory, organized into subfolders named using the timestamp when the run was launched plus a unique run ID. Example output directory:

$ tree output
├── 20210920T232927-vTf6Aot6
│   ├── anim.mp4
│   ├── details.json
│   └── output.PNG
└── 20210920T232935-9TJ9YusD
    ├── anim.mp4
    ├── details.json
    └── output.PNG
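
Because each run folder carries a details.json with the run's metadata, previous runs can also be inspected programmatically, outside the gallery viewer. A minimal sketch follows; it makes no assumption about which keys details.json contains and simply dumps whatever was saved:

import json
from pathlib import Path

# Walk output/ and print each run's saved metadata. The keys inside
# details.json depend on the app version, so nothing specific is assumed.
for run_dir in sorted(Path("output").iterdir()):
    details_file = run_dir / "details.json"
    if details_file.is_file():
        details = json.loads(details_file.read_text())
        print(run_dir.name, details)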

The gallery viewer reads from output/ and visualizes previous runs together with saved metadata.

Screenshot of the gallery viewer

If the details are too much, call python gallery.py --kiosk instead to only show the images and their prompts.

More details

vqgan-clip-app's People

Contributors

tnwei


vqgan-clip-app's Issues

gitpython detected but not able to write commit SHA to file

I've finally got this up and running after dealing with a ton of PyTorch issues, but I keep getting the error "gitpython detected but not able to write commit SHA to file". It doesn't seem to matter as far as I can tell; everything gets correctly written to output/. I just thought I'd see if anyone has a fix, or if a fix is even necessary.

Gallery viewer slows down over time

gallery.py currently reindexes all saved outputs before serving each page. The intent is that a simple page refresh is enough for newly saved outputs to show up in the gallery viewer. However, this logic does not scale well with the number of saved outputs; at around 500 saved images, reindexing before serving each page takes about 5 seconds.
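
One possible mitigation, assuming run folders are only ever added and never modified in place, is to cache the index and only parse folders that have not been seen before. The sketch below is a hypothetical illustration, not the current gallery.py code:

from pathlib import Path

# Hypothetical incremental index: remember which run folders have already
# been parsed, and only read details.json for folders that are new.
_index = {}  # run folder name -> raw metadata

def refresh_index(output_dir="output"):
    for run_dir in Path(output_dir).iterdir():
        if run_dir.is_dir() and run_dir.name not in _index:
            details_file = run_dir / "details.json"
            if details_file.is_file():
                _index[run_dir.name] = details_file.read_text()
    return _index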

tensor is not a torch image???

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8503
Network URL: http://192.168.2.43:8503

Using device: None
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from assets/vqgan_imagenet_f16_1024.ckpt
c:\temp\vqgan-clip-app-main\venv\lib\site-packages\torch\functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
2022-03-08 21:40:04.481 Traceback (most recent call last):
File "c:\temp\vqgan-clip-app-main\venv\lib\site-packages\streamlit\script_runner.py", line 430, in _run_script
exec(code, module.__dict__)
File "C:\Temp\vqgan-clip-app-main\app.py", line 641, in <module>
generate_image(
File "C:\Temp\vqgan-clip-app-main\app.py", line 150, in generate_image
_, im = run.iterate()
File "C:/Temp/vqgan-clip-app-main\logic.py", line 323, in iterate
losses = self._ascend_txt()
File "C:/Temp/vqgan-clip-app-main\logic.py", line 278, in _ascend_txt
self.normalize(self.make_cutouts(out))
File "c:\temp\vqgan-clip-app-main\venv\lib\site-packages\torchvision\transforms\transforms.py", line 163, in __call__
return F.normalize(tensor, self.mean, self.std, self.inplace)
File "c:\temp\vqgan-clip-app-main\venv\lib\site-packages\torchvision\transforms\functional.py", line 201, in normalize
raise TypeError('tensor is not a torch image.')
TypeError: tensor is not a torch image.

Replace streamlit progress bar for stqdm progress bar.

Hi there, I recently created a PR (#20) with some small QOL improvements. After that I decided to keep improving the code to make it easier and more comfortable to use, but got stuck trying to replace the Streamlit progress bar (st.progress()) with the stqdm alternative. It's probably something simple to do, but my brain just gave out; I would appreciate it if someone could help me figure out how to do it.

The reason I want to replace the streamlit progress bar with stqdm is that it provides some extra features that will most likely be helpful when trying to figure out how to optimize things for speed vs quality. For example, here is a simple view comparison of how the progress bar of STQDM looks:
Screenshot of the stqdm progress bar

And here is what the native Streamlit progress bar looks like:
Screenshot of the native Streamlit progress bar

As you can probably see, there is a bit more information on the stqdm progress bar: the current progress and how much is still left, the time it has been running, and the time remaining. The thing I find most important, though, is the speed in iterations per second; this alone can help figure out how each option affects the generation process, and which option to change to increase generation speed.
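
For reference, a minimal sketch of what the swap could look like, assuming stqdm is installed; the loop below is a stand-in for the app's actual generation loop, not the real code in app.py:

import time
import streamlit as st
from stqdm import stqdm

num_steps = int(st.sidebar.number_input("Num steps", value=50))

# stqdm wraps an iterable like tqdm does, but renders the bar inside the
# Streamlit app, including elapsed time, ETA and iterations per second.
for step in stqdm(range(num_steps)):
    time.sleep(0.1)  # stand-in for one generation iteration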

Newbie question: AttributeError: module 'clip' has no attribute 'load'

Hello, I did the setup but when I try to use it I am getting an error. I think I messed up the setup at some point.

Local URL: http://localhost:8501
Network URL: http://192.168.1.146:8501

Using device: cuda:0
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from assets/vqgan_imagenet_f16_16384.ckpt
2021-09-27 10:06:08.915 Traceback (most recent call last):
File "c:\users-\anaconda3\lib\site-packages\streamlit\script_runner.py", line 354, in _run_script
exec(code, module.__dict__)
File "C:\Users-\CLIP-Guided-Diffusion\app.py", line 358, in <module>
generate_image(
File "C:\Users-\CLIP-Guided-Diffusion\app.py", line 69, in generate_image
st.session_state["model"], st.session_state["perceptor"] = run.load_model()
File "C:\Users-\CLIP-Guided-Diffusion\logic.py", line 128, in load_model
clip.load(self.args.clip_model, jit=False)[0]
AttributeError: module 'clip' has no attribute 'load'
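
For what it's worth, this error usually means Python is importing a module named clip that is not OpenAI's CLIP (an unrelated clip package exists on PyPI). Here is a quick check of which module is actually being picked up, offered as a diagnostic sketch rather than a confirmed fix:

import clip

# OpenAI's CLIP (installable via pip install git+https://github.com/openai/CLIP.git)
# exposes clip.load; if hasattr() prints False, a different "clip" package is
# shadowing it and should be uninstalled or removed from the path.
print(clip.__file__)
print(hasattr(clip, "load"))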

Requirements

Hello,

I installed everything and took the requirement libs from the EleutherAI/vqgan-clip repo. I can get the imagenet, wiki and coco checkpoints working, but the others will not. Are those libs up to date? I am getting this error:

ModuleNotFoundError: No module named 'taming.modules.misc'

Using -1 on the Num steps field does not save anything on stop.

It seems that when using the value -1 for Num steps in the app, nothing is saved once you hit the stop button. According to the tooltip, it should save the image generated so far as well as the animation, but nothing happens when you stop the app, and anything you have generated so far is lost. Because of this behaviour, the -1 value is effectively unusable.
