joaolages / diffusers-interpret

Diffusers-Interpret 🤗🧨🕵️‍♀️: Model explainability for 🤗 Diffusers. Get explanations for your generated images.

License: MIT License

computer-vision deep-learning diffusers diffusion explainable-ai image-generation interpretability model-explainability pytorch text2image


Diffusers-Interpret 🤗🧨🕵️‍♀️


diffusers-interpret is a model explainability tool built on top of 🤗 Diffusers

Installation

Install directly from PyPI:

pip install --upgrade diffusers-interpret

Usage

Let's see how we can interpret the new 🎨🎨🎨 Stable Diffusion!

  1. Explanations for StableDiffusionPipeline
  2. Explanations for StableDiffusionImg2ImgPipeline
  3. Explanations for StableDiffusionInpaintPipeline

Explanations for StableDiffusionPipeline

Open In Colab

import torch
from diffusers import StableDiffusionPipeline
from diffusers_interpret import StableDiffusionPipelineExplainer

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    use_auth_token=True,
    revision='fp16',
    torch_dtype=torch.float16
).to('cuda')

# optional: reduce memory requirement with a speed trade off 
pipe.enable_attention_slicing()

# pass pipeline to the explainer class
explainer = StableDiffusionPipelineExplainer(pipe)

# generate an image with `explainer`
prompt = "A cute corgi with the Eiffel Tower in the background"
with torch.autocast('cuda'):
    output = explainer(
        prompt, 
        num_inference_steps=15
    )

If you are having GPU memory problems, try reducing n_last_diffusion_steps_to_consider_for_attributions, height, width and/or num_inference_steps.

output = explainer(
    prompt, 
    num_inference_steps=15,
    height=448,
    width=448,
    n_last_diffusion_steps_to_consider_for_attributions=5
)

You can completely deactivate token/pixel attributions computation by passing n_last_diffusion_steps_to_consider_for_attributions=0.
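
For example, a minimal sketch (reusing the explainer and prompt defined above) that only generates the image, with no attribution computation:

with torch.autocast('cuda'):
    output = explainer(
        prompt,
        num_inference_steps=15,
        n_last_diffusion_steps_to_consider_for_attributions=0  # skip token/pixel attributions
    )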

Gradient checkpointing also reduces GPU usage, but makes computations a bit slower:

explainer = StableDiffusionPipelineExplainer(pipe, gradient_checkpointing=True)

To see the final generated image:

output.image

You can also check all the images that the diffusion process generated at the end of each step:

output.all_images_during_generation.show()

To analyse how a token in the input prompt influenced the generation, you can study the token attribution scores:

>>> output.token_attributions # (token, attribution)
[('a', 1063.0526),
 ('cute', 415.62888),
 ('corgi', 6430.694),
 ('with', 1874.0208),
 ('the', 1223.2847),
 ('eiffel', 4756.4556),
 ('tower', 4490.699),
 ('in', 2463.1294),
 ('the', 655.4624),
 ('background', 3997.9395)]

Or their normalized version, in percentages:

>>> output.token_attributions.normalized # (token, attribution_percentage)
[('a', 3.884),
 ('cute', 1.519),
 ('corgi', 23.495),
 ('with', 6.847),
 ('the', 4.469),
 ('eiffel', 17.378),
 ('tower', 16.407),
 ('in', 8.999),
 ('the', 2.395),
 ('background', 14.607)]

Or plot them!

output.token_attributions.plot(normalize=True)

diffusers-interpret can also compute these token/pixel attributions for a particular part of the generated image.

To do that, call explainer with a particular 2D bounding box defined in explanation_2d_bounding_box:

with torch.autocast('cuda'):
    output = explainer(
        prompt, 
        num_inference_steps=15, 
        explanation_2d_bounding_box=((70, 180), (400, 435)), # (upper left corner, bottom right corner)
    )
output.image

The generated image now has a red bounding box to indicate the region of the image that is being explained.

The attributions are now computed only for the area specified in the image.

>>> output.token_attributions.normalized # (token, attribution_percentage)
[('a', 1.891),
 ('cute', 1.344),
 ('corgi', 23.115),
 ('with', 11.995),
 ('the', 7.981),
 ('eiffel', 5.162),
 ('tower', 11.603),
 ('in', 11.99),
 ('the', 1.87),
 ('background', 23.05)]

Explanations for StableDiffusionImg2ImgPipeline

Open In Colab

import torch
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers_interpret import StableDiffusionImg2ImgPipelineExplainer


pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    use_auth_token=True,
).to('cuda')

explainer = StableDiffusionImg2ImgPipelineExplainer(pipe)

prompt = "A fantasy landscape, trending on artstation"

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((448, 448))

with torch.autocast('cuda'):
    output = explainer(
        prompt=prompt, init_image=init_image, strength=0.75
    )

output will have all the properties that were presented for StableDiffusionPipeline. For example, to see the gif version of all the images during generation:

output.all_images_during_generation.gif()

Additionally, you can visualize the pixel attributions of the input image as a saliency map:

output.input_saliency_map.show()

or access their values directly:

>>> output.pixel_attributions
array([[ 1.2714844 ,  4.15625   ,  7.8203125 , ...,  2.7753906 ,
         2.1308594 ,  0.66552734],
       [ 5.5078125 , 11.1953125 ,  4.8125    , ...,  5.6367188 ,
         6.8828125 ,  3.0136719 ],
       ...,
       [ 0.21386719,  1.8867188 ,  2.2109375 , ...,  3.0859375 ,
         2.7421875 ,  0.7871094 ],
       [ 0.85791016,  0.6694336 ,  1.71875   , ...,  3.8496094 ,
         1.4589844 ,  0.5727539 ]], dtype=float32)

or the normalized version:

>>> output.pixel_attributions.normalized 
array([[7.16054201e-05, 2.34065039e-04, 4.40411852e-04, ...,
        1.56300011e-04, 1.20002325e-04, 3.74801020e-05],
       [3.10180156e-04, 6.30479713e-04, 2.71022669e-04, ...,
        3.17439699e-04, 3.87615233e-04, 1.69719147e-04],
       ...,
       [1.20442292e-05, 1.06253210e-04, 1.24512037e-04, ...,
        1.73788882e-04, 1.54430119e-04, 4.43271674e-05],
       [4.83144104e-05, 3.77000870e-05, 9.67938031e-05, ...,
        2.16796136e-04, 8.21647482e-05, 3.22554370e-05]], dtype=float32)

Note: Passing explanation_2d_bounding_box to the explainer will also change these values to explain a specific part of the output image. The attributions are always calculated for the model's input (image and text) with respect to the output image.
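
For example, a minimal sketch (reusing the explainer, prompt and init_image defined above) that restricts the attributions to a region of the generated image:

with torch.autocast('cuda'):
    output = explainer(
        prompt=prompt,
        init_image=init_image,
        strength=0.75,
        explanation_2d_bounding_box=((70, 180), (400, 435))  # (upper left corner, bottom right corner)
    )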

Explanations for StableDiffusionInpaintPipeline

Open In Colab

Same as StableDiffusionImg2ImgPipeline, but now we also pass a mask_image argument to the explainer.

import torch
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionInpaintPipeline
from diffusers_interpret import StableDiffusionInpaintPipelineExplainer


def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")


pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    use_auth_token=True,
).to('cuda')

explainer = StableDiffusionInpaintPipelineExplainer(pipe)

prompt = "a cat sitting on a bench"

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((448, 448))
mask_image = download_image(mask_url).resize((448, 448))

with torch.autocast('cuda'):
    output = explainer(
        prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75
    )

output will have all the properties that were presented for StableDiffusionImg2ImgPipeline and StableDiffusionPipeline.
For example, to see the gif version of all the images during generation:

output.all_images_during_generation.gif()

The only difference in the output is that we can now see the masked part of the image:

output.input_saliency_map.show()

Check other functionalities and more implementation examples here.

Future Development

  • Add interactive display of all the images that were generated in the diffusion process
  • Add explainer for StableDiffusionImg2ImgPipeline
  • Add explainer for StableDiffusionInpaintPipeline
  • Add attentions visualization
  • Add unit tests
  • Website for documentation
  • Do not require another generation every time the explanation_2d_bounding_box argument is changed
  • Add interactive bounding-box and token attributions visualization
  • Add more explainability methods

Contributing

Feel free to open an Issue or create a Pull Request and let's get started 🚀

Credits

A special thanks to:

diffusers-interpret's People

Contributors

andrewizbatista, joaolages, tompham97


diffusers-interpret's Issues

Website for documentation

What would you like your documentation website to look like? I would like to help create and/or maintain it.
I have some experience with GitHub Pages sites built with Jekyll. If you want something similar, would you mind creating a gh-pages branch and choosing a Jekyll theme?

cannot import name '_png' from 'matplotlib'

I received the error cannot import name '_png' from 'matplotlib' when trying to run the new plot command in Google Colab.

It seems Colab ships an older version of matplotlib - 3.2.2.
After forcing an upgrade of matplotlib (pip install -U matplotlib) and restarting the runtime, I could get the plot to work correctly.

Plot is helpful!

diffusers=0.3.0 no longer loads models correctly

Hi, I added this as a comment in a previous issue, but I think it is worth raising as a separate ticket. The point there was that it would be nice to update the diffusers package to a newer version; nowadays I am unable even to run the Colab demo due to this error.

I wanted to follow up because the demo Colab is not working at all. When downloading the StableDiffusion pipeline I get the error

TypeError: getattr(): attribute name must be string

I found a question about this error on Stack Overflow: https://stackoverflow.com/questions/74687769/typeerror-getattr-attribute-name-must-be-string-in-pytorch-diffusers-how

When I force version 0.24, I get an error when importing diffusers_interpret:
ImportError: cannot import name 'preprocess_mask' from 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint' (/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py)

So I tried to reinstall it with pip, but that downgrades diffusers back to 0.3.0.

So I can either create a pipe for Stable Diffusion (which requires diffusers > 0.4) or use diffusers_interpret (which requires 0.3).
The two do not work together in the demo Colab, meaning I cannot reproduce your output.

Update diffusers-interpret to work with the latest diffusers package (0.8.0)

When using diffusers-interpret with the latest diffusers (0.8.0; yes, I need this version because I use the Euler discrete scheduler), it gives the following error:

ImportError: cannot import name 'preprocess_mask' from 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint' (/usr/local/lib/python3.7/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py)

Can it be fixed to work with it? 0.3.0 is now very outdated.
Thanks!

StableDiffusionPipelineExplainer enable_attention_slicing() and limit token attribution

Version 0.3.0 of 🤗 Diffusers introduces enable_attention_slicing, and I wonder if there's a way to implement this in the explainer. Below is the code that I used; it ran out of CUDA memory:

# Import pipeline
import torch
from diffusers import StableDiffusionPipeline

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token = True,
    revision = "fp16" if torch_device != "cpu" else None,
    torch_dtype = torch.float16 if torch_device != "cpu" else None)

pipe.to(torch_device)

pipe.enable_attention_slicing() # attention optimization for less memory usage

# Pass pipeline to the explainer class

from diffusers_interpret import StableDiffusionPipelineExplainer

explainer = StableDiffusionPipelineExplainer(pipe)

prompt = "photograph, piggy, corn salad"

with torch.autocast(torch_device):
    output = explainer(prompt,
                       guidance_scale=7.5,
                       num_inference_steps=17)

output.image

The Colab Notebook cannot be opened

Notebook loading error
There was an error loading this notebook. Ensure that the file is accessible and try again.

Invalid Credentials

Probably the notebook is not set to public.

TypeError

When I try the following code snippet from your notebook, I get a TypeError:

prompt = "A cute corgi with the Eiffel Tower in the background"

generator = torch.Generator(device).manual_seed(2022)
with torch.autocast('cuda') if device == 'cuda' else nullcontext():
    output = explainer(
        prompt, 
        num_inference_steps=15, 
        generator=generator
    )

TypeError: '>' not supported between instances of 'NoneType' and 'int'
