jack000 / glid-3-xl-stable



Stable Diffusion training

License: MIT License

Python 99.45% Shell 0.55%


glid-3-xl-stable's Issues

how to avoid CUDA out of memory?

All of the training scripts specified in the README give errors like the following:

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

CUDA version: 11.7
GPUs: 8x Tesla V100-SXM2 (16 GB each)

Reducing the batch size (--batch_size 32) didn't help.
Passing --microbatch 1 didn't help either.
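In the log above, reserved memory (10.77 GiB) is only slightly above allocated memory (10.52 GiB), so fragmentation may not be the main problem: the single 4.00 GiB request simply exceeds the 3.86 GiB that was free. If you still want to try the allocator hint the error message mentions, a minimal sketch of setting it from Python follows. The value 128 is an arbitrary starting point, not a recommendation from this repo, and the variable has to be set before PyTorch initializes CUDA.

```python
import os


def set_cuda_alloc_conf(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF for PyTorch's CUDA caching allocator.

    max_split_size_mb stops the allocator from splitting cached blocks larger
    than this many MB, which can reduce fragmentation. It must be set before
    PyTorch initializes CUDA (simplest: before `import torch`).
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# e.g. at the very top of scripts/image_train_stable.py, before importing torch
set_cuda_alloc_conf(128)
```

Exporting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 in the shell before launching the training command has the same effect.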

bert for inpainting?

Why does inpainting training use a BERT encoder?

qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.

Aborted (core dumped)
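The log shows a Qt-based component trying to open a window on a machine with no X display; the report doesn't say which library triggers it. Qt chooses its platform plugin from the QT_QPA_PLATFORM environment variable, and "offscreen" is in the list of available plugins above. A minimal sketch, assuming nothing actually needs to be shown on screen (if the code calls a GUI function such as an image viewer, removing that call is the cleaner fix):

```python
import os

# Select Qt's "offscreen" platform plugin (listed as available in the error)
# instead of "xcb", which needs an X display. Set this before the library
# that uses Qt is imported.
os.environ["QT_QPA_PLATFORM"] = "offscreen"
```

Prefixing the launch command with QT_QPA_PLATFORM=offscreen in the shell does the same thing without editing any script.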

The advantage 👍 of this network arch

It seems more VRAM-efficient than the original LDM/SD.
On a Colab free-tier T4, this can handle a [1,4,104,112] latent (an 832x896 image) without a CUDA OOM,
while the original can only handle [1,4,88,96] (704x768). Both were run under fp16.
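The shapes quoted here are consistent with a KL autoencoder that downsamples by 8 and uses 4 latent channels, as in Stable Diffusion v1 (an assumption about the setup, not stated in the post). A small sketch of that arithmetic, which also shows the larger latent has roughly 38% more elements than the smaller one:

```python
import math


def latent_shape(height, width, batch=1, channels=4, factor=8):
    """Latent shape for an SD-style KL autoencoder (8x downsampling, 4 channels assumed)."""
    if height % factor or width % factor:
        raise ValueError(f"image size must be divisible by {factor}")
    return (batch, channels, height // factor, width // factor)


big = latent_shape(832, 896)    # (1, 4, 104, 112)
small = latent_shape(704, 768)  # (1, 4, 88, 96)
ratio = math.prod(big) / math.prod(small)  # about 1.38
```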

The issues I encountered:
Without retraining, clip_proj is empty, and image_embed apparently must be None (otherwise I get a conv error).
So is it possible to use image_embed without retraining?

The original LDM/SD has 6 other samplers from k-diffusion. You can see my minimal (zero extra dependencies) port of k-diffusion in my notebook:
https://github.com/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_SR_jited.ipynb
(In my port, sigmas_karras and eta (ddim_eta) also work, unlike in all the other k-diffusion copy-pastes.)
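For readers unfamiliar with sigmas_karras: it is the noise-level schedule from Karras et al. (2022) that k-diffusion's get_sigmas_karras produces, spacing sigmas evenly in sigma^(1/rho) and appending a final zero. A dependency-free sketch of the same formula is below; the sigma_min and sigma_max values are the ones commonly quoted for Stable Diffusion v1 and should be treated as assumptions.

```python
def get_sigmas_karras(n, sigma_min, sigma_max, rho=7.0):
    """Karras et al. noise schedule: n sigmas from sigma_max down to sigma_min, then 0."""
    if n < 2:
        raise ValueError("n must be at least 2")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = []
    for i in range(n):
        ramp = i / (n - 1)  # goes 0 -> 1, like torch.linspace(0, 1, n)
        sigmas.append((max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho)
    sigmas.append(0.0)  # final step lands on the clean sample
    return sigmas


# Assumed SD v1 noise range; 10 sampling steps.
sigmas = get_sigmas_karras(10, sigma_min=0.0292, sigma_max=14.6146)
```

A larger rho concentrates more of the steps at low noise levels, which is why the schedule tends to help with fine detail.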

Will this network architecture get more samplers than PLMS and DDIM in the future?

Also, have you tried JIT (torch.jit.trace()) on this network architecture? JIT can help check whether there are any weird Pythonic constructs in the code.
I followed Ailia's instructions in axinc-ai/ailia-models#830 and turned the original LDM/SD into a JIT model (the notebook above is the result); I wonder whether this architecture can also be JIT'd.

Color shifting & slight changes to unmasked area when outpainting

I'm seeing some strange shifts in the unmasked (not supposed to be edited) region of images when outpainting:

I started with the left image (512x512) and extended it to the right with a mask preserving the original image. However, the preserved section still changes, as you can see in the image above. I repeated this to extend further right, and the same thing happened: each time, the image gets a little darker, and on close inspection the fine details look sharper.

The image being passed into do_run() is unchanged from the original (I saved a copy just before inference to be sure).

Any ideas how to fix this?
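One plausible explanation (not confirmed in this thread): even when the latents of the preserved region are kept, the whole image still goes through the KL autoencoder's encode/decode round trip, which is lossy, and repeated outpainting compounds the drift. A common workaround is to paste the original pixels back in pixel space after decoding. A minimal pure-Python sketch of that composite; the nested-list image format is only for illustration:

```python
def composite(original, generated, keep_mask):
    """Blend two RGB images per pixel: mask 1.0 keeps original, 0.0 keeps generated.

    Images are nested lists [row][col] of (r, g, b) values and keep_mask is
    [row][col] of floats in [0, 1]; all three must have the same size.
    """
    if not (len(original) == len(generated) == len(keep_mask)):
        raise ValueError("images and mask must have the same height")
    out = []
    for orig_row, gen_row, mask_row in zip(original, generated, keep_mask):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("images and mask must have the same width")
        out.append([
            tuple(m * o + (1.0 - m) * g for o, g in zip(o_px, g_px))
            for o_px, g_px, m in zip(orig_row, gen_row, mask_row)
        ])
    return out
```

With numpy or torch arrays the same blend is a single expression (mask * original + (1 - mask) * generated), and a slightly blurred (feathered) mask helps hide the seam.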

Does it work with CPU too?

Hi, has anyone ever tried to train on CPU?
I know it will be super slow, but I tried it for the fun of it.

I disabled my GPU by adding this line to image_train_stable.py:
torch.cuda.is_available = lambda: False

Traceback (most recent call last):
  File "scripts\image_train_stable.py", line 157, in <module>
    main()
  File "scripts\image_train_stable.py", line 85, in main
    TrainLoop(
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 194, in run_loop
    self.run_step(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 208, in run_step
    self.forward_backward(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 236, in forward_backward
    losses = compute_losses()
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 96, in training_losses
    return super().training_losses(self._wrap_model(model), *args, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\gaussian_diffusion.py", line 1137, in training_losses
    model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 133, in __call__
    return self.model(x, new_ts, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 880, in forward
    h = module(h, emb, context)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 217, in forward
    x = layer(x)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

Sorry for bothering you with a useless question, but am I doing something wrong?
Thanks

Edit:
Never mind. I removed both .half() calls from image_train_stable.py and deleted --use_fp16 from the training arguments.

This way I was able to train on CPU.

Will you implement textual inversion and DreamBooth ?

Hi, amazing job you're doing!
Do you plan on implementing textual inversion and/or DreamBooth?

Suggestion - Using Hivemind for distributed training

Would it be possible to use Hivemind to distribute the compute?
https://github.com/learning-at-home/hivemind
Also, is there a way to lower VRAM usage?

Colab notebook

Hello! Thanks for this amazing work. Astonished by these beautiful results!

I've created a colab notebook that makes it easier for people to experiment.

Thought you might want to add it to the readme :)

https://colab.research.google.com/drive/1tKUTU7hhPsFlHAYsENRfiB2vy5pNrPh5?usp=sharing

https://colab.research.google.com/drive/177E0DpVK1YOfN5zOelyElerWyYjcuZ7G?usp=sharing

How to use merge.py?

I've been finetuning an SD model and training seems to work OK.

But I'm stuck when merging the files back together for inference.

Which files in logs/ need to be combined to create a usable output model-merged.pt?

When I try python3 merge.py ../sdv14ema.ckpt logs/output015000.pt and use the resulting model-merged.pt for inference, I get this error:

$ python sample.py --model_path model-merged.pt --batch_size 3 --num_batches 3 --text "A Humvee in a ditch"
... lots of output trimmed ...
        Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "callbacks", "lr_schedulers".

I have tried with the ema-0.9999-015000.pt and opts015000.pt files, and neither seems to contain the right keys.

What am I doing wrong?
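The unexpected keys ("epoch", "global_step", "state_dict", ...) are the top-level fields of a PyTorch Lightning-style checkpoint, which suggests model-merged.pt is a full Stable Diffusion-style checkpoint while sample.py's --model_path expects a bare model state dict. Whether sample.py is meant to load merged files at all isn't stated here. If you do want to pull the diffusion UNet weights back out of such a file, a minimal sketch, assuming they use the model.diffusion_model. prefix that appears in a size-mismatch error in another issue below:

```python
def unwrap_state_dict(checkpoint):
    """Return the parameter dict, unwrapping a PyTorch Lightning-style checkpoint.

    Lightning checkpoints keep the weights under "state_dict", next to metadata
    such as "epoch" and "global_step"; anything else is returned unchanged.
    """
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint


def strip_prefix(state_dict, prefix):
    """Keep only keys that start with prefix, with the prefix removed."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}


# Hypothetical usage with torch (not run here):
#   ckpt = torch.load("model-merged.pt", map_location="cpu")
#   unet_sd = strip_prefix(unwrap_state_dict(ckpt), "model.diffusion_model.")
```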

Could you please provide the config of inpaint.pt?

Inpainting sample.py script inpaints entire image

Hi Jack,

Great work with the training scripts. I'm trying to run your pretrained inpainting model with sample.py, but the output just inpaints the entire image rather than the specified region. In your wiki, you show some great examples of inpainting on the face. Do you have a script to produce those results?

Thanks again,

Jeff

Merging Two Models

Hello, thanks for your implementation. I've scoured the internet for a way to merge multiple LDMs into one model, but there doesn't seem to be a viable solution yet. Would it be possible to do so?

For example, you might have one model fine-tuned on scenery and one on animals. Would merging the two be a viable solution, or would some form of transfer learning be a better alternative? I've tried merging the state_dicts of two models, but it expands so many dimensions that I don't think it would work properly during inference.

Any insight would be greatly appreciated!
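Concatenating or expanding tensors changes the architecture, so the result can no longer be loaded into the original model. The usual approach in SD tooling is instead a weighted average of matching weights, which only makes sense when both models share the same architecture and were fine-tuned from the same base checkpoint; whether the average keeps both specialties well enough is something to test rather than assume. A minimal sketch:

```python
def weighted_merge(sd_a, sd_b, alpha=0.5):
    """Weighted average of two state dicts with identical keys and shapes.

    Returns alpha * a + (1 - alpha) * b for every key. Works for plain floats
    or torch tensors (anything supporting * and +).
    """
    if sd_a.keys() != sd_b.keys():
        differing = sorted(sd_a.keys() ^ sd_b.keys())
        raise KeyError(f"state dicts differ in keys, e.g. {differing[:5]}")
    merged = {}
    for key, a in sd_a.items():
        b = sd_b[key]
        shape_a = getattr(a, "shape", None)
        shape_b = getattr(b, "shape", None)
        if shape_a != shape_b:
            raise ValueError(f"shape mismatch for {key}: {shape_a} vs {shape_b}")
        merged[key] = alpha * a + (1.0 - alpha) * b
    return merged
```

alpha = 1.0 reproduces the first model and 0.0 the second; intermediate values trade one off against the other.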

Does this work with CLIP guidance for stable diffusion?

I'm wondering whether this codebase means we can use CLIP guidance for generation instead of the classifier-free guidance in the regular model?
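For reference, the two techniques differ in where the guidance signal comes from. Classifier-free guidance extrapolates from the unconditional toward the conditional noise prediction with guidance scale s. CLIP guidance, as described in the GLIDE paper, instead shifts the sampling mean along the gradient of a CLIP image-text similarity, where f is the CLIP image encoder applied to the noisy sample x_t and g is the text encoder:

```latex
\hat{\epsilon}_\theta(x_t \mid c) = \epsilon_\theta(x_t \mid \varnothing) + s\,\big(\epsilon_\theta(x_t \mid c) - \epsilon_\theta(x_t \mid \varnothing)\big)

\hat{\mu}_\theta(x_t \mid c) = \mu_\theta(x_t \mid c) + s\,\Sigma_\theta(x_t \mid c)\,\nabla_{x_t}\big(f(x_t) \cdot g(c)\big)
```

The second form needs a CLIP model that can score noisy (or denoised-estimate) images and a backward pass through it at every sampling step.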

image_train_stable.py: error: unrecognized arguments: --lr_warmup_steps 10000

When executing train.sh, it outputs the following:

root@centro:/glid-3-xl-stable# ./train.sh
usage: image_train_stable.py [-h] [--data_dir DATA_DIR] [--schedule_sampler SCHEDULE_SAMPLER] [--lr LR] [--weight_decay WEIGHT_DECAY] [--lr_anneal_steps LR_ANNEAL_STEPS] [--batch_size BATCH_SIZE]
                             [--microbatch MICROBATCH] [--ema_rate EMA_RATE] [--log_interval LOG_INTERVAL] [--save_interval SAVE_INTERVAL] [--resume_checkpoint RESUME_CHECKPOINT]
                             [--use_fp16 USE_FP16] [--fp16_scale_growth FP16_SCALE_GROWTH] [--kl_model KL_MODEL] [--actual_image_size ACTUAL_IMAGE_SIZE] [--image_size IMAGE_SIZE]
                             [--num_channels NUM_CHANNELS] [--num_res_blocks NUM_RES_BLOCKS] [--num_heads NUM_HEADS] [--num_heads_upsample NUM_HEADS_UPSAMPLE] [--num_head_channels NUM_HEAD_CHANNELS]
                             [--attention_resolutions ATTENTION_RESOLUTIONS] [--channel_mult CHANNEL_MULT] [--dropout DROPOUT] [--class_cond CLASS_COND] [--use_checkpoint USE_CHECKPOINT]
                             [--use_scale_shift_norm USE_SCALE_SHIFT_NORM] [--resblock_updown RESBLOCK_UPDOWN] [--use_spatial_transformer USE_SPATIAL_TRANSFORMER] [--context_dim CONTEXT_DIM]
                             [--clip_embed_dim CLIP_EMBED_DIM] [--image_condition IMAGE_CONDITION] [--super_res_condition SUPER_RES_CONDITION] [--learn_sigma LEARN_SIGMA]
                             [--diffusion_steps DIFFUSION_STEPS] [--noise_schedule NOISE_SCHEDULE] [--timestep_respacing TIMESTEP_RESPACING] [--use_kl USE_KL] [--predict_xstart PREDICT_XSTART]
                             [--rescale_timesteps RESCALE_TIMESTEPS] [--rescale_learned_sigmas RESCALE_LEARNED_SIGMAS]
image_train_stable.py: error: unrecognized arguments: --lr_warmup_steps 10000
root@centro:/glid-3-xl-stable#

Is this expected? Should I just remove that argument?
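The usage message above lists no --lr_warmup_steps, so the parser in this version of image_train_stable.py doesn't define it; the options are removing the flag from train.sh or adding it to the parser and implementing warmup yourself. A small sketch using argparse's parse_known_args to spot such flags before launching; the known-flag list here is a hand-picked subset of the usage message:

```python
import argparse


def find_unknown_flags(known_flags, argv):
    """Return the tokens in argv that a parser defining known_flags would reject."""
    parser = argparse.ArgumentParser(add_help=False)
    for flag in known_flags:
        parser.add_argument(flag)
    _, unknown = parser.parse_known_args(argv)
    return unknown


# A few flags from the usage message above, plus the one train.sh passes.
known = ["--lr", "--batch_size", "--lr_anneal_steps"]
argv = ["--lr", "1e-5", "--batch_size", "32", "--lr_warmup_steps", "10000"]
print(find_unknown_flags(known, argv))  # ['--lr_warmup_steps', '10000']
```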

Multi-GPU doesn't seem to make much of a difference

Not sure if I'm doing something wrong, but training on a single A100 vs. 8x A100 (mpiexec -N 8) doesn't seem to change the training speed, even though nvidia-smi shows all of the GPUs in use.

When I compare trained models by both of these setups (identical step size) they don't seem to differ.

Is there some configuration option that I missed?
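In upstream guided-diffusion's TrainLoop, --batch_size is the batch per MPI process and the global batch is batch_size times the world size. Assuming this repo keeps that behavior, adding GPUs doesn't make each step faster; it makes each step process more samples. Steps per second can stay flat while samples per second scale with the GPU count, and at an identical step count the 8-GPU run has seen 8x as much data. The per-GPU batch of 4 below is a placeholder:

```python
def samples_seen(steps, per_gpu_batch, world_size):
    """Total training samples processed after `steps` optimizer steps.

    Assumes each MPI rank draws its own batch of `per_gpu_batch` samples,
    so the effective (global) batch is per_gpu_batch * world_size.
    """
    return steps * per_gpu_batch * world_size


single = samples_seen(10_000, per_gpu_batch=4, world_size=1)
multi = samples_seen(10_000, per_gpu_batch=4, world_size=8)
```

When comparing setups, measuring throughput in samples per second (or comparing models at an equal number of samples seen) is the fairer test.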

which dataset did you use for training inpaint?

cannot disable fp16

It seems there is a bug when setting use_fp16 = False. The log is:

Traceback (most recent call last):
  File "scripts/image_train_stable.py", line 150, in <module>
    main()
  File "scripts/image_train_stable.py", line 78, in main
    TrainLoop(
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/train_util.py", line 194, in run_loop
    self.run_step(batch, cond)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/train_util.py", line 208, in run_step
    self.forward_backward(batch, cond)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/train_util.py", line 236, in forward_backward
    losses = compute_losses()
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/respace.py", line 96, in training_losses
    return super().training_losses(self._wrap_model(model), *args, **kwargs)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/gaussian_diffusion.py", line 1137, in training_losses
    model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/respace.py", line 133, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 882, in forward
    h = module(h, emb, context)
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 217, in forward
    x = layer(x, context)
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 188, in forward
    x = block(x, context=context)
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 140, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/nn.py", line 162, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/nn.py", line 174, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 144, in _forward
    x = self.attn2(self.norm2(x), context=context) + x
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 112, in forward
    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
  File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/functional.py", line 327, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: expected scalar type Float but found Half

25+ Stable Diffusion Tutorials And Guides - Very Useful For Stable Diffusion Users - Not An Issue

Hello dear Jack Qiao, I hope you let this thread stay to help newcomers. This is not an issue thread. Thank you.

Expert-Level Tutorials on Stable Diffusion: Master Advanced Techniques and Strategies

Greetings, everyone. I am Dr. Furkan Gözükara. I am an Assistant Professor in the Software Engineering department of a private university (I have a PhD in Computer Engineering). My main professional programming language is, unfortunately, C#, not Python :)

My LinkedIn: https://www.linkedin.com/in/furkangozukara/

Our channel address, if you would like to subscribe: https://www.youtube.com/@SECourses

Our Discord, for more help: https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

I am keeping this list up to date. I have some new video ideas coming up and am trying to find time to make them.

I am open to any criticism you have. I am constantly trying to improve the quality of my tutorial guide videos. Please leave comments with both your suggestions and what you would like to see in future videos.

All videos have manually fixed subtitles and properly prepared video chapters. You can watch with these perfect subtitles or look for the chapters you are interested in.

Since my profession is teaching, I usually do not skip any of the important parts. Therefore, you may find my videos a little bit longer.

Playlist link on YouTube: Stable Diffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img

1.) Automatic1111 Web UI - PC - Free
Easiest Way to Install & Run Stable Diffusion Web UI on PC by Using Open Source Automatic Installer

2.) Automatic1111 Web UI - PC - Free
How to use Stable Diffusion V2.1 and Different Models in the Web UI - SD 1.5 vs 2.1 vs Anything V3

3.) Automatic1111 Web UI - PC - Free
Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed

4.) Automatic1111 Web UI - PC - Free
DreamBooth Got Buffed - 22 January Update - Much Better Success Train Stable Diffusion Models Web UI

5.) Automatic1111 Web UI - PC - Free
How to Inject Your Trained Subject e.g. Your Face Into Any Custom Stable Diffusion Model By Web UI

6.) Automatic1111 Web UI - PC - Free
How To Do Stable Diffusion LORA Training By Using Web UI On Different Models - Tested SD 1.5, SD 2.1

7.) Automatic1111 Web UI - PC - Free
8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI

8.) Automatic1111 Web UI - PC - Free
How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial

9.) Automatic1111 Web UI - PC - Free
How To Generate Stunning Epic Text By Stable Diffusion AI - No Photoshop - For Free - Depth-To-Image

10.) Python Code - Hugging Face Diffusers Script - PC - Free
How to Run and Convert Stable Diffusion Diffusers (.bin Weights) & Dreambooth Models to CKPT File

11.) NMKD Stable Diffusion GUI - Open Source - PC - Free
Forget Photoshop - How To Transform Images With Text Prompts using InstructPix2Pix Model in NMKD GUI

12.) Google Colab Free - Cloud - No PC Is Required
Transform Your Selfie into a Stunning AI Avatar with Stable Diffusion - Better than Lensa for Free

13.) Google Colab Free - Cloud - No PC Is Required
Stable Diffusion Google Colab, Continue, Directory, Transfer, Clone, Custom Models, CKPT SafeTensors

14.) Automatic1111 Web UI - PC - Free
Become A Stable Diffusion Prompt Master By Using DAAM - Attention Heatmap For Each Used Token - Word

15.) Python Script - Gradio Based - ControlNet - PC - Free
Transform Your Sketches into Masterpieces with Stable Diffusion ControlNet AI - How To Use Tutorial

16.) Automatic1111 Web UI - PC - Free
Sketches into Epic Art with 1 Click: A Guide to Stable Diffusion ControlNet in Automatic1111 Web UI

17.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required
Ultimate RunPod Tutorial For Stable Diffusion - Automatic1111 - Data Transfers, Extensions, CivitAI

18.) Automatic1111 Web UI - PC - Free
Fantastic New ControlNet OpenPose Editor Extension & Image Mixing - Stable Diffusion Web UI Tutorial

19.) Automatic1111 Web UI - PC - Free
Automatic1111 Stable Diffusion DreamBooth Guide: Optimal Classification Images Count Comparison Test

20.) Automatic1111 Web UI - PC - Free
Epic Web UI DreamBooth Update - New Best Settings - 10 Stable Diffusion Training Compared on RunPods

21.) Automatic1111 Web UI - PC - Free
New Style Transfer Extension, ControlNet of Automatic1111 Stable Diffusion T2I-Adapter Color Control

22.) Automatic1111 Web UI - RunPod - Paid
How To Install New DreamBooth Extension On RunPod - Automatic1111 Web UI - Stable Diffusion

23.) Automatic1111 Web UI - PC - Free
Generate Text Arts & Fantastic Logos By Using ControlNet Stable Diffusion Web UI For Free Tutorial

24.) Automatic1111 Web UI - PC - Free
To downgrade to the older version if you don't like Torch 2: first delete venv, let it reinstall, then activate venv and run this command: pip install -r "path_of_SD_Extension\requirements.txt"
How To Install New DREAMBOOTH & Torch 2 On Automatic1111 Web UI PC For Epic Performance Gains Guide

25.) Automatic1111 Web UI - PC - Free
Training Midjourney Level Style And Yourself Into The SD 1.5 Model via DreamBooth Stable Diffusion

how to use inpaint.pt with img2img and Strength

It seems that the strength parameter is not available. It would be useful to have one area where inpainting is needed, another area where the model repaints based on the original image according to the Strength parameter, and an area that stays the same as the original image.
This could be a three-color mask: the white area where the model will not paint, the gray area where the model will paint based on the Strength parameter, and the black area where it will inpaint.
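The proposed three-tone mask could be turned into a per-pixel strength map along these lines. This is only a sketch of the idea (per this issue, sample.py has no such option), and the gray level 128 is an arbitrary choice. One way a sampler could use the map is to keep re-imposing the noised original on each pixel until sampling reaches that pixel's starting step, the way a single img2img strength sets one starting step for the whole image.

```python
def mask_to_strength(mask, strength, keep=255, partial=128, inpaint=0):
    """Map a three-tone mask to per-pixel denoising strength (hypothetical scheme).

    keep (white)    -> 0.0       pixel stays as in the original image
    partial (gray)  -> strength  img2img-style partial repaint
    inpaint (black) -> 1.0       fully regenerated
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    lut = {keep: 0.0, partial: strength, inpaint: 1.0}
    out = []
    for row in mask:
        try:
            out.append([lut[px] for px in row])
        except KeyError as err:
            raise ValueError(f"unexpected mask value {err.args[0]}") from None
    return out
```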

Static Noise with sample.py

Hello, I was trying out the out/inpainting method you mentioned in the readme.

After training for about 1000 steps, I tried out sample.py with inpainting.
Input Images:

Output:

I have also read in other issues that at least 10k steps are necessary. Is there something I could be doing wrong?

Kaggle Train with 2 GPUs

Because I'm broke, I use Kaggle for anything to do with AI. When I heard of Stable Diffusion, I used this repo for training. Recently I got text-to-image working, and then image-to-image. Now I'm trying to train on Kaggle's two T4 GPUs, which together have about 30 GB of memory. The current problem is that mpiexec does not recognize that there are 2 GPUs. Also, what are the bare minimum requirements for training?
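In upstream guided-diffusion, whose training code this repo appears to build on, guided_diffusion/dist_util.py pins each MPI rank to a single GPU by setting CUDA_VISIBLE_DEVICES to the rank modulo GPUS_PER_NODE (8). A single process therefore only ever sees one GPU, and using both T4s needs two ranks, e.g. mpiexec -n 2 python scripts/image_train_stable.py followed by the usual arguments. A sketch of that mapping, assuming this repo keeps the upstream logic (worth checking in its own dist_util.py):

```python
GPUS_PER_NODE = 8  # upstream guided-diffusion value; assumed unchanged in this repo


def visible_device_for_rank(rank, gpus_per_node=GPUS_PER_NODE):
    """GPU index a given MPI rank would expose via CUDA_VISIBLE_DEVICES."""
    return str(rank % gpus_per_node)


# With two MPI processes, rank 0 -> GPU "0" and rank 1 -> GPU "1".
devices = [visible_device_for_rank(r) for r in range(2)]
```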

how to make inpaint outpainting work with trained dreambooth models.

It would be great if that were possible.

some questions about image size

Hi Jack,
The resolutions for the official SD v1.4 (v1.3) model are 512 for the KL autoencoder and 64 for the diffusion model, but your README settings are 256 for the KL model and 32 for the diffusion model.

Can the two resolution settings match?
Thanks!
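Both pairs keep the same 8:1 ratio (512/64 and 256/32), so the README settings appear to describe the same autoencoder at a lower training resolution rather than a mismatch. A small sketch that keeps the two flags consistent; the flag names come from the image_train_stable.py usage message in another issue above, and the factor of 8 is an assumption about the KL model:

```python
def size_flags(actual_image_size, factor=8):
    """Build matching --actual_image_size / --image_size flags (latent = pixels / factor)."""
    if actual_image_size % factor:
        raise ValueError(f"actual_image_size must be a multiple of {factor}")
    return [
        "--actual_image_size", str(actual_image_size),
        "--image_size", str(actual_image_size // factor),
    ]
```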

Can't continue to finetune classifier model, and can't use new kl model to finetune main model.

I get this error when trying to continue to finetune a classifier model or when trying to use a finetuned classifier model to finetune the main model.
main()
File "scripts/classifier_train_stable.py", line 47, in main
encoder.load_state_dict(kl_sd, strict=True)
File "/home/thomas/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AutoencoderKL:
Missing key(s) in state_dict: "encoder.conv_in.weight", "encoder.conv_in.bias", "encoder.down.0.block.0.norm1.weight", "encoder.down.0.block.0.norm1.bias", "encoder.down.0.block.0.conv1.weight", "encoder.down.0.block.0.conv1.bias", "encoder.down.0.block.0.norm2.weight", "encoder.down.0.block.0.norm2.bias", "encoder.down.0.block.0.conv2.weight", "encoder.down.0.block.0.conv2.bias", "encoder.down.0.block.1.norm1.weight", "encoder.down.0.block.1.norm1.bias", "encoder.down.0.block.1.conv1.weight", "encoder.down.0.block.1.conv1.bias", "encoder.down.0.block.1.norm2.weight", "encoder.down.0.block.1.norm2.bias", "encoder.down.0.block.1.conv2.weight", "encoder.down.0.block.1.conv2.bias", "encoder.down.0.downsample.conv.weight", "encoder.down.0.downsample.conv.bias", "encoder.down.1.block.0.norm1.weight", "encoder.down.1.block.0.norm1.bias", "encoder.down.1.block.0.conv1.weight", "encoder.down.1.block.0.conv1.bias", "encoder.down.1.block.0.norm2.weight", "encoder.down.1.block.0.norm2.bias", "encoder.down.1.block.0.conv2.weight", "encoder.down.1.block.0.conv2.bias", "encoder.down.1.block.0.nin_shortcut.weight", "encoder.down.1.block.0.nin_shortcut.bias", "encoder.down.1.block.1.norm1.weight", "encoder.down.1.block.1.norm1.bias", "encoder.down.1.block.1.conv1.weight", "encoder.down.1.block.1.conv1.bias", "encoder.down.1.block.1.norm2.weight", "encoder.down.1.block.1.norm2.bias", "encoder.down.1.block.1.conv2.weight", "encoder.down.1.block.1.conv2.bias", "encoder.down.1.downsample.conv.weight", "encoder.down.1.downsample.conv.bias", "encoder.down.2.block.0.norm1.weight", "encoder.down.2.block.0.norm1.bias", "encoder.down.2.block.0.conv1.weight", "encoder.down.2.block.0.conv1.bias", "encoder.down.2.block.0.norm2.weight", "encoder.down.2.block.0.norm2.bias", "encoder.down.2.block.0.conv2.weight", "encoder.down.2.block.0.conv2.bias", "encoder.down.2.block.0.nin_shortcut.weight", "encoder.down.2.block.0.nin_shortcut.bias", "encoder.down.2.block.1.norm1.weight", 
"encoder.down.2.block.1.norm1.bias", "encoder.down.2.block.1.conv1.weight", "encoder.down.2.block.1.conv1.bias", "encoder.down.2.block.1.norm2.weight", "encoder.down.2.block.1.norm2.bias", "encoder.down.2.block.1.conv2.weight", "encoder.down.2.block.1.conv2.bias", "encoder.down.2.downsample.conv.weight", "encoder.down.2.downsample.conv.bias", "encoder.down.3.block.0.norm1.weight", "encoder.down.3.block.0.norm1.bias", "encoder.down.3.block.0.conv1.weight", "encoder.down.3.block.0.conv1.bias", "encoder.down.3.block.0.norm2.weight", "encoder.down.3.block.0.norm2.bias", "encoder.down.3.block.0.conv2.weight", "encoder.down.3.block.0.conv2.bias", "encoder.down.3.block.1.norm1.weight", "encoder.down.3.block.1.norm1.bias", "encoder.down.3.block.1.conv1.weight", "encoder.down.3.block.1.conv1.bias", "encoder.down.3.block.1.norm2.weight", "encoder.down.3.block.1.norm2.bias", "encoder.down.3.block.1.conv2.weight", "encoder.down.3.block.1.conv2.bias", "encoder.mid.block_1.norm1.weight", "encoder.mid.block_1.norm1.bias", "encoder.mid.block_1.conv1.weight", "encoder.mid.block_1.conv1.bias", "encoder.mid.block_1.norm2.weight", "encoder.mid.block_1.norm2.bias", "encoder.mid.block_1.conv2.weight", "encoder.mid.block_1.conv2.bias", "encoder.mid.attn_1.norm.weight", "encoder.mid.attn_1.norm.bias", "encoder.mid.attn_1.q.weight", "encoder.mid.attn_1.q.bias", "encoder.mid.attn_1.k.weight", "encoder.mid.attn_1.k.bias", "encoder.mid.attn_1.v.weight", "encoder.mid.attn_1.v.bias", "encoder.mid.attn_1.proj_out.weight", "encoder.mid.attn_1.proj_out.bias", "encoder.mid.block_2.norm1.weight", "encoder.mid.block_2.norm1.bias", "encoder.mid.block_2.conv1.weight", "encoder.mid.block_2.conv1.bias", "encoder.mid.block_2.norm2.weight", "encoder.mid.block_2.norm2.bias", "encoder.mid.block_2.conv2.weight", "encoder.mid.block_2.conv2.bias", "encoder.norm_out.weight", "encoder.norm_out.bias", "encoder.conv_out.weight", "encoder.conv_out.bias", "decoder.conv_in.weight", "decoder.conv_in.bias", 
"decoder.mid.block_1.norm1.weight", "decoder.mid.block_1.norm1.bias", "decoder.mid.block_1.conv1.weight", "decoder.mid.block_1.conv1.bias", "decoder.mid.block_1.norm2.weight", "decoder.mid.block_1.norm2.bias", "decoder.mid.block_1.conv2.weight", "decoder.mid.block_1.conv2.bias", "decoder.mid.attn_1.norm.weight", "decoder.mid.attn_1.norm.bias", "decoder.mid.attn_1.q.weight", "decoder.mid.attn_1.q.bias", "decoder.mid.attn_1.k.weight", "decoder.mid.attn_1.k.bias", "decoder.mid.attn_1.v.weight", "decoder.mid.attn_1.v.bias", "decoder.mid.attn_1.proj_out.weight", "decoder.mid.attn_1.proj_out.bias", "decoder.mid.block_2.norm1.weight", "decoder.mid.block_2.norm1.bias", "decoder.mid.block_2.conv1.weight", "decoder.mid.block_2.conv1.bias", "decoder.mid.block_2.norm2.weight", "decoder.mid.block_2.norm2.bias", "decoder.mid.block_2.conv2.weight", "decoder.mid.block_2.conv2.bias", "decoder.up.0.block.0.norm1.weight", "decoder.up.0.block.0.norm1.bias", "decoder.up.0.block.0.conv1.weight", "decoder.up.0.block.0.conv1.bias", "decoder.up.0.block.0.norm2.weight", "decoder.up.0.block.0.norm2.bias", "decoder.up.0.block.0.conv2.weight", "decoder.up.0.block.0.conv2.bias", "decoder.up.0.block.0.nin_shortcut.weight", "decoder.up.0.block.0.nin_shortcut.bias", "decoder.up.0.block.1.norm1.weight", "decoder.up.0.block.1.norm1.bias", "decoder.up.0.block.1.conv1.weight", "decoder.up.0.block.1.conv1.bias", "decoder.up.0.block.1.norm2.weight", "decoder.up.0.block.1.norm2.bias", "decoder.up.0.block.1.conv2.weight", "decoder.up.0.block.1.conv2.bias", "decoder.up.0.block.2.norm1.weight", "decoder.up.0.block.2.norm1.bias", "decoder.up.0.block.2.conv1.weight", "decoder.up.0.block.2.conv1.bias", "decoder.up.0.block.2.norm2.weight", "decoder.up.0.block.2.norm2.bias", "decoder.up.0.block.2.conv2.weight", "decoder.up.0.block.2.conv2.bias", "decoder.up.1.block.0.norm1.weight", "decoder.up.1.block.0.norm1.bias", "decoder.up.1.block.0.conv1.weight", "decoder.up.1.block.0.conv1.bias", 
"decoder.up.1.block.0.norm2.weight", "decoder.up.1.block.0.norm2.bias", "decoder.up.1.block.0.conv2.weight", "decoder.up.1.block.0.conv2.bias", "decoder.up.1.block.0.nin_shortcut.weight", "decoder.up.1.block.0.nin_shortcut.bias", "decoder.up.1.block.1.norm1.weight", "decoder.up.1.block.1.norm1.bias", "decoder.up.1.block.1.conv1.weight", "decoder.up.1.block.1.conv1.bias", "decoder.up.1.block.1.norm2.weight", "decoder.up.1.block.1.norm2.bias", "decoder.up.1.block.1.conv2.weight", "decoder.up.1.block.1.conv2.bias", "decoder.up.1.block.2.norm1.weight", "decoder.up.1.block.2.norm1.bias", "decoder.up.1.block.2.conv1.weight", "decoder.up.1.block.2.conv1.bias", "decoder.up.1.block.2.norm2.weight", "decoder.up.1.block.2.norm2.bias", "decoder.up.1.block.2.conv2.weight", "decoder.up.1.block.2.conv2.bias", "decoder.up.1.upsample.conv.weight", "decoder.up.1.upsample.conv.bias", "decoder.up.2.block.0.norm1.weight", "decoder.up.2.block.0.norm1.bias", "decoder.up.2.block.0.conv1.weight", "decoder.up.2.block.0.conv1.bias", "decoder.up.2.block.0.norm2.weight", "decoder.up.2.block.0.norm2.bias", "decoder.up.2.block.0.conv2.weight", "decoder.up.2.block.0.conv2.bias", "decoder.up.2.block.1.norm1.weight", "decoder.up.2.block.1.norm1.bias", "decoder.up.2.block.1.conv1.weight", "decoder.up.2.block.1.conv1.bias", "decoder.up.2.block.1.norm2.weight", "decoder.up.2.block.1.norm2.bias", "decoder.up.2.block.1.conv2.weight", "decoder.up.2.block.1.conv2.bias", "decoder.up.2.block.2.norm1.weight", "decoder.up.2.block.2.norm1.bias", "decoder.up.2.block.2.conv1.weight", "decoder.up.2.block.2.conv1.bias", "decoder.up.2.block.2.norm2.weight", "decoder.up.2.block.2.norm2.bias", "decoder.up.2.block.2.conv2.weight", "decoder.up.2.block.2.conv2.bias", "decoder.up.2.upsample.conv.weight", "decoder.up.2.upsample.conv.bias", "decoder.up.3.block.0.norm1.weight", "decoder.up.3.block.0.norm1.bias", "decoder.up.3.block.0.conv1.weight", "decoder.up.3.block.0.conv1.bias", "decoder.up.3.block.0.norm2.weight", 
"decoder.up.3.block.0.norm2.bias", "decoder.up.3.block.0.conv2.weight", "decoder.up.3.block.0.conv2.bias", "decoder.up.3.block.1.norm1.weight", "decoder.up.3.block.1.norm1.bias", "decoder.up.3.block.1.conv1.weight", "decoder.up.3.block.1.conv1.bias", "decoder.up.3.block.1.norm2.weight", "decoder.up.3.block.1.norm2.bias", "decoder.up.3.block.1.conv2.weight", "decoder.up.3.block.1.conv2.bias", "decoder.up.3.block.2.norm1.weight", "decoder.up.3.block.2.norm1.bias", "decoder.up.3.block.2.conv1.weight", "decoder.up.3.block.2.conv1.bias", "decoder.up.3.block.2.norm2.weight", "decoder.up.3.block.2.norm2.bias", "decoder.up.3.block.2.conv2.weight", "decoder.up.3.block.2.conv2.bias", "decoder.up.3.upsample.conv.weight", "decoder.up.3.upsample.conv.bias", "decoder.norm_out.weight", "decoder.norm_out.bias", "decoder.conv_out.weight", "decoder.conv_out.bias", "quant_conv.weight", "quant_conv.bias", "post_quant_conv.weight", "post_quant_conv.bias".
Unexpected key(s) in state_dict: "state", "param_groups".
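The only unexpected keys, "state" and "param_groups", are exactly the two top-level entries of a torch.optim optimizer's state_dict(), and every AutoencoderKL key is missing. The file being loaded as the KL model therefore looks like an optimizer checkpoint rather than model weights (possibly one of the opt*.pt files a training run writes alongside its model checkpoints). A minimal sketch for checking a file's top-level keys before passing it in:

```python
def describe_checkpoint(sd):
    """Guess what a loaded .pt object holds, from its top-level keys."""
    keys = set(sd)
    if keys == {"state", "param_groups"}:
        return "optimizer state_dict"
    if "state_dict" in keys:
        return "PyTorch Lightning-style checkpoint"
    if any(k.startswith(("encoder.", "decoder.")) for k in keys):
        return "AutoencoderKL-style state_dict"
    return "unknown"


# With torch (not run here):
#   describe_checkpoint(torch.load(path, map_location="cpu"))
```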

Inpainting people

Hey @Jack000, great repo, thanks so much. I want to train Stable Diffusion v2 to optimise for inpainting people into different scenes.

Do you think I can do this with what we have here? Are there any optimisations you would recommend?

size mismatch ... copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3])

Hello, sorry to bother you again. I merged the resulting .pt file (not trained from scratch, but fine-tuned for a few steps from your previous inpainting checkpoint) with the SD 1.4 model, but when I load the merged model into a web UI (AUTOMATIC1111 in this case), I always get the following error:


Loading weights [681cbf52] from /content/stable-diffusion-webui/models/Stable-diffusion/model.ckpt
Global Step: 470000
Traceback (most recent call last):
  File "launch.py", line 143, in <module>
    start_webui()
  File "launch.py", line 139, in start_webui
    import webui
  File "/content/stable-diffusion-webui/webui.py", line 78, in <module>
    shared.sd_model = modules.sd_models.load_model()
  File "/content/stable-diffusion-webui/modules/sd_models.py", line 147, in load_model
    load_model_weights(sd_model, checkpoint_info.filename, checkpoint_info.hash)
  File "/content/stable-diffusion-webui/modules/sd_models.py", line 127, in load_model_weights
    model.load_state_dict(sd, strict=False)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
	size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

I think this has something to do with a difference between the architecture your trainer uses and the one normal fine-tuning uses? I'm not an expert.
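The failing key is the UNet's first convolution. Conv2d weights are laid out as (out_channels, in_channels, kernel_h, kernel_w), so the checkpoint's UNet takes 8 input channels while the model the web UI built takes 4, the standard Stable Diffusion v1 latent. The 4 extra channels presumably carry the inpainting model's conditioning input (the training script's usage message lists an --image_condition flag), so the merged file would need a model config whose UNet has 8 input channels; a standard SD v1.4 config won't load it. A small sketch for spotting such mismatches from parameter shapes before loading:

```python
UNET_IN_KEY = "model.diffusion_model.input_blocks.0.0.weight"


def unet_in_channels(shapes):
    """Input channels of the UNet's first conv, given a {key: shape} mapping."""
    _out_ch, in_ch, _kh, _kw = shapes[UNET_IN_KEY]
    return in_ch


def shape_mismatches(ckpt_shapes, model_shapes):
    """Keys present in both mappings whose shapes differ."""
    return [
        key for key in ckpt_shapes
        if key in model_shapes and tuple(ckpt_shapes[key]) != tuple(model_shapes[key])
    ]


# With torch (not run here), shapes would come from e.g.
#   {k: tuple(v.shape) for k, v in state_dict.items()}
ckpt = {UNET_IN_KEY: (320, 8, 3, 3)}
model = {UNET_IN_KEY: (320, 4, 3, 3)}
```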