jack000 / glid-3-xl-stable
stable diffusion training
License: MIT License
All of the training scripts specified in the README give errors like the following:
RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
CUDA Version: 11.7
Using 8x Tesla V100-SXM2 (16 GB memory each).
Reducing --batch_size 32 didn't help.
Passing --microbatch 1 didn't help.
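The error message itself points at max_split_size_mb. A minimal sketch of how that allocator option can be set from Python is below. The value 128 is only an illustrative assumption, not something recommended by this repo. Note also that in the log above reserved memory (10.77 GiB) is barely larger than allocated memory (10.52 GiB), so fragmentation may not be the cause here and this setting may not help.

```python
import os

# The allocator reads this variable when CUDA memory is first used, so it is
# simplest to set it before `import torch` in the training script.
# 128 is an illustrative value (assumption), not a tuned recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

The same thing can be done in the shell by exporting PYTORCH_CUDA_ALLOC_CONF before launching the training command.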
Why does inpaint training use a BERT encoder?
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.
Aborted (core dumped)
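This message usually shows up when Qt cannot reach an X display, for example on a headless server or over SSH without X forwarding. A possible workaround, assuming no on-screen window is actually needed, is to select the "offscreen" plugin that the log lists as available. Which library pulls in Qt here is not stated in the report, so treat this as a sketch rather than a confirmed fix.

```python
import os

# "offscreen" is one of the platform plugins listed in the error output above.
# Set this before importing whichever library loads Qt.
os.environ["QT_QPA_PLATFORM"] = "offscreen"

print(os.environ["QT_QPA_PLATFORM"])
```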
Seems more VRAM efficient than the original LDM/SD. On a Colab free-tier T4, this can work with a [1,4,104,112] latent (832x896 image) without CUDA OOM, while the original can only work with [1,4,88,96] (704x768). Both under fp16.
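The latent shapes and image sizes quoted above differ by a factor of 8 in each spatial dimension. A small helper makes that conversion explicit; the function and constant names are made up for illustration.

```python
# Spatial factor between latent and image, as implied by the numbers above
# (104 -> 832, 112 -> 896). The name is hypothetical.
LATENT_TO_PIXEL = 8


def latent_to_image_size(latent_shape):
    """Map a [batch, channels, h, w] latent shape to the (h, w) image size."""
    _, _, h, w = latent_shape
    return (h * LATENT_TO_PIXEL, w * LATENT_TO_PIXEL)


print(latent_to_image_size([1, 4, 104, 112]))
print(latent_to_image_size([1, 4, 88, 96]))
```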
The issues I encountered are:
Without re-training, clip_proj is empty, and image_embed seemingly must be None (otherwise there is some conv error). So is it possible to use image_embed without re-training?
Orig LDM/SD has 6 other samplers from k-diffusion. You can see the minimal (zero extra dependency) ripoff of k-diffusion on my notebook:
https://github.com/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_SR_jited.ipynb
(My ripoff also gets sigmas_karras and eta (ddim_eta) working, unlike all other k-diffusion copypastas.)
Will this network arch get more samplers than plms&ddim in the future?
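For readers unfamiliar with the sigmas_karras mentioned above: it refers to the noise schedule from Karras et al. (2022), which interpolates between the largest and smallest noise level in sigma^(1/rho) space. The sketch below is a dependency-free version of that formula; it is not taken from the linked notebook or from this repo, and the sigma values in the usage line are illustrative assumptions.

```python
def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min (Karras schedule)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = []
    for i in range(n):
        ramp = i / (n - 1)  # goes from 0.0 to 1.0
        sigmas.append((max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho)
    return sigmas


# Illustrative values only (assumption), not this model's actual sigma range.
schedule = karras_sigmas(10, sigma_min=0.03, sigma_max=14.6)
print(schedule)
```

Implementations such as k-diffusion additionally append a final sigma of 0 to the list; that step is left out here.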
Also, did you try JIT (torch.jit.trace()) on this network arch? JIT can help check whether there are some weird pythonic things in the code.
I followed Ailia's instructions (axinc-ai/ailia-models#830) and turned the original LDM/SD into JIT (that is the notebook above). I wonder if this arch can also be JIT'd.
Seeing some strange shifts to the unmasked (not supposed to be edited) region of images when outpainting:
I started with the left image (512x512), and extended to the right with a mask preserving the original image. However, the preserved section is changed, as you can see in the image above. I repeated this to extend further right, and the same thing happened. Each time the image seems to get a little darker, and on close inspection, the fine details seem sharper.
The image being passed into do_run() is unchanged from the original (I saved a copy just before inference to be sure).
Any ideas how to fix this?
Hi, has anyone ever tried to train on CPU?
I know it will be super slow, but I tried it for the fun of it.
I disabled my GPU by adding this line to image_train_stable.py:
torch.cuda.is_available = lambda : False
Traceback (most recent call last):
File "scripts\image_train_stable.py", line 157, in <module>
main()
File "scripts\image_train_stable.py", line 85, in main
TrainLoop(
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 194, in run_loop
self.run_step(batch, cond)
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 208, in run_step
self.forward_backward(batch, cond)
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 236, in forward_backward
losses = compute_losses()
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 96, in training_losses
return super().training_losses(self._wrap_model(model), *args, **kwargs)
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\gaussian_diffusion.py", line 1137, in training_losses
model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 133, in __call__
return self.model(x, new_ts, **kwargs)
File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 880, in forward
h = module(h, emb, context)
File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 217, in forward
x = layer(x)
File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
Sorry for bothering you with a useless question, but am I doing something wrong?
Thanks.
Edit: never mind. I removed both .half() calls from image_train_stable.py and deleted --use_fp16 from the training arguments. This way I was able to train on CPU.
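The second half of that fix (dropping --use_fp16 from the arguments) can be sketched as a small helper that filters an argument list before it is handed to the training script. The helper name is hypothetical, and it assumes the flag is followed by a value, as the script's usage text suggests; the .half() calls still have to be removed by hand.

```python
def strip_flag(argv, flag="--use_fp16"):
    """Return a copy of argv without `flag` and the value that follows it."""
    cleaned = []
    skip_next = False
    for token in argv:
        if skip_next:
            skip_next = False  # this token is the flag's value; drop it
            continue
        if token == flag:
            skip_next = True
            continue
        if token.startswith(flag + "="):
            continue  # handles the --use_fp16=True spelling
        cleaned.append(token)
    return cleaned


# Hypothetical argument list for illustration.
args = ["--batch_size", "1", "--use_fp16", "True", "--lr", "1e-5"]
print(strip_flag(args))
```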
Hi, amazing job you're doing!
Do you plan on implementing textual inversion and/or DreamBooth?
Could it be possible to use Hivemind to distribute the compute?
https://github.com/learning-at-home/hivemind
Also, is there a way to lower the VRAM usage?
Hello! Thanks for this amazing work. Astonished by these beautiful results!
I've created a colab notebook that makes it easier for people to experiment.
Thought you might want to add it to the readme :)
https://colab.research.google.com/drive/1tKUTU7hhPsFlHAYsENRfiB2vy5pNrPh5?usp=sharing
https://colab.research.google.com/drive/177E0DpVK1YOfN5zOelyElerWyYjcuZ7G?usp=sharing
I've been finetuning an SD model and training seems to work OK.
But I'm stuck when merging the files back together for inference.
Which files in logs/ need to be combined to create a usable output model-merged.pt?
When I try python3 merge.py ../sdv14ema.ckpt logs/output015000.pt and use the resulting model-merged.pt for inference, I get this error:
$ python sample.py --model_path model-merged.pt --batch_size 3 --num_batches 3 --text "A Humvee in a ditch"
... lots of output trimmed ...
Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "callbacks", "lr_schedulers".
I have tried with the ema-0.9999-015000.pt and opts015000.pt files, and neither seems to contain the right keys.
What am I doing wrong?
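The unexpected keys in that error ("epoch", "global_step", "pytorch-lightning_version", "state_dict", ...) are the top-level keys of a PyTorch Lightning checkpoint, which suggests the file being loaded still has the weights nested under "state_dict" rather than at the top level. The sketch below shows how such a wrapper can be detected and unwrapped on a plain dict. It says nothing about what merge.py expects as input or in which order, and the function name is made up.

```python
def unwrap_state_dict(checkpoint):
    """Return the nested weights if `checkpoint` looks like a Lightning file."""
    nested = checkpoint.get("state_dict")
    if isinstance(nested, dict):
        return nested
    return checkpoint  # already a flat name -> tensor mapping


# Stand-in dicts for illustration; real checkpoints map names to tensors.
lightning_style = {"epoch": 0, "global_step": 470000, "state_dict": {"w": 1}}
flat_style = {"w": 1}

print(unwrap_state_dict(lightning_style))
print(unwrap_state_dict(flat_style))
```

Even after unwrapping, the parameter names inside may not match what sample.py expects, so this is a diagnostic step rather than a fix.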
Hi Jack,
Great work with the training scripts. I'm trying to run your pretrained inpainting model with sample.py, but the output just inpaints the entire image rather than the specified region. In your wiki, you show some great examples of inpainting on the face. Do you have a script to produce those results?
Thanks again,
Jeff
Hello, thanks for your implementation. I've scoured the internet for merging multiple LDMs into one model, but there doesn't seem to be a viable solution as of yet. Would it be possible to do so?
For example, you might have one model finetuned on scenery, and one on animals. Would merging the two be a viable solution, or would some form of transfer learning be a better alternative? I've tried merging state_dict between two models, but it expands so many dimensions that I don't think it would work properly during inference.
Any insight would be greatly appreciated!
Wondering if this code base means we can use CLIP guidance for generation instead of the classifier free guidance in the regular model?
When executing train.sh, it outputs the following:
root@centro:/glid-3-xl-stable# ./train.sh
usage: image_train_stable.py [-h] [--data_dir DATA_DIR] [--schedule_sampler SCHEDULE_SAMPLER] [--lr LR] [--weight_decay WEIGHT_DECAY] [--lr_anneal_steps LR_ANNEAL_STEPS] [--batch_size BATCH_SIZE]
[--microbatch MICROBATCH] [--ema_rate EMA_RATE] [--log_interval LOG_INTERVAL] [--save_interval SAVE_INTERVAL] [--resume_checkpoint RESUME_CHECKPOINT]
[--use_fp16 USE_FP16] [--fp16_scale_growth FP16_SCALE_GROWTH] [--kl_model KL_MODEL] [--actual_image_size ACTUAL_IMAGE_SIZE] [--image_size IMAGE_SIZE]
[--num_channels NUM_CHANNELS] [--num_res_blocks NUM_RES_BLOCKS] [--num_heads NUM_HEADS] [--num_heads_upsample NUM_HEADS_UPSAMPLE] [--num_head_channels NUM_HEAD_CHANNELS]
[--attention_resolutions ATTENTION_RESOLUTIONS] [--channel_mult CHANNEL_MULT] [--dropout DROPOUT] [--class_cond CLASS_COND] [--use_checkpoint USE_CHECKPOINT]
[--use_scale_shift_norm USE_SCALE_SHIFT_NORM] [--resblock_updown RESBLOCK_UPDOWN] [--use_spatial_transformer USE_SPATIAL_TRANSFORMER] [--context_dim CONTEXT_DIM]
[--clip_embed_dim CLIP_EMBED_DIM] [--image_condition IMAGE_CONDITION] [--super_res_condition SUPER_RES_CONDITION] [--learn_sigma LEARN_SIGMA]
[--diffusion_steps DIFFUSION_STEPS] [--noise_schedule NOISE_SCHEDULE] [--timestep_respacing TIMESTEP_RESPACING] [--use_kl USE_KL] [--predict_xstart PREDICT_XSTART]
[--rescale_timesteps RESCALE_TIMESTEPS] [--rescale_learned_sigmas RESCALE_LEARNED_SIGMAS]
image_train_stable.py: error: unrecognized arguments: --lr_warmup_steps 10000
root@centro:/glid-3-xl-stable#
Is this expected? Should I just remove that argument?
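The message comes from argparse rejecting a flag the script does not define; the usage text above lists no --lr_warmup_steps option. The sketch below reproduces that situation with a tiny stand-in parser (only two flags, chosen for illustration; it is not the repo's real parser) and shows how parse_known_args separates recognized from unrecognized arguments.

```python
import argparse

# Stand-in parser with two of the flags from the usage text (illustrative).
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=1e-5)
parser.add_argument("--batch_size", type=int, default=1)

argv = ["--lr", "1e-5", "--lr_warmup_steps", "10000", "--batch_size", "4"]

# parse_args(argv) would exit with "unrecognized arguments";
# parse_known_args returns the leftovers instead.
known, unknown = parser.parse_known_args(argv)
print(known)
print(unknown)
```

Since the script offers no such option, deleting the flag from train.sh removes the error; whether warmup was intended to be supported is a question for the author.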
Not sure if I'm doing something wrong, but training on a single A100 vs. 8xA100 with mpiexec -N 8 doesn't seem to change the training speed, even though nvidia-smi shows all of the GPUs in use.
When I compare trained models by both of these setups (identical step size) they don't seem to differ.
Is there some configuration option that I missed?
It seems like there is a bug when setting use_fp16 = False. The log is:
Traceback (most recent call last):
File "scripts/image_train_stable.py", line 150, in <module>
main()
File "scripts/image_train_stable.py", line 78, in main
TrainLoop(
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/train_util.py", line 194, in run_loop
self.run_step(batch, cond)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/train_util.py", line 208, in run_step
self.forward_backward(batch, cond)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/train_util.py", line 236, in forward_backward
losses = compute_losses()
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/respace.py", line 96, in training_losses
return super().training_losses(self._wrap_model(model), *args, **kwargs)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/gaussian_diffusion.py", line 1137, in training_losses
model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/respace.py", line 133, in __call__
return self.model(x, new_ts, **kwargs)
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 882, in forward
h = module(h, emb, context)
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 217, in forward
x = layer(x, context)
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 188, in forward
x = block(x, context=context)
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 140, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/nn.py", line 162, in checkpoint
return CheckpointFunction.apply(func, len(inputs), *args)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/nn.py", line 174, in forward
output_tensors = ctx.run_function(*ctx.input_tensors)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 144, in _forward
x = self.attn2(self.norm2(x), context=context) + x
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yanq/Codes/glid-3-xl-stable/guided_diffusion/unet.py", line 112, in forward
sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
File "/home/yanq/.conda/envs/ldm/lib/python3.8/site-packages/torch/functional.py", line 327, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: expected scalar type Float but found Half
Greetings everyone. I am Dr. Furkan Gözükara. I am an Assistant Professor in Software Engineering department of a private university (have PhD in Computer Engineering). My professional programming skill is unfortunately C# not Python :)
My linkedin : https://www.linkedin.com/in/furkangozukara/
I am keeping this list up-to-date. I got upcoming new awesome video ideas. Trying to find time to do that.
Since my profession is teaching, I usually do not skip any of the important parts. Therefore, you may find my videos a little bit longer.
Playlist link on YouTube: Stable Diffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img
1.) Automatic1111 Web UI - PC - Free
Easiest Way to Install & Run Stable Diffusion Web UI on PC by Using Open Source Automatic Installer
2.) Automatic1111 Web UI - PC - Free
How to use Stable Diffusion V2.1 and Different Models in the Web UI - SD 1.5 vs 2.1 vs Anything V3
3.) Automatic1111 Web UI - PC - Free
Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed
4.) Automatic1111 Web UI - PC - Free
DreamBooth Got Buffed - 22 January Update - Much Better Success Train Stable Diffusion Models Web UI
5.) Automatic1111 Web UI - PC - Free
How to Inject Your Trained Subject e.g. Your Face Into Any Custom Stable Diffusion Model By Web UI
6.) Automatic1111 Web UI - PC - Free
How To Do Stable Diffusion LORA Training By Using Web UI On Different Models - Tested SD 1.5, SD 2.1
7.) Automatic1111 Web UI - PC - Free
8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI
8.) Automatic1111 Web UI - PC - Free
How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial
9.) Automatic1111 Web UI - PC - Free
How To Generate Stunning Epic Text By Stable Diffusion AI - No Photoshop - For Free - Depth-To-Image
10.) Python Code - Hugging Face Diffusers Script - PC - Free
How to Run and Convert Stable Diffusion Diffusers (.bin Weights) & Dreambooth Models to CKPT File
11.) NMKD Stable Diffusion GUI - Open Source - PC - Free
Forget Photoshop - How To Transform Images With Text Prompts using InstructPix2Pix Model in NMKD GUI
12.) Google Colab Free - Cloud - No PC Is Required
Transform Your Selfie into a Stunning AI Avatar with Stable Diffusion - Better than Lensa for Free
13.) Google Colab Free - Cloud - No PC Is Required
Stable Diffusion Google Colab, Continue, Directory, Transfer, Clone, Custom Models, CKPT SafeTensors
14.) Automatic1111 Web UI - PC - Free
Become A Stable Diffusion Prompt Master By Using DAAM - Attention Heatmap For Each Used Token - Word
15.) Python Script - Gradio Based - ControlNet - PC - Free
Transform Your Sketches into Masterpieces with Stable Diffusion ControlNet AI - How To Use Tutorial
16.) Automatic1111 Web UI - PC - Free
Sketches into Epic Art with 1 Click: A Guide to Stable Diffusion ControlNet in Automatic1111 Web UI
17.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required
Ultimate RunPod Tutorial For Stable Diffusion - Automatic1111 - Data Transfers, Extensions, CivitAI
18.) Automatic1111 Web UI - PC - Free
Fantastic New ControlNet OpenPose Editor Extension & Image Mixing - Stable Diffusion Web UI Tutorial
19.) Automatic1111 Web UI - PC - Free
Automatic1111 Stable Diffusion DreamBooth Guide: Optimal Classification Images Count Comparison Test
20.) Automatic1111 Web UI - PC - Free
Epic Web UI DreamBooth Update - New Best Settings - 10 Stable Diffusion Training Compared on RunPods
21.) Automatic1111 Web UI - PC - Free
New Style Transfer Extension, ControlNet of Automatic1111 Stable Diffusion T2I-Adapter Color Control
22.) Automatic1111 Web UI - RunPod - Paid
How To Install New DreamBooth Extension On RunPod - Automatic1111 Web UI - Stable Diffusion
23.) Automatic1111 Web UI - PC - Free
Generate Text Arts & Fantastic Logos By Using ControlNet Stable Diffusion Web UI For Free Tutorial
24.) Automatic1111 Web UI - PC - Free
For downgrade to older version if you don't like Torch 2 : first delete venv, let it reinstall, then activate venv and run this command pip install -r "path_of_SD_Extension\requirements.txt"
How To Install New DREAMBOOTH & Torch 2 On Automatic1111 Web UI PC For Epic Performance Gains Guide
25.) Automatic1111 Web UI - PC - Free
Training Midjourney Level Style And Yourself Into The SD 1.5 Model via DreamBooth Stable Diffusion
It seems that the strength parameter is not available. It would be interesting to have an area where inpainting is needed, an area where the model paints based on the original image via the strength parameter, and an area that stays the same as the original image.
It could be a three-color mask: the white area where the model will not paint, the gray area where the model will paint based on the strength parameter, and the black area where it will inpaint.
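To make the proposed mask semantics concrete, here is a minimal sketch that maps one 8-bit mask value to a per-pixel strength. Everything in it is hypothetical: the repo has no such feature, and the function name and the 0/255 thresholds are assumptions used only to illustrate the request.

```python
def pixel_strength(mask_value, strength):
    """Map an 8-bit mask value to how strongly the model may repaint a pixel.

    white (255) -> 0.0, keep the original pixel
    black (0)   -> 1.0, inpaint freely
    gray        -> the user-supplied strength, paint based on the original
    """
    if mask_value >= 255:
        return 0.0
    if mask_value <= 0:
        return 1.0
    return strength


# One row of a hypothetical three-color mask: white, gray, black.
row = [255, 128, 0]
print([pixel_strength(v, strength=0.6) for v in row])
```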
So because I'm broke, I use Kaggle for anything to do with AI. When I heard of Stable Diffusion, I used this repo for training. Recently I got text-to-image working, and then image-to-image. But I'm trying to get their T4 GPUs to train, which combined have a total of 30 GB of memory. The current problem is that mpiexec is not recognizing that there are 2 GPUs. Also, what are the bare minimum requirements for training?
How can inpainting/outpainting be made to work with trained DreamBooth models?
It would be great if that were possible.
Hi Jack,
Resolutions for SD official V1.4 (V1.3) model are 512 for kl and 64 for diffusion model but your Readme settings are 256 for kl, and 32 for diffusion model.
Can the two resolution settings match?
Thanks!
I get this error when trying to continue to finetune a classifier model or when trying to use a finetuned classifier model to finetune the main model.
main()
File "scripts/classifier_train_stable.py", line 47, in main
encoder.load_state_dict(kl_sd, strict=True)
File "/home/thomas/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AutoencoderKL:
Missing key(s) in state_dict: "encoder.conv_in.weight", "encoder.conv_in.bias", "encoder.down.0.block.0.norm1.weight", "encoder.down.0.block.0.norm1.bias", "encoder.down.0.block.0.conv1.weight", "encoder.down.0.block.0.conv1.bias", "encoder.down.0.block.0.norm2.weight", "encoder.down.0.block.0.norm2.bias", "encoder.down.0.block.0.conv2.weight", "encoder.down.0.block.0.conv2.bias", "encoder.down.0.block.1.norm1.weight", "encoder.down.0.block.1.norm1.bias", "encoder.down.0.block.1.conv1.weight", "encoder.down.0.block.1.conv1.bias", "encoder.down.0.block.1.norm2.weight", "encoder.down.0.block.1.norm2.bias", "encoder.down.0.block.1.conv2.weight", "encoder.down.0.block.1.conv2.bias", "encoder.down.0.downsample.conv.weight", "encoder.down.0.downsample.conv.bias", "encoder.down.1.block.0.norm1.weight", "encoder.down.1.block.0.norm1.bias", "encoder.down.1.block.0.conv1.weight", "encoder.down.1.block.0.conv1.bias", "encoder.down.1.block.0.norm2.weight", "encoder.down.1.block.0.norm2.bias", "encoder.down.1.block.0.conv2.weight", "encoder.down.1.block.0.conv2.bias", "encoder.down.1.block.0.nin_shortcut.weight", "encoder.down.1.block.0.nin_shortcut.bias", "encoder.down.1.block.1.norm1.weight", "encoder.down.1.block.1.norm1.bias", "encoder.down.1.block.1.conv1.weight", "encoder.down.1.block.1.conv1.bias", "encoder.down.1.block.1.norm2.weight", "encoder.down.1.block.1.norm2.bias", "encoder.down.1.block.1.conv2.weight", "encoder.down.1.block.1.conv2.bias", "encoder.down.1.downsample.conv.weight", "encoder.down.1.downsample.conv.bias", "encoder.down.2.block.0.norm1.weight", "encoder.down.2.block.0.norm1.bias", "encoder.down.2.block.0.conv1.weight", "encoder.down.2.block.0.conv1.bias", "encoder.down.2.block.0.norm2.weight", "encoder.down.2.block.0.norm2.bias", "encoder.down.2.block.0.conv2.weight", "encoder.down.2.block.0.conv2.bias", "encoder.down.2.block.0.nin_shortcut.weight", "encoder.down.2.block.0.nin_shortcut.bias", "encoder.down.2.block.1.norm1.weight", 
"encoder.down.2.block.1.norm1.bias", "encoder.down.2.block.1.conv1.weight", "encoder.down.2.block.1.conv1.bias", "encoder.down.2.block.1.norm2.weight", "encoder.down.2.block.1.norm2.bias", "encoder.down.2.block.1.conv2.weight", "encoder.down.2.block.1.conv2.bias", "encoder.down.2.downsample.conv.weight", "encoder.down.2.downsample.conv.bias", "encoder.down.3.block.0.norm1.weight", "encoder.down.3.block.0.norm1.bias", "encoder.down.3.block.0.conv1.weight", "encoder.down.3.block.0.conv1.bias", "encoder.down.3.block.0.norm2.weight", "encoder.down.3.block.0.norm2.bias", "encoder.down.3.block.0.conv2.weight", "encoder.down.3.block.0.conv2.bias", "encoder.down.3.block.1.norm1.weight", "encoder.down.3.block.1.norm1.bias", "encoder.down.3.block.1.conv1.weight", "encoder.down.3.block.1.conv1.bias", "encoder.down.3.block.1.norm2.weight", "encoder.down.3.block.1.norm2.bias", "encoder.down.3.block.1.conv2.weight", "encoder.down.3.block.1.conv2.bias", "encoder.mid.block_1.norm1.weight", "encoder.mid.block_1.norm1.bias", "encoder.mid.block_1.conv1.weight", "encoder.mid.block_1.conv1.bias", "encoder.mid.block_1.norm2.weight", "encoder.mid.block_1.norm2.bias", "encoder.mid.block_1.conv2.weight", "encoder.mid.block_1.conv2.bias", "encoder.mid.attn_1.norm.weight", "encoder.mid.attn_1.norm.bias", "encoder.mid.attn_1.q.weight", "encoder.mid.attn_1.q.bias", "encoder.mid.attn_1.k.weight", "encoder.mid.attn_1.k.bias", "encoder.mid.attn_1.v.weight", "encoder.mid.attn_1.v.bias", "encoder.mid.attn_1.proj_out.weight", "encoder.mid.attn_1.proj_out.bias", "encoder.mid.block_2.norm1.weight", "encoder.mid.block_2.norm1.bias", "encoder.mid.block_2.conv1.weight", "encoder.mid.block_2.conv1.bias", "encoder.mid.block_2.norm2.weight", "encoder.mid.block_2.norm2.bias", "encoder.mid.block_2.conv2.weight", "encoder.mid.block_2.conv2.bias", "encoder.norm_out.weight", "encoder.norm_out.bias", "encoder.conv_out.weight", "encoder.conv_out.bias", "decoder.conv_in.weight", "decoder.conv_in.bias", 
"decoder.mid.block_1.norm1.weight", "decoder.mid.block_1.norm1.bias", "decoder.mid.block_1.conv1.weight", "decoder.mid.block_1.conv1.bias", "decoder.mid.block_1.norm2.weight", "decoder.mid.block_1.norm2.bias", "decoder.mid.block_1.conv2.weight", "decoder.mid.block_1.conv2.bias", "decoder.mid.attn_1.norm.weight", "decoder.mid.attn_1.norm.bias", "decoder.mid.attn_1.q.weight", "decoder.mid.attn_1.q.bias", "decoder.mid.attn_1.k.weight", "decoder.mid.attn_1.k.bias", "decoder.mid.attn_1.v.weight", "decoder.mid.attn_1.v.bias", "decoder.mid.attn_1.proj_out.weight", "decoder.mid.attn_1.proj_out.bias", "decoder.mid.block_2.norm1.weight", "decoder.mid.block_2.norm1.bias", "decoder.mid.block_2.conv1.weight", "decoder.mid.block_2.conv1.bias", "decoder.mid.block_2.norm2.weight", "decoder.mid.block_2.norm2.bias", "decoder.mid.block_2.conv2.weight", "decoder.mid.block_2.conv2.bias", "decoder.up.0.block.0.norm1.weight", "decoder.up.0.block.0.norm1.bias", "decoder.up.0.block.0.conv1.weight", "decoder.up.0.block.0.conv1.bias", "decoder.up.0.block.0.norm2.weight", "decoder.up.0.block.0.norm2.bias", "decoder.up.0.block.0.conv2.weight", "decoder.up.0.block.0.conv2.bias", "decoder.up.0.block.0.nin_shortcut.weight", "decoder.up.0.block.0.nin_shortcut.bias", "decoder.up.0.block.1.norm1.weight", "decoder.up.0.block.1.norm1.bias", "decoder.up.0.block.1.conv1.weight", "decoder.up.0.block.1.conv1.bias", "decoder.up.0.block.1.norm2.weight", "decoder.up.0.block.1.norm2.bias", "decoder.up.0.block.1.conv2.weight", "decoder.up.0.block.1.conv2.bias", "decoder.up.0.block.2.norm1.weight", "decoder.up.0.block.2.norm1.bias", "decoder.up.0.block.2.conv1.weight", "decoder.up.0.block.2.conv1.bias", "decoder.up.0.block.2.norm2.weight", "decoder.up.0.block.2.norm2.bias", "decoder.up.0.block.2.conv2.weight", "decoder.up.0.block.2.conv2.bias", "decoder.up.1.block.0.norm1.weight", "decoder.up.1.block.0.norm1.bias", "decoder.up.1.block.0.conv1.weight", "decoder.up.1.block.0.conv1.bias", 
"decoder.up.1.block.0.norm2.weight", "decoder.up.1.block.0.norm2.bias", "decoder.up.1.block.0.conv2.weight", "decoder.up.1.block.0.conv2.bias", "decoder.up.1.block.0.nin_shortcut.weight", "decoder.up.1.block.0.nin_shortcut.bias", "decoder.up.1.block.1.norm1.weight", "decoder.up.1.block.1.norm1.bias", "decoder.up.1.block.1.conv1.weight", "decoder.up.1.block.1.conv1.bias", "decoder.up.1.block.1.norm2.weight", "decoder.up.1.block.1.norm2.bias", "decoder.up.1.block.1.conv2.weight", "decoder.up.1.block.1.conv2.bias", "decoder.up.1.block.2.norm1.weight", "decoder.up.1.block.2.norm1.bias", "decoder.up.1.block.2.conv1.weight", "decoder.up.1.block.2.conv1.bias", "decoder.up.1.block.2.norm2.weight", "decoder.up.1.block.2.norm2.bias", "decoder.up.1.block.2.conv2.weight", "decoder.up.1.block.2.conv2.bias", "decoder.up.1.upsample.conv.weight", "decoder.up.1.upsample.conv.bias", "decoder.up.2.block.0.norm1.weight", "decoder.up.2.block.0.norm1.bias", "decoder.up.2.block.0.conv1.weight", "decoder.up.2.block.0.conv1.bias", "decoder.up.2.block.0.norm2.weight", "decoder.up.2.block.0.norm2.bias", "decoder.up.2.block.0.conv2.weight", "decoder.up.2.block.0.conv2.bias", "decoder.up.2.block.1.norm1.weight", "decoder.up.2.block.1.norm1.bias", "decoder.up.2.block.1.conv1.weight", "decoder.up.2.block.1.conv1.bias", "decoder.up.2.block.1.norm2.weight", "decoder.up.2.block.1.norm2.bias", "decoder.up.2.block.1.conv2.weight", "decoder.up.2.block.1.conv2.bias", "decoder.up.2.block.2.norm1.weight", "decoder.up.2.block.2.norm1.bias", "decoder.up.2.block.2.conv1.weight", "decoder.up.2.block.2.conv1.bias", "decoder.up.2.block.2.norm2.weight", "decoder.up.2.block.2.norm2.bias", "decoder.up.2.block.2.conv2.weight", "decoder.up.2.block.2.conv2.bias", "decoder.up.2.upsample.conv.weight", "decoder.up.2.upsample.conv.bias", "decoder.up.3.block.0.norm1.weight", "decoder.up.3.block.0.norm1.bias", "decoder.up.3.block.0.conv1.weight", "decoder.up.3.block.0.conv1.bias", "decoder.up.3.block.0.norm2.weight", 
"decoder.up.3.block.0.norm2.bias", "decoder.up.3.block.0.conv2.weight", "decoder.up.3.block.0.conv2.bias", "decoder.up.3.block.1.norm1.weight", "decoder.up.3.block.1.norm1.bias", "decoder.up.3.block.1.conv1.weight", "decoder.up.3.block.1.conv1.bias", "decoder.up.3.block.1.norm2.weight", "decoder.up.3.block.1.norm2.bias", "decoder.up.3.block.1.conv2.weight", "decoder.up.3.block.1.conv2.bias", "decoder.up.3.block.2.norm1.weight", "decoder.up.3.block.2.norm1.bias", "decoder.up.3.block.2.conv1.weight", "decoder.up.3.block.2.conv1.bias", "decoder.up.3.block.2.norm2.weight", "decoder.up.3.block.2.norm2.bias", "decoder.up.3.block.2.conv2.weight", "decoder.up.3.block.2.conv2.bias", "decoder.up.3.upsample.conv.weight", "decoder.up.3.upsample.conv.bias", "decoder.norm_out.weight", "decoder.norm_out.bias", "decoder.conv_out.weight", "decoder.conv_out.bias", "quant_conv.weight", "quant_conv.bias", "post_quant_conv.weight", "post_quant_conv.bias".
Unexpected key(s) in state_dict: "state", "param_groups".
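The two unexpected keys, "state" and "param_groups", are the top-level keys that PyTorch optimizers return from state_dict(). That suggests the file loaded into kl_sd is an optimizer checkpoint rather than autoencoder weights, though the report does not say which file was passed. A quick check on the loaded dict, using a made-up helper name, could look like this:

```python
def looks_like_optimizer_state(state_dict):
    """True if the dict has exactly the keys of an optimizer state_dict."""
    return set(state_dict.keys()) == {"state", "param_groups"}


# Stand-in dicts for illustration; real files hold tensors and hyperparameters.
optimizer_like = {"state": {}, "param_groups": []}
model_like = {"encoder.conv_in.weight": None, "encoder.conv_in.bias": None}

print(looks_like_optimizer_state(optimizer_like))
print(looks_like_optimizer_state(model_like))
```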
Hey @Jack000, great repo, thanks so much. I want to train Stable Diffusion v2 to optimise for inpainting people into different scenes.
Do you think I can do this with what we have here? Are there any optimisations you would recommend?
Hello, sorry if I'm bothering you again but I have tried merging the resultant pt file (not trained from scratch, instead a few steps from your previous pt inpaint checkpoint), with the SD 1.4 model, but when loading the merged model into a webUI, auto1111 in this example, I always get the following error:
Loading weights [681cbf52] from /content/stable-diffusion-webui/models/Stable-diffusion/model.ckpt
Global Step: 470000
Traceback (most recent call last):
File "launch.py", line 143, in <module>
start_webui()
File "launch.py", line 139, in start_webui
import webui
File "/content/stable-diffusion-webui/webui.py", line 78, in <module>
shared.sd_model = modules.sd_models.load_model()
File "/content/stable-diffusion-webui/modules/sd_models.py", line 147, in load_model
load_model_weights(sd_model, checkpoint_info.filename, checkpoint_info.hash)
File "/content/stable-diffusion-webui/modules/sd_models.py", line 127, in load_model_weights
model.load_state_dict(sd, strict=False)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
I think this has something to do with the architecture your trainer and the normal finetuning use? I'm not an expert.
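The size mismatch says the first convolution in the checkpoint takes 8 input channels while the web UI's model expects 4. That would fit an inpainting model that concatenates extra conditioning channels to the latent, but this is an inference from the shapes, not something confirmed in this thread. The sketch below compares two name-to-shape mappings to list every such mismatch; the helper name is hypothetical and the shapes are copied from the error above.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for differing shapes."""
    shared = checkpoint_shapes.keys() & model_shapes.keys()
    return {
        name: (tuple(checkpoint_shapes[name]), tuple(model_shapes[name]))
        for name in shared
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }


key = "model.diffusion_model.input_blocks.0.0.weight"
from_checkpoint = {key: (320, 8, 3, 3)}
from_model = {key: (320, 4, 3, 3)}

print(shape_mismatches(from_checkpoint, from_model))
```

With real checkpoints, the shape mappings can be built from the .shape of each tensor in the two state dicts.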