Comments (4)
The problem is likely your diffusers version; change it to an older release like 0.25.0 and try again. Hope it helps.
from sd_dreambooth_extension.
> The problem should be diffusers version, change it to an older version like 0.25.0 and try again, hope it helps
Sorry for the delayed reply.
So here is the thing: I have used online platforms like Google Colab, Modal Labs, SageMaker Studio Lab, and Lightning AI Studio to run Stable Diffusion Dreambooth on a T4 GPU, and it failed on all of them. I've tried many methods to solve this issue, but none of them worked.
During this whole journey of fixing the error, I noticed that on PyTorch 2.1.2 with CUDA 11.8, bitsandbytes fails with "CUDA setup failed despite GPU being available".
So I changed to PyTorch 2.3.1 with CUDA 12.1. I know the T4 architecture does not support CUDA 12.1, but it did run the SD 1.5 model without xformers, and after changing diffusers to version 0.25.0 as you suggested, I think that fixed the "diffusion_pytorch_model.bin not found" error. There were still errors left to fix, though, like "RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false." Since I was using CUDA 12.1, I reverted to PyTorch 2.1.2 with CUDA 12.1, and that came back with the same error: "RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false."
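A side note on that is_sm80/is_sm90 error: it is about GPU compute capability, not the CUDA toolkit version. A Tesla T4 is Turing, sm_75, so any kernel gated on Ampere (sm_80) or Hopper (sm_90) will refuse to run no matter which CUDA build of PyTorch is installed. A minimal sketch to confirm this on any machine (the `is_pre_ampere` helper is illustrative, not part of any of these libraries):

```python
def is_pre_ampere(capability) -> bool:
    """True for GPUs older than Ampere (sm_80); a Tesla T4 is sm_75, i.e. (7, 5)."""
    return tuple(capability) < (8, 0)

# The live check needs torch plus a visible GPU; guarded so the sketch
# also runs on CPU-only machines.
try:
    import torch
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(torch.cuda.get_device_name(0), f"sm_{cap[0]}{cap[1]}")
        if is_pre_ampere(cap):
            print("Kernels gated on is_sm80/is_sm90 cannot run on this GPU.")
except ImportError:
    print("torch not installed; capability check skipped")
```

On a T4 this prints `sm_75`, which is why switching CUDA builds alone never makes the error go away.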
I clearly remember that in the past, whenever I ran the command nvidia-smi it displayed CUDA version 11.8, but when I ran the same command today I found this:
Sat Jul 13 14:53:00 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                      Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   32C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
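One caveat about the table above: the "CUDA Version: 12.2" that nvidia-smi reports is the newest CUDA runtime the installed *driver* supports, not a toolkit that is installed or required. A PyTorch wheel built against an equal or older runtime (cu121, cu118) loads fine under a 12.2 driver, so there is no need to hunt for a cu122 build. A small sketch of that compatibility rule (`wheel_compatible` is an illustrative helper, not a real API):

```python
def wheel_compatible(wheel_cuda, driver_cuda) -> bool:
    """A PyTorch cuXY wheel runs as long as the CUDA runtime it was built
    against is <= the newest runtime the driver supports (nvidia-smi header)."""
    return tuple(wheel_cuda) <= tuple(driver_cuda)

# cu121 wheel under the 12.2 driver shown above: fine.
print(wheel_compatible((12, 1), (12, 2)))   # True
# A hypothetical wheel built for a newer runtime than the driver would not be.
print(wheel_compatible((12, 4), (12, 2)))   # False

# What the installed wheel was actually built against (guarded: torch may be absent):
try:
    import torch
    print("torch", torch.__version__, "built against CUDA", torch.version.cuda)
except ImportError:
    pass
```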
It says version 12.2, so I immediately thought of changing PyTorch, but I did not find a PyTorch release built for CUDA 12.2. After searching a bit, I found this install command on Stack Overflow:
"conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch-nightly -c nvidia"
So I ran:
!conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch-nightly -c nvidia
%cd /teamspace/studios/this_studio/content/bitsandbytes
!CUDA_VERSION=122 make cuda11x
!python setup.py install
and installed some modules.
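As a hypothetical follow-up check (these commands are a suggestion, not from the thread): after building bitsandbytes from source, recent releases ship a self-diagnostic entry point that reports which compiled CUDA binary it loaded and whether it can see the GPU, which is the quickest way to catch the earlier "CUDA setup failed despite GPU being available" failure mode:

```shell
# bitsandbytes self-check: prints the CUDA setup it detected.
python -m bitsandbytes

# Cross-check against the CUDA runtime PyTorch itself was built with:
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```

If the two report different CUDA versions, the bitsandbytes binary was built against the wrong runtime and needs rebuilding.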
And now I'm able to train the SD 1.5 model through the Dreambooth extension (I have not checked SDXL, though).
But yes, it's running now, although I'm unable to generate any image; this is the new error I'm facing, as you can see:
"Reusing loaded model quix/quix_3624.safetensors [72c71d57e5] to load Quix/Quix_4900.safetensors
Calculating sha256 for /teamspace/studios/this_studio/content/stable-diffusion-webui/models/Stable-diffusion/Quix/Quix_4900.safetensors: 469ad5607d16d9ef3863899c0b6acd494521739c906a8468627cc0dcd1a3c00d
Loading weights [469ad5607d] from /teamspace/studios/this_studio/content/stable-diffusion-webui/models/Stable-diffusion/Quix/Quix_4900.safetensors
Creating model from config: /teamspace/studios/this_studio/content/stable-diffusion-webui/models/Stable-diffusion/Quix/Quix_4900.yaml
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Applying attention optimization: xformers... done.
Model loaded in 1.0s (create model: 0.3s, apply weights to model: 0.6s).
/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/safe.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return unsafe_torch_load(filename, *args, **kwargs)
0%| | 0/20 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: ('task(hpx0bhumqee4qwb)', <gradio.routes.Request object at 0x7f2655834f40>, 'masterpiece, cowboy shot, from below, woman, aqua marine eyes, hood, looking at viewer, red hair, bangs, horns, solo, sweat, thigh strap, (wet:0.8), breasts, black dress, china dress, short dress, pelvic curtain, blue sky, outdoors, palm tree, sky,', '(worst quality, low quality:1.4), monochrome, zombie, (interlocked fingers:1.2), disfigured, cleavage cutout,', [], 1, 1, 7, 768, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'DPM++ 2M', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/txt2img.py", line 109, in txt2img
processed = processing.process_images(p)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/processing.py", line 845, in process_images
res = process_images_inner(p)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/processing.py", line 981, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/processing.py", line 1328, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 218, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_samplers_common.py", line 272, in launch_sampling
return func()
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 218, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_samplers_cfg_denoiser.py", line 237, in forward
x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_hijack_utils.py", line 18, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_hijack_utils.py", line 32, in call
return self.__orig_func(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1335, in forward
out = self.diffusion_model(x, t, context=cc)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_unet.py", line 91, in UNetModel_forward
return original_forward(self, x, timesteps, context, *args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 797, in forward
h = module(h, emb, context)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
x = layer(x, context)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
x = block(x, context=context[i])
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 269, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 121, in checkpoint
return CheckpointFunction.apply(func, len(inputs), *args)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 136, in forward
output_tensors = ctx.run_function(*ctx.input_tensors)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 272, in _forward
x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/teamspace/studios/this_studio/content/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 496, in xformers_attention_forward
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=get_xformers_flash_attention_op(q, k, v))
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
return _memory_efficient_attention(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 321, in _memory_efficient_attention
return _memory_efficient_attention_forward(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 337, in _memory_efficient_attention_forward
op = _dispatch_fw(inp, False)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
return _run_priority_list(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(2, 9216, 8, 40) (torch.float32)
     key         : shape=(2, 9216, 8, 40) (torch.float32)
     value       : shape=(2, 9216, 8, 40) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
    operator wasn't built - see `python -m xformers.info` for more info
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
    Only work on pre-MLIR triton for now
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support"
from sd_dreambooth_extension.
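For anyone hitting the same NotImplementedError: the log lists two independent reasons each backend was rejected, namely that the installed xformers wheel was built without CUDA kernels ("xFormers wasn't build with CUDA support") and that the model is running in float32 while the flash/cutlass paths only accept fp16/bf16. A sketch of the usual remedy, assuming a cu121 PyTorch install (adjust the index URL to match your torch build):

```shell
# 1. See which ops the current xformers wheel was actually built with:
python -m xformers.info

# 2. Reinstall an xformers wheel matching the installed torch/CUDA build
#    (the cu121 index URL assumes torch was installed for CUDA 12.1):
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
```

Alternatively, dropping `--xformers` and launching the webui with `--opt-sdp-attention` sidesteps xformers entirely by using PyTorch's built-in scaled-dot-product attention.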
Thanks for the help. I solved all the issues, so I'm closing this thread.
from sd_dreambooth_extension.
I was having the same issue. I changed the following in the Dreambooth extension's requirements.txt file and restarted to resolve it:
diffusers==0.25.0
from sd_dreambooth_extension.
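For reference, the same pin can also be applied directly in the webui's Python environment rather than through the extension's requirements.txt; a sketch:

```shell
# Downgrade diffusers to the version the Dreambooth extension expects,
# then confirm what actually got installed:
pip install diffusers==0.25.0
python -c "import diffusers; print(diffusers.__version__)"
```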