A100 Support about fast-stable-diffusion HOT 22 CLOSED

TheLastBen commented on May 22, 2024

A100 Support

from fast-stable-diffusion.

Comments (22)

delbalso commented on May 22, 2024 1

https://file.io/UkvT0KEU31MY
I kept them as .py files.

delbalso commented on May 22, 2024 1

Thank you

ackl commented on May 22, 2024 1

I can confirm it works with the latest commit that uses --mixed_precision="no" when GPU == A100. Thanks for the quick update!
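A minimal sketch of the kind of check described here, assuming the GPU name is read from `nvidia-smi` and that `fp16` stays the default on other GPUs. The function names are hypothetical and this is not the notebook's actual code.

```python
import subprocess


def choose_mixed_precision(gpu_name):
    """Return "no" for A100 GPUs and "fp16" otherwise (hypothetical helper)."""
    return "no" if "A100" in gpu_name.upper() else "fp16"


def detect_gpu_name():
    """Ask nvidia-smi for the name of the first GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a GPU runtime, `choose_mixed_precision(detect_gpu_name())` would give the value to pass as `--mixed_precision=...` in the training command.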

TheLastBen commented on May 22, 2024

Thanks,

Run:

!pip install git+https://github.com/facebookresearch/xformers@51dd119#egg=xformers

After around 40 minutes, once the installation is done, navigate to /usr/local/lib/python3.7/dist-packages/xformers

Save the two files "_C_flashattention.so" and "_C.so", upload them to any host, and send me the link so that I can integrate them.

The files might not show up in the Colab file explorer, so you will have to rename them:

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C.so /usr/local/lib/python3.7/dist-packages/xformers/C.py

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C_flashattention.so /usr/local/lib/python3.7/dist-packages/xformers/C_flashattention.py
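The same two copies can be sketched in Python. This assumes the directory and file names given in this comment; the helper name `copy_with_visible_names` is hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: directory and file names are the ones quoted in this comment.
XFORMERS_DIR = Path("/usr/local/lib/python3.7/dist-packages/xformers")
COMPILED_FILES = ("_C.so", "_C_flashattention.so")


def copy_with_visible_names(src_dir, names=COMPILED_FILES):
    """Copy each compiled file to '<name without leading underscore>.py'."""
    src_dir = Path(src_dir)
    copies = []
    for name in names:
        src = src_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"{src} not found; did the build finish?")
        # "_C.so" -> "C.py", "_C_flashattention.so" -> "C_flashattention.py"
        dst = src_dir / (Path(name).stem.lstrip("_") + ".py")
        shutil.copyfile(src, dst)
        copies.append(dst)
    return copies
```

In Colab this would be called as `copy_with_visible_names(XFORMERS_DIR)`. The copies are still compiled binaries; the .py extension only changes the file name.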

delbalso commented on May 22, 2024

The notebook still doesn't work though. I get this error.

The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--num_cpu_threads_per_process was set to 6 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Downloading: 100% 543/543 [00:00<00:00, 565kB/s]
Fetching 16 files: 0% 0/16 [00:00<?, ?it/s]
Downloading: 100% 342/342 [00:00<00:00, 359kB/s]
Fetching 16 files: 6% 1/16 [00:00<00:11, 1.36it/s]
Downloading: 100% 4.56k/4.56k [00:00<00:00, 3.96MB/s]
Fetching 16 files: 19% 3/16 [00:01<00:05, 2.31it/s]
Downloading: 0% 0.00/1.22G [00:00<?, ?B/s]
Downloading: 0% 4.71M/1.22G [00:00<00:25, 47.1MB/s]
...
Downloading: 100% 1.22G/1.22G [00:16<00:00, 75.7MB/s]
Fetching 16 files: 25% 4/16 [00:17<01:12, 6.01s/it]
Downloading: 100% 209/209 [00:00<00:00, 193kB/s]
Fetching 16 files: 38% 6/16 [00:19<00:30, 3.01s/it]
Downloading: 100% 592/592 [00:00<00:00, 586kB/s]
Fetching 16 files: 44% 7/16 [00:19<00:20, 2.27s/it]
Downloading: 0% 0.00/492M [00:00<?, ?B/s]
Downloading: 1% 4.79M/492M [00:00<00:10, 47.9MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.9MB/s]
Fetching 16 files: 50% 8/16 [00:26<00:29, 3.71s/it]
Downloading: 0% 0.00/525k [00:00<?, ?B/s]
Downloading: 100% 525k/525k [00:00<00:00, 3.57MB/s]
Fetching 16 files: 56% 9/16 [00:27<00:19, 2.81s/it]
Downloading: 100% 472/472 [00:00<00:00, 408kB/s]
Fetching 16 files: 62% 10/16 [00:28<00:12, 2.16s/it]
Downloading: 100% 806/806 [00:00<00:00, 793kB/s]
Fetching 16 files: 69% 11/16 [00:28<00:08, 1.70s/it]
Downloading: 0% 0.00/1.06M [00:00<?, ?B/s]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.60MB/s]
Fetching 16 files: 75% 12/16 [00:29<00:05, 1.44s/it]
Downloading: 100% 743/743 [00:00<00:00, 641kB/s]
Fetching 16 files: 81% 13/16 [00:30<00:03, 1.20s/it]
Downloading: 0% 0.00/3.44G [00:00<?, ?B/s]
...
Downloading: 100% 3.44G/3.44G [00:45<00:00, 75.0MB/s]
Fetching 16 files: 88% 14/16 [01:16<00:29, 14.77s/it]
Downloading: 100% 522/522 [00:00<00:00, 454kB/s]
Fetching 16 files: 94% 15/16 [01:17<00:10, 10.53s/it]
Downloading: 0% 0.00/335M [00:00<?, ?B/s]
Downloading: 1% 4.71M/335M [00:00<00:07, 47.1MB/s]
...
Downloading: 100% 335M/335M [00:04<00:00, 76.1MB/s]
Fetching 16 files: 100% 16/16 [01:21<00:00, 5.12s/it]
Generating class images: 100% 3/3 [00:28<00:00, 9.61s/it]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.61MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.7MB/s]
Steps: 0% 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/drive/MyDrive/AI/DreamBooth/training_data/mike_pics_training_data', '--class_data_dir=/content/data/guy', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/sks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of sks guy', '--class_prompt=photo of a guy', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.

TheLastBen commented on May 22, 2024

Thank you very much for the files, have you accepted the terms in https://huggingface.co/CompVis/stable-diffusion-v1-4 ?

TheLastBen commented on May 22, 2024

it looks like you missed the cell downloading the model

liangwei191 commented on May 22, 2024

I also got an A100 at first, then found that it drained my credits too fast. So I use the menu Runtime -> Factory reset runtime to get a different, randomly assigned GPU, and repeat until I get a usable one.

delbalso commented on May 22, 2024

> Thank you very much for the files, have you accepted the terms in https://huggingface.co/CompVis/stable-diffusion-v1-4 ?

Yep

> it looks like you missed the cell downloading the model

Why do you think that? In any case, I just downloaded it again.

I noticed that I copied the precompiled files wrong, but have now fixed them.

BTW the %%capture thing confused me because I didn't see an error.

Here's an update to the error I'm getting:

/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('"172.28.0.3","jupyterArgs"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('["--ip=172.28.0.2"],"debugAdapterMultiplexerPath"'), PosixPath('true}'), PosixPath('{"kernelManagerProxyPort"')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
"WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda111.so...
Steps: 0% 2/2000 [00:05<1:14:31, 2.24s/it, loss=0.42, lr=5e-6] Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 606, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 550, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/accelerate/utils/operations.py", line 507, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.7/dist-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 262, in forward
sample = self.mid_block(sample, emb, encoder_hidden_states=encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 375, in forward
hidden_states = attn(hidden_states, encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 167, in forward
hidden_states = block(hidden_states, context=context)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 219, in forward
hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 451, in forward
return self.net(hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
Steps: 0% 2/2000 [00:05<1:36:41, 2.90s/it, loss=0.42, lr=5e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=/content/gdrive/MyDrive/stable-diffusion-v1-4', '--instance_data_dir=/content/data/mikemdb', '--output_dir=/content/models/mikemdb', '--instance_prompt=photo of mikemdb man', '--seed=12345', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.

Any ideas?

TheLastBen commented on May 22, 2024

If you're using the A100: I haven't integrated the precompiled files into the Colab yet. I'll do it shortly.

delbalso commented on May 22, 2024

Yes, I understand. I just placed the files in the right place manually.

FYI, I think I just got it working by removing

--use_8bit_adam \

and

--mixed_precision="fp16" \
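As a minimal sketch of that change, the two flags can be filtered out of an argument list. The list below is abbreviated from the failing command quoted in the traceback above, and `drop_flags` is a hypothetical helper, not part of the notebook.

```python
def drop_flags(args, names):
    """Return args without any flag whose name (the part before '=') is in names."""
    return [a for a in args if a.split("=", 1)[0] not in names]


# Abbreviated from the failing train_dreambooth.py command in the traceback.
failing_args = [
    "--resolution=512",
    "--mixed_precision=fp16",
    "--train_batch_size=1",
    "--use_8bit_adam",
    "--learning_rate=5e-6",
]

working_args = drop_flags(failing_args, {"--use_8bit_adam", "--mixed_precision"})
print(working_args)
# ['--resolution=512', '--train_batch_size=1', '--learning_rate=5e-6']
```

Matching on the part before `=` removes the flag whatever value it carries, so the same call works for `--mixed_precision="fp16"` and for the bare `--use_8bit_adam` switch.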

delbalso commented on May 22, 2024

How long does training take on other GPUs? It looks like 2000 steps at 512 resolution on an A100 on Colab takes 30 minutes.
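For comparison with the s/it figures that the progress bar prints, the reported total can be converted to a per-step time. This is plain arithmetic on the two numbers in the question, not a new measurement.

```python
steps = 2000   # training steps reported in the question
minutes = 30   # total training time reported in the question

seconds_per_step = minutes * 60 / steps
print(f"{seconds_per_step:.2f} s/step")  # 0.90 s/step
```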

TheLastBen commented on May 22, 2024

That's because you removed --use_8bit_adam \ and --mixed_precision="fp16" \
Make sure they really are the cause of the error you're getting.

TheLastBen commented on May 22, 2024

Try leaving --mixed_precision="fp16" \ in place.

delbalso commented on May 22, 2024

I'm saying it only started working when I removed --mixed_precision="fp16" \

delbalso commented on May 22, 2024

Should I set train_batch_size to the number of training instances I have?

TheLastBen commented on May 22, 2024

That is the number of models it trains on the same instance; it's best to keep it at 1 to save time.

ackl commented on May 22, 2024

I'm not sure this issue should have been closed without some changes to the notebooks. I ran into the exact same issue today: I got an A100, and during training it threw the same CUBLAS_STATUS_EXECUTION_FAILED error right as it got to step 2.

I also resolved it by removing --mixed_precision="fp16" \

TheLastBen commented on May 22, 2024

@ackl I'll make sure A100 users don't face that issue in the future.

TheLastBen commented on May 22, 2024

@ackl Try setting it to "no" (--mixed_precision="no" \) instead of removing it.
If that works, the change will be easier for me to implement. Looking forward to your feedback.

TheLastBen commented on May 22, 2024

I have fixed the precision issue for A100s and am waiting for your confirmation to close the issue. Make sure you use the updated Colab notebook.

TheLastBen commented on May 22, 2024

Thanks for the feedback
