Comments (22)
https://file.io/UkvT0KEU31MY
kept them as .py files
from fast-stable-diffusion.
Thank you
from fast-stable-diffusion.
I can confirm it works with the latest commit that uses --mixed_precision="no"
when GPU == A100. Thanks for the quick update!
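A minimal sketch of the kind of check described here: choosing the --mixed_precision value from the GPU name. The helper names `pick_mixed_precision` and `detect_gpu_name` and the substring match on "A100" are assumptions for illustration, not the notebook's actual code.

```python
import subprocess


def pick_mixed_precision(gpu_name: str) -> str:
    """Return the --mixed_precision value to use for a given GPU name.

    Assumption (taken from this thread): fp16 training crashed on the A100,
    while "no" (full precision) worked there.
    """
    return "no" if "A100" in gpu_name.upper() else "fp16"


def detect_gpu_name() -> str:
    """Ask nvidia-smi for the first GPU's name; empty string if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return ""
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else ""


if __name__ == "__main__":
    name = detect_gpu_name()
    print(f'GPU: {name or "not detected"} -> --mixed_precision="{pick_mixed_precision(name)}"')
```

Keeping the decision in a pure function makes it easy to check without a GPU attached.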
from fast-stable-diffusion.
Thanks. Run:
!pip install git+https://github.com/facebookresearch/xformers@51dd119#egg=xformers
After around 40 minutes, once the installation is done, navigate to /usr/local/lib/python3.7/dist-packages/xformers
Save the two files "_C_flashattention.so" and "_C.so", upload them to any host, and send me the link so I can integrate them.
The files might not show up in the Colab file explorer, so you will have to copy them under a different name first:
!cp /usr/local/lib/python3.7/dist-packages/xformers/_C.so /usr/local/lib/python3.7/dist-packages/xformers/C.py
!cp /usr/local/lib/python3.7/dist-packages/xformers/_C_flashattention.so /usr/local/lib/python3.7/dist-packages/xformers/C_flashattention.py
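The same copy step can be sketched in Python, which may be handy if the file names change between builds. The helper name `copy_as_py` is hypothetical; the file names and the rename pattern (_C.so to C.py) are the ones used in the two commands above.

```python
import shutil
from pathlib import Path

# The two compiled extension files named in this comment.
COMPILED_FILES = ("_C.so", "_C_flashattention.so")


def copy_as_py(xformers_dir: Path) -> list:
    """Copy each compiled .so file to a visible .py name, e.g. _C.so -> C.py.

    Returns the list of copies that were created; missing files are skipped.
    """
    created = []
    for name in COMPILED_FILES:
        src = xformers_dir / name
        if not src.is_file():
            continue
        # "_C.so" -> "C.py": drop the leading underscore, swap the suffix.
        dst = xformers_dir / (src.stem.lstrip("_") + ".py")
        shutil.copyfile(src, dst)
        created.append(dst)
    return created
```

In the notebook this would be called as copy_as_py(Path("/usr/local/lib/python3.7/dist-packages/xformers")). The copies are still binary files; the .py suffix only makes them show up in the file explorer.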
from fast-stable-diffusion.
The notebook still doesn't work though. I get this error.
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--num_cpu_threads_per_process` was set to `6` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Downloading: 100% 543/543 [00:00<00:00, 565kB/s]
Fetching 16 files: 0% 0/16 [00:00<?, ?it/s]
Downloading: 100% 342/342 [00:00<00:00, 359kB/s]
Fetching 16 files: 6% 1/16 [00:00<00:11, 1.36it/s]
Downloading: 100% 4.56k/4.56k [00:00<00:00, 3.96MB/s]
Fetching 16 files: 19% 3/16 [00:01<00:05, 2.31it/s]
Downloading: 0% 0.00/1.22G [00:00<?, ?B/s]
Downloading: 0% 4.71M/1.22G [00:00<00:25, 47.1MB/s]
...
Downloading: 100% 1.22G/1.22G [00:16<00:00, 75.7MB/s]
Fetching 16 files: 25% 4/16 [00:17<01:12, 6.01s/it]
Downloading: 100% 209/209 [00:00<00:00, 193kB/s]
Fetching 16 files: 38% 6/16 [00:19<00:30, 3.01s/it]
Downloading: 100% 592/592 [00:00<00:00, 586kB/s]
Fetching 16 files: 44% 7/16 [00:19<00:20, 2.27s/it]
Downloading: 0% 0.00/492M [00:00<?, ?B/s]
Downloading: 1% 4.79M/492M [00:00<00:10, 47.9MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.9MB/s]
Fetching 16 files: 50% 8/16 [00:26<00:29, 3.71s/it]
Downloading: 0% 0.00/525k [00:00<?, ?B/s]
Downloading: 100% 525k/525k [00:00<00:00, 3.57MB/s]
Fetching 16 files: 56% 9/16 [00:27<00:19, 2.81s/it]
Downloading: 100% 472/472 [00:00<00:00, 408kB/s]
Fetching 16 files: 62% 10/16 [00:28<00:12, 2.16s/it]
Downloading: 100% 806/806 [00:00<00:00, 793kB/s]
Fetching 16 files: 69% 11/16 [00:28<00:08, 1.70s/it]
Downloading: 0% 0.00/1.06M [00:00<?, ?B/s]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.60MB/s]
Fetching 16 files: 75% 12/16 [00:29<00:05, 1.44s/it]
Downloading: 100% 743/743 [00:00<00:00, 641kB/s]
Fetching 16 files: 81% 13/16 [00:30<00:03, 1.20s/it]
Downloading: 0% 0.00/3.44G [00:00<?, ?B/s]
...
Downloading: 100% 3.44G/3.44G [00:45<00:00, 75.0MB/s]
Fetching 16 files: 88% 14/16 [01:16<00:29, 14.77s/it]
Downloading: 100% 522/522 [00:00<00:00, 454kB/s]
Fetching 16 files: 94% 15/16 [01:17<00:10, 10.53s/it]
Downloading: 0% 0.00/335M [00:00<?, ?B/s]
Downloading: 1% 4.71M/335M [00:00<00:07, 47.1MB/s]
...
Downloading: 100% 335M/335M [00:04<00:00, 76.1MB/s]
Fetching 16 files: 100% 16/16 [01:21<00:00, 5.12s/it]
Generating class images: 100% 3/3 [00:28<00:00, 9.61s/it]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.61MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.7MB/s]
Steps: 0% 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/drive/MyDrive/AI/DreamBooth/training_data/mike_pics_training_data', '--class_data_dir=/content/data/guy', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/sks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of sks guy', '--class_prompt=photo of a guy', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.
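The warning at the top of this log goes away when the four launcher values are passed explicitly. Below is a minimal sketch that assumes the same defaults the warning reports. The helper `build_launch_command` is hypothetical and the script arguments are placeholders. Note that in the failing command above, --mixed_precision=fp16 is an argument to train_dreambooth.py, while the launcher has its own flag of the same name.

```python
import shlex


def build_launch_command(script, script_args, mixed_precision="no",
                         num_processes=1, num_machines=1, cpu_threads=6):
    """Build an `accelerate launch` command that sets every value the warning lists."""
    return [
        "accelerate", "launch",
        f"--num_processes={num_processes}",
        f"--num_machines={num_machines}",
        f"--mixed_precision={mixed_precision}",
        f"--num_cpu_threads_per_process={cpu_threads}",
        script,        # everything after the script name goes to the script itself
        *script_args,
    ]


cmd = build_launch_command(
    "train_dreambooth.py",
    ["--resolution=512", "--train_batch_size=1", "--max_train_steps=800"],
)
# Print a shell-safe version that can be pasted into a notebook cell after "!".
print(" ".join(shlex.quote(part) for part in cmd))
```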
from fast-stable-diffusion.
Thank you very much for the files, have you accepted the terms in https://huggingface.co/CompVis/stable-diffusion-v1-4 ?
from fast-stable-diffusion.
it looks like you missed the cell downloading the model
from fast-stable-diffusion.
I got an A100 at first too, then found that the cost drained too fast. So I use the menu Runtime -> Factory reset runtime to draw a random GPU until I get a usable one.
from fast-stable-diffusion.
> Thank you very much for the files, have you accepted the terms in https://huggingface.co/CompVis/stable-diffusion-v1-4 ?
Yep
> it looks like you missed the cell downloading the model
Why do you think that? In any case, I just downloaded it again.
I noticed that I copied the precompiled files wrong, but have now fixed them.
BTW the %%capture thing confused me because I didn't see an error.
Here's an update to the error I'm getting:
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('"172.28.0.3","jupyterArgs"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('["--ip=172.28.0.2"],"debugAdapterMultiplexerPath"'), PosixPath('true}'), PosixPath('{"kernelManagerProxyPort"')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
"WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda111.so...
Steps: 0% 2/2000 [00:05<1:14:31, 2.24s/it, loss=0.42, lr=5e-6]
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 606, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 550, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/accelerate/utils/operations.py", line 507, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.7/dist-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 262, in forward
sample = self.mid_block(sample, emb, encoder_hidden_states=encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 375, in forward
hidden_states = attn(hidden_states, encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 167, in forward
hidden_states = block(hidden_states, context=context)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 219, in forward
hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 451, in forward
return self.net(hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
Steps: 0% 2/2000 [00:05<1:36:41, 2.90s/it, loss=0.42, lr=5e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=/content/gdrive/MyDrive/stable-diffusion-v1-4', '--instance_data_dir=/content/data/mikemdb', '--output_dir=/content/models/mikemdb', '--instance_prompt=photo of mikemdb man', '--seed=12345', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.
Any ideas?
from fast-stable-diffusion.
If you're using the A100, I haven't implemented them in the Colab yet. I'll do it shortly.
from fast-stable-diffusion.
Yes, I understand. I just placed the files in the right place manually.
FYI, I think I just got it working by removing --use_8bit_adam \ and --mixed_precision="fp16" \
from fast-stable-diffusion.
How long does training take on other GPUs? It looks like 2000 steps on 512 resolution on an A100 on colab takes 30 mins
from fast-stable-diffusion.
It takes that long because you removed --use_8bit_adam \ and --mixed_precision="fp16".
Make sure they really are the cause of the error you're getting.
from fast-stable-diffusion.
Try leaving --mixed_precision="fp16" \ in.
from fast-stable-diffusion.
I'm saying it only started working when I removed --mixed_precision="fp16" \
from fast-stable-diffusion.
Should I set train_batch_size to the number of training instances I have?
from fast-stable-diffusion.
No. train_batch_size is the per-device batch size, that is, how many images are processed in each training step, not the number of training instances. Best to keep it at 1.
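A rough sketch of how the batch size relates to step counts. It assumes train_batch_size is a per-device batch size (images per step) and that steps per epoch are rounded up, as in the diffusers DreamBooth example script. The function names and the figure of 20 training images are made up for illustration.

```python
import math


def update_steps_per_epoch(num_images, train_batch_size=1, gradient_accumulation_steps=1):
    """Optimizer updates needed for one pass over the training images."""
    batches = math.ceil(num_images / train_batch_size)
    return math.ceil(batches / gradient_accumulation_steps)


def epochs_for(max_train_steps, num_images, train_batch_size=1, gradient_accumulation_steps=1):
    """How many passes over the data a fixed step budget amounts to."""
    per_epoch = update_steps_per_epoch(num_images, train_batch_size, gradient_accumulation_steps)
    return math.ceil(max_train_steps / per_epoch)


# Hypothetical dataset of 20 images with the thread's budget of 2000 steps.
print(epochs_for(2000, num_images=20, train_batch_size=1))  # 20 steps per epoch
print(epochs_for(2000, num_images=20, train_batch_size=4))  # 5 steps per epoch
```

With max_train_steps fixed, a larger batch size means each step sees more images and uses more GPU memory, so the same step budget covers more passes over the data.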
from fast-stable-diffusion.
I'm not sure this issue should have been closed without some changes to the notebooks. I ran into the exact same issue today: I got an A100, and during training it threw the same CUBLAS_STATUS_EXECUTION_FAILED error right as it got to step 2.
I also resolved it by removing --mixed_precision="fp16" \
from fast-stable-diffusion.
@ackl I'll make sure A100 users won't face that issue in the future
from fast-stable-diffusion.
@ackl try setting it to "no" (--mixed_precision="no" \) instead of removing it.
If that works, the change would be easier for me to implement. Looking forward to your feedback.
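Setting the flag rather than deleting it can be sketched as a small rewrite of the argument list. This is only an illustration: the helper `set_mixed_precision` is hypothetical and handles just the --mixed_precision=value form that appears in the logs in this thread.

```python
def set_mixed_precision(args, value="no"):
    """Return a copy of args with --mixed_precision=<...> forced to `value`.

    The flag keeps its position if present and is appended otherwise.
    """
    prefix = "--mixed_precision="
    new_flag = prefix + value
    replaced = False
    result = []
    for arg in args:
        if arg.startswith(prefix):
            result.append(new_flag)
            replaced = True
        else:
            result.append(arg)
    if not replaced:
        result.append(new_flag)
    return result


original = ["--seed=12345", "--resolution=512", "--mixed_precision=fp16", "--train_batch_size=1"]
print(set_mixed_precision(original))
```

Keeping the flag present with an explicit value means the command always states which precision is in use, instead of relying on a default.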
from fast-stable-diffusion.
I have fixed the precision issue for A100s, waiting for your confirmation to close the issue. Make sure you use the updated Colab Notebook
from fast-stable-diffusion.
Thanks for the feedback
from fast-stable-diffusion.