dreambooth-docker's People
Forkers
dezigns333 chewtoys rsh4d0w rasamaya limitlessmatrix slimlime americanpresidentjimmycarter goldfish1974 wyhgood yuvalsade xiankgx oceanswave aswilam king-alex-d-great breefield ironico edwardchow33 macguyversmusic dimasgonzales thewalkingcity cameronfyfe northfoxz gitcryptodevops svupper
dreambooth-docker's Issues
New install Windows 10 (WSL:Ubuntu)
Training Command:
docker run --rm -t --pull always --gpus=all --mount type=bind,source=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned.ckpt,target=/source.ckpt -v=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned:/dest -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\:/train smy20011/dreambooth:v0.1.10 python /diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path=/source.ckpt --dump_path=/dest
docker run --rm -t --pull always --gpus=all -v=D:\StableDiffusion\Marina\512:/instance -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\photo of woman:/class -v=D:\StableDiffusion\Marina\models-new:/output -v=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned:/input_model -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\:/train -e HUGGING_FACE_HUB_TOKEN=123 smy20011/dreambooth:v0.1.10 /start_training /train_dreambooth.py --pretrained_model_name_or_path=/input_model --instance_prompt=photo of marina shishova --instance_data_dir=/instance --class_data_dir=/class --with_prior_preservation --prior_loss_weight=1.0 --class_prompt=photo of woman --max_train_steps=12000 --learning_rate=5e-7 --lr_scheduler=constant --lr_warmup_steps=0 --save_interval=2000 --save_min_steps=4000 --resolution=512 --output_dir=/output --mixed_precision=bf16 --use_8bit_adam
Training output
...
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
PRETRAINED_MODEL_NAME_OR_PATH
[--pretrained_vae_name_or_path PRETRAINED_VAE_NAME_OR_PATH]
[--revision REVISION]
[--tokenizer_name TOKENIZER_NAME]
[--instance_data_dir INSTANCE_DATA_DIR]
[--class_data_dir CLASS_DATA_DIR]
[--instance_prompt INSTANCE_PROMPT]
[--class_prompt CLASS_PROMPT]
[--save_sample_prompt SAVE_SAMPLE_PROMPT]
[--save_sample_negative_prompt SAVE_SAMPLE_NEGATIVE_PROMPT]
[--n_save_sample N_SAVE_SAMPLE]
[--save_guidance_scale SAVE_GUIDANCE_SCALE]
[--save_infer_steps SAVE_INFER_STEPS]
[--pad_tokens] [--with_prior_preservation]
[--prior_loss_weight PRIOR_LOSS_WEIGHT]
[--num_class_images NUM_CLASS_IMAGES]
[--output_dir OUTPUT_DIR] [--seed SEED]
[--resolution RESOLUTION] [--center_crop]
[--train_text_encoder]
[--train_batch_size TRAIN_BATCH_SIZE]
[--sample_batch_size SAMPLE_BATCH_SIZE]
[--num_train_epochs NUM_TRAIN_EPOCHS]
[--max_train_steps MAX_TRAIN_STEPS]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--gradient_checkpointing]
[--learning_rate LEARNING_RATE] [--scale_lr]
[--lr_scheduler LR_SCHEDULER]
[--lr_warmup_steps LR_WARMUP_STEPS]
[--use_8bit_adam] [--adam_beta1 ADAM_BETA1]
[--adam_beta2 ADAM_BETA2]
[--adam_weight_decay ADAM_WEIGHT_DECAY]
[--adam_epsilon ADAM_EPSILON]
[--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
[--hub_token HUB_TOKEN]
[--hub_model_id HUB_MODEL_ID]
[--logging_dir LOGGING_DIR]
[--log_interval LOG_INTERVAL]
[--save_interval SAVE_INTERVAL]
[--save_min_steps SAVE_MIN_STEPS]
[--mixed_precision {no,fp16,bf16}]
[--not_cache_latents] [--hflip]
[--local_rank LOCAL_RANK]
[--concepts_list CONCEPTS_LIST]
train_dreambooth.py: error: unrecognized arguments:
Could someone make a video tutorial covering the entire process of running it on Windows, please?
It would be much appreciated.
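One plausible cause of the `unrecognized arguments` error above, offered as an assumption rather than a confirmed diagnosis: the prompts in the training command contain spaces but are not quoted, so the shell passes every word after the first as a separate argument. The sketch below reproduces that behaviour with a tiny stand-in parser; `build_parser` and `unrecognized` are hypothetical helpers, not part of `train_dreambooth.py`, and `shlex.split` follows POSIX shell rules, which differ in detail from the Windows command prompt.

```python
import argparse
import shlex


def build_parser():
    # Stand-in for the real parser: only the two prompt options are modelled.
    parser = argparse.ArgumentParser(prog="train_dreambooth.py")
    parser.add_argument("--instance_prompt")
    parser.add_argument("--class_prompt")
    return parser


def unrecognized(command_tail):
    """Split a command line the way a POSIX shell would and
    return the words argparse cannot assign to any option."""
    argv = shlex.split(command_tail)
    _, extra = build_parser().parse_known_args(argv)
    return extra


# Unquoted prompts: every word after the first becomes a stray argument.
print(unrecognized(
    "--instance_prompt=photo of marina shishova --class_prompt=photo of woman"
))
# → ['of', 'marina', 'shishova', 'of', 'woman']

# Quoted prompts: each prompt arrives as a single value.
print(unrecognized(
    '--instance_prompt="photo of marina shishova" --class_prompt="photo of woman"'
))
# → []
```

If this is the cause, wrapping both prompts (and the host path that contains `photo of woman`) in double quotes should let the parser accept the command.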
PyTorch 2?
Any chance we can have an image that uses PyTorch 2? :)
pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel has been released.
Can you adapt this one? (even less VRAM)
xformers support
I see you're installing xformers in the Dockerfile, but I don't see how to enable it for speedup.
Update requirements
Hello again! First, thanks for building this.
When I try to run the container, it reports that it cannot use the GPU.
I checked and found that I was missing the nvidia-container-toolkit.
Could you document that it is required to run this correctly?
Thanks!
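A quick way to catch this before starting a long run is a preflight check for the host tools. The sketch below is only an illustration: the executable names are assumptions (`nvidia-ctk` ships with recent nvidia-container-toolkit packages, but older installs may not provide it), and finding a binary on `PATH` does not prove that Docker is configured to use the NVIDIA runtime.

```python
import shutil

# Assumed executable names; adjust for your distribution.
REQUIRED_TOOLS = ("docker", "nvidia-smi", "nvidia-ctk")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected host tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests.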
Newbie install
Can someone point me to a basic install guide? It doesn't have to be a video, just something to get started. Thanks.
Support Stable Diffusion 2
Upgrade diffusers to v0.9.0 to support Stable Diffusion 2:
https://github.com/huggingface/diffusers/releases/tag/v0.9.0
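To see whether a given image already meets that minimum, the installed version string can be compared against `0.9.0`. This is a simplified sketch: `version_tuple` and `meets_minimum` are hypothetical helpers, and the parser deliberately ignores pre-release suffixes such as `.dev0`, so it is not a full version-comparison implementation.

```python
def version_tuple(version):
    """Parse 'v0.9.0' or '0.9.0' into (0, 9, 0).
    Parsing stops at the first non-numeric component (e.g. 'dev0')."""
    parts = []
    for piece in version.lstrip("v").split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.9.0"):
    """True if `installed` is at least `minimum`, comparing numeric parts only."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("0.3.0"))   # → False
print(meets_minimum("0.10.2"))  # → True
```

On Python 3.8 or newer, the installed version string can be obtained with `importlib.metadata.version("diffusers")` and passed to `meets_minimum`.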
16GB readme example: RuntimeError: CUDA error: invalid argument
Unable to run the 16GB instructions on the readme, as it always results in a CUDA error:
Caching latents: 100%|████████████████████████████████████████████████████████████████| 200/200 [00:49<00:00, 4.02it/s]
Steps: 0%| | 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
File "/train_dreambooth.py", line 695, in <module>
main()
File "/train_dreambooth.py", line 645, in main
accelerator.backward(loss)
File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 884, in backward
loss.backward(**kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 146, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "/opt/conda/lib/python3.7/site-packages/xformers/ops.py", line 376, in backward
causal=ctx.causal,
File "/opt/conda/lib/python3.7/site-packages/torch/_ops.py", line 143, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
Command is more or less as in the readme, targeting a "person" class:
sudo docker run -it --gpus=all --ipc=host -v $(pwd):/train -e HUGGING_FACE_HUB_TOKEN=$(cat ~/.huggingface/token) smy20011/dreambooth:latest accelerate launch /train_dreambooth.py --pretrained_model_name_or_path=$MODEL_NAME --instance_data_dir=$INSTANCE_DIR --class_data_dir=$CLASS_DIR --output_dir=$OUTPUT_DIR --with_prior_preservation --prior_loss_weight=1.0 --instance_prompt="a photo of abcdefg person" --class_prompt="a photo of a person" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=2 --gradient_checkpointing --use_8bit_adam --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --num_class_images=200 --max_train_steps=800
I am using Windows WSL2 with an Ubuntu distro. The graphics card is an NVIDIA GeForce RTX 3080 16 GB.
Docker containers can access the GPU just fine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 516.94 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:02:00.0 On | N/A |
| 0% 45C P8 44W / 370W | 988MiB / 10240MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
...No idea how to diagnose the issue, as the error message seems far too generic to search for. Any help would be appreciated.
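The error text itself suggests a first diagnostic step: rerun with `CUDA_LAUNCH_BLOCKING=1` so the failing kernel is reported synchronously. With Docker, that variable has to be set inside the container, which `docker run -e` does. The sketch below builds such a command as an argument list; `with_env` is a hypothetical helper and the training arguments are omitted for brevity, so treat it as an outline rather than a ready-to-run command.

```python
def with_env(docker_argv, name, value):
    """Return a copy of a `docker run` argv with `-e NAME=VALUE`
    inserted directly after the `run` subcommand."""
    run_index = docker_argv.index("run")
    env_option = ["-e", f"{name}={value}"]
    return docker_argv[: run_index + 1] + env_option + docker_argv[run_index + 1 :]


# Shortened version of the command from the report (training flags omitted).
base = [
    "docker", "run", "-it", "--gpus=all", "--ipc=host",
    "smy20011/dreambooth:latest",
    "accelerate", "launch", "/train_dreambooth.py",
]

debug = with_env(base, "CUDA_LAUNCH_BLOCKING", "1")
print(" ".join(debug))
```

Because the option is inserted before the image name, Docker treats it as a container setting rather than as part of the command to execute.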
Readme example is broken with --use-auth-token
How to prevent the container from automatically exiting after running once?
After I run a training session, the container exits automatically, and when I restart it, an exception is thrown:
failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "-e": executable file not found in $PATH: unknown
I hope you can help me with this problem, thanks! @smy20011
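One reading of `exec: "-e": executable file not found`, stated as an assumption because the exact command is not included in the report: `docker run` treats everything after the image name as the command to execute, so an `-e` option placed after the image becomes the executable. The sketch below shows that split on a hypothetical command line; `split_at_image` and both example commands are illustrative only.

```python
def split_at_image(docker_argv, image):
    """Split a `docker run` argv into (options before the image,
    command that runs inside the container)."""
    index = docker_argv.index(image)
    return docker_argv[:index], docker_argv[index + 1 :]


IMAGE = "smy20011/dreambooth:latest"

# Hypothetical ordering with `-e` after the image name.
misplaced = ["docker", "run", "--gpus=all", IMAGE,
             "-e", "HUGGING_FACE_HUB_TOKEN=<token>", "accelerate", "launch"]
# Same pieces with `-e` moved in front of the image name.
corrected = ["docker", "run", "--gpus=all",
             "-e", "HUGGING_FACE_HUB_TOKEN=<token>", IMAGE, "accelerate", "launch"]

print(split_at_image(misplaced, IMAGE)[1][0])  # → -e
print(split_at_image(corrected, IMAGE)[1][0])  # → accelerate
```

Separately, a container stops when its main process ends, so exiting after the training command finishes is the expected behaviour.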
A version supporting CUDA 11.7/11.8?
@smy20011
This doesn't work with CUDA 11.7/11.8 (switching the image to the appropriate PyTorch version doesn't help).
train_dreambooth.py: error: unrecognized arguments: --use_auth_token
Hello! First, I just want to thank you for creating this Docker image.
I am getting the following error when trying to run it:
betto@pop-os:~/dream$ ./training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
PRETRAINED_MODEL_NAME_OR_PATH
[--tokenizer_name TOKENIZER_NAME]
--instance_data_dir INSTANCE_DATA_DIR
[--class_data_dir CLASS_DATA_DIR]
[--instance_prompt INSTANCE_PROMPT]
[--class_prompt CLASS_PROMPT]
[--with_prior_preservation]
[--prior_loss_weight PRIOR_LOSS_WEIGHT]
[--num_class_images NUM_CLASS_IMAGES]
[--output_dir OUTPUT_DIR] [--seed SEED]
[--resolution RESOLUTION] [--center_crop]
[--train_batch_size TRAIN_BATCH_SIZE]
[--sample_batch_size SAMPLE_BATCH_SIZE]
[--num_train_epochs NUM_TRAIN_EPOCHS]
[--max_train_steps MAX_TRAIN_STEPS]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--gradient_checkpointing]
[--learning_rate LEARNING_RATE] [--scale_lr]
[--lr_scheduler LR_SCHEDULER]
[--lr_warmup_steps LR_WARMUP_STEPS]
[--use_8bit_adam] [--adam_beta1 ADAM_BETA1]
[--adam_beta2 ADAM_BETA2]
[--adam_weight_decay ADAM_WEIGHT_DECAY]
[--adam_epsilon ADAM_EPSILON]
[--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
[--hub_token HUB_TOKEN]
[--hub_model_id HUB_MODEL_ID]
[--logging_dir LOGGING_DIR]
[--log_interval LOG_INTERVAL]
[--mixed_precision {no,fp16,bf16}]
[--not_cache_latents] [--local_rank LOCAL_RANK]
train_dreambooth.py: error: unrecognized arguments: --use_auth_token
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=training', '--class_data_dir=classes', '--output_dir=output', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of florzuvi person', '--class_prompt=a photo of person', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=800']' returned non-zero exit status 2.
betto@pop-os:~/dream$
Here is my system and directory where I am running this from:
betto@pop-os:~/dream$ neofetch
///////////// betto@pop-os
///////////////////// ------------
///////*767//////////////// OS: Pop!_OS 22.04 LTS x86_64
//////7676767676*////////////// Host: X570 AORUS PRO WIFI -CF
/////76767//7676767////////////// Kernel: 5.19.0-76051900-generic
/////767676///*76767/////////////// Uptime: 11 mins
///////767676///76767.///7676*/////// Packages: 1924 (dpkg)
/////////767676//76767///767676//////// Shell: bash 5.1.16
//////////76767676767////76767///////// Resolution: 2560x1440
///////////76767676//////7676////////// DE: GNOME 42.3.1
////////////,7676,///////767/////////// WM: Mutter
/////////////*7676///////76//////////// WM Theme: Pop
///////////////7676//////////////////// Theme: Pop-dark [GTK2/3]
///////////////7676///767//////////// Icons: Pop [GTK2/3]
//////////////////////'//////////// Terminal: x-terminal-emul
//////.7676767676767676767,////// CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
/////767676767676767676767///// GPU: NVIDIA GeForce RTX 3090
/////////////////////////// Memory: 2520MiB / 32021MiB
/////////////////////
/////////////
betto@pop-os:~/dream$ pwd
/home/betto/dream
betto@pop-os:~/dream$
betto@pop-os:~/dream$ nvidia-smi
Wed Oct 12 03:13:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:0A:00.0 On | N/A |
| 0% 49C P8 42W / 350W | 381MiB / 24576MiB | 19% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2068 G /usr/lib/xorg/Xorg 148MiB |
| 0 N/A N/A 2173 G /usr/bin/gnome-shell 72MiB |
| 0 N/A N/A 4142 G firefox 157MiB |
+-----------------------------------------------------------------------------+
betto@pop-os:~/dream$ docker --version
Docker version 20.10.18, build b40c2f6
What could I do to be able to run it? Thanks in advance!
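The usage text printed above does not list `--use_auth_token`, which is why the script rejects it. A small sketch for spotting such flags before launching: it extracts the option names from a usage string and reports any argument that is not among them. `supported_flags` and `unsupported_args` are hypothetical helpers, and `USAGE_EXCERPT` is a shortened copy of the output above.

```python
import re

# Shortened excerpt of the usage text from the error output.
USAGE_EXCERPT = """
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
                           PRETRAINED_MODEL_NAME_OR_PATH
                           [--tokenizer_name TOKENIZER_NAME]
                           --instance_data_dir INSTANCE_DATA_DIR
                           [--class_data_dir CLASS_DATA_DIR]
                           [--with_prior_preservation]
                           [--max_train_steps MAX_TRAIN_STEPS]
"""


def supported_flags(usage_text):
    """Collect every long option name (such as --seed) mentioned in the usage text."""
    return set(re.findall(r"--[A-Za-z0-9_]+", usage_text))


def unsupported_args(argv, usage_text):
    """Return the long options in argv that the usage text does not mention."""
    known = supported_flags(usage_text)
    return [arg for arg in argv
            if arg.startswith("--") and arg.split("=", 1)[0] not in known]


argv = [
    "--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4",
    "--use_auth_token",
    "--instance_data_dir=training",
    "--with_prior_preservation",
    "--max_train_steps=800",
]
print(unsupported_args(argv, USAGE_EXCERPT))
# → ['--use_auth_token']
```

A likely fix, offered as an assumption, is to drop the flag from `training.sh` and supply the token through the `HUGGING_FACE_HUB_TOKEN` environment variable with `docker run -e`, as the other commands on this page do.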