smy20011 / dreambooth-docker

Dockerfile 89.35% Shell 10.65%

dreambooth-docker's Issues

New install Windows 10 (WSL:Ubuntu)

Training Command:

docker run --rm -t --pull always --gpus=all --mount type=bind,source=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned.ckpt,target=/source.ckpt -v=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned:/dest -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\:/train smy20011/dreambooth:v0.1.10 python /diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path=/source.ckpt --dump_path=/dest
docker run --rm -t --pull always --gpus=all -v=D:\StableDiffusion\Marina\512:/instance -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\photo of woman:/class -v=D:\StableDiffusion\Marina\models-new:/output -v=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned:/input_model -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\:/train -e HUGGING_FACE_HUB_TOKEN=123 smy20011/dreambooth:v0.1.10 /start_training /train_dreambooth.py --pretrained_model_name_or_path=/input_model --instance_prompt=photo of marina shishova --instance_data_dir=/instance --class_data_dir=/class --with_prior_preservation --prior_loss_weight=1.0 --class_prompt=photo of woman --max_train_steps=12000 --learning_rate=5e-7 --lr_scheduler=constant --lr_warmup_steps=0 --save_interval=2000 --save_min_steps=4000 --resolution=512 --output_dir=/output --mixed_precision=bf16 --use_8bit_adam

Training output

...

usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
                           PRETRAINED_MODEL_NAME_OR_PATH
                           [--pretrained_vae_name_or_path PRETRAINED_VAE_NAME_OR_PATH]
                           [--revision REVISION]
                           [--tokenizer_name TOKENIZER_NAME]
                           [--instance_data_dir INSTANCE_DATA_DIR]
                           [--class_data_dir CLASS_DATA_DIR]
                           [--instance_prompt INSTANCE_PROMPT]
                           [--class_prompt CLASS_PROMPT]
                           [--save_sample_prompt SAVE_SAMPLE_PROMPT]
                           [--save_sample_negative_prompt SAVE_SAMPLE_NEGATIVE_PROMPT]
                           [--n_save_sample N_SAVE_SAMPLE]
                           [--save_guidance_scale SAVE_GUIDANCE_SCALE]
                           [--save_infer_steps SAVE_INFER_STEPS]
                           [--pad_tokens] [--with_prior_preservation]
                           [--prior_loss_weight PRIOR_LOSS_WEIGHT]
                           [--num_class_images NUM_CLASS_IMAGES]
                           [--output_dir OUTPUT_DIR] [--seed SEED]
                           [--resolution RESOLUTION] [--center_crop]
                           [--train_text_encoder]
                           [--train_batch_size TRAIN_BATCH_SIZE]
                           [--sample_batch_size SAMPLE_BATCH_SIZE]
                           [--num_train_epochs NUM_TRAIN_EPOCHS]
                           [--max_train_steps MAX_TRAIN_STEPS]
                           [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                           [--gradient_checkpointing]
                           [--learning_rate LEARNING_RATE] [--scale_lr]
                           [--lr_scheduler LR_SCHEDULER]
                           [--lr_warmup_steps LR_WARMUP_STEPS]
                           [--use_8bit_adam] [--adam_beta1 ADAM_BETA1]
                           [--adam_beta2 ADAM_BETA2]
                           [--adam_weight_decay ADAM_WEIGHT_DECAY]
                           [--adam_epsilon ADAM_EPSILON]
                           [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
                           [--hub_token HUB_TOKEN]
                           [--hub_model_id HUB_MODEL_ID]
                           [--logging_dir LOGGING_DIR]
                           [--log_interval LOG_INTERVAL]
                           [--save_interval SAVE_INTERVAL]
                           [--save_min_steps SAVE_MIN_STEPS]
                           [--mixed_precision {no,fp16,bf16}]
                           [--not_cache_latents] [--hflip]
                           [--local_rank LOCAL_RANK]
                           [--concepts_list CONCEPTS_LIST]
train_dreambooth.py: error: unrecognized arguments:

Could someone make a video tutorial with the entire process to run it on windows please?

Would be much appreciated.
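
One possible cause of the `unrecognized arguments` error above, offered as an assumption rather than a confirmed diagnosis: the prompts in the training command contain spaces and are not quoted, so words such as `of`, `marina` and `shishova` would reach the script as separate arguments. The sketch below reproduces that effect with a cut-down, hypothetical parser that mirrors only two options from the usage text. `shlex.split` stands in for the shell's word splitting (POSIX rules, so Windows shells may differ in detail).

```python
import argparse
import shlex


def build_parser():
    # Hypothetical stand-in for train_dreambooth.py: only two of its options.
    parser = argparse.ArgumentParser(prog="train_dreambooth.py")
    parser.add_argument("--instance_prompt")
    parser.add_argument("--class_prompt")
    return parser


def leftover_args(command_line):
    """Split a command line like a POSIX shell and return the unparsed words."""
    argv = shlex.split(command_line)
    _, unknown = build_parser().parse_known_args(argv)
    return unknown


# Prompts as written in the report: no quotes around values with spaces.
unquoted = "--instance_prompt=photo of marina shishova --class_prompt=photo of woman"
# Same prompts with each value quoted so it stays a single argument.
quoted = '--instance_prompt="photo of marina shishova" --class_prompt="photo of woman"'

print("unquoted leftovers:", leftover_args(unquoted))
print("quoted leftovers:  ", leftover_args(quoted))
```

If this is the cause, quoting each prompt in the `docker run` command (for example `--instance_prompt="photo of marina shishova"`) should keep every value in one piece.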

Pytorch 2?

Any chance we can have an image that uses PyTorch 2? :)

The pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel image has been released.

Can you adapt this one? (even less VRAM)

https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth

xformers support

I see you're installing xformers in the Dockerfile, but I don't see how to enable it for speedup.

Hello again! First, thanks for building this.
When trying to run the container, it says that it is not able to use the GPU.
I checked and I was missing the nvidia-container-toolkit.
Maybe you can add that it is needed to run this correctly?

Thanks!
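
A small preflight check along these lines could catch the missing toolkit before training starts. This is a sketch only: the executable names are assumptions (`nvidia-ctk` is the CLI shipped with recent NVIDIA Container Toolkit releases, and older releases may use different names), and finding a binary on `PATH` does not prove that Docker is configured to use it.

```python
import shutil

# Assumed executable names: Docker CLI, NVIDIA driver utility, and the
# NVIDIA Container Toolkit CLI. Adjust for your distribution and version.
REQUIRED_TOOLS = ("docker", "nvidia-smi", "nvidia-ctk")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found on PATH.")
```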

Newbie install

Can someone point me to a basic install guide? It doesn't have to be a video, just something to get started. Thanks.

Support Stable Diffusion 2

Please upgrade diffusers to v0.9.0 to support Stable Diffusion 2.

https://github.com/huggingface/diffusers/releases/tag/v0.9.0

16GB readme example: RuntimeError: CUDA error: invalid argument

I am unable to run the 16GB instructions in the README, as they always result in a CUDA error:

Caching latents: 100%|████████████████████████████████████████████████████████████████| 200/200 [00:49<00:00,  4.02it/s]
Steps:   0%|                                                                                    | 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/train_dreambooth.py", line 695, in <module>
    main()
  File "/train_dreambooth.py", line 645, in main
    accelerator.backward(loss)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 884, in backward
    loss.backward(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 146, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/opt/conda/lib/python3.7/site-packages/xformers/ops.py", line 376, in backward
    causal=ctx.causal,
  File "/opt/conda/lib/python3.7/site-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps:   0%|                                                                                    | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Command is more or less as in the readme, targeting a "person" class:

sudo docker run -it --gpus=all --ipc=host -v $(pwd):/train -e HUGGING_FACE_HUB_TOKEN=$(cat ~/.huggingface/token)  smy20011/dreambooth:latest   accelerate launch /train_dreambooth.py   --pretrained_model_name_or_path=$MODEL_NAME   --instance_data_dir=$INSTANCE_DIR   --class_data_dir=$CLASS_DIR   --output_dir=$OUTPUT_DIR   --with_prior_preservation --prior_loss_weight=1.0   --instance_prompt="a photo of abcdefg person"   --class_prompt="a photo of a person"   --resolution=512   --train_batch_size=1   --gradient_accumulation_steps=2 --gradient_checkpointing   --use_8bit_adam   --learning_rate=5e-6   --lr_scheduler="constant"   --lr_warmup_steps=0   --num_class_images=200   --max_train_steps=800

I am using Windows WSL2 with an Ubuntu distro. The graphics card is an NVIDIA GeForce RTX 3080 16 GB.

Docker containers can access the GPU just fine:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:02:00.0  On |                  N/A |
|  0%   45C    P8    44W / 370W |    988MiB / 10240MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I have no idea how to diagnose the issue, as the error message seems far too generic to search for. Any help would be appreciated.
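
One detail worth checking in the `nvidia-smi` table above is the total memory it reports. The sketch below pulls the used and total figures out of a table row. It is not a diagnosis of the CUDA error, only a way to compare the reported total (10240MiB in the row above) against the 16GB that the README example is labelled for. The helper name `parse_memory_mib` is hypothetical.

```python
import re

# Matches the "Memory-Usage" cell, e.g. "988MiB / 10240MiB".
MEMORY_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def parse_memory_mib(smi_row):
    """Return (used, total) in MiB from an nvidia-smi table row, or None."""
    match = MEMORY_PATTERN.search(smi_row)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Row copied from the nvidia-smi output in this report.
row = "|  0%   45C    P8    44W / 370W |    988MiB / 10240MiB |      2%      Default |"
used, total = parse_memory_mib(row)
print(f"used={used} MiB, total={total} MiB")
```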

Readme example is broken with --use-auth-token

--use-auth-token isn't a valid flag for train_dreambooth.py, and it currently breaks the example below:

How to prevent the container from automatically exiting after running once?

After I run a training session, the container exits automatically, and restarting it throws an exception:

failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "-e": executable file not found in $PATH: unknown

I hope you can help me with this problem, thanks! @smy20011
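
The `exec: "-e": executable file not found` message usually means Docker treated `-e` as the command to run, which happens when an option is placed after the image name: `docker run` reads everything after the image as the container command. The original command is not shown in this report, so that reading is an assumption. The sketch below uses a hypothetical helper, `docker_run_argv`, that assembles the arguments so options always come before the image. The token value and paths are placeholders.

```python
def docker_run_argv(image, command, env=None, volumes=None, gpus="all"):
    """Build a `docker run` argument list: options, then image, then command."""
    argv = ["docker", "run", f"--gpus={gpus}"]
    for key, value in (env or {}).items():
        argv += ["-e", f"{key}={value}"]        # environment options
    for host_path, container_path in (volumes or {}).items():
        argv += ["-v", f"{host_path}:{container_path}"]  # bind mounts
    argv.append(image)                           # image ends the option list
    argv += list(command)                        # the rest runs in the container
    return argv


argv = docker_run_argv(
    image="smy20011/dreambooth:latest",
    command=["accelerate", "launch", "/train_dreambooth.py"],
    env={"HUGGING_FACE_HUB_TOKEN": "<your-token>"},  # placeholder value
    volumes={"/path/to/train": "/train"},            # placeholder path
)
print(" ".join(argv))
```

Note that a container started this way still exits when the training command finishes. The helper only addresses the argument ordering behind the `"-e"` error.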

A version supporting CUDA 11.7/11.8?

@smy20011
This doesn't work with CUDA 11.7/11.8 (switching the base image to the matching PyTorch version doesn't work either).

train_dreambooth.py: error: unrecognized arguments: --use_auth_token

Hello! First, I just want to thank you for creating this Docker image.
I am getting the following error when trying to run it:

betto@pop-os:~/dream$ ./training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
                           PRETRAINED_MODEL_NAME_OR_PATH
                           [--tokenizer_name TOKENIZER_NAME]
                           --instance_data_dir INSTANCE_DATA_DIR
                           [--class_data_dir CLASS_DATA_DIR]
                           [--instance_prompt INSTANCE_PROMPT]
                           [--class_prompt CLASS_PROMPT]
                           [--with_prior_preservation]
                           [--prior_loss_weight PRIOR_LOSS_WEIGHT]
                           [--num_class_images NUM_CLASS_IMAGES]
                           [--output_dir OUTPUT_DIR] [--seed SEED]
                           [--resolution RESOLUTION] [--center_crop]
                           [--train_batch_size TRAIN_BATCH_SIZE]
                           [--sample_batch_size SAMPLE_BATCH_SIZE]
                           [--num_train_epochs NUM_TRAIN_EPOCHS]
                           [--max_train_steps MAX_TRAIN_STEPS]
                           [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                           [--gradient_checkpointing]
                           [--learning_rate LEARNING_RATE] [--scale_lr]
                           [--lr_scheduler LR_SCHEDULER]
                           [--lr_warmup_steps LR_WARMUP_STEPS]
                           [--use_8bit_adam] [--adam_beta1 ADAM_BETA1]
                           [--adam_beta2 ADAM_BETA2]
                           [--adam_weight_decay ADAM_WEIGHT_DECAY]
                           [--adam_epsilon ADAM_EPSILON]
                           [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
                           [--hub_token HUB_TOKEN]
                           [--hub_model_id HUB_MODEL_ID]
                           [--logging_dir LOGGING_DIR]
                           [--log_interval LOG_INTERVAL]
                           [--mixed_precision {no,fp16,bf16}]
                           [--not_cache_latents] [--local_rank LOCAL_RANK]
train_dreambooth.py: error: unrecognized arguments: --use_auth_token
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=training', '--class_data_dir=classes', '--output_dir=output', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of florzuvi person', '--class_prompt=a photo of person', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=800']' returned non-zero exit status 2.
betto@pop-os:~/dream$

Here is my system and directory where I am running this from:

betto@pop-os:~/dream$ neofetch
             /////////////                betto@pop-os 
         /////////////////////            ------------ 
      ///////*767////////////////         OS: Pop!_OS 22.04 LTS x86_64 
    //////7676767676*//////////////       Host: X570 AORUS PRO WIFI -CF 
   /////76767//7676767//////////////      Kernel: 5.19.0-76051900-generic 
  /////767676///*76767///////////////     Uptime: 11 mins 
 ///////767676///76767.///7676*///////    Packages: 1924 (dpkg) 
/////////767676//76767///767676////////   Shell: bash 5.1.16 
//////////76767676767////76767/////////   Resolution: 2560x1440 
///////////76767676//////7676//////////   DE: GNOME 42.3.1 
////////////,7676,///////767///////////   WM: Mutter 
/////////////*7676///////76////////////   WM Theme: Pop 
///////////////7676////////////////////   Theme: Pop-dark [GTK2/3] 
 ///////////////7676///767////////////    Icons: Pop [GTK2/3] 
  //////////////////////'////////////     Terminal: x-terminal-emul 
   //////.7676767676767676767,//////      CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz 
    /////767676767676767676767/////       GPU: NVIDIA GeForce RTX 3090 
      ///////////////////////////         Memory: 2520MiB / 32021MiB 
         /////////////////////
             /////////////                                        
                                                                  


betto@pop-os:~/dream$ pwd
/home/betto/dream
betto@pop-os:~/dream$ 
betto@pop-os:~/dream$ nvidia-smi
Wed Oct 12 03:13:21 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   49C    P8    42W / 350W |    381MiB / 24576MiB |     19%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2068      G   /usr/lib/xorg/Xorg                148MiB |
|    0   N/A  N/A      2173      G   /usr/bin/gnome-shell               72MiB |
|    0   N/A  N/A      4142      G   firefox                           157MiB |
+-----------------------------------------------------------------------------+
betto@pop-os:~/dream$  docker --version
Docker version 20.10.18, build b40c2f6

What could I do to be able to run it? Thanks in advance!
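
The usage text in the error above does not list `--use_auth_token`, so removing that flag from the command in `training.sh` should at least get past argument parsing. The sketch below filters an argument list using a hypothetical helper, `drop_flags`. It handles bare flags and the `--flag=value` form only, not `--flag value` pairs. Other commands on this page pass the token through the `HUGGING_FACE_HUB_TOKEN` environment variable instead. Whether that is sufficient for this image version is an assumption.

```python
def drop_flags(argv, unsupported):
    """Return argv without the given flags (bare or `--flag=value` form)."""
    kept = []
    for arg in argv:
        name = arg.split("=", 1)[0]  # "--flag=value" -> "--flag"
        if name in unsupported:
            continue
        kept.append(arg)
    return kept


# Shortened version of the command shown in the CalledProcessError above.
command = [
    "/train_dreambooth.py",
    "--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4",
    "--use_auth_token",
    "--instance_data_dir=training",
    "--instance_prompt=a photo of florzuvi person",
]
cleaned = drop_flags(command, unsupported={"--use_auth_token"})
print(cleaned)
```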
