dreambooth-docker's People

Contributors

erjanmx, smy20011

dreambooth-docker's Issues

New install Windows 10 (WSL:Ubuntu)

Training Command:

docker run --rm -t --pull always --gpus=all --mount type=bind,source=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned.ckpt,target=/source.ckpt -v=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned:/dest -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\:/train smy20011/dreambooth:v0.1.10 python /diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path=/source.ckpt --dump_path=/dest
docker run --rm -t --pull always --gpus=all -v=D:\StableDiffusion\Marina\512:/instance -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\photo of woman:/class -v=D:\StableDiffusion\Marina\models-new:/output -v=D:\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned:/input_model -v=C:\Users\A\AppData\Roaming\smy20011.dreambooth\:/train -e HUGGING_FACE_HUB_TOKEN=123 smy20011/dreambooth:v0.1.10 /start_training /train_dreambooth.py --pretrained_model_name_or_path=/input_model --instance_prompt=photo of marina shishova --instance_data_dir=/instance --class_data_dir=/class --with_prior_preservation --prior_loss_weight=1.0 --class_prompt=photo of woman --max_train_steps=12000 --learning_rate=5e-7 --lr_scheduler=constant --lr_warmup_steps=0 --save_interval=2000 --save_min_steps=4000 --resolution=512 --output_dir=/output --mixed_precision=bf16 --use_8bit_adam 

Training output

...

usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
                           PRETRAINED_MODEL_NAME_OR_PATH
                           [--pretrained_vae_name_or_path PRETRAINED_VAE_NAME_OR_PATH]
                           [--revision REVISION]
                           [--tokenizer_name TOKENIZER_NAME]
                           [--instance_data_dir INSTANCE_DATA_DIR]
                           [--class_data_dir CLASS_DATA_DIR]
                           [--instance_prompt INSTANCE_PROMPT]
                           [--class_prompt CLASS_PROMPT]
                           [--save_sample_prompt SAVE_SAMPLE_PROMPT]
                           [--save_sample_negative_prompt SAVE_SAMPLE_NEGATIVE_PROMPT]
                           [--n_save_sample N_SAVE_SAMPLE]
                           [--save_guidance_scale SAVE_GUIDANCE_SCALE]
                           [--save_infer_steps SAVE_INFER_STEPS]
                           [--pad_tokens] [--with_prior_preservation]
                           [--prior_loss_weight PRIOR_LOSS_WEIGHT]
                           [--num_class_images NUM_CLASS_IMAGES]
                           [--output_dir OUTPUT_DIR] [--seed SEED]
                           [--resolution RESOLUTION] [--center_crop]
                           [--train_text_encoder]
                           [--train_batch_size TRAIN_BATCH_SIZE]
                           [--sample_batch_size SAMPLE_BATCH_SIZE]
                           [--num_train_epochs NUM_TRAIN_EPOCHS]
                           [--max_train_steps MAX_TRAIN_STEPS]
                           [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                           [--gradient_checkpointing]
                           [--learning_rate LEARNING_RATE] [--scale_lr]
                           [--lr_scheduler LR_SCHEDULER]
                           [--lr_warmup_steps LR_WARMUP_STEPS]
                           [--use_8bit_adam] [--adam_beta1 ADAM_BETA1]
                           [--adam_beta2 ADAM_BETA2]
                           [--adam_weight_decay ADAM_WEIGHT_DECAY]
                           [--adam_epsilon ADAM_EPSILON]
                           [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
                           [--hub_token HUB_TOKEN]
                           [--hub_model_id HUB_MODEL_ID]
                           [--logging_dir LOGGING_DIR]
                           [--log_interval LOG_INTERVAL]
                           [--save_interval SAVE_INTERVAL]
                           [--save_min_steps SAVE_MIN_STEPS]
                           [--mixed_precision {no,fp16,bf16}]
                           [--not_cache_latents] [--hflip]
                           [--local_rank LOCAL_RANK]
                           [--concepts_list CONCEPTS_LIST]
train_dreambooth.py: error: unrecognized arguments: 
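
One possible reading of the empty `unrecognized arguments:` message, offered as an assumption and not a confirmed diagnosis: the training command above ends with a trailing space after `--use_8bit_adam`, and the prompts contain unquoted spaces. If the launcher forwards an empty string as its own argument, or splits a prompt on whitespace, argparse reports the leftovers as unrecognized. The sketch below uses a small stand-in parser (not the real `train_dreambooth.py` parser) to show both effects.

```python
import argparse

# Stand-in parser with two of the flags from the usage text above.
# This is NOT the real train_dreambooth.py parser, only an illustration.
parser = argparse.ArgumentParser(prog="train_dreambooth.py")
parser.add_argument("--instance_prompt")
parser.add_argument("--use_8bit_adam", action="store_true")


def leftovers(argv):
    """Return the arguments argparse could not match to any flag."""
    _, extra = parser.parse_known_args(argv)
    return extra


# An empty string forwarded as its own argument is left over. Joined into the
# error message, it renders as "unrecognized arguments: " with nothing after.
empty_arg = leftovers(["--use_8bit_adam", ""])

# An unquoted prompt split on whitespace leaves the trailing words over.
split_prompt = leftovers(["--instance_prompt=photo", "of", "marina", "shishova"])

# Passing the whole prompt as a single argument leaves nothing over.
quoted_prompt = leftovers(["--instance_prompt=photo of marina shishova"])

print(repr(empty_arg), split_prompt, quoted_prompt)
```

If this is what is happening, quoting the prompts and removing the trailing space in the generated command would be the first things to try.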

xformers support

I see you're installing xformers in the Dockerfile, but I don't see how to enable it for speedup.

Update requirements

Hello again! First, thanks for building this.
When trying to run the container, it says that it is not able to use the GPU.
I checked and I was missing the nvidia-container-toolkit.
Could you mention in the requirements that it is needed to run this correctly?

Thanks!
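
A minimal sketch of a pre-flight check for the missing toolkit described above. It only looks for executables that the NVIDIA Container Toolkit packages commonly install. The binary names listed are an assumption about a typical installation and may differ by distribution or toolkit version.

```python
import shutil

# Executables commonly shipped with the NVIDIA Container Toolkit packages.
# The exact set can vary by distribution and toolkit version (assumption).
TOOLKIT_BINARIES = ("nvidia-container-cli", "nvidia-container-runtime")


def find_toolkit_binaries(names=TOOLKIT_BINARIES):
    """Map each expected binary name to its path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def toolkit_looks_installed(found):
    """True if at least one toolkit binary was found."""
    return any(path is not None for path in found.values())


found = find_toolkit_binaries()
for name, path in found.items():
    print(f"{name}: {path or 'not found'}")
if not toolkit_looks_installed(found):
    print("nvidia-container-toolkit appears to be missing; "
          "`docker run --gpus=all` will likely fail to access the GPU.")
```

Run it on the Docker host, not inside the container, since the toolkit is a host-side dependency.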

Newbie install

Can someone point me to a basic install guide? It doesn't have to be a video, just something to get started. Thanks.

16GB readme example: RuntimeError: CUDA error: invalid argument

Unable to run the 16GB instructions on the readme, as it always results in a CUDA error:

Caching latents: 100%|████████████████████████████████████████████████████████████████| 200/200 [00:49<00:00,  4.02it/s]
Steps:   0%|                                                                                    | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/train_dreambooth.py", line 695, in <module>
    main()
  File "/train_dreambooth.py", line 645, in main
    accelerator.backward(loss)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 884, in backward
    loss.backward(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 146, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/opt/conda/lib/python3.7/site-packages/xformers/ops.py", line 376, in backward
    causal=ctx.causal,
  File "/opt/conda/lib/python3.7/site-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps:   0%|                                                                                    | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Command is more or less as in the readme, targeting a "person" class:

sudo docker run -it --gpus=all --ipc=host -v $(pwd):/train \
  -e HUGGING_FACE_HUB_TOKEN=$(cat ~/.huggingface/token) \
  smy20011/dreambooth:latest \
  accelerate launch /train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of abcdefg person" \
  --class_prompt="a photo of a person" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800

Using Windows WSL2 with an Ubuntu distro. The graphics card is an NVIDIA GeForce RTX 3080 16 GB.

Docker containers can access the GPU just fine

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:02:00.0  On |                  N/A |
|  0%   45C    P8    44W / 370W |    988MiB / 10240MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

...No idea how to diagnose the issue, as the error message seems far too generic to search for. Any help would be appreciated.
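
One detail in the `nvidia-smi` table above may be worth checking: the memory column reports 10240MiB in total, which does not match the 16 GB stated in the report. The sketch below simply parses that row to make the figure explicit. Whether the smaller memory has anything to do with the `invalid argument` error is not established in this thread.

```python
import re

# Memory row copied from the nvidia-smi table above.
SMI_ROW = "|  0%   45C    P8    44W / 370W |    988MiB / 10240MiB |      2%      Default |"


def parse_memory_mib(row):
    """Return (used_mib, total_mib) from an nvidia-smi memory-usage row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", row)
    if match is None:
        raise ValueError("no 'used / total' memory field found")
    return int(match.group(1)), int(match.group(2))


used_mib, total_mib = parse_memory_mib(SMI_ROW)
total_gib = total_mib / 1024
print(f"used {used_mib} MiB of {total_mib} MiB ({total_gib:.0f} GiB total)")
```

Separately, the error message itself suggests setting `CUDA_LAUNCH_BLOCKING=1`, which could be passed to the container with an extra `-e CUDA_LAUNCH_BLOCKING=1` option to get a more accurate stack trace.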

How to prevent the container from automatically exiting after running once?

I found that after I run a training session, the container exits automatically, and when I restart it, an exception is thrown:
failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "-e": executable file not found in $PATH: unknown
I hope you can help me with this problem, thanks! @smy20011
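
The message `exec: "-e": executable file not found` indicates that the container was created with `-e` as its command. With `docker run`, everything after the image name is treated as the command to execute, so one way this can happen is an `-e` option placed after the image. The reporter's actual command is not shown, so the argument list below is an assumed, simplified example, and `split_at_image` is a hypothetical helper written for illustration.

```python
# Hypothetical helper: split a `docker run` argument list at the image name.
def split_at_image(argv, image):
    """Return (args before the image, args after it).

    The second part is what Docker passes to the container as its command.
    """
    index = argv.index(image)
    return argv[:index], argv[index + 1:]


IMAGE = "smy20011/dreambooth:latest"

# Assumed command line with `-e` placed after the image by mistake.
misplaced = ["docker", "run", "--gpus=all", IMAGE,
             "-e", "HUGGING_FACE_HUB_TOKEN=<token>", "accelerate", "launch"]

# Same options, with `-e` moved in front of the image.
corrected = ["docker", "run", "--gpus=all",
             "-e", "HUGGING_FACE_HUB_TOKEN=<token>", IMAGE, "accelerate", "launch"]

_, bad_command = split_at_image(misplaced, IMAGE)
_, good_command = split_at_image(corrected, IMAGE)

# The first word of the command is what the runtime tries to execute.
print(bad_command[0], good_command[0])
```

A container created this way keeps the same command on restart, so it would fail the same way each time until it is recreated with the options in the right order.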

train_dreambooth.py: error: unrecognized arguments: --use_auth_token

Hello! First, I just want to thank you for creating this Docker image.
I am having the following error when trying to run it:

betto@pop-os:~/dream$ ./training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
                           PRETRAINED_MODEL_NAME_OR_PATH
                           [--tokenizer_name TOKENIZER_NAME]
                           --instance_data_dir INSTANCE_DATA_DIR
                           [--class_data_dir CLASS_DATA_DIR]
                           [--instance_prompt INSTANCE_PROMPT]
                           [--class_prompt CLASS_PROMPT]
                           [--with_prior_preservation]
                           [--prior_loss_weight PRIOR_LOSS_WEIGHT]
                           [--num_class_images NUM_CLASS_IMAGES]
                           [--output_dir OUTPUT_DIR] [--seed SEED]
                           [--resolution RESOLUTION] [--center_crop]
                           [--train_batch_size TRAIN_BATCH_SIZE]
                           [--sample_batch_size SAMPLE_BATCH_SIZE]
                           [--num_train_epochs NUM_TRAIN_EPOCHS]
                           [--max_train_steps MAX_TRAIN_STEPS]
                           [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                           [--gradient_checkpointing]
                           [--learning_rate LEARNING_RATE] [--scale_lr]
                           [--lr_scheduler LR_SCHEDULER]
                           [--lr_warmup_steps LR_WARMUP_STEPS]
                           [--use_8bit_adam] [--adam_beta1 ADAM_BETA1]
                           [--adam_beta2 ADAM_BETA2]
                           [--adam_weight_decay ADAM_WEIGHT_DECAY]
                           [--adam_epsilon ADAM_EPSILON]
                           [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
                           [--hub_token HUB_TOKEN]
                           [--hub_model_id HUB_MODEL_ID]
                           [--logging_dir LOGGING_DIR]
                           [--log_interval LOG_INTERVAL]
                           [--mixed_precision {no,fp16,bf16}]
                           [--not_cache_latents] [--local_rank LOCAL_RANK]
train_dreambooth.py: error: unrecognized arguments: --use_auth_token
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '/train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=training', '--class_data_dir=classes', '--output_dir=output', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of florzuvi person', '--class_prompt=a photo of person', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=800']' returned non-zero exit status 2.
betto@pop-os:~/dream$

Here is my system and directory where I am running this from:

betto@pop-os:~/dream$ neofetch
             /////////////                betto@pop-os 
         /////////////////////            ------------ 
      ///////*767////////////////         OS: Pop!_OS 22.04 LTS x86_64 
    //////7676767676*//////////////       Host: X570 AORUS PRO WIFI -CF 
   /////76767//7676767//////////////      Kernel: 5.19.0-76051900-generic 
  /////767676///*76767///////////////     Uptime: 11 mins 
 ///////767676///76767.///7676*///////    Packages: 1924 (dpkg) 
/////////767676//76767///767676////////   Shell: bash 5.1.16 
//////////76767676767////76767/////////   Resolution: 2560x1440 
///////////76767676//////7676//////////   DE: GNOME 42.3.1 
////////////,7676,///////767///////////   WM: Mutter 
/////////////*7676///////76////////////   WM Theme: Pop 
///////////////7676////////////////////   Theme: Pop-dark [GTK2/3] 
 ///////////////7676///767////////////    Icons: Pop [GTK2/3] 
  //////////////////////'////////////     Terminal: x-terminal-emul 
   //////.7676767676767676767,//////      CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz 
    /////767676767676767676767/////       GPU: NVIDIA GeForce RTX 3090 
      ///////////////////////////         Memory: 2520MiB / 32021MiB 
         /////////////////////
             /////////////                                        
                                                                  


betto@pop-os:~/dream$ pwd
/home/betto/dream
betto@pop-os:~/dream$ 
betto@pop-os:~/dream$ nvidia-smi
Wed Oct 12 03:13:21 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   49C    P8    42W / 350W |    381MiB / 24576MiB |     19%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2068      G   /usr/lib/xorg/Xorg                148MiB |
|    0   N/A  N/A      2173      G   /usr/bin/gnome-shell               72MiB |
|    0   N/A  N/A      4142      G   firefox                           157MiB |
+-----------------------------------------------------------------------------+
betto@pop-os:~/dream$  docker --version
Docker version 20.10.18, build b40c2f6


What could I do to be able to run it? Thanks in advance!
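
The usage text printed above does not list `--use_auth_token`, which is why argparse rejects it. A quick way to spot such mismatches is to compare the flags in a command against the flags in the usage output. The sketch below does that with a shortened excerpt of the usage text. The helper names are made up for this example.

```python
import re

# Excerpt of the usage text printed above (not the full list).
USAGE_EXCERPT = """
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
                           PRETRAINED_MODEL_NAME_OR_PATH
                           [--tokenizer_name TOKENIZER_NAME]
                           --instance_data_dir INSTANCE_DATA_DIR
                           [--hub_token HUB_TOKEN]
"""


def supported_flags(usage_text):
    """Collect every long option (--name) that appears in a usage string."""
    return set(re.findall(r"--[A-Za-z0-9_]+", usage_text))


def unsupported(argv, usage_text):
    """Return flags from argv that the usage text does not list."""
    known = supported_flags(usage_text)
    flags = [arg.split("=", 1)[0] for arg in argv if arg.startswith("--")]
    return [flag for flag in flags if flag not in known]


# A few arguments taken from the failing command in the traceback above.
argv = [
    "--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4",
    "--use_auth_token",
    "--instance_data_dir=training",
]
print(unsupported(argv, USAGE_EXCERPT))
```

Removing the flag from `training.sh` should get past the argparse error. Authentication may then depend on the `HUGGING_FACE_HUB_TOKEN` environment variable that other commands in these issues pass with `-e`, though that is an assumption about this image version.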
