brian6091 / dreambooth
Fine-tuning of diffusion models
License: Apache License 2.0
Need to check how this interacts when xformers is enabled
pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()
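For intuition about what `enable_attention_slicing()` buys you (and why it might interact with xformers, which already computes attention memory-efficiently): slicing processes the query rows in chunks so the full attention score matrix is never materialized at once. A minimal NumPy sketch of the idea, not diffusers' actual implementation:

```python
import numpy as np

def attention(q, k, v):
    # Standard scaled dot-product attention over all queries at once.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sliced_attention(q, k, v, slice_size):
    # Same result, but queries are handled in chunks so only a
    # (slice_size x num_keys) score block exists at any time.
    chunks = [attention(q[i:i + slice_size], k, v)
              for i in range(0, q.shape[0], slice_size)]
    return np.vstack(chunks)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
full = attention(q, k, v)
sliced = sliced_attention(q, k, v, slice_size=2)
print(np.allclose(full, sliced))  # True
```

Because the sliced result is numerically identical, enabling it alongside xformers should not change outputs, only (possibly redundantly) peak memory.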
Hello, there is nothing even close to my instance prompt in the save_sample_prompt folder. I'm training LoRA and keep getting pictures of old indigenous women instead of my token. What could cause that?
INSTANCE TOKEN - "sks"
CLASS_PROMPT - "photo of woman"
INSTANCE_PROMPT - "photo of sks"
SAVE_SAMPLE_PROMPT - "photo of sks woman"
https://stackoverflow.com/questions/15753701/how-can-i-pass-a-list-as-a-command-line-argument-with-argparse
https://stackoverflow.com/questions/49824248/allow-argparse-nargs-to-accept-comma-separated-input-with-choices
https://stackoverflow.com/questions/27280603/accept-a-list-of-type-string-from-the-command-line-in-argparse
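Summarizing the approaches from those threads: argparse can take a list either as space-separated values via `nargs`, or as a single comma-separated string parsed by a custom `type`. A small self-contained sketch (argument names here are illustrative, not from the training script):

```python
import argparse

def csv_list(text):
    # Split "a,b,c" into ["a", "b", "c"], dropping empty items.
    return [item for item in text.split(",") if item]

parser = argparse.ArgumentParser()
# Option 1: space-separated values collected by nargs.
parser.add_argument("--concepts", nargs="+", default=[])
# Option 2: one comma-separated string parsed by a custom type.
parser.add_argument("--tags", type=csv_list, default=[])

args = parser.parse_args(["--concepts", "sks", "woman",
                          "--tags", "photo,portrait"])
print(args.concepts)  # ['sks', 'woman']
print(args.tags)      # ['photo', 'portrait']
```

The `type=` variant is friendlier inside Colab forms, where the value arrives as a single string.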
I get this error.
This is my notebook. I checked the LoRA option and disabled prior preservation; the rest is left at the default settings.
https://colab.research.google.com/drive/1Kr4Fy1_wIhg_j7_rYLpDv40FLKnnDqen?usp=share_link
Thank you
No LSB modules are available.
Description: Ubuntu 20.04.5 LTS
diffusers==0.11.1
lora-diffusion @ file:///content/lora
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.1%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers==0.0.16rc425
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- Platform: Linux-5.10.147+-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.21.6
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- `Accelerate` default config:
Not found
usage: accelerate <command> [<args>] launch
[-h]
[--config_file CONFIG_FILE]
[--cpu]
[--mps]
[--multi_gpu]
[--tpu]
[--use_mps_device]
[--dynamo_backend {no,eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--mixed_precision {no,fp16,bf16}]
[--fp16]
[--num_processes NUM_PROCESSES]
[--num_machines NUM_MACHINES]
[--num_cpu_threads_per_process NUM_CPU_THREADS_PER_PROCESS]
[--use_deepspeed]
[--use_fsdp]
[--use_megatron_lm]
[--gpu_ids GPU_IDS]
[--same_network]
[--machine_rank MACHINE_RANK]
[--main_process_ip MAIN_PROCESS_IP]
[--main_process_port MAIN_PROCESS_PORT]
[--rdzv_conf RDZV_CONF]
[--max_restarts MAX_RESTARTS]
[--monitor_interval MONITOR_INTERVAL]
[-m]
[--no_python]
[--main_training_function MAIN_TRAINING_FUNCTION]
[--downcast_bf16]
[--deepspeed_config_file DEEPSPEED_CONFIG_FILE]
[--zero_stage ZERO_STAGE]
[--offload_optimizer_device OFFLOAD_OPTIMIZER_DEVICE]
[--offload_param_device OFFLOAD_PARAM_DEVICE]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--gradient_clipping GRADIENT_CLIPPING]
[--zero3_init_flag ZERO3_INIT_FLAG]
[--zero3_save_16bit_model ZERO3_SAVE_16BIT_MODEL]
[--deepspeed_hostfile DEEPSPEED_HOSTFILE]
[--deepspeed_exclusion_filter DEEPSPEED_EXCLUSION_FILTER]
[--deepspeed_inclusion_filter DEEPSPEED_INCLUSION_FILTER]
[--deepspeed_multinode_launcher DEEPSPEED_MULTINODE_LAUNCHER]
[--fsdp_offload_params FSDP_OFFLOAD_PARAMS]
[--fsdp_min_num_params FSDP_MIN_NUM_PARAMS]
[--fsdp_sharding_strategy FSDP_SHARDING_STRATEGY]
[--fsdp_auto_wrap_policy FSDP_AUTO_WRAP_POLICY]
[--fsdp_transformer_layer_cls_to_wrap FSDP_TRANSFORMER_LAYER_CLS_TO_WRAP]
[--fsdp_backward_prefetch_policy FSDP_BACKWARD_PREFETCH_POLICY]
[--fsdp_state_dict_type FSDP_STATE_DICT_TYPE]
[--megatron_lm_tp_degree MEGATRON_LM_TP_DEGREE]
[--megatron_lm_pp_degree MEGATRON_LM_PP_DEGREE]
[--megatron_lm_num_micro_batches MEGATRON_LM_NUM_MICRO_BATCHES]
[--megatron_lm_sequence_parallelism MEGATRON_LM_SEQUENCE_PARALLELISM]
[--megatron_lm_recompute_activations MEGATRON_LM_RECOMPUTE_ACTIVATIONS]
[--megatron_lm_use_distributed_optimizer MEGATRON_LM_USE_DISTRIBUTED_OPTIMIZER]
[--megatron_lm_gradient_clipping MEGATRON_LM_GRADIENT_CLIPPING]
[--aws_access_key_id AWS_ACCESS_KEY_ID]
[--aws_secret_access_key AWS_SECRET_ACCESS_KEY]
[--debug]
training_script
...
accelerate <command> [<args>] launch: error: argument --mixed_precision: invalid choice: '' (choose from 'no', 'fp16', 'bf16')
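The invalid choice `''` suggests the notebook is interpolating an unset variable into `--mixed_precision`. One defensive option (a hypothetical helper, not part of accelerate) is to sanitize the value before building the launch command:

```python
VALID = {"no", "fp16", "bf16"}

def mixed_precision_flag(value):
    # accelerate rejects an empty string for --mixed_precision,
    # so fall back to "no" when the notebook variable is unset or blank.
    value = (value or "").strip()
    return value if value in VALID else "no"

print(mixed_precision_flag(""))      # no
print(mixed_precision_flag("fp16"))  # fp16
```

The command line would then always receive `--mixed_precision=no` instead of an empty value.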
Seeing a warning like this:
TheLastBen/fast-stable-diffusion#911
Training still runs, but memory usage should be monitored to see whether it matters.
I am using the precompiled wheel from:
!pip install -q https://github.com/ShivamShrirao/xformers-wheels/releases/download/4c06c79/xformers-0.0.15.dev0+4c06c79.d20221201-cp38-cp38-linux_x86_64.whl
but maybe that only works for T4?
Switch to this?
!pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl
When launching training
This error seems to occur everywhere, so it is not specific to this repo. Any ideas on how to fix it?
In expected directory:
cat << EOT > ~/.cache/huggingface/accelerate/default_config.yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: 'NO'
downcast_bf16: 'no'
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
use_cpu: false
EOT
I get the below error when I run the training cell in the Colab notebook FineTuning_colab.ipynb.
I also ran the Training parameters cell, and all parameters were parsed.
No LSB modules are available.
Description: Ubuntu 20.04.5 LTS
diffusers==0.11.1
lora-diffusion @ file:///content/lora
torchvision @ https://download.pytorch.org/whl/cu118/torchvision-0.15.1%2Bcu118-cp39-cp39-linux_x86_64.whl
transformers==4.25.1
xformers==0.0.16rc425
2023-04-16 09:29:59.351268: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-16 09:30:00.246985: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- `Accelerate` default config:
Running with LoRA restricted to text_encoder, with no unet training, produces the title error.
Set up a "lean" colab that just loads the yaml file for direct editing
https://stackoverflow.com/questions/48687091/how-to-edit-and-save-text-files-py-in-google-colab
Crashes when using optimizers that require additional inputs to step()
set iter=0 and send to tracker
Try disabling xformers?
Hello. I can't run training because of this error. Mixed precision is set to "fp16".
No LSB modules are available.
Description: Ubuntu 20.04.5 LTS
diffusers==0.11.1
lora-diffusion @ file:///content/lora
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.1%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers==0.0.16rc425
2023-02-25 02:45:30.654450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-25 02:45:31.540076: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-25 02:45:31.540174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-25 02:45:31.540193: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- `Accelerate` default config:
Do this manually?
Passing None works for SD 2 models, but I think it wrecks previous models, since they come packaged with a feature_extractor
Related,
huggingface/diffusers#1500
tensorboard
Put a note in the colab to install it
bitsandbytes
Issue an error?
Hey @brian6091,
Awesome work! Your SD Colab notebooks are the best out there! I've been playing around with them for a couple of weeks with LoRA.
Can't wait for the ti branch with config Colab to be ready :)
Let me know how I can help out.
Tesla T4
GPU=14396/15109MiB
3.66s/it training, 1.08s/it inference
BATCH_SIZE=4
TRAIN_TEXT_ENCODER
USE_8BIT_ADAM
FP16
GRADIENT_CHECKPOINTING
GRADIENT_ACCUMULATION_STEPS=1
USE_EMA=False
RESOLUTION=512
No errors or warnings with xformers-0.0.15.dev0+189828c
diffusers==0.9.0
accelerate==0.14.0
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers @ https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.15/xformers-0.0.15.dev0+189828c.d20221207-cp38-cp38-linux_x86_64.whl
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.14.0
Description: Ubuntu 18.04.6 LTS
diffusers @ git+https://github.com/huggingface/diffusers@326de4191578dfb55cb968880d40d703075e331e
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers @ https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.14.0
- `Accelerate` default config:
Steps: 0% 0/699 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/Dreambooth/train_dreambooth.py", line 854, in <module>
    main(args)
  File "/content/Dreambooth/train_dreambooth.py", line 810, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1247, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1210, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 282, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 210, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
Steps: 0% 0/699 [00:12<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1069, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 551, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/Dreambooth/train_dreambooth.py', '--revision=fp16', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse', '--instance_data_dir=/content/gdrive/MyDrive/InstanceImages/caetmurxb/', '--class_data_dir=/content/gdrive/MyDrive/RegularizationImages/person/', '--output_dir=/content/models/', '--logging_dir=/content/logs/', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of caetmux person', '--class_prompt=a photo of a person', '--seed=1275017', '--resolution=512', '--train_batch_size=4', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--mixed_precision=fp16', '--use_8bit_adam', '--adam_beta1=0.9', '--adam_beta2=0.999', '--adam_weight_decay=0.01', '--adam_epsilon=1e-08', '--learning_rate=6e-06', '--lr_scheduler=cosine', '--lr_warmup_steps=25', '--lr_cosine_num_cycles=5', '--ema_inv_gamma=1.0', '--ema_power=0.5', '--ema_min_value=0', '--ema_max_value=0.999', '--max_train_steps=699', '--num_class_images=1500', '--sample_batch_size=4', '--save_min_steps=100', '--save_interval=100', '--n_save_sample=4', '--save_sample_prompt=a photo of caetmux person', '--save_sample_negative_prompt=']' returned non-zero exit status 1.
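"Attempting to unscale FP16 gradients" typically means trainable parameters were cast to fp16: the AMP GradScaler can only unscale fp32 gradients, so trainable weights should stay fp32 while autocast handles the mixed-precision forward pass. A plain-Python sketch of that sanity check (the helper is hypothetical, mimicking PyTorch's check without requiring torch):

```python
def check_trainable_dtypes(named_params):
    # AMP's GradScaler refuses to unscale fp16 gradients; trainable
    # parameters should remain fp32 while frozen ones may be fp16.
    bad = [name for name, dtype, trainable in named_params
           if trainable and dtype == "float16"]
    if bad:
        raise ValueError(f"Attempting to unscale FP16 gradients: {bad}")
    return True

params = [("unet.lora_up", "float32", True),
          ("text_encoder", "float16", False)]
print(check_trainable_dtypes(params))  # True
```

Running such a check before training starts surfaces the misconfiguration immediately, rather than on the first `clip_grad_norm_` call.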