brian6091 / dreambooth
Fine-tuning of diffusion models
License: Apache License 2.0
Need to check how this interacts when xformers is enabled
pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()
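For intuition about what `enable_attention_slicing()` buys you (and why it might interact with xformers, which already computes attention memory-efficiently): slicing processes the query rows in chunks so the full attention score matrix is never materialized at once. A minimal NumPy sketch of the idea, not diffusers' actual implementation:

```python
import numpy as np

def attention(q, k, v):
    # Standard scaled dot-product attention over all queries at once.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sliced_attention(q, k, v, slice_size):
    # Same result, but queries are handled in chunks so only a
    # (slice_size x num_keys) score block exists at any time.
    chunks = [attention(q[i:i + slice_size], k, v)
              for i in range(0, q.shape[0], slice_size)]
    return np.vstack(chunks)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
full = attention(q, k, v)
sliced = sliced_attention(q, k, v, slice_size=2)
print(np.allclose(full, sliced))  # True
```

Because the sliced result is numerically identical, enabling it alongside xformers should not change outputs, only (possibly redundantly) peak memory.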
Hello, there is nothing even close to my instance prompt in the save_sample_prompt folder. I'm training LoRA and keep getting pictures of old indigenous women instead of my token. What could cause that?
INSTANCE TOKEN - "sks"
CLASS_PROMPT - "photo of woman"
INSTANCE_PROMPT - "photo of sks"
SAVE_SAMPLE_PROMPT - "photo of sks woman"
https://stackoverflow.com/questions/15753701/how-can-i-pass-a-list-as-a-command-line-argument-with-argparse
https://stackoverflow.com/questions/49824248/allow-argparse-nargs-to-accept-comma-separated-input-with-choices
https://stackoverflow.com/questions/27280603/accept-a-list-of-type-string-from-the-command-line-in-argparse
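Summarizing the approaches from those threads: argparse can take a list either as space-separated values via `nargs`, or as a single comma-separated string parsed by a custom `type`. A small self-contained sketch (argument names here are illustrative, not from the training script):

```python
import argparse

def csv_list(text):
    # Split "a,b,c" into ["a", "b", "c"], dropping empty items.
    return [item for item in text.split(",") if item]

parser = argparse.ArgumentParser()
# Option 1: space-separated values collected by nargs.
parser.add_argument("--concepts", nargs="+", default=[])
# Option 2: one comma-separated string parsed by a custom type.
parser.add_argument("--tags", type=csv_list, default=[])

args = parser.parse_args(["--concepts", "sks", "woman",
                          "--tags", "photo,portrait"])
print(args.concepts)  # ['sks', 'woman']
print(args.tags)      # ['photo', 'portrait']
```

The `type=` variant is friendlier inside Colab forms, where the value arrives as a single string.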
I get this error.
This is my notebook. I checked the LoRA option and disabled prior preservation; the rest is left at the default settings.
https://colab.research.google.com/drive/1Kr4Fy1_wIhg_j7_rYLpDv40FLKnnDqen?usp=share_link
Thank you
No LSB modules are available.
Description: Ubuntu 20.04.5 LTS
diffusers==0.11.1
lora-diffusion @ file:///content/lora
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.1%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers==0.0.16rc425
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- Platform: Linux-5.10.147+-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.21.6
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- `Accelerate` default config:
Not found
usage: accelerate <command> [<args>] launch
[-h]
[--config_file CONFIG_FILE]
[--cpu]
[--mps]
[--multi_gpu]
[--tpu]
[--use_mps_device]
[--dynamo_backend {no,eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--mixed_precision {no,fp16,bf16}]
[--fp16]
[--num_processes NUM_PROCESSES]
[--num_machines NUM_MACHINES]
[--num_cpu_threads_per_process NUM_CPU_THREADS_PER_PROCESS]
[--use_deepspeed]
[--use_fsdp]
[--use_megatron_lm]
[--gpu_ids GPU_IDS]
[--same_network]
[--machine_rank MACHINE_RANK]
[--main_process_ip MAIN_PROCESS_IP]
[--main_process_port MAIN_PROCESS_PORT]
[--rdzv_conf RDZV_CONF]
[--max_restarts MAX_RESTARTS]
[--monitor_interval MONITOR_INTERVAL]
[-m]
[--no_python]
[--main_training_function MAIN_TRAINING_FUNCTION]
[--downcast_bf16]
[--deepspeed_config_file DEEPSPEED_CONFIG_FILE]
[--zero_stage ZERO_STAGE]
[--offload_optimizer_device OFFLOAD_OPTIMIZER_DEVICE]
[--offload_param_device OFFLOAD_PARAM_DEVICE]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--gradient_clipping GRADIENT_CLIPPING]
[--zero3_init_flag ZERO3_INIT_FLAG]
[--zero3_save_16bit_model ZERO3_SAVE_16BIT_MODEL]
[--deepspeed_hostfile DEEPSPEED_HOSTFILE]
[--deepspeed_exclusion_filter DEEPSPEED_EXCLUSION_FILTER]
[--deepspeed_inclusion_filter DEEPSPEED_INCLUSION_FILTER]
[--deepspeed_multinode_launcher DEEPSPEED_MULTINODE_LAUNCHER]
[--fsdp_offload_params FSDP_OFFLOAD_PARAMS]
[--fsdp_min_num_params FSDP_MIN_NUM_PARAMS]
[--fsdp_sharding_strategy FSDP_SHARDING_STRATEGY]
[--fsdp_auto_wrap_policy FSDP_AUTO_WRAP_POLICY]
[--fsdp_transformer_layer_cls_to_wrap FSDP_TRANSFORMER_LAYER_CLS_TO_WRAP]
[--fsdp_backward_prefetch_policy FSDP_BACKWARD_PREFETCH_POLICY]
[--fsdp_state_dict_type FSDP_STATE_DICT_TYPE]
[--megatron_lm_tp_degree MEGATRON_LM_TP_DEGREE]
[--megatron_lm_pp_degree MEGATRON_LM_PP_DEGREE]
[--megatron_lm_num_micro_batches MEGATRON_LM_NUM_MICRO_BATCHES]
[--megatron_lm_sequence_parallelism MEGATRON_LM_SEQUENCE_PARALLELISM]
[--megatron_lm_recompute_activations MEGATRON_LM_RECOMPUTE_ACTIVATIONS]
[--megatron_lm_use_distributed_optimizer MEGATRON_LM_USE_DISTRIBUTED_OPTIMIZER]
[--megatron_lm_gradient_clipping MEGATRON_LM_GRADIENT_CLIPPING]
[--aws_access_key_id AWS_ACCESS_KEY_ID]
[--aws_secret_access_key AWS_SECRET_ACCESS_KEY]
[--debug]
training_script
...
accelerate <command> [<args>] launch: error: argument --mixed_precision: invalid choice: '' (choose from 'no', 'fp16', 'bf16')
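The invalid choice `''` suggests the notebook is interpolating an unset variable into `--mixed_precision`. One defensive option (a hypothetical helper, not part of accelerate) is to sanitize the value before building the launch command:

```python
VALID = {"no", "fp16", "bf16"}

def mixed_precision_flag(value):
    # accelerate rejects an empty string for --mixed_precision,
    # so fall back to "no" when the notebook variable is unset or blank.
    value = (value or "").strip()
    return value if value in VALID else "no"

print(mixed_precision_flag(""))      # no
print(mixed_precision_flag("fp16"))  # fp16
```

The command line would then always receive `--mixed_precision=no` instead of an empty value.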
Seeing a warning like this:
TheLastBen/fast-stable-diffusion#911
Training still runs, but memory usage should be monitored to see whether it matters.
I am using the precompiled wheel from:
!pip install -q https://github.com/ShivamShrirao/xformers-wheels/releases/download/4c06c79/xformers-0.0.15.dev0+4c06c79.d20221201-cp38-cp38-linux_x86_64.whl
but maybe that only works for T4?
Switch to this?
!pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl
When launching training
This error seems to occur everywhere, so it is not specific to this repo. Any ideas on how to fix it?
In expected directory:
cat << EOT > ~/.cache/huggingface/accelerate/default_config.yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: 'NO'
downcast_bf16: 'no'
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
use_cpu: false
EOT
I get the below error when I run the training cell in the Colab notebook FineTuning_colab.ipynb.
I also ran the Training parameters cell, and all parameters were parsed.
No LSB modules are available.
Description: Ubuntu 20.04.5 LTS
diffusers==0.11.1
lora-diffusion @ file:///content/lora
torchvision @ https://download.pytorch.org/whl/cu118/torchvision-0.15.1%2Bcu118-cp39-cp39-linux_x86_64.whl
transformers==4.25.1
xformers==0.0.16rc425
2023-04-16 09:29:59.351268: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-16 09:30:00.246985: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- `Accelerate` default config:
Running with LoRA restricted to text_encoder, with no unet training, produces the title error.
Set up a "lean" colab that just loads the yaml file for direct editing
https://stackoverflow.com/questions/48687091/how-to-edit-and-save-text-files-py-in-google-colab
Crashes when using optimizers that require additional inputs to step()
set iter=0 and send to tracker
Try disabling xformers?
Hello. I can't run training because of this error. Mixed precision is set to "fp16".
No LSB modules are available.
Description: Ubuntu 20.04.5 LTS
diffusers==0.11.1
lora-diffusion @ file:///content/lora
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.1%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers==0.0.16rc425
2023-02-25 02:45:30.654450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-25 02:45:31.540076: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-25 02:45:31.540174: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-25 02:45:31.540193: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- `Accelerate` default config:
Do this manually?
Passing None works for SD 2 models, but I think it wrecks previous models, since they come packaged with a feature_extractor
Related,
huggingface/diffusers#1500
tensorboard
Put a note in the colab to install it
bitsandbytes
Issue an error?
Hey @brian6091,
Awesome work! Your SD Colab notebooks are the best out there! I've been playing around with them for a couple of weeks with LoRA.
Can't wait for the ti branch with config Colab to be ready :)
Let me know how I can help out.
Tesla T4
GPU=14396/15109MiB
3.66s/it training, 1.08s/it inference
BATCH_SIZE=4
TRAIN_TEXT_ENCODER
USE_8BIT_ADAM
FP16
GRADIENT_CHECKPOINTING
GRADIENT_ACCUMULATION_STEPS=1
USE_EMA=False
RESOLUTION=512
No errors or warnings with xformers-0.0.15.dev0+189828c
diffusers==0.9.0
accelerate==0.14.0
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers @ https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.15/xformers-0.0.15.dev0+189828c.d20221207-cp38-cp38-linux_x86_64.whl
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.14.0
Description: Ubuntu 18.04.6 LTS
diffusers @ git+https://github.com/huggingface/diffusers@326de4191578dfb55cb968880d40d703075e331e
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers @ https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.14.0
- `Accelerate` default config:
Steps: 0% 0/699 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/Dreambooth/train_dreambooth.py", line 854, in <module>
    main(args)
  File "/content/Dreambooth/train_dreambooth.py", line 810, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1247, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1210, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 282, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 210, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
Steps: 0% 0/699 [00:12<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1069, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 551, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/Dreambooth/train_dreambooth.py', '--revision=fp16', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse', '--instance_data_dir=/content/gdrive/MyDrive/InstanceImages/caetmurxb/', '--class_data_dir=/content/gdrive/MyDrive/RegularizationImages/person/', '--output_dir=/content/models/', '--logging_dir=/content/logs/', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of caetmux person', '--class_prompt=a photo of a person', '--seed=1275017', '--resolution=512', '--train_batch_size=4', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--mixed_precision=fp16', '--use_8bit_adam', '--adam_beta1=0.9', '--adam_beta2=0.999', '--adam_weight_decay=0.01', '--adam_epsilon=1e-08', '--learning_rate=6e-06', '--lr_scheduler=cosine', '--lr_warmup_steps=25', '--lr_cosine_num_cycles=5', '--ema_inv_gamma=1.0', '--ema_power=0.5', '--ema_min_value=0', '--ema_max_value=0.999', '--max_train_steps=699', '--num_class_images=1500', '--sample_batch_size=4', '--save_min_steps=100', '--save_interval=100', '--n_save_sample=4', '--save_sample_prompt=a photo of caetmux person', '--save_sample_negative_prompt=']' returned non-zero exit status 1.
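"Attempting to unscale FP16 gradients" typically means trainable parameters were cast to fp16: the AMP GradScaler can only unscale fp32 gradients, so trainable weights should stay fp32 while autocast handles the mixed-precision forward pass. A plain-Python sketch of that sanity check (the helper is hypothetical, mimicking PyTorch's check without requiring torch):

```python
def check_trainable_dtypes(named_params):
    # AMP's GradScaler refuses to unscale fp16 gradients; trainable
    # parameters should remain fp32 while frozen ones may be fp16.
    bad = [name for name, dtype, trainable in named_params
           if trainable and dtype == "float16"]
    if bad:
        raise ValueError(f"Attempting to unscale FP16 gradients: {bad}")
    return True

params = [("unet.lora_up", "float32", True),
          ("text_encoder", "float16", False)]
print(check_trainable_dtypes(params))  # True
```

Running such a check before training starts surfaces the misconfiguration immediately, rather than on the first `clip_grad_norm_` call.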