kohakublueleaf / lycoris Goto Github PK

View Code? Open in Web Editor NEW

2.1K 20.0 139.0 217.5 MB

Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion.

License: Apache License 2.0

Python 100.00%

finetune stable-diffusion

lycoris's Introduction

LyCORIS - Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion. (ICLR'24)

A project that implements different parameter-efficient fine-tuning algorithms for Stable Diffusion.

This project originated from LoCon (see archive branch).

If you are interested in discussing more details, you can join our Discord server

If you want to check more in-depth experiment results and discussions for LyCORIS, you can check our paper

Algorithm Overview

LyCORIS currently contains LoRA (LoCon), LoHa, LoKr, (IA)^3, DyLoRA, Native fine-tuning (aka dreambooth). GLoRA and GLoKr are coming soon. Please check List of Implemented Algorithms and Guidelines for more details.

A simple comparison of some of these methods are provided below (to be taken with a grain of salt)

	Full	LoRA	LoHa	LoKr low factor	LoKr high factor $^+$
Fidelity	★	●	▲	◉	▲
Flexibility $^*$	★	●	◉	▲	● $^†$
Diversity	▲	◉	★	●	★
Size	▲	●	●	●	★
Training Speed Linear	★	●	●	★	★
Training Speed Conv	●	★	▲	●	●

★ > ◉ > ● > ▲ [> means better and smaller size is better]

$^+$ Usually we take factor <= 0.5 * sqrt(dim) as low factor and factor >= sqrt(dim as high factor. For example, factor<=8 for SD1.x/SD2.x/SDXL can be seen as low factor, and, factor>=16 can be seen as high factor.
$^*$ Flexibility means anything related to generating images not similar to those in the training set, and combination of multiple concepts, whether they are trained together or not
$^†$ It may become more difficult to switch base model or combine multiple concepts in this situation

The actual performance may vary depending on the datasets, tasks, and hyperparameters used. It is recommended to experiment with different settings to achieve optimal results.

Usage

Image Generation

a1111/sd-webui

After sd-webui 1.5.0, LyCORIS models are officially supported by the built-in LoRA system. You can put them in either models/Lora or models/LyCORIS and use the default syntax <lora:filename:multiplier> to trigger it.

When we add new model types, we will always make sure they can be used with the newest version of sd-webui.

As for sd-webui with version < 1.5.0 or sd-webui-forge, please check this extension.

Others

As far as we are aware, LyCORIS models are also supported in the following interfaces / online generation services (please help us complete the list!)

However, newer model types may not always be supported. If you encounter this issue, consider requesting the developers of the corresponding interface or website to include support for the new type.

Training

There are three different ways to train LyCORIS models.

With kohya-ss/sd-scripts (see a list of compatible graphical interfaces and colabs at the end of the section)
With Naifu-Diffusion
With your own script by using LyCORIS as standalone wrappers for ANY pytorch modules.

In any case, please install this package in the corresponding virtual environment. You can either install it

through pip
```
pip install lycoris-lora
```

or from source

git clone https://github.com/KohakuBlueleaf/LyCORIS
cd LyCORIS
pip install .

A detailed description of the network arguments is provided in docs/Network-Args.md.

kohya script

You can use this package's kohya module to run kohya's training script to train lycoris module for SD models

with command line arguments

accelerate launch train_network.py \
  --network_module lycoris.kohya \
  --network_dim "DIM_FOR_LINEAR" --network_alpha "ALPHA_FOR_LINEAR"\
  --network_args "conv_dim=DIM_FOR_CONV" "conv_alpha=ALPHA_FOR_CONV" \
  "dropout=DROPOUT_RATE" "algo=locon" \

with toml files

accelerate launch train_network.py \
  --config_file example_configs/training_configs/kohya/loha_config.toml \
  --dataset_config example_configs/training_configs/kohya/dataset_config.toml

For your convenience, some example toml files for kohya LyCORIS training are provided in example/training_configs/kohya.

HCP-Diffusion

The support for HCP-Diffusion has been dropped on LyCORIS3.0.0, we will wait until HCP side finish the implementation of new wrapper

You can use this package's hcp module to run HCP-Diffusion's training script to train lycoris module for SD models

accelerate launch -m hcpdiff.train_ac_single \
  --cfg example_configs/training_configs/hcp/hcp_diag_oft.yaml

For your convenience, some example yaml files for HCP LyCORIS training are provided in example/training_configs/hcp.

For the moment being the outputs of HCP-Diffusion are not directly compatible with a1111/sdwebui. You can perform conversion with tools/batch_hcp_convert.py.

In the case of pivotal tuning, tools/batch_bundle_convert.py can be further used to convert to and from bundle formats. Check docs/Conversion-scripts.md for more information.

As standalone wrappers

See standalone_example.py for full example.

Import create_lycoris and LycorisNetwork from lycoris library, put your preset to LycorisNetwork and then use create_lycoris to create LyCORIS module for your pytorch module.

For example:

from lycoris import create_lycoris, LycorisNetwork

LycorisNetwork.apply_preset(
    {"target_name": [".*attn.*"]}
)
lycoris_net = create_lycoris(
    your_model, 
    1.0, 
    linear_dim=16, 
    linear_alpha=2.0, 
    algo="lokr"
)
lycoris_net.apply_to()

# after apply_to(), your_model() will run with LyCORIS net
lycoris_param = lycoris_net.parameters()
forward_with_lyco = your_model(x)

You can check my HakuPhi project to see how I utilize LyCORIS to finetune the Phi-1.5 models.

Other method

After LyCORIS3.0.0, Parametrize API and Functional API have been added, which provide more different ways on utilizing LyCORIS library.

Check API reference for more informations. You can also take the test suites as a kind of examples.

Bitsandbytes support

See bnb_example.py for example. Basically as same as standalone wrapper.

Graphical interfaces and Colabs (via kohya trainer)

You can also train LyCORIS with the following graphical interfaces

and colabs (please help us complete the list!)

However, they are not guaranteed to be up-to-date. In particular, newer types may not be supported. Consider requesting the developers for support or simply use the original kohya script in this case.

Utilities

Extract LoCon

You can extract LoCon from a dreambooth model with its base model.

python3 extract_locon.py <settings> <base_model> <db_model> <output>

Use --help to get more info

$ python3 extract_locon.py --help
usage: extract_locon.py [-h] [--is_v2] [--is_sdxl] [--device DEVICE] [--mode MODE] [--safetensors] [--linear_dim LINEAR_DIM]
                        [--conv_dim CONV_DIM] [--linear_threshold LINEAR_THRESHOLD] [--conv_threshold CONV_THRESHOLD]
                        [--linear_ratio LINEAR_RATIO] [--conv_ratio CONV_RATIO] [--linear_quantile LINEAR_QUANTILE]
                        [--conv_quantile CONV_QUANTILE] [--use_sparse_bias] [--sparsity SPARSITY] [--disable_cp]
                        base_model db_model output_name

Merge LyCORIS back to model

You can merge your LyCORIS model back to your checkpoint (base model).

python3 merge.py <settings> <base_model> <lycoris_model> <output>

Use --help to get more info

$ python3 merge.py --help
usage: merge.py [-h] [--is_v2] [--is_sdxl] [--device DEVICE] [--dtype DTYPE] [--weight WEIGHT] base_model lycoris_model output_name

Conversion of LoRA, LyCORIS and full models between HCP and sd-webui format

This script allows you to use the LyCORIS models trained with HCP-Diffusion in sd-webui.

python3 batch_hcp_convert.py \
  --network_path /path/to/ckpts \
  --dst_dir /path/to/stable-diffusion-webui/models/Lora \
  --output_prefix something \
  --auto_scale_alpha --to_webui

See docs/Conversion-scripts.md for more information.

Conversion from and to bundle format

This script is particularly useful in the case of pivotal tuning.

python3 batch_bundle_convert.py \
  --network_path /path/to/sd-webui-ssd/models/Lora  \
  --emb_path /path/to/ckpts \
  --dst_dir /path/to/sd-webui-ssd/models/Lora/bundle \
  --to_bundle --verbose 2

See docs/Conversion-scripts.md for more information.

Change Log

For full log, please see Change.md

2024/06/29 update to 3.0.0 - Brand New Functional API, Parametrize API and Module API

The reasons of 3.0.0

We reconstruct the whole library with new Class definition and brand new Functional API system.

We also removed lot of redundant/unused modules.

Since the whole library are changed significantly. We decide to call it 3.0.0 as a new major version.

Major Changes

New Module API
Add Parametrize API
Add Functional API
- LoCon/LoHa/LoKr/Diag-OFT/BOFT only.
Remove optional deps from install_requires
Remove lot of redundant/deprecated modules
Better testing
HunYuan DiT Support (PR in kohya-ss/sd-scripts)

Full change log

New Features

LyCORIS now have consistent API for different algorithm like bypass_forward_diff or get_diff_weight method. Developers of other project can utilize these API to do more tricks or integrate LyCORIS into their framework more easily.
LyCORIS now have parametrize API which utilize torch.nn.utils.parametrize.register_parametrization to directly patch individual parameters. Which can be useful for MHA layer or other tricky modules.
- Currently only support 2~5D tensors. And LyCORIS will pretend these weights are weight of Linear/Conv1,2,3D then send it into LyCORIS modules
- More native implementation or more detailed control will be added in the future.
LyCORIS now have functional API. Developers who prefer functional more than Module things can utilize this feature.
- Functional API also allow developers who don't want to introduce new dependencies. Just copy-paste the source code and utilizing it. (with Apache-2 License, directly copy-paste is totally allowed)
Add support for Conv1d and Conv3d module on LoCon/LoHa/LoKr/Full/OFT/BOFT/GLoRA (not All algo in LyCORIS support them, you may receive error when apply unsopported algo), support inherited module (for example: LoRACompatibleConv or LoRACompatibleLinear from huggingface/diffusers)
HunYuan DiT support.

Improvements, Fixes, Slight Changes

Drop dependencies related to kohya-ss/sd-scripts:
- We now take kohya-ss/sd-scripts as optional dependency
- Which means transformers, diffusers and anything related to kohya are all optional deps now.
The definition of dropout and rank_dropout in each algorithm are changed. Since some concept of original rank_dropout in the lora of kohya-ss/sd-script is hard to applied to other algorithm. We can only design the dropout for each module seperatedly.
apply_max_norm issue are all fixed.
DyLoRA, (IA)^3, GLoRA are all rewritten and support Linear/Conv1,2,3d.
(IA)^3, GLoRA, Diag-OFT, BOFT are supported in create_lycoris_from_weights
- lycoris.kohya.create_network_from_weights also support them as well.
Fix wrong implementation of BOFT.
create_lycoris_from_weights and create_network_from_weights now have correct logging infos.
get_module and make_module are moved into modules' API.

Todo list

Automatically selecting an algorithm based on the specific rank requirement.
More experiments for different task, not only diffusion models.
- LoKr and LoHa have been proven to be useful for Large Language Model.
Explore other low-rank representations or parameter-efficient methods to fine-tune either the entire model or specific parts of it.
Documentation for whole library.

Citation

@inproceedings{
  yeh2024navigating,
  title={Navigating Text-To-Image Customization: From Ly{CORIS} Fine-Tuning to Model Evaluation},
  author={SHIH-YING YEH and Yu-Guan Hsieh and Zhidong Gao and Bernard B W Yang and Giyeong Oh and Yanmin Gong},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=wfzXa8e783}
}

lycoris's People

Contributors

Stargazers

Watchers

Forkers

dumpmemory cian0 giga-bytes-dev breakcore2 qingyanbailon hybs3288 blackteawing treksis godmapper gitbenxing 253701 s60912frank madornc ryukra derrian-distro zhidong-gao chicara pupubear007 stardust-sjf kaibioinfo shadowwalker2718 aradon2 qhsakura unaking torridgristle tzekeong samsungapple ongogo hoooon89 xiaotoming qr-0w ddpn08 mrdannywu zcloud2014 haodongzhai jeremiahyan bootsoflagrangian eki-indradi 1124503938 pu-03 hengle selena-t rakataprime laguhesuger soratoharuka dylangh warhreits yuway123 jikezan lkkfree nonfungiblefuturist kangli mocemolon kimx3966 dennisblin thumper-ai thomasyu730 cuirobert zuberbaig89 pzx-star haizhixin2021 dmarx eggpancake2 undercontroller wencuiqiqi kimwoonggon andysingal rugia813 yuanyuan3456 linecode v0xie neonsecret tiamat-tech if-ai lievreai junnyu zherui-yang faildes renanmb hozhenwai 5l1v3r1 teargosling kovalexal dawei03896 vorstcavry idlebg kcbf reanimatedmanx donhardman yamkz hongdangshao dqz00116 qun1qum jonbro32 asutermo realoxy31 rockerboo elsucht dezigns333 eugeneyuz

lycoris's Issues

please make cp-decomposition optional

I rolled back to 0.0.9 so that I can train without it and the results are like night and day
turns out I was never wrong about cp-decomposition breaking training for me
but maybe it's because I'm trying to train styles rather than characters?

extract_locon Missing key(s) in state dict

I'm trying to use extract_locon on a model trained on v1-5-pruned-emaonly.safetensors

python tools/extract_locon.py --safetensors --device=cuda --mode="fixed" --linear_dim=64 ~/workspace/python/ai_art/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors ~/workspace/python/ai_art/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/EmilyKavanaugh/EmilyKavanaugh_3000.safetensors ~/workspace/python/ai_art/stable-diffusion/stable-diffusion-webui/models/LyCORIS/emilyKavanaugh.safetensors

And receive the following error:

loading u-net: <All keys matched successfully> Traceback (most recent call last): File "/home/skoehler/workspace/python/ai_art/stable-diffusion/lycoris/tools/extract_locon.py", line 129, in <module> main() File "/home/skoehler/workspace/python/ai_art/stable-diffusion/lycoris/tools/extract_locon.py", line 97, in main base = load_models_from_stable_diffusion_checkpoint(args.is_v2, args.base_model) File "/home/skoehler/workspace/python/ai_art/stable-diffusion/lycoris/lycoris/kohya_model_utils.py", line 865, in load_models_from_stable_diffusion_checkpoint info = vae.load_state_dict(converted_vae_checkpoint) File "/home/skoehler/workspace/python/ai_art/stable-diffusion/lycoris/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for AutoencoderKL: Missing key(s) in state_dict: "encoder.mid_block.attentions.0.to_q.weight", "encoder.mid_block.attentions.0.to_q.bias", "encoder.mid_block.attentions.0.to_k.weight", "encoder.mid_block.attentions.0.to_k.bias", "encoder.mid_block.attentions.0.to_v.weight", "encoder.mid_block.attentions.0.to_v.bias", "encoder.mid_block.attentions.0.to_out.0.weight", "encoder.mid_block.attentions.0.to_out.0.bias", "decoder.mid_block.attentions.0.to_q.weight", "decoder.mid_block.attentions.0.to_q.bias", "decoder.mid_block.attentions.0.to_k.weight", "decoder.mid_block.attentions.0.to_k.bias", "decoder.mid_block.attentions.0.to_v.weight", "decoder.mid_block.attentions.0.to_v.bias", "decoder.mid_block.attentions.0.to_out.0.weight", "decoder.mid_block.attentions.0.to_out.0.bias". Unexpected key(s) in state_dict: "encoder.mid_block.attentions.0.key.bias", "encoder.mid_block.attentions.0.key.weight", "encoder.mid_block.attentions.0.proj_attn.bias", "encoder.mid_block.attentions.0.proj_attn.weight", "encoder.mid_block.attentions.0.query.bias", "encoder.mid_block.attentions.0.query.weight", "encoder.mid_block.attentions.0.value.bias", "encoder.mid_block.attentions.0.value.weight", "decoder.mid_block.attentions.0.key.bias", "decoder.mid_block.attentions.0.key.weight", "decoder.mid_block.attentions.0.proj_attn.bias", "decoder.mid_block.attentions.0.proj_attn.weight", "decoder.mid_block.attentions.0.query.bias", "decoder.mid_block.attentions.0.query.weight", "decoder.mid_block.attentions.0.value.bias", "decoder.mid_block.attentions.0.value.weight".

Is this a bug or am I doing something wrong?

r

modules.devices.NansException

getting this error trying to generate using locon trained with 0.1.2

modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

also, file size of safetensors are smaller compared to 0.0.9 (180 MB vs 230 MB)

I have cp-decomposition disabled
--network_args "conv_dim=64" "conv_alpha=1" "disable_cp=true" "algo=lora"

Token/keyword extraction?

So, I was wondering if it's possible to create a LyCoris from a keyword/token used in dreambooth training?

For example, if the keyword is 'zwx' and the token is '666', can we extract everything related to 'zwx/666' to create a LyCoris?

Sorry if I'm saying some kind of nonsense, I'm a total amateur in this subject. 😅

The reason I'm asking is that the extracted LyCoris from my dreambooth training only works with the specific model I used for training.

If I try using another model, the result is garbage and doesn't resemble me at all. I'm not sure if I'm doing something wrong, but I've tried extracting data using all modes (fixed, ratio, quantile, threshold) and with various values.

I'm using TheLastBen dreambooth, and the output model works great, but I just can't seem to create a LyCoris that works with all models.

"weight´s shape is different:" message when trying to use LoCon

Hi all!
A question to all which have extracted LoCons already:
I am able to extract the difference between the trained and base model as LoRa without problems and it´s working as intended.
When i am trying to extract it (same models) as LoCon, extraction is working well. But when i am trying to use it i am receiving a "weight's shape is different: lora_unet_input_blocks_1_0_in_layers_2.lora_down.weight expected torch.Size([80, 320, 3, 3]) found torch.Size([80, 320]). SD version may be different" error.

So my main question is, how do i know and / or figure out, which dimensions and settings i have to use when extracting the weights as LoCon?

LoHa loss shoot through the roof

I have been trainina using the new LoHa tytpe and the Loss went through the roof... but using the previous LoCon method is working fine:

Command used for LoHa. LoCon was same except for using algo=lora:

accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="D:/models/v1-5-pruned.ckpt" --train_data_dir="D:\dataset\trailbugprints\img_g8" --resolution=512,512 --output_dir="D:/lora/sd1.5/trailbugprints" --logging_dir="D:\dataset\trailbugprints\lora\logs" --network_alpha="384" --save_model_as=safetensors --network_module=lycoris.kohya --network_args "conv_dim=384" "conv_alpha=384" "algo=loha" --text_encoder_lr=0.00005 --unet_lr=0.0001 --network_dim=384 --output_name="trailbugprints_v1.0c" --lr_scheduler_num_cycles="4" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="8" --max_train_steps="600" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --caption_dropout_rate="0.05" --bucket_reso_steps=1 --xformers --bucket_no_upscale --noise_offset=0.05 --sample_sampler=dpmsolver++ --sample_prompts="D:/lora/sd1.5/trailbugprints\sample\prompt.txt" --sample_every_n_steps="100"

error for train

When I run train_network.py, I get the following error just before training.
Traceback (most recent call last): File "sd-scripts/train_network.py", line 507, in <module> train(args) File "sd-scripts/train_network.py", line 129, in train network = network_module.create_network(1.0, args.network_dim, args.network_alpha, vae, text_encoder, unet, **net_kwargs) File "/content/sd-scripts/LoCon/locon/locon_kohya.py", line 21, in create_network network = LoRANetwork( File "/content/sd-scripts/LoCon/locon/locon_kohya.py", line 121, in __init__ self.unet_loras = create_modules(LoRANetwork.LORA_PREFIX_UNET, unet, LoRANetwork.UNET_TARGET_REPLACE_MODULE) File "/content/sd-scripts/LoCon/locon/locon_kohya.py", line 108, in create_modules lora = LoConModule(lora_name, child_module, self.multiplier, self.conv_lora_dim, self.conv_alpha) File "/content/sd-scripts/LoCon/locon/locon.py", line 37, in __init__ self.scale = alpha / self.lora_dim TypeError: unsupported operand type(s) for /: 'str' and 'int' Traceback (most recent call last): File "/usr/local/bin/accelerate", line 8, in <module> sys.exit(main()) File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main args.func(args) File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1069, in launch_command simple_launcher(args) File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 551, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['/usr/bin/python3', 'sd-scripts/train_network.py', '--pretrained_model_name_or_path=JosephusCheung/ACertainty', '--train_data_dir=train', '--reg_data_dir=reg', '--network_dim=16', '--network_alpha=8', '--resolution=512', '--output_dir=output', '--prior_loss_weight=1.0', '--train_batch_size=2', '--text_encoder_lr=8.333333333333334e-06', '--unet_lr=5e-05', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=750', '--max_train_steps=3000', '--optimizer_type=Lion', '--mixed_precision=fp16', '--xformers', '--cache_latents', '--gradient_checkpointing', '--save_precision=fp16', '--save_every_n_epochs=2', '--save_model_as=safetensors', '--network_module=LoCon.locon.locon_kohya', '--network_args', 'conv_dim=8', 'conv_alpha=4', '--logging_dir=logs', '--clip_skip=2']' returned non-zero exit status 1.

Configuration of train_network.py.

!accelerate launch --num_cpu_threads_per_process 2 sd-scripts/train_network.py \ --pretrained_model_name_or_path=$pretrained_model_name_or_path \ --train_data_dir=train \ --reg_data_dir=reg \ --network_dim=16 \ --network_alpha=8 \ --resolution=512 \ --output_dir='output' \ --prior_loss_weight=1.0 \ --train_batch_size=2 \ --text_encoder_lr=8e-7\ --unet_lr=5e-5 \ --lr_scheduler=cosine_with_restarts \ --lr_warmup_steps=750 \ --max_train_steps=3000 \ --optimizer_type=Lion \ --mixed_precision='fp16' \ --xformers \ --cache_latents \ --gradient_checkpointing \ --save_precision='fp16' \ --save_every_n_epochs=2 \ --save_model_as=2 \ --network_module=LoCon.locon.locon_kohya \ --network_args "conv_dim=8" "conv_alpha=4" \ --logging_dir=logs \ --clip_skip=2

"Additional Network extension not installed"

all files pulled to newest @april 13

1 Additional Network extension is installed and work correctly when using lora
but when webui boot
shell output

Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully

2 When using lycoris, Additional Network extension gives error
shell output

assert cv_name is not None, f"conversion failed: {du_name}. the model may not be trained by sd-scripts."
AssertionError: conversion failed: lora_unet_down_blocks_0_downsamplers_0_conv. the model may not be trained by sd-scripts.

how can i fix lycoris extension?

Merging to model issue

I tried to merge ia3 to model using the script in tools but is seams that it does not merged corretly, here is comparison.
Standard use

Merge

I remember Lora already provide decomposition for Conv Layer

I remember Lora already provided the decomposition for the convolution layer:
See this:

https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/lora.py

In lines 105-123, their decomposition approach is totally the same as yours.

Any difference between your implementation and Lora's?

Possible to change default weights post training?

If I have a LoCon with TE that works better at 1.1 and Unet better at 1.0 is it possible to make it so that TE 1.1 becomes 1.0 by default?

BUG report

Error running process_batch: H:\novelai-webui-aki-v2\extensions\sd-webui-additional-networks\scripts\additional_networks.py
Traceback (most recent call last):
File "H:\novelai-webui-aki-v2\modules\scripts.py", line 395, in process_batch
script.process_batch(p, *script_args, **kwargs)
File "H:\novelai-webui-aki-v2\extensions\sd-webui-additional-networks\scripts\additional_networks.py", line 209, in process_batch
network, info = lora_compvis.create_network_and_apply_compvis(du_state_dict, weight_tenc, weight_unet, text_encoder, unet)
File "H:\novelai-webui-aki-v2\extensions\a1111-sd-webui-locon\locon_compvis.py", line 62, in create_network_and_apply_compvis
network = LoConNetworkCompvis(
File "H:\novelai-webui-aki-v2\extensions\a1111-sd-webui-locon\locon_compvis.py", line 289, in init
self.text_encoder_loras, te_rep_modules = create_modules(
File "H:\novelai-webui-aki-v2\extensions\a1111-sd-webui-locon\locon_compvis.py", line 266, in create_modules
alpha = comp_state_dict[f'{lora_name}.alpha'].item()
KeyError: 'lora_te_wrapped_transformer_text_model_encoder_layers_0_self_attn_k_proj.alpha'
提示：Python 运行时抛出了一个异常。请检查疑难解答页面。

同时使用Locon模型和LoRA模型时，将LoRA模型放在add-network插件第二位时会引发Locon插件错误，同时第二个LORA模型失效。

Example of using a script for extraction.

We really need an example of using a script for extraction.
I tried to register the same settings as in Kohya, but it doesn't work.

about image generation speed

loha's image generation speed is much slower than lora's, right? Is it just me?
Can you do something about it?

Modification to also support "lora:" in prompt

Hello @KohakuBlueleaf,

I have tried tagging you in this discussion (kohya-ss/sd-scripts#397) but I am not sure if it worked.

The issue: The a1111 built-in lora extension cannot handle newer loras and we were wondering if the lycoris extension could be slightly modified to also work as default lora handler. Please see discussion link for details and add your thoughts there.

Request: set default value of disable_conv_cp to True for better backward capatibility

Hi, I just upgraded to lycoris 0.1.3 today and tried to fix my training parameters until I realised that there has been a new feature and does not go well with addnet. Considering that some may not be aware of this update and train/upload models incompatible with addnet without noting it, do you mind setting the default as to disable this functionality?

Request: extract_locon.py Half Precision when loading models for memory saving

Currently I can't load two models on a free Colab instance, but if I modify it to load the models in half precision it'll work without issue as long as the weights are changed back to float for SVD.

It also seems to load the VAE for both, which shouldn't be needed, and it always loads the text encoder. I'd like an option to skip the text encoder and just focus on the U-Net.

recomposing and merging LoCON with ONNX weights

I'm trying to load and blend/merge LoCON weights with the existing weights in ONNX models, and the correct math is eluding me, even after reading the Algo page a few times. The ONNX model stores weights in their full form, for example (320, 320, 3, 3) for a 320x320 conv2d with 3x3 kernel.

I have working code for LoRA models and the 1x1 kernel, but the if use_cp and k_size != (1, 1) case is giving me trouble: https://github.com/KohakuBlueleaf/LyCORIS/blob/main/lycoris/locon.py#L35

Finding the three weights, down/mid/up, is no problem but I am having trouble multiplying them back into the (320, 320, 3, 3) shape. Everything I've tried results in (320, 320, 1, 1). For example, a model may have:

>>> (lora_down.shape, lora_mid.shape, lora_up.shape)
(torch.Size([4, 320, 1, 1]), torch.Size([4, 4, 3, 3]), torch.Size([320, 4, 1, 1]))

This is a conv2d node, with a 3x3 kernel, stride of (2, 2) and padding of (1, 1, 1, 1).

Looking at https://github.com/KohakuBlueleaf/LyCORIS/blob/main/Algo.md#cp-decomposition, it appears that lora_down matches x1 as expected, and lora_up would match x2 with a .permute((1, 0, 2, 3)). It's the lora_mid or $\tau$ that is giving me trouble. This feels like it should be obvious, but how would you go about recomposing those weights?

Examples of Lycoris using methods other than LoCon / LoHa

Hi,

Thanks for these project.

I'm adding Lycoris support to my SD server project, gyre.ai. Do you know of any released Lycoris that use IA3, LoKR (or any of any format that use bias or apply to MultiheadAttention) that I can test compatibility with?

It would be good to have examples of each under a permissive license (Apache 2.0 is good) for cross-testing. They don't have to be good, just standardized :).

Running the latest extract_locon.py result in AttributeError: 'Namespace' object has no attribute 'disable_small_conv'

Here is the full command and resulting error:

python tools\extract_locon.py --mode quantile --safetensors --linear_quantile 0.75 --conv_quantile 0.75 --device cuda D:/models/v1-5-pruned.ckpt "C:\Users\berna\Downloads\deliberate_v2.safetensors" "D:/lora/sd1.5/deliberate_v2.safetensors"

Traceback (most recent call last):
  File "D:\kohya_ss\tools\lycoris_locon_extract.py", line 129, in <module>
    main()
  File "D:\kohya_ss\tools\lycoris_locon_extract.py", line 119, in main
    not args.disable_small_conv
AttributeError: 'Namespace' object has no attribute 'disable_small_conv'

Is it possible to edit the LyCORIS block weights and save?

With webui extension "SuperMerger" we can edit the block weights for a lora and then save as a new one. But the extension dose not support LyCORIS. Is there any way we can do that like lora?

Usage with DyLoRA

How to use LyCORIS with DyLoRA? The option for it in sd-scripts and the way to use LyCORIS seem to conflict, i.e. I can either use --network_module networks.dylora or --network_module lycoris.kohya.

SVDiff, another new training method

I saw this the other day, and it just hit me that maybe you would be interested in it (and that maybe you haven't seen it yet)
SVDiff: Compact Parameter Space for Diffusion Fine-Tuning

there are several independent ideas on that paper, but the main one being "Compact Parameter Space for Diffusion Fine-
tuning", finetuning using "Spectral shifts" (which sounds cool and thus picked my interest), to train personalized and combinable concepts

Is there a way to merge 2 locon loras?

Conv Settings

Is there any typical range that should be used when setting --conv dim and --conv alpha? should they be the same as --network dim and --network_alpha

Merge script not working.

When merging the locon generated by extract_locon.py with merge.py, the generated image is broken.
(I reported it once on Discord, but I'll leave it here as well. I'm sorry if it's annoying.)

Error when learning LoHa.

kohya-ss/sd-scripts#563 (comment)
When I try to train LoHa, it stops with the following error just before training.

Traceback (most recent call last): File "/content/sd-scripts/train_network.py", line 864, in <module> train(args) File "/content/sd-scripts/train_network.py", line 214, in train network = network_module.create_network( File "/usr/local/lib/python3.10/dist-packages/lycoris/kohya.py", line 26, in create_network dropout = float(kwargs.get('dropout', 0.)) TypeError: float() argument must be a string or a real number, not 'NoneType' ╭───────────────────── Traceback (most recent call last) ──────────────────────╮ │ /content/sd-scripts/train_network.py:864 in <module> │ │ │ │ 861 │ args = parser.parse_args() │ │ 862 │ args = train_util.read_config_from_file(args, parser) │ │ 863 │ │ │ ❱ 864 │ train(args) │ │ 865 │ │ │ │ /content/sd-scripts/train_network.py:214 in train │ │ │ │ 211 │ │ network, _ = network_module.create_network_from_weights(1, arg │ │ 212 │ else: │ │ 213 │ │ # LyCORIS will work with this... │ │ ❱ 214 │ │ network = network_module.create_network( │ │ 215 │ │ │ 1.0, args.network_dim, args.network_alpha, vae, text_encod │ │ 216 │ │ ) │ │ 217 │ if network is None: │ │ │ │ /usr/local/lib/python3.10/dist-packages/lycoris/kohya.py:26 in │ │ create_network │ │ │ │ 23 │ │ network_dim = 4 # default │ │ 24 │ conv_dim = int(kwargs.get('conv_dim', network_dim)) │ │ 25 │ conv_alpha = float(kwargs.get('conv_alpha', network_alpha)) │ │ ❱ 26 │ dropout = float(kwargs.get('dropout', 0.)) │ │ 27 │ algo = kwargs.get('algo', 'lora') │ │ 28 │ use_cp = (not kwargs.get('disable_conv_cp', True) │ │ 29 │ │ │ or kwargs.get('use_conv_cp', False)) │ ╰──────────────────────────────────────────────────────────────────────────────╯ TypeError: float() argument must be a string or a real number, not 'NoneType' ╭───────────────────── Traceback (most recent call last) ──────────────────────╮ │ /usr/local/bin/accelerate:8 in <module> │ │ │ │ 5 from accelerate.commands.accelerate_cli import main │ │ 6 if __name__ == '__main__': │ │ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │ │ ❱ 8 │ sys.exit(main()) │ │ 9 │ │ │ │ /usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.p │ │ y:45 in main │ │ │ │ 42 │ │ exit(1) │ │ 43 │ │ │ 44 │ # Run │ │ ❱ 45 │ args.func(args) │ │ 46 │ │ 47 │ │ 48 if __name__ == "__main__": │ │ │ │ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 │ │ in launch_command │ │ │ │ 1101 │ elif defaults is not None and defaults.compute_environment == Com │ │ 1102 │ │ sagemaker_launcher(defaults, args) │ │ 1103 │ else: │ │ ❱ 1104 │ │ simple_launcher(args) │ │ 1105 │ │ 1106 │ │ 1107 def main(): │ │ │ │ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in │ │ simple_launcher │ │ │ │ 564 │ process = subprocess.Popen(cmd, env=current_env) │ │ 565 │ process.wait() │ │ 566 │ if process.returncode != 0: │ │ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.return │ │ 568 │ │ 569 │ │ 570 def multi_gpu_launcher(args): │ ╰──────────────────────────────────────────────────────────────────────────────╯ CalledProcessError: Command '['/usr/bin/python3', 'sd-scripts/train_network.py', '--pretrained_model_name_or_path=kubanemil/AnyLORA', '--dataset_config=/content/kohya-mydatasets/datasets/suc.toml', '--network_dim=4', '--network_alpha=1', '--output_dir=/content/drive/MyDrive/loras/suc_loha_0.0039', '--lr_scheduler=cosine_with_restarts', '--lr_scheduler_num_cycles=2', '--learning_rate=0.0039', '--output_name=suc_loha_0.0039', '--prior_loss_weight=1.0', '--seed=42', '--max_train_epochs=10', '--optimizer_type=AdamW8bit', '--optimizer_args', 'weight_decay=1e-1', 'betas=0.9,0.99', '--max_grad_norm=1.0', '--mixed_precision=fp16', '--xformers', '--gradient_checkpointing', '--save_precision=fp16', '--sample_every_n_epochs=1', '--sample_prompts=/content/kohya-mydatasets/datasets/suc.txt', '--save_model_as=safetensors', '--color_aug', '--bucket_no_upscale', '--log_with=wandb', '--wandb_api_key=a156eaf006b4bb38f565d4938684c66f0da523fa', '--network_module=lycoris.kohya', '--network_args', 'conv_dim=4', 'conv_alpha=1', 'algo=loha', '--max_token_length=150', '--logging_dir=logs', '--noise_offset=0.05', '--adaptive_noise_scale=0.05', '--clip_skip=2', '--min_snr_gamma=5']' returned non-zero exit status 1.

No corruption was found in the image files.

Some General Questions

This isn't so much an Issue, but rather starting a thread in here;

First of all, Great work!
After reading through the Algo.md & Demo.md it looks really promising.

But i still have some Questions left open:

***What exactly is LyCORIS?:***

Because as i Understood it:

LoRA: The "Normal" Baseline,
LoCon: Introduction of a Convolution Network, which allows the model to also train the Res-Blocked parts, which "prevent" overspecialisations and in theory should also increase the Flexibility. (Depracted and turned into This)
LoHa: Utilizing the Hadamard Product representation, which allows for better weighting generation, and also saves Memory
CP-Decomposition: Speeding up the CPU, with negligible Accuracy loss
LyCORIS: Convolution Network (LoCon) + Hadamard Product Representation (LoHa) + CP-decomposition

Which together means:

Better Learning Rates
More Flexible final Product
Less Memory used
Faster Processing

Or am i Missing/ Misunderstanding Something?

As it Currently stands LyCORIS only works for SD; Does that Include SD-Based models like Anything or HD, or only the "Real" Stable Diffusion Releases?

~~Also less of a question, but how will your project be maintained in the future, as i didn't see any update function?~~

~~And will you Co-Operate with Bmaltais again, to make Lycoris a Standard Feature of the GUI, just like LoCon?~~
Edit: OK i'm stupid After updating Bmaltais GUI the option is now there, without the need to specifically install from the directory, as the Gui handles it itself including (I assume) updating itself...

Sorry if some of those questions sound stupid, but i'm neither a Mathematician, nor a Computer Engineer, and i have a fairly hard time wrapping my head around a lot of the inner workings of what exactly is happening.

Which part is for the LoHA

Dear author,

could you please tell me where is the LoHA part in the code?

Thanks!

[Question] Can I pull another LoRA algorithm?

I inspired by LoRA and LoHa.

I found some of mathematical method is related with these two.

I implement this algorithm and it is on experiment.

I have no good VGAs, so can not do benchmark test(e.g. CIFAR-10, CIFAR-100, etc.).

Only I can obtain is very subjective result, images of some characters generated from SD models.

When some test done, can I pull this?

LoCon LoRA settings formula

Can you give some formulas

How many unet rank and text encoder rank we should set?

According to what?

What learning rates we should apply?

Make LoCon installation modular

currently installation is to move locon folder into kohya-ss/sd-scripts

but that makes local changes to checked-out kohya's repo, so it cannot be cleanly updated any longer.
goal is that this work can reside separately from kohya and still be referenced from train_network.py script.

Feedback: iA3 is amazing

I know you mention that with iA3 "You can only get reasonable result on the model you trained on"... but it does! Actually I get better results with the iA3 on the model I did not train with. Here is a sample I trained of Buster Keaton using SD1.5 Base and rendered using Realistic Vision 3.0:

I mean... for a 208KB file it is bang on accurate. This make me wonder how much waste data is in the other LoRA type.

I trained using the Prodigy optimizer and one thing that was noticed by #Dogucat is that using batch of 1 and gradient 1 on roughly 30 images will result in a very nice prodigy dlr curve... and this is what you want to get a nice looking model... in theory:

Here is the model... so small! 167KB once zipped... could probably just be embedded in the image metadata.

Buster Keaton-3.0b-ia3-sd15.zip

Trigger work is: BstrKtn

can i train with a 2.1model?

such as 2.1model with lora?

merge locon to checkpoint

is there a tutorial or guide to merge locon into checkpoint?

about filesize and kohya extension

when I use this with kohya's script the file size is multiply by 2 ? is this normal? I just copy locon folder to kohya sd-scripts then from powershell script I add "--network_module locon.locon_kohya " the result is looking good though.
also is this can only use this via AUTOMATIC1111 only? got an error when try to use in kohya LoRA additional networks extension.

the recommended value of dim/alpha?

I noticed that there are actually two sets of values linear and conv,so what is the the recommended value of them?
Which does dim<64 alpha=1 refer to?

Composable lora extension causes an error when using Locon or Loha

Error completing request
Arguments: ('task(0cohsq59a1vrlea)', ' masterpiece, best quality, (realistic photo:1.2), detailed illustration, detailed face and eyes, ultra-detailed, illustration, 1girl  <lora:DeiLoha:1>', '(worst quality, low quality:1.4), NG_DeepNegative_V1_75T', [], 20, 15, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, True, False, 'none', 'None', 1, None, False, 'Scale to Fit (Inner Fit)', False, False, 64, 64, 64, 0, 1, False, False, False, False, False, '1:1,1:2,1:2', '0:0,0:0,0:1', '0.2,0.8,0.8', 20, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0, None, 50) {}
Traceback (most recent call last):
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/processing.py", line 621, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/processing.py", line 570, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/content/gdrive/MyDrive/sd/stablediffusion/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py", line 728, in forward
    return self.text_model(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py", line 649, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py", line 578, in forward
    layer_outputs = encoder_layer(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py", line 321, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py", line 210, in forward
    query_states = self.q_proj(hidden_states) * self.scale
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/stable-diffusion-webui-composable-lora/composable_lora.py", line 150, in lora_Linear_forward
    return lora_forward(self, input, torch.nn.Linear_forward_before_lora(self, input))
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/stable-diffusion-webui-composable-lora/composable_lora.py", line 62, in lora_forward
    patch = module.up(module.down(input))
AttributeError: 'LoraHadaModule' object has no attribute 'up'

how to train locon only?

not locon+lora
with the kohya script

0.1.7 Traceback when using scale weight norm

When trying to train under 0.1.7 with scale weight norm and network dropout under SD15 and SD2.1 --network_module=lycoris.kohya --network_args "conv_dim=16" "conv_alpha=8" "use_cp=True" "algo=loha"

I get:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\kohya_ss\train_network.py:873 in <module>                                                     │
│                                                                                                  │
│   870 │   args = parser.parse_args()                                                             │
│   871 │   args = train_util.read_config_from_file(args, parser)                                  │
│   872 │                                                                                          │
│ ❱ 873 │   train(args)                                                                            │
│   874                                                                                            │
│                                                                                                  │
│ D:\kohya_ss\train_network.py:693 in train                                                        │
│                                                                                                  │
│   690 │   │   │   │   optimizer.zero_grad(set_to_none=True)                                      │
│   691 │   │   │                                                                                  │
│   692 │   │   │   if args.scale_weight_norms:                                                    │
│ ❱ 693 │   │   │   │   keys_scaled, mean_norm, maximum_norm = network.apply_max_norm_regulariza   │
│   694 │   │   │   │   │   args.scale_weight_norms, accelerator.device                            │
│   695 │   │   │   │   )                                                                          │
│   696 │   │   │   │   max_mean_logs = {"Keys Scaled": keys_scaled, "Average key norm": mean_no   │
│                                                                                                  │
│ D:\kohya_ss\venv\lib\site-packages\lycoris\kohya.py:378 in apply_max_norm_regularization         │
│                                                                                                  │
│   375 │   │   │   if hasattr(lora, 'apply_max_norm'):                                            │
│   376 │   │   │   │   scaled, norm = lora.apply_max_norm(max_norm_value, device)                 │
│   377 │   │   │   │   norms.append(norm)                                                         │
│ ❱ 378 │   │   │   │   scaled += int(scaled)                                                      │
│   379 │   │                                                                                      │
│   380 │   │   for lora in self.text_encoder_loras:                                               │
│   381 │   │   │   if hasattr(lora, 'apply_max_norm'):                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: result type Long can't be cast to the desired output type Bool

Turning off scale weight norm result in no traceback.

How do I set the size of the generated LORA? Like 144 mb in size?

network_dim=32
network_alpha=1
algo="loha"
conv_dim=1
conv_alpha=1

How to modify it? The above can only generate a size of 80mb

Some thoughts, sharing my work here

Hey there!

I wanted to share some ideas and my previous implemenations here as well. (Its on diffusers implementation side)

You can use extra lora convolution module with --extend-lora argument on my repo. BTW, when I've explored this about 2 months ago results weren't promising, so it isn't on by default.
Awesome to see others finding out exploring more on this! I would love to get to know your results

cloneofsimo/lora#133

(Would be appreciated if you mention above pr on this repo's readme, so people can know that there are diffusers compatible implementations as well)

Module/rank dropout error in lokr

Not sure if I'm missing something but with
--network_args "module_dropout=0.3" I get

Traceback (most recent call last):
  File "X:\avirtual\sd-scripts\train_network.py", line 879, in <module>
    train(args)
  File "X:\avirtual\sd-scripts\train_network.py", line 668, in train
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 381, in forward
    sample, res_samples = downsample_block(
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 606, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\autograd\function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 602, in custom_forward
    return module(*inputs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\diffusers\models\resnet.py", line 479, in forward
    output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
RuntimeError: The size of tensor a (56) must match the size of tensor b (54) at non-singleton dimension 3
steps:   0%|                                                                                   | 0/640 [00:16<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "X:\avirtual\sd-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

And with rank_dropout I get

    Traceback (most recent call last):
  File "X:\avirtual\sd-scripts\train_network.py", line 879, in <module>
    train(args)
  File "X:\avirtual\sd-scripts\train_network.py", line 650, in train
    encoder_hidden_states = train_util.get_hidden_states(args, input_ids, tokenizer, text_encoder, weight_dtype)
  File "X:\avirtual\sd-scripts\library\train_util.py", line 3188, in get_hidden_states
    enc_out = text_encoder(input_ids, output_hidden_states=True, return_dict=True)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 816, in forward
    return self.text_model(
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 725, in forward
    encoder_outputs = self.encoder(
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 647, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\autograd\function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 643, in custom_forward
    return module(*inputs, output_attentions)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 383, in forward
    hidden_states, attn_weights = self.self_attn(
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 272, in forward
    query_states = self.q_proj(hidden_states) * self.scale
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\lycoris\lokr.py", line 258, in forward
    + self.get_weight(self.org_module[0].weight.data) * self.multiplier
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\lycoris\lokr.py", line 219, in get_weight
    weight *= drop.view(-1, [1]*len(weight.shape[1:])).to(weight.device)
TypeError: view(): argument 'size' must be tuple of ints, but found element of type list at pos 2
steps:   0%|                                                                                                                                                                                                                                          | 0/640 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "X:\avirtual\sd-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "X:\avirtual\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Those were supposed to be supported right? Am I using them wrong?

warning: scale_weight_norms is specified but the network does not support it / scale_weight_normsが指定されていますが、ネットワークが対応していません

Good evening, just a quick question about the new scale_weight_norms option in sd-scripts. Is it something that can be implemented in LyCORIS or it just does not make sens?

Error: CUBLAS_STATUS_EXECUTION_FAILED when uses loha

caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 83/83 [00:52<00:00, 1.57it/s]
import network module: lycoris.kohya
Using rank adaptation algo: loha
Apply different lora dim for conv layer
Conv Dim: 8, Linear Dim: 64
Use Dropout value: 0.0
Create LyCORIS Module
create LyCORIS for Text Encoder: 72 modules.
Create LyCORIS Module
create LyCORIS for U-Net: 278 modules.
enable LyCORIS for text encoder
enable LyCORIS for U-Net
prepare optimizer, data loader etc.
use Lion optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 4256
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 714
num epochs / epoch数: 64
batch size per device / バッチサイズ: 3
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 45398
steps: 0%| | 0/45398 [00:00<?, ?it/s]epoch 1/64
steps: 0%| | 99/45398 [06:04<46:22:32, 3.69s/it, loss=0.159]Traceback (most recent call last):
File "H:\gitrepo\kohya_ss\train_network.py", line 699, in
train(args)
File "H:\gitrepo\kohya_ss\train_network.py", line 554, in train
accelerator.backward(loss)
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\torch_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\torch\autograd_init_.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
return user_fn(self, *args)
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\lycoris\loha.py", line 24, in backward
grad_w2a = temp @ w2b.T
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
steps: 0%| | 99/45398 [06:18<48:09:25, 3.83s/it, loss=0.159]
Traceback (most recent call last):
File "H:\envs\novelai\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "H:\envs\novelai\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "H:\gitrepo\kohya_ss\venv\Scripts\accelerate.exe_main.py", line 7, in
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "H:\gitRepo\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['H:\gitRepo\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=H:/final-prune.ckpt', '--train_data_dir=H:/image429077', '--resolution=1024,1024', '--output_dir=H:/model', '--logging_dir=H:/log', '--network_alpha=1', '--training_comment=Ariah_Lolita_gloariah$lora$', '--save_model_as=safetensors', '--network_module=lycoris.kohya', '--network_args', 'conv_dim=8', 'conv_alpha=1', 'algo=loha', '--text_encoder_lr=3.8e-4', '--unet_lr=2.8e-05', '--network_dim=64', '--output_name=Lolita_ariah_v1', '--lr_scheduler_num_cycles=8', '--lr_scheduler_power=2', '--learning_rate=0.0001', '--lr_scheduler=cosine_with_restarts', '--train_batch_size=3', '--max_train_steps=45398', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--cache_latents', '--optimizer_type=Lion', '--max_data_loader_n_workers=8', '--max_token_length=225', '--clip_skip=2', '--keep_tokens=4', '--vae_batch_size=2', '--bucket_reso_steps=64', '--shuffle_caption', '--xformers', '--persistent_data_loader_workers', '--bucket_no_upscale', '--noise_offset=0.1']' returned non-zero exit status 1

overtrains too fast and not stylisable with same settings as regular lora

was this made just for anime ?
Also this doesnt train higher res as good as regular lora, i think its just a step backwards and new layer of confusion instead of improvement, if it improves its such small percentage its not worth 2xtimes the filesize

Resizing LoCons

is it possible to resize a LoCon like normal nets trained with kohya?

https://github.com/kohya-ss/sd-scripts/blob/main/networks/resize_lora.py

Training LOCON encountered problems.

When I updated Lycoris to version 0.1.4, I found that I couldn't train LOCON. The typical symptoms were very high loss rate, and the generated images were also very strange. However, I did not encounter these issues when training LoHa.
This is the sample image for testing and the training parameters.

accelerate launch "train_network.py" --network_module=lycoris.kohya --pretrained_model_name_or_path="anime-full-vae-fixed.ckpt" --save_model_as=safetensors --caption_extension=".txt" --seed="114514" --resolution=768,768 --train_batch_size=6 --output_name="test" --train_data_dir="./input" --output_dir="./output/" --logging_dir="./logs" --network_alpha=1 --network_dim=16 --learning_rate=3e-4 --unet_lr=3e-4 --max_train_steps=4000 --save_every_n_epochs="1" --lr_scheduler="cosine_with_restarts" --lr_scheduler_num_cycles=1 --optimizer_type="adamw" --optimizer_args "weight_decay=2e-1" --max_grad_norm=1.0 --mixed_precision="bf16" --save_precision="fp16" --enable_bucket --bucket_reso_steps=64 --bucket_no_upscale --random_crop --max_token_length=150 --shuffle_caption --gradient_checkpointing --xformers --persistent_data_loader_workers --vae="vae-ft-mse-840000-ema-pruned.ckpt" --network_args "conv_dim=8" "conv_alpha=1" "algo=lora" --clip_skip=2 --sample_prompts="./prompt/test.txt" --sample_every_n_epochs=1 --lr_warmup_step=100 --network_train_unet_only --noise_offset=0.05

The parameters I just wrote were the ones I used to train LoHa. I have made some modifications to them.

Will it eventually be possible to extract a LoHA from a checkpoint

Do you think it will be possible to extract a LoHA from an existing model at some point?