
lora-ga's Introduction

LoRA-GA

Official implementation of the paper "LoRA-GA: Low-Rank Adaptation with Gradient Approximation".

How to run

We use Hydra to manage the configurations. You can find the default configurations in the conf/ directory.

We have three config groups: peft, dataset, and init. Use +peft=xxx to select the peft config, +dataset_name=xxx to select the dataset config, and +init=xxx to select the init config. You can also use ++peft.xxx=xxx to override individual sub-configs of peft. See the Hydra documentation to learn more.
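These override prefixes compose roughly as follows. Here is a stdlib-only illustration of the idea (not the real Hydra parser; apply_overrides and the toy groups dict are hypothetical):

```python
# Simplified sketch of Hydra-style overrides (illustrative, not Hydra itself):
# "+group=name" selects a named config from a group, while
# "++group.key=value" force-sets a single key inside the config.

def apply_overrides(config, overrides, groups):
    """Apply a list of Hydra-style override strings to a nested config dict."""
    for ov in overrides:
        if ov.startswith("++"):          # force-override one key, e.g. ++peft.lora_r=8
            path, value = ov[2:].split("=", 1)
            node = config
            *parents, leaf = path.split(".")
            for p in parents:
                node = node.setdefault(p, {})
            node[leaf] = value
        elif ov.startswith("+"):         # add a config group, e.g. +peft=all
            group, name = ov[1:].split("=", 1)
            config[group] = dict(groups[group][name])
    return config

# Toy config groups standing in for the files under conf/.
groups = {"peft": {"all": {"lora_r": 8, "use_rslora": False}}}
cfg = apply_overrides({}, ["+peft=all", "++peft.lora_r=16"], groups)
print(cfg)  # {'peft': {'lora_r': '16', 'use_rslora': False}}
```

Note that the "++" override wins over the value selected by the group, which is why the commands below can pick a group and then tweak single keys in the same invocation.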

There are two modes: single run (for one dataset) and multi run (for all datasets).

For a single run, you can use a command like this:

python run_exp.py +peft=all ++peft.lora_relative_r=0.1 +dataset_name=sst2 +init=gaussian 

For a multi run, you can use a command like this:

python run_exp.py -m +init=gradient ++peft.lora_r=8 +peft=all wandb.name="stable-gradient-64" ++init.weight="stable" peft.use_rslora=True ++init.stable_gamma=64

Configurations

In order to run LoRA-GA, you need to specify the following configurations:

  • init=gradient
  • init.weight=stable
  • peft.use_rslora=True

In this way, you enable the +SO and +GA parts of LoRA-GA.
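Putting those flags together, a full LoRA-GA invocation might look like this (the dataset and stable_gamma value are illustrative, not required settings):

```shell
python run_exp.py +peft=all +dataset_name=sst2 +init=gradient \
    ++init.weight=stable ++peft.use_rslora=True ++init.stable_gamma=64
```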

If you want to run the default LoRA, you can use the following configurations:

  • init=default
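For context, the core idea behind init=gradient can be sketched in a few lines of NumPy: accumulate a full-weight gradient, take its SVD, and initialize the LoRA factors from the leading singular vectors. The function below is a simplified sketch only; lora_ga_init, the index choices, and the scaling are illustrative, not the repository's exact implementation:

```python
import numpy as np

# Hedged sketch of an SVD-based ("gradient") LoRA init in the spirit of
# LoRA-GA. The exact indexing ("direction") and scaling ("stable") options
# are configurable in the repo; the choices below are illustrative.

def lora_ga_init(grad, r, stable_gamma=64):
    """Initialize LoRA factors A (r x d_in) and B (d_out x r) from grad's SVD."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    A = Vt[:r]                    # leading right singular vectors
    B = U[:, r:2 * r]             # next block of left singular vectors
    scale = 1.0 / stable_gamma ** 0.5  # toy stand-in for the "stable" scaling
    return A * scale, B * scale

grad = np.random.randn(64, 32)    # stand-in for an accumulated weight gradient
A, B = lora_ga_init(grad, r=4)
print(A.shape, B.shape)           # (4, 32) (64, 4)
```

Because A and B are non-zero at initialization, the base weight has to be offset by their product so the model's initial output is unchanged; that bookkeeping is omitted from this sketch.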

How to download the datasets

The datasets are downloaded automatically when you run the code. If you want to download them manually instead, edit data.py and use the following command:

python data.py

lora-ga's People

Contributors

outsider565


lora-ga's Issues

Is training with DeepSpeed ZeRO-3 currently unsupported?

With DeepSpeed ZeRO-3 enabled, an error is raised during gradient estimation in estimate_gradient.

[rank1]: Traceback (most recent call last):
[rank1]:   File "finetune_lora_ga.py", line 747, in <module>
[rank1]:     train()
[rank1]:   File "finetune_lora_ga.py", line 649, in train
[rank1]:     named_grads = estimate_gradient(model, temp_set, 4)
[rank1]:   File "finetune_lora_ga.py", line 272, in estimate_gradient
[rank1]:     outputs = model(**batch)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1104, in forward
[rank1]:     outputs = self.model(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 878, in forward
[rank1]:     inputs_embeds = self.embed_tokens(input_ids)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py", line 163, in forward
[rank1]:     return F.embedding(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2264, in embedding
[rank1]:     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank1]: RuntimeError: 'weight' must be 2-D

LoRA-GA Tuning Issue: Task Loss Not Working as Expected

Hi,
Thanks for the excellent work on LoRA-GA.
I am experiencing an issue when training a model with LoRA-GA: the task loss is not decreasing as expected. I would appreciate any advice or tuning tips that might help.

Are there recommended parameter settings or tuning strategies?

Current Parameter Settings:

init_batch_size: 2
init_iters: 4
init_config:
  mode: "gradient"  # option: "simple", "svd", "gradient"
  lora_A: "unit"  # option: "gaussian", "kaiming", "fan_out_kaiming", "xavier", "zeros", "unit", "orthogonal"
  lora_A_std: 0.01  # only needed when lora_A is "gaussian"
  lora_B: "unit"  # option: "gaussian", "kaiming", "fan_out_kaiming", "xavier", "zeros", "unit", "orthogonal"
  lora_B_std: 0.01  # only needed when lora_B is "gaussian"
  scale: "stable"  # option: "default", "stable", "unit", "normalized", "gd", "weightS"
  stable_gamma: 64  # only needed when scale is "stable"
  direction: "ArB2r"  # option: "ArBr", "A2rBr", "ArB2r"(only needed when mode is "gradient")
  dtype: "fp32"  # option: "bf16", "fp32"
  norm_clip: false  # norm clipping

These are my loss scalars:

[screenshot of training loss curves]

It would be very helpful if you could offer some suggestions.
Thanks!

Layer-wise gradient computation

The code does not seem to include an implementation of the layer-wise gradient computation described in the paper.
