
lora-ga's Introduction

LoRA-GA

Official implementation of the paper "LoRA-GA: Low-Rank Adaptation with Gradient Approximation".

How to run

We use Hydra to manage the configurations. You can find the default configurations in the conf/ directory.

We have three config groups: peft, dataset, and init. Use +peft=xxx to select the peft config, +dataset_name=xxx to select the dataset config, and +init=xxx to select the init config. You can also use ++peft.xxx=xxx to override individual sub-configs of peft. See the Hydra documentation to learn more.
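These override prefixes compose roughly as follows. Here is a stdlib-only illustration of the idea (not the real Hydra parser; apply_overrides and the toy groups dict are hypothetical):

```python
# Simplified sketch of Hydra-style overrides (illustrative, not Hydra itself):
# "+group=name" selects a named config from a group, while
# "++group.key=value" force-sets a single key inside the config.

def apply_overrides(config, overrides, groups):
    """Apply a list of Hydra-style override strings to a nested config dict."""
    for ov in overrides:
        if ov.startswith("++"):          # force-override one key, e.g. ++peft.lora_r=8
            path, value = ov[2:].split("=", 1)
            node = config
            *parents, leaf = path.split(".")
            for p in parents:
                node = node.setdefault(p, {})
            node[leaf] = value
        elif ov.startswith("+"):         # add a config group, e.g. +peft=all
            group, name = ov[1:].split("=", 1)
            config[group] = dict(groups[group][name])
    return config

# Toy config groups standing in for the files under conf/.
groups = {"peft": {"all": {"lora_r": 8, "use_rslora": False}}}
cfg = apply_overrides({}, ["+peft=all", "++peft.lora_r=16"], groups)
print(cfg)  # {'peft': {'lora_r': '16', 'use_rslora': False}}
```

Note that the "++" override wins over the value selected by the group, which is why the commands below can pick a group and then tweak single keys in the same invocation.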

There are two modes: single run (for one dataset) and multi run (for all datasets).

For a single run, you can use a command like this:

python run_exp.py +peft=all ++peft.lora_relative_r=0.1 +dataset_name=sst2 +init=gaussian 

For a multi run, you can use a command like this:

python run_exp.py -m +init=gradient ++peft.lora_r=8 +peft=all wandb.name="stable-gradient-64" ++init.weight="stable" peft.use_rslora=True ++init.stable_gamma=64

Configurations

In order to run LoRA-GA, you need to specify the following configurations:

  • init=gradient
  • init.weight=stable
  • peft.use_rslora=True

In this way, you enable the +SO and +GA parts of LoRA-GA.
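Putting those flags together, a full LoRA-GA invocation might look like this (the dataset and stable_gamma value are illustrative, not required settings):

```shell
python run_exp.py +peft=all +dataset_name=sst2 +init=gradient \
    ++init.weight=stable ++peft.use_rslora=True ++init.stable_gamma=64
```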

If you want to run the default LoRA, you can use the following configurations:

  • init=default
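For context, the core idea behind init=gradient can be sketched in a few lines of NumPy: accumulate a full-weight gradient, take its SVD, and initialize the LoRA factors from the leading singular vectors. The function below is a simplified sketch only; lora_ga_init, the index choices, and the scaling are illustrative, not the repository's exact implementation:

```python
import numpy as np

# Hedged sketch of an SVD-based ("gradient") LoRA init in the spirit of
# LoRA-GA. The exact indexing ("direction") and scaling ("stable") options
# are configurable in the repo; the choices below are illustrative.

def lora_ga_init(grad, r, stable_gamma=64):
    """Initialize LoRA factors A (r x d_in) and B (d_out x r) from grad's SVD."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    A = Vt[:r]                    # leading right singular vectors
    B = U[:, r:2 * r]             # next block of left singular vectors
    scale = 1.0 / stable_gamma ** 0.5  # toy stand-in for the "stable" scaling
    return A * scale, B * scale

grad = np.random.randn(64, 32)    # stand-in for an accumulated weight gradient
A, B = lora_ga_init(grad, r=4)
print(A.shape, B.shape)           # (4, 32) (64, 4)
```

Because A and B are non-zero at initialization, the base weight has to be offset by their product so the model's initial output is unchanged; that bookkeeping is omitted from this sketch.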

How to download the datasets

The datasets are downloaded automatically when you run the code. If you want to download them manually instead, edit data.py and use the following command:

python data.py

lora-ga's People

Contributors

outsider565


lora-ga's Issues

Is training with DeepSpeed ZeRO-3 currently unsupported?

With DeepSpeed ZeRO-3 enabled, an error is raised during gradient estimation in estimate_gradient.

[rank1]: Traceback (most recent call last):
[rank1]:   File "finetune_lora_ga.py", line 747, in <module>
[rank1]:     train()
[rank1]:   File "finetune_lora_ga.py", line 649, in train
[rank1]:     named_grads = estimate_gradient(model, temp_set, 4)
[rank1]:   File "finetune_lora_ga.py", line 272, in estimate_gradient
[rank1]:     outputs = model(**batch)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1104, in forward
[rank1]:     outputs = self.model(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 878, in forward
[rank1]:     inputs_embeds = self.embed_tokens(input_ids)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py", line 163, in forward
[rank1]:     return F.embedding(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2264, in embedding
[rank1]:     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank1]: RuntimeError: 'weight' must be 2-D

LoRA-GA Tuning Issue: Task Loss Not Working as Expected

Hi,
Thanks for the excellent work on LoRA-GA.
I am experiencing an issue when training a model with LoRA-GA: the task loss is not decreasing as expected. I would appreciate any advice or tuning tips that might help.

Are there recommended parameter settings or tuning strategies?

Current Parameter Settings:

init_batch_size: 2
init_iters: 4
init_config:
  mode: "gradient"  # option: "simple", "svd", "gradient"
  lora_A: "unit"  # option: "gaussian", "kaiming", "fan_out_kaiming", "xavier", "zeros", "unit", "orthogonal"
  lora_A_std: 0.01  # only needed when lora_A is "gaussian"
  lora_B: "unit"  # option: "gaussian", "kaiming", "fan_out_kaiming", "xavier", "zeros", "unit", "orthogonal"
  lora_B_std: 0.01  # only needed when lora_B is "gaussian"
  scale: "stable"  # option: "default", "stable", "unit", "normalized", "gd", "weightS"
  stable_gamma: 64  # only needed when scale is "stable"
  direction: "ArB2r"  # option: "ArBr", "A2rBr", "ArB2r"(only needed when mode is "gradient")
  dtype: "fp32"  # option: "bf16", "fp32"
  norm_clip: false  # norm clipping

These are my loss scalars:

[screenshot of training loss curves]

It would be very helpful if you could offer some suggestions.
Thanks!

Layer-wise gradient computation

The code does not seem to include an implementation of the layer-wise gradient computation described in the paper.
