git-cloner / llama2-lora-fine-tuning
llama2 finetuning with deepspeed and lora
Home Page: https://gitclone.com/aiit/chat/
License: MIT License
Hello! How many GPUs does this need?
Many thanks to the author!
My current situation: as soon as I use 8 GPUs + DeepSpeed ZeRO-3 + 4-bit QLoRA, training fails with an error,
the same as this one: microsoft/DeepSpeed#3775
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param.<locals>.all_gather_coalesced.<locals>.<genexpr> at 0x7f7019a30890>
In that thread the author tried a patch but still hit the error, and suspects DeepSpeed may not currently support 4-bit QLoRA.
However, if I run 4-bit QLoRA + DeepSpeed on a single GPU, there is no error;
as soon as I use multiple GPUs, the error above appears.
I see you provide a 4-bit quantized finetune option, but the actual default parameters use 8-bit.
May I ask whether you have successfully finetuned with two GPUs + DeepSpeed + 4-bit QLoRA?
llama2-lora-fine-tuning/generate.py
Line 91 in 9834472
I fine-tuned on an A10 with 60k samples (about 30M of data), using parameters close to yours (batch size 32), for 24 hours, but the model still cannot answer Chinese well. Do you have any suggestions or experience to share?
I met this issue when fine-tuning the LLaMa-7B-Chat-hf with example dataset:
Traceback (most recent call last):
File "finetune-lora.py", line 656, in
train()
File "finetune-lora.py", line 622, in train
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/trainer.py", line 1854, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/trainer.py", line 2732, in training_step
self.accelerator.backward(loss)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/accelerate/accelerator.py", line 1905, in backward
loss.backward(**kwargs)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 141, in backward
outputs = ctx.run_function(*detached_inputs)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 789, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 423, in forward
raise ValueError(
ValueError: Attention mask should be of size (4, 1, 240, 480), but is torch.Size([4, 1, 240, 240])
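For context, the mismatched sizes in the ValueError decompose cleanly: the extra 240 in the expected key/value dimension comes from cached past_key_values being added to the new tokens, while the supplied mask only covers the new tokens. A common workaround when this appears during training with gradient checkpointing (an assumption here, not confirmed against this repo) is to set `model.config.use_cache = False` before calling `trainer.train()`. A minimal sketch of the shape arithmetic, using the values from the traceback:

```python
# Decompose the shapes from the ValueError above:
# expected mask (4, 1, 240, 480) vs. supplied (4, 1, 240, 240).
bsz, q_len = 4, 240          # batch size and new-token length
past_len = 240               # cached key/value length from past_key_values
kv_len = past_len + q_len    # attention keys cover past + current tokens

expected_mask_shape = (bsz, 1, q_len, kv_len)  # what the model expects
actual_mask_shape = (bsz, 1, q_len, q_len)     # what was supplied
print(expected_mask_shape, actual_mask_shape)
```

The fact that kv_len is exactly twice q_len is the tell that a KV cache is active where the training mask assumed none.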
parser.add_argument('--base_model', default="llama-2-7b-chat-hf/", type=str)
parser.add_argument('--lora_weights', default="tloen/alpaca-lora-7b", type=str,
help="If None, perform inference on the base model")
parser.add_argument('--load_8bit', default="True", type=bool,
help='only use CPU for inference')
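Note that the snippet above uses `type=bool` with a string default, which is a well-known argparse pitfall: `bool("False")` is `True` because any non-empty string is truthy, so passing `--load_8bit False` on the command line would still enable 8-bit loading (and the help text appears copied from a different flag). A hedged sketch of one common fix; the `str2bool` helper is my addition, not from this repo:

```python
import argparse

def str2bool(v: str) -> bool:
    # argparse's type=bool treats any non-empty string as True;
    # parse the text explicitly instead.
    return str(v).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser()
parser.add_argument('--load_8bit', default=True, type=str2bool,
                    help='load the model with 8-bit quantization')

assert bool("False") is True  # the pitfall: any non-empty string is truthy
args = parser.parse_args(['--load_8bit', 'False'])
print(args.load_8bit)
```

With `type=str2bool`, `--load_8bit False` now parses to `False` as intended; the non-string default `True` is passed through untouched, as argparse only applies `type` to string values.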
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used, so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading checkpoint shards: 100%|█████████████| 3/3 [00:15<00:00, 5.12s/it]
Question: Write me a user login/registration system: frontend in Vue, backend in Go, database design in MySQL, and write out the code.
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
Is there a limit on the output length? 2048 feels too short - how can I change it?
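The 2048 in the warning is the default generation max_length, not a hardware limit; LLaMA-2's positional context window is 4096 tokens, so the limit can be raised up to that by passing `max_new_tokens` (or a larger `max_length`) to `model.generate(...)` - the call site is hypothetical here since the repo's generate.py invocation isn't shown. A sketch of the arithmetic involved:

```python
# The warning fires when prompt tokens + requested new tokens exceed the
# default generation max_length (2048 here). LLaMA-2's positional context
# is 4096 tokens, so e.g. model.generate(input_ids, max_new_tokens=2048)
# (illustrative call; max_new_tokens is the standard transformers argument)
# can use the full window.
context_window = 4096        # LLaMA-2 max_position_embeddings
default_max_length = 2048    # default limit triggering the warning
prompt_tokens = 1200         # example prompt size

room_by_default = default_max_length - prompt_tokens
room_in_context = context_window - prompt_tokens
print(room_by_default, room_in_context)
```

Note the context window itself cannot be raised by generation arguments; beyond 4096 total tokens the model's position embeddings run out regardless of the settings.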
validation_files - how should I prepare my own training data for this?
It looks like no separate deepspeed.json is configured here, so I'm not sure.
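One straightforward way to produce validation_files is to hold out a slice of the training data in the same JSON-lines layout. The sketch below is an assumption about the format - the field names (instruction/input/output, the common Alpaca-style layout) are not taken from this repo's loader, so adjust them to whatever the training files actually contain:

```python
import json
import random

# Hypothetical samples in Alpaca-style fields (assumed, not from this repo).
samples = [{"instruction": f"q{i}", "input": "", "output": f"a{i}"}
           for i in range(100)]

random.seed(0)
random.shuffle(samples)
split = int(len(samples) * 0.9)  # 90/10 train/validation split

# Write one JSON object per line, preserving non-ASCII text as-is.
with open("train.json", "w", encoding="utf-8") as f:
    for s in samples[:split]:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
with open("validation.json", "w", encoding="utf-8") as f:
    for s in samples[split:]:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")

print(split, len(samples) - split)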
How can this be resolved? pytest is on the latest version.
Traceback (most recent call last):
File "/data/home/scv9515/A_suke_file/ALLMs/llama2-lora-fine-tuning/finetune-lora.py", line 45, in
from transformers.testing_utils import CaptureLogger
File "/data/home/scv9515/miniconda3/envs/bili/lib/python3.10/site-packages/transformers/testing_utils.py", line 131, in
from _pytest.doctest import (
ImportError: cannot import name 'import_path' from '_pytest.doctest' (/data/home/scv9515/miniconda3/envs/bili/lib/python3.10/site-packages/_pytest/doctest.py)
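This ImportError is a version mismatch between the installed transformers and pytest: transformers' testing_utils imports helpers from pytest internals (_pytest.doctest) whose names move between pytest releases. Two common workarounds - the version pin below is illustrative, not verified against this repo's requirements:

```shell
# Option 1: pin pytest to a release whose internals match this
# transformers version (exact pin is an assumption; adjust as needed).
pip install "pytest<8"

# Option 2: upgrade transformers so its testing_utils matches newer pytest.
pip install -U transformers
```

Either direction works as long as the pair is aligned; mixing a new pytest with an old transformers (or vice versa) reproduces the error.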