
LLamaTuner's Introduction

 


👋🤗🤗👋 Join our WeChat.

Easy and Efficient Fine-tuning of LLMs --- simple and efficient training and deployment of large language models

中文 | English

Introduction

LLamaTuner is an efficient, flexible, and full-featured toolkit for fine-tuning LLMs (Llama 3, Phi-3, Qwen, Mistral, ...).

Efficient

  • Support LLM and VLM pre-training / fine-tuning on almost all GPUs. LLamaTuner is capable of fine-tuning a 7B LLM on a single 8GB GPU, as well as multi-node fine-tuning of models exceeding 70B.
  • Automatically dispatch high-performance operators such as FlashAttention and Triton kernels to increase training throughput.
  • Compatible with DeepSpeed 🚀, making it easy to use a variety of ZeRO optimization techniques (a minimal configuration sketch follows this list).
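
To illustrate the DeepSpeed integration, here is a minimal, hedged sketch that passes a ZeRO-2 configuration to the Hugging Face `TrainingArguments`; the config values are illustrative assumptions, and the repository also ships ready-made JSON configs (e.g. `scripts/ds_config/ds_config_zero3_auto.json`) that can be passed by path instead.

```python
# Minimal sketch: enabling DeepSpeed ZeRO-2 through transformers.TrainingArguments.
# The config values below are illustrative assumptions, not the repository's defaults.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},       # shard optimizer states and gradients
    "bf16": {"enabled": "auto"},             # follow the precision chosen in TrainingArguments
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

training_args = TrainingArguments(
    output_dir="./work_dir/example",         # hypothetical output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    deepspeed=ds_config,                     # a path to a JSON config file also works here
)
```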

Flexible

  • Support various LLMs (Llama 3, Mixtral, Llama 2, ChatGLM, Qwen, Baichuan, ...).
  • Support VLM (LLaVA).
  • Well-designed data pipeline, accommodating datasets in any format, including but not limited to open-source and custom formats.
  • Support various training algorithms (QLoRA, LoRA, full-parameter fine-tuning), allowing users to choose the most suitable solution for their requirements (a 4-bit loading sketch follows this list).
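
A hedged sketch of the QLoRA path, assuming the 4-bit NF4 quantization settings commonly used in QLoRA recipes (the concrete defaults in `train_qlora.py` may differ, and the model name is a placeholder):

```python
# Sketch: load a base model in 4-bit NF4 (QLoRA-style) and prepare it for k-bit training.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",      # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # casts norms, enables input grads, etc.
```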

Full-featured

  • Support continuous pre-training, instruction fine-tuning, and agent fine-tuning.
  • Support chatting with large models using pre-defined conversation templates (see the template sketch after this list).
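
As an illustration of template-driven chat, the sketch below uses the generic `transformers` chat-template API rather than LLamaTuner's own template registry (the model name is a placeholder):

```python
# Sketch: build a chat prompt from a model's built-in conversation template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain LoRA in one sentence."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```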


Supported Models

| Model | Model size | Default module | Template |
|---|---|---|---|
| Baichuan | 7B/13B | W_pack | baichuan |
| Baichuan2 | 7B/13B | W_pack | baichuan2 |
| BLOOM | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| ChatGLM3 | 6B | query_key_value | chatglm3 |
| Command-R | 35B/104B | q_proj,v_proj | cohere |
| DeepSeek (MoE) | 7B/16B/67B/236B | q_proj,v_proj | deepseek |
| Falcon | 7B/11B/40B/180B | query_key_value | falcon |
| Gemma/CodeGemma | 2B/7B | q_proj,v_proj | gemma |
| InternLM2 | 7B/20B | wqkv | intern2 |
| LLaMA | 7B/13B/33B/65B | q_proj,v_proj | - |
| LLaMA-2 | 7B/13B/70B | q_proj,v_proj | llama2 |
| LLaMA-3 | 8B/70B | q_proj,v_proj | llama3 |
| LLaVA-1.5 | 7B/13B | q_proj,v_proj | vicuna |
| Mistral/Mixtral | 7B/8x7B/8x22B | q_proj,v_proj | mistral |
| OLMo | 1B/7B | q_proj,v_proj | - |
| PaliGemma | 3B | q_proj,v_proj | gemma |
| Phi-1.5/2 | 1.3B/2.7B | q_proj,v_proj | - |
| Phi-3 | 3.8B | qkv_proj | phi |
| Qwen | 1.8B/7B/14B/72B | c_attn | qwen |
| Qwen1.5 (Code/MoE) | 0.5B/1.8B/4B/7B/14B/32B/72B/110B | q_proj,v_proj | qwen |
| StarCoder2 | 3B/7B/15B | q_proj,v_proj | - |
| XVERSE | 7B/13B/65B | q_proj,v_proj | xverse |
| Yi (1/1.5) | 6B/9B/34B | q_proj,v_proj | yi |
| Yi-VL | 6B/34B | q_proj,v_proj | yi_vl |
| Yuan | 2B/51B/102B | q_proj,v_proj | yuan |
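
The "Default module" column lists the attention projection layers that LoRA adapters typically target for each architecture. A hedged sketch of how such an entry maps onto a `peft.LoraConfig` (the base model id, rank, alpha, and dropout values are illustrative assumptions):

```python
# Sketch: attach LoRA adapters to the default target modules of a LLaMA-family model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # hypothetical base model

lora_config = LoraConfig(
    r=16,                                  # illustrative rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # "Default module" entry for LLaMA / LLaMA-2 / LLaMA-3
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```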

Supported Training Approaches

The following training approaches are supported, each in combination with full-tuning, freeze-tuning, LoRA, and QLoRA:

  • Pre-Training
  • Supervised Fine-Tuning
  • Reward Modeling
  • PPO Training
  • DPO Training
  • KTO Training
  • ORPO Training

Supported Datasets

As of now, we support the following datasets, most of which are available in the Hugging Face datasets library.

  • Supervised fine-tuning datasets
  • Preference datasets

Please refer to data/README.md to learn how to use these datasets. If you want to explore more datasets, please refer to awesome-instruction-datasets. Some datasets require confirmation before use, so we recommend logging in with your Hugging Face account using the following commands:

pip install --upgrade huggingface_hub
huggingface-cli login
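
Alternatively, the login can be done from Python (a sketch using the public `huggingface_hub` API; the token is a placeholder):

```python
# Sketch: programmatic Hugging Face login for gated datasets.
from huggingface_hub import login

login(token="hf_xxx")  # placeholder token; call login() with no arguments for an interactive prompt
```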

Data Preprocessing

We provide a number of data preprocessing tools in the data folder. These tools are intended to be a starting point for further research and development.
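
As a hedged example of preparing a custom dataset, the sketch below converts arbitrary records into the Alpaca-style `instruction` / `input` / `output` layout used by several of the supported datasets; the exact schema expected by the preprocessing tools may differ, and the file path is a placeholder.

```python
# Sketch: convert custom records into Alpaca-format JSON for supervised fine-tuning.
import json

raw_records = [
    {"question": "What is LoRA?", "answer": "A parameter-efficient fine-tuning method."},
]

alpaca_records = [
    {
        "instruction": rec["question"],  # the task the model should perform
        "input": "",                     # optional extra context
        "output": rec["answer"],         # the target response
    }
    for rec in raw_records
]

with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(alpaca_records, f, ensure_ascii=False, indent=2)
```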

Model Zoo

We provide a number of models in the Hugging Face model hub. These models are trained with QLoRA and can be used for inference and finetuning. We provide the following models:

| Base Model | Adapter | Instruct Datasets | Train Script | Log | Model on Huggingface |
|---|---|---|---|---|---|
| llama-7b | FullFinetune | - | - | - | |
| llama-7b | QLoRA | openassistant-guanaco | finetune_lamma7b | wandb log | GaussianTech/llama-7b-sft |
| llama-7b | QLoRA | OL-CC | finetune_lamma7b | | |
| baichuan7b | QLoRA | openassistant-guanaco | finetune_baichuan7b | wandb log | GaussianTech/baichuan-7b-sft |
| baichuan7b | QLoRA | OL-CC | finetune_baichuan7b | wandb log | - |
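
A hedged sketch of using one of the published adapters for inference, assuming the adapter is a standard PEFT checkpoint and a matching base checkpoint is available (the base model id and prompt format are assumptions):

```python
# Sketch: attach a published QLoRA adapter to its base model and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "huggyllama/llama-7b"  # hypothetical llama-7b base checkpoint
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "GaussianTech/llama-7b-sft")  # adapter from the table above

inputs = tokenizer("### Human: What is QLoRA?\n### Assistant:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```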

Requirements

| Mandatory | Minimum | Recommend |
|---|---|---|
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
| transformers | 4.37.2 | 4.41.0 |
| datasets | 2.14.3 | 2.19.1 |
| accelerate | 0.27.2 | 0.30.1 |
| peft | 0.9.0 | 0.11.1 |
| trl | 0.8.2 | 0.8.6 |

| Optional | Minimum | Recommend |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.0 | 0.4.2 |
| flash-attn | 2.3.0 | 2.5.8 |
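
A quick way to compare an existing environment against the table above (a sketch; the packages are queried by their distribution names):

```python
# Sketch: print installed versions of the mandatory Python dependencies.
import importlib.metadata as md

for pkg in ("torch", "transformers", "datasets", "accelerate", "peft", "trl"):
    try:
        print(f"{pkg:12s} {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg:12s} not installed")
```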

Hardware Requirements

* Estimated GPU memory requirements

| Method | Bits | 7B | 13B | 30B | 70B | 110B | 8x7B | 8x22B |
|---|---|---|---|---|---|---|---|---|
| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
| Full | 16 | 60GB | 120GB | 300GB | 600GB | 900GB | 400GB | 1200GB |
| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 360GB | 160GB | 400GB |
| LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 240GB | 120GB | 320GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 140GB | 60GB | 160GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 72GB | 30GB | 96GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 48GB | 18GB | 48GB |

Getting Started

Clone the code

Clone this repository and navigate to the LLamaTuner folder:

git clone https://github.com/jianzhnie/LLamaTuner.git
cd LLamaTuner


| Main function | Usage | Scripts |
|---|---|---|
| train.py | Full finetuning of LLMs on SFT datasets | full_finetune |
| train_lora.py | Finetune LLMs with LoRA (Low-Rank Adaptation of Large Language Models) | lora_finetune |
| train_qlora.py | Finetune LLMs with QLoRA (Efficient Finetuning of Quantized LLMs) | qlora_finetune |

QLoRA int4 Finetune

The train_qlora.py code is a starting point for finetuning and inference on various datasets. The basic command for finetuning a base model on the Alpaca dataset is:

python train_qlora.py --model_name_or_path <path_or_name>

For models larger than 13B, we recommend adjusting the learning rate:

python train_qlora.py --learning_rate 0.0001 --model_name_or_path <path_or_name>
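
A fuller invocation might look like the following (a sketch: the flag names match those used by the repository's training scripts, while the concrete values are placeholders):

python train_qlora.py \
    --model_name_or_path <path_or_name> \
    --dataset_name alpaca \
    --output_dir ./work_dir/qlora-example \
    --num_train_epochs 3 \
    --learning_rate 0.0001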

To find more scripts for finetuning and inference, please refer to the scripts folder.

Known Issues and Limitations

Here is a list of known issues and bugs. If your issue is not reported here, please open a new issue and describe the problem.

  1. 4-bit inference is slow. Currently, our 4-bit inference implementation is not yet integrated with the 4-bit matrix multiplication.
  2. Resuming a LoRA training run with the Trainer currently results in an error.
  3. Currently, using bnb_4bit_compute_type='fp16' can lead to instabilities. For 7B LLaMA, only 80% of finetuning runs complete without error. We have solutions, but they are not yet integrated into bitsandbytes.
  4. Make sure that tokenizer.bos_token_id = 1 to avoid generation issues; a quick check is sketched below.
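
For item 4, a minimal check looks like this (a sketch; whether the id actually needs to be 1 depends on the tokenizer in use):

```python
# Sketch: verify the BOS token id before generation (see known issue 4 above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<path_or_name>")  # placeholder model path
if tokenizer.bos_token_id != 1:
    print(f"Warning: bos_token_id is {tokenizer.bos_token_id}, expected 1")
```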

License

LLamaTuner is released under the Apache 2.0 license.

Acknowledgements

We thank the Huggingface team, in particular Younes Belkada, for their support in integrating QLoRA with the PEFT and transformers libraries.

We appreciate the work by many open-source contributors, especially:

Some LLM fine-tuning repos

Citation

Please cite the repo if you use the data or code in this repo.

@misc{Chinese-Guanaco,
  author = {jianzhnie},
  title = {LLamaTuner: Easy and Efficient Fine-tuning LLMs},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jianzhnie/LLamaTuner}},
}

LLamaTuner's People

Contributors

jianzhnie


LLamaTuner's Issues

Error when running multi-GPU parallel training on a single machine

Error:
ValueError: DeepSpeed Zero-3 is not compatible with low_cpu_mem_usage=True or with passing a device_map.
Training script:
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train_lora.py \
    --model_name_or_path ./models/baichuan-13B-Base \
    --dataset_name alpaca \
    --data_dir data/alpaca.json \
    --load_from_local \
    --output_dir ./work_dir/baichuan-13b-wb-lora-ds \
    --lora_target_modules W_pack \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_total_limit 20 \
    --save_steps 500 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --source_max_len 512 \
    --target_max_len 512 \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.1 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --trust_remote_code \
    --fp16 \
    --deepspeed "scripts/ds_config/ds_config_zero3_auto.json"

Fine-tuning fails

Hello, author!
When fine-tuning with LoRA, what could be the cause of the abnormal loss shown below?
PyTorch: 2.0
During data preprocessing, the labels at the input and padding positions are set to -100, and positions marked -100 are ignored when computing the cross-entropy loss.
(screenshot of the abnormal loss omitted)

Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

The error message is as follows:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:396 in │
│ │
│ 393 │
│ 394 │
│ 395 if name == 'main': │
│ ❱ 396 │ main() │
│ 397 │
│ │
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:312 in main │
│ │
│ 309 │ set_seed(args.seed) │
│ 310 │ │
│ 311 │ # Tokenizer │
│ ❱ 312 │ tokenizer = AutoTokenizer.from_pretrained( │
│ 313 │ │ args.model_name_or_path, │
│ 314 │ │ cache_dir=args.cache_dir, │
│ 315 │ │ padding_side='right', │
│ │
│ /Users/corlin/code/transformers/src/transformers/models/auto/tokenization_auto.py:688 in │
│ from_pretrained │
│ │
│ 685 │ │ │ │ tokenizer_class_candidate = config_tokenizer_class │
│ 686 │ │ │ │ tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate) │
│ 687 │ │ │ if tokenizer_class is None: │
│ ❱ 688 │ │ │ │ raise ValueError( │
│ 689 │ │ │ │ │ f"Tokenizer class {tokenizer_class_candidate} does not exist or is n │
│ 690 │ │ │ │ ) │
│ 691 │ │ │ return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

How to adapt QLoRA to other base models?

Thank you very much for your work! Additionally, I would like to know, which part of the code should I refer to in order to make the base model already open-sourced on Huggingface support QLoRA? I'm a bit confused, your guidance would be greatly appreciated!

Fails on 3090

(gh_Chinese-Guanaco) ub2004@ub2004-B85M-A0:~/llm_dev/Chinese-Guanaco$ python3 qlora_int8_finetune.py --model_name_or_path /data-ssd-1t/hf_model/llama-7b-hf --data_path tatsu-lab/alpaca --output_dir work_dir_lora/ --num_train_epochs 3 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --evaluation_strategy "no" --save_strategy "steps" --save_steps 500 --save_total_limit 5 --learning_rate 1e-4 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --model_max_length 2048 --logging_steps 1
[2023-06-11 00:48:41,928] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Traceback (most recent call last):
File "/home/ub2004/llm_dev/Chinese-Guanaco/qlora_int8_finetune.py", line 338, in
train(load_in_8bit=True)
File "/home/ub2004/llm_dev/Chinese-Guanaco/qlora_int8_finetune.py", line 234, in train
model = AutoModelForCausalLM.from_pretrained(
File "/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
return model_class.from_pretrained(
File "/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2819, in from_pretrained
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.

(gh_Chinese-Guanaco) ub2004@ub2004-B85M-A0:~/llm_dev/Chinese-Guanaco$

What's the real LICENSE?

We release the resources associated with QLoRA finetuning in this repository under MIT license. In addition, we release the Guanaco model family for base LLaMA model sizes of 7B, 13B, 33B, and 65B. These models are intended for purposes in line with the LLaMA license and require access to the LLaMA models.

but the LICENSE file says it's Apache License 2.0.

ValueError: Undefined dataset tatsu-lab/alpaca

(yk_py39) amd00@MZ32-00:~/llm_dev/Efficient-Tuning-LLMs$ python train_qlora.py --model_name_or_path /home/amd00/hf_model/llama-7b --output_dir ./out-llama-7b --dataset_name tatsu-lab/alpaca
[2023-08-03 17:54:26,303] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/home/amd00/llm_dev/Efficient-Tuning-LLMs/train_qlora.py", line 102, in
main()
File "/home/amd00/llm_dev/Efficient-Tuning-LLMs/train_qlora.py", line 31, in main
data_args.init_for_training()
File "/home/amd00/llm_dev/Efficient-Tuning-LLMs/chatllms/configs/data_args.py", line 95, in init_for_training
raise ValueError('Undefined dataset {} in {}'.format(
ValueError: Undefined dataset tatsu-lab/alpaca in /home/amd00/llm_dev/Efficient-Tuning-LLMs/chatllms/configs/../../data/dataset_info.yaml
(yk_py39) amd00@MZ32-00:~/llm_dev/Efficient-Tuning-LLMs$

QLoRA fine-tuning on alpaca_data.json fails: 'padding_value' (position 3) must be float, not NoneType

@jianzhnie Thanks for your work! I ran into a problem while reproducing the code and would like to ask for your help: when I call scripts/finetune_llama_7b_alpaca_zh.sh to fine-tune on the alpaca_data.json dataset, the following error occurs. What could be the problem?

......
File "/home/likanxue1/.custom/cuda11.3.1-cudnn8-devel-ubuntu20.04-py38-jupyter/envs/tuning-llm/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/home/likanxue1/model_inference/transformers/src/transformers/trainer_utils.py", line 772, in call
return self.data_collator(features)
File "/home/likanxue1/model_inference/Efficient-Tuning-LLMs/chatllms/data/sft_dataset.py", line 214, in call
input_ids = pad_sequence(input_ids,
File "/home/likanxue1/.custom/cuda11.3.1-cudnn8-devel-ubuntu20.04-py38-jupyter/envs/tuning-llm/lib/python3.10/site-packages/torch/nn/utils/rnn.py", line 399, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
TypeError: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType

My /data/alpaca_zh_pcyn.yaml contains only the following lines:
alpaca:
hf_hub_url: ''
local_path: /home/likanxue1/Dataset/alpaca_data.json
dataset_format: alpaca
multi_turn: False

Looking forward to your reply. Sorry for the trouble, and thank you!

Question about labels in data preprocessing

Hello, I have a question about your data preprocessing code: the labels correspond directly to the inputs. Shouldn't the labels be shifted by one position?

The merge code has another bug: a variable is missing

I just found another problem: a missing argument causes the model-saving logic not to take effect. The code is as follows:
parser.add_argument('--load_8bit', type=bool, default=False)

args = parser.parse_args()

apply_lora(args.base_model_path, args.lora_path, args.load_8bit, args.target_model_path,
           args.save_target_model)

Error loading the fine-tuned Llama-2-7b model

After fine-tuning the model, I ran the following command for inference testing and got an error:
---- Command ----
python gradio_webserver.py
--model_name_or_path model_inference/model_path/Llama-2-7b-chat-ms
--lora_model_name_or_path ~/model_inference/model_path/checkpoint-344

---- Error ----
Loading the LoRA adapter from model_inference/model_path/checkpoint-344
Traceback (most recent call last):
File "model_inference/Efficient-Tuning-LLMs/chatllms/utils/apply_lora.py", line 90, in
apply_lora(base_model_path=args.base_model_path,
File "model_inference/Efficient-Tuning-LLMs/chatllms/utils/apply_lora.py", line 69, in apply_lora
model: PreTrainedModel = PeftModel.from_pretrained(base_model,
File "model_inference/peft/src/peft/peft_model.py", line 304, in from_pretrained
config = PEFT_TYPE_TO_CONFIG_MAPPING[
File "~/model_inference/peft/src/peft/config.py", line 134, in from_pretrained
config = config_cls(**kwargs)
TypeError: LoraConfig.init() got an unexpected keyword argument 'loftq_config'

How can this problem be resolved?

baichuan-7B: AttributeError: 'CastOutputToFloat' object has no attribute 'weight'

chatllms - INFO - Adding special tokens.
Using pad_token, but it is not set yet.
Traceback (most recent call last):
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/train_qlora.py", line 156, in
main()
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/train_qlora.py", line 80, in main
add_special_tokens_if_missing(tokenizer, model)
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/chatllms/utils/model_utils.py", line 47, in add_special_tokens_if_missing
smart_tokenizer_and_embedding_resize(special_tokens_dict, tokenizer,
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/chatllms/utils/model_utils.py", line 77, in smart_tokenizer_and_embedding_resize
model.resize_token_embeddings(len(tokenizer))
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1395, in resize_token_embeddings
model_embeds = self._resize_token_embeddings(new_num_tokens)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1416, in _resize_token_embeddings
new_lm_head = self._get_resized_lm_head(old_lm_head, new_num_tokens)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1520, in _get_resized_lm_head
old_lm_head.weight.size() if not transposed else old_lm_head.weight.t().size()
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CastOutputToFloat' object has no attribute 'weight'

Could you tell me what the problem is?

Fix for the merge bug; verification results coming shortly

if target_model_path is not None:
print(f'Saving the target model to {target_model_path}')
model.save_pretrained(target_model_path)
base_tokenizer.save_pretrained(target_model_path)

A function call needs to be added here:
lora_model = lora_model.merge_and_unload()
The merge_and_unload function works when the model is loaded in 16-bit, but not in 8-bit.
Not sure whether the following call is needed:
lora_model.train(False)

utils/apply_lora.py has a small bug

apply_lora(args.base_model_path, args.target_model_path, args.lora_path,
args.save_target_model)

def apply_lora(
base_model_path: str,
lora_path: str,
load_8bit: bool = False,
target_model_path: str = None,
save_target_model: bool = False
) -> Tuple[AutoModelForCausalLM, AutoTokenizer]:
The two arguments are passed in the wrong positions.

Abnormal results during batched inference

Running inference on LLaMA2-7B with model.generate and "do_sample": False:
with batch_size=1 the results are very good,
but with batch_size>1 the results are abnormal.
Batched inference with trainer.predict also produces abnormal results.
Has anyone else run into this?

4-bit loading error

Traceback (most recent call last):
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/qlora_int4_finetune.py", line 917, in
train()
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/qlora_int4_finetune.py", line 728, in train
model = get_accelerate_model(args, checkpoint_dir)
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/qlora_int4_finetune.py", line 318, in get_accelerate_model
model = AutoModelForCausalLM.from_pretrained(
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/transformers/models/auto/auto_factory.py", line 490, in from_pretrained
return model_class.from_pretrained(
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/transformers/modeling_utils.py", line 2765, in from_pretrained
raise ValueError(
ValueError: You are using device_map='auto' on a 4bit loaded version of the model. To automatically compute the appropriate device map, you should upgrade your accelerate library, `pip install --upgrade accelerate` or install it from source to support fp4 auto device map calculation. You may encounter unexpected behavior, or pass your own device map

I have used the latest released version of accelerate and also the source code from GitHub, but the error remains the same.

LoRA on llama-2-13b fails on a single GPU

Running on a single GPU raises "NotImplementedError: Cannot copy out of meta tensor; no data!".
Running on two GPUs works, but GPU memory usage is very high: on two 48GB A6000 cards, each uses about 35GB.

[Question] About training visualization

Thanks for your contribution to this project!

Problem description

For the fine-tuning process, does Efficient-Tuning-LLMs provide a way to visualize training progress and curves in real time?

Looking forward to your reply!

How to use your own dataset

When creating the yaml file for my own dataset, is it enough to specify only 'local_path' and 'dataset_format'? The other yaml files contain many more fields, so this is a bit unclear to me.
Thanks in advance for your help.

Is there a bug in data_utils.py?

348 train_dataset, eval_dataset = split_train_eval(
349 dataset,
350 do_eval=args.do_eval,
351 eval_dataset_size=args.eval_dataset_size,
352 max_eval_samples=args.max_eval_samples,
353 do_train=args.do_train,
354 max_train_samples=args.max_train_samples,
355 )
356 if train_dataset:
357 print('=' * 80)
358 print('loaded dataset:', dataset_name, 'train data size:',
359 len(train_dataset))
360 train_datasets.append(train_dataset)
361 if eval_datasets:
362 print('=' * 80)
363 print('loaded dataset:', dataset_name, 'eval data size:',
364 len(eval_dataset))
365 eval_datasets.append(eval_dataset)

Line 361: eval_datasets --> eval_dataset ???

About llama-2-70B fine-tuning

Appreciate your great work!

Is it possible to fine-tune llama-2-70B on a 3x8xA100 (40G) configuration? Thanks!
