
LLamaTuner's Introduction

 


👋🤗🤗👋 Join our WeChat.

Easy and Efficient Fine-tuning of LLMs --- simple and efficient training and deployment of large language models

中文 | English

Introduction

LLamaTuner is an efficient, flexible, and full-featured toolkit for fine-tuning LLMs (Llama 3, Phi-3, Qwen, Mistral, ...).

Efficient

  • Support LLM and VLM pre-training / fine-tuning on almost all GPUs. LLamaTuner is capable of fine-tuning a 7B LLM on a single 8GB GPU, as well as multi-node fine-tuning of models exceeding 70B.
  • Automatically dispatch high-performance operators such as FlashAttention and Triton kernels to increase training throughput.
  • Compatible with DeepSpeed 🚀, making it easy to use a variety of ZeRO optimization techniques (a minimal configuration sketch follows this list).
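
To illustrate the DeepSpeed integration, here is a minimal, hedged sketch that passes a ZeRO-2 configuration to the Hugging Face `TrainingArguments`; the config values are illustrative assumptions, and the repository also ships ready-made JSON configs (e.g. `scripts/ds_config/ds_config_zero3_auto.json`) that can be passed by path instead.

```python
# Minimal sketch: enabling DeepSpeed ZeRO-2 through transformers.TrainingArguments.
# The config values below are illustrative assumptions, not the repository's defaults.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},       # shard optimizer states and gradients
    "bf16": {"enabled": "auto"},             # follow the precision chosen in TrainingArguments
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

training_args = TrainingArguments(
    output_dir="./work_dir/example",         # hypothetical output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    deepspeed=ds_config,                     # a path to a JSON config file also works here
)
```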

Flexible

  • Support various LLMs (Llama 3, Mixtral, Llama 2, ChatGLM, Qwen, Baichuan, ...).
  • Support VLM (LLaVA).
  • Well-designed data pipeline, accommodating datasets in any format, including but not limited to open-source and custom formats.
  • Support various training algorithms (QLoRA, LoRA, full-parameter fine-tuning), allowing users to choose the most suitable solution for their requirements (a 4-bit loading sketch follows this list).
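
A hedged sketch of the QLoRA path, assuming the 4-bit NF4 quantization settings commonly used in QLoRA recipes (the concrete defaults in `train_qlora.py` may differ, and the model name is a placeholder):

```python
# Sketch: load a base model in 4-bit NF4 (QLoRA-style) and prepare it for k-bit training.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",      # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # casts norms, enables input grads, etc.
```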

Full-featured

  • Support continuous pre-training, instruction fine-tuning, and agent fine-tuning.
  • Support chatting with large models using pre-defined conversation templates (see the template sketch after this list).
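
As an illustration of template-driven chat, the sketch below uses the generic `transformers` chat-template API rather than LLamaTuner's own template registry (the model name is a placeholder):

```python
# Sketch: build a chat prompt from a model's built-in conversation template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain LoRA in one sentence."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```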


Supported Models

| Model | Model size | Default module | Template |
|---|---|---|---|
| Baichuan | 7B/13B | W_pack | baichuan |
| Baichuan2 | 7B/13B | W_pack | baichuan2 |
| BLOOM | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| ChatGLM3 | 6B | query_key_value | chatglm3 |
| Command-R | 35B/104B | q_proj,v_proj | cohere |
| DeepSeek (MoE) | 7B/16B/67B/236B | q_proj,v_proj | deepseek |
| Falcon | 7B/11B/40B/180B | query_key_value | falcon |
| Gemma/CodeGemma | 2B/7B | q_proj,v_proj | gemma |
| InternLM2 | 7B/20B | wqkv | intern2 |
| LLaMA | 7B/13B/33B/65B | q_proj,v_proj | - |
| LLaMA-2 | 7B/13B/70B | q_proj,v_proj | llama2 |
| LLaMA-3 | 8B/70B | q_proj,v_proj | llama3 |
| LLaVA-1.5 | 7B/13B | q_proj,v_proj | vicuna |
| Mistral/Mixtral | 7B/8x7B/8x22B | q_proj,v_proj | mistral |
| OLMo | 1B/7B | q_proj,v_proj | - |
| PaliGemma | 3B | q_proj,v_proj | gemma |
| Phi-1.5/2 | 1.3B/2.7B | q_proj,v_proj | - |
| Phi-3 | 3.8B | qkv_proj | phi |
| Qwen | 1.8B/7B/14B/72B | c_attn | qwen |
| Qwen1.5 (Code/MoE) | 0.5B/1.8B/4B/7B/14B/32B/72B/110B | q_proj,v_proj | qwen |
| StarCoder2 | 3B/7B/15B | q_proj,v_proj | - |
| XVERSE | 7B/13B/65B | q_proj,v_proj | xverse |
| Yi (1/1.5) | 6B/9B/34B | q_proj,v_proj | yi |
| Yi-VL | 6B/34B | q_proj,v_proj | yi_vl |
| Yuan | 2B/51B/102B | q_proj,v_proj | yuan |
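
The "Default module" column lists the attention projection layers that LoRA adapters typically target for each architecture. A hedged sketch of how such an entry maps onto a `peft.LoraConfig` (the base model id, rank, alpha, and dropout values are illustrative assumptions):

```python
# Sketch: attach LoRA adapters to the default target modules of a LLaMA-family model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # hypothetical base model

lora_config = LoraConfig(
    r=16,                                  # illustrative rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # "Default module" entry for LLaMA / LLaMA-2 / LLaMA-3
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```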

Supported Training Approaches

The following training approaches are supported, each in combination with full-tuning, freeze-tuning, LoRA, and QLoRA:

  • Pre-Training
  • Supervised Fine-Tuning
  • Reward Modeling
  • PPO Training
  • DPO Training
  • KTO Training
  • ORPO Training

Supported Datasets

As of now, we support the following datasets, most of which are available in the Hugging Face datasets library.

  • Supervised fine-tuning datasets
  • Preference datasets

Please refer to data/README.md to learn how to use these datasets. If you want to explore more datasets, please refer to awesome-instruction-datasets. Some datasets require confirmation before use, so we recommend logging in with your Hugging Face account using the following commands:

pip install --upgrade huggingface_hub
huggingface-cli login
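
Alternatively, the login can be done from Python (a sketch using the public `huggingface_hub` API; the token is a placeholder):

```python
# Sketch: programmatic Hugging Face login for gated datasets.
from huggingface_hub import login

login(token="hf_xxx")  # placeholder token; call login() with no arguments for an interactive prompt
```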

Data Preprocessing

We provide a number of data preprocessing tools in the data folder. These tools are intended to be a starting point for further research and development.
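
As a hedged example of preparing a custom dataset, the sketch below converts arbitrary records into the Alpaca-style `instruction` / `input` / `output` layout used by several of the supported datasets; the exact schema expected by the preprocessing tools may differ, and the file path is a placeholder.

```python
# Sketch: convert custom records into Alpaca-format JSON for supervised fine-tuning.
import json

raw_records = [
    {"question": "What is LoRA?", "answer": "A parameter-efficient fine-tuning method."},
]

alpaca_records = [
    {
        "instruction": rec["question"],  # the task the model should perform
        "input": "",                     # optional extra context
        "output": rec["answer"],         # the target response
    }
    for rec in raw_records
]

with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(alpaca_records, f, ensure_ascii=False, indent=2)
```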

Model Zoo

We provide a number of models in the Hugging Face model hub. These models are trained with QLoRA and can be used for inference and finetuning. We provide the following models:

| Base Model | Adapter | Instruct Datasets | Train Script | Log | Model on Huggingface |
|---|---|---|---|---|---|
| llama-7b | FullFinetune | - | - | - | |
| llama-7b | QLoRA | openassistant-guanaco | finetune_lamma7b | wandb log | GaussianTech/llama-7b-sft |
| llama-7b | QLoRA | OL-CC | finetune_lamma7b | | |
| baichuan7b | QLoRA | openassistant-guanaco | finetune_baichuan7b | wandb log | GaussianTech/baichuan-7b-sft |
| baichuan7b | QLoRA | OL-CC | finetune_baichuan7b | wandb log | - |
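
A hedged sketch of using one of the published adapters for inference, assuming the adapter is a standard PEFT checkpoint and a matching base checkpoint is available (the base model id and prompt format are assumptions):

```python
# Sketch: attach a published QLoRA adapter to its base model and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "huggyllama/llama-7b"  # hypothetical llama-7b base checkpoint
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "GaussianTech/llama-7b-sft")  # adapter from the table above

inputs = tokenizer("### Human: What is QLoRA?\n### Assistant:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```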

Requirements

| Mandatory | Minimum | Recommend |
|---|---|---|
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
| transformers | 4.37.2 | 4.41.0 |
| datasets | 2.14.3 | 2.19.1 |
| accelerate | 0.27.2 | 0.30.1 |
| peft | 0.9.0 | 0.11.1 |
| trl | 0.8.2 | 0.8.6 |

| Optional | Minimum | Recommend |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.0 | 0.4.2 |
| flash-attn | 2.3.0 | 2.5.8 |
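
A quick way to compare an existing environment against the table above (a sketch; the packages are queried by their distribution names):

```python
# Sketch: print installed versions of the mandatory Python dependencies.
import importlib.metadata as md

for pkg in ("torch", "transformers", "datasets", "accelerate", "peft", "trl"):
    try:
        print(f"{pkg:12s} {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg:12s} not installed")
```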

Hardware Requirements

* Estimated GPU memory requirements

| Method | Bits | 7B | 13B | 30B | 70B | 110B | 8x7B | 8x22B |
|---|---|---|---|---|---|---|---|---|
| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
| Full | 16 | 60GB | 120GB | 300GB | 600GB | 900GB | 400GB | 1200GB |
| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 360GB | 160GB | 400GB |
| LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 240GB | 120GB | 320GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 140GB | 60GB | 160GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 72GB | 30GB | 96GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 48GB | 18GB | 48GB |

Getting Started

Clone the code

Clone this repository and navigate to the LLamaTuner folder:

git clone https://github.com/jianzhnie/LLamaTuner.git
cd LLamaTuner


| Main function | Usage | Scripts |
|---|---|---|
| train.py | Full finetuning of LLMs on SFT datasets | full_finetune |
| train_lora.py | Finetune LLMs with LoRA (Low-Rank Adaptation of Large Language Models) | lora_finetune |
| train_qlora.py | Finetune LLMs with QLoRA (Efficient Finetuning of Quantized LLMs) | qlora_finetune |

QLoRA int4 Finetune

The train_qlora.py code is a starting point for finetuning and inference on various datasets. The basic command for finetuning a base model on the Alpaca dataset is:

python train_qlora.py --model_name_or_path <path_or_name>

For models larger than 13B, we recommend adjusting the learning rate:

python train_qlora.py --learning_rate 0.0001 --model_name_or_path <path_or_name>
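
A fuller invocation might look like the following (a sketch: the flag names match those used by the repository's training scripts, while the concrete values are placeholders):

python train_qlora.py \
    --model_name_or_path <path_or_name> \
    --dataset_name alpaca \
    --output_dir ./work_dir/qlora-example \
    --num_train_epochs 3 \
    --learning_rate 0.0001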

To find more scripts for finetuning and inference, please refer to the scripts folder.

Known Issues and Limitations

Here is a list of known issues and bugs. If your issue is not reported here, please open a new issue and describe the problem.

  1. 4-bit inference is slow. Currently, our 4-bit inference implementation is not yet integrated with the 4-bit matrix multiplication.
  2. Resuming a LoRA training run with the Trainer currently results in an error.
  3. Currently, using bnb_4bit_compute_type='fp16' can lead to instabilities. For 7B LLaMA, only 80% of finetuning runs complete without error. We have solutions, but they are not yet integrated into bitsandbytes.
  4. Make sure that tokenizer.bos_token_id = 1 to avoid generation issues; a quick check is sketched below.
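
For item 4, a minimal check looks like this (a sketch; whether the id actually needs to be 1 depends on the tokenizer in use):

```python
# Sketch: verify the BOS token id before generation (see known issue 4 above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<path_or_name>")  # placeholder model path
if tokenizer.bos_token_id != 1:
    print(f"Warning: bos_token_id is {tokenizer.bos_token_id}, expected 1")
```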

License

LLamaTuner is released under the Apache 2.0 license.

Acknowledgements

We thank the Huggingface team, in particular Younes Belkada, for their support in integrating QLoRA with the PEFT and transformers libraries.

We appreciate the work by many open-source contributors, especially:

Some LLM fine-tuning repos

Citation

Please cite the repo if you use the data or code in this repo.

@misc{Chinese-Guanaco,
  author = {jianzhnie},
  title = {LLamaTuner: Easy and Efficient Fine-tuning LLMs},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jianzhnie/LLamaTuner}},
}

LLamaTuner's People

Contributors

jianzhnie


LLamaTuner's Issues

Error when running multi-GPU parallel training on a single machine

Error:
ValueError: DeepSpeed Zero-3 is not compatible with low_cpu_mem_usage=True or with passing a device_map.
Training script:
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train_lora.py \
    --model_name_or_path ./models/baichuan-13B-Base \
    --dataset_name alpaca \
    --data_dir data/alpaca.json \
    --load_from_local \
    --output_dir ./work_dir/baichuan-13b-wb-lora-ds \
    --lora_target_modules W_pack \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_total_limit 20 \
    --save_steps 500 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --source_max_len 512 \
    --target_max_len 512 \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.1 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --trust_remote_code \
    --fp16 \
    --deepspeed "scripts/ds_config/ds_config_zero3_auto.json"

Fine-tuning fails

Hello, author!
When fine-tuning with LoRA, what could be the cause of the abnormal loss shown below?
PyTorch: 2.0
During data preprocessing, the labels at the input and padding positions are set to -100, and positions marked -100 are ignored when computing the cross-entropy loss.
(screenshot of the abnormal loss omitted)

Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

The error message is as follows:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:396 in │
│ │
│ 393 │
│ 394 │
│ 395 if name == 'main': │
│ ❱ 396 │ main() │
│ 397 │
│ │
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:312 in main │
│ │
│ 309 │ set_seed(args.seed) │
│ 310 │ │
│ 311 │ # Tokenizer │
│ ❱ 312 │ tokenizer = AutoTokenizer.from_pretrained( │
│ 313 │ │ args.model_name_or_path, │
│ 314 │ │ cache_dir=args.cache_dir, │
│ 315 │ │ padding_side='right', │
│ │
│ /Users/corlin/code/transformers/src/transformers/models/auto/tokenization_auto.py:688 in │
│ from_pretrained │
│ │
│ 685 │ │ │ │ tokenizer_class_candidate = config_tokenizer_class │
│ 686 │ │ │ │ tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate) │
│ 687 │ │ │ if tokenizer_class is None: │
│ ❱ 688 │ │ │ │ raise ValueError( │
│ 689 │ │ │ │ │ f"Tokenizer class {tokenizer_class_candidate} does not exist or is n │
│ 690 │ │ │ │ ) │
│ 691 │ │ │ return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

How to adapt QLoRA to other base models?

Thank you very much for your work! Additionally, I would like to know, which part of the code should I refer to in order to make the base model already open-sourced on Huggingface support QLoRA? I'm a bit confused, your guidance would be greatly appreciated!

Fails on 3090

(gh_Chinese-Guanaco) ub2004@ub2004-B85M-A0:~/llm_dev/Chinese-Guanaco$ python3 qlora_int8_finetune.py --model_name_or_path /data-ssd-1t/hf_model/llama-7b-hf --data_path tatsu-lab/alpaca --output_dir work_dir_lora/ --num_train_epochs 3 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --evaluation_strategy "no" --save_strategy "steps" --save_steps 500 --save_total_limit 5 --learning_rate 1e-4 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --model_max_length 2048 --logging_steps 1
[2023-06-11 00:48:41,928] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Traceback (most recent call last):
File "/home/ub2004/llm_dev/Chinese-Guanaco/qlora_int8_finetune.py", line 338, in
train(load_in_8bit=True)
File "/home/ub2004/llm_dev/Chinese-Guanaco/qlora_int8_finetune.py", line 234, in train
model = AutoModelForCausalLM.from_pretrained(
File "/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
return model_class.from_pretrained(
File "/home/ub2004/anaconda3/envs/gh_Chinese-Guanaco/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2819, in from_pretrained
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.

(gh_Chinese-Guanaco) ub2004@ub2004-B85M-A0:~/llm_dev/Chinese-Guanaco$

What's the real LICENSE?

We release the resources associated with QLoRA finetuning in this repository under MIT license. In addition, we release the Guanaco model family for base LLaMA model sizes of 7B, 13B, 33B, and 65B. These models are intended for purposes in line with the LLaMA license and require access to the LLaMA models.

but the LICENSE file says it's Apache License 2.0.

ValueError: Undefined dataset tatsu-lab/alpaca

(yk_py39) amd00@MZ32-00:~/llm_dev/Efficient-Tuning-LLMs$ python train_qlora.py --model_name_or_path /home/amd00/hf_model/llama-7b --output_dir ./out-llama-7b --dataset_name tatsu-lab/alpaca
[2023-08-03 17:54:26,303] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/home/amd00/llm_dev/Efficient-Tuning-LLMs/train_qlora.py", line 102, in
main()
File "/home/amd00/llm_dev/Efficient-Tuning-LLMs/train_qlora.py", line 31, in main
data_args.init_for_training()
File "/home/amd00/llm_dev/Efficient-Tuning-LLMs/chatllms/configs/data_args.py", line 95, in init_for_training
raise ValueError('Undefined dataset {} in {}'.format(
ValueError: Undefined dataset tatsu-lab/alpaca in /home/amd00/llm_dev/Efficient-Tuning-LLMs/chatllms/configs/../../data/dataset_info.yaml
(yk_py39) amd00@MZ32-00:~/llm_dev/Efficient-Tuning-LLMs$

QLoRA fine-tuning on alpaca_data.json fails: 'padding_value' (position 3) must be float, not NoneType

@jianzhnie Thanks for your work! I ran into a problem while reproducing the code and would like to ask for your help: when I call scripts/finetune_llama_7b_alpaca_zh.sh to fine-tune on the alpaca_data.json dataset, the following error occurs. What could be the problem?

......
File "/home/likanxue1/.custom/cuda11.3.1-cudnn8-devel-ubuntu20.04-py38-jupyter/envs/tuning-llm/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/home/likanxue1/model_inference/transformers/src/transformers/trainer_utils.py", line 772, in call
return self.data_collator(features)
File "/home/likanxue1/model_inference/Efficient-Tuning-LLMs/chatllms/data/sft_dataset.py", line 214, in call
input_ids = pad_sequence(input_ids,
File "/home/likanxue1/.custom/cuda11.3.1-cudnn8-devel-ubuntu20.04-py38-jupyter/envs/tuning-llm/lib/python3.10/site-packages/torch/nn/utils/rnn.py", line 399, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
TypeError: pad_sequence(): argument 'padding_value' (position 3) must be float, not NoneType

My /data/alpaca_zh_pcyn.yaml contains only the following lines:
alpaca:
hf_hub_url: ''
local_path: /home/likanxue1/Dataset/alpaca_data.json
dataset_format: alpaca
multi_turn: False

Looking forward to your reply. Sorry for the trouble, and thank you!

Question about labels in data preprocessing

Hello, I have a question about your data preprocessing code: the labels correspond directly to the inputs. Shouldn't the labels be shifted by one position?

The merge code has another bug: a variable is missing

I just found another problem: a missing argument causes the model-saving logic not to take effect. The code is as follows:
parser.add_argument('--load_8bit', type=bool, default=False)

args = parser.parse_args()

apply_lora(args.base_model_path, args.lora_path, args.load_8bit, args.target_model_path,
           args.save_target_model)

Error loading the fine-tuned Llama-2-7b model

After fine-tuning the model, I ran the following command for inference testing and got an error:
---- Command ----
python gradio_webserver.py
--model_name_or_path model_inference/model_path/Llama-2-7b-chat-ms
--lora_model_name_or_path ~/model_inference/model_path/checkpoint-344

---- Error ----
Loading the LoRA adapter from model_inference/model_path/checkpoint-344
Traceback (most recent call last):
File "model_inference/Efficient-Tuning-LLMs/chatllms/utils/apply_lora.py", line 90, in
apply_lora(base_model_path=args.base_model_path,
File "model_inference/Efficient-Tuning-LLMs/chatllms/utils/apply_lora.py", line 69, in apply_lora
model: PreTrainedModel = PeftModel.from_pretrained(base_model,
File "model_inference/peft/src/peft/peft_model.py", line 304, in from_pretrained
config = PEFT_TYPE_TO_CONFIG_MAPPING[
File "~/model_inference/peft/src/peft/config.py", line 134, in from_pretrained
config = config_cls(**kwargs)
TypeError: LoraConfig.init() got an unexpected keyword argument 'loftq_config'

How can this problem be resolved?

baichuan-7B: AttributeError: 'CastOutputToFloat' object has no attribute 'weight'

chatllms - INFO - Adding special tokens.
Using pad_token, but it is not set yet.
Traceback (most recent call last):
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/train_qlora.py", line 156, in
main()
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/train_qlora.py", line 80, in main
add_special_tokens_if_missing(tokenizer, model)
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/chatllms/utils/model_utils.py", line 47, in add_special_tokens_if_missing
smart_tokenizer_and_embedding_resize(special_tokens_dict, tokenizer,
File "/content/drive/MyDrive/Efficient-Tuning-LLMs/chatllms/utils/model_utils.py", line 77, in smart_tokenizer_and_embedding_resize
model.resize_token_embeddings(len(tokenizer))
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1395, in resize_token_embeddings
model_embeds = self._resize_token_embeddings(new_num_tokens)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1416, in _resize_token_embeddings
new_lm_head = self._get_resized_lm_head(old_lm_head, new_num_tokens)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1520, in _get_resized_lm_head
old_lm_head.weight.size() if not transposed else old_lm_head.weight.t().size()
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CastOutputToFloat' object has no attribute 'weight'

Could you tell me what the problem is?

Fix for the merge bug; verification results coming shortly

if target_model_path is not None:
print(f'Saving the target model to {target_model_path}')
model.save_pretrained(target_model_path)
base_tokenizer.save_pretrained(target_model_path)

A function call needs to be added here:
lora_model = lora_model.merge_and_unload()
The merge_and_unload function works when the model is loaded in 16-bit, but not in 8-bit.
Not sure whether the following call is needed:
lora_model.train(False)

utils/apply_lora.py has a small bug

apply_lora(args.base_model_path, args.target_model_path, args.lora_path,
args.save_target_model)

def apply_lora(
base_model_path: str,
lora_path: str,
load_8bit: bool = False,
target_model_path: str = None,
save_target_model: bool = False
) -> Tuple[AutoModelForCausalLM, AutoTokenizer]:
The two arguments are passed in the wrong positions.

Abnormal results during batched inference

Running inference on LLaMA2-7B with model.generate and "do_sample": False:
with batch_size=1 the results are very good,
but with batch_size>1 the results are abnormal.
Batched inference with trainer.predict also produces abnormal results.
Has anyone else run into this?

4-bit loading error

Traceback (most recent call last):
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/qlora_int4_finetune.py", line 917, in
train()
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/qlora_int4_finetune.py", line 728, in train
model = get_accelerate_model(args, checkpoint_dir)
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/qlora_int4_finetune.py", line 318, in get_accelerate_model
model = AutoModelForCausalLM.from_pretrained(
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/transformers/models/auto/auto_factory.py", line 490, in from_pretrained
return model_class.from_pretrained(
File "/data1/Semantic_team/my-chatbot/Chinese-Guanaco/transformers/modeling_utils.py", line 2765, in from_pretrained
raise ValueError(
ValueError: You are using device_map='auto' on a 4bit loaded version of the model. To automatically compute the appropriate device map, you should upgrade your accelerate library, `pip install --upgrade accelerate` or install it from source to support fp4 auto device map calculation. You may encounter unexpected behavior, or pass your own device map

I have used the latest released version of accelerate and also the source code from GitHub, but the error remains the same.

LoRA on llama-2-13b fails on a single GPU

Running on a single GPU raises "NotImplementedError: Cannot copy out of meta tensor; no data!".
Running on two GPUs works, but GPU memory usage is very high: on two 48GB A6000 cards, each uses about 35GB.

[Question] About training visualization

Thanks for your contribution to this project!

Problem description

For the fine-tuning process, does Efficient-Tuning-LLMs provide a way to visualize training progress and curves in real time?

Looking forward to your reply!

How to use your own dataset

When creating the yaml file for my own dataset, is it enough to specify only 'local_path' and 'dataset_format'? The other yaml files contain many more fields, so this is a bit unclear to me.
Thanks in advance for your help.

Is there a bug in data_utils.py?

348 train_dataset, eval_dataset = split_train_eval(
349 dataset,
350 do_eval=args.do_eval,
351 eval_dataset_size=args.eval_dataset_size,
352 max_eval_samples=args.max_eval_samples,
353 do_train=args.do_train,
354 max_train_samples=args.max_train_samples,
355 )
356 if train_dataset:
357 print('=' * 80)
358 print('loaded dataset:', dataset_name, 'train data size:',
359 len(train_dataset))
360 train_datasets.append(train_dataset)
361 if eval_datasets:
362 print('=' * 80)
363 print('loaded dataset:', dataset_name, 'eval data size:',
364 len(eval_dataset))
365 eval_datasets.append(eval_dataset)

Line 361: eval_datasets --> eval_dataset ???

About llama-2-70B fine-tuning

Appreciate your great work!

Is it possible to fine-tune llama-2-70B on a 3x8xA100 (40G) configuration? Thanks!
