
llama-factory's Introduction

# LLaMA Factory


👋 Join our WeChat.

[ English | 中文 ]

Fine-tuning a large language model can be as easy as...

tutorial_en.mp4


Features

  • Various models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
  • Integrated methods: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO and ORPO.
  • Scalable resources: 32-bit full-tuning, 16-bit freeze-tuning, 16-bit LoRA and 2/4/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8.
  • Advanced algorithms: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ and Agent tuning.
  • Practical tricks: FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA.
  • Experiment monitors: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
  • Faster inference: OpenAI-style API, Gradio UI and CLI with vLLM worker.

Benchmark

Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging the 4-bit quantization technique, LLaMA Factory's QLoRA further improves efficiency in terms of GPU memory usage.

benchmark

Definitions
  • Training Speed: the number of training samples processed per second during the training. (bs=4, cutoff_len=1024)
  • Rouge Score: Rouge-2 score on the development set of the advertising text generation task. (bs=4, cutoff_len=1024)
  • GPU Memory: Peak GPU memory usage in 4-bit quantized training. (bs=1, cutoff_len=1024)
  • We adopt pre_seq_len=128 for ChatGLM's P-Tuning and lora_rank=32 for LLaMA Factory's LoRA tuning.

Changelog

[24/04/26] We supported fine-tuning the LLaVA-1.5 multimodal LLMs. See examples/lora_single_gpu/sft_mllm.sh for usage.

[24/04/22] We provided a Colab notebook for fine-tuning the Llama-3 model on a free T4 GPU. Two Llama-3-derived models fine-tuned using LLaMA Factory are available at Hugging Face, check Llama3-8B-Chinese-Chat and Llama3-Chinese for details.

[24/04/21] We supported Mixture-of-Depths according to AstraMindAI's implementation. See examples/extras/mod for usage.

[24/04/16] We supported BAdam. See examples/extras/badam for usage.

[24/04/16] We supported unsloth's long-sequence training (Llama-2-7B-56k within 24GB). It achieves 117% speed and 50% memory usage compared with FlashAttention-2; more benchmarks can be found on this page.

Full Changelog

[24/03/31] We supported ORPO. See examples/lora_single_gpu for usage.

[24/03/21] Our paper "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models" is available at arXiv!

[24/03/20] We supported FSDP+QLoRA that fine-tunes a 70B model on 2x24GB GPUs. See examples/extras/fsdp_qlora for usage.

[24/03/13] We supported LoRA+. See examples/extras/loraplus for usage.

[24/03/07] We supported gradient low-rank projection (GaLore) algorithm. See examples/extras/galore for usage.

[24/03/07] We integrated vLLM for faster and concurrent inference. Try --infer_backend vllm to enjoy 270% inference speed. (LoRA is not yet supported; merge the weights first.)

[24/02/28] We supported weight-decomposed LoRA (DoRA). Try --use_dora to activate DoRA training.

[24/02/15] We supported block expansion proposed by LLaMA Pro. See examples/extras/llama_pro for usage.

[24/02/05] Qwen1.5 (Qwen2 beta version) series models are supported in LLaMA-Factory. Check this blog post for details.

[24/01/18] We supported agent tuning for most models, equipping the model with tool-using abilities by fine-tuning with --dataset glaive_toolcall.

[23/12/23] We supported unsloth's implementation to boost LoRA tuning for the LLaMA, Mistral and Yi models. Try the --use_unsloth argument to activate the unsloth patch. It achieves 170% speed in our benchmark; check this page for details.

[23/12/12] We supported fine-tuning the latest MoE model Mixtral 8x7B in our framework. See hardware requirement here.

[23/12/01] We supported downloading pre-trained models and datasets from the ModelScope Hub for Chinese mainland users. See this tutorial for usage.

[23/10/21] We supported NEFTune trick for fine-tuning. Try --neftune_noise_alpha argument to activate NEFTune, e.g., --neftune_noise_alpha 5.

[23/09/27] We supported $S^2$-Attn proposed by LongLoRA for the LLaMA models. Try --shift_attn argument to enable shift short attention.

[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See this example to evaluate your models.

[23/09/10] We supported FlashAttention-2. Try --flash_attn fa2 argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.

[23/08/12] We supported RoPE scaling to extend the context length of the LLaMA models. Try --rope_scaling linear argument in training and --rope_scaling dynamic argument at inference to extrapolate the position embeddings.

[23/08/11] We supported DPO training for instruction-tuned models. See this example to train your models.

[23/07/31] We supported dataset streaming. Try --streaming and --max_steps 10000 arguments to load your dataset in streaming mode.

[23/07/29] We released two instruction-tuned 13B models at Hugging Face. See these Hugging Face Repos (LLaMA-2 / Baichuan) for details.

[23/07/18] We developed an all-in-one Web UI for training, evaluation and inference. Try train_web.py to fine-tune models in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in the development.

[23/07/09] We released FastEdit ⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.

[23/06/29] We provided a reproducible example of training a chat model using instruction-following datasets, see Baichuan-7B-sft for details.

[23/06/22] We aligned the demo API with OpenAI's format so that you can use the fine-tuned model in arbitrary ChatGPT-based applications.

[23/06/03] We supported quantized training and inference (aka QLoRA). Try --quantization_bit 4/8 argument to work with quantized models.

Supported Models

| Model | Model size | Default module | Template |
| ----- | ---------- | -------------- | -------- |
| Baichuan2 | 7B/13B | W_pack | baichuan2 |
| BLOOM | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| ChatGLM3 | 6B | query_key_value | chatglm3 |
| Command-R | 35B/104B | q_proj,v_proj | cohere |
| DeepSeek (MoE) | 7B/16B/67B | q_proj,v_proj | deepseek |
| Falcon | 7B/40B/180B | query_key_value | falcon |
| Gemma/CodeGemma | 2B/7B | q_proj,v_proj | gemma |
| InternLM2 | 7B/20B | wqkv | intern2 |
| LLaMA | 7B/13B/33B/65B | q_proj,v_proj | - |
| LLaMA-2 | 7B/13B/70B | q_proj,v_proj | llama2 |
| LLaMA-3 | 8B/70B | q_proj,v_proj | llama3 |
| LLaVA-1.5 | 7B/13B | q_proj,v_proj | vicuna |
| Mistral/Mixtral | 7B/8x7B/8x22B | q_proj,v_proj | mistral |
| OLMo | 1B/7B | q_proj,v_proj | - |
| Phi-1.5/2 | 1.3B/2.7B | q_proj,v_proj | - |
| Phi-3 | 3.8B | qkv_proj | phi |
| Qwen | 1.8B/7B/14B/72B | c_attn | qwen |
| Qwen1.5 (Code/MoE) | 0.5B/1.8B/4B/7B/14B/32B/72B/110B | q_proj,v_proj | qwen |
| StarCoder2 | 3B/7B/15B | q_proj,v_proj | - |
| XVERSE | 7B/13B/65B | q_proj,v_proj | xverse |
| Yi | 6B/9B/34B | q_proj,v_proj | yi |
| Yuan | 2B/51B/102B | q_proj,v_proj | yuan |

Note

The default module is used for the --lora_target argument. You can use --lora_target all to specify all the available modules for better convergence.

For the "base" models, the --template argument can be chosen from default, alpaca, vicuna, etc. But make sure to use the corresponding template for the "instruct/chat" models.

Remember to use the SAME template in training and inference.

Please refer to constants.py for a full list of the models we support.

You can also add a custom chat template to template.py.
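For illustration, a LoRA fine-tuning run on an instruct model might combine both arguments as follows. This is a sketch only; the dataset name and exact flags are examples and may differ between versions, so see examples/README.md for the maintained scripts:

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --dataset alpaca_gpt4_en \
    --template llama3 \
    --finetuning_type lora \
    --lora_target all \
    --output_dir path_to_sft_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --fp16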

Supported Training Approaches

| Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA |
| -------- | ----------- | ------------- | ---- | ----- |
| Pre-Training | ✅ | ✅ | ✅ | ✅ |
| Supervised Fine-Tuning | ✅ | ✅ | ✅ | ✅ |
| Reward Modeling | ✅ | ✅ | ✅ | ✅ |
| PPO Training | ✅ | ✅ | ✅ | ✅ |
| DPO Training | ✅ | ✅ | ✅ | ✅ |
| ORPO Training | ✅ | ✅ | ✅ | ✅ |

Provided Datasets

Pre-training datasets
Supervised fine-tuning datasets
Preference datasets

Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Requirement

| Mandatory | Minimum | Recommend |
| --------- | ------- | --------- |
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
| transformers | 4.37.2 | 4.39.3 |
| datasets | 2.14.3 | 2.18.0 |
| accelerate | 0.27.2 | 0.28.0 |
| peft | 0.9.0 | 0.10.0 |
| trl | 0.8.1 | 0.8.1 |

| Optional | Minimum | Recommend |
| -------- | ------- | --------- |
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
| bitsandbytes | 0.39.0 | 0.43.0 |
| flash-attn | 2.3.0 | 2.5.6 |

Hardware Requirement

* estimated

| Method | Bits | 7B | 13B | 30B | 70B | 110B | 8x7B | 8x22B |
| ------ | ---- | -- | --- | --- | --- | ---- | ---- | ----- |
| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
| Full | 16 | 60GB | 120GB | 300GB | 600GB | 900GB | 400GB | 1200GB |
| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 360GB | 160GB | 400GB |
| LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 240GB | 120GB | 320GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 140GB | 60GB | 160GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 72GB | 30GB | 96GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 48GB | 18GB | 48GB |

Getting Started

Data Preparation

Please refer to data/README.md for details about the dataset file format. You can either use datasets from the Hugging Face / ModelScope hub or load datasets from local disk.

Note

Please update data/dataset_info.json to use your custom dataset.
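As a sketch, a custom alpaca-style dataset could be registered in data/dataset_info.json with an entry like the one below; the dataset name, file name and column mapping are illustrative, so check data/README.md for the authoritative schema:

"my_dataset": {
  "file_name": "my_dataset.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output"
  }
}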

Dependency Installation

git clone https://github.com/hiyouga/LLaMA-Factory.git
conda create -n llama_factory python=3.10
conda activate llama_factory
cd LLaMA-Factory
pip install -e .[metrics]

Extra dependencies available: deepspeed, metrics, galore, badam, vllm, bitsandbytes, gptq, awq, aqlm, qwen, modelscope, quality
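For example, several extras can be installed at once (illustrative):

pip install -e .[metrics,deepspeed,bitsandbytes]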

For Windows users

If you want to enable quantized LoRA (QLoRA) on the Windows platform, you need to install a pre-built version of the bitsandbytes library, which supports CUDA 11.1 to 12.2. Please select the appropriate release based on your CUDA version.

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl

To enable FlashAttention-2 on the Windows platform, you need to install the precompiled flash-attn library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from flash-attention based on your requirements.

Train with LLaMA Board GUI (powered by Gradio)

Important

The LLaMA Board GUI only supports training on a single GPU; please use the CLI for distributed training.

Use local environment

export CUDA_VISIBLE_DEVICES=0 # `set CUDA_VISIBLE_DEVICES=0` for Windows
export GRADIO_SERVER_PORT=7860 # `set GRADIO_SERVER_PORT=7860` for Windows
python src/train_web.py # or python -m llmtuner.webui.interface
For Alibaba Cloud users

If you encounter display problems in LLaMA Board on Alibaba Cloud, try using the following command to set environment variables before starting LLaMA Board:

export GRADIO_ROOT_PATH=/${JUPYTER_NAME}/proxy/7860/

Use Docker

docker build -f ./Dockerfile -t llama-factory:latest .
docker run --gpus=all \
    -v ./hf_cache:/root/.cache/huggingface/ \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -e CUDA_VISIBLE_DEVICES=0 \
    -p 7860:7860 \
    --shm-size 16G \
    --name llama_factory \
    -d llama-factory:latest

Use Docker Compose

docker compose -f ./docker-compose.yml up -d
Details about volumes
  • hf_cache: Use the Hugging Face cache on the host machine. It can be reassigned if a cache already exists in a different directory.
  • data: Place datasets in this directory on the host machine so that they can be selected in the LLaMA Board GUI.
  • output: Set the export dir to this location so that the merged result can be accessed directly on the host machine.

Train with Command Line Interface

See examples/README.md for usage.

Use python src/train_bash.py -h to display arguments description.

Deploy with OpenAI-style API and vLLM

CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 python src/api_demo.py \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --template llama3 \
    --infer_backend vllm \
    --vllm_enforce_eager
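Once the server is up, any OpenAI-compatible client can talk to it. A minimal curl request might look like the following; the path assumes the standard OpenAI chat completions format, and the host, port and model name should be adjusted to your deployment:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'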

Download from ModelScope Hub

If you have trouble with downloading models and datasets from Hugging Face, you can use ModelScope.

export USE_MODELSCOPE_HUB=1 # `set USE_MODELSCOPE_HUB=1` for Windows

Train the model by specifying a model ID of the ModelScope Hub as the --model_name_or_path. You can find a full list of model IDs at ModelScope Hub, e.g., LLM-Research/Meta-Llama-3-8B-Instruct.
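For example (illustrative), the Web demo could then be started directly with a ModelScope model ID:

CUDA_VISIBLE_DEVICES=0 python src/web_demo.py \
    --model_name_or_path LLM-Research/Meta-Llama-3-8B-Instruct \
    --template llama3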

Projects using LLaMA Factory

If you have a project that should be incorporated, please contact us via email or create a pull request.

  1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [arxiv]
  2. Yu et al. Open, Closed, or Small Language Models for Text Classification? 2023. [arxiv]
  3. Wang et al. UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language. 2023. [arxiv]
  4. Luceri et al. Leveraging Large Language Models to Detect Influence Campaigns in Social Media. 2023. [arxiv]
  5. Zhang et al. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. 2023. [arxiv]
  6. Wang et al. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs. 2024. [arxiv]
  7. Wang et al. CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning. 2024. [arxiv]
  8. Choi et al. FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. 2024. [arxiv]
  9. Zhang et al. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. 2024. [arxiv]
  10. Lyu et al. KnowTuning: Knowledge-aware Fine-tuning for Large Language Models. 2024. [arxiv]
  11. Yang et al. LaCo: Large Language Model Pruning via Layer Collapse. 2024. [arxiv]
  12. Bhardwaj et al. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. 2024. [arxiv]
  13. Yang et al. Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models. 2024. [arxiv]
  14. Yi et al. Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. 2024. [arxiv]
  15. Cao et al. Head-wise Shareable Attention for Large Language Models. 2024. [arxiv]
  16. Zhang et al. Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages. 2024. [arxiv]
  17. Kim et al. Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [arxiv]
  18. Yu et al. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. 2024. [arxiv]
  19. Huang et al. Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning. 2024. [arxiv]
  20. Duan et al. Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization. 2024. [arxiv]
  21. Xie and Schwertfeger. Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs. 2024. [arxiv]
  22. Zhang et al. EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling. 2024. [arxiv]
  23. Weller et al. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. 2024. [arxiv]
  24. Hongbin Na. CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering. 2024. [arxiv]
  25. Zan et al. CodeS: Natural Language to Code Repository via Multi-Layer Sketch. 2024. [arxiv]
  26. Liu et al. Extensive Self-Contrast Enables Feedback-Free Language Model Alignment. 2024. [arxiv]
  27. Luo et al. BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models. 2024. [arxiv]
  28. Du et al. Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. 2024. [arxiv]
  29. Liu et al. Dynamic Generation of Personalities with Large Language Models. 2024. [arxiv]
  30. StarWhisper: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
  31. DISC-LawLLM: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge.
  32. Sunsimiao: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
  33. CareGPT: A series of large language models for Chinese medical domain, based on LLaMA2-7B and Baichuan-13B.
  34. MachineMindset: A series of MBTI Personality large language models, capable of giving any LLM 16 different personality types based on different datasets and training methods.

License

This repository is licensed under the Apache-2.0 License.

Please follow the model licenses to use the corresponding model weights: Baichuan2 / BLOOM / ChatGLM3 / Command-R / DeepSeek / Falcon / Gemma / InternLM2 / LLaMA / LLaMA-2 / LLaVA-1.5 / LLaMA-3 / Mistral / OLMo / Phi-1.5/2 / Phi-3 / Qwen / StarCoder2 / XVERSE / Yi / Yuan

Citation

If this work is helpful, please kindly cite as:

@article{zheng2024llamafactory,
  title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
  author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Yongqiang Ma},
  journal={arXiv preprint arXiv:2403.13372},
  year={2024},
  url={http://arxiv.org/abs/2403.13372}
}

Acknowledgement

This repo benefits from PEFT, TRL, QLoRA and FastChat. Thanks for their wonderful works.

Star History

Star History Chart

llama-factory's People

Contributors

0xez, a-cepheus, anvie, beat4ocean, billvsme, buaadreamer, codemayq, fenglui, gitycc, hannlp, hiyouga, jessytsu1, johannhartmann, khazic, ledzy, liuyanyi, marko1616, mlinmg, mmrbun, mrhan1993, s3studio, samge0, shanetian, slidersun, stephen2s, tastelikefeet, tsumugii24, wangxingjun778, xd2333, yhyu13


llama-factory's Issues

What conversation data was used for baichuan-7b-sft?

As the title says: I tried the baichuan-7b-sft model open-sourced yesterday and it feels quite good. May I ask what conversation data was used during training? Would you be willing to make it public? Thanks!

Training error

ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on.

QLoRA training error

int4 error: RuntimeError: self and mat2 must have the same dtype
Training arguments:
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path /models/bloomz-7b1-mt \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --quantization_bit 4 \
    --output_dir bloomz_lora \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16

With the above arguments, training works normally if --quantization_bit is set to 8.
Device: RTX 3080

load_valuehead_params: the file value_head.bin does not exist

Hi experts, the following steps raise an error. Has anyone run into this?
LLaMA train
(Continual) pre-training:
CUDA_VISIBLE_DEVICES=0 python src/train_pt.py \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset wiki_demo \
    --finetuning_type lora \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

The directory path_to_pt_checkpoint does not contain the file value_head.bin.
When training the RM model:
CUDA_VISIBLE_DEVICES=0 python src/train_rm.py \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_pt_checkpoint \
    --output_dir path_to_rm_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
The error says the directory path_to_pt_checkpoint does not contain the file value_head.bin.
I have never seen this file anywhere.

Reward model training exception

Traceback (most recent call last):
File "src/train_rm.py", line 77, in
main()
File "src/train_rm.py", line 25, in main
model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="rm")
File "/baichuan-7B/train/LLaMA-Efficient-Tuning/src/utils/common.py", line 217, in load_pretrained
model = _init_adapter(model, model_args, finetuning_args, is_trainable, is_mergeable)
File "/baichuan-7B/train/LLaMA-Efficient-Tuning/src/utils/common.py", line 135, in _init_adapter
model = get_peft_model(model, lora_config)
File "/root/miniconda3/envs/baichuan/lib/python3.8/site-packages/peft/mapping.py", line 120, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
File "/root/miniconda3/envs/baichuan/lib/python3.8/site-packages/peft/peft_model.py", line 662, in init
super().init(model, peft_config, adapter_name)
File "/root/miniconda3/envs/baichuan/lib/python3.8/site-packages/peft/peft_model.py", line 99, in init
self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
File "/root/miniconda3/envs/baichuan/lib/python3.8/site-packages/peft/tuners/lora.py", line 154, in init
self.add_adapter(adapter_name, self.peft_config[adapter_name])
File "/root/miniconda3/envs/baichuan/lib/python3.8/site-packages/peft/tuners/lora.py", line 161, in add_adapter
self._find_and_replace(adapter_name)
File "/root/miniconda3/envs/baichuan/lib/python3.8/site-packages/peft/tuners/lora.py", line 254, in _find_and_replace
raise ValueError(
ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

[Question] Several questions about continual pre-training

Hello author, I have a few questions about continual pre-training with baichuan-7B:

CUDA_VISIBLE_DEVICES=0 python src/train_pt.py \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset wiki_demo \
    --finetuning_type lora \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Specifically:

  • Does finetuning_type lora mean that only the LoRA weights are trained during pre-training?
  • What are the hardware requirements? Would 8x V100 or 8x 3090 be enough?
  • What is the pre-training task?
  • Should the format of a custom dataset follow wiki_demo?
  • Do any other arguments need to be changed? I currently have about 40,000-50,000 samples for self-supervised training; how many epochs would be appropriate?

SFT full parameter finetuning - Unable to load the model

I have finetuned LLaMa 7B with full parameters using the following command

deepspeed src/train_sft.py --model_name_or_path huggyllama/llama-7b --do_train --dataset dummy_identity --finetuning_type full --output_dir output/sft-dummy-v1 --overwrite_cache --per_device_train_batch_size 4 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --logging_steps 10 --save_steps 1000 --learning_rate 5e-5 --num_train_epochs 3.0 --plot_loss --fp16 --deepspeed /root/bud-conv/finetune/configs/ds_config.json

How do I run this in the CLI? When I try the command

python src/cli_demo.py --model_name_or_path huggyllama/llama-7b --checkpoint_dir output/sft-dummy-v1/

I'm getting this

ValueError: The given checkpoint may be not a LoRA checkpoint, please specify --finetuning_type full/freeze instead.

When I specify the finetuning type

python src/cli_demo.py --model_name_or_path huggyllama/llama-7b --checkpoint_dir output/sft-dummy-v1/ --finetuning_type full

I'm getting a shape error as below

image

Hi, how can I use DeepSpeed for multi-GPU training? After adding the DeepSpeed config, it fails to run.

│ 771 │ │ if self.distributed_type == DistributedType.DEEPSPEED:
│ ❱ 772 │ │ │ config = self.deepspeed_plugin.deepspeed_config
│ 773 │ │ │ if config.get("fp16", {}).get("enabled", False):
│ 774 │ │ │ │ mixed_precision = "fp16"
│ 775 │ │ │ elif config.get("bf16", {}).get("enabled", False):
╰──────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'deepspeed_config'

OpenAI API compatible api_demo support

Could you add an API demo that is fully compatible with the OpenAI API? That way we could use most existing front-ends, such as chatbot-ui, chatgpt-next, etc.

webui ๅชๅŠ ่ฝฝZiya 13B๏ผŒๆŽจ็†็š„ๆ—ถๅ€™ๆŠฅ RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

้”™่ฏฏ๏ผš

Traceback (most recent call last):
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
  File "/home/hysz/AI/LLaMA-Efficient-Tuning/src/web_demo.py", line 99, in predict
    generation_output = model.generate(input_ids=input_ids, **gen_kwargs)
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/generation/utils.py", line 1568, in generate
    return self.sample(
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/generation/utils.py", line 2651, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
06/03/2023 13:48:06 - INFO - httpx - HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 500 Internal Server Error"
06/03/2023 13:48:06 - INFO - httpx - HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"
06/03/2023 13:48:07 - INFO - httpx - HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 200 OK"
06/03/2023 13:48:07 - INFO - httpx - HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"

PPO training error: Tensors must be CUDA and dense

Error:

Assistant:<s>
(The same traceback is printed by each of the worker processes; shown once below.)
Traceback (most recent call last):
  File "/tmp/cct/src/train_ppo.py", line 82, in <module>
    main()
  File "/tmp/cct/src/train_ppo.py", line 55, in main
    ppo_trainer = PPOPeftTrainer(
  File "/tmp/cct/src/utils/ppo.py", line 72, in __init__
    PPOTrainer.__init__(self, **kwargs)
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 290, in __init__
    ) = self.accelerator.prepare(
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/accelerate/accelerator.py", line 1182, in prepare
    result = tuple(
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/accelerate/accelerator.py", line 1183, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/accelerate/accelerator.py", line 1022, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/accelerate/accelerator.py", line 1275, in prepare_model
    model = torch.nn.parallel.DistributedDataParallel(
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 676, in __init__
    _sync_module_states(
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/torch/distributed/utils.py", line 142, in _sync_module_states
    _sync_params_and_buffers(
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/torch/distributed/utils.py", line 160, in _sync_params_and_buffers
    dist._broadcast_coalesced(
RuntimeError: Tensors must be CUDA and dense
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2344665 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2344666) of binary: /root/miniconda3/envs/ppo/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/ppo/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/accelerate/commands/launch.py", line 932, in launch_command
    multi_gpu_launcher(args)
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/accelerate/commands/launch.py", line 627, in multi_gpu_launcher
    distrib_run.run(args)
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ppo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
src/train_ppo.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-06-13_10:49:40
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 2344667)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-06-13_10:49:40
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 2344668)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-06-13_10:49:40
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2344666)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Command:

accelerate launch src/train_ppo.py \
    --model_name_or_path llama-hf/ \
    --do_train \
    --dataset CCT \
    --quantization_bit 4 \
    --checkpoint_dir sft/checkpoint-3000 \
    --reward_model rm \
    --output_dir ppo \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 2.0 \
    --resume_lora_training False \
    --plot_loss

OOM when fine-tuning 13B BELLE on two RTX 4090s, but not on a single GPU

Single-GPU fine-tuning does not hit this issue, but multi-GPU does. However, one of my GPUs already has 15 GB of VRAM occupied, leaving about 8 GB, so I am effectively fine-tuning with 8 GB + 24 GB across the two cards. Would such a setup indeed cause problems, or is it a configuration issue on my side?

Catastrophic forgetting after training

Hi, after fine-tuning, the model loses all of its previous capabilities and can only perform the tasks from the fine-tuning data. How can this be solved? I used the default configuration; the model is ziya-llama, with 10,000 samples and 3 epochs.

Loading the original 13B model via webui.py with 8-bit quantization raises an error

Loading the original 13B model via webui.py with 8-bit quantization raises an error. The command executed is:
python src/web_demo.py --model_name_or_path ../models/Ziya-LLaMA-13B --quantization_bit 8
The error message is as follows:

Traceback (most recent call last):
  File "/home/hysz/AI/LLaMA-Efficient-Tuning/src/web_demo.py", line 18, in <module>
    model, tokenizer = load_pretrained(model_args, finetuning_args)
  File "/home/hysz/AI/LLaMA-Efficient-Tuning/src/utils/common.py", line 182, in load_pretrained
    model = model.half() # cast all params to float16 for inference
  File "/home/hysz/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1896, in half
    raise ValueError(
ValueError: `.half()` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been casted to the correct `dtype`.

column names don't match, An error occurred while generating the dataset

May I have some hints on how to solve this question, please:
image

The details: I want to use a dataset format like this in a JSON file:
image
Then I added the dataset info to dataset_info.json like this:
image
My files are organized like this:
-baichuan
--baichuan-7B
---baichuan-7B
--LLaMA-Efficient-Tuning
---data
----alpaca4zh.json
image
The training command:
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path /root/baichuan/baichuan-7B/baichuan-7B \
    --do_train \
    --dataset alpaca4zh \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target W_pack \
    --output_dir alpaca_baichuan \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 3.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16
The bug:
image

ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

The train_sft.py training command:
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path /data1/projects/baichuan-7B/ \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --output_dir output \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Training error: ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

Does anyone know how to solve this? Thanks!

Training with QLoRA exits with an error

The training script is as follows:

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path /xx/model/model_weights/Ziya-LLaMA-13B \
    --do_train \
    --dataset xx \
    --finetuning_type lora \
    --output_dir /xx/output \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-3 \
    --num_train_epochs 10.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16 \
    --quantization_bit 4

The error message is as follows:

Traceback (most recent call last):
  File "/xxx/src/train_sft.py", line 97, in <module>
    main()
  File "/xxx/src/train_sft.py", line 69, in main
    train_result = trainer.train()
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1638, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1923, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 2733, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
    return model_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 570, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 566, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (724x5120 and 1x13107200)
  0%|                                                                                                                | 0/30 [00:00<?, ?it/s]

Error when fine-tuning the Baichuan model: Target modules ['q_proj', 'v_proj'] not found in the base model

Is the baichuan-7B base model not supported yet, or is there a problem with my PEFT version?

Traceback (most recent call last):
File "/home/jerome/github/LLaMA-Efficient-Tuning/src/train_sft.py", line 98, in
main()
File "/home/jerome/github/LLaMA-Efficient-Tuning/src/train_sft.py", line 26, in main
model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")
File "/home/jerome/github/LLaMA-Efficient-Tuning/src/utils/common.py", line 216, in load_pretrained
model = _init_adapter(model, model_args, finetuning_args, is_trainable, is_mergeable)
File "/home/jerome/github/LLaMA-Efficient-Tuning/src/utils/common.py", line 133, in _init_adapter
model = get_peft_model(model, lora_config)
File "/home/jerome/anaconda3/envs/left/lib/python3.10/site-packages/peft/mapping.py", line 120, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
File "/home/jerome/anaconda3/envs/left/lib/python3.10/site-packages/peft/peft_model.py", line 662, in init
super().init(model, peft_config, adapter_name)
File "/home/jerome/anaconda3/envs/left/lib/python3.10/site-packages/peft/peft_model.py", line 99, in init
self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
File "/home/jerome/anaconda3/envs/left/lib/python3.10/site-packages/peft/tuners/lora.py", line 154, in init
self.add_adapter(adapter_name, self.peft_config[adapter_name])
File "/home/jerome/anaconda3/envs/left/lib/python3.10/site-packages/peft/tuners/lora.py", line 161, in add_adapter
self._find_and_replace(adapter_name)
File "/home/jerome/anaconda3/envs/left/lib/python3.10/site-packages/peft/tuners/lora.py", line 254, in _find_and_replace
raise ValueError(
ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

ๅฏไปฅๆไพ›ไธ€ไธชๅฏไปฅๅ‚่€ƒ็š„็š„accelerate config_fileไนˆ...accelerateไธ€็›ดๅฏๅŠจไธ่ตทๆฅ

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false

src/train_sft.py \
    --model_name_or_path /models/Ziya-LLaMA-13B-Pretrain-v1/ \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --output_dir sft_save_model_checkpoint_V2 \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --resume_lora_training False \
    --plot_loss \
    --max_source_length 1200 \
    --max_target_length 768 \
    --fp16

RM training error

Loading checkpoint shards:  71%|█████████████████████████████████▌              | 5/7 [01:51<00:45, 22.58s/it]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 215299 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 215300 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 215301 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 215298) of binary: /root/miniconda3/envs/xray/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/xray/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/accelerate/commands/launch.py", line 909, in launch_command
    multi_gpu_launcher(args)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/accelerate/commands/launch.py", line 604, in multi_gpu_launcher
    distrib_run.run(args)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
src/train_rm.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-06-07_14:46:15
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 215298)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 215298
============================================================

The training command is:

accelerate launch src/train_rm.py \
    --model_name_or_path llama-hf/33b-hf/llama-33b-hf \
    --do_train \
    --dataset comparison_gpt4_zh \
    --finetuning_type lora \
    --output_dir rm \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

This is the most complete LLM training codebase I have seen so far

This codebase covers pre-training and the RLHF pipeline, as well as LoRA and QLoRA. It is really comprehensive.
It would be even better if multi-turn conversation construction were supported: for example, given [q1, a1, q2, a2, q3, a3], build the training sample as prompt: q1*[IGNORE_INDEX]+a1 + q2*[IGNORE_INDEX]+a2 + q3*[IGNORE_INDEX], response: a3 (see the sketch below).
That would make it even better, haha.
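A minimal sketch of the label masking described above, assuming a Hugging Face tokenizer and the usual IGNORE_INDEX of -100. The helper name is hypothetical and this is not LLaMA Factory's actual implementation; it supervises every answer turn, which generalizes the [q1, a1, ..., q3, a3] example:

IGNORE_INDEX = -100

def build_multiturn_example(tokenizer, turns):
    # turns: list of (question, answer) pairs; question tokens are masked out of the loss
    input_ids, labels = [], []
    for question, answer in turns:
        q_ids = tokenizer.encode(question, add_special_tokens=False)
        a_ids = tokenizer.encode(answer, add_special_tokens=False) + [tokenizer.eos_token_id]
        input_ids += q_ids + a_ids
        labels += [IGNORE_INDEX] * len(q_ids) + a_ids  # only answer tokens keep their labels
    return {"input_ids": input_ids, "labels": labels}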

PPO and RM stages produce the same error when training with accelerate

Below is the error log from the PPO stage; for RM, this error can be avoided by not using accelerate multi-GPU training:

  1. "transformers_version": "4.29.2"
  2. Training error:
[INFO|modeling_utils.py:2513] 2023-06-08 10:26:34,951 >> loading weights file llama-hf/33b-hf/llama-33b-hf/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1154] 2023-06-08 10:26:34,952 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:577] 2023-06-08 10:26:34,953 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.29.2"
}

Loading checkpoint shards:  71%|█████████████████████████████████▌              | 5/7 [01:46<00:43, 21.78s/it]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517201 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517202 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 517204 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 2 (pid: 517203) of binary: /root/miniconda3/envs/xray/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/xray/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/accelerate/commands/launch.py", line 909, in launch_command
    multi_gpu_launcher(args)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/accelerate/commands/launch.py", line 604, in multi_gpu_launcher
    distrib_run.run(args)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
src/train_ppo.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-06-08_10:28:37
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 2 (local_rank: 2)
  exitcode  : -9 (pid: 517203)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 517203
============================================================
  3. My system memory:
              total        used        free      shared  buff/cache   available
Mem:          503Gi       301Gi       198Gi        32Mi       3.2Gi       199Gi
Swap:            0B          0B          0B
  4. Training command:
accelerate launch src/train_ppo.py \
    --model_name_or_path llama-hf/33b-hf/llama-33b-hf \
    --do_train \
    --dataset CCT \
    --finetuning_type lora \
    --checkpoint_dir sft/ \
    --reward_model rm/ \
    --output_dir ppo \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 2.0 \
    --resume_lora_training False \
    --plot_loss

ๅ…ณไบŽๅ•ๆœบๅคšๅก่ฎญ็ปƒ้—ฎ้ข˜

Hi, how can I shard the model parameters across multiple GPUs for training, instead of loading the full set of parameters on every card?
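
Not from the original thread, but one common way to shard parameters, gradients, and optimizer states across GPUs is DeepSpeed ZeRO-3. The sketch below writes a minimal ZeRO stage-3 config; passed to the HF Trainer through its deepspeed training argument, the "auto" values are filled in from the other training arguments. Whether this repo's scripts expose that argument depends on the version, so treat it as an assumption to verify.

# Sketch of a DeepSpeed ZeRO stage-3 config (shards parameters, gradients and optimizer states).
import json

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
# The resulting ds_config.json can then be referenced from the training arguments.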

Abnormal inference when loading the chinese-alpaca-plus-13b model

python src/cli_demo.py --model_name_or_path xxx --prompt_template alpaca
xxx is the directory of the merged chinese-alpaca-plus-13b model. The model itself is confirmed to be fine; inference with the official chinese-alpaca cpp files works without issues.

ValueError: Target module BloomMLP is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.

The command executed:

#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../src/train_rm.py \
    --model_name_or_path golaxy/gogpt-560m \
    --do_train \
    --dataset_dir ../data \
    --dataset comparison_gpt4_en,comparison_gpt4_zh,hh_rlhf_en \
    --finetuning_type lora \
    --lora_target query_key_value,dense,mlp \
    --output_dir ./results/rm \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16 \
    --seed 123456

Error:

โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ /home/user/Desktop/pythonProject/LLaMA-Efficient-Tuning/examples/../src/train_rm.py:74 in    โ”‚
โ”‚ <module>                                                                                     โ”‚
โ”‚                                                                                              โ”‚
โ”‚   71                                                                                         โ”‚
โ”‚   72                                                                                         โ”‚
โ”‚   73 if __name__ == "__main__":                                                              โ”‚
โ”‚ โฑ 74 โ”‚   main()                                                                              โ”‚
โ”‚   75                                                                                         โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/Desktop/pythonProject/LLaMA-Efficient-Tuning/examples/../src/train_rm.py:24 in    โ”‚
โ”‚ main                                                                                         โ”‚
โ”‚                                                                                              โ”‚
โ”‚   21 โ”‚   # Prepare pretrained model and dataset                                              โ”‚
โ”‚   22 โ”‚   model_args, data_args, training_args, finetuning_args = prepare_args(stage="rm")    โ”‚
โ”‚   23 โ”‚   dataset = prepare_data(model_args, data_args)                                       โ”‚
โ”‚ โฑ 24 โ”‚   model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_tr โ”‚
โ”‚   25 โ”‚   dataset = preprocess_data(dataset, tokenizer, data_args, training_args, stage="rm") โ”‚
โ”‚   26 โ”‚   data_collator = PairwiseDataCollatorWithPadding(tokenizer, model.pretrained_model)  โ”‚
โ”‚   27                                                                                         โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/Desktop/pythonProject/LLaMA-Efficient-Tuning/src/utils/common.py:186 in           โ”‚
โ”‚ load_pretrained                                                                              โ”‚
โ”‚                                                                                              โ”‚
โ”‚   183 โ”‚   โ”‚   **config_kwargs                                                                โ”‚
โ”‚   184 โ”‚   )                                                                                  โ”‚
โ”‚   185 โ”‚   model = prepare_model_for_training(model) if is_trainable else model               โ”‚
โ”‚ โฑ 186 โ”‚   model = init_adapter(model, model_args, finetuning_args, is_trainable)             โ”‚
โ”‚   187 โ”‚                                                                                      โ”‚
โ”‚   188 โ”‚   if not is_trainable:                                                               โ”‚
โ”‚   189 โ”‚   โ”‚   model.requires_grad_(False) # fix all model params                             โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/Desktop/pythonProject/LLaMA-Efficient-Tuning/src/utils/common.py:121 in           โ”‚
โ”‚ init_adapter                                                                                 โ”‚
โ”‚                                                                                              โ”‚
โ”‚   118 โ”‚   โ”‚   โ”‚   โ”‚   lora_dropout=finetuning_args.lora_dropout,                             โ”‚
โ”‚   119 โ”‚   โ”‚   โ”‚   โ”‚   target_modules=finetuning_args.lora_target                             โ”‚
โ”‚   120 โ”‚   โ”‚   โ”‚   )                                                                          โ”‚
โ”‚ โฑ 121 โ”‚   โ”‚   โ”‚   model = get_peft_model(model, lora_config)                                 โ”‚
โ”‚   122 โ”‚                                                                                      โ”‚
โ”‚   123 โ”‚   if model_args.checkpoint_dir is not None:                                          โ”‚
โ”‚   124 โ”‚   โ”‚   logger.info("Loaded fine-tuned model from checkpoint(s): {}".format(",".join(m โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/anaconda3/lib/python3.9/site-packages/peft/mapping.py:120 in get_peft_model       โ”‚
โ”‚                                                                                              โ”‚
โ”‚   117 โ”‚   โ”‚   return PeftModel(model, peft_config)                                           โ”‚
โ”‚   118 โ”‚   if isinstance(peft_config, PromptLearningConfig):                                  โ”‚
โ”‚   119 โ”‚   โ”‚   peft_config = _prepare_prompt_learning_config(peft_config, model_config)       โ”‚
โ”‚ โฑ 120 โ”‚   return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config) โ”‚
โ”‚   121                                                                                        โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/anaconda3/lib/python3.9/site-packages/peft/peft_model.py:662 in __init__          โ”‚
โ”‚                                                                                              โ”‚
โ”‚    659 โ”‚   """                                                                               โ”‚
โ”‚    660 โ”‚                                                                                     โ”‚
โ”‚    661 โ”‚   def __init__(self, model, peft_config: PeftConfig, adapter_name="default"):       โ”‚
โ”‚ โฑ  662 โ”‚   โ”‚   super().__init__(model, peft_config, adapter_name)                            โ”‚
โ”‚    663 โ”‚   โ”‚   self.base_model_prepare_inputs_for_generation = self.base_model.prepare_input โ”‚
โ”‚    664 โ”‚                                                                                     โ”‚
โ”‚    665 โ”‚   def forward(                                                                      โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/anaconda3/lib/python3.9/site-packages/peft/peft_model.py:99 in __init__           โ”‚
โ”‚                                                                                              โ”‚
โ”‚     96 โ”‚   โ”‚   self.base_model_torch_dtype = getattr(model, "dtype", None)                   โ”‚
โ”‚     97 โ”‚   โ”‚   if not isinstance(peft_config, PromptLearningConfig):                         โ”‚
โ”‚     98 โ”‚   โ”‚   โ”‚   self.peft_config[adapter_name] = peft_config                              โ”‚
โ”‚ โฑ   99 โ”‚   โ”‚   โ”‚   self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](      โ”‚
โ”‚    100 โ”‚   โ”‚   โ”‚   โ”‚   self.base_model, self.peft_config, adapter_name                       โ”‚
โ”‚    101 โ”‚   โ”‚   โ”‚   )                                                                         โ”‚
โ”‚    102 โ”‚   โ”‚   โ”‚   self.set_additional_trainable_modules(peft_config, adapter_name)          โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/anaconda3/lib/python3.9/site-packages/peft/tuners/lora.py:154 in __init__         โ”‚
โ”‚                                                                                              โ”‚
โ”‚   151 โ”‚   โ”‚   self.model = model                                                             โ”‚
โ”‚   152 โ”‚   โ”‚   self.forward = self.model.forward                                              โ”‚
โ”‚   153 โ”‚   โ”‚   self.peft_config = config                                                      โ”‚
โ”‚ โฑ 154 โ”‚   โ”‚   self.add_adapter(adapter_name, self.peft_config[adapter_name])                 โ”‚
โ”‚   155 โ”‚                                                                                      โ”‚
โ”‚   156 โ”‚   def add_adapter(self, adapter_name, config=None):                                  โ”‚
โ”‚   157 โ”‚   โ”‚   if config is not None:                                                         โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/anaconda3/lib/python3.9/site-packages/peft/tuners/lora.py:161 in add_adapter      โ”‚
โ”‚                                                                                              โ”‚
โ”‚   158 โ”‚   โ”‚   โ”‚   model_config = self.model.config.to_dict() if hasattr(self.model.config, " โ”‚
โ”‚   159 โ”‚   โ”‚   โ”‚   config = self._prepare_lora_config(config, model_config)                   โ”‚
โ”‚   160 โ”‚   โ”‚   โ”‚   self.peft_config[adapter_name] = config                                    โ”‚
โ”‚ โฑ 161 โ”‚   โ”‚   self._find_and_replace(adapter_name)                                           โ”‚
โ”‚   162 โ”‚   โ”‚   if len(self.peft_config) > 1 and self.peft_config[adapter_name].bias != "none" โ”‚
โ”‚   163 โ”‚   โ”‚   โ”‚   raise ValueError(                                                          โ”‚
โ”‚   164 โ”‚   โ”‚   โ”‚   โ”‚   "LoraModel supports only 1 adapter with bias. When using multiple adap โ”‚
โ”‚                                                                                              โ”‚
โ”‚ /home/user/anaconda3/lib/python3.9/site-packages/peft/tuners/lora.py:246 in                  โ”‚
โ”‚ _find_and_replace                                                                            โ”‚
โ”‚                                                                                              โ”‚
โ”‚   243 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   )                                                      โ”‚
โ”‚   244 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   kwargs["fan_in_fan_out"] = lora_config.fan_in_fan_out  โ”‚
โ”‚   245 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   else:                                                          โ”‚
โ”‚ โฑ 246 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   raise ValueError(                                          โ”‚
โ”‚   247 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   f"Target module {target} is not supported. "           โ”‚
โ”‚   248 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   f"Currently, only `torch.nn.Linear` and `Conv1D` are s โ”‚
โ”‚   249 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   )                                                          โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
ValueError: Target module BloomMLP(
  (dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
  (gelu_impl): BloomGelu()
  (dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)

The model I'm using was fine-tuned from bigscience/bloomz-560m, and --lora_target was filled in by referring to config.py, but looking at Bloom's model structure:

(23): BloomBlock(
  (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (self_attention): BloomAttention(
    (query_key_value): Linear(in_features=1024, out_features=3072, bias=True)
    (dense): Linear(in_features=1024, out_features=1024, bias=True)
    (attention_dropout): Dropout(p=0.0, inplace=False)
  )
  (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (mlp): BloomMLP(
    (dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
    (gelu_impl): BloomGelu()
    (dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)
  )

It seems the mlp target is not needed; should it be dense_h_to_4h and dense_4h_to_h instead?
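
As a quick check (a sketch, not from the issue), the names that are valid for --lora_target can be listed by walking the model's torch.nn.Linear modules:

# List the Linear module names usable as LoRA targets for bigscience/bloomz-560m.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")
print(sorted({name.split(".")[-1] for name, module in model.named_modules()
              if isinstance(module, nn.Linear)}))
# For Bloom this includes query_key_value, dense, dense_h_to_4h and dense_4h_to_h,
# i.e. the two MLP Linear layers rather than the BloomMLP container itself.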

A question about a fine-tuning error

The error is as follows:
06/19/2023 13:52:00 - INFO - utils.common - Fine-tuning method: LoRA
ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

่ฟ่กŒไปฃ็ ๅ‚ๆ•ฐ CUDA_VISIBLE_DEVICES=0 python src/train_sft.py --model_name_or_path ../models --do_train --dataset alpaca_gpt4_zh --finetuning_type lora --output_dir path_to_sft_checkpoint --overwrite_cache --per_device_train_batch_size 4 --gradient_accumulation_steps 4 --lr_scheduler_type cosine --logging_steps 10 --save_steps 1000 --learning_rate 5e-5 --num_train_epochs 3.0 --plot_loss --fp16

The base model is Baichuan.
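
For a custom architecture like Baichuan, the same module inspection works with trust_remote_code=True; as far as I know, Baichuan packs the q/k/v projections into a single W_pack linear layer, so the default q_proj,v_proj targets will not match. Treat the exact names as something to verify against your checkpoint:

# Inspect the Linear module names of the local Baichuan checkpoint used above ("../models" is assumed).
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("../models", trust_remote_code=True)
print(sorted({n.split(".")[-1] for n, m in model.named_modules() if isinstance(m, nn.Linear)}))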

AssertionError: The given checkpoint is not a LoRA checkpoint, please specify `--finetuning_type full/freeze` instead.

่ฎญ็ปƒๅ‚ๆ•ฐ๏ผš
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py --model_name_or_path ./Bloom/ --do_train --dataset alpaca_gpt4_en --finetuning_type lora --checkpoint_dir path_to_pt_checkpoint --output_dir path_to_sft_checkpoint --overwrite_cache --per_device_train_batch_size 4 --gradient_accumulation_steps 4 --lr_scheduler_type cosine --logging_steps 10 --save_steps 1000 --learning_rate 5e-5 --num_train_epochs 3.0 --resume_lora_training False --lora_target query_key_value --plot_loss --fp16

Bloomไธๆ”ฏๆŒloraๅ—๏ผŸ่ฐข่ฐขใ€‚

How to fine-tune the bloom-3b model?

train.sh

CUDA_VISIBLE_DEVICES=0 python src/train_pt.py --model_name_or_path bloom-3b/ --do_train --dataset wiki_demo --finetuning_type lora --output_dir weights/ --overwrite_cache --per_device_train_batch_size 4 --gradient_accumulation_steps 4 --lr_scheduler_type cosine --logging_steps 10 --save_steps 100 --learning_rate 5e-5 --num_train_epochs 3.0 --plot_loss

error:
[INFO|modeling_utils.py:3303] 2023-06-13 08:51:34,327 >> All the weights of BloomForCausalLM were initialized from the model checkpoint at bloom-3b/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BloomForCausalLM for predictions without further training.
[INFO|modeling_utils.py:2927] 2023-06-13 08:51:34,328 >> Generation config file not found, using a generation config created from the model config.
06/13/2023 08:51:34 - INFO - utils.common - Fine-tuning method: LoRA
Traceback (most recent call last):
  File "/home/server/Tutorial/LLaMA-Efficient-Tuning-main/src/train_pt.py", line 81, in <module>
    main()
  File "/home/server/Tutorial/LLaMA-Efficient-Tuning-main/src/train_pt.py", line 26, in main
    model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="pt")
  File "/home/server/Tutorial/LLaMA-Efficient-Tuning-main/src/utils/common.py", line 214, in load_pretrained
    model = _init_adapter(model, model_args, finetuning_args, is_trainable, is_mergeable)
  File "/home/server/Tutorial/LLaMA-Efficient-Tuning-main/src/utils/common.py", line 133, in _init_adapter
    model = get_peft_model(model, lora_config)
  File "/home/server/anaconda3/envs/pytorch/lib/python3.10/site-packages/peft/mapping.py", line 120, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
  File "/home/server/anaconda3/envs/pytorch/lib/python3.10/site-packages/peft/peft_model.py", line 662, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/home/server/anaconda3/envs/pytorch/lib/python3.10/site-packages/peft/peft_model.py", line 99, in __init__
    self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
  File "/home/server/anaconda3/envs/pytorch/lib/python3.10/site-packages/peft/tuners/lora.py", line 154, in __init__
    self.add_adapter(adapter_name, self.peft_config[adapter_name])
  File "/home/server/anaconda3/envs/pytorch/lib/python3.10/site-packages/peft/tuners/lora.py", line 161, in add_adapter
    self._find_and_replace(adapter_name)
  File "/home/server/anaconda3/envs/pytorch/lib/python3.10/site-packages/peft/tuners/lora.py", line 254, in _find_and_replace
    raise ValueError(
ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

ไฝฟ็”จๅ››ๅกA100ๅ’ŒQlora-4่ฟ›่กŒPPO่ฎญ็ปƒๆŠฅ้”™

Assistant:<unk>
(identical traceback raised on all four ranks)
Traceback (most recent call last):
  File "/tmp/CCT/src/train_ppo.py", line 82, in <module>
    main()
  File "/tmp/CCT/src/train_ppo.py", line 55, in main
    ppo_trainer = PPOPeftTrainer(
  File "/tmp/CCT/src/utils/ppo.py", line 72, in __init__
    PPOTrainer.__init__(self, **kwargs)
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/trl/trainer/ppo_trainer.py", line 290, in __init__
    ) = self.accelerator.prepare(
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1182, in prepare
    result = tuple(
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1183, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1022, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1275, in prepare_model
    model = torch.nn.parallel.DistributedDataParallel(
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 676, in __init__
    _sync_module_states(
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/torch/distributed/utils.py", line 142, in _sync_module_states
    _sync_params_and_buffers(
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/torch/distributed/utils.py", line 160, in _sync_params_and_buffers
    dist._broadcast_coalesced(
RuntimeError: Tensors must be CUDA and dense
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 897529) of binary: /root/miniconda3/envs/llama_etuning/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/llama_etuning/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/accelerate/commands/launch.py", line 932, in launch_command
    multi_gpu_launcher(args)
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/accelerate/commands/launch.py", line 627, in multi_gpu_launcher
    distrib_run.run(args)
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/llama_etuning/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
src/train_ppo.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-06-09_10:00:51
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 897530)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-06-09_10:00:51
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 897531)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-06-09_10:00:51
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 897532)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-06-09_10:00:51
  host      : mpudgx202302-DGX-Station-A100-920-23487-2531-000
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 897529)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
accelerate launch src/train_ppo.py \
    --model_name_or_path llama-hf/33b-hf/llama-33b-hf \
    --do_train \
    --dataset ChangChunTeng \
    --finetuning_type lora \
    --checkpoint_dir sft/checkpoint-9000 \
    --reward_model rm/checkpoint-4000 \
    --output_dir ppo \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 2.0 \
    --resume_lora_training False \
    --plot_loss \
    --quantization_bit 4

A question about the reward model

Thanks for your work, this is really useful. I have one question though: for example, when doing RLHF on LLaMA, can I use BLOOM as the reward model? If so, where would I need to make changes?

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

Training command:

accelerate launch src/train_sft.py \
    --model_name_or_path llama-hf/llama-13b-hf \
    --do_train \
    --dataset ChangChunTeng \
    --finetuning_type lora \
    --output_dir CCT/sft \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16

OOM issue for SFT full parameter training

I'm trying to do SFT with full-parameter training on the LLaMA 7B model, using the same command from the README for train_sft.py. With finetuning_type='lora', training starts as expected, but with finetuning_type='full' it runs out of memory (OOM).

I'm using an A100 80GB.

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path huggyllama/llama-7b \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type full \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Any thoughts on what might be the issue here?
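
For context (not from the issue), a back-of-the-envelope estimate shows why full-parameter fine-tuning of a 7B model does not fit on a single 80 GB card: with AdamW, the weights, gradients, and optimizer moments alone take roughly 16 bytes per parameter before any activations are counted.

# Rough memory estimate for full fine-tuning a 7B model with AdamW, ignoring activations.
params = 7e9
bytes_per_param = 4 + 4 + 4 + 4   # fp32 weights + fp32 gradients + two fp32 Adam moments
print(f"~{params * bytes_per_param / 1024**3:.0f} GiB")   # ~104 GiB, above a single 80 GB A100

This is why LoRA fits on one card while full fine-tuning typically needs several GPUs with parameter and optimizer sharding (e.g. ZeRO or FSDP).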

Dependency version in requirements mismatch

  • protobuf should be <=3.20.1
    (otherwise it produces TypeError: Descriptors cannot not be created directly.)
  • datasets should be >=2.12.0
    (otherwise it produces ImportError: datasets>=2.12.0 is required for a normal functioning of this module, but found datasets==2.11.0.)

if qlora:
