How do I load the reward and RLHF checkpoints when launching cli or web_demo? (llama-factory issue, closed)

hiyouga commented on May 18, 2024

How do I load the reward and RLHF checkpoints when launching cli or web_demo?

from llama-factory.

Comments (3)

hiyouga commented on May 18, 2024

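Using the reward model for inference is pointless, because it only outputs a single score. The model after RLHF can be loaded by passing the PPO output folder directly as checkpoint_dir.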
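As a rough illustration of the second point, the launch command can be assembled as below. This is a minimal sketch, not the project's own tooling: the helper name `build_cli_demo_command` is hypothetical, and the script path, model path, and output directory are placeholders borrowed from the report later in this thread. Only the `--model_name_or_path` and `--checkpoint_dir` flags come from the thread itself.

```python
import shlex


def build_cli_demo_command(model_path, ppo_output_dir, script="src/cli_demo.py"):
    """Build the argument list that launches the CLI demo with a PPO checkpoint.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    return [
        "python", script,
        "--model_name_or_path", model_path,   # base model weights
        "--checkpoint_dir", ppo_output_dir,   # PPO (RLHF) output folder
    ]


# Placeholder paths taken from the report in this thread.
command = build_cli_demo_command("/baichuan-7B/model", "./output_rlhf/")

# Print a copy-pasteable shell command.
print(shlex.join(command))
```

The returned list can also be handed to `subprocess.run(command)` if launching from Python is preferred.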

acbogeh commented on May 18, 2024

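Quoting the reply above: "Using the reward model for inference is pointless, because it only outputs a single score. The model after RLHF can be loaded by passing the PPO output folder directly as checkpoint_dir."

./output_rlhf/ is the output path of my RLHF run. After loading it as the checkpoint, the model's capabilities collapsed: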

(baichuan) [root@LLM01GPU LLaMA-Efficient-Tuning]# python src/cli_demo.py \
    --model_name_or_path /baichuan-7B/model \
    --checkpoint_dir ./output_rlhf/

Contents of ./output_rlhf/ (as listed in the terminal):
adapter_config.json checkpoint-1000/ checkpoint-11000/ checkpoint-2000/ checkpoint-4000/ checkpoint-6000/ checkpoint-8000/ finetuning_args.json trainer_log.jsonl training_args.bin training_reward.png
adapter_model.bin checkpoint-10000/ checkpoint-12000/ checkpoint-3000/ checkpoint-5000/ checkpoint-7000/ checkpoint-9000/ reward/ trainer_state.json training_loss.png value_head.bin

06/19/2023 11:13:08 - WARNING - utils.common - Please specify prompt_template if you are using other pre-trained models.
06/19/2023 11:13:15 - WARNING - accelerate.utils.modeling - The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
06/19/2023 11:13:21 - INFO - utils.common - Fine-tuning method: LoRA
06/19/2023 11:13:45 - INFO - utils.common - Merged 1 model checkpoint(s).
06/19/2023 11:13:45 - INFO - utils.common - Loaded fine-tuned model from checkpoint(s): ./output_rlhf/
trainable params: 0 || all params: 7000559616 || trainable%: 0.0000
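Welcome to the LLaMA model. Type your message to chat, "clear" to clear the conversation history, "stop" to exit the program.

Input: Introduce yourself
LLaMA: My name is ***.

Input: How to destroy humanity
LLaMA: Use nuclear bombs

Input: How to make a bomb
LLaMA: Use

Input: Tell me about Shanghai
LLaMA: Shanghai is one of the largest cities in **.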
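
Before comparing outputs, it may help to confirm what the directory passed as checkpoint_dir actually contains. The sketch below is a hypothetical helper, not part of the project: the expected file names (`adapter_config.json`, `adapter_model.bin`) and the `checkpoint-N` naming are assumptions taken only from the directory listing in the session above. It reports missing adapter files and lists the intermediate checkpoint folders in step order.

```python
from pathlib import Path

# Assumption: these names come from the directory listing in this thread,
# not from any documented contract of the training script.
EXPECTED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.bin")


def missing_adapter_files(checkpoint_dir, expected=EXPECTED_ADAPTER_FILES):
    """Return the expected adapter file names that are absent from the directory."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).is_file()]


def list_intermediate_checkpoints(checkpoint_dir):
    """Return checkpoint-N subdirectories sorted by their numeric step N."""
    root = Path(checkpoint_dir)
    steps = []
    for child in root.glob("checkpoint-*"):
        suffix = child.name.split("-", 1)[1]
        if child.is_dir() and suffix.isdigit():
            steps.append((int(suffix), child))
    return [path for _, path in sorted(steps)]


if __name__ == "__main__":
    output_dir = "./output_rlhf/"  # placeholder path from the report above
    print("missing:", missing_adapter_files(output_dir))
    for path in list_intermediate_checkpoints(output_dir):
        print("intermediate checkpoint:", path)
```

Sorting by the numeric step avoids the lexicographic order seen in the listing, where checkpoint-10000 sorts before checkpoint-2000. Whether the intermediate folders can themselves be passed as checkpoint_dir is not confirmed in this thread.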

hannlp commented on May 18, 2024

Hi, has this issue been resolved yet?
