ssbuild / qwen_finetuning Goto Github PK

View Code? Open in Web Editor NEW

77.0 2.0 6.0 517 KB

qwen-7b and qwen-14b finetuning

License: Apache License 2.0

Python 91.36% Shell 8.64%

lora qlora sfml qwen qwen-7b ia3 adalora qwen-14b

qwen_finetuning's Introduction

statement

deep_training

    2024-04-22 简化
    2023-12-02 update qwen model 1.8b 7b 12b 72b 
    2023-10-09 support accelerator trainer
    2023-10-07 support colossalai trainer
    2023-09-26 support transformers trainer
    2023-09-25 0.2.4 support qwen-7b 新版 和 qwen-14b ， 旧版不再支持，旧版可以安装 deep_training <= 0.2.3
                support transformers trainer
    2023-08-11 aigc-zoo 0.1.17.post0 update config , 更新下官方权重配置文件 
    dev 分支加一些新功能和想法 如果求稳定，请使用 stable分支

install

pip install -U -r requirements.txt
如果无法安装 , 可以切换官方源 pip install -i https://pypi.org/simple -U -r requirements.txt


# flash-attention对显卡算例要求算力7.5 以上 ， 下面可选安装 ，如果卡不支持可以不安装。
git clone -b https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
pip install csrc/layer_norm
pip install csrc/rotary

weight

data sample

数据示例
例子依次分别是 工具，对话，对话，对话
数据构建sample 参考 data/make_data_example.py
数组组成  
role: 可选字段(str) 标志 q字段 角色, one of user system, observation ，system 标识是否为 system prompt , system prompt a 空 
q: 问题
a: 回答

注意事项:
a字段：对于普通对话，a即为回答。
细节可以参考 assets/react_prompt.md

{"id": 1, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"role": "user", "q": "Answer the following questions as best you can. You have access to the following tools:\n\nquark_search: Call this tool to interact with the 夸克搜索 API. What is the 夸克搜索 API useful for? 夸克搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{\"name\": \"search_query\", \"description\": \"搜索关键词或短语\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nimage_gen: Call this tool to interact with the 通义万相 API. What is the 通义万相 API useful for? 通义万相是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL Parameters: [{\"name\": \"query\", \"description\": \"中文关键词，描述了希望图像具有什么内容\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [quark_search,image_gen]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can be repeated zero or more times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: 现在给我画个五彩斑斓的黑", "a": "\nThought: 我应该使用通义万相API来生成一张五彩斑斓的黑的图片。\nAction: image_gen\nAction Input: {\"query\": \"五彩斑斓的黑\"}\n            "}, {"role": "observation", "q": "Observation: \n{\"status_code\": 200, \"request_id\": \"3d894da2-0e26-9b7c-bd90-102e5250ae03\", \"code\": null, \"message\": \"\", \"output\": {\"task_id\": \"2befaa09-a8b3-4740-ada9-4d00c2758b05\", \"task_status\": \"SUCCEEDED\", \"results\": [{\"url\": \"https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png\"}], \"task_metrics\": {\"TOTAL\": 1, \"SUCCEEDED\": 1, \"FAILED\": 0}}, \"usage\": {\"image_count\": 1}}\n", "a": "\nThought: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片。\nFinal Answer: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png。\n            "}]}
{"id": 2, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"q": "从南京到上海的路线", "a": "你好，南京到上海的路线如下：\n1. 南京到上海，可以乘坐南京地铁1号线，在南京站乘坐轨道交通1号线。\n2. 南京到浦东机场，可以搭乘上海地铁1号，在陆家嘴站乘坐地铁1线，在浦东国际机场站乘坐机场快线，前往上海浦东国际机场。\n3. 上海到南京，可以换乘上海地铁2号线，从南京站换乘地铁2线，再从南京南站换乘地铁1路，然后到达上海站"}]}
{"id": 3, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"q": "写一个诗歌，关于冬天", "a": "冬夜寂静冷，\n云在天边飘，\n冰封白雪上， \n寒冷像一场雪。\n \n雪花融化成冰，\n像那雪花飘洒，\n在寒冷的冬天，\n感受春天的喜悦。\n 冬日里，\n风雪渐消，\n一片寂静，\n把快乐和温暖带回家。"}]}
{"id": 4, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"q": "晚上睡不着应该怎么办", "a": "晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:\n\n1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。\n2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。\n3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。\n4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。\n5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。\n6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。\n\n如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。"}]}

或者

{"id": 1, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "Answer the following questions as best you can. You have access to the following tools:\n\nquark_search: Call this tool to interact with the 夸克搜索 API. What is the 夸克搜索 API useful for? 夸克搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{\"name\": \"search_query\", \"description\": \"搜索关键词或短语\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nimage_gen: Call this tool to interact with the 通义万相 API. What is the 通义万相 API useful for? 通义万相是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL Parameters: [{\"name\": \"query\", \"description\": \"中文关键词，描述了希望图像具有什么内容\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [quark_search,image_gen]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can be repeated zero or more times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: 现在给我画个五彩斑斓的黑"}, {"from": "assistant", "value": "\nThought: 我应该使用通义万相API来生成一张五彩斑斓的黑的图片。\nAction: image_gen\nAction Input: {\"query\": \"五彩斑斓的黑\"}\n            "}, {"from": "observation", "value": "Observation: \n{\"status_code\": 200, \"request_id\": \"3d894da2-0e26-9b7c-bd90-102e5250ae03\", \"code\": null, \"message\": \"\", \"output\": {\"task_id\": \"2befaa09-a8b3-4740-ada9-4d00c2758b05\", \"task_status\": \"SUCCEEDED\", \"results\": [{\"url\": \"https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png\"}], \"task_metrics\": {\"TOTAL\": 1, \"SUCCEEDED\": 1, \"FAILED\": 0}}, \"usage\": {\"image_count\": 1}}\n"}, {"from": "assistant", "value": "\nThought: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片。\nFinal Answer: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png。\n            "}]}
{"id": 2, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "从南京到上海的路线"}, {"from": "assistant", "value": "你好，南京到上海的路线如下：\n1. 南京到上海，可以乘坐南京地铁1号线，在南京站乘坐轨道交通1号线。\n2. 南京到浦东机场，可以搭乘上海地铁1号，在陆家嘴站乘坐地铁1线，在浦东国际机场站乘坐机场快线，前往上海浦东国际机场。\n3. 上海到南京，可以换乘上海地铁2号线，从南京站换乘地铁2线，再从南京南站换乘地铁1路，然后到达上海站"}]}
{"id": 3, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "写一个诗歌，关于冬天"}, {"from": "assistant", "value": "冬夜寂静冷，\n云在天边飘，\n冰封白雪上， \n寒冷像一场雪。\n \n雪花融化成冰，\n像那雪花飘洒，\n在寒冷的冬天，\n感受春天的喜悦。\n 冬日里，\n风雪渐消，\n一片寂静，\n把快乐和温暖带回家。"}]}
{"id": 4, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "晚上睡不着应该怎么办"}, {"from": "assistant", "value": "晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:\n\n1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。\n2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。\n3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。\n4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。\n5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。\n6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。\n\n如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。"}]}

infer

# infer.py 推理预训练模型
# infer_finetuning.py 推理微调模型
# infer_lora_finetuning.py 推理lora微调模型
 python infer.py

量化等级	最低 GPU 显存
FP16（无量化）	13 GB
INT8	10 GB
INT4	6 GB

training

 # 制作数据
 cd scripts
 bash train_full.sh -m dataset 
 or
 bash train_lora.sh -m dataset 
 or
 bash train_ptv2.sh -m dataset 
 
 注: num_process_worker 为多进程制作数据 ， 如果数据量较大 ， 适当调大至cpu数量
 dataHelper.make_dataset_with_args(data_args.train_file,mixed_data=False, shuffle=True,mode='train',num_process_worker=0)
 
 # 全参数训练 
     bash train_full.sh -m train
     
 # lora adalora ia3 
     bash train_lora.sh -m train
     
 # ptv2
     bash train_ptv2.sh -m train

aigc-serving

部署qwen之后，可测试工具函数

quad_calculator.py

训练参数

友情链接

纯粹而干净的代码

Reference

https://github.com/QwenLM/Qwen-7B

Star History

qwen_finetuning's People

Contributors

Stargazers

Watchers

Forkers

copperdong adambear zhongpei dgo2dance gcxamy njnuwjq

qwen_finetuning's Issues

有报错，貌似没有设置eos_token

ile "/checkpoint/binary/train_package/src/mdl/qwen/data_utils.py", line 58, in on_data_process
ds = TokenTruncation.process(tokenizer,config,data, max_seq_length,**data_conf[strategy])
File "/checkpoint/binary/train_package/src/mdl/qwen/data_processer.py", line 63, in process
d = TokenIdsFinal.process(input_ids,labels,max_seq_length,tokenizer)
File "/checkpoint/binary/train_package/src/mdl/qwen/data_processer.py", line 27, in process
input_ids = np.pad(input_ids, (0, pad_len), 'constant', constant_values=(pad_val, pad_val))
File "<array_function internals>", line 180, in pad
File "/root/.local/lib/python3.8/site-packages/numpy/lib/arraypad.py", line 803, in pad
_set_pad_area(roi, axis, width_pair, value_pair)
File "/root/.local/lib/python3.8/site-packages/numpy/lib/arraypad.py", line 147, in _set_pad_area
padded[left_slice] = value_pair[0]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

deep_training 这个包里报错。挺奇怪。

Rank(Worker): 0 caught unknown unhandled exception:
Exception type: AssertionError
Detail:
Traceback (most recent call last):
File "/checkpoint/binary/train_package/train.py", line 26, in
qwen_train( qwen_args, output_dir)
File "/checkpoint/binary/train_package/src/mdl/qwen/train.py", line 122, in train
trainer.fit(pl_model, train_dataloaders=train_datasets)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 529, in fit
call._call_and_handle_interrupt(
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 42, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 568, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 973, in _run
results = self._run_stage()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1016, in _run_stage
self.fit_loop.run()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/fit_loop.py", line 201, in run
self.advance()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
self.epoch_loop.run(self._data_fetcher)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
self.advance(data_fetcher)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 218, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 185, in run
self._optimizer_step(kwargs.get("batch_idx", 0), closure)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 260, in _optimizer_step
call._call_lightning_module_hook(
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 144, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/core/module.py", line 1256, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/core/optimizer.py", line 155, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/strategies/strategy.py", line 225, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/plugins/precision/amp.py", line 76, in optimizer_step
closure_result = closure()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in call
self._result = self.closure(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure
step_output = self._step_fn()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 307, in _training_step
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 291, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/strategies/strategy.py", line 367, in training_step
return self.model.training_step(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/transformer_base.py", line 552, in training_step
outputs = self.compute_loss(**batch)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/transformer_base.py", line 371, in compute_loss
return self.model.compute_loss(**kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/transformer_base.py", line 117, in compute_loss
return self.model(*args, **batch)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 964, in forward
transformer_outputs = self.transformer(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 780, in forward
outputs = torch.utils.checkpoint.checkpoint(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 776, in custom_forward
return module(*inputs, use_cache, output_attentions)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 538, in forward
attn_outputs = self.attn(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 448, in forward
context_layer = self.core_attention_flash(q, k, v)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 140, in forward
assert all((i.dtype in [torch.float16, torch.bfloat16] for i in (q, k, v)))
AssertionError

int4量化模型transformers使用报错

你好，这个训练参数如何配置，如何命令启动训练

你好，训练参数如何配置，模型需要下载放到对应目录吗

infer.py推理异常

给出一段混杂各种语言的乱码。

请问system 的 prompt为啥固定为 You are a helpful assistant.

如题

反序列化遇到问题

我用V100机器跑没啥问题，但是用线上A100跑就出问题如下。

root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py:90: UserWarning: Couldn't deserialize thrift: TProtocolException: Invalid data

warnings.warn(str(e))
/home/build/tfrecords/tfrecords/arrow_cc/third_party/arrow/cpp/src/arrow/result.cc:28: ValueOrDie called on an error: IOError: Couldn't deserialize thrift: TProtocolException: Invalid data
Deserializing page header failed.

Fatal Python error: Aborted

Current thread 0x00007fa3d305acc0 (most recent call first):
File "/root/.local/lib/python3.8/site-packages/tfrecords/python/io/arrow.py", line 593 in read_table
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 75 in reopen
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 58 in reset
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 51 in init
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 167 in reopen
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 156 in reset
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 149 in init
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/dataset.py", line 104 in RandomDataset
File "/root/.local/lib/python3.8/site-packages/numpy_io/core/numpyadapter.py", line 325 in load
File "/root/.local/lib/python3.8/site-packages/numpy_io/core/reader.py", line 26 in load_numpy_dataset
File "/root/.local/lib/python3.8/site-packages/numpy_io/pytorch_loader/dataloaders.py", line 64 in load_dataset
File "/root/.local/lib/python3.8/site-packages/numpy_io/pytorch_loader/dataloaders.py", line 142 in load_distributed_random_sampler
File "/root/.local/lib/python3.8/site-packages/numpy_io/pytorch_loader/data_helper.py", line 89 in load_distributed_random_sampler
File "/checkpoint/binary/train_package/src/mdl/qwen/train.py", line 127 in train
File "/checkpoint/binary/train_package/train.py", line 39 in

我安装的包如下

tiktoken==0.4.0
einops==0.6.1
transformers>=4.30
deepspeed==0.10.0
cpm-kernels==1.0.11
bitsandbytes==0.39.1
accelerate>=0.20
aigc-zoo==0.1.17.post0
jieba==0.42.1
rouge_chinese==1.0.3
numba==0.57.1
GPUtil==1.4.0
torch==1.13.1

请教关于<|endoftext|>的问题

aigc_zoo/src/aigc_zoo/model_zoo/qwen/tokenization_qwen.py 用到了 ENDOFTEXT = "<|endoftext|>"

attention_mask有bug

data_utils.py这个处理，和千问finetune.py 不一致
151 for i,seqlen in enumerate(seqlens):
152 attention_mask[i,:seqlen] = 1

下面是千问官网的（github) finetune.py
174 attention_mask=input_ids.ne(tokenizer.pad_token_id),

NN_DataHelper

NN_DataHelper 在make_dataset_all 会将数据tokenizer化，然后保存

1.make_dataset_all过程能不能将保存的名字和输入的文件名字一致是否可以?
2.处理完后，是否后面每次可以直接用，不用每次都tokennizer化。

当batchsize>1 显示shape '[batchsize, -1]' is invalid for input of attention_mask size

lora微调时,batchesize=2, 显示:
File "/opt/conda/lib/python3.10/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 718, in forward
attention_mask = attention_mask.view(batch_size, -1)
RuntimeError: shape '[2, -1]' is invalid for input of size 3969

经测：只有batchsize=1才能运行，>1时候，每个样本attention_mask都不同，就出错 shape '[batchsize, -1]' is invalid for input of size

问题似乎出在attention_mask的形状与期望的batch_size和样本数量之间的不匹配上。

lora推理报错

Traceback (most recent call last):
File "infer_lora_finetuning.py", line 68, in
response, history = model.chat(tokenizer, input, history=[],)
File "/root/miniconda3/lib/python3.8/site-packages/aigc_zoo/model_zoo/qwen/llm_model.py", line 52, in chat
max_window_size = self.generation_config.max_window_size
AttributeError: 'GenerationConfig' object has no attribute 'max_window_size'

重新拉取了最新的Qwen-chat的所有模型和配置，以及本项目的最新代码，无法进行lora微调了

没有动过任何项目配置和模型配置

你好，请教下，bash train_full.sh -m train微调后的last.ckpt模型，有办法转成类似https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/main下面的safetensors文件吗

你好，请教下，bash train_full.sh -m train微调后的last.ckpt模型，有办法转成类似https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/main 下面的safetensors文件吗

以方便之后用llama.cpp做进一步转换