Giter VIP home page Giter VIP logo

qwen_finetuning's Introduction

statement

    2024-04-22 简化
    2023-12-02 update qwen model 1.8b 7b 12b 72b 
    2023-10-09 support accelerator trainer
    2023-10-07 support colossalai trainer
    2023-09-26 support transformers trainer
    2023-09-25 0.2.4 support qwen-7b 新版 和 qwen-14b , 旧版不再支持,旧版可以安装 deep_training <= 0.2.3
                support transformers trainer
    2023-08-11 aigc-zoo 0.1.17.post0 update config , 更新下官方权重配置文件 
    dev 分支加一些新功能和想法 如果求稳定,请使用 stable分支

install

  • pip install -U -r requirements.txt
  • 如果无法安装 , 可以切换官方源 pip install -i https://pypi.org/simple -U -r requirements.txt

# flash-attention对显卡算例要求算力7.5 以上 , 下面可选安装 ,如果卡不支持可以不安装。
git clone -b https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
pip install csrc/layer_norm
pip install csrc/rotary

weight

data sample

数据示例
例子依次分别是 工具,对话,对话,对话
数据构建sample 参考 data/make_data_example.py
数组组成  
role: 可选字段(str) 标志 q字段 角色, one of user system, observation ,system 标识是否为 system prompt , system prompt a 空 
q: 问题
a: 回答

注意事项:
a字段:对于普通对话,a即为回答。
细节可以参考 assets/react_prompt.md
{"id": 1, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"role": "user", "q": "Answer the following questions as best you can. You have access to the following tools:\n\nquark_search: Call this tool to interact with the 夸克搜索 API. What is the 夸克搜索 API useful for? 夸克搜索是一个通用搜索引擎,可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{\"name\": \"search_query\", \"description\": \"搜索关键词或短语\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nimage_gen: Call this tool to interact with the 通义万相 API. What is the 通义万相 API useful for? 通义万相是一个AI绘画(图像生成)服务,输入文本描述,返回根据文本作画得到的图片的URL Parameters: [{\"name\": \"query\", \"description\": \"中文关键词,描述了希望图像具有什么内容\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [quark_search,image_gen]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can be repeated zero or more times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: 现在给我画个五彩斑斓的黑", "a": "\nThought: 我应该使用通义万相API来生成一张五彩斑斓的黑的图片。\nAction: image_gen\nAction Input: {\"query\": \"五彩斑斓的黑\"}\n            "}, {"role": "observation", "q": "Observation: \n{\"status_code\": 200, \"request_id\": \"3d894da2-0e26-9b7c-bd90-102e5250ae03\", \"code\": null, \"message\": \"\", \"output\": {\"task_id\": \"2befaa09-a8b3-4740-ada9-4d00c2758b05\", \"task_status\": \"SUCCEEDED\", \"results\": [{\"url\": \"https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png\"}], \"task_metrics\": {\"TOTAL\": 1, \"SUCCEEDED\": 1, \"FAILED\": 0}}, \"usage\": {\"image_count\": 1}}\n", "a": "\nThought: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片。\nFinal Answer: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png。\n            "}]}
{"id": 2, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"q": "从南京到上海的路线", "a": "你好,南京到上海的路线如下:\n1. 南京到上海,可以乘坐南京地铁1号线,在南京站乘坐轨道交通1号线。\n2. 南京到浦东机场,可以搭乘上海地铁1号,在陆家嘴站乘坐地铁1线,在浦东国际机场站乘坐机场快线,前往上海浦东国际机场。\n3. 上海到南京,可以换乘上海地铁2号线,从南京站换乘地铁2线,再从南京南站换乘地铁1路,然后到达上海站"}]}
{"id": 3, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"q": "写一个诗歌,关于冬天", "a": "冬夜寂静冷,\n云在天边飘,\n冰封白雪上, \n寒冷像一场雪。\n \n雪花融化成冰,\n像那雪花飘洒,\n在寒冷的冬天,\n感受春天的喜悦。\n 冬日里,\n风雪渐消,\n一片寂静,\n把快乐和温暖带回家。"}]}
{"id": 4, "paragraph": [{"role": "system", "q": "You are a helpful assistant.", "a": ""}, {"q": "晚上睡不着应该怎么办", "a": "晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:\n\n1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。\n2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。\n3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。\n4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。\n5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。\n6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。\n\n如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。"}]}

或者

{"id": 1, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "Answer the following questions as best you can. You have access to the following tools:\n\nquark_search: Call this tool to interact with the 夸克搜索 API. What is the 夸克搜索 API useful for? 夸克搜索是一个通用搜索引擎,可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{\"name\": \"search_query\", \"description\": \"搜索关键词或短语\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nimage_gen: Call this tool to interact with the 通义万相 API. What is the 通义万相 API useful for? 通义万相是一个AI绘画(图像生成)服务,输入文本描述,返回根据文本作画得到的图片的URL Parameters: [{\"name\": \"query\", \"description\": \"中文关键词,描述了希望图像具有什么内容\", \"required\": true, \"schema\": {\"type\": \"string\"}}] Format the arguments as a JSON object.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [quark_search,image_gen]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can be repeated zero or more times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: 现在给我画个五彩斑斓的黑"}, {"from": "assistant", "value": "\nThought: 我应该使用通义万相API来生成一张五彩斑斓的黑的图片。\nAction: image_gen\nAction Input: {\"query\": \"五彩斑斓的黑\"}\n            "}, {"from": "observation", "value": "Observation: \n{\"status_code\": 200, \"request_id\": \"3d894da2-0e26-9b7c-bd90-102e5250ae03\", \"code\": null, \"message\": \"\", \"output\": {\"task_id\": \"2befaa09-a8b3-4740-ada9-4d00c2758b05\", \"task_status\": \"SUCCEEDED\", \"results\": [{\"url\": \"https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png\"}], \"task_metrics\": {\"TOTAL\": 1, \"SUCCEEDED\": 1, \"FAILED\": 0}}, \"usage\": {\"image_count\": 1}}\n"}, {"from": "assistant", "value": "\nThought: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片。\nFinal Answer: 我已经成功使用通义万相API生成了一张五彩斑斓的黑的图片https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/1e5e2015/20230801/1509/6b26bb83-469e-4c70-bff4-a9edd1e584f3-1.png。\n            "}]}
{"id": 2, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "从南京到上海的路线"}, {"from": "assistant", "value": "你好,南京到上海的路线如下:\n1. 南京到上海,可以乘坐南京地铁1号线,在南京站乘坐轨道交通1号线。\n2. 南京到浦东机场,可以搭乘上海地铁1号,在陆家嘴站乘坐地铁1线,在浦东国际机场站乘坐机场快线,前往上海浦东国际机场。\n3. 上海到南京,可以换乘上海地铁2号线,从南京站换乘地铁2线,再从南京南站换乘地铁1路,然后到达上海站"}]}
{"id": 3, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "写一个诗歌,关于冬天"}, {"from": "assistant", "value": "冬夜寂静冷,\n云在天边飘,\n冰封白雪上, \n寒冷像一场雪。\n \n雪花融化成冰,\n像那雪花飘洒,\n在寒冷的冬天,\n感受春天的喜悦。\n 冬日里,\n风雪渐消,\n一片寂静,\n把快乐和温暖带回家。"}]}
{"id": 4, "conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "user", "value": "晚上睡不着应该怎么办"}, {"from": "assistant", "value": "晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:\n\n1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。\n2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。\n3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。\n4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。\n5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。\n6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。\n\n如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。"}]}

infer

# infer.py 推理预训练模型
# infer_finetuning.py 推理微调模型
# infer_lora_finetuning.py 推理lora微调模型
 python infer.py
量化等级 最低 GPU 显存
FP16(无量化) 13 GB
INT8 10 GB
INT4 6 GB

inference

training

 # 制作数据
 cd scripts
 bash train_full.sh -m dataset 
 or
 bash train_lora.sh -m dataset 
 or
 bash train_ptv2.sh -m dataset 
 
 注: num_process_worker 为多进程制作数据 , 如果数据量较大 , 适当调大至cpu数量
 dataHelper.make_dataset_with_args(data_args.train_file,mixed_data=False, shuffle=True,mode='train',num_process_worker=0)
 
 # 全参数训练 
     bash train_full.sh -m train
     
 # lora adalora ia3 
     bash train_lora.sh -m train
     
 # ptv2
     bash train_ptv2.sh -m train

aigc-serving

部署qwen之后 , 可测试工具函数

训练参数

训练参数

友情链接

纯粹而干净的代码

Reference

https://github.com/QwenLM/Qwen-7B

Star History

Star History Chart

qwen_finetuning's People

Contributors

ssbuild avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

qwen_finetuning's Issues

有报错,貌似没有设置eos_token

ile "/checkpoint/binary/train_package/src/mdl/qwen/data_utils.py", line 58, in on_data_process
ds = TokenTruncation.process(tokenizer,config,data, max_seq_length,**data_conf[strategy])
File "/checkpoint/binary/train_package/src/mdl/qwen/data_processer.py", line 63, in process
d = TokenIdsFinal.process(input_ids,labels,max_seq_length,tokenizer)
File "/checkpoint/binary/train_package/src/mdl/qwen/data_processer.py", line 27, in process
input_ids = np.pad(input_ids, (0, pad_len), 'constant', constant_values=(pad_val, pad_val))
File "<array_function internals>", line 180, in pad
File "/root/.local/lib/python3.8/site-packages/numpy/lib/arraypad.py", line 803, in pad
_set_pad_area(roi, axis, width_pair, value_pair)
File "/root/.local/lib/python3.8/site-packages/numpy/lib/arraypad.py", line 147, in _set_pad_area
padded[left_slice] = value_pair[0]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

deep_training 这个包里报错 。挺奇怪。

Rank(Worker): 0 caught unknown unhandled exception:
Exception type: AssertionError
Detail:
Traceback (most recent call last):
File "/checkpoint/binary/train_package/train.py", line 26, in
qwen_train( qwen_args, output_dir)
File "/checkpoint/binary/train_package/src/mdl/qwen/train.py", line 122, in train
trainer.fit(pl_model, train_dataloaders=train_datasets)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 529, in fit
call._call_and_handle_interrupt(
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 42, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 568, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 973, in _run
results = self._run_stage()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1016, in _run_stage
self.fit_loop.run()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/fit_loop.py", line 201, in run
self.advance()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
self.epoch_loop.run(self._data_fetcher)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
self.advance(data_fetcher)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 218, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 185, in run
self._optimizer_step(kwargs.get("batch_idx", 0), closure)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 260, in _optimizer_step
call._call_lightning_module_hook(
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 144, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/core/module.py", line 1256, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/core/optimizer.py", line 155, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/strategies/strategy.py", line 225, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/plugins/precision/amp.py", line 76, in optimizer_step
closure_result = closure()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in call
self._result = self.closure(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure
step_output = self._step_fn()
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 307, in _training_step
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 291, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/lightning/pytorch/strategies/strategy.py", line 367, in training_step
return self.model.training_step(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/transformer_base.py", line 552, in training_step
outputs = self.compute_loss(**batch)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/transformer_base.py", line 371, in compute_loss
return self.model.compute_loss(**kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/transformer_base.py", line 117, in compute_loss
return self.model(*args, **batch)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 964, in forward
transformer_outputs = self.transformer(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 780, in forward
outputs = torch.utils.checkpoint.checkpoint(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 776, in custom_forward
return module(*inputs, use_cache, output_attentions)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 538, in forward
attn_outputs = self.attn(
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 448, in forward
context_layer = self.core_attention_flash(q, k, v)
File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.local/lib/python3.8/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 140, in forward
assert all((i.dtype in [torch.float16, torch.bfloat16] for i in (q, k, v)))
AssertionError

反序列化遇到问题

我用V100机器跑没啥问题,但是用线上A100跑就出问题如下。

root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py:90: UserWarning: Couldn't deserialize thrift: TProtocolException: Invalid data

warnings.warn(str(e))
/home/build/tfrecords/tfrecords/arrow_cc/third_party/arrow/cpp/src/arrow/result.cc:28: ValueOrDie called on an error: IOError: Couldn't deserialize thrift: TProtocolException: Invalid data
Deserializing page header failed.

Fatal Python error: Aborted

Current thread 0x00007fa3d305acc0 (most recent call first):
File "/root/.local/lib/python3.8/site-packages/tfrecords/python/io/arrow.py", line 593 in read_table
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 75 in reopen
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 58 in reset
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 51 in init
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 167 in reopen
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 156 in reset
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/random_dataset/init.py", line 149 in init
File "/root/.local/lib/python3.8/site-packages/fastdatasets/parquet/dataset.py", line 104 in RandomDataset
File "/root/.local/lib/python3.8/site-packages/numpy_io/core/numpyadapter.py", line 325 in load
File "/root/.local/lib/python3.8/site-packages/numpy_io/core/reader.py", line 26 in load_numpy_dataset
File "/root/.local/lib/python3.8/site-packages/numpy_io/pytorch_loader/dataloaders.py", line 64 in load_dataset
File "/root/.local/lib/python3.8/site-packages/numpy_io/pytorch_loader/dataloaders.py", line 142 in load_distributed_random_sampler
File "/root/.local/lib/python3.8/site-packages/numpy_io/pytorch_loader/data_helper.py", line 89 in load_distributed_random_sampler
File "/checkpoint/binary/train_package/src/mdl/qwen/train.py", line 127 in train
File "/checkpoint/binary/train_package/train.py", line 39 in

我安装的包如下

tiktoken==0.4.0
einops==0.6.1
transformers>=4.30
deepspeed==0.10.0
cpm-kernels==1.0.11
bitsandbytes==0.39.1
accelerate>=0.20
aigc-zoo==0.1.17.post0
jieba==0.42.1
rouge_chinese==1.0.3
numba==0.57.1
GPUtil==1.4.0
torch==1.13.1

attention_mask有bug

data_utils.py这个处理,和千问finetune.py 不一致
151 for i,seqlen in enumerate(seqlens):
152 attention_mask[i,:seqlen] = 1

下面是千问官网的(github) finetune.py
174 attention_mask=input_ids.ne(tokenizer.pad_token_id),

NN_DataHelper

NN_DataHelper 在make_dataset_all 会将数据tokenizer化,然后保存

1.make_dataset_all过程能不能将保存的名字和输入的文件名字一致是否可以?
2.处理完后,是否后面每次可以直接用,不用每次都tokennizer化。

当batchsize>1 显示shape '[batchsize, -1]' is invalid for input of attention_mask size

lora微调时,batchesize=2, 显示:
File "/opt/conda/lib/python3.10/site-packages/deep_training/nlp/models/qwen/modeling_qwen.py", line 718, in forward
attention_mask = attention_mask.view(batch_size, -1)
RuntimeError: shape '[2, -1]' is invalid for input of size 3969

经测:只有batchsize=1才能运行,>1时候,每个样本attention_mask都不同,就出错 shape '[batchsize, -1]' is invalid for input of size

问题似乎出在attention_mask的形状与期望的batch_size和样本数量之间的不匹配上。

lora推理报错

image
Traceback (most recent call last):
File "infer_lora_finetuning.py", line 68, in
response, history = model.chat(tokenizer, input, history=[],)
File "/root/miniconda3/lib/python3.8/site-packages/aigc_zoo/model_zoo/qwen/llm_model.py", line 52, in chat
max_window_size = self.generation_config.max_window_size
AttributeError: 'GenerationConfig' object has no attribute 'max_window_size'

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.