Comments (6)
INSTALL_FLASHATTN=true
from llama-factory.
INSTALL_FLASHATTN=true
Installing with INSTALL_FLASHATTN=true pulls in a newer flash-attn build that fails with undefined symbol: _ZN3c104cuda14ExchangeDeviceEa; following Dao-AILab/flash-attention#966 (comment), installing torch==2.3.0 and flash-attn==2.5.8 resolved it.
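For anyone debugging this, the mismatch surfaces as an ImportError whose message contains "undefined symbol", which distinguishes a broken ABI from a plain missing install. A minimal sketch (the helper names are illustrative, not part of LLaMA-Factory or flash-attn):

```python
def classify_import_error(msg: str) -> str:
    """Map a flash-attn ImportError message to a probable cause."""
    if "undefined symbol" in msg:
        # The wheel was built against a different torch; reinstall
        # flash-attn to match the currently installed torch.
        return "abi-mismatch"
    return "not-installed"

def flash_attn_status() -> str:
    """Probe whether flash-attn imports cleanly in this environment."""
    try:
        import flash_attn  # noqa: F401
    except ImportError as e:
        return classify_import_error(str(e))
    return "ok"
```

Running flash_attn_status() in the training environment tells you whether to reinstall flash-attn (abi-mismatch) or install it in the first place.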
Is flash-attn required for the 4090?
Training with the command line above produces the error in #4441 (comment). Which parameter do I change to use SDPA attention?
from llama-factory.
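If your LLaMA-Factory version exposes an attention-backend switch (recent versions have a flash_attn model argument whose choices include auto, disabled, sdpa, and fa2; check your version's ModelArguments for the exact name), SDPA can be selected in the training YAML:

```yaml
# Assumed option name; verify against your LLaMA-Factory version.
flash_attn: sdpa
```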
This looks like an issue in the GLM model code. Try updating the file: https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py#L30-L36
from llama-factory.
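The linked lines wrap the flash-attn import in a guard so the model can fall back when the package is broken or absent; the general pattern (an illustrative sketch, not the exact contents of modeling_chatglm.py) looks like:

```python
# Optional-import guard, as commonly used around flash-attn in model code.
try:
    from flash_attn import flash_attn_func  # optional fast path
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False

def pick_attn_impl() -> str:
    """Fall back to PyTorch's built-in SDPA when flash-attn is unavailable."""
    return "flash_attention_2" if HAS_FLASH_ATTN else "sdpa"
```

If the model file instead imports flash-attn unconditionally, any environment without a matching flash-attn wheel crashes at import time, which is consistent with the errors reported above.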
INSTALL_FLASHATTN=true
After several attempts, I found that inside Docker it only runs with torch==2.1.2 and pip install flash-attn --no-build-isolation; after installing those, torchtext and torchvision both have to be changed to 0.16.2. The torch==2.3.0 / flash-attn==2.5.8 combination mentioned above did not work either. I don't know how it succeeded the first time; could it depend on the CUDA version inside the Docker image? I later tried docker compose, and no matter what I did it would not run.
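The working combination the commenter reports can be pinned in the image; a sketch, assuming a base image whose CUDA toolkit matches the torch 2.1.2 wheels (the versions are the commenter's report, not an official recommendation):

```dockerfile
# Versions as reported working by the commenter above; adjust to your CUDA base image.
RUN pip install torch==2.1.2 torchvision==0.16.2 torchtext==0.16.2 && \
    pip install flash-attn --no-build-isolation
```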
Is there a way to avoid calling flash-attn at all? In an environment built with pip install -e ., installing flash-attn just hangs during compilation; I can only use Docker.
from llama-factory.
Fixed in e3141f5.
from llama-factory.
Fixed in e3141f5.
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml works now, but launching with command-line arguments (llamafactory-cli train --stage sft --do_train True, i.e. the WebUI path) still reports that flash_attn is not installed. Trying to install flash_attn inside Docker fails with the error below. The docker compose setup pulled on the 26th runs on another machine with dual 4090s; the machine that errors has a single 4090.
exit code: 1
╰─> [165 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      torch.__version__ = 2.3.0a0+ebedce2
      /usr/local/lib/python3.10/dist-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!

              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************

      !!
        dist.fetch_build_eggs(dist.setup_requires)
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2095, in _run_ninja_build
          subprocess.run(
        File "/usr/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '4']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-x_jhgpxb/flash-attn_2f2e7ee88bc743f1bc99623ecc04d0cc/setup.py", line 311, in <module>
          setup(
        File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
          return distutils.core.setup(**attrs)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-x_jhgpxb/flash-attn_2f2e7ee88bc743f1bc99623ecc04d0cc/setup.py", line 266, in run
          return super().run()
        File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 88, in run
          _build_ext.run(self)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
          build_ext.build_extensions(self)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 249, in build_extension
          _build_ext.build_extension(self, ext)
        File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
          super(build_ext, self).build_extension(ext)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
          objects = self.compiler.compile(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 683, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1773, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2111, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (flash-attn)
from llama-factory.
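The failing command in the traceback above is ninja -v -j 4, which typically dies when the parallel nvcc jobs exhaust RAM during a from-source build. The flash-attention README suggests capping parallel jobs on memory-constrained machines, e.g.:

```shell
# Cap parallel compile jobs (each nvcc job can need several GB of RAM);
# MAX_JOBS is honored by torch's cpp_extension build that flash-attn uses.
MAX_JOBS=2 pip install flash-attn --no-build-isolation
```

Whether a prebuilt wheel is used instead of compiling depends on the torch/CUDA/Python combination in the image, which may explain why the same setup works on one machine and compiles (and fails) on another.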
Related Issues (20)
- During DPO training, reward/acc converges to 1 very quickly, but the results are not very good HOT 7
- How to plot the loss for different datasets?
- Support for distributed training? HOT 1
- Continued pre-training based on Llama-3-Chinese-8B-Instruct-v3: the output does not stop HOT 2
- How to make LLaMA-Factory add specific training data during the annealing phase HOT 2
- Error when converting PiSSA adapter to normal LoRA in deepspeed stage 3 mode HOT 4
- Question about using templates HOT 2
- Llama produces repeated output; the FAQ did not resolve it HOT 3
- Inference is very slow on NPU 910B4 cards
- Low Learning Rate for Mistral Models
- Question: how to build a dataset of exam questions plus question-answer pairs to fine-tune Qwen2
- PPO training has no way to limit the number of saved checkpoints; save_steps, save_strategy, and save_total_limit have no effect, and a checkpoint is still saved at every step HOT 3
- Fine-tuning on long conversations HOT 1
- LLaVA 1.5 7B multimodal Q&A inference on NPU defaults to HOT 1
- OSError: [WinError 126] The specified module could not be found. Error loading "D:\mysoft\anaconda3\envs\llama_factory\Lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies HOT 1
- How to run predict with vLLM to compute BLEU and ROUGE scores
- A question about LLM tokenization
- Results after training are very poor
- Can training of DashInfer-format models be supported?
- How to avoid loading a copy of the model on every GPU HOT 2