Comments (14)
Might as well just use xformers instead — everything flash-attn has, xformers has too, and xformers supports things flash-attn doesn't...
from qwen.
@jackaihfia2334 Also, the model runs fine without FlashAttention. Just make sure transformers==4.31.0 and you're good.
from qwen.
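The version pin above can be sanity-checked programmatically; a minimal sketch (the helper name `meets_pin` is made up here, and the check deliberately ignores local build suffixes like `+cu117`):

```python
def meets_pin(installed: str, required: str = "4.31.0") -> bool:
    """Return True when the installed version exactly matches the pinned one.

    Hypothetical helper for the thread above; strips a local build suffix
    such as "+cu117" before comparing.
    """
    return installed.split("+")[0] == required

# Example: compare against what `import transformers; transformers.__version__` reports.
print(meets_pin("4.31.0"))         # True
print(meets_pin("4.31.0+cu117"))   # True (suffix ignored)
print(meets_pin("4.32.1"))         # False
```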
@shujun1992 I've run into this before — an old gcc really does block a lot of installs. You can set up a conda virtual environment and then upgrade gcc like this:
conda install gcc_linux-64
conda install gxx_linux-64
cd /path/to/anaconda3/envs/xxx/bin (change to your own conda environment's bin directory)
ln -s gcc x86_64-conda_cos6-linux-gnu-gcc
ln -s g++ x86_64-conda_cos6-linux-gnu-g++
Then reactivate the environment and try again?
from qwen.
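After the relinking steps above, it may be worth confirming which compilers the environment actually resolves; a small sketch (the triplet name is the one created by the ln -s steps, and paths will differ per machine):

```shell
# Print where each compiler resolves on PATH after activating the conda env.
# "x86_64-conda_cos6-linux-gnu-gcc" is the triplet created by the symlinks above.
for tool in gcc g++ x86_64-conda_cos6-linux-gnu-gcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool -> $(command -v "$tool")"
  else
    echo "$tool not found on PATH"
  fi
done
```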
+1
from qwen.
@jackaihfia2334 @DingSiuyo There should be a long log before this, and the key error message is probably in it. You can refer to this issue to troubleshoot.
from qwen.
@jackaihfia2334 @DingSiuyo There should be a long log before this, and the key error message is probably in it. You can refer to this issue to troubleshoot.

Hi, per https://github.com/Dao-AILab/flash-attention#upgrading-from-flashattention-1x-to-flashattention-2, FlashAttention has been updated. At the point where the model imports it, should the method be renamed according to the installed version?
from qwen.
Is installing this mandatory?
from qwen.
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [127 lines of output]
torch.version = 2.1.0.dev20230621+cu117
fatal: detected dubious ownership in repository at '/data/llm/code/Qwen-7B/flash-attention'
To add an exception for this directory, call:
git config --global --add safe.directory /data/llm/code/Qwen-7B/flash-attention
running bdist_wheel
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:478: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.10
creating build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-3.10/flash_attn
copying flash_attn/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn
creating build/lib.linux-x86_64-3.10/flash_attn/layers
copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
creating build/lib.linux-x86_64-3.10/flash_attn/losses
copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-3.10/flash_attn/losses
copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/losses
creating build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/bert.py -> build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/llama.py -> build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/opt.py -> build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/vit.py -> build/lib.linux-x86_64-3.10/flash_attn/models
copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/models
creating build/lib.linux-x86_64-3.10/flash_attn/modules
copying flash_attn/modules/block.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
creating build/lib.linux-x86_64-3.10/flash_attn/ops
copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
creating build/lib.linux-x86_64-3.10/flash_attn/utils
copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
running build_ext
building 'flash_attn_cuda' extension
creating build/temp.linux-x86_64-3.10
creating build/temp.linux-x86_64-3.10/csrc
creating build/temp.linux-x86_64-3.10/csrc/flash_attn
creating build/temp.linux-x86_64-3.10/csrc/flash_attn/src
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c csrc/flash_attn/fmha_api.cpp -o build/temp.linux-x86_64-3.10/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha.h:42,
from csrc/flash_attn/fmha_api.cpp:33:
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h: In function ‘void set_alpha(uint32_t&, float, Data_type)’:
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:63:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
63 | alpha = reinterpret_cast<const uint32_t &>( h2 );
| ^~
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:68:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
68 | alpha = reinterpret_cast<const uint32_t &>( h2 );
| ^~
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:70:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
70 | alpha = reinterpret_cast<const uint32_t &>( norm );
| ^~~~
csrc/flash_attn/fmha_api.cpp: In function ‘void set_params_fprop(FMHA_fprop_params&, size_t, size_t, size_t, size_t, size_t, at::Tensor, at::Tensor, at::Tensor, at::Tensor, void*, void*, void*, void*, void*, float, float, bool, int)’:
csrc/flash_attn/fmha_api.cpp:64:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘struct FMHA_fprop_params’; use assignment or value-initialization instead [-Wclass-memaccess]
64 | memset(&params, 0, sizeof(params));
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from csrc/flash_attn/fmha_api.cpp:33:
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha.h:75:8: note: ‘struct FMHA_fprop_params’ declared here
75 | struct FMHA_fprop_params : public Qkv_params {
| ^~~~~~~~~~~~~~~~~
csrc/flash_attn/fmha_api.cpp:60:15: warning: unused variable ‘acc_type’ [-Wunused-variable]
60 | Data_type acc_type = DATA_TYPE_FP32;
| ^~~~~~~~
csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd(const at::Tensor&, const at::Tensor&, const at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, bool, int, c10::optional<at::Generator>)’:
csrc/flash_attn/fmha_api.cpp:208:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
208 | bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
| ^~~~~~~
csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd_block(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, c10::optional<at::Generator>)’:
csrc/flash_attn/fmha_api.cpp:533:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
533 | bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
| ^~~~~~~
/usr/local/cuda/bin/nvcc -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o build/temp.linux-x86_64-3.10/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
from csrc/flash_attn/src/fmha_kernel.h:34,
from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
32 | #include "cutlass/cutlass.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
from csrc/flash_attn/src/fmha_kernel.h:34,
from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
32 | #include "cutlass/cutlass.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
from csrc/flash_attn/src/fmha_kernel.h:34,
from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
32 | #include "cutlass/cutlass.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
@logicwong
from qwen.
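The fatal error in this log is the missing cutlass/cutlass.h header, which usually means the cutlass git submodule was never fetched; the "dubious ownership" warning near the top also blocks git from reading the repo at all. A guarded sketch of a likely fix, assuming the repo path shown in the log (this is an inference from the log, not a verified recipe):

```shell
repo=/data/llm/code/Qwen-7B/flash-attention   # path taken from the log above
if [ -e "$repo/.git" ]; then
  # Clear the "dubious ownership" complaint, then fetch the cutlass submodule.
  git config --global --add safe.directory "$repo"
  git -C "$repo" submodule update --init --recursive
  pip install "$repo" --no-build-isolation
else
  echo "repo not found at $repo -- adjust the path for your machine"
fi
```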
@jackaihfia2334 @DingSiuyo There should be a long log before this, and the key error message is probably in it. You can refer to this issue to troubleshoot.

Hi, per https://github.com/Dao-AILab/flash-attention#upgrading-from-flashattention-1x-to-flashattention-2, FlashAttention has been updated. At the point where the model imports it, should the method be renamed according to the installed version?

The README's installation section currently still points at v1.0.8, so if you install following the README nothing needs to change. We will update the code later so it supports both 1.0 and 2.0.
from qwen.
Hi, per https://github.com/Dao-AILab/flash-attention#upgrading-from-flashattention-1x-to-flashattention-2, FlashAttention has been updated. At the point where the model imports it, should the method be renamed according to the installed version? The README's installation section currently still points at v1.0.8, so if you install following the README nothing needs to change. We will update the code later so it supports both 1.0 and 2.0.

QWen_PRETRAINED_MODEL_ARCHIVE_LIST = ["qwen-7b"]
try:
    # from flash_attn.flash_attn_interface import flash_attn_unpadded_func
    import flash_attn
    if int(flash_attn.__version__.split(".")[0]) == 1:
        from flash_attn.flash_attn_interface import flash_attn_unpadded_func
    elif int(flash_attn.__version__.split(".")[0]) == 2:
        from flash_attn.flash_attn_interface import flash_attn_varlen_func as flash_attn_unpadded_func
except ImportError:
    flash_attn_unpadded_func = None
    print("import flash_attn qkv fail")

Actually it only takes a small tweak.
from qwen.
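The version gate above can be exercised without a GPU by stubbing the package; a self-contained sketch (`resolve_unpadded_name` is a made-up helper, and the stub module stands in for a real flash_attn install):

```python
import sys
import types

def resolve_unpadded_name():
    """Return which interface name a given flash_attn major version exposes,
    mirroring the try/except gate in the snippet above (hypothetical helper)."""
    try:
        import flash_attn
        major = int(flash_attn.__version__.split(".")[0])
    except ImportError:
        return None
    return "flash_attn_unpadded_func" if major == 1 else "flash_attn_varlen_func"

# Stub flash_attn so the gate can be tested on any machine, GPU or not.
stub = types.ModuleType("flash_attn")
stub.__version__ = "2.0.4"
sys.modules["flash_attn"] = stub
print(resolve_unpadded_name())  # flash_attn_varlen_func

stub.__version__ = "1.0.8"
print(resolve_unpadded_name())  # flash_attn_unpadded_func
```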
Feel free to submit a PR for that directly.
from qwen.
I installed it inside the NGC PyTorch container following the official flash-attention README,
but it still errors out:
ptxas info : Function properties for _Z25flash_bwd_dot_do_o_kernelILb1E23Flash_bwd_kernel_traitsILi64ELi128ELi128ELi8ELi4ELi4ELi4ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi64ELi128ELi128ELi8ES2_EEEv16Flash_bwd_params
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1902, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-0em76put/flash-attn_82b7e874dae44f0f854165b5859a6df5/setup.py", line 202, in <module>
setup(
File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 848, in build_extensions
build_ext.build_extensions(self)
File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
objects = self.compiler.compile(sources,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 661, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1575, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1918, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
from qwen.
You could open an issue in the official FlashAttention repo for this one.
from qwen.
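Two things in the log above are worth noting: ninja died mid-build ("subcommand failed"), which on large CUDA extensions is often a compiler process being killed for lack of memory, and the flash-attn README suggests capping parallelism via the MAX_JOBS environment variable. A guarded sketch of the retry (the RUN_FLASH_ATTN_BUILD gate is made up here, purely so the snippet can't kick off a long build by accident; v1.0.8 is the version the README pins per the earlier reply):

```shell
if [ "${RUN_FLASH_ATTN_BUILD:-0}" = "1" ]; then
  # MAX_JOBS caps parallel nvcc jobs; OOM-killed compilers are a common cause
  # of "ninja: build stopped: subcommand failed" on big CUDA extensions.
  MAX_JOBS=4 pip install flash-attn==1.0.8 --no-build-isolation
else
  echo "set RUN_FLASH_ATTN_BUILD=1 to actually run the build"
fi
```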
Has anyone else hit a gcc version error installing this?
The error is: RuntimeError: The current installed version of g++ (4.8.5) is less than the minimum required version by CUDA 11.4 (6.0.0). Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).
I don't dare upgrade GCC on this machine casually.
from qwen.
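The error message itself spells out the accepted range: CUDA 11.4's nvcc wants g++ >= 6.0.0 and < 12.0. A small sketch for checking a candidate version before switching compilers (`gxx_in_range` is a made-up name):

```python
def gxx_in_range(version, lo=(6, 0, 0), hi=(12, 0, 0)):
    """True when a g++ version string falls inside CUDA 11.4's accepted
    range, per the RuntimeError quoted above (hypothetical helper)."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    parts += (0,) * (3 - len(parts))  # pad "11" or "11.2" to three components
    return lo <= parts < hi

print(gxx_in_range("4.8.5"))   # False: the version from the error above
print(gxx_in_range("9.4.0"))   # True
print(gxx_in_range("12.1.0"))  # False: too new for nvcc 11.4
```

Note that you don't have to replace the system gcc: the conda gcc_linux-64 route earlier in this thread installs a compiler inside the env, and nvcc's -ccbin flag can point at any side-installed g++.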