Is a P5200 enough for this?

RuntimeError: CUDA error: no kernel image is available for execution on the device (mamba issue, closed)

state-spaces commented on August 27, 2024

RuntimeError: CUDA error: no kernel image is available for execution on the device

from mamba.

Comments (11)

hrbigelow commented on August 27, 2024 13

@tridao (I am not sure if this is just a hack, but for us old guys with CCC < 7, can we do this?)

I see that the Quadro P5200 has CUDA Compute Capability 6.1. I saw the same error with my GeForce GTX 1070 (which is also Compute Capability 6.1).

I was able to fix it by compiling the causal-conv1d dependency from source, as follows:

git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
# this is the latest version that Mamba supports:
git checkout v1.0.2
# edit setup.py to add the lines here:
    cc_flag.append("-gencode")
    cc_flag.append("arch=compute_60,code=sm_60")

Here is where you need to add those lines.
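
For readers on other pre-Volta cards, the two appended strings follow a fixed pattern. Here is a minimal sketch that builds the pair from a compute capability. The helper name gencode_flags is hypothetical and is not part of either setup.py:

```python
def gencode_flags(major: int, minor: int = 0) -> list:
    """Return the nvcc flag pair that targets one compute capability.

    Hypothetical helper for illustration only; the setup.py edit in the
    thread appends these two strings to its cc_flag list by hand.
    """
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


# The thread targets sm_60. Code built for sm_60 also runs on 6.1 cards
# such as the GTX 1070, since binaries are compatible within a major version.
cc_flag = []
cc_flag.extend(gencode_flags(6, 0))
print(cc_flag)
```

Targeting 6.0 rather than 6.1 is the choice made in the thread; passing (6, 1) instead would restrict the binary to 6.1 and later 6.x devices.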

Then, compile it from source with:

CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .

You can use the following script to test whether it is working properly:

import torch
from causal_conv1d import causal_conv1d_fn

batch, dim, seq, width = 10, 5, 17, 4
x = torch.zeros((batch, dim, seq)).to('cuda')
weight = torch.zeros((dim, width)).to('cuda')
bias = torch.zeros((dim, )).to('cuda')

causal_conv1d_fn(x, weight, bias, None)

EDIT: Just realized the Mamba repo also assumes CCC >= 7. So, I did a similar edit to the mamba setup.py and compiled it with:

henry@henry-gs65:mamba$ MAMBA_FORCE_BUILD=TRUE pip install .

(This takes about 10 minutes to compile)

After doing this, the top-level Mamba demo works:

import torch

from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
    # This module uses roughly 3 * expand * d_model^2 parameters
    d_model=dim, # Model dimension d_model
    d_state=16,  # SSM state expansion factor
    d_conv=4,    # Local convolution width
    expand=2,    # Block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape


hrbigelow commented on August 27, 2024 3

oops, sorry but I forgot a crucial thing. Mamba states that it requires causal_conv1d version <= 1.0.2. I forgot to mention this. So, you need to do a git checkout v1.0.2 before you do the pip install. From where you are now, I'd say it would be:

$ cd causal-conv1d
$ git checkout v1.0.2
# you've already edited the setup.py file I assume
$ pip uninstall causal-conv1d 
$ CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .

At this point, it may work ;) Since Mamba dynamically loads the causal-conv1d python module, no re-compilation of mamba is necessary. But I am not positive of that.
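
To confirm which causal-conv1d version actually ended up installed, a quick check like the following may help. It is a sketch under two assumptions: the distribution is registered as causal-conv1d (the name used with pip uninstall above), and the 1.0.2 ceiling stated in this comment still applies to your Mamba checkout. The helper names are made up for illustration:

```python
import re
from importlib import metadata

# From the thread: Mamba required causal_conv1d <= 1.0.2 at the time.
MAX_SUPPORTED = (1, 0, 2)


def version_tuple(text):
    """Parse the leading numeric components of a version like '1.0.2'."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as 'post1' or 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def is_supported(version_text, maximum=MAX_SUPPORTED):
    """True if the version does not exceed the assumed ceiling."""
    return version_tuple(version_text) <= maximum


try:
    installed = metadata.version("causal-conv1d")
except metadata.PackageNotFoundError:
    print("causal-conv1d is not installed")
else:
    verdict = "ok" if is_supported(installed) else "too new for this Mamba release"
    print(f"causal-conv1d {installed}: {verdict}")
```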


thistleknot commented on August 27, 2024 2

Processing /home/user/mamba/causal-conv1d
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/home/user/lit-gpt/env/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/user/lit-gpt/env/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/user/lit-gpt/env/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 480, in run_setup
          super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 9, in <module>
      ModuleNotFoundError: No module named 'packaging'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a

This happens despite installing python3-packaging and running pip install packaging (and I can confirm that import packaging works).
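
The paths in the traceback (/tmp/pip-build-env-w4x0ekut/overlay) suggest that pip is running setup.py inside an isolated build environment, where packages from the active environment, including packaging, are not visible. One thing to try is pip's --no-build-isolation flag. This is an assumption, not a fix confirmed in this thread. The sketch below only assembles the command and environment; build_command and build_env are hypothetical helper names:

```python
import os
import sys


def build_command(isolated=False):
    """Assemble a pip install command for the current source directory."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if not isolated:
        # Let the build see packages already installed in the active
        # environment (for example packaging and torch).
        cmd.append("--no-build-isolation")
    cmd.append(".")
    return cmd


def build_env():
    """Copy the environment and request a source build, as in the thread."""
    env = dict(os.environ)
    env["CAUSAL_CONV1D_FORCE_BUILD"] = "TRUE"
    return env


# Print the command instead of running it; to execute it, pass both values
# to subprocess.run(cmd, env=env, check=True) from inside the checkout.
print(" ".join(build_command()))
```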


thistleknot commented on August 27, 2024 1

>>> y = model(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/mamba_ssm/modules/mamba_simple.py", line 149, in forward
    out = mamba_inner_fn(
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 306, in mamba_inner_fn
    return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 181, in forward
    conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(x, conv1d_weight, conv1d_bias, True)
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: Optional[torch.Tensor], arg4: bool) -> torch.Tensor

Invoked with: tensor([[[-0.4806,  1.2685,  0.3929,  ...,  0.3327,  0.3938, -0.5350],
         [ 0.9421, -0.1715, -0.0481,  ..., -0.1955, -0.8604, -0.4096],
         [ 0.5454, -0.1034, -0.2881,  ...,  0.2157, -1.2089, -0.3394],
         ...,
         [ 0.3014,  0.2976, -0.3656,  ..., -0.4423, -0.8560, -0.3013],
         [-0.3690, -0.3119, -0.1994,  ..., -0.4742, -0.6223,  0.2423],
         [-0.7320,  1.4818,  0.6340,  ..., -0.4294,  0.2926, -0.0436]],

        [[ 0.4325, -0.4794,  0.4466,  ...,  0.1774,  0.8001, -0.0083],
         [-0.2831, -0.2780,  0.3027,  ...,  0.3467, -1.0696,  0.2190],
         [-0.7058,  0.7942, -0.5447,  ...,  0.5141, -0.9554, -0.0649],
         ...,
         [-0.7701,  0.9309, -0.6030,  ...,  0.2993, -0.0422, -0.1484],
         [ 0.5808,  0.4285, -0.5568,  ...,  1.3064, -1.0199, -0.3363],
         [ 0.0734,  0.0993,  0.6768,  ..., -0.1356,  0.9295, -0.1664]]],
       device='cuda:0', requires_grad=True), tensor([[-0.0555,  0.4169,  0.2594, -0.4943],
        [-0.0554,  0.0376,  0.1702,  0.4476],
        [-0.1875,  0.4470,  0.2299, -0.0788],
        [-0.2496,  0.4405, -0.0241,  0.0307],
        [ 0.2666, -0.2731, -0.1284, -0.3504],
        [ 0.2001,  0.1497,  0.2172,  0.1289],
        [ 0.3474,  0.3953,  0.2375,  0.0597],
        [ 0.0498,  0.1374, -0.0508, -0.1526],
        [-0.2388, -0.2890, -0.4515,  0.0008],
        [-0.2706, -0.4276, -0.4668,  0.4245],
        [ 0.0252,  0.0295, -0.4991,  0.2078],
        [ 0.2212,  0.3381, -0.3815,  0.1831],
        [-0.3029, -0.3729, -0.1333, -0.1371],
        [-0.3745,  0.0316, -0.1675,  0.0064],
        [ 0.4358,  0.4920, -0.4541, -0.0722],
        [ 0.2807, -0.1016, -0.4563, -0.3044],
        [ 0.1035,  0.0162,  0.4479,  0.3260],
        [-0.2877,  0.1106,  0.4981,  0.4084],
        [-0.3320, -0.3829, -0.1360,  0.3744],
        [-0.3771, -0.3639, -0.1163,  0.3709],
        [-0.2274, -0.4964, -0.0816,  0.4454],
        [ 0.1764, -0.0485,  0.3448, -0.4393],
        [-0.3905, -0.3605,  0.0623, -0.2038],
        [-0.2044, -0.1454, -0.1526, -0.4165],
        [-0.0414,  0.1940,  0.3441, -0.3418],
        [ 0.4200, -0.2309,  0.1998, -0.1196],
        [-0.4553,  0.1990,  0.4579,  0.1669],
        [-0.3292,  0.0408, -0.4167,  0.3332],
        [ 0.4237,  0.4848, -0.3006, -0.2292],
        [ 0.4939,  0.1801, -0.1294,  0.0011],
        [ 0.3516, -0.3912,  0.3251,  0.3016],
        [-0.0648, -0.0567, -0.3247,  0.4323]], device='cuda:0',
       requires_grad=True), Parameter containing:
tensor([-3.1444e-01,  4.3207e-02,  2.2112e-01, -3.4120e-01,  4.0195e-01,
        -1.4227e-01, -4.5976e-01, -3.6258e-04, -4.6205e-01,  1.7177e-01,
         4.6020e-01, -1.7618e-01,  2.0168e-01,  1.2738e-01,  2.8975e-01,
        -4.2130e-01, -2.3378e-01, -1.8998e-01, -9.5853e-02, -2.4321e-01,
        -1.0333e-02, -2.0879e-01,  1.2288e-01,  5.1831e-02, -4.9842e-02,
        -3.1233e-01,  1.4064e-01, -2.4546e-01,  3.0703e-01,  1.4846e-02,
         7.5587e-02, -3.6691e-01], device='cuda:0', requires_grad=True), True
>>>


tridao commented on August 27, 2024 1

Sorry I'm traveling this week but will have time to look into this next week.


thistleknot commented on August 27, 2024 1

yay, that did it
back in the game
=D


thistleknot commented on August 27, 2024

oops, I did it out of order: nm, still produced the same error after applying same process to mamba's setup.py

Fyi for us newbs

CCC stands for "CUDA Compute Capability", a version number that identifies the features supported by a piece of CUDA (Compute Unified Device Architecture) hardware, typically a GPU. CUDA is a parallel computing platform and application programming interface (API) created by Nvidia. It lets software developers use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach known as GPGPU (General-Purpose computing on Graphics Processing Units).

Different generations of CUDA GPUs support different features and therefore have different Compute Capabilities. For example, the Quadro P5200 and GeForce GTX 1070 mentioned above both have a Compute Capability of 6.1. The number matters because compiled CUDA kernels target specific Compute Capabilities; if none of the targets built into a binary matches your GPU, you get the "no kernel image is available" error.

When you edit a package's setup.py to add Compute Capability flags, you are instructing the compiler to also generate code for GPUs with that Compute Capability. This is often necessary for older GPUs, or whenever the pre-compiled binaries of a library do not cover your GPU's Compute Capability.
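
To find out what your own card reports, PyTorch exposes the value directly. Here is a minimal sketch; the (7, 0) minimum is an assumption taken from the "CCC >= 7" remark earlier in this thread, and prebuilt_kernels_cover is a hypothetical helper name:

```python
def prebuilt_kernels_cover(capability, minimum=(7, 0)):
    """True if a (major, minor) capability meets the assumed minimum."""
    return tuple(capability) >= tuple(minimum)


try:
    import torch
except ImportError:  # keep the sketch usable without PyTorch installed
    torch = None

if torch is not None and torch.cuda.is_available():
    # Returns a (major, minor) tuple, e.g. (6, 1) on a GTX 1070.
    cap = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    if prebuilt_kernels_cover(cap):
        print(f"{name}: capability {cap}, prebuilt wheels should work")
    else:
        print(f"{name}: capability {cap}, build from source with a matching -gencode flag")
```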


thistleknot commented on August 27, 2024

btw, I had to do something similar to get ctransformers to work


hrbigelow commented on August 27, 2024

(i edited the original instruction to reflect this just now)


thistleknot commented on August 27, 2024

pip install wheel
python setup.py


StorywithLove commented on August 27, 2024

Oh, God, I solved it! Love from P40 (CCC 6.1)!!!
