Comments (12)
That did it! Thank you.
from flux.jl.
Interestingly, ConvTranspose works perfectly fine with kernel size 1. The following gives no warning:
using Flux, CUDA
c = ConvTranspose((1,), 1=>1) |> gpu
x = randn(1,1,1) |> gpu
c(x)
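For contrast, the failing case this thread is about (the "first example", which is not quoted in this excerpt) would presumably be the direct Conv analogue. This is a hedged reconstruction, not the original snippet:

```julia
using Flux, CUDA

# Hypothetical reconstruction of the originally reported failing case:
# a 1D Conv with kernel size 1 and a single input/output channel.
c = Conv((1,), 1 => 1) |> gpu
x = randn(1, 1, 1) |> gpu
c(x)  # reportedly emits: Warning: No valid algorithm found, probably bad params for convolution.
```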
Can you post the other package versions in your environment? The Julia version alone doesn't tell us much. The version info from CUDA.jl as well. Lastly, make sure you have ample memory available before running the code. This error can pop up if you're near the memory limit of your GPU.
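If it helps, CUDA.jl can report free memory directly. A minimal sketch, assuming CUDA.jl is loaded and a device is available:

```julia
using CUDA

# Query free vs. total device memory before running the model.
free  = CUDA.available_memory() / 2^30  # GiB
total = CUDA.total_memory() / 2^30      # GiB
println("GPU memory available: $(round(free; digits=2)) / $(round(total; digits=2)) GiB")

# CUDA.memory_status() prints a similar summary, including pool usage.
CUDA.memory_status()
```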
...make sure you have ample memory available before running the code.
I was not aware of this. When you pointed it out, I thought this might have been the case, but I re-checked, and this time I had plenty of available memory.
nvidia-smi (abridged):
NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7
GPU 0: NVIDIA GeForce ...    Memory-Usage: 2195MiB / 11264MiB    GPU-Util: 0%
I tested on a clean environment with only the following packages:
CUDA v4.4.1
Flux v0.14.3
cuDNN v1.1.0
CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.7
NVIDIA driver 515.65.1
CUDA libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+515.65.1
Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce RTX 2080 Ti (sm_75, 7.367 GiB / 11.000 GiB available)
It's worth noting that Conv and ConvTranspose are equivalent in this case (kernel size 1, with the weight's channel dimensions permuted), so I can work around it using ConvTranspose. I don't know if that's helpful for fixing the issue with Conv, though.
c = Conv((1,), 2 => 3);
ct = ConvTranspose(permutedims(c.weight, (1,3,2)), c.bias);
x = randn(Float32, 2,2,1);
c(x) == ct(x)
It turns out ConvTranspose runs into the same issue when the number of input channels is just slightly larger:
c = ConvTranspose((1,), 8=>1) |> gpu
x = randn(1,8,1) |> gpu
c(x)
Warning: No valid algorithm found, probably bad params for convolution.
At the point of testing, I had more than 9 GB of available VRAM, so I don't see how that could be the issue. With 2D convolutions, there is no problem, even with over 10,000 input channels.
Edit:
Even stranger, increasing the number of output channels also results in no warning:
c = ConvTranspose((1,), 8=>32) |> gpu
x = randn(1,8,1) |> gpu
c(x)
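Since 2D convolutions appear unaffected, another possible workaround for the failing 8 => 1 case is to insert a singleton spatial dimension and use a 2D Conv instead. This is an untested sketch under that assumption, not something verified in this thread:

```julia
using Flux, CUDA

# Hypothetical workaround: emulate Conv((1,), 8 => 1) with a 2D convolution
# by adding a singleton height dimension to the data.
c2 = Conv((1, 1), 8 => 1) |> gpu
x  = randn(Float32, 1, 8, 1) |> gpu  # (width, channels, batch)
x2 = reshape(x, 1, 1, 8, 1)          # (width, height, channels, batch)
y2 = c2(x2)
y  = reshape(y2, 1, 1, 1)            # back to the 1D layout
```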
I tried replicating the first example and wasn't able to:
CUDA runtime 12.1, artifact installation
CUDA driver 12.1
NVIDIA driver 530.30.2
CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+530.30.2
Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: Tesla V100S-PCIE-32GB (sm_70, 31.437 GiB / 32.000 GiB available)
Comparing versions, the main difference appears to be the newer CUDA libraries that come with a newer driver. Are you able to update the NVIDIA drivers on your system?
I updated to NVIDIA driver 535, but I can't seem to figure out how to upgrade the CUDA libraries. I deleted the whole artifact folder, but it re-downloaded the same CUDA library versions; the only difference is that NVML was upgraded from 11.0.0 to 12.0.0. This still seems like the likely root cause, though, given that you can't replicate the behavior.
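For what it's worth, CUDA.jl exposes a documented switch for pinning which runtime artifact it downloads; whether that resolves this particular issue is an assumption on my part:

```julia
using CUDA

# Ask CUDA.jl to use a specific CUDA runtime artifact.
# Takes effect only after restarting Julia.
CUDA.set_runtime_version!(v"12.1")
```

After a restart, CUDA.versioninfo() should report the requested runtime.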
Your best bet may be to ask how to upgrade those libraries in the Julia GPU help channels. As it stands, I'm pretty stumped.
I updated the CUDA runtime version to 12.1 to match yours, and the first example still gives me the same warning. The CUDA runtime libraries are now the same version, but my NVIDIA driver is 535 instead of 530 (originally it was 515). It would be surprising if 530 were the only version that works.
CUDA runtime 12.1, artifact installation
CUDA driver 12.2
NVIDIA driver 535.86.10
CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+535.86.10
Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce RTX 2080 Ti (sm_75, 8.606 GiB / 11.000 GiB available)
I doubt that's the case either. Just a quick sanity check, have you restarted your system after installing newer drivers?
To proceed, we definitely need more information on where this error is coming from. Can you re-run your original example with the JULIA_DEBUG=CUDA,cuDNN environment variable set?
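For reference, one way to set it for a whole session (a generic shell sketch, not specific to this setup):

```shell
# Enable debug-level logging from the CUDA and cuDNN packages,
# then launch julia from this same shell so it inherits the variable.
export JULIA_DEBUG=CUDA,cuDNN
echo "JULIA_DEBUG=$JULIA_DEBUG"   # then run: julia
```

Alternatively, setting ENV["JULIA_DEBUG"] = "CUDA,cuDNN" inside an already-running Julia session should have the same effect.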
I have restarted the system after installing new drivers.
I will see if I can reproduce the results on another system. I don't have another system with a similar GPU, but I will try with different GPUs (P100/V100/A100). I will also check if I can reproduce it on Windows with the same system. It might take a few days, but I will be back with more info.
I re-ran the original example with JULIA_DEBUG=CUDA,cuDNN as you suggested. Without a comparison, I am completely lost looking at the log files.
Initializing Conv layer
julia> c = Conv((1,), 1=>1) |> gpu
Conv((1,), 1 => 1)  # 2 parameters
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE16ComposedFunctionIS0_6iszeroES3_IS7_I7Float32Li3ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE16ComposedFunctionIS0_6iszeroES3_IS7_I7Float32Li3ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 66 registers, 32 bytes smem, 544 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_1685
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_1702
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isnanS3_IS7_I7Float32Li3ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isnanS3_IS7_I7Float32Li3ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 66 registers, 32 bytes smem, 544 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_2751
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_2768
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isnanS3_IS7_I7Float32Li1ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isnanS3_IS7_I7Float32Li1ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 28 registers, 32 bytes smem, 464 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3006
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3020
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isinfS3_IS7_I7Float32Li3ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isinfS3_IS7_I7Float32Li3ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 66 registers, 32 bytes smem, 544 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3109
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3126
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isinfS3_IS7_I7Float32Li1ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isinfS3_IS7_I7Float32Li1ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 28 registers, 32 bytes smem, 464 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3239
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3253
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
Calling layer
julia> c(x)
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ cuDNN ~/.julia/packages/cuDNN/YkZhm/src/convolution.jl:280
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasCreate_v2(cublasContext**) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xb272790)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: CuDNN (v8904) function cudnnGetVersion() called:
│ Time: 2023-09-07T17:21:05.599124 (0d+0h+0m+24s since start)
│ Process=12870; Thread=12870; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN ~/.julia/packages/cuDNN/YkZhm/src/cuDNN.jl:141
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: CuDNN (v8904) function cudnnCreateConvolutionDescriptor() called:
│ convDesc: location=host; addr=0x7fbe4ef2e440;
│ Time: 2023-09-07T17:21:05.678845 (0d+0h+0m+24s since start)
│ Process=12870; Thread=12870; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN ~/.julia/packages/cuDNN/YkZhm/src/cuDNN.jl:141
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
│
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li3ELi1EE11BroadcastedI12CuArrayStyleILi3EE5TupleI5OneToI5Int64ES5_IS6_ES5_IS6_EE8identityS4_IS2_IS3_ILi3EEv1_S4_I8ExtrudedIS0_IS1_Li3ELi1EES4_I4BoolS10_S10_ES4_IS6_S6_S6_EES9_IS0_IS1_Li3ELi1EES4_IS10_S10_S10_ES4_IS6_S6_S6_EEEEEES6_' for 'sm_75'
│ ptxas info : Function properties for _Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li3ELi1EE11BroadcastedI12CuArrayStyleILi3EE5TupleI5OneToI5Int64ES5_IS6_ES5_IS6_EE8identityS4_IS2_IS3_ILi3EEv1_S4_I8ExtrudedIS0_IS1_Li3ELi1EES4_I4BoolS10_S10_ES4_IS6_S6_S6_EES9_IS0_IS1_Li3ELi1EES4_IS10_S10_S10_ES4_IS6_S6_S6_EEEEEES6_
│ 8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 40 registers, 600 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
1×1×1 CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}:
[:, :, 1] =
-0.79804516
Ok, can you run a quick ] up, ensure you have cuDNN v1.1.1, and try again? What I think is happening is that a change we made to avoid a spurious warning never made it into an actual cuDNN.jl release, so I asked the CUDA.jl maintainers for a new one.
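In Pkg-API form, the same steps look roughly like this (a sketch; the package name is assumed to match the registered cuDNN.jl):

```julia
using Pkg

Pkg.update()          # equivalent to `] up` in the REPL
Pkg.status("cuDNN")   # verify that cuDNN is now at v1.1.1 or later
```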