Comments (12)
That did it! Thank you.
from flux.jl.
Interestingly, ConvTranspose works perfectly fine with kernel size 1. The following gives no warning:
using Flux, CUDA
c = ConvTranspose((1,), 1=>1) |> gpu
x = randn(1,1,1) |> gpu
c(x)
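For contrast, the failing case this thread is about (the "first example", which is not quoted in this excerpt) would presumably be the direct Conv analogue. This is a hedged reconstruction, not the original snippet:

```julia
using Flux, CUDA

# Hypothetical reconstruction of the originally reported failing case:
# a 1D Conv with kernel size 1 and a single input/output channel.
c = Conv((1,), 1 => 1) |> gpu
x = randn(1, 1, 1) |> gpu
c(x)  # reportedly emits: Warning: No valid algorithm found, probably bad params for convolution.
```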
Can you post the other package versions in your environment? The Julia version alone doesn't tell us much. The version info from CUDA.jl as well. Lastly, make sure you have ample memory available before running the code. This error can pop up if you're near the memory limit of your GPU.
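If it helps, CUDA.jl can report free memory directly. A minimal sketch, assuming CUDA.jl is loaded and a device is available:

```julia
using CUDA

# Query free vs. total device memory before running the model.
free  = CUDA.available_memory() / 2^30  # GiB
total = CUDA.total_memory() / 2^30      # GiB
println("GPU memory available: $(round(free; digits=2)) / $(round(total; digits=2)) GiB")

# CUDA.memory_status() prints a similar summary, including pool usage.
CUDA.memory_status()
```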
...make sure you have ample memory available before running the code.
I was not aware of this. When you pointed it out, I thought this might have been the case, but I re-checked, and this time I had plenty of available memory.
nvidia-smi (abridged):
NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7
GPU 0: NVIDIA GeForce ...    Memory-Usage: 2195MiB / 11264MiB    GPU-Util: 0%
I tested on a clean environment with only the following packages:
CUDA v4.4.1
Flux v0.14.3
cuDNN v1.1.0
CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.7
NVIDIA driver 515.65.1
CUDA libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+515.65.1
Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce RTX 2080 Ti (sm_75, 7.367 GiB / 11.000 GiB available)
It's worth noting that Conv and ConvTranspose are equivalent in this case (kernel size 1, with the weight's channel dimensions permuted), so I can work around it using ConvTranspose. I don't know if that's helpful for fixing the issue with Conv, though.
c = Conv((1,), 2 => 3);
ct = ConvTranspose(permutedims(c.weight, (1,3,2)), c.bias);
x = randn(Float32, 2,2,1);
c(x) == ct(x)
It turns out ConvTranspose runs into the same issue when the number of input channels is just slightly larger:
c = ConvTranspose((1,), 8=>1) |> gpu
x = randn(1,8,1) |> gpu
c(x)
Warning: No valid algorithm found, probably bad params for convolution.
At the point of testing, I had more than 9 GB of available VRAM, so I don't see how that could be the issue. With 2D convolutions, there is no problem, even with over 10,000 input channels.
Edit:
Even stranger, increasing the number of output channels also results in no warning:
c = ConvTranspose((1,), 8=>32) |> gpu
x = randn(1,8,1) |> gpu
c(x)
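Since 2D convolutions appear unaffected, another possible workaround for the failing 8 => 1 case is to insert a singleton spatial dimension and use a 2D Conv instead. This is an untested sketch under that assumption, not something verified in this thread:

```julia
using Flux, CUDA

# Hypothetical workaround: emulate Conv((1,), 8 => 1) with a 2D convolution
# by adding a singleton height dimension to the data.
c2 = Conv((1, 1), 8 => 1) |> gpu
x  = randn(Float32, 1, 8, 1) |> gpu  # (width, channels, batch)
x2 = reshape(x, 1, 1, 8, 1)          # (width, height, channels, batch)
y2 = c2(x2)
y  = reshape(y2, 1, 1, 1)            # back to the 1D layout
```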
I tried replicating the first example and wasn't able to:
CUDA runtime 12.1, artifact installation
CUDA driver 12.1
NVIDIA driver 530.30.2
CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+530.30.2
Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: Tesla V100S-PCIE-32GB (sm_70, 31.437 GiB / 32.000 GiB available)
Comparing versions, the main difference appears to be the newer CUDA libraries that come with a newer driver. Are you able to update the NVIDIA drivers on your system?
I updated to NVIDIA driver 535, but I can't seem to figure out how to upgrade the CUDA libraries. I deleted the whole artifact folder, but it re-downloaded the same CUDA library versions; the only difference is that NVML was upgraded from 11.0.0 to 12.0.0. This still seems like the likely root cause, though, given that you can't replicate the behavior.
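For what it's worth, CUDA.jl exposes a documented switch for pinning which runtime artifact it downloads; whether that resolves this particular issue is an assumption on my part:

```julia
using CUDA

# Ask CUDA.jl to use a specific CUDA runtime artifact.
# Takes effect only after restarting Julia.
CUDA.set_runtime_version!(v"12.1")
```

After a restart, CUDA.versioninfo() should report the requested runtime.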
Your best bet may be to ask how to upgrade those libraries in the Julia GPU help channels. As it stands, I'm pretty stumped.
I updated the CUDA runtime version to 12.1 to match yours, and the first example still gives me the same warning. The CUDA runtime libraries are now the same version, but my NVIDIA driver is 535 instead of 530 (originally it was 515). It would be surprising if 530 were the only version that works.
CUDA runtime 12.1, artifact installation
CUDA driver 12.2
NVIDIA driver 535.86.10
CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+535.86.10
Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce RTX 2080 Ti (sm_75, 8.606 GiB / 11.000 GiB available)
I doubt that's the case either. Just a quick sanity check, have you restarted your system after installing newer drivers?
To proceed, we definitely need more information on where this error is coming from. Can you re-run your original example with the JULIA_DEBUG=CUDA,cuDNN environment variable set?
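For reference, one way to set it for a whole session (a generic shell sketch, not specific to this setup):

```shell
# Enable debug-level logging from the CUDA and cuDNN packages,
# then launch julia from this same shell so it inherits the variable.
export JULIA_DEBUG=CUDA,cuDNN
echo "JULIA_DEBUG=$JULIA_DEBUG"   # then run: julia
```

Alternatively, setting ENV["JULIA_DEBUG"] = "CUDA,cuDNN" inside an already-running Julia session should have the same effect.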
I have restarted the system after installing new drivers.
I will see if I can reproduce the results on another system. I don't have another system with a similar GPU, but I will try with different GPUs (P100/V100/A100). I will also check if I can reproduce it on Windows with the same system. It might take a few days, but I will be back with more info.
I re-ran the original example with JULIA_DEBUG=CUDA,cuDNN as you suggested. Without a comparison, I am completely lost looking at the log files.
Initializing Conv layer
julia> c = Conv((1,), 1=>1) |> gpu
Conv((1,), 1 => 1)  # 2 parameters
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE16ComposedFunctionIS0_6iszeroES3_IS7_I7Float32Li3ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE16ComposedFunctionIS0_6iszeroES3_IS7_I7Float32Li3ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 66 registers, 32 bytes smem, 544 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_1685
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_1702
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isnanS3_IS7_I7Float32Li3ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isnanS3_IS7_I7Float32Li3ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 66 registers, 32 bytes smem, 544 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_2751
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_2768
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isnanS3_IS7_I7Float32Li1ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isnanS3_IS7_I7Float32Li1ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 28 registers, 32 bytes smem, 464 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3006
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3020
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isinfS3_IS7_I7Float32Li3ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi3E5TupleI5OneToI5Int64ES4_IS5_ES4_IS5_EEES2_ILi3ES3_IS4_IS5_ES4_IS5_ES4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li4ELi1EE11BroadcastedI12CuArrayStyleILi3EES3_IS4_IS5_ES4_IS5_ES4_IS5_EE5isinfS3_IS7_I7Float32Li3ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 66 registers, 32 bytes smem, 544 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3109
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3126
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isinfS3_IS7_I7Float32Li1ELi1EEEE' for 'sm_75'
│ ptxas info : Function properties for _Z22partial_mapreduce_grid8identity1_4Bool16CartesianIndicesILi1E5TupleI5OneToI5Int64EEES2_ILi1ES3_IS4_IS5_EEE3ValILitrueEE13CuDeviceArrayIS1_Li2ELi1EE11BroadcastedI12CuArrayStyleILi1EES3_IS4_IS5_EE5isinfS3_IS7_I7Float32Li1ELi1EEEE
│ 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 28 registers, 32 bytes smem, 464 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3239
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for julia_fldmod1_3253
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
Calling layer
julia> c(x)
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ cuDNN ~/.julia/packages/cuDNN/YkZhm/src/convolution.jl:280
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasCreate_v2(cublasContext**) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xb272790)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x(nil))
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: CuDNN (v8904) function cudnnGetVersion() called:
│ Time: 2023-09-07T17:21:05.599124 (0d+0h+0m+24s since start)
│ Process=12870; Thread=12870; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN ~/.julia/packages/cuDNN/YkZhm/src/cuDNN.jl:141
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: CuDNN (v8904) function cudnnCreateConvolutionDescriptor() called:
│ convDesc: location=host; addr=0x7fbe4ef2e440;
│ Time: 2023-09-07T17:21:05.678845 (0d+0h+0m+24s since start)
│ Process=12870; Thread=12870; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN ~/.julia/packages/cuDNN/YkZhm/src/cuDNN.jl:141
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: cuBLAS (v12.0) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0xbbdcad0)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffc9cd59d6c)
│ Time: 2023-09-07T17:21:06 elapsed from start 0.433333 minutes or 26.000000 seconds
│ Process=12870; Thread=140461244624960; GPU=0; Handle=POINTER (IN HEX:0x0xbbdcad0); StreamId=POINTER (IN HEX:0x(nil)) (defaultStream); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
│
└ @ CUDA.CUBLAS ~/.julia/packages/CUDA/35NC6/lib/cublas/CUBLAS.jl:224
┌ Debug: PTX compiler log:
│ ptxas info : 228 bytes gmem
│ ptxas info : Compiling entry function '_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li3ELi1EE11BroadcastedI12CuArrayStyleILi3EE5TupleI5OneToI5Int64ES5_IS6_ES5_IS6_EE8identityS4_IS2_IS3_ILi3EEv1_S4_I8ExtrudedIS0_IS1_Li3ELi1EES4_I4BoolS10_S10_ES4_IS6_S6_S6_EES9_IS0_IS1_Li3ELi1EES4_IS10_S10_S10_ES4_IS6_S6_S6_EEEEEES6_' for 'sm_75'
│ ptxas info : Function properties for _Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li3ELi1EE11BroadcastedI12CuArrayStyleILi3EE5TupleI5OneToI5Int64ES5_IS6_ES5_IS6_EE8identityS4_IS2_IS3_ILi3EEv1_S4_I8ExtrudedIS0_IS1_Li3ELi1EES4_I4BoolS10_S10_ES4_IS6_S6_S6_EES9_IS0_IS1_Li3ELi1EES4_IS10_S10_S10_ES4_IS6_S6_S6_EEEEEES6_
│ 8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Used 40 registers, 600 bytes cmem[0]
│ ptxas info : Function properties for gpu_report_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
│ ptxas info : Function properties for gpu_signal_exception
│ 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
└ @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/compilation.jl:190
┌ Debug: JIT compiling code
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:26
┌ Debug: JIT info log is empty
└ @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/module.jl:63
1×1×1 CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}:
[:, :, 1] =
-0.79804516
Ok, can you run a quick ] up, ensure you have cuDNN v1.1.1, and try again? What I think is happening is that a change we made to avoid a spurious warning never made it into an actual cuDNN.jl release, so I asked the CUDA.jl maintainers for a new one.
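In Pkg-API form, the same steps look roughly like this (a sketch; the package name is assumed to match the registered cuDNN.jl):

```julia
using Pkg

Pkg.update()          # equivalent to `] up` in the REPL
Pkg.status("cuDNN")   # verify that cuDNN is now at v1.1.1 or later
```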