Earlier I was getting the same error as in issue #45:
E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Here is my system config for reference:
Ubuntu 18.04 LTS
Intel Core i7 (9th gen)
GTX 1660 Ti 6 GB (notebook)
16 GB RAM
CUDA Toolkit 10.1.243
cuDNN 7.6.5
NVIDIA driver 450.66 (it doesn't work with r418 either)
No other programs running in the background
I tried the fix recommended by FussnerS and am now getting a new error:
swastik@G531G:~/Documents/TensorFlow-2.x-YOLOv3$ python train.py
2020-09-09 11:24:28.420718: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-09-09 11:24:28.445247: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2599990000 Hz
2020-09-09 11:24:28.445783: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564df49d7b30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-09 11:24:28.445815: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-09 11:24:28.446370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-09 11:24:28.458470: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.458703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2020-09-09 11:24:28.458815: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-09-09 11:24:28.459771: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-09-09 11:24:28.460699: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-09-09 11:24:28.460898: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-09-09 11:24:28.461914: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-09-09 11:24:28.462462: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-09-09 11:24:28.464652: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-09 11:24:28.464724: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.465068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.465261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-09-09 11:24:28.465282: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-09-09 11:24:28.519237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-09 11:24:28.519280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-09-09 11:24:28.519296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-09-09 11:24:28.519459: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.519734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.519970: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.520200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 5052 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-09 11:24:28.521529: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564df7fd1160 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-09 11:24:28.521541: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2658484180504880421
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 12809034059942815785
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 5298053440
locality {
  bus_id: 1
  links {
  }
}
incarnation: 11980719283488140266
physical_device_desc: "device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 5307793851066346902
physical_device_desc: "device: XLA_GPU device"
]
2020-09-09 11:24:28.522030: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.522225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2020-09-09 11:24:28.522249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-09-09 11:24:28.522257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-09-09 11:24:28.522264: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-09-09 11:24:28.522273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-09-09 11:24:28.522281: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-09-09 11:24:28.522289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-09-09 11:24:28.522296: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-09 11:24:28.522322: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.522517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.522695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
GPUs [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2020-09-09 11:24:28.572536: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.572786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 268.26GiB/s
2020-09-09 11:24:28.572843: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-09-09 11:24:28.572854: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-09-09 11:24:28.572862: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-09-09 11:24:28.572888: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-09-09 11:24:28.572920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-09-09 11:24:28.572942: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-09-09 11:24:28.572950: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-09 11:24:28.572981: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.573183: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.573364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-09-09 11:24:28.573406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-09 11:24:28.573411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-09-09 11:24:28.573415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-09-09 11:24:28.573458: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.573663: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-09 11:24:28.573852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5052 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
skipping conv2d_74
skipping conv2d_66
skipping conv2d_58
2020-09-09 11:24:45.025499: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-09 11:24:45.477260: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-09-09 11:24:45.489178: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 926, in conv2d
    "dilations", dilations)
tensorflow.python.eager.core._FallbackException: Expecting int64_t value for attr strides, got numpy.int32

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 195, in <module>
    main()
  File "train.py", line 150, in main
    results = train_step(image_data, target)
  File "train.py", line 88, in train_step
    pred_result = yolo(image_data, training=True)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 719, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 888, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 207, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1106, in __call__
    return self.conv_op(inp, filter)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 638, in __call__
    return self.call(inp, filter)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 237, in __call__
    name=self.name)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2014, in conv2d
    name=name)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 933, in conv2d
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1022, in conv2d_eager_fallback
    ctx=ctx, name=name)
  File "/home/swastik/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
.
.
.
Also, I should mention that earlier, after showing that error, the model still trained on my CPU (same as in #45). Now it just fails.
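In case it helps anyone triaging this: almost all of the log above is informational (`I`) output. The parts that matter are the `E`-severity lines, the CUDA libraries that were actually loaded, and the last exception of the chained traceback (Python prints the exception that finally propagated last; the `_FallbackException` is only the context it was raised from). Here is a minimal Python sketch that pulls those out of a saved log. It assumes the `<timestamp>: <severity> <file>:<line>] <message>` prefix seen in the log above, and that the final exception's class name ends in `Error` or `Exception`. `triage` and `Triage` are hypothetical helper names, not part of TensorFlow or the YOLOv3 repo. It avoids `dataclasses` so it also runs on Python 3.6.

```python
import re
from typing import List, NamedTuple, Optional

# TensorFlow C++ log prefix, as in the log above, e.g.
# "2020-09-09 11:24:45.477260: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not ..."
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<sev>[IWEF]) (?P<src>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
# Informational message printed each time a CUDA library is opened.
DSO_LOADED = re.compile(r"Successfully opened dynamic library (\S+)")
# Assumption: exception lines look like "pkg.module.SomeError: message".
EXCEPTION_LINE = re.compile(r"^(?P<exc>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$")


class Triage(NamedTuple):
    errors: List[str]               # E/F severity messages, in log order
    libraries: List[str]            # loaded libraries, first occurrence only
    final_exception: Optional[str]  # last exception line printed, if any


def triage(log_text: str) -> Triage:
    """Summarise a pasted TensorFlow log (hypothetical helper)."""
    errors: List[str] = []
    libraries: List[str] = []
    final_exception: Optional[str] = None
    for raw in log_text.splitlines():
        line = raw.strip()  # traceback lines are indented; log lines are not
        entry = LOG_LINE.match(line)
        if entry:
            if entry.group("sev") in ("E", "F"):
                errors.append(f"{entry.group('src')}:{entry.group('line')}] {entry.group('msg')}")
            lib = DSO_LOADED.search(entry.group("msg"))
            if lib and lib.group(1) not in libraries:
                libraries.append(lib.group(1))
        elif EXCEPTION_LINE.match(line):
            # In a chained traceback the exception that actually propagated is
            # printed last, so keep overwriting until the end of the log.
            final_exception = line
    return Triage(errors, libraries, final_exception)
```

For example, after capturing the output with `python train.py 2>&1 | tee train_log.txt`, calling `triage(open("train_log.txt").read())` on a log like this one should list the two `cuda_dnn.cc:329` `CUDNN_STATUS_INTERNAL_ERROR` lines, the loaded libraries (`libcudart.so.10.1`, `libcudnn.so.7` and so on), and the final `UnknownError`, which is the real failure.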