
Comments (7)

Hyper5phere commented on May 27, 2024

Hi, thanks for reporting the issue.

I think I know the reason for this, but I don't currently have time to test it properly.

Could you try editing your infer.sh like this

...
python3 keras_retinanet/keras_retinanet/bin/infer.py \
    --gpu 0 \
    --backbone resnet152 \
...

and report if the problem persists? So just add that --gpu 0 argument to the python script call.

I'll push a fix later if this worked.
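For context, the --gpu flag ultimately needs to restrict TensorFlow to a single device before CUDA initializes. A common way this is done (a minimal sketch of the usual mechanism, not necessarily keras-retinanet's exact implementation; the `setup_gpu` name is mine) is by masking devices via the CUDA_VISIBLE_DEVICES environment variable:

```python
import os

def setup_gpu(gpu_id) -> None:
    """Restrict TensorFlow to a single GPU by hiding the others.

    Must run before TensorFlow initializes CUDA; setting it afterwards
    has no effect on an already-created session.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)

setup_gpu(0)
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints "0"
```

Without this, TF 1.x typically grabs every visible GPU, which is why the missing --gpu argument can change behavior.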

from air.

mrba59 commented on May 27, 2024

Thanks for your fast reply. I added the --gpu 0 option, but now I get a memory error:

Using TensorFlow backend.
2022-03-03 14:58:30.312397: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-03-03 14:58:30.343955: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499950000 Hz
2022-03-03 14:58:30.344621: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x595f530 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-03 14:58:30.344634: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-03-03 14:58:30.346912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-03-03 14:58:30.445422: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-03 14:58:30.445649: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56ec400 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-03-03 14:58:30.445689: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce GTX 1650 Ti with Max-Q Design, Compute Capability 7.5
2022-03-03 14:58:30.445993: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-03 14:58:30.446098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: NVIDIA GeForce GTX 1650 Ti with Max-Q Design major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2022-03-03 14:58:30.447122: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-03-03 14:58:30.478695: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-03-03 14:58:30.493179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-03-03 14:58:30.497231: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-03-03 14:58:30.528241: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-03-03 14:58:30.548327: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-03-03 14:58:30.619992: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-03 14:58:30.620400: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-03 14:58:30.621051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-03 14:58:30.621384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-03-03 14:58:30.622065: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-03-03 14:58:30.624780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-03 14:58:30.624791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2022-03-03 14:58:30.624796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2022-03-03 14:58:30.625270: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-03 14:58:30.625397: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-03 14:58:30.625530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3121 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1650 Ti with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5)
Loading model, this may take a second...
tracking <tf.Variable 'Variable:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_1:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_2:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_3:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_4:0' shape=(15, 4) dtype=float32> anchors
Running inference on image folder: /home/data/images/test
Running network: N/A% (0 of 4) |         | Elapsed Time: 0:00:00 ETA:  --:--:--2022-03-03 14:58:57.547785: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-03 14:58:59.637530: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-03-03 14:59:00.303684: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.34GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:00.303723: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.34GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:00.703934: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.68GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:00.703967: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.68GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:01.966572: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:01.966603: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:02.188302: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:02.188326: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:02.234087: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:02.234108: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2022-03-03 14:59:04.119234: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.05G (1127088128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:04.119585: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 967.39M (1014379264 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:04.120012: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.05G (1127088128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:09.894982: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.05G (1127088128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:09.895344: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.05G (1127088128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:24.072675: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.05G (1127088128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:24.073058: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 1.05G (1127088128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Running network:  50% (2 of 4) |####     | Elapsed Time: 0:00:58 ETA:   0:00:322022-03-03 14:59:59.489396: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2022-03-03 14:59:59.520640: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:59.521148: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:59.521577: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:59.522053: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:59.522465: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:59.533667: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 14:59:59.533990: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 15:00:09.535761: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 15:00:09.537261: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-03 15:00:09.537327: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00MiB (rounded to 4194304).  Current allocation summary follows.
2022-03-03 15:00:09.537427: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): 	Total Chunks: 294, Chunks in use: 294. 73.5KiB allocated for chunks. 73.5KiB in use in bin. 10.9KiB client-requested in use in bin.
2022-03-03 15:00:09.537461: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): 	Total Chunks: 66, Chunks in use: 66. 33.0KiB allocated for chunks. 33.0KiB in use in bin. 33.0KiB client-requested in use in bin.
2022-03-03 15:00:09.537490: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): 	Total Chunks: 361, Chunks in use: 361. 363.2KiB allocated for chunks. 363.2KiB in use in bin. 361.9KiB client-requested in use in bin.
2022-03-03 15:00:09.537518: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): 	Total Chunks: 62, Chunks in use: 62. 124.0KiB allocated for chunks. 124.0KiB in use in bin. 124.0KiB client-requested in use in bin.
2022-03-03 15:00:09.537548: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): 	Total Chunks: 150, Chunks in use: 150. 606.5KiB allocated for chunks. 606.5KiB in use in bin. 600.0KiB client-requested in use in bin.
2022-03-03 15:00:09.537579: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): 	Total Chunks: 18, Chunks in use: 18. 147.8KiB allocated for chunks. 147.8KiB in use in bin. 144.0KiB client-requested in use in bin.
2022-03-03 15:00:09.537609: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): 	Total Chunks: 4, Chunks in use: 4. 64.0KiB allocated for chunks. 64.0KiB in use in bin. 64.0KiB client-requested in use in bin.
2022-03-03 15:00:09.537638: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): 	Total Chunks: 4, Chunks in use: 4. 170.2KiB allocated for chunks. 170.2KiB in use in bin. 147.0KiB client-requested in use in bin.
2022-03-03 15:00:09.537666: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): 	Total Chunks: 20, Chunks in use: 20. 1.33MiB allocated for chunks. 1.33MiB in use in bin. 1.25MiB client-requested in use in bin.
2022-03-03 15:00:09.537693: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): 	Total Chunks: 13, Chunks in use: 13. 1.84MiB allocated for chunks. 1.84MiB in use in bin. 1.77MiB client-requested in use in bin.
2022-03-03 15:00:09.537724: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): 	Total Chunks: 48, Chunks in use: 48. 12.51MiB allocated for chunks. 12.51MiB in use in bin. 11.89MiB client-requested in use in bin.
2022-03-03 15:00:09.537760: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): 	Total Chunks: 37, Chunks in use: 37. 20.46MiB allocated for chunks. 20.46MiB in use in bin. 19.98MiB client-requested in use in bin.
2022-03-03 15:00:09.537817: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): 	Total Chunks: 221, Chunks in use: 221. 223.49MiB allocated for chunks. 223.49MiB in use in bin. 219.69MiB client-requested in use in bin.
2022-03-03 15:00:09.537874: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): 	Total Chunks: 159, Chunks in use: 159. 362.77MiB allocated for chunks. 362.77MiB in use in bin. 356.30MiB client-requested in use in bin.
2022-03-03 15:00:09.537914: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): 	Total Chunks: 20, Chunks in use: 20. 85.85MiB allocated for chunks. 85.85MiB in use in bin. 77.00MiB client-requested in use in bin.
2022-03-03 15:00:09.537953: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): 	Total Chunks: 15, Chunks in use: 15. 136.06MiB allocated for chunks. 136.06MiB in use in bin. 130.81MiB client-requested in use in bin.
2022-03-03 15:00:09.538000: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): 	Total Chunks: 4, Chunks in use: 4. 72.00MiB allocated for chunks. 72.00MiB in use in bin. 72.00MiB client-requested in use in bin.
2022-03-03 15:00:09.538058: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): 	Total Chunks: 3, Chunks in use: 3. 105.15MiB allocated for chunks. 105.15MiB in use in bin. 70.97MiB client-requested in use in bin.
2022-03-03 15:00:09.538093: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-03 15:00:09.538121: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-03 15:00:09.538155: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-03 15:00:09.538186: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 4.00MiB was 4.00MiB, Chunk State: 
2022-03-03 15:00:09.538220: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 536870912
2022-03-03 15:00:09.538257: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e00000000 next 1194 of size 524288
2022-03-03 15:00:09.538296: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e00080000 next 1195 of size 256
2022-03-03 15:00:09.538337: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e00080100 next 1196 of size 256
2022-03-03 15:00:09.538378: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e00080200 next 1197 of size 1024
2022-03-03 15:00:09.538416: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e00080600 next 1198 of size 589824
2022-03-03 15:00:09.538454: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e00110600 next 1199 of size 419430

....

2022-03-03 15:00:09.543402: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e78a48d00 next 379 of size 1024
2022-03-03 15:00:09.543412: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e78a49100 next 380 of size 65536
2022-03-03 15:00:09.543422: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e78a59100 next 382 of size 4096
2022-03-03 15:00:09.543430: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e78a5a100 next 383 of size 4096
2022-03-03 15:00:09.545968: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e7ac0bc00 next 791 of size 1024
2022-03-03 15:00:09.545972: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e7ac0c000 next 792 of size 1024
2022-03-03 15:00:09.545976: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e7ac0c400 next 794 of size 4096
2022-03-03 15:00:09.545980: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e7ac0d400 next 795 of size 1024
2022-03-03 15:00:09.545984: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f0e7ac0d800 next 796 of 
2022-03-03 15:00:09.548304: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 214 Chunks of size 1048576 totalling 214.00MiB
2022-03-03 15:00:09.548312: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1309696 totalling 1.25MiB
2022-03-03 15:00:09.548319: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 4 Chunks of size 1310720 totalling 5.00MiB
2022-03-03 15:00:09.548326: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1507328 totalling 1.44MiB
2022-03-03 15:00:09.548333: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1887488 totalling 1.80MiB
2022-03-03 15:00:09.548340: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 11 Chunks of size 2097152 totalling 22.00MiB
2022-03-03 15:00:09.548347: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 138 Chunks of size 2359296 totalling 310.50MiB
2022-03-03 15:00:09.548355: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 4 Chunks of size 2764800 totalling 10.55MiB
2022-03-03 15:00:09.548362: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 4 Chunks of size 3145728 totalling 12.00MiB
2022-03-03 15:00:09.548370: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 3970816 totalling 3.79MiB
2022-03-03 15:00:09.548376: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 4128768 totalling 3.94MiB
2022-03-03 15:00:09.548384: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 15 Chunks of size 4194304 totalling 60.00MiB
2022-03-03 15:00:09.548391: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 4562688 totalling 4.35MiB
2022-03-03 15:00:09.548398: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 4718592 totalling 9.00MiB
2022-03-03 15:00:09.548406: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 6291456 totalling 6.00MiB
2022-03-03 15:00:09.548412: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 6815744 totalling 6.50MiB
2022-03-03 15:00:09.548419: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 3 Chunks of size 8388608 totalling 24.00MiB
2022-03-03 15:00:09.548427: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 9175040 totalling 8.75MiB
2022-03-03 15:00:09.548432: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 9233408 totalling 8.81MiB
2022-03-03 15:00:09.548440: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 9 Chunks of size 9437184 totalling 81.00MiB
2022-03-03 15:00:09.548447: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 14155776 totalling 13.50MiB
2022-03-03 15:00:09.548453: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 4 Chunks of size 18874368 totalling 72.00MiB
2022-03-03 15:00:09.548461: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 36673536 totalling 34.97MiB
2022-03-03 15:00:09.548467: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 36744704 totalling 35.04MiB
2022-03-03 15:00:09.548473: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 36841984 totalling 35.13MiB
2022-03-03 15:00:09.548480: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 1023.00MiB
2022-03-03 15:00:09.548486: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 1072693248 memory_limit_: 3273523200 available bytes: 2200829952 curr_region_allocation_bytes_: 2147483648
2022-03-03 15:00:09.548495: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                  3273523200
InUse:                  1072693248
MaxInUse:               2146434816
NumAllocs:                   61535
MaxAllocSize:           1032775680

2022-03-03 15:00:09.548546: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *********************************************x***x**************************************************
2022-03-03 15:00:09.548581: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:886 : Resource exhausted: OOM when allocating tensor with shape[2048,512,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "keras_retinanet/keras_retinanet/bin/infer.py", line 218, in <module>
    main()
  File "keras_retinanet/keras_retinanet/bin/infer.py", line 195, in main
    profile=args.profile
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/eval.py", line 250, in get_detections
    max_inflation_factor=max_inflation_factor
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/eval.py", line 134, in run_inference_on_image
    boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))[:3]
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1580, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[2048,512,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node res5a_branch2c/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[filtered_detections/map/while/LoopCond/_9551]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[2048,512,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node res5a_branch2c/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

I have a GeForce GTX 1650 with 4 GB of memory; is my graphics card enough?


Hyper5phere commented on May 27, 2024

It looks like your GPU is running out of memory. As a workaround, you can play with these parameters in infer.sh:

--image_min_side 1525 \
--image_max_side 2025 \

Setting them lower (e.g., halving both) might fix the OOM problem, but it will naturally have a performance impact (detector accuracy goes down). I recommend setting them as high as your GPU can handle while keeping the same "aspect ratio".
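To see why halving both limits helps so much: keras-retinanet-style preprocessing picks one scale factor so the short image side reaches image_min_side, unless that would push the long side past image_max_side. A sketch of that logic (function name and the 800/1333 defaults are my assumptions, not taken from this repo's config):

```python
def compute_resize_scale(height, width, min_side=800, max_side=1333):
    """Pick a scale so the short side becomes min_side,
    capped so the long side never exceeds max_side."""
    smallest = min(height, width)
    largest = max(height, width)
    scale = min_side / smallest
    if largest * scale > max_side:
        scale = max_side / largest
    return scale

# With the defaults from infer.sh, a 3000x4000 image is capped by max_side:
scale = compute_resize_scale(3000, 4000, min_side=1525, max_side=2025)
```

Because activation memory grows with the pixel count, halving both limits halves the scale and roughly quarters the memory the network needs per image.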

You can also try running infer.sh on the CPU, but it's going to be rather slow (though with only 5 test images, it's doable).

EDIT: Actually, a better idea would be to increase the following parameter

--image_tiling_dim 4 \

Something like 6 or 8 should do the trick. Currently it tells the AIR detector to split the input image into a 4x4 grid of overlapping tiles and process them individually, which circumvents GPU memory limitations without sacrificing image resolution.
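The tiling idea can be sketched like this (a simplified illustration, not AIR's actual implementation; the 100-pixel overlap is a made-up placeholder value):

```python
def tile_image(height, width, dim, overlap=100):
    """Split an image into a dim x dim grid of overlapping tile boxes.

    Returns (y0, y1, x0, x1) pixel boxes; each tile is extended by
    `overlap` pixels on interior edges so objects straddling a tile
    border still appear whole in at least one tile.
    """
    tiles = []
    th, tw = height // dim, width // dim
    for r in range(dim):
        for c in range(dim):
            y0 = max(0, r * th - overlap)
            y1 = min(height, (r + 1) * th + overlap)
            x0 = max(0, c * tw - overlap)
            x1 = min(width, (c + 1) * tw + overlap)
            tiles.append((y0, y1, x0, x1))
    return tiles

boxes = tile_image(3000, 4000, dim=4)  # 16 overlapping tiles
```

Each tile is roughly (1/dim)^2 of the full image, so raising --image_tiling_dim from 4 to 6 or 8 shrinks the per-inference memory footprint accordingly, at the cost of more forward passes per image.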


dinarakhay commented on May 27, 2024

Hello! I tried to run infer.sh in Docker on CPU and got the following:

root@docker-desktop:/home# /bin/bash infer.sh
Using TensorFlow backend.
2022-03-06 16:23:57.704343: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
qemu: uncaught target signal 6 (Aborted) - core dumped
infer.sh: line 32:    74 Aborted                 python3 keras_retinanet/keras_retinanet/bin/infer.py --backbone resnet152 --image_min_side 1525 --image_max_side 2025 --score_threshold 0.05 --max_detections 100000 --nms_threshold 0.25 --config $PWD/keras_retinanet/config.ini --anchor_scale 0.965 --convert_model true --image_tiling_dim 4 --nms_mode $BBA_MODE --model $PWD/models/$MODEL $IMAGE_FOLDER $PWD/data/predictions/${MODEL/.h5/}-$BBA_MODE-inference

It turned out this was because I was using an Apple M1 processor, and the TensorFlow binary crashes on Apple M1 inside an x86_64 Docker container.

Is there any chance I could compile TensorFlow from source without the AVX instruction set? Or maybe there is an already-compiled one?

Would deeply appreciate any help.


mrba59 commented on May 27, 2024

Thank you for your reply.
Reducing the min and max side worked for me.


Hyper5phere commented on May 27, 2024

@dinarakhay Unfortunately I haven't yet tried to install the AIR detector on the ARM64 architecture, so I can't give you any exact advice. That said, I think it should be possible if you compile everything from source or build the Docker container on the host machine. It might be possible to recover the Dockerfile from the provided keras-retinanet-gpu image using alpine/dfimage and feed that to a docker build command on the ARM64 machine. Another option could be the docker buildx command. In any case, you'll need to test these methods out; I'd be eager to hear the results.

You may also need to be picky about the TensorFlow installation inside the container; I'm not sure Docker can control which instruction sets the binary was compiled to use. You can try replacing the TF installation with tensorflow-aarch64; I'd try tensorflow-1.15.5-cp36-cp36m-manylinux_2_24_aarch64.whl from its releases page.

@mrba59 No problem, good to hear it worked out.


Hyper5phere commented on May 27, 2024

Closing this due to inactivity.

