Comments (13)
Hello @Abdullahsuheyl, thank you for your interest in Ultralytics YOLOv8! We recommend a visit to the Docs for new users, where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.
If this is a Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Join the vibrant Ultralytics Discord community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
Install
Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.
pip install ultralytics
Environments
YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Notebooks with free GPU:
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
Hello! This error usually occurs when there's a mismatch between the CUDA and cuDNN versions and the PyTorch build you're using. Here are a couple of suggestions to resolve this:
- Ensure Compatibility: Make sure the versions of CUDA, cuDNN, and PyTorch are compatible. PyTorch 2.3.0+cu118 is built for CUDA 11.8, which seems to align with your setup, but double-check that cuDNN is also compatible with CUDA 11.8.
- Reinstall Dependencies: Sometimes, a clean reinstallation of the CUDA, cuDNN, and PyTorch libraries resolves these issues.
- Test Installation: Test your CUDA installation with a basic 'hello world' CUDA program to make sure your GPU setup is correctly configured.
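On the PyTorch side, you can read the versions from the first suggestion directly in Python. The sketch below is an illustration, not an Ultralytics utility. The helper names are hypothetical, and it assumes pip wheels carry the usual +cuXYZ local version tag (e.g. 2.3.0+cu118).

```python
import re
from typing import Optional


def cuda_from_torch_version(torch_version: str) -> Optional[str]:
    """Return the CUDA version encoded in a wheel tag, e.g. '2.3.0+cu118' -> '11.8'."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None  # CPU-only, ROCm, or a build without a +cuXYZ tag
    digits = match.group(1)
    # The last digit is the minor version, the rest is the major version.
    return f"{digits[:-1]}.{digits[-1]}"


def torch_cuda_report() -> Optional[dict]:
    """Collect the versions that must line up; returns None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": str(torch.__version__),
        "cuda_build": torch.version.cuda,  # CUDA version the wheel was built with
        "cudnn": torch.backends.cudnn.version(),  # integer such as 8902, or None
        "cuda_available": torch.cuda.is_available(),
    }
```

Printing torch_cuda_report() on the affected machine gives you a compact summary to paste into the issue alongside the full collect_env output.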
If you continue facing issues, posting details of your environment can help further diagnose the problem.
python -m torch.utils.collect_env
This command will display detailed environment information that could be useful to troubleshoot further. If the problem persists, consider reporting this with the output from the above command in the Ultralytics GitHub issues section, so more targeted help can be provided.
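The collect_env output is long, and only a few lines matter for this warning. Below is a minimal sketch that extracts them and flags when the system CUDA toolkit's major.minor differs from the CUDA version torch was built with. The field names are taken from the output format shown in this thread, and the function names are hypothetical. This is a heuristic only. Pip wheels ship their own CUDA runtime libraries, so a differing system toolkit is a hint worth checking, not proof of a problem.

```python
from typing import Dict

# collect_env fields that matter when diagnosing CUDA/cuDNN mismatches.
FIELDS = (
    "PyTorch version",
    "CUDA used to build PyTorch",
    "CUDA runtime version",
    "cuDNN version",
    "Nvidia driver version",
)


def parse_collect_env(text: str) -> Dict[str, str]:
    """Extract 'Key: value' lines for FIELDS from collect_env output."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() in FIELDS:
            info[key.strip()] = value.strip()
    return info


def major_minor(version: str) -> str:
    """'12.4.99' -> '12.4'."""
    return ".".join(version.split(".")[:2])


def cuda_versions_differ(info: Dict[str, str]) -> bool:
    """Heuristic: does the system CUDA toolkit differ (major.minor) from torch's build?"""
    build = info.get("CUDA used to build PyTorch", "")
    runtime = info.get("CUDA runtime version", "")
    if not build or not runtime or runtime == "Could not collect":
        return False  # nothing to compare
    return major_minor(build) != major_minor(runtime)
```

You could feed it the captured stdout of python -m torch.utils.collect_env (for example via subprocess.run(..., capture_output=True, text=True)). Applied to the first environment dump below, it reports a build of 12.1 against a 12.4.99 system toolkit.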
I was getting this error. I tried downgrading CUDA from 12.4 (environment below) to 12.1 as you suggested, and now it's fine.
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Linux Mint 21.3 (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35
Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-101-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Quadro P1000
Nvidia driver version: 535.161.07
cuDNN version: Probably one of the following:
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudnn.so.8.9.7
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.7
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.7
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.7
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.7
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.7
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 13
CPU max MHz: 4600.0000
CPU min MHz: 800.0000
BogoMIPS: 5199.98
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualisation: VT-x
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 12 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Mitigation; TSX disabled
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.0
[pip3] torchaudio==2.3.0
[pip3] torchvision==0.18.0
[pip3] triton==2.3.0
[conda] Could not collect
Glad to hear that downgrading CUDA resolved your issue! Finding the right combination of CUDA and PyTorch versions often resolves these tricky compatibility problems.
If anything else comes up or you have further questions, feel free to reach out. Happy coding!
@glenn-jocher If we are receiving this warning, will it affect training? I am also getting this warning. Here is my system's output:
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: CentOS Stream 9 (x86_64)
GCC version: (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.34
Python version: 3.9.18 (main, Jan 24 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] (64-bit runtime)
Python platform: Linux-5.14.0-437.el9.x86_64-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
Nvidia driver version: 550.54.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8352M CPU @ 2.30GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Stepping: 6
CPU(s) scaling MHz: 97%
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 4600.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 3 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 80 MiB (64 instances)
L3 cache: 96 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.0
[pip3] torchvision==0.18.0
[pip3] triton==2.3.0
[conda] Could not collect
Hey there! The warning you're seeing typically indicates a potential mismatch or an unsupported configuration between your CUDA/cuDNN setup and PyTorch. However, it shouldn't directly affect training unless it leads to a crash or other runtime errors.
If your training is proceeding without interruptions and the model's performance metrics are as expected, you can generally continue training. Just keep an eye on any unusual behaviors or performance drops.
If you want to dive deeper or eliminate the warning, you might consider aligning your CUDA and cuDNN versions more closely with the PyTorch build, or even trying a different build of PyTorch that's tailored to your specific CUDA version.
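Suppose your training is healthy and you only want to keep the logs readable. The standard-library warnings module can hide this one message while leaving every other warning visible. This is a minimal sketch. The function name is hypothetical, and it assumes the message text matches the "Plan failed with a cudnnException" UserWarning quoted in this thread. It hides the symptom and does not fix the version mismatch.

```python
import warnings

# Matched against the start of the warning message (case-insensitive).
CUDNN_PLAN_WARNING = r"Plan failed with a cudnnException"


def silence_cudnn_plan_warning() -> None:
    """Hide only the cuDNN 'Plan failed' UserWarning; other warnings still show."""
    warnings.filterwarnings("ignore", message=CUDNN_PLAN_WARNING, category=UserWarning)
```

Call it once before training starts, for example before model.train(...).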
Happy training, and let us know if anything else pops up!
Hello! The warning you're encountering is related to a compatibility issue between CUDA/cuDNN and PyTorch. It's often not critical unless it causes crashes or affects performance. If your training runs smoothly and the metrics look good, you might continue as is. However, making sure that your CUDA, cuDNN, and PyTorch versions are fully compatible can help avoid such warnings. You can check the compatibility here.
If the issue persists or affects your workflow, consider testing with a different version of PyTorch that matches your CUDA setup more closely. Here's how you can install a specific version of PyTorch:
pip install torch==x.x.x+cuXXX -f https://download.pytorch.org/whl/torch_stable.html
Replace x.x.x and cuXXX with the desired PyTorch version and CUDA version. Keep us posted on how it goes!
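The same pin can also be written with --index-url, the form the PyTorch install selector uses (e.g. https://download.pytorch.org/whl/cu121). Below is a small hypothetical helper that builds that command from a torch version and a CUDA version. It assumes the cuXYZ naming shown in this thread. It does not check whether a wheel actually exists for that combination.

```python
def torch_install_command(torch_version: str, cuda_version: str) -> str:
    """Build a pip command pinning torch to one CUDA build, e.g. ('2.3.0', '12.1')."""
    cuda_tag = "cu" + cuda_version.replace(".", "")  # '12.1' -> 'cu121'
    index_url = f"https://download.pytorch.org/whl/{cuda_tag}"
    return f"pip install torch=={torch_version} --index-url {index_url}"
```

If you also use torchvision, pin it to the release that matches torch. The environment dumps above pair torch 2.3.0 with torchvision 0.18.0.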
I am also getting this warning, but it seems that my first training epoch stops after it:
Epoch 0: 0%| | 0/13 [00:00<?, ?it/s]
/home/racquel-knust/miniconda3/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv1d(input, weight, bias, self.stride,
/home/racquel-knust/miniconda3/lib/python3.11/site-packages/torch/autograd/graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
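You can check whether an environment emits the same warning outside a full training run. The traceback above points at F.conv1d in the forward pass and at the autograd backward pass, so this minimal sketch runs one Conv1d forward and backward on the GPU and collects any warnings. It assumes a CUDA-enabled PyTorch build. The function name and layer sizes are arbitrary illustrations. Whether the warning appears depends on which cuDNN plans get tried, so a clean result does not rule the problem out.

```python
import warnings
from typing import List, Optional


def reproduce_conv1d_warning() -> Optional[List[str]]:
    """Run one Conv1d forward/backward on the GPU and return any warning messages.

    Returns None when torch is missing or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    conv = torch.nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3).cuda()
    x = torch.randn(4, 8, 64, device="cuda", requires_grad=True)  # (batch, channels, length)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        conv(x).sum().backward()  # forward uses F.conv1d; backward runs the autograd engine
        torch.cuda.synchronize()
    return [str(w.message) for w in caught]
```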
@racquelknustdomingues hello!
The warning you're encountering is related to a compatibility issue between CUDA/cuDNN and PyTorch. This can sometimes cause training to halt unexpectedly. Here are a few steps you can take to address this:
- Check Compatibility: Ensure that your CUDA, cuDNN, and PyTorch versions are compatible. You can find the compatibility matrix on the PyTorch website.
- Update or Downgrade: If there's a mismatch, consider updating or downgrading your CUDA/cuDNN or PyTorch versions. For example, if you're using CUDA 12.1, make sure your PyTorch build supports it.
- Reinstall Dependencies: Sometimes, a clean reinstallation of CUDA, cuDNN, and PyTorch can resolve these issues.
- Use a Different Backend: You can try disabling the cuDNN v8 API with an environment variable. TORCH_CUDNN_V8_API_ENABLED only opted in to the v8 API on older PyTorch releases (1.x). On PyTorch 2.x the v8 API is on by default, and the switch to turn it off is TORCH_CUDNN_V8_API_DISABLED=1. Set it before torch is imported. Here's a quick example of how to set this environment variable in your script:
import os
# Set before importing torch so the cuDNN backend picks it up
os.environ['TORCH_CUDNN_V8_API_DISABLED'] = '1'
import torch
# Your training code here
- Fallback to CPU: If the issue persists and you need to continue training, you can temporarily switch to CPU by setting the device to 'cpu' in your training script:
device = torch.device('cpu')
model.to(device)  # model: your existing torch.nn.Module
If you continue to face issues, please share more details about your environment and setup. We're here to help! π
Happy training!
@whittenator Hello, excuse me, I got results similar to yours by running python -m torch.utils.collect_env. Have you solved the "cuDNN version: Could not collect" problem, and if so, how did you resolve it?
Looking forward to your reply!
I solved this by manually installing cuDNN 8.9.7 on Linux, but the cuDNN version in the torch environment does not match it:
print("cuDNN version:", torch.backends.cudnn.version())
cuDNN version: 8902
This indicates that the cuDNN version used by torch is 8.9.2. I think this is because we used the one-line install command from the official PyTorch website, pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121, which pulls in its own cuDNN wheel (the install log shows Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)), i.e. cuDNN 8.9.2.
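The integer returned by torch.backends.cudnn.version() packs the version into one number, which is why 8902 means 8.9.2. Below is a small decoder (the function name is hypothetical). The cuDNN 8.x layout, major*1000 + minor*100 + patch, matches the values in this thread. The cuDNN 9.x layout, major*10000 + minor*100 + patch, is an assumption based on cuDNN's own version macro, so verify it against your installed headers.

```python
def decode_cudnn_version(code: int) -> str:
    """Turn torch.backends.cudnn.version() output into 'major.minor.patch'."""
    if code >= 10000:
        # Assumed cuDNN 9.x layout: major*10000 + minor*100 + patch (90100 -> 9.1.0)
        major, rest = divmod(code, 10000)
    else:
        # cuDNN 7.x/8.x layout: major*1000 + minor*100 + patch (8902 -> 8.9.2)
        major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"
```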
To update torch's cuDNN to 8.9.7, I ran pip install nvidia-cudnn-cu12==8.9.7.29. Then:
print("cuDNN version:", torch.backends.cudnn.version())
cuDNN version: 8907
The cuDNN version used by torch is updated, but the UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.) still appears.
In addition, manually installing nvidia-cudnn-cu12==8.9.7.29 triggered a dependency-conflict warning from pip:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.3.1+cu121 requires nvidia-cudnn-cu12==8.9.2.26; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cudnn-cu12 8.9.7.29 which is incompatible.
Fortunately, it doesn't seem to have had any impact on my training so far.
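The pip conflict above shows torch's wheel metadata pinning an exact nvidia-cudnn-cu12 version. Below is a minimal sketch that reads that pin and the installed version through the standard library's importlib.metadata, so you can spot the override without reinstalling. The function names are hypothetical. It assumes the requirement string looks like the one in the pip message, and it does not normalize package-name spelling (hyphens vs. underscores).

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def pinned_version(requirement: str, package: str) -> Optional[str]:
    """Return the '==' pin for `package` in a requirement string, ignoring env markers."""
    spec = requirement.split(";", 1)[0].strip()  # drop '; platform_system == ...'
    match = re.fullmatch(rf"{re.escape(package)}\s*==\s*([\w.]+)", spec)
    return match.group(1) if match else None


def check_cudnn_pin(package: str = "nvidia-cudnn-cu12") -> Optional[Tuple[str, str]]:
    """Return (version torch pins, version installed), or None if either is unknown."""
    try:
        requires = metadata.requires("torch") or []
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    for req in requires:
        pin = pinned_version(req, package)
        if pin is not None:
            return pin, installed
    return None
```

On the setup described here, you would expect ('8.9.2.26', '8.9.7.29'), mirroring the pip message.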
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI!
Related Issues (20)
- How to use yolov8 algorithm combined with botsort algorithm to obtain video tracking results and obtain detailed tracking result files for subsequent tracking algorithm evaluation?
- YOLOv8-OBB - All the losses are 0 or nan during training and ) precision, recall metrics.
- Training process killed sometimes with ultralytics v8.2.91
- Region counter
- Explanation of Parameter pretrained=True
- Retraining from best.pt or last.pt?
- Fewer output labels than images
- why conduct the method `masked_fill_` in function `select_topk_candidates`, in other words Why does it appear invalid boxes ?
- How to Use Gradient Accumulation with YOLOv8n for Larger Batch Sizes
- The parameters BOX(P, R, mAP50, mAP50-95) are always 0 when the batch value is high.
- AttributeError: 'dict' object has no attribute 'box' when trying to train YOLOv8 on custom trainer using v8DetectionLoss.
- Can we add null images (images without any label) in training and validation set for training any model?
- Set model parameters occurs no change
- @dhrhkddns Provide the code you used to export, load and predict.
- Custom YOLOv8 Architecture - Fine Tuning vs. Training from scratch
- Code Issue
- Tuning with Raytune doesn't save checkpoints nor plots
- yolov8 has removed the Object Score branch, but how does it distinguish foreground from background?
- Instance segmentation from Human Pose Estimation
- How are background images processed (where there are no marked-up objects)?