Comments (3)
@HonestyBrave hi there! 👋
Thank you for reaching out and providing a detailed description of the issue you're encountering with non_max_suppression (NMS) slowing down over time. Let's work through this together.
Initial Checks
- Reproducible Example: To better assist you, could you please provide a minimum reproducible code example? This will help us replicate the issue on our end and investigate further. You can refer to our guide on creating a Minimum Reproducible Example.
- Library Versions: Ensure that you are using the latest versions of torch and ultralytics. You can upgrade your packages using the following command:
    pip install --upgrade torch ultralytics
Potential Causes and Solutions
The slowdown you're experiencing could be due to several factors, including memory leaks or GPU memory fragmentation. Here are a few steps you can take to diagnose and potentially resolve the issue:
- Memory Management: Ensure that you are properly managing GPU memory. You can try clearing the cache periodically using:
    import torch
    torch.cuda.empty_cache()
- Batch Processing: If you're processing a large number of images in batches, ensure that each batch is handled independently to avoid memory buildup. For example:
    for batch in batches:
        outputs = model(batch)
        outputs = outputs[0].cpu()
        preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45, agnostic=False, max_det=168)
        torch.cuda.empty_cache()
- Profiling: Use profiling tools to identify bottlenecks in your code. You can use torch.profiler to get detailed insights into where the time is being spent.
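As a lighter first step before a full profiler run, per-stage wall-clock timing can show whether the NMS call itself is the part that gets slower over time. The sketch below is only an illustration: `StageTimer`, `stage`, and `drift` are hypothetical names, not part of torch or ultralytics, and the loop body is a stand-in for the real NMS call. When timing GPU work, the measurement is only meaningful if pending kernels have finished (for example after a `.cpu()` copy or `torch.cuda.synchronize()`).

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean


class StageTimer:
    """Collects wall-clock durations per named stage across iterations."""

    def __init__(self):
        self.durations = defaultdict(list)

    @contextmanager
    def stage(self, name):
        # Record how long the wrapped block takes, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[name].append(time.perf_counter() - start)

    def drift(self, name, window=10):
        """Mean of the last `window` samples divided by the mean of the first `window`.

        Returns None when there are not enough samples to compare.
        """
        samples = self.durations.get(name, [])
        if len(samples) < 2 * window:
            return None
        baseline = mean(samples[:window])
        if baseline == 0:
            return None
        return mean(samples[-window:]) / baseline


timer = StageTimer()
for _ in range(20):
    with timer.stage("nms"):
        sum(range(1000))  # stand-in for ops.non_max_suppression(...)
print("nms drift ratio:", timer.drift("nms"))
```

A drift ratio well above 1 for the "nms" stage would suggest the slowdown is in NMS itself, while a ratio near 1 points toward another stage or toward the machine, which is then worth inspecting with torch.profiler.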
Example Code
Here's a modified version of your code snippet with some of the suggestions applied:
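import time
import torch
from ultralytics import YOLO
from ultralytics.utils import ops
# Load model
model = YOLO('path/to/weights.pt').to('cuda:0')
# Process batches (`batches` is your own iterable of input batches)
for batch in batches:
    # Assumes, as in the original snippet, that this yields raw pre-NMS predictions
    outputs = model(batch)
    outputs = outputs[0].cpu()  # the copy to CPU also waits for pending GPU work
    # Start the timer here so that only the NMS call is measured
    start_time = time.time()
    preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45, agnostic=False, max_det=168)
    print(f'NMS time: {time.time() - start_time:.4f} s')
    torch.cuda.empty_cache()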
Next Steps
Please try the above suggestions and let us know if the issue persists. If it does, providing the minimum reproducible example will be crucial for us to dive deeper into the problem.
Thank you for your patience and cooperation. We're here to help! 😊
from ultralytics.
Thank you for your timely reply, much appreciated.
I found that the problem is with my server: I ran the same code on another server and it does not have the problem. However, adding "torch.cuda.empty_cache()" adds about 200 ms on the 2080 Ti server. Thank you again for your reply!
Hi @HonestyBrave,
Thank you for the update! 😊 I'm glad to hear that running the code on a different server resolved the issue. It sounds like the initial server might have had some underlying hardware or configuration problems affecting performance.
Regarding the torch.cuda.empty_cache() function, it's true that while it releases unused cached GPU memory, each call has a cost, about 200 ms in your case. If that delay is significant for your workload, you might want to call it less often (for example, once every several batches instead of after every batch) or explore other memory management strategies.
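One way to call the cache clear less often is to run it only once every N batches. The sketch below is a minimal illustration under stated assumptions: `Every` is a hypothetical helper, not part of torch or ultralytics, and the interval of 50 is an arbitrary value to tune. A list append stands in for the real action so the example runs without a GPU; in practice the action would be `torch.cuda.empty_cache`.

```python
from typing import Callable


class Every:
    """Invoke `action` once every `n` calls to `step()`."""

    def __init__(self, n: int, action: Callable[[], None]):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.action = action
        self.count = 0

    def step(self) -> bool:
        """Count one iteration; run the action on every n-th call."""
        self.count += 1
        if self.count % self.n == 0:
            self.action()
            return True
        return False


calls = []
# In a real loop this would be: cleanup = Every(50, torch.cuda.empty_cache)
cleanup = Every(50, lambda: calls.append("cleared"))
for _ in range(120):  # stand-in for iterating over batches
    cleanup.step()
print(len(calls))  # 2 (fired at steps 50 and 100)
```

With an overhead of roughly 200 ms per call, running the clear once per 50 batches instead of after every batch would spread that cost to about 4 ms per batch on average.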
If you have any more questions or run into other issues, feel free to reach out. We're here to help! 🚀