non_max_suppression slow about ultralytics HOT 3 CLOSED

HonestyBrave commented on July 23, 2024

non_max_suppression slow

from ultralytics.

Comments (3)

glenn-jocher commented on July 23, 2024

@HonestyBrave hi there! 👋

Thank you for reaching out and providing a detailed description of the issue you're encountering with non_max_suppression (NMS) slowing down over time. Let's work through this together.

Initial Checks

Reproducible Example: To better assist you, could you please provide a minimum reproducible code example? This will help us replicate the issue on our end and investigate further. Our Minimum Reproducible Example guide explains how to create one.
Library Versions: Ensure that you are using the latest versions of torch and ultralytics. You can upgrade your packages using the following commands:
```
pip install --upgrade torch ultralytics
```

Potential Causes and Solutions

The slowdown you're experiencing could be due to several factors, including memory leaks or GPU memory fragmentation. Here are a few steps you can take to diagnose and potentially resolve the issue:

Memory Management: Ensure that you are properly managing GPU memory. You can try clearing the cache periodically using:
```
import torch
torch.cuda.empty_cache()
```

Batch Processing: If you're processing a large number of images in batches, ensure that each batch is handled independently to avoid memory buildup. For example:

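```
import torch
from ultralytics.utils import ops

# Assumes `model` and `batches` exist and `model(batch)` returns raw predictions.
for batch in batches:
    outputs = model(batch)
    outputs = outputs[0].cpu()  # move predictions off the GPU before NMS
    preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45, agnostic=False, max_det=168)
    torch.cuda.empty_cache()  # release cached GPU memory after each batch
```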

Profiling: Use profiling tools to identify bottlenecks in your code. You can use torch.profiler to get detailed insights into where the time is being spent.
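
As a minimal sketch of the profiling step, the snippet below wraps an arbitrary callable in `torch.profiler.profile` and prints the operators that took the most CPU time. The helper name `profile_call` and the stand-in workload `fake_postprocess` are hypothetical and chosen only for illustration; in practice you would pass your own inference and NMS call instead.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def profile_call(fn, *args, row_limit=10):
    """Run fn(*args) once under torch.profiler and return a summary table (hypothetical helper)."""
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        # Also record CUDA kernels when a GPU is present.
        activities.append(ProfilerActivity.CUDA)
    with profile(activities=activities, record_shapes=True) as prof:
        fn(*args)
    # Sort by total CPU time so the most expensive operators appear first.
    return prof.key_averages().table(sort_by="cpu_time_total", row_limit=row_limit)


def fake_postprocess(x):
    """Stand-in workload (an assumption): replace with your model and NMS call."""
    scores, _ = x.sort(dim=1, descending=True)
    return scores @ scores.T


report = profile_call(fake_postprocess, torch.rand(256, 256))
print(report)
```

Profiling a single batch early in the run and another one after the slowdown appears makes it easier to see which operator's share of the time has grown.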

Example Code

Here's a modified version of your code snippet with some of the suggestions applied:

```
import time

import torch
from ultralytics import YOLO
from ultralytics.utils import ops

# Load model
model = YOLO('path/to/weights.pt').to('cuda:0')

# Process batches (`batches` is assumed to be defined elsewhere)
for batch in batches:
    outputs = model(batch)
    outputs = outputs[0].cpu()
    start_time = time.time()  # start the timer after inference so only NMS is measured
    preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45, agnostic=False, max_det=168)
    print(f'NMS time: {time.time() - start_time:.4f} s')
    torch.cuda.empty_cache()
```
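
To check whether the per-batch NMS times really grow over a run, one option is to collect them in a list and compare the start of the run with the end. The sketch below is only an illustration: the helper `slowdown_ratio`, the window size, and the synthetic timings are all assumptions, not part of ultralytics.

```python
from statistics import mean


def slowdown_ratio(timings, window=20):
    """Return the mean of the last `window` timings divided by the mean of the first `window`."""
    if len(timings) < 2 * window:
        raise ValueError("need at least 2 * window timings")
    return mean(timings[-window:]) / mean(timings[:window])


# Synthetic timings in seconds (made-up data): one constant run, one that grows.
steady = [0.010] * 100
drifting = [0.010 + 0.0005 * i for i in range(100)]

print(f"steady ratio:   {slowdown_ratio(steady):.2f}")
print(f"drifting ratio: {slowdown_ratio(drifting):.2f}")
```

A ratio close to 1 suggests stable timing, while a ratio well above 1 indicates that later batches take longer than early ones.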

Next Steps

Please try the above suggestions and let us know if the issue persists. If it does, providing the minimum reproducible example will be crucial for us to dive deeper into the problem.

Thank you for your patience and cooperation. We're here to help! 😊

HonestyBrave commented on July 23, 2024

Thank you for your timely reply, it is much appreciated.

I found that the problem is with my server: I ran the same code on another server and the problem does not occur there. However, adding "torch.cuda.empty_cache()" adds about 200 ms on the 2080 Ti server. Thank you again for your reply!

glenn-jocher commented on July 23, 2024

Hi @HonestyBrave,

Thank you for the update! 😊 I'm glad to hear that running the code on a different server resolved the issue. It sounds like the initial server might have had some underlying hardware or configuration problems affecting performance.

Regarding the torch.cuda.empty_cache() function, it's true that while it helps manage GPU memory, it can introduce overhead, such as the roughly 200 ms you measured on your 2080 Ti server. If that delay is significant for your workload, you might want to call it more sparingly or explore other memory management strategies.
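
One way to call it more sparingly is to run the cleanup only every N batches instead of after every batch. The following is a minimal sketch under that assumption; `process_batches`, `step`, and `cleanup_every` are hypothetical names, and the demo uses stand-in functions so it runs without a GPU.

```python
def process_batches(batches, step, cleanup, cleanup_every=50):
    """Run step(batch) for every batch and call cleanup() once per `cleanup_every` batches."""
    results = []
    for i, batch in enumerate(batches, start=1):
        results.append(step(batch))
        if i % cleanup_every == 0:
            cleanup()
    return results


# Demo with stand-ins: 10 batches, cleanup after every 4th batch.
cleanup_calls = []
outputs = process_batches(
    batches=range(10),
    step=lambda b: b * 2,
    cleanup=lambda: cleanup_calls.append(1),
    cleanup_every=4,
)
print(len(outputs), len(cleanup_calls))
```

In a real loop you would pass your inference and NMS function as `step` and `torch.cuda.empty_cache` as `cleanup`, then tune `cleanup_every` against the overhead you measure.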

If you have any more questions or run into other issues, feel free to reach out. We're here to help! 🚀
