
Comments (3)

glenn-jocher commented on July 23, 2024

@HonestyBrave hi there! 👋

Thank you for reaching out and providing a detailed description of the issue you're encountering with non_max_suppression (NMS) slowing down over time. Let's work through this together.

Initial Checks

  1. Reproducible Example: To better assist you, could you please provide a minimum reproducible code example? This will help us replicate the issue on our end and investigate further. You can refer to our guide on creating a minimum reproducible example here: Minimum Reproducible Example.

  2. Library Versions: Ensure that you are using the latest versions of torch and ultralytics. You can upgrade your packages using the following commands:

    pip install --upgrade torch ultralytics

Potential Causes and Solutions

The slowdown you're experiencing could be due to several factors, including memory leaks or GPU memory fragmentation. Here are a few steps you can take to diagnose and potentially resolve the issue:

  1. Memory Management: Ensure that you are properly managing GPU memory. You can try clearing the cache periodically using:

    import torch

    torch.cuda.empty_cache()  # release unoccupied cached memory held by PyTorch's caching allocator
  2. Batch Processing: If you're processing a large number of images in batches, ensure that each batch is handled independently to avoid memory buildup. For example:

    from ultralytics.utils import ops  # non_max_suppression is provided here

    for batch in batches:  # batches: an iterable of preprocessed image tensors
        outputs = model(batch)
        outputs = outputs[0].cpu()  # move raw predictions off the GPU
        preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45, agnostic=False, max_det=168)
        torch.cuda.empty_cache()  # release cached GPU memory after each batch
  3. Profiling: Use profiling tools to identify bottlenecks in your code. You can use torch.profiler to get detailed insight into where the time is being spent; see the sketch below this list.
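
As a rough sketch (reusing model, batch, and ops from the snippets above), you could wrap a single iteration in torch.profiler and print the operators sorted by CUDA time:

    import torch
    from torch.profiler import ProfilerActivity, profile

    # Profile one inference + NMS iteration to see where the time goes
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        outputs = model(batch)
        outputs = outputs[0].cpu()
        preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45)

    # Operators with the largest total CUDA time are listed first
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))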

Example Code

Here's a modified version of your code snippet with some of the suggestions applied:

    import time

    import torch
    from ultralytics import YOLO
    from ultralytics.utils import ops

    # Load the model onto the GPU
    model = YOLO('path/to/weights.pt').to('cuda:0')

    # Process batches (batches: an iterable of preprocessed image tensors)
    for batch in batches:
        outputs = model(batch)
        outputs = outputs[0].cpu()

        # Time only the NMS step so its slowdown can be isolated from inference
        start_time = time.time()
        preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45, agnostic=False, max_det=168)
        print(f'NMS time: {time.time() - start_time:.4f}s')

        torch.cuda.empty_cache()  # release cached GPU memory after each batch
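
If you suspect a memory leak, you can also log GPU memory use inside the loop; steadily rising numbers across batches would point to memory buildup rather than NMS itself getting slower (a small sketch, assuming the same 'cuda:0' device as above):

    # Optional per-batch check: print how much GPU memory PyTorch is holding
    allocated_mb = torch.cuda.memory_allocated('cuda:0') / 1024**2
    reserved_mb = torch.cuda.memory_reserved('cuda:0') / 1024**2
    print(f'allocated: {allocated_mb:.1f} MiB, reserved: {reserved_mb:.1f} MiB')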

Next Steps

Please try the above suggestions and let us know if the issue persists. If it does, providing the minimum reproducible example will be crucial for us to dive deeper into the problem.

Thank you for your patience and cooperation. We're here to help! 😊


HonestyBrave commented on July 23, 2024

Thank you for your quick reply, much appreciated.

I found it was a problem with my server. I ran the same code on another server and it didn't have the problem, but adding torch.cuda.empty_cache() adds about 200 ms on the 2080 Ti server. Thank you again for your reply!


glenn-jocher commented on July 23, 2024

Hi @HonestyBrave,

Thank you for the update! 😊 I'm glad to hear that running the code on a different server resolved the issue. It sounds like the initial server might have had some underlying hardware or configuration problems affecting performance.

Regarding the torch.cuda.empty_cache() function, it's true that while it helps manage GPU memory, it can introduce a slight overhead. If you find that it adds significant delay, you might want to use it more sparingly or explore other memory management strategies.
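
For example, one sparing pattern (just a sketch; model, batches, and ops are assumed from the earlier snippets, and the interval is arbitrary) is to clear the cache only every N batches instead of after every batch:

    import torch

    EMPTY_CACHE_EVERY = 50  # arbitrary interval; tune for your workload

    for i, batch in enumerate(batches):
        outputs = model(batch)
        outputs = outputs[0].cpu()
        preds = ops.non_max_suppression(outputs, conf_thres=0.2, iou_thres=0.45)

        # Clearing the CUDA cache only occasionally avoids paying its overhead on every batch
        if (i + 1) % EMPTY_CACHE_EVERY == 0:
            torch.cuda.empty_cache()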

If you have any more questions or run into other issues, feel free to reach out. We're here to help! 🚀

