Batch inference speed same as looping through a bunch of images (ultralytics issue, open)

abelBEDOYA commented on July 23, 2024

Comments (3)

glenn-jocher commented on July 23, 2024

@abelBEDOYA hello,

Thank you for providing detailed information and screenshots regarding your batch inference issue. To help us investigate further, could you please share a minimal reproducible code example? This will allow us to replicate the issue on our end. You can refer to our guide on creating a minimal reproducible example here: Minimum Reproducible Example.

Additionally, please ensure that you are using the latest versions of torch and ultralytics. If not, kindly update your packages and try running your tests again to see if the issue persists.

Regarding your observations: batch inference can offer speed improvements, but the actual gain depends on factors such as GPU memory bandwidth, model size, and how much of the total time goes to per-image pre- and post-processing (resizing, NMS) rather than to the forward pass itself. Any of these could explain the linear increase in time you are observing.

Here's a quick example of how you can perform batch inference using the ultralytics library:

from ultralytics import YOLO
import torch
import time

# Load the YOLOv8 model
model = YOLO("yolov8n.pt")

# Prepare a batch of images
images = torch.rand(8, 3, 640, 640)  # Example batch of 8 images as one BCHW tensor with values in [0, 1]

# Measure inference time for batch processing
start_time = time.time()
results = model.predict(images)
end_time = time.time()

print(f"Batch inference time: {end_time - start_time} seconds")

Feel free to adjust the batch size and image dimensions as per your requirements. If you continue to experience issues, please share the code you are using for both the loop and batch inference tests.
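When comparing looped and batched inference, the measurement itself can hide differences: the first calls include one-off initialisation costs, and CUDA kernels are launched asynchronously, so a wall-clock timer around raw PyTorch calls can stop before the GPU has finished. Below is a minimal, framework-agnostic timing sketch. The helper `time_call` and its `warmup`, `repeats`, `sync` and `clock` parameters are hypothetical names for illustration, not part of ultralytics; on a GPU you would pass `torch.cuda.synchronize` as `sync`.

```python
import statistics
import time
from typing import Callable, Optional


def time_call(
    fn: Callable[[], object],
    *,
    warmup: int = 2,
    repeats: int = 5,
    sync: Optional[Callable[[], None]] = None,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the median wall-clock time of fn() in seconds (hypothetical helper)."""
    # Untimed warm-up calls absorb one-off costs such as lazy initialisation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts

    samples = []
    for _ in range(repeats):
        start = clock()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous (e.g. GPU) work before stopping the clock
        samples.append(clock() - start)
    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)
```

For example, on a GPU, `time_call(lambda: [model.predict(img, verbose=False) for img in images], sync=torch.cuda.synchronize)` times the loop, and `time_call(lambda: model.predict(batch, verbose=False), sync=torch.cuda.synchronize)` times one batched call, where `batch` is a hypothetical n × 3 × 640 × 640 tensor with values in [0, 1].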

Looking forward to your response!

abelBEDOYA commented on July 23, 2024

Hi, I've tested your suggestion, and there is no difference in elapsed time between batch inference and simply looping through a list of images. Here are the measurements:

The plot was produced by this script (minimal reproducible example):

from ultralytics import YOLO
import torch
import time
import numpy as np
import matplotlib.pyplot as plt

# Load the YOLOv8 model
model = YOLO("yolov8m.pt")
nn = [2,3,4,5,6,7,8,9,11,13,16,19,24,29,34,40,47,56]
n_samples = 5
tt_batch = []
r = model.predict(torch.sigmoid(torch.randn(1, 3, 640, 640)))
tt_loop = []

## LOOPING INFERENCE:
for n in nn:
    t_ = []
    for _ in range(n_samples):
        images = [torch.sigmoid(torch.randn(1, 3, 640, 640)) for _ in range(n)] 
        start_time = time.time()
        for img in images:
            results = model.predict(img, verbose=False)
        end_time = time.time()
        t_.append(end_time-start_time)

    t_m = np.mean(t_)
    tt_loop.append(t_m)
    print(n,': ', t_m)

model = YOLO("yolov8m.pt")
r = model.predict(torch.sigmoid(torch.randn(1, 3, 640, 640)))


## BATCH INFERENCE:
for n in nn:
    t_ = []
    parar = False
    for _ in range(n_samples):
        images = torch.randn(n, 3, 640, 640)  # one BCHW batch of n random images
        images = torch.sigmoid(images)  # squash into [0, 1], the range ultralytics expects for tensor input
        start_time = time.time()
        try:
            results = model.predict(images, verbose=False)
        except RuntimeError:  # e.g. CUDA out of memory at large n: stop the sweep
            parar = True
            break
        end_time = time.time()
        t_.append(end_time-start_time)
    if parar:
        break
    t_m = np.mean(t_)
    tt_batch.append(t_m)
    print(n,': ', t_m)




plt.plot(nn, tt_loop, label='looping', color = 'r')
plt.plot(nn, tt_loop, 'o', color = 'r')
plt.plot(nn[:len(tt_batch)], tt_batch, label='batch_inference', color = 'blue')
plt.plot(nn[:len(tt_batch)], tt_batch, 'o', color = 'blue')
plt.legend(loc='best', frameon=True)
plt.xlabel('number of images')
plt.ylabel('processing time (s)')
plt.show()

glenn-jocher commented on July 23, 2024

Hi @abelBEDOYA,

Thank you for providing a detailed minimal reproducible example and the results of your tests. It's great to see such thorough investigation! 😊

From your observations, it appears that the batch inference time scales linearly with the number of images, similar to looping through individual images. This behavior can be influenced by several factors, including GPU memory bandwidth, the overhead of batching operations, and the specific implementation details of the model and inference engine.
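One way to make "scales linearly" more precise is to fit each curve to t(n) ≈ a + b·n, where the intercept a is the fixed overhead per predict call and the slope b is the marginal cost per image. If batching only saved the repeated overhead, one n-image call would take a + n·b instead of n·(a + b); when a is small compared with b, both are roughly n·b, which is consistent with identical loop and batch curves. The sketch below is a plain least-squares fit; the helper names are hypothetical, and the "only the overhead is saved" model is a simplifying assumption (a GPU that is under-utilised at batch size 1 can also lower b when batched).

```python
from typing import Sequence, Tuple


def fit_overhead_and_per_image(ns: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of times ≈ a + b * n; returns (a, b). Hypothetical helper."""
    if len(ns) != len(times) or len(ns) < 2:
        raise ValueError("need at least two (n, time) pairs of equal length")
    count = len(ns)
    mean_n = sum(ns) / count
    mean_t = sum(times) / count
    var_n = sum((n - mean_n) ** 2 for n in ns)
    if var_n == 0:
        raise ValueError("all n values are identical; the slope is undefined")
    cov_nt = sum((n - mean_n) * (t - mean_t) for n, t in zip(ns, times))
    b = cov_nt / var_n       # slope: marginal seconds per extra image
    a = mean_t - b * mean_n  # intercept: fixed seconds per call
    return a, b


def ideal_batch_speedup(a: float, b: float, n: int) -> float:
    """Speedup of one n-image call over n single-image calls, assuming batching
    only saves the repeated fixed overhead a (per-image cost b unchanged)."""
    loop_time = n * (a + b)  # n calls, each paying a + b
    batch_time = a + n * b   # one call pays a only once
    if batch_time <= 0:
        raise ValueError("a + n * b must be positive")
    return loop_time / batch_time
```

Applied to the reproducer's lists, `fit_overhead_and_per_image(nn, tt_loop)` and `fit_overhead_and_per_image(nn[:len(tt_batch)], tt_batch)` give one (a, b) pair per curve; similar slopes and a near-zero intercept would mean there is little per-call overhead for batching to amortise.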

Here are a few points to consider:

Batch Size and GPU Utilization: The efficiency of batch processing can vary depending on the batch size and the GPU's ability to handle multiple images simultaneously. Smaller batch sizes might not fully utilize the GPU, while larger batch sizes could lead to memory bottlenecks.
Model Complexity: Larger models such as YOLOv8m do more work per image, so they can keep the GPU busier at batch size 1 and typically gain less from batching than small models such as YOLOv8n.
Inference Engine: The underlying inference engine (PyTorch in this case) might have optimizations that affect how batch processing is handled compared to individual image processing.

To further investigate, you might want to try the following:

Experiment with Different Batch Sizes: Test with varying batch sizes to see if there's an optimal size that provides better performance.
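Profile GPU Utilization: Use tools like NVIDIA's nvidia-smi to monitor GPU utilization and memory usage during batched and looped inference to identify bottlenecks. It is also worth confirming that inference actually runs on the GPU (for example by passing device=0 to predict), since batching usually helps much less on a CPU.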
TensorRT Export: Consider exporting your model to TensorRT for potentially better batch inference performance. TensorRT optimizes models for NVIDIA GPUs and can provide significant speedups. More details are in the Ultralytics export documentation.
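To act on the batch-size suggestion, a sweep can report throughput (images per second) for each candidate batch size and pick the fastest. The sketch below is framework-agnostic and uses hypothetical names: `run_batch` is any callable that runs inference on `size` images (for example a wrapper that calls `model.predict` on a `size × 3 × 640 × 640` tensor), and `clock` is injectable so the logic can be checked without a GPU.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def sweep_batch_sizes(
    run_batch: Callable[[int], object],
    batch_sizes: Iterable[int],
    total_images: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, Dict[int, float]]:
    """Process total_images in chunks of each batch size; return
    (best batch size, images per second for every size). Hypothetical helper."""
    throughput: Dict[int, float] = {}
    for k in batch_sizes:
        if k <= 0:
            raise ValueError("batch sizes must be positive")
        run_batch(k)  # one untimed call so shape-specific one-off costs are excluded
        start = clock()
        remaining = total_images
        while remaining > 0:
            size = min(k, remaining)  # the last chunk may be smaller than k
            run_batch(size)
            remaining -= size
        elapsed = clock() - start
        throughput[k] = total_images / elapsed if elapsed > 0 else float("inf")
    if not throughput:
        raise ValueError("no batch sizes given")
    best = max(throughput, key=throughput.get)
    return best, throughput
```

On a real GPU you would also synchronise inside `run_batch` (for example with `torch.cuda.synchronize()`) so each chunk has finished before the next one is timed. For a static TensorRT engine, choose `total_images` as a multiple of each batch size, because a smaller last chunk would not match the engine's fixed input shape.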

Here's a quick example of how to export to TensorRT and run inference:

import torch
from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO("yolov8m.pt")

# Export the model to TensorRT format; a static engine only accepts the batch size it was built for
model.export(format="engine", batch=8)  # creates 'yolov8m.engine'

# Load the exported TensorRT model
tensorrt_model = YOLO("yolov8m.engine")

# Run batch inference on one BCHW tensor with values in [0, 1]
images = torch.rand(8, 3, 640, 640)  # Example batch of 8 images
results = tensorrt_model.predict(images)

I hope this helps! If you have any further questions or need additional assistance, feel free to ask. We're here to help! 😊
