Comments (3)
@abelBEDOYA hello,
Thank you for providing detailed information and screenshots regarding your batch inference issue. To help us investigate further, could you please share a minimal reproducible code example? This will allow us to replicate the issue on our end. You can refer to our guide on creating a minimal reproducible example here: Minimum Reproducible Example.
Additionally, please ensure that you are using the latest versions of torch and ultralytics. If not, kindly update your packages and try running your tests again to see if the issue persists.
Regarding your observations, it's important to note that while batch inference can offer speed improvements, the actual performance gain depends on various factors such as GPU memory bandwidth, the complexity of the model, and the overhead of batching operations. The linear increase in time you are observing might be due to these factors.
Here's a quick example of how you can perform batch inference using the ultralytics library:
from ultralytics import YOLO
import torch
import time
# Load the YOLOv8 model
model = YOLO("yolov8n.pt")
# Prepare a batch of 8 images as one BCHW float tensor with values in [0, 1]
images = torch.rand(8, 3, 640, 640)
# Measure inference time for batch processing
start_time = time.time()
results = model.predict(images)
end_time = time.time()
print(f"Batch inference time: {end_time - start_time} seconds")
Feel free to adjust the batch size and image dimensions as per your requirements. If you continue to experience issues, please share the code you are using for both the loop and batch inference tests.
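If a large batch does not fit in GPU memory, one option is to split the image list into fixed-size chunks and run each chunk as its own batch. A minimal sketch, assuming a hypothetical helper named `chunked` (it is not part of ultralytics):

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    # Placeholder file names, for illustration only
    paths = [f"img_{i}.jpg" for i in range(5)]
    for batch in chunked(paths, 2):
        # In real use: results = model.predict(batch, verbose=False)
        print(batch)
```

Each chunk can then be timed separately, which also makes it easier to see at which batch size memory becomes the limit.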
Looking forward to your response!
from ultralytics.
Hi, I've been testing your point, and there is no difference in elapsed time between batch inference and a simple loop over a list of images. These are the measurements:
This plot is the output of the following script (minimal reproducible example):
from ultralytics import YOLO
import torch
import time
import numpy as np
import matplotlib.pyplot as plt
# Load the YOLOv8 model
model = YOLO("yolov8m.pt")
nn = [2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 16, 19, 24, 29, 34, 40, 47, 56]  # image counts to test
n_samples = 5  # timed repetitions per image count
tt_batch = []
# Warm-up call so that one-time setup cost is not included in the timings
r = model.predict(torch.sigmoid(torch.randn(1, 3, 640, 640)))
tt_loop = []
## LOOPING INFERENCE:
for n in nn:
    t_ = []
    for _ in range(n_samples):
        # n separate single-image tensors; sigmoid maps the values into (0, 1)
        images = [torch.sigmoid(torch.randn(1, 3, 640, 640)) for _ in range(n)]
        start_time = time.time()
        for img in images:
            results = model.predict(img, verbose=False)
        end_time = time.time()
        t_.append(end_time - start_time)
    t_m = np.mean(t_)
    tt_loop.append(t_m)
    print(n, ': ', t_m)
# Reload the model and warm up again before the batch test
model = YOLO("yolov8m.pt")
r = model.predict(torch.sigmoid(torch.randn(1, 3, 640, 640)))
## BATCH INFERENCE:
for n in nn:
    t_ = []
    parar = False  # "stop" flag, set when a batch fails (e.g. out of memory)
    for _ in range(n_samples):
        # One tensor holding a batch of n images
        images = torch.randn(n, 3, 640, 640)
        images = torch.sigmoid(images)
        start_time = time.time()
        try:
            results = model.predict(images, verbose=False)
        except Exception:
            parar = True
            break
        end_time = time.time()
        t_.append(end_time - start_time)
    if parar:
        break
    t_m = np.mean(t_)
    tt_batch.append(t_m)
    print(n, ': ', t_m)
# Plot mean time against number of images for both approaches
plt.plot(nn, tt_loop, label='looping', color='r')
plt.plot(nn, tt_loop, 'o', color='r')
plt.plot(nn[:len(tt_batch)], tt_batch, label='batch_inference', color='blue')
plt.plot(nn[:len(tt_batch)], tt_batch, 'o', color='blue')
plt.legend(loc='best', frameon=True)
plt.xlabel('number of images')
plt.ylabel('processing time (s)')
plt.show()
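One caveat when timing GPU code with a wall-clock timer: CUDA work is launched asynchronously, so the timer can stop before the GPU has finished unless the device is synchronized first. The following is a minimal sketch of a timing helper under that assumption. The name `benchmark` and its parameters are hypothetical and not part of ultralytics; `sync` is meant to be something like `torch.cuda.synchronize`.

```python
import statistics
import time


def benchmark(fn, n_warmup=2, n_repeats=5, sync=None):
    """Return the median wall-clock time of fn() in seconds.

    `sync` is an optional zero-argument callable (for example
    torch.cuda.synchronize) called before each timer read so that
    pending asynchronous GPU work is included in the measurement.
    """
    for _ in range(n_warmup):
        fn()  # warm-up runs are executed but not timed
    times = []
    for _ in range(n_repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


if __name__ == "__main__":
    # Stand-in workload; in real use pass e.g.
    # lambda: model.predict(images, verbose=False)
    print(benchmark(lambda: sum(range(100_000))))
```

Using the median rather than the mean makes the result less sensitive to a single slow run.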
Hi @abelBEDOYA,
Thank you for providing a detailed minimal reproducible example and the results of your tests. It's great to see such thorough investigation! 😊
From your observations, it appears that the batch inference time scales linearly with the number of images, similar to looping through individual images. This behavior can be influenced by several factors, including GPU memory bandwidth, the overhead of batching operations, and the specific implementation details of the model and inference engine.
Here are a few points to consider:
- Batch Size and GPU Utilization: The efficiency of batch processing can vary depending on the batch size and the GPU's ability to handle multiple images simultaneously. Smaller batch sizes might not fully utilize the GPU, while larger batch sizes could lead to memory bottlenecks.
- Model Complexity: The complexity of the YOLOv8 model can also impact the performance gains from batching. More complex models might not see as significant speedups from batching due to the overhead of managing larger tensors.
- Inference Engine: The underlying inference engine (PyTorch in this case) might have optimizations that affect how batch processing is handled compared to individual image processing.
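When comparing the two approaches, time per image is usually more telling than total time: if the total batch time grows with the same slope as the loop, batching is giving no gain. A small sketch of that comparison, using hypothetical helper names and made-up timings (not measurements from this thread):

```python
def per_image_times(batch_sizes, total_times):
    """Divide each total time by its batch size to get seconds per image."""
    return [t / n for n, t in zip(batch_sizes, total_times)]


def speedup(loop_times, batch_times):
    """Loop time divided by batch time; values above 1.0 mean batching is faster."""
    return [t_loop / t_batch for t_loop, t_batch in zip(loop_times, batch_times)]


if __name__ == "__main__":
    # Illustrative numbers only
    sizes = [2, 4, 8]
    loop = [0.10, 0.20, 0.40]
    batch = [0.08, 0.12, 0.20]
    print(per_image_times(sizes, batch))
    print(speedup(loop, batch))
```

A speedup that stays close to 1.0 across batch sizes would match the linear scaling described above.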
To further investigate, you might want to try the following:
- Experiment with Different Batch Sizes: Test with varying batch sizes to see if there's an optimal size that provides better performance.
- Profile GPU Utilization: Use tools like NVIDIA's nvidia-smi to monitor GPU utilization and memory usage during batch and looped inference to identify any bottlenecks.
- TensorRT Export: Consider exporting your model to TensorRT for potentially better batch inference performance. TensorRT optimizes models for NVIDIA GPUs and can provide significant speedups. You can find more details on exporting to TensorRT here.
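For the profiling step, nvidia-smi can be queried from Python between runs. This is a rough sketch, assuming `nvidia-smi` is on the PATH and reports numeric values for both fields; the function names are hypothetical.

```python
import subprocess

# Ask nvidia-smi for GPU utilization (%) and used memory (MiB) as bare CSV
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(output):
    """Parse the CSV output into one (utilization %, memory MiB) tuple per GPU."""
    stats = []
    for line in output.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` from a background thread while inference runs gives a rough picture of whether the GPU is saturated at a given batch size.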
Here's a quick example of how to export to TensorRT and run inference:
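from ultralytics import YOLO
import torch
# Load the YOLOv8 model
model = YOLO("yolov8m.pt")
# Export the model to TensorRT format; the default engine is built for batch size 1,
# so request dynamic shapes with a maximum batch of 8 to allow batched inference
model.export(format="engine", dynamic=True, batch=8)  # creates 'yolov8m.engine'
# Load the exported TensorRT model
tensorrt_model = YOLO("yolov8m.engine")
# Run batch inference on a BCHW float tensor with values in [0, 1]
images = torch.rand(8, 3, 640, 640)  # Example batch of 8 images
results = tensorrt_model.predict(images)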
I hope this helps! If you have any further questions or need additional assistance, feel free to ask. We're here to help! 😊