Question about Show all objects of one class in image segmentation about ultralytics HOT 5 OPEN

Hogushake commented on May 29, 2024

Question about Show all objects of one class in image segmentation

Comments (5)

glenn-jocher commented on May 29, 2024

@Hogushake hello! 😊 It looks like you're on the right track but need to adjust your approach to display masks for all detected objects in the Person class. The key adjustment is to iterate through all predictions (not just the first one) and overlay or combine their masks accordingly. Here's a modified snippet of your code:

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")
cap = cv2.VideoCapture("people.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    results = model.predict(frame, conf=0.3, classes=0)

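    # Accumulate in uint16 so overlapping masks cannot wrap around as they would in uint8
    combined_mask = np.zeros(frame.shape[:2], dtype=np.uint16)

    # Loop through all detected objects and combine their masks
    # (masks is None when nothing is detected; .numpy() assumes CPU tensors)
    if results[0].masks is not None:
        for mask in results[0].masks.data:
            combined_mask += (mask.numpy() * 255).astype("uint8")

    # Clip overlaps back to 255 and convert to uint8 for display
    combined_mask = np.clip(combined_mask, 0, 255).astype(np.uint8)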

    cv2.imshow("Combined Masks", combined_mask)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

This modification initializes combined_mask to accumulate the masks of all detected persons. Each mask from the predictions is added to this accumulation. Finally, make sure to apply np.clip to ensure the final mask remains in a valid range.
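
One detail worth checking in the accumulation step: summing 255-valued masks in a uint8 array wraps around where instances overlap (255 + 255 becomes 254), and np.clip cannot undo that afterwards. A minimal sketch of an alternative, assuming each mask is an array with values in [0, 1] and all masks already share the target shape; merge_masks is a hypothetical helper, not part of Ultralytics:

```python
import numpy as np


def merge_masks(masks, shape):
    """Merge per-instance masks into one binary uint8 mask (0 or 255)."""
    combined = np.zeros(shape, dtype=np.uint8)
    for m in masks:
        # Threshold each mask, then take the element-wise maximum so that
        # overlapping instances stay at 255 instead of being summed.
        binary = (np.asarray(m) > 0.5).astype(np.uint8) * 255
        combined = np.maximum(combined, binary)
    return combined


# uint8 addition wraps silently, which is why a plain sum is risky here
wrapped = np.array([255], dtype=np.uint8) + np.array([255], dtype=np.uint8)
```

Because the maximum never exceeds 255, no clipping step is needed, and an empty list of masks simply yields an all-zero image.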

This should display masks for all detected Person objects with confidence above 0.3, as intended. Let me know if this helps or if you have further questions!

Hogushake commented on May 29, 2024

@glenn-jocher
Thank you so much for your quick reply😊

When I run the code you sent me, I get the following error:

combined_mask += (mask.numpy() * 255).astype("uint8")
ValueError: operands could not be broadcast together with shapes (360,640) (384,640) (360,640)

I think this error occurs because YOLO's output mask size differs from the size of the input frame.
So I added one line of code to resize each mask to the frame size, and that solved it!

for mask in results[0].masks.data:
    resized_mask = cv2.resize(mask.numpy(), (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
    combined_mask += (resized_mask * 255).astype("uint8")

The mismatch between YOLO's output size and the input size was mentioned in other questions, so I was already aware of it.
So, is there a parameter within YOLO that makes the result the same size as the input, without using an external function like cv2.resize?

glenn-jocher commented on May 29, 2024

Hey @Hogushake,

Great observation on the size discrepancy! 😊 The model outputs masks at its inference input size, which can differ from your original video frame size because the frame is resized and padded (letterboxed) before inference. Here the 360x640 frame was padded to 384x640, a multiple of the model stride of 32.

Adding cv2.resize, as you've done, is one way to match the output mask dimensions with those of the input frame. The predict arguments also include retina_masks; passing retina_masks=True should return masks.data at the original frame size, which avoids the external resize.

Your modification using cv2.resize seems apt for ensuring dimension consistency across different processing stages. If any further adjustments are needed or you encounter more issues, feel free to reach out again!

Happy coding! 🚀

Hogushake commented on May 29, 2024

Thank you for your reply.
As in the code above, I resize mask.numpy() to the frame size, but the mask still does not line up with the frame, as shown in the picture.

The bottom of the binary mask is not recognized.
Is there a problem with the code?

glenn-jocher commented on May 29, 2024

Hey there!

It looks like the issue might be due to how the resizing is handled. For binary masks, nearest-neighbor interpolation preserves the binary values. Here's a small tweak to your approach that converts the mask to uint8 before resizing, so it stays binary:

for mask in results[0].masks.data:
    resized_mask = cv2.resize((mask.numpy() * 255).astype('uint8'), (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
    combined_mask += resized_mask

This should ensure that the resizing does not introduce any unintended changes in the mask values. If you're still facing challenges, print the frame and mask shapes before and after resizing to help debug the issue.
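
One possible cause of the missing bottom region, offered as an assumption rather than a confirmed diagnosis: if the (384, 640) mask is the letterboxed version of the (360, 640) frame, it contains 12 rows of padding at the top and bottom, and resizing it directly squashes the content instead of removing that padding. A minimal sketch that crops the padding first and then resizes with nearest-neighbor indexing; unletterbox_mask is a hypothetical helper, and symmetric padding is assumed:

```python
import numpy as np


def unletterbox_mask(mask, orig_shape):
    """Crop assumed symmetric letterbox padding, then resize to (height, width)."""
    mask_h, mask_w = mask.shape[:2]
    orig_h, orig_w = orig_shape

    # Scale factor that was applied to the frame before it was padded
    gain = min(mask_h / orig_h, mask_w / orig_w)
    pad_h = (mask_h - orig_h * gain) / 2
    pad_w = (mask_w - orig_w * gain) / 2

    # Remove the padding rows and columns on each side
    top, left = int(round(pad_h)), int(round(pad_w))
    cropped = mask[top:mask_h - top, left:mask_w - left]

    # Nearest-neighbor resize using integer index arrays
    crop_h, crop_w = cropped.shape[:2]
    rows = np.arange(orig_h) * crop_h // orig_h
    cols = np.arange(orig_w) * crop_w // orig_w
    return cropped[rows][:, cols]
```

Inside the loop this could replace the cv2.resize call, for example resized_mask = unletterbox_mask(mask.numpy(), frame.shape[:2]), followed by the same multiplication by 255 and uint8 conversion as before.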

Happy coding! 😊
