Comments (5)

glenn-jocher commented on May 29, 2024

@Hogushake hello! 😊 It looks like you're on the right track but need to adjust your approach to display masks for all detected objects in the Person class. The key adjustment is to iterate through all predictions (not just the first one) and overlay or combine their masks accordingly. Here's a modified snippet of your code:

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")
cap = cv2.VideoCapture("people.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    results = model.predict(frame, conf=0.3, classes=0)

    # Initialize a mask to accumulate all person masks
    combined_mask = np.zeros_like(frame[:, :, 0])

    # Loop through all detected objects and combine their masks
    for mask in results[0].masks.data:
        combined_mask += (mask.numpy() * 255).astype("uint8")

    # Ensure combined mask is binary
    combined_mask = np.clip(combined_mask, 0, 255)

    cv2.imshow("Combined Masks", combined_mask)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

This modification initializes combined_mask and adds each predicted person mask to it. np.clip is then applied with the aim of keeping the final mask in the valid 0-255 range. Note that because combined_mask is uint8, overlapping masks wrap around during the addition itself, so the clip alone does not prevent overflow.

This should display masks for all detected Person objects with confidence above 0.3, as intended. Let me know if this helps or if you have further questions!
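One caveat with summing uint8 masks is that overlapping regions wrap around (255 + 255 becomes 254) before np.clip ever sees them. A minimal sketch of an alternative, assuming every mask is a 2-D array of 0/1 values with the same shape, merges the masks with an element-wise maximum instead. The helper name combine_masks is hypothetical and not part of Ultralytics. It also returns an empty mask when the list is empty, which is one way to handle frames with no detections.

```python
import numpy as np

def combine_masks(masks, shape):
    """Merge 0/1 masks of identical `shape` into one uint8 mask of 0s and 255s."""
    combined = np.zeros(shape, dtype=np.uint8)
    for mask in masks:
        # Threshold first so every pixel is exactly 0 or 255
        binary = (np.asarray(mask) > 0.5).astype(np.uint8) * 255
        # Element-wise maximum acts as a logical OR and cannot overflow
        np.maximum(combined, binary, out=combined)
    return combined

# Two overlapping toy masks standing in for model output
a = np.zeros((4, 4), dtype=np.float32)
a[:2, :] = 1.0
b = np.zeros((4, 4), dtype=np.float32)
b[1:3, :] = 1.0
combined = combine_masks([a, b], (4, 4))
print(np.unique(combined))
```

In the video loop, the list passed in would be the per-object masks converted to NumPy arrays, after they have been brought to the frame size.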

from ultralytics.

Hogushake commented on May 29, 2024

@glenn-jocher
Thank you so much for your quick reply😊

When I run the code you sent me, I get the following error:

combined_mask += (mask.numpy() * 255).astype("uint8")
ValueError: operands could not be broadcast together with shapes (360,640) (384,640) (360,640)

I think this error occurs because YOLO's output size differs from the size of the input frame.
So I added one line of code to resize the mask, which solved it!

for mask in results[0].masks.data:
    resized_mask = cv2.resize(mask.numpy(), (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
    combined_mask += (resized_mask * 255).astype("uint8")

The problem of YOLO's output size differing from the input size was mentioned in other questions, so I was aware of it in advance.
So, is there an internal parameter in YOLO that makes the result the same size as the input, without using an external function like cv2.resize?

glenn-jocher commented on May 29, 2024

Hey @Hogushake,

Great observation on the size discrepancy! 😊 The model indeed outputs a mask that matches its input size, which might differ from your original video frame size if it got resized during inference.

Adding cv2.resize, as you've done, is currently the recommended practice to match the output mask dimensions with that of the input frame. YOLOv8 does not include a built-in parameter to auto-adjust the output size to match the original unaltered input size directly.

Your modification using cv2.resize seems apt for ensuring dimension consistency across different processing stages. If any further adjustments are needed or you encounter more issues, feel free to reach out again!

Happy coding! 🚀
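The (384, 640) mask shape reported for a (360, 640) frame is consistent with letterbox preprocessing. As a rough sketch, assuming the longest side is scaled to imgsz=640 with the aspect ratio preserved and each side is then padded up to a multiple of the model stride (32 here), the shape the network sees can be estimated as follows. The function name letterbox_shape and its defaults are illustrative assumptions, not an Ultralytics API.

```python
import math

def letterbox_shape(orig_shape, imgsz=640, stride=32):
    """Estimate the (height, width) fed to the network for a frame of `orig_shape`.

    Assumption: scale the longest side to `imgsz`, keep the aspect ratio,
    then pad each side up to the next multiple of `stride`.
    """
    h, w = orig_shape
    scale = imgsz / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    padded_h = math.ceil(new_h / stride) * stride
    padded_w = math.ceil(new_w / stride) * stride
    return padded_h, padded_w

print(letterbox_shape((360, 640)))  # (384, 640)
```

Under these assumptions, 24 of the 384 rows are padding rather than image content, which matters when the mask is mapped back to the frame.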

Hogushake commented on May 29, 2024

Thank you for your reply.
As in the code above, I resize mask.numpy(), but the resized mask still does not match the frame, as shown in the picture.
[image: 327659679-a5013972-503d-4ffc-a721-24c3fd209532]
The bottom of the binary mask is not recognized.
Is there a problem with the code?

glenn-jocher commented on May 29, 2024

Hey there!

It looks like the issue might be due to how the resizing is handled. For binary masks, nearest-neighbor interpolation keeps the values binary. Here's a small tweak to your approach that converts the mask to uint8 before resizing, so the resize operates on the final 0/255 values:

for mask in results[0].masks.data:
    resized_mask = cv2.resize((mask.numpy() * 255).astype('uint8'), (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
    combined_mask += resized_mask

This should ensure that the resizing does not introduce any unintended changes in the mask values. If you're still facing challenges, please print your frame and mask sizes before and after resizing to help debug the issue.

Happy coding! 😊
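A possible explanation for the misaligned mask is that the (384, 640) mask still contains letterbox padding, so resizing it straight to (360, 640) squashes the padded rows into the frame. The sketch below assumes the padding is centered and the aspect ratio was preserved. It crops the padding first and only then resizes. unletterbox_mask is a hypothetical helper, and the nearest-neighbor resize is written with plain NumPy indexing to keep the example self-contained. cv2.resize with cv2.INTER_NEAREST could be used for that step instead.

```python
import numpy as np

def unletterbox_mask(mask, orig_shape):
    """Crop centered letterbox padding from `mask`, then resize to `orig_shape`."""
    mh, mw = mask.shape
    oh, ow = orig_shape
    gain = min(mh / oh, mw / ow)       # scale that was applied to the frame
    pad_y = (mh - oh * gain) / 2       # padding above and below (assumed centered)
    pad_x = (mw - ow * gain) / 2       # padding left and right (assumed centered)
    top, left = int(round(pad_y)), int(round(pad_x))
    cropped = mask[top:mh - top, left:mw - left]
    # Nearest-neighbor resize using integer index arithmetic
    ch, cw = cropped.shape
    rows = np.minimum(np.arange(oh) * ch // oh, ch - 1)
    cols = np.minimum(np.arange(ow) * cw // ow, cw - 1)
    return cropped[rows][:, cols]

# Toy mask: the un-padded image area (rows 12..371) is fully set
mask = np.zeros((384, 640), dtype=np.uint8)
mask[12:372, :] = 1
restored = unletterbox_mask(mask, (360, 640))
print(restored.shape, bool(restored.all()))
```

The Ultralytics documentation also describes a retina_masks prediction argument and polygon outputs via masks.xy, which may avoid the manual step entirely. It is worth checking the docs for the installed version before relying on either.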
