
Comments (8)

geaxgx commented on July 18, 2024

Hi, lm_input_length x lm_input_length (resp. pd_input_length x pd_input_length) corresponds to the shape of the image used as input by the landmark (resp. pose detection) neural network. So, for instance, the landmark NN is fed with 256x256 images. The raw outputs of the NN are landmark coordinates between 0 and 256, which are then normalized to between 0 and 1 by dividing them by 256.
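For illustration, here is a minimal standalone sketch of that normalization (the variable and array names are mine, not the repo's):

    import numpy as np

    lm_input_length = 256  # the landmark NN is fed 256x256 images

    # Hypothetical raw NN output: landmark (x, y, z) values in the 0..256 range
    raw_landmarks = np.array([[128.0, 192.0, -12.0],
                              [ 64.0,  40.0,   5.0]])

    # Normalize to 0..1 by dividing by the network input size
    norm_landmarks = raw_landmarks / lm_input_length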


bruno-darochac commented on July 18, 2024

Hi, thanks for your answer!

I just want to understand something. You have a parameter to enable or disable the xyz mode, but even without turning on this mode, you get the depth of the landmarks in the image. What I don't understand is that you seem to create the stereo capture in create_pipeline only when the xyz mode is on.

So my questions are: how do you manage to get the depth of each landmark? And for my use case, is it possible to get the raw depth data in your program, or should I create a new pipeline?


geaxgx commented on July 18, 2024

This is what I explain here: https://github.com/geaxgx/depthai_blazepose#inferred-3d-vs-measured-3d

The important point to understand is that the mediapipe landmark model is able to infer 3D landmarks from 2D images (I call these "inferred 3D landmarks"). That may sound like a bit of magic, but I guess Google used synthetic data to train their model. Of course, you can't expect real accuracy from these landmarks. Also, these landmarks are not absolute (the model can't tell you, for instance, that the left shoulder is 5 meters from the camera) but relative to a reference point, the middle point between the hips (the model estimates a delta x,y,z relative to the mid hips).
Now, if you set the xyz mode when running my script, you can, IN ADDITION to the inferred 3D landmarks, get from the raw depth data the real absolute 3D position of this reference point. So you have on one side the body landmarks relative to the reference point, and on the other side the absolute position of the reference point. By combining these two pieces of information, you can get an estimation of the absolute 3D position of each landmark.
This combination is done here:

    if self.show_3d == "mixed":
        if body.xyz_ref:
            """
            Beware, the y value of landmarks_world coordinates is negative for landmarks
            above the mid hips (like shoulders) and positive for landmarks below (like feet).
            The y value of the (x,y,z) coordinates given by the depth sensor is negative in
            the lower part of the image and positive in the upper part.
            """
            translation = body.xyz / 1000  # depth data is in mm; convert to meters
            translation[1] = -translation[1]  # flip y to match the landmark convention
            if body.xyz_ref == "mid_hips":
                points = points + translation
            elif body.xyz_ref == "mid_shoulders":
                # points are relative to the mid hips, so the mean of the two
                # shoulder points is the mid-hips-to-mid-shoulders vector
                mid_hips_to_mid_shoulders = np.mean([
                    points[mpu.KEYPOINT_DICT['right_shoulder']],
                    points[mpu.KEYPOINT_DICT['left_shoulder']]],
                    axis=0)
                points = points + translation - mid_hips_to_mid_shoulders

body.xyz contains the absolute 3D position of the reference point as measured from the raw depth data.
The reference point is the mid hips whenever the mid hips are visible in the image; otherwise we use the middle of the shoulders as the reference point.

So, note that by setting the xyz mode, my script configures the pipeline to get the raw depth data, but I rely on this depth data to measure the position of only one point, the reference point. FYI, the first link above explains why I cannot directly measure the absolute position of every landmark from this depth data.
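To make the combination concrete, here is a minimal standalone sketch of the same computation with made-up numbers (only the mid_hips case, with the mm-to-m conversion and y flip from the snippet above):

    import numpy as np

    # Inferred landmarks, relative to the mid-hips reference point, in meters
    relative_landmarks = np.array([
        [0.00,  0.00, 0.00],   # mid hips (the reference point itself)
        [0.12, -0.48, 0.03],   # a shoulder: above the hips, negative y
        [0.10,  0.85, 0.05],   # a foot: below the hips, positive y
    ])

    # Absolute position of the reference point, measured from the depth
    # data in millimeters (DepthAI spatial coordinates)
    xyz = np.array([150.0, -320.0, 2400.0])

    translation = xyz / 1000          # convert mm to meters
    translation[1] = -translation[1]  # flip y to match the landmark convention

    # Absolute 3D position of every landmark
    absolute_landmarks = relative_landmarks + translation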

I hope I managed to make things a bit clearer :-)


bruno-darochac commented on July 18, 2024

Many thanks! Your explanation was perfectly clear :-D

Your script inspires me to build something more accurate for my use case, and now I understand the workflow of your script better 👍

I have one last question: have you tried to manage the little glitches that the model has? Maybe by thresholding the changes between two frames to avoid those little glitches? And if I want to try something, is it template_manager_script.py that I should look at?


geaxgx commented on July 18, 2024

I am currently on holidays, only reading mail once in a while. What do you mean by "glitch"? If you mean that the drawn landmarks are not superimposed on their corresponding body parts (especially noticeable with the head landmarks), I am afraid it can't be improved. It is due to the process of converting the original tflite model into an OpenVINO float16 model, where precision and accuracy have been lost.


bruno-darochac commented on July 18, 2024

Oh! Enjoy your vacation! And thanks for taking the time to answer me.

What I mean by glitch is that even if the body is static, the landmarks move by 1 or 2 pixels between frames t and t+1, so it looks unstable.


geaxgx commented on July 18, 2024

Thanks :-)
You can try to play with the parameters of the smoothing filter:

self.filter_landmarks = mpu.LandmarksSmoothingFilter(

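If you want to experiment with your thresholding idea outside the repo's filter, here is a minimal sketch of frame-to-frame smoothing with a simple exponential moving average. This is only an illustration of the principle, not the repo's LandmarksSmoothingFilter (which is more sophisticated); all names and values here are hypothetical:

    import numpy as np

    class SimpleLandmarkSmoother:
        """Exponential moving average over a landmark array.
        alpha close to 1 -> light smoothing; close to 0 -> heavy smoothing (more lag)."""
        def __init__(self, alpha=0.5):
            self.alpha = alpha
            self.prev = None

        def apply(self, landmarks):
            # landmarks: (N, 3) numpy array of x, y, z for the current frame
            if self.prev is None:
                self.prev = landmarks.astype(float)
            else:
                self.prev = self.alpha * landmarks + (1 - self.alpha) * self.prev
            return self.prev

    # Usage: smooth each frame's landmarks before drawing them
    smoother = SimpleLandmarkSmoother(alpha=0.4)
    # smoothed = smoother.apply(body.landmarks)  # body.landmarks as in the repo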


bruno-darochac commented on July 18, 2024

I'll take a look at this then! Many thanks for your availability. I think I have all the keys to continue my project ahah

I'll mention you in the acknowledgements of my bachelor's thesis!
Enjoy your vacation :-)

