
Comments (2)

vietanhdev commented on August 26, 2024

Hello,
Our implementation is a modified version of the original model.
First, for identity_1, we don't know the exact purpose of this branch. Since this architecture is designed for tracking, we guess that this branch predicts whether there is a person in the image. I verified this assumption by running the pre-trained model with the following code:

import tensorflow as tf
import cv2
import numpy as np

# Load the pre-trained 39-keypoint pose landmark model
model = tf.keras.models.load_model('saved_model_full_pose_landmark_39kp')
cap = cv2.VideoCapture(0)

while True:
    ret, origin = cap.read()
    if not ret:
        break
    # Resize to the model's 256x256 input and normalize
    img = cv2.resize(origin, (256, 256))
    img = img.astype(float)
    img = (img - 127) / 255
    img = np.array([img])  # add the batch dimension

    heatmap, classify, regress = model.predict(img)
    confidence = np.reshape(classify, (1,))[0]
    print(confidence)  # higher when a person is present in the frame

cap.release()

For identity_2, as explained here, they have 4 outputs for each keypoint:

x and y: Landmark coordinates normalized to [0.0, 1.0] by the image width and height respectively.
z: Should be discarded as currently the model is not fully trained to predict depth, but this is something on the roadmap.
visibility: A value in [0.0, 1.0] indicating the likelihood of the landmark being visible (present and not occluded) in the image.

That's why their output size is 4 * number_of_keypoints. In the pre-trained model we used to implement this repo, number_of_keypoints = 39, so there are 4 * 39 = 156 outputs. I removed the z dimension from the keypoints, so the shape of our output is 3 * number_of_keypoints.
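The splitting described above can be sketched in NumPy. This is a hypothetical illustration (the variable names and the random stand-in for the model output are assumptions, not the repo's actual code): the flat 4 * 39 = 156 regression vector is reshaped into per-keypoint (x, y, z, visibility) tuples, then the z column is dropped.

```python
import numpy as np

num_keypoints = 39
# Stand-in for the model's flat regression output of size 4 * 39 = 156
regress = np.random.rand(1, 4 * num_keypoints)

# Group into (batch, keypoint, [x, y, z, visibility])
kps = regress.reshape(-1, num_keypoints, 4)

# Keep x, y, visibility; drop the z column as the comment suggests
xy_vis = kps[..., [0, 1, 3]]

print(xy_vis.shape)  # (1, 39, 3), i.e. 3 * number_of_keypoints values
```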

Another difference from the original model is that our heatmap output has the shape (128, 128, number_of_keypoints), while the original model's has the shape (128, 128, 1). We are currently decoding the keypoints from the heatmap output; we plan to revisit this design in the future.
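One common way to decode keypoints from a per-channel heatmap like this is an argmax over each channel. The following is a minimal sketch under that assumption (the source does not specify the repo's exact decoding code; the random stand-in and normalization are illustrative):

```python
import numpy as np

num_keypoints = 39
# Stand-in for a (128, 128, num_keypoints) heatmap output
heatmap = np.random.rand(128, 128, num_keypoints)

coords = []
for k in range(num_keypoints):
    # Index of the peak response in channel k
    flat_idx = np.argmax(heatmap[..., k])
    y, x = np.unravel_index(flat_idx, heatmap.shape[:2])
    coords.append((x / 128.0, y / 128.0))  # normalize to [0, 1]

coords = np.array(coords)
print(coords.shape)  # (39, 2)
```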

from tf-blazepose.

jizhu1023 commented on August 26, 2024

@vietanhdev Thanks for your reply, which addresses my issues well! The other thing I am confused about is why there are 39 keypoints rather than the 33 or 35 mentioned in the paper. By looking into the Mediapipe code, I found that keypoints 34-35 are auxiliary_landmarks used for ROI generation, and keypoints 36-39 are unused. I further visualized the locations of keypoints 36-39 and found they coincide with some keypoints on the hands.

