Giter VIP home page Giter VIP logo

depthai_hand_tracker's Introduction

Hand tracking with DepthAI

Running Google Mediapipe Hand Tracking models on DepthAI hardware (OAK-1, OAK-D, ...)

There is a version for OpenVINO there : openvino_hand_tracker

Demo

Install

Install DepthAI gen2 python package:

python3 -m pip install -r requirements.txt

Run

To use the color camera as input :

python3 HandTracker.py

To use a file (video or image) as input :

python3 HandTracker.py -i filename

To enable gesture recognition :

python3 HandTracker.py -g

Gesture recognition

To run only the palm detection model (without hand landmarks), use --no_lm argument. Of course, gesture recognition is not possible in this mode.

Use keypress between 1 and 7 to enable/disable the display of hand features (palm bounding box, palm landmarks, hand landmarks, handedness, gesture,...)

The models

You can find the models palm_detector.blob and hand_landmark.blob under the 'models' directory, but below I describe how to get the files.

  1. Clone this github repository in a local directory (DEST_DIR)
  2. In DEST_DIR/models directory, download the source tflite models from Mediapipe:
  1. Install the amazing PINTO's tflite2tensorflow tool. Use the docker installation which includes many packages including a recent version of Openvino.
  2. From DEST_DIR, run the tflite2tensorflow container: ./docker_tflite2tensorflow.sh
  3. From the running container:
cd resources/models
./convert_models.sh

The convert_models.sh converts the tflite models in tensorflow (.pb), then converts the pb file into Openvino IR format (.xml and .bin), and finally converts the IR files in MyriadX format (.blob).

By default, the number of SHAVES associated with the blob files is 4. In case you want to generate new blobs with different number of shaves, you can use the script gen_blob_shave.sh:

# Example: to generate blobs for 6 shaves
./gen_blob_shave.sh -m pd -n 6   # will generate palm_detection_sh6.blob
./gen_blob_shave.sh -m lm -n 6   # will generate hand_landmark_sh6.blob

Explanation about the Model Optimizer params :

  • The preview of the OAK-* color camera outputs BGR [0, 255] frames . The original tflite palm detection model is expecting RGB [-1, 1] frames. --reverse_input_channels converts BGR to RGB. --mean_values [127.5,127.5,127.5] --scale_values [127.5,127.5,127.5] normalizes the frames between [-1, 1].
  • The images which are fed to hand landmark model are built on the host in a format similar to the OAK-* cameras (BGR [0, 255]). The original hand landmark model is expecting RGB [0, 1] frames. Therefore, the following arguments are used --reverse_input_channels --scale_values [255.0, 255.0, 255.0]

Blob models vs tflite models The palm detection blob does not exactly give the same results as the tflite version, because the tflite ResizeBilinear instruction is converted into IR Interpolate-1. Yet the difference is almost imperceptible thanks to the great help of PINTO (see issue ).

depthai_hand_tracker's People

Contributors

geaxgx avatar

Stargazers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.