
deepmodel's People

Contributors

xingyizhou


deepmodel's Issues

Converting uvd to xyz coordinates

Hi~ Thank you for your great open-source code. I have some questions about the following code:

xstart = int(math.floor((u * d / fx - cube_size / 2.) / d * fx))
xend = int(math.floor((u * d / fx + cube_size / 2.) / d * fx))
ystart = int(math.floor((v * d / fy - cube_size / 2.) / d * fy))
yend = int(math.floor((v * d / fy + cube_size / 2.) / d * fy))

In your code,
fx = 588.03
fy = 587.07

And I know fu, fv are determined because the original depth image is 640*480 pixels.

Are fx and fy determined by the camera? Where and how can I get them for the NYU hand pose dataset? I'm a beginner and haven't figured out the transformation between xyz and uvd coordinates.
Could you please give me some help? Thank you!
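
For reference, the pinhole back-projection implied by these constants is sketched below. This is a reader's sketch, not the author's code: it assumes the usual NYU convention that fx = 588.03, fy = 587.07 and that the principal point sits at the center (320, 240) of the 640*480 frame; some codebases also flip the sign of the y axis.

    import numpy as np

    # Assumed intrinsics: focal lengths from the code above; principal point
    # taken to be the center of the 640*480 image.
    fx, fy = 588.03, 587.07
    cx, cy = 320.0, 240.0

    def uvd_to_xyz(u, v, d):
        """Back-project pixel (u, v) with depth d (mm) to camera-space xyz (mm)."""
        x = (u - cx) * d / fx
        y = (v - cy) * d / fy
        return np.array([x, y, d])

    def xyz_to_uvd(x, y, z):
        """Project a camera-space point (mm) back to pixel coordinates plus depth."""
        u = x / z * fx + cx
        v = y / z * fy + cy
        return np.array([u, v, z])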

Accuracy layer in the training phase

Hi, Xingyi,
Thank you for your great open-source code. I have reproduced the results shown in the repository.
Furthermore, I want to add an accuracy layer to the DeepModel.prototxt file.

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "DeepHandModelxyz"
  bottom: "joint"
  top: "accuracy"
  include {
    phase: TEST  # enum value, not a quoted string
  }
}

However, there is an error saying that the dimensions of joint and DeepHandModelxyz don't match:

Number of labels must match number of predictions; e.g., if label axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}

Could it be because of the HDF5 data format? Can you give me some help? I am looking forward to your reply. Thank you!!!
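
As the error message says, Caffe's Accuracy layer expects integer class labels, so it cannot consume two continuous regression blobs. If the goal is a test-time metric, one option is to compute the mean per-joint Euclidean error outside the network. A minimal NumPy sketch, assuming 31 joints stored as flattened xyz triples like the DeepHandModelxyz and joint blobs:

    import numpy as np

    def mean_joint_error(pred, gt, num_joints=31):
        """Mean per-joint Euclidean distance; pred and gt have shape
        (N, num_joints * 3) holding flattened xyz coordinates."""
        pred = pred.reshape(-1, num_joints, 3)
        gt = gt.reshape(-1, num_joints, 3)
        return np.linalg.norm(pred - gt, axis=2).mean()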

Question about crop images

When cropping the images for training, the code uses the ground-truth wrist position as the center. Are the testing results in the paper produced in the same way? If not, how are the testing images cropped? Thanks.

An issue about the RGB (640*480*3) pictures

Hi~ I've come to ask for your help again.
Why is the RGB image quality of the NYU Hand Dataset so poor? The images are not clear; they look weird.
Could you please give me a hand? The dataset's authors don't seem to answer issues.
Thank you!

No need to change caffe.proto ?

What confuses me is that you added a new loss layer to caffe and registered it. Is there no need to change caffe.proto?

performance in real world

Hi:
Thank you for your work!
I ran the code with some pictures from a real camera and got bad results.
I followed the steps below:
a. get a depth image from the camera
b. crop the region including the hand
c. run the code.
I looked into the images and found the only difference between mine and NYU's is that my images are noisier (Gaussian noise) than NYU's.
Could this lead to the bad results? Or did you do any experiments in the real world, and how did it behave?
Thanks!
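
(Not the author's answer, but one cheap way to test the noise hypothesis is to smooth the depth map before cropping. A minimal sketch assuming OpenCV; the input frame here is a synthetic stand-in for your camera's depth image.)

    import cv2
    import numpy as np

    # Hypothetical stand-in for a real camera frame: a noisy depth map in mm.
    depth_raw = np.random.normal(800.0, 20.0, (480, 640)).astype(np.float32)

    depth = cv2.medianBlur(depth_raw, 5)  # suppress speckle/Gaussian noise
    depth[depth <= 0] = depth.max()       # treat missing readings as background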

I have a question about your hand detection module

Hi, thanks for sharing your code.

I have a question about your hand detection module, which is used in most recent hand pose estimation papers.

According to line 96 of training/GetH5DataNYU.py,

depth = CropImage(depth, joint_uvd[id, 34])

you used joint_uvd[id, 34] as the com argument of the CropImage function.

So I'm curious whether you used the ground-truth palm position (joint_uvd[id, 34]) to crop the hand from the original depth image even at test time.
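
For context, the pixel bounds implied by the formulas quoted in the first issue can be packaged as below. This is only a sketch of that arithmetic, not the repo's actual CropImage implementation, and the 300 mm cube size is an assumption:

    import math

    def crop_bounds(u, v, d, cube_size=300.0, fx=588.03, fy=587.07):
        """Pixel bounds of a metric cube of side cube_size (mm) centered on
        the hand at pixel (u, v) with depth d (mm)."""
        half = cube_size / 2.
        xstart = int(math.floor((u * d / fx - half) / d * fx))
        xend   = int(math.floor((u * d / fx + half) / d * fx))
        ystart = int(math.floor((v * d / fy - half) / d * fy))
        yend   = int(math.floor((v * d / fy + half) / d * fy))
        return xstart, xend, ystart, yend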

Testing depth images from a real-world environment

Hello, I have a question. I'd like to know how the test images in the testing\test_images folder were produced. Following the description in your paper, I did the following preprocessing:
(1) detect the hand position;
(2) crop out the region of interest;
(3) filter out the background 30 cm in front of and behind the hand;
(4) convert to grayscale;
(5) normalize to [-1, 1];

But the test results are not good. I'd like to know whether my preprocessing went wrong or something else did. I see that your test images are 3-channel grayscale images; how exactly did you do the preprocessing?

Looking forward to your reply! Thank you!
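
(A reader's sketch of the steps listed above, once the crop is done and the hand-center depth d is known. The 300 mm cube, the 128x128 input size, and replicating the normalized map to 3 channels are assumptions based on this thread; the repo's exact pipeline is in training/GetH5DataNYU.py.)

    import cv2
    import numpy as np

    def preprocess(depth_crop, d, cube_size=300.0):
        """Normalize a cropped depth patch (mm) around hand-center depth d (mm)."""
        near, far = d - cube_size / 2., d + cube_size / 2.
        patch = np.clip(depth_crop.astype(np.float32), near, far)  # drop background
        patch = (patch - d) / (cube_size / 2.)                     # map to [-1, 1]
        patch = cv2.resize(patch, (128, 128))
        return np.stack([patch] * 3, axis=-1)                      # 3-channel input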

How to set the initial parameters?

How did you set the initial 47 DoF parameters? Did you assign a sample as the default hand pose and calculate its DoF parameters from the joint locations?

details of pose parameter theta

In deepmodel.prototxt, which layer outputs the "55-dimensional pose parameter theta"? Is it the layer called "DoF"? But that one has only num_output: 47.

What is the order of the parameters in the pose parameter theta? Is it bend, side, twist for each joint of the libhand model? What is the joint sequence?

Preprocessing of depth images

As far as I understand the preprocessing part, the procedure is

  • third party detects center of hand*
  • 3d cube with fixed size around center of hand is extracted
  • depth values in this cube are normalized to [-1,1]
  • the resulting depth image is resized to 128x128 pixels

However, hands in the test depth images [0,772,1150,1350,1739].png won't be detected correctly once the image is flipped or rotated.

Are there some more steps involved in the preprocessing of the depth images (e.g. rotation, left/right hand orientation)?

*) Oberweger et al. (2015) used a center-of-mass approach to detect the hand center, whereas in #4 the wrist position is mentioned as being the center; could you clarify that?
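
For reference, a minimal sketch of the Oberweger-style center-of-mass detection mentioned in the footnote; the depth band thresholds are placeholder assumptions, not values from either codebase:

    import numpy as np

    def center_of_mass(depth, near=200.0, far=1500.0):
        """Average uvd of pixels inside a placeholder depth band (mm)."""
        mask = (depth > near) & (depth < far)
        v, u = np.nonzero(mask)
        return u.mean(), v.mean(), depth[mask].mean()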

Compiling on Windows

Hello, I have a question. To compile on Windows, is it just a matter of copying the files into caffe, or are other steps needed? Thanks for your reply.

Unknown Layer Error - While running demo.py

Hi,

I have copied your files from ./libs/include to caffe_root/include and ./libs/src to caffe_root/src, and built the caffe library (make && make pycaffe) with the Python layer enabled (WITH_PYTHON_LAYER := 1 in Makefile.config). Even then it reports an error stating there is no layer type named DeepHandModel. Can you please help resolve this issue?

F0606 19:27:40.747319 20872 layer_factory.hpp:80] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: DeepHandModel (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, Concat, ContrastiveLoss, Convolution, Data, Deconvolution, Dropout, DummyData, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, Python, ReLU, Reduction, Reshape, SPP, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
Aborted (core dumped)
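
One quick diagnostic, assuming your pycaffe build exposes caffe.layer_type_list(): check whether the Python bindings actually see the new layer. If it is missing, pycaffe is probably loading a libcaffe built without the new source files; rebuilding and checking PYTHONPATH usually narrows this down.

    import caffe

    # If this prints False, the loaded libcaffe was built without the layer.
    print('DeepHandModel' in caffe.layer_type_list())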

Visualization in demo.py

Hello, I have a question about demo.py. I saw the following code:

      for j in range(J):
          x[j] = joint[joints[j] * 3]
          y[j] = joint[joints[j] * 3 + 1]
          z[j] = joint[joints[j] * 3 + 2]
          # map normalized [-1, 1] coords to 128x128 pixel coords (y axis flipped)
          cv2.circle(img, (int((x[j] + 1) / 2 * 128), int((- y[j] + 1) / 2 * 128)), 2, (255, 0, 0), 2)

After computing the joints in 3D, it seems we just map the normalized x, y coordinates to [0, 128). I reimplemented your code in PyTorch; my output for the ground truth (not my network's prediction) looks like this:

[image: ground-truth joints rendered on the cropped depth image]

The joints look wrong; they should be positioned slightly to the left. I wonder whether this is normal or not (due to cropping, resizing and so on).

Error in caffe while building layer DeepHandModel

I0614 12:48:35.391320 30215 layer_factory.hpp:77] Creating layer DeepHandModel
I0614 12:48:35.391338 30215 net.cpp:91] Creating Layer DeepHandModel
I0614 12:48:35.391347 30215 net.cpp:425] DeepHandModel <- DoF
I0614 12:48:35.391356 30215 net.cpp:399] DeepHandModel -> DeepHandModelxyz
*** Aborted at 1465888715 (unix time) try "date -d @1465888715" if you are using GNU date ***
PC: @ 0x7f9b7cc44754 (unknown)
*** SIGSEGV (@0xc0) received by PID 30215 (TID 0x7f9b7ee55780) from PID 192; stack trace: ***
@ 0x7f9b7cc26d40 (unknown)
@ 0x7f9b7cc44754 (unknown)
@ 0x7f9b7cc4d147 (unknown)
@ 0x7f9b7e670cd9 caffe::DeepHandModelLayer<>::LayerSetUp()
@ 0x7f9b7e6f0515 caffe::Net<>::Init()
@ 0x7f9b7e6f13b5 caffe::Net<>::Net()
@ 0x7f9b7e67b72a caffe::Solver<>::InitTrainNet()
@ 0x7f9b7e67c93c caffe::Solver<>::Init()
@ 0x7f9b7e67cc6a caffe::Solver<>::Solver()
@ 0x7f9b7e6d43e3 caffe::Creator_SGDSolver<>()
@ 0x411666 caffe::SolverRegistry<>::CreateSolver()
@ 0x40ab20 train()
@ 0x40852c main
@ 0x7f9b7cc11ec5 (unknown)
@ 0x408cfd (unknown)
@ 0x0 (unknown)
Segmentation fault

The caffe installation tests ran OK, and the hdf5_classification example also ran OK. Why is this (DeepHandModel) failing? Did you encounter this error, and how did you fix it?

Parameters of the network's final fully connected layer

Hello, I have a question. The final fully connected layer has 47 outputs; what do these parameters represent, and how is that number derived? The paper mentions 23 joints, but the network actually outputs 31 joints, which confuses me. Thanks.

Rotation Angle Constraints

In your paper, you have mentioned:

"Each rotation angle has a range [theta_i; theta_i], which are the lower/upper bounds for the
angle. Such bounds avoid self-collision and physically infeasible poses."

Could you please point me to the values of these bounds for each joint and degree of freedom?
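
(Not an answer, but for anyone applying such bounds downstream, a minimal sketch with placeholder values; the actual per-joint bounds are not reproduced here.)

    import numpy as np

    # Placeholder bounds; the real per-DoF values are presumably defined
    # in the layer's implementation and are not copied here.
    theta_lower = np.full(47, -np.pi)
    theta_upper = np.full(47,  np.pi)

    def clamp_dofs(theta):
        """Clamp a 47-DoF pose vector to its lower/upper angle bounds."""
        return np.clip(theta, theta_lower, theta_upper)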
