
V2V-PoseNet-pytorch

This is a PyTorch implementation of V2V-PoseNet (V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map), largely based on the authors' Torch7 implementation.

This repository provides

  • V2V-PoseNet core modules (model, voxelization, etc.); see the usage sketch after this list
  • A demo experiment on the MSRA hand pose dataset, resulting in ~11mm mean error
  • An additional Integral Pose Loss (or PoseFix loss) implementation, resulting in ~10mm mean error on the same demo
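
For orientation, here is a minimal usage sketch of the core modules. It is hypothetical: it assumes the model class is V2VModel under src/v2v_model.py, that the network takes a single-channel 88x88x88 voxel grid, and that it outputs one 44x44x44 heatmap per joint, as in the paper.

import torch
from src.v2v_model import V2VModel  # assumed module path and class name

num_joints = 21  # the MSRA hand annotation has 21 joints
model = V2VModel(input_channels=1, output_channels=num_joints)

# A dummy occupancy grid standing in for the output of the voxelization step.
voxel_grid = torch.rand(1, 1, 88, 88, 88)
heatmaps = model(voxel_grid)
print(heatmaps.shape)  # expected: (1, 21, 44, 44, 44)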

Requirements

  • pytorch 0.4.1 or pytorch 1.0
  • python 3.6
  • numpy

Warning on PyTorch 0.4.1 cuDNN:

You may need to disable cuDNN for batch norm, or simply fall back to plain CUDA instead. With cuDNN batch norm in float precision, the model does not train well. My simple experiments show:

cudnn+float: does NOT work (the loss decreases much more slowly and converges to a higher value)
cudnn+float+(batchnorm's cudnn disabled): works (the loss decreases faster and converges to a lower value)
cudnn+double: works, but slowly
cuda+(float/double): works, but uses much more memory

A similar issue is pointed out by https://github.com/Microsoft/human-pose-estimation.pytorch. As suggested there, disable cuDNN for batch norm:

PYTORCH=/path/to/pytorch
# for pytorch v0.4.0
sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
# for pytorch v0.4.1
sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
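
If you prefer not to patch the PyTorch source, a simpler (but slower) workaround corresponding to the "just use cuda" option above is to disable cuDNN globally at the top of the training script. This is only a sketch of that option:

import torch

# Disable cuDNN entirely so batch norm falls back to plain CUDA kernels.
# Slower overall, but avoids the float-precision training issue described above.
torch.backends.cudnn.enabled = False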

MSRA hand dataset demo

Usage

  • Clone this repo:
git clone https://github.com/dragonbook/V2V-PoseNet-pytorch.git
cd V2V-PoseNet-pytorch
Note: this repository contains a copy of the MSRA hand centers under ./datasets/msra_center.
  • Configure data_dir=path/to/msra-hand and center_dir=path/to/msra-hand-center in ./experiments/msra-subject3/main.py, then run the following command to perform training and testing. It trains for a few epochs and evaluates on the test set. The test result is saved as test_res.txt and the fit result on the training data as fit_res.txt:
PYTHONPATH=./ python ./experiments/msra-subject3/main.py
  • Configure data_dir=path/to/msra-hand and center_dir=path/to/msra-hand-center in ./experiments/msra-subject3/gen_gt.py. Run it to generate ground-truth labels as train_s3_gt.txt and test_s3_gt.txt.

  • Configure pred_file=path/to/test_res.txt and gt_file=path/to/test_s3_gt.txt in ./experiments/msra-subject3/show_acc.py. Run it to plot the accuracy curve and mean error (a minimal error-computation sketch is given after the figures below).

  • The following figures show that the simple experiment can result in about 11mm mean error.

msra_s3_acc

msra_s3_mean_error
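
For reference, here is a minimal sketch of computing the mean per-joint error from the two result files. It assumes each line stores the coordinates of all 21 joints as x1 y1 z1 x2 y2 z2 ...; show_acc.py remains the authoritative evaluation.

import numpy as np

def load_joints(path, num_joints=21):
    # Each line is assumed to hold x y z for every joint, whitespace separated.
    data = np.loadtxt(path)
    return data.reshape(-1, num_joints, 3)

pred = load_joints('test_res.txt')    # network predictions
gt = load_joints('test_s3_gt.txt')    # ground truth written by gen_gt.py

# Euclidean distance per joint, averaged over all joints and frames (in mm).
mean_error = np.linalg.norm(pred - gt, axis=2).mean()
print('mean error: %.2f mm' % mean_error)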

Additional IntegralPose/PoseFix style loss implementation

V2V-PoseNet's loss is replaced with PoseFix's loss (one-hot heatmap loss + L1 coordinate loss), implemented independently under the ./integral-pose directory. As before, configure data_dir and center_dir in ./integral-pose/main.py and start training. The result shows about 10mm mean error; a sketch of this style of loss is given below.
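
For illustration only, a minimal sketch of this style of loss: cross-entropy against a one-hot voxel heatmap plus an L1 loss on soft-argmax coordinates. The function name, arguments, and shapes are assumptions and do not mirror ./integral-pose exactly.

import torch
import torch.nn.functional as F

def posefix_style_loss(heatmaps, gt_voxel_idx, gt_coords, grid_coords, coord_weight=1.0):
    # heatmaps:     (B, J, D, H, W) raw network output
    # gt_voxel_idx: (B, J) flattened index of the ground-truth voxel per joint (long tensor)
    # gt_coords:    (B, J, 3) ground-truth joint coordinates
    # grid_coords:  (D*H*W, 3) coordinate of every voxel center
    B, J = heatmaps.shape[:2]
    logits = heatmaps.view(B, J, -1)

    # One-hot heatmap loss: each joint's heatmap is treated as a classification over voxels.
    heatmap_loss = F.cross_entropy(logits.view(B * J, -1), gt_voxel_idx.view(B * J))

    # Soft-argmax: probability-weighted average of voxel coordinates gives joint positions.
    probs = F.softmax(logits, dim=2)
    pred_coords = torch.einsum('bjv,vc->bjc', probs, grid_coords)

    # L1 coordinate loss.
    coord_loss = F.l1_loss(pred_coords, gt_coords)
    return heatmap_loss + coord_weight * coord_loss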

integral_loss_s3_acc

integral_loss_mean_error

compare_mean_error

Below is from the authors' README, for reference.

V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

Introduction

This is our project repository for the paper, V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map (CVPR 2018).

We, Team SNU CVLAB (Gyeongsik Moon, Juyong Chang, and Kyoung Mu Lee of the Computer Vision Lab, Seoul National University), are the winners of the HANDS2017 Challenge on frame-based 3D hand pose estimation.

Please refer to our paper for details.

If you find our work useful in your research or publication, please cite our work:

[1] Moon, Gyeongsik, Ju Yong Chang, and Kyoung Mu Lee. "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map." CVPR 2018. [arXiv]

@InProceedings{Moon_2018_CVPR_V2V-PoseNet,
  author    = {Moon, Gyeongsik and Chang, Juyong and Lee, Kyoung Mu},
  title     = {V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2018}
}

In this repository, we provide

  • Our model architecture description (V2V-PoseNet)
  • HANDS2017 frame-based 3D hand pose estimation Challenge Results
  • Comparison with the previous state-of-the-art methods
  • Training code
  • Datasets we used (ICVL, NYU, MSRA, ITOP)
  • Trained models and estimated results
  • 3D hand and human pose estimation examples

Model Architecture

V2V-PoseNet

HANDS2017 frame-based 3D hand pose estimation Challenge Results

Challenge_result

Comparison with the previous state-of-the-art methods

Paper_result_hand_graph

Paper_result_hand_table

Paper_result_human_table

About our code

Dependencies

Our code is tested under Ubuntu 14.04 and 16.04 with Titan X GPUs (12GB VRAM).

Code

Clone this repository into any place you want. You may follow the example below.

makeReposit = [/the/directory/as/you/wish]
mkdir -p $makeReposit/; cd $makeReposit/
git clone https://github.com/mks0601/V2V-PoseNet_RELEASE.git
  • The src folder contains Lua scripts for the data loader, trainer, tester, and other utilities.
  • The data folder contains the data converter, which converts image files to binary files.

To train our model, please run the following command in the src directory:

th rum_me.lua
  • There are some optional configurations you can adjust in config.lua.
  • You have to convert the .png images of the ICVL and NYU datasets to .bin files by running the code in the data folder.
  • The directory where you have to put the dataset files and the computed centers of each frame is defined in src/data/dataset_name/data.lua.
  • Visualization code is finally uploaded! You have to prepare 'result_pixel.txt' for each dataset. Each row of the result file has to contain the pixel coordinates x, y and the depth of all joints (i.e., x1 y1 z1 x2 y2 z2 ...). Then run the pixel2world script and run draw_DB.m (a back-projection sketch follows this list).
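
For reference, back-projecting a pixel coordinate plus depth to a 3D point with a pinhole camera model looks roughly like the sketch below. Here fx, fy, cx, cy are the depth camera intrinsics of the dataset in question, some datasets also flip the y axis, and the repository's own pixel2world script is the authoritative version.

def pixel2world_point(u, v, depth, fx, fy, cx, cy):
    # Back-project pixel (u, v) with measured depth to camera/world coordinates.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth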

Dataset

We trained and tested our model on four 3D hand pose estimation datasets and one 3D human pose estimation dataset.

Results

Here we provide the precomputed centers, estimated 3D coordinates and pre-trained models.

The precomputed centers are obtained by training the hand center estimation network from DeepPrior++. Each line represents the 3D world coordinate of one frame. For the ICVL, NYU, and MSRA datasets, a frame is considered invalid if its depth map does not exist or does not contain a hand. For the ITOP dataset, a frame is considered invalid if its 'valid' variable is false. All test images are considered valid.
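
A minimal sketch of reading such a center file, assuming each line holds either three floats or a literal 'invalid' marker for frames without a usable center (check the downloaded files, as the exact marker may differ):

def load_centers(path):
    centers = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0] == 'invalid':
                centers.append(None)  # frame has no usable hand center
            else:
                centers.append(tuple(float(v) for v in parts[:3]))
    return centers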

The 3D coordinates estimated on the ICVL, NYU, and MSRA datasets are pixel coordinates, while the 3D coordinates estimated on the ITOP dataset are world coordinates. The estimated results are from the ensembled model. You can produce single-model results by downloading the pre-trained model and testing it.

We used awesome-hand-pose-estimation to evaluate the accuracy of V2V-PoseNet on the ICVL, NYU, and MSRA datasets.

Below are qualitative results: result_1 result_2 result_3 result_4 result_5 result_6


v2v-posenet-pytorch's Issues

Pose Estimation

Hello and thanks for sharing your work,

Does your code work with pose estimation examples?

Thanks

can't work well in NYU

Your code works fine on the MSRA dataset. I adapted it to the NYU dataset, but the precision does not reach that of the Torch7 version: with your parameter settings I get about 13mm, and with the author's parameter settings about 10mm. What do you think is the reason?

Some questions with NYU datatset

Thanks for your sharing.

I have changed your msra_hand.py into nyu_hand.py to fit the NYU dataset and kept all other parts of your code unchanged. However, the result on the NYU dataset is bad: the overall mean error is 180mm after training (epochs=14, batch_size=64, nGpus=4, optimizer=Adam, cubicSize=250, with original_size and crop_size kept the same as yours).
Meanwhile, the result on NYU is about 13mm or less after epoch ensembling with nGpus=1 and batch_size=16.

Here is my question: why do the results become worse as the batch size increases?
Is there any way to solve this problem?

Thanks again!

Can I use the ITOP dataset too?

Thank you for sharing your hard work. It has been so helpful to me.

In your implementation, I could only find code for the MSRA dataset.

Is the ITOP dataset, which is included in the V2V-PoseNet author's Torch7 implementation, also supported?

If not, are there any critical differences I should keep in mind while implementing ITOP support by myself?

Thank you so much and have a good day.

Question about the number of dataset samples

The number of samples in the msra_center data provided with your code does not equal the number in the MSRA dataset: msra_center has 76391, while the MSRA dataset has 76375.

pose estimation examples

Hi, I have downloaded your pretrained models, but I still have some problems feeding my own depth images into the network and getting a good hand pose estimate. Will you provide some example code?

License

Hi, thanks for the implementation of V2V-PoseNet in PyTorch. It's really helpful for understanding the authors' work. I'm thinking of using your code in a project and wonder if it's open source. If it is, would it be possible to add a license?

Thanks again.

Testing on Videos or real-time dataset

I am working on a project in which I have to estimate the 3D coordinates of the hand from a real-time stream or a video. I would like to know whether this code could give me the coordinates from a video, and if so, at approximately what FPS it could run.

How to test with a custom dataset?

Hi,

I've trained your PyTorch version of the model on the MSRA dataset and it works well. As a next step, I'd like to run the trained model on my own dataset. May I ask how to prepare the center data as described in the paper?

Thanks!

Data loader for HANDS2017

Hi, thanks a lot for your awesome codebase. Could you please clarify the data loader for the HANDS2017 dataset?
I could not find it here.

How to use it in a real industrial environment

Hello,
Thanks for sharing.

For a real industrial environment, beyond the NYU, ICVL, and MSRA datasets, could you give a demo of using a pretrained model with depth images from an RGBD camera?

If similar Python demos exist, that would be great and very helpful.

Thanks for reply.

Best Regards!
