
general-3d-humans's Introduction

Generalizable Human Pose Triangulation

Introduction

Ever tried to run a pretrained multi-view 3D pose estimation model on your own data? We address the problem that these models perform significantly worse on novel camera arrangements, if they can be run on them at all. This is the source code for the CVPR 2022 paper Generalizable Human Pose Triangulation.

✅ Latest release (v0.1):

  • add an inference script (assuming previously extracted 2D keypoints and known camera parameters);
  • run inference from main.py;
  • add instructions and command-line options (see 3D pose estimation model (inference)).

🚧 Next release (v0.2):

  • add a script to extract 2D keypoints (using an off-the-shelf 2D detector such as OpenPose);
  • add a script to estimate camera extrinsics for your cameras;
  • short tutorial on how to estimate camera extrinsics and 3D poses for any multi-view data!

Note

It is already possible to estimate camera extrinsics if you have previously extracted 2D keypoints (see Relative camera pose estimation (inference)).

Citation

If you use our model in your research, please reference our paper:

@inproceedings{Bartol:CVPR:2022,
   title = {Generalizable Human Pose Triangulation},
   author = {Bartol, Kristijan and Bojani\'{c}, David and Petkovi\'{c}, Tomislav and Pribani\'{c}, Tomislav},
   booktitle = {Proceedings of IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
   month = jun,
   year = {2022}
}

Updates / Work-In-Progress

We plan to fully prepare the source code, with the pretrained models, demos, and videos, by mid-May. The to-do list consists of:

  • [19-04-2022] Instructions for training pose estimation model
  • [19-04-2022] Fundamental matrix estimation algorithm
  • [22-04-2022] Refactor the source code
  • Complete the documentation
  • [26-04-2022] Pretrained pose estimation learning model
  • [26-04-2022] Demo to obtain camera parameters from multi-frame keypoints (src/fundamental.py)
  • Demo to obtain 3D poses from arbitrary image sequence (previously calibrated)
  • Demo to obtain 3D poses from arbitrary image sequence (uncalibrated)
  • Short tutorial on how to obtain camera parameters and 3D poses on any multi-view data
  • [28-04-2022] Instructions for running inference
  • [21-07-2022] Training and evaluation functions
  • Project page

Usage

First, download the pretrained backbone and place it in ./models/pretrained/.

To install and prepare the environment, use docker:

docker build -t <image-name> .

docker run --rm --gpus all --name <container-name> -it \
	-v ${REPO_DIR}:/generalizable-triangulation \
	-v ${BASE_DATA_DIR}/:/data/  <image-name>

Data preparation

Prior to running any training, evaluation, or inference, 2D pose detections need to be extracted. Our backbone 2D pose detector is the baseline model, i.e., the version available in karfly/learnable-triangulation-pytorch, but that repository does not provide a standalone inference method, so it is not straightforward to use. Instead, off-the-shelf pose detectors such as OpenPose or MMPose can be used, although with no guarantees.

We have already prepared some training/evaluation data :) (password: data-3d-humans, directory: pretrained). Extract the folder into data/<dataset>. Note that the Human3.6M dataset already contains bounding boxes obtained as described here.
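If you extract your own detections with OpenPose, the per-frame JSON output can be collected into a single array. Below is a minimal sketch, assuming OpenPose was run with --write_json and a hypothetical data/custom/openpose/<view>/ directory layout (the exact on-disk format this repository expects is not documented here):

import json
from pathlib import Path

import numpy as np

NUM_JOINTS = 25  # OpenPose BODY_25 keypoint layout

def load_view(view_dir: Path) -> np.ndarray:
    frames = []
    for json_path in sorted(view_dir.glob("*_keypoints.json")):
        with open(json_path) as f:
            people = json.load(f)["people"]
        if people:
            # Flat [x0, y0, c0, x1, y1, c1, ...] -> (25, 3), confidence kept.
            kpts = np.asarray(people[0]["pose_keypoints_2d"]).reshape(NUM_JOINTS, 3)
        else:
            kpts = np.zeros((NUM_JOINTS, 3))  # no person detected in this frame
        frames.append(kpts)
    return np.stack(frames)  # (num_frames, 25, 3)

views = [load_view(p) for p in sorted(Path("data/custom/openpose").iterdir())]
keypoints = np.stack(views, axis=1)  # (num_frames, num_views, 25, 3)
np.save("data/custom/keypoints_2d.npy", keypoints)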

Pose estimation model training

To train on the base configuration (using Human3.6M for training), run:

python main.py

If you use the VS Code IDE, a more convenient way to specify the arguments is through .vscode/launch.json. All the options are available in src/options.py.

Pose estimation model evaluation

Download the pretrained models from SharePoint (password: data-3d-humans, directory: data), then run:

python main.py --run_mode eval

3D pose estimation model (inference)

To run inference on novel views, first run a 2D keypoint detector on all views and frames to generate 2D keypoint estimates.

Once the poses are obtained, you can run:

python main.py --run_mode infer
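For intuition, the classical baseline underlying this step is direct linear transform (DLT) triangulation from the 2D detections and known projection matrices. Here is a minimal numpy sketch of plain DLT; it is not the repository's inference code, which additionally learns to score and weigh the views:

import numpy as np

def triangulate_point(proj_mats: np.ndarray, points_2d: np.ndarray) -> np.ndarray:
    """proj_mats: (V, 3, 4) projection matrices; points_2d: (V, 2), one joint in V views."""
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)       # (2V, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]               # null vector = least-squares solution
    return X[:3] / X[3]      # dehomogenize -> (3,)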

Relative camera pose estimation (inference)

To estimate relative camera poses on Human3.6M, using the keypoint estimates on the test data, run:

python src/fundamental.py

The estimated rotations and translations are stored in est_Rs.npy and est_ts.npy.
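For a rough picture of what such an estimation involves, here is a sketch using OpenCV's standard routines rather than the algorithm in src/fundamental.py. pts1/pts2 are matched 2D keypoints of shape (N, 2) from two views and K1/K2 are the known intrinsics, all assumed to be available:

import cv2
import numpy as np

def relative_pose(pts1: np.ndarray, pts2: np.ndarray,
                  K1: np.ndarray, K2: np.ndarray):
    # Robustly fit the fundamental matrix from keypoint correspondences.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    E = K2.T @ F @ K1  # essential matrix from F and the intrinsics
    # recoverPose takes a single camera matrix; with two different intrinsics
    # you would normalize the points first. K1 is used here for simplicity.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K1)
    return R, t        # t is only defined up to scale (8-point ambiguity)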

Results

The results for base, intra, and inter configurations are:

Base (H36M)    Intra (CMU)    Inter (CMU->H36M)
29.1 mm        25.6 mm        31.0 mm

Data and pretrained models

The data used for the above commands is in the ./data/ folder. Note that, in this submission, we only include subject 1 (Human3.6M) for training, but it should be sufficient to reproduce the original results.

Acknowledgements

Parts of the source code were adapted from cvlab-dresden/DSAC and karfly/learnable-triangulation-pytorch and directly inspired by some of the following publications:

[1] DSAC - Differentiable RANSAC for Camera Localization

[2] Learnable Triangulation of Human Pose

[3] Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses

[4] Categorical Reparameterization with Gumbel-Softmax

[5] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables


general-3d-humans's Issues

How to run the inference on a custom dataset?

Thanks for your work. I am trying to run inference on my custom dataset, but I cannot figure out what I should do to prepare the 2D keypoints and calibration parameters. Can you help, please?

Enable demo (inference on multi-view images)

Motivation

The most natural expectation is to have a demo script that runs on your own data, which is the main "selling point" of the paper. To make it more convenient for users, it would be nice to have a script that directly takes images and produces 3D poses, without having to go to another repository first to produce 2D keypoint estimates.

Description

In order to support the demo, several steps need to be prepared (a skeleton is sketched after the list):

  • 2D keypoint detections (using a "custom" detector, such as OpenPose)
  • camera parameters estimation (in case the parameters are not already obtained)
  • create a custom dataset on-the-fly
  • 3D keypoint estimations similar to the current test() function
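A hypothetical skeleton of such a demo script, where every helper below is a placeholder name rather than an existing function in this repository:

def detect_keypoints(image_dirs):
    # step 1: run an off-the-shelf 2D detector (e.g., OpenPose) on every view
    raise NotImplementedError

def estimate_extrinsics(keypoints_2d, intrinsics):
    # step 2: e.g., relative poses via the 8-point algorithm (cf. src/fundamental.py)
    raise NotImplementedError

def build_custom_dataset(keypoints_2d, intrinsics, extrinsics):
    # step 3: assemble a custom dataset on the fly in the format the model expects
    raise NotImplementedError

def estimate_3d_poses(dataset):
    # step 4: estimate 3D keypoints, analogously to the current test() function
    raise NotImplementedError

def run_demo(image_dirs, intrinsics):
    keypoints_2d = detect_keypoints(image_dirs)
    extrinsics = estimate_extrinsics(keypoints_2d, intrinsics)
    dataset = build_custom_dataset(keypoints_2d, intrinsics, extrinsics)
    return estimate_3d_poses(dataset)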

Questions regarding inference on the custom data

Hi, thank you for your great work! I would like to run your model to triangulate the data that we've collected.

Since we are using detection models with the COCO configuration, I am not sure how I can run your model with COCO keypoints.

Do we have to retrain the model on COCO configuration?

In infer_pose.py, it seems like I need h36m_custom.pt, while I can only download h36m_known.pt and h36m_est.pt from your shared link. Or can I use h36m_known.pt if I already have the camera calibration parameters?

Thank you for your answer in advance!

Scale ambiguity

Hi @kristijanbartol,
Thanks a lot for releasing the code; your paper is very interesting.
I have a 'naive' question about the scale ambiguity in your pipeline. When we use the 8-point algorithm, there is the issue of scale ambiguity in the translation vector. Several times in your codebase you use a 'scale' variable, such as here. But as far as I know, you do not actually estimate this scale from an initial assumption, such as assuming that a person is 1.80 m tall.
Could you explain how you would estimate this scale factor in your pipeline? It is not really clear to me at the moment.
Thanks a lot for your feedback and response,
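One common way to resolve the ambiguity the question describes is to rescale the up-to-scale reconstruction so that a known real-world length is matched. A minimal sketch of that idea, using the 1.80 m height assumption from the question itself (this is an illustration, not the paper's method; head_idx and foot_idx are whatever joint indices your skeleton uses):

import numpy as np

def rescale_to_height(pose_3d: np.ndarray, head_idx: int, foot_idx: int,
                      assumed_height_m: float = 1.80) -> float:
    """pose_3d: (num_joints, 3) up-to-scale pose. Returns the metric scale factor."""
    estimated_height = np.linalg.norm(pose_3d[head_idx] - pose_3d[foot_idx])
    return assumed_height_m / estimated_height

# The same factor also fixes the translation: t_metric = scale * t_unit.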
