
general-3d-humans's Introduction

Generalizable Human Pose Triangulation

Introduction

Ever tried to run a pretrained multi-view 3D pose estimation model on your own data? We address the problem that these models perform significantly worse on novel camera arrangements, if they can be run on them at all. This is the source code for the CVPR 2022 paper Generalizable Human Pose Triangulation.

✅ Latest release (v0.1):

  • add an inference script (assuming previously extracted 2D keypoints and known camera parameters);
  • run inference from main.py;
  • add instructions and command-line options (see 3D pose estimation model (inference)).

🚧 Next release (v0.2):

  • add a script to extract 2D keypoints (using an off-the-shelf 2D detector such as OpenPose);
  • add a script to estimate camera extrinsics for your cameras;
  • short tutorial on how to estimate camera extrinsics and 3D poses for any multi-view data!

Note

It is already possible to estimate camera extrinsics if you have previously extracted 2D keypoints (see Relative camera pose estimation (inference)).

Citation

If you use our model in your research, please reference our paper:

@inproceedings{Bartol:CVPR:2022,
   title = {Generalizable Human Pose Triangulation},
   author = {Bartol, Kristijan and Bojani\'{c}, David and Petkovi\'{c}, Tomislav and Pribani\'{c}, Tomislav},
   booktitle = {Proceedings of IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
   month = jun,
   year = {2022}
}

Updates / Work-In-Progress

We plan to fully prepare the source code, with the pretrained models, demos, and videos, by mid-May. The to-do list consists of:

  • [19-04-2022] Instructions for training pose estimation model
  • [19-04-2022] Fundamental matrix estimation algorithm
  • [22-04-2022] Refactor the source code
  • Complete the documentation
  • [26-04-2022] Pretrained pose estimation learning model
  • [26-04-2022] Demo to obtain camera parameters from multi-frame keypoints (src/fundamental.py)
  • Demo to obtain 3D poses from arbitrary image sequence (previously calibrated)
  • Demo to obtain 3D poses from arbitrary image sequence (uncalibrated)
  • Short tutorial on how to obtain camera parameters and 3D poses on any multi-view data
  • [28-04-2022] Instructions for running inference
  • [21-07-2022] Training and evaluation functions
  • Project page

Usage

First, download the pretrained backbone and place it in ./models/pretrained/.

To install and prepare the environment, use docker:

docker build -t <image-name> .

docker run --rm --gpus all --name <container-name> -it \
	-v ${REPO_DIR}:/generalizable-triangulation \
	-v ${BASE_DATA_DIR}/:/data/  <image-name>

Data preparation

Prior to running any training, evaluation, or inference, 2D pose detections need to be extracted. Our backbone 2D pose detector is the baseline model, i.e., the version available in karfly/learnable-triangulation-pytorch, but that repository does not provide a standalone inference method, so it is not straightforward to use. Instead, off-the-shelf pose detectors such as OpenPose or MMPose can be used, although with no guarantees.

We have already prepared some training/evaluation data :) (password: data-3d-humans, directory: pretrained). Extract the folder into data/<dataset>. Note that the Human3.6M dataset already contains bounding boxes obtained as described here.
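If you extract your own detections with OpenPose, the per-frame JSON output can be collected into a single array. Below is a minimal sketch, assuming OpenPose was run with --write_json and a hypothetical data/custom/openpose/<view>/ directory layout (the exact on-disk format this repository expects is not documented here):

import json
from pathlib import Path

import numpy as np

NUM_JOINTS = 25  # OpenPose BODY_25 keypoint layout

def load_view(view_dir: Path) -> np.ndarray:
    frames = []
    for json_path in sorted(view_dir.glob("*_keypoints.json")):
        with open(json_path) as f:
            people = json.load(f)["people"]
        if people:
            # Flat [x0, y0, c0, x1, y1, c1, ...] -> (25, 3), confidence kept.
            kpts = np.asarray(people[0]["pose_keypoints_2d"]).reshape(NUM_JOINTS, 3)
        else:
            kpts = np.zeros((NUM_JOINTS, 3))  # no person detected in this frame
        frames.append(kpts)
    return np.stack(frames)  # (num_frames, 25, 3)

views = [load_view(p) for p in sorted(Path("data/custom/openpose").iterdir())]
keypoints = np.stack(views, axis=1)  # (num_frames, num_views, 25, 3)
np.save("data/custom/keypoints_2d.npy", keypoints)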

Pose estimation model training

To train on the base configuration (using Human3.6M for training), run:

python main.py

If you use the VS Code IDE, a more convenient way to specify the arguments is through .vscode/launch.json. All the options are available in src/options.py.

Pose estimation model evaluation

Download the pretrained models from SharePoint (password: data-3d-humans, directory: data), then run:

python main.py --run_mode eval

3D pose estimation model (inference)

To run inference on novel views, first run a 2D keypoint detector on all views and frames to generate 2D keypoint estimates.

Once the poses are obtained, you can run:

python main.py --run_mode infer
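For intuition, the classical baseline underlying this step is direct linear transform (DLT) triangulation from the 2D detections and known projection matrices. Here is a minimal numpy sketch of plain DLT; it is not the repository's inference code, which additionally learns to score and weigh the views:

import numpy as np

def triangulate_point(proj_mats: np.ndarray, points_2d: np.ndarray) -> np.ndarray:
    """proj_mats: (V, 3, 4) projection matrices; points_2d: (V, 2), one joint in V views."""
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)       # (2V, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]               # null vector = least-squares solution
    return X[:3] / X[3]      # dehomogenize -> (3,)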

Relative camera pose estimation (inference)

To estimate relative camera poses on Human3.6M, using the keypoint estimates on the test data, run:

python src/fundamental.py

The estimated rotations and translations are stored in est_Rs.npy and est_ts.npy.
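For a rough picture of what such an estimation involves, here is a sketch using OpenCV's standard routines rather than the algorithm in src/fundamental.py. pts1/pts2 are matched 2D keypoints of shape (N, 2) from two views and K1/K2 are the known intrinsics, all assumed to be available:

import cv2
import numpy as np

def relative_pose(pts1: np.ndarray, pts2: np.ndarray,
                  K1: np.ndarray, K2: np.ndarray):
    # Robustly fit the fundamental matrix from keypoint correspondences.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    E = K2.T @ F @ K1  # essential matrix from F and the intrinsics
    # recoverPose takes a single camera matrix; with two different intrinsics
    # you would normalize the points first. K1 is used here for simplicity.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K1)
    return R, t        # t is only defined up to scale (8-point ambiguity)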

Results

The results for base, intra, and inter configurations are:

Base (H36M)    Intra (CMU)    Inter (CMU->H36M)
29.1 mm        25.6 mm        31.0 mm

Data and pretrained models

The data used for the above commands is in the ./data/ folder. Note that, in this submission, we only include subject 1 (Human3.6M) for training, but it should be sufficient to reproduce the original results.

Acknowledgements

Parts of the source code were adapted from cvlab-dresden/DSAC and karfly/learnable-triangulation-pytorch and directly inspired by some of the following publications:

[1] DSAC - Differentiable RANSAC for Camera Localization

[2] Learnable Triangulation of Human Pose

[3] Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses

[4] Categorical Reparameterization with Gumbel-Softmax

[5] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables


general-3d-humans's Issues

How to run the inference on a custom dataset?

Thanks for your work. I am trying to run inference on my custom dataset, but I cannot figure out what I should do to prepare the 2D keypoints and calibration parameters. Can you help, please?

Enable demo (inference on multi-view images)

Motivation

The most natural expectation is to have a demo script that runs on your own data, which is the main "selling point" of the paper. To make it more convenient for users, it would be nice to have a script that directly takes images and produces 3D poses, without having to go to another repository first to produce 2D keypoint estimates.

Description

In order to support the demo, several steps need to be prepared (a skeleton is sketched after the list):

  • 2D keypoint detections (using a "custom" detector, such as OpenPose)
  • camera parameters estimation (in case the parameters are not already obtained)
  • create a custom dataset on-the-fly
  • 3D keypoint estimations similar to the current test() function
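A hypothetical skeleton of such a demo script, where every helper below is a placeholder name rather than an existing function in this repository:

def detect_keypoints(image_dirs):
    # step 1: run an off-the-shelf 2D detector (e.g., OpenPose) on every view
    raise NotImplementedError

def estimate_extrinsics(keypoints_2d, intrinsics):
    # step 2: e.g., relative poses via the 8-point algorithm (cf. src/fundamental.py)
    raise NotImplementedError

def build_custom_dataset(keypoints_2d, intrinsics, extrinsics):
    # step 3: assemble a custom dataset on the fly in the format the model expects
    raise NotImplementedError

def estimate_3d_poses(dataset):
    # step 4: estimate 3D keypoints, analogously to the current test() function
    raise NotImplementedError

def run_demo(image_dirs, intrinsics):
    keypoints_2d = detect_keypoints(image_dirs)
    extrinsics = estimate_extrinsics(keypoints_2d, intrinsics)
    dataset = build_custom_dataset(keypoints_2d, intrinsics, extrinsics)
    return estimate_3d_poses(dataset)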

Questions regarding inference on the custom data

Hi, thank you for your great work! I would like to run your model to triangulate the data that we've collected.

Since we are using detection models with the COCO configuration, I am not sure how I can run your model with COCO keypoints.

Do we have to retrain the model on COCO configuration?

In infer_pose.py, it seems like I need h36m_custom.pt, while I can only download h36m_known.pt and h36m_est.pt from your shared link. Or can I use h36m_known.pt if I already have the camera calibration parameters?

Thank you for your answer in advance!

Scale ambiguity

Hi @kristijanbartol,
Thanks a lot for releasing the code; your paper is very interesting.
I have a 'naive' question about the scale ambiguity in your pipeline. When we use the 8-point algorithm, there is the issue of scale ambiguity in the translation vector. Several times in your codebase you use a 'scale' variable, such as here. But as far as I know, you do not actually estimate this scale from an initial assumption, such as assuming that a person is 1.80 m tall.
Could you explain how you would estimate this scale factor in your pipeline? It is not really clear to me at the moment.
Thanks a lot for your feedback and response,
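One common way to resolve the ambiguity the question describes is to rescale the up-to-scale reconstruction so that a known real-world length is matched. A minimal sketch of that idea, using the 1.80 m height assumption from the question itself (this is an illustration, not the paper's method; head_idx and foot_idx are whatever joint indices your skeleton uses):

import numpy as np

def rescale_to_height(pose_3d: np.ndarray, head_idx: int, foot_idx: int,
                      assumed_height_m: float = 1.80) -> float:
    """pose_3d: (num_joints, 3) up-to-scale pose. Returns the metric scale factor."""
    estimated_height = np.linalg.norm(pose_3d[head_idx] - pose_3d[foot_idx])
    return assumed_height_m / estimated_height

# The same factor also fixes the translation: t_metric = scale * t_unit.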
