
MVGFormer: Multiple View Geometry Transformers for 3D Human Pose Estimation

CVPR'24 [Paper] [Poster]

This is the official implementation of our work presented at CVPR 2024, titled Multiple View Geometry Transformers for 3D Human Pose Estimation.

Framework

We explicitly introduce multi-view geometric modules into an end-to-end Transformer architecture for 3D human pose estimation, resulting in remarkable generalization performance across various camera settings.

[Figure: framework overview]

Iterative Query Updates with Transformers

We employ a coarse-to-fine query refinement process: we first sample coarse 3D queries at random in the scene space, then project these coarse 3D poses onto the 2D images and aggregate features using Transformer attention to update the projected points, and finally use triangulation to recover more accurate 3D poses. This process is repeated iteratively, in an end-to-end differentiable manner, by the Transformer decoder layers; a minimal sketch follows the figure below.

[Figure: iterative query refinement results]
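The following minimal sketch illustrates this loop under stated assumptions: each camera is given as a 3x4 projection matrix, triangulation is plain linear DLT, and the attention-based 2D update is stubbed out (in the actual model, deformable Transformer attention over image features predicts the 2D updates). The function names (project_points, update_points_2d, triangulate, refine_queries) are illustrative, not the repository's API.

import torch

def project_points(points_3d, P):
    # Homogeneous coordinates, then pinhole projection with a 3x4 matrix P.
    homo = torch.cat([points_3d, torch.ones_like(points_3d[..., :1])], dim=-1)
    proj = homo @ P.T                       # (..., 3)
    return proj[..., :2] / proj[..., 2:3]   # perspective divide

def update_points_2d(points_2d, image_feats):
    # Placeholder for the attention step: in MVGFormer, deformable attention
    # samples image features around each projection and predicts a 2D update.
    # Returning the input unchanged keeps this sketch self-contained.
    return points_2d

def triangulate(points_2d_per_view, cameras):
    # Batched linear (DLT) triangulation: two equations per view, solved via
    # the null space of the stacked system (smallest right singular vector).
    rows = []
    for pts, P in zip(points_2d_per_view, cameras):
        x, y = pts[..., 0:1], pts[..., 1:2]
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = torch.stack(rows, dim=-2)           # (queries, joints, 2*views, 4)
    _, _, Vh = torch.linalg.svd(A)
    X = Vh[..., -1, :]                      # homogeneous 3D point
    return X[..., :3] / X[..., 3:4]

def refine_queries(poses_3d, image_feats, cameras, num_layers=6):
    # One decoder layer = project to 2D, update in 2D, triangulate back to 3D.
    for _ in range(num_layers):
        points_2d = [project_points(poses_3d, P) for P in cameras]
        points_2d = [update_points_2d(p, f) for p, f in zip(points_2d, image_feats)]
        poses_3d = triangulate(points_2d, cameras)
    return poses_3d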

Reference

@inproceedings{liao2024multiple,
  title={Multiple View Geometry Transformers for 3D Human Pose Estimation},
  author={Liao, Ziwei and Zhu, Jialiang and Wang, Chunyu and Hu, Han and Waslander, Steven L},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={708--717},
  year={2024}
}

1. Installation

Clone the repository:

git clone git@github.com:XunshanMan/MVGFormer.git

Dependencies

Create a conda environment:

conda create -n mvgformer python=3.10
conda activate mvgformer

Install mmcv, taking care to match your CUDA version:

pip install -U openmim
mim install mmcv-full

Install the dependencies:

pip install -r requirements.txt

Compile the attention modules. Set CUDA_HOME to your CUDA installation path, i.e. CUDA_HOME=/usr/local/cuda-*/, where * is your CUDA version:

cd ./lib/models/ops
CUDA_HOME=/usr/local/cuda-11.0/ python setup.py build install
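
Before compiling, you can verify that PyTorch was built against a matching CUDA toolkit (these are standard PyTorch attributes, nothing project-specific):

import torch
print(torch.version.cuda)         # should match the toolkit version in CUDA_HOME
print(torch.cuda.is_available())  # True once the GPU setup is working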

Dataset Preparation

Consider the project root directory as ${POSE_ROOT}.

CMU Panoptic dataset. Please download the CMU Panoptic dataset following VoxelPose:

  • Download the dataset by following the instructions in panoptic-toolbox and extract it under ${POSE_ROOT}/data/panoptic/.

  • You only need to download the sequences you use. You can also download a subset of camera views by specifying the number of views (HD_Video_Number) and changing the camera order in ./scripts/getData.sh. The sequences and camera views used in our project can be obtained from here and from Table A.1 in the supplementary material of our paper.

  • Note that we only use the HD videos, the calibration data, and the 3D Body Keypoint annotations in the code. You can comment out the irrelevant parts of ./scripts/getData.sh, such as downloading the 3D Face data.

Feature Backbone. Please download the PoseResNet-50 pre-trained model and place it at ${POSE_ROOT}/models/pose_resnet50_panoptic.pth.tar (ResNet-50 pretrained on the COCO dataset and finetuned jointly on the Panoptic and MPII datasets).

The structure should look like this:

${POSE_ROOT}
|-- models
|   |-- pose_resnet50_panoptic.pth.tar
|-- data
    |-- panoptic
        |-- 160224_haggling1
        |   |-- hdImgs
        |   |-- hdvideos
        |   |-- hdPose3d_stage1_coco19
        |   |-- calibration_160224_haggling1.json
        |-- 160226_haggling1  
        |-- ...

Shelf/Campus Dataset. Please follow VoxelPose to download the Shelf/Campus Dataset.

2. Training

2.1 CMU Panoptic dataset

We train on five camera views from the CMU0 camera arrangement, using 8 GPUs with batch_size=1 per GPU. Evaluation results are printed after every epoch; the best result can be found in the log.

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/panoptic/knn5-lr4-q1024-g8.yaml

Model Checkpoints

The trained model checkpoints can be downloaded as below. Please put the weights under models/.

Dataset    AP25   MPJPE (mm)   Checkpoint
Panoptic   92.3   16.0         here
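
If needed, you can confirm the download loads correctly with standard PyTorch (the file name is taken from the evaluation commands in Section 3; on recent PyTorch you may need weights_only=False if the checkpoint contains non-tensor objects):

import torch

ckpt = torch.load("models/mvgformer_q1024_model.pth.tar", map_location="cpu")
print(type(ckpt))  # a state_dict, or a dict wrapping one, depending on how it was saved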

2.2 Ablation Experiments

Following Section 2.1, change the parameters in the config file, or pass extra parameters on the command line, such as DECODER.num_decoder_layers=4, DECODER.num_instance=1024, or TRAIN.END_EPOCH=100, to run the ablation experiments.
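
For example, reusing the 8-GPU training command from Section 2.1 (this assumes train_3d.py accepts the same KEY=VALUE overrides as the evaluation commands in Section 3):

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/panoptic/knn5-lr4-q1024-g8.yaml DECODER.num_decoder_layers=4 TRAIN.END_EPOCH=100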

3. Evaluation

3.1 Generalization (Out-of-domain) Evaluation

We train our model on the CMU Panoptic dataset with the CMU0 camera arrangement, then evaluate it without finetuning in several out-of-domain settings: changing the number of cameras, changing the camera arrangement, and changing the dataset.

For the CMU Panoptic dataset, ready-made configuration files are available under configs/panoptic/generalization. EXP_TYPE can be CMU0ex3, CMU0ex4, CMU0ex6, CMU0ex7, CMU1, CMU2, CMU3, or CMU4.

python run/validate_3d.py --cfg configs/panoptic/generalization/{EXP_TYPE}.yaml --model_path models/mvgformer_q1024_model.pth.tar

You can also configure experiments with extra parameters:

  • Change camera numbers. The camera number can be 3, 4, 5, 6, or 7:
python run/validate_3d.py --cfg configs/panoptic/knn5-lr4-q1024.yaml --model_path models/mvgformer_q1024_model.pth.tar DATASET.TEST_CAM_SEQ='CMU0ex' DATASET.CAMERA_NUM=7
  • Change camera arrangements. The camera arrangement can be CMU0, CMU1, CMU2, CMU3, or CMU4. Please see here for the details of each arrangement:
python run/validate_3d.py --cfg configs/panoptic/knn5-lr4-q1024.yaml --model_path models/mvgformer_q1024_model.pth.tar DATASET.TEST_CAM_SEQ=CMU1
  • Change dataset without finetuning. The dataset can be Shelf or Campus:
python run/validate_3d.py --cfg XXXX --model_path models/mvgformer_q1024_model.pth.tar --dataset Shelf/Campus

3.2 In-domain Evaluation

This experiment tests the in-domain performance of our model: it is trained with the CMU0 camera arrangement on the CMU Panoptic dataset and evaluated with the same camera arrangement.

Run inference on a single GPU with batch size 1:

python run/validate_3d.py --cfg configs/panoptic/knn5-lr4-q1024.yaml --model_path models/mvgformer_q1024_model.pth.tar TEST.BATCH_SIZE=1 

LICENSE

This repository is licensed under Apache-2.0. For commercial use, please reach out to the authors.

Acknowledgement

Our work is based on the codebases of MvP and VoxelPose. We deeply appreciate the authors for their invaluable contributions.


