
3dcrowdnet_release's Introduction

Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes / 3DCrowdNet

[front figure]

News

💪 3DCrowdNet achieves state-of-the-art accuracy on 3DPW (3D Poses in the Wild)!
💪 We improved PA-MPJPE to 51.1mm and MPVPE to 97.6mm using a ResNet-50 backbone!

Introduction

This repo is the official PyTorch implementation of Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes (CVPR 2022).

Installation

We recommend using an Anaconda virtual environment. Install PyTorch >= 1.6.0 and Python >= 3.7.3, then run sh requirements.sh. You need to slightly modify the torchgeometry kernel code following here; the commonly circulated patch is sketched below.
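For reference, a sketch of the commonly circulated patch (our understanding, not the repo's exact diff; confirm against the linked instructions): in torchgeometry/core/conversions.py, rotation_matrix_to_quaternion uses 1 - mask on boolean tensors, which recent PyTorch versions reject, so the subtractions are replaced with logical negation:

    # torchgeometry/core/conversions.py, rotation_matrix_to_quaternion
    # (sketch of the commonly circulated fix, not the authors' exact diff)
    mask_c0 = mask_d2 * mask_d0_d1
    mask_c1 = mask_d2 * ~mask_d0_d1      # was: mask_d2 * (1 - mask_d0_d1)
    mask_c2 = ~mask_d2 * mask_d0_nd1     # was: (1 - mask_d2) * mask_d0_nd1
    mask_c3 = ~mask_d2 * ~mask_d0_nd1    # was: (1 - mask_d2) * (1 - mask_d0_nd1)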

Quick demo

Preparing

  • Download the pre-trained 3DCrowdNet checkpoint from here and place it under ${ROOT}/demo/.
  • Download demo inputs from here and place them under ${ROOT}/demo/input (just unzip the demo_input.zip).
  • Make ${ROOT}/demo/output directory.
  • Get SMPL layers and VPoser according to this.
  • Download J_regressor_extra.npy from here and place under ${ROOT}/data/.

Running

  • Run python demo.py --gpu 0. You can change the input image with --img_idx {img number}.
  • A mesh .obj file, a rendered mesh image, and the input 2D pose are saved under ${ROOT}/demo/.
  • The demo images and 2D poses are from CrowdPose and HigherHRNet, respectively.
  • The depth order is not estimated; you can change it manually.

Results

☀️ Refer to the paper's main manuscript and supplementary material for diverse qualitative results!

[results tables]

Directory

Refer to here.

Reproduction

First, finish the directory setup. Then refer to here to train and evaluate 3DCrowdNet.

Reference

@InProceedings{choi2022learning,
  author    = {Choi, Hongsuk and Moon, Gyeongsik and Park, JoonKyu and Lee, Kyoung Mu},
  title     = {Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

Related Projects

I2L-MeshNet_RELEASE
3DCrowdNet_RELEASE
TCMR_RELEASE
Hand4Whole_RELEASE
HandOccNet
NeuralAnnot_RELEASE

3dcrowdnet_release's People

Contributors

hongsukchoi


3dcrowdnet_release's Issues

Have question about the conversion between cam_param and cam_trans

def get_camera_trans(self, cam_param, meta_info, is_render):

Hello @hongsukchoi
First, thanks for the nice work you've publicly released!
Currently I'm doing research on multi-person reconstruction, and while reading your code line by line I encountered the function get_camera_trans mentioned above.

As far as I understand, RotationNet outputs cam_param, and this function converts it into cam_trans (the camera translation vector), which then goes into the get_coord function.

My QUESTION starts here!
I don't understand the exact meaning or assumption of gamma and k_value.
Is there any theory referring to it?

You seem to reply to issues, so thanks in advance!

Joseph

H36M, MuCo SMPL GT

Hi, thanks for sharing the nice work.

In your code, you use SMPLify-X pseudo-GT on the 3D datasets (H36M, MuCo) and NeuralAnnot pseudo-GT on the 2D datasets (CrowdPose, MSCOCO).
Can I ask why you don't use NeuralAnnot on H36M and MuCo?

About get_camera_trans in model.py

    def get_camera_trans(self, cam_param, meta_info, is_render):
        # camera translation
        t_xy = cam_param[:, :2]
        gamma = torch.sigmoid(cam_param[:, 2])  # apply sigmoid to make it positive
        k_value = torch.FloatTensor([math.sqrt(cfg.focal[0] * cfg.focal[1] * cfg.camera_3d_size * cfg.camera_3d_size / (cfg.input_img_shape[0] * cfg.input_img_shape[1]))]).cuda().view(-1)
        if is_render:
            bbox = meta_info['bbox']
            k_value = k_value * math.sqrt(cfg.input_img_shape[0] * cfg.input_img_shape[1]) / (bbox[:, 2] * bbox[:, 3]).sqrt()
        t_z = k_value * gamma
        cam_trans = torch.cat((t_xy, t_z[:, None]), 1)
        return cam_trans

Hi, how should I understand this function and its parameters: cam_param, gamma, k_value, t_z, and cam_trans?
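A hedged reading of these quantities (our interpretation of the code above, not an authoritative answer from the authors): under a pinhole camera, an object of real size S meters at depth Z projects to roughly focal * S / Z pixels. k_value is then the depth at which a box of cfg.camera_3d_size meters would exactly fill the cfg.input_img_shape crop, and gamma = sigmoid(cam_param[:, 2]) scales this reference depth, so the network effectively regresses depth relative to how much of the crop the person fills:

    import math

    # Config values are assumptions for illustration; check the repo's config.py.
    focal = (5000.0, 5000.0)      # cfg.focal, virtual focal length in pixels
    camera_3d_size = 2.5          # cfg.camera_3d_size in meters (assumed)
    input_img_shape = (256, 256)  # cfg.input_img_shape (H, W) (assumed)

    k_value = math.sqrt(focal[0] * focal[1] * camera_3d_size ** 2
                        / (input_img_shape[0] * input_img_shape[1]))
    # At depth k_value, a camera_3d_size-meter box projects to exactly the crop:
    # focal * camera_3d_size / k_value == sqrt(W * H) pixels.
    gamma = 0.5                   # example sigmoid output from RotationNet
    t_z = k_value * gamma         # predicted root depth in meters
    print(k_value, t_z)           # ~48.8, ~24.4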

json

Thanks for sharing!
If I want to train on my own dataset, how can I get the JSON file of 2D keypoints?
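A hedged sketch of one way to produce such a file (the demo's 2D poses come from HigherHRNet, whose outputs follow the COCO keypoint results format; confirm the exact schema against the files in demo/input):

    import json

    # One detection: 17 COCO joints flattened as [x1, y1, score1, x2, y2, score2, ...]
    detections = [{
        'image_id': 0,
        'category_id': 1,
        'keypoints': [123.0, 45.0, 0.9] * 17,  # dummy values for illustration
        'score': 0.95,
    }]
    with open('my_dataset_2d_pose.json', 'w') as f:  # hypothetical filename
        json.dump(detections, f)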

minor issue: readme correction

In https://github.com/hongsukchoi/3DCrowdNet_RELEASE/blob/main/assets/directory.md#pytorch-smpl-layer-and-vposer
The description is

Download basicModel_f_lbs_10_207_0_v1.0.0.pkl, basicModel_m_lbs_10_207_0_v1.0.0.pkl, and basicModel_neutral_lbs_10_207_0_v1.0.0.pkl from here (female & male) and here (neutral) to ${ROOT}/smplpytorch/smplpytorch/native/models.

It seems that it should be ${ROOT}/common/utils/smplpytorch/smplpytorch/native/models rather than ${ROOT}/smplpytorch/smplpytorch/native/models.

How to predict the whole person in the image?

Hi, thanks for sharing your code. I notice that this model takes a cropped and resized image as input and is trained to predict SMPL parameters and camera parameters for one person at a time. As a result, if there is more than one person in the image, we detect the people and crop the image using the human detection results. I'm wondering how to input the original image without cropping. However, I have a few questions about dealing with the dataset. Could you help me with them?

  1. For the camera parameters, do I need to predict them per person or per image? (An image may contain many people, and I don't plan to crop it.)
  2. Which keypoints in targets need to be changed?

some question about process 2d datasets

I found that you crop the closest person for the 3D datasets when there are multiple people in one image, but for the 2D datasets you crop all the people in one image. Why do you process these datasets differently?
In addition, the cropped image may contain another person; won't this process bring ambiguity to the network?
Thank you very much!

About MSCOCO train annotation file

I am trying to run the train script, but there is an issue with the annotation file.
The current code tries to read 'MSCOCO_train_SMPL_NeuralAnnot.json' instead of 'coco_smplifyx_train.json':

with open(osp.join(self.annot_path, 'MSCOCO_train_SMPL_NeuralAnnot.json')) as f:

Is it correct to comment out the 66th line and uncomment the 67th line? Or should we use 'MSCOCO_train_SMPL_NeuralAnnot.json'?

Evaluate CMU-Panoptic Code

Thank you for your work!

Could you please provide the code and the JSON file (2D pose results) for evaluation on the CMU-Panoptic dataset?

=================================================================================
Hello! Thank you for uploading the paper's code to GitHub!

Looking through the code, it seems that the Python file used to produce the CMU-Panoptic results and the 2D pose estimation JSON file are missing, so I'm opening this issue to request them!

Thank you!

How to preprocess the 3dpw dataset?

Sorry to bother you again. :(
I have a question about the SMPL parameters in the 3DPW dataset you provide.
I noticed the pose parameters (72,) and trans parameters (3,) differ from the official 3DPW dataset, so I guess there is some preprocessing in making the annotation. Could you tell me how you preprocess it?
Thanks in advance!

Question about reproducing Table 8

Hi,
Thank you for your fantastic work!
However, I tried to reproduce Table 8 using the command python train.py --amp --continue --gpu 0 --cfg ../assets/yaml/3dpw.yml and got the results below:

epochs trained    MPJPE (mm)    PA-MPJPE (mm)    MPVPE (mm)
0 (pretrained)    712.35        98.89            734.46
1                 83.17         52.34            99.32
2                 84.16         52.13            99.90
4                 85.65         51.91            101.10
5                 85.21         52.53            101.27
10                93.76         56.54            110.48

Is it normal to reach the best point after only 1 epoch? (BTW, I used the pre-trained ResNet-50 weights of xiao2018simple.)
Also, is it normal for the results to get worse as the epochs increase?

Thank you!

About 2D pose

Hello, according to the paper, the model takes a 2D pose as input along with the image, but in the code I don't see where the 2D pose input comes from. Could you please help me understand the following code?

Question about Table 1 and experiments

@hongsukchoi

Hello! 3DCrowdNet is nice work!

I have a few questions about Table 1 and the network.

  • Is the Table 1 result tested on 3DPW-Crowd?
  • Is the Table 1 result trained on a mixed dataset or a single dataset?
  • I don't quite understand this operation: img_feat is a 2D image-space feature, so sampling from it with a 3D joint would seem to require a perspective projection to get a 2D image-space point. Why do you just use the x, y of the 3D joint? (See the sketch after this list.)
  • When testing your idea of crowded-scene robustness, do you test on 3DPW-Crowd instead of the whole (larger) 3DPW test set, and only move to the whole 3DPW test set once the 3DPW-Crowd results are good? Is the research procedure I describe correct?
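For what it's worth, a hedged sketch of the sampling operation the third question refers to (our reading, not the authors' exact code): if joints are predicted in pixel-aligned crop coordinates, their (x, y) already live on the feature-map grid, so they can index img_feat directly via bilinear sampling, without an explicit perspective projection:

    import torch
    import torch.nn.functional as F

    img_feat = torch.randn(1, 64, 64, 64)  # (B, C, H, W) backbone feature map
    joints_xy = torch.rand(1, 15, 2)       # (B, J, 2) joint x, y in [0, 1] crop coords
    grid = joints_xy * 2 - 1               # grid_sample expects coords in [-1, 1]
    sampled = F.grid_sample(img_feat, grid[:, :, None, :], align_corners=False)
    joint_feat = sampled.squeeze(-1).permute(0, 2, 1)  # (B, J, C) per-joint features
    print(joint_feat.shape)                # torch.Size([1, 15, 64])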

Question about calculation of smpl_trans in Human36M

Hi! First, thanks for your excellent work! I was studying your data processing procedure and was confused by the following code:

smpl_trans = smpl_trans - root_joint_coord + np.dot(R, root_joint_coord.transpose(1,0)).transpose(1,0)

Could you kindly explain the physical meaning of 'smpl_trans - root_joint_coord + np.dot(R, root_joint_coord.transpose(1,0)).transpose(1,0)'? As far as I understand, 'smpl_trans - root_joint_coord' represents the translation in the coordinate system that takes the root joint as the origin. My question is how to interpret np.dot(R, root_joint_coord.transpose(1,0)).transpose(1,0)?
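A hedged derivation of our reading (an interpretation, not an authoritative answer): SMPL's global orientation rotates the mesh about the rest-pose root joint j0 rather than the world origin, so when a rotation R (e.g. a camera extrinsic) is folded into the global orientation, the translation must absorb the leftover term R @ j0 - j0. This matches the quoted line if smpl_trans already contains the rotated translation and root_joint_coord holds j0. A quick numerical check of the identity:

    import numpy as np

    rng = np.random.default_rng(0)
    R = np.linalg.qr(rng.standard_normal((3, 3)))[0]    # rotation folded into global orient
    R_g = np.linalg.qr(rng.standard_normal((3, 3)))[0]  # original global orientation
    j0 = rng.standard_normal(3)                          # rest-pose root joint
    t = rng.standard_normal(3)                           # original translation
    v = rng.standard_normal(3)                           # any posed mesh point

    lhs = R @ (R_g @ (v - j0) + j0 + t)                  # rotate the whole posed mesh
    t_new = (R @ t) - j0 + R @ j0                        # the quoted correction
    rhs = (R @ R_g) @ (v - j0) + j0 + t_new              # re-pose with new orient/trans
    assert np.allclose(lhs, rhs)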

Unstable test results

Hi, have you ever had a situation where the result of a re-test differs from the result of the previous test? What might be the reason for this?
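For context, a hedged general note rather than a confirmed diagnosis: if evaluation involves any randomness (data order, augmentation, cudnn autotuning), results can drift between runs. Pinning seeds and cudnn flags is the usual way to make a test repeatable:

    import random
    import numpy as np
    import torch

    def seed_everything(seed=0):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True  # trade speed for repeatability
        torch.backends.cudnn.benchmark = False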

About training dataset

Hi everyone,

To run the training command, is it necessary to download all the 3D training sets (Human3.6M and MuCo), or can I just use a subset?

Thanks

Cannot reproduce without pre-trained ResNet-50 weights of xiao2018simple

Hi,
I tried to reproduce table 8 without pre-trained ResNet-50 weights of xiao2018simple.
My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml
and the config file is:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.00025 #0.001/4
lr_backbone: 0.0001
lr_dec_factor: 10

However, I got very strange results on 3DPW, as shown below (I evaluate every epoch):
[results screenshot]

Do you have any idea about this?
Thank you!

experimental results

Thank you for sharing.
I tried to reproduce your experimental results, but both of my runs were far from the numbers given in your paper.

First result:
MPJPE from mesh: 493.55 mm
PA MPJPE from mesh: 112.53 mm
MPVPE from mesh: 529.30 mm

Second result:
MPJPE from mesh: 644.32 mm
PA MPJPE from mesh: 103.72 mm
MPVPE from mesh: 661.93 mm

Can you share the hyperparameter settings or give some instructions?
My environment: PyTorch 1.6, CUDA 10.2.

Runtime error in pytorch trying to reproduce 3DCrowdNet demo

Hello!

Trying to reproduce the provided demo, I get the following error:

    File "demo.py", line 114, in <module>
      model.load_state_dict(ckpt['network'], strict=False)
    File "<path_to_condaenv>/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict
      self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for DataParallel:
      size mismatch for module.human_model_layer.th_betas: copying a param with shape torch.Size([1, 10]) from checkpoint, the shape in current model is torch.Size([1, 300]).
      size mismatch for module.human_model_layer.th_shapedirs: copying a param with shape torch.Size([6890, 3, 10]) from checkpoint, the shape in current model is torch.Size([6890, 3, 300]).

It seems PyTorch is failing to load the pretrained 3DCrowdNet model referred to in the README. Does anyone have a hint on how to fix this and why it might happen?
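A hedged debugging sketch (not from the repo) for locating such mismatches: compare the checkpoint's parameter shapes against a freshly constructed model before calling load_state_dict. The path and the model construction are assumptions based on demo.py:

    import torch

    ckpt = torch.load('demo/snapshot.pth.tar', map_location='cpu')  # hypothetical path
    model_state = model.state_dict()  # assumes `model` was built as in demo.py
    for name, param in ckpt['network'].items():
        if name in model_state and model_state[name].shape != param.shape:
            print(name, tuple(param.shape), '->', tuple(model_state[name].shape))

Here the mismatch sits in the SMPL layer's shape components (10 in the checkpoint vs 300 in the freshly built layer), which suggests the locally built SMPL layer (e.g. the downloaded model files or the smplpytorch version) differs from the one used at training time.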

My setup:
Debian 11 (bullseye)
Anaconda version: 22.9.0
Python version in Anaconda environment: 3.7.3
PyTorch version: 1.13.0+cu117

Thank you very much!

About MuPoTs evaluation

Your paper reports evaluation results on MuPoTS-3D, but there are several procedures I could not fully understand.

1) Your data loading script (data/MuPoTs/MuPoTs.py) reads the preprocessed 2D pose estimator results based on annotation id. Does this mean that you used ground-truth bounding boxes for retrieving the 2D predictions?

2) At

    self.mpii3d_smpl_regressor = np.load(osp.join('..', 'data', 'MPI_INF_3DHP', 'J_regressor_mi_smpl.npy'))[:17]

your code specifies joint regressor weights from the MPI-INF-3DHP dataset. Is this file made by yourself? As far as I know, previous works (including https://github.com/JiangWenPL/multiperson) used J_regressor_extra.npy for evaluation on MPI-INF-3DHP and MuPoTS.
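For context, a hedged sketch of how such a regressor file is used (standard SMPL practice, not repo-specific code): joints are a sparse linear map of the 6890 SMPL mesh vertices, so swapping regressor files changes which joint definitions the metrics are computed on:

    import numpy as np

    vertices = np.zeros((6890, 3))                  # SMPL mesh vertices (dummy)
    J_regressor = np.load('J_regressor_extra.npy')  # (K, 6890) per-joint vertex weights
    joints = J_regressor @ vertices                 # (K, 3) regressed 3D joints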

H36M evaluation

Hi,
Have you ever evaluated your method on the Human3.6M test set?
I am curious whether your method still works well in simple environments and is comparable to other methods there.

How to learn trans?

The range of the ground-truth SMPL trans parameters varies widely across datasets. How do you let the network learn the correct trans, combined with the focal and princpt, to produce the correct projection? I noticed you set focal = 5000 and princpt = 256/2; how should I understand these values? Thank you!!
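For reference, a hedged sketch of the fixed virtual camera these values describe (our reading, not the authors' statement): the network never sees the true camera; every crop is treated as if taken by a camera with focal length 5000 px and principal point at the crop center, so the learned trans only has to be consistent with this one virtual camera rather than with each dataset's real intrinsics:

    # Perspective projection under the assumed virtual camera.
    focal = (5000.0, 5000.0)      # cfg.focal (pixels)
    princpt = (256 / 2, 256 / 2)  # principal point at the crop center

    def project(X, Y, Z):
        # (X, Y, Z) is a point in the virtual camera's space, Z in meters.
        x = focal[0] * X / Z + princpt[0]
        y = focal[1] * Y / Z + princpt[1]
        return x, y

    print(project(0.0, 0.0, 24.4))  # a root on the optical axis lands at the crop center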
