
3dcrowdnet_release's Introduction

Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes / 3DCrowdNet

[front figure]

News

💪 3DCrowdNet achieves state-of-the-art accuracy on 3DPW (3D Poses in the Wild)!
💪 We improved PA-MPJPE to 51.1mm and MPVPE to 97.6mm using a ResNet-50 backbone!

Introduction

This repo is the official PyTorch implementation of Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes (CVPR 2022).

Installation

We recommend using an Anaconda virtual environment. Install PyTorch >= 1.6.0 and Python >= 3.7.3, then run sh requirements.sh. You need to slightly modify the torchgeometry kernel code following here; the commonly circulated patch is sketched below.
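For reference, a sketch of the commonly circulated patch (our understanding, not the repo's exact diff; confirm against the linked instructions): in torchgeometry/core/conversions.py, rotation_matrix_to_quaternion uses 1 - mask on boolean tensors, which recent PyTorch versions reject, so the subtractions are replaced with logical negation:

    # torchgeometry/core/conversions.py, rotation_matrix_to_quaternion
    # (sketch of the commonly circulated fix, not the authors' exact diff)
    mask_c0 = mask_d2 * mask_d0_d1
    mask_c1 = mask_d2 * ~mask_d0_d1      # was: mask_d2 * (1 - mask_d0_d1)
    mask_c2 = ~mask_d2 * mask_d0_nd1     # was: (1 - mask_d2) * mask_d0_nd1
    mask_c3 = ~mask_d2 * ~mask_d0_nd1    # was: (1 - mask_d2) * (1 - mask_d0_nd1)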

Quick demo

Preparing

  • Download the pre-trained 3DCrowdNet checkpoint from here and place it under ${ROOT}/demo/.
  • Download demo inputs from here and place them under ${ROOT}/demo/input (just unzip the demo_input.zip).
  • Make ${ROOT}/demo/output directory.
  • Get SMPL layers and VPoser according to this.
  • Download J_regressor_extra.npy from here and place under ${ROOT}/data/.

Running

  • Run python demo.py --gpu 0. You can change the input image with --img_idx {img number}.
  • A mesh .obj file, a rendered mesh image, and the input 2D pose are saved under ${ROOT}/demo/.
  • The demo images and 2D poses are from CrowdPose and HigherHRNet, respectively.
  • The depth order is not estimated; you can change it manually.

Results

☀️ Refer to the paper's main manuscript and supplementary material for diverse qualitative results!

[results tables]

Directory

Refer to here.

Reproduction

First, finish the directory setup. Then refer to here to train and evaluate 3DCrowdNet.

Reference

@InProceedings{choi2022learning,
  author    = {Choi, Hongsuk and Moon, Gyeongsik and Park, JoonKyu and Lee, Kyoung Mu},
  title     = {Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

Related Projects

I2L-MeshNet_RELEASE
3DCrowdNet_RELEASE
TCMR_RELEASE
Hand4Whole_RELEASE
HandOccNet
NeuralAnnot_RELEASE

3dcrowdnet_release's People

Contributors

hongsukchoi


3dcrowdnet_release's Issues

Have question about the conversion between cam_param and cam_trans

def get_camera_trans(self, cam_param, meta_info, is_render):

Hello @hongsukchoi
First, thanks for the nice work you've publicly released!
Currently I'm doing research on multi-person reconstruction, and while reading your code line by line I encountered the function get_camera_trans mentioned above.

As far as I understand, RotationNet outputs cam_param, and this function converts it into cam_trans (the camera translation vector), which then goes into the get_coord function.

My QUESTION starts here!
I don't understand the exact meaning or assumption of gamma and k_value.
Is there any theory referring to it?

You seem to reply to issues, so thanks in advance!

Joseph

H36M, MuCo SMPL GT

Hi, thanks for sharing the nice work.

In your code, you use SMPLify-X pseudo-GT on the 3D datasets (H36M, MuCo) and NeuralAnnot pseudo-GT on the 2D datasets (CrowdPose, MSCOCO).
Can I ask why you don't use NeuralAnnot on H36M and MuCo?

About get_camera_trans in model.py

    def get_camera_trans(self, cam_param, meta_info, is_render):
        # camera translation
        t_xy = cam_param[:, :2]
        gamma = torch.sigmoid(cam_param[:, 2])  # apply sigmoid to make it positive
        k_value = torch.FloatTensor([math.sqrt(cfg.focal[0] * cfg.focal[1] * cfg.camera_3d_size * cfg.camera_3d_size / (cfg.input_img_shape[0] * cfg.input_img_shape[1]))]).cuda().view(-1)
        if is_render:
            bbox = meta_info['bbox']
            k_value = k_value * math.sqrt(cfg.input_img_shape[0] * cfg.input_img_shape[1]) / (bbox[:, 2] * bbox[:, 3]).sqrt()
        t_z = k_value * gamma
        cam_trans = torch.cat((t_xy, t_z[:, None]), 1)
        return cam_trans

Hi, how should I understand this function and its parameters: cam_param, gamma, k_value, t_z, and cam_trans?
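A hedged reading of these quantities (our interpretation of the code above, not an authoritative answer from the authors): under a pinhole camera, an object of real size S meters at depth Z projects to roughly focal * S / Z pixels. k_value is then the depth at which a box of cfg.camera_3d_size meters would exactly fill the cfg.input_img_shape crop, and gamma = sigmoid(cam_param[:, 2]) scales this reference depth, so the network effectively regresses depth relative to how much of the crop the person fills:

    import math

    # Config values are assumptions for illustration; check the repo's config.py.
    focal = (5000.0, 5000.0)      # cfg.focal, virtual focal length in pixels
    camera_3d_size = 2.5          # cfg.camera_3d_size in meters (assumed)
    input_img_shape = (256, 256)  # cfg.input_img_shape (H, W) (assumed)

    k_value = math.sqrt(focal[0] * focal[1] * camera_3d_size ** 2
                        / (input_img_shape[0] * input_img_shape[1]))
    # At depth k_value, a camera_3d_size-meter box projects to exactly the crop:
    # focal * camera_3d_size / k_value == sqrt(W * H) pixels.
    gamma = 0.5                   # example sigmoid output from RotationNet
    t_z = k_value * gamma         # predicted root depth in meters
    print(k_value, t_z)           # ~48.8, ~24.4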

json

Thanks for sharing!
If I want to train on my own dataset, how can I get the JSON file of 2D keypoints?
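A hedged sketch of one way to produce such a file (the demo's 2D poses come from HigherHRNet, whose outputs follow the COCO keypoint results format; confirm the exact schema against the files in demo/input):

    import json

    # One detection: 17 COCO joints flattened as [x1, y1, score1, x2, y2, score2, ...]
    detections = [{
        'image_id': 0,
        'category_id': 1,
        'keypoints': [123.0, 45.0, 0.9] * 17,  # dummy values for illustration
        'score': 0.95,
    }]
    with open('my_dataset_2d_pose.json', 'w') as f:  # hypothetical filename
        json.dump(detections, f)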

minor issue: readme correction

In https://github.com/hongsukchoi/3DCrowdNet_RELEASE/blob/main/assets/directory.md#pytorch-smpl-layer-and-vposer
The description is

Download basicModel_f_lbs_10_207_0_v1.0.0.pkl, basicModel_m_lbs_10_207_0_v1.0.0.pkl, and basicModel_neutral_lbs_10_207_0_v1.0.0.pkl from here (female & male) and here (neutral) to ${ROOT}/smplpytorch/smplpytorch/native/models.

It seems that it should be ${ROOT}/common/utils/smplpytorch/smplpytorch/native/models rather than ${ROOT}/smplpytorch/smplpytorch/native/models.

How to predict the whole person in the image?

Hi, thanks for sharing your code. I notice that this model takes a cropped and resized image as input and is trained to predict SMPL parameters and camera parameters for one person at a time. As a result, if there is more than one person in the image, we detect the people and crop the image using the human detection results. I'm wondering how to input the original image without cropping. However, I have a few questions about dealing with the dataset. Could you help me with them?

  1. For the camera parameters, do I need to predict them per person or per image? (An image may contain many people, and I don't plan to crop it.)
  2. Which keypoints in targets need to be changed?

some question about process 2d datasets

I found that you crop the closest person for the 3D datasets when there are multiple people in one image, but for the 2D datasets you crop all the people in one image. Why do you process these datasets differently?
In addition, the cropped image may contain another person; won't this process bring ambiguity to the network?
Thank you very much!

About MSCOCO train annotation file

I am trying to run the train script, but there is an issue with the annotation file.
The current code tries to read 'MSCOCO_train_SMPL_NeuralAnnot.json' instead of 'coco_smplifyx_train.json':

with open(osp.join(self.annot_path, 'MSCOCO_train_SMPL_NeuralAnnot.json')) as f:

Is it correct to comment out the 66th line and uncomment the 67th line? Or should we use 'MSCOCO_train_SMPL_NeuralAnnot.json'?

Evaluate CMU-Panoptic Code

Thank you for your work!

Could you please provide the code and the JSON file (2D pose results) for evaluation on the CMU-Panoptic dataset?

=================================================================================
Hello! Thank you for uploading the paper's code to GitHub!

Looking through the code, it seems that the Python file used to produce the CMU-Panoptic results and the 2D pose estimation JSON file are missing, so I'm opening this issue to request them!

Thank you!

How to preprocess the 3dpw dataset?

Sorry to bother you again. :(
I have a question about the SMPL parameters in the 3DPW dataset you provide.
I noticed the pose parameters (72,) and trans parameters (3,) differ from the official 3DPW dataset, so I guess there is some preprocessing in making the annotation. Could you tell me how you preprocess it?
Thanks in advance!

Question about reproducing Table 8

Hi,
Thank you for your fantastic work!
However, I tried to reproduce Table 8 using the command python train.py --amp --continue --gpu 0 --cfg ../assets/yaml/3dpw.yml and got the results below:

epochs trained    MPJPE (mm)    PA-MPJPE (mm)    MPVPE (mm)
0 (pretrained)    712.35        98.89            734.46
1                 83.17         52.34            99.32
2                 84.16         52.13            99.90
4                 85.65         51.91            101.10
5                 85.21         52.53            101.27
10                93.76         56.54            110.48

Is it normal to reach the best point after only 1 epoch? (BTW, I used the pre-trained ResNet-50 weights of xiao2018simple.)
Also, is it normal for the results to get worse as the epochs increase?

Thank you!

About 2D pose

Hello, according to the paper, the model takes a 2D pose as input along with the image, but in the code I don't see where the 2D pose input comes from. Could you please help me understand the following code?

Question about Table 1 and experiments

@hongsukchoi

Hello! 3DCrowdNet is nice work!

I have a few questions about Table 1 and the network.

  • Is the Table 1 result tested on 3DPW-Crowd?
  • Is the Table 1 result trained on a mixed dataset or a single dataset?
  • I don't quite understand this operation: img_feat is a 2D image-space feature, so sampling from it with a 3D joint would seem to require a perspective projection to get a 2D image-space point. Why do you just use the x, y of the 3D joint? (See the sketch after this list.)
  • When testing your idea of crowded-scene robustness, do you test on 3DPW-Crowd instead of the whole (larger) 3DPW test set, and only move to the whole 3DPW test set once the 3DPW-Crowd results are good? Is the research procedure I describe correct?
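For what it's worth, a hedged sketch of the sampling operation the third question refers to (our reading, not the authors' exact code): if joints are predicted in pixel-aligned crop coordinates, their (x, y) already live on the feature-map grid, so they can index img_feat directly via bilinear sampling, without an explicit perspective projection:

    import torch
    import torch.nn.functional as F

    img_feat = torch.randn(1, 64, 64, 64)  # (B, C, H, W) backbone feature map
    joints_xy = torch.rand(1, 15, 2)       # (B, J, 2) joint x, y in [0, 1] crop coords
    grid = joints_xy * 2 - 1               # grid_sample expects coords in [-1, 1]
    sampled = F.grid_sample(img_feat, grid[:, :, None, :], align_corners=False)
    joint_feat = sampled.squeeze(-1).permute(0, 2, 1)  # (B, J, C) per-joint features
    print(joint_feat.shape)                # torch.Size([1, 15, 64])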

Question about calculation of smpl_trans in Human36M

Hi! First, thanks for your excellent work! I was studying your data processing procedure and was confused by the following code:

smpl_trans = smpl_trans - root_joint_coord + np.dot(R, root_joint_coord.transpose(1,0)).transpose(1,0)

Could you kindly explain the physical meaning of 'smpl_trans - root_joint_coord + np.dot(R, root_joint_coord.transpose(1,0)).transpose(1,0)'? As far as I understand, 'smpl_trans - root_joint_coord' represents the translation in the coordinate system that takes the root joint as the origin. My question is how to interpret np.dot(R, root_joint_coord.transpose(1,0)).transpose(1,0)?
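A hedged derivation of our reading (an interpretation, not an authoritative answer): SMPL's global orientation rotates the mesh about the rest-pose root joint j0 rather than the world origin, so when a rotation R (e.g. a camera extrinsic) is folded into the global orientation, the translation must absorb the leftover term R @ j0 - j0. This matches the quoted line if smpl_trans already contains the rotated translation and root_joint_coord holds j0. A quick numerical check of the identity:

    import numpy as np

    rng = np.random.default_rng(0)
    R = np.linalg.qr(rng.standard_normal((3, 3)))[0]    # rotation folded into global orient
    R_g = np.linalg.qr(rng.standard_normal((3, 3)))[0]  # original global orientation
    j0 = rng.standard_normal(3)                          # rest-pose root joint
    t = rng.standard_normal(3)                           # original translation
    v = rng.standard_normal(3)                           # any posed mesh point

    lhs = R @ (R_g @ (v - j0) + j0 + t)                  # rotate the whole posed mesh
    t_new = (R @ t) - j0 + R @ j0                        # the quoted correction
    rhs = (R @ R_g) @ (v - j0) + j0 + t_new              # re-pose with new orient/trans
    assert np.allclose(lhs, rhs)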

Unstable test results

Hi, have you ever had a situation where the result of a re-test differs from the result of the previous test? What might be the reason for this?
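For context, a hedged general note rather than a confirmed diagnosis: if evaluation involves any randomness (data order, augmentation, cudnn autotuning), results can drift between runs. Pinning seeds and cudnn flags is the usual way to make a test repeatable:

    import random
    import numpy as np
    import torch

    def seed_everything(seed=0):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True  # trade speed for repeatability
        torch.backends.cudnn.benchmark = False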

About training dataset

Hi everyone,

To run the training command, is it necessary to download all the 3D training sets (Human3.6M and MuCo), or can I just use a subset?

Thanks

Cannot reproduce without pre-trained ResNet-50 weights of xiao2018simple

Hi,
I tried to reproduce table 8 without pre-trained ResNet-50 weights of xiao2018simple.
My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml
and the config file is:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.00025 #0.001/4
lr_backbone: 0.0001
lr_dec_factor: 10

However, I got very strange results on 3DPW, as shown below (I evaluate every epoch):
[results screenshot]

Do you have any idea about this?
Thank you!

experimental results

Thank you for sharing.
I tried to reproduce your experimental results, but both of my runs were far from the numbers given in your paper.

First result:
MPJPE from mesh: 493.55 mm
PA MPJPE from mesh: 112.53 mm
MPVPE from mesh: 529.30 mm

Second result:
MPJPE from mesh: 644.32 mm
PA MPJPE from mesh: 103.72 mm
MPVPE from mesh: 661.93 mm

Can you share the hyperparameter settings or give some instructions?
My environment: PyTorch 1.6, CUDA 10.2.

Runtime error in pytorch trying to reproduce 3DCrowdNet demo

Hello!

Trying to reproduce the provided demo, I get the following error:

    File "demo.py", line 114, in <module>
      model.load_state_dict(ckpt['network'], strict=False)
    File "<path_to_condaenv>/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict
      self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for DataParallel:
      size mismatch for module.human_model_layer.th_betas: copying a param with shape torch.Size([1, 10]) from checkpoint, the shape in current model is torch.Size([1, 300]).
      size mismatch for module.human_model_layer.th_shapedirs: copying a param with shape torch.Size([6890, 3, 10]) from checkpoint, the shape in current model is torch.Size([6890, 3, 300]).

It seems PyTorch is failing to load the pretrained 3DCrowdNet model referred to in the README. Does anyone have a hint on how to fix this and why it might happen?
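A hedged debugging sketch (not from the repo) for locating such mismatches: compare the checkpoint's parameter shapes against a freshly constructed model before calling load_state_dict. The path and the model construction are assumptions based on demo.py:

    import torch

    ckpt = torch.load('demo/snapshot.pth.tar', map_location='cpu')  # hypothetical path
    model_state = model.state_dict()  # assumes `model` was built as in demo.py
    for name, param in ckpt['network'].items():
        if name in model_state and model_state[name].shape != param.shape:
            print(name, tuple(param.shape), '->', tuple(model_state[name].shape))

Here the mismatch sits in the SMPL layer's shape components (10 in the checkpoint vs 300 in the freshly built layer), which suggests the locally built SMPL layer (e.g. the downloaded model files or the smplpytorch version) differs from the one used at training time.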

My setup:
Debian 11 (bullseye)
Anaconda version: 22.9.0
Python version in Anaconda environment: 3.7.3
PyTorch version: 1.13.0+cu117

Thank you very much!

About MuPoTs evaluation

Your paper reports evaluation results on MuPoTS-3D, but there are several procedures I could not fully understand.

1) Your data loading script (data/MuPoTs/MuPoTs.py) reads the preprocessed 2D pose estimator results based on annotation id. Does this mean that you used ground-truth bounding boxes for retrieving the 2D predictions?

2) At

    self.mpii3d_smpl_regressor = np.load(osp.join('..', 'data', 'MPI_INF_3DHP', 'J_regressor_mi_smpl.npy'))[:17]

your code specifies joint regressor weights from the MPI-INF-3DHP dataset. Is this file made by yourself? As far as I know, previous works (including https://github.com/JiangWenPL/multiperson) used J_regressor_extra.npy for evaluation on MPI-INF-3DHP and MuPoTS.
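For context, a hedged sketch of how such a regressor file is used (standard SMPL practice, not repo-specific code): joints are a sparse linear map of the 6890 SMPL mesh vertices, so swapping regressor files changes which joint definitions the metrics are computed on:

    import numpy as np

    vertices = np.zeros((6890, 3))                  # SMPL mesh vertices (dummy)
    J_regressor = np.load('J_regressor_extra.npy')  # (K, 6890) per-joint vertex weights
    joints = J_regressor @ vertices                 # (K, 3) regressed 3D joints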

H36M evaluation

Hi,
Have you ever evaluated your method on the Human3.6M test set?
I am curious whether your method still works well in simple environments and is comparable to other methods there.

How to learn trans?

The range of the ground-truth SMPL trans parameters varies widely across datasets. How do you let the network learn the correct trans, combined with the focal and princpt, to produce the correct projection? I noticed you set focal = 5000 and princpt = 256/2; how should I understand these values? Thank you!!
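For reference, a hedged sketch of the fixed virtual camera these values describe (our reading, not the authors' statement): the network never sees the true camera; every crop is treated as if taken by a camera with focal length 5000 px and principal point at the crop center, so the learned trans only has to be consistent with this one virtual camera rather than with each dataset's real intrinsics:

    # Perspective projection under the assumed virtual camera.
    focal = (5000.0, 5000.0)      # cfg.focal (pixels)
    princpt = (256 / 2, 256 / 2)  # principal point at the crop center

    def project(X, Y, Z):
        # (X, Y, Z) is a point in the virtual camera's space, Z in meters.
        x = focal[0] * X / Z + princpt[0]
        y = focal[1] * Y / Z + princpt[1]
        return x, y

    print(project(0.0, 0.0, 24.4))  # a root on the optical axis lands at the crop center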
