
river-zhang / gta

90 stars · 7 watchers · 4 forks · 2.37 MB

[NeurIPS 23] Official repository for NeurIPS 2023 paper "Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction"

Home Page: https://river-zhang.github.io/GTA-projectpage/

Python 98.14% Shell 0.53% GLSL 1.34%
3d clothed-humans clothed-people-digitalization digital human reconstruction vision neurips-2023 python pytorch

gta's People

Contributors

river-zhang


gta's Issues

The test results seem to be inconsistent with the paper

I tried your code on Ubuntu 20.04 with CUDA 11.8 and PyTorch 2.0.1, and got the following results:

GTA.mp4

but the corresponding results in your paper are obviously much better, as shown in the following:
GTA

I want to know if my result is reasonable.

ViT encoder input

I found that the front/back normal maps, together with the image, are also used as input to the encoder to generate the tri-plane features. I want to know why. Does this improve the result?
Reading the code, I found that after the tri-plane feature maps are obtained, they are concatenated with the normal features.
What if I only feed the image through ViTPose's pre-trained ViT encoder to get image features, pass those through the three decoders to get the tri-plane features, and then concatenate them with the normal features, as in the sketch below. Is that all right?
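To make the question concrete, here is a minimal sketch of the fusion step as I understand it; all shapes and tensor names are hypothetical, not the repo's actual ones:

import torch

# Hypothetical shapes: one decoded tri-plane feature map plus features
# extracted from the predicted front/back normal maps.
B, H, W = 1, 128, 128
plane_feat = torch.randn(B, 256, H, W)   # output of one tri-plane decoder
normal_feat = torch.randn(B, 64, H, W)   # encoded front/back normal maps

# Channel-wise concatenation after decoding; the question is whether the
# normal maps also need to be part of the ViT encoder input itself.
fused = torch.cat([plane_feat, normal_feat], dim=1)  # (B, 320, H, W)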

inference time

Hi, thanks for your great work.
I read your paper, but didn't see any mention of inference time for a single image.
Do you have a rough idea of what it would be on a modern GPU?

thanks!
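For what it's worth, a rough number can be measured directly; here is a generic PyTorch timing sketch, where the reconstruction call is a hypothetical stand-in for whatever infer.py runs per image:

import time
import torch

torch.cuda.synchronize()              # flush queued GPU work before timing
start = time.perf_counter()

# reconstruct_single_image(...)       # hypothetical: one pass of infer.py

torch.cuda.synchronize()              # wait for the GPU to finish
print(f"single-image inference: {time.perf_counter() - start:.2f}s")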

About the version of pymeshlab

During inference, I first installed the current pymeshlab release (2023.12) and encountered:
AttributeError: 'pymeshlab.pmeshlab.MeshSet' object has no attribute 'laplacian_smooth'

Then I downgraded pymeshlab to the 2022 release and finished the inference successfully.
The new pymeshlab version seems to be incompatible with the code.
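A minimal sketch of the workaround, assuming the failure comes from the filter renaming in newer pymeshlab releases; the pinned version and the renamed filter are my best guesses, not confirmed by the repo:

# pip install "pymeshlab==2022.2.post4"   # release the repo appears to expect
import pymeshlab

ms = pymeshlab.MeshSet()
ms.load_new_mesh("recon.obj")             # hypothetical mesh path
try:
    ms.laplacian_smooth()                 # filter name used by the repo's code
except AttributeError:
    # newer releases renamed the filter; this is the 2022.2+ equivalent
    ms.apply_coord_laplacian_smoothing()
ms.save_current_mesh("recon_smooth.obj")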

Strange surfaces in inference results

Thanks for your great work!
I encountered some problems during inference.
Would you please help me?
My inference results have strange surfaces, just as in #7.

I noticed that an ERROR occurred, although it didn't stop the inference:

Resume MLP weights from ./data/ckpt/GTA.ckpt
Resume normal model from ./data/ckpt/normal.ckpt
Using pixie as HPS Estimator

Dataset Size: 5
  0%|          | 0/5 [00:00<?, ?it/s]
2024-03-02 16:02:28.809516226 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:515 CreateExecutionProviderInstance] Failed to create TensorrtExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements to ensure all dependencies are met.
1eca7a73c3c61d9debde493de37c7d99:   0%|          | 0/5 [00:06<?, ?it/s]
Body Fitting --- normal: 0.089 | silhouette: 0.043 | Total: 0.132:  12%|█████▎    | 12/100 [00:01<00:13, 6.32it/s]
1eca7a73c3c61d9debde493de37c7d99:   0%|          | 0/5 [00:08<?, ?it/s]

Is it normal that this error occurred during inference?

I tried changing the versions of onnxruntime-gpu and TensorRT, but it didn't work.

My environment is:
CUDA 11.7
pytorch 1.13.1
onnxruntime-gpu 1.14
TensorRT 8.5.3.1
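For reference, that message is only a provider-creation warning: onnxruntime could not build the TensorRT execution provider and falls back to the next provider in the list, so inference still runs. If the goal is just to avoid the attempt, a minimal sketch is to request only the providers actually installed (the model path here is hypothetical):

import onnxruntime as ort

# Ask only for providers that exist in this environment, so onnxruntime
# never tries (and fails) to create the TensorRT provider.
sess = ort.InferenceSession(
    "model.onnx",                                       # hypothetical path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())                             # providers actually created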

There is no .npy file

Thanks for sharing this work.
When I tried to run infer.py, I found that the expected .npy files and directories are missing in a certain place.

Questions about numbers and evaluation

Hello, I have two questions regarding the test results in your two papers, GTA and SIFU.

  1. I have seen Issue #5 and your explanation in it, but I still don't understand why your GTA numbers for THuman 2.0 are different.
     In the GTA paper they are Chamfer 0.814, P2S 0.862, Normal 0.055; in SIFU they are 0.73, 0.72, 0.04.

  2. I noticed that in the evaluation code of both of your papers, you use the GT front and back normals. This differs from ICON's evaluation protocol, which uses estimated normals. (YuliangXiu/ICON#183)
     If the estimated normals are used, your GTA numbers for THuman 2.0 become 1.12, 1.12, 0.065.

Could you please clarify these two points? Thank you!
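For reference, a minimal sketch of how Chamfer and P2S are commonly computed from points sampled on the two surfaces; this is the generic metric definition, not necessarily the exact evaluation code of either paper:

import torch

def chamfer_and_p2s(pred_pts: torch.Tensor, gt_pts: torch.Tensor):
    # pred_pts: (N, 3) points sampled on the reconstruction,
    # gt_pts:   (M, 3) points sampled on the ground-truth scan
    d = torch.cdist(pred_pts, gt_pts)     # (N, M) pairwise distances
    p2s = d.min(dim=1).values.mean()      # reconstruction -> scan
    s2p = d.min(dim=0).values.mean()      # scan -> reconstruction
    chamfer = 0.5 * (p2s + s2p)           # symmetric average
    return chamfer.item(), p2s.item()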

Ask for training code

Excellent work! I would like to ask whether the training code is included in the repository.

About PSNR

Amazing work! Can you provide code for calculating PSNR, or tell me where to find the relevant code?
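In case it helps while waiting for an answer, here is a minimal PSNR sketch under the usual definition, 10·log10(MAX²/MSE); the repo's own evaluation may differ:

import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    # peak signal-to-noise ratio between two images with values in [0, max_val]
    mse = torch.mean((pred - target) ** 2)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()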

Expecting a demo

Hi, River-Zhang,
I'm studying papers on human body reconstruction and have read your paper. It is very nice work! May I ask when you will release the open-source code? Looking forward to your demo~

THuman 2.0 evaluation protocol

Hi authors, I have a question regarding the THuman 2.0 evaluation protocol in your Table 1.

  • How do you create the train/test split?
  • For the test set, how many views do you render per subject, and what is the FOV?

Thank you in advance!

About HGPIFu

When estimating the human body geometry, the query operation is performed in HGPIFuNet.
The first step is to project the sampled point set onto the image plane, but I found that the transforms parameter is None.
So in xyz = self.projection(points, calibs, transforms), the points are only rotated and translated.
Are all the points in the world coordinate system? The projection operation only converts points from the world coordinate system to the camera coordinate system by rotation and translation, and does not project them further onto the image plane. Please give me some help.
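For context, here is a sketch of the PIFu-style orthogonal projection that this kind of query code typically relies on, assuming GTA follows PIFu's convention: with an orthographic calibration matrix, the rotation plus translation already lands the points in normalized image coordinates, so no further perspective division is needed, and transforms is only an optional extra 2D crop/scale:

import torch

def orthogonal(points, calibrations, transforms=None):
    # points: (B, 3, N) query points; calibrations: (B, 4, 4) orthographic
    # calibration matrices; transforms: optional (B, 2, 3) 2D affine.
    rot = calibrations[:, :3, :3]          # (B, 3, 3)
    trans = calibrations[:, :3, 3:4]       # (B, 3, 1)
    # rot @ points + trans: with an orthographic calib, xy are already
    # normalized image coordinates and z is the sampling depth.
    pts = torch.baddbmm(trans, rot, points)            # (B, 3, N)
    if transforms is not None:
        scale = transforms[:, :2, :2]
        shift = transforms[:, :2, 2:3]
        pts[:, :2, :] = torch.baddbmm(shift, scale, pts[:, :2, :])
    return pts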

About the SMPL-X model

In which part of your code do you use a PIXIE-like model to estimate the SMPL-X parameters? I have read your code, and it seems that during training you use the SMPL-X parameters provided with the THuman2.0 dataset as the prior-enhanced query; only at inference time, since the input is not an image from the dataset, is the PIXIE model used to predict the SMPL-X parameters as the prior-enhanced query. Is my understanding correct?
