
nemf's Issues

What is the difference between rotmat and global_xform?

Hi, thank you for your code! I want to ask about the difference between rotmat (or rot6d) and global_xform.
From the code in amass.py, I see that global_xform is obtained by the following code:
[screenshots: code in amass.py computing global_xform from rot6d]
So what is the difference between rot6d and global_xform, and why is global_xform derived from rot6d?
Also, in training and application, global_xform is treated as rotmat, as in the following code:
[screenshot: code treating global_xform as rotmat during training]
So what exactly is global_xform?
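For illustration, here is a minimal sketch (not the NeMF code; the parents array, shapes, and function name are assumptions) of how a global joint orientation like global_xform typically differs from a per-joint local rotation like rotmat: the global orientation accumulates local rotations along the kinematic chain via forward kinematics.

import numpy as np

def local_to_global(local_rotmat, parents):
    # local_rotmat: (J, 3, 3) per-joint rotations relative to each joint's parent
    # returns (J, 3, 3) global orientations accumulated along the kinematic chain
    global_rotmat = np.zeros_like(local_rotmat)
    for j in range(local_rotmat.shape[0]):
        if parents[j] == -1:                                   # root joint
            global_rotmat[j] = local_rotmat[j]
        else:                                                  # parent's global times local
            global_rotmat[j] = global_rotmat[parents[j]] @ local_rotmat[j]
    return global_rotmat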

Clarification on sparse keyframe in-betweening

Hi,
Thanks for this great work and your codebase!
In the sparse keyframe setting of the in-betweening experiments (referring to Table 5 in the paper), how exactly are keyframes selected? Do you select the first frame and then select the next frame every t = {5, 10, 15, 20} frames? This way, when t = 5 and the motion length is arbitrary, the mask would look like this: [1,0,0,0,0,1,0,0,0,0,1,...]
I'm asking because I'm trying to compare my own in-betweening model with yours, and I want to make sure I'm exactly replicating your setup.
Thanks :)
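For concreteness, a tiny sketch of the mask described above (first frame kept, then every t-th frame); this is my reading of the setup, not the authors' confirmed protocol:

def keyframe_mask(length, t):
    # 1 marks a keyframe, 0 marks a frame to be in-betweened
    return [1 if i % t == 0 else 0 for i in range(length)]

print(keyframe_mask(11, 5))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]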

Bug with gt translation

Hi,

Thanks for this awesome work and for open-sourcing it.

A bug was found at:

poses=poses_gt, trans=c2c(trans[i]), betas=np.zeros(10), gender=args.data.gender, mocap_framerate=args.data.fps)

where c2c(trans[i]) should be c2c(trans_gt[i]).

This bug will lead to foot sliding since the predicted translation could be different from the gt translation.

questions about the up axis

Sorry, I have a question: why is the up axis of the SMPL skeleton z-up? I printed out the offsets of the SMPL skeleton, and it looks like it should be y-up. Is there something you considered that I missed?
[screenshot: printed SMPL skeleton offsets]
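For reference, a minimal sketch (an assumption, not taken from the NeMF code) of how y-up offsets could be converted to z-up by a +90 degree rotation about the x-axis, in case such a transform is applied during preprocessing:

import numpy as np

R_y_to_z = np.array([[1, 0, 0],
                     [0, 0, -1],
                     [0, 1, 0]], dtype=np.float64)  # +90 degrees about the x-axis

offsets_yup = np.array([[0.0, 1.0, 0.0]])           # a toy y-up offset
offsets_zup = offsets_yup @ R_y_to_z.T
print(offsets_zup)                                  # [[0. 0. 1.]] -- the up direction is now z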

Regarding preprocessing script for AMASS Dataset

Thank you for sharing your fantastic work!
I have a question about src/datasets/amass.py.
The AMASS dataset uses the SMPL+H skeleton topology, but your script seems to use the SMPL skeleton topology.
According to this repository, the 24th joint of SMPL is the right hand, but the 24th joint of SMPL+H is left_index2.
Are you reading the wrong rotation value for this joint?
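To illustrate the concern, a hedged sketch with toy shapes (not the amass.py code): the first 22 joints of SMPL and SMPL+H agree, but joints 22-23 differ (hand joints in SMPL, left index finger joints in SMPL+H), so one common workaround is to keep only the 22 shared body joints.

import numpy as np

smplh_poses = np.zeros((100, 52, 3))   # toy axis-angle poses: (frames, SMPL+H joints, 3)
body_poses = smplh_poses[:, :22]       # the body joints shared by SMPL and SMPL+H
print(body_poses.shape)                # (100, 22, 3)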

Potential issue of nn.GroupNorm(8, out_channels)

Hi, I am trying to train the NeMF model (generative) on the dog dataset. After setting up the dataset and network architecture, I ran into an issue:

RuntimeError: Expected number of channels in input to be divisible by num_groups, but got input of shape [16, 810, 64] and num_groups=8

It seems to be related to the code below with num_groups = 8 for group normalization:

seq.append(nn.GroupNorm(8, out_channels)) # FIXME: REMEMBER TO CHANGE BACK !!!

where my input data has the size [16, 810, 64] and 810 is not divisible by 8. I checked with the AMASS dataset: its input size is [16, 360, 128], so it is fine there...

I am wondering if there is a proper way of fixing this.

Thanks
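One possible workaround (an assumption, not a confirmed fix): choose a group count that actually divides the channel dimension, e.g. the largest divisor of the channel count that is at most 8.

import torch.nn as nn

def make_group_norm(num_channels, max_groups=8):
    # pick the largest group count <= max_groups that divides the channels
    groups = next(g for g in range(max_groups, 0, -1) if num_channels % g == 0)
    return nn.GroupNorm(groups, num_channels)

print(make_group_norm(810))  # 6 groups: the largest divisor of 810 that is <= 8
print(make_group_norm(128))  # 8 groups, as in the AMASS configuration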

About inference on video

Thanks for your great work!
Could you please give me some suggestions on how to run NeMF on an arbitrary video (for the motion reconstruction task)? It seems that application.py uses data from AMASS.

Question about training epochs

Hi, thanks for the amazing work! I was wondering how many epochs you trained for each sequence length. The paper says: "We train our single-motion NeMF for 500 iterations to fit a 32-frame sequence, and scale the number of iterations proportionally as the sequence length increases to make sure that our model is sufficiently trained for each length of sequences." From my understanding of the supplementary material, you trained for 500 iterations on sequences of length 32 and used a proportional number of iterations for each longer sequence length. What proportional number was used in this case?
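One reading of the quoted sentence (my assumption, not confirmed by the authors) is a linear scaling of iterations with sequence length:

def n_iterations(seq_len, base_len=32, base_iters=500):
    # scale the 500 iterations used for 32 frames proportionally to the sequence length
    return int(base_iters * seq_len / base_len)

print(n_iterations(32))   # 500
print(n_iterations(64))   # 1000
print(n_iterations(128))  # 2000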

save motion to bvh

Hi! When we save motion data back to BVH format, why do we need order[::-1] rather than using order itself?

rots = np.degrees(anim.rotations.euler(order=order[::-1]))

Another question: the euler() function cannot take order='zxy', as shown in the code:

def euler(self, order='xyz'):

It only supports 'xyz' and 'yzx' when converting quaternions to Euler angles. I'd like order 'zxy', which is the order used in the dog BVH files, to work here. Do you know how to convert from quaternions to Euler angles with order 'zxy'? The first link on L249 no longer works, and the second link on L280 doesn't have that information.

Thanks!
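A possible workaround for the 'zxy' conversion (this uses SciPy, not the NeMF codebase): scipy.spatial.transform.Rotation supports arbitrary Euler orders; depending on the BVH convention you may need the intrinsic 'ZXY' instead of the extrinsic 'zxy'.

import numpy as np
from scipy.spatial.transform import Rotation as R

quat_wxyz = np.array([[0.7071068, 0.0, 0.7071068, 0.0]])    # toy quaternion in (w, x, y, z)
quat_xyzw = np.roll(quat_wxyz, -1, axis=-1)                  # SciPy expects scalar-last (x, y, z, w)
euler = R.from_quat(quat_xyzw).as_euler('zxy', degrees=True)
print(euler)                                                 # Euler angles for the z, x, y order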

Issue in the foot skating function

Hi,
I came across your codebase while searching for an implementation of the foot skating metric, and I want to express my gratitude for sharing such a clean codebase. However, I noticed a problem that I believe needs attention.

If I understand correctly, the issue is mainly caused by this line of the code. It makes a binary tensor of shape BxTx4 (where B is the batch size and T is the sequence length) for the foot joints in the order [0, 2, 1, 3], i.e., L_Ankle, L_Foot, R_Ankle, R_Foot. However, a few lines later here, it is multiplied by the v and h tensors, which represent the speed and height of each joint in the order [0, 1, 2, 3], i.e., L_Ankle, R_Ankle, L_Foot, R_Foot. This misalignment in the tensor dimensions can make the resulting score invalid.

Please let me know if I am missing something.
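A minimal sketch of the alignment I would expect (assumed tensor names and shapes, not the actual code): reorder v and h so their joint order matches the contact tensor before the element-wise product.

import torch

contact = torch.rand(2, 10, 4)          # (B, T, 4) in the order [L_Ankle, L_Foot, R_Ankle, R_Foot]
v = torch.rand(2, 10, 4)                # (B, T, 4) in the order [L_Ankle, R_Ankle, L_Foot, R_Foot]
h = torch.rand(2, 10, 4)

reorder = [0, 2, 1, 3]                  # maps [LA, RA, LF, RF] -> [LA, LF, RA, RF]
skating = contact * v[..., reorder] * h[..., reorder]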

How to divide dataset

Hi Chengan, thank you very much for your amazing work and for open-sourcing it!

I would like to ask a question: after running amass.py, I got three folders (train/val/test), but they weren't divided into generative/single/gmp. How should I divide the dataset into generative/single/gmp?

Bug Appears While Trying to Create BVH Viz with Generative Model

Hello, I was trying to run the generative model with my custom BVH dataset, so I followed a similar approach when training the model and generating the test outputs. As you did in generative.py, I used the BVH.save function to save the generated anim created from the positions and rotations arrays. It does not give any errors if step = 1, but while generating the 60 FPS animation with step = 0.5, it gives an error. The error is caused by the positions array.

When creating the positions array, if the 'trans' key does not exist in the reconstructed data, a zero array with the shape of self.input_data['trans'] is created:

NeMF/src/nemf/generative.py

Lines 451 to 458 in 7991843

if 'trans' in self.recon_data.keys():
    origin = self.input_data['trans'][:, 0]
    positions = self.recon_data['trans']
    positions[..., self.v_axis] = positions[..., self.v_axis] + origin[..., self.v_axis].unsqueeze(1)
    positions_gt = self.input_data['trans']  # (B, T, 3)
else:
    positions = torch.zeros_like(self.input_data['trans'])  # (B, T, 3)
    positions_gt = torch.zeros_like(self.input_data['trans'])  # (B, T, 3)

The reconstructed 60 FPS animation contains 2T frames, while the input data contains only T frames, so the positions array also contains only T frames. Since the rotations and positions arrays do not have matching frame counts, the program throws an error.

I fixed this issue by simply passing (b_size, rotmat.shape[1], 3) as the shape when creating the positions array. I did not check whether I get any errors when the 'trans' key is available in the reconstructed data.
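A hedged sketch of that fix with toy shapes standing in for the real tensors (the actual variable names in generative.py may differ): when 'trans' is missing, allocate the zeros with the reconstructed frame count (2T at 60 FPS) instead of the input frame count (T).

import torch

b_size, T = 16, 64
rotmat = torch.rand(b_size, 2 * T, 24, 3, 3)          # reconstructed rotations at 60 FPS
input_trans = torch.rand(b_size, T, 3)                # input translations at the original rate

positions = torch.zeros(b_size, rotmat.shape[1], 3)   # (B, 2T, 3), matches the rotation frame count
positions_gt = torch.zeros_like(input_trans)          # (B, T, 3)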

Should the variable "rotmat_recon" in the decode function match 'global_xform'?

Hi, I have a little question about the generative model. Looking through the code, in the decode function the variable "rotmat_recon" is transformed from "local_motion", and it is then converted into the variable "local_rotmat" using self.fk.global_to_local() to calculate "pos_recon":
[screenshot: decode() code computing rotmat_recon and local_rotmat]
So can it be regarded as the reconstructed "global_xform"? "rotmat_recon" is further assigned as output['rotmat'], and in the backward() function it is used to compute rotmat_recon_loss by comparing it with self.input_data['rotmat']. But shouldn't the ground-truth variable self.input_data['rotmat'] be the ground-truth "local_rotmat" instead of the ground-truth "global_xform"? Because in the data processing, the variable "rotmat" is computed directly from the SMPL data:
[screenshot: data-processing code computing rotmat directly from the SMPL data]
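For context, a minimal sketch (not the NeMF implementation) of what a global_to_local() conversion does, i.e. recovering per-joint local rotations from accumulated global orientations; this is the inverse of the forward-kinematics chain discussed in the first issue above.

import numpy as np

def global_to_local(global_rotmat, parents):
    # global_rotmat: (J, 3, 3) global joint orientations; returns per-joint local rotations
    local_rotmat = np.zeros_like(global_rotmat)
    for j in range(global_rotmat.shape[0]):
        if parents[j] == -1:
            local_rotmat[j] = global_rotmat[j]
        else:
            local_rotmat[j] = global_rotmat[parents[j]].T @ global_rotmat[j]
    return local_rotmat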

Multiplication with args.temporal_scale leads to layer size mismatch in reconstruction

Hi,
I'm trying to do a sanity test of reconstructing one motion of 16 frames from the DFaust_67 dataset. I'll detail the flow of the code before the mismatch error happens.

The code goes to L132 of generative.py (inside encode_local). At this point, the size of "x" for me is (1, 360, 16): 16 is the length of the clip, and 360 is 24 joints * 15 parameters.
Then self.local_encoder(x) is called and the tensor goes through LocalEncoder's forward method.

It goes through 4 layers, with the output size of each layer being:
torch.Size([1, 420, 8])
torch.Size([1, 540, 4])
torch.Size([1, 840, 2])
torch.Size([1, 1680, 1])

After the view operation, the last layer outputs a 1x1680 tensor.

This, when passed to self.mu() (L82 of prior.py), gives the following size mismatch error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1680 and 13440x1024)

The Linear layer expects an input of size 13440, which is 1680 * args.temporal_scale.
However, the output of the last layer I get has size 1680.

I don't know how to account for args.temporal_scale here.

Can you please let me know what I'm doing wrong / how I can fix this?

Thank you so much!

Best,
S
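One hedged reading of the mismatch (an assumption, not a confirmed explanation): the encoder halves the temporal dimension at each of its 4 layers, so a clip of T frames flattens to 1680 * T / 16 features, and the Linear layer appears to be sized for clips of 16 * args.temporal_scale frames rather than 16.

def flattened_size(n_frames, channels_out=1680, n_downsamples=4):
    # features after the view operation, assuming the temporal dim is halved per layer
    return channels_out * n_frames // (2 ** n_downsamples)

temporal_scale = 8                              # inferred from 13440 / 1680; check your config
print(flattened_size(16))                       # 1680  -> what a 16-frame clip produces
print(flattened_size(16 * temporal_scale))      # 13440 -> what the Linear layer expects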

process animal data

Hi! In the animal.py file, at:

data_translation = anim.positions[:, 0] / 100.0
# insert identity quaternion
data_rotation = np.insert(data_rotation, [5, 9, 13, 16, 19, 21], [1, 0, 0, 0], axis=1)

Q1: Why do data_translation = anim.positions[:, 0] / 100.0?
It seems that this rescales the root translation by a factor of 100, but why? And why don't we scale the whole anim.positions?

Q2: Why do we insert identity quaternions for the rotation?

Thanks!
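My guess at what these two lines do (an assumption, not confirmed by the authors): the division by 100 converts the root translation from centimeters, as commonly used in BVH files, to meters, and np.insert pads joints that are missing from the BVH skeleton with identity quaternions so the joint count matches the model's skeleton. A toy example of the padding:

import numpy as np

data_rotation = np.ones((100, 22, 4))                      # toy (frames, joints, wxyz) quaternions
padded = np.insert(data_rotation, [5, 9, 13, 16, 19, 21],  # joint indices to pad, as in animal.py
                   [1, 0, 0, 0], axis=1)                   # identity quaternion (w, x, y, z)
print(padded.shape)                                        # (100, 28, 4)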

how to visualize npz animation in blender

May I ask how to visualize the npz data with the SMPL-X Blender add-on, step by step?
I have installed the SMPL-X Blender add-on and imported an animation from the npz data using the add-on's "add animation" function, but I didn't see the human model move. Sorry, I am new to this. Thanks!
