Thank you for your amazing work!
I have a few questions about the evaluation metrics used for the transition task, specifically on the HumanML3D dataset. Given that no ground truth is available, could you explain how FID, Diversity (Div), PJ, and AUJ were calculated for this dataset?
Regarding the Peak Jerk (PJ) metric, I would also like to know which values were used for the HumanML3D dataset.
Could you share the details of the jerk calculation? Specifically, which of the 263 feature dimensions are used? Does the calculation consider only the joint positions, or does it also include the joint rotations? And is the delta_t in the jerk calculation defined per frame or in seconds?
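To make my question concrete, here is a minimal sketch of how I would currently compute jerk. It assumes the metric uses 3D joint positions of shape (T frames, J joints, 3) recovered from the 263-dim features and a third-order finite difference over time. The function names `jerk_magnitude` and `peak_jerk` are my own, not from your code, and taking the maximum over all frames and joints is only my guess at how PJ is defined:

```python
from typing import Optional

import numpy as np


def jerk_magnitude(positions: np.ndarray, fps: Optional[float] = None) -> np.ndarray:
    """Per-frame, per-joint jerk magnitude (my assumption, not the official code).

    positions: array of shape (T, J, 3) with T >= 4 frames.
    fps: if None, jerk is in units per frame^3 (delta_t = 1 frame);
         otherwise it is rescaled to units per second^3 (delta_t = 1 / fps).
    Returns an array of shape (T - 3, J).
    """
    # Third-order forward finite difference along time:
    # x[t+3] - 3*x[t+2] + 3*x[t+1] - x[t]
    jerk = np.diff(positions, n=3, axis=0)
    if fps is not None:
        # Dividing by delta_t^3 = (1 / fps)^3 is the same as multiplying by fps^3
        jerk = jerk * fps**3
    # Euclidean norm over the x, y, z components
    return np.linalg.norm(jerk, axis=-1)


def peak_jerk(positions: np.ndarray, fps: Optional[float] = None) -> float:
    """Maximum jerk magnitude over all frames and joints (my guess at PJ)."""
    return float(jerk_magnitude(positions, fps).max())


if __name__ == "__main__":
    # One joint moving along x as t^3: its third difference is exactly 6 per frame^3
    t = np.arange(10, dtype=float)
    pos = np.zeros((10, 1, 3))
    pos[:, 0, 0] = t**3
    print(peak_jerk(pos))          # 6.0
    print(peak_jerk(pos, fps=20))  # 48000.0
```

Under this reading, the frame-based and second-based conventions differ only by a factor of fps³, so knowing which one you used would let me compare against your reported numbers.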
I appreciate your time and look forward to your insights.
Best,
awdrkjlk966