barquerogerman / flowmdm

[CVPR 2024] Official Implementation of "Seamless Human Motion Composition with Blended Positional Encodings".

Home Page: https://barquerogerman.github.io/FlowMDM/

License: Other

Languages: Python 99.85%, Shell 0.15%
Topics: diffusion, generative-model, human-motion, motion-generation, human-motion-composition, human-motion-extrapolation, cvpr, cvpr2024

flowmdm's Introduction

FlowMDM

Seamless Human Motion Composition with Blended Positional Encodings (CVPR'24)



Human Motion Composition | Human Motion Extrapolation

πŸ”Ž About


Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions.
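A minimal, purely illustrative sketch of the blending idea follows (the actual implementation lives in model/FlowMDM.py and model/x_transformers; the function and argument names below are hypothetical, and the denoising chain is assumed to run from step T down to step 0):

def choose_positional_scheme(denoising_step, bpe_denoising_step):
    # Illustrative sketch of Blended Positional Encodings (BPE), not the actual
    # FlowMDM code. High-noise (early) steps use absolute positional information
    # so each text-conditioned segment recovers global motion coherence; low-noise
    # (late) steps switch to relative positional information so transitions between
    # consecutive segments come out smooth and realistic.
    if denoising_step >= bpe_denoising_step:
        return "absolute"   # early, high-noise steps: global coherence
    return "relative"       # late, low-noise steps: smooth local transitions

The --bpe_denoising_step flag mentioned in the news above presumably controls this switching point.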

πŸ“Œ News

  • [2024-05-13] Eval/Gen instructions updated (wrong value in --bpe_denoising_step fixed)
  • [2024-03-18] Code + model weights released!
  • [2024-02-27] FlowMDM is now accepted at CVPR 2024!
  • [2024-02-26] Our paper is available on arXiv.

πŸ“ TODO List

  • Release pretrained models.
  • Release generation (skeletons + blender support for meshes) + evaluation + training code.
  • Release generation code for demo-style visualizations.

πŸ‘©πŸ»β€πŸ« Getting started

This code was tested on Ubuntu 20.04.6 LTS + Python 3.8 + PyTorch 1.13.0. While other versions might work as well, we recommend using this conda environment to avoid any issues.

  1. Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg

For Windows, use this instead.

  2. Set up the conda environment:
conda env create -f environment.yml
conda activate FlowMDM
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/GuyTevet/smplx.git
conda install ffmpeg -y
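
After the setup, a quick sanity check like the following (not part of the repository) can confirm that the main dependencies were installed correctly:

# Verify that the core dependencies from the setup above are importable.
import torch
import spacy
import clip  # installed from the OpenAI CLIP repository

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
nlp = spacy.load("en_core_web_sm")         # the model downloaded in the setup step
model, preprocess = clip.load("ViT-B/32")  # downloads CLIP weights on first use
print("spaCy and CLIP loaded successfully.")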

This README file contains instructions on how to visualize, evaluate, and train the model.

Note

This repository inherits a lot of work from the original MDM and Guided-Diffusion repositories. Most of FlowMDM's contribution can be found in the model/FlowMDM.py and diffusion/diffusion_wrappers.py files, and the model/x_transformers folder.

πŸ“š Citation

If you find our work helpful, please cite:

@inproceedings{barquero2024seamless,
  title={Seamless Human Motion Composition with Blended Positional Encodings},
  author={Barquero, German and Escalera, Sergio and Palmero, Cristina},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

🀝🏼 Acknowledgements

  • TEMOS: We inherit a lot of our code from TEMOS.
  • TEACH: We use TEACH in our work and inherit part of its code.
  • MDM: We use MDM in our work and also inherit part of its code.
  • PriorMDM: We use PriorMDM in our work and also inherit part of its code.
  • x-transformers: BPEs are built on top of their transformers library.

⭐ Star History

Star History Chart


flowmdm's Issues

Regarding Evaluation metric

Thank you for your amazing work!

I have a few questions regarding the evaluation metrics used for the transition part, specifically with the HumanML3D dataset. Given that there's no ground truth available, could you please explain how the FID, Div, PJ, and AUJ were calculated for this dataset?

Furthermore, concerning the Peak Jerk metric, I'm interested in knowing the values used for the HumanML3D dataset.
Could you please share the details of the jerk calculation? I'm wondering which of the 263 dimensions are used: did the jerk calculation consider only the joint locations, or did it also include joint rotations? Additionally, is the delta_t for the jerk calculation defined per frame or per second?

I appreciate your time and look forward to your insights.

Best,
awdrkjlk966
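
For reference, one straightforward way to compute jerk-based metrics from joint positions alone is via finite differences; the sketch below is only an assumption about the computation and is not necessarily what the paper or repository implements:

import numpy as np

def jerk_metrics(joints, fps=20.0):
    # Illustrative Peak Jerk (PJ) and Area Under the Jerk (AUJ) from joint positions.
    # joints: array of shape (num_frames, num_joints, 3); fps: frames per second.
    # Jerk is taken here as the third finite difference of position over time, which
    # may differ from the exact definition used in the paper.
    dt = 1.0 / fps
    jerk = np.diff(joints, n=3, axis=0) / dt**3                   # (frames-3, joints, 3)
    jerk_per_frame = np.linalg.norm(jerk, axis=-1).max(axis=-1)   # max over joints
    peak_jerk = jerk_per_frame.max()                              # PJ
    area_under_jerk = np.trapz(jerk_per_frame, dx=dt)             # AUJ
    return peak_jerk, area_under_jerk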

A strange (possibly erroneous) motion occurred when I used a modified input.

When I modified the sequences in the file "composition_babel.json", I got a strange result.
This is my modification:
lines 43-48:
"kung fu pose"--->"kung fu pose",
"kung fu pose"--->"dance",
"kung fu pose"--->"lie down",
"step left"---> "stand up",
"throw baseball"--->"throw baseball",
"catch the ball"--->"catch the ball"

The result is here; may I ask if this is normal?
https://github.com/BarqueroGerman/FlowMDM/assets/72643015/97122869-7fe9-4c2c-a592-998b7b013553

An error occurs when running environment.yml. What should I do?

PackagesNotFoundError: The following packages are not available from current channels:

  • zlib==1.2.13=h5eee18b_0
  • xz==5.2.6=h5eee18b_0
  • tk==8.6.12=h1ccaba5_0
  • sqlite==3.40.0=h5082296_0
  • setuptools==65.5.0=py38h06a4308_0
  • readline==8.2=h5eee18b_0
  • python==3.8.15=h7a1cb2a_2
  • pip==22.2.2=py38h06a4308_0
  • openssl==1.1.1s=h7f8727e_0
  • numpy-base==1.23.4=py38h31eccc5_0
  • numpy==1.23.4=py38h14f4228_0
  • ncurses==6.3=h5eee18b_3
  • mkl_random==1.2.2=py38h51133e4_0
  • mkl_fft==1.3.1=py38hd3c417c_0
  • mkl-service==2.4.0=py38h7f8727e_0
  • mkl==2021.4.0=h06a4308_640
  • libstdcxx-ng==11.2.0=h1234567_1
  • libgomp==11.2.0=h1234567_1
  • libgcc-ng==11.2.0=h1234567_1
  • libffi==3.4.2=h6a678d5_6
  • ld_impl_linux-64==2.38=h1181459_1
  • intel-openmp==2021.4.0=h06a4308_3561
  • certifi==2022.9.24=py38h06a4308_0
  • ca-certificates==2022.10.11=h06a4308_0
  • _openmp_mutex==5.1=1_gnu

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Discrepancy in Model Performance Reproduction and Pretrained Model Parameters

Hello BarqueroGerman,

I'm working on replicating your model's performance but noticed a gap between my results and the pretrained model's performance. I've confirmed that my hyperparameters match the ones in your README. Could you share the pretrained model's hyperparameters to help me troubleshoot? The performance of my trained model is shown in the attached screenshots.

Thanks

Source code?

Just wondering if there's any timeline for when the source code will be available to the public? I'd love to have a look and play around with it.

How can I reduce GPU memory usage in generation?

I've noticed that FlowMDM consumes over 14GB of GPU memory during the generate phase, which is much higher than the original MDM. What could be the reason for this increased memory consumption? Is there a way to reduce the memory usage so that it can run on a GPU with only 8GB of memory?
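
A generic PyTorch option (not a FlowMDM-specific feature) is to run generation under mixed precision and free cached memory between runs; whether fp16 is numerically safe for this model is an assumption here:

import torch

@torch.no_grad()
def generate_low_memory(model, *model_inputs):
    # Run inference under autocast so activations are stored in fp16, roughly
    # halving activation memory. This is a generic PyTorch pattern; it assumes
    # the model remains numerically stable in half precision.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        output = model(*model_inputs)
    torch.cuda.empty_cache()  # release cached blocks between generations
    return output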

Regarding GT Jerk Computations

Hello. Thanks for the great work! πŸ™‚
Could you explain why GT jerk values are constant numbers, which do not vary along the temporal axis (unlike the generated jerk values)?

Why split query, key, value into rotary and non-rotary parts?

I am intrigued by the code on line 943 of the file 'FlowMDM/model/x_transformers/x_transformers.py':

 (ql, qr), (kl, kr), (vl, vr) = map(lambda t: (t[..., :l], t[..., l:]), (q, k, v)) # split query, key, value into rotary and non-rotary parts

Could you please explain the rationale behind splitting the query, key, and value into rotary and non-rotary parts? I would appreciate your insight. Thank you!
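
For context, splitting the heads into rotary and non-rotary channels is the standard "partial RoPE" pattern used by x-transformers: rotary embeddings are applied only to the first rotary_dim channels, while the remaining channels stay position-agnostic (values are split too when value rotation is enabled). Below is a hedged sketch of that pattern with illustrative names, not the repository's API:

import torch

def rotate_half(x):
    # Standard RoPE helper: (x1, x2) -> (-x2, x1) along the last dimension.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_partial_rotary(q, k, cos, sin, rotary_dim):
    # Rotate only the first `rotary_dim` channels of q and k ("partial RoPE");
    # the remaining channels pass through unchanged, so part of every head
    # carries no positional information. Illustrative only.
    (ql, qr), (kl, kr) = [(t[..., :rotary_dim], t[..., rotary_dim:]) for t in (q, k)]
    ql = ql * cos + rotate_half(ql) * sin
    kl = kl * cos + rotate_half(kl) * sin
    return torch.cat((ql, qr), dim=-1), torch.cat((kl, kr), dim=-1)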

BVH file as an output

Congratulations on the awesome work!

Can you please provide code to get a BVH file as an output?
