barquerogerman / flowmdm

[CVPR 2024] Official Implementation of "Seamless Human Motion Composition with Blended Positional Encodings".

Home Page: https://barquerogerman.github.io/FlowMDM/

License: Other

Languages: Python 99.85%, Shell 0.15%
Topics: diffusion, generative-model, human-motion, motion-generation, human-motion-composition, human-motion-extrapolation, cvpr, cvpr2024

flowmdm's Introduction

FlowMDM

Seamless Human Motion Composition with Blended Positional Encodings (CVPR'24)



Human Motion Composition | Human Motion Extrapolation

πŸ”Ž About


Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions.
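A minimal, purely illustrative sketch of the blending idea follows (the actual implementation lives in model/FlowMDM.py and model/x_transformers; the function and argument names below are hypothetical, and the denoising chain is assumed to run from step T down to step 0):

def choose_positional_scheme(denoising_step, bpe_denoising_step):
    # Illustrative sketch of Blended Positional Encodings (BPE), not the actual
    # FlowMDM code. High-noise (early) steps use absolute positional information
    # so each text-conditioned segment recovers global motion coherence; low-noise
    # (late) steps switch to relative positional information so transitions between
    # consecutive segments come out smooth and realistic.
    if denoising_step >= bpe_denoising_step:
        return "absolute"   # early, high-noise steps: global coherence
    return "relative"       # late, low-noise steps: smooth local transitions

The --bpe_denoising_step flag mentioned in the news above presumably controls this switching point.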

πŸ“Œ News

  • [2024-05-13] Eval/Gen instructions updated (wrong value in --bpe_denoising_step fixed)
  • [2024-03-18] Code + model weights released!
  • [2024-02-27] FlowMDM is now accepted at CVPR 2024!
  • [2024-02-26] Our paper is available on arXiv.

πŸ“ TODO List

  • Release pretrained models.
  • Release generation (skeletons + blender support for meshes) + evaluation + training code.
  • Release generation code for demo-style visualizations.

πŸ‘©πŸ»β€πŸ« Getting started

This code was tested on Ubuntu 20.04.6 LTS + Python 3.8 + PyTorch 1.13.0. While other versions might work as well, we recommend using this conda environment to avoid any issues.

  1. Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg

For Windows, use this instead.

  2. Set up the conda environment:
conda env create -f environment.yml
conda activate FlowMDM
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/GuyTevet/smplx.git
conda install ffmpeg -y
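
After the setup, a quick sanity check like the following (not part of the repository) can confirm that the main dependencies were installed correctly:

# Verify that the core dependencies from the setup above are importable.
import torch
import spacy
import clip  # installed from the OpenAI CLIP repository

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
nlp = spacy.load("en_core_web_sm")         # the model downloaded in the setup step
model, preprocess = clip.load("ViT-B/32")  # downloads CLIP weights on first use
print("spaCy and CLIP loaded successfully.")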

This README file contains instructions on how to visualize, evaluate, and train the model.

Note

This repository inherits a lot of work from the original MDM and Guided-Diffusion repositories. Most of FlowMDM's contribution can be found in the model/FlowMDM.py and diffusion/diffusion_wrappers.py files, and the model/x_transformers folder.

πŸ“š Citation

If you find our work helpful, please cite:

@inproceedings{barquero2024seamless,
  title={Seamless Human Motion Composition with Blended Positional Encodings},
  author={Barquero, German and Escalera, Sergio and Palmero, Cristina},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

🀝🏼 Acknowledgements

  • TEMOS: We inherit a lot of our code from TEMOS.
  • TEACH: We use TEACH in our work and inherit part of its code.
  • MDM: We use MDM in our work and also inherit part of its code.
  • PriorMDM: We use PriorMDM in our work and also inherit part of its code.
  • x-transformers: BPEs are built on top of their transformers library.

⭐ Star History

Star History Chart


flowmdm's Issues

Regarding Evaluation metric

Thank you for your amazing work!

I have a few questions regarding the evaluation metrics used for the transition part, specifically with the HumanML3D dataset. Given that there's no ground truth available, could you please explain how the FID, Div, PJ, and AUJ were calculated for this dataset?

Furthermore, concerning the Peak Jerk metric, I'm interested in knowing the values used for the HumanML3D dataset.
Could you please share the details of the jerk calculation? I'm wondering which of the 263 dimensions are used: did the jerk calculation consider only the joint locations, or did it also include joint rotations? Additionally, is the delta_t for the jerk calculation defined per frame or per second?

I appreciate your time and look forward to your insights.

Best,
awdrkjlk966
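
For reference, one straightforward way to compute jerk-based metrics from joint positions alone is via finite differences; the sketch below is only an assumption about the computation and is not necessarily what the paper or repository implements:

import numpy as np

def jerk_metrics(joints, fps=20.0):
    # Illustrative Peak Jerk (PJ) and Area Under the Jerk (AUJ) from joint positions.
    # joints: array of shape (num_frames, num_joints, 3); fps: frames per second.
    # Jerk is taken here as the third finite difference of position over time, which
    # may differ from the exact definition used in the paper.
    dt = 1.0 / fps
    jerk = np.diff(joints, n=3, axis=0) / dt**3                   # (frames-3, joints, 3)
    jerk_per_frame = np.linalg.norm(jerk, axis=-1).max(axis=-1)   # max over joints
    peak_jerk = jerk_per_frame.max()                              # PJ
    area_under_jerk = np.trapz(jerk_per_frame, dx=dt)             # AUJ
    return peak_jerk, area_under_jerk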

A strange (possibly erroneous) motion occurred when I used a modified input.

When I modified the sequences in the file "composition_babel.json", I got a strange result.
This is my modification:
lines 43-48:
"kung fu pose"--->"kung fu pose",
"kung fu pose"--->"dance",
"kung fu pose"--->"lie down",
"step left"---> "stand up",
"throw baseball"--->"throw baseball",
"catch the ball"--->"catch the ball"

The result is here; may I ask if this is normal?
https://github.com/BarqueroGerman/FlowMDM/assets/72643015/97122869-7fe9-4c2c-a592-998b7b013553

An error occurs when running environment.yml. What should I do?

PackagesNotFoundError: The following packages are not available from current channels:

  • zlib==1.2.13=h5eee18b_0
  • xz==5.2.6=h5eee18b_0
  • tk==8.6.12=h1ccaba5_0
  • sqlite==3.40.0=h5082296_0
  • setuptools==65.5.0=py38h06a4308_0
  • readline==8.2=h5eee18b_0
  • python==3.8.15=h7a1cb2a_2
  • pip==22.2.2=py38h06a4308_0
  • openssl==1.1.1s=h7f8727e_0
  • numpy-base==1.23.4=py38h31eccc5_0
  • numpy==1.23.4=py38h14f4228_0
  • ncurses==6.3=h5eee18b_3
  • mkl_random==1.2.2=py38h51133e4_0
  • mkl_fft==1.3.1=py38hd3c417c_0
  • mkl-service==2.4.0=py38h7f8727e_0
  • mkl==2021.4.0=h06a4308_640
  • libstdcxx-ng==11.2.0=h1234567_1
  • libgomp==11.2.0=h1234567_1
  • libgcc-ng==11.2.0=h1234567_1
  • libffi==3.4.2=h6a678d5_6
  • ld_impl_linux-64==2.38=h1181459_1
  • intel-openmp==2021.4.0=h06a4308_3561
  • certifi==2022.9.24=py38h06a4308_0
  • ca-certificates==2022.10.11=h06a4308_0
  • _openmp_mutex==5.1=1_gnu

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Discrepancy in Model Performance Reproduction and Pretrained Model Parameters

Hello BarqueroGerman,

I'm working on replicating your model's performance but noticed a gap between my results and the pretrained model's performance. I've confirmed that my hyperparameters match the ones in your README. Could you share the pretrained model's hyperparameters to help me troubleshoot? The performance of my trained model is shown in the attached screenshots.

Thanks

Source code?

Just wondering if there's any timeline for when the source code will be available to the public? I'd love to have a look and play around with it.

How can I reduce GPU memory usage in generation?

I've noticed that FlowMDM consumes over 14GB of GPU memory during the generate phase, which is much higher than the original MDM. What could be the reason for this increased memory consumption? Is there a way to reduce the memory usage so that it can run on a GPU with only 8GB of memory?
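
A generic PyTorch option (not a FlowMDM-specific feature) is to run generation under mixed precision and free cached memory between runs; whether fp16 is numerically safe for this model is an assumption here:

import torch

@torch.no_grad()
def generate_low_memory(model, *model_inputs):
    # Run inference under autocast so activations are stored in fp16, roughly
    # halving activation memory. This is a generic PyTorch pattern; it assumes
    # the model remains numerically stable in half precision.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        output = model(*model_inputs)
    torch.cuda.empty_cache()  # release cached blocks between generations
    return output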

Regarding GT Jerk Computations

Hello. Thanks for the great work! πŸ™‚
Could you explain why GT jerk values are constant numbers, which do not vary along the temporal axis (unlike the generated jerk values)?

Why split query, key, value into rotary and non-rotary parts?

I am intrigued by the code on line 943 of the file 'FlowMDM/model/x_transformers/x_transformers.py':

 (ql, qr), (kl, kr), (vl, vr) = map(lambda t: (t[..., :l], t[..., l:]), (q, k, v)) # split query, key, value into rotary and non-rotary parts

Could you please explain the rationale behind splitting the query, key, and value into rotary and non-rotary parts? I would appreciate your insight. Thank you!
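
For context, splitting the heads into rotary and non-rotary channels is the standard "partial RoPE" pattern used by x-transformers: rotary embeddings are applied only to the first rotary_dim channels, while the remaining channels stay position-agnostic (values are split too when value rotation is enabled). Below is a hedged sketch of that pattern with illustrative names, not the repository's API:

import torch

def rotate_half(x):
    # Standard RoPE helper: (x1, x2) -> (-x2, x1) along the last dimension.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_partial_rotary(q, k, cos, sin, rotary_dim):
    # Rotate only the first `rotary_dim` channels of q and k ("partial RoPE");
    # the remaining channels pass through unchanged, so part of every head
    # carries no positional information. Illustrative only.
    (ql, qr), (kl, kr) = [(t[..., :rotary_dim], t[..., rotary_dim:]) for t in (q, k)]
    ql = ql * cos + rotate_half(ql) * sin
    kl = kl * cos + rotate_half(kl) * sin
    return torch.cat((ql, qr), dim=-1), torch.cat((kl, kr), dim=-1)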

BVH file as an output

Congratulations on the awesome work!

Can you please provide code to get a BVH file as an output?
