in2IN: Leveraging Individual Information to Generate Human INteractions

🔎 About

Generating human-human motion interactions conditioned on textual descriptions is useful in many areas, such as robotics, gaming, animation, and the metaverse. This utility comes with the difficulty of modeling high-dimensional inter-personal dynamics, and properly capturing the intra-personal diversity of interactions is a further challenge. Current methods generate interactions with limited diversity of intra-person dynamics because of the limitations of the available datasets and conditioning strategies. To address this, we introduce in2IN, a novel diffusion model for human-human motion generation that is conditioned not only on the textual description of the overall interaction but also on individual descriptions of the actions performed by each person involved. To train this model, we use a large language model to extend the InterHuman dataset with individual descriptions. As a result, in2IN achieves state-of-the-art performance on the InterHuman dataset. Furthermore, to increase the intra-personal diversity on existing interaction datasets, we propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D. As a result, DualMDM generates motions with higher individual diversity and improves control over the intra-person dynamics while maintaining inter-personal coherence.
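To make the idea of model composition more concrete, here is a toy sketch of blending the per-step outputs of two models with a scheduled weight. It is not the DualMDM implementation from this repository: the function names (`blend`, `linear_schedule`, `compose`), the linear schedule, and the use of plain lists of floats as "motions" are all assumptions made for illustration.

```python
from typing import Callable, List

Motion = List[float]                    # toy stand-in for a motion tensor
Model = Callable[[Motion, int], Motion]  # (current sample, step) -> prediction


def blend(interaction_pred: Motion, individual_pred: Motion, w: float) -> Motion:
    """Convex combination: w=0 keeps the interaction output, w=1 the individual one."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    if len(interaction_pred) != len(individual_pred):
        raise ValueError("predictions must have the same length")
    return [(1.0 - w) * a + w * b for a, b in zip(interaction_pred, individual_pred)]


def linear_schedule(step: int, total_steps: int) -> float:
    """Hypothetical schedule: the individual weight grows linearly from 0 to 1."""
    if total_steps < 2:
        return 0.0
    return step / (total_steps - 1)


def compose(interaction_model: Model, individual_prior: Model, x: Motion,
            total_steps: int,
            schedule: Callable[[int, int], float] = linear_schedule) -> Motion:
    """Run both models at every step and blend their predictions."""
    for step in range(total_steps):
        w = schedule(step, total_steps)
        x = blend(interaction_model(x, step), individual_prior(x, step), w)
    return x


if __name__ == "__main__":
    # Dummy "models" that ignore their input, only to show the call pattern.
    result = compose(lambda x, t: [1.0], lambda x, t: [3.0], [0.0], total_steps=3)
    print(result)
```

The schedule is passed in as a parameter so that different weightings between the interaction model and the single-person prior can be tried without touching the blending loop.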

📌 News

[2024-06-04] Code, model weights, and additional training data are now available!
[2024-04-16] Our paper is available on arXiv
[2024-04-06] in2IN is now accepted at CVPR 2024 Workshop HuMoGen!

📝 TODO List

Release code
Release model weights
Release individual descriptions from InterHuman dataset.
Release visualization code.

💻 Usage

🛠️ Installation

Clone the repo

git clone https://github.com/pabloruizponce/in2IN.git

Install the requirements

Download the required libraries
```
pip install -r requirements.txt
```
Install ffmpeg
```
sudo apt update
sudo apt install ffmpeg
```

Warning

All the code has been tested with Ubuntu 22.04.3 LTS x86_64 using Python 3.12.2 and CUDA 12.3.1. If you have any issues, please open an issue.
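Before running anything, it may help to compare your machine against the tested configuration. The following is a minimal sketch using only the Python standard library. The helper name `check_environment` is hypothetical, and looking for `nvidia-smi` on the PATH is only a rough proxy for a working CUDA setup, not a check of the CUDA version.

```python
import shutil
import sys
from typing import List, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 12)) -> List[str]:
    """Return human-readable warnings; an empty list means nothing obvious is missing."""
    warnings = []

    # The README reports testing with Python 3.12.2.
    if sys.version_info[:2] < min_python:
        found = "{}.{}".format(*sys.version_info[:2])
        wanted = "{}.{}".format(*min_python)
        warnings.append(f"Python {found} is older than the tested {wanted}")

    # ffmpeg is installed in the steps above via apt.
    if shutil.which("ffmpeg") is None:
        warnings.append("ffmpeg was not found on PATH")

    # nvidia-smi ships with the NVIDIA driver; its absence suggests no usable GPU driver.
    if shutil.which("nvidia-smi") is None:
        warnings.append("nvidia-smi was not found on PATH (no NVIDIA driver?)")

    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```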

Download the individual descriptions for the InterHuman dataset from here and place them in the data folder.

Important

The original InterHuman dataset is needed to run the code. You can download it from here. If you use the dataset, please cite us and the original paper.

🕹️ Inference

Download the model weights from here and place them in the checkpoints folder.

  python in2in/scripts/infer.py \
      --model configs/models/in2IN.yaml \
      --infer configs/infer.yaml \
      --mode interaction \
      --out results \
      --device 0 \
      --text_interaction "Interaction textual description" \
      --text_individual1 "Individual textual description" \
      --text_individual2 "Individual textual description" \
      --name "output_name"

Note

More information about the parameters can be found using the --help flag.
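If you want to launch inference from another Python script, for example to generate several prompts in a loop, the command above can be assembled as an argument list. This is a sketch under assumptions: the helper `build_infer_command` is hypothetical, the example prompts are made up, and the flags are simply the ones shown in the command above.

```python
import shlex
import sys
from typing import List


def build_infer_command(text_interaction: str, text_individual1: str,
                        text_individual2: str, name: str,
                        out: str = "results", device: int = 0,
                        mode: str = "interaction") -> List[str]:
    """Build the argv list for in2in/scripts/infer.py (run it from the repo root)."""
    return [
        sys.executable, "in2in/scripts/infer.py",
        "--model", "configs/models/in2IN.yaml",
        "--infer", "configs/infer.yaml",
        "--mode", mode,
        "--out", out,
        "--device", str(device),
        "--text_interaction", text_interaction,
        "--text_individual1", text_individual1,
        "--text_individual2", text_individual2,
        "--name", name,
    ]


if __name__ == "__main__":
    cmd = build_infer_command(
        "Two people greet each other.",   # made-up example prompt
        "A person waves with one hand.",  # made-up example prompt
        "A person nods.",                 # made-up example prompt
        name="greeting",
    )
    # Print a copy-pasteable shell command; to actually run it, use
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the descriptions contain quotes or other special characters.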

🏃🏻‍♂️ Training

  python in2in/scripts/train.py \
      --train configs/train/in2IN.yaml \
      --model configs/models/in2IN.yaml \
      --data configs/datasets.yaml \
      --mode interaction \
      --device 0

🎖️ Evaluation

Download the evaluator model weights from here and place them in the checkpoints folder.

Interaction Quality

Set --mode to either interaction or dual, for example:

  python in2in/scripts/eval/interhuman.py \
      --model configs/models/in2IN.yaml \
      --evaluator configs/eval.yaml \
      --mode interaction \
      --out results \
      --device 0

Individual Diversity

  python in2in/scripts/eval/DualMDM.py \
      --model configs/models/DualMDM.yaml \
      --evaluator configs/eval.yaml \
      --device 0

📚 Citation

If you find our work helpful, please cite:

@InProceedings{Ruiz-Ponce_2024_CVPR,
    author    = {Ruiz-Ponce, Pablo and Barquero, German and Palmero, Cristina and Escalera, Sergio and Garc{\'\i}a-Rodr{\'\i}guez, Jos\'e},
    title     = {in2IN: Leveraging Individual Information to Generate Human INteractions},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1941-1951}
}

🫶🏼 Acknowledgments

InterGen as we inherit a lot of code from them.
MDM as we used their evaluation code for text-motion models.
Diffusion Models Beat GANs on Image Synthesis as we used their Gaussian diffusion code as a base for our implementation.
