
Human Behavior Animation

This project explores human behavior animation, with a focus on co-speech gesture synthesis, as part of my graduate research at Peking University, supervised by Libin Liu.


SIGGRAPH 2024
Semantic Gesticulator: Semantics-aware Co-speech Gesture Synthesis
Zeyi Zhang*, Tenglong Ao*, Yuyao Zhang*, Qingzhe Gao, Chuan Lin, Baoquan Chen, Libin Liu, ACM Trans. Graph. 43, 4, Article 136.

In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit.

- Video (YouTube) - Paper (arXiv) - Project Page (github) - Code (github) - Dataset (github) -
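
For intuition, here is a minimal, hypothetical sketch of the retrieve-then-align idea described above. All names (`GestureClip`, `retrieve_candidates`, `align`) are illustrative stand-ins, not the paper's API: the actual system retrieves gestures with a large language model and aligns them in a learned latent space rather than by the keyword matching and naive splicing shown here.

```python
from dataclasses import dataclass

@dataclass
class GestureClip:
    tag: str      # semantic label, e.g. "wave" (illustrative)
    frames: list  # per-frame pose vectors (placeholder)

def retrieve_candidates(transcript: str,
                        library: dict) -> list:
    """Stand-in for the LLM-based generative retrieval: pick semantic-gesture
    entries from the motion library that match the input speech."""
    words = set(transcript.lower().split())
    return [clip for tag, clip in library.items() if tag in words]

def align(rhythmic_motion: list, clip: GestureClip, beat_frame: int) -> list:
    """Stand-in for the semantic alignment mechanism: splice the retrieved
    gesture into the GPT-generated rhythmic motion at a speech beat
    (naive replacement here; the paper blends motions far more carefully)."""
    out = list(rhythmic_motion)
    out[beat_frame:beat_frame + len(clip.frames)] = clip.frames
    return out

# Toy usage with fabricated data:
library = {"wave": GestureClip("wave", frames=[[0.1], [0.2], [0.3]])}
motion = [[0.0]] * 10  # pretend output of the rhythm-matching GPT model
candidates = retrieve_candidates("hello, wave to the camera", library)
if candidates:
    motion = align(motion, candidates[0], beat_frame=4)
```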


SIGGRAPH 2023 (Technical Best Paper Honorable Mention)
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
Tenglong Ao, Zeyi Zhang, Libin Liu, ACM Trans. Graph. 42, 4, Article 40.

The automatic generation of stylized co-speech gestures has recently received increasing attention. Previous systems typically allow style control via predefined text labels or example motion clips, which are often not flexible enough to convey user intent accurately. In this work, we present GestureDiffuCLIP, a neural network framework for synthesizing realistic, stylized co-speech gestures with flexible style control. We leverage the power of the large-scale Contrastive Language-Image Pre-training (CLIP) model and present a novel CLIP-guided mechanism that extracts efficient style representations from multiple input modalities, such as a piece of text, an example motion clip, or a video. Our system learns a latent diffusion model to generate high-quality gestures and infuses the CLIP representations of style into the generator via an adaptive instance normalization (AdaIN) layer. We further devise a gesture-transcript alignment mechanism based on contrastive learning that ensures semantically correct gesture generation. Our system can also be extended to allow fine-grained style control of individual body parts. We demonstrate an extensive set of examples showing the flexibility and generalizability of our model across a variety of style descriptions.

- Video (YouTube | Bilibili) - Paper (arXiv) - Project Page (github) -
- Explained (SIGGRAPH Presentation (English) | 知乎 (Chinese)) -
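
As a rough illustration of the AdaIN-based style injection mentioned above, the PyTorch sketch below normalizes gesture features and re-styles them with a scale and shift predicted from a CLIP style embedding. The dimensions are assumptions for illustration (`clip_dim = 512` matches common CLIP encoders), not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: strip the feature statistics of the
    generated gestures, then re-scale and re-shift them using parameters
    predicted from a CLIP style embedding (shapes are illustrative)."""
    def __init__(self, feature_dim: int, clip_dim: int = 512):
        super().__init__()
        self.to_scale_shift = nn.Linear(clip_dim, 2 * feature_dim)
        self.norm = nn.InstanceNorm1d(feature_dim, affine=False)

    def forward(self, features: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # features: (batch, feature_dim, time); style: (batch, clip_dim)
        scale, shift = self.to_scale_shift(style).chunk(2, dim=-1)
        normed = self.norm(features)
        return normed * (1 + scale.unsqueeze(-1)) + shift.unsqueeze(-1)

# Toy usage with fabricated tensors:
adain = AdaIN(feature_dim=256)
feats = torch.randn(2, 256, 64)  # gesture features over 64 frames
style = torch.randn(2, 512)      # CLIP embedding of a style prompt
out = adain(feats, style)        # -> (2, 256, 64)
```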


SIGGRAPH Asia 2022 (Technical Best Paper Award)
Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings
Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, Libin Liu, ACM Trans. Graph. 41, 6, Article 209.

Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in creating artificial embodied agents. In this work, we present a novel co-speech gesture synthesis method that achieves convincing results in both rhythm and semantics. For rhythm, our system contains a robust rhythm-based segmentation pipeline that explicitly ensures temporal coherence between vocalization and gestures. For semantics, we devise a mechanism, grounded in linguistic theory, to effectively disentangle low- and high-level neural embeddings of speech and motion. The high-level embedding corresponds to semantics, while the low-level embedding captures subtle variations. Lastly, we build correspondences between the hierarchical embeddings of the speech and the motion, resulting in rhythm- and semantics-aware gesture synthesis.

- Video (YouTube | Bilibili) - Paper (arXiv) - Code (github) - Dataset (github) -
- Explained (YouTube (English) | 知乎 (Chinese)) -
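
To make the rhythm-based segmentation idea concrete, here is a toy sketch; the function name and the simple peak-picking heuristic are assumptions for illustration, not the paper's pipeline. It cuts a speech loudness envelope at onset peaks so that audio and motion can be segmented into matching beat intervals.

```python
import numpy as np

def segment_by_beats(audio_envelope: np.ndarray,
                     fps: int,
                     min_gap: float = 0.3) -> list:
    """Toy stand-in for the rhythm-based segmentation pipeline: treat local
    maxima of the loudness envelope (above its mean) as beats and return
    (start, end) frame pairs covering one beat interval each."""
    mean = audio_envelope.mean()
    beats = [i for i in range(1, len(audio_envelope) - 1)
             if audio_envelope[i] > mean
             and audio_envelope[i] >= audio_envelope[i - 1]
             and audio_envelope[i] >= audio_envelope[i + 1]]
    # enforce a minimum gap between consecutive beats
    filtered, last = [], -int(min_gap * fps)
    for b in beats:
        if b - last >= int(min_gap * fps):
            filtered.append(b)
            last = b
    return list(zip([0] + filtered, filtered + [len(audio_envelope)]))

# Toy usage with a fabricated 1.5 Hz "speech rhythm":
t = np.linspace(0, 4, 4 * 30)                  # 4 s at 30 fps
envelope = np.abs(np.sin(2 * np.pi * 1.5 * t))
segments = segment_by_beats(envelope, fps=30)  # list of (start, end) frames
```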

Acknowledgement

The layout of this project is inspired by the AI4Animation repo.

Copyright Information

This project is for research and education purposes only; it is not available for commercial use or redistribution.
