
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

Further Work

📢 DiffuseStyleGesture/DiffuseStyleGesture+ - Based on the diffusion model, generates full-body gestures.

📢 UnifiedGesture - Trained on multiple gesture datasets, refines the generated gestures.

1. Environment Settings

This code was tested on an NVIDIA GeForce RTX 2080 Ti and requires conda or miniconda.

conda create -n QPGesture python=3.7
conda activate QPGesture
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# fastText bindings (fasttext-wheel provides prebuilt wheels if building fasttext from source fails)
pip install fasttext-wheel
pip install fasttext

# Alternatively, for newer GPUs/drivers, a newer PyTorch build can replace the cu111 install above:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt
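A quick, optional sanity check (not part of the repository) to confirm that the installed PyTorch build can see the GPU before running anything else:

import torch

print(torch.__version__)                  # e.g. 1.8.0+cu111
print(torch.cuda.is_available())          # should be True on a CUDA machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 2080 Ti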

2. Quick Start

Download our processed database and pre-trained models from Tsinghua Cloud or Google Cloud and place them in the data folder and pretrained_model folder in the project path.

cd ./codebook/Speech2GestureMatching/
bash GestureKNN.sh

This is an audio clip about 24 seconds long, and matching takes about 5 minutes. You will get the results in ./codebook/Speech2GestureMatching/output/result.npz.
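If you want to peek at the matched result before rendering it, a minimal sketch like the following works; the array names stored inside result.npz are not documented here, so the snippet simply lists whatever keys it finds:

import numpy as np

# List the arrays stored in the matching output; key names are inspected rather than assumed.
data = np.load("./codebook/Speech2GestureMatching/output/result.npz", allow_pickle=True)
for key in data.files:
    arr = data[key]
    print(key, arr.shape if hasattr(arr, "shape") else type(arr))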

cd ..
python VisualizeCodebook.py --config=./configs/codebook.yml --gpu 0 --code_path "./Speech2GestureMatching/output/result.npz" --VQVAE_model_path "../pretrained_model/codebook_checkpoint_best.bin" --stage inference

Then you will get .bvh, .mp4 and other intermediate files in ./codebook/Speech2GestureMatching/output/knn_pred_wavvq/.

(Sample result video: knn_pred_wavvq_generated.mp4)

You can use Blender to visualize the .bvh file.

(Blender visualization video: 0001-1440.mp4)
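If you prefer scripting the import instead of clicking through Blender's UI, a minimal sketch (run inside Blender, e.g. with blender --background --python your_script.py) could look like this; the .bvh path below is an example and should point at the file produced in the output folder above:

import bpy

# Import the generated motion into the current Blender scene.
# The path is an example; point it at the .bvh produced in the output folder above.
bpy.ops.import_anim.bvh(filepath="./codebook/Speech2GestureMatching/output/knn_pred_wavvq/knn_pred_wavvq.bvh")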

We also provide a processed database for speaker id 1, available for download from Tsinghua Cloud and Baidu Cloud. Using this database is optional. We recommend trying speaker 1, which has a larger database and better performance.

3. Test your own audio

Here, we need to build the test set. We use ./data/Example3/4.wav as an example. Note that no text is used here.

Download vq-wav2vec Gumbel from fairseq and put it in ./process/. Modify the fairseq code installed in your conda or miniconda environment according to this issue.

Then run:

cd ./process/
python make_test_data.py --audio_path "../data/Example3/4.wav" --save_path "../data/Example3/4"

You will get ./data/Example3/4/wavvq_240.npz. Then, similar to the previous step, just run the following code:

cd ../codebook/Speech2GestureMatching/
bash GestureKNN.sh "../../data/Example3/4/wavvq_240.npz" 0 "./output/result_Example3.npz"
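If you have several of your own audio files, a small helper sketch (not part of the repository) can loop the same matching script over each of them; the pairs below are example paths, and the second argument is passed through unchanged, as in the command above:

import subprocess

# Run the matching script for several (input features, output result) pairs.
# Paths are examples; run this from ./codebook/Speech2GestureMatching/.
examples = [
    ("../../data/Example3/4/wavvq_240.npz", "./output/result_Example3.npz"),
    # add more pairs here
]
for wavvq_path, result_path in examples:
    subprocess.run(["bash", "GestureKNN.sh", wavvq_path, "0", result_path], check=True)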

4. Constructing database

Install gentle, as in Trimodal, to align the text and audio; this will take a few minutes:

cd ./process/
git clone https://github.com/lowerquality/gentle.git
cd gentle
./install.sh

You can verify whether gentle is installed successfully with the following command:

python align.py './examples/data/lucier.mp3' './examples/data/lucier.txt'
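To call gentle from your own scripts rather than the command line, a hedged sketch like the one below can work; it assumes align.py prints its alignment JSON to stdout, as in gentle's own example usage:

import subprocess

# Run gentle's aligner on the bundled example and keep the JSON it prints to stdout.
result = subprocess.run(
    ["python", "align.py", "./examples/data/lucier.mp3", "./examples/data/lucier.txt"],
    capture_output=True, text=True, check=True,
)
with open("lucier_alignment.json", "w") as f:
    f.write(result.stdout)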

Download WavLM Large and put it into ./pretrained_model/. Download the character you want to build from BEAT; you can put it in ./dataset/orig_BEAT/ or elsewhere. Here is an example for speaker id 10:

python make_beat_dataset.py --BEAT_path "../dataset/orig_BEAT/speakers/" --save_dir "../dataset/BEAT" --prefix "speaker_10_state_0" --step 1
cd ../codebook/Speech2GestureMatching/
python normalize_audio.py
python mfcc.py
cd ../../process/
python make_beat_dataset.py --BEAT_path "../dataset/orig_BEAT/speakers/" --save_dir "../dataset/BEAT" --prefix "speaker_10_state_0" --step 2

Now we have a basic database; next we compute the phase, wavlm and wavvq features:

cd ../codebook/
python PAE.py --config=./configs/codebook.yml --gpu 0 --stage inference
cd ../process/
python make_beat_dataset.py --config "../codebook/configs/codebook.yml" --BEAT_path "../dataset/orig_BEAT/speakers/" --save_dir "../dataset/BEAT" --prefix "speaker_10_state_0" --gpu 0 --step 3
python make_beat_dataset.py --config "../codebook/configs/codebook.yml" --BEAT_path "../dataset/orig_BEAT/speakers/" --save_dir "../dataset/BEAT" --prefix "speaker_10_state_0" --gpu 0 --step 4

Then you will get all the databases used in Quick Start.
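A quick way to confirm that the feature files were written is to count what ended up under the dataset directory; the sketch below only globs for common array file extensions, since the exact file names depend on the preprocessing scripts:

import glob

# Count candidate feature files under the dataset directory.
# The patterns are assumptions about file extensions, not exact file names.
for pattern in ("../dataset/BEAT/**/*.npz", "../dataset/BEAT/**/*.npy"):
    files = glob.glob(pattern, recursive=True)
    print(pattern, "->", len(files), "files")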

5. Train your own model

Data preparation

This is just an example for speaker id 10; in fact, we use all speakers to train these models.

pip install numpy==1.19.5       # Unfortunately, we have had trouble with the numpy version (due to pyarrow).
python beat_data_to_lmdb.py --config=../codebook/configs/codebook.yml --gpu 0

Then you will get the data mean/std, and you may copy them into ./codebook/configs/codebook.yml.
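If you want to avoid retyping the printed statistics by hand, a small hypothetical helper like the one below can format them for the yaml file; data_mean_std.npz and its mean/std keys are placeholder names, so use whatever beat_data_to_lmdb.py actually reports:

import numpy as np

# Hypothetical helper: print mean/std as plain lists so they can be pasted into
# ./codebook/configs/codebook.yml. The file name and key names are placeholders.
stats = np.load("data_mean_std.npz")
print("data_mean:", np.round(stats["mean"], 6).tolist())
print("data_std:", np.round(stats["std"], 6).tolist())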

gesture VQ-VAE

cd ../codebook/
python train.py --config=./configs/codebook.yml --gpu 0

The gesture VQ-VAE will be saved in ./codebook/output/train_codebook/codebook_checkpoint_best.bin.

To further calculate the distance between each code, run

python VisualizeCodebook.py --config=./configs/codebook.yml --gpu 0 --code_path "./Speech2GestureMatching/output/result.npz" --VQVAE_model_path "./output/train_codebook/codebook_checkpoint_best.bin" --stage train

Then you will get the absolute pose of each code in ./codebook/output/code.npz, which is used in Quick Start.
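For a rough look at how the codes relate to each other, the sketch below loads code.npz and computes pairwise Euclidean distances between the stored code poses; the array key inside code.npz is an assumption, so the snippet simply takes the first stored array:

import numpy as np
from scipy.spatial.distance import cdist

# Pairwise Euclidean distances between codebook entries.
# The array key inside code.npz is an assumption; the first stored array is used.
data = np.load("./codebook/output/code.npz", allow_pickle=True)
codes = data[data.files[0]]
codes = codes.reshape(codes.shape[0], -1)
dists = cdist(codes, codes)
print("codes:", codes.shape, "distance matrix:", dists.shape)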

PAE

python PAE.py --config=./configs/codebook.yml --gpu 0 --stage train

The PAE will be saved in ./codebook/output/train_PAE/PAE_checkpoint_best.bin.
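To confirm a checkpoint loads cleanly before plugging it into inference, a minimal check is enough; the internal structure of the checkpoint file is not assumed here, the snippet only inspects it:

import torch

# Load the PAE checkpoint on CPU and inspect its top-level structure.
ckpt = torch.load("./codebook/output/train_PAE/PAE_checkpoint_best.bin", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))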

Reference

This work is highly inspired by Bailando, KNN and DeepPhase.

Citation

If you find this work useful, please consider citing our work with the following bibtex:

@inproceedings{yang2023QPGesture,
  author       = {Sicheng Yang and Zhiyong Wu and Minglei Li and Zhensong Zhang and Lei Hao and Weihong Bao and Haolin Zhuang},
  title        = {QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation},
  booktitle    = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, {CVPR}},
  publisher    = {{IEEE}},
  month        = {June},
  year         = {2023},
  pages        = {2321--2330}
}

Please feel free to contact us at [email protected] with any questions or concerns.

