

DiffTalk

The PyTorch implementation of our CVPR 2023 paper "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation".

[Project] [Paper] [Video Demo]

Requirements

  • python 3.7.0
  • pytorch 1.10.0
  • pytorch-lightning 1.2.5
  • torchvision 0.11.0

For the full list of dependencies, please refer to requirements.txt. We conducted the experiments on 8 NVIDIA 3090Ti GPUs.

Put the first-stage model in ./models.
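Before training, it can help to confirm that the installed packages match the versions listed above and that ./models is not empty. The sketch below is a hypothetical helper, not part of the repository: it reads versions with importlib.metadata (Python 3.8+) and falls back to pkg_resources from setuptools on Python 3.7. It only checks that ./models contains something, because the README does not name the first-stage checkpoint file.

```python
import os
import sys

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: fall back to pkg_resources from setuptools
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version

# Versions pinned in the Requirements list above.
PINNED = {
    "torch": "1.10.0",
    "torchvision": "0.11.0",
    "pytorch-lightning": "1.2.5",
}


def installed_version(name):
    """Return the installed version string of `name`, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to
    (expected, found); found is None when the package is not installed."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Local build tags such as "1.10.0+cu113" still count as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


def models_dir_ready(path="./models"):
    """True if the first-stage model directory exists and is not empty."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 7):
        print("note: the README lists python 3.7.0, found", sys.version.split()[0])
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
    if not models_dir_ready():
        print("./models is missing or empty: put the first-stage model there")
```

Passing a custom `lookup` makes the comparison easy to test without touching the real environment.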

Dataset

Please download the HDTF dataset for training and testing, and process it as follows.

Data Preprocessing:

  1. Set all videos to 25 fps.
  2. Extract the audio signals and facial landmarks.
  3. Put the processed data in ./data/HDTF, organized as shown below.
  4. Construct data_train.txt and data_test.txt as shown below.

./data/HDTF:

|——data/HDTF
   |——images
      |——0_0.jpg
      |——0_1.jpg
      |——...
      |——N_M.jpg
   |——landmarks
      |——0_0.lmd
      |——0_1.lmd
      |——...
      |——N_M.lmd
   |——audio_smooth
      |——0_0.npy
      |——0_1.npy
      |——...
      |——N_M.npy
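Steps 1 and 2 and the images folder above can be produced with ffmpeg. The sketch below only builds the command lines, and runs them when `dry_run=False`. The 16 kHz mono wav, the JPEG quality, the ./data/work scratch folder and assigning class indices by sorted file name are assumptions, not settings taken from the repository. It does not cover facial landmarks or the features stored in audio_smooth, whose extractors the README does not name.

```python
import shlex
import subprocess
from pathlib import Path

FPS = 25            # step 1 of the preprocessing list
AUDIO_RATE = 16000  # assumption: sampling rate of the extracted wav


def ffmpeg_commands(video, clip_id, root="./data/HDTF", work="./data/work"):
    """Build (not run) ffmpeg commands for one raw clip.

    clip_id is the class index N; frames are written as
    images/N_0.jpg, images/N_1.jpg, ...
    """
    root, work = Path(root), Path(work)
    video_25 = work / f"{clip_id}_25fps.mp4"
    return [
        # Re-encode the clip at a constant 25 fps.
        ["ffmpeg", "-y", "-i", str(video), "-r", str(FPS), str(video_25)],
        # Dump every frame as a JPEG, numbered from 0 (image2 muxer).
        ["ffmpeg", "-y", "-i", str(video_25), "-q:v", "2",
         "-start_number", "0", str(root / "images" / f"{clip_id}_%d.jpg")],
        # Extract a mono wav track for a (hypothetical) audio-feature step.
        ["ffmpeg", "-y", "-i", str(video_25), "-vn", "-ac", "1",
         "-ar", str(AUDIO_RATE), str(work / f"{clip_id}.wav")],
    ]


def run_all(videos, dry_run=True, **kwargs):
    """Process clips in sorted order, assigning class indices 0, 1, 2, ..."""
    for clip_id, video in enumerate(sorted(videos)):
        for cmd in ffmpeg_commands(video, clip_id, **kwargs):
            # shlex.join needs Python 3.8, so quote manually for 3.7.
            print(" ".join(shlex.quote(c) for c in cmd))
            if not dry_run:
                Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
                subprocess.run(cmd, check=True)
```

For example, `run_all(["raw/a.mp4", "raw/b.mp4"])` prints the six commands so they can be reviewed before running them for real.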

./data/data_train(test).txt:

0_0
0_1
0_2
...
N_M

N is the total number of classes and M is the class size; each entry i_j names sample j of class i, matching the file names in ./data/HDTF.
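The two list files can be generated from the folders instead of by hand. The sketch below is a hypothetical helper: it keeps only names that have an image, a landmark file and an audio feature file, sorts them numerically so that 0_10 comes after 0_2, and holds out whole classes for testing. The README does not say which classes form the test split, so `test_classes` is an assumption to fill in, and file names are assumed to follow the N_M pattern.

```python
from pathlib import Path

ROOT = Path("./data/HDTF")
# Extensions follow the directory tree above.
SUBDIRS = {"images": ".jpg", "landmarks": ".lmd", "audio_smooth": ".npy"}


def sample_paths(name, root=ROOT):
    """Map a sample name such as '0_12' to its image, landmark and audio files."""
    return {sub: Path(root) / sub / f"{name}{ext}" for sub, ext in SUBDIRS.items()}


def complete_samples(root=ROOT):
    """Names N_M present in all three folders, sorted numerically by (N, M)."""
    names = None
    for sub, ext in SUBDIRS.items():
        stems = {p.stem for p in (Path(root) / sub).glob(f"*{ext}")}
        names = stems if names is None else names & stems
    return sorted(names or (), key=lambda s: tuple(int(x) for x in s.split("_")))


def write_splits(test_classes, root=ROOT, out_dir="./data"):
    """Write data_train.txt and data_test.txt, holding out whole classes.

    test_classes: set of class indices N reserved for testing
    (hypothetical choice; not specified by the README).
    """
    train, test = [], []
    for name in complete_samples(root):
        cls = int(name.split("_")[0])
        (test if cls in test_classes else train).append(name)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "data_train.txt").write_text("".join(n + "\n" for n in train))
    (out / "data_test.txt").write_text("".join(n + "\n" for n in test))
    return train, test
```

Holding out whole classes keeps every frame of a test identity out of training; a per-frame split would leak the same identity into both files.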

Training

sh run.sh

Test

sh inference.sh

Weakness

  1. DiffTalk models talking-head generation as an iterative denoising process, so it needs more time to synthesize a frame than most GAN-based approaches. This is a common problem of LDM-based works.
  2. The model is trained on the HDTF dataset and sometimes fails on identities from other datasets.
  3. When a portrait is driven by more challenging cross-identity audio, the audio-lip synchronization of the synthesized video is slightly inferior to that under the self-driven setting.
  4. During inference, the network is also sensitive to the shape of the mask in z_T: the mask needs to cover the mouth region completely, and its shape must not leak any lip-shape information.
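Regarding item 4, one way to meet both conditions is to use a fixed, lip-independent region and check that it covers the mouth; a mask traced along the lip contour would reveal the mouth shape. The sketch below is hypothetical and is not DiffTalk's masking code: the lower-half region, the 68-point landmark layout (mouth at indices 48 to 67) and the downsampling factor are assumptions. For a latent such as z_T, the pixel mask is downsampled so that a latent cell counts as masked if any pixel under it is masked.

```python
import numpy as np

# Assumption: 68-point landmark layout, where indices 48-67 are the mouth.
MOUTH = slice(48, 68)


def lower_half_mask(h, w):
    """Fixed mask (1 = keep, 0 = masked) hiding the lower half of the frame.

    Its shape depends only on the frame size, never on the current lips.
    """
    mask = np.ones((h, w), dtype=np.float32)
    mask[h // 2:, :] = 0.0
    return mask


def covers_mouth(mask, landmarks, margin=2):
    """True if every mouth landmark (x, y), padded by `margin` pixels,
    lies entirely inside the masked (zero) region."""
    h, w = mask.shape
    for x, y in np.asarray(landmarks)[MOUTH]:
        y0, y1 = max(int(y) - margin, 0), min(int(y) + margin + 1, h)
        x0, x1 = max(int(x) - margin, 0), min(int(x) + margin + 1, w)
        if mask[y0:y1, x0:x1].any():
            return False
    return True


def downsample_mask(mask, factor):
    """Reduce a pixel mask to latent resolution (H and W divisible by factor);
    a latent cell is masked if any pixel under it is masked."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).min(axis=(1, 3))


def apply_mask(frame, mask):
    """Zero out the masked region of an (H, W, C) frame."""
    return frame * mask[..., None]
```

Because the region is fixed, two frames with different mouth openings receive identical masks, which is the property item 4 asks for; `covers_mouth` then flags frames where the fixed region is too small.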

Acknowledgement

This code is built upon the publicly available latent-diffusion code. We thank the authors of latent-diffusion for making their excellent work and code publicly available.

Citation

Please cite the following paper if you use this repository in your research.

@inproceedings{shen2023difftalk,
   author={Shen, Shuai and Zhao, Wenliang and Meng, Zibin and Li, Wanhua and Zhu, Zheng and Zhou, Jie and Lu, Jiwen},
   title={DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation},
   booktitle={CVPR},
   year={2023}
}


Contributors

sstzal
