Emotionally Enhanced Talking Face Generation

This repository is the official PyTorch implementation of our paper: Emotionally Enhanced Talking Face Generation. We introduce a multimodal framework that generates lip-synced videos for arbitrary identities, languages, and emotions. The framework comes with a user-friendly web interface that generates emotional talking faces in real time.

📑 Original Paper | 📰 Project Page | 🌀 Demo | ⚡ Live Testing
Paper | Project Page | Demo Video | Interactive Demo

Note: Currently, our web interface uses the CPU to generate results.

Disclaimer

All results from this open-source code or our demo website should be used for research/academic/personal purposes only.

Prerequisites

  • ffmpeg: sudo apt-get install ffmpeg
  • Install necessary packages using pip install -r requirements.txt.
  • Download the pre-trained face detection model to face_detection/detection/sfd/s3fd.pth. Alternative link if the above does not work.

Preparing CREMA-D for training

Download data

Download the data from this repo.

Convert videos to 25 fps

python convertFPS.py -i <raw_video_folder> -o <folder_to_save_25fps_videos>
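As a rough illustration of what this step amounts to, the sketch below builds an ffmpeg command that re-encodes one video at 25 fps. It is not the contents of convertFPS.py: the helper name `build_fps_command` and the choice of ffmpeg's `fps` video filter are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

TARGET_FPS = 25  # frame rate expected by the preprocessing step


def build_fps_command(src: Path, dst: Path, fps: int = TARGET_FPS) -> List[str]:
    """Return an ffmpeg argument list that re-encodes `src` at `fps` frames per second.

    Hypothetical helper: the repository's convertFPS.py may do this differently.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input video
        "-filter:v", f"fps={fps}", # resample the video stream to the target rate
        str(dst),                  # output video
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once per input file, after creating the output folder.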

Preprocess dataset

python preprocess_crema-d.py --data_root <folder_of_25fps_videos> --preprocessed_root preprocessed_dataset/

Train!

There are three major steps: (i) train the expert lip-sync discriminator, (ii) train the emotion discriminator, and (iii) train the EmoGen model.

Training the expert discriminator

python color_syncnet_train.py --data_root preprocessed_dataset/ --checkpoint_dir <folder_to_save_checkpoints>

Training the emotion discriminator

python emotion_disc_train.py -i preprocessed_dataset/ -o <folder_to_save_checkpoints>

Training the main model

python train.py --data_root preprocessed_dataset/ --checkpoint_dir <folder_to_save_checkpoints> --syncnet_checkpoint_path <path_to_expert_disc_checkpoint> --emotion_disc_path <path_to_emotion_disc_checkpoint>
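The order of the three steps matters because the main model consumes the checkpoints produced by both discriminators. The sketch below only assembles the three commands shown above in that order; the function name `build_training_commands` and the dictionary keys are hypothetical, and the checkpoint file paths must be supplied by you, since the names the training scripts write are not specified here.

```python
from typing import Dict, List


def build_training_commands(
    data_root: str,
    ckpt_dirs: Dict[str, str],
    sync_ckpt: str,
    emotion_ckpt: str,
) -> List[List[str]]:
    """Return the three training commands in the order they should be run.

    `ckpt_dirs` maps the (hypothetical) keys "syncnet", "emotion" and "emogen"
    to the folders where each stage saves its checkpoints.
    """
    return [
        # (i) expert lip-sync discriminator
        ["python", "color_syncnet_train.py",
         "--data_root", data_root,
         "--checkpoint_dir", ckpt_dirs["syncnet"]],
        # (ii) emotion discriminator
        ["python", "emotion_disc_train.py",
         "-i", data_root,
         "-o", ckpt_dirs["emotion"]],
        # (iii) main model, which needs a checkpoint from each discriminator
        ["python", "train.py",
         "--data_root", data_root,
         "--checkpoint_dir", ckpt_dirs["emogen"],
         "--syncnet_checkpoint_path", sync_ckpt,
         "--emotion_disc_path", emotion_ckpt],
    ]
```

Steps (i) and (ii) are independent of each other, so they can be run in either order or in parallel before step (iii).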

You can also set additional, less commonly used hyperparameters at the bottom of the hparams.py file.

Note: To keep the code simple, the training scripts use torch.utils.data.random_split to split the CREMA-D dataset into training and testing sets. There is no official train-test split of CREMA-D. Ideally, you should follow this evaluation protocol for splitting.
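A random per-clip split can place clips of the same actor in both sets. The sketch below shows one common alternative, a split by actor, and is not a reproduction of the evaluation protocol referenced above. It assumes clip names start with the CREMA-D actor ID followed by an underscore (for example `1001_DFA_ANG_XX`); the function names, the 20% test fraction, and the seed are illustrative choices.

```python
import random
from typing import Iterable, List, Tuple


def actor_id(clip_name: str) -> str:
    """Return the actor ID, assumed to be the part before the first underscore."""
    return clip_name.split("_")[0]


def split_by_actor(
    clip_names: Iterable[str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split clips so that no actor appears in both the training and testing sets."""
    clips = list(clip_names)
    actors = sorted({actor_id(name) for name in clips})
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(actors)
    n_test = max(1, round(len(actors) * test_fraction))
    test_actors = set(actors[:n_test])
    train = [name for name in clips if actor_id(name) not in test_actors]
    test = [name for name in clips if actor_id(name) in test_actors]
    return train, test
```

The two returned lists could then be used to build separate dataset objects in place of the random split.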

Inference

Comment out these code lines before running inference: line1 and line2.

python inference.py --checkpoint_path <ckpt> --face <video.mp4> --audio <an-audio-source> --emotion <categorical emotion>

The result is saved (by default) in results/{emotion}.mp4. The output path can be set with an argument, as can several other options. The audio source can be any file supported by FFmpeg that contains audio data: *.wav, *.mp3, or even a video file, from which the code will automatically extract the audio. Choose the categorical emotion from this list: [HAP, SAD, FEA, ANG, DIS, NEU].
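When scripting inference over several clips, it can help to validate the emotion label before launching the command. The following is a minimal sketch under that assumption; `build_inference_command` and `default_output_path` are hypothetical helpers, and only the flags and emotion codes listed above are used.

```python
from typing import List, Optional, Sequence

# Categorical emotions accepted by --emotion, as listed above.
EMOTIONS = ("HAP", "SAD", "FEA", "ANG", "DIS", "NEU")


def build_inference_command(
    checkpoint_path: str,
    face: str,
    audio: str,
    emotion: str,
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the inference.py command, rejecting unknown emotion labels early."""
    emotion = emotion.upper()
    if emotion not in EMOTIONS:
        raise ValueError(f"emotion must be one of {EMOTIONS}, got {emotion!r}")
    cmd = [
        "python", "inference.py",
        "--checkpoint_path", checkpoint_path,
        "--face", face,
        "--audio", audio,
        "--emotion", emotion,
    ]
    if extra_args:
        cmd.extend(extra_args)  # e.g. ["--pads", "0", "20", "0", "0"]
    return cmd


def default_output_path(emotion: str) -> str:
    """Return the default location of the generated video."""
    return f"results/{emotion}.mp4"
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.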

Tips for better results:

  • Experiment with the --pads argument to adjust the detected face bounding box. This often improves results. You might need to increase the bottom padding to include the chin region, e.g., --pads 0 20 0 0.
  • If you see the mouth position dislocated or some weird artifacts such as two mouths, then it can be because of over-smoothing the face detections. Use the --nosmooth argument and give it another try.
  • Experiment with the --resize_factor argument, to get a lower-resolution video. Why? The models are trained on faces that were at a lower resolution. You might get better, visually pleasing results for 720p videos than for 1080p videos (in many cases, the latter works well too).
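To make the effect of --pads and --resize_factor concrete, the sketch below shows how such values could be applied. It assumes the pads are ordered top, bottom, left, right (consistent with the example above, where the second value extends the box toward the chin) and that the resize factor divides both frame dimensions; the actual handling inside inference.py may differ, and both function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def pad_box(box: Box, pads: Tuple[int, int, int, int],
            frame_height: int, frame_width: int) -> Box:
    """Grow a detected face box by (top, bottom, left, right) pixels, clamped to the frame."""
    x1, y1, x2, y2 = box
    top, bottom, left, right = pads
    return (
        max(0, x1 - left),
        max(0, y1 - top),
        min(frame_width, x2 + right),
        min(frame_height, y2 + bottom),
    )


def resized_shape(height: int, width: int, resize_factor: int) -> Tuple[int, int]:
    """Return the frame size after dividing both dimensions by `resize_factor`."""
    return height // resize_factor, width // resize_factor
```

For instance, a resize factor of 2 would turn a 1080p frame into a 960x540 one, which is closer to the face resolution the models were trained on.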

Evaluation

Please check the evaluation/ folder for the instructions.

Future Plans

  • Train the model on the MEAD dataset.
  • Develop a metric to evaluate the video quality in case of emotion incorporation.
  • Improve the demo website based on the user study in the paper.

Citation

This repository can only be used for personal/research/non-commercial purposes. Please cite the following paper if you use this repository:

@misc{goyal2023emotionally,
      title={Emotionally Enhanced Talking Face Generation}, 
      author={Sahil Goyal and Shagun Uppal and Sarthak Bhagat and Yi Yu and Yifang Yin and Rajiv Ratn Shah},
      year={2023},
      eprint={2303.11548},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

Copyright (c) 2023 Sahil Goyal, Shagun Uppal, Sarthak Bhagat, Yi Yu, Yifang Yin, Rajiv Ratn Shah

For license information, see the license.

Acknowledgments

The code structure is inspired by Wav2Lip. We thank the authors for the wonderful code. The face detection code is taken from the face_alignment repository. We thank the authors for releasing their code and models. The demo website was developed by @ddhroov10 and @SakshatMali.
