Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

This is the official repository for our ICCV 2023 paper Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis.

Installation

Tested on Ubuntu 18.04, PyTorch 1.12, and CUDA 11.3.

Install dependencies

conda create -n ernerf python=3.10
conda activate ernerf
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.0
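After installation, you can confirm that the pinned versions above actually landed in the environment with Python's standard `importlib.metadata`. This is a minimal sketch, not part of the repository. The package names and versions are copied from the install commands above. The check ignores a local build tag such as `+cu113`, which pip wheels sometimes carry. It skips pytorch3d because that package is installed from Git without a pinned version.

```python
import importlib.metadata as md

# Pinned versions taken from the install commands above.
EXPECTED = {"torch": "1.12.1", "torchvision": "0.13.1", "tensorflow-gpu": "2.8.0"}


def installed_versions(packages=EXPECTED):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = md.version(name)
        except md.PackageNotFoundError:
            found[name] = None
    return found


def _base(version):
    # Drop a local build tag such as "+cu113" before comparing.
    return None if version is None else version.split("+")[0]


def mismatches(found, expected=EXPECTED):
    """Return {name: (installed, expected)} for every package whose version differs."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if _base(found.get(name)) != want
    }


if __name__ == "__main__":
    for name, (have, want) in mismatches(installed_versions()).items():
        print(f"{name}: installed {have}, expected {want}")
```

An empty output means all three pinned packages match.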

Preparation

  • Prepare the face-parsing model.

    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
  • Prepare the 3DMM model for head pose estimation.

    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
  • Download the 3DMM model from Basel Face Model 2009 and convert it:

    cp 01_MorphableModel.mat data_utils/face_tracking/3DMM/
    cd data_utils/face_tracking
    python convert_BFM.py
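Before running the pre-processing script, it helps to confirm that every file fetched by the wget commands above exists and is non-empty. When a download fails, wget with -O can leave a zero-byte file behind. The helper below is a hypothetical sketch, not part of the repository. It assumes you run it from the repository root, and it checks only the wget targets listed above.

```python
from pathlib import Path

# Output paths of the wget commands in the Preparation step, relative to the repo root.
REQUIRED_FILES = [
    "data_utils/face_parsing/79999_iter.pth",
    "data_utils/face_tracking/3DMM/exp_info.npy",
    "data_utils/face_tracking/3DMM/keys_info.npy",
    "data_utils/face_tracking/3DMM/sub_mesh.obj",
    "data_utils/face_tracking/3DMM/topology_info.npy",
]


def missing_files(repo_root="."):
    """Return the required files that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in REQUIRED_FILES:
        path = root / rel
        # A failed download can leave an empty file, so treat size 0 as missing.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    absent = missing_files()
    if absent:
        print("Missing or empty:")
        for rel in absent:
            print("  " + rel)
    else:
        print("All preparation files found.")
```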
    

Datasets and pretrained models

We obtained the experiment videos mainly from AD-NeRF, DFRF, GeneFace, and YouTube. Due to copyright restrictions, we cannot distribute all of them, so you may have to download and crop these videos yourself. Here is an example training video (Obama) from AD-NeRF at a resolution of 450x450.

mkdir -p data/obama
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/obama.mp4

We also provide checkpoints pretrained on the Obama video clip. After completing the data pre-processing step, you can download them and test with:

python main.py data/obama/ --workspace trial_obama/ -O --test --ckpt trial_obama/checkpoints/ngp.pth   # head
python main.py data/obama/ --workspace trial_obama_torso/ -O --test --torso --ckpt trial_obama_torso/checkpoints/ngp.pth   # head+torso

The test results should be about:

Setting       PSNR     LPIPS    LMD
head          35.607   0.0178   2.525
head+torso    26.594   0.0446   2.550
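The metrics in the table work as follows:

- PSNR is derived from the mean squared error between rendered and ground-truth pixels. Higher is better.
- LMD is the mean distance between corresponding facial landmarks. Lower is better.
- LPIPS is a learned perceptual distance. Lower is better. It needs a pretrained network, so it is omitted here.

The sketch below uses the standard PSNR formula on 8-bit pixel values and a plain mean Euclidean landmark distance. It is an illustration only, not the repository's evaluation code. It assumes the landmarks have already been detected and aligned.

```python
import math


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def lmd(pred_landmarks, gt_landmarks):
    """Mean Euclidean distance between corresponding (x, y) landmarks."""
    if not pred_landmarks or len(pred_landmarks) != len(gt_landmarks):
        raise ValueError("landmark lists must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_landmarks, gt_landmarks))
    return total / len(pred_landmarks)
```

Real images would normally be handled as arrays rather than flat lists, but the arithmetic is the same.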

Usage

Pre-processing Custom Training Video

  • Put the training video at data/<ID>/<ID>.mp4.

    The video must be 25 FPS, and every frame must contain the talking person. The resolution should be about 512x512 and the duration about 1-5 minutes.

  • Run the script to process the video (this may take several hours):

    python data_utils/process.py data/<ID>/<ID>.mp4
  • Obtain AU45 for eye blinking:

    Run FeatureExtraction from OpenFace on the training video, then rename and move the output CSV file to data/<ID>/au.csv.
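You can check the video requirements and the AU45 output above with a short script before committing to a multi-hour pre-processing run. This is a hypothetical sketch, and it makes these assumptions:

- The 20% resolution tolerance is an arbitrary reading of "about 512x512".
- `read_video_meta` requires OpenCV (`cv2`) to be installed.
- `read_au45` expects the OpenFace CSV to have an `AU45_r` column. OpenFace may write a space after each comma in the header, so the column names are stripped.

```python
import csv


def check_video_meta(fps, width, height, frame_count):
    """Return human-readable problems with a video's metadata (empty list = looks fine)."""
    issues = []
    if round(fps) != 25:
        issues.append(f"frame rate is {fps:.2f} FPS, expected 25")
    # "About 512x512": the 20% tolerance is an arbitrary choice for this sketch.
    for name, size in (("width", width), ("height", height)):
        if abs(size - 512) > 0.2 * 512:
            issues.append(f"{name} is {size}px, expected about 512")
    minutes = frame_count / fps / 60 if fps > 0 else 0.0
    if not 1 <= minutes <= 5:
        issues.append(f"duration is {minutes:.1f} min, expected about 1-5")
    return issues


def read_video_meta(path):
    """Read (fps, width, height, frame_count) with OpenCV; assumes cv2 is installed."""
    import cv2  # imported here so the other helpers work without OpenCV

    cap = cv2.VideoCapture(str(path))
    try:
        return (
            cap.get(cv2.CAP_PROP_FPS),
            int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
            int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
        )
    finally:
        cap.release()


def read_au45(csv_path):
    """Return the per-frame AU45 intensity column (AU45_r) from an OpenFace CSV."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        # Header names may be padded with spaces (e.g. " AU45_r"), so strip them.
        header = [name.strip() for name in next(reader)]
        col = header.index("AU45_r")
        return [float(row[col]) for row in reader if row]
```

For example, `check_video_meta(*read_video_meta("data/<ID>/<ID>.mp4"))` returns an empty list when the video meets all three requirements.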

Audio Pre-processing

In our paper, we use DeepSpeech features for evaluation. You can select the audio feature type with --asr_model <deepspeech, esperanto, hubert>.

The audio sample rate should be 16 kHz.
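To verify the 16 kHz requirement, you can read the header with Python's standard `wave` module, which handles uncompressed PCM WAV files only. This sketch is an illustration, not part of the repository. If the check fails, resampling with ffmpeg, for example `ffmpeg -i input.wav -ar 16000 data/<name>.wav`, is one option.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) of a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate()


def require_sample_rate(path, expected=16000):
    """Raise ValueError unless the WAV file at `path` is sampled at `expected` Hz."""
    rate = wav_sample_rate(path)
    if rate != expected:
        raise ValueError(f"{path}: sample rate is {rate} Hz, expected {expected} Hz")
    return rate
```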

  • DeepSpeech

    python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/<name>.npy
  • Wav2Vec

    You can also extract audio features with Wav2Vec, as in RAD-NeRF:

    python data_utils/wav2vec.py --wav data/<name>.wav --save_feats # save to data/<name>_eo.npy
  • HuBERT

    In our tests, the HuBERT extractor performs better across more languages; it is also the extractor used in GeneFace.

    # Borrowed from GeneFace; pre-trained on English.
    python data_utils/hubert.py --wav data/<name>.wav # save to data/<name>_hu.npy
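Each extractor writes its features next to the input WAV with a different suffix, and that file is what you later pass to --aud. The hypothetical helper below encodes the naming shown in the comments above. Pairing the Wav2Vec output (`_eo.npy`) with `--asr_model esperanto` is an assumption inferred from the suffix.

```python
from pathlib import Path

# Suffix each extractor appends to the WAV stem (from the commands above).
FEATURE_SUFFIX = {
    "deepspeech": "",    # extract_ds_features.py -> <name>.npy
    "esperanto": "_eo",  # wav2vec.py -> <name>_eo.npy (assumed to match --asr_model esperanto)
    "hubert": "_hu",     # hubert.py -> <name>_hu.npy
}


def feature_path(wav_path, asr_model):
    """Return the .npy feature path expected for `wav_path` under the given extractor."""
    try:
        suffix = FEATURE_SUFFIX[asr_model]
    except KeyError:
        raise ValueError(
            f"unknown asr_model {asr_model!r}; choose from {sorted(FEATURE_SUFFIX)}"
        ) from None
    wav = Path(wav_path)
    return wav.with_name(f"{wav.stem}{suffix}.npy")
```

For example, `feature_path("data/<name>.wav", "hubert")` gives `data/<name>_hu.npy`. The --asr_model value should presumably match the extractor that produced the file.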

Train

The first run will take some time to compile the CUDA extensions.

# train (head, then lip fine-tuning with LPIPS)
python main.py data/obama/ --workspace trial_obama/ -O --iters 100000
python main.py data/obama/ --workspace trial_obama/ -O --iters 125000 --finetune_lips --patch_size 32

# train (torso)
# <head>.pth should be the latest checkpoint in trial_obama
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000
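The torso stage needs the most recent head checkpoint. One way to pick it automatically is to take the newest .pth file by modification time in the head workspace's checkpoints/ folder, the same folder the test commands above load from. This is a hypothetical sketch that does not rely on the repository's checkpoint naming scheme. `torso_command` rebuilds the command shown above as an argument list.

```python
from pathlib import Path


def latest_checkpoint(workspace):
    """Return the most recently modified .pth file in <workspace>/checkpoints."""
    ckpt_dir = Path(workspace) / "checkpoints"
    candidates = list(ckpt_dir.glob("*.pth"))
    if not candidates:
        raise FileNotFoundError(f"no .pth files found in {ckpt_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def torso_command(head_ckpt, data_dir="data/obama/",
                  workspace="trial_obama_torso/", iters=200000):
    """Build the torso training command above as an argv list for subprocess.run."""
    return [
        "python", "main.py", data_dir,
        "--workspace", workspace, "-O", "--torso",
        "--head_ckpt", str(head_ckpt),
        "--iters", str(iters),
    ]
```

Usage: `subprocess.run(torso_command(latest_checkpoint("trial_obama")), check=True)`.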

Test

# test on the test split
python main.py data/obama/ --workspace trial_obama/ -O --test # only render the head and use the ground-truth image for the torso
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test # render both head and torso

Inference with target audio

python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --aud <audio>.npy

Citation

If you find this repository helpful for your project, please cite:

@article{li2023ernerf,
  title={Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis},
  author={Li, Jiahe and Zhang, Jiawei and Bai, Xiao and Zhou, Jun and Gu, Lin},
  journal={arXiv preprint arXiv:2307.09323},
  year={2023}
}

Acknowledgement

This code builds on RAD-NeRF, DFRF, GeneFace, and AD-NeRF. Thanks to these great projects.
