ObamaNet : Lip Sync from Audio

Requirements

You can install the requirements by running the following command:

sudo pip3 install -r requirements.txt

The project is built for Python 3.5 and above. The main libraries are listed below:

  • OpenCV (sudo pip3 install opencv-contrib-python)
  • Dlib (sudo pip3 install dlib) with this file unzipped in the data folder
  • Python Speech Features (sudo pip3 install python-speech-features)

For the complete list, refer to the requirements.txt file.

I used the tools below to extract and manipulate the data:

Data Extraction


I extracted the data from YouTube using youtube-dl, which is perhaps the best YouTube downloader on Linux. The commands for extracting particular streams are given below.

  • Subtitle Extraction
youtube-dl --sub-lang en --skip-download --write-sub --output '~/obamanet/data/captions/%(autonumber)s.%(ext)s' --batch-file ~/obamanet/data/obama_addresses.txt --ignore-config
  • Video Extraction
youtube-dl --batch-file ~/obamanet/data/obama_addresses.txt -o '~/obamanet/data/videos/%(autonumber)s.%(ext)s' -f "best[height=720]" --autonumber-start 1

(Videos not available in 720p: 165)

  • Video to Audio Conversion
python3 vid2wav.py
  • Video to Images
ffmpeg -i 00001.mp4 -r 1/5 -vf scale=-1:720 images/00001-$filename%05d.bmp
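
The contents of vid2wav.py are not reproduced in this README. As a rough sketch only, the video-to-audio step listed above could be done by calling ffmpeg once per downloaded video. The function names, the folder arguments, and the 16 kHz mono output are assumptions made for illustration, not the repository's actual settings.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, audio_dir, sample_rate=16000):
    """Build an ffmpeg command that writes a mono WAV file for one video.

    The sample rate and the mono down-mix are assumed defaults, not values
    taken from the repository's vid2wav.py.
    """
    video_path = Path(video_path)
    wav_path = Path(audio_dir) / (video_path.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite an existing output file
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # audio sample rate in Hz
        str(wav_path),
    ]


def convert_all(video_dir, audio_dir):
    """Convert every .mp4 file in video_dir to a .wav file in audio_dir."""
    Path(audio_dir).mkdir(parents=True, exist_ok=True)
    for video in sorted(Path(video_dir).glob("*.mp4")):
        subprocess.run(build_ffmpeg_command(video, audio_dir), check=True)
```

For example, convert_all("data/videos", "data/audios") would write data/audios/00001.wav for data/videos/00001.mp4, provided ffmpeg is on the PATH.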

To convert the images from BMP to JPG, run the following in the directory that contains them:

mogrify -format jpg *.bmp
rm -rf *.bmp
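
Because rm -rf *.bmp deletes the originals unconditionally, it can be worth checking first that every BMP has a converted JPG next to it. The following is a minimal sketch of such a check. The function name is hypothetical, and it assumes mogrify wrote each .jpg alongside its .bmp with the same base name.

```python
from pathlib import Path


def remove_converted_bmps(directory):
    """Delete each .bmp that has a non-empty .jpg with the same base name.

    Returns the list of .bmp files that were kept because no matching
    .jpg was found.
    """
    kept = []
    for bmp in sorted(Path(directory).glob("*.bmp")):
        jpg = bmp.with_suffix(".jpg")
        if jpg.is_file() and jpg.stat().st_size > 0:
            bmp.unlink()       # safe to remove: a converted copy exists
        else:
            kept.append(bmp)   # leave it in place for a retry
    return kept
```

Any file names returned by remove_converted_bmps(".") point at images that still need converting.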

Copy the patched images into folder a and the cropped images into folder b, then combine and split them:

python3 tools/process.py --input_dir a --b_dir b --operation combine --output_dir c
python3 tools/split.py --dir c
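
Before running the combine step, it may help to confirm that folders a and b contain matching images. The helper below is a hypothetical sketch, not part of tools/process.py, and it assumes that images are paired by base name (the file name without its extension).

```python
from pathlib import Path


def unpaired_images(a_dir, b_dir):
    """Return base names present in only one of the two folders.

    The result is a pair of sorted lists: (only_in_a, only_in_b).
    """
    a_names = {p.stem for p in Path(a_dir).iterdir() if p.is_file()}
    b_names = {p.stem for p in Path(b_dir).iterdir() if p.is_file()}
    return sorted(a_names - b_names), sorted(b_names - a_names)
```

If both returned lists are empty, every image in a has a counterpart in b.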

You may use this pretrained model or train pix2pix from scratch using this dataset. Unzip the dataset into the pix2pix main directory.

python3 pix2pix.py --mode train --output_dir output --max_epochs 200 --input_dir c/train/ --which_direction AtoB

To run the trained pix2pix model:

python3 pix2pix.py --mode test --output_dir test_out/ --input_dir c_test/ --checkpoint output/

To convert the output images to a video:

ffmpeg -r 30 -f image2 -s 256x256 -i %d-targets.png -vcodec libx264 -crf 25 ../targets.mp4
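
ffmpeg's image sequence input expects the numbers matched by %d to be consecutive, so a missing frame can cut the video short. As a small sketch (the function name and the idea of running it before ffmpeg are assumptions, not part of the original workflow), the gaps in a folder of N-targets.png files can be listed like this:

```python
import re
from pathlib import Path


def missing_frame_numbers(directory, suffix="-targets.png"):
    """Return the frame numbers missing between the lowest and highest frame."""
    pattern = re.compile(r"^(\d+)" + re.escape(suffix) + r"$")
    numbers = set()
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    # Every integer in the closed range should be present exactly once.
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]
```

An empty list means the numbering has no gaps between the first and last frame found.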

Pretrained Model

The pretrained model and a subset of the data are available here: Link

Download and extract the checkpoints and the data folders into the repository. The file structure should look as shown below.

obamanet
|
└─ data
|   | audios
|   | a2key_data
|   ...
|
└─ checkpoints
|   | output
|   | model.h5
|   ...
└─ train.py
└─ run.py
└─ run.sh
...
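
As a quick sanity check after extracting the archives, a short script can confirm that the paths shown in the tree above exist. This is only a sketch: the list of paths is copied from the tree, the entries elided by ... are not checked, and the function name is hypothetical.

```python
from pathlib import Path

# Paths taken from the tree above; entries hidden behind "..." are not listed.
EXPECTED_PATHS = [
    "data/audios",
    "data/a2key_data",
    "checkpoints/output",
    "checkpoints/model.h5",
    "train.py",
    "run.py",
    "run.sh",
]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

Calling missing_paths() from the repository root returns an empty list when everything is in place.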

Running sample wav file

Run the following command:

bash run.sh <relative_path_to_audio_wav_file>

Example:

bash run.sh data/audios/karan.wav

Feel free to experiment with different voices. However, the results will depend on how close the voice is to that of the subject the model was trained on.

Citation


If you use this code for your research, please cite the paper it is based on, "ObamaNet: Photo-realistic lip-sync from text", as well as the amazing pix2pix repository by affinelayer.

Cite as arXiv:1801.01442v1 [cs.CV]
