
Lips Don't Lie: FAST Visual-Lip-Reading Deep Learning model


Tom Bekor, Mitchell Butovsky

Final project as a part of Technion's IEM 097215 "Deep learning for NLP" & EE 046211 "Deep Learning" 🌠.

Implemented in PyTorch 🔥.

Description 👄

In this project we combine the BlazeFace algorithm with the transformer architecture and achieve near state-of-the-art performance on the GRID dataset, with very fast training and inference.

The Repository 🧭

Here is a short explanation of the structure of this repository:

  • videos/[speaker_id] and alignments/[speaker_id] contain the raw data from the GRID dataset: videos and word alignments, respectively.
  • npy_landmarks and npy_alignments contain the processed videos and alignments. Pre-processing is done automatically by running preprocess.py. The pre-processing mechanism itself is split between Video.py, which pre-processes the videos, and Annotation.py, which pre-processes the alignments.
  • dataloader.py contains data loaders for both training and testing, as well as a tokenizer that prepares the data for the transformer. Tokenization uses vocab.txt, which contains all the possible tokens as well as the <pad>, <sos> and <eos> tokens.
  • model.py contains our architecture, divided into the Transformer and an additional Landmarks Neural Net module.
  • run.py is the main file of our project. It trains the architecture and then generates predictions on unseen test samples.
  • config.py contains all the constants and hyper-parameters used in the project.
  • Finally, inference.py is used to make predictions using the pre-trained models.
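To make the tokenization step more concrete, here is a minimal sketch of a word-level tokenizer that wraps a sentence in <sos>/<eos> and pads it with <pad>. The class name `WordTokenizer`, its methods, the token ids, and the one-token-per-line layout of vocab.txt are assumptions made for illustration; the actual tokenizer in dataloader.py may differ.

```python
from typing import Iterable, List

# Special tokens named in the README; their order and ids here are an assumption.
SPECIAL_TOKENS = ["<pad>", "<sos>", "<eos>"]


class WordTokenizer:
    """Hypothetical word-level tokenizer; not the class used in dataloader.py."""

    def __init__(self, tokens: Iterable[str]) -> None:
        vocab: List[str] = list(SPECIAL_TOKENS)
        for token in tokens:
            token = token.strip()
            # Skip blanks and duplicates (vocab.txt may already list the specials).
            if token and token not in vocab:
                vocab.append(token)
        self.id_to_token = vocab
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}

    @classmethod
    def from_file(cls, path: str) -> "WordTokenizer":
        # Assumes vocab.txt stores one token per line.
        with open(path, encoding="utf-8") as handle:
            return cls(handle)

    def encode(self, sentence: str, max_len: int) -> List[int]:
        """Wrap the words in <sos>/<eos> and right-pad with <pad> to max_len."""
        words = ["<sos>", *sentence.lower().split(), "<eos>"]
        if len(words) > max_len:
            raise ValueError("sentence does not fit in max_len tokens")
        ids = [self.token_to_id[w] for w in words]
        return ids + [self.token_to_id["<pad>"]] * (max_len - len(ids))

    def decode(self, ids: Iterable[int]) -> str:
        """Map ids back to words, skipping specials and stopping at <eos>."""
        words = []
        for idx in ids:
            token = self.id_to_token[idx]
            if token == "<eos>":
                break
            if token not in SPECIAL_TOKENS:
                words.append(token)
        return " ".join(words)


# Toy vocabulary built from one GRID-style sentence.
tokenizer = WordTokenizer(["bin", "blue", "at", "f", "two", "now"])
encoded = tokenizer.encode("bin blue at f two now", max_len=10)
decoded = tokenizer.decode(encoded)
```

Padding every sentence to a fixed length lets the data loader stack targets into a single tensor, and the <pad> id can then be masked out in the loss.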

Running The Project 🏃

Inference 🔎

To predict the transcript of GRID corpus videos, put them in the examples/videos directory and run inference.py. To change the path, or to run inference on a single video, edit the last line of inference.py.

Important: remember to download our pretrained models here, or create them by running run.py.
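Since inference can target either a folder of videos or a single video, a small helper that resolves both cases may be useful. The sketch below is only an illustration: `collect_videos` is a hypothetical function that does not come from inference.py, and the `.mpg` extension is taken from the GRID path format used elsewhere in this README.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_videos(path: str, suffix: str = ".mpg") -> List[Path]:
    """Return the video files to run inference on (hypothetical helper).

    Accepts either one video file or a directory such as examples/videos.
    """
    target = Path(path)
    if target.is_file():
        return [target]
    if target.is_dir():
        # Keep only files with the expected extension, in a stable order.
        return sorted(p for p in target.iterdir() if p.suffix.lower() == suffix)
    raise FileNotFoundError(f"no such file or directory: {path}")


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.mpg", "a.mpg", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in collect_videos(tmp)]
    single = [p.name for p in collect_videos(str(Path(tmp) / "a.mpg"))]
```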

Training 🏋️

In order to train the model with the preprocessed videos:

  1. Unzip the preprocessed GRID dataset. On Linux, use the command unzip npy_folders.zip. Make sure that both the npy_landmarks and npy_alignments directories are located in your project root.
  2. Run run.py. Training and validation metrics will be saved under the metrics directory.
  3. To check the test-set word accuracy, run test-evaluation.py.
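For readers unfamiliar with the metric, the sketch below shows one plausible way to compute word accuracy: the fraction of reference words that the prediction matches at the same position. This definition and the function name `word_accuracy` are assumptions; test-evaluation.py may compute the metric differently.

```python
from typing import Sequence


def word_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of reference words matched at the same position (assumed metric)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    total = 0
    for pred, ref in zip(predictions, references):
        pred_words = pred.split()
        ref_words = ref.split()
        total += len(ref_words)
        # zip stops at the shorter sentence, so missing words count as errors.
        correct += sum(p == r for p, r in zip(pred_words, ref_words))
    return correct / total if total else 0.0


# Two GRID-style sentences; the second prediction gets two of six words wrong.
refs = ["bin blue at f two now", "lay green by a four please"]
preds = ["bin blue at f two now", "lay green with a four soon"]
accuracy = word_accuracy(preds, refs)
```

A position-wise comparison is a reasonable fit for GRID because its sentences follow a fixed word order; for free-form text an edit-distance-based word error rate would be the more usual choice.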

In order to train the models from scratch:

  1. Download the desired videos to train on from the GRID corpus, which can be found here. Make sure that you download the high quality videos and the corresponding word alignments.

  2. Put the videos in the project directory according to the following path format: videos/[speaker_id]/[video.mpg].

    Put the alignments according to the following path format: alignments/[speaker_id]/[alignment.align].

  3. Change the SPEAKERS attribute in the config.py file to a list containing all the speaker ids to train on.

  4. Run preprocess.py. This might take a while.

  5. Run run.py.

  6. To check the test-set word accuracy, run test-evaluation.py.
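Before starting the long pre-processing run, it can help to confirm that the directory layout and the speaker list described above line up. The following sketch pairs each video with its alignment file for a list of speakers. The function name, the speaker id `s1`, the file names, and the assumption that a video and its alignment share the same file stem are all illustrative and not taken from preprocess.py.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def pair_videos_with_alignments(
    root: str, speakers: List[str]
) -> Dict[str, List[Tuple[Path, Path]]]:
    """Match videos/[speaker_id]/x.mpg with alignments/[speaker_id]/x.align."""
    pairs: Dict[str, List[Tuple[Path, Path]]] = {}
    for speaker in speakers:
        video_dir = Path(root) / "videos" / speaker
        align_dir = Path(root) / "alignments" / speaker
        matched = []
        for video in sorted(video_dir.glob("*.mpg")):
            # Assumption: the alignment file reuses the video's file stem.
            alignment = align_dir / (video.stem + ".align")
            if alignment.is_file():
                matched.append((video, alignment))
        pairs[speaker] = matched
    return pairs


# Demonstration on a throwaway project root with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "videos" / "s1").mkdir(parents=True)
    (root / "alignments" / "s1").mkdir(parents=True)
    (root / "videos" / "s1" / "bbaf2n.mpg").touch()
    (root / "videos" / "s1" / "bbaf3s.mpg").touch()  # has no alignment file
    (root / "alignments" / "s1" / "bbaf2n.align").touch()
    SPEAKERS = ["s1"]  # stands in for the SPEAKERS attribute in config.py
    result = pair_videos_with_alignments(tmp, SPEAKERS)
    names = [(v.name, a.name) for v, a in result["s1"]]
```

A speaker whose list comes back empty, or shorter than expected, points to a misplaced folder or missing alignment files.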

Libraries to Install 📚

Before trying to run anything please make sure to install all the packages below.

| Library | Command to Run | Minimal Version |
|---|---|---|
| NumPy | pip install numpy | 1.19.5 |
| matplotlib | pip install matplotlib | 3.3.4 |
| PyTorch | pip install torch | 1.1.10 |
| OpenCV | pip install opencv-python | 4.5.4 |
| DLib | pip install dlib | 19.22.1 |
| scikit-learn | pip install scikit-learn | 0.24.2 |
| tqdm | pip install tqdm | 4.62.3 |
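To verify an environment against the minimal versions listed above, a short script along these lines could be used. It is a sketch, not part of the repository: `parse_version` and `check_requirements` are hypothetical helpers, the version numbers are copied as listed, and the comparison only looks at the leading numeric components of each version string.

```python
from importlib import metadata
from typing import Dict, Tuple

# pip distribution names and minimal versions, copied as listed above.
MINIMUM_VERSIONS = {
    "numpy": "1.19.5",
    "matplotlib": "3.3.4",
    "torch": "1.1.10",
    "opencv-python": "4.5.4",
    "dlib": "19.22.1",
    "scikit-learn": "0.24.2",
    "tqdm": "4.62.3",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.19.5' or '1.10.0+cu113' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums: Dict[str, str]) -> Dict[str, str]:
    """Report 'ok', 'missing', or 'too old' for every listed distribution."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements(MINIMUM_VERSIONS).items():
        print(f"{package}: {status}")
```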
