Giter VIP home page Giter VIP logo

masumbhuiyan / learn-an-effective-lip-reading-model-without-pains Goto Github PK

View Code? Open in Web Editor NEW

This project forked from vipl-audio-visual-speech-understanding/learn-an-effective-lip-reading-model-without-pains

0.0 1.0 0.0 119 KB

The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the state-of-art performance in LRW-1000 dataset.

Python 100.00%

learn-an-effective-lip-reading-model-without-pains's Introduction

Learn an Effective Lip Reading Model without Pains

PWC PWC

Content

Introduction

This is the repository of Learn an Effective Lip Reading Model without Pains. In this repository, we provide pre-trained models and training settings for deep lip reading. We evaluate our pipeline on LRW Dataset and LRW1000 Dataset. We obtain 88.4% and 56.0% on LRW and LRW-1000, respectively. The results are comparable and even surpass current state-of-the-art results. Especially, we reach the current state-of-the-art result (56.0%) on LRW-1000 Dataset.

Benchmark

Year Method LRW LRW-1000
2017 Chung et al. 61.1% 25.7%
2017 Stafylakis et al. 83.5% 38.2%
2018 Stafylakis et al. 88.8% -
2019 Yang et at. - 38.19%
2019 Wang et al. 83.3% 36.9%
2019 Weng et al. 84.1% -
2020 Luo et al. 83.5% 38.7%
2020 Zhao et al. 84.4% 38.7%
2020 Zhang et al. 85.0% 45.2%
2020 Martinez et al. 85.3% 41.4%
2020 Ma et al. 87.7% 43.2%
2020 ResNet18 + BiGRU (Baseline + Cosine LR) 85.0% 47.1%
2020 ResNet18 + BiGRU (Baseline with word boundary + Cosine LR) 87.5% 55.0%
2020 Our Method 86.2% 48.3%
2020 Our Method (with word boundary) 88.4% 56.0%

Dataset Preparation

  1. Download LRW Dataset and LRW1000 Dataset and link lrw_mp4 and LRW1000_Public in the root of this repository:
ln -s PATH_TO_DATA/lrw_mp4 .
ln -s PATH_TO_DATA/LRW1000_Public .
  1. You can run scripts/prepare_lrw.py and scripts/prepare_lrw1000.py to generate training samples of LRW and LRW-1000 Dataset respectively:
python scripts/prepare_lrw.py
python scripts/prepare_lrw1000.py 

The mouth videos, labels, and word boundary information will be saved in the .pkl format. We pack image sequence as jpeg format into our .pkl files and decoding via PyTurboJPEG. You may need to modify the utils/dataset.py file when training your own dataset.

How to test

Link of pretrained weights: Baidu Yun (code: 26qn), Google Drive

If you can not access to provided links, please email [email protected] or [email protected].

To test our provided weights, you should download weights and place them in the root of this repository.

For example, to test baseline on LRW Dataset:

python main_visual.py \
    --gpus='0'  \
    --lr=0.0 \
    --batch_size=128 \
    --num_workers=8 \
    --max_epoch=120 \
    --test=True \
    --save_prefix='checkpoints/lrw-baseline/' \
    --n_class=500 \
    --dataset='lrw' \
    --border=False \
    --mixup=False \
    --label_smooth=False \
    --se=False \
    --weights='checkpoints/lrw-cosine-lr-acc-0.85080.pt'

To test our model in LRW-1000 Dataset:

python main_visual.py \
    --gpus='0'  \
    --lr=0.0 \
    --batch_size=128 \
    --num_workers=8 \
    --max_epoch=120 \
    --test=True \
    --save_prefix='checkpoints/lrw-1000-final/' \
    --n_class=1000 \
    --dataset='lrw1000' \
    --border=True \
    --mixup=False \
    --label_smooth=False \
    --se=True \
    --weights='checkpoints/lrw1000-border-se-mixup-label-smooth-cosine-lr-wd-1e-4-acc-0.56023.pt'

How to train

For example, to train lrw baseline:

python main_visual.py \
    --gpus='0,1,2,3'  \
    --lr=3e-4 \
    --batch_size=400 \
    --num_workers=8 \
    --max_epoch=120 \
    --test=False \
    --save_prefix='checkpoints/lrw-baseline/' \
    --n_class=500 \
    --dataset='lrw' \
    --border=False \
    --mixup=False \
    --label_smooth=False \
    --se=False  

Optional arguments:

  • gpus: the GPU id used for training
  • lr: learning rate. By default, we automatically applied the Linear Scale Rule in code (e.g., lr=3e-4 for 4 GPUs x 32 video/gpu and lr=1.2e-3 for 8 GPUs x 128 video/gpu). We recommend lr=3e-4 for 32 video/gpu when training from scratch. You need to modify the learning rate based on your setting.
  • batch_size: batch size
  • num_workers: the number of processes used for data loading
  • max_epoch: the maximum epochs in training
  • test: The test mode. When using this mode, the program will only test once and exit.
  • weights(optional): The path of pre-trained weight. If this option is specified, the model will load the pre-trained weights by the given location.
  • save_prefix: the save prefix of model parameters
  • n_class: the number of total word classes
  • dataset: the dataset used for training and testing, only lrw and lrw1000 are supported.
  • border: use word boundary indicated variable for training and testing
  • mixup: use mixup in training
  • label_smooth: use label_smooth in training
  • se: use se module in ResNet-18

More training details and setting can be found in our paper. We plan to include more pretrained models in the future.

Dependencies

Citation

If you find this code useful in your research, please consider to cite the following papers:

@article{feng2020learn,
  author       = "Feng, Dalu and Yang, Shuang and Shan, Shiguang and Chen, Xilin",
  title        = "Learn an Effective Lip Reading Model without Pains",
  journal      = "arXiv preprint arXiv:2011.07557",
  year         = "2020",
}

License

The MIT License

learn-an-effective-lip-reading-model-without-pains's People

Contributors

fengdalu avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.