Distinguishing Homophenes using Multi-head Visual-audio Memory for Lip Reading

This repository contains the PyTorch implementation of the following paper:

Distinguishing Homophenes using Multi-head Visual-audio Memory for Lip Reading
Minsu Kim, Jeong Hun Yeo, and Yong Man Ro
[Paper]

Preparation

Requirements

python 3.7
pytorch 1.6 ~ 1.8
torchvision
torchaudio
ffmpeg
av
tensorboard
scikit-image
pillow

Datasets

LRW dataset can be downloaded from the below link.

https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrw1.html

The pre-processing will be done in the data loader.
The video is cropped with the bounding box [x1:59, y1:95, x2:195, y2:231].

Testing the Model

To test the model, run following command:

# Testing example for LRW
python test.py \
--lrw 'enter_data_path' \
--checkpoint 'enter_the_checkpoint_path' \
--batch_size 80 \
--radius 16 --n_slot 112 --head 8 \
--test_aug \
--gpu 0

Descriptions of training parameters are as follows:

--lrw: training dataset location (lrw)
--checkpoint: the checkpoint file
--batch_size: batch size
--test_aug: whether performing test time augmentation --distributed: Use DataDistributedParallel --dataparallel: Use DataParallel
--gpu: gpu for using --lr: learning rate --n_slot: memory slot size --radius: scaling factor for addressing score --head: number of heads for visual-audio memory
Refer to test.py for the other testing parameters

Pretrained Models

You can download the pretrained models.
Put the ckpt in './data/'

Pretrained model

https://kaistackr-my.sharepoint.com/:u:/g/personal/ms_k_kaist_ac_kr/Ed3LRWN_GYZOmG_vZVxagaIB9r3onu1S24QM1VDiE-ZoCg?e=YU41W9

To test the pretrained model, run following command:

# Testing example for LRW
python test.py \
--lrw 'enter_data_path' \
--checkpoint ./data/Pretrained_Ckpt.ckpt \
--batch_size 80
--radius 16 --slot 112 --head 8 \
--test_aug \
--gpu 0

Architecture	Acc.
Resnet18 + MS-TCN + Multi-head Visual-audio Mem	88.508

Training the Model

To train the model, run following command:

# One GPU Training example for LRW
python train.py \
--lrw 'enter_data_path' \
--checkpoint 'enter_the_checkpoint_path' \
--batch_size 40 \
--radius 16 --n_slot 112 --head 8 \
--augmentations \
--mixup \
--gpu 0

# Dataparallel GPU Training example for LRW
python train.py \
--lrw 'enter_data_path' \
--checkpoint 'enter_the_checkpoint_path' \
--batch_size 80 \
--radius 16 --n_slot 112 --head 8 \
--augmentations \
--mixup \
--dataparallel \
--gpu 0,1

# Distributed Dataparallel GPU Training example for LRW
python -m torch.distributed.launch --nproc_per_node='# of GPU' train.py \
--lrw 'enter_data_path' \
--checkpoint 'enter_the_checkpoint_path' \
--batch_size 40 \
--radius 16 --n_slot 112 --head 8 \
--augmentations \
--mixup \
--distributed \
--gpu 0,1,2,3

Descriptions of training parameters are as follows:

--lrw: training dataset location (lrw)
--checkpoint_dir: the location for saving checkpoint
--checkpoint: the checkpoint file
--batch_size: batch size
--augmentation: whether performing augmentation
--distributed: Use DataDistributedParallel
--dataparallel: Use DataParallel
--mixup: Use mixup augmentation
--gpu: gpu for using --lr: learning rate --n_slot: memory slot size --radius: scaling factor for addressing score --head: number of heads for visual-audio memory
Refer to train.py for the other training parameters

Citation

If you find this work useful in your research, please cite the paper:

@inproceedings{kim2022distinguishing,
  title={Distinguishing Homophenes using Multi-head Visual-audio Memory for Lip Reading},
  author={Kim, Minsu and Yeo, Jeong Hun and Ro, Yong Man},
  booktitle={Proceedings of the 36th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada},
  volume={22},
  year={2022}
}

akshayroy / multi-head-visual-audio-memory Goto Github PK

multi-head-visual-audio-memory's Introduction

Distinguishing Homophenes using Multi-head Visual-audio Memory for Lip Reading

Preparation

Requirements

Datasets

Testing the Model

Pretrained Models

Training the Model

Citation

multi-head-visual-audio-memory's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent