RAVEn: A PyTorch Lightning Implementation

Introduction

We provide code for the reproduction of the main results in Jointly Learning Visual and Auditory Speech Representations from Raw Data. Our implementation is based on PyTorch Lightning.

Preparation

Installation

Create the environment with conda env create -f environment.yml. If necessary, change the environment prefix to match the location of miniconda3.

Data

The datasets used in the paper can be downloaded from the following links:
- LRS3
- VoxCeleb2
- LRS2

Compute 68 landmarks per frame using, e.g., RetinaFace and 2-D FAN, or download them, e.g., from this repo. Each landmark file should have the same name as its corresponding video, except that it ends in .npy.
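
Before cropping, it can help to confirm that every video has a landmark file with the matching name. The following is a minimal sketch, not part of this repo: the function name find_missing_landmarks is hypothetical, and it assumes that videos end in .mp4 and that the landmarks directory mirrors the sub-directory layout of the video directory.

```python
from pathlib import Path


def find_missing_landmarks(video_dir, landmarks_dir, video_ext=".mp4"):
    """Return relative paths of videos that have no matching .npy landmark file.

    Assumptions (not confirmed by the repo): videos use `video_ext`, and
    `landmarks_dir` mirrors the sub-directory layout of `video_dir`.
    """
    video_dir = Path(video_dir)
    landmarks_dir = Path(landmarks_dir)
    missing = []
    for video in sorted(video_dir.rglob(f"*{video_ext}")):
        rel = video.relative_to(video_dir)
        # Same name as the video, except that it ends in .npy
        expected = landmarks_dir / rel.with_suffix(".npy")
        if not expected.is_file():
            missing.append(rel)
    return missing
```

Calling find_missing_landmarks(SOURCE_DIR, LANDMARKS_DIR) with the same directories passed to the cropping script returns an empty list when every video has its landmark file.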

Use the following command to crop the mouths:

python preprocessing/extract_mouths.py --src_dir ${SOURCE_DIR} --tgt_dir ${TARGET_DIR} --landmarks_dir ${LANDMARKS_DIR}

RAVEn pre-trained models

Below are the checkpoints of the Base and Large models pre-trained with RAVEn on LRS3+Vox2-en.

Model	Modality	Checkpoint
Base	Video	Download
Base	Audio	Download
Large	Video	Download
Large	Audio	Download

Testing

Below are the checkpoints corresponding to Tables 1 and 2 for VSR and ASR on LRS3. Models are provided for both low- and high-resource labelled data settings. In the high-resource setting, the models are fine-tuned on the full LRS3 dataset (433 hours). In the low-resource setting, they are fine-tuned on a subset ("trainval") of LRS3 (30 hours).
In some cases, the models were re-trained, so the WERs may differ slightly from those reported in the paper (which are also reproduced below).
The paths for the slurm bash scripts used for inference are shown in the table below. Note that the scripts may need to be modified according to the cluster environment.
The language model we used in this work can be found here.
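
For reference, the WER reported in the tables below is the word error rate: the word-level edit distance (substitutions, deletions, and insertions) between the hypothesis and the reference, divided by the number of reference words. The following is a minimal, illustrative sketch for a single utterance. It is not the evaluation code used by the inference scripts, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER (in %) of `hypothesis` against `reference`.

    Illustrative only: splits on whitespace and applies no text
    normalisation, which real evaluation pipelines usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

A WER over a whole test set is commonly computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.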

VSR

Low-resource

Model	Pre-training dataset	WER (%)	Checkpoint	Bash script
Base	LRS3	47.0	Download	scripts/vsr/lrs3_trainval/base_lrs3.sh
Base	LRS3+Vox2-en	40.2	Download	scripts/vsr/lrs3_trainval/base_lrs3vox2.sh
Large	LRS3+Vox2-en	32.5	Download	scripts/vsr/lrs3_trainval/large_lrs3vox2.sh
Large w/ ST	LRS3+Vox2-en	24.8	Download	scripts/vsr/lrs3_trainval/large_lrs3vox2_self.sh
Large w/ ST + LM	LRS3+Vox2-en	23.8	same as row above	scripts/vsr/lrs3_trainval/large_lrs3vox2_self_lm.sh

High-resource

Model	Pre-training dataset	WER (%)	Checkpoint	Bash script
Base	LRS3	39.1	Download	scripts/vsr/lrs3/base_lrs3.sh
Base	LRS3+Vox2-en	33.1	Download	scripts/vsr/lrs3/base_lrs3vox2.sh
Large	LRS3+Vox2-en	27.8	Download	scripts/vsr/lrs3/large_lrs3vox2.sh
Large w/ ST	LRS3+Vox2-en	24.4	Download	scripts/vsr/lrs3/large_lrs3vox2_self.sh
Large w/ ST + LM	LRS3+Vox2-en	23.1	same as row above	scripts/vsr/lrs3/large_lrs3vox2_self_lm.sh

ASR

Low-resource

Model	Pre-training dataset	WER (%)	Checkpoint	Bash script
Base	LRS3	4.7	Download	scripts/asr/lrs3_trainval/base_lrs3.sh
Base	LRS3+Vox2-en	3.8	Download	scripts/asr/lrs3_trainval/base_lrs3vox2.sh
Large	LRS3+Vox2-en	2.7	Download	scripts/asr/lrs3_trainval/large_lrs3vox2.sh
Large w/ ST	LRS3+Vox2-en	2.3	Download	scripts/asr/lrs3_trainval/large_lrs3vox2_self.sh
Large w/ ST + LM	LRS3+Vox2-en	1.9	same as row above	scripts/asr/lrs3_trainval/large_lrs3vox2_self_lm.sh

High-resource

Model	Pre-training dataset	WER (%)	Checkpoint	Bash script
Base	LRS3	2.2	Download	scripts/asr/lrs3/base_lrs3.sh
Base	LRS3+Vox2-en	1.9	Download	scripts/asr/lrs3/base_lrs3vox2.sh
Large	LRS3+Vox2-en	1.4	Download	scripts/asr/lrs3/large_lrs3vox2.sh
Large w/ ST	LRS3+Vox2-en	1.4	Download	scripts/asr/lrs3/large_lrs3vox2_self.sh
Large w/ ST + LM	LRS3+Vox2-en	1.4	same as row above	scripts/asr/lrs3/large_lrs3vox2_self_lm.sh

Code for pre-training and fine-tuning coming soon...

Citation

If you find this repo useful for your research, please consider citing the following:

@article{haliassos2022jointly,
  title={Jointly Learning Visual and Auditory Speech Representations from Raw Data},
  author={Haliassos, Alexandros and Ma, Pingchuan and Mira, Rodrigo and Petridis, Stavros and Pantic, Maja},
  journal={arXiv preprint arXiv:2212.06246},
  year={2022}
}
