
two-headed-master

Description:

This branch contains the working code for the master's project focusing on language representation and modelling for Swiss German ASR (2019/2020).

The scripts are an implementation of a basic ASR framework based on Kaldi and were originally developed by Spitch AG, with the following functionality:

  • Neural network acoustic model training.
  • WFST lingware compilation.
  • Evaluation.

The Kaldi (version 5.5) recipe egs/wsj/s5 (commit 8cc5c8b32a49f8d963702c6be681dcf5a55eeb2e) was used as a reference.

Main scripts:

run_archimob.sh: acoustic model training

run_archimob.sh <archimob_input_csv> <archimob_wav_files_directory> <am_output_directory> <transcription_type> <pronunciation_lexicon>

compile_and_decode.sh: lingware compilation and validation

compile_and_decode.sh <arpa_lm> <am_output_directory> <archimob_dev_csv> <archimob_wav_files_directory> <lw_output_directory> <transcription_type> <lmwt_params> <flexwer_mapping>

evaluate.sh: test set decoding and evaluation

evaluate.sh <archimob_test_csv> <archimob_wav_files_directory> <am_output_directory> <lw_output_directory> <eval_output_directory> <lmwt> <transcription_type> <flexwer_mapping>

Configuration:

path.sh: script to specify the Kaldi root directory and to add certain directories to the path.

cmd.sh: script to select the way of running parallel jobs.
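
For reference, a minimal sketch of what these two files typically contain in a Kaldi setup (the Kaldi root path is a placeholder, and run.pl vs. queue.pl depends on whether jobs run locally or on a grid):

# path.sh: make the Kaldi and OpenFst binaries available (KALDI_ROOT is a placeholder)
export KALDI_ROOT=/path/to/kaldi
export PATH=$PWD/utils:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ -f $KALDI_ROOT/tools/config/common_path.sh ] && . $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

# cmd.sh: run all parallel jobs on the local machine
export train_cmd=run.pl
export decode_cmd=run.pl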

Folders:

Framework-specific:

archimob: scripts related to processing the Archimob files for word-level modelling.

archimob_char: scripts related to processing the Archimob files for character-level modelling.

uzh: secondary scripts not included in the Kaldi recipe.

manual: manually generated files.

doc: documentation files.

lms: scripts for compiling language models.

scripts: small scripts for processing different parts of ArchiMob and Kaldi outputs.

experiments: Makefiles containing commands for executing experiments (e.g. training AMs, compiling WFSTs and evaluating).

Kaldi:

conf: configuration files.

local: original recipe-specific files from egs/wsj/s5.

utils: utilities shared among all the Kaldi recipes.

steps: general scripts for the different steps in the Kaldi recipes.


Steps for running experiments on dialectal (Dieth) transcriptions

  1. Generate the original lexicon from a CSV file:

First, extract the Dieth transcription utterances from train.csv (possibly also dev.csv/test.csv; see the loop sketch after the command below):

python ./archimob/process_archimob_csv.py \
-i ../data/archimob_r2/train.csv \
-trans orig \
-t ../processed/dieth/dieth_trans.txt
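
If dev.csv and test.csv should be covered as well, the same script can be run once per split and the outputs concatenated; a sketch, assuming the script accepts all three CSVs with identical flags:

for split in train dev test; do
    python ./archimob/process_archimob_csv.py \
        -i ../data/archimob_r2/${split}.csv \
        -trans orig \
        -t ../processed/dieth/dieth_trans_${split}.txt
done
cat ../processed/dieth/dieth_trans_{train,dev,test}.txt > ../processed/dieth/dieth_trans.txt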

Then create the lexicon by mapping grapheme clusters to phone symbols (following Fran's original approach):

python ./archimob/create_simple_lexicon.py \
-v ../processed/dieth/dieth_trans.txt \
-c manual/clusters.txt \
-o ../processed/dieth/dieth_lexicon.txt
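
The output should follow the usual Kaldi lexicon format: one word per line, followed by its space-separated phone symbols. The entries below are purely illustrative:

huus h uu s
gsii g s ii
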
  2. Train AMs
bash ./run_archimob.sh \
../data/archimob_r2/train.csv \
../data/archimob_r2/chunked_wav_files \
../processed/dieth/am_out \
'orig' \
../processed/dieth/dieth_lexicon.txt
  3. Compile the WFST and decode on the validation set to get the best WIP and LMWT
  • NB. This step assumes a pre-computed LM in .arpa format (as produced by SRILM/MITLM), e.g., ../lms/dieth/mitlm_mkn_3.arpa; see the MITLM sketch after the command below.
  • NB. If a mapping of normalised to Dieth wordforms is available, include it as the last argument to compute FlexWER.
bash ./compile_and_decode.sh \
../lms/dieth/mitlm_mkn_3.arpa \
../processed/dieth/am_out \
../data/archimob_r2/dev.csv \
../data/archimob_r2/chunked_wav_files \
../processed/dieth/lw_out/ \
orig \
"--min-lmwt 5 --max-lmwt 20" \
../data/archimob_r2/norm2dieth_clean.json
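
The pre-computed ARPA model mentioned above can be built with MITLM; a minimal sketch, assuming the Dieth transcriptions extracted in step 1 serve as the LM training text:

estimate-ngram \
-text ../processed/dieth/dieth_trans.txt \
-order 3 \
-smoothing ModKN \
-write-lm ../lms/dieth/mitlm_mkn_3.arpa
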
  4. Decode the test set and evaluate performance
  • NB. Explicitly specify the best LMWT found during validation-set decoding (in this example, 11).
  • NB. If a mapping of normalised to Dieth wordforms is available, include it as the last argument to compute FlexWER.
bash ./evaluate.sh \
../data/archimob_r2/test.csv \
../data/archimob_r2/chunked_wav_files \
../processed/dieth/am_out \
../processed/dieth/lw_out/ \
../processed/dieth/eval_out/ \
11 \
orig \
../data/archimob_r2/norm2dieth_clean.json
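
Once decoding has finished, the best WER can usually be picked out of the scoring files with the standard Kaldi helper; the decode directory pattern below is an assumption about this setup's output layout:

grep WER ../processed/dieth/eval_out/decode*/wer_* | utils/best_wer.sh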

Steps for running experiment on normalised transcriptions

Steps are largely the same as above. The main differences:

  • lexicon generation
  • language model training
  • for all basic commands, the <transcription_type> argument must be norm, not orig
  • no surface-level mapping for FlexWER evaluations
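
By analogy with the Dieth pipeline, the normalised utterances can presumably be extracted with the same script by switching the transcription type; a sketch, assuming -trans accepts norm and with placeholder output paths:

python ./archimob/process_archimob_csv.py \
-i ../data/archimob_r2/train.csv \
-trans norm \
-t ../processed/norm/norm_trans.txt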

Useful tips for working with normalised transcriptions:

  • ensure that the CSV has been normalised to remove unwanted diacritics (e.g. 'õ', 'ã', etc.; see the sketch after this list)
  • ensure that the input lexicon has been extended to cover as many in-vocabulary words as possible
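
A minimal sketch of such a clean-up, assuming plain-vowel replacements are the desired normalisation (the character list shown is an assumption and should be extended as needed):

# map unwanted diacritics to plain vowels
sed -e 's/õ/o/g' -e 's/ã/a/g' \
../data/archimob_r2/train.csv > ../data/archimob_r2/train_clean.csv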

Example call for AM training:

bash ./run_archimob.sh \
../data/archimob_r2/train.csv \
../data/archimob_r2/chunked_wav_files \
../norm/am_out \
'norm' \
../processed/norm/extended_lexicon.txt
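
The subsequent lingware compilation and decoding mirror the Dieth version, with norm as the transcription type and no mapping argument at the end (the LM path here is a placeholder):

bash ./compile_and_decode.sh \
../lms/norm/mitlm_mkn_3.arpa \
../norm/am_out \
../data/archimob_r2/dev.csv \
../data/archimob_r2/chunked_wav_files \
../processed/norm/lw_out/ \
norm \
"--min-lmwt 5 --max-lmwt 20"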

Updated 25/07/2020

Contributors: tannonk, yunigma
