asr-wer-bench

Setup

Essential OS Packages

Linux:

$ apt-get -y install tree build-essential cmake sox libsox-dev \
    libfftw3-dev libatlas-base-dev liblzma-dev libbz2-dev libzstd-dev \
    apt-utils gcc libpq-dev libopenblas-dev \
    libsndfile1 libsndfile-dev libsndfile1-dev libgflags-dev libgoogle-glog-dev

MacOS:

$ brew install cmake boost eigen

Download and install SCTK/sclite

The workbench uses the sclite command from SCTK, the NIST Scoring Toolkit. Clone the source and build it using the instructions given in the GitHub repo.
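
A minimal sketch of that build, assuming the usnistgov/SCTK GitHub repo and its documented make targets:

$ git clone https://github.com/usnistgov/SCTK.git
$ cd SCTK
$ make config && make all && make check && make install
# executables, including sclite, land in SCTK/bin
$ export PATH=$PATH:`pwd`/bin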

Check that sclite is in your path:

$ which sclite

ASR WER Bench

Clone repo:

$ git clone https://github.com/SlangLabs/asr-wer-bench.git

$ cd asr-wer-bench
$ export ASR_WER_BENCH_DIR=`pwd`
$ echo $ASR_WER_BENCH_DIR

Python 3.6.5 or above is required. Set up a Python virtual environment:

$ python3 -m venv ./.venv
$ source ./.venv/bin/activate

# For CPU machine
$ pip install -r requirements.txt

# For GPU machine
$ pip install -r requirements-gpu.txt

Audio test data:

$ ls -l $ASR_WER_BENCH_DIR/data/en-US/audio

Build KenLM Language Model

KenLM is a popular language-model toolkit.

It is NOT NEEDED in the current setup for DeepSpeech and wav2letter, since both sections below download ready-made language models.

Build kenlm package from GitHub source:

$ git clone https://github.com/kpu/kenlm.git
$ cd kenlm
$ export KENLM_ROOT_DIR=`pwd`
$ echo $KENLM_ROOT_DIR

$ mkdir -p build
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_POSITION_INDEPENDENT_CODE=ON
$ make -j 4

$ cd ..

Build KenLM model:

$ mkdir -p $ASR_WER_BENCH_DIR/models/kenlm/en-US/
$ cd $ASR_WER_BENCH_DIR/models/kenlm/en-US/

$ curl -LO http://www.openslr.org/resources/11/4-gram.arpa.gz
$ gunzip 4-gram.arpa.gz

$ $KENLM_ROOT_DIR/build/bin/build_binary trie 4-gram.arpa 4-gram.bin
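
To sanity-check the binary model, you can score a sample sentence with KenLM's query tool (an optional quick check, assuming the build above succeeded):

$ echo "the quick brown fox" | $KENLM_ROOT_DIR/build/bin/query 4-gram.bin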

Audio Test Data

Audio files for testing are expected to be in a single directory. Each test sample is a pair of files:

  • audio file: <filename>.wav
  • transcript: <filename>.txt

A sample set is provided in the ./data/en-US/audio/ directory.
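
A minimal shell sketch to check that every audio file in the directory has its matching transcript:

$ for wav in $ASR_WER_BENCH_DIR/data/en-US/audio/*.wav; do
    [ -f "${wav%.wav}.txt" ] || echo "missing transcript for $wav"
  done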


Test Data Preparation

Currently, there are limitations on the length of the audio clips and transcripts.

The benchmark runs only on test cases that meet the following criteria (and filters out the rest); a sketch of such a pre-filter follows the list:

  • Audio clip is shorter than 30 seconds
  • Reference transcript is shorter than 620 characters and has 2 or more words
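
As an illustration, here is a minimal shell sketch of such a pre-filter; it assumes sox is installed (soxi -D prints the clip duration in seconds) and copies qualifying pairs into a hypothetical filtered/ directory:

$ mkdir -p filtered
$ for wav in data/en-US/audio/*.wav; do
    txt="${wav%.wav}.txt"
    dur=$(soxi -D "$wav")     # clip duration in seconds
    chars=$(wc -c < "$txt")   # transcript length in characters (bytes)
    words=$(wc -w < "$txt")   # transcript word count
    if (( $(echo "$dur < 30" | bc -l) )) && (( chars < 620 )) && (( words >= 2 )); then
      cp "$wav" "$txt" filtered/
    fi
  done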

DeepSpeech

Download Models

Models are expected to be in:

  • Mozilla DeepSpeech: $ASR_WER_BENCH_DIR/models/deepspeech/<model-file-name>.{pbmm, scorer}

Please save the models in that location or create soft links. You can also download official pre-trained models:

$ mkdir -p $ASR_WER_BENCH_DIR/models/deepspeech/en-US/
$ cd $ASR_WER_BENCH_DIR/models/deepspeech/en-US/

$ curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.8.1/deepspeech-0.8.1-models.pbmm
$ curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.8.1/deepspeech-0.8.1-models.scorer

$ cd $ASR_WER_BENCH_DIR

# Verify DeepSpeech download
$ deepspeech \
  --model models/deepspeech/en-US/deepspeech-0.8.1-models.pbmm \
  --scorer models/deepspeech/en-US/deepspeech-0.8.1-models.scorer \
  --audio data/en-US/audio/2830-3980-0043.wav

# Expected transcript
$ cat data/en-US/audio/2830-3980-0043.txt

Run Test Bench

To run WER bench for DeepSpeech:

$ PYTHONPATH=. python3 werbench/asr/engine.py --engine deepspeech \
  --model-path-prefix <model dir + model filename prefix> \
  --input-dir <wav txt data dir> \
  --output-path-prefix <output file prefix>

For example:

$ PYTHONPATH=. python3 werbench/asr/engine.py --engine deepspeech \
  --model-path-prefix ./models/deepspeech/en-US/deepspeech-0.8.1-models \
  --input-dir ./data/en-US/audio \
  --output-path-prefix ./deepspeech-out

This will generate ./deepspeech-out.ref and ./deepspeech-out.hyp files.
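
Both files are in sclite's trn format: one reference or hypothesis per line, with an utterance ID in parentheses at the end. As an illustrative example (the exact ID scheme depends on the input file names), a line may look like:

experience proves this (2830-3980-0043)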

Generate sclite Report

To generate an sclite report:

$ sclite -r deepspeech-out.ref trn -h deepspeech-out.hyp trn -i rm

To generate a detailed sclite report:

$ sclite -r deepspeech-out.ref trn -h deepspeech-out.hyp trn -i rm -o dtl

Facebook Wav2Letter

Download Models

You can download pre-trained wav2letter models from Facebook:

$ mkdir -p $ASR_WER_BENCH_DIR/models/wav2letter/en-US
$ cd $ASR_WER_BENCH_DIR/models/wav2letter/en-US

$ wget https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/tokens.txt
$ wget https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lexicon.txt
$ wget https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin
$ wget https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_conformer_ctc_stride3_letters_25Mparams.bin

$ cd $ASR_WER_BENCH_DIR

Run Test Bench in Docker

The easiest way to run wav2letter is to use the Docker images provided by the Facebook Flashlight project.

To run inference on a CPU machine, get the CPU Docker image:

$ docker pull flml/flashlight:cpu-latest

To run on a GPU machine, you must use nvidia-docker and the CUDA image:

$ docker pull flml/flashlight:cuda-latest

Run the Docker container with asr-wer-bench mounted as a volume:

$ docker run -v $ASR_WER_BENCH_DIR:/root/asr-wer-bench --rm -itd --name flashlight flml/flashlight:cpu-latest

$ docker exec -it flashlight bash

This will get you a shell inside the container. Set the workbench directory inside the container:

$ export ASR_WER_BENCH_DIR=/root/asr-wer-bench

Install the requirements:

$ cd $ASR_WER_BENCH_DIR
$ pip3 install -r requirements.txt

TODO: Build a Docker image that includes only the Python packages needed for wav2letter and sclite, so that installing requirements.txt is unnecessary and sclite can be run from within the image.

Run Test Bench

First select the language model and wav2letter acoustic model you want to use, and link them to the generic names:

$ cd $ASR_WER_BENCH_DIR/models/wav2letter/en-US

$ ln -s lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin lm.bin
$ ln -s am_conformer_ctc_stride3_letters_25Mparams.bin model.bin

Go back to the asr-wer-bench directory, and run the benchmark:

$ cd $ASR_WER_BENCH_DIR

$ PYTHONPATH=. python3 werbench/asr/engine.py --engine wav2letter \
  --model-path-prefix <model dir> \
  --input-dir <wav txt data dir> \
  --output-path-prefix <output file prefix>

For example:

$ PYTHONPATH=. python3 werbench/asr/engine.py --engine wav2letter \
  --model-path-prefix ./models/wav2letter/en-US \
  --input-dir ./data/en-US/audio \
  --output-path-prefix ./wav2letter-out

This will generate:

  • Reference transcripts: ./wav2letter-out.ref
  • Hypothesis transcripts: ./wav2letter-out.hyp
  • Performance report: ./wav2letter-out.perf
  • Timestamp splits for each test sample: ./wav2letter-out-timestamps/*-ts.txt

The timestamp files contain one tuple per line in Audacity Labels format, with tab-separated fields: <start-timestamp> <end-timestamp> <word>
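
For example, a timestamp file may contain lines like these (illustrative values):

0.48	0.79	experience
0.82	1.25	proves
1.31	1.62	this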

Exit the Docker shell.

Generate sclite Report

To generate an sclite report:

$ sclite -r wav2letter-out.ref trn -h wav2letter-out.hyp trn -i rm

To generate a detailed sclite report:

$ sclite -r wav2letter-out.ref trn -h wav2letter-out.hyp trn -i rm -o dtl

© 2020-21 Slang Labs Private Limited. All rights reserved.


asr-wer-bench's Issues

wav2letter python binding installation fails

Hi, I'm running into an issue where I cannot install the wav2letter Python bindings.
Running cd /wav2letter/bindings/python and then pip install -e .
gives the following output:

ERROR: Command errored out with exit status 1:
 command: /home/username/anaconda3/envs/asr/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/username/github/wrapASR/wav2letter/bindings/python/setup.py'"'"'; __file__='"'"'/home/username/github/wrapASR/wav2letter/bindings/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
 cwd: /home/username/github/wrapASR/wav2letter/bindings/python/
Complete output (36 lines):
running develop
running egg_info
writing wav2letter.egg-info/PKG-INFO
writing dependency_links to wav2letter.egg-info/dependency_links.txt
writing top-level names to wav2letter.egg-info/top_level.txt
reading manifest file 'wav2letter.egg-info/SOURCES.txt'
writing manifest file 'wav2letter.egg-info/SOURCES.txt'
running build_ext
make: *** No targets specified and no makefile found. Stop.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/username/github/wrapASR/wav2letter/bindings/python/setup.py", line 109, in <module>
    zip_safe=False,
  File "/home/username/anaconda3/envs/asr/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/home/username/anaconda3/envs/asr/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/username/anaconda3/envs/asr/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/username/anaconda3/envs/asr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/username/anaconda3/envs/asr/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/home/username/anaconda3/envs/asr/lib/python3.7/site-packages/setuptools/command/develop.py", line 136, in install_for_development
    self.run_command('build_ext')
  File "/home/username/anaconda3/envs/asr/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/username/anaconda3/envs/asr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/username/github/wrapASR/wav2letter/bindings/python/setup.py", line 48, in run
    self.build_extensions()
  File "/home/username/github/wrapASR/wav2letter/bindings/python/setup.py", line 91, in build_extensions
    ["cmake", "--build", "."] + build_args, cwd=self.build_temp
  File "/home/username/anaconda3/envs/asr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j4']' returned non-zero exit status 2.
----------------------------------------
ERROR: Command errored out with exit status 1: /home/username/anaconda3/envs/asr/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/username/github/wrapASR/wav2letter/bindings/python/setup.py'"'"'; __file__='"'"'/home/username/github/wrapASR/wav2letter/bindings/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
Check the logs for full command output.

Debug message in wav2letter transcriptions

Some of the transcriptions have this strange string:

INFERENCECTC.CPP:246] [INFERENCE TUTORIAL FOR CTC]: WAITING THE INPUT IN THE FORMAT [AUDIO_PATH]

In case of any failure, at worst it should be an empty string.

Processing >30 sec clips in wav2letter

Wav2letter doesn't handle clips longer than 30 sec properly. The current benchmark filters such audio files out of the test/train set.

Maybe VAD (voice activity detection) can be used to break a longer audio file into segments and concatenate the transcribed text.

Time performance characteristics report

Other than Word Error Rate (WER), transcription time also plays a critical role in practical applications.

So a benchmark report that has the following info is desirable:

  1. clip size
  2. transcription time
  3. ratio of clip size to transcription time
