Training a new SpeechtoText model with Deepspeech

Make sure the deepspeech version are same for training and running

clone the deepspeech via- https://github.com/mozilla/DeepSpeech
then git checkout v 0.5.1
cd DeepSpeech pip install -r requirements.txt
Download ngram processing tool kenlm.

$sudo apt install zlib1g-dev libbz2-dev liblzma-dev libeigen3-dev libboost1.65-all-dev cmake
$mkdir build
$cd build
$cmake ..
$sudo make install

Install native_client. This is the pre-processing tool that comes with deepSpeech and can help with pre-processing. runs in the root directory of deepSpeech: python3 util/taskcluster.py --arch gpu --target ./native_client
Prepare dataset. There should be three folders train, test, and dev. Each folder should contain all the audio files with csv. Which should have three columns. wav_file, wav_size, wav_transcript.
Create the alphabet, vocubulary and lm.binary file.
Save the below code as train.sh and run in from the deepspeech root directory.

set -xe
if [ ! -f DeepSpeech.py ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi;

python3.5 -u DeepSpeech.py \
  --train_files /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/train/train.csv \
  --test_files /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/test/test.csv \
  --train_batch_size 80 \
  --test_batch_size 40 \
  --n_hidden 375 \
  --epoch 3 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --use_seq_length False \
  --export_dir /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/results/model_export/ \
  --checkpoint_dir /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/results/checkout/ \
  --decoder_library_path /home/nvidia/tensorflow/bazel-bin/native_client/libctc_decoder_with_kenlm.so \
  --alphabet_config_path /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/alphabet.txt \
  --lm_binary_path /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/lm.bin \
  --lm_trie_path /usr/workspace/pykaldi/deepspeech/vietnam/deepspeech_training/trie \
  "$@"

running the model deepspeech --model model_data/output_graph.pb --alphabet model_data/alphabet.txt --lm model_data/lm.bin --trie model_data/tri --audio model_data/x.wav

Useful links

https://github.com/mozilla/DeepSpeech/blob/d3168391b3e9ee041bf8c517237cbacbd0c23da2/README.md#installing-prerequisites-for-training

http://www.programmersought.com/article/1886530040/

saifrehman / speechtotext-model-training-with-deepspeech Goto Github PK

speechtotext-model-training-with-deepspeech's Introduction

Training a new SpeechtoText model with Deepspeech

Useful links

speechtotext-model-training-with-deepspeech's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent