Command word recognition based on a phoneme-level GMM-HMM embedding model.
pip install -r requirements.txt
Build the lexicon, which maps each character to its phonemes:
python DaCiDian/DaCiDian.py
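As a rough illustration of what a character-to-phoneme lexicon does, the sketch below expands a command word into a phoneme sequence. The toy `lexicon` entries, the phoneme labels, and the `word_to_phonemes` helper are all hypothetical; the real entries come from DaCiDian and may use a different notation.

```python
# Hypothetical toy lexicon: each character maps to a list of phonemes.
# The labels below are illustrative only, not the actual DaCiDian format.
lexicon = {
    "你": ["n", "i_3"],
    "好": ["h", "ao_3"],
}

def word_to_phonemes(word, lexicon, unk="UNK"):
    """Concatenate the phonemes of every character in `word`.

    Characters missing from the lexicon contribute a single `unk` token.
    """
    phonemes = []
    for ch in word:
        phonemes.extend(lexicon.get(ch, [unk]))
    return phonemes

print(word_to_phonemes("你好", lexicon))
```

Keeping the lexicon as a plain dict makes the lookup trivial to inspect; a real lexicon may also list several pronunciations per character, which this sketch ignores.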
python prep_data.py --mfcc_dim 13 --min_thresh 5
Phonemes that occur fewer than 5 times (the --min_thresh value) are replaced by 'UNK'.
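The rare-phoneme replacement can be sketched as follows. This is a minimal illustration, not the code in prep_data.py; the function name `replace_rare_phonemes` and the input layout (a list of phoneme sequences) are assumptions.

```python
from collections import Counter

def replace_rare_phonemes(sequences, min_thresh=5, unk="UNK"):
    """Replace every phoneme seen fewer than `min_thresh` times with `unk`.

    `sequences` is a list of phoneme lists (hypothetical layout).
    Counts are taken over the whole data set, not per sequence.
    """
    counts = Counter(p for seq in sequences for p in seq)
    return [
        [p if counts[p] >= min_thresh else unk for p in seq]
        for seq in sequences
    ]

# Tiny example: "a" appears 5 times and is kept, "b" appears once.
data = [["a", "a", "b"], ["a", "a", "a"]]
print(replace_rare_phonemes(data, min_thresh=5))
```

Mapping rare phonemes to one shared token means each remaining HMM has at least `min_thresh` training occurrences.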
python main.py
One training step takes about 4 minutes on my Windows 10 machine and about 45 seconds on my Manjaro Linux machine. The default GMM feature dimension is 13, which gives better accuracy on the test set than dim=39.
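A 39-dimensional MFCC feature is commonly built from 13 static coefficients plus their first-order and second-order deltas. Whether prep_data.py builds the 39-dimensional feature this way is an assumption; the sketch below only shows that common construction, using a simple central difference and a hypothetical helper name `add_deltas`.

```python
def add_deltas(frames):
    """Append first- and second-order deltas to each frame.

    `frames` is a list of T frames, each a list of D floats.
    Returns T frames of 3 * D floats (static + delta + delta-delta).
    Deltas use a central difference with the edge frames repeated;
    this is a simplification of the regression formula many toolkits use.
    """
    def delta(seq):
        T = len(seq)
        out = []
        for t in range(T):
            prev = seq[max(t - 1, 0)]
            nxt = seq[min(t + 1, T - 1)]
            out.append([(n - p) / 2.0 for n, p in zip(nxt, prev)])
        return out

    d1 = delta(frames)
    d2 = delta(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]

# Five frames of 13 coefficients each become five frames of 39.
frames = [[float(t)] * 13 for t in range(5)]
features = add_deltas(frames)
print(len(features), len(features[0]))
```

Tripling the dimension also triples the number of Gaussian mean and variance parameters per mixture component, so a larger feature is not guaranteed to help on a small data set.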
python test.py
Remember to change model_path in test.py to point at your trained model. The script reports the Viterbi alignment and the accuracy.
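For readers unfamiliar with Viterbi alignment, here is a minimal sketch of the decoding step in the log domain. It is not the code in test.py: the function name `viterbi`, the argument layout, and the two-state toy model are assumptions, and the emission scores that a GMM would normally supply are hard-coded.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path and its log probability.

    log_init[s]      : log P(first state = s)
    log_trans[p][s]  : log P(next state = s | current state = p)
    log_emit[t][s]   : log P(frame t | state s), e.g. from a GMM
    """
    T, S = len(log_emit), len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(S)]
    backptrs = []
    for t in range(1, T):
        new_score, ptr = [], []
        for s in range(S):
            # Best predecessor for state s at frame t.
            best = max(range(S), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best] + log_trans[best][s] + log_emit[t][s])
            ptr.append(best)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(range(S), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

NEG_INF = float("-inf")
# Toy left-to-right model with two states; state 1 cannot go back to state 0.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_emit = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.9), math.log(0.1)],
    [math.log(0.1), math.log(0.9)],
    [math.log(0.1), math.log(0.9)],
]
path, logp = viterbi(log_init, log_trans, log_emit)
print(path)  # [0, 0, 1, 1]
```

The returned path assigns one state to each frame, which is what an alignment is: the frame ranges covered by each state give the segment boundaries.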
Results with 13-dimensional MFCC features:
Results with 39-dimensional MFCC features:
Viterbi alignment example: