Source code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% on the private leaderboard.

License: GNU General Public License v3.0

TF Speech Recognition Challenge

The TensorFlow Speech Recognition Challenge was a Kaggle competition organised by Google Brain in which participants used the Speech Commands Dataset to build an algorithm that understands simple spoken commands. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

This solution achieved a rank of 63 on the private leaderboard (top 5%).

Project Structure

data
- raw
  - train (Training audio files)
  - test (Test audio files used for evaluation)
libs
- classification (All scripts used for training and evaluation)
notebooks
scripts (Executable scripts)
models (Pretrained Models)

Requirements

Tensorflow 1.4
librosa
scikit-learn
Python 3.x

Running

Download the Speech Commands Dataset and extract it into the train folder. Test audio can be placed in the data/test/audio folder.

The notebooks can be run individually using Jupyter. To run them from the command line, first edit the notebooks in Jupyter as needed, then run:

./script/execute_notebook.py

and select the notebook to run. The results are stored in results/notebook_name.log.

The notebook P0 Predict Test WAV.ipynb can be used to run predictions on audio files using a trained GraphDef model.

Architecture

Models used

A variant of Convolutional LSTM (https://arxiv.org/pdf/1610.00277.pdf)
LSTM-L (https://arxiv.org/pdf/1711.07128.pdf)
C-RNN (https://arxiv.org/pdf/1711.07128.pdf)
GRU-L (https://arxiv.org/pdf/1711.07128.pdf)
Resnet

Training

The model was trained using a GCP instance with the following specifications:

NVIDIA Tesla P100 X 1
16 GB RAM
35 GB SSD

Most of the models converged in 30k steps. Pseudo-labelling on the test data was used to improve model performance.
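The README does not say how the pseudo-labels were chosen. One common approach, shown here only as a sketch, is to keep the test clips whose top predicted probability clears a confidence threshold and add them to the training set with the predicted label. The function name select_pseudo_labels and the 0.9 threshold are assumptions for illustration, not values taken from this project.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Return (sample_index, predicted_label) pairs for confident predictions.

    `probabilities` is a list of per-class probability rows, one row per
    unlabelled test clip. The threshold is a hypothetical choice.
    """
    selected = []
    for index, row in enumerate(probabilities):
        confidence = max(row)
        if confidence >= threshold:
            # Use the most likely class as the pseudo-label.
            selected.append((index, row.index(confidence)))
    return selected


# Toy predictions for three test clips over three classes.
test_probs = [
    [0.95, 0.03, 0.02],  # confident, kept
    [0.40, 0.35, 0.25],  # uncertain, skipped
    [0.05, 0.02, 0.93],  # confident, kept
]
pseudo = select_pseudo_labels(test_probs, threshold=0.9)
```

The selected clips would then be mixed into the training data for another round of training; how many rounds and what ratio of pseudo-labelled to real data to use are tuning decisions.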

Prediction

The final model was an ensemble of 13 models. Weighted averaging and stacking were used to generate the final predictions.
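As a rough illustration of the weighted averaging step (not the author's actual code, and with made-up weights), each model's class probabilities can be combined as a weighted mean before taking the most likely class. The function name weighted_average and the two toy models below are assumptions.

```python
def weighted_average(model_probs, weights):
    """Combine per-model class probabilities with a weighted mean.

    `model_probs` holds one prediction table per model, each shaped
    [n_samples][n_classes]. `weights` holds one weight per model.
    """
    total = sum(weights)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = []
    for s in range(n_samples):
        row = [
            sum(w * probs[s][c] for probs, w in zip(model_probs, weights)) / total
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


# Two hypothetical models scoring two clips over two classes.
model_a = [[0.7, 0.3], [0.2, 0.8]]
model_b = [[0.4, 0.6], [0.1, 0.9]]

# Hypothetical weights: model_a is trusted three times as much as model_b.
ensemble = weighted_average([model_a, model_b], weights=[3.0, 1.0])
labels = [row.index(max(row)) for row in ensemble]
```

In practice the weights are usually chosen from validation accuracy. Stacking goes one step further and trains a second-level model on the base models' outputs instead of using fixed weights.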

Acknowledgements

ML-KWS-for-MCU (https://github.com/ARM-software/ML-KWS-for-MCU)
Very Deep Convolutional Neural Network for Robust Speech Recognition (https://arxiv.org/pdf/1610.00277.pdf)
Speech Commands Dataset (https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html)

If you like this project or have any queries don't hesitate to send an email to [email protected]

tf-speech-recognition-challenge-solution's Issues

no directory named "src"

When I try to run the script [01 Init Notebook], I can't find any directory named "src", nor utils.py or any other files.

import sys
sys.path.append("../src")

from libs import utils

How to test my audio data after training has completed?

The code trains on the data well, but it doesn't classify the audio files in the test directory after training completes. What should I do to test my audio data after training has completed?

My audio data for testing is in the test directory "data/raw/test/audio/*.wav".

Error when I launch options 1/2/3

Enter option: 2
Executing notebook 01 Init.ipynb
[NbConvertApp] WARNING | pattern '../notebooks\01\' matched no files
[NbConvertApp] WARNING | pattern 'Init.ipynb' matched no files
File "", line 1
This application is used to convert notebook files (*.ipynb) to various other
^
SyntaxError: invalid syntax
Exception ignored in: <_io.TextIOWrapper name='' mode='w' encoding='1252'>
BrokenPipeError: [Errno 32] Broken pipe
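The two warnings in this log suggest that the notebook name "01 Init.ipynb" was split at the space and handed to nbconvert as two separate patterns. How scripts/execute_notebook.py builds its command is not shown here, so the following is only a sketch of one way to avoid that: pass the path as a single element of an argument list so no shell splits it. The function name build_nbconvert_command is hypothetical.

```python
import os


def build_nbconvert_command(notebook_dir, notebook_name):
    """Build an nbconvert command as an argument list.

    Keeping the path as one list element means a space in the file
    name cannot be split into separate arguments.
    """
    notebook_path = os.path.join(notebook_dir, notebook_name)
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        notebook_path,
    ]


command = build_nbconvert_command("../notebooks", "01 Init.ipynb")

# To actually run it (requires Jupyter to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Using os.path.join also avoids mixing "/" and "\" by hand, which is visible in the '../notebooks\01\' pattern above.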

Can I use your code for disabled speech recognition?

I'm a master's student in computer science and I have a disability that affects my speech. For this reason I would like to use your code to improve speech recognition for people with disabilities.

I just want to know which parts of the code I should change if I want to use a speech database from people with disabilities.

Thanks for your help.

resent_model is missing

Result file is missing

I downloaded the code and the dataset and it works fine, but I can't see any results directory in which to find the output <notebook_name>.log.

Can you explain exactly where I can find this directory or see the output?

Thank you.

InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on ../logs&checkpoint/lstm_l/ckpt-30000: Not found: ../logs&checkpoint/lstm_l; No such file or directory

Error in input shape when calling freeze.freeze

After I complete the training and call freeze.freeze, I get this error:

File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 686, in _call_cpp_shape_fn_impl
input_tensors_as_shapes, status)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'Reshape' (op: 'Reshape') with input shapes: [?,1], [].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 151, in
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "", line 146, in main
File "../libs/classification/freeze.py", line 163, in freeze_graph
FLAGS.dct_coefficient_count, model_architecture,model_size_info)
File "../libs/classification/freeze.py", line 73, in create_inference_graph
decoded_sample_data=tf.reshape(decoded_sample_data,shape=(model_settings['desired_samples']))
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3938, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2958, in create_op
set_shapes_for_outputs(ret)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2209, in set_shapes_for_outputs
shapes = shape_func(op)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2159, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 1 but is rank 0 for 'Reshape' (op: 'Reshape') with input shapes: [?,1], [].
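One likely cause, judging only from the traceback line in freeze.py, is that `shape=(model_settings['desired_samples'])` is not a tuple: parentheses around a single value without a trailing comma leave it as a plain integer, so the shape argument is a scalar (rank 0, the `[]` in the error message) rather than a one-element vector. The sketch below shows the difference in plain Python; the value 16000 is a hypothetical stand-in for `model_settings['desired_samples']`.

```python
# Hypothetical stand-in for model_settings['desired_samples'].
desired_samples = 16000

# Parentheses alone do not make a tuple: this is still an int (rank 0).
not_a_tuple = (desired_samples)

# A trailing comma makes a one-element tuple (rank 1).
one_element_tuple = (desired_samples,)

# A list with one element works the same way as a shape argument.
as_list = [desired_samples]

print(type(not_a_tuple).__name__)        # int
print(type(one_element_tuple).__name__)  # tuple
```

Under that assumption, writing the shape as `(model_settings['desired_samples'],)` or `[model_settings['desired_samples']]` in the reshape call would give it the rank-1 shape it asks for. The decoded audio would still need to contain exactly that many samples for the reshape to succeed.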

Requirements for using my own data

I have a question, if that's no problem. If I want to use a different speech database and I replace the audio files in the train and test folders with my own audio, what should I update in the code for classification?

Thanks for your time.

Result folder is still empty

I have created a result folder at the path "C:\Users\Hussain\Anaconda3\results" and executed the code, but the folder is still empty.

How do I see the output?

TypeError: Cannot interpret feed_dict key as Tensor: The name 'train_mode:0' refers to a Tensor which does not exist. The operation, 'train_mode', does not exist in the graph.

This is the error that I am getting in HyperTuning convLSTM. The error is probably in val_acc(), because hyperopt is not able to plot the graph. Can you help me resolve this issue? Thanks.

ValueError: noverlap must be less than nperseg.

How to fix this error?
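This message matches the check that scipy.signal's spectrogram functions perform on their arguments, although the issue does not show which call raised it. A frequent trigger is an audio clip shorter than the requested window: the window length (nperseg) is reduced to the clip length while the requested overlap (noverlap) stays the same. The helper below is a sketch under that assumption; its name and the 20 ms / 10 ms defaults are hypothetical, not taken from this project.

```python
def spectrogram_window(sample_rate, n_samples, window_ms=20, step_ms=10):
    """Pick nperseg and noverlap so that noverlap < nperseg always holds.

    The window is clamped to the clip length and the overlap is
    clamped to the range [0, nperseg - 1].
    """
    if n_samples < 1:
        raise ValueError("audio clip is empty")
    nperseg = int(round(window_ms * sample_rate / 1000))
    step = int(round(step_ms * sample_rate / 1000))
    # A clip shorter than the window cannot fill a full segment.
    nperseg = min(nperseg, n_samples)
    noverlap = nperseg - step
    noverlap = max(0, min(noverlap, nperseg - 1))
    return nperseg, noverlap


# A full one-second clip at 16 kHz versus a truncated 100-sample clip.
full_clip = spectrogram_window(16000, 16000)
short_clip = spectrogram_window(16000, 100)
```

The resulting pair could then be passed as the nperseg and noverlap arguments of the spectrogram call. It is also worth checking for empty or truncated WAV files in the dataset, since padding them to the expected length avoids the problem at the source.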
