
subho406 / tf-speech-recognition-challenge-solution

Stars: 58 · Watchers: 6 · Forks: 28 · Size: 116.38 MB

Source code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% on the private leaderboard.

License: GNU General Public License v3.0

Python 3.47% Shell 0.14% Jupyter Notebook 96.39%
tensorflow speech speech-recognition raspberry-pi audio-recognition neural-network deep-learning convolutional-neural-networks recurrent-neural-networks ensemble-learning

tf-speech-recognition-challenge-solution's Introduction

TF Speech Recognition Challenge

The TensorFlow Speech Recognition Challenge was a Kaggle competition organised by Google Brain, in which participants used the Speech Commands Dataset to build an algorithm that understands simple spoken commands. See https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

This solution achieved a rank of 63 on the private leaderboard (top 5%).

Project Structure

  • data
    • raw
      • train (Training audio files)
      • test (Test audio files used for evaluation)
  • libs
    • classification (All scripts used for training and evaluation)
  • notebooks
  • scripts (Executable scripts)
  • models (Pretrained Models)

Requirements

  1. TensorFlow 1.4
  2. librosa
  3. scikit-learn
  4. Python 3.x

Running

Download the Speech Commands Dataset and extract it into the data/raw/train folder. Test audio can be placed in the data/raw/test/audio folder.

The notebooks can be run individually in Jupyter. To run them from the command line, edit the notebooks in Jupyter as needed and then run:

./scripts/execute_notebook.py

and select the notebook to run when prompted. The results are stored in results/<notebook_name>.log.

The notebook P0 Predict Test WAV.ipynb can be used to classify audio files with a trained GraphDef model.

Architecture

Models used

  1. A variant of Convolutional LSTM (https://arxiv.org/pdf/1610.00277.pdf)
  2. LSTM-L (https://arxiv.org/pdf/1711.07128.pdf)
  3. C-RNN (https://arxiv.org/pdf/1711.07128.pdf)
  4. GRU-L (https://arxiv.org/pdf/1711.07128.pdf)
  5. ResNet

Training

The models were trained on a GCP instance with the following specifications:

  • 1 × NVIDIA Tesla P100
  • 16 GB RAM
  • 35 GB SSD

Most of the models converged within 30k training steps. Pseudo-labelling on the test data was used to improve model performance.
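Pseudo-labelling, in its common form, takes the test clips on which the current model is most confident, assigns them their predicted labels, and adds them to the training set. The README does not describe the exact selection rule used here. The following is a minimal sketch that assumes softmax outputs in a NumPy array. The confidence threshold of 0.95 is a hypothetical value.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    """Return (indices, labels) of test samples confident enough to pseudo-label.

    probs: array of shape (n_samples, n_classes) holding softmax outputs.
    threshold: hypothetical minimum top-class probability (not from the repo).
    """
    probs = np.asarray(probs)
    confidence = probs.max(axis=1)   # top-class probability per sample
    labels = probs.argmax(axis=1)    # predicted class index per sample
    keep = np.flatnonzero(confidence >= threshold)
    return keep, labels[keep]

# Example: only the first and third clips pass the threshold.
test_probs = np.array([
    [0.97, 0.02, 0.01],
    [0.50, 0.30, 0.20],
    [0.01, 0.01, 0.98],
])
idx, pseudo = select_pseudo_labels(test_probs)
print(idx, pseudo)  # [0 2] [0 2]
```

The selected clips and their predicted labels would then be appended to the training data before the model is retrained or fine-tuned. A higher threshold admits fewer but cleaner pseudo-labels.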

Prediction

The final model was an ensemble of 13 models. Weighted averaging and stacking were used to generate the final predictions.
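Weighted averaging blends the class probabilities of several models into a single prediction. This sketch covers only the weighted-averaging half. Stacking, which trains a second-level classifier on the models' predictions, is not shown. The two models and their weights are made-up values for illustration, because the README does not list the weights that were actually used.

```python
import numpy as np

def weighted_average(prob_list, weights):
    """Blend per-model class probabilities with a weighted mean.

    prob_list: list of arrays, each (n_samples, n_classes), one per model.
    weights: one non-negative weight per model (hypothetical values here).
    """
    stacked = np.stack(prob_list)              # (n_models, n_samples, n_classes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                            # normalise so weights sum to 1
    return np.tensordot(w, stacked, axes=1)    # (n_samples, n_classes)

model_a = np.array([[0.7, 0.3], [0.2, 0.8]])
model_b = np.array([[0.2, 0.8], [0.4, 0.6]])
blended = weighted_average([model_a, model_b], weights=[3, 1])
print(blended.argmax(axis=1))  # [0 1]
```

The weights are normalised, so if each model outputs probabilities then the blended rows still sum to 1.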

Acknowledgements

  1. ML-KWS-for-MCU (https://github.com/ARM-software/ML-KWS-for-MCU)
  2. Very Deep Convolutional Neural Networks for Robust Speech Recognition (https://arxiv.org/pdf/1610.00277.pdf)
  3. Speech Commands Dataset (https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html)

If you like this project or have any queries, don't hesitate to send an email to [email protected]

tf-speech-recognition-challenge-solution's People

Contributors

subho406


tf-speech-recognition-challenge-solution's Issues

No directory named "src"

When I try to run the script
[01 Init Notebook]
I can't find any directory named "src", nor utils.py or any other files.

import sys
sys.path.append("../src")

from libs import utils
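The snippet appends ../src, but the project structure lists libs/ at the top level of the repository rather than under src/. The traceback in a later issue also imports ../libs/classification/freeze.py, which suggests that the notebooks run from notebooks/. Under that assumption, one possible workaround is to put the repository root on sys.path instead. This only helps if libs/utils.py actually exists, and this issue reports that it does not.

```python
import os
import sys

def add_repo_root_to_path(notebook_dir=None):
    """Put the repository root (the parent of notebooks/) on sys.path.

    Assumes the notebook's working directory is notebooks/ and that libs/
    lives in the repository root, as in the README's project structure.
    """
    notebook_dir = os.path.abspath(notebook_dir or os.getcwd())
    repo_root = os.path.dirname(notebook_dir)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root

repo_root = add_repo_root_to_path()
# With the repository root on sys.path, this import can resolve
# provided libs/utils.py is present:
# from libs import utils
```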

How to test my audio data after training is completed?

The code trains on the data well, but it doesn't classify the audio files in the test directory after training is completed. What should I do to test my audio data once training has finished?

My test audio files are in the test directory "data/raw/test/audio/*.wav".
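The README points to P0 Predict Test WAV.ipynb for classifying audio with a trained GraphDef model. The sketch below covers only the batch loop around such a model. It finds every .wav file in the test folder and writes one fname,label row per clip, which is the layout used for the Kaggle submission. The predict_label callable is hypothetical and stands in for whatever wraps the trained model. It is not a function from this repository.

```python
import csv
import glob
import os

def write_predictions(audio_dir, predict_label, out_csv):
    """Classify every .wav file in audio_dir and write fname,label rows.

    predict_label: hypothetical callable (path -> label string), e.g. a
    wrapper around the trained model; not part of the repository.
    """
    paths = sorted(glob.glob(os.path.join(audio_dir, "*.wav")))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["fname", "label"])   # submission header
        for path in paths:
            writer.writerow([os.path.basename(path), predict_label(path)])
    return len(paths)

# Usage with a stand-in predictor that always answers "silence":
# write_predictions("data/raw/test/audio", lambda p: "silence", "predictions.csv")
```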


Error when I launch options 1/2/3

Enter option: 2
Executing notebook 01 Init.ipynb
[NbConvertApp] WARNING | pattern '../notebooks\01\' matched no files
[NbConvertApp] WARNING | pattern 'Init.ipynb' matched no files
File "", line 1
This application is used to convert notebook files (*.ipynb) to various other
^
SyntaxError: invalid syntax
Exception ignored in: <_io.TextIOWrapper name='' mode='w' encoding='1252'>
BrokenPipeError: [Errno 32] Broken pipe
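Judging only from this log, the path "01 Init.ipynb" seems to have been passed with its space escaped as "01\ Init.ipynb". That Unix-shell convention is not honoured on Windows, where the encoding='1252' hints this ran. As a result, nbconvert received two patterns, '../notebooks\01\' and 'Init.ipynb', and neither matched a file. The sketch below assumes that execute_notebook.py wraps jupyter nbconvert. It builds the command as an argument list so the path stays a single argument on any platform.

```python
import os
import subprocess

def build_nbconvert_command(notebook_path):
    """Return a jupyter nbconvert command that executes a notebook in place.

    Using a list (not a shell string) keeps a path such as "01 Init.ipynb"
    as one argument even though it contains a space.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook_path,
    ]

def run_notebook(notebook_path):
    # With a list and the default shell=False, no word splitting happens.
    return subprocess.run(build_nbconvert_command(notebook_path), check=True)

cmd = build_nbconvert_command(os.path.join("..", "notebooks", "01 Init.ipynb"))
```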

Can I use your code for speech recognition for people with speech disabilities?

I'm a master's student in computer science. I have a disability that affects my speech, so I would like to use your code to improve speech recognition for people with disabilities.

I would like to know which parts of the code I should change if I want to use a speech database recorded by people with disabilities.

Thanks for your help.

Result file is missing

I downloaded the code and the dataset and everything runs fine, but I can't find any results directory containing the output <notebook_name>.log.

Can you explain exactly where I can find this directory or see the output?


Thank you.

Error in input shape when calling freeze.freeze

After I completed training and called freeze.freeze, I got this error:

File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 686, in _call_cpp_shape_fn_impl
input_tensors_as_shapes, status)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'Reshape' (op: 'Reshape') with input shapes: [?,1], [].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 151, in
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "", line 146, in main
File "../libs/classification/freeze.py", line 163, in freeze_graph
FLAGS.dct_coefficient_count, model_architecture,model_size_info)
File "../libs/classification/freeze.py", line 73, in create_inference_graph
decoded_sample_data=tf.reshape(decoded_sample_data,shape=(model_settings['desired_samples']))
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3938, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2958, in create_op
set_shapes_for_outputs(ret)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2209, in set_shapes_for_outputs
shapes = shape_func(op)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2159, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)
File "/home/hussain/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 1 but is rank 0 for 'Reshape' (op: 'Reshape') with input shapes: [?,1], [].
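The failing line in freeze.py passes shape=(model_settings['desired_samples']). In Python, parentheses around a single value do not create a tuple, so tf.reshape receives a scalar (rank-0) shape. That matches the "rank 1 but is rank 0" message. Adding a trailing comma produces the rank-1 shape that reshape expects. The snippet below only illustrates the Python behaviour. The value 16000 is an assumption (one second of audio at 16 kHz).

```python
desired_samples = 16000  # assumed: 1 s of audio at 16 kHz

shape_as_written = (desired_samples)   # just a parenthesised int -> rank 0
shape_fixed = (desired_samples,)       # one-element tuple -> rank 1

print(type(shape_as_written).__name__, type(shape_fixed).__name__)  # int tuple

# Suggested change in libs/classification/freeze.py (create_inference_graph):
# decoded_sample_data = tf.reshape(decoded_sample_data,
#                                  shape=(model_settings['desired_samples'],))
```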

Requirements for using my own data

If it's no trouble, I have a question. If I want to use a different speech database and replace the audio files in the train and test folders with my own audio, what should I update in the classification code?

Thanks for your time.
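The Speech Commands Dataset stores each word in its own subfolder of the training audio, alongside a _background_noise_ folder. If a custom dataset keeps that layout, the list of target words is one part that likely needs updating. The ML-KWS-for-MCU training script, acknowledged above, takes this list through a --wanted_words flag, but whether this repository uses the same flag is an assumption. The sketch below derives the word list from folder names and assumes the same one-folder-per-label layout.

```python
import os

def discover_words(train_audio_dir, skip=("_background_noise_",)):
    """Return sorted label names, one per subfolder of train_audio_dir.

    Assumes a Speech Commands style layout: <train_audio_dir>/<word>/*.wav.
    Folders listed in `skip` (background noise by default) are ignored.
    """
    words = [
        name for name in os.listdir(train_audio_dir)
        if os.path.isdir(os.path.join(train_audio_dir, name)) and name not in skip
    ]
    return sorted(words)

# e.g. ",".join(discover_words("data/raw/train/audio"))  (assumed path)
# gives a comma-separated word list for code that expects one.
```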

Results folder is still empty

I created a results folder at "C:\Users\Hussain\Anaconda3\results" and executed the code, but the folder is still empty.

How can I see the output?
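The README says that results are written to results/<notebook_name>.log. A relative path like that is resolved against the working directory of the process that writes it, not against a fixed location such as the Anaconda folder. The quick check below prints where such a file would land when run from the same directory as the script. The notebook name "01 Init" is only an example.

```python
import os

def expected_log_path(notebook_name, results_dir="results"):
    """Absolute path where results/<notebook_name>.log would be written.

    Relative paths resolve against os.getcwd(), so the answer depends on
    the directory the script or notebook is launched from.
    """
    return os.path.abspath(os.path.join(results_dir, notebook_name + ".log"))

print(os.getcwd())
print(expected_log_path("01 Init"))
```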
