wyjwl / tf-speech-recognition-challenge-solution

This project forked from subho406/tf-speech-recognition-challenge-solution

Source code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% on the private leaderboard.

License: GNU General Public License v3.0

TF Speech Recognition Challenge

The TensorFlow Speech Recognition Challenge was a Kaggle competition organised by Google Brain in which participants used the Speech Commands Dataset to build an algorithm that understands simple spoken commands. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

This solution achieved a rank of 63 on the private leaderboard (top 5%).

Project Structure

  • data
    • raw
      • train (Training audio files)
      • test (Test audio files used for evaluation)
  • libs
    • classification (All scripts used for training and evaluation)
  • notebooks
  • scripts (Executable scripts)
  • models (Pretrained Models)

Requirements

  1. TensorFlow 1.4
  2. librosa
  3. scikit-learn
  4. Python 3.x

Running

Download the Speech Commands Dataset and extract it into the train folder. Test audio can be placed in the data/test/audio folder.
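
As a quick sanity check after extraction, the sketch below groups the WAV files by label. It assumes that each label has its own folder of .wav files somewhere under the train folder, as in the public Speech Commands release. The function name index_wav_files is hypothetical and is not part of this project's code.

```python
from collections import defaultdict
from pathlib import Path


def index_wav_files(train_dir):
    """Group .wav paths by the name of their parent folder.

    Assumption: the parent folder name is the label (one folder per word).
    """
    files_by_label = defaultdict(list)
    # rglob searches recursively, so extra nesting levels are tolerated.
    for wav_path in sorted(Path(train_dir).rglob("*.wav")):
        files_by_label[wav_path.parent.name].append(wav_path)
    return dict(files_by_label)
```

Calling index_wav_files("data/raw/train") and printing the number of files per label is one way to confirm that the archive was extracted to the expected place. Folders that do not hold labelled words (for example a background noise folder) would show up as labels too and may need to be filtered out.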

The notebooks can be run individually using Jupyter. To run them from the command line, first edit the notebooks in Jupyter as needed, then run:

./script/execute_notebook.py

and select the notebook to run. The results are stored in results/notebook_name.log, where notebook_name is the name of the selected notebook.

The notebook P0 Predict Test WAV.ipynb can be used to run predictions on audio files with a trained GraphDef model.

Architecture

Models used

  1. A variant of Convolutional LSTM (https://arxiv.org/pdf/1610.00277.pdf)
  2. LSTM-L (https://arxiv.org/pdf/1711.07128.pdf)
  3. C-RNN (https://arxiv.org/pdf/1711.07128.pdf)
  4. GRU-L (https://arxiv.org/pdf/1711.07128.pdf)
  5. ResNet

Training

The model was trained using a GCP instance with the following specifications:

  • NVIDIA Tesla P100 X 1
  • 16 GB RAM
  • 35 GB SSD

Most of the models converged in 30k steps. Pseudo-labelling on the test data was used to improve model performance.
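
Pseudo-labelling generally means adding confidently predicted test samples, with their predicted labels, to the training set. The sketch below shows only the selection step, in plain Python. The function name select_pseudo_labels and the 0.9 threshold are hypothetical; the project's actual selection rule is not described here.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Return (sample_index, predicted_class) pairs for confident predictions.

    probabilities: list of per-sample class-probability lists.
    threshold: hypothetical confidence cut-off, not taken from this project.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        # Class with the highest predicted probability for this sample.
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[best_class] >= threshold:
            selected.append((index, best_class))
    return selected
```

The selected pairs would then be merged into the training data before retraining. A stricter threshold adds fewer but cleaner labels.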

Prediction

The final model was an ensemble of 13 models. Weighted averaging and stacking were used to generate the final predictions.

Acknowledgements
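
A minimal sketch of the weighted-averaging step follows, assuming each model outputs one class-probability list per sample. The function names and the example weights are hypothetical; the weights used for the 13 models and the stacking stage are not given in this README.

```python
def weighted_average(model_probs, weights):
    """Combine per-model class probabilities into one distribution per sample.

    model_probs: one entry per model, each a list of per-sample probability lists.
    weights: one non-negative weight per model; normalised here to sum to 1.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    num_samples = len(model_probs[0])
    num_classes = len(model_probs[0][0])
    combined = [[0.0] * num_classes for _ in range(num_samples)]
    for probs, weight in zip(model_probs, weights):
        for s in range(num_samples):
            for c in range(num_classes):
                combined[s][c] += (weight / total) * probs[s][c]
    return combined


def predict_labels(combined):
    """Pick the highest-probability class index for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in combined]
```

For example, predict_labels(weighted_average([probs_a, probs_b], [3, 1])) would trust the first model three times as much as the second. In practice the weights are usually tuned on a validation set.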

  1. ML-KWS-for-MCU (https://github.com/ARM-software/ML-KWS-for-MCU)
  2. Very Deep Convolutional Neural Network for Robust Speech Recognition (https://arxiv.org/pdf/1610.00277.pdf)
  3. Speech Commands Dataset (https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html)

If you like this project or have any queries, don't hesitate to send an email to [email protected]
