agrim9 / digit-speech-recognition

Speech Recognition: Digits

Jupyter Notebook 93.90% Python 6.10%

speech-recognition python mfcc dtw

Digit-Speech-Recognition

For technical details, please see the report (Report.pdf).

File Description and Usage

Files

bag_of_frames.py: N-fold cross-validation (CV) for the bag-of-frames method. Requires the MFCC features to be computed and stored in ./Extracted_Feats/ in the specified format.
DTW.py: N-fold CV for the dynamic time warping (DTW) method. Requires the MFCC features to be computed and stored in ./Extracted_Feats/ in the specified format.
live_endpointing.py: Real-time endpointing, used for the interactive demo.
Live_Test.ipynb: IPython notebook for testing the code interactively. Requires a VQ codebook generated from each speaker's data using k-means. This was tested with 8 clusters; accuracy may improve if the number of clusters is increased.
mfcc_feat.py: Computes the MFCC feature vectors for the input wav files in ./Processed_Data (obtained after endpointing) and stores them in the ./Extracted_Feats/ directory.
Report.pdf: Project report, describing the methods in more detail and including observations.
Speech_Endpointing.ipynb: Endpoints the speech signals. Splits each input waveform into smaller, noise-free waveforms and saves them in the ./Processed_Data directory. Warning: different thresholds may be required for different speakers.
VQ.py: Generates the files in the ./VQ_codebooks/ directory. Requires the MFCC features to be computed and stored in ./Extracted_Feats/ in the specified format.
VQ_check.py: N-fold CV for the VQ method. Requires ./VQ_codebooks to be populated appropriately.
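
As an illustration of the DTW method named above, here is a minimal sketch of template matching with dynamic time warping. It is not the code in DTW.py: the function names (frame_distance, dtw_distance, classify_dtw), the Euclidean frame distance, and the symmetric step pattern are all assumptions made for this example. Each utterance is assumed to be a list of MFCC frames, where each frame is a list of floats.

```python
import math

def frame_distance(x, y):
    # Euclidean distance between two feature frames (assumed metric).
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def dtw_distance(a, b):
    # Accumulated cost of the best monotonic alignment between sequences a and b.
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Allowed moves: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify_dtw(query, templates):
    # templates: list of (label, sequence) pairs; return the label of the nearest one.
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]
```

With this step pattern, a time-stretched copy of a template (the same frames, some repeated) aligns at zero cost, which is why DTW tolerates differences in speaking rate.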

Directories

Extracted_Feats: Stores the extracted MFCC feature vectors for each of the 64 utterances of each digit. See the directory structure for how the features are stored.
Output_Logs: Contains 3 text files with the output logs of N-fold CV for the bag-of-frames, VQ, and DTW methods.
Processed_Data: Contains 64 wav files for every digit, obtained after endpointing.
Raw_Data: Contains the zip files of the given data for 16 speakers.
VQ_codebook: Contains the VQ codebooks generated for each speaker by VQ.py.
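
To make the VQ codebook idea concrete, the sketch below builds a codebook with a plain k-means loop and classifies an utterance by its average quantization distortion. It is not the code in VQ.py or VQ_check.py: the function names, the evenly spaced initialisation, the fixed iteration count, and the use of one codebook per label are assumptions for illustration. Only the default of 8 clusters comes from the description above.

```python
def sq_dist(x, y):
    # Squared Euclidean distance between two feature frames.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def nearest(frame, codebook):
    # Index of the codeword closest to the frame.
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))

def train_codebook(frames, n_clusters=8, n_iter=20):
    # Assumed initialisation: evenly spaced frames, so the result is deterministic.
    step = max(1, len(frames) // n_clusters)
    codebook = [list(frames[i]) for i in range(0, len(frames), step)][:n_clusters]
    for _ in range(n_iter):
        groups = [[] for _ in codebook]
        for f in frames:
            groups[nearest(f, codebook)].append(f)
        for k, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous codeword
                codebook[k] = [sum(col) / len(g) for col in zip(*g)]
    return codebook

def distortion(frames, codebook):
    # Mean squared distance from each frame to its nearest codeword.
    return sum(sq_dist(f, codebook[nearest(f, codebook)]) for f in frames) / len(frames)

def classify_vq(frames, codebooks):
    # codebooks: dict mapping label -> codebook; pick the lowest-distortion label.
    return min(codebooks, key=lambda label: distortion(frames, codebooks[label]))
```

A larger n_clusters gives each codebook a finer partition of the feature space, which is the intuition behind the note that accuracy may improve with more clusters.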
