
accentdetection's Introduction

0. Download the librispeech model from "http://www.kaldi-asr.org/downloads/build/6/trunk/egs/librispeech/s5/exp/". We used the tri2a model in our experiments. Extract the model and keep the exp directory in the parent directory of this project. Also keep the lang directory in this folder; it can be downloaded from "http://www.kaldi-asr.org/downloads/build/6/trunk/egs/librispeech/s5/data/lang/".
1. Download the speech data from the GMU Speech Archive. Run getFiles.py in the GMU_Archive directory.
2. Process the speech files in GMU_Archive: convert them to 44100 Hz wav files
3. Run process_data.sh. The first argument is the path of the data directory, i.e. GMU_Archive in our case. The second argument is the path of the file containing the speech transcript; in our case it is the file "transcript". The script generates wav.scp, utt2spk and spk2utt (for Kaldi feature extraction) and text (for Kaldi alignment) in the data directory. It also filters out incompatible wav files and calls fix_data_dir.sh to make the meta files Kaldi-compatible.
4. Copy the timit transcript from the GMU archive to the file utt_dict.
5. Execute baseAudioGen.sh. The first argument is utt_dict. The script reads utt_dict word by word, capitalizes each word, and generates machine-synthesized audio clips for each word in StandardWords. It also calls the Python script convert_mfcc.py to generate MFCC features for these clips and saves them in the same directory. Note that the formatted variable can be printed to a file to generate the 'transcript' file required in step 3 above.
6. nationalities contains a sorted unique list of nationalities in the GMU dataset.
7. Run wordsnip.sh. The first argument is the path of the data directory, i.e. GMU_Archive. The second argument is the path of the transcript. In the script, set stage to 1 if running for the first time. This script extracts the features, generates the alignments and snips each utterance into words. It also stretches the word files so that each word matches the length of its machine-synthesized reference audio clip. It saves these new word snippets in the directory "words_new".
8. Then run getMFCC.py. It reads the word snippets from the words_new directory and, for each word, randomly selects one audio clip from each nationality. The given corpus has 55 words. Thus for each word we select one audio file per nationality (~193), giving a total of 55*193, about 10k files. The MFCC features so generated are placed in "mfcc_new_2".
9. Finally, run train.py. Modify the file to try out different ML models. This concludes our pipeline.
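
For reference, the three Kaldi meta files named in step 3 follow a simple line format: wav.scp maps an utterance ID to a wav path, utt2spk maps an utterance ID to a speaker ID, and spk2utt maps a speaker ID to its utterance IDs. The sketch below is not the code in process_data.sh; it is a minimal illustration that assumes each wav file holds one utterance by one speaker, so the file stem serves as both IDs. The function name build_kaldi_meta is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def build_kaldi_meta(wav_paths):
    """Return (wav_scp, utt2spk, spk2utt) as lists of text lines.

    Assumption (not from process_data.sh): one utterance per file and one
    file per speaker, so utterance ID == speaker ID == file stem.
    """
    # Kaldi expects these files sorted by utterance ID.
    utts = sorted((Path(p).stem, str(p)) for p in wav_paths)
    wav_scp = [f"{utt} {path}" for utt, path in utts]
    utt2spk = [f"{utt} {utt}" for utt, _ in utts]
    by_spk = defaultdict(list)
    for utt, _ in utts:
        by_spk[utt].append(utt)
    spk2utt = [f"{spk} {' '.join(by_spk[spk])}" for spk in sorted(by_spk)]
    return wav_scp, utt2spk, spk2utt
```

Writing each returned list to a file, one line per entry, gives files in the layout that fix_data_dir.sh checks.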
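
Step 7 matches each word snippet's length to that of its reference clip. How wordsnip.sh does this is not shown here, so the following is only a naive stand-in: it resamples a list of audio samples to a target length by linear interpolation. Note that this simple approach also shifts pitch, which a proper time-stretching tool would avoid. The function name resample_to_length is hypothetical.

```python
def resample_to_length(samples, target_len):
    """Linearly interpolate `samples` to exactly `target_len` values.

    Naive illustration only: this is plain resampling, so pitch changes
    along with duration. It is not the method used by wordsnip.sh.
    """
    if target_len <= 0 or not samples:
        return []
    if len(samples) == 1 or target_len == 1:
        return [float(samples[0])] * target_len
    # Map output index i onto a fractional position in the input.
    scale = (len(samples) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

Here target_len would be the number of samples in the machine-synthesized reference clip for the same word.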
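
The selection in step 8 (one random clip per nationality for each word) can be sketched as follows. This is not the code in getMFCC.py; the function name pick_one_per_nationality and the (word, nationality, path) tuple layout are assumptions made for illustration.

```python
import random
from collections import defaultdict


def pick_one_per_nationality(clips, seed=None):
    """Pick one clip at random for every (word, nationality) pair.

    clips: iterable of (word, nationality, path) tuples (assumed layout).
    Returns {word: {nationality: path}}.
    """
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    grouped = defaultdict(lambda: defaultdict(list))
    for word, nationality, path in clips:
        grouped[word][nationality].append(path)
    return {
        word: {nat: rng.choice(sorted(paths)) for nat, paths in by_nat.items()}
        for word, by_nat in grouped.items()
    }


# With 55 words and about 193 nationalities, the selection yields
# 55 * 193 = 10615 files when every nationality has a clip for every word.
```

Passing a seed keeps the chosen subset stable between runs, which helps when comparing different models in train.py on the same data.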
