pauses

Quick library to extract pause length features from audio files.

How to get started

I'm assuming you are running this on a Mac computer (this is the only operating system tested).

First, make sure you have installed Python3, FFmpeg, and SoX via Homebrew:

brew install python3 sox ffmpeg

Now, clone the repository and install all required dependencies:

git clone git@github.com:jim-schwoebel/pauses.git
cd pauses
pip3 install -r requirements.txt

Technique #1 - thresholding

The extract_pauses_1.py script reads its options from the terminal using the sys.argv convention. For more information on this, check out this StackOverflow post.
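As a rough illustration of the sys.argv convention, here is a minimal sketch of how two y/n flags could be read. The function name parse_flags and the flag order (first flag = record a file, second flag = process ./data) are assumptions inferred from the commands shown in this README, not taken from the script itself.

```python
def parse_flags(argv):
    """Turn two positional y/n arguments into (record, process_data) booleans.

    argv is expected to look like sys.argv: [script_name, flag1, flag2].
    The meaning of each flag is an assumption based on the README commands.
    """
    if len(argv) != 3:
        raise ValueError("expected exactly two arguments, each 'y' or 'n'")
    flags = []
    for value in argv[1:]:
        value = value.strip().lower()
        if value not in ("y", "n"):
            raise ValueError(f"expected 'y' or 'n', got {value!r}")
        flags.append(value == "y")
    return flags[0], flags[1]


# Equivalent to running: python3 extract_pauses_1.py n y
record, process_data = parse_flags(["extract_pauses_1.py", "n", "y"])
```

In a real script you would pass sys.argv (after import sys) instead of the hard-coded list.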

assumptions

To simplify things a bit, I recorded a few reference files (in the ./data folder) at slow, moderate, moderate-fast, and fast speaking rates, each a reading of the U.S. Constitution.

I then used pydub to segment the audio, treating any stretch of at least 50 milliseconds below -32 dBFS as silence (short enough to detect pauses in fast speech). These parameters likely need to be tuned to the dataset, speaker loudness, etc., and are likely overfitted to my voice. Nonetheless, this gives a proof-of-concept implementation of how to separate speaking segments from non-speaking segments with a threshold. I then calculated pause length as total duration (seconds) divided by the number of segments counted (i.e. the number of pauses), giving seconds per pause.
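The thresholding idea can be sketched roughly as follows. This is not the script's actual code: the function names are hypothetical, and the mapping of the 50 ms and -32 dBFS values onto pydub's split_on_silence parameters (min_silence_len, silence_thresh) is an assumption.

```python
def seconds_per_pause(total_seconds, n_segments):
    """Total duration divided by the number of detected segments.

    Returns None when no segments were found, to avoid dividing by zero.
    """
    if n_segments == 0:
        return None
    return total_seconds / n_segments


def pause_length(path, min_silence_ms=50, silence_thresh_dbfs=-32):
    """Split a recording on silence and return seconds per pause."""
    # Imported here so seconds_per_pause can be used without pydub installed.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    audio = AudioSegment.from_file(path)
    segments = split_on_silence(
        audio,
        min_silence_len=min_silence_ms,      # shortest stretch counted as silence (ms)
        silence_thresh=silence_thresh_dbfs,  # anything quieter than this is silence
    )
    return seconds_per_pause(audio.duration_seconds, len(segments))
```

For example, pause_length("data/fast.wav") would return a single seconds-per-pause value for that file, assuming a file with that name exists in ./data.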

if you want to process all audio files in the ./data folder

Run the script in the terminal with:

python3 extract_pauses_1.py n y

recording voice files and calculating pauses in real-time

To record a file, run:

python3 extract_pauses_1.py y n

After the recording finishes, the script displays the pause length and creates a .json file.

process audio files in ./data folder and record an audio file in real time together

If you want to both record a file (10 seconds) and process all the files in the ./data directory, run:

python3 extract_pauses_1.py y y

Technique #2 - machine learning classification

Another technique is to train a machine learning model to detect pauses. In this case, I trained a quick model from 5-6 files, splitting each file into 200 millisecond windows and labeling each window as a 'pause' or a 'speech' event. I used the train_audioTPOT.py script found in the voicebook repository with the librosa feature embedding (librosa_features.py). The optimized SVM model achieves around 91.2% accuracy.
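To make the windowing step concrete, here is a minimal sketch that computes the boundaries of fixed 200 millisecond windows for a clip. The function name window_bounds is hypothetical, and dropping a trailing partial window is an assumption; the original script may handle the remainder differently.

```python
def window_bounds(duration_ms, window_ms=200):
    """Return (start_ms, end_ms) pairs for consecutive fixed-size windows.

    A trailing window shorter than window_ms is dropped (an assumption).
    """
    last_start = duration_ms - window_ms
    return [(start, start + window_ms)
            for start in range(0, last_start + 1, window_ms)]


# A one-second clip yields five 200 ms windows.
bounds = window_bounds(1000)
```

With pydub, each pair could then be used to slice an AudioSegment by milliseconds (audio[start:end]) before extracting features for that window.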

To run this script, first put some audio files (e.g. 'fast.wav') in the load_dir folder of the cloned repository.

Next, run the script:

python3 extract_pauses_2.py

The audio files in ./load_dir are then sliced into 200 millisecond segments, and each segment is classified as a silence or speech event. The result is a .json file in ./load_dir for each speech file (e.g. fast.wav --> fast.json) with the following information:

{"filename": "fast.wav", "total_length": 1.0, "mean": 0.4, "std": 0.20000000000000007, "max_value": 0.6000000000000001, "min_pause": 0.2, "median": 0.4}

As you can see, you get a bit more information here. Note this was a proof-of-concept and likely needs to be augmented with other datasets for it to work robustly across speakers.
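One way summary fields like these could be derived from per-window labels is sketched below. This is an assumption about the approach, not the script's confirmed logic: it merges consecutive 'pause' windows into a single pause, then summarizes the pause durations. The function name pause_stats and the use of the population standard deviation are illustrative choices.

```python
import statistics
from itertools import groupby


def pause_stats(filename, labels, window_seconds=0.2):
    """Summarize pause durations from a sequence of 'pause'/'speech' labels."""
    # Each run of consecutive 'pause' windows counts as one pause.
    pauses = [
        sum(1 for _ in run) * window_seconds
        for label, run in groupby(labels)
        if label == "pause"
    ]
    stats = {
        "filename": filename,
        "total_length": len(labels) * window_seconds,
    }
    if pauses:  # leave the pause fields out when no pause was detected
        stats.update(
            mean=statistics.mean(pauses),
            std=statistics.pstdev(pauses),
            max_value=max(pauses),
            min_pause=min(pauses),
            median=statistics.median(pauses),
        )
    return stats


# Five windows: a one-window pause, speech, then a three-window pause.
example = pause_stats("fast.wav", ["pause", "speech", "pause", "pause", "pause"])
```

The resulting dictionary can be written out with json.dump to produce a file alongside the audio.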

Limitations

Both scripts are limited to low-noise environments. If there is a lot of background noise in your file, I'd suggest first cleaning it and removing the noise (e.g. with SoX) before using these scripts to calculate pause lengths.
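As a minimal sketch of the SoX cleanup step, the helper below builds the two commands for SoX's noiseprof and noisered effects and runs them. The function names, the file names, the 0.21 reduction amount, and the assumption that the first 0.5 seconds of the recording contain only background noise are all illustrative and should be adjusted to your recordings.

```python
import subprocess


def sox_denoise_commands(src, dst, profile="noise.prof",
                         noise_seconds=0.5, amount=0.21):
    """Build the SoX commands for profile-based noise reduction."""
    # Step 1: build a noise profile from the (assumed noise-only) start of the file.
    profile_cmd = ["sox", src, "-n", "trim", "0", str(noise_seconds),
                   "noiseprof", profile]
    # Step 2: subtract that profile from the whole recording.
    reduce_cmd = ["sox", src, dst, "noisered", profile, str(amount)]
    return [profile_cmd, reduce_cmd]


def denoise(src, dst, **kwargs):
    """Run both SoX steps; raises CalledProcessError if SoX fails."""
    for cmd in sox_denoise_commands(src, dst, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling denoise("noisy.wav", "clean.wav") would write a cleaned copy that can then be passed to either script.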

Feedback

Any feedback on this repository is greatly appreciated.

License

This repository is licensed under the Apache 2.0 License.

pauses's Issues

pocketsphinx installation issues

I am having trouble installing pocketsphinx. It seems to have problems with the new Mac update, and I was wondering whether there is a way to work around this.
