
voice_emotion's Introduction

Vocal Emotion Sensing

Intro

Human expression and communication are multifaceted and complex. For example, a speaker communicates not only through words, but also through cadence, intonation, facial expressions, and body language. It's why we prefer to hold business meetings in person rather than over conference calls, and why conference calls are preferred over emails or texts. The closer we are, the more communication bandwidth exists.

Voice recognition software has advanced greatly in recent years. This technology now does an excellent job of recognizing phonetic sounds and piecing them together to reproduce spoken words and sentences. However, simply translating speech to text does not fully capture a speaker's message. Facial expressions and body language aside, text is highly limited in its capacity to convey emotional intent; it's why sarcasm is so difficult to capture on paper.

This GitHub repo contains the code used to build, train, and test a convolutional neural network that classifies emotion in input audio files.

Data

Data for training and testing this classifier was obtained from three university-compiled datasets: RAVDESS, TESS, and SAVEE. In total these datasets provide more than 4,000 labeled audio files across 7 common emotional categories (neutral, happy, sad, angry, fearful, disgusted, surprised), spoken by 30 actors.

The RAVDESS audio files are available online in a single downloadable zip file. The TESS files are also freely available but require some light web scraping and a wget command. The SAVEE files require registration to access, though anyone can register; once registered, the files can be downloaded via a single wget command.

Methods

A number of preprocessing steps are required before audio files can be classified. Python's Librosa library contains excellent functions for performing these steps. The process is essentially to load the audio files into Python, remove silent portions, split the remaining audio into fixed-length windows, and then compute MFCCs (Mel-frequency cepstral coefficients) for each window. The actual classification is then performed on the windowed MFCCs.
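A minimal sketch of this preprocessing pipeline is shown below. It assumes librosa and numpy are installed; the sample rate, silence threshold (top_db), and MFCC settings are illustrative assumptions rather than the exact parameters used in this repo.

```python
import librosa
import numpy as np

def file_to_mfcc_windows(path, window_s=0.4, n_mfcc=13, top_db=30):
    """Load an audio file, trim silence, and return MFCCs for fixed-length windows."""
    signal, sr = librosa.load(path)                             # load at librosa's default 22,050 Hz
    signal, _ = librosa.effects.trim(signal, top_db=top_db)     # drop leading/trailing silence
    win_len = int(window_s * sr)                                # samples per window
    mfccs = []
    for start in range(0, len(signal) - win_len + 1, win_len):  # non-overlapping windows
        chunk = signal[start:start + win_len]
        mfccs.append(librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=n_mfcc))
    return np.array(mfccs)                                      # shape: (n_windows, n_mfcc, n_frames)
```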

The file visualizing_and_cleaning_data.ipynb contains code used for EDA and noise floor detection.

Training data is generated by taking random windows from each file; for this project I used 0.4 s windows. The number of windows taken from each file is determined by the file's length. This training data is then fed into a convolutional neural network.
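A sketch of this random-window sampling is below; the rule of roughly one window per second of audio is an illustrative assumption, as is the handling of files shorter than one window.

```python
import librosa
import numpy as np

def random_training_windows(path, window_s=0.4, windows_per_second=1.0, n_mfcc=13):
    """Draw random fixed-length windows from a file and return their MFCCs."""
    signal, sr = librosa.load(path)
    signal, _ = librosa.effects.trim(signal)
    win_len = int(window_s * sr)
    # More windows are drawn from longer files (assumed: ~1 window per second of audio).
    n_windows = max(1, int(len(signal) / sr * windows_per_second))
    rng = np.random.default_rng()
    starts = rng.integers(0, max(1, len(signal) - win_len), size=n_windows)
    return [librosa.feature.mfcc(y=signal[s:s + win_len], sr=sr, n_mfcc=n_mfcc) for s in starts]
```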

A similar process is used to make predictions on test files. The main difference is that windows are taken from the test file in a sliding fashion; I used a 0.1 s step size for my 0.4 s windows. Classification is performed on every window, the predictions are aggregated across all windows, and the final result is the class with the greatest aggregate prediction strength.
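A minimal sketch of the sliding-window prediction and aggregation step, assuming a trained Keras-style model whose predict() method accepts a batch of MFCC windows with a trailing channel axis and returns per-class probabilities:

```python
import librosa
import numpy as np

def predict_file(model, path, window_s=0.4, step_s=0.1, n_mfcc=13):
    """Classify one file by aggregating predictions over sliding windows."""
    signal, sr = librosa.load(path)
    signal, _ = librosa.effects.trim(signal)
    win_len, step = int(window_s * sr), int(step_s * sr)
    mfccs = [librosa.feature.mfcc(y=signal[s:s + win_len], sr=sr, n_mfcc=n_mfcc)
             for s in range(0, len(signal) - win_len + 1, step)]
    batch = np.array(mfccs)[..., np.newaxis]    # add a channel axis for the CNN
    probs = model.predict(batch)                # shape: (n_windows, n_classes)
    return int(np.argmax(probs.sum(axis=0)))    # class with the greatest aggregate strength
```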

The file build_cnn.ipynb contains the code used to generate the training data, train the CNN, and make predictions on a test set of files.
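The notebook's exact architecture is not reproduced here, but an illustrative Keras CNN for windowed MFCC input might look like the sketch below; the input shape (13 MFCCs by roughly 18 frames for a 0.4 s window at librosa's defaults), layer sizes, and training settings are all assumptions.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(13, 18, 1), n_classes=7):
    """Small 2D CNN over MFCC 'images', with one output per emotion class."""
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```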

Results

The CNN was able to predict the emotion of the test set files with 83% accuracy.

Model Accuracy

voice_emotion's People

Contributors

alexmuhr


voice_emotion's Issues

It is a pity that this project is dead

This project is very interesting; it is a pity that development has not continued. I tried to use it, but it gives errors; it seems that something has changed in the past five years. It would have been helpful to have at least a requirements.txt file to replicate the environment.

In the file scraping_TESS.ipynb, when I run the code soup = make_soup(url), I get the error:

FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
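A likely fix (not part of the original repo) is either to install the missing parser with pip install lxml, or to fall back on Python's built-in parser when building the soup, for example:

```python
import requests
from bs4 import BeautifulSoup

def make_soup(url):
    """Same name as the notebook's helper; this body is an assumption."""
    html = requests.get(url).text
    return BeautifulSoup(html, "html.parser")  # stdlib parser, so lxml is not required
```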

Testing with custom audio files

Hi,
I finally came up with a way to test the model with a custom file.
When I pass a file from the TESS dataset, it predicts correctly, but when I pass a voice recording of myself, it's always wrong.
Do the recording settings matter, e.g. mono vs. stereo, bitrate, etc.?

Invalid Train Test Split

Using the TESS dataset this way is problematic because it contains only 2 speakers, which makes the test results unreliable and biased. Try splitting with a speaker-independent approach (see the sketch below) and testing again so that the results are reliable.
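A minimal sketch of such a speaker-independent split using scikit-learn's GroupShuffleSplit; the arrays features, labels, and speaker_ids are hypothetical stand-ins for the per-sample data and speaker identities.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical per-window data: features, labels, and the speaker each window came from.
features = np.random.rand(100, 13, 18)
labels = np.random.randint(0, 7, size=100)
speaker_ids = np.random.randint(0, 10, size=100)

# Group by speaker so that no speaker appears in both the train and test sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(features, labels, groups=speaker_ids))
X_train, X_test = features[train_idx], features[test_idx]
y_train, y_test = labels[train_idx], labels[test_idx]
```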

Use pretrained model

How can we use your pretrained model to make a prediction on a sample audio file? Say, a prediction on every 1 s chunk.

I couldn't figure it out by looking at your sample code. Can you please share a prediction snippet?
