
speech-emotion-recognition's Introduction

Recognising Human Emotions From Raw Audio

Collaborators: Aman Agarwal, Aditya Mishra

In this project we use Mel-frequency cepstral coefficients (MFCCs) to train a recurrent neural network (LSTM) that classifies human emotions into happy, sad, angry, frustrated, neutral and fear categories.

The dataset used is the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, collected by the University of Southern California.

The link for the same can be found here.

The dataset

The IEMOCAP database consists of 10 emotions. We selected the six major emotions, viz. angry, neutral, frustrated, sad, excited and happy, for our training set. Features extracted from the raw audio of all sessions were saved along with their length and emotion. We used the first 20 MFCC coefficients as the feature vector; the process can be found in the notebook.

To convert the data into a consistent shape we applied bucket padding. The data is first sorted by sequence length and then divided into a specific number of buckets. The sequences within each bucket therefore have similar lengths, which eliminates extra padding. This method is used in the Bucket Iterator, which returns a batch of the desired examples.

To select a batch, a bucket of sorted data is chosen at random, and a contiguous run of examples equal to the batch size is taken from it. The examples are padded to the maximum sequence length and then shuffled. This gives the desired batch. The code for the bucket iterator is taken from R2RT.
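The bucketing and batch selection described above can be sketched roughly as follows. This is not the R2RT code used in the repository; `make_buckets` and `sample_batch` are hypothetical names, the data is random placeholder input, and padding to the longest sequence within the batch (rather than a global maximum) is an assumption.

```python
import numpy as np

def make_buckets(lengths, num_buckets):
    """Sort example indices by sequence length and split them into buckets."""
    order = np.argsort(lengths, kind="stable")
    return np.array_split(order, num_buckets)

def sample_batch(features, lengths, buckets, batch_size, rng):
    """Pick a random bucket, take contiguous examples, pad, then shuffle."""
    bucket = buckets[rng.integers(len(buckets))]
    # Contiguous slice inside the length-sorted bucket.
    max_start = max(len(bucket) - batch_size, 0)
    start = rng.integers(max_start + 1)
    idx = bucket[start:start + batch_size]
    # Zero-pad every example up to the longest sequence in this batch
    # (assumption: the pad target is the batch maximum).
    batch_lengths = np.array([lengths[i] for i in idx])
    n_feat = features[idx[0]].shape[1]
    batch = np.zeros((len(idx), batch_lengths.max(), n_feat), dtype=np.float32)
    for row, i in enumerate(idx):
        batch[row, :lengths[i]] = features[i]
    # Shuffle within the batch so examples are no longer ordered by length.
    perm = rng.permutation(len(idx))
    return batch[perm], batch_lengths[perm], idx[perm]

# Placeholder data: 1000 utterances with 20 coefficients per frame.
rng = np.random.default_rng(0)
lengths = [int(n) for n in rng.integers(50, 400, size=1000)]
features = [rng.normal(size=(n, 20)).astype(np.float32) for n in lengths]

buckets = make_buckets(lengths, num_buckets=5)
batch, batch_lengths, idx = sample_batch(features, lengths, buckets, 128, rng)
```

Because each bucket holds sequences of similar length, the spread of `batch_lengths` is small and little of `batch` is padding.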

Model

We used two layers of bidirectional LSTM followed by attention in the last layer. The batch size was kept at 128 with a learning rate of 1e-4.
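The README does not say which form of attention is used, so the following is only an illustrative sketch of one common choice: additive attention pooling over the time axis of the BiLSTM outputs, with padded frames masked out. The function name `attention_pool`, the parameters `w` and `v`, and all shapes are assumptions, and random arrays stand in for real LSTM outputs.

```python
import numpy as np

def attention_pool(hidden, lengths, w, v):
    """Collapse (batch, time, dim) sequence outputs into (batch, dim) vectors."""
    scores = np.tanh(hidden @ w) @ v                      # (batch, time)
    # Mask padded time steps so they receive zero attention weight.
    steps = np.arange(hidden.shape[1])[None, :]
    mask = steps < np.asarray(lengths)[:, None]
    scores = np.where(mask, scores, -np.inf)
    # Softmax over time, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted sum of the hidden states.
    context = (weights[:, :, None] * hidden).sum(axis=1)
    return context, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 10, 16))   # stand-in for BiLSTM outputs
lengths = [10, 7, 3, 1]                 # true (unpadded) sequence lengths
w = rng.normal(size=(16, 8))
v = rng.normal(size=8)

context, weights = attention_pool(hidden, lengths, w, v)
```

The pooled `context` vector would then feed the output layer; in a trained model `w` and `v` are learned rather than random.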

Results

The model was trained for 500 epochs, after which the curve had almost reached a plateau. The model overfitted when dropout was not used. We then applied dropout with a keep probability of 0.8 between the last LSTM layer and the output layer.
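A keep probability of 0.8 corresponds to a drop rate of 0.2. As a minimal sketch of what that means, assuming the usual inverted-dropout formulation (the hypothetical `dropout` function below is illustrative and is not the repository's code):

```python
import numpy as np

def dropout(x, keep_prob, rng, training=True):
    """Inverted dropout: zero units with probability 1 - keep_prob, rescale the rest."""
    if not training:
        return x                      # dropout is disabled at evaluation time
    mask = rng.random(x.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 256))              # stand-in for last-layer LSTM outputs
y = dropout(x, keep_prob=0.8, rng=rng)

kept_fraction = (y != 0).mean()       # roughly keep_prob
mean_activation = y.mean()            # roughly the input mean
```

Each training step draws a fresh mask, so the output layer cannot rely on any single LSTM unit.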

Adding dropout reduced overfitting and increased overall accuracy. The model reached an unweighted accuracy of 45% across six emotions, with a validation accuracy of 42%.
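The README does not define "unweighted accuracy". In speech emotion recognition the term usually means per-class recall averaged over classes, so that rare emotions count as much as frequent ones. Assuming that definition, the difference from plain (weighted) accuracy can be sketched on a toy example; the labels below are made up for illustration.

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Fraction of all examples classified correctly."""
    return float(np.mean(y_true == y_pred))

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy, imbalanced labels: class 0 dominates.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 2])

wa = weighted_accuracy(y_true, y_pred)     # 8 of 10 correct
ua = unweighted_accuracy(y_true, y_pred)   # mean of recalls 1.0, 0.5, 0.5
```

On imbalanced data the two numbers can differ noticeably, which is why it matters which one is reported.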

[Figures: Dropout of 0.2 | No Dropout]

TensorFlow model

A TensorFlow implementation of the model has been added. The repository contains two files: speech_emotion_gpu, which runs the model on a single GPU, and speech_emotion_gpu_multi, which runs it in parallel on multiple GPUs.

Input data for the model can be downloaded from this link.

It consists of the following features: F0 (pitch), voice probability, zero-crossing rate, 12-dimensional Mel-frequency cepstral coefficients (MFCC) with log energy, and their first time derivatives. The features have been taken from this paper.
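Reading the list above as 16 static features per frame (F0, voice probability, zero-crossing rate, 12 MFCCs and log energy) plus a first derivative for each gives 32 values per frame. The sketch below shows one way such a matrix could be assembled. It uses random placeholder values instead of real features, and `np.gradient` as a simple stand-in for whatever delta computation the referenced paper uses, which is an assumption.

```python
import numpy as np

def add_deltas(static):
    """Append first time derivatives to a (frames, n_features) matrix."""
    # np.gradient uses central differences in the interior and
    # one-sided differences at the first and last frame.
    delta = np.gradient(static, axis=0)
    return np.concatenate([static, delta], axis=1)

n_frames = 100
rng = np.random.default_rng(0)

# Placeholder per-frame features; real values would come from the audio.
f0 = rng.random((n_frames, 1))
voice_prob = rng.random((n_frames, 1))
zcr = rng.random((n_frames, 1))
mfcc = rng.normal(size=(n_frames, 12))
log_energy = rng.normal(size=(n_frames, 1))

static = np.concatenate([f0, voice_prob, zcr, mfcc, log_energy], axis=1)
features = add_deltas(static)   # static features followed by their deltas
```

Here `static` has 16 columns and `features` has 32, one row per frame.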

speech-emotion-recognition's People

Contributors

amanbasu, ravilkashyap


speech-emotion-recognition's Issues

Missing files Ses01F_script01_2_F008.wav and microsoft_32_features.pkl

Dear sir,

I appreciate your work. I am a student at GMR Institute of Technology, Rajam. As part of my project, I am working on speech emotion recognition using an RNN, and I have been using your GitHub repository as a reference (https://github.com/amanbasu/speech-emotion-recognition.git).

However, some files are missing from the repository.

Could you please share those data with me?

Ses01F_script01_2_F008.wav
microsoft_32_features.pkl

I also could not understand the path

"path_to_database/IEMOCAP_database/Session{}/wav//.wav".

Because of this I was not able to load the data. Please help me in this regard.

Please respond to the mail as early as possible

Thank you.
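On the path quoted in this issue: the `{}` in `Session{}` looks like a Python `str.format` placeholder for the session number, which is an assumption based on the string alone. IEMOCAP is organised into five sessions, so the prefix of the template would expand as sketched below. The part of the path after `wav/` is incomplete in the issue and is deliberately left out here, and `path_to_database` stands for wherever the dataset is stored locally.

```python
# Only the prefix of the quoted template is used; the rest is not recoverable.
template = "path_to_database/IEMOCAP_database/Session{}/wav/"

# IEMOCAP has five sessions, numbered 1 to 5.
session_dirs = [template.format(session) for session in range(1, 6)]

for path in session_dirs:
    print(path)
```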

Where is the speech_emotion_data.pkl?

Hello Aman:

Thank you for sharing your code. I am trying to follow it, but I cannot find the "speech_emotion_data.pkl" file used in "speech_emotion_gpu.py". Could you please provide speech_emotion_data.pkl? Thanks very much.

Best wishes!

zero_crosses should be '/320' not '/0.02', shouldn't it?

In extract_features.py, the following code should be /320 not /0.02, shouldn't it?

zero_crosses = np.nonzero(np.diff(sig[start:end] > 0))[0].shape[0]/0.02 # zero crosses

↓ modify

zero_crosses = np.nonzero(np.diff(sig[start:end] > 0))[0].shape[0]/ 320 # zero crosses

※ Zero Crossing Rate: The rate of sign-changes of the signal during the duration of a particular frame. [1]

ref.[1]: https://github.com/tyiannak/pyAudioAnalysis/wiki/3.-Feature-Extraction
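To make the two divisors in this issue concrete, here is a small sketch. It assumes 16 kHz audio and a 20 ms frame, so one frame is 320 samples; neither value is stated in the issue, and both are inferred from the numbers 0.02 and 320. The test signal is a synthetic tone, not IEMOCAP audio.

```python
import numpy as np

SAMPLE_RATE = 16000                                   # assumption: 16 kHz audio
FRAME_SECONDS = 0.02                                  # assumption: 20 ms frame
FRAME_SAMPLES = round(SAMPLE_RATE * FRAME_SECONDS)    # 320 samples per frame

def count_zero_crossings(frame):
    """Number of sign changes in the frame, as in the quoted line of code."""
    return np.nonzero(np.diff(frame > 0))[0].shape[0]

# Synthetic 1 kHz tone; the small phase offset avoids samples exactly at zero.
t = np.arange(FRAME_SAMPLES) / SAMPLE_RATE
frame = np.sin(2 * np.pi * 1000 * t + 0.1)

crossings = count_zero_crossings(frame)
per_second = crossings / FRAME_SECONDS    # the original "/0.02" version
per_sample = crossings / FRAME_SAMPLES    # the proposed "/320" version
```

Under these assumptions the two versions differ only by the constant factor `SAMPLE_RATE`: dividing by 0.02 gives crossings per second, while dividing by 320 gives crossings per sample, which lies between 0 and 1.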

Link for speech_emotion_data.pkl

Hi,
Could you please provide me the link for speech_emotion_data.pkl file that you used in speech_emotion_gpu.py python file.
And also if possible could you upload the requirements.txt file to know the versions of the packages you used since I was facing so many errors with version changes.

Thanks in advance!

No Features columns in create_mfcc.ipynb

In create_mfcc.ipynb, the Features column does not exist in the 'file' dataframe. I would appreciate it if you could share the code for generating features such as F0 (pitch), voice probability, zero-crossing rate and so on.

Emotion values

Can you provide the script for generating the emotion_values.csv file?

Data Prep

Hello Aman,

Quick question: how did you prepare your IEMOCAP data for loading into the model? I am trying to replicate your code but don't know how to pre-process the data.

Link for Dataset

Hi, hope you are doing well.

I found your approach very interesting and useful. The dataset link that you shared no longer exists, so could you please share the dataset with me? It would be really helpful for my implementation.

Thanks in advance, appreciate your help.
