abhishekyana / emotion-recognition-using-spatio-temporal-data

Emotion recognition using spatio-temporal data, applied to the RAVDESS dataset, to predict 8 emotions from the video (spatial) and audio (temporal) data.

License: MIT License

Emotion-Recognition-using-Spatio-Temporal-Data

This is a multi-modal learning approach:
- For the spatial features, a Convolutional Neural Network (CNN) is used.
- For the temporal audio data, an LSTM (a variant of the RNN) is used.

Preprocessing the Spatio-Temporal data before feeding it to the model.

For Spatial data:
- The video feed runs at 30 frames per second, so we have 30 images per second.
- 5 images per second are sampled uniformly, at regular intervals.
- Each image has a lot of white, unused space. Since only the facial features of the person are needed, I applied face recognition to get the localized coordinates of the face and cropped the image to contain only the face.
- To capture the sequential richness of the video in a single image, the five images are stitched horizontally, in order, into a strip of images.
- We now have one such strip per second of video. It holds the sequential information in a single image, so a CNN can be run over that one image.
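
The sampling and stitching steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper names `sample_indices` and `make_strip`, the dummy frame size, and the fixed face box are all assumptions, and the face box is supplied by hand rather than produced by a face detector.

```python
import numpy as np

def sample_indices(fps=30, per_second=5):
    """Indices of `per_second` evenly spaced frames within one second of video."""
    step = fps // per_second
    return list(range(0, fps, step))[:per_second]

def make_strip(frames, box):
    """Crop each frame to box = (top, bottom, left, right) and stitch side by side."""
    top, bottom, left, right = box
    crops = [frame[top:bottom, left:right] for frame in frames]
    # hstack joins the crops along the width axis, preserving their order
    return np.hstack(crops)

# One second of dummy video: 30 RGB frames of 72x128 pixels (assumed size).
second = np.zeros((30, 72, 128, 3), dtype=np.uint8)

idx = sample_indices()
# Hypothetical face box; in practice it would come from the face detector.
strip = make_strip([second[i] for i in idx], box=(10, 60, 40, 90))
print(idx, strip.shape)
```

With a real detector the box would likely differ from frame to frame, so the crops would need resizing to a common height before stitching.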

For Audio Temporal Data:
- The audio comes in two formats, speech and song, for each emotion and each actor.
- The sampling rate of the audio files is 44100 Hz. Since human speaking frequencies lie between 0 and 8000 Hz, the audio can be downsampled to 16000 Hz, which by the Nyquist criterion still captures content up to 8000 Hz. I used Librosa to load and downsample the wav files.
- Before being fed to the model, the audio is converted into MFCCs (Mel-Frequency Cepstral Coefficients). This wav-to-MFCC conversion is also implemented in CUDA-enabled TensorFlow, so that part can be parallelized as well.
- The audio MFCCs are then given to an LSTM, from which the final latent space vector (hidden layer) is obtained.
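
As a rough sketch of the audio steps above: the Nyquist arithmetic gives the 16000 Hz target, and Librosa can resample while loading. Note that the MFCC step here uses `librosa.feature.mfcc` as a stand-in, whereas the project computes MFCCs in TensorFlow; the helper names and the choice of `n_mfcc=13` are assumptions made for illustration.

```python
def min_sample_rate(max_freq_hz):
    """Nyquist criterion: twice the highest frequency is the usual lower bound."""
    return 2 * max_freq_hz

def load_mfcc(path, max_freq_hz=8000, n_mfcc=13):
    """Load a wav file at the Nyquist-derived rate and return its MFCC matrix."""
    import librosa  # imported lazily so the arithmetic helper works without it

    target_sr = min_sample_rate(max_freq_hz)
    # librosa.load resamples to `sr` while loading (source files are 44100 Hz)
    y, sr = librosa.load(path, sr=target_sr)
    # Stand-in for the TensorFlow MFCC step; result has shape (n_mfcc, frames)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

print(min_sample_rate(8000))
```

Transposing the returned matrix to `(frames, n_mfcc)` gives one feature vector per time step, which is the layout an LSTM typically expects.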

Actual Model:

The latent space vector for the images (200 dims) is obtained through the CNNs of model1, and the latent space vector for the audio (200 dims) is obtained through the LSTMs of model2. These two vectors are concatenated into a 400-dim vector, which is given to a fully connected layer to predict the emotion class (C=8).
The model is trained on 80% of the randomly shuffled data and validated on the remaining 20%. The model converged and also performed well on the validation set.
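
The fusion step and the 80/20 split can be illustrated with a small NumPy sketch. This is not the project's TensorFlow model: the random vectors stand in for the outputs of model1 and model2, the weights are untrained, and the names `fuse_and_classify` and `shuffled_split` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_classify(video_vec, audio_vec, W, b):
    """Concatenate the two latent vectors and apply one fully connected layer."""
    fused = np.concatenate([video_vec, audio_vec])  # shape (400,)
    logits = fused @ W + b                          # shape (8,)
    exp = np.exp(logits - logits.max())             # numerically stable softmax
    return exp / exp.sum()

def shuffled_split(n_samples, train_frac=0.8):
    """Randomly shuffle sample indices and split them into train and validation."""
    order = rng.permutation(n_samples)
    cut = int(n_samples * train_frac)
    return order[:cut], order[cut:]

video_vec = rng.normal(size=200)           # placeholder for the CNN latent vector
audio_vec = rng.normal(size=200)           # placeholder for the LSTM latent vector
W = rng.normal(scale=0.01, size=(400, 8))  # untrained weights, illustration only
b = np.zeros(8)

probs = fuse_and_classify(video_vec, audio_vec, W, b)
train_idx, val_idx = shuffled_split(100)
print(probs.shape, len(train_idx), len(val_idx))
```

In the actual model the fully connected layer would be trained jointly with the CNN and LSTM branches, so the 400-dim input carries learned features rather than random values.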

Thank you for reading.
Best,
Abhishek Yanamandra