
SER_KERAS_TF_TRAINER

This repository includes source code and documentation for training the Keras/TensorFlow-based speech emotion recognition model (https://github.com/batikim09/LIVE_SER).

Maintainer: batikim09 (batikim09) - [email protected]

This folder contains the source code of a model trainer for speech emotion recognition.

## Contents

  1. Installation Requirements

  2. Usage

  3. References

1. Installation Requirements

This software runs only on OS X or Linux (tested on Ubuntu). It is compatible with Python 2.x and 3.x, but the following descriptions assume that Python 3.x is installed.

Basic system packages

This software relies on several system packages that must be installed using a package manager.

For Ubuntu, run the following command:

`sudo apt-get install python-pip python-dev libhdf5-dev`

Python packages

Using pip, install all prerequisite modules (pip version >= 8.1 is required; see http://askubuntu.com/questions/712339/how-to-upgrade-pip-to-latest):

`sudo pip3 install -r requirements.txt`

2. Usage

We assume that users have already downloaded the freely available eNTERFACE corpus (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.220.2113&rep=rep1&type=pdf) and built an h5 database from it. See the feature extractor (https://github.com/batikim09/SER_FEAT_EXT) for details.

We assume the location of the database is "../SER_FEAT_EXT/h5db/ENT.RAW.3cls.av".

With this small corpus, deep temporal architectures do not provide any benefit. The following scripts are only examples; they have not been fine-tuned.

Basic training

Users can combine various types of neural networks, such as fully-connected networks (FCN), convolutional neural networks (CNN), long short-term memory networks (LSTM), residual networks (RESNET), and highway networks.

Feature vectors have a temporal structure. For example, the 2D feature input has a shape of (#sample, #time, 1, #context_window, #feature_dim), and the 3D feature input has a shape of (#sample, 1, #time, #context_window, #feature_dim). For details on context windows, see https://github.com/batikim09/LIVE_SER. For an example, see "./scripts/basic.sh".
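The two input layouts can be illustrated with a small NumPy sketch. The sizes below and the reading of the singleton axis as a channel axis are assumptions made for illustration only; they are not taken from the trainer's code.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_sample, n_time, context_window, feature_dim = 4, 10, 5, 40

# Stand-in features: one (context_window, feature_dim) patch per time step.
frames = np.random.rand(n_sample, n_time, context_window, feature_dim)

# 2D layout: a time-ordered sequence of single-channel 2D patches.
# Shape: (#sample, #time, 1, #context_window, #feature_dim)
x_2d = frames[:, :, np.newaxis, :, :]

# 3D layout: one single-channel volume in which time is the depth axis.
# Shape: (#sample, 1, #time, #context_window, #feature_dim)
x_3d = frames[:, np.newaxis, :, :, :]

print(x_2d.shape)  # (4, 10, 1, 5, 40)
print(x_3d.shape)  # (4, 1, 10, 5, 40)
```

Both arrays hold the same values; only the position of the singleton axis differs, which determines whether a layer sees each time step separately or the whole utterance as one volume.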

Updating pretrained models

Users can train a background model first and then load it for fine-tuning. When updating the parameters of a pre-trained model, it is also possible to freeze some layers. See "./scripts/pretrained.sh".
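As a rough illustration of layer freezing in Keras, the sketch below builds a tiny stand-in model in memory instead of loading a real background model. The layer names `hidden` and `output`, the layer sizes, and the random data are all hypothetical; the repository's scripts may configure freezing differently.

```python
import numpy as np
from tensorflow import keras

# Stand-in for a pretrained background model. In practice it would be
# loaded from disk, for example with keras.models.load_model(path).
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu", name="hidden"),
    keras.layers.Dense(3, activation="softmax", name="output"),
])

# Freeze the lower layer so that only the output layer is updated.
model.get_layer("hidden").trainable = False

# Compile after changing `trainable` so that the change takes effect.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Keep copies of the weights to compare after training.
hidden_before = [w.copy() for w in model.get_layer("hidden").get_weights()]
output_before = [w.copy() for w in model.get_layer("output").get_weights()]

# Random placeholder data standing in for the target task.
x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 3, size=(32,))
model.fit(x, y, epochs=1, batch_size=8, verbose=0)

hidden_after = model.get_layer("hidden").get_weights()
output_after = model.get_layer("output").get_weights()
```

After training, the weights of the frozen layer are unchanged, while the output layer has been updated.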

Balanced learning

To deal with imbalanced distributions of classes, several methods are provided. See "./scripts/balanced_learning.sh".
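The specific methods are defined in the script and are not listed here. As one common approach, offered only as an illustrative assumption and not necessarily what the script does, inverse-frequency class weights can be computed as follows. The function name and the label list are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label list: class 0 dominates, class 2 is rare.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
weights = inverse_frequency_weights(labels)
print(weights)
```

Rare classes receive larger weights, so their errors count more in the loss. A dictionary of this form can be passed to the `class_weight` argument of Keras `Model.fit`.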

3. References

This software is based on the following papers. Please cite one of these papers in your publications if it helps your research:

@inproceedings{kim2017interspeech,
  title={Towards Speech Emotion Recognition ``in the wild'' using Aggregated Corpora and Deep Multi-Task Learning},
  author={Kim, Jaebok and Englebienne, Gwenn and Truong, Khiet P and Evers, Vanessa},
  booktitle={Proceedings of the INTERSPEECH},
  pages={1113--1117},
  year={2017}
}

@inproceedings{kim2017acmmm,
  title={Deep Temporal Models using Identity Skip-Connections for Speech Emotion Recognition},
  author={Kim, Jaebok and Englebienne, Gwenn and Truong, Khiet P and Evers, Vanessa},
  booktitle={Proceedings of ACM Multimedia},
  pages={1006--1013},
  year={2017}
}

@inproceedings{kim2017acii,
  title={Learning spectro-temporal features with 3D CNNs for speech emotion recognition},
  author={Kim, Jaebok and Truong, Khiet and Englebienne, Gwenn and Evers, Vanessa},
  booktitle={Proceedings of International Conference on Affective Computing and Intelligent Interaction},
  pages={},
  year={2017}
}
