
Building and training a Speech Emotion Recognizer that predicts human emotions using Python, scikit-learn and Keras

License: MIT License

machine-learning speech-emotion-recognition emotion-recognition emotion-recognizer sklearn kneighborsclassifier random-forest-classifier mfcc feature-extraction emotion-detection

emotion-recognition-using-speech's Introduction

Speech Emotion Recognition

Introduction

  • This repository handles building and training a Speech Emotion Recognition system.
  • The basic idea behind this tool is to build and train/test a suitable machine learning (as well as deep learning) algorithm that can recognize and detect human emotions from speech.
  • This is useful for many industry fields, such as product recommendations, affective computing, etc.
  • Check this tutorial for more information.

Requirements

  • Python 3.6+

Python Packages

  • tensorflow
  • librosa==0.6.3
  • numpy
  • pandas
  • soundfile==0.9.0
  • wave
  • scikit-learn==0.24.2
  • tqdm==4.28.1
  • matplotlib==2.2.3
  • pyaudio==0.2.11
  • ffmpeg (optional): used if you want to add more sample audio by converting it to a 16000 Hz sample rate and a mono channel, as provided in convert_wavs.py

Install these libraries with the following command:

pip3 install -r requirements.txt

Dataset

This repository uses 4 datasets (including this repo's custom dataset), which are already downloaded and formatted in the data folder:

  • RAVDESS : The Ryerson Audio-Visual Database of Emotional Speech and Song, which contains 24 actors (12 male, 12 female) vocalizing two lexically-matched statements in a neutral North American accent.
  • TESS : The Toronto Emotional Speech Set, modeled on the Northwestern University Auditory Test No. 6 (NU-6; Tillman & Carhart, 1966). A set of 200 target words were spoken in the carrier phrase "Say the word _____" by two actresses (aged 26 and 64 years).
  • EMO-DB : As part of the DFG-funded research project SE462/3-1, in 1997 and 1999 a database of emotional utterances spoken by actors was recorded. The recordings took place in the anechoic chamber of the Technical University Berlin, department of Technical Acoustics. The director of the project was Prof. Dr. W. Sendlmeier, Technical University of Berlin, Institute of Speech and Communication, department of communication science. Members of the project were mainly Felix Burkhardt, Miriam Kienast, Astrid Paeschke and Benjamin Weiss.
  • Custom : An unbalanced, noisy dataset located in data/train-custom for training and data/test-custom for testing. You can easily add/remove recording samples by converting the raw audio to a 16000 Hz sample rate and a mono channel (this is provided by the convert_audio(audio_path) method in the create_wavs.py script, which requires ffmpeg to be installed and in PATH) and appending the emotion to the end of the audio file name, separated by '_' (e.g. "20190616_125714_happy.wav" will be parsed automatically as happy; see the sketch after this list).
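
For illustration, here is a minimal sketch of how such a file name could be parsed (this helper is hypothetical; the repo's own parsing code may differ):

import os

def parse_emotion_from_filename(audio_path):
    # hypothetical sketch: "data/train-custom/20190616_125714_happy.wav" -> "happy"
    name = os.path.splitext(os.path.basename(audio_path))[0]
    return name.split("_")[-1]

print(parse_emotion_from_filename("data/train-custom/20190616_125714_happy.wav"))  # happy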

Emotions available

There are 9 emotions available: "neutral", "calm", "happy", "sad", "angry", "fear", "disgust", "ps" (pleasant surprise) and "boredom".

Feature Extraction

Feature extraction is the main part of the speech emotion recognition system. It is essentially accomplished by transforming the speech waveform into a parametric representation at a relatively lower data rate.

In this repository, we use the most commonly used features available in the librosa library, including the following (a sketch of extracting them appears after the list):

  • MFCC
  • Chromagram
  • MEL Spectrogram Frequency (mel)
  • Contrast
  • Tonnetz (tonal centroid features)
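
As a rough illustration, the snippet below sketches how these five features can be extracted with librosa and averaged over time into one fixed-size vector (this is an assumption of what the repo's extract_feature utility does, not its exact code):

import librosa
import numpy as np

def extract_features_sketch(audio_path):
    # load as 16000 Hz mono, the format the repo expects
    X, sr = librosa.load(audio_path, sr=16000, mono=True)
    stft = np.abs(librosa.stft(X))
    # average each feature over time to get a fixed-size vector
    mfcc = np.mean(librosa.feature.mfcc(y=X, sr=sr, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sr).T, axis=0)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr).T, axis=0)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sr).T, axis=0)
    return np.hstack([mfcc, chroma, mel, contrast, tonnetz])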

Grid Search

Grid search results are already provided in the grid folder, but if you want to tune the various grid search parameters in parameters.py, you can run the script grid_search.py with:

python grid_search.py

This may take several hours to complete; once it is finished, the best estimators are stored and pickled in the grid folder.
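
For illustration, below is a minimal sketch of what such a search can look like with scikit-learn's GridSearchCV (the parameter grid, file name, and placeholder data are assumptions; the real grids live in parameters.py and the real logic in grid_search.py):

import pickle
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# placeholder data standing in for the extracted features and labels
X_train = np.random.rand(100, 180)
y_train = np.random.choice(["sad", "neutral", "happy"], size=100)

# hypothetical parameter grid; the real grids are defined in parameters.py
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01], "kernel": ["rbf"]}
grid = GridSearchCV(SVC(probability=True), param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

# pickle the best estimator, similar in spirit to what grid_search.py stores
with open("grid/best_svc.pickle", "wb") as f:
    pickle.dump(grid.best_estimator_, f)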

Example 1: Using 3 Emotions

The way to build and train a model for classifying 3 emotions is shown below:

from emotion_recognition import EmotionRecognizer
from sklearn.svm import SVC
# init a model, let's use SVC
my_model = SVC()
# pass my model to EmotionRecognizer instance
# and balance the dataset
rec = EmotionRecognizer(model=my_model, emotions=['sad', 'neutral', 'happy'], balance=True, verbose=0)
# train the model
rec.train()
# check the test accuracy for that model
print("Test score:", rec.test_score())
# check the train accuracy for that model
print("Train score:", rec.train_score())

Output:

Test score: 0.8148148148148148
Train score: 1.0

Determining the best model

To determine the best model, run:

# load the best estimators from the `grid` folder, as searched by GridSearchCV in `grid_search.py`,
# set the model to the best one in terms of test score, and then train it
rec.determine_best_model()
# get the determined sklearn model name
print(rec.model.__class__.__name__, "is the best")
# get the test accuracy score for the best estimator
print("Test score:", rec.test_score())

Output:

MLPClassifier is the best
Test score: 0.8958333333333334

Predicting

Just pass an audio path to the rec.predict() method as shown below:

# this is a neutral speech from emo-db from the testing set
print("Prediction:", rec.predict("data/emodb/wav/15a04Nc.wav"))
# this is a sad speech from TESS from the testing set
print("Prediction:", rec.predict("data/validation/Actor_25/25_01_01_01_back_sad.wav"))

Output:

Prediction: neutral
Prediction: sad

You can pass any audio file; if it's not in the appropriate format (16000 Hz, mono channel), it'll be converted automatically. Make sure you have ffmpeg installed on your system and added to PATH.
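
For illustration, here is a minimal sketch of such a conversion using the ffmpeg command line via subprocess (a hypothetical helper; the repo's convert_wavs.py implements the actual conversion):

import subprocess

def convert_to_16k_mono(src, dst):
    # -ar 16000: 16000 Hz sample rate, -ac 1: mono channel, -y: overwrite output
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst], check=True)

convert_to_16k_mono("my_recording.mp3", "my_recording_16k_mono.wav")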

Example 2: Using RNNs for 5 Emotions

from deep_emotion_recognition import DeepEmotionRecognizer
# initialize instance
# inherited from emotion_recognition.EmotionRecognizer
# default parameters (LSTM: 128x2, Dense:128x2)
deeprec = DeepEmotionRecognizer(emotions=['angry', 'sad', 'neutral', 'ps', 'happy'], n_rnn_layers=2, n_dense_layers=2, rnn_units=128, dense_units=128)
# train the model
deeprec.train()
# get the accuracy
print(deeprec.test_score())
# predict angry audio sample
prediction = deeprec.predict('data/validation/Actor_10/03-02-05-02-02-02-10_angry.wav')
print(f"Prediction: {prediction}")

Output:

0.7717948717948718
Prediction: angry

Predicting probabilities is also possible (for classification, of course):

print(deeprec.predict_proba("data/emodb/wav/16a01Wb.wav"))

Output:

{'angry': 0.99878675, 'sad': 0.0009922335, 'neutral': 7.959707e-06, 'ps': 0.00021298956, 'happy': 8.3598025e-08}
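
Since the output is a plain dictionary mapping each emotion to its probability, picking the most likely emotion is a one-liner:

proba = deeprec.predict_proba("data/emodb/wav/16a01Wb.wav")
print(max(proba, key=proba.get))  # angry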

Confusion Matrix

print(deeprec.confusion_matrix(percentage=True, labeled=True))

Output:

              predicted_angry  predicted_sad  predicted_neutral  predicted_ps  predicted_happy
true_angry          80.769226       7.692308           3.846154      5.128205         2.564103
true_sad            12.820514      73.076920           3.846154      6.410257         3.846154
true_neutral         1.282051       1.282051          79.487183      1.282051        16.666668
true_ps             10.256411       3.846154           1.282051     79.487183         5.128205
true_happy           5.128205       8.974360           7.692308      8.974360        69.230774

Example 3: Not Passing any Model and Removing the Custom Dataset

The code below initializes EmotionRecognizer with 3 chosen emotions, removes the custom dataset, and sets balance to False:

from emotion_recognition import EmotionRecognizer
# initialize instance; this will take a while the first time it's executed
# as it'll extract the features and call determine_best_model() automatically
# to load the best performing model on the picked dataset
rec = EmotionRecognizer(emotions=["angry", "neutral", "sad"], balance=False, verbose=1, custom_db=False)
# it will be trained, so no need to train this time
# get the accuracy on the test set
print(rec.confusion_matrix())
# predict angry audio sample
prediction = rec.predict('data/validation/Actor_10/03-02-05-02-02-02-10_angry.wav')
print(f"Prediction: {prediction}")

Output:

[+] Best model determined: RandomForestClassifier with 93.454% test accuracy

              predicted_angry  predicted_neutral  predicted_sad
true_angry          98.275864           1.149425       0.574713
true_neutral         0.917431          88.073395      11.009174
true_sad             6.250000           1.875000      91.875000

Prediction: angry

You can print the number of samples in each class:

rec.get_samples_by_class()

Output:

         train  test  total
angry      910   174   1084
neutral    650   109    759
sad        862   160   1022
total     2422   443   2865

In this case, the dataset comes only from TESS and RAVDESS and is not balanced; you can pass balance=True to the EmotionRecognizer instance to balance the data, as shown below.
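
For example:

rec = EmotionRecognizer(emotions=["angry", "neutral", "sad"], balance=True, verbose=1, custom_db=False)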

Algorithms Used

This repository can be used to build machine learning classifiers as well as regressors, for the case of 3 emotions {'sad': 0, 'neutral': 1, 'happy': 2} and the case of 5 emotions {'angry': 1, 'sad': 2, 'neutral': 3, 'ps': 4, 'happy': 5}.
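
For the regression case, predictions are continuous values on these integer scales. Below is a minimal sketch of mapping a regressor's output back to a label for the 3-emotion case (the rounding helper is an assumption for illustration, not the repo's code):

# integer encodings from the 3-emotion case above
INT2EMOTION = {0: "sad", 1: "neutral", 2: "happy"}

def regression_output_to_emotion(y_pred):
    # round the regressor's continuous output and clamp it to a valid label
    idx = min(max(int(round(y_pred)), 0), max(INT2EMOTION))
    return INT2EMOTION[idx]

print(regression_output_to_emotion(1.3))  # neutral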

Classifiers

  • SVC
  • RandomForestClassifier
  • GradientBoostingClassifier
  • KNeighborsClassifier
  • MLPClassifier
  • BaggingClassifier
  • Recurrent Neural Networks (Keras)

Regressors

  • SVR
  • RandomForestRegressor
  • GradientBoostingRegressor
  • KNeighborsRegressor
  • MLPRegressor
  • BaggingRegressor
  • Recurrent Neural Networks (Keras)

Testing

You can test your own voice by executing the following command:

python test.py

Wait until "Please talk" prompt is appeared, then you can start talking, and the model will automatically detects your emotion when you stop (talking).

You can change the emotions to predict, as well as the model; pass --help for more information.

python test.py --help

Output:

usage: test.py [-h] [-e EMOTIONS] [-m MODEL]

Testing emotion recognition system using your voice, please consider changing
the model and/or parameters as you wish.

optional arguments:
  -h, --help            show this help message and exit
  -e EMOTIONS, --emotions EMOTIONS
                        Emotions to recognize separated by a comma ',',
                        available emotions are "neutral", "calm", "happy"
                        "sad", "angry", "fear", "disgust", "ps" (pleasant
                        surprise) and "boredom", default is
                        "sad,neutral,happy"
  -m MODEL, --model MODEL
                        The model to use, 8 models available are: "SVC",
                        "AdaBoostClassifier", "RandomForestClassifier",
                        "GradientBoostingClassifier", "DecisionTreeClassifier",
                        "KNeighborsClassifier", "MLPClassifier",
                        "BaggingClassifier", default is "BaggingClassifier"

Plotting Histograms

This will only work if grid search has been performed.

from emotion_recognition import plot_histograms
# plot histograms on different classifiers
plot_histograms(classifiers=True)

Output:

A histogram showing different algorithms' metric results on different data sizes, as well as the time consumed to train/predict.

Citation

@software{speech_emotion_recognition_2019,
  author       = {Abdeladim Fadheli},
  title        = {Speech Emotion Recognition},
  version      = {1.0.0},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  url          = {https://github.com/x4nth055/emotion-recognition-using-speech}
}

emotion-recognition-using-speech's People

Contributors

dependabot[bot], x4nth055


emotion-recognition-using-speech's Issues

Error about rec.determine_best_model(train=True)

When I run rec.determine_best_model(train=True), the code gives this error: ModuleNotFoundError: No module named 'sklearn.ensemble._gb_losses'. How do I solve it? Is it related to the scikit-learn version?

Hi,

Hi,
Can I use your model to detect other emotions, like rudeness, using my own custom dataset? If yes, please guide me.

Originally posted by @Tanish18 in #4 (comment)

extract_feature did not work.

When I run:

from deep_emotion_recognition import DeepEmotionRecognizer
deeprec = DeepEmotionRecognizer(emotions=['angry', 'sad', 'neutral', 'ps', 'happy'], n_rnn_layers=2, n_dense_layers=2, rnn_units=128, dense_units=128)
# train the model
deeprec.train()
# get the accuracy
print(deeprec.test_score())
# predict angry audio sample
prediction = deeprec.predict('data/validation/Actor_10/03-02-05-02-02-02-10_angry.wav')
print(f"Prediction: {prediction}")

I get the following output:

[+] Model created
[*] Model weights loaded
1/1 [==============================] - 0s 400ms/step
0.7538461538461538

extract_feature did not work.

Invalid Syntax

I ran pip install on requirements.txt and it finished successfully. However, when I try to run "python test.py" it gives an invalid syntax error:

Rakeshs-MacBook-Air:emotion-recognition-using-speech-master rakeshmohan$ python test.py
File "test.py", line 180
print(f"\t{emotion.capitalize()}: {proba*100:.2f}%")
^
SyntaxError: invalid syntax
Rakeshs-MacBook-Air:emotion-recognition-using-speech-master rakeshmohan$

Can you please help? I am very new to Python and ML etc.

Confusion matrix incomplete problem

{'angry': 1.0, 'sad': 1.0595564e-14, 'neutral': 3.413421e-14, 'ps': 2.9746183e-09, 'happy': 1.6824228e-19}
predicted_angry predicted_sad ... predicted_ps predicted_happy
true_angry 87.179489 6.410257 ... 5.128205 0.000000
true_sad 14.102565 75.641022 ... 7.692308 1.282051
true_neutral 3.846154 6.410257 ... 1.282051 5.128205
true_ps 5.128205 7.692308 ... 80.769226 0.000000
true_happy 10.256411 6.410257 ... 10.256411 66.666672

[5 rows x 5 columns]

Where is the SVC() model saved?

Hi, please tell me where the SVC() model is saved after training.
I tried to save rec.model_trained, but I get an error with it :(
AttributeError: 'EmotionRecognizer' object has no attribute 'X_test'

Issue using predict_proba(emotion_recognizer)

When using predict_proba(emotion_recognizer), I get different results based on the order of the chosen emotions,
i.e. if I write "sad","neutral","happy" and then "happy","neutral","sad", my results are different.

TypeError: expected string or bytes-like object

Hi,

I am very new to working with Python, so this issue might not be specific to the package, but I could not find anything online, so I decided to ask. When I try to import DeepEmotionRecognizer, I get the following error:

`---------------------------------------------------------------------------

TypeError Traceback (most recent call last)
in
----> 1 from deep_emotion_recognition import DeepEmotionRecognizer
2

~/Documents/UCONN/GE/Data/Adam/emotion-recognition-using-speech/deep_emotion_recognition.py in
16 from data_extractor import load_data
17 from create_csv import write_custom_csv, write_emodb_csv, write_tess_ravdess_csv
---> 18 from emotion_recognition import EmotionRecognizer
19 from utils import get_first_letters, AVAILABLE_EMOTIONS, extract_feature, get_dropout_str
20

~/Documents/UCONN/GE/Data/Adam/emotion-recognition-using-speech/emotion_recognition.py in
7 from sklearn.model_selection import GridSearchCV
8
----> 9 import matplotlib.pyplot as pl
10 from time import time
11 from utils import get_best_estimators, get_audio_config

~/anaconda3/lib/python3.7/site-packages/matplotlib/pyplot.py in
36 import matplotlib.colorbar
37 import matplotlib.image
---> 38 from matplotlib import rcsetup, style
39 from matplotlib import _pylab_helpers, interactive
40 from matplotlib import cbook

~/anaconda3/lib/python3.7/site-packages/matplotlib/style/init.py in
----> 1 from .core import use, context, available, library, reload_library

~/anaconda3/lib/python3.7/site-packages/matplotlib/style/core.py in
222 # Load style library
223 # ==================
--> 224 _base_library = load_base_library()
225
226 library = None

~/anaconda3/lib/python3.7/site-packages/matplotlib/style/core.py in load_base_library()
164 def load_base_library():
165 """Load style library defined in this package."""
--> 166 library = read_style_directory(BASE_LIBRARY_PATH)
167 return library
168

~/anaconda3/lib/python3.7/site-packages/matplotlib/style/core.py in read_style_directory(style_dir)
200 with warnings.catch_warnings(record=True) as warns:
201 styles[path.stem] = rc_params_from_file(
--> 202 path, use_default_template=False)
203 for w in warns:
204 _log.warning('In %s: %s', path, w.message)

~/anaconda3/lib/python3.7/site-packages/matplotlib/init.py in rc_params_from_file(fname, fail_on_error, use_default_template)
983 'c': 'color',
984 'fc': 'facecolor',
--> 985 'ec': 'edgecolor',
986 'mew': 'markeredgewidth',
987 'aa': 'antialiased',

~/anaconda3/lib/python3.7/site-packages/matplotlib/init.py in _rc_params_in_file(fname, fail_on_error)
914 rcParamsOrig = RcParams(rcParams.copy())
915 # This also checks that all rcParams are indeed listed in the template.
--> 916 # Assiging to rcsetup.defaultParams is left only for backcompat.
917 defaultParams = rcsetup.defaultParams = {
918 # We want to resolve deprecated rcParams, but not backend...

~/anaconda3/lib/python3.7/contextlib.py in enter(self)
110 del self.args, self.kwds, self.func
111 try:
--> 112 return next(self.gen)
113 except StopIteration:
114 raise RuntimeError("generator didn't yield") from None

~/anaconda3/lib/python3.7/site-packages/matplotlib/init.py in _open_file_or_url(fname)
891 You have the following UNSUPPORTED LaTeX preamble customizations:
892 %s
--> 893 Please do not ask for support with these customizations active.
894 *****************************************************************
895 """, '\n'.join(config['text.latex.preamble']))

~/anaconda3/lib/python3.7/site-packages/matplotlib/init.py in is_url(filename)
886 config['datapath'] = get_data_path(_from_rc=config['datapath'])
887
--> 888 if "".join(config['text.latex.preamble']):
889 _log.info("""
890 *****************************************************************

TypeError: expected string or bytes-like object`

Do you have any idea what the problem is?
I appreciate any insight :)

Speech to text problem

Hi,
Based on this, I tried to convert speech to text, but whatever I do in the test file, it prints the same thing every time. Even if I delete "print('Please talk')" from the test file, nothing changes; it still prints "Please talk" in the output. I run the program in order, but the output is the same. I also tried deleting some code to cause a mistake, thinking it would raise an error, but the output is still the same.
[+] Model trained
Test accuracy score: 48.677%
Please talk
calm

Thank You in advance.

Best regards.
Bekir

Problem with GridSearch

Hello,
Hope you are doing very well.
I ran the model in order to determine the best model (with GridSearch), but it fails with the following problem:
'GradientBoostingClassifier' object has no attribute 'presort'

I couldn't fix it; please help me.
Thank you

Different Results in Example 2

Hello,

I just ran Example 2 from the README. I didn't make any changes to the code, but the confusion_matrix showed that the percentages for 'happy' and 'sad' were mixed up.


I would appreciate it if you could comment on this issue.

Error while running the pretrained model: No such file or directory: 'train_custom.csv'

Hello, could you please help me with this?

  3 rec = EmotionRecognizer(None, emotions=["boredom", "neutral"], features=["mfcc"])
  4 # evaluate all models in `grid` folder and determine the best one in terms of test accuracy

----> 5 rec.determine_best_model()
6 # now you can make inference on the model
7 rec.predict("data/emodb/wav/15b09La.wav") # 'boredom'

10 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in init(self, src, **kwds)
2008 kwds["usecols"] = self.usecols
2009
-> 2010 self._reader = parsers.TextReader(src, **kwds)
2011 self.unnamed_cols = self._reader.unnamed_cols
2012

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.cinit()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] No such file or directory: 'train_custom.csv'

Using a pre-trained model

Hello,
I want to use a pre-trained model to detect boredom in speech. How can I do that? Can you please share your pre-trained model for it?

ModuleNotFoundError: No module named 'numba.decorators'

Hi

First of all, thanks a lot for providing such valuable material. I am new to Python, so maybe it's a silly question.

I am trying to run the Speech Emotion Recognition code. I followed all the instructions provided, but when I try to run test.py, it gives me the error below.

ModuleNotFoundError: No module named 'numba.decorators'

numba is already installed.

Any bit of help will be really appreciated.

Thanks in advance

Different Results in Example 1

Hi,

I wanted to replicate Example 1 from the README. I didn't make any changes to the code, but got output much different from yours:


Also, the best model selection seems rather strange, as I used SVC in the first place. I would appreciate it if you could comment on this issue.

Kind Regards,
Eduards

Testing on WAV files from Youtube?

Hello,

I wanted to use this repository to run predictions and prediction probabilities on .wav files I'm getting from the internet. I've been using converted YouTube audio clips (in the final form of .wav). For some reason, basic predictions aren't working on these files. I wanted to know if you could explain how one should properly test these new WAV files. For the time being, I've just been following Example 2 in the README, downloading the files, and putting them in their own folder within the emotion-recognition-using-speech one. Is there anything in the code I should modify to be able to use this program on wav files converted from YouTube?

References paper

Hi, this is good work, thank you for open-sourcing it!
Does this work have any reference papers?

Error while running the pretrained model: No such file or directory: 'train_custom.csv'

FileNotFoundError Traceback (most recent call last)
in ()
7 rec = EmotionRecognizer(model=my_model, emotions=['sad', 'neutral', 'happy'], balance=True, verbose=0)
8 # train the model
----> 9 rec.train()
10 # check the test accuracy for that model
11 print("Test score:", rec.test_score())

10 frames
/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in init(self, src, **kwds)
2008 kwds["usecols"] = self.usecols
2009
-> 2010 self._reader = parsers.TextReader(src, **kwds)
2011 self.unnamed_cols = self._reader.unnamed_cols
2012

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.cinit()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] No such file or directory: 'train_custom.csv'

I could not run the example in the readme

Hi, I git cloned the repo and installed the dependencies. Then I created a new Python script written as the README shows, but it did not print anything and showed no error messages. How can this be fixed?

Regarding set up of project

Can I get a tutorial on how to set up this project? I am using Jupyter Notebook; can anyone give me a tutorial on how to run this project?
