EmoGator

The EmoGator dataset consists of 32,130 non-speech vocal bursts (for example: laughter, sighs, moans, and groans) in 30 categories.

This release includes the entire dataset; additional code will be added later. With the information provided here, the dataset can hopefully be put to use almost immediately.

Naming Convention

mp3 files are in data/mp3.

There were 357 contributors, 30 vocal burst categories, and 3 instances of each category, providing a balanced dataset with 90 samples for each contributor.

The files are named as follows:

NNNNNN-EE-I.mp3

Where:
NNNNNN - Contributor ID (000001-000357)
EE - Emotion Category (01-30)
I - Instance (1, 2, or 3)
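
As an illustration of the naming convention above, here is a minimal sketch that splits a file name into its three fields. The names `FILENAME_RE`, `Sample`, and `parse_name` are hypothetical helpers made up for this example; they are not part of the repository.

```python
import re
from typing import NamedTuple

# Matches NNNNNN-EE-I.mp3: six-digit contributor, two-digit category, instance 1-3.
FILENAME_RE = re.compile(r"^(\d{6})-(\d{2})-([123])\.mp3$")


class Sample(NamedTuple):
    contributor: int  # 1-357
    category: int     # 1-30, as written in the file name
    instance: int     # 1, 2, or 3


def parse_name(filename: str) -> Sample:
    """Parse a file name such as '000042-07-2.mp3' (hypothetical helper)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not an NNNNNN-EE-I.mp3 name: {filename!r}")
    contributor, category, instance = (int(group) for group in match.groups())
    if not 1 <= contributor <= 357:
        raise ValueError(f"contributor ID out of range: {contributor}")
    if not 1 <= category <= 30:
        raise ValueError(f"category out of range: {category}")
    return Sample(contributor, category, instance)


print(parse_name("000042-07-2.mp3"))
# Sample(contributor=42, category=7, instance=2)
```

The range checks simply mirror the ranges listed above; drop them if you prefer a purely syntactic parse.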

The MP3 files were recorded at different sample rates, usually near 44100 Hz, depending on each submitter's computer hardware.
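
Because the sample rate varies from file to file, it may be worth checking it before feeding audio to a model. The sketch below is one possible way to do that, assuming `ffprobe` (which ships with ffmpeg) is on your PATH; the function names `ffprobe_command` and `sample_rate` are hypothetical and not part of the repository.

```python
import subprocess
from typing import List


def ffprobe_command(path: str) -> List[str]:
    """Build an ffprobe call that prints only the first audio stream's sample rate."""
    return [
        "ffprobe",
        "-v", "error",                # suppress everything except errors
        "-select_streams", "a:0",     # first audio stream only
        "-show_entries", "stream=sample_rate",
        "-of", "default=noprint_wrappers=1:nokey=1",  # bare value, no key or wrapper
        path,
    ]


def sample_rate(path: str) -> int:
    """Return the sample rate in Hz; requires ffprobe to be installed."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


# Example call (needs the dataset and ffprobe to be present):
# sample_rate("data/mp3/000001-01-1.mp3")
```

If your pipeline expects a single rate, resampling on load is a common choice, but that step is outside the scope of this sketch.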

The 30 emotion categories are:

['Adoration', 'Amusement', 'Anger', 'Awe', 'Confusion', 'Contempt', 'Contentment', 'Desire', 'Disappointment', 'Disgust', 'Distress', 'Ecstasy', 'Elation', 'Embarrassment', 'Fear', 'Guilt', 'Interest', 'Neutral', 'Pain', 'Pride', 'Realization', 'Relief', 'Romantic Love', 'Sadness', 'Serenity', 'Shame', 'Surprise (Negative)', 'Surprise (Positive)', 'Sympathy', 'Triumph']

The file data/category_names.pt provides a Python list with these categories, saved via torch.save. Note that the samples are labeled 01-30, while the indices of the Python list run 0-29, so subtract 1 from a file's category number to index into the list. A simple:

import torch
category_names = torch.load('data/category_names.pt')

will bring them in.
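
To make the off-by-one between file labels (01-30) and list indices (0-29) concrete, here is a small sketch. It assumes the saved list has the same order as the list printed above, and it inlines that list so the example runs without the .pt file; `category_name` is a hypothetical helper, not part of the repository.

```python
# Copied from the category list above; in practice this would come from torch.load.
CATEGORY_NAMES = [
    'Adoration', 'Amusement', 'Anger', 'Awe', 'Confusion', 'Contempt',
    'Contentment', 'Desire', 'Disappointment', 'Disgust', 'Distress',
    'Ecstasy', 'Elation', 'Embarrassment', 'Fear', 'Guilt', 'Interest',
    'Neutral', 'Pain', 'Pride', 'Realization', 'Relief', 'Romantic Love',
    'Sadness', 'Serenity', 'Shame', 'Surprise (Negative)',
    'Surprise (Positive)', 'Sympathy', 'Triumph',
]


def category_name(label: str, names=CATEGORY_NAMES) -> str:
    """Map a two-digit file label ('01'-'30') to its category name."""
    index = int(label) - 1  # file labels are 1-based, the list is 0-based
    if not 0 <= index < len(names):
        raise ValueError(f"label out of range: {label!r}")
    return names[index]


print(category_name("01"))  # Adoration
print(category_name("30"))  # Triumph
```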

For now, I've included some utilities in code/utils.

duration is a bash script that pulls the duration (in seconds) for each mp3 file and displays it; it requires ffmpeg.

renumber.py was used to renumber the samples so they were sequential; since the data is already sequentially renumbered, you shouldn't need it, and running it won't change anything. (It's just there for completeness, really.)

sum_duration.py uses duration to list the duration of each mp3 file, along with a total at the end.

summary.py displays a list of submitter IDs and the number of samples for each; it's just a sanity check to make sure there are 90 samples from each contributor.
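
For readers who want a similar check without the repository scripts, the sketch below counts samples per contributor from a list of file names. It is not the author's summary.py, only an illustration of the same idea; `contributors_missing_samples` is a hypothetical name, and the file list here is synthetic so the example runs on its own.

```python
from collections import Counter


def contributors_missing_samples(filenames, expected=90):
    """Return {contributor_id: count} for contributors without `expected` samples."""
    counts = Counter(
        name.split("-")[0] for name in filenames if name.endswith(".mp3")
    )
    return {cid: n for cid, n in counts.items() if n != expected}


# Synthetic file list for two contributors: 30 categories x 3 instances each.
names = [
    f"{contributor:06d}-{category:02d}-{instance}.mp3"
    for contributor in (1, 2)
    for category in range(1, 31)
    for instance in (1, 2, 3)
]
names.remove("000002-30-3.mp3")  # simulate one missing sample

print(contributors_missing_samples(names))  # {'000002': 89}
```

Against the real dataset, you could pass `os.listdir("data/mp3")` as the file list; an empty result would mean every contributor has exactly 90 samples.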

More to come!

Paper: https://arxiv.org/abs/2301.00508