ssl_ser

Speech Emotion Recognition based on pre-trained models using Self-Supervised Learning

This project is part of a master's thesis for KTH: course DA231X - Degree Project in Computer Science and Engineering, Second Cycle 30hp

Heavily influenced by Hugging Face's ASR scripts for wav2vec2 (link here).

Sister directory for handling the sound files - SER_dataloader

Get started

Prerequisites

To run the project, Docker needs to be installed and the data needs to be placed in your local repository. The data is expected to be stored in ser_scripts/datasets/{dataset_name}.

Step-by-step

Enter the following commands into the command line to set up and run the scripts:

  1. docker build -t ser_env .
  2. docker run -it --gpus device=0 -v $(pwd)/ser_scripts:/workspace/ser_scripts:ro ser_env
    • To run a custom script instead of the regular run_training.sh as defined in the Dockerfile:
    • docker run -it --gpus device=0 -v $(pwd)/ser_scripts:/workspace/ser_scripts:ro ser_env bash ser_scripts/run_xval.sh

Change any parameters in run_x.sh to customize the run. The ser_scripts directory is mounted as a volume in the container, so any changes made locally will be reflected in the container as well.

Files

There are 2 kinds of files, build-files and actual scripts. Build-files can be used to construct the environment for running the scripts.

Build-files

  • Dockerfile - Creates a basic container for running the different SER-experiments. The container runs run_training.sh by default
  • requirements.txt - Lists the required packages for the environment.
    • One of the dependencies is the sister-repo of this project, which is made for constructing dataframes for the different datasets. It can be found here

Scripts

There are helper and run/test-scripts. Helper-scripts contain utility functions and structures which the run/test-scripts use as building blocks to define the different experiments.

Helpers

  • arguments.py - Contains dataclasses for parsing arguments to the main run-files. This allows for ease of use for running experiments from the command line
    • ModelArguments - Where to load/save model, freeze sections of model, ...
    • DataArguments - Where to load data from, sound-quality, ...
    • DatasetArguments - Train/test/validation splits, k-fold, ...
  • data_partitioner.py - Partition the data into train/test/validation or k-fold splits
  • model.py - Structs for different SSL-models and classification-heads
  • utils.py - Utility scripts for plotting, graphs, ...
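As a sketch of how arguments.py might define these dataclasses (the field names below are illustrative assumptions, not the repo's actual parameters; the real script parses them from the command line in the style of Hugging Face's argument-parsing utilities):

```python
from dataclasses import dataclass, field

@dataclass
class ModelArguments:
    # Illustrative fields only; the real arguments.py may differ.
    model_name_or_path: str = field(
        default="facebook/wav2vec2-base",
        metadata={"help": "Pre-trained SSL model to load"},
    )
    freeze_feature_extractor: bool = field(
        default=True,
        metadata={"help": "Freeze the convolutional feature extractor"},
    )

@dataclass
class DatasetArguments:
    # Hypothetical split settings mirroring the description above.
    train_split: float = field(default=0.8)
    use_preset_split: bool = field(default=False)
    num_folds: int = field(default=5, metadata={"help": "k for k-fold"})
```

Dataclasses like these let each run-script expose its knobs as plain command-line flags without hand-written parsing code.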

Run/test-Scripts

Run-scripts have 2 parts: a .sh file meant to serve as a hub for setting the hyper-parameters of the run, and a Python file that constructs the logic of the experiment. To run these, enter bash run_x.sh (or bash ser_scripts/run_x.sh if in a container) in the command line. Test-scripts are not meant for any specific experiment, but more for trying things out.
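A minimal sketch of what such a .sh hub might look like (the variable and flag names here are assumptions for illustration; check arguments.py for the real parameters):

```shell
#!/usr/bin/env bash
# Hypothetical hyper-parameter hub in the style of run_x.sh;
# flag names are illustrative, not the repo's actual ones.
MODEL="facebook/wav2vec2-base"
LEARNING_RATE="3e-5"
BATCH_SIZE=8

CMD=(python run_training.py
  --model_name_or_path "$MODEL"
  --learning_rate "$LEARNING_RATE"
  --per_device_train_batch_size "$BATCH_SIZE")

echo "${CMD[@]}"   # swap echo for "${CMD[@]}" to actually launch the run
```

Keeping the hyper-parameters in one shell file means an experiment can be changed and re-run without touching the Python logic.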

Run:

  • run_training - Basic script which the rest are built from
  • run_baselines - Create mono-lingual baselines for each dataset
  • run_phone_test - Test the difference of applying the phone-filter
  • run_xval - Run cross-validation

Test:

  • test_trained_model - Run an old model created by one of the run-scripts on custom data
  • test_inference_time - Test how long different parts of the inference process take
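The kind of stage-by-stage measurement test_inference_time performs can be sketched with a small timing helper (a generic illustration, not the script's actual code):

```python
import time

def timed(stage_fn, *args):
    """Run one stage of an inference pipeline and return (result, seconds)."""
    start = time.perf_counter()
    result = stage_fn(*args)
    return result, time.perf_counter() - start

# Example: timing two hypothetical stages back to back
features, t_feat = timed(lambda wav: [x * 0.5 for x in wav], [1.0, 2.0, 3.0])
total, t_sum = timed(sum, features)
```

Wrapping each stage separately makes it easy to see whether feature extraction or the model forward pass dominates the latency.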

Data

The data is expected to be stored in ser_scripts/datasets/{dataset_name} and should have the overall structure described below. Reading the data and transforming it into dataframes is done by SER_dataloader, which relies on the data being stored like this. train.csv, test.csv, and val.csv contain preset partitions of all the datasets and can be used for reproducibility (pass --use_preset_split true to the run-scripts to use these presets; otherwise the partitioning will be random).

.
├── ser_scripts
    ├── datasets
        ├── dataset_name
            ├── wav/wav                         # 2 layers of wav, kaggle datahandling is weird...
            |   ├── file_abc123.wav
            |   └── ...
            ├── wav_telephone/wav_telephone     # same as for wav, 2 layers cuz kaggle...
            |   ├── file_abc123.wav
            |   └── ...
            ├── train.csv                       # Preset train/test/validation-splits 
            ├── test.csv
            ├── val.csv
            └── meta_data.csv                   # Possibly extra csv:s containing some metadata used for constructing the dataframes, ex. actor_2_gender.csv
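When --use_preset_split is not set, the random partitioning could look roughly like the following sketch (an assumption about data_partitioner.py's behavior, with made-up ratios):

```python
import random

def partition(files, train_frac=0.8, test_frac=0.1, seed=42):
    # Hypothetical random train/test/val split; the real
    # data_partitioner.py may use different ratios and logic.
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])
```

Fixing the seed makes a "random" split repeatable across runs, which is why the preset CSVs are still the safer choice for strict reproducibility.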

The following datasets are implemented:

  • CREMA-D - cremad
  • ESD - esd-data (Changed license, no longer commercially viable)
  • Subesco - subesco
  • emoDB - emodb-data (License not specified)
  • MESD - mesddata
  • EMOVO - emovodata (License not specified)
  • Oréau - oreaudata
  • emoUERJ - emouerj

Contributors

  • felixlut
