

LSTM Continuous Turn-Taking Prediction

PyTorch implementation of two papers:

  1. Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs (ICMI '18)
  2. Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs (INTERSPEECH '18)

The supplied code is designed to reproduce the main results from [1], which show the utility of the multiscale approach. It can potentially be adapted to reproduce other results from both papers, and it can also be used to investigate other user-defined feature sets and architectures. I hope it is useful! Feel free to contact me if you find any errors or have any queries. Please note that it is still a work in progress.

The data preparation script takes roughly 4 hours on a modern computer with 4 cores. The script to reproduce the results takes several hours using a single GTX 1080.

Requirements:

  • Linux
  • PyTorch (version > 0.3.0)
  • Anaconda
  • nltk
  • SoX
  • openSMILE 2.3.0

Setup

Download the repository.

git clone https://github.com/mattroddy/lstm_turn_taking_prediction 

Download the HCRC Map Task Corpus audio data from http://groups.inf.ed.ac.uk/maptask/maptasknxt.html by running the wget.sh script obtained from the site. Run the script from within the lstm_turn_taking_prediction/data/ folder, then download and unpack the hcrcmaptask.nxtformatv2-1.zip archive in the same folder:

cd lstm_turn_taking_prediction/data
sh 'maptaskBuild-xxxxx.wget.sh'
wget http://groups.inf.ed.ac.uk/maptask/hcrcmaptask.nxtformatv2-1.zip
unzip hcrcmaptask.nxtformatv2-1.zip
rm hcrcmaptask.nxtformatv2-1.zip
cd ..

Split the audio channels:

sh scripts/split_channels.sh

Download openSMILE from https://audeering.com/technology/opensmile/#download and extract it into lstm_turn_taking_prediction/utils. Then replace the default config files with the modified ones. (Note: the config files have been modified to use a 50 ms step size, to disable smoothing, and to adopt the left-alignment convention.)

rm -r utils/opensmile-2.3.0/config
mv utils/config utils/opensmile-2.3.0/
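To make the 50 ms step size and left-alignment note concrete, here is a minimal sketch of how frame boundaries would line up. It assumes that "left alignment" means each frame is stamped with the time of its left edge (its start) and that a trailing partial frame is dropped. Both are assumptions made for illustration, and the function name frame_windows_ms is hypothetical, not part of this repository or of openSMILE.

```python
def frame_windows_ms(duration_ms, step_ms=50):
    """Return (start, end) times in ms for each full frame.

    Assumption: a frame is identified by its left edge (start time),
    and any trailing partial frame is dropped.
    """
    n_frames = duration_ms // step_ms  # integer ms avoids float rounding
    return [(i * step_ms, (i + 1) * step_ms) for i in range(n_frames)]
```

For example, under these assumptions a 200 ms clip yields four frames starting at 0, 50, 100, and 150 ms.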

Extract features and evaluation metrics:

python prepare_data.py
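Because data preparation runs for hours, it may be worth checking the setup before starting it. The following is a minimal sketch of such a check, not part of the repository: it only looks for the sox executable and for the data and utils/opensmile-2.3.0/config directories named in the steps above. The function name missing_prerequisites is hypothetical, and the check does not verify that the corpus files themselves are complete.

```python
import shutil
from pathlib import Path


def missing_prerequisites(repo_root="."):
    """Return a list of human-readable problems found in the setup."""
    root = Path(repo_root)
    problems = []
    # SoX is listed under Requirements and must be callable from the shell.
    if shutil.which("sox") is None:
        problems.append("sox executable not found on PATH")
    # Directories created by the setup steps in this README.
    for rel in ("data", "utils/opensmile-2.3.0/config"):
        if not (root / rel).is_dir():
            problems.append(f"missing directory: {rel}")
    return problems
```

Calling missing_prerequisites() from the repository root and printing the returned list gives a quick overview; an empty list means none of these particular checks failed.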

Running the code

At this point a model can be trained and tested by running:

python run_json.py 

To reproduce the main results in [1], set the path to your Python environment in the appropriate icmi_18_results file. Then run:

python icmi_18_results_no_subnets.py 
python icmi_18_results_two_subnets.py

This will reproduce Table 1 from [1]. It should take about a day on a modern computer with a GTX 1080 GPU. We reduce the number of trials from 5 to 3 to save time. The results can be viewed in the report_dict.json files within each respective directory.
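One way to collect the report files after a run is sketched below. It assumes only that each report_dict.json is valid JSON located somewhere under a chosen root directory; the internal structure of the reports is not documented here, so the sketch loads them without interpreting their contents. The function name load_reports is hypothetical.

```python
import json
from pathlib import Path


def load_reports(root="."):
    """Map each directory containing a report_dict.json to its parsed contents."""
    reports = {}
    # rglob searches the root directory and all of its subdirectories.
    for path in sorted(Path(root).rglob("report_dict.json")):
        with path.open() as f:
            reports[str(path.parent)] = json.load(f)
    return reports
```

Iterating over load_reports(root).items() and printing the top-level keys of each report is a reasonable first step for finding the relevant metrics.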
