
Temporal Convolutional Networks

This code implements the video- and sensor-based action segmentation models from Temporal Convolutional Networks for Action Segmentation and Detection by Colin Lea, Michael Flynn, Rene Vidal, Austin Reiter, and Greg Hager, arXiv 2016 (in review).

It was originally developed for use with the 50 Salads, GTEA, MERL Shopping, and JIGSAWS datasets. Recently we have also achieved high action segmentation performance on medical data, in robotics applications, and using accelerometer data from the UCI Smartphone dataset.

An abbreviated version of this work was described at the ECCV 2016 Workshop on BNMW.

Requirements: TensorFlow, Keras (1.1.2+)

Requirements (optional):

  • Numba: This makes the metrics much faster to compute but can be removed if necessary.
  • LCTM: Our older Conditional Random Field-based models.

Tested on Python 3.5. May work on Python 2.7 but is untested.

Contents (code folder)

  • TCN_main.py -- Main script for evaluation. I suggest working with this interactively in an IPython shell.
  • compare_predictions.py -- Script to output stats on each set of predictions.
  • datasets.py -- Adapters for processing specific datasets with a common interface.
  • metrics.py -- Functions for computing performance metrics. These usually take the form score(P, Y, bg_class), where P are the predictions, Y are the ground-truth labels, and bg_class is the background class (see the sketch after this list).
  • tf_models.py -- Models built with TensorFlow / Keras (a simplified architecture sketch also follows this list).
  • utils.py -- Utilities for manipulating data.
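For concreteness, here is a minimal sketch of a metric in the score(P, Y, bg_class) form described above: frame-wise accuracy that optionally ignores background frames. The function name and the exact masking behavior are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def accuracy(P, Y, bg_class=None):
    # Hypothetical score(P, Y, bg_class)-style metric: frame-wise accuracy
    # over one sequence, ignoring frames whose ground-truth label is the
    # background class when bg_class is given.
    P, Y = np.asarray(P), np.asarray(Y)
    mask = np.ones(len(Y), dtype=bool) if bg_class is None else (Y != bg_class)
    return 100.0 * np.mean(P[mask] == Y[mask])

print(accuracy([0, 1, 1, 2], [0, 1, 2, 2], bg_class=0))  # 66.67: 2 of 3 non-background frames match
```

And a simplified sketch of an encoder-decoder TCN of the kind the paper describes, written here against the modern tf.keras API rather than the Keras 1.x version this repo targets. The layer sizes, filter length, and plain ReLU activations (the paper uses a normalized ReLU) are assumptions for illustration.

```python
from tensorflow.keras import layers, models

def ed_tcn(n_feat, n_classes, n_nodes=(64, 96), conv_len=25):
    # Encoder: temporal convolution, then pooling halves the sequence length.
    inp = layers.Input(shape=(None, n_feat))
    x = inp
    for n in n_nodes:
        x = layers.Conv1D(n, conv_len, padding='same', activation='relu')(x)
        x = layers.MaxPooling1D(2)(x)
    # Decoder: upsampling, then temporal convolution restores the length.
    for n in reversed(n_nodes):
        x = layers.UpSampling1D(2)(x)
        x = layers.Conv1D(n, conv_len, padding='same', activation='relu')(x)
    # Per-frame softmax over the action classes.
    out = layers.TimeDistributed(layers.Dense(n_classes, activation='softmax'))(x)
    model = models.Model(inp, out)
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
    return model
```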

Data

Features for many of the datasets are linked below. The video features are the output of a Spatial CNN trained using image and motion information, as described in the paper. For features from the MERL Shopping dataset, contact Bharat Singh at UMD.

Each set of features should be placed in the features folder (e.g., [TCN_directory]/features/GTEA/SpatialCNN/).

Each .mat file contains three or four types of data:

  • 'Y' -- the ground-truth action labels for each sequence.
  • 'X' -- the per-frame probabilities output by the Spatial CNN applied to each frame of video.
  • 'A' -- the 128-dimensional intermediate fully connected layer from the Spatial CNN at each frame.
  • 'S' (if available) -- sensor data (accelerometer signals in 50 Salads, robot kinematics in JIGSAWS).
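A minimal loading sketch, assuming scipy is installed; the sequence file name below is hypothetical, so substitute one of the files you downloaded.

```python
from scipy.io import loadmat

# Hypothetical sequence file; use an actual file from the features folder.
data = loadmat('features/GTEA/SpatialCNN/S1_Cheese_C1.mat')

Y = data['Y']       # ground-truth action labels per frame
X = data['X']       # per-frame class probabilities from the Spatial CNN
A = data['A']       # 128-dim fully connected features per frame
S = data.get('S')   # sensor data; only present for some datasets
```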

A set of corresponding splits for each dataset is included in [TCN_directory]/splits/[dataset]. These should be easy to use with the dataset loader included here.

