Learning Latent Super-Events to Detect Multiple Activities in Videos

This repository contains the code for our CVPR 2018 paper:

AJ Piergiovanni and Michael S. Ryoo
"Learning Latent Super-Events to Detect Multiple Activities in Videos"
in CVPR 2018

If you find the code useful for your research, please cite our paper:

    @inproceedings{piergiovanni2018super,
          title={Learning Latent Super-Events to Detect Multiple Activities in Videos},
          booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
          author={AJ Piergiovanni and Michael S. Ryoo},
          year={2018}
    }

Temporal Structure Filters

[Figure: temporal structure filters (TSF)]

The core of our approach, the temporal structure filters (TSF), can be found in temporal_structure_filter.py. This file creates the TSF from N Cauchy distributions. The super-event model is created in super_event.py.
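To make the idea of a filter built from N Cauchy distributions concrete, here is a minimal sketch in plain Python. It is not the code in temporal_structure_filter.py: the function names, the per-filter normalization, and the fixed (non-learned) centers and widths are assumptions made for illustration only.

```python
import math

def cauchy_filters(centers, widths, length):
    """Return one normalized Cauchy-shaped weight vector per (center, width) pair.

    Assumption: centers and widths are given directly in frame units;
    the repository learns and constrains them, which is omitted here.
    """
    filters = []
    for x0, gamma in zip(centers, widths):
        # Cauchy density evaluated at every frame index t.
        raw = [1.0 / (math.pi * gamma * (1.0 + ((t - x0) / gamma) ** 2))
               for t in range(length)]
        total = sum(raw)
        # Normalize so the weights over the video sum to 1.
        filters.append([w / total for w in raw])
    return filters

def apply_filters(filters, features):
    """Pool per-frame feature vectors with each filter (weighted sum over time)."""
    dim = len(features[0])
    return [[sum(f[t] * features[t][d] for t in range(len(features)))
             for d in range(dim)]
            for f in filters]
```

Each filter places most of its weight on the frames near its center, so the pooled vector summarizes one temporal interval of the video. With a learned center and width per filter, that interval can move and stretch during training.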

Activity Detection Experiments

[Figure: model overview]

To run our pre-trained models:

    python train_model.py -mode joint -dataset multithumos -train False -rgb_model_file models/multithumos/rgb_baseline -flow_model_file models/multithumos/flow_baseline
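The command passes `-train False` as a string, which is easy to mishandle because `bool("False")` is `True` in Python. The sketch below shows one way such flags could be parsed with argparse. It is a hypothetical illustration, not the parser in train_model.py: the `str2bool` helper, the `build_parser` function, and the default value for `-train` are all assumptions.

```python
import argparse

def str2bool(value):
    """Turn 'True'/'False' style strings into real booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)

def build_parser():
    # Flag names are taken from the command above; everything else is assumed.
    parser = argparse.ArgumentParser(description="hypothetical flag parser sketch")
    parser.add_argument("-mode", type=str)
    parser.add_argument("-dataset", type=str)
    parser.add_argument("-train", type=str2bool, default=True)  # assumed default
    parser.add_argument("-rgb_model_file", type=str)
    parser.add_argument("-flow_model_file", type=str)
    return parser
```

Calling `build_parser().parse_args(["-train", "False"])` gives `args.train` the boolean `False`, so evaluation-only runs are not accidentally treated as training runs.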

We tested our models on the MultiTHUMOS, Charades, and AVA datasets (for AVA, only the temporal annotations were used). We provide our trained models in the models directory and the dataset annotations, converted to JSON format, in the data directory.
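Temporal annotations of this kind are usually turned into per-frame multi-label targets before training a detector. The sketch below shows that conversion in plain Python. The JSON layout (a `duration` in seconds and an `actions` list of `[class, start, end]` triples) is a hypothetical one chosen for illustration; the files in the data directory may use different keys, so check them before relying on this.

```python
import json

def frame_labels(annotation, num_frames, num_classes):
    """Build a num_frames x num_classes grid of 0/1 labels from time intervals."""
    duration = float(annotation["duration"])
    labels = [[0] * num_classes for _ in range(num_frames)]
    for cls, start, end in annotation["actions"]:
        for t in range(num_frames):
            # Timestamp of frame t, assuming frames are evenly spaced in time.
            time = t * duration / num_frames
            if start <= time <= end:
                labels[t][cls] = 1
    return labels

# Hypothetical annotation entry with two overlapping activities.
example = json.loads('{"duration": 10.0, "actions": [[0, 0.0, 4.0], [2, 3.0, 8.0]]}')
grid = frame_labels(example, num_frames=5, num_classes=3)
```

In this example the third sampled frame (at 4.0 seconds) is labeled with both class 0 and class 2, which is the multiple-activity setting the detector is trained for.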

Results

Charades localization (Charades_v1_localize.m evaluation):

| Method | mAP (%) |
| --- | --- |
| Two-Stream + LSTM [1] | 9.6 |
| Sigurdsson et al. [1] | 12.8 |
| I3D [2] baseline | 17.22 |
| I3D + LSTM | 18.1 |
| I3D + Super-events | 19.41 |

MultiTHUMOS:

| Method | mAP (%) |
| --- | --- |
| Two-Stream [3] | 27.6 |
| Two-Stream + LSTM [3] | 28.1 |
| Multi-LSTM [3] | 29.6 |
| I3D [2] baseline | 29.7 |
| I3D + LSTM | 29.9 |
| I3D + Super-events | 36.4 |
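For readers unfamiliar with the metric, the following is a minimal sketch of per-frame mean average precision (mAP) in plain Python. It is not the Charades_v1_localize.m script and will not reproduce the numbers above exactly; the function names and the choice to score classes without positives as 0 are assumptions.

```python
def average_precision(scores, labels):
    """Average precision for one class from per-frame scores and 0/1 labels."""
    # Rank frames from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            # Precision at the rank of each positive frame.
            precision_sum += float(hits) / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(scores_per_class, labels_per_class):
    """Mean of the per-class average precisions, reported as a percentage."""
    aps = [average_precision(s, l)
           for s, l in zip(scores_per_class, labels_per_class)]
    return 100.0 * sum(aps) / len(aps)
```

A class whose positive frames all outrank its negative frames gets an average precision of 1.0, and every negative frame ranked above a positive one lowers it.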

Example Learned Super-events

Our models trained on MultiTHUMOS, which contains ~2500 continuous videos of 65 different activities, and on Charades, which contains ~10,000 continuous videos, learned various super-events. Here are some example learned super-events from our models.

For the block action, our model learned to focus on the pass/dribble before the shot and on the shot/dunk action.

[Figure: basketball]

Here are examples of the temporal interval focused on by the super-event for the 'block' action detection capturing dribbling:

Here are examples of the temporal interval focused on by the super-event for the 'block' action detection capturing blocking/dunking:

Requirements

Our code has been tested on Ubuntu 14.04 and 16.04 using Python 2.7 and PyTorch 0.3.1 (it will likely work with other versions) with a Titan X GPU.

Setup

  1. Download the code: git clone https://github.com/piergiaj/super-events-cvpr18.git

  2. Extract features from your dataset. See Pytorch-I3D for our code to extract I3D features.

  3. train_model.py contains the code to train and evaluate models.

References

[1] G. A. Sigurdsson, S. Divvala, A. Farhadi, and A. Gupta. Asynchronous temporal fields for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[2] J. Carreira and A. Zisserman. Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[3] S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and L. Fei-Fei. Every moment counts: Dense detailed labeling of actions in complex videos. International Journal of Computer Vision (IJCV), pages 1–15, 2015.
