lidaguo / tclsta_tcsvt2018

This project forked from pku-icst-mipl/tclsta_tcsvt2018

Source code of our TCSVT 2018 paper "Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification".

Lua 1.02% CMake 1.47% Makefile 0.33% HTML 0.10% CSS 0.13% Jupyter Notebook 49.35% C++ 39.65% Shell 0.29% Python 4.42% Cuda 2.79% MATLAB 0.45%

Introduction

This is the source code of our TCSVT 2018 paper "Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification". Please cite the following paper if you use our code.

Yuxin Peng, Yunzhen Zhao, and Junchao Zhang, "Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification", IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), DOI: 10.1109/TCSVT.2018.2808685, 2018.

Dependency

  • The code for the spatial-temporal attention model is based on Caffe, and its dependencies are the same as Caffe's. The provided Caffe code in caffe-rc3-lstm/ is modified from the rc3 version.

  • For the implementation of the LSTM layer in caffe-rc3-lstm/, please refer to Junhyuk Oh's implementation.

  • The code for the static-motion collaborative model is based on Torch. The provided code in CollaborativeLearning/ is modified from Jiasen Lu's implementation; all of its dependencies are listed in Requirements.

  • The proposed TCLSTA also uses the pre-trained ResNet-50 model with batch normalization, which is available from the Caffe model zoo. Download this model and put it in the PretrainedModel/ folder.
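
As a quick sanity check before training, the folders named in the list above can be verified with a small script. This is only a sketch: the function name `check_tclsta_layout` and the optional root argument are hypothetical and not part of the repository.

```shell
# Hypothetical helper: report which of the folders mentioned above are
# missing under a repository root (default: current directory).
# Returns non-zero if any folder is missing.
check_tclsta_layout() {
    root="${1:-.}"
    missing=0
    for d in caffe-rc3-lstm CollaborativeLearning PretrainedModel; do
        if [ -d "$root/$d" ]; then
            echo "ok       $d/"
        else
            echo "missing  $d/"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_tclsta_layout` from the repository root prints one line per folder, so a missing PretrainedModel/ folder is caught before any training script runs.
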

Data Preparation

Here we use the UCF101 dataset as an example. Download the UCF101 dataset, then put the extracted frames and the optical flow images in the dataset/UCF101/UCF101_jpegs_256/ and dataset/UCF101/UCF101_tvl1_flow/ folders, respectively.

We recommend using Christoph Feichtenhofer's toolkit to compute optical flow.
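
The folder layout described above can be created and checked with a few lines of shell. This is a minimal sketch; the helper names `prepare_ucf101_dirs` and `count_ucf101_entries` and the optional root argument are hypothetical.

```shell
# Hypothetical helper: create the two UCF101 folders named above
# under a given root directory (default: current directory).
prepare_ucf101_dirs() {
    root="${1:-.}"
    mkdir -p "$root/dataset/UCF101/UCF101_jpegs_256" \
             "$root/dataset/UCF101/UCF101_tvl1_flow"
}

# Hypothetical helper: print the number of entries in each folder, as a
# quick way to confirm that the extracted data landed in the right place.
count_ucf101_entries() {
    root="${1:-.}"
    for d in UCF101_jpegs_256 UCF101_tvl1_flow; do
        n=$(ls -A "$root/dataset/UCF101/$d" 2>/dev/null | wc -l | tr -d ' ')
        echo "$d: $n"
    done
}
```

Run `prepare_ucf101_dirs` once before copying the data, then `count_ucf101_entries` afterwards; a count of 0 for either folder means the frames or flow images were extracted elsewhere.
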

Usage

We show the training steps on UCF101 split01 with a single GPU as an example.

  1. Training the spatial-temporal attention model.
    For stable convergence of the spatial-temporal attention model, we train it in the following stages:
  • Train the Connection network and the Spatial-level attention network jointly to obtain the spatial attention model.

      sh train_resnet50_sp01_spatial.sh
    
  • Train the Temporal-level attention network on top of the obtained spatial attention model, with the weights of the Connection network and the Spatial-level attention network frozen.

      # Sample 10 frames for each video
      sh train_resnet50_sp01_spatial_temporal_frozen.sh
    
  • Train the whole spatial-temporal attention model jointly, starting from the model obtained in the previous step.

      sh train_resnet50_sp01_spatial_temporal.sh
    

    Note that the spatial-temporal attention model on optical flow can be obtained with similar training steps.

  2. Training the static-motion collaborative model.
    First sample 25 frames and optical flow images for each video, then extract frame and optical flow features using the trained spatial-temporal attention models. Then execute the following commands to train the static-motion collaborative model.

     cd CollaborativeLearning/
     th trainTest.lua
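
The three stages of step 1 depend on each other, so it can help to chain them and stop as soon as one fails. The wrapper below is a sketch under assumptions: the function name `train_spatial_temporal_attention` is hypothetical, and the README does not say which directory holds the three scripts, so that directory is passed as an argument.

```shell
# Hypothetical wrapper: run the three stage scripts from step 1 in order
# and stop at the first failure, since each stage starts from the model
# produced by the previous one.
# Assumption: the scripts are in the directory passed as $1 (default: .).
train_spatial_temporal_attention() {
    script_dir="${1:-.}"
    for script in \
        train_resnet50_sp01_spatial.sh \
        train_resnet50_sp01_spatial_temporal_frozen.sh \
        train_resnet50_sp01_spatial_temporal.sh
    do
        echo "== running $script"
        ( cd "$script_dir" && sh "$script" ) || {
            echo "stage failed: $script" >&2
            return 1
        }
    done
}
```

Each script runs in a subshell, so the caller's working directory is unchanged when the wrapper returns.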
    

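Both usage steps sample a fixed number of frames per video (10 in step 1, 25 in step 2). The README does not state how those frames are chosen, so the sketch below assumes evenly spaced sampling; the function name `sample_frame_indices` and the 1-based indexing are likewise assumptions.

```shell
# Hypothetical helper: print K evenly spaced 1-based frame indices for a
# video with N frames. Uniform spacing is an assumption; the README only
# states how many frames to sample (10 or 25), not how they are chosen.
sample_frame_indices() {
    n="$1"
    k="$2"
    i=0
    out=""
    while [ "$i" -lt "$k" ]; do
        # Centre of the i-th of k equal segments, using integer arithmetic.
        idx=$(( (2 * i + 1) * n / (2 * k) + 1 ))
        out="$out${out:+ }$idx"
        i=$((i + 1))
    done
    echo "$out"
}
```

For example, `sample_frame_indices 100 10` prints `6 16 26 36 46 56 66 76 86 96`. When a video has fewer frames than requested, some indices repeat.
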
Related Link

Welcome to our Laboratory Homepage for more information about our papers, source code, and datasets.
