This is the source code of our TCSVT 2018 paper "Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification". Please cite the following paper if you use our code.
Yuxin Peng, Yunzhen Zhao, and Junchao Zhang, "Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification", IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), DOI: 10.1109/TCSVT.2018.2808685, 2018.
- The code for the spatial-temporal attention model is based on Caffe, and all the dependencies are the same as Caffe's. The provided Caffe code in `caffe-rc3-lstm/` is modified from the rc3 release.
- For the implementation of the LSTM layer in `caffe-rc3-lstm/`, please refer to Junhyuk Oh's implementation.
- The code for the static-motion collaborative model is based on Torch. The provided code in `CollaborativeLearning/` is modified from Jiasen Lu's implementation; all the dependencies are listed in Requirements.
- The proposed TCLSTA also uses the pre-trained ResNet-50 model with batch normalization, which can be downloaded from the Caffe model zoo. Download this model and put it in the `PretrainedModel/` folder.
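A minimal shell sketch of this step, assuming the file names of the commonly distributed ResNet-50 Caffe release (rename them to match your actual download):

```bash
# File names below are an assumption based on the usual ResNet-50 Caffe
# release; adjust them to whatever the model zoo download provides.
mkdir -p PretrainedModel
mv ResNet-50-model.caffemodel ResNet-50-deploy.prototxt PretrainedModel/
```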
Here we use the UCF101 dataset as an example. Download the UCF101 dataset, and put the extracted frames and optical-flow images in the `dataset/UCF101/UCF101_jpegs_256/` and `dataset/UCF101/UCF101_tvl1_flow/` folders, respectively.
It is recommended to use Christoph Feichtenhofer's toolkit to compute the optical flow.
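A hedged sketch of the directory layout assumed above; the per-video folder and frame file names are illustrative assumptions and depend on the extraction toolkit you use:

```bash
# Illustrative layout only (folder/file names are assumptions):
# dataset/UCF101/UCF101_jpegs_256/v_ApplyEyeMakeup_g01_c01/frame000001.jpg
# dataset/UCF101/UCF101_tvl1_flow/u/v_ApplyEyeMakeup_g01_c01/frame000001.jpg   # horizontal flow
# dataset/UCF101/UCF101_tvl1_flow/v/v_ApplyEyeMakeup_g01_c01/frame000001.jpg   # vertical flow
mkdir -p dataset/UCF101/UCF101_jpegs_256 dataset/UCF101/UCF101_tvl1_flow/{u,v}
```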
We show the training steps on UCF101 split01 with a single GPU as an example.
- The training of the spatial-temporal attention model.
  For stable convergence of the spatial-temporal attention model, we take the following training steps:
  - Train the `Connection network` and the `Spatial-level attention network` jointly to get the spatial attention model:
    `sh train_resnet50_sp01_spatial.sh`
  - Train the `Temporal-level attention network` based on the obtained spatial attention model, freezing the weights of the `Connection network` and the `Spatial-level attention network`. Sample 10 frames for each video (a hedged sampling sketch follows this list):
    `sh train_resnet50_sp01_spatial_temporal_frozen.sh`
  - Train the spatial-temporal attention model jointly, based on the model obtained in the last step:
    `sh train_resnet50_sp01_spatial_temporal.sh`

  Note that the spatial-temporal attention model on optical flow can be obtained by similar training steps.
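A minimal sketch of the 10-frame sampling mentioned in the second step. The per-video folder layout and the output folder `sampled_frames/` are assumptions for illustration, not paths shipped with this repository:

```bash
# Uniformly sample 10 frames from each video's frame folder.
# Assumes one sub-directory per video with lexicographically ordered frame
# file names; sampled_frames/ is a hypothetical output location.
for video in dataset/UCF101/UCF101_jpegs_256/*/; do
  name=$(basename "$video")
  mkdir -p "sampled_frames/$name"
  total=$(ls "$video" | wc -l)
  step=$(( total / 10 )); [ "$step" -lt 1 ] && step=1
  ls "$video" | awk -v s="$step" '(NR - 1) % s == 0' | head -n 10 | \
    while read -r f; do cp "$video$f" "sampled_frames/$name/"; done
done
```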
- The training of the static-motion collaborative model.
  First sample 25 frames and optical-flow images for each video, then extract the frame and optical-flow features using the trained spatial-temporal attention models. Then execute the following commands to train the static-motion collaborative model:
  `cd CollaborativeLearning/`
  `th trainTest.lua`
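Putting this step together, a hedged outline of the pipeline; `extract_features.sh` is a hypothetical placeholder for whatever feature-extraction script you build around the trained spatial-temporal attention models, not a script provided by this repository:

```bash
# Hypothetical pipeline outline; extract_features.sh is a placeholder.
sh extract_features.sh     # dump 25-frame RGB and optical-flow features
cd CollaborativeLearning/
th trainTest.lua           # train and test the static-motion collaborative model
```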
Welcome to our Laboratory Homepage for more information about our papers, source code, and datasets.