
Spatial Temporal Graph Convolutional Networks (ST-GCN)

A graph convolutional network for skeleton-based action recognition.

Introduction

This repository holds the codebase, dataset and models for the paper

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. Sijie Yan, Yuanjun Xiong and Dahua Lin, AAAI 2018.

[arXiv Preprint]

Prerequisites

Our codebase is written in Python and has a few dependencies. The major Python libraries we use are

  • PyTorch
  • NumPy
  • Other Python libraries, which can be installed with pip install -r requirements.txt

Data Preparation

We experimented on two skeleton-based action recognition datasets: NTU RGB+D and Kinetics-skeleton.

NTU RGB+D

NTU RGB+D can be downloaded from their website. Only the 3D skeletons (5.8GB) modality is required for our experiments. After downloading, build the database for training and evaluation with:

python tools/ntu_gendata.py --data_path <path to nturgbd>

where <path to nturgbd> points to the 3D skeletons modality of the NTU RGB+D dataset you downloaded, for example data/NTU-RGB-D/nturgbd+d_skeletons.
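To sanity-check the generated database, the files can be loaded with NumPy and pickle. A minimal sketch, assuming the default output layout of tools/ntu_gendata.py (the file names, split directory and array shape here are assumptions and may differ in your version of the script):

import pickle
import numpy as np

# Assumed default output location for the cross-view split;
# adjust the paths if your version of ntu_gendata.py writes elsewhere.
data = np.load('data/NTU-RGB-D/xview/train_data.npy')
with open('data/NTU-RGB-D/xview/train_label.pkl', 'rb') as f:
    sample_names, labels = pickle.load(f)

# Expected layout: (samples, channels, frames, joints, bodies),
# e.g. (N, 3, 300, 25, 2) for 3D joints over 300 frames and up to 2 bodies.
print(data.shape, len(labels))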

Kinetics-skeleton

Kinetics is a video-based dataset for action recognition that provides only raw video clips without skeleton data. To obtain the joint locations, we first resized all videos to a resolution of 340x256 and converted the frame rate to 30 fps. Then, we extracted skeletons from each frame with OpenPose. The extracted skeleton data, which we call Kinetics-skeleton (7.5GB), can be downloaded directly from here.
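For reference, the resizing and frame-rate conversion described above can be reproduced with ffmpeg. A minimal sketch, not part of this codebase, assuming ffmpeg is installed and on the PATH:

import subprocess

def preprocess_video(src, dst):
    # Resize to 340x256 and resample to 30 fps, matching the
    # preprocessing described above; -y overwrites dst if it exists.
    subprocess.run([
        'ffmpeg', '-y', '-i', src,
        '-vf', 'scale=340:256',
        '-r', '30',
        dst,
    ], check=True)

preprocess_video('input.mp4', 'resized.mp4')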

For efficiency, it is highly recommended to store the data on an SSD rather than an HDD.

Testing Pretrained Models

Get trained models

We provide the pretrained model weights of our ST-GCN and the baseline model Temporal-Conv [1]. The model weights can be downloaded by running the script

bash tools/get_models.sh

The downloaded models will be stored under ./model.

Evaluation

Once the datasets and models are ready, we can start the evaluation.

To evaluate the ST-GCN model pretrained on Kinetics-skeleton, run

python main.py --config config/st_gcn/kinetics-skeleton/test.yaml

For cross-view evaluation in NTU RGB+D, run

python main.py --config config/st_gcn/nturgbd-cross-view/test.yaml

For cross-subject evaluation in NTU RGB+D, run

python main.py --config config/st_gcn/nturgbd-cross-subject/test.yaml

Similarly, the configuration files for testing the baseline models can be found under ./config/baseline.

To speed up evaluation with multi-GPU inference, or to adjust the batch size to reduce memory cost, set --test-batch-size and --device like this:

python main.py --config <config file> --test-batch-size <batch size> --device <gpu0> <gpu1> ...
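For example, to evaluate the Kinetics-skeleton model on four GPUs with a test batch size of 64 (the GPU ids and batch size here are illustrative values):

python main.py --config config/st_gcn/kinetics-skeleton/test.yaml --test-batch-size 64 --device 0 1 2 3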

Results

The expected Top-1 accuracies of the provided models are shown below:

Model           Kinetics-skeleton (%)   NTU RGB+D Cross View (%)   NTU RGB+D Cross Subject (%)
Baseline [1]    20.3                    83.1                       74.3
ST-GCN (Ours)   30.6                    88.9                       80.7

[1] Kim, T. S., and Reiter, A. 2017. Interpretable 3D human action analysis with temporal convolutional networks. In BNMW CVPRW.

Training

To train a new ST-GCN model, run

python main.py --config config/st_gcn/<dataset>/train.yaml [--work-dir <work folder>]

where <dataset> must be nturgbd-cross-view, nturgbd-cross-subject or kinetics-skeleton, depending on the dataset you want to use. The training results, including model weights, configurations and log files, will be saved under ./work_dir by default, or under <work folder> if you specify one.

You can modify training parameters such as work-dir, batch-size, step, base_lr and device on the command line or in the configuration files. The order of priority is: command line > config file > default parameter. For more information, use main.py -h.
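For example, the following command overrides the batch size, base learning rate and GPU devices from the command line (the values here are illustrative), taking priority over whatever the config file sets:

python main.py --config config/st_gcn/kinetics-skeleton/train.yaml --batch-size 32 --base_lr 0.01 --device 0 1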

Finally, a custom model can be evaluated with the command mentioned above:

python main.py --config config/st_gcn/<dataset>/test.yaml --weights <path to model weights>
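For instance, to evaluate weights produced by a cross-view training run saved under the default work directory (the weights file name below is hypothetical; use whatever file your run produced):

python main.py --config config/st_gcn/nturgbd-cross-view/test.yaml --weights ./work_dir/epoch50_model.pt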

Citation

Please cite the following paper if you use this repository in your research.

@inproceedings{stgcn2018aaai,
  title     = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition},
  author    = {Sijie Yan and Yuanjun Xiong and Dahua Lin},
  booktitle = {AAAI},
  year      = {2018},
}

Contact

For any questions, feel free to contact

Sijie Yan     : [email protected]
Yuanjun Xiong : [email protected]
