
View Adaptive Neural Networks (VA)

Introduction

For skeleton-based human action recognition, one of the key challenges is the large variation in observation viewpoints when capturing data. To alleviate the effects of view variations, we introduce a novel view adaptation scheme that automatically determines the virtual observation viewpoints in a learning-based, data-driven manner.

This repository holds the codes and methods for the following papers:

View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition. TPAMI, 2019, paper

View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data. ICCV, 2017, paper

Flowchart


Figure 1: Flowchart of the end-to-end view adaptive neural network. It consists of a main classification network and a view adaptation subnetwork. The view adaptation subnetwork automatically determines the virtual observation viewpoints and transforms the skeleton input into representations under the new viewpoints, which the main classification network then classifies. The entire network is trained end-to-end to optimize the classification performance.
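
To make the transform concrete, here is a minimal PyTorch-style sketch of what a view adaptation subnetwork computes. It is illustrative only: the module and variable names are ours, and the repository's implementation may differ. A recurrent layer regresses per-frame rotation angles and a translation d from the skeleton, and each joint v is re-expressed under the learned viewpoint as v' = R(v - d).

import torch
import torch.nn as nn

class ViewAdaptation(nn.Module):
    """Illustrative view adaptation subnetwork (names are ours, not the repo's).

    Input: skeletons of shape (batch, frames, joints, 3). An LSTM regresses
    per-frame rotation angles and a translation; each joint is re-expressed
    under the learned observation viewpoint as v' = R (v - d).
    """
    def __init__(self, num_joints, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(num_joints * 3, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 6)  # 3 rotation angles + 3 translation

    def forward(self, x):
        b, t, j, _ = x.shape
        h, _ = self.lstm(x.reshape(b, t, j * 3))
        params = self.fc(h)                        # (b, t, 6)
        angles, d = params[..., :3], params[..., 3:]
        R = self._rotation(angles)                 # (b, t, 3, 3)
        x = x - d.unsqueeze(2)                     # translate every joint
        return torch.einsum('btij,btkj->btki', R, x)  # then rotate

    @staticmethod
    def _rotation(angles):
        # Compose per-frame rotations about the x, y and z axes.
        a, b, g = angles.unbind(-1)
        ca, sa, cb, sb = a.cos(), a.sin(), b.cos(), b.sin()
        cg, sg = g.cos(), g.sin()
        one, zero = torch.ones_like(a), torch.zeros_like(a)
        Rx = torch.stack([one, zero, zero, zero, ca, -sa, zero, sa, ca], -1).reshape(*a.shape, 3, 3)
        Ry = torch.stack([cb, zero, sb, zero, one, zero, -sb, zero, cb], -1).reshape(*a.shape, 3, 3)
        Rz = torch.stack([cg, -sg, zero, sg, cg, zero, zero, zero, one], -1).reshape(*a.shape, 3, 3)
        return Rz @ Ry @ Rx

The transformed skeletons then feed the main classification network, and the whole stack is trained end-to-end against the classification loss, as Figure 1 depicts.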

Framework


Figure 2: Architecture of the proposed view adaptive neural networks: a view adaptive RNN with LSTM (VA-RNN) and a view adaptive CNN (VA-CNN). The classification scores of the two networks can be fused to give the final prediction, denoted the VA-fusion scheme. Note that depending on application requirements, the VA-RNN or the VA-CNN can be used alone or combined.
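
The VA-fusion scheme is a late fusion of the two classifiers' scores. Below is a minimal sketch of one common way to realize it, a weighted average of softmax scores; the function name and the default equal weighting are our own choices, not the repository's:

import torch
import torch.nn.functional as F

def va_fusion(logits_rnn, logits_cnn, w=0.5):
    """Fuse VA-RNN and VA-CNN classification scores (illustrative sketch).

    logits_*: (batch, num_classes) raw scores from each stream.
    w: weight of the RNN stream; the CNN stream gets 1 - w.
    """
    p = w * F.softmax(logits_rnn, dim=1) + (1 - w) * F.softmax(logits_cnn, dim=1)
    return p.argmax(dim=1)  # fused class prediction per sample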

Visualization of the Learned Views


Figure 3: Frames of (a) a similar posture captured from different viewpoints for the same subject, and (b) the same action “drinking” captured from different viewpoints for different subjects. 2nd row: original skeletons. 3rd row: skeleton representations from the observation viewpoints of our VA-RNN model. 4th row: skeleton representations from the observation viewpoints of our VA-CNN model.

Prerequisites

The code is built with the following libraries:

Data Preparation

We first need to download the NTU RGB+D dataset.

  • Extract the dataset to ./data/ntu/nturgb+d_skeletons/
  • Process the data
 cd ./data/ntu
 # Get the skeleton of each performer
 python get_raw_skes_data.py
 # Remove bad skeletons
 python get_raw_denoised_data.py
 # Translate each skeleton to the center of the first frame (see the sketch below)
 python seq_transformation.py
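
For intuition, the last step re-expresses each sequence relative to a body-center position in its first frame, so recordings captured at different locations become comparable. Here is a minimal NumPy sketch under that assumption; the choice of center joint is illustrative, and seq_transformation.py defines the actual preprocessing:

import numpy as np

def center_to_first_frame(skeleton, center_joint=0):
    """Translate a sequence so the first frame's center joint sits at the origin.

    skeleton: array of shape (frames, joints, 3)
    center_joint: index of the joint treated as the body center (illustrative)
    """
    origin = skeleton[0, center_joint]  # (3,) coordinates in the first frame
    return skeleton - origin            # broadcasts over frames and joints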

Training

# For CNN-based model with view adaptation module
python va-cnn.py --model VA --aug 1 --train 1

# For CNN-based model without view adaptation module
python va-cnn.py --model baseline --aug 1 --train 1

# For RNN-based model with view adaptation module
python va-rnn.py --model VA --aug 1 --train 1

# For RNN-based model without view adaptation module
python va-rnn.py --model baseline --aug 1 --train 1
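
In these commands, --model selects between the view adaptive network (VA) and the baseline without the view adaptation module, while --train 1 runs training and --train 0 (used in the Testing commands below) evaluates a trained model. The --aug 1 flag appears to enable data augmentation; consult the argument definitions in va-cnn.py and va-rnn.py for the authoritative meanings.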

Testing

# For CNN-based model with view adaptation module
python va-cnn.py --model VA --aug 1 --train 0

# For CNN-based model without view adaptation module
python va-cnn.py --model baseline --aug 1 --train 0

# For RNN-based model with view adaptation module
python va-rnn.py --model VA --aug 1 --train 0

# For RNN-based model without view adaptation module
python va-rnn.py --model baseline --aug 1 --train 0

Reference

If you find our papers and this repository useful, please cite our papers. Thanks!

@article{zhang2019view,
  title={View adaptive neural networks for high performance skeleton-based human action recognition},
  author={Zhang, Pengfei and Lan, Cuiling and Xing, Junliang and Zeng, Wenjun and Xue, Jianru and Zheng, Nanning},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2019}
}

@inproceedings{zhang2017view,
  title={View adaptive recurrent neural networks for high performance human action recognition from skeleton data},
  author={Zhang, Pengfei and Lan, Cuiling and Xing, Junliang and Zeng, Wenjun and Xue, Jianru and Zheng, Nanning},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={2117--2126},
  year={2017}
}

Microsoft Open Source Code of Conduct: https://opensource.microsoft.com/codeofconduct
