Action Recognition

TODO: clean up all the files, rerun the training procedure, and add a simple picture showing the results

Overall Objective:

  • Try different models for action recognition using data from UCF-101

  • Compare the performance of the different models and analyze the experimental results

File Structure

./rnn_practice: Practice code for RNN models and LSTMs, based on online tutorials and other useful resources

./data: Training and testing data. (Do not add huge data files to this repo; list them in .gitignore instead.)

./models: Defining the architecture of models

./utils: Utility scripts for dataset preparation, input pre-processing, and other miscellaneous tasks

./train_CNN: For training our different CNN models. Load the corresponding model, set the training parameters, and then start training

./process_CNN: For the LRCN model, the CNN component is pre-trained and then kept fixed while the LSTM cells are trained. We can therefore run the CNN over the frames of each video once and store the intermediate results, to be fed into the LSTMs later. This can greatly improve the training efficiency of the LRCN model

./train_RNN: For training the LRCN model

./predict: For calculating the overall testing accuracy on the whole testing set
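
The caching idea behind ./process_CNN can be sketched roughly as follows. This is only an illustration, not the code in this repo: the function names, the .npy storage format, and the `extract_features` callable (a stand-in for the fixed CNN) are all assumptions.

```python
import os
import tempfile

import numpy as np


def cache_video_features(frames, extract_features, out_path):
    """Run a fixed feature extractor over one video's frames and save the result.

    frames: array of shape (num_frames, height, width, channels).
    extract_features: hypothetical callable standing in for the frozen CNN;
        it maps a batch of frames to an array of shape (num_frames, feature_dim).
    out_path: path of the .npy file to write.
    """
    features = np.asarray(extract_features(frames), dtype=np.float32)
    np.save(out_path, features)
    return features.shape


def load_cached_features(out_path):
    """Load features stored earlier, ready to be fed to the LSTM stage."""
    return np.load(out_path)


if __name__ == "__main__":
    # Stand-in extractor (assumption): global average pooling over height and width.
    def fake_cnn(batch):
        return batch.mean(axis=(1, 2))

    video = np.random.rand(16, 32, 32, 3)
    path = os.path.join(tempfile.mkdtemp(), "video_0001.npy")
    print(cache_video_features(video, fake_cnn, path))
    print(load_cached_features(path).shape)
```

Because the CNN weights do not change during LSTM training, each video only needs to pass through the CNN once; later epochs read the stored arrays instead of recomputing them.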

Models

  1. Fine-tuned ResNet50 trained solely on single-frame image data (every frame of every video is treated as a separate image for training or testing, which acts as a natural form of data augmentation). The ResNet50 is from the Keras repo, with weights pre-trained on ImageNet. ./models/finetuned_resnet.py

  2. LRCN (a CNN, here the fine-tuned ResNet50, followed by LSTMs), whose input is a sequence of frames uniformly extracted from each video. The fine-tuned ResNet is taken directly from model 1 without extra training. (Refer to Long-term recurrent convolutional network.) Produce the intermediate data using ./process_CNN.py, then train and predict with ./models/RNN.py

  3. Simple CNN model trained on stacked optical flow data (one stacked optical flow is generated from each video). ./models/temporal_CNN.py

  4. Two-stream model, which combines the models in 2 and 3 with an extra fusion layer that outputs the final result. Models 3 and 4 refer to this paper. ./models/two_stream.py
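
As a rough illustration of the uniform frame extraction mentioned for the LRCN model, the sketch below picks evenly spaced frame indices from a video. The function name, the `seq_length` parameter, and the handling of videos shorter than the requested length are assumptions, not necessarily what the repo's scripts do.

```python
def uniform_frame_indices(num_frames, seq_length):
    """Return up to seq_length evenly spaced frame indices in [0, num_frames).

    If the video has fewer frames than seq_length, every frame index is
    returned (an assumption; padding or repeating frames are alternatives).
    """
    if num_frames <= 0 or seq_length <= 0:
        raise ValueError("num_frames and seq_length must be positive")
    if num_frames <= seq_length:
        return list(range(num_frames))
    # Integer arithmetic avoids floating-point rounding surprises.
    return [(i * num_frames) // seq_length for i in range(seq_length)]


if __name__ == "__main__":
    print(uniform_frame_indices(10, 5))
    print(uniform_frame_indices(7, 3))
```

The selected indices can then be used to pick frames (or their stored CNN features) in temporal order before passing the sequence to the LSTMs.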
