micco00x / vision

Video classification in TensorFlow using Mask R-CNN. This project builds on https://github.com/matterport/Mask_RCNN. The dataset used to train Mask R-CNN was built with LabelBox. Video classification is done with an LSTM that classifies activities taken from a subset of the ActivityNet dataset (Gymnastics activities). This repository was used for the final project of the Vision and Perception module (Spring 2018) at Sapienza University of Rome.

Vision

Clone the repository:

git clone https://github.com/micco00x/Vision

Initialize submodules:

git submodule update --init

Create folders:

mkdir images
mkdir logs
mkdir weights

Generate the dataset:

python3 generate_dataset.py

Split the dataset into training and validation sets:

python3 split_data.py --dataset=dataset/trainval/dataset.json
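
As a rough illustration of what a train/val split involves, here is a minimal sketch. It is not the code of split_data.py: the record layout, the `split_records` helper, the 20% validation fraction and the seed are all assumptions made for the example.

```python
import json
import random


def split_records(records, val_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it into (train, val)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical records standing in for the entries of dataset.json.
records = [{"image": f"img_{i:03d}.jpg"} for i in range(10)]
train, val = split_records(records, val_fraction=0.2, seed=42)
print(len(train), len(val))  # 8 2
print(json.dumps(val[0]))    # one validation record, serialized as JSON
```

Fixing the seed keeps the two subsets stable across runs, which makes later evaluation results comparable.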

Train the model (not necessary for the next steps):

python3 activity.py train

Train the extended model, which includes COCO (there is no need to pass --download=True if the COCO dataset has already been downloaded):

python3 activity.py train --extended=True --download=True

Evaluate the last trained model on the extended dataset:

python3 activity.py evaluate --extended=True --model=last

Generate the dataset that will be used to train the LSTM (assuming the videos are in dataset/activitynet/Gymnastics/ and the frames will be saved in dataset/activitynet/Frames):

python3 LSTM/extractFrames.py --videofolder=dataset/activitynet/Gymnastics/ --framesfolder=dataset/activitynet/Frames
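
Frame extraction of this kind is commonly done with OpenCV. The sketch below shows one way to do it and is not taken from LSTM/extractFrames.py: the `extract_frames` and `frame_filename` helpers, the file naming scheme and the `every_n` parameter are hypothetical, and OpenCV (`cv2`) is assumed to be installed.

```python
import os


def frame_filename(video_path, index):
    """Build a frame file name such as 'clip_000012.jpg' (hypothetical scheme)."""
    stem = os.path.splitext(os.path.basename(video_path))[0]
    return f"{stem}_{index:06d}.jpg"


def extract_frames(video_path, frames_folder, every_n=1):
    """Save every `every_n`-th frame of a video as a JPEG; return the count saved."""
    import cv2  # OpenCV, imported lazily so the helper above works without it

    os.makedirs(frames_folder, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    saved = 0
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of video or unreadable file
            break
        if index % every_n == 0:
            target = os.path.join(frames_folder, frame_filename(video_path, index))
            cv2.imwrite(target, frame)
            saved += 1
        index += 1
    capture.release()
    return saved


print(frame_filename("dataset/activitynet/Gymnastics/clip.mp4", 12))  # clip_000012.jpg
```

Zero-padding the frame index keeps the files in temporal order when they are listed alphabetically, which matters when the frames are later fed to a sequence model.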

Split the video dataset into training and validation sets (assuming the frames are in dataset/activitynet/Frames):

python3 LSTM/splitDataset.py --framesfolder=dataset/activitynet/Frames
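
For video data it often helps to split label by label, so that every activity is represented in both subsets. The following is a sketch of that idea only and does not reproduce LSTM/splitDataset.py: the `stratified_split` helper, the video ids, the label names and the 25% held-out fraction are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.25, seed=0):
    """Split (video_id, label) pairs label by label into (train, test)."""
    by_label = defaultdict(list)
    for video_id, label in samples:
        by_label[label].append(video_id)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        videos = sorted(by_label[label])
        rng.shuffle(videos)
        # Hold out at least one video per label, unless the label has only one.
        n_test = max(1, int(len(videos) * test_fraction)) if len(videos) > 1 else 0
        test += [(v, label) for v in videos[:n_test]]
        train += [(v, label) for v in videos[n_test:]]
    return train, test


# Hypothetical video ids and labels.
samples = [(f"video_{i}", "vault" if i % 2 else "balance_beam") for i in range(8)]
train, test = stratified_split(samples, test_fraction=0.25, seed=1)
print(len(train), len(test))  # 6 2
```

Splitting by whole videos rather than by individual frames avoids near-identical frames of the same clip ending up on both sides of the split.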

Generate the .npz datasets that will later be used to train the LSTM:

python3 generate_npz.py --dataset=dataset/activitynet/Frames/train.txt --model=weights/mask_rcnn_coco_0080.h5
python3 generate_npz.py --dataset=dataset/activitynet/Frames/test.txt --model=weights/mask_rcnn_coco_0080.h5
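
An .npz file is a NumPy archive holding several named arrays. The sketch below shows how such an archive can be written and inspected. The array names (`features`, `labels`) and their shapes are assumptions for illustration; the actual contents written by generate_npz.py may differ.

```python
import io

import numpy as np

# Hypothetical arrays: 4 videos, 16 frames each, 32 features per frame.
features = np.zeros((4, 16, 32), dtype=np.float32)
labels = np.array([0, 1, 1, 2], dtype=np.int64)

# Write to an in-memory buffer instead of disk to keep the sketch side-effect free.
buffer = io.BytesIO()
np.savez_compressed(buffer, features=features, labels=labels)
buffer.seek(0)

# List every array stored in the archive together with its shape.
with np.load(buffer) as archive:
    shapes = {name: archive[name].shape for name in archive.files}

print(shapes)  # {'features': (4, 16, 32), 'labels': (4,)}
```

Passing a file path to `np.load` instead of the buffer is a quick way to check what a generated archive contains before starting a long training run.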

Train the LSTM video classifier, passing the .npz files generated in the previous step as datasets:

python3 train_videos.py --train=dataset/activitynet/Frames/train_masks.npz --test=dataset/activitynet/Frames/test_masks.npz

Create a confusion matrix to study the behaviour of the LSTM (be sure to use the same number of hidden layers for the LSTM as was used during training):

python3 eval_videos.py --dataset=dataset/activitynet/Frames/test_masks.npz --checkpoint=PATH_TO_CHECKPOINT
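
For reference, a confusion matrix counts how often each true class is predicted as each other class. This minimal sketch is independent of eval_videos.py; the helper names and the labels are made up for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Return matrix[i][j] = number of samples of class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical labels for six test videos over three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(y_true, y_pred, num_classes=3)
print(matrix)                    # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class_recall(matrix))  # [0.5, 1.0, 0.5]
```

Rows correspond to true classes and columns to predictions, so off-diagonal entries show which activities the classifier confuses with each other.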
