
vip's Introduction

A platform for quick and easy development of deep learning networks for recognition and detection in videos. Includes popular models like C3D and SSD.

Check out our wiki!

Implemented Models and their performance

Recognition

| Model Architecture | Dataset | ViP Accuracy (%) |
|---|---|---|
| I3D | HMDB51 (Split 1) | 72.75 |
| C3D | HMDB51 (Split 1) | 50.14 ± 0.777 |
| C3D | UCF101 (Split 1) | 80.40 ± 0.399 |

Object Detection

| Model Architecture | Dataset | ViP Accuracy (%) |
|---|---|---|
| SSD300 | VOC2007 | 76.58 |

Video Object Grounding

| Model Architecture | Dataset | ViP Accuracy (%) |
|---|---|---|
| DVSA (+fw, obj) | YC2-BB (Validation) | 30.09 |

fw: framewise weighting, obj: object interaction

Citation

Please cite ViP when releasing any work that used this platform: https://arxiv.org/abs/1910.02793

@article{ganesh2019vip,
  title={ViP: Video Platform for PyTorch},
  author={Ganesh, Madan Ravi and Hofesmann, Eric and Louis, Nathan and Corso, Jason},
  journal={arXiv preprint arXiv:1910.02793},
  year={2019}
}

Configured Datasets

| Dataset | Task(s) |
|---|---|
| HMDB51 | Activity Recognition |
| UCF101 | Activity Recognition |
| ImageNetVID | Video Object Detection |
| MSCOCO 2014 | Object Detection, Keypoints |
| VOC2007 | Object Detection, Classification |
| YC2-BB | Video Object Grounding |
| DHF1K | Video Saliency Prediction |

Models

| Model | Task(s) |
|---|---|
| C3D | Activity Recognition |
| I3D | Activity Recognition |
| SSD300 | Object Detection |
| DVSA (+fw, obj) | Video Object Grounding |

Requirements

  • Python 3.6
  • CUDA 9.0
  • (Suggested) Virtualenv

Installation

# Set up Python3 virtual environment
virtualenv -p python3.6 --no-site-packages vip
source vip/bin/activate

# Clone ViP repository
git clone https://github.com/MichiganCOG/ViP
cd ViP

# Install requirements and model weights
./install.sh

Quick Start

Run train.py and eval.py to train or test any implemented model. The parameters of every experiment are specified in its config.yaml file.

Use the --cfg_file command line argument to point to a different config yaml file. Additionally, all config parameters can be overridden with a command line argument.
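
The override behavior can be pictured roughly as follows. This is a minimal sketch, not ViP's actual parser: the function name merge_config and the keys lr and batch_size are illustrative assumptions, and a plain dict stands in for the loaded yaml file so the example runs on its own.

```python
import argparse

def merge_config(file_cfg, argv):
    """Return file_cfg with any command-line values layered on top.

    file_cfg stands in for the dict loaded from a config yaml file.
    Only arguments that were actually passed override the file values.
    """
    parser = argparse.ArgumentParser()
    for key, value in file_cfg.items():
        # A default of None lets us tell "not passed" apart from a real value.
        parser.add_argument(f"--{key}", type=type(value), default=None)
    cli = vars(parser.parse_args(argv))

    merged = dict(file_cfg)
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged

# Hypothetical config values, standing in for the contents of a yaml file.
file_cfg = {"lr": 0.001, "batch_size": 16}
merged = merge_config(file_cfg, ["--batch_size", "8"])
print(merged)
```

Here only batch_size was passed on the command line, so lr keeps the value from the file. A real parser would also need special handling for boolean and list-valued parameters, which this sketch leaves out.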

Testing

Run eval.py with the argument --cfg_file pointing to the desired model config yaml file.

Ex: From the root directory of ViP, evaluate the action recognition network C3D on HMDB51

python eval.py --cfg_file models/c3d/config_test.yaml

Training

Run train.py with the argument --cfg_file pointing to the desired model config yaml file.

Ex: From the root directory of ViP, train the action recognition network C3D on HMDB51

python train.py --cfg_file models/c3d/config_train.yaml

Additional examples can be found on our wiki.

Development

New models and datasets can be added without needing to rewrite any training, evaluation, or data loading code.

Add a Model

To add a new model:

  1. Create a new folder ViP/models/custom_model_name
  2. Create a model class in ViP/models/custom_model_name/custom_model_name.py
    • Complete __init__, forward, and (optional) __load_pretrained_weights functions
  3. Add PreprocessTrain and PreprocessEval classes within custom_model_name.py
  4. Create config_train.yaml and config_test.yaml files for the new model

Examples of previously implemented models can be found here.

Additional information can be found on our wiki.

Add a Dataset

To add a new dataset:

  1. Convert annotation data to our JSON format
    • The JSON skeleton templates can be found here
    • Existing scripts for datasets can be found here
  2. Create a dataset class in ViP/datasets/custom_dataset_name.py.
    • Inherit DetectionDataset or RecognitionDataset from ViP/abstract_dataset.py
    • Complete __init__ and __getitem__ functions
    • Example skeleton dataset can be found here

Additional information can be found on our wiki.

FAQ

A detailed FAQ can be found on our wiki.

vip's People

Contributors

ehofesmann, lemmersj, natlouis, zeonzir

vip's Issues

No such file or directory: '/z/dat/HMDB51/train.json'

I installed all dependencies with "install.sh" on Python 3.6. Whenever I try to train or evaluate the example model with "python <eval.py/train.py> --cfg_file models/c3d/config_test.yaml", I get the following Python error: "FileNotFoundError: [Errno 2] No such file or directory: '/z/dat/HMDB51/train.json'". Can someone help me?

Parser does not support scientific notation

Using scientific notation in the config file (e.g., lr: 1e-4) causes the config parser to read the value as a string. For the learning rate, this raises an error at line 96 of train.py (during the optimizer init), and it is likely to cause errors elsewhere for other parameters.
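
One possible workaround is to coerce numeric-looking strings after the config is loaded. The sketch below assumes the config arrives as a flat dict; the function name coerce_numeric_strings and the sample keys are hypothetical. For context, YAML 1.1 parsers such as PyYAML only recognize a float in scientific notation when it has a decimal point and a signed exponent, so writing 1.0e-4 in the file is another way around the problem.

```python
def coerce_numeric_strings(cfg):
    """Convert string values that look like numbers (e.g. '1e-4') to float.

    Caveat: float() also accepts strings such as 'inf' and 'nan', so a
    config that uses those as text labels would need a stricter check.
    """
    fixed = {}
    for key, value in cfg.items():
        if isinstance(value, str):
            try:
                value = float(value)
            except ValueError:
                pass  # genuinely non-numeric string, keep it as-is
        fixed[key] = value
    return fixed

# Hypothetical config as it might look after loading, with lr left as a string.
cfg = coerce_numeric_strings({"lr": "1e-4", "opt": "sgd", "momentum": 0.9})
print(cfg)
```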

End of epoch divides incorrectly

The logging loss sum appears to be divided by the expected number of samples (i.e., the full minibatch size), instead of the actual samples processed. This results in an abrupt reduction in magnitude of the logged loss when an epoch ends.
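
The effect can be reproduced with a few numbers. This is an illustrative sketch rather than the code in train.py: the function name mean_loss and the batch sizes are assumptions chosen to show a final, partially filled batch.

```python
def mean_loss(batch_losses, batch_sizes):
    """Average per-sample loss over the samples actually processed.

    batch_losses[i] is the mean loss of batch i; batch_sizes[i] is the
    number of samples that batch really contained.
    """
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# Three full batches of 4 and a final partial batch of 1, all with loss 2.0.
losses = [2.0, 2.0, 2.0, 2.0]
sizes = [4, 4, 4, 1]

correct = mean_loss(losses, sizes)

# Dividing by the nominal size (4 per batch) understates the result,
# which is the drop seen in the log at the end of an epoch.
naive = sum(l * n for l, n in zip(losses, sizes)) / (4 * len(sizes))
print(correct, naive)
```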

Change DataLoader Collate Function

Currently for datasets with bounding boxes, we need to specify the max bounding boxes possible so all output batches are of the same size:

self.max_objects = 38

What we should do is use a custom collate function in the DataLoader, as is done in the PyTorch detection tutorial:

https://github.com/pytorch/vision/blob/6c2cda6a0eda4c835f96f18bb2b3be5043d96ad2/references/detection/utils.py#L237

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
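
The collate function in the linked torchvision reference code keeps samples as tuples instead of stacking them into one tensor. A minimal sketch of that idea, using plain Python stand-ins for images and box lists so it runs without PyTorch:

```python
def collate_fn(batch):
    """Group a list of (image, target) samples into (images, targets) tuples.

    Nothing is stacked, so each target may hold a different number of boxes
    and no max_objects padding is needed.
    """
    return tuple(zip(*batch))

# Stand-in samples: strings for images, box lists of different lengths.
batch = [
    ("img0", [[0, 0, 10, 10]]),
    ("img1", [[5, 5, 20, 20], [1, 2, 3, 4]]),
]
images, targets = collate_fn(batch)
print(images, targets)
```

With PyTorch, the function would be passed as the collate_fn argument of torch.utils.data.DataLoader. The model and loss code then have to accept tuples of per-sample targets rather than a single padded tensor.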

Fails on zero grad

In instances where a neuron doesn't factor into the loss (e.g., a component of the loss is disabled for a specific experiment, resulting in a neuron or set of neurons being unused), autograd returns None for the unused connections. This results in a crash at the line:

param.grad *= 1./float(args['psuedo_batch_loop']*args['batch_size'])

With the error:

TypeError: unsupported operand type(s) for *=: 'NoneType' and 'float'

This can be remedied by inserting:
if param.grad is not None:
prior to the line in question, but I'm unsure of any upstream consequences.
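
The proposed guard can be sketched as below. To stay runnable without PyTorch, the parameters are simple stand-in objects with a grad attribute; in the real training loop they would come from model.parameters(), and the function name scale_grads is hypothetical.

```python
from types import SimpleNamespace

def scale_grads(params, scale):
    """Multiply each existing gradient by scale.

    Parameters whose grad is None (they did not contribute to the loss)
    are skipped instead of raising a TypeError.
    """
    for param in params:
        if param.grad is not None:
            param.grad *= scale

# Stand-ins for model parameters: one used by the loss, one unused.
used = SimpleNamespace(grad=8.0)
unused = SimpleNamespace(grad=None)

scale_grads([used, unused], 1.0 / 4)
print(used.grad, unused.grad)
```

Skipping a None gradient leaves that parameter untouched. Whether the optimizer in use treats a missing gradient the same way is worth checking separately.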

Redundant learning rate decay when resuming experiment

Relevant lines of code: https://github.com/MichiganCOG/ViP/blob/master/train.py#L114-L115

Loading saved weights using the pretrained argument also loads the last saved learning rate (after decaying per the config file). However, the learning rate is then decayed further by the lines above, because the scheduler "loops" through all of the epochs again.

Example: Suppose an experiment ended with a learning rate of 1e-6 after decaying twice from 1e-4. Resuming that experiment gives a starting learning rate of 1e-8.
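
The arithmetic in the example works out as follows, assuming each decay multiplies the rate by the same factor of 0.1 (inferred from 1e-4 reaching 1e-6 in two steps). The function name decayed_lr is hypothetical.

```python
def decayed_lr(base_lr, gamma, num_decays):
    """Learning rate after applying a multiplicative decay num_decays times."""
    return base_lr * gamma ** num_decays

base_lr, gamma = 1e-4, 0.1  # gamma = 0.1 is an assumed decay factor

# Rate stored in the checkpoint after two decays.
saved_lr = decayed_lr(base_lr, gamma, 2)

# On resume, the scheduler replays both decays on top of the saved rate.
resumed_lr = decayed_lr(saved_lr, gamma, 2)
print(saved_lr, resumed_lr)
```

The saved rate has already absorbed both decays, so replaying them applies the factor four times in total.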

Add logging class

Create a class object to be passed through every model, loss, and metric that has a method allowing you to add a plot to tensorboard for any specified variable.
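
One shape such a class could take is sketched here. All names (PlotLogger, add_plot, ListWriter) are hypothetical. The only external assumption is that the writer exposes add_scalar(tag, value, step), as torch.utils.tensorboard.SummaryWriter does. A recording stand-in is used so the sketch runs without tensorboard installed.

```python
class PlotLogger:
    """Thin wrapper that models, losses, and metrics can share to log scalars.

    writer is any object with an add_scalar(tag, value, step) method.
    """

    def __init__(self, writer):
        self.writer = writer

    def add_plot(self, name, value, step):
        self.writer.add_scalar(name, value, step)


class ListWriter:
    """Stand-in writer that records calls instead of writing event files."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, value, step):
        self.records.append((tag, value, step))


writer = ListWriter()
logger = PlotLogger(writer)
logger.add_plot("loss/train", 0.5, step=1)
print(writer.records)
```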

Seed Numpy in addition to Torch

Currently NumPy is unseeded, so any random function that relies on it is not repeatable. The expectation is that the seed is applied to both Torch and NumPy, so that experiments with the same seed produce identical results.
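
A minimal sketch of seeding every generator in one place. The function name seed_everything is hypothetical, and the NumPy and PyTorch calls are guarded so the example still runs where those libraries are not installed.

```python
import random

def seed_everything(seed):
    """Seed the Python, NumPy, and PyTorch RNGs when those libraries exist."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass

seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)
```

Seeding alone does not guarantee bit-identical GPU results, since some CUDA kernels are nondeterministic. That is a separate setting from the seed.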

Variable clip length

In certain cases the input to the network is not raw frames, but some computed features. All of the processed frames can be loaded at once, so it'd be useful to not specify the clip length and just read all available data per video.

This only works with batch_size = 1, but this is when the pseudo batch loop can come in handy.

About gen_json_UCF101

Could you please provide the gen_json_UCF101 file? I am using the UCF101 dataset for I3D training.

config error

When I run python eval.py --cfg_file models/c3d/config_test.yaml:

Traceback (most recent call last):
  File "eval.py", line 132, in <module>
    eval(**args)
  File "eval.py", line 62, in eval
    model = create_model_object(**args).to(device)
  File "/home/byronnar/pyprojects/cv/video_re/models/models_import.py", line 30, in create_model_object
    model = getattr(module, dir(module)[model_index])(**kwargs)
  File "/home/byronnar/pyprojects/cv/video_re/models/c3d/c3d.py", line 56, in __init__
    self.__load_pretrained_weights()
  File "/home/byronnar/pyprojects/cv/video_re/models/c3d/c3d.py", line 128, in __load_pretrained_weights
    p_dict = torch.load('weights/c3d-pretrained.pth')
  File "/home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/serialization.py", line 581, in _load
    deserialized_objects[key].set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 25948574 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 ASSERT FAILED at /pytorch/c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /pytorch/c10/util/intrusive_ptr.h:350)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f5186cc9441 in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f5186cc8d7a in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: THStorage_free + 0xca (0x7f510dab629a in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: + 0x149bbd (0x7f5187277bbd in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #21: __libc_start_main + 0xe7 (0x7f518be4ab97 in /lib/x86_64-linux-gnu/libc.so.6)

What should I do to solve this problem?
My setup:
OS 1804
CUDA 9.0
cuDNN 7.3.1
Python 3.6

Allow Unseeded Training

Sometimes it is necessary to produce networks seeded randomly (for showing robust performance, or for ensembling). It would be nice to be able to do this without changing the config at each launch, especially if there is a delay between sending the start command and actually launching the program.
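
One way this could work is to treat a missing seed as a request for a fresh one drawn at launch time. The sketch below assumes None as the "unseeded" marker, and the function name resolve_seed is hypothetical.

```python
import random

def resolve_seed(seed):
    """Return seed unchanged, or draw a fresh one when seed is None."""
    if seed is None:
        # SystemRandom reads from the OS entropy source, so two runs
        # started from the same config get different seeds.
        seed = random.SystemRandom().randrange(2**32)
    return seed

fixed = resolve_seed(123)   # an explicit seed is kept as-is
fresh = resolve_seed(None)  # drawn at launch time
print(fixed, fresh)
```

Logging the drawn value keeps a randomly seeded run reproducible after the fact. The range stays below 2**32 because np.random.seed does not accept larger integers.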

Allow user to ignore final shape argument

Currently train.py (and maybe eval.py?) checks that the final_shape argument matches the actual image returned from the dataloader (line 149). Some architectures are able to handle multiple input shapes. Providing a method of ignoring this assertion (perhaps by setting final_shape to -1) would be helpful in some cases.
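
The suggested opt-out could look roughly like this. It is a sketch of the idea, not the assertion in train.py: the function name check_final_shape is hypothetical, and -1 is the sentinel proposed in the request.

```python
def check_final_shape(final_shape, actual_shape):
    """Raise if the shapes differ, unless final_shape is the sentinel -1."""
    if final_shape == -1:
        return  # caller opted out of the check
    if tuple(final_shape) != tuple(actual_shape):
        raise ValueError(
            f"expected final_shape {tuple(final_shape)}, "
            f"got {tuple(actual_shape)}"
        )

check_final_shape(-1, (300, 300))          # check skipped
check_final_shape([112, 112], (112, 112))  # shapes agree, no error
```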

JSON documentation does not match implementation (frame_size)

The detection_template.json file indicates that each individual frame should populate the frame_size parameter. However, when the file is read "KeyError: 'frame_size'" is returned, unless the frame size is a parameter nested directly under video (on the same level as base_path).

My guess is that the behavior is correct (since videos can't dynamically change size), but the documentation is incorrect.
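
Until the template and the reader agree, a lookup that accepts either layout is one option. In this sketch only frame_size and base_path come from the report; the frames key, the sample values, and the function name get_frame_size are assumptions made for illustration.

```python
def get_frame_size(video):
    """Read frame_size at the video level, else from the first frame."""
    if "frame_size" in video:
        return video["frame_size"]
    return video["frames"][0]["frame_size"]

# Hypothetical entries showing the two layouts described above.
video_level = {
    "base_path": "/path/to/video",
    "frame_size": [640, 480],
    "frames": [{}],
}
frame_level = {
    "base_path": "/path/to/video",
    "frames": [{"frame_size": [640, 480]}],
}
print(get_frame_size(video_level), get_frame_size(frame_level))
```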

Support for multi-gpu training & evaluation

Add an option for DataParallel training in PyTorch.
It's pretty straightforward; the only issues arise when accessing the state_dict and functions belonging to the model (for multi-GPU training). It becomes model.module.state_dict instead of model.state_dict.
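
A small helper can hide the difference between the wrapped and unwrapped cases. The sketch below uses stand-in classes so it runs without PyTorch, and the name unwrap_model is hypothetical. It relies only on the fact that torch.nn.DataParallel keeps the original network in its module attribute.

```python
def unwrap_model(model):
    """Return the underlying model, whether or not it is wrapped."""
    return model.module if hasattr(model, "module") else model


class Net:
    """Stand-in for a network with a state_dict method."""

    def state_dict(self):
        return {"weight": 1.0}


class Wrapper:
    """Stand-in for torch.nn.DataParallel, which also exposes .module."""

    def __init__(self, module):
        self.module = module


plain = Net()
wrapped = Wrapper(plain)

# The same call works for both single-GPU and multi-GPU models.
print(unwrap_model(plain).state_dict(), unwrap_model(wrapped).state_dict())
```

The hasattr check would misfire on a network that happens to have its own submodule named module. An isinstance check against torch.nn.DataParallel is stricter when PyTorch is available.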
