michigancog / vip

Video Platform for Action Recognition and Object Detection in Pytorch

License: MIT License

Python 99.58% Shell 0.42%

pytorch action-recognition object-detection c3d neural-networks deep-learning ssd imagenetvid mscoco resnet

vip's Issues

JSON documentation does not match implementation (frame_size)

The detection_template.json file indicates that each individual frame should populate the frame_size parameter. However, reading the file raises KeyError: 'frame_size' unless frame_size is nested directly under video (on the same level as base_path).

My guess is that the behavior is correct (since videos can't dynamically change size), but the documentation is incorrect.

config error

When I run python eval.py --cfg_file models/c3d/config_test.yaml:

Traceback (most recent call last):
File "eval.py", line 132, in <module>
eval(**args)
File "eval.py", line 62, in eval
model = create_model_object(**args).to(device)
File "/home/byronnar/pyprojects/cv/video_re/models/models_import.py", line 30, in create_model_object
model = getattr(module, dir(module)[model_index])(**kwargs)
File "/home/byronnar/pyprojects/cv/video_re/models/c3d/c3d.py", line 56, in __init__
self.__load_pretrained_weights()
File "/home/byronnar/pyprojects/cv/video_re/models/c3d/c3d.py", line 128, in __load_pretrained_weights
p_dict = torch.load('weights/c3d-pretrained.pth')
File "/home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/serialization.py", line 581, in _load
deserialized_objects[key].set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 25948574 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount.load() > 0 ASSERT FAILED at /pytorch/c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /pytorch/c10/util/intrusive_ptr.h:350)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f5186cc9441 in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f5186cc8d7a in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: THStorage_free + 0xca (0x7f510dab629a in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x149bbd (0x7f5187277bbd in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #21: __libc_start_main + 0xe7 (0x7f518be4ab97 in /lib/x86_64-linux-gnu/libc.so.6)

What should I do to solve this problem?
my devices:
os 1804
cuda 9.0
cudnn 7.3.1
python 3.6

Remove feature extraction code from c3d

Remove self.features = kwargs['model_features'] in c3d.py.
Also remove features from c3d config.

About gen_json_UCF101

Could you please provide the gen_json_UCF101 file? Cuz I am using the UCF101 dataset for I3D training.

End of epoch divides incorrectly

The logging loss sum appears to be divided by the expected number of samples (i.e., the full minibatch size), instead of the actual samples processed. This results in an abrupt reduction in magnitude of the logged loss when an epoch ends.
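
A minimal sketch of the arithmetic behind this report, not ViP's logging code. The function names, the batch size of 8, and the loss values are all hypothetical; the point is only that a short final batch looks artificially small when divided by the configured batch size.

```python
def mean_loss(per_sample_losses):
    """Average over the samples that were actually processed."""
    if not per_sample_losses:
        raise ValueError("cannot average an empty batch")
    return sum(per_sample_losses) / len(per_sample_losses)


def mean_loss_fixed_divisor(per_sample_losses, batch_size):
    """Reported behaviour: always divide by the configured batch size."""
    return sum(per_sample_losses) / batch_size


# Hypothetical final batch of an epoch: 2 samples left, configured batch size 8.
last_batch = [0.5, 0.5]
correct = mean_loss(last_batch)                     # divides by 2
deflated = mean_loss_fixed_divisor(last_batch, 8)   # divides by 8
```

Here the per-sample loss has not changed, yet the fixed divisor reports a value four times smaller, which matches the abrupt drop described at the end of an epoch.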

Redundant learning rate decay when resuming experiment

Relevant lines of code: https://github.com/MichiganCOG/ViP/blob/master/train.py#L114-L115

Loading saved weights using the pretrained argument also loads the last saved learning rate (after decaying per the config file). However, the learning rate is decayed further by the lines above, because the scheduler "loops" through all of the epochs again.

Example: if an experiment ended with a learning rate of 1e-6 after decaying twice from 1e-4, resuming that experiment gives a starting learning rate of 1e-8.
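
The numbers in this example can be reproduced with a small pure-Python model of a multi-step schedule. This is a sketch under assumptions: the milestones `[10, 20]`, the decay factor `0.1`, and the resume epoch `30` are hypothetical values chosen to match "decaying twice from 1e-4", and `multistep_lr` is an illustrative helper, not a function from train.py.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate after all milestones up to `epoch` have been applied."""
    num_decays = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** num_decays


# Hypothetical schedule: two milestones, each multiplying the rate by 0.1.
milestones, gamma, resume_epoch = [10, 20], 0.1, 30

# Rate stored in the checkpoint at the end of the first run (about 1e-6).
saved_lr = multistep_lr(1e-4, milestones, gamma, resume_epoch)

# Replaying the schedule on top of the saved rate decays it again (about 1e-8).
double_decayed = multistep_lr(saved_lr, milestones, gamma, resume_epoch)

# Replaying the schedule from the configured base rate gives the saved rate back.
recomputed = multistep_lr(1e-4, milestones, gamma, resume_epoch)
```

Two fixes seem plausible: restore the saved rate and skip the replay, or replay the schedule from the configured base rate. Which one suits ViP depends on what the checkpoint stores.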

Fails on zero grad

In instances where a neuron doesn't factor into the loss (e.g., a component of the loss is disabled for a specific experiment, resulting in a neuron or set of neurons being unused), autograd returns None for the unused connections. This results in a crash at the line:

param.grad *= 1./float(args['psuedo_batch_loop']*args['batch_size'])

With the error:

TypeError: unsupported operand type(s) for *=: 'NoneType' and 'float'

This can be remedied by inserting:
if param.grad is not None:
prior to the line in question, but I'm unsure of any upstream consequences.
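
A sketch of that guard as a standalone helper. The helper name `scale_gradients` is hypothetical, and `SimpleNamespace` objects stand in for parameters so the snippet runs without PyTorch; the function only relies on each parameter having a `.grad` attribute, which is also true of `torch.nn.Parameter`.

```python
from types import SimpleNamespace


def scale_gradients(params, scale):
    """Multiply every available gradient by `scale`; skip parameters without one.

    Returns the number of parameters that were skipped.
    """
    skipped = 0
    for param in params:
        if param.grad is None:  # unused or frozen parameter: nothing to scale
            skipped += 1
            continue
        param.grad *= scale
    return skipped


# Stand-ins for parameters: one with a gradient, one that autograd left as None.
params = [SimpleNamespace(grad=4.0), SimpleNamespace(grad=None)]
skipped = scale_gradients(params, 1.0 / float(2 * 2))
```

Returning the skipped count makes it easy to log how many parameters received no gradient, which may help spot a layer that is unused by mistake rather than by design.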

Add Gradient Clipping

RuntimeError: Given input size: (512x1x7x7). Calculated output size: (512x0x4x4). Output size is too small

I am trying to train with shorter clips, e.g., 15.

Reset validation accuracy every epoch

Currently validation accuracy is tracked throughout all of training.
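
One way to get per-epoch numbers is a small tracker with an explicit `reset()` called at the start of each validation pass. This is an illustrative sketch; the class name `RunningAccuracy` and its methods are hypothetical and do not mirror ViP's metrics module.

```python
class RunningAccuracy:
    """Accumulates classification accuracy until reset() is called."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Forget everything seen so far (call once per epoch)."""
        self.correct = 0
        self.total = 0

    def update(self, predictions, labels):
        """Add one batch and return the accuracy accumulated since the last reset."""
        self.correct += sum(int(p == y) for p, y in zip(predictions, labels))
        self.total += len(labels)
        return self.value()

    def value(self):
        return self.correct / self.total if self.total else 0.0


tracker = RunningAccuracy()
epoch_1 = tracker.update([1, 0], [1, 1])   # one of two correct
tracker.reset()                            # start of the next epoch
epoch_2 = tracker.update([1, 1], [1, 1])   # both correct
```

Without the `reset()` call the second value would be 0.75, a blend of both epochs, which is the behaviour this issue describes.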

Where should I put the dataset?

Parser does not support scientific notation

Using scientific notation in the config file (e.g., lr: 1e-4) causes the parser to read the value as a string, resulting in an error. For the learning rate specifically, the error occurs at line 96 of train.py (during optimizer initialization), but other parameters are likely to fail elsewhere.
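
A possible workaround, sketched under assumptions: the config is loaded into nested dicts and lists, and values such as `"1e-4"` arrive as strings. The helper `coerce_scientific` is hypothetical and not part of ViP. To my understanding, PyYAML only resolves a scalar as a float when the mantissa contains a decimal point, so writing `1.0e-4` in the config may also sidestep the problem.

```python
import re

# Matches strings such as "1e-4", "2.5E3" or ".5e+2", but not "c3d" or "10".
_SCI_FLOAT = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+")


def coerce_scientific(value):
    """Recursively convert scientific-notation strings in a config to floats."""
    if isinstance(value, dict):
        return {key: coerce_scientific(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce_scientific(item) for item in value]
    if isinstance(value, str) and _SCI_FLOAT.fullmatch(value.strip()):
        return float(value)
    return value  # everything else is left untouched


# Hypothetical parsed config in which the learning rate came back as a string.
raw_config = {"lr": "1e-4", "model": "c3d", "milestones": [10, "2.5E3"]}
clean_config = coerce_scientific(raw_config)
```

The regular expression is deliberately narrow, so ordinary strings (model names, paths) are never converted.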

Add Saliency Metrics

Initially add cc and nss

Variable clip length

In certain cases the input to the network is not raw frames, but some computed features. All of the processed frames can be loaded at once, so it'd be useful to not specify the clip length and just read all available data per video.

This only works with batch_size = 1, but this is when the pseudo batch loop can come in handy.

Add preprocessing for x-y translation

Randomly translate image along with object bounding box and point coordinates. Include bounds for the translation distances
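
A sketch of the coordinate side of such a transform, assuming boxes are `(xmin, ymin, xmax, ymax)` tuples in pixel units and points are `(x, y)` tuples. All function names are hypothetical, and shifting the image pixels themselves is left out.

```python
import random


def sample_translation(max_dx, max_dy, rng=random):
    """Draw integer offsets within [-max_dx, max_dx] and [-max_dy, max_dy]."""
    return rng.randint(-max_dx, max_dx), rng.randint(-max_dy, max_dy)


def _clip(value, upper):
    """Clamp a coordinate to the range [0, upper]."""
    return max(0, min(value, upper))


def translate_box(box, dx, dy, width, height):
    """Shift an (xmin, ymin, xmax, ymax) box and clip it to the image."""
    xmin, ymin, xmax, ymax = box
    return (_clip(xmin + dx, width - 1), _clip(ymin + dy, height - 1),
            _clip(xmax + dx, width - 1), _clip(ymax + dy, height - 1))


def translate_point(point, dx, dy):
    """Shift an (x, y) point; callers decide how to treat off-image points."""
    x, y = point
    return (x + dx, y + dy)
```

The same `(dx, dy)` pair must be applied to the image, the boxes, and the points of one sample, so it is sampled once per sample and passed to each helper.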

Resuming experiment results in incorrect test logs

I've been seeing this phenomenon in my experiments, when restarting from already loaded weights. Could be related to #19? Note that logging is fine for training.

Attached are the relevant config files (in a tarball).

configs.tar.gz

Apply preprocessing functions to point coordinates

Can you provide the link for 2.2 times cropped dataset of MPII+NZSL hand dataset?

Hi,

Great work guys!
Can you also provide the link for the MPII+NZSL hand dataset which is 2.2B times cropped?

Seed Numpy in addition to Torch

Currently numpy is unseeded, so all random functions that use it are not repeatable. The expectation is that the seed will be used for both Torch and Numpy so that experiments with the same seed produce identical results.
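
A minimal sketch of a single seeding entry point. The function name `seed_everything` is hypothetical; `random.seed`, `numpy.random.seed`, and `torch.manual_seed` are the standard seeding calls of their libraries. The imports are wrapped in `try` blocks only so that the snippet runs where NumPy or PyTorch is not installed; inside ViP both would be present.

```python
import random


def seed_everything(seed):
    """Seed Python, NumPy and PyTorch random number generators with one value."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)      # legacy global NumPy generator
    except ImportError:
        pass                      # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)   # PyTorch generators
    except ImportError:
        pass                      # PyTorch not installed: nothing to seed
```

Seeding alone may not make GPU runs bit-for-bit identical, since some CUDA kernels are non-deterministic, but it removes the unseeded NumPy source of variation described here.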

Separate each metric and loss into files

Create two new directories: metrics and losses. Each metric or loss would be self-contained in a separate file.
Avoids extremely long .py files

Set layer-specific learning rates

Specify different learning rates for the layers of a network

Pytorch definition of optimizers and schedulers

Allow the exact PyTorch class and its parameters to be specified in the YAML file for the optimizer and scheduler you want to run (e.g., torch.optim.lr_scheduler.MultiStepLR).
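
One way this could work is to store a dotted class path plus keyword arguments in the config and resolve the class at runtime. The sketch below is an assumption about a possible design: the `class` and `params` keys and both helper names are hypothetical. In PyTorch the scheduler would be addressed as `torch.optim.lr_scheduler.MultiStepLR`; the usage lines use a standard-library class so the snippet runs without PyTorch.

```python
import importlib


def resolve_class(dotted_path):
    """Turn 'package.module.ClassName' into the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted path, got %r" % dotted_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_from_config(spec, *args):
    """Instantiate spec['class'] with positional args and spec['params'] as kwargs."""
    cls = resolve_class(spec["class"])
    return cls(*args, **spec.get("params", {}))


# Stand-in for a config entry; a real one might name an optimizer or scheduler.
spec = {"class": "fractions.Fraction", "params": {"numerator": 1, "denominator": 4}}
instance = build_from_config(spec)
```

Positional arguments cover objects that are not expressible in the config, such as the model parameters passed to an optimizer or the optimizer passed to a scheduler.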

Allow Unseeded Training

Sometimes it is necessary to produce networks seeded randomly (for showing robust performance, or for ensembling). It would be nice to be able to do this without changing the config at each launch, especially if there is a delay between sending the start command and actually launching the program.

Add author accuracies in Readme

Along with citations and links

No such file or directory: '/z/dat/HMDB51/train.json'

So, I installed all the dependencies with "install.sh" on Python 3.6. Whenever I try to train or evaluate the example model with "python <eval.py/train.py> --cfg_file models/c3d/config_test.yaml", I get the following Python error: "FileNotFoundError: [Errno 2] No such file or directory: '/z/dat/HMDB51/train.json'". Can someone help me?

Allow user to ignore final shape argument

Currently train.py (and maybe eval.py?) checks that the final_shape argument matches the actual image returned from the dataloader (line 149). Some architectures are able to handle multiple input shapes. Providing a method of ignoring this assertion (perhaps by setting final_shape to -1) would be helpful in some cases.
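
A sketch of the proposed sentinel, assuming `final_shape` is either a list or tuple of integers or the value `-1`. The helper name `shape_matches` is hypothetical; the real check lives inside train.py.

```python
def shape_matches(actual_shape, final_shape):
    """True if the shapes agree or `final_shape` is the -1 'ignore' sentinel."""
    if final_shape == -1:
        return True  # the user opted out of the check
    return tuple(actual_shape) == tuple(final_shape)
```

The training loop could then assert `shape_matches(...)` instead of comparing the shapes directly, which keeps the strict check as the default.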

Create outline for usage of ViP

In Wiki

Add logging class

Create a class object to be passed through every model, loss, and metric that has a method allowing you to add a plot to tensorboard for any specified variable.

Addition of an option to continue training from a saved checkpoint or train a new objective starting from a saved checkpoint

Currently the platform only supports continuing training from a selected checkpoint.
It does not allow the option of a warm start using a saved checkpoint and training using an alternative setup.

Clip stride, clip offset, and num clips are unimplemented

Update extract_clips so that clip stride, clip offset, and num clips work as specified in the config files.

Change DataLoader Collate Function

Currently for datasets with bounding boxes, we need to specify the max bounding boxes possible so all output batches are of the same size:

ViP/datasets/ImageNetVID.py

Line 27 in 74776f2

self.max_objects = 38

What we should do is use a custom collate function in the DataLoader, like the one used in the PyTorch detection tutorial:

https://github.com/pytorch/vision/blob/6c2cda6a0eda4c835f96f18bb2b3be5043d96ad2/references/detection/utils.py#L237

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
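
The collate function in the linked torchvision reference is, as far as I can tell, a one-liner that regroups samples field by field instead of stacking them into tensors. A sketch with plain Python stand-ins for images and boxes (the name `detection_collate` and the sample data are hypothetical):

```python
def detection_collate(batch):
    """Regroup [(image, target), ...] into (images, targets) without stacking."""
    return tuple(zip(*batch))


# Stand-in samples with a different number of boxes per image.
batch = [
    ("image_0", [[0, 0, 10, 10]]),
    ("image_1", [[0, 0, 10, 10], [5, 5, 20, 20]]),
]
images, boxes = detection_collate(batch)
```

Passing such a function as the `collate_fn` argument of `torch.utils.data.DataLoader` lets each sample keep its own number of boxes, so a fixed `max_objects` padding is no longer needed at the batching stage.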

Scaling gradients when params requires_grad is False

If a network layer is frozen, the pseudo batch loop code crashes when it tries to scale the gradient.

Addition of feature extraction option

Inclusion of a feature extraction option, possibly from any desired layer in a selected model.

Support for multi-gpu training & evaluation

Add option for DataParallel training in PyTorch.
It's pretty straightforward; the only issues arise when accessing the state_dict and functions belonging to the model (for multi-GPU training). It becomes model.module.state_dict instead of model.state_dict.
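
A small helper can hide that difference. This is a sketch: `unwrap_model` is a hypothetical name, and `SimpleNamespace` objects stand in for a model and its `torch.nn.DataParallel` wrapper (which exposes the wrapped model as `.module`) so the snippet runs without PyTorch.

```python
from types import SimpleNamespace


def unwrap_model(model):
    """Return the wrapped model if `model` has a .module attribute, else `model`."""
    return getattr(model, "module", model)


# Stand-ins: a bare model and the same model inside a DataParallel-like wrapper.
bare = SimpleNamespace(state_dict=lambda: {"weight": 1})
wrapped = SimpleNamespace(module=bare)

same_model = unwrap_model(wrapped) is bare
weights = unwrap_model(wrapped).state_dict()
```

Saving `unwrap_model(model).state_dict()` keeps checkpoint keys free of the `module.` prefix, so the same file loads in single-GPU and multi-GPU runs. The helper assumes the model does not define its own unrelated attribute named `module`.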
