
vip's Issues

JSON documentation does not match implementation (frame_size)

The detection_template.json file indicates that every individual frame should populate the frame_size parameter. However, reading a file laid out that way raises "KeyError: 'frame_size'" unless frame_size is placed directly under the video entry (at the same level as base_path).

My guess is that the behavior is correct (since videos can't dynamically change size), but the documentation is incorrect.
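A loader that matches the observed behavior, while still tolerating the documented per-frame layout, might resolve frame_size as in the minimal sketch below. It is an illustration, not ViP's actual loader: it assumes each video entry is a dict with base_path and frame_size keys, and the per-frame list key "frames" and the fallback to the first frame are hypothetical.

```python
def read_frame_size(video):
    """Return the frame size stored for one video entry of an annotation file.

    Assumed layout (hypothetical, based on the issue): frame_size sits at the
    video level next to base_path. As a fallback, the first entry of a
    hypothetical "frames" list is checked, which is where the template puts it.
    """
    if "frame_size" in video:
        return list(video["frame_size"])
    frames = video.get("frames", [])
    if frames and "frame_size" in frames[0]:
        return list(frames[0]["frame_size"])
    # Same error the current loader raises when the key is missing.
    raise KeyError("frame_size")


# Usage: the layout the loader reportedly accepts today.
video = {"base_path": "videos/clip_0001", "frame_size": [320, 240]}
print(read_frame_size(video))  # [320, 240]
```

Since a video's frame size does not change between frames, keeping a single video-level value (and updating the template to match) is the simpler fix; the fallback only matters for files already written in the template's layout.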

config error

When I run python eval.py --cfg_file models/c3d/config_test.yaml:

Traceback (most recent call last):
  File "eval.py", line 132, in <module>
    eval(**args)
  File "eval.py", line 62, in eval
    model = create_model_object(**args).to(device)
  File "/home/byronnar/pyprojects/cv/video_re/models/models_import.py", line 30, in create_model_object
    model = getattr(module, dir(module)[model_index])(**kwargs)
  File "/home/byronnar/pyprojects/cv/video_re/models/c3d/c3d.py", line 56, in __init__
    self.__load_pretrained_weights()
  File "/home/byronnar/pyprojects/cv/video_re/models/c3d/c3d.py", line 128, in __load_pretrained_weights
    p_dict = torch.load('weights/c3d-pretrained.pth')
  File "/home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/serialization.py", line 581, in _load
    deserialized_objects[key].set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 25948574 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount.load() > 0 ASSERT FAILED at /pytorch/c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /pytorch/c10/util/intrusive_ptr.h:350)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f5186cc9441 in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f5186cc8d7a in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: THStorage_free + 0xca (0x7f510dab629a in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: + 0x149bbd (0x7f5187277bbd in /home/byronnar/anaconda3/envs/vip/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #21: __libc_start_main + 0xe7 (0x7f518be4ab97 in /lib/x86_64-linux-gnu/libc.so.6)

What should I do to solve this problem?

My environment:
OS: 18.04
CUDA: 9.0
cuDNN: 7.3.1
Python: 3.6
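The RuntimeError says torch.load reached the end of weights/c3d-pretrained.pth while still expecting 25948574 more bytes, so the file on disk is most likely truncated (for example, by an interrupted download), and re-downloading it is the first thing to try. A pre-flight check along these lines can catch the problem before torch.load runs. This is a sketch; the expected_sha256 argument is an assumption, because it requires a digest taken from a known-good copy of the file.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_weights(path, expected_sha256=None):
    """Raise a readable error if a weights file is missing, empty, or altered.

    expected_sha256 is hypothetical: supply it only if you have the digest of a
    known-good download to compare against.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} does not exist; download the weights first")
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; re-download it")
    if expected_sha256 is not None and sha256_of(path) != expected_sha256:
        raise ValueError(f"{path} does not match the expected checksum; re-download it")


# Usage before loading (path taken from the traceback above):
# check_weights("weights/c3d-pretrained.pth")
# p_dict = torch.load("weights/c3d-pretrained.pth")
```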

About gen_json_UCF101

Could you please provide the gen_json_UCF101 file? I am using the UCF101 dataset for I3D training.

End of epoch divides incorrectly

The summed loss used for logging appears to be divided by the expected number of samples (i.e., the full minibatch size) instead of by the number of samples actually processed. Because the last minibatch of an epoch can contain fewer samples, the logged loss drops abruptly in magnitude whenever an epoch ends.
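A running average that divides by the number of samples actually seen avoids the drop. This is a minimal sketch with a hypothetical RunningLoss class, not ViP's logging code; in PyTorch the real batch size would typically come from something like inputs.size(0).

```python
class RunningLoss:
    """Accumulate a summed loss together with the number of samples behind it."""

    def __init__(self):
        self.loss_sum = 0.0
        self.num_samples = 0

    def update(self, batch_loss_sum, batch_num_samples):
        # batch_num_samples must be the real size of this minibatch,
        # not the configured batch_size (the last batch may be smaller).
        self.loss_sum += batch_loss_sum
        self.num_samples += batch_num_samples

    def average(self):
        return self.loss_sum / self.num_samples if self.num_samples else 0.0


# Usage: batch_size = 4, every sample has loss 0.5, and the epoch's last batch
# holds a single sample.
running = RunningLoss()
running.update(2.0, 4)
running.update(2.0, 4)
running.update(0.5, 1)
print(running.average())  # 0.5
# Dividing the last batch by the nominal batch size would log 0.5 / 4 = 0.125.
```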


Redundant learning rate decay when resuming experiment

Relevant lines of code: https://github.com/MichiganCOG/ViP/blob/master/train.py#L114-L115

Loading saved weights via the pretrained argument also restores the last saved learning rate (already decayed according to the config file). However, the linked lines then decay the learning rate again, because the scheduler "loops" through all of the previous epochs.

Example: I ended an experiment with a learning rate of 1e-6, after decaying twice from 1e-4. Resuming that experiment gives me a starting learning rate of 1e-8.
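One way to make resuming idempotent is to derive the learning rate from the initial value and the epoch, instead of re-applying decay to whatever value was restored. The sketch below assumes a milestone-based step schedule; step_decay_lr and its parameters are hypothetical names, not ViP's API.

```python
def step_decay_lr(base_lr, epoch, milestones, gamma):
    """Learning rate at `epoch` for a milestone-based step schedule.

    The value is derived from the *initial* learning rate, so a resumed run
    lands on the same value as an uninterrupted one.
    """
    num_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** num_decays


# Matches the example in the issue: two decays by 0.1 starting from 1e-4.
lr_at_resume = step_decay_lr(1e-4, epoch=25, milestones=[10, 20], gamma=0.1)  # ~1e-6

# The reported bug: replaying the schedule on the *restored* (already decayed)
# learning rate applies both decays a second time.
buggy_lr = step_decay_lr(lr_at_resume, epoch=25, milestones=[10, 20], gamma=0.1)  # ~1e-8
```

With PyTorch schedulers, the usual equivalent is to save scheduler.state_dict() in the checkpoint and restore it with scheduler.load_state_dict() on resume, rather than stepping the scheduler through the epochs that already ran.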

Fails on zero grad

In instances where a neuron doesn't factor into the loss (e.g., a component of the loss is disabled for a specific experiment, leaving a neuron or set of neurons unused), autograd never computes a gradient for the unused parameters, so their .grad remains None. This results in a crash at the line:

param.grad *= 1./float(args['psuedo_batch_loop']*args['batch_size'])

With the error:

TypeError: unsupported operand type(s) for *=: 'NoneType' and 'float'

This can be remedied by inserting the guard "if param.grad is not None:" before the line in question (with that line indented under it), but I'm unsure of any upstream consequences.
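The guarded scaling step could look like the sketch below. normalize_grads is a hypothetical helper; in training code it would receive model.parameters() and the two config values used in the failing line (the 'psuedo_batch_loop' key is kept exactly as it appears there).

```python
def normalize_grads(parameters, pseudo_batch_loop, batch_size):
    """Scale accumulated gradients by the effective batch size.

    Parameters that did not contribute to the loss keep grad == None after
    backward(), so they are skipped instead of raising a TypeError.
    """
    scale = 1.0 / float(pseudo_batch_loop * batch_size)
    for param in parameters:
        if param.grad is None:
            continue  # unused in this experiment's loss; nothing to scale
        param.grad *= scale


# In train.py this would be called roughly as:
# normalize_grads(model.parameters(), args['psuedo_batch_loop'], args['batch_size'])

# Quick check with stand-in objects instead of real tensors.
class _FakeParam:
    def __init__(self, grad):
        self.grad = grad


params = [_FakeParam(4.0), _FakeParam(None)]
normalize_grads(params, pseudo_batch_loop=2, batch_size=2)
print([p.grad for p in params])  # [1.0, None]
```

Optimizers in torch.optim skip parameters whose .grad is None, so leaving those parameters untouched here should not change how the remaining parameters are updated.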

Parser does not support scientific notation

Using scientific notation in the config file (e.g., lr: 1e-4) causes the config parser to read the value as a string, resulting in an error. For the learning rate specifically, this fails at line 96 of train.py (during optimizer initialization), and it is likely to cause errors elsewhere for other parameters.
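If the config is loaded with PyYAML, this matches a known quirk: its YAML 1.1 float pattern requires a decimal point and a signed exponent, so 1e-4 is read as a string while 1.0e-4 is read as a float. Writing values as 1.0e-4 is one workaround; another is to post-process the parsed config, as in this sketch. coerce_scientific is a hypothetical helper, and note that it would also convert a string field (such as a name) that happens to look like 1e5.

```python
import re

# Matches scientific-notation numbers such as "1e-4", "-2.5E+3", or ".5e2".
_SCI_RE = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)[eE][-+]?\d+$")


def coerce_scientific(value):
    """Recursively convert scientific-notation strings in a parsed config to floats."""
    if isinstance(value, dict):
        return {key: coerce_scientific(val) for key, val in value.items()}
    if isinstance(value, list):
        return [coerce_scientific(val) for val in value]
    if isinstance(value, str) and _SCI_RE.match(value):
        return float(value)
    return value


# Usage on a config dict as a parser might return it.
config = {"lr": "1e-4", "model": "c3d", "milestones": ["2e1", 40]}
print(coerce_scientific(config))  # {'lr': 0.0001, 'model': 'c3d', 'milestones': [20.0, 40]}
```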

Variable clip length

In certain cases the input to the network is not raw frames, but some computed features. All of the processed frames can be loaded at once, so it'd be useful to not specify the clip length and just read all available data per video.

This only works with batch_size = 1, but this is when the pseudo batch loop can come in handy.
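The requested behavior amounts to a clip-selection step that returns everything when no clip length is set. This is a minimal sketch; select_clip and the use of None to mean "all frames" are hypothetical, not an existing ViP option.

```python
def select_clip(frames, clip_length=None, start=0):
    """Return the frames (or precomputed features) to feed the network for one video.

    clip_length=None means "use every available frame". Clips then differ in
    length between videos, so they cannot be stacked into one batch and the
    loader must use batch_size = 1.
    """
    if clip_length is None:
        return list(frames)
    return list(frames[start:start + clip_length])


# Usage: ten precomputed feature vectors for one video (integers as stand-ins).
features = list(range(10))
print(len(select_clip(features)))      # 10
print(select_clip(features, 4, 2))     # [2, 3, 4, 5]
```

With batch_size = 1, the pseudo batch loop can then accumulate gradients over several videos before each optimizer step, recovering a larger effective batch size.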

Seed Numpy in addition to Torch

Currently NumPy is unseeded, so random functions that rely on it are not repeatable. The expectation is that the seed will be applied to both Torch and NumPy, so that experiments produce identical results with the same seed.
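Seeding every generator from one value could look like this sketch; seed_everything is a hypothetical name. NumPy and PyTorch are imported lazily so the helper still works where either is missing.

```python
import random


def seed_everything(seed):
    """Seed Python's random module, NumPy, and PyTorch (when installed) with one value."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
    except ImportError:
        pass


# Usage: the same seed reproduces the same random draws.
seed_everything(123)
first = random.random()
seed_everything(123)
second = random.random()
print(first == second)  # True
```

Seeding alone may not make GPU runs bit-identical; cuDNN can still pick nondeterministic algorithms unless torch.backends.cudnn.deterministic is set to True.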

Allow Unseeded Training

Sometimes it is necessary to produce networks seeded randomly (for showing robust performance, or for ensembling). It would be nice to be able to do this without changing the config at each launch, especially if there is a delay between sending the start command and actually launching the program.
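One way to allow unseeded runs without editing the config each time is to reserve a sentinel value that means "pick a seed now" and then log the seed that was picked. The sketch assumes a hypothetical convention where a missing or negative seed triggers random selection; resolve_seed is not an existing ViP function.

```python
import random


def resolve_seed(seed):
    """Return a concrete seed; None or a negative value means "choose one at random".

    The chosen seed is returned (and should be logged), so an "unseeded" run
    can still be reproduced afterwards. The range fits NumPy's 32-bit seed limit.
    """
    if seed is None or seed < 0:
        return random.SystemRandom().randrange(2**32)
    return seed


# Usage: a fixed seed passes through; -1 draws a fresh one at launch time.
print(resolve_seed(7))   # 7
print(resolve_seed(-1))  # some value in [0, 2**32), different on each launch
```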

No such file or directory: '/z/dat/HMDB51/train.json'

I installed all dependencies with install.sh on Python 3.6. Whenever I try to train or evaluate the example model with "python <eval.py/train.py> --cfg_file models/c3d/config_test.yaml", I get the following error: "FileNotFoundError: [Errno 2] No such file or directory: '/z/dat/HMDB51/train.json'". Can someone help me?

Allow user to ignore final shape argument

Currently train.py (and maybe eval.py?) checks that the final_shape argument matches the actual image returned from the dataloader (line 149). Some architectures are able to handle multiple input shapes. Providing a method of ignoring this assertion (perhaps by setting final_shape to -1) would be helpful in some cases.
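The opt-out proposed above could be expressed as a small check function. This is a sketch with a hypothetical check_final_shape helper; which dimensions the caller compares depends on what the existing assertion at line 149 checks, which is not reproduced here.

```python
def check_final_shape(actual_shape, final_shape):
    """Raise if actual_shape differs from final_shape, unless final_shape is -1.

    final_shape == -1 is the opt-out value proposed in the issue, for
    architectures that accept several input shapes.
    """
    if final_shape == -1:
        return
    if tuple(actual_shape) != tuple(final_shape):
        raise ValueError(
            f"expected input shape {tuple(final_shape)}, got {tuple(actual_shape)}"
        )


# Usage: a matching shape passes, and -1 disables the check entirely.
check_final_shape((112, 112), [112, 112])
check_final_shape((64, 96), -1)
```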

Add logging class

Create a class object to be passed through every model, loss, and metric that has a method allowing you to add a plot to tensorboard for any specified variable.
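Such a class could be a thin wrapper around a TensorBoard writer, as in this sketch. The Logger name and methods are hypothetical; the writer is assumed to expose add_scalar(tag, value, global_step), as torch.utils.tensorboard.SummaryWriter and tensorboardX's SummaryWriter both do. The in-memory history lets the object work, and be inspected, without TensorBoard.

```python
from collections import defaultdict


class Logger:
    """Object passed to models, losses, and metrics so any of them can log a variable."""

    def __init__(self, writer=None):
        self.writer = writer                # e.g. SummaryWriter("runs/exp"), or None
        self.step = 0                       # global step shared by every caller
        self.history = defaultdict(list)    # name -> [(step, value), ...]

    def set_step(self, step):
        self.step = step

    def add_scalar(self, name, value):
        value = float(value)  # also accepts 0-dim tensors
        self.history[name].append((self.step, value))
        if self.writer is not None:
            self.writer.add_scalar(name, value, self.step)


# Usage: the training loop sets the step; a loss module logs one of its terms.
logger = Logger()
logger.set_step(3)
logger.add_scalar("loss/classification", 0.5)
print(logger.history["loss/classification"])  # [(3, 0.5)]
```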

Change DataLoader Collate Function

Currently for datasets with bounding boxes, we need to specify the max bounding boxes possible so all output batches are of the same size:

self.max_objects = 38

What we should do instead is use a custom collate function in the DataLoader, like the one used in the PyTorch detection tutorial:

https://github.com/pytorch/vision/blob/6c2cda6a0eda4c835f96f18bb2b3be5043d96ad2/references/detection/utils.py#L237

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
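The collate function in the linked torchvision reference utils simply regroups the batch with tuple(zip(*batch)) instead of stacking it, so each sample keeps its own variable-length set of boxes and no max_objects padding is needed. A sketch of the same idea (the name detection_collate is ours):

```python
def detection_collate(batch):
    """Regroup a list of (clip, target) samples into (clips, targets) without stacking.

    Targets with different numbers of boxes stay separate, so no fixed
    max_objects padding is required.
    """
    return tuple(zip(*batch))


# Usage with a DataLoader (PyTorch):
# loader = torch.utils.data.DataLoader(dataset, batch_size=4, collate_fn=detection_collate)

# Stand-in samples: two clips with 1 and 3 boxes respectively.
batch = [("clip_a", [[0, 0, 10, 10]]), ("clip_b", [[1, 1, 5, 5], [2, 2, 6, 6], [3, 3, 7, 7]])]
clips, targets = detection_collate(batch)
print(clips)                    # ('clip_a', 'clip_b')
print([len(t) for t in targets])  # [1, 3]
```

Since all clips share the same shape, they can still be stacked into one tensor after collation (for example with torch.stack(clips)), leaving only the targets as a tuple.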

Support for multi-gpu training & evaluation

Add option for DataParallel training in PyTorch.
It's pretty straightforward; the only issues arise when accessing the state_dict and other functions belonging to the model during multi-GPU training. These become model.module.state_dict() instead of model.state_dict().
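A common way to keep the rest of the code unaware of the wrapper is a small unwrapping helper, sketched below. unwrap_model is a hypothetical name; it relies on torch.nn.DataParallel and DistributedDataParallel both exposing the wrapped model as .module.

```python
try:
    from torch.nn import DataParallel
    from torch.nn.parallel import DistributedDataParallel
    _WRAPPERS = (DataParallel, DistributedDataParallel)
except ImportError:  # lets the helper be imported on machines without PyTorch
    _WRAPPERS = ()


def unwrap_model(model):
    """Return the underlying model when it is wrapped for multi-GPU training."""
    return model.module if isinstance(model, _WRAPPERS) else model


# Usage (PyTorch):
# if torch.cuda.device_count() > 1:
#     model = torch.nn.DataParallel(model)
# torch.save(unwrap_model(model).state_dict(), "checkpoint.pth")
```

Saving through the unwrapped model also keeps the "module." prefix out of the checkpoint keys, so the same weights load into both single- and multi-GPU runs.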
