fmahoudeau / mict-net-pytorch
Video Recognition using Mixed Convolutional Tube (MiCT) on PyTorch with a ResNet backbone
License: Apache License 2.0
Is there a Colab version of this work?
Excuse me, how can I solve this problem?
python train.py --model mictresnet --version v1 --backbone resnet18 --lr 1e-2 --weight-decay 5e-4 --dropout 0.5 --batch-size 112 --base-size 192 --crop-size 160 --split 1 --checkname MiCTResNet_V1 --crop-vid 16 --epochs 120 --pretrained --lr-scheduler step --lr-step 80
=> creating output/ucf101/mictresnet/MiCTResNet_V1/
Namespace(backbone='resnet18', base_size=192, batch_size=112, checkname='MiCTResNet_V1', crop_size=160, crop_vid=16, cuda=True, data_folder='/home/y/MiCT/Datasets/', dataset='ucf101', dropout=0.5, epochs=120, eval=False, ft=False, gpu_id=0, lr=0.01, lr_scheduler='step', lr_step=80, model='mictresnet', model_zoo=None, momentum=0.9, no_cuda=False, no_val=False, pretrained=True, resume='', seed=1, split=1, start_epoch=0, test_batch_size=112, test_folder=None, version='v1', weight_decay=0.0005, workers=8)
Compute device: cuda:0
Loading UCF101 train1.csv
9537it [00:00, 254159.06it/s]
Loading UCF101 val1.csv
3783it [00:00, 239066.04it/s]
Total number of parameters: 16051685
Using step LR scheduler
0%| | 0/85 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 221, in <module>
trainer.training(epoch)
File "train.py", line 148, in training
for i, (video, target) in enumerate(tbar):
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/tqdm/std.py", line 1127, in __iter__
for obj in iterable:
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/y/Anaconda/envs/mm/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "../../mictnet/datasets/ucf101.py", line 115, in __getitem__
vid = self._train_transform(vid)
File "../../mictnet/datasets/ucf101.py", line 188, in _train_transform
w, h = vid[0].size
IndexError: list index out of range
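The final frame of the traceback shows `vid[0]` failing, which means `vid` was an empty list: at least one sample produced no frames at all. One way to track this down is to scan the dataset for clips with missing or too few frames before training. The sketch below is only an illustration under a stated assumption: it supposes the frames were extracted as image files into one folder per video, which may not match how this repository stores UCF101. The function name `find_empty_clips` and the folder names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed image extensions for extracted frames (adjust to your data).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_empty_clips(frames_root, min_frames=1):
    """Return (folder, frame_count) for leaf folders with fewer than min_frames images.

    Hypothetical helper: assumes one folder of extracted frames per video.
    """
    problems = []
    for clip_dir in sorted(p for p in Path(frames_root).rglob("*") if p.is_dir()):
        entries = list(clip_dir.iterdir())
        # Skip intermediate folders (e.g. class folders) that contain sub-folders.
        if any(e.is_dir() for e in entries):
            continue
        n_frames = sum(1 for e in entries if e.suffix.lower() in IMAGE_SUFFIXES)
        if n_frames < min_frames:
            problems.append((str(clip_dir), n_frames))
    return problems


# Small self-contained demo on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    good = Path(root) / "ApplyEyeMakeup" / "v_good"
    bad = Path(root) / "ApplyEyeMakeup" / "v_empty"
    good.mkdir(parents=True)
    bad.mkdir(parents=True)
    for i in range(16):
        (good / f"{i:05d}.jpg").touch()
    demo_problems = find_empty_clips(root, min_frames=16)

for folder, count in demo_problems:
    print(f"{count} frame(s) in {folder}")
```

Setting `min_frames` to the value passed as `--crop-vid` (16 in the command above) also flags clips that are too short to crop, not just empty ones. If the frames are decoded from video files instead, the same idea applies: check that every entry in the train and val CSV files resolves to a readable video under `data_folder`.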
Hi, sorry to bother you, but could you share the pre-trained weights?
Also, is the accuracy reported in the README validation accuracy or training accuracy?
Can you share the pretrained weight files?
Thank you.
Excuse me, I want to pre-train on Kinetics-700, but I don't know how to do it. Can you give me some advice? I haven't found any similar pre-training code for reference. Do you know if there is one?
Dear authors,
I have read your code, but I have one question about the concatenating connection between the 3D and 2D convolutions.
The paper (MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition) selects one channel (a tensor of shape N*C*W*H) for the 2D convolution and several channels (a tensor of shape N*C*D*W*H) for the 3D convolution. I don't know how to concatenate the feature maps of the 3D convolution with the feature maps of the 2D convolution. If convenient, please answer this question.
Regards,
Lee
Dear authors,
I have read your code, but I have one question about the 2D convolution.
A normal 3D convolution convolves over 3 frames at a time. Can each 2D convolution in the code be interpreted as a simultaneous convolution of 16 frames of images?
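A common way to apply a 2D convolution to a video tensor is to fold the temporal axis into the batch axis, so the same 2D kernel runs on every frame independently and no information is mixed across frames. The sketch below illustrates that idea only; it is not taken from this repository. The helper name `conv2d_per_frame`, the tensor sizes, and the element-wise sum used to combine the two branches are all assumptions made for illustration (the sum loosely follows the residual-style mixing described in the MiCT paper).

```python
import torch
import torch.nn as nn


def conv2d_per_frame(conv2d: nn.Conv2d, clip: torch.Tensor) -> torch.Tensor:
    """Apply a 2D convolution to each frame of a (N, C, D, H, W) clip.

    Hypothetical helper: folds the D frames into the batch dimension,
    runs the 2D convolution, then restores the (N, C_out, D, H_out, W_out) layout.
    """
    n, c, d, h, w = clip.shape
    frames = clip.permute(0, 2, 1, 3, 4).reshape(n * d, c, h, w)
    out = conv2d(frames)
    _, c_out, h_out, w_out = out.shape
    return out.reshape(n, d, c_out, h_out, w_out).permute(0, 2, 1, 3, 4)


torch.manual_seed(0)
clip = torch.randn(2, 3, 16, 32, 32)  # 2 clips, 3 channels, 16 frames, 32x32 pixels

# The 3D branch mixes neighbouring frames (temporal kernel size 3).
conv3d = nn.Conv3d(3, 8, kernel_size=3, padding=1)
# The 2D branch sees one frame at a time.
conv2d = nn.Conv2d(3, 8, kernel_size=3, padding=1)

out_3d = conv3d(clip)
out_2d = conv2d_per_frame(conv2d, clip)

# Assumed combination: element-wise sum of the two branches.
mixed = out_3d + out_2d
print(out_3d.shape, out_2d.shape, mixed.shape)
```

Under this reading, the 2D convolution does process all 16 frames in one call, but each frame is convolved separately with shared weights; only the 3D branch combines information from adjacent frames. Because both branches here keep the same output shape, they can be summed directly or stacked along the channel axis with `torch.cat`.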