eco-efficient-video-understanding's People

Contributors

mzolfaghari

eco-efficient-video-understanding's Issues

how to set a different "num_segments"

@mzolfaghari Hi, I plan to train on a custom dataset with a different "num_segments" to trade off speed and accuracy,
so I changed the following:
(1)
layer {
name: "data"
type: "VideoData"
top: "data"
top: "label"
video_data_param {
#source: ".../train.txt"
source: "./train_videofolder_new.txt"
batch_size: 1 #8 #16
new_length: 1
new_width: 320
new_height: 240
num_segments: 8 #16
modality: RGB
shuffle: true
name_pattern: "%05d.jpg"
}
transform_param{
crop_size: 224
mirror: true
fix_crop: true
more_fix_crop: true
multi_scale: true
max_distort: 1
scale_ratios:[1,.875,.75, .66]
is_flow: false
mean_value: [104]
mean_value: [117]
mean_value: [123]
# ... the same [104, 117, 123] triplet repeats once per segment (8 triplets in total)

}
include: { phase: TRAIN }
}

(2)
#=================== 3D network ===========================================
#layer {
#  name: "r2Dto3D"
#  type: "Reshape"
#  bottom: "inception_3c_double_3x3_1_bn"
#  top: "res2b_bn_pre"
#  reshape_param{
#    shape { dim: -1 dim: 16 dim: 96 dim: 28 dim: 28 }
#  }
#}
layer {
name: "r2Dto3D"
type: "Reshape"
bottom: "inception_3c_double_3x3_1_bn"
top: "res2b_bn_pre"
reshape_param{
shape { dim: -1 dim: 8 dim: 96 dim: 28 dim: 28 }
}
}

(3)
#layer { name: "reshape_fc_st2" type: "Reshape" bottom: "global_pool2D" top: "reshape_fc_st2" #reshape_param { shape { dim: [-1, 1, 16, 1024] } } }
#layer { name: "segment_consensus_st2" type: "Pooling" bottom: "reshape_fc_st2" top: "pool_fusion_st2" #pooling_param { pool: AVE kernel_h: 16 kernel_w: 1 } }

layer { name: "reshape_fc_st2" type: "Reshape" bottom: "global_pool2D" top: "reshape_fc_st2" reshape_param { shape { dim: [-1, 1, 8, 1024] } } }
layer { name: "segment_consensus_st2" type: "Pooling" bottom: "reshape_fc_st2" top: "pool_fusion_st2" pooling_param { pool: AVE kernel_h: 8 kernel_w: 1 } }

However, I got the following error:
I0823 11:47:52.993273 22 net.cpp:170] Top shape: 1 1024 (1024)
I0823 11:47:52.993278 22 layer_factory.hpp:74] Creating layer global_pool
I0823 11:47:52.993285 22 net.cpp:99] Creating Layer global_pool
I0823 11:47:52.993289 22 net.cpp:479] global_pool <- res5b_bn
I0823 11:47:52.993295 22 net.cpp:130] This layer is inheriting previous layer's sync mode: 1
I0823 11:47:52.993300 22 net.cpp:435] global_pool -> global_pool
I0823 11:47:52.993306 22 net.cpp:163] Setting up global_pool
F0823 11:47:52.993738 22 blob.cpp:32] Check failed: shape[i] >= 0 (-1 vs. 0)
*** Check failure stack trace: ***
@ 0x7f3e01c5c5cd google::LogMessage::Fail()
@ 0x7f3e01c5e433 google::LogMessage::SendToLog()
@ 0x7f3e01c5c15b google::LogMessage::Flush()
@ 0x7f3e01c5ee1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f3e01ff7878 caffe::Blob<>::Reshape()
@ 0x7f3e020f5a7b caffe::PoolingLayer<>::Reshape()
@ 0x7f3e0206deec caffe::CuDNNPoolingLayer<>::Reshape()
@ 0x7f3e02135abd caffe::Net<>::Init()
@ 0x7f3e0213821c caffe::Net<>::Net()
@ 0x4118a3 time()
@ 0x40e932 main
@ 0x7f3e0066f830 __libc_start_main
@ 0x40ef79 _start
@ (nil) (unknown)
Aborted (core dumped)

Looking forward to any replies.
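
A minimal sketch of how Caffe's Reshape layer resolves the -1 dimension may clarify this failure; infer_reshape below is a hypothetical helper (not part of the repo), assuming the 2D-Net emits one 96x28x28 feature map per sampled frame, and it shows how a dim: 16 left somewhere in the prototxt zeroes out a blob dimension when only 8 segments are fed:

import numpy as np

def infer_reshape(input_shape, target_dims):
    # Mimic Caffe's Reshape layer: the single -1 entry is resolved from the
    # total element count; integer division that loses elements leaves an
    # invalid (zero) dimension, which trips Caffe's shape[i] >= 0 check.
    total = int(np.prod(input_shape))
    known = int(np.prod([d for d in target_dims if d != -1]))
    resolved = [total // known if d == -1 else d for d in target_dims]
    if int(np.prod(resolved)) != total:
        raise ValueError("cannot reshape %s into %s" % (input_shape, target_dims))
    return resolved

print(infer_reshape((8, 96, 28, 28), [-1, 8, 96, 28, 28]))   # [1, 8, 96, 28, 28]: consistent
print(infer_reshape((8, 96, 28, 28), [-1, 16, 96, 28, 28]))  # raises: 8 segments cannot fill dim: 16

So when changing num_segments, every dim: 16 and kernel_h: 16 in the prototxt has to be changed together, not only the layers quoted above.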

download pretrained models

Nice work on video understanding; I am very interested in it.
As Google is blocked in China, I cannot download the pretrained models you uploaded to Google Drive. Could you share the pretrained models via Baidu Yun? Thanks a lot.

about the weight sharing of architecture overview of ECO or ECOLite

@mzolfaghari
Thanks for your excellent idea, paper, and repo. After reading your paper, I'm a little confused about the weight sharing in your architecture overview; would you mind giving more details about it? I thought the N frames correspond to N instances of the Inception-3c block of the 2D-Net, so what does the weight sharing mean?
Thanks.
Looking forward to any replies.

About precision on ucf101

Hi,
Thank you for your wonderful work! I used the UCF101 pretrained model downloaded by the script to test split1 of UCF101, but I only got 88.9% (lite model) and 89.74% (full model), compared to 91.6% and 92.8% in the paper. The number of segments is set to 16 and no parameters were changed.

Thx.

Kinetics results on ECO 16

Hi.

In the paper, it is mentioned that you get 64.4% accuracy on the Kinetics database for ECO-Lite with 16 frames. To get this result, do you use the Kinetics evaluation protocol, randomly sampling clips of 16 frames from every video and averaging the classification results over the inferences, or do you use a single inference with 16 frames sampled from the 16 segments of the video?

I tried to reproduce these results using the weights eco_lite_rgb_16F_kinetics_v3.pth.tar and I get only 58.5% with a single inference.

Thanks
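
For reference, a minimal sketch of the clip-averaging protocol described above, in PyTorch; model is a hypothetical classifier returning logits and sample_clip is a hypothetical helper, so this illustrates the protocol rather than the authors' exact evaluation script:

import torch

def sample_clip(video, num_frames):
    # video: (T, 3, 224, 224) tensor of preprocessed frames; draw num_frames
    # random positions, keep temporal order, and add a batch dimension.
    idx = torch.sort(torch.randint(0, video.shape[0], (num_frames,))).values
    return video[idx].unsqueeze(0)

def evaluate_video(model, video, num_clips=10, num_frames=16):
    # Average softmax scores over several sampled clips; num_clips=1
    # reduces to the single-inference setting mentioned above.
    model.eval()
    with torch.no_grad():
        scores = [torch.softmax(model(sample_clip(video, num_frames)), dim=1)
                  for _ in range(num_clips)]
    return torch.stack(scores).mean(dim=0).argmax(dim=1)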

train ucf101 failed

Hi: @mzolfaghari

I was training on the UCF101 dataset and set the ECO_Lite.prototxt parameters:
`# ----- video/label input ----- for training
layer {
name: "data"
type: "VideoData"
top: "data"
top: "label"
video_data_param {
source: "/home/ww/ECO-efficient-video-understanding/datasets/ucf101/train_rgb_split1.txt"
batch_size: 10
new_length: 1
num_segments: 16
modality: RGB
shuffle: true
name_pattern: "image_%04d.jpg"
}
transform_param{
crop_size: 224
mirror: true
fix_crop: true
more_fix_crop: true
multi_scale: true
max_distort: 1
scale_ratios:[1,.875,.75, .66]
is_flow: false
mean_value: [104]
mean_value: [117]
mean_value: [123]
# ... the same [104, 117, 123] triplet repeats once per segment (16 triplets in total)

}
include: { phase: TRAIN }
}
layer {
name: "data"
type: "VideoData"
top: "data"
top: "label"
video_data_param {
source: "/home/ww/ECO-efficient-video-understanding/datasets/ucf101/val_rgb_split1.txt"
batch_size: 1
new_length: 1
num_segments: 16
modality: RGB
name_pattern: "image_%04d.jpg"
}
transform_param{
crop_size: 224
mirror: false
mean_value: [104]
mean_value: [117]
mean_value: [123]
# ... the same [104, 117, 123] triplet repeats once per segment (16 triplets in total)

}
include: { phase: TEST }
}`
but when I run the training script

sh models_ECO_Lite/ucf101/run.sh

the following error is printed:

`I0904 16:34:31.579064 7738 video_data_layer.cpp:37] Opening file: /home/ww/ECO-efficient-video-understanding/datasets/ucf101/val_rgb_split1.txt

I0904 16:34:31.603790 7738 video_data_layer.cpp:54] A total of 41822 videos.
*** Aborted at 1536050071 (unix time) try "date -d @1536050071" if you are using GNU date ***
PC: @ 0x7f235907b44f caffe::VideoDataLayer<>::DataLayerSetUp()
*** SIGFPE (@0x7f235907b44f) received by PID 7738 (TID 0x7f23597a6740) from PID 1493677135; stack trace: ***
@ 0x7f2357ba74b0 (unknown)
@ 0x7f235907b44f caffe::VideoDataLayer<>::DataLayerSetUp()
@ 0x7f2359061699 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
@ 0x7f23591123c5 caffe::Net<>::Init()
@ 0x7f23591149b5 caffe::Net<>::Net()
@ 0x7f23590ec1fa caffe::Solver<>::InitTestNets()
@ 0x7f23590eca66 caffe::Solver<>::Init()
@ 0x7f23590ecc26 caffe::Solver<>::Solver()
@ 0x412148 caffe::GetSolver<>()
@ 0x409576 train()
@ 0x406fb0 main
@ 0x7f2357b92830 __libc_start_main
@ 0x4074e9 _start
@ 0x0 (unknown)
Floating point exception (core dumped)
`
The 'train_rgb_split1.txt' is loaded correctly while 'val_rgb_split1.txt' is not.
Please tell me why.

More detailed logs are in the attached log.txt.

Can you give me some advice, thanks!

2D tensor changed to 3D tensor, runtime error

I tried to change a 2D tensor into a 3D tensor so that it can be fed into a 3D conv. The code is shown as follows:
y = self.con1x1(x) # 2D tensor
y = y.view((-1, 96, 16) + y.size()[2:]) #2D tensor changed to 3D tensor

error:
(error screenshot attached)
Thank you so much for your kind help.
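
For comparison, the usual PyTorch idiom keeps channels and time as separate axes and permutes to the (batch, C, T, H, W) layout that nn.Conv3d expects; a minimal sketch with 16 assumed segments and a 1x1 conv standing in for self.con1x1:

import torch
import torch.nn as nn

conv1x1 = nn.Conv2d(320, 96, kernel_size=1)  # stand-in for self.con1x1
x = torch.randn(16, 320, 28, 28)             # (batch*segments, C_in, H, W)

y = conv1x1(x)                                # (16, 96, 28, 28)
y = y.view((-1, 16) + y.size()[1:])           # (batch, T, C, H, W) = (1, 16, 96, 28, 28)
y = y.permute(0, 2, 1, 3, 4).contiguous()     # (batch, C, T, H, W) for nn.Conv3d
print(y.shape)                                # torch.Size([1, 96, 16, 28, 28])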

Using trained PyTorch model

Hello,

Thank you for making the PyTorch implementation available. What are the steps to using a trained model for activity recognition on some videos in my folder?

I have downloaded the ECO Lite model using gd_download.py, and I have a folder of videos in .avi format.

Thanks,
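
A rough sketch of the usual recipe (uniformly sample frames from each .avi, apply Caffe-style BGR mean subtraction, load the checkpoint, take the argmax); the preprocessing details and checkpoint keys here are assumptions based on common conventions, not this repo's exact API:

import cv2
import numpy as np
import torch

def sample_frames(video_path, num_segments=16, size=224):
    # Uniformly sample one frame per segment; subtract the BGR channel
    # means used throughout the prototxt files in this repo.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    idxs = np.linspace(0, max(total - 1, 0), num_segments).astype(int)
    frames = []
    for i in idxs:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if not ok:
            continue
        frame = cv2.resize(frame, (size, size)).astype(np.float32)
        frame -= np.array([104.0, 117.0, 123.0], dtype=np.float32)
        frames.append(torch.from_numpy(frame).permute(2, 0, 1))
    cap.release()
    return torch.stack(frames).unsqueeze(0)  # (1, num_segments, 3, H, W)

ckpt = torch.load("ECO_Lite_kinetics.pth.tar", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)    # key name is an assumption
# model = ...build the ECO Lite network as in main.py..., then:
# model.load_state_dict(state_dict); model.eval()
# scores = model(sample_frames("my_video.avi"))
# print(scores.argmax(dim=1))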

setting the number of segments

The presented method of setting the number of segments does not seem right.

layer { name: "reshape_fc_st2" type: "Reshape" bottom: "global_pool2D" top: "reshape_fc_st2" reshape_param { shape { dim: [-1, 1, 16, 1024] } } }

layer { name: "segment_consensus_st2" type: "Pooling" bottom: "reshape_fc_st2" top: "pool_fusion_st2" pooling_param { pool: AVE kernel_h: 16 kernel_w: 1 } }

It looks like the parameters in the above two layers need to be changed accordingly.
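
Indeed, these two layers implement the average consensus over segments, so the reshape's time dimension and the pooling kernel_h must both equal num_segments; in PyTorch terms the pair collapses to a mean over the segment axis, as this sketch shows:

import torch

num_segments = 16                            # must match num_segments in the data layer
feats = torch.randn(4 * num_segments, 1024)  # per-segment features (batch = 4)
feats = feats.view(-1, num_segments, 1024)   # mirrors reshape_fc_st2
consensus = feats.mean(dim=1)                # mirrors the AVE pooling with kernel_h = num_segments
print(consensus.shape)                       # torch.Size([4, 1024])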

Training ECO PyTorch on the Moments in Time dataset

Hello, I intend to use the ECO PyTorch code to train the model on the moments in time dataset. I have the frames extracted and the training and validation lists prepared.

I'd greatly appreciate any guidance on the changes to be made to the code (main.py) to train the model on the moments dataset (or any dataset other than kinetics).

Thank you,

changing the network to a Fully Convolutional Network (FCN) gives different outputs

Great work on action recognition.
I am trying to change your network into an FCN to handle large scenes with multiple actions.
Changing the input from 224x224 to 512x512, we get a 100x51 output with the Kinetics model. However, with an all-black input, the 100 outputs of 1x51 differ from one another, instead of being identical as we expected.
Can you help me with this problem?
Sorry for my poor English; I hope you can understand.
Looking forward to your reply.

frames for Kinetics dataset

Hello,

In the script 'create_list_kinetics.m', you have the following path:

path_DB_rgb='/datasets/kinetics/train/db_frames//'

I'm assuming this folder contains the frames for the kinetics videos. Are the frames for the videos available online somewhere, or is there a script available to split the videos into the frames?

I tried running 'main.py' from your PyTorch implementation and got the following error:

when running

---> 20 for i, (input, target) in enumerate(train_loader):

......

FileNotFoundError: [Errno 2] No such file or directory: '/kinetics/pumping_gas/ib5PzcBeYIc_000004_000014/0004.jpg'

Thanks,
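
If it helps while waiting for an answer, a minimal sketch for splitting a video into JPEG frames with the zero-padded names the loader expects (name_pattern "%04d.jpg" matches the missing 0004.jpg above); the directory layout is an assumption:

import os
import cv2

def extract_frames(video_path, out_dir):
    # Decode every frame of one video and write 0001.jpg, 0002.jpg, ...
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        i += 1
        cv2.imwrite(os.path.join(out_dir, "%04d.jpg" % i), frame)
    cap.release()
    return i  # number of frames written

# e.g. extract_frames("ib5PzcBeYIc_000004_000014.mp4",
#                     "/kinetics/pumping_gas/ib5PzcBeYIc_000004_000014")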

the uncertainties about the file "online_recognition.py"

Firstly, thank you for your great contribution. I have a question about testing.
When I test the .caffemodel "ECOfull_UCF101_16F.caffemodel" with a validation video as input, e.g. "v_PlayingDhol_g04_c02.avi", the result changes several times, but I never find the correct label. I have tested many times with other validation videos, and the conclusion does not change.

Can you give me some suggestions about testing? Thank you very much.

About the inference speed?

Dear sir,

I just used the online_recognition.py method to test the speed, and I found that one forward pass takes about 50 ms with an input of shape (16, 3, 224, 224).

So I don't think I can achieve the high VPS (videos per second) reported in the paper. Thank you!
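
When benchmarking this way, it is worth excluding the first forward pass, which pays one-time CUDA allocation costs; a small pycaffe timing sketch (the model paths are placeholders):

import time
import caffe

caffe.set_mode_gpu()
net = caffe.Net('deploy.prototxt', 'ECO_Lite_kinetics.caffemodel', caffe.TEST)

net.forward()                        # warm-up pass: pays allocation costs
t0 = time.time()
runs = 20
for _ in range(runs):
    net.forward()
print("avg forward: %.1f ms" % ((time.time() - t0) / runs * 1000.0))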

about the video_data_param and train.txt

@mzolfaghari
Thanks for your excellent repo and paper.
video_data_param {
source: ".../train.txt"
batch_size: 16
new_length: 1
new_width: 320
new_height: 240
num_segments: 16
modality: RGB
shuffle: true
name_pattern: "%05d.jpg"
}

I want to use your model to train on my custom datasets, so would you mind telling us the format of your train.txt? Also, since num_segments is 16, there are 16 blocks of
{mean_value: [104]
mean_value: [117]
mean_value: [123]
}
Is the number of {mean_value} triplets supposed to be the same as num_segments?
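
For what it's worth, video data layers in TSN-style Caffe forks usually read a list with one line per video in the form <frame_folder> <num_frames> <label>; under that assumption, a sketch for generating train.txt from frames laid out as root/class/video/00001.jpg:

import os

root = "/path/to/frames"   # hypothetical layout: root/class/video/00001.jpg
classes = sorted(os.listdir(root))
with open("train.txt", "w") as f:
    for label, cls in enumerate(classes):
        cls_dir = os.path.join(root, cls)
        for video in sorted(os.listdir(cls_dir)):
            frame_dir = os.path.join(cls_dir, video)
            n_frames = len([p for p in os.listdir(frame_dir) if p.endswith(".jpg")])
            f.write("%s %d %d\n" % (frame_dir, n_frames, label))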

Why is ECO performing so well for early recognition?

Hi @mzolfaghari
Thanks for your paper and the upcoming code! The performance of ECO is very impressive, especially for the early recognition task. According to your paper, ECO can achieve an accuracy of more than 0.8 when the observation ratio is only 0.1. Since ECO is trained with sparse sampling to capture long-term dependencies, I'm wondering why it performs so well for early recognition, where the input is obtained by dense sampling. Any intuitive idea about that? By the way, have you tried performing inference with a dense sliding window, as typical 3D CNNs do?

Some questions about the ECO Lite arch & training details

hi, @mzolfaghari ! Thanks for your excellent work, and I'm glad you will release your code here~
But I have some questions about the ECO Lite arch & training details:

  1. The 2D-Net uses the BN-Inception architecture (up to the inception-3c layer) and the output channel count is 96; does this mean that only one branch (output channels: 96) of the inception-3c layer is selected?

  2. For the original 3D-ResNet18, down-sampling is performed by conv3_1, conv4_1 and conv5_1. I noticed that in supplementary material Table 1 the output size of the conv3_x layer is 28x28xN; it seems that down-sampling is not performed by conv3_1, and conv3_1 uses stride 1x1x1?

  3. I tried to implement the ECO network in PyTorch. I initialized the weights of the 2D-Net from the BN-Inception architecture pretrained on Kinetics, as provided by tsn-pytorch, and I initialized the 3D-Net from the Kinetics-pretrained 3D-ResNet18 model provided by 3D-ResNets-PyTorch. But when I trained on UCF101, the training loss dropped well while the test loss was bad; it looks like overfitting.
    (I noticed on page 7 of the paper that, after initializing the weights from the different pretrained models, you train ECO and ECO Lite on the Kinetics dataset for 10 epochs. I didn't do this; could that cause the overfitting?)

finetune my own dataset problem

When I finetune on my own dataset, the loss stays at 0.
Why?
I1115 13:17:15.940382 25582 solver.cpp:241] Iteration 20, loss = 0
I1115 13:17:15.940572 25582 solver.cpp:256] Train net output #0: loss = 0 (* 1 = 0 loss)
I1115 13:17:15.940580 25582 solver.cpp:665] Iteration 20, lr = 0.001
I1115 13:18:19.726904 25582 solver.cpp:241] Iteration 40, loss = 0
I1115 13:18:19.727092 25582 solver.cpp:256] Train net output #0: loss = 0 (* 1 = 0 loss)
I1115 13:18:19.727100 25582 solver.cpp:665] Iteration 40, lr = 0.001
I1115 13:21:35.014933 25582 solver.cpp:273] Maximum resident set size is: 1352040 kb
I1115 13:21:35.015017 25582 solver.cpp:282] cuda meminfo: used 3957 MB, of 4037 MB
(... the same "loss = 0" pattern repeats every 20 iterations through iteration 260 ...)
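
A loss that is exactly 0 from the first iterations usually means the classifier has nothing to learn, e.g. the final layer's num_output is 1 or every sample in the list carries the same label; a quick sanity check, assuming the TSN-style <frame_folder> <num_frames> <label> list format:

from collections import Counter

with open("train.txt") as f:
    labels = [line.split()[-1] for line in f if line.strip()]
print("distinct labels:", len(set(labels)))
print("most common:", Counter(labels).most_common(5))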

Train Caffe model on Moments dataset

I was able to make the changes and train the PyTorch version of ECO on the Moments in Time dataset. Could you please guide me to the changes to be made to train the Caffe version of ECO on the Moments in Time dataset? Thanks,

Full architecture - Supplementary material

Hi,
Thanks for making your submission open source; it is very useful. Would it be possible for you to upload the supplementary material to arXiv?
In your paper you refer to the supplementary material concerning the architecture.
I am especially interested in your 2D_Nets_s and 3D_Net architecture.

Thanks and well done for your impressive work :)

Fabien

How to train on the Something-Something dataset?

I basically reproduced the results on HMDB51, but I can't reproduce the reported results on the Something-Something dataset.
Can you tell me how to set the dropout, learning rate, or other training tricks?

Thank you very much!

About the blob's problem when I run online_recognition?

sir,

When I run online_recognition.py, the following problem occurs while loading the model:

blob.hpp:141] Check failed: num_axes() <= 4 (5 vs. 4) Cannot use legacy accessors on Blobs with > 4 axes

Does the caffe_3d branch support C3D?

thank you very much!

Also, I compiled Caffe with make && make pycaffe, not with cmake.

About make && make install

I finished the steps before make && make install, but when I ran make && make install I got an error:
No implicit rule found for 'install'.
Finished prerequisites of target file 'install'.
Must remake target 'install'.
make: *** No rule to make target 'install'. Stop.
Can you help me solve this problem? Thank you in advance.

About video captioning

Thanks for your excellent repo and paper.
From your paper I learned that you used a Kinetics-pretrained model to extract features for the video captioning task. Could you explain more about how you extract ECO features for the MSVD dataset (for instance, from which layer you extract the features, and how to set num_class for MSVD)?

Thank you in advance.

When using single input, model always produces wrong output

** Although I am testing the PyTorch implementation, I guess this might happen with the Caffe implementation as well, so I am asking in case there is an answer to my question.

Hello,

I am Youngkyoon Jang, a postdoc at the University of Bristol. Thanks for sharing the PyTorch implementation of the ECO network for testing on a new dataset, EPIC-KITCHENS. But I am facing a problem when testing the model to calculate the accuracy.

I am currently using the PyTorch implementation of the ECO network, and I noticed that the model always predicts the first index with the highest score when using a single input instead of a multi-sample mini-batch at test time. Are you aware of this problem? Is there a correct way to get a consistent output?

When I put a different number of samples in a mini-batch, the model predicts a different score for the same sample, depending on the mini-batch size.

I look forward to your reply.

Best,
Young
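
One cheap thing to rule out: outputs that depend on the mini-batch size are the classic symptom of BatchNorm layers still running in training mode (using batch statistics instead of the stored running statistics); whether that is the cause here is only a guess:

import torch

model.eval()                        # model: the loaded ECO network (hypothetical name)
with torch.no_grad():
    scores = model(single_clip)     # single_clip: a (1, T, 3, 224, 224) tensor, hypothetical
print(scores.argmax(dim=1))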

nan during the first forward pass

I'm fine-tuning the pretrained ECO Full architecture on my own dataset. Inference (deploy.prototxt) runs with no issues. The only differences between the two files are the last layer and the data layer.

I noticed that in the first forward pass, res3b_bn blows up (inf). I'm wondering if this is an issue someone has seen before. I've attached the log file.
log.txt

Question about using caffe image

@mzolfaghari
Thank you very much for providing such good ideas and articles. I can't wait to reproduce this code, but I encountered difficulties configuring the environment. I use the bvlc/caffe image; in the container, following the steps you suggested, I entered the caffe_3d folder and executed make all -j8. This error occurs:
root@ecdc968b55d9:/data/ECO-efficient-video-understanding-master/caffe_3d# make all -j8
/bin/sh: 1: bc: not found
CXX src/caffe/internal_thread.cpp
CXX src/caffe/layer_factory.cpp
CXX src/caffe/syncedmem.cpp
CXX src/caffe/data_transformer.cpp
CXX src/caffe/solver.cpp
CXX src/caffe/common.cpp
CXX src/caffe/net.cpp
CXX src/caffe/blob.cpp
In file included from ./include/caffe/blob.hpp:9:0,
                 from ./include/caffe/layer.hpp:8,
                 from src/caffe/layer_factory.cpp:3:
.build_release/src/caffe/proto/caffe.pb.h:12:2: error: #error This file was generated by a newer version of protoc which is
.build_release/src/caffe/proto/caffe.pb.h:13:2: error: #error incompatible with your Protocol Buffer headers. Please update
.build_release/src/caffe/proto/caffe.pb.h:14:2: error: #error your headers.
.build_release/src/caffe/proto/caffe.pb.h:23:35: fatal error: google/protobuf/arena.h: No such file or directory
compilation terminated.
(the same protoc version-mismatch error is repeated for solver.cpp, net.cpp, blob.cpp and data_transformer.cpp)
Makefile:516: recipe for target '.build_release/src/caffe/layer_factory.o' failed
make: *** [.build_release/src/caffe/layer_factory.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [.build_release/src/caffe/solver.o] Error 1
make: *** [.build_release/src/caffe/net.o] Error 1
make: *** [.build_release/src/caffe/blob.o] Error 1
make: *** [.build_release/src/caffe/data_transformer.o] Error 1

Can you help me analyze where I went wrong, and how should I solve it?

ECO-pytorch Performance on Kinetics validation set

Could you report the ECO-pytorch performance on the Kinetics validation set (ECO-Lite)?

I evaluated the Kinetics-pretrained ECO-Lite model from ECO-pytorch on the Kinetics validation set and only got 62.1%, which is not as good as the 64.4% in the paper (ECO-Lite, number of segments = 16).

how to set multi-GPUs

When I train the network, I get the following message (I built the Caffe training environment following https://github.com/yjxiong/temporal-segment-networks):
You are running caffe compiled with MPI support. Now it's running in non-parallel model
What does this mean?

What's more, I can't use multiple GPUs to train my model.

Looking forward to any advice.

Initialization models and accuracy@1 in pytorch

Thanks for your excellent work!
I want to reproduce your paper, but I have some questions about the initialization models and accuracy@1.
Q1: Does ECO-pytorch use the initialization models from your paper if I just set args.pretrained_parts == "both"? In other words, are weight_url_2d='https://yjxiong.blob.core.windows.net/ssn-models/bninception_rgb_kinetics_init-d4ee618d3399.pth' and models/C3DResNet18_rgb_16F_kinetics_v1.pth.tar
the initialization models used in your paper?
Q2: Is it normal that the baseline is 53.4% when I set args.pretrained_parts == "scratch" and num_segments=16? Could you provide some information about your model's accuracy@1 for pretrained_parts = 'scratch', '2D', '3D', 'both', 'finetune' with segments=16?
I am a new learner in this area; if you can help me, I will be very grateful.
Looking forward to your reply @mzolfaghari

About the training strategy

@mzolfaghari Thank you for your excellent work and the public-available model!
After reading your paper and running through the code (ECO-pytorch), I have some questions about the training strategy. I followed the training parameters in the training script to finetune the ECO-Lite model on UCF101 using 16 frames, but I failed to get the same result as in the paper, which is 91.6% over the 3 splits of UCF101.
Then I tried to finetune the ECO-Full model using the same strategy, but I got nearly the same result as ECO-Lite, approximately 88%. It seems that ECO-Full does not improve the performance, so I wonder if there is any difference between the training parameters of the lite and full models.
Thank you!

Some questions about dataset file list.

Hi @mzolfaghari

Thanks for your excellent work! In the updated repo, I found that the dataset file lists are not provided (kinetics_train_frm.txt, for example). Could you provide these files for reference? Thanks so much!

About An Ensemble Model

Hi, thank you for releasing the code for the paper.

I have a question on the implementation.
It is written in the paper that you obtained the best performance on Something-Something from an ensemble of networks with {16, 20, 24, 32} frames.

I wonder how this ensemble was implemented.
Did you train one single model (e.g. taking 16 frames as input) and test by making that model take various numbers of frames {16, 20, 24, 32} (which could be possible because the model performs global average pooling at the end of the 3D ConvNet, so the temporal dimension goes away),
or
did you train multiple models with different numbers of frames (e.g. one model takes 16 frames at both train/test time, another takes 20 frames, ..., and the last takes 32 frames)?

Thank you
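
For the second interpretation, an ensemble is typically implemented as score-level fusion; a sketch where models and sample_clip are hypothetical stand-ins, with models[i] trained and fed with frame_counts[i] frames:

import torch

frame_counts = [16, 20, 24, 32]

def ensemble_predict(models, video):
    # Average the softmax scores of the per-frame-count models, then argmax.
    with torch.no_grad():
        probs = [torch.softmax(m(sample_clip(video, f)), dim=1)
                 for m, f in zip(models, frame_counts)]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)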

About online recognition and video captioning

Hi, I am very interested in your outstanding work,
but I have a problem when I run online_recognition.py.
I set the model paths in online_recognition.py as:

mean_file = "/home/gs/Desktop/ECO/caffe_3d/action_matlab/rgb_mean.mat"
model_def_file = '/home/gs/Desktop/ECO/Models/ECO_Lite_kinetics.caffemodel'
model_file = '/home/gs/Desktop/ECO/models_ECO_Lite/kinetics/deploy.prototxt'

when i run "python online_recognition.py"
And i got error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W1224 10:08:03.607410 2554 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W1224 10:08:03.607450 2554 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W1224 10:08:03.607463 2554 _caffe.cpp:142] Net('/home/gwl/Desktop/test/model/deploy.prototxt', 1, weights='/home/gwl/Desktop/test/model/ECO_Lite_kinetics.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 51:12: Message type "caffe.LayerParameter" has no field named "bn_param".
F1224 10:08:03.625159 2554 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/gwl/Desktop/test/model/deploy.prototxt
*** Check failure stack trace: ***

Did I set something wrong?
Please help me, thanks!
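
Two things stand out in the report above. The bn_param parse error usually means a stock Caffe build was picked up instead of the caffe_3d fork in this repo. Independently, the two model paths in the snippet appear swapped; the network definition is normally the .prototxt and the weights the .caffemodel:

# paths as in the snippet above, with definition/weights swapped back
model_def_file = '/home/gs/Desktop/ECO/models_ECO_Lite/kinetics/deploy.prototxt'
model_file = '/home/gs/Desktop/ECO/Models/ECO_Lite_kinetics.caffemodel'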

Some questions about caffe model

After I installed Caffe successfully, an error occurs when I try to import caffe in Python:
Traceback (most recent call last):
  File "online_recognition.py", line 4, in <module>
    import caffe
  File "/root/ECO4/caffe_3d/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver
  File "/root/ECO4/caffe_3d/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver
ImportError: /root/ECO4/caffe_3d/python/caffe/_caffe.so: invalid ELF header

Before compiling Caffe, I changed Makefile.config (PYTHONPATH and the path of protobuf), and I added the following line:
export PYTHONPATH=/root/ECO2/caffe_3d/python:$PYTHONPATH
I googled it and someone said I installed the wrong version. Can anyone help me? Thanks!

About scripts for online recognition and video captioning

Hi, I am very interested in your outstanding work,
but I have a problem when I run online_recognition.py.
I set the model paths in online_recognition.py as:

rgb_mean.mat: /home/ra/ECO-efficient-video-understanding-master/caffe_3d/action_matlab
deploy.prototxt: /home/ra/ECO-efficient-video-understanding-master/models_ECO_Lite/kinetics
ECO_Lite_kinetics.caffemodel: download from Google Drive

And I got this error:
python online_recognition.py
Setting device 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1105 17:18:41.087724 7888 net.cpp:46] Initializing net from parameters:
name: "ECOLite"
input: "data"
input_dim: 80
input_dim: 3
input_dim: 224
input_dim: 224
.
.
.
I1105 17:19:27.531430 7888 net.cpp:551] Collecting Learning Rate and Weight Decay.
I1105 17:19:27.531447 7888 net.cpp:300] Network initialization done.
I1105 17:19:27.531451 7888 net.cpp:301] Memory required for data: 3935844160
GLib-GIO-Message: 17:19:43.243: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Traceback (most recent call last):
  File "online_recognition.py", line 122, in <module>
    online_predict(mean_file, model_def_file, model_file, classes_file, num_categories)
  File "online_recognition.py", line 92, in online_predict
    net.blobs['data'].data[...] = np.transpose(rgb[:,:,:,:], (3,2,1,0))
ValueError: could not broadcast input array from shape (16,3,224,224) into shape (80,3,224,224)

Did I set something wrong?
Please help me, thanks!
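
The deploy.prototxt above declares input_dim: 80 while the script feeds a single stack of 16 frames; if only the single-clip path is wanted, one option (an untested sketch against the pycaffe API) is to resize the input blob before copying data into it, inside online_predict:

net.blobs['data'].reshape(16, 3, 224, 224)   # match what we actually feed
net.reshape()                                # propagate the new shape through the net
net.blobs['data'].data[...] = np.transpose(rgb[:, :, :, :], (3, 2, 1, 0))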

CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'

Determining if the pthread_create exist failed with the following output:
Change Dir: /home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_35b4f/fast"
/usr/bin/make -f CMakeFiles/cmTC_35b4f.dir/build.make CMakeFiles/cmTC_35b4f.dir/build
make[1]: Entering directory '/mnt/raid3/home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_35b4f.dir/CheckSymbolExists.c.o
/usr/bin/cc -o CMakeFiles/cmTC_35b4f.dir/CheckSymbolExists.c.o -c /home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_35b4f
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_35b4f.dir/link.txt --verbose=1
/usr/bin/cc -rdynamic CMakeFiles/cmTC_35b4f.dir/CheckSymbolExists.c.o -o cmTC_35b4f
CMakeFiles/cmTC_35b4f.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_35b4f.dir/build.make:86: recipe for target 'cmTC_35b4f' failed
make[1]: *** [cmTC_35b4f] Error 1
make[1]: Leaving directory '/mnt/raid3/home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp'
Makefile:121: recipe for target 'cmTC_35b4f/fast' failed
make: *** [cmTC_35b4f/fast] Error 2

File /home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <pthread.h>

int main(int argc, char** argv)
{
(void)argv;
#ifndef pthread_create
return ((int*)(&pthread_create))[argc];
#else
(void)argc;
return 0;
#endif
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_403cc/fast"
/usr/bin/make -f CMakeFiles/cmTC_403cc.dir/build.make CMakeFiles/cmTC_403cc.dir/build
make[1]: Entering directory '/mnt/raid3/home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_403cc.dir/CheckFunctionExists.c.o
/usr/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_403cc.dir/CheckFunctionExists.c.o -c /usr/local/lib/python3.5/dist-packages/cmake/data/share/cmake-3.12/Modules/Chec
kFunctionExists.c
Linking C executable cmTC_403cc
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_403cc.dir/link.txt --verbose=1
/usr/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_403cc.dir/CheckFunctionExists.c.o -o cmTC_403cc -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_403cc.dir/build.make:86: recipe for target 'cmTC_403cc' failed
make[1]: *** [cmTC_403cc] Error 1
make[1]: Leaving directory '/mnt/raid3/home/lijie/ECO-efficient-video-understanding/caffe_3d/build/CMakeFiles/CMakeTmp'
Makefile:121: recipe for target 'cmTC_403cc/fast' failed
make: *** [cmTC_403cc/fast] Error 2

@mzolfaghari When I ran cmake .., I hit the problem above. Please help, thanks!

About the training details

@mzolfaghari Thank you for your excellent work. Your paper mentioned that

We initialize the weights of the 2D-Net weights with the BN-Inception architecture [31] pre-trained on Kinetics, as provided by [33].

I wonder how you trained a 2D-Net on Kinetics.


Can you upload the datasets to GitHub?

@mzolfaghari Thank you for providing such good ideas and articles. I really want to reproduce this code, but I don't have the datasets, for example the Kinetics dataset. Could you upload them to GitHub soon? Thank you very much!

Training loss explosion

Dear sir, when I train the ECO network on the UCF101 dataset, the loss is always 87.3365 from the beginning, even after I changed the learning rate and other hyperparameters. Also, when I use ECO_Lite_UCF101.caffemodel for testing, it predicts every input as the same class. Could you give me any advice to solve this problem? Thank you.
