Hi, May I ask what is the expected best accuracy I might find? So far I was able t


Expected accuracy about two-stream-pytorch HOT 7 CLOSED

bryanyzhu commented on May 24, 2024

Expected accuracy

from two-stream-pytorch.

Comments (7)

bryanyzhu commented on May 24, 2024

Using the VGG16 model should give you an accuracy of around 78%. More advanced models will give you higher accuracy. For example, with ResNet152 you should be able to reach around 83%.

For optical flow, there may be bugs in my implementation right now. The preprocessing may need modification. I am doing the experiments and will update the code soon.


dataintensiveapplication commented on May 24, 2024

May I ask which parameters you used to reach 78% accuracy, and after how many epochs you reached that value?
I've used the default parameters (and tried other settings as well), but my best result is still 73.6%, which is far from your 78%.
These are the parameters I tried: modality rgb, dataset ucf101, split 1, workers 4, epochs 1000, start-epoch 0, iter-size 5, new_length 1, new_width 340, new_height 256, save-freq 20, resume data/checkpoints, gpus 0, arch vgg16, batch-size 32, learning-rate 0.001, momentum 0.9, weight-decay 5e-4.

For optical flow the results are even worse (near 60%). Do you know what the bugs you mentioned are related to?


bryanyzhu commented on May 24, 2024

Hi, your parameters are good. I think the gap may come from the input quality. I use OpenCV to decode the videos into images and keep the JPEG quality at 95, which is the default value. If you use ffmpeg to decode the videos, the image quality ends up between 60 and 75. Lower image quality leads to worse performance. I used ffmpeg for decoding before and also got about 74% accuracy, similar to yours. So I think this is the reason.
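For reference, decoding with OpenCV at an explicit JPEG quality could look roughly like the sketch below. It is only an illustration, not the script used in this repository: the function names and the `img_00001.jpg` naming scheme are assumptions, and `cv2` (the `opencv-python` package) must be installed to run `extract_frames`.

```python
import os


def frame_path(out_dir, index):
    # Hypothetical naming scheme (img_00001.jpg, img_00002.jpg, ...);
    # adjust it to whatever your data parser expects.
    return os.path.join(out_dir, f"img_{index:05d}.jpg")


def extract_frames(video_path, out_dir, jpeg_quality=95):
    """Decode a video with OpenCV and save every frame as a JPEG.

    Returns the number of frames written.
    """
    import cv2  # imported lazily so frame_path works without OpenCV

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or decode failure
            break
        count += 1
        # 95 is already OpenCV's default JPEG quality; it is passed
        # explicitly here so the setting is visible and easy to change.
        cv2.imwrite(frame_path(out_dir, count), frame,
                    [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    cap.release()
    return count
```

Calling `extract_frames("some_video.avi", "frames/some_video")` would write one JPEG per decoded frame; both paths are placeholders.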

For optical flow, I think the cause is the preprocessing. I use 0.5 and 0.5 for the mean and std for now, but that may not be appropriate: the original Caffe implementation only subtracts the mean and does not divide by the std. The pre-trained VGG16 models in Caffe and PyTorch expect differently preprocessed inputs. Hope this is clear.
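To make the difference concrete, here is a small dependency-free sketch of the two preprocessing styles applied to a single 8-bit flow value. The mean of 128 in the mean-only variant is an assumption for illustration, not a value taken from this thread, and in real code the same arithmetic would be applied to whole tensors.

```python
def normalize_mean_std(pixel, mean=0.5, std=0.5):
    """Scale an 8-bit value to [0, 1], then subtract the mean and divide by std.

    With mean = std = 0.5 the result lies in [-1, 1].
    """
    return (pixel / 255.0 - mean) / std


def subtract_mean_only(pixel, mean=128.0):
    """Mean subtraction on the raw 0..255 scale, with no division by std.

    The default mean of 128 is an illustrative assumption.
    """
    return pixel - mean


# The two schemes put the same input on very different numeric ranges.
for value in (0, 128, 255):
    print(value, normalize_mean_std(value), subtract_mean_only(value))
```

The first variant maps inputs to roughly [-1, 1] while the second keeps them on the order of [-128, 127], so weights pre-trained under one scheme see inputs on a different scale under the other.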


dataintensiveapplication commented on May 24, 2024

Thanks for your reply.
To compute the input (both RGB frames and optical flow) I used the extract_optical_flow.sh script from this related project: https://github.com/yjxiong/temporal-segment-networks .
The input works with your data parser without any issue, so I guess it is OK!


bryanyzhu commented on May 24, 2024

That should work. I also used the same script to compute the input.


dataintensiveapplication commented on May 24, 2024

I wonder what causes the difference in accuracy. After how many epochs do you get 78%?

I'm also working on sampling multiple frames (25/50/100...) from each test video and averaging all the predictions to get the final one; almost all the papers do something like this. I might share the code here once it works.
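A minimal sketch of that idea, assuming evenly spaced sampling and per-frame class scores that are already computed by the model; the function names are hypothetical and nothing here comes from the repository's test script:

```python
def sample_frame_indices(num_frames, num_samples=25):
    """Return evenly spaced 0-based frame indices across a video."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Never ask for more samples than there are frames.
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def average_predictions(per_frame_scores):
    """Average per-frame class scores and pick the best class.

    per_frame_scores: list of equal-length lists, one per sampled frame.
    Returns (averaged_scores, predicted_class_index).
    """
    n = len(per_frame_scores)
    num_classes = len(per_frame_scores[0])
    avg = [sum(scores[c] for scores in per_frame_scores) / n
           for c in range(num_classes)]
    predicted = max(range(num_classes), key=avg.__getitem__)
    return avg, predicted
```

For each test video one would run the network on the frames at `sample_frame_indices(...)`, collect the score vectors, and pass them to `average_predictions` to get the video-level label.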


bryanyzhu commented on May 24, 2024

@dataintensiveapplication I updated the code, added my test script, and reported my accuracy. You can try it and see whether you can reproduce the results.

