
Comments (7)

bryanyzhu commented on May 24, 2024

Using the VGG16 model should give you an accuracy of around 78%. More advanced models will give higher accuracy. For example, with ResNet152 you should be able to reach around 83%.

For optical flow, there may be bugs in my implementation right now. The preprocessing may need modification. I am doing the experiments and will update the code soon.

from two-stream-pytorch.

dataintensiveapplication commented on May 24, 2024

May I ask which parameters you used to achieve 78% accuracy, and after how many epochs you reached that value?
I've used the default parameters (and also tried other settings), but my best result is still 73.6%, which is far from your 78%.
These are the parameters I tried: modality rgb, dataset ucf101, split 1, workers 4, epochs 1000, start-epoch 0, iter-size 5, new_length 1, new_width 340, new_height 256, save-freq 20, resume data/checkpoints, gpus 0, arch vgg16, batch-size 32, learning-rate 0.001, momentum 0.9, weight-decay 5e-4.

For optical flow the results are even worse (near 60%). Do you know what the bugs you mentioned are related to?


bryanyzhu commented on May 24, 2024

Hi, your parameters are good. I think the difference may come from the input quality. I use OpenCV to decode the videos into images and leave the image quality at 95, which is the default value. If you use ffmpeg to decode the videos, the image quality is between 60 and 75. Lower image quality leads to worse performance. I used ffmpeg to decode before, and I also got 74% accuracy, similar to yours. So I think this is the reason.
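If ffmpeg's default JPEG quality is indeed the cause, one option is to request higher quality explicitly when extracting frames. The sketch below only builds the command line and does not run it. It relies on ffmpeg's `-q:v` option, where 2 is the highest JPEG quality and 31 the lowest. The function name, the example paths, and the `img_%05d.jpg` naming pattern are hypothetical and should be adapted to whatever the data parser expects.

```python
import shlex
from pathlib import Path


def build_ffmpeg_frame_command(video_path, out_dir, qscale=2):
    """Build (but do not run) an ffmpeg command that dumps JPEG frames.

    qscale is passed to ffmpeg's -q:v option; for JPEG output the usual
    range is 2 (best quality) to 31 (worst quality).
    """
    if not 2 <= qscale <= 31:
        raise ValueError("qscale must be between 2 and 31")
    # Hypothetical output naming scheme; adjust to match your data loader.
    pattern = Path(out_dir) / "img_%05d.jpg"
    return ["ffmpeg", "-i", str(video_path), "-q:v", str(qscale), str(pattern)]


# Example paths are placeholders, not files from the repository.
cmd = build_ffmpeg_frame_command("v_Example_g01_c01.avi", "frames/v_Example_g01_c01")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists.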

For optical flow, I think the problem is the preprocessing. For now I use 0.5 and 0.5 for the mean and std, but that may not be appropriate, because the original Caffe implementation only subtracts the mean and does not divide by the std. The pre-trained VGG16 models in Caffe and PyTorch expect differently preprocessed inputs. Hope this is clear.
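To make the difference concrete, here is a minimal sketch of the two preprocessing styles applied to a few toy 8-bit flow values. It uses plain Python lists so it runs without dependencies; in PyTorch the same arithmetic would normally be done on tensors. The mean of 128 for the Caffe-style variant is an assumption, since the thread does not state the value, and both function names are illustrative.

```python
def normalize_mean_std(pixels, mean=0.5, std=0.5):
    """PyTorch-style: scale 0-255 values to [0, 1], then apply (x - mean) / std."""
    return [(p / 255.0 - mean) / std for p in pixels]


def subtract_mean_only(pixels, mean=128.0):
    """Caffe-style as described above: subtract the mean, keep the 0-255 scale.

    The default mean of 128 is an assumption, not a value from the thread.
    """
    return [p - mean for p in pixels]


flow_pixels = [0, 64, 128, 255]  # toy 8-bit optical flow values

print(normalize_mean_std(flow_pixels))  # values fall roughly in [-1, 1]
print(subtract_mean_only(flow_pixels))  # values fall in [-128, 127]
```

The two outputs differ in scale by about two orders of magnitude, so weights pre-trained under one convention see very different input ranges under the other.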


dataintensiveapplication commented on May 24, 2024

Thanks for your reply.
To compute the inputs (both RGB frames and optical flow) I used the extract_optical_flow.sh script from this related project: https://github.com/yjxiong/temporal-segment-networks .
The inputs work with your data parser without any issue, so I guess they are OK!


bryanyzhu commented on May 24, 2024

That should work. I also used the same script to compute the input.


dataintensiveapplication commented on May 24, 2024

I wonder what causes the difference in accuracy. After how many epochs did you get 78%?

I'm also working on using multiple (25/50/100...) frame samples for each test video and then averaging all the predictions to get the final one. Almost all the papers do something like this. I might share the code here once it works.
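A minimal sketch of that evaluation scheme, assuming evenly spaced sampling and a plain mean over per-frame class scores. The function names and the toy scores are illustrative and not taken from the repository; in a real test script the scores would come from the network's output for each sampled frame.

```python
def sample_frame_indices(num_frames, num_samples):
    """Pick num_samples evenly spaced frame indices in [0, num_frames)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    step = num_frames / num_samples
    # Indices repeat when the video has fewer frames than requested samples.
    return [min(int(i * step), num_frames - 1) for i in range(num_samples)]


def average_predictions(per_frame_scores):
    """Average per-class scores over frames; return (mean_scores, predicted_class)."""
    num_samples = len(per_frame_scores)
    num_classes = len(per_frame_scores[0])
    mean_scores = [
        sum(scores[c] for scores in per_frame_scores) / num_samples
        for c in range(num_classes)
    ]
    predicted_class = max(range(num_classes), key=mean_scores.__getitem__)
    return mean_scores, predicted_class


# Toy example: a 100-frame video, 25 sampled frames, 2 classes.
indices = sample_frame_indices(100, 25)
toy_scores = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]  # made-up per-frame scores
mean_scores, predicted_class = average_predictions(toy_scores)
print(indices[:5], predicted_class)
```

Averaging scores rather than taking a majority vote keeps the confidence of each frame in the final decision, so one uncertain frame counts for less than one confident frame.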


bryanyzhu commented on May 24, 2024

@dataintensiveapplication I updated the code, added my test script, and reported my accuracy. You can try it and see whether you can reproduce the result.

