
Comments (8)

BestJuly commented on July 19, 2024

Hi, @linyuanze. Thank you for your interest.

An accuracy of 2% on HMDB51 is essentially random guessing.
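As a quick check of that claim, here is a minimal sketch that computes the expected top-1 accuracy of uniform random guessing. It assumes a roughly balanced test set; `chance_accuracy` is a hypothetical helper name, not something from the repo. HMDB51 has 51 action classes and UCF101 has 101.

```python
def chance_accuracy(num_classes: int) -> float:
    """Expected top-1 accuracy of uniform random guessing on a balanced test set."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return 1.0 / num_classes


# Class counts of the two datasets discussed in this thread.
hmdb51_chance = chance_accuracy(51)
ucf101_chance = chance_accuracy(101)

print(f"HMDB51 chance level: {hmdb51_chance:.2%}")  # 1.96%
print(f"UCF101 chance level: {ucf101_chance:.2%}")  # 0.99%
```

A fine-tuned model that lands at about 2% on HMDB51 is therefore doing no better than chance, which usually points to a pipeline problem (data, labels, or checkpoint loading) rather than a weak model.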

Everything works in my experimental setup, so I am not sure what went wrong in your experiments.

To help track down the problem, I would like to ask:

  1. For the SSL pre-training part, have you checked the video retrieval performance? How does it look?
  2. For the fine-tuning part, does the network converge? The accuracy on the training set, at least, should be good.
  3. Does the problem occur only on HMDB51?
  4. Apart from the dataset path, have you modified the code?

from iic.

yuanze-lin commented on July 19, 2024

Thank you very much for your reply!

 1. No, I haven't checked the video retrieval performance. I only checked video classification performance, mainly on UCF101.
 2. Yes. I ran self-supervised pre-training on UCF101 and the model converged. When I fine-tune this pre-trained model on UCF101, its performance is normal.
 3. Yes, the problem only occurs on HMDB51.
 4. Yes, I modified the code: I copied ucf101.py to hmdb51.py and changed only the data path.

BestJuly commented on July 19, 2024

Then I think the problem may be in the data.
Although SSL pre-training is done on UCF101, no labels are used. Therefore, if video recognition works on UCF101, it should also work on HMDB51.

Is the 2% accuracy reported automatically after fine-tuning with ft_classify.py?
I ask because if you fine-tune with ft_classify.py and then run it again with --mode=test, --ckpt should point to the fine-tuned model, not the SSL one.

You can post your fine-tuning/testing scripts and fine-tuning logs here.

yuanze-lin commented on July 19, 2024

Thank you for the suggestions. I will check the code and data accordingly and reply later. I really appreciate your patience!

yuanze-lin commented on July 19, 2024

I also want to ask: have you tried clip_len=32 in your experiments? I notice the default is 16, and I wonder how much the model would gain with clip_len=32. Have you also evaluated the R(2+1)D and S3D backbones?

BestJuly commented on July 19, 2024

Hi, @linyuanze.

Regarding your question, I have not tried clip_len=32. Video recognition uses many input settings, such as 16x112x112, 32x224x224, and 64x224x224; I only used the same settings as the original C3D.

As for network backbones, I have provided results for C3D, R3D, R21D, and R18 (ResNet-18-3D). Please note that R3D and R21D are different from ResNet-18-3D and R(2+1)D in the original paper. The backbones we adopted are the same as those in VCP and VCOP, for fair comparison. Two versions of S3D (S3D and S3D-G) are used in existing SSL papers, and I have not tried either of them.

yuanze-lin commented on July 19, 2024

Thank you for your reply. I found that the 2% accuracy on HMDB51 was caused by bugs in checkpoint loading. After fixing them, the accuracy is consistent with the IIC paper. In addition, where did you report the experimental results for C3D, R3D, R21D, and R18? I could not find them in the IIC paper.
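The thread does not say what the checkpoint-loading bugs were. One common cause of chance-level accuracy, shown here only as an illustrative sketch, is a mismatch between the parameter names in the checkpoint and those the model expects, for example a `module.` prefix left by PyTorch's `nn.DataParallel`. The helper names and key names below are hypothetical, and plain lists of strings stand in for the keys of real state dicts.

```python
from typing import Iterable, List, Tuple


def strip_prefix(keys: Iterable[str], prefix: str = "module.") -> List[str]:
    """Drop a leading prefix (e.g. the one added by nn.DataParallel) from each key."""
    return [k[len(prefix):] if k.startswith(prefix) else k for k in keys]


def compare_keys(
    model_keys: Iterable[str], ckpt_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names after prefix normalization.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint that the model does not have
    """
    model_set = set(model_keys)
    ckpt_set = set(strip_prefix(ckpt_keys))
    return sorted(model_set - ckpt_set), sorted(ckpt_set - model_set)


# Hypothetical key names, for illustration only.
model_keys = ["base.conv1.weight", "base.conv1.bias", "linear.weight", "linear.bias"]
ckpt_keys = ["module.base.conv1.weight", "module.base.conv1.bias"]

missing, unexpected = compare_keys(model_keys, ckpt_keys)
print("missing:", missing)        # ['linear.bias', 'linear.weight']
print("unexpected:", unexpected)  # []
```

With a real PyTorch model, the same information is available from the value returned by `model.load_state_dict(state_dict, strict=False)`, which lists `missing_keys` and `unexpected_keys`. If nearly every key shows up as missing, the weights were effectively not loaded.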

BestJuly commented on July 19, 2024

You can find the corresponding results in README.md in this repo.
