Hello, Li Tao. Thanks for your great work. I recently wanted to reproduce your work

Training on hmdb51 (iic issue, closed)

yuanze-lin commented on July 19, 2024

Training on hmdb51

Comments (8)

BestJuly commented on July 19, 2024

Hi, @linyuanze. Thank you for your interest.

An accuracy of 2% on HMDB51 is about the same as random guessing.
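As a quick check of that claim, the sketch below computes the accuracy of uniform random guessing. It assumes balanced classes; `chance_accuracy` is a hypothetical helper, not part of the iic code.

```python
def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing over balanced classes."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return 1.0 / num_classes


# HMDB51 has 51 action classes, UCF101 has 101.
print(f"HMDB51 chance level: {chance_accuracy(51):.2%}")   # 1.96%
print(f"UCF101 chance level: {chance_accuracy(101):.2%}")  # 0.99%
```

A fine-tuned model that stays at roughly 2% on HMDB51 has therefore learned essentially nothing about the labels.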

In my experimental setup everything works, so I am not sure what went wrong in your experiments.

To help find out what is going wrong, I would like to ask:

 1. For the SSL pre-training part, have you checked the video retrieval performance? How does it look?
 2. For the fine-tuning part, does the network converge? The accuracy on the training set, at least, should be good.
 3. Does the problem happen only for HMDB51?
 4. Apart from the dataset path, have you modified the code?

yuanze-lin commented on July 19, 2024

I really appreciate your reply!

 1. No, I haven't checked the video retrieval performance. I have only checked video classification, mainly on the UCF101 dataset.
 2. Yes. I ran the self-supervised pre-training on UCF101 and the model converged. When I fine-tune this pre-trained model on UCF101, its performance is normal.
 3. Yes, the problem happens only for HMDB51.
 4. Yes, I modified the code: I copied ucf101.py to hmdb51.py and changed only the data path.

BestJuly commented on July 19, 2024

Then I think the problem may be in the data.
Although the SSL training is done on UCF101, no labels are used. Therefore, if video recognition works on UCF101, it should also work on HMDB51.

Was the 2% accuracy reported automatically at the end of fine-tuning with ft_classify.py?
I ask because if you fine-tune with ft_classify.py and then run it again with --mode=test, --ckpt should point to the fine-tuned model, not the SSL one.

You can post some of your fine-tuning/testing scripts and fine-tuning logs here.
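One way to rule out a checkpoint mix-up before testing is to compare the parameter names in the checkpoint against those the model expects. The sketch below is only an illustration in plain Python: `compare_keys` and the example key names are hypothetical and not taken from the iic code. The `module.` prefix handling reflects how PyTorch's `nn.DataParallel` names parameters when a wrapped model is saved.

```python
def strip_prefix(keys, prefix="module."):
    """Drop a leading prefix (e.g. the one nn.DataParallel adds) from each key."""
    return {k[len(prefix):] if k.startswith(prefix) else k for k in keys}


def compare_keys(model_keys, ckpt_keys):
    """Report parameter names that the model and the checkpoint do not share."""
    model_set = strip_prefix(model_keys)
    ckpt_set = strip_prefix(ckpt_keys)
    return {
        "missing_in_ckpt": sorted(model_set - ckpt_set),
        "unexpected_in_ckpt": sorted(ckpt_set - model_set),
    }


# Hypothetical key names, for illustration only.
model_keys = ["base.conv1.weight", "base.conv1.bias", "linear.weight", "linear.bias"]
ckpt_keys = ["module.base.conv1.weight", "module.base.conv1.bias"]

report = compare_keys(model_keys, ckpt_keys)
print(report)
```

With a real model, the two key lists would typically come from `model.state_dict().keys()` and from the dictionary returned by `torch.load(path, map_location="cpu")`. If the classifier weights show up under `missing_in_ckpt`, the checkpoint is probably the SSL one rather than the fine-tuned one. Loading with `strict=False` does not raise an error on such mismatches, so they are easy to overlook.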

yuanze-lin commented on July 19, 2024

Thank you for the suggestions and the reply. I will check the code and data accordingly and report back later. I really appreciate your patience!

yuanze-lin commented on July 19, 2024

I also want to ask: have you tried clip_len=32 in your experiments? I noticed that the default setting is 16, and I wonder how much the model would improve with clip_len=32. Have you also evaluated the R(2+1)D and S3D backbones?

BestJuly commented on July 19, 2024

Hi, @linyuanze.

Regarding your question, I have not tried clip_len=32. In video recognition there are many input settings, such as 16x112x112, 32x224x224, and 64x224x224. I only used the same settings as the original C3D.
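For a sense of scale, the sketch below compares the number of input values per clip across these settings. It assumes 3-channel RGB input and reads each setting as frames x height x width; `clip_elements` is a hypothetical helper, not part of the repo.

```python
# (frames, height, width) for the input settings mentioned above
SETTINGS = [(16, 112, 112), (32, 224, 224), (64, 224, 224)]
CHANNELS = 3  # assumption: RGB frames


def clip_elements(frames, height, width, channels=CHANNELS):
    """Number of scalar values in one input clip."""
    return channels * frames * height * width


baseline = clip_elements(*SETTINGS[0])
for frames, height, width in SETTINGS:
    n = clip_elements(frames, height, width)
    print(f"{frames}x{height}x{width}: {n} values ({n // baseline}x the 16x112x112 clip)")
```

Under these assumptions the larger settings carry 8x and 16x as many input values per clip, so memory and compute per clip grow roughly in step.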

For different network backbones, I have provided results for C3D, R3D, R21D, and R18 (ResNet-18-3D). Please note that R3D and R21D here differ from the ResNet-18-3D and R(2+1)D of the original paper. The backbones we adopted are the same as those in VCP and VCOP, for fair comparison. Two versions of S3D (S3D and S3D-G) are used in existing SSL papers, and I have not tried either of them.

yuanze-lin commented on July 19, 2024

Thank you for your reply. I found that the 2% accuracy of the trained model on HMDB51 was caused by bugs in loading the checkpoint; after fixing them, the accuracy is in line with the IIC paper. In addition, where did you report the experimental results for C3D, R3D, R21D, and R18? I could not find them in the IIC paper.

BestJuly commented on July 19, 2024

You can find the corresponding results in README.md in this repo.
