
Comments (6)

BestJuly commented on September 28, 2024

Hi, @wuchlei. Thank you for your interest.

Actually, when I prepared this repo, I ran the finetuning part about twice, and the results were 67.4% and 69.8%. With different random seeds, the performance varies from run to run. Therefore, I think 0.685 and 0.68 are acceptable.

By the way, in our old code version, we directly used x - shift_x instead of ((x - shift_x) + 1) / 2 and achieved 71.8%. The motivation for using ((x - shift_x) + 1) / 2 is to keep the residual input in a similar range to the RGB view during self-supervised training. And we found that directly using x - shift_x during the finetuning period can achieve better performance.
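To make the range argument concrete, here is a small sketch, assuming frames are scaled to [0, 1] and the (N, C, T, H, W) tensor layout with the temporal shift on dim 2:

```python
import torch

# If frames lie in [0, 1], the raw residual x - shift_x lies in [-1, 1];
# ((x - shift_x) + 1) / 2 maps it back into [0, 1], matching the RGB view's range.
x = torch.rand(1, 3, 8, 16, 16)   # (N, C, T, H, W), values in [0, 1]
shift_x = torch.roll(x, 1, 2)     # shift one frame along the temporal dim
raw = x - shift_x                 # range [-1, 1]
normalized = (raw + 1) / 2        # range [0, 1]
assert raw.min() >= -1 and raw.max() <= 1
assert normalized.min() >= 0 and normalized.max() <= 1
```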

To further improve the performance, you can try several strategies, such as:

  1. use different data augmentation methods, such as random flipping or color jittering, or recent methods such as CutMix;
  2. use more crops of the same clip, such as top-left, top-right, center, bottom-left, and bottom-right, and average them.

These are usable and effective tricks, but we do not include them here because they are not the focus of this paper.
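The second trick (averaging scores over five spatial crops) can be sketched as follows; the helper name and the fixed crop layout are illustrative, not code from this repo:

```python
import torch

def five_crop_predict(model, clip, crop_size):
    # clip: (C, T, H, W); average logits over five spatial crops
    # (top-left, top-right, center, bottom-left, bottom-right).
    C, T, H, W = clip.shape
    s = crop_size
    corners = [(0, 0), (0, W - s), ((H - s) // 2, (W - s) // 2),
               (H - s, 0), (H - s, W - s)]
    logits = []
    for top, left in corners:
        crop = clip[:, :, top:top + s, left:left + s].unsqueeze(0)  # add batch dim
        logits.append(model(crop))
    return torch.stack(logits).mean(0)  # averaged class scores
```

At test time, each clip is passed through the network five times and the class scores are averaged, which typically smooths out spatial sensitivity at a modest compute cost.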

from iic.

wuchlei commented on September 28, 2024

Thanks for the reply. I'll give it a try.

from iic.

wuchlei commented on September 28, 2024

@BestJuly hi Li,
I've tried what you said, directly using x - shift_x in ft_classify. The modification to the code is:

def diff(x):
    # residual frames: shift the clip by one frame along the temporal dim (dim 2)
    shift_x = torch.roll(x, 1, 2)
    # return ((x - shift_x) + 1) / 2  # normalized variant used for SSL pretraining
    return x - shift_x

However, I'm still only getting an accuracy of around 0.684. I've even tried using the strong augmentations from SimCLR to train the backbone, and I'm only getting a 2% improvement (0.702 accuracy, still not close to the 0.72 reported in the paper). Could you please share your code or training scripts for the fine-tuning phase? It would be very helpful.

from iic.

BestJuly commented on September 28, 2024

Hi, @wuchlei.

May I ask which SSL pretrained model you use? I reran the code twice using the provided model and the current code (ft_classify.py, the x - shift_x version) and got 71.2% and 72.7% top-1, respectively. I have uploaded the model with 72.7% accuracy to a cloud drive.

*Please note that for the test set, I use CenterCrop instead of RandomCrop in the corresponding data transformations. This was a bug in the previous ft_classify.py file, and I have fixed it. Our previous experimental environment used two separate files for finetuning and testing, so this bug was introduced during code refactoring.
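The fix boils down to using a deterministic center crop at test time instead of a random one; a minimal tensor-level sketch (the actual ft_classify.py uses torchvision-style transforms, so treat this as illustrative):

```python
import torch

def center_crop(frames, size):
    # frames: (C, T, H, W); deterministic spatial crop for evaluation,
    # replacing the RandomCrop that leaked augmentation into testing
    _, _, H, W = frames.shape
    top = (H - size) // 2
    left = (W - size) // 2
    return frames[:, :, top:top + size, left:left + size]
```

Because the crop location is fixed, repeated evaluations of the same clip give identical inputs, which makes the reported test accuracy reproducible.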

The reported result on UCF101 split 1 in the paper is 71.8% top-1 (Table 5, settings: frame repeating, res, R3D). I think this is reasonable, and here I do not use any strong data augmentations.

Again, I want to say that achieving exactly the same results as in the paper is impossible, because the training procedure includes both SSL pretraining and finetuning, and the final recognition results can be affected by each step.

from iic.

wuchlei commented on September 28, 2024

Thanks for the reply. With these fixes, I've reproduced the results (around 0.72 for Res+Repeat). With SimCLR's strong augmentation, I'm even getting a 0.8% improvement.
Again, thanks for the help and best of luck with your research!

from iic.

BestJuly commented on September 28, 2024

Oh, that is good news. SimCLR's strong augmentations can bring improvements, and other augmentations can also help if you want better performance. I remember there is an ECCV'20 paper about augmentation methods for videos.

Also, good luck with your experiments~

from iic.
