
Comments (5)

keyu-tian avatar keyu-tian commented on May 22, 2024 1

We basically follow A2-MIM's 300-epoch fine-tuning setting (i.e., ResNet Strikes Back/RSB A2), and use 200/400 epochs for smaller/larger models respectively. We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

Btw, ConvNeXt V2 uses 400 or 600 epochs for their smaller models.

from spark.
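For quick reference, the epoch budgets mentioned in this thread can be collected in a small lookup table. This is only an illustrative sketch; the names (`FINETUNE_EPOCHS`, `epochs_for`) are hypothetical and not part of the SparK codebase, and the numbers come solely from the comments above.

```python
# Hypothetical summary of the fine-tuning budgets discussed in this thread.
# The dict keys and helper are illustrative, not from any official repo.
FINETUNE_EPOCHS = {
    "rsb_a2": 300,       # ResNet Strikes Back A2, as followed by A2-MIM
    "rsb_a3": 100,       # excluded here: trains at a different resolution (160)
    "spark_small": 200,  # SparK, smaller models
    "spark_large": 400,  # SparK, larger models
}

def epochs_for(setting: str) -> int:
    """Return the fine-tuning epoch budget for a named setting."""
    try:
        return FINETUNE_EPOCHS[setting]
    except KeyError:
        raise ValueError(f"unknown setting: {setting}") from None
```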

rayleizhu avatar rayleizhu commented on May 22, 2024 1

Thanks for your quick response.

We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

I think the 100-epoch setting is important; otherwise, it is difficult for follow-up works to compare fairly with existing works (SparK, ConvNeXt V2, etc.) because of inconsistent evaluation protocols.

Besides, I think it is more reasonable to fine-tune pre-trained models for no more than 300 epochs, the budget used by the supervised baseline. Otherwise, it is hard to say whether the performance gain comes from longer fine-tuning or from the better initialization provided by MIM.


keyu-tian avatar keyu-tian commented on May 22, 2024 1

I see. But I would suggest not focusing too much on ImageNet fine-tuning. I feel the best way to justify whether MIM makes sense is to evaluate it on REAL downstream tasks (i.e., not on ImageNet), because doing pre-training and fine-tuning on the same dataset can be a kind of "data leakage", and doesn't match our eventual goals of self-supervised learning.

On real downstream tasks (COCO object detection & instance segmentation), SparK can outperform Swin+MIM, Swin+Supervised, Conv+Supervised, and Conv+Contrastive Learning, so these results are solid proof of SparK's effectiveness.


rayleizhu avatar rayleizhu commented on May 22, 2024 1

I see. But I would suggest not focusing too much on ImageNet fine-tuning. I feel the best way to justify whether MIM makes sense is to evaluate it on REAL downstream tasks (i.e., not on ImageNet), because doing pre-training and fine-tuning on the same dataset can be a kind of "data leakage", and doesn't match our eventual goals of self-supervised learning.

This makes sense to me. Thanks for the explanation.


ds2268 avatar ds2268 commented on May 22, 2024

We basically follow A2-MIM's 300-epoch fine-tuning setting (i.e., ResNet Strikes Back/RSB A2), and use 200/400 epochs for smaller/larger models respectively. We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

Btw, ConvNeXt V2 uses 400 or 600 epochs for their smaller models.

But for the B and H/L models, they use the same 50- and 100-epoch fine-tuning schedules (ConvNeXt V2 paper, A.1, Table 11). It would be nice to compare apples to apples in terms of fine-tuning epochs. What are the results after 50 epochs of SparK fine-tuning for ConvNeXt-B and 100 epochs for ConvNeXt-H?


