lightdxy / ft-clip
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Hello! First of all, thank you for releasing the fine-tuning code. However, I ran into a problem with the model's performance.
I ran the code and fine-tuned the "CLIP_L14" model on Oxford Pets, Caltech101, and ImageNet with the same fine-tuning config as in the paper, except for the batch size (due to hardware limits, I set it to 32). The model performs badly on the validation set, with accuracy around 1-5%, while the training accuracy is around 90%; this looks like a typical overfitting problem. I changed the learning rate, the regularization settings, the number of epochs, and other related configs, but failed to solve it.
So I wonder whether you ran into the same problem on similar datasets, or whether there is a known way to solve it.
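Not an answer from the authors, but one way to localize such a collapse is to run a linear probe before full fine-tuning: if a frozen encoder plus a linear head already reaches a sane validation accuracy, the features are fine and the problem likely lies in the fine-tuning recipe (e.g., the paper's learning rate being far too high once the batch size drops to 32, since lr is commonly scaled with batch size). A minimal sketch, assuming the OpenAI `clip` package and Oxford Pets (37 classes); none of this is the paper's actual setup:

```python
# Diagnostic linear probe: freeze the CLIP image encoder, train only a head.
# "ViT-L/14", device, lr, and the class count 37 are illustrative assumptions.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)
for p in model.parameters():
    p.requires_grad = False  # keep the encoder frozen

head = torch.nn.Linear(model.visual.output_dim, 37).to(device)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def logits_for(images):
    # only the head receives gradients; the encoder runs without grad
    with torch.no_grad():
        feats = model.encode_image(images).float()
    return head(feats)
```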
Hello, I have read your paper and found it very inspiring.
The abstract of your paper says: "Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory... These observations challenge the conventional conclusion that CLIP is not suitable for fine-tuning..."
That is, although CLIP works very well for zero-shot inference, especially classification, it is generally considered unsuitable for fine-tuning. I have used CLIP myself, both the Chinese and the English versions, to match custom token words against images, and it indeed worked well; the features CLIP learns feel very robust. However, I have never fine-tuned CLIP on my own dataset. Intuitively, if the features are good, fine-tuning should make them even better: the activations the network uses are continuous nonlinear functions, with no discontinuities like a step function, so after fine-tuning the learned features should be expressed even better, and the class predictions after the activations should not go wrong. This is only my intuition, without any evidence, so I was puzzled when I read the sentence above. I would appreciate your explanation. Thank you very much!
Hello, Table 11 in the paper reports the following FLOPs:

Model  B/16_224  B/16_384  L/16_384  L/14_224  L/14_336
FLOPs  17.5G     55.4G     190.7G    80.7G     190.6G

Using thop's profile, I measured the FLOPs of the first model, B/16_224, as only 11.3G, far below the 17.5G in the paper. I suspect my measurement method differs from yours, since profile indeed skips some modules by default.
So could you please share the implementation details of how you measured the FLOPs? Many thanks.
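Not the authors, but a common source of exactly this gap: thop's default `profile()` only hooks registered `nn.Module` types, so the functional `torch.matmul`/`einsum` calls inside self-attention (the q·kᵀ and attn·v products) go uncounted, while counters that trace aten ops include them. A minimal sketch using fvcore; the timm model name is an assumption standing in for the paper's B/16_224, not necessarily what the authors used:

```python
# Compare an aten-level FLOPs counter against thop's module-level one.
import torch
import timm
from fvcore.nn import FlopCountAnalysis

model = timm.create_model("vit_base_patch16_224", pretrained=False).eval()
x = torch.randn(1, 3, 224, 224)

# fvcore traces aten ops, so the functional matmuls inside self-attention
# are counted; thop's default profile() silently skips them.
flops = FlopCountAnalysis(model, x)
print(f"total: {flops.total() / 1e9:.1f} G (multiply-adds)")
```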
Thanks for sharing your nice work!
I don't have sufficient computational resources to train the models. May I know if the pre-trained (fine-tuned) weights will be released?
A question about the lr decay setup in the code: following it, the lr scale of the last transformer block is not 1. Is this intentional, or something else? Any guidance is appreciated.
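If the code follows the common BEiT-style layer-wise lr decay (an assumption about this repo, not a confirmed fact), that behavior is expected: the classifier head sits one "layer" above the last block and gets scale 1.0, so the last transformer block itself gets decay¹. A minimal sketch of that scheme; the depth and decay values are illustrative, not this repo's defaults:

```python
# BEiT-style layer-wise lr decay; depth=12 and decay=0.65 are assumptions.
depth, decay = 12, 0.65
# index 0: patch embedding; 1..depth: transformer blocks; depth+1: head
scales = [decay ** (depth + 1 - i) for i in range(depth + 2)]
print(scales[depth])      # last block -> decay ** 1 = 0.65, not 1.0
print(scales[depth + 1])  # classifier head -> decay ** 0 = 1.0
```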
@LightDXY
Firstly, thank you for sharing the fine-tuning code. On my own dataset I fine-tuned only the image encoder, not the text encoder, using ViT-B/16 as the pre-trained weights. After fine-tuning, however, the .pt file grew about 5x in size. Also, how should the .pt model produced by fine-tuning be used for inference? Looking forward to your guidance, thank you.
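Not the authors, but the 5x growth is usually because a training checkpoint stores the AdamW optimizer state (two moment tensors per parameter) and sometimes an EMA copy alongside the weights. A minimal sketch for stripping it down to weights only for inference; the "model"/"module" key names are common checkpoint conventions and may differ in this repo:

```python
# Extract only the model weights from a training checkpoint (assumed layout).
import torch

ckpt = torch.load("checkpoint.pt", map_location="cpu")
# training checkpoints often nest weights under "model" or "module" and keep
# "optimizer"/"scaler"/"epoch" entries alongside them
state_dict = ckpt.get("model", ckpt.get("module", ckpt))
torch.save(state_dict, "model_weights_only.pt")  # roughly weights-only size

# for inference, rebuild the network and load the stripped weights
# (build_model is hypothetical; use however this repo constructs the ViT)
# net = build_model(); net.load_state_dict(state_dict); net.eval()
```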