
ft-clip's People

Contributors

lightdxy


ft-clip's Issues

Low accuracy on the validation set, overfitting on the training set

Hello! First of all, thank you for sharing the fine-tuning code. However, I ran into some problems with the model's performance.

I ran the code and fine-tuned the "CLIP_L14" model on Oxford Pets, Caltech101, and ImageNet with the same fine-tuning config as in the paper, except for the batch size (due to hardware limitations, I set it to 32). The model performs badly on the validation sets, with accuracies around 1-5%, while on the training sets the accuracies are around 90%. It looks like a typical overfitting problem. I changed the learning rate, the regularization settings, the number of epochs, and other related config options, but could not solve it.

So I would like to ask whether you have seen the same problem on similar datasets, and whether there are ways to solve it.
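One thing worth checking here (a hedged suggestion, not something stated in the paper): fine-tuning recipes like this are usually tuned for a much larger effective batch size, and the linear scaling rule suggests shrinking the peak learning rate in proportion when the batch size drops to 32. A minimal sketch with placeholder values (the paper's actual batch size and learning rate are not reproduced here):

```python
# Linear lr scaling rule sketch; both "paper" values below are placeholders,
# not the actual FT-CLIP configuration.
paper_batch_size = 1024   # placeholder for the batch size assumed by the recipe
paper_peak_lr = 3e-5      # placeholder for the recipe's peak learning rate
my_batch_size = 32        # the batch size that fits on the available device

scaled_lr = paper_peak_lr * my_batch_size / paper_batch_size
print(f"suggested peak lr for batch size {my_batch_size}: {scaled_lr:.2e}")
```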

Question about the paper

Hello, I have read your paper and found it very inspiring.
The abstract says: "Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory. ... These observations challenge the conventional conclusion that CLIP is not suitable for fine-tuning..."
In other words, although CLIP works very well for zero-shot inference, especially classification, it is generally considered unsuitable for fine-tuning. I have also used CLIP (both the Chinese and the English versions) to pair custom token words with images, and it indeed works well; the features CLIP learns feel very robust. However, I have never fine-tuned CLIP on my own dataset. Intuitively, if the features are good, fine-tuning should make them even better: the activations used by the network are continuous nonlinear functions, with no step-function-like discontinuities, so after fine-tuning the learned features should be more expressive and the class predictions should not go wrong. This is only my intuition, without any evidence, so I was puzzled when I read the sentence quoted above, and I would appreciate your explanation. Thank you very much!
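For reference, a minimal sketch of the kind of zero-shot pairing described above, using the openai/CLIP package; the image path and prompt texts are placeholders:

```python
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# Placeholder image and custom "token words" to pair with it.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Cosine similarities turned into per-image probabilities over the prompts;
    # no parameter is updated, which is what "zero-shot" means here.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```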

about FLOPs

Hi, Table 11 in the paper reports the following FLOPs:
Model  B/16_224  B/16_384  L/16_384  L/14_224  L/14_336
FLOPs  17.5G     55.4G     190.7G    80.7G     190.6G
I measured the first configuration, B/16_224, with the thop library's profile and got only 11.3G FLOPs, far below the 17.5G reported in the paper. I suspect my measurement method differs from yours, since profile indeed skips some modules by default.
Could you please share the implementation details of how you measured FLOPs? Thanks a lot.
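For comparison, here is a minimal sketch of a thop measurement (not the authors' script; a timm ViT-B/16 stands in for the CLIP visual encoder). Note that thop counts MACs through hooks registered on nn.Module layers, so modules it has no rule for (e.g. nn.MultiheadAttention in the original CLIP ViT) and functional calls such as torch.matmul inside self-attention are silently skipped, which could by itself explain a number well below the analytic value:

```python
import torch
import timm                      # assumption: timm's ViT-B/16 as a stand-in model
from thop import profile         # pip install thop

model = timm.create_model("vit_base_patch16_224", pretrained=False)
dummy = torch.randn(1, 3, 224, 224)

# thop returns multiply-accumulate counts (often reported as "FLOPs" in papers).
macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.1f} G, params: {params / 1e6:.1f} M")
```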

Pre-trained Weights

Thanks for sharing your nice work!

I don't have sufficient computational resources to train the models. May I know if the pre-trained (fine-tuned) weights will be released?

About Layer decay

One question about the layer-wise lr decay setting in the code: the lr scale of the last transformer block is not 1. Is this intentional, or something else? I would appreciate a clarification.
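For context, here is a sketch of the common BEiT-style layer-wise lr decay scheme (not necessarily the exact code in this repo): the embeddings, the transformer blocks, and the head are indexed consecutively, only the head group gets scale 1, and the last transformer block therefore ends up with a scale equal to the decay rate:

```python
def layer_lr_scales(num_layers: int = 24, decay: float = 0.75):
    # index 0 = patch/positional embeddings, 1..num_layers = transformer blocks,
    # num_layers + 1 = classification head
    return [decay ** (num_layers + 1 - i) for i in range(num_layers + 2)]

scales = layer_lr_scales(num_layers=24, decay=0.75)
print(f"head scale:       {scales[-1]}")   # 1.0
print(f"last block scale: {scales[-2]}")   # 0.75, i.e. not 1 by construction
```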

about inference code

@LightDXY
First of all, thank you for sharing the fine-tuning code. After fine-tuning on my own dataset, I only fine-tuned the image encoder and left the text encoder untouched. I used ViT-B/16 as the pre-trained weights, but the .pt file produced after fine-tuning is about five times larger than the original. Also, how should the .pt model generated by fine-tuning be used for inference? Looking forward to your guidance, thank you.
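A minimal sketch of how such a checkpoint is typically loaded for inference, assuming the .pt wraps the model weights together with optimizer/EMA state (which is usually why the file ends up several times larger than the pre-trained weights). The checkpoint path, architecture, and class count are placeholders, and this is not the repo's official inference script:

```python
import torch
import timm   # assumption: a timm ViT-B/16 stands in for the fine-tuned image encoder

# Build the same architecture that was fine-tuned (placeholder class count).
model = timm.create_model("vit_base_patch16_224", num_classes=1000)

ckpt = torch.load("checkpoint.pt", map_location="cpu")   # placeholder path
state_dict = ckpt.get("model", ckpt)                     # unwrap {"model": ..., "optimizer": ...} if needed
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing, "unexpected keys:", unexpected)

model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))          # replace with a real preprocessed image
print("predicted class:", logits.argmax(dim=-1).item())
```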
