
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [Paper]

Contents: What is CLAP4CLIP? · Get going · What is in this repo? · Language-aware knowledge · Uncertainty-related ablations · Cite


What is CLAP4CLIP?

[Figure: overview of the CLAP4CLIP framework]

CLAP4CLIP is a general probabilistic finetuning framework for the pre-trained CLIP model on downstream class-incremental learning tasks.

The framework is general because (as depicted below) it supports a diverse range of prompt styles including hand-crafted prompts like Continual-CLIP, task-conditioned prompts like CoOp, instance-conditioned prompts like AttriCLIP, and multi-modal prompts like MaPLe:

[Figure: prompt styles supported by the framework — hand-crafted (Continual-CLIP), task-conditioned (CoOp), instance-conditioned (AttriCLIP), and multi-modal (MaPLe)]

Get going

Clone this GitHub repository:

git clone https://github.com/srvCodes/clap4clip.git
cd clap4clip
mkdir ckpt/
  • Download models: download the pretrained ViT-B-16.pt and ViT-L-14.pt CLIP checkpoints into the ckpt/ directory.

  • Download datasets: we suggest following the mammoth library to download all the datasets into the repo's datasets/ directory. Instructions for ImageNet-R can be found here.
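The two steps above assume a particular repo layout. A minimal helper sketch that creates the expected directories and reports which checkpoints are still missing (directory and file names are taken from this README; nothing is downloaded automatically):

```python
from pathlib import Path

# Create the expected layout (names from the README) and report which
# pretrained checkpoints are still missing.  This does not download
# anything -- the ViT checkpoints and datasets are fetched manually.
for d in ("ckpt", "datasets"):
    Path(d).mkdir(exist_ok=True)

missing = [
    name for name in ("ViT-B-16.pt", "ViT-L-14.pt")
    if not (Path("ckpt") / name).exists()
]
print("missing checkpoints:", missing or "none")
```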

What is in this repo?

This repo aims to benchmark various finetuning methods for class-incremental learning with the pre-trained CLIP model.

The instructions below show how to run the models included in the initial release on CIFAR100 (see the scripts/ directory in the repo):

  • CLAP4CLIP with hand-crafted prompts (our base CLAP model):
python3 main_incremental_submit.py --lasp --beta 15 --db_name cifar100 --use-vga --expandable-adapter --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip_var --epochs 5 --forward-times 20 --arch ViT-B-16  --method er --variational
  • Continual-CLIP (zero-shot):
python3 main_incremental_submit.py --db_name cifar100 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip --arch ViT-B-16
  • CLIP-Adapter:
python3 main_incremental_submit.py --db_name cifar100 --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clip_adapter --epochs 5 --arch ViT-B-16 --method er
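The --forward-times flag in the CLAP4CLIP command above sets how many stochastic forward passes are averaged at inference in the variational model. A pure-Python sketch of that Monte Carlo averaging (the toy model below is a stand-in for illustration, not the repo's API):

```python
import math
import random

def softmax(logits):
    m = max(logits)  # max-shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mc_predict(stochastic_model, x, forward_times=20):
    """Average class probabilities over several stochastic forward passes,
    as controlled by the --forward-times flag."""
    acc = None
    for _ in range(forward_times):
        p = softmax(stochastic_model(x))
        acc = p if acc is None else [a + q for a, q in zip(acc, p)]
    return [a / forward_times for a in acc]

# Toy stand-in: logits perturbed by sampling noise, loosely mimicking a
# variational adapter's stochastic outputs.
rng = random.Random(0)
toy_model = lambda x: [x[0] + rng.gauss(0, 0.1), x[1] + rng.gauss(0, 0.1)]
probs = mc_predict(toy_model, [2.0, 0.0], forward_times=20)
```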

We plan to release the following models upon the acceptance of our paper:

  • CoOp
  • MaPLe
  • AttriCLIP
  • CLAP4CLIP with support for CoOp/MaPLe/AttriCLIP

Language-aware knowledge

  • Past-task distribution regularization (for reducing forgetting in general): invoked by passing the arguments --lasp --beta $\gamma$, where $\gamma$ is the loss weight used in Eq. (12) of our paper.
  • Weight initialization (for reducing the stability gap): currently controlled by commenting/uncommenting this line.
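The paper's exact Eq. (12) is not reproduced here, but the general shape of a past-task distribution regularizer can be sketched as a weighted divergence term between the current and a frozen past distribution (a simplified diagonal-Gaussian KL; beta mirrors the --beta flag; function names are illustrative, not the repo's API):

```python
import math

def diag_gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )

def regularized_loss(task_loss, current, past, beta=15.0):
    # total = task loss + beta * divergence(current dist, frozen past dist)
    reg = diag_gaussian_kl(current["mu"], current["var"], past["mu"], past["var"])
    return task_loss + beta * reg
```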

Uncertainty-related ablations

In our paper, we show the out-of-the-box benefits of uncertainty-aware modelling for the following two tasks:

Post-hoc novel data detection (PhNDD)

  • PhNDD is a post-hoc setting proposed in our paper for evaluating the novel-data detection capabilities of a finetuning algorithm within the continual learning setting. To enable it, simply pass the argument --eval-ood-score in the script.
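As an illustration of what such an OOD score can look like (the repo's exact scoring function may differ), a common choice is the energy score computed from the class logits:

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy score E(x) = -T * log(sum_i exp(logit_i / T)).

    More confident (larger) logits give lower (more negative) energy, so
    thresholding on this score flags novel inputs.  A max-shift is used
    for numerical stability.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))
```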

Exemplar selection

  • For all but the zero-shot models, the repo implements the following exemplar-selection criteria: Random, Herding (as in iCaRL), Entropy, Variance, Variance of entropy, Distance, and Energy scores. These can be invoked by passing the value x to the argument --exemplar-selector, where x can be one of {random, icarl, entropy, variance, distance, var_entropy, energy}.
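As a concrete illustration, the entropy criterion can be sketched as keeping the k samples whose predictions are most uncertain (pure-Python sketch; function names are illustrative, not the repo's API):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_exemplars_by_entropy(samples, probs_per_sample, k):
    """Return the k samples whose predictions are most uncertain."""
    scored = sorted(
        zip(samples, probs_per_sample),
        key=lambda sp: predictive_entropy(sp[1]),
        reverse=True,
    )
    return [s for s, _ in scored[:k]]

# A uniform prediction is maximally uncertain, so sample "a" is kept first.
picked = select_exemplars_by_entropy(
    ["a", "b"], [[0.5, 0.5], [0.99, 0.01]], k=1
)
```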

Cite

If you find this framework useful, please cite our preprint:

@article{jha_clap4clip,
  title={CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models},
  author={Jha, Saurav and Gong, Dong and Yao, Lina},
  journal={arXiv preprint arXiv:2403.19137},
  year={2024}
}
