
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [Paper]

Contents: What is CLAP4CLIP? · Get going · What is in this repo? · Language-aware knowledge · Uncertainty-related ablations · Cite


What is CLAP4CLIP?

[Figure: overview of the CLAP4CLIP framework]

CLAP4CLIP is a general probabilistic finetuning framework for the pre-trained CLIP model on downstream class-incremental learning tasks.

The framework is general because (as depicted below) it supports a diverse range of prompt styles including hand-crafted prompts like Continual-CLIP, task-conditioned prompts like CoOp, instance-conditioned prompts like AttriCLIP, and multi-modal prompts like MaPLe:

[Figure: prompt styles supported by the framework — hand-crafted (Continual-CLIP), task-conditioned (CoOp), instance-conditioned (AttriCLIP), and multi-modal (MaPLe)]

Get going

Clone this GitHub repository:

git clone https://github.com/srvCodes/clap4clip.git
cd clap4clip
mkdir ckpt/
  • Download models: download the pretrained ViT-B-16.pt and ViT-L-14.pt CLIP checkpoints into the ckpt/ directory.

  • Download datasets: we suggest following the mammoth library to download all the datasets into the repo's datasets/ directory. Instructions for ImageNet-R can be found here.
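The two steps above assume a particular repo layout. A minimal helper sketch that creates the expected directories and reports which checkpoints are still missing (directory and file names are taken from this README; nothing is downloaded automatically):

```python
from pathlib import Path

# Create the expected layout (names from the README) and report which
# pretrained checkpoints are still missing.  This does not download
# anything -- the ViT checkpoints and datasets are fetched manually.
for d in ("ckpt", "datasets"):
    Path(d).mkdir(exist_ok=True)

missing = [
    name for name in ("ViT-B-16.pt", "ViT-L-14.pt")
    if not (Path("ckpt") / name).exists()
]
print("missing checkpoints:", missing or "none")
```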

What is in this repo?

This repo aims to benchmark various finetuning methods for class-incremental learning with the pre-trained CLIP model.

The instructions below show how to run the models included in the initial release on CIFAR100 (see the scripts/ directory in the repo):

  • CLAP4CLIP with hand-crafted prompts (our base CLAP model):
python3 main_incremental_submit.py --lasp --beta 15 --db_name cifar100 --use-vga --expandable-adapter --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip_var --epochs 5 --forward-times 20 --arch ViT-B-16  --method er --variational
  • Continual-CLIP (zero-shot):
python3 main_incremental_submit.py --db_name cifar100 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clclip --arch ViT-B-16
  • CLIP-Adapter:
python3 main_incremental_submit.py --db_name cifar100 --finetuning --finetune-epochs 2 --num-run 10 --compute-ece --compute-bwt --train_batch 32 --exemplar-selector random --root ../path_to_datasets/ --multi-gpu --gpus 0,1 --default-gpu 0 --model clip_adapter --epochs 5 --arch ViT-B-16 --method er
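The --forward-times flag in the CLAP4CLIP command above sets how many stochastic forward passes are averaged at inference in the variational model. A pure-Python sketch of that Monte Carlo averaging (the toy model below is a stand-in for illustration, not the repo's API):

```python
import math
import random

def softmax(logits):
    m = max(logits)  # max-shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mc_predict(stochastic_model, x, forward_times=20):
    """Average class probabilities over several stochastic forward passes,
    as controlled by the --forward-times flag."""
    acc = None
    for _ in range(forward_times):
        p = softmax(stochastic_model(x))
        acc = p if acc is None else [a + q for a, q in zip(acc, p)]
    return [a / forward_times for a in acc]

# Toy stand-in: logits perturbed by sampling noise, loosely mimicking a
# variational adapter's stochastic outputs.
rng = random.Random(0)
toy_model = lambda x: [x[0] + rng.gauss(0, 0.1), x[1] + rng.gauss(0, 0.1)]
probs = mc_predict(toy_model, [2.0, 0.0], forward_times=20)
```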

We plan to release the following models upon the acceptance of our paper:

  • CoOp
  • MaPLe
  • AttriCLIP
  • CLAP4CLIP with support for CoOp/MaPLe/AttriCLIP

Language-aware knowledge

  • Past-task distribution regularization (for reducing forgetting in general): invoked by passing the arguments --lasp --beta $\gamma$, where $\gamma$ is the loss weight used in Eq. (12) of our paper.
  • Weight initialization (for reducing the stability gap): currently controlled by commenting/uncommenting this line.
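The paper's exact Eq. (12) is not reproduced here, but the general shape of a past-task distribution regularizer can be sketched as a weighted divergence term between the current and a frozen past distribution (a simplified diagonal-Gaussian KL; beta mirrors the --beta flag; function names are illustrative, not the repo's API):

```python
import math

def diag_gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )

def regularized_loss(task_loss, current, past, beta=15.0):
    # total = task loss + beta * divergence(current dist, frozen past dist)
    reg = diag_gaussian_kl(current["mu"], current["var"], past["mu"], past["var"])
    return task_loss + beta * reg
```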

Uncertainty-related ablations

In our paper, we show the out-of-the-box benefits of uncertainty-aware modelling for the following two tasks:

Post-hoc novel data detection (PhNDD)

  • PhNDD is a post-hoc setting proposed in our paper for evaluating the novel-data detection capabilities of a finetuning algorithm within the continual learning setting. To enable it, simply pass the argument --eval-ood-score in the script.
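As an illustration of what such an OOD score can look like (the repo's exact scoring function may differ), a common choice is the energy score computed from the class logits:

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy score E(x) = -T * log(sum_i exp(logit_i / T)).

    More confident (larger) logits give lower (more negative) energy, so
    thresholding on this score flags novel inputs.  A max-shift is used
    for numerical stability.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))
```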

Exemplar selection

  • For all but the zero-shot models, the repo implements the following exemplar-selection criteria: Random, Herding (as in iCaRL), Entropy, Variance, Variance of entropy, Distance, and Energy scores. These can be invoked by passing the value x to the argument --exemplar-selector, where x can be one of {random, icarl, entropy, variance, distance, var_entropy, energy}.
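As a concrete illustration, the entropy criterion can be sketched as keeping the k samples whose predictions are most uncertain (pure-Python sketch; function names are illustrative, not the repo's API):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_exemplars_by_entropy(samples, probs_per_sample, k):
    """Return the k samples whose predictions are most uncertain."""
    scored = sorted(
        zip(samples, probs_per_sample),
        key=lambda sp: predictive_entropy(sp[1]),
        reverse=True,
    )
    return [s for s, _ in scored[:k]]

# A uniform prediction is maximally uncertain, so sample "a" is kept first.
picked = select_exemplars_by_entropy(
    ["a", "b"], [[0.5, 0.5], [0.99, 0.01]], k=1
)
```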

Cite

If you find this framework useful, please cite our preprint:

@article{jha_clap4clip,
  title={CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models},
  author={Jha, Saurav and Gong, Dong and Yao, Lina},
  journal={arXiv preprint arXiv:2403.19137},
  year={2024}
}
