Giter VIP home page Giter VIP logo

parameterefficient-dml's Introduction

ParameterEfficient Deep Metric Learning (DML)

Implementation of ICLR 2024 Paper: Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning

Li Ren, Chen Chen, Liqiang Wang, Kien Hua

PWC
PWC
PWC
PWC

Abstract

Deep Metric Learning (DML) has long attracted the attention of the machine learning community as a key objective. Existing solutions concentrate on fine-tuning the pre-trained models on conventional image datasets. As a result of the success of recent pre-trained models trained from larger-scale datasets, it is challenging to adapt the model to the DML tasks in the local data domain while retaining the previously gained knowledge. In this paper, we investigate parameter-efficient methods for fine-tuning the pre-trained model for DML tasks. In particular, we propose a novel and effective framework based on learning Visual Prompts (VPT) in the pre-trained Vision Transformers (ViT). Based on the conventional proxy-based DML paradigm, we augment the proxy by incorporating the semantic information from the input image and the ViT, in which we optimize the visual prompts for each class. We demonstrate that our new approximations with semantic information are superior to representative capabilities, thereby improving metric learning performance. We conduct extensive experiments to demonstrate that our proposed framework is effective and efficient by evaluating popular DML benchmarks. In particular, we demonstrate that our fine-tuning method achieves comparable or even better performance than recent state-of-the-art full fine-tuning works of DML while tuning only a small percentage of total parameters.

Framework

image

Citation

If you find this repo useful, please consider citing:

@inproceedings{
  ren2024learning,
  title={Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning},
  author={Li Ren and Chen Chen and Liqiang Wang and Kien A. Hua},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=TWVMVPx2wO}
}

Suggested Enviroment

  • PyTorch >= 1.12
  • cudatoolkit >= 11.3
  • torchvision >= 0.13
  • timm == 0.9.16

Suggested Installation

  1. Properly install the Anaconda or Miniconda
  2. Prepare Conda enviroment with following script
conda init
source ~/.bashrc
conda create --name dml python=3.8 -y
conda activate dml
  1. Install CUDA and Pytorch with following script
conda update -n base -c defaults conda -y
conda install -y pytorch=1.12.1=py3.8_cuda11.3_cudnn8.3.2_0 faiss-gpu torchvision cudatoolkit=11.3 -c pytorch 
conda install -c conda-forge timm=0.9.16 pretrainedmodels fairscale -y
pip install tqdm regex

Download Data

CUB200 can be downloaded from here
CARS can be downloaded from here
Stanford_Online_Products can be downloaded from the official webset
In-Shop Clothes Retrieval can be downloaded from the official webset iNaturalist can be downloaded from the official webset

The datasets should be unpacked and placed in a distinct folder. An illustration of a data structure is as follows:

$HOME/data/
├── cars196
│   └── images
|        ├── Acura Integra Type R 2001
|       ... 
|
├── cub200
│   └── images
|        ├── 001.Black_footed_Albatross
|        ...
├── inshop
│   └── img
|        ├── ...
|   └── list_eval_partition.txt
├── online_products
│   ├── images
│   │   ├── bicycle_final
|   |   ....
│   └── Info_Files
├── inaturalist
│   ├── Inaturalist_train_set1.txt
│   ├── Inaturalist_test_set1.txt
|   └── train_val2018

Training

Start to run the training procedure with following command:

python main.py --source_path ${data_path} \
--config ${config_path} \
--gpu ${gpu_id}

data_path is the root of your dataset.
config_path is the path of configure file that save specific hyper-parameters.
gpu_id is the index of your GPU starting from 0

For example, to fine-tun a model based on ViT-S16,

python main.py --source_path ../dml_data \
--config ./configs/cub200_small.yaml \
--gpu 0

The pre-training model and other configs can be edited in ./configs/${your_setting}.yaml

Proposed Results in Table 2

Main results of our proposed VPTSP-G approach based on the pretrained model ViT-S16 and ViT-B16.

ViT-S16/384 R@1 MAP@R
CUB 86.6 0.527
Cars 87.7 0.289
SOP 84.4 0.501
InShop 91.2 --
ViT-B16/512 R@1 MAP@R
CUB 88.5 0.570
Cars 91.2 0.350
SOP 86.8 0.538
InShop 92.5 --
iNaturalist 84.5 0.481

Please note that the reproduced results may be different from the those reported in the paper due to different enviroments, the hardware architectures and uncontrollable randomnesses.

Acknowledgement

This implementation is partially based on:

https://github.com/Confusezius/Deep-Metric-Learning-Baselines https://github.com/Confusezius/Revisiting_Deep_Metric_Learning_PyTorch

parameterefficient-dml's People

Contributors

noahsark avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.