
dualcoop's Introduction

DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations (NeurIPS 2022)

Authors: Ximeng Sun, Ping Hu, Kate Saenko

Introduction


In this work, we utilize the strong alignment of textual and visual features pretrained with millions of auxiliary image-text pairs and propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR. DualCoOp encodes positive and negative contexts with class names as part of the linguistic input (i.e. prompts). Since DualCoOp only introduces a very light learnable overhead upon the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks that have limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the advantages of our approach over state-of-the-art methods.
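
For intuition, here is a minimal sketch of the dual-prompt scoring idea. This is illustrative only, not the released implementation: encode_text stands in for a frozen CLIP text encoder, and the class-name token embeddings that the real model appends to the contexts are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPromptHead(nn.Module):
    """Sketch: one learnable positive and one learnable negative context per
    class, scored against frozen image features (not the authors' code)."""
    def __init__(self, num_classes, ctx_len, dim):
        super().__init__()
        self.pos_ctx = nn.Parameter(torch.randn(num_classes, ctx_len, dim) * 0.02)
        self.neg_ctx = nn.Parameter(torch.randn(num_classes, ctx_len, dim) * 0.02)

    def forward(self, image_feat, encode_text):
        # encode_text: placeholder for a frozen text encoder,
        # (num_classes, ctx_len, dim) -> (num_classes, dim)
        pos = F.normalize(encode_text(self.pos_ctx), dim=-1)
        neg = F.normalize(encode_text(self.neg_ctx), dim=-1)
        img = F.normalize(image_feat, dim=-1)                 # (batch, dim)
        logit_pos = img @ pos.t()                             # (batch, num_classes)
        logit_neg = img @ neg.t()
        # per-class binary decision: softmax over the (positive, negative) pair
        probs = torch.softmax(torch.stack([logit_pos, logit_neg], dim=-1), dim=-1)
        return probs[..., 0]                                  # P(class present)

# toy usage with a stub text encoder that mean-pools the context tokens
head = DualPromptHead(num_classes=80, ctx_len=16, dim=512)
probs = head(torch.randn(4, 512), encode_text=lambda ctx: ctx.mean(dim=1))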

Links: arXiv / Poster / Slides

Please cite our work if you find it helpful to your research:

@inproceedings{
sun2022dualcoop,
title={DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations},
author={Ximeng Sun and Ping Hu and Kate Saenko},
booktitle={Advances in Neural Information Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022},
url={https://openreview.net/forum?id=QnajmHkhegH}
}

Set-up Experiment Environment

Our implementation is in PyTorch with Python 3.9.

Use conda env create -f environment.yml to create the conda environment. In the conda environment, install pycocotools and randaugment with pip:

pip install pycocotools
pip install randaugment

Then follow the link to install dassl.

Datasets

Multi-Label Recognition with Partial Labels

  • MS-COCO: We use the official train2014 (82K images) and val2014 (40K images) splits for training and testing.
  • VOC2007: We use the official trainval (5K images) and test (5K images) splits for training and testing.

Zero-shot Multi-Label Recognition

  • MS-COCO: We follow [1, 2] to split the dataset into 48 seen classes and 17 unseen classes. We provide the json files of the seen and unseen annotations on Google Drive. Download and move all files into <coco_dataroot>/annotations/ for use in training and inference.
  • NUS-WIDE: Following [2, 3], we use 81 human-annotated categories as unseen classes and an additional set of 925 labels obtained from Flickr tags as seen classes. We provide the class split on Google Drive. Download and move those folders into <nus_wide_dataroot>/annotations/ for use in training and inference.

Training

MLR with Partial Labels

Use the following command to train a model for MLR with partial labels:

python train.py  --config_file configs/models/rn101_ep50.yaml \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 448 --lr <lr_value>   --loss_w <loss_weight> \
-pp <portion_of_avail_label> --csc

Some Args:

  • dataset_config_file: currently the code supports configs/datasets/coco.yaml and configs/datasets/voc2007.yaml
  • lr: 0.001 for VOC2007 and 0.002 for MS-COCO.
  • pp: from 0 to 1. It specifies the portion of labels that are available during training (a small simulation sketch follows the example command below).
  • loss_w: balances the loss scale across different values of pp. We use a larger loss_w for a smaller pp.
  • csc: add this flag to use class-specific prompts. We suggest using class-agnostic prompts (omit --csc) when pp is very small.

Please refer to opts.py for the full argument list. For example:
python train.py  --config_file configs/models/rn101_ep50.yaml \
 --datadir  ../datasets/mscoco_2014/ --dataset_config_file configs/datasets/coco.yaml \
 --input_size 448  --lr 0.002   --loss_w 0.03  -pp 0.5
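
For intuition about pp, the partial-label setting can be simulated by keeping each (image, class) annotation independently with probability pp. This is only a sketch under that assumption; the repo's own masking lives in its dataset classes and may differ in detail:

import torch

def make_partial_mask(num_samples, num_classes, pp, seed=0):
    # keep each annotation with probability pp; unknown entries are
    # commonly encoded as -1 downstream
    g = torch.Generator().manual_seed(seed)
    return (torch.rand(num_samples, num_classes, generator=g) < pp).long()

targets = torch.randint(0, 2, (4, 80))        # toy multi-hot labels, 80 classes
mask = make_partial_mask(4, 80, pp=0.5)
partial = mask * targets + (1 - mask) * (-1)  # -1 marks an unknown label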

Zero-Shot MLR

python train_zsl.py  --config_file configs/models/rn50_ep50.yaml  \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 224  --lr <lr_value>   --loss_w 0.01  --n_ctx_pos 64 --n_ctx_neg 64 \
--num_train_cls <some_value_or_not_specified>

Some Args:

  • lr: 0.002 for MS-COCO and 0.001 for NUS-WIDE
  • n_ctx_pos: the length of the learnable positive prompt template
  • n_ctx_neg: the length of the learnable negative prompt template
  • num_train_cls: set to an int n. When the number of seen classes is very large (e.g. NUS-WIDE), the algorithm randomly picks n classes per iteration to compute the ASL loss (see the sketch after the example below).

Note that csc does not work for zero-shot MLR, since some classes are never seen during training.

For example:

python train_zsl.py --config_file configs/models/rn50_ep50.yaml  \
--datadir ../datasets/mscoco_2014/ --dataset_config_file configs/datasets/coco.yaml \
--input_size 224 --lr 0.002  --loss_w 0.01  --n_ctx_pos 64 --n_ctx_neg 64 
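
As a rough illustration of num_train_cls, the per-step loss can be restricted to a random subset of the seen classes. This is a sketch under that reading, with plain BCE standing in for the ASL loss used in the paper:

import torch
import torch.nn.functional as F

def subsample_class_loss(logits, targets, num_train_cls):
    # pick a random subset of classes each step so the loss stays cheap
    # when there are many seen classes (e.g. 925 on NUS-WIDE)
    idx = torch.randperm(logits.shape[1])[:num_train_cls]
    return F.binary_cross_entropy_with_logits(logits[:, idx], targets[:, idx].float())

loss = subsample_class_loss(torch.randn(8, 925), torch.randint(0, 2, (8, 925)), 64)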

Evaluation / Inference

MLR with Partial Labels

python val.py --config_file configs/models/rn101_ep50.yaml \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 224  --pretrained <ckpt_path> --csc

Zero-Shot MLR

python val_zsl.py --config_file configs/models/rn50_ep50.yaml \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 224  --n_ctx_pos 64 --n_ctx_neg 64 --pretrained <ckpt_path> --top_k 5
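
For reference, top_k evaluation in zero-shot MLR typically keeps the k highest-scoring labels per image and computes precision/recall against the ground truth. A minimal sketch of that standard protocol (the repo's exact metric code may differ):

import torch

def topk_precision_recall(scores, targets, k=5):
    # scores: (N, C) model outputs; targets: (N, C) multi-hot ground truth
    topk = scores.topk(k, dim=1).indices
    pred = torch.zeros_like(targets).scatter_(1, topk, 1)
    tp = (pred * targets).sum().item()
    precision = tp / pred.sum().item()
    recall = tp / targets.sum().item()
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return precision, recall, f1

p, r, f1 = topk_precision_recall(torch.randn(8, 81), torch.randint(0, 2, (8, 81)))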

Reference

[1] Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, and Ajay Divakaran. Zero-shot object detection. In ECCV, 2018.
[2] Avi Ben-Cohen, Nadav Zamir, Emanuel Ben-Baruch, Itamar Friedman, and Lihi Zelnik-Manor. Semantic diversity learning for zero-shot multi-label classification. In ICCV, 2021.
[3] Dat Huynh and Ehsan Elhamifar. A shared multi-attention framework for multi-label zero-shot learning. In CVPR, 2020.

Acknowledgement

We would like to thank Kaiyang Zhou for providing code for CoOp. We borrowed and refactored a large portion of his code in the implementation of our work.


dualcoop's Issues

About the potential bug in data preprocessing

#4 described potential label leakage in the code, but the reply did not resolve it.

According to the author's kind explanation, the preprocessing code originally served a necessary purpose in ASL's implementation. However, ASL was evaluated on fully labeled COCO. In the partial-label setting, the worry is that the same split-merge code can lead to label leakage, which is important but was not resolved in the previously closed issue.

Could the authors check whether this code (quoted in the issue below) causes the main problem mentioned in #4? It seems to result in a larger proportion of known labels than is actually set.

Thanks.

About data set

Dear authors, have you done any experiments on the OpenImages dataset, and if so, what were the results?

Questions about data preprocessing

Thank you for your great work.

While reading the code, I found something that confused me in the data preprocessing:

# In coco_detection.py, line 105......
output = torch.zeros((3, len(self.classnames)), dtype=torch.long)
for obj in target:
    if obj['area'] < 32 * 32:
        output[0][self.cat2cat[obj['category_id']]] = 1
    elif obj['area'] < 96 * 96:
        output[1][self.cat2cat[obj['category_id']]] = 1
    else:
        output[2][self.cat2cat[obj['category_id']]] = 1
target = output
if self.mask is not None:
    masked = - torch.ones((3, len(self.classnames)), dtype=torch.long)
    target = self.mask[index] * target + (1 - self.mask[index]) * masked
# ......
# In trainers.py, line 109
target = target.max(dim=1)[0]

Why split the annotations into 3 parts, mask them out, and then merge them? It seems odd, and it also seems to make the proportion of known labels larger than the configured setting: a label is dropped only if the masks for all 3 parts are 0. For example, 87.5% (1 - 0.5^3) of labels will be retained instead of 50% if the proportion is set to 0.5 following the README.

It would be better if the authors replaced this code with a simpler version and reproduced the results.
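
For reference, the simpler scheme suggested above might look like the following. This is only a sketch, not a drop-in patch: it assumes self.mask[index] is a single per-class mask of shape (num_classes,) rather than one mask per size bucket.

# sketch of the single-mask alternative (hypothetical, not the authors' code)
output = torch.zeros(len(self.classnames), dtype=torch.long)
for obj in target:
    output[self.cat2cat[obj['category_id']]] = 1
target = output
if self.mask is not None:
    masked = -torch.ones(len(self.classnames), dtype=torch.long)
    target = self.mask[index] * target + (1 - self.mask[index]) * masked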

Guidelines for training on a new dataset

Hi!
Your work is very interesting. Could you please provide guidelines and/or requirements for data preparation before training your code on new datasets?

Thank you in advance.

Queries

@sunxm2357 Hi, thanks for sharing the codebase, wonderful work. I have a few queries:

  1. Can we train the existing pipeline on automotive data for multi-label classification, like what is shown in the figure below?
    [image]

  2. Since the model is used for multi-label classification, can we add features such as prompts to improve its accuracy?

Thanks in advance.

Questions about Reproduction

I tried to reproduce the code from the descriptions in the original paper combined with the open-sourced CoOp; however, I could not obtain the results reported in the paper. It would be great if the authors could release the code.

How can I install dassl?

Hi there,

I have cloned the DualCoOp repo and installed all the dependencies except dassl. Where do I install dassl? It is its own repo, so I'm not sure how to set it up "within" the DualCoOp repo. Thanks.

About the value of loss_w

Dear authors,
I found your work very interesting, and I have a question about the training settings needed to reproduce the results in your paper: can you share the value of loss_w used for each pp, respectively?
I really appreciate any help you can provide.

Question about multiplicative factor

Hello,

I would like to thank you for releasing the code of your very inspiring work!

I am trying to reproduce your results, and I was wondering why there is a multiplication by 20 at line 232 of the "models.dualcoop.py" file:
output = 20 * F.conv1d(image_features_norm, text_features[:, :, None])

I have the same question regarding the multiplication by 5 at line 237 of the same file:
output = 5 * (output * w).sum(-1)

Regards
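
For context (not an authoritative answer): fixed multipliers like these usually act as inverse temperatures that sharpen a softmax, similar in spirit to CLIP's logit scale; only the authors can confirm how these particular constants were chosen. A toy illustration:

import torch
sims = torch.tensor([0.20, 0.10, 0.05])   # cosine similarities live in [-1, 1]
print(torch.softmax(sims, dim=0))         # nearly uniform without scaling
print(torch.softmax(20 * sims, dim=0))    # the scale factor sharpens the weights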

Code Release?

Thank you for your great work!

Would you mind sharing your code? Thanks!
