
dualcoop's Introduction

DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations (NeurIPS 2022)

Authors: Ximeng Sun, Ping Hu, Kate Saenko

Introduction


In this work, we utilize the strong alignment of textual and visual features pretrained with millions of auxiliary image-text pairs and propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR. DualCoOp encodes positive and negative contexts with class names as part of the linguistic input (i.e. prompts). Since DualCoOp only introduces a very light learnable overhead upon the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks that have limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the advantages of our approach over state-of-the-art methods.
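
For intuition, here is a minimal sketch of the dual-prompt scoring idea. This is illustrative only, not the released implementation: encode_text stands in for a frozen CLIP text encoder, and the class-name token embeddings that the real model appends to the contexts are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPromptHead(nn.Module):
    """Sketch: one learnable positive and one learnable negative context per
    class, scored against frozen image features (not the authors' code)."""
    def __init__(self, num_classes, ctx_len, dim):
        super().__init__()
        self.pos_ctx = nn.Parameter(torch.randn(num_classes, ctx_len, dim) * 0.02)
        self.neg_ctx = nn.Parameter(torch.randn(num_classes, ctx_len, dim) * 0.02)

    def forward(self, image_feat, encode_text):
        # encode_text: placeholder for a frozen text encoder,
        # (num_classes, ctx_len, dim) -> (num_classes, dim)
        pos = F.normalize(encode_text(self.pos_ctx), dim=-1)
        neg = F.normalize(encode_text(self.neg_ctx), dim=-1)
        img = F.normalize(image_feat, dim=-1)                 # (batch, dim)
        logit_pos = img @ pos.t()                             # (batch, num_classes)
        logit_neg = img @ neg.t()
        # per-class binary decision: softmax over the (positive, negative) pair
        probs = torch.softmax(torch.stack([logit_pos, logit_neg], dim=-1), dim=-1)
        return probs[..., 0]                                  # P(class present)

# toy usage with a stub text encoder that mean-pools the context tokens
head = DualPromptHead(num_classes=80, ctx_len=16, dim=512)
probs = head(torch.randn(4, 512), encode_text=lambda ctx: ctx.mean(dim=1))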

Links: arXiv / Poster / Slides

Please cite our work if you find it helpful to your research:

@inproceedings{
sun2022dualcoop,
title={DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations},
author={Ximeng Sun and Ping Hu and Kate Saenko},
booktitle={Advances in Neural Information Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022},
url={https://openreview.net/forum?id=QnajmHkhegH}
}

Set-up Experiment Environment

Our implementation is in PyTorch with Python 3.9.

Use conda env create -f environment.yml to create the conda environment. In the conda environment, install pycocotools and randaugment with pip:

pip install pycocotools
pip install randaugment

Then follow the link to install dassl.

Datasets

Multi-Label Recognition with Partial Labels

  • MS-COCO: We use the official train2014 (82K images) and val2014 (40K images) splits for training and testing.
  • VOC2007: We use the official trainval (5K images) and test (5K images) splits for training and testing.

Zero-shot Multi-Label Recognition

  • MS-COCO: We follow [1, 2] to split the dataset into 48 seen classes and 17 unseen classes. We provide the json files of the seen and unseen annotations on Google Drive. Download and move all files into <coco_dataroot>/annotations/ for use in training and inference.
  • NUS-WIDE: Following [2, 3], we use 81 human-annotated categories as unseen classes and an additional set of 925 labels obtained from Flickr tags as seen classes. We provide the class split on Google Drive. Download and move those folders into <nus_wide_dataroot>/annotations/ for use in training and inference.

Training

MLR with Partial Labels

Use the following command to train a model for MLR with partial labels:

python train.py  --config_file configs/models/rn101_ep50.yaml \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 448 --lr <lr_value>   --loss_w <loss_weight> \
-pp <portion_of_avail_label> --csc

Some Args:

  • dataset_config_file: currently the code supports configs/datasets/coco.yaml and configs/datasets/voc2007.yaml
  • lr: 0.001 for VOC2007 and 0.002 for MS-COCO.
  • pp: from 0 to 1. It specifies the portion of labels that are available during training (a small simulation sketch follows the example command below).
  • loss_w: balances the loss scale across different values of pp. We use a larger loss_w for a smaller pp.
  • csc: add this flag to use class-specific prompts. We suggest using class-agnostic prompts (omit --csc) when pp is very small.

Please refer to opts.py for the full argument list. For example:
python train.py  --config_file configs/models/rn101_ep50.yaml \
 --datadir  ../datasets/mscoco_2014/ --dataset_config_file configs/datasets/coco.yaml \
 --input_size 448  --lr 0.002   --loss_w 0.03  -pp 0.5
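
For intuition about pp, the partial-label setting can be simulated by keeping each (image, class) annotation independently with probability pp. This is only a sketch under that assumption; the repo's own masking lives in its dataset classes and may differ in detail:

import torch

def make_partial_mask(num_samples, num_classes, pp, seed=0):
    # keep each annotation with probability pp; unknown entries are
    # commonly encoded as -1 downstream
    g = torch.Generator().manual_seed(seed)
    return (torch.rand(num_samples, num_classes, generator=g) < pp).long()

targets = torch.randint(0, 2, (4, 80))        # toy multi-hot labels, 80 classes
mask = make_partial_mask(4, 80, pp=0.5)
partial = mask * targets + (1 - mask) * (-1)  # -1 marks an unknown label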

Zero-Shot MLR

python train_zsl.py  --config_file configs/models/rn50_ep50.yaml  \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 224  --lr <lr_value>   --loss_w 0.01  --n_ctx_pos 64 --n_ctx_neg 64 \
--num_train_cls <some_value_or_not_specified>

Some Args:

  • lr: 0.002 for MS-COCO and 0.001 for NUS-WIDE
  • n_ctx_pos: the length of the learnable positive prompt template
  • n_ctx_neg: the length of the learnable negative prompt template
  • num_train_cls: set to an int n. When the number of seen classes is very large (e.g. NUS-WIDE), the algorithm randomly picks n classes per iteration to compute the ASL loss (see the sketch after the example below).

Note that csc does not work for zero-shot MLR, since some classes are never seen during training.

For example:

python train_zsl.py --config_file configs/models/rn50_ep50.yaml  \
--datadir ../datasets/mscoco_2014/ --dataset_config_file configs/datasets/coco.yaml \
--input_size 224 --lr 0.002  --loss_w 0.01  --n_ctx_pos 64 --n_ctx_neg 64 
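
As a rough illustration of num_train_cls, the per-step loss can be restricted to a random subset of the seen classes. This is a sketch under that reading, with plain BCE standing in for the ASL loss used in the paper:

import torch
import torch.nn.functional as F

def subsample_class_loss(logits, targets, num_train_cls):
    # pick a random subset of classes each step so the loss stays cheap
    # when there are many seen classes (e.g. 925 on NUS-WIDE)
    idx = torch.randperm(logits.shape[1])[:num_train_cls]
    return F.binary_cross_entropy_with_logits(logits[:, idx], targets[:, idx].float())

loss = subsample_class_loss(torch.randn(8, 925), torch.randint(0, 2, (8, 925)), 64)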

Evaluation / Inference

MLR with Partial Labels

python val.py --config_file configs/models/rn101_ep50.yaml \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 224  --pretrained <ckpt_path> --csc

Zero-Shot MLR

python val_zsl.py --config_file configs/models/rn50_ep50.yaml \
--datadir <your_dataset_path> --dataset_config_file configs/datasets/<dataset>.yaml \
--input_size 224  --n_ctx_pos 64 --n_ctx_neg 64 --pretrained <ckpt_path> --top_k 5
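
For reference, top_k evaluation in zero-shot MLR typically keeps the k highest-scoring labels per image and computes precision/recall against the ground truth. A minimal sketch of that standard protocol (the repo's exact metric code may differ):

import torch

def topk_precision_recall(scores, targets, k=5):
    # scores: (N, C) model outputs; targets: (N, C) multi-hot ground truth
    topk = scores.topk(k, dim=1).indices
    pred = torch.zeros_like(targets).scatter_(1, topk, 1)
    tp = (pred * targets).sum().item()
    precision = tp / pred.sum().item()
    recall = tp / targets.sum().item()
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return precision, recall, f1

p, r, f1 = topk_precision_recall(torch.randn(8, 81), torch.randint(0, 2, (8, 81)))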

Reference

[1] Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, and Ajay Divakaran. Zero-shot object detection. In ECCV, 2018.
[2] Avi Ben-Cohen, Nadav Zamir, Emanuel Ben-Baruch, Itamar Friedman, and Lihi Zelnik-Manor. Semantic diversity learning for zero-shot multi-label classification. In ICCV, 2021.
[3] Dat Huynh and Ehsan Elhamifar. A shared multi-attention framework for multi-label zero-shot learning. In CVPR, 2020.

Acknowledgement

We would like to thank Kaiyang Zhou for providing code for CoOp. We borrowed and refactored a large portion of his code in the implementation of our work.


dualcoop's Issues

About the potential bug in data preprocessing

#4 described potential label leakage in the code, but the reply did not resolve it.

According to the author's kind explanation, the preprocessing code originally served a necessary purpose in ASL's implementation. However, ASL was evaluated on fully labeled COCO. In the partial-label setting, the worry is that the same split-merge code can lead to label leakage, which is important but was not resolved in the previously closed issue.

Could the authors check whether this code (quoted in the issue below) causes the main problem mentioned in #4? It seems to result in a larger proportion of known labels than is actually set.

Thanks.

About data set

Dear authors, have you done any experiments on the OpenImages dataset, and if so, what were the results?

Questions about data preprocessing

Thank you for your great work.

While reading the code, I found something that confused me in the data preprocessing:

# In coco_detection.py, line 105......
output = torch.zeros((3, len(self.classnames)), dtype=torch.long)
for obj in target:
    if obj['area'] < 32 * 32:
        output[0][self.cat2cat[obj['category_id']]] = 1
    elif obj['area'] < 96 * 96:
        output[1][self.cat2cat[obj['category_id']]] = 1
    else:
        output[2][self.cat2cat[obj['category_id']]] = 1
target = output
if self.mask is not None:
    masked = - torch.ones((3, len(self.classnames)), dtype=torch.long)
    target = self.mask[index] * target + (1 - self.mask[index]) * masked
# ......
# In trainers.py, line 109
target = target.max(dim=1)[0]

Why split the annotations into 3 parts, mask them out, and then merge them? It seems odd, and it also seems to make the proportion of known labels larger than the configured setting: a label is dropped only if the masks for all 3 parts are 0. For example, 87.5% (1 - 0.5^3) of labels will be retained instead of 50% if the proportion is set to 0.5 following the README.

It would be better if the authors replaced this code with a simpler version and reproduced the results.
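
For reference, the simpler scheme suggested above might look like the following. This is only a sketch, not a drop-in patch: it assumes self.mask[index] is a single per-class mask of shape (num_classes,) rather than one mask per size bucket.

# sketch of the single-mask alternative (hypothetical, not the authors' code)
output = torch.zeros(len(self.classnames), dtype=torch.long)
for obj in target:
    output[self.cat2cat[obj['category_id']]] = 1
target = output
if self.mask is not None:
    masked = -torch.ones(len(self.classnames), dtype=torch.long)
    target = self.mask[index] * target + (1 - self.mask[index]) * masked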

Guidelines for training on a new dataset

Hi!
Your work is very interesting. Could you please provide guidelines and/or requirements for data preparation before training your code on new datasets?

Thank you in advance.

Queries

@sunxm2357 Hi, thanks for sharing the codebase, wonderful work. I have a few queries:

  1. Can we train the existing pipeline on automotive data for multi-label classification, like what is shown in the figure below?
    [image]

  2. Since the model is used for multi-label classification, can we add features such as prompts to improve its accuracy?

Thanks in advance.

Questions about Reproduction

I tried to reproduce the code from the descriptions in the original paper combined with the open-sourced CoOp; however, I could not obtain the results reported in the paper. It would be great if the authors could release the code.

How can I install dassl?

Hi there,

I have cloned the DualCoOp repo and installed all the dependencies except dassl. Where do I install dassl? It is its own repo, so I'm not sure how to set it up "within" the DualCoOp repo. Thanks.

About the value of loss_w

Dear authors,
I found your work very interesting, and I have a question about the training settings needed to reproduce the results in your paper: can you share the value of loss_w used for each pp, respectively?
I really appreciate any help you can provide.

Question about multiplicative factor

Hello,

I would like to thank you for releasing the code of your very inspiring work!

I am trying to reproduce your results, and I was wondering why there is a multiplication by 20 at line 232 of the "models.dualcoop.py" file:
output = 20 * F.conv1d(image_features_norm, text_features[:, :, None])

I have the same question regarding the multiplication by 5 at line 237 of the same file:
output = 5 * (output * w).sum(-1)

Regards
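
For context (not an authoritative answer): fixed multipliers like these usually act as inverse temperatures that sharpen a softmax, similar in spirit to CLIP's logit scale; only the authors can confirm how these particular constants were chosen. A toy illustration:

import torch
sims = torch.tensor([0.20, 0.10, 0.05])   # cosine similarities live in [-1, 1]
print(torch.softmax(sims, dim=0))         # nearly uniform without scaling
print(torch.softmax(20 * sims, dim=0))    # the scale factor sharpens the weights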

Code Release?

Thank you for your great work!

Would you mind sharing your code? Thanks!
