
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

S-Lab, Nanyang Technological University · CCDS, Nanyang Technological University · SenseTime Research
Accepted to ECCV 2024

[arXiv]

Abstract

Despite the success of large-scale pretrained Vision-Language Models (VLMs), especially CLIP, in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented regions. In this paper, we carefully re-investigate the architecture of CLIP and identify residual connections as the primary source of noise that degrades segmentation quality. Through a comparative analysis of the statistical properties of the residual connection and the attention output across different pretrained models, we discover that CLIP's image-text contrastive training paradigm emphasizes global features at the expense of local discriminability, leading to noisy segmentation results. In response, we propose ClearCLIP, a novel approach that decomposes CLIP's representations to enhance open-vocabulary semantic segmentation. We introduce three simple modifications to the final layer: removing the residual connection, adopting self-self attention, and discarding the feed-forward network. ClearCLIP consistently generates clearer and more accurate segmentation maps and outperforms existing approaches across multiple benchmarks, affirming the significance of our discoveries.
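The three modifications to the final layer can be illustrated with a minimal single-head NumPy sketch. This is a simplification for intuition only, not the repository's actual implementation; all weight names and shapes here are illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def clearclip_final_block(x, w_q, w_v, w_o):
    """Hypothetical single-head sketch of the modified final layer.

    x: (num_tokens, dim) token features entering the last block.
    """
    q = x @ w_q
    v = x @ w_v
    scale = q.shape[-1] ** -0.5
    # Modification 2: self-self (query-query) attention instead of query-key
    attn = softmax(scale * q @ q.T)
    out = attn @ v @ w_o
    # Modification 1: no residual connection -- x is NOT added back to out
    # Modification 3: the feed-forward network (MLP) is skipped entirely
    return out

# Toy usage with random weights
dim = 8
rng = np.random.default_rng(0)
x = rng.standard_normal((4, dim))
w_q, w_v, w_o = (rng.standard_normal((dim, dim)) for _ in range(3))
print(clearclip_final_block(x, w_q, w_v, w_o).shape)  # (4, 8)
```

A standard transformer block would instead compute `x + attn_out` followed by `x + ffn(x)`; the sketch above keeps only the (self-self) attention output.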

Dependencies and Installation

# git clone this repository
git clone https://github.com/mc-lan/ClearCLIP.git
cd ClearCLIP

# create new anaconda env
conda create -n ClearCLIP python=3.10
conda activate ClearCLIP

# install torch and dependencies
pip install -r requirements.txt

Datasets

We include the following dataset configurations in this repo:

  1. With background class: PASCAL VOC, PASCAL Context, Cityscapes, ADE20k, and COCO-Stuff164k;
  2. Without background class: VOC20, Context59 (i.e., PASCAL VOC and PASCAL Context without the background category), and COCO-Object.

Please follow the MMSeg data preparation document to download and pre-process the datasets. The COCO-Object dataset can be converted from COCO-Stuff164k by executing the following command:

python datasets/cvt_coco_object.py PATH_TO_COCO_STUFF164K -o PATH_TO_COCO164K

Quick Inference

python demo.py

Model Evaluation

Single-GPU:

python eval.py --config ./config/cfg_DATASET.py --workdir YOUR_WORK_DIR

Multi-GPU:

bash ./dist_test.sh ./config/cfg_DATASET.py

Evaluation on all datasets:

python eval_all.py

Results will be saved in results.xlsx.

We provide comparison results (reported in the Appendix) on the five datasets without a background class, obtained with our implementation.

Citation

@inproceedings{lan2024clearclip,
      title={ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference}, 
      author={Mengcheng Lan and Chaofeng Chen and Yiping Ke and Xinjiang Wang and Litong Feng and Wayne Zhang},
      booktitle={ECCV},
      year={2024},
}

License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

Acknowledgement

This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

This implementation is based on OpenCLIP and SCLIP. Thanks to the authors for their awesome work.

Contact

If you have any questions, please feel free to reach out at [email protected].


Issues

About Fig.4 (a) Entropy

Hi, thanks for your great work!

I am interested in the Fig.4 (a) Entropy in the paper, but I cannot reproduce the results using the code below. I am confused about the implementation of the normalized entropy calculation.

import numpy as np
from scipy.stats import entropy  # normalizes its input to a probability distribution

def compute_normalized_entropy(tensor):
    # Histogram the flattened feature values into 100 bins
    flat_array = tensor.numpy().flatten()
    hist, _ = np.histogram(flat_array, bins=100, density=True)
    # Divide by the maximum entropy of a 100-bin histogram (log2 of 100)
    max_entropy = np.log2(len(hist))
    norm_entropy = entropy(hist, base=2) / max_entropy
    return norm_entropy

Could you please share the code used for plotting this figure, specifically the code for calculating the normalized entropy?
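For reference, here is a self-contained NumPy-array variant of the calculation in question (assuming SciPy's `scipy.stats.entropy`, which renormalizes the histogram counts before computing entropy). The random input is purely illustrative and is not the paper's data:

```python
import numpy as np
from scipy.stats import entropy

def normalized_histogram_entropy(array, bins=100):
    # Shannon entropy (base 2) of a binned value distribution, divided by
    # the maximum possible entropy of a `bins`-bin histogram (log2 of bins),
    # so the result lies in [0, 1].
    hist, _ = np.histogram(array.flatten(), bins=bins, density=True)
    return entropy(hist, base=2) / np.log2(bins)

features = np.random.randn(196, 512)  # illustrative stand-in for a feature map
score = normalized_histogram_entropy(features)
print(score)
```

Whether this matches the paper's Fig. 4(a) depends on details not stated in the snippet (e.g., which features are histogrammed and over what range), so it should be read as a sketch of the metric, not a reproduction recipe.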
