WeakSAM

Segment Anything Meets Weakly-supervised Instance-level Recognition

Lianghui Zhu1 *, Junwei Zhou1 *, Yan Liu2, Xin Hao2, Wenyu Liu1, Xinggang Wang1 📧

1 School of EIC, Huazhong University of Science and Technology, 2 Alipay Tian Qian Security Lab

(*) equal contribution, (📧) corresponding author.

ArXiv Preprint (arXiv 2402.14812)

News

  • Feb. 22nd, 2024: We released our paper on arXiv. Further details can be found in the code and in our updated arXiv preprint.

Abstract

Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem. It significantly reduces human labeling costs and traditionally relies on multi-instance learning and pseudo-labeling. This paper introduces WeakSAM, which tackles weakly-supervised object detection (WSOD) and weakly-supervised instance segmentation (WSIS) by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment Anything Model (SAM). WeakSAM addresses two critical limitations in traditional WSOD retraining, i.e., pseudo ground truth (PGT) incompleteness and noisy PGT instances, through adaptive PGT generation and Region of Interest (RoI) drop regularization. It also addresses two limitations of SAM that hinder automatic object detection and segmentation: its need for prompts and its category unawareness. Our results indicate that WeakSAM surpasses previous state-of-the-art methods on WSOD and WSIS benchmarks by large margins, i.e., average improvements of 7.4% and 8.5%, respectively.

Highlight performances

Overview

We first introduce classification clues and spatial points as automatic SAM prompts, which removes SAM's need for interactive prompts. Next, we use the resulting WeakSAM-proposals in the WSOD pipeline, in which the weakly-supervised detector performs class-aware perception to annotate pseudo ground truth (PGT). Then, we analyze the incompleteness and noise problems in the PGT and propose adaptive PGT generation and RoI drop regularization to address them, respectively. Finally, we use WeakSAM-PGT to prompt SAM for the WSIS extension. (In the pipeline figure, the snowflake mark means the model is frozen.)
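As a rough illustration of two steps in the overview, the sketch below shows (1) a regular grid of spatial points that could serve as automatic point prompts and (2) a loss-based form of RoI dropping. Both are assumptions made for illustration only: the overview does not say how the spatial points are placed or which RoIs are dropped, and the names `uniform_point_prompts` and `roi_drop_loss`, the parameters `points_per_side` and `drop_ratio`, and their default values are hypothetical rather than taken from the WeakSAM codebase.

```python
def uniform_point_prompts(height, width, points_per_side=4):
    """Return (x, y) pixel coordinates laid out on a regular grid.

    Assumption: a dense, evenly spaced grid is one simple way to obtain
    spatial point prompts without any user interaction.
    """
    step_x = width / points_per_side
    step_y = height / points_per_side
    # Place each point at the centre of its grid cell, row by row.
    return [
        ((col + 0.5) * step_x, (row + 0.5) * step_y)
        for row in range(points_per_side)
        for col in range(points_per_side)
    ]


def roi_drop_loss(per_roi_losses, drop_ratio=0.2):
    """Average per-RoI losses after discarding the highest-loss fraction.

    Assumption: RoIs with the largest losses are the ones most likely to be
    matched to noisy PGT, so they are left out of the averaged loss.
    """
    if not per_roi_losses:
        raise ValueError("need at least one RoI loss")
    # Never drop every RoI: keep at least one so the mean is defined.
    num_drop = min(int(len(per_roi_losses) * drop_ratio), len(per_roi_losses) - 1)
    kept = sorted(per_roi_losses)[: len(per_roi_losses) - num_drop]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # A 2x2 grid of prompts for a 100 (height) x 200 (width) image.
    print(uniform_point_prompts(100, 200, points_per_side=2))
    # The outlier loss 5.0 is dropped before averaging.
    print(roi_drop_loss([0.1, 0.2, 0.3, 0.4, 5.0], drop_ratio=0.2))
```

In a real training loop the per-RoI losses would come from the detector head, and the drop criterion might be a threshold rather than a fixed ratio; the fixed ratio here only keeps the example short.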

WeakSAM pipeline

Main results

For the WSOD task:

Dataset   WSOD method    WSOD performance  Retrain method  Retrain performance
VOC2007   WeakSAM(OICR)  58.9 AP50         Faster R-CNN    65.7 AP50
VOC2007   WeakSAM(OICR)  58.9 AP50         DINO            66.1 AP50
VOC2007   WeakSAM(MIST)  67.4 AP50         Faster R-CNN    71.8 AP50
VOC2007   WeakSAM(MIST)  67.4 AP50         DINO            73.4 AP50
COCO2014  WeakSAM(OICR)  19.9 mAP          Faster R-CNN    22.3 mAP
COCO2014  WeakSAM(OICR)  19.9 mAP          DINO            24.9 mAP
COCO2014  WeakSAM(MIST)  22.9 mAP          Faster R-CNN    23.8 mAP
COCO2014  WeakSAM(MIST)  22.9 mAP          DINO            26.6 mAP
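
To make the effect of retraining easier to read off, the short sketch below recomputes the gain of each retrained detector over its WSOD baseline. The numbers are copied from the WSOD table above; the names `RESULTS` and `retrain_gains` are hypothetical helpers introduced only for this illustration.

```python
# (dataset, WSOD method, WSOD score, retrain method, retrain score)
# VOC2007 scores are AP50 and COCO2014 scores are mAP, as in the table.
RESULTS = [
    ("VOC2007", "WeakSAM(OICR)", 58.9, "Faster R-CNN", 65.7),
    ("VOC2007", "WeakSAM(OICR)", 58.9, "DINO", 66.1),
    ("VOC2007", "WeakSAM(MIST)", 67.4, "Faster R-CNN", 71.8),
    ("VOC2007", "WeakSAM(MIST)", 67.4, "DINO", 73.4),
    ("COCO2014", "WeakSAM(OICR)", 19.9, "Faster R-CNN", 22.3),
    ("COCO2014", "WeakSAM(OICR)", 19.9, "DINO", 24.9),
    ("COCO2014", "WeakSAM(MIST)", 22.9, "Faster R-CNN", 23.8),
    ("COCO2014", "WeakSAM(MIST)", 22.9, "DINO", 26.6),
]


def retrain_gains(results):
    """Return (dataset, WSOD method, retrain method, gain) for every row."""
    return [
        (dataset, wsod, retrain, round(after - before, 1))
        for dataset, wsod, before, retrain, after in results
    ]


if __name__ == "__main__":
    for dataset, wsod, retrain, gain in retrain_gains(RESULTS):
        print(f"{dataset:9s} {wsod:14s} {retrain:13s} +{gain}")
```

The gains range from +0.9 mAP (COCO2014, MIST with Faster R-CNN) to +7.2 AP50 (VOC2007, OICR with DINO). Because the two datasets are reported with different metrics, gains should only be compared within a dataset.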

For the WSIS task:

Dataset  Retrain method  AP25  AP50  AP70  AP75
VOC2012  Mask R-CNN      70.3  59.6  43.1  36.2
VOC2012  Mask2Former     73.4  64.4  49.7  45.3

Dataset        Retrain method  AP[50:95]  AP50  AP75
COCO val2017   Mask R-CNN      20.6       33.9  22.0
COCO val2017   Mask2Former     25.2       38.4  27.0
COCO test-dev  Mask R-CNN      21.0       34.5  22.2
COCO test-dev  Mask2Former     25.9       39.9  27.9
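
The AP25, AP50, AP70 and AP75 columns differ only in how much overlap a predicted mask needs with a ground-truth mask to count as correct. Assuming the standard convention that the subscript is the mask IoU threshold in percent, the sketch below computes mask IoU and checks it against those thresholds. Masks are represented as sets of (x, y) pixels to keep the example dependency-free; `rect_pixels`, `mask_iou` and `matches_at_thresholds` are hypothetical helper names, not part of any evaluation toolkit.

```python
def rect_pixels(x0, y0, width, height):
    """Return the set of (x, y) pixels covered by an axis-aligned rectangle."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


def mask_iou(pred_pixels, gt_pixels):
    """Intersection over union of two masks given as collections of pixels."""
    pred, gt = set(pred_pixels), set(gt_pixels)
    union = pred | gt
    if not union:
        return 0.0  # two empty masks: define IoU as 0 to avoid dividing by zero
    return len(pred & gt) / len(union)


def matches_at_thresholds(iou, thresholds=(0.25, 0.50, 0.70, 0.75)):
    """Map each IoU threshold to whether the prediction would count as a match."""
    return {t: iou >= t for t in thresholds}


if __name__ == "__main__":
    pred = rect_pixels(0, 0, 4, 4)  # 16 pixels
    gt = rect_pixels(1, 0, 4, 4)    # same size, shifted right by one pixel
    iou = mask_iou(pred, gt)        # 12 shared pixels out of 20 in the union
    print(iou, matches_at_thresholds(iou))
```

In this toy case the prediction counts as a match at the 0.25 and 0.50 thresholds but not at 0.70 or 0.75, which is why the stricter columns in the tables report lower values.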

Data & Preliminaries

Generation & Training pipelines

Citation

If you find this repository or work helpful in your research, please consider citing the paper and giving the repository a ⭐.

@article{zhu2024weaksam,
      title={WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition}, 
      author={Lianghui Zhu and Junwei Zhou and Yan Liu and Xin Hao and Wenyu Liu and Xinggang Wang},
      year={2024},
      eprint={2402.14812},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

Thanks to these wonderful works and their codebases! ❤️ MIST, WSOD2, Segment-anything, WeakTr, SoS-WSOD
