
zs3's Introduction

Zero-Shot Semantic Segmentation

Paper

Zero-Shot Semantic Segmentation
Maxime Bucher, Tuan-Hung Vu, Matthieu Cord, Patrick Pérez
valeo.ai, France
Neural Information Processing Systems (NeurIPS) 2019

If you find this code useful for your research, please cite our paper:

@inproceedings{bucher2019zero,
  title={Zero-Shot Semantic Segmentation},
  author={Bucher, Maxime and Vu, Tuan-Hung and Cord, Matthieu and P{\'e}rez, Patrick},
  booktitle={NeurIPS},
  year={2019}
}

Abstract

Semantic segmentation models are limited in their ability to scale to large numbers of object classes. In this paper, we introduce the new task of zero-shot semantic segmentation: learning pixel-wise classifiers for never-seen object categories with zero training examples. To this end, we present a novel architecture, ZS3Net, combining a deep visual segmentation model with an approach to generate visual representations from semantic word embeddings. In this way, ZS3Net addresses pixel classification tasks where both seen and unseen categories are faced at test time (so-called "generalized" zero-shot classification). Performance is further improved by a self-training step that relies on automatic pseudo-labeling of pixels from unseen classes. On two standard segmentation datasets, Pascal-VOC and Pascal-Context, we propose zero-shot benchmarks and set competitive baselines. For complex scenes such as those in the Pascal-Context dataset, we extend our approach with a graph-context encoding to fully leverage spatial context priors coming from class-wise segmentation maps.
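
To make the idea concrete, here is a minimal sketch (a toy illustration, not the authors' exact ZS3Net architecture) of the feature-generation step: a small network maps a class word embedding plus random noise to a synthetic pixel-level visual feature, which can then be used to fine-tune the classifier for unseen classes. All layer sizes and names below are assumptions.

import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Toy generator: word embedding + noise -> synthetic visual feature.
    Dimensions are illustrative, not the ones used in ZS3Net."""
    def __init__(self, embed_dim=300, noise_dim=300, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + noise_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, feat_dim),
        )

    def forward(self, word_embedding, noise):
        # Concatenate the semantic embedding with noise and map it to the
        # visual feature space of the segmentation backbone.
        return self.net(torch.cat([word_embedding, noise], dim=1))

# Example: 8 synthetic features for one unseen class
gen = FeatureGenerator()
w2v = torch.randn(8, 300)    # stand-in for a word2vec class embedding
z = torch.randn(8, 300)      # random noise
fake_features = gen(w2v, z)  # shape: (8, 256)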

Code

Pre-requisites

  • Python 3.6
  • PyTorch 1.0 or higher
  • CUDA 9.0 or higher

Installation

  1. Clone the repo:
$ git clone https://github.com/valeoai/ZS3
  2. Install this repository and the dependencies using pip:
$ pip install -e ZS3

With this, you can edit the ZS3 code on the fly and import functions and classes of ZS3 in other projects as well (see the import sketch at the end of this section).

  3. (Optional) To uninstall this package, run:
$ pip uninstall ZS3

You can take a look at the Dockerfile if you are uncertain about the steps needed to install this project.
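
For example, after the editable install the package should be importable from any Python session (the package name zs3 is assumed from the repository layout; check the repository if it differs):

# Hypothetical usage after `pip install -e ZS3`.
import zs3
print(zs3.__file__)  # resolves into your local clone, so edits take effect immediately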

Datasets

Pascal-VOC 2012

  • Pascal-VOC 2012: Please follow the instructions here to download images and semantic segmentation annotations.

  • Semantic Boundaries Dataset: Please follow the instructions here to download images and semantic segmentation annotations. Use this train set, which excludes overlap with the Pascal-VOC validation set.

The Pascal-VOC and SBD datasets directory should have this structure:

ZS3/data/VOC2012/    % Pascal VOC and SBD datasets root
ZS3/data/VOC2012/ImageSets/Segmentation/     % Pascal VOC splits
ZS3/data/VOC2012/JPEGImages/     % Pascal VOC images
ZS3/data/VOC2012/SegmentationClass/      % Pascal VOC segmentation maps
ZS3/data/VOC2012/benchmark_RELEASE/dataset/img      % SBD images
ZS3/data/VOC2012/benchmark_RELEASE/dataset/cls      % SBD segmentation maps
ZS3/data/VOC2012/benchmark_RELEASE/dataset/train_noval.txt       % SBD train set

Pascal-Context

  • Pascal-VOC 2010: Please follow the instructions here to download images.

  • Pascal-Context: Please follow the instructions here to download segmentation annotations.

The Pascal-Context dataset directory should have this structure:

ZS3/data/context/    % Pascal context dataset root
ZS3/data/context/train.txt     % Pascal context train split
ZS3/data/context/val.txt     % Pascal context val split
ZS3/data/context/full_annotations/trainval/     % Pascal context segmentation maps
ZS3/data/context/full_annotations/labels.txt     % Pascal context 459 classes
ZS3/data/context/classes-59.txt     % Pascal context 59 classes
ZS3/data/context/VOCdevkit/VOC2010/JPEGImages     % Pascal VOC images
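
Before training, a small sanity check (paths assumed to match the layouts above) can confirm that both dataset roots are in place:

from pathlib import Path

# Expected locations, assuming the directory layouts listed above.
expected = [
    "ZS3/data/VOC2012/JPEGImages",
    "ZS3/data/VOC2012/SegmentationClass",
    "ZS3/data/VOC2012/benchmark_RELEASE/dataset/img",
    "ZS3/data/context/full_annotations/trainval",
    "ZS3/data/context/VOCdevkit/VOC2010/JPEGImages",
]

for path in expected:
    status = "ok" if Path(path).exists() else "MISSING"
    print(f"{status:8s} {path}")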

Training

Pascal-VOC

Follow the steps below to train your model:

  1. Train deeplabv3+ on the Pascal-VOC dataset with a ResNet backbone pretrained on ImageNet (weights here):
python train_pascal.py
  2. Train GMMN and fine-tune the last classification layer of the trained deeplabv3+ model (a sketch of the GMMN objective follows this list):
python train_pascal_GMMN.py
  • Main options

    • imagenet_pretrained_path: Path to ImageNet pretrained weights.
    • resume: Path to deeplabv3+ weights.
    • exp_path: Path to saved logs and weights folder.
    • checkname: Name of the saved logs and weights folder.
    • seen_classes_idx_metric: List of idx of seen classes.
    • unseen_classes_idx_metric: List of idx of unseen classes.
  • Final deeplabv3+ and GMMN weights
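
For reference, the GMMN of step 2 is a generator trained with a maximum mean discrepancy (MMD) loss that matches the statistics of generated and real features. The sketch below shows a multi-kernel Gaussian MMD loss in PyTorch; it only illustrates the objective and is not the exact loss implemented in this repository (bandwidths are illustrative).

import torch

def gaussian_mmd(x, y, sigmas=(2.0, 5.0, 10.0, 20.0, 40.0)):
    # Multi-kernel Gaussian MMD between generated features x (n, d)
    # and real features y (m, d). Bandwidths are illustrative.
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2  # pairwise squared distances
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Example: penalize the mismatch between fake and real feature statistics
fake = torch.randn(32, 256, requires_grad=True)
real = torch.randn(32, 256)
loss = gaussian_mmd(fake, real)
loss.backward()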

Pascal-Context

Follow the steps below to train your model:

  1. Train deeplabv3+ on the Pascal-Context dataset with a ResNet backbone pretrained on ImageNet (weights here):
python train_context.py
  2. Train GMMN and fine-tune the last classification layer of the trained deeplabv3+ model:
python train_context_GMMN.py
  • Main options

    • imagenet_pretrained_path: Path to ImageNet pretrained weights.
    • resume: Path to deeplabv3+ weights.
    • exp_path: Path to saved logs and weights folder.
    • checkname: Name of the saved logs and weights folder.
    • seen_classes_idx_metric: List of idx of seen classes.
    • unseen_classes_idx_metric: List of idx of unseen classes.
  • Final deeplabv3+ and GMMN weights

(2 bis). Train GMMN with graph context and fine-tune the last classification layer of the trained deeplabv3+ model (a sketch of a graph-convolution layer follows this list):

python train_context_GMMN_GCNcontext.py
  • Main options

    • imagenet_pretrained_path: Path to ImageNet pretrained weights.
    • resume: Path to deeplabv3+ weights.
    • exp_path: Path to saved logs and weights folder.
    • checkname: Name of the saved logs and weights folder.
    • seen_classes_idx_metric: List of idx of seen classes.
    • unseen_classes_idx_metric: List of idx of unseen classes.
  • Final deeplabv3+ and GMMN with graph context weights
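
For reference, the graph-context variant of step 2 bis encodes relations between class regions with graph convolutions before generating features. The sketch below shows a single, generic graph-convolution layer; it only illustrates the operation and is not the exact encoder used in this repository (dimensions and normalization are assumptions).

import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    # Toy graph convolution: average neighbour features through a
    # row-normalized adjacency matrix, then apply a linear projection.
    def __init__(self, in_dim=300, out_dim=256):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # adj: (n, n) adjacency with self-loops
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.linear(adj @ node_feats))

# Example: 5 class nodes with 300-d embeddings on a fully connected graph
nodes = torch.randn(5, 300)
adj = torch.ones(5, 5)
out = GraphConvLayer()(nodes, adj)  # shape: (5, 256)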

Testing

python eval_pascal.py
python eval_context.py
  • Main options
    • resume: Path to deeplabv3+ and GMMN weights.
    • seen_classes_idx_metric: List of idx of seen classes.
    • unseen_classes_idx_metric: List of idx of unseen classes.
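
Results in the zero-shot setting are typically summarized as separate mIoU over seen and unseen classes plus their harmonic mean. Below is a minimal sketch of that aggregation, assuming per-class IoU values have already been computed and using the same index lists as the options above:

import numpy as np

def zero_shot_summary(per_class_iou, seen_idx, unseen_idx):
    # per_class_iou: dict {class_idx: iou}; the index lists mirror
    # seen_classes_idx_metric / unseen_classes_idx_metric.
    miou_seen = np.mean([per_class_iou[i] for i in seen_idx])
    miou_unseen = np.mean([per_class_iou[i] for i in unseen_idx])
    harmonic = 2 * miou_seen * miou_unseen / (miou_seen + miou_unseen + 1e-12)
    return miou_seen, miou_unseen, harmonic

# Example with made-up IoU values
ious = {0: 0.8, 1: 0.7, 2: 0.3, 3: 0.2}
print(zero_shot_summary(ious, seen_idx=[0, 1], unseen_idx=[2, 3]))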

Acknowledgements

License

ZS3Net is released under the Apache 2.0 license.

zs3's Issues

Reproduction issue

Hi there,

I tried training the model from scratch using train_context and got fair results (31%), but the train_context_GMMN model does not go above 17% pixel accuracy for 2 unseen categories. I also downloaded the pre-trained weights you provided for 2 and 4 unseen classes on Pascal-Context and ran eval_context, but the results are hard to explain:
Seen: pixel accuracy 4.9%, mIoU 0.5%
Unseen: pixel accuracy 1%, mIoU 0.6%

Could you please provide us with correct pre-trained weights or shed some light on how to train/eval the model?

Try out on own example?

Hi, I am currently trying to get this to run. Is there an easy way to just try out your pretrained model on my own set of images? Appreciate any help!

The usage of unseen-class segmentation annotations

Dear Dr. Bucher,

We would like to thank you for your enlightening paper, "Zero-Shot Semantic Segmentation". Having read the released code carefully, we are confused about the step-2 training process (in train_pascal_GMMN.py, line 265). The line "loss = self.criterion(output, target)" means that the unseen-class segmentation annotations are inevitably involved in training the classifier. However, the annotations of unseen classes should not be used as supervision, since zero-shot learning assumes the unseen-class annotations are unavailable. We would therefore like to know how the classifier can be trained to recognize both seen and unseen classes.

Thanks again; we look forward to your response.

You cannot use unseen label info in the training process

Thanks for sharing this repo.
The idea is creative, but there is a logic error in the implementation: in short, you have used unseen-label information during training.

  1. has_unseen_class: In a real-world setting you do not know whether an image contains an unseen class; you only know the whole set of classes, the set of labeled classes, and the set of unlabeled classes.
    For VOC you should use has_unseen_class = len(unique_class) < 21.
  2. unique_class: In a real-world setting you do not know the unique classes present in an image; you only know the unique labeled classes in that image, the whole set of unseen classes, and the whole set of classes.
    You should generate features for all classes, seen and unseen, here, and use only the unique seen classes of an image to train your generator.
  3. loss = self.criterion(output, target): You cannot use the unseen labels in the target to train your classifier; it is not fair. You should add something like:
    for unseen_label in whole_unseen_label: target[target == unseen_label] = 255
    and set the ignore_index parameter of the cross-entropy loss to 255, so the unseen labels are ignored and the real-world setting is fully simulated (a sketch follows this list).
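
A minimal PyTorch sketch of the masking proposed in point 3 (tensor names and class indices are illustrative):

import torch
import torch.nn as nn

IGNORE_INDEX = 255
unseen_labels = [16, 17, 18, 19, 20]      # example unseen class indices for VOC

# Dummy logits and ground truth standing in for `output` and `target`
output = torch.randn(2, 21, 8, 8)         # (batch, classes, H, W)
target = torch.randint(0, 21, (2, 8, 8))  # (batch, H, W)

# Replace every unseen ground-truth label with the ignore index so the
# cross-entropy loss receives no supervision from unseen classes.
masked_target = target.clone()
for unseen_label in unseen_labels:
    masked_target[masked_target == unseen_label] = IGNORE_INDEX

criterion = nn.CrossEntropyLoss(ignore_index=IGNORE_INDEX)
loss = criterion(output, masked_target)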

Despite all of the above, I still believe your idea works. I hope to see updated results.

The use of unseen labels during training!

Hi there,
I have read your paper and the code, brilliant idea. I have a few questions on the implementation:

  1. Are unseen target labels being used to train the classifier? (line 264 of train_context_GMMN.py). If so, how is this method different from simply training the deeplabv3+ backbone with both seen and unseen target labels?
  2. When fine-tuning the classifier to predict unseen labels, how do you place and arrange the generated unseen visual features so that the classifier can exploit the spatial correlation between visual features? Could you please elaborate on the fine-tuning procedure of the classifier?
  3. The GCN takes in a graph generated from both the seen and unseen labels. Aren't we supposed to avoid using any information about the unseen labels during training for ZSL?

Using unseen-class GT labels makes no sense in the zero-shot setting

As discussed in other issues, some unseen-class GT labels are used in this code. However, in the zero-shot setting this is unsuitable, since we do not know anything about the unseen classes except their names. Using unseen-class GT labels makes this more like a semi-supervised task than a zero-shot task.

Dimensions of output tensors in eval_pascal.py do not match

Hi,
When I was running eval_pascal.py, an error occurred:

Traceback (most recent call last):
  File "eval_pascal.py", line 166, in validation
    all_target = np.concatenate(all_target)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 2, the array at index 0 has size 700 and the array at index 1 has size 765

I printed target.shape for every element of all_target and the dimensions vary a lot.

Pascal context splits

Hi,

I found that the training and validation splits for Pascal-Context are not provided on its website. Could someone please tell me where I can get them?

How can I reproduce the results for the baseline model described in the experiments section?

Hello,

I read your paper and checked the repository. However, I couldn't find the code responsible for training and evaluating the baseline model discussed in the experiments section, which is said to replace the classification layer of DeepLabv3+ with a projection layer that maps the extracted visual features onto the semantic embedding space, where cosine similarity is then computed. In the paper, this baseline is said to be based on DeViSE for zero-shot image classification. Could you please point me to the location in the repository where this experiment is conducted? If it is not available in this repository, could you please add the code needed to replicate it?

Thank you in advance.
