
SIPE's Introduction

Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation (SIPE)

Framework overview (figure)

The implementation of Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation (CVPR 2022) by Qi Chen, Lingxiao Yang, Jianhuang Lai, and Xiaohua Xie.

Abstract

Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels has attracted much attention due to its low annotation cost. Existing methods often rely on Class Activation Mapping (CAM), which measures the correlation between image pixels and the classifier weights. However, the classifier focuses only on the discriminative regions while ignoring other useful information in each image, resulting in incomplete localization maps. To address this issue, we propose Self-supervised Image-specific Prototype Exploration (SIPE), which consists of an Image-specific Prototype Exploration (IPE) step and a General-Specific Consistency (GSC) loss. Specifically, IPE tailors prototypes for every image to capture complete regions, forming our Image-Specific CAM (IS-CAM), which is realized in two sequential steps. In addition, GSC is proposed to enforce consistency between the general CAM and our specific IS-CAM, which further optimizes the feature representation and gives prototype exploration a self-correction ability. Extensive experiments are conducted on the PASCAL VOC 2012 and MS COCO 2014 segmentation benchmarks, and the results show that SIPE achieves new state-of-the-art performance using only image-level labels.
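For a concrete sense of what IPE and IS-CAM mean in practice, here is a minimal sketch, not the repository's actual implementation: an image-specific prototype is pooled for each class from the features in regions seeded by the general CAM, and the IS-CAM is the similarity between every pixel feature and that prototype. The tensor shapes, the seeding threshold, and the cosine-similarity choice are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def is_cam_sketch(feature, cam, thresh=0.3, eps=1e-5):
    """Illustrative sketch of an Image-Specific CAM (IS-CAM).

    feature: (B, D, H, W) backbone features
    cam:     (B, C, H, W) general CAM, max-normalized to [0, 1]
    Returns an image-specific CAM of shape (B, C, H, W).
    """
    B, D, H, W = feature.shape
    C = cam.shape[1]

    # Step 1: seed regions from the general CAM and pool an
    # image-specific prototype per class (masked average pooling).
    seed = (cam > thresh).float()                              # (B, C, H, W)
    feat = feature.view(B, 1, D, H * W)                        # (B, 1, D, HW)
    mask = seed.view(B, C, 1, H * W)                           # (B, C, 1, HW)
    prototype = (feat * mask).sum(-1) / (mask.sum(-1) + eps)   # (B, C, D)

    # Step 2: IS-CAM = cosine similarity between each pixel feature
    # and the image-specific prototypes.
    feat_n = F.normalize(feature.view(B, D, H * W), dim=1)     # (B, D, HW)
    proto_n = F.normalize(prototype, dim=2)                    # (B, C, D)
    is_cam = torch.bmm(proto_n, feat_n).view(B, C, H, W)       # (B, C, H, W)
    return torch.relu(is_cam)
```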

Environment

  • Python >= 3.6.6
  • PyTorch >= 1.6.0
  • torchvision

Usage

Step 1. Prepare Dataset

Step 2. Train SIPE

```bash
# PASCAL VOC 2012
bash run_voc.sh

# MS COCO 2014
bash run_coco.sh
```

Step 3. Train Fully Supervised Segmentation Models

To train fully supervised segmentation models, we refer to deeplab-pytorch and seamv1.

Results

Localization maps

| Dataset | Model | mIoU (Train) | Weight | Training log |
| --- | --- | --- | --- | --- |
| PASCAL VOC 2012 | CVPR submit | 58.65 | Download | Logfile |
| PASCAL VOC 2012 | This repo | 58.88 | Download | Logfile |
| MS COCO 2014 | CVPR submit | 34.41 | Download | Logfile |
| MS COCO 2014 | This repo | 35.05 | Download | Logfile |

Segmentation maps

| Dataset | Model | mIoU (Val) | mIoU (Test) | Weight |
| --- | --- | --- | --- | --- |
| PASCAL VOC 2012 | WideResNet38 | 68.2 | 69.5 | Download |
| PASCAL VOC 2012 | ResNet101 | 68.8 | 69.7 | Download |
| MS COCO 2014 | WideResNet38 | 43.6 | - | Download |
| MS COCO 2014 | ResNet101 | 40.6 | - | Download |

Citation

@InProceedings{Chen_2022_CVPR_SIPE,
    author    = {Chen, Qi and Yang, Lingxiao and Lai, Jian-Huang and Xie, Xiaohua},
    title     = {Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {4288-4298}
}

SIPE's People

Contributors

chenqi1126

SIPE's Issues

Experiment question

```python
        # if (optimizer.global_step - 1) % 500 == 0 and optimizer.global_step > 10:
        #     miou = validate(model, val_data_loader)
        #     torch.save({'net': model.module.state_dict()},
        #                os.path.join("sess", 'ckpt', 'iter_' + str(optimizer.global_step) + '.pth'))
        #
        #     if miou > bestiou:
        #         bestiou = miou
        #         torch.save({'net': model.module.state_dict()},
        #                    os.path.join("sess", 'ckpt', 'best.pth'))

torch.save({'net': model.module.state_dict()}, os.path.join("sess", 'ckpt', 'final.pth'))
torch.cuda.empty_cache()
```

Why does the result drop to 58.25% when I comment out the code above (i.e., do not save the per-iteration checkpoints and best.pth) and only save the final 'final.pth'?

Normalization of the CAM

Hello, thank you for your excellent work. I have read your paper and some details of the code carefully, and there are a few points I do not quite understand. The code shows that the original CAM and the IS-CAM need to be normalized:

```python
norm_cam = norm_cam / (F.adaptive_max_pool2d(norm_cam, (1, 1)) + 1e-5)
IS_cam = IS_cam / (F.adaptive_max_pool2d(IS_cam, (1, 1)) + 1e-5)
```

At the same time, I do not understand why the features need further processing:

```python
feature_s = feature_s / (torch.norm(feature_s, dim=1, keepdim=True) + 1e-5)  # B C (H*W)
```
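For context, here is a small, self-contained illustration (toy tensors, not the repository's code) of what these two operations do: the first rescales each class map so its maximum activation becomes roughly 1, and the second L2-normalizes the features along the channel dimension so that inner products along that dimension behave like cosine similarities.

```python
import torch
import torch.nn.functional as F

cam = torch.rand(2, 21, 32, 32)        # (B, C, H, W) toy activation maps
feat = torch.rand(2, 256, 32 * 32)     # (B, D, H*W) toy features

# Max-normalization: every class map now peaks at (approximately) 1.
cam = cam / (F.adaptive_max_pool2d(cam, (1, 1)) + 1e-5)
print(cam.amax(dim=(2, 3)))            # close to 1 for every class

# L2-normalization over channels: each spatial feature has unit length,
# so dot products with other unit vectors are cosine similarities.
feat = feat / (torch.norm(feat, dim=1, keepdim=True) + 1e-5)
print(feat.norm(dim=1).mean())         # close to 1
```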

Question

Hello author, I am very interested in your work. Your DeepLab v2 segmentation result is 68.8. I would like to know which pre-trained weights you used when running DeepLab v2: deeplabv1_resnet101-coco.pth or deeplabv1_resnet101-imagenet.pth?

About the pretrained backbone for DeepLab V2

Hi,
Thanks for your interesting work. I notice that you have trained a DeepLabV2 semantic segmentation network. Regarding the pre-training of ResNet101, is it done on COCO or ImageNet? I notice that you give two repos (deeplab-pytorch and seamv1) as guidance, but deeplab-pytorch uses COCO-pretrained weights, while seamv1 uses ImageNet-pretrained weights. Since you did not explain this in the paper or in this repo, I am opening this issue for an answer. Looking forward to your reply.

Sincerely,
Tomsa.

Why do you use the path train_ rather than trainaug_ in make_cam.py?

Thanks for your work! When I follow the settings of make_cam.py, I get an exception during train_irn.py: FileNotFoundError: No such file 'XXXX/SIPE-main/exp/ir_label/2010_000317.png'. I then found that there is no 2010_000317.png in train_voc. How can I solve this?

Is your final result obtained after CRF?

The result I reproduced (after IRNet processing) gives a train mIoU of 68.x, and the CAM result is consistent with your 58.5.

Error found while running `python train_resnet50_SIPE.py`

File "train_resnet50_SIPE.py", line 191, in
train()
File "train_resnet50_SIPE.py", line 145, in train
losses.backward()
File "/home/xjt/anaconda3/envs/xjt/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/xjt/anaconda3/envs/xjt/lib/python3.6/site-packages/torch/autograd/init.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 21, 32, 32]], which is output 0 of GatherBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Could you suggest how to fix this?
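As general PyTorch context rather than the repository's official fix: errors like this are usually localized by enabling anomaly detection, which makes the next backward() also report the forward operation whose output was later modified in place; the offending in-place operation (e.g. `x += ...`, indexed assignment, or `inplace=True` modules) is then rewritten out of place. A minimal sketch:

```python
import torch

# 1) Turn on anomaly detection; the next backward() error will also point
#    at the forward operation whose output was later modified in place.
torch.autograd.set_detect_anomaly(True)

# 2) Typical rewrite: replace an in-place update on a tensor that autograd
#    still needs with an out-of-place equivalent.
x = torch.randn(4, requires_grad=True)
y = x * 2

# y += 1          # in-place: can break backward if y is needed for gradients
y = y + 1          # out-of-place: safe

y.sum().backward()
print(x.grad)
```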

Resnet38 implementation

Hello! Thanks for your great work!

In your paper, SIPE shows SOTA performance with ResNet-38 on the COCO dataset.

I want to implement that.

Can you share the ResNet-38 network used in the experiments?

Thanks.

About DenseCRF

Hello, thank you for providing an excellent paper. As a beginner, there are some things I am not familiar with, and I would like to ask for your advice:
Regarding the handling of DenseCRF, you said, "Add a preprocess to convert the uncertainty area (255) to the background (0) before calculation." Can you provide a detailed explanation?
Sorry to bother you, and I hope you can reply!
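For readers who hit the same point, here is a minimal sketch of such a preprocessing step, assuming the pseudo label is stored as an 8-bit PNG in which the uncertain region has the value 255; the file paths are purely illustrative:

```python
import numpy as np
from PIL import Image

# Hypothetical input/output paths for illustration only.
label = np.array(Image.open("exp/pseudo_label/2007_000032.png"))

# Treat the uncertain region (255) as background (0) before the
# subsequent calculation (e.g. mIoU evaluation or CRF refinement).
label[label == 255] = 0

Image.fromarray(label.astype(np.uint8)).save("exp/pseudo_label_clean/2007_000032.png")
```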

The result of DenseCRF in Table 1

Hi, I tried to reproduce the results of Table 1 in the paper, but the result of +DenseCRF is only 62.05 (using your trained weights to infer the CAM files). Could you share your CRF post-processing code and CRF hyper-parameters? Thanks!

The effect of valid_mask?

Hi! Thanks for the great job!
When I read the code, I was confused about what valid_mask does. From data_voc.py I can see that it is a (16, 21, h, w) matrix with some values set to 1, determined by the random cropping. However, in resnet50_SIPE.py there is:

```python
norm_cam = F.interpolate(norm_cam, side3.shape[2:], mode='bilinear', align_corners=True) * valid_mask
```

What does this code mean?
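As a toy illustration (not the repository's code), assuming valid_mask is 1 where the random crop contains real image content and 0 where it is padding, as the issue describes, multiplying by it simply zeroes out activations in the padded region:

```python
import torch
import torch.nn.functional as F

B, C = 1, 21
cam = torch.rand(B, C, 2, 2)            # low-resolution activation map

# Toy valid_mask at the target resolution: 1 where the random crop contains
# real image content, 0 where it is padding.
valid_mask = torch.zeros(B, C, 4, 4)
valid_mask[..., :, :2] = 1

# Upsample the CAM to the target resolution, then zero out activations in
# the padded region so they cannot contribute to later computations.
masked_cam = F.interpolate(cam, valid_mask.shape[2:], mode='bilinear',
                           align_corners=True) * valid_mask
print(masked_cam[0, 0])
```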

CAM performance in table 4

Hey, thank you for your work.
I just have a question about the CAM performance in Table 4, where the mIoU value is 50.1%.
Since recent methods based on ResNet-50 and VGG-16 only reach CAM scores of around 48%, did you add any extra processing to reach that performance?

By the way, there may be an error in Section 4.3: compared with the original CAM, IPE improves the CAM by 3.1% (which is stated as 2.1% in the paper).

The weights of the 4 extra_conv layers used to get hierarchical features in IPE

Hey, according to your paper and code, there are 4 extra_conv layers used to obtain hierarchical features in IPE; I guess their main purpose is to reduce dimensionality.
My question is how you update their weights when only IPE is used (as mentioned in Table 4, the mIoU reaches 53.2%).
I can understand that in the full framework the weights can be updated by the Lgsc loss, but with IPE alone there is only Lcls, so how are their weights updated?

DenseCRF

Hello, how did you implement the DenseCRF post-processing in your paper? I tried applying DenseCRF post-processing to the localization maps in our project following the DenseCRF project on GitHub, but it resulted in a performance decrease.
