yudewang / seam

Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation, CVPR 2020 (Oral)

License: MIT License

Python 100.00%

seam's Issues

Loss_ER is too small. Is it really helpful?

Thanks for your wonderful work.
However, reading your code, I find that loss_er is very small compared with the other two losses. The reason is that you apply the mean operation directly over all channels, although many of them are 0 (you set C-1 channels to 0).

I also find that the improvement from loss_er reported in your paper is much smaller than that from loss_ecr. Could this be a bug?

I am sorry that I do not have enough GPUs to reproduce it.
Looking forward to your reply.
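
To make the scale argument concrete, here is a minimal NumPy sketch (not the repository's code). It assumes an L1-style difference between two CAM tensors in which the channels of absent classes have been zeroed; the shapes, the random data, and the helper names `er_loss_mean_all` and `er_loss_active_only` are illustrative assumptions.

```python
import numpy as np

def er_loss_mean_all(cam_a, cam_b):
    """L1 difference averaged over every channel, including zeroed ones."""
    return float(np.mean(np.abs(cam_a - cam_b)))

def er_loss_active_only(cam_a, cam_b, label):
    """L1 difference averaged only over channels whose class is present."""
    active = label.astype(bool)
    if not active.any():
        return 0.0
    return float(np.mean(np.abs(cam_a[active] - cam_b[active])))

rng = np.random.default_rng(0)
num_classes, h, w = 20, 8, 8
label = np.zeros(num_classes)
label[3] = 1.0  # hypothetical image containing a single class

# Two CAMs that disagree; channels of absent classes are masked to zero.
cam_a = rng.random((num_classes, h, w)) * label[:, None, None]
cam_b = rng.random((num_classes, h, w)) * label[:, None, None]

loss_all = er_loss_mean_all(cam_a, cam_b)
loss_active = er_loss_active_only(cam_a, cam_b, label)
# The two values differ by num_classes / (number of present classes).
ratio = loss_active / loss_all
```

In this toy case the plain mean is smaller by a constant factor only, so it rescales the gradient rather than removing it; whether that matters in practice depends on the loss weights and the optimizer.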

Question about the classification loss

SEAM is really excellent work. After reading the paper, I have two questions:

How do you get the final segmentation mask? In my understanding, SEAM finally outputs a CAM, and then a random walk is used to produce the final mask. Am I right?
How is the classification loss calculated? For example, the final output is

and we can also calculate the background as:

But how can we use these two results to calculate the loss? How is the ground truth generated? Is img(m, n) = c (the true label) the ground truth?

Any suggestion is appreciated!
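
One common way to train with image-level labels only, sketched below in NumPy, is to pool each foreground CAM channel into a single score and compare the scores with the multi-hot image label through a multi-label soft-margin loss. This is a sketch under the assumption that the foreground channels are globally average pooled; the function names and the toy data are illustrative and not taken from the repository.

```python
import numpy as np

def global_average_pool(cam):
    """Reduce a (C, H, W) CAM to one score per class."""
    return cam.mean(axis=(1, 2))

def multilabel_soft_margin(scores, target):
    """Mean over classes of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]."""
    log_sig_pos = -np.logaddexp(0.0, -scores)  # log(sigmoid(x))
    log_sig_neg = -np.logaddexp(0.0, scores)   # log(1 - sigmoid(x))
    return float(-np.mean(target * log_sig_pos + (1.0 - target) * log_sig_neg))

# Toy CAM with 3 foreground classes: class 0 responds strongly, others do not.
cam = np.full((3, 4, 4), -2.0)
cam[0] = 2.0

target = np.array([1.0, 0.0, 0.0])        # image-level label, no pixel labels
wrong_target = np.array([0.0, 1.0, 0.0])

scores = global_average_pool(cam)
loss_correct = multilabel_soft_margin(scores, target)
loss_wrong = multilabel_soft_margin(scores, wrong_target)
```

Under this reading the "ground truth" is only the image-level label vector; no per-pixel value such as img(m, n) = c is needed for the classification loss.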

Training the segmentation code

Hello!

Thank you for sharing the excellent code.

I am trying to reproduce the performance you reported. I tried to train on the output of the affinity network [Ahn et al.] with the segmentation code from https://github.com/itijyou/ademxapp

But the training failed. Can you share the hyper-parameters, or any changes you made for training?
From the AffinityNet code I found that the author switched from SGD to Adam.

You may not remember the details, but even a small hint would help.

Thank you.

Medical image segmentation

Can this network be used for medical image segmentation?

Performance from the provided weights

Hello!

Thanks for sharing the code.

I ran your code with the trained weights and got lower performance than reported in the paper:
60.076% mIoU on the validation set.

My inference steps are:

Infer the CAMs (npy files) with infer_SEAM.py and the trained model.
Infer the segmentation maps (png files) with infer_aff.py and the trained model.
Evaluate the png files against the gt files.

Is there anything I missed?
Thank you!

result images

How can I get the final result images? Is the generated npy file converted to an image?

About training deeplab

Hi, thank you for your excellent work.
Can you provide the code for training the fully-supervised DeepLab model? Or can you give me some hints about how you initialize DeepLab to train on pseudo labels? Did you train it from scratch, use a backbone trained on ImageNet, or use parameters pretrained on COCO?
Thank you for your reply.

cam multiplied by GT label?

Both during training and inference the cam output is multiplied by the ground truth label.

training:

Line 123: cam_rv1 = F.interpolate(visualization.max_norm(cam_rv1),scale_factor=scale_factor,mode='bilinear',align_corners=True)*label
Line 129: cam_rv2 = visualization.max_norm(cam_rv2)*label

inference:
Line 63: cam = cam.cpu().numpy() * label.clone().view(20, 1, 1).numpy()

Is that done in error? How can we assume that labels are available during inference?
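
For reference, the multiplication discussed here zeroes every CAM channel whose class is absent from the image-level label. A minimal NumPy sketch of that effect, with illustrative shapes and a hypothetical helper name (it does not settle whether using labels at that stage is appropriate):

```python
import numpy as np

def mask_cam_by_label(cam, label):
    """Zero out CAM channels whose class is absent from the image-level label."""
    return cam * label.reshape(-1, 1, 1)

cam = np.ones((4, 2, 2), dtype=np.float32)        # 4 classes, 2x2 map
label = np.array([0, 1, 0, 1], dtype=np.float32)  # classes 1 and 3 present
masked = mask_cam_by_label(cam, label)
```

After masking, only channels 1 and 3 can win a later argmax; activations of classes that are not in the label are discarded.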

GPU and batch size?

Thanks for your great work!
I noticed that in your paper you mentioned: The model is trained on 4 TITAN-Xp GPUs with batch size 8 for 8 epochs.
However, I train the SEAM on 4 2080Ti GPUs with batch size 8, and find that each card only took up about 4G memory.
So I wonder, are 4×12G GPUs necessary?
Thanks for your reply.

training with my custom dataset

Hi. Thank you for sharing your code.

I'm trying to train the model with my custom dataset.
The number of classes is 3, so I changed the code in resnet38_SEAM.py

line 16 : self.fc8 = nn.Conv2d(4096, 4, 1, bias=False)

I only changed this dimension and ran the code, but an error occurs.
I thought it was a CUDA problem, so I changed the batch size to 2,
but the result is the same.

I found that the error occurs when the loss is NaN.
After some iterations, loss_cls1 and loss_cls2 become NaN.

THCudaCheck FAIL file=C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMathPointwise.cu line=253 error=59 : device-side assert triggered
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [0,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [1,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [2,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [3,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [4,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [5,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "C:/Users/Goeun/PycharmProjects/SEAM2/train_SEAM.py", line 144, in <module>
    loss.backward()
  File "C:\Users\Goeun\miniconda3\envs\seam\lib\site-packages\torch\tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\Goeun\miniconda3\envs\seam\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMathPointwise.cu:253

Process finished with exit code 1
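
The assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` comes from a scatter/gather kernel, which usually means that some index lies outside the valid range. One possible cause (an assumption, since the data loader is not shown) is a class index in the custom annotations that is not in [0, num_classes). A CPU-side check such as the following, with a hypothetical helper `to_multi_hot`, can rule that out before training:

```python
def to_multi_hot(class_ids, num_classes):
    """Build a multi-hot label vector, rejecting out-of-range class indices."""
    bad = [c for c in class_ids if not 0 <= c < num_classes]
    if bad:
        raise ValueError(
            f"class ids {bad} are outside the valid range 0..{num_classes - 1}"
        )
    vec = [0.0] * num_classes
    for c in class_ids:
        vec[c] = 1.0
    return vec

# With 3 foreground classes the valid ids are 0, 1 and 2.
ok = to_multi_hot([0, 2], num_classes=3)
```

Errors raised on the CPU this way point at the offending sample directly, whereas a device-side assert is reported asynchronously and often far from its cause.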

Couldn't download pretrained models

Replace AffinityNet with IRN

Dear YudeWang,
Thanks for your code!
Have you tried replacing AffinityNet with IRN? I get a worse result when I replace AffinityNet with IRN. Could you give me some advice about this?

Can you give some details about training?

Dear Wang, I used the default training script to run train.py. However, the results are lower than reported in the paper (the mIoU I get on the VOC2012 val set is 0.388). Can you give some details about training, such as the hyper-parameters?

The parameters in optimizer

Hello, I note that the order of parameters (params, lr, wd) in PolyOptimizer is different from that of the official SGD (params, lr, momentum). So I think the value of wd is actually assigned to momentum. Is that so?

class PolyOptimizer(torch.optim.SGD):

    def __init__(self, params, lr, weight_decay, max_step, momentum=0.9):
        super().__init__(params, lr, weight_decay)
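
The effect described in this issue can be reproduced without PyTorch. The sketch below uses a stand-in class, `SGDLike`, that only mirrors the leading parameter order of `torch.optim.SGD` (params, lr, momentum, dampening, weight_decay); the two subclasses are hypothetical and exist only to contrast positional and keyword forwarding.

```python
class SGDLike:
    """Stand-in mirroring the leading parameter order of torch.optim.SGD."""

    def __init__(self, params, lr, momentum=0.0, dampening=0.0, weight_decay=0.0):
        self.defaults = dict(
            lr=lr, momentum=momentum, dampening=dampening, weight_decay=weight_decay
        )

class PolyPositional(SGDLike):
    def __init__(self, params, lr, weight_decay, max_step, momentum=0.9):
        # The third positional slot of the parent is momentum, not weight_decay.
        super().__init__(params, lr, weight_decay)

class PolyKeyword(SGDLike):
    def __init__(self, params, lr, weight_decay, max_step, momentum=0.9):
        # Keyword arguments bind each value to the intended option.
        super().__init__(params, lr, momentum=momentum, weight_decay=weight_decay)

positional = PolyPositional([], 0.01, 5e-4, max_step=1000).defaults
keyword = PolyKeyword([], 0.01, 5e-4, max_step=1000).defaults
```

With positional forwarding the default momentum becomes 5e-4 and the default weight decay stays 0. The quoted snippet alone does not show whether per-group options are set elsewhere in the training script, so the practical impact on weight decay is not settled by this sketch.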

Confused about the code: "cam = np.flip(cam, axis=-1)"

SEAM/infer_SEAM.py

Line 64 in c556016

cam = np.flip(cam, axis=-1)

Could you explain why the code needs the flip operation in line 64? Several papers have also used this operation.
Thank you!
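
A common reason for such a flip is horizontal-flip test-time augmentation: the network is also run on a mirrored copy of the image, so the resulting CAM has to be mirrored back before it is merged with the CAM of the original image. The sketch below illustrates this with a stand-in "model" that simply returns its input; it is an assumption about the intent of the line, not a description of the exact inference loop.

```python
import numpy as np

def fake_cam_model(image):
    """Stand-in for a network: the 'CAM' is just a copy of the input."""
    return image.copy()

def infer_with_flip(image):
    """Average the CAM of the image with the re-aligned CAM of its mirror."""
    cam = fake_cam_model(image)
    cam_from_flipped = fake_cam_model(np.flip(image, axis=-1))
    # Mirror the second CAM back so both are in the original coordinates.
    cam_realigned = np.flip(cam_from_flipped, axis=-1)
    return (cam + cam_realigned) / 2.0

image = np.arange(12, dtype=np.float32).reshape(1, 3, 4)
merged = infer_with_flip(image)
```

Without the second flip the two maps would be averaged in different coordinate frames, which blurs the localization.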

Why is the final result on the test set better than on the val set?

Hi, thanks for sharing your work. Did you train the segmentation network with the train set only (not the train_aug set) and get 64.5 mIoU on the val set?
And can you explain why the final result on the test set is better than on the val set? This confuses me.

Why using cls labels to generate CAMs at inference time? Is it valid?

At val / test time, in infer_SEAM.py (line 79 to line 82), you use GT cls labels to choose the CAMs of these categories and save these selected CAMs as .npy files. I am wondering whether using GT cls labels at inference time is valid in weakly-supervised semantic segmentation. Could you give me some hints? Many thanks!

Performance about `SEAM step`, step3 and `Random walk step`, step3 in README?

Thanks for sharing!
Can you report the mIoU for SEAM step, step3 and Random walk step, step3 in the README?
AffinityNet trains another segmentation network with the pseudo labels, and the related source code is not open. I am not sure whether I can reproduce the results mentioned in AffinityNet because I am not familiar with training DeepLab. Thanks!

crf code should be changed

    def _crf_with_alpha(cam_dict, alpha):
        v = np.array(list(cam_dict.values()))
        bg_score = np.power(1 - np.max(v, axis=0, keepdims=True), alpha)
        bgcam_score = np.concatenate((bg_score, v), axis=0)
        crf_score = imutils.crf_inference(orig_img, bgcam_score, labels=bgcam_score.shape[0])
        pred_map = crf_score.argmax(0).astype(np.uint8)
        keys = np.array(list(cam_dict.keys()))+1
        keys = np.pad(keys, (1, 0), mode='constant')
        pred_map = keys[pred_map]
        return pred_map


    for t in crf_alpha:
        crf = _crf_with_alpha(cam_dict, t)
        folder = args.out_crf + ('_%.1f' % t)
        if not os.path.exists(folder):
            os.makedirs(folder)
        import imageio
        imageio.imsave(os.path.join(folder, "%s.png" % img_name), crf.astype(np.uint8))



    print(iter)

Is there any ablation study of the ECR loss?

The performance improvement is mostly obtained by the PCM module, which is constrained by the ECR loss, so the effect of the ECR loss is more important than that of the ER loss.
Can you provide an ablation study of the ECR loss?
It would be better to add an ablation study of the ECR loss to Table 1.

OHEM

Hello, Yude.

Thanks for sharing this great work!

I have one question about table 1. You mentioned that you reported results in table 1 with the training set. Then, it seems OHEM process should be involved with train_SEAM.py. Is that correct? Does your repo include OHEM process? How can I use OHEM in your code?

Thanks

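About the CRF parameter settings

Hello, I find this repository very convenient, but I have a question: how did you set the CRF parameters? For example, why are the values 4 and 24 in infer_SEAM? I saw that someone asked this before, and you answered "just set bkg_score_low<best bkg_score<bkg_score_high". When experimenting with my own model, I found that the best bkg_score is 0.21, so in principle your setting should work, but the result after CRF refinement is actually lower. Is there a good way to tune these parameters?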
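
One way to relate the two alpha values to a constant background score: with a background score of the form (1 - max CAM) ** alpha, as in the `_crf_with_alpha` snippet quoted in an earlier issue, the strongest foreground class beats the background at a pixel exactly when its activation m satisfies m > (1 - m) ** alpha. The sketch below solves for that crossing point by bisection. It ignores the CRF itself, and the function name is an illustrative assumption.

```python
def equivalent_threshold(alpha, tol=1e-9):
    """Solve (1 - m) ** alpha == m for m in (0, 1) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if (1.0 - mid) ** alpha > mid:
            lo = mid  # background still wins at mid, so the crossing is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

t_alpha4 = equivalent_threshold(4)
t_alpha24 = equivalent_threshold(24)
```

Under this reading, alpha = 24 corresponds to a threshold of roughly 0.09 and alpha = 4 to roughly 0.28, so a best constant background score of 0.21 lies between them, which is consistent with the quoted rule "bkg_score_low<best bkg_score<bkg_score_high". It does not explain a drop after CRF refinement, which also depends on the pairwise CRF parameters.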

.

Large performance gap between trained model using default setting and the provided trained model.

With the provided trained 'resnet38_SEAM.pth', the results of SEAM step evaluation:

0/60 background score: 0.000 mIoU: 28.861%
1/60 background score: 0.010 mIoU: 32.021%
2/60 background score: 0.020 mIoU: 35.937%
3/60 background score: 0.030 mIoU: 39.372%
4/60 background score: 0.040 mIoU: 42.470%
5/60 background score: 0.050 mIoU: 45.309%
6/60 background score: 0.060 mIoU: 47.967%
7/60 background score: 0.070 mIoU: 50.436%
8/60 background score: 0.080 mIoU: 52.721%
9/60 background score: 0.090 mIoU: 54.865%
10/60 background score: 0.100 mIoU: 56.885%
11/60 background score: 0.110 mIoU: 58.777%
12/60 background score: 0.120 mIoU: 60.595%
13/60 background score: 0.130 mIoU: 62.310%
14/60 background score: 0.140 mIoU: 63.905%
15/60 background score: 0.150 mIoU: 65.372%
16/60 background score: 0.160 mIoU: 66.710%
17/60 background score: 0.170 mIoU: 67.907%
18/60 background score: 0.180 mIoU: 68.925%
19/60 background score: 0.190 mIoU: 69.758%
20/60 background score: 0.200 mIoU: 70.414%
21/60 background score: 0.210 mIoU: 71.014%
22/60 background score: 0.220 mIoU: 71.291%
23/60 background score: 0.230 mIoU: 71.324%
24/60 background score: 0.240 mIoU: 71.143%
25/60 background score: 0.250 mIoU: 70.799%
26/60 background score: 0.260 mIoU: 70.287%
27/60 background score: 0.270 mIoU: 69.664%
28/60 background score: 0.280 mIoU: 68.952%
29/60 background score: 0.290 mIoU: 68.148%
30/60 background score: 0.300 mIoU: 67.274%
31/60 background score: 0.310 mIoU: 66.322%
32/60 background score: 0.320 mIoU: 65.305%
33/60 background score: 0.330 mIoU: 64.232%
34/60 background score: 0.340 mIoU: 63.105%
35/60 background score: 0.350 mIoU: 61.939%
36/60 background score: 0.360 mIoU: 60.727%
37/60 background score: 0.370 mIoU: 59.485%
38/60 background score: 0.380 mIoU: 58.215%
39/60 background score: 0.390 mIoU: 56.921%
40/60 background score: 0.400 mIoU: 55.609%
41/60 background score: 0.410 mIoU: 54.281%
42/60 background score: 0.420 mIoU: 52.940%
43/60 background score: 0.430 mIoU: 51.605%
44/60 background score: 0.440 mIoU: 50.279%
45/60 background score: 0.450 mIoU: 48.955%
46/60 background score: 0.460 mIoU: 47.630%
47/60 background score: 0.470 mIoU: 46.303%
48/60 background score: 0.480 mIoU: 44.982%
49/60 background score: 0.490 mIoU: 43.653%
50/60 background score: 0.500 mIoU: 42.330%
51/60 background score: 0.510 mIoU: 41.015%
52/60 background score: 0.520 mIoU: 39.709%
53/60 background score: 0.530 mIoU: 38.409%
54/60 background score: 0.540 mIoU: 37.119%
55/60 background score: 0.550 mIoU: 35.848%
56/60 background score: 0.560 mIoU: 34.601%
57/60 background score: 0.570 mIoU: 33.372%
58/60 background score: 0.580 mIoU: 32.158%
59/60 background score: 0.590 mIoU: 30.959%

With the 'resnet38_SEAM.pth' that I trained myself using the default settings (except that I used two GPU cards; the batch size was still set to 8), the results of SEAM step evaluation:

0/60 background score: 0.000 mIoU: 22.938%
1/60 background score: 0.010 mIoU: 26.294%
2/60 background score: 0.020 mIoU: 30.367%
3/60 background score: 0.030 mIoU: 33.779%
4/60 background score: 0.040 mIoU: 36.815%
5/60 background score: 0.050 mIoU: 39.461%
6/60 background score: 0.060 mIoU: 41.722%
7/60 background score: 0.070 mIoU: 43.691%
8/60 background score: 0.080 mIoU: 45.386%
9/60 background score: 0.090 mIoU: 46.875%
10/60 background score: 0.100 mIoU: 48.230%
11/60 background score: 0.110 mIoU: 49.466%
12/60 background score: 0.120 mIoU: 50.592%
13/60 background score: 0.130 mIoU: 51.575%
14/60 background score: 0.140 mIoU: 52.443%
15/60 background score: 0.150 mIoU: 53.182%
16/60 background score: 0.160 mIoU: 53.806%
17/60 background score: 0.170 mIoU: 54.334%
18/60 background score: 0.180 mIoU: 54.759%
19/60 background score: 0.190 mIoU: 55.087%
20/60 background score: 0.200 mIoU: 55.339%
21/60 background score: 0.210 mIoU: 55.510%
22/60 background score: 0.220 mIoU: 55.590%
23/60 background score: 0.230 mIoU: 55.594%
24/60 background score: 0.240 mIoU: 55.525%
25/60 background score: 0.250 mIoU: 55.382%
26/60 background score: 0.260 mIoU: 55.169%
27/60 background score: 0.270 mIoU: 54.892%
28/60 background score: 0.280 mIoU: 54.556%
29/60 background score: 0.290 mIoU: 54.155%
30/60 background score: 0.300 mIoU: 53.685%
31/60 background score: 0.310 mIoU: 53.182%
32/60 background score: 0.320 mIoU: 52.640%
33/60 background score: 0.330 mIoU: 52.064%
34/60 background score: 0.340 mIoU: 51.445%
35/60 background score: 0.350 mIoU: 50.793%
36/60 background score: 0.360 mIoU: 50.107%
37/60 background score: 0.370 mIoU: 49.380%
38/60 background score: 0.380 mIoU: 48.624%
39/60 background score: 0.390 mIoU: 47.837%
40/60 background score: 0.400 mIoU: 47.029%
41/60 background score: 0.410 mIoU: 46.199%
42/60 background score: 0.420 mIoU: 45.353%
43/60 background score: 0.430 mIoU: 44.483%
44/60 background score: 0.440 mIoU: 43.593%
45/60 background score: 0.450 mIoU: 42.681%
46/60 background score: 0.460 mIoU: 41.749%
47/60 background score: 0.470 mIoU: 40.809%
48/60 background score: 0.480 mIoU: 39.855%
49/60 background score: 0.490 mIoU: 38.890%
50/60 background score: 0.500 mIoU: 37.914%
51/60 background score: 0.510 mIoU: 36.934%
52/60 background score: 0.520 mIoU: 35.954%
53/60 background score: 0.530 mIoU: 34.974%
54/60 background score: 0.540 mIoU: 33.988%
55/60 background score: 0.550 mIoU: 32.998%
56/60 background score: 0.560 mIoU: 32.011%
57/60 background score: 0.570 mIoU: 31.033%
58/60 background score: 0.580 mIoU: 30.064%
59/60 background score: 0.590 mIoU: 29.102%
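
Each row in a sweep like the two above corresponds to one constant background score. A minimal sketch of how such a score can turn per-class CAMs into a label map follows. It assumes the CAMs are stored as a dict {class_index: HxW array} with 0-based foreground indices; the function name is hypothetical, and the mIoU computation against ground truth is omitted.

```python
import numpy as np

def cams_to_label_map(cam_dict, bg_score, shape, num_fg=20):
    """Stack a constant background plane with the CAMs and take the argmax."""
    scores = np.zeros((num_fg + 1,) + tuple(shape), dtype=np.float32)
    scores[0] = bg_score              # channel 0: constant background score
    for k, v in cam_dict.items():
        scores[k + 1] = v             # foreground class k goes to channel k + 1
    return scores.argmax(axis=0).astype(np.uint8)

cam_dict = {7: np.array([[0.9, 0.15], [0.3, 0.05]], dtype=np.float32)}
low = cams_to_label_map(cam_dict, 0.10, (2, 2))
high = cams_to_label_map(cam_dict, 0.40, (2, 2))
```

Raising the background score turns weakly activated pixels into background, which is why the mIoU first rises and then falls as the score increases.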

Segmentation code release or some repro link recommendation?

About code

First of all, thank you for sharing!
What do the model outputs crm1 and crm_rv1 represent in the train_SEAM.py file?

How is CAM mIOU validated?

Hi @YudeWang, is the CAM mIoU reported in your paper evaluated on the original PASCAL VOC train set or on the SBD augmented train set? When I use the same hyperparameters as your code, I only get 43.9% mIoU with single-scale testing on the augmented train set, which is lower than the reported 46.1%.

CUDA error: out of memory

I cannot run it,
even when using 8 GPUs (one image per GPU).

Background threshold?

I notice that you traverse all background threshold options and report the best mIoU of the pseudo labels. This setting assumes that the ground truth masks are available while the pseudo labels are generated. However, in practice, if the gt masks are available, why not just use them directly? So I think a background threshold selection strategy that does not depend on gt masks is needed in practice. What do you think? Thanks!

Some issues about the performance(mIoU)

Hi,

Thanks for sharing the code.

I am trying to reproduce your code, but the final result I got is 3% different from the result in your article.

I followed the steps in the readme exactly. The results of local training are Train: 63.420%, Val: 60.336%; the results obtained by using the model you provided are: Train: 63.606%, Val: 60.076%.

I am not sure where the problem occurred, and I look forward to your answer, thank you.

Optimization problem when training SEAM from scratch

Hi, firstly thank you for releasing the code, I've successfully reproduced part of the result by using the provided weights.

However, when I tried to train SEAM from scratch (not using any pretrained weights), it seems ER loss easily goes down to 0 and ECR loss just cannot go down, then the model cannot improve anymore. I've tried to increase the loss weight of ECR loss but the outcome is still the same.
Could you provide more details or suggestions on how you train SEAM without pretrained weights?

Thanks!

Can you share the segmentation code you used?

Hey, thanks for sharing your code!
I've run your code and achieved results close to those reported on the training set. But I didn't find the segmentation code. Can you share it?
Many thanks!

CRF Inference

Hi,

Thank you for sharing the code. While trying to understand it, I had trouble with crf_inference. In line 96 of infer_SEAM.py, bgcam_score does not look like probabilities (I checked, and the max value for some images is 1.2). But unary_from_softmax takes probabilities from a softmax as input. I am not sure whether something is wrong here or whether I am missing something.
It would be highly appreciated if you could clarify it. Thank you.
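
If per-pixel probabilities are wanted before the unary term is built, one option (an assumption, not necessarily what the authors intended) is to clip the stacked scores and renormalize them so that they sum to 1 at every pixel. The helper name and the toy values below are illustrative.

```python
import numpy as np

def normalize_scores(bgcam_score, eps=1e-5):
    """Clip to a small positive floor and renormalize over the class axis."""
    clipped = np.clip(bgcam_score, eps, None)
    return clipped / clipped.sum(axis=0, keepdims=True)

# Two classes, one row of two pixels; the raw columns sum to 1.2 and 1.0.
bgcam_score = np.array([[[0.8, 0.1]], [[0.4, 0.9]]], dtype=np.float32)
probs = normalize_scores(bgcam_score)
```

The renormalization keeps the per-pixel ranking of the classes, so the argmax before any CRF smoothing is unchanged; only the confidence fed into the unary term differs.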

CAM is not accurate.

Hi, I know that weakly-supervised semantic segmentation does not perform as well as fully supervised semantic segmentation.

But I am still confused about the result; it is much worse than I expected.

Original image:

CAM:

I just run this command: python infer_SEAM.py --weights ../resnet38_SEAM.pth --infer_list voc12/val.txt --out_cam_pred out_cam_pred

segmentation training problems

It seems that you use the train set to train the segmentation model. Why not use trainaug?
Following the setting in #11, my result is 61.5 when training with trainaug and 56.7 with train. Why does it differ so much from the results in the paper? (Note that the weights are from ilsvrc-cls_rna-a1_cls1000_ep-0001.params, and the test resolution is (1024*512) * [0.5, 0.75, 1.0, 1.25, 1.5, 1.75].)
Why does the performance drop after applying CRF in the RW step?

exception during SEAM inference

Dear YudeWang,
I have successfully trained the model, but during SEAM inference the code stops at a random iteration (38, 14, or some other) and does not continue; no more files are produced in the out_cam (or out_crf) folder. How can I solve this?

The evaluation problem of the pseudo label

Thank you for your excellent work. I would like to know if the performance of the segmentation model is validated using the ground truth labels provided by VOC. Thank you very much for your response.

segmentation code

dCRF on CAMs

Hi, thanks for sharing your great work. I have one question about dCRF in your paper and hope for your reply.
In your paper, a set of CAMs can be generated after training SEAM. I want to know how the dCRF step is applied to these CAMs (56.83% in Table 1). Is dCRF applied to the CAMs after combining them with the best background score from the (0, 60) sweep, or simply to the foreground maps?

cam_full_arr[k+1] = v out of bounds

Thanks for posting your excellent code!
I ran into some problems when using the code. In line 95, the npy files store the class information of a training sample. The max k in line 98 is 21 for the VOC dataset, because the VOC dataset contains 21 categories. In line 99, the index will go out of bounds, because k+1 can be 22 but cam_full_arr's size is 21. How can I solve this error?
And what does line 100 mean? What is filled into cam_full_arr[0]? I am confused.

SEAM/infer_aff.py

Lines 95 to 100 in 3212261

    cam = np.load(os.path.join(args.cam_dir, name + '.npy'), allow_pickle=True).item()
    cam_full_arr = np.zeros((21, orig_shape[2], orig_shape[3]), np.float32)
    for k, v in cam.items():
        cam_full_arr[k+1] = v
    cam_full_arr[0] = (1 - np.max(cam_full_arr[1:], (0), keepdims=False))**args.alpha

Looking forward to your reply.
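
A sketch of the indexing in question, under the assumption that the .npy keys are 0-based indices over the 20 foreground classes (so k + 1 ranges over 1..20 and channel 0 is left for the background). The function name and the explicit bounds check are illustrative additions, not part of infer_aff.py.

```python
import numpy as np

def build_full_cam(cam_dict, height, width, alpha, num_fg=20):
    """Place each foreground CAM at channel k + 1 and derive the background."""
    full = np.zeros((num_fg + 1, height, width), dtype=np.float32)
    for k, v in cam_dict.items():
        if not 0 <= k < num_fg:
            raise IndexError(f"class key {k} is outside 0..{num_fg - 1}")
        full[k + 1] = v  # channel 0 is reserved for the background
    # Background is high where no foreground class responds strongly.
    full[0] = (1.0 - full[1:].max(axis=0)) ** alpha
    return full

cam_dict = {
    0: np.full((2, 2), 0.5, dtype=np.float32),
    19: np.full((2, 2), 0.2, dtype=np.float32),
}
full = build_full_cam(cam_dict, 2, 2, alpha=4)
```

If the keys in a custom .npy file are 1-based or already include the background, the check raises immediately, which points to a mismatch between how the CAMs were saved and how they are read.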

infer_seam npy

First of all, thanks for this code, it is very useful!
However, when I use infer_SEAM, the result is saved as .npy. How can I use it to produce a heatmap?
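
A minimal sketch of one way to do this, assuming each .npy file holds a pickled dict {class_index: HxW float array}. The helper names are hypothetical, and the demo writes its own small file instead of reading a real output.

```python
import os
import tempfile

import numpy as np

def load_cam_dict(path):
    """Load a dict that was stored with np.save (a 0-d object array)."""
    return np.load(path, allow_pickle=True).item()

def cam_to_uint8(cam):
    """Min-max normalize one CAM and scale it to the 0..255 range."""
    cam = cam - cam.min()
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return (cam * 255).astype(np.uint8)

# Demo: save a toy CAM dict, load it back, and convert every class map.
toy = {14: np.array([[0.0, 0.5], [1.0, 0.25]], dtype=np.float32)}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    np.save(path, toy)
    loaded = load_cam_dict(path)

heatmaps = {k: cam_to_uint8(v) for k, v in loaded.items()}
```

The resulting uint8 arrays can be written to disk with an image library such as imageio, optionally after applying a color map and blending with the original image.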

What is the pipeline to reproduce the result of Table 6 in your paper?

Thanks for your excellent work and released code.

Looking forward to your reply!

Huge Time Cost When Running the Code

Hi, thanks for your work!
I have a problem when running this code on a server with 8 A100 GPUs: it gets stuck on these two lines
model = torch.nn.DataParallel(model).cuda()
for iter, pack in enumerate(train_data_loader):
It also takes a lot of time to run every operation in the training process, such as F.interpolate.
I wonder if there is something wrong with my conda env?
