
ps-mt's Introduction

PS-MT

[CVPR'22] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

by Yuyuan Liu, Yu Tian, Yuanhong Chen, Fengbei Liu, Vasileios Belagiannis and Gustavo Carneiro

Computer Vision and Pattern Recognition Conference (CVPR), 2022


Installation

Please install the dependencies and dataset based on this installation document.

Getting started

Please follow this instruction document to reproduce our results.

Update

  • blender setting results on the VOC12 dataset (deeplabv3+ with resnet101)

    | Approach          | 1/16 (662) | 1/8 (1323) | 1/4 (2646) | 1/2 (5291) |
    |-------------------|------------|------------|------------|------------|
    | PS-MT (wandb_log) | 78.79      | 80.29      | 80.66      | 80.87      |

    • Please note that the blender split lists end with an extra 0 (e.g., 6620 for 662 labels) in the original directory.
    • You can find the related launching scripts here.
    • If you are running the blender experiments (which are built on top of the high-quality labels), please compare against the results in this table.

Results

Pascal VOC12 dataset

  1. augmented set

    | Backbone | 1/16 (662) | 1/8 (1323) | 1/4 (2646) | 1/2 (5291) |
    |----------|------------|------------|------------|------------|
    | 50       | 72.83      | 75.70      | 76.43      | 77.88      |
    | 101      | 75.50      | 78.20      | 78.72      | 79.76      |

  2. high-quality set (based on res101)

    | 1/16 (92) | 1/8 (183) | 1/4 (366) | 1/2 (732) | full (1464) |
    |-----------|-----------|-----------|-----------|-------------|
    | 65.80     | 69.58     | 76.57     | 78.42     | 80.01       |

CityScape dataset

  1. following the setting of CAC (720x720, CE supervised loss)

    | Backbone | slid. eval | 1/8 (372) | 1/4 (744) | 1/2 (1488) |
    |----------|------------|-----------|-----------|------------|
    | 50       | ✗          | 74.37     | 75.15     | 76.02      |
    | 50       | ✓          | 75.76     | 76.92     | 77.64      |
    | 101      | ✓          | 76.89     | 77.60     | 79.09      |

  2. following the setting of CPS (800x800, OHEM supervised loss)

    | Backbone | slid. eval | 1/8 (372) | 1/4 (744) | 1/2 (1488) |
    |----------|------------|-----------|-----------|------------|
    | 50       | ✓          | 77.12     | 78.38     | 79.22      |

Training details

Some examples of training details are available:

  1. VOC12 dataset in this wandb link.
  2. Cityscapes dataset in this wandb link (w/ 1-teacher inference).

In detail, after clicking a run (e.g., 1323_voc_rand1), you can check out:

  1. overall information (e.g., training command line, hardware information and training time).
  2. training details (e.g., loss curves, validation results and visualization)
  3. output logs (well, these sometimes crash ...)

Acknowledgement & Citation

The code is largely based on CCT. Many thanks for their great work.

Please consider citing this project in your publications if it helps your research.

@article{liu2021perturbed,
  title={Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation},
  author={Liu, Yuyuan and Tian, Yu and Chen, Yuanhong and Liu, Fengbei and Belagiannis, Vasileios and Carneiro, Gustavo},
  journal={arXiv preprint arXiv:2111.12903},
  year={2021}
}

TODO

  • Code of deeplabv3+ for voc12
  • Code of deeplabv3+ for cityscapes

ps-mt's People

Contributors

yyliu01


ps-mt's Issues

Cityscapes dataset

Hi! Thank you for the contribution!

Could you provide the processed Cityscapes dataset? I want to train on the same data.

Thanks!

Questions about the training environment

Hello.
Can I know the GPU information (GPU name/type, number of GPUs, and GPU memory usage) and training time in your experiments?

If possible, I would like to know this roughly for VOC and Cityscapes, respectively
(especially training time and GPU memory usage).

Thank you !

questions about Conf-CE loss

I recently read your paper, and it is great work. I have a question about the Conf-CE loss in the paper. Can you explain what the c(ω) formula means? I am confused by ỹ and ŷ in c(ω), formula (4), and Figure 2.
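For readers with the same question: below is a minimal sketch of how a confidence-weighted CE of this general shape is commonly implemented, i.e., per-pixel CE against the teachers' pseudo-labels, masked and weighted by the teachers' confidence. All names here are illustrative and not taken from this repo; the authoritative definition is formula (4) of the paper.

import torch
import torch.nn.functional as F

def confidence_weighted_ce(student_logits, teacher_logits, threshold=0.6):
    # Pseudo-labels and per-pixel confidence from the (ensembled) teachers.
    probs = torch.softmax(teacher_logits, dim=1)   # (B, C, H, W)
    conf, pseudo = probs.max(dim=1)                # both (B, H, W)
    # Per-pixel CE of the student against the teachers' hard pseudo-labels.
    ce = F.cross_entropy(student_logits, pseudo, reduction="none")
    # Keep only confident pixels and scale their loss by the confidence,
    # so sharper teacher predictions contribute more.
    mask = (conf > threshold).float()
    return (conf * mask * ce).sum() / mask.sum().clamp(min=1.0)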

Questions about performance by batch size

Hello.
First of all, thank you for sharing your wonderful research!

I have some questions.
I am comparing your study with CPS, which is a top-3 benchmark on semi-supervised semantic segmentation (Pascal VOC dataset).
According to your paper and code, your training uses batch size = gpus (4) × batch_size (8) = 32.
However, according to the CPS paper and code, it was done with 8 labeled and 8 unlabeled images per batch.
Therefore, to ensure a fair comparison, I kept all other options unchanged and trained your code with a ResNet-50 model, a batch size of 8 (2 GPUs × batch size 4), and 80 epochs for all labeled ratios.
The CPS implementation was likewise trained for 80 epochs for all labeled ratios.

Below are the results of my re-implementation (CPS vs. PS-MT).

  • PS-MT

    |                | 1/8   | 1/4      | 1/2   |
    |----------------|-------|----------|-------|
    | Epoch-80 score | 73.74 | 72.86    | 75.77 |
    | Best score     | 74.07 | 74.39    | 75.80 |
    | Best epoch     | 58    | 28 or 80 | 80    |

  • CPS

    |                | 1/8    | 1/4    | 1/2    |
    |----------------|--------|--------|--------|
    | Epoch-80 score | 74.09  | 75.406 | 75.651 |
    | Best score     | 74.937 | 75.557 | 75.786 |
    | Best epoch     | 67     | 66     | 78     |

These results differ from the results reported in your study.
Also, I think batch size matters much more in semi-supervised learning than it does in supervised learning.

I have a few questions to ask:

  1. Is there any problem with my re-implementation described above?
  2. I would like to ask your opinion on the importance of batch size in semi-supervised learning (semi-supervised semantic segmentation). I would also appreciate any pointers to material I could refer to.

T-VAT

Hello, I would like to know where the T-VAT module code is. Please help me.
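While this is best answered by the repo itself, here is a minimal sketch of the classic VAT-style power iteration that T-VAT builds on: find the noise direction that maximally changes the prediction, then use a scaled version of it as the perturbation. Function and variable names are illustrative, not the repo's; in PS-MT the perturbation is reportedly driven by both teachers rather than a single model.

import torch
import torch.nn.functional as F

def vat_perturbation(model, x, xi=1e-6, eps=10.0, iters=1):
    # A sketch of VAT-style adversarial noise via power iteration.
    with torch.no_grad():
        pred = torch.softmax(model(x), dim=1)

    d = torch.randn_like(x)
    d = d / (d.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-8)
    for _ in range(iters):
        d.requires_grad_(True)
        adv_kl = F.kl_div(F.log_softmax(model(x + xi * d), dim=1), pred,
                          reduction="batchmean")
        grad = torch.autograd.grad(adv_kl, d)[0]
        d = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-8)
        d = d.detach()

    return eps * d  # perturbation to add to the input (or features)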

Results of supervised baseline

Hi,

Did you apply any extra techniques to your supervised baseline, such as setting the output stride to 8, an auxiliary loss, or OHEM? According to Figure 3, your reported baseline results on the Pascal dataset are very high.

continue training

Do you have any code to continue training the model (including loading the model)?
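Absent a built-in option, the usual PyTorch resume pattern looks like the sketch below. The path and dictionary keys are assumptions (the 'state_dict' key at least appears in this repo's checkpoints, as the ONNX issue further down shows):

import torch

def resume(model, optimizer, path="checkpoint.pth"):
    # Restore whatever torch.save(...) wrote; map to CPU first so the
    # checkpoint does not pin memory on the GPU it was saved from.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["state_dict"])
    if "optimizer" in ckpt:
        optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt.get("epoch", 0) + 1  # epoch to continue from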

voc12

Hello, why is there no SegmentationClassAug in the VOC12 dataset I downloaded? Do I need to process the VOC12 dataset myself?

About training

I run ./scripts/train_voc_aug.sh -l 1323 -g 4 -b 101 but get an error:

ID 3 Warm (4) | Ls 0.51 |: 98%|█████████████████████████████████████████████████████████████████████▎ | 40/41 [01:04<00:00, 1.18it/s]
ID 3 Warm (4) | Ls 0.51 |: 98%|█████████████████████████████████████████████████████████████████████▎ | 40/41 [01:14<00:00, 1.18it/s]
ID 3 Warm (4) | Ls 0.51 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [01:14<00:00, 3.65s/it]
ID 3 Warm (4) | Ls 0.51 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [01:14<00:00, 1.81s/it]

0%| | 0/289 [00:00<?, ?it/s]wandb: Network error (ConnectionError), entering retry loop.

How can I solve this problem? Can I run without wandb?

before_start.md error?

Hi, thank you for your great work!

In the Cityscapes setting, you said to

run the scripts with

# -l -> labelled_num; -g -> gpus; -b -> resnet backbone;
./scripts/train_voc_hq.sh -l 372 -g 2 -b 50

Should it be ./scripts/train_city.sh -l 372 -g 2 -b 50 here?

How to set wandb offline?

Hello!
Can I know how to set wandb to offline mode? My school's network does not allow me to use it online.
When I set it offline following someone else's method, it reports the following error:

wandb: Network error (ReadTimeout), entering retry loop.

Thank you very much!
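For anyone hitting the same wall: wandb can be forced offline via the WANDB_MODE environment variable (or the mode argument of wandb.init), after which runs are stored locally and can be uploaded later with wandb sync, as one of the logs further down this page shows. A minimal sketch (the project name is illustrative):

import os

# Must be set before wandb is initialised; the run is then written to
# ./wandb/offline-run-... and can be uploaded later via `wandb sync`.
os.environ["WANDB_MODE"] = "offline"

import wandb
run = wandb.init(project="ps-mt", mode="offline")  # the kwarg works too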

How can I deploy the model to ONNX?

I use:

import json, onnx, torch
from Model.Deeplabv3_plus.EntireMode import EntireModel as model_deep

path_pth = 'epoch1.pth'
path_onnx = 'model.onnx'
dummy_input = torch.randn(1, 3, 256, 256)
config = json.load(open('VocCode/configs/config.json'))
config['model']['data_h_w'] = [256, 256]
model = model_deep(num_classes=2, config=config['model'])
checkpoint = torch.load(path_pth)
model.load_state_dict(checkpoint['state_dict'], strict=True)
torch.onnx.export(model, dummy_input, path_onnx, verbose=True)

AttributeError: 'dict' object has no attribute 'training'

Training on Cityscape

Sorry to bother you.
I train with bash ./scripts/train_city.sh -l 372 -g 4 -b 50, but get an error:

availble_gpus= [0, 1, 2, 3]
  0%|                                                                                                           | 0/93 [00:00<?, ?it/s]
  0%|                                                                                                           | 0/93 [00:05<?, ?it/s]
wandb: Waiting for W&B process to finish... (failed 1).
wandb: - 0.000 MB of 0.000 MB uploaded (0.000 MB deduped)
wandb: \ 0.000 MB of 0.000 MB uploaded (0.000 MB deduped)
wandb: | 0.000 MB of 0.000 MB uploaded (0.000 MB deduped)
wandb:                                                                                
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /homedjy/PS-MT/wandb/offline-run-20220908_195002-2iykpb0m
wandb: Find logs at: ./wandb/offline-run-20220908_195002-2iykpb0m/logs
Traceback (most recent call last):
  File "CityCode/main.py", line 199, in <module>
    main(-1, 1, config, args)
  File "CityCode/main.py", line 116, in main
    trainer.train()
  File "/home/PS-MT/CityCode/Base/base_trainer.py", line 145, in train
    _ = self._warm_up(epoch, id=1)
  File "/homedjy/PS-MT/CityCode/train.py", line 173, in _warm_up
    curr_iter=batch_idx, epoch=epoch-1, id=id, warm_up=True)
  File "/home/.conda/envs/ps-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/.conda/envs/ps-mt/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

I tried to fix it, but with no effect. I want to use GPUs 5, 6, 7 and 8, because GPUs 0-3 are occupied, but when availble_gpus is printed, it is still [0, 1, 2, 3].
I can train the model on VOC in the same setup.
Do you have any ideas?
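For what it's worth, PyTorch only sees the GPUs exposed through CUDA_VISIBLE_DEVICES, and it re-indexes them from 0, which would explain the [0, 1, 2, 3] printout. A sketch of restricting a run to physical GPUs 5-8 (assuming the launch script does not already set this):

import os

# Must be set before torch initialises CUDA; physical GPUs 5,6,7,8
# then appear to the process as cuda:0 .. cuda:3.
os.environ["CUDA_VISIBLE_DEVICES"] = "5,6,7,8"

import torch
print(torch.cuda.device_count())  # -> 4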

Training with multi-GPUs

The code works fine when I train with one GPU.
The _warm_up process also works fine with multi-GPU distributed training, but the _train_epoch process gets stuck, while the GPUs and CPUs keep running normally. Have you encountered the same problem?

How to download pretrained model? bucket_namespace

Hello author, could you provide download links for ResNet50 and ResNet101? I don't know the bucket_namespace and bucket_name in the download script. I downloaded two pretrained models from another place, but the sizes don't match. Thanks.

city_splits

In city_splits, the gtFine file names differ from the names in the raw Cityscapes dataset.
For example, city_splits contains
/images/city_gt_fine/val/frankfurt_000001_066574_leftImg8bit.png /annotation/city_gt_fine/val/frankfurt_000001_066574_gtFine.png
while the raw dataset contains
stuttgart_000000_000019_gtFine_color.png

How can I obtain the city_splits dataset?

About training config

Hi, thank you for your great work!
If I use 4 GPUs with batch size 16 and learning rate 1e-2, will that be the best configuration?

Also, why did you say that 2 GPUs can give better performance than 4 GPUs?

About ImageNet pre-trained weights and ResNet architecture

Hello, thanks for sharing your excellent work!

I have questions about ImageNet pre-trained weights and ResNet architecture.

As you have already mentioned on Getting Started page, you utilized the same checkpoints as provided by the CPS.

Q1. But why did you use the privately provided pre-trained weights (from the CPS authors) instead of the official PyTorch ImageNet ResNet weights? Is there a reason for that?

Q2. Also, have you observed any final performance difference between the two versions of the pre-trained weights (CPS's and PyTorch's)?

Q3. Why do you use a modified version of the ResNet backbone? AFAIK it differs from the original (you use the deep-stem ResNet by Hang Zhang, not the original one), and this should have been mentioned in the paper. Is it fair to compare your results with previous work?

Q4. Can you release the performance of your method with the original ResNet backbone (i.e., torchvision's ResNet) using the PyTorch-provided ImageNet pretrained weights?

Thanks.

Stop issue


Training stops partway through. Do you know why?

Question about only supervised training

Hey, I was trying to train the VOC model using only supervised learning, so I set supervised = True and semi = False in the deeplabv3+ config and provided only the labeled text file, but I am not able to train it. What other changes should I make?

Performance on the testing data

Hi,
I got some models and results by running ./scripts/train_voc_aug.sh -l 1323 -g 4 -b 50. How can I get the test results on Pascal VOC? Is valid_Mean_IoU (0.7005) the same as the test result?
Run summary:
global_step 23119
learning_rate_0 1e-05
learning_rate_1 1e-05
loss_sup 0.05151
loss_unsup 0.0
mIoU_labeled 0.932
mIoU_unlabeled 0.619
pixel_acc_labeled 0.98
pixel_acc_unlabeled 0.886
ramp_up 1.0
valid_Mean_IoU 0.7005
valid_Pixel_Accuracy 0.9316

Whether the pre-trained model is necessary?

Thank you very much for your work. I chose not to load the pre-trained model, and when using 1323 labeled VOC12 examples, the highest mIoU I got was only 0.5241. Also, I would like to ask why you did not choose SOTA supervised semantic segmentation models as backbone architectures?

sliding evaluation

Hello, thank you for your excellent work. I read your paper and one point is not clear to me: what does the "sliding evaluation" mentioned in the experiments mean? I hope to get your reply, thank you!
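For reference, "sliding evaluation" usually denotes sliding-window inference: the full image is tiled into overlapping crops, each crop is predicted separately, and the logits are averaged where crops overlap. A minimal sketch under the assumption that the image is at least crop-sized (names and defaults are illustrative, not the repo's):

import torch

def _starts(size, crop, stride):
    # Window offsets covering [0, size); the last window sits flush
    # with the image edge so no pixels are left uncovered.
    starts = list(range(0, max(size - crop, 0) + 1, stride))
    if starts[-1] != max(size - crop, 0):
        starts.append(max(size - crop, 0))
    return starts

@torch.no_grad()
def slide_inference(model, image, crop=768, stride=512, num_classes=19):
    # Average overlapping crop predictions over the full image.
    _, _, H, W = image.shape
    logits = image.new_zeros(1, num_classes, H, W)
    count = image.new_zeros(1, 1, H, W)
    for top in _starts(H, crop, stride):
        for left in _starts(W, crop, stride):
            patch = image[:, :, top:top + crop, left:left + crop]
            logits[:, :, top:top + crop, left:left + crop] += model(patch)
            count[:, :, top:top + crop, left:left + crop] += 1
    return logits / count  # per-pixel averaged logits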

iterative method

Hello, I would like to ask about the iterative method used for updating the two teachers in your paper (one iteration updates the first teacher, and the next iteration updates the other teacher). I saw the explanation in the paper: it improves the diversity between the two teachers. I would like to ask what the benefits of updating the two teachers this way are.
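Mechanically, the alternating scheme is the standard mean-teacher EMA update applied to only one teacher at a time, so each teacher averages a different subsequence of student snapshots. A sketch (names and the alpha value are illustrative):

import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # teacher <- alpha * teacher + (1 - alpha) * student
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

def update_teachers(teachers, student, step, alpha=0.99):
    # Alternate which of the two teachers receives the update, so they
    # track disjoint snapshots of the student and stay diverse.
    ema_update(teachers[step % 2], student, alpha=alpha)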

About batch_size

Hi! I train with ./scripts/train_voc_aug.sh -l 1323 -g 2 -b 101 on 4×V100 (16 GB) with batch size 4 and lr = 0.0025; in your code it is 2×V100 (32 GB) with batch size 8 and lr = 0.0025. But the resulting mIoU is just 75.29%. When I set lr = 0.00125, the mIoU is 76.33%, still lower than your 78.20%.

I set batchsize in scripts/train_voc_aug.sh line 49:

nohup python3 VocCode/main.py --labeled_examples="${labelled}" --gpus=${gpus} --backbone=${backbone} --warm_up=5 --batch_size=4 --semi_p_th=.6 --semi_n_th=.0 --learning-rate=1.25e-3 \
--epochs=${max_epochs} --unsup_weight=${unsup_weight} > voc_aug_"${labelled}"_"${backbone}".out &

In addition, I have run experiments on Cityscapes (372), and only with the batch size set to 2 do I avoid an out-of-GPU-memory error (4×V100 (16 GB), lr = 0.0045). In your code it is 2×V100 (32 GB), lr = 0.0045, batch size 8.

Can you give me some advice?
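(For context: a common heuristic when the effective batch size changes is the linear scaling rule, i.e., scale the learning rate by the same factor as the total batch size. This is a general rule of thumb, not advice from the authors; a sketch:)

def scaled_lr(base_lr, base_total_batch, new_total_batch):
    # Linear scaling rule: keep lr / total_batch_size roughly constant.
    return base_lr * new_total_batch / base_total_batch

# E.g., a reference run at lr = 0.0025 with a total batch of 16;
# halving the total batch to 8 suggests lr = 0.00125.
print(scaled_lr(0.0025, 16, 8))  # -> 0.00125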

visualization

First of all, author, this is really great work. This rookie would like a visualization script. Any help will be greatly appreciated!

my god !!!! this code is too abstract ..........

"Hey, buddy, this code seems overly complicated. I wondering why it's been encapsulated in so many layers. It's frustrating when I encounter numerous errors while trying to run it with a different dataset. Modifying it becomes quite exhausting. Why wasn't the code simplified before sharing it?"

About DDP

Hi! Sorry to bother you again.

I try to train the model; the GPU memory usage is as follows:

    | GPU | PID   | Type | Process name                 | GPU Memory |
    |-----|-------|------|------------------------------|------------|
    | 5   | 26598 | C    | ...da/envs/ps-mt/bin/python3 | 2885MiB    |
    | 5   | 26599 | C    | ...da/envs/ps-mt/bin/python3 | 1387MiB    |
    | 5   | 26600 | C    | ...da/envs/ps-mt/bin/python3 | 1387MiB    |
    | 5   | 26601 | C    | ...da/envs/ps-mt/bin/python3 | 1387MiB    |
    | 6   | 26599 | C    | ...da/envs/ps-mt/bin/python3 | 2889MiB    |
    | 7   | 26600 | C    | ...da/envs/ps-mt/bin/python3 | 2889MiB    |
    | 8   | 26601 | C    | ...da/envs/ps-mt/bin/python3 | 2869MiB    |
I noticed that the code uses DistributedDataParallel. But why does the first GPU (GPU 5) use more GPU memory? How can I solve it?
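A common cause of this pattern (every rank holding a small allocation on one GPU) is CUDA context creation on the default device, e.g., a torch.load without map_location, or tensors created before the process is pinned to its own GPU. A hedged sketch of the usual fix; whether it applies to this repo's launcher is an assumption:

import torch

def setup_rank(local_rank, ckpt_path="checkpoint.pth"):
    # Pin this process to its own GPU before any CUDA tensor is created,
    # so rank 0's device does not collect a context from every rank.
    torch.cuda.set_device(local_rank)

    # Map checkpoints to the local device; a bare torch.load restores
    # tensors to the device they were saved from (often cuda:0).
    return torch.load(ckpt_path, map_location=f"cuda:{local_rank}")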

Iteration method

Hello, I would like to ask about the iteration method used when updating the two teachers in your paper (the first teacher is updated in this iteration, and the other teacher in the next iteration). I saw the explanation in your paper: it is just to increase the diversity between the two teachers. What are the benefits of using an iterative method to update the two teachers?

Does the prediction from the "running_inference" function rely solely on a single teacher model for estimation?

Hello,
I noticed some discrepancies between the code used for inference and the formula described in the paper. I want to confirm which one is recommended.

The "running_inference" function utilizes only encoder1 and decoder1 to generate predictions. This means that the function relies solely on teacher 1 for the prediction process.

with torch.no_grad():
    prediction = model.module.decoder1(model.module.encoder1(data),
                                       data_shape=[data.shape[-2], data.shape[-1]],
                                       req_feature=True)

conf_result = torch.softmax(prediction.squeeze(), dim=0).max(0)[0]
hard_result = torch.argmax(prediction.squeeze().squeeze(), dim=0)
false_mask = hard_result != target.squeeze()

Meanwhile, the inference formula described in the paper involves both teacher 1 and teacher 2.
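For comparison, an ensembled version along the lines of the paper's formula would average the two teachers' probabilities. A sketch reusing the attribute names from the snippet above; the encoder2/decoder2 names are my assumption:

import torch

@torch.no_grad()
def ensemble_inference(model, data):
    shape = [data.shape[-2], data.shape[-1]]
    p1 = model.module.decoder1(model.module.encoder1(data),
                               data_shape=shape, req_feature=True)
    p2 = model.module.decoder2(model.module.encoder2(data),
                               data_shape=shape, req_feature=True)
    # Average the two teachers' softmax outputs, then take the argmax.
    probs = (torch.softmax(p1, dim=1) + torch.softmax(p2, dim=1)) / 2.0
    return probs.argmax(dim=1)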

CutMix for Cityscapes

Hi, thank you for your work. I have a question about CutMix.

Since the batch size on each GPU is 1 when using 8 GPUs (the conventional setting), I wonder how CutMix can be performed when only one unlabeled image sits on a GPU. Can you give me some advice?
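For context, CutMix needs a pair of images: a random rectangle of one is pasted into the other, and the same mask mixes their pseudo-labels, so with one image per GPU the pair has to come from two forward batches or from a cross-GPU gather. A per-pair sketch (names illustrative):

import torch

def cutmix_pair(img_a, img_b, ratio=0.5):
    # Paste a random rectangle of img_b into img_a; both are (C, H, W).
    _, H, W = img_a.shape
    cut_h, cut_w = int(H * ratio), int(W * ratio)
    top = torch.randint(0, H - cut_h + 1, (1,)).item()
    left = torch.randint(0, W - cut_w + 1, (1,)).item()
    mask = torch.zeros(1, H, W, dtype=img_a.dtype)
    mask[:, top:top + cut_h, left:left + cut_w] = 1.0
    mixed = img_a * (1 - mask) + img_b * mask
    return mixed, mask  # reuse the mask to mix the two pseudo-label maps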

File-related questions.

Hello.
I'm doing some experiments.

Could you share the model files trained with ResNet50 or ResNet101 on PASCAL VOC2012?
(If possible, I would really like both the ResNet50 and the ResNet101 model files.)

About Formula 6

Very wonderful work.

But in formula (6), $z^s=f_{\theta^s}(x)$: should it be written as $z^s=h_{\theta^s}(x)$?
