hieuphan33 / REMINDER
Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation
License: GNU General Public License v3.0
Thanks for uploading the code. I have recently been trying to run it. Below is my command:
python -m torch.distributed.launch --nproc_per_node=1 run.py --data_root data --batch_size 8 --dataset voc --name REMINDER --task 15-5s --step 0 --lr 0.01 --epochs 30 --method REMINDER
And I ran into the following error:
```
epoch_loss = trainer.train(
  File "train.py", line 214, in train
    model.module.in_eval = False
  File "/home/ddd/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'IncrementalSegmentationModule' object has no attribute 'module'
```
It seems the problem is that I am using a single GPU. I found that your documentation says I need to rename the 'key' with 'key[:7]', but I also notice that 'segmentation_module.py' already uses 'key[:7]' at line 38.
Could you please give me more details about the modification? Thank you very much!
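The error itself comes from accessing `model.module` when the model was never wrapped in `DataParallel`/`DistributedDataParallel` (both wrappers expose the underlying network as `.module`; a bare model does not). A minimal sketch of a common workaround, not the REMINDER authors' fix:

```python
def unwrap(model):
    """Return the underlying network whether or not `model` is wrapped in
    DataParallel/DistributedDataParallel (which expose it as `.module`)."""
    return model.module if hasattr(model, "module") else model

# Usage sketch: instead of `model.module.in_eval = False`, write:
# unwrap(model).in_eval = False
```

With this, the same training code runs on one GPU (unwrapped model) or many (wrapped model) without the `AttributeError`.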
Hi, great work! I have some questions about the class similarity weighted knowledge distillation.
Regarding Eq. 8: (1) Is the weight assigned to the output when the label belongs to the new classes?
(2) When the pseudo label belongs to the old classes and the similarity s is larger than the threshold, is the output O_{i,v}^{t-1} reweighted?
(pytorch) tao@tao:/media/tao/新加卷/Osman/REMINDER-main$ /bin/bash /media/tao/新加卷/Osman/REMINDER-main/train_voc_19-1.sh
voc_19-1_REMINDER On GPUs 0,1 Writing in results/2023-12-24_voc_19-1_REMINDER.csv
Begin training!
Begin training!
Learning for 1 with lrs=[0.01].
Learning for 1 with lrs=[0.01].
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
/home/tao/anaconda3/envs/pytorch/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/__init__.py:68: DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use PyTorch AMP
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
(pytorch) tao@tao:/media/tao/新加卷/Osman/REMINDER-main$
Hello,
In the Pascal VOC dataset, the 0th class is background.
Does REMINDER build a prototype for the 0th class too?
Also, when computing the similarity, are the new classes compared with the 0th class?
If not, I would like to know where the 0th class is excluded in your code.
Thank you~
In the sh files for the ADE dataset (reminder_100_50.sh, reminder_50.sh, reminder_100-10.sh),
METHOD is set to FT, not REMINDER.
( METHOD = FT )
Is this an erratum?
Hi @HieuPhan33, I have tried completely removing the additional KD loss, i.e. running with CSW-KD removed. The results shown below are still higher than UNKD and Knowledge Distillation in Table 5, so I doubt the ablation numbers in Table 5. It is unlikely to get such low results, since the intermediate-layer KD as well as the final small-scale output KD proposed by PLOP are still in place.

| | 0-15 | 16-20 | all |
|---|---|---|---|
| Reported | 68.30 | 27.23 | 58.52 |
| Reproduced | 66.37 | 27.03 | 57.00 |
| CSW-KD removed | 61.97 | 27.87 | 53.85 |
I ran directly with the script reminder_15-1.sh, but obtained lower results on the old classes. I can reproduce 15-5 and 19-1, though. Looking forward to your advice.

| | 0-15 | 16-20 | all |
|---|---|---|---|
| Reported | 68.30 | 27.23 | 58.52 |
| Reproduced | 66.37 | 27.03 | 57.00 |
Hi, I tried to reproduce the ADE20k results below. I reused the step-0 checkpoint I trained with PLOP (mIoU 41.98), as I believe REMINDER shares the same step-0 training setup with PLOP. I attach the scripts reminder_100-50.sh and reminder_100-10.sh I used for your reference. I notice that you use a linear lr scaling strategy when you scale the batch size down to 10 per GPU; for consistency, I adopted this strategy across all settings with lr 0.0008. Additionally, I trained with 2 GPUs, as PLOP does. Hoping for your advice.

**100-50**

| | 0-100 | 101-150 | all |
|---|---|---|---|
| Reported | 41.55 | 19.16 | 34.14 |
| Reproduced | 41.91 | 16.10 | 33.36 |

**100-10**

| | 0-100 | 101-150 | all |
|---|---|---|---|
| Reported | 38.96 | 21.28 | 33.11 |
| Reproduced | 38.43 | 15.56 | 30.86 |

**50-50**

| | 0-50 | 51-150 | all |
|---|---|---|---|
| Reported | 47.11 | 20.35 | 29.39 |
| Reproduced | 45.56 | 18.49 | 27.63 |
At line 26 in train.py, we can see:

```python
seg = seg.view(-1)
features = features.transpose(1, 3).contiguous().view(B * H * W, -1)
for c in classes:
    selected_features = features[seg == c]
```

Before these lines run, seg has shape [B, H, W] and features has shape [B, C, H, W]. After `.transpose(1, 3)`, features becomes [B, W, H, C], so flattening it yields rows in [B*W*H] order, while seg.view(-1) is flattened in [B*H*W] order. Is that correct? I think the selected features then do not correspond to seg one-to-one, because features is [B, W, H, C] rather than [B, H, W, C] after `.transpose(1, 3)`. Should `transpose(1, 3)` be replaced by `permute(0, 2, 3, 1)`?
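A minimal sketch of the mismatch, using NumPy (whose axis-reordering semantics mirror `torch.Tensor.transpose(1, 3)` and `permute(0, 2, 3, 1)` here): the two flattenings contain the same rows but in different pixel order whenever H != W, so only the permute version lines up with a label map flattened in [B, H, W] order.

```python
import numpy as np

B, C, H, W = 1, 2, 2, 3
features = np.arange(B * C * H * W).reshape(B, C, H, W)

# torch-style transpose(1, 3): swap axes 1 and 3 -> shape [B, W, H, C]
via_transpose = features.transpose(0, 3, 2, 1).reshape(B * W * H, C)

# torch-style permute(0, 2, 3, 1): reorder axes -> shape [B, H, W, C]
via_permute = features.transpose(0, 2, 3, 1).reshape(B * H * W, C)

# seg.view(-1) flattens pixels in [B, H, W] order, which matches the row
# order of via_permute; via_transpose lists pixels column-major instead.
print(np.array_equal(via_transpose, via_permute))  # False when H != W
```

So the indexing `features[seg == c]` is only pixel-consistent with `seg.view(-1)` under the `permute(0, 2, 3, 1)` layout.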
Hi, I tried to run without your proposed CSW loss by removing `--csw_kd ${LOSS} --delta_csw 1.0`. Strangely, I obtained almost the same results on ADE20k across different settings. Could you advise?
When I ran the 15-1 training code, I got a "NaN in prototype" error. For now I have no idea how to solve it. Has anyone else met this problem?
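A hedged debugging sketch, not the authors' code: NaNs in class prototypes often appear when the mean feature of a class is taken over a batch that contains no pixels of that class, which divides by zero. Guarding the momentum update against an empty mask avoids that (`update_prototype` and its shapes are hypothetical):

```python
import numpy as np

def update_prototype(features, seg, c, proto=None, momentum=0.9):
    """Momentum-update the prototype of class c from [N, C] pixel features
    and [N] labels; skip the update when the batch has no pixels of c,
    so an empty mask never produces 0/0 = NaN."""
    mask = (seg == c)
    if mask.sum() == 0:
        return proto  # nothing to update with in this batch
    mean_feat = features[mask].mean(axis=0)
    if proto is None:
        return mean_feat
    return momentum * proto + (1 - momentum) * mean_feat
```

In the 15-1 setting each step adds a single class, so batches with zero pixels of some class are especially likely, which may be why the error shows up there first.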
When I run REMINDER/scripts/ade/reminder_100-10.sh,
I get the error below:

```
Traceback (most recent call last):
  File "run.py", line 585, in <module>
    main(opts)
  File "run.py", line 156, in main
    val_score = run_step(opts, world_size, rank, device)
  File "run.py", line 427, in run_step
    logger=logger
  File "/root/REMINDER/train.py", line 362, in train
    loss = criterion(outputs, labels)  # B x H x W
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 948, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.6/site-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/functional.py", line 2422, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.6/site-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/functional.py", line 2220, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1only batches of spatial targets supported (3D tensors) but got targets of size: : [10, 512, 512, 3]
```

When I checked the shapes, labels.shape = [10, 512, 512, 3] but outputs.shape = [10, 101, 512, 512].
How can I match the shapes of labels and outputs?
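The trailing 3 in the target shape suggests the annotations are being loaded as RGB images rather than as single-channel class-index maps, which is what `nll_loss2d` expects ([B, H, W] integer targets). A hedged sketch of collapsing an RGB mask to class indices; the `PALETTE` below is hypothetical and must be replaced with the dataset's actual color-to-class mapping:

```python
import numpy as np

# Hypothetical color -> class mapping; the real one depends on the dataset.
PALETTE = {
    (0, 0, 0): 0,      # background
    (128, 0, 0): 1,    # class 1
    (0, 128, 0): 2,    # class 2
}

def rgb_to_index(label_rgb, ignore_index=255):
    """Convert an [H, W, 3] RGB mask to an [H, W] class-index map; pixels
    whose color is not in PALETTE are set to ignore_index."""
    h, w, _ = label_rgb.shape
    index = np.full((h, w), ignore_index, dtype=np.int64)
    for color, cls in PALETTE.items():
        match = np.all(label_rgb == np.array(color, dtype=label_rgb.dtype), axis=-1)
        index[match] = cls
    return index
```

If the annotations are palettized PNGs (as in Pascal VOC), loading them with PIL in "P" mode already yields the index map directly, making this conversion unnecessary.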