zhenzhao / augseg
[CVPR'23] Augmentation Matters: A Simple-yet-Effective Approach to Semi-supervised Semantic Segmentation
Home Page: https://arxiv.org/abs/2212.04976
Hello, may I ask how many GPUs you used when running the experiments, and what type of GPUs they were?
Hello author, I ran into the following problem when running the code:
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
How should I resolve this?
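One common cause of a device-side assert in segmentation training is a label value that the loss cannot index, for example a class id equal to or larger than the number of classes. Whether that applies to this report is an assumption, not a confirmed diagnosis. The sketch below scans a mask for such values in plain Python; `find_invalid_labels` is a hypothetical helper, and the defaults `num_classes=21` and `ignore_label=255` are taken from the VOC config quoted in a later issue on this page.

```python
def find_invalid_labels(label_rows, num_classes=21, ignore_label=255):
    """Return the sorted label values that are neither valid class ids nor ignore_label."""
    invalid = set()
    for row in label_rows:
        for value in row:
            if value != ignore_label and not (0 <= value < num_classes):
                invalid.add(value)
    return sorted(invalid)


if __name__ == "__main__":
    # Toy 2x3 mask: with 21 classes the valid ids are 0..20, so 21 is out of range.
    mask = [[0, 15, 255], [21, 3, 20]]
    print(find_invalid_labels(mask))
```

Rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or once on CPU, usually makes the failing operation easier to locate.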
Pascal: JPEGImages | SegmentationClass
After downloading and extracting JPEGImages, I get:
VOCdevkit
--VOC2012
--JPEGImages
--SegmentationClass
After downloading and extracting SegmentationClass, I get:
SegmentationClass/
But the directory layout you provide is:
├── VOC2012
│   ├── JPEGImages
│   ├── SegmentationClass
│   └── SegmentationClassAug
Which one should I rename to SegmentationClassAug?
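Hello author, I ran the VOC experiment on two A6000 GPUs (each using 24 GB), but it takes more than a day. Why does it take so long? Yours seems to finish in two or three hours. Also, my output log has more Epoch/Iter entries than yours, and the per-class test in each round has one extra round as well. What is the reason? My output file is below.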
r50662.log
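Hello author, thank you very much for your work.
In the paper you emphasize that the purpose of strong augmentation is to produce prediction disagreement, but you do not explain much about why prediction disagreement improves performance.
I am not sure whether my understanding is right: as in unsupervised contrastive learning, eliminating the student-teacher (S-T) inconsistency under strong augmentation forces the student network to filter out the low-level information destroyed by the augmentation (such as color and texture) and to focus on extracting semantic information.
I hope the author can clarify. Thanks!
BTW, in the arXiv version of the paper, theta_s and theta_t in Equations 4 and 5 seem to be swapped?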
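Hello! Thank you for your excellent work!
When reproducing with the source code, I found that the VOC fine 92-labeled configuration with ResNet-101 only reaches 63.5 mIoU, significantly lower than the 71.09 reported in the paper. The config I used is the yaml file of the same experiment under "training log". Did something go wrong? In addition, the 183-labeled setting can be reproduced in the same way.
Thank you!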
Hi, this is great work, and I'm excited to try it out! Would it be possible to add a LICENSE to this codebase?
Yo, I tried running your project. I am sad to tell you you have been diagnosed with extreme noobiosis. Yeah, now don't go about checking your little english to chinese dictionary to find this word, you won't get anything! Such NOOBS! Absolutely outrageous, man.
Tell me, you people ever heard of something called a requirements.txt file? Huh??? Ever? I had to install every single dependency waiting on errors for Module not Found... and even after doing all of that, it throws an error. Wait, I'll show you:
$ sh ./single_run.sh
./single_run.sh: 4: source: not found
/DATA2/dse316/grp_007/.venv/lib/python3.10/site-packages/torch/distributed/launch.py:183: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
[2024-03-14 19:49:39,700] torch.distributed.run: [WARNING]
[2024-03-14 19:49:39,700] torch.distributed.run: [WARNING] *****************************************
[2024-03-14 19:49:39,700] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-03-14 19:49:39,700] torch.distributed.run: [WARNING] *****************************************
2024-03-14 19:49:41.386765: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-14 19:49:41.435739: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-14 19:49:41.476173: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-14 19:49:41.512059: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-14 19:49:41.523980: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-14 19:49:41.561311: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-14 19:49:41.573819: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-14 19:49:41.623285: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-14 19:49:42.237202: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-14 19:49:42.326103: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-14 19:49:42.379277: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-14 19:49:42.600367: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
usage: train_semi.py [-h] [--config CONFIG] [--local_rank LOCAL_RANK] [--seed SEED] [--port PORT]
train_semi.py: error: unrecognized arguments: --local-rank=0
usage: train_semi.py [-h] [--config CONFIG] [--local_rank LOCAL_RANK] [--seed SEED] [--port PORT]
train_semi.py: error: unrecognized arguments: --local-rank=2
usage: train_semi.py [-h] [--config CONFIG] [--local_rank LOCAL_RANK] [--seed SEED] [--port PORT]
train_semi.py: error: unrecognized arguments: --local-rank=1
usage: train_semi.py [-h] [--config CONFIG] [--local_rank LOCAL_RANK] [--seed SEED] [--port PORT]
train_semi.py: error: unrecognized arguments: --local-rank=3
[2024-03-14 19:49:49,716] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 2) local_rank: 0 (pid: 3248971) of binary: /DATA2/dse316/grp_007/.venv/bin/python
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/DATA2/dse316/grp_007/.venv/lib/python3.10/site-packages/torch/distributed/launch.py", line 198, in <module>
main()
File "/DATA2/dse316/grp_007/.venv/lib/python3.10/site-packages/torch/distributed/launch.py", line 194, in main
launch(args)
File "/DATA2/dse316/grp_007/.venv/lib/python3.10/site-packages/torch/distributed/launch.py", line 179, in launch
run(args)
File "/DATA2/dse316/grp_007/.venv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/DATA2/dse316/grp_007/.venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/DATA2/dse316/grp_007/.venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./train_semi.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-03-14_19:49:49
host : pragyan
rank : 1 (local_rank: 1)
exitcode : 2 (pid: 3248972)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-03-14_19:49:49
host : pragyan
rank : 2 (local_rank: 2)
exitcode : 2 (pid: 3248973)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-03-14_19:49:49
host : pragyan
rank : 3 (local_rank: 3)
exitcode : 2 (pid: 3248974)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-03-14_19:49:49
host : pragyan
rank : 0 (local_rank: 0)
exitcode : 2 (pid: 3248971)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Not entirely unintelligible gibberish or gobbledygook. You, being such fantastic researchers might have faced this issue millions of times! Well, I hope you have, but not very imminent, as the sorry state of affairs I observe in your repository tells me. Is this code or burnt 5 day old mix spaghetti?
Now, if a single one of you have any brain cells left in your sorry little cranium, would you do as much as help me run your project?
Give me a line by line guide on how to run it. Line. By. Line.
Bro look I have a project to submit by the end of this month and I seriously don't get why you would provide such obscure documentation and code for this paper. Sure, your paper might be good, but were you able to implement it on your own machine first of all? That aside, help from your side would be highly appreciated.
Cheers
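The log above shows two separate problems. `source: not found` is what a POSIX `sh` such as dash prints, since `source` is a bash builtin, so running the script with `bash ./single_run.sh` is the likely fix for that line. The fatal error is that the newer launcher passes `--local-rank=N` while `train_semi.py` only declares `--local_rank`. A minimal sketch of an argument parser that accepts both spellings and otherwise falls back to the `LOCAL_RANK` environment variable, as the FutureWarning suggests, is given below. The option names follow the usage line in the log; `build_parser` and all default values are placeholders, not the repository's actual code.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # Placeholder default; the real script defines its own.
    parser.add_argument("--config", type=str, default="config.yaml")
    # Accept both spellings; default to the LOCAL_RANK variable that torchrun sets.
    parser.add_argument(
        "--local_rank", "--local-rank",
        dest="local_rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--port", type=int, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--local-rank=2"])
    print(args.local_rank)
```

With a parser like this, the same script should start under both `python -m torch.distributed.launch` and `torchrun`.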
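Hello author, thank you very much for your work. I noticed that your number of iterations seems larger than in other works. Does this make convergence faster? When I run other code it takes about 40 to 50 epochs to converge, while yours converges in a little over 20 epochs. Is this because of the larger number of iterations?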
Hello, after enabling Adaptive CutMix the performance improved, but the unsupervised loss keeps oscillating. How did you avoid this at the time?
Hi Zhen,
Thanks for your work and the code.
The links to the pretrained checkpoints (for both resnet50 and resnet101) are incorrect. Could you please have a look?
I've tried the checkpoints from CPS (and U2PL), but I get some unexpected_keys that don't seem to appear in your training log.
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
Cheers,
Yuyuan
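The two unexpected keys look like the ImageNet classification head (`fc`) of a standard ResNet checkpoint, which a segmentation backbone has no layer for. With `missing_keys` empty, every backbone weight was found, and a later issue on this page shows the same two keys printed right before training starts. If the message is unwanted, the head can be dropped before loading. The sketch below does this on a plain dict; `drop_classifier_keys` is a hypothetical helper, and the integer values stand in for tensors.

```python
def drop_classifier_keys(state_dict, prefixes=("fc.",)):
    """Return a copy of state_dict without entries whose key starts with any prefix."""
    return {
        key: value
        for key, value in state_dict.items()
        if not key.startswith(tuple(prefixes))
    }


if __name__ == "__main__":
    # Stand-in values; a real checkpoint would map these names to tensors.
    checkpoint = {
        "conv1.weight": 1,
        "layer1.0.conv1.weight": 2,
        "fc.weight": 3,
        "fc.bias": 4,
    }
    print(sorted(drop_classifier_keys(checkpoint)))
```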
I found the idea of this paper very interesting. Could the method be adapted to semi-supervised object detection?
$ sh ./single_run.sh >> "error.txt"
./single_run.sh: 4: source: not found
/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 71, in main
dist.barrier()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2784, in barrier
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 71, in main
dist.barrier()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2784, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1191, invalid usage, NCCL version 2.10.3
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1191, invalid usage, NCCL version 2.10.3
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 71, in main
dist.barrier()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2784, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1191, unhandled system error, NCCL version 2.10.3
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 71, in main
dist.barrier()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2784, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1191, unhandled system error, NCCL version 2.10.3
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3317586 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3317587 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3317584) of binary: /home/dse316/miniconda3/envs/grp_007/bin/python
Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./train_semi.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-03-19_17:20:25
host : pragyan
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 3317585)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-03-19_17:20:25
host : pragyan
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 3317584)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
augseg/exps/zrun_citys/citys_semi744/config_semi.yaml
Excellent work!
Could you also provide a link for downloading SegmentationClassAug?
Thank you so much
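Hello, thank you for your excellent work. While reading your code I came across the following lines and do not quite understand why the teacher network's buffers need to be updated:
for buffer_train, buffer_eval in zip(model.buffers(), model_teacher.buffers()):
    buffer_eval.data = buffer_eval.data * ema_decay + buffer_train.data * (1 - ema_decay)
I hope to hear back from you. Thank you.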
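In PyTorch, `buffers()` covers non-trainable state such as BatchNorm running means and variances, which `parameters()` does not include, so an EMA over parameters alone would leave the teacher's normalization statistics untouched. That may be the motivation for the quoted lines, although the author has not confirmed it here. The sketch below reproduces the same update rule on plain Python lists; `ema_update` is a hypothetical helper, and the default `ema_decay=0.999` is the value from the config quoted in a later issue on this page.

```python
def ema_update(teacher_values, student_values, ema_decay=0.999):
    """In-place EMA: teacher <- ema_decay * teacher + (1 - ema_decay) * student."""
    for i, student_value in enumerate(student_values):
        teacher_values[i] = teacher_values[i] * ema_decay + student_value * (1 - ema_decay)
    return teacher_values


if __name__ == "__main__":
    # Stand-ins for a BatchNorm running mean in the teacher and in the student.
    teacher_running_mean = [0.0, 1.0]
    student_running_mean = [1.0, 1.0]
    for _ in range(3):
        ema_update(teacher_running_mean, student_running_mean, ema_decay=0.9)
    print(teacher_running_mean)
```

Each call moves the teacher's value a fraction `1 - ema_decay` of the way toward the student's, so the teacher tracks a smoothed version of the student's statistics.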
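The final reported result is the mIoU validated with the teacher. Doesn't that mean the saved checkpoint should be the teacher rather than the student?
Hello author! Could you please provide your test code? I would like to test on my own dataset.
Hello author, I ran your code on my own dataset and the results are very good. But I have one question: in the cut_mix_label_adaptive function, why is CutMix applied twice? Isn't this essentially just transferring a smaller labeled region onto the unlabeled image? It feels like one CutMix would be enough. I don't quite understand it and hope the author can explain. Thanks!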
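For reference, a single generic CutMix step pastes a rectangular region of one image into another. The sketch below shows only that basic operation on nested lists. It is not the repository's `cut_mix_label_adaptive` and does not answer why the adaptive variant mixes twice; `cutmix` and `random_box` are hypothetical helpers.

```python
import random


def cutmix(target, source, box):
    """Paste the rectangle box=(top, left, height, width) of source into a copy of target."""
    top, left, height, width = box
    mixed = [row[:] for row in target]
    for r in range(top, top + height):
        for c in range(left, left + width):
            mixed[r][c] = source[r][c]
    return mixed


def random_box(rows, cols, rng):
    """Sample a rectangle that fits inside a rows x cols grid."""
    height = rng.randint(1, rows)
    width = rng.randint(1, cols)
    top = rng.randint(0, rows - height)
    left = rng.randint(0, cols - width)
    return top, left, height, width


if __name__ == "__main__":
    rng = random.Random(0)
    unlabeled = [[0] * 4 for _ in range(4)]
    labeled = [[1] * 4 for _ in range(4)]
    box = random_box(4, 4, rng)
    for row in cutmix(unlabeled, labeled, box):
        print(row)
```

In a segmentation setting the same box would be applied to the images and to their labels or pseudo-labels so that both stay aligned.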
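Hello, why is there no comparison with the best CPS result on Cityscapes?
Hello, the Adaptive Label-aided CutMix in the paper has three steps. The second step already performs a CutMix using pi, so why is there another CutMix in the third step? What is the idea behind this design?
Hello author, I have recently been modifying and learning from your project. Results on the Pascal VOC 2012 dataset are good so far, but results on the Cityscapes dataset are very poor. I have carefully studied the configuration files and experiment logs you provide, but the problem is still unsolved, so I would like to ask what needs special attention for this dataset. First, the loss on this dataset uses the OHEM loss, and neither the auxiliary loss nor class weights are used in the loss computation. I also noticed that, for the same batch size × number of GPUs, the learning rate for this dataset is 10 times larger, the number of training epochs is set to 240, and the number of classes is 19. In addition, sliding-window evaluation is used during evaluation. Even after taking all of this into account, my model still performs very poorly on Cityscapes. I am quite confused at this stage and do not know how to solve it. Could you give me some suggestions? I would really appreciate it!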
Hello there!
I am trying to reproduce the results of your publication for my course project. However, I think there is some issue with the "Pascal: JPEGImages | SegmentationClass" data set. It keeps on giving the error "File not found". The complete error has been provided below:
FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[2024-04-13 19:40:45,183][INFO] - {'criterion': {'kwargs': {'use_weight': False}, 'type': 'CELoss'},
'dataset': {'ignore_label': 255,
'mean': [0.485, 0.456, 0.406],
'n_sup': 662,
'std': [0.229, 0.224, 0.225],
'train': {'batch_size': 8,
'crop': {'size': [513, 513], 'type': 'rand'},
'data_list': './data/splitsall/pascal_u2pl/662/labeled.txt',
'data_root': './data/VOC2012',
'flip': True,
'rand_resize': [0.5, 2.0],
'resize_base_size': 500,
'strong_aug': {'flag_use_random_num_sampling': True,
'num_augs': 3}},
'type': 'pascal_semi',
'val': {'batch_size': 1,
'data_list': './data/splitsall/pascal_u2pl/val.txt',
'data_root': './data/VOC2012'},
'workers': 4},
'exp_path': './exps/zrun_vocs_u2pl/voc_semi662',
'log_path': './exps/zrun_vocs_u2pl/voc_semi662/log',
'net': {'decoder': {'kwargs': {'dilations': [6, 12, 18],
'inner_planes': 256,
'low_conv_planes': 48},
'type': 'augseg.models.decoder.dec_deeplabv3_plus'},
'ema_decay': 0.999,
'encoder': {'kwargs': {'multi_grid': True,
'replace_stride_with_dilation': [False,
False,
True],
'zero_init_residual': True},
'pretrain': './pretrained/resnet101.pth',
'type': 'augseg.models.resnet.resnet101'},
'num_classes': 21,
'sync_bn': True},
'save_path': './exps/zrun_vocs_u2pl/voc_semi662/checkpoints',
'saver': {'pretrain': '', 'snapshot_dir': 'checkpoints', 'use_tb': False},
'trainer': {'epochs': 80,
'evaluate_student': True,
'lr_scheduler': {'kwargs': {'power': 0.9}, 'mode': 'poly'},
'optimizer': {'kwargs': {'lr': 0.001,
'momentum': 0.9,
'weight_decay': 0.0001},
'type': 'SGD'},
'sup_only_epoch': 0,
'unsupervised': {'flag_extra_weak': False,
'loss_weight': 1.0,
'threshold': 0.95,
'use_cutmix': True,
'use_cutmix_adaptive': True,
'use_cutmix_trigger_prob': 1.0}}}
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[2024-04-13 19:40:55,377][INFO] - # samples: 662
[2024-04-13 19:40:55,390][INFO] - # samples: 9920
[2024-04-13 19:40:55,396][INFO] - # samples: 1449
[2024-04-13 19:40:55,396][INFO] - Get loader Done...
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth'
missing_keys: []
unexpected_keys: ['fc.weight', 'fc.bias']
[2024-04-13 19:40:58,584][INFO] - -------------------------- start training --------------------------
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
res_loss_sup, res_loss_unsup = train(
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
_, image_u_weak, image_u_aug, _ = loader_u_iter.next()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
label = self.img_loader(label_path, "L")
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_006330.png'
Traceback (most recent call last):
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
main(args)
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
res_loss_sup, res_loss_unsup = train(
File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
_, image_u_weak, image_u_aug, _ = loader_u_iter.next()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
label = self.img_loader(label_path, "L")
File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_000085.png'
Exception in thread Thread-1 (_pin_memory_loop):
Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/queues.py", line 122, in get
return _ForkingPickler.loads(res)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
fd = df.detach()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 86, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 752, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 686256) of binary: /home/dse316/miniconda3/envs/grp_007/bin/python
Traceback (most recent call last):
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./train_semi.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-04-13_19:41:03
host : pragyan
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 686257)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-04-13_19:41:03
host : pragyan
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 686256)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Anticipating a positive response. Please cross-check that the data source contains all the files and that the link on the GitHub repository is correct.
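The traceback above fails on `./data/VOC2012/SegmentationClassAug/2008_006330.png`, which suggests the `SegmentationClassAug` folder is missing or incomplete under `data_root`. Before launching training, the split lists can be checked against the files on disk. The sketch below is one way to do that. `find_missing_labels` is a hypothetical helper, and the format of the list file (first whitespace-separated token is an image path or a bare ID) is an assumption; the paths, folder name, and `.png` extension come from the log above.

```python
import os


def find_missing_labels(data_list, data_root, label_dir="SegmentationClassAug", ext=".png"):
    """Return label paths referenced by data_list that do not exist under data_root."""
    missing = []
    with open(data_list, "r") as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue
            # Assumption: the first token is an image path or a bare ID such as 2008_006330.
            sample_id = os.path.splitext(os.path.basename(tokens[0]))[0]
            label_path = os.path.join(data_root, label_dir, sample_id + ext)
            if not os.path.isfile(label_path):
                missing.append(label_path)
    return missing


if __name__ == "__main__":
    data_list = "./data/splitsall/pascal_u2pl/662/labeled.txt"
    if os.path.isfile(data_list):
        missing = find_missing_labels(data_list, "./data/VOC2012")
        print(f"{len(missing)} label files missing")
        for path in missing[:10]:
            print(path)
```

Since the failing call is in the unlabeled loader, the same check is worth running on the unlabeled split list as well.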
Hello author, were the resnet50 and resnet101 weights provided in the project trained on ImageNet by yourselves?
I appreciate your excellent work.
I want to replicate the experiments described in the paper now. I have already tried the semi-supervised learning experiment since the code exists, but I haven't been able to attempt the supervised learning part as the code is not available. I'm curious about the differences you made between the semi-supervised and supervised learning experiments. Also, I would like to receive your feedback on writing code for supervised learning. Thank you.
Hi! I just read the AugSeg paper and found the statement "Since U2PL prioritizes selecting high-quality labels from classic VOCs for testing on blender VOC, we reproduce the supervised baseline and its performance on ResNet-50 for fair comparisons". Could I ask what this means? Where is it indicated that U2PL prioritizes selecting high-quality labels? Thanks a lot!
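Hello! I would like to ask a question about the paper's technique: how did you come up with and design the Adaptive CutMix-based augmentations? If I were the author, how could I arrive at this component naturally? Thank you!
During my experiments I found that the student's IoU output by your code is higher than the teacher's. Have you seen this happen?
Hello, I would like to ask whether this project defaults to single-machine, single-GPU training. If it uses distributed multi-machine training, can it be changed to ordinary single-machine, single-GPU training, and how exactly would that be done?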