xialipku / emanet
The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)
License: GNU General Public License v3.0
(base) davis@davis-MS-7B17:~/Network/EMANet-master$ python train.py
2019-08-31 13:50:14,703 - INFO - set log dir as ./logdir
2019-08-31 13:50:14,703 - INFO - set model dir as ./models
2019-08-31 13:50:17,131 - ERROR - No checkpoint ./models/latest.pth!
Training stalls at this point, so I have to stop it with a keyboard interrupt.
Does anybody know how to solve this?
Hey, I found that ValDataset pads both the image and the label:
#image, label = pad_inf(image, label)
def pad_inf(image, label=None):
    h, w = image.size()[-2:]
    stride = settings.STRIDE
    pad_h = (stride + 1 - h % stride) % stride
    pad_w = (stride + 1 - w % stride) % stride
    if pad_h > 0 or pad_w > 0:
        image = F.pad(image, (0, pad_w, 0, pad_h), mode='constant', value=0.)
        if label is not None:
            label = F.pad(label, (0, pad_w, 0, pad_h), mode='constant',
                          value=settings.IGNORE_LABEL)
    return image, label
Can you explain the reason for doing so?
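One possible reading of the arithmetic, offered as a sketch rather than the authors' stated reason: the formula pads each spatial dimension up to the next size of the form n * stride + 1. The stride of 8 below is an assumed value for illustration; the repository reads it from settings.STRIDE.

```python
def pad_amount(size, stride):
    """Padding that the quoted formula adds to one spatial dimension."""
    return (stride + 1 - size % stride) % stride


STRIDE = 8  # assumed value; the repository takes it from settings.STRIDE

for size in (500, 513, 375):
    pad = pad_amount(size, STRIDE)
    padded = size + pad
    # After padding, the size is always one more than a multiple of the stride.
    print(size, pad, padded, padded % STRIDE)
```

Padding the label with IGNORE_LABEL keeps the padded pixels out of the evaluation, so the extra border should not change the reported mIoU.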
Could the moving-average operation also be written inside the EMAU class?
Hi @XiaLiPKU! Your work is amazing, and I appreciate that you have released your code.
May I ask a question? I have modified your code to train on COCO-Stuff, but I can only get 34.55% mIoU. I followed your default settings, except that I used a single GPU (pretrained ResNet-101, batch size 3, 30k iterations, and so on).
Looking forward to your reply!
Best wishes
In my opinion, besides T, the selection of K is also important (like in GMM or k-means). I didn't see any ablation study on the effect of different K's, did you do some experiments?
Intuitively, I have the impression that mu represents different features for different classes, so the first K I would try is the number of classes (e.g. 19 for Cityscapes). Can you explain how you decide to use K=64?
As the visualization of the responsibilities shows, different z's tend to represent different classes. With K greater than the number of classes, won't some z's end up close to each other and therefore become redundant?
Thanks.
Thank you!
Hi, first of all thanks for your paper.
You mention that the output stride is 16 for some nets and 8 for others. However, the paper says nothing about how you recover the original resolution. Do you use bilinear upsampling? If so, don't you have problems with borders and fine structures when upsampling by such a large factor?
Hi,
Thanks for providing the pre-trained ResNet50 and ResNet101 models.
Do you have a pre-trained ResNet18 model in which the first 7x7 conv is replaced with three 3x3 convs?
I have searched for it for a long time but unfortunately could not find it. If you have saved this model, could you please share it with me?
Many thanks in advance.
It seems that 'mu' is updated during evaluation. In other words, it records some information from the test set? Is that OK, or should it be disabled?
Hi,
Can you provide the pretrained ResNet152 model?
Thanks!
I can only find where you put the model on the GPU... Thanks.
When I click the link, I get this message:
'This XML file does not appear to have any style information associated with it. The document tree is shown below.'
How can I solve this?
(base) pf@pf-System-Product-Name:~/EMANet$ python train.py
2019-12-06 21:37:49,527 - INFO - set log dir as ./logdir
2019-12-06 21:37:49,528 - INFO - set model dir as ./models
Traceback (most recent call last):
  File "train.py", line 181, in <module>
    main()
  File "train.py", line 146, in main
    sess = Session(dt_split='trainaug')
  File "train.py", line 93, in __init__
    self.net = DataParallel(self.net, device_ids=settings.DEVICES)
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 131, in __init__
    _check_balance(self.device_ids)
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 18, in _check_balance
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 18, in <listcomp>
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 301, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
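The traceback points at settings.DEVICES, so a likely cause (an assumption, since the poster's settings are not shown) is that the list names more GPUs than the machine has. A minimal sketch of the check, with hypothetical values and a helper name that is not part of the repository:

```python
def usable_device_ids(requested, available_count):
    """Keep only the requested GPU ids that exist on this machine."""
    return [i for i in requested if 0 <= i < available_count]


# Hypothetical values: the config asks for four GPUs, the machine has one.
requested = [0, 1, 2, 3]
available_count = 1  # in practice this would come from torch.cuda.device_count()

print(usable_device_ids(requested, available_count))
```

If the filtered list is shorter than the configured one, editing DEVICES in settings.py to match the available GPUs should avoid the assertion.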
Excuse me, I cannot find the gradient flowing back to CONV1. Is there a bug?
ha ha
Without visualization results, it is difficult for us to understand the paper. If you have time, could you add them?
Using the pretrained model throws this error:
RuntimeError: Error(s) in loading state_dict for EMANet:
Missing key(s) in state_dict: "extractor.4.0.conv1.weight", "extractor.4.0.bn1.weight", "extractor.4.0.bn1.bias", "extractor.4.0.bn1.running_mean", "extractor.4.0.bn1.running_var", "extractor.4.0.conv2.weight", ......
Unexpected key(s) in state_dict: "layer1.0.0.conv1.weight", "layer1.0.0.bn1.weight", "layer1.0.0.b
Can someone help?
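The message suggests the file's keys are named after backbone layers (layer1...), while EMANet expects extractor.* keys, so the file may be a backbone-only checkpoint rather than a full EMANet one. That is a guess from the key names alone. A small sketch for comparing the two key sets before loading; diff_keys is a hypothetical helper, and the keys are the ones quoted in the error:

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) keys, mirroring the load_state_dict report."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# In practice: model.state_dict().keys() and torch.load(path).keys().
model_keys = ["extractor.4.0.conv1.weight", "extractor.4.0.bn1.weight"]
checkpoint_keys = ["layer1.0.0.conv1.weight", "layer1.0.0.bn1.weight"]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

If no key overlaps at all, the file was probably saved from a different module hierarchy, and it is worth checking which file each loading path in the code expects.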
Hello,
Thank you for publishing the code to your excellent work.
I was wondering how long it takes to train EMANet with a ResNet-101 backbone, both with 256 and with 512 input channels. How many GPUs did you use to achieve this training time?
Thank you in advance :)
How to compute the FLOPs and parameters of your EMA module?
Could you please share the computing details? Thanks!
I have tried tools like thop (https://github.com/Lyken17/pytorch-OpCounter), yet the results are significantly different from yours. Could you please explain in detail how you calculated the FLOPs?
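For comparison, here is a hand count of multiply-accumulates (MACs) for the matrix products in an EM attention step. It is only a sketch under assumed sizes (C=512, K=64, a 65x65 feature map, T=3), not the authors' procedure, and the numbers are not the paper's. Tools differ on whether one MAC counts as one or two FLOPs, which alone can double a result.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of a k x k convolution over an h_out x w_out output."""
    return c_in * c_out * k * k * h_out * w_out


def matmul_macs(m, n, p):
    """Multiply-accumulates of an (m x n) @ (n x p) product."""
    return m * n * p


# Assumed sizes for illustration only.
C, K, H, W, T = 512, 64, 65, 65, 3
N = H * W

e_step = matmul_macs(N, C, K)       # responsibilities: (N x C) @ (C x K)
m_step = matmul_macs(C, N, K)       # bases:            (C x N) @ (N x K)
reconstruct = matmul_macs(C, K, N)  # output:           (C x K) @ (K x N)

print("EM iterations:", T * (e_step + m_step))
print("reconstruction:", reconstruct)
print("1x1 conv params:", conv_params(C, C, 1, bias=False))
print("1x1 conv MACs:", conv_macs(C, C, 1, H, W))
```

Counting only the attention module, as above, versus the module plus backbone is another common source of mismatched totals.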
In train.py, lines 134 and 135:
self.net.module.ema.mu *= momentum
self.net.module.ema.mu += mu * (1 - momentum)
Maybe it should be like this:
self.net.module.emau.mu *= momentum
self.net.module.emau.mu += mu * (1 - momentum)
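For reference, the two in-place statements amount to a standard exponential moving average. A plain-Python sketch with an assumed momentum of 0.9 and toy two-element bases (the repository uses tensors and its own momentum setting):

```python
def moving_average(old, new, momentum):
    """Blend the running value with a new estimate: momentum*old + (1-momentum)*new."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


mu_running = [1.0, 0.0]  # toy running bases
mu_batch = [0.0, 1.0]    # toy bases estimated from the current batch
momentum = 0.9           # assumed value for illustration

mu_running = moving_average(mu_running, mu_batch, momentum)
print(mu_running)
```

Whichever attribute name is correct (ema or emau), the update itself only needs access to the stored mu, so it could equally live inside the module's forward pass under a training-mode check.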
Line 19 in 9a492d8
I suppose this line should be
image = (image - settings.MEAN) / settings.STD.
Or is this line a trick?
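The form proposed above is the usual per-channel standardization. A minimal sketch, using the widely used ImageNet statistics for images scaled to [0, 1] as assumed values; they are not read from the repository's settings.py:

```python
def normalize(pixel, mean, std):
    """Standardize one RGB pixel channel by channel: (value - mean) / std."""
    return [(p - m) / s for p, m, s in zip(pixel, mean, std)]


# Assumed statistics (common ImageNet values), not taken from settings.py.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

print(normalize([0.5, 0.5, 0.5], MEAN, STD))
```

A pixel equal to the mean maps to zero in every channel, which is a quick sanity check for whichever form the code actually uses.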
Hello,
I can't seem to reproduce the ablation study results in Figures 3 and 4 of the ICCV paper. When training and evaluating with 3 iterations (T_train = T_eval = 3), my final mIoU is 76.04%, which is 2.48% lower than the result shown in Figure 4 (78.52%).
I used the default settings in settings.py
except the following:
Furthermore, my Pillow version is 6.1.0 and my cv2 version is 3.4.2, unlike the version used by the authors.
Is it possible that training EMANet on a single GPU causes such a significant drop in mIoU (possibly due to the use of synchronized batchnorm), or could a different version of Pillow / cv2 be the root cause of this problem?
Thanks in advance :)
You note "All results are achieved with the backbone ResNet-101 with output stride 8". Why, then, are the parameters and FLOPs of EMANet substantially lower than those of the backbone (ResNet-101)? Taking EMANet512 as an example, it is listed with 10M parameters and 43.1G FLOPs, yet the backbone (ResNet-101) alone contains 42.6M parameters and 190.6G FLOPs. Is there an error here?
I have the same problem as #22. Would you please provide your PIL and cv2 versions?
Hi, thanks for the great repo. I cannot get latest.pth during training. What should I do?
error:
ERROR - No checkpoint ./models/latest.pth!
Thank you.
During training, where can I find the specific category of each pixel of the training data? Is there only the object's boundary information?
Hi, I submitted the results for the val set and the test set to the official website for evaluation, but the two results differ by four points. How can I reduce this gap?
I added another block to replace EMAU, but I get some warnings. I guess the bn_lib you used is not suitable for my block.
2020-07-30 20:05:29,586 - INFO - step: 2 loss: 2.398 lr: 0.009
Hi,
How can I convert the network output into an image?
And how is gradient backpropagation handled in your implementation?
Thanks for your reply!!!
I made the ground truth for my dataset following the format of yours. But during training there was a problem, which I've pasted below. Can you help me? Maybe my dataset is too messy and its boundaries are not obvious. What advice would you offer?
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:564)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f5345247441 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f5345246d7a in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x13652 (0x7f534261a652 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x50 (0x7f5345237ce0 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x30facb (0x7f52f071aacb in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x376d60 (0x7f52f0781d60 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #6: <unknown function> + 0x3128ea (0x7f52f071d8ea in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #7: torch::autograd::deleteFunction(torch::autograd::Function*) + 0xa2 (0x7f52f071d9a2 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #8: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0xa2 (0x7f5330b81bb2 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x14216b (0x7f5330ba516b in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x1421d9 (0x7f5330ba51d9 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #11: torch::autograd::Variable::Impl::release_resources() + 0x1b (0x7f52f0d5708b in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #12: <unknown function> + 0x1420bb (0x7f5330ba50bb in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #13: <unknown function> + 0x3c30f4 (0x7f5330e260f4 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #14: <unknown function> + 0x3c3141 (0x7f5330e26141 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #15: <unknown function> + 0x19aa5e (0x55791a64ba5e in /home/r/.conda/envs/pytorch/bin/python3)
frame #16: <unknown function> + 0xf1b77 (0x55791a5a2b77 in /home/r/.conda/envs/pytorch/bin/python3)
frame #17: <unknown function> + 0xf1a07 (0x55791a5a2a07 in /home/r/.conda/envs/pytorch/bin/python3)
frame #18: <unknown function> + 0xf1a1d (0x55791a5a2a1d in /home/r/.conda/envs/pytorch/bin/python3)
frame #19: <unknown function> + 0xf1a1d (0x55791a5a2a1d in /home/r/.conda/envs/pytorch/bin/python3)
frame #20: PyDict_SetItem + 0x3da (0x55791a5e963a in /home/r/.conda/envs/pytorch/bin/python3)
frame #21: PyDict_SetItemString + 0x4f (0x55791a5f065f in /home/r/.conda/envs/pytorch/bin/python3)
frame #22: PyImport_Cleanup + 0x99 (0x55791a655d89 in /home/r/.conda/envs/pytorch/bin/python3)
frame #23: Py_FinalizeEx + 0x61 (0x55791a6c0231 in /home/r/.conda/envs/pytorch/bin/python3)
frame #24: Py_Main + 0x35e (0x55791a6ca57e in /home/r/.conda/envs/pytorch/bin/python3)
frame #25: main + 0xee (0x55791a59488e in /home/r/.conda/envs/pytorch/bin/python3)
frame #26: __libc_start_main + 0xf0 (0x7f5348fdd830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #27: <unknown function> + 0x1c3160 (0x55791a674160 in /home/r/.conda/envs/pytorch/bin/python3)
Line 200 in f7d7b47
Hi, @XiaLiPKU ,
Could you also provide the code for visualizing on attention maps, like the responses Z in Fig.5?
THX!
I am puzzled by the BN layer. In your code you did not use torch.nn.BatchNorm2d. What is the difference between torch's BatchNorm2d and SynchronizedBN2d?
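As general background (not a description of the repository's bn_lib internals): under DataParallel, torch.nn.BatchNorm2d normalizes each GPU's slice of the batch with that slice's own statistics, whereas a synchronized variant computes the mean and variance over the whole batch across devices. A toy sketch with made-up activations shows how different the two can be:

```python
from statistics import mean, pvariance

# Made-up activations for one channel, split across two GPUs.
gpu0 = [1.0, 2.0, 3.0]
gpu1 = [10.0, 11.0, 12.0]

# Unsynchronized: each device uses the statistics of its own slice.
per_device = [(mean(batch), pvariance(batch)) for batch in (gpu0, gpu1)]

# Synchronized: one mean and variance over the whole batch.
combined = gpu0 + gpu1
synced = (mean(combined), pvariance(combined))

print("per device:", per_device)
print("synchronized:", synced)
```

The gap matters most when the per-GPU batch is small, as is typical in segmentation with large crops.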
Thanks for releasing the code.
Where is the pretrained model 'Resnet152'?
I am looking forward to your reply.
Dear XiaLiPKU,
I cloned your code and followed the steps to train on my server, but only three lines were printed:
2020-03-13 21:29:17,166 - INFO - set log dir as ./logdir
2020-03-13 21:29:17,166 - INFO - set model dir as ./models
2020-03-13 21:29:19,127 - ERROR - No checkpoint ./models/latest.pth!
I know that this error should not affect the training process, but no models were saved in ./models, and when I ran "sh tensorboard.sh" there was nothing to show. It seems that the training process has stopped. The only change I made was replacing obj.cuda(async=True) with obj.cuda(non_blocking=True). Could you help me?
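On the one change mentioned: async became a reserved keyword in Python 3.7, so the old spelling no longer parses there, and non_blocking is the argument name PyTorch uses in its place. The rename is therefore required on newer interpreters and is unlikely, by itself, to explain the stall. A small check that can be run on any interpreter (compiles is a hypothetical helper):

```python
import sys


def compiles(source):
    """Return True if the expression parses on the running interpreter."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


print(sys.version_info[:2])
print("async=True parses:", compiles("obj.cuda(async=True)"))
print("non_blocking=True parses:", compiles("obj.cuda(non_blocking=True)"))
```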
Thanks!
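When running EMANet on the VOC dataset and on my own dataset, the edges of the final segmentation are visibly jagged. Do your results look like this as well?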
Hi @XiaLiPKU ! This work is wonderful and thanks so much for releasing the code.
May I ask a question? I used your pretrained model to evaluate on the val set and got 80.50% mIoU with single-scale testing, but when I trained the model from scratch, I could only reach 79.44%, whereas it is supposed to reach 80.05%.
I just followed your default settings (pretrained ResNet weights, batch size 16, 4 GPUs, 30k iterations, and so on).
Did you adopt any other special techniques to get this final model?
Looking forward to your reply!
Hello,
I would like to ask the authors why EMANet suffers from the vanishing / exploding gradients inherent in RNNs even though the EM iterations are unrolled for only a small number of steps (in this case 3). Vanilla RNNs with tanh non-linearities can typically work on sequences on the order of 100 time steps, and LSTMs can work on sequences on the order of 1000 time steps.
Since the mIOU peaks at a very small value of T_train, is vanishing / exploding gradients really the reason that the mIOU deteriorates for higher values of T_train (>3)? Have the authors by any chance printed the gradient norms of every layer to check for vanishing or exploding gradients?
Thank you in advance.
Running eval.py only reports the mIoU. What should I do to get the segmentation maps?
Dear authors, was the model file named 'final.pth' trained on the PASCAL VOC dataset?
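A generic way to turn per-class scores into a segmentation map is a per-pixel argmax followed by a colour lookup. The sketch below uses nested lists and a made-up two-class palette, and it is not taken from eval.py. With tensors the same step is usually logits.argmax(dim=1), after which the label array can be saved as an image.

```python
def argmax_labels(logits):
    """logits[c][y][x] -> labels[y][x], choosing the highest-scoring class per pixel."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [[max(range(num_classes), key=lambda c: logits[c][y][x])
             for x in range(width)]
            for y in range(height)]


def colorize(labels, palette):
    """Map each class index to an RGB tuple."""
    return [[palette[label] for label in row] for row in labels]


# Toy scores for 2 classes on a 2x2 image.
logits = [
    [[0.9, 0.1], [0.2, 0.3]],  # class 0
    [[0.1, 0.8], [0.7, 0.6]],  # class 1
]
palette = {0: (0, 0, 0), 1: (128, 0, 0)}  # hypothetical colours

labels = argmax_labels(logits)
print(labels)
print(colorize(labels, palette))
```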
Hi, thank you for releasing the code for EMANet. I found a difference between the code and the paper. It lies in the formulation of Equation 13 in the paper, where the M step (bases re-estimation) is formulated as follows:
[image: Equation 13 from the paper]
However, in the code, the M step is formulated as:
mu = torch.bmm(x, z_)
Here, mu = torch.bmm(x, z_) is a weighted summation of X, whereas Equation 13 in the paper is not a weighted summation of X. Is there something wrong in the paper?
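To make the code side concrete: if z_ is z normalized over the pixel dimension (an assumption about the surrounding repository code, which is not quoted here), then the product gives, for each basis k, the weighted mean mu_k = sum_n(z_nk * x_n) / sum_n(z_nk). A plain-Python sketch with toy values; m_step is a hypothetical helper:

```python
def m_step(x, z, eps=1e-6):
    """x: C x N features, z: N x K responsibilities -> mu: C x K bases.

    Each basis is the responsibility-weighted mean of the pixel features.
    """
    C, N, K = len(x), len(z), len(z[0])
    col_sums = [sum(z[n][k] for n in range(N)) for k in range(K)]
    # Normalize each column of z over the N pixels (assumed meaning of z_).
    z_norm = [[z[n][k] / (eps + col_sums[k]) for k in range(K)] for n in range(N)]
    # Matrix product x @ z_norm, the list equivalent of bmm for one sample.
    return [[sum(x[c][n] * z_norm[n][k] for n in range(N)) for k in range(K)]
            for c in range(C)]


x = [[1.0, 3.0]]              # C=1 channel, N=2 pixels
z = [[1.0, 0.0], [1.0, 1.0]]  # N=2 pixels, K=2 bases

print(m_step(x, z))
```

Basis 0 weights both pixels equally and lands near their average, while basis 1 attends only to the second pixel and lands near its value.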