visual-attention-network / segnext
Official Pytorch implementations for "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation" (NeurIPS 2022)
License: Apache License 2.0
Hi, thanks for your great work!
I would like to ask how I can do distributed training.
Thanks so much.
“we use batch normalization instead of layer normalization as we found batch normalization gains more for the segmentation performance.”
I cannot understand why. I think it may be because you replace self-attention with convolution, but I am not sure.
When I try "image_demo.py", an error occurs:
palette.shape = [19, 3]
self.CLASSES has 150 entries
How do I fix it?
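The two numbers suggest a 19-color palette (the Cityscapes class count) paired with a model that has 150 classes (the ADE20K class count). A minimal sketch, assuming the demo only needs one RGB triple per class; `make_palette` and `check_palette` are hypothetical helpers, not part of the repo:

```python
import random


def make_palette(num_classes, seed=0):
    """Hypothetical helper: one random RGB triple per class.

    A fixed seed keeps the colors stable between runs.
    """
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(3)] for _ in range(num_classes)]


def check_palette(palette, classes):
    """Raise early if the palette and the class list disagree in length."""
    if len(palette) != len(classes):
        raise ValueError(
            f"palette has {len(palette)} colors but there are {len(classes)} classes")


classes = [f"class_{i}" for i in range(150)]  # stand-in for the ADE20K class names
palette = make_palette(len(classes))
check_palette(palette, classes)  # passes: 150 colors for 150 classes
```

If your copy of image_demo.py follows mmsegmentation 0.x, it may also accept a `--palette` argument (defaulting to Cityscapes). In that case, passing the palette that matches the checkpoint's dataset could be the simpler fix.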
Hi, can you provide ImageNet pretraining configs (e.g. resolution, epochs, training tricks)? They are important for a fair comparison.
Model | GFLOPs | Params | mIoU (ADE20K) |
---|---|---|---|
VAN-Small + Light-Ham-D256 | 15.8 | 13.8M | 45.2 |
MSCAN-S + Light-Ham-D512 | 16.0 | 14.0M | 44.3 |
VAN-Base + Light-Ham | 34.4 | 27.4M | 49.6 |
MSCAN-B + Light-Ham | 35.0 | 28.0M | 48.5 |
What is the need for the MSCAN backbone? The paper explains that "Though VAN has achieved great performance in image classification, it neglects the role of multi-scale feature aggregation during the network design, which is crucial for segmentation-like tasks"; however, that statement refers to vanilla VAN without the Light-Ham decoder.
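If you use padding, please modify `whole_inference` in `encoder_decoder.py` (line 209):
```python
# Crop the padded logits back to the resized (unpadded) image size.
resize_shape = img_meta[0]['img_shape'][:2]
seg_logit = seg_logit[:, :, :resize_shape[0], :resize_shape[1]]
# Original image size, used for the final resize of the logits.
size = img_meta[0]['ori_shape'][:2]
```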
Hi authors,
In MSCA, the learned multi-scale features are aggregated by element-wise addition rather than concatenation. Is this to achieve a better trade-off between performance and complexity? (Concatenation may cost more GPU memory or run slower.)
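To make the trade-off concrete, assume the summed branches feed a C→C 1x1 convolution, as in the MSCA attention module. Concatenating B branches instead would turn that projection into a B·C→C convolution and multiply the fused activation by B. A small sketch of the arithmetic; the channel count, branch count, and feature size are illustrative assumptions:

```python
def pointwise_params(c_in, c_out, bias=True):
    """Parameters of a 1x1 convolution mapping c_in -> c_out channels."""
    return c_in * c_out + (c_out if bias else 0)


def fusion_cost(channels, branches, height, width):
    """Compare add-fusion and concat-fusion in front of a 1x1 projection.

    Returns (projection params, fused feature elements) for each strategy.
    """
    add = (pointwise_params(channels, channels), channels * height * width)
    cat = (pointwise_params(channels * branches, channels),
           channels * branches * height * width)
    return {"add": add, "concat": cat}


# Illustrative numbers: 64 channels, 4 summed terms, a 128x128 feature map.
cost = fusion_cost(channels=64, branches=4, height=128, width=128)
print(cost)
```

With B summed terms, concatenation grows both the projection weights and the fused activation by roughly a factor of B. Whether this was the authors' reason for choosing addition is for them to confirm.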
I am training on my own dataset and downloaded the pretrained weights 'mscan_b.pth'. Could you explain where I should put the weights file? Thanks very much! My command is: `python /tools/train.py local_configs/segnext/base/segnext.base.512x512.tusimple.160k.py --load-from pretrained/mscan_b.pth`
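For backbone-only ImageNet weights such as mscan_b.pth, one option in mmsegmentation 0.x is to point the backbone's `init_cfg` at the file. As far as we know, `--load-from` loads a full segmentor checkpoint into the whole model, so it is the wrong tool for a backbone-only file. The config posted further down this page uses the same `init_cfg` pattern.

The sketch below assumes the file sits in `pretrained/` relative to where training is launched. Check the base config you inherit from, because it may already set this:

```python
# Hypothetical fragment for a custom config derived from the SegNeXt base config.
# Only the keys shown here are overridden; everything else is inherited.
model = dict(
    backbone=dict(
        # Load the ImageNet-pretrained MSCAN-B weights into the backbone only.
        init_cfg=dict(type='Pretrained', checkpoint='pretrained/mscan_b.pth')))
```

Training would then run without `--load-from`.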
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the OpenMMLab 2.0 repository branches:
Repository | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
---|---|---|
MMEngine | | 0.x |
MMCV | 1.x | 2.x |
MMDetection | 0.x, 1.x, 2.x | 3.x |
MMAction2 | 0.x | 1.x |
MMClassification | 0.x | 1.x |
MMSegmentation | 0.x | 1.x |
MMDetection3D | 0.x | 1.x |
MMEditing | 0.x | 1.x |
MMPose | 0.x | 1.x |
MMDeploy | 0.x | 1.x |
MMTracking | 0.x | 1.x |
MMOCR | 0.x | 1.x |
MMRazor | 0.x | 1.x |
MMSelfSup | 0.x | 1.x |
MMRotate | 1.x | 1.x |
MMYOLO | | 0.x |
Attention: please create a new virtual environment for OpenMMLab 2.0.
SegNeXt is great work that uses multi-scale, large convolution kernels to mimic "attention". I was wondering whether it is possible to export your model to ONNX or TensorRT. TensorRT can run 2x to 10x faster than the original PyTorch model, which would be very helpful for us in video prediction.
Your model is based on mmsegmentation, which already supports model deployment. I have looked at your code, but I wonder whether some operations, such as 'unsqueeze' / 'flatten', may not be supported by ONNX or TensorRT.
Best regards
Disclaimer: I am not a lawyer and this is not legal advice. Please contact an attorney for any licensing questions, this is just my understanding and is only for informational purposes.
First of all, thank you for publishing your paper and sharing this code!
The License section in the README.md says:
This repo is under the Apache-2.0 license. For commercial use, please contact the authors.
As you can see in https://choosealicense.com/licenses/apache-2.0/ (created by GitHub), the Apache 2.0 license already allows commercial use:
Permissions
- Commercial use
- [...]
and it does not require asking for permission or notifying anyone of commercial use. If you're asking folks to let you know (as an FYI), that's fine, but that would be optional and up to them, and does not belong in the "license" section, as it's not required as part of Apache 2.0 license.
If you want to prevent commercial use without some sort of fee or custom license, then this is not open-source, and Apache 2.0 is perhaps not an appropriate license for your project, so you may want to choose something else, but it may affect who will be able and willing to contribute to a repo that is under a non-open-source license.
In any case, please reach out to an attorney to discuss your questions and your options. As it stands, this is confusing to readers of this project, as "Apache 2.0 license" + "talk to us before commercial use" seems to be conflicting and hence confusing.
Hope this helps.
Best of luck with your project!
Thank you for your work. I have a question about the code. In backbones/mscan.py, there is a "shortcut" operation in the SpatialAttention module, but the Block module also has a "drop_path" residual, which should work the same way as the shortcut. I couldn't find the first shortcut operation in the paper.
Excuse me, could you tell me the performance of SegNeXt on the Pascal VOC validation set? I found that both PSPNet and DeepLabv3+ reach 85+ mIoU on the test set but only about 80 on the validation set (according to the mmsegmentation repository), so I want to confirm whether results on the test set are generally higher than those on the validation set.
In SegNeXt/mmseg/models/backbones/mscan.py (line 175 in 1e51c8a):
loading mscan-large with the mscan-large config raises a size mismatch error.
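My original input is 830 (H) x 1280 (W), but I find that seg_logits in ham_head is a downsampled feature map of 1024 (channels) x 104 (H) x 160 (W). It's too coarse.
As you can see, the boundaries of the animals in the segmentation are not very sharp. This may be caused by too much downsampling. I would appreciate some guidance.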
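The 104x160 size is consistent with an overall stride of 8. A sketch of the arithmetic, assuming each downsampling step in MSCAN is a 3x3 convolution with stride 2 and padding 1 (two in the stem, one at the start of the second stage). Under that assumption, each step rounds odd sizes up:

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample(size, steps):
    """Apply `steps` stride-2 convolutions and record every intermediate size."""
    sizes = [size]
    for _ in range(steps):
        sizes.append(conv_out(sizes[-1]))
    return sizes


print(downsample(830, 3))   # height through stem conv 1, stem conv 2, stage-2 embed
print(downsample(1280, 3))  # width through the same three steps
```

Under this assumption the decoder works at 1/8 of the input resolution before the logits are resized back to the image size. That may be one reason thin boundaries look soft.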
In addition, background occupies most of the image, which makes it hard to optimize the segmentation of the other classes. How can I optimize the model to overcome this issue, for example with class_weight?
```python
# tools/dist_train.sh segnext.large.ratmetric.py 4
# python tools/train.py segnext.large.ratmetric.py
_base_ = [
    'local_configs/segnext/large/segnext.large.512x512.coco_stuff164k.80k.py'
]
num_classes = 3
# load_from = None
load_from = 'work_dirs/segnext.large.ratmetric/latest.pth'
model = dict(
    backbone=dict(init_cfg=dict(type='Pretrained', checkpoint='pretrained/segnext_large_512x512_ade_160k.pth')),
    decode_head=dict(
        num_classes=num_classes,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, class_weight=[1.0/50, 1.0, 1.0], loss_weight=1.0))
)
runner = dict(type='IterBasedRunner', max_iters=6400)
checkpoint_config = dict(by_epoch=False, interval=800)
evaluation = dict(interval=800, metric='mIoU')
data_root = 'data_rat_metric'
img_dir = 'images'
ann_dir = 'annotations'
img_wh = (1280, 832)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=img_wh, ratio_range=(0.7, 1.5)),
    dict(type='RandomCrop', crop_size=img_wh[::-1], cat_max_ratio=1.0, ignore_index=0),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=img_wh[::-1], pad_val=0, seg_pad_val=0),
    # dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_wh,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=train_pipeline),
    val=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline),
    test=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline))
```
thanks!
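On the class_weight question: the config above down-weights one class by a hand-picked factor of 1/50. A common alternative, not something this repo prescribes, derives the weights from pixel frequencies. A minimal sketch; the pixel counts below are placeholders that you would compute from your own annotations:

```python
def inverse_frequency_weights(pixel_counts, power=1.0):
    """Weight each class by (1 / frequency) ** power, normalized to mean 1.

    power=0.5 gives the milder "inverse square root" variant.
    """
    total = sum(pixel_counts)
    raw = [(total / count) ** power for count in pixel_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Placeholder counts: one dominant class and two smaller ones.
weights = inverse_frequency_weights([9000, 600, 400])
print([round(w, 3) for w in weights])
```

The resulting list would go into the `class_weight` argument of the decode head's `CrossEntropyLoss`, ordered by training label index, that is, after any `reduce_zero_label` shift.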
The parameter counts you report in the tables are 4M, 14M, 28M, and 49M (T, S, B, L), but the pretrained models on "TsingHua Cloud" are about 50, 168, 322, and 543 MB (T, S, B, L).
I think the size of a pretrained model should equal params x 4 B, because each parameter is saved as a float32 number, which occupies 4 bytes. So the parameter counts inferred from the pretrained models would be **12M, 44M, 80M, 134M.**
Maybe I misunderstand how memory is calculated from parameters. **Or could you explain?**
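One possible explanation is that the released files also store AdamW optimizer state, which is two extra float32 buffers per parameter. If so, the size comes to roughly three times params × 4 bytes. This is an assumption: you can check it by loading a file with `torch.load` and listing its top-level keys. A sketch of the arithmetic:

```python
BYTES_PER_FLOAT32 = 4


def checkpoint_mb(num_params, optimizer_buffers=0):
    """Approximate checkpoint size in MB (10**6 bytes).

    optimizer_buffers: extra float32 tensors stored per parameter,
    e.g. 2 for Adam/AdamW (first and second moment estimates).
    """
    return num_params * BYTES_PER_FLOAT32 * (1 + optimizer_buffers) / 1e6


for name, params in [("T", 4e6), ("S", 14e6), ("B", 28e6), ("L", 49e6)]:
    print(name, checkpoint_mb(params), checkpoint_mb(params, optimizer_buffers=2))
```

With two optimizer buffers the estimates land in the same range as the file sizes listed above, while the weights alone would not.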
In the code (SegNeXt/mmseg/models/decode_heads/ham_head.py, line 233 in c87bcae), the inputs are concatenated:
`inputs = torch.cat(inputs, dim=1)`
In the paper, however:
"our decoder only receives features from the last three stages"
Can you tell me what is happening?
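A likely reconciliation rests on two assumptions: the head follows mmsegmentation's `multiple_select` input transform, and the config sets `in_index=[1, 2, 3]`. In that case the head first keeps only the selected stages, so the list that reaches `torch.cat` already excludes the first stage. A pure-Python sketch of that selection; the stage channel widths are illustrative:

```python
def select_stages(stage_outputs, in_index):
    """Mimic a 'multiple_select' step: keep only the stages listed in in_index."""
    return [stage_outputs[i] for i in in_index]


# Each stage is represented by (name, channels); widths are illustrative.
stages = [("stage1", 32), ("stage2", 64), ("stage3", 160), ("stage4", 256)]
selected = select_stages(stages, in_index=[1, 2, 3])
concat_channels = sum(c for _, c in selected)
print([name for name, _ in selected], concat_channels)
```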
Hi, I have a dataset in which I am mainly detecting just one object, and I wish to create a mask image of that object. The object would appear white against a black background.
I have organised it into `SegNeXt/data/images/` and `SegNeXt/data/annotations/`.
Currently your README has the following information for training:
`./tools/dist_train.sh /path/to/config 8`
Are there any pretrained models I can adapt and retrain on the new dataset?
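A common pitfall with white-on-black masks is that mmsegmentation reads annotation pixel values as class indices and, by default, treats 255 as the ignore index. A 255-valued object would therefore be ignored rather than learned. The sketch below assumes single-channel masks and maps them to labels 0 (background) and 1 (object) before training:

```python
def mask_to_labels(mask, threshold=128):
    """Convert a grayscale mask (rows of 0..255 values) to class indices.

    Pixels at or above `threshold` become 1 (object), the rest 0 (background).
    Thresholding also absorbs anti-aliased edge values such as 254 or 3.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in mask]


mask = [
    [0, 0, 255, 255],
    [0, 3, 254, 255],
]
print(mask_to_labels(mask))
```

With two classes, the decode head's `num_classes` would be 2. Writing the converted masks back to disk is left out here.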
Hi, I'm sorry to bother you, but
when I use your local_configs (default settings) to train on either Cityscapes or ADE20K, I cannot reproduce your results. Do you have any idea why?
When I directly use the pretrained .pth file that you provide, I get 79.25% mIoU on Cityscapes. But when I train the model locally, the result is terrible.
Here are my results: 60.7% mIoU on Cityscapes and 18.16% mIoU on ADE20K.
I would be very grateful if you could reply as soon as possible.
The README.md says:
This repo is under the Apache-2.0 license.
Could you please add an explicit LICENSE file to the repo with a copy of the license? That way, it's very clear, and GitHub will also add a line to the repo overview section with the license name.
You can find a copy of the license here: https://choosealicense.com/licenses/apache-2.0/
Hello, thanks for your excellent work. I have another question: my computer has only one GPU. Can your test command be used as is? I changed the command to `./tools/dist_test.sh /path/to/config /path/to/checkpoint_file 1 --eval mIoU`, replacing "8" with "1", but I got some errors when I ran it.
Hi, I reproduced SegNeXt with Paddle and obtained higher scores.
Original | Paddle |
---|---|
79.8 | 81.04 |
81.3 | 81.33 |
82.6 | 82.74 |
83.2 | 83.32 |
RuntimeError: Expected tensor for 'out' to have the same device as tensor for argument #3 'batch2'; but device 2 does not equal 0 (while checking arguments for baddbmm)
Could anyone help me solve this problem?
Is there a version that does not depend on mmseg? The code is rather hard to follow; there is too much irrelevant information.
Hi, I have already read the README, but I could not find mscan_x.pth. Could you please tell me how to get it? Thank you!
I found it; I had missed it.
Thanks for your excellent work. I have a quick question about the model structure.
In SegNeXt/mmseg/models/backbones/mscan.py (lines 59 to 91 in b53d601), we can see convolutions, element-wise additions, and element-wise products, but no activation function between these operations. In other words, without a non-linear activation, these ops could be reduced to a single matrix operation.
I understand that the SpatialAttention module has a GELU that wraps the attention module, so non-linearity can be provided by it (SegNeXt/mmseg/models/backbones/mscan.py, lines 94 to 101 in b53d601).
But I cannot figure out the reason for using only linear ops inside the attention. Is there a good reason for this, or am I simply missing something here?
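A quick check on the linearity premise, using a toy scalar model. The module's last step multiplies the attention map element-wise with the input (`attn * u`). Because that map is itself computed from the same input, the output is quadratic in the input even without an activation function. In the sketch, the hypothetical weight `a` stands in for the whole convolution stack:

```python
def gated(x, a=0.5):
    """Toy analogue of attention gating: a linear map of x multiplied by x."""
    attn = a * x      # linear "convolution" part
    return attn * x   # element-wise product -> quadratic in x


def is_additive(f, x, y):
    """A linear map must satisfy f(x + y) == f(x) + f(y)."""
    return f(x + y) == f(x) + f(y)


print(is_additive(gated, 1.0, 2.0))              # the gated map
print(is_additive(lambda v: 0.5 * v, 1.0, 2.0))  # a purely linear map
```

The gated map fails the additivity test while the purely linear one passes, so the product alone already makes the attention path non-linear.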
Hi,
thanks for the fantastic work. I am currently trying to train the tiny model from the Imagenet-pretrained weights on the ADE dataset to begin integrating your work into mmsegmentation, as discussed here and here.
However, I am confused about the batch size and learning rate. In the paper, you mention a batch size of 16 and that you use 8 GPUs. However, the config sets samples_per_gpu to 8. Could you kindly tell me what total batch size should be used for training and the corresponding learning rate?
Best wishes & many thanks,
Fabian
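For reference, the total batch size is samples_per_gpu multiplied by the number of GPUs, so 8 per GPU on 8 GPUs gives 64, not 16. If the batch size changes, a common heuristic scales the learning rate proportionally. This is the linear scaling rule, not something the authors state here. In the sketch, the 2-per-GPU split for the paper's setting is inferred from 16 / 8, and the base learning rate is a placeholder rather than a value from the SegNeXt configs:

```python
def total_batch(samples_per_gpu, num_gpus):
    """Effective batch size in data-parallel training."""
    return samples_per_gpu * num_gpus


def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: keep lr / batch_size constant."""
    return base_lr * new_batch / base_batch


paper_batch = total_batch(samples_per_gpu=2, num_gpus=8)   # 16, as stated in the paper
config_batch = total_batch(samples_per_gpu=8, num_gpus=8)  # with the config value
print(paper_batch, config_batch, scaled_lr(1e-4, paper_batch, config_batch))
```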
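Hello,
I am using segnext_tiny_512x512_ade_160k.pth with the config file segnext.tiny.512x512.ade.160k.py. During training I get the warning `WARNING - The model and loaded state dict do not match exactly`, and the small model gives the same warning.
Is there something wrong with the model? Thanks.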
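To see which parameters the warning refers to, one option is to compare key names directly. The sketch below uses a hypothetical `diff_keys` helper. In practice, the two key collections would come from `model.state_dict().keys()` and from the checkpoint's `state_dict` entry loaded with `torch.load(path, map_location='cpu')`.

An optional prefix strip covers the case where a full segmentor checkpoint, whose keys start with `backbone.`, is loaded into the backbone alone. The key names below are illustrative only:

```python
def diff_keys(model_keys, ckpt_keys, strip_prefix=""):
    """Return (missing, unexpected) parameter names.

    missing: in the model but not in the checkpoint.
    unexpected: in the checkpoint but not in the model.
    strip_prefix: removed from checkpoint keys first, e.g. "backbone.".
    """
    cleaned = {
        k[len(strip_prefix):] if strip_prefix and k.startswith(strip_prefix) else k
        for k in ckpt_keys
    }
    model_set = set(model_keys)
    return sorted(model_set - cleaned), sorted(cleaned - model_set)


# Illustrative key names only.
model_keys = ["patch_embed1.proj.weight", "block1.0.attn.proj_1.weight"]
ckpt_keys = ["backbone.patch_embed1.proj.weight",
             "backbone.block1.0.attn.proj_1.weight",
             "decode_head.conv_seg.weight"]
print(diff_keys(model_keys, ckpt_keys))
print(diff_keys(model_keys, ckpt_keys, strip_prefix="backbone."))
```

If only decode-head keys are reported as unexpected, the backbone weights themselves did load.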
First of all, thank you for your work. Secondly, I have run into some problems; please take a look when you are free. I plan to add your MSCA structure to the ConvNeXt block, but there are some bugs.
First, the ConvNeXt block:
```python
import torch
import torch.nn as nn

# LayerNorm_s (a channels-last LayerNorm) and DropPath are assumed to be defined elsewhere.


class ConvNextBlock(nn.Module):
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm_s(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        input = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = input + self.drop_path(x)
        return x
```
Then I added your attention module to get:
```python
# SpatialAttention is the module from SegNeXt's mmseg/models/backbones/mscan.py.
class ConvNextBlock2(nn.Module):
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.attn = SpatialAttention(dim)  # add it
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm_s(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)  # x is still channels-last (N, H, W, C) here
        if self.gamma is not None:
            x = self.gamma.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm(x))  # add it
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = shortcut + self.drop_path(x)
        return x
```
But when I run it, I get:
RuntimeError: Given groups=1, weight of size [64, 64, 1, 1], expected input[2, 104, 104, 64] to have 64 channels, but got 104 channels instead
Please help me: how should I adjust it, or what went wrong?
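The error can be traced from the shapes alone. `nn.Conv2d` reads dimension 1 of a 4-D input as channels, so after `permute(0, 2, 3, 1)` a convolution inside `SpatialAttention` sees the height (104) where it expects 64 channels. A pure-Python sketch; the block input shape is inferred from the error message:

```python
def permute(shape, dims):
    """Shape after tensor.permute(*dims)."""
    return tuple(shape[d] for d in dims)


def conv2d_channels_seen(shape):
    """nn.Conv2d interprets a 4-D input as (N, C, H, W)."""
    return shape[1]


nchw = (2, 64, 104, 104)                  # block input, channels-first
nhwc = permute(nchw, (0, 2, 3, 1))        # after the first permute
print(nhwc, conv2d_channels_seen(nhwc))   # what SpatialAttention receives
restored = permute(nhwc, (0, 3, 1, 2))    # permuting back before the attention call
print(conv2d_channels_seen(restored))
```

One way to fix the block would be to call `self.attn` only after `x.permute(0, 3, 1, 2)`, with a normalization suited to channels-first input there. `gamma` would also need a shape matching whichever layout it multiplies.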
Hi, I have a question about the ADE20K dataset used in SegNeXt.
The ADE20K dataset config in this repo differs from the original version in mmseg, e.g. `dict(type='ResizeToMultiple', size_divisor=32),` in transforms, and `type='RepeatDataset', times=50,`.
I'm new to mmseg; could you explain these config differences? Thanks a lot.
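As far as we can tell, the two settings do the following:

- `ResizeToMultiple` resizes the image so that both sides become multiples of `size_divisor`, rounding up, per our reading of mmcv's `imresize_to_multiple`.
- `RepeatDataset` with `times=50` makes one pass over the wrapped dataset 50 times longer. In iteration-based training this is commonly used to reduce how often the data loader restarts between epochs.

A pure-Python sketch of both effects:

```python
import math


def resize_to_multiple(height, width, divisor=32):
    """Round each side up to the nearest multiple of `divisor`."""
    return (math.ceil(height / divisor) * divisor,
            math.ceil(width / divisor) * divisor)


def repeated_length(dataset_len, times):
    """Length of a dataset wrapped by a repeat wrapper."""
    return dataset_len * times


print(resize_to_multiple(512, 683))      # e.g. a 4:3 image resized to a 512-pixel short side
print(repeated_length(20210, times=50))  # ADE20K training split (20,210 images) repeated 50 times
```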
Hello, thanks for the wonderful work, I learned a lot from it.
Would you be interested in contributing your work to mmsegmentation? If you want, I could also help with it.
Hello,
Thank you for sharing valuable code of your work.
Could you provide the config files for Pascal VOC?
I couldn't find them in your files.
Thank you.
Could you release the SegNeXt-L weights for the COCOStuff dataset?
Does the backbone do well on detection and segmentation tasks simultaneously?
We invite you to present and promote this work at the OpenMMLab community "Open Mic" event. My WeChat ID is van-sin.
According to tools/get_flops.py, I think you counted MACs as FLOPs, but one MAC is always two FLOPs.
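For a single convolution, the arithmetic looks like this. The sketch assumes one multiply-accumulate per kernel weight per output position and ignores the bias. Some tools report this MAC count under the name FLOPs, while counting the multiply and the add separately doubles it. The layer sizes are illustrative:

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Multiply-accumulates of a 2-D convolution (bias ignored)."""
    return (c_in // groups) * kernel * kernel * c_out * h_out * w_out


def macs_to_flops(macs):
    """One MAC = one multiply + one add = 2 FLOPs."""
    return 2 * macs


macs = conv2d_macs(c_in=64, c_out=64, kernel=3, h_out=128, w_out=128)
print(macs / 1e9, "GMACs", macs_to_flops(macs) / 1e9, "GFLOPs")
```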