visual-attention-network / segnext

Official Pytorch implementations for "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation" (NeurIPS 2022)

License: Apache License 2.0

Python 99.74% Dockerfile 0.14% Shell 0.12%

segnext's Issues

the result of SegNeXt training on ADE20k

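Hello, I have another beginner question:
Using a single 3060 GPU, I trained the SegNeXt-T model on ADE20K and got the following results:
--train.py

--test.py

I would like to ask:
1) Is the mIoU reported in your paper measured on the validation set, or on another split?

2) I searched for SS and MS here. Do they refer to the figure below?

3) My results are 1% to 2% lower than the results in your paper. Where might the problem be, and what should I adjust to reach the paper's accuracy?

Thank you very much for taking the time to answer a beginner's questions!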

Unable to install setup.py

Hi, I'm getting unknown command setup.py. I'm running on Windows 10.

KeyError: "EncoderDecoder: 'MSCAN is not in the models registry'"

When I run train.py, I get this error. Can anyone help? Thanks a lot! I have checked the registry and the __init__.py files in mmseg/models and mmseg/models/backbones. The MSCAN module should have been registered.

how can i do distributed training ?

Hi, thanks for your great work!
I would like to ask how I can do distributed training.

Thanks so much.

a question about your paper

“we use batch normalization instead of layer normalization as we found batch normalization gains more for the segmentation performance.”
I cannot understand why. I think it may be because you replace self-attention with convolutions, but I am not sure.

assert palette.shape[0] == len(self.CLASSES)

When I try "image_demo.py", an error occurs.
palette.shape = [19, 3]
len(self.CLASSES) = 150

How do I fix it?
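The assertion compares the number of palette rows with the number of class names. A 19-row palette matches the Cityscapes class count and 150 matches ADE20K, so the demo was probably given a palette for a different dataset than the checkpoint. That is an assumption, since the post does not show the command that was run. A minimal sketch of the same check, using a hypothetical helper `check_palette` that is not part of mmseg:

```python
def check_palette(palette, classes):
    """Return True when there is exactly one color per class.

    `palette` is a sequence of (R, G, B) tuples and `classes` a sequence of
    class names. Both are illustrative stand-ins for the real objects.
    """
    if len(palette) != len(classes):
        raise ValueError(
            f"palette has {len(palette)} colors but the model has "
            f"{len(classes)} classes; pick the palette of the dataset "
            f"the checkpoint was trained on"
        )
    return True


# Hypothetical data mirroring the numbers in the post.
cityscapes_like_palette = [(0, 0, 0)] * 19
ade_like_classes = [f"class_{i}" for i in range(150)]
ade_like_palette = [(0, 0, 0)] * 150

print(check_palette(ade_like_palette, ade_like_classes))
```

Passing `cityscapes_like_palette` together with `ade_like_classes` raises the `ValueError`, which corresponds to the failing assertion in the post.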

ImageNet pretraining configurations

Hi, can you provide ImageNet pretraining configs (e.g. resolution, epochs, training tricks)? They are important for a fair comparison.

VAN backbone performance?

VAN-Small + Light-Ham-D256 with 15.8GFlops and 13.8M Params achieves 45.2mIoU on ADE20K: here
MSCAN-S + Light-Ham-D512 with 16.0GFlops and 14.0M Params achieves 44.3mIoU on ADE20K

VAN-Base + Light-Ham with 34.4GFlops and 27.4M Params achieves 49.6mIoU on ADE20K: here
MSCAN-B + Light-Ham with 35.0GFlops and 28.0M Params achieves 48.5mIoU on ADE20K

Why is the MSCAN backbone needed? The paper explains that "Though VAN has achieved great performance in image classification, it neglects the role of multi-scale feature aggregation during the network design, which is crucial for segmentation-like tasks". However, that refers to vanilla VAN without the Light-Ham decoder.

about the command of training： ./tools/dist_train.sh /path/to/config 8

Hello, thank you for such an excellent article on solving visual tasks. Can you explain the path parameter? I can't find any information about it in the code. Thanks very much!

Hello, can you support binary segmentation datasets with labels 0 or 255 (e.g., medical image segmentation)?

Pad bug: IoU is wrong when Pad is used

If you use Pad, please modify whole_inference in encoder_decoder.py (line 209):

     resize_shape = img_meta[0]['img_shape'][:2]
     seg_logit = seg_logit[:, :, :resize_shape[0], :resize_shape[1]]
     size = img_meta[0]['ori_shape'][:2]
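The three lines above crop the logits back to the unpadded `img_shape` before they are resized to `ori_shape`, so padded pixels no longer contribute to the prediction. A minimal sketch of that cropping step, using nested lists as a stand-in for a single H x W logit map (the helper name `crop_to_shape` is hypothetical):

```python
def crop_to_shape(logit_hw, img_shape):
    """Keep only the top-left img_shape[0] x img_shape[1] region.

    Mirrors seg_logit[:, :, :h, :w] for one 2D map; padding is assumed
    to have been added on the bottom and right edges only.
    """
    h, w = img_shape[:2]
    return [row[:w] for row in logit_hw[:h]]


# A 3x4 map whose last row and last column are padding (value 0).
padded = [
    [5, 6, 7, 0],
    [8, 9, 1, 0],
    [0, 0, 0, 0],
]
cropped = crop_to_shape(padded, (2, 3, 3))  # img_shape is (H, W, C)
print(cropped)
```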

About multi-scale branch

Hi authors,
In MSCA, the learned multi-scale features are aggregated by addition rather than concatenation. Is this to achieve a better trade-off between performance and complexity? (Concatenation may cost more GPU memory or be slower.)

About training

I am training on my own dataset and have downloaded the pretrained weights 'mscan_b.pth'. Where should I put the weights file? Thanks very much! My command is: python /tools/train.py local_configs/segnext/base/segnext.base.512x512.tusimple.160k.py --load-from pretrained/mscan_b.pth

But an error occurs, as shown in the following picture.

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the branches of the OpenMMLab 2.0 repos:

Project             OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine            -                       0.x
MMCV                1.x                     2.x
MMDetection         0.x, 1.x, 2.x           3.x
MMAction2           0.x                     1.x
MMClassification    0.x                     1.x
MMSegmentation      0.x                     1.x
MMDetection3D       0.x                     1.x
MMEditing           0.x                     1.x
MMPose              0.x                     1.x
MMDeploy            0.x                     1.x
MMTracking          0.x                     1.x
MMOCR               0.x                     1.x
MMRazor             0.x                     1.x
MMSelfSup           0.x                     1.x
MMRotate            1.x                     1.x
MMYOLO              -                       0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Is it possible to export SegNeXt as ONNX or tensorrt model to speedup inferencing?

SegNeXt is great work that uses multi-scale, large convolution kernels to mimic "attention". I was wondering whether it is possible to export your model to ONNX or TensorRT. TensorRT can run 2x to 10x faster than the original PyTorch model, which would be very helpful for us in video prediction.

Your model is based on mmsegmentation, which already supports model deployment. I have read your code, but I wonder whether some operations, such as 'unsqueeze' / 'flatten', may not be supported by ONNX or TensorRT.

Bests

Random seeds variance

When I try the SegNeXt training with different random seeds, I find that the mIoU will have a large variance (up to 1% to 2%).

Tiny

Base

How do you think this affects comparisons with other models and the reproducibility of results?

Confusing messaging on license: "Apache 2.0" conflicts with "contact for commercial use"

Disclaimer: I am not a lawyer and this is not legal advice. Please contact an attorney for any licensing questions, this is just my understanding and is only for informational purposes.

First of all, thank you for publishing your paper and sharing this code!

The License section in the README.md says:

This repo is under the Apache-2.0 license. For commercial use, please contact the authors.

As you can see in https://choosealicense.com/licenses/apache-2.0/ (created by GitHub), the Apache 2.0 license already allows commercial use:

Permissions

Commercial use

[...]

and it does not require asking for permission or notifying anyone of commercial use. If you're asking folks to let you know (as an FYI), that's fine, but that would be optional and up to them, and does not belong in the "license" section, as it's not required as part of Apache 2.0 license.

If you want to prevent commercial use without some sort of fee or custom license, then this is not open-source, and Apache 2.0 is perhaps not an appropriate license for your project, so you may want to choose something else, but it may affect who will be able and willing to contribute to a repo that is under a non-open-source license.

In any case, please reach out to an attorney to discuss your questions and your options. As it stands, this is confusing to readers of this project, as "Apache 2.0 license" + "talk to us before commercial use" seems to be conflicting and hence confusing.

Hope this helps.

Best of luck with your project!

Hello, the link to the pretrained model is invalid.

The code implementation does not match the picture in the paper

Thank you for your work. I have a question about the code. In backbones/mscan.py, there is a "shortcut" operation in the SpatialAttention module, but there is also a "drop_path" in the Block module, which should work the same as the "shortcut". I could not find the first shortcut operation in the paper.

The performance on PASCAL VOC val set

Excuse me, can you tell me the performance of SegNeXt on the validation set of PASCAL VOC? I found that both PSPNet and DeepLabv3+ reach 85+ mIoU on the test set but only 80 on the validation set (according to the mmsegmentation repository), so I want to confirm whether the result on the test set will be higher than that on the validation set.

problem about train.py

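Hello, I ran an experiment with the SegNeXt-T model on the ADE20K dataset using a single 3060 GPU. With your original config file I ran out of GPU memory, so I set samples_per_gpu to 4 and kept the learning rate at 0.00006. After the whole run finished, the final output was:

The mIoU is only 14.64. What is going on? Do I need to adjust anything else? Thanks for your help!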

MSCAN-Large's e.r or mlp_ratios needs to fix

SegNeXt/mmseg/models/backbones/mscan.py

Line 175 in 1e51c8a

mlp_ratios=[4, 4, 4, 4],

It raises a size mismatch when loading the mscan-large weights with the mscan-large config.

about the inference speed

Hello, I used image_demo.py to run inference on one picture, and the inference time is over 0.6 s. Can you help explain this? Thanks! Maybe my GPU and CPU are the main cause?

Encoder-Decoder downsample 1/8x, is too coarse to produce 'seg_logits'

I have the original input feature 830(H)x1280(W), but find seg_logits is downsampled to 1024(Channel)x104(H)x160(W) feature map in ham_head. It's too coarse.
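The 104 x 160 size follows from the stride arithmetic. A small sketch, assuming the input is first padded up to a multiple of 32 (as `img_wh = (1280, 832)` in the config below suggests) and that the head runs at 1/8 resolution, as the issue title states. The helper name `feature_size` is hypothetical:

```python
import math


def feature_size(height, width, stride=8, divisor=32):
    """Spatial size of the head's feature map for a given input size.

    Assumes padding to the next multiple of `divisor` followed by an
    overall downsampling factor of `stride`.
    """
    padded_h = math.ceil(height / divisor) * divisor
    padded_w = math.ceil(width / divisor) * divisor
    return padded_h // stride, padded_w // stride


print(feature_size(830, 1280))  # (104, 160): 830 is padded to 832 first
```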

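You can see that the animal boundaries are not very clear in the segmentation. This may be caused by too much downsampling. I would appreciate some guidance.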

In addition, too much of the image is occupied by background, which makes it hard to optimize the segmentation of the other classes. How can I optimize the model to overcome this issue? For example, with class_weight?

# tools/dist_train.sh segnext.large.ratmetric.py 4
# python tools/train.py segnext.large.ratmetric.py
_base_ = [
    'local_configs/segnext/large/segnext.large.512x512.coco_stuff164k.80k.py'
]

num_classes = 3
# load_from = None
load_from = 'work_dirs/segnext.large.ratmetric/latest.pth'

model = dict(
    backbone=dict(init_cfg=dict(type='Pretrained', checkpoint='pretrained/segnext_large_512x512_ade_160k.pth')),
    decode_head=dict(
        num_classes=num_classes,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, class_weight=[1.0/50, 1.0, 1.0], loss_weight=1.0))
)

runner = dict(type='IterBasedRunner', max_iters=6400)
checkpoint_config = dict(by_epoch=False, interval=800)
evaluation = dict(interval=800, metric='mIoU')

data_root = 'data_rat_metric'
img_dir='images'
ann_dir='annotations'
img_wh = (1280,832)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=img_wh, ratio_range=(0.7, 1.5)),
    dict(type='RandomCrop', crop_size=img_wh[::-1], cat_max_ratio=1.0, ignore_index=0),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=img_wh[::-1], pad_val=0, seg_pad_val=0),
    # dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_wh,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=train_pipeline),
    val=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline),
    test=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline))

Is this model easy to convert to onnx?

thanks!

Are params really exact?

The parameter counts shown in the tables are 4M, 14M, 28M, 49M (T, S, B, L), but the pretrained models on "TsingHua Cloud" are about 50, 168, 322, 543 MB (T, S, B, L).
I think the size of a pretrained model should equal (params x 4 B), because each parameter is saved as a float32 number, which occupies 4 bytes. So the parameter counts inferred from the pretrained models should be about 12M, 44M, 80M, 134M.
Maybe I misunderstand how memory is calculated from the parameter count. Or could you explain?
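The arithmetic in this question can be reproduced directly. The sketch below converts each file size to an implied float32 parameter count and compares it with the reported count. It assumes 1 MB = 10^6 bytes and that every stored value is a 4-byte float. The ratio comes out near 3 for all four models, which would be consistent with the files holding more than the weights alone (for example optimizer state), but that is an unverified guess and not something the post confirms.

```python
BYTES_PER_FLOAT32 = 4


def params_from_size_mb(size_mb):
    """Implied number of float32 values in a file of `size_mb` megabytes."""
    return size_mb * 1e6 / BYTES_PER_FLOAT32


# Numbers quoted in the post: checkpoint size in MB and reported params.
sizes_mb = {"T": 50, "S": 168, "B": 322, "L": 543}
reported = {"T": 4e6, "S": 14e6, "B": 28e6, "L": 49e6}

for name, size_mb in sizes_mb.items():
    implied = params_from_size_mb(size_mb)
    ratio = implied / reported[name]
    print(f"{name}: implied {implied / 1e6:.1f}M, ratio to reported {ratio:.2f}")
```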

The content of the article is inconsistent with the code

In codes,
inputs = torch.cat(inputs, dim=1)

SegNeXt/mmseg/models/decode_heads/ham_head.py

Line 233 in c87bcae

inputs = torch.cat(inputs, dim=1)

In paper,
our decoder only receives features from the last three stages

Can you explain the discrepancy?

About the training process's error

Please take a look at the picture. Why does this happen? The loss suddenly becomes 0. Thanks.

How do I train a new dataset?

Hi, I have a dataset where I am detecting primarily just one object and I wish to create a mask image of that object. The object would appear white against a black background.
I have organised it in
SegNeXt/data/
images/
annotations/

Currently your README has the following information for training:
./tools/dist_train.sh /path/to/config 8

Any pretrained models for me to edit and retrain on the new dataset?

Cannot reproduce your results

Hi, I'm sorry to bother you, but when I use your local_configs (default settings) to train on either Cityscapes or ADE20K, I cannot reproduce your results. Do you have any idea why?

When I directly use the pretrained .pth file that you provide, I get 79.25% mIoU on Cityscapes. But when I train the model locally, the result is terrible.

Here are my results: 60.7% mIoU on Cityscapes; 18.16% mIoU on ADE20K.

I would be very grateful if you could reply as soon as possible.

Please add a `LICENSE` file to this repo

The README.md says:

This repo is under the Apache-2.0 license.

Could you please add an explicit LICENSE file to the repo with the copy of the license? That way, it's very clear, and GitHub will also add a line to the repo overview section with the license name.

You can find a copy of the license here: https://choosealicense.com/licenses/apache-2.0/

about the test command(only one gpu)

Hello, thanks for your excellent work. I have another question: my computer has only one GPU. Can your test command still be used? I changed the command to: ./tools/dist_test.sh /path/to/config /path/to/checkpoint_file 1 --eval mIoU, replacing only "8" with "1". I ran it, but some errors occurred.

The paddlepaddle version of SegNeXt

Hi, I reproduce SegNeXt with Paddle and obtain higher scores.

Origin	Paddle
79.8	81.04
81.3	81.33
82.6	82.74
83.2	83.32

When I set the default of "--gpu-id" to 2, I run into the problem below. I can only train the model when the default of "--gpu-id" is 0.

RuntimeError: Expected tensor for 'out' to have the same device as tensor for argument #3 'batch2'; but device 2 does not equal 0 (while checking arguments for baddbmm)

anyone could help me solve this problem?

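Is there a version that does not include mmseg?

Is there a version that does not include mmseg? The code is hard to read, and there is too much irrelevant information.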

how to get mscan_t.pth?

Hi, I have already read readme.txt, but I cannot find mscan_x.pth. Could you please tell me how to get it? Thank you!

I found it. I had missed it.

Why there is no activation function in attention module?

Thanks for your excellent work, I have a quick question about the model structure.

SegNeXt/mmseg/models/backbones/mscan.py

Lines 59 to 91 in b53d601

class AttentionModule(BaseModule):
    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.conv0_1 = nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim)
        self.conv0_2 = nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim)

        self.conv1_1 = nn.Conv2d(dim, dim, (1, 11), padding=(0, 5), groups=dim)
        self.conv1_2 = nn.Conv2d(dim, dim, (11, 1), padding=(5, 0), groups=dim)

        self.conv2_1 = nn.Conv2d(
            dim, dim, (1, 21), padding=(0, 10), groups=dim)
        self.conv2_2 = nn.Conv2d(
            dim, dim, (21, 1), padding=(10, 0), groups=dim)
        self.conv3 = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        u = x.clone()
        attn = self.conv0(x)

        attn_0 = self.conv0_1(attn)
        attn_0 = self.conv0_2(attn_0)

        attn_1 = self.conv1_1(attn)
        attn_1 = self.conv1_2(attn_1)

        attn_2 = self.conv2_1(attn)
        attn_2 = self.conv2_2(attn_2)
        attn = attn + attn_0 + attn_1 + attn_2

        attn = self.conv3(attn)

        return attn * u

We can see that there are convolutions, element-wise additions, and an element-wise product. However, there is no activation function among these operations. In other words, without a non-linear activation, these ops could be reduced to a single matrix op.

I understand that the SpatialAttention module, which wraps the attention module, has a GELU, so non-linearity can be provided by it.

SegNeXt/mmseg/models/backbones/mscan.py

Lines 94 to 101 in b53d601

class SpatialAttention(BaseModule):
    def __init__(self, d_model):
        super().__init__()
        self.d_model = d_model
        self.proj_1 = nn.Conv2d(d_model, d_model, 1)
        self.activation = nn.GELU()
        self.spatial_gating_unit = AttentionModule(d_model)
        self.proj_2 = nn.Conv2d(d_model, d_model, 1)

But I cannot figure out the reason for using only linear ops inside the attention module. Is there a good reason for this, or am I simply missing something here?
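One observation that may be relevant to this question: the convolutions and additions are linear in x (ignoring biases), but the final `attn * u` multiplies two quantities that both depend on x. The module as a whole is therefore quadratic in its input rather than linear, so it does not collapse into a single matrix op. A toy sketch, with one scalar weight standing in for the whole convolution stack (an illustrative assumption, not the real module):

```python
def linear_branch(x, weight=0.5):
    """Stand-in for the bias-free convolution stack: a linear map of x."""
    return [weight * v for v in x]


def gated(x):
    """Mimics `return attn * u`: the linear branch gates the input itself."""
    attn = linear_branch(x)
    return [a * u for a, u in zip(attn, x)]


x = [1.0, 2.0, 3.0]
doubled = [2.0 * v for v in x]

# A linear map would satisfy f(2x) == 2 f(x); the gated output scales by 4.
print(gated(x))
print(gated(doubled))
```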

Learning Rate and Batch Size

Hi,

thanks for the fantastic work. I am currently trying to train the tiny model from the Imagenet-pretrained weights on the ADE dataset to begin integrating your work into mmsegmentation, as discussed here and here.

However, I am confused about the batch size and learning rate. In the paper, you mention a batch size of 16 and that you use 8 GPUS. However, the config sets samples_per_gpu to 8. Can you kindly tell me what the total batch size used for training should be and the corresponding learning rate?
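The mismatch described above is a matter of simple arithmetic: the total batch size is samples_per_gpu times the number of GPUs. The sketch below also includes the linear learning-rate scaling rule as one common heuristic. That rule is an assumption added for illustration, not something the paper or the config states, and both helper names are hypothetical.

```python
def total_batch_size(samples_per_gpu, num_gpus):
    """Effective batch size in data-parallel training."""
    return samples_per_gpu * num_gpus


def linearly_scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: lr grows in proportion to batch size."""
    return base_lr * new_batch / base_batch


# The config value from the post on 8 GPUs, versus the paper's stated 16.
print(total_batch_size(samples_per_gpu=8, num_gpus=8))  # 64
print(total_batch_size(samples_per_gpu=2, num_gpus=8))  # 16
```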

Best wishes & many thanks,
Fabian

A case that will never happen in /backbones/mscan.py

There is a case that can never happen in /backbones/mscan.py: the "if i==0 else xxx" expression inside the else branch. I guess this comes from van.py, and the author forgot to change it together with the STEM module.

WARNING - The model and loaded state dict do not match exactly

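Hello:
I am using segnext_tiny_512x512_ade_160k.pth with the config file segnext.tiny.512x512.ade.160k.py. During training I get the warning "WARNING - The model and loaded state dict do not match exactly". The small model gives the same warning.
Is there something wrong with the model? Thanks.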

Why are the Params results for MiT not consistent with the results presented in the SegFormer paper?

Hi, please help me

First of all, thank you for your work. Secondly, I ran into some problems. Please take a look when you are free. I plan to add your MSCA structure to the ConvNeXt block, but there are some bugs.
First, the ConvNeXt structure:

# todo ConvNextBlock
class ConvNextBlock(nn.Module):
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm_s(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        input = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)

        x = input + self.drop_path(x)
        return x

Then I used your structure to get:

# todo ConvNextBlock
class ConvNextBlock2(nn.Module):
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.attn = SpatialAttention(dim)  # add it
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm_s(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm(x))  # add it
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = shortcut + self.drop_path(x)
        return x

But when I run it, I get:
RuntimeError: Given groups=1, weight of size [64, 64, 1, 1], expected input[2, 104, 104, 64] to have 64 channels, but got 104 channels instead
Please help me. How should I adjust it, or what went wrong?
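The shape in the error message, [2, 104, 104, 64], is a channels-last (N, H, W, C) tensor. That suggests the convolution inside the attention module receives x before it has been permuted back to (N, C, H, W), so it reads dimension 1 (104) as the channel count. The sketch below only tracks shapes. `permute_shape` is a hypothetical helper, and moving the attention call after the second permute is one untested possibility, not a confirmed fix.

```python
def permute_shape(shape, order):
    """Shape of a tensor after tensor.permute(*order)."""
    return tuple(shape[i] for i in order)


nchw = (2, 64, 104, 104)                   # layout Conv2d expects
nhwc = permute_shape(nchw, (0, 2, 3, 1))   # layout used for Linear/LayerNorm

# Conv2d treats dimension 1 as channels, whatever the intended layout is.
channels_seen_by_conv = nhwc[1]
restored = permute_shape(nhwc, (0, 3, 1, 2))

print(nhwc, channels_seen_by_conv)  # (2, 104, 104, 64) 104
print(restored)                     # (2, 64, 104, 104)
```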

about the ade20k dataset config file in SegNeXt

Hi, I got a question about the ADE20K dataset used in SegNeXt
The ADE20K dataset config in this repo differs from the original version in mmseg, e.g., dict(type='ResizeToMultiple', size_divisor=32) in transforms, and

        type='RepeatDataset',
        times=50,

I'm new to mmseg. Could you explain these config differences? Thanks a lot.
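As a rough illustration of what these two settings do, here is a simplified sketch. It assumes that a repeat wrapper makes one pass over the data count as `times` passes by wrapping indices, and that resizing to a multiple rounds each side up to the next multiple of `size_divisor`. Both classes and functions are hypothetical stand-ins, not the mmseg implementations.

```python
import math


class RepeatedView:
    """Expose a dataset as if it were `times` copies placed end to end."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times

    def __len__(self):
        return self.times * len(self.dataset)

    def __getitem__(self, idx):
        # Wrap around so index len(dataset) maps back to item 0.
        return self.dataset[idx % len(self.dataset)]


def round_up_to_multiple(size, divisor=32):
    """Smallest multiple of `divisor` that is >= size (assumed rounding)."""
    return math.ceil(size / divisor) * divisor


samples = ["img_a", "img_b", "img_c"]
repeated = RepeatedView(samples, times=50)
print(len(repeated), repeated[4])
print(round_up_to_multiple(830), round_up_to_multiple(512))
```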

Could you provide the config files for Pascal VOC?
I couldn't find them in your files.

Thank you.
