
segnext's People

Contributors

lgyoung, likyoo, menghaoguo, uyzhang


segnext's Issues

the result of SegNeXt training on ADE20k

Hi, another beginner question:
Training the SegNeXt-T model on ADE20K with a single RTX 3060, I get the following results:

train.py: [screenshot]

test.py: [screenshot]

I would like to ask:
1) Is the mIoU reported in your paper measured on the validation set or on the test set?
[screenshot]

2) About the SS and MS columns: I searched around and they seem to mean single-scale and multi-scale testing. Is that what the figure below shows?
[screenshot]

3) My results are 1 to 2 mIoU points below those reported in the paper. Where might the problem be, and what should I adjust to reach the paper's accuracy?

Thank you very much for taking the time to clear up a beginner's confusion!

a question about your paper

“we use batch normalization instead of layer normalization as we found batch normalization gains more for the segmentation performance.”
I cannot understand why. I think it may be because you replace self-attention with convolutions, but I am not sure.

VAN backbone performance?

VAN-Small + Light-Ham-D256 with 15.8 GFLOPs and 13.8M params achieves 45.2 mIoU on ADE20K: here
MSCAN-S + Light-Ham-D512 with 16.0 GFLOPs and 14.0M params achieves 44.3 mIoU on ADE20K

VAN-Base + Light-Ham with 34.4 GFLOPs and 27.4M params achieves 49.6 mIoU on ADE20K: here
MSCAN-B + Light-Ham with 35.0 GFLOPs and 28.0M params achieves 48.5 mIoU on ADE20K

What is the need for the MSCAN backbone? The paper explains that "Though VAN has achieved great performance in image classification, it neglects the role of multi-scale feature aggregation during the network design, which is crucial for segmentation-like tasks", but that statement refers to vanilla VAN without the Light-Ham decoder.

pad bug: when padding is used, the IoU is wrong

If you use padding, please modify whole_inference in encoder_decoder.py around line 209:

     resize_shape = img_meta[0]['img_shape'][:2]
     seg_logit = seg_logit[:, :, :resize_shape[0], :resize_shape[1]]
     size = img_meta[0]['ori_shape'][:2]
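
For context, this is roughly where the crop sits inside whole_inference. The sketch below follows the mmseg 0.x EncoderDecoder; the surrounding code and exact line numbers may differ in your version:

def whole_inference(self, img, img_meta, rescale):
    """Inference over the whole image."""
    seg_logit = self.encode_decode(img, img_meta)
    if rescale:
        # When the test pipeline pads the input, first crop the logits back to
        # the un-padded image shape; otherwise the padded border leaks into the
        # prediction and corrupts the IoU.
        resize_shape = img_meta[0]['img_shape'][:2]
        seg_logit = seg_logit[:, :, :resize_shape[0], :resize_shape[1]]
        size = img_meta[0]['ori_shape'][:2]
        seg_logit = resize(seg_logit, size=size, mode='bilinear',
                           align_corners=self.align_corners, warning=False)
    return seg_logit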

About multi-scale branch

Hi authors,
In MSCA, the learned multi-scale features are aggregated by element-wise addition rather than by concatenation. Is this for the purpose of achieving a better trade-off between performance and complexity? (Concatenation may cost more GPU memory or run slower.)

about train?

I am training on my own dataset and downloaded the pretrained weights 'mscan_b.pth'. Can you explain where to put the weights file? Thanks very much! My command is: python /tools/train.py local_configs/segnext/base/segnext.base.512x512.tusimple.160k.py --load-from pretrained/mscan_b.pth

but I get an error, shown in the following screenshot:
[screenshot]
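
(Note that --load-from expects a full segmentor checkpoint, while ImageNet-pretrained backbone weights such as mscan_b.pth are normally wired in through the backbone's init_cfg, as in the repo's own configs. A minimal config-override sketch; the checkpoint path is whatever location you actually saved the file to:)

# Point the backbone at the ImageNet-pretrained MSCAN weights; the decode head
# is still initialized from scratch. The path below is only an example.
model = dict(
    backbone=dict(
        init_cfg=dict(type='Pretrained', checkpoint='pretrained/mscan_b.pth')))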

Welcome update to OpenMMLab 2.0


I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

Repo               OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine           -                      0.x
MMCV               1.x                    2.x
MMDetection        0.x, 1.x, 2.x          3.x
MMAction2          0.x                    1.x
MMClassification   0.x                    1.x
MMSegmentation     0.x                    1.x
MMDetection3D      0.x                    1.x
MMEditing          0.x                    1.x
MMPose             0.x                    1.x
MMDeploy           0.x                    1.x
MMTracking         0.x                    1.x
MMOCR              0.x                    1.x
MMRazor            0.x                    1.x
MMSelfSup          0.x                    1.x
MMRotate           1.x                    1.x
MMYOLO             -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Is it possible to export SegNeXt as an ONNX or TensorRT model to speed up inference?

SegNeXt does a great job of using multi-scale, large-kernel convolutions to mimic attention. I was wondering whether it is possible to export your model to ONNX or TensorRT? TensorRT speeds things up 2x to 10x over vanilla PyTorch, which would be very helpful for us in video prediction.

Your model is based on mmsegmentation, which already allows model deployment. I have looked at your code, but I wonder whether some operations, such as 'unsqueeze' / 'flatten', may not be supported by ONNX or TensorRT.

Bests
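
For reference, the rough export path I have in mind is sketched below. It is untested: the wrapper around encode_decode and the paths are my own assumptions, and some ops may still require a newer opset or a tool such as mmdeploy.

import torch
from mmseg.apis import init_segmentor  # mmseg 0.x API

# Hypothetical paths; point these at your local config and checkpoint.
cfg = 'local_configs/segnext/tiny/segnext.tiny.512x512.ade.160k.py'
ckpt = 'pretrained/segnext_tiny_512x512_ade_160k.pth'
model = init_segmentor(cfg, ckpt, device='cpu').eval()

class OnnxWrapper(torch.nn.Module):
    """Expose a plain tensor-in / tensor-out forward for tracing."""
    def __init__(self, segmentor):
        super().__init__()
        self.segmentor = segmentor

    def forward(self, img):
        # encode_decode runs backbone + decode head and resizes the logits
        # back to the input resolution.
        return self.segmentor.encode_decode(img, [dict(ori_shape=img.shape[2:])])

dummy = torch.randn(1, 3, 512, 512)
torch.onnx.export(OnnxWrapper(model), dummy, 'segnext_tiny.onnx',
                  opset_version=11, input_names=['input'], output_names=['logits'])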

Random seeds variance

When I train SegNeXt with different random seeds, I find that the mIoU has a large variance (up to 1 to 2 points).

Tiny: [screenshot]

Base: [screenshot]

What do you think about the impact of this on comparisons with other models and on the reproducibility of the results?

Confusing messaging on license: "Apache 2.0" conflicts with "contact for commercial use"

Disclaimer: I am not a lawyer and this is not legal advice. Please contact an attorney for any licensing questions, this is just my understanding and is only for informational purposes.

First of all, thank you for publishing your paper and sharing this code!

The License section in the README.md says:

This repo is under the Apache-2.0 license. For commercial use, please contact the authors.

As you can see in https://choosealicense.com/licenses/apache-2.0/ (created by GitHub), the Apache 2.0 license already allows commercial use:

Permissions

  • Commercial use
  • [...]

and it does not require asking for permission or notifying anyone of commercial use. If you're asking folks to let you know (as an FYI), that's fine, but that would be optional and up to them, and it does not belong in the "license" section, as it's not required by the Apache 2.0 license.

If you want to prevent commercial use without some sort of fee or custom license, then this is not open-source, and Apache 2.0 is perhaps not an appropriate license for your project, so you may want to choose something else, but it may affect who will be able and willing to contribute to a repo that is under a non-open-source license.

In any case, please reach out to an attorney to discuss your questions and your options. As it stands, this is confusing to readers of this project, as "Apache 2.0 license" + "talk to us before commercial use" seems to be conflicting and hence confusing.

Hope this helps.

Best of luck with your project!

The code implementation does not match the picture in the paper

Thank you for your work; I have a question about the code. In the backbones/mscan.py file there is a "shortcut" operation in the SpatialAttention module, and there is also a "drop_path" residual in the Block module, which should work the same way as a shortcut. I could not find the first shortcut operation in the paper.

The performance on PASCAL VOC val set

Excuse me, could you tell me the performance of SegNeXt on the validation set of PASCAL VOC? I found that both PSPNet and DeepLabV3+ reach 85+ mIoU on the test set but only about 80 on the validation set (according to the mmsegmentation repository), so I want to confirm whether the result on the test set is expected to be higher than on the validation set.

problem about train.py

Hi, I am training the SegNeXt-T model on ADE20K with a single RTX 3060. With your original config I ran out of GPU memory, so I set samples_per_gpu to 4 and kept the learning rate at 0.00006. After running the full schedule, the final output is:
[screenshot]
The mIoU is only 14.64. What could be going on, and what else do I need to adjust? Thanks for the help!
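
(For reference, one common adjustment when the batch size shrinks is the linear learning-rate scaling rule, and another thing worth double-checking is that the ImageNet-pretrained backbone is actually loaded. A sketch of the override I would try; the numbers are my own guess, not a recipe confirmed by the authors:)

# Single-GPU override: total batch size drops from 16 to 4, so the base LR of
# 6e-5 is scaled by 4/16 under the linear scaling rule. Partial dicts merge
# with the base config, so only these fields change.
data = dict(samples_per_gpu=4, workers_per_gpu=4)
optimizer = dict(lr=0.00006 * 4 / 16)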

about the inference speed

Hello, I used image_demo.py to run inference on a single picture, and the inference time is 0.6+ s. Can you help explain? Thanks! Maybe my GPU and CPU are the main cause?
[screenshots]
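
(A rough way to check whether the 0.6 s is dominated by one-off warm-up cost is to time several runs after a warm-up pass. Paths in the sketch below are placeholders:)

import time
import torch
from mmseg.apis import init_segmentor, inference_segmentor

model = init_segmentor('local_configs/segnext/tiny/segnext.tiny.512x512.ade.160k.py',
                       'pretrained/segnext_tiny_512x512_ade_160k.pth', device='cuda:0')
img = 'demo/demo.png'

inference_segmentor(model, img)   # warm-up run (CUDA init, cuDNN autotune, caches)
torch.cuda.synchronize()

t0 = time.time()
for _ in range(20):
    inference_segmentor(model, img)
torch.cuda.synchronize()
print('average per-image time:', (time.time() - t0) / 20, 's')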

Encoder-decoder downsamples to 1/8 resolution, which is too coarse for 'seg_logits'

I have an original input of 830(H) x 1280(W), but seg_logits comes out as a 1024(C) x 104(H) x 160(W) feature map in ham_head, which is too coarse.

As you can see, the animal's boundaries in the segmentation are not very sharp, probably because of the heavy downsampling. I would appreciate some guidance.

Also, background occupies most of the image, which makes it hard to optimize the other classes. How can the model be adjusted to overcome this issue, for example via class_weight?

[screenshot]

# tools/dist_train.sh segnext.large.ratmetric.py 4
# python tools/train.py segnext.large.ratmetric.py
_base_ = [
    'local_configs/segnext/large/segnext.large.512x512.coco_stuff164k.80k.py'
]

num_classes = 3
# load_from = None
load_from = 'work_dirs/segnext.large.ratmetric/latest.pth'

model = dict(
    backbone=dict(init_cfg=dict(type='Pretrained', checkpoint='pretrained/segnext_large_512x512_ade_160k.pth')),
    decode_head=dict(
        num_classes=num_classes,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, class_weight=[1.0/50, 1.0, 1.0], loss_weight=1.0))
)

runner = dict(type='IterBasedRunner', max_iters=6400)
checkpoint_config = dict(by_epoch=False, interval=800)
evaluation = dict(interval=800, metric='mIoU')

data_root = 'data_rat_metric'
img_dir='images'
ann_dir='annotations'
img_wh = (1280,832)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=img_wh, ratio_range=(0.7, 1.5)),
    dict(type='RandomCrop', crop_size=img_wh[::-1], cat_max_ratio=1.0, ignore_index=0),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=img_wh[::-1], pad_val=0, seg_pad_val=0),
    # dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_wh,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=train_pipeline),
    val=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline),
    test=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline))

Are params really exact?

The params you report in the tables are 4M, 14M, 28M, 49M (T, S, B, L), but the sizes of the pretrained models on the Tsinghua Cloud are about 50, 168, 322, 543 MB (T, S, B, L).
I think the size of a pretrained model should be roughly params x 4 bytes, because each parameter is saved as a float32 number occupying 4 bytes in memory. So the parameter counts inferred from the pretrained models would be 12M, 44M, 80M, 134M.
Maybe I misunderstand how file size is calculated from params. Could you explain?
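
(One possible explanation, which is only my guess and not confirmed by the authors: checkpoints saved during training usually also contain the AdamW optimizer state, which keeps two extra float32 buffers per parameter and therefore roughly triples the file size. A quick way to check what a checkpoint actually stores:)

import torch

# Path is a placeholder; point it at the downloaded checkpoint.
ckpt = torch.load('segnext_tiny_512x512_ade_160k.pth', map_location='cpu')
state_dict = ckpt['state_dict'] if 'state_dict' in ckpt else ckpt
n_params = sum(v.numel() for v in state_dict.values())
print(f'{n_params / 1e6:.1f} M params, {n_params * 4 / 1e6:.0f} MB as float32')
if 'optimizer' in ckpt:
    print('the checkpoint also stores optimizer state, which inflates the file size')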

How do I train a new dataset?

Hi, I have a dataset where I am detecting primarily just one object, and I wish to create a mask image of that object. The object appears white against a black background.
I have organised it as:
SegNeXt/data/
    images/
    annotations/

Currently your README has the following information for training:
./tools/dist_train.sh /path/to/config 8

Are there any pretrained models I can start from and retrain on the new dataset?
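
(For reference, a minimal sketch of how a two-class dataset is typically registered in mmseg 0.x. The class names, file suffixes, and where you place this module are assumptions to adapt to your data:)

from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset

@DATASETS.register_module()
class BinaryMaskDataset(CustomDataset):
    """Two-class dataset: background and a single foreground object."""
    CLASSES = ('background', 'object')
    PALETTE = [[0, 0, 0], [255, 255, 255]]

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)

The config's data dict can then reference type='BinaryMaskDataset' with data_root pointing at SegNeXt/data, and the decode head set to num_classes=2.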

Cannot reproduce your results

Hi, I'm sorry to bother you, but when I train with your local_configs (default settings), I cannot reproduce your results on either Cityscapes or ADE20K. Do you have any idea why?

When I directly use the pretrained .pth file that you provide, I get 79.25% mIoU on Cityscapes. But when I train the model locally, the result is terrible.

Here are my results: 60.7% mIoU on Cityscapes; 18.16% mIoU on ADE20K.

I would be very grateful if you could reply as soon as possible.

about the test command (only one GPU)

Hello, thanks for your excellent work. I have another question: my computer has only one GPU, so can your test command still be used? I changed the command to ./tools/dist_test.sh /path/to/config /path/to/checkpoint_file 1 --eval mIoU, i.e. replacing only "8" with "1", but when I run it I get some errors.

how to get mscan_t.pth?

Hi, I have read the README already, but I cannot find mscan_x.pth. Could you please tell me how to get it? Thank you!

I found it; I had missed it.

Why is there no activation function in the attention module?

Thanks for your excellent work, I have a quick question about the model structure.

In

class AttentionModule(BaseModule):
    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.conv0_1 = nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim)
        self.conv0_2 = nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim)
        self.conv1_1 = nn.Conv2d(dim, dim, (1, 11), padding=(0, 5), groups=dim)
        self.conv1_2 = nn.Conv2d(dim, dim, (11, 1), padding=(5, 0), groups=dim)
        self.conv2_1 = nn.Conv2d(
            dim, dim, (1, 21), padding=(0, 10), groups=dim)
        self.conv2_2 = nn.Conv2d(
            dim, dim, (21, 1), padding=(10, 0), groups=dim)
        self.conv3 = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        u = x.clone()
        attn = self.conv0(x)
        attn_0 = self.conv0_1(attn)
        attn_0 = self.conv0_2(attn_0)
        attn_1 = self.conv1_1(attn)
        attn_1 = self.conv1_2(attn_1)
        attn_2 = self.conv2_1(attn)
        attn_2 = self.conv2_2(attn_2)
        attn = attn + attn_0 + attn_1 + attn_2
        attn = self.conv3(attn)
        return attn * u

we can see that there are convolutions, an element-wise sum and a product. However, there is no activation function between these operations. In other words, without a non-linear activation, these ops could be collapsed into a single linear (matrix) operation.

I understand that the SpatialAttention module, which wraps the attention module, has a GELU, so the non-linearity can be provided by it.

class SpatialAttention(BaseModule):
    def __init__(self, d_model):
        super().__init__()
        self.d_model = d_model
        self.proj_1 = nn.Conv2d(d_model, d_model, 1)
        self.activation = nn.GELU()
        self.spatial_gating_unit = AttentionModule(d_model)
        self.proj_2 = nn.Conv2d(d_model, d_model, 1)

But I cannot figure out the reason for using only linear ops inside the attention. Is there a good reason for this, or am I simply missing something here?

Learning Rate and Batch Size

Hi,

thanks for the fantastic work. I am currently trying to train the tiny model from the Imagenet-pretrained weights on the ADE dataset to begin integrating your work into mmsegmentation, as discussed here and here.

However, I am confused about the batch size and learning rate. In the paper, you mention a batch size of 16 and that you use 8 GPUS. However, the config sets samples_per_gpu to 8. Can you kindly tell me what the total batch size used for training should be and the corresponding learning rate?

Best wishes & many thanks,
Fabian

A case that will never happen in /backbones/mscan.py

[screenshot]

There is a case in /backbones/mscan.py that can never happen: the "if i == 0 ... else ..." check inside the else branch. I guess this code came from van.py and the author forgot to update it together with the STEM module.

WARNING - The model and loaded state dict do not match exactly

Hi,
I am using segnext_tiny_512x512_ade_160k.pth with the config file segnext.tiny.512x512.ade.160k.py, and during training I get the warning "WARNING - The model and loaded state dict do not match exactly". I get the same warning with the small model.
Is there something wrong with the model? Thanks.

hi, please help me

First of all, thank you for your work. Secondly, I have run into some problems; please take a look when you have time. I plan to add your MSCA structure to the ConvNeXt block, but there are some bugs.
First, the ConvNeXt structure:

todo ConvNextBlock

class ConvNextBlock(nn.Module):

    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm_s(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        input = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)

        x = input + self.drop_path(x)
        return x

Then I added your structure and got:

todo ConvNextBlock

class ConvNextBlock2(nn.Module):
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.attn = SpatialAttention(dim)  # add it
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm_s(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm(x))   # add it
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = shortcut + self.drop_path(x)
        return x

But when I run it, I get:
RuntimeError: Given groups=1, weight of size [64, 64, 1, 1], expected input[2, 104, 104, 64] to have 64 channels, but got 104 channels instead
Please help me figure out how to adjust it, or what went wrong.
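
(The error looks like a layout mismatch rather than a problem with MSCA itself: after the first permute, x is in (N, H, W, C) layout, but SpatialAttention is built from nn.Conv2d layers that expect channels in dimension 1, so the 104-wide spatial axis is read as the channel axis. A sketch of one possible fix is to permute back to (N, C, H, W) before applying the attention; whether to re-apply the norm first is a separate design choice:)

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        x = x.permute(0, 3, 1, 2)  # back to (N, C, H, W) before the conv-based attention
        if self.gamma is not None:
            # gamma has shape (dim,); reshape it so it broadcasts over (N, C, H, W)
            x = self.gamma.view(1, -1, 1, 1) * self.attn(x)
        x = shortcut + self.drop_path(x)
        return x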

about the ade20k dataset config file in SegNeXt

Hi, I have a question about the ADE20K dataset config used in SegNeXt.
The ADE20K config file in this repo has some differences from the original version in mmseg, e.g. dict(type='ResizeToMultiple', size_divisor=32) in the transforms, and

        type='RepeatDataset',
        times=50,

I'm new to mmseg; could you explain these config differences? Thanks a lot.

Contribute to mmsegmentation

Hello, thanks for the wonderful work, I learned a lot from it.

Would you be interested in contributing your work to mmsegmentation? If you want, I could also help with it.

Config for Pascal VOC

Hello,

Thank you for sharing valuable code of your work.

Could you provide the config files for Pascal VOC?
I could not find them in your files.

thank you

MACs, not FLOPs

According to tools/get_flops.py, I think you reported MACs as FLOPs, but one MAC always counts as two FLOPs (one multiply plus one add).
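
(For example, for a single Conv2d layer the two counts differ by exactly a factor of two:)

# A Conv2d with C_out x C_in x k x k weights applied to an H x W output map does
# C_out * C_in * k * k * H * W multiply-accumulates (MACs); counting the multiply
# and the add separately doubles that number (FLOPs).
c_in, c_out, k, h, w = 64, 64, 3, 128, 128
macs = c_out * c_in * k * k * h * w
flops = 2 * macs
print(f'{macs / 1e9:.2f} GMACs -> {flops / 1e9:.2f} GFLOPs')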
