Really nice work! We are interested in your experimental results on


Concerns on the segmentation performance gap based on Sync-BN about deconvolution HOT 13 CLOSED

yechengxi commented on June 11, 2024

Concerns on the segmentation performance gap based on Sync-BN

from deconvolution.

Comments (13)

yechengxi commented on June 11, 2024

Thank you for your interest. I will clean up the code for semantic segmentation for further analysis.
In fact, our paper compares against Sync-BN as in the official PyTorch implementation. The performance gap is quite significant when training from scratch, but I do not have a conclusion about fine-tuning for now.

PkuRainBow commented on June 11, 2024

Sounds interesting. I will try your method in our codebase by training the models from scratch. In fact, until now, Sync-BN has performed better than all the other normalization variants. We really hope to see your method outperform Sync-BN.

PkuRainBow commented on June 11, 2024

@yechengxi I have another small question about the hyperparameter settings of channel_deconv for the segmentation experiments, since there is in fact no fully-connected layer.

In other words, should I choose ChannelDeconv or FastDeconv? If FastDeconv, what parameters should I use?

DeConv=partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)
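
As a minimal sketch of what the `partial` line above does: `functools.partial` binds the listed keywords as defaults, and any of them can still be overridden per layer at the call site. `FastDeconvStub` is a hypothetical stand-in that only records its arguments; it is not the real `FastDeconv`, and its signature mirrors only the keywords that appear in this thread. The channel counts and the 19-class output are illustrative values.

```python
import functools


class FastDeconvStub:
    """Hypothetical stand-in for FastDeconv; it only stores its arguments."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0,
                 bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.bias = bias
        self.eps = eps
        self.n_iter = n_iter
        self.block = block
        self.sampling_stride = sampling_stride


# Bind the shared hyperparameters once, as in the setting quoted above.
DeConv = functools.partial(FastDeconvStub, bias=True, eps=1e-5, n_iter=5,
                           block=64, sampling_stride=3)

# A layer that uses the bound defaults unchanged.
layer = DeConv(64, 128, kernel_size=3, stride=1, padding=1)

# A layer that overrides one bound keyword (block) at the call site.
head = DeConv(512, 19, kernel_size=1, block=512)

print(layer.block, head.block, head.n_iter)
```

Keeping the shared values in one `partial` makes it easy to change a single hyperparameter for every layer while still allowing per-layer exceptions.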

PkuRainBow commented on June 11, 2024

@yechengxi We have run the experiments with a ResNet-101 FCN (based on Deconv) with output stride 8, using the setting

DeConv=partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3).

The results seem comparable with those based on Sync-BN. For example, at the 1000th iteration, Sync-BN achieves 13% mIoU on Cityscapes while Deconv achieves 12.6%.
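
For readers unfamiliar with the metric, here is a minimal sketch of mean IoU computed from a class confusion matrix. It uses the standard definition (per-class intersection over union, averaged over classes); the thread does not say how the poster's codebase handles details such as ignored labels, so treat those as outside this sketch. The function name and the toy matrix are illustrative.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(row[c] for row in confusion) - true_pos
        union = true_pos + false_pos + false_neg
        if union == 0:
            # Class appears in neither prediction nor ground truth; skip it.
            continue
        ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example: IoU is 3/5 for class 0 and 5/7 for class 1.
toy = [[3, 1],
       [1, 5]]
print(mean_iou(toy))
```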

It would be great if you could share some advice on how to tune the hyperparameters so that the result outperforms Sync-BN.

We have pasted our implementation below. It is modified from your code; the changes to the stride and dilation rates of stage3/4 are made in other files, which are not included.

import functools
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict

from lib.extensions.deconvolution.deconv import *
from lib.models.tools.module_helper import ModuleHelper
DeConv=functools.partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)

def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1, deconv=DeConv):
    """3x3 convolution with padding"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, dilation=dilation, groups=groups)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, bias=False, dilation=dilation, groups=groups)


def conv1x1(in_planes, out_planes, stride=1, deconv=DeConv):
    """1x1 convolution"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=1, stride=stride)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1
    __constants__ = ['downsample']

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(BasicBlock, self).__init__()

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride,deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes,deconv=deconv)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.relu(out)
        out = self.conv2(out)
        if self.downsample is not None:
            identity = self.downsample(x)

        out = out + identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(Bottleneck, self).__init__()
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width, deconv=deconv)
        self.conv2 = conv3x3(width, width, stride, groups, dilation, deconv=deconv)
        self.conv3 = conv1x1(width, planes * self.expansion, deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)

        if hasattr(self,'bn1'):
            out = self.bn1(out)

        out = self.relu(out)
        out = self.conv2(out)
        out = self.relu(out)
        out = self.conv3(out)
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class DeconvResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None, deconv=DeConv, channel_deconv=None):
        super(DeconvResNet, self).__init__()

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 128
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group

        self.resinit = nn.Sequential(OrderedDict([
            ('conv1', deconv(3, 64, kernel_size=3, stride=2, padding=1)),
            ('relu1', nn.ReLU(inplace=True)),
            ('conv2', deconv(64, 64, kernel_size=3, stride=1, padding=1)),
            ('relu2', nn.ReLU(inplace=True)),
            ('conv3', deconv(64, 128, kernel_size=3, stride=1, padding=1)),
            ('relu3', nn.ReLU(inplace=True))]
        ))

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True)  # change.

        self.layer1 = self._make_layer(block, 64, layers[0], deconv=deconv)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0], deconv=deconv)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1], deconv=deconv)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2], deconv=deconv)
                                       

        if channel_deconv:
            self.deconv1 =channel_deconv()

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m,FastDeconv):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        # Note: the blocks above define no bn2/bn3 (deconv replaces BN), so guard the
        # attribute access to avoid an AttributeError when zero_init_residual=True.
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck) and hasattr(m, 'bn3'):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock) and hasattr(m, 'bn2'):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False, deconv=None):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride, deconv=deconv),
                ModuleHelper.BatchNorm2d(bn_type='inplace_abn')(planes * block.expansion)
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer, deconv=deconv))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer, deconv=deconv))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.resinit(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if hasattr(self, 'deconv1'):
            x = self.deconv1(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x
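
The output stride implied by the `replace_stride_with_dilation` flags can be checked with a small sketch that mirrors the bookkeeping in `_make_layer` above. It assumes the layout shown in this code: a stride-2 stem, a stride-2 max pool, `layer1` at stride 1, and `layer2` to `layer4` each requesting stride 2. The function name is hypothetical.

```python
def output_stride_and_dilations(replace_stride_with_dilation,
                                stem_stride=2, pool_stride=2):
    """Return (output_stride, [(cumulative_stride, dilation) per stage])."""
    cumulative_stride = stem_stride * pool_stride
    dilation = 1
    per_stage = []
    # layer1 requests stride 1 and is never dilated; layer2..layer4 request stride 2.
    requested_strides = [1, 2, 2, 2]
    dilate_flags = [False] + list(replace_stride_with_dilation)
    for stride, dilate in zip(requested_strides, dilate_flags):
        if dilate:
            # Same rule as _make_layer: trade the stride for a larger dilation.
            dilation *= stride
            stride = 1
        cumulative_stride *= stride
        # The dilation recorded here is the value after the stage is built.
        per_stage.append((cumulative_stride, dilation))
    return cumulative_stride, per_stage


# Default flags (all False) keep every stride: output stride 32.
print(output_stride_and_dilations([False, False, False]))
# Dilating layer3 and layer4 instead of striding: output stride 8.
print(output_stride_and_dilations([False, True, True]))
```

With the default of all-`False` flags the backbone above downsamples by 32, so the output stride of 8 mentioned earlier requires dilating the last two stages.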

yechengxi commented on June 11, 2024

@PkuRainBow Have you also modified the head network? ChannelDeconv is only for the backbone network. When used in semantic segmentation, it will be taken out. So we only need to focus on the feature extraction part.

yechengxi commented on June 11, 2024

The code has been added.

PkuRainBow commented on June 11, 2024

@yechengxi Yes, we have removed the ChannelDeconv in our experiments.

We have figured out several differences by checking your segmentation code, and we will update our results later.

Update:

After fixing several issues, we find that the performance gap has become even larger: the model with Sync-BN achieves 31% after 5000 iterations, while the model with Deconv achieves only 22%. We use a smaller learning rate of 0.01, as our system is already well verified on the Cityscapes benchmark. We are wondering what you observe with a smaller learning rate, as it is not a standard setting to train the model with a learning rate around 0.1~

We paste our updated code below; it would be great if you could help check for possible causes.

import functools
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict

from lib.extensions.deconvolution.deconv import *
from lib.models.tools.module_helper import ModuleHelper
DeConv=functools.partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)

def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1, deconv=DeConv):
    """3x3 convolution with padding"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, dilation=dilation, groups=groups, sampling_stride=3)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, bias=False, dilation=dilation, groups=groups)


def conv1x1(in_planes, out_planes, stride=1, deconv=DeConv):
    """1x1 convolution"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=1, stride=stride)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1
    __constants__ = ['downsample']

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(BasicBlock, self).__init__()

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride,deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes,deconv=deconv)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.relu(out)
        out = self.conv2(out)
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(Bottleneck, self).__init__()
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width, deconv=deconv)
        self.conv2 = conv3x3(width, width, stride, groups, dilation, deconv=deconv)
        self.conv3 = conv1x1(width, planes * self.expansion, deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.relu(out)
        out = self.conv3(out)
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class DeconvResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=(False, True, True),
                 norm_layer=None, deconv=DeConv, channel_deconv=None):
        super(DeconvResNet, self).__init__()

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 128
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group

        self.resinit = nn.Sequential(OrderedDict([
            ('conv1', deconv(3, 64, kernel_size=3, stride=2, padding=1)),
            ('relu1', nn.ReLU(inplace=True)),
            ('conv2', deconv(64, 64, kernel_size=3, stride=1, padding=1)),
            ('relu2', nn.ReLU(inplace=True)),
            ('conv3', deconv(64, 128, kernel_size=3, stride=1, padding=1)),
            ('relu3', nn.ReLU(inplace=True))]
        ))

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True)  # change.

        self.layer1 = self._make_layer(block, 64, layers[0], deconv=deconv)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0], deconv=deconv)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1], deconv=deconv)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2], deconv=deconv)
                                       

        if channel_deconv:
            self.deconv1 =channel_deconv()

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m,FastDeconv):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        # Note: the blocks above define no bn2/bn3 (deconv replaces BN), so guard the
        # attribute access to avoid an AttributeError when zero_init_residual=True.
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck) and hasattr(m, 'bn3'):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock) and hasattr(m, 'bn2'):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False, deconv=None):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride, deconv=deconv),
                ModuleHelper.BatchNorm2d(bn_type='inplace_abn')(planes * block.expansion)
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer, deconv=deconv))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer, deconv=deconv))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.resinit(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

We also paste the FCN model implementation below:


class DeconvFcnNet(nn.Module):
    def __init__(self, configer):
        # Initialize nn.Module before setting any attributes.
        super(DeconvFcnNet, self).__init__()
        self.inplanes = 128
        self.configer = configer
        self.num_classes = self.configer.get('data', 'num_classes')
        self.backbone = BackboneSelector(configer).get_backbone()

        # extra added layers
        in_channels = [1024, 2048]

        import functools
        from lib.extensions.deconvolution.deconv import FastDeconv
        DeConv = functools.partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)

        self.cls_head = nn.Sequential(
            DeConv(in_channels[1], 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Dropout2d(0.10),
            DeConv(512, self.num_classes, kernel_size=1, bias=True, block=512)
        )
        self.dsn_head = nn.Sequential(
            DeConv(in_channels[0], 512, kernel_size=3, stride=1, padding=1),
            nn.Dropout2d(0.10),
            DeConv(512, self.num_classes, kernel_size=1, bias=True, block=512)
        )

    def forward(self, x_):
        x = self.backbone(x_)
        aux_x = self.dsn_head(x[-2])
        x = self.cls_head(x[-1])
        aux_x = F.interpolate(aux_x, size=(x_.size(2), x_.size(3)), mode="bilinear", align_corners=True)
        x = F.interpolate(x, size=(x_.size(2), x_.size(3)), mode="bilinear", align_corners=True)
        return aux_x, x

yechengxi commented on June 11, 2024

0.01 is the standard setting for fine-tuning. When training from scratch, we should raise it.

yechengxi commented on June 11, 2024

I have also added the commands for FCN.

bluesky314 commented on June 11, 2024

@yechengxi So you cannot address the results of poor accuracy?

yechengxi commented on June 11, 2024

@yechengxi So you cannot address the results of poor accuracy?

@bluesky314 you can find good results with the provided commands.

bluesky314 commented on June 11, 2024

The last provided commands by @PkuRainBow state "In fact, the model with Sync-BN achieves 31% after 5000-iterations while the model with Deconv only achieves 22% after 5000-iterations." I have not tried them myself. Which provided commands are you referring to?

yechengxi commented on June 11, 2024

@bluesky314
I have provided the source code and commands in the 'Segmentation' folder. PkuRainBow was testing his own implementation before I uploaded the code.
