Comments (13)
Thank you for your interest. I will clean up the code for semantic segmentation for further analysis.
In fact, our paper compares against Sync-BN as provided in the official PyTorch implementation. The performance gap is quite significant when training from scratch, but I do not have a conclusion about fine-tuning for now.
from deconvolution.
Sounds interesting. I will try your method in our codebase by training the models from scratch. So far, Sync-BN has performed better than all the other normalization variants we have tried. We really hope to see your method outperform Sync-BN.
from deconvolution.
@yechengxi I have another small question about the hyper-parameter settings of channel_deconv for the segmentation experiments, since there is in fact no fully-connected layer.
In other words, should I choose ChannelDeconv or FastDeconv? If it is FastDeconv, are the following parameters appropriate?
DeConv=partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)
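For readers unfamiliar with this pattern, here is a minimal sketch of how `functools.partial` pre-binds keyword arguments and how a call site can still override them. `fake_deconv` is a hypothetical stand-in that only records its arguments; it is not the real `FastDeconv` and makes no claim about that class's signature, and the channel counts are illustrative.

```python
import functools


def fake_deconv(in_channels, out_channels, **kwargs):
    """Hypothetical stand-in for a layer constructor; it only records its arguments."""
    return {"in_channels": in_channels, "out_channels": out_channels, **kwargs}


# Pre-bind the shared hyper-parameters once, as in the setting quoted above.
DeConv = functools.partial(
    fake_deconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3
)

# Uses the pre-bound defaults.
default_layer = DeConv(64, 128, kernel_size=3)

# Keyword arguments given at call time replace the pre-bound ones.
override_layer = DeConv(512, 19, kernel_size=1, block=512)

print(default_layer["block"], override_layer["block"])
```

Keywords passed at call time take precedence over the ones stored in the partial, so a single layer can use a different `block` value without redefining `DeConv`.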
from deconvolution.
@yechengxi We have run experiments with a ResNet-101 FCN (based on Deconv) with output stride 8, using the setting
DeConv=partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3).
The results seem comparable to those with Sync-BN: for example, at iteration 1000, Sync-BN achieves 13% mIoU on Cityscapes while Deconv achieves 12.6%.
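As a reminder of what the mIoU figures measure, here is a minimal sketch that computes mean intersection-over-union from a confusion matrix in plain Python. The function name and the toy matrix are illustrative assumptions; benchmark evaluation scripts may differ in details such as how ignored labels are handled.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        denom = true_pos + false_pos + false_neg
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(true_pos / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example (made-up counts, not results from this thread).
toy_confusion = [[3, 1],
                 [1, 3]]
print(mean_iou(toy_confusion))
```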
It would be great if you could share some advice on how to tune the hyperparameters so that the result outperforms Sync-BN.
We have pasted our implementation below (modified from your code; we will change the stride and dilation rates of stage3/4 in other files, which are not included):
import functools
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from lib.extensions.deconvolution.deconv import *
from lib.models.tools.module_helper import ModuleHelper

DeConv = functools.partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1, deconv=DeConv):
    """3x3 convolution with padding"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, dilation=dilation, groups=groups)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, bias=False, dilation=dilation, groups=groups)


def conv1x1(in_planes, out_planes, stride=1, deconv=DeConv):
    """1x1 convolution"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=1, stride=stride)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1
    __constants__ = ['downsample']

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride, deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, deconv=deconv)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.relu(out)
        out = self.conv2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out = out + identity
        out = self.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(Bottleneck, self).__init__()
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width, deconv=deconv)
        self.conv2 = conv3x3(width, width, stride, groups, dilation, deconv=deconv)
        self.conv3 = conv1x1(width, planes * self.expansion, deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        if hasattr(self, 'bn1'):
            out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.relu(out)
        out = self.conv3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out


class DeconvResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None, deconv=DeConv, channel_deconv=None):
        super(DeconvResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer
        self.inplanes = 128
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.resinit = nn.Sequential(OrderedDict([
            ('conv1', deconv(3, 64, kernel_size=3, stride=2, padding=1)),
            ('relu1', nn.ReLU(inplace=True)),
            ('conv2', deconv(64, 64, kernel_size=3, stride=1, padding=1)),
            ('relu2', nn.ReLU(inplace=True)),
            ('conv3', deconv(64, 128, kernel_size=3, stride=1, padding=1)),
            ('relu3', nn.ReLU(inplace=True))]
        ))
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True)  # change.
        self.layer1 = self._make_layer(block, 64, layers[0], deconv=deconv)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0], deconv=deconv)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1], deconv=deconv)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2], deconv=deconv)
        if channel_deconv:
            self.deconv1 = channel_deconv()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, FastDeconv):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False, deconv=None):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride, deconv=deconv),
                ModuleHelper.BatchNorm2d(bn_type='inplace_abn')(planes * block.expansion)
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer, deconv=deconv))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer, deconv=deconv))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.resinit(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        if hasattr(self, 'deconv1'):
            x = self.deconv1(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
from deconvolution.
@PkuRainBow Have you also modified the head network? ChannelDeconv is only for the backbone network. When used in semantic segmentation, it will be taken out. So we only need to focus on the feature extraction part.
from deconvolution.
The code has been added.
from deconvolution.
@yechengxi Yes, we have removed the ChannelDeconv in our experiments.
We have found several differences by checking your segmentation code, and we will update our results later.
Update:
After fixing several issues, we find that the performance gap has become even larger. In fact, the model with Sync-BN achieves 31% after 5000 iterations while the model with Deconv only achieves 22% after 5000 iterations. We use a smaller learning rate of 0.01, as our system is already well verified on the Cityscapes benchmark. We are wondering what you observe with a smaller learning rate, since training with a learning rate of around 0.1 is not a standard setting.
We paste our updated code below; it would be great if you could help check for possible causes.
import functools
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from lib.extensions.deconvolution.deconv import *
from lib.models.tools.module_helper import ModuleHelper

DeConv = functools.partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1, deconv=DeConv):
    """3x3 convolution with padding"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, dilation=dilation, groups=groups, sampling_stride=3)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=dilation, bias=False, dilation=dilation, groups=groups)


def conv1x1(in_planes, out_planes, stride=1, deconv=DeConv):
    """1x1 convolution"""
    if deconv:
        return deconv(in_planes, out_planes, kernel_size=1, stride=stride)
    else:
        return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1
    __constants__ = ['downsample']

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride, deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, deconv=deconv)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.relu(out)
        out = self.conv2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None, deconv=DeConv):
        super(Bottleneck, self).__init__()
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width, deconv=deconv)
        self.conv2 = conv3x3(width, width, stride, groups, dilation, deconv=deconv)
        self.conv3 = conv1x1(width, planes * self.expansion, deconv=deconv)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.relu(out)
        out = self.conv3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out


class DeconvResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=[False, True, True],
                 norm_layer=None, deconv=DeConv, channel_deconv=None):
        super(DeconvResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer
        self.inplanes = 128
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.resinit = nn.Sequential(OrderedDict([
            ('conv1', deconv(3, 64, kernel_size=3, stride=2, padding=1)),
            ('relu1', nn.ReLU(inplace=True)),
            ('conv2', deconv(64, 64, kernel_size=3, stride=1, padding=1)),
            ('relu2', nn.ReLU(inplace=True)),
            ('conv3', deconv(64, 128, kernel_size=3, stride=1, padding=1)),
            ('relu3', nn.ReLU(inplace=True))]
        ))
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True)  # change.
        self.layer1 = self._make_layer(block, 64, layers[0], deconv=deconv)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0], deconv=deconv)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1], deconv=deconv)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2], deconv=deconv)
        if channel_deconv:
            self.deconv1 = channel_deconv()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, FastDeconv):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False, deconv=None):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride, deconv=deconv),
                ModuleHelper.BatchNorm2d(bn_type='inplace_abn')(planes * block.expansion)
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer, deconv=deconv))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer, deconv=deconv))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.resinit(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
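To make the stride and dilation settings easier to check, here is a minimal sketch that traces how the `replace_stride_with_dilation` flags translate into per-stage stride, dilation, and cumulative output stride. It mirrors the `dilate` logic in `_make_layer` above and assumes, as in the pasted code, a stride-2 stem followed by a stride-2 max-pool. The helper name `backbone_stages` is hypothetical.

```python
def backbone_stages(replace_stride_with_dilation, stem_stride=2, pool_stride=2):
    """Return (name, stride, dilation, output_stride) for layer1..layer4.

    `dilation` is the value held after the stage's update, i.e. the one passed
    to the blocks that follow the first block of that stage.
    """
    output_stride = stem_stride * pool_stride
    dilation = 1
    nominal_strides = [1, 2, 2, 2]  # layer1 is never strided or dilated
    dilate_flags = [False] + list(replace_stride_with_dilation)
    stages = []
    for index, (stride, dilate) in enumerate(zip(nominal_strides, dilate_flags), start=1):
        if dilate:
            # Trade the stride for dilation, keeping the spatial resolution.
            dilation *= stride
            stride = 1
        output_stride *= stride
        stages.append((f"layer{index}", stride, dilation, output_stride))
    return stages


for stage in backbone_stages([False, True, True]):
    print(stage)
```

With `[False, True, True]` the final stage ends at an output stride of 8, whereas `[False, False, False]` gives 32.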
We also paste the FCN model implementation below:
class DeconvFcnNet(nn.Module):
    def __init__(self, configer):
        self.inplanes = 128
        super(DeconvFcnNet, self).__init__()
        self.configer = configer
        self.num_classes = self.configer.get('data', 'num_classes')
        self.backbone = BackboneSelector(configer).get_backbone()
        # extra added layers
        in_channels = [1024, 2048]
        import functools
        from lib.extensions.deconvolution.deconv import FastDeconv
        DeConv = functools.partial(FastDeconv, bias=True, eps=1e-5, n_iter=5, block=64, sampling_stride=3)
        self.cls_head = nn.Sequential(
            DeConv(in_channels[1], 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Dropout2d(0.10),
            DeConv(512, self.num_classes, kernel_size=1, bias=True, block=512)
        )
        self.dsn_head = nn.Sequential(
            DeConv(in_channels[0], 512, kernel_size=3, stride=1, padding=1),
            nn.Dropout2d(0.10),
            DeConv(512, self.num_classes, kernel_size=1, bias=True, block=512)
        )

    def forward(self, x_):
        x = self.backbone(x_)
        aux_x = self.dsn_head(x[-2])
        x = self.cls_head(x[-1])
        aux_x = F.interpolate(aux_x, size=(x_.size(2), x_.size(3)), mode="bilinear", align_corners=True)
        x = F.interpolate(x, size=(x_.size(2), x_.size(3)), mode="bilinear", align_corners=True)
        return aux_x, x
from deconvolution.
0.01 is the standard setting for fine-tuning. When training from scratch, we should raise it.
from deconvolution.
I have also added the commands for FCN.
from deconvolution.
@yechengxi So you cannot address the results of poor accuracy?
from deconvolution.
@yechengxi So you cannot address the results of poor accuracy?
@bluesky314 you can find good results with the provided commands.
from deconvolution.
The latest update from @PkuRainBow states: "In fact, the model with Sync-BN achieves 31% after 5000-iterations while the model with Deconv only achieves 22% after 5000-iterations." I have not tried them myself. Which provided commands are you referring to?
from deconvolution.
@bluesky314
I have provided the source code and commands in the 'Segmentation' folder. PkuRainBow was testing his own implementation before I uploaded the code.
from deconvolution.