yechengxi / deconvolution

License: Apache License 2.0

Python 100.00%

deconvolution's Introduction

Network Deconvolution

Convolution is a central operation in Convolutional Neural Networks (CNNs), which applies a kernel to overlapping regions shifted across the image. However, because of the strong correlations in real-world image data, convolutional kernels are in effect re-learning redundant data. In this work, we show that this redundancy has made neural network training challenging, and propose network deconvolution, a procedure which optimally removes pixel-wise and channel-wise correlations before the data is fed into each layer. Network deconvolution can be efficiently calculated at a fraction of the computational cost of a convolution layer. We also show that the deconvolution filters in the first layer of the network resemble the center-surround structure found in biological neurons in the visual regions of the brain. Filtering with such kernels results in a sparse representation, a desired property that has been missing in the training of neural networks. Learning from the sparse representation promotes faster convergence and superior results without the use of batch normalization. We apply our network deconvolution operation to 10 modern neural network models by replacing batch normalization within each. Extensive experiments show that the network deconvolution operation is able to deliver performance improvement in all cases on the CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, Cityscapes, and ImageNet datasets.

@inproceedings{
Ye2020Network,
title={Network Deconvolution},
author={Chengxi Ye and Matthew Evanusa and Hua He and Anton Mitrokhin and Tom Goldstein and James A. Yorke and Cornelia Fermuller and Yiannis Aloimonos},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=rkeu30EtvS}
}

Install Dependencies

This code requires Python 3.5 or later.

We recommend using pip to install the required dependencies.

pip install scipy numpy tensorboard matplotlib

Install PyTorch:

pip install torch torchvision

(optional, for visualization) Install tensorflow:

pip3 install tensorflow

Settings Overview

The run command accepts a number of settings.

The basic run command (for datasets other than ImageNet) is:

python main.py --[keyword1] [argument1] --[keyword2] [argument2]  ...

The major keywords to note are:

deconv - set to True to use deconvolution or False to use batch normalization (BN)
arch - the architecture to use (resnet50, vgg11, vgg13, vgg19, densenet121)
wd - the weight decay
batch-size - the batch size
epochs - the number of epochs to run
dataset - the dataset to use (cifar10, cifar100); for ImageNet, use the separate main_imagenet.py script
lr - the learning rate
block - the block size used in deconvolution
block-fc - the block size used when decorrelating the fully connected layers
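
As an illustration of how keywords in this `--keyword value` form could be parsed, here is a minimal sketch using Python's standard argparse module. It is not the repository's main.py: the helper names str2bool and build_parser and all default values are assumptions made for this example, and only the flag names come from the list above.

```python
import argparse


def str2bool(value):
    """Turn 'True'/'False' style strings into booleans.

    argparse's type=bool would treat any non-empty string, including
    'False', as True, so an explicit converter is used here.
    """
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(
        "expected True or False, got {!r}".format(value))


def build_parser():
    # Hypothetical parser: flag names follow the keyword list,
    # defaults are illustrative assumptions only.
    parser = argparse.ArgumentParser(description="Illustrative parser sketch")
    parser.add_argument("--deconv", type=str2bool, default=False)
    parser.add_argument("--arch", default="vgg11")
    parser.add_argument("--wd", type=float, default=1e-3)
    parser.add_argument("--batch-size", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--dataset", default="cifar10")
    parser.add_argument("--lr", type=float, default=0.1)
    parser.add_argument("--block", type=int, default=64)
    parser.add_argument("--block-fc", type=int, default=0)
    return parser


# Parse an example command line; dashes in flag names become underscores.
args = build_parser().parse_args(
    ["--deconv", "True", "--arch", "vgg11", "--block-fc", "512", "--wd", ".001"])
print(args.deconv, args.arch, args.block_fc, args.wd)
```

The explicit converter matters because `--deconv False` would otherwise be read as True.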

1. Running the examples from the paper

As an example, to run our settings for the 20-epoch CIFAR-10 run on the vgg11 architecture, with a weight decay of .001 and a batch size of 128, you would run:

CUDA_VISIBLE_DEVICES=0 python main.py --lr .1 --optimizer SGD --arch vgg11 --epochs 20 --dataset cifar10  --batch-size 128 --msg True --deconv False --block-fc 0 --wd .001

for batch norm, and

CUDA_VISIBLE_DEVICES=0 python main.py --lr .1 --optimizer SGD --arch vgg11 --epochs 20 --dataset cifar10  --batch-size 128 --msg True --deconv True --block-fc 512 --wd .001

for deconvolution.

2. ImageNet dataset:

original resnet18 (90 epochs, use --epochs xx to change)

python main_imagenet.py -a resnet18 -j 32 imagenet/ILSVRC/Data/CLS-LOC

deconv resnet18

python main_imagenet.py -a resnet18d -j 32 imagenet/ILSVRC/Data/CLS-LOC --deconv True

3. Semantic segmentation:

Go to the Segmentation folder and follow the instructions in the ReadMe file.

deconvolution's Issues

Depthwise convolutions

How to replace deconv layer with the combination of depthwise convolution and batchnorm?

License?

Hi,

Thanks for your code, the results are very interesting.
Wondering if you could please update the repository with the LICENSE file?

Deconvolution runtime

Thanks for this paper, I really enjoyed reading it.

I replaced all the batch norm layers in a ResNeXt-50 model with ChannelDeconv(block=64) layers, but I found that training takes much longer doing so, running about 30% slower. Did you notice this too with your experiments? Do you have any suggestions for speeding up the deconvolution layers?

An exciting work!!! But net_util.py is missing; you may have forgotten to add it. (:

Implementation details

First, let me congratulate you on your paper and also thank you for open-sourcing the code. I was porting the deconv operations/layers to Tensorflow and was wondering about something.

Is the deconv covariance buffer the vast majority of non-trainable parameters in your models? Practically speaking, without groups (which Tensorflow doesn't support easily), the cost for that matrix in terms of parameters = [K1 * K2 * num_blocks] ^ 2. For a 3x3 kernel with 64 blocks, that's roughly 330K parameters right there. Are grouped convolutions the only remedy to this parameter explosion? It might become a network bandwidth issue in multi-node distributed training setups.
Under what circumstances would one prefer the Delinear implementation over the FastDeconv implementation?
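
The estimate above can be checked with a few lines of arithmetic. This sketch only evaluates the formula quoted in the question, [K1 * K2 * num_blocks] ^ 2; the function name covariance_buffer_size is hypothetical, and whether the formula matches the buffer actually allocated by the implementation is not verified here.

```python
def covariance_buffer_size(k1, k2, num_blocks):
    """Entries in a square covariance matrix of side k1 * k2 * num_blocks.

    Evaluates the formula from the question; this is an assumption about
    the buffer layout, not a statement about the repository's code.
    """
    side = k1 * k2 * num_blocks
    return side * side


# 3x3 kernel with 64 blocks: side = 576, so 576 * 576 entries.
print(covariance_buffer_size(3, 3, 64))  # 331776
```

This agrees with the "roughly 330K" figure quoted in the question.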

Edit: It seems the link to the paper in the readme is broken.

inference time

Great work!
Network deconvolution has many good features. However, your paper does not mention the inference time with deconvolution. I guess it is almost the same as with BN?

Concerns on the segmentation performance gap based on Sync-BN

Really nice work!

We are interested in your experimental results on semantic segmentation tasks (Cityscapes).

According to Figure 6, the proposed deconvolution outperforms BN by a large margin of around 7~8%. However, it seems that you only report results after 30 epochs of training, and we are wondering about the performance gap after more training, e.g., 100 epochs.

Besides, most current state-of-the-art segmentation methods use Sync-BN to improve their results, so I am also wondering whether you have compared your approach with Sync-BN.

Last, we hope you could share with us the ImageNet pre-trained checkpoints of ResNet-101/50 based on Deconvolution and we might help to verify the effectiveness of your approach based on the current state-of-the-art segmentation systems.

It would be great if you could share with us your suggestions.

Thanks,

FastDeconv breaks when no bias is used

When bias = False is passed to the FastDeconv constructor, the forward pass fails at:

b = self.bias - (w @ (X_mean.unsqueeze(1))).view(self.weight.shape[0], -1).sum(1)

This is because self.bias is None in that case, so the subtraction fails.

I guess this would be possible:

if self.bias is None:
    b = - (w @ (X_mean.unsqueeze(1))).view(self.weight.shape[0], -1).sum(1)
else:
    b = self.bias - (w @ (X_mean.unsqueeze(1))).view(self.weight.shape[0], -1).sum(1)

When Conv2d is followed by BatchNorm, the Conv2d usually has no bias. However, when FastDeconv replaces both, I guess a bias is needed again. So I'm not sure whether there are useful cases for FastDeconv without a bias.

Thanks for the paper and the implementation!

1d please

Hi,
I bet this would also work for speech recognition and signal processing, maybe even NLP and algorithmic trading.
I would love to try this out for speech recognition tasks in particular.
Can you please provide a 1D deconvolution class? It seems like it shouldn't be too difficult, especially if you really understand every line of code you wrote...

Thanks
Dan

Accuracy calculation bug

The call to .view() on the correct tensor in the accuracy function (net_util.py, line 342) fails when the tensor is not contiguous.
The solution is to make the tensor contiguous before applying the view.
A PR is on the way.
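
A minimal sketch of the fix described above. It assumes the accuracy function follows the widely used top-k pattern from the PyTorch ImageNet example; the actual body of net_util.py is not shown in this report, so everything except the function name and the correct tensor is an assumption.

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Top-k accuracy in percent for each k in topk (illustrative sketch)."""
    maxk = max(topk)
    batch_size = target.size(0)

    # pred has shape (batch, maxk); transposing gives (maxk, batch) and
    # typically leaves the tensor non-contiguous in memory.
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    results = []
    for k in topk:
        # .view(-1) can raise on a non-contiguous slice; .reshape(-1)
        # copies only when needed, like .contiguous().view(-1).
        correct_k = correct[:k].reshape(-1).float().sum().item()
        results.append(correct_k * 100.0 / batch_size)
    return results


# Small usage example: 4 samples, 3 classes.
output = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.3, 0.5],
                       [0.5, 0.4, 0.1]])
target = torch.tensor([1, 0, 1, 2])
print(accuracy(output, target, topk=(1, 2)))
```

Using reshape rather than view keeps the function working whether or not the sliced tensor happens to be contiguous.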