Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression

This is the official implementation of "Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression".

Introduction
Motivation
Contribution
Dependencies
Test
Train
Results
Reference
Acknowledgements

Introduction

In this paper, we analyze two popular network compression techniques, i.e., filter pruning and low-rank decomposition, in a unified sense. By simply changing the way the sparsity regularization is enforced, filter pruning and low-rank decomposition can be derived accordingly. This provides another flexible choice for network compression because the techniques complement each other. For example, in popular network architectures with shortcut connections (e.g., ResNet), filter pruning cannot deal with the last convolutional layer in a ResBlock, while low-rank decomposition methods can. In addition, we propose to compress the whole network jointly instead of in a layer-wise manner. Our approach proves its potential as it compares favorably to the state-of-the-art on several benchmarks.

Motivation

Filter pruning and filter decomposition (also termed low-rank approximation) have been developing steadily. Filter pruning nullifies the weak filter connections that have the least influence on the accuracy of the network, while low-rank decomposition converts a heavy convolution to a lightweight one and a linear combination. Despite their success, both the pruning-based and decomposition-based approaches have their respective limitations. Filter pruning can only take effect by pruning the output channels of a tensor and, equivalently, cancelling out inactive filters. This is not feasible under some circumstances. The skip connection in a block is such a case: the output feature map of the block is added to the input, so pruning the output could amount to cancelling a possibly important input feature map. This is the reason why many pruning methods fail to deal with the second convolution of the ResNet basic block. As for filter decomposition, it always introduces another 1-by-1 convolutional layer, which means additional overhead from calling CUDA kernels. In this paper, we analyze the relationship between the two techniques from the perspective of compact tensor approximation.
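To make the trade-off described above concrete, here is a minimal sketch that counts weights for a standard convolution, a filter-pruned one, and a decomposed one (a lightweight convolution followed by a 1-by-1 convolution). The function names and the example sizes (64 channels, 3-by-3 kernels, 32 kept filters or rank 32) are illustrative assumptions and are not taken from the paper or the repository.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a bias-free k-by-k convolution."""
    return c_in * c_out * k * k


def pruned_params(c_in, k, kept):
    """Filter pruning keeps only `kept` output filters of the original layer."""
    return conv_params(c_in, kept, k)


def decomposed_params(c_in, c_out, k, rank):
    """Decomposition: `rank` k-by-k basis filters plus a 1-by-1 convolution
    that linearly combines them back into `c_out` output channels."""
    return conv_params(c_in, rank, k) + conv_params(rank, c_out, 1)


# Hypothetical layer sizes, chosen only for illustration.
full = conv_params(64, 64, 3)
pruned = pruned_params(64, 3, kept=32)
decomposed = decomposed_params(64, 64, 3, rank=32)

print("full:", full, "pruned:", pruned, "decomposed:", decomposed)
```

In this toy setting the decomposed layer keeps all 64 output channels, which is why it remains usable where a skip connection fixes the output width, at the cost of the extra 1-by-1 layer.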

A sparsity-inducing matrix A is attached to a normal convolution. The matrix acts as the hinge between filter pruning and decomposition. By enforcing group sparsity on the columns or the rows of the matrix, equivalent pruning or decomposition operations can be obtained, respectively.
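The effect of column and row sparsity can be sketched on a toy matrix model. The example below assumes the convolution is flattened to a matrix W (one column per filter) and that the layer output is proportional to the product W·A; this simplification, the helper names, and the numeric values are assumptions made for illustration only. A zero column of A yields a zero output channel (pruning), while a zero row of A makes the corresponding filter of W removable without changing the product (decomposition).

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def group_norm(m, axis):
    """Sum of l2 norms over groups, where groups are rows or columns."""
    groups = m if axis == "row" else [list(c) for c in zip(*m)]
    return sum(math.sqrt(sum(v * v for v in g)) for g in groups)


def zero_groups(m, axis, tol=1e-12):
    """Indices of rows or columns whose entries are all (numerically) zero."""
    groups = m if axis == "row" else list(zip(*m))
    return [i for i, g in enumerate(groups) if all(abs(v) <= tol for v in g)]


# Toy flattened convolution: 2 weights per filter, 3 filters (hypothetical).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# Column sparsity: column 1 of A is zero, so output channel 1 vanishes.
A_col = [[1.0, 0.0, 2.0],
         [0.0, 0.0, 1.0],
         [3.0, 0.0, 1.0]]
out_col = matmul(W, A_col)
pruned_outputs = zero_groups(A_col, "col")

# Row sparsity: row 1 of A is zero, so filter 1 of W never contributes.
A_row = [[1.0, 2.0, 0.0],
         [0.0, 0.0, 0.0],
         [3.0, 1.0, 1.0]]
dead = zero_groups(A_row, "row")
W_small = [[v for j, v in enumerate(row) if j not in dead] for row in W]
A_small = [row for i, row in enumerate(A_row) if i not in dead]

print("column-group norm of A_col:", group_norm(A_col, "col"))
print("row-group norm of A_row:", group_norm(A_row, "row"))
print("same output after removing the dead filter:",
      matmul(W_small, A_small) == matmul(W, A_row))
```

Here `group_norm` is the kind of quantity a group-sparsity regularizer would penalize; the sketch does not reproduce the optimization used in the paper.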

Contribution

1. The connection between filter pruning and decomposition is analyzed from the perspective of compact tensor approximation.

2. A sparsity-inducing matrix is introduced to hinge filter pruning and decomposition and bring them under the same formulation.

3. A set of techniques, including binary search, gradient-based learning rate adjustment, layer balancing, and annealing methods, is developed to solve the problem.

4. The proposed method can be applied to various CNNs. We apply this method to VGG, DenseNet, ResNet, ResNeXt, and WRN.
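As an illustration of how binary search can fit into such a pipeline, the sketch below looks for a threshold on per-group norms so that at most a target fraction of groups survives. This is an assumption about one plausible use of binary search, not the repository's implementation; `kept_fraction`, `search_threshold`, and the sample norms are hypothetical.

```python
def kept_fraction(norms, threshold):
    """Fraction of groups whose norm is strictly above the threshold."""
    return sum(n > threshold for n in norms) / len(norms)


def search_threshold(norms, target_fraction, tol=1e-9, max_iter=200):
    """Binary search for the smallest threshold (up to `tol`) that keeps
    at most `target_fraction` of the groups.

    kept_fraction is non-increasing in the threshold, which is what makes
    bisection applicable here.
    """
    lo, hi = 0.0, max(norms)  # at hi nothing is kept, so hi is always valid
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = (lo + hi) / 2.0
        if kept_fraction(norms, mid) > target_fraction:
            lo = mid  # too many groups survive: raise the threshold
        else:
            hi = mid  # constraint satisfied: try a smaller threshold
    return hi


# Hypothetical group norms, e.g. l2 norms of the columns of a matrix.
norms = [0.1, 0.5, 0.9, 1.3, 2.0, 0.05, 0.7, 1.1]
threshold = search_threshold(norms, target_fraction=0.5)
print("threshold:", threshold)
print("kept fraction:", kept_fraction(norms, threshold))
```

Returning the upper bound `hi` guarantees that the returned threshold satisfies the constraint even when the search stops early.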

The flowchart of the proposed algorithm.


Group sparsity enforced on the columns of the sparsity-inducing matrix.

Group sparsity enforced on the rows of the sparsity-inducing matrix.

Dependencies

Python 3.7.4
PyTorch >= 1.2.0
numpy
matplotlib
tqdm
scikit-image
easydict
IPython

Test

Download the model zoo from Google Drive or Dropbox. This contains the pretrained original models and the compressed models. Place the models in ./model_zoo.
Cd to ./scripts.
Use the following scripts in ./scripts/demo_test.sh to test the compressed models.

Be sure to change the directories SAVE_PATH and DATA_PATH.

DATA_PATH: where the dataset is stored.

SAVE_PATH: where you want to save the results.

    MODEL_PATH=../model_zoo/compressed
    SAVE_PATH=~/projects/logs/hinge_test/new
    DATA_PATH=~/projects/data

    ######################################
    # 1. VGG, CIFAR10
    ######################################
    MODEL=Hinge_VGG
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template "linear3_${TEMPLATE}_VGG" --model ${MODEL} --vgg_type 16 --test_only \
    --pretrain ${MODEL_PATH}/vgg_cifar10.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 2. DenseNet, CIFAR10
    ######################################
    MODEL=Hinge_DENSENET_SVD
    LAYER=40
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template DenseNet --model ${MODEL} --depth ${LAYER} --test_only \
    --pretrain ${MODEL_PATH}/densenet_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 3. ResNet164, CIFAR10
    ######################################
    MODEL=Hinge_RESNET_BOTTLENECK
    LAYER=164
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --no_bias --test_only \
    --pretrain ${MODEL_PATH}/resnet164_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 4. ResNet164, CIFAR100
    ######################################
    MODEL=Hinge_RESNET_BOTTLENECK
    LAYER=164
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --no_bias --test_only \
    --pretrain ${MODEL_PATH}/resnet164_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 5. ResNet56, CIFAR10
    ######################################
    MODEL=Hinge_ResNet_Basic_SVD
    LAYER=56
    CHECKPOINT=${MODEL}_CIFAR10_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ResNet --model ${MODEL} --depth ${LAYER} --downsample_type A --test_only \
    --pretrain ${MODEL_PATH}/resnet56_cifar10.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 6. ResNet20, CIFAR10
    ######################################
    MODEL=Hinge_ResNet_Basic_SVD
    LAYER=20
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --downsample_type A --test_only \
    --pretrain ${MODEL_PATH}/resnet20_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 7. ResNet20, CIFAR100
    ######################################
    MODEL=Hinge_ResNet_Basic_SVD
    LAYER=20
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --downsample_type A --test_only \
    --pretrain ${MODEL_PATH}/resnet20_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 8. ResNeXt164, CIFAR10
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=164
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext164_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 9. ResNeXt164, CIFAR100
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=164
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext164_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 10. ResNeXt20, CIFAR10
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=20
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext20_cifar10.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 11. ResNeXt20, CIFAR100
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=20
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext20_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 12. WRN, CIFAR100, 0.5
    ######################################
    MODEL=Hinge_WIDE_RESNET
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_0.5
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template "${TEMPLATE}_Wide_ResNet" --model ${MODEL} --depth 16 --widen_factor 10 --test_only \
    --pretrain ${MODEL_PATH}/wrn_cifar100_5.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 13. WRN, CIFAR100, 0.7
    ######################################
    MODEL=Hinge_WIDE_RESNET
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_0.7
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template "${TEMPLATE}_Wide_ResNet" --model ${MODEL} --depth 16 --widen_factor 10 --test_only \
    --pretrain ${MODEL_PATH}/wrn_cifar100_7.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

To test the original uncompressed models, please refer to [`./scripts/baseline_test.sh`](./scripts/baseline_test.sh)

Train

The scripts for compressing ResNet, DenseNet, VGG, ResNeXt, and WRN are released.

Cd to ./scripts
Make sure that the pretrained original models are already downloaded and placed in ./model_zoo/baseline.
Run the scripts hinge_XXX.sh to reproduce the results in our paper, where XXX may be replaced by vgg, densenet, resnet, resnext, or wide_resnet, depending on which network you want to compress.
Be sure to change the directories SAVE_PATH and DATA_PATH in hinge_XXX.sh.

Results

FLOP and parameter comparison between KSE and Hinge under different compression ratios. ResNet56 is compressed. Top-1 error rate is reported.

Comparison between SSS and the proposed Hinge method on ResNet and ResNeXt. Top-1 error rate is reported for CIFAR100.

Reference

If you find our work useful in your research or publication, please cite our work:

@inproceedings{li2020group,
  title={Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression},
  author={Li, Yawei and Gu, Shuhang and Mayer, Christoph and Van Gool, Luc and Timofte, Radu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Acknowledgements

This work was partly supported by the ETH Zurich Fund (OK), by VSS ASTRA, SBB and Huawei projects, and by Amazon AWS and Nvidia GPU grants.

This repository is built on EDSR (PyTorch). We thank the authors for making their EDSR codes public.

This repository is also based on the implementation of our former paper Learning Filter Basis for Convolutional Neural Network Compression. If you are interested, please refer to:

@inproceedings{li2019learning,
  title = {Learning Filter Basis for Convolutional Neural Network Compression},
  author = {Li, Yawei and Gu, Shuhang and Van Gool, Luc and Timofte, Radu},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
  year = {2019}
}

Training not converging on DenseNet-bottleneck

Hi, I want to test your method on DenseNet with the bottleneck structure (conv1x1 --> conv3x3) on CIFAR100.
I followed the code of densenet_svd.py and hinge_resnet_bottleneck.py,
mainly changing the following function:

def compress_module_param(module, percentage, threshold):
    # Bias is None, so things become easier.
    # get the body
    '''
    # transition 
    (0): BatchNorm2d(168, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): ReLU()
    (2): Conv2d(168, 84, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (3): Conv2d(84, 84, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (4): AvgPool2d(kernel_size=2, stride=2, padding=0)
    '''
    if isinstance(module, Transition):
        body = module

        conv1 = body._modules['2']
        conv2 = body._modules['3']

        ws1 = conv1.weight.data.shape
        weight1 = conv1.weight.data.squeeze().t()

        ws2 = conv2.weight.data.shape
        weight2 = conv2.weight.data.squeeze().t()

        # calculate pindex
        _, pindex = get_nonzero_index(weight1, dim='output', counter=1, percentage=percentage, threshold=threshold)

        pl = pindex.shape[0]
        weight1 = torch.index_select(weight1, dim=1, index=pindex) 
        conv1.weight = nn.Parameter(weight1.t().view(pl, ws1[1], ws1[2], ws1[3]))
        conv1.out_channels = pl

        # compress conv2
        conv2.weight = nn.Parameter(torch.index_select(weight2, dim=0, index=pindex).t().view(ws2[0], pl, ws2[2], ws2[3]))
        conv2.in_channels = pl

    elif isinstance(module, BottleNeck):
        # with torchsnooper.snoop():
        body = module._modules['body']
        # conv1x1
        conv1 = body._modules['2']
        batchnorm1 = body._modules['3'] # batchnorm applied to the output of conv1
        conv2 = body._modules['5']

        
        # get conv weights
        ws1 = conv1.weight.data.shape
        weight1 = conv1.weight.data.squeeze().t()
        
        bn_weight1 = batchnorm1.weight.data
        bn_bias1 = batchnorm1.bias.data
        bn_mean1 = batchnorm1.running_mean.data
        bn_var1 = batchnorm1.running_var.data
        
        ws2 = conv2.weight.data.shape
        weight2 = conv2.weight.data.view(ws2[0], ws2[1] * ws2[2] * ws2[3]).t()
        
        # select the channels to keep after compression
        _, pindex1 = get_nonzero_index(weight1, dim='output', counter=1, percentage=percentage, threshold=threshold)
        pl1 = len(pindex1)
        conv1.weight = nn.Parameter(torch.index_select(weight1, dim=1, index=pindex1).t().view(pl1, -1, 1, 1))
        conv1.out_channels = pl1

        # batchnorm1
        batchnorm1.weight = nn.Parameter(torch.index_select(bn_weight1, dim=0, index=pindex1)) 
        batchnorm1.bias = nn.Parameter(torch.index_select(bn_bias1, dim=0, index=pindex1))
        batchnorm1.running_mean = torch.index_select(bn_mean1, dim=0, index=pindex1)
        batchnorm1.running_var = torch.index_select(bn_var1, dim=0, index=pindex1)
        batchnorm1.num_features = pl1
        
        # conv2
        index = torch.repeat_interleave(pindex1, ws2[2] * ws2[3]) * ws2[2] * ws2[3] + \
                torch.tensor(range(0, ws2[2] * ws2[3])).repeat(pindex1.shape[0]).cuda()
        weight2 = torch.index_select(weight2, dim=0, index=index)
        # weight2 = torch.index_select(weight2, dim=1, index=pindex3)
        conv2.weight = nn.Parameter(weight2.view(ws2[0], pl1, 3, 3))
        conv2.in_channels = pl1
        # exit(0)
    else:
        raise NotImplementedError('Do not need to compress the layer ' + module.__class__.__name__)

When testing the model with the default parameters, the top-1 test error changes as follows:

Did you test your model on DenseNet-bottleneck during your experiments?
I was wondering whether there is something wrong with my code. If not, why does the test loss behave like this?

Thanks for your time and looking forward to your reply.
