
group_sparsity's Introduction

This is the official implementation of "Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression".

Contents

  1. Introduction
  2. Motivation
  3. Contribution
  4. Dependencies
  5. Test
  6. Train
  7. Results
  8. Reference
  9. Acknowledgements

Introduction

In this paper, we analyze two popular network compression techniques, i.e., filter pruning and low-rank decomposition, in a unified sense. By simply changing the way the sparsity regularization is enforced, filter pruning and low-rank decomposition can be derived accordingly. This provides another flexible choice for network compression because the techniques complement each other. For example, in popular network architectures with shortcut connections (e.g., ResNet), filter pruning cannot deal with the last convolutional layer in a ResBlock, while the low-rank decomposition methods can. In addition, we propose to compress the whole network jointly instead of in a layer-wise manner. Our approach proves its potential as it compares favorably to the state-of-the-art on several benchmarks.

Motivation

Filter pruning and filter decomposition (also termed low-rank approximation) have been developing steadily. Filter pruning nullifies the weak filter connections that have the least influence on the accuracy of the network, while low-rank decomposition converts a heavy convolution to a lightweight one and a linear combination. Despite their success, both the pruning-based and decomposition-based approaches have their respective limitations. Filter pruning only takes effect by pruning the output channels of a tensor and, equivalently, cancelling out inactive filters. This is not feasible under some circumstances. The skip connection in a block is such a case, where the output feature map of the block is added to the input. Thus, pruning the output could amount to cancelling a possibly important input feature map. This is the reason why many pruning methods fail to deal with the second convolution of the ResNet basic block. As for filter decomposition, it always introduces another 1-by-1 convolutional layer, which means additional overhead of calling CUDA kernels. In this paper, we analyze the relationship between the two techniques from the perspective of compact tensor approximation.
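
The decomposition trade-off described above can be made concrete with a quick parameter count. The sketch below assumes a bias-free k x k convolution that is split into a k x k convolution with `rank` output channels followed by a 1-by-1 convolution. The layer sizes and function names are illustrative and are not taken from the repository.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution without bias."""
    return c_in * c_out * k * k


def decomposed_params(c_in, c_out, k, rank):
    """Weights after splitting the layer into a k x k convolution with `rank`
    outputs followed by a 1 x 1 convolution that linearly combines them."""
    return conv_params(c_in, rank, k) + conv_params(rank, c_out, 1)


original = conv_params(64, 64, 3)             # 64 * 64 * 9 = 36864
low_rank = decomposed_params(64, 64, 3, 16)   # 64 * 16 * 9 + 16 * 64 = 10240
print(original, low_rank)
```

The saving comes at the price of the extra 1-by-1 layer, which is the overhead mentioned above.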

A sparsity-inducing matrix A is attached to a normal convolution. The matrix acts as the hinge between filter pruning and decomposition. By enforcing group sparsity to the columns and rows of the matrix, equivalent pruning and decomposition operations can be obtained.
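
As a rough illustration of what group sparsity on columns versus rows means, the sketch below applies group soft-thresholding (the standard proximal step for a sum of group l2 norms) to a small matrix. The matrix values, the threshold `tau`, and the function names are assumptions made for this example. The repository's actual solver may differ.

```python
import math


def group_norms(A, axis):
    """l2 norm of every column (axis="column") or every row (axis="row") of A."""
    if axis == "column":
        return [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(len(A[0]))]
    return [math.sqrt(sum(v ** 2 for v in row)) for row in A]


def group_soft_threshold(A, tau, axis):
    """Shrink each group toward zero; groups whose norm is <= tau become exactly zero."""
    norms = group_norms(A, axis)
    scale = [max(0.0, 1.0 - tau / n) if n > 0 else 0.0 for n in norms]
    if axis == "column":
        return [[v * scale[j] for j, v in enumerate(row)] for row in A]
    return [[v * scale[i] for v in row] for i, row in enumerate(A)]


def zero_groups(A, axis):
    """Indices of the groups that are entirely zero."""
    return [i for i, n in enumerate(group_norms(A, axis)) if n == 0.0]


# Toy matrix (hypothetical values): column 1 and row 1 both have a small norm.
A = [[3.0, 0.1, 4.0],
     [0.1, 0.1, 0.2],
     [0.0, 0.2, 1.0]]

by_column = group_soft_threshold(A, tau=0.5, axis="column")
by_row = group_soft_threshold(A, tau=0.5, axis="row")
print("zeroed columns:", zero_groups(by_column, "column"))
print("zeroed rows:", zero_groups(by_row, "row"))
```

Depending on whether whole columns or whole rows vanish, a different dimension of the attached matrix can be removed, which is the sense in which the matrix acts as a hinge.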

Contribution

1. The connection between filter pruning and decomposition is analyzed from the perspective of compact tensor approximation.

2. A sparsity-inducing matrix is introduced to hinge filter pruning and decomposition and bring them under the same formulation.

3. A set of techniques, including binary search, gradient-based learning rate adjustment, layer balancing, and annealing methods, is developed to solve the problem.

4. The proposed method can be applied to various CNNs. We apply this method to VGG, DenseNet, ResNet, ResNeXt, and WRN.
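
The list above names binary search without saying what is searched. One plausible use, sketched below purely as an assumption, is to search for a threshold on group norms so that the fraction of surviving groups meets a target. The norms, the target, and the function names are hypothetical and do not come from the repository.

```python
def kept_fraction(norms, threshold):
    """Fraction of groups whose norm is strictly above the threshold."""
    return sum(1 for n in norms if n > threshold) / len(norms)


def search_threshold(norms, target, tol=1e-6, max_iter=100):
    """Binary-search the smallest threshold (up to tol) that keeps at most
    `target` of the groups. Works because kept_fraction never increases
    as the threshold grows."""
    lo, hi = 0.0, max(norms)          # hi keeps nothing, so it always satisfies the target
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if kept_fraction(norms, mid) > target:
            lo = mid                  # too many groups survive: raise the threshold
        else:
            hi = mid                  # target met: try a smaller threshold
        if hi - lo < tol:
            break
    return hi


norms = [0.05, 0.2, 0.4, 0.9, 1.3, 2.0, 2.5, 3.1]   # hypothetical group norms
threshold = search_threshold(norms, target=0.5)
print(threshold, kept_fraction(norms, threshold))
```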

The flowchart of the proposed algorithm.

Group sparsity enforced on the column of the sparsity-inducing matrix. Group sparsity enforced on the row of the sparsity-inducing matrix.

Dependencies

  • Python 3.7.4
  • PyTorch >= 1.2.0
  • numpy
  • matplotlib
  • tqdm
  • scikit-image
  • easydict
  • IPython

Test

  1. Download the model zoo from Google Drive or Dropbox. This contains the pretrained original models and the compressed models. Place the models in ./model_zoo.

  2. Cd to ./scripts.

  3. Use the following scripts in ./scripts/demo_test.sh to test the compressed models.

    Be sure to change the directories SAVE_PATH and DATA_PATH.

    DATA_PATH: where the dataset is stored.

    SAVE_PATH: where you want to save the results.

    MODEL_PATH=../model_zoo/compressed
    SAVE_PATH=~/projects/logs/hinge_test/new
    DATA_PATH=~/projects/data

    ######################################
    # 1. VGG, CIFAR10
    ######################################
    MODEL=Hinge_VGG
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template "linear3_${TEMPLATE}_VGG" --model ${MODEL} --vgg_type 16 --test_only \
    --pretrain ${MODEL_PATH}/vgg_cifar10.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 2. DenseNet, CIFAR10
    ######################################
    MODEL=Hinge_DENSENET_SVD
    LAYER=40
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template DenseNet --model ${MODEL} --depth ${LAYER} --test_only \
    --pretrain ${MODEL_PATH}/densenet_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 3. ResNet164, CIFAR10
    ######################################
    MODEL=Hinge_RESNET_BOTTLENECK
    LAYER=164
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --no_bias --test_only \
    --pretrain ${MODEL_PATH}/resnet164_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 4. ResNet164, CIFAR100
    ######################################
    MODEL=Hinge_RESNET_BOTTLENECK
    LAYER=164
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --no_bias --test_only \
    --pretrain ${MODEL_PATH}/resnet164_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 5. ResNet56, CIFAR10
    ######################################
    MODEL=Hinge_ResNet_Basic_SVD
    LAYER=56
    CHECKPOINT=${MODEL}_CIFAR10_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ResNet --model ${MODEL} --depth ${LAYER} --downsample_type A --test_only \
    --pretrain ${MODEL_PATH}/resnet56_cifar10.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 6. ResNet20, CIFAR10
    ######################################
    MODEL=Hinge_ResNet_Basic_SVD
    LAYER=20
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --downsample_type A --test_only \
    --pretrain ${MODEL_PATH}/resnet20_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 7. ResNet20, CIFAR100
    ######################################
    MODEL=Hinge_ResNet_Basic_SVD
    LAYER=20
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --downsample_type A --test_only \
    --pretrain ${MODEL_PATH}/resnet20_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 8. ResNeXt164, CIFAR10
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=164
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext164_cifar10.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 9. ResNeXt164, CIFAR100
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=164
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=1 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext164_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 10. ResNeXt20, CIFAR10
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=20
    TEMPLATE=CIFAR10
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext20_cifar10.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 11. ResNeXt20, CIFAR100
    ######################################
    MODEL=Hinge_RESNEXT
    LAYER=20
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_L${LAYER}
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template ${TEMPLATE} --model ${MODEL} --depth ${LAYER} --cardinality 32 --bottleneck_width 1 --test_only \
    --pretrain ${MODEL_PATH}/resnext20_cifar100.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 12. WRN, CIFAR100, 0.5
    ######################################
    MODEL=Hinge_WIDE_RESNET
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_0.5
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template "${TEMPLATE}_Wide_ResNet" --model ${MODEL} --depth 16 --widen_factor 10 --test_only \
    --pretrain ${MODEL_PATH}/wrn_cifar100_5.pt  --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

    ######################################
    # 13. WRN, CIFAR100, 0.7
    ######################################
    MODEL=Hinge_WIDE_RESNET
    TEMPLATE=CIFAR100
    CHECKPOINT=${MODEL}_${TEMPLATE}_0.7
    echo $CHECKPOINT
    CUDA_VISIBLE_DEVICES=0 python ../main_hinge.py --save $CHECKPOINT --template "${TEMPLATE}_Wide_ResNet" --model ${MODEL} --depth 16 --widen_factor 10 --test_only \
    --pretrain ${MODEL_PATH}/wrn_cifar100_7.pt --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}

To test the original uncompressed models, please refer to ./scripts/baseline_test.sh.
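
Before running the test script, it can help to confirm that every compressed checkpoint it references is in place. The following is a minimal sketch: the file names are copied from the `--pretrain` arguments in demo_test.sh above, while the helper name `missing_checkpoints` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the --pretrain arguments in demo_test.sh.
EXPECTED = [
    "vgg_cifar10.pt", "densenet_cifar10.pt",
    "resnet164_cifar10.pt", "resnet164_cifar100.pt",
    "resnet56_cifar10.pt", "resnet20_cifar10.pt", "resnet20_cifar100.pt",
    "resnext164_cifar10.pt", "resnext164_cifar100.pt",
    "resnext20_cifar10.pt", "resnext20_cifar100.pt",
    "wrn_cifar100_5.pt", "wrn_cifar100_7.pt",
]


def missing_checkpoints(model_dir):
    """Return the expected checkpoint names that are not present in model_dir."""
    model_dir = Path(model_dir).expanduser()
    return [name for name in EXPECTED if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Same relative path as MODEL_PATH when running from ./scripts.
    missing = missing_checkpoints("../model_zoo/compressed")
    print("missing checkpoints:", missing if missing else "none")
```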

Train

The scripts for compressing ResNet, DenseNet, VGG, ResNeXt, and WRN are released.

  1. Cd to ./scripts

  2. Make sure that the pretrained original models are already downloaded and placed in ./model_zoo/baseline.

  3. Run the scripts hinge_XXX.sh to reproduce the results in our paper, where XXX may be replaced by vgg, densenet, resnet, resnext, or wide_resnet, depending on which network you want to compress.

  4. Be sure to change the directories SAVE_PATH and DATA_PATH in hinge_XXX.sh.
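
If several scripts need the same paths, the assignments can be rewritten programmatically rather than by hand. The sketch below assumes that the scripts set `SAVE_PATH=...` and `DATA_PATH=...` at the start of a line, as demo_test.sh does. The helper `set_shell_var`, the excerpt, and the `/scratch/...` paths are hypothetical.

```python
import re


def set_shell_var(script_text, name, value):
    """Replace every `name=...` assignment that starts a line, keeping its indentation."""
    pattern = re.compile(r"^([ \t]*)" + re.escape(name) + r"=.*$", flags=re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}{name}={value}", script_text)


# Hypothetical excerpt; the real hinge_XXX.sh may look different.
example = (
    "SAVE_PATH=~/projects/logs/hinge_test/new\n"
    "DATA_PATH=~/projects/data\n"
    "python ../main_hinge.py --dir_save ${SAVE_PATH} --dir_data ${DATA_PATH}\n"
)

updated = set_shell_var(example, "SAVE_PATH", "/scratch/hinge/logs")
updated = set_shell_var(updated, "DATA_PATH", "/scratch/datasets")
print(updated)
```

Only the assignments change; uses such as `${SAVE_PATH}` on the command line are left untouched.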

Results

FLOP and parameter comparison between KSE and Hinge under different compression ratios. ResNet56 is compressed. Top-1 error rate is reported.

Comparison between SSS and the proposed Hinge method on ResNet and ResNeXt. Top-1 error rate is reported for CIFAR100.

Reference

If you find our work useful in your research or publication, please cite our work:

@inproceedings{li2020group,
  title={Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression},
  author={Li, Yawei and Gu, Shuhang and Mayer, Christoph and Van Gool, Luc and Timofte, Radu},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2020}
}

Acknowledgements

This work was partly supported by the ETH Zurich Fund (OK), by VSS ASTRA, SBB and Huawei projects, and by Amazon AWS and Nvidia GPU grants.

This repository is built on EDSR (PyTorch). We thank the authors for making their EDSR code public.

This repository is also based on the implementation of our former paper Learning Filter Basis for Convolutional Neural Network Compression. If you are interested, please refer to:

@inproceedings{li2019learning,
  title = {Learning Filter Basis for Convolutional Neural Network Compression},
  author = {Li, Yawei and Gu, Shuhang and Van Gool, Luc and Timofte, Radu},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
  year = {2019}
}


group_sparsity's Issues

How to train from searching stage?

Hi, how can we resume training in the converging stage?

I know there are 200 epochs in the searching stage and 300 epochs in the converging stage by default.
Suppose we already have model_converging_best.pt under the "model" directory.
I tried to load the pre-trained model using --save, --load, and --pretrain, but none of them worked.
Thanks

Training not converging on DenseNet-bottleneck

Hi, I want to test your method on DenseNet with the bottleneck structure on CIFAR100 (conv1x1 --> conv3x3).
I followed the code of densenet_svd.py and hinge_resnet_bottleneck.py,
mainly changing the following function:

def compress_module_param(module, percentage, threshold):
    # Bias is None, so things become easier.
    # get the body
    '''
    # transition 
    (0): BatchNorm2d(168, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): ReLU()
    (2): Conv2d(168, 84, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (3): Conv2d(84, 84, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (4): AvgPool2d(kernel_size=2, stride=2, padding=0)
    '''
    if isinstance(module, Transition):
        body = module

        conv1 = body._modules['2']
        conv2 = body._modules['3']

        ws1 = conv1.weight.data.shape
        weight1 = conv1.weight.data.squeeze().t()

        ws2 = conv2.weight.data.shape
        weight2 = conv2.weight.data.squeeze().t()

        # calculate pindex
        _, pindex = get_nonzero_index(weight1, dim='output', counter=1, percentage=percentage, threshold=threshold)

        pl = pindex.shape[0]
        weight1 = torch.index_select(weight1, dim=1, index=pindex) 
        conv1.weight = nn.Parameter(weight1.t().view(pl, ws1[1], ws1[2], ws1[3]))
        conv1.out_channels = pl

        # compress conv2
        conv2.weight = nn.Parameter(torch.index_select(weight2, dim=0, index=pindex).t().view(ws2[0], pl, ws2[2], ws2[3]))
        conv2.in_channels = pl

    elif isinstance(module, BottleNeck):
        # with torchsnooper.snoop():
        body = module._modules['body']
        # conv1x1
        conv1 = body._modules['2']
        batchnorm1 = body._modules['3']  # batchnorm applied to the output of conv1
        conv2 = body._modules['5']

        
        # get conv weights
        ws1 = conv1.weight.data.shape
        weight1 = conv1.weight.data.squeeze().t()
        
        bn_weight1 = batchnorm1.weight.data
        bn_bias1 = batchnorm1.bias.data
        bn_mean1 = batchnorm1.running_mean.data
        bn_var1 = batchnorm1.running_var.data
        
        ws2 = conv2.weight.data.shape
        weight2 = conv2.weight.data.view(ws2[0], ws2[1] * ws2[2] * ws2[3]).t()
        
        # selection compressed channels
        _, pindex1 = get_nonzero_index(weight1, dim='output', counter=1, percentage=percentage, threshold=threshold)
        pl1 = len(pindex1)
        conv1.weight = nn.Parameter(torch.index_select(weight1, dim=1, index=pindex1).t().view(pl1, -1, 1, 1))
        conv1.out_channels = pl1

        # batchnorm1
        batchnorm1.weight = nn.Parameter(torch.index_select(bn_weight1, dim=0, index=pindex1)) 
        batchnorm1.bias = nn.Parameter(torch.index_select(bn_bias1, dim=0, index=pindex1))
        batchnorm1.running_mean = torch.index_select(bn_mean1, dim=0, index=pindex1)
        batchnorm1.running_var = torch.index_select(bn_var1, dim=0, index=pindex1)
        batchnorm1.num_features = pl1
        
        # conv2
        index = torch.repeat_interleave(pindex1, ws2[2] * ws2[3]) * ws2[2] * ws2[3] + \
                torch.tensor(range(0, ws2[2] * ws2[3])).repeat(pindex1.shape[0]).cuda()
        weight2 = torch.index_select(weight2, dim=0, index=index)
        # weight2 = torch.index_select(weight2, dim=1, index=pindex3)
        conv2.weight = nn.Parameter(weight2.view(ws2[0], pl1, 3, 3))
        conv2.in_channels = pl1
        # exit(0)
    else:
        raise NotImplementedError('Do not need to compress the layer ' + module.__class__.__name__)

When testing the model with the default parameters, the top-1 test error changes as follows:

test

Did you test your model on DenseNet-bottleneck during your experiments?
I was wondering whether there is something wrong with my code. If not, why does the test loss behave like this?

Thanks for your time and looking forward to your reply.
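
One thing that may be worth checking in the snippet as posted (an observation, not a confirmed diagnosis): in the BottleNeck branch, `weight2` is built as the transpose of the flattened conv2 weight, so after `index_select` it has shape (kept input channels * kernel size, output channels), yet it is passed to `view(ws2[0], pl1, 3, 3)` without being transposed back. The sketch below mimics the row-major semantics of `view` with nested Python lists and toy sizes. It does not call PyTorch, and all names and sizes in it are illustrative.

```python
def kept_rows(pindex, kk):
    """Rows kept in the (in_channels * kk, out_channels) matrix; same values as
    repeat_interleave(pindex, kk) * kk + arange(kk).repeat(len(pindex))."""
    return [c * kk + s for c in pindex for s in range(kk)]


def row_major_view(matrix, rows, cols):
    """Reinterpret a contiguous 2-D list in row-major order, like Tensor.view."""
    flat = [v for row in matrix for v in row]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


out_ch, in_ch, kk = 2, 3, 4     # toy sizes; kk stands for kernel_h * kernel_w
pindex = [0, 2]                 # input channels that survive pruning

# W[o][c][s] encodes its own position, so misplaced weights are easy to spot.
W = [[[100 * o + 10 * c + s for s in range(kk)] for c in range(in_ch)]
     for o in range(out_ch)]

# Mirrors weight2 = conv2.weight.view(out, in * kk).t(): shape (in * kk, out).
flat = [[W[o][c][s] for c in range(in_ch) for s in range(kk)] for o in range(out_ch)]
weight2 = [[flat[o][r] for o in range(out_ch)] for r in range(in_ch * kk)]

selected = [weight2[r] for r in kept_rows(pindex, kk)]   # (len(pindex) * kk, out)

# What the pruned weight should contain: the kept input channels of every filter.
expected = [[W[o][c][s] for c in pindex for s in range(kk)] for o in range(out_ch)]

direct = row_major_view(selected, out_ch, len(pindex) * kk)          # view without .t()
transposed = [[row[o] for row in selected] for o in range(out_ch)]   # .t() before view

print("view without transpose matches:", direct == expected)
print("view after transpose matches:", transposed == expected)
```

With these toy sizes, only the transposed variant reproduces the expected layout. In PyTorch terms this would correspond to something like `weight2.t().reshape(ws2[0], pl1, 3, 3)`, although whether this explains the reported test error has not been verified.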

Download model_zoo.zip problem

Hi, thanks for your work and sharing of the code!
However, I have a problem downloading the pretrained models
(1) I tried to use a VPN to download model_zoo.zip directly, but it failed (although I can download other Google Drive files successfully).
(2) I followed this tutorial and used wget to download the model, but it still did not work.
Can you share the directory of the file? I saw the download link
"...view?usp=sharing" flash for one second, then disappear and redirect to "...view".

Thanks for your time and looking forward to your reply.

Reproducing ImageNet Results

Hi,

@ofsoundof The scripts for CIFAR-10 and CIFAR-100 are here.

For example, what sparsity (compression ratio) and hyperparameters did you use for ResNet-50?
Could you please point me to the scripts for reproducing the ImageNet results?

Thanks in advance!

Bug: remaining filter equal to 1

Hi, I found a bug in hinge_densenet_svd.py in this line:

loss_proj.append(torch.sum(torch.sum(projection.squeeze().t() ** 2, dim=1) ** (q / 2)) ** (1 / q))

When the number of remaining filters equals one, it gives the following error.
The code should be changed to the following, which squeezes only the last two dimensions:

loss_proj.append(torch.sum(torch.sum(projection.squeeze(3).squeeze(2).t() ** 2, dim=1) ** (q / 2)) ** (1 / q))
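
The difference between the two calls comes down to which axes are dropped. The sketch below reproduces only the shape rule of `squeeze` in plain Python and does not call PyTorch. The 4-D shape `(filters, 16, 1, 1)` is an assumed example of a 1-by-1 convolution weight, and `squeeze_shape` is a hypothetical helper.

```python
def squeeze_shape(shape, dim=None):
    """Shape after squeezing: with dim=None every size-1 axis is dropped,
    otherwise only axis `dim` is dropped, and only if its size is 1."""
    if dim is None:
        return tuple(s for s in shape if s != 1)
    return tuple(s for i, s in enumerate(shape) if not (i == dim and s == 1))


for filters in (4, 1):
    shape = (filters, 16, 1, 1)   # assumed shape of a 1x1 convolution weight
    bare = squeeze_shape(shape)                          # projection.squeeze()
    explicit = squeeze_shape(squeeze_shape(shape, 3), 2)  # .squeeze(3).squeeze(2)
    print(filters, bare, explicit)
```

When a single filter remains, the bare `squeeze()` also removes the filter axis and leaves a 1-D tensor, so the subsequent reduction over `dim=1` has no axis to act on. Squeezing axes 3 and 2 explicitly keeps the tensor 2-D.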
