
filter-pruning-geometric-median's Introduction

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration


CVPR 2019 Oral.

Implementation with PyTorch. This implementation is based on soft-filter-pruning.

What's New

FPGM has been re-implemented in PyTorch and NNI.

Usage in PyTorch

from torch.ao.sparsity.pruning._experimental.pruner import FPGMPruner  # experimental; module path may vary across PyTorch versions

# set network-level sparsity: all layers have a sparsity level of 30%
pruner = FPGMPruner(sparsity_level=0.3)

# set layer-level sparsity: sparsity_level of conv2d1 = 30%, sparsity_level of conv2d2 = 50%
config = [
    {"tensor_fqn": "conv2d1.weight"},
    {"tensor_fqn": "conv2d2.weight", "sparsity_level": 0.5}
]

pruner.prepare(model, config)
pruner.enable_mask_update = True
pruner.step()

# Get real pruned models (without zeros)
pruned_model = pruner.prune()

See source code here and official test code here.

Usage in NNI

from nni.algorithms.compression.pytorch.pruning import FPGMPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()

See explanation here.


Requirements

  • Python 3.6
  • PyTorch 0.3.1
  • TorchVision 0.3.0

Models and log files

The trained models with log files can be found in Google Drive. Specifically:

  • models for pruning ResNet on ImageNet
  • models for pruning ResNet on CIFAR-10
  • models for pruning VGGNet on CIFAR-10
  • models for ablation study

For the pruned model without zeros, refer to this issue.

Training ResNet on ImageNet

Usage of Pruning Training

We train each model from scratch by default. If you wish to start from a pre-trained model, please use the options --use_pretrain --lr 0.01.

Run pruning training of ResNet (depth 152, 101, 50, 34, 18) on ImageNet:

python pruning_imagenet.py -a resnet152 --save_path ./snapshots/resnet152-rate-0.7 --rate_norm 1 --rate_dist 0.4 --layer_begin 0 --layer_end 462 --layer_inter 3  /path/to/Imagenet2012

python pruning_imagenet.py -a resnet101 --save_path ./snapshots/resnet101-rate-0.7 --rate_norm 1 --rate_dist 0.4 --layer_begin 0 --layer_end 309 --layer_inter 3  /path/to/Imagenet2012

python pruning_imagenet.py -a resnet50  --save_path ./snapshots/resnet50-rate-0.7 --rate_norm 1 --rate_dist 0.4 --layer_begin 0 --layer_end 156 --layer_inter 3  /path/to/Imagenet2012

python pruning_imagenet.py -a resnet34  --save_path ./snapshots/resnet34-rate-0.7 --rate_norm 1 --rate_dist 0.4 --layer_begin 0 --layer_end 105 --layer_inter 3  /path/to/Imagenet2012

python pruning_imagenet.py -a resnet18  --save_path ./snapshots/resnet18-rate-0.7 --rate_norm 1 --rate_dist 0.4 --layer_begin 0 --layer_end 57 --layer_inter 3  /path/to/Imagenet2012

Explanation:

Note 1: rate_norm = 0.9 means pruning 10% of the filters by the norm-based criterion; rate_dist = 0.2 means pruning 20% of the filters by the distance-based criterion.
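For intuition, here is a minimal sketch of the distance-based (geometric-median) selection, assuming numpy/scipy; the function name fpgm_select and the rounding convention are our own, not the repo's exact code:

import numpy as np
from scipy.spatial import distance

def fpgm_select(weight, rate_dist):
    # weight: (out_channels, in_channels, k, k) conv weight as a numpy array.
    # Filters whose summed distance to all other filters is smallest lie
    # nearest the geometric median and are treated as redundant.
    n = weight.shape[0]
    vec = weight.reshape(n, -1)
    dist_matrix = distance.cdist(vec, vec, 'euclidean')
    dist_sum = dist_matrix.sum(axis=0)
    num_prune = int(n * rate_dist)  # rounding convention is an assumption
    return np.argsort(dist_sum)[:num_prune]  # indices of filters to prune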

Note 2: layer_begin and layer_end are the parameter indices of the first and last conv layers, and layer_inter steps over the parameter list so that conv layers are selected and BN layers are skipped.
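A quick way to see what these indices select (a sketch assuming torchvision's ResNet parameter ordering, where each conv weight is followed by its BN weight and bias):

import torchvision.models as models

model = models.resnet18()
params = list(model.named_parameters())

layer_begin, layer_end, layer_inter = 0, 57, 3  # the resnet18 values used above
for idx in range(layer_begin, layer_end + 1, layer_inter):
    name, p = params[idx]
    print(idx, name, tuple(p.shape))  # each selected entry is a conv weight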

Usage of Normal Training

Run ResNet (100 epochs):

python original_train.py -a resnet50 --save_dir ./snapshots/resnet50-baseline  /path/to/Imagenet2012 --workers 36

Inference the pruned model with zeros

sh function/inference_pruned.sh

Inference the pruned model without zeros

For the pruned model without zeros, refer to this issue.
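As a starting point, a hedged sketch for locating the surviving filters of a single conv layer (the helper name is ours; residual connections need matching index sets, see the issues below):

import torch

def nonzero_filter_idx(conv: torch.nn.Conv2d) -> torch.Tensor:
    # Indices of output filters whose weights are not entirely zero.
    w = conv.weight.detach()
    return (w.abs().sum(dim=(1, 2, 3)) != 0).nonzero(as_tuple=True)[0]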

Scripts to reproduce the results in our paper

To train the ImageNet model with or without pruning, see the scripts directory. The full script is here.

Training ResNet on Cifar-10

sh scripts/pruning_cifar10.sh

Please be careful with the hyper-parameter layer_end, which differs for ResNets of different depths.

Reproduce the ablation study on CIFAR-10:

sh scripts/ablation_pruning_cifar10.sh

Training VGGNet on Cifar-10

Refer to the directory VGG_cifar.

sh VGG_cifar/scripts/PFEC_train_prune.sh

The script includes four functions: training the baseline, pruning from a pre-trained model, pruning from scratch, and fine-tuning the pruned model.

Our method

sh VGG_cifar/scripts/pruning_vgg_my_method.sh

This includes pruning a pre-trained model and pruning from scratch.

Notes

Torchvision Version

We use torchvision 0.3.0. If your torchvision version is 0.2.0, then transforms.RandomResizedCrop should be transforms.RandomSizedCrop and transforms.Resize should be transforms.Scale.
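A small version guard along these lines (a sketch handling only the two renames mentioned above):

import torchvision
import torchvision.transforms as transforms

if torchvision.__version__.startswith('0.2'):
    Resize, RandomResizedCrop = transforms.Scale, transforms.RandomSizedCrop
else:
    Resize, RandomResizedCrop = transforms.Resize, transforms.RandomResizedCrop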

Why use 100 epochs for training

Training for 100 epochs can improve the accuracy slightly.

Process of ImageNet dataset

We follow the Facebook process of ImageNet. Two subfolders ("train" and "val") are included in "/path/to/ImageNet2012". The corresponding code is here.
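For reference, the standard loading code for this layout looks roughly as follows (a sketch with the usual ImageNet statistics; the path is a placeholder):

import os
import torchvision.datasets as datasets
import torchvision.transforms as transforms

data_path = "/path/to/ImageNet2012"  # contains the "train" and "val" subfolders
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_set = datasets.ImageFolder(
    os.path.join(data_path, "train"),
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ]))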

FLOPs Calculation

Refer to the file.
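For reference, a common per-layer estimate used by such calculators (a sketch; whether multiplies and adds are counted separately, i.e. an extra factor of 2, varies by convention and may differ from the repo's file):

def conv_flops(c_in, c_out, k, h_out, w_out):
    # Multiply-accumulate count of one k x k convolution layer.
    return c_in * k * k * c_out * h_out * w_out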

Citation

@inproceedings{he2019filter,
  title     = {Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration},
  author    = {He, Yang and Liu, Ping and Wang, Ziwei and Hu, Zhilan and Yang, Yi},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2019}
}

filter-pruning-geometric-median's People

Contributors

he-y, onetaken


filter-pruning-geometric-median's Issues

VGGNet on CIFAR-10

I'm trying to reproduce these results, but I got worse ones: the accuracy of "Pruned without FT" that I get is much lower than in this table, about 25% (PFEC). Could you upload the pre-trained checkpoint that is used for pruning?

Scripts' and logs' hyper-parameters are different.

In the case of ResNet20 on CIFAR-10, the scripts say the learning rate is 0.01, but the logs say 0.1.
Also, in the paper you said that 40% means "30% pruned by distance and 10% pruned by norm".
However, the log is named "cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2",
and it says "'rate_dist': 0.1, 'rate_norm': 1.0".
Which one is correct?

np.abs over cosine distance

Hi, thanks for your great work. I have one question on distance calculation:
https://github.com/he-y/filter-pruning-geometric-median/blob/master/pruning_cifar10.py#L528

similar_matrix = 1 - distance.cdist(weight_vec_after_norm, weight_vec_after_norm, 'cosine')
similar_sum = np.sum(np.abs(similar_matrix), axis=0)

while the cosine distance in scipy is defined as
$d(u, v) = 1 - \frac{u \cdot v}{\|u\|_2 \|v\|_2}$
Therefore, the entries of similar_matrix might be negative, right?
Why apply np.abs to similar_matrix (the FPGM distance matrix) computed from the cosine distance?
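For reference, a toy example (illustration only) showing that 1 - cdist(..., 'cosine') is the cosine similarity, which can indeed be negative:

import numpy as np
from scipy.spatial import distance

w = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])  # three toy "filters"
similar_matrix = 1 - distance.cdist(w, w, 'cosine')
print(similar_matrix)                          # contains -1 for opposite vectors
print(np.sum(np.abs(similar_matrix), axis=0))  # the summed score in question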

Thanks

filters after training

m.do_similar_mask()
net = m.model
After these operations (do_similar_mask), several of the model's filters have been set to zeros.
This paper is based on soft filter pruning, but after training I found that model.parameters() didn't update. What is the reason? Or where did I get it wrong? Thanks~

Can't find the images in IMAGENET/TRAIN/ folder

I have set the data arg to /data/groceries/imagenet1,
and it is supposed to get the training data from the arg above + /train.

But this error occurred:

RuntimeError: Found 0 files in subfolders of: /data/groceries/imagenet1/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif

Set the gradients of pruned filters zero

@he-y thanks for your work and sharing!
I have a question about the train() function:

    # Mask grad for iteration
    m.do_grad_mask()
    optimizer.step()

How do you change the model's gradients through m.do_grad_mask()?

Question about the complexity

Thanks for the nice work! I have a question about complexity.

As stated in Section 3.4 and in the codebase, you calculate the pairwise distances in Equation (6) with scipy.spatial.distance.cdist. Suppose we have $n$ filters of dimension $d$; the complexity here is $O(n^2 d)$. If we explicitly computed the geometric median instead, the cost would be $O(nd)$ per iteration of the median computation.
The latter seems more efficient, so why do you choose Equation (6) instead of Equation (4)?
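For reference, the explicit computation mentioned above could use Weiszfeld's iteration, costing $O(nd)$ per step (a sketch of the standard algorithm, not the paper's code):

import numpy as np

def geometric_median(X, iters=100, eps=1e-8):
    # Approximate the geometric median of the rows of X via Weiszfeld's algorithm.
    y = X.mean(axis=0)  # initialize at the ordinary mean
    for _ in range(iters):
        d = np.linalg.norm(X - y, axis=1)
        w = 1.0 / np.maximum(d, eps)  # guard against division by zero
        y = (w[:, None] * X).sum(axis=0) / w.sum()
    return y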

Thanks

About similar_small_index

When running the code, I found that similar_small_index always includes indices from 0 to n, such as [0,1,2,3,4,5,6,7,8]. And the similar_sum matrix always has some equal numbers in it. Is this because there are some settings that constrain the matrix?

Tried to allocate one GPU but the command does not work

No matter which one I used:
1. CUDA_VISIBLE_DEVICES=# on the shell command line, or
2. os.environ["CUDA_VISIBLE_DEVICES"] = "#" in the py file,
it returns the error below:
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 11.90 GiB total capacity; 11.34 GiB already allocated; 18.94 MiB free; 579.50 KiB cached)

(GPU 0 is in use)

About the mathematical notations.

Hi, thanks for open-sourcing the code.

I am confused by some of the mathematical notations.

  1. Why did you choose the geometric median? Is there theoretical evidence that "points near the GM can be represented by the others"? Why not just use the mean of all points, since the mean is exactly a linear combination of all the points?

  2. Eq. 10 of the paper (screenshot omitted): the step from step-1 to step-2 is valid ONLY IF $g(x)$ and $\|x - F_{i,j^*}\|$ share the same minimizer. Is there an assumption or proof to guarantee this?

FLOPs calculator has an error.

Your FLOPs calculator has an error.
Surely the number of filters is an integer, but in your code it is a float.
In your pruning algorithm, applying ceil to the number of filters makes it right.
After correcting this, in the case of ResNet110 on CIFAR-10,
40% pruning only reduces FLOPs by 40.26% and
50% pruning only reduces FLOPs by 50.47%.
I have to check it again, but I'm sure yours is wrong.

python: can't open file 'main_cifar_vgg_log.py': [Errno 2] No such file or directory

I'm trying to run the "Training VGGNet on CIFAR-10" example, but I get the following (each message repeated once per invocation in the script):

python: can't open file 'main_cifar_vgg_log.py': [Errno 2] No such file or directory
python: can't open file 'pruning_cifar_vgg.py': [Errno 2] No such file or directory
python: can't open file 'PFEC_vggprune.py': [Errno 2] No such file or directory
python: can't open file 'PFEC_finetune.py': [Errno 2] No such file or directory

When I run this:

~/filter-pruning-geometric-median$ sh VGG_cifar/scripts/PFEC_train_prune.sh

Could you please help me on this? Am I doing something wrong?

python: can't open file 'pruning_train.py': [Errno 2] No such file or directory

How can this problem be solved? I want to use your method to prune a network in a TensorFlow model. Where is the core part of the code? I look forward to your reply!

How to get_small_model

I downloaded resnet18-5c106cde.pth and ran "sh pruning_cifar10.sh":

pruning_scratch_resnet18(){
...
--arch resnet18
--use_state_dict
--layer_begin 0 --layer_end 48 --layer_inter 3 --epoch_prune 1}

This produced 'checkpoint.pth.tar'.
When I run get_small_model.py from SFP with ('--resume', default='checkpoint.pth.tar'), there is an error:

/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py:454: SourceChangeWarning: source code of class 'models.res_utils.DownsampleA' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
File "utils/get_small_model.py", line 346, in
main()
File "utils/get_small_model.py", line 83, in main
state_dict = remove_module_dict(state_dict)
File "utils/get_small_model.py", line 161, in remove_module_dict
for k, v in state_dict.items():
File "/home/vinsen/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in getattr
type(self).name, name))
AttributeError: 'DataParallel' object has no attribute 'items'

distance.cdist

similar_matrix = distance.cdist(weight_vec_after_norm, weight_vec_after_norm, 'euclidean')
Which step does this statement implement, and why is the Euclidean distance of the weight matrix with itself computed?

pruning vgg issue

Hi! I want to prune the VGGNet model, but I found that pruning_cifar_vgg.py doesn't use the do_grad_mask function.
Is there any difference between ResNet and VGG here?
Hoping for your reply~

File not available

Python files "training baseline, pruning from pretrain, pruning from scratch, finetune the pruend" in reproducing paper are not available.

Maybe a Typo in pruning_cifar10.sh

Thanks for sharing the code.
The file pruning_cifar10.sh in the scripts may have a typo:
the save name is ratenorm0.7_ratedist0.1, but the input for the argument --rate_norm is 1.
Is it a mistake?
If not, I would need help understanding it.

The pruning interval looks useless.

Your algorithm appears to prune the network at the end of every epoch. However, it does not.

Once you prune P% of the weights, the norm criterion is disabled, because zero is the lowest norm.

Also, the distance criterion is disabled, because all of the pruned weights' distances are zero.

I checked the pruned indices at the end of every epoch, and they do not change.

Am I wrong?

Reproduction of ResNet-18 from scratch

Hi! I ran 'python pruning_imagenet.py -a resnet18 --save_dir ./snapshots/resnet18-rate-0.7 --rate_norm 1 --rate_dist 0.3 --layer_begin 0 --layer_end 57 --layer_inter 3 /home/share/data/ilsvrc12_shrt_256_torch/' with 4 GPUs and batch size 256.

The result is Prec@1 66.662 Prec@5 87.440 Error@1 33.338, which is slightly below yours (Top-1 67.78, Top-5 88.01). Why might that be?

similar_index_for_filter issue

Thanks for the excellent work, but I ran into a problem here.

In the get_filter_similar method, the mask is generated by: similar_index_for_filter = [filter_large_index[i] for i in similar_small_index].

The elements of the similar_small_index and filter_large_index vectors are indices of filters. Why use the index of a filter to index into another list (filter_large_index[i])? Errors may occur this way.

Think of it this way: there are 16 filters, and similar_small_index contains 15, one of the filters with the smallest similar_sum, which needs to be pruned. But filter_large_index contains only 10 elements, so index 15 is out of range.

Why do you do_mask before the training process? (pruning_imagenet.py)

Hello:
Thank you for your code! I've read it carefully and found that you call do_mask() and do_similar_mask() before the training process; the simplified code is listed below:

m.init_mask(args.rate_norm, args.rate_dist)
# m.if_zero()
m.do_mask()
m.do_similar_mask()
model = m.model
...
for epoch in range(1, max_epoches):
    ...
    train()
    m.init_mask(args.rate_norm, args.rate_dist)
    # m.if_zero()
    m.do_mask()
    m.do_similar_mask()
    model = m.model
I really cannot understand this. You initialize the model randomly and call do_mask() before the first epoch, which is just like generating the mask and selecting the filters randomly. As I understand it, you zero out the gradients of the selected filters during training and also set the values of the selected filters to zero. Will the values of these filters be zero all the time, since they are set to zero before training and are not updated?

Then, if the filters selected before training are always zero, there is no point in calling do_mask after the training process.

Besides, could you please tell me whether the selected filters change after each epoch, since you call init_mask after every training epoch?

geometric median issue

As in issue #8:

The zero filters which are set in the mask operation will always be at the first several indices in weight_vec_after_norm.
So if we use distance.cdist to calculate similar_matrix, it looks like this:

0   0   0   0   0.x 0.x 0.x
0   0   0   0   0.x 0.x 0.x
0   0   0   0   0.x 0.x 0.x
0   0   0   0   0.x 0.x 0.x
0.x 0.x 0.x 0.x 0   0.x 0.x

The top-left block, between zeroed filters, is all zeros.
So these zero filters will always be picked as the geometric median, because they are nearest to the geometric-median point in parameter space.

And if we train the net from scratch, the masked-out zero filters depend on the random weight initialization. Is that correct?

Is my understanding right?

Is it reasonable to get more zero filters than the theoretical pruned number?

Thanks a lot for your great work!
When I run the code, I met some problems in getting the small model, and I have tried to find the reasons.
As the title says: is it reasonable to get more zero filters than the theoretical pruned number? E.g. the filter num of the first conv is 64 and the pruned rate is 0.3, so the theoretical pruned num is 45, but I got 50 zero filters. Is that reasonable?
This problem causes the failure in getting the small model, because the number of kept filters may not equal the theoretical value. To get the small model successfully, I have to pick some zero filters as kept filters, which leads to a precision decrease in the small model.
Reading your code, it seems that the number of zero filters should equal the theoretical number; I am not sure whether that is right.

indices = torch.LongTensor(filter_large_index).cuda()

pruning_scratch_resnet20 0 ./data/yahe/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1

When I want to run on CPU, there are errors:
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:74

File "pruning_cifar10.py", line 524, in get_filter_similar
indices = torch.LongTensor(filter_large_index).cuda()

Question about Figure 2 (norm-based criterion)

Hello! I have a question about Figure 2. In Figure 2(a), Small Norm Deviation, green is not good because the variance is small and the search space is small. In Figure 2(b), Large Minimum Norm, green is not good because its minimum norm v1'' is not close to 0. I don't understand why the minimum value of the norm needs to be close to 0, and why the blue one is good.

Question on equivalence of g(x) and g'(x) in FPGM_updated

Hi, thanks for your work. I have one question on how you prove the equivalence of g(x) and g'(x), as in Equations (5)-(9):
"Note that even if F_{i,j*} is not included in the calculation of the geometric median in Equation (4),
we could also achieve the same result."

Thanks in advance

small_model.pt accuracy is not equal to big_model.pt

Hi, when I train a ResNet50 on ImageNet and use get_small.sh to obtain big_model.pt and small_model.pt, their accuracy is equal. But when I train a ResNet50 on a private dataset, the obtained small_model.pt accuracy is not equal to big_model.pt. What is the reason for this? The pruning rate is 0.3.

Darknet model for YOLOv3

Thank you very much for answering my previous question. I have one more question: I am using the YOLOv3 algorithm with a Darknet53 model for object detection, and I would like to use your compression algorithm to compress the resulting model (the dataset is VOC). What should I modify, for example the training function train() or the accuracy evaluation? I have just started working in this area and hope you can give some advice, thanks~

Question about the deployment of the pruned model

Thanks for your great work. I wonder whether I understand your algorithm correctly; my question is as follows:

As we all know, the numbers of output channels of some layers in ResNet must be the same because of the residual blocks.

Take ResNet for CIFAR-10 as an example:

  1. The numbers of output channels of block1 and conv0 are both 16.
  2. Assume conv0 prunes the 1st, 2nd, and 3rd channels (namely, the remaining 13 channels are kept).
  3. Assume block1/layer0 prunes the 4th and 5th channels (namely, the remaining 14 channels are kept).
  4. Then, when forwarding the model, we need to add the feature maps from conv0 and block1/layer0. The resulting feature maps still have 16 channels because conv0 and block1/layer0 prune different channels.

Is what I described above right?

If so, the deployed model still has to contain all the zero weights, or I will not know which filters are pruned and how to match the feature maps from conv0 and block1/layer0 when executing the element-wise add operation.

Hoping for your reply!

Channel Pruning should be done in the BN layer after the convolution layer.

The formula of the BN layer is: (x - bn.mean) * bn.weight / sqrt(bn.variance + 1e-5) + bn.bias. After channel pruning in a convolution layer, some channels of x are zero, but according to the formula, the outputs of these pruned channels in the following BN layer are not zero. If you extract the pruned model without handling each BN layer, the performance drops significantly.
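A minimal sketch of the fix this issue suggests (the helper name is ours): zeroing the BN affine parameters of the pruned channels makes their BN outputs zero as well, since the output is gamma * x_hat + beta.

import torch

def mask_bn(bn: torch.nn.BatchNorm2d, pruned_idx):
    # Zero gamma and beta of pruned channels; their BN outputs become zero.
    with torch.no_grad():
        bn.weight[pruned_idx] = 0.0
        bn.bias[pruned_idx] = 0.0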

About geometric median pruning (get_filter_similar function)

As you said here, even when you define a compress rate (e.g. a pruning rate), the total number of remaining filters is smaller than the compress rate implies.

That said:

  • Why prune by geometric median after applying the l2-norm criterion?
  • Isn't similarity a good proxy for the importance of a filter by itself?

One question about calculating layer_end and last_index

Thank you for your great work. I have one question about layer_end.

As the title says: what does layer_end mean, and how is it calculated? And why is last_index layer_end - 3 when running init_rate in main()?

self.mask_index = [x for x in range(0, last_index, 3)]: as I understand it, the "3" is here because a conv2d layer contributes one parameter tensor and a BN layer contributes two, but you only focus on conv2d layers, so you skip the BN parameters, right? But what about the parameters in the projection (downsample) layers? As I understand it, you do not consider the projection layers in ResNet, but it seems you do not skip them here.

Hoping for your reply

Question about pruning_cifar10.sh.

When I read the pruning_cifar10.sh file, I found the following shell function:

run20_2(){
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1)&
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2 1 0.1)&
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience3 1 0.1)&

(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience3 1 0.1)&
}

I have some questions about this function. Why do you run three subshells at the same time? The parameters of pruning_scratch_resnet20 in these three subshells are the same except for the save path. Can I just run only one subshell, such as the first one?
I would appreciate receiving your reply! Thank you!

Pruning filters and channels

Hi! I have a puzzle: if you prune some filters in a specific layer, you consequently also need to prune the corresponding channels in the next layer. But your code just zeroes filters in each layer instead of considering the ordered relationship between two layers, i.e. removing both the filters in the i-th layer and the channels in the (i+1)-th layer. In particular, when the pruning process ends, if all targeted filters are removed from the whole network, an error will occur in the inference process.

Prune weights and bias together

Hi,

Thanks for your work. I have a question: for a convolution layer which has both weights and a bias, should I prune them together? I noticed that in your implementation you always set bias=False in conv2d. For example, if a convolution layer has a weights tensor [3, 3, 64, 256] and a bias tensor of size 256, after pruning should I get something like a weights tensor [3, 3, 64, 175] and a bias tensor of size 175?

The small model of ResNet56 for CIFAR-10 performs worse than the big model?

Hi, I have trained a ResNet56 with pruning for CIFAR-10. The big model has an accuracy of 89%, while the small model has only 10%. I looked into the details and found what seems to be an error in the pruning training.
When training with pruning, only the first dim of a conv weight is masked to zero. The second dim of the conv weight, which matches the input channel number, is not masked even if some channels of the previous conv are masked to zero.
However, when transferring to the small model, the second dim of the conv weight also has to be filtered to keep the shapes of the input and the conv weight matched. As a result, some non-zero weights are pruned, and the small model's accuracy drops a lot (see the sketch after this issue).
Have you noticed this issue, and how did you solve it?
Thank you for your patience.
Looking forward to your reply!
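A hedged sketch of the point raised in this issue (names are hypothetical; conv_i feeds conv_j directly, and residual connections would need matching index sets):

import torch

def slice_pair(conv_i: torch.nn.Conv2d, conv_j: torch.nn.Conv2d):
    # Keep conv_i's non-zero filters and slice conv_j's input channels to match.
    w_i = conv_i.weight.detach()
    kept = (w_i.abs().sum(dim=(1, 2, 3)) != 0).nonzero(as_tuple=True)[0]
    small_w_i = w_i[kept]                        # prune output channels of layer i
    small_w_j = conv_j.weight.detach()[:, kept]  # prune matching input channels of layer i+1
    return small_w_i, small_w_j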
