
slimming's People

Contributors

liuzhuang13, szq0214


slimming's Issues

Channel number to be pruned must be a power of 2?

Hi @liuzhuang13,
There is another issue I met in my experiments: inference time decreased when the number of channels to be pruned was a power of 2; otherwise it increased and became longer than the baseline, which is not mentioned in the paper. Have you ever met the same issue in your experiments? Any suggestions?

My hardware and system:
GPU: GeForce RTX 2080, 12 GB
CPU: 12 cores, 58 GB RAM
System: Ubuntu 16.04

regards,
summer, Gao
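
For anyone who wants to reproduce this, a minimal timing sketch (my own, not from the issue; it assumes PyTorch with a CUDA device, and the channel counts are arbitrary examples) that compares a conv layer whose output-channel count is a power of 2 against nearby counts that are not:

    import time
    import torch
    import torch.nn as nn

    def time_conv(out_channels, in_channels=256, iters=100):
        """Average forward time (ms) of a single 3x3 conv on random input."""
        conv = nn.Conv2d(in_channels, out_channels, 3, padding=1).cuda().eval()
        x = torch.randn(1, in_channels, 56, 56, device="cuda")
        with torch.no_grad():
            for _ in range(10):               # warm-up
                conv(x)
            torch.cuda.synchronize()
            start = time.time()
            for _ in range(iters):
                conv(x)
            torch.cuda.synchronize()
        return (time.time() - start) / iters * 1000

    # compare a power-of-2 channel count with nearby non-power-of-2 counts
    for c in (128, 120, 100):
        print(c, "channels:", round(time_conv(c), 3), "ms")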

CIFAR-10 FLOPs higher than CIFAR-100 on DenseNet (40% pruned)

Thanks for your great work. I have a small question about calculating FLOPs.
In Table 1 of the paper:
CIFAR-10, DenseNet-40 (40% pruned): 3.81×10^8 FLOPs
CIFAR-100, DenseNet-40 (40% pruned): 3.71×10^8 FLOPs
Since CIFAR-100 has 100 classes while CIFAR-10 has only 10, why are the FLOPs on CIFAR-10 higher than on CIFAR-100 for the same model?
Thanks in advance.
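
A back-of-the-envelope check (my own, not from the issue or the paper; the feature dimension is a hypothetical unpruned value) suggests the classifier size cannot account for the difference, so the gap presumably comes from which channels end up pruned in each run:

    # The classifier contributes only feature_dim x num_classes multiply-adds,
    # which is negligible next to ~3.8e8 total, so the class count alone cannot
    # explain the gap; the pruned conv architectures themselves differ.
    feature_dim = 448            # hypothetical final feature dimension of DenseNet-40
    for num_classes in (10, 100):
        fc_madds = feature_dim * num_classes
        print(num_classes, "classes -> fc multiply-adds:", fc_madds,
              "(%.5f%% of 3.8e8)" % (100.0 * fc_madds / 3.8e8))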

Bias masking in BN layers

Hi @liuzhuang13

I'm not sure whether the bias in BN layers should be masked out too (v.bias:cmul(mask)), since what you minimize and prune is actually the weight, not the bias.
For BN layers, y = γx + β.
You prune the channels with small γ, but what about β? It may be large or important.
In my case, after I masked out β I got an enormous accuracy drop.

If I have misunderstood the work, please tell me.
Thank you.
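
For comparison, here is a minimal PyTorch equivalent of the Lua snippet above (my own sketch, not the repository's code; the threshold value is hypothetical), zeroing both γ and β of the channels selected for pruning:

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm2d(64)

    threshold = 0.01                                    # hypothetical pruning threshold
    mask = (bn.weight.data.abs() > threshold).float()   # 1 = keep channel, 0 = prune

    with torch.no_grad():
        bn.weight.mul_(mask)   # gamma -- the factor the L1 penalty acts on
        bn.bias.mul_(mask)     # beta  -- the shift the issue asks about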

Connection between last conv layer and fc layer

Hi,
Thank you very much for the nice work.
I am just wondering: in your VGG example, the last conv layer connected to the fc layer outputs 1×1 features. However, my network does not output 1×1 features at the last conv layer, so how should I connect the conv layer to the fc layer? The weight and output sizes don't match.
Thank you
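
Two common ways to bridge that mismatch (my own suggestion, not from the repository; the shapes are hypothetical) are global average pooling down to 1×1, or sizing the linear layer to the flattened feature map:

    import torch
    import torch.nn as nn

    features = torch.randn(8, 512, 7, 7)   # last conv output is 7x7, not 1x1

    # Option 1: global average pooling down to 1x1, then a 512 -> num_classes linear
    pool = nn.AdaptiveAvgPool2d(1)
    fc1 = nn.Linear(512, 10)
    out1 = fc1(pool(features).flatten(1))

    # Option 2: flatten the whole map and size the linear layer to match
    fc2 = nn.Linear(512 * 7 * 7, 10)
    out2 = fc2(features.flatten(1))

    print(out1.shape, out2.shape)   # both torch.Size([8, 10])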

Slimming DenseNet

Hi @liuzhuang13,

Thank you for the great work. I saw that you leverage the scaling factors of batch normalization to prune the incoming and outgoing weights of conv layers. However, in DenseNet, after a basic block (1×1 + 3×3) the previous features are concatenated with the current ones, and the dimension of the scaling factors does not match that of the previous convolutional layer. How do you prune weights in this case?

By the way, when sparsity training of DenseNet finishes with lambda = 1e-5, I notice that many scaling factors are not small enough for pruning. Does this affect the performance of the compressed network?

Thanks,
Hai
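
One way to picture a possible workaround (a sketch of a channel-selection step, assuming PyTorch; the indices are hypothetical and this is not the authors' released code) is to keep an index of surviving channels over the concatenated features and select them before the next conv:

    import torch
    import torch.nn as nn

    # DenseNet-style concatenated input: e.g. 160 channels, only some survive pruning
    concat_features = torch.randn(4, 160, 32, 32)
    keep_idx = torch.tensor([0, 3, 7, 12, 20, 45, 90, 150])   # hypothetical surviving channels

    # select the surviving channels, then apply a conv built for that reduced width
    selected = concat_features.index_select(1, keep_idx)
    conv = nn.Conv2d(len(keep_idx), 12, kernel_size=3, padding=1)
    out = conv(selected)
    print(out.shape)   # torch.Size([4, 12, 32, 32])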

Is it reasonable to use a single threshold for all BN layers?

When calculating the threshold, the weights of all BN layers are sorted together. Is this reasonable?

Could the following be happening:
① The values at the front of the network are closer to the image pixel values, while the last layer is closer to the class probabilities, so the BN weights are not necessarily distributed the same way across layers.
② There are shortcuts in the middle of the network; after the outputs of the two convolution branches are added, the values become larger, which may affect the BN weights.

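For concreteness, this is roughly how a single global threshold is obtained from all BN scaling factors (my own PyTorch sketch of the idea being questioned; the 0.4 ratio is just an example):

    import torch
    import torch.nn as nn

    def global_bn_threshold(model, prune_ratio=0.4):
        """Sort |gamma| from every BN layer together and take the prune_ratio quantile."""
        gammas = torch.cat([m.weight.data.abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        sorted_gammas, _ = torch.sort(gammas)
        return sorted_gammas[int(len(sorted_gammas) * prune_ratio)].item()

    # a per-layer variant would instead compute one threshold inside each BN layer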

Sparsity training gives the opposite BN distribution

Hi @liuzhuang13,

After sparsity training, I visualized the histograms of all BN scaling factors in our YOLOv3 model, which is used to detect footballs. I found a weird phenomenon: the BN scaling factors did not become sparser (more factors equal to or near 0) but instead moved farther away from 0 compared with the base model. Here are the visualization figures:

[figure: histograms of BN scaling factors, base vs. sparsity-trained]

My implementation of the sparsity regularization is consistent with your code. I am confused about this and sincerely look forward to your suggestions. Please tell me if there is any misunderstanding. Thanks.

regards.
summer,Gao
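
For anyone checking the same thing, a sketch of the usual sparsity update and the histogram check (my own approximation of common PyTorch re-implementations, not the code from this issue; `lam` is a hypothetical value):

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    def add_l1_subgradient(model, lam=1e-4):
        """Add the L1 subgradient on BN scaling factors; call after loss.backward()."""
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.grad.data.add_(lam * torch.sign(m.weight.data))

    def plot_gamma_histogram(model):
        """Histogram of |gamma| over all BN layers, to check whether sparsity emerges."""
        gammas = torch.cat([m.weight.data.abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        plt.hist(gammas.cpu().numpy(), bins=100)
        plt.xlabel("|gamma|")
        plt.ylabel("count")
        plt.show()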

Slimming ResNet

Dear @liuzhuang13,
I guess we should prune some channels of the subsequent conv layer's kernels after pruning the current layer. Am I right?
So I cannot figure out how to slim a residual block using your method.
[figure: residual block with two branches]
The two branches may have different channels pruned, so can we only prune the intersection of both?

[figure: residual block with shortcut connection]
It is almost the same situation in the shortcut version. How do you handle this?

Thanks
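
To make the question concrete (my own sketch with hypothetical indices, not the authors' answer): the residual addition only lines up on channels kept by both branches, which is why the intersection, or leaving the addition-facing layers unpruned, are the usual options:

    # hypothetical channel indices kept on each branch after thresholding their BN gammas
    kept_main     = [0, 1, 4, 5, 9, 12, 15]
    kept_shortcut = [0, 2, 4, 5, 9, 13, 15]

    # the element-wise addition only lines up for channels kept on BOTH branches
    intersection = sorted(set(kept_main) & set(kept_shortcut))
    print(intersection)   # [0, 4, 5, 9, 15]

    # an alternative is to prune only the block-internal layers and leave the
    # channels feeding the addition untouched, so the two branches always match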

What's the formula for calculating FLOPs?

Dear author,
I am confused about the FLOPs calculation, especially in Table 2 of your network slimming paper. How do you derive that the FLOPs of VGG-A are 4.57×10^10?

Best
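
A commonly used convention (not necessarily the exact one in the paper, which may also count bias, BN, and linear-layer terms) treats one conv layer as H_out × W_out × C_in × C_out × k² multiply-adds:

    def conv_flops(h_out, w_out, c_in, c_out, k, multiply_adds=True):
        """FLOPs of one conv layer; set multiply_adds=False to count mul and add as 2 ops."""
        ops = h_out * w_out * c_in * c_out * k * k
        return ops if multiply_adds else 2 * ops

    # e.g. the first 3x3 conv of a VGG-style network on a 224x224 input
    print("%.3e" % conv_flops(224, 224, 3, 64, 3))   # ~8.67e+07 multiply-adds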
