
slimming's People

Contributors

liuzhuang13, szq0214


slimming's Issues

Channel number to be pruned must be a power of 2?

Hi @liuzhuang13,
There is another issue I met in my experiments: inference time decreased when the number of channels to be pruned was a power of 2; otherwise it increased and became longer than the baseline, which is not mentioned in the paper. Have you ever met the same issue in your experiments? Any suggestions?

My hardware and system:
GPU: GeForce RTX 2080, 12 GB
CPU: 12 cores, 58 GB RAM
System: Ubuntu 16.04

regards,
summer, Gao
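
For anyone who wants to reproduce this, a minimal timing sketch (my own, not from the issue; it assumes PyTorch with a CUDA device, and the channel counts are arbitrary examples) that compares a conv layer whose output-channel count is a power of 2 against nearby counts that are not:

    import time
    import torch
    import torch.nn as nn

    def time_conv(out_channels, in_channels=256, iters=100):
        """Average forward time (ms) of a single 3x3 conv on random input."""
        conv = nn.Conv2d(in_channels, out_channels, 3, padding=1).cuda().eval()
        x = torch.randn(1, in_channels, 56, 56, device="cuda")
        with torch.no_grad():
            for _ in range(10):               # warm-up
                conv(x)
            torch.cuda.synchronize()
            start = time.time()
            for _ in range(iters):
                conv(x)
            torch.cuda.synchronize()
        return (time.time() - start) / iters * 1000

    # compare a power-of-2 channel count with nearby non-power-of-2 counts
    for c in (128, 120, 100):
        print(c, "channels:", round(time_conv(c), 3), "ms")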

CIFAR-10 FLOPs higher than CIFAR-100 on DenseNet (40% pruned)

Thanks for your great work. I have a small question about calculating FLOPs.
In Table 1 of the paper:
CIFAR-10, DenseNet-40 (40% pruned): 3.81×10^8 FLOPs
CIFAR-100, DenseNet-40 (40% pruned): 3.71×10^8 FLOPs
Since CIFAR-100 has 100 classes while CIFAR-10 has only 10, why are the FLOPs on CIFAR-10 higher than on CIFAR-100 for the same model?
Thanks in advance.
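
A back-of-the-envelope check (my own, not from the issue or the paper; the feature dimension is a hypothetical unpruned value) suggests the classifier size cannot account for the difference, so the gap presumably comes from which channels end up pruned in each run:

    # The classifier contributes only feature_dim x num_classes multiply-adds,
    # which is negligible next to ~3.8e8 total, so the class count alone cannot
    # explain the gap; the pruned conv architectures themselves differ.
    feature_dim = 448            # hypothetical final feature dimension of DenseNet-40
    for num_classes in (10, 100):
        fc_madds = feature_dim * num_classes
        print(num_classes, "classes -> fc multiply-adds:", fc_madds,
              "(%.5f%% of 3.8e8)" % (100.0 * fc_madds / 3.8e8))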

Bias masking in BN layers

Hi @liuzhuang13

I'm not sure whether the bias in BN layers should be masked out too (v.bias:cmul(mask)), since what you minimize and prune is actually the weight, not the bias.
For BN layers, y = γx + β.
You prune the channels with small γ, but what about β? It may be large or important.
In my case, after I masked out β I got an enormous accuracy drop.

If I have misunderstood the work, please tell me.
Thank you.
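
For comparison, here is a minimal PyTorch equivalent of the Lua snippet above (my own sketch, not the repository's code; the threshold value is hypothetical), zeroing both γ and β of the channels selected for pruning:

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm2d(64)

    threshold = 0.01                                    # hypothetical pruning threshold
    mask = (bn.weight.data.abs() > threshold).float()   # 1 = keep channel, 0 = prune

    with torch.no_grad():
        bn.weight.mul_(mask)   # gamma -- the factor the L1 penalty acts on
        bn.bias.mul_(mask)     # beta  -- the shift the issue asks about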

Connection between last conv layer and fc layer

Hi,
Thank you very much for the nice work.
I am just wondering: in your VGG example, the last conv layer connected to the fc layer outputs 1×1 features. However, my network does not output 1×1 features at the last conv layer, so how should I connect the conv layer to the fc layer? The weight and output sizes don't match.
Thank you
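
Two common ways to bridge that mismatch (my own suggestion, not from the repository; the shapes are hypothetical) are global average pooling down to 1×1, or sizing the linear layer to the flattened feature map:

    import torch
    import torch.nn as nn

    features = torch.randn(8, 512, 7, 7)   # last conv output is 7x7, not 1x1

    # Option 1: global average pooling down to 1x1, then a 512 -> num_classes linear
    pool = nn.AdaptiveAvgPool2d(1)
    fc1 = nn.Linear(512, 10)
    out1 = fc1(pool(features).flatten(1))

    # Option 2: flatten the whole map and size the linear layer to match
    fc2 = nn.Linear(512 * 7 * 7, 10)
    out2 = fc2(features.flatten(1))

    print(out1.shape, out2.shape)   # both torch.Size([8, 10])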

Slimming DenseNet

Hi @liuzhuang13,

Thank you for the great work. I saw that you leverage the scaling factors of batch normalization to prune the incoming and outgoing weights of conv layers. However, in DenseNet, after a basic block (1×1 + 3×3) the previous features are concatenated with the current ones, and the dimension of the scaling factors does not match that of the previous convolutional layer. How do you prune weights in this case?

By the way, when sparsity training of DenseNet finishes with lambda = 1e-5, I notice that many scaling factors are not small enough for pruning. Does this affect the performance of the compressed network?

Thanks,
Hai
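
One way to picture a possible workaround (a sketch of a channel-selection step, assuming PyTorch; the indices are hypothetical and this is not the authors' released code) is to keep an index of surviving channels over the concatenated features and select them before the next conv:

    import torch
    import torch.nn as nn

    # DenseNet-style concatenated input: e.g. 160 channels, only some survive pruning
    concat_features = torch.randn(4, 160, 32, 32)
    keep_idx = torch.tensor([0, 3, 7, 12, 20, 45, 90, 150])   # hypothetical surviving channels

    # select the surviving channels, then apply a conv built for that reduced width
    selected = concat_features.index_select(1, keep_idx)
    conv = nn.Conv2d(len(keep_idx), 12, kernel_size=3, padding=1)
    out = conv(selected)
    print(out.shape)   # torch.Size([4, 12, 32, 32])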

Is it reasonable to use a single threshold for all BN layers?

When calculating the threshold, the weights of all BN layers are sorted together. Is this reasonable?

Could the following be happening:
① The values at the front of the network are closer to the image pixel values, while the last layer is closer to the class probabilities, so the BN weights are not necessarily distributed the same way across layers.
② There are shortcuts in the middle of the network; after the outputs of the two convolution branches are added, the values become larger, which may affect the BN weights.

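For concreteness, this is roughly how a single global threshold is obtained from all BN scaling factors (my own PyTorch sketch of the idea being questioned; the 0.4 ratio is just an example):

    import torch
    import torch.nn as nn

    def global_bn_threshold(model, prune_ratio=0.4):
        """Sort |gamma| from every BN layer together and take the prune_ratio quantile."""
        gammas = torch.cat([m.weight.data.abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        sorted_gammas, _ = torch.sort(gammas)
        return sorted_gammas[int(len(sorted_gammas) * prune_ratio)].item()

    # a per-layer variant would instead compute one threshold inside each BN layer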

Sparsity training gives the opposite BN distribution

Hi @liuzhuang13,

After sparsity training, I visualized the histograms of all BN scaling factors in our YOLOv3 model, which is used to detect footballs. I found a weird phenomenon: the BN scaling factors did not become sparser (more factors equal to or near 0) but instead moved farther away from 0 compared with the base model. Here are the visualization figures:

[figure: histograms of BN scaling factors, base vs. sparsity-trained]

My implementation of the sparsity regularization is consistent with your code. I am confused about this and sincerely look forward to your suggestions. Please tell me if there is any misunderstanding. Thanks.

regards.
summer,Gao
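
For anyone checking the same thing, a sketch of the usual sparsity update and the histogram check (my own approximation of common PyTorch re-implementations, not the code from this issue; `lam` is a hypothetical value):

    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    def add_l1_subgradient(model, lam=1e-4):
        """Add the L1 subgradient on BN scaling factors; call after loss.backward()."""
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.grad.data.add_(lam * torch.sign(m.weight.data))

    def plot_gamma_histogram(model):
        """Histogram of |gamma| over all BN layers, to check whether sparsity emerges."""
        gammas = torch.cat([m.weight.data.abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        plt.hist(gammas.cpu().numpy(), bins=100)
        plt.xlabel("|gamma|")
        plt.ylabel("count")
        plt.show()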

Slimming ResNet

Dear @liuzhuang13,
I guess we should prune some channels of the subsequent conv layer's kernels after pruning the current layer. Am I right?
So I cannot figure out how to slim a residual block using your method.
[figure: residual block with two branches]
The two branches may have different channels pruned, so can we only prune the intersection of both?

[figure: residual block with shortcut connection]
It is almost the same situation in the shortcut version. How do you handle this?

Thanks
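
To make the question concrete (my own sketch with hypothetical indices, not the authors' answer): the residual addition only lines up on channels kept by both branches, which is why the intersection, or leaving the addition-facing layers unpruned, are the usual options:

    # hypothetical channel indices kept on each branch after thresholding their BN gammas
    kept_main     = [0, 1, 4, 5, 9, 12, 15]
    kept_shortcut = [0, 2, 4, 5, 9, 13, 15]

    # the element-wise addition only lines up for channels kept on BOTH branches
    intersection = sorted(set(kept_main) & set(kept_shortcut))
    print(intersection)   # [0, 4, 5, 9, 15]

    # an alternative is to prune only the block-internal layers and leave the
    # channels feeding the addition untouched, so the two branches always match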

What's the formula for calculating FLOPs?

Dear author,
I am confused about the FLOPs calculation, especially in Table 2 of your network slimming paper. How do you derive that the FLOPs of VGG-A are 4.57×10^10?

Best
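
A commonly used convention (not necessarily the exact one in the paper, which may also count bias, BN, and linear-layer terms) treats one conv layer as H_out × W_out × C_in × C_out × k² multiply-adds:

    def conv_flops(h_out, w_out, c_in, c_out, k, multiply_adds=True):
        """FLOPs of one conv layer; set multiply_adds=False to count mul and add as 2 ops."""
        ops = h_out * w_out * c_in * c_out * k * k
        return ops if multiply_adds else 2 * ops

    # e.g. the first 3x3 conv of a VGG-style network on a 224x224 input
    print("%.3e" % conv_flops(224, 224, 3, 64, 3))   # ~8.67e+07 multiply-adds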
