Why do you do_mask before the training process? (pruning_imagenet.py) about filter-pruning-geometric-median (CLOSED)

he-y commented on June 6, 2024

Why do you do_mask before the training process? (pruning_imagenet.py)

from filter-pruning-geometric-median.

Comments (12)

he-y commented on June 6, 2024

Yes, I think initialization is important. You can take a look at "Understanding the difficulty of training deep feedforward neural networks".
From your results, it might be safe to conclude that our setting is different from "random initialization".
By the way, our latest result achieves 93.45% with some sort of new initialization.

leoozy commented on June 6, 2024

I believe I misunderstood your code, but I could not find the right answer.

he-y commented on June 6, 2024

Q1: Why apply the mask before training?
A1: This operation keeps the code compatible with both pruning pretrained models and training models from scratch.

Q2: Why call do_mask after the training process?
A2: The pruned filters do not change in this setting. I adapted the code from my previous work (soft filter pruning: https://github.com/he-y/soft-filter-pruning), and this operation is kept for compatibility with soft filter pruning.

leoozy commented on June 6, 2024

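I'm sorry, I may not have described my question clearly. In your work, is there any difference between train_from_scratch and simply constructing a small model when I build the network? I can't work it out: before training starts, you set some of the filters to zero and they receive no gradient updates, so those filters stay at zero forever. Whether I use the "small norm is unimportant" criterion or the geometric center criterion you propose, the re-selected filters are always the same ones, because they are always zero. Doesn't that make your algorithm meaningless in the train_from_scratch setting? Whatever the criterion, it selects the same batch of filters that were zeroed before training and never change, so isn't this equivalent to building a small model and training it from scratch?
I look forward to your reply. Thank you very much!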

leoozy commented on June 6, 2024

If your pruned filters never change, why is init_mask called again after every training epoch? Have I misunderstood something?

he-y commented on June 6, 2024

For your first question: they are different.
Put simply, the difference is between random initialization and "biased" random initialization.

The initialization code is as follows:

filter-pruning-geometric-median/models/resnet.py

Lines 71 to 81 in 44030b7

    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))
            #m.bias.data.zero_()
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()
        elif isinstance(m, nn.Linear):
            init.kaiming_normal(m.weight)
            m.bias.data.zero_()

If you build a small model, its filters follow the normal distribution they were initialized from. If you build a large model and use a pruning criterion to remove filters, the distribution of the remaining filters will NOT be normal.
You can take a look at "The Lottery Ticket Hypothesis" (ICLR best paper) to see how important the initialization is.
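
To make the point about distributions concrete, here is a minimal sketch in plain Python (no PyTorch). It assumes a norm-based criterion purely for illustration; the layer sizes, the 60% keep ratio, and the helper names `he_init_filters` and `l2_norm` are hypothetical and not taken from the repository. It samples filters with the same standard deviation as the quoted snippet, keeps the largest-norm ones, and compares mean norms before and after selection.

```python
import math
import random


def he_init_filters(out_channels, in_channels, k, rng):
    """Sample filters as the quoted snippet does: N(0, sqrt(2 / n)), n = k * k * out_channels."""
    std = math.sqrt(2.0 / (k * k * out_channels))
    size = in_channels * k * k  # number of weights in one filter
    return [[rng.gauss(0.0, std) for _ in range(size)] for _ in range(out_channels)]


def l2_norm(filt):
    """Euclidean norm of one flattened filter."""
    return math.sqrt(sum(w * w for w in filt))


rng = random.Random(0)  # fixed seed so the run is repeatable
large = he_init_filters(out_channels=64, in_channels=16, k=3, rng=rng)

# Hypothetical criterion, for illustration only: keep the 60% of filters with the largest norm.
n_keep = int(0.6 * len(large))
kept = sorted(large, key=l2_norm, reverse=True)[:n_keep]

mean_all = sum(l2_norm(f) for f in large) / len(large)
mean_kept = sum(l2_norm(f) for f in kept) / len(kept)
print(f"mean norm, all filters:  {mean_all:.4f}")
print(f"mean norm, kept filters: {mean_kept:.4f}")
```

Because the kept filters are chosen by a criterion rather than drawn independently, their statistics differ from those of a freshly sampled set, which is one way to read the "biased" initialization described above.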

he-y commented on June 6, 2024

For your second question: you can delete that code, as it has no influence on the results.
I keep it because my former project (soft filter pruning) needs it.
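
A small sketch of why re-selecting the mask has no effect once pruned filters stay at zero. It assumes a lowest-norm criterion for simplicity (not the geometric median criterion), uses made-up 2-weight filters, and the helper `lowest_norm_indices` is hypothetical rather than a function from the repository.

```python
import math


def lowest_norm_indices(filters, n_prune):
    """Return the sorted indices of the n_prune filters with the smallest L2 norm."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    return sorted(order[:n_prune])


# Four toy filters with two weights each (made-up values).
filters = [[0.9, -0.4], [0.1, 0.05], [-0.7, 0.6], [0.02, -0.03]]

first = lowest_norm_indices(filters, n_prune=2)

# Prune: zero the selected filters.
for i in first:
    filters[i] = [0.0] * len(filters[i])

# Simulate an epoch in which only the surviving filters are updated.
filters[0] = [1.1, -0.2]
filters[2] = [-0.5, 0.8]

second = lowest_norm_indices(filters, n_prune=2)
print(first, second)
```

The zeroed filters have norm 0, so they are selected again on every later pass as long as the surviving filters remain nonzero, which is consistent with the statement that this code can be deleted in the FPGM setting.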

leoozy commented on June 6, 2024

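Thank you very much for your patient reply; I understand now. I'm sorry, but I still have two questions:

1. If the goal is only to obtain a biased initialization, is there any difference between using "GM center" and "less norm is unimportant"? Both remove filters from a randomly initialized set, and at that point the filters carry no semantic information, so emphasizing the role of "GM center" seems very strange.

2. When pruning a pre-trained model, you prune the full prune-rate fraction of filters in one shot and never change them. This differs a little from Algorithm 1 in your paper, which re-selects N_i+1 * P filters at the end of every epoch; if the selection never changes, there is no need to search again. Would the results be better if the selection changed each time, as in the soft method, or if the pruning rate increased gradually so that only some of the filters were selected each time? Thank you very much for your patience!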

he-y commented on June 6, 2024

1. "The Lottery Ticket Hypothesis" argues that a small portion of the original random weights could be the winning ticket. The question is how to find them. Removing weights at random is not a good choice. You could run a simple experiment to see the difference.
2. Algorithm 1 in the paper is a general framework that covers both the settings you suggest and my experimental setting.

2.1. FPGM + SFP.
Please take a look at this reply.

2.2. Increasing pruning rate.
Please take a look at my extended journal paper for SFP: Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks.
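
For the "simple experiment" suggested in point 1, the three selection rules discussed in this thread can be compared on toy data. This is a sketch under stated assumptions: the geometric median rule is approximated here by picking the filter with the smallest total Euclidean distance to all other filters, the 2-weight filters are made up, and the function names are hypothetical rather than the repository's API.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two flattened filters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def norm_criterion(filters, n_prune):
    """Select the filters with the smallest L2 norm."""
    zero = [0.0] * len(filters[0])
    order = sorted(range(len(filters)), key=lambda i: dist(filters[i], zero))
    return sorted(order[:n_prune])


def gm_criterion(filters, n_prune):
    """Select the filters closest to all others (smallest summed distance)."""
    def total_dist(i):
        return sum(dist(filters[i], f) for f in filters)
    order = sorted(range(len(filters)), key=total_dist)
    return sorted(order[:n_prune])


def random_criterion(filters, n_prune, rng):
    """Select filters uniformly at random."""
    return sorted(rng.sample(range(len(filters)), n_prune))


# Toy filters: one near zero, a tight cluster of three, and one outlier.
filters = [[0.05, 0.0], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [3.0, 3.0]]

by_norm = norm_criterion(filters, n_prune=1)
by_gm = gm_criterion(filters, n_prune=1)
by_random = random_criterion(filters, n_prune=1, rng=random.Random(0))
print("norm:", by_norm, "gm:", by_gm, "random:", by_random)
```

On this data the norm rule removes the near-zero filter, while the distance-based rule removes the filter at the center of the cluster, so the two criteria leave different sets of remaining filters even before any training.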

leoozy commented on June 6, 2024

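Hello, thank you very much for your earlier patient replies; they helped me a lot. I ran your code but replaced the GM part with randomly selecting 40% of the channels to prune. For pruning a pre-trained model, the effect is still very noticeable. For training from scratch, however, my result is 93.10 with ResNet-56, which is higher than the mean result in your paper. Do you think the biased initialization is really useful, especially given that the "Rethinking" paper rejects the lottery ticket hypothesis?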

Yejing-Lai commented on June 6, 2024

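I believe I misunderstood your code but could not find the right answer

Have you resolved this issue? I have a similar question: once a filter has been set to 0, will it still be updated afterwards? Looking forward to your reply.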

he-y commented on June 6, 2024

@Yejing-Lai FPGM does not update the filters. SFP will update the filters.
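
The difference can be sketched with a single SGD step on toy weights. This assumes that "does not update" means the gradient of pruned filters is masked as well, as described earlier in the thread; the values, the learning rate, and the helpers `apply_mask` and `sgd_step` are illustrative and not the repository's functions.

```python
def apply_mask(weights, mask):
    """Zero every filter whose mask entry is 0."""
    return [[w * m for w in filt] for filt, m in zip(weights, mask)]


def sgd_step(weights, grads, lr, mask=None):
    """One plain SGD step; with a mask, pruned filters receive no gradient."""
    updated = []
    for i, (filt, grad) in enumerate(zip(weights, grads)):
        m = 1.0 if mask is None else mask[i]
        updated.append([w - lr * m * g for w, g in zip(filt, grad)])
    return updated


weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
grads = [[0.1, 0.1], [0.2, -0.2], [0.3, 0.3]]
mask = [1.0, 0.0, 1.0]  # filter 1 is pruned

pruned = apply_mask(weights, mask)

# FPGM-style (as described in this thread): the gradient is masked, so the filter stays at zero.
hard = sgd_step(pruned, grads, lr=0.1, mask=mask)

# SFP-style: the pruned filter still receives its gradient and moves away from zero.
soft = sgd_step(pruned, grads, lr=0.1)

print("hard:", hard[1], "soft:", soft[1])
```

In the soft variant the filter can become nonzero again, which is why re-selecting the mask after each epoch matters for SFP but not for the fixed-mask setting.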
