
yolov5prune's Introduction

YOLOv5 model pruning

2022-01-04: Pruning for the v5.0 m/l/x models has been added; in theory models such as yolov5l6 are also supported.

2022-01-01: Pruning for v6.0 has been released: https://github.com/midasklr/yolov5prune/tree/v6.0

2021-12-14: v6.0 pruning and distillation will be released soon.

Pruning is built on the latest YOLOv5 v5.0 release, using the yolov5s model (originally only the s model was supported; see the 2022-01-04 update above for m/l/x).

Related work:

Learning Efficient Convolutional Networks Through Network Slimming(https://arxiv.org/abs/1708.06519)

Pruning Filters for Efficient ConvNets(https://arxiv.org/abs/1608.08710)

See https://blog.csdn.net/IEEE_FELLOW/article/details/117236025 for a detailed explanation of the principles.

Four pruning methods are experimented with below (the fourth is a hybrid of methods 1 and 3).

Pruning method 1

Pruning based on the BN-layer scaling factor gamma.

In a Conv-BN-activation module, the BN layer provides per-channel scaling, as follows.

The BN operation consists of two parts:
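
For each channel, BN first normalizes the input and then applies a learned affine transform (notation as in the Network Slimming paper):

$$\hat{z} = \frac{z_{\text{in}} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}, \qquad z_{\text{out}} = \gamma \hat{z} + \beta$$

where $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}$ are the mini-batch mean and standard deviation, and $\gamma$, $\beta$ are learned per-channel parameters.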

After normalization, a linear transform is applied, so when the coefficient gamma is very small, the corresponding activation (Zout) is correspondingly small. Channels with such small responses can be cut away, which realizes channel pruning at the BN layer.

Adding an L1 regularization term on gamma to the loss function drives gamma toward sparsity.
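
The training objective, following the Network Slimming paper, becomes

$$L = \sum_{(x, y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)$$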

In the loss L above, the first term on the right is the original training loss and the second is the sparsity penalty, where $g(s) = |s|$ and the regularization coefficient $\lambda$ is tuned per dataset.

During actual training we are minimizing L, so by the gradient-descent update:

$$L' = \sum l' + \lambda \sum g'(\gamma) = \sum l' + \lambda \sum |\gamma|' = \sum l' + \lambda \sum \operatorname{sign}(\gamma)$$

So during backpropagation we only need to add the sign of each BN weight, scaled by the coefficient, to that weight's gradient. The corresponding code:

    # Backward
    loss.backward()
    # scaler.scale(loss).backward()

    # ============================= sparsity training ========================== #
    srtmp = opt.sr * (1 - 0.9 * epoch / epochs)  # decay the L1 strength over training
    if opt.st:
        ignore_bn_list = []
        for k, m in model.named_modules():
            # Skip BN layers inside shortcut Bottlenecks so the residual add stays valid
            if isinstance(m, Bottleneck):
                if m.add:
                    ignore_bn_list.append(k.rsplit(".", 2)[0] + ".cv1.bn")
                    ignore_bn_list.append(k + '.cv1.bn')
                    ignore_bn_list.append(k + '.cv2.bn')
            if isinstance(m, nn.BatchNorm2d) and (k not in ignore_bn_list):
                m.weight.grad.data.add_(srtmp * torch.sign(m.weight.data))    # L1 on gamma
                m.bias.grad.data.add_(opt.sr * 10 * torch.sign(m.bias.data))  # L1 on beta
    # ============================= sparsity training ========================== #

    optimizer.step()
    # scaler.step(optimizer)  # optimizer.step
    # scaler.update()
    optimizer.zero_grad()

Note that not all BN layers have their gamma constrained; see the per-module analysis of yolov5s at https://blog.csdn.net/IEEE_FELLOW/article/details/117536808. BN layers inside Bottlenecks of a C3 block that have a shortcut are left unpruned, mainly so the tensors on both sides of the residual add keep matching dimensions:

In fact, in yolov5 only the Bottlenecks in the backbone have shortcuts; none of those in the head do.

Without the L1 constraint, the BN gamma values at the end of training follow an approximately normal distribution, which cannot be pruned.

The distribution after sparsity training:

As training progresses, more and more gamma values approach 0.

Once training is done, pruning can begin. One basic rule: the global threshold must stay below the largest gamma of every BN layer, so that no layer loses all of its channels. Channels are then pruned according to the configured prune ratio.

Pruning a BN channel also requires removing the corresponding filter of the preceding conv layer and the corresponding input channel of the following conv layer.
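
A minimal sketch of how such a global gamma threshold can be derived (prune.py implements this logic in its own way; ignore_bn_list here is the same shortcut-layer list used during sparsity training):

    import torch
    import torch.nn as nn

    def gamma_threshold(model, percent, ignore_bn_list=()):
        # Gather |gamma| from every prunable BN layer.
        gammas = torch.cat([
            m.weight.data.abs().clone()
            for k, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d) and k not in ignore_bn_list
        ])
        sorted_gammas, _ = torch.sort(gammas)
        # Global threshold: the `percent` fraction with the smallest gamma is pruned.
        thre = sorted_gammas[int(len(sorted_gammas) * percent)]
        # The threshold must stay below the largest gamma of every layer,
        # otherwise that layer would lose all of its channels.
        highest = min(
            m.weight.data.abs().max().item()
            for k, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d) and k not in ignore_bn_list
        )
        assert thre < highest, "prune ratio too high for this level of sparsity"
        return thre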

The workflow on a sample dataset:

First train normally with train.py:

python train.py --weights yolov5s.pt --adam --epochs 100

Then run sparsity training:

python train_sparsity.py --st --sr 0.0001 --weights yolov5s.pt --adam --epochs 100

The choice of sr must be tuned per dataset; pick it by watching the mAP and the gamma histograms in TensorBoard. In the runs/train/exp*/ directory:

tensorboard --logdir .

then open the printed link to monitor the training metrics.

After training, run the pruning step:

python prune.py --weights runs/train/exp1/weights/last.pt --percent 0.5 --cfg models/yolov5s.yaml

Tune the prune ratio percent empirically, starting small and increasing. Note that the cfg model file must match the weights, otherwise key mismatches occur during pruning. When pruning finishes, the pruned model is saved as pruned_model.pt.

Fine-tune:

python finetune_pruned.py --weights pruned_model.pt --adam --epochs 100

Experiments on VOC2007: the training set is VOC07 trainval and the test set is VOC07 test. For comparison, Faster R-CNN and SSD512 results on the same data are listed. The yolov5 input size is 512, and to save time AdamW was used for 100 epochs.

| model | optim & epochs | sparsity (sr) | mAP@0.5 | model size (MB) | forward time |
|---|---|---|---|---|---|
| faster rcnn | - | | 69.9 (paper) | | |
| SSD512 | - | | 71.6 (paper) | | |
| yolov5s | sgd, 300 | 0 | 67.4 | | |
| yolov5s | adamw, 100 | 0 | 66.3 | | |
| yolov5s | adamw, 100 | 0.0001 | 69.2 | | |
| yolov5s | sgd, 300 | 0.001 | Inf. error | | |
| yolov5s | adamw, 100 | 0.001 | 65.7 | 28.7 | 7.32 ms |
| 55% pruned yolov5s | | | 64.1 | 8.6 | 7.30 ms |
| fine-tune of the above | | | 67.3 | | 7.21 ms |
| yolov5l | adamw, 100 | 0 | 70.1 | | |
| yolov5l | adamw, 100 | 0.001 | 0.659 | | 12.95 ms |

Results on a custom dataset:

| model | sparsity (sr) | mAP | model size |
|---|---|---|---|
| yolov5s | 0 | 0.322 | 28.7 M |
| sparsity-trained yolov5s | 0.001 | 0.325 | 28.7 M |
| 65% pruned yolov5s | 0.001 | 0.318 | 6.8 M |
| fine-tuned | 0 | 0.325 | 6.8 M |

Pruning method 2

Consider the Bottleneck structure:

If the residual branch on the right is very small, only the shortcut connection on the left remains, which effectively prunes away the entire block. A constraint can be added to push the residual toward 0; see train_sparsity2.py.
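
A hypothetical sketch of such a constraint: apply an extra L1 penalty to the BN gammas inside shortcut Bottlenecks (exactly the layers method 1 skips), so the whole residual branch can shrink toward zero. train_sparsity2.py may implement this differently:

    # after loss.backward(), alongside the method-1 penalty
    for k, m in model.named_modules():
        if isinstance(m, Bottleneck) and m.add:
            for conv in (m.cv1, m.cv2):  # Conv modules; each holds a .bn
                conv.bn.weight.grad.data.add_(opt.sr * torch.sign(conv.bn.weight.data))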

The backbone contains 3 such Bottlenecks in total; pruning all of them:

| model | sparsity (sr) | mAP | model size |
|---|---|---|---|
| yolov5s, all Bottlenecks pruned | 0.001 | 0.167 | 28.7 M |
| 85% + Bottleneck | | 0.151 | 1.1 M |
| finetune | | 0.148 | |

| Bottleneck residuals pruned | mAP |
|---|---|
| all | 0.167 |
| 2nd and 3rd | 0.174 |
| 3rd only | 0.198 |

The results are clearly poor in practice; the BN distributions also show that shallow-layer channels are rarely pruned.

Pruning method 3

Filter pruning: kernels with very small weights produce correspondingly small outputs, so constraining the kernels makes it possible to prune entire filters.

Pruning a filter requires pruning the corresponding channel of the following BN layer, as well as the matching input channel of the next conv layer. See train_sparsity3.py.
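
A sketch of the selection step, ranking filters by L1 norm in the spirit of the Pruning Filters paper (train_sparsity3.py's exact criterion may differ):

    import torch

    def filter_keep_mask(conv_weight, keep_ratio):
        # conv_weight: [out_channels, in_channels, kH, kW]
        norms = conv_weight.abs().sum(dim=(1, 2, 3))  # L1 norm of each filter
        k = max(1, int(norms.numel() * keep_ratio))   # always keep at least one
        keep = torch.argsort(norms, descending=True)[:k]
        mask = torch.zeros(norms.numel(), dtype=torch.bool)
        mask[keep] = True
        return mask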

| stage | model size | mAP |
|---|---|---|
| sparsity train (sr = 1e-5) | 28.7 M | 0.335 |
| 50% kernel prune | 8.4 M | 0.151 |
| finetune | 8.4 M | 0.332 |

Pruning method 4

A hybrid of methods 1 and 3; see train_sparsity4.py.

| stage | mAP | model size |
|---|---|---|
| conv+bn sparsity train | 0.284 | 28.7 M |
| 85% bn prune | 0.284 | 3.7 M |
| 78% conv prune | 0.284 | 3.9 M |
| 85% bn prune + 78% conv prune | 0.284 | 3.7 M |

Replacing the backbone

| model | size | mAP@0.5:0.95 | mAP@0.5 |
|---|---|---|---|
| yolov5s | 640 | 0.357 | 0.558 |
| mobilenetv3small 0.75 | 640 | 0.24 | 0.421 |

Tuning tips

  1. Prune shallow layers as little as possible; the per-layer gamma distributions after training show this as well.
  2. The coefficient λ trades mAP against pruning strength. First train a baseline with train.py, then watch the mAP and the gamma histograms during sparsity training: if mAP drops badly or gamma sparsifies too quickly, lower λ; conversely, to compress the model as far as possible, raise λ.
  3. The sparsity training => pruning => fine-tuning cycle can be iterated to prune repeatedly (see the command sequence after this list).
  4. yolov5's default hyperparameters usually work well, e.g. SGD for 300 epochs with lr 0.01 -> 0.001; here AdamW with 100 epochs was used to save time.
  5. Many people have raised questions; I have not hit all of them myself, but I answer what I can.
  6. How much can be pruned often depends heavily on the dataset. I experimented on a simple task (5k images, 40+ classes) and a complex one (200k+ images, 120+ classes): the simple task allowed pruning the model very small (though small models are also less robust), while on the complex task the parameters were hard to sparsify and little could be pruned (<20%).
  7. yolov5's s, m, l and x models share the same architecture and differ only in the depth and width multipliers, so this code should also work for the m, l and x models.
  8. Try starting the pruning from a larger model, e.g. yolov5l; it may work better than starting from yolov5s, since a larger model usually gives a larger search space.
  9. Choosing a sensible input size for your own dataset matters. Public datasets such as VOC and COCO are usually preprocessed, e.g. VOC images have a long side of 500 and COCO of 640, which is one important reason SSD uses inputs of 300/512 and yolov5 uses 640. To get good performance on your own data, try adjusting the input size.
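
One full iteration of the loop in tip 3, reusing the commands from above (experiment directories will vary):

python train_sparsity.py --st --sr 0.0001 --weights runs/train/exp/weights/best.pt --adam --epochs 100

python prune.py --weights runs/train/exp2/weights/last.pt --percent 0.5 --cfg models/yolov5s.yaml

python finetune_pruned.py --weights pruned_model.pt --adam --epochs 100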

FAQ

  1. Sparsity training is very important and is the main thing to tune. Watch how the BN histograms evolve; sparsifying either too fast or too slow is a problem, so balance sr, lr and so on. Normally the sparsity-trained mAP ends up close to that of normal training.
  2. Try different ratios when pruning. A basic rule is that every BN layer must keep at least one channel; if sparsity training was insufficient and the ratio is set very high, you will see 0 appear among the remaining channels. In that case, either lower the ratio or redo sparsity training to obtain sparser parameters.
  3. For mobile deployment, ncnn can be used for acceleration; also, keeping the remaining channel counts at powers of two (2^n) noticeably improves inference speed (a utility for this is sketched below). On GPU, use TensorRT.
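
One simple way to snap a remaining-channel count to a power of two (a small utility sketch, not part of this repo):

    def floor_pow2(n: int) -> int:
        # Round a channel count down to the nearest power of two, e.g. 95 -> 64.
        return 1 << (max(1, n).bit_length() - 1)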


yolov5prune's Issues

NaN during sparsity training

Hi, I set sr=0.001 for sparsity training but still ran into the following problem:
[screenshot]
What could be causing this?

Error running train_sparsity.py

Traceback (most recent call last):
File "/home/k/yolov5_prune/train_sparsity.py", line 601, in
train(hyp, opt, device, tb_writer)
File "/home/k/yolov5_prune/train_sparsity.py", line 340, in train
optimizer.step()
File "/home/k/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/home/k/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/optim/optimizer.py", line 89, in wrapper
return func(*args, **kwargs)
File "/home/k/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/k/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/optim/adamw.py", line 98, in step
exp_avgs.append(state['exp_avg'])
KeyError: 'exp_avg'

Problems converting to NCNN after pruning

Hello, following your method I completed sparsity training and pruning and obtained a smaller model with good test results. But converting the .pt model to ncnn hit a problem: the conversion itself succeeds, but I do not know how to edit the param file. Could you publish a guide on modifying the param file based on your trained results? Many thanks.

What is this error?

File "D:\Anaconda3\envs\y55\lib\site-packages\torch\serialization.py", line 762, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: STACK_GLOBAL requires str

Error running train_prune_sparsity.py

Hello, when I run train_prune_sparsity.py following your steps I get this error:
Traceback (most recent call last):
File "train_prune_sparsity.py", line 605, in
train(hyp, opt, device, tb_writer)
File "train_prune_sparsity.py", line 381, in train
if isinstance(layer, nn.BatchNorm2d) and i not in ignore_bn_list:
UnboundLocalError: local variable 'ignore_bn_list' referenced before assignment
Images sizes do not match. This will causes images to be display incorrectly in the UI.
How can this be fixed? I also cannot simply declare ignore_bn_list global, because the earlier code already uses it. Hope you can find time to reply!

train.py error

Hello, when I run train.py following your steps I get this error:
Traceback (most recent call last):
File "train.py", line 548, in
train(hyp, opt, device, tb_writer)
File "train.py", line 193, in train
image_weights=opt.image_weights, quad=opt.quad, prefix=colorstr('train: '))
File "/project/train/src_repo/utils/datasets.py", line 72, in create_dataloader
prefix=prefix)
File "/project/train/src_repo/utils/datasets.py", line 385, in init
cache, exists = torch.load(cache_path), True # load
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: STACK_GLOBAL requires str
How can this be fixed?

find sparsity value

Hi

Thanks for sharing your work.

You said the sparsity value in (train_sparsity.py --st --sr 0.0001) should be chosen from the gamma variation histogram, but I did not understand how this value is obtained.

Thanks.

Pruning was unsuccessful, HELP!

Normal training and sparse training both work, but an error appears during pruning. Does the code need to be modified? Please explain in detail how to fix it. Thanks!

Traceback (most recent call last):
File "prune.py", line 790, in
test_prune(opt.data,

File "prune.py", line 451, in test_prune
pruned_model = ModelPruned(maskbndict=maskbndict, cfg=pruned_yaml, ch=3).cuda()

File "/home/yolov5prune-main/models/yolo.py", line 250, in init
self.model, self.save, self.from_to_map = parse_pruned_model(self.maskbndict, deepcopy(self.yaml), ch=[ch]) # model, savelist

File "/home/yolov5prune-main/models/yolo.py", line 654, in parse_pruned_model
cv2out = int(maskbndict[named_m_cv2_bn].sum())

KeyError: 'model.2.cv2.bn'

NoneType error in prune.py

Pruning used to work fine, but recently the script will not run at all.
When ModelPruned() is instantiated, the line m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))]) # forward
raises TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not NoneType
Does anyone know why? Debugging shows all three inputs to Detect are None.

Some questions about pruning

[screenshots]

Question 1: With a 60% prune ratio, the labels, P, R and mAP all become 0 after pruning. What causes this?

Question 2:

| model | size |
|---|---|
| train.py --> last.pt | 14.4M |
| train_sparsity.py --> last.pt | 14.4M |
| orign_model.pt | 28.6M |
| pruned_model.pt | 7.8M |
| finetune_pruned.py --> last.pt | 5.4M |

The table above lists the models produced by running the four scripts. Why are there four models of different sizes? The readme only distinguishes the sizes before and after pruning. Also, why is the intermediate orign_model.pt larger than the originally trained model? And is the final pruned model pruned_model.pt, or the last.pt produced by finetune_pruned.py?

Thank you!

bn+conv pruning can leave 0 remaining channels

maskmergedict = {}
for k, v in maskconvdict.items():
    key_bn = k.rsplit(".", 1)[0] + ".bn"
    bmask = maskbndict[key_bn]
    newmask = v * bmask
    maskmergedict[key_bn] = newmask

The final remaining channels are the BN-mask channels multiplied by the conv-mask channels, and the product can end up with 0 channels.
Before multiplying:
| layer name | origin channels | remaining channels |
| model.23.m.0.cv1.conv | 512 | 2 |
| model.23.m.0.cv1.bn | 512 | 95 |
| model.23.m.0.cv2.conv | 512 | 2 |
| model.23.m.0.cv2.bn | 512 | 162 |
| model.23.m.1.cv1.conv | 512 | 2 |
| model.23.m.1.cv1.bn | 512 | 154 |
| model.23.m.1.cv2.conv | 512 | 2 |
| model.23.m.1.cv2.bn | 512 | 222 |
| model.23.m.2.cv1.conv | 512 | 2 |
| model.23.m.2.cv1.bn | 512 | 210 |
| model.23.m.2.cv2.conv | 512 | 18 |
| model.23.m.2.cv2.bn | 512 | 244 |
After multiplying:
model.23.m.0.cv1.bn tensor(0., device='cuda:0')
model.23.m.0.cv2.bn tensor(1., device='cuda:0')
model.23.m.1.cv1.bn tensor(2., device='cuda:0')

A question about sparsity training

Hello, and thanks for the code. When I run sparsity training with python train_sparsity.py --st --sr 0.1 --weights yolov5s.pt --adam --epochs 100 --batch_size 180, the BN-layer weights barely change. How should I deal with this?
[screenshot]

Does this also apply to the m and l models?

Hi, thanks for open-sourcing this. Does the code work for the yolov5 v5.0 m model? And if the network structure has been modified, how should pruning be done? Could you suggest an approach?

prune.py errors when pruning my own dataset after sparsity training

======================================================================================================================================================
Test after prune:
YOLOv5 🚀 20713dac7 torch 1.9.0+cu102 CUDA:0 (Tesla T4, 15109.75MB)

model.0.conv.bn BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.2.cv1.bn BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.2.cv2.bn BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.2.cv3.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.3.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.4.cv1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.4.cv2.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.4.cv3.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.5.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.6.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.6.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.6.cv3.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.7.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.8.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.8.cv2.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.cv3.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.m.0.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.m.0.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.10.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.cv3.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.m.0.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.m.0.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.14.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.cv1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.cv2.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.cv3.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.m.0.cv1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.m.0.cv2.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.18.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.cv3.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.m.0.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.m.0.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.21.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.cv3.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
Suggested Gamma threshold should be less than 0.7051.
The corresponding prune ratio is 0.102, but you can set higher.
Gamma value that less than 0.7026 are set to zero!

| layer name | origin channels | remaining channels |
| model.0.conv.bn | 32 | 32 |
| model.1.bn | 64 | 64 |
| model.2.cv1.bn | 32 | 32 |
| model.2.cv2.bn | 32 | 32 |
| model.2.cv3.bn | 64 | 53 |
| model.2.m.0.cv1.bn | 32 | 32 |
| model.2.m.0.cv2.bn | 32 | 32 |
| model.3.bn | 128 | 85 |
| model.4.cv1.bn | 64 | 64 |
| model.4.cv2.bn | 64 | 58 |
| model.4.cv3.bn | 128 | 66 |
| model.4.m.0.cv1.bn | 64 | 64 |
| model.4.m.0.cv2.bn | 64 | 64 |
| model.4.m.1.cv1.bn | 64 | 64 |
| model.4.m.1.cv2.bn | 64 | 64 |
| model.4.m.2.cv1.bn | 64 | 64 |
| model.4.m.2.cv2.bn | 64 | 64 |
| model.5.bn | 256 | 191 |
| model.6.cv1.bn | 128 | 128 |
| model.6.cv2.bn | 128 | 128 |
| model.6.cv3.bn | 256 | 234 |
| model.6.m.0.cv1.bn | 128 | 128 |
| model.6.m.0.cv2.bn | 128 | 128 |
| model.6.m.1.cv1.bn | 128 | 128 |
| model.6.m.1.cv2.bn | 128 | 128 |
| model.6.m.2.cv1.bn | 128 | 128 |
| model.6.m.2.cv2.bn | 128 | 128 |
| model.7.bn | 512 | 512 |
| model.8.cv1.bn | 256 | 256 |
| model.8.cv2.bn | 512 | 506 |
| model.9.cv1.bn | 256 | 256 |
| model.9.cv2.bn | 256 | 256 |
| model.9.cv3.bn | 512 | 512 |
| model.9.m.0.cv1.bn | 256 | 256 |
| model.9.m.0.cv2.bn | 256 | 256 |
| model.10.bn | 256 | 247 |
| model.13.cv1.bn | 128 | 86 |
| model.13.cv2.bn | 128 | 94 |
| model.13.cv3.bn | 256 | 242 |
| model.13.m.0.cv1.bn | 128 | 102 |
| model.13.m.0.cv2.bn | 128 | 92 |
| model.14.bn | 128 | 48 |
| model.17.cv1.bn | 64 | 1 |
| model.17.cv2.bn | 64 | 21 |
| model.17.cv3.bn | 128 | 126 |
| model.17.m.0.cv1.bn | 64 | 10 |
| model.17.m.0.cv2.bn | 64 | 46 |
| model.18.bn | 128 | 103 |
| model.20.cv1.bn | 128 | 112 |
| model.20.cv2.bn | 128 | 100 |
| model.20.cv3.bn | 256 | 251 |
| model.20.m.0.cv1.bn | 128 | 76 |
| model.20.m.0.cv2.bn | 128 | 117 |
| model.21.bn | 256 | 253 |
| model.23.cv1.bn | 256 | 256 |
| model.23.cv2.bn | 256 | 253 |
| model.23.cv3.bn | 512 | 474 |
| model.23.m.0.cv1.bn | 256 | 254 |
| model.23.m.0.cv2.bn | 256 | 256 |

             from  n    params  module                                  arguments                     

0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18090 models.pruned_common.C3Pruned [64, 32, 32, 53, [[32, 32, 32]], 1, 128]
3 -1 1 40715 models.common.Conv [53, 85, 3, 2]
4 -1 1 142446 models.pruned_common.C3Pruned [85, 64, 58, 66, [[64, 64, 64], [64, 64, 64], [64, 64, 64]], 3, 256]
5 -1 1 113836 models.common.Conv [66, 191, 3, 2]
6 -1 1 602836 models.pruned_common.C3Pruned [191, 128, 128, 234, [[128, 128, 128], [128, 128, 128], [128, 128, 128]], 3, 512]
7 -1 1 1079296 models.common.Conv [234, 512, 3, 2]
8 -1 1 650740 models.pruned_common.SPPPruned [512, 256, 506, [5, 9, 13]]
9 -1 1 1179648 models.pruned_common.C3Pruned [506, 256, 256, 512, [[256, 256, 256]], 1, False]
10 -1 1 126958 models.common.Conv [512, 247, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 226052 models.pruned_common.C3Pruned [481, 86, 94, 242, [[86, 102, 92]], 1, False]
14 -1 1 11712 models.common.Conv [242, 48, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 15508 models.pruned_common.C3Pruned [114, 1, 21, 126, [[1, 10, 46]], 1, False]
18 -1 1 117008 models.common.Conv [126, 103, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 176331 models.pruned_common.C3Pruned [151, 112, 100, 251, [[112, 76, 117]], 1, False]
21 -1 1 572033 models.common.Conv [251, 253, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1148992 models.pruned_common.C3Pruned [500, 256, 253, 474, [[256, 254, 256]], 1, False]
detect input : ['model.0.conv.bn', 'model.1.bn', 'model.2.cv3.bn', 'model.3.bn', 'model.4.cv3.bn', 'model.5.bn', 'model.6.cv3.bn', 'model.7.bn', 'model.8.cv2.bn', 'model.9.cv3.bn', 'model.10.bn', 'model.10.bn', ['model.10.bn', 'model.6.cv3.bn'], 'model.13.cv3.bn', 'model.14.bn', 'model.14.bn', ['model.14.bn', 'model.4.cv3.bn'], 'model.17.cv3.bn', 'model.18.bn', ['model.18.bn', 'model.14.bn'], 'model.20.cv3.bn', 'model.21.bn', ['model.21.bn', 'model.10.bn'], 'model.23.cv3.bn'] 24 [17, 20, 23]
24 [17, 20, 23] 1 20496 models.yolo.Detect [3, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [126, 251, 474]]
Model Summary: 283 layers, 6264777 parameters, 6264777 gradients, 13.0 GFLOPS

Traceback (most recent call last):
File "prune.py", line 807, in
opt=opt
File "prune.py", line 546, in test_prune
model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters()))) # run once
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/project/train/src_repo/models/yolo.py", line 277, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/project/train/src_repo/models/yolo.py", line 308, in forward_once
x = m(x) # run
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/project/train/src_repo/models/common.py", line 42, in forward
return self.act(self.bn(self.conv(x)))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 178, in forward
self.eps,
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2282, in batch_norm
input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: running_mean should contain 128 elements not 85

Pruning failed

I followed readme.md: 1) normal training (yolov5s model, 57M); 2) sparsity training (model 14.4M).
Then the pruning step: prune.py line 378, percent_limit = (sorted_bn == highest_thre).nonzero()[0, 0].item() / len(bn_weights), raised an error, so I changed it to
percent_limit = (sorted_bn == highest_thre).nonzero(as_tuple=False)[0, 0].item() / len(bn_weights); rerunning the pruning then fails as follows:

model.0.conv.bn BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.2.cv1.bn BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.2.cv2.bn BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.2.cv3.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.3.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.4.cv1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.4.cv2.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.4.cv3.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.5.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.6.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.6.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.6.cv3.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.7.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.8.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.8.cv2.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.cv3.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.m.0.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.9.m.0.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.10.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.cv3.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.m.0.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.13.m.0.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.14.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.cv1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.cv2.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.cv3.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.m.0.cv1.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.17.m.0.cv2.bn BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.18.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.cv3.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.m.0.cv1.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.20.m.0.cv2.bn BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.21.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.cv3.bn BatchNorm2d(512, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv1.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv2.bn BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
0.095703125
Suggested Gamma threshold should be less than 0.6982.
The corresponding prune ratio is 0.096, but you can set higher.

Traceback (most recent call last):
File "/home/keroro/Downloads/yolov5prune-main/prune.py", line 808, in
opt=opt
File "/home/keroro/Downloads/yolov5prune-main/prune.py", line 383, in test_prune
assert opt.percent < percent_limit, f"Prune ratio should less than {percent_limit}, otherwise it may cause error!!!"
AssertionError: Prune ratio should less than 0.095703125, otherwise it may cause error!!!

How to prune after modifying the network structure

Hi, I tried adding some small optimization blocks to the yolov5 head and backbone. After sparsity training, prune.py raised an error, and while editing yolo.py to chase it I ran into problems. How should parse_pruned_model in yolo.py be modified, and what is the reasoning? Many thanks!
[image upload failed]

The Concat_bifpn structure in common.py is as follows:
[image upload failed]

Pruning error

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\init.py:388: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
Traceback (most recent call last):
File "prune.py", line 809, in
test_prune(opt.data,
File "prune.py", line 470, in test_prune
pruned_model = ModelPruned(maskbndict=maskbndict, cfg=pruned_yaml, ch=3).cuda()
File "C:\Users\Administrator\Desktop\yolov5prune-main\models\yolo.py", line 250, in init
self.model, self.save, self.from_to_map = parse_pruned_model(self.maskbndict, deepcopy(self.yaml), ch=[ch]) # model, savelist
File "C:\Users\Administrator\Desktop\yolov5prune-main\models\yolo.py", line 717, in parse_pruned_model
m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) # module
File "C:\Users\Administrator\Desktop\yolov5prune-main\models\common.py", line 131, in init
self.cv3 = Conv(2 * c_, c2, 1) # act=FReLU(c2)
File "C:\Users\Administrator\Desktop\yolov5prune-main\models\common.py", line 35, in init
self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 430, in init
super(Conv2d, self).init(
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 131, in init
self.weight = Parameter(torch.empty(
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:

  • (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
  • (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

prune.py errors while loading the pruned model. It can hardly be a problem in the Conv code itself, since training worked fine before... please advise.

Sparsity training with pruned weights fails

Hello, and thanks for the code; its logic is rigorous, and reading it helped me locate my earlier problem. Now I have a new one: iterating sparsity training -> pruning -> fine-tuning -> sparsity training fails on the second sparsity run. Can a pruned model be fed back into sparsity training?
[screenshot]

_pickle.UnpicklingError: STACK_GLOBAL requires str

[screenshot]
Hello, when I run train.py or train_sparsity.py following your steps I get the same _pickle.UnpicklingError: STACK_GLOBAL requires str traceback as in the train.py issue above.
How can this be fixed? Someone else ran into this problem before; how was it resolved?

How are the BN histogram plots made?

Thank you for this project; I have benefited a lot from it. But when inspecting the values after sparsity training there are so many numbers that I get lost, while your histogram plots are very intuitive. Could you share the code that draws the histograms? Many thanks.
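
For reference, a minimal sketch of such a plot (assuming a loaded model; this is not code from this repo):

    import matplotlib.pyplot as plt
    import torch
    import torch.nn as nn

    def plot_bn_hist(model, epoch):
        # Pool the |gamma| values of every BN layer into one histogram.
        gammas = torch.cat([m.weight.data.abs().cpu()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        plt.hist(gammas.numpy(), bins=100)
        plt.xlabel("gamma")
        plt.ylabel("count")
        plt.title(f"BN gamma distribution, epoch {epoch}")
        plt.savefig(f"bn_hist_{epoch}.png")
        plt.close()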

High memory usage

GPU: nvidia-3060 12G
torch 1.7.1+cu110
Python 3.7.0
Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz

How much memory does your model use after pruning?
My v5s shrank from 13 M to 8 M after pruning and uses about 700 M of GPU memory, but host RAM usage is 3 GB.

About a second round of sparsity -> prune -> finetuning

Hello!

  1. Can train_prune_sparsity.py be used for a second round of sparsity training?
  2. When I run a second round with train_prune_sparsity.py, even without --st, the model's mAP is 6-7 points below the result of the first round's fine-tuning, which puzzles me. Do I need to build the model via ModelPruned for this training?

The GPU is no longer used after pruning

[screenshot]
I pruned yolov5s down to about 5 M and detect runs successfully, but very slowly. After moving it to a Jetson Nano 2G I found it is not using the GPU (0.3 s per image). How can I optimize the speed further?

Accuracy recovers after finetuning, but inference got slower

Hello, after sparsity training -> pruning -> finetuning, the model size shrinks and the accuracy recovers, but inference time gets slower. How can this be resolved?

Dataset

Hello, which dataset were the results in the readme obtained on, and how many classes does it have?

Dimension mismatch during pruning

Hello,
after sparsity training per the readme, running prune.py raises: RuntimeError: Given groups=1, weight of size [115, 128, 1, 1], expected input[1, 111, 80, 80] to have 128 channels, but got 111 channels instead

Searching around suggests it might be an image-loading problem? But nobody else has filed this issue, so I hope you can help.~

| layer name | origin channels | remaining channels |
| model.0.conv.bn | 32 | 32 |
| model.1.bn | 64 | 64 |
| model.2.cv1.bn | 32 | 32 |
| model.2.cv2.bn | 32 | 32 |
| model.2.cv3.bn | 64 | 64 |
| model.2.m.0.cv1.bn | 32 | 32 |
| model.2.m.0.cv2.bn | 32 | 32 |
| model.3.bn | 128 | 121 |
| model.4.cv1.bn | 64 | 64 |
| model.4.cv2.bn | 64 | 47 |
| model.4.cv3.bn | 128 | 115 |
| model.4.m.0.cv1.bn | 64 | 64 |
| model.4.m.0.cv2.bn | 64 | 64 |
| model.4.m.1.cv1.bn | 64 | 64 |
| model.4.m.1.cv2.bn | 64 | 64 |
| model.4.m.2.cv1.bn | 64 | 64 |
| model.4.m.2.cv2.bn | 64 | 64 |
| model.5.bn | 256 | 217 |
| model.6.cv1.bn | 128 | 128 |
| model.6.cv2.bn | 128 | 52 |
| model.6.cv3.bn | 256 | 208 |
| model.6.m.0.cv1.bn | 128 | 128 |
| model.6.m.0.cv2.bn | 128 | 128 |
| model.6.m.1.cv1.bn | 128 | 128 |
| model.6.m.1.cv2.bn | 128 | 128 |
| model.6.m.2.cv1.bn | 128 | 128 |
| model.6.m.2.cv2.bn | 128 | 128 |
| model.7.bn | 512 | 446 |
| model.8.cv1.bn | 256 | 181 |
| model.8.cv2.bn | 512 | 254 |
| model.9.cv1.bn | 256 | 60 |
| model.9.cv2.bn | 256 | 83 |
| model.9.cv3.bn | 512 | 148 |
| model.9.m.0.cv1.bn | 256 | 57 |
| model.9.m.0.cv2.bn | 256 | 96 |
| model.10.bn | 256 | 111 |
| model.13.cv1.bn | 128 | 60 |
| model.13.cv2.bn | 128 | 92 |
| model.13.cv3.bn | 256 | 153 |
| model.13.m.0.cv1.bn | 128 | 63 |
| model.13.m.0.cv2.bn | 128 | 87 |
| model.14.bn | 128 | 98 |
| model.17.cv1.bn | 64 | 42 |
| model.17.cv2.bn | 64 | 61 |
| model.17.cv3.bn | 128 | 109 |
| model.17.m.0.cv1.bn | 64 | 36 |
| model.17.m.0.cv2.bn | 64 | 44 |
| model.18.bn | 128 | 51 |
| model.20.cv1.bn | 128 | 35 |
| model.20.cv2.bn | 128 | 61 |
| model.20.cv3.bn | 256 | 94 |
| model.20.m.0.cv1.bn | 128 | 45 |
| model.20.m.0.cv2.bn | 128 | 64 |
| model.21.bn | 256 | 106 |
| model.23.cv1.bn | 256 | 75 |
| model.23.cv2.bn | 256 | 44 |
| model.23.cv3.bn | 512 | 168 |
| model.23.m.0.cv1.bn | 256 | 57 |
| model.23.m.0.cv2.bn | 256 | 63 |

             from  n    params  module                                  arguments

0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.pruned_common.C3Pruned [64, 32, 32, 64, [[32, 32, 32]], 1, 128]
3 -1 1 69938 models.common.Conv [64, 121, 3, 2]
4 -1 1 150296 models.pruned_common.C3Pruned [121, 64, 47, 115, [[64, 64, 64], [64, 64, 64], [64, 64, 64]], 3, 256]
5 -1 1 225029 models.common.Conv [115, 217, 3, 2]
6 -1 1 570332 models.pruned_common.C3Pruned [217, 128, 52, 208, [[128, 128, 128], [128, 128, 128], [128, 128, 128]], 3, 512]
7 -1 1 835804 models.common.Conv [208, 446, 3, 2]
8 -1 1 265492 models.pruned_common.SPPPruned [446, 181, 254, [5, 9, 13]]
9 -1 1 116370 models.pruned_common.C3Pruned [254, 60, 83, 148, [[60, 57, 96]], 1, False]
10 -1 1 16650 models.common.Conv [148, 111, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 129894 models.pruned_common.C3Pruned [319, 60, 92, 153, [[60, 63, 87]], 1, False]
14 -1 1 15190 models.common.Conv [153, 98, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 49736 models.pruned_common.C3Pruned [213, 42, 61, 109, [[42, 36, 44]], 1, False]
18 -1 1 50133 models.common.Conv [109, 51, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 54147 models.pruned_common.C3Pruned [149, 35, 61, 94, [[35, 45, 64]], 1, False]
21 -1 1 89888 models.common.Conv [94, 106, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 81207 models.pruned_common.C3Pruned [217, 75, 44, 168, [[75, 57, 63]], 1, False]
detect input : ['model.0.conv.bn', 'model.1.bn', 'model.2.cv3.bn', 'model.3.bn', 'model.4.cv3.bn', 'model.5.bn', 'model.6.cv3.bn', 'model.7.bn', 'model.8.cv2.bn', 'model.9.cv3.bn', 'model.10.bn', 'model.10.bn', ['model.10.bn', 'model.6.cv3.bn'], 'model.13.cv3.bn', 'model.14.bn', 'model.14.bn', ['model.14.bn', 'model.4.cv3.bn'], 'model.17.cv3.bn', 'model.18.bn', ['model.18.bn', 'model.14.bn'], 'model.20.cv3.bn', 'model.21.bn', ['model.21.bn', 'model.10.bn'], 'model.23.cv3.bn'] 24 [17, 20, 23]
24 [17, 20, 23] 1 10098 models.yolo.Detect [4, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [109, 94, 168]]
Model Summary: 283 layers, 2771100 parameters, 2771100 gradients

Traceback (most recent call last):
File "prune.py", line 809, in
opt=opt
File "prune.py", line 548, in test_prune
model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters()))) # run once
File "/home/ubuntu/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/yolov5_prune/models/yolo.py", line 277, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/home/ubuntu/yolov5_prune/models/yolo.py", line 308, in forward_once
x = m(x) # run
File "/home/ubuntu/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/yolov5_prune/models/pruned_common.py", line 39, in forward
return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
File "/home/ubuntu/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/yolov5_prune/models/common.py", line 42, in forward
return self.act(self.bn(self.conv(x)))
File "/home/ubuntu/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/ubuntu/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [115, 128, 1, 1], expected input[1, 111, 80, 80] to have 128 channels, but got 111 channels instead

mAP is zero during sparsity training

Thanks for the open-source code. I first trained with train.py and got map=0.9. Then I trained with train_sparsity.py with sr=0.0001 and got map=0. To sanity-check your code I then set sr=0 and got map=0.1. In theory, with sr=0 train_sparsity.py should match train.py, but the gap is huge. Could you help? Is there a bug in the code?

A question about pruning

Could you tell me what is generally going wrong here?
[screenshot]
I wanted to prune my own model; the earlier steps were fine, but then a CUDA error appeared here.
[screenshot]
Printing showed the problem is in this layer, so I tried to inspect its contents.
[screenshot]
Everything before it prints fine, but this one cannot be printed at all, even though it is the same kind of object.
[screenshot]
Very strange.

When only 1 channel remains, pruning raises 'too many indices for tensor of dimension 3'

The exact error:
# This is out_idx printed out; there is an np.unsqueeze() before it, and when only one channel remains that function loses even that single dimension
out_idx.shape: ()
layer.weight.size(): torch.Size([128, 512, 1, 1])
Traceback (most recent call last):
File "prune.py", line 827, in
opt=opt
File "prune.py", line 506, in test_prune
pruned_layer.weight.data = w[:,formerin, :, :].clone()
IndexError: too many indices for tensor of dimension 3

Pruning failed

Environment: yolov5 freshly pulled from GitHub, plus some files from yolov5prune.
Testing inside the yolov5prune project itself would not run, so I copied train_sparsity.py, prune.py and prune_utils.py into the latest yolov5 project for testing.

  1. In the yolov5 project I edited data/yolov5ss.yaml to train on my own data,
    set depth_multiple: 0.01 and width_multiple: 0.01 in models/yolov5ss.yaml,
    and trained with train_sparsity.py, which eventually produced a very small weights file:
    python train.py --adam --epochs 100
    python train_sparsity.py --st --sr 0.0001 --weights runs/train/exp/lest.pt --adam --epochs 100

  2. After training, pruning:
    python prune.py --weights runs/train/exp1/weights/last.pt --percent 0.3

[screenshot]

It cannot get past this point.

Questions:

  1. Does setting depth_multiple: 0.01 and width_multiple: 0.01 in models/yolov5ss.yaml affect the later pruning, and is that kind of change allowed at all?
  2. Which yolov5 version is yolov5prune based on? Why can't it train on my own data?
    Could it be synced with the latest yolov5?
    Many thanks!

prune

I trained in the yolov5 project with yolov5m6.yaml,
using train_sparsity.py, which eventually produced a very small weights file:
python train.py --adam --epochs 100
python train_sparsity.py --st --sr 0.0001 --weights runs/train/exp/lest.pt --adam --epochs 100
After training, pruning:
python prune.py --weights runs/train/exp1/weights/last.pt --percent 0.3
It fails at this point with the following output:
Fusing layers...
Model Summary: 396 layers, 35439396 parameters, 0 gradients, 51.4 GFLOPS
prune module : dict_keys([])
model_list: {}
bn_weights: tensor([])
prune.py:382: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1603729066392/work/torch/csrc/utils/python_arg_parser.cpp:882.)
percent_limit = (sorted_bn == highest_thre).nonzero()[0, 0].item() / len(bn_weights)
Traceback (most recent call last):
File "prune.py", line 818, in
opt=opt
File "prune.py", line 382, in test_prune
percent_limit = (sorted_bn == highest_thre).nonzero()[0, 0].item() / len(bn_weights)
IndexError: index 0 is out of bounds for dimension 0 with size 0
I do not know the cause; could you please take a look?
Many thanks!

Some questions about the sr parameter

First of all, thank you for open-sourcing this project.
A few questions:

  1. "The choice of sr must be tuned per dataset; it can be chosen by watching the mAP and the gamma histograms in TensorBoard" - I don't fully understand this sentence; could you elaborate?
  2. Using your suggested sr of 0.001, on my own dataset the accuracy from train_sparsity.py differs from train.py, and lowering sr raises the accuracy. What is the underlying reason?
    My experiments and their screenshots:
    sr = 0.001 -> 0.0001
    [screenshot]
    sr = 0.0001 -> 0.00001
    [screenshot]
    sr = 0.00001 -> 0.000001
    [screenshot]
    The mAP@0.5 statistics respectively:
    [screenshot]
