
metapruning's Introduction

MetaPruning

This is the PyTorch implementation of our paper "MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning" (https://arxiv.org/abs/1903.10258), published at ICCV 2019.

Traditional pruning decides which channels to prune in each layer and requires human effort to set the per-layer pruning ratios. MetaPruning automatically searches for the best pruning ratio of each layer (i.e., the number of channels in each layer).

MetaPruning contains two steps:

  1. train a meta network (PruningNet) that provides reliable weights for all possible combinations of channel numbers in each layer (pruned-net structures).
  2. search for the best pruned net with an evolutionary algorithm, then evaluate that best pruned net by training it from scratch. (A minimal sketch of the PruningNet idea follows.)
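
To make step 1 concrete, here is a minimal, self-contained PyTorch sketch of the PruningNet idea (an illustration only, not this repository's code): a small fully connected layer generates the weights of a convolution from an encoding of the sampled channel counts, and the generated weights are cropped to the sampled width before the forward pass.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaConv(nn.Module):
    def __init__(self, max_in, max_out, k=3):
        super().__init__()
        self.max_in, self.max_out, self.k = max_in, max_out, k
        # Fully connected layer mapping the channel encoding to a full-size weight tensor.
        self.weight_gen = nn.Linear(2, max_out * max_in * k * k)

    def forward(self, x, in_ch, out_ch):
        # Encode the sampled channel counts (normalized to [0, 1]).
        enc = torch.tensor([in_ch / self.max_in, out_ch / self.max_out], device=x.device)
        w = self.weight_gen(enc).view(self.max_out, self.max_in, self.k, self.k)
        # Crop the generated weights to the sampled pruned-net width.
        return F.conv2d(x, w[:out_ch, :in_ch], padding=self.k // 2)

layer = MetaConv(max_in=32, max_out=64)
x = torch.randn(1, 16, 8, 8)       # input width sampled as 16 for this batch
y = layer(x, in_ch=16, out_ch=48)  # output width sampled as 48
print(y.shape)                     # torch.Size([1, 48, 8, 8])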

Citation

If you use the code in your research, please cite:

@inproceedings{liu2019metapruning,
  title={Metapruning: Meta learning for automatic neural network channel pruning},
  author={Liu, Zechun and Mu, Haoyuan and Zhang, Xiangyu and Guo, Zichao and Yang, Xin and Cheng, Kwang-Ting and Sun, Jian},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={3296--3305},
  year={2019}
}

Run

  1. Requirements:

    • Python 3, PyTorch 1.1.0, torchvision 0.3.0
  2. ImageNet data:

    • You need to split the original training images into a sub-validation set, containing 50,000 images randomly selected from the training images (50 images from each of the 1,000 classes), and a sub-training set containing the remaining images. The PruningNet is trained on the sub-training set, and the search uses the sub-validation set to infer model accuracy. (The first sketch after this list shows one way to perform the split.)
  3. Steps to run:

    • Step 1: training

    • Step 2: searching

    • Step 3: evaluating

    • After training the PruningNet, checkpoint.pth.tar will be generated in the training folder and is loaded by the searching algorithm. After searching is done, the top-1 encoding vector is shown in the log. By simply copying that encoding vector into rngs = [ ] in evaluate.py (the second sketch below illustrates this step), you can evaluate the pruned network corresponding to this encoding vector.
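
For the ImageNet split described in step 2 above, here is a hedged sketch of one way to perform it (the directory names are assumptions, not a layout the repo requires): move 50 randomly chosen images per class out of the ImageNet training folder.

import os
import random
import shutil

train_dir = 'imagenet/train'    # original training set, one subfolder per class
subval_dir = 'imagenet/subval'  # receives 50 images per class (50,000 total)

random.seed(0)
for cls in sorted(os.listdir(train_dir)):
    src, dst = os.path.join(train_dir, cls), os.path.join(subval_dir, cls)
    os.makedirs(dst, exist_ok=True)
    for name in random.sample(sorted(os.listdir(src)), 50):
        shutil.move(os.path.join(src, name), os.path.join(dst, name))
# Whatever remains in train_dir is the sub-training set.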
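
The evaluation step then amounts to pasting the searched encoding into evaluate.py; the numbers below are placeholders, not a real search result.

# In evaluate.py (MobileNet v1), paste the top-1 vector printed by the search log:
rngs = [10, 12, 8, 14, 9, 11]  # placeholder values; use the vector from your own log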

Models

MobileNet v1

Ratio   Uniform Top1-Acc   Uniform FLOPs   MetaPruning Top1-Acc   MetaPruning FLOPs   Model
1x      70.6%              569M            -                      -                   -
0.75x   68.4%              325M            70.9%                  316M                Model-MetaP-Mbv1-0.75
0.5x    63.7%              149M            66.1%                  142M                Model-MetaP-Mbv1-0.5
0.25x   50.6%              41M             57.2%                  41M                 Model-MetaP-Mbv1-0.25

MobileNet v2

Uniform Top1-Acc   Uniform FLOPs   MetaPruning Top1-Acc   MetaPruning FLOPs   Model
74.7%              585M            -                      -                   -
72.0%              313M            72.7%                  303M                Model-MetaP-Mbv2-300M
67.2%              140M            68.2%                  140M                Model-MetaP-Mbv2-140M
54.6%              43M             58.3%                  43M                 Model-MetaP-Mbv2-40M

ResNet

Ratio   Uniform Top1-Acc   Uniform FLOPs   MetaPruning Top1-Acc   MetaPruning FLOPs   Model
1x      76.6%              4.1G            -                      -                   -
0.75x   74.8%              2.3G            75.4%                  2.0G                Model-MetaP-ResN-0.75
0.5x    72.0%              1.1G            73.4%                  1.0G                Model-MetaP-ResN-0.5

metapruning's People

Contributors

liuzechun

metapruning's Issues

FLOPs calculation

Hi,
Thanks for your code. How do you calculate the FLOPs for the MobileNet-V2 model? Do you count a multiply-add as one operation or as two?
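
For reference (a hedged illustration, not the repo's FLOPs code): the 569M figure for the full MobileNet v1 in the table above matches the common convention of counting one multiply-add as one FLOP. A small helper showing that convention:

def conv_flops(h_out, w_out, c_in, c_out, k, groups=1):
    # Multiply-adds of a convolution, counting one multiply-add as one FLOP.
    return h_out * w_out * c_out * (c_in // groups) * k * k

# First layer of MobileNet v1: 3x3 conv, stride 2, 3 -> 32 channels, 224x224 input.
print(conv_flops(112, 112, 3, 32, 3))  # 10838016, i.e. ~10.8M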

Problems in searching and evaluating stage

Hi, I really appreciate this work and want to reproduce the paper's results, but I ran into two problems in my experiments.
First, as mentioned in the paper, the searching process only costs a few seconds, but I ran it for over one day without a result. Are any searched results (e.g., a model architecture vector) corresponding to the paper's numbers available? Even one result on ResNet-50 would be fine.
Second, as mentioned in README.md, "By simply copying the encoding vector to the rngs = [ ] in evaluate.py, you can evaluate the Pruned Network corresponding to this encoding vector.", but I cannot find "rngs = [ ]" in evaluate.py and cannot run it.
By the way, the ResNet-50 model provided on OneDrive is oversized and cannot be downloaded. Could you provide it another way?
Looking forward to your reply; Chinese would be fine too.

running error

Thank you for your interesting work. When I run the ResNet code, the error below appears. I tried adjusting the batch size, but it did not help. Any idea about this?
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 6.31 GiB already allocated; 8.02 MiB free; 32.19 MiB cached)

Where to find the following two parameters in mobilenetv2.py?

To evaluate mobilenetv2 on my dataset, I don't know where to find the following two parameters.

overall_channel_ids = [14, 12, 11, 11, 8, 8, 8, 7, 7, 7, 7, 8, 8, 8, 14, 14, 14, 8]
mid_channel_ids = [6, 5, 2, 15, 7, 17, 14, 5, 11, 12, 12, 11, 6, 16, 14, 13, 14]

After the search step, I got the following files in the search directory:
[screenshot of the files in the search directory]
Waiting for your reply!

question about the Model-MetaP-Mbv2-140M accuracy

After loading the pretrained model and starting the evaluation process, the top-1 accuracy of the first batch on the training dataset is only 46.09%. Why is the training accuracy so low?

The steps are as follows.
First, I downloaded the Model-MetaP-Mbv2-140M directory, which contains two files: checkpoint.pth.tar and mobilenet_v2.py.
Then I copied mobilenet_v2.py to the evaluating directory and checkpoint.pth.tar to evaluating/models, and ran the run.sh script in the evaluating directory. The log is shown in the picture below.

[screenshot of the evaluation log]

Besides, when I comment out the training part in evaluate.py, the accuracy on the eval dataset is similar.

The speed of training mobilenetv2 PruningNet is slow.

Hi, I am trying to train the mobilenetv2 PruningNet from scratch with 4 V100 GPUs (batch_size=256). I find that training one batch takes about 3 seconds (probably because of the random sampling of the network structure during training). Is that normal? How long did it take you to train the mobilenetv2 PruningNet from scratch (64 epochs)?

part of train log:
Epoch: [0][0/5004] Time 3.857 (3.857) Data 0.000 (0.000) Loss 6.9178 (6.9178) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
Epoch: [0][1/5004] Time 3.421 (3.639) Data 0.000 (0.000) Loss 6.9392 (6.9285) Prec@1 0.000 (0.000) Prec@5 0.781 (0.391)
Epoch: [0][2/5004] Time 3.475 (3.584) Data 0.000 (0.000) Loss 6.9520 (6.9363) Prec@1 0.000 (0.000) Prec@5 0.391 (0.391)
Epoch: [0][3/5004] Time 3.235 (3.497) Data 0.000 (0.000) Loss 6.9477 (6.9392) Prec@1 0.000 (0.000) Prec@5 0.781 (0.488)
Epoch: [0][4/5004] Time 3.162 (3.430) Data 0.000 (0.000) Loss 6.9354 (6.9384) Prec@1 0.781 (0.156) Prec@5 0.781 (0.547)
Epoch: [0][5/5004] Time 3.129 (3.380) Data 0.000 (0.000) Loss 6.9591 (6.9419) Prec@1 0.391 (0.195) Prec@5 0.391 (0.521)
Epoch: [0][6/5004] Time 3.146 (3.347) Data 0.000 (0.000) Loss 6.9494 (6.9429) Prec@1 0.781 (0.279) Prec@5 0.781 (0.558)
Epoch: [0][7/5004] Time 3.138 (3.321) Data 0.000 (0.000) Loss 6.9903 (6.9489) Prec@1 0.000 (0.244) Prec@5 0.781 (0.586)
Epoch: [0][8/5004] Time 3.393 (3.329) Data 0.000 (0.000) Loss 6.9696 (6.9512) Prec@1 0.000 (0.217) Prec@5 0.000 (0.521)
Epoch: [0][9/5004] Time 3.495 (3.345) Data 0.000 (0.000) Loss 7.0030 (6.9563) Prec@1 0.000 (0.195) Prec@5 0.000 (0.469)
Epoch: [0][10/5004] Time 3.307 (3.342) Data 0.000 (0.000) Loss 7.0157 (6.9617) Prec@1 0.391 (0.213) Prec@5 0.781 (0.497)
Epoch: [0][11/5004] Time 3.254 (3.334) Data 0.000 (0.000) Loss 7.0124 (6.9660) Prec@1 0.000 (0.195) Prec@5 0.781 (0.521)
Epoch: [0][12/5004] Time 3.694 (3.362) Data 0.000 (0.000) Loss 7.0236 (6.9704) Prec@1 0.000 (0.180) Prec@5 1.172 (0.571)
Epoch: [0][13/5004] Time 3.186 (3.350) Data 0.000 (0.000) Loss 7.0330 (6.9749) Prec@1 0.000 (0.167) Prec@5 0.000 (0.530)
Epoch: [0][14/5004] Time 3.180 (3.338) Data 0.000 (0.000) Loss 7.0146 (6.9775) Prec@1 0.000 (0.156) Prec@5 0.781 (0.547)
Epoch: [0][15/5004] Time 3.272 (3.334) Data 0.000 (0.000) Loss 7.1130 (6.9860) Prec@1 0.000 (0.146) Prec@5 0.000 (0.513)
Epoch: [0][16/5004] Time 2.912 (3.309) Data 0.000 (0.000) Loss 7.0441 (6.9894) Prec@1 0.000 (0.138) Prec@5 0.781 (0.528)
Epoch: [0][17/5004] Time 3.199 (3.303) Data 0.000 (0.000) Loss 7.0701 (6.9939) Prec@1 0.000 (0.130) Prec@5 0.391 (0.521)
Epoch: [0][18/5004] Time 3.163 (3.296) Data 0.000 (0.000) Loss 7.1076 (6.9999) Prec@1 0.000 (0.123) Prec@5 0.000 (0.493)
Epoch: [0][19/5004] Time 3.197 (3.291) Data 0.000 (0.000) Loss 7.1321 (7.0065) Prec@1 0.000 (0.117) Prec@5 0.391 (0.488)
Epoch: [0][20/5004] Time 3.116 (3.283) Data 0.000 (0.000) Loss 7.0883 (7.0104) Prec@1 0.000 (0.112) Prec@5 0.391 (0.484)
Epoch: [0][21/5004] Time 3.464 (3.291) Data 0.000 (0.000) Loss 7.0444 (7.0119) Prec@1 0.000 (0.107) Prec@5 0.000 (0.462)
Epoch: [0][22/5004] Time 3.135 (3.284) Data 0.000 (0.000) Loss 7.0642 (7.0142) Prec@1 0.000 (0.102) Prec@5 0.391 (0.459)
Epoch: [0][23/5004] Time 3.392 (3.288) Data 0.000 (0.000) Loss 7.0659 (7.0163) Prec@1 0.000 (0.098) Prec@5 0.781 (0.472)
Epoch: [0][24/5004] Time 3.117 (3.282) Data 0.000 (0.000) Loss 7.0385 (7.0172) Prec@1 0.000 (0.094) Prec@5 0.391 (0.469)
Epoch: [0][25/5004] Time 3.271 (3.281) Data 0.000 (0.000) Loss 7.0659 (7.0191) Prec@1 0.000 (0.090) Prec@5 0.781 (0.481)
Epoch: [0][26/5004] Time 3.461 (3.288) Data 0.000 (0.000) Loss 7.0382 (7.0198) Prec@1 0.000 (0.087) Prec@5 0.391 (0.477)
Epoch: [0][27/5004] Time 2.958 (3.276) Data 0.000 (0.000) Loss 7.0603 (7.0213) Prec@1 0.000 (0.084) Prec@5 0.000 (0.460)
Epoch: [0][28/5004] Time 3.120 (3.271) Data 0.000 (0.000) Loss 7.1257 (7.0249) Prec@1 0.391 (0.094) Prec@5 0.391 (0.458)
Epoch: [0][29/5004] Time 3.212 (3.269) Data 0.000 (0.000) Loss 7.0864 (7.0269) Prec@1 0.000 (0.091) Prec@5 0.391 (0.456)
Epoch: [0][30/5004] Time 3.090 (3.263) Data 0.000 (0.000) Loss 7.1347 (7.0304) Prec@1 0.391 (0.101) Prec@5 0.391 (0.454)
Epoch: [0][31/5004] Time 2.839 (3.250) Data 0.000 (0.000) Loss 7.0732 (7.0317) Prec@1 0.000 (0.098) Prec@5 0.781 (0.464)
Epoch: [0][32/5004] Time 3.346 (3.253) Data 0.000 (0.000) Loss 7.1425 (7.0351) Prec@1 0.391 (0.107) Prec@5 0.391 (0.462)
Epoch: [0][33/5004] Time 3.508 (3.260) Data 0.000 (0.000) Loss 7.0733 (7.0362) Prec@1 0.000 (0.103) Prec@5 0.781 (0.471)
Epoch: [0][34/5004] Time 3.215 (3.259) Data 0.000 (0.000) Loss 7.1465 (7.0394) Prec@1 0.000 (0.100) Prec@5 0.000 (0.458)
Epoch: [0][35/5004] Time 3.071 (3.254) Data 0.000 (0.000) Loss 7.0800 (7.0405) Prec@1 0.781 (0.119) Prec@5 1.562 (0.488)

PruningNet is hard to train

I tried it on my custom model and found it is really hard to train.

Another question: can the PruningNet be trusted as a justified evaluation proxy during the EA process?

Failed to reproduce MobileNet-v2 results (eval)

Hi, thanks for your inspiring work and published code.

I have encountered a problem reproducing the results for MobileNet-v2 140M. I directly downloaded the script for MobileNet-140M and used the default parameter settings in the script, but only got Top-1 Acc: 67.40%, Top-5 Acc: 87.47% (the reported Top-1 Acc is 68.2% in the paper).

May I know if you used the default parameter settings in the script to obtain the MobileNet-v2 results?

Thanks.

The accuracy of the PruningNet

Hi, although I have reproduced the accuracy of the pruned network reported in the paper, I found that the performance of the PruningNet trained in the first stage is not good. Combining the results of the following two stages, the better the PruningNet is, the higher the accuracy of the pruned network.
Could you please share the accuracy of your PruningNet?

Why is the val result of the PruningNet model 0?

Hello, your MetaPruning performance is excellent.

However, when I start the PruningNet training, the val accuracy is always 0. My steps are as follows:

  1. Download the MetaPruning repository code and cd into the mobilenetv2/training directory.

  2. Change the batch_size and learning_rate in train.py according to #2 (comment). The default values are 512 and 0.25 respectively. Since I have only four 2080Ti GPUs on a single machine, I reduced them by half, to 256 and 0.125, to avoid GPU out-of-memory errors.

  3. Split the ImageNet training data into subtrain and subval5w, and set traindir and valdir in train.py to these two directories.

  4. Run the run.sh script; the log is shown in the screenshot below.
    [screenshot of the training log]

So where am I going wrong? Looking forward to your suggestions.

data augmentation

Thanks for the great work.
I have a question about the data augmentation method.

I notice that RandomResizedCrop(224, scale=(crop_scale, 1.0)) and Lighting(lighting_param) are used, which differs from the official PyTorch data augmentation.

What is the gap between these two data augmentation methods?
Thanks.
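
For readers comparing the two pipelines, here is a hedged sketch: the Lighting transform below is the common AlexNet-style PCA color-noise re-implementation, not code copied from this repository, and the crop_scale / lighting_param values are assumptions.

import torch
import torchvision.transforms as transforms

class Lighting:
    # AlexNet-style PCA lighting noise (a common re-implementation).
    eigval = torch.tensor([0.2175, 0.0188, 0.0045])
    eigvec = torch.tensor([[-0.5675,  0.7192,  0.4009],
                           [-0.5808, -0.0045, -0.8140],
                           [-0.5836, -0.6948,  0.4203]])

    def __init__(self, alphastd):
        self.alphastd = alphastd

    def __call__(self, img):  # img: a CxHxW float tensor (after ToTensor)
        alpha = img.new_empty(3).normal_(0, self.alphastd)
        rgb = (self.eigvec * alpha * self.eigval).sum(dim=1)
        return img + rgb.view(3, 1, 1)

crop_scale, lighting_param = 0.08, 0.1  # assumed values
repo_style = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(crop_scale, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    Lighting(lighting_param),
])
# The official PyTorch ImageNet example instead uses RandomResizedCrop(224) with its
# default scale=(0.08, 1.0), no lighting noise, and per-channel mean/std normalization.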

Object detection

Can MetaPruning be applied to prune object-detection networks based on ResNet + FPN?

Is MB v2 code complete?

Hi,

In reference to issue #36, according to the README:

After searching is done, the top1 encoding vector will be shown in the log. By simply copying the encoding vector to the rngs = [ ] in evaluate.py, you can evaluate the Pruned Network corresponding to this encoding vector.

I did a quick search and found that rngs is only used in the MB v1 eval. The MB v2 search log contains the top-50 results for each iteration. So the first question is: which part of the log should be copied? Is it the No.1 array generated by the last iteration?

The second question is that the MB v2 channel-config code seems to be different from MB v1's. Can you please provide the steps, or a patch, that will get the MB v2 eval to work?

How to evaluate mobilenet v2 with a new dataset?

Thanks for your work!
I have trained MobileNet v2 with my own dataset. Until now, I have finished the training step and the search step, but when I tried to execute the evaluate step, I couldn't find 'rngs' in evaluate.py. Could you describe the evaluate step in more detail?
Waiting for your reply!

searching accuracy is low

I really appreciate your excellent work. I want to reproduce the paper's results. When I finished training the model, I ran the following command to search:

sh mobilenetv2/searching/run.sh

However, the model accuracy is very low:


 20. 21. 26. 24. 29.  5. 18. 27. 23. 30. 25. 25. 30. 24. 26. 21. 18.] Top-1 err = 98.06400299072266
No.33 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 18. 21. 26. 24. 27.  5. 18. 25. 23. 30. 25. 25.  6. 24. 26. 21. 26.] Top-1 err = 98.06400299072266
No.34 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 18. 27. 23. 30. 25. 25.  6. 24. 26. 21. 26.] Top-1 err = 98.06400299072266
No.35 [10. 13. 13. 13. 18. 18. 18. 26. 26. 26. 26. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 28. 27.  4. 30. 25. 25.  6. 24. 26. 21. 18.] Top-1 err = 98.06400299072266
No.36 [10. 13. 13. 13. 18. 18. 18. 21. 21. 21. 21. 26. 26. 26. 29. 29. 29. 20.
 20. 11. 26. 24. 29.  5. 18. 27. 23. 30. 24. 25.  6. 24. 26. 21. 18.] Top-1 err = 98.06400299072266
No.37 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 18. 26. 26. 25. 27.  5. 18. 27. 14. 30. 22. 25.  6. 24. 26. 21. 19.] Top-1 err = 98.06400299072266
No.38 [20. 13. 13. 13. 18. 18. 18. 10. 10. 10. 10. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 28. 27.  4. 30. 25. 25. 30. 24. 26. 21. 18.] Top-1 err = 98.06400299072266
No.39 [10. 13. 13. 13. 18. 18. 18. 26. 26. 26. 26. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 18. 27. 23. 30. 25. 25.  6. 24. 26. 21. 13.] Top-1 err = 98.06400299072266
No.40 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
  9. 26. 26. 24. 27.  5. 18. 27. 15. 30. 25. 25.  6. 24. 26. 21. 18.] Top-1 err = 98.06400299072266
No.41 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 18. 27. 23. 30. 25. 25. 16. 24. 26. 21. 26.] Top-1 err = 98.06600189208984
No.42 [10. 13. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 21. 29.  5. 18. 27. 23. 30. 25. 25. 30. 24. 26. 21.  7.] Top-1 err = 98.06600189208984
No.43 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 28. 25. 23. 30. 25. 25. 16. 24. 26. 21. 26.] Top-1 err = 98.06600189208984
No.44 [20. 13. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 18. 27. 23. 30. 25. 25.  6. 24. 26. 21. 18.] Top-1 err = 98.06600189208984
No.45 [10. 13. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 18. 21. 26. 24. 27.  5. 18. 27. 23. 30. 25. 25.  6. 24. 26. 21. 18.] Top-1 err = 98.06600189208984
No.46 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 17. 17. 17. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 18. 27. 23. 30. 25. 25. 30. 24. 26. 21. 18.] Top-1 err = 98.06600189208984
No.47 [20.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 28. 25.  4. 30. 25. 25. 30. 24. 26. 21. 18.] Top-1 err = 98.06600189208984
No.48 [19. 13. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 25. 20.  4. 30. 25. 25. 30. 24. 26. 21. 18.] Top-1 err = 98.06600189208984
No.49 [10. 13. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 29.  5. 28. 30. 23. 30. 25. 25.  6. 24. 26. 21. 18.] Top-1 err = 98.06600189208984
No.50 [10.  9. 13. 13. 18. 18. 18. 11. 11. 11. 11. 26. 26. 26. 29. 29. 29. 20.
 20. 21. 26. 24. 27.  5. 18. 27. 14. 30. 25. 25. 30. 24. 26. 21. 18.] Top-1 err = 98.06600189208984
mutation ......
mutation_num = 25
crossover ......
crossover_num = 25
random select ........
random_num = 0
saving tested_dict ........


Is this a normal phenomenon during searching?

model of PruningNet

I really appreciate your work and want to follow it, but training the PruningNet on ImageNet requires too many GPU resources. Could you provide the PruningNet model for us?

The precision of pruned ResNet-50 trained from scratch

Hi, I used the given pruned ResNet-50 network with 2G FLOPs (ids = [23, 17, 16, 17, 16, 16, 18, 20, 18, 25, 11, 29, 22, 18, 13, 27, 17, 23, 23, 20]) and trained it from scratch with the given code, but I cannot reach the accuracy given in the paper. Can you give me some suggestions about the parameter configuration, the number of GPUs, and the training method you used? Thank you very much.

searching accuracy is low

Hi zechun,

I'm very interested in this work. However, when I follow the searching procedure, the accuracy on subval5w is only 3.7%. That seems unreasonable, but I cannot find the reason. My steps are as follows:

step 1. Split the ImageNet training data into subtrain and subval5w, and randomly select 20k samples from subtrain as subtrain2w.
step 2. Train the mobilenetV2 PruningNet on the subtrain data; after 64 epochs the train Acc@1 is about 45%.
step 3. Load the pretrained model, set subtrain2w and subval5w as the train and val data respectively, and run the search procedure. I double-checked the loaded model; it is indeed the pretrained one.
step 4. The Acc@1 on subval5w is about 3.7%; the log is as follows,

"""
loaded checkpoint /data/glusterfs_workspaces/11070468/MetaPruning-master-P40-batchSize/searching_log/mobilenetV2/models/checkpoint.pth.tar epoch = 63
population_num = 50 select_num = 50 mutation_num = 25 crossover_num = 25 random_num = 0 max_iters = 20
random select ........
random_num = 50
test 1th model
[18, 23, 18, 18, 28, 28, 28, 27, 27, 27, 27, 2, 2, 2, 12, 12, 12, 15, 4, 12, 25, 23, 20, 14, 24, 8, 19, 26, 2, 10, 24, 16, 6, 26, 10]
FLOPs = 207.98M
(18.0, 23.0, 18.0, 18.0, 28.0, 28.0, 28.0, 27.0, 27.0, 27.0, 27.0, 2.0, 2.0, 2.0, 12.0, 12.0, 12.0, 15.0, 4.0, 12.0, 25.0, 23.0, 20.0, 14.0, 24.0, 8.0, 19.0, 26.0, 2.0, 10.0, 24.0, 16.0, 6.0, 26.0, 10.0)
train batch 0, loss: 2.62265944480896
train batch 1, loss: 2.5418784618377686
train batch 2, loss: 2.592099666595459
train batch 3, loss: 2.5686259269714355
train batch 4, loss: 2.498969316482544
train batch 5, loss: 2.5880398750305176
train batch 6, loss: 2.5917248725891113
train batch 7, loss: 2.5251238346099854
train batch 8, loss: 2.6498796939849854
train batch 9, loss: 2.642388343811035
train batch 10, loss: 2.4585468769073486
train batch 11, loss: 2.495739221572876
train batch 12, loss: 2.482715606689453
train batch 13, loss: 2.5206358432769775
train batch 14, loss: 2.7063348293304443
train batch 15, loss: 2.5236918926239014
train batch 16, loss: 2.5139198303222656
train batch 17, loss: 2.490814208984375
train batch 18, loss: 2.5737462043762207
train batch 19, loss: 2.40185809135437
eval batch 0, loss: 7.643044948577881
eval batch 1, loss: 8.309141159057617
eval batch 2, loss: 9.017401695251465
eval batch 3, loss: 7.960979461669922
eval batch 4, loss: 7.350314617156982
eval batch 5, loss: 8.273918151855469
eval batch 6, loss: 8.442761421203613
eval batch 7, loss: 7.337158203125
eval batch 8, loss: 7.1714630126953125
eval batch 9, loss: 8.070206642150879
eval batch 10, loss: 8.065461158752441
eval batch 11, loss: 8.770538330078125
eval batch 12, loss: 7.867849349975586
eval batch 13, loss: 8.286103248596191
eval batch 14, loss: 7.584053039550781
eval batch 15, loss: 6.936843395233154
eval batch 16, loss: 6.62252140045166
eval batch 17, loss: 8.285584449768066
eval batch 18, loss: 7.869260787963867
eval batch 19, loss: 7.220644474029541
eval batch 20, loss: 7.309091567993164
eval batch 21, loss: 7.2295942306518555
eval batch 22, loss: 7.227307319641113
eval batch 23, loss: 7.233340740203857
eval batch 24, loss: 7.1270551681518555
eval batch 25, loss: 6.655908584594727
eval batch 26, loss: 7.097906112670898
eval batch 27, loss: 6.717674255371094
eval batch 28, loss: 6.4711737632751465
eval batch 29, loss: 6.653423309326172
eval batch 30, loss: 6.54898738861084
eval batch 31, loss: 6.594447612762451
eval batch 32, loss: 6.689351558685303
eval batch 33, loss: 7.512929916381836
eval batch 34, loss: 7.239442825317383
eval batch 35, loss: 7.165963172912598
eval batch 36, loss: 6.749525547027588
eval batch 37, loss: 6.8322224617004395
eval batch 38, loss: 7.071037769317627
eval batch 39, loss: 6.838944435119629
eval batch 40, loss: 6.563052654266357
eval batch 41, loss: 6.997549533843994
eval batch 42, loss: 6.260917663574219
eval batch 43, loss: 6.890049934387207
eval batch 44, loss: 7.117074489593506
eval batch 45, loss: 6.359354496002197
eval batch 46, loss: 8.009349822998047
eval batch 47, loss: 6.258756637573242
eval batch 48, loss: 7.236196994781494
eval batch 49, loss: 7.505835056304932

Acc@1 3.722 Acc@5 10.550
Top1_err = 96.28 Top5_err = 89.45 loss = 7.3050
test 2th model
[14, 4, 1, 1, 17, 17, 17, 4, 4, 4, 4, 6, 6, 6, 14, 14, 14, 8, 16, 19, 24, 20, 18, 26, 26, 2, 7, 30, 30, 27, 0, 8, 6, 12, 14]
FLOPs = 133.81M
"""

Looking forward to your suggestions. @liuzechun

A question about latency

Hi, I have a question about how to get the latency.
When I read the paper, there are two constraints, FLOPs and latency. How do you get the latency on a given device?

Looking forward to your answer! Thanks!
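
A hedged sketch of the basic measurement (the paper builds per-layer latency lookup tables; this only shows a direct timing loop on the target device, with warm-up and CUDA synchronization):

import time
import torch
import torchvision

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = torchvision.models.mobilenet_v2().eval().to(device)
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(x)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(x)
    if device == 'cuda':
        torch.cuda.synchronize()  # wait for queued kernels before reading the clock
print('latency: %.2f ms/image' % ((time.time() - start) / 100 * 1000))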

About the convergence of the PruningNet

The input to the PruningNet is random. I wonder whether it can converge; in the hypernetwork paper, the input is fixed. Besides, it seems you use lots of independent small FC networks instead of one fixed, shared hypernetwork as the PruningNet: many small PruningNets rather than one fixed one. Is this the meta-learning way?

confusion about the code

Hello zechun, first of all, I appreciate your excellent work very much. It is very novel.
Here are some questions about the code.

  1. Why is the code for evaluation almost the same as for training? In particular, the path of the validation set is exactly the same in the evaluation and training code. I think the validation-set path in the training code should be the sub-validation set split from the training data, and the validation-set path in the evaluation code should be the real ImageNet validation set. Is that right?

  2. Is the MobileNetV1 here a typo?

Integrate constraint into genetic algorithm

Hi! If one wants to prune a CNN model with the pruning ratio set to 0.4 using a genetic algorithm, how should that constraint be applied within the GA? I am not sure whether to modify the population initialization or the fitness function; crossover/mutation operations will also affect the pruning ratio.
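
One common answer (a hedged sketch, not this repo's search code) is reject-and-resample: keep the fitness function unchanged and discard candidates produced by initialization, mutation, or crossover until they satisfy the FLOPs/ratio budget. compute_flops below is a placeholder.

import random

def compute_flops(encoding):
    return sum(encoding) * 1e6  # placeholder mapping from an encoding to a FLOPs estimate

def legal(encoding, budget):
    return compute_flops(encoding) <= budget

def mutate_until_legal(parent, budget, num_choices=31, p=0.1, max_tries=100):
    for _ in range(max_tries):
        child = [random.randrange(num_choices) if random.random() < p else g
                 for g in parent]
        if legal(child, budget):
            return child
    return None  # caller may fall back to the parent or resample elsewhere

parent = [20] * 35
print(mutate_until_legal(parent, budget=800e6))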

How about the GPU memory when searching on ResNet-50?

I think constructing the whole model with fully connected layers causes very large memory usage, specifically when using the last stage of ResNet-50, which has 1024 max channels.

Where can I change the pruning ratio?

I have finished the three steps of MetaPruning using my own dataset, but I did not find where to change the pruning ratio to evaluate the result of the MetaPruning net. Thanks for your help.

Why random the "scale_ids" in every batch

train.py
175 line, random the scale which means the model in every layer need hold the number of channels.
And I find every batch random once.
If every batch random once,how to know which number of channels is the best combination in the model.
In a epoch,there are many batchs and it will random scales many times.
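
For context, a hedged sketch of what the per-batch randomization accomplishes (not the repo's exact train.py): each batch trains the PruningNet on a freshly sampled channel configuration, so the meta-net learns to generate weights for any configuration; the single best configuration is decided later by the evolutionary search, not during training.

import random

num_layers, num_choices = 35, 31  # assumed sizes of the search space

for step in range(3):  # stand-in for iterating over the data loader
    scale_ids = [random.randrange(num_choices) for _ in range(num_layers)]
    # A forward/backward pass with weights generated for scale_ids would go here.
    print(step, scale_ids[:5], '...')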

Question about the use of pruningNet model

Hello liuzechun,
Thanks for your brilliant work, but I have some trouble with how to use the PruningNet models you provide. I downloaded the Model-MetaP-Mbv1-0.75 linked in the README and want to reproduce the FLOPs and Acc@1 from the chart.
I used evaluate.py, but when I load the downloaded checkpoint.pth.tar, it reports an error loading the state_dict for DataParallel, like:
....size mismatch for module.feature.0.conv1.weight: copying a param with shape torch.Size([25, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([31, 3, 3, 3]).
size mismatch for module.feature.0.bn1.weight: copying a param with shape torch.Size([25]) from checkpoint, the shape in current model is torch.Size([31])......
The code I use to load the state_dict is:
model = MobileNetV1()
logging.info(model)
model = nn.DataParallel(model).cuda()
checkpoint = torch.load(checkpoint_tar)
model.load_state_dict(checkpoint['state_dict'])
I printed the model defined in MobileNetV1.py; it is different from the structure I downloaded from the README. Am I using it incorrectly? (A guess at the fix is sketched after this issue.)
Thanks again and looking forward to your reply.

Anna
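
A hedged guess at the cause of the size mismatch above: the released checkpoints are pruned networks, so the model has to be constructed with the searched per-layer channel counts before load_state_dict is called. The constructor argument below is an assumption; check the MobileNetV1.py shipped alongside the downloaded checkpoint for the real interface.

from mobilenet_v1 import MobileNetV1  # the module shipped with the downloaded model (assumed name)
import torch
import torch.nn as nn

channel_ids = [25, 30, 27]        # placeholder: the encoding that produced the checkpoint
model = MobileNetV1(channel_ids)  # assumed constructor: build the net with matching widths
model = nn.DataParallel(model).cuda()
checkpoint = torch.load('checkpoint.pth.tar')
model.load_state_dict(checkpoint['state_dict'])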

Why use meta learning to predict weights?

Why use an independent meta network to generate the weights for each block, instead of using the weights of the original network and updating them directly?

CUDA out of memory

Traceback (most recent call last):
File "train.py", line 255, in
main()
File "train.py", line 63, in main
model = nn.DataParallel(model).cuda()
File "/home/abc/anaconda3/envs/torch1.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/abc/anaconda3/envs/torch1.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/home/abc/anaconda3/envs/torch1.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/home/abc/anaconda3/envs/torch1.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/abc/anaconda3/envs/torch1.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 199, in _apply
param.data = fn(param.data)
File "/home/abc/anaconda3/envs/torch1.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 11.78 GiB total capacity; 281.29 MiB already allocated; 29.56 MiB free; 18.71 MiB cached)

Have you met this problem?

About the mobilenet_v2 part

I have two questions to ask you:
1) In mobilenet_v2.py, why is the parameter "affine" set to False? What will happen if it is set to True?
2) When I train the model with my own data, I found that the accuracy on my validation set always stays the same.
Looking forward to your answer!

BN Stat Params during searching

Hello, I really like this work of network pruning.

I have a small question about the BN parameters during searching.

Take MobileNetV1 as an example: while training the PruningNet, BN's running_mean/vars have already been calculated and saved. Why not use those values directly instead of recalibrating them during searching?
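
For context, a hedged sketch of BN recalibration (not the repo's code): the running statistics saved during PruningNet training were accumulated across many different sampled widths, so each search candidate typically re-estimates them with a few forward passes in train() mode before being evaluated. model and loader are placeholders.

import torch

def recalibrate_bn(model, loader, num_batches=20):
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.reset_running_stats()  # clear statistics accumulated during meta-training
    model.train()                    # BN updates running stats only in train mode
    with torch.no_grad():            # no gradient updates, only BN statistics
        for i, (images, _) in enumerate(loader):
            if i >= num_batches:
                break
            model(images)            # move images to the model's device as needed
    model.eval()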

Pretrained ResNet50 0.75x Model seems corrupted

I was trying to load (using torch.load) the provided ResNet-50 0.75x pretrained model, but ran into this error: "RuntimeError: unexpected EOF, expected 1565290 more bytes. The file might be corrupted."

I have tried loading the 0.5x model and it loads fine. I also switched my PyTorch version to 1.1.0, as suggested in your README, in case it was a version issue, but that didn't fix it.

Has anyone run into this before? I am wondering if there was some issue in the upload of the model to OneDrive. Any help would be appreciated. Thanks.

The model links in the README cannot be opened

Hello, the model links seem to have expired, or they require registering and logging into an account.
Could you upload the models directly to Baidu or Google Drive?
My hardware resources are limited, so running the search myself is not really feasible.
Thank you.
