
shufflenet_v2_pytorch_caffe's Introduction

ShuffleNet_V2_pytorch_caffe

ShuffleNet-V2 for both PyTorch and Caffe.

This project supports both PyTorch and Caffe. Supported model widths are 0.25, 0.33, 0.5, 1.0, 1.5, and 2.0; other widths are not supported.

Usage

PyTorch

Just use shufflenet_v2.py as follows.

import torch
import shufflenet_v2

num_classes = 1000
model_width = 0.5
net = shufflenet_v2.Network(num_classes, model_width)
# load the pretrained weights onto the CPU
params = torch.load('shufflenet_v2_x0.5.pth', map_location=lambda storage, loc: storage)
net.load_state_dict(params)
input = torch.randn(1, 3, 224, 224)
output = net(input)

Caffe

Prototxt files can be generated by shufflenet_v2.py:

python shufflenet_v2.py --save_caffe net --num_classes 1000 --model_width 1.0

Converting Model from PyTorch to Caffe

python shufflenet_v2.py --load_pytorch net.pth --save_caffe net --num_classes 1000 --model_width 1.0
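
After converting, a quick sanity check is to compare the two frameworks' outputs on the same random input. A minimal sketch, assuming the conversion writes net.prototxt and net.caffemodel alongside net.pth:

import numpy as np
import torch
import caffe
import shufflenet_v2

x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# PyTorch side
net_pt = shufflenet_v2.Network(1000, 1.0)
net_pt.load_state_dict(torch.load('net.pth', map_location=lambda storage, loc: storage))
net_pt.eval()
out_pt = net_pt(torch.from_numpy(x)).detach().numpy()

# Caffe side
net_cf = caffe.Net('net.prototxt', 'net.caffemodel', caffe.TEST)
net_cf.blobs['data'].data[...] = x
out_cf = net_cf.forward()[net_cf.outputs[0]]

print('max abs diff:', np.abs(out_pt.squeeze() - out_cf.squeeze()).max())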

Pretrained ImageNet Models for PyTorch and Caffe

Pretrained models can be downloaded from: https://github.com/miaow1988/ShuffleNet_V2_pytorch_caffe/releases

  • shufflenet_v2_x0.25, Top-1 Acc = 46.04%. Unofficial.
  • shufflenet_v2_x0.33, Top-1 Acc = 51.40%. Unofficial.
  • shufflenet_v2_x0.50, Top-1 Acc = 58.93%. This accuracy is 1.37% lower than the result reported in the official paper.

Training Details

  1. All ImageNet images are resized so that the short edge is 256 pixels (bicubic interpolation via PIL). Each image is then pickled by Python and stored in an LMDB dataset.
  2. Training is done with PyTorch 0.4.0.
  3. Data augmentation: 224x224 random crop and random horizontal flip. No image-mean subtraction is used here; it is handled automatically by the data/bn layers in the network.
  4. Networks are initialized by nn.init.kaiming_normal_(m.weight, mode='fan_out').
  5. SGD with Nesterov momentum (0.9) is used for optimization. The batch size is 1024. Models are trained for 300,000 iterations, with the learning rate decayed linearly from 0.5 to 0 (see the sketch after this list).
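
For concreteness, a minimal sketch of the optimizer and linear learning-rate decay described in item 5, assuming net is the shufflenet_v2.Network instance from the usage example (this is not the author's exact training script):

import torch

base_lr, max_iter = 0.5, 300000
optimizer = torch.optim.SGD(net.parameters(), lr=base_lr,
                            momentum=0.9, nesterov=True)

for it in range(max_iter):
    lr = base_lr * (1.0 - it / float(max_iter))  # linear decay: 0.5 -> 0
    for group in optimizer.param_groups:
        group['lr'] = lr
    # forward / backward / optimizer.step() on a 1024-image batch goes here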

Something you might have noticed

  1. Models are trained with PyTorch and converted to Caffe. Thus, you should use the scale parameter in Caffe's data layer to make sure all input images are rescaled from [0, 255] to [0, 1] (see the sketch after this list).
  2. The RGB/BGR ordering problem is not crucial; you may ignore the difference if you use these models as pretrained models for other tasks.
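
For LMDB input, the [0, 255] to [0, 1] rescaling in item 1 corresponds to the scale field of the data layer's transform_param (a value of about 1/255). When feeding the converted model from Python instead, caffe.io.load_image already returns floats in [0, 1], so no extra scaling is needed. A hedged sketch (the blob name 'data' and the image path are assumptions):

import caffe

net = caffe.Net('net.prototxt', 'net.caffemodel', caffe.TEST)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))  # HWC -> CHW
# no set_raw_scale / set_mean: inputs stay in [0, 1], and the network's
# data/bn layer handles mean/variance normalization
img = caffe.io.load_image('example.jpg')  # hypothetical image path
net.blobs['data'].data[...] = transformer.preprocess('data', img)
out = net.forward()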

Others

All these years, I have rarely matched or exceeded the results reported in papers for various complex ImageNet models. If you get a better accuracy, please tell me.

shufflenet_v2_pytorch_caffe's People

Contributors

lexuszhi1990, miaow1988, smellly


shufflenet_v2_pytorch_caffe's Issues

About solver

Hi, can you please show me your solver file for training in Caffe? Thanks so much!

accuracy

What accuracy did you get with the 1.0x model?

Classification accuracy

Hi, when I run the shufflenet_v2_0.25 Caffe model on the ImageNet val set, the classification accuracy is only about 0.37.
The calling code is as follows:

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))  # move image channels to outermost dimension
transformer.set_raw_scale('data', 1)  # keep images in [0, 1], the model's expected input range
net.blobs['data'].reshape(1, 3, 224, 224)

for file in filenames:
    pic = os.path.join('data/ilsvrc12/val', file)
    input = caffe.io.load_image(pic)
    transformed_image = transformer.preprocess('data', input)
    net.blobs['data'].data[...] = transformed_image

about the first batch norm

The first batch normalization layer is used to replace mean subtraction, but it should be used with affine=False.

This might be why you can't reproduce the paper's result.
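
For reference, the suggested change expressed in PyTorch terms (a sketch of the issue's point, not a change made in this repo):

import torch.nn as nn

# a pure normalization of the RGB input, replacing mean subtraction;
# affine=False removes the learnable scale/shift
data_bn = nn.BatchNorm2d(3, affine=False)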

About Train In ImageNet

Hi,
I used your code to train on ImageNet with PyTorch. At the beginning the loss is very small, around 0.02, and it decreases by only about 0.0001 per step. Is that normal?
Thank you very much!

Error while converting pytorch to caffe

Traceback (most recent call last):
File "shufflenet_v2.py", line 224, in
net.convert_to_caffe(args.save_caffe)
File "shufflenet_v2.py", line 160, in convert_to_caffe
print(caffe_net.to_proto())
File "/home/orbosoham/freshCaffe/caffe/python/caffe/net_spec.py", line 193, in to_proto
top._to_proto(layers, names, autonames)
File "/home/orbosoham/freshCaffe/caffe/python/caffe/net_spec.py", line 97, in _to_proto
return self.fn._to_proto(layers, names, autonames)
File "/home/orbosoham/freshCaffe/caffe/python/caffe/net_spec.py", line 162, in _to_proto
assign_proto(layer, k, v)
File "/home/orbosoham/freshCaffe/caffe/python/caffe/net_spec.py", line 64, in assign_proto
is_repeated_field = hasattr(getattr(proto, name), 'extend')
AttributeError: group

About X2.0 version

Thanks for sharing these models. Would you mind providing the 2.0x version of ShuffleNet V2? Thank you.

Could you please share your solver?

I cannot train ShuffleNet to the accuracy reported in the paper; there is a large margin between my result and the paper's.
Could you please share your solver.prototxt?

I tried to convert a model from PyTorch to Caffe, but an error occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/caffe/net_spec.py", line 160, in _to_proto
_param_names[self.type_name] + '_param'), k, v)
KeyError: 'ShuffleChannel'

Have you modified the Caffe source code to add a new layer, ShuffleChannel?
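
For anyone adding such a layer: the operation it must implement is the standard channel shuffle, shown here in PyTorch as a reference for the expected semantics (a sketch, not the Caffe layer's source):

import torch

def channel_shuffle(x, groups):
    n, c, h, w = x.shape
    # split channels into groups, swap the group and per-group axes, flatten back
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2).contiguous()
             .view(n, c, h, w))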

Converted Caffe model produces wrong results.

If I don't load the pre-trained PyTorch model, the converted model's outputs differ by less than 0.001, but when I load the pre-trained model, the diff is very large. I don't know why, since the pre-trained model loads successfully. BTW, the pretrained model was trained on multiple GPUs.

Converting PyTorch to Caffe failed

When converting the PyTorch model to a Caffe model, I found that the 0.25x model converts fine, but larger models cannot be converted. The problem is that when the conversion code runs, GPU 0's memory fills up quickly, leading to errors.

F1031 20:37:10.787436 16330 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
Aborted (core dumped)

I found that this line runs out of memory when executed, in def convert_to_caffe(self, name) in slim.py:

caffe_net = caffe.Net(name + '.prototxt', caffe.TEST)

About the last layer of the model

There is a conv layer at the end instead of a fully connected layer, and there is no dropout layer; I think this may lead to overfitting during training.
Actually, I used the 1.0x model to train on my dataset, and overfitting happened (train acc 87, val acc 82).
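
If dropout helps for a given downstream task, one hedged option is to insert it just before the final 1x1-conv classifier (the 1024 channel count applies to the 1.0x model; names here are illustrative, not part of the repo):

import torch.nn as nn

num_classes = 1000
classifier = nn.Sequential(
    nn.Dropout(p=0.2),                # regularize the pooled features
    nn.Conv2d(1024, num_classes, 1),  # the 1x1-conv 'fc' layer
)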

Pytorch Pretrained Model Read Error

Hi, I downloaded the pretrained model from the releases.
I tried to read the model (shufflenet_v2_x0.5.pth) with this code:

import os
import torch

def test():
    path = os.getcwd() + '/shufflenet_v2_x0.5.pth'
    model = torch.load(path)
    torch.save(model.state_dict(), os.getcwd() + '/params.pth')
    model_object.load_state_dict(torch.load(os.getcwd() + '/params.pth'))

Error information here:

Traceback (most recent call last):
File "test.py", line 13, in
test()
File "test.py", line 9, in test
torch.save(model.state_dict(), os.getcwd() + '/params.pth')
AttributeError: 'collections.OrderedDict' object has no attribute 'state_dict'

The PyTorch version is 0.4.1. (I am a newbie to PyTorch, and I am trying to convert the PyTorch model to MXNet.)

Could you kindly please provide the solution?

Thanks
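
The traceback itself points at the cause: the released .pth file is already a bare state dict (an OrderedDict of tensors), not a model object, so it should be loaded into a constructed network as in the usage example above. A minimal sketch:

import torch
import shufflenet_v2

net = shufflenet_v2.Network(1000, 0.5)
params = torch.load('shufflenet_v2_x0.5.pth',
                    map_location=lambda storage, loc: storage)
net.load_state_dict(params)  # params is the OrderedDict of weights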

trained models

Hello, have you trained shufflenet_v2_x1.0 or shufflenet_v2_x1.5 on ImageNet?

PyTorch to Caffe: how to add a linear layer in slim.py

I used your code to train a test model, changing a convolution layer to a linear layer and adding linear-layer support to slim.py for the conversion. When trying to convert the torch model to the Caffe model, the mean diff is large.

I want to convert PyTorch to Caffe with support for the linear layer. What other work needs to be done?

my codes:

# slim.py
if isinstance(m, nn.Linear):
    if m.bias is None:
        param = [dict(lr_mult=1, decay_mult=1)]
    else:
        param = [dict(lr_mult=1, decay_mult=1), dict(lr_mult=1, decay_mult=0)]
    inner_product_param = dict(
        num_output = m.out_features,
        bias_term = (m.bias is not None),
        weight_filler = dict(type='msra'),
    )
    layer = L.InnerProduct(
        layer,
        param=param,
        inner_product_param = inner_product_param,
    )
    caffe_net.tops[m.g_name] = layer
    return layer

Advice needed for CIFAR datasets

Hi @miaow1988 ,

Nice work!
Would you please check the network config below for 32x32 datasets? I only get about 72% accuracy (1.5x version) after 400 epochs using the same training parameters as this repo.

    self.network_config = [
        g_name('data/bn', nn.BatchNorm2d(3)),
        slim.conv_bn_relu('stage1/conv', 3, in_channels, 1, 1, 1),   #3,2,1->1,1,1
        #g_name('stage1/pool', nn.MaxPool2d(3, 2, 0, ceil_mode=True)),  #removed
        (width_config[0], 2, 1, 4, 'b'),
        (width_config[1], 2, 1, 8, 'b'), # x16
        (width_config[2], 2, 1, 4, 'b'), # x32
        slim.conv_bn_relu('conv5', width_config[2], width_config[3], 1),
        g_name('pool', nn.AvgPool2d(4, 1)), #7->4
        g_name('fc', nn.Conv2d(width_config[3], self.num_classes, 1)),
    ]

Thanks very much for your help!

About Top-1 Acc in ImageNet

shufflenet_v2_x0.50, Top-1 Acc = 58.93%. This accuracy is 1.37% lower than the result reported in the official paper.

But when I looked at the paper, the 0.5x model only has 39.7% Top-1 Acc in Table 8.

Training with the prototxt is slow

If you use the prototxt directly, change the type of the depthwise convolution layers from Convolution to DepthwiseConvolution. After changing the 1.5x prototxt this way, training sped up from ~14 s/iter to ~1 s/iter. A sketch of automating the rewrite follows.
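
A hedged sketch of automating that rewrite with the protobuf text format. DepthwiseConvolution refers to the third-party Caffe layer mentioned above, and the depthwise test used here (group equal to num_output) is an assumption about how the generated layers look:

from caffe.proto import caffe_pb2
from google.protobuf import text_format

def rewrite_depthwise(in_path, out_path):
    net = caffe_pb2.NetParameter()
    with open(in_path) as f:
        text_format.Merge(f.read(), net)
    for layer in net.layer:
        p = layer.convolution_param
        # treat a grouped convolution with one group per output channel as depthwise
        if layer.type == 'Convolution' and p.group > 1 and p.group == p.num_output:
            layer.type = 'DepthwiseConvolution'
    with open(out_path, 'w') as f:
        f.write(text_format.MessageToString(net))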
