
caffe-int8-convert-tools's Introduction

Caffe-Int8-Convert-Tools

This conversion tool is based on the TensorRT 2.0 Int8 calibration tool, which uses the KL-divergence algorithm to find a suitable threshold for quantizing activations from Float32 to Int8 (-127 to 127).
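
In TensorRT-style symmetric calibration, the KL-chosen saturation threshold is mapped to the int8 limit 127 to obtain the scale. A minimal numpy sketch (for illustration only, not this repository's exact code; quantize_activation is a hypothetical helper):

import numpy as np

def quantize_activation(blob_fp32, threshold):
    # map the KL-selected saturation threshold to the int8 limit 127
    scale = 127.0 / threshold
    # values beyond the threshold saturate at +/-127
    blob_int8 = np.clip(np.round(blob_fp32 * scale), -127, 127).astype(np.int8)
    return blob_int8, scale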

We provide Classification (SqueezeNet_v1.1) and Detection (MobileNet_v1 SSD 300) demos based on ncnn (a high-performance neural network inference framework optimized for mobile platforms), and the ncnn community is ready to support this implementation.

The pull request in ncnn

ncnn now has a new conversion tool that supports post-training quantization.

With the new ncnn quantization tools, you can convert your ncnn model to an ncnn int8 model directly. If you just want to deploy your model with ncnn, I suggest you use those tools instead.

Reference

For details, please read the following PDF:

8-bit Inference with TensorRT

MXNet quantization implementation:

Quantization module for generating quantized (INT8) models from FP32 models

An introduction to the underlying principles in a Chinese blog post written by my friend (bruce.zhang):

The implementation of Int8 quantization based on TensorRT

HowTo

The purpose of this tool (caffe-int8-convert-tool-dev.py) is to test new features, such as multi-channel quantization based on the group number.

This format is already supported in the latest ncnn version. I will do my best to convert some common network models into the classification-dev format.

python caffe-int8-convert-tool-dev-weight.py -h
usage: caffe-int8-convert-tool-dev-weight.py [-h] [--proto PROTO] [--model MODEL]
                                  [--mean MEAN MEAN MEAN] [--norm NORM]
                                  [--images IMAGES] [--output OUTPUT]
                                  [--group GROUP] [--gpu GPU]

find the pretrained caffemodel int8 quantize scale value

optional arguments:
  -h, --help            show this help message and exit
  --proto PROTO         path to deploy prototxt.
  --model MODEL         path to pretrained caffemodel
  --mean MEAN           value of mean
  --norm NORM           value of normalize(scale value or std value)
  --images IMAGES       path to calibration images
  --output OUTPUT       path to output calibration table file
  --group GROUP         enable the group scale(0:disable,1:enable,default:1)
  --gpu GPU             use gpu to forward(0:disable,1:enable,default:0)
python caffe-int8-convert-tool-dev-weight.py --proto=test/models/mobilenet_v1.prototxt --model=test/models/mobilenet_v1.caffemodel --mean 103.94 116.78 123.68 --norm=0.017 --images=test/images/ --output=mobilenet_v1.table --group=1 --gpu=1

How to use the output file (calibration-dev.table)

For example, in MobileNet_v1_dev.table:

conv1_param_0 0.0 3779.48337933 482.140562772 1696.53814502
conv2_1/dw_param_0 0 72.129143 149.919382 // the convdw layer's per-group weight scales are 0.0 72.129 149.919 ......
......
conv1 49.466518
conv2_1/dw 123.720796 // the convdw layer's bottom blob channel scale is 123.720
......
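
A minimal Python sketch for reading such a table (an illustration only, assuming the whitespace-separated format shown above: names ending in _param_0 carry one weight scale per group, other lines carry the bottom blob scale; load_calibration_table is a hypothetical helper):

def load_calibration_table(path):
    weight_scales = {}  # layer name -> list of per-group weight scales
    blob_scales = {}    # layer name -> bottom blob (activation) scale
    with open(path) as f:
        for line in f:
            line = line.split('//')[0].strip()  # drop the explanatory comments used above
            if not line:
                continue
            fields = line.split()
            name, values = fields[0], [float(v) for v in fields[1:]]
            if name.endswith('_param_0'):
                weight_scales[name[:-len('_param_0')]] = values
            else:
                blob_scales[name] = values[0]
    return weight_scales, blob_scales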

Three steps to implement the conv1 layer int8 convolution:

  1. Quantize the bottom_blob and weight:

    bottom_blob_int8 = bottom_blob_float32 * data_scale(49.466518)
    weight_int8 = weight_float32 * weight_scale(156.639840)
    
  2. Convolution_Int8:

    top_blob_int32 = bottom_blob_int8 * weight_int8
    
  3. Dequantize the TopBlob_Int32 and add the bias:

    top_blob_float32 = top_blob_int32 / [data_scale(49.466518) * weight_scale(156.639840)] + bias_float32
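
A minimal numpy sketch of the three steps above (illustration only; the 1x1 convolution is written as a matrix multiply, and int8_conv1x1 is a hypothetical helper, not a function from this tool):

import numpy as np

def int8_conv1x1(bottom_fp32, weight_fp32, bias_fp32, data_scale, weight_scale):
    # 1. quantize the bottom blob and the weights to int8
    bottom_int8 = np.clip(np.round(bottom_fp32 * data_scale), -127, 127).astype(np.int8)
    weight_int8 = np.clip(np.round(weight_fp32 * weight_scale), -127, 127).astype(np.int8)
    # 2. int8 convolution, accumulated in int32 (shapes: weight (out_ch, in_ch), bottom (in_ch, h*w))
    top_int32 = weight_int8.astype(np.int32) @ bottom_int8.astype(np.int32)
    # 3. dequantize the int32 result and add the float32 bias
    top_fp32 = top_int32.astype(np.float32) / (data_scale * weight_scale) + bias_fp32[:, None]
    return top_fp32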
    

How to use with ncnn

quantized int8 inference

Accuracy and Performance

We use the ImageNet 2012 dataset for the classification tests.

Type                  Detail
Calibration Dataset   ILSVRC2012_img_test, 1k
Test Dataset          ILSVRC2012_img_val, 5k
Framework             ncnn
Supported Layers      Convolution, ConvolutionDepthwise, ReLU

The following table shows the Top1 and Top5 differences between Float32 and Int8 inference.

Models            FP32 Top1   FP32 Top5   INT8 Top1   INT8 Top5   Diff Top1   Diff Top5
SqueezeNet v1.1   57.78%      79.88%      57.82%      79.84%      +0.04%      -0.04%
MobileNet v1      67.26%      87.92%      66.74%      87.43%      -0.52%      -0.49%
GoogleNet         68.50%      88.84%      68.62%      88.68%      +0.12%      -0.16%
ResNet18          65.49%      86.56%      65.30%      86.52%      -0.19%      -0.04%
ResNet50          71.80%      89.90%      71.76%      90.06%      -0.04%      +0.16%

We use the VOC0712 and MSCOCO datasets for the detection tests.

Type           Detail
Test Dataset   VOC2007
Unit           mAP (20 classes)

Models             FP32    INT8    Loss
SqueezeNet SSD     61.80   61.27   -0.53
MobileNet_v1 SSD   70.49   68.92   -1.57

Speed up

The following table shows the speedup between Float32 and Int8 inference. Note that the Winograd algorithm is enabled for both the Float32 and Int8 inference. The hardware platform is a HiSilicon Hi3519 (Cortex-A17 @ 880 MHz).

Unit (ms)   SqueezeNet v1.1   MobileNet v1   GoogleNet   ResNet18   MobileNet v1 SSD   SqueezeNet SSD
Float32     282               490            1107        985        970                610
Int8        192               369            696         531        605                498
Ratio       x1.46             x1.33          x1.59       x1.85      x1.60              x1.22

Memory reduce

Runtime memory: MB

Models fp32-wino63 int8-wino23 int8-wino43
squeezenet_v1_1 50 30 32
mobilenet_v1 61 35 35
mobilenet_v1_ssd 90 45 45
squeezenet_v1_ssd 210 70 94
resnet18 335 77 130
googlenet_v1 154 72 89

Storage memory: MB

Models fp32 int8
squeezenet_v1_1 4.71 1.20
mobilenet_v1 16.3 4.31
mobilenet_v1_ssd 22.0 5.60
squeezenet_v1_ssd 21.1 5.37
resnet18 44.6 11.2
googlenet_v1 26.6 6.72

Contributor

Thanks to NVIDIA for providing the relative-entropy (KL divergence) calibration principle, and to ncnn's author nihui for sharing the neural network inference framework.

Thanks to the help from the following friends:

Optimization Instructor : Fugangping, bruce.zhang

Algorithm : xupengfeixupf, JansonZhu, wangxinwei, lengmm

Python : daquexian

License

BSD 3 Clause

caffe-int8-convert-tools's People

Contributors

bug1989, bug1989-lite, daquexian, tpoisonooo


caffe-int8-convert-tools's Issues

caffe-int8-convert-tools error

Hi BUG1989,
When I use caffe-int8-convert-tool-dev.py, there is an error:

    File "caffe-int8-convert-tool-dev.py", line 379, in weight_quantize
        raise ValueError("First layer should be input")
    ValueError: First layer should be input

My prototxt file has "input":

input: "data"
input_shape {
dim: 1
dim: 3
dim: 300
dim: 300
}
layer {
name: "Conv"
type: "Convolution"
bottom: "data"
top: "Conv"
param {
lr_mult: 1.0
decay_mult: 1.0
}

Conv time of squeezenet-int8

Hi thanks for the great work!
I generated squeezenet-int8 by using the squeezenet_v1_1.table you provided and the squeezenet_v1.1.bin and .param files provided by tencent/NCNN.

The size of the new squeezenet-int8 model went down to 1.2 MB; it also worked and produced similar results on test images.

However, when I use the benchmark tool (provided by ncnn) to test the two models on a MacBook Pro, the original squeezenet took 138 ms while the int8 model took 174 ms.

The results of the int8 model are as below:
...
64 {fire3/expand3x3, 5.022, 0.027} 1
65 {fire2/expand3x3, 5.486, 0.030} 1
66 {fire4/expand3x3, 5.709, 0.031} 1
67 {fire5/expand3x3, 5.712, 0.031} 1
68 {fire6/expand3x3, 6.511, 0.035} 1
69 {fire9/expand3x3, 8.201, 0.044} 1
70 {fire8/expand3x3, 8.300, 0.045} 1
71 {pool1, 8.631, 0.046} 1
72 {conv1, 21.531, 0.116} 1
73 {conv10, 59.041, 0.318} 1
sum=185.699 ms
795 = 0.195848
655 = 0.105379
608 = 0.061154
Program ended with exit code: 0

I checked the time for each layer; the difference seems to be caused by conv10, which took 24 ms in the original model and 59 ms in the int8 model.

Could you please share your int8 model or give some hints about why this happened? Thanks!

Problems converting a SqueezeNet model to int8 with this tool

Hello!
The author provided a table for SqueezeNet. I also generated a table from the SqueezeNet model myself, but some entries in my table look like this:
fire2/squeeze1x1 inf
fire2/expand1x1 inf
fire2/expand3x3 inf
fire3/squeeze1x1 inf
fire3/expand1x1 inf
fire3/expand3x3 inf
fire4/squeeze1x1 inf
fire4/expand1x1 inf
fire4/expand3x3 inf
fire5/squeeze1x1 inf
fire5/expand1x1 inf
......

This doesn't look right!

The generated log contains lots of zeros, like this:
conv1: value range 0 - 0.0, interval 0.0, interval num 2048
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I ran inference with ncnn using both the author's table and the table I generated myself.
With the author's table the inference results are fine, but with mine the results are unusable, very poor.
So I would like to ask whether I am doing something wrong; I don't know where the mistake is. Could you give me some pointers?
Thank you!

How to quantize BatchNorm?

Your work is very inspiring.
I am quantizing a VGG16 with batchnorm and want to confirm how batchnorm should be handled.
My current understanding is that, after the int8 convolution result is dequantized back to float32, the original mean, var, and bias are still applied; I'm not sure whether this is correct.
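
For what it's worth, a common way to sidestep the question (not necessarily what this tool does) is to fold batchnorm into the preceding convolution before quantization, using the standard folding formula; a minimal numpy sketch with a hypothetical fold_batchnorm helper:

import numpy as np

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # standard BN folding: w' = (gamma / sqrt(var + eps)) * w
    #                      b' = (gamma / sqrt(var + eps)) * (bias - mean) + beta
    scale = gamma / np.sqrt(var + eps)
    folded_weight = weight * scale.reshape(-1, 1, 1, 1)  # weight shape: (out_ch, in_ch, kh, kw)
    folded_bias = scale * (bias - mean) + beta
    return folded_weight, folded_bias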

Network compression

Hello, when compressing the model, why do you need images from the training set, and the deploy/test prototxt?

dequantize and quantize fusion of coadjacent convolution blocks

Fuse the dequantize and quantize across adjacent convolution blocks into a requantize operation. Since all int8 scales for activations are positive, we can place the ReLU activation after the requantize:

requantize1(X) = quantize2(dequantize1(X) + bias1)

Less memory footprint, faster quantized activation, and less quantization overhead result in better performance ;)

Scenario: conv1 + relu1 + conv2

Current scheme:
  quantize1
  int8conv1
  dequantize1(+bias)
  relu1
  quantize2
  int8conv2
  dequantize2(+bias)

Expected scheme:
  quantize1
  int8conv1
  requantize1(+bias+quantize2)
  int8relu1
  int8conv2
  dequantize2(+bias)

Scenario: conv1 + relu1 + dwconv2 + relu2 + conv3

Current scheme:
  quantize1
  int8conv1
  dequantize1(+bias)
  relu1
  quantize2(dw)
  int8conv2(dw)
  dequantize2(dw+bias)
  relu2
  quantize3
  int8conv3
  dequantize3(+bias)

Expected scheme:
  quantize1
  int8conv1
  requantize1(+bias+quantize2(dw))
  int8relu1
  int8conv2(dw)
  requantize2(dw+bias+quantize3)
  int8relu2
  int8conv3
  dequantize3(+bias)
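
As an illustration of the proposed fusion (a hypothetical helper, not code from this repository), a minimal numpy sketch of what requantize computes, assuming scale_in and scale_w are the current layer's input and weight scales and scale_next is the next layer's input scale:

import numpy as np

def requantize(top_blob_int32, scale_in, scale_w, scale_next, bias_fp32):
    # dequantize1: recover float32 from the int32 accumulator and add the bias
    top_fp32 = top_blob_int32 / (scale_in * scale_w) + bias_fp32
    # quantize2: go straight to the next layer's int8 input, fusing both steps
    return np.clip(np.round(top_fp32 * scale_next), -127, 127).astype(np.int8)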

An error occurs when running.

File "caffe-int8-convert-tool.py", line 468, in
main()
File "caffe-int8-convert-tool.py", line 448, in main
transformer = network_prepare(net, mean, norm)
File "caffe-int8-convert-tool.py", line 268, in network_prepare
transformer.set_mean('data', img_mean)
File "/home/ivan/caffe-master/python/caffe/io.py", line 251, in set_mean
raise ValueError('Mean channels incompatible with input.')
ValueError: Mean channels incompatible with input.

How can a Faster R-CNN VGG16 model be compressed with this tool?

I get an error when I run this command.
The command is:

python caffe-int8-convert-tool-dev-weight.py --proto=/home/smartgrid307/sjt/Compression/origin/test.prototxt --model=/home/smartgrid307/caffe/py-faster-rcnn/data/faster_rcnn_models/vgg16_60000_190612.caffemodel --mean 102.9801 115.9465 122.7717 --norm=0.0625 --images=/home/smartgrid307/caffe/py-faster-rcnn/data/VOCdevkit2007/VOC2007/JPEGImages/ --output=VGG16.table --gpu=0

The error message is as follows:

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 483:21: Message type "caffe.LayerParameter" has no field named "roi_pooling_param".
F0616 20:49:49.845695 15794 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/smartgrid307/sjt/Compression/origin/test.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

The Caffe I am using is from the py-faster-rcnn project; it works for training, but it fails here.

Does not work in a convolution layer.

Great conversion tool!!
I tried to convert my model and ran into some issues.

conv2a_param_0 161.69534342797343
conv2_param_0 273.37462513263256
conv3a_param_0 198.5867022152598
conv3_param_0 533.0213728867503
conv4a_param_0 146.93212834686443
conv4_param_0 406.008502200532
conv5a_param_0 237.84033457170227
conv5_param_0 430.328425008172

data 184.220261804279
conv1 32.20969187597571
slice1_1 39.3371286651116
eltwise1 76.43406224473419

When the data passes through conv2a in ncnn, I get all NaN values.


When I convert the model, I get the following warnings every time.

caffe-int8-convert-tool.py:113: RuntimeWarning: divide by zero encountered in true_divide
return np.sum(dist_a[nonzero_inds] * np.log(dist_a[nonzero_inds] / dist_b[nonzero_inds]))
data bin : 2019 threshold : 0.689392 interval : 0.000341 scale : 184.220262
caffe-int8-convert-tool.py:187: RuntimeWarning: invalid value encountered in double_scalars
expand_value = quantize_distribution[i] / count

With this error, when I run inference on ncnn, I get all NaN values.
I use BMP images for the calibration dataset,
and the norm is 1/256.

What could be the problem?
Any suggestions would be appreciated.

Does the version of protobuf affect the results?

Hi, I tried the new release of Apr 3, 2019, but it ends with "IndexError: list index (0) out of range" when calling "layer.convolution_param.kernel_size[0]". I think it may be caused by the protobuf version, which is 3.6.1 in my case; what is yours?

A question about the KL computation

Hello, I have a question: when computing the KL divergence, why is the first bin removed? Is it because those values are considered too small and can be ignored?

network_prepare

The network_prepare function applies operations in this order: mean --> rescale to 255 --> norm.
Shouldn't it be mean --> norm, and then rescale to 255?
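
For reference, a sketch of a typical caffe.io.Transformer setup (an assumption about standard Caffe preprocessing, not a statement of what network_prepare actually does); with these settings Caffe's preprocess() multiplies by raw_scale (to 0-255) first, then subtracts the mean, then multiplies by the norm:

import numpy as np
import caffe

# hypothetical input shape and values, for illustration only
transformer = caffe.io.Transformer({'data': (1, 3, 224, 224)})
transformer.set_transpose('data', (2, 0, 1))                       # HWC -> CHW
transformer.set_raw_scale('data', 255.0)                           # [0,1] float image -> [0,255]
transformer.set_mean('data', np.array([103.94, 116.78, 123.68]))   # subtract the per-channel mean
transformer.set_input_scale('data', 0.017)                         # multiply by the norm after mean subtraction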

Calibrating the weight matrix

Is the purpose of the validation-set images you use to calibrate the quantized weight matrix? What exactly is the principle behind this? Also, why is the image mean file needed? I don't quite understand.

PC test: int8 is about half the speed

Why is it that after I generate the table with your tool:
python caffe-int8-convert-tool-dev.py --proto=model/mobilenet_ssd.prototxt --model=model/mobilenet_ssd.caffemodel --mean 127.5 127.5 127.5 --norm=0.007843 --images=images --output=model/mnet_ssd.table --gpu=1
then generate the param and bin files with ./caffe2ncnn mobilenet_ssd.prototxt mobilenet_ssd.caffemodel mobilenet_ssd_300_int8.param mobilenet_ssd_300_int8.bin 256 mnet_ssd.table,
and finally replace the original files with the generated param and bin files, the time measured on a PC goes up from 80+ ms to 130+ ms?
Is this normal? Thanks for your reply.

error happens when use this tool

I don't know whether the calibration image data format is incorrect.

Quantize the Activation:
/usr/local/lib/python3.5/dist-packages/skimage/io/_io.py:49: UserWarning: `as_grey` has been deprecated in favor of `as_gray`
  warn('`as_grey` has been deprecated in favor of `as_gray`')
Traceback (most recent call last):
  File "caffe-int8-convert-tool.py", line 468, in <module>
    main()
  File "caffe-int8-convert-tool.py", line 457, in main
    activation_quantize(net, transformer, images_files)
  File "caffe-int8-convert-tool.py", line 350, in activation_quantize
    net_forward(net, image, transformer)
  File "caffe-int8-convert-tool.py", line 220, in net_forward
    image = caffe.io.load_image(image_path)
  File "/home/jim/caffe/python/caffe/io.py", line 296, in load_image
    img = skimage.img_as_float(skimage.io.imread(filename, as_grey=not color)).astype(np.float32)
  File "/usr/local/lib/python3.5/dist-packages/skimage/io/_io.py", line 62, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/usr/local/lib/python3.5/dist-packages/skimage/io/manage_plugins.py", line 214, in call_plugin
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/skimage/io/_plugins/pil_plugin.py", line 36, in imread
    im = Image.open(f)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2295, in open
    % (filename if filename else fp))
OSError: cannot identify image file <_io.BufferedReader name='/home/jim/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb/lock.mdb'>

ShuffleNet v2 + SSD performance loss is too large

Thanks for the great tools!
When I use this tool to generate a table from the ShuffleNet v2 + SSD prototxt and caffemodel, and then use the caffe2ncnn tool to get the int8 model, the performance loss is too large, but MobileNet v1 + SSD works fine with this tool. Why is that? Any ideas?

How are batchnorm layers and conv layers with bias handled during quantization?

I converted a Caffe model to ncnn; before and after quantization, the results always differ noticeably.

My model is similar to MobileNet-SSD and contains batchnorm layers with learned weights. After merging the batchnorm parameters into the conv layers, the conv layers have a bias. In caffe2ncnn.cpp, around line 710, there is a comment "we will not quantize the bias values". Does quantization ignore the conv layer's bias term? If so, how should batchnorm layers be handled; is instance norm the only option?

One more observation: when generating the table, I used different mean and norm parameters, and the values in the last few lines of the generated tables differ. However, the outputs of the quantized models are identical no matter which table is used. What could cause this? I am using the offline quantization mode.

About quantization with this tool vs. TensorRT quantization

Hello, I have a question and didn't know how to contact you, so I'm asking here.
My understanding is that the core of TensorRT int8 quantization is the KL-divergence calibration. If that is implemented, does it amount to implementing TensorRT int8 quantization? Is the tool you developed equivalent to TensorRT int8 quantization? Are there any improvements or differences? Thanks.

Converted Mobilefacenet_Res model, low accuracy

Original issue was at [https://github.com/Tencent/ncnn/issues/798].

I've got a mobilefacenet model (with some extra res blocks), and converted it to ncnn using caffe2ncnn, and the performance is almost the same as running on caffe.

However, after I used the caffe int8 quantize tool and converted it to an ncnn int8 quantized model, there is a massive performance drop. I first suspected the image preprocessing (since I'm new to C++), but after using the same inference code for the int8 model on the fp32 model, the fp32 model's performance is good, so the preprocessing should be fine. The problem should be the quantized int8 ncnn model. Maybe I did something wrong during the quantization table creation, or during the conversion to the quantized ncnn model?

Would you please help me solve this issue? I'm currently focused on getting the quantized model working on ncnn.

Many thanks.

Questions about the accuracy

Appreciate your contribution to int8 optimization.
I followed the steps you provided and got wrong results for squeezenet/mobilenet.
First I used mean_vals (104 117 123) and got the int8 table for the SqueezeNet model in the ncnn project. Then I used it to generate the .param and .bin files. Later I tested an image with this int8 model but got a different result from the float model, and I noticed that the results from the int8 model are usually small.
The same issue occurred in my MobileNet trial.
I wonder if I missed any steps or did something wrong. Looking forward to your reply.

Severe accuracy drop after quantizing MTCNN

Threshold combination: [0.8, 0.8, 0.98]

Quantization combination   WiderFace mAP
P-R-O                      0.669
P(int8)-R-O                0.493
P-R(int8)-O                0.245
P(int8)-R(int8)-O          0.158

For quantizing detection networks, how can the accuracy loss be minimized?

Only 3x3 and 1x1 convolution kernels are quantized?

Hello, thank you very much for providing the code.
However, when converting a model, I noticed that AlexNet's first convolution layer does not get 8=1 written for it in the .table file. Looking at your code, I found:

        # find the convolution 3x3 and 1x1 layers to get out the weight_scale
        if(layer.type == "Convolution" or layer.type == "ConvolutionDepthwise"):
            kernel_size = layer.convolution_param.kernel_size[0]
            if(kernel_size == 3 or kernel_size == 1):

But AlexNet's first convolution layer uses an 11x11 kernel. Why are only 3x3 and 1x1 kernels quantized? Or have I misunderstood something...

Code questions in threshold_distribution

def threshold_distribution(distribution, target_bin=128):
    """
    Return the best threshold value.
    Ref: https://github.com//apache/incubator-mxnet/blob/master/python/mxnet/contrib/quantization.py
    Args:
        distribution: list, activations has been processed by histogram and normalize, size is 2048
        target_bin: int, the num of bin that is used by quantize, Int8 default value is 128
    Returns:
        target_threshold: int, num of bin with the minimum KL
    """
    distribution = distribution[1:]
    length = distribution.size
    threshold_sum = sum(distribution[target_bin:])
    kl_divergence = np.zeros(length - target_bin)

    for threshold in range(target_bin, length):
        sliced_nd_hist = copy.deepcopy(distribution[:threshold])

        # generate reference distribution p
        p = sliced_nd_hist.copy()
        p[threshold-1] += threshold_sum
        threshold_sum = threshold_sum - distribution[threshold]

        # is_nonzeros[k] indicates whether hist[k] is nonzero
        is_nonzeros = (p != 0).astype(np.int64)
        #
        quantized_bins = np.zeros(target_bin, dtype=np.int64)
        # calculate how many bins should be merged to generate quantized distribution q
        num_merged_bins = sliced_nd_hist.size // target_bin  # <----- if this is not evenly divisible, num_merged_bins is the same for sliced_nd_hist.size = 128/129/130/...; is this only really meaningful when sliced_nd_hist.size is a multiple of 128?

        # merge hist into num_quantized_bins bins
        for j in range(target_bin):
            start = j * num_merged_bins
            stop = start + num_merged_bins
            quantized_bins[j] = sliced_nd_hist[start:stop].sum()
        quantized_bins[-1] += sliced_nd_hist[target_bin * num_merged_bins:].sum()  # <----- what does quantized_bins[-1] mean here; is it a mistake, should it be quantized_bins[target_bin-1]?

Caffe or NCNN?

Hello!
The tool's name contains Caffe, but it is based on the ncnn framework.
What is the relationship between this tool and Caffe?

A few questions about converting VGG-SSD

Hi,

  I'd like to ask:
  1. Does ncnn now support VGG-like 3x3 convolution kernels on the ARM side?
  2. My training data was converted to LMDB. When generating the table, can --images=test/images/ simply point to the original training images (.jpg), without needing the labels?
  3. I did not set a scale during training; can I just use the default of 1 here?

  Looking forward to your reply, thank you.

per-group input and weight int8_scale for group convolution

scenario 1

A channel = 10
B channel = 10
A -> depthwiseconv -> B

expected behavior
generate 10 int8_scale for A, one for each channel
generate 10 int8_scale for depthwiseconv weight, one for each channel

scenario 2

A channel = 10
B channel = 10
A -> conv (group=2) -> B

expected behavior
generate 2 int8_scale for A, one for each 5 channels
generate 2 int8_scale for conv weight, one for each group
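
As an illustration of the expected per-group weight scales (a hypothetical helper, not code from this tool), a minimal numpy sketch that splits the output channels of a convolution weight blob into groups and maps each group's largest magnitude to 127:

import numpy as np

def per_group_weight_scales(weight, group):
    # weight shape: (out_ch, in_ch_per_group, kh, kw); output channels are split evenly across groups
    per_group = weight.shape[0] // group
    scales = []
    for g in range(group):
        w = weight[g * per_group:(g + 1) * per_group]
        max_abs = np.abs(w).max()
        # a group whose weights are all zero gets scale 0.0, as in the example table above
        scales.append(127.0 / max_abs if max_abs > 0 else 0.0)
    return scales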
