
paddlepaddle / paddleslim

1.5K 92.0 347.0 16.73 MB

PaddleSlim is an open-source library for deep model compression and architecture search.

Home Page: https://paddleslim.readthedocs.io/zh_CN/latest/

License: Apache License 2.0

Shell 0.17% Python 96.73% CMake 0.99% C++ 0.49% Jupyter Notebook 0.24% Cuda 1.38%
pruning quantization nas bert compression detection distillation ernie segmentation sparsity

paddleslim's People

Contributors

aurelius84, baiyfbupt, ceci3, faninsm, gushiqiao, heavengate, huangxu96, iamwhtwd, itminner, juncaipeng, ldoublev, leiqing1, lidanqing-intel, lijianshe02, littletomatodonkey, liuchiachi, lizexu123, minghaobd, moneypi, qingqing01, rachelxu7, slf12, wanghaoshuang, xgzhang11, xiaoluomi, xiteng1988, yghstill, yukavio, zhanghandi, zzjjay


paddleslim's Issues

Transformer distillation error

I tried the transformer distillation demo; after training for two batches it raised the following error:
Error: Tensor holds no memory. Call Tensor::mutable_data first.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.cc:23)
[operator < elementwise_div > error]

Is this caused by running out of memory?

Installation error

Installing from a cloned copy of the source fails with:

from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
ImportError: cannot import name 'PostTrainingQuantization'

Installing via pip succeeds, but running the code then raises the same error.

quant_post quantization of YOLOv3 fails: KeyError: 'stage.3.7.0.conv.weights'

CPU: 8 cores
RAM: 32 GB
GPU: V100
GPU memory: 16 GB
Disk: 100 GB
Environment:
Python version: 3.7
Framework version: PaddlePaddle 1.7.0
@slf12
aistudio@jupyter-115786-193843:~/post_training_quantization_withdata$ sh run_post_training_quanzation.sh
-------------------args----------------------
algo: KL
batch_nums: 20
batch_size: 3000
is_full_quantize: False
model_dir: ../work/PaddleDetection_1/yolov3_dark_freeze/mj_yolov3_darknet
model_filename: None
params_filename: None
save_model_path: yolov3_int8_model
use_gpu: True

W0314 12:14:31.721148 637 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0314 12:14:31.725704 637 device_context.cc:245] device: 0, cuDNN Version: 7.3.
2020-03-14 12:14:34,203-INFO: all run batch: 0
2020-03-14 12:14:34,203-INFO: all run batch: 0
2020-03-14 12:14:34,203-INFO: calculate scale factor ...
2020-03-14 12:14:34,203-INFO: calculate scale factor ...
Traceback (most recent call last):
File "post_training_quantization.py", line 66, in
batch_nums=10)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim/quant/quanter.py", line 306, in quant_post
post_training_quantization.quantize()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/contrib/slim/quantization/post_training_quantization.py", line 231, in quantize
self._calculate_scale_factor()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/contrib/slim/quantization/post_training_quantization.py", line 353, in _calculate_scale_factor
data = self._sampling_data[var_name]
KeyError: 'stage.3.7.0.conv.weights'

Quantized a yolov3_darknet model with quant_post; loading it for inference fails

Environment:
Paddle 1.7
Error message:
PaddleCheckError: OP(LoadCombine) fail to open file ..\yolov3_darknet_quant_924_params_, please check whether the mod
el file is complete or damaged. at [D:\1.6.1\paddle\paddle/fluid/operators/load_combine_op.h:46]

Quantization code:

def quantize():
    val_reader = mjreader.custom_reader(images_lists, data_dir, input_size, mode)
    place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
    quant_post(
        executor=exe,
        model_dir='../work/PaddleDetection/yolov3_freeze/yolov3_darknet',
        quantize_model_path='./yolov3_darknet_quant_924/',
        sample_generator=val_reader,
        model_filename='__model__',
        params_filename='__params__',
        batch_size=16,
        batch_nums=20)

def main():
    quantize()

How should the model produced by quant_post be loaded for inference?

Pruning with slim: how do I get the convolution layer names?

Question 1:
Most of the official tutorials are command-line based, but many slim APIs take a Program as a parameter. How do I obtain it?
For example, pruning requires the convolution layer names up front; how should I get them?
The saved model after training looks like this:
(screenshot)

Question 2:
Many slim APIs are static-graph based (judging by their parameters). Is there a tutorial for using them with dynamic graphs?

Questions about quantization

The documentation describes the quantization settings as follows:
weight_bits (int) - number of bits for weight quantization. Default 8, valid range 1-8; 8 is recommended because the quantized data type is int8.
activation_bits (int) - number of bits for activation quantization. Default 8, valid range 1-8; 8 is recommended because the quantized data type is int8.
dtype - data type of the quantized parameters. Default int8; currently only int8 is supported.
If I set weight_bits to 7 and activation_bits to 7 and then train, is the resulting model equivalent to 7-bit quantization?
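For what it is worth, the bit width determines the representable integer range under symmetric quantization even when storage stays int8. A quick sketch (my own illustration, not PaddleSlim code):

```python
# With n-bit symmetric quantization the integer range is
# [-(2**(n-1) - 1), 2**(n-1) - 1], so weight_bits=7 restricts values
# to [-63, 63] even though they still fit in an int8 container.
def quant_range(bits):
    bound = 2 ** (bits - 1) - 1
    return -bound, bound

print(quant_range(8))  # (-127, 127)
print(quant_range(7))  # (-63, 63)
```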

Error running sa_nas_mobilenetv2 on AI Studio

Runtime output:
aistudio@jupyter-7623-23204:~/work/PaddleSlim/demo/nas$ python sa_nas_mobilenetv2.py --class_dim 10 --lr 0.01
Namespace(batch_size=256, class_dim=10, data='cifar10', is_server=True, lr=0.01, search_steps=100, use_gpu=True)
2020-01-06 16:57:17,903-INFO: range table: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 5, 8, 6, 2, 5, 8, 6, 2, 5, 8, 6, 2, 5, 10, 6, 2, 5, 10, 6, 2, 5, 12, 6, 2])
2020-01-06 16:57:17,903-INFO: ControllerServer - listen on: [172.25.33.199:8989]
2020-01-06 16:57:17,904-INFO: Controller Server run...
Traceback (most recent call last):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py", line 45, in convert_to_list
value_list = list(value)
TypeError: 'numpy.int64' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "sa_nas_mobilenetv2.py", line 318, in
search_mobilenetv2(config, args, image_size, is_server=args.is_server)
File "sa_nas_mobilenetv2.py", line 92, in search_mobilenetv2
train_program, startup_program, image_shape, archs, args)
File "sa_nas_mobilenetv2.py", line 49, in build_program
output = archs(data)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/mobilenetv2.py", line 186, in net_arch
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/mobilenetv2.py", line 311, in _invresi_blocks
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/mobilenetv2.py", line 271, in _inverted_residual_unit
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim-0.1-py3.7.egg/paddleslim/nas/search_space/base_layer.py", line 52, in conv_bn_layer
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 2721, in conv2d
filter_size = utils.convert_to_list(filter_size, 2, 'filter_size')
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py", line 49, in convert_to_list
value))
ValueError: The filter_size's type must be list or tuple. Received: 3

How should I work around this? Thanks.
In addition, block_sa_nas_mobilenetv2.py fails with the same error.
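The traceback says `filter_size` reached conv2d as a `numpy.int64`, which `convert_to_list` rejects because it is neither a Python int nor iterable. A minimal reproduction of that check plus the usual cast-to-int workaround (a simplified re-implementation for illustration, not the fluid source):

```python
import numpy as np

def convert_to_list(value, n, name):
    # simplified version of fluid.layers.utils.convert_to_list:
    # a plain int is replicated n times; anything else must be iterable
    if isinstance(value, int):
        return [value] * n
    try:
        return list(value)
    except TypeError:
        raise ValueError(
            "The {}'s type must be list or tuple. Received: {}".format(name, value))

token = np.int64(3)   # e.g. a filter size sampled from the NAS search space
# convert_to_list(token, 2, 'filter_size') would raise, since numpy.int64
# is not a Python int and is not iterable; casting first avoids the error:
print(convert_to_list(int(token), 2, 'filter_size'))  # [3, 3]
```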

Is fluid.ParallelExecutor unsupported after model quantization?

After executing
quant_program = quant.quant_aware(train_prog, exe.place, for_test=False)
val_program = fluid.default_main_program().clone(for_test=True)
and then training quant_program, fluid.ParallelExecutor cannot be used; it reports:
AttributeError: 'CompiledProgram' object has no attribute '_enable_dgc'

How does greedy_prune differ from plain prune in sensitivity-based pruning?

1. As in the title.
2. In the code below, what is the purpose of the section marked with #? Why multiply by 2?

def flops_sensitivity(program,
                      place,
                      param_names,
                      eval_func,
                      sensitivities_file=None,
                      pruned_flops_rate=0.1):

    assert (1.0 / len(param_names) > pruned_flops_rate)

    scope = fluid.global_scope()
    graph = GraphWrapper(program)
    sensitivities = load_sensitivities(sensitivities_file)

    for name in param_names:
        if name not in sensitivities:
            sensitivities[name] = {}
    base_flops = flops(program)
    target_pruned_flops = base_flops * pruned_flops_rate

    pruner = Pruner()
    baseline = None
    for name in sensitivities:

        pruned_program, _, _ = pruner.prune(
            program=graph.program,
            scope=None,
            params=[name],
            ratios=[0.5],
            place=None,
            lazy=False,
            only_graph=True)
       ################################################
        param_flops = (base_flops - flops(pruned_program)) * 2
        channel_size = graph.var(name).shape()[0]
        pruned_ratio = target_pruned_flops / float(param_flops)
       ################################################
        pruned_ratio = round(pruned_ratio, 3)
        pruned_size = round(pruned_ratio * channel_size)
        pruned_ratio = 1 if pruned_size >= channel_size else pruned_ratio

        if len(sensitivities[name].keys()) > 0:
            _logger.debug(
                '{} exist; pruned ratio: {}; excepted ratio: {}'.format(
                    name, sensitivities[name].keys(), pruned_ratio))
            continue
        if baseline is None:
            baseline = eval_func(graph.program)
        param_backup = {}
        pruner = Pruner()
        _logger.info("sensitive - param: {}; ratios: {}".format(name,
                                                                pruned_ratio))
        loss = 1
        if pruned_ratio < 1:
            pruned_program = pruner.prune(
                program=graph.program,
                scope=scope,
                params=[name],
                ratios=[pruned_ratio],
                place=place,
                lazy=True,
                only_graph=False,
                param_backup=param_backup)
            pruned_metric = eval_func(pruned_program)
            loss = (baseline - pruned_metric) / baseline
        _logger.info("pruned param: {}; {}; loss={}".format(name, pruned_ratio,
                                                            loss))
        sensitivities[name][pruned_ratio] = loss
        _save_sensitivities(sensitivities, sensitivities_file)

        # restore pruned parameters
        for param_name in param_backup.keys():
            param_t = scope.find_var(param_name).get_tensor()
            param_t.set(param_backup[param_name], place)
    return sensitivities
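One possible reading of the marked section, offered as a guess rather than an authoritative answer: pruning the layer at ratio 0.5 removes roughly half of that layer's FLOPs, so doubling the measured difference estimates the layer's full FLOP count, from which the per-layer ratio needed to hit the global target follows. In toy numbers:

```python
# Hypothetical toy numbers, not taken from any real model.
base_flops = 1000.0
flops_at_half = 900.0                   # flops(pruned_program) after pruning this layer at ratio 0.5
param_flops = (base_flops - flops_at_half) * 2    # ratio 0.5 removed ~half the layer's FLOPs,
                                                  # so doubling estimates the layer's full FLOPs
target_pruned_flops = base_flops * 0.1            # prune 10% of total FLOPs
pruned_ratio = target_pruned_flops / param_flops  # ratio of this layer needed to hit the target
print(param_flops, pruned_ratio)                  # 200.0 0.5
```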

3. In the code below, min_loss and max_loss are both initialized to 0, so the while loop never executes.

def get_ratios_by_sensitive(self, sensitivities, pruned_flops,
                                eval_program):
        """
        Search a group of ratios for pruning target flops.
        Args:
          sensitivities(dict): The sensitivities used to generate a group of pruning ratios. The key of dict
                               is name of parameters to be pruned. The value of dict is a list of tuple with
                               format `(pruned_ratio, accuracy_loss)`.
          pruned_flops(float): The percent of FLOPS to be pruned.
          eval_program(Program): The program whose FLOPS is considered.
        Returns:
          dict: A group of ratios. The key of dict is name of parameters while the value is the ratio to be pruned.
        """

        min_loss = 0.
        max_loss = 0.
        # step 2: Find a group of ratios by binary searching.
        base_flops = flops(eval_program)
        ratios = None
        max_times = 20
        while min_loss < max_loss and max_times > 0:
            loss = (max_loss + min_loss) / 2
            _logger.info(
                '-----------Try pruned ratios while acc loss={}-----------'.
                format(loss))
            ratios = self.get_ratios_by_loss(sensitivities, loss)
            _logger.info('Pruned ratios={}'.format(
                [round(ratio, 3) for ratio in ratios.values()]))
            pruned_program = self._pruner.prune(
                eval_program,
                None,  # scope
                ratios.keys(),
                ratios.values(),
                None,  # place
                only_graph=True)
            pruned_ratio = 1 - (float(flops(pruned_program)) / base_flops)
            _logger.info('Pruned flops: {:.4f}'.format(pruned_ratio))

            # Check whether current ratios is enough
            if abs(pruned_ratio - pruned_flops) < 0.015:
                break
            if pruned_ratio > pruned_flops:
                max_loss = loss
            else:
                min_loss = loss
            max_times -= 1
        return ratios
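Point 3 can be checked in isolation. Below is a standalone sketch of the same bisection with the upper bound passed in (a hypothetical `max_loss` parameter of my own, not the library's actual fix); with the original initialization min_loss = max_loss = 0.0 the loop condition is false on entry and nothing is searched:

```python
def search_loss_threshold(pruned_flops_at, target, max_loss, tol=0.015, max_times=20):
    # same bisection as get_ratios_by_sensitive, with max_loss supplied by the caller
    min_loss, loss = 0.0, None
    while min_loss < max_loss and max_times > 0:
        loss = (min_loss + max_loss) / 2
        pruned = pruned_flops_at(loss)   # fraction of FLOPs pruned under this loss budget
        if abs(pruned - target) < tol:
            break                        # close enough to the requested pruned FLOPs
        if pruned > target:
            max_loss = loss              # pruning too much: lower the loss budget
        else:
            min_loss = loss              # pruning too little: raise the loss budget
        max_times -= 1
    return loss

# toy monotone relation: allowing more accuracy loss prunes more FLOPs
print(search_loss_threshold(lambda l: l, target=0.5, max_loss=0.9))
# with zero bounds, as in the original code, the loop never runs and None comes back
print(search_loss_threshold(lambda l: l, target=0.5, max_loss=0.0))
```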

Installation problems

Installation fails. After installing successfully with pip, I hit the following problem:
(screenshot)
Installing from source gives:
(screenshot)
In addition, I tried running several of the provided demos and none run normally. Mainly:
sensitive: does not run; merge_sensitive() raises an error;
auto_prune: does not run; the error is:
(screenshot)
sensitive_prune: does not finish. The error is:
(screenshot)
Almost none of the provided demos run.

Optimizer error when pruning with slim; how do I fix it?

pruned_program, _, _ = pruner.prune(train_program, fluid.global_scope(),
                                    params=ratios.keys(), ratios=ratios.values(), place=place)

The error is: Error: Param and Velocity of MomentumOp should have the same dimension.
[Hint: Expected param_dim == ctx->GetInputDim("Velocity"), but received param_dim:12544, 4096 != ctx->GetInputDim("Velocity"):24832, 4096.] at (/paddle/paddle/fluid/operators/optimizers/momentum_op.h:79)
[operator < momentum > error]
It seems to say the dimensions don't match? But pruning is of course expected to reduce the parameter count.
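A shape-level sketch of what the message means (my own illustration, not Paddle source): momentum keeps a velocity buffer per parameter, created with the parameter's shape at the time minimize is called, so pruning the parameter afterwards leaves a stale buffer behind. A common remedy, though an assumption on my part, is to prune before the optimizer's state is created, or to rebuild the optimizer after pruning.

```python
import numpy as np

def momentum_step(param, grad, velocity, lr=0.1, mu=0.9):
    # mirrors the shape check MomentumOp performs on its state
    if param.shape != velocity.shape:
        raise ValueError(
            "Param and Velocity of MomentumOp should have the same dimension.")
    velocity = mu * velocity + grad
    return param - lr * velocity, velocity

velocity = np.zeros((16, 4))       # velocity slot created when minimize() ran
pruned_param = np.zeros((8, 4))    # the parameter was shrunk afterwards by pruning
try:
    momentum_step(pruned_param, np.zeros_like(pruned_param), velocity)
except ValueError as err:
    print(err)                     # the same mismatch the issue reports
```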

Distillation error

Very frustrating; after half a day of debugging it ends with:
Error: Tensor holds no memory. Call Tensor::mutable_data first.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.cc:23)
[operator < lookup_table_v2 > error]
I'm out of ideas. Searching the issues, others have reported the same error; hoping the team can look into it.

YOLOv3 post-training quantization error

CPU: 8 cores
RAM: 32 GB
GPU: V100
GPU memory: 16 GB
Disk: 100 GB
Environment:
Python version: 3.7
Framework version: PaddlePaddle 1.7.0
Code:

input_size=(3, 512, 512)
sys.path[0] = os.path.join(
    os.path.dirname("__file__"), os.path.pardir, os.path.pardir)


        

def quantize():
    val_reader = mjreader.custom_reader(images_lists, data_dir, input_size,mode)
    place = fluid.CUDAPlace(0) 
    exe = fluid.Executor(place)
    quant_post(
        executor=exe,
        model_dir='../work/PaddleDetection/yolov3_dark_freeze/mj_yolov3_darknet',
        quantize_model_path='./yolov3_darknet_quant/',
        sample_generator=val_reader,
        model_filename='__model__',
        params_filename='__params__',
        batch_size=16,
        batch_nums=20) 

Error Message Summary:

InvalidArgumentError: Input(ImgSize) dim[0] and Input(X) dim[0] should be same.
[Hint: Expected dim_imgsize[0] == dim_x[0], but received dim_imgsize[0]:4800 != dim_x[0]:16.] at (/paddle/paddle/fluid/operators/detection/yolo_box_op.cc:50)
[operator < yolo_box > error]

Does it support training on a custom dataset?

Does PaddleSlim support training on my own dataset? How do I use the set of weight files downloaded for yolov3_mobilenet_v1_voc_prune? Is there a tutorial or video?

model_size computation in analysis

For BN layers, besides scale and offset, the computation also counts mean and variance as parameters. The latter two are running statistics rather than values learned by the optimizer; should they be counted as parameters?
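To make the count concrete, here is a per-channel tally of BatchNorm variables (my own sketch, not the analysis module's code):

```python
# A BatchNorm layer keeps four per-channel vectors: two trained by the
# optimizer (scale/gamma and offset/beta) and two running statistics
# (moving_mean and moving_variance) that are updated but not learned.
def bn_var_count(channels, include_stats=True):
    learnable = 2 * channels
    stats = 2 * channels
    return learnable + stats if include_stats else learnable

print(bn_var_count(64))                       # 256 counted when statistics are included
print(bn_var_count(64, include_stats=False))  # 128 trainable parameters only
```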

NAS: how to export the intermediate information of a searched model

I ran the tutorial at https://github.com/PaddlePaddle/PaddleSlim/blob/release/1.0.1/docs/zh_cn/tutorials/image_classification_nas_quick_start.ipynb and tried to export the network and its parameters, but ran into a problem. The code does not explicitly define any layers other than the final fully connected one; my understanding is that the rest of the network is described collectively by archs = sanas.next_archs()[0]. When I write the export call fluid.io.save_inference_model(dirname=save_path, feeded_var_names=[data.name], target_vars=[predict], executor=exe), target_vars=[predict] should be the searched layers plus the final fully connected layer, i.e.:

archs = sanas.next_archs()[0]
output = fluid.layers.fc(input=output, size=10)
predict = archs+output

If my understanding is correct, how should the export statement be written so that the whole architecture is saved?

Error when distilling a channel-pruned model

    student_program = fluid.Program()
    s_startup = fluid.Program()
    with fluid.program_guard(student_program, s_startup):
        with fluid.unique_name.guard():
            # model definition
           ...
    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    load_model(exe, student_program, args.pretrained_model)  #prune.io
    val_program = student_program.clone(for_test=True)

    teacher_model = models.__dict__[args.teacher_model]()
    teacher_program = fluid.Program()
    t_startup = fluid.Program()
    with fluid.program_guard(teacher_program, t_startup):
        with fluid.unique_name.guard():
            # teacher model definition
           ...

    exe.run(t_startup)
    if args.teacher_pretrained_model:
        def if_exist(var):
            return os.path.exists(
                os.path.join(args.teacher_pretrained_model, var.name))
        fluid.io.load_vars(
            exe,
            args.teacher_pretrained_model,
            main_program=teacher_program,
            predicate=if_exist)

    data_name_map = {'images': 'images'}
    merge(teacher_program, student_program, data_name_map, place)
    with fluid.program_guard(student_program, s_startup):
        dist_loss = soft_label_loss("teacher_fc_0.tmp_0", "fc_0.tmp_0", student_program)
        loss = avg_cost + dist_loss
        lr, opt = create_optimizer(args)
        opt.minimize(loss)
    exe.run(s_startup)

The following error is raised:

Error: Param and Velocity of MomentumOp should have the same dimension.
  [Hint: Expected param_dim == ctx->GetInputDim("Velocity"), but received param_dim:256 != ctx->GetInputDim("Velocity"):128.] at (/paddle/paddle/fluid/operators/optimizers/momentum_op.h:79)
  [operator < momentum > error]

I suspect exe.run(s_startup) overwrites the weights loaded by load_model, but the optimizer still needs to be initialized. How should this be resolved?

Optimizer error when training a pruned YOLOv3

The main training code is as follows:

def train():

    logger.info("start train YOLOv3, train params:%s", str(train_parameters))

    logger.info("create place, use gpu:" + str(train_parameters['use_gpu']))

    logger.info("build network and program")

    place = fluid.CUDAPlace(0) if train_parameters['use_gpu'] else fluid.CPUPlace()
    exe = fluid.Executor(place)

    scope = fluid.Scope()
    train_program = fluid.Program()
    start_program = fluid.Program()
    test_program = fluid.Program()
    
    feeder, reader, loss = build_program_with_feeder(train_program, start_program, place)

    pred = build_program_with_feeder(test_program, start_program, istrain=False)
    
    test_program = test_program.clone(for_test=True)
    
    train_fetch_list = [loss.name]
    
    exe.run(start_program, scope=scope)
    
    load_pretrained_params(exe, train_program)
    
    if train_parameters['print_params']:
        param_delimit_str = '-' * 20 + "All parameters in current graph" + '-' * 20
        print(param_delimit_str)
        for block in train_program.blocks:
            for param in block.all_parameters():
                print("parameter name: {}\tshape: {}".format(param.name,
                                                             param.shape))
        print('-' * len(param_delimit_str))
    
    pruned_params = train_parameters['pruned_params'].strip().split(",")
    logger.info("pruned params: {}".format(pruned_params))
    pruned_ratios = [float(n) for n in train_parameters['pruned_ratios'].strip().split(",")]
    logger.info("pruned ratios: {}".format(pruned_ratios))
    
    logger.info("build executor and init params")
    
    pruner = Pruner()
    train_program = pruner.prune(
        train_program,
        scope,
        params=pruned_params,
        ratios=pruned_ratios,
        place=place,
        only_graph=False)[0]
    
    base_flops = flops(test_program)
    test_program = pruner.prune(
        test_program,
        scope,
        params=pruned_params,
        ratios=pruned_ratios,
        place=place,
        only_graph=True)[0]
    pruned_flops = flops(test_program)

    stop_strategy = train_parameters['early_stop']
    rise_limit = stop_strategy['rise_limit']

    min_loss = stop_strategy['min_loss']
    # stop_train = False
    rise_count = 0
    total_batch_count = 0
    current_best_f1 = 0.0
    train_temp_loss = 0
    current_best_pass = 0
    current_best_box_pass = 0
    current_best_recall = 0
    current_best_precision = 0
    current_best_box_recall = 0
    current_best_box_precision = 0
    current_best_box_f1 = 0
    for pass_id in range(train_parameters["num_epochs"]):
        logger.info("current pass: {}, start read image".format(pass_id))
        batch_id = 0
        total_loss = 0.0
        for batch_id, data in enumerate(reader()):
            t1 = time.time()
            loss = exe.run(train_program, feed=feeder.feed(data), fetch_list=train_fetch_list)
            period = time.time() - t1
            loss = np.mean(np.array(loss))
            total_loss += loss
            batch_id += 1
            total_batch_count += 1
            
            if batch_id % 200 == 0:
                logger.info("pass {}, trainbatch {}, loss {} time {}".format(pass_id,
                                                                             batch_id, loss, "%2.2f sec" % period))
        pass_mean_loss = total_loss / batch_id
        logger.info("pass {0} train result, current pass mean loss: {1}".format(pass_id, pass_mean_loss))

    logger.info("end training")

#########################################################
# The error message is as follows:

Python Call Stacks (More useful to users):
------------------------------------------
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2594, in _prepend_op
    attrs=kwargs.get("attrs", None))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 5472, in autoincreased_step_counter
    attrs={'step': float(step)})
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/learning_rate_scheduler.py", line 48, in _decay_step_counter
    counter_name='@LR_DECAY_COUNTER@', begin=begin, step=1)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/learning_rate_scheduler.py", line 387, in piecewise_decay
    global_step = _decay_step_counter()
  File "train.py", line 265, in optimizer_momentum_setting
    learning_rate=fluid.layers.piecewise_decay(boundaries=boundaries, values=values),
  File "train.py", line 371, in get_loss
    optimizer = optimizer_momentum_setting()
  File "train.py", line 306, in build_program_with_feeder
    loss = get_loss(model, outputs, gt_box, gt_label, main_prog)
  File "train.py", line 403, in train
    feeder, reader, loss = build_program_with_feeder(train_program, start_program, place)
  File "train.py", line 544, in <module>
    train()

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: The Tensor in the increment Op's Input Variable X(@LR_DECAY_COUNTER@) is not initialized.
  [Hint: Expected t->IsInitialized() == true, but received t->IsInitialized():0 != true:1.] at (/paddle/paddle/fluid/framework/operator.cc:1264)
  [operator < increment > error]

Which quantization method exactly is used for weights in post-training quantization?

From the docs: the goal of post-training quantization is to determine the quantization scale factor. There are two main methods: the non-saturating method (no saturation) and the saturating method (saturation). The non-saturating method computes abs_max, the maximum absolute value in the FP32 tensor, and maps it to 127, so the scale factor is abs_max/127. The saturating method uses KL divergence to find a suitable threshold T (0 < T < mab_max), maps it to 127, so the scale factor is T/127. In general, the non-saturating method is used for the weight tensors of the ops to be quantized, and the saturating method for their activation tensors (inputs and outputs).
Questions:
1. The algorithm description says abs_max is used for the weights, but the API documentation says channel_abs_max.
2. For activations, the threshold T computed via KL divergence is described as 0 < T < mab_max; what does mab_max refer to?
3. In post-training quantization, is KL divergence also used to compute the scale factor for the inputs?
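For reference, the non-saturating abs_max rule from the quoted passage can be written out numerically (a sketch of the per-tensor formula only; channel_abs_max would apply the same rule separately per output channel, which is exactly the ambiguity question 1 asks about):

```python
import numpy as np

w = np.array([0.02, -1.27, 0.6, -0.3], dtype=np.float32)
abs_max = float(np.abs(w).max())          # largest absolute value: 1.27
scale = abs_max / 127.0                   # non-saturating: map abs_max onto 127
q = np.round(w / scale).astype(np.int8)   # quantized int8 weights
print(q.tolist())                         # [2, -127, 60, -30]
```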

Distillation fails when the teacher program contains a BiGRU

The teacher program defines:

encoder_fwd_cell = fluid.layers.GRUCell(hidden_size=128)
encoder_fwd_output, fwd_state = fluid.layers.rnn(
    cell=encoder_fwd_cell,
    inputs=emb_out,
    sequence_length=None,
    time_major=False,
    is_reverse=False)
# build the backward RNN with GRUCell
encoder_bwd_cell = fluid.layers.GRUCell(hidden_size=128)
encoder_bwd_output, bwd_state = fluid.layers.rnn(
    cell=encoder_bwd_cell,
    inputs=emb_out,
    sequence_length=None,
    time_major=False,
    is_reverse=True)
# concatenate the forward and backward GRU encodings to get h
encoder_output = fluid.layers.concat(
    input=[encoder_fwd_output, encoder_bwd_output], axis=2)
encoder_output = fluid.layers.elementwise_mul(encoder_output, input_mask, axis=-1)

With this structure the program errors out; removing it makes things work. Presumably the source code fails to handle some case here.

MobileNetV2 DeepLabv3+ pruning issue

Pruning MobileNetV2 DeepLabv3+ with the latest PaddleSlim and PaddleSeg libraries, after configuring the pruning parameters I get the error below. What is the cause?
(screenshot)

Parameter[decoder/separable_conv1/pointwise/BatchNorm/beta] loaded sucessfully!
Parameter[decoder/separable_conv1/pointwise/BatchNorm/moving_mean] loaded sucessfully!
Parameter[decoder/separable_conv1/pointwise/BatchNorm/moving_variance] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/weights] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/gamma] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/beta] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/moving_mean] loaded sucessfully!
Parameter[decoder/separable_conv2/depthwise/BatchNorm/moving_variance] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/weights] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/gamma] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/beta] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/moving_mean] loaded sucessfully!
Parameter[decoder/separable_conv2/pointwise/BatchNorm/moving_variance] loaded sucessfully!
Parameter[logit/weights] loaded sucessfully!
Parameter[logit/biases] loaded sucessfully!
332/332 pretrained parameters loaded successfully!
Traceback (most recent call last):
File "./slim/prune/train_prune.py", line 504, in
main(args)
File "./slim/prune/train_prune.py", line 491, in main
train(cfg)
File "./slim/prune/train_prune.py", line 347, in train
only_graph=False)[0]
File "/home/hpc/ccx/paddle1.7/PaddleSlim/paddleslim/prune/pruner.py", line 82, in prune
param_t = np.array(scope.find_var(param).get_tensor())
AttributeError: 'NoneType' object has no attribute 'get_tensor'

@wanghaoshuang

name '_logger' is not defined

Training a model with quantization-aware training reports:
Traceback (most recent call last):
File "train.py", line 734, in
main()
File "train.py", line 730, in main
train(args)
File "train.py", line 339, in train
test_prog, place, quant_config, scope=None, for_test=True)
File "/home/vis/duyuting/anaconda3/lib/python3.7/site-packages/paddleslim-1.0.0-py3.7.egg/paddleslim/quant/quanter.py", line 132, in quant_aware
NameError: name '_logger' is not defined
The environment is Paddle 1.6; PaddleSlim is the release version linked in a previous issue. I checked quanter.py and _logger is indeed never defined there. Is this a bug?

How to append float32 operator to quantized graph

from paddleslim.quant import quant_aware, convert

# convert the quant-aware program into an inference graph, then back to a Program
quantized_graph = convert(infer_prog, place, config=config)
quantized_program = quantized_graph.to_program()
for var in quantized_program.list_vars():
    print(var.name)

# append float32 ops to the quantized program under a program guard
with fluid.program_guard(quantized_program):
    out = quantized_program.global_block().var("your_var_name")
    out = fluid.layers.some_op(out)

fluid.io.save_inference_model(main_program=quantized_program, ...)

