
benchmark's People

Contributors

2742195759, ashburnlee, avin0323, ccmeteorljh, chengduozh, chenwhql, dddivano, from00, gaowei8, gfwm2013, gongweibao, hysunflower, jameslim-sy, jiaxiao243, junjun315, kolinwei, lelelelelez, lidanqing-intel, luotao1, mmglove, sneaxiy, tangtang586, wangchaochaohu, wangzhe0912, windstamp, xiegegege, xreki, zhengya01, zjq9409, zzsean

benchmark's Issues

Error when running maskrcnn-from-fb

When running maskrcnn-from-fb on 8 GPUs, the test phase after training finishes reports the following error:
2019-05-23 04:24:31,288 maskrcnn_benchmark.trainer INFO: Total training time: 1 day, 18:53:50.877918 (0.8579 s / it)
Traceback (most recent call last):
  File "./tools/train_net.py", line 174, in <module>
    main()
  File "./tools/train_net.py", line 170, in main
    run_test(cfg, model, args.distributed)
  File "./tools/train_net.py", line 95, in run_test
    data_loaders_val = make_data_loader(cfg, is_train=False, is_distributed=distributed)
  File "/benchmark/benchmark/Mask-RCNN/maskrcnn-from-fb/maskrcnn_benchmark/data/build.py", line 154, in make_data_loader
    datasets = build_dataset(dataset_list, transforms, DatasetCatalog, is_train)
  File "/benchmark/benchmark/Mask-RCNN/maskrcnn-from-fb/maskrcnn_benchmark/data/build.py", line 33, in build_dataset
    data = dataset_catalog.get(dataset_name)
  File "/benchmark/benchmark/Mask-RCNN/maskrcnn-from-fb/maskrcnn_benchmark/config/paths_catalog.py", line 113, in get
    attrs = DatasetCatalog.DATASETS[name]
KeyError: 'coco_2017_minival'
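The KeyError means the name 'coco_2017_minival' is simply absent from DatasetCatalog.DATASETS in paths_catalog.py, so registering that split should let the test phase find it. A minimal stand-in for the catalog lookup that fails above (the directory and annotation paths are illustrative guesses, not the repo's actual values):

```python
# Hypothetical catalog sketch; mirrors the DatasetCatalog.DATASETS dict
# that paths_catalog.py indexes when build_dataset runs.
DATASETS = {
    "coco_2017_train": {"img_dir": "coco/train2017",
                        "ann_file": "coco/annotations/instances_train2017.json"},
    # Entry that would need to be added for the test phase to succeed:
    "coco_2017_minival": {"img_dir": "coco/val2017",
                          "ann_file": "coco/annotations/instances_val2017.json"},
}

def get(name):
    if name not in DATASETS:
        raise KeyError(name)  # this is the failure seen in the log
    return DATASETS[name]
```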

se-resnext multi-process training crashed

Pass 0, trainbatch 4990, loss 4.21697,                         acc1 0.15625, acc5 0.37500, lr 0.10000, time 0.32 sec
Pass 0, trainbatch 5000, loss 3.58300,                         acc1 0.28125, acc5 0.46875, lr 0.10000, time 0.32 sec
train.py:445: RuntimeWarning: Mean of empty slice.
  test_loss = np.array(test_info[0]).mean()
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
train.py:446: RuntimeWarning: Mean of empty slice.
  test_acc1 = np.array(test_info[1]).mean()
train.py:447: RuntimeWarning: Mean of empty slice.
  test_acc5 = np.array(test_info[2]).mean()
End pass 0, train_loss 5.04358, train_acc1 0.09865, train_acc5 0.24054, test_loss nan, test_acc1 nan, test_acc5 nan
Traceback (most recent call last):
  File "train.py", line 494, in <module>
    main()
  File "train.py", line 490, in main
    train(args)
  File "train.py", line 459, in train
    fluid.io.save_persistables(exe, model_path, main_program=train_prog)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 521, in save_persistables
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 199, in save_vars
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 237, in save_vars
    executor.run(save_program)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 650, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 748, in _run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator save error.
Python Callstacks: 
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1748, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 221, in save_vars
    'file_path': os.path.join(save_dirname, new_var.name)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 199, in save_vars
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 521, in save_persistables
    filename=filename)
  File "train.py", line 459, in train
    fluid.io.save_persistables(exe, model_path, main_program=train_prog)
  File "train.py", line 490, in main
    train(args)
  File "train.py", line 494, in <module>
    main()
C++ Callstacks: 
holder_ should not be null
Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/fluid/framework/tensor.h:139]
PaddlePaddle Call Stacks: 
0       0x7efddc40f388p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1       0x7efddc40f6d7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2       0x7efddc41010bp paddle::framework::Tensor::type() const + 107
3       0x7efdde378e1dp paddle::framework::GetDataTypeOfVar(paddle::framework::Variable const*) + 157
4       0x7efddcfa47e3p paddle::operators::SaveOp::GetExpectedKernelType(paddle::framework::ExecutionContext const&) const + 67
5       0x7efdde37b63bp paddle::framework::OperatorWithKernel::ChooseKernel(paddle::framework::RuntimeContext const&, paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 235
6       0x7efdde37d798p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 728
7       0x7efdde37da11p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
8       0x7efdde37b01cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
9       0x7efddc59ae6ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
10      0x7efddc59df3fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
11      0x7efddc40031dp
12      0x7efddc441da6p
13            0x4c5326p PyEval_EvalFrameEx + 37958
14            0x4b9b66p PyEval_EvalCodeEx + 774
15            0x4c1f56p PyEval_EvalFrameEx + 24694
16            0x4b9b66p PyEval_EvalCodeEx + 774
17            0x4c17c6p PyEval_EvalFrameEx + 22758
18            0x4b9b66p PyEval_EvalCodeEx + 774
19            0x4c17c6p PyEval_EvalFrameEx + 22758
20            0x4b9b66p PyEval_EvalCodeEx + 774
21            0x4c17c6p PyEval_EvalFrameEx + 22758
22            0x4b9b66p PyEval_EvalCodeEx + 774
23            0x4c17c6p PyEval_EvalFrameEx + 22758
24            0x4b9b66p PyEval_EvalCodeEx + 774
25            0x4c1f56p PyEval_EvalFrameEx + 24694
26            0x4b9b66p PyEval_EvalCodeEx + 774
27            0x4c1f56p PyEval_EvalFrameEx + 24694
28            0x4b9b66p PyEval_EvalCodeEx + 774
29            0x4eb69fp
30            0x4e58f2p PyRun_FileExFlags + 130
31            0x4e41a6p PyRun_SimpleFileExFlags + 390
32            0x4938cep Py_Main + 1358
33      0x7efe1e358830p __libc_start_main + 240
34            0x493299p _start + 41
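Besides the save crash, the "Mean of empty slice" warnings (and the resulting test_loss nan) show that test_info collected no batches at all in the multi-process run. A small guard sketch for that averaging step (safe_mean is a hypothetical helper, not part of train.py):

```python
import numpy as np

def safe_mean(values):
    """Mean of a possibly empty list, without the 'Mean of empty slice' warning.

    Returns None instead of nan when no test batches were collected,
    which is what happened in the multi-process run above; callers can
    then skip or flag the test metrics instead of logging nan.
    """
    arr = np.asarray(values, dtype=np.float64)
    return float(arr.mean()) if arr.size else None
```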

When running Mask R-CNN with multiple GPUs and multiple processes, only one GPU runs; the others never start.

Traceback (most recent call last):
  File "train.py", line 210, in <module>
    train()
  File "train.py", line 88, in train
    exe.run(fluid.default_startup_program())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 625, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 702, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Place CUDAPlace(0) is not supported, Please re-compile with WITH_GPU option at [/paddle/paddle/fluid/platform/device_context.cc:37]
PaddlePaddle Call Stacks:
0 0x7f12effedba8p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1 0x7f12effedef7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2 0x7f12f1dc79e3p paddle::platform::DeviceContextPool::Get(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 355
3 0x7f12f1c6675dp paddle::framework::GarbageCollector::GarbageCollector(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 477
4 0x7f12f1c669e1p paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long) + 33
5 0x7f12f016b8d0p paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 480
6 0x7f12f016c6afp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
7 0x7f12effdd29ep
8 0x7f12f0023096p
9 0x4c5326p PyEval_EvalFrameEx + 37958
10 0x4b9b66p PyEval_EvalCodeEx + 774
11 0x4c1f56p PyEval_EvalFrameEx + 24694
12 0x4b9b66p PyEval_EvalCodeEx + 774
13 0x4c17c6p PyEval_EvalFrameEx + 22758
14 0x4b9b66p PyEval_EvalCodeEx + 774
15 0x4c1f56p PyEval_EvalFrameEx + 24694
16 0x4b9b66p PyEval_EvalCodeEx + 774
17 0x4eb69fp
18 0x4e58f2p PyRun_FileExFlags + 130
19 0x4e41a6p PyRun_SimpleFileExFlags + 390
20 0x4938cep Py_Main + 1358
21 0x7f137c748830p __libc_start_main + 240
22 0x493299p _start + 41
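The EnforceNotMet here means the worker processes picked up a CPU-only PaddlePaddle build, so CUDAPlace(0) cannot be constructed. Paddle 1.x exposes fluid.is_compiled_with_cuda() for exactly this check; below is a hedged, framework-free sketch of guarding the place selection (select_place is a hypothetical helper, and the returned strings stand in for the real place objects):

```python
def select_place(cuda_built, want_gpu=True):
    """Pick an executor place, failing fast on CPU-only builds.

    In a real script, cuda_built would come from
    fluid.is_compiled_with_cuda(); returning place names as strings
    keeps this sketch runnable without Paddle installed.
    """
    if want_gpu and not cuda_built:
        raise RuntimeError(
            "GPU requested but this PaddlePaddle wheel was built without "
            "CUDA; install paddlepaddle-gpu or re-compile with -DWITH_GPU=ON")
    return "CUDAPlace(0)" if want_gpu else "CPUPlace()"
```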

Retinanet performance improvement on V100

  • Owner:
    wangchaochaohu
  • Test environment
    • GPU driver: 418.39
    • CUDA 9.0, cuDNN 7
  • Current performance comparison (CUDA 9.0)
Scenario     Paddle    PyTorch    Comparison
Single GPU   6.317     7.889      20% slower
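The 20% figure appears to be the relative throughput gap, (competitor − Paddle) / competitor; the same formula reproduces the percentages quoted in the other comparison tables in this tracker (StarGAN's 31%, mask-rcnn-fpn's 14% and 6%). As a small sketch:

```python
def pct_behind(ours, theirs):
    """Fraction by which `ours` trails `theirs` on a higher-is-better metric."""
    return (theirs - ours) / theirs

# Retinanet single-GPU numbers from the table above: ~0.20
retinanet_gap = pct_behind(6.317, 7.889)
```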

Optimize inference performance of ERNIE on CPU

Owner

@tensor-tang @GaoWei8

Machine model

6148

Commit

based on #164

Initial performance

10 samples

I0820 10:33:36.597270 35686 inference.cc:211] Load 10 samples from /home/tangjian/ernie/Inference/c++/ernie/seq128_data/test_ds_10
I0820 10:33:37.552497 35686 inference.cc:351] Run 10 samples, average latency: 95.519 ms per sample.
I0820 10:33:37.552565 35686 inference.cc:356] Run 9 samples, average latency [exclude 1 warmup steps]: 89.8265 ms per sample.

Profile results

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::fc                 740         625.813     0.013066    2.59008     0.845693    0.511826
thread0::load               202         269.789     0.009506    168.902     1.33559     0.220649
thread0::elementwise_add    380         78.8811     0.045364    29.1685     0.207582    0.0645135
thread0::transpose2         480         63.5249     0.089001    4.08297     0.132343    0.0519543
thread0::dropout            380         51.7229     0.01217     0.262424    0.136113    0.0423019
thread0::layer_norm         250         43.0832     0.150904    0.226994    0.172333    0.0352359
thread0::matmul             250         36.4239     0.033627    14.6704     0.145696    0.0297895
thread0::relu               120         22.7715     0.130891    1.77192     0.189762    0.0186238
thread0::scale              140         11.0508     0.006102    0.105016    0.0789342   0.00903797
thread0::softmax            120         9.73205     0.050275    0.451451    0.0811004   0.00795943
thread0::reshape2           480         4.47205     0.006964    0.022523    0.00931677  0.0036575
thread0::lookup_table       30          2.67894     0.074823    0.105928    0.089298    0.00219099
thread0::stack              10          1.43889     0.130984    0.154692    0.143889    0.00117681
thread0::tanh               10          0.986778    0.084346    0.191761    0.0986778   0.000807043
thread0::slice              10          0.12234     0.009367    0.033865    0.012234    0.000100057
thread0::feed               40          0.109835    0.001013    0.005219    0.00274588  8.98293e-05
thread0::fetch              10          0.106458    0.006874    0.011848    0.0106458   8.70674e-05
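Reading the profile, fc alone accounts for roughly half the runtime and load another ~22%, so those two ops are the obvious optimization targets. A small helper to rank ops by their share of the listed time, fed with a few rows transcribed from the table above (only the heaviest ops are included, so the shares are relative to this subset):

```python
# Total times in ms for the heaviest ops, copied from the profile above.
PROFILE_MS = {
    "fc": 625.813,
    "load": 269.789,
    "elementwise_add": 78.8811,
    "transpose2": 63.5249,
    "dropout": 51.7229,
}

def top_ops(times, k=2):
    """Return the k most expensive ops with their share of the listed total."""
    total = sum(times.values())
    ranked = sorted(times.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, t / total) for op, t in ranked[:k]]
```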

TODO

Problems encountered setting up the competitor environment on CUDA 10

CUDA 10 does not support the competitor TensorFlow 1.12.0; it only supports 1.13.0 and later.

TensorFlow 1.13.0+ differs substantially from 1.12.0: many models that run under TensorFlow 1.12.0 can no longer train normally on 1.13.0 and later.
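The constraint above can be stated as a simple version gate. This sketch encodes only the single fact stated in the issue (CUDA 10 requires TF >= 1.13); it is not a full support matrix:

```python
def tf_runs_on_cuda(tf_version, cuda_major):
    """Does this TensorFlow release support this CUDA major version?

    Encodes the one constraint from the issue: CUDA 10 needs TF >= 1.13.
    """
    major, minor = (int(x) for x in tf_version.split(".")[:2])
    if cuda_major >= 10:
        return (major, minor) >= (1, 13)
    return True
```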


Which script is used for the BERT training benchmark?

Which script is used for the BERT training benchmark? I see there are two kinds of scripts: one for pre-training (e.g. train.py) and one for fine-tuning (e.g. run_classify.py).
Which one is used for the benchmark?

Optimize inference performance of ERNIE on P40 GPU

Owner

@Xreki @zhaoyuchen2018

Initial performance

  • Test date: 2019-08-14
  • Tester: @Xreki
  • GPU platform: Tesla P40
  • Software:
    • Driver version: 418.39
    • CUDA 9.0
    • cuDNN 7.5
  • Paddle commit:
commit 744279fe685dd0b8b426a686d84ad449da02366e
Author: Kevin <[email protected]>
Date:   Mon Aug 12 10:13:12 2019 +0800

    Refine embedding Api doc (#18820)
  • Test code: #164
  • Docker image used to build Paddle: paddlepaddle/paddle_manylinux_devel:cuda9.0_cudnn7
  • Docker image used to build and run the test program: paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev
  • Results:
    • GPU ratio: 96%
    • Runtime: 8.3554 ms/sample
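For comparison with throughput-style numbers elsewhere in this tracker, the average latency converts to roughly 120 samples/s:

```python
def samples_per_second(ms_per_sample):
    """Convert an average latency in ms/sample into throughput."""
    return 1000.0 / ms_per_sample

# P40 result from above: 8.3554 ms/sample is about 119.7 samples/s.
p40_throughput = samples_per_second(8.3554)
```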

NVIDIA has open-sourced Faster Transformer, its BERT inference solution

Optimize the performance of CQDNN on CPU

Owner

@luotao1 @GaoWei8

Initial performance

  • Test date: 2019-11-12
  • Model config: single machine, 16 threads, sampling rate 0.02, 2000 parts
  • Tester: @Aurelius84
  • Unit: s/epoch
  • CPU model: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, 56 cores
                 Paddle (MKL_CBWR=COMPATIBLE)   Paddle (MKL_CBWR="")   Competitor
Time per epoch   41                             23                     38
Speedup          7.8% slower                    39% faster

Goal

Since the competitor was run with MKL_CBWR=COMPATIBLE, Paddle (MKL_CBWR=COMPATIBLE) needs to at least match the competitor.
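MKL_CBWR is an environment variable read by MKL at startup (it pins MKL's Conditional Numerical Reproducibility code paths; COMPATIBLE is the most conservative setting), so the apples-to-apples run just needs it exported before launching training. A minimal sketch of building such an environment (cbwr_env is a hypothetical helper; the variable name and COMPATIBLE value are real MKL settings):

```python
import os

def cbwr_env(mode="COMPATIBLE", base=None):
    """Copy an environment and pin MKL's CNR mode via MKL_CBWR.

    Pass the result as env= to subprocess.Popen/run when launching the
    training process, so both Paddle and the competitor use the same
    reproducibility setting.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_CBWR"] = mode
    return env
```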

Optimize train performance of StarGAN on V100 GPU

Owner

@chenwhql

Initial performance

Scenario     Paddle dev   PyTorch 1.1   Comparison
Single GPU   48.525       70.508        31% slower
  • Test date: 2019-08-02
  • Tester: @chenwhql
  • GPU platform: Tesla V100
  • Software:
    • Driver version: 418.39
    • CUDA 9.0
    • cuDNN 7.5
  • Paddle: develop [commit: ee2f296]
  • Test code: #152
  • Docker image used to build Paddle: paddlepaddle/paddle_manylinux_devel:cuda9.0_cudnn7
  • Docker image used to build and run the test program: paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev

PR #31 causes abnormal loss in transformer

paddle version: develop commit-id: 63d9fe336217303e1178e20ee66a7a10055387dc
#31
After reverting this PR the problem disappears and the loss is normal.
Training log:

2019-04-24 13:20:26,645-INFO: step_idx: 2300, epoch: 0, batch: 2300, avg loss: 65.918922, normalized loss: 64.541956, ppl: 42483886980604466541361102848.000000, speed: 4.82 step/s
2019-04-24 13:20:47,210-INFO: step_idx: 2400, epoch: 0, batch: 2400, avg loss: 74.120911, normalized loss: 72.743944, ppl: 154989570662575536897566879776768.000000, speed: 4.86 step/s
2019-04-24 13:21:07,700-INFO: step_idx: 2500, epoch: 0, batch: 2500, avg loss: 82.949768, normalized loss: 81.572802, ppl: 1058343263653170446511950991059845120.000000, speed: 4.88 step/s
train.py:558: RuntimeWarning: overflow encountered in exp
  np.exp([min(total_avg_cost, 100)]),speed))
2019-04-24 13:21:28,190-INFO: step_idx: 2600, epoch: 0, batch: 2600, avg loss: 92.073807, normalized loss: 90.696840, ppl: inf, speed: 4.88 step/s
2019-04-24 13:21:48,633-INFO: step_idx: 2700, epoch: 0, batch: 2700, avg loss: 103.230675, normalized loss: 101.853708, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.89 step/s
2019-04-24 13:22:09,212-INFO: step_idx: 2800, epoch: 0, batch: 2800, avg loss: 114.019951, normalized loss: 112.642984, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.86 step/s
2019-04-24 13:22:29,489-INFO: step_idx: 2900, epoch: 0, batch: 2900, avg loss: 126.890511, normalized loss: 125.513544, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.93 step/s
2019-04-24 13:22:49,770-INFO: step_idx: 3000, epoch: 0, batch: 3000, avg loss: 139.719406, normalized loss: 138.342440, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.93 step/s
2019-04-24 13:23:09,918-INFO: step_idx: 3100, epoch: 0, batch: 3100, avg loss: 151.848801, normalized loss: 150.471834, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.96 step/s
2019-04-24 13:23:29,846-INFO: step_idx: 3200, epoch: 0, batch: 3200, avg loss: 136.258896, normalized loss: 134.881929, ppl: 26881171418161356094253400435962903554686976.000000, speed: 5.02 step/s
2019-04-24 13:23:49,720-INFO: step_idx: 3300, epoch: 0, batch: 3300, avg loss: 187.088806, normalized loss: 185.711840, ppl: 26881171418161356094253400435962903554686976.000000, speed: 5.03 step/s
2019-04-24 13:24:09,800-INFO: step_idx: 3400, epoch: 0, batch: 3400, avg loss: 209.184631, normalized loss: 207.807665, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.98 step/s
2019-04-24 13:24:29,971-INFO: step_idx: 3500, epoch: 0, batch: 3500, avg loss: 234.616592, normalized loss: 233.239626, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.96 step/s
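A telling detail: the ppl value 26881171418161356094253400435962903554686976 repeated in the log is exactly exp(100) ≈ 2.688e43, i.e. the min(total_avg_cost, 100) cap in train.py saturating, so the ppl column stops carrying information once the loss diverges past 100; the real signal is the runaway avg loss itself. The capped computation as a standalone sketch:

```python
import math

def capped_ppl(avg_loss, cap=100.0):
    """Perplexity with the loss capped, mirroring np.exp([min(loss, 100)])
    in train.py. exp overflows float64 for inputs above ~709, and even
    exp(100) is already ~2.688e43, the saturated value seen in the log."""
    return math.exp(min(avg_loss, cap))
```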

deeplabv3+ sometimes exits abnormally, so run.sh aborts and no result can be collected

Training log:

step 75, loss: 2.736500, step_time_cost: 0.151 s
step 76, loss: 2.795518, step_time_cost: 0.150 s
step 77, loss: 2.817705, step_time_cost: 0.150 s
step 78, loss: 2.724798, step_time_cost: 0.149 s
step 79, loss: 2.779751, step_time_cost: 0.147 s
Training done. Model is saved to /home/crim/benchmark/deeplabv3+/paddle/output/model
*** Aborted at 1557585563 (unix time) try "date -d @1557585563" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x58) received by PID 4250 (TID 0x7f2212a06700) from PID 88; stack trace: ***
    @     0x7f22ea9e6390 (unknown)
    @           0x4bc644 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4d4e4d (unknown)
    @           0x4bca3c PyEval_EvalFrameEx
    @           0x4d4e4d (unknown)
    @           0x4bca3c PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4d57a3 (unknown)
    @           0x4a587e PyObject_Call
    @           0x4be51e PyEval_EvalFrameEx
    @           0x4c141f PyEval_EvalFrameEx
    @           0x4c141f PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4d5669 (unknown)
    @           0x4eef5e (unknown)
    @           0x4a587e PyObject_Call
    @           0x4c5ef0 PyEval_CallObjectWithKeywords
    @           0x589662 (unknown)
    @     0x7f22ea9dc6ba start_thread
    @     0x7f22ea71241d clone
    @                0x0 (unknown)

A workaround is to change set -xe to set -x in run.sh, but should the underlying problem be fixed as well?
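The suggestion above in minimal form: with set -e, any nonzero exit from the trainer kills run.sh before results are harvested, while set -x keeps the command trace but lets the script continue and record the exit status (the `false` below is a stand-in for the crashing training command, not the real run.sh contents):

```shell
#!/bin/sh
set -x               # trace commands, but do not abort on a nonzero exit status
status=0
false || status=$?   # stand-in for the crashing `python train.py` step
echo "trainer exited with status ${status}; collecting results anyway"
```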

tensorflow-2.0alpha reports the following errors when running the competitor models

  • deeplabv3
Traceback (most recent call last):
  File "/ssd2/liyang/tensorflow/models/research/deeplab/train.py", line 22, in <module>
    from deeplab import common
  File "/ssd2/liyang/tensorflow/models/research/deeplab/common.py", line 25, in <module>
    flags = tf.app.flags
AttributeError: 'module' object has no attribute 'app'
  • RL model
TF<1.3 support will be removed after 2018-03-15! Actually many examples already require TF>=1.3.
Traceback (most recent call last):
  File "./algorithm.py", line 5, in <module>
    from tensorpack.utils.globvars import globalns as param
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/__init__.py", line 17, in <module>
    from tensorpack.models import *
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/models/__init__.py", line 49, in <module>
    _global_import(module_name)
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/models/__init__.py", line 30, in _global_import
    p = __import__(name, globals(), locals(), level=1)
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/models/batch_norm.py", line 6, in <module>
    from tensorflow.contrib.framework import add_model_variable
ImportError: No module named 'tensorflow.contrib'
  • PaddingRNN
Traceback (most recent call last):
  File "train.py", line 255, in <module>
    main()
  File "train.py", line 91, in main
    rnn_type = args.rnn_type)
  File "/ssd2/liyang/benchmark/PaddingRNN/lstm_tf/ptb_lm_model.py", line 12, in ptb_lm_model
    x_place = tf.placeholder(tf.int32, [batch_size, num_steps])
AttributeError: 'module' object has no attribute 'placeholder'
  • Transformer model
Traceback (most recent call last):
  File "/usr/local/bin/t2t-datagen", line 18, in <module>
    from tensor2tensor.bin import t2t_datagen
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_datagen.py", line 39, in <module>
    from tensor2tensor import problems as problems_lib  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/problems.py", line 22, in <module>
    from tensor2tensor.utils import registry
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/registry.py", line 551, in <module>
    attacks = tf.contrib.framework.deprecated(None, "Use registry.attack")(attack)
AttributeError: 'module' object has no attribute 'contrib'
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 23, in <module>
    from tensor2tensor.bin import t2t_trainer
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 24, in <module>
    from tensor2tensor import models  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/__init__.py", line 25, in <module>
    from tensor2tensor.layers import modalities  # pylint: disable=g-import-not-at-top
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/modalities.py", line 28, in <module>
    from tensor2tensor.layers import common_attention
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_attention.py", line 31, in <module>
    from tensor2tensor.layers import area_attention
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/area_attention.py", line 23, in <module>
    from tensor2tensor.layers import common_layers
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_layers.py", line 30, in <module>
    import tensorflow_probability as tfp
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/__init__.py", line 78, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/__init__.py", line 21, in <module>
    from tensorflow_probability.python import bijectors
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/bijectors/__init__.py", line 46, in <module>
    from tensorflow_probability.python.bijectors.matveclu import MatvecLU
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/bijectors/matveclu.py", line 24, in <module>
    from tensorflow_probability.python.math.linalg import lu_reconstruct
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/math/__init__.py", line 22, in <module>
    from tensorflow_probability.python.math.diag_jacobian import diag_jacobian
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/math/diag_jacobian.py", line 24, in <module>
    tfe = tf.contrib.eager
AttributeError: 'module' object has no attribute 'contrib'
  • CycleGAN model
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from tensorflow.examples.tutorials.mnist import input_data
ImportError: No module named 'tensorflow.examples'
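All five failures above are TF 1.x APIs that moved or disappeared in 2.0: tf.app and tf.placeholder live on under tf.compat.v1, while tf.contrib and tensorflow.examples were removed outright and have no compat shim, so those models need real code changes. A framework-free sketch of the compat-fallback lookup (resolve_v1_symbol is an illustrative helper, not a TensorFlow API; the stand-in module below is shaped like TF 2.0):

```python
import types

def resolve_v1_symbol(tf_module, name):
    """Find a TF 1.x symbol on a 2.x module, falling back to tf.compat.v1."""
    if hasattr(tf_module, name):
        return getattr(tf_module, name)
    compat_v1 = getattr(getattr(tf_module, "compat", None), "v1", None)
    if compat_v1 is not None and hasattr(compat_v1, name):
        return getattr(compat_v1, name)
    raise AttributeError("%s not found, even under tf.compat.v1" % name)

# Stand-in object mimicking TF 2.0's module layout, so the sketch runs
# without TensorFlow installed:
fake_tf = types.SimpleNamespace(
    compat=types.SimpleNamespace(
        v1=types.SimpleNamespace(placeholder=lambda *a, **k: "v1-placeholder")))
```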

mask-rcnn-fpn optimize

  • Owner:
    zhaoyuchen
  • Test environment
    • GPU driver: 418.39
    • CUDA 9.0, cuDNN 7
  • Current performance comparison (CUDA 9.0)

backbone: resnext

Scenario     Paddle    PyTorch    Comparison
Single GPU   2.601     3.035      14% slower

backbone: resnet

Scenario     Paddle    PyTorch    Comparison
Single GPU   5.568     5.922      6% slower

PaddingRNN shows abnormal loss

paddle version: develop commit-id: 63d9fe336217303e1178e20ee66a7a10055387dc
Result log:

     !!! Memory optimize is our experimental feature !!!
         some variables may be removed/reused internal to save memory usage,
         in order to fetch the right value of the fetch_list, please set the
         persistable property to true for each variable in fetch_list

         # Sample
         conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
         # if you need to fetch conv1, then:
         conv1.persistable = True


I0424 13:37:22.304425 70214 build_strategy.cc:303] SeqOnlyAllReduceOps:0, num_trainers:1
begin to load data
vocab word num 10000
finished load data
-- Epoch:[0]; Batch:[132]; Time: 0.08103 s; ppl: 7701.04980, lr: 1.00000
-- Epoch:[0]; Batch:[264]; Time: 0.08842 s; ppl: 5319.26318, lr: 1.00000
-- Epoch:[0]; Batch:[396]; Time: 0.08054 s; ppl: 4582.54102, lr: 1.00000
-- Epoch:[0]; Batch:[528]; Time: 0.09146 s; ppl: 4241.10107, lr: 1.00000
-- Epoch:[0]; Batch:[660]; Time: 0.09322 s; ppl: 4072.14404, lr: 1.00000
-- Epoch:[0]; Batch:[792]; Time: 0.08899 s; ppl: 3849.68213, lr: 1.00000
-- Epoch:[0]; Batch:[924]; Time: 0.08301 s; ppl: 3604.66235, lr: 1.00000
-- Epoch:[0]; Batch:[1056]; Time: 0.08520 s; ppl: 3396.28955, lr: 1.00000
-- Epoch:[0]; Batch:[1188]; Time: 0.07859 s; ppl: 3215.93188, lr: 1.00000
-- Epoch:[0]; Batch:[1320]; Time: 0.09284 s; ppl: 2754.36401, lr: 1.00000
Parameters are randomly initialized and not good this time because the loss is over 1000 after the first epoch.
Abort this training process and please start again.
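The abort comes from the enable_ce sanity check: after the first epoch, the metric (ppl in this log, last logged at 2754.36) is compared against a threshold, quoted as 1000 in the message. A sketch of that guard, with the helper name and exact threshold assumed:

```python
def first_epoch_ok(final_ppl, threshold=1000.0):
    # Mirrors the guard suggested by the log message: if perplexity is
    # still above the threshold after epoch 0, the run is treated as
    # badly initialized and aborted.
    return final_ppl < threshold

# The run above would abort, since first_epoch_ok(2754.36401) is False.
```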

transformer fails at runtime with an error

[2019-06-21 16:52:25,962 INFO train.py:655] Namespace(batch_size=4096, device='GPU', enable_ce=False, fetch_steps=100, local=True, opts=['learning_rate', '2.0', 'warmup_steps', '8000', 'beta2', '0.997', 'd_model', '1024', 'd_inner_hid', '4096', 'n_head', '16', 'prepostprocess_dropout', '0.3', 'attention_dropout', '0.1', 'relu_dropout', '0.1', 'weight_sharing', 'True', 'pass_num', '100', 'max_length', '256'], pool_size=200000, shuffle=True, shuffle_batch=True, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='data/vocab.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='data/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='data/vocab.bpe.32000', update_method='pserver', use_mem_opt=True, use_py_reader=True, use_token_batch=True, val_file_pattern=None)
[2019-06-21 16:52:26,259 INFO train.py:706] before adam
memory_optimize is deprecated. Use CompiledProgram and Executor
[2019-06-21 16:52:52,269 INFO train.py:724] local start_up:
[2019-06-21 16:52:52,270 INFO train.py:506] init fluid.framework.default_startup_program
W0621 16:52:53.971729  7661 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0621 16:52:53.983120  7661 device_context.cc:267] device: 0, cuDNN Version: 7.4.
[2019-06-21 16:52:54,789 INFO train.py:509] begin reader
Traceback (most recent call last):
  File "train.py", line 806, in <module>
    train(args)
  File "train.py", line 726, in train
    token_num, predict, pyreader)
  File "train.py", line 515, in train_loop
    py_reader_provider_wrapper=py_reader_provider_wrapper)
  File "train.py", line 347, in prepare_data_generator
    train_reader = py_reader_provider_wrapper(data_reader, )
TypeError: py_reader_provider_wrapper() takes exactly 2 arguments (1 given)
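The TypeError says py_reader_provider_wrapper is declared with two parameters but called with only one at train.py:347. Either pass the second argument at the call site or pre-bind it with functools.partial. A self-contained sketch (the `place` parameter and the reader here are stand-ins, not the real signature):

```python
from functools import partial

def py_reader_provider_wrapper(data_reader, place):
    # Stand-in for the real wrapper: consumes the reader and tags each
    # batch with the device it should be fed to.
    return [(place, batch) for batch in data_reader()]

def fake_reader():
    yield [1, 2]
    yield [3, 4]

# Fix A: supply both arguments at the call site.
batches = py_reader_provider_wrapper(fake_reader, "gpu:0")

# Fix B: bind the extra argument once, so one-argument call sites keep working.
wrapper = partial(py_reader_provider_wrapper, place="cpu")
batches_cpu = wrapper(fake_reader)
```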

Optimize XLnet performance

Benchmark report provided with the model:
(image)

Self-measured numbers on a single V100 GPU:
Paddle version: develop
Speed: 0.961218 steps/s
TensorFlow 1.15
Speed: 1.61 steps/s

Meaning of single-process vs. multi-process in the v1.5 benchmark

In the newly released v1.5 benchmark, each model (e.g. BERT) is reported as single GPU, 8 GPUs (single process), and 8 GPUs (multi-process). What exactly do single process and multi-process mean here? Thanks.

A second question: the BERT benchmark reports steps/s — how many samples make up one step? From the training script, batch_size defaults to 8192; what batch_size does the benchmark use?
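Once the batch size is known, steps/s converts to per-second throughput by straight multiplication. For example, at a hypothetical 2 steps/s with the script's default batch_size of 8192:

```python
def throughput_per_second(steps_per_sec, batch_size):
    # One step processes one batch, so per-second throughput is just
    # steps/s times the batch size (samples or tokens, whichever unit
    # the batch_size flag counts in).
    return steps_per_sec * batch_size

rate = throughput_per_second(2.0, 8192)  # 16384.0 per second
```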

Optimize the performance of PyramidDNN on CPU

Owner

@zhaoyuchen2018 , @luotao1

Initial performance

  • Test date: July 15, 2019
  • Paddle commit:
  • Model configuration:
    • Single machine, single thread: CPU_NUM=1, one data-reading process
  • Tester: @Aurelius84
  • Unit: s/epoch
  • CPU model: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz
epoch id  Paddle (MKL_CBWR=COMPATIBLE)  Paddle (MKL_CBWR="")  Competitor
1         357                           307                   277
2         343                           310                   278
  • CPU model: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 16 cores
epoch id  Paddle (MKL_CBWR=COMPATIBLE)  Paddle (MKL_CBWR="")  Competitor
1         268                           254                   219
2         283                           245                   220
  • Conclusions:

    • The competitor was run with MKL_CBWR=COMPATIBLE, so compare it against Paddle (MKL_CBWR=COMPATIBLE) to locate where Paddle is slow.
    • Paddle is ~25% slower than the competitor.
  • New ops
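The ~25% figure in the conclusion is consistent with the mean slowdown over the four MKL_CBWR=COMPATIBLE epochs (times are s/epoch, lower is better):

```python
def slowdown(paddle_s, competitor_s):
    # How much longer a Paddle epoch takes, relative to the competitor.
    return (paddle_s - competitor_s) / competitor_s

# The four MKL_CBWR=COMPATIBLE vs competitor epoch-time pairs above:
pairs = [(357, 277), (343, 278), (268, 219), (283, 220)]
mean_gap = sum(slowdown(p, c) for p, c in pairs) / len(pairs)  # ~0.26
```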

SEResnet50 model definition question

In the CV benchmark, the SE-ResNeXt50 model is defined in
https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/se_resnext.py, which contains the following code:
        if layers == 50:
            cardinality = 32
            reduction_ratio = 16
            depth = [3, 4, 6, 3]
            num_filters = [128, 256, 512, 1024]

        conv = self.conv_bn_layer(
            input=input,
            num_filters=64,
            filter_size=7,
            stride=2,
            act='relu',
            name='conv1', )
        conv = fluid.layers.pool2d(
            input=conv,
            pool_size=3,
            pool_stride=2,
            pool_padding=1,
            pool_type='max',
            use_cudnn=False)

According to this definition, the stage depths are [3, 4, 6, 3] and the filter counts are [128, 256, 512, 1024].
A standard ResNet-50 defines the stage output widths as [256, 512, 1024, 2048] — is SE-ResNet simplifying the design to cut computational cost?
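One plausible reading, worth verifying against bottleneck_block in se_resnext.py: the listed num_filters is the internal width of each bottleneck, and the block's final 1×1 conv expands it, so the per-stage output channels still match standard ResNet-50. With an assumed expansion factor of 2:

```python
# Assumption: the bottleneck's last 1x1 conv doubles the listed width,
# as in the ResNeXt-50 (32x4d) configuration. Verify in se_resnext.py
# before relying on this.
stage_widths = [128, 256, 512, 1024]
expansion = 2
stage_outputs = [w * expansion for w in stage_widths]  # [256, 512, 1024, 2048]
```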

Optimize train performance of STGAN on V100 GPU

Owner

@chenwhql

Initial performance

Scenario    Paddle dev  TensorFlow 1.2  Gap
Single GPU  49.091      82.244          40% worse
  • Test date: August 6, 2019
  • Tester: @chenwhql
  • GPU platform: Tesla V100
  • Software:
    • Driver Version: 418.39
    • CUDA 9.0
    • cuDNN 7.5
  • Paddle: develop [commit: ee2f296]
  • Test code: #154
  • Docker image for building Paddle: paddlepaddle/paddle_manylinux_devel:cuda9.0_cudnn7
  • Docker image for building and running the test: paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev

transformer errors out in multi-process single-GPU mode

https://github.com/PaddlePaddle/benchmark/blob/master/NeuralMachineTranslation/Transformer/fluid/train/train.py#L616

2019-05-21 09:29:28,729-INFO: Namespace(batch_size=4096, device='GPU', enable_ce=True, fetch_steps=100, local=True, opts=['dropout_seed', '10', 'learning_rate', '2.0', 'warmup_steps', '8000', 'beta2', '0.997', 'd_model', '512', 'd_inner_hid', '2048', 'n_head', '8', 'prepostprocess_dropout', '0.1', 'attention_dropout', '0.1', 'relu_dropout', '0.1', 'weight_sharing', 'True', 'pass_num', '1', 'model_dir', 'tmp_models', 'ckpt_dir', 'tmp_ckpts'], pool_size=200000, shuffle=False, shuffle_batch=False, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='data/vocab.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='data/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='data/vocab.bpe.32000', update_method='pserver', use_default_pe=False, use_mem_opt=True, use_py_reader=True, use_token_batch=True, val_file_pattern=None)
Traceback (most recent call last):
  File "train.py", line 784, in <module>
    train(args)
  File "train.py", line 641, in train
    dev_count = get_device_num()
  File "train.py", line 616, in get_device_num
    device_num = subprocess.check_output(['nvidia-smi','-L']).decode().count('\n')
NameError: global name 'subprocess' is not defined
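The NameError simply means train.py uses subprocess without importing it; adding `import subprocess` at the top of the file fixes it. A self-contained sketch of the same device count, split so the parsing is testable without a GPU (count_devices is our helper name, not part of train.py):

```python
import subprocess  # the import missing from train.py

def count_devices(listing):
    # `nvidia-smi -L` prints one "GPU N: ..." line per device,
    # so the newline count equals the device count.
    return listing.count('\n')

def get_device_num():
    # Requires nvidia-smi on PATH, as in the original script.
    return count_devices(
        subprocess.check_output(['nvidia-smi', '-L']).decode())

sample = "GPU 0: Tesla V100 (UUID: x)\nGPU 1: Tesla V100 (UUID: y)\n"
```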

Optimize the performance of Transformer-Big on 1 V100 GPU

Owner

@wangchaochaohu

Initial performance

  • Test date: June 20, 2019
  • Paddle commit:
  • models commit:
  • Test script: run.sh
base_batch_size=4096
python -u train.py \
    --src_vocab_fpath data/vocab.bpe.32000 \
    --trg_vocab_fpath data/vocab.bpe.32000 \
    --special_token '<s>' '<e>' '<unk>' \
    --train_file_pattern data/train.tok.clean.bpe.32000.en-de \
    --batch_size ${base_batch_size} \
    --use_token_batch True \
    --sort_type pool \
    --pool_size 200000 \
    --shuffle True \
    --shuffle_batch True \
    --use_py_reader True \
    --use_mem_opt True \
    --enable_ce False \
    --fetch_steps 100 \
    learning_rate 2.0 \
    warmup_steps 8000 \
    beta2 0.997 \
    d_model 1024 \
    d_inner_hid 4096 \
    n_head 16 \
    prepostprocess_dropout 0.3 \
    attention_dropout 0.1 \
    relu_dropout 0.1 \
    weight_sharing True \
    pass_num 100 \
    max_length 256
Scenario     Paddle 1.5.0  TensorFlow 1.12.0  Ratio
1 GPU        1.82          1.968              -7.6%
8 GPUs (SP)  13.12         7.072              +86%

When running auto_run_paddle.sh, paddingRnn always fails on the large model

2019-05-13 18:45:34,470 - lm - INFO - Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_ce=True, inference_only=False, init_params_path=None, log_path=None, max_epoch=5, model_type='large', parallel=True, profile=False, rnn_model='static', save_model_dir=None, use_gpu=True)
2019-05-13 18:45:34,470-INFO: Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_ce=True, inference_only=False, init_params_path=None, log_path=None, max_epoch=5, model_type='large', parallel=True, profile=False, rnn_model='static', save_model_dir=None, use_gpu=True)
W0513 18:45:37.997777  8668 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0513 18:45:38.010324  8668 device_context.cc:269] device: 0, cuDNN Version: 7.0.
W0513 18:45:38.077817  8668 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
2019-05-13 18:45:38,082-WARNING: 
     You can try our memory optimize feature to save your memory usage:
         # create a build_strategy variable to set memory optimize option
         build_strategy = compiler.BuildStrategy()
         build_strategy.enable_inplace = True
         build_strategy.memory_optimize = True
         
         # pass the build_strategy to with_data_parallel API
         compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
             loss_name=loss.name, build_strategy=build_strategy)
      
     !!! Memory optimize is our experimental feature !!!
         some variables may be removed/reused internal to save memory usage, 
         in order to fetch the right value of the fetch_list, please set the 
         persistable property to true for each variable in fetch_list

         # Sample
         conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
         # if you need to fetch conv1, then:
         conv1.persistable = True

                 
I0513 18:45:39.936303  8668 build_strategy.cc:305] SeqOnlyAllReduceOps:0, num_trainers:1
begin to load data
vocab word num 10000
finished load data
Traceback (most recent call last):
  File "train.py", line 512, in <module>
    main()
  File "train.py", line 506, in main
    train()
  File "train.py", line 421, in train
    train_ppl = train_an_epoch(epoch_id, batch_times)
  File "train.py", line 393, in train_an_epoch
    use_program_cache=True)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 640, in run
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 502, in _run_parallel
    exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator reshape2 error.
Python Callstacks: 
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1700, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layers/nn.py", line 6590, in reshape
    "XShape": x_shape})
  File "/home/crim/benchmark/PaddingRNN/lstm_paddle/lm_model.py", line 250, in lm_model
    init_hidden, shape=[num_layers, -1, hidden_size])
  File "train.py", line 192, in main
    rnn_model=rnn_model)
  File "train.py", line 512, in <module>
    main()
C++ Callstacks: 
holder_ should not be null
Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/fluid/framework/tensor.h:146]
PaddlePaddle Call Stacks: 
0       0x7f06f362b468p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1       0x7f06f362b7b7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2       0x7f06f362c24bp paddle::framework::Tensor::type() const + 107
3       0x7f06f3eb83e8p paddle::operators::ReshapeKernel::operator()(paddle::framework::ExecutionContext const&) const + 312
4       0x7f06f536c6e0p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 368
5       0x7f06f536c97ap paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 218
6       0x7f06f536a0fcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
7       0x7f06f5177ceap paddle::framework::details::ComputationOpHandle::RunImpl() + 250
8       0x7f06f516a3e0p paddle::framework::details::OpHandleBase::Run(bool) + 160
9       0x7f06f50c823dp
10      0x7f06f50c909dp paddle::framework::details::ThreadedSSAGraphExecutor::RunOp(std::shared_ptr<paddle::framework::BlockingQueue<paddle::framework::details::VarHandleBase*> > const&, paddle::framework::details::OpHandleBase*) + 1325
11      0x7f06f50cf3ebp paddle::framework::details::ThreadedSSAGraphExecutor::RunImpl(std::vector<std::string, std::allocator<std::string> > const&) + 1339
12      0x7f06f50ccc32p paddle::framework::details::ThreadedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&) + 482
13      0x7f06f50b8b9ap paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&) + 378
14      0x7f06f37e2132p paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, std::string const&) + 562
15      0x7f06f361a60ep
16      0x7f06f36605d6p
17            0x4c5326p PyEval_EvalFrameEx + 37958
18            0x4b9b66p PyEval_EvalCodeEx + 774
19            0x4c1f56p PyEval_EvalFrameEx + 24694
20            0x4b9b66p PyEval_EvalCodeEx + 774
21            0x4c17c6p PyEval_EvalFrameEx + 22758
22            0x4b9b66p PyEval_EvalCodeEx + 774
23            0x4c1f56p PyEval_EvalFrameEx + 24694
24            0x4b9b66p PyEval_EvalCodeEx + 774
25            0x4c1f56p PyEval_EvalFrameEx + 24694
26            0x4b9b66p PyEval_EvalCodeEx + 774
27            0x4c1f56p PyEval_EvalFrameEx + 24694
28            0x4b9b66p PyEval_EvalCodeEx + 774
29            0x4eb69fp
30            0x4e58f2p PyRun_FileExFlags + 130
31            0x4e41a6p PyRun_SimpleFileExFlags + 390
32            0x4938cep Py_Main + 1358
33      0x7f077fb53830p __libc_start_main + 240
34            0x493299p _start + 41
