
benchmark's People

Contributors

2742195759, ashburnlee, avin0323, ccmeteorljh, chengduozh, chenwhql, dddivano, from00, gaowei8, gfwm2013, gongweibao, hysunflower, jameslim-sy, jiaxiao243, junjun315, kolinwei, lelelelelez, lidanqing-intel, luotao1, mmglove, sneaxiy, tangtang586, wangchaochaohu, wangzhe0912, windstamp, xiegegege, xreki, zhengya01, zjq9409, zzsean

benchmark's Issues

Error when running maskrcnn-from-fb

When running maskrcnn-from-fb on 8 GPUs, the test phase after training finishes reports the following error:
2019-05-23 04:24:31,288 maskrcnn_benchmark.trainer INFO: Total training time: 1 day, 18:53:50.877918 (0.8579 s / it)
Traceback (most recent call last):
  File "./tools/train_net.py", line 174, in <module>
    main()
  File "./tools/train_net.py", line 170, in main
    run_test(cfg, model, args.distributed)
  File "./tools/train_net.py", line 95, in run_test
    data_loaders_val = make_data_loader(cfg, is_train=False, is_distributed=distributed)
  File "/benchmark/benchmark/Mask-RCNN/maskrcnn-from-fb/maskrcnn_benchmark/data/build.py", line 154, in make_data_loader
    datasets = build_dataset(dataset_list, transforms, DatasetCatalog, is_train)
  File "/benchmark/benchmark/Mask-RCNN/maskrcnn-from-fb/maskrcnn_benchmark/data/build.py", line 33, in build_dataset
    data = dataset_catalog.get(dataset_name)
  File "/benchmark/benchmark/Mask-RCNN/maskrcnn-from-fb/maskrcnn_benchmark/config/paths_catalog.py", line 113, in get
    attrs = DatasetCatalog.DATASETS[name]
KeyError: 'coco_2017_minival'
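The KeyError means the name 'coco_2017_minival' is simply absent from DatasetCatalog.DATASETS in paths_catalog.py, so registering that split should let the test phase find it. A minimal stand-in for the catalog lookup that fails above (the directory and annotation paths are illustrative guesses, not the repo's actual values):

```python
# Hypothetical catalog sketch; mirrors the DatasetCatalog.DATASETS dict
# that paths_catalog.py indexes when build_dataset runs.
DATASETS = {
    "coco_2017_train": {"img_dir": "coco/train2017",
                        "ann_file": "coco/annotations/instances_train2017.json"},
    # Entry that would need to be added for the test phase to succeed:
    "coco_2017_minival": {"img_dir": "coco/val2017",
                          "ann_file": "coco/annotations/instances_val2017.json"},
}

def get(name):
    if name not in DATASETS:
        raise KeyError(name)  # this is the failure seen in the log
    return DATASETS[name]
```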

se-resnext multi-process training crashed

Pass 0, trainbatch 4990, loss 4.21697,                         acc1 0.15625, acc5 0.37500, lr 0.10000, time 0.32 sec
Pass 0, trainbatch 5000, loss 3.58300,                         acc1 0.28125, acc5 0.46875, lr 0.10000, time 0.32 sec
train.py:445: RuntimeWarning: Mean of empty slice.
  test_loss = np.array(test_info[0]).mean()
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
train.py:446: RuntimeWarning: Mean of empty slice.
  test_acc1 = np.array(test_info[1]).mean()
train.py:447: RuntimeWarning: Mean of empty slice.
  test_acc5 = np.array(test_info[2]).mean()
End pass 0, train_loss 5.04358, train_acc1 0.09865, train_acc5 0.24054, test_loss nan, test_acc1 nan, test_acc5 nan
Traceback (most recent call last):
  File "train.py", line 494, in <module>
    main()
  File "train.py", line 490, in main
    train(args)
  File "train.py", line 459, in train
    fluid.io.save_persistables(exe, model_path, main_program=train_prog)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 521, in save_persistables
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 199, in save_vars
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 237, in save_vars
    executor.run(save_program)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 650, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 748, in _run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator save error.
Python Callstacks: 
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1748, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 221, in save_vars
    'file_path': os.path.join(save_dirname, new_var.name)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 199, in save_vars
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 521, in save_persistables
    filename=filename)
  File "train.py", line 459, in train
    fluid.io.save_persistables(exe, model_path, main_program=train_prog)
  File "train.py", line 490, in main
    train(args)
  File "train.py", line 494, in <module>
    main()
C++ Callstacks: 
holder_ should not be null
Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/fluid/framework/tensor.h:139]
PaddlePaddle Call Stacks: 
0       0x7efddc40f388p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1       0x7efddc40f6d7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2       0x7efddc41010bp paddle::framework::Tensor::type() const + 107
3       0x7efdde378e1dp paddle::framework::GetDataTypeOfVar(paddle::framework::Variable const*) + 157
4       0x7efddcfa47e3p paddle::operators::SaveOp::GetExpectedKernelType(paddle::framework::ExecutionContext const&) const + 67
5       0x7efdde37b63bp paddle::framework::OperatorWithKernel::ChooseKernel(paddle::framework::RuntimeContext const&, paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 235
6       0x7efdde37d798p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 728
7       0x7efdde37da11p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
8       0x7efdde37b01cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
9       0x7efddc59ae6ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
10      0x7efddc59df3fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
11      0x7efddc40031dp
12      0x7efddc441da6p
13            0x4c5326p PyEval_EvalFrameEx + 37958
14            0x4b9b66p PyEval_EvalCodeEx + 774
15            0x4c1f56p PyEval_EvalFrameEx + 24694
16            0x4b9b66p PyEval_EvalCodeEx + 774
17            0x4c17c6p PyEval_EvalFrameEx + 22758
18            0x4b9b66p PyEval_EvalCodeEx + 774
19            0x4c17c6p PyEval_EvalFrameEx + 22758
20            0x4b9b66p PyEval_EvalCodeEx + 774
21            0x4c17c6p PyEval_EvalFrameEx + 22758
22            0x4b9b66p PyEval_EvalCodeEx + 774
23            0x4c17c6p PyEval_EvalFrameEx + 22758
24            0x4b9b66p PyEval_EvalCodeEx + 774
25            0x4c1f56p PyEval_EvalFrameEx + 24694
26            0x4b9b66p PyEval_EvalCodeEx + 774
27            0x4c1f56p PyEval_EvalFrameEx + 24694
28            0x4b9b66p PyEval_EvalCodeEx + 774
29            0x4eb69fp
30            0x4e58f2p PyRun_FileExFlags + 130
31            0x4e41a6p PyRun_SimpleFileExFlags + 390
32            0x4938cep Py_Main + 1358
33      0x7efe1e358830p __libc_start_main + 240
34            0x493299p _start + 41
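Besides the save crash, the "Mean of empty slice" warnings (and the resulting test_loss nan) show that test_info collected no batches at all in the multi-process run. A small guard sketch for that averaging step (safe_mean is a hypothetical helper, not part of train.py):

```python
import numpy as np

def safe_mean(values):
    """Mean of a possibly empty list, without the 'Mean of empty slice' warning.

    Returns None instead of nan when no test batches were collected,
    which is what happened in the multi-process run above; callers can
    then skip or flag the test metrics instead of logging nan.
    """
    arr = np.asarray(values, dtype=np.float64)
    return float(arr.mean()) if arr.size else None
```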

When running Mask R-CNN with multiple GPUs and multiple processes, only one GPU runs; the others never start.

Traceback (most recent call last):
  File "train.py", line 210, in <module>
    train()
  File "train.py", line 88, in train
    exe.run(fluid.default_startup_program())
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 625, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 702, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Place CUDAPlace(0) is not supported, Please re-compile with WITH_GPU option at [/paddle/paddle/fluid/platform/device_context.cc:37]
PaddlePaddle Call Stacks:
0 0x7f12effedba8p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1 0x7f12effedef7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2 0x7f12f1dc79e3p paddle::platform::DeviceContextPool::Get(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 355
3 0x7f12f1c6675dp paddle::framework::GarbageCollector::GarbageCollector(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 477
4 0x7f12f1c669e1p paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long) + 33
5 0x7f12f016b8d0p paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 480
6 0x7f12f016c6afp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
7 0x7f12effdd29ep
8 0x7f12f0023096p
9 0x4c5326p PyEval_EvalFrameEx + 37958
10 0x4b9b66p PyEval_EvalCodeEx + 774
11 0x4c1f56p PyEval_EvalFrameEx + 24694
12 0x4b9b66p PyEval_EvalCodeEx + 774
13 0x4c17c6p PyEval_EvalFrameEx + 22758
14 0x4b9b66p PyEval_EvalCodeEx + 774
15 0x4c1f56p PyEval_EvalFrameEx + 24694
16 0x4b9b66p PyEval_EvalCodeEx + 774
17 0x4eb69fp
18 0x4e58f2p PyRun_FileExFlags + 130
19 0x4e41a6p PyRun_SimpleFileExFlags + 390
20 0x4938cep Py_Main + 1358
21 0x7f137c748830p __libc_start_main + 240
22 0x493299p _start + 41
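The EnforceNotMet here means the worker processes picked up a CPU-only PaddlePaddle build, so CUDAPlace(0) cannot be constructed. Paddle 1.x exposes fluid.is_compiled_with_cuda() for exactly this check; below is a hedged, framework-free sketch of guarding the place selection (select_place is a hypothetical helper, and the returned strings stand in for the real place objects):

```python
def select_place(cuda_built, want_gpu=True):
    """Pick an executor place, failing fast on CPU-only builds.

    In a real script, cuda_built would come from
    fluid.is_compiled_with_cuda(); returning place names as strings
    keeps this sketch runnable without Paddle installed.
    """
    if want_gpu and not cuda_built:
        raise RuntimeError(
            "GPU requested but this PaddlePaddle wheel was built without "
            "CUDA; install paddlepaddle-gpu or re-compile with -DWITH_GPU=ON")
    return "CUDAPlace(0)" if want_gpu else "CPUPlace()"
```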

Retinanet performance improvement on V100

  • Owner:
    wangchaochaohu
  • Test environment
    • GPU driver: 418.39
    • CUDA 9.0, cuDNN 7
  • Current performance comparison (CUDA 9.0)
Scenario     Paddle    PyTorch    Comparison
Single GPU   6.317     7.889      20% slower
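The 20% figure appears to be the relative throughput gap, (competitor − Paddle) / competitor; the same formula reproduces the percentages quoted in the other comparison tables in this tracker (StarGAN's 31%, mask-rcnn-fpn's 14% and 6%). As a small sketch:

```python
def pct_behind(ours, theirs):
    """Fraction by which `ours` trails `theirs` on a higher-is-better metric."""
    return (theirs - ours) / theirs

# Retinanet single-GPU numbers from the table above: ~0.20
retinanet_gap = pct_behind(6.317, 7.889)
```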

Optimize inference performance of ERNIE on CPU

Owner

@tensor-tang @GaoWei8

Machine model

6148

Commit

based on #164

Initial performance

10 samples

I0820 10:33:36.597270 35686 inference.cc:211] Load 10 samples from /home/tangjian/ernie/Inference/c++/ernie/seq128_data/test_ds_10
I0820 10:33:37.552497 35686 inference.cc:351] Run 10 samples, average latency: 95.519 ms per sample.
I0820 10:33:37.552565 35686 inference.cc:356] Run 9 samples, average latency [exclude 1 warmup steps]: 89.8265 ms per sample.

Profile results

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::fc                 740         625.813     0.013066    2.59008     0.845693    0.511826
thread0::load               202         269.789     0.009506    168.902     1.33559     0.220649
thread0::elementwise_add    380         78.8811     0.045364    29.1685     0.207582    0.0645135
thread0::transpose2         480         63.5249     0.089001    4.08297     0.132343    0.0519543
thread0::dropout            380         51.7229     0.01217     0.262424    0.136113    0.0423019
thread0::layer_norm         250         43.0832     0.150904    0.226994    0.172333    0.0352359
thread0::matmul             250         36.4239     0.033627    14.6704     0.145696    0.0297895
thread0::relu               120         22.7715     0.130891    1.77192     0.189762    0.0186238
thread0::scale              140         11.0508     0.006102    0.105016    0.0789342   0.00903797
thread0::softmax            120         9.73205     0.050275    0.451451    0.0811004   0.00795943
thread0::reshape2           480         4.47205     0.006964    0.022523    0.00931677  0.0036575
thread0::lookup_table       30          2.67894     0.074823    0.105928    0.089298    0.00219099
thread0::stack              10          1.43889     0.130984    0.154692    0.143889    0.00117681
thread0::tanh               10          0.986778    0.084346    0.191761    0.0986778   0.000807043
thread0::slice              10          0.12234     0.009367    0.033865    0.012234    0.000100057
thread0::feed               40          0.109835    0.001013    0.005219    0.00274588  8.98293e-05
thread0::fetch              10          0.106458    0.006874    0.011848    0.0106458   8.70674e-05
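Reading the profile, fc alone accounts for roughly half the runtime and load another ~22%, so those two ops are the obvious optimization targets. A small helper to rank ops by their share of the listed time, fed with a few rows transcribed from the table above (only the heaviest ops are included, so the shares are relative to this subset):

```python
# Total times in ms for the heaviest ops, copied from the profile above.
PROFILE_MS = {
    "fc": 625.813,
    "load": 269.789,
    "elementwise_add": 78.8811,
    "transpose2": 63.5249,
    "dropout": 51.7229,
}

def top_ops(times, k=2):
    """Return the k most expensive ops with their share of the listed total."""
    total = sum(times.values())
    ranked = sorted(times.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, t / total) for op, t in ranked[:k]]
```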

TODO

Problems encountered setting up the competitor environment on CUDA 10

CUDA 10 does not support the competitor TensorFlow 1.12.0; it only supports 1.13.0 and later.

TensorFlow 1.13.0+ differs substantially from 1.12.0: many models that run under TensorFlow 1.12.0 can no longer train normally on 1.13.0 and later.
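The constraint above can be stated as a simple version gate. This sketch encodes only the single fact stated in the issue (CUDA 10 requires TF >= 1.13); it is not a full support matrix:

```python
def tf_runs_on_cuda(tf_version, cuda_major):
    """Does this TensorFlow release support this CUDA major version?

    Encodes the one constraint from the issue: CUDA 10 needs TF >= 1.13.
    """
    major, minor = (int(x) for x in tf_version.split(".")[:2])
    if cuda_major >= 10:
        return (major, minor) >= (1, 13)
    return True
```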


Which script is used for the BERT training benchmark?

Which script is used for the BERT training benchmark? I see there are two kinds of scripts: one for pre-training (e.g. train.py) and one for fine-tuning (e.g. run_classify.py).
Which one is used for the benchmark?

Optimize inference performance of ERNIE on P40 GPU

Owner

@Xreki @zhaoyuchen2018

Initial performance

  • Test date: 2019-08-14
  • Tester: @Xreki
  • GPU platform: Tesla P40
  • Software:
    • Driver version: 418.39
    • CUDA 9.0
    • cuDNN 7.5
  • Paddle commit:
commit 744279fe685dd0b8b426a686d84ad449da02366e
Author: Kevin <[email protected]>
Date:   Mon Aug 12 10:13:12 2019 +0800

    Refine embedding Api doc (#18820)
  • Test code: #164
  • Docker image used to build Paddle: paddlepaddle/paddle_manylinux_devel:cuda9.0_cudnn7
  • Docker image used to build and run the test program: paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev
  • Results:
    • GPU ratio: 96%
    • Runtime: 8.3554 ms/sample
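For comparison with throughput-style numbers elsewhere in this tracker, the average latency converts to roughly 120 samples/s:

```python
def samples_per_second(ms_per_sample):
    """Convert an average latency in ms/sample into throughput."""
    return 1000.0 / ms_per_sample

# P40 result from above: 8.3554 ms/sample is about 119.7 samples/s.
p40_throughput = samples_per_second(8.3554)
```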

NVIDIA has open-sourced Faster Transformer, its BERT inference solution

Optimize the performance of CQDNN on CPU

Owner

@luotao1 @GaoWei8

Initial performance

  • Test date: 2019-11-12
  • Model config: single machine, 16 threads, sampling rate 0.02, 2000 parts
  • Tester: @Aurelius84
  • Unit: s/epoch
  • CPU model: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, 56 cores
                 Paddle (MKL_CBWR=COMPATIBLE)   Paddle (MKL_CBWR="")   Competitor
Time per epoch   41                             23                     38
Speedup          7.8% slower                    39% faster

Goal

Since the competitor was run with MKL_CBWR=COMPATIBLE, Paddle (MKL_CBWR=COMPATIBLE) needs to at least match the competitor.
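MKL_CBWR is an environment variable read by MKL at startup (it pins MKL's Conditional Numerical Reproducibility code paths; COMPATIBLE is the most conservative setting), so the apples-to-apples run just needs it exported before launching training. A minimal sketch of building such an environment (cbwr_env is a hypothetical helper; the variable name and COMPATIBLE value are real MKL settings):

```python
import os

def cbwr_env(mode="COMPATIBLE", base=None):
    """Copy an environment and pin MKL's CNR mode via MKL_CBWR.

    Pass the result as env= to subprocess.Popen/run when launching the
    training process, so both Paddle and the competitor use the same
    reproducibility setting.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_CBWR"] = mode
    return env
```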

Optimize train performance of StarGAN on V100 GPU

Owner

@chenwhql

Initial performance

Scenario     Paddle dev   PyTorch 1.1   Comparison
Single GPU   48.525       70.508        31% slower
  • Test date: 2019-08-02
  • Tester: @chenwhql
  • GPU platform: Tesla V100
  • Software:
    • Driver version: 418.39
    • CUDA 9.0
    • cuDNN 7.5
  • Paddle: develop [commit: ee2f296]
  • Test code: #152
  • Docker image used to build Paddle: paddlepaddle/paddle_manylinux_devel:cuda9.0_cudnn7
  • Docker image used to build and run the test program: paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev

PR #31 causes abnormal loss in transformer

paddle version: develop commit-id: 63d9fe336217303e1178e20ee66a7a10055387dc
#31
After reverting this PR the problem disappears and the loss is normal.
Training log:

2019-04-24 13:20:26,645-INFO: step_idx: 2300, epoch: 0, batch: 2300, avg loss: 65.918922, normalized loss: 64.541956, ppl: 42483886980604466541361102848.000000, speed: 4.82 step/s
2019-04-24 13:20:47,210-INFO: step_idx: 2400, epoch: 0, batch: 2400, avg loss: 74.120911, normalized loss: 72.743944, ppl: 154989570662575536897566879776768.000000, speed: 4.86 step/s
2019-04-24 13:21:07,700-INFO: step_idx: 2500, epoch: 0, batch: 2500, avg loss: 82.949768, normalized loss: 81.572802, ppl: 1058343263653170446511950991059845120.000000, speed: 4.88 step/s
train.py:558: RuntimeWarning: overflow encountered in exp
  np.exp([min(total_avg_cost, 100)]),speed))
2019-04-24 13:21:28,190-INFO: step_idx: 2600, epoch: 0, batch: 2600, avg loss: 92.073807, normalized loss: 90.696840, ppl: inf, speed: 4.88 step/s
2019-04-24 13:21:48,633-INFO: step_idx: 2700, epoch: 0, batch: 2700, avg loss: 103.230675, normalized loss: 101.853708, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.89 step/s
2019-04-24 13:22:09,212-INFO: step_idx: 2800, epoch: 0, batch: 2800, avg loss: 114.019951, normalized loss: 112.642984, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.86 step/s
2019-04-24 13:22:29,489-INFO: step_idx: 2900, epoch: 0, batch: 2900, avg loss: 126.890511, normalized loss: 125.513544, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.93 step/s
2019-04-24 13:22:49,770-INFO: step_idx: 3000, epoch: 0, batch: 3000, avg loss: 139.719406, normalized loss: 138.342440, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.93 step/s
2019-04-24 13:23:09,918-INFO: step_idx: 3100, epoch: 0, batch: 3100, avg loss: 151.848801, normalized loss: 150.471834, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.96 step/s
2019-04-24 13:23:29,846-INFO: step_idx: 3200, epoch: 0, batch: 3200, avg loss: 136.258896, normalized loss: 134.881929, ppl: 26881171418161356094253400435962903554686976.000000, speed: 5.02 step/s
2019-04-24 13:23:49,720-INFO: step_idx: 3300, epoch: 0, batch: 3300, avg loss: 187.088806, normalized loss: 185.711840, ppl: 26881171418161356094253400435962903554686976.000000, speed: 5.03 step/s
2019-04-24 13:24:09,800-INFO: step_idx: 3400, epoch: 0, batch: 3400, avg loss: 209.184631, normalized loss: 207.807665, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.98 step/s
2019-04-24 13:24:29,971-INFO: step_idx: 3500, epoch: 0, batch: 3500, avg loss: 234.616592, normalized loss: 233.239626, ppl: 26881171418161356094253400435962903554686976.000000, speed: 4.96 step/s
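A telling detail: the ppl value 26881171418161356094253400435962903554686976 repeated in the log is exactly exp(100) ≈ 2.688e43, i.e. the min(total_avg_cost, 100) cap in train.py saturating, so the ppl column stops carrying information once the loss diverges past 100; the real signal is the runaway avg loss itself. The capped computation as a standalone sketch:

```python
import math

def capped_ppl(avg_loss, cap=100.0):
    """Perplexity with the loss capped, mirroring np.exp([min(loss, 100)])
    in train.py. exp overflows float64 for inputs above ~709, and even
    exp(100) is already ~2.688e43, the saturated value seen in the log."""
    return math.exp(min(avg_loss, cap))
```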

deeplabv3+ sometimes exits abnormally, so run.sh aborts and no result can be collected

Training log:

step 75, loss: 2.736500, step_time_cost: 0.151 s
step 76, loss: 2.795518, step_time_cost: 0.150 s
step 77, loss: 2.817705, step_time_cost: 0.150 s
step 78, loss: 2.724798, step_time_cost: 0.149 s
step 79, loss: 2.779751, step_time_cost: 0.147 s
Training done. Model is saved to /home/crim/benchmark/deeplabv3+/paddle/output/model
*** Aborted at 1557585563 (unix time) try "date -d @1557585563" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x58) received by PID 4250 (TID 0x7f2212a06700) from PID 88; stack trace: ***
    @     0x7f22ea9e6390 (unknown)
    @           0x4bc644 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4c17c6 PyEval_EvalFrameEx
    @           0x4d4e4d (unknown)
    @           0x4bca3c PyEval_EvalFrameEx
    @           0x4d4e4d (unknown)
    @           0x4bca3c PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4d57a3 (unknown)
    @           0x4a587e PyObject_Call
    @           0x4be51e PyEval_EvalFrameEx
    @           0x4c141f PyEval_EvalFrameEx
    @           0x4c141f PyEval_EvalFrameEx
    @           0x4b9b66 PyEval_EvalCodeEx
    @           0x4d5669 (unknown)
    @           0x4eef5e (unknown)
    @           0x4a587e PyObject_Call
    @           0x4c5ef0 PyEval_CallObjectWithKeywords
    @           0x589662 (unknown)
    @     0x7f22ea9dc6ba start_thread
    @     0x7f22ea71241d clone
    @                0x0 (unknown)

A workaround is to change set -xe to set -x in run.sh, but should the underlying problem be fixed as well?
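The suggestion above in minimal form: with set -e, any nonzero exit from the trainer kills run.sh before results are harvested, while set -x keeps the command trace but lets the script continue and record the exit status (the `false` below is a stand-in for the crashing training command, not the real run.sh contents):

```shell
#!/bin/sh
set -x               # trace commands, but do not abort on a nonzero exit status
status=0
false || status=$?   # stand-in for the crashing `python train.py` step
echo "trainer exited with status ${status}; collecting results anyway"
```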

tensorflow-2.0alpha reports the following errors when running the competitor models

  • deeplabv3
Traceback (most recent call last):
  File "/ssd2/liyang/tensorflow/models/research/deeplab/train.py", line 22, in <module>
    from deeplab import common
  File "/ssd2/liyang/tensorflow/models/research/deeplab/common.py", line 25, in <module>
    flags = tf.app.flags
AttributeError: 'module' object has no attribute 'app'
  • RL model
TF<1.3 support will be removed after 2018-03-15! Actually many examples already require TF>=1.3.
Traceback (most recent call last):
  File "./algorithm.py", line 5, in <module>
    from tensorpack.utils.globvars import globalns as param
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/__init__.py", line 17, in <module>
    from tensorpack.models import *
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/models/__init__.py", line 49, in <module>
    _global_import(module_name)
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/models/__init__.py", line 30, in _global_import
    p = __import__(name, globals(), locals(), level=1)
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/models/batch_norm.py", line 6, in <module>
    from tensorflow.contrib.framework import add_model_variable
ImportError: No module named 'tensorflow.contrib'
  • PaddingRNN
Traceback (most recent call last):
  File "train.py", line 255, in <module>
    main()
  File "train.py", line 91, in main
    rnn_type = args.rnn_type)
  File "/ssd2/liyang/benchmark/PaddingRNN/lstm_tf/ptb_lm_model.py", line 12, in ptb_lm_model
    x_place = tf.placeholder(tf.int32, [batch_size, num_steps])
AttributeError: 'module' object has no attribute 'placeholder'
  • Transformer model
Traceback (most recent call last):
  File "/usr/local/bin/t2t-datagen", line 18, in <module>
    from tensor2tensor.bin import t2t_datagen
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_datagen.py", line 39, in <module>
    from tensor2tensor import problems as problems_lib  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/problems.py", line 22, in <module>
    from tensor2tensor.utils import registry
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/registry.py", line 551, in <module>
    attacks = tf.contrib.framework.deprecated(None, "Use registry.attack")(attack)
AttributeError: 'module' object has no attribute 'contrib'
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 23, in <module>
    from tensor2tensor.bin import t2t_trainer
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 24, in <module>
    from tensor2tensor import models  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/__init__.py", line 25, in <module>
    from tensor2tensor.layers import modalities  # pylint: disable=g-import-not-at-top
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/modalities.py", line 28, in <module>
    from tensor2tensor.layers import common_attention
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_attention.py", line 31, in <module>
    from tensor2tensor.layers import area_attention
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/area_attention.py", line 23, in <module>
    from tensor2tensor.layers import common_layers
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_layers.py", line 30, in <module>
    import tensorflow_probability as tfp
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/__init__.py", line 78, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/__init__.py", line 21, in <module>
    from tensorflow_probability.python import bijectors
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/bijectors/__init__.py", line 46, in <module>
    from tensorflow_probability.python.bijectors.matveclu import MatvecLU
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/bijectors/matveclu.py", line 24, in <module>
    from tensorflow_probability.python.math.linalg import lu_reconstruct
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/math/__init__.py", line 22, in <module>
    from tensorflow_probability.python.math.diag_jacobian import diag_jacobian
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_probability/python/math/diag_jacobian.py", line 24, in <module>
    tfe = tf.contrib.eager
AttributeError: 'module' object has no attribute 'contrib'
  • CycleGAN model
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    from tensorflow.examples.tutorials.mnist import input_data
ImportError: No module named 'tensorflow.examples'
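All five failures above are TF 1.x APIs that moved or disappeared in 2.0: tf.app and tf.placeholder live on under tf.compat.v1, while tf.contrib and tensorflow.examples were removed outright and have no compat shim, so those models need real code changes. A framework-free sketch of the compat-fallback lookup (resolve_v1_symbol is an illustrative helper, not a TensorFlow API; the stand-in module below is shaped like TF 2.0):

```python
import types

def resolve_v1_symbol(tf_module, name):
    """Find a TF 1.x symbol on a 2.x module, falling back to tf.compat.v1."""
    if hasattr(tf_module, name):
        return getattr(tf_module, name)
    compat_v1 = getattr(getattr(tf_module, "compat", None), "v1", None)
    if compat_v1 is not None and hasattr(compat_v1, name):
        return getattr(compat_v1, name)
    raise AttributeError("%s not found, even under tf.compat.v1" % name)

# Stand-in object mimicking TF 2.0's module layout, so the sketch runs
# without TensorFlow installed:
fake_tf = types.SimpleNamespace(
    compat=types.SimpleNamespace(
        v1=types.SimpleNamespace(placeholder=lambda *a, **k: "v1-placeholder")))
```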

mask-rcnn-fpn optimize

  • Owner:
    zhaoyuchen
  • Test environment
    • GPU driver: 418.39
    • CUDA 9.0, cuDNN 7
  • Current performance comparison (CUDA 9.0)

backbone: resnext

Scenario     Paddle    PyTorch    Comparison
Single GPU   2.601     3.035      14% slower

backbone: resnet

Scenario     Paddle    PyTorch    Comparison
Single GPU   5.568     5.922      6% slower

PaddingRNN shows abnormal loss

paddle version: develop commit-id: 63d9fe336217303e1178e20ee66a7a10055387dc
Result log:

     !!! Memory optimize is our experimental feature !!!
         some variables may be removed/reused internal to save memory usage,
         in order to fetch the right value of the fetch_list, please set the
         persistable property to true for each variable in fetch_list

         # Sample
         conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
         # if you need to fetch conv1, then:
         conv1.persistable = True


I0424 13:37:22.304425 70214 build_strategy.cc:303] SeqOnlyAllReduceOps:0, num_trainers:1
begin to load data
vocab word num 10000
finished load data
-- Epoch:[0]; Batch:[132]; Time: 0.08103 s; ppl: 7701.04980, lr: 1.00000
-- Epoch:[0]; Batch:[264]; Time: 0.08842 s; ppl: 5319.26318, lr: 1.00000
-- Epoch:[0]; Batch:[396]; Time: 0.08054 s; ppl: 4582.54102, lr: 1.00000
-- Epoch:[0]; Batch:[528]; Time: 0.09146 s; ppl: 4241.10107, lr: 1.00000
-- Epoch:[0]; Batch:[660]; Time: 0.09322 s; ppl: 4072.14404, lr: 1.00000
-- Epoch:[0]; Batch:[792]; Time: 0.08899 s; ppl: 3849.68213, lr: 1.00000
-- Epoch:[0]; Batch:[924]; Time: 0.08301 s; ppl: 3604.66235, lr: 1.00000
-- Epoch:[0]; Batch:[1056]; Time: 0.08520 s; ppl: 3396.28955, lr: 1.00000
-- Epoch:[0]; Batch:[1188]; Time: 0.07859 s; ppl: 3215.93188, lr: 1.00000
-- Epoch:[0]; Batch:[1320]; Time: 0.09284 s; ppl: 2754.36401, lr: 1.00000
Parameters are randomly initialized and not good this time because the loss is over 1000 after the first epoch.
Abort this training process and please start again.
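The abort comes from the enable_ce sanity check: after the first epoch, the metric (ppl in this log, last logged at 2754.36) is compared against a threshold, quoted as 1000 in the message. A sketch of that guard, with the helper name and exact threshold assumed:

```python
def first_epoch_ok(final_ppl, threshold=1000.0):
    # Mirrors the guard suggested by the log message: if perplexity is
    # still above the threshold after epoch 0, the run is treated as
    # badly initialized and aborted.
    return final_ppl < threshold

# The run above would abort, since first_epoch_ok(2754.36401) is False.
```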

transformer fails at runtime with an error

[2019-06-21 16:52:25,962 INFO train.py:655] Namespace(batch_size=4096, device='GPU', enable_ce=False, fetch_steps=100, local=True, opts=['learning_rate', '2.0', 'warmup_steps', '8000', 'beta2', '0.997', 'd_model', '1024', 'd_inner_hid', '4096', 'n_head', '16', 'prepostprocess_dropout', '0.3', 'attention_dropout', '0.1', 'relu_dropout', '0.1', 'weight_sharing', 'True', 'pass_num', '100', 'max_length', '256'], pool_size=200000, shuffle=True, shuffle_batch=True, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='data/vocab.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='data/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='data/vocab.bpe.32000', update_method='pserver', use_mem_opt=True, use_py_reader=True, use_token_batch=True, val_file_pattern=None)
[2019-06-21 16:52:26,259 INFO train.py:706] before adam
memory_optimize is deprecated. Use CompiledProgram and Executor
[2019-06-21 16:52:52,269 INFO train.py:724] local start_up:
[2019-06-21 16:52:52,270 INFO train.py:506] init fluid.framework.default_startup_program
W0621 16:52:53.971729  7661 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0621 16:52:53.983120  7661 device_context.cc:267] device: 0, cuDNN Version: 7.4.
[2019-06-21 16:52:54,789 INFO train.py:509] begin reader
Traceback (most recent call last):
  File "train.py", line 806, in <module>
    train(args)
  File "train.py", line 726, in train
    token_num, predict, pyreader)
  File "train.py", line 515, in train_loop
    py_reader_provider_wrapper=py_reader_provider_wrapper)
  File "train.py", line 347, in prepare_data_generator
    train_reader = py_reader_provider_wrapper(data_reader, )
TypeError: py_reader_provider_wrapper() takes exactly 2 arguments (1 given)
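The TypeError says py_reader_provider_wrapper is declared with two parameters but called with only one at train.py:347. Either pass the second argument at the call site or pre-bind it with functools.partial. A self-contained sketch (the `place` parameter and the reader here are stand-ins, not the real signature):

```python
from functools import partial

def py_reader_provider_wrapper(data_reader, place):
    # Stand-in for the real wrapper: consumes the reader and tags each
    # batch with the device it should be fed to.
    return [(place, batch) for batch in data_reader()]

def fake_reader():
    yield [1, 2]
    yield [3, 4]

# Fix A: supply both arguments at the call site.
batches = py_reader_provider_wrapper(fake_reader, "gpu:0")

# Fix B: bind the extra argument once, so one-argument call sites keep working.
wrapper = partial(py_reader_provider_wrapper, place="cpu")
batches_cpu = wrapper(fake_reader)
```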

Optimize XLnet performance

Benchmark report provided with the model:
(image)

Self-measured numbers on a single V100 GPU:
Paddle version: develop
Speed: 0.961218 steps/s
TensorFlow 1.15
Speed: 1.61 steps/s

Meaning of single-process vs. multi-process in the v1.5 benchmark

In the newly released v1.5 benchmark, each model (e.g. BERT) is reported as single GPU, 8 GPUs (single process), and 8 GPUs (multi-process). What exactly do single process and multi-process mean here? Thanks.

A second question: the BERT benchmark reports steps/s — how many samples make up one step? From the training script, batch_size defaults to 8192; what batch_size does the benchmark use?
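Once the batch size is known, steps/s converts to per-second throughput by straight multiplication. For example, at a hypothetical 2 steps/s with the script's default batch_size of 8192:

```python
def throughput_per_second(steps_per_sec, batch_size):
    # One step processes one batch, so per-second throughput is just
    # steps/s times the batch size (samples or tokens, whichever unit
    # the batch_size flag counts in).
    return steps_per_sec * batch_size

rate = throughput_per_second(2.0, 8192)  # 16384.0 per second
```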

Optimize the performance of PyramidDNN on CPU

Owner

@zhaoyuchen2018 , @luotao1

Initial performance

  • Test date: July 15, 2019
  • Paddle commit:
  • Model configuration:
    • Single machine, single thread: CPU_NUM=1, one data-reading process
  • Tester: @Aurelius84
  • Unit: s/epoch
  • CPU model: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz
epoch id  Paddle (MKL_CBWR=COMPATIBLE)  Paddle (MKL_CBWR="")  Competitor
1         357                           307                   277
2         343                           310                   278
  • CPU model: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 16 cores
epoch id  Paddle (MKL_CBWR=COMPATIBLE)  Paddle (MKL_CBWR="")  Competitor
1         268                           254                   219
2         283                           245                   220
  • Conclusions:

    • The competitor was run with MKL_CBWR=COMPATIBLE, so compare it against Paddle (MKL_CBWR=COMPATIBLE) to locate where Paddle is slow.
    • Paddle is ~25% slower than the competitor.
  • New ops
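The ~25% figure in the conclusion is consistent with the mean slowdown over the four MKL_CBWR=COMPATIBLE epochs (times are s/epoch, lower is better):

```python
def slowdown(paddle_s, competitor_s):
    # How much longer a Paddle epoch takes, relative to the competitor.
    return (paddle_s - competitor_s) / competitor_s

# The four MKL_CBWR=COMPATIBLE vs competitor epoch-time pairs above:
pairs = [(357, 277), (343, 278), (268, 219), (283, 220)]
mean_gap = sum(slowdown(p, c) for p, c in pairs) / len(pairs)  # ~0.26
```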

SEResnet50 model definition question

In the CV benchmark, the SE-ResNeXt50 model is defined in
https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/se_resnext.py, which contains the following code:
        if layers == 50:
            cardinality = 32
            reduction_ratio = 16
            depth = [3, 4, 6, 3]
            num_filters = [128, 256, 512, 1024]

        conv = self.conv_bn_layer(
            input=input,
            num_filters=64,
            filter_size=7,
            stride=2,
            act='relu',
            name='conv1', )
        conv = fluid.layers.pool2d(
            input=conv,
            pool_size=3,
            pool_stride=2,
            pool_padding=1,
            pool_type='max',
            use_cudnn=False)

According to this definition, the stage depths are [3, 4, 6, 3] and the filter counts are [128, 256, 512, 1024].
A standard ResNet-50 defines the stage output widths as [256, 512, 1024, 2048] — is SE-ResNet simplifying the design to cut computational cost?
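One plausible reading, worth verifying against bottleneck_block in se_resnext.py: the listed num_filters is the internal width of each bottleneck, and the block's final 1×1 conv expands it, so the per-stage output channels still match standard ResNet-50. With an assumed expansion factor of 2:

```python
# Assumption: the bottleneck's last 1x1 conv doubles the listed width,
# as in the ResNeXt-50 (32x4d) configuration. Verify in se_resnext.py
# before relying on this.
stage_widths = [128, 256, 512, 1024]
expansion = 2
stage_outputs = [w * expansion for w in stage_widths]  # [256, 512, 1024, 2048]
```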

Optimize train performance of STGAN on V100 GPU

Owner

@chenwhql

Initial performance

Scenario    Paddle dev  TensorFlow 1.2  Gap
Single GPU  49.091      82.244          40% worse
  • Test date: August 6, 2019
  • Tester: @chenwhql
  • GPU platform: Tesla V100
  • Software:
    • Driver Version: 418.39
    • CUDA 9.0
    • cuDNN 7.5
  • Paddle: develop [commit: ee2f296]
  • Test code: #154
  • Docker image for building Paddle: paddlepaddle/paddle_manylinux_devel:cuda9.0_cudnn7
  • Docker image for building and running the test: paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev

transformer errors out in multi-process single-GPU mode

https://github.com/PaddlePaddle/benchmark/blob/master/NeuralMachineTranslation/Transformer/fluid/train/train.py#L616

2019-05-21 09:29:28,729-INFO: Namespace(batch_size=4096, device='GPU', enable_ce=True, fetch_steps=100, local=True, opts=['dropout_seed', '10', 'learning_rate', '2.0', 'warmup_steps', '8000', 'beta2', '0.997', 'd_model', '512', 'd_inner_hid', '2048', 'n_head', '8', 'prepostprocess_dropout', '0.1', 'attention_dropout', '0.1', 'relu_dropout', '0.1', 'weight_sharing', 'True', 'pass_num', '1', 'model_dir', 'tmp_models', 'ckpt_dir', 'tmp_ckpts'], pool_size=200000, shuffle=False, shuffle_batch=False, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='data/vocab.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='data/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='data/vocab.bpe.32000', update_method='pserver', use_default_pe=False, use_mem_opt=True, use_py_reader=True, use_token_batch=True, val_file_pattern=None)
Traceback (most recent call last):
  File "train.py", line 784, in <module>
    train(args)
  File "train.py", line 641, in train
    dev_count = get_device_num()
  File "train.py", line 616, in get_device_num
    device_num = subprocess.check_output(['nvidia-smi','-L']).decode().count('\n')
NameError: global name 'subprocess' is not defined
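The NameError simply means train.py uses subprocess without importing it; adding `import subprocess` at the top of the file fixes it. A self-contained sketch of the same device count, split so the parsing is testable without a GPU (count_devices is our helper name, not part of train.py):

```python
import subprocess  # the import missing from train.py

def count_devices(listing):
    # `nvidia-smi -L` prints one "GPU N: ..." line per device,
    # so the newline count equals the device count.
    return listing.count('\n')

def get_device_num():
    # Requires nvidia-smi on PATH, as in the original script.
    return count_devices(
        subprocess.check_output(['nvidia-smi', '-L']).decode())

sample = "GPU 0: Tesla V100 (UUID: x)\nGPU 1: Tesla V100 (UUID: y)\n"
```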

Optimize the performance of Transformer-Big on 1 V100 GPU

Owner

@wangchaochaohu

Initial performance

  • Test date: June 20, 2019
  • Paddle commit:
  • models commit:
  • Test script: run.sh
base_batch_size=4096
python -u train.py \
    --src_vocab_fpath data/vocab.bpe.32000 \
    --trg_vocab_fpath data/vocab.bpe.32000 \
    --special_token '<s>' '<e>' '<unk>' \
    --train_file_pattern data/train.tok.clean.bpe.32000.en-de \
    --batch_size ${base_batch_size} \
    --use_token_batch True \
    --sort_type pool \
    --pool_size 200000 \
    --shuffle True \
    --shuffle_batch True \
    --use_py_reader True \
    --use_mem_opt True \
    --enable_ce False \
    --fetch_steps 100 \
    learning_rate 2.0 \
    warmup_steps 8000 \
    beta2 0.997 \
    d_model 1024 \
    d_inner_hid 4096 \
    n_head 16 \
    prepostprocess_dropout 0.3 \
    attention_dropout 0.1 \
    relu_dropout 0.1 \
    weight_sharing True \
    pass_num 100 \
    max_length 256
Scenario     Paddle 1.5.0  TensorFlow 1.12.0  Ratio
1 GPU        1.82          1.968              -7.6%
8 GPUs (SP)  13.12         7.072              +86%

When running auto_run_paddle.sh, paddingRnn always fails on the large model

2019-05-13 18:45:34,470 - lm - INFO - Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_ce=True, inference_only=False, init_params_path=None, log_path=None, max_epoch=5, model_type='large', parallel=True, profile=False, rnn_model='static', save_model_dir=None, use_gpu=True)
2019-05-13 18:45:34,470-INFO: Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_ce=True, inference_only=False, init_params_path=None, log_path=None, max_epoch=5, model_type='large', parallel=True, profile=False, rnn_model='static', save_model_dir=None, use_gpu=True)
W0513 18:45:37.997777  8668 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0513 18:45:38.010324  8668 device_context.cc:269] device: 0, cuDNN Version: 7.0.
W0513 18:45:38.077817  8668 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
2019-05-13 18:45:38,082-WARNING: 
     You can try our memory optimize feature to save your memory usage:
         # create a build_strategy variable to set memory optimize option
         build_strategy = compiler.BuildStrategy()
         build_strategy.enable_inplace = True
         build_strategy.memory_optimize = True
         
         # pass the build_strategy to with_data_parallel API
         compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
             loss_name=loss.name, build_strategy=build_strategy)
      
     !!! Memory optimize is our experimental feature !!!
         some variables may be removed/reused internal to save memory usage, 
         in order to fetch the right value of the fetch_list, please set the 
         persistable property to true for each variable in fetch_list

         # Sample
         conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
         # if you need to fetch conv1, then:
         conv1.persistable = True

                 
I0513 18:45:39.936303  8668 build_strategy.cc:305] SeqOnlyAllReduceOps:0, num_trainers:1
begin to load data
vocab word num 10000
finished load data
Traceback (most recent call last):
  File "train.py", line 512, in <module>
    main()
  File "train.py", line 506, in main
    train()
  File "train.py", line 421, in train
    train_ppl = train_an_epoch(epoch_id, batch_times)
  File "train.py", line 393, in train_an_epoch
    use_program_cache=True)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 640, in run
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 502, in _run_parallel
    exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator reshape2 error.
Python Callstacks: 
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1700, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layers/nn.py", line 6590, in reshape
    "XShape": x_shape})
  File "/home/crim/benchmark/PaddingRNN/lstm_paddle/lm_model.py", line 250, in lm_model
    init_hidden, shape=[num_layers, -1, hidden_size])
  File "train.py", line 192, in main
    rnn_model=rnn_model)
  File "train.py", line 512, in <module>
    main()
C++ Callstacks: 
holder_ should not be null
Tensor not initialized yet when Tensor::type() is called. at [/paddle/paddle/fluid/framework/tensor.h:146]
PaddlePaddle Call Stacks: 
0       0x7f06f362b468p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1       0x7f06f362b7b7p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2       0x7f06f362c24bp paddle::framework::Tensor::type() const + 107
3       0x7f06f3eb83e8p paddle::operators::ReshapeKernel::operator()(paddle::framework::ExecutionContext const&) const + 312
4       0x7f06f536c6e0p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 368
5       0x7f06f536c97ap paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 218
6       0x7f06f536a0fcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
7       0x7f06f5177ceap paddle::framework::details::ComputationOpHandle::RunImpl() + 250
8       0x7f06f516a3e0p paddle::framework::details::OpHandleBase::Run(bool) + 160
9       0x7f06f50c823dp
10      0x7f06f50c909dp paddle::framework::details::ThreadedSSAGraphExecutor::RunOp(std::shared_ptr<paddle::framework::BlockingQueue<paddle::framework::details::VarHandleBase*> > const&, paddle::framework::details::OpHandleBase*) + 1325
11      0x7f06f50cf3ebp paddle::framework::details::ThreadedSSAGraphExecutor::RunImpl(std::vector<std::string, std::allocator<std::string> > const&) + 1339
12      0x7f06f50ccc32p paddle::framework::details::ThreadedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&) + 482
13      0x7f06f50b8b9ap paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&) + 378
14      0x7f06f37e2132p paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, std::string const&) + 562
15      0x7f06f361a60ep
16      0x7f06f36605d6p
17            0x4c5326p PyEval_EvalFrameEx + 37958
18            0x4b9b66p PyEval_EvalCodeEx + 774
19            0x4c1f56p PyEval_EvalFrameEx + 24694
20            0x4b9b66p PyEval_EvalCodeEx + 774
21            0x4c17c6p PyEval_EvalFrameEx + 22758
22            0x4b9b66p PyEval_EvalCodeEx + 774
23            0x4c1f56p PyEval_EvalFrameEx + 24694
24            0x4b9b66p PyEval_EvalCodeEx + 774
25            0x4c1f56p PyEval_EvalFrameEx + 24694
26            0x4b9b66p PyEval_EvalCodeEx + 774
27            0x4c1f56p PyEval_EvalFrameEx + 24694
28            0x4b9b66p PyEval_EvalCodeEx + 774
29            0x4eb69fp
30            0x4e58f2p PyRun_FileExFlags + 130
31            0x4e41a6p PyRun_SimpleFileExFlags + 390
32            0x4938cep Py_Main + 1358
33      0x7f077fb53830p __libc_start_main + 240
34            0x493299p _start + 41
