paddlepaddle / paddle-lite

PaddlePaddle High Performance Deep Learning Inference Engine for Mobile and Edge

Home Page: https://www.paddlepaddle.org.cn/lite

License: Apache License 2.0

CMake 1.87% Shell 1.67% C++ 82.34% C 3.64% Objective-C 0.02% Objective-C++ 1.60% Metal 0.89% Java 0.12% Python 7.66% Cuda 0.13% Batchfile 0.05%

mobile deep-learning neural-network arm mdl baidu embedded mali fpga mobile-deep-learning

paddle-lite's Introduction

Paddle Lite

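Paddle Lite is a high-performance, lightweight, flexible, and easily extensible deep learning inference framework, designed to support a range of hardware platforms including mobile, embedded, and edge devices.

Paddle Lite is now used throughout Baidu's internal businesses and also supports production workloads for many external users and companies.

Quick Start

With Paddle Lite, deploying a model to a variety of end devices and running high-performance inference takes only a few simple steps. The workflow is as follows: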

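1. Prepare the model

Paddle Lite directly supports models in the format produced by the PaddlePaddle deep learning framework. Currently, PaddlePaddle inference models are saved through the save_inference_model API. If your model was produced by another framework such as Caffe, TensorFlow, or PyTorch, you can convert it to the PaddlePaddle format with the X2Paddle tool.

2. Optimize the model

Paddle Lite implements a range of acceleration and optimization strategies, including quantization, subgraph fusion, and kernel selection. An optimized model is lighter, consumes fewer resources, and runs faster. These optimizations are applied by the opt tool that ships with Paddle Lite. The opt tool can also collect and print the operators used in a model and report whether Paddle Lite supports them on different hardware platforms. Once you have a model in PaddlePaddle format, you generally need to run it through opt. For downloading and using opt, see the model optimization guide.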

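3. Download or build

Paddle Lite provides official release inference libraries for Android, iOS, x86, and macOS. We recommend downloading the prebuilt Paddle Lite libraries directly, or getting the latest prebuilt libraries from the release notes.

Paddle Lite can be built from source in many environments. To avoid a complicated and tedious environment setup, we recommend building inside the unified Docker build environment. Alternatively, you can find the environment setup and build guide that matches the CPU architecture and operating system of your host and target device in the source compilation docs, and set up the build environment yourself.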

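4. Inference examples

Paddle Lite provides C++, Java, and Python APIs, along with a complete usage example for each API.

You can follow the instructions in the examples to get familiar with the APIs quickly and integrate them into your own project.

Paddle Lite also provides complete examples for each supported hardware platform.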

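Key Features

Multi-platform support: covers Android, iOS, embedded Linux devices, Windows, macOS, and Linux hosts
Multi-language support: Java, Python, and C++
Lightweight and high performance: optimized for machine learning on mobile devices, with compressed model and binary sizes, efficient inference, and reduced memory consumption

Continuous Integration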

System	x86 Linux	ARM Linux	Android (GCC/Clang)	iOS
CPU(32bit)
CPU(64bit)
OpenCL	-	-		-
Metal	-	-	-
Huawei Kirin NPU	-	-		-
Huawei Ascend NPU			-	-
Kunlunxin XPU			-	-
Kunlunxin XTCL			-	-
Qualcomm QNN	-	-		-
Cambricon MLU		-	-	-
(Rockchip/Amlogic/NXP) VeriSilicon TIM-VX	-			-
Android NNAPI	-	-		-
MediaTek APU	-	-		-
Imagination NPU	-		-	-
Intel OpenVINO		-	-	-
Eeasytech NPU	-		-	-

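Architecture

Paddle Lite's architecture focuses on supporting multiple hardware targets and platforms. It strengthens the ability to mix several kinds of hardware within a single model, applies performance optimizations at multiple levels, and keeps on-device deployments lightweight.

In this design, the Analysis Phase includes the MIR (Machine IR) modules, which optimize the original model's computation graph for a given list of hardware targets, using passes such as operator fusion and computation pruning. The Execution Phase only involves kernel execution and can be deployed on its own, which enables extremely lightweight deployments.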
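
To make the operator-fusion idea concrete, here is a minimal sketch of a fusion pass over a toy, linear list of operators. It is not Paddle Lite's MIR implementation: the Op class, the fuse_conv_relu function, and the fused kind "conv2d+relu" are hypothetical names chosen for illustration, and the sketch assumes a simple chain rather than a general graph.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """A toy operator node (hypothetical, not a Paddle Lite type)."""
    kind: str
    name: str


def fuse_conv_relu(ops: List[Op]) -> List[Op]:
    """Replace each adjacent (conv2d, relu) pair with one fused op.

    Assumes a linear chain of ops; a real graph pass would also have to
    check that the conv output has no other consumers.
    """
    fused: List[Op] = []
    i = 0
    while i < len(ops):
        if (i + 1 < len(ops) and ops[i].kind == "conv2d"
                and ops[i + 1].kind == "relu"):
            fused.append(Op("conv2d+relu", ops[i].name + "_" + ops[i + 1].name))
            i += 2  # skip the relu that was merged into the fused op
        else:
            fused.append(ops[i])
            i += 1
    return fused


graph = [Op("conv2d", "conv1"), Op("relu", "relu1"),
         Op("pool2d", "pool1"), Op("conv2d", "conv2")]
optimized = fuse_conv_relu(graph)
print([op.kind for op in optimized])  # ['conv2d+relu', 'pool2d', 'conv2d']
```

The fused list has one op fewer, so an executor would launch one kernel where it previously launched two.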

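Learn More About Paddle Lite

If you want to learn more about Paddle Lite, the following resources cover further study and usage:

Documentation and Examples

Key Technologies

Model quantization:
- Static post-training quantization
- Dynamic post-training quantization
Debugging and analysis: debugging and profiling tools
On-device model training: follow the link to learn more
PaddlePaddle pre-trained model library: browse and download Paddle pre-trained models on PaddleHub
NNAdapter, PaddlePaddle's unified adaptation framework for AI inference hardware: follow the link to learn more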

FAQ

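FAQ: for common questions, visit the FAQ, search the Issues, or contact us through the channels at the bottom of this page

Contributing

Contributing: if you would like to take part in Paddle Lite development and contribute code, please see the developer documentation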

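Communication and Feedback

AIStudio training platform course series on on-device deployment: https://aistudio.baidu.com/aistudio/course/introduce/22690
You are welcome to submit questions, reports, and suggestions through GitHub Issues
Technical discussion WeChat group: add WeChat ID baidupaddle or scan the WeChat QR code below, then reply "端侧" (on-device) to the assistant, and the system will invite you automatically. Technical QQ groups: group 1, 696965088 (full); group 2, 959308808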

[QR codes: WeChat official account; official technical discussion QQ group]

If you are interested in our work, you are also welcome to join us!

Copyright and License

Paddle Lite is provided under the Apache-2.0 license.

paddle-lite's Issues

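Reducing model size

"A quantization script is provided that directly supports converting 32-bit float to 8-bit uint; the quantized model is around 4 MB"
I could not find this script. Where is it? Thanks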
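
For readers wondering what a float-to-uint8 conversion of this kind generally involves, here is a minimal sketch of min/max linear quantization. It is not the MDL script the README refers to, whose exact scheme is not shown in this thread; the function names quantize_uint8 and dequantize_uint8 are hypothetical, and the sketch assumes one scale per weight list.

```python
import struct
from typing import List, Tuple


def quantize_uint8(weights: List[float]) -> Tuple[bytes, float, float]:
    """Map floats linearly onto 0..255 and return (codes, minimum, scale)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant input
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale


def dequantize_uint8(codes: bytes, lo: float, scale: float) -> List[float]:
    """Recover approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize_uint8(weights)
restored = dequantize_uint8(codes, lo, scale)

# Each float32 takes 4 bytes, each code takes 1 byte.
float_bytes = len(struct.pack(f"{len(weights)}f", *weights))
print(float_bytes, len(codes))  # 20 5
```

Storing one byte per weight instead of four is where the roughly fourfold size reduction comes from; the price is a rounding error of at most half a quantization step per weight.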

Error running the ./build.sh mac script on Mac with Xcode 9

As the title says. The error output is as follows:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: build/libmdl-static.a(mdl_jni.cpp.o) has no symbols
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib build/libmdl-static.a
[ 89%] Built target mdl
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: build/libmdl-static.a(mdl_jni.cpp.o) has no symbols
[ 89%] Built target mdl-static
make: *** [all] Error 2

Some feedback is here.

Thank you for this great work. Some feedback is as follows.

I found that the program crashes when my net has multiple outputs. I tried to fix it at "mobile-deep-learning/src/net.cpp:87": "for(int i = start; i <= end; i++)" should be "for(int i = start; i < end; i++)".
The caffe2mdl tool does not support inputs whose width differs from their height. This is a real limitation, since many tasks need inputs with different width and height.
The predict output cannot be retrieved by layer name, and the result's shape is not available. Will these features be supported in the future?

Best Regards,
WolffyChen
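
The off-by-one fix proposed in this issue can be illustrated with a small sketch. It assumes, following the reporter's suggested change, that the end index is meant to be exclusive; run_layers and run_layers_inclusive are hypothetical names, not functions from the MDL source.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def run_layers(layers: List[Layer], x: float, start: int, end: int) -> float:
    """Run layers[start:end]; `end` is exclusive, like `i < end`."""
    for i in range(start, end):
        x = layers[i](x)
    return x


def run_layers_inclusive(layers: List[Layer], x: float, start: int, end: int) -> float:
    """Variant that mirrors `i <= end`: it also visits index `end`."""
    for i in range(start, end + 1):
        x = layers[i](x)
    return x


layers = [lambda v: v + 1.0, lambda v: v * 2.0, lambda v: v - 3.0]
print(run_layers(layers, 1.0, 0, len(layers)))  # 1.0

try:
    run_layers_inclusive(layers, 1.0, 0, len(layers))
except IndexError as exc:
    # Python raises here; in C++ the same access is undefined behavior
    # and can show up as a crash.
    print("out of range:", exc)
```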

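Can it run on Windows 10? Thanks, Baidu!

Will support for more layers be added?

Will support for more layers be added later, such as the deconvolution layer and PReLU?

32-bit to 8-bit uint

A quantization script is provided that directly supports converting 32-bit float to 8-bit uint; the quantized model is around 4 MB
Where is this script, and how do I invoke it?

Is there a dependency that can be used on the front end?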

squeeze net performance on ios

Which iOS device did you use to test SqueezeNet? I installed the demo via the QR code on an iPhone 7, and the model takes about 50 ms per frame.

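32-bit to 8-bit uint

A quantization script is provided that directly supports converting 32-bit float to 8-bit uint; the quantized model is around 4 MB
Where is the script mentioned above, and how is it used?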

Is it possible to provide compiling config for ubuntu on arm (both armv7 and armv8)?

Some mobile hardware platforms have only ubuntu system (or other linux systems), so is it possible to provide compiling config for ubuntu on arm (both armv7 and armv8), not only android and ios?

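Could you provide performance data on mainstream Android CPUs? Thanks

I have only seen that SqueezeNet reaches 30 ms on the iOS GPU. Could you provide Android performance numbers, so that they can be compared with other frameworks?
Judging from the code, ncnn implements convolution with NEON instructions, which I would expect to be somewhat faster than using GEMM directly as is done here.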

caffe2mdl Unknown bottom blob

when I run
./caffe2mdl model.prototxt model.caffemodel
it says Unknown bottom blob
tried several different models and get the same error

Another major open-source release from Baidu

👍

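How do I use a converted custom model on Android?

How do I put a converted custom model into the .so for use on Android? I built the Android .so from source, but the model is not being used.

How do I import my own image library?

This is a great tool from Baidu, but I would like to import my own image library, for example to recognize different phones. How do I import it? Thanks.

./build.sh mac fails

Environment: Ubuntu 16.04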
/src/loader/loader.cpp:25:19: fatal error: zconf.h: No such file or directory
/src/commons/commons.cpp:96:48: error: ‘memcpy’ was not declared in this scope
/src/layer/pooling_layer.cpp:97:40: error: ‘INT_MAX’ was not declared in this scope

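Is multithreading being considered?

Layers whose logic is easy to split, such as pooling_layer, relu_layer, roi_pooling_layer, and im2col, are fairly simple to multithread;
I also see that gemm does not use multiple threads. You could consider porting the sgemm nn and nt routines from OpenBLAS, which is not too complicated;
For specific micro-kernels, consider unrolled implementations in pure assembly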
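
As an illustration of why element-wise layers such as ReLU are easy to split, here is a minimal sketch that partitions a buffer into contiguous, non-overlapping chunks and hands each chunk to a worker thread. It is not MDL code; relu_chunk and parallel_relu are hypothetical names. In CPython the global interpreter lock prevents a real speedup for this pure-Python loop, so the sketch only shows the partitioning scheme that a C++ implementation could reuse.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def relu_chunk(data: List[float], begin: int, end: int) -> None:
    """Apply ReLU in place to data[begin:end]; chunks never overlap."""
    for i in range(begin, end):
        if data[i] < 0.0:
            data[i] = 0.0


def parallel_relu(data: List[float], num_threads: int = 4) -> None:
    """Split the buffer into contiguous chunks, one task per chunk."""
    n = len(data)
    if n == 0:
        return
    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(relu_chunk, data, b, min(b + chunk, n))
                   for b in range(0, n, chunk)]
        for future in futures:
            future.result()  # re-raise any exception from a worker


activations = [-2.0, 3.0, -0.5, 0.0, 7.5, -1.0, 4.0]
parallel_relu(activations, num_threads=3)
print(activations)  # [0.0, 3.0, 0.0, 0.0, 7.5, 0.0, 4.0]
```

Because every output element depends only on the input element at the same index, the workers need no locking.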

caffe2mdl tool bug

Loading the SqueezeNet MDL model without quantification fails:
dump_without_quantification does not replace the character '/' in layer names.
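
The kind of fix this report asks for could look like the following sketch, which replaces '/' in layer names and checks that the replacement does not make two names collide. The function sanitize_layer_names and the choice of '_' as the replacement character are assumptions for illustration; the thread does not show what the quantized dump path in caffe2mdl actually uses.

```python
from typing import List


def sanitize_layer_names(names: List[str], replacement: str = "_") -> List[str]:
    """Replace '/' in every layer name and reject collisions."""
    safe = [name.replace("/", replacement) for name in names]
    if len(set(safe)) != len(set(names)):
        raise ValueError("sanitizing made two different layer names identical")
    return safe


# SqueezeNet-style names that contain a slash.
layer_names = ["conv1", "fire2/squeeze1x1", "fire2/expand3x3"]
print(sanitize_layer_names(layer_names))
# ['conv1', 'fire2_squeeze1x1', 'fire2_expand3x3']
```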

Build error

After running ./build.sh mac:
./build.sh: line 24: cmake: command not found
make: *** No targets specified and no makefile found. Stop.

Cannot build on Mac because there is no protobuf library by default

CMake Error at /usr/local/Cellar/cmake/3.5.1/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find Protobuf (missing: PROTOBUF_LIBRARY PROTOBUF_INCLUDE_DIR)
Call Stack (most recent call first):
/usr/local/Cellar/cmake/3.5.1/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
/usr/local/Cellar/cmake/3.5.1/share/cmake/Modules/FindProtobuf.cmake:308 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
tools/CMakeLists.txt:5 (find_package)

-- Configuring incomplete, errors occurred!
See also "/Users/icespring/mdl/mobile-deep-learning/build/release/x86/CMakeFiles/CMakeOutput.log".

This can be solved by installing protoc.

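Request: Android GPU support

Requesting Android GPU support.

BTW, I saw that this is already in the TODO. What approach will be used to support the Android GPU?

Are fully convolutional networks supported? For example, reshaping the data layer

Fully convolutional networks such as MTCNN need to reshape the model's input layer.
How can this framework do that?

Support compiling on Linux

build.sh currently only supports compiling on Mac; support for Linux environments is needed.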

Copyright and license violations

Hi guys,

It's cool to see that Baidu is making an open source library for doing deep learning on iOS and Android.

However, it seems that a fair amount of code in this project was taken from my Forge library. That is no problem in principle, as Forge is also open source and I'm happy that the source code I wrote helps other projects.

That said, I do hope you will respect the Forge license (https://github.com/hollance/Forge/blob/master/LICENSE.txt), which requires that you preserve the original license and copyright notice in the source code you have copied from Forge.

Thank you!

TensorFlow model conversion

Are there any plans for TensorFlow model import?

How can a model like Faster R-CNN + MobileNet be run in this framework?

To support various types of models, such as R-CNN and Faster R-CNN, several layers need to be added.

caffe2mdl fails to convert ResNet-18 or ResNet-50

Converting a ResNet model with the caffe2mdl tool fails.
Does it support ResNet models?

Problem running ./build.sh mac on Ubuntu

Building on Ubuntu 16.04 gives "this file was generated by a newer version of protoc which is". The protoc installed on Ubuntu is 2.6.1. Is that too old?

"Large" matrix operation crashes: m:1 n:6949 k:3200

Loading our company's model in the demo project crashes partway through the run
m:1 n:6949 k:3200

#00 pc 00000000000310c4 /data/app/com.baidu.mdl.demo-1/lib/arm64/libmdl.so (mdl::Gemmer::pack_kxNR(int, float const*, int, int, float*)+24)
#1 pc 0000000000031194 /data/app/com.baidu.mdl.demo-1/lib/arm64/libmdl.so (mdl::Gemmer::pack_B(int, int, float const*, int, int, float*)+156)
#2 pc 0000000000031fa4 /data/app/com.baidu.mdl.demo-1/lib/arm64/libmdl.so (mdl::Gemmer::dgemm_nn(int, int, int, float, float const*, int, int, float const*, int, int, float, float*, int, int)+600)
#3 pc 000000000003216c /data/app/com.baidu.mdl.demo-1/lib/arm64/libmdl.so (mdl::Gemmer::sgemm(int, int, int, float const*, float const*, float*)+52)
#4 pc 00000000000258ac /data/app/com.baidu.mdl.demo-1/lib/arm64/libmdl.so (mdl::FCLayer::forward(int)+532)
#5 pc 0000000000034c18 /data/app/com.baidu.mdl.demo-1/lib/arm64/libmdl.so (mdl::Net::forward_from_to(float*, int, int, bool)+1172)
#6 pc 000000000003546c /data/app/com.baidu.mdl.demo-1/lib/arm64/libmdl.so (mdl::Net::predict(float*)+36)

protoc version for build.sh

Running build.sh mac with protobuf 3.4 fails:
build/release/x86/tools/caffe.pb.h:17:2: error: #error This file was generated by an older version of protoc which is incompatible with your Protocol Buffer headers. Please regenerate this file with a newer version of protoc.
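
Errors of this kind indicate that the protoc that generated caffe.pb.h and the installed Protocol Buffer headers come from different releases. The sketch below compares the version printed by `protoc --version` (for example "libprotoc 3.4.0") with the integer in the GOOGLE_PROTOBUF_VERSION macro from the headers, which packs the version as major*1000000 + minor*1000 + patch. The function names are hypothetical, and requiring an exact match is a conservative assumption: the headers accept a range of generator versions, but identical versions always fall inside it.

```python
import re
from typing import Tuple

Version = Tuple[int, int, int]


def parse_protoc_version(output: str) -> Version:
    """Parse the text printed by `protoc --version`, e.g. 'libprotoc 3.4.0'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"no version number in {output!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def decode_header_version(value: int) -> Version:
    """Decode GOOGLE_PROTOBUF_VERSION (major*1000000 + minor*1000 + patch)."""
    return value // 1_000_000, (value // 1_000) % 1_000, value % 1_000


def versions_match(protoc_output: str, header_value: int) -> bool:
    """True when the compiler and the headers are the same release."""
    return parse_protoc_version(protoc_output) == decode_header_version(header_value)


print(versions_match("libprotoc 3.4.0", 3004000))  # True
print(versions_match("libprotoc 2.6.1", 3004000))  # False
```

When the check fails, regenerating the .pb.h and .pb.cc files with the protoc that belongs to the installed headers is the usual remedy.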

I ran the ./mdlTest script but did not understand the output

As the title says.
After running the script, the terminal prints the following:
start running cycle : 0
load time : 70.449ms
total cost: 672.361ms.
89.6546 105.131 210.12 197.119
Done!
end running cycle : 0
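What do the numbers in bold mean? Can the output be visualized graphically?

When converting a Caffe model to MDL, what format does data need to be in?

As the title says,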
copy your model.prototxt and model.caffemodel to this path
also need the input data

./caffe2mdl model.prototxt model.caffemodel data

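Is data the annotations or the images?

Thanks.

I plan to implement conversion of TensorFlow models to the MDL format. Contributors are welcome!

caffe2mdl reports an error on a Caffe model trained with Faster R-CNN

As the title says.
The error message is as follows: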
[libprotobuf ERROR google/protobuf/text_format.cc:288] Error parsing text-format caffe.NetParameter: 743:24: Message type "caffe.LayerParameter" has no field named "smooth_l1_loss_param".
read_proto_from_text failed

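Thanks!

Could you write a short document explaining how to build the demo locally?

No bounding box after taking a photo?

I downloaded the Android code from the examples and imported it into Android Studio, where it runs. But after taking a photo, no bounding box appears. What is the cause?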

About TX1 hardware

Hi,
thank you for this work!
I have a question:
Is MDL optimized for the TX1 mobile hardware, or only for phones?

thanks a lot!

can it run on a linux with arm-v8a?

My platform is running linux on firefly-3399 and I don't see an option in the build.sh

Thanks

Can it recognize traffic signs?

TensorFlow is misspelled in the TODO section of the README

TODO

Android GPU implementation
Converting Tensoflow Model to MDL

build error

I ran the script './build.sh android'
and got the error: CMake Error: Could not create named generator Android Gradle - Unix Makefiles
Why is that?

Gemmer::sgemm does not fully match caffe_cpu_gemm<float>

Gemmer::sgemm does not fully match caffe_cpu_gemm.
When you test a model, you can see that the loss increases.
I then found that Gemmer::sgemm does not support transposed matrices.

Can Baidu fix this?
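
Reading the report as being about transposed operands (caffe_cpu_gemm takes TransA and TransB flags), the difference can be shown with a small reference implementation. This is a naive sketch for checking results on tiny matrices, not the MDL or Caffe code; the gemm function and its trans_a and trans_b parameters are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(row) for row in zip(*m)]


def gemm(a: Matrix, b: Matrix, trans_a: bool = False, trans_b: bool = False) -> Matrix:
    """Return op(a) @ op(b), where op transposes its argument when flagged."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(gemm(a, b))                # [[19.0, 22.0], [43.0, 50.0]]
print(gemm(a, b, trans_b=True))  # [[17.0, 23.0], [39.0, 53.0]]
```

A routine that ignores the transpose flags would return the first result in both cases, which is enough to change a model's outputs.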

Running "python convert.py" fails

Loading the Caffe model...
Traceback (most recent call last):
File "convert.py", line 12, in
import caffe_pb2
ImportError: bad magic number in 'caffe_pb2': b'\x03\xf3\r\n'

Do I need to install Caffe?
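
The bytes in this traceback, b'\x03\xf3\r\n', are the magic number that CPython 2.7 writes at the start of its .pyc files, which suggests that a caffe_pb2.pyc compiled by Python 2.7 is being imported by a Python 3 interpreter. The sketch below shows how such a header can be compared with the running interpreter's own magic number; pyc_matches_interpreter is a hypothetical helper name, and the printed results assume the sketch runs under Python 3.

```python
import importlib.util

PY27_MAGIC = b"\x03\xf3\r\n"  # magic number written by CPython 2.7


def pyc_matches_interpreter(header: bytes) -> bool:
    """True when the first four bytes equal this interpreter's magic number."""
    return header[:4] == importlib.util.MAGIC_NUMBER


# A header as found in the traceback above.
print(pyc_matches_interpreter(PY27_MAGIC))  # False under Python 3

# A header produced by the running interpreter (padding stands in for the
# remaining header fields).
print(pyc_matches_interpreter(importlib.util.MAGIC_NUMBER + b"\x00" * 12))  # True
```

If the check fails for a real file, deleting the stale .pyc and regenerating caffe_pb2.py with the interpreter in use is the usual way out.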

./build.sh android CMake Error: Could not create named generator Android Gradle - Unix Makefiles

I ran ./build.sh android to build for Android,
but got an error:
CMake Error: Could not create named generator Android Gradle - Unix Makefiles

Can anyone help solve this problem?

error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options. #error This file requires compiler and library support

I ran ./build.sh android to build for Android,
but got an error:
error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support

Can anyone help solve it?