wang-xinyu / tensorrtx
Implementation of popular deep learning networks with TensorRT network definition API
License: MIT License
Hi,
In order to run yolov4 with TensorRT in C++ at a different input dimension (I tried both 512 and 416) than the default 608, I made the following changes.
Do we have to write out every single layer by hand?
Hi, I encountered the following error while running yolo4:
```
Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)
ERROR: INVALID_STATE: Unknown exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
```
How to solve it?
Thanks in advance !
Hi, thanks for the excellent work in advance.
I followed the instructions for the latest yolov4 directory and got the .wts file successfully, but when I make the C++ code I get the following error.
/usr/bin/ld: warning: libcudart.so.10.2, needed by /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libnvinfer.so, may conflict with libcudart.so.10.0
/usr/bin/ld: warning: libzstd.so.1.3.7, needed by //home/student/anaconda3/lib/libtiff.so.5, not found (try using -rpath or -rpath-link)
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadDirectory@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_initCStream'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_freeCStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFWriteEncodedStrip@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFIsTiled@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_maxCLevel'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFOpen@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_createCStream'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_isError'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_getErrorName'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_endStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadEncodedStrip@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFSetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFWriteScanline@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_createDStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFGetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFScanlineSize@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_initDStream'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_freeDStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFWriteDirectory@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFSetWarningHandler@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFSetErrorHandler@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadEncodedTile@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_compressStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadRGBATile@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_decompressStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFClose@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFRGBAImageOK@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFClientOpen@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadRGBAStrip@LIBTIFF_4.0'
collect2: error: ld returned 1 exit status
CMakeFiles/yolov4.dir/build.make:117: recipe for target 'yolov4' failed
make[2]: *** [yolov4] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/yolov4.dir/all' failed
make[1]: *** [CMakeFiles/yolov4.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Any suggestions on how to fix this, or is it just because the directory is not cleaned up yet?
Thanks for the repo again
I used the generated yolov4 engine in an NVIDIA DeepStream application, but got an error:
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: deserializationUtils.cpp (567) - Serialization Error in load: 0 (Serialized engine contains plugin, but no plugin factory was provided. To deserialize an engine without a factory, please use IPluginV2 instead.)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_STATE: std::exception
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_CONFIG: Deserialize the cuda engine failed.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1452 Deserialize engine failed from file: /opt/nvidia/deepstream/deepstream-5.0/sources/YoloV4/yolov4.engine
Adding a plugin-factory implementation might work, but I want to know whether migrating to IPluginV2 is the better idea. By the way, I'm a TensorRT beginner, so any help is appreciated.
I installed TensorRT 7.0 from the tar package; the cmake output is as follows:
jinjicheng@ubuntu:~/project/object-detection/tensorrtx/yolov4/build$ cmake -DCMAKE_INCLUDE_PATH='/home/jinjicheng/ENV/tensorrt-7.0/include' ..
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /home/jinjicheng/ENV/cuda-10.0 (found version "10.0")
embed_platform off
-- Found OpenCV: /usr/local (found version "4.1.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jinjicheng/project/object-detection/tensorrtx/yolov4/build
Then running make produces the following error:
[100%] Linking CXX executable yolov4
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/yolov4.dir/build.make:144: recipe for target 'yolov4' failed
make[2]: *** [yolov4] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/yolov4.dir/all' failed
make[1]: *** [CMakeFiles/yolov4.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
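For the `cannot find -lnvinfer` failure above, the usual cause is that the linker was never told where the tar-installed TensorRT libraries live; `-DCMAKE_INCLUDE_PATH` only covers headers. A hedged sketch of the CMakeLists.txt additions, using this user's paths as an example (adjust to your own layout):

```cmake
# Point the build at a tar-installed TensorRT (paths are this user's example).
include_directories(/home/jinjicheng/ENV/tensorrt-7.0/include)
link_directories(/home/jinjicheng/ENV/tensorrt-7.0/lib)

# The target then links against the TensorRT runtime, e.g.:
# target_link_libraries(yolov4 nvinfer cudart)
```

Alternatively, adding the lib directory to LD_LIBRARY_PATH / ld.so.conf addresses the runtime side, but the link-time `-lnvinfer` lookup still needs `link_directories` (or a full library path) at configure time.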
project/resnet/resnext50_32x4d.cpp:2:10: fatal error: cuda_runtime_api.h: No such file or directory
#include "cuda_runtime_api.h"
^~~~~~~~~~~~~~~~~~~~
compilation terminated.
CMakeFiles/resnext50.dir/build.make:62: recipe for target 'CMakeFiles/resnext50.dir/resnext50_32x4d.cpp.o' failed
make[2]: *** [CMakeFiles/resnext50.dir/resnext50_32x4d.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/resnext50.dir/all' failed
make[1]: *** [CMakeFiles/resnext50.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
builder->setMaxWorkspaceSize(1 << 20);
I was not able to find many resources on how to change this to increase GPU memory utilization. Any help would be great.
Thank you very much for your work!
I have a few questions for you.
1. Can you recommend some introductory tutorials?
2. Do you have any good examples, for instance on GitHub?
So, I am using RetinaFace in a DeepStream 5.0 docker container, but the problem is that I cannot deserialize the engine file successfully. Below is the terminal output; if you can make any sense of it, please let me know:
(gst-plugin-scanner:26): GStreamer-WARNING **: 07:09:47.376: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_inferserver.so': libtrtserver.so: cannot open shared object file: No such file or directory
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
Now playing: ../../../samples/streams/sample_720p.mp4
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_ARGUMENT: getPluginCreator could not find plugin Decode_TRT version 1
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: safeDeserializationUtils.cpp (293) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_STATE: std::exception
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_CONFIG: Deserialize the cuda engine failed.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1452 Deserialize engine failed from file: /opt/nvidia/deepstream/deepstream-5.0/sources/apps/deepstream-retinaface-multistream/tensorrt_engines_awsT4/retina_r50.engine
0:00:04.315405593 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1566> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-5.0/sources/apps/deepstream-retinaface-multistream/tensorrt_engines_awsT4/retina_r50.engine failed
0:00:04.315444719 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1673> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-5.0/sources/apps/deepstream-retinaface-multistream/tensorrt_engines_awsT4/retina_r50.engine failed, try rebuild
0:00:04.315464105 25 0x5618a017b8d0 INFO nvinfer gstnvinfer.cpp:602:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1591> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:934 failed to build network since there is no model file matched.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:872 failed to build network.
0:00:04.315759096 25 0x5618a017b8d0 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1611> [UID = 1]: build engine file failed
0:00:04.315783396 25 0x5618a017b8d0 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1697> [UID = 1]: build backend context failed
0:00:04.315800708 25 0x5618a017b8d0 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1024> [UID = 1]: generate backend failed, check config file settings
0:00:04.315966590 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:781:gst_nvinfer_start:<primary-nvinference-engine> error: Failed to create NvDsInferContext instance
0:00:04.315979631 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:781:gst_nvinfer_start:<primary-nvinference-engine> error: Config file path: retinaface_pgie_config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Running...
ERROR from element primary-nvinference-engine: Failed to create NvDsInferContext instance
Error details: gstnvinfer.cpp(781): gst_nvinfer_start (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine:
Config file path: retinaface_pgie_config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Hi, thanks for this wonderful work. I was wondering if the compared target scores during the implementation of nms in yolov3-spp.cpp should be det_confidence * class_confidence?
HI,
First, thanks for your repo; there is a lot to learn here! :) I'm working on my Jetson Xavier. With other projects I've been able to reach 220 FPS with an SSD-MobileNet model optimized with TRT and quantized to FP16, but with almost the same process I only get around 22 FPS with YOLOv3. While looking for ways to make it faster I found your repo. I've built the wts file and run the ./yolov3 -s and -d commands, but I was wondering whether there is a way to measure the performance, and whether I could use it with a webcam, for example.
Regards! :) And thanks again for this good repo.
Hey, Thanks for the amazing work. I was wondering if Ultralytics yolov5 is on your list for a tensorrt implementation. I did not find any TRT impl for this anywhere.
Error message:
[05/27/2020-14:34:04] [E] [TRT] Parameter check failed at: ../builder/Network.cpp::addScale::434, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
yolov4: /media/mrlee/software/document/AI/project/traffic_scene/tensorrtx/yolov4/yolov4.cpp:209: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Hello, my yolov4 model was pruned: 95% of the parameters were removed and 12 shortcuts were cut, and then this problem appeared.
thanks for your great repo!!
Now I want to do inference on Xavier.
With the original batch_size of 1, the inference time is 24 ms;
if I change batch_size to 3, the inference time is about 67 ms.
This is not what I want. I want to trade memory for time: keep the inference time around 24 ms at batch_size 3, using 3x the GPU memory. How should I do this?
What is the role of the .cu files, and why are they compiled with CUDA? While going through the TensorRT documentation and samples I didn't find such a method. Could you please explain the plugin-creation approach?
[06/08/2020-15:42:10] [E] [TRT] Parameter check failed at: ../builder/Network.cpp::addScale::488, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
yolov4: /home/nvidia/data/xgy/test/tensorrtx/yolov4/yolov4.cpp:209: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Aborted
Why is yolov4 slower than yolov3-spp in this repo?
Is the NMS being done on the CPU rather than the GPU?
(Thanks for this really interesting and educational repo!)
Hi,
Thanks for the repo, I have learned a lot from it. I would like to run yolov3-spp with different input dimensions. I tried 608x608, but it didn't work out. I changed INPUT_H and INPUT_W both in yolov3-spp.cpp and YoloConfigs.h, and also made the anchor heights the same as the anchor widths. However, the resulting images contain a lot of incorrect boxes. I think I am doing something wrong; what should I do to make it work? Thanks in advance.
Hello, when I run ./yolov4 -s it hits a segmentation fault at IBuilderConfig* config = builder->createBuilderConfig(). What could be the cause?
I trained a model on my own dataset with the ultralytics yolov3 repo, converted the .pt to .weights, then converted it to .wts with the yolov3 code in your pytorchx repo, but I cannot generate the engine file.
beginning
Loading weights: ../yolov3.wts
0
len 32
1
len 64
2
len 32
3
len 64
5
len 128
6
len 64
7
len 128
9
len 64
10
len 128
12
len 256
13
len 128
14
len 256
16
len 128
17
len 256
19
len 128
20
len 256
22
len 128
23
len 256
25
len 128
26
len 256
28
len 128
29
len 256
31
len 128
32
len 256
34
len 128
35
len 256
37
len 512
38
len 256
39
len 512
41
len 256
42
len 512
44
len 256
45
len 512
47
len 256
48
len 512
50
len 256
51
len 512
53
len 256
54
len 512
56
len 256
57
len 512
59
len 256
60
len 512
62
len 1024
63
len 512
64
len 0
ERROR: Parameter check failed at: ../builder/Network.cpp::addScale::164, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
yolov3: /home/zxzn/tensorrtx/yolov3/yolov3.cpp:197: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string<char>, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Aborted
Hello, I think this repo is great! I am curious: is there any plan to create a Python wrapper to call all your TensorRT models and perform inference with?
I have lots of code in my python, and I would love to test your models with my python libraries to give you live benchmarks on all the jetson devices :)
Thank you again!
Do all models support int8 mode, such as yolov4?
I have benefited a lot from your code, thank you very much. When you have time, could you add acceleration for a keypoint model, such as hand-pose keypoints?
@wang-xinyu Hi, I have two questions.
1. Testing on a Jetson AGX, the yolov4 engine generated here runs at about 20 fps, while another author's .weights -> onnx -> engine pipeline reaches about 33 fps, both with 416x416 input and FP16 engines. What could cause this difference?
2. Also, the retinaface engine takes 230 ms per image in my test, while running the .pth model directly in PyTorch takes only about 40 ms per image. Where could the problem be?
Hello!
Thank you for the wonderful work! I would like to contribute to this project.
I noticed something strange with yolov3-spp.cfg. I used the original PyTorch repo to train a model, and at normal Python inference time I get ~33 FPS on a 512x512 image.
So I used your repo to convert the weights and run with TensorRT, but the FPS is still pretty much the same.
Hi,
I am unable to create a TensorRT engine using the default parameters for the single-class config yolov3-spp-1cls.cfg. I have also tried different heights and widths but still cannot produce an engine; it fails at the serialization step. Any help would be appreciated.
Thanks in Advance
fatal error: NvInfer.h: No such file or directory
#include "NvInfer.h"
^~~~~~~~~~~
compilation terminated.
CMake Error at myplugins_generated_mish.cu.o.Debug.cmake:219 (message):
Error generating
/media/tensorrtx/yolov4/build/CMakeFiles/myplugins.dir//./myplugins_generated_mish.cu.o
CMakeFiles/myplugins.dir/build.make:70: recipe for target 'CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o' failed
make[2]: *** [CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Really great project! Can it be made to load the engine model only once, stay resident in the background, and run prediction whenever an image comes in, waiting otherwise? I measured the timing: loading the model takes about 7 s, while predicting one image takes only 8 ms...
```
Model load time: 7645 ms
Image no. 1, filename: zidane.jpg
Image preprocessing time: 109 ms
Time compute: 8 ms
Image no. 2, filename: bus.jpg
Image preprocessing time: 48 ms
Inference time after model load: 8 ms
247.561, 168.353, 32.9542, 46.0337, 0.794903, 0,
```
I'm not very familiar with C++; looking forward to your reply.
Hello, I want to run a mobilenetv1 model in TensorRT. Have you ever written an engine-building function for mobilenetv1? Thanks :)
Hello, I followed the steps below:
git clone https://github.com/wang-xinyu/tensorrtx.git
git clone https://github.com/ultralytics/yolov3.git
// download its weights 'yolov3-spp-ultralytics.pt'
cd yolov3
cp ../tensorrtx/yolov3-spp/gen_wts.py .
python gen_wts.py yolov3-spp-ultralytics.pt
// a file 'yolov3-spp_ultralytics68.wts' will be generated.
// the master branch of yolov3 should work, if not, you can checkout 4ac60018f6e6c1e24b496485f126a660d9c793d8
ICudaEngine* engine = createEngine(1, builder, DataType::kFLOAT);
assert(engine != nullptr);
// Serialize the engine
IHostMemory* modelStream = engine->serialize();
assert(modelStream != nullptr);
std::ofstream p("yolov3-spp.engine");
if (!p) {
std::cerr << "could not open plan output file" << std::endl;
return -1;
}
printf("modelStream->size():%d\n", modelStream->size());
p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size());
I found that the generated yolov3.engine has a different size each time it is generated.
After generating yolov3.engine, when invoking it with sudo ./yolov3-spp -d ../samples,
the yolov3.engine file is loaded:
std::ifstream file("yolov3-spp.engine", std::ios::binary);
if (file.good()) {
file.seekg(0, file.end);
size = file.tellg();
file.seekg(0, file.beg);
trtModelStream = new char[size];
assert(trtModelStream);
file.read(trtModelStream, size);
file.close();
}
It always fails here at ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size, &pf); as shown below:
ERROR: C:\source\rtSafe\coreReadArchive.cpp (55) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)
ERROR: INVALID_STATE: Unknown exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
This problem occurs on Windows but does not appear on Linux. How can I solve it? Thanks!
Hi Wang,
I tried to convert yolov4 on a Jetson NX but hit the issue below.
Could you help check it?
//-------------------------------------------------------------
[ 66%] Building CXX object CMakeFiles/yolov4.dir/plugin_factory.cpp.o
In file included from /home/ashing/tensorrtx/yolov4/plugin_factory.cpp:1:0:
/home/ashing/tensorrtx/yolov4/common.h: In function ‘unsigned int samples_common::getElementSize(nvinfer1::DataType)’:
/home/ashing/tensorrtx/yolov4/common.h:271:12: warning: enumeration value ‘kBOOL’ not handled in switch [-Wswitch]
switch (t)
^
/home/ashing/tensorrtx/yolov4/plugin_factory.cpp: In member function ‘virtual nvinfer1::IPlugin* nvinfer1::PluginFactory::createPlugin(const char*, const void*, size_t)’:
/home/ashing/tensorrtx/yolov4/plugin_factory.cpp:13:26: error: ‘createPReLUPlugin’ is not a member of ‘nvinfer1::plugin’
plugin = plugin::createPReLUPlugin(serialData, serialLength);
^~~~~~~~~~~~~~~~~
compilation terminated due to -Wfatal-errors.
Hello, do you plan to implement the mobilenet-0.25 version of RetinaFace face detection? Thanks!
Firstly, thanks for the conversion.
sudo ./yolov4 -d
just draws the bounding boxes but not the class labels.
How do I get the class of each detected object, either on the bounding box or in the terminal?
pengzhao@pengzhao:~/tensorrtx/yolov3/build$ make
[ 25%] Building NVCC (Device) object CMakeFiles/yololayer.dir/yololayer_generated_yololayer.cu.o
In file included from /home/pengzhao/tensorrtx/yolov3/yololayer.cu:1:0:
/home/pengzhao/tensorrtx/yolov3/yololayer.h:8:21: fatal error: NvInfer.h: No such file or directory
compilation terminated.
CMake Error at yololayer_generated_yololayer.cu.o.Debug.cmake:219 (message):
Error generating
/home/pengzhao/tensorrtx/yolov3/build/CMakeFiles/yololayer.dir//./yololayer_generated_yololayer.cu.o
CMakeFiles/yololayer.dir/build.make:63: recipe for target 'CMakeFiles/yololayer.dir/yololayer_generated_yololayer.cu.o' failed
make[2]: *** [CMakeFiles/yololayer.dir/yololayer_generated_yololayer.cu.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/yololayer.dir/all' failed
make[1]: *** [CMakeFiles/yololayer.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
I modified the yolov3-spp code to read from video streams and run inference, and now I am trying to parallelize it. Mainly, I am trying to share the IRuntime, ICudaEngine and IExecutionContext with each spawned thread; I use pthreads to launch new threads that call IExecutionContext in the doInference() routine. On my 2 GB GTX 1050 I got a CUDA out-of-memory error, probably because each IExecutionContext refers to the same network builder object pointing at the serialized network, which takes almost 1.4 GB of GPU memory. On a Tesla K80 with 11 GB I successfully launched two threads calling the doInference routine on a shared ICudaEngine object, but the latency doubled: about 30 ms with one thread, around 60 ms with two. How can I optimize the code to leverage multi-threading for inferring multiple video streams in parallel at roughly the same latency? Thanks.
Thanks for your work. I would like to propose some improvements that might help with multi-stream or multi-processor implementation.
First, pass the CUDA stream argument when launching CUDA kernels. For example, the mish kernel launched from doInference() (line 141 at commit ff364db) would be better written as mish_kernel<<<grid_size, block_size, 0, stream>>>(inputs[0], output, input_size_ * batchSize); and the same applies to the CalDetection kernel in the YOLO layers.
Second, copy the anchor information when initializing the YOLO plugin instead of doing a cudaMemcpy on every detection call (lines 181 to 204 at commit ff364db). cudaFree() is a synchronizing function that blocks all other work on the GPU, so it is better to place the anchor information on the device beforehand and avoid that call.
Third, use asynchronous functions instead of synchronous ones: in the code above, cudaMemset() could change to cudaMemsetAsync(), and cudaMemcpy() to cudaMemcpyAsync().
These suggestions may not be noticeable when running inference on a single GPU with one thread, but they would help a lot in multi-stream or multi-device situations, and they are more consistent with the logic of parallel computing.
Hope it helps!
Hi,
Thank you for your excellent work; it has really helped me understand how to work with TensorRT.
I was trying to re-create the RetinaFace model with different dimensions, since I noticed there is no stated limit on how big or small the image has to be. I tried changing INPUT_H and INPUT_W in the decode.h file; however, when serializing the model it throws a dimension-mismatch error.
ERROR: (Unnamed Layer* 182) [ElementWise]: elementwise inputs must have same dimensions or follow broadcast rules (input dimensions were [256,53,53] and [256,54,54]).
Can you please guide me on what I can do to solve this?
Hi, thanks a lot for your excellent work. I've been able to reproduce this work on Ubuntu 16.04, but I wonder what I have to do to reproduce this repo on Windows?
[ 20%] Building NVCC (Device) object CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o
In file included from /opt/code/YOLO/tensorrtx/yolov4/mish.cu:5:0:
/opt/code/YOLO/tensorrtx/yolov4/mish.h:6:10: fatal error: NvInfer.h: No such file or directory
#include "NvInfer.h"
^~~~~~~~~~~
compilation terminated.
CMake Error at myplugins_generated_mish.cu.o.Debug.cmake:219 (message):
Error generating
/opt/code/YOLO/tensorrtx/yolov4/build/CMakeFiles/myplugins.dir//./myplugins_generated_mish.cu.o
CMakeFiles/myplugins.dir/build.make:70: recipe for target 'CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o' failed
make[2]: *** [CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Sorry to bother you again, but I just want to ask: have you compared the time of the converted yolov4 engine against the original yolov4 darknet repo?
On my side, with a GeForce RTX 2080 Ti,
the converted engine takes 38 ms per image on the two test jpgs you provided.
However, when I test the same image with the original darknet using the following simple command:
./darknet detector test ./cfg/coco.data ./cfg/yolov4.cfg ./yolov4.weights
The time per same image is only 21ms.
And to rule out the possibility that the original repo only measures the model forward time, I also ran an FPS test on my own video in the original darknet repo, with the following commands:
include video_capturing + NMS + drawing_bboxes: ./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -dont_show -ext_output
exclude video_capturing + NMS + drawing_bboxes: ./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -benchmark
Both of the above commands are faster than the 38 ms measured here with the converted engine.
Hence, I just want to ask: have you compared the speed yourself, and how much speedup does TensorRT give in practice?
Thanks for the repo again
Thank you for this project. I have a question about yolov3.cpp: why is the detection size 7 rather than 6, such as x, y, w, h, score, class label? Thank you.
Traceback (most recent call last):
File "gen_wts.py", line 6, in
model = Darknet('cfg/yolov4.cfg', (608, 608))
File "/home/topsci/workspace/yolov3/models.py", line 225, in init
self.module_defs = parse_model_cfg(cfg)
File "/home/topsci/workspace/yolov3/utils/parse_config.py", line 49, in parse_model_cfg
assert not any(u), "Unsupported fields %s in %s. See ultralytics/yolov3#631" % (u, path)
AssertionError: Unsupported fields ['stopbackward', 'max_delta'] in cfg/yolov4.cfg. See ultralytics/yolov3#631
In file included from /workspace/tensorrtyolo/tensorrtx/yolov3-spp/common.h:5:0,
from /workspace/tensorrtyolo/tensorrtx/yolov3-spp/plugin_factory.cpp:1:
/usr/local/include/NvOnnxParser.h:27:10: fatal error: NvOnnxParserTypedefs.h: No such file or directory
#include "NvOnnxParserTypedefs.h"
^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
CMakeFiles/yolov3-spp.dir/build.make:62: recipe for target 'CMakeFiles/yolov3-spp.dir/plugin_factory.cpp.o' failed
make[2]: *** [CMakeFiles/yolov3-spp.dir/plugin_factory.cpp.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/yolov3-spp.dir/all' failed
make[1]: *** [CMakeFiles/yolov3-spp.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
/.
/usr
/usr/include
/usr/include/x86_64-linux-gnu
/usr/include/x86_64-linux-gnu/NvInfer.h
/usr/include/x86_64-linux-gnu/NvInferRuntime.h
/usr/include/x86_64-linux-gnu/NvInferRuntimeCommon.h
/usr/include/x86_64-linux-gnu/NvInferVersion.h
/usr/include/x86_64-linux-gnu/NvUtils.h
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/libnvinfer_static.a
/usr/share
/usr/share/doc
/usr/share/doc/libnvinfer-dev
/usr/share/doc/libnvinfer-dev/changelog.Debian
/usr/share/doc/libnvinfer-dev/copyright
/usr/lib/x86_64-linux-gnu/libnvinfer.so
Hello, thanks a lot for sharing; this is a great reference. I have read two or three of the example codes and noticed that they all build the network by constructing each layer with the API. Will you later support building the engine directly from a model file?