wang-xinyu / tensorrtx
Implementation of popular deep learning networks with TensorRT network definition API
License: MIT License
Hi,
In order to run yolov4 with TensorRT in C++ at a different input dimension (I tried both 512 and 416) than the default 608, I made the following changes.
Do we have to write out every single layer by hand?
Hi, I encountered the following error while running yolo4:
```
Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)
ERROR: INVALID_STATE: Unknown exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
```
How to solve it?
Thanks in advance !
Hi, thanks for the excellent work in advance.
I followed the instructions for the latest yolov4 directory and got the .wts file successfully, but when I make the C++ code I get the following error.
/usr/bin/ld: warning: libcudart.so.10.2, needed by /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libnvinfer.so, may conflict with libcudart.so.10.0
/usr/bin/ld: warning: libzstd.so.1.3.7, needed by //home/student/anaconda3/lib/libtiff.so.5, not found (try using -rpath or -rpath-link)
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadDirectory@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_initCStream'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_freeCStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFWriteEncodedStrip@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFIsTiled@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_maxCLevel'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFOpen@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_createCStream'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_isError'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_getErrorName'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_endStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadEncodedStrip@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFSetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFWriteScanline@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_createDStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFGetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFScanlineSize@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_initDStream'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_freeDStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFWriteDirectory@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFSetWarningHandler@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFSetErrorHandler@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadEncodedTile@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_compressStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadRGBATile@LIBTIFF_4.0'
//home/student/anaconda3/lib/libtiff.so.5: undefined reference to `ZSTD_decompressStream'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFClose@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFRGBAImageOK@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFClientOpen@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so.3.4.9: undefined reference to `TIFFReadRGBAStrip@LIBTIFF_4.0'
collect2: error: ld returned 1 exit status
CMakeFiles/yolov4.dir/build.make:117: recipe for target 'yolov4' failed
make[2]: *** [yolov4] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/yolov4.dir/all' failed
make[1]: *** [CMakeFiles/yolov4.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Any suggestions on how to fix this, or is it just because the directory is not cleaned up yet?
Thanks for the repo again
I used the generated yolov4 engine in an NVIDIA DeepStream application, but got an error:
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: deserializationUtils.cpp (567) - Serialization Error in load: 0 (Serialized engine contains plugin, but no plugin factory was provided. To deserialize an engine without a factory, please use IPluginV2 instead.)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_STATE: std::exception
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_CONFIG: Deserialize the cuda engine failed.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1452 Deserialize engine failed from file: /opt/nvidia/deepstream/deepstream-5.0/sources/YoloV4/yolov4.engine
Adding a plugin-factory implementation might work, but I want to know whether migrating to IPluginV2 is the better idea. By the way, I'm a TensorRT beginner, so any help is appreciated.
I installed TensorRT 7.0 from the tar package; the cmake output is as follows:
jinjicheng@ubuntu:~/project/object-detection/tensorrtx/yolov4/build$ cmake -DCMAKE_INCLUDE_PATH='/home/jinjicheng/ENV/tensorrt-7.0/include' ..
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /home/jinjicheng/ENV/cuda-10.0 (found version "10.0")
embed_platform off
-- Found OpenCV: /usr/local (found version "4.1.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jinjicheng/project/object-detection/tensorrtx/yolov4/build
Then running make produces the following error:
[100%] Linking CXX executable yolov4
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/yolov4.dir/build.make:144: recipe for target 'yolov4' failed
make[2]: *** [yolov4] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/yolov4.dir/all' failed
make[1]: *** [CMakeFiles/yolov4.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
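For the `cannot find -lnvinfer` failure above, the usual cause is that the linker was never told where the tar-installed TensorRT libraries live; `-DCMAKE_INCLUDE_PATH` only covers headers. A hedged sketch of the CMakeLists.txt additions, using this user's paths as an example (adjust to your own layout):

```cmake
# Point the build at a tar-installed TensorRT (paths are this user's example).
include_directories(/home/jinjicheng/ENV/tensorrt-7.0/include)
link_directories(/home/jinjicheng/ENV/tensorrt-7.0/lib)

# The target then links against the TensorRT runtime, e.g.:
# target_link_libraries(yolov4 nvinfer cudart)
```

Alternatively, adding the lib directory to LD_LIBRARY_PATH / ld.so.conf addresses the runtime side, but the link-time `-lnvinfer` lookup still needs `link_directories` (or a full library path) at configure time.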
project/resnet/resnext50_32x4d.cpp:2:10: fatal error: cuda_runtime_api.h: No such file or directory
#include "cuda_runtime_api.h"
^~~~~~~~~~~~~~~~~~~~
compilation terminated.
CMakeFiles/resnext50.dir/build.make:62: recipe for target 'CMakeFiles/resnext50.dir/resnext50_32x4d.cpp.o' failed
make[2]: *** [CMakeFiles/resnext50.dir/resnext50_32x4d.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/resnext50.dir/all' failed
make[1]: *** [CMakeFiles/resnext50.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
builder->setMaxWorkspaceSize(1 << 20);
I was not able to find many resources on how to change this to increase GPU memory utilization. Any help would be great.
Thank you very much for your work!
I have a few questions for you.
1. Can you recommend some introductory tutorials?
2. Do you have any good examples, for instance on GitHub?
So, I am using RetinaFace in a DeepStream 5.0 docker container, but the problem is that I cannot deserialize the engine file successfully. Below is the terminal output; if you can make any sense of it, please let me know:
(gst-plugin-scanner:26): GStreamer-WARNING **: 07:09:47.376: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_inferserver.so': libtrtserver.so: cannot open shared object file: No such file or directory
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
Now playing: ../../../samples/streams/sample_720p.mp4
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_ARGUMENT: getPluginCreator could not find plugin Decode_TRT version 1
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: safeDeserializationUtils.cpp (293) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_STATE: std::exception
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_CONFIG: Deserialize the cuda engine failed.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1452 Deserialize engine failed from file: /opt/nvidia/deepstream/deepstream-5.0/sources/apps/deepstream-retinaface-multistream/tensorrt_engines_awsT4/retina_r50.engine
0:00:04.315405593 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1566> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-5.0/sources/apps/deepstream-retinaface-multistream/tensorrt_engines_awsT4/retina_r50.engine failed
0:00:04.315444719 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1673> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-5.0/sources/apps/deepstream-retinaface-multistream/tensorrt_engines_awsT4/retina_r50.engine failed, try rebuild
0:00:04.315464105 25 0x5618a017b8d0 INFO nvinfer gstnvinfer.cpp:602:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1591> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:934 failed to build network since there is no model file matched.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:872 failed to build network.
0:00:04.315759096 25 0x5618a017b8d0 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1611> [UID = 1]: build engine file failed
0:00:04.315783396 25 0x5618a017b8d0 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1697> [UID = 1]: build backend context failed
0:00:04.315800708 25 0x5618a017b8d0 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1024> [UID = 1]: generate backend failed, check config file settings
0:00:04.315966590 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:781:gst_nvinfer_start:<primary-nvinference-engine> error: Failed to create NvDsInferContext instance
0:00:04.315979631 25 0x5618a017b8d0 WARN nvinfer gstnvinfer.cpp:781:gst_nvinfer_start:<primary-nvinference-engine> error: Config file path: retinaface_pgie_config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Running...
ERROR from element primary-nvinference-engine: Failed to create NvDsInferContext instance
Error details: gstnvinfer.cpp(781): gst_nvinfer_start (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine:
Config file path: retinaface_pgie_config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Hi, thanks for this wonderful work. I was wondering if the compared target scores during the implementation of nms in yolov3-spp.cpp should be det_confidence * class_confidence?
HI,
First, thanks for your repo; there is a lot to learn here! :) I'm working on my Jetson Xavier. With other projects I've been able to reach 220 FPS with an SSD-MobileNet model optimized with TRT and quantized to FP16, but with almost the same process I only get around 22 FPS with YOLOv3. While looking for ways to make it faster I found your repo. I've built the wts file and run the ./yolov3 -s and -d commands, but I was wondering whether there is a way to measure the performance, and whether I could use it with a webcam, for example.
Regards! :) And thanks again for this good repo.
Hey, Thanks for the amazing work. I was wondering if Ultralytics yolov5 is on your list for a tensorrt implementation. I did not find any TRT impl for this anywhere.
Error message:
[05/27/2020-14:34:04] [E] [TRT] Parameter check failed at: ../builder/Network.cpp::addScale::434, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
yolov4: /media/mrlee/software/document/AI/project/traffic_scene/tensorrtx/yolov4/yolov4.cpp:209: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Hello, my yolov4 model was pruned: 95% of the parameters were removed and 12 shortcuts were cut, and then this problem appeared.
thanks for your great repo!!
Now I want to do inference on Xavier.
With the original batch_size of 1, the inference time is 24 ms;
if I change batch_size to 3, the inference time is about 67 ms.
This is not what I want. I want to trade memory for time: keep the inference time around 24 ms at batch_size 3, using 3x the GPU memory. How should I do this?
What is the role of the .cu files, and why are they compiled with CUDA? While going through the TensorRT documentation and samples I didn't find such a method. Could you please explain the plugin-creation approach?
[06/08/2020-15:42:10] [E] [TRT] Parameter check failed at: ../builder/Network.cpp::addScale::488, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
yolov4: /home/nvidia/data/xgy/test/tensorrtx/yolov4/yolov4.cpp:209: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Aborted
Why is yolov4 slower than yolov3-spp in this repo?
Is the NMS being done on the CPU rather than the GPU?
(Thanks for this really interesting and educational repo!)
Hi,
Thanks for the repo, I have learned a lot from it. I would like to run yolov3-spp with different input dimensions. I tried 608x608, but it didn't work out. I changed INPUT_H and INPUT_W both in yolov3-spp.cpp and YoloConfigs.h, and also made the anchor heights the same as the anchor widths. However, the resulting images contain a lot of incorrect boxes. I think I am doing something wrong; what should I do to make it work? Thanks in advance.
Hello, when I run ./yolov4 -s it hits a segmentation fault at IBuilderConfig* config = builder->createBuilderConfig(). What could be the cause?
I trained a model on my own dataset with the ultralytics yolov3 repo, converted the .pt to .weights, then converted it to .wts with the yolov3 code in your pytorchx repo, but I cannot generate the engine file.
beginning
Loading weights: ../yolov3.wts
0
len 32
1
len 64
2
len 32
3
len 64
5
len 128
6
len 64
7
len 128
9
len 64
10
len 128
12
len 256
13
len 128
14
len 256
16
len 128
17
len 256
19
len 128
20
len 256
22
len 128
23
len 256
25
len 128
26
len 256
28
len 128
29
len 256
31
len 128
32
len 256
34
len 128
35
len 256
37
len 512
38
len 256
39
len 512
41
len 256
42
len 512
44
len 256
45
len 512
47
len 256
48
len 512
50
len 256
51
len 512
53
len 256
54
len 512
56
len 256
57
len 512
59
len 256
60
len 512
62
len 1024
63
len 512
64
len 0
ERROR: Parameter check failed at: ../builder/Network.cpp::addScale::164, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
yolov3: /home/zxzn/tensorrtx/yolov3/yolov3.cpp:197: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string<char>, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Aborted
Hello, I think this repo is great! I am curious: is there any plan to create a Python wrapper to call all your TensorRT models and perform inference with?
I have lots of code in my python, and I would love to test your models with my python libraries to give you live benchmarks on all the jetson devices :)
Thank you again!
Do all models support int8 mode, such as yolov4?
I have benefited a lot from your code, thank you very much. When you have time, could you add acceleration for a keypoint model, such as hand-pose keypoints?
@wang-xinyu Hi, I have two questions.
1. Testing on a Jetson AGX, the yolov4 engine generated here runs at about 20 fps, while another author's .weights -> onnx -> engine pipeline reaches about 33 fps, both with 416x416 input and FP16 engines. What could cause this difference?
2. Also, the retinaface engine takes 230 ms per image in my test, while running the .pth model directly in PyTorch takes only about 40 ms per image. Where could the problem be?
Hello!
Thank you for the wonderful work! I would like to contribute to this project.
I noticed something strange with yolov3-spp.cfg. I used the original PyTorch repo to train a model, and at normal Python inference time I get ~33 FPS on a 512x512 image.
So I used your repo to convert the weights and run with TensorRT, but the FPS is still pretty much the same.
Hi,
I am unable to create a TensorRT engine using the default parameters for the single-class config yolov3-spp-1cls.cfg. I have also tried different heights and widths but still cannot produce an engine; it fails at the serialization step. Any help would be appreciated.
Thanks in Advance
fatal error: NvInfer.h: No such file or directory
#include "NvInfer.h"
^~~~~~~~~~~
compilation terminated.
CMake Error at myplugins_generated_mish.cu.o.Debug.cmake:219 (message):
Error generating
/media/tensorrtx/yolov4/build/CMakeFiles/myplugins.dir//./myplugins_generated_mish.cu.o
CMakeFiles/myplugins.dir/build.make:70: recipe for target 'CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o' failed
make[2]: *** [CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Really great project! Can it be made to load the engine model only once, stay resident in the background, and run prediction whenever an image comes in, waiting otherwise? I measured the timing: loading the model takes about 7 s, while predicting one image takes only 8 ms...
```
Model load time: 7645 ms
Image no. 1, filename: zidane.jpg
Image preprocessing time: 109 ms
Time compute: 8 ms
Image no. 2, filename: bus.jpg
Image preprocessing time: 48 ms
Inference time after model load: 8 ms
247.561, 168.353, 32.9542, 46.0337, 0.794903, 0,
```
I'm not very familiar with C++; looking forward to your reply.
Hello, I want to run a mobilenetv1 model in TensorRT. Have you ever written an engine-building function for mobilenetv1? Thanks :)
Hello, I followed the steps below:
git clone https://github.com/wang-xinyu/tensorrtx.git
git clone https://github.com/ultralytics/yolov3.git
// download its weights 'yolov3-spp-ultralytics.pt'
cd yolov3
cp ../tensorrtx/yolov3-spp/gen_wts.py .
python gen_wts.py yolov3-spp-ultralytics.pt
// a file 'yolov3-spp_ultralytics68.wts' will be generated.
// the master branch of yolov3 should work, if not, you can checkout 4ac60018f6e6c1e24b496485f126a660d9c793d8
ICudaEngine* engine = createEngine(1, builder, DataType::kFLOAT);
assert(engine != nullptr);
// Serialize the engine
IHostMemory* modelStream = engine->serialize();
assert(modelStream != nullptr);
std::ofstream p("yolov3-spp.engine");
if (!p) {
std::cerr << "could not open plan output file" << std::endl;
return -1;
}
printf("modelStream->size():%d\n", modelStream->size());
p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size());
I found that the generated yolov3.engine has a different size each time it is generated.
After generating yolov3.engine, when invoking it with sudo ./yolov3-spp -d ../samples,
the yolov3.engine file is loaded:
std::ifstream file("yolov3-spp.engine", std::ios::binary);
if (file.good()) {
file.seekg(0, file.end);
size = file.tellg();
file.seekg(0, file.beg);
trtModelStream = new char[size];
assert(trtModelStream);
file.read(trtModelStream, size);
file.close();
}
It always fails here at ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size, &pf); as shown below:
ERROR: C:\source\rtSafe\coreReadArchive.cpp (55) - Serialization Error in nvinfer1::rt::CoreReadArchive::verifyHeader: 0 (Length in header does not match remaining archive length)
ERROR: INVALID_STATE: Unknown exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
This problem occurs on Windows but does not appear on Linux. How can I solve it? Thanks!
Hi Wang,
I tried to convert yolov4 on a Jetson NX but hit the issue below.
Could you help check it?
//-------------------------------------------------------------
[ 66%] Building CXX object CMakeFiles/yolov4.dir/plugin_factory.cpp.o
In file included from /home/ashing/tensorrtx/yolov4/plugin_factory.cpp:1:0:
/home/ashing/tensorrtx/yolov4/common.h: In function ‘unsigned int samples_common::getElementSize(nvinfer1::DataType)’:
/home/ashing/tensorrtx/yolov4/common.h:271:12: warning: enumeration value ‘kBOOL’ not handled in switch [-Wswitch]
switch (t)
^
/home/ashing/tensorrtx/yolov4/plugin_factory.cpp: In member function ‘virtual nvinfer1::IPlugin* nvinfer1::PluginFactory::createPlugin(const char*, const void*, size_t)’:
/home/ashing/tensorrtx/yolov4/plugin_factory.cpp:13:26: error: ‘createPReLUPlugin’ is not a member of ‘nvinfer1::plugin’
plugin = plugin::createPReLUPlugin(serialData, serialLength);
^~~~~~~~~~~~~~~~~
compilation terminated due to -Wfatal-errors.
Hello, do you plan to implement the mobilenet-0.25 version of RetinaFace face detection? Thanks!
Firstly, thanks for the conversion.
sudo ./yolov4 -d
just draws the bounding boxes but not the class labels.
How do I get the class of each detected object, either on the bounding box or in the terminal?
pengzhao@pengzhao:~/tensorrtx/yolov3/build$ make
[ 25%] Building NVCC (Device) object CMakeFiles/yololayer.dir/yololayer_generated_yololayer.cu.o
In file included from /home/pengzhao/tensorrtx/yolov3/yololayer.cu:1:0:
/home/pengzhao/tensorrtx/yolov3/yololayer.h:8:21: fatal error: NvInfer.h: No such file or directory
compilation terminated.
CMake Error at yololayer_generated_yololayer.cu.o.Debug.cmake:219 (message):
Error generating
/home/pengzhao/tensorrtx/yolov3/build/CMakeFiles/yololayer.dir//./yololayer_generated_yololayer.cu.o
CMakeFiles/yololayer.dir/build.make:63: recipe for target 'CMakeFiles/yololayer.dir/yololayer_generated_yololayer.cu.o' failed
make[2]: *** [CMakeFiles/yololayer.dir/yololayer_generated_yololayer.cu.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/yololayer.dir/all' failed
make[1]: *** [CMakeFiles/yololayer.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
I modified the yolov3-spp code to read from video streams and run inference, and now I am trying to parallelize it. Mainly, I am trying to share the IRuntime, ICudaEngine and IExecutionContext with each spawned thread; I use pthreads to launch new threads that call IExecutionContext in the doInference() routine. On my 2 GB GTX 1050 I got a CUDA out-of-memory error, probably because each IExecutionContext refers to the same network builder object pointing at the serialized network, which takes almost 1.4 GB of GPU memory. On a Tesla K80 with 11 GB I successfully launched two threads calling the doInference routine on a shared ICudaEngine object, but the latency doubled: about 30 ms with one thread, around 60 ms with two. How can I optimize the code to leverage multi-threading for inferring multiple video streams in parallel at roughly the same latency? Thanks.
Thanks for your work. I would like to propose some improvements that might help with multi-stream or multi-processor implementation.
First, pass the CUDA stream argument when launching CUDA kernels. For example, the mish kernel launched from doInference() (line 141 at commit ff364db) would be better written as mish_kernel<<<grid_size, block_size, 0, stream>>>(inputs[0], output, input_size_ * batchSize); and the same applies to the CalDetection kernel in the YOLO layers.
Second, copy the anchor information when initializing the YOLO plugin instead of doing a cudaMemcpy on every detection call (lines 181 to 204 at commit ff364db). cudaFree() is a synchronizing function that blocks all other work on the GPU, so it is better to place the anchor information on the device beforehand and avoid that call.
Third, use asynchronous functions instead of synchronous ones: in the code above, cudaMemset() could change to cudaMemsetAsync(), and cudaMemcpy() to cudaMemcpyAsync().
These suggestions may not be noticeable when running inference on a single GPU with one thread, but they would help a lot in multi-stream or multi-device situations, and they are more consistent with the logic of parallel computing.
Hope it helps!
Hi,
Thank you for your excellent work; it has really helped me understand how to work with TensorRT.
I was trying to re-create the RetinaFace model with different dimensions, since I noticed there is no stated limit on how big or small the image has to be. I tried changing INPUT_H and INPUT_W in the decode.h file; however, when serializing the model it throws a dimension-mismatch error.
ERROR: (Unnamed Layer* 182) [ElementWise]: elementwise inputs must have same dimensions or follow broadcast rules (input dimensions were [256,53,53] and [256,54,54]).
Can you please guide me on what I can do to solve this?
Hi, thanks a lot for your excellent work. I've been able to reproduce this work on Ubuntu 16.04, but I wonder what I have to do to reproduce this repo on Windows?
[ 20%] Building NVCC (Device) object CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o
In file included from /opt/code/YOLO/tensorrtx/yolov4/mish.cu:5:0:
/opt/code/YOLO/tensorrtx/yolov4/mish.h:6:10: fatal error: NvInfer.h: No such file or directory
#include "NvInfer.h"
^~~~~~~~~~~
compilation terminated.
CMake Error at myplugins_generated_mish.cu.o.Debug.cmake:219 (message):
Error generating
/opt/code/YOLO/tensorrtx/yolov4/build/CMakeFiles/myplugins.dir//./myplugins_generated_mish.cu.o
CMakeFiles/myplugins.dir/build.make:70: recipe for target 'CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o' failed
make[2]: *** [CMakeFiles/myplugins.dir/myplugins_generated_mish.cu.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Sorry to bother you again, but I just want to ask: have you compared the time of the converted yolov4 engine against the original yolov4 darknet repo?
On my side, with a GeForce RTX 2080 Ti,
the converted engine takes 38 ms per image on the two test jpgs you provided.
However, when I test the same image with the original darknet using the following simple command:
./darknet detector test ./cfg/coco.data ./cfg/yolov4.cfg ./yolov4.weights
The time per same image is only 21ms.
And to rule out the possibility that the original repo only measures the model forward time, I also ran an FPS test on my own video in the original darknet repo, with the following commands:
include video_capturing + NMS + drawing_bboxes: ./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -dont_show -ext_output
exclude video_capturing + NMS + drawing_bboxes: ./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -benchmark
Both of the above commands are faster than the 38 ms measured here with the converted engine.
Hence, I just want to ask: have you compared the speed yourself, and how much speedup does TensorRT give in practice?
Thanks for the repo again
Thank you for this project. I have a question about yolov3.cpp: why is the detection size 7 rather than 6, such as x, y, w, h, score, class label? Thank you.
Traceback (most recent call last):
File "gen_wts.py", line 6, in
model = Darknet('cfg/yolov4.cfg', (608, 608))
File "/home/topsci/workspace/yolov3/models.py", line 225, in init
self.module_defs = parse_model_cfg(cfg)
File "/home/topsci/workspace/yolov3/utils/parse_config.py", line 49, in parse_model_cfg
assert not any(u), "Unsupported fields %s in %s. See ultralytics/yolov3#631" % (u, path)
AssertionError: Unsupported fields ['stopbackward', 'max_delta'] in cfg/yolov4.cfg. See ultralytics/yolov3#631
In file included from /workspace/tensorrtyolo/tensorrtx/yolov3-spp/common.h:5:0,
from /workspace/tensorrtyolo/tensorrtx/yolov3-spp/plugin_factory.cpp:1:
/usr/local/include/NvOnnxParser.h:27:10: fatal error: NvOnnxParserTypedefs.h: No such file or directory
#include "NvOnnxParserTypedefs.h"
^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
CMakeFiles/yolov3-spp.dir/build.make:62: recipe for target 'CMakeFiles/yolov3-spp.dir/plugin_factory.cpp.o' failed
make[2]: *** [CMakeFiles/yolov3-spp.dir/plugin_factory.cpp.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/yolov3-spp.dir/all' failed
make[1]: *** [CMakeFiles/yolov3-spp.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
/.
/usr
/usr/include
/usr/include/x86_64-linux-gnu
/usr/include/x86_64-linux-gnu/NvInfer.h
/usr/include/x86_64-linux-gnu/NvInferRuntime.h
/usr/include/x86_64-linux-gnu/NvInferRuntimeCommon.h
/usr/include/x86_64-linux-gnu/NvInferVersion.h
/usr/include/x86_64-linux-gnu/NvUtils.h
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/libnvinfer_static.a
/usr/share
/usr/share/doc
/usr/share/doc/libnvinfer-dev
/usr/share/doc/libnvinfer-dev/changelog.Debian
/usr/share/doc/libnvinfer-dev/copyright
/usr/lib/x86_64-linux-gnu/libnvinfer.so
Hello, thanks a lot for sharing; this is a great reference. I have read two or three of the example codes and noticed that they all build the network by constructing each layer with the API. Will you later support building the engine directly from a model file?