
tensorrt-cpp-api's Introduction


👋 Nice to meet you!

I'm Cyrus, a computer vision software developer based in the United States, specializing in high-performance machine learning inference. I'm best known for my work with the TensorRT C++ API and have created several open-source tutorial projects. For consulting work, please connect with me on LinkedIn.

tensorrt-cpp-api's People

Contributors

allcontributors[bot], cyrusbehr


tensorrt-cpp-api's Issues

Building: can't find OpenCV CUDA libraries

I am building your code and get linker errors:


[build] /usr/bin/ld: CMakeFiles/run_inference_benchmark.dir/src/main.cpp.o: in function `main':
[build] /mnt/DATA/MP/Software/Inference/cyrus-tensorrt-cpp-api/src/main.cpp:82: undefined reference to `cv::cuda::cvtColor(cv::_InputArray const&, cv::_OutputArray const&, int, int, cv::cuda::Stream&)'
[build] /usr/bin/ld: libtensorrt_cpp_api.so: undefined reference to `cv::cuda::split(cv::_InputArray const&, std::vector<cv::cuda::GpuMat, std::allocator<cv::cuda::GpuMat> >&, cv::cuda::Stream&)'
[build] /usr/bin/ld: libtensorrt_cpp_api.so: undefined reference to `cv::cuda::divide(cv::_InputArray const&, cv::_InputArray const&, cv::_OutputArray const&, double, int, cv::cuda::Stream&)'
[build] /usr/bin/ld: libtensorrt_cpp_api.so: undefined reference to `cv::cuda::subtract(cv::_InputArray const&, cv::_InputArray const&, cv::_OutputArray const&, cv::_InputArray const&, int, cv::cuda::Stream&)'
[build] /usr/bin/ld: libtensorrt_cpp_api.so: undefined reference to `cv::cuda::resize(cv::_InputArray const&, cv::_OutputArray const&, cv::Size_<int>, double, double, int, cv::cuda::Stream&)'

I made changes to your CMakeLists.txt to adapt it to my installation (attached here).

Output of CMAKE Configure:

[main] Configuring project: cyrus-tensorrt-cpp-api 
[proc] Executing command: /usr/bin/cmake --no-warn-unused-cli -DCMAKE_BUILD_TYPE:STRING=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE -DCMAKE_C_COMPILER:FILEPATH=/usr/bin/gcc -DCMAKE_CXX_COMPILER:FILEPATH=/usr/bin/g++ -S/mnt/DATA/MP/Software/Inference/cyrus-tensorrt-cpp-api -B/mnt/DATA/MP/Software/Inference/cyrus-tensorrt-cpp-api/build -G "Unix Makefiles"
[cmake] Not searching for unused variables given on the command line.
[cmake] Cmake Version 3.16.3
[cmake] Cmake directory /mnt/DATA/MP/Software/Inference/cyrus-tensorrt-cpp-api/cmake
[cmake] -- OpenCV libraries OpenCV_LIBS: opencv_calib3d;opencv_core;opencv_dnn;opencv_features2d;opencv_flann;opencv_highgui;opencv_imgcodecs;opencv_imgproc;opencv_ml;opencv_objdetect;opencv_photo;opencv_stitching;opencv_video;opencv_videoio;opencv_aruco;opencv_bgsegm;opencv_bioinspired;opencv_ccalib;opencv_datasets;opencv_dnn_objdetect;opencv_dnn_superres;opencv_dpm;opencv_face;opencv_freetype;opencv_fuzzy;opencv_hdf;opencv_hfs;opencv_img_hash;opencv_line_descriptor;opencv_optflow;opencv_phase_unwrapping;opencv_plot;opencv_quality;opencv_reg;opencv_rgbd;opencv_saliency;opencv_shape;opencv_stereo;opencv_structured_light;opencv_superres;opencv_surface_matching;opencv_text;opencv_tracking;opencv_videostab;opencv_viz;opencv_ximgproc;opencv_xobjdetect;opencv_xphoto
[cmake] -- OpenCV include path OpenCV_INCLUDE_DIRS: /usr/include/opencv4
[cmake] -- OpenCV version: 4.2.0
[cmake] -- Configuring done
[cmake] -- Generating done
[cmake] -- Build files have been written to: /mnt/DATA/MP/Software/Inference/cyrus-tensorrt-cpp-api/build

OpenCV was built from source, including CUDA:


Thanks!

A version for Jetson Orin NX

I have found that building this project on an Orin NX-based system runs into the problem that OpenCV v4.5.4 from JetPack 5.1.2 is not built with the CMake option -DOPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules. Therefore it fails when including the header opencv2/cudaarithm.hpp from the file engine.h:

tensorrt-cpp-api/src/engine.h:10:10: fatal error: opencv2/cudaarithm.hpp: No such file or directory
   10 | #include <opencv2/cudaarithm.hpp> 

This is the OpenCV issue I found relevant to this problem; it was reported against OpenCV v4.5.2, but I see the same behavior in v4.5.4:
OpenCV#3427.

Can you describe the steps to build OpenCV for the specific CUDA versions required by Jetson systems?
JetPack 5.1.2 (https://developer.nvidia.com/embedded/downloads/archive) installs:
CUDA 11.4.315
TensorRT 8.5.2.2
OpenCV 4.5.4 without CUDA support

Resource leak: fstream is not closed

In function "bool Engine::build(std::string onnxModelPath, const std::array<float, 3> &subVals, const std::array<float, 3> &divVals, bool normalize) " the file stream "std::ifstream file" and "std::ofstream outfile" not closed.

OpenCV 4.8 compatibility issue?

I've been having a very difficult time setting up the proper environment to execute run_inference_benchmark. It seems to have an issue with my OpenCV build.

 davis@tony2:~/tensorrt-cpp-api/build$ ./run_inference_benchmark ../models/yolov8n.onnx 
Searching for engine file with name: yolov8n.engine.TeslaT4.fp16.1.1
Engine found, not regenerating...
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.8.0) /home/davisac1/tensorrt-cpp-api/scripts/opencv_contrib-4.8.0/modules/cudev/include/opencv2/cudev/grid/detail/transform.hpp:264: error: (-217:Gpu API call) no kernel image is available for execution on the device in function 'call'

Aborted (core dumped)

Searching for this error turns up many people suggesting that the CUDA_ARCH_BIN value for the OpenCV build should be 8.0, 8.7, or something else. Changing that value doesn't seem to help. Any suggestions?

[feature] Add support for non-float output types

The float output type is currently hard-coded (with respect to allocation and copying of the output buffer). Use m_engine->getTensorDataType(tensorName); to get the data type of the output, allocate memory appropriately, and copy the output to a generic byte buffer, which then provides a method for obtaining a pointer as a specific type.
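
A rough sketch of the idea, assuming a TensorRT 8.5+ style API (getTensorDataType on nvinfer1::ICudaEngine); the helper name and the byte-buffer handling shown in the comments are illustrative, not the project's code:

#include <NvInfer.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: size in bytes of one element of a given tensor data type.
inline size_t elementSize(nvinfer1::DataType dt) {
    switch (dt) {
        case nvinfer1::DataType::kFLOAT: return 4;
        case nvinfer1::DataType::kINT32: return 4;
        case nvinfer1::DataType::kHALF:  return 2;
        case nvinfer1::DataType::kINT8:
        case nvinfer1::DataType::kBOOL:  return 1;
        default:                         return 4;
    }
}

// Sketch: size the output buffer from the tensor's data type instead of hard-coding float.
//   size_t bytes = elementCount * elementSize(m_engine->getTensorDataType(tensorName));
//   std::vector<uint8_t> hostOutput(bytes);
//   cudaMemcpyAsync(hostOutput.data(), m_buffers[i], bytes, cudaMemcpyDeviceToHost, stream);
// Callers can then reinterpret hostOutput.data() as float*, int32_t*, etc.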

undefined symbol: _ZN5pwgen18nvrtcCreateProgramE

I followed the readme and ran it, but it won't start due to the following error:

/home/toystorynova/Desktop/YOLOv8-TensorRT-CPP/cmake-build-debug/detect_object_image: symbol lookup error: /usr/local/cuda/targets/x86_64-linux/lib/libnvinfer_plugin.so.8: undefined symbol: _ZN5pwgen18nvrtcCreateProgramE

Currently using CUDA 12.1, TensorRT 8.6 GA, and Ubuntu 22.04.3 LTS; could this be related to #40?

Integer output compatibility

Hi there!

I wanted to send a huge thanks for this amazing repo. Honestly, the Nvidia examples are not that easy to grasp, as you mentioned in your awesome video. You saved us a lot of time and I am super thankful. I just wanted to point out a minor problem which I ran into while running a semantic segmentation model's engine using your code.

Since this code assumes "float" output types, it becomes incompatible with networks that have an integer output type. For example, if the output layer has an Argmax operation, the output of the network will be of type int32_t, and to make the TensorRT runtime code compatible, one must change the output buffer type to int32_t instead of float. Otherwise, all of the outputs will be 0.

Anyway, thank you for your amazing work once again and I wish you all the best :)

Using engine models from multiple threads

I wrapped engine initialization in a class, and I want to use more than one engine at the same time with different .engine files.
Does running inference engines on different CPU threads result in different threads/streams on the GPU?
How can I do this?

Segmentation Fault Error after initial successful run

I am able to build this project, but running inference after the engine has been generated throws a segmentation fault error.

vanguard@vanguard-jetson:~/dev/tensorrt-cpp-api/build$ make -j$(nproc)
[ 25%] Building CXX object CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o
/home/vanguard/dev/tensorrt-cpp-api/src/engine.cpp: In member function ‘bool Engine::build(std::__cxx11::string)’:
/home/vanguard/dev/tensorrt-cpp-api/src/engine.cpp:81:16: warning: unused variable ‘output’ [-Wunused-variable]
     const auto output = network->getOutput(0);
                ^~~~~~
[ 50%] Linking CXX shared library libtensorrt_cpp_api.so
[ 50%] Built target tensorrt_cpp_api
[ 75%] Building CXX object CMakeFiles/driver.dir/src/main.cpp.o
[100%] Linking CXX executable driver
[100%] Built target driver
vanguard@vanguard-jetson:~/dev/tensorrt-cpp-api/build$ ls
CMakeCache.txt  CMakeFiles  cmake_install.cmake  driver  libtensorrt_cpp_api.so  Makefile
vanguard@vanguard-jetson:~/dev/tensorrt-cpp-api/build$ ./driver 
Searching for engine file with name: trt.engine.a220528a4ef634d2ac5172ebc267ecf9.fp32.16.2_4_8.4000000000
Engine not found, generating...
onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
Detected invalid timing cache, setup a local cache instead
Tactic Device request: 2193MB Available: 492MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Tactic Device request: 2193MB Available: 500MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Tactic Device request: 457MB Available: 453MB. Device memory is insufficient to use tactic.
Skipping tactic 4 due to oom error on requested size of 457 detected for tactic 5.
Tactic Device request: 2193MB Available: 482MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Tactic Device request: 2193MB Available: 484MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to oom error on requested size of 2193 detected for tactic 5.
Success, saved engine to trt.engine.a220528a4ef634d2ac5172ebc267ecf9.fp32.16.2_4_8.4000000000
Success! Average time per inference: 15.2175 ms, for batch size of: 4

After the engine has been generated successfully, when I run the driver again I get the following error.

Searching for engine file with name: trt.engine.a220528a4ef634d2ac5172ebc267ecf9.fp32.16.2_4_8.4000000000
Engine found, not regenerating...
Segmentation fault (core dumped)

Environment
TensorRT Version : 8.0.1-1
CUDA Version : 10.2
Operating System + Version : Ubuntu 18.04.6 LTS
Inference Network: AlexNet (using the conversion tutorial here: AlexNet from PyTorch to ONNX). I used the dynamic_axes flag while exporting the model.

torch.onnx.export(model, dummy_input, "alexnet_dynamic.onnx", verbose=True, input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})

Device for TRT engine builder: Jetson Nano 4GB

For those who are struggling to compile

Add these to CMakeLists.txt:

find_path(TENSORRT_INCLUDE_DIR NvInfer.h HINTS ${CUDA_TOOLKIT_ROOT_DIR} ${CUDA_INCLUDE_DIRS})
find_library(TENSORRT_LIB nvinfer HINTS ${CUDA_TOOLKIT_ROOT_DIR} ${CUDA_LIBRARIES})
include_directories(${TENSORRT_INCLUDE_DIR})

# Insert your CUDA dir here
include_directories(/usr/local/cuda/include)
# Insert your TensorRT path here
include_directories(/usr/src/tensorrt/samples/common)
link_libraries(${TENSORRT_LIB})

Unable to build the project

Hi,
I followed the instructions written in Getting Started. However, I am seeing the following errors. Can you please help?

user6@aimlserver:~/Documents/tensorRT_CPP_API/tensorrt-cpp-api/build$ cmake ..
-- ccache: not found
-- Found CUDA: /usr/local/cuda-11.6 (found version "11.6") 
-- Found OpenCV: /usr/local (found version "4.5.5") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/build


user6@aimlserver:~/Documents/tensorRT_CPP_API/tensorrt-cpp-api/build$ make -j$(nproc)
[ 25%] Building CXX object CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp: In member function ‘bool Engine::loadNetwork()’:
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:254:32: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getNbIOTensors’
  254 |     m_buffers.resize(m_engine->getNbIOTensors());
      |                                ^~~~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:262:35: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getNbIOTensors’
  262 |     for (int i = 0; i < m_engine->getNbIOTensors(); ++i) {
      |                                   ^~~~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:263:43: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getIOTensorName’
  263 |         const auto tensorName = m_engine->getIOTensorName(i);
      |                                           ^~~~~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:265:43: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getTensorIOMode’
  265 |         const auto tensorType = m_engine->getTensorIOMode(tensorName);
      |                                           ^~~~~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:266:44: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getTensorShape’
  266 |         const auto tensorShape = m_engine->getTensorShape(tensorName);
      |                                            ^~~~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:267:27: error: ‘TensorIOMode’ has not been declared
  267 |         if (tensorType == TensorIOMode::kINPUT) {
      |                           ^~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:275:34: error: ‘TensorIOMode’ has not been declared
  275 |         } else if (tensorType == TensorIOMode::kOUTPUT) {
      |                                  ^~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp: In member function ‘bool Engine::runInference(const std::vector<std::vector<cv::cuda::GpuMat> >&, std::vector<std::vector<std::vector<float> > >&)’:
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:366:20: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setInputShape’; did you mean ‘setInputShapeBinding’?
  366 |         m_context->setInputShape(m_IOTensorNames[i].c_str(), inputDims); // Define the batch size
      |                    ^~~~~~~~~~~~~
      |                    setInputShapeBinding
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:388:34: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
  388 |         bool status = m_context->setTensorAddress(m_IOTensorNames[i].c_str(), m_buffers[i]);
      |                                  ^~~~~~~~~~~~~~~~
/home/user6/Documents/tensorRT_CPP_API/tensorrt-cpp-api/src/engine.cpp:395:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘enqueueV3’; did you mean ‘enqueueV2’?
  395 |     bool status = m_context->enqueueV3(inferenceCudaStream);
      |                              ^~~~~~~~~
      |                              enqueueV2
make[2]: *** [CMakeFiles/tensorrt_cpp_api.dir/build.make:76: CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/tensorrt_cpp_api.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Untrustworthy benchmark test?

2000 iterations are fast, but 100,000 iterations are very, very slow. Why?

2000 iterations: average time 4.8025 ms, average 208 FPS.
100,000 iterations: more than 5 minutes without outputting results, still running...

half precision returns nan for the feature values

Hi, I am adapting the code for a half-precision model. The code runs without error when configured for FP16. The input and output dimensions are correct too, so it is not a buffer issue. However, the printed feature values are all NaN. Do you have any ideas? Thank you!

no member named `buildSerializedNetwork`

Following the build instructions (except I'm on Ubuntu 18.04), I keep getting the following compile error:

tensorrt-cpp-api/src/engine.cpp:127:48: error: ‘class nvinfer1::IBuilder’ has no member named ‘buildSerializedNetwork’
     std::unique_ptr<IHostMemory> plan{builder->buildSerializedNetwork(*network, *config)};

buildSerializedNetwork() is indeed part of IBuilder, so the question is what the builder object actually is.

It is constructed from:

auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(m_logger));

And interestingly, I cannot find nvinfer1::createInferBuilder when searching the Doxygen documentation.

However, after much digging, I managed to get to this page (not indexed by the search function?), which describes it as "anonymous_namespace", and I'm not sure if that is related to the compiler not finding it:

nvinfer1::anonymous_namespace{NvInfer.h} Namespace Reference

Constraint: I'm on Ubuntu 18.04 and Jetson SDK 4.3. If this is a version issue / code deprecation, I would have expected others to hit the same issue. I cannot upgrade, since 4.3 is a requirement for the project I'm on (all drones that will run the model are on 4.3).

Any clue on how to resolve this would be greatly appreciated.
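
For reference, a minimal sketch of the pre-TensorRT-8 fallback that is commonly used on platforms where IBuilder::buildSerializedNetwork does not exist (such as older JetPack releases): buildEngineWithConfig() plus serialize() produce the same plan bytes. The function name and parameters are illustrative; `builder`, `network`, and `config` are assumed to be created the same way engine.cpp already creates them:

#include <NvInfer.h>
#include <fstream>
#include <string>

// Sketch only: serialize an engine on TensorRT 7.x.
bool serializeWithOldApi(nvinfer1::IBuilder &builder,
                         nvinfer1::INetworkDefinition &network,
                         nvinfer1::IBuilderConfig &config,
                         const std::string &enginePath) {
    nvinfer1::ICudaEngine *engine = builder.buildEngineWithConfig(network, config);
    if (engine == nullptr) {
        return false;
    }
    nvinfer1::IHostMemory *plan = engine->serialize();
    std::ofstream outfile(enginePath, std::ios::binary);
    outfile.write(static_cast<const char *>(plan->data()),
                  static_cast<std::streamsize>(plan->size()));
    plan->destroy();   // the old API uses destroy(); TensorRT 8+ deprecates it
    engine->destroy();
    return true;
}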

Segfault

Hey Cyrus,

I'm working through your code to learn TensorRT.

I trained a tiny YOLOv4 model and exported it to ONNX with several different batch sizes. I made some modifications to the code to accommodate static batch sizes. For each model I changed the dimension calculations accordingly, and each was consistent with the model's input dimensions.

However, I am getting segfaults in this block of code in engine.cpp:


    for (size_t batch = 0; batch < inputFaceChips.size(); ++batch) {
        auto image = inputFaceChips[batch];

        // Preprocess code
        image.convertTo(image, CV_32FC3, 1.f / 255.f);
        cv::subtract(image, cv::Scalar(0.5f, 0.5f, 0.5f), image, cv::noArray(), -1);
        cv::divide(image, cv::Scalar(0.5f, 0.5f, 0.5f), image, 1, -1);

        // NHWC to NCHW conversion
        // NHWC: For each pixel, its 3 colors are stored together in RGB order.
        // For a 3 channel image, say RGB, pixels of the R channel are stored first, then the G channel and finally the B channel.
        // https://user-images.githubusercontent.com/20233731/85104458-3928a100-b23b-11ea-9e7e-95da726fef92.png
        int offset = dims.d[1] * dims.d[2] * dims.d[3] * batch;
        int r = 0 , g = 0, b = 0;
        for (int i = 0; i < dims.d[1] * dims.d[2] * dims.d[3]; ++i) {
            if (i % 3 == 0) {
                hostDataBuffer[offset + r++] = *(reinterpret_cast<float*>(image.data) + i); // SEGFAULT HERE
            } else if (i % 3 == 1) {
                hostDataBuffer[offset + g++ + dims.d[2]*dims.d[3]] = *(reinterpret_cast<float*>(image.data) + i);
            } else {
                hostDataBuffer[offset + b++ + dims.d[2]*dims.d[3]*2] = *(reinterpret_cast<float*>(image.data) + i);
            }
        }
    }
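
Not a confirmed root cause, but a guard like the following (a minimal sketch using the names from the snippet above; the helper name is hypothetical) would catch the most common reason for a crash here: the image not holding exactly dims.d[1] * dims.d[2] * dims.d[3] packed 32-bit floats, in which case the reinterpret_cast walk runs off the end of image.data:

#include <NvInfer.h>
#include <opencv2/core.hpp>

// Sketch only: verify the preprocessed image matches the engine's input volume
// before indexing raw float memory.
bool inputMatchesDims(const cv::Mat &image, const nvinfer1::Dims &dims) {
    const size_t expected = static_cast<size_t>(dims.d[1]) * dims.d[2] * dims.d[3];
    const size_t actual = image.total() * image.channels();
    return image.type() == CV_32FC3 && actual == expected;
}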

Here are the models I am using:

https://drive.google.com/drive/u/0/folders/14AEDjchvPF8Tp8PrFsRMg7q4wZG8KMNi

NvInfer.h: No such file or directory

I try to run:
g++ /home/ubuntu/tensorrt-cpp-api/src/main.cpp

and get error:
In file included from /home/ubuntu/tensorrt-cpp-api/src/main.cpp:1:
/home/ubuntu/tensorrt-cpp-api/src/engine.h:7:10: fatal error: NvInfer.h: No such file or directory
    7 | #include "NvInfer.h"
      |          ^~~~~~~~~~~
compilation terminated.

NvInfer.h is located in /home/ubuntu/TensorRT-8.6.1.6/include.

Unable to build, 'no member getNbBindings'

I am on Arch Linux and cannot build the project. I have tried downgrading to CUDA 12.1 but have had no luck. Is it a version issue or a linking issue?

CUDA versions:

cuda 12.4.1-1
cuda-tools 12.4.1-1
opencv-cuda 4.9.0-3

TensorRT version:
tensorrt-10.0.0.6

Error messages:

[realuser@al build]$ cmake ..
-- The C compiler identification is GNU 13.2.1
-- The CXX compiler identification is GNU 13.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- ccache: not found
CMake Warning (dev) at /usr/lib/cmake/opencv4/OpenCVConfig.cmake:86 (find_package):
  Policy CMP0146 is not set: The FindCUDA module is removed.  Run "cmake
  --help-policy CMP0146" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

Call Stack (most recent call first):
  /usr/lib/cmake/opencv4/OpenCVConfig.cmake:108 (find_host_package)
  CMakeLists.txt:17 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /opt/cuda (found suitable exact version "12.4")
-- Found OpenCV: /usr (found version "4.9.0")
-- ccache: not found
-- Found TensorRT: /home/realuser/YOLOv8-TensorRT-CPP/src/TensorRT-8.6.1.6/lib/libnvinfer.so (found version "..")
CMake Warning (dev) at libs/tensorrt-cpp-api/CMakeLists.txt:23 (find_package):
  Policy CMP0146 is not set: The FindCUDA module is removed.  Run "cmake
  --help-policy CMP0146" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found CUDA: /usr/local/cuda (found version "12.4")
-- Configuring done (1.1s)
-- Generating done (0.0s)
-- Build files have been written to: /home/realuser/YOLOv8-TensorRT-CPP/build
[realuser@al build]$ cd build
bash: cd: build: No such file or directory
[realuser@al build]$ make -j
[  8%] Building CXX object libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o
In file included from /home/realuser/YOLOv8-TensorRT-CPP/libs/tensorrt-cpp-api/src/engine.cpp:1:
/home/realuser/YOLOv8-TensorRT-CPP/libs/tensorrt-cpp-api/src/engine.h: In member function 'void Engine<T>::clearGpuBuffers()':
/home/realuser/YOLOv8-TensorRT-CPP/libs/tensorrt-cpp-api/src/engine.h:218:75: error: 'class nvinfer1::ICudaEngine' has no member named 'getNbBindings'
  218 |         for (int32_t outputBinding = numInputs; outputBinding < m_engine->getNbBindings(); ++outputBinding) {
      |                                                                           ^~~~~~~~~~~~~
/home/realuser/YOLOv8-TensorRT-CPP/libs/tensorrt-cpp-api/src/engine.h: In member function 'bool Engine<T>::runInference(const std::vector<std::vector<cv::cuda::GpuMat> >&, std::vector<std::vector<std::vector<_Tp> > >&)':
/home/realuser/YOLOv8-TensorRT-CPP/libs/tensorrt-cpp-api/src/engine.h:662:75: error: 'class nvinfer1::ICudaEngine' has no member named 'getNbBindings'
  662 |         for (int32_t outputBinding = numInputs; outputBinding < m_engine->getNbBindings(); ++outputBinding) {
      |                                                                           ^~~~~~~~~~~~~
make[2]: *** [libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/build.make:76: libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:215: libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

How to build for an older version of TensorRT on an ARM device (AGX Xavier)

ccache: not found
-- Found OpenCV: /home/badal/Documents/opencv (found suitable version "4.6.0", minimum required is "4.6.0")
-- Configuring done (0.7s)
-- Generating done (0.2s)
-- Build files have been written to: /home/badal/tensorrt-cpp-api/build
badal@badal-desktop:~/tensorrt-cpp-api/build$ make -j8
[ 25%] Building CXX object CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o
/home/badal/tensorrt-cpp-api/src/engine.cpp: In member function ‘bool Engine::loadNetwork()’:
/home/badal/tensorrt-cpp-api/src/engine.cpp:250:32: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getNbIOTensors’
250 | m_buffers.resize(m_engine->getNbIOTensors());
| ^~~~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp:258:35: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getNbIOTensors’
258 | for (int i = 0; i < m_engine->getNbIOTensors(); ++i) {
| ^~~~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp:259:43: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getIOTensorName’
259 | const auto tensorName = m_engine->getIOTensorName(i);
| ^~~~~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp:261:43: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getTensorIOMode’
261 | const auto tensorType = m_engine->getTensorIOMode(tensorName);
| ^~~~~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp:262:44: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getTensorShape’
262 | const auto tensorShape = m_engine->getTensorShape(tensorName);
| ^~~~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp:263:27: error: ‘TensorIOMode’ has not been declared
263 | if (tensorType == TensorIOMode::kINPUT) {
| ^~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp:270:34: error: ‘TensorIOMode’ has not been declared
270 | } else if (tensorType == TensorIOMode::kOUTPUT) {
| ^~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp: In member function ‘bool Engine::runInference(const std::vector<std::vector<cv::cuda::GpuMat> >&, std::vector<std::vector<std::vector<float> > >&)’:
/home/badal/tensorrt-cpp-api/src/engine.cpp:352:20: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setInputShape’; did you mean ‘setInputShapeBinding’?
352 | m_context->setInputShape(m_IOTensorNames[i].c_str(), inputDims); // Define the batch size
| ^~~~~~~~~~~~~
| setInputShapeBinding
/home/badal/tensorrt-cpp-api/src/engine.cpp:374:34: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
374 | bool status = m_context->setTensorAddress(m_IOTensorNames[i].c_str(), m_buffers[i]);
| ^~~~~~~~~~~~~~~~
/home/badal/tensorrt-cpp-api/src/engine.cpp:381:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘enqueueV3’; did you mean ‘enqueueV2’?
381 | bool status = m_context->enqueueV3(inferenceCudaStream);
| ^~~~~~~~~
| enqueueV2
make[2]: *** [CMakeFiles/tensorrt_cpp_api.dir/build.make:76: CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/tensorrt_cpp_api.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

ii graphsurgeon-tf 8.4.1-1+cuda11.4 arm64 GraphSurgeon for TensorRT package
ii libnvinfer-bin 8.4.1-1+cuda11.4 arm64 TensorRT binaries
ii libnvinfer-dev 8.4.1-1+cuda11.4 arm64 TensorRT development libraries and headers
ii libnvinfer-plugin-dev 8.4.1-1+cuda11.4 arm64 TensorRT plugin libraries
ii libnvinfer-plugin8 8.4.1-1+cuda11.4 arm64 TensorRT plugin libraries
ii libnvinfer-samples 8.4.1-1+cuda11.4 all TensorRT samples
ii libnvinfer8 8.4.1-1+cuda11.4 arm64 TensorRT runtime libraries
ii libnvonnxparsers-dev 8.4.1-1+cuda11.4 arm64 TensorRT ONNX libraries
ii libnvonnxparsers8 8.4.1-1+cuda11.4 arm64 TensorRT ONNX libraries
ii libnvparsers-dev 8.4.1-1+cuda11.4 arm64 TensorRT parsers libraries
ii libnvparsers8 8.4.1-1+cuda11.4 arm64 TensorRT parsers libraries
ii nvidia-tensorrt 5.0.2-b231 arm64 NVIDIA TensorRT Meta Package
ii nvidia-tensorrt-dev 5.0.2-b231 arm64 NVIDIA TensorRT dev Meta Package
ii python3-libnvinfer 8.4.1-1+cuda11.4 arm64 Python 3 bindings for TensorRT
ii python3-libnvinfer-dev 8.4.1-1+cuda11.4 arm64 Python 3 development package for TensorRT
ii tensorrt 8.4.1.5-1+cuda11.4 arm64 Meta package for TensorRT
ii uff-converter-tf 8.4.1-1+cuda11.4 arm64 UFF converter for TensorRT package

[feature] Use GpuMat buffer as input to TensorRT instead of copying GPU memory.

First of all, thanks for open-sourcing this project. I learned lots of things from this code.

This is not an issue but more of a question that came to mind when I looked through your code. I can see you do all the preprocessing steps on the GpuMat and then copy the memory from the GpuMat to the device-memory input buffer. I am wondering whether we can use the pointer to the GpuMat directly there and remove the memcpy operation. Is there any reason for doing it this way?
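
A speculative sketch of what the question proposes (not the project's current behavior), assuming a TensorRT 8.5+ style API: if the preprocessed GpuMat is continuous and already laid out the way the engine expects (e.g. a single CHW float blob), its device pointer could in principle be handed to TensorRT directly via setTensorAddress, skipping the extra device-to-device copy. Whether this is safe depends on the GpuMat staying alive and unmodified until inference finishes. The function name is hypothetical:

#include <NvInfer.h>
#include <opencv2/core/cuda.hpp>

// Sketch only: feed a GpuMat's device memory to TensorRT without an intermediate copy.
bool bindGpuMatDirectly(nvinfer1::IExecutionContext *context,
                        const char *inputName, cv::cuda::GpuMat &blob) {
    if (!blob.isContinuous()) {
        return false; // GpuMat rows may be padded; TensorRT expects a dense buffer
    }
    // Pass the raw device pointer instead of copying into a separate cudaMalloc'd buffer.
    return context->setTensorAddress(inputName, blob.ptr());
}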

multi-stream inference

Hello, does enqueueV3 support multi-stream inference? Is this part included in your code?
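
For what it's worth, a minimal sketch (not taken from this repo) of how enqueueV3 is generally used with more than one CUDA stream: each stream gets its own IExecutionContext created from the shared ICudaEngine, along with its own device buffers. The tensor names "input"/"output", the buffer pointers, and the function name are placeholders:

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch only: run two inferences concurrently from one deserialized engine.
void runTwoStreams(nvinfer1::ICudaEngine &engine,
                   void *inputA, void *outputA,
                   void *inputB, void *outputB) {
    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    nvinfer1::IExecutionContext *ctxA = engine.createExecutionContext();
    nvinfer1::IExecutionContext *ctxB = engine.createExecutionContext();

    ctxA->setTensorAddress("input", inputA);
    ctxA->setTensorAddress("output", outputA);
    ctxB->setTensorAddress("input", inputB);
    ctxB->setTensorAddress("output", outputB);

    // The two enqueues can overlap on the GPU; a single context must only be
    // used on one stream at a time.
    ctxA->enqueueV3(streamA);
    ctxB->enqueueV3(streamB);

    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);

    delete ctxA;
    delete ctxB;
    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
}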

Jetson-TX2 contribution

Hi, and thanks for your work.

I updated your code to make it work on a Jetson TX2; the compatible branch is here:

https://github.com/ltetrel/tensorrt-cpp-api/tree/feat/jetson-tx2

That is just a first draft, with some dirty hacks. Among other things, I was not able to get the input/output tensor shapes dynamically for the I/O layers; I also removed compatibility for INT8 (one just needs to replace std::filesystem::directory_iterator), and I'm not sure whether the default seed of std::default_random_engine is always the same.

Let me know how you want to proceed; from there I see 3 options:

  1. Changes in another branch, feat/jetson-tx2 (as I did on my fork). I recommend this, and I would need you to create this branch on your repo so I can make a PR.
  2. Changes living in the main branch. I don't recommend this, to avoid polluting your original code, but it would be possible to have everything in the same place with a bunch of #ifdefs depending on CUDA version, C++ version, etc.
  3. No mention of this work in your repo, even though I think it could benefit the community.

See also: cyrusbehr/YOLOv8-TensorRT-CPP#22

Let me know,

NCHW and NHWC

If using the TensorRT engine, the input should be NCHW.
Could you comment on the line where the image layout is handled?
I am confused about the NCHW and NHWC conversion process.
Thanks
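
For context, a small sketch (assuming an OpenCV build with CUDA; not necessarily the repo's exact code) of the usual way an interleaved HWC GpuMat is turned into the planar CHW layout TensorRT expects: split the channels into planes that alias consecutive slices of one contiguous buffer. The function name is illustrative:

#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>   // cv::cuda::split
#include <vector>

// Sketch only: convert one HWC CV_32FC3 GpuMat into a CHW float blob.
// The returned GpuMat is 1 x (3*H*W), i.e. the NCHW buffer for batch size 1.
cv::cuda::GpuMat hwcToChw(const cv::cuda::GpuMat &img) {
    const int h = img.rows, w = img.cols;
    cv::cuda::GpuMat blob(1, h * w * 3, CV_32FC1);

    // Each plane is a view into a different third of `blob`'s memory.
    std::vector<cv::cuda::GpuMat> planes{
        cv::cuda::GpuMat(h, w, CV_32FC1, blob.ptr() + 0 * h * w * sizeof(float)),
        cv::cuda::GpuMat(h, w, CV_32FC1, blob.ptr() + 1 * h * w * sizeof(float)),
        cv::cuda::GpuMat(h, w, CV_32FC1, blob.ptr() + 2 * h * w * sizeof(float))};

    cv::cuda::split(img, planes); // interleaved HWC -> three planar channels
    return blob;
}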

support cuda 11.8

Thanks for the code! I was able to run it with CUDA toolkit 12.1, driver 11.8, and TensorRT-8.6.1.6 for 12.1. However, I need TensorRT inference on CUDA toolkit 11.8. What I have tried:

  1. With TensorRT-8.6.1.6 for 11.8: cmake and make passed, but when I ran the program I got
	libcublas.so.12 => not found
	libcublasLt.so.12 => not found

I tried symlinking the xxxx.so.11 libraries to .so.12, but the program detected it and complained.

ldd run_inference_benchmark  | grep found
./run_inference_benchmark: /lib/x86_64-linux-gnu/libcublas.so.12: version `libcublas.so.12' not found (required by /lib/x86_64-linux-gnu/libnvinfer_plugin.so.8)
./run_inference_benchmark: /lib/x86_64-linux-gnu/libcublasLt.so.12: version `libcublasLt.so.12' not found (required by /lib/x86_64-linux-gnu/libnvinfer_plugin.so.8)
  2. I saw your release 1.0 too, but that TensorRT version only supports up to CUDA 11.5.
  3. I also tried TensorRT-8.4.1.5, but there are functions in the current code that don't exist in that TensorRT version.

Any suggestions are appreciated!

Unable to copy buffer from GPU back to CPU

Searching for engine file with name: trt.engine.651dd00d2e06e214b7f3b5d1002ac689.fp32.16.2_4_8.4000000000
Engine found, not regenerating...
Unable to copy buffer from GPU back to CPU
terminate called after throwing an instance of 'std::runtime_error'
what(): Unable to run inference.
Aborted (core dumped)

Add cross-platform support

Hey @cyrusbehr, I have successfully used your project on Windows 10, and I am eager to contribute to this project.

The project is useful, but it is not convenient when I integrate it into my project. To make it more user-friendly, I plan to modify the CMakeLists.txt and some functions. If you are okay with these changes, I will clean up my code and submit a pull request.

The compilation environment is as follows:

  • Visual Studio 2017/2022
  • C++ 11
  • CUDA 11.8
  • cudnn 8.8.1.3
  • OpenCV 4.7.0
  • TensorRT 8.6.1.6

I conducted separate tests for semantic segmentation and object detection. Only the semantic segmentation using INT8 precision failed. You can find more details in the YOLOv8-TensorRT-CPP project's issue.

By the way, I am not a native English speaker, so this issue may contain grammar errors. If there are any parts that are difficult to understand or any mistakes, please let me know; this also helps me improve my English.

opencv compilation error

I downloaded the Docker image from nvidia/cuda with the command: sudo docker pull nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04

In the container generated from that image, I can't find the file /usr/local/cuda/lib64/libcudnn.so. How can I solve this problem?

Segmentation fault (core dumped)

@cyrusbehr Hi, I followed your code and ran the demo; "Segmentation fault (core dumped)" occurs sometimes.
Here is the demo output:

root@f5119fd980c7:/shopeeMT/build# CUDA_VISIBLE_DEVICES=1 ./driver
Searching for engine file with name: trt.engine.fp16
Engine found, not regenerating...
Success! Average time per inference: 0.58 ms, for batch size of: 4
root@f5119fd980c7:/shopeeMT/build# CUDA_VISIBLE_DEVICES=1 ./driver
Searching for engine file with name: trt.engine.fp16
Engine found, not regenerating...
Success! Average time per inference: 0.5475 ms, for batch size of: 4
root@f5119fd980c7:/shopeeMT/build# CUDA_VISIBLE_DEVICES=1 ./driver
Searching for engine file with name: trt.engine.fp16
Engine found, not regenerating...
Segmentation fault (core dumped)
root@f5119fd980c7:/shopeeMT/build# CUDA_VISIBLE_DEVICES=1 ./driver
Searching for engine file with name: trt.engine.fp16
Engine found, not regenerating...
Success! Average time per inference: 0.55 ms, for batch size of: 4
root@f5119fd980c7:/shopeeMT/build# CUDA_VISIBLE_DEVICES=1 ./driver
Searching for engine file with name: trt.engine.fp16
Engine found, not regenerating...
Success! Average time per inference: 0.5525 ms, for batch size of: 4

I debugged the program; this code in engine.cpp causes it, but I don't know why:

    for (size_t batch = 0; batch < inputFaceChips.size(); ++batch) {
        auto image = inputFaceChips[batch];

        // Preprocess code
        image.convertTo(image, CV_32FC3, 1.f / 255.f);
        cv::subtract(image, cv::Scalar(0.5f, 0.5f, 0.5f), image, cv::noArray(), -1);
        cv::divide(image, cv::Scalar(0.5f, 0.5f, 0.5f), image, 1, -1);

        // NHWC to NCHW conversion
        // NHWC: For each pixel, its 3 colors are stored together in RGB order.
        // For a 3 channel image, say RGB, pixels of the R channel are stored first, then the G channel and finally the B channel.
        int offset = dims.d[1] * dims.d[2] * dims.d[3] * batch;
        int r = 0 , g = 0, b = 0;
        for (int i = 0; i < dims.d[1] * dims.d[2] * dims.d[3]; ++i) {
            if (i % 3 == 0) {
                hostDataBuffer[offset + r++] = *(reinterpret_cast<float*>(image.data) + i);
            } else if (i % 3 == 1) {
                hostDataBuffer[offset + g++ + dims.d[2]*dims.d[3]] = *(reinterpret_cast<float*>(image.data) + i);
            } else {
                hostDataBuffer[offset + b++ + dims.d[2]*dims.d[3]*2] = *(reinterpret_cast<float*>(image.data) + i);
            }
        }
    }

Looking forward to your reply, thank you.

The calibration table is not used when performing int8 operations.

Following the guide, I modified the options.precision value and the path to options.calibrationDataDirectoryPath, built and executed it, and confirmed that the calibration table came out normally.
However, when running, the "Searching for calibration cache:" log does not appear. Also, even if I delete the calibration table file and run it again, the calibration table is not reproduced and the results are the same as when run with the calibration table.

build_opencv.sh error


I got this error even though I edited the paths to my TensorRT and CUDA installations.
How can I solve it?

Will this project upgrade to the latest API, like "enqueueV3"?

Hi, really a nice project. Since "enqueueV2" is deprecated in the latest version of TensorRT, will this project be upgraded to the latest API? I'm learning to deploy ONNX models with TensorRT, but due to the lack of a complete example using the latest API, it's hard for me to write a demo. I hope I can get some help here.
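
Not from this repo, but a rough sketch of the two call styles side by side, assuming a context created from a deserialized engine and device buffers already allocated; the tensor names and the fixed 1x3x640x640 shape are placeholders:

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch only: the enqueueV2 -> enqueueV3 migration in isolation.
bool inferV3(nvinfer1::IExecutionContext &context,
             void *inputDevice, void *outputDevice, cudaStream_t stream) {
    // Old style (deprecated): one bindings array, indexed by binding position.
    //   void *bindings[] = {inputDevice, outputDevice};
    //   context.enqueueV2(bindings, stream, nullptr);

    // New style (TensorRT 8.5+): address each I/O tensor by name, then enqueue
    // with only the stream.
    context.setInputShape("input", nvinfer1::Dims4{1, 3, 640, 640});
    context.setTensorAddress("input", inputDevice);
    context.setTensorAddress("output", outputDevice);
    return context.enqueueV3(stream);
}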

builder->buildSerializedNetwork fail


envs:
windows 11
NVIDIA GeForce RTX3050 Laptop
vs2022

export ONNX:
format=onnx dynamic=True simplify=True

Options:
precision=Precision::FP16
optBatchSize=32
maxBatchSize=64

But using trtexec.exe works properly.

Where is buffers.h?

I cannot compile the project because buffers.h is missing.
Where should it be located?

support cuda toolkit 11.7 and driver 12.1

Thanks for the code base! I was able to run it with CUDA toolkit 12.1. However, I need to use it on 11.7. TensorRT-8.6.1.6 requires these libraries in version 12. I also tried to symlink them to the version 11 libraries, but it didn't work.

libcublas.so.12 => not found
libcublasLt.so.12 => not found

I saw your previous version uses TensorRT-8.2.1.8, but that only supports up to CUDA 11.5.

Thank you in advance for any suggestions!

About extracting inference results

// Populate the input vectors
const auto& inputDims = engine.getInputDims();
std::vector<std::vector<cv::cuda::GpuMat>> inputs;

// TODO:
// For the sake of the demo, we will be feeding the same image to all the inputs
// You should populate your inputs appropriately.
for (const auto & inputDim : inputDims) {
    std::vector<cv::cuda::GpuMat> input;
    for (size_t j = 0; j < batchSize; ++j) {
        cv::cuda::GpuMat resized;
        // TODO:
        // You can choose to resize by scaling, adding padding, or a combination of the two in order to maintain the aspect ratio
        // You can use the Engine::resizeKeepAspectRatioPadRightBottom to resize to a square while maintain the aspect ratio (adds padding where necessary to achieve this).
        // If you are running the sample code using the suggested model, then the input image already has the correct size.
        // The following resizes without maintaining aspect ratio so use carefully!
        cv::cuda::resize(img, resized, cv::Size(inputDim.d[2], inputDim.d[1])); // TRT dims are (height, width) whereas OpenCV is (width, height)
        input.emplace_back(std::move(resized));
    }
    inputs.emplace_back(std::move(input));
}

// std::array<float, 3> subVals {0.5f, 0.5f, 0.5f};
// std::array<float, 3> divVals {0.5f, 0.5f, 0.5f}; // --> 1 detection after execution (abnormal operation)
const std::array<float, 3> subVals { 0.f, 0.f, 0.f };
const std::array<float, 3> divVals { 1.f, 1.f, 1.f }; // --> 13 detections after execution (normal operation)
bool normalize = true;
std::vector<std::vector<std::vector<float>>> featureVectors;
bool succ = runInference(inputs, featureVectors, subVals, divVals, normalize);

13 detections after execution:

featureVectors[0][0][0] -> num_dets value = 1.962e-44#DEN (strange value; 13 would be expected)

featureVectors[0][1][0] -> det_boxes value = 261.750000 x normal value
featureVectors[0][1][1] -> det_boxes value = 39.4375000 y normal value
featureVectors[0][1][2] -> det_boxes value = 301.250000 x + width normal value
featureVectors[0][1][3] -> det_boxes value = 78.8750000 y + height normal value
...

featureVectors[0][2][0] -> det_scores value = 0.910644531 normal value
featureVectors[0][2][1] -> det_scores value = 0.903320313 normal value
...

featureVectors[0][3][0] -> det_classes value = 2.803e-45#DEN (strange value; a value between 0 and 79 would be expected)
featureVectors[0][3][1] -> det_classes value = 1.037e-43#DEN (strange value; a value between 0 and 79 would be expected)
...

I would appreciate it if you could tell me how to extract the inference result values correctly.
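
One likely explanation, consistent with the other output-type issues above: num_dets and det_classes are INT32 tensors whose raw bits are being read as float, which produces exactly these tiny denormal values (for example, the bit pattern of the integer 2 read as a float is about 2.8e-45). A minimal sketch of recovering the integers, assuming the values were copied bit-for-bit into the float featureVectors; the helper name is hypothetical:

#include <cstdint>
#include <cstring>

// Sketch only: recover an int32 value that was copied bit-for-bit into a float slot.
// Only valid if the engine output tensor really is INT32
// (check with m_engine->getTensorDataType(tensorName)).
inline int32_t asInt32(float bits) {
    static_assert(sizeof(int32_t) == sizeof(float), "expect 32-bit float");
    int32_t value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Example usage:
//   int numDetections = asInt32(featureVectors[0][0][0]);
//   int classId       = asInt32(featureVectors[0][3][0]);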
