
trt-samples-for-hackathon-cn's Introduction

NVIDIA TensorRT Tutorial repository

This repository is aimed at NVIDIA TensorRT beginners and developers. We provide TensorRT-related learning and reference materials, code examples, and summaries of the annual TensorRT Hackathon competition information.


Introduction of each directory

  • Hackathon*, a summary of the annual China TensorRT Hackathon competition, including the competition introduction, resources related to the competition topics, and reference implementations (which can be regarded as optimization cases for classic models).

  • cookbook, a TensorRT recipe book containing rich examples of TensorRT code, such as API usage, the process of building and running models in TensorRT with the native API or with parsers, writing TensorRT plugins, computation-graph optimization, and more advanced TensorRT techniques.

  • old, other TensorRT sample code that will gradually be migrated into the cookbook.

Resources

  • link, Extraction code (提取码): gpq2
    • Slides of the TensorRT video tutorials on Bilibili
    • Files and information for the annual China TensorRT Hackathon competition

trt-samples-for-hackathon-cn's People

Contributors

jedibobo · jie-fang · nvgaryji · shining365 · wili-65535 · xueweilnvidia


trt-samples-for-hackathon-cn's Issues

Can I compare the numerical error between ONNX, PyTorch, and TRT?

I want to compare the numerical precision when I convert a PyTorch model to an ONNX or TRT model.
My code is as follows:

import time
import onnx
import torch
import torchvision
import numpy as np

model = torchvision.models.resnet50(pretrained=True).cuda()

input_names = ['input']
output_names = ['output']
image = torch.ones(1, 3, 224, 224).cuda()

# export the model to ONNX with a dynamic batch dimension
onnx_file = "./resnet50.onnx"
torch.onnx.export(model, image, onnx_file, verbose=False,
                  input_names=input_names, output_names=output_names,
                  opset_version=11,
                  dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}})

net = onnx.load("./resnet50.onnx")
onnx.checker.check_model(net)

# PyTorch reference output
model.eval()
with torch.no_grad():
    output1 = model(image)

import onnxruntime

# ONNX Runtime output
session = onnxruntime.InferenceSession("./resnet50.onnx")
session.get_modelmeta()
output2 = session.run(['output'], {"input": image.cpu().numpy()})

print(torch.mean(output1[0]))
print(np.mean(output2[0][0]))
print("{}vs{}".format(output1.mean(), output2[0].mean()))

The corresponding results are:
tensor(6.2914e-06, device='cuda:0')
2.6550292e-06
6.291389581747353e-06vs2.655029220477445e-06

I don't know why I get such a large error. Is it a problem with my system or with the way I run it? How can I keep the computed results consistent when converting the PyTorch model to an ONNX or TRT model?
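
For what it's worth, comparing means can mask or exaggerate the real difference; an elementwise comparison is more informative. A minimal sketch (assuming the resnet50.onnx exported above already exists):

import numpy as np
import onnxruntime
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).cuda().eval()
image = torch.ones(1, 3, 224, 224).cuda()

with torch.no_grad():
    ref = model(image).cpu().numpy()          # PyTorch reference

# CPU provider keeps the comparison independent of GPU kernels
session = onnxruntime.InferenceSession(
    "./resnet50.onnx", providers=["CPUExecutionProvider"])
out = session.run(["output"], {"input": image.cpu().numpy()})[0]

print("max abs diff  :", np.abs(ref - out).max())
print("allclose(1e-3):", np.allclose(ref, out, rtol=1e-3, atol=1e-3))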

RuntimeError: Function "cuMemAllocAsync" not found

When I ran the example file TensorFlowToTensorRT-NHWC.py, it raised:
Traceback (most recent call last):
File "TensorFlowToTensorRT-NHWC.py", line 161, in
_, inputD0 = cudart.cudaMallocAsync(inputH0.nbytes, stream)
File "cuda/cudart.pyx", line 16938, in cuda.cudart.cudaMallocAsync
File "cuda/ccudart.pyx", line 1210, in cuda.ccudart.cudaMallocAsync
File "cuda/_cuda/ccuda.pyx", line 4970, in cuda._cuda.ccuda._cuMemAllocAsync
RuntimeError: Function "cuMemAllocAsync" not found.


I'm using the NVIDIA NGC image nvcr.io/nvidia/tensorflow:21.12-tf1-py3; the detailed environment is as follows (a possible workaround is sketched after the package list):
GeForce RTX 2080 Ti, Driver Version: 455.23.05, nvcr.io/nvidia/tensorflow:21.12-tf1-py3,
Package Version


absl-py 1.0.0
appdirs 1.4.4
argon2-cffi 21.1.0
asgiref 3.4.1
astor 0.8.1
astunparse 1.6.3
attrs 21.2.0
audioread 2.1.9
backcall 0.2.0
bleach 4.1.0
cachetools 4.2.4
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.8
click 8.0.3
cloudpickle 2.0.0
cmake-setuptools 0.1.3
cuda-python 11.7.0
cudf 21.10.0a0+345.ge05bd4bf3c
cugraph 21.10.0a0+102.gab401cad
cuml 21.10.0a0+116.gdc14361ba
cupy-cuda114 9.3.0
cupy-cuda115 9.6.0
cycler 0.11.0
Cython 0.29.24
dask 2021.9.1
dask-cuda 21.10.0
dask-cudf 21.10.0a0+345.ge05bd4bf3c
dask-glm 0.2.0
dask-ml 1.9.0
debugpy 1.5.1
decorator 5.1.0
defusedxml 0.7.1
distributed 2021.9.1
Django 3.2.6
entrypoints 0.3
fastavro 1.4.4
fastrlock 0.8
filelock 3.4.0
flatbuffers 1.12
fsspec 2021.7.0
future 0.18.2
gast 0.3.3
google-pasta 0.2.0
graphsurgeon 0.4.5
grpcio 1.42.0
gunicorn 20.1.0
h11 0.12.0
h5py 2.10.0
HeapDict 1.0.1
horovod 0.22.1
httptools 0.2.0
huggingface-hub 0.0.12
idna 3.3
importlib-metadata 4.8.2
importlib-resources 5.4.0
iniconfig 1.1.1
ipykernel 6.6.0
ipython 7.30.0
ipython-genutils 0.2.0
jedi 0.18.1
Jinja2 3.0.3
joblib 1.1.0
json5 0.9.6
jsonschema 4.2.1
jupyter-client 7.1.0
jupyter-core 4.9.1
jupyter-tensorboard 0.2.0
jupyterlab 2.3.2
jupyterlab-pygments 0.1.2
jupyterlab-server 1.2.0
jupytext 1.13.2
Keras-Applications 1.0.8
Keras-Preprocessing 1.0.5
kiwisolver 1.3.2
librosa 0.9.1
llvmlite 0.36.0
locket 0.2.1
Markdown 3.3.6
markdown-it-py 1.1.0
MarkupSafe 2.0.1
matplotlib 3.4.3
matplotlib-inline 0.1.3
mdit-py-plugins 0.2.8
mistune 0.8.4
mock 3.0.5
msgpack 1.0.3
multipledispatch 0.6.0
nbclient 0.5.9
nbconvert 6.3.0
nbformat 5.1.3
nest-asyncio 1.5.4
networkx 2.6.3
nltk 3.6.4
notebook 6.4.3
numba 0.53.1
numpy 1.22.4
nvidia-dali-cuda110 1.8.0
nvidia-dali-tf-plugin-cuda110 1.8.0
nvidia-dlprofviewer 1.8.0
nvidia-pyindex 1.0.9
nvtx 0.2.3
onnx 1.11.0
onnxruntime-gpu 1.11.1
opencv-python 4.5.5.64
opt-einsum 3.3.0
packaging 21.3
pandas 1.2.5
pandocfilters 1.5.0
parso 0.8.3
partd 1.2.0
pexpect 4.7.0
pickleshare 0.7.5
Pillow 8.4.0
pip 21.3.1
pluggy 1.0.0
polygraphy 0.33.0
pooch 1.6.0
portpicker 1.3.1
prometheus-client 0.12.0
prompt-toolkit 3.0.23
protobuf 3.19.1
psutil 5.7.0
ptyprocess 0.7.0
py 1.11.0
pyarrow 5.0.0
pycparser 2.21
Pygments 2.10.0
pynvml 11.4.1
pyparsing 3.0.6
pypi-kenlm 0.1.20210121
pyrsistent 0.18.0
pytest 6.2.5
python-dateutil 2.8.2
python-dotenv 0.19.2
pytz 2021.3
PyYAML 6.0
pyzmq 22.3.0
regex 2021.11.10
requests 2.26.0
resampy 0.2.2
rmm 21.10.0a0+42.gae27a57
sacremoses 0.0.46
scikit-learn 0.24.0
scipy 1.4.1
Send2Trash 1.8.0
setuptools 59.4.0
six 1.16.0
sortedcontainers 2.4.0
SoundFile 0.10.3.post1
sqlparse 0.4.2
tblib 1.7.0
tensorboard 1.15.0
tensorflow 1.15.5+nv
tensorflow-estimator 1.15.1
tensorrt 8.2.1.8
termcolor 1.1.0
terminado 0.12.1
testpath 0.5.0
tf2onnx 1.10.1
threadpoolctl 3.0.0
tokenizers 0.10.3
toml 0.10.2
toolz 0.11.2
tornado 6.1
tqdm 4.62.3
traitlets 5.1.1
transformers 4.9.1
treelite 2.1.0
treelite-runtime 2.1.0
typing_extensions 4.0.1
ucx-py 0.21.0a0+37.gbfa0450
uff 0.6.9
urllib3 1.26.7
uvicorn 0.15.0
uvloop 0.16.0
watchgod 0.7
wcwidth 0.2.5
webencodings 0.5.1
websockets 10.1
Werkzeug 2.0.2
wheel 0.37.0
whitenoise 5.3.0
wrapt 1.13.3
xgboost 1.4.2
zict 2.0.0
zipp 3.6.0
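
For context (my own note, not from the original report): cuMemAllocAsync belongs to the stream-ordered memory allocator introduced in CUDA 11.2, and driver 455.23.05 is a CUDA 11.1-era driver, so the symbol genuinely does not exist in that driver even though the container ships cuda-python 11.7. Upgrading the driver is the proper fix; until then, a fallback sketch:

from cuda import cudart

def device_malloc(nbytes, stream):
    # Prefer the stream-ordered allocator; fall back to plain cudaMalloc on
    # drivers older than CUDA 11.2, where cuMemAllocAsync is absent.
    try:
        err, ptr = cudart.cudaMallocAsync(nbytes, stream)
    except RuntimeError:  # e.g. RuntimeError: Function "cuMemAllocAsync" not found
        err, ptr = cudart.cudaMalloc(nbytes)
    assert err == cudart.cudaError_t.cudaSuccess
    return ptr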

Problems in CookBook Demo 08 polygraphy

I wonder which TensorFlow version this repo is using.
For the container you give, the CUDA version is 11.6, so I installed tf's newest version.
I tried the newest tf 2.8.0-gpu, and a lot of problems arise from the tf version, e.g.:
tf.graph_util.convert_variables_to_constants, initializer=tf.truncated_normal_initializer, and tf.gfile.FastGFile are all unrecognized by tf 2.8.0.
So I changed all of those to tf.compat.v1.***, ran it, and found no errors.
Maybe I could commit my changes and create a PR.
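
For reference, a minimal sketch of that substitution pattern (my own illustration, not code from the repo): the TF1 APIs the demo uses still exist under the tf.compat.v1 namespace in TF 2.x:

import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # TF1-style graphs need eager mode off

x = tf1.placeholder(tf.float32, [None, 4], name="x")
w = tf1.get_variable("w", [4, 2], initializer=tf1.truncated_normal_initializer())
y = tf.identity(tf.matmul(x, w), name="y")

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    # tf.graph_util.convert_variables_to_constants -> tf.compat.v1.graph_util...
    frozen = tf1.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["y"])
    # tf.gfile.FastGFile -> tf.compat.v1.gfile.FastGFile
    with tf1.gfile.FastGFile("./model.pb", "wb") as f:
        f.write(frozen.SerializeToString())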

04-Parser run error

I use nvcr.io/nvidia/pytorch:21.10-py3.

Traceback (most recent call last):
File "cuda/_cuda/ccuda.pyx", line 3553, in cuda._cuda.ccuda._cuInit
File "cuda/_cuda/ccuda.pyx", line 424, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Exception ignored in: 'cuda._lib.ccudart.utils.cudaPythonGlobal.lazyInit'
Traceback (most recent call last):
File "cuda/_cuda/ccuda.pyx", line 3553, in cuda._cuda.ccuda._cuInit
File "cuda/_cuda/ccuda.pyx", line 424, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Traceback (most recent call last):
File "pyTorchToTensorRT.py", line 49, in
cudart.cudaDeviceSynchronize()
File "cuda/cudart.pyx", line 6923, in cuda.cudart.cudaDeviceSynchronize
File "cuda/ccudart.pyx", line 25, in cuda.ccudart.cudaDeviceSynchronize
File "cuda/cuda/ccuda.pyx", line 3853, in cuda.

failed reading calibration cache via c++ api

void const *MyCalibrator::readCalibrationCache(std::size_t &length) noexcept
{
    std::fstream f;
    f.open(cacheFile, std::fstream::in);
    if (f.fail())
    {
        std::cout << "Failed finding cache file!" << std::endl;
        return nullptr;
    }
    char *ptr = new char[length];
    if (f.is_open())
    {
        f >> ptr;
    }
    return ptr;
}

operator>> on line 70 only reads the first word, stopping at whitespace or '\n'.

The same code appears in the '李玮' demo.
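
For comparison, the Python-API calibrators avoid this pitfall by reading the whole file in binary mode. A minimal sketch (assuming a cacheFile attribute, as in the cookbook calibrators):

import os
import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cacheFile="./int8.cache"):
        super().__init__()
        self.cacheFile = cacheFile

    def read_calibration_cache(self):
        # Read the entire cache as bytes; returning None triggers recalibration.
        if not os.path.exists(self.cacheFile):
            print("Failed finding cache file!")
            return None
        with open(self.cacheFile, "rb") as f:
            return f.read()

The equivalent C++ fix is to read the file's full contents (e.g. with read() and the file's actual size) instead of using operator>>.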

Multi-model concurrency

Hi,
The official documentation says:
In general TensorRT objects are not thread-safe. The expected runtime concurrency model is that different threads will operate on different execution contexts. The context contains the state of the network (activation values etc) during execution, so using a context concurrently in different threads results in undefined behavior.
If I run parallel inference on multiple models at the same time, each with its own engine, is that unsafe?
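
For reference, the pattern the quoted passage describes looks roughly like this (a sketch; the engine path is assumed): one engine may be shared across threads, but each thread needs its own execution context and CUDA stream.

import threading

import tensorrt as trt
from cuda import cudart

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("engine.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())  # shared, read-only

def worker():
    context = engine.create_execution_context()  # one context per thread
    _, stream = cudart.cudaStreamCreate()        # one stream per thread
    # ... allocate bindings, then context.execute_async_v2(bindings, stream) ...
    cudart.cudaStreamDestroy(stream)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Running several independent engines in parallel follows the same rule: as long as no context (or other non-thread-safe object) is shared between threads, concurrent inference stays within the documented model.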

The LayerNorm demo seems to fail, and it contains a few errors.

File: https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/06-PluginAndParser/pyTorch-LayerNorm/main.py

Error 1: onnx-graphsurgeon reports "out of index" during the modification, because there is no node after Div; the cause is that torch.mul(x, 1) is dropped by the ONNX export.
Fix: change x = t.mul(x, 1) on lines 61 and 63 to x = t.mul(x, 1.0).

Error 2: onnxFile on lines 125 and 129 looks wrong. This demo rewrites ONNX nodes into a LayerNorm node, so it should run the modified file onnxSurgeonFile instead. After this change the check result is False.

Error 3: the LayerNorm on line 58 should only take the last dimension, but the author passed all three. Change t.nn.LayerNorm([nBS, nSL, nEmbedding], elementwise_affine=False, eps=epsilon) to t.nn.LayerNorm(nEmbedding, elementwise_affine=False, eps=epsilon).

Bug: unset (unused) input is not removed

  • Environment

    • TensorRT 8.4 GA
    • CUDA 11.6 + CuDNN 8.4.0 + CUBLAS 11.9.2
    • NVIDIA A10
    • NVIDIA Driver 510.73.08
  • Reproduction Steps
    TensorRT removes unused inputs, but in the example below, input y is unused yet still remains in the generated engine.
    Under trt7.2.3 + cuda10.2 + cudnn8.0 the same test shows no problem.

import os
import math
import numpy as np

import tensorrt as trt
# import numpy as np

import pycuda.driver as cuda
import pycuda.autoinit

logger = trt.Logger(trt.Logger.VERBOSE)

def build(plan_name, dim):
    # create builder, network and builder_config
    builder = trt.Builder(logger)
    if not builder:
        raise RuntimeError("create trt.Builder failed!")

    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)

    builder_config = builder.create_builder_config()
    if not builder_config:
        raise RuntimeError("create_builder_config failed!")

    builder_config.max_workspace_size = 3 * (1024 * 1024 * 1024)

    x = network.add_input(name="x", dtype=trt.float32, shape=(-1, dim, dim))
    y = network.add_input(name="y", dtype=trt.float32, shape=(-1, dim, dim))

    input_len = len(x.shape)

    # softmax
    trt_layer = network.add_softmax(x)
    softmax_dim = input_len - 1
    # trt_layer.axes = 1 << (softmax_dim)
    trt_layer.axes = 1 << 2
    # trt_layer.axes = int(math.pow(2, input_len-1))
    print(f"trt_layer.axes={trt_layer.axes}")
    x = trt_layer.get_output(0)
    print("=====add softmax")
    network.mark_output(x)

    profile = builder.create_optimization_profile()
    profile.set_shape("x", min=(1, dim, dim), opt=(20, dim, dim), max=(120, dim, dim))
    profile.set_shape("y", min=(1, dim, dim), opt=(20, dim, dim), max=(120, dim, dim))
    builder_config.add_optimization_profile(profile)

    engine = builder.build_engine(network, builder_config)
    if not engine:
        raise RuntimeError("build_engine failed")

    print("====================get_binding_shape=====================")
    for i in range(0, engine.num_bindings):
        print("get_binding_shape:" + str(engine.get_binding_name(i)))
    print("====================get_binding_shape=====================")

    serialized_engine = engine.serialize()
    if serialized_engine is None:
        raise RuntimeError("serialize failed")

    with open(plan_name, "wb") as fout:
        fout.write(serialized_engine)

if __name__ == '__main__':
    plan_name = "engine.plan"
    # x_arr = np.load("0.npy")
    # dim = x_arr.shape[1]
    M = 10
    dim = 512
    x_arr = np.random.rand(M, dim, dim).astype(np.float32)
    # print(x_arr.shape[1])
    # print(x_arr)
    # assert 0
    build(plan_name, dim)

    # infer_helper = InferHelper(plan_name, logger)
    # infer_helper.infer([x_arr])

  • Output log
[06/27/2022-16:53:12] [TRT] [W] Unused Input: y
[06/27/2022-16:53:12] [TRT] [V] Applying generic optimizations to the graph for inference.
[06/27/2022-16:53:12] [TRT] [V] Original: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After dead-layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After Myelin optimization: 1 layers
[06/27/2022-16:53:12] [TRT] [V] Applying ScaleNodes fusions.
[06/27/2022-16:53:12] [TRT] [V] After scale fusion: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After dupe layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After final dead-layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After tensor merging: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After vertical fusions: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After dupe layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [W] [RemoveDeadLayers] Input Tensor y is unused or used only at compile-time, but is not being removed.
[06/27/2022-16:53:12] [TRT] [V] After final dead-layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After tensor merging: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After slice removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After concat removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] Trying to split Reshape and strided tensor
[06/27/2022-16:53:12] [TRT] [V] Graph construction and optimization completed in 0.000592053 seconds.
[06/27/2022-16:53:13] [TRT] [V] Using cublasLt as a tactic source
[06/27/2022-16:53:13] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +871, GPU +378, now: CPU 1569, GPU 885 (MiB)
[06/27/2022-16:53:13] [TRT] [V] Using cuDNN as a tactic source
[06/27/2022-16:53:13] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +127, GPU +58, now: CPU 1696, GPU 943 (MiB)
[06/27/2022-16:53:13] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
[06/27/2022-16:53:13] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/27/2022-16:53:13] [TRT] [V] Constructing optimization profile number 0 [1/1].
[06/27/2022-16:53:13] [TRT] [V] Reserving memory for host IO tensors. Host: 0 bytes
[06/27/2022-16:53:13] [TRT] [V] =============== Computing reformatting costs
[06/27/2022-16:53:13] [TRT] [V] =============== Computing reformatting costs
[06/27/2022-16:53:13] [TRT] [V] =============== Computing costs for
[06/27/2022-16:53:13] [TRT] [V] *************** Autotuning format combination: Float(262144,512,1) -> Float(262144,512,1) ***************
[06/27/2022-16:53:13] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 0) [Softmax] (CudaSoftMax)
[06/27/2022-16:53:13] [TRT] [V] Tactic: 0x00000000000003ea Time: 0.100779
[06/27/2022-16:53:13] [TRT] [V] Tactic: 0x00000000000003e9 Time: 0.086352
[06/27/2022-16:53:13] [TRT] [V] Fastest Tactic: 0x00000000000003e9 Time: 0.086352
[06/27/2022-16:53:13] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: CudaSoftMax Tactic: 0x00000000000003e9
[06/27/2022-16:53:13] [TRT] [V] Formats and tactics selection completed in 0.00498304 seconds.
[06/27/2022-16:53:13] [TRT] [V] After reformat layers: 1 layers
[06/27/2022-16:53:13] [TRT] [V] Pre-optimized block assignment.
[06/27/2022-16:53:13] [TRT] [V] Block size 3221225472
[06/27/2022-16:53:13] [TRT] [V] Total Activation Memory: 3221225472
[06/27/2022-16:53:13] [TRT] [I] Detected 2 inputs and 1 output network tensors.
[06/27/2022-16:53:13] [TRT] [V] Layer: (Unnamed Layer* 0) [Softmax] Host Persistent: 0 Device Persistent: 0 Scratch Memory: 0
[06/27/2022-16:53:13] [TRT] [I] Total Host Persistent Memory: 0
[06/27/2022-16:53:13] [TRT] [I] Total Device Persistent Memory: 0
[06/27/2022-16:53:13] [TRT] [I] Total Scratch Memory: 0
[06/27/2022-16:53:13] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 256 MiB
[06/27/2022-16:53:13] [TRT] [V] Optimized block assignment.
[06/27/2022-16:53:13] [TRT] [I] Total Activation Memory: 0
[06/27/2022-16:53:13] [TRT] [V] Disabling unused tactic source: CUDNN
[06/27/2022-16:53:13] [TRT] [V] Disabling unused tactic source: CUBLAS, CUBLAS_LT
[06/27/2022-16:53:13] [TRT] [V] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[06/27/2022-16:53:13] [TRT] [V] Engine generation completed in 0.968591 seconds.
[06/27/2022-16:53:13] [TRT] [V] Deleting timing cache: 1 entries, served 0 hits since creation.
[06/27/2022-16:53:13] [TRT] [V] Engine Layer Information:
Layer(CudaSoftMax): (Unnamed Layer* 0) [Softmax], Tactic: 0x00000000000003e9, x[Float(-5,512,512)] -> (Unnamed Layer* 0) [Softmax]_output[Float(-5,512,512)]
[06/27/2022-16:53:13] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
====================get_binding_shape=====================
get_binding_shape:x
get_binding_shape:y
get_binding_shape:(Unnamed Layer* 0) [Softmax]_output
====================get_binding_shape=====================
[06/27/2022-16:53:13] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/27/2022-16:53:13] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.

Engines built by different TensorRT versions differ significantly in accuracy

  • Problem description

    • Our onnx model builds an engine with correct inference accuracy under TRT 8.2.4.2, but the engine built under TRT 8.4.1.5 shows a large accuracy deviation.
    • After inserting the MaskedSoftmax Plugin, the engines built in both environments are accurate, so that is the direction worth investigating.
  • Environment

    • TensorRT 8.4.1.5 and TensorRT 8.2.4.2
    • CUDA Driver Version = 11.2, CUDA Runtime Version = 11.2
    • Docker registry.cn-hangzhou.aliyuncs.com/trt2022/trt-8.4-ga and nvcr.io/nvidia/pytorch:22.04-py3
  • Reproduction Steps

  • Expected Behavior
    (screenshot: 3090 + TRT 8.2.4.2)
    Comparing the engine built on a 3090 with TRT 8.2.4.2 against the onnx results, the error is very small.

  • Actual Behavior
    (screenshot: 3090 + TRT 8.4.1.5)
    Comparing the engine built on a 3090 with TRT 8.4.1.5 against the onnx results, the error is large.

cookbook-04Parser-pytorch-onnx-tensorrt issues

my env:
WSL, Ubuntu 20.04
tensorrt 8.4
cuda 11.3
torch 1.10
onnx 1.12.0

Test code: main.py on the master branch.
When I test the example code, the TensorRT output is all "0 0 0 0 0 0 0 0", and I have found no way to solve the problem.
The C++ code behaves the same.

#59 (comment)

How to use the scatterND plugin?

Hi! I want to convert an ONNX model to TensorRT. However, the error says scatterND is not supported by TensorRT. How can I implement this custom plugin? Thanks a lot.
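
Not from this thread, but possibly relevant: TensorRT ships a ScatterND plugin in libnvinfer_plugin (it appears as ::ScatterND version 1 in verbose logs), so registering the built-in plugins before parsing may already resolve the unsupported-operator error. A sketch (model path assumed):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # registers ::ScatterND among others

builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
if not parser.parse_from_file("model.onnx"):
    for i in range(parser.num_errors):
        print(parser.get_error(i))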

Compiling for TensorRT 7.1.3.4 with CUDA 10.2

Hi,

Thanks for your good work.

I want to build the same on my local machine, which has TensorRT 7.1.3.4 with CUDA 10.2.
But when I run it I face issues.
What do I need to change to make it work with TensorRT 7.1.3.4 and CUDA 10.2?

Thanks,
Darshan C G

ReflectPadding Parse Error

  • Environment

    • NVIDIA A10
    • TensorRT 8.4 GA
    • CUDA 11.6
    • CuDNN 8.4.0
    • CUBLAS 11.9.2
    • NVIDIA Driver 510.73.08
  • Reproduction Steps

    • CASE

      • unit.py: Generate ONNX graph with a Pad layer with reflect mode.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F
        
        
        
        class ReflectPad(nn.Module):
            def __init__(self):
                super(ReflectPad, self).__init__()
        
            def forward(self, input):
                out = F.pad(input, (0, 1, 0, 2), "reflect")
                return out
        
        input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3).cuda()
        print(input)
        rp = ReflectPad().cuda()
        out = rp(input)
        print(out)
        
        torch.onnx.export(rp,
                          input,
                          "unit.onnx",
                          input_names=["input"],
                          output_names=["output"],
                          verbose=True,
                          keep_initializers_as_inputs=True,
                          opset_version=13,
                          dynamic_axes={"input": {0: "batch_size"}})
      • parse.sh: parse the onnx graph generated

        #!/usr/bin/env bash
        python3 unit.py
        
        trtexec \
                --onnx=unit.onnx \
                --explicitBatch \
                --minShapes=lr:1x3x64x64 \
                --optShapes=lr:1x3x80x80 \
                --maxShapes=lr:1x3x120x120 \
                --saveEngine=unit.plan \
                --workspace=40960 \
                --buildOnly \
                --noTF32 \
        --verbose
    • Run this case:

      bash parse.sh
  • Expected Behavior

  • Actual Behavior

    • Error occurred:

      [06/27/2022-11:15:47] [E] [TRT] ModelImporter.cpp:776: --- End node ---
      [06/27/2022-11:15:47] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
      [6] Invalid Node - Pad_13
      [shuffleNode.cpp::symbolicExecute::392] Error Code 4: Internal Error (Reshape_3: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
      [06/27/2022-11:15:47] [E] Failed to parse onnx file
      [06/27/2022-11:15:47] [I] Finish parsing network model
      [06/27/2022-11:15:47] [E] Parsing model failed
      [06/27/2022-11:15:47] [E] Failed to create engine from model or file.
      [06/27/2022-11:15:47] [E] Engine set up failed
      &&&& FAILED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=unit.onnx --explicitBatch --minShapes=lr:1x3x64x64 --optShapes=lr:1x3x80x80 --maxShapes=lr:1x3x120x120 --saveEngine=unit.plan --workspace=40960 --buildOnly --noTF32 --verbose
      
  • Additional Notes

06 LayerNormPlugin error!

my env:
cuda: 10.2
cudnn:8.3.2
tensorrt:8.4.0.6
GPU:T4
server: CentOS

When I run python testLayerNormPlugin.py in 06-PluginAndParser/pyTorch-LayerNorm, there is an error:
[TRT] [E] [executionContext.cpp::enqueueInternal::329] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::329, condition: bindings[x] != nullptr).
How can I fix this bug?
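
That error usually means one of the binding slots was given a null device pointer at enqueue time. A small diagnostic sketch (my own, not from the repo) to verify that every binding received a buffer before enqueue:

import tensorrt as trt

def check_bindings(engine, context, bindings):
    # Print every binding and assert none of the device pointers is null.
    for i in range(engine.num_bindings):
        kind = "input " if engine.binding_is_input(i) else "output"
        print("[%d] %s %s shape=%s ptr=%s" % (
            i, kind, engine.get_binding_name(i),
            context.get_binding_shape(i), bindings[i]))
        assert bindings[i] != 0, "binding %d was never allocated" % i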

TensorRT trtexec onnx export bug

Description

Tested with TensorRT-8.2.5.0 and TensorRT-8.4.1.5.

Environment

TensorRT Version: TensorRT-8.4.1.5
NVIDIA GPU: A10
NVIDIA Driver Version:510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4
Operating System: ubuntu20.04
Python Version (if applicable): python3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

onnx: https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/encoder_final_quant.onnx

Steps To Reproduce

This is my cmd and log:

&&&& RUNNING TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=./encoder_final_quant.onnx --saveEngine=./encoder.plan --minShapes=speech:1x16x80,speech_lengths:1 --optShapes=speech:4x64x80,speech_lengths:4 --maxShapes=speech:16x256x80,speech_lengths:16 --workspace=23028 --verbose --int8

log: https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/log_encoder.txt

Possible overflow bug when softmax and topk are fused into one kernel?

  • Environment

    • TensorRT 8.4 GA
    • CUDA 11.6 + CuDNN 8.4.0 + CUBLAS 11.9.2
    • NVIDIA A10
    • NVIDIA Driver 510.73.08
  • Reproduction Steps
    With softmax followed by topk (k=1), trt fuses the two operators into one.
    For certain inputs, the fused kernel produces NaN, while the unfused version looks correct.
    The code is as follows:

import os
import math
import numpy as np

import tensorrt as trt
# import numpy as np

import pycuda.driver as cuda
import pycuda.autoinit

logger = trt.Logger(trt.Logger.VERBOSE)

class InferHelper():
    """"""
    def __init__(self, plan_name, trt_logger):
        """"""
        self.logger = trt_logger
        self.runtime = trt.Runtime(trt_logger)
        with open(plan_name, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
            self.context = self.engine.create_execution_context()
            self.context.active_optimization_profile = 0

    def infer(self, inputs: list):
        nInput = len(inputs)

        bufferD = []
        # alloc memory
        for i in range(nInput):
            bufferD.append(cuda.mem_alloc(inputs[i].nbytes))
            cuda.memcpy_htod(bufferD[i], inputs[i].ravel())
            self.context.set_binding_shape(i, tuple(inputs[i].shape))
            # print(inputs[i].nbytes)

        for i in range(0, self.engine.num_bindings):
            print("get_binding_shape:" + str(self.context.get_binding_shape(i)))

        outputs = []
        for i in range(len(inputs), self.engine.num_bindings):
            outputs.append(np.zeros(self.context.get_binding_shape(i)).astype(np.float32))

        nOutput = len(outputs)
        for i in range(nOutput):
            bufferD.append(cuda.mem_alloc(outputs[i].nbytes))
            # print(outputs[i].nbytes)

        for i in range(len(inputs), self.engine.num_bindings):
            trt_output_shape = self.context.get_binding_shape(i)
            output_idx = i - len(inputs)
            if not (list(trt_output_shape) == list(outputs[output_idx].shape)):
                self.logger.log(trt.Logger.ERROR, "[Infer] output shape is error!")
                self.logger.log(trt.Logger.ERROR, "trt_output.shape = " + str(trt_output_shape))
                self.logger.log(trt.Logger.ERROR, "base_output.shape = " + str(outputs[output_idx].shape))
                assert(0)

        # warm up
        self.context.execute_v2(bufferD)

        # T1 = time.perf_counter()

        # self.context.execute_v2(bufferD)

        # T2 =time.perf_counter()
        # print("time=" + str((T2-T1) * 1000) + "ms")

        for i in range(nInput, nInput + nOutput):
            cuda.memcpy_dtoh(outputs[i - nInput].ravel(), bufferD[i])

        for i in range(0, len(outputs)):
            print("outputs.shape:" + str(outputs[i].shape))
            print("outputs.sum:" + str(outputs[i].sum()))
            print(outputs[i])

            # print("trt_output.shape:" + str(trt_output.shape))
            # print("trt_output.sum:" + str(trt_output.sum()))
            # print(trt_output.view(-1)[0:10])
            # print("torch.allclose result:" + str(torch.allclose(base_output, trt_output, 1e-05, 1e-03)))
            # print("====================")
        return outputs
        # return torch.allclose(base_output, trt_output, 1e-05, 1e-03)

def build(plan_name, dim):
    # create builder, network and builder_config
    builder = trt.Builder(logger)
    if not builder:
        raise RuntimeError("create trt.Builder failed!")

    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)

    builder_config = builder.create_builder_config()
    if not builder_config:
        raise RuntimeError("create_builder_config failed!")

    builder_config.max_workspace_size = 3 * (1024 * 1024 * 1024)

    x = network.add_input(name="x", dtype=trt.float32, shape=(-1, dim))

    input_len = len(x.shape)

    # softmax
    softmax_layer = network.add_softmax(x)
    softmax_dim = input_len - 1
    softmax_layer.axes = 1 << softmax_dim

    print(f"softmax_layer.axes={softmax_layer.axes}")
    x = softmax_layer.get_output(0)
    print("=====add softmax")
    # network.mark_output(x)

    # max
    topk_dim = input_len - 1
    axes = 1 << topk_dim
    topk_layer = network.add_topk(x, trt.TopKOperation.MAX, 1, axes)
    x = topk_layer.get_output(0)
    print("=====add topk")

    network.mark_output(x)

    profile = builder.create_optimization_profile()
    profile.set_shape("x", min=(1, dim), opt=(10, dim), max=(100, dim))
    builder_config.add_optimization_profile(profile)

    engine = builder.build_engine(network, builder_config)
    if not engine:
        raise RuntimeError("build_engine failed")

    serialized_engine = engine.serialize()
    if serialized_engine is None:
        raise RuntimeError("serialize failed")

    with open(plan_name, "wb") as fout:
        fout.write(serialized_engine)

if __name__ == '__main__':
    plan_name = "engine.plan"

    x_arr = np.load("0.npy")
    dim = x_arr.shape[1]

    # M = 50
    # dim = 1024
    # x_arr = np.random.rand(M, dim).astype(np.float32)
    
    print(x_arr.shape)
    print(x_arr.sum())
    print(x_arr)
    build(plan_name, dim)

    infer_helper = InferHelper(plan_name, logger)
    infer_helper.infer([x_arr])
  • Observed behavior:
  1. With only the topk output marked, reading 0.npy in main, the result is NaN.
  2. With rand random input, there is a result (correctness not checked).
  3. With both the softmax and topk results passed to mark_output, there is a result (correctness not checked).
  4. After the run finishes, it reports Segmentation fault.

The trt2022/dev image is missing files under the workspace directory

Pulling the image with docker pull registry.cn-hangzhou.aliyuncs.com/trt2022/dev gives an image with files missing.
The following do not exist:
/workspace/buildFromWorkspace.sh
/workspace/encoder.onnx
/workspace/decoder.onnx
/workspace/data/*.npz
Is it because the image was updated? Where can these files be downloaded?

3D-to-2D operator optimization: question about shape = None

Hi, I am optimizing the encoder.onnx model by following the 2022 preliminary-round PPT, but the corresponding node's shape prints as None in onnx-graphsurgeon (the PPT shows Bxt4x256). Does this mean the reshape cannot be done?

My node-inspection code looks like this:

from collections import OrderedDict
import numpy as np
import onnx
import onnx_graphsurgeon as gs
import os
import tensorrt as trt
from onnx import shape_inference



input_onnx = "encoder.onnx"

original_model = onnx.load(input_onnx)
# inferred_model = shape_inference.infer_shapes(original_model)
graph = gs.import_onnx(original_model)
# print(inferred_model.graph.value_info)

for node in graph.nodes:
    if node.name == "MatMul_379":
        pass
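
onnx-graphsurgeon only reports shapes that are recorded in the model, so running ONNX shape inference first (the line commented out above) often fills in the missing ones. A sketch:

import onnx
import onnx_graphsurgeon as gs
from onnx import shape_inference

model = onnx.load("encoder.onnx")
inferred = shape_inference.infer_shapes(model)   # annotates value_info shapes
graph = gs.import_onnx(inferred)

for node in graph.nodes:
    if node.name == "MatMul_379":
        print([t.shape for t in node.outputs])   # may now be concrete or symbolic

If the shape is still None after inference, that dimension is not statically known, and the reshape may need to be expressed with runtime shape operations instead.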

AddPlugin examples need to set plugin field in the constructor of PluginCreator

  • Environment
    TensorRT 8.4 GA

  • Reproduction Steps

## Code file 1
## use AddScalarPlugin example
import ctypes
from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

soFile = "./AddScalarPlugin.so"
onnxFile = "./model.onnx"

ctypes.cdll.LoadLibrary(soFile)

inputs = gs.Variable(
    name="inputs", dtype=np.float32, shape=["batch", "seq", 256])
outputs = gs.Variable(
    name="outputs", dtype=np.float32, shape=["batch", "seq", 256])
nodes = [
    gs.Node(
        op="AddScalar",
        name="AddScalar_1",
        attrs=OrderedDict(scalar=np.array([2.0], dtype=np.float32)),
        inputs=[inputs],
        outputs=[outputs]
    )
]

graph = gs.Graph(
    nodes=nodes, inputs=[inputs], outputs=[outputs], opset=13,
    name="onnx")
model = gs.export_onnx(graph=graph)
onnx.save(model, onnxFile)
## Code file 2
## use AddScalarPlugin example
import tensorrt as trt
import numpy as np
import ctypes

soFile = "./AddScalarPlugin.so"
onnxFile = "./model.onnx"
logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, '')
ctypes.cdll.LoadLibrary(soFile)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
profile = builder.create_optimization_profile()
config = builder.create_builder_config()
config.max_workspace_size = 6 << 30
parser = trt.OnnxParser(network, logger)
parser.parse_from_file(onnxFile)
  • Background
    When using OnnxParser or polygraphy to load a plugin that has attributes, there are no plugin fields at the time plugin::createPlugin is called, which means the plugin's attributes fail to load from the onnx model.

  • How to fix
    Taking AddScalarPlugin's createPlugin as an example, these two lines are necessary before getting the size and data of the plugin fields:

attr_.clear();
attr_.emplace_back(PluginField("scalar", nullptr, PluginFieldType::kFLOAT32, 1));

could author explain dynamic_axes?

In the file 04-Parser/pyTorch-ONNX-TensorRT/pyTorchToTensorRT.py,
the code exports a dynamic shape with the line below:
dynamic_axes={"x": {0: "nBatchSize"}, "z": {0: "nBatchSize"}}

It runs well. I have some questions:
(1) nBatchSize is not defined in this file; why, and how is this variable used?
(2) I can understand that x is dynamic, but why "z"?

thanks.
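
A short illustration (my own toy model, not the cookbook file): "nBatchSize" is only a human-readable label for the symbolic dimension, so it does not need to be defined anywhere, and "z" is listed because an output keeps a fixed batch dimension unless it is marked dynamic as well:

import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return x * 2.0

x = torch.randn(4, 3)
torch.onnx.export(
    Toy(), x, "toy.onnx",
    input_names=["x"], output_names=["z"],
    # the string "nBatchSize" only names the dynamic axis in the ONNX graph
    dynamic_axes={"x": {0: "nBatchSize"}, "z": {0: "nBatchSize"}},
)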

When converting a .plan file based on the cookbook's 04-Parser pyTorch-ONNX-TensorRT example, I hit a downsample-ratio problem while converting the RobustVideoMatting project. How should I handle it?

https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp32.onnx
This is my ONNX conversion script: (screenshot)
The onnx implementation details of the RVM matting project: (screenshot)
This is the parse log; the last lines are the error log:
E:\Win_Anaconda3\envs\TensorRTPy\python.exe E:/project/TensorRT_project/trt-samples-for-hackathon-cn/cookbook/04-Parser/pyTorch-ONNX-TensorRT/RVM_python_auto/main.py
Succeeded converting model into onnx!
[09/03/2022-13:53:25] [TRT] [I] [MemUsageChange] Init CUDA: CPU +323, GPU +0, now: CPU 7241, GPU 1139 (MiB)
[09/03/2022-13:53:26] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +192, GPU +68, now: CPU 7537, GPU 1207 (MiB)
Succeeded finding onnx file!
[09/03/2022-13:53:26] [TRT] [V] Registered plugin creator - ::GridAnchor_TRT version 1
[... ~40 similar "Registered plugin creator" lines elided ...]
[09/03/2022-13:53:26] [TRT] [V] Adding network input: src with dtype: float32, dimensions: (-1, 3, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: src for ONNX tensor: src
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r1i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r1i for ONNX tensor: r1i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r2i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r2i for ONNX tensor: r2i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r3i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r3i for ONNX tensor: r3i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r4i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r4i for ONNX tensor: r4i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: downsample_ratio with dtype: float32, dimensions: (1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: downsample_ratio for ONNX tensor: downsample_ratio
[09/03/2022-13:53:26] [TRT] [V] Importing initializer: backbone.features.4.block.2.fc1.weight
[... ~170 similar "Importing initializer" lines elided ...]
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Constant_0 [Constant]
[09/03/2022-13:53:26] [TRT] [V] Constant_0 [Constant] inputs:
[09/03/2022-13:53:26] [TRT] [V] Constant_0 [Constant] outputs: [388 -> (0)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Constant_1 [Constant]
[09/03/2022-13:53:26] [TRT] [V] Constant_1 [Constant] inputs:
[09/03/2022-13:53:26] [TRT] [V] Constant_1 [Constant] outputs: [389 -> (2)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Concat_2 [Concat]
[09/03/2022-13:53:26] [TRT] [V] Searching for input: 389
[09/03/2022-13:53:26] [TRT] [V] Searching for input: downsample_ratio
[09/03/2022-13:53:26] [TRT] [V] Searching for input: downsample_ratio
[09/03/2022-13:53:26] [TRT] [V] Concat_2 [Concat] inputs: [389 -> (2)[FLOAT]], [downsample_ratio -> (1)[FLOAT]], [downsample_ratio -> (1)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Registering layer: 389 for ONNX node: 389
[09/03/2022-13:53:26] [TRT] [V] Registering layer: Concat_2 for ONNX node: Concat_2
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: 390 for ONNX tensor: 390
[09/03/2022-13:53:26] [TRT] [V] Concat_2 [Concat] outputs: [390 -> (4)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Resize_3 [Resize]
[09/03/2022-13:53:26] [TRT] [V] Searching for input: src
[09/03/2022-13:53:26] [TRT] [V] Searching for input: 388
[09/03/2022-13:53:26] [TRT] [V] Searching for input: 390
[09/03/2022-13:53:26] [TRT] [V] Resize_3 [Resize] inputs: [src -> (-1, 3, -1, -1)[FLOAT]], [388 -> (0)[FLOAT]], [390 -> (4)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Registering layer: Resize_3 for ONNX node: Resize_3
[09/03/2022-13:53:26] [TRT] [V] Running resize layer with:
Transformation mode: pytorch_half_pixel
Resize mode: linear

[09/03/2022-13:53:26] [TRT] [E] [graphShapeAnalyzer.cpp::nvinfer1::builder::anonymous-namespace'::ShapeNodeRemover::analyzeShapes::1237] Error Code 4: Internal Error (downsample_ratio: network input that is shape tensor must have type Int32)
Failed parsing .onnx file!
In node 3 (parseGraph): INVALID_NODE: Invalid Node - Resize_3
[graphShapeAnalyzer.cpp::nvinfer1::builder::anonymous-namespace'::ShapeNodeRemover::analyzeShapes::1237] Error Code 4: Internal Error (downsample_ratio: network input that is shape tensor must have type Int32)

Process finished with exit code 0
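
The final error says a network input used as a shape tensor must be Int32, but RVM's downsample_ratio is a float input feeding the Resize scales. One common workaround (a sketch of mine, with an assumed ratio value) is to freeze that input to a constant in the ONNX graph before parsing:

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("rvm_mobilenetv3_fp32.onnx"))

# hypothetical fixed value; use whatever ratio the deployment actually needs
ratio = gs.Constant("downsample_ratio_const",
                    np.array([0.25], dtype=np.float32))

for node in graph.nodes:
    node.inputs = [ratio if t.name == "downsample_ratio" else t
                   for t in node.inputs]

graph.inputs = [t for t in graph.inputs if t.name != "downsample_ratio"]
graph.cleanup()
onnx.save(gs.export_onnx(graph), "rvm_fixed_ratio.onnx")

This trades the dynamic downsample ratio for a parse-able graph; a separate engine would be needed per ratio.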

core dumped: importing onnxsim and tensorrt together raises free(): invalid pointer

While debugging the Python code below, it core dumps; analysis shows that having onnxsim and tensorrt imported together raises a free(): invalid pointer error.

from onnxsim import simplify
import tensorrt as trt

if __name__ == '__main__':
    pass

python bug.py

free(): invalid pointer
Aborted (core dumped)

pip3 list|grep onnx-sim
onnx-simplifier 0.3.10

pip3 list|grep tensorrt
nvidia-tensorrt 8.2.4.2
tensorrt 8.2.4.2

The C++ example in cookbook/03-APIModel/MNISTExample-pyTorch reports ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information) when run in WSL

The C++ example in cookbook/03-APIModel/MNISTExample-pyTorch compiles fine and its results are correct, but at runtime it reports the error ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)

Running make test:

$ make test
make clean
make[1]: Entering directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
rm -rf ./*.d ./*.o ./*.so ./*.exe ./*.plan
make[1]: Leaving directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
make -j3
make[1]: Entering directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -M -MT main.o -o main.d main.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -M -MT cnpy.o -o cnpy.d cnpy.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -M -MT calibrator.o -o calibrator.d calibrator.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -Xcompiler -fPIC -o calibrator.o -c calibrator.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -Xcompiler -fPIC -o main.o -c main.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -Xcompiler -fPIC -o cnpy.o -c cnpy.cpp
/usr/local/cuda/bin/nvcc -L/usr/local/cuda/lib64 -lcudart -L/opt/TensorRT-8.4.3.1/lib -lnvinfer -lz -o main.exe main.o cnpy.o calibrator.o
make[1]: Leaving directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
python3 ./createCalibrationAndInferenceData.py
Succeeded creating data for calibration and inference!
./main.exe > result-C++.log
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)

Running ./main.exe:

$ ./main.exe
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
Succeeded building serialized engine!
Succeeded building engine!
Binding all? Yes
Bind[0]:i[0]->FLOAT (1, 1, 28, 28) inputT0
Bind[1]:o[0]->INT32 (1, 1) (Unnamed Layer* 17) [TopK]_output_2

inputT0: (1, 1, 28, 28, )
absSum=33566.0000,mean=42.8138,var=7573.8174,max=255.0000,min= 0.0000,diff=15760.0000,
 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 
 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 

(Unnamed Layer* 17) [TopK]_output_2: (1, 1, )
absSum= 8.0000,mean= 8.0000,var= 0.0000,max=      8,min=      8,diff= 0.0000,
       8, 
       8, 
     8 

Environment:
Windows 11 (22000.856), WSL2
GPU: NVIDIA GeForce RTX 3070 Laptop, driver version 512.78
CUDA: 11.6.2, installed via the .sh installer
CUDNN: 8.4.1.50, installed from the tar file
TensorRT: 8.4.3.1, installed from the tar file

conda:
cuda-python 11.6
cudatoolkit 11.6
cudnn 8.4.1.50

where is the int8.cache file?

I ran the pyTorchToTensor.py file. In the code there is:
config.int8_calibrator = calibrator.MyCalibrator(calibrationDataPath, calibrationCount, (1, 1, imageHeight, imageWidth), cacheFile)
with cacheFile = './int8.cache'.
Where is the int8.cache file?
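
For context, a sketch (not the repo's exact code) of the callback that creates the file: TensorRT calls write_calibration_cache only after INT8 calibration actually runs during engine build, so ./int8.cache appears in the process's current working directory at that point:

import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cacheFile="./int8.cache"):
        super().__init__()
        self.cacheFile = cacheFile

    def write_calibration_cache(self, cache):
        # Called by TensorRT once calibration completes; the cache file is
        # written relative to the current working directory.
        with open(self.cacheFile, "wb") as f:
            f.write(cache)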

batchedNMSPlugin is not compatible with TensorRT 8?

According to batchedNMSPlugin, batchedNMSPlugin is not compatible with TensorRT 8.

I have added batchedNMSPlugin to my tensorrt model,
and it runs fine except that the speed is slow.
Is that incompatibility the reason for the slow speed?
Also, how can I add NMS to the model to reduce the size of the output tensor transferred over the network?

Where plugins allocate resources can waste memory and can be optimized

  • Environment

    • TensorRT 7.x and later
    • Unrelated to the versions of other software
  • Background

  1. In the plugins provided by TensorRT-OSS (such as embLayerNormPlugin and fcPlugin), resource allocation (device memory, handles, etc.) has been done in the constructor since 7.x; before 7.x, as I recall, it was done in initialize().
  2. During the build and infer phases, the trt plugin methods are called in the following order:
// build phase
1. Plugin::Plugin
2. Plugin::clone
3. Plugin::Plugin
4. Plugin::destroy
5. Plugin::clone
6. Plugin::Plugin
7. Plugin::clone
8. Plugin::Plugin
9. Plugin::clone
10. Plugin::Plugin
11. Plugin::destroy
12. Plugin::initialize
13. Plugin::destroy
14. Plugin::terminate
15. Plugin::destroy
16. Plugin::destroy

// infer phase
1. Plugin::deserialize_value
2. Plugin::initialize
3. Plugin::clone
4. Plugin::Plugin
5. Plugin::enqueue
6. Plugin::terminate
7. Plugin::destroy
  • Observed problems
  1. Resources allocated in a plugin are expected to exist in only one copy during the build or infer phase.
  2. Because allocation happens in the constructor, every clone of the plugin class allocates another copy of the resources.
  3. From the call order above, during the build phase there are four copies of the resources in memory before the destroy at step 11.
  • Possible serious consequences
  1. During the build phase, trt's device-memory consumption is already high.
  2. Taking a large AI model I worked on as an example: one layer's weights total 32 MB and there are 18 layers, so an extra 18 * 32 * 3 = 1728 MB of device memory is consumed.
  3. AI models and fused operators keep getting bigger, so this problem will become more and more visible.
  • Attempted solutions
  1. Weights are easy to handle: feed them in as inputs instead (as groupNormalizationPlugin does).
  2. Handles and the like are trickier (creating streams and cublas handles also consumes device memory). If allocation is moved into initialize(), the infer phase clones again after initialize()... The solution I have in mind is three constructors: one called from createPlugin, one from clone, and one from deserialization, with resource allocation in the first and third.
