
trt-samples-for-hackathon-cn's Introduction

NVIDIA TensorRT Tutorial repository

This repository is aimed at NVIDIA TensorRT beginners and developers. We provide TensorRT-related learning and reference materials, code examples, and summaries of the annual TensorRT Hackathon competition information.


Introduction of each directory

  • Hackathon*, a summary of the annual China TensorRT Hackathon competition, including the competition introduction, resources related to the competition topics, and reference implementations (which can be regarded as optimization cases for classic models).

  • cookbook, a TensorRT recipe book containing rich examples of TensorRT code, such as API usage, the process of building and running models in TensorRT with the native API or with parsers, writing TensorRT plugins, computation-graph optimization, and more advanced TensorRT techniques.

  • old, other TensorRT sample code that will gradually be migrated into the cookbook.

Resources

  • link, Extraction code (提取码): gpq2
    • Slides of the TensorRT video tutorials on Bilibili
    • Files and information for the annual China TensorRT Hackathon competition

trt-samples-for-hackathon-cn's People

Contributors

jedibobo · jie-fang · nvgaryji · shining365 · wili-65535 · xueweilnvidia


trt-samples-for-hackathon-cn's Issues

Can I compare the numerical error between ONNX, PyTorch, and TRT?

I want to compare the numerical precision when I convert a PyTorch model to an ONNX or TRT model.
My code is as follows:

import time
import onnx
import torch
import torchvision
import numpy as np

model = torchvision.models.resnet50(pretrained=True).cuda()

input_names = ['input']
output_names = ['output']
image = torch.ones(1, 3, 224, 224).cuda()

# export the model to ONNX with a dynamic batch dimension
onnx_file = "./resnet50.onnx"
torch.onnx.export(model, image, onnx_file, verbose=False,
                  input_names=input_names, output_names=output_names,
                  opset_version=11,
                  dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}})

net = onnx.load("./resnet50.onnx")
onnx.checker.check_model(net)

# PyTorch reference output
model.eval()
with torch.no_grad():
    output1 = model(image)

import onnxruntime

# ONNX Runtime output
session = onnxruntime.InferenceSession("./resnet50.onnx")
session.get_modelmeta()
output2 = session.run(['output'], {"input": image.cpu().numpy()})

print(torch.mean(output1[0]))
print(np.mean(output2[0][0]))
print("{}vs{}".format(output1.mean(), output2[0].mean()))

The corresponding results are:
tensor(6.2914e-06, device='cuda:0')
2.6550292e-06
6.291389581747353e-06vs2.655029220477445e-06

I don't know why I get such a large error. Is it a problem with my system or with the way I run it? How can I keep the computed results consistent when converting the PyTorch model to an ONNX or TRT model?
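
For what it's worth, comparing means can mask or exaggerate the real difference; an elementwise comparison is more informative. A minimal sketch (assuming the resnet50.onnx exported above already exists):

import numpy as np
import onnxruntime
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).cuda().eval()
image = torch.ones(1, 3, 224, 224).cuda()

with torch.no_grad():
    ref = model(image).cpu().numpy()          # PyTorch reference

# CPU provider keeps the comparison independent of GPU kernels
session = onnxruntime.InferenceSession(
    "./resnet50.onnx", providers=["CPUExecutionProvider"])
out = session.run(["output"], {"input": image.cpu().numpy()})[0]

print("max abs diff  :", np.abs(ref - out).max())
print("allclose(1e-3):", np.allclose(ref, out, rtol=1e-3, atol=1e-3))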

RuntimeError: Function "cuMemAllocAsync" not found

When I ran the example file TensorFlowToTensorRT-NHWC.py, it raised:
Traceback (most recent call last):
File "TensorFlowToTensorRT-NHWC.py", line 161, in
_, inputD0 = cudart.cudaMallocAsync(inputH0.nbytes, stream)
File "cuda/cudart.pyx", line 16938, in cuda.cudart.cudaMallocAsync
File "cuda/ccudart.pyx", line 1210, in cuda.ccudart.cudaMallocAsync
File "cuda/_cuda/ccuda.pyx", line 4970, in cuda._cuda.ccuda._cuMemAllocAsync
RuntimeError: Function "cuMemAllocAsync" not found.


I'm using the NVIDIA NGC image nvcr.io/nvidia/tensorflow:21.12-tf1-py3; the detailed environment is as follows (a possible workaround is sketched after the package list):
GeForce RTX 2080 Ti, Driver Version: 455.23.05, nvcr.io/nvidia/tensorflow:21.12-tf1-py3,
Package Version


absl-py 1.0.0
appdirs 1.4.4
argon2-cffi 21.1.0
asgiref 3.4.1
astor 0.8.1
astunparse 1.6.3
attrs 21.2.0
audioread 2.1.9
backcall 0.2.0
bleach 4.1.0
cachetools 4.2.4
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.8
click 8.0.3
cloudpickle 2.0.0
cmake-setuptools 0.1.3
cuda-python 11.7.0
cudf 21.10.0a0+345.ge05bd4bf3c
cugraph 21.10.0a0+102.gab401cad
cuml 21.10.0a0+116.gdc14361ba
cupy-cuda114 9.3.0
cupy-cuda115 9.6.0
cycler 0.11.0
Cython 0.29.24
dask 2021.9.1
dask-cuda 21.10.0
dask-cudf 21.10.0a0+345.ge05bd4bf3c
dask-glm 0.2.0
dask-ml 1.9.0
debugpy 1.5.1
decorator 5.1.0
defusedxml 0.7.1
distributed 2021.9.1
Django 3.2.6
entrypoints 0.3
fastavro 1.4.4
fastrlock 0.8
filelock 3.4.0
flatbuffers 1.12
fsspec 2021.7.0
future 0.18.2
gast 0.3.3
google-pasta 0.2.0
graphsurgeon 0.4.5
grpcio 1.42.0
gunicorn 20.1.0
h11 0.12.0
h5py 2.10.0
HeapDict 1.0.1
horovod 0.22.1
httptools 0.2.0
huggingface-hub 0.0.12
idna 3.3
importlib-metadata 4.8.2
importlib-resources 5.4.0
iniconfig 1.1.1
ipykernel 6.6.0
ipython 7.30.0
ipython-genutils 0.2.0
jedi 0.18.1
Jinja2 3.0.3
joblib 1.1.0
json5 0.9.6
jsonschema 4.2.1
jupyter-client 7.1.0
jupyter-core 4.9.1
jupyter-tensorboard 0.2.0
jupyterlab 2.3.2
jupyterlab-pygments 0.1.2
jupyterlab-server 1.2.0
jupytext 1.13.2
Keras-Applications 1.0.8
Keras-Preprocessing 1.0.5
kiwisolver 1.3.2
librosa 0.9.1
llvmlite 0.36.0
locket 0.2.1
Markdown 3.3.6
markdown-it-py 1.1.0
MarkupSafe 2.0.1
matplotlib 3.4.3
matplotlib-inline 0.1.3
mdit-py-plugins 0.2.8
mistune 0.8.4
mock 3.0.5
msgpack 1.0.3
multipledispatch 0.6.0
nbclient 0.5.9
nbconvert 6.3.0
nbformat 5.1.3
nest-asyncio 1.5.4
networkx 2.6.3
nltk 3.6.4
notebook 6.4.3
numba 0.53.1
numpy 1.22.4
nvidia-dali-cuda110 1.8.0
nvidia-dali-tf-plugin-cuda110 1.8.0
nvidia-dlprofviewer 1.8.0
nvidia-pyindex 1.0.9
nvtx 0.2.3
onnx 1.11.0
onnxruntime-gpu 1.11.1
opencv-python 4.5.5.64
opt-einsum 3.3.0
packaging 21.3
pandas 1.2.5
pandocfilters 1.5.0
parso 0.8.3
partd 1.2.0
pexpect 4.7.0
pickleshare 0.7.5
Pillow 8.4.0
pip 21.3.1
pluggy 1.0.0
polygraphy 0.33.0
pooch 1.6.0
portpicker 1.3.1
prometheus-client 0.12.0
prompt-toolkit 3.0.23
protobuf 3.19.1
psutil 5.7.0
ptyprocess 0.7.0
py 1.11.0
pyarrow 5.0.0
pycparser 2.21
Pygments 2.10.0
pynvml 11.4.1
pyparsing 3.0.6
pypi-kenlm 0.1.20210121
pyrsistent 0.18.0
pytest 6.2.5
python-dateutil 2.8.2
python-dotenv 0.19.2
pytz 2021.3
PyYAML 6.0
pyzmq 22.3.0
regex 2021.11.10
requests 2.26.0
resampy 0.2.2
rmm 21.10.0a0+42.gae27a57
sacremoses 0.0.46
scikit-learn 0.24.0
scipy 1.4.1
Send2Trash 1.8.0
setuptools 59.4.0
six 1.16.0
sortedcontainers 2.4.0
SoundFile 0.10.3.post1
sqlparse 0.4.2
tblib 1.7.0
tensorboard 1.15.0
tensorflow 1.15.5+nv
tensorflow-estimator 1.15.1
tensorrt 8.2.1.8
termcolor 1.1.0
terminado 0.12.1
testpath 0.5.0
tf2onnx 1.10.1
threadpoolctl 3.0.0
tokenizers 0.10.3
toml 0.10.2
toolz 0.11.2
tornado 6.1
tqdm 4.62.3
traitlets 5.1.1
transformers 4.9.1
treelite 2.1.0
treelite-runtime 2.1.0
typing_extensions 4.0.1
ucx-py 0.21.0a0+37.gbfa0450
uff 0.6.9
urllib3 1.26.7
uvicorn 0.15.0
uvloop 0.16.0
watchgod 0.7
wcwidth 0.2.5
webencodings 0.5.1
websockets 10.1
Werkzeug 2.0.2
wheel 0.37.0
whitenoise 5.3.0
wrapt 1.13.3
xgboost 1.4.2
zict 2.0.0
zipp 3.6.0
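
For context (my own note, not from the original report): cuMemAllocAsync belongs to the stream-ordered memory allocator introduced in CUDA 11.2, and driver 455.23.05 is a CUDA 11.1-era driver, so the symbol genuinely does not exist in that driver even though the container ships cuda-python 11.7. Upgrading the driver is the proper fix; until then, a fallback sketch:

from cuda import cudart

def device_malloc(nbytes, stream):
    # Prefer the stream-ordered allocator; fall back to plain cudaMalloc on
    # drivers older than CUDA 11.2, where cuMemAllocAsync is absent.
    try:
        err, ptr = cudart.cudaMallocAsync(nbytes, stream)
    except RuntimeError:  # e.g. RuntimeError: Function "cuMemAllocAsync" not found
        err, ptr = cudart.cudaMalloc(nbytes)
    assert err == cudart.cudaError_t.cudaSuccess
    return ptr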

Problems in CookBook Demo 08 polygraphy

I wonder which TensorFlow version this repo is using.
For the container you give, the CUDA version is 11.6, so I installed tf's newest version.
I tried the newest tf 2.8.0-gpu, and a lot of problems arise from the tf version, e.g.:
tf.graph_util.convert_variables_to_constants, initializer=tf.truncated_normal_initializer, and tf.gfile.FastGFile are all unrecognized by tf 2.8.0.
So I changed all of those to tf.compat.v1.***, ran it, and found no errors.
Maybe I could commit my changes and create a PR.
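
For reference, a minimal sketch of that substitution pattern (my own illustration, not code from the repo): the TF1 APIs the demo uses still exist under the tf.compat.v1 namespace in TF 2.x:

import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # TF1-style graphs need eager mode off

x = tf1.placeholder(tf.float32, [None, 4], name="x")
w = tf1.get_variable("w", [4, 2], initializer=tf1.truncated_normal_initializer())
y = tf.identity(tf.matmul(x, w), name="y")

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    # tf.graph_util.convert_variables_to_constants -> tf.compat.v1.graph_util...
    frozen = tf1.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["y"])
    # tf.gfile.FastGFile -> tf.compat.v1.gfile.FastGFile
    with tf1.gfile.FastGFile("./model.pb", "wb") as f:
        f.write(frozen.SerializeToString())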

04-Parser run error

I use nvcr.io/nvidia/pytorch:21.10-py3.

Traceback (most recent call last):
File "cuda/_cuda/ccuda.pyx", line 3553, in cuda._cuda.ccuda._cuInit
File "cuda/_cuda/ccuda.pyx", line 424, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Exception ignored in: 'cuda._lib.ccudart.utils.cudaPythonGlobal.lazyInit'
Traceback (most recent call last):
File "cuda/_cuda/ccuda.pyx", line 3553, in cuda._cuda.ccuda._cuInit
File "cuda/_cuda/ccuda.pyx", line 424, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Traceback (most recent call last):
File "pyTorchToTensorRT.py", line 49, in
cudart.cudaDeviceSynchronize()
File "cuda/cudart.pyx", line 6923, in cuda.cudart.cudaDeviceSynchronize
File "cuda/ccudart.pyx", line 25, in cuda.ccudart.cudaDeviceSynchronize
File "cuda/cuda/ccuda.pyx", line 3853, in cuda.

failed reading calibration cache via c++ api

void const *MyCalibrator::readCalibrationCache(std::size_t &length) noexcept
{
    std::fstream f;
    f.open(cacheFile, std::fstream::in);
    if (f.fail())
    {
        std::cout << "Failed finding cache file!" << std::endl;
        return nullptr;
    }
    char *ptr = new char[length];
    if (f.is_open())
    {
        f >> ptr;
    }
    return ptr;
}

operator>> on line 70 only reads the first word, stopping at whitespace or '\n'.

The same code appears in the '李玮' demo.
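
For comparison, the Python-API calibrators avoid this pitfall by reading the whole file in binary mode. A minimal sketch (assuming a cacheFile attribute, as in the cookbook calibrators):

import os
import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cacheFile="./int8.cache"):
        super().__init__()
        self.cacheFile = cacheFile

    def read_calibration_cache(self):
        # Read the entire cache as bytes; returning None triggers recalibration.
        if not os.path.exists(self.cacheFile):
            print("Failed finding cache file!")
            return None
        with open(self.cacheFile, "rb") as f:
            return f.read()

The equivalent C++ fix is to read the file's full contents (e.g. with read() and the file's actual size) instead of using operator>>.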

Multi-model concurrency

Hi,
The official documentation says:
In general TensorRT objects are not thread-safe. The expected runtime concurrency model is that different threads will operate on different execution contexts. The context contains the state of the network (activation values etc) during execution, so using a context concurrently in different threads results in undefined behavior.
If I run parallel inference on multiple models at the same time, each with its own engine, is that unsafe?
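
For reference, the pattern the quoted passage describes looks roughly like this (a sketch; the engine path is assumed): one engine may be shared across threads, but each thread needs its own execution context and CUDA stream.

import threading

import tensorrt as trt
from cuda import cudart

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("engine.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())  # shared, read-only

def worker():
    context = engine.create_execution_context()  # one context per thread
    _, stream = cudart.cudaStreamCreate()        # one stream per thread
    # ... allocate bindings, then context.execute_async_v2(bindings, stream) ...
    cudart.cudaStreamDestroy(stream)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Running several independent engines in parallel follows the same rule: as long as no context (or other non-thread-safe object) is shared between threads, concurrent inference stays within the documented model.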

The LayerNorm demo seems to fail, and it contains a few errors.

File: https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/06-PluginAndParser/pyTorch-LayerNorm/main.py

Error 1: onnx-graphsurgeon reports "out of index" during the modification, because there is no node after Div; the cause is that torch.mul(x, 1) is dropped by the ONNX export.
Fix: change x = t.mul(x, 1) on lines 61 and 63 to x = t.mul(x, 1.0).

Error 2: onnxFile on lines 125 and 129 looks wrong. This demo rewrites ONNX nodes into a LayerNorm node, so it should run the modified file onnxSurgeonFile instead. After this change the check result is False.

Error 3: the LayerNorm on line 58 should only take the last dimension, but the author passed all three. Change t.nn.LayerNorm([nBS, nSL, nEmbedding], elementwise_affine=False, eps=epsilon) to t.nn.LayerNorm(nEmbedding, elementwise_affine=False, eps=epsilon).

Bug: unset (unused) input is not removed

  • Environment

    • TensorRT 8.4 GA
    • CUDA 11.6 + CuDNN 8.4.0 + CUBLAS 11.9.2
    • NVIDIA A10
    • NVIDIA Driver 510.73.08
  • Reproduction Steps
    TensorRT removes unused inputs, but in the example below, input y is unused yet still remains in the generated engine.
    Under trt7.2.3 + cuda10.2 + cudnn8.0 the same test shows no problem.

import os
import math
import numpy as np

import tensorrt as trt
# import numpy as np

import pycuda.driver as cuda
import pycuda.autoinit

logger = trt.Logger(trt.Logger.VERBOSE)

def build(plan_name, dim):
    # create builder, network and builder_config
    builder = trt.Builder(logger)
    if not builder:
        raise RuntimeError("create trt.Builder failed!")

    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)

    builder_config = builder.create_builder_config()
    if not builder_config:
        raise RuntimeError("create_builder_config failed!")

    builder_config.max_workspace_size = 3 * (1024 * 1024 * 1024)

    x = network.add_input(name="x", dtype=trt.float32, shape=(-1, dim, dim))
    y = network.add_input(name="y", dtype=trt.float32, shape=(-1, dim, dim))

    input_len = len(x.shape)

    # softmax
    trt_layer = network.add_softmax(x)
    softmax_dim = input_len - 1
    # trt_layer.axes = 1 << (softmax_dim)
    trt_layer.axes = 1 << 2
    # trt_layer.axes = int(math.pow(2, input_len-1))
    print(f"trt_layer.axes={trt_layer.axes}")
    x = trt_layer.get_output(0)
    print("=====add softmax")
    network.mark_output(x)

    profile = builder.create_optimization_profile()
    profile.set_shape("x", min=(1, dim, dim), opt=(20, dim, dim), max=(120, dim, dim))
    profile.set_shape("y", min=(1, dim, dim), opt=(20, dim, dim), max=(120, dim, dim))
    builder_config.add_optimization_profile(profile)

    engine = builder.build_engine(network, builder_config)
    if not engine:
        raise RuntimeError("build_engine failed")

    print("====================get_binding_shape=====================")
    for i in range(0, engine.num_bindings):
        print("get_binding_shape:" + str(engine.get_binding_name(i)))
    print("====================get_binding_shape=====================")

    serialized_engine = engine.serialize()
    if serialized_engine is None:
        raise RuntimeError("serialize failed")

    with open(plan_name, "wb") as fout:
        fout.write(serialized_engine)

if __name__ == '__main__':
    plan_name = "engine.plan"
    # x_arr = np.load("0.npy")
    # dim = x_arr.shape[1]
    M = 10
    dim = 512
    x_arr = np.random.rand(M, dim, dim).astype(np.float32)
    # print(x_arr.shape[1])
    # print(x_arr)
    # assert 0
    build(plan_name, dim)

    # infer_helper = InferHelper(plan_name, logger)
    # infer_helper.infer([x_arr])

  • Output log
[06/27/2022-16:53:12] [TRT] [W] Unused Input: y
[06/27/2022-16:53:12] [TRT] [V] Applying generic optimizations to the graph for inference.
[06/27/2022-16:53:12] [TRT] [V] Original: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After dead-layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After Myelin optimization: 1 layers
[06/27/2022-16:53:12] [TRT] [V] Applying ScaleNodes fusions.
[06/27/2022-16:53:12] [TRT] [V] After scale fusion: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After dupe layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After final dead-layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After tensor merging: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After vertical fusions: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After dupe layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [W] [RemoveDeadLayers] Input Tensor y is unused or used only at compile-time, but is not being removed.
[06/27/2022-16:53:12] [TRT] [V] After final dead-layer removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After tensor merging: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After slice removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] After concat removal: 1 layers
[06/27/2022-16:53:12] [TRT] [V] Trying to split Reshape and strided tensor
[06/27/2022-16:53:12] [TRT] [V] Graph construction and optimization completed in 0.000592053 seconds.
[06/27/2022-16:53:13] [TRT] [V] Using cublasLt as a tactic source
[06/27/2022-16:53:13] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +871, GPU +378, now: CPU 1569, GPU 885 (MiB)
[06/27/2022-16:53:13] [TRT] [V] Using cuDNN as a tactic source
[06/27/2022-16:53:13] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +127, GPU +58, now: CPU 1696, GPU 943 (MiB)
[06/27/2022-16:53:13] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
[06/27/2022-16:53:13] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/27/2022-16:53:13] [TRT] [V] Constructing optimization profile number 0 [1/1].
[06/27/2022-16:53:13] [TRT] [V] Reserving memory for host IO tensors. Host: 0 bytes
[06/27/2022-16:53:13] [TRT] [V] =============== Computing reformatting costs
[06/27/2022-16:53:13] [TRT] [V] =============== Computing reformatting costs
[06/27/2022-16:53:13] [TRT] [V] =============== Computing costs for
[06/27/2022-16:53:13] [TRT] [V] *************** Autotuning format combination: Float(262144,512,1) -> Float(262144,512,1) ***************
[06/27/2022-16:53:13] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 0) [Softmax] (CudaSoftMax)
[06/27/2022-16:53:13] [TRT] [V] Tactic: 0x00000000000003ea Time: 0.100779
[06/27/2022-16:53:13] [TRT] [V] Tactic: 0x00000000000003e9 Time: 0.086352
[06/27/2022-16:53:13] [TRT] [V] Fastest Tactic: 0x00000000000003e9 Time: 0.086352
[06/27/2022-16:53:13] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: CudaSoftMax Tactic: 0x00000000000003e9
[06/27/2022-16:53:13] [TRT] [V] Formats and tactics selection completed in 0.00498304 seconds.
[06/27/2022-16:53:13] [TRT] [V] After reformat layers: 1 layers
[06/27/2022-16:53:13] [TRT] [V] Pre-optimized block assignment.
[06/27/2022-16:53:13] [TRT] [V] Block size 3221225472
[06/27/2022-16:53:13] [TRT] [V] Total Activation Memory: 3221225472
[06/27/2022-16:53:13] [TRT] [I] Detected 2 inputs and 1 output network tensors.
[06/27/2022-16:53:13] [TRT] [V] Layer: (Unnamed Layer* 0) [Softmax] Host Persistent: 0 Device Persistent: 0 Scratch Memory: 0
[06/27/2022-16:53:13] [TRT] [I] Total Host Persistent Memory: 0
[06/27/2022-16:53:13] [TRT] [I] Total Device Persistent Memory: 0
[06/27/2022-16:53:13] [TRT] [I] Total Scratch Memory: 0
[06/27/2022-16:53:13] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 256 MiB
[06/27/2022-16:53:13] [TRT] [V] Optimized block assignment.
[06/27/2022-16:53:13] [TRT] [I] Total Activation Memory: 0
[06/27/2022-16:53:13] [TRT] [V] Disabling unused tactic source: CUDNN
[06/27/2022-16:53:13] [TRT] [V] Disabling unused tactic source: CUBLAS, CUBLAS_LT
[06/27/2022-16:53:13] [TRT] [V] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[06/27/2022-16:53:13] [TRT] [V] Engine generation completed in 0.968591 seconds.
[06/27/2022-16:53:13] [TRT] [V] Deleting timing cache: 1 entries, served 0 hits since creation.
[06/27/2022-16:53:13] [TRT] [V] Engine Layer Information:
Layer(CudaSoftMax): (Unnamed Layer* 0) [Softmax], Tactic: 0x00000000000003e9, x[Float(-5,512,512)] -> (Unnamed Layer* 0) [Softmax]_output[Float(-5,512,512)]
[06/27/2022-16:53:13] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
====================get_binding_shape=====================
get_binding_shape:x
get_binding_shape:y
get_binding_shape:(Unnamed Layer* 0) [Softmax]_output
====================get_binding_shape=====================
[06/27/2022-16:53:13] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/27/2022-16:53:13] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.

Engines built by different TensorRT versions differ significantly in accuracy

  • Problem description

    • Our onnx model builds an engine with correct inference accuracy under TRT 8.2.4.2, but the engine built under TRT 8.4.1.5 shows a large accuracy deviation.
    • After inserting the MaskedSoftmax Plugin, the engines built in both environments are accurate, so that is the direction worth investigating.
  • Environment

    • TensorRT 8.4.1.5 and TensorRT 8.2.4.2
    • CUDA Driver Version = 11.2, CUDA Runtime Version = 11.2
    • Docker registry.cn-hangzhou.aliyuncs.com/trt2022/trt-8.4-ga and nvcr.io/nvidia/pytorch:22.04-py3
  • Reproduction Steps

  • Expected Behavior
    (screenshot: 3090 + TRT 8.2.4.2)
    Comparing the engine built on a 3090 with TRT 8.2.4.2 against the onnx results, the error is very small.

  • Actual Behavior
    (screenshot: 3090 + TRT 8.4.1.5)
    Comparing the engine built on a 3090 with TRT 8.4.1.5 against the onnx results, the error is large.

cookbook-04Parser-pytorch-onnx-tensorrt issues

my env:
WSL, Ubuntu 20.04
tensorrt 8.4
cuda 11.3
torch 1.10
onnx 1.12.0

Test code: main.py on the master branch.
When I test the example code, the TensorRT output is all "0 0 0 0 0 0 0 0", and I have found no way to solve the problem.
The C++ code behaves the same.

#59 (comment)

How to use the scatterND plugin?

Hi! I want to convert an ONNX model to TensorRT. However, the error says scatterND is not supported by TensorRT. How can I implement this custom plugin? Thanks a lot.
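
Not from this thread, but possibly relevant: TensorRT ships a ScatterND plugin in libnvinfer_plugin (it appears as ::ScatterND version 1 in verbose logs), so registering the built-in plugins before parsing may already resolve the unsupported-operator error. A sketch (model path assumed):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # registers ::ScatterND among others

builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
if not parser.parse_from_file("model.onnx"):
    for i in range(parser.num_errors):
        print(parser.get_error(i))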

Compiling for TensorRT 7.1.3.4 with CUDA 10.2

Hi,

Thanks for your good work.

I want to build the same on my local machine, which has TensorRT 7.1.3.4 with CUDA 10.2.
But when I run it I face issues.
What do I need to change to make it work with TensorRT 7.1.3.4 and CUDA 10.2?

Thanks,
Darshan C G

ReflectPadding Parse Error

  • Environment

    • NVIDIA A10
    • TensorRT 8.4 GA
    • CUDA 11.6
    • CuDNN 8.4.0
    • CUBLAS 11.9.2
    • NVIDIA Driver 510.73.08
  • Reproduction Steps

    • CASE

      • unit.py: Generate ONNX graph with a Pad layer with reflect mode.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F
        
        
        
        class ReflectPad(nn.Module):
            def __init__(self):
                super(ReflectPad, self).__init__()
        
            def forward(self, input):
                out = F.pad(input, (0, 1, 0, 2), "reflect")
                return out
        
        input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3).cuda()
        print(input)
        rp = ReflectPad().cuda()
        out = rp(input)
        print(out)
        
        torch.onnx.export(rp,
                          input,
                          "unit.onnx",
                          input_names=["input"],
                          output_names=["output"],
                          verbose=True,
                          keep_initializers_as_inputs=True,
                          opset_version=13,
                          dynamic_axes={"input": {0: "batch_size"}})
      • parse.sh: parse the onnx graph generated

        #!/usr/bin/env bash
        python3 unit.py
        
        trtexec \
                --onnx=unit.onnx \
                --explicitBatch \
                --minShapes=lr:1x3x64x64 \
                --optShapes=lr:1x3x80x80 \
                --maxShapes=lr:1x3x120x120 \
                --saveEngine=unit.plan \
                --workspace=40960 \
                --buildOnly \
                --noTF32 \
        --verbose
    • Run this case:

      bash parse.sh
  • Expected Behavior

  • Actual Behavior

    • Error occurred:

      [06/27/2022-11:15:47] [E] [TRT] ModelImporter.cpp:776: --- End node ---
      [06/27/2022-11:15:47] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
      [6] Invalid Node - Pad_13
      [shuffleNode.cpp::symbolicExecute::392] Error Code 4: Internal Error (Reshape_3: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
      [06/27/2022-11:15:47] [E] Failed to parse onnx file
      [06/27/2022-11:15:47] [I] Finish parsing network model
      [06/27/2022-11:15:47] [E] Parsing model failed
      [06/27/2022-11:15:47] [E] Failed to create engine from model or file.
      [06/27/2022-11:15:47] [E] Engine set up failed
      &&&& FAILED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=unit.onnx --explicitBatch --minShapes=lr:1x3x64x64 --optShapes=lr:1x3x80x80 --maxShapes=lr:1x3x120x120 --saveEngine=unit.plan --workspace=40960 --buildOnly --noTF32 --verbose
      
  • Additional Notes

06 LayerNormPlugin error!

my env:
cuda: 10.2
cudnn:8.3.2
tensorrt:8.4.0.6
GPU:T4
server: CentOS

When I run python testLayerNormPlugin.py in 06-PluginAndParser/pyTorch-LayerNorm, there is an error:
[TRT] [E] [executionContext.cpp::enqueueInternal::329] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::329, condition: bindings[x] != nullptr).
How can I fix this bug?
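
That error usually means one of the binding slots was given a null device pointer at enqueue time. A small diagnostic sketch (my own, not from the repo) to verify that every binding received a buffer before enqueue:

import tensorrt as trt

def check_bindings(engine, context, bindings):
    # Print every binding and assert none of the device pointers is null.
    for i in range(engine.num_bindings):
        kind = "input " if engine.binding_is_input(i) else "output"
        print("[%d] %s %s shape=%s ptr=%s" % (
            i, kind, engine.get_binding_name(i),
            context.get_binding_shape(i), bindings[i]))
        assert bindings[i] != 0, "binding %d was never allocated" % i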

TensorRT trtexec onnx export bug

Description

Tested with TensorRT-8.2.5.0 and TensorRT-8.4.1.5.

Environment

TensorRT Version: TensorRT-8.4.1.5
NVIDIA GPU: A10
NVIDIA Driver Version:510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4
Operating System: ubuntu20.04
Python Version (if applicable): python3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

onnx: https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/encoder_final_quant.onnx

Steps To Reproduce

This is my cmd and log:

&&&& RUNNING TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=./encoder_final_quant.onnx --saveEngine=./encoder.plan --minShapes=speech:1x16x80,speech_lengths:1 --optShapes=speech:4x64x80,speech_lengths:4 --maxShapes=speech:16x256x80,speech_lengths:16 --workspace=23028 --verbose --int8

log: https://oneflow-static.oss-cn-beijing.aliyuncs.com/tripleMu/log_encoder.txt

Possible overflow bug when softmax and topk are fused into one kernel?

  • Environment

    • TensorRT 8.4 GA
    • CUDA 11.6 + CuDNN 8.4.0 + CUBLAS 11.9.2
    • NVIDIA A10
    • NVIDIA Driver 510.73.08
  • Reproduction Steps
    With softmax followed by topk (k=1), trt fuses the two operators into one.
    For certain inputs, the fused kernel produces NaN, while the unfused version looks correct.
    The code is as follows:

import os
import math
import numpy as np

import tensorrt as trt
# import numpy as np

import pycuda.driver as cuda
import pycuda.autoinit

logger = trt.Logger(trt.Logger.VERBOSE)

class InferHelper():
    """"""
    def __init__(self, plan_name, trt_logger):
        """"""
        self.logger = trt_logger
        self.runtime = trt.Runtime(trt_logger)
        with open(plan_name, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
            self.context = self.engine.create_execution_context()
            self.context.active_optimization_profile = 0

    def infer(self, inputs: list):
        nInput = len(inputs)

        bufferD = []
        # alloc memory
        for i in range(nInput):
            bufferD.append(cuda.mem_alloc(inputs[i].nbytes))
            cuda.memcpy_htod(bufferD[i], inputs[i].ravel())
            self.context.set_binding_shape(i, tuple(inputs[i].shape))
            # print(inputs[i].nbytes)

        for i in range(0, self.engine.num_bindings):
            print("get_binding_shape:" + str(self.context.get_binding_shape(i)))

        outputs = []
        for i in range(len(inputs), self.engine.num_bindings):
            outputs.append(np.zeros(self.context.get_binding_shape(i)).astype(np.float32))

        nOutput = len(outputs)
        for i in range(nOutput):
            bufferD.append(cuda.mem_alloc(outputs[i].nbytes))
            # print(outputs[i].nbytes)

        for i in range(len(inputs), self.engine.num_bindings):
            trt_output_shape = self.context.get_binding_shape(i)
            output_idx = i - len(inputs)
            if not (list(trt_output_shape) == list(outputs[output_idx].shape)):
                self.logger.log(trt.Logger.ERROR, "[Infer] output shape is error!")
                self.logger.log(trt.Logger.ERROR, "trt_output.shape = " + str(trt_output_shape))
                self.logger.log(trt.Logger.ERROR, "base_output.shape = " + str(outputs[output_idx].shape))
                assert(0)

        # warm up
        self.context.execute_v2(bufferD)

        # T1 = time.perf_counter()

        # self.context.execute_v2(bufferD)

        # T2 =time.perf_counter()
        # print("time=" + str((T2-T1) * 1000) + "ms")

        for i in range(nInput, nInput + nOutput):
            cuda.memcpy_dtoh(outputs[i - nInput].ravel(), bufferD[i])

        for i in range(0, len(outputs)):
            print("outputs.shape:" + str(outputs[i].shape))
            print("outputs.sum:" + str(outputs[i].sum()))
            print(outputs[i])

            # print("trt_output.shape:" + str(trt_output.shape))
            # print("trt_output.sum:" + str(trt_output.sum()))
            # print(trt_output.view(-1)[0:10])
            # print("torch.allclose result:" + str(torch.allclose(base_output, trt_output, 1e-05, 1e-03)))
            # print("====================")
        return outputs
        # return torch.allclose(base_output, trt_output, 1e-05, 1e-03)

def build(plan_name, dim):
    # create builder, network and builder_config
    builder = trt.Builder(logger)
    if not builder:
        raise RuntimeError("create trt.Builder failed!")

    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)

    builder_config = builder.create_builder_config()
    if not builder_config:
        raise RuntimeError("create_builder_config failed!")

    builder_config.max_workspace_size = 3 * (1024 * 1024 * 1024)

    x = network.add_input(name="x", dtype=trt.float32, shape=(-1, dim))

    input_len = len(x.shape)

    # softmax
    softmax_layer = network.add_softmax(x)
    softmax_dim = input_len - 1
    softmax_layer.axes = 1 << softmax_dim

    print(f"softmax_layer.axes={softmax_layer.axes}")
    x = softmax_layer.get_output(0)
    print("=====add softmax")
    # network.mark_output(x)

    # max
    topk_dim = input_len - 1
    axes = 1 << topk_dim
    topk_layer = network.add_topk(x, trt.TopKOperation.MAX, 1, axes)
    x = topk_layer.get_output(0)
    print("=====add topk")

    network.mark_output(x)

    profile = builder.create_optimization_profile()
    profile.set_shape("x", min=(1, dim), opt=(10, dim), max=(100, dim))
    builder_config.add_optimization_profile(profile)

    engine = builder.build_engine(network, builder_config)
    if not engine:
        raise RuntimeError("build_engine failed")

    serialized_engine = engine.serialize()
    if serialized_engine is None:
        raise RuntimeError("serialize failed")

    with open(plan_name, "wb") as fout:
        fout.write(serialized_engine)

if __name__ == '__main__':
    plan_name = "engine.plan"

    x_arr = np.load("0.npy")
    dim = x_arr.shape[1]

    # M = 50
    # dim = 1024
    # x_arr = np.random.rand(M, dim).astype(np.float32)
    
    print(x_arr.shape)
    print(x_arr.sum())
    print(x_arr)
    build(plan_name, dim)

    infer_helper = InferHelper(plan_name, logger)
    infer_helper.infer([x_arr])
  • Observed behavior:
  1. With only the topk output marked, reading 0.npy in main, the result is NaN.
  2. With rand random input, there is a result (correctness not checked).
  3. With both the softmax and topk results passed to mark_output, there is a result (correctness not checked).
  4. After the run finishes, it reports Segmentation fault.

The trt2022/dev image is missing files under the workspace directory

Pulling the image with docker pull registry.cn-hangzhou.aliyuncs.com/trt2022/dev gives an image with files missing.
The following do not exist:
/workspace/buildFromWorkspace.sh
/workspace/encoder.onnx
/workspace/decoder.onnx
/workspace/data/*.npz
Is it because the image was updated? Where can these files be downloaded?

3D-to-2D operator optimization: question about shape = None

Hi, I am optimizing the encoder.onnx model by following the 2022 preliminary-round PPT, but the corresponding node's shape prints as None in onnx-graphsurgeon (the PPT shows Bxt4x256). Does this mean the reshape cannot be done?

My node-inspection code looks like this:

from collections import OrderedDict
import numpy as np
import onnx
import onnx_graphsurgeon as gs
import os
import tensorrt as trt
from onnx import shape_inference



input_onnx = "encoder.onnx"

original_model = onnx.load(input_onnx)
# inferred_model = shape_inference.infer_shapes(original_model)
graph = gs.import_onnx(original_model)
# print(inferred_model.graph.value_info)

for node in graph.nodes:
    if node.name == "MatMul_379":
        pass
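
onnx-graphsurgeon only reports shapes that are recorded in the model, so running ONNX shape inference first (the line commented out above) often fills in the missing ones. A sketch:

import onnx
import onnx_graphsurgeon as gs
from onnx import shape_inference

model = onnx.load("encoder.onnx")
inferred = shape_inference.infer_shapes(model)   # annotates value_info shapes
graph = gs.import_onnx(inferred)

for node in graph.nodes:
    if node.name == "MatMul_379":
        print([t.shape for t in node.outputs])   # may now be concrete or symbolic

If the shape is still None after inference, that dimension is not statically known, and the reshape may need to be expressed with runtime shape operations instead.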

AddPlugin examples need to set plugin field in the constructor of PluginCreator

  • Environment
    TensorRT 8.4 GA

  • Reproduction Steps

## Code file 1
## use AddScalarPlugin example
import ctypes
from collections import OrderedDict

import numpy as np
import onnx
import onnx_graphsurgeon as gs

soFile = "./AddScalarPlugin.so"
onnxFile = "./model.onnx"

ctypes.cdll.LoadLibrary(soFile)

inputs = gs.Variable(
    name="inputs", dtype=np.float32, shape=["batch", "seq", 256])
outputs = gs.Variable(
    name="outputs", dtype=np.float32, shape=["batch", "seq", 256])
nodes = [
    gs.Node(
        op="AddScalar",
        name="AddScalar_1",
        attrs=OrderedDict(scalar=np.array([2.0], dtype=np.float32)),
        inputs=[inputs],
        outputs=[outputs]
    )
]

graph = gs.Graph(
    nodes=nodes, inputs=[inputs], outputs=[outputs], opset=13,
    name="onnx")
model = gs.export_onnx(graph=graph)
onnx.save(model, onnxFile)
## Code file 2
## use AddScalarPlugin example
import tensorrt as trt
import numpy as np
import ctypes

soFile = "./AddScalarPlugin.so"
onnxFile = "./model.onnx"
logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, '')
ctypes.cdll.LoadLibrary(soFile)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
profile = builder.create_optimization_profile()
config = builder.create_builder_config()
config.max_workspace_size = 6 << 30
parser = trt.OnnxParser(network, logger)
parser.parse_from_file(onnxFile)
  • Background
    When using OnnxParser or polygraphy to load a plugin that has attributes, there are no plugin fields at the time plugin::createPlugin is called, which means the plugin's attributes fail to load from the onnx model.

  • How to fix
    Taking AddScalarPlugin's createPlugin as an example, these two lines are necessary before getting the size and data of the plugin fields:

attr_.clear();
attr_.emplace_back(PluginField("scalar", nullptr, PluginFieldType::kFLOAT32, 1));

could author explain dynamic_axes?

In the file 04-Parser/pyTorch-ONNX-TensorRT/pyTorchToTensorRT.py,
the code exports a dynamic shape with the line below:
dynamic_axes={"x": {0: "nBatchSize"}, "z": {0: "nBatchSize"}}

It runs well. I have some questions:
(1) nBatchSize is not defined in this file; why, and how is this variable used?
(2) I can understand that x is dynamic, but why "z"?

thanks.
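
A short illustration (my own toy model, not the cookbook file): "nBatchSize" is only a human-readable label for the symbolic dimension, so it does not need to be defined anywhere, and "z" is listed because an output keeps a fixed batch dimension unless it is marked dynamic as well:

import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return x * 2.0

x = torch.randn(4, 3)
torch.onnx.export(
    Toy(), x, "toy.onnx",
    input_names=["x"], output_names=["z"],
    # the string "nBatchSize" only names the dynamic axis in the ONNX graph
    dynamic_axes={"x": {0: "nBatchSize"}, "z": {0: "nBatchSize"}},
)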

When converting a .plan file based on the cookbook's 04-Parser pyTorch-ONNX-TensorRT example, I hit a downsample-ratio problem while converting the RobustVideoMatting project. How should I handle it?

https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp32.onnx
This is my ONNX conversion script: (screenshot)
The onnx implementation details of the RVM matting project: (screenshot)
This is the parse log; the last lines are the error log:
E:\Win_Anaconda3\envs\TensorRTPy\python.exe E:/project/TensorRT_project/trt-samples-for-hackathon-cn/cookbook/04-Parser/pyTorch-ONNX-TensorRT/RVM_python_auto/main.py
Succeeded converting model into onnx!
[09/03/2022-13:53:25] [TRT] [I] [MemUsageChange] Init CUDA: CPU +323, GPU +0, now: CPU 7241, GPU 1139 (MiB)
[09/03/2022-13:53:26] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +192, GPU +68, now: CPU 7537, GPU 1207 (MiB)
Succeeded finding onnx file!
[09/03/2022-13:53:26] [TRT] [V] Registered plugin creator - ::GridAnchor_TRT version 1
[... ~40 similar "Registered plugin creator" lines elided ...]
[09/03/2022-13:53:26] [TRT] [V] Adding network input: src with dtype: float32, dimensions: (-1, 3, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: src for ONNX tensor: src
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r1i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r1i for ONNX tensor: r1i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r2i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r2i for ONNX tensor: r2i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r3i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r3i for ONNX tensor: r3i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: r4i with dtype: float32, dimensions: (-1, -1, -1, -1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: r4i for ONNX tensor: r4i
[09/03/2022-13:53:26] [TRT] [V] Adding network input: downsample_ratio with dtype: float32, dimensions: (1)
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: downsample_ratio for ONNX tensor: downsample_ratio
[09/03/2022-13:53:26] [TRT] [V] Importing initializer: backbone.features.4.block.2.fc1.weight
[... ~170 similar "Importing initializer" lines elided ...]
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Constant_0 [Constant]
[09/03/2022-13:53:26] [TRT] [V] Constant_0 [Constant] inputs:
[09/03/2022-13:53:26] [TRT] [V] Constant_0 [Constant] outputs: [388 -> (0)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Constant_1 [Constant]
[09/03/2022-13:53:26] [TRT] [V] Constant_1 [Constant] inputs:
[09/03/2022-13:53:26] [TRT] [V] Constant_1 [Constant] outputs: [389 -> (2)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Concat_2 [Concat]
[09/03/2022-13:53:26] [TRT] [V] Searching for input: 389
[09/03/2022-13:53:26] [TRT] [V] Searching for input: downsample_ratio
[09/03/2022-13:53:26] [TRT] [V] Searching for input: downsample_ratio
[09/03/2022-13:53:26] [TRT] [V] Concat_2 [Concat] inputs: [389 -> (2)[FLOAT]], [downsample_ratio -> (1)[FLOAT]], [downsample_ratio -> (1)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Registering layer: 389 for ONNX node: 389
[09/03/2022-13:53:26] [TRT] [V] Registering layer: Concat_2 for ONNX node: Concat_2
[09/03/2022-13:53:26] [TRT] [V] Registering tensor: 390 for ONNX tensor: 390
[09/03/2022-13:53:26] [TRT] [V] Concat_2 [Concat] outputs: [390 -> (4)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Parsing node: Resize_3 [Resize]
[09/03/2022-13:53:26] [TRT] [V] Searching for input: src
[09/03/2022-13:53:26] [TRT] [V] Searching for input: 388
[09/03/2022-13:53:26] [TRT] [V] Searching for input: 390
[09/03/2022-13:53:26] [TRT] [V] Resize_3 [Resize] inputs: [src -> (-1, 3, -1, -1)[FLOAT]], [388 -> (0)[FLOAT]], [390 -> (4)[FLOAT]],
[09/03/2022-13:53:26] [TRT] [V] Registering layer: Resize_3 for ONNX node: Resize_3
[09/03/2022-13:53:26] [TRT] [V] Running resize layer with:
Transformation mode: pytorch_half_pixel
Resize mode: linear

[09/03/2022-13:53:26] [TRT] [E] [graphShapeAnalyzer.cpp::nvinfer1::builder::anonymous-namespace'::ShapeNodeRemover::analyzeShapes::1237] Error Code 4: Internal Error (downsample_ratio: network input that is shape tensor must have type Int32)
Failed parsing .onnx file!
In node 3 (parseGraph): INVALID_NODE: Invalid Node - Resize_3
[graphShapeAnalyzer.cpp::nvinfer1::builder::anonymous-namespace'::ShapeNodeRemover::analyzeShapes::1237] Error Code 4: Internal Error (downsample_ratio: network input that is shape tensor must have type Int32)

Process finished with exit code 0
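
The final error says a network input used as a shape tensor must be Int32, but RVM's downsample_ratio is a float input feeding the Resize scales. One common workaround (a sketch of mine, with an assumed ratio value) is to freeze that input to a constant in the ONNX graph before parsing:

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("rvm_mobilenetv3_fp32.onnx"))

# hypothetical fixed value; use whatever ratio the deployment actually needs
ratio = gs.Constant("downsample_ratio_const",
                    np.array([0.25], dtype=np.float32))

for node in graph.nodes:
    node.inputs = [ratio if t.name == "downsample_ratio" else t
                   for t in node.inputs]

graph.inputs = [t for t in graph.inputs if t.name != "downsample_ratio"]
graph.cleanup()
onnx.save(gs.export_onnx(graph), "rvm_fixed_ratio.onnx")

This trades the dynamic downsample ratio for a parse-able graph; a separate engine would be needed per ratio.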

core dumped: importing onnxsim and tensorrt together raises free(): invalid pointer

While debugging the Python code below, it core dumps; analysis shows that having onnxsim and tensorrt imported together raises a free(): invalid pointer error.

from onnxsim import simplify
import tensorrt as trt

if __name__ == '__main__':
    pass

python bug.py

free(): invalid pointer
Aborted (core dumped)

pip3 list|grep onnx-sim
onnx-simplifier 0.3.10

pip3 list|grep tensorrt
nvidia-tensorrt 8.2.4.2
tensorrt 8.2.4.2

The C++ example in cookbook/03-APIModel/MNISTExample-pyTorch reports ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information) when run in WSL

The C++ example in cookbook/03-APIModel/MNISTExample-pyTorch compiles fine and its results are correct, but at runtime it reports the error ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)

Running make test:

$ make test
make clean
make[1]: Entering directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
rm -rf ./*.d ./*.o ./*.so ./*.exe ./*.plan
make[1]: Leaving directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
make -j3
make[1]: Entering directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -M -MT main.o -o main.d main.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -M -MT cnpy.o -o cnpy.d cnpy.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -M -MT calibrator.o -o calibrator.d calibrator.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -Xcompiler -fPIC -o calibrator.o -c calibrator.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -Xcompiler -fPIC -o main.o -c main.cpp
/usr/local/cuda/bin/nvcc -w -std=c++14 -O3 -UDEBUG -Xcompiler -fPIC -use_fast_math -I. -I/usr/local/cuda/include -I/opt/TensorRT-8.4.3.1/include -Xcompiler -fPIC -o cnpy.o -c cnpy.cpp
/usr/local/cuda/bin/nvcc -L/usr/local/cuda/lib64 -lcudart -L/opt/TensorRT-8.4.3.1/lib -lnvinfer -lz -o main.exe main.o cnpy.o calibrator.o
make[1]: Leaving directory '/home/dongyang/trt-samples-for-hackathon-cn/cookbook/03-APIModel/MNISTExample-pyTorch/C++'
python3 ./createCalibrationAndInferenceData.py
Succeeded creating data for calibration and inference!
./main.exe > result-C++.log
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)

Running ./main.exe:

$ ./main.exe
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
ERROR: 2: [virtualMemoryBuffer.cpp::resizePhysical::144] Error Code 2: OutOfMemory (no further information)
Succeeded building serialized engine!
Succeeded building engine!
Binding all? Yes
Bind[0]:i[0]->FLOAT (1, 1, 28, 28) inputT0
Bind[1]:o[0]->INT32 (1, 1) (Unnamed Layer* 17) [TopK]_output_2

inputT0: (1, 1, 28, 28, )
absSum=33566.0000,mean=42.8138,var=7573.8174,max=255.0000,min= 0.0000,diff=15760.0000,
 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 
 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 

(Unnamed Layer* 17) [TopK]_output_2: (1, 1, )
absSum= 8.0000,mean= 8.0000,var= 0.0000,max=      8,min=      8,diff= 0.0000,
       8, 
       8, 
     8 

Environment:
Windows 11 (22000.856), WSL2
GPU: NVIDIA GeForce RTX 3070 Laptop, driver version 512.78
CUDA: 11.6.2, installed via the .sh installer
CUDNN: 8.4.1.50, installed from the tar file
TensorRT: 8.4.3.1, installed from the tar file

conda:
cuda-python 11.6
cudatoolkit 11.6
cudnn 8.4.1.50

where is the int8.cache file?

I ran the pyTorchToTensor.py file. In the code there is:
config.int8_calibrator = calibrator.MyCalibrator(calibrationDataPath, calibrationCount, (1, 1, imageHeight, imageWidth), cacheFile)
with cacheFile = './int8.cache'.
Where is the int8.cache file?
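
For context, a sketch (not the repo's exact code) of the callback that creates the file: TensorRT calls write_calibration_cache only after INT8 calibration actually runs during engine build, so ./int8.cache appears in the process's current working directory at that point:

import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cacheFile="./int8.cache"):
        super().__init__()
        self.cacheFile = cacheFile

    def write_calibration_cache(self, cache):
        # Called by TensorRT once calibration completes; the cache file is
        # written relative to the current working directory.
        with open(self.cacheFile, "wb") as f:
            f.write(cache)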

batchedNMSPlugin is not compatible with TensorRT 8?

According to batchedNMSPlugin, batchedNMSPlugin is not compatible with TensorRT 8.

I have added batchedNMSPlugin to my tensorrt model,
and it runs fine except that the speed is slow.
Is that incompatibility the reason for the slow speed?
Also, how can I add NMS to the model to reduce the size of the output tensor transferred over the network?

Where plugins allocate resources can waste memory and can be optimized

  • Environment

    • TensorRT 7.x and later
    • Unrelated to the versions of other software
  • Background

  1. In the plugins provided by TensorRT-OSS (such as embLayerNormPlugin and fcPlugin), resource allocation (device memory, handles, etc.) has been done in the constructor since 7.x; before 7.x, as I recall, it was done in initialize().
  2. During the build and infer phases, the trt plugin methods are called in the following order:
// build phase
1. Plugin::Plugin
2. Plugin::clone
3. Plugin::Plugin
4. Plugin::destroy
5. Plugin::clone
6. Plugin::Plugin
7. Plugin::clone
8. Plugin::Plugin
9. Plugin::clone
10. Plugin::Plugin
11. Plugin::destroy
12. Plugin::initialize
13. Plugin::destroy
14. Plugin::terminate
15. Plugin::destroy
16. Plugin::destroy

// infer phase
1. Plugin::deserialize_value
2. Plugin::initialize
3. Plugin::clone
4. Plugin::Plugin
5. Plugin::enqueue
6. Plugin::terminate
7. Plugin::destroy
  • Observed problems
  1. Resources allocated in a plugin are expected to exist in only one copy during the build or infer phase.
  2. Because allocation happens in the constructor, every clone of the plugin class allocates another copy of the resources.
  3. From the call order above, during the build phase there are four copies of the resources in memory before the destroy at step 11.
  • Possible serious consequences
  1. During the build phase, trt's device-memory consumption is already high.
  2. Taking a large AI model I worked on as an example: one layer's weights total 32 MB and there are 18 layers, so an extra 18 * 32 * 3 = 1728 MB of device memory is consumed.
  3. AI models and fused operators keep getting bigger, so this problem will become more and more visible.
  • Attempted solutions
  1. Weights are easy to handle: feed them in as inputs instead (as groupNormalizationPlugin does).
  2. Handles and the like are trickier (creating streams and cublas handles also consumes device memory). If allocation is moved into initialize(), the infer phase clones again after initialize()... The solution I have in mind is three constructors: one called from createPlugin, one from clone, and one from deserialization, with resource allocation in the first and third.
