nvidia-ai-iot / cuda-pointpillars

A project demonstrating how to use CUDA-PointPillars to process point cloud data from lidar.

License: Apache License 2.0

C++ 25.61% Cuda 29.60% CMake 0.87% Python 41.41% Dockerfile 0.22% Shell 2.28%

cuda-pointpillars's Introduction

PointPillars Inference with TensorRT

This repository contains the source code and model for PointPillars inference using TensorRT.

Inference runs in three phases:

Voxelize the point cloud into 10-channel features
Run the TensorRT engine to get the detection features
Parse the detection features and apply NMS
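
The README does not spell out what the 10 channels are. The sketch below assumes the layout used by OpenPCDet's pillar feature encoder: the raw x, y, z and intensity of each point, its offset from the mean of the points in the pillar, and its offset from the pillar's geometric center. The function name `pillar_features` and the sample values are hypothetical and only illustrate the idea.

```python
from statistics import fmean


def pillar_features(points, pillar_center):
    """Build one 10-channel feature row per point in a single pillar.

    points: list of (x, y, z, intensity) tuples that fall in the pillar.
    pillar_center: (cx, cy, cz) geometric center of the pillar.
    The channel order is an assumption, not taken from this repository.
    """
    mean = [fmean(p[i] for p in points) for i in range(3)]
    rows = []
    for x, y, z, intensity in points:
        rows.append([
            x, y, z, intensity,                    # raw point
            x - mean[0], y - mean[1], z - mean[2],  # offset from point mean
            x - pillar_center[0],                   # offset from pillar center
            y - pillar_center[1],
            z - pillar_center[2],
        ])
    return rows


feats = pillar_features(
    [(1.0, 2.0, 0.5, 0.3), (1.1, 2.1, 0.7, 0.4)],
    pillar_center=(1.04, 2.0, -1.0),
)
print(len(feats), len(feats[0]))
```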

Prerequisites

Prepare Model && Data

We provide a Dockerfile to ease environment setup. Please execute the following command to build the docker image after nvidia-docker installation:

cd docker && docker build . -t pointpillar

We can then run the docker with the following command:

nvidia-docker run --rm -ti -v /home/$USER/:/home/$USER/ --net=host pointpillar:latest

For model exporting, please run the following command to clone pcdet repo and install custom CUDA extensions:

git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet && git checkout 846cf3e && python3 setup.py develop

Download the pretrained model (PTM) to ckpts/, then export the ONNX model with the command below:

python3 tool/export_onnx.py --ckpt ckpts/pointpillar_7728.pth --out_dir model

Use the command below to evaluate on the KITTI dataset; see "Evaluation on Kitti" for details on dataset preparation.

sh tool/evaluate_kitti_val.sh

Setup Runtime Environment

Nvidia Jetson Orin + CUDA 11.4 + cuDNN 8.9.0 + TensorRT 8.6.11

Compile && Run

sudo apt-get install git-lfs && git lfs install
git clone https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars.git
cd CUDA-PointPillars && . tool/environment.sh
mkdir build && cd build
cmake .. && make -j$(nproc)
cd ../ && sh tool/build_trt_engine.sh
cd build && ./pointpillar ../data/ ../data/ --timer

FP16 Performance && Metrics

Average FP16 latency on the training set (7481 instances) of the KITTI dataset.

| Function (ms)     | Orin   |
| ----------------- | ------ |
| Voxelization      | 0.18   |
| Backbone & Head   | 4.87   |
| Decoder & NMS     | 1.79   |
| Overall           | 6.84   |

3D moderate metrics on the validation set (3769 instances) of the KITTI dataset.

|                   | Car@R11 | Pedestrian@R11 | Cyclist@R11  | 
| ----------------- | --------| -------------- | ------------ |
| CUDA-PointPillars | 77.00   | 52.50          | 62.26        |
| OpenPCDet         | 77.28   | 52.29          | 62.68        |

Note

Voxelization output is nondeterministic: the GPU processes all points in parallel, so which points are selected for a given voxel varies from run to run.
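
The effect described in this note can be imitated on the CPU. The sketch below is not the repository's kernel; it only assumes that a voxel keeps at most a fixed number of points in whatever order they arrive, and uses a seeded shuffle to stand in for GPU thread scheduling. `MAX_POINTS_PER_VOXEL` and `select_points` are hypothetical names, and the cap of 2 is chosen only to keep the example small.

```python
import random

MAX_POINTS_PER_VOXEL = 2  # illustrative cap, not the project's real setting


def select_points(point_ids, order_seed):
    """Keep the first MAX_POINTS_PER_VOXEL points in arrival order."""
    arrival = list(point_ids)
    # The shuffle stands in for the unpredictable order in which
    # parallel GPU threads write into the same voxel.
    random.Random(order_seed).shuffle(arrival)
    return sorted(arrival[:MAX_POINTS_PER_VOXEL])


# Six points fall into the same voxel; try twenty different arrival orders.
runs = {tuple(select_points(range(6), seed)) for seed in range(20)}
print(sorted(runs))
```

Each distinct tuple printed is a different subset of points kept for the same voxel. Under this assumption, a voxel holding no more points than the cap always keeps all of them, so only crowded voxels would change their selected subset between runs.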

cuda-pointpillars's Issues

Offset calculation in preprocess_kernels.cu

At line 323 of preprocess_kernels.cu, the code calculates the offset:

//calculate offset
float x_offset = voxel_x / 2 + cordsSM[pillar_idx_inBlock].w * voxel_x + range_min_x;
float y_offset = voxel_y / 2 + cordsSM[pillar_idx_inBlock].z * voxel_y + range_min_y;
float z_offset = voxel_z / 2 + cordsSM[pillar_idx_inBlock].y * voxel_z + range_min_z;

I think w means intensity.
When calculating x_offset, why is voxel_x multiplied by cordsSM[pillar_idx_inBlock].w and not cordsSM[pillar_idx_inBlock].x?
When calculating y_offset, why is voxel_y multiplied by cordsSM[pillar_idx_inBlock].z and not cordsSM[pillar_idx_inBlock].y?
When calculating z_offset, why is voxel_z multiplied by cordsSM[pillar_idx_inBlock].y and not cordsSM[pillar_idx_inBlock].z?
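
One plausible reading, offered as an assumption rather than a confirmed answer: cordsSM holds pillar grid indices, not a point, and the four components follow OpenPCDet's (batch, z, y, x) coordinate order. Under that assumption `.x` is the batch index, `.y` the z index, `.z` the y index and `.w` the x index, which matches the pairing in the three lines of code. The sketch below reproduces the same arithmetic with a named tuple whose fields mirror CUDA's 4-component vector types; `Vec4`, `pillar_center` and the sample values are hypothetical.

```python
from collections import namedtuple

# Field names mirror CUDA's 4-component vector types (.x, .y, .z, .w).
Vec4 = namedtuple("Vec4", ["x", "y", "z", "w"])


def pillar_center(coord, voxel_size, range_min):
    """Center of a pillar whose indices are stored as (batch, z, y, x).

    The storage order is an assumption based on OpenPCDet's convention.
    """
    voxel_x, voxel_y, voxel_z = voxel_size
    min_x, min_y, min_z = range_min
    return (
        voxel_x / 2 + coord.w * voxel_x + min_x,  # x index lives in .w
        voxel_y / 2 + coord.z * voxel_y + min_y,  # y index lives in .z
        voxel_z / 2 + coord.y * voxel_z + min_z,  # z index lives in .y
    )


center = pillar_center(
    Vec4(x=0, y=0, z=248, w=10),   # batch 0, z index 0, y index 248, x index 10
    voxel_size=(0.16, 0.16, 4.0),
    range_min=(0.0, -39.68, -3.0),
)
print(center)
```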

When I ran cmake .. && make -j64, these errors came out. Does anyone know how to fix this?

Support for older TRT 5.1.4

Hi, would it be possible to run the model on an older version of TRT/CUDNN/CUDA?
We are using the DRIVE AGX with Drive Software 10.0 and TRT 5.1.4. Even the latest Drive SDK does not provide TRT 8.4.0, so this is a problem for us right now.
If it can be done, can you please provide some instructions on how to do this?
Thanks.

The result of running the program multiple times is different

When I run the code repeatedly, the "Bndbox objs:" count for the same point cloud is not the same each time. Why?

Is it necessary to use simplifier_onnx.py?

error: ‘virtual nvinfer1::IExecutionContext::~IExecutionContext()’ is protected within this context

When I run:
cmake .. && make -j8

it shows:

Does anyone know how to fix it?

trt_infer: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed. )

I use TensorRT 8.4, and this error occurs during engine inference.

vscode cuda debug config

Can the .vscode configuration for CUDA debugging be shared?

PostProcessCuda too slow after updating to commit 4e8e4f3

The doPostprocessCuda time rises to around 80000 ms with the latest code (commit 4e8e4f3); before (commit db037d2) it was around 5 ms. The number of bounding boxes is also far too large.

the gpu info:
GPU : Orin
Capbility: 8.7
Global memory: 30622MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

the run time detail
<<<<<<<<<<<
load file: ../data/000000.bin
find points num: 125635
find pillar_num: 9539
TIME: generateVoxels: 0.97344 ms.
TIME: generateFeatures: 1.00912 ms.
TIME: doinfer: 57.9808 ms.
TIME: doPostprocessCuda: 57716.1 ms.
TIME: pointpillar: 57777.3 ms.
Bndbox objs: 3061

Work for Drive Orin?

Hi,

I am wondering if this works for Drive Orin environment?

when I generate onnx by exporter.py, i got the error

['Car', 'Pedestrian', 'Cyclist']
3
0 -39.68 -3 69.12 39.68 1
[0.16, 0.16, 4]
32
40000
4
64
0.78539
0.0
2
[3.9, 1.6, 1.56, 0.0, 3.9, 1.6, 1.56, 1.57, 0.8, 0.6, 1.73, 0.0, 0.8, 0.6, 1.73, 1.57, 1.76, 0.6, 1.73, 0.0, 1.76, 0.6, 1.73, 1.57]
[-1.78, -0.6, -0.6]
0.1
0.01
anchors:      const float anchors[num_anchors * len_per_anchor] = {
      3.9,1.6,1.56,0.0,
      3.9,1.6,1.56,1.57,
      0.8,0.6,1.73,0.0,
      0.8,0.6,1.73,1.57,
      1.76,0.6,1.73,0.0,
      1.76,0.6,1.73,1.57,
      };

anchors:      const float anchor_bottom_heights[num_classes] = {-1.78,-0.6,-0.6,};

########
2022-03-15 11:08:59,269   INFO  ------ Convert OpenPCDet model for TensorRT ------
2022-03-15 11:09:05,030   INFO  ==> Loading parameters from checkpoint ../../checkpoint_epoch_1.pth to CPU
2022-03-15 11:09:05,171   INFO  ==> Checkpoint trained from version: pcdet+0.3.0+0642cf0
2022-03-15 11:09:05,462   INFO  ==> Done (loaded 127/127)
/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/backbones_3d/vfe/pillar_vfe.py:45: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if inputs.shape[0] > self.part:
/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/backbones_2d/map_to_bev/pointpillar_scatter.py:31: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = coords[:, 0].max().int().item() + 1
Traceback (most recent call last):
  File "exporter.py", line 150, in <module>
    main()
  File "exporter.py", line 135, in main
    output_names = ['cls_preds', 'box_preds', 'dir_cls_preds'], # the model's output names
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/__init__.py", line 208, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 92, in export
    use_external_data_format=use_external_data_format)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 530, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 366, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 319, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 338, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 426, in forward
    self._force_outplace,
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 412, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 720, in _call_impl
    result = self._slow_forward(*input, **kwargs)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 704, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/detectors/pointpillar.py", line 31, in forward
    spatial_features_2d = self.module_list[2](spatial_features) #"BaseBEVBackbone"
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 720, in _call_impl
    result = self._slow_forward(*input, **kwargs)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 704, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/backbones_2d/base_bev_backbone.py", line 103, in forward
    stride = int(spatial_features.shape[2] / x.shape[2])
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.



I got the error above when generating ONNX with exporter.py. How can I fix it?
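
The error message itself points at a possible fix: replace `/` with floor division. For the line in base_bev_backbone.py shown in the traceback, that would read `stride = spatial_features.shape[2] // x.shape[2]`. This has not been verified against the PyTorch version in the log, so treat it as a suggestion. The sketch below shows the same computation on plain integers; `upsample_stride` and the two heights are hypothetical.

```python
def upsample_stride(input_height, feature_height):
    """Ratio between two feature-map heights, computed with floor division.

    Floor division (//) yields an integer directly, so no int(a / b)
    round trip through true division is needed.
    """
    return input_height // feature_height


# Hypothetical heights: a 496-row input map and a 248-row feature map.
stride = upsample_stride(496, 248)
print(stride)
```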

error in make -j$(nproc)

Hello.
I tried to install this repo, but I got the error shown below.
Is there a way to get past this problem?

/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libdl.so when searching for -ldl
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libdl.a when searching for -ldl
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/librt.a when searching for -lrt
/usr/bin/ld: cannot find -lnvinfer
/usr/bin/ld: cannot find -lnvonnxparser
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libpthread.so when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libpthread.a when searching for -lpthread
collect2: error: ld returned 1 exit status
CMakeFiles/demo.dir/build.make:951: recipe for target 'demo' failed
make[2]: *** [demo] Error 1
CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/demo.dir/all' failed
make[1]: *** [CMakeFiles/demo.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

I use TensorRT 8.4, and when I run ./demo there are errors while building the TRT engine:
Building TRT engine.
trt_infer: Could not register plugin creator - ::PillarScatterPlugin version 1
trt_infer: parsers/onnx/ModelImporter.cpp:780: While parsing node number 4 [ScatterBEV -> "479"]:
trt_infer: parsers/onnx/ModelImporter.cpp:781: --- Begin node ---
trt_infer: parsers/onnx/ModelImporter.cpp:782: input: "403"
input: "coords"
input: "params"
output: "479"
name: "onnx_graphsurgeon_node_0"
op_type: "ScatterBEV"
trt_infer: ModelImporter.cpp:751: --- End node ---
trt_infer: ModelImporter.cpp:754: ERROR: builtin_op_importers.cpp:4951 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
: failed to parse onnx model file, please check the onnx version and trt support op!
How can I fix this problem?

How to modify this repo to make it work on an amd64 computer?

Hi,

thanks for your amazing work!
Could you please tell me which part of this repo should be modified if I want to deploy this system on an amd64 computer?

parse onnx model wrong

trt_infer: 1: [stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 97)
trt_infer: 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)

Where should I find the result?

load file: /media/dk/2eee4ea8-6028-41ef-89c5-8f36a982bc1d/dk/kitti_dataset/testing/velodyne/000999.bin
find points num: 115279
find pillar_num: 7982
TIME: generateVoxels: 0.036864 ms.
TIME: generateFeatures: 0.031424 ms.
TIME: doinfer: 7.19277 ms.
TIME: doPostprocessCuda: 0.754336 ms.
TIME: pointpillar: 8.06592 ms.
Bndbox objs: 37
I ran it on my 2080Ti, but I do not know where I can find the result.

catkin_make compiler fault

When I use catkin_make to build a ROS project, the fault "cuda failure: invalid device function at ./.cpp error status 98" occurs at run time. It seems that the CMakeLists needs to be fixed. How should I do that?

change POINT_CLOUD_RANGE

When I change POINT_CLOUD_RANGE to [0, -39.68, -5, 102.4, 39.68, 5], I get an incorrect inference result.
My VOXEL_SIZE is [0.16, 0.16, 10]. How can I solve this problem?
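
Changing the range or the voxel size changes the size of the pillar grid, which may be part of the problem. The sketch below computes the grid for the default configuration that appears in the exporter log elsewhere on this page (range 0 -39.68 -3 69.12 39.68 1, voxel size [0.16, 0.16, 4]) and for the modified one in this issue. `grid_size` is a hypothetical helper, and whether the runtime reads these values from the exported model or from compiled-in parameters is not stated in the source.

```python
def grid_size(point_cloud_range, voxel_size):
    """Number of voxels along x, y and z for a range and voxel size."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    voxel_x, voxel_y, voxel_z = voxel_size
    # round() absorbs floating-point noise such as 431.99999999999994.
    return (
        round((x_max - x_min) / voxel_x),
        round((y_max - y_min) / voxel_y),
        round((z_max - z_min) / voxel_z),
    )


default_grid = grid_size([0, -39.68, -3, 69.12, 39.68, 1], [0.16, 0.16, 4])
modified_grid = grid_size([0, -39.68, -5, 102.4, 39.68, 5], [0.16, 0.16, 10])
print(default_grid, modified_grid)
```

The x dimension grows from 432 to 640 cells, so any buffer, anchor layout or engine input shape that was derived from the old grid would presumably need to be regenerated along with the model.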

How to visualize it?

Hi all,

I want to run PointPillars detection on a KITTI tracking dataset, convert the results to video format and then visualize them.
Has anyone done this before? Could you help me and give me some advice?
Thanks in advance.

Barry

export onnx error

When I follow the export guide to export ONNX from pointpillar_7729.pth, I get this error:

root@pc-MS-7B89:/workspace/ssh-docker/workspace/CUDA-PointPillars/tool# python exporter.py --ckpt ../model/pointpillar_7729.pth
2022-06-08 10:17:39,104 INFO ------ Convert OpenPCDet model for TensorRT ------
2022-06-08 10:17:40,746 INFO ==> Loading parameters from checkpoint ../model/pointpillar_7729.pth to CPU
2022-06-08 10:17:40,760 INFO ==> Done (loaded 127/127)
Traceback (most recent call last):
File "exporter.py", line 150, in <module>
main()
File "exporter.py", line 126, in main
torch.onnx.export(model, # model being run
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/__init__.py", line 225, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 85, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 632, in _export
_model_to_graph(model, args, verbose, input_names,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 409, in _model_to_graph
graph, params, torch_out = _create_jit_graph(model, args,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 379, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 342, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 1148, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 93, in forward
in_vars, in_desc = _flatten(args)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type int

Multi head pointpillar onnx export fail.

How to use multi head pointpillar model?

python exporter.py ./checkpoint.pth trt_infer:ModelImporter.cpp:754:ERROR:builtin_op_importers.cpp:4951 In function importFallbackPluginImporter:

Can't find a PyTorch version matching CUDA 11.4

I am trying to convert my own .pth file to ONNX, but exporter.py reports a PyTorch issue.
I tried to set up the environment as described in the README under tools, but I can't find a PyTorch 1.11.0 build for CUDA 11.4 (pytorch.org only has cu113, cu115 and cu116).

thank you very much

Do you evaluate the metrics of the implemented model on TRT?

For example, mAP on KITTI or mAPH on Waymo?

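Poor results on the Waymo dataset; what are the requirements for the input?

On the KITTI dataset it can detect obstacles, with mediocre results, but on the Waymo dataset the results are very poor. Why is that?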

What should I do if I want more labels?

Hello, dear developer
If the model I retrained contains more labels, how should I modify the original code? Please give me some advice.
For example, the labels of my model are "car", "pedestrian", "cyclist", "indicator" and "truck", while the open source code only has "car", "pedestrian" and "cyclist".
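
The exporter log elsewhere on this page prints a generated anchors array with one row per class and rotation, each row holding length, width, height and rotation. Assuming that extending the class list means extending this table in the same pattern, the sketch below builds the rows for five classes. The sizes for "Indicator" and "Truck" are placeholders invented for illustration, and `anchor_rows` is a hypothetical helper, not part of the repository.

```python
def anchor_rows(class_sizes, rotations=(0.0, 1.57)):
    """One (length, width, height, rotation) row per class and rotation."""
    rows = []
    for length, width, height in class_sizes.values():
        for rotation in rotations:
            rows.append((length, width, height, rotation))
    return rows


class_sizes = {
    "Car": (3.9, 1.6, 1.56),          # from the exporter log on this page
    "Pedestrian": (0.8, 0.6, 1.73),   # from the exporter log on this page
    "Cyclist": (1.76, 0.6, 1.73),     # from the exporter log on this page
    "Indicator": (0.5, 0.5, 1.0),     # placeholder size, not from the source
    "Truck": (8.0, 2.5, 3.0),         # placeholder size, not from the source
}

rows = anchor_rows(class_sizes)
for row in rows:
    print(",".join(str(value) for value in row) + ",")
```

With five classes and two rotations the table grows from 6 to 10 rows. The per-class bottom heights and the class count printed in the same log would presumably need matching entries, and the model has to be trained and exported with the same configuration.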
Thank you very much!

build fails: error: initialization with "{...}" is not allowed for object of type "dim3"

Hello, I am using

$ cmake -version
cmake3 version 3.17.5

CMake suite maintained and supported by Kitware (kitware.com/cmake).
$ gcc   -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/workdir/local/gcc-5.4.0/bin/../libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/work/data/local/gcc-5.4.0 --enable-threads=posix --disable-checking --disable-multilib --enable-languages=c,c++ --with-gmp=/home/work/data/local/gmp4.3.2 --with-mpfr=/home/work/data/local/mpfr-2.4.2 --with-mpc=/home/work/data/local/mpc-0.8.1
Thread model: posix
gcc version 5.4.0 (GCC)
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

And I got the following errors,

cmake .. && make -j$(nproc)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/work/cuda-pointpillars/build
[ 11%] Building NVCC (Device) object CMakeFiles/demo.dir/src/demo_generated_preprocess_kernels.cu.o
[ 22%] Building NVCC (Device) object CMakeFiles/demo.dir/src/demo_generated_pillarScatterKernels.cu.o
/home/work/cuda-pointpillars/include/params.h(24): warning: field initializers are a C++11 feature

/home/work/cuda-pointpillars/include/params.h(24): warning: field initializers are a C++11 feature

/home/work/cuda-pointpillars/src/preprocess_kernels.cu(173): error: initialization with "{...}" is not allowed for object of type "dim3"

/home/work/cuda-pointpillars/src/preprocess_kernels.cu(174): error: initialization with "{...}" is not allowed for object of type "dim3"

/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(97): error: explicit type is missing ("int"assumed)

/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(99): error: argument of type "int" is incompatible with parameter of type "cudaError_t"

/home/work/cuda-pointpillars/src/preprocess_kernels.cu(211): error: expected an expression

/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(119): error: explicit type is missing ("int" assumed)

/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(121): error: argument of type "int" is incompatible with parameter of type "cudaError_t"

4 errors detected in the compilation of "/home/work/cuda-pointpillars/src/pillarScatterKernels.cu".
3 errors detected in the compilation of "/home/work/cuda-pointpillars/src/preprocess_kernels.cu".
CMake Error at demo_generated_preprocess_kernels.cu.o.Release.cmake:280 (message):
  Error generating file
  /home/work/cuda-pointpillars/build/CMakeFiles/demo.dir/src/./demo_generated_preprocess_kernels.cu.o


CMake Error at demo_generated_pillarScatterKernels.cu.o.Release.cmake:280 (message):
  Error generating file
  /home/work/cuda-pointpillars/build/CMakeFiles/demo.dir/src/./demo_generated_pillarScatterKernels.cu.o


make[2]: *** [CMakeFiles/demo.dir/src/demo_generated_preprocess_kernels.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/demo.dir/src/demo_generated_pillarScatterKernels.cu.o] Error 1
make[1]: *** [CMakeFiles/demo.dir/all] Error 2
make: *** [all] Error 2

Could you provide the correct version to compile the project?

How to use docker? Am I wrong?

dk@dk-MS-7B94:~/CUDA-PointPillars$ sudo docker run --rm --gpus all -ti -v /home/dk/:/workspace/ssh-docker --net=host scrin/dev-spconv:f22dd9aee04e2fe8a9fe35866e52620d8d8b3779

Restarting OpenBSD Secure Shell server sshd [ OK ]
root@dk-MS-7B94:# ls
get-pip.py spconv vcpkg
root@dk-MS-7B94:#

demo: malloc.c:2401: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

Hello, I can compile, but when I run ./demo I get this error.

Building TRT engine.
../model/pointpillar.onnxtrt_infer: ModelImporter.cpp:773: While parsing node number 6 [PillarScatterPlugin -> "input.3"]:
trt_infer: ModelImporter.cpp:774: --- Begin node ---
demo: malloc.c:2401: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)
My cuda and tensorrt versions are:

CUDA: 11.1
cuDNN: 8.4.1
TensorRT: 8.4.1

Thanks in advance.

The April version may have a bug in model inference

The KITTI model produces results and behaves normally. However, when I change the dataset to DAIR, the model runs inference correctly with the December version but not with the April version; the April version outputs random results.

Another question: could you provide a CPU version of pillar generation? The randomness of the CUDA pillar generation makes the pillar features unstable.

nms_cpu(res_, params_.nms_thresh, nms_pred);

So NMS runs on the CPU and is not counted in the FPS?

Does the total time not include the CPU part?
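
For readers unfamiliar with the step named in the title, the sketch below shows greedy NMS in its simplest form. It is not the repository's nms_cpu: it uses axis-aligned boxes given as (x1, y1, x2, y2), whereas 3D detectors normally compare rotated boxes, and the names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Axis-aligned intersection over union of two (x1, y1, x2, y2) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    intersection = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def nms(boxes, scores, thresh):
    """Keep the highest-scoring boxes that do not overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Compare against every box kept so far.
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 6.0, 6.0)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7], thresh=0.5)
print(kept)
```

Each candidate is compared against every box already kept, so in the worst case the work grows with the square of the number of candidates. That is one reason the time spent in this step depends heavily on how many boxes survive score filtering.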

Cuda failure: the provided PTX was compiled with an unsupported toolchain. at line 108 in file /share/CUDA-PointPillars.bak/test/main.cpp error status: 222

root@92739a255d9f:/share/CUDA-PointPillars.bak/test/build# ./demo

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA GeForce RTX 3060 Laptop GPU
Capbility: 8.6
Global memory: 5946MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Cuda failure: the provided PTX was compiled with an unsupported toolchain. at line 108 in file /share/CUDA-PointPillars.bak/test/main.cpp error status: 222
Aborted (core dumped)
root@92739a255d9f:/share/CUDA-PointPillars.bak/test/build# exit

Got error when run demo

Hi,
I use
jetson nano 2 gb
Jetpack 4.6
CUDA 10.2
tensorrt 8.0
onnx 1.8

After compiling, I ran the demo and got this error:
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X1
Capbility: 5.3
Global memory: 1979MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Building TRT engine.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:9: Message type “onnx2trt_onnx.ModelProto” has no field named “version”.
trt_infer: ModelImporter.cpp:682: Failed to parse ONNX model from file: …/…/model/pointpillar.onnx
: failed to parse onnx model file, please check the onnx version and trt support op!

So can you tell me how to solve this error or which version of onnx do you use?

export model to onnx

Hi, Nvidia AI team. Thanks for your open-source sample code for deploying PointPillars on Xavier.
I have a question about the ONNX export. How do you ensure that only the middle part of the network is exported (after voxelization and encoding to 10 features per pillar), and not the whole pipeline, which includes voxelization, pillar feature extraction, scatter to BEV, backbone and postprocessing?

 torch.onnx.export(model,                   # model being run
          (dummy_voxel_features, dummy_voxel_num_points, dummy_coords), # model input (or a tuple for multiple inputs)
          "./pointpillar.onnx",    # where to save the model (can be a file or file-like object)
          export_params=True,        # store the trained parameter weights inside the model file
          opset_version=11,          # the ONNX version to export the model to
          do_constant_folding=True,  # whether to execute constant folding for optimization
          keep_initializers_as_inputs=True,
          input_names = ['input', 'voxel_num_points', 'coords'],   # the model's input names
          output_names = ['cls_preds', 'box_preds', 'dir_cls_preds'], # the model's output names
          )

thanks!

illegal memory access

Cuda failure: an illegal memory access was encountered at line 306 in file .../CUDA-PointPillars/src/pointpillar.cpp error status: 700

The size of tensor a (12160) must match the size of tensor b (199680) at non-singleton dimension 1

I've met a problem, have you ever dealt with it?

Traceback (most recent call last): | 0/3400 [00:00<?, ?it/s]
File "network.py", line 116, in <module>
train_model(
File "/home/yueye/code/3D-MAN/tools/train_utils/train_utils.py", line 84, in train_model
accumulated_iter = train_one_epoch(
File "/home/yueye/code/3D-MAN/tools/train_utils/train_utils.py", line 36, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/home/yueye/code/3D-MAN/pcdet/models/__init__.py", line 42, in model_func
ret_dict, tb_dict, disp_dict = model(batch_dict)
File "/home/yueye/anaconda3/envs/3dman/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yueye/code/3D-MAN/pcdet/models/detectors/pointpillar.py", line 14, in forward
loss, tb_dict, disp_dict = self.get_training_loss()
File "/home/yueye/code/3D-MAN/pcdet/models/detectors/pointpillar.py", line 27, in get_training_loss
loss_rpn, tb_dict = self.dense_head.get_loss()
File "/home/yueye/code/3D-MAN/pcdet/models/dense_heads/anchor_head_template.py", line 217, in get_loss
cls_loss, tb_dict = self.get_cls_layer_loss()
File "/home/yueye/code/3D-MAN/pcdet/models/dense_heads/anchor_head_template.py", line 128, in get_cls_layer_loss
cls_loss_src = self.cls_loss_func(cls_preds, one_hot_targets, weights=cls_weights) # [N, M]
File "/home/yueye/anaconda3/envs/3dman/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yueye/code/3D-MAN/pcdet/utils/loss_utils.py", line 59, in forward
pt = target * (1.0 - pred_sigmoid) + (1.0 - target) * pred_sigmoid
RuntimeError: The size of tensor a (12160) must match the size of tensor b (199680) at non-singleton dimension 1

deepstream

Can cuda-pointpillars be implemented using DeepStream's excellent pipeline processing?

When I change the range, the program takes a long time.

Hi, I changed the detection range in x from [0, 69.12] to [-69.12, 69.12]; everything else is the same as in the repo. But res_.size() is 321945 before nms_cpu() in src/pointpillar.cpp, and inference takes more than 15 minutes.
data: 000000.bin
pointpillar_7728.pth: download from OpenPCDet

I don't know whether I need to modify other code at the same time or configure something else.
Thanks for sharing the code. Looking forward to your reply.

/usr/bin/ld: cannot find -lnvinfer

Hello
Thanks for sharing your great work

When I run make -j8, I get some errors and I cannot find a solution.
Could I get some advice on solving this problem?

Thanks a lot in advance.

~/CUDA-PointPillars/build$ make -j8
[ 11%] Linking CXX executable demo

/usr/bin/ld: cannot find -lnvinfer
/usr/bin/ld: cannot find -lnvonnxparser
collect2: error: ld returned 1 exit status
CMakeFiles/demo.dir/build.make:953: recipe for target 'demo' failed
make[2]: *** [demo] Error 1
CMakeFiles/Makefile2:94: recipe for target 'CMakeFiles/demo.dir/all' failed
make[1]: *** [CMakeFiles/demo.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2

Hi, how do I change FP32 to FP16? Thanks!

Got TracerWarnings converting a new onnx file and then got errors running the demo

Hey guys,
I use
cuda 11.4
cudnn 8.2.4
tensorrt 8.4.0.6
onnx 1.11.0
I downloaded the .pth file and tried to convert a new onnx file with tool/exporter.py.
Got the warnings below:

2022-05-13 17:43:39,263 INFO ------ Convert OpenPCDet model for TensorRT ------
/home/wh/anaconda3/envs/whenv2/lib/python3.9/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1646756402876/work/aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
2022-05-13 17:43:41,578 INFO ==> Loading parameters from checkpoint ./pointpillar_7728.pth to CPU
2022-05-13 17:43:41,610 INFO ==> Done (loaded 127/127)
/home/wh/anaconda3/envs/whenv2/lib/python3.9/site-packages/torch/onnx/utils.py:366: UserWarning: Skipping _decide_input_format
-1
warnings.warn("Skipping _decide_input_format\n {}".format(e.args[0]))
/home/wh/gitstorage/CUDA-PointPillars/OpenPCDet/pcdet/models/backbones_3d/vfe/pillar_vfe.py:30: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if inputs.shape[0] > self.part:
/home/wh/gitstorage/CUDA-PointPillars/OpenPCDet/pcdet/models/backbones_2d/map_to_bev/pointpillar_scatter.py:17: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
batch_size = coords[:, 0].max().int().item() + 1
/home/wh/gitstorage/CUDA-PointPillars/OpenPCDet/pcdet/models/backbones_2d/base_bev_backbone.py:95: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
stride = int(spatial_features.shape[2] / x.shape[2])
/home/wh/gitstorage/CUDA-PointPillars/OpenPCDet/pcdet/models/detectors/detector3d_template.py:214: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert cls_preds.shape[1] in [1, self.num_class]
/home/wh/gitstorage/CUDA-PointPillars/OpenPCDet/pcdet/models/model_utils/model_nms_utils.py:14: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if box_scores.shape[0] > 0:
/home/wh/anaconda3/envs/whenv2/lib/python3.9/site-packages/torch/onnx/utils.py:272: UserWarning: We detected that you are modifying a dictionary that is an input to your model. Note that dictionaries are allowed as inputs in ONNX but they should be handled with care. Usages of dictionaries is not recommended, and should not be used except for configuration use. Also note that the order and values of the keys must remain the same.
warnings.warn(warning)
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Use onnx_graphsurgeon to adjust postprocessing part in the onnx...
Use onnx_graphsurgeon to modify onnx...
finished exporting onnx
2022-05-13 17:43:48,895 INFO [PASS] ONNX EXPORTED.

Then I moved the ONNX file and params.h to ../include/, as described in the README.
But I got the following errors:

GPU has cuda devices: 2
----device id: 0 info----
GPU : NVIDIA GeForce RTX 2080
Capbility: 7.5
Global memory: 7979MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
----device id: 1 info----
GPU : NVIDIA GeForce RTX 2080
Capbility: 7.5
Global memory: 7982MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Building TRT engine.
trt_infer: [shuffleNode.cpp::symbolicExecute::391] Error Code 4: Internal Error (Reshape_249: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
trt_infer: ModelImporter.cpp:792: While parsing node number 28 [Pad -> "input.67"]:
trt_infer: ModelImporter.cpp:793: --- Begin node ---
trt_infer: ModelImporter.cpp:794: input: "input.55"
input: "onnx::Cast_451"
input: "onnx::Pad_453"
output: "input.67"
name: "Pad_260"
op_type: "Pad"
attribute {
name: "mode"
s: "constant"
type: STRING
}

trt_infer: ModelImporter.cpp:795: --- End node ---
trt_infer: ModelImporter.cpp:798: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Pad_260
[shuffleNode.cpp::symbolicExecute::391] Error Code 4: Internal Error (Reshape_249: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
: failed to parse onnx model file, please check the onnx version and trt support op!

Guys, please help me. T-T
Thank you for your time!

trt_infer: ../rtSafe/cuda/caskUtils.cpp (98) - Assertion Error in trtSmToCask: 0 (Unsupported SM.)

ss TRT_DEPRECATED IPluginLayer : public ILayer
^~~~~~~~~~~~
[100%] Linking CXX executable demo
[100%] Built target demo
(cppy37) yixin@yixin:~/projects/CUDA-PointPillars/test/build$ ./demo

Building TRT engine.

Input filename: ../../model/pointpillar.onnx
ONNX IR version: 0.0.8
Opset version: 11
Producer name:
Producer version:
Domain:
Model version: 0
Doc string:

input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5
input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5
trt_infer: ../rtSafe/cuda/caskUtils.cpp (98) - Assertion Error in trtSmToCask: 0 (Unsupported SM.)
: engine init null!
(cppy37) yixin@yixin

While parsing node number 7 [ScatterBEV]: ERROR: ModelImporter.cpp:134 In function parseGraph: [8] No importer registered for op: ScatterBEV : failed to parse onnx model file, please check the onnx version and trt support op!

How to run the demo in Jetpack 4.4?

Thanks for your contribution to this great project !

I have some questions and would appreciate your help. My configuration:

Jetpack 4.4 [L4T 32.4.3]
AGX Xavier [16GB]
CUDA: 10.2.89
cuDNN: 8.0.0.180
TRT: 7.1.3.0

I tried two ways to build the demo executable:

In the test folder, I ran mkdir build && cd build && cmake .. && make -j8
It compiled successfully, but when I run demo, it shows:

Building TRT engine.

Input filename: ../../model/pointpillar.onnx
ONNX IR version: 0.0.8
Opset version: 11
Producer name:
Producer version:
Domain:
Model version: 0
Doc string:

input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5
Enable fp16!
input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5

After that, there is no further output and the Xavier powers off. Is this caused by the JetPack version?

In the test folder, I modified the Makefile:

INCLUDE :=
INCLUDE += $(CUDA_CFLAGS)
INCLUDE += -I/usr/include/
INCLUDE += -I../include

This also compiled successfully. Then I ran cd output and ./demo, which shows:

trt_infer: INVALID_ARGUMENT: getPluginCreator could not find plugin ScatterBEV version 1
ERROR: builtin_op_importers.cpp:3661 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

I see ScatterBEV.cpp in src/plugin. How can I use it?

I'm looking forward to your reply!

Two problems were found

Running the demo twice gives different nms_pred results.
Result of the first run:
`0 0 Pedestrian 0 0 0 0 0 0 0 8.801592 -22.979778 -0.887550 0.699091 1.681269 0.955211 6.057534 0.886240

0 1 Car 0 0 0 0 0 0 0 12.719068 -28.110558 -0.953992 1.454717 1.440981 3.581757 1.694893 0.868306

0 2 Car 0 0 0 0 0 0 0 47.362293 -28.365889 -0.973110 1.458381 1.436911 3.486596 1.667450 0.857365

0 3 Cyclist 0 0 0 0 0 0 0 6.083002 -20.606743 -0.793952 0.521009 1.870310 1.742529 6.440432 0.723123

0 4 Pedestrian 0 0 0 0 0 0 0 38.801476 -24.652239 -0.866866 0.633589 1.681634 0.863217 7.058193 0.640031

1 0 Car 0 0 0 0 0 0 0 12.139722 -28.273060 -0.901822 1.527452 1.430591 3.621138 1.697318 0.906442

1 1 Car 0 0 0 0 0 0 0 46.858337 -28.242884 -0.900239 1.531207 1.431682 3.568392 1.667983 0.901812

1 2 Pedestrian 0 0 0 0 0 0 0 8.465889 -23.032154 -0.846444 0.645208 1.671196 0.881344 6.123196 0.870731

1 3 Pedestrian 0 0 0 0 0 0 0 38.177109 -24.687294 -0.824349 0.597974 1.776301 0.773330 6.553991 0.749789

1 4 Cyclist 0 0 0 0 0 0 0 6.105674 -20.675518 -0.786540 0.537707 1.857258 1.734642 6.339981 0.717096

2 0 Car 0 0 0 0 0 0 0 11.585449 -28.402311 -0.755955 1.502408 1.468927 3.476464 1.708760 0.914412

2 1 Car 0 0 0 0 0 0 0 46.331028 -28.403627 -0.756222 1.510054 1.459696 3.494760 1.701342 0.892783

2 2 Pedestrian 0 0 0 0 0 0 0 2.864436 -24.456514 -0.795387 0.683705 1.716824 0.740641 6.564042 0.670138`

Result of the second run:

`0 0 Pedestrian 0 0 0 0 0 0 0 8.803576 -22.977654 -0.887780 0.696375 1.681500 0.958435 6.070025 0.884506

0 1 Car 0 0 0 0 0 0 0 12.719075 -28.110571 -0.954006 1.454742 1.441006 3.581787 1.694892 0.868305

0 2 Car 0 0 0 0 0 0 0 47.362247 -28.365870 -0.973042 1.458364 1.436905 3.486571 1.667452 0.857371

0 3 Cyclist 0 0 0 0 0 0 0 6.087093 -20.607618 -0.795005 0.520003 1.870543 1.743158 6.437171 0.718292

0 4 Pedestrian 0 0 0 0 0 0 0 38.801697 -24.651608 -0.866716 0.631521 1.683618 0.862992 3.981019 0.638015

1 0 Car 0 0 0 0 0 0 0 12.137152 -28.273653 -0.905428 1.524646 1.433306 3.596805 1.698718 0.907847

1 1 Car 0 0 0 0 0 0 0 46.859871 -28.246393 -0.904772 1.531922 1.435942 3.555500 1.668782 0.903067

2 0 Car 0 0 0 0 0 0 0 11.584668 -28.394022 -0.754803 1.500816 1.474386 3.450029 1.709120 0.922249

2 1 Car 0 0 0 0 0 0 0 46.331684 -28.388617 -0.756081 1.507665 1.464798 3.450339 1.705996 0.903383

2 2 Pedestrian 0 0 0 0 0 0 0 2.865523 -24.451092 -0.794181 0.643265 1.719354 0.729399 7.015512 0.763062

2 3 Pedestrian 0 0 0 0 0 0 0 8.110272 -23.076586 -0.753614 0.546372 1.720677 0.841813 2.003850 0.684011

2 4 Pedestrian 0 0 0 0 0 0 0 37.609577 -24.749792 -0.808255 0.581791 1.683830 0.778918 4.070107 0.650024`

Is this normal?
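One way to quantify how much two runs differ is to match boxes per frame instead of comparing the text line by line. The sketch below is only an illustration: it assumes each prediction line has 18 whitespace-separated columns (frame index, object index, class name, seven placeholder zeros, then eight floats, of which the first two are taken to be the box center x and y), and the function names and the 0.5 matching tolerance are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[str, List[float]]  # (class name, the eight trailing float values)


def parse_predictions(text: str) -> Dict[int, List[Box]]:
    """Group prediction lines by frame index (assumed 18-column layout)."""
    frames: Dict[int, List[Box]] = {}
    for line in text.splitlines():
        parts = line.strip().strip("`").split()
        if len(parts) != 18:
            continue  # skip blank lines and anything that is not a box row
        frame = int(parts[0])
        values = [float(v) for v in parts[10:]]
        frames.setdefault(frame, []).append((parts[2], values))
    return frames


def compare_runs(
    first: Dict[int, List[Box]],
    second: Dict[int, List[Box]],
    max_center_dist: float = 0.5,  # assumed tolerance, in the same units as x/y
) -> Dict[int, Dict[str, int]]:
    """Greedily match same-class boxes whose centers are close; count the rest."""
    report: Dict[int, Dict[str, int]] = {}
    for frame in sorted(set(first) | set(second)):
        remaining = list(second.get(frame, []))
        matched = 0
        only_first = 0
        for label, va in first.get(frame, []):
            best = None
            for idx, (other_label, vb) in enumerate(remaining):
                if other_label != label:
                    continue
                dist = math.hypot(va[0] - vb[0], va[1] - vb[1])
                if dist <= max_center_dist and (best is None or dist < best[0]):
                    best = (dist, idx)
            if best is None:
                only_first += 1
            else:
                matched += 1
                remaining.pop(best[1])
        report[frame] = {
            "matched": matched,
            "only_first": only_first,
            "only_second": len(remaining),
        }
    return report
```

Feeding both pasted outputs through `parse_predictions` and then `compare_runs` gives, per frame, how many boxes agree and how many appear in only one run. This helps separate small numeric drift from boxes that are actually missing.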

After setting score_thresh=0.5, far more boxes are passed to NMS:
num_obj:241072
numbers of Bndbox need to be nms:241072
Why?
With the same model, OpenPCDet's output looks normal, but the detection results after ONNX deployment are abnormal.

Thanks, and I look forward to your reply.

Core Dumped Error after inference

Hello, I was able to compile and run the repo on an amd64 computer. After inference finished, I got the following core dump error:

load TRT cache.
<<<<<<<<<<<
load file: ../data/000000.bin
find points num: 20285
find pillar_num: 3384
TIME: generateVoxels: 0.113888 ms.
TIME: generateFeatures: 0.145088 ms.
TIME: doinfer: 989.273 ms.
TIME: doPostprocessCuda: 0.855584 ms.
TIME: pointpillar: 990.544 ms.
Bndbox objs: 8
Saved prediction in: ../eval/kitti/object/pred_velo/000000.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000001.bin
find points num: 18630
find pillar_num: 6815
TIME: generateVoxels: 0.06752 ms.
TIME: generateFeatures: 0.18208 ms.
TIME: doinfer: 6.6993 ms.
TIME: doPostprocessCuda: 1.24224 ms.
TIME: pointpillar: 8.29981 ms.
Bndbox objs: 11
Saved prediction in: ../eval/kitti/object/pred_velo/000001.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000002.bin
find points num: 20210
find pillar_num: 3103
TIME: generateVoxels: 0.06768 ms.
TIME: generateFeatures: 0.125536 ms.
TIME: doinfer: 6.71338 ms.
TIME: doPostprocessCuda: 0.845728 ms.
TIME: pointpillar: 7.85507 ms.
Bndbox objs: 12
Saved prediction in: ../eval/kitti/object/pred_velo/000002.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000003.bin
find points num: 18911
find pillar_num: 3032
TIME: generateVoxels: 0.066528 ms.
TIME: generateFeatures: 0.125248 ms.
TIME: doinfer: 6.69475 ms.
TIME: doPostprocessCuda: 0.681344 ms.
TIME: pointpillar: 7.66832 ms.
Bndbox objs: 4
Saved prediction in: ../eval/kitti/object/pred_velo/000003.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000004.bin
find points num: 19063
find pillar_num: 7515
TIME: generateVoxels: 0.0672 ms.
TIME: generateFeatures: 0.193504 ms.
TIME: doinfer: 6.68547 ms.
TIME: doPostprocessCuda: 1.21693 ms.
TIME: pointpillar: 8.27584 ms.
Bndbox objs: 16
Saved prediction in: ../eval/kitti/object/pred_velo/000004.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000005.bin
find points num: 19962
find pillar_num: 8569
TIME: generateVoxels: 0.072256 ms.
TIME: generateFeatures: 0.215392 ms.
TIME: doinfer: 6.66656 ms.
TIME: doPostprocessCuda: 0.6544 ms.
TIME: pointpillar: 7.7145 ms.
Bndbox objs: 8
Saved prediction in: ../eval/kitti/object/pred_velo/000005.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000006.bin
find points num: 19473
find pillar_num: 5627
TIME: generateVoxels: 0.070752 ms.
TIME: generateFeatures: 0.161312 ms.
TIME: doinfer: 6.67094 ms.
TIME: doPostprocessCuda: 2.75331 ms.
TIME: pointpillar: 9.77827 ms.
Bndbox objs: 17
Saved prediction in: ../eval/kitti/object/pred_velo/000006.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000007.bin
find points num: 19423
find pillar_num: 7935
TIME: generateVoxels: 0.066336 ms.
TIME: generateFeatures: 0.20096 ms.
TIME: doinfer: 6.67046 ms.
TIME: doPostprocessCuda: 1.12442 ms.
TIME: pointpillar: 8.17546 ms.
Bndbox objs: 10
Saved prediction in: ../eval/kitti/object/pred_velo/000007.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000008.bin
find points num: 17238
find pillar_num: 3945
TIME: generateVoxels: 0.06432 ms.
TIME: generateFeatures: 0.128864 ms.
TIME: doinfer: 6.69002 ms.
TIME: doPostprocessCuda: 2.96333 ms.
TIME: pointpillar: 9.95306 ms.
Bndbox objs: 24
Saved prediction in: ../eval/kitti/object/pred_velo/000008.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000009.bin
find points num: 19411
find pillar_num: 7312
TIME: generateVoxels: 0.059232 ms.
TIME: generateFeatures: 0.19088 ms.
TIME: doinfer: 6.67085 ms.
TIME: doPostprocessCuda: 1.55894 ms.
TIME: pointpillar: 8.58445 ms.
Bndbox objs: 13
Saved prediction in: ../eval/kitti/object/pred_velo/000009.txt
>>>>>>>>>>>
malloc_consolidate(): invalid chunk size
Aborted (core dumped)

My cuda and tensorrt versions are:

CUDA: 11.7
cuDNN: 8.4.1
TensorRT: 8.4.1

Thanks in advance.

I have this issue with trt_->doinfer(buffers);

Hi all,

I am just wondering whether the buffers below are in GPU memory or CPU memory.

void *buffers[] = {features_input_, voxel_idxs_, params_input_, cls_output_, box_output_, dir_cls_output_};
trt_->doinfer(buffers);

onnx2trt_onnx.ModelProto

[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:9: Message type "onnx2trt_onnx.ModelProto" has no field named "version".

Own model input without intensity: how to modify this code

Hi, my model input only has x, y, and z, without intensity, so the feature shape is 3. After modifying the relevant parameters in exporter.py, the ONNX model was converted successfully and I got params.h.
However, I don't know how to modify the code in the .cu file to fit my model.
I look forward to your reply.

params.h :

#ifndef PARAMS_H_
#define PARAMS_H_
const int MAX_VOXELS = 40000;
class Params
{
  public:
    static const int num_classes = 1;
    const char *class_name [num_classes] = { "Car",};
    const float min_x_range = -5.12;
    const float max_x_range = 15.36;
    const float min_y_range = -5.12;
    const float max_y_range = 15.36;
    const float min_z_range = -2.0;
    const float max_z_range = 2.0;
    // the size of a pillar
    const float pillar_x_size = 0.16;
    const float pillar_y_size = 0.16;
    const float pillar_z_size = 4.0;
    const int max_num_points_per_pillar = 32;
    const int num_point_values = 3;
    // the number of feature maps for pillar scatter
    const int num_feature_scatter = 64;
    const float dir_offset = 0.78539;
    const float dir_limit_offset = 0.0;
    // the num of direction classes(bins)
    const int num_dir_bins = 2;
    // anchors decode by (x, y, z, dir)
    static const int num_anchors = num_classes * 2;
    static const int len_per_anchor = 4;
    const float anchors[num_anchors * len_per_anchor] = {
      3.9,1.6,1.56,0.0,
      3.9,1.6,1.56,1.57,
      };
    const float anchor_bottom_heights[num_classes] = {-1.78,};
    // the score threshold for classification
    const float score_thresh = 0.1;
    const float nms_thresh = 0.01;
    const int max_num_pillars = MAX_VOXELS;
    const int pillarPoints_bev = max_num_points_per_pillar * max_num_pillars;
    // the detected boxes result decode by (x, y, z, w, l, h, yaw)
    const int num_box_values = 7;
    // the input size of the 2D backbone network
    const int grid_x_size = (max_x_range - min_x_range) / pillar_x_size;
    const int grid_y_size = (max_y_range - min_y_range) / pillar_y_size;
    const int grid_z_size = (max_z_range - min_z_range) / pillar_z_size;
    // the output size of the 2D backbone network
    const int feature_x_size = grid_x_size / 2;
    const int feature_y_size = grid_y_size / 2;
    Params() {};
};
#endif
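For reference, the README describes voxelization as producing 10-channel features. With four point values (x, y, z, intensity), that layout is commonly the raw values, three offsets from the mean of the points in the pillar, and three offsets from the pillar center. Assuming the same layout holds here, an x, y, z only input would give 9 channels per point. The plain-Python sketch below illustrates that arithmetic for one pillar. It is not the repository's CUDA kernel: the function name is hypothetical, and the defaults are copied from the params.h above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z), no intensity


def pillar_features_xyz(
    points: Sequence[Point],
    pillar_ix: int,
    pillar_iy: int,
    min_x: float = -5.12,
    min_y: float = -5.12,
    min_z: float = -2.0,
    pillar_x: float = 0.16,
    pillar_y: float = 0.16,
    pillar_z: float = 4.0,
) -> List[List[float]]:
    """Return one 9-value row per point: xyz, offset to mean, offset to center."""
    n = len(points)
    if n == 0:
        return []
    # Mean of the points that fell into this pillar.
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    # Geometric center of the pillar (a single bin along z is assumed).
    center = [
        pillar_ix * pillar_x + pillar_x / 2 + min_x,
        pillar_iy * pillar_y + pillar_y / 2 + min_y,
        pillar_z / 2 + min_z,
    ]
    rows = []
    for p in points:
        to_mean = [p[k] - mean[k] for k in range(3)]
        to_center = [p[k] - center[k] for k in range(3)]
        rows.append(list(p) + to_mean + to_center)
    return rows
```

Under that assumption, the feature-generation code in the .cu file would need to read three values per point instead of four and write nine channels instead of ten, with the exported model's input channel count matching.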

How to download the ONNX file

Hi, the files in the ./model dir contain only a few lines of text (version/oid/size). How can I download the actual ONNX file? Thanks
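A file that contains only version, oid, and size lines is typically a Git LFS pointer rather than the real file. The README's build steps install git-lfs before cloning, and running `git lfs pull` inside an existing clone normally fetches the actual content. The sketch below checks whether a file's bytes look like such a pointer. The function names are hypothetical, and it relies only on the standard pointer header.

```python
from typing import Dict

# First line of a Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(data: bytes) -> bool:
    """Return True if the bytes start with the Git LFS pointer header."""
    return data.startswith(LFS_HEADER)


def parse_lfs_pointer(data: bytes) -> Dict[str, str]:
    """Split a pointer file into its key/value lines (version, oid, size)."""
    fields: Dict[str, str] = {}
    for line in data.decode("utf-8").splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields
```

For example, reading the first few hundred bytes of the file under ./model with `open(path, "rb").read(200)` and passing them to `is_git_lfs_pointer` shows whether the model still needs to be fetched.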