nvidia-ai-iot / cuda-pointpillars
A project demonstrating how to use CUDA-PointPillars to deal with cloud points data from lidar.
License: Apache License 2.0
Hi, would it be possible to run the model on an older version of TRT/CUDNN/CUDA?
We are using the DRIVE AGX with Drive Software 10.0 and TRT 5.1.4. Even the latest Drive SDK does not provide TRT 8.4.0, so this looks like a problem right now.
If it can be done, can you please provide some instructions on how to do this?
Thanks.
When I build a ROS project with catkin_make and run it, this fault occurs: "cuda failure: invalid device function at ./.cpp error status 98". It seems that the CMakeLists.txt needs to be fixed. How should I do that?
I use TensorRT 8.4, and this error occurs during engine inference.
I use TensorRT 8.4, and when I run ./demo there are some errors while building the TRT engine:
Building TRT engine.
trt_infer: Could not register plugin creator - ::PillarScatterPlugin version 1
trt_infer: parsers/onnx/ModelImporter.cpp:780: While parsing node number 4 [ScatterBEV -> "479"]:
trt_infer: parsers/onnx/ModelImporter.cpp:781: --- Begin node ---
trt_infer: parsers/onnx/ModelImporter.cpp:782: input: "403"
input: "coords"
input: "params"
output: "479"
name: "onnx_graphsurgeon_node_0"
op_type: "ScatterBEV"
trt_infer: ModelImporter.cpp:751: --- End node ---
trt_infer: ModelImporter.cpp:754: ERROR: builtin_op_importers.cpp:4951 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
: failed to parse onnx model file, please check the onnx version and trt support op!
How can I fix the problem?
I've run into a problem. Have you ever dealt with it?
Traceback (most recent call last): | 0/3400 [00:00<?, ?it/s]
File "network.py", line 116, in <module>
train_model(
File "/home/yueye/code/3D-MAN/tools/train_utils/train_utils.py", line 84, in train_model
accumulated_iter = train_one_epoch(
File "/home/yueye/code/3D-MAN/tools/train_utils/train_utils.py", line 36, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/home/yueye/code/3D-MAN/pcdet/models/__init__.py", line 42, in model_func
ret_dict, tb_dict, disp_dict = model(batch_dict)
File "/home/yueye/anaconda3/envs/3dman/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yueye/code/3D-MAN/pcdet/models/detectors/pointpillar.py", line 14, in forward
loss, tb_dict, disp_dict = self.get_training_loss()
File "/home/yueye/code/3D-MAN/pcdet/models/detectors/pointpillar.py", line 27, in get_training_loss
loss_rpn, tb_dict = self.dense_head.get_loss()
File "/home/yueye/code/3D-MAN/pcdet/models/dense_heads/anchor_head_template.py", line 217, in get_loss
cls_loss, tb_dict = self.get_cls_layer_loss()
File "/home/yueye/code/3D-MAN/pcdet/models/dense_heads/anchor_head_template.py", line 128, in get_cls_layer_loss
cls_loss_src = self.cls_loss_func(cls_preds, one_hot_targets, weights=cls_weights) # [N, M]
File "/home/yueye/anaconda3/envs/3dman/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yueye/code/3D-MAN/pcdet/utils/loss_utils.py", line 59, in forward
pt = target * (1.0 - pred_sigmoid) + (1.0 - target) * pred_sigmoid
RuntimeError: The size of tensor a (12160) must match the size of tensor b (199680) at non-singleton dimension 1
At line 323 of preprocess_kernels.cu, the code calculates the offsets:
//calculate offset
float x_offset = voxel_x / 2 + cordsSM[pillar_idx_inBlock].w * voxel_x + range_min_x;
float y_offset = voxel_y / 2 + cordsSM[pillar_idx_inBlock].z * voxel_y + range_min_y;
float z_offset = voxel_z / 2 + cordsSM[pillar_idx_inBlock].y * voxel_z + range_min_z;
I think w means intensity.
When calculating x_offset, why is voxel_x multiplied by cordsSM[pillar_idx_inBlock].w and not cordsSM[pillar_idx_inBlock].x?
When calculating y_offset, why is voxel_y multiplied by cordsSM[pillar_idx_inBlock].z and not cordsSM[pillar_idx_inBlock].y?
When calculating z_offset, why is voxel_z multiplied by cordsSM[pillar_idx_inBlock].y and not cordsSM[pillar_idx_inBlock].z?
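One possible reading, offered as a sketch rather than a confirmed answer: if the four-component coordinate stores (batch, z index, y index, x index) in its .x/.y/.z/.w fields, as OpenPCDet does for voxel coordinates, then .w is the x grid index and not an intensity. The function below restates the three offset formulas under that assumption; the function and parameter names are illustrative only.

```python
from typing import Tuple


def pillar_center(coord: Tuple[int, int, int, int],
                  voxel_size: Tuple[float, float, float],
                  range_min: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Return the (x, y, z) center of a pillar.

    ASSUMPTION: coord is ordered (batch, z_idx, y_idx, x_idx), i.e. the
    .x/.y/.z/.w fields of the 4-component vector in that order.
    """
    _batch, z_idx, y_idx, x_idx = coord
    voxel_x, voxel_y, voxel_z = voxel_size
    min_x, min_y, min_z = range_min

    # Same arithmetic as the three lines quoted from preprocess_kernels.cu:
    # half a voxel + index * voxel size + lower bound of the range.
    x_offset = voxel_x / 2 + x_idx * voxel_x + min_x
    y_offset = voxel_y / 2 + y_idx * voxel_y + min_y
    z_offset = voxel_z / 2 + z_idx * voxel_z + min_z
    return x_offset, y_offset, z_offset
```

With this ordering, a pillar at x index 2 and y index 1 in a 0.16 m grid starting at (0, -39.68, -3) is centered at roughly (0.40, -39.44, -1.0).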
root@92739a255d9f:/share/CUDA-PointPillars.bak/test/build# ./demo
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA GeForce RTX 3060 Laptop GPU
Capbility: 8.6
Global memory: 5946MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
Cuda failure: the provided PTX was compiled with an unsupported toolchain. at line 108 in file /share/CUDA-PointPillars.bak/test/main.cpp error status: 222
Aborted (core dumped)
root@92739a255d9f:/share/CUDA-PointPillars.bak/test/build# exit
Cuda failure: an illegal memory access was encountered at line 306 in file .../CUDA-PointPillars/src/pointpillar.cpp error status: 700
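On the KITTI dataset it can detect obstacles, although the results are mediocre; on the Waymo dataset the results are very poor. Why is that?
Thanks for your contribution to this great project!
I have some questions and need your help, please. My configuration: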
Jetpack 4.4 [L4T 32.4.3]
AGX Xavier [16GB]
CUDA: 10.2.89
cuDNN: 8.0.0.180
TRT: 7.1.3.0
I built the demo executable in two ways. First, with CMake:
mkdir build && cd build && cmake .. && make -j8
Building TRT engine.
Input filename: ../../model/pointpillar.onnx
ONNX IR version: 0.0.8
Opset version: 11
Producer name:
Producer version:
Domain:
Model version: 0
Doc string:
input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5
Enable fp16!
input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5
After that there is no response and the Xavier powers off. Is this caused by the JetPack version?
Second, I went into the test folder and modified the Makefile:
INCLUDE :=
INCLUDE += $(CUDA_CFLAGS)
INCLUDE += -I/usr/include/
INCLUDE += -I../include
This compiled successfully too. Now, after cd output and ./demo, it shows:
trt_infer: INVALID_ARGUMENT: getPluginCreator could not find plugin ScatterBEV version 1
ERROR: builtin_op_importers.cpp:3661 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
I see ScatterBEV.cpp in src/plugin. How can I use it?
I'm looking forward to your reply!
trt_infer: 1: [stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 97)
trt_infer: 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
How can I use a multi-head PointPillars model?
Hi all,
I am wondering whether the buffers below are in GPU memory or CPU memory.
void *buffers[] = {features_input_, voxel_idxs_, params_input_, cls_output_, box_output_, dir_cls_output_};
trt_->doinfer(buffers);
Hello, dear developer,
If the model I retrained contains more labels, how should I modify the original code? Please give me some advice.
For example, the labels of my model are "car", "pedestrian", "cyclist", "indicator" and "truck", while the open-source code only has "car", "pedestrian" and "cyclist".
Thank you very much!
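As a rough illustration of what changes with the class list: the generated params.h quoted elsewhere on this page declares num_anchors = num_classes * 2 and stores each anchor as four values, with two rotations (0.0 and 1.57) per class. The sketch below rebuilds that flat table from per-class sizes. The helper names are hypothetical, and sizes for any extra class (for example "truck") would have to come from your own training config; none are invented here.

```python
from typing import Dict, List, Sequence


def build_anchor_table(class_sizes: Dict[str, Sequence[float]],
                       rotations: Sequence[float] = (0.0, 1.57)) -> List[List[float]]:
    """One row per (class, rotation): [size0, size1, size2, rotation]."""
    rows = []
    for size in class_sizes.values():
        for rot in rotations:
            rows.append([*size, rot])
    return rows


def flatten(rows: List[List[float]]) -> List[float]:
    """Flatten the rows into the single list layout used by the exporter output."""
    return [value for row in rows for value in row]


# Sizes for the three stock classes, as they appear in the exporter output.
# Additional classes would be appended here with sizes from your own config.
stock_classes = {
    "Car": (3.9, 1.6, 1.56),
    "Pedestrian": (0.8, 0.6, 1.73),
    "Cyclist": (1.76, 0.6, 1.73),
}
anchor_rows = build_anchor_table(stock_classes)
```

Every added class contributes two more rows, so a five-class model would need ten anchors and five bottom heights, with the class-name list extended to match.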
['Car', 'Pedestrian', 'Cyclist']
3
0 -39.68 -3 69.12 39.68 1
[0.16, 0.16, 4]
32
40000
4
64
0.78539
0.0
2
[3.9, 1.6, 1.56, 0.0, 3.9, 1.6, 1.56, 1.57, 0.8, 0.6, 1.73, 0.0, 0.8, 0.6, 1.73, 1.57, 1.76, 0.6, 1.73, 0.0, 1.76, 0.6, 1.73, 1.57]
[-1.78, -0.6, -0.6]
0.1
0.01
anchors: const float anchors[num_anchors * len_per_anchor] = {
3.9,1.6,1.56,0.0,
3.9,1.6,1.56,1.57,
0.8,0.6,1.73,0.0,
0.8,0.6,1.73,1.57,
1.76,0.6,1.73,0.0,
1.76,0.6,1.73,1.57,
};
anchors: const float anchor_bottom_heights[num_classes] = {-1.78,-0.6,-0.6,};
########
2022-03-15 11:08:59,269 INFO ------ Convert OpenPCDet model for TensorRT ------
2022-03-15 11:09:05,030 INFO ==> Loading parameters from checkpoint ../../checkpoint_epoch_1.pth to CPU
2022-03-15 11:09:05,171 INFO ==> Checkpoint trained from version: pcdet+0.3.0+0642cf0
2022-03-15 11:09:05,462 INFO ==> Done (loaded 127/127)
/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/backbones_3d/vfe/pillar_vfe.py:45: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if inputs.shape[0] > self.part:
/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/backbones_2d/map_to_bev/pointpillar_scatter.py:31: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
batch_size = coords[:, 0].max().int().item() + 1
Traceback (most recent call last):
File "exporter.py", line 150, in <module>
main()
File "exporter.py", line 135, in main
output_names = ['cls_preds', 'box_preds', 'dir_cls_preds'], # the model's output names
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/__init__.py", line 208, in export
custom_opsets, enable_onnx_checker, use_external_data_format)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 92, in export
use_external_data_format=use_external_data_format)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 530, in _export
fixed_batch_size=fixed_batch_size)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 366, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 319, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 338, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 426, in forward
self._force_outplace,
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 412, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 720, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 704, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/detectors/pointpillar.py", line 31, in forward
spatial_features_2d = self.module_list[2](spatial_features) #"BaseBEVBackbone"
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 720, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 704, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/nvidia/project/pointpillar/CUDA-PointPillars-main/tool/pcdet/models/backbones_2d/base_bev_backbone.py", line 103, in forward
stride = int(spatial_features.shape[2] / x.shape[2])
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
I got this error when generating the ONNX file with exporter.py. How can I fix it?
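The error message itself points at floor division. The sketch below shows the idea with plain integers standing in for the traced shape values; the function name is hypothetical, and whether this matches the maintainers' own fix is not confirmed here.

```python
def upsample_stride(input_size: int, feature_size: int) -> int:
    """Integer stride between two feature-map sizes.

    Uses floor division (//), as recommended by the error message,
    instead of int(a / b) as in the line from the traceback.
    """
    if feature_size <= 0:
        raise ValueError("feature_size must be positive")
    return input_size // feature_size
```

Applied to the line in base_bev_backbone.py from the traceback, the analogous change would be stride = spatial_features.shape[2] // x.shape[2]. This is untested against that PyTorch version.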
Hello, I can compile the project, but when I run ./demo I get this error.
Building TRT engine.
../model/pointpillar.onnxtrt_infer: ModelImporter.cpp:773: While parsing node number 6 [PillarScatterPlugin -> "input.3"]:
trt_infer: ModelImporter.cpp:774: --- Begin node ---
demo: malloc.c:2401: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)
My cuda and tensorrt versions are:
CUDA: 11.1
cuDNN: 8.4.1
TensorRT: 8.4.1
Thanks in advance.
The KITTI model produces results and works normally. However, after I changed the dataset to DAIR, the model runs inference correctly with the December version of the code but not with the April version; the April version outputs random results.
Another question: could you provide a CPU version of the pillar generation? The random pillar ordering in the CUDA version makes the pillar features unstable.
Hello, I could compile and run the repo on an amd64 computer. After inference finished, I got a core dump, as shown below:
load TRT cache.
<<<<<<<<<<<
load file: ../data/000000.bin
find points num: 20285
find pillar_num: 3384
TIME: generateVoxels: 0.113888 ms.
TIME: generateFeatures: 0.145088 ms.
TIME: doinfer: 989.273 ms.
TIME: doPostprocessCuda: 0.855584 ms.
TIME: pointpillar: 990.544 ms.
Bndbox objs: 8
Saved prediction in: ../eval/kitti/object/pred_velo/000000.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000001.bin
find points num: 18630
find pillar_num: 6815
TIME: generateVoxels: 0.06752 ms.
TIME: generateFeatures: 0.18208 ms.
TIME: doinfer: 6.6993 ms.
TIME: doPostprocessCuda: 1.24224 ms.
TIME: pointpillar: 8.29981 ms.
Bndbox objs: 11
Saved prediction in: ../eval/kitti/object/pred_velo/000001.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000002.bin
find points num: 20210
find pillar_num: 3103
TIME: generateVoxels: 0.06768 ms.
TIME: generateFeatures: 0.125536 ms.
TIME: doinfer: 6.71338 ms.
TIME: doPostprocessCuda: 0.845728 ms.
TIME: pointpillar: 7.85507 ms.
Bndbox objs: 12
Saved prediction in: ../eval/kitti/object/pred_velo/000002.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000003.bin
find points num: 18911
find pillar_num: 3032
TIME: generateVoxels: 0.066528 ms.
TIME: generateFeatures: 0.125248 ms.
TIME: doinfer: 6.69475 ms.
TIME: doPostprocessCuda: 0.681344 ms.
TIME: pointpillar: 7.66832 ms.
Bndbox objs: 4
Saved prediction in: ../eval/kitti/object/pred_velo/000003.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000004.bin
find points num: 19063
find pillar_num: 7515
TIME: generateVoxels: 0.0672 ms.
TIME: generateFeatures: 0.193504 ms.
TIME: doinfer: 6.68547 ms.
TIME: doPostprocessCuda: 1.21693 ms.
TIME: pointpillar: 8.27584 ms.
Bndbox objs: 16
Saved prediction in: ../eval/kitti/object/pred_velo/000004.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000005.bin
find points num: 19962
find pillar_num: 8569
TIME: generateVoxels: 0.072256 ms.
TIME: generateFeatures: 0.215392 ms.
TIME: doinfer: 6.66656 ms.
TIME: doPostprocessCuda: 0.6544 ms.
TIME: pointpillar: 7.7145 ms.
Bndbox objs: 8
Saved prediction in: ../eval/kitti/object/pred_velo/000005.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000006.bin
find points num: 19473
find pillar_num: 5627
TIME: generateVoxels: 0.070752 ms.
TIME: generateFeatures: 0.161312 ms.
TIME: doinfer: 6.67094 ms.
TIME: doPostprocessCuda: 2.75331 ms.
TIME: pointpillar: 9.77827 ms.
Bndbox objs: 17
Saved prediction in: ../eval/kitti/object/pred_velo/000006.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000007.bin
find points num: 19423
find pillar_num: 7935
TIME: generateVoxels: 0.066336 ms.
TIME: generateFeatures: 0.20096 ms.
TIME: doinfer: 6.67046 ms.
TIME: doPostprocessCuda: 1.12442 ms.
TIME: pointpillar: 8.17546 ms.
Bndbox objs: 10
Saved prediction in: ../eval/kitti/object/pred_velo/000007.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000008.bin
find points num: 17238
find pillar_num: 3945
TIME: generateVoxels: 0.06432 ms.
TIME: generateFeatures: 0.128864 ms.
TIME: doinfer: 6.69002 ms.
TIME: doPostprocessCuda: 2.96333 ms.
TIME: pointpillar: 9.95306 ms.
Bndbox objs: 24
Saved prediction in: ../eval/kitti/object/pred_velo/000008.txt
>>>>>>>>>>>
<<<<<<<<<<<
load file: ../data/000009.bin
find points num: 19411
find pillar_num: 7312
TIME: generateVoxels: 0.059232 ms.
TIME: generateFeatures: 0.19088 ms.
TIME: doinfer: 6.67085 ms.
TIME: doPostprocessCuda: 1.55894 ms.
TIME: pointpillar: 8.58445 ms.
Bndbox objs: 13
Saved prediction in: ../eval/kitti/object/pred_velo/000009.txt
>>>>>>>>>>>
malloc_consolidate(): invalid chunk size
Aborted (core dumped)
My cuda and tensorrt versions are:
CUDA: 11.7
cuDNN: 8.4.1
TensorRT: 8.4.1
Thanks in advance.
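For logs like the one above, a small parser makes the per-stage timings easier to compare, for example to separate the slow first frame (989 ms in doinfer) from the later ones. This is a sketch that assumes the "TIME: name: value ms." line format shown in the log; the function names are illustrative.

```python
import re
from statistics import mean
from typing import Dict, List

# Matches lines such as "TIME: doinfer: 6.6993 ms."
TIME_RE = re.compile(r"TIME:\s+(\w+):\s+([0-9.]+)\s+ms")


def collect_timings(log_text: str) -> Dict[str, List[float]]:
    """Group the reported milliseconds by stage name, in log order."""
    timings: Dict[str, List[float]] = {}
    for name, value in TIME_RE.findall(log_text):
        timings.setdefault(name, []).append(float(value))
    return timings


def steady_state_mean(values: List[float], warmup: int = 1) -> float:
    """Mean of the values after skipping the first `warmup` frames."""
    if len(values) <= warmup:
        raise ValueError("not enough samples after skipping warm-up frames")
    return mean(values[warmup:])
```

Calling steady_state_mean(collect_timings(text)["doinfer"]) on a saved log gives the average inference time without the first frame.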
ss TRT_DEPRECATED IPluginLayer : public ILayer
^~~~~~~~~~~~
[100%] Linking CXX executable demo
[100%] Built target demo
(cppy37) yixin@yixin:~/projects/CUDA-PointPillars/test/build$ ./demo
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA GeForce RTX 3060 Laptop GPU
Capbility: 8.6
Global memory: 5946MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5
input[0]: 10000 32 64
input[1]: 1 1 10000 4
input[2]: 1 1 1 5
trt_infer: ../rtSafe/cuda/caskUtils.cpp (98) - Assertion Error in trtSmToCask: 0 (Unsupported SM.)
: engine init null!
(cppy37) yixin@yixin
TIME: doPostprocessCuda rises to around 80000 ms when I use the latest code (commit 4e8e4f3); before (commit db037d2) it was around 5 ms. The number of bounding boxes is also far too large.
GPU info:
GPU : Orin
Capbility: 8.7
Global memory: 30622MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
Run-time details:
<<<<<<<<<<<
load file: ../data/000000.bin
find points num: 125635
find pillar_num: 9539
TIME: generateVoxels: 0.97344 ms.
TIME: generateFeatures: 1.00912 ms.
TIME: doinfer: 57.9808 ms.
TIME: doPostprocessCuda: 57716.1 ms.
TIME: pointpillar: 57777.3 ms.
Bndbox objs: 3061
When I follow the export guide to export ONNX from pointpillar_7729.pth, I get this error:
root@pc-MS-7B89:/workspace/ssh-docker/workspace/CUDA-PointPillars/tool# python exporter.py --ckpt ../model/pointpillar_7729.pth
2022-06-08 10:17:39,104 INFO ------ Convert OpenPCDet model for TensorRT ------
2022-06-08 10:17:40,746 INFO ==> Loading parameters from checkpoint ../model/pointpillar_7729.pth to CPU
2022-06-08 10:17:40,760 INFO ==> Done (loaded 127/127)
Traceback (most recent call last):
File "exporter.py", line 150, in <module>
main()
File "exporter.py", line 126, in main
torch.onnx.export(model, # model being run
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/__init__.py", line 225, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 85, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 632, in _export
_model_to_graph(model, args, verbose, input_names,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 409, in _model_to_graph
graph, params, torch_out = _create_jit_graph(model, args,
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 379, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 342, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 1148, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 93, in forward
in_vars, in_desc = _flatten(args)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type int
Hi all,
I want to use PointPillars to run detection on a KITTI tracking dataset, convert the output to video format, and then visualize it.
Has anyone done this before? Could you help me and give me some advice?
Thanks in advance.
Barry
like mAP on kitti or mAPH on waymo?
When I change POINT_CLOUD_RANGE to [0, -39.68, -5, 102.4, 39.68, 5], I get an incorrect inference result.
My VOXEL_SIZE is [0.16, 0.16, 10]. How can I solve this problem?
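One consistency check that may help here: the generated params.h quoted on this page derives the grid size as (max - min) / pillar size, so changing the range changes the BEV grid the network sees. The sketch below computes the grid for a given range and voxel size and rejects ranges that do not divide evenly. It is a general sanity check with illustrative names, not a diagnosis of this particular result.

```python
from typing import List, Sequence


def grid_size(point_cloud_range: Sequence[float],
              voxel_size: Sequence[float]) -> List[int]:
    """Number of voxels along x, y, z for a [x0, y0, z0, x1, y1, z1] range."""
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    sizes = []
    for lo, hi, step in zip(mins, maxs, voxel_size):
        cells = (hi - lo) / step
        # Guard against ranges that are not a whole number of voxels.
        if abs(cells - round(cells)) > 1e-6:
            raise ValueError(f"range [{lo}, {hi}] is not a multiple of {step}")
        sizes.append(round(cells))
    return sizes


default_grid = grid_size([0, -39.68, -3, 69.12, 39.68, 1], [0.16, 0.16, 4])
changed_grid = grid_size([0, -39.68, -5, 102.4, 39.68, 5], [0.16, 0.16, 10])
```

Here default_grid is [432, 496, 1] and changed_grid is [640, 496, 1], so anything built for the 432-wide grid (the exported ONNX, the cached engine, params.h) would need to be regenerated for the new range.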
Hello.
I tried to build this repo, but I got the errors shown below.
Is there any way to get past this problem?
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libdl.so when searching for -ldl
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libdl.a when searching for -ldl
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/librt.a when searching for -lrt
/usr/bin/ld: cannot find -lnvinfer
/usr/bin/ld: cannot find -lnvonnxparser
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libpthread.so when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/aarch64-linux-gnu/lib/libpthread.a when searching for -lpthread
collect2: error: ld returned 1 exit status
CMakeFiles/demo.dir/build.make:951: recipe for target 'demo' failed
make[2]: *** [demo] Error 1
CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/demo.dir/all' failed
make[1]: *** [CMakeFiles/demo.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
Hello
Thanks for sharing your great work
When I run make -j8, I get some errors and I cannot find a solution.
Can I get some advice to solve this problem?
Thanks a lot in advance.
~/CUDA-PointPillars/build$ make -j8
[ 11%] Linking CXX executable demo
/usr/bin/ld: cannot find -lnvinfer
/usr/bin/ld: cannot find -lnvonnxparser
collect2: error: ld returned 1 exit status
CMakeFiles/demo.dir/build.make:953: recipe for target 'demo' failed
make[2]: *** [demo] Error 1
CMakeFiles/Makefile2:94: recipe for target 'CMakeFiles/demo.dir/all' failed
make[1]: *** [CMakeFiles/demo.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2
load file: /media/dk/2eee4ea8-6028-41ef-89c5-8f36a982bc1d/dk/kitti_dataset/testing/velodyne/000999.bin
find points num: 115279
find pillar_num: 7982
TIME: generateVoxels: 0.036864 ms.
TIME: generateFeatures: 0.031424 ms.
TIME: doinfer: 7.19277 ms.
TIME: doPostprocessCuda: 0.754336 ms.
TIME: pointpillar: 8.06592 ms.
Bndbox objs: 37
I ran it on my 2080 Ti, but I do not know where to find the result.
dk@dk-MS-7B94:~/CUDA-PointPillars$ sudo docker run --rm --gpus all -ti -v /home/dk/:/workspace/ssh-docker --net=host scrin/dev-spconv:f22dd9aee04e2fe8a9fe35866e52620d8d8b3779
Hi,
thanks for your amazing work!
Could you please tell me which parts of this repo should be modified if I want to deploy this system on an amd64 computer?
Can DeepStream's excellent pipeline processing be used to implement cuda-pointpillars?
Could you share the .vscode configuration for CUDA debugging?
I am trying to convert my own .pth file to ONNX, but exporter.py fails with an issue "report pytorch".
I tried to set up the environment as described in the README in tools, but I can't find a PyTorch 1.11.0 build for CUDA 11.4 (pytorch.org only has cu113, cu115, cu116).
Thank you very much.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:9: Message type "onnx2trt_onnx.ModelProto" has no field named "version".
Hi, I changed the detection range in x from [0, 69.12] to [-69.12, 69.12]. Everything else is the same as in the repo. But res_.size() is 321945 before nms_cpu() in src/pointpillar.cpp, and inference takes more than 15 min.
data: 000000.bin
pointpillar_7728.pth: downloaded from OpenPCDet
I don't know whether I need to modify other code at the same time or configure something else.
Thanks for sharing the code. Looking forward to your reply.
Hi, how can I switch from FP32 to FP16? Thanks!
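To put the 321945 figure in context, the total number of candidate boxes can be estimated from the quantities in the generated params.h quoted on this page: the feature map is half the grid size in each direction, and there are two anchors per class. The sketch below does that arithmetic; the function name and the default of three classes are assumptions made for illustration.

```python
from typing import Sequence


def num_candidate_boxes(x_range: Sequence[float],
                        y_range: Sequence[float],
                        pillar_size: Sequence[float],
                        num_classes: int = 3,
                        rotations_per_class: int = 2,
                        backbone_stride: int = 2) -> int:
    """Anchors evaluated per frame: feature_x * feature_y * anchors per cell."""
    grid_x = round((x_range[1] - x_range[0]) / pillar_size[0])
    grid_y = round((y_range[1] - y_range[0]) / pillar_size[1])
    feature_x = grid_x // backbone_stride
    feature_y = grid_y // backbone_stride
    return feature_x * feature_y * num_classes * rotations_per_class


original = num_candidate_boxes((0.0, 69.12), (-39.68, 39.68), (0.16, 0.16))
widened = num_candidate_boxes((-69.12, 69.12), (-39.68, 39.68), (0.16, 0.16))
```

Under these assumptions the original range has 321408 candidates and the widened one 642816, so 321945 boxes surviving to NMS is about half of all anchors passing the score threshold. That pattern is more typical of a network fed inconsistent inputs than of real detections.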
GPU has cuda devices: 2
----device id: 0 info----
GPU : NVIDIA GeForce RTX 2080
Capbility: 7.5
Global memory: 7979MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
----device id: 1 info----
GPU : NVIDIA GeForce RTX 2080
Capbility: 7.5
Global memory: 7982MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
Building TRT engine.
trt_infer: [shuffleNode.cpp::symbolicExecute::391] Error Code 4: Internal Error (Reshape_249: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
trt_infer: ModelImporter.cpp:792: While parsing node number 28 [Pad -> "input.67"]:
trt_infer: ModelImporter.cpp:793: --- Begin node ---
trt_infer: ModelImporter.cpp:794: input: "input.55"
input: "onnx::Cast_451"
input: "onnx::Pad_453"
output: "input.67"
name: "Pad_260"
op_type: "Pad"
attribute {
name: "mode"
s: "constant"
type: STRING
}
Guys, please help me. T-T
Thank you for your time!
Hello, I am using
$ cmake -version
cmake3 version 3.17.5
CMake suite maintained and supported by Kitware (kitware.com/cmake).
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/workdir/local/gcc-5.4.0/bin/../libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/work/data/local/gcc-5.4.0 --enable-threads=posix --disable-checking --disable-multilib --enable-languages=c,c++ --with-gmp=/home/work/data/local/gmp4.3.2 --with-mpfr=/home/work/data/local/mpfr-2.4.2 --with-mpc=/home/work/data/local/mpc-0.8.1
Thread model: posix
gcc version 5.4.0 (GCC)
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
And I got the following errors,
cmake .. && make -j$(nproc)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/work/cuda-pointpillars/build
[ 11%] Building NVCC (Device) object CMakeFiles/demo.dir/src/demo_generated_preprocess_kernels.cu.o
[ 22%] Building NVCC (Device) object CMakeFiles/demo.dir/src/demo_generated_pillarScatterKernels.cu.o
/home/work/cuda-pointpillars/include/params.h(24): warning: field initializers are a C++11 feature
/home/work/cuda-pointpillars/include/params.h(24): warning: field initializers are a C++11 feature
/home/work/cuda-pointpillars/src/preprocess_kernels.cu(173): error: initialization with "{...}" is not allowed for object of type "dim3"
/home/work/cuda-pointpillars/src/preprocess_kernels.cu(174): error: initialization with "{...}" is not allowed for object of type "dim3"
/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(97): error: explicit type is missing ("int"assumed)
/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(99): error: argument of type "int" is incompatible with parameter of type "cudaError_t"
/home/work/cuda-pointpillars/src/preprocess_kernels.cu(211): error: expected an expression
/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(119): error: explicit type is missing ("int" assumed)
/home/work/cuda-pointpillars/src/pillarScatterKernels.cu(121): error: argument of type "int" is incompatible with parameter of type "cudaError_t"
4 errors detected in the compilation of "/home/work/cuda-pointpillars/src/pillarScatterKernels.cu".
3 errors detected in the compilation of "/home/work/cuda-pointpillars/src/preprocess_kernels.cu".
CMake Error at demo_generated_preprocess_kernels.cu.o.Release.cmake:280 (message):
Error generating file
/home/work/cuda-pointpillars/build/CMakeFiles/demo.dir/src/./demo_generated_preprocess_kernels.cu.o
CMake Error at demo_generated_pillarScatterKernels.cu.o.Release.cmake:280 (message):
Error generating file
/home/work/cuda-pointpillars/build/CMakeFiles/demo.dir/src/./demo_generated_pillarScatterKernels.cu.o
make[2]: *** [CMakeFiles/demo.dir/src/demo_generated_preprocess_kernels.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/demo.dir/src/demo_generated_pillarScatterKernels.cu.o] Error 1
make[1]: *** [CMakeFiles/demo.dir/all] Error 2
make: *** [all] Error 2
Could you tell me which tool versions are required to compile the project?
Hi, my model input only has x, y, z, without intensity, so the feature shape is 3. By modifying the relevant parameters in exporter.py, I converted the model to ONNX successfully and got params.h.
But I don't know how to modify the code in the .cu file to fit my model.
Looking forward to your reply.
params.h :
#ifndef PARAMS_H_
#define PARAMS_H_
const int MAX_VOXELS = 40000;
class Params
{
public:
static const int num_classes = 1;
const char *class_name [num_classes] = { "Car",};
const float min_x_range = -5.12;
const float max_x_range = 15.36;
const float min_y_range = -5.12;
const float max_y_range = 15.36;
const float min_z_range = -2.0;
const float max_z_range = 2.0;
// the size of a pillar
const float pillar_x_size = 0.16;
const float pillar_y_size = 0.16;
const float pillar_z_size = 4.0;
const int max_num_points_per_pillar = 32;
const int num_point_values = 3;
// the number of feature maps for pillar scatter
const int num_feature_scatter = 64;
const float dir_offset = 0.78539;
const float dir_limit_offset = 0.0;
// the num of direction classes(bins)
const int num_dir_bins = 2;
// anchors decode by (x, y, z, dir)
static const int num_anchors = num_classes * 2;
static const int len_per_anchor = 4;
const float anchors[num_anchors * len_per_anchor] = {
3.9,1.6,1.56,0.0,
3.9,1.6,1.56,1.57,
};
const float anchor_bottom_heights[num_classes] = {-1.78,};
// the score threshold for classification
const float score_thresh = 0.1;
const float nms_thresh = 0.01;
const int max_num_pillars = MAX_VOXELS;
const int pillarPoints_bev = max_num_points_per_pillar * max_num_pillars;
// the detected boxes result decode by (x, y, z, w, l, h, yaw)
const int num_box_values = 7;
// the input size of the 2D backbone network
const int grid_x_size = (max_x_range - min_x_range) / pillar_x_size;
const int grid_y_size = (max_y_range - min_y_range) / pillar_y_size;
const int grid_z_size = (max_z_range - min_z_range) / pillar_z_size;
// the output size of the 2D backbone network
const int feature_x_size = grid_x_size / 2;
const int feature_y_size = grid_y_size / 2;
Params() {};
};
#endif
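As background for what the feature kernel has to produce, the sketch below shows a pillar feature encoding in the style of OpenPCDet's PillarVFE: each point keeps its raw values and gains three offsets to the pillar's mean point and three offsets to the pillar center. That layout is an assumption about the model, not taken from the .cu file. Under it, four point values give 10 features per point and three give 9.

```python
from typing import List, Sequence


def encode_pillar(points: Sequence[Sequence[float]],
                  center: Sequence[float]) -> List[List[float]]:
    """Append mean-offset and center-offset (x, y, z) to every point.

    Each point is (x, y, z) or (x, y, z, intensity); only the first three
    values are used for the offsets. ASSUMED layout: raw values,
    then offsets to the pillar mean, then offsets to the pillar center.
    """
    if not points:
        return []
    n = len(points)
    mean_xyz = [sum(p[i] for p in points) / n for i in range(3)]
    encoded = []
    for p in points:
        to_mean = [p[i] - mean_xyz[i] for i in range(3)]
        to_center = [p[i] - center[i] for i in range(3)]
        encoded.append(list(p) + to_mean + to_center)
    return encoded
```

Any place in the preprocessing code that assumes four values per input point or ten features per encoded point would have to follow the same change; which lines those are is not shown here.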
Hi, NVIDIA AI team. Thanks for your open-source sample code for deploying PointPillars on Xavier.
I have a question about the ONNX export step. How do you ensure that only the middle part of the network is exported (after voxelization and encoding to 10 features per pillar), rather than the whole pipeline, which includes voxelization, pillar feature extraction, scatter to BEV, backbone, and postprocessing?
torch.onnx.export(model, # model being run
(dummy_voxel_features, dummy_voxel_num_points, dummy_coords), # model input (or a tuple for multiple inputs)
"./pointpillar.onnx", # where to save the model (can be a file or file-like object)
export_params=True, # store the trained parameter weights inside the model file
opset_version=11, # the ONNX version to export the model to
do_constant_folding=True, # whether to execute constant folding for optimization
keep_initializers_as_inputs=True,
input_names = ['input', 'voxel_num_points', 'coords'], # the model's input names
output_names = ['cls_preds', 'box_preds', 'dir_cls_preds'], # the model's output names
)
Thanks!
Hi,
I am wondering whether this works in the DRIVE Orin environment.
0 1 Car 0 0 0 0 0 0 0 12.719068 -28.110558 -0.953992 1.454717 1.440981 3.581757 1.694893 0.868306
0 2 Car 0 0 0 0 0 0 0 47.362293 -28.365889 -0.973110 1.458381 1.436911 3.486596 1.667450 0.857365
0 3 Cyclist 0 0 0 0 0 0 0 6.083002 -20.606743 -0.793952 0.521009 1.870310 1.742529 6.440432 0.723123
0 4 Pedestrian 0 0 0 0 0 0 0 38.801476 -24.652239 -0.866866 0.633589 1.681634 0.863217 7.058193 0.640031
1 0 Car 0 0 0 0 0 0 0 12.139722 -28.273060 -0.901822 1.527452 1.430591 3.621138 1.697318 0.906442
1 1 Car 0 0 0 0 0 0 0 46.858337 -28.242884 -0.900239 1.531207 1.431682 3.568392 1.667983 0.901812
1 2 Pedestrian 0 0 0 0 0 0 0 8.465889 -23.032154 -0.846444 0.645208 1.671196 0.881344 6.123196 0.870731
1 3 Pedestrian 0 0 0 0 0 0 0 38.177109 -24.687294 -0.824349 0.597974 1.776301 0.773330 6.553991 0.749789
1 4 Cyclist 0 0 0 0 0 0 0 6.105674 -20.675518 -0.786540 0.537707 1.857258 1.734642 6.339981 0.717096
2 0 Car 0 0 0 0 0 0 0 11.585449 -28.402311 -0.755955 1.502408 1.468927 3.476464 1.708760 0.914412
2 1 Car 0 0 0 0 0 0 0 46.331028 -28.403627 -0.756222 1.510054 1.459696 3.494760 1.701342 0.892783
2 2 Pedestrian 0 0 0 0 0 0 0 2.864436 -24.456514 -0.795387 0.683705 1.716824 0.740641 6.564042 0.670138`
and second result:
`0 0 Pedestrian 0 0 0 0 0 0 0 8.803576 -22.977654 -0.887780 0.696375 1.681500 0.958435 6.070025 0.884506
0 1 Car 0 0 0 0 0 0 0 12.719075 -28.110571 -0.954006 1.454742 1.441006 3.581787 1.694892 0.868305
0 2 Car 0 0 0 0 0 0 0 47.362247 -28.365870 -0.973042 1.458364 1.436905 3.486571 1.667452 0.857371
0 3 Cyclist 0 0 0 0 0 0 0 6.087093 -20.607618 -0.795005 0.520003 1.870543 1.743158 6.437171 0.718292
0 4 Pedestrian 0 0 0 0 0 0 0 38.801697 -24.651608 -0.866716 0.631521 1.683618 0.862992 3.981019 0.638015
1 0 Car 0 0 0 0 0 0 0 12.137152 -28.273653 -0.905428 1.524646 1.433306 3.596805 1.698718 0.907847
1 1 Car 0 0 0 0 0 0 0 46.859871 -28.246393 -0.904772 1.531922 1.435942 3.555500 1.668782 0.903067
2 0 Car 0 0 0 0 0 0 0 11.584668 -28.394022 -0.754803 1.500816 1.474386 3.450029 1.709120 0.922249
2 1 Car 0 0 0 0 0 0 0 46.331684 -28.388617 -0.756081 1.507665 1.464798 3.450339 1.705996 0.903383
2 2 Pedestrian 0 0 0 0 0 0 0 2.865523 -24.451092 -0.794181 0.643265 1.719354 0.729399 7.015512 0.763062
2 3 Pedestrian 0 0 0 0 0 0 0 8.110272 -23.076586 -0.753614 0.546372 1.720677 0.841813 2.003850 0.684011
2 4 Pedestrian 0 0 0 0 0 0 0 37.609577 -24.749792 -0.808255 0.581791 1.683830 0.778918 4.070107 0.650024`
Is this normal?
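To quantify how much two runs differ, the result lines can be parsed and paired by class and box center. The sketch below is only a diagnostic aid and does not explain the cause of the variation. The column layout is an assumption (frame, index, class, seven placeholder zeros, then eight floats with the center first, yaw seventh and score last), and `parse_detection` and `match_runs` are hypothetical helper names. The sample lines are the frame 1 results quoted above.

```python
import math


def parse_detection(line):
    """Parse one result line; the column layout is assumed, not confirmed."""
    parts = line.strip().strip("`").split()
    values = [float(v) for v in parts[-8:]]
    return {
        "frame": int(parts[0]),
        "cls": parts[2],
        "center": tuple(values[0:3]),
        "yaw": values[6],
        "score": values[7],
    }


def match_runs(run_a, run_b, max_center_dist=0.1):
    """Greedily pair detections of the same frame and class with close centers."""
    matched, unmatched_a = [], []
    remaining = list(run_b)
    for det in run_a:
        best = None
        for cand in remaining:
            if cand["frame"] != det["frame"] or cand["cls"] != det["cls"]:
                continue
            dist = math.dist(det["center"], cand["center"])
            if dist <= max_center_dist and (best is None or dist < best[0]):
                best = (dist, cand)
        if best is None:
            unmatched_a.append(det)
        else:
            remaining.remove(best[1])
            # Yaw difference wrapped into [0, pi].
            yaw_diff = abs(math.remainder(det["yaw"] - best[1]["yaw"], 2 * math.pi))
            matched.append((det, best[1], best[0], yaw_diff))
    return matched, unmatched_a, remaining


RUN_A = """
1 0 Car 0 0 0 0 0 0 0 12.139722 -28.273060 -0.901822 1.527452 1.430591 3.621138 1.697318 0.906442
1 1 Car 0 0 0 0 0 0 0 46.858337 -28.242884 -0.900239 1.531207 1.431682 3.568392 1.667983 0.901812
1 2 Pedestrian 0 0 0 0 0 0 0 8.465889 -23.032154 -0.846444 0.645208 1.671196 0.881344 6.123196 0.870731
1 3 Pedestrian 0 0 0 0 0 0 0 38.177109 -24.687294 -0.824349 0.597974 1.776301 0.773330 6.553991 0.749789
1 4 Cyclist 0 0 0 0 0 0 0 6.105674 -20.675518 -0.786540 0.537707 1.857258 1.734642 6.339981 0.717096
"""
RUN_B = """
1 0 Car 0 0 0 0 0 0 0 12.137152 -28.273653 -0.905428 1.524646 1.433306 3.596805 1.698718 0.907847
1 1 Car 0 0 0 0 0 0 0 46.859871 -28.246393 -0.904772 1.531922 1.435942 3.555500 1.668782 0.903067
"""

run_a = [parse_detection(l) for l in RUN_A.splitlines() if l.strip()]
run_b = [parse_detection(l) for l in RUN_B.splitlines() if l.strip()]
matched, only_a, only_b = match_runs(run_a, run_b)
print(len(matched), len(only_a), len(only_b))  # 2 3 0
```

For frame 1, both cars pair up within a few millimetres, while the two pedestrians and the cyclist from the first run have no counterpart in the second. Under the assumed layout, this suggests the differences are in which boxes are reported rather than in the shared boxes themselves.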
After resetting score_thresh=0.5, a huge number of boxes are passed to NMS:
num_obj:241072
numbers of Bndbox need to be nms:241072
Why?
With the same model, the OpenPCDet output looks normal, but the detection results after ONNX deployment are abnormal.
Thanks, and waiting for your reply.
When I run the code repeatedly on the same point cloud, the “Bndbox objs:” value differs between runs. Why?
Hi, in the ./model dir, the files contain only a few lines of text with version/oid/size entries. How can I download the ONNX file? Thanks.
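A small text file holding only version/oid/size lines matches the Git LFS pointer format, which suggests the repository was cloned without fetching the LFS objects. The sketch below checks whether a file is such a pointer; `is_lfs_pointer` is a hypothetical helper name, and the check relies only on the documented pointer layout (a `version https://git-lfs.github.com/spec/v1` line followed by `oid sha256:` and `size` lines).

```python
import sys

LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not real content."""
    with open(path, "rb") as f:
        head = f.read(512)  # pointer files are far smaller than this
    return (
        head.startswith(LFS_HEADER)
        and b"\noid sha256:" in head
        and b"\nsize " in head
    )


if __name__ == "__main__":
    # Usage: python check_lfs.py model/pointpillar.onnx
    for p in sys.argv[1:]:
        label = "LFS pointer" if is_lfs_pointer(p) else "regular file"
        print(f"{p}: {label}")
```

If the file is reported as a pointer, running `git lfs install` and then `git lfs pull` inside the clone normally replaces it with the real model, assuming Git LFS is installed on the machine. A model parser given the pointer file instead would be expected to fail on its first line.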
Hi,
I use
jetson nano 2 gb
Jetpack 4.6
CUDA 10.2
tensorrt 8.0
onnx 1.8
After compiling, I ran the demo and got this error:
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X1
Capbility: 5.3
Global memory: 1979MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
Building TRT engine.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:9: Message type “onnx2trt_onnx.ModelProto” has no field named “version”.
trt_infer: ModelImporter.cpp:682: Failed to parse ONNX model from file: …/…/model/pointpillar.onnx
: failed to parse onnx model file, please check the onnx version and trt support op!
Can you tell me how to solve this error, or which version of ONNX you use?
So NMS runs on the CPU and is not counted in the FPS figure?
Does the total reported time exclude the CPU part?