tsinghua-rll / voxelnet-tensorflow
A 3D object detection system for autonomous driving.
License: MIT License
~/tf_voxelnet$ python3 train.py
Traceback (most recent call last):
File "train.py", line 15, in <module>
import tensorflow as tf
File "/usr/local/lib/python3.5/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 52, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "/usr/local/lib/python3.5/dist-packages/tensorflow/core/framework/graph_pb2.py", line 6, in <module>
from google.protobuf import descriptor as _descriptor
File "/usr/local/lib/python3.5/dist-packages/google/protobuf/__init__.py", line 37, in <module>
__import__('pkg_resources').declare_namespace(__name__)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2927, in <module>
@_call_aside
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2913, in _call_aside
f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2952, in _initialize_master_working_set
add_activation_listener(lambda dist: dist.activate())
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 956, in subscribe
callback(dist)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2952, in <lambda>
add_activation_listener(lambda dist: dist.activate())
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2515, in activate
declare_namespace(pkg)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2097, in declare_namespace
_handle_ns(packageName, path_item)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2047, in _handle_ns
_rebuild_mod_path(path, packageName, module)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2066, in _rebuild_mod_path
orig_path.sort(key=position_in_sys_path)
AttributeError: '_NamespacePath' object has no attribute 'sort'
Hello Jeasine,
I do not understand the +1 additions in box_overlaps.pyx. I looked at the AVOD code, and there they do not add +1. Maybe it is used in Faster R-CNN because they work with pixels.
box_area = (
    (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
    (query_boxes[k, 3] - query_boxes[k, 1] + 1)
)
ih = (
    min(boxes[n, 3], query_boxes[k, 3]) -
    max(boxes[n, 1], query_boxes[k, 1]) + 1
)
ua = float(
    (boxes[n, 2] - boxes[n, 0] + 1) *
    (boxes[n, 3] - boxes[n, 1] + 1) +
    box_area - iw * ih
)
My interpretation of the effect (not certain):
As a result, the current code should return higher IoU values; in other words, the negative/positive thresholds are effectively a little lower than 45/60 (which may be beneficial anyway).
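To make the effect concrete, here is a minimal sketch in plain Python (not the repository's Cython code) that computes an axis-aligned IoU with and without the +1 pixel convention. The function name `iou_axis_aligned` and the `offset` parameter are made up for this illustration.

```python
def iou_axis_aligned(a, b, offset=0.0):
    """IoU of two boxes given as (x1, y1, x2, y2).

    offset=1.0 mimics the pixel convention (width = x2 - x1 + 1);
    offset=0.0 treats coordinates as continuous (e.g. metres).
    """
    iw = min(a[2], b[2]) - max(a[0], b[0]) + offset
    ih = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return iw * ih / (area_a + area_b - iw * ih)


box_a = (0.0, 0.0, 4.0, 2.0)
box_b = (2.0, 0.0, 6.0, 2.0)
iou_plain = iou_axis_aligned(box_a, box_b)              # continuous coordinates
iou_plus1 = iou_axis_aligned(box_a, box_b, offset=1.0)  # pixel convention
```

For these two boxes the continuous IoU is 4/12 and the +1 version is 9/21, so the +1 variant scores the same pair higher. How large the difference is in practice depends on the units of the box coordinates.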
Hi jeasinema, thank you for this great work!
When I run train.py, it stops here and makes no further progress, with no error in the terminal:
train: 20/18700 @ epoch:0/10 loss: 4.318506240844727 reg_loss: 2.653141498565674 cls_loss: 1.6653645038604736 default
GPU utilization drops to 0%, while GPU memory usage stays high at 8527/11172MB on each of the 4 1080Ti cards.
Any help is appreciated. Thanks in advance!
From my understanding, the 3D ground-truth box can be tilted, so the intersection between the ground-truth box and the anchor is usually not rectangular. However, box_overlaps.pyx seems to assume that the intersection of two boxes is a rectangle whose area is iw multiplied by ih?
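For comparison, the exact overlap of two rotated rectangles can be obtained by clipping one polygon against the other. The sketch below uses Sutherland-Hodgman clipping and the shoelace formula in plain Python. It is not code from this repository, and the helper names (`rect_corners`, `polygon_area`, `clip_polygon`, `rotated_iou`) and the (cx, cy, w, h, angle) box layout are assumptions made for illustration.

```python
import math


def rect_corners(cx, cy, w, h, angle):
    """Corners of a rectangle centred at (cx, cy), rotated by angle (radians), counter-clockwise."""
    c, s = math.cos(angle), math.sin(angle)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]


def polygon_area(poly):
    """Shoelace formula; returns 0.0 for an empty polygon."""
    area = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def clip_polygon(subject, clipper):
    """Sutherland-Hodgman: clip `subject` by the convex, counter-clockwise `clipper`."""
    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment p-q with the infinite line through a and b
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        points, output = output, []
        for j in range(len(points)):
            p, q = points[j - 1], points[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
        if not output:
            break
    return output


def rotated_iou(box_a, box_b):
    """IoU of two rotated rectangles given as (cx, cy, w, h, angle)."""
    inter = polygon_area(clip_polygon(rect_corners(*box_a), rect_corners(*box_b)))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


# A 2x2 square against the same square rotated by 45 degrees (octagonal overlap)
iou = rotated_iou((0.0, 0.0, 2.0, 2.0, 0.0), (0.0, 0.0, 2.0, 2.0, math.pi / 4))
```

This is slower than the rectangle approximation, which may be why an axis-aligned overlap is used for anchor matching, but that is a guess about the design rather than something stated in the repository.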
Could you share the environment you are using? The interfaces have changed a lot, so there are many problems when running the code. Thanks a lot.
In the paper, the inference time for VoxelNet is 225 ms on a Titan X GPU and
a 1.7 GHz CPU. When I test this project, the inference time is about 1200 ms on a 1080 GPU.
In your README, the Usage section says:
make sure your working directory looks like this
├── build   <-- Cython build file
├── model   <-- some src files
├── utils   <-- some src files
├── setup.py
├── config.py
├── test.py
├── train.py
├── train_hook.py
├── README.md
└── data    <-- KITTI data directory
    └── object
        ├── training   <-- training data
        |   ├── image_2
        |   ├── label_2
        |   └── velodyne
        └── testing  <--- testing data
            ├── image_2
            ├── label_2
            └── velodyne
But I can't find the "data" folder.
Hello, I have finished training, but when I run python3 test.py, I get an error:
Dataset total length: 0
2018-05-11 17:25:13.173997: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/user/VoxelNet-tensorflow/utils/kitti_loader.py", line 233, in loader_worker_main
self.fill_queue(batch_size)
File "/home/user/VoxelNet-tensorflow/utils/kitti_loader.py", line 162, in fill_queue
voxel[idx * single_batch_size:(idx + 1) * single_batch_size])
File "/home/user/VoxelNet-tensorflow/utils/kitti_loader.py", line 265, in build_input
feature = np.concatenate(feature_list)
ValueError: need at least one array to concatenate
2018-05-11 17:25:13.256266: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-11 17:25:13.256663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: TITAN Xp COLLECTORS EDITION major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 10.23GiB
2018-05-11 17:25:13.256680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-11 17:25:13.434146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-11 17:25:13.434184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-11 17:25:13.434190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-11 17:25:13.434382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 12194 MB memory) -> physical GPU (device: 0, name: TITAN Xp COLLECTORS EDITION, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/user/VoxelNet-tensorflow/model/group_pointcloud.py:75: calling reduce_max (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Reading model parameters from ./save_model/default
test done.
Testing produces no output.
My dataset directory follows the same layout: data/object/training (and testing) with image_2, label_2 and velodyne.
Can you help me solve this problem? Thanks a lot!
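The line "Dataset total length: 0" suggests the loader found no files at all. A quick way to check is to count the files under the layout given in the README. This is a minimal sketch: the function name `count_kitti_files` is hypothetical, the file extensions are the ones KITTI ships with, and which directory and pattern the loader actually reads is not confirmed here.

```python
import glob
import os


def count_kitti_files(root, split):
    """Count files per sub-directory of <root>/object/<split>, following the README layout."""
    patterns = {"image_2": "*.png", "label_2": "*.txt", "velodyne": "*.bin"}
    base = os.path.join(root, "object", split)
    return {
        name: len(glob.glob(os.path.join(base, name, pattern)))
        for name, pattern in patterns.items()
    }


if __name__ == "__main__":
    # Run from the repository root, where the README expects the data directory
    for split in ("training", "testing"):
        print(split, count_kitti_files("data", split))
```

If any count is zero, the path or the working directory is the first thing to check before looking at the model code.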
I want to project Velodyne data to spherical coordinates with utils/preprocess.py. When I ran it, this error occurred:
OSError: [Errno 2] No such file or directory: 'velodyne'
Where is the velodyne directory?
And how can I take Velodyne data and convert its points to spherical coordinates?
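On the conversion itself, here is a minimal sketch of mapping one Cartesian LiDAR point to spherical coordinates, independent of what utils/preprocess.py does. The function name `cartesian_to_spherical` and the sample points are illustrative; the angle conventions (azimuth in the x-y plane, elevation above it) are one common choice and are an assumption here.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (range, azimuth, elevation) for one point; angles are in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)                      # angle in the x-y plane
    elevation = math.asin(z / r) if r > 0 else 0.0  # angle above the x-y plane
    return r, azimuth, elevation


points = [(10.0, 0.0, 0.0), (0.0, 5.0, 0.0), (3.0, 0.0, 4.0)]
spherical = [cartesian_to_spherical(*p) for p in points]
```

Each KITTI velodyne file stores points as float32 (x, y, z, reflectance), so the same function can be applied to the first three values of every point.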
Also mentioned in the closed issue here:
#13
predict ['007029' '006278' '007221' '005063']
/home/tensorflow/lib/python3.5/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce
return umr_minimum(a, axis, None, out, keepdims)
Traceback (most recent call last):
File "train.py", line 142, in <module>
tf.app.run(main)
File "/home/tensorflow/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "train.py", line 118, in main
sess, valid_loader.load(), summary=True)
File "/home/Work/VoxelNet-tensorflow/model/model.py", line 320, in predict_step
batch_gt_boxes3d[0])
File "/home/Work/VoxelNet-tensorflow/utils/utils.py", line 339, in draw_lidar_box3d_on_image
projections = lidar_box3d_to_camera_box(boxes3d, cal_projection=True)
File "/home/Work/VoxelNet-tensorflow/utils/utils.py", line 298, in lidar_box3d_to_camera_box
minx = int(np.min(points[:, 0]))
ValueError: cannot convert float NaN to integer
Hi, @jeasinema
Could you kindly shed more light on "lidar_coord"?
How do I get the coordinates of the lidar? Is it necessary to shift the point cloud by adding "lidar_coord"?
Hey,
So I am trying to train on a single 11GB GPU. In config.py I changed
__C.GPU_AVAILABLE = '3,1,2,0'
to
__C.GPU_AVAILABLE = '0'
Because of the multiprocessing you have implemented, the work is split into 16 processes that use only 58MB each. Although I have plenty more GPU memory available, it does not seem to be used.
Do you know how I can solve this?
Hi my friend, when I run sudo python train.py, some errors occur, like this:
2018-11-20 10:13:15.537986: W tensorflow/core/common_runtime/bfc_allocator.cc:277] __********xxxxxxx****
2018-11-20 10:13:15.538016: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[1,128,12,402,354]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in run_fn
status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,128,12,402,354]
[[Node: gpu_0/MiddleAndRPN/conv1/Conv3D = Conv3D[T=DT_FLOAT, data_format="NDHWC", padding="VALID", strides=[1, 2, 1, 1, 1], device="/job:localhost/replica:0/task:0/device:GPU:0"](gpu_0/MiddleAndRPN/conv1/Pad, MiddleAndRPN/conv1/kernel/read)]]
[[Node: gpu_0/MiddleAndRPN/Sum/_153 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4878_gpu_0/MiddleAndRPN_/Sum", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 135, in
tf.app.run(main)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 109, in main
ret = model.train_step(sess, train_loader.load(), train=True, summary=is_summary)
File "/home/liulei/tensorflow_workplace/VoxelNet-tensorflow/model/model.py", line 197, in train_step
return session.run(output_feed, input_feed)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in do_run
options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,128,12,402,354]
[[Node: gpu_0/MiddleAndRPN/conv1/Conv3D = Conv3D[T=DT_FLOAT, data_format="NDHWC", padding="VALID", strides=[1, 2, 1, 1, 1], device="/job:localhost/replica:0/task:0/device:GPU:0"](gpu_0/MiddleAndRPN/conv1/Pad, MiddleAndRPN/conv1/kernel/read)]]
[[Node: gpu_0/MiddleAndRPN/Sum/_153 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4878_gpu_0/MiddleAndRPN_/Sum", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Caused by op 'gpu_0/MiddleAndRPN_/conv1/Conv3D', defined at:
File "train.py", line 135, in
tf.app.run(main)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 73, in main
avail_gpus=cfg.GPU_AVAILABLE.split(',')
File "/home/liulei/tensorflow_workplace/VoxelNet-tensorflow/model/model.py", line 73, in init
input=feature.outputs, alpha=self.alpha, beta=self.beta, training=is_train)
File "/home/liulei/tensorflow_workplace/VoxelNet-tensorflow/model/rpn.py", line 41, in init
(1, 1, 1), self.input, name='conv1')
File "/home/liulei/tensorflow_workplace/VoxelNet-tensorflow/model/rpn.py", line 149, in ConvMD
pad, Cout, k, strides=s, padding="valid", reuse=tf.AUTO_REUSE, name=scope)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/convolutional.py", line 809, in conv3d
return layer.apply(inputs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 671, in apply
return self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/convolutional.py", line 167, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 835, in call
return self.conv_op(inp, filter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 499, in call
return self.call(inp, filter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_ops.py", line 187, in call
name=self.name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 847, in conv3d
padding=padding, data_format=data_format, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,128,12,402,354]
[[Node: gpu_0/MiddleAndRPN_/conv1/Conv3D = Conv3D[T=DT_FLOAT, data_format="NDHWC", padding="VALID", strides=[1, 2, 1, 1, 1], device="/job:localhost/replica:0/task:0/device:GPU:0"](gpu_0/MiddleAndRPN/conv1/Pad, MiddleAndRPN_/conv1/kernel/read)]]
[[Node: gpu_0/MiddleAndRPN_/Sum/_153 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4878_gpu_0/MiddleAndRPN_/Sum", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
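As a rough sanity check on the OOM message, the size of the single tensor it names can be worked out from its shape. The sketch below assumes 4 bytes per element, which matches the DT_FLOAT type shown in the log; the function name `tensor_bytes` is illustrative.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element for float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


size = tensor_bytes([1, 128, 12, 402, 354])  # shape taken from the error message
size_gib = size / 2 ** 30
```

This gives 874,340,352 bytes, about 0.81 GiB, for this one activation alone. Training also has to hold other activations, gradients and weights, so the total requirement is considerably higher than this figure.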
Hi jeasinema. I got your code and did the following steps:
1. Changed the GPU config: '3,1,2,0' -> '0' # I only have a GTX 1070
2. python3 setup.py build_ext --inplace # I use Python 3.5.2
3. cd utils
python3 preprocess.py
Error: there is no config module, so I copied tf_voxelnet/config.py to tf_voxelnet/utils/. Is that right?
The data is from the KITTI object dataset: data_object_velodyne.zip (about 29G), image_2 (12G), label_2, and voxel (about 25G).
4. python3 train.py
Error: there are no labels for the testing data, so I copied "training" to "testing" in train.py.
Then:
..........
train: 18/60 @ epoch:3/10 loss: 1.9014711380004883 reg_loss: 0.31266674399375916 cls_loss: 1.5888043642044067 default
train ['000004']
--------------------using time: 73.70951771736145s-------------------
train: 19/60 @ epoch:3/10 loss: 1.5401957035064697 reg_loss: 0.23529152572155 cls_loss: 1.3049042224884033 default
train ['000001']
--------------------using time: 77.3743188381195s-------------------
train: 20/60 @ epoch:3/10 loss: 1.8793950080871582 reg_loss: 0.2751219868659973 cls_loss: 1.6042730808258057 default
It stops here, and there is no error in the terminal.
When I use 4 Titan X GPUs to train the model, it uses all 20 CPU threads and 45G of RAM, but only 149M*8 of GPU memory. Why does it use so much CPU?
I found that GPU utilization is 0%, 0%, 0%, 50%. I am training another model at the same time, so I think this run is not using the GPU. What is the reason?
i'm having trouble understanding why the p_map layer doesn't have 'activation=False':
https://github.com/jeasinema/VoxelNet-tensorflow/blob/master/model/rpn.py#L98
it's fed into a sigmoid, so if the activations are ReLU'd to be non-negative, it seems the objectness outputs:
https://github.com/jeasinema/VoxelNet-tensorflow/blob/master/model/rpn.py#L104
will be limited to [.5,1]. maybe even if that's a bit wrong-ish, it doesn't much matter somehow?
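The range argument is easy to check numerically. This is a minimal sketch in plain Python with hand-written `relu` and `sigmoid` helpers, not the repository's TensorFlow graph; the input values are arbitrary.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


logits = [-5.0, -1.0, 0.0, 1.0, 5.0]
with_relu = [sigmoid(relu(v)) for v in logits]   # ReLU applied before the sigmoid
without_relu = [sigmoid(v) for v in logits]      # raw logits fed to the sigmoid
```

With the ReLU in place every output is at least 0.5, because sigmoid(0) = 0.5 is the smallest value a non-negative input can produce. Without it, negative logits map below 0.5 as expected.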
The problem is located in the delta_to_boxes3d() function in utils.py.
Can this code be used to train all three classes (Car, Pedestrian and Cyclist) together? In the code it is for Car only, and can be trained for individual classes.
Thanks in advance.
Can you give some sample code to save the resulting bounding boxes on the image and on the lidar bird's-eye view?
I believe there is one difference between this implementation and the original paper. In the paper, the VFE is not done by extracting the non-zero points; the FCN should be applied to the whole [K,T,7] tensor. If you use map_fn, the network is very likely to run slowly.
Hello, I'm trying to implement a single-stage detector (perception and prediction) inspired by IntentNet and Fast and Furious.
These two approaches use voxelization to create a 3D voxel grid from point cloud data and feed it to a 3D CNN.
My question is: can I use the voxelization process in VoxelNet as inspiration to do the same?
Thank you
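For orientation, the grouping step of voxelization can be sketched in a few lines of plain Python. This is an illustration rather than the repository's preprocessing code: the function name `voxelize` and its parameters are hypothetical, the first T points per voxel are kept instead of a random sample, and no range cropping is done.

```python
from collections import defaultdict


def voxelize(points, voxel_size, origin, max_points_per_voxel):
    """Group (x, y, z, reflectance) points by voxel index.

    Returns a dict mapping (ix, iy, iz) -> list of points, keeping at most
    max_points_per_voxel points per voxel (the T of the VoxelNet paper).
    """
    voxels = defaultdict(list)
    for point in points:
        index = tuple(
            int((point[axis] - origin[axis]) // voxel_size[axis]) for axis in range(3)
        )
        if len(voxels[index]) < max_points_per_voxel:
            voxels[index].append(point)
    return dict(voxels)


# Example values only: 0.2 x 0.2 x 0.4 m voxels and T = 35
points = [(0.1, 0.1, 0.1, 0.5), (0.15, 0.12, 0.3, 0.2), (1.3, 0.1, 0.1, 0.9)]
grid = voxelize(points, voxel_size=(0.2, 0.2, 0.4), origin=(0.0, 0.0, 0.0),
                max_points_per_voxel=35)
```

Only occupied voxels appear in the result, which is what makes the representation sparse. A dense grid for a 3D CNN can then be filled from these indices.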
I notice you use a ReLU layer for the last layer of the RPN, which predicts the class score and regression. However, as far as I know, the last layer of an RPN should not use ReLU. For example, if you want to shrink your anchor, you must have some negative predictions in the regression.
The paper says 'BN and relu for all conv', but I suspect it is not applied, at least for the last map.
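A small numeric illustration of the shrinking argument, assuming the usual log-ratio size encoding delta = log(l_g / l_a). The helper `decode_length` and the lengths below are example values, not taken from the repository.

```python
import math


def decode_length(anchor_length, delta):
    """Invert delta = log(l_g / l_a) to recover the predicted length."""
    return anchor_length * math.exp(delta)


anchor_length = 3.9                               # example anchor length in metres
target_length = 3.5                               # ground-truth box shorter than the anchor
delta = math.log(target_length / anchor_length)   # negative residual
clamped = max(0.0, delta)                         # what a final ReLU would output

decoded_free = decode_length(anchor_length, delta)    # recovers 3.5
decoded_relu = decode_length(anchor_length, clamped)  # stuck at the anchor length
```

With the residual clamped at zero, the decoded length can never fall below the anchor length, which is the concern raised above.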
Is there any way to visualize the Velodyne point cloud and this project's detection/tracking markers in the ROS RViz visualizer?
And is there any way to run this project with a ROS bagfile such as this link?
Hi, thanks for your hard work on this project. Is it possible to train with all the classes at once? If not, could you give me some hints so that I can figure out my own method?
I use only 6 frames of data (30M). When I train the model, it prints "std::alloc" and the 16G of memory plus 16G of swap are exhausted. I don't know why it uses so much memory. What should I do? If I change to "queue_size=6, use_multi_process_num=3", it can train, but very slowly.
My computer config:
GTX 1070 8G
i7
RAM 16G
Hi! I'm training the model but with no good results yet. Do you plan to share the trained model? Thanks!!
Can you provide it for me? Thanks!
Hi, Thank you for this great work!
I am wondering which TensorFlow and cuDNN versions you used for this implementation, because when I ran train.py with TensorFlow 1.4, cuDNN 7.0.3 and CUDA 8.0, I got this error:
Loaded runtime CuDNN library: 7003 (compatibility version 7000) but source was compiled with 6021 (compatibility version 6000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
In data_aug.py, for rotation and scaling:
angle = np.random.uniform(-np.pi / 4, np.pi / 4)
lidar[:, 0:3] = point_transform(lidar[:, 0:3], 0, 0, 0, rz=angle)
gt_box3d = box_transform(gt_box3d, 0, 0, 0, -angle, 'camera')
newtag = 'aug_{}_2_{:.4f}'.format(tag, angle).replace('.', '_')
Above is the code for rotation.
It calls box_transform to apply the rotation to gt_box3d.
At this step, gt_box3d is in 'camera' coordinates.
In the box_transform function, boxes_corner is calculated using the center_to_corner_box3d function,
and the elements of boxes_corner are still in 'camera' coordinates.
However, since point_transform is written under the assumption that the origin of the coordinate system is the location of the lidar, boxes_corner should be in 'lidar' coordinates at this step, whether it rotates around rz or ry.
I think you already considered this in the first data augmentation option: it first transforms to a lidar-based box, calculates everything needed for the transform, and finally converts back to 'camera' coordinates using lidar_to_camera_box.
But for the second and third data augmentation options, you did not account for the different origins of the 'camera' and 'lidar' coordinate systems.
So I suggest updating the code for the second data augmentation (rotation) as:
How did you come up with the matrices for camera and lidar coordinate conversion in config.py?
(__C.MATRIX_P2, __C.MATRIX_T_VELO_2_CAM, __C.MATRIX_R_RECT_0 )
The thing is, I want to try VoxelNet with another lidar sensor, but I don't know how to adjust these matrices.
Can you add an appropriate license to this repo?
Hey Jeasinema!
I'm getting the following error when running on a system with a 2GB NVIDIA GeForce 940M card with compute capability 5.0:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,10,400,352,128]
Can you please tell me whether I can run training with my current system configuration? If so, please give me suitable input arguments for train.py to avoid excessive GPU memory usage.
Hello,
I am quite stuck with the text files containing labels. From KITTI I see that there are tracklets stored as XML files. Do I need to parse those into some particular format? Any help is appreciated. Thanks in advance.
The following line (line 230) in utils.py
x, y, z = np.sum(roi, axis=0) / 8
should be replaced with the following:
x = np.sum(roi[:, 0], axis=0)/ 8
y = np.sum(roi[0:4, 1], axis=0)/ 4
z = np.sum(roi[:, 2], axis=0)/ 8
This makes sure that x, y, z is the location of the bottom center of the bounding box.
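The difference between the two versions can be seen on a toy box. This sketch uses plain Python lists instead of NumPy and assumes, as the suggestion above does, that the first four corners form the bottom face. The function name `bottom_center` and the corner values are made up for the example.

```python
def bottom_center(corners):
    """Bottom-centre of a box from its 8 corners, each (x, y, z).

    Assumes corners[0:4] form the bottom face, as in the suggestion above.
    """
    x = sum(c[0] for c in corners) / 8
    y = sum(c[1] for c in corners[:4]) / 4   # bottom face only
    z = sum(c[2] for c in corners) / 8
    return x, y, z


# Box spanning x in [0, 2] and z in [0, 4]; bottom face at y = 1, top face at y = 0
corners = [(0, 1, 0), (2, 1, 0), (2, 1, 4), (0, 1, 4),
           (0, 0, 0), (2, 0, 0), (2, 0, 4), (0, 0, 4)]
center = bottom_center(corners)                      # (1.0, 1.0, 2.0)
mean_y_all = sum(c[1] for c in corners) / 8          # 0.5, the geometric centre in y
```

Averaging y over all eight corners gives the geometric centre (0.5 here), while averaging over the bottom face gives 1.0, which matches the KITTI convention of storing the bottom-centre of the box in camera coordinates.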
Hi @jeasinema, I am working on another project where I have similar oriented bounding boxes. Could you please tell me how you perform non-maximum suppression on oriented bounding boxes?
Best Regards
Traceback (most recent call last):
File "train.py", line 21, in
from train_hook import check_if_should_pause
ImportError: No module named 'train_hook'
Could not find train_hook anywhere else
Hello,
Since you used the KITTI dataset for LiDAR data, can you tell me about the labels given in the dataset? Are the labels only for objects visible in the image? I went through one implementation of LiDAR-based detection where the data is cropped on the basis of the image data. The reason given by the author is that labels are only available for objects visible in the image. But VoxelNet is based only on LiDAR, so how did you train your model using the KITTI dataset? Did you crop the LiDAR data based on the image?
qianguih/voxelnet#9
Thanks,
Sakshi
Hey,
I am trying to train your neural network on the KITTI dataset, but I have run into the error "Bus error: 10", which is associated with the code trying to read an invalid memory location, and I cannot figure out why.
I got the error initially because I did not have a "testing" folder set up while trying to train and I would get the error immediately.
But after doing so and populating it with some data, it started to run but this error gets thrown at random times within the first 7 steps.
Does the training try to read anything else specific which I may be missing?
Also, I only have a single GPU so I lowered the number in the config, which is why it is only training on 1 file.
Thank you very much
How should the case where there is no object in the scene be handled during training?
What should the default regression values be for reference?
I think there is something wrong with the logic of preprocess.py.
By original logic,
# [K, T, 7] feature buffer as described in the paper
feature_buffer = np.zeros(shape=(K, T, 7), dtype=np.float32)
# build a reverse index for coordinate buffer
index_buffer = {}
for i in range(K):
    index_buffer[tuple(coordinate_buffer[i])] = i
for voxel, point in zip(voxel_index, point_cloud):
    index = index_buffer[tuple(voxel)]
    number = number_buffer[index]
    if number < T:
        feature_buffer[index, number, :4] = point
        number_buffer[index] += 1
feature_buffer[:, :, -3:] = feature_buffer[:, :, :3] - \
    feature_buffer[:, :, :3].sum(axis=1, keepdims=True)/number_buffer.reshape(K, 1, 1)
Since a voxel usually contains fewer than T points, the unused (padded) entries will be [0, 0, 0, 0, -vx, -vy, -vz] if implemented as above, where (vx, vy, vz) is the voxel centroid. I don't know whether this implementation is reasonable, because the original paper does not seem to give instructions on how to handle this case.
I think a reasonable result should be [vx, vy, vz, 0, 0, 0, 0]. Do you have any ideas?
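One possible convention, shown here only as a sketch, is to compute the centroid offsets for real points and leave the padded slots entirely zero. This is an assumption about what might be reasonable, not what the paper or the repository prescribes, and it differs from both variants discussed above. The function name `voxel_features` is hypothetical, and plain Python lists are used instead of NumPy.

```python
def voxel_features(points, max_points):
    """Build a [T, 7] feature block for one non-empty voxel.

    Each real point becomes [x, y, z, r, x - vx, y - vy, z - vz], where
    (vx, vy, vz) is the centroid of the real points. Padded slots stay all zero.
    """
    points = points[:max_points]
    n = len(points)
    centroid = [sum(p[axis] for p in points) / n for axis in range(3)]
    rows = [list(p) + [p[axis] - centroid[axis] for axis in range(3)] for p in points]
    rows += [[0.0] * 7 for _ in range(max_points - n)]
    return rows


# Two real points in a voxel with T = 4, so two padded rows
block = voxel_features([(1.0, 2.0, 3.0, 0.5), (3.0, 2.0, 1.0, 0.1)], max_points=4)
```

In the NumPy version quoted above, the same effect could be obtained by masking the offset columns with the per-voxel point count, but that change is not something the source confirms.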
when I run test.py:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value MiddleAndRPN_/conv7/kernel
I have followed the instructions in README.md, but when I tried to train the module, something went wrong.
Traceback (most recent call last):
File "train.py", line 19, in <module>
from model import RPN3D
File "/home/xxx/VoxelNet-tensorflow/model/__init__.py", line 12, in <module>
from model.model import *
File "/home/xxx/VoxelNet-tensorflow/model/model.py", line 17, in <module>
from utils import *
File "/home/xxx/VoxelNet-tensorflow/utils/__init__.py", line 10, in <module>
from utils.box_overlaps import *
ImportError: /home/xxx/VoxelNet-tensorflow/utils/box_overlaps.so: undefined symbol: _Py_ZeroStruct
I thought it may be due to cython compiling error. But it seems the compilation is successful.
When I ran the setup.py again, I got
running build_ext
Is there any other way to compile box_overlaps.cpp?
Can you help me solve this problem? Thanks so much!
When profiling the runtime, I noticed that tiling the mask, mask = tf.tile(mask, [1, 1, 2 * self.units]),
takes a long time. I was wondering why this mask is applied; don't empty voxels already get filtered out when selecting the point cloud? I would love to know if there is any reason behind this code...
Hi @jeasinema,
I have noticed that after type 1 augmentation, adding delta Z puts some of the boxes below the ground plane and some above it. Is it OK to provide training data like this?
Could you please comment on this?
This concerns the function cal_rpn_target in utils.py.
When you calculate targets[...], anchors_d is used, but according to equation (1) in the original paper it should be h_a, not d_a.
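For reference, the residual encoding of equation (1) in the VoxelNet paper can be written out as below. The function name `encode_residual` and the (x, y, z, l, w, h, theta) tuple layout are assumptions for this sketch, and the anchor values are examples only.

```python
import math


def encode_residual(gt, anchor):
    """Residual between a ground-truth box and an anchor, each (x, y, z, l, w, h, theta)."""
    xg, yg, zg, lg, wg, hg, tg = gt
    xa, ya, za, la, wa, ha, ta = anchor
    da = math.sqrt(la ** 2 + wa ** 2)   # diagonal of the anchor base
    return (
        (xg - xa) / da,
        (yg - ya) / da,
        (zg - za) / ha,                 # z is normalised by the anchor height h_a
        math.log(lg / la),
        math.log(wg / wa),
        math.log(hg / ha),
        tg - ta,
    )


anchor = (0.0, 0.0, -1.0, 3.9, 1.6, 1.56, 0.0)
gt = (1.0, 0.5, -0.22, 3.9, 1.6, 1.56, 0.1)
residual = encode_residual(gt, anchor)
```

Here the z residual is 0.78 / 1.56 = 0.5. Dividing by d_a instead would give a different scale for the same offset, which is the discrepancy pointed out above.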
What hardware did you use?
Also, with my puny card (GTX 970) I get "ResourceExhaustedError". What would be an easy way to scale down the memory requirements of the model?
with tf.variable_scope('MiddleAndRPN_' + name):
    # convolutional middle layers
    temp_conv = ConvMD(3, 64, 64, 3, (2, 1, 1), (1, 1, 1), temp_conv, name='conv3')
    temp_conv = tf.transpose(temp_conv, perm=[0, 2, 3, 4, 1])
    temp_conv = tf.reshape(temp_conv, [-1, cfg.INPUT_HEIGHT, cfg.INPUT_WIDTH, 128])
Hi,
Thanks for the amazing work on the implementation. I am wondering if it is possible to share some performance numbers of the trained model? Thanks!
Thanks for some amazing work in getting this started.
Any suggestions for a run that dies after step 10 on a single GPU?
Traceback (most recent call last):
File "/home/b.weinstein/voxelnet/train.py", line 202, in <module>
tf.app.run(main)
File "/home/b.weinstein/miniconda3/envs/voxelnet/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/b.weinstein/voxelnet/train.py", line 131, in main
batch = sample_test_data(val_dir, args.single_batch_size * cfg.GPU_USE_COUNT, multi_gpu_sum=cfg.GPU_USE_COUNT)
File "/home/b.weinstein/voxelnet/utils/kitti_loader.py", line 140, in sample_test_data
_, per_vox_feature, per_vox_number, per_vox_coordinate = build_input(voxel[idx * single_batch_size:(idx + 1) * single_batch_size])
File "/home/b.weinstein/voxelnet/utils/kitti_loader.py", line 172, in build_input
feature = np.concatenate(feature_list)
ValueError: need at least one array to concatenate
'Default' (orange) is a CPU run (still going)
The date (blue) was a Tesla K80 GPU run.
Because the CPU is slower, I don't know yet whether it dies; I'll update if it makes it to step 15.
I'm going to start playing around with the batch size. The error suggests that it's picking up an empty array?
This may be similar to #21 and #11. Did anyone have success, or a suspicion about what might have caused this?