det3d's People

Contributors

a157801, cslxiao, idiomaticrefactoring, jhultman, nuri-benbarka, poodarchu, qchenclaire, s-ryosky, siyeong-lee, tianweiy, tyagi-iiitv, xmyqsh

det3d's Issues

Trying to train CBGS: all values are NaN

Kindly help: all values are NaN.

2020-01-07 17:22:53,040 - INFO - task : ['car'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 26.6600, num_neg: 31688.8400
2020-01-07 17:22:53,040 - INFO - task : ['truck', 'construction_vehicle'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 40.4800, num_neg: 63408.1400
2020-01-07 17:22:53,040 - INFO - task : ['bus', 'trailer'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 58.1800, num_neg: 63362.3000
2020-01-07 17:22:53,040 - INFO - task : ['barrier'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 7.8600, num_neg: 31742.0200
2020-01-07 17:22:53,040 - INFO - task : ['motorcycle', 'bicycle'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 11.8800, num_neg: 63486.6800
2020-01-07 17:22:53,040 - INFO - task : ['pedestrian', 'traffic_cone'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 13.6200, num_neg: 63489.2200

Unable to reproduce CBGS's results on NuScenes

Compared to the current master branch, I made two changes in order to fix the NaN training loss.

The first change is described in #46 .

The second change is to add what's below before line 193 in losses.py

# FIX NaN TARGETS 
target_tensor = torch.where(
    torch.isnan(target_tensor), prediction_tensor, target_tensor
)
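
The patch above works because a NaN target that is replaced by the prediction yields a zero residual, so that element adds nothing to a residual-based regression loss. The following is a plain-Python sketch of the same idea, not the repository code; the function name and the values are made up for illustration.

```python
import math

def mask_nan_targets(predictions, targets):
    """Replace each NaN target with the matching prediction."""
    return [p if math.isnan(t) else t for p, t in zip(predictions, targets)]

# Hypothetical values; the NaN mimics a missing ground-truth entry.
predictions = [0.5, -1.0, 2.0]
targets = [0.7, float("nan"), 1.5]

patched = mask_nan_targets(predictions, targets)
residuals = [p - t for p, t in zip(predictions, patched)]
print(residuals)  # the masked element has a residual of exactly 0.0
```

Note that this hides NaN targets rather than fixing their source, so the affected elements simply stop contributing to training.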

Besides, I set

norm_cfg = dict(type='SyncBN', eps=1e-3, momentum=0.01)

in examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py and

torch.backends.cudnn.benchmark = True

in tools/train.py.

Here are my results on the validation set after training 20 epochs:

car Nusc dist AP@0.5, 1.0, 2.0, 4.0
59.25, 71.87, 77.22, 79.63 mean AP: 0.7199402759604012
truck Nusc dist AP@0.5, 1.0, 2.0, 4.0
17.96, 35.01, 43.00, 47.15 mean AP: 0.357782470829584
construction_vehicle Nusc dist AP@0.5, 1.0, 2.0, 4.0
0.00, 1.28, 6.75, 13.37 mean AP: 0.05348830261303094
bus Nusc dist AP@0.5, 1.0, 2.0, 4.0
23.87, 48.49, 62.98, 66.32 mean AP: 0.5041451034213309
trailer Nusc dist AP@0.5, 1.0, 2.0, 4.0
1.94, 14.27, 30.88, 42.11 mean AP: 0.22300031478924093
barrier Nusc dist AP@0.5, 1.0, 2.0, 4.0
28.06, 48.97, 57.80, 60.27 mean AP: 0.4877375663669212
motorcycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
24.97, 29.29, 30.38, 30.99 mean AP: 0.28906646690838084
bicycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
6.20, 7.36, 7.98, 8.53 mean AP: 0.07516058303100348
pedestrian Nusc dist AP@0.5, 1.0, 2.0, 4.0
62.82, 64.73, 66.83, 69.03 mean AP: 0.658543997130018
traffic_cone Nusc dist AP@0.5, 1.0, 2.0, 4.0
42.10, 44.31, 46.23, 50.65 mean AP: 0.4582346501948114

Overall the mean AP is 38.2, which is much lower than what's reported.
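
As a quick check of the overall figure, the per-class mean APs listed above can be averaged directly. This is only the arithmetic, not the evaluation code; it comes out at about 0.383, in line with the roughly 38 mAP quoted.

```python
# Per-class mean AP values copied from the validation results above.
per_class_ap = {
    "car": 0.7199402759604012,
    "truck": 0.357782470829584,
    "construction_vehicle": 0.05348830261303094,
    "bus": 0.5041451034213309,
    "trailer": 0.22300031478924093,
    "barrier": 0.4877375663669212,
    "motorcycle": 0.28906646690838084,
    "bicycle": 0.07516058303100348,
    "pedestrian": 0.658543997130018,
    "traffic_cone": 0.4582346501948114,
}

# Unweighted mean over the ten classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mean AP over {len(per_class_ap)} classes: {mean_ap:.4f}")
```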

Can someone point me to what I might have missed? Thanks!

Circular dependency causes ImportError

Thanks so much for sharing your codebase. It really helps accelerate research.

I noticed there is a circular dependency between det3d/core/__init__.py and det3d/datasets/kitti/kitti.py (they import each other).

To replicate:

python -c 'import det3d.core'
>> 
  Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/det3d/det3d/core/__init__.py", line 2, in <module>
    from .evaluation import *
  File "/root/det3d/det3d/core/evaluation/__init__.py", line 10, in <module>
    from .eval_hooks import KittiDistEvalmAPHook, KittiEvalmAPHookV2
  File "/root/det3d/det3d/core/evaluation/eval_hooks.py", line 8, in <module>
    from det3d import datasets, torchie
  File "/root/det3d/det3d/datasets/__init__.py", line 4, in <module>
    from .kitti import KittiDataset
  File "/root/det3d/det3d/datasets/kitti/__init__.py", line 1, in <module>
    from .kitti import KittiDataset
  File "/root/det3d/det3d/datasets/kitti/kitti.py", line 7, in <module>
    from det3d.core import box_np_ops
ImportError: cannot import name 'box_np_ops'

A workaround is to make sure to import det3d.datasets before importing det3d.core. A preferable solution would be to remove the circular dependency.
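
One common way to remove such a cycle is to defer one of the imports until it is actually needed. The sketch below reproduces the failure and the deferred-import fix with two throwaway modules written to a temporary directory. The module names `demo_core` and `demo_datasets` are hypothetical stand-ins, not the real det3d packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for det3d/core/__init__.py: imports the dataset before defining box_np_ops.
CORE_SRC = '''
from demo_datasets import KittiDataset
box_np_ops = "box ops"
'''
# Module-level import back into the core module: this closes the cycle.
DATASETS_EAGER = '''
from demo_core import box_np_ops
class KittiDataset:
    pass
'''
# Same class, but the import is deferred until the method is called.
DATASETS_LAZY = '''
class KittiDataset:
    def ops(self):
        from demo_core import box_np_ops
        return box_np_ops
'''

def try_import(datasets_src):
    """Write both modules to a temp dir, import demo_core, return the error text or None."""
    names = ("demo_core", "demo_datasets")
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_core.py").write_text(CORE_SRC)
        Path(tmp, "demo_datasets.py").write_text(datasets_src)
        sys.path.insert(0, tmp)
        try:
            for name in names:
                sys.modules.pop(name, None)
            importlib.invalidate_caches()
            importlib.import_module("demo_core")
            return None
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(tmp)
            for name in names:
                sys.modules.pop(name, None)

eager_error = try_import(DATASETS_EAGER)
lazy_error = try_import(DATASETS_LAZY)
print("eager:", eager_error)
print("lazy:", lazy_error)
```

The eager variant fails in the same way as the traceback above, while the deferred variant imports cleanly. Moving shared helpers into a module that neither package imports at the top level would be an alternative.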

Size of CBGS network

I have read through your paper and I'd like to know more about the network structure and the number of parameters in each module. I searched the code but haven't found anything.
Could you indicate the size of the 3D feature extractor, RPN, and multi-group head in terms of layers and number of neurons per layer? Or where can I find this in the repo?
Thank you

ModuleNotFoundError: No module named 'det3d.ops.nms.nms'

Running python setup.py develop fails.

After making additions in setup.py, I got the error below.

det3d/ops/nms/nms_kernel.cu.cc:48:61: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
det3d/ops/nms/nms_kernel.cu.cc:50:61: error: there are no arguments to ‘min’ that depend on a template parameter, so a declaration of ‘min’ must be available [-fpermissive]
min(n_boxes - col_start * BLOCK_THREADS, BLOCK_THREADS);
^
det3d/ops/nms/nms_kernel.cu.cc:53:7: error: ‘threadIdx’ was not declared in this scope
if (threadIdx.x < col_size)
^
det3d/ops/nms/nms_kernel.cu.cc:62:17: error: there are no arguments to ‘__syncthreads’ that depend on a template parameter, so a declaration of ‘__syncthreads’ must be available [-fpermissive]
__syncthreads();
^
det3d/ops/nms/nms_kernel.cu.cc:64:7: error: ‘threadIdx’ was not declared in this scope
if (threadIdx.x < row_size)
^
det3d/ops/nms/nms_kernel.cu.cc: In function ‘int _nms_gpu(int*, const DType*, int, int, DType, int)’:
det3d/ops/nms/nms_kernel.cu.cc:123:37: error: expected primary-expression before ‘<’ token
nms_kernel<DType, BLOCK_THREADS><<<blocks, threads>>>(boxes_num,
^
det3d/ops/nms/nms_kernel.cu.cc:123:55: error: expected primary-expression before ‘>’ token
nms_kernel<DType, BLOCK_THREADS><<<blocks, threads>>>(boxes_num,
^
det3d/ops/nms/nms_kernel.cu.cc: In instantiation of ‘int _nms_gpu(int*, const DType*, int, int, DType, int) [with DType = float; int BLOCK_THREADS = 64]’:
det3d/ops/nms/nms_kernel.cu.cc:162:65: required from here
det3d/ops/nms/nms_kernel.cu.cc:123:66: warning: left operand of comma operator has no effect [-Wunused-value]
nms_kernel<DType, BLOCK_THREADS><<<blocks, threads>>>(boxes_num,
^
det3d/ops/nms/nms_kernel.cu.cc:124:53: warning: right operand of comma operator has no effect [-Wunused-value]
nms_overlap_thresh,
^
det3d/ops/nms/nms_kernel.cu.cc:125:44: warning: right operand of comma operator has no effect [-Wunused-value]
boxes_dev,
^
det3d/ops/nms/nms_kernel.cu.cc: In instantiation of ‘int _nms_gpu(int*, const DType*, int, int, DType, int) [with DType = double; int BLOCK_THREADS = 64]’:
det3d/ops/nms/nms_kernel.cu.cc:165:66: required from here
det3d/ops/nms/nms_kernel.cu.cc:123:66: warning: left operand of comma operator has no effect [-Wunused-value]
nms_kernel<DType, BLOCK_THREADS><<<blocks, threads>>>(boxes_num,
^
det3d/ops/nms/nms_kernel.cu.cc:124:53: warning: right operand of comma operator has no effect [-Wunused-value]
nms_overlap_thresh,
^
det3d/ops/nms/nms_kernel.cu.cc:125:44: warning: right operand of comma operator has no effect [-Wunused-value]

boxes_dev,
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
@chowkamlee81

ModuleNotFoundError: No module named 'det3d.ops.nms.nms'

When I try to run python create_data.py, it gives the error below.

Kindly help.

File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/ops/nms/nms_gpu.py", line 10, in <module>
from det3d.ops.nms.nms import non_max_suppression
ModuleNotFoundError: No module named 'det3d.ops.nms.nms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tools/create_data.py", line 7, in <module>
from det3d.datasets.kitti import kitti_common as kitti_ds
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/datasets/__init__.py", line 4, in <module>
from .kitti import KittiDataset
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/datasets/kitti/__init__.py", line 1, in <module>
from .kitti import KittiDataset
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/datasets/kitti/kitti.py", line 7, in <module>
from det3d.core import box_np_ops
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/core/__init__.py", line 4, in <module>
from .anchor import *
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/core/anchor/__init__.py", line 1, in <module>
from .anchor_generator import (
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/core/anchor/anchor_generator.py", line 2, in <module>
from det3d.core.bbox import box_np_ops
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/core/bbox/__init__.py", line 42, in <module>
from . import box_coders, box_np_ops, box_torch_ops, geometry, region_similarity
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/core/bbox/box_coders.py", line 5, in <module>
from . import box_np_ops, box_torch_ops
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/core/bbox/box_torch_ops.py", line 6, in <module>
from det3d.ops.nms.nms_cpu import rotate_nms_cc
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/ops/nms/__init__.py", line 1, in <module>
from det3d.ops.nms.nms_cpu import nms_jit, soft_nms_jit
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/ops/nms/nms_cpu.py", line 7, in <module>
from det3d.ops.nms.nms_gpu import rotate_iou_gpu
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/ops/nms/nms_gpu.py", line 17, in <module>
cuda=True,
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/utils/buildtools/pybind11_build.py", line 109, in load_pb11
cmds.append(Nvcc(s, out(s), arch))
File "/home/ubuntu/Nuscenes_Top/Det3D-master/det3d/utils/buildtools/command.py", line 128, in __init__
raise ValueError("you must specify arch if use cuda.")
ValueError: you must specify arch if use cuda.

No module named 'det3d.torchie.cnn.alexnet', resnet and vgg

First of all, thank you for your work!

When I was creating the dataset, the following error occurred:

python tools/create_data.py nuscenes_data_prep --root_path=/media/hz3014/DataLinux/v1.0-trainval_blobs --version="v1.0-trainval" --nsweeps=10
Traceback (most recent call last):
File "tools/create_data.py", line 7, in <module>
from det3d.datasets.kitti import kitti_common as kitti_ds
File "/home/hz3014/Det3D/det3d/datasets/__init__.py", line 1, in <module>
from .builder import build_dataset
File "/home/hz3014/Det3D/det3d/datasets/builder.py", line 3, in <module>
from det3d.utils import build_from_cfg
File "/home/hz3014/Det3D/det3d/utils/__init__.py", line 2, in <module>
from .registry import Registry, build_from_cfg
File "/home/hz3014/Det3D/det3d/utils/registry.py", line 3, in <module>
from det3d import torchie
File "/home/hz3014/Det3D/det3d/torchie/__init__.py", line 2, in <module>
from .cnn import *
File "/home/hz3014/Det3D/det3d/torchie/cnn/__init__.py", line 1, in <module>
from .alexnet import AlexNet
ModuleNotFoundError: No module named 'det3d.torchie.cnn.alexnet'

It seems like in det3d/torchie/cnn, there is no related module provided.

DDP consumes much more GPU memory

@poodarchu @a157801 thanks for this wonderful code base.
When training PointPillars on one GPU (3 samples per GPU), it consumes 8417 MB of GPU memory.
However, it consumes 13849/13237 MB when trained on 2 GPUs in one machine with DDP, still with 3 samples per GPU.
Is this normal?

About mghead loss compute question

Besides CBGS, I am trying to train the original PointPillars on nuScenes with this repo.
I found a problem in the loss computation that leads to a gradient explosion.
Here is the Head1 box_conv weight in the first epoch:

 box conv weight: Parameter containing:
tensor([[[[-0.0235]],

         [[-0.0223]],

         [[ 0.0100]],

         ...,

         [[ 0.0126]],

         [[-0.0176]],

         [[ 0.0154]]],


        [[[-0.0487]],

         [[ 0.0367]],

         [[ 0.0096]],

         ...,

         [[ 0.0182]],

         [[ 0.0200]],

         [[-0.0325]]],


        [[[ 0.0089]],

         [[-0.0121]],

         [[-0.0017]],

         ...,

         [[-0.0492]],

         [[-0.0505]],

         [[-0.0137]]],


        ...,


        [[[-0.0302]],

         [[-0.0257]],

         [[-0.0246]],

         ...,

         [[ 0.0090]],

         [[-0.0497]],

         [[ 0.0128]]],


        [[[ 0.0449]],

         [[ 0.0291]],

         [[ 0.0460]],

         ...,

         [[ 0.0024]],

         [[-0.0081]],

         [[-0.0162]]],


        [[[ 0.0178]],

         [[-0.0133]],

         [[ 0.0189]],

         ...,

         [[ 0.0100]],

         [[-0.0445]],

         [[-0.0162]]]], device='cuda:0', requires_grad=True)

here is the loss output (only compute head1 loss):

OrderedDict([('loss', [203.5531005859375]), ('cls_pos_loss', [0.04986190423369408]), ('cls_neg_loss', [201.2117919921875]), ('dir_loss_reduced', [0.6615481376647949]), ('cls_loss_reduced', [201.26165771484375]), ('loc_loss_reduced', [2.1591315269470215]), ('loc_loss_elem', [[0.05492932349443436, 0.041640881448984146, 0.67469322681427, 0.035490743815898895, 0.05674883723258972, 0.05906621366739273, 0.0, 0.0, 0.15699654817581177]]), ('num_pos', [86]), ('num_neg', [126794])])

In the second epoch, the head1 conv_box weight has changed and contains some NaN values:

 box conv weight: Parameter containing:
tensor([[[[-0.0235]],

         [[-0.0223]],

         [[ 0.0100]],

         ...,

         [[ 0.0126]],

         [[-0.0176]],

         [[ 0.0154]]],


        [[[-0.0487]],

         [[ 0.0367]],

         [[ 0.0096]],

         ...,

         [[ 0.0182]],

         [[ 0.0200]],

         [[-0.0325]]],


        [[[ 0.0089]],

         [[-0.0121]],

         [[-0.0017]],

         ...,

         [[-0.0492]],

         [[-0.0505]],

         [[-0.0137]]],


        ...,


        [[[    nan]],

         [[    nan]],

         [[    nan]],

         ...,

         [[    nan]],

         [[    nan]],

         [[    nan]]],


        [[[    nan]],

         [[    nan]],

         [[    nan]],

         ...,

         [[    nan]],

         [[    nan]],

         [[    nan]]],


        [[[ 0.0178]],

         [[-0.0133]],

         [[ 0.0189]],

         ...,

         [[ 0.0100]],

         [[-0.0445]],

         [[-0.0162]]]], device='cuda:0', requires_grad=True)

So the last layer's weights contain NaN values, and back-propagation then turns the other layers to NaN as well. Gradient clipping is set to:

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

As another experiment, I fixed the loss to a constant value (300). With that, no layer weight contains NaN and the loss values are normal, which suggests the problem lies in the loss computation rather than in the network layers.
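
One reason a setting like `grad_clip=dict(max_norm=35, norm_type=2)` may not help here is that norm-based clipping rescales gradients by their global norm, and a NaN gradient makes that norm NaN as well. The following is a plain-Python sketch of the L2 clipping rule, not the Det3D or PyTorch implementation; the function name is hypothetical.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale gradients so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = max_norm / (total_norm + 1e-6)
    # Comparisons with NaN are False, so NaN gradients pass through unchanged.
    if scale < 1.0:
        grads = [g * scale for g in grads]
    return grads, total_norm

# Finite gradients with norm 50 are scaled down to norm 35.
clipped, norm_before = clip_by_global_norm([30.0, 40.0], max_norm=35.0)
norm_after = math.sqrt(sum(g * g for g in clipped))

# A single NaN gradient survives clipping.
nan_clipped, nan_norm = clip_by_global_norm([30.0, float("nan")], max_norm=35.0)
print(norm_before, norm_after, nan_clipped)
```

Under this rule, clipping bounds large finite gradients but cannot repair a NaN that already exists in the loss or its gradient.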

@poodarchu

PointPillar's performance is not so good compared with published results

Hi @poodarchu, thanks for your great code! I trained PointPillars with the default config. The performance is shown below and is similar to the results posted by @s-ryosky in #18.

[image: evaluation results]

For the car category, the published moderate-level mAP on the KITTI 3D test set is 74.99, while my trained model reaches only 75.66 on the KITTI val set. It does not seem able to exceed the published result.
As far as I know, other researchers can achieve about 77 on val with PointPillars. Is there a problem in the configs? Could you publish your results? Thanks a lot!

About reproducing the CBGS result

Instructions To Reproduce the Issue:

  1. what changes you made (git diff) or what code you wrote
    Using the ego velocity in every annotation; the rest of the config is the same.
# convert velo from global to lidar
for i in range(len(ref_boxes)):
    velo = np.array([*velocity[i], 0.0])
    velo = velo @ np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(
          l2e_r_mat).T
    velocity[i] = velo[:2]
    velocity = velocity.reshape(-1,2)
  2. what exact command you run:
python3 -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=/home/ubuntu/Documents/Det3D/trained_model
  3. what you observed:
mAP: 0.3719
mATE: 0.3724
mASE: 0.2661
mAOE: 0.9296
mAVE: 1.3655
mAAE: 0.2684
NDS: 0.4023
Eval time: 140.1s

Per-class results:
Object Class	AP	ATE	ASE	AOE	AVE	AAE
car	0.721	0.219	0.158	0.841	1.116	0.230
truck	0.371	0.426	0.198	0.640	1.155	0.307
bus	0.500	0.439	0.174	1.223	2.171	0.431
trailer	0.213	0.687	0.219	0.670	1.371	0.184
construction_vehicle	0.058	0.798	0.481	1.370	0.157	0.372
pedestrian	0.653	0.165	0.287	1.350	0.869	0.439
motorcycle	0.242	0.223	0.243	1.107	3.192	0.153
bicycle	0.043	0.199	0.264	1.111	0.894	0.031
traffic_cone	0.449	0.170	0.348	nan	nan	nan
barrier	0.470	0.398	0.289	0.056	nan	nan
Evaluation nusc: Nusc v1.0-trainval Evaluation
car Nusc dist AP@0.5, 1.0, 2.0, 4.0
59.48, 71.97, 77.40, 79.65 mean AP: 0.7212472062431424
truck Nusc dist AP@0.5, 1.0, 2.0, 4.0
18.48, 36.29, 44.83, 48.91 mean AP: 0.3712787143771077
construction_vehicle Nusc dist AP@0.5, 1.0, 2.0, 4.0
0.00, 2.09, 8.06, 12.96 mean AP: 0.05777510817362395
bus Nusc dist AP@0.5, 1.0, 2.0, 4.0
23.60, 47.30, 62.94, 66.32 mean AP: 0.5003920838518946
trailer Nusc dist AP@0.5, 1.0, 2.0, 4.0
1.66, 13.24, 29.62, 40.49 mean AP: 0.21251682647224052
barrier Nusc dist AP@0.5, 1.0, 2.0, 4.0
26.14, 46.78, 55.95, 59.13 mean AP: 0.4700045657239055
motorcycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
20.39, 24.74, 25.62, 26.05 mean AP: 0.24202605811125658
bicycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
3.93, 4.27, 4.35, 4.58 mean AP: 0.04280152387228541
pedestrian Nusc dist AP@0.5, 1.0, 2.0, 4.0
62.12, 64.40, 66.13, 68.36 mean AP: 0.6525328516852104
traffic_cone Nusc dist AP@0.5, 1.0, 2.0, 4.0
40.81, 43.40, 45.49, 49.80 mean AP: 0.44874109465427564

Unable to reproduce the results in model zoo.

Expected behavior:

The NDS score does not reach the number in the released paper, and the AVE is abnormally large compared with the other errors. This reproduced result is even worse than PointPillars. Is there a problem in the loss computation that leads to this result?
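
To see how much the velocity error costs, the reported NDS can be recomputed from the summary metrics above using the nuScenes detection score definition, NDS = (5 * mAP + sum over the five TP errors of (1 - min(1, error))) / 10. This is a short sketch of that arithmetic; the function name is illustrative and is not part of the nuScenes devkit.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10."""
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0

# Summary metrics quoted in the report above.
tp_errors = {
    "mATE": 0.3724,
    "mASE": 0.2661,
    "mAOE": 0.9296,
    "mAVE": 1.3655,
    "mAAE": 0.2684,
}
nds = nuscenes_detection_score(0.3719, tp_errors)
velocity_term = 1.0 - min(1.0, tp_errors["mAVE"])
print(f"NDS = {nds:.4f}, velocity term = {velocity_term}")
```

With mAVE above 1 the velocity term contributes nothing, and an mAOE of 0.93 contributes only about 0.07, so the low NDS follows directly from the velocity and orientation errors rather than from mAP alone.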

How about kitti performance between multitask and single task?

Hi, thanks for your great work!

In the paper and on the nuScenes leaderboard, I saw remarkable improvements. How about the performance on the KITTI dataset? Did you compare the multitask results with SECOND 1.6's single-class results? I think PointPillars multitask is not very good because it directly uses the SECOND 1.0 code, which fails to choose the heads smartly (even SECOND 1.6 is not good).

Since the KITTI dataset has existed longer, I believe better results on it would be much more persuasive.

Thanks!

Ground truth velocities are left uninitialized as NaNs

I believe this is related to #6 #42 #43 and #19 .

I followed INSTALL.md and installed nuscenes from https://github.com/poodarchu/nuscenes.git. I have also run create_data.py accordingly. From what I have seen, ground truth velocities that are cached in infos_train_10sweeps_withvelo.pkl are all NaN. I believe this is at least one of the issues that results in NaN losses.

I think line 516 in nusc_common.py

velocity = np.array([b.velocity for b in ref_boxes]).reshape(-1, 3)

should be:

velocity = np.array([
    nusc.box_velocity(token) for token in sample['anns']
]).reshape((-1, 3))

Otherwise the function (box_velocity) that computes velocity will never be called and b.velocity will stay uninitialized as NaNs.
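
A quick way to confirm this diagnosis is to count how many cached velocity vectors contain NaN. The sketch below does this in plain Python and assumes the velocities have already been extracted from the info file into a list of (vx, vy, vz) tuples; the sample data and function name are hypothetical.

```python
import math

def count_nan_velocities(velocities):
    """Return (rows containing NaN, total rows) for a list of velocity vectors."""
    nan_rows = sum(1 for v in velocities if any(math.isnan(c) for c in v))
    return nan_rows, len(velocities)

# Hypothetical cached velocities: one uninitialized box next to a valid one.
cached = [
    (float("nan"), float("nan"), float("nan")),
    (1.2, -0.4, 0.0),
]
nan_rows, total = count_nan_velocities(cached)
print(f"{nan_rows}/{total} ground-truth velocities contain NaN")
```

If every row is NaN, as described above, the velocity regression targets are NaN for every box, which is consistent with the NaN losses.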

Trying to train CBGS. All values are NaN.

Kindly help: all values are NaN. I am using a single GPU.

2020-01-07 17:22:53,040 - INFO - task : ['car'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 26.6600, num_neg: 31688.8400
2020-01-07 17:22:53,040 - INFO - task : ['truck', 'construction_vehicle'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 40.4800, num_neg: 63408.1400
2020-01-07 17:22:53,040 - INFO - task : ['bus', 'trailer'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 58.1800, num_neg: 63362.3000
2020-01-07 17:22:53,040 - INFO - task : ['barrier'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 7.8600, num_neg: 31742.0200
2020-01-07 17:22:53,040 - INFO - task : ['motorcycle', 'bicycle'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 11.8800, num_neg: 63486.6800
2020-01-07 17:22:53,040 - INFO - task : ['pedestrian', 'traffic_cone'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 13.6200, num_neg: 63489.2200

Training with Multi-GPU

Hi,

Thanks for sharing your great work! I am wondering if it is easy to train with multiple GPUs. I tried calling tools/train.py with --gpus=4 but it does not seem to do the trick.

Thanks,
Peiyun

Strange Results of SECOND on KITTI

During training, the code evaluates the trained model every few epochs, and the results look strange, as shown below:
[image: evaluation results]
How could this happen?

multi-GPU training error

I am trying to train CBGS on 8 GPUs (2080 Ti) with the newest repo code, using the following command to start:

python3 -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=/home/ubuntu/Documents/Det3D/trained_model

The error appears to happen in SyncBN:

    return SyncBatchnormFunction.apply(input, z, self.weight, self.bias, self.running_mean, self.running_var, self.eps, self.training or not self.track_running_stats, exponential_average_factor, self.process_group, self.channel_last, self.fuse_relu)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/apex/parallel/optimized_sync_batchnorm_kernel.py", line 26, in forward
    mean, var_biased = syncbn.welford_mean_var(input)
RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2) (maybe_wrap_dim at /pytorch/c10/core/WrapDimMinimal.h:20)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f92265a1813 in /home/ubuntu/.local/lib/python3.6/site-packages/torch/lib/libc10.so)

By the way, the server environment is PyTorch 1.3.1 + CUDA 10.1 + Python 3.6.
@poodarchu

[Question] Why does fixed weight decay support only the "one cycle" lr scheduler?

In SECOND's repo, the Adam optimizer with fixed weight decay is supported with all lr schedulers.
However, in this repo fixed weight decay is supported only with the "one cycle" lr scheduler.
Why?

def build_one_cycle_optimizer(model, optimizer_config):
    if optimizer_config.fixed_wd:
        optimizer_func = partial(
            torch.optim.Adam, betas=(0.9, 0.99), amsgrad=optimizer_config.amsgrad
        )
    else:
        optimizer_func = partial(torch.optim.Adam, amsgrad=optimizer_config.amsgrad)

    optimizer = OptimWrapper.create(
        optimizer_func,
        3e-3,
        get_layer_groups(model),
        wd=optimizer_config.wd,
        true_wd=optimizer_config.fixed_wd,
        bn_wd=True,
    )

    return optimizer
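
The `true_wd` argument in the snippet above suggests that "fixed weight decay" refers to decoupled weight decay, where the decay is applied directly to the weights instead of being added to the gradient. This reading is an assumption based on the argument name. The sketch below is a plain-Python illustration of the difference for the first Adam step on a single scalar weight. It is not the repository's optimizer, and the function name and hyperparameter values are illustrative.

```python
import math

def adam_first_step(w, grad, lr=3e-3, wd=0.01, decoupled=True,
                    betas=(0.9, 0.99), eps=1e-8):
    """One Adam update on a scalar weight, starting from zero moment estimates."""
    if decoupled:
        w = w * (1.0 - lr * wd)       # decay applied directly to the weight
    else:
        grad = grad + wd * w          # decay folded into the gradient (L2 penalty)
    m = (1.0 - betas[0]) * grad
    v = (1.0 - betas[1]) * grad * grad
    m_hat = m / (1.0 - betas[0])      # bias correction for step 1
    v_hat = v / (1.0 - betas[1])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

w_decoupled = adam_first_step(1.0, 0.5, decoupled=True)
w_l2 = adam_first_step(1.0, 0.5, decoupled=False)
print(w_decoupled, w_l2, w_l2 - w_decoupled)
```

In the L2 variant the decay term is normalized away by Adam's adaptive step, while in the decoupled variant the weight shrinks by an extra `lr * wd * w`. Nothing in this arithmetic depends on the learning-rate schedule.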

Where is the code for dataset sampling (DS Sampling)?

Hi, thanks for this great code base.
I wonder where the code for dataset sampling (DS Sampling) is, which gives a +5 mAP gain according to your paper "Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection". I could only find a db_sampler of type "GT_AUG", but I think that is different from DS Sampling. Am I understanding this correctly?
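
For context, class-balanced dataset sampling can be sketched as resampling whole frames so that every class contributes a similar number of entries. The example below is an illustrative sketch under that assumption, not the code from this repository; the function name, the input format, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def class_balanced_resample(samples, seed=0):
    """samples: list of (sample_id, set_of_class_names). Returns resampled ids."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, classes in samples:
        for name in classes:
            by_class[name].append(sample_id)
    # Every class gets the same share of the original per-class entry count.
    total = sum(len(ids) for ids in by_class.values())
    per_class = total // len(by_class)
    resampled = []
    for name in sorted(by_class):
        # Draw with replacement, so rare classes are duplicated.
        resampled.extend(rng.choices(by_class[name], k=per_class))
    return resampled

# Toy dataset: frames 0-7 contain only cars, frames 8-9 contain only bicycles.
toy = [(i, {"car"}) for i in range(8)] + [(i, {"bicycle"}) for i in range(8, 10)]
balanced = class_balanced_resample(toy)
bicycle_frames = sum(1 for s in balanced if s >= 8)
print(len(balanced), bicycle_frames)
```

This differs from ground-truth augmentation such as a "GT_AUG" database sampler, which pastes individual objects into frames instead of changing how often whole frames are drawn.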

How to use Det3D

Questions like:

  1. How to do X with Det3D?
  2. How Det3D does X?

Example: How to visualize detection result with Det3D?

NOTE:

  1. If you meet any unexpected issue when using Det3D and wish to know why,
    please use the "Unexpected Problems / Bugs" issue template.

  2. We do not answer general machine learning / computer vision questions that are not specific to
    Det3D, such as how a model works, how to improve your training/make it converge, or what algorithm/methods can be used to achieve X.

Result of a simple experiment on KITTI dataset by adding RGB features into points

Here is a simple experiment on KITTI dataset.
By adding RGB features to the points, the BEV AP increases, but the 3D AP drops (see the numbers below).

Benchmark

car AP@0.70, 0.70, 0.70:
bbox AP:90.70, 88.95, 87.33
bev  AP:89.65, 84.71, 81.73
3d   AP:85.85, 76.36, 69.63
aos  AP:90.61, 88.30, 86.31

with RGB feature

car  AP@0.70,  0.70,  0.70:
bbox AP:90.63, 88.86, 87.35
bev  AP:89.75, 86.15, 83.00
3d   AP:85.75, 75.68, 68.93
aos  AP:90.48, 88.36, 86.58

Based on Painted PointPillars result with segmentation feature instead of RGB feature
BEV on test set

mAP   | Car AP
Mod.  | Easy  | Mod.  | Hard
73.84 | 90.21 | 87.75 | 84.92
76.46 | 90.01 | 87.65 | 85.26
+2.62 | -0.2  | -0.1  | +0.34

I attribute this to overfitting and will test it.

Has anybody observed a similar result?
How about using the nuScenes dataset?
How about adding augmentation on RGB?

I hope for a large 3D AP gain on Pedestrian and Cyclist.

cudahash: Completely failed to build Cuda error in file

During training, I hit the problem below:
at epoch 63 with --nproc_per_node=2, samples_per_gpu=6 and workers_per_gpu=6
at epoch 81 with --nproc_per_node=2, samples_per_gpu=4 and workers_per_gpu=4

cudahash: Completely failed to build
Cuda error in file '/root/spconv/src/cuhash/hash_table.cpp' in line 194 : an illegal memory access was encountered.

Val Result

I want to know the correspondence between result_val.json and nuScenes' sample_annotation.json in the trainval set. It would help a lot when applying your results to other tasks.
Thanks for your attention!

Where can I download sample dataset

I want to run your sample code, but I don't know where to download this dataset. Can you provide the dataset download link?

ImportError: cannot import name 'syncbn_gpu'

cxt@ubuntu4-X299X-AORUS-MASTER:~/codetest/det3d$ python tools/create_data.py kitti_data_prep --root_path=/home/cxt/Kitti/object/
/home/cxt/anaconda3/envs/second/lib/python3.6/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/home/cxt/anaconda3/envs/second/lib/python3.6/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/home/cxt/anaconda3/envs/second/lib/python3.6/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
Traceback (most recent call last):
File "tools/create_data.py", line 7, in <module>
from det3d.datasets.kitti import kitti_common as kitti_ds
File "/home/cxt/codetest/det3d/det3d/datasets/__init__.py", line 4, in <module>
from .kitti import KittiDataset
File "/home/cxt/codetest/det3d/det3d/datasets/kitti/__init__.py", line 1, in <module>
from .kitti import KittiDataset
File "/home/cxt/codetest/det3d/det3d/datasets/kitti/kitti.py", line 8, in <module>
from det3d.datasets.custom import PointCloudDataset
File "/home/cxt/codetest/det3d/det3d/datasets/custom.py", line 8, in <module>
from .pipelines import Compose
File "/home/cxt/codetest/det3d/det3d/datasets/pipelines/__init__.py", line 18, in <module>
from .preprocess import Preprocess, Voxelization, AssignTarget
File "/home/cxt/codetest/det3d/det3d/datasets/pipelines/preprocess.py", line 8, in <module>
from det3d.builder import (
File "/home/cxt/codetest/det3d/det3d/builder.py", line 18, in <module>
from det3d.models.losses import GHMCLoss, GHMRLoss, losses
File "/home/cxt/codetest/det3d/det3d/models/__init__.py", line 2, in <module>
from .backbones import * # noqa: F401,F403
File "/home/cxt/codetest/det3d/det3d/models/backbones/__init__.py", line 1, in <module>
from .scn import RCNNSpMiddleFHD, SpMiddleFHD
File "/home/cxt/codetest/det3d/det3d/models/backbones/scn.py", line 6, in <module>
from det3d.models.utils import Empty, change_default_args
File "/home/cxt/codetest/det3d/det3d/models/utils/__init__.py", line 1, in <module>
from .conv_module import ConvModule, build_conv_layer
File "/home/cxt/codetest/det3d/det3d/models/utils/conv_module.py", line 7, in <module>
from .norm import build_norm_layer
File "/home/cxt/codetest/det3d/det3d/models/utils/norm.py", line 4, in <module>
from det3d.ops.syncbn import DistributedSyncBN
File "/home/cxt/codetest/det3d/det3d/ops/syncbn/__init__.py", line 1, in <module>
from .syncbn import DistributedSyncBN
File "/home/cxt/codetest/det3d/det3d/ops/syncbn/syncbn.py", line 12, in <module>
from . import syncbn_gpu
ImportError: cannot import name 'syncbn_gpu'

CUDA error terminates the training process

Sometimes my program is terminated by "CUDA error: an illegal memory access was encountered" during training. I used the official code and the default config, changing only data_root and work_dir. The bug occurred in both single-GPU and distributed multi-GPU training. The picture below shows the error information:
[Image: training stalls midway]

Sometimes training on a single GPU is also terminated as below:
[Image: problem with single-GPU training]

However, this problem seems to be ignored in multi-GPU training:
[Image: single-GPU problem ignored]

The environment of my server includes:

- OS: Ubuntu 16.04
- Python:  3.7.3
- CUDA: 10.1
- CUDNN: 7.4.1
- pytorch: 1.3.1
- gcc: 5.5.0
- cmake: 3.16.0
- nvidia driver version: 418.40.04
- gpu: 8 TITAN Xp

Really weird! How can I solve these problems, given that they occur so often? Could anyone provide some information on them? Thanks a lot!

Invalid syntax

I get an "invalid syntax" error in this section of nuscenes_commons.py:
len(info["sweeps"]) == nsweeps - 1), f"sweep {curr_sd_rec['token']} only has {len(info['sweeps'])} sweeps, you should duplicate to sweep num {nsweeps-1}"
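The quoted fragment uses an f-string, which only parses on Python 3.6 or newer. The report does not state the Python version, so an older interpreter is just one possible cause. The sketch below shows an equivalent check written with str.format(); the function name check_sweeps is hypothetical, while the variable names follow the quoted fragment.

```python
import sys


def check_sweeps(info, curr_sd_rec, nsweeps):
    """Assert that `info` holds exactly nsweeps - 1 sweeps."""
    found = len(info["sweeps"])
    expected = nsweeps - 1
    # str.format() instead of an f-string, so this also parses on Python < 3.6.
    message = "sweep {} only has {} sweeps, you should duplicate to sweep num {}".format(
        curr_sd_rec["token"], found, expected
    )
    assert found == expected, message


# Quick way to see whether the running interpreter accepts f-strings at all.
print("f-strings supported:", sys.version_info >= (3, 6))
```

If the interpreter reports Python 3.6 or newer and the error persists, the cause is more likely elsewhere in the line, for example an unbalanced parenthesis.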

Trying to train CBGS, all INFO values are nan

After modifying some configs and compiling the nms_gpu module successfully,
I am trying to train the CBGS network on my local computer with the nuScenes dataset.
I am not using train.sh, but running the script directly:

python3 train.py 
/home/muzi2045/Documents/Det3D/examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py  --gpus=1

It runs, but the loss values in the log file are all nan:

2019-12-21 14:56:28,351 - INFO - Start running, host: muzi2045@muzi2045-MS-7B48, work_dir: /home/muzi2045/Documents/Det3D/trained_model
2019-12-21 14:56:28,351 - INFO - workflow: [('train', 1), ('val', 1)], max: 20 epochs
2019-12-21 14:56:57,005 - INFO - Epoch [1/20][50/64050]	lr: 0.00010, eta: 8 days, 11:53:41, time: 0.573, data_time: 0.178, transfer_time: 0.012, forward_time: 0.112, loss_parse_time: 0.000 memory: 1689, 
2019-12-21 14:56:57,005 - INFO - task : ['car'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 26.4600, num_neg: 31687.8400
2019-12-21 14:56:57,005 - INFO - task : ['truck', 'construction_vehicle'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 36.3400, num_neg: 63408.7600
2019-12-21 14:56:57,005 - INFO - task : ['bus', 'trailer'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 54.0200, num_neg: 63379.1400
2019-12-21 14:56:57,005 - INFO - task : ['barrier'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 7.6200, num_neg: 31742.6000
2019-12-21 14:56:57,005 - INFO - task : ['motorcycle', 'bicycle'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 11.4400, num_neg: 63487.4600
2019-12-21 14:56:57,005 - INFO - task : ['pedestrian', 'traffic_cone'], loss: nan, cls_pos_loss: nan, cls_neg_loss: nan, dir_loss_reduced: nan, cls_loss_reduced: nan, loc_loss_reduced: nan, loc_loss_elem: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], num_pos: 13.4600, num_neg: 63489.3400

Also, how can I turn off the log output of the gt_database file paths?

Any advice would be appreciated!
@poodarchu
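In the log above, every loss term is nan from the first logged iteration while num_pos and num_neg stay finite. A small guard like the following could stop a run as soon as that happens, instead of training for days on nan. This is only a sketch: find_nan_losses is a hypothetical helper, not part of Det3D, and it assumes the losses are available as a plain dict shaped like the logged values.

```python
import math


def find_nan_losses(losses):
    """Return the names of entries whose value (or any list element) is NaN."""
    bad = []
    for name, value in losses.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        # float() also accepts the string 'nan', as it appears in loc_loss_elem.
        if any(math.isnan(float(v)) for v in values):
            bad.append(name)
    return bad


example = {"loss": float("nan"), "loc_loss_elem": ["nan", "0.1"], "num_pos": 26.46}
print(find_nan_losses(example))  # ['loss', 'loc_loss_elem']
```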

bbox_head compute loss error

I tried to train CBGS on a single GPU. After modifying some params, this error occurred:
[Image: error screenshot]

It looks like, when training with the nuScenes dataset with the range set to [-50.4, -50.4, 50.4, 50.4] and voxels_size to [0.1, 0.1], a [1008, 1008] grid is generated -> 1008 * 1008 * 2 = 2032128 anchors per class, but the box_preds output is [1, 126, 126, 18] per class -> 126 * 126 * 2 = 31752 predictions.
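The arithmetic in the paragraph above can be checked with a few lines. The sketch assumes two anchors per grid cell, as the numbers in the report imply; grid_size and anchor_count are hypothetical helpers written for this illustration.

```python
def grid_size(pc_range, voxel_size):
    """BEV grid cells along x and y for a range [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = pc_range
    nx = round((x_max - x_min) / voxel_size[0])
    ny = round((y_max - y_min) / voxel_size[1])
    return nx, ny


def anchor_count(nx, ny, anchors_per_cell=2):
    """Anchors on an nx-by-ny grid (two per cell is an assumption from the report)."""
    return nx * ny * anchors_per_cell


nx, ny = grid_size([-50.4, -50.4, 50.4, 50.4], [0.1, 0.1])
full_res = anchor_count(nx, ny)    # anchors on the voxel grid
head_res = anchor_count(126, 126)  # predictions from the 126 x 126 head output
stride = nx // 126                 # ratio between the two grids
print(nx, ny, full_res, head_res, stride)  # 1008 1008 2032128 31752 8
```

The two grids differ by a factor of 8 per axis, which suggests the anchors are being generated at voxel resolution rather than at the resolution of the head's feature map. Whether that comes from a downsampling setting in the config is an assumption the report does not confirm.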

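import torch

def add_sin_difference(boxes1, boxes2):
    # The last channel holds the yaw angle. Since
    # sin(a - b) = sin(a) * cos(b) - cos(a) * sin(b), subtracting the two
    # encoded channels below yields the sine of the angle difference.
    rad_pred_encoding = torch.sin(boxes1[..., -1:]) * torch.cos(boxes2[..., -1:])
    rad_tg_encoding = torch.cos(boxes1[..., -1:]) * torch.sin(boxes2[..., -1:])
    boxes1 = torch.cat([boxes1[..., :-1], rad_pred_encoding], dim=-1)
    boxes2 = torch.cat([boxes2[..., :-1], rad_tg_encoding], dim=-1)
    return boxes1, boxes2

Any advice would be appreciated.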
@a157801 @poodarchu

Can you share the results of your reproduced models?

Dear @poodarchu ,

Thanks for your great work. With your open-source code, researchers can save a lot of time. Many open-source codebases cannot reproduce the results reported in their papers, which causes much confusion for followers. Could you share detailed results for your reproduced models (like PointRCNN) and provide more config files for various models? Thanks.

Augmentation, class group balancing and anchors

@poodarchu @a157801 thanks for the wonderful code base. I have a few queries:

  1. As mentioned in the paper, which function performs GT-AUG in the current code base?
  2. Is class group balancing performed in the current code base? If so, which function performs it?
  3. Are anchors generated separately for KITTI and nuScenes, or do both use the same anchors? Which function performs the anchor generation?

Error in nuscenes-devkit installation

I get an error when installing the nuscenes-devkit.
The detailed error is:
error in nuscenes-zbj setup command: "values of 'package_data' dict" must be a list of strings (got '*.json')
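The message says that a value in the package_data dict passed to setup() is the bare string '*.json', whereas setuptools expects each value to be a list of glob patterns. The sketch below shows the shape of the fix on a plain dict. The key "" is a placeholder, since the report does not show which package the pattern belongs to, and normalize_package_data is a hypothetical helper.

```python
def normalize_package_data(package_data):
    """Wrap bare string patterns in a list so every value is a list of strings."""
    fixed = {}
    for package, patterns in package_data.items():
        if isinstance(patterns, str):
            patterns = [patterns]
        fixed[package] = list(patterns)
    return fixed


broken = {"": "*.json"}  # shape reported in the error message
print(normalize_package_data(broken))  # {'': ['*.json']}
```

In practice the same change can be made by hand in the devkit's setup.py, replacing the string with a one-element list.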

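Help needed

How to determine in which direction a certain annotation was taken by the camera?
Hello.
I am an undergraduate student. Could we discuss nuScenes?
For a given sample, there are JPG images from cameras in six directions.
We want to determine the annotations that belong to the image from one particular direction.
However, the official API only returns all annotations of a sample, covering the images from all six directions, so we cannot tell which annotations correspond to the image from a given direction.
Thank you in advance for your help.

In other words: how do we know which camera direction a given annotation was captured from?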
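One generic way to approach this question is geometric: transform the annotation's box center into each camera's frame, project it with that camera's intrinsic matrix, and keep the cameras where the projection lands inside the image. The sketch below covers only the projection test and is not taken from the nuScenes devkit. It assumes a pinhole camera with z pointing forward, a point already expressed in the camera frame, and a 1600 x 900 image; visible_in_camera and the example intrinsics are hypothetical.

```python
def visible_in_camera(point_cam, intrinsic, width=1600, height=900):
    """Return True if a 3D point in the camera frame projects inside the image."""
    x, y, z = point_cam
    if z <= 0:
        # Behind the camera: cannot appear in this image.
        return False
    fx, fy = intrinsic[0][0], intrinsic[1][1]
    cx, cy = intrinsic[0][2], intrinsic[1][2]
    # Pinhole projection to pixel coordinates.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return 0 <= u < width and 0 <= v < height


# Made-up intrinsics for illustration only.
K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]
print(visible_in_camera((0.0, 0.0, 10.0), K))   # True
print(visible_in_camera((0.0, 0.0, -10.0), K))  # False
```

Running this test for each of the six cameras gives the set of cameras in which the annotation's center is visible. A box near an image border may appear in two adjacent cameras.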

PointRCNN release time

Feature

PointRCNN

Motivation

PointRCNN as mentioned in README TODO list and #35 (comment)

Pitch

Hi @poodarchu , could you please estimate the release time of PointRCNN? Thanks a lot, and looking forward to it.

PyTorch 1.1 is not supported

Instructions To Reproduce the Issue:

  1. what changes you made (git diff) or what code you wrote
    After the commit b567905

  2. what exact command you run:
    Det3D/tools/train.py

  3. what you observed (including the full logs):

AttributeError: 'Tensor' object has no attribute 'bool'

occurred at the following line:
https://github.com/poodarchu/Det3D/blob/56402d4761a5b73acd23080f537599b0888cce07/det3d/models/bbox_heads/mg_head.py#L1038

  4. please also simplify the steps as much as possible so they do not require additional resources to
    Run on PyTorch 1.1

Expected behavior:

The README recommends PyTorch 1.1 or higher,
but a Tensor in PyTorch 1.1 has no 'bool' attribute.
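A possible workaround on older PyTorch versions is to fall back to a uint8 mask via Tensor.byte(), which older releases accept for mask indexing. The helper below is a duck-typed sketch: to_bool_mask is a hypothetical name, it is not the fix adopted by the Det3D authors, and it deliberately avoids importing torch so that it works with any object exposing .bool() or .byte().

```python
def to_bool_mask(tensor):
    """Return a mask tensor, preferring .bool() and falling back to .byte()."""
    if hasattr(tensor, "bool"):
        return tensor.bool()
    # Tensors without .bool() (as reported for PyTorch 1.1) use uint8 masks.
    return tensor.byte()
```

Replacing a direct `.bool()` call at the failing line with a helper like this would keep the code working on both old and new versions, at the cost of a small indirection.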
