
bevfusion's People

Contributors

adlab-autodrive, tangtaogo, tingtingliangvs


bevfusion's Issues

mmdet3d Version Incompatible

I can't install mmdet3d==0.11.0 with the recommended versions of torch==1.7.0 and CUDA 10.1 or 11.0. What is going wrong? Has anyone successfully configured this environment? Thanks.
If I use torch==1.6.0 I can install mmdet3d==0.11.0, but the code then errors out at runtime.

Training Error

Hi,
Nice work!

I could run the inference code successfully, but encountered errors during training.
Issue 1:

    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'BEVF_FasterRCNN' object has no attribute 'kd'

I found that self.kd and self.kd_feat_loss were not defined, so I added self.kd = False in the __init__() function:

            if self.kd:
                losses_pts['kd_feat_loss'] = self.kd_feat_loss
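For reference, a minimal sketch of this workaround, using a plain nn.Module stand-in for the real detector class (the names here are illustrative, not the repo's exact code):

    import torch.nn as nn

    class BEVF_FasterRCNN(nn.Module):  # simplified stand-in for the real detector class
        def __init__(self, kd=False):
            super().__init__()
            self.kd = kd             # default prevents the ModuleAttributeError above
            self.kd_feat_loss = 0.0  # placeholder; set by the distillation branch when kd=True

        def losses(self, losses_pts):
            if self.kd:  # the guarded branch from the issue is now safe to evaluate
                losses_pts['kd_feat_loss'] = self.kd_feat_loss
            return losses_pts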

Issue 2:

    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 4; 31.75 GiB total capacity; 17.30 GiB already allocated; 3.86 GiB free; 26.34 GiB reserved in total by PyTorch)

I trained on 8×V100 (32 GB) and hit the error above. Adding torch.cuda.empty_cache() to the training script did not help, so I reduced samples_per_gpu from 4 to 2 for training.
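For reference, that change amounts to editing the data section of the training config; the field names match the config dump later on this page (a minimal sketch, not a complete config):

    data = dict(
        samples_per_gpu=2,  # halved from 4; per-GPU batch size drives activation memory
        workers_per_gpu=4,  # dataloader workers, unchanged
    )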

Question about img_neck pretrain

Hi, thanks for your great work. When I train the bevf_pp_cam model on the nuScenes dataset, I load the mask_rcnn_dbswin-t_fpn_3x_nuim_cocopre.pth weights (pretrained on nuImages) into img_backbone and img_neck. But I found that the img_neck in bevf_pp_cam has an adp_layer that merges the multi-scale features into a single feature map, and the parameters of this adp_layer are not contained in mask_rcnn_dbswin-t_fpn_3x_nuim_cocopre.pth. Since you simply freeze the whole img_neck during training, the parameters in adp_layer will never update, right? I wonder why you freeze this adp_layer; can you give some explanation?
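For context, freezing a neck in PyTorch just disables gradients for every parameter under it, which indeed includes any adp_layer; a minimal sketch (the helper below is illustrative, not the repo's exact code):

    import torch.nn as nn

    def freeze(module: nn.Module) -> None:
        """Stop gradient updates for every parameter under `module`."""
        for param in module.parameters():
            param.requires_grad = False

    # freeze(model.img_neck)  # freezes the whole neck, adp_layer included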

How to get the BEV feature of Lidar?

Hi, thanks a lot for sharing your great work!
I'm new to the 3D detection field. After reading your paper and the code, I didn't understand how the BEV feature of the lidar is obtained (e.g. when using TransFusion). I see how the image BEV feature is obtained via the lift-splat approach, but not how the lidar BEV feature is produced. From my reading of the code, there seems to be no explicit transformation of the point cloud into BEV space.
Could you please help me figure it out?
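For context, a PointPillars-style lidar stream needs no explicit view transform: the middle encoder (PointPillarsScatter in the config dump later on this page) simply writes each pillar's feature vector into a dense 2D canvas at its grid coordinate, and that canvas is already the lidar BEV feature. A minimal sketch of the idea, not the repo's exact code:

    import torch

    def scatter_pillars_to_bev(pillar_feats, coords, nx=400, ny=400):
        """Scatter pillar features (P, C) into a BEV canvas (C, ny, nx).
        `coords` is a (P, 2) LongTensor of (y, x) grid indices per pillar."""
        P, C = pillar_feats.shape
        canvas = pillar_feats.new_zeros(C, ny * nx)
        flat_idx = coords[:, 0] * nx + coords[:, 1]  # flatten (y, x) to one index
        canvas[:, flat_idx] = pillar_feats.t()       # write features at pillar cells
        return canvas.view(C, ny, nx)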

batch_size

Hello, I have some questions about how to change the batch size during training; I hope you can give me an answer. Thank you very much!

TransFusion-L pretrained model

Thank you for your great work. I would like to train BEVFusion; could you please provide the pretrained TransFusion-L weights? Thanks.

About sensor failure tests

Hi, thank you for your wonderful and insightful work. Going through your README, I noticed that checkpoint files for the sensor-failure tests are provided. For these tests, were the models trained in a sensor-failure setting?

How much can the img_depth improve the NDS?

In bevf_transfusion.py:

    loss_depth = self.depth_dist_loss(depth_dist, img_depth, loss_method=self.img_depth_loss_method, img=img) * self.img_depth_loss_weight
    losses.update(img_depth_loss=loss_depth)

thanks!
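For context, a plausible shape for a depth-distribution loss like the one above, assuming (this is a hypothetical sketch, not the repo's implementation) that the target is a per-pixel distribution over D depth bins obtained by projecting lidar points into the image:

    import torch.nn.functional as F

    def depth_dist_loss(pred_logits, target_dist, eps=1e-6):
        """Hypothetical: KL divergence between predicted depth-bin distributions
        and lidar-derived targets. Both tensors are (N, D, H, W)."""
        log_pred = F.log_softmax(pred_logits, dim=1)  # per-pixel log-probabilities over bins
        return F.kl_div(log_pred, target_dist.clamp(min=eps), reduction='batchmean')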

nuImages dataset

Hello, I have a question about creating the nuImages dataset: which .py file should I use to process nuImages after downloading it? Or can I just extract it directly into the folder data/nuimages?

How to train on Waymo dataset?

Hello,

Thanks for sharing the code.

I found some code and config files related to the Waymo dataset.
Do you support the Waymo dataset? How should I preprocess the Waymo data, and how do I train on it?

Regards

Training Error in LiDAR stream

Hello, thanks for your excellent work. When I tried to train the lidar stream, I got an error that I cannot solve; please help me with some useful advice, thanks very much!

My Environment:

sys.platform: linux
Python: 3.8.3 (default, Jul  2 2020, 16:21:59) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6: NVIDIA TITAN RTX
CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.0
OpenCV: 4.6.0
MMCV: 1.3.8
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.0
MMDetection: 2.11.0
MMDetection3D: 0.11.0+be0cb2e

When I run ./tools/dist_train.sh configs/bevfusion/lidar_stream/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d.py 1, I get:

import DCN failed
2022-08-07 13:26:24,943 - mmdet - INFO - Environment info:
------------------------------------------------------------
(environment info identical to the listing above)
------------------------------------------------------------

2022-08-07 13:26:27,683 - mmdet - INFO - Distributed training: True
2022-08-07 13:26:30,617 - mmdet - INFO - Config:
voxel_size = [0.25, 0.25, 8]
model = dict(
    type='MVXFasterRCNN',
    pts_voxel_layer=dict(
        max_num_points=64,
        point_cloud_range=[-50, -50, -5, 50, 50, 3],
        voxel_size=[0.25, 0.25, 8],
        max_voxels=(30000, 40000)),
    pts_voxel_encoder=dict(
        type='HardVFE',
        in_channels=4,
        feat_channels=[64, 64],
        with_distance=False,
        voxel_size=[0.25, 0.25, 8],
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=[-50, -50, -5, 50, 50, 3],
        norm_cfg=dict(type='naiveSyncBN1d', eps=0.001, momentum=0.01)),
    pts_middle_encoder=dict(
        type='PointPillarsScatter', in_channels=64, output_shape=[400, 400]),
    pts_backbone=dict(
        type='SECOND',
        in_channels=64,
        norm_cfg=dict(type='naiveSyncBN2d', eps=0.001, momentum=0.01),
        layer_nums=[3, 5, 5],
        layer_strides=[2, 2, 2],
        out_channels=[64, 128, 256]),
    pts_neck=dict(
        type='SECONDFPN',
        norm_cfg=dict(type='naiveSyncBN2d', eps=0.001, momentum=0.01),
        in_channels=[64, 128, 256],
        upsample_strides=[1, 2, 4],
        out_channels=[128, 128, 128]),
    pts_bbox_head=dict(
        type='Anchor3DHead',
        num_classes=10,
        in_channels=384,
        feat_channels=384,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[[-49.6, -49.6, -1.80032795, 49.6, 49.6, -1.80032795],
                    [-49.6, -49.6, -1.74440365, 49.6, 49.6, -1.74440365],
                    [-49.6, -49.6, -1.68526504, 49.6, 49.6, -1.68526504],
                    [-49.6, -49.6, -1.67339111, 49.6, 49.6, -1.67339111],
                    [-49.6, -49.6, -1.61785072, 49.6, 49.6, -1.61785072],
                    [-49.6, -49.6, -1.80984986, 49.6, 49.6, -1.80984986],
                    [-49.6, -49.6, -1.763965, 49.6, 49.6, -1.763965]],
            sizes=[[1.95017717, 4.60718145, 1.72270761],
                   [2.4560939, 6.73778078, 2.73004906],
                   [2.87427237, 12.01320693, 3.81509561],
                   [0.60058911, 1.68452161, 1.27192197],
                   [0.66344886, 0.7256437, 1.75748069],
                   [0.39694519, 0.40359262, 1.06232151],
                   [2.49008838, 0.48578221, 0.98297065]],
            custom_values=[0, 0],
            rotations=[0, 1.57],
            reshape_out=True),
        assigner_per_size=False,
        diff_rad_by_sin=True,
        dir_offset=0.7854,
        dir_limit_offset=0,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=9),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    train_cfg=dict(
        pts=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.6,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                ignore_iof_thr=-1),
            allowed_border=0,
            code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        pts=dict(
            use_rotate_nms=True,
            nms_across_levels=False,
            nms_pre=1000,
            nms_thr=0.2,
            score_thr=0.05,
            min_bbox_size=0,
            max_num=500)))
point_cloud_range = [-50, -50, -5, 50, 50, 3]
class_names = [
    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
dataset_type = 'NuScenesDataset'
data_root = 'data/nuscenes/'
input_modality = dict(
    use_lidar=True,
    use_camera=False,
    use_radar=False,
    use_map=False,
    use_external=False)
file_client_args = dict(backend='disk')
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=dict(backend='disk')),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=dict(backend='disk')),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='PointsRangeFilter', point_cloud_range=[-50, -50, -5, 50, 50, 3]),
    dict(
        type='ObjectRangeFilter', point_cloud_range=[-50, -50, -5, 50, 50, 3]),
    dict(
        type='ObjectNameFilter',
        classes=[
            'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
            'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
        ]),
    dict(type='PointShuffle'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=[
            'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
            'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
        ]),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=dict(backend='disk')),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=dict(backend='disk')),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1.0, 1.0],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter',
                point_cloud_range=[-50, -50, -5, 50, 50, 3]),
            dict(
                type='DefaultFormatBundle3D',
                class_names=[
                    'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
                    'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone',
                    'barrier'
                ],
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='NuScenesDataset',
        data_root='data/nuscenes/',
        ann_file='data/nuscenes/nuscenes_infos_train.pkl',
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=5,
                use_dim=5,
                file_client_args=dict(backend='disk')),
            dict(
                type='LoadPointsFromMultiSweeps',
                sweeps_num=10,
                file_client_args=dict(backend='disk')),
            dict(
                type='LoadAnnotations3D',
                with_bbox_3d=True,
                with_label_3d=True),
            dict(
                type='PointsRangeFilter',
                point_cloud_range=[-50, -50, -5, 50, 50, 3]),
            dict(
                type='ObjectRangeFilter',
                point_cloud_range=[-50, -50, -5, 50, 50, 3]),
            dict(
                type='ObjectNameFilter',
                classes=[
                    'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
                    'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone',
                    'barrier'
                ]),
            dict(type='PointShuffle'),
            dict(
                type='DefaultFormatBundle3D',
                class_names=[
                    'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
                    'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone',
                    'barrier'
                ]),
            dict(
                type='Collect3D',
                keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
        ],
        classes=[
            'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
            'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
        ],
        modality=dict(
            use_lidar=True,
            use_camera=False,
            use_radar=False,
            use_map=False,
            use_external=False),
        test_mode=False,
        box_type_3d='LiDAR'),
    val=dict(
        type='NuScenesDataset',
        data_root='data/nuscenes/',
        ann_file='data/nuscenes/nuscenes_infos_val.pkl',
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=5,
                use_dim=5,
                file_client_args=dict(backend='disk')),
            dict(
                type='LoadPointsFromMultiSweeps',
                sweeps_num=10,
                file_client_args=dict(backend='disk')),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1333, 800),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(
                        type='GlobalRotScaleTrans',
                        rot_range=[0, 0],
                        scale_ratio_range=[1.0, 1.0],
                        translation_std=[0, 0, 0]),
                    dict(type='RandomFlip3D'),
                    dict(
                        type='PointsRangeFilter',
                        point_cloud_range=[-50, -50, -5, 50, 50, 3]),
                    dict(
                        type='DefaultFormatBundle3D',
                        class_names=[
                            'car', 'truck', 'trailer', 'bus',
                            'construction_vehicle', 'bicycle', 'motorcycle',
                            'pedestrian', 'traffic_cone', 'barrier'
                        ],
                        with_label=False),
                    dict(type='Collect3D', keys=['points'])
                ])
        ],
        classes=[
            'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
            'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
        ],
        modality=dict(
            use_lidar=True,
            use_camera=False,
            use_radar=False,
            use_map=False,
            use_external=False),
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type='NuScenesDataset',
        data_root='data/nuscenes/',
        ann_file='data/nuscenes/nuscenes_infos_val.pkl',
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=5,
                use_dim=5,
                file_client_args=dict(backend='disk')),
            dict(
                type='LoadPointsFromMultiSweeps',
                sweeps_num=10,
                file_client_args=dict(backend='disk')),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1333, 800),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(
                        type='GlobalRotScaleTrans',
                        rot_range=[0, 0],
                        scale_ratio_range=[1.0, 1.0],
                        translation_std=[0, 0, 0]),
                    dict(type='RandomFlip3D'),
                    dict(
                        type='PointsRangeFilter',
                        point_cloud_range=[-50, -50, -5, 50, 50, 3]),
                    dict(
                        type='DefaultFormatBundle3D',
                        class_names=[
                            'car', 'truck', 'trailer', 'bus',
                            'construction_vehicle', 'bicycle', 'motorcycle',
                            'pedestrian', 'traffic_cone', 'barrier'
                        ],
                        with_label=False),
                    dict(type='Collect3D', keys=['points'])
                ])
        ],
        classes=[
            'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
            'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
        ],
        modality=dict(
            use_lidar=True,
            use_camera=False,
            use_radar=False,
            use_map=False,
            use_external=False),
        test_mode=True,
        box_type_3d='LiDAR'))
evaluation = dict(interval=24)
optimizer = dict(type='AdamW', lr=0.001, weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[20, 23])
momentum_config = None
total_epochs = 24
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d'
load_from = None
resume_from = None
workflow = [('train', 1)]
gpu_ids = range(0, 1)

2022-08-07 13:26:30,618 - mmdet - INFO - Set random seed to 0, deterministic: False
create hard
create hard
2022-08-07 13:26:30,677 - mmdet - INFO - Model:
MVXFasterRCNN(
  (pts_voxel_layer): Voxelization(voxel_size=[0.25, 0.25, 8], point_cloud_range=[-50, -50, -5, 50, 50, 3], max_num_points=64, max_voxels=(30000, 40000))
  (pts_voxel_encoder): HardVFE(
    (scatter): DynamicScatter(voxel_size=[0.25, 0.25, 8], point_cloud_range=[-50, -50, -5, 50, 50, 3], average_points=True)
    (vfe_layers): ModuleList(
      (0): VFELayer(
        (norm): NaiveSyncBatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (linear): Linear(in_features=10, out_features=64, bias=False)
      )
      (1): VFELayer(
        (norm): NaiveSyncBatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (linear): Linear(in_features=128, out_features=64, bias=False)
      )
    )
  )
  (pts_middle_encoder): PointPillarsScatter()
  (pts_backbone): SECOND(
    (blocks): ModuleList(
      (0): Sequential(
        (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): NaiveSyncBatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): NaiveSyncBatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (7): NaiveSyncBatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (8): ReLU(inplace=True)
        (9): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (10): NaiveSyncBatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (11): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (7): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (8): ReLU(inplace=True)
        (9): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (10): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (11): ReLU(inplace=True)
        (12): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (13): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (14): ReLU(inplace=True)
        (15): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (16): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (17): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): NaiveSyncBatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): NaiveSyncBatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (7): NaiveSyncBatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (8): ReLU(inplace=True)
        (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (10): NaiveSyncBatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (11): ReLU(inplace=True)
        (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (13): NaiveSyncBatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (14): ReLU(inplace=True)
        (15): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (16): NaiveSyncBatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (17): ReLU(inplace=True)
      )
    )
  )
  (pts_neck): SECONDFPN(
    (deblocks): ModuleList(
      (0): Sequential(
        (0): ConvTranspose2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(4, 4), bias=False)
        (1): NaiveSyncBatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
  )
  (pts_bbox_head): Anchor3DHead(
    (loss_cls): FocalLoss()
    (loss_bbox): SmoothL1Loss()
    (loss_dir): CrossEntropyLoss()
    (conv_cls): Conv2d(384, 140, kernel_size=(1, 1), stride=(1, 1))
    (conv_reg): Conv2d(384, 126, kernel_size=(1, 1), stride=(1, 1))
    (conv_dir_cls): Conv2d(384, 28, kernel_size=(1, 1), stride=(1, 1))
  )
)
noise setting:
/root/BEVFusion/mmdetection-2.11.0/mmdet/apis/train.py:95: UserWarning: config is now expected to have a `runner` section, please set `runner` in your config.
  warnings.warn(
noise setting:
2022-08-07 13:26:33,548 - mmdet - INFO - Start running, host: root@zhangcaiji, work_dir: /root/BEVFusion/work_dirs/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d
2022-08-07 13:26:33,548 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) CheckpointHook
(NORMAL      ) DistEvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) DistSamplerSeedHook
(NORMAL      ) DistEvalHook
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook
(LOW         ) IterTimerHook
 --------------------
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL      ) CheckpointHook
(NORMAL      ) DistEvalHook
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_train_epoch:
(NORMAL      ) CheckpointHook
(NORMAL      ) DistEvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_epoch:
(NORMAL      ) DistSamplerSeedHook
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_epoch:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_run:
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
2022-08-07 13:26:33,548 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
Traceback (most recent call last):
  File "./tools/train.py", line 316, in <module>
    main()
  File "./tools/train.py", line 305, in main
    train_detector(
  File "/root/BEVFusion/mmdetection-2.11.0/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 51, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/root/BEVFusion/mmdetection-2.11.0/mmdet/models/detectors/base.py", line 247, in train_step
    losses = self(**data)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/root/BEVFusion/mmdet3d/models/detectors/base.py", line 58, in forward
    return self.forward_train(**kwargs)
  File "/root/BEVFusion/mmdet3d/models/detectors/mvx_two_stage.py", line 295, in forward_train
    img_feats, pts_feats = self.extract_feat(
  File "/root/BEVFusion/mmdet3d/models/detectors/mvx_two_stage.py", line 230, in extract_feat
    pts_feats = self.extract_pts_feat(points, img_feats, img_metas)
  File "/root/BEVFusion/mmdet3d/models/detectors/mvx_two_stage.py", line 214, in extract_pts_feat
    voxels, num_points, coors = self.voxelize(pts) # torch.Size([13909, 64, 4]) torch.Size([13909]) torch.Size([13909, 4])
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 184, in new_func
    return old_func(*args, **kwargs)
  File "/root/BEVFusion/mmdet3d/models/detectors/mvx_two_stage.py", line 247, in voxelize
    res_voxels, res_coors, res_num_points = self.pts_voxel_layer(res)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/BEVFusion/mmdet3d/ops/voxel/voxelize.py", line 112, in forward
    return voxelization(input, self.voxel_size, self.point_cloud_range,
  File "/root/BEVFusion/mmdet3d/ops/voxel/voxelize.py", line 51, in forward
    voxel_num = hard_voxelize(points, voxels, coors,
RuntimeError: CUDA error: invalid device function
Traceback (most recent call last):
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/root/anaconda3/envs/BEVFusion_ali/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/BEVFusion_ali/bin/python', '-u', './tools/train.py', '--local_rank=0', 'configs/bevfusion/lidar_stream/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d.py', '--launcher', 'pytorch']' died with <Signals.SIGSEGV: 11>.
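A note on this failure: "CUDA error: invalid device function" usually means the compiled CUDA ops do not match the runtime. The environment above mixes CUDA_HOME=/usr/local/cuda-10.0 with a PyTorch built against CUDA 10.1, which is a likely culprit. A quick consistency check (illustrative):

    import os
    import torch

    print(torch.version.cuda)                  # CUDA version PyTorch was built with (10.1 above)
    print(os.environ.get('CUDA_HOME'))         # toolkit used to compile the mmdet3d ops (10.0 above)
    print(torch.cuda.get_device_capability())  # (7, 5) for a TITAN RTX; must be in the arch flags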

Loss is NaN or does not decline

Hello, good job.
When I was training the camera branch on the nuScenes dataset, two problems occurred:
(1) the loss didn't decline;
(2) when I used img_depth, the loss became NaN after several iterations.

When I run the demo, an error occurs as follows

Traceback (most recent call last):
  File "demo/pcd_demo.py", line 28, in <module>
    main()
  File "demo/pcd_demo.py", line 20, in main
    model = init_detector(args.config, args.checkpoint, device=args.device)
  File "/home1/ugv/autodriving/BEVFusion-main/mmdet3d/apis/inference.py", line 51, in init_detector
    model = build_detector(config.model, train_cfg=None, test_cfg=config.get('test_cfg'))
  File "/home1/ugv/autodriving/BEVFusion-main/mmdet3d/models/builder.py", line 48, in build_detector
    return DETECTORS.build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/mnt/EXOS_AUTO/JIANG_FOLDER/anoconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
    return self.build_func(*args, **kwargs, registry=self)
TypeError: build_model_from_cfg() got multiple values for argument 'registry'

Can anyone help me?

Custom dataset collection?

Hi~

Is it OK to use a different number of cameras, or multiple lidars?

I want to build my own dataset and deploy this model on a real vehicle. How many cameras should I choose, and what kind of camera (FOV and resolution) should I pick?

Thanks a lot.

Camera feature BEV conversion code

I would like to ask which part of the code extracts depth information from the camera features and performs the BEV conversion; I can't find it. Also, what is the function of fpnc?

Camera failure test

@ADLab-AutoDrive Thanks so much for the simple and effective structure, the amazing work, and for releasing the code! After reading your paper, I am curious about the camera-failure experiments. Is it possible to reproduce them with the code and models you currently provide? Many thanks!!

How to run a demo or inference

Hi, how do I run a demo or model inference to check the BEV map generation in the image stream? Also, the data preparation guide seems to be missing; could you please provide updated guidance on the dataset file structure? Thanks.

Citation on BEV space augmentation

Dear authors,

I'm one of the contributors to another project also named BEVFusion. I noticed that you added BEV-space augmentation following BEVDet and our work. Thanks for citing our work in your up-to-date arXiv paper.

I wonder if it is possible to also note in your README.md that you referred to our code when adding this feature to your codebase. There are discussions here and here from June and July this year between me and the author who committed BEV-space augmentation to this codebase, @Trent-tangtao.

I also found that the augmentation implementation in your dataset pipeline is the same as the implementation from our codebase (released a couple of months ago): lines 27-42 in this file correspond to lines 134-155 in this file, and lines 58-77 in this file correspond to lines 253-272 in this file. Given that our implementation was released much earlier than yours, and that we released the training recipe well before your camera-ready deadline, it would be greatly appreciated if you could acknowledge our work in this codebase.

There is also a minor comment on how you cite our results in the camera-ready version (thanks again for mentioning our work). Would you mind also including the MACs of your method and ours in the table? An accuracy number alone does not help the community understand the efficiency-accuracy tradeoff of the two methods. We would like to cite your paper and this number in our future arXiv update, and I believe an official number would be helpful. By the way, the BEVFusion-base entry on the nuScenes leaderboard is ours, and it was submitted before your camera-ready deadline. It would also be great if you could update this entry in your arXiv paper.

Thank you very much and congratulations on the acceptance of your paper to NeurIPS 2022. Looking forward to your presentation at the conference.

Best,
Haotian

How to visualize the result?

I used: ./tools/dist_test.sh configs/bevfusion/bevf_tf_4x8_6e_nusc.py ./work_dirs/bevfusion_tf.pth 8 --eval bbox --show --show-dir ./results/

but it didn't work. What is the correct visualization command?

cannot import name 'ball_query_ext'

I tried to train the model and got this issue:
cannot import name 'ball_query_ext' from partially initialized module 'mmdet3d.ops.ball_query'
I used python setup.py develop to compile mmdet3d, but it still doesn't work.
I have successfully installed mmdet3d==0.11.0, mmcv==1.4.0, and mmdet==2.11.0.
Has anyone hit and fixed this issue?
Thank you!
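In similar reports this error usually means the compiled extension file itself is missing (the "circular import" wording is misleading), often because the build ran against a mismatched torch/CUDA pair. A quick check that avoids importing the package (illustrative sketch):

    import glob
    import importlib.util

    spec = importlib.util.find_spec('mmdet3d')    # locates the package without executing it
    pkg_dir = spec.submodule_search_locations[0]
    # An empty list means ball_query_ext was never compiled for this environment,
    # so `python setup.py develop` needs to be re-run after fixing the toolchain.
    print(glob.glob(f'{pkg_dir}/ops/ball_query/ball_query_ext*.so'))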

Pretrained lidar model

Hi,
Is the provided transfusion_train/lidar_tf.pth the same as the work_dirs/transfusion_nusc_voxel_L/epoch_15.pth referenced in the config?

KeyError: 'cam_intrinsic'

Hi, great work!
An error suddenly appeared partway through training. I hit it when training the camera stream, the lidar stream, and BEVFusion alike. How can I solve it?

The following is the error:
KeyError: 'cam_intrinsic'

Differences in inference results

This is a great job, but I have two questions:

  1. There is a bug when running the configuration file. The link to the bug is here:
     TypeError: forward() got an unexpected keyword argument 'extra_rots'
  2. Why do the inference results of the pretrained model you provide not match the README (NDS: 72.1, mAP: 69.6)? With the config I get NDS: 0.7132, mAP: 0.6848 (maybe I missed a difference between this configuration file and the one mentioned above).

Looking forward to your reply ^_^

train.py: KeyError

Hello, good job.
When I run train.py, I run into some problems:

    intrinsic = cam_info['cam_intrinsic']
    KeyError: 'cam_intrinsic'

Can anyone help me? I've run create_data.py, but it didn't fix the problem.

TransFusion Training Trick

Hi there, I am confused about why my TransFusion lidar training result is much lower than yours.
I followed the same config file with 8 GPUs. Could you please check whether the config is the correct version, or share any training tricks?
Thanks
Sincerely

KeyError: 'MindFreeHook is not in the hook registry'

I was running the command

./tools/dist_train.sh configs/bevfusion/cam_stream/bevf_tf_4x8_20e_nusc_cam_lr.py 8

and I got the following error:

noise setting:
models/mask_rcnn_dbswin-t_fpn_3x_nuim_cocopre.pth
/workspace/BEVFusion/mmdetection-2.11.0/mmdet/apis/train.py:95: UserWarning: config is now expected to have a `runner` section, please set `runner` in your config.
  warnings.warn(
noise setting:
Traceback (most recent call last):
  File "tools/train.py", line 316, in <module>
    main()
  File "tools/train.py", line 305, in main
    train_detector(
  File "/workspace/BEVFusion/mmdetection-2.11.0/mmdet/apis/train.py", line 163, in train_detector
    hook = build_from_cfg(hook_cfg, HOOKS)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/registry.py", line 61, in build_from_cfg
    raise KeyError(
KeyError: 'MindFreeHook is not in the hook registry'

It seems to train fine after I comment out the custom hook. I cannot find any information about MindFreeHook in the repo and was wondering what it does. Does commenting it out affect performance?
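For reference, hooks like this are normally registered through the mmcv custom_hooks config field, so the workaround amounts to removing that entry (a sketch, assuming MindFreeHook is listed there):

    # custom_hooks = [dict(type='MindFreeHook')]  # hypothetical original entry
    custom_hooks = []  # training proceeds without the unregistered hook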

Pretrained model missing

work_dirs/mask_rcnn_cbv2_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco/mask_rcnn_cbv2_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.pth is not a checkpoint file

Hello, I ran into a missing pretrained model issue.
Does anyone know where to find this model file?
mask_rcnn_cbv2_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.pth
Thank you.

missing keys in source state_dict

Hi, I came across the following warning during training; please take a look, thanks.
The camera-only model : https://github.com/ADLab-AutoDrive/BEVFusion#nuscenes-detection-validation
print(cfg.load_lift_from) is ok

missing keys in source state_dict: img_backbone.cb_modules.0.patch_embed.proj.weight, img_backbone.cb_modules.0.patch_embed.proj.bias, img_backbone.cb_modules.0.patch_embed.norm.weight, img_backbone.cb_modules.0.patch_embed.norm.bias, img_backbone.cb_modules.0.layers.0.blocks.0.norm1.weight, img_backbone.cb_modules.0.layers.0.blocks.0.norm1.bias, img_backbone.cb_modules.0.layers.0.blocks.0.attn.relative_position_bias_table, img_backbone.cb_modules.0.layers.0.blocks.0.attn.relative_position_index, img_backbone.cb_modules.0.layers.0.blocks.0.attn.qkv.weight, img_backbone.cb_modules.0.layers.0.blocks.0.attn.qkv.bias, img_backbone.cb_modules.0.layers.0.blocks.0.attn.proj.weight, img_backbone.cb_modules.0.layers.0.blocks.0.attn.proj.bias, img_backbone.cb_modules.0.layers.0.blocks.0.norm2.weight, img_backbone.cb_modules.0.layers.0.blocks.0.norm2.bias, img_backbone.cb_modules.0.layers.0.blocks.0.mlp.fc1.weight, img_backbone.cb_modules.0.layers.0.blocks.0.mlp.fc1.bias, img_backbone.cb_modules.0.layers.0.blocks.0.mlp.fc2.weight, img_backbone.cb_modules.0.layers.0.blocks.0.mlp.fc2.bias, img_backbone.cb_modules.0.layers.0.blocks.1.norm1.weight, img_backbone.cb_modules.0.layers.0.blocks.1.norm1.bias, img_backbone.cb_modules.0.layers.0.blocks.1.attn.relative_position_bias_table, img_backbone.cb_modules.0.layers.0.blocks.1.attn.relative_position_index,
....
lift_splat_shot_vis.dx, lift_splat_shot_vis.bx, lift_splat_shot_vis.nx, lift_splat_shot_vis.frustum, lift_splat_shot_vis.camencode.depthnet.weight, lift_splat_shot_vis.camencode.depthnet.bias, lift_splat_shot_vis.bevencode.0.weight, lift_splat_shot_vis.bevencode.1.weight, lift_splat_shot_vis.bevencode.1.bias, lift_splat_shot_vis.bevencode.1.running_mean, lift_splat_shot_vis.bevencode.1.running_var, lift_splat_shot_vis.bevencode.3.weight, lift_splat_shot_vis.bevencode.4.weight, lift_splat_shot_vis.bevencode.4.bias, lift_splat_shot_vis.bevencode.4.running_mean, lift_splat_shot_vis.bevencode.4.running_var, lift_splat_shot_vis.bevencode.6.weight, lift_splat_shot_vis.bevencode.7.weight, lift_splat_shot_vis.bevencode.7.bias, lift_splat_shot_vis.bevencode.7.running_mean, lift_splat_shot_vis.bevencode.7.running_var, lift_splat_shot_vis.bevencode.9.weight, lift_splat_shot_vis.bevencode.10.weight, lift_splat_shot_vis.bevencode.10.bias, lift_splat_shot_vis.bevencode.10.running_mean, lift_splat_shot_vis.bevencode.10.running_var, seblock.att.1.weight, seblock.att.1.bias, reduc_conv.conv.weight, reduc_conv.bn.weight, reduc_conv.bn.bias, reduc_conv.bn.running_mean, reduc_conv.bn.running_var

cannot import name 'ball_query_ext' from partially initialized module 'mmdet3d.ops.ball_query' (most likely due to a circular import)

Hello, this problem occurred while I was preparing the dataset files. I read the source code and found nothing unusual, and no one has asked about this in other issues. Is there a solution?

Traceback (most recent call last):
  File "/home/wistful/work/my_bevfusion/tools/create_data.py", line 5, in <module>
    from tools.data_converter import kitti_converter as kitti
  File "/home/wistful/work/my_bevfusion/tools/data_converter/kitti_converter.py", line 5, in <module>
    from mmdet3d.core.bbox import box_np_ops
  File "/home/wistful/work/my_bevfusion/mmdet3d/core/__init__.py", line 2, in <module>
    from .bbox import *  # noqa: F401, F403
  File "/home/wistful/work/my_bevfusion/mmdet3d/core/bbox/__init__.py", line 4, in <module>
    from .iou_calculators import (AxisAlignedBboxOverlaps3D, BboxOverlaps3D,
  File "/home/wistful/work/my_bevfusion/mmdet3d/core/bbox/iou_calculators/__init__.py", line 1, in <module>
    from .iou3d_calculator import (AxisAlignedBboxOverlaps3D, BboxOverlaps3D,
  File "/home/wistful/work/my_bevfusion/mmdet3d/core/bbox/iou_calculators/iou3d_calculator.py", line 5, in <module>
    from ..structures import get_box_type
  File "/home/wistful/work/my_bevfusion/mmdet3d/core/bbox/structures/__init__.py", line 1, in <module>
    from .base_box3d import BaseInstance3DBoxes
  File "/home/wistful/work/my_bevfusion/mmdet3d/core/bbox/structures/base_box3d.py", line 5, in <module>
    from mmdet3d.ops.iou3d import iou3d_cuda
  File "/home/wistful/work/my_bevfusion/mmdet3d/ops/__init__.py", line 5, in <module>
    from .ball_query import ball_query
  File "/home/wistful/work/my_bevfusion/mmdet3d/ops/ball_query/__init__.py", line 1, in <module>
    from .ball_query import ball_query
  File "/home/wistful/work/my_bevfusion/mmdet3d/ops/ball_query/ball_query.py", line 4, in <module>
    from . import ball_query_ext
ImportError: cannot import name 'ball_query_ext' from partially initialized module 'mmdet3d.ops.ball_query' (most likely due to a circular import)

GPU memory

Hi, how much GPU memory does training the network need? Is 16 GB enough? How long would training take on 8×V100 (16 GB)? Thanks!

CUDA out of memory

When I train with bevf_tf_4x8_20e_nusc_cam_lr.py on 4×3090 (samples_per_gpu=2, workers_per_gpu=4), I get a CUDA out of memory error. Is this really due to a lack of GPU memory on the card, or is it a code issue? I am looking forward to your answer, thank you very much!

Reproduce problem

Hi, @tingtingliangvs @Trent-tangtao, thanks for your great work. I'm trying to reproduce the fusion results from the README table. Following the training strategy and the commands you provide, I get the same results for the camera stream, but lower mAP and NDS for the lidar stream, which makes the fusion result much lower:

# I use PointPillars as the 3D backbone and the PointPillars head as the detection head.
                            
bevf_pp_nusc_cam(repo):  mAP: 22.9. NDS: 31.1
bevf_pp_nusc_cam(reproduce):  mAP: 22.6. NDS: 30.6
bevf_pp_nusc_lidar(repo):  mAP: 35.1. NDS: 49.8
bevf_pp_nusc_lidar(reproduce):  mAP: 31.6. NDS: 47.6
bevf_pp_nusce_fusion(repo):  mAP: 53.5  NDS: 60.4
bevf_pp_nusce_fusion(reproduce):  mAP: 51.7. NDS: 59.5

Is there something that i missed? Do you have any advice to reproduce the results?

Best,
Birdy

How about freezing the LiDAR stream?

Hello,

I found a freeze_lidar_components switch in the config file. Have you run BEVFusion (the second stage) with the lidar stream frozen? Or with both the lidar and camera streams frozen, optimizing only the fusion part?

Question concerning the pretrained model

Hello, thanks for your inspiring work~

For training BEVFusion with the config configs/bevfusion/bevf_pp_2x8_1x_nusc.py, we notice the pretrained camera-stream model is missing, i.e., work_dirs/bevf_pp_4x8_2x_nusc_cam/epoch_24.pth.

Have you provided this model in the repo? Or can we use the provided mask_rcnn_dbswin-t_fpn_3x_nuim_cocopre.pth directly?

About the fusion method

Hello, I don't quite understand your fusion method. You fuse the image features in BEV with the lidar features. Can you point out where the fusion-stage code is? One more question: if I want to fuse the features near CenterPoint's predicted center points with the corresponding image features, can you give some tips?
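For orientation, the missing-keys list quoted elsewhere on this page (reduc_conv, seblock.att) suggests the fusion is a channel concatenation of the camera and lidar BEV maps followed by a reduction conv and channel attention. A minimal sketch of that pattern, not the repo's exact code:

    import torch
    import torch.nn as nn

    class SimpleBEVFuser(nn.Module):
        """Concat camera/lidar BEV maps, reduce channels, reweight with SE-style attention."""
        def __init__(self, cam_ch, lidar_ch, out_ch):
            super().__init__()
            self.reduc_conv = nn.Conv2d(cam_ch + lidar_ch, out_ch, 3, padding=1)
            self.att = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),      # squeeze: global context per channel
                nn.Conv2d(out_ch, out_ch, 1),
                nn.Sigmoid(),                 # excitation: per-channel gate in [0, 1]
            )

        def forward(self, cam_bev, lidar_bev):
            fused = self.reduc_conv(torch.cat([cam_bev, lidar_bev], dim=1))
            return fused * self.att(fused)    # gated fused BEV feature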

import 'ball_query_ext' issue

I tried to train the model and got this issue:
cannot import name 'ball_query_ext' from partially initialized module 'mmdet3d.ops.ball_query'
Has anyone hit and fixed this issue?
Thank you!

Applications on KITTI format dataset

Thanks a lot for your great work!
My dataset is in KITTI format (one forward camera and one lidar). Can BEVFusion be applied to datasets in KITTI format?
If so, which parts should be modified?
