nightmare-n / pvt-ssd

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer (CVPR 2023)

License: Apache License 2.0

Python 75.08% C++ 8.69% Cuda 15.53% C 0.54% Shell 0.16%

pvt-ssd's Introduction

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

Installation

We tested this project on NVIDIA A100 GPUs with Ubuntu 18.04.

conda create -n pvt-ssd python=3.7
conda activate pvt-ssd
conda install -y pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.1 -c pytorch -c conda-forge
conda install -y -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -y pytorch3d -c pytorch3d
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-2-0 nuscenes-devkit==1.0.5 einops==0.6.0 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.1+cu111.html
git clone https://github.com/Nightmare-n/PVT-SSD
cd PVT-SSD && python setup.py develop --user
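
After installation, it can help to confirm that the environment matches the version pins in the commands above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_versions` are hypothetical, and only packages that expose a `__version__` attribute are covered.

```python
import importlib

# Pins copied from the install commands above (module name -> pinned version).
EXPECTED = {
    "torch": "1.10.1",
    "torchvision": "0.11.2",
    "numpy": "1.19.5",
    "einops": "0.6.0",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(expected, get_version=installed_version):
    """Return {name: (wanted, found)} for every missing or mismatching package."""
    problems = {}
    for name, wanted in expected.items():
        found = get_version(name)
        # Builds may append a local tag such as "+cu111"; compare the public part only.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every listed package matches its pin; anything printed is worth checking before building the CUDA extensions.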

Data Preparation

Please follow the OpenPCDet instructions to prepare the datasets. For the Waymo dataset, we use the Waymo evaluation toolkit to evaluate detection results.

data
│── waymo
│   │── ImageSets/
│   │── raw_data
│   │   │── segment-xxxxxxxx.tfrecord
│   │   │── ...
│   │── waymo_processed_data
│   │   │── segment-xxxxxxxx/
│   │   │── ...
│   │── waymo_processed_data_gt_database_train_sampled_1/
│   │── waymo_processed_data_waymo_dbinfos_train_sampled_1.pkl
│   │── waymo_processed_data_infos_test.pkl
│   │── waymo_processed_data_infos_train.pkl
│   │── waymo_processed_data_infos_val.pkl
│   │── compute_detection_metrics_main
│   │── gt.bin
│── kitti
│   │── ImageSets/
│   │── training
│   │   │── label_2/
│   │   │── velodyne/
│   │   │── ...
│   │── testing
│   │   │── velodyne/
│   │   │── ...
│   │── gt_database/
│   │── kitti_dbinfos_train.pkl
│   │── kitti_infos_test.pkl
│   │── kitti_infos_train.pkl
│   │── kitti_infos_val.pkl
│   │── kitti_infos_trainval.pkl
│── once
│   │── ImageSets/
│   │── data
│   │   │── 000000/
│   │   │── ...
│   │── gt_database/
│   │── once_dbinfos_train.pkl
│   │── once_infos_raw_large.pkl
│   │── once_infos_raw_medium.pkl
│   │── once_infos_raw_small.pkl
│   │── once_infos_train.pkl
│   │── once_infos_val.pkl
│── kitti-360
│   │── data_3d_raw
│   │   │── xxxxxxxx_sync/
│   │   │── ...
│── ckpts
│   │── pvt_ssd.pth
│   │── ...
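
Before training, the layout above can be checked programmatically. The sketch below is an illustration only, not a script shipped with the repository: `EXPECTED` lists a subset of the entries from the tree, and the function name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Subset of the tree above; each path is relative to data/<dataset>.
EXPECTED = {
    "waymo": [
        "ImageSets",
        "raw_data",
        "waymo_processed_data",
        "waymo_processed_data_infos_train.pkl",
        "waymo_processed_data_infos_val.pkl",
        "compute_detection_metrics_main",
        "gt.bin",
    ],
    "kitti": [
        "ImageSets",
        "training/label_2",
        "training/velodyne",
        "testing/velodyne",
        "kitti_infos_train.pkl",
        "kitti_infos_val.pkl",
    ],
    "once": ["ImageSets", "data", "once_infos_train.pkl", "once_infos_val.pkl"],
}


def missing_entries(data_root, dataset):
    """Return the expected files or directories that do not exist for one dataset."""
    root = Path(data_root) / dataset
    return [rel for rel in EXPECTED[dataset] if not (root / rel).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        print(name, "missing:", missing_entries("data", name))
```

Run from the repository root, this prints the missing entries for each dataset; an empty list means the listed subset is in place.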

Training & Testing

# train
bash scripts/dist_train.sh

# test
bash scripts/dist_test.sh

Results

Waymo

	Vec_L1	Vec_L2	Ped_L1	Ped_L2	Cyc_L1	Cyc_L2	Model
PVT-SSD	79.2/78.7	70.2/69.8	79.9/74.0	72.6/67.0	77.1/76.0	74.0/73.0	log
PVT-SSD_3f	80.6/80.2	71.9/71.5	83.9/80.6	75.1/72.1	77.9/77.0	74.8/74.0	log

We cannot provide the above pretrained models due to the Waymo Dataset License Agreement.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{yang2023pvtssd,
    author    = {Yang, Honghui and Wang, Wenxiao and Chen, Minghao and Lin, Binbin and He, Tong and Chen, Hua and He, Xiaofei and Ouyang, Wanli},
    title     = {PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {13476-13487}
}

Acknowledgement

This project is mainly based on the following codebases. Thanks for their great work!

pvt-ssd's Issues

Why Q=vote_candidate_features.unsqueeze(1).permute(1, 0, 2), K=V=torch.cat([key_features.permute(1, 0, 2), voxel_key_features.permute(1, 0, 2)], dim=0)

vote_features = self.pv_transformer(
    src=torch.cat([key_features.permute(1, 0, 2), voxel_key_features.permute(1, 0, 2)], dim=0),
    tgt=vote_candidate_features.unsqueeze(1).permute(1, 0, 2),
    pos_res=torch.cat([key_pos_emb.permute(1, 0, 2).unsqueeze(0), voxel_key_pos_emb.permute(1, 0, 2).unsqueeze(0)], dim=1)
).squeeze(0)

In the transformer module, why did you choose vote_candidate_features as Q, and torch.cat([key_features.permute(1, 0, 2), voxel_key_features.permute(1, 0, 2)], dim=0) as K?
@Nightmare-n
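
For readers unfamiliar with the pattern in the call above, here is a toy illustration of generic cross-attention in plain Python. It is not the repository's pv_transformer and does not speak for the authors' reasons. It assumes the usual (sequence, batch, channel) layout, under which each vote candidate is a single query and the point keys and voxel keys, concatenated along the sequence axis, share one softmax. All values and the function name `cross_attention` are hypothetical.

```python
import math


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Numerically stable softmax over all keys.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the attention-weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Toy data (hypothetical): one vote candidate, two point keys, one voxel key.
vote_candidate = [1.0, 0.0]
point_keys = [[1.0, 0.0], [0.0, 1.0]]
voxel_keys = [[0.5, 0.5]]

# Mirrors torch.cat([...], dim=0): both key sets are attended to jointly.
memory = point_keys + voxel_keys
output, weights = cross_attention(vote_candidate, memory, memory)
```

In this setup the output has one vector per query, so a single aggregated feature is produced for each vote candidate from both point-level and voxel-level context.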

about cuda version

The environment in the README includes waymo-open-dataset-tf-2-2-0 and CUDA 11.1, but TensorFlow 2.2.0 requires CUDA 10.1, so I can't extract point cloud data from the tfrecord files.

Building network failed on kitti, 3 classes.

I was trying to build the network for KITTI and noticed that the yaml file only contains 'Car', so I changed it to the three classes. It then reported that the dimension of the target classes is not 3. I could not find any other information about how to get the correct structure.

velodyne_reduced

Hi there,
First, thanks for your work.
My question is about velodyne_reduced: how can I generate it? (line 64 in kitti_dataset.py)
Or is it the same thing as velodyne?

A visualization of Figure 6

Hi! Thank you for sharing this interesting work.
I would like to know how you generated the visualization in Figure 6. Can you share the code?

Waymo training loss decreases very slowly or not at all.

Hi, very good work! Fast at inference and very lightweight.

May I know how the training went on your machine? I finished training, but according to the training logs, the loss basically stays the same. Since I had made a few changes to the code, I am now trying to train with the completely original code. The loss starts to decrease, but very slowly.

I did not change much of the code; I made two major changes:

When I processed the Waymo dataset using your code, it raised an error because there is no file_client._map_path. I changed the code to:

def process_single_sequence(sequence_file, save_path, sampled_interval, client, has_label=True, use_two_returns=True):
    sequence_name = os.path.splitext(os.path.basename(sequence_file))[0]

    # print('Load record (sampled_interval=%d): %s' % (sampled_interval, sequence_name))
    if not client.exists(sequence_file):
        print('NotFoundError: %s' % sequence_file)
        return []

    # dataset = tf.data.TFRecordDataset(client._map_path(sequence_file), compression_type='')
    dataset = tf.data.TFRecordDataset(str(sequence_file), compression_type='')
    cur_save_dir = save_path / sequence_name
    cur_save_dir.mkdir(parents=True, exist_ok=True)
........

For the first training run, I changed dist_train.sh to the OpenPCDet format:

#!/usr/bin/env bash
set -x
NGPUS=$1
PY_ARGS=${@:2}

echo "#######################################" $PY_ARGS

while true
do
    PORT=$(( ((RANDOM<<15)|RANDOM) % 49152 + 10000 ))
    status="$(nc -z 127.0.0.1 $PORT < /dev/null &>/dev/null; echo $?)"
    if [ "${status}" != "0" ]; then
        break;
    fi
done
echo $PORT

python3 -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port $PORT train.py --launcher pytorch ${PY_ARGS}

The first training log is attached as well.
train-waymo-pvt-ssd.log

Any pretrained models on KITTI?

Hi, I wonder whether you have any pre-trained models on the KITTI dataset? It would be a great help!

Besides, may I ask which implementation of PointPillars and SECOND you used? Is it OpenPCDet?

open source

Hello, congratulations on the excellent work! Could you please share your code?

Waymo dataset

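Hello, I have a question. I have successfully reproduced the results on the KITTI dataset using a single V100-SXM2-32GB GPU,
but when I run python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml to preprocess the Waymo dataset instead, I get the error shown in the figure. Do you know what causes it?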

NuScenes

Hello, this is very good work. Is there a configuration file for the NuScenes dataset?

waymo dataset

When I train on the Waymo dataset, I run into the following problem.
I used OpenPCDet to generate the Waymo dataset.

KeyError: 'extrinsic'

The spconv version

First, thanks for your work.
My question is why your code cannot run on the latest spconv version (2.3.6).
After several days of debugging, I finally found that the spconv version you use here behaves unusually.

For example, suppose you use .indices to get the voxel indices of the backbone feature map.
In the version you provide, the batch index is in increasing order, so the voxels of each batch are grouped together.
It looks like (0, x, y, z), (0, x, y, z) , (1, x, y, z), (1, x, y, z) , (2, x, y, z), (2, x, y, z)

However, from version 2.2 onward, the returned indices are out of order.
They look like (0, x, y, z), (1, x, y, z) , (1, x, y, z), (0, x, y, z) , (2, x, y, z), (1, x, y, z)

Could you please check it?
The latest version of spconv is much faster than 2.1 on Ampere-architecture cards.
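
One way to restore the grouped order described above is a stable sort on the batch index, applied to both the indices and their features. The sketch below uses plain Python lists with toy values; the function name `group_by_batch` is hypothetical, and whether this reordering alone is enough to run the repository on spconv 2.3.6 is an assumption that has not been verified.

```python
def group_by_batch(indices, features):
    """Reorder rows so batch indices are non-decreasing.

    Python's sort is stable, so rows within one batch keep their original order.
    """
    order = sorted(range(len(indices)), key=lambda i: indices[i][0])
    return [indices[i] for i in order], [features[i] for i in order]


# Toy example: (batch, x, y, z) rows in the unordered layout described above.
indices = [(0, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3), (0, 4, 4, 4), (2, 5, 5, 5), (1, 6, 6, 6)]
features = ["a", "b", "c", "d", "e", "f"]

sorted_indices, sorted_features = group_by_batch(indices, features)
```

With tensors, the same idea would be a stable sort on the first column of the indices, with the resulting permutation applied to the feature rows.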