pvt-ssd's Introduction

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

Installation

We test this project on NVIDIA A100 GPUs and Ubuntu 18.04.

conda create -n pvt-ssd python=3.7
conda activate pvt-ssd
conda install -y pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.1 -c pytorch -c conda-forge
conda install -y -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -y pytorch3d -c pytorch3d
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-2-0 nuscenes-devkit==1.0.5 einops==0.6.0 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.1+cu111.html
git clone https://github.com/Nightmare-n/PVT-SSD
cd PVT-SSD && python setup.py develop --user
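After installation, it can help to confirm that the pinned versions above were actually resolved. The following is a minimal sketch, not part of PVT-SSD: `PINS` and `check_pins` are hypothetical names, and the pins are copied from the `pip install` line above.

```python
# Hypothetical helper (not part of PVT-SSD): compare installed package
# versions against pins copied from the pip command above.
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7 (used in the conda env above) needs the importlib_metadata backport.
    from importlib_metadata import PackageNotFoundError, version

PINS = {
    "numpy": "1.19.5",
    "protobuf": "3.19.4",
    "scikit-image": "0.19.2",
    "nuscenes-devkit": "1.0.5",
    "einops": "0.6.0",
}


def check_pins(pins):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed pin matches; packages installed through conda (such as pytorch) are left out of this sketch on purpose.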

Data Preparation

Please follow the OpenPCDet instructions to prepare the datasets. For the Waymo dataset, we use the official evaluation toolkit to evaluate detection results.

data
│── waymo
│   │── ImageSets/
│   │── raw_data
│   │   │── segment-xxxxxxxx.tfrecord
│   │   │── ...
│   │── waymo_processed_data
│   │   │── segment-xxxxxxxx/
│   │   │── ...
│   │── waymo_processed_data_gt_database_train_sampled_1/
│   │── waymo_processed_data_waymo_dbinfos_train_sampled_1.pkl
│   │── waymo_processed_data_infos_test.pkl
│   │── waymo_processed_data_infos_train.pkl
│   │── waymo_processed_data_infos_val.pkl
│   │── compute_detection_metrics_main
│   │── gt.bin
│── kitti
│   │── ImageSets/
│   │── training
│   │   │── label_2/
│   │   │── velodyne/
│   │   │── ...
│   │── testing
│   │   │── velodyne/
│   │   │── ...
│   │── gt_database/
│   │── kitti_dbinfos_train.pkl
│   │── kitti_infos_test.pkl
│   │── kitti_infos_train.pkl
│   │── kitti_infos_val.pkl
│   │── kitti_infos_trainval.pkl
│── once
│   │── ImageSets/
│   │── data
│   │   │── 000000/
│   │   │── ...
│   │── gt_database/
│   │── once_dbinfos_train.pkl
│   │── once_infos_raw_large.pkl
│   │── once_infos_raw_medium.pkl
│   │── once_infos_raw_small.pkl
│   │── once_infos_train.pkl
│   │── once_infos_val.pkl
│── kitti-360
│   │── data_3d_raw
│   │   │── xxxxxxxx_sync/
│   │   │── ...
│── ckpts
│   │── pvt_ssd.pth
│   │── ...
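Before training, the layout above can be checked with a short script. This is a sketch under assumptions: `EXPECTED` lists only a subset of the entries in the tree, and `missing_entries` is a hypothetical helper, not a function from PVT-SSD or OpenPCDet.

```python
from pathlib import Path

# Subset of the directory tree above (relative to data/<dataset>); extend as needed.
EXPECTED = {
    "waymo": [
        "ImageSets",
        "raw_data",
        "waymo_processed_data",
        "waymo_processed_data_infos_train.pkl",
        "waymo_processed_data_infos_val.pkl",
    ],
    "kitti": [
        "ImageSets",
        "training/label_2",
        "training/velodyne",
        "testing/velodyne",
        "kitti_infos_train.pkl",
        "kitti_infos_val.pkl",
    ],
    "once": [
        "ImageSets",
        "data",
        "once_infos_train.pkl",
        "once_infos_val.pkl",
    ],
}


def missing_entries(data_root, dataset):
    """Return the expected paths that do not exist under data_root/dataset."""
    root = Path(data_root) / dataset
    return [rel for rel in EXPECTED[dataset] if not (root / rel).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        print(name, "missing:", missing_entries("data", name))
```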

Training & Testing

# train
bash scripts/dist_train.sh

# test
bash scripts/dist_test.sh

Results

Waymo

             Vec_L1     Vec_L2     Ped_L1     Ped_L2     Cyc_L1     Cyc_L2     Model
PVT-SSD      79.2/78.7  70.2/69.8  79.9/74.0  72.6/67.0  77.1/76.0  74.0/73.0  log
PVT-SSD_3f   80.6/80.2  71.9/71.5  83.9/80.6  75.1/72.1  77.9/77.0  74.8/74.0  log

We cannot provide the pretrained models above due to the Waymo Dataset License Agreement.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{yang2023pvtssd,
    author    = {Yang, Honghui and Wang, Wenxiao and Chen, Minghao and Lin, Binbin and He, Tong and Chen, Hua and He, Xiaofei and Ouyang, Wanli},
    title     = {PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {13476-13487}
}

Acknowledgement

This project is mainly based on the following codebases. Thanks for their great work!

pvt-ssd's Issues

Why Q=vote_candidate_features.unsqueeze(1).permute(1, 0, 2), K=V=torch.cat([key_features.permute(1, 0, 2), voxel_key_features.permute(1, 0, 2)], dim=0)

vote_features = self.pv_transformer(
    src=torch.cat([key_features.permute(1, 0, 2), voxel_key_features.permute(1, 0, 2)], dim=0),
    tgt=vote_candidate_features.unsqueeze(1).permute(1, 0, 2),
    pos_res=torch.cat([key_pos_emb.permute(1, 0, 2).unsqueeze(0), voxel_key_pos_emb.permute(1, 0, 2).unsqueeze(0)], dim=1)
).squeeze(0)

Regarding the transformer module: why did you choose vote_candidate_features as Q, and torch.cat([key_features.permute(1, 0, 2), voxel_key_features.permute(1, 0, 2)], dim=0) as K?
@Nightmare-n
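One way to read the call quoted in this issue is as plain cross-attention: each vote candidate is a single query, and the point keys and voxel keys are concatenated along the sequence axis so the query can attend over both. The NumPy sketch below only illustrates the tensor shapes under that reading. The input shapes (N, C), (N, Kp, C) and (N, Kv, C) are inferred from the permute calls and are assumptions, `cross_attention` is a hypothetical single-head stand-in for `pv_transformer`, and positional embeddings are omitted.

```python
import numpy as np


def cross_attention(q, k, v):
    """Single-head scaled dot-product attention in sequence-first layout.

    q: (Lq, N, C); k, v: (Lk, N, C); returns (Lq, N, C).
    """
    c = q.shape[-1]
    # scores[n, i, j] = <q[i, n], k[j, n]> / sqrt(C)
    scores = np.einsum("inc,jnc->nij", q, k) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return np.einsum("nij,jnc->inc", weights, v)


# Assumed sizes: N vote candidates, Kp point keys and Kv voxel keys each, C channels.
rng = np.random.default_rng(0)
N, Kp, Kv, C = 5, 8, 4, 16
vote_candidate_features = rng.normal(size=(N, C))
key_features = rng.normal(size=(N, Kp, C))
voxel_key_features = rng.normal(size=(N, Kv, C))

# Mirrors unsqueeze(1).permute(1, 0, 2): one query per candidate -> (1, N, C).
tgt = vote_candidate_features[:, None, :].transpose(1, 0, 2)
# Mirrors torch.cat([...], dim=0): both key sets joined on the sequence axis -> (Kp + Kv, N, C).
src = np.concatenate(
    [key_features.transpose(1, 0, 2), voxel_key_features.transpose(1, 0, 2)], axis=0
)

vote_features = cross_attention(tgt, src, src).squeeze(0)  # (N, C)
print(tgt.shape, src.shape, vote_features.shape)
```

Under these assumptions the output has one feature vector per vote candidate, which matches the `.squeeze(0)` at the end of the original call.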

about cuda version

The environment in the README includes waymo-open-dataset-tf-2-2-0 and CUDA 11.1, but TensorFlow 2.2.0 requires CUDA 10.1, so I can't extract point cloud data from the tfrecord files.

Building network failed on kitti, 3 classes.

I was trying to build the network for KITTI and noticed that the yaml file only contains 'Car', so I changed it to the three classes. It then reported that the target classes' dim is not 3. I could not find any other information about how to get the correct structure.

velodyne_reduced

Hi there,
Thanks for your work.
My question is about velodyne_reduced: how can I generate it? (See line 64 in kitti_dataset.py.)
Or is it the same thing as velodyne?

A visualization of Figure 6

Hi! Thank you for sharing this interesting work.
I would like to know how you generated the visualization in Figure 6. Can you share the code?

Waymo training loss decreases very slowly or not at all.

Hi, very good work! Fast at inference and very lightweight.

May I know how the training went on your machine? I finished training, but according to the training logs, the loss basically stays the same. Since I had made a few changes to the code, I am now training with the completely original code. The loss starts to decrease, but very slowly.

I did not change much of the code. I made two major changes:

  1. When processing the Waymo dataset with your code, an error was raised because file_client._map_path does not exist. I changed the code to:
def process_single_sequence(sequence_file, save_path, sampled_interval, client, has_label=True, use_two_returns=True):
    sequence_name = os.path.splitext(os.path.basename(sequence_file))[0]

    # print('Load record (sampled_interval=%d): %s' % (sampled_interval, sequence_name))
    if not client.exists(sequence_file):
        print('NotFoundError: %s' % sequence_file)
        return []

    # dataset = tf.data.TFRecordDataset(client._map_path(sequence_file), compression_type='')
    dataset = tf.data.TFRecordDataset(str(sequence_file), compression_type='')
    cur_save_dir = save_path / sequence_name
    cur_save_dir.mkdir(parents=True, exist_ok=True)
........
  2. For the first training run, I changed dist_train.sh to the OpenPCDet format, which is:
#!/usr/bin/env bash
set -x
NGPUS=$1
PY_ARGS=${@:2}

echo "#######################################" $PY_ARGS

while true
do
    PORT=$(( ((RANDOM<<15)|RANDOM) % 49152 + 10000 ))
    status="$(nc -z 127.0.0.1 $PORT < /dev/null &>/dev/null; echo $?)"
    if [ "${status}" != "0" ]; then
        break;
    fi
done
echo $PORT

python3 -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port $PORT train.py --launcher pytorch ${PY_ARGS}

The first training log is attached as well.
train-waymo-pvt-ssd.log
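The loop in the script above picks a random port between 10000 and 59151 and retries until `nc` reports that nothing is listening on it. A comparable result can be sketched in Python by asking the operating system for an unused port. `find_free_port` is a hypothetical helper and is not part of PVT-SSD or OpenPCDet; note that, as with the shell loop, another process could still take the port before the launcher binds it.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


if __name__ == "__main__":
    # The printed value could be passed to --master_port in the launch command.
    print(find_free_port())
```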

Any pretrained models on KITTI?

Hi, I wonder whether you have any pre-trained models on the KITTI dataset? It can be a great help!

Besides, may I ask which implementation of PointPillars and SECOND you used? Is it OpenPCDet?

open source

Hello, congratulations on the excellent work! Could you please share your code?

Waymo dataset

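Hello, I have successfully reproduced the results on the KITTI dataset using a single V100-SXM2-32GB GPU.
However, when I switch to the Waymo dataset and run python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml for data preprocessing, I get the error shown in the screenshot. Do you know what causes it?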
Snipaste_2024-05-10_20-56-36

nuScenes

Hello, great work! Is there a config file for the nuScenes dataset?

waymo dataset

When I train on the Waymo dataset, I run into the following problem.
I used OpenPCDet to generate the Waymo dataset.

KeyError: 'extrinsic'

The spconv version

Thanks for your work.
My question is why your code cannot run on the latest spconv version (2.3.6).
After several days of debugging, I found that the spconv version used here behaves differently from newer ones.

For example, suppose you use .indices to get the voxel indices of the backbone feature map.
In the version you provide, the indices are ordered by increasing batch index, so the voxels of each batch sample are grouped together.
They look like (0, x, y, z), (0, x, y, z), (1, x, y, z), (1, x, y, z), (2, x, y, z), (2, x, y, z)

However, from version 2.2 onward, the returned indices are no longer ordered by batch.
They look like (0, x, y, z), (1, x, y, z), (1, x, y, z), (0, x, y, z), (2, x, y, z), (1, x, y, z)

Could you please check it?
The latest version of spconv is much faster than 2.1 on Ampere architecture cards.
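If the only difference is the ordering described in this issue, one possible workaround is to regroup the voxels by batch index with a stable sort and apply the same permutation to the features. The NumPy sketch below illustrates the idea on toy data. `group_by_batch` is a hypothetical helper, and whether this alone is enough to make PVT-SSD work with newer spconv versions is not confirmed by the source.

```python
import numpy as np


def group_by_batch(indices, features):
    """Reorder voxels so rows are grouped by increasing batch index.

    indices: (M, 4) array of (batch, x, y, z); features: (M, C) array.
    A stable sort keeps the original relative order within each batch.
    """
    order = np.argsort(indices[:, 0], kind="stable")
    return indices[order], features[order]


# Toy data in the unordered pattern described in the issue.
indices = np.array(
    [[0, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3], [0, 4, 4, 4], [2, 5, 5, 5], [1, 6, 6, 6]]
)
features = np.arange(6, dtype=np.float32)[:, None]  # feature value equals original row id

sorted_indices, sorted_features = group_by_batch(indices, features)
print(sorted_indices[:, 0])   # batch column after regrouping
print(sorted_features[:, 0])  # original row ids in their new order
```

The same permutation must be applied to every tensor that is aligned row by row with the indices, otherwise features and coordinates fall out of sync.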
