
pseudo_lidar_v2's Introduction

End-to-end Pseudo-LiDAR for Image-Based 3D Object Detection

This paper has been accepted to the Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

by Rui Qian*, Divyansh Garg*, Yan Wang*, Yurong You*, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger and Wei-Lun Chao

Citation

@inproceedings{qian2020end,
  title={End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection},
  author={Qian, Rui and Garg, Divyansh and Wang, Yan and You, Yurong and Belongie, Serge and Hariharan, Bharath and Campbell, Mark and Weinberger, Kilian Q and Chao, Wei-Lun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5881--5890},
  year={2020}
}

Abstract

Reliable and accurate 3D object detection is a necessity for safe autonomous driving. Although LiDAR sensors can provide accurate 3D point cloud estimates of the environment, they are also prohibitively expensive for many settings. Recently, the introduction of pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras. PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs. However, so far these two networks have to be trained separately. In this paper, we introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end. The resulting framework is compatible with most state-of-the-art networks for both tasks and in combination with PointRCNN improves over PL consistently across all benchmarks --- yielding the highest entry on the KITTI image-based 3D object detection leaderboard at the time of submission.

Contents

Root
    | PIXOR
    | PointRCNN

We provide end-to-end modifications for a point-cloud-based detector (PointRCNN) and a voxel-based detector (PIXOR).

The PIXOR folder contains the implementation of the Quantization module described in Section 3.1 of the paper, as well as our own implementation of PIXOR.
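For intuition only, the sketch below shows plain, hard (non-differentiable) occupancy quantization of a point cloud into a voxel grid; the Quantization module in this repo is a differentiable, soft variant, and the ranges and voxel size here are made-up values.

import numpy as np

def hard_occupancy_grid(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                        z_range=(-3.0, 1.0), voxel=0.1):
    """Hard-quantize an (N, 3) point cloud into a binary occupancy volume.

    The ranges and the 0.1 m voxel size are illustrative values, not the
    settings used in this repository.
    """
    nx = int((x_range[1] - x_range[0]) / voxel)
    ny = int((y_range[1] - y_range[0]) / voxel)
    nz = int((z_range[1] - z_range[0]) / voxel)
    grid = np.zeros((nx, ny, nz), dtype=np.float32)

    # Keep only points inside the detection range.
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
         (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    p = points[m]

    # Map each point to its voxel index and mark that voxel as occupied.
    ix = ((p[:, 0] - x_range[0]) / voxel).astype(int)
    iy = ((p[:, 1] - y_range[0]) / voxel).astype(int)
    iz = ((p[:, 2] - z_range[0]) / voxel).astype(int)
    grid[ix, iy, iz] = 1.0
    return grid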

The PointRCNN folder contains the implementation of the Subsampling module described in Section 3.2 of the paper. It is developed based on the codebase of Shaoshuai Shi.
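Similarly, the simplest possible form of subsampling is to randomly keep a fixed number of points; the sketch below only conveys that idea and is not the repo's Subsampling module (16384 is just an example size).

import numpy as np

def random_subsample(points, num_keep=16384, seed=None):
    """Randomly keep num_keep points from an (N, C) pseudo-LiDAR cloud.

    If the cloud has fewer than num_keep points, sample with replacement
    so the output size stays fixed.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    idx = rng.choice(n, size=num_keep, replace=(n < num_keep))
    return points[idx]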

Data Preparation

This repo is based on the KITTI dataset. Please download it and prepare the data in the same way as in Pseudo-LiDAR++; refer to its README for more details.
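For reference, converting a depth map into a pseudo-LiDAR point cloud is a standard pinhole back-projection. The sketch below is a generic version rather than the repo's own preprocessing script, and the intrinsics in the usage comment are placeholders that would normally be read from the KITTI calibration files.

import numpy as np

def depth_to_pseudo_lidar(depth, fu, fv, cu, cv):
    """Back-project an (H, W) depth map (in meters) into an (H*W, 3) point cloud.

    (fu, fv) are the focal lengths in pixels and (cu, cv) the principal point.
    Points are returned in the camera frame (x right, y down, z forward).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cu) * z / fu
    y = (v - cv) * z / fv
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Usage with placeholder intrinsics roughly in the range of the KITTI cameras:
# cloud = depth_to_pseudo_lidar(depth_map, fu=721.5, fv=721.5, cu=609.6, cv=172.9)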

Training and Evaluation

Please refer to each subfolder for details.

Questions

This repo is currently maintained by Rui Qian and Yurong You. Please feel free to ask any questions.

You can reach us by opening an issue or by email: [email protected], [email protected]


pseudo_lidar_v2's Issues

Artifacts on LiDAR clouds and depth maps

Hello! Thanks for your work.

I see a strange effect when using your method: the depth maps I get look fine, but when I use the ./src/preprocess/generate_lidar_from_depth.py script to convert the depth maps into LiDAR clouds, I get artifacts near the coordinate center, while other areas look quite good.
Here is an example:
(Two screenshots of the generated point clouds.)

This problem appears both before GDC and after.

Can anyone help me to deal with these artifacts?
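A minimal workaround sketch, assuming the artifacts are points that end up very close to the sensor origin (i.e. pixels with near-zero depth): simply drop points within a small radius after the conversion. The 1.0 m threshold is arbitrary.

import numpy as np

def drop_near_origin(points, min_range=1.0):
    """Remove points within min_range meters of the sensor origin.

    points: (N, 3) or (N, 4) array; min_range is an arbitrary threshold
    that should be tuned for the data at hand.
    """
    dist = np.linalg.norm(points[:, :3], axis=1)
    return points[dist > min_range]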

GPU support for GDC

Thanks for publishing the code.

In the paper, you are rather explicit about running GDC on GPU, and it seems like you can achieve really nice run times. However, it seems that all the GDC code is implemented in numpy without GPU support. Is this correctly understood? And in this case, do you have any plans about publishing GDC with GPU support?

Thank you in advance,
Frederik

The Dataset

Hi, I want to download the dataset, but I have no idea which files should be downloaded, because KITTI and SceneFlow come in many data formats. Do image_2 and image_3 mean the right and left color images?
Also, the SceneFlow dataset is about 200 GB when unzipped; can I just download the example pack and use the pretrained model to test?
I'd really appreciate it if you could tell me how to test on the example dataset. Thank you very much.

Some confusion about the paper

1. Pseudo-LiDAR v1 covers both monocular and stereo methods; why does Pseudo-LiDAR v2 no longer consider the monocular setting? Does the accuracy of the Pseudo-LiDAR series depend on the depth estimation?
2. We found that the depth estimation comes from SDN. What is the difference between SDN depth estimation and the depth from a stereo camera (some stereo cameras, e.g. the ZED camera, can produce a depth map directly through their camera API)?
3. If monocular depth estimation improves significantly (e.g. the Depth Anything model), can PL++ use monocular estimation to obtain pseudo-LiDAR data?

Read the calib file

Hi, thank you for your work. I am generating predictions following your instructions, but I have encountered a problem with the calib files in the KITTI dataset. As you said, the calib files should be the object calib rather than the depth calib, so where can I find the depth calib files? Thanks in advance.

checkpoint.pth.tar

I cannot find checkpoint.pth.tar. What is the function of the checkpoint during inference?

Typo

Thank you very much for sharing the code.
I think there is a typo in the README instructions: "pip install -r ./requirments.txt" should be "pip install -r ./requirements.txt". The letter "e" is missing from "requirments".

Questions about the baseline

Hi, I found that the code uses a fixed 0.54 m baseline for data loading. Is there any difference between using a dynamic baseline and a fixed baseline?
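For context, the baseline only enters through the standard stereo relation depth = focal_length * baseline / disparity. A minimal sketch, where 0.54 m is the fixed KITTI value mentioned above and the focal length is a placeholder from the calibration:

def disparity_to_depth(disparity, focal_px, baseline_m=0.54):
    """Convert disparity (in pixels) to depth (in meters).

    baseline_m=0.54 is the fixed KITTI stereo baseline; focal_px is the
    horizontal focal length in pixels. Zero disparities should be masked
    out beforehand to avoid division by zero.
    """
    return focal_px * baseline_m / disparity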

Apply custom depth map dataset

Hello! May I ask whether I can apply my own depth maps to generate my own pseudo-LiDAR? I have my own depth dataset and would like to use it with your repo to generate pseudo-LiDAR (not nuScenes). Is this possible?
Thank you!

Generated point clouds are shifted !

(Screenshots comparing the generated point clouds with the provided point clouds.)

Hi, I am trying to generate point clouds from depth using the script you provided, but the point clouds do not line up with the bounding boxes the way your provided point clouds do.

Is this because you use GDC, or something else?
I also noticed that the bin files for the same image are different: my generated file is about 4.7 MB while the one you provided is 5.6 MB.

Edit: I even tried the training disparities given in the pseudo-LiDAR repo, but the results were close to my generated point clouds. I also tried GDC, but nothing changed.

tensorflow lite

First, I want to congratulate you on this work.
I would like to know whether you plan to make a version that runs with TensorFlow / TensorFlow Lite.

FPS Reported In Paper

Hello,

Many thanks for this excellent work! I was very impressed by your first Pseudo-LiDAR paper, and your team's idea of including a 4-beam LiDAR just makes sense.

The runtime in the paper is reported at 90 ms/frame on a single GPU but the specs of the GPU are not reported.

Would you mind sharing which GPU you used to obtain the 90 ms/frame reported in the paper?

You mentioned some CUDA improvements; did you run any experiments with them (not reported in the paper)?

I am curious what would be an acceptable frame rate for a real autonomous driving scenario. A single GPU would get roughly 11 frames/second; have you tried multiple GPUs, or anything similar that was left out of the paper to keep things simple?

I reproduced the first PL paper using your AnyNet implementation for disparity estimation and saw a dramatic improvement in processing speed with only a limited (although at times noticeable) difference.

I am looking forward to doing the same here. Hopefully the L# idea can really make up that difference for AnyNet while still being efficient for embedded platforms.

about depth map

May I use a 4-beam LiDAR and a single image to produce a dense depth map, as in the Depth Completion evaluation?

Train and test on Argoverse dataset

Dear authors,

I want to train and test your model on the Argoverse dataset.

Could you let me know the changes that have to be made in the code?

Thanks.

parameters of gdc/sparsify.py

Hi,
I'm wondering about the parameters of gdc/sparsify.py. With the parameters --W 1024 --H 64 --line_spec 5 7 9 11, I can extract a 4-line LiDAR from the velodyne data provided by KITTI.
What if I need an 8-line LiDAR? How can I set the proper parameters to get it?

I'd really appreciate it if you could tell me how to do that.
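Not the repo's exact implementation, but the general idea behind --H and --line_spec is to bin each point by its elevation angle into H rows and keep only the listed rows, so for an 8-line LiDAR one would presumably pass eight row indices to --line_spec. A rough sketch (the vertical field-of-view values are assumptions for the KITTI HDL-64E):

import numpy as np

def keep_beams(points, H=64, line_spec=(5, 7, 9, 11)):
    """Keep only the beams whose row index falls in line_spec.

    points: (N, 4) array of x, y, z, reflectance in the velodyne frame.
    Each point's elevation angle is binned into H rows spanning an assumed
    vertical field of view of roughly +2.0 to -24.9 degrees.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    elev = np.degrees(np.arctan2(z, np.hypot(x, y)))    # elevation angle per point
    fov_up, fov_down = 2.0, -24.9                       # assumed vertical FOV (degrees)
    rows = (fov_up - elev) / (fov_up - fov_down) * H    # row 0 = topmost beam
    rows = np.clip(rows.astype(int), 0, H - 1)
    return points[np.isin(rows, np.asarray(line_spec))]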

How to generate point clouds like the one in the paper

I have generated point clouds from depth maps using generate_lidar_from_depth.py. I visualized the point clouds with mayavi, but they look very different from the ones in the paper. The point clouds I generated have red distortions (inside the red circle in the screenshot below) and some long tails. Does anyone know how to generate the point clouds properly?

(Screenshot of the generated point clouds.)

Question about 3D object detection results

Hello, mileyan!
Thanks for sharing your great code for Pseudo-LiDAR and its ++ version. I am testing your output pseudo-LiDAR on SECOND/PointPillars and other voxel-based 3D object detection methods, but found their results relatively poor compared to Frustum PointNet/PointRCNN and the other methods mentioned in papers 1 and 2.
I wonder whether you have tested your pseudo-LiDAR on these voxel-based 3D object detection methods, and how it turned out in the end?
Thanks a lot!

Sparsify the corrected point clouds

Hello

In the KITTI dataset, the ground-truth LiDAR for each image has about 20,000 points. Taking PSMNet as an example, the pseudo-LiDAR generated for each image has about 200,000 points. I would like to ask: how many points remain per image after you sparsify the pseudo-LiDAR? Thank you very much!

Must the point cloud be sparsified before GDC?

Hello, I am very interested in your article and code, and I have read the code carefully.

When I tried to use the depth map generated from the 64-line ground truth to correct the depth estimation result with GDC and then down-sample, GDC had no effect.

I have to first convert the depth estimation result into pseudo-LiDAR, down-sample it to 64 lines, and then convert it back into a depth map; only then does GDC produce the correct result.

Do we really have to convert the depth map to pseudo-LiDAR, sample the pseudo-LiDAR, and then convert back to a depth map? It feels cumbersome. Can't we apply GDC to the depth map first and then down-sample?

kitti datasets

Hi!
I have a question about python ./src/preprocess/generate_depth_map.py --data_path path-to-KITTI/ --split_file ./split/trainval.txt

Regarding --split_file ./split/trainval.txt: does this mean I need to set up my own validation set from the training set?

A question about the GDC process

According to the README, we should get the ground-truth depth map and the predicted depth map, then run main_batch to obtain the corrected depth map. I don't see where the KITTI 64-beam signal is sparsified in the main_batch program. I want to use the 64-beam signal to correct the depth map predicted from stereo images; how can I do that?

Questions about pre-trained models

Hi, I used your pre-trained model 'sdn_kitti_object.pth', but the results of testing on the train set are as follows:

[2021-12-18 15:26:29 main.py:214] INFO TEST Epoch0:0 L 1.770 RLI 3.965 RLO 0.142 ABS 0.072 SQ 0.443 DEL 0.945 DELQ 0.985 DELC 0.992

I checked the information stored in the pre-trained model, and it reports an RMSE of only 3.0320, which is quite different from the actual test result of 3.965. Do you know what causes this discrepancy?

How much GPU memory is required during inference?

It seems like my 11 GB GTX 1070 is not enough. I already reduced bval to 1.

CUDA out of memory. Tried to allocate 949.38 MiB (GPU 0; 10.92 GiB total capacity; 9.06 GiB already allocated; 365.00 MiB free; 339.49 MiB cached)
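Not specific to this repo, but two generic PyTorch patterns that usually reduce inference memory are disabling autograd and optionally running in half precision. A rough sketch, where the model and input names are placeholders:

import torch

def run_inference(model, left, right, use_fp16=False):
    """Run a stereo network at inference time with reduced GPU memory.

    model, left and right are placeholders for the loaded network and the
    stereo image tensors; this is a generic pattern, not the repo's loop.
    """
    model = model.cuda().eval()
    left, right = left.cuda(), right.cuda()
    if use_fp16:                       # half precision roughly halves activation memory
        model = model.half()
        left, right = left.half(), right.half()
    with torch.no_grad():              # no gradient buffers are kept
        return model(left, right)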
