autoailab / fusiondepth Goto Github PK

Official implementation for paper "Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR"

License: MIT License

Python 98.96% Shell 1.04%

computer-vision computer-science lidar lidar-point-cloud depth-estimation monocular-depth-estimation self-supervised-learning self-driving-car artificial-intelligence convolutional-neural-networks

fusiondepth's Introduction

Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR

This paper has been accepted by CoRL (Conference on Robot Learning) 2021.

By Ziyue Feng, Longlong Jing, Peng Yin, Yingli Tian, and Bing Li.

Arxiv: Link YouTube: link Slides: Link Poster: Link

Abstract

Self-supervised monocular depth prediction provides a cost-effective solution to obtain the 3D location of each pixel. However, the existing approaches usually lead to unsatisfactory accuracy, which is critical for autonomous robots. In this paper, we propose a novel two-stage network to advance the self-supervised monocular dense depth learning by leveraging low-cost sparse (e.g. 4-beam) LiDAR. Unlike the existing methods that use sparse LiDAR mainly in a manner of time-consuming iterative post-processing, our model fuses monocular image features and sparse LiDAR features to predict initial depth maps. Then, an efficient feed-forward refine network is further designed to correct the errors in these initial depth maps in pseudo-3D space with real-time performance. Extensive experiments show that our proposed model significantly outperforms all the state-of-the-art self-supervised methods, as well as the sparse-LiDAR-based methods on both self-supervised monocular depth prediction and completion tasks. With the accurate dense depth prediction, our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% for the downstream task monocular 3D object detection on the KITTI Leaderboard.

⚙️ Setup

You can install the dependencies with:

conda create -n depth python=3.6.6
conda activate depth
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
pip install tensorboardX==1.4
conda install opencv=3.3.1   # just needed for evaluation
pip install open3d
pip install wandb
pip install scikit-image

We ran our experiments with PyTorch 1.8.0, CUDA 11.1, Python 3.6.6 and Ubuntu 18.04.

💾 KITTI Data Prepare

Download Data

You need to first download the KITTI RAW dataset, put in the kitti_data folder. If your data path is different, you can either soft link it as kitti_data or update it into here

Our default settings expect that you have converted the png images to jpeg with this command, which also deletes the raw KITTI .png files:

find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

or you can skip this conversion step and train from raw png files by adding the flag --png when training, at the expense of slower load times.

Preprocess Data

# bash prepare_1beam_data_for_prediction.sh
# bash prepare_2beam_data_for_prediction.sh
# bash prepare_3beam_data_for_prediction.sh
bash prepare_4beam_data_for_prediction.sh
# bash prepare_r100.sh # random sample 100 LiDAR points
# bash prepare_r200.sh # random sample 200 LiDAR points

⏳ Training

By default models and tensorboard event files are saved to log/mdp/.

Depth Prediction:

python trainer.py
python inf_depth_map.py --need_path
python inf_gdc.py
python refiner.py

Depth Completion:

Please first download the KITTI Completion dataset.

python completor.py

Monocular 3D Object Detection:

Please first download the KITTI 3D Detection dataset.

python export_detection.py

Then you can train the PatchNet based on the exported depth maps.

📦 Pretrained model

You can download our pretrained model from the following links: (These are weights for the "Initial Depth" prediction only. Please use the updated data preparation scripts, which will provide better performance than mentioned our paper.)

CNN Backbone	Input size	Initial Depth Eigen Original AbsRel	Link
ResNet 18	640 x 192	0.070	Download 🔗
ResNet 50	640 x 192	0.073	Download 🔗

📊 KITTI evaluation

python evaluate_depth.py
python evaluate_completion.py

python evaluate_depth.py --load_weights_folder log/res18/models/weights_best --eval_mono --nbeams 4 --num_layers 18
python evaluate_depth.py --load_weights_folder log/res50/models/weights_best --eval_mono --nbeams 4 --num_layers 50

Citation

@inproceedings{feng2022advancing,
  title={Advancing self-supervised monocular depth learning with sparse liDAR},
  author={Feng, Ziyue and Jing, Longlong and Yin, Peng and Tian, Yingli and Li, Bing},
  booktitle={Conference on Robot Learning},
  pages={685--694},
  year={2022},
  organization={PMLR}
}

Reference

Our code is developed from the Monodepth2: https://github.com/nianticlabs/monodepth2

Contact

If you have any concern with this paper or implementation, welcome to open an issue or email me at '[email protected]'

fusiondepth's People

Contributors

Stargazers

Watchers

Forkers

weixuwang dzin18 jhan15 russ76 dtbinh luomeiwen dongyanchaotj

fusiondepth's Issues

请问为什么论文里Initial Depth结果与预训练模型结果不一样？

论文中为0.078

Visualization

Hi Ziyue,

Thanks for releasing this fantastic work. I have a quick question about your video demo. Can you give some suggestions about point cloud visualization? like any tools or software etc.

Thanks,
Hang

Error when run evaluation on training result

Thank you for your work!

I have followed the procedure of Preprocess Data and Depth Prediction. However, when I run the evaluation on my model, the error occurred:

Traceback (most recent call last):
  File "evaluate_depth.py", line 510, in <module>
    evaluate(options.parse())
  File "evaluate_depth.py", line 120, in evaluate
    encoder.load_state_dict({k: v for k, v in encoder_dict.items() if k in model_dict})
  File "/home/chenwei/anaconda3/envs/diff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
        size mismatch for encoder.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
        size mismatch for encoder.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
        size mismatch for encoder.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
        size mismatch for encoder.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64,1, 1]).
        size mismatch for encoder.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for encoder.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for encoder.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for encoder.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for encoder.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for encoder.layer3.0.conv1.weight: copying a param with shape torch.Size ([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
        size mismatch for encoder.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
        size mismatch for encoder.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for encoder.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for encoder.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for encoder.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for encoder.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for encoder.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
        size mismatch for encoder.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
        size mismatch for encoder.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for encoder.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for encoder.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for encoder.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for encoder.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3] ).
        size mismatch for encoder.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 512]).

It seems that the format of the weights missmatch the evaluation code. This error also occurred when I tried to run evaluation on the initial model (generated by running python trainer.py). Nevertheless, when I run your pretrained model, everything was OK.

Absolute depth or relative depth?

Hello! Thank you for your splendid work! But I have some questions about what we get is absolute depth or relative depth by using your model?

同样的权重文件训练过程中val的结果与evaluate_depth.py运行的结果不一致？

Can you share the pretrained model

Accurate or normalized depth value in training?

Thanks for your contribution in FusionDepth!

I wonder whether the "depth" in the trainer.py is normalized or the accurate value of the depth in each pixel. When I print them in the training process, the value looks like normalized, which is generally below 1. But in the reprojection procedure, this depth value is used directly.
$T@S_F{IPZ(TMO7H2@GL9`%1$

What's more, when calculating the loss, the depth is multiplied by 26. I have no idea what the "26" means.

Error when run evaluation by pretrained model

Thank you for your interesting work.
I want to confirm the performance of depth completion task. I prepared the pretrained model ResNet50 and validation data(data_depth_selection).
When I run
python evaluate_completion.py --load_weights_folder log/res50/models/weights_best --eval_mono --nbeams 4 --num_layers 50
It returns the following error.

Traceback (most recent call last):
  File "evaluate_completion.py", line 373, in <module>
    evaluate(options.parse())
  File "evaluate_completion.py", line 174, in evaluate
    output = depth_decoder(features, beam_features=beam_features)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/localDisk/yangjunjie/scripts/3d/FusionDepth/networks/depth_decoder.py", line 70, in forward
    x = input_features[-1] + beam_features[-1]
RuntimeError: The size of tensor a (20) must match the size of tensor b (38) at non-singleton dimension 3

It seems that input image is resized while the corresponding depth map keeps the same.

运行refiner.py时loss为nan

作者你好，很感谢你的工作，当我运行到refiner.py时我发现loss为nan，经过debug我发现主要是因为gdc_loss为nan，这是我各个变量的值，希望你能帮助我找出问题

The evaluation_depth.py is missing

Hello, I want to follow your work, but I can't find any files named evaluation_depth.py, could you help me
?

the quantitative comparsion in your paper

ziyue 你好。
你的论文里标表1 中，比如Dorn &BTS的结果从哪里得到的啊？
我在他们的论文里没有找到。
你是重新在kitti eigen split上训练这些网络了吗？

Tensor size matching error

Dear authors, thanks for the great work! I'm trying to train with a custom dataset containing images and Pseudo Dense Representations Generation of size H * W * 1 and have changed the Resnet encoder dimension from 2 to 1 accordingly. However, I'm getting RuntimeError: The size of tensor a (10) must match the size of tensor b (15) at non-singleton dimension 3 at x = input_features[-1] + beam_features[-1] in depth_decoder.py. I guess it's related to scaling as the Pseudo Dense Representations Generation has the original scale while for image it's scaled down. However in your original inputs["2channel"] = self.load_4beam_2channel(folder, frame_index, side, do_flip) it seems that there's no scaling down involved. Do you have any idea what might be the issue? Thanks!