ICRA 2019 "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera"

License: MIT License

Python 90.61% Shell 9.39%

depth-estimation depth-completion depth-prediction lidar deep-learning pytorch computer-vision self-supervised-learning kitti-dataset

self-supervised-depth-completion's Issues

About extracting trained model

When I downloaded the trained model, I could not extract the 'tar' file. I was wondering if there is something wrong with your 'tar' file.

Updated data_structure

Hey @fangchangma, I find your work really interesting and I am evaluating your approach for single-view depth completion.

I have a question about the data structure that you updated on October 1.
Initially, the data structure was data/kitti_depth and data/kitti_rgb.

In your updated data structure, you have added two new subfolders under data: data/data_depth_velodyne and data/depth_selection.

Can you explain where we should download the data for these subfolders from?

Pretrained model got poor result (RMSE=1343.609)

Hi @fangchangma Thanks for sharing the code. I evaluated the pretrained model provided in readme. The result is not as good as reported in the paper (rmse 1343 vs 814). It was a clean clone and I followed the data folder structure. I attached the command and the screenshot of the results. Please let me know if there is an error or if I missed something here. Thank you.

python main.py --evaluate pretrain/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=16.wd=0.pretrained=False.jitter=0.1.time=2019-02-26@07-50/model_best.pth.tar

=> output: ../results/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=16.wd=0.pretrained=False.jitter=0.1.time=2019-05-08@10-21
Val Epoch: 8 [990/1000]	lr=0 t_Data=0.001(0.001) t_GPU=0.014(0.023)
	RMSE=1086.59(1347.03) MAE=308.10(359.76) iRMSE=4.29(4.27) iMAE=1.50(1.64)
	silog=4.67(5.24) squared_rel=0.00(0.01) Delta1=0.994(0.992) REL=0.018(0.020)
	Lg10=0.007(0.008) Photometric=0.000(0.000) 
=> output: ../results/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=16.wd=0.pretrained=False.jitter=0.1.time=2019-05-08@10-21
Val Epoch: 8 [1000/1000]	lr=0 t_Data=0.001(0.001) t_GPU=0.014(0.023)
	RMSE=1005.17(1343.61) MAE=262.70(358.79) iRMSE=4.57(4.28) iMAE=1.57(1.64)
	silog=4.98(5.24) squared_rel=0.00(0.01) Delta1=0.993(0.992) REL=0.018(0.020)
	Lg10=0.007(0.008) Photometric=0.000(0.000) 
*
Summary of  val round
RMSE=1343.609
MAE=358.790
Photo=0.000
iRMSE=4.277
iMAE=1.642
squared_rel=0.006554501281207195
silog=5.2404233943858145
Delta1=0.992
REL=0.020
Lg10=0.008
t_GPU=0.023
(best rmse is 1343.609)
*

How can I get the result in your paper?

The best result reported in the paper is an RMSE of 814. How can I reproduce it?

question about depth-estimation results

Hi, great papers, thanks a lot for sharing!

I have a question: have you evaluated your latest work in a depth-estimation (not completion) setup, either unsupervised or supervised?

Any thoughts on how SOTA depth-estimation methods would compare to your work?

Thanks a lot!
Z.

dataset

For the KITTI dataset, how should the data be laid out in the data folder, with kitti_depth and kitti_rgb as its sub-folders? I only want to test the pre-trained model. Please tell me which data I need.

cuda memory problem

RuntimeError: CUDA out of memory. Tried to allocate 52.25 MiB (GPU 0; 7.92 GiB total capacity; 6.71 GiB already allocated; 53.94 MiB free; 35.80 MiB cached)
  File "main.py", line 247, in <module>
    main()
  File "main.py", line 230, in main
    result, is_best = iterate("val", args, val_loader, model, None, logger, checkpoint['epoch'])
  File "main.py", line 105, in iterate
    pred = model(batch_data)
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jadoo/dense_lidar/self-supervised-depth-completion/model.py", line 119, in forward
    conv3 = self.conv3(conv2) # batchsize * ? * 176 * 608
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torchvision/models/resnet.py", line 45, in forward
    out = self.conv1(x)
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)

When will the dataset script be made public?

Hello, I would like to know when the dataset script will be made public.

Save output depth map

Hey,
I want to save the depth maps estimated by your approach. How can I do that? Right now, if I run

python3 main.py --evaluate /home/username/Downloads/model_best.pth.tar --val select

I only get the error metrics printed in the terminal and a summary in the result folder. I want to access each estimated depth map. Is there a way to do that?

Best

Error when import the pre-train model

I downloaded the model trained with semi-dense LiDAR ground truth from the "supervised model" link.

When I run

torch.load('supervised/model_best.pth.tar')

I get this error:

Traceback (most recent call last):
  File "model.py", line 173, in <module>
    main()
  File "model.py", line 164, in main
    torch.load('supervised/model_best.pth.tar')
  File "/home/S/.local/lib/python3.5/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/home/S/.local/lib/python3.5/site-packages/torch/serialization.py", line 538, in _load
    result = unpickler.load()
ImportError: No module named 'metrics'

Could you tell me why?
Thanks~
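
The ImportError above shows that the unpickler tried to import a module named `metrics` while reading the checkpoint, which suggests the checkpoint stores an object whose class lives in the repository's metrics.py. A minimal sketch of one possible workaround, assuming the loading script runs from outside the repository; the helper name `add_repo_to_path` and the repository path are hypothetical:

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put repo_dir first on sys.path so `import metrics` can resolve.

    repo_dir is assumed to be the directory that contains metrics.py.
    """
    resolved = str(Path(repo_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical usage; the repository path below is a placeholder.
# add_repo_to_path("/path/to/self-supervised-depth-completion")
# import torch
# checkpoint = torch.load("supervised/model_best.pth.tar", map_location="cpu")
```

Running the loading script from inside the repository directory should have the same effect, since the script's own directory is placed on the import path.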

The result cannot be reproduced

Hi, fangchangma.
Thank you for your nice work. But I can't seem to reproduce the self-supervised results. Could you provide your training log and detailed training plan and hyperparameter configuration?
Looking forward to your reply.

Why is self-supervised worse than supervised?

On page 9 of the paper figure 6b, on the rightmost point, the self-supervised method receives semi-dense lidar ground truth, which is no longer "sparse depth loss"; I don't understand why it performs worse than the supervised method which has the same ground truth supervision. The self-supervised one has additional losses such as photometric loss, etc, so it should at least perform as well as the supervised one in my opinion.

How do you explain this?

More Information

Hi, I'm curious about your work. I have already read your paper. Is there anything new?
When will you update the repository?

About batch_size and cuda memory

Hello,
When I train the model, I run into a "CUDA out of memory" error. I tried reducing the batch size, but it seems the batch size should not be too small for this work. Can you give me some advice on the minimum batch size?

What is the network used for single d?

It seems the code has no network option for the single "d" input that is mentioned in the paper. Is that right?

silog error measurement

Hi,
Is the "silog" measurement calculated in metrics.py the scale-invariant error proposed by Eigen et al. in https://arxiv.org/pdf/1406.2283.pdf?

thanks.
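
For reference, the scale-invariant error of Eigen et al. is defined on log-depth differences d_i = log(pred_i) - log(target_i) as mean(d^2) - lam * mean(d)^2. The sketch below is a plain-Python illustration of that definition, not a copy of the repository's metrics code; taking the square root and scaling by 100 is an assumption that mirrors how the KITTI benchmark reports SILog, and the function name is hypothetical.

```python
import math


def silog(pred, target, lam=1.0, scale=100.0):
    """Scale-invariant log error over valid pixels (positive values).

    With lam=1 this is the standard deviation of the log differences.
    scale=100 is an assumption that mirrors the KITTI reporting unit.
    """
    diffs = [math.log(p) - math.log(t)
             for p, t in zip(pred, target) if t > 0 and p > 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("no valid pixels")
    mean_sq = sum(d * d for d in diffs) / n
    mean = sum(diffs) / n
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(mean_sq - lam * mean * mean, 0.0)) * scale
```

With lam=1, multiplying every prediction by the same constant leaves the value unchanged, which is the "scale-invariant" property.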

inference

What should I do if I only want to use your code for a few images?

Use Stereo Pair Instead of Temporal Pair for Self-Supervised Training?

Hi @fangchangma,

Thank you for your impressive work. I have one question regarding your training setting.
In the paper, you use photometric consistency between the current frame and the next frame as the cue for self-supervised training.
Following a similar idea, a stereo pair could provide the same kind of supervision by enforcing photometric consistency between the left and right images, right?
Have you tried training this way? I think it is computationally cheaper.
I am looking forward to your reply.

Best,

training with vlp-16 dataset

Hi Fangchang,

Our lab is currently working on a project that requires generating depth maps from our VLP-16 LiDAR and camera setup. Your work looks like a great solution for the depth maps. Since our input images have a different size, I think that to use this network we need to (1) read in our own calibration information (K) and (2) crop the input images so that both width and height are multiples of 16 (we got errors in the decoder layers with other sizes). Is that right?

We've tested with a rather small dataset (only ~700 frames) and got results like the figure shown below.
We are wondering whether the dataset is too small or the depth from the VLP-16 is too sparse, since the results still show clearly visible projected scan lines. It would be great if you have any suggestions, thanks!
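
The cropping step (2) can be sketched in plain Python. This is only an illustration under assumptions: the function names are hypothetical, the multiple of 16 is taken from the post above, and keeping the bottom rows with a horizontally centered window is one possible placement, not necessarily what the repository does.

```python
def crop_size_to_multiple(height, width, multiple=16):
    """Largest (height, width) not exceeding the input that is divisible
    by `multiple`."""
    if height < multiple or width < multiple:
        raise ValueError("image is smaller than the required multiple")
    return (height // multiple) * multiple, (width // multiple) * multiple


def bottom_center_crop_box(height, width, multiple=16):
    """Crop box (top, left, new_h, new_w) that keeps the bottom rows and
    the horizontal center; the placement is an illustrative choice."""
    new_h, new_w = crop_size_to_multiple(height, width, multiple)
    top = height - new_h
    left = (width - new_w) // 2
    return top, left, new_h, new_w
```

Any crop like this also shifts the optical center in K by the (top, left) offset, so the calibration has to be updated together with the image.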

Error while loading "calib_cam_to_cam.txt" - can not reshape the array.

I am using your pre-trained models for testing on the validation set "val_selection_cropped". But while loading "calib_cam_to_cam.txt" I get an error. I am attaching a screenshot of the error.

I am running the script with this command in the terminal:

python3 /scratch/gjain2s/Approaches/sparse_dense/self-supervised-depth-completion/main.py --data-folder /scratch/gjain2s/Approaches/sparse_dense/data --evaluate /home/gjain2s/self_supervised/model_best.pth.tar --val select

Any help would be highly appreciated.

python main.py --evaluate [checkpoint-path]

An error occurred when I entered this parameter on the command line. What should we pass as [checkpoint-path]? Can you give us an example?

Namespace(batch_size=1, criterion='l2', epochs=11, evaluate='[checkpoint-path]', input='gd', jitter=0.1, layers=34, lr=1e-05, pretrained=False, print_freq=10, rank_metric='rmse', result='..\results', resume='', start_epoch=0, train_mode='dense', use_d=True, use_g=True, use_pose=False, use_rgb=False, val='select', w1=0, w2=0, weight_decay=0, workers=4)
=> no model found at '[checkpoint-path]'

Training doesn't converge

Hi Fangchang:

Thank you so much for sharing this great project!

I have tested your pre-trained self-supervised model. Its RMSE is around 1300, which matches your paper.
But when I try to train the model with this command:
python main.py --train-mode sparse+photo
on 2 Tesla V100 GPUs for around 15 epochs, it only converges to an RMSE of ~8k-9k and never improves further. I didn't change any hyperparameters in your code; only the batch size is smaller than the one you mentioned (8).

Are there any parameters or options I need to change from this Github repo? Or do you have any suggestions on training?

Thank you so much!

Sincerely,
Ziyue Feng

Mismatch between comment and code

Hi,
Thank you for sharing your code with us! I am trying to evaluate the method on our own dataset. We gathered larger images and thus have to crop/resize them. When looking at the code, the comment in kitti_loader.py states:

# note: we will take the center crop of the images during augmentation
# that changes the optical centers, but not focal lengths
https://github.com/fangchangma/self-supervised-depth-completion/blob/master/dataloaders/kitti_loader.py#L29

The optical center is then adjusted. However, in lines 145 and 168, a bottom crop is applied to the images. Thus, if I understand the code correctly, the full crop distance has to be subtracted from the vertical coordinate of the optical center.

Can you check if my understanding in this regard is correct?

Kind regards,
Chris
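
The geometry behind this question can be sketched as follows. This is a minimal illustration with hypothetical function names, not the repository's loader code: a pure crop leaves the focal lengths unchanged and shifts the optical center by however many rows and columns are removed before the kept window, so a bottom crop removes the full height difference from cy, while a center crop removes half of it.

```python
def adjust_principal_point(cx, cy, top, left):
    """Shift the optical center by the crop offset (in pixels).

    `left` columns and `top` rows are removed before the kept window,
    so the principal point moves by the same amounts.
    """
    return cx - left, cy - top


def crop_offsets(orig_h, orig_w, new_h, new_w, vertical="center"):
    """Offsets (top, left) of a horizontally centered crop.

    vertical="center" splits the removed rows evenly; vertical="bottom"
    keeps the bottom rows, so all removed rows are at the top.
    """
    left = (orig_w - new_w) / 2.0
    if vertical == "center":
        top = (orig_h - new_h) / 2.0
    elif vertical == "bottom":
        top = float(orig_h - new_h)
    else:
        raise ValueError("vertical must be 'center' or 'bottom'")
    return top, left
```

For an assumed 375x1242 image cropped to 352x1216, the center crop gives top = 11.5 and the bottom crop gives top = 23.0, so the two conventions differ by 11.5 pixels in cy.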

About result visualization

Hello,
The evaluation results include a visualization, as shown in the picture. I don't know what the fourth column is. It seems to be the semi-dense depth map from the annotations, but I used the self-supervised mode (sparse+photo), which should not use annotation data. Can you clarify? Thank you.

Trained Models Links Not Working

The links
http://datasets.lids.mit.edu/self-supervised-depth-completion
http://datasets.lids.mit.edu/self-supervised-depth-completion/supervised/
http://datasets.lids.mit.edu/self-supervised-depth-completion/self-supervised/

aren't working.

Too many warnings

When I train the network, the following warning is raised:
[W IndexingUtils.h:20] Warning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (function expandTensors)
Can you tell me how to fix it ?

Running Error in train mode sparse+photo

Hi, Fangchang Ma:
After downloading the required dataset and placing it in the tree structure shown in the README, I tried to run the demo with the command "python main.py --train-mode sparse+photo -b 6". It fails with the following error:

'''
=> output: ../results/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=6.wd=0.pretrained=False.jitter=0.1.time=2021-05-10@12-35
Train Epoch: 0 [9290/14317] lr=1e-05 t_Data=0.009(0.008) t_GPU=0.395(0.402)
RMSE=3337.93(10104.38) MAE=1233.75(5881.38) iRMSE=11.95(inf) iMAE=5.93(inf)
silog=11.96(nan) squared_rel=0.02(0.24) Delta1=0.954(0.600) REL=0.066(0.333)
Lg10=0.029(nan) Photometric=39.012(56.714)

Traceback (most recent call last):
  File "main.py", line 362, in <module>
    main()
  File "main.py", line 349, in main
    epoch) # train for one epoch
  File "main.py", line 171, in iterate
    for i, batch_data in enumerate(loader):
  File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
    return self._process_data(data)
  File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hsg/data/proj/SSDC/self-supervised-depth-completion/dataloaders/kitti_loader.py", line 306, in __getitem__
    rgb, sparse, target, rgb_near = self.__getraw__(index)
  File "/home/hsg/data/proj/SSDC/self-supervised-depth-completion/dataloaders/kitti_loader.py", line 300, in __getraw__
    self.paths['gt'][index] is not None else None
  File "/home/hsg/data/proj/SSDC/self-supervised-depth-completion/dataloaders/kitti_loader.py", line 156, in depth_read
    depth_png = np.array(img_file, dtype=int)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'PngImageFile'
'''

Could anyone help me to solve the problem?
Or provide some suggestion?

Thanks!

How to test with KITTI object detection dataset

I am trying to use your code with KITTI object detection dataset.

1. Generate depth map by projecting the Lidar points to image plane
2. Put image_2 and generated depth map to test_depth_completion_anonymous folder
3. Run with test_completion mode

However, I got the following error

Traceback (most recent call last):
  File "main.py", line 248, in <module>
    main()
  File "main.py", line 231, in main
    result, is_best = iterate("test_completion", args, val_loader, model, None, logger, checkpoint['epoch'])
  File "main.py", line 105, in iterate
    pred = model(batch_data)
  File "/data/ssd/public/jlliu/pythonlib/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ssd/public/jlliu/pythonlib/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/data/ssd/public/jlliu/pythonlib/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ssd/public/jlliu/depth_completion/self-supervised-depth-completion/model.py", line 126, in forward
    y = torch.cat((convt5, conv5), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 47 and 48 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83

Did I miss something?
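
The "47 and 48" in the error is what size arithmetic would produce for an input height of 375 pixels. The sketch below is only an illustration under assumptions that are not confirmed in this thread: four stride-2 stages in the encoder, 3x3 convolutions with padding 1, transposed convolutions that exactly double the size, and 375 as the image height of the object detection data.

```python
def down(n):
    """Output length of a 3x3 convolution with stride 2 and padding 1."""
    return (n + 2 * 1 - 3) // 2 + 1


def up(n):
    """Output length of a transposed convolution that exactly doubles n
    (for example kernel 3, stride 2, padding 1, output_padding 1)."""
    return (n - 1) * 2 - 2 * 1 + 3 + 1


def skip_sizes(height, num_down=4):
    """Return (encoder_size, decoder_size) at the deepest skip connection."""
    sizes = [height]
    for _ in range(num_down):
        sizes.append(down(sizes[-1]))
    encoder = sizes[-2]      # feature map one level above the bottleneck
    decoder = up(sizes[-1])  # bottleneck upsampled by a factor of 2
    return encoder, decoder
```

Under these assumptions a height of 375 gives mismatched sizes at the skip connection, while heights divisible by 16, such as 352 or 368, give matching sizes, so cropping the inputs is one way to avoid the concatenation error.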

Some questions about the details of the code

I notice that the model.py includes such a line:

I'm wondering why it multiplies by 100.

colorize the depth map

Hello fangchangma,
Could you please tell me how to colorize the depth maps as in your paper?
I get a gray depth image and don't know how to compare it with your results, which are colorized like the monodepth results.

Creating the data folder structure

@fangchangma I have two questions

I downloaded the KITTI dataset, which has a folder structure like this

Now, do I have to delete |-- val folder and replace with the following?

If my assumption is correct, then the above steps will produce a directory structure like this

Where can I get the kitti_rgb data? Do I need to download it from here for all the drives that are in the KITTI depth dataset? If yes, then do I need to copy the same folder again?

some problem about photometric_loss

Hi,

Thanks for open-sourcing this great piece of work!

I am trying to run your code and ran into some questions. In main.py, loss2 (photometric_loss) is computed from rgb_curr_ (the image of the current frame) and warped_ (the image of the neighboring frame predicted from the current frame image), filtered through the mask. My questions are:

1. Is my understanding of rgb_curr_ and warped_ correct? If not, I would appreciate your corrections.
2. Why is photometric_loss computed from the current frame and the predicted neighboring frame, rather than from the current frame and a predicted current frame?
3. In your paper, RGB images are used as guidance for depth prediction. Is RGB guidance the only way to compute photometric_loss? If not, do you have any suggestions?

I have just come into contact with this knowledge, and there may be something wrong. I hope you forgive me.

Thanks for the help!
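
The geometric step that a photometric loss of this kind relies on can be sketched for a single pixel. This is a generic pinhole-camera illustration, not the code in main.py: the function name, the intrinsics tuple, and the pose convention (R and t map current-frame coordinates into the nearby frame) are all assumptions. Each pixel of the current frame is back-projected with its predicted depth, moved into the nearby camera frame, and projected again; the nearby image is then sampled at that location and compared with the current image.

```python
def warp_pixel(u, v, depth, K, R, t):
    """Project pixel (u, v) of the current frame into the nearby frame.

    K = (fx, fy, cx, cy); R is a 3x3 rotation (list of rows) and t a
    3-vector that map current-frame coordinates to the nearby frame.
    """
    fx, fy, cx, cy = K
    # Back-project to a 3D point in the current camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    p = (x, y, depth)
    # Rigid transform into the nearby camera frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        return None  # the point is behind the nearby camera
    # Perspective projection onto the nearby image plane.
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy
```

With an identity pose the pixel maps to itself, and a wrong depth moves the sampled location, which is what makes the image difference a usable training signal for depth.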

input GD better than RGBD

Hello, in my tests the gd input performs better than the rgbd input. I want to know whether this is expected.

Why do you need to multiply the result of final conv by 100?

At the end of model.py:


        if self.training:
            return 100 * y
        else:
            min_distance = 0.9
            return F.relu(100 * y - min_distance) + min_distance # the minimum range of Velodyne is around 3 feet ~= 0.9m

Could you tell me what the 100 means?
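
The snippet above does not say why the factor 100 was chosen, so the sketch below only illustrates the evaluation branch: `relu(value - m) + m` is the same as `max(value, m)`, that is, a lower clamp of the scaled output at the minimum distance. The function names are hypothetical and the scalar version stands in for the tensor operation.

```python
def relu(x):
    """Scalar stand-in for F.relu."""
    return x if x > 0.0 else 0.0


def clamp_min_via_relu(value, min_distance=0.9):
    """relu(value - m) + m equals max(value, m): a lower clamp at m."""
    return relu(value - min_distance) + min_distance
```

For example, a raw network output of 0.004 becomes 100 * 0.004 = 0.4 and is then raised to 0.9 at evaluation time, while values above 0.9 pass through unchanged.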

The dataset used

Hi, I wonder how much data you used. Did you use the whole KITTI depth completion dataset, with about 85898 training samples, or just one sequence such as 2011_09_26, to train the model?
Also, the images fed to the network are 352x1216. Given limited hardware, is it appropriate to downsample to 176x608 or 88x304 before feeding the network?
Is it necessary to do data augmentation, which is not mentioned in your paper?

RuntimeError

When I try to run with the test_completion split, I get the following error:

Traceback (most recent call last):
  File "main.py", line 248, in <module>
    main()
  File "main.py", line 231, in main
    result, is_best = iterate("test_completion", args, val_loader, model, None, logger, checkpoint['epoch'])
  File "main.py", line 105, in iterate
    pred = model(batch_data)
  File "/data/ssd/public/jlliu/pythonlib/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ssd/public/jlliu/pythonlib/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/data/ssd/public/jlliu/pythonlib/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ssd/public/jlliu/self-supervised-depth-completion/model.py", line 126, in forward
    y = torch.cat((convt5, conv5), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 47 and 48 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83

Any update on the release?

Hi Fangchang,
Wondering whether there is any update on the release? Is it possible to release partial implementation of the paper, for example, training and evaluation on the supervised learning? Also releasing the best model of your network will be very helpful.

Thanks!

About dataset

Hello,
This is a great project, I am very interested in it, but I found that there is no data set that can be used directly. Can you share it?
Thank you

Depth from image

How is depth image generated from .png files for velodyne scans?
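
For context, the KITTI depth development kit stores depth maps as 16-bit PNG files in which a value of 0 means "no measurement" and any other value is the depth in meters multiplied by 256. A minimal sketch of that convention on plain lists of pixel values; the function names are hypothetical, and reading or writing the PNG file itself is left out:

```python
def decode_kitti_depth(raw_values):
    """Convert raw 16-bit PNG values to meters; 0 marks 'no measurement'."""
    return [v / 256.0 if v > 0 else None for v in raw_values]


def encode_kitti_depth(depth_m):
    """Inverse mapping, e.g. for saving predictions in the same convention."""
    out = []
    for d in depth_m:
        if d is None or d <= 0:
            out.append(0)
        else:
            # Clip to the largest value a 16-bit image can hold.
            out.append(min(int(round(d * 256.0)), 65535))
    return out
```

The sparse input maps are obtained by projecting the LiDAR points into the image with the calibration files and storing them in this same format.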

About reduce scan lines of LiDAR depth

How do you reduce the number of scan lines in the LiDAR depth (64-line to 32-line, and so on)? Could you share your code or your idea for this?

Clip output in model.py

Hey,

Why are you clipping the output of the model during eval (0.9m) but not during training?

Thanks

AttributeError

'Namespace' object has no attribute 'data_folder'

The above error occurred during the test.

What should be fixed?

Why I can't get the result when using the trained model you provided?

Why I can't get the result when using the trained model you provided?
The result:
`

RMSE=1000.051
MAE=437.180
Photo=0.000
iRMSE=4.515
iMAE=2.634
squared_rel=0.002914395970059559
silog=3.7430007658768316
Delta1=0.996
REL=0.028
Lg10=0.012
t_GPU=0.018
`

Some Question about '6.4 On Input Sparsity' in your ICRA paper

Hello!

Thank you for your great work. While reading the paper, I had a question about Section 6, On Input Sparsity.

In Figure 6 you show results for the self-supervised framework, where 'using both RGB and sparse depth yields the same level of accuracy as using sparse depth only'.

Could you tell me if the following guess is correct?

When we have only sparse LiDAR input, only the 'depth loss' and 'smoothness loss' are used during training, and the network architecture in Figure 2 has only the 32-channel LiDAR input.

If my guess is right, the input in your paper degenerates to the same input as Sparsity Invariant CNNs (LiDAR only). But in this case, your network gets better results. So how do you show that the improvement comes from the self-supervised framework or the photometric loss, rather than from your network being optimized for LiDAR?

Thank you for your help!

How can I avoid this? When I am not running your code, GPU usage is low (approx. 500 MB). It seems strange that an 11 GB GPU is not enough for your code.

I use the following command:

python3 main.py --evaluate /home/username/Downloads/model_best.pth.tar --val select

I have not changed anything in any file.
