fangchangma / self-supervised-depth-completion
ICRA 2019 "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera"
License: MIT License
When I downloaded the trained model, I could not extract the 'tar' file. I was wondering if there is something wrong with your 'tar' file.
Hey @fangchangma! I find your work really interesting, and I am evaluating your approach for the task of single-view depth completion.
I have a query regarding the data structure that you updated on the 1st of October.
Initially, the data structure was data/kitti_depth and data/kitti_rgb.
In your updated data structure, you have included two new subfolders in data: data/data_depth_velodyne and data/depth_selection.
Can you explain where we need to download the data for the respective subfolders from?
Hi @fangchangma Thanks for sharing the code. I evaluated the pretrained model provided in readme. The result is not as good as reported in the paper (rmse 1343 vs 814). It was a clean clone and I followed the data folder structure. I attached the command and the screenshot of the results. Please let me know if there is an error or if I missed something here. Thank you.
python main.py --evaluate pretrain/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=16.wd=0.pretrained=False.jitter=0.1.time=2019-02-26@07-50/model_best.pth.tar
=> output: ../results/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=16.wd=0.pretrained=False.jitter=0.1.time=2019-05-08@10-21
Val Epoch: 8 [990/1000] lr=0 t_Data=0.001(0.001) t_GPU=0.014(0.023)
RMSE=1086.59(1347.03) MAE=308.10(359.76) iRMSE=4.29(4.27) iMAE=1.50(1.64)
silog=4.67(5.24) squared_rel=0.00(0.01) Delta1=0.994(0.992) REL=0.018(0.020)
Lg10=0.007(0.008) Photometric=0.000(0.000)
=> output: ../results/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=16.wd=0.pretrained=False.jitter=0.1.time=2019-05-08@10-21
Val Epoch: 8 [1000/1000] lr=0 t_Data=0.001(0.001) t_GPU=0.014(0.023)
RMSE=1005.17(1343.61) MAE=262.70(358.79) iRMSE=4.57(4.28) iMAE=1.57(1.64)
silog=4.98(5.24) squared_rel=0.00(0.01) Delta1=0.993(0.992) REL=0.018(0.020)
Lg10=0.007(0.008) Photometric=0.000(0.000)
*
Summary of val round
RMSE=1343.609
MAE=358.790
Photo=0.000
iRMSE=4.277
iMAE=1.642
squared_rel=0.006554501281207195
silog=5.2404233943858145
Delta1=0.992
REL=0.020
Lg10=0.008
t_GPU=0.023
(best rmse is 1343.609)
*
In the paper, the best result is 814(rmse), how can I get it?
Hi, great papers, thanks a lot for sharing!
I have a question - have you done any evaluations of your latest work, but in a depth-estimation (not completion) setup? unsupervised or supervised?
any idea/thoughts of how such SOTA methods would compare to your work?
Thanks a lot!
Z.
From the KITTI dataset, how should the data be distributed in the data folder, with kitti_depth and kitti_rgb as its sub-folders? I only want to test the pre-trained model. Please tell me what data I need.
RuntimeError: CUDA out of memory. Tried to allocate 52.25 MiB (GPU 0; 7.92 GiB total capacity; 6.71 GiB already allocated; 53.94 MiB free; 35.80 MiB cached)
File "main.py", line 247, in <module>
main()
File "main.py", line 230, in main
result, is_best = iterate("val", args, val_loader, model, None, logger, checkpoint['epoch'])
File "main.py", line 105, in iterate
pred = model(batch_data)
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jadoo/dense_lidar/self-supervised-depth-completion/model.py", line 119, in forward
conv3 = self.conv3(conv2) # batchsize * ? * 176 * 608
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torchvision/models/resnet.py", line 45, in forward
out = self.conv1(x)
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jadoo/pytorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
Hello, I would like to know when the dataset script will be made public.
Hey,
I want to save the depth maps estimated by your approach. How is that possible? Right now, if I run
python3 main.py --evaluate /home/username/Downloads/model_best.pth.tar --val select
I just get the error metrics printed in the terminal and a summary in the result folder. I want to access each estimated depth map. How is that possible?
Best
I downloaded the model trained with semi-dense lidar ground truth (the supervised model).
When I run
torch.load('supervised/model_best.pth.tar')
I get this error:
Traceback (most recent call last):
File "model.py", line 173, in <module>
main()
File "model.py", line 164, in main
torch.load('supervised/model_best.pth.tar')
File "/home/S/.local/lib/python3.5/site-packages/torch/serialization.py", line 367, in load
return _load(f, map_location, pickle_module)
File "/home/S/.local/lib/python3.5/site-packages/torch/serialization.py", line 538, in _load
result = unpickler.load()
ImportError: No module named 'metrics'
Could you tell me why?
Thanks~
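A note on the ImportError above: a pickled object stores only a reference to the module that defines its class, not the class itself, so loading fails whenever that module cannot be imported. The sketch below reproduces this with the standard `pickle` module, which `torch.load` uses by default. It assumes (not confirmed in this thread) that the checkpoint holds an object defined in the repository's `metrics.py`. The names `metrics_demo` and `Result` are hypothetical stand-ins.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for a module such as the repository's metrics.py.
MODULE_SOURCE = "class Result:\n    def __init__(self):\n        self.rmse = 0.0\n"


def demo():
    """Pickle an object, then unpickle it with and without its module on sys.path."""
    with tempfile.TemporaryDirectory() as repo_dir:
        with open(os.path.join(repo_dir, "metrics_demo.py"), "w") as f:
            f.write(MODULE_SOURCE)

        # Saving side: the defining module is importable, so pickling works.
        sys.path.insert(0, repo_dir)
        importlib.invalidate_caches()
        module = importlib.import_module("metrics_demo")
        payload = pickle.dumps({"best_result": module.Result()})

        # Loading side, outside the repository: the module cannot be found.
        sys.path.remove(repo_dir)
        del sys.modules["metrics_demo"]
        importlib.invalidate_caches()
        try:
            pickle.loads(payload)
            failed_without_module = False
        except ImportError:
            failed_without_module = True

        # Fix: make the defining module importable again before loading.
        sys.path.insert(0, repo_dir)
        importlib.invalidate_caches()
        restored = pickle.loads(payload)

        # Clean up the temporary import path.
        sys.path.remove(repo_dir)
        del sys.modules["metrics_demo"]
        return failed_without_module, type(restored["best_result"]).__name__
```

If that assumption holds, running the loading script from inside the repository directory, or adding that directory to `sys.path` first, should let the checkpoint load.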
Hi, fangchangma.
Thank you for your nice work. But I can't seem to reproduce the self-supervised results. Could you provide your training log and detailed training plan and hyperparameter configuration?
Looking forward to your reply.
On page 9 of the paper figure 6b, on the rightmost point, the self-supervised method receives semi-dense lidar ground truth, which is no longer "sparse depth loss"; I don't understand why it performs worse than the supervised method which has the same ground truth supervision. The self-supervised one has additional losses such as photometric loss, etc, so it should at least perform as well as the supervised one in my opinion.
How do you explain this?
Hi, I'm curious about your work. I have already read your paper. Is there anything new?
When will you update the repository?
Hello,
When I train the model, I run into a "CUDA: out of memory" problem. I tried reducing the batch size, but it seems the batch size should not be too small for this work. Can you give me some advice about the minimum batch size?
It seems there is no network option for depth-only input (d), as mentioned in the paper?
Hi,
Is the "silog" measurement calculated in metrics.py the scale-invariant error proposed by Eigen et al. in https://arxiv.org/pdf/1406.2283.pdf?
thanks.
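For reference, here is a minimal sketch of the scale-invariant log error from Eigen et al., alongside the square-root, times-100 form commonly reported as SILog on the KITTI benchmark. Whether the repository's metrics code matches either form exactly is an assumption that should be checked against the source. The function names are illustrative.

```python
import math


def _log_differences(pred, target):
    """Per-pixel d_i = log(pred_i) - log(target_i) over pixels where both are positive."""
    diffs = [math.log(p) - math.log(t) for p, t in zip(pred, target) if p > 0 and t > 0]
    if not diffs:
        raise ValueError("no valid (positive) depth pairs")
    return diffs


def eigen_scale_invariant_error(pred, target, lam=1.0):
    """mean(d^2) - lam * mean(d)^2; lam=1 makes the error invariant to a global scale."""
    d = _log_differences(pred, target)
    mean_sq = sum(x * x for x in d) / len(d)
    mean = sum(d) / len(d)
    return mean_sq - lam * mean * mean


def kitti_style_silog(pred, target):
    """100 * sqrt(mean(d^2) - mean(d)^2), the form assumed here for benchmark reporting."""
    value = eigen_scale_invariant_error(pred, target, lam=1.0)
    return 100.0 * math.sqrt(max(value, 0.0))  # guard against tiny negative round-off
```

With `lam=1.0`, multiplying every prediction by the same constant leaves the error at zero (up to round-off), which is the property the name refers to.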
What should I do if I only want to use your code for a few images?
Hi @fangchangma,
Thank you for your impressive work. I just have one question regarding your training setting.
In the paper, you used photometric consistency between the current frame and the next frame as the cue for self-supervised training.
Following a similar idea, a stereo pair could provide the same supervision by enforcing photometric consistency between the left and right images, right?
Have you tried training this way? I think it would be computationally cheaper.
I am looking forward to your reply.
Best,
Hi Fangchang,
Our lab is currently working on a project which requires generating depth maps from our VLP-16 lidar and camera setup. Your work looks great as the depth map solution. Since our input images have a different size, I think what we need to do to use this network is (1) read in our own calibration information (K) and (2) crop the input images so that width and height are both multiples of 16 (since we got errors in the decoder layers with some other sizes). Is that right?
We've tested with a rather small dataset (only ~700 frames) and got results like the figure shown below.
We are wondering if the dataset is too small or the depth info from the VLP-16 is too sparse, since the results still show clearly projected scan lines. It would be great if you have any suggestions, thanks!
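As a small illustration of step (2), the sketch below computes the largest crop whose height and width are both multiples of 16. The factor 16 is taken from the post above, not verified against the model code. The function name and the 375x1242 example size are illustrative assumptions.

```python
def largest_valid_crop(height, width, multiple=16):
    """Largest (crop_h, crop_w) not exceeding the image with both sides divisible by `multiple`."""
    crop_h = (height // multiple) * multiple
    crop_w = (width // multiple) * multiple
    if crop_h == 0 or crop_w == 0:
        raise ValueError("image is smaller than the required multiple")
    return crop_h, crop_w


# Illustrative input size only; substitute the size of your own images.
example_crop = largest_valid_crop(375, 1242)
```

Any crop also moves the principal point in K by the crop offset, so the intrinsics have to be updated together with the image.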
I am using your pre-trained models for testing on the validation set "val_selection_cropped". But errors occur while loading "calib_cam_to_cam.txt". I am attaching a screenshot of the error.
I am running the script with this command in the terminal:
python3 /scratch/gjain2s/Approaches/sparse_dense/self-supervised-depth-completion/main.py --data-folder /scratch/gjain2s/Approaches/sparse_dense/data --evaluate /home/gjain2s/self_supervised/model_best.pth.tar --val select
Any help would be highly appreciated.
An error occurred when I entered the parameter below on the command line. What should we pass as [checkpoint-path]? Can you give us an example?
Namespace(batch_size=1, criterion='l2', epochs=11, evaluate='[checkpoint-path]', input='gd', jitter=0.1, layers=34, lr=1e-05, pretrained=False, print_freq=10, rank_metric='rmse', result='..\results', resume='', start_epoch=0, train_mode='dense', use_d=True, use_g=True, use_pose=False, use_rgb=False, val='select', w1=0, w2=0, weight_decay=0, workers=4)
=> no model found at '[checkpoint-path]'
Hi Fangchang:
Thank you so much for sharing this great project!
I have tested your pre-trained self-supervised model; its RMSE is around 1300, which matches your paper.
But when I try to train the model with this command:
python main.py --train-mode sparse+photo
on 2 Tesla V100 GPUs for around 15 epochs, it only converges to an RMSE of ~8k-9k and never improves further. I didn't change any hyperparameters from your code; only the batch size is smaller than you mentioned (8).
Are there any parameters or options I need to change from this Github repo? Or do you have any suggestions on training?
Thank you so much!
Sincerely,
Ziyue Feng
Hi,
Thank you for sharing your code with us! I am trying to evaluate the method on our own dataset. We gathered larger images and thus have to crop/resize them. When looking at the code, the comment in kitti_loader.py states:
# note: we will take the center crop of the images during augmentation
# that changes the optical centers, but not focal lengths
https://github.com/fangchangma/self-supervised-depth-completion/blob/master/dataloaders/kitti_loader.py#L29
The optical center is then adjusted. However, in lines 145 and 168, a bottom crop is applied to the images. Thus, if I understand the code correctly, the full crop distance has to be subtracted from the vertical coordinate of the optical center.
Can you check if my understanding in this regard is correct?
Kind regards,
Chris
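To make the difference concrete, the sketch below compares the principal-point shift for a center crop and a bottom crop. It assumes the bottom crop is horizontally centered and removes rows only from the top, which should be checked against the repository's transform. All names and numbers are illustrative.

```python
def crop_offsets(height, width, crop_h, crop_w, mode="center"):
    """Return (top, left) of the crop window in original-image pixel coordinates."""
    left = (width - crop_w) // 2  # assumed horizontally centered in both modes
    if mode == "center":
        top = (height - crop_h) // 2
    elif mode == "bottom":
        top = height - crop_h  # keep the bottom rows, drop everything above
    else:
        raise ValueError("mode must be 'center' or 'bottom'")
    return top, left


def shift_principal_point(cx, cy, top, left):
    """Cropping translates the principal point by the crop offset; focal lengths are unchanged."""
    return cx - left, cy - top


# Illustrative values: a 375x1242 image cropped to 352x1216.
center = shift_principal_point(600.0, 180.0, *crop_offsets(375, 1242, 352, 1216, "center"))
bottom = shift_principal_point(600.0, 180.0, *crop_offsets(375, 1242, 352, 1216, "bottom"))
```

Under these assumptions a bottom crop subtracts the full height difference from cy, whereas a center crop subtracts only half of it, which is consistent with the reading in the post above.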
Hello,
In the evaluation results, I found that there is content about the visualization of the results, as shown in the picture. What I don't know is what the fourth column is. It seems to be a semi-dense depth map of the annotations, but I used the self-supervised mode(sparse+photo), which should not use annotation data. Can you answer my doubts? Thank you.
When I train the net, the following warning is raised:
[W IndexingUtils.h:20] Warning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (function expandTensors)
Can you tell me how to fix it?
Hi, Fangchang Ma:
After downloading the required dataset and arranging it in the tree structure shown in the readme, I tried to run the demo using the command "python main.py --train-mode sparse+photo -b 6". It fails with the following error:
'''
=> output: ../results/mode=sparse+photo.w1=0.1.w2=0.1.input=gd.resnet34.criterion=l2.lr=1e-05.bs=6.wd=0.pretrained=False.jitter=0.1.time=2021-05-10@12-35
Train Epoch: 0 [9290/14317] lr=1e-05 t_Data=0.009(0.008) t_GPU=0.395(0.402)
RMSE=3337.93(10104.38) MAE=1233.75(5881.38) iRMSE=11.95(inf) iMAE=5.93(inf)
silog=11.96(nan) squared_rel=0.02(0.24) Delta1=0.954(0.600) REL=0.066(0.333)
Lg10=0.029(nan) Photometric=39.012(56.714)
Traceback (most recent call last):
File "main.py", line 362, in <module>
main()
File "main.py", line 349, in main
epoch) # train for one epoch
File "main.py", line 171, in iterate
for i, batch_data in enumerate(loader):
File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
return self._process_data(data)
File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/hsg/data/software/anaconda/envs/hsg/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/hsg/data/proj/SSDC/self-supervised-depth-completion/dataloaders/kitti_loader.py", line 306, in __getitem__
rgb, sparse, target, rgb_near = self.__getraw__(index)
File "/home/hsg/data/proj/SSDC/self-supervised-depth-completion/dataloaders/kitti_loader.py", line 300, in __getraw__
self.paths['gt'][index] is not None else None
File "/home/hsg/data/proj/SSDC/self-supervised-depth-completion/dataloaders/kitti_loader.py", line 156, in depth_read
depth_png = np.array(img_file, dtype=int)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'PngImageFile'
'''
Could anyone help me to solve the problem?
Or provide some suggestion?
Thanks!
I am trying to use your code with KITTI object detection dataset.
test_depth_completion_anonymous folder
However, I got the following error:
Traceback (most recent call last):
File "main.py", line 248, in <module>
main()
File "main.py", line 231, in main
result, is_best = iterate("test_completion", args, val_loader, model, None, logger, checkpoint['epoch'])
File "main.py", line 105, in iterate
pred = model(batch_data)
File "/data/ssd/public/jlliu/pythonlib/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/data/ssd/public/jlliu/pythonlib/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/data/ssd/public/jlliu/pythonlib/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/data/ssd/public/jlliu/depth_completion/self-supervised-depth-completion/model.py", line 126, in forward
y = torch.cat((convt5, conv5), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 47 and 48 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83
Did I miss something?
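One plausible reading of the "Got 47 and 48" mismatch is an input height that does not survive repeated halving. The sketch below traces feature-map heights under two assumptions not confirmed in this thread: the encoder has four stride-2 stages (kernel 3, padding 1), and each decoder stage exactly doubles its input. The function names are illustrative.

```python
def encoder_heights(height, stages=4):
    """Heights after each stride-2 conv (kernel 3, padding 1): n -> floor((n - 1) / 2) + 1."""
    sizes = [height]
    for _ in range(stages):
        sizes.append((sizes[-1] - 1) // 2 + 1)
    return sizes


def first_skip_mismatch(height, stages=4):
    """Return (upsampled, skip) at the first decoder stage whose sizes differ, else None."""
    sizes = encoder_heights(height, stages)
    for level in range(stages, 0, -1):
        upsampled = 2 * sizes[level]  # assumed: the decoder doubles the spatial size
        skip = sizes[level - 1]       # encoder feature concatenated at this stage
        if upsampled != skip:
            return upsampled, skip
    return None
```

Under these assumptions a 375-pixel-high image shrinks to 188, 94, 47, 24, and the first upsampling gives 48 against a skip connection of 47. A height of 352 divides cleanly, so cropping to a multiple of 16 before inference would avoid the mismatch.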
Hello fangchangma,
Would you please tell me how to colorize the depth maps as in your paper?
I get a gray depth image and don't know how to compare it with your results, which are colorized like the monodepth results.
@fangchangma I have two questions
Now, do I have to delete |-- val folder and replace with the following?
if my assumption is correct then the above steps will make dir structure like this
Hi,
Thanks for open-sourcing this great piece of work!
I am trying to implement your code and ran into a problem. I see that when main.py calculates loss2 (photometric_loss), you use rgb_curr_ (the image of the current frame) and warped_ (the image of the neighboring frame predicted from the current frame image), filtered through the mask, to calculate the photometric_loss. So my questions are:
Is my understanding of rgb_curr_ and warped_ correct? If not, I hope to get your corrections.
Why use the current frame and predicted neighboring frame images to calculate photometric_loss, instead of using the current frame and predicted current frame to calculate photometric_loss.
In your paper, I have seen guidance on using RGB images for depth prediction, Is it the only way to calculate the photometric_loss using the RGB guide? If not, I hope you can give me your advice. Do you have any suggestions?
I have just come into contact with this knowledge, and there may be something wrong. I hope you forgive me.
Thanks for the help!
Hello, in my tests the gd input performs better than the rgbd input. I would like to know whether this is expected.
At the end of model.py:
if self.training:
    return 100 * y
else:
    min_distance = 0.9
    return F.relu(100 * y - min_distance) + min_distance # the minimum range of Velodyne is around 3 feet ~= 0.9m
Could you tell me what the 100 means?
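On the evaluation branch quoted above: relu(x - m) + m is just another way of writing max(x, m), so predictions below min_distance are raised to 0.9 and everything else passes through unchanged. The plain-Python sketch below checks that identity on scalars. It says nothing about why the output is scaled by 100. The function names are illustrative.

```python
def relu(x):
    """Scalar ReLU."""
    return max(x, 0.0)


def clamp_min_via_relu(depth, min_distance=0.9):
    """The eval-branch expression applied to a single scaled depth value."""
    return relu(depth - min_distance) + min_distance


def clamp_min_direct(depth, min_distance=0.9):
    """Equivalent formulation: a lower bound on the prediction."""
    return max(depth, min_distance)
```

Both functions agree up to floating-point round-off for any input.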
Hi, I wonder how much data you used: the whole KITTI depth completion dataset with about 85898 training samples, or just one sequence such as 2011_09_26?
The image size fed to the network is 352x1216. Given hardware limits, is it appropriate to downsample to 176x608 or 88x304 before feeding the network?
Is it necessary to do data augmentation, which is not mentioned in your paper?
When I try to run with the test_completion split, I get the following error:
Traceback (most recent call last):
File "main.py", line 248, in <module>
main()
File "main.py", line 231, in main
result, is_best = iterate("test_completion", args, val_loader, model, None, logger, checkpoint['epoch'])
File "main.py", line 105, in iterate
pred = model(batch_data)
File "/data/ssd/public/jlliu/pythonlib/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/data/ssd/public/jlliu/pythonlib/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/data/ssd/public/jlliu/pythonlib/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/data/ssd/public/jlliu/self-supervised-depth-completion/model.py", line 126, in forward
y = torch.cat((convt5, conv5), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 47 and 48 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83
Hi Fangchang,
Wondering whether there is any update on the release? Is it possible to release partial implementation of the paper, for example, training and evaluation on the supervised learning? Also releasing the best model of your network will be very helpful.
Thanks!
Hello,
This is a great project, I am very interested in it, but I found that there is no data set that can be used directly. Can you share it?
Thank you
How is depth image generated from .png files for velodyne scans?
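For the PNG encoding itself, the KITTI depth development kit stores depth as 16-bit integers, where depth in meters is the pixel value divided by 256 and a value of 0 marks a pixel with no measurement. The sketch below applies that convention to plain lists of pixel values. Reading the PNG file is left out, and the function names are illustrative.

```python
def kitti_png_to_meters(raw_values):
    """Convert 16-bit KITTI depth PNG values to meters; 0 means 'no measurement' -> None."""
    return [value / 256.0 if value > 0 else None for value in raw_values]


def meters_to_kitti_png(depths_m):
    """Inverse mapping for saving: None becomes 0, valid depths are scaled and clipped to uint16."""
    encoded = []
    for depth in depths_m:
        if depth is None or depth <= 0:
            encoded.append(0)
        else:
            encoded.append(min(int(round(depth * 256.0)), 65535))
    return encoded
```

The same scaling applies when writing predicted depth maps back to disk in the benchmark format.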
How do you reduce the scan lines of the LiDAR depth(64-line to 32-line...)? Could you share your code or your idea about this?
Hey,
Why are you clipping the output of the model during eval (0.9m) but not during training?
Thanks
'Namespace' object has no attribute 'data_folder'
The above error occurred during the test.
What should be fixed?
Why can't I reproduce the reported result when using the trained model you provided?
The result:
`
RMSE=1000.051
MAE=437.180
Photo=0.000
iRMSE=4.515
iMAE=2.634
squared_rel=0.002914395970059559
silog=3.7430007658768316
Delta1=0.996
REL=0.028
Lg10=0.012
t_GPU=0.018
`
Hello!
Thank you for your great work. While reading the paper, I had a question about Section 6, "On Input Sparsity".
In Figure 6 you show the results when training with the self-supervised framework: 'using both RGB and sparse depth yields the same level of accuracy as using sparse depth only'.
Could you tell me if the following guess is correct?
When we have only sparse LiDAR input, we have only the 'depth loss' and 'smoothness loss' during training, and the network architecture in Figure 2 only has the 32-channel LiDAR input.
If my guess is right, the input in your paper degenerates to the same as in Sparsity Invariant CNNs (LiDAR only). But in this case, your network gets better results. So how do you show that this is due to the self-supervised framework or the photometric loss function, and not because your network is optimized for LiDAR?
Thank you for your help!
When running main.py, I get the following error in line 262
Error message: Invalid syntax end='')
I'm not sure why it doesn't like the end='') part of the code
The scripts in the download folder only extract data_rgb. How can I extract the other files?
Hi,
Great work! I wonder which tools you are using for solving PnP with RANSAC to estimate the camera pose. Could you please provide a snippet of this part of the code?
Your paper is really interesting, and I have tried implementing the network following your descriptions there.
For the supervised network, I was wondering if you have used any data augmentation during training. If so, what kind? Also, do you normalize the depths in any way (both the input and the ground truth)?
Hey,
I just want to use your pretrained model and create some results (on val_selection_cropped). For some strange reason I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 210.00 MiB (GPU 0; 10.91 GiB total capacity; 8.95 GiB already allocated; 194.06 MiB free; 9.59 GiB reserved in total by PyTorch)
How can I avoid this? When I don't run your code, the GPU usage is low (approx. 500 MB). It looks strange that an 11 GB GPU is not enough for your code.
I use the following command:
python3 main.py --evaluate /home/username/Downloads/model_best.pth.tar --val select
I have not changed anything in any file.