weiyithu / surrounddepth
[CoRL 2022] SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
License: MIT License
Hi, I have a question about the units of the metrics. Is the RMSE for nuScenes in meters?
Hi author, I tried training with your model, but the number of training samples on the DDAD dataset seems to be only 12319 instead of 12650. Is this normal? I counted the images in my downloaded copy of DDAD and confirmed it is complete (99600 images). Did you remove static frames, as with the nuScenes dataset? The index.pkl you provide contains only 16421 images; were the other 79 removed manually, and if so, why?
Also, regarding model testing: does SurroundDepth not set aside a separate test set?
Looking forward to your reply!
Best wishes!
Hi weiyi,
nice job!
I have some questions.
When you use the intrinsics matrix on the nuScenes dataset, I found that you do not normalize the intrinsics matrix K.
In Monodepth2, the intrinsics matrix is normalized by the original image size, so
K = np.array([[0.58, 0,    0.5, 0],
              [0,    1.92, 0.5, 0],
              [0,    0,    1,   0],
              [0,    0,    0,   1]], dtype=np.float32)
contains only small values.
But in your work, K is used directly without any change.
I wonder whether this way is OK?
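For context, here is a minimal sketch of what Monodepth2-style normalization does (helper names are assumptions, not SurroundDepth code): K is stored with the first two rows divided by the original image width/height, then rescaled per pyramid level by multiplying back, which is why its entries look small (0.58, 1.92, 0.5).

```python
import numpy as np

def normalize_intrinsics(K, width, height):
    """Divide fx, cx by width and fy, cy by height (pixel units -> fractions)."""
    K = K.astype(np.float32).copy()
    K[0, :] /= width   # fx, cx become fractions of the image width
    K[1, :] /= height  # fy, cy become fractions of the image height
    return K

def denormalize_intrinsics(K_norm, width, height):
    """Recover a pixel-unit K for an image of the given (possibly resized) size."""
    K = K_norm.copy()
    K[0, :] *= width
    K[1, :] *= height
    return K

# Example: a pixel-unit K for a 1600x900 nuScenes front camera (values illustrative)
K = np.array([[1266.0, 0, 800.0, 0],
              [0, 1266.0, 450.0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=np.float32)
K_norm = normalize_intrinsics(K, 1600, 900)
```

Using un-normalized pixel-unit intrinsics is also valid, as long as they match the resolution the images are actually resized to; the normalization is just Monodepth2's way of making one stored K serve every scale.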
Hi, thank you for the nice work!
I'm confused about the 'focal_scale'. Why do we need to do this:
Lines 382 to 383 in 22dfecf
Hello! I am following your work and doing a reproduction, but I hit the errors below while using the command python -m torch.distributed.launch --nproc_per_node 8 run.py --model_name ddad --config configs/ddad.txt
for distributed training on the DDAD dataset.
[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1806986 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
After training for a while, the process is automatically shut down for running overtime.
Are there any details or training settings that I have missed? Or does the torch version matter?
Thanks!
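One hedged workaround (not a confirmed fix from the authors): the log shows the NCCL watchdog killing the job after the default 30-minute collective timeout, so the timeout can be raised when the process group is initialized. The underlying cause, such as one rank stalling on slow data loading or an uneven data shard, is still worth finding, since a longer timeout only hides it.

```python
from datetime import timedelta

# The default process-group timeout is 30 minutes, matching the
# Timeout(ms)=1800000 in the error log above. A larger value can be
# passed to torch.distributed.init_process_group, e.g.
#
#   torch.distributed.init_process_group(backend="nccl",
#                                        timeout=timedelta(hours=2))
#
NCCL_DEFAULT_TIMEOUT = timedelta(minutes=30)
NCCL_RAISED_TIMEOUT = timedelta(hours=2)
```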
Hi, Thanks a lot for the nice work!
I have a basic question about the disp to depth as follows:
def disp_to_depth(disp, min_depth, max_depth):
    """Convert network's sigmoid output into depth prediction
    The formula for this conversion is given in the 'additional considerations'
    section of the paper.
    """
    min_disp = 1 / max_depth
    max_disp = 1 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp  # disp in [0, 1] -> [1/max_depth, 1/min_depth]
    depth = 1 / scaled_disp  # depth in [min_depth, max_depth]
    return scaled_disp, depth
May I ask the reason for converting to depth this way, rather than simply using disp * max_depth? Thanks a lot for your help!
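A self-contained numeric check may help (defaults assumed here: min_depth=0.1, max_depth=100, as in Monodepth2; the function is repeated so the check runs on its own). The sigmoid output disp in [0, 1] is mapped linearly onto disparity [1/max_depth, 1/min_depth] and then inverted, so the depth is bounded in [min_depth, max_depth]. Using disp * max_depth instead would send disp=0 to depth=0, which is unbounded in inverse depth and blows up the reprojection.

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    # Linearly map sigmoid output to disparity, then invert to depth.
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp

print(disp_to_depth(0.0))  # farthest: ~max_depth (100.0)
print(disp_to_depth(1.0))  # nearest:  ~min_depth (0.1)
```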
python export_gt_depth_nusc.py val
The depth maps occupy more than 200 GB. Can the ground-truth depth maps take up less space? I want to use depth-map supervision, but
python export_gt_depth_nusc.py train
will make my storage explode.
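One hedged storage option (not part of the repo): instead of float32 arrays, sparse GT depth can be quantized to uint16 following the KITTI convention (depth in meters times 256) and saved as losslessly compressed PNGs. That halves the bits per pixel, and PNG compresses the overwhelmingly zero (unprojected) pixels of LiDAR depth maps very well.

```python
import numpy as np

def encode_depth_uint16(depth_m, max_depth=255.0):
    """Quantize metric depth (meters) to uint16 at 1/256 m resolution."""
    d = np.clip(depth_m, 0.0, max_depth)
    return np.round(d * 256.0).astype(np.uint16)

def decode_depth_uint16(depth_u16):
    """Recover metric depth in meters from the uint16 encoding."""
    return depth_u16.astype(np.float32) / 256.0
```

The quantization error is at most about 2 mm, far below LiDAR noise, and the encoded array can be written as a 16-bit PNG with e.g. Pillow or OpenCV.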
Hi, I tried to evaluate nuScenes validation set performance with your released model, the nusc_scale model. Since this model should be scale-aware, I expected the scale-aware evaluation results to be similar to the ones in README.md, i.e.
type | dataset | Abs Rel | Sq Rel | delta < 1.25 |
scale-aware | nuScenes | 0.280 | 4.401 | 0.661 |
However, the result turns out worse than expected: with the scale-aware model, the scale-ambiguous evaluation actually scores better than the scale-aware evaluation. The relevant output is shown below.
Loading depth weights...
Loading encoder weights...
Training model named: nusc_scale
There are 20096 training items and 6019 validation items
median: 0.33512431383132935
-> Evaluating 1
scale-ambiguous evaluation:
front
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.262 & 2.961 & 10.989 & 0.398 & 0.527 & 0.791 & 0.889 \\
front_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.301 & 2.457 & 7.887 & 0.398 & 0.532 & 0.788 & 0.893 \\
back_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.289 & 2.230 & 7.050 & 0.386 & 0.578 & 0.799 & 0.895 \\
back
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.329 & 3.483 & 11.492 & 0.477 & 0.405 & 0.712 & 0.850 \\
back_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.298 & 2.418 & 7.371 & 0.405 & 0.556 & 0.789 & 0.887 \\
front_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.305 & 2.714 & 8.292 & 0.417 & 0.530 & 0.778 & 0.882 \\
all
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.297 & 2.711 & 8.847 & 0.413 & 0.522 & 0.776 & 0.883 \\
scale-aware evaluation:
front
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 1.976 & 42.443 & 22.277 & 1.090 & 0.048 & 0.107 & 0.194 \\
front_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.300 & 49.173 & 21.861 & 1.169 & 0.036 & 0.081 & 0.152 \\
back_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.392 & 51.237 & 21.708 & 1.186 & 0.029 & 0.070 & 0.135 \\
back
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 1.679 & 26.246 & 15.413 & 0.992 & 0.092 & 0.190 & 0.295 \\
back_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.569 & 61.392 & 22.064 & 1.214 & 0.031 & 0.075 & 0.143 \\
front_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.476 & 57.699 & 22.465 & 1.206 & 0.037 & 0.087 & 0.154 \\
all
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.232 & 48.032 & 20.965 & 1.143 & 0.046 & 0.102 & 0.179 \\
I exported the GT using tools/export_gt_depth_nusc.py with val, and used configs/nusc_scale_pretrain.txt for evaluation (the config stays the same except that I changed min_depth to 0.5).
Is this reasonable, or am I using it wrong somewhere? Thank you.
Hello, your results are great, and I'm following your work.
I have a problem reproducing the Monodepth2 baseline on the nuScenes dataset: the abs_rel I get is as high as 0.35, and the training curve is very strange.
I use the Monodepth2 code, and your code to process the dataset.
How can I reproduce the Monodepth2 results in your paper? Are there any details that need special attention?
Thanks!
Hi, it's great work! Can it be deployed on an embedded system such as the NVIDIA Jetson AGX?
Hi,
I found that the number of training samples is 20096, but the official nuScenes dataset has 28119.
I am confused why the training number is smaller than in the official nuScenes dataset.
Another question: how is train.txt generated?
Thank you for your help.
Thanks for your nice work! I want to know what outputs[("cam_T_cam", 0, f_i)] means. Is it the transformation matrix from pose id=0 to id=f_i, or from id=f_i to id=0? Thanks in advance!
While training with scale-aware pretraining, I am getting NaN values in the losses. What could be the possible issue? Without scale-aware pretraining the losses seem fine.
Hi, thanks for sharing your great work. While reading the code, I noticed you get the adjacent frames via these lines: index_temporal_i = cam_sample['prev'] and index_temporal_i = cam_sample['next']
I'm curious whether the image obtained this way is a keyframe or a sweep frame.
Also, how do you remove the static scenes?
Thanks a lot and looking forward to your reply.
Hello, thank you for your great work. I was wondering how long it took to train this network and what was the model and quantity of the GPU(s) used. I would greatly appreciate it if you could reply!
Basically, we use self-supervised methods to train depth prediction models. Have you tried combining self-supervision with supervised training? The nuScenes and DDAD datasets do provide sparse point clouds. I tried this but got bad performance and could not figure out what led to that result.
Hi, I am confused: does the cx in the intrinsics matrix need to change to w - cx when the image is horizontally flipped?
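For reference, a minimal sketch of the adjustment (an assumed helper, not from the repo): with 0-indexed pixel centers, a horizontal flip of an image of width W maps column u to W - 1 - u, so cx must become W - 1 - cx for the projection to stay consistent with the flipped image (some conventions use W - cx); fx, fy, and cy are unchanged.

```python
import numpy as np

def flip_intrinsics_horizontal(K, width):
    """Adjust the principal point for a horizontally flipped image."""
    K = K.copy()
    K[0, 2] = width - 1 - K[0, 2]  # cx -> W - 1 - cx
    return K
```

With normalized intrinsics this reduces to cx -> 1 - cx, which is why Monodepth2 can skip the adjustment entirely: it hard-codes cx = 0.5, the exact image center, which the flip maps to itself.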
Dear author:
Thank you very much for your contributions in this paper!
I tried to get the depth during evaluation. Theoretically, we should get a metric depth between min_depth and max_depth (in meters) after the function 'disp_to_depth'. However, I inspected several groups of depths generated this way, and they do not look like correct metric depths. See below:
(Pdb) pred_depth
array([[0.7076706 , 0.7073359 , 0.70599675, ..., 0.60197115, 0.59946537, 0.5988389 ],
[0.70776784, 0.7074405 , 0.7061312 , ..., 0.60220945, 0.599669 , 0.59903383],
[0.7082062 , 0.70791256, 0.706738 , ..., 0.603285 , 0.60058796, 0.5999137 ],
...,
[0.11618532, 0.1162222 , 0.11636975, ..., 0.11287601, 0.11278087, 0.11275709],
[0.11617843, 0.11621682, 0.11637037, ..., 0.11286871, 0.11277094, 0.11274651],
[0.1161769 , 0.11621563, 0.11637051, ..., 0.11286709, 0.11276875, 0.11274416]], dtype=float32)
The depth values are very small compared to real depth; many of them are less than 1.0. So how can I get the metric depth?
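For context, a hedged sketch of the standard remedy (names illustrative, not the repo's exact evaluation code): predictions from a scale-ambiguous self-supervised model are rescaled per image by the median ratio between ground truth and prediction before metrics are computed; only the scale-aware models are expected to output metric depth directly.

```python
import numpy as np

def median_scale(pred_depth, gt_depth, min_depth=0.1, max_depth=80.0):
    """Rescale a scale-ambiguous prediction by the median GT/pred ratio."""
    mask = (gt_depth > min_depth) & (gt_depth < max_depth)  # valid GT pixels only
    ratio = np.median(gt_depth[mask]) / np.median(pred_depth[mask])
    return pred_depth * ratio
```

After this rescaling, the absolute values are in meters for evaluation purposes, but the scale is borrowed from the ground truth rather than predicted.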
Hi!
I followed the README to prepare the DDAD data, but after I run the SIFT and match steps, the contents of the sift and match folders are incomplete. There should be folders 000 to 199, yet my sift folder only contains 000 to 106 and my match folder only contains 001 to 144. The depth folder has a similar problem: its numbered folders are complete (000 to 199), but some files are empty. Why does this happen?
Hi,
I use the command "python -m torch.distributed.launch --nproc_per_node 4 run.py --model_name test --config configs/nusc.txt --models_to_load depth encoder --load_weights_folder=/log/nusc/model/weights/ --save_pred_disps --eval_out_dir=/log/nusc/eval/ --eval_only"
but there is no image output: the eval folder in the log directory is empty. I'd like to ask how to visualize my evaluation results.
Thanks.