weiyithu / surrounddepth
[CoRL 2022] SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
License: MIT License
Hi, I have a question about the units of the metrics. Is the RMSE for nuScenes in meters?
Hi author, I tried training with your model, but the number of training samples on the DDAD dataset seems to be only 12319 instead of 12650. Is this normal? I counted the images in my downloaded copy of DDAD and confirmed it is complete (99600 images). Did you remove static frames, as with the nuScenes dataset? The index.pkl you provide contains only 16421 images; were the other 79 removed manually, and if so, why?
Also, regarding model testing: does SurroundDepth not set aside a separate test set?
Looking forward to your reply!
Best wishes!
Hi weiyi,
nice job!
I have some questions.
When you use the intrinsics matrix on the nuScenes dataset, I found that you do not normalize the intrinsics matrix K.
In Monodepth2, the intrinsics matrix is normalized by the original image size, so
K = np.array([[0.58, 0,    0.5, 0],
              [0,    1.92, 0.5, 0],
              [0,    0,    1,   0],
              [0,    0,    0,   1]], dtype=np.float32)
contains only small values.
But in your work, K is used directly without any change.
I wonder whether this way is OK?
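For context, here is a minimal sketch of what Monodepth2-style normalization does (helper names are assumptions, not SurroundDepth code): K is stored with the first two rows divided by the original image width/height, then rescaled per pyramid level by multiplying back, which is why its entries look small (0.58, 1.92, 0.5).

```python
import numpy as np

def normalize_intrinsics(K, width, height):
    """Divide fx, cx by width and fy, cy by height (pixel units -> fractions)."""
    K = K.astype(np.float32).copy()
    K[0, :] /= width   # fx, cx become fractions of the image width
    K[1, :] /= height  # fy, cy become fractions of the image height
    return K

def denormalize_intrinsics(K_norm, width, height):
    """Recover a pixel-unit K for an image of the given (possibly resized) size."""
    K = K_norm.copy()
    K[0, :] *= width
    K[1, :] *= height
    return K

# Example: a pixel-unit K for a 1600x900 nuScenes front camera (values illustrative)
K = np.array([[1266.0, 0, 800.0, 0],
              [0, 1266.0, 450.0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=np.float32)
K_norm = normalize_intrinsics(K, 1600, 900)
```

Using un-normalized pixel-unit intrinsics is also valid, as long as they match the resolution the images are actually resized to; the normalization is just Monodepth2's way of making one stored K serve every scale.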
Hi, thank you for the nice work!
I'm confused about the 'focal_scale'. Why do we need to do this:
Lines 382 to 383 in 22dfecf
Hello! I am following your work and doing a reproduction, but I hit the errors below while using the command python -m torch.distributed.launch --nproc_per_node 8 run.py --model_name ddad --config configs/ddad.txt
for distributed training on the DDAD dataset.
[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1806986 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
After training for a while, the process is automatically shut down for running overtime.
Are there any details or training settings that I have missed? Or does the torch version matter?
Thanks!
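One hedged workaround (not a confirmed fix from the authors): the log shows the NCCL watchdog killing the job after the default 30-minute collective timeout, so the timeout can be raised when the process group is initialized. The underlying cause, such as one rank stalling on slow data loading or an uneven data shard, is still worth finding, since a longer timeout only hides it.

```python
from datetime import timedelta

# The default process-group timeout is 30 minutes, matching the
# Timeout(ms)=1800000 in the error log above. A larger value can be
# passed to torch.distributed.init_process_group, e.g.
#
#   torch.distributed.init_process_group(backend="nccl",
#                                        timeout=timedelta(hours=2))
#
NCCL_DEFAULT_TIMEOUT = timedelta(minutes=30)
NCCL_RAISED_TIMEOUT = timedelta(hours=2)
```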
Hi, Thanks a lot for the nice work!
I have a basic question about the disp to depth as follows:
def disp_to_depth(disp, min_depth, max_depth):
    """Convert network's sigmoid output into depth prediction
    The formula for this conversion is given in the 'additional considerations'
    section of the paper.
    """
    min_disp = 1 / max_depth
    max_disp = 1 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp  # disp in [0, 1] -> [1/max_depth, 1/min_depth]
    depth = 1 / scaled_disp  # depth in [min_depth, max_depth]
    return scaled_disp, depth
May I ask the reason for converting to depth this way, rather than simply using disp * max_depth? Thanks a lot for your help!
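A self-contained numeric check may help (defaults assumed here: min_depth=0.1, max_depth=100, as in Monodepth2; the function is repeated so the check runs on its own). The sigmoid output disp in [0, 1] is mapped linearly onto disparity [1/max_depth, 1/min_depth] and then inverted, so the depth is bounded in [min_depth, max_depth]. Using disp * max_depth instead would send disp=0 to depth=0, which is unbounded in inverse depth and blows up the reprojection.

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    # Linearly map sigmoid output to disparity, then invert to depth.
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp

print(disp_to_depth(0.0))  # farthest: ~max_depth (100.0)
print(disp_to_depth(1.0))  # nearest:  ~min_depth (0.1)
```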
python export_gt_depth_nusc.py val
The depth maps occupy more than 200 GB. Can the ground-truth depth maps take up less space? I want to use depth-map supervision, but
python export_gt_depth_nusc.py train
will make my storage explode.
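One hedged storage option (not part of the repo): instead of float32 arrays, sparse GT depth can be quantized to uint16 following the KITTI convention (depth in meters times 256) and saved as losslessly compressed PNGs. That halves the bits per pixel, and PNG compresses the overwhelmingly zero (unprojected) pixels of LiDAR depth maps very well.

```python
import numpy as np

def encode_depth_uint16(depth_m, max_depth=255.0):
    """Quantize metric depth (meters) to uint16 at 1/256 m resolution."""
    d = np.clip(depth_m, 0.0, max_depth)
    return np.round(d * 256.0).astype(np.uint16)

def decode_depth_uint16(depth_u16):
    """Recover metric depth in meters from the uint16 encoding."""
    return depth_u16.astype(np.float32) / 256.0
```

The quantization error is at most about 2 mm, far below LiDAR noise, and the encoded array can be written as a 16-bit PNG with e.g. Pillow or OpenCV.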
Hi, I tried to evaluate nuScenes validation set performance with your released model, the nusc_scale model. Since this model should be scale-aware, I expected the scale-aware evaluation results to be similar to the ones in README.md, i.e.
type | dataset | Abs Rel | Sq Rel | delta < 1.25 |
scale-aware | nuScenes | 0.280 | 4.401 | 0.661 |
However, the result turns out worse than expected: with the scale-aware model, the scale-ambiguous evaluation actually scores better than the scale-aware evaluation. The relevant output is shown below.
Loading depth weights...
Loading encoder weights...
Training model named: nusc_scale
There are 20096 training items and 6019 validation items
median: 0.33512431383132935
-> Evaluating 1
scale-ambiguous evaluation:
front
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.262 & 2.961 & 10.989 & 0.398 & 0.527 & 0.791 & 0.889 \\
front_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.301 & 2.457 & 7.887 & 0.398 & 0.532 & 0.788 & 0.893 \\
back_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.289 & 2.230 & 7.050 & 0.386 & 0.578 & 0.799 & 0.895 \\
back
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.329 & 3.483 & 11.492 & 0.477 & 0.405 & 0.712 & 0.850 \\
back_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.298 & 2.418 & 7.371 & 0.405 & 0.556 & 0.789 & 0.887 \\
front_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.305 & 2.714 & 8.292 & 0.417 & 0.530 & 0.778 & 0.882 \\
all
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.297 & 2.711 & 8.847 & 0.413 & 0.522 & 0.776 & 0.883 \\
scale-aware evaluation:
front
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 1.976 & 42.443 & 22.277 & 1.090 & 0.048 & 0.107 & 0.194 \\
front_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.300 & 49.173 & 21.861 & 1.169 & 0.036 & 0.081 & 0.152 \\
back_left
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.392 & 51.237 & 21.708 & 1.186 & 0.029 & 0.070 & 0.135 \\
back
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 1.679 & 26.246 & 15.413 & 0.992 & 0.092 & 0.190 & 0.295 \\
back_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.569 & 61.392 & 22.064 & 1.214 & 0.031 & 0.075 & 0.143 \\
front_right
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.476 & 57.699 & 22.465 & 1.206 & 0.037 & 0.087 & 0.154 \\
all
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 2.232 & 48.032 & 20.965 & 1.143 & 0.046 & 0.102 & 0.179 \\
I exported the GT using tools/export_gt_depth_nusc.py with val, and used configs/nusc_scale_pretrain.txt for evaluation (the config stays the same except that I changed min_depth to 0.5).
Is this reasonable, or am I using it wrong somewhere? Thank you.
Hello, your results are great, and I'm following your work.
I have a problem reproducing the Monodepth2 baseline on the nuScenes dataset: the abs_rel I get is as high as 0.35, and the training curve is very strange.
I use the Monodepth2 code, and your code to process the dataset.
How can I reproduce the Monodepth2 results in your paper? Are there any details that need special attention?
Thanks!
Hi, it's great work! Can it be deployed on an embedded system such as the NVIDIA Jetson AGX?
Hi,
I found that the number of training samples is 20096, but the official nuScenes dataset has 28119.
I am confused why the training number is smaller than in the official nuScenes dataset.
Another question: how is train.txt generated?
Thank you for your help.
Thanks for your nice work! I want to know what outputs[("cam_T_cam", 0, f_i)] means. Is it the transformation matrix from pose id=0 to id=f_i, or from id=f_i to id=0? Thanks in advance!
While training with scale-aware pretraining, I am getting NaN values in the losses. What could be the possible issue? Without scale-aware pretraining the losses seem fine.
Hi, thanks for sharing your great work. While reading the code, I noticed you get the adjacent frames via these lines: index_temporal_i = cam_sample['prev'] and index_temporal_i = cam_sample['next']
I'm curious whether the image obtained this way is a keyframe or a sweep frame.
Also, how do you remove the static scenes?
Thanks a lot and looking forward to your reply.
Hello, thank you for your great work. I was wondering how long it took to train this network and what was the model and quantity of the GPU(s) used. I would greatly appreciate it if you could reply!
Basically, we use self-supervised methods to train depth prediction models. Have you tried combining self-supervision with supervised training? The nuScenes and DDAD datasets do provide sparse point clouds. I tried this but got bad performance and could not figure out what led to that result.
Hi, I am confused: does the cx in the intrinsics matrix need to change to w - cx when the image is horizontally flipped?
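For reference, a minimal sketch of the adjustment (an assumed helper, not from the repo): with 0-indexed pixel centers, a horizontal flip of an image of width W maps column u to W - 1 - u, so cx must become W - 1 - cx for the projection to stay consistent with the flipped image (some conventions use W - cx); fx, fy, and cy are unchanged.

```python
import numpy as np

def flip_intrinsics_horizontal(K, width):
    """Adjust the principal point for a horizontally flipped image."""
    K = K.copy()
    K[0, 2] = width - 1 - K[0, 2]  # cx -> W - 1 - cx
    return K
```

With normalized intrinsics this reduces to cx -> 1 - cx, which is why Monodepth2 can skip the adjustment entirely: it hard-codes cx = 0.5, the exact image center, which the flip maps to itself.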
Dear author:
Thank you very much for your contributions in this paper!
I tried to get the depth during evaluation. Theoretically, we should get a metric depth between min_depth and max_depth (in meters) after the function 'disp_to_depth'. However, I inspected several groups of depths generated this way, and they do not look like correct metric depths. See below:
(Pdb) pred_depth
array([[0.7076706 , 0.7073359 , 0.70599675, ..., 0.60197115, 0.59946537, 0.5988389 ],
[0.70776784, 0.7074405 , 0.7061312 , ..., 0.60220945, 0.599669 , 0.59903383],
[0.7082062 , 0.70791256, 0.706738 , ..., 0.603285 , 0.60058796, 0.5999137 ],
...,
[0.11618532, 0.1162222 , 0.11636975, ..., 0.11287601, 0.11278087, 0.11275709],
[0.11617843, 0.11621682, 0.11637037, ..., 0.11286871, 0.11277094, 0.11274651],
[0.1161769 , 0.11621563, 0.11637051, ..., 0.11286709, 0.11276875, 0.11274416]], dtype=float32)
The depth values are very small compared to real depth; many of them are less than 1.0. So how can I get the metric depth?
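For context, a hedged sketch of the standard remedy (names illustrative, not the repo's exact evaluation code): predictions from a scale-ambiguous self-supervised model are rescaled per image by the median ratio between ground truth and prediction before metrics are computed; only the scale-aware models are expected to output metric depth directly.

```python
import numpy as np

def median_scale(pred_depth, gt_depth, min_depth=0.1, max_depth=80.0):
    """Rescale a scale-ambiguous prediction by the median GT/pred ratio."""
    mask = (gt_depth > min_depth) & (gt_depth < max_depth)  # valid GT pixels only
    ratio = np.median(gt_depth[mask]) / np.median(pred_depth[mask])
    return pred_depth * ratio
```

After this rescaling, the absolute values are in meters for evaluation purposes, but the scale is borrowed from the ground truth rather than predicted.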
Hi!
I followed the README to prepare the DDAD data, but after I run the SIFT and match steps, the contents of the sift and match folders are incomplete. There should be folders 000 to 199, yet my sift folder only contains 000 to 106 and my match folder only contains 001 to 144. The depth folder has a similar problem: its numbered folders are complete (000 to 199), but some files are empty. Why does this happen?
Hi,
I use the command "python -m torch.distributed.launch --nproc_per_node 4 run.py --model_name test --config configs/nusc.txt --models_to_load depth encoder --load_weights_folder=/log/nusc/model/weights/ --save_pred_disps --eval_out_dir=/log/nusc/eval/ --eval_only"
but there is no image output: the eval folder in the log directory is empty. I'd like to ask how to visualize my evaluation results.
Thanks.