baegwangbin / magnet
[CVPR 2022 Oral] Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry
License: MIT License
Hi, Excellent work!
I found that the loss of D-Net is nll = (torch.square(mu - gt_depth) / (2 * var)) + (0.5 * torch.log(var)) here.
I have two questions below:
Thank you for your reply!!
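For reference, the loss quoted above is the standard Gaussian negative log-likelihood up to an additive constant. A minimal runnable sketch (the helper name gaussian_nll is mine, not from the repo):

```python
import torch

def gaussian_nll(mu, var, gt_depth):
    # NLL of gt_depth under N(mu, var), dropping the constant 0.5*log(2*pi)
    return (torch.square(mu - gt_depth) / (2 * var)) + (0.5 * torch.log(var))

# a perfect prediction with unit variance gives zero loss
loss = gaussian_nll(torch.tensor([2.0]), torch.tensor([1.0]), torch.tensor([2.0]))
```

Minimizing this trades off the squared error (down-weighted where var is large) against the log(var) penalty that discourages inflating the variance everywhere.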
Hi,
Trying to test the multi-view model via
python test_MaGNet.py ./test_scripts/magnet/scannet.txt
but I got a FileNotFoundError due to a negative image index/name (-15.jpg).
Has anyone seen the same error?
Btw, the monocular model (python test_DNet.py ./test_scripts/dnet/scannet.txt) works fine.
I have a question: mu + k*sigma might be negative. A depth volume built from negative depth candidates could introduce errors into training. How does the depth volume work while keeping the candidates positive?
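One common way to guard against this (a sketch of a generic fix, not necessarily what the authors do; the floor eps is an assumption) is to clamp the perturbed candidates to a small positive minimum:

```python
import torch

mu = torch.tensor([0.5, 2.0])      # per-pixel depth means
sigma = torch.tensor([1.0, 0.5])   # per-pixel depth std-devs
k = -3.0                           # lowest perturbation; mu + k*sigma can go below zero
eps = 1e-3                         # assumed positive floor for valid depth

candidate = torch.clamp(mu + k * sigma, min=eps)  # all candidates stay positive
```

Here the first candidate (0.5 - 3.0 = -2.5) is clamped to eps while the second (0.5) is left unchanged.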
When I try to run
python train_FNet.py train_scripts/fnet/scannet.txt
I get the following error:
Traceback (most recent call last):
File "MaGNet/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "MaGNet/train_FNet.py", line 238, in main_worker
train(model, args, device=args.gpu)
File "MaGNet/train_FNet.py", line 130, in train
utils.visualize_F(args, ref_img, gt_dmap, gt_dmap_mask, pred_dmap, total_iter)
File "MaGNet/utils/utils.py", line 286, in visualize_F
pred_emap = np.abs(pred_dmap - gt_dmap)
ValueError: operands could not be broadcast together with shapes (480,640) (120,160)
I think the problem is caused by this line:
Line 266 in d916ec6
I removed that line as a workaround. Does that make sense, or am I missing some logic?
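An alternative to deleting the line would be to upsample the prediction before computing the error map. A sketch assuming the shapes from the traceback (the prediction comes out at 1/4 resolution); whether the repo intends the visualization to run at full resolution is for the authors to confirm:

```python
import torch
import torch.nn.functional as F

pred_dmap = torch.rand(1, 1, 120, 160)  # network output, 1/4 resolution
gt_dmap = torch.rand(1, 1, 480, 640)    # ground-truth depth map

# upsample the prediction to the ground-truth resolution, then take the error map
pred_up = F.interpolate(pred_dmap, size=gt_dmap.shape[-2:],
                        mode='bilinear', align_corners=False)
emap = torch.abs(pred_up - gt_dmap)  # shapes now broadcast cleanly
```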
Hello, when will you release the training script?
I cannot find the colormap similar to your paper in opencv-python. Could you share your visualization code about the sigma? Thanks!
How many GPUs are needed to reproduce the results?
Hello, I'm curious about the meaning of model prediction. Does it represent a relative depth or absolute depth in meters?
thanks for your nice work!
I trained D-Net on the DTU dataset. The training loss declined normally (avg depth error 5 mm), but on the validation dataset the loss is high (avg depth error 50 mm). That looks like overfitting. Could you tell me how to deal with this problem?
I'm pretty new to this. I've installed the requirements file. How do I run an image that I have to get an output from the model? Please help me with the commands to do that.
Excellent work!
Hi, I'm using another dataset to train and test D-Net. Could you please show us the results of D-Net after training and testing,
e.g., abs_rel, abs_diff, sq_rel, rmse, rmse_log, irmse, log_10, silog, a1, a2, a3, and NLL for each scene?
Thank you so much!
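For reference, two of the metrics in that list are commonly defined as below (a sketch using the standard definitions from the depth-estimation literature, not the repo's evaluation code):

```python
import torch

def depth_metrics(pred, gt):
    # abs_rel: mean absolute error relative to ground truth
    abs_rel = (torch.abs(pred - gt) / gt).mean()
    # rmse: root mean squared error in metric units
    rmse = torch.sqrt(torch.square(pred - gt).mean())
    return abs_rel.item(), rmse.item()

abs_rel, rmse = depth_metrics(torch.tensor([2.0, 4.0]), torch.tensor([2.0, 2.0]))
```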
In my case, after running ckpts/download.py, the test command raises an error.
...
File "/home/chung/anaconda3/envs/magnet/lib/python3.6/site-packages/torch/serialization.py", line 585, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/chung/anaconda3/envs/magnet/lib/python3.6/site-packages/torch/serialization.py", line 755, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
The problem was the .pt file, which was not properly downloaded by the provided script (download.py).
Instead, I recommend using gdown to download the .pt files, for example:
## pip install gdown
gdown 1bbzfboj6XkfFhoJ54Iiqc5Ylj95A015M -O ./ckpts/DNET_scannet.pt
or you can run those commands easily from Python:
import os
os.system("gdown 1bbzfboj6XkfFhoJ54Iiqc5Ylj95A015M -O ./ckpts/DNET_scannet.pt")
os.system("gdown 1ugDr67UOanpQZMlPopiM8OihUexhPql4 -O ./ckpts/FNET_scannet.pt")
...
and it works. :)
The current model input only has relative pose. I don't see any model input related to depth scale (such as a depth range or the absolute values of the extrinsics). If a dataset mixes depth scales across images (e.g., drone data and regular photos), training/inference can fail.
I'm confused about why the normalized pixel coordinates need the following operation:
src_coords[src_coords > 10.0] = 10.0
src_coords[src_coords < -10.0] = -10.0
The F.grid_sample function expects coordinates in the range [-1, 1] rather than [-10, 10]. Coordinates outside [-1, 1] are invalid and thus produce 0 at the corresponding positions of the output.
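The zero-fill behavior described above is easy to verify with a minimal check (shapes are illustrative; this assumes the default padding_mode='zeros'):

```python
import torch
import torch.nn.functional as F

inp = torch.ones(1, 1, 4, 4)  # all-ones input image
# one in-range sample (image center) and one far outside [-1, 1]
grid = torch.tensor([[[[0.0, 0.0], [5.0, 5.0]]]])  # shape (N, H_out, W_out, 2)

out = F.grid_sample(inp, grid, mode='bilinear',
                    padding_mode='zeros', align_corners=True)
# in-range coordinate samples 1.0; out-of-range coordinate yields 0.0
```

So clamping to [-10, 10] does not change which samples are valid; it only bounds the coordinate values themselves (e.g., against overflow), which may be why the question arises.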