jeffwang987 / mvster
[ECCV 2022] MVSTER: Epipolar Transformer for Efficient Multi-View Stereo
License: MIT License
Hello, when using this network, is it necessary to know the camera intrinsic parameters to generate the epipolar lines? Or does the network only require the images as input, with no other parameters?
Thank you for sharing the repo! and for producing this awesome ECCV paper !!!
It is my first time working on this topic, so I would like to kindly ask questions about the output files.
Here, I have seen two sets of outputs. One set of .ply files is in the scanX/ply_local folders, where each scanX contains the following output folders: cam, confidence, depth_est, images, mask, and ply_local. The other set of .ply files consists of the mvsnetXXX_l3.ply files.
So, my questions are:
1. Which one is the output from MVSTER? What is the meaning of each of these .ply files? For example, why are there two sets of them? (Please see the figure below for examples.)
2. Is mvsnetXXX_l3.ply used for comparison, or is it also an output from MVSTER?
3. In each scanX/ply_local folder, there are 3 .ply files. What are they, and what does each of them mean? (Please see the figure below for examples.)
4. How did you visualize them? I am sorry for asking this question; I only would like to learn from you. I have used Open3D, but the result looks like the following, which does not look as good as the example. I only use the command open3d draw.
(Screenshots attached for Questions 1 & 2, Question 3, and Question 4.)
Thanks for your code!
I have a question about MVS4Net.py:
line 109: outputs = self.mono_depth_decoder(outputs, depth_values[:,0], depth_values[:,1])
depth_values[:,-1] seems to be d_max, rather than depth_values[:,1]. Is there a problem with my understanding?
Are any settings wrong?
This was trained with the source code, not the pretrained model.
Hi @JeffWang987
Thanks for the great work. I have a little confusion about the data preprocessing in dtu_yao4.py.
Around line 220, there are four different scales of the corresponding camera intrinsic parameters. I am aware that MVSTER trains the model in a multi-scale fashion, and the resolutions of the transformer outputs are as follows:
Stage 1: H/8 x W/8
Stage 2: H/4 x W/4
Stage 3: H/2 x W/2
Stage 4: H x W
So I expected the intrinsics of each stage to be scaled by /8, /4, /2, /1 sequentially. However, I find that in the code stage 1 is /2, stage 2 does nothing, stage 3 is *2, and stage 4 is *4 instead.
I'm wondering if there's something I misunderstood? Hoping for a hint from you; thanks again for the great work!
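For context, the usual convention is that resizing an image by a factor s multiplies fx, fy, cx, and cy by s. If the loader stores intrinsics at an intermediate base resolution (e.g. 1/4 of full size), the per-stage factors relative to that base become /2, x1, x2, x4, which would explain the pattern in the question. A minimal sketch with a made-up intrinsic matrix (the values are not the repo's):

```python
import numpy as np

# Hypothetical full-resolution intrinsic matrix (fx, fy, cx, cy are made up).
K = np.array([[2892.3,    0.0, 823.2],
              [   0.0, 2883.2, 619.1],
              [   0.0,    0.0,   1.0]])

def scale_intrinsics(K, scale):
    """Return the intrinsic matrix for an image resized by `scale`."""
    K_s = K.copy()
    K_s[:2, :] *= scale  # fx, fy, cx, cy all scale with resolution
    return K_s

# Stage resolutions H/8, H/4, H/2, H correspond to scales 1/8, 1/4, 1/2, 1.
K_stage1 = scale_intrinsics(K, 1 / 8)
K_stage4 = scale_intrinsics(K, 1.0)
```

Note that if `K` were already stored at the 1/4 scale, reaching the four stage scales would require multiplying by 1/2, 1, 2, and 4, matching what the code does.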
Has anyone fine-tuned the pretrained model (https://github.com/JeffWang987/MVSTER/releases/tag/dtu_ckpt) provided by the author? When I did so, I found that the author may have left the auxiliary monocular depth estimation branch out of the checkpoint.
I got the following errors:
RuntimeError: Error(s) in loading state_dict for MVS4net:
Missing key(s) in state_dict: "mono_depth_decoder.convblocks.0.conv.weight", "mono_depth_decoder.convblocks.0.bn.weight", "mono_depth_decoder.convblocks.0.bn.bias", "mono_depth_decoder.convblocks.0.bn.running_mean", "mono_depth_decoder.convblocks.0.bn.running_var", "mono_depth_decoder.convblocks.1.conv.weight", "mono_depth_decoder.convblocks.1.bn.weight", "mono_depth_decoder.convblocks.1.bn.bias", "mono_depth_decoder.convblocks.1.bn.running_mean", "mono_depth_decoder.convblocks.1.bn.running_var", "mono_depth_decoder.convblocks.2.conv.weight", "mono_depth_decoder.convblocks.2.bn.weight", "mono_depth_decoder.convblocks.2.bn.bias", "mono_depth_decoder.convblocks.2.bn.running_mean", "mono_depth_decoder.convblocks.2.bn.running_var", "mono_depth_decoder.conv3x3.0.weight", "mono_depth_decoder.conv3x3.0.bias", "mono_depth_decoder.conv3x3.1.weight", "mono_depth_decoder.conv3x3.1.bias", "mono_depth_decoder.conv3x3.2.weight", "mono_depth_decoder.conv3x3.2.bias".
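For reference, missing keys like these can be tolerated in PyTorch by loading with `strict=False`, which reports the missing and unexpected keys instead of raising; the absent parameters simply keep their freshly initialized values. A generic sketch with a stand-in model, not MVSTER's actual checkpoint:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be MVS4net with the mono depth branch.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Simulate a checkpoint that is missing the second layer's parameters.
ckpt = {k: v for k, v in model.state_dict().items() if k.startswith("0.")}

# strict=False skips missing keys and reports them instead of raising.
result = model.load_state_dict(ckpt, strict=False)
print(result.missing_keys)  # parameters left at their initialized values
```

Whether fine-tuning with a randomly initialized auxiliary branch reproduces the paper's results is a separate question; this only gets past the loading error.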
Hi @JeffWang987
Thanks for the great work.
Currently I'm trying to download Rectified_raw for training; however, the data cannot be downloaded. I'm wondering if the download link has been closed?
Thanks
Thanks for your code! When I use your pretrained model in test_mvs4.py, there are some problems, such as:
size mismatch for reg.0.conv0.conv.weight: copying a param with shape torch.Size([8, 8, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 64, 1, 3, 3]).
size mismatch for reg.1.conv0.conv.weight: copying a param with shape torch.Size([8, 8, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 32, 1, 3, 3]).
size mismatch for reg.2.conv0.conv.weight: copying a param with shape torch.Size([8, 4, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 16, 1, 3, 3]).
size mismatch for reg.3.conv0.conv.weight: copying a param with shape torch.Size([8, 4, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 8, 1, 3, 3]).
Can you give me some advice on how to edit your code? Thanks!
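Size mismatches like these usually mean the test-time model configuration differs from the one used for training (for example, a different cost-volume or feature-channel setting), so matching the test arguments to the training arguments is the proper fix. As a generic workaround sketch (not the author's fix), checkpoint entries whose shapes disagree with the current model can be dropped before loading:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for MVS4net

# Simulated checkpoint whose 'weight' shape does not match the model.
ckpt = {"weight": torch.zeros(8, 64), "bias": torch.zeros(8)}

model_state = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in model_state and v.shape == model_state[k].shape}

# The mismatched tensor is skipped; only shape-compatible entries load.
result = model.load_state_dict(filtered, strict=False)
print(sorted(result.missing_keys))  # ['weight']
```

Parameters skipped this way stay randomly initialized, so the resulting model will not behave like the pretrained one for those layers.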
Thanks for your code. When I used the newly trained model in test_mvs4.py, I encountered the following error:
Traceback (most recent call last):
File "F:/MVSTER-main/test_mvs4.py", line 495, in <module>
    save_depth(testlist)
File "F:/MVSTER-main/test_mvs4.py", line 165, in save_depth
    time_this_scene, sample_this_scene = save_scene_depth([scene])
File "F:/MVSTER-main/test_mvs4.py", line 264, in save_scene_depth
    range(1, 5)]
File "F:/MVSTER-main/test_mvs4.py", line 263, in <listcomp>
    confidence_list = [outputs['stage{}'.format(i)]['photometric_confidence'].squeeze(0) for i in
ValueError: cannot select an axis to squeeze out which has size not equal to one
Have you encountered the same problem, and how did you solve it?
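For context, that error message comes from NumPy: `np.squeeze(a, axis=0)` raises it whenever axis 0 does not have size 1 (PyTorch's `Tensor.squeeze(0)` would silently no-op instead), so the confidence map here apparently has a leading dimension larger than 1. A generic repro and a shape-agnostic workaround (a sketch, not the repo's fix):

```python
import numpy as np

conf = np.random.rand(2, 64, 80)  # leading dim is 2, e.g. batch size > 1

try:
    np.squeeze(conf, axis=0)  # axis 0 has size 2, so NumPy raises ValueError
except ValueError as err:
    print(err)

# Shape-agnostic alternative: squeeze when possible, otherwise index the
# first element along the leading dimension.
conf_map = conf.squeeze(0) if conf.shape[0] == 1 else conf[0]
print(conf_map.shape)  # (64, 80)
```

Why the leading dimension is not 1 in the first place (e.g. an unexpected batch size at test time) is worth checking before applying a workaround like this.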
I wonder, have you tested MVSTER on stereo matching datasets?
It looks like your method would be easy to adapt to the two-image stereo matching setting.
Thanks for your good work! I have not seen any discussion of the input resolution in the paper.
Sorry to bother you; I ran into a problem while running the program and would like to ask for your advice.
When I run train_mvs4.py, the following problem appears:
Traceback (most recent call last):
File "/home/ly/Work/MVSTER/train_mvs4.py", line 425, in <module>
train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args)
File "/home/ly/Work/MVSTER/train_mvs4.py", line 103, in train
loss, scalar_outputs, image_outputs = train_sample(model, model_loss, optimizer, sample, args)
File "/home/ly/Work/MVSTER/train_mvs4.py", line 207, in train_sample
outputs = model(sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["depth_values"])
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ly/Work/MVSTER/models/MVS4Net.py", line 97, in forward
depth_hypo = schedule_range(outputs_stage['depth'].detach(), self.stage_splits[stage_idx], self.depth_interals_ratio[stage_idx] * depth_interval, H, W)
File "/home/ly/Work/MVSTER/models/mvs4net_utils.py", line 97, in schedule_range
requires_grad=False).reshape(1, -1, 1, 1) * new_interval.unsqueeze(1))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
How can I solve this problem?
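The error points at a tensor created on the CPU (typically `torch.arange` called without a `device` argument inside `schedule_range`) being combined with a CUDA tensor. The usual fix is to create the new tensor on the same device as an existing input. A minimal sketch of the pattern, not the repo's exact code:

```python
import torch

def schedule_steps(new_interval, n_steps):
    # Bug pattern: torch.arange defaults to CPU, which fails when
    # `new_interval` lives on cuda:0. Fix: inherit device and dtype.
    steps = torch.arange(n_steps,
                         device=new_interval.device,
                         dtype=new_interval.dtype)
    return steps.reshape(1, -1, 1, 1) * new_interval.unsqueeze(1)

interval = torch.full((2, 4, 4), 2.5)  # would be a CUDA tensor in training
out = schedule_steps(interval, 8)
print(out.shape)  # torch.Size([2, 8, 4, 4])
```

Alternatively, moving the offending `torch.arange(...)` call in mvs4net_utils.py onto `depth.device` should resolve the mismatch without changing the math.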
We use the epipolar line to reduce the search space when finding, for a given reference-image point x, its corresponding point x' on the source images.
But before this, we use homography warping to transform the source images to the reference image, which means we have already obtained x'?
Thank you for publishing the code for your work! I was wondering if you could add a LICENSE file to your repo? Thanks!
Hi there,
Thanks for your great work!
I've used your code to re-train MVSTER on 640x512 images/depths and tested on DTU 1152x864 images. I'm using a slightly different setup: instead of 4 GPUs with batch size 2 on each GPU, I am using 2 GPUs with batch size 4 on each. I've tried:
I've trained several models for each condition. So far, no model achieves the 0.313 overall result on the DTU test set that your pretrained model achieves. The best re-trained model so far gets 0.325 on the test set.
One thing I've noticed is that every re-trained model is highly overconfident in its depth predictions. When testing with your pretrained model, the probability mask (with probability threshold 0.5) for a random image looks like this:
For the same image (and every other image I've inspected thus far), the probability mask of models I train looks like this:
Upon inspecting the output confidence map (i.e. test_mvs4.py, line 262), my re-trained models output >0.99 confidence for seemingly >99% of the pixels of all confidence maps. I haven't measured this quantitatively, just inspected a large number of confidence maps. This results in fewer points being filtered, which in turn appears to result in worse accuracy than the pretrained model (~0.400 for all my models vs. 0.350 for yours). Furthermore, these overconfident predictions occur even when testing a model trained for only a single epoch.
Any idea why this might be occurring? Was the provided pretrained model trained using this exact code or a previous version? Do you think the change from 4 GPUs with batch size 2 to 2 GPUs with batch size 4 would have that large an effect on training? Any help would be much appreciated.
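For reference, the photometric filtering the numbers above refer to is just a per-pixel threshold on the confidence map before point-cloud fusion; with an overconfident model, almost every pixel survives the mask and low-quality depths leak into the fused cloud. A toy sketch with made-up values, not the repo's fusion code:

```python
import numpy as np

conf = np.array([[0.99, 0.995],
                 [0.40, 0.999]])  # toy per-pixel confidence map

mask = conf > 0.5            # photometric mask at threshold 0.5
kept_fraction = mask.mean()  # fraction of pixels kept for fusion
print(kept_fraction)         # 0.75 — 3 of 4 pixels pass
```

If nearly all confidences are >0.99, raising the threshold barely helps, which is why the root cause (training setup or code version) matters more than the filtering step itself.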