jeffwang987 / mvster
[ECCV 2022] MVSTER: Epipolar Transformer for Efficient Multi-View Stereo
License: MIT License
Hello, when using this network, is it necessary to know the camera intrinsic parameters to generate the epipolar lines? Or does the network only require the images as input, with no other parameters?
Thank you for sharing the repo! and for producing this awesome ECCV paper !!!
It is my first time working on this topic, so I would like to kindly ask questions about the output files.
Here, I have seen two sets of outputs. One set of .ply files is in the scanX/ply_local folders, where each scanX contains the following output folders: cam, confidence, depth_est, images, mask, and ply_local. The other set of .ply files consists of the mvsnetXXX_l3.ply files.
So, my questions are:
1. Which one is the output from MVSTER? What is the meaning of each of these .ply files? For example, why are there two sets of them? (Please see the figure below for examples.)
2. Is mvsnetXXX_l3.ply used for comparison, or is it also an output from MVSTER?
3. In each scanX/ply_local folder, there are 3 .ply files. What are they, and what does each of them mean? (Please see the figure below for examples.)
4. How did you visualize them? I am sorry for asking this question; I only would like to learn from you. I have used Open3D, but the result looks like the following, which does not look as good as the example. I only use the command open3d draw.
(Screenshots attached for Questions 1 & 2, Question 3, and Question 4.)
Thanks for your code!
I have a question about MVS4Net.py:
line 109: outputs = self.mono_depth_decoder(outputs, depth_values[:,0], depth_values[:,1])
depth_values[:,-1] seems to be d_max, rather than depth_values[:,1]. Is there a problem with my understanding?
Are any settings wrong?
This was trained with the source code, not the pretrained model.
Hi @JeffWang987
Thanks for the great work. I have a little confusion about the data preprocessing in dtu_yao4.py.
Around line 220, there are four different scales of the corresponding camera intrinsic parameters. I am aware that MVSTER trains the model in a multi-scale fashion, and the resolutions of the transformer outputs are as follows:
Stage 1: H/8 x W/8
Stage 2: H/4 x W/4
Stage 3: H/2 x W/2
Stage 4: H x W
So I expected the intrinsics of each stage to be scaled by /8, /4, /2, /1 sequentially. However, I find that in the code stage 1 is /2, stage 2 does nothing, stage 3 is *2, and stage 4 is *4 instead.
I'm wondering if there's something I misunderstood? Hoping for a hint from you; thanks again for the great work!
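For context, the usual convention is that resizing an image by a factor s multiplies fx, fy, cx, and cy by s. If the loader stores intrinsics at an intermediate base resolution (e.g. 1/4 of full size), the per-stage factors relative to that base become /2, x1, x2, x4, which would explain the pattern in the question. A minimal sketch with a made-up intrinsic matrix (the values are not the repo's):

```python
import numpy as np

# Hypothetical full-resolution intrinsic matrix (fx, fy, cx, cy are made up).
K = np.array([[2892.3,    0.0, 823.2],
              [   0.0, 2883.2, 619.1],
              [   0.0,    0.0,   1.0]])

def scale_intrinsics(K, scale):
    """Return the intrinsic matrix for an image resized by `scale`."""
    K_s = K.copy()
    K_s[:2, :] *= scale  # fx, fy, cx, cy all scale with resolution
    return K_s

# Stage resolutions H/8, H/4, H/2, H correspond to scales 1/8, 1/4, 1/2, 1.
K_stage1 = scale_intrinsics(K, 1 / 8)
K_stage4 = scale_intrinsics(K, 1.0)
```

Note that if `K` were already stored at the 1/4 scale, reaching the four stage scales would require multiplying by 1/2, 1, 2, and 4, matching what the code does.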
Has anyone fine-tuned the pretrained model (https://github.com/JeffWang987/MVSTER/releases/tag/dtu_ckpt) provided by the author? When I did so, I found that the author may have left the auxiliary monocular depth estimation branch out of the checkpoint.
I got the following errors:
RuntimeError: Error(s) in loading state_dict for MVS4net:
Missing key(s) in state_dict: "mono_depth_decoder.convblocks.0.conv.weight", "mono_depth_decoder.convblocks.0.bn.weight", "mono_depth_decoder.convblocks.0.bn.bias", "mono_depth_decoder.convblocks.0.bn.running_mean", "mono_depth_decoder.convblocks.0.bn.running_var", "mono_depth_decoder.convblocks.1.conv.weight", "mono_depth_decoder.convblocks.1.bn.weight", "mono_depth_decoder.convblocks.1.bn.bias", "mono_depth_decoder.convblocks.1.bn.running_mean", "mono_depth_decoder.convblocks.1.bn.running_var", "mono_depth_decoder.convblocks.2.conv.weight", "mono_depth_decoder.convblocks.2.bn.weight", "mono_depth_decoder.convblocks.2.bn.bias", "mono_depth_decoder.convblocks.2.bn.running_mean", "mono_depth_decoder.convblocks.2.bn.running_var", "mono_depth_decoder.conv3x3.0.weight", "mono_depth_decoder.conv3x3.0.bias", "mono_depth_decoder.conv3x3.1.weight", "mono_depth_decoder.conv3x3.1.bias", "mono_depth_decoder.conv3x3.2.weight", "mono_depth_decoder.conv3x3.2.bias".
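For reference, missing keys like these can be tolerated in PyTorch by loading with `strict=False`, which reports the missing and unexpected keys instead of raising; the absent parameters simply keep their freshly initialized values. A generic sketch with a stand-in model, not MVSTER's actual checkpoint:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be MVS4net with the mono depth branch.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Simulate a checkpoint that is missing the second layer's parameters.
ckpt = {k: v for k, v in model.state_dict().items() if k.startswith("0.")}

# strict=False skips missing keys and reports them instead of raising.
result = model.load_state_dict(ckpt, strict=False)
print(result.missing_keys)  # parameters left at their initialized values
```

Whether fine-tuning with a randomly initialized auxiliary branch reproduces the paper's results is a separate question; this only gets past the loading error.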
Hi @JeffWang987
Thanks for the great work.
Currently I'm trying to download Rectified_raw for training; however, the data cannot be downloaded. I'm wondering if the download link has been closed?
Thanks
Thanks for your code! When I use your pretrained model in test_mvs4.py, there are some problems, such as:
size mismatch for reg.0.conv0.conv.weight: copying a param with shape torch.Size([8, 8, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 64, 1, 3, 3]).
size mismatch for reg.1.conv0.conv.weight: copying a param with shape torch.Size([8, 8, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 32, 1, 3, 3]).
size mismatch for reg.2.conv0.conv.weight: copying a param with shape torch.Size([8, 4, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 16, 1, 3, 3]).
size mismatch for reg.3.conv0.conv.weight: copying a param with shape torch.Size([8, 4, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 8, 1, 3, 3]).
Can you give me some advice on how to edit your code? Thanks!
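Size mismatches like these usually mean the test-time model configuration differs from the one used for training (for example, a different cost-volume or feature-channel setting), so matching the test arguments to the training arguments is the proper fix. As a generic workaround sketch (not the author's fix), checkpoint entries whose shapes disagree with the current model can be dropped before loading:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for MVS4net

# Simulated checkpoint whose 'weight' shape does not match the model.
ckpt = {"weight": torch.zeros(8, 64), "bias": torch.zeros(8)}

model_state = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in model_state and v.shape == model_state[k].shape}

# The mismatched tensor is skipped; only shape-compatible entries load.
result = model.load_state_dict(filtered, strict=False)
print(sorted(result.missing_keys))  # ['weight']
```

Parameters skipped this way stay randomly initialized, so the resulting model will not behave like the pretrained one for those layers.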
Thanks for your code. When I used the newly trained model in test_mvs4.py, I encountered the following error:
Traceback (most recent call last):
File "F:/MVSTER-main/test_mvs4.py", line 495, in <module>
    save_depth(testlist)
File "F:/MVSTER-main/test_mvs4.py", line 165, in save_depth
    time_this_scene, sample_this_scene = save_scene_depth([scene])
File "F:/MVSTER-main/test_mvs4.py", line 264, in save_scene_depth
    range(1, 5)]
File "F:/MVSTER-main/test_mvs4.py", line 263, in <listcomp>
    confidence_list = [outputs['stage{}'.format(i)]['photometric_confidence'].squeeze(0) for i in
ValueError: cannot select an axis to squeeze out which has size not equal to one
Have you encountered the same problem, and how did you solve it?
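For context, that error message comes from NumPy: `np.squeeze(a, axis=0)` raises it whenever axis 0 does not have size 1 (PyTorch's `Tensor.squeeze(0)` would silently no-op instead), so the confidence map here apparently has a leading dimension larger than 1. A generic repro and a shape-agnostic workaround (a sketch, not the repo's fix):

```python
import numpy as np

conf = np.random.rand(2, 64, 80)  # leading dim is 2, e.g. batch size > 1

try:
    np.squeeze(conf, axis=0)  # axis 0 has size 2, so NumPy raises ValueError
except ValueError as err:
    print(err)

# Shape-agnostic alternative: squeeze when possible, otherwise index the
# first element along the leading dimension.
conf_map = conf.squeeze(0) if conf.shape[0] == 1 else conf[0]
print(conf_map.shape)  # (64, 80)
```

Why the leading dimension is not 1 in the first place (e.g. an unexpected batch size at test time) is worth checking before applying a workaround like this.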
I wonder, have you tested MVSTER on stereo matching datasets?
It looks like your method would be easy to adapt to the two-image stereo matching setting.
Thanks for your good work! I have not seen any discussion of the input resolution in the paper.
Sorry to bother you; I ran into a problem while running the program and would like to ask for your advice.
When I run train_mvs4.py, the following problem appears:
Traceback (most recent call last):
File "/home/ly/Work/MVSTER/train_mvs4.py", line 425, in <module>
train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args)
File "/home/ly/Work/MVSTER/train_mvs4.py", line 103, in train
loss, scalar_outputs, image_outputs = train_sample(model, model_loss, optimizer, sample, args)
File "/home/ly/Work/MVSTER/train_mvs4.py", line 207, in train_sample
outputs = model(sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["depth_values"])
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ly/Work/MVSTER/models/MVS4Net.py", line 97, in forward
depth_hypo = schedule_range(outputs_stage['depth'].detach(), self.stage_splits[stage_idx], self.depth_interals_ratio[stage_idx] * depth_interval, H, W)
File "/home/ly/Work/MVSTER/models/mvs4net_utils.py", line 97, in schedule_range
requires_grad=False).reshape(1, -1, 1, 1) * new_interval.unsqueeze(1))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
How can I solve this problem?
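The error points at a tensor created on the CPU (typically `torch.arange` called without a `device` argument inside `schedule_range`) being combined with a CUDA tensor. The usual fix is to create the new tensor on the same device as an existing input. A minimal sketch of the pattern, not the repo's exact code:

```python
import torch

def schedule_steps(new_interval, n_steps):
    # Bug pattern: torch.arange defaults to CPU, which fails when
    # `new_interval` lives on cuda:0. Fix: inherit device and dtype.
    steps = torch.arange(n_steps,
                         device=new_interval.device,
                         dtype=new_interval.dtype)
    return steps.reshape(1, -1, 1, 1) * new_interval.unsqueeze(1)

interval = torch.full((2, 4, 4), 2.5)  # would be a CUDA tensor in training
out = schedule_steps(interval, 8)
print(out.shape)  # torch.Size([2, 8, 4, 4])
```

Alternatively, moving the offending `torch.arange(...)` call in mvs4net_utils.py onto `depth.device` should resolve the mismatch without changing the math.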
We use the epipolar line to reduce the search space when finding, for a given reference-image point x, its corresponding point x' on the source images.
But before this, we use homography warping to transform the source images to the reference image, which means we have already obtained x'?
Thank you for publishing the code for your work! I was wondering if you could add a LICENSE file to your repo? Thanks!
Hi there,
Thanks for your great work!
I've used your code to re-train MVSTER on 640x512 images/depths and tested on DTU 1152x864 images. I'm using a slightly different setup: instead of 4 GPUs with batch size 2 on each GPU, I am using 2 GPUs with batch size 4 on each. I've tried:
I've trained several models for each condition. So far, no model achieves the 0.313 overall result on the DTU test set that your pretrained model achieves. The best re-trained model so far gets 0.325 on the test set.
One thing I've noticed is that every re-trained model is highly overconfident in its depth predictions. When testing with your pretrained model, the probability mask (with probability threshold 0.5) for a random image looks like this:
For the same image (and every other image I've inspected thus far), the probability mask of models I train looks like this:
Upon inspecting the output confidence map (i.e. test_mvs4.py, line 262), my re-trained models output >0.99 confidence for seemingly >99% of the pixels of all confidence maps. I haven't measured this quantitatively, just inspected a large number of confidence maps. This results in fewer points being filtered, which in turn appears to result in worse accuracy than the pretrained model (~0.400 for all my models vs. 0.350 for yours). Furthermore, these overconfident predictions occur even when testing a model trained for only a single epoch.
Any idea why this might be occurring? Was the provided pretrained model trained using this exact code or a previous version? Do you think the change from 4 GPUs with batch size 2 to 2 GPUs with batch size 4 would have that large an effect on training? Any help would be much appreciated.
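For reference, the photometric filtering the numbers above refer to is just a per-pixel threshold on the confidence map before point-cloud fusion; with an overconfident model, almost every pixel survives the mask and low-quality depths leak into the fused cloud. A toy sketch with made-up values, not the repo's fusion code:

```python
import numpy as np

conf = np.array([[0.99, 0.995],
                 [0.40, 0.999]])  # toy per-pixel confidence map

mask = conf > 0.5            # photometric mask at threshold 0.5
kept_fraction = mask.mean()  # fraction of pixels kept for fusion
print(kept_fraction)         # 0.75 — 3 of 4 pixels pass
```

If nearly all confidences are >0.99, raising the threshold barely helps, which is why the root cause (training setup or code version) matters more than the filtering step itself.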