
MVSTER's Issues

The output .ply files: could you explain what they are?

Thank you for sharing the repo and for producing this awesome ECCV paper!

It is my first time working on this topic, so I would like to kindly ask questions about the output files.

Here, I have seen two sets of outputs. One set of .ply files is in the scanX/ply_local folders, where each scanX contains the following output folders: cam, confidence, depth_est, images, mask, and ply_local. The other set of .ply files is the mvsnetXXX_l3.ply files.

So, my questions are:

  1. Which one is the output from MVSTER? What is the meaning of each of these .ply files? For example, why are there two sets of them? (Please see the figure below for examples.)

  2. Is mvsnetXXX_l3.ply used for comparison, or is it also output from MVSTER?

  3. In each scanX/ply_local folder, there are 3 .ply files. What are they, and what is the meaning of each?
    (Please see the figure below for examples.)

  4. How did you visualize them? I am sorry for asking this; I only want to learn from you. I have used Open3D, but my result looks like the following, which is not as good as the example. I only use the open3d draw command (roughly the snippet shown after the screenshots below).

[Screenshots referenced above: Questions 1 & 2, Question 3, and Question 4 (captured 2022-09-22).]
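For reference, this is roughly what I run with Open3D; the file path is a placeholder for one of the .ply files under scanX/ply_local:

```python
import open3d as o3d

# path is a placeholder; point it at one of the .ply files under scanX/ply_local
pcd = o3d.io.read_point_cloud("scan1/ply_local/example.ply")
print(pcd)  # shows point count and whether colors/normals are present
o3d.visualization.draw_geometries([pcd])
```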

A question about mono depth

Thanks for your code!
I have a question about MVS4Net.py, line 109:
"outputs = self.mono_depth_decoder(outputs, depth_values[:,0], depth_values[:,1])"
It seems that the maximum depth should be depth_values[:,-1] (d_max) rather than depth_values[:,1]. Is there a problem with my understanding?
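To make the concern concrete, a tiny runnable sketch (the depth range values are illustrative, roughly DTU's): depth_values[:, 1] is merely the second hypothesis, whereas the maximum depth is depth_values[:, -1].

```python
import torch

# depth_values: [B, D] hypotheses ordered from d_min to d_max (illustrative values)
depth_values = torch.linspace(425.0, 935.0, steps=192).unsqueeze(0)

d_min = depth_values[:, 0]    # 425.0   -> first argument passed at line 109
second = depth_values[:, 1]   # ~427.7  -> what line 109 currently passes as the second bound
d_max = depth_values[:, -1]   # 935.0   -> what I would expect to be passed instead
print(d_min.item(), second.item(), d_max.item())
```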

Multi-scale camera projection matrices

Hi @JeffWang987

Thanks for the great work. I have a small point of confusion about the data preprocessing in dtu_yao4.py.

Around line 220, there are four different scales of the corresponding camera intrinsic parameters. I am aware that MVSTER trains the model in a multi-scale fashion, and the resolutions of the transformer outputs are as follows:

Stage 1: H/8 x W/8
Stage 2: H/4 x W/4
Stage 3: H/2 x W/2
Stage 4: H x W

So I expected the intrinsics of each stage to be scaled by /8, /4, /2, /1 sequentially. However, I find that in the code stage 1 is /2, stage 2 is unchanged, stage 3 is *2, and stage 4 is *4 instead (sketched in code below).
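Illustrative only, not the actual dtu_yao4.py code: this is just the scaling pattern I observe, applied to a generic 3x3 intrinsic matrix with placeholder values.

```python
import numpy as np

# placeholder intrinsics; in dtu_yao4.py these come from the cam_*.txt files
K = np.array([[361.54, 0.0, 82.90],
              [0.0, 360.39, 66.11],
              [0.0, 0.0, 1.0]])

stage_scales = {"stage1": 0.5,   # /2
                "stage2": 1.0,   # unchanged
                "stage3": 2.0,   # *2
                "stage4": 4.0}   # *4

stage_intrinsics = {}
for name, s in stage_scales.items():
    K_s = K.copy()
    K_s[:2, :] *= s              # scale fx, fy, cx, cy together
    stage_intrinsics[name] = K_s
    print(name, K_s[0, 0])       # focal length fx at that stage
```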

I'm wondering if there is something I have misunderstood? Hoping for a hint from you; thanks again for the great work!

Fine-tuning on other datasets

Has anyone fine-tuned the pretrained model (https://github.com/JeffWang987/MVSTER/releases/tag/dtu_ckpt) provided by the author? When I did so, I found that the auxiliary branch for monocular depth estimation seems to have been left out of the released checkpoint.

I got the following errors:

RuntimeError: Error(s) in loading state_dict for MVS4net:
Missing key(s) in state_dict: "mono_depth_decoder.convblocks.0.conv.weight", "mono_depth_decoder.convblocks.0.bn.weight", "mono_depth_decoder.convblocks.0.bn.bias", "mono_depth_decoder.convblocks.0.bn.running_mean", "mono_depth_decoder.convblocks.0.bn.running_var", "mono_depth_decoder.convblocks.1.conv.weight", "mono_depth_decoder.convblocks.1.bn.weight", "mono_depth_decoder.convblocks.1.bn.bias", "mono_depth_decoder.convblocks.1.bn.running_mean", "mono_depth_decoder.convblocks.1.bn.running_var", "mono_depth_decoder.convblocks.2.conv.weight", "mono_depth_decoder.convblocks.2.bn.weight", "mono_depth_decoder.convblocks.2.bn.bias", "mono_depth_decoder.convblocks.2.bn.running_mean", "mono_depth_decoder.convblocks.2.bn.running_var", "mono_depth_decoder.conv3x3.0.weight", "mono_depth_decoder.conv3x3.0.bias", "mono_depth_decoder.conv3x3.1.weight", "mono_depth_decoder.conv3x3.1.bias", "mono_depth_decoder.conv3x3.2.weight", "mono_depth_decoder.conv3x3.2.bias".
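One possible workaround (a sketch of standard PyTorch usage, not something from the repo; the checkpoint path and the already-constructed model are placeholders) is to load the checkpoint with strict=False, so keys missing from the state_dict, such as the mono_depth_decoder weights, are simply left at their freshly initialized values:

```python
import torch

# "model" is an already-constructed MVS4net instance; the checkpoint path is a placeholder
ckpt = torch.load("MVSTER_dtu.ckpt", map_location="cpu")
state_dict = ckpt["model"] if "model" in ckpt else ckpt

# strict=False ignores keys the checkpoint does not contain (e.g. mono_depth_decoder.*)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```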

Rectified_raw cannot be downloaded

Hi @JeffWang987

Thanks for the great work.

Currently I'm trying to download Rectified_raw for training; however, the data cannot be downloaded. I'm wondering if the download link has been taken down?

Thanks

Question about the pretrained model

Thanks for your code. When I use your pretrained model in test_mvs4.py, I get some errors, such as:
size mismatch for reg.0.conv0.conv.weight: copying a param with shape torch.Size([8, 8, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 64, 1, 3, 3]).
size mismatch for reg.1.conv0.conv.weight: copying a param with shape torch.Size([8, 8, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 32, 1, 3, 3]).
size mismatch for reg.2.conv0.conv.weight: copying a param with shape torch.Size([8, 4, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 16, 1, 3, 3]).
size mismatch for reg.3.conv0.conv.weight: copying a param with shape torch.Size([8, 4, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 8, 1, 3, 3]).
Can you give me some advice on how to edit the code? Thanks!
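For debugging, a small sketch that lists every parameter whose shape differs between the checkpoint and the constructed network (the checkpoint path and the "model" variable are placeholders, not code from test_mvs4.py):

```python
import torch

ckpt = torch.load("MVSTER_dtu.ckpt", map_location="cpu")   # placeholder path
sd = ckpt["model"] if "model" in ckpt else ckpt

# "model" stands in for the MVS4net instance built with the current test arguments
for name, param in model.state_dict().items():
    if name in sd and tuple(sd[name].shape) != tuple(param.shape):
        print(name, tuple(sd[name].shape), "->", tuple(param.shape))
```

If mismatches like the ones above show up, it may simply mean the test-time network was built with different channel/group settings than the checkpoint was trained with, so comparing the test command-line arguments against the ones used for the released model could help.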

ValueError: cannot select an axis to squeeze out which has size not equal to one

Thanks for your code. When I used the newly trained model in test_mvs4.py, I encountered the following error:

Traceback (most recent call last):
File "F:/MVSTER-main/test_mvs4.py", line 495, in <module>
save_depth(testlist)
File "F:/MVSTER-main/test_mvs4.py", line 165, in save_depth
time_this_scene, sample_this_scene = save_scene_depth([scene])
File "F:/MVSTER-main/test_mvs4.py", line 264, in save_scene_depth
range(1, 5)]
File "F:/MVSTER-main/test_mvs4.py", line 263, in <listcomp>
confidence_list = [outputs['stage{}'.format(i)]['photometric_confidence'].squeeze(0) for i in
ValueError: cannot select an axis to squeeze out which has size not equal to one

Have you encountered the same problem, and how did you solve it?
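One way to avoid the crash (my own sketch, not verified, and it does not explain why the leading dimension differs from the pretrained setup) is to squeeze only when the leading axis really has size one:

```python
def safe_squeeze0(arr):
    # only drop the leading axis when it actually has size 1
    return arr.squeeze(0) if arr.shape[0] == 1 else arr

# drop-in adjustment for the comprehension at test_mvs4.py line 263 (sketch)
confidence_list = [safe_squeeze0(outputs['stage{}'.format(i)]['photometric_confidence'])
                   for i in range(1, 5)]
```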

Potential for Stereo Matching?

I wonder, have you tested MVSTER on stereo matching datasets?
It looks like your method would be easy to adapt to the two-image stereo matching setting.

RUN train_mvs4.py

Sorry to bother you; I ran into a problem while running the program and would like to ask for your advice.
When I run train_mvs4.py, the following error appears:
Traceback (most recent call last):
File "/home/ly/Work/MVSTER/train_mvs4.py", line 425, in
train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args)
File "/home/ly/Work/MVSTER/train_mvs4.py", line 103, in train
loss, scalar_outputs, image_outputs = train_sample(model, model_loss, optimizer, sample, args)
File "/home/ly/Work/MVSTER/train_mvs4.py", line 207, in train_sample
outputs = model(sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["depth_values"])
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/ly/anaconda3/envs/MVSTER/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ly/Work/MVSTER/models/MVS4Net.py", line 97, in forward
depth_hypo = schedule_range(outputs_stage['depth'].detach(), self.stage_splits[stage_idx], self.depth_interals_ratio[stage_idx] * depth_interval, H, W)
File "/home/ly/Work/MVSTER/models/mvs4net_utils.py", line 97, in schedule_range
requires_grad=False).reshape(1, -1, 1, 1) * new_interval.unsqueeze(1))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
How can this problem be solved?
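A possible fix, sketched below: the error points at the tensor created in schedule_range, so creating it on the same device as new_interval should resolve the mismatch. Apart from new_interval, the variable names here are my guesses at the surrounding code, not the repo's actual identifiers.

```python
# mvs4net_utils.py, around line 97 (sketch; ndepth and depth_min are guessed names)
device = new_interval.device
idx = torch.arange(0, ndepth, device=device, dtype=new_interval.dtype,
                   requires_grad=False).reshape(1, -1, 1, 1)
depth_hypo = depth_min.unsqueeze(1) + idx * new_interval.unsqueeze(1)
```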

Why Epipolar?

We use the epipolar constraint to reduce the search space when finding, for a point x in the reference image, its corresponding point x' in the source images.

But before this, we use homography warping to transform the source images to the reference view, which means we have already obtained x'?
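For context, a minimal sketch of plane-sweep homography warping in the style of MVSNet-family methods (not necessarily MVSTER's exact implementation). Each reference pixel is back-projected to every depth hypothesis and re-projected into the source view before sampling, so the samples for the different hypotheses of one pixel all lie on that pixel's epipolar line in the source image:

```python
import torch
import torch.nn.functional as F

def homo_warp(src_fea, src_proj, ref_proj, depth_values):
    # src_fea: [B, C, H, W] source-view features
    # src_proj, ref_proj: [B, 4, 4] full projection matrices (K @ [R|t])
    # depth_values: [B, D, H, W] depth hypotheses defined in the reference view
    B, C, H, W = src_fea.shape
    D = depth_values.shape[1]
    with torch.no_grad():
        proj = torch.matmul(src_proj, torch.inverse(ref_proj))
        rot, trans = proj[:, :3, :3], proj[:, :3, 3:4]
        # pixel grid of the reference image, in homogeneous coordinates
        y = torch.arange(H, dtype=torch.float32, device=src_fea.device).view(H, 1).expand(H, W)
        x = torch.arange(W, dtype=torch.float32, device=src_fea.device).view(1, W).expand(H, W)
        ones = torch.ones_like(x)
        xyz = torch.stack((x, y, ones)).reshape(3, -1).unsqueeze(0).expand(B, 3, H * W)
        # back-project to each depth plane, then project into the source view
        rot_xyz = torch.matmul(rot, xyz)                                  # [B, 3, H*W]
        proj_xyz = (rot_xyz.unsqueeze(2) * depth_values.view(B, 1, D, H * W)
                    + trans.view(B, 3, 1, 1))                             # [B, 3, D, H*W]
        proj_xy = proj_xyz[:, :2] / proj_xyz[:, 2:3]                      # source-view pixel coords
        grid_x = proj_xy[:, 0] / ((W - 1) / 2) - 1                        # normalize to [-1, 1]
        grid_y = proj_xy[:, 1] / ((H - 1) / 2) - 1
        grid = torch.stack((grid_x, grid_y), dim=3)                       # [B, D, H*W, 2]
    # for a fixed reference pixel, varying the depth hypothesis moves the
    # sampling location along that pixel's epipolar line in the source view
    warped = F.grid_sample(src_fea, grid.view(B, D * H, W, 2),
                           mode='bilinear', padding_mode='zeros', align_corners=True)
    return warped.view(B, C, D, H, W)
```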

LICENSE

Thank you for publishing code for your work! I was wondering if you could add a LICENSE file to your repo? Thanks!

Cannot reproduce results - retraining results in overconfident model

Hi there,

Thanks for your great work!

I've used your code to re-train MVSTER on 640x512 images/depths and test on DTU 1152x864 images. I'm using a slightly different setup: instead of 4 GPUs with batch size 2 on each GPU, I am using 2 GPUs with batch size 4 on each. I've tried:

  1. using the defaults you've set in the code (i.e., using a mono depth weight of 0 in the loss)
  2. using the hyperparams you indicate in the paper (i.e., setting mono depth loss weight to 0.0003)
  3. changing various parameters not indicated in the paper, such as LR warmup for transformer convergence and the number of OT iterations

I've trained several models for each condition. So far, no model achieves the 0.313 overall result on the DTU test set that your pretrained model achieves. The best retrained model so far gets 0.325 on the test set.

One thing I've noticed is that every re-trained model is highly overconfident in its depth predictions. When testing with your pretrained model, the probability mask (with prob threshold 0.5) for a random image looks like this:
[screenshot: 00000000_photo]

For the same image (and every other image I've inspected thus far), the probability mask of models I train looks like this:
[screenshot: 00000000_photo]

Upon inspecting the output confidence map (i.e., test_mvs4.py: line 262), my re-trained models output >0.99 confidence for seemingly >99% of pixels across all confidence maps. I haven't actually measured this quantitatively, just inspected a large number of confidence maps. This results in fewer points being filtered out, which in turn appears to result in worse accuracy than the pretrained model (~0.400 for all my models vs. 0.350 for yours). Furthermore, these overconfident predictions occur even when testing a model trained for only a single epoch.
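For reference, the filtering I'm describing is essentially a threshold on the photometric confidence map; a simplified sketch (confidence_map is a random placeholder standing in for one per-view confidence array loaded at test time, not the repo's exact fusion code):

```python
import numpy as np

# placeholder for one [H, W] photometric confidence map with values in [0, 1]
confidence_map = np.random.rand(864, 1152)

prob_threshold = 0.5
photo_mask = confidence_map > prob_threshold
print("pixels kept by the photometric mask: %.1f%%" % (100.0 * photo_mask.mean()))
# with an overconfident model nearly every pixel passes, so almost nothing is filtered out
```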

Any idea why this might be occurring? Was the provided pretrained model trained using this exact code or a previous version? Do you think the change from 4 GPUs with batch size 2 to 2 GPUs with batch size 4 would have that large an effect on training? Any help would be much appreciated.
