
cvp-mvsnet's Introduction

Cost Volume Pyramid Based Depth Inference for Multi-View Stereo

CVP-MVSNet (CVPR 2020 Oral) is a cost volume pyramid based depth inference framework for Multi-View Stereo.

CVP-MVSNet is compact, lightweight, fast at runtime, and can handle high-resolution images to obtain high-quality depth maps for 3D reconstruction.

If you find this project useful for your research, please cite:

@InProceedings{Yang_2020_CVPR,
    author = {Yang, Jiayu and Mao, Wei and Alvarez, Jose M. and Liu, Miaomiao},
    title = {Cost Volume Pyramid Based Depth Inference for Multi-View Stereo},
    booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}

How to use

0. Pre-requisites

  • Nvidia GPU with 11 GB or more VRAM.
  • CUDA 10.1
  • python3.6
  • python2.7 for fusion script

1. Clone the source code

git clone https://github.com/JiayuYANG/CVP-MVSNet.git

2. Download testing dataset

Testing data (2 GB):

Download our pre-processed DTU testing data from here and extract it to CVP_MVSNet/dataset/dtu-test-1200.

3. Install requirements

cd CVP_MVSNet

pip3 install -r requirements.txt

4. Generate depth map using our pre-trained model

sh eval.sh

When finished, you can find the depth maps in the outputs_pretrained folder.
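If you want to inspect one of the depth maps directly, below is a minimal loader sketch. It assumes the maps are stored in the single-channel .pfm format used by the MVSNet family, and the example path is hypothetical:

import re
import numpy as np

def load_pfm(path):
    # Minimal reader for single-channel PFM files (MVSNet-style depth maps).
    with open(path, 'rb') as f:
        assert f.readline().decode('utf-8').rstrip() == 'Pf'  # 'Pf' = grayscale
        width, height = map(int, re.findall(r'\d+', f.readline().decode('utf-8')))
        scale = float(f.readline().decode('utf-8').rstrip())
        data_type = '<f' if scale < 0 else '>f'  # sign of scale encodes endianness
        data = np.frombuffer(f.read(), dtype=data_type)
        return np.flipud(data.reshape(height, width))  # PFM rows are stored bottom-up

depth = load_pfm('outputs_pretrained/scan1/depth_est/00000000.pfm')  # hypothetical path
print(depth.shape, depth.min(), depth.max())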

5. Generate point clouds and reproduce DTU results

Check out Yao Yao's modified version of fusibile

git clone https://github.com/YoYo000/fusibile

Install fusibile with cmake . and make, which will generate the executable at FUSIBILE_EXE_PATH
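The build itself is the standard CMake flow; a minimal sketch, assuming you cloned into ./fusibile:

cd fusibile
cmake .
make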

Link the fusibile executable into the fusion folder (Note: replace FUSIBILE_EXE_PATH with the path to your fusibile executable)

ln -s FUSIBILE_EXE_PATH CVP_MVSNet/fusion/fusibile

Install extra dependencies

pip2 install -r CVP_MVSNet/fusion/requirements_fusion.txt

Use the provided script to run fusibile and generate point clouds.

cd CVP_MVSNet/fusion/

sh fusion.sh

Use the provided script to move the generated point clouds into the outputs_pretrained/dtu_eval folder

python2 fusibile_to_dtu_eval.py

Evaluate the point clouds using the DTU evaluation code.

The results should be similar to:

Acc. (mm) Comp. (mm) Overall (mm)
0.296 0.406 0.351

6. Train your own model

Download the training dataset from here and extract it to CVP-MVSNet/datasets/dtu-train-128.

Modify the training parameters in the train.sh script.

Start training

sh train.sh

Acknowledgment

This repository is partly based on the MVSNet_pytorch repository by Xiaoyang Guo. Many thanks to Xiaoyang Guo for the great code!

This repository is inspired by MVSNet by Yao Yao et al. Many thanks to Yao Yao and his collaborators for the great paper and great code!


cvp-mvsnet's Issues

depth_refine in pretrained model

Hi,

I found that loading the provided ckpt reports unexpected keys. Is this depth_refine module abandoned or renamed in the final version? I was wondering if we need to change some settings here.


Permission denied when running ./fusibile in fusion.sh

filter depth map with probability map
Convert mvsnet output to gipuma input
Run depth map fusion & filter
./fusibile -input_folder ../outputs_pretrained/fusibile_fused/scan1/ -p_folder ../outputs_pretrained/fusibile_fused/scan1/cams/ -images_folder ../outputs_pretrained/fusibile_fused/scan1/images/ --depth_min=0.001 --depth_max=100000 --normal_thresh=360 --disp_thresh=0.13 --num_consistent=3.0
sh: 1: ./fusibile: Permission denied
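For anyone hitting this: "Permission denied" from sh usually means the fusibile binary lacks the execute bit, for example after copying it instead of symlinking. A likely fix, assuming the binary sits in the fusion folder as ./fusibile:

chmod +x ./fusibile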

datasets request

I can't download dtu-test-1200 and dtu-train-128 from the links provided; can anyone share the downloaded dataset resources?

Questions about the automatically selected depth search range?

Hi, @JiayuYANG! The code works fine on the DTU dataset. Recently I have been trying to apply CVP-MVSNet to the Tanks and Temples dataset. I wrote the dataloader myself based on the preprocessed dataset of MVSNet. However, I found it does not work combined with your code; sometimes it even outputs a depth map with negative values. I printed out the depth search range and its related variables, and found the values weird.
For example, on the DTU dataset, with the default depth_min=425, depth_max=1065:

level 0:
nhypothesis_init = 48
depth_interval_mean: 13.617020606994629
level 1:
value of interval_maps: 9.305537227260542
level 2:
value of interval_maps: 4.637211586944452
level 3:
value of interval_maps: 2.31558094108701
level 4:
value of interval_maps: 1.1571272050767465

However, on the Tanks and Temples dataset, taking the Train scene as an example, depth_min=0.7208, depth_max=2.7420.

level 0:
nhypothesis_init = 48
depth_interval_mean: 0.043003734201192856
level 1:
value of interval_maps: 0.1490792499588456
level 2:
value of interval_maps: 0.07300084890942911
level 3:
value of interval_maps: 0.0361162031170659
level 4:
value of interval_maps: 0.017939026685205597

I am confused about why the value of interval_maps at level 1 (0.1490792499588456) even becomes larger than the initial depth interval (0.043003734201192856). The Tanks and Temples dataloader works fine combined with MVSNet. Am I missing something about the code or the paper? How did you test your model on the Tanks and Temples dataset? Do you have any suggestions on how to solve this problem?
Thank You!

How to choose the specific model from 40 models

Hi @JiayuYANG,

Thanks for your great work!

In this project, the authors provided the pre-trained model of the 27th epoch (model_0027.ckpt). Would you kindly explain how the authors chose this specific model from the 40 models?

I've retrained your network using the default settings and got 40 models corresponding to 40 epochs. Attached is the generated log file; the epoch loss drops gradually over the 40 epochs. There seems to be no validation process during training, so it is a little confusing how to choose the specific model to test.

Thanks for your kind reply in advance, and please correct me if I miss something.
20200831-0132.log

Evaluation on Tanks and Temples dataset

Hi @JiayuYANG, thanks for sharing the code of your paper. The code works fine on the DTU dataset. Currently, I want to apply it to the Tanks and Temples dataset (provided by MVSNet), which is also mentioned in your paper. Although I have already modified the dataloader by adjusting the camera intrinsics according to the image resolution, I am still not able to run the code correctly. I cannot find any instructions or replies about the Tanks and Temples dataset in other repos on GitHub. Could you provide the evaluation code or the dataloader for the Tanks and Temples dataset? Thanks a lot!!!

Hi, can you share some details?

When testing on the Tanks and Temples dataset, how should the fusion parameters be set? And should the depth intervals follow MVSNet?
Thank you very much!!!

What is the final training loss?

Thanks for your amazing work. @JiayuYANG @MiaomiaoLiu

  1. The 3D points (DTU) produced by your pretrained model_0027.ckpt are consistent with your paper.
  2. I have re-trained the network with batch_size=12. In this case, the 3D points (DTU) produced by my model_0027.ckpt are worse than the result in your paper, using the same depth fusion method.
  3. Besides, the training loss of my model_0027.ckpt is ~52, which I think is a bit large. So I wonder: what is your final training loss?

Could you please provide a link to your supplementary material, if convenient?

Looking forward to your reply.
Best Regards.

Found 0.00 million points

Hi, thanks for your work. I trained a model on my own dataset, but the depth-map fusion produces no point cloud and no error messages, so I guess it may be the parameter settings of the fusion code. Could you give any information about them? Thanks very much! My dataset has images of size 960x540 and 30 cameras.

Running inference on 8GB RTX 3070

Hi. I have an RTX 3070 GPU with 8 GB VRAM, which is lower than the requirement mentioned in the repo. I ran into an error while executing eval.sh (error screenshot attached).

I assume it is because my GPU is running out of memory. Could you suggest any simple modifications I could make to the code to make it fit within my GPU? Thanks!

License

I would like to congratulate you on your awesome work on multi-view stereo (CVP-MVSNet). I am working at a startup where we have to generate 3D point clouds from multiple sets of images.

I am writing to ask if you can give us a generous license (MIT or BSD) under which we can use your algorithm in our pipeline. It would be really helpful if you could add the license to your GitHub repository as well.

RuntimeError: cuda runtime error (710) : device-side assert triggered at

I just ran train.py as in your README, but this error occurs. It seems to be due to the indexing of the testing set, but I don't know how to solve it.
I changed --loadckpt to False while training because I want to train from the beginning. Besides, at iteration 210 of training, the loss becomes NaN.
BTW, if I change the batch size from 1 to 2, this error disappears, but the result was terrible. I don't know why.

Test on custom dataset

Hi @JiayuYANG, thank you for sharing this great project. I tried the DTU dataset, and the result is very impressive.
But I am having trouble testing on other datasets such as ETH3D. I changed the dataset loader code, but since the image size is different, there is an error saying "shape '[1, 1, 8, 7605]' is invalid for input of size 59392".
Can you provide a script showing how to test the network on other datasets? Or could you give me some advice?

Self-captured data

Hi, I used the colmap2mvsnet code to convert the SfM camera parameters of DTU scan1, but the results are very poor and I don't know why. Could you give me some advice?
Below are the parameters I modified:
max_d = 192
interval_scale = 0.8
depth_max = 256

Found 0.00 million points

When running fusion.sh, "Found 0.00 million points" always appears and the final point cloud ends up empty. Why is this? However, when I run Cascade MVSNet, the point cloud fusion is successful. I use Python 2.7. :)

Question about pixel interval setting

You claim that the depth search interval in the finer levels cannot be too narrow, otherwise the reprojected pixels will have similar features that make them difficult to distinguish. I agree with this statement; however, a wider depth interval also makes the precision decrease, doesn't it?

In my opinion, this problem comes down to how large the depth interval actually is. Let's take the original MVSNet as an example: it has 256 depth planes with an interval of 2mm. If your proposed method can narrow that down to, for example, 1mm in the finer levels, then it is indeed good, but if it only narrows it to 1.9mm, it has barely any effect... why not just set a fixed value from the beginning? By the way, I also read a paper that claims they can narrow the depth interval to 0.8mm with high precision.

So my questions are

  1. How large, actually, is the 0.5-pixel range in real mm? (A rough conversion is sketched below.)
  2. If it's somewhere near the original 2mm, is it really effective/necessary to compute this? If it is <2mm, how small is it?
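A rough back-of-the-envelope conversion (every number here is an assumption for illustration, not taken from the paper): for a disparity change of $\Delta p$ pixels at depth $d$, with focal length $f$ (in pixels) and baseline $b$, the corresponding depth change is approximately

$$\Delta d \approx \frac{d^2}{f\,b}\,\Delta p.$$

With DTU-like values, say $d = 700$ mm (mid-range of 425-1065 mm), $f \approx 2892$ px, and an assumed baseline of $b = 100$ mm, a 0.5-pixel interval corresponds to $\Delta d \approx 700^2 \times 0.5 / (2892 \times 100) \approx 0.85$ mm, i.e. noticeably below the fixed 2 mm interval in this example.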

Environment conflicts

Dear author, in the fifth step of the README, I am not sure how to install TensorFlow in a Python 2 environment, because the minimum Python version required by protobuf is 3.7. May I ask how you resolved this issue?

accuracy

Hello, how do you get the accuracy numbers?
I've downloaded the MATLAB code, but I don't know how to use it.

returned value in calDepthHypo

Hello, in modules/calDepthHypo.py, X is multiplied by the reprojection of X, which is concatenated with the projection of X3 (if I am right, X3 is the projection of X plus 1 pixel in the source camera).

            tmp1 = torch.inverse(M1)        # invert the system matrix M1
            tmp2 = M2.unsqueeze(2)          # turn M2 into a column vector
            ans = torch.matmul(tmp1, tmp2)  # ans = M1^-1 M2, i.e. the solution of M1 * ans = M2

https://github.com/ToughStoneX/Self-Supervised-MVS/blob/1420bfb1d65e16677f5488360c387b28b57770ef/jdacs-ms/models/modules.py#L192

I can't understand why the lines above yield delta_d (the depth plus a residual error?).
Can you explain why?
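For what it's worth, one reading of the snippet (an interpretation, not confirmed by the authors): the three lines compute

$$\texttt{ans} = M_1^{-1} M_2, \quad\text{i.e. the per-pixel solution of}\quad M_1\,\texttt{ans} = M_2,$$

so if $M_1$ and $M_2$ are assembled from the epipolar geometry of the reprojected points, the solution of that small linear system would be the quantity the code treats as the depth offset delta_d.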

Google Colab notebook?

Thanks for making this code available! Would you consider making a Google Colab notebook so that people who are not as technically proficient, like myself, can try it out? Thanks in advance!

fusion.sh

When I run fusion.sh, I hit a bug:
data = np.fromstring(data_string, data_type)
ValueError: string size must be a multiple of element size

Your code is:

scale = float((file.readline()).decode('UTF-8').rstrip())
if scale < 0:  # little-endian
    data_type = '<f'
else:
    data_type = '>f'  # big-endian
data_string = file.read()
# data = np.fromstring(data_string)
data = np.fromstring(data_string, "data_type")
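In case it helps, a minimal corrected sketch of this read (an assumption about the intended behavior, not the authors' code): the dtype should be the data_type variable rather than the string literal "data_type", and np.frombuffer is the non-deprecated replacement for np.fromstring. The "string size must be a multiple of element size" error typically means the number of bytes read is not a multiple of 4, e.g. because header bytes are still in the stream.

import numpy as np

# Hypothetical fix: pass the computed dtype variable and use np.frombuffer.
scale = float(file.readline().decode('UTF-8').rstrip())
data_type = '<f' if scale < 0 else '>f'  # PFM: negative scale = little-endian
data_string = file.read()
data = np.frombuffer(data_string, dtype=data_type)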

Question about GPU memory usage

Hi @JiayuYANG ,

Thanks for your great work!

When running evaluation.sh with the default settings, I found that the GPU memory usage usually ranges from 9900 to 11000 MB, which is higher than the 8795 MB shown in Table 2 of the paper.

Are there possible reasons to cause higher GPU memory usage? Thank you very much.

ERROR

Traceback (most recent call last):
  File "train.py", line 196, in <module>
    train()
  File "train.py", line 137, in train
    loss = train_sample(sample, detailed_summary=do_summary)
  File "train.py", line 183, in train_sample
    loss.append(model_loss(depth_est_list[i], depth_gt.float(), mask))
  File "/mnt/CVP-MVSNet-master/CVP_MVSNet/models/net.py", line 195, in sL1_loss
    return F.smooth_l1_loss(depth_est[mask], depth_gt[mask], reduction='mean')
IndexError: The shape of the mask [1, 128, 160] at index 1 does not match the shape of the indexed tensor [1, 512, 640] at index 1
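The indexing in sL1_loss requires depth_est, depth_gt, and mask to share the same spatial shape, which is exactly what the IndexError reports. A minimal sketch of that masked smooth-L1 pattern, with the ground truth and mask brought down to the prediction's resolution (the shapes and the interpolation step are assumptions for illustration, not the authors' fix):

import torch
import torch.nn.functional as F

depth_est = torch.rand(1, 128, 160)   # prediction at a coarse pyramid level
depth_gt = torch.rand(1, 512, 640)    # full-resolution ground truth
mask = torch.rand(1, 512, 640) > 0.5  # full-resolution validity mask

# Resize GT and mask to the prediction's spatial size before boolean indexing.
size = depth_est.shape[-2:]
depth_gt_s = F.interpolate(depth_gt.unsqueeze(1), size=size, mode='nearest').squeeze(1)
mask_s = F.interpolate(mask.float().unsqueeze(1), size=size, mode='nearest').squeeze(1).bool()

loss = F.smooth_l1_loss(depth_est[mask_s], depth_gt_s[mask_s], reduction='mean')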

code=2 (cudaErrorMemoryAllocation) when running fusibile on 'Tanks and Temples'

When I fuse the depth maps of Tanks and Temples with Yao Yao's fusibile, I get this error.
Specifically, it occurs on the scenes Francis (302 images), Lighthouse (309 images), M60 (313 images), Panther (314 images), Playground (307 images), and Train (301 images). The scenes Family (152 images) and Horse (151 images) run successfully. The image size is 1056x1920 or 1056x2048.
This seems to be an out-of-memory problem. My GPU is an RTX 2080 Ti. Did you meet this problem?

How to use a self-recorded dataset

Hi, I have successfully run your code; the results on the DTU dataset are indeed very good. Now I would like to use your work to reconstruct my own dataset. Could you tell me how to do that?

Parameter settings for a self-captured dataset

Hi, we used colmap2mvsnet.py to obtain the camera parameters of our self-captured data, but the output depth maps are of poor quality. How should the max_d and interval_scale parameters in colmap2mvsnet.py be set? Thanks!

Matlab Evaluation

Hello @JiayuYANG, very excited to see your CVP-MVSNet model. I found that both training and inference are very fast. However, computing the dense point clouds and evaluating on the DTU test set with the MATLAB code is very slow. Did you have similar problems? Thank you very much.

No points in .ply output

Sorry to bother again.

I ran fusion.sh with the settings unchanged, but I could not get any points in the .ply.
My depth_est and confidence maps are OK; I've checked them by visualization.

(fusion) ➜ fusion bash fusion.sh
filter depth map with probability map
Convert mvsnet output to gipuma input
Run depth map fusion & filter
./fusibile/fusibile -input_folder ../outputs_pretrained/fusibile_fused/scan1/ -p_folder ../outputs_pretrained/fusibile_fused/scan1/cams/ -images_folder ../outputs_pretrained/fusibile_fused/scan1/images/ --depth_min=0.001 --depth_max=100000 --normal_thresh=360 --disp_thresh=0.13 --num_consistent=3.0
./fusibile/fusibile
Command-line parameter error: unknown option -input_folder
input folder is ../outputs_pretrained/fusibile_fused/scan1/
image folder is ../outputs_pretrained/fusibile_fused/scan1/images/
p folder is ../outputs_pretrained/fusibile_fused/scan1/cams/
pmvs folder is
numImages is 49
img_filenames is 49
Device memory used: 172.818436MB
Device memory used: 172.818436MB
P folder is ../outputs_pretrained/fusibile_fused/scan1/cams/
numCameras is 49
Camera size is 49
Accepted intersection angle of central rays is 10.000000 to 30.000000 degrees
Selected views: 49
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
Reading normals and depth from disk
Size consideredIds is 49
Reading normal 0
Reading disp 0
[... repeated for normals/disps 1 through 48 ...]
Resizing globalstate to 49
Run cuda
Run gipuma
Grid size initrand is grid: 50-38 block: 32-32
Device memory used: 3461.152832MB
Number of iterations is 8
Blocksize is 15x15
Disparity threshold is 0.130000
Normal threshold is 6.283185
Number of consistent points is 3
Cam scale is 1.000000
Fusing points
Processing camera 0
Found 0.00 million points
[... "Found 0.00 million points" repeated for cameras 1 through 48 ...]
ELAPSED 0.914342 seconds
Error: no kernel image is available for execution on the device
Writing ply file ../outputs_pretrained/fusibile_fused/scan1//consistencyCheck-20200723-125040//final3d_model.ply
store 3D points to ply file
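A side note on the line "Error: no kernel image is available for execution on the device": this usually means the fusibile binary was compiled for a different GPU architecture than the one it is running on. A common remedy (an assumption about this setup, not verified here) is to add your card's compute capability to the -gencode entries in fusibile's CMakeLists.txt and rebuild, e.g. for an RTX 20-series card:

-gencode arch=compute_75,code=sm_75

then re-run cmake . and make.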

Normal Map

How do you generate the normal map and render it as presented in your paper?

Could the normal map be generated directly from the point cloud (i.e., the .ply file)?

Different image sizes with the same intrinsics between your training data and the data from MVSNet_pytorch

@JiayuYANG Hi, thank you for your great work.
I notice the intrinsics of your training data are the same as those of the data from MVSNet_pytorch, but the image sizes are different: your training images are 128x160, while MVSNet_pytorch's are 512x640.
For example, the camera parameters of your training data in dtu-train-128/Cameras/00000000_cam.txt [screenshot attached] show the same intrinsics as the camera parameters of MVSNet_pytorch's data in mvs_training/dtu/Cameras/train/00000000_cam.txt [screenshot attached].

Does that have no effect on the training results and the final inference?
I am confused, because I think the intrinsics should be scaled down.

By the way, the different scaling of fx/fy/cx/cy between the test data and the training data in MVSNet_pytorch confuses me too:
     test    | train     | scale
H    1200    | 512       | 2.34
W    1600    | 640       | 2.5
fx   2892.33 | 361.54125 | 8.0
fy   2883.18 | 360.3975  | 8.0
cx   823.205 | 82.900625 | 9.93
cy   619.071 | 66.383875 | 9.33
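For reference, the usual convention when an image is resized is to scale the pinhole intrinsics by the same per-axis factors; a minimal sketch of that convention (a general illustration, not this repository's code; the example matrix reuses numbers from the table above):

import numpy as np

def scale_intrinsics(K, old_hw, new_hw):
    # Scale a 3x3 pinhole intrinsic matrix for a resized image.
    sy = new_hw[0] / old_hw[0]  # height (vertical) scale factor
    sx = new_hw[1] / old_hw[1]  # width (horizontal) scale factor
    K = K.copy()
    K[0, :] *= sx  # fx and cx scale with image width
    K[1, :] *= sy  # fy and cy scale with image height
    return K

K_test = np.array([[2892.33, 0.0, 823.205],
                   [0.0, 2883.18, 619.071],
                   [0.0, 0.0, 1.0]])
print(scale_intrinsics(K_test, old_hw=(1200, 1600), new_hw=(128, 160)))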
