
imfnet's Introduction

IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration, 2022


This repository is the implementation of IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration.

The existing state-of-the-art point descriptors rely on structure information only and omit texture information. However, texture information is crucial for humans to distinguish a scene part. Moreover, current learning-based point descriptors are all black boxes, and it is unclear how the original points contribute to the final descriptors. In this paper, we propose a new multimodal fusion method to generate point cloud registration descriptors by considering both structure and texture information. Specifically, a novel attention-fusion module is designed to extract the weighted texture information for descriptor extraction. In addition, we propose an interpretable module to explain our neural network by visually showing which original points contribute to the final descriptors. We use the descriptor's channel value as the loss to backpropagate to the target layer and consider the gradient as the significance of each point to the final descriptors. This paper moves one step further toward explainable deep learning in the registration task. Comprehensive experiments on 3DMatch, 3DLoMatch and KITTI demonstrate that the multimodal fusion descriptors achieve state-of-the-art accuracy and improve descriptor distinctiveness. We also demonstrate the effectiveness of our interpretable module in explaining the registration descriptor extraction.

Paper

FMR vs. RR


Feature-match recall and registration recall (log scale) on the 3DMatch benchmark.

The framework of IMFNet

The network architecture of the proposed IMFNet. The input is a point cloud and an image, and the output is the point descriptors. Inside the attention-fusion module, W is the weight matrix and FI is the point texture feature. The fusion feature (Ffe) of the point structure feature (Fpe) and the point texture feature (FI) is then fed to the decoder module to produce the output descriptors. Finally, the descriptors are interpreted by DAM.
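As a loose sketch of the attention-fusion idea (the module name, dimensions, and single-head formulation below are illustrative assumptions, not the exact module in this repository), the point texture feature can be obtained by cross-attention from point structure features to image features and then concatenated for decoding:

import torch
import torch.nn as nn

class AttentionFusionSketch(nn.Module):
    # Hedged sketch: queries come from the point structure feature (Fpe),
    # keys/values from the image feature; W is the attention weight matrix,
    # FI the resulting point texture feature, and the concatenation is the
    # fusion feature (Ffe) passed to the decoder.
    def __init__(self, d_point=256, d_image=256, d_model=256):
        super().__init__()
        self.q = nn.Linear(d_point, d_model)
        self.k = nn.Linear(d_image, d_model)
        self.v = nn.Linear(d_image, d_model)

    def forward(self, f_pe, f_img):
        # f_pe: (N, d_point) point structure features
        # f_img: (M, d_image) flattened image features
        Q, K, V = self.q(f_pe), self.k(f_img), self.v(f_img)
        W = torch.softmax(Q @ K.t() / K.shape[-1] ** 0.5, dim=-1)  # (N, M) weight matrix
        f_i = W @ V                                                 # (N, d_model) texture feature FI
        return torch.cat([f_pe, f_i], dim=-1)                       # fusion feature Ffe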

The Overall Framework

Please refer to our paper for more details.

Visualization of DAM

Our DAM can visualize the distribution of point contributions to descriptor extraction.

(DAM visualizations: IMFNet vs. FCGF)

Requirements

  • Ubuntu 18.04.1 or higher
  • CUDA 11.1 or higher
  • Python v3.6 or higher
  • PyTorch v1.8 or higher
  • MinkowskiEngine v0.5 or higher
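
A quick way to sanity-check the versions above (a minimal sketch; assumes the usual import names):

import torch
import MinkowskiEngine as ME

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(ME.__version__)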

Dataset Download

For 3DMatch and 3DLoMatch, images are selected for each point cloud based on their covered content to construct a dataset of paired images and point clouds named 3DImageMatch. Our experiments are conducted on this dataset; the dataset construction and training details are provided in the supplementary material. Download 3DImageMatch/Kitti. The extraction code is p2gl.

Please concatenate the split files:

# 3DImageMatch
cat x00 x01 ... x17 > 3DImageMatch.zip
# Kitti
cat Kitti01 ... Kitti10 > Kitti.zip

Then, unzip the zip files.
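
For example, assuming the split parts are in the current working directory (a shell sketch, not from the repository):

# 3DImageMatch (parts x00 ... x17)
cat x?? > 3DImageMatch.zip && unzip 3DImageMatch.zip
# Kitti (parts Kitti01 ... Kitti10)
cat Kitti?? > Kitti.zip && unzip Kitti.zip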

Training

Train on 3DMatch

python train.py train_3DMatch.py

Train on Kitti

python train.py train_Kitti.py

Evaluating

To benchmark with the trained weights, download the pretrained model here. We also provide keypoints (5000) and some other results here.

Evaluating on 3DMatch or 3DLoMatch

# Generating Descriptors
python generate_desc.py --source <Testing Set Path> --target <Output Path> --model <CheckPoint Path>
# Evaluating 3DMatch
python evaluation_3dmatch.py --pcloud_root <Testing Set Path> --out_root <Output Path> --desc_types ['IMFNet'] --desc_roots ['<Descriptors Path>'] --benchmarks "3DMatch"
# Evaluating 3DLoMatch
python evaluation_3dmatch.py --pcloud_root <Testing Set Path> --out_root <Output Path> --desc_types ['IMFNet'] --desc_roots ['<Descriptors Path>'] --benchmarks "3DLoMatch"
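
For intuition on what the 3DMatch evaluation measures (this is a generic sketch of the feature-match-recall inlier test, with thresholds and array shapes assumed, not the script above): a fragment pair is compared by finding mutual nearest-neighbor descriptor matches and checking how many fall within an inlier distance under the ground-truth transform.

import numpy as np

def inlier_ratio(desc0, desc1, kpts0, kpts1, T_gt, inlier_dist=0.10):
    # Mutual nearest neighbours in descriptor space.
    d2 = (desc0 ** 2).sum(1)[:, None] + (desc1 ** 2).sum(1)[None, :] - 2.0 * desc0 @ desc1.T
    nn01, nn10 = d2.argmin(axis=1), d2.argmin(axis=0)
    i0 = np.nonzero(nn10[nn01] == np.arange(len(desc0)))[0]
    i1 = nn01[i0]
    if len(i0) == 0:
        return 0.0
    # Residuals of matched keypoints after applying the ground-truth transform T_gt (4x4).
    p0 = kpts0[i0] @ T_gt[:3, :3].T + T_gt[:3, 3]
    residual = np.linalg.norm(p0 - kpts1[i1], axis=1)
    return float((residual < inlier_dist).mean())

# A fragment pair counts as matched for FMR when the inlier ratio exceeds a
# recall threshold (commonly 0.05 on 3DMatch, with a 0.10 m inlier distance).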

Evaluating on Kitti

# Evaluating Kitti
python evaluation_kitti.py --save_dir <Output Path> --kitti_root <Testing Set Path>

Descriptor Activation Mapping

Visualize the target descriptor

python dam.py --target <target point index>
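
For reference, a simplified variant of the gradient-based idea behind DAM (my own sketch, not the repository's dam.py; it backpropagates a single descriptor channel to the input coordinates rather than to an intermediate layer, and the model interface is assumed):

import torch

def dam_sketch(model, points, image, point_idx, channel):
    # Score every input point by the gradient magnitude of a single descriptor
    # channel with respect to the point coordinates (simplified DAM-style map).
    points = points.clone().requires_grad_(True)   # (N, 3) coordinates
    desc = model(points, image)                    # (N, C) per-point descriptors (assumed interface)
    desc[point_idx, channel].backward()            # one channel value acts as the "loss"
    scores = points.grad.norm(dim=1)               # per-point significance
    return scores / (scores.max() + 1e-12)         # normalized to [0, 1]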

Citing our work

Please cite the following papers if you use our code:

@article{huang2021imfnet,
  title={IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration},
  author={Huang, Xiaoshui and Qu, Wentao and Zuo, Yifan and Fang, Yuming and Zhao, Xiaowei},
  journal={IEEE Robotics and Automation Letters},
  year={2022}
}

imfnet's People

Contributors

qwtforgithub


imfnet's Issues

dataset

Thanks for sharing the source code with the public.
I have a question about the dataset link you provide in the README: I just downloaded it and the "Kitti" data seems incomplete.
The structure of the dataset files doesn't look like what the code expects.
Can you please check that the source is correct?

my route is "C:\kuan\IMFNet\Kitti\dataset\poses"
"C:\kuan\IMFNet\Kitti\dataset\sequences\00\velodyne"

thanks a lot!!

Licensing?

I would like to reuse your project for one of my projects.

Do you mind adding a license you are comfortable with?

I might repackage IMFNet into a python library. I will of course give you credit for most of the fantastic work. Let me know if you have any issues with it.

Thanks for open sourcing your project.

I've got a problem understanding the code in transform_estimation.py

import torch

def est_quad_linear_robust(pts0, pts1, weight=None):
    # Robust estimation of a rigid transform aligning pts0 to pts1 via
    # iteratively reweighted least squares on a small-angle linearization.
    pts0_curr = pts0
    trans = torch.eye(4)

    par = 1.0  # robustness parameter, shrunk every 5 iterations
    if weight is None:
        weight = torch.ones(pts0.size()[0], 1)

    for i in range(20):
        if i > 0 and i % 5 == 0:
            par /= 2.0

        # Linearize around the current point positions and solve for a small
        # incremental transform (rotation vector + translation).
        A, b = build_linear_system(pts0_curr, pts1, weight)
        x = solve_linear_system(A, b)

        trans_curr = get_trans(x)                       # 4x4 incremental transform
        pts0_curr = update_pcd(pts0_curr, trans_curr)   # apply it to the moving points
        weight = compute_weights(pts0_curr, pts1, par)  # downweight outliers
        trans = trans_curr.mm(trans)                    # compose with the running estimate

    return trans

As shown above, I can't see why we can obtain the transformation matrix by iteration. The code of solve_linear_system also seems complicated. Can someone help clear up my confusion?

def build_linear_system(pts0, pts1, weight):
    # Three rows per correspondence: for a small rotation vector w and
    # translation t, pts1 - pts0 ≈ w x pts0 + t, stacked as A @ [w; t] = b.
    npts0 = pts0.shape[0]
    A0 = torch.zeros((npts0, 6))
    A1 = torch.zeros((npts0, 6))
    A2 = torch.zeros((npts0, 6))
    A0[:, 1] = pts0[:, 2]
    A0[:, 2] = -pts0[:, 1]
    A0[:, 3] = 1
    A1[:, 0] = -pts0[:, 2]
    A1[:, 2] = pts0[:, 0]
    A1[:, 4] = 1
    A2[:, 0] = pts0[:, 1]
    A2[:, 1] = -pts0[:, 0]
    A2[:, 5] = 1
    ww1 = weight.repeat(3, 6)
    ww2 = weight.repeat(3, 1)
    A = ww1 * torch.cat((A0, A1, A2), 0)
    b = ww2 * torch.cat(
        (pts1[:, 0] - pts0[:, 0], pts1[:, 1] - pts0[:, 1], pts1[:, 2] - pts0[:, 2]),
        0,
    ).unsqueeze(1)
    return A, b

def solve_linear_system(A, b):
    # Least-squares solution via the normal equations: x = (A^T A)^-1 A^T b.
    temp = torch.inverse(A.t().mm(A))
    return temp.mm(A.t()).mm(b)
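
If it helps, my reading (not an official answer) is that this is iteratively reweighted least squares over a small-angle linearization of the rigid motion. For a small rotation vector ω and translation t,

R(ω) p ≈ p + ω × p,   so each iteration solves   min over (ω, t) of Σ_i w_i ‖ ω × p0_i + t − (p1_i − p0_i) ‖²,

which is exactly the over-determined linear system A [ω; t] = b that build_linear_system assembles (three rows per correspondence) and solve_linear_system solves via the normal equations x = (AᵀA)⁻¹ Aᵀ b. Each of the 20 iterations re-linearizes around the updated points, shrinks the robust parameter par so compute_weights suppresses outliers more aggressively, and composes the small incremental transform into trans; that is why the loop converges to the final transformation matrix.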

Questions at the evaluation stage

Hello, I am very interested in your paper. Could you share the download link for the keypoints? I cannot open the links in "For benchmarking the trained weights, download the pretrain file here. We also provide key points (5000) and some other results, here." Thank you again, and I look forward to your response.

Correspondences present in the result file IMFNet_3DMatch_result do not seem to be accurate

In the result folder IMFNet_3DMatch_result, taken from (https://drive.google.com/drive/folders/1Pb9bkQefwgBfxcrrfUokiY7_EYv10dfD?usp=sharing), the corresponding keypoints given for pairs of scans in the IMFNet subfolder don't seem to be accurate.

The corresponding keypoints are numpy array files which have two sets of indices inds_i and inds_j in each file. I assume that they are the keypoint ID's of the matched correspondences between the two scans 'i' and 'j'. For example, I presume that in the folder, "IMFNet_3DMatch_result/IMFNet", the file "7-scenes-redkitchen_seq-01_0_1_keypoints.npz" is the set of correspondences 'inds_i' and 'inds_j' between the scans "cloud_bin_0.ply" and "cloud_bin_1.ply" in the kitchen dataset in the 7-scenes-redkitchen dataset taken from http://vision.princeton.edu/projects/2016/3DMatch/downloads/scene-fragments/7-scenes-redkitchen.zip .

I assumed that the keypoint_ids given by inds_i and inds_j are the indices of the points in the point clouds "cloud_bin_0.ply" and "cloud_bin_1.ply" respectively. But there seems to be a mismatch in the correspondences on visual inspection, and they don't seem to give good registration results either. Are there any conventions or details that I am missing?

Also, the subfolders IMFNet and IMFNet_keypoints both seem to contain exactly the same .npz files. For example, the files "IMFNet/7-scenes-redkitchen_seq-01_0_1_keypoints.npz" and "IMFNet_keypoints/7-scenes-redkitchen_seq-01_0_1_keypoints.npz" have the same keypoint_id pairs inds_i and inds_j.
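
For reference, this is how I was reading the files (a sketch of the indexing assumption described above; the key names inds_i/inds_j come from the .npz contents, and Open3D is used only to load the .ply fragments):

import numpy as np
import open3d as o3d

data = np.load("IMFNet_3DMatch_result/IMFNet/7-scenes-redkitchen_seq-01_0_1_keypoints.npz")
inds_i, inds_j = data["inds_i"], data["inds_j"]
pts_i = np.asarray(o3d.io.read_point_cloud("cloud_bin_0.ply").points)[inds_i]
pts_j = np.asarray(o3d.io.read_point_cloud("cloud_bin_1.ply").points)[inds_j]
print(pts_i.shape, pts_j.shape)  # paired 3D coordinates under the indexing assumption above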

dataset and run out of memory

Thanks for sharing the impressive work!
Can you tell me where your "Kitti" dataset was downloaded from?
I tried "https://www.cvlibs.net/datasets/kitti/eval_odometry.php" and the file structure is not the same as yours.
I also have another question: when I train on 3DMatch, it always runs out of memory.
I have changed the batch size, voxel size, and subsample number. How can I run this program successfully? (My device is a 3080 Ti.)

thanks a lot, sincerely waiting for your response.

requirement.txt

Thanks for sharing this great open-source project on the social platform. Here is a question about requirement.txt: when I run the command to install all the packages inside the file, I get plenty of errors. Could you please check all the versions in requirement.txt and upload it again for us?

thanks a lot!!

Sincerely waiting for your response.
