
imfnet's Introduction

IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration, 2022


This repository is the implementation of IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration.

The existing state-of-the-art point descriptors rely on structure information only and omit texture information. However, texture information is crucial for humans to distinguish a scene part. Moreover, current learning-based point descriptors are all black boxes, and it is unclear how the original points contribute to the final descriptors. In this paper, we propose a new multimodal fusion method to generate point cloud registration descriptors by considering both structure and texture information. Specifically, a novel attention-fusion module is designed to extract the weighted texture information for descriptor extraction. In addition, we propose an interpretable module to explain our neural network by visually showing which original points contribute to the final descriptors. We use the descriptor's channel value as the loss to backpropagate to the target layer and consider the gradient as the significance of each point to the final descriptors. This paper moves one step further toward explainable deep learning in the registration task. Comprehensive experiments on 3DMatch, 3DLoMatch and KITTI demonstrate that the multimodal fusion descriptors achieve state-of-the-art accuracy and improve descriptor distinctiveness. We also demonstrate the effectiveness of our interpretable module in explaining the registration descriptor extraction.

Paper

FMR vs. RR


Feature-match recall and registration recall (log scale) on the 3DMatch benchmark.

The framework of IMFNet

The network architecture of the proposed IMFNet. The input is a point cloud and an image, and the output is the point descriptors. Inside the attention-fusion module, W is the weight matrix and FI is the point texture feature. The fusion feature (Ffe) of the point structure feature (Fpe) and the point texture feature (FI) is then fed to the decoder module to produce the output descriptors. Finally, the descriptors are interpreted by DAM.
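As a loose sketch of the attention-fusion idea (the module name, dimensions, and single-head formulation below are illustrative assumptions, not the exact module in this repository), the point texture feature can be obtained by cross-attention from point structure features to image features and then concatenated for decoding:

import torch
import torch.nn as nn

class AttentionFusionSketch(nn.Module):
    # Hedged sketch: queries come from the point structure feature (Fpe),
    # keys/values from the image feature; W is the attention weight matrix,
    # FI the resulting point texture feature, and the concatenation is the
    # fusion feature (Ffe) passed to the decoder.
    def __init__(self, d_point=256, d_image=256, d_model=256):
        super().__init__()
        self.q = nn.Linear(d_point, d_model)
        self.k = nn.Linear(d_image, d_model)
        self.v = nn.Linear(d_image, d_model)

    def forward(self, f_pe, f_img):
        # f_pe: (N, d_point) point structure features
        # f_img: (M, d_image) flattened image features
        Q, K, V = self.q(f_pe), self.k(f_img), self.v(f_img)
        W = torch.softmax(Q @ K.t() / K.shape[-1] ** 0.5, dim=-1)  # (N, M) weight matrix
        f_i = W @ V                                                 # (N, d_model) texture feature FI
        return torch.cat([f_pe, f_i], dim=-1)                       # fusion feature Ffe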

The Overall Framework

Please refer to our paper for more details.

Visualization of DAM

Our DAM can visualize the distribution of point contributions to descriptor extraction.

(DAM visualizations: IMFNet vs. FCGF)

Requirements

  • Ubuntu 18.04.1 or higher
  • CUDA 11.1 or higher
  • Python v3.6 or higher
  • PyTorch v1.8 or higher
  • MinkowskiEngine v0.5 or higher
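
A quick way to sanity-check the versions above (a minimal sketch; assumes the usual import names):

import torch
import MinkowskiEngine as ME

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(ME.__version__)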

Dataset Download

For 3DMatch and 3DLoMatch, images are selected for each point cloud based on their covered content to construct a dataset of paired images and point clouds named 3DImageMatch. Our experiments are conducted on this dataset; the dataset construction and training details are provided in the supplementary material. Download 3DImageMatch/Kitti. The extraction code is p2gl.

Please concatenate the split files:

# 3DImageMatch
cat x00 x01 ... x17 > 3DImageMatch.zip
# Kitti
cat Kitti01 ... Kitti10 > Kitti.zip

Then, unzip the zip files.
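
For example, assuming the split parts are in the current working directory (a shell sketch, not from the repository):

# 3DImageMatch (parts x00 ... x17)
cat x?? > 3DImageMatch.zip && unzip 3DImageMatch.zip
# Kitti (parts Kitti01 ... Kitti10)
cat Kitti?? > Kitti.zip && unzip Kitti.zip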

Training

Train on 3DMatch

python train.py train_3DMatch.py

Train on Kitti

python train.py train_Kitti.py

Evaluating

To benchmark with the trained weights, download the pretrained model here. We also provide keypoints (5000) and some other results here.

Evaluating on 3DMatch or 3DLoMatch

# Generating Descriptors
python generate_desc.py --source <Testing Set Path> --target <Output Path> --model <CheckPoint Path>
# Evaluating 3DMatch
python evaluation_3dmatch.py --pcloud_root <Testing Set Path> --out_root <Output Path> --desc_types ['IMFNet'] --desc_roots ['<Descriptors Path>'] --benchmarks "3DMatch"
# Evaluating 3DLoMatch
python evaluation_3dmatch.py --pcloud_root <Testing Set Path> --out_root <Output Path> --desc_types ['IMFNet'] --desc_roots ['<Descriptors Path>'] --benchmarks "3DLoMatch"
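
For intuition on what the 3DMatch evaluation measures (this is a generic sketch of the feature-match-recall inlier test, with thresholds and array shapes assumed, not the script above): a fragment pair is compared by finding mutual nearest-neighbor descriptor matches and checking how many fall within an inlier distance under the ground-truth transform.

import numpy as np

def inlier_ratio(desc0, desc1, kpts0, kpts1, T_gt, inlier_dist=0.10):
    # Mutual nearest neighbours in descriptor space.
    d2 = (desc0 ** 2).sum(1)[:, None] + (desc1 ** 2).sum(1)[None, :] - 2.0 * desc0 @ desc1.T
    nn01, nn10 = d2.argmin(axis=1), d2.argmin(axis=0)
    i0 = np.nonzero(nn10[nn01] == np.arange(len(desc0)))[0]
    i1 = nn01[i0]
    if len(i0) == 0:
        return 0.0
    # Residuals of matched keypoints after applying the ground-truth transform T_gt (4x4).
    p0 = kpts0[i0] @ T_gt[:3, :3].T + T_gt[:3, 3]
    residual = np.linalg.norm(p0 - kpts1[i1], axis=1)
    return float((residual < inlier_dist).mean())

# A fragment pair counts as matched for FMR when the inlier ratio exceeds a
# recall threshold (commonly 0.05 on 3DMatch, with a 0.10 m inlier distance).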

Evaluating on Kitti

# Evaluating Kitti
python evaluation_kitti.py --save_dir <Output Path> --kitti_root <Testing Set Path>

Descriptor Activation Mapping

Visualize the target descriptor

python dam.py --target <target point index>
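
For reference, a simplified variant of the gradient-based idea behind DAM (my own sketch, not the repository's dam.py; it backpropagates a single descriptor channel to the input coordinates rather than to an intermediate layer, and the model interface is assumed):

import torch

def dam_sketch(model, points, image, point_idx, channel):
    # Score every input point by the gradient magnitude of a single descriptor
    # channel with respect to the point coordinates (simplified DAM-style map).
    points = points.clone().requires_grad_(True)   # (N, 3) coordinates
    desc = model(points, image)                    # (N, C) per-point descriptors (assumed interface)
    desc[point_idx, channel].backward()            # one channel value acts as the "loss"
    scores = points.grad.norm(dim=1)               # per-point significance
    return scores / (scores.max() + 1e-12)         # normalized to [0, 1]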

Citing our work

Please cite the following papers if you use our code:

@article{huang2021imfnet,
  title={IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration},
  author={Huang, Xiaoshui and Qu, Wentao and Zuo, Yifan and Fang, Yuming and Zhao, Xiaowei},
  journal={IEEE Robotics and Automation Letters},
  year={2022}
}

imfnet's People

Contributors

qwtforgithub


imfnet's Issues

dataset

Thanks for sharing the source code with the public.
I have a question about the dataset link you provide in the README: I just downloaded it and the "Kitti" data seems incomplete.
The structure of the dataset files doesn't look like what the code expects.
Can you please check that the source is correct?

my route is "C:\kuan\IMFNet\Kitti\dataset\poses"
"C:\kuan\IMFNet\Kitti\dataset\sequences\00\velodyne"

thanks a lot!!

Licensing?

I would like to reuse your project for one of my projects.

Do you mind adding a license you are comfortable with?

I might repackage IMFNet into a python library. I will of course give you credit for most of the fantastic work. Let me know if you have any issues with it.

Thanks for open sourcing your project.

I've got a problem understanding the code in transform_estimation.py

import torch

def est_quad_linear_robust(pts0, pts1, weight=None):
    # Robust estimation of a rigid transform aligning pts0 to pts1 via
    # iteratively reweighted least squares on a small-angle linearization.
    pts0_curr = pts0
    trans = torch.eye(4)

    par = 1.0  # robustness parameter, shrunk every 5 iterations
    if weight is None:
        weight = torch.ones(pts0.size()[0], 1)

    for i in range(20):
        if i > 0 and i % 5 == 0:
            par /= 2.0

        # Linearize around the current point positions and solve for a small
        # incremental transform (rotation vector + translation).
        A, b = build_linear_system(pts0_curr, pts1, weight)
        x = solve_linear_system(A, b)

        trans_curr = get_trans(x)                       # 4x4 incremental transform
        pts0_curr = update_pcd(pts0_curr, trans_curr)   # apply it to the moving points
        weight = compute_weights(pts0_curr, pts1, par)  # downweight outliers
        trans = trans_curr.mm(trans)                    # compose with the running estimate

    return trans

As shown above, I can't see why we can obtain the transformation matrix by iteration. The code of solve_linear_system also seems complicated. Can someone help clear up my confusion?

def build_linear_system(pts0, pts1, weight):
    # Three rows per correspondence: for a small rotation vector w and
    # translation t, pts1 - pts0 ≈ w x pts0 + t, stacked as A @ [w; t] = b.
    npts0 = pts0.shape[0]
    A0 = torch.zeros((npts0, 6))
    A1 = torch.zeros((npts0, 6))
    A2 = torch.zeros((npts0, 6))
    A0[:, 1] = pts0[:, 2]
    A0[:, 2] = -pts0[:, 1]
    A0[:, 3] = 1
    A1[:, 0] = -pts0[:, 2]
    A1[:, 2] = pts0[:, 0]
    A1[:, 4] = 1
    A2[:, 0] = pts0[:, 1]
    A2[:, 1] = -pts0[:, 0]
    A2[:, 5] = 1
    ww1 = weight.repeat(3, 6)
    ww2 = weight.repeat(3, 1)
    A = ww1 * torch.cat((A0, A1, A2), 0)
    b = ww2 * torch.cat(
        (pts1[:, 0] - pts0[:, 0], pts1[:, 1] - pts0[:, 1], pts1[:, 2] - pts0[:, 2]),
        0,
    ).unsqueeze(1)
    return A, b

def solve_linear_system(A, b):
    # Least-squares solution via the normal equations: x = (A^T A)^-1 A^T b.
    temp = torch.inverse(A.t().mm(A))
    return temp.mm(A.t()).mm(b)
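
If it helps, my reading (not an official answer) is that this is iteratively reweighted least squares over a small-angle linearization of the rigid motion. For a small rotation vector ω and translation t,

R(ω) p ≈ p + ω × p,   so each iteration solves   min over (ω, t) of Σ_i w_i ‖ ω × p0_i + t − (p1_i − p0_i) ‖²,

which is exactly the over-determined linear system A [ω; t] = b that build_linear_system assembles (three rows per correspondence) and solve_linear_system solves via the normal equations x = (AᵀA)⁻¹ Aᵀ b. Each of the 20 iterations re-linearizes around the updated points, shrinks the robust parameter par so compute_weights suppresses outliers more aggressively, and composes the small incremental transform into trans; that is why the loop converges to the final transformation matrix.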

Questions at the evaluation stage

Hello, I am very interested in your paper. Could you share the download link for the keypoints? I cannot open the links in "For benchmarking the trained weights, download the pretrain file here. We also provide key points (5000) and some other results, here." Thank you again, and I look forward to your response.

Correspondences present in the result file IMFNet_3DMatch_result do not seem to be accurate

In the result folder IMFNet_3DMatch_result, taken from (https://drive.google.com/drive/folders/1Pb9bkQefwgBfxcrrfUokiY7_EYv10dfD?usp=sharing), the corresponding keypoints given for pairs of scans in the IMFNet subfolder don't seem to be accurate.

The corresponding keypoints are numpy array files which have two sets of indices inds_i and inds_j in each file. I assume that they are the keypoint ID's of the matched correspondences between the two scans 'i' and 'j'. For example, I presume that in the folder, "IMFNet_3DMatch_result/IMFNet", the file "7-scenes-redkitchen_seq-01_0_1_keypoints.npz" is the set of correspondences 'inds_i' and 'inds_j' between the scans "cloud_bin_0.ply" and "cloud_bin_1.ply" in the kitchen dataset in the 7-scenes-redkitchen dataset taken from http://vision.princeton.edu/projects/2016/3DMatch/downloads/scene-fragments/7-scenes-redkitchen.zip .

I assumed that the keypoint_ids given by inds_i and inds_j are the indices of the points in the point clouds "cloud_bin_0.ply" and "cloud_bin_1.ply" respectively. But there seems to be a mismatch in the correspondences on visual inspection, and they don't seem to give good registration results either. Are there any conventions or details that I am missing?

Also, the subfolders IMFNet and IMFNet_keypoints both seem to contain exactly the same .npz files. For example, the files "IMFNet/7-scenes-redkitchen_seq-01_0_1_keypoints.npz" and "IMFNet_keypoints/7-scenes-redkitchen_seq-01_0_1_keypoints.npz" have the same keypoint_id pairs inds_i and inds_j.
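
For reference, this is how I was reading the files (a sketch of the indexing assumption described above; the key names inds_i/inds_j come from the .npz contents, and Open3D is used only to load the .ply fragments):

import numpy as np
import open3d as o3d

data = np.load("IMFNet_3DMatch_result/IMFNet/7-scenes-redkitchen_seq-01_0_1_keypoints.npz")
inds_i, inds_j = data["inds_i"], data["inds_j"]
pts_i = np.asarray(o3d.io.read_point_cloud("cloud_bin_0.ply").points)[inds_i]
pts_j = np.asarray(o3d.io.read_point_cloud("cloud_bin_1.ply").points)[inds_j]
print(pts_i.shape, pts_j.shape)  # paired 3D coordinates under the indexing assumption above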

dataset and run out of memory

Thanks for sharing the impressive work!
Can you tell me where your "Kitti" dataset was downloaded from?
I tried "https://www.cvlibs.net/datasets/kitti/eval_odometry.php" and the file structure is not the same as yours.
I also have another question: when I train on 3DMatch, it always runs out of memory.
I have changed the batch size, voxel size, and subsample number. How can I run this program successfully? (My device is a 3080 Ti.)

thanks a lot, sincerely waiting for your response.

requirement.txt

Thanks for sharing this great open-source project on the social platform. Here is a question about requirement.txt: when I run the command to install all the packages inside the file, I get plenty of errors. Could you please check all the versions in requirement.txt and upload it again for us?

thanks a lot!!

Sincerely waiting for your response.
