jac99 / MinkLocMultimodal
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
License: MIT License
I have a few questions regarding the KITTI dataset.
I look forward to your reply. Thank you!
Hi, thanks for your great work on multi-modal fusion for place recognition.
However, the Oxford RobotCar dataset has been unavailable since 2022-11-08, and there is no sign of the download re-opening. Could you please provide the centre RGB images via Google Drive?
Thanks in advance.
According to your paper, the database and query split of the KITTI dataset was as follows:
We take Sequence 00 which visits the same places repeatedly and construct the reference database using the data gathered during the first 170 seconds. The rest is used as localization queries.
However, when I tried this split (sequence 00 -> database: first 170 s, query: the rest, i.e. roughly 170~470 s), most queries did not have a nearby database element. Could you please explain this setting in more detail, or send me the code?
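For reference, here is a minimal sketch of how such a split could be checked, assuming KITTI odometry sequence 00 ground-truth poses and the standard times.txt timestamps; this is not the authors' code, and the 5 m success threshold is an illustrative assumption:

```python
import numpy as np

# Sketch of the split described above, assuming KITTI odometry sequence 00
# ground-truth poses (poses/00.txt) and per-scan timestamps (times.txt).
poses = np.loadtxt('poses/00.txt').reshape(-1, 3, 4)   # one 3x4 pose per scan
timestamps = np.loadtxt('sequences/00/times.txt')      # seconds from sequence start

db_mask = timestamps <= 170.0                # first 170 s -> reference database
db_xy = poses[db_mask][:, [0, 2], 3]         # planar (x, z) positions in camera frame
query_xy = poses[~db_mask][:, [0, 2], 3]     # remaining scans -> localization queries

# Distance from each query to its nearest database element; a query is
# localizable only if some database scan lies within the threshold.
dists = np.linalg.norm(query_xy[:, None, :] - db_xy[None, :, :], axis=2)
print('queries with a database element within 5 m:', (dists.min(axis=1) < 5.0).mean())
```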
Hi, in your paper "As point coordinates in the Oxford RobotCar dataset are normalized to be within [−1,1] range, this gives up to 200 voxels in each spatial direction." May I know what algorithm was used to get the normalized point cloud between -1 and 1?
Hello, I have another question. When I read a point cloud .bin file, I found that the values inside are very small, all less than 1. What preprocessing has been applied? I look forward to your reply. Thank you!
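For anyone else wondering: the benchmark clouds (from the PointNetVLAD benchmark) are typically already ground-removed, downsampled to 4096 points, and rescaled into [-1, 1]. A minimal sketch of such a normalization, as an assumption about the exact procedure rather than the authors' code:

```python
import numpy as np

def normalize_cloud(points: np.ndarray) -> np.ndarray:
    # Shift to zero mean and rescale so every coordinate lies in [-1, 1].
    # This mirrors the preprocessing described for the PointNetVLAD benchmark
    # clouds; the exact upstream procedure may differ.
    centered = points - points.mean(axis=0)
    return centered / np.abs(centered).max()

# The benchmark .bin files store 4096 points as float64 (x, y, z),
# already normalized, which is why all values are smaller than 1.
cloud = np.fromfile('example.bin', dtype=np.float64).reshape(-1, 3)
print(cloud.min(), cloud.max())
```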
Hello, thanks for your great work.
I have run your training code and found something confusing. The image loss is considerably larger than the point cloud loss (around 100 times), which the paper also mentions as an overfitting problem. Could you please explain this in more detail? Why are there fewer active triplets for the RGB image modality than for the 3D modality, and does an active triplet correspond to num_non_zero_triplets in the code? resnetFPN is already pretrained. Is the large training loss mainly caused by the huge difference in illumination across traversals in the Oxford RobotCar dataset? It is also a bit worrying that the image loss goes down even when its weight (beta) is set to 0.0. Looking forward to your reply. Thanks.
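For context, a minimal sketch of what "active" (non-zero) triplets means in a standard batch-hard triplet loss; the formulation is generic, not copied from the repository:

```python
import torch

def batch_hard_triplet_stats(dist_ap, dist_an, margin=0.2):
    # dist_ap: anchor-positive distances, shape (B,)
    # dist_an: anchor-negative distances, shape (B,)
    # A triplet is "active" (non-zero) when it still violates the margin,
    # i.e. d(a, p) - d(a, n) + margin > 0; satisfied triplets contribute 0.
    losses = torch.clamp(dist_ap - dist_an + margin, min=0.0)
    num_non_zero_triplets = (losses > 0).sum()
    return losses.mean(), num_non_zero_triplets
```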
Hello~ Thanks for your work!
I saw that your paper reports results using only RGB images, but I could not find the corresponding weight file or run instructions. Could you tell me the steps to reproduce this?
Thanks for the work!
I'm a beginner with point clouds. When I run generate_rgb_for_lidar.py, it seems that lidar2image_ndx.pickle is required, but I cannot figure out where it is first generated: the error is raised before the function create_lidar2img_ndx has been run. Could you please give me a hint about that?
Thanks for your great work.
I didn't see inference time discussed in the paper. How does MinkLoc++ inference efficiency compare to MinkLoc3D (22 ms per cloud)?
Hi,
Thanks for your nice work and shared code.
To get the image corresponding to each lidar point cloud, I followed your instructions and ran the generate_rgb_for_lidar.py script first. But the code shown below confuses me: may I know what lidar2image_ndx_path, lidar2image_ndx.pickle and pickle..., etc. are? Do I need to run other scripts to get these files?
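As an illustration only (the actual structure built by create_lidar2img_ndx in the repository may differ), such an index is plausibly a pickled dict mapping each lidar scan timestamp to the timestamps of images captured close in time; the image timestamps below are made up:

```python
import pickle

# Hypothetical illustration of a lidar-to-image index: one lidar timestamp
# mapped to nearby camera image timestamps. Not the repository's real format.
lidar2image_ndx = {1435937763823973: [1435937763812000, 1435937763874000]}

with open('lidar2image_ndx.pickle', 'wb') as f:
    pickle.dump(lidar2image_ndx, f)

# Presumably the script then loads it from a path like lidar2image_ndx_path:
with open('lidar2image_ndx.pickle', 'rb') as f:
    ndx = pickle.load(f)
```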
Hello~ Thanks for your work!
How can I reproduce the result of Table 3 in your paper? Can you provide the script that finds, for each image in the RobotCar Seasons dataset, the LiDAR readings with corresponding timestamps in the original RobotCar dataset?
Hello!
I ran your code and found that the loss becomes NaN after 40 epochs. It is caused by the values of all embeddings becoming NaN. Did you encounter this during your research? Is there a bug, or is it just an accident?
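For anyone debugging the same symptom, a minimal sketch of how to track down where NaNs first appear, using standard PyTorch tools (generic debugging advice, not a fix specific to this repository):

```python
import torch

# Raise an error at the first backward op that produces NaN/Inf, instead of
# discovering the problem only when the loss turns NaN many epochs later.
torch.autograd.set_detect_anomaly(True)

def check_embeddings(embeddings: torch.Tensor, step: int):
    # Fail fast as soon as any embedding value becomes non-finite.
    if not torch.isfinite(embeddings).all():
        raise RuntimeError(f'non-finite embeddings at step {step}')

# A common mitigation while the root cause is investigated (assumes `model`):
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```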
Hi, thanks for your great work.
The pre-processed and downsampled RobotCar images are unavailable here. Could you please re-upload the pre-processed images via Google Drive?
Thanks in advance.
Hi, I wanted to recreate the generalisation results on the KITTI dataset that you mention in the paper. I would appreciate any advice on how to run the model on KITTI.
I've tried to implement validation during training and ran into a problem: in the validation phase the script returns an error:
AssertionError: Unknown lidar timestamp: 1435937763823973
I've checked the index generation script and found that there is a bug in the code:
I am pretty sure that it should be ts, traversal = get_ts_traversal(val_queries[e].rel_scan_filepath)
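To make the suspected issue concrete, a sketch under the assumption that the index-generation script reads timestamps from the wrong query list; the surrounding loop is reconstructed, not copied from the repository:

```python
for e in range(len(val_queries)):
    # Suspected bug: timestamps taken from a different query list, so
    # validation-only lidar timestamps never enter the index and later raise
    # "Unknown lidar timestamp: ...".
    # ts, traversal = get_ts_traversal(train_queries[e].rel_scan_filepath)

    # Fix suggested in this issue:
    ts, traversal = get_ts_traversal(val_queries[e].rel_scan_filepath)
```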
I don't know if it's just me, but I am using it to map lidar scans to images of the Oxford dataset, and I end up with a very low number of samples per run. I suppose the mapping doesn't map every point cloud to an image.
When evaluating the KITTI dataset, how did you map the point cloud to the image?
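For reference, the standard way to associate KITTI velodyne points with camera pixels uses the calibration matrices shipped with the dataset; a minimal sketch of that usual recipe (not necessarily what the authors did):

```python
import numpy as np

def project_velo_to_image(points, Tr_velo_to_cam, R0_rect, P2):
    # points:          (N, 3) velodyne coordinates
    # Tr_velo_to_cam:  (3, 4) velodyne -> camera transform from the calib file
    # R0_rect:         (3, 3) rectifying rotation
    # P2:              (3, 4) projection matrix of the left colour camera
    # Standard KITTI recipe: x_img ~ P2 @ R0_rect @ Tr_velo_to_cam @ x_velo
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)                  # (3, N)
    in_front = cam[2] > 0                                       # points ahead of the camera
    img = P2 @ np.vstack([cam, np.ones((1, cam.shape[1]))])     # (3, N)
    uv = (img[:2] / img[2]).T                                   # (N, 2) pixel coordinates
    return uv, in_front
```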
Hi,
My question is about the way you generate batches, in samplers.py:
MinkLocMultimodal/datasets/samplers.py, line 92 at commit 683ef1a.
Hi, I am confused about positives_masks and negatives_masks. Could you please explain them in more detail?
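For context, a minimal sketch of how such boolean masks are typically built for a batch in metric-learning pipelines; this is an assumption about their meaning, and the .positives / .non_negatives attributes are illustrative, not the repository's exact API:

```python
import torch

def build_masks(batch_ids, queries):
    # positives_mask[i, j] is True when element j is a true positive for
    # anchor i (e.g. captured within a few metres of the same place);
    # negatives_mask[i, j] is True when j is certainly a different place.
    # Elements that are neither (an uncertain middle zone) get False in both.
    n = len(batch_ids)
    positives_mask = torch.zeros((n, n), dtype=torch.bool)
    negatives_mask = torch.zeros((n, n), dtype=torch.bool)
    for i, a in enumerate(batch_ids):
        for j, b in enumerate(batch_ids):
            if i == j:
                continue
            if b in queries[a].positives:
                positives_mask[i, j] = True
            elif b not in queries[a].non_negatives:
                negatives_mask[i, j] = True
    return positives_mask, negatives_mask
```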
Hi, in your work you used the 2D LiDAR from Oxford RobotCar (2 x SICK LMS-151 2D LiDAR, 270° FoV, 50 Hz, 50 m range, 0.5° resolution). According to the documentation, each 2D scan consists of 541 triplets of (x, y, R), where x, y are the 2D Cartesian coordinates of the LiDAR return relative to the sensor (in metres), and R is the measured infrared reflectance value. Is the point cloud data you used therefore not a real 3-dimensional dataset, with the z coordinate representing the reflectance?
Hi, thanks for your great work.
I have one question.
The stereo/center camera data downloaded from the official website is single-channel, but I haven't seen any code in your implementation for handling single-channel data. Did I overlook something?
Thanks in advance.
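For anyone hitting the same issue: the raw RobotCar camera images are single-channel Bayer mosaics and are meant to be demosaiced into RGB, e.g. with the robotcar-dataset-sdk or OpenCV. A minimal sketch; the Bayer pattern chosen below is an assumption and should be verified against the SDK for the camera used:

```python
import cv2

# Raw RobotCar images are single-channel Bayer mosaics; demosaic to RGB.
# The 'GB' pattern here is an assumption for the Grasshopper2 stereo cameras;
# check the robotcar-dataset-sdk before relying on it.
raw = cv2.imread('raw_image.png', cv2.IMREAD_GRAYSCALE)
rgb = cv2.cvtColor(raw, cv2.COLOR_BayerGB2RGB)
cv2.imwrite('demosaiced.png', cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR))
```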