
lightglue's People

Contributors

ducha-aiki, fabio-sim, phil26at, sarlinpe, skydes, yusufaydin0797


lightglue's Issues

Batch mode slow

Hi, I modified your code to support batch mode. But when I tested it on 40 pairs of images, batch mode was slower (0.57 s) than matching the pairs one by one (0.48 s). Do you know what the potential issue could be?

Another usage of LightGlue

Hi,
I trained a neural network that provides (x, y) coordinates of objects and descriptors for them (similar to SuperPoint).

When I use GT points and LightGlue, the matching works excellently. But when I use the estimated (x, y) coordinates, LightGlue filters out many matches (in my example, I have 264 detected points but LightGlue returns only 110 matches).

It is worth highlighting that the objects can move slightly between frames.

I set the parameters:

depth_confidence: -1
width_confidence: -1
filter_threshold: 1e-5
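
(For reference, I pass these when constructing the matcher, roughly like this; please correct me if I'm misreading the API:)

matcher = LightGlue(features='superpoint',
                    depth_confidence=-1, width_confidence=-1, filter_threshold=1e-5)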

Is it possible to extend the algorithm for this purpose? Could you suggest any changes/improvements for such a task?

SIFT+LightGlue

Hello, thank you for publishing this great code. Could you please also share the SIFT+LightGlue code?

Getting 0 matching correspondences for some image pairs when feeding a whole video as consecutive frames

Hello, this is a very helpful project. I am trying to recover the camera pose from image correspondences, but when I feed a whole video as frames, some frame pairs get 0 or fewer than 3 matches.

extractor = SuperPoint(max_num_keypoints=2048)
matcher = LightGlue(features='superpoint')
feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)
matches01 = matcher({'image0': feats0, 'image1': feats1})
feats0, feats1, matches01 = [rbd(x) for x in [feats0, feats1, matches01]]  # remove batch dimension

kpts0, kpts1, matches = feats0['keypoints'], feats1['keypoints'], matches01['matches']
m_kpts0, m_kpts1 = kpts0[matches[..., 0]], kpts1[matches[..., 1]]
return m_kpts0.cpu(), m_kpts1.cpu()
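
For context, the downstream step looks roughly like this (my own sketch, assuming the camera intrinsics K are known; variable names follow the snippet above). The 5-point solver needs at least 5 correspondences, so pairs with 0-3 matches break it:

import cv2

pts0, pts1 = m_kpts0.numpy(), m_kpts1.numpy()
if len(pts0) < 5:
    # too few matches for the essential matrix:
    # skip this pair or match against an earlier keyframe instead
    pass
else:
    E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=mask)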

LightGlue on cases with either 90 or 180 degree rotation

Thanks for sharing this cool repo! I've been getting great results with image pairs where "up is up." However, I'm curious about cases where there's a 90-degree or even 180-degree rotation. It seems to work when we manually rectify those images before applying the technique. But I'm wondering if there's a way to do this without the manual step.

"up is up" scenario:
1

"up is not up" scenarios:
2

3
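
A workaround I'm considering (my own sketch, not part of the official API): extract once from image0, then try the four cardinal rotations of image1 and keep whichever yields the most matches (the keypoints would still need to be rotated back afterwards):

import torch
from lightglue.utils import rbd  # as in the README

feats0 = extractor.extract(image0)
best = None
for k in range(4):
    rotated = torch.rot90(image1, k, dims=(-2, -1))  # 0, 90, 180, 270 degrees
    feats1 = extractor.extract(rotated)
    matches01 = rbd(matcher({'image0': feats0, 'image1': feats1}))
    n = matches01['matches'].shape[0]
    if best is None or n > best[0]:
        best = (n, k, feats1, matches01)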

Reproducing results with SuperPoint + MNN on Megadepth-1500

Hello, first of all thank you for the great work and for making the license permissive; it will surely boost research in image matching!

I am trying to reproduce SuperPoint + MNN as a baseline. For that, I follow the protocol of the paper, trying to achieve results as close as possible to the values reported in Table 2 of the LightGlue paper. I am taking the following steps:

  • Resize image such that longer dim is 1600 px;
  • Extract top 2048 keypoints using the default parameters for SuperPoint defined in this repo;
  • Match descriptors using NN + Mutual Check (a sketch of this step is included after the results below);
  • Use OpenCV findEssentialMat with prob=0.99999 and the default "classic" cv2.RANSAC. As those details are not explicitly mentioned, I'm basically following the LoFTR protocol defined in their original repo, as suggested in the LightGlue paper, for computing pose AUC @ [5, 10, 20].
  • I tested several inlier thresholds in the range [0.25, 2.5] px. The best result I can achieve is the following:
ransac_thr = 1.5 
{'auc@5': 0.251782299270867, 'auc@10': 0.3987322068921645, 'auc@20': 0.5415882032042043}
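
For reference, the NN + mutual-check step mentioned above is implemented roughly like this in my code (desc0, desc1 are the L2-normalized SuperPoint descriptors, shapes [N, 256] and [M, 256]):

import torch

sim = desc0 @ desc1.t()                               # cosine similarity, N x M
nn01 = sim.argmax(dim=1)                              # best candidate in image 1 for each kpt in image 0
nn10 = sim.argmax(dim=0)                              # best candidate in image 0 for each kpt in image 1
ids0 = torch.arange(desc0.shape[0], device=sim.device)
mutual = nn10[nn01] == ids0                           # keep only mutual nearest neighbours
matches = torch.stack([ids0[mutual], nn01[mutual]], dim=1)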

I also attempted to run LO-RANSAC instead of cv2.RANSAC, since it gives a great boost in AUC in Table 2, but without success. I tested the implementations from both pydegensac and cv2's USAC variants, but the results were far from the reported AUC@5 of 0.51, even after testing several inlier thresholds and different flags. Could you kindly provide more details on the SuperPoint parameters, the RANSAC implementation and the hyperparameters used to achieve these results, specifically for SuperPoint + MNN matching (Table 2)?

Thank you in advance!

There may be a bug in the code

The code computing m1 may contain a bug: the shape of attn10 is 'b h j i', so the shape of attn10.transpose(-2, -1) is 'b h i j'. Can you help check that?

qk0, qk1 = qk0 * self.scale**0.5, qk1 * self.scale**0.5
sim = torch.einsum('b h i d, b h j d -> b h i j', qk0, qk1)
attn01 = F.softmax(sim, dim=-1)
attn10 = F.softmax(sim.transpose(-2, -1).contiguous(), dim=-1)
m0 = torch.einsum('bhij, bhjd -> bhid', attn01, v1)
m1 = torch.einsum('bhji, bhjd -> bhid', attn10.transpose(-2, -1), v0)

Can you provide another descriptor for LightGlue?

The SuperPoint descriptor is 256-dimensional, which is too big for my application, but the SuperPoint (u, v) keypoints are the best for my use case. I want to know whether you could provide another descriptor (maybe 64-dimensional) to train LightGlue with.

Question about rotated images with LightGlue and OpenCV SIFT

Hi,
Thanks for your great work!
I'm using the LightGlue pretrained weights to get keypoints from a template image and an input image, and then I align the input image with the template using the obtained keypoints. I noticed that when the input image is rotated (by 90, 180 or 270 degrees), the aligned output image is distorted if I use the LightGlue keypoints, which suggests the alignment failed. However, when I use OpenCV SIFT to get the keypoints, the aligned output image is good.

Keypoints obtained with SIFT:

sift = cv2.SIFT_create(contrastThreshold=0.02)
...
kp2, des2 = sift.detectAndCompute(target_img, None)

Alignment:

# m_kpts1-> template image, m_kpts0 -> input image
M, mask = cv2.findHomography(m_kpts1, m_kpts0, cv2.RANSAC, 5.0) 
M_r = np.linalg.inv(M)
aligned_img = cv2.warpPerspective(src_img, M_r, (template_w, template_h))

Do you know whether the problem is caused by the LightGlue pretrained weights not working well for rotated images, or whether there is something wrong with my alignment code? Any suggestions are appreciated.

Batch mode results worse than non-batch

Hi,

I tested the speed and matching results in batch mode and non-batch mode and found that although batch mode is faster, its accuracy is worse. I checked the matched points' coordinates on each image in both modes and found that most of them are the same, but some differ.

I used a query image with 50 similar images for testing and printed the number of matched pairs in both modes:

batch matched points num: 179
non-batch matched points num: 179
batch matched points num: 109
non-batch matched points num: 107
batch matched points num: 107
non-batch matched points num: 106
batch matched points num: 124
non-batch matched points num: 117
batch matched points num: 111
non-batch matched points num: 113
batch matched points num: 138
non-batch matched points num: 140
batch matched points num: 125
non-batch matched points num: 125
batch matched points num: 136
non-batch matched points num: 129
batch matched points num: 110
non-batch matched points num: 108
batch matched points num: 126
non-batch matched points num: 127
batch matched points num: 141
non-batch matched points num: 135
batch matched points num: 137
non-batch matched points num: 130
batch matched points num: 157
non-batch matched points num: 157
batch matched points num: 129
non-batch matched points num: 126
batch matched points num: 93
non-batch matched points num: 93
batch matched points num: 115
non-batch matched points num: 113
batch matched points num: 71
non-batch matched points num: 106
batch matched points num: 53
non-batch matched points num: 128
batch matched points num: 70
non-batch matched points num: 132
batch matched points num: 57
non-batch matched points num: 76
batch matched points num: 87
non-batch matched points num: 106
batch matched points num: 68
non-batch matched points num: 119
batch matched points num: 85
non-batch matched points num: 76
batch matched points num: 58
non-batch matched points num: 96
batch matched points num: 87
non-batch matched points num: 75
batch matched points num: 121
non-batch matched points num: 150
batch matched points num: 73
non-batch matched points num: 85
batch matched points num: 89
non-batch matched points num: 128
batch matched points num: 79
non-batch matched points num: 133
batch matched points num: 125
non-batch matched points num: 112
batch matched points num: 67
non-batch matched points num: 118
batch matched points num: 75
non-batch matched points num: 114
batch matched points num: 67
non-batch matched points num: 45
batch matched points num: 83
non-batch matched points num: 97
batch matched points num: 98
non-batch matched points num: 168
batch matched points num: 62
non-batch matched points num: 85
batch matched points num: 94
non-batch matched points num: 101
batch matched points num: 106
non-batch matched points num: 82
batch matched points num: 88
non-batch matched points num: 80
batch matched points num: 32
non-batch matched points num: 42
batch matched points num: 95
non-batch matched points num: 113
batch matched points num: 98
non-batch matched points num: 180
batch matched points num: 88
non-batch matched points num: 101
batch matched points num: 51
non-batch matched points num: 109
batch matched points num: 77
non-batch matched points num: 114
batch matched points num: 85
non-batch matched points num: 99
batch matched points num: 64
non-batch matched points num: 62

I also inspected these matched pairs and found that non-batch mode is more accurate. Do you know why this happens? All other parameters were the same in both tests.

Thank you!

Match confidence

Hello, is there any score associated with the matched points?

pred = match_pair(extractor, matcher, prev_frame_t, frame_t)

'keypoints0'
'keypoint_scores0'
'descriptors0'
'keypoints1'
'keypoint_scores1'
'descriptors1'
'image0'
'image1'
'log_assignment'
'matches0'
'matches1'

I see that pred contains matching_score, but I can't find any info on what that actually is. What I would like to do is filter out potentially bad / low-confidence matches, for example on white walls with a low amount of features.
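
In case it helps to clarify what I'm after, this is roughly how I imagine the filtering would look, assuming 'matching_scores0' is the per-keypoint match confidence in [0, 1] (the 0.5 threshold is an arbitrary value I would tune):

scores0 = pred['matching_scores0']       # confidence of the match for each keypoint in image 0
matches0 = pred['matches0']              # index of the matched keypoint in image 1, -1 if unmatched
keep = (matches0 > -1) & (scores0 > 0.5)
good_kpts0 = pred['keypoints0'][keep]    # drop the batch dimension first if your version keeps it
good_kpts1 = pred['keypoints1'][matches0[keep]]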

Keypoints Pixels & Heatmap

I'd love to be able to generate a heatmap to overlay on each of the images used in a comparison to visualize the distribution of keypoints and highlight significant areas.

I don't have much experience handling torch.Tensor outputs, but I see the data below for an image comparison

kpts0[:3] - tensor([[1012.0620, 455.7500], [1371.6401, 755.7500], [1226.8101, 123.2500]])

kpts1[:3] - tensor([[382.5601, 11.8231], [412.9731, 11.8231], [502.7641, 11.8231]])

matches[:3] - tensor([[0, 794], [1, 1017], [2, 363]])

Can you suggest the best way to return (x,y) pixel values of keypoints for each image?

Thanks! This model & paper are really excellent. :)
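
For what it's worth, here is a rough sketch of what I had in mind, assuming the keypoints above are already (x, y) pixel coordinates in the original image and that image0 is a 3xHxW tensor (bin size and colormap are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

xy = kpts0.cpu().numpy()                      # (N, 2) array of (x, y) pixel coordinates
H, W = image0.shape[-2:]
heat, _, _ = np.histogram2d(xy[:, 1], xy[:, 0],
                            bins=(H // 16, W // 16), range=[[0, H], [0, W]])
plt.imshow(image0.permute(1, 2, 0).cpu().numpy())
plt.imshow(heat, extent=(0, W, H, 0), cmap='jet', alpha=0.4)  # keypoint-density overlay
plt.axis('off')
plt.show()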

No-match case

Hi,

It was unclear to me whether your early-exit scheme also supports rejecting a pair of frames entirely, i.e. deciding there is nothing to match. Is this something you've looked at? If not, would it be a straightforward extension, or do you see caveats?

loss

Hi, great work! How can I calculate the difference between the keypoints of two images and backpropagate it as a loss?

About feature extraction accuracy

Hi,

First of all, thanks for this excellent work.

When I use LightGlue for matching, I noticed that the input image is resized to 1024 in the default config, and I would like to know the reason for this. I found that changing the image size can offset the feature point positions; when I tried sparse reconstruction based on the matching results, this seemed to increase the mean reprojection error.

Thank you

How to use Revisiting Oxford and Paris to pre-train the model

Hi there

Thanks for your great work!

I am a new student in machine learning and am facing difficulties in retraining the model, mainly regarding the data and the loss.

  1. In section 4, "Details that Matter," the paper discusses using the "Revisiting Oxford and Paris" dataset for pre-training the model. However, since this dataset was designed for retrieval purposes, I am uncertain how to incorporate it into retraining the model.

  2. How do I calculate the matching error between two images after obtaining the SuperPoint feature points?

[Announcements] Release of training and evaluation code

Hit the subscribe button on the right of this issue if you wish to be notified of the training and evaluation code release in a separate repo. Please do not reply to this issue, so as not to spam other subscribers. Please do not contact us to ask for early access to the code.

ETA: July 2023

Estimating the H-Matrix (homography)

Hi,

Can you please let me know how to use the matching result to estimate the H matrix (homography)?

I can see that there are 283 matches between my image pair.


matches.size()
torch.Size([283, 2])

It would be helpful if you could help me with this.

Thank you!
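
A sketch of what I believe is the standard way (please correct me), using the demo notebook's variable names; the 5.0 px RANSAC threshold is an arbitrary choice:

import cv2

m_kpts0 = kpts0[matches[:, 0]].cpu().numpy()   # matched keypoints in image 0
m_kpts1 = kpts1[matches[:, 1]].cpu().numpy()   # corresponding keypoints in image 1
H, inliers = cv2.findHomography(m_kpts0, m_kpts1, cv2.RANSAC, 5.0)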

using another method as a feature extraction module

Thank you for your work. Due to the lack of training code, I am currently unable to swap in a different feature extraction method myself. I wonder if you would be interested in testing ALIKED as a feature extraction module.

Is it possible to run inference in batch mode

Hello,
I was wondering whether LightGlue could be run in batch mode. Since I have to match a large number of images, the bottleneck right now is inference time.
Thank you for your work!

TensorRT in C++

Hi! Thank you for making this available. Do you have any plans to create a TensorRT C++ inference example? Or are you aware of one?

Thanks!

Can you offer the training datasets?

The paper uses two datasets; can you share them? Then, when you release the training code, I can start training immediately.

  1. "first pre-train LightGlue on synthetic homographies of real images"
  2. "We use 170k images from the Oxford-Paris 1M distractors dataset, and split them into 150k/10k/10k images for training/validation/test"

resize_image() got an unexpected keyword argument 'grayscale'

The resize_image() function in utils doesn't take a 'grayscale' argument, which explains the "resize_image() got an unexpected keyword argument 'grayscale'" error. The only arguments this function takes are "image", "size", "fn", and "interp". If you want to convert the image to grayscale, you'll need to do it either before or after calling resize_image().

/content/LightGlue/lightglue/utils.py in load_image(path, resize, **kwargs)
119 image = read_image(path)
120 if resize is not None:
--> 121 image, _ = resize_image(image, resize, **kwargs)
122 return numpy_image_to_torch(image)
123

TypeError: resize_image() got an unexpected keyword argument 'grayscale'

https://colab.research.google.com/drive/1eH6Vv-K3pq-ben6LI2PcoYHBVbLFWVUu?usp=sharing#scrollTo=frknzZtJM_wb
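
A minimal sketch of the workaround (my assumption; using cv2 for the conversion, and assuming numpy_image_to_torch accepts a 2-D array):

import cv2
from lightglue.utils import read_image, resize_image, numpy_image_to_torch

image = read_image('assets/DSC_0410.jpg')          # H x W x 3 RGB array
image, _ = resize_image(image, 1024)               # no 'grayscale' kwarg here
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)     # do the conversion separately instead
tensor = numpy_image_to_torch(gray)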

Can I get the matching distances?

I want to know the matching accuracy of each matched point, so I need to make use of the matching distances,
but I get errors saying there are no matching distances in 'matches01'.

Do you know how to solve this?

About Inference time

I tested LightGlue and SuperGlue using both CPU and GPU. In both cases SuperGlue takes less time, but in the paper LightGlue is faster. I want to know why there is such a contradiction. My CPU: Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz; GPU: RTX 2080Ti.

Some confusion about this work

Hi! Thanks for your great work! But I don't quite understand some parts of the code.

About the training and results

  1. How much does the pre-training affect the final performance? Have you tried training LightGlue directly on MegaDepth (i.e., no pre-training)?
  2. As you state in C.1 Architecture, Confidence classifier: "... and its gradients are not propagated into the states to avoid impacting the matching accuracy." Why would the classifier's gradients impact the matching accuracy? Have you conducted experiments on this?
  3. Is the classifier trained after the pre-training or after the fine-tuning on MegaDepth?

some details in code

  1. Why is an additional row/column added to the log assignment matrix? I don't see it in the paper, and it doesn't even seem to be used in this code (code).
  2. Why assert that mask is None when using the self-installed FlashCrossAttention()? (FlashCrossAttention also supports a mask parameter.) (code)
  3. In CrossBlock, when flash is enabled, you do not actually use bidirectional cross-attention, right? (code) If so, what is the reason?
  4. Why not compute scores1 as scores1 = F.log_softmax(sim, 1) instead of the more complicated scores1 = F.log_softmax(sim.transpose(-1, -2).contiguous(), 2).transpose(-1, -2)? (code)

Thanks for your time! Looking forward to your reply!

TypeError: 'numpy._DTypeMeta' object is not subscriptable

I tried running the example script, but I get this strange error when trying to import LightGlue:

from lightglue import LightGlue 
.. 
.. <stack trace>
TypeError: 'numpy._DTypeMeta' object is not subscriptable

Seems to be using some typed implementation of numpy? Any guidance would be greatly appreciated.

about scores

Hi, thanks for posting your awesome code. Did you supervise the loss on the matching scores during the training stage?

Regarding estimating Camera matrix K, R and T

Hi,

Thank you, the matching results are very good. :)

Earlier I used SIFT for feature extraction and cv.detail_BestOf2NearestMatcher() for matching, then cv.detail_HomographyBasedEstimator() to estimate K, R and T.

How can I achieve this using LightGlue results? Could you please help me with calculating K, R and T from the LightGlue output?

Thank you
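
One possible bridge (my sketch, not an official recipe) is to convert the LightGlue output into OpenCV keypoints and matches so the existing cv.detail_* estimation pipeline can be reused; kpts0, kpts1 and matches are the tensors from the README example:

import cv2

cv_kpts0 = [cv2.KeyPoint(float(x), float(y), 1.0) for x, y in kpts0.cpu().numpy()]
cv_kpts1 = [cv2.KeyPoint(float(x), float(y), 1.0) for x, y in kpts1.cpu().numpy()]
cv_matches = [cv2.DMatch(int(i), int(j), 0.0) for i, j in matches.cpu().numpy()]

From there, K, R and T could presumably be estimated the same way as with SIFT, since only the keypoints and matches change.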

Image stitching

I know that this is probably not the primary focus of the code/repository.
But I found LightGlue very efficient for my use case, and I would like to merge multiple flat images (slices of a marble/granite slab).
I don't have homography issues, just a little light correction and blending to handle.
What is the preferred approach?

Images from [assets] are used for matching, but I get 0 matching correspondences

extractor = SuperPoint(max_num_keypoints=2048).eval().cuda()
matcher = LightGlue(features='superpoint').eval().cuda()

image0 = load_image('path/assets/DSC_0410.jpg').cuda()
image1 = load_image('path/assets/DSC_0411.jpg').cuda()

feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)

matches01 = matcher({'image0': feats0, 'image1': feats1})
feats0, feats1, matches01 = [rbd(x) for x in [feats0, feats1, matches01]]
kpts0 = feats0['keypoints']
kpts1 = feats1['keypoints']
print("kpts0 = ",kpts0.shape)
print("kpts1 = ",kpts1.shape)
matches = matches01['matches']
print("matches = ",matches.shape)

kpts0 = torch.Size([2048, 2])
kpts1 = torch.Size([2048, 2])
matches = torch.Size([0, 2])

Batch support

I suppose batch support is not yet implemented?
The inference speed is nice, but inference needs to be batched in order to use 100% of the GPU.

Matcher needs the image as an input

Hi there,

Thanks for this great work.

In class LightGlue(nn.Module), the images are not listed in required_data_keys, but later on they are needed in the forward function:

kpts0 = normalize_keypoints(
    kpts0_, size=data.get('image_size0'), shape=data['image0'].shape)
kpts1 = normalize_keypoints(
    kpts1_, size=data.get('image_size1'), shape=data['image1'].shape)

About IMC 2023

Thank you very much for your work.
I have some questions about IMC 2023. I want to switch my pipeline to your LightGlue. My feature matching is currently SuperPoint + SuperGlue, with which I get 0.65 on the heritage_dioscuri scene.
But when I switch to SuperPoint + LightGlue, the score is 0.48. I am very confused by this result because of the large decline.
It is strange because in the other scenes the scores improved.
The two setups use the same settings: images resized to 1600 and 2048 SuperPoint keypoints.

Thank you

How should I use this project to complete the pose estimation of the target object?

I have the camera intrinsic matrix, the 3D model of the target object, and an RGB image of the target object. My goal is to estimate the pose of the object in the camera coordinate system. I currently obtain the target object's bounding box with an object detection model and then use the PnP method. Can I use this project to obtain the pose of the target object more accurately?
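
A rough sketch of how this could slot in (my assumption, not a full pipeline): if matching against a reference view of the 3D model gives 2D image points whose 3D model coordinates are known, PnP with RANSAC can be run directly on those correspondences; pts_3d, pts_2d, K and dist_coeffs are placeholders here:

import cv2
import numpy as np

object_pts = np.asarray(pts_3d, dtype=np.float64)   # N x 3 points in model coordinates
image_pts = np.asarray(pts_2d, dtype=np.float64)    # N x 2 matched pixel coordinates
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, dist_coeffs,
                                             reprojectionError=3.0)
R, _ = cv2.Rodrigues(rvec)                          # object pose in the camera frame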

ONNX convert/export

Hello!

First of all, I would like to applaud your work in pushing the envelope on SOTA local feature matching with LightGlue!

I've actually made an ONNX-compatible version at https://github.com/fabio-sim/LightGlue-ONNX. It'd be great if you could kindly add a link to it in your readme :)

With ONNX, however, come some caveats (e.g., difficulty in exporting dynamic control flow). Do let me know if you've got any ideas for supporting early stopping & adaptive point pruning in ONNX Runtime. Have a good day!

Training/Evaluation code

I would like to train/test LightGlue with other feature extractor models (like R2D2 and SIFT); could you please publish the training code?

About the positional encoding

Hi!
Your ablation experiments demonstrate the excellent performance of relative positional encoding; however, I have two questions:

  1. The original RoPE uses sinusoidal encoding. I don't quite understand why you use "Fourier features" instead.
  2. The original RoPE is designed for language, which is 1-dimensional. If I'm not mistaken, you simply use the 1-D RoPE to encode the keypoint positions in the code. However, an image is 2-dimensional, so I think this is not suitable. Or is that the reason you use the "Fourier features"?

Looking forward to your reply!
