
Comments (7)

axelBarroso avatar axelBarroso commented on June 8, 2024

Hello!

Thank you for your interest in our work! Here are the answers:

  1. We use the max(e^R, e^t) error, i.e., the maximum of the rotation and translation errors, for binning the pose hypotheses.
  2. We use a total of 15 bins. We do not create the bins uniformly, since we are mainly interested in poses with low errors. Thus, we use a log function to split the poses such that low-error poses are sampled more often. We use ln(x)/0.35, but we did not do an extensive study on this function, and other ways of splitting the data might be better (see the sketch after this list).
  3. During training, we use 56 image pairs and a single hypothesis. We investigated using fewer image pairs while sampling more hypotheses, but did not observe a significant difference.
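
To make the binning concrete, here is a minimal sketch of such a scheme in Python (the clipping of very small errors and the exact bin-edge handling are illustrative, not our exact implementation):

```python
import numpy as np

NUM_BINS = 15  # total number of bins

def pose_error(e_R, e_t):
    """Pose error used for binning: the max of rotation and translation errors (degrees)."""
    return max(e_R, e_t)

def error_to_bin(err_deg):
    """Map a pose error to a bin index; ln(x)/0.35 makes low-error bins narrower."""
    x = max(err_deg, 1.0)            # errors below 1 degree fall into the first bin
    idx = int(np.log(x) / 0.35)      # log-spaced split: denser bins at low errors
    return min(idx, NUM_BINS - 1)    # clamp to the last bin (errors above ~134 degrees map there)
```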

Hope this helps!
Axel

efreidun avatar efreidun commented on June 8, 2024

Thank you very much for the swift reply. That clarifies it!

May I also ask for some clarification on a doubt I have regarding the validation splits:

In the paper it is mentioned that the validation splits from LoFTR are followed. If I'm not mistaken, LoFTR uses 1500 test pairs from SuperGlue for ScanNet, and 1500 sampled pairs from “Sacre Coeur” and “St. Peter’s Square” for MegaDepth. However, as discussed in the appendix, SuperGlue (and likewise LoFTR) is trained on pairs with much higher visual overlap. So my understanding is that you draw your own image-pair samples with [0.1, 0.4] visual overlap for both ScanNet and MegaDepth.

My doubt is:

  1. Which scenes do you use for drawing the samples? Is it the "test scans" for ScanNet and only “Sacre Coeur” and “St. Peter’s Square” for MegaDepth?
  2. How many sample pairs do you draw for validation of each dataset?
  3. By any chance do you use a separate test split (separate from training/validation) for reporting the results in the tables?

Thanks again!
Fereidoon

axelBarroso avatar axelBarroso commented on June 8, 2024

Hi!

You are right: we use a "custom" training, validation, and test split with image pairs that have little overlap (10%-40%). Here are some clarifications on how we generate the training, validation, and test sets:

  1. For the indoor dataset, we use the standard training, validation, and test splits, but we sample our own image pairs. If you are interested in the results on the standard ScanNet test set (the one in SuperGlue), we report them in the supplementary material (Table 5). For the outdoor scenes, we use MegaDepth scenes for training and validation. We remove scenes that appear in IMW or in the test set of the PhotoTourism dataset. As you noted, there are "only" two test scenes in MegaDepth, and hence, we use the PhotoTourism test scenes as our testing scenes. Further details are in Section 5.2.

  2. For training and validation, we sample 90,000 and 30,000 image pairs, respectively (see the sampling sketch after this list). With our configuration, we didn't see further improvements when increasing the dataset size.

  3. As mentioned in answer 1), the training and validation sets are from different scenes than the test scenes we use to report the results in the paper tables.
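
As a rough illustration of the pair sampling mentioned in point 2, assuming a per-scene matrix of pairwise visual-overlap scores (this is a sketch, not our exact pipeline):

```python
import numpy as np

def sample_pairs(overlap, num_pairs, lo=0.1, hi=0.4, seed=0):
    """Draw image-index pairs whose visual-overlap score lies in [lo, hi].

    `overlap` is assumed to be a symmetric (N, N) matrix of overlap scores
    for a single scene.
    """
    rng = np.random.default_rng(seed)
    ii, jj = np.where((overlap >= lo) & (overlap <= hi))
    keep = ii < jj                                    # keep each unordered pair once
    candidates = np.stack([ii[keep], jj[keep]], axis=1)
    sel = rng.choice(len(candidates), size=min(num_pairs, len(candidates)), replace=False)
    return candidates[sel]

# pairs would be aggregated over scenes to reach ~90,000 training and ~30,000 validation pairs
```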

Hope this helps, thanks!
Axel

efreidun avatar efreidun commented on June 8, 2024

Ah I think I understand now. Thanks a ton for the clarification! I'll close the issue.

Cheers,
Fereidoon

efreidun avatar efreidun commented on June 8, 2024

Hi there again,

If I may reopen the issue with another question about the data generation (please let me know if you prefer a channel other than GitHub, e.g. email, for such questions):

I have tried running the provided pretrained model on a set of test samples that I generated from the ScanNet test split with pairs having a 0.1 to 0.4 visual overlap score. However, I don't observe the same performance when I compare the metrics that I compute to the ones reported in the paper. Specifically, I see noticeably worse performance in the translation component of the final solution after optimization (by ~5 degrees in median error and by 0.05 in mAA).

As these metrics depend a lot on the underlying pool of hypotheses, I'm wondering if you have some additional filtering/preprocessing steps when you produce the training/evaluation samples, for example to remove planar degenerate scenarios? I'm curious because in Figure 1 of the appendix I see the error distributions only up to 90 degrees, whereas hypothesis errors can in theory reach 180 degrees.
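
For reference, here is roughly how I compute the errors and the mAA (just a sketch; the thresholds and the averaging scheme are my own assumptions and may differ from your protocol):

```python
import numpy as np

def rotation_error_deg(R_gt, R_est):
    """Angular distance between two rotation matrices, in degrees."""
    cos = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def translation_error_deg(t_gt, t_est):
    """Angle between ground-truth and estimated translation directions, in degrees."""
    denom = np.linalg.norm(t_gt) * np.linalg.norm(t_est) + 1e-8
    cos = np.clip(np.dot(t_gt, t_est) / denom, -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def maa(errors_deg, thresholds=range(1, 11)):
    """Mean average accuracy: fraction of pairs under each threshold, averaged over thresholds."""
    errors_deg = np.asarray(errors_deg)
    return float(np.mean([np.mean(errors_deg <= t) for t in thresholds]))

# e.g. median and mAA of the translation errors over all evaluated pairs:
# med_t, maa_t = np.median(t_errors), maa(t_errors)
```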

Thanks in advance!

Best,
Fereidoon

axelBarroso avatar axelBarroso commented on June 8, 2024

Hey there,

Thanks for the follow-up question. Happy to have the conversation here, hopefully, it is also useful for others.

Regarding the drop in performance, I can think of a few reasons why that might be. To sample hypotheses, we use the USAC framework (from OpenCV). We follow the very same pipeline as in MAGSAC++, and hence, we also use all additional checks implemented within it. Besides that, all hypotheses are refined, which improves the accuracy of the computed poses. As a side note, to refine the poses, we rely on the MAGSAC++ inliers.
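
For completeness, a minimal sketch of what that hypothesis generation looks like with OpenCV's USAC bindings (the correspondences, intrinsics, and threshold below are placeholders, and this is not the exact code we use):

```python
import cv2
import numpy as np

def estimate_pose_magsac(pts0, pts1, K, thresh_px=1.0):
    """Estimate a relative pose with MAGSAC++ and recover (R, t) from the inlier set.

    pts0, pts1: (N, 2) float arrays of matched pixel coordinates.
    K: (3, 3) camera intrinsics (assumed shared by both images here).
    """
    E, inlier_mask = cv2.findEssentialMat(
        pts0, pts1, K, method=cv2.USAC_MAGSAC, prob=0.9999, threshold=thresh_px
    )
    if E is None:
        return None
    if E.shape[0] > 3:
        E = E[:3]  # findEssentialMat can return several stacked candidates; keep the first
    # Decompose E and recover (R, t) using only the MAGSAC++ inliers
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=inlier_mask)
    return R, t, inlier_mask
```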

Regarding the error distributions only reaching 90 degrees: that is only true for the translation error, and it is due to the sign ambiguity of the translation vector in the Essential/Fundamental matrix. See SuperGlue's angle_error_vec and compute_pose_error for more details on how to handle that.
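
In code, that handling looks roughly like this (paraphrasing the idea in SuperGlue's angle_error_vec and compute_pose_error, not copying them verbatim):

```python
import numpy as np

def angle_error_vec(v1, v2):
    """Angle in degrees between two 3D vectors."""
    n = np.linalg.norm(v1) * np.linalg.norm(v2)
    return np.rad2deg(np.arccos(np.clip(np.dot(v1, v2) / n, -1.0, 1.0)))

def translation_error(t_est, t_gt):
    """Translation direction error, accounting for the sign ambiguity of E."""
    err = angle_error_vec(t_est, t_gt)
    return np.minimum(err, 180.0 - err)  # t and -t are equivalent, so the error is capped at 90 degrees
```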

Please let me know if you have further questions. If you would find it useful, I could also add to the repo our test set, although it might take me a bit of time to clean up/prepare that data.

Thanks!

axelBarroso avatar axelBarroso commented on June 8, 2024

I'm closing this issue now since it has been inactive for a while. Please feel free to reopen it if you have any other questions!

Thanks a lot!
