Comments (7)
Hello!
Thank you for your interest in our work! Here are the answers:
- We use the max(e_R, e_t) error for binning the pose hypotheses.
- We use a total of 15 bins. We do not create the bins uniformly, since we are mainly interested in poses with low errors. We therefore use a log function to split the poses so that low-error poses are sampled more often. We use ln(x)/0.35, but we did not do an extensive study of this function, and other ways of splitting the data might be better.
- During training, we use 56 image pairs and a single hypothesis. We investigated using fewer image pairs but sampling more hypotheses, but did not observe a significant difference.
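One plausible reading of the binning scheme above is that ln(x)/0.35 is used directly as the bin index for a pose with maximum error x in degrees, which gives roughly 15 bins over (1°, 180°] with finer resolution at low errors. A sketch under that assumption (the exact bin edges and clamping are guesses, not the released code):

```python
import numpy as np

def pose_error_bin(e_R, e_t, n_bins=15, scale=0.35):
    """Bin a pose hypothesis by its max(e_R, e_t) error (degrees).

    Assumes the bin index is floor(ln(err) / 0.35), clamped to
    [0, n_bins - 1]; ln(180)/0.35 ~= 14.8, which yields 15 bins
    that are narrower for low-error poses.
    """
    err = max(e_R, e_t)
    idx = int(np.log(max(err, 1.0)) / scale)  # errors below 1 deg share bin 0
    return min(max(idx, 0), n_bins - 1)
```

With this reading, an essentially perfect pose (errors below 1°) falls in bin 0, a ~2° pose in bin 1, and the worst poses (~180°) in bin 14.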
Hope this helps!
Axel
from scoring-without-correspondences.
Thank you very much for the swift reply. That clarifies it!
May I also ask for some clarification on a doubt I have regarding the validation splits:
In the paper it is mentioned that the validation splits from LoFTR are followed. If I'm not mistaken, LoFTR uses the 1500 SuperGlue test pairs for ScanNet, and 1500 sampled pairs from "Sacre Coeur" and "St. Peter's Square" for MegaDepth. However, as discussed in the appendix, SuperGlue (and likewise LoFTR) is trained on pairs with much higher visual overlap. So my understanding is that you draw your own image-pair samples with [0.1, 0.4] visual overlap for both ScanNet and MegaDepth.
My doubt is:
- Which scenes do you use for drawing the samples? Is it the "test scans" for ScanNet and only “Sacre Coeur” and “St. Peter’s Square” for MegaDepth?
- How many sample pairs do you draw for validation of each dataset?
- By any chance do you use a separate test split (separate from training/validation) for reporting the results in the tables?
Thanks again!
Fereidoon
Hi!
You are right, we use a "custom" training, validation, and test split with image pairs with little overlap (10%-40%). Here are some clarifications on how we generate the training, validation and test sets:
- For indoor datasets, we use the standard training, validation, and test splits, but we sample our own image pairs. If you are interested in results on the standard ScanNet test set (the one from SuperGlue), we report them in the supplementary material (Table 5). For the outdoor scenes, we use MegaDepth scenes for training and validation. We remove scenes that are in the IMW and in the test set of the PhotoTourism dataset. As you noted, there are "only" two test scenes in MegaDepth, and hence, we use the PhotoTourism test scenes as our testing scenes. Further details are in Section 5.2.
- For training and validation, we sample 90,000 and 30,000 image pairs, respectively. With our configuration, we didn't see further improvements when increasing the dataset size.
- As mentioned in answer 1), the training and validation sets come from different scenes than the test scenes we use to report the results in the paper's tables.
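The overlap-constrained sampling described above can be sketched as follows, assuming a precomputed symmetric matrix of pairwise visual-overlap scores (all names and the uniform sampling here are my assumptions, not the paper's released code):

```python
import numpy as np

def sample_overlap_pairs(overlap, n_pairs, lo=0.1, hi=0.4, seed=0):
    """Draw image pairs whose visual-overlap score lies in [lo, hi].

    overlap: (N, N) symmetric matrix of overlap scores in [0, 1].
    Returns up to n_pairs (i, j) index pairs with i < j.
    """
    rng = np.random.default_rng(seed)
    # Upper triangle only, so each unordered pair is considered once.
    i, j = np.nonzero(np.triu((overlap >= lo) & (overlap <= hi), k=1))
    keep = rng.choice(len(i), size=min(n_pairs, len(i)), replace=False)
    return [(int(i[k]), int(j[k])) for k in keep]
```

For the paper's setup this would be run with n_pairs=90000 on the training scenes and n_pairs=30000 on the validation scenes.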
Hope this helps, thanks!
Axel
Ah I think I understand now. Thanks a ton for the clarification! I'll close the issue.
Cheers,
Fereidoon
Hi there again,
If I may reopen the issue with another question about the data generation (please let me know if you prefer another channel other than github - e.g. email - for such questions):
I have tried running the provided pretrained model on a set of test samples that I generated from the ScanNet test split, with pairs having a 0.1 to 0.4 visual-overlap score. However, I don't observe the same performance when I compare the metrics that I compute to the ones reported in the paper. Specifically, I see noticeably worse performance in the translation component of the final solution after optimization (a ~5-degree higher median error and a ~0.05 lower mAA).
As these metrics depend heavily on the underlying pool of hypotheses, I'm wondering whether you apply additional filtering/preprocessing steps when producing the training/evaluation samples, for example to remove planar degenerate scenarios? I'm curious because in Figure 1 of the appendix the error distributions only reach 90 degrees, whereas hypothesis errors can in theory reach 180 degrees.
Thanks in advance!
Best,
Fereidoon
Hey there,
Thanks for the follow-up question. Happy to have the conversation here; hopefully it is also useful for others.
Regarding the drop in performance, I can think of a few possible reasons. To sample hypotheses, we use the USAC framework (from OpenCV). We follow the very same pipeline as MAGSAC++, and hence we also use all the additional checks implemented within it. Besides that, all hypotheses are refined, which proves to improve the accuracy of the computed poses. As a side note, to refine the poses we rely on the MAGSAC++ inliers.
Regarding the error distributions only reaching 90 degrees: that is only true for the translation error, and it is due to the sign ambiguity of the translation vector recovered from the Essential/Fundamental matrix. See SuperGlue's angle_error_vec and compute_pose_error for details on how to handle that.
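Handling that ambiguity amounts to folding the angle at 90 degrees, along the lines of the sketch below (a paraphrase of the idea, not SuperGlue's exact code):

```python
import numpy as np

def angle_error_vec(t_est, t_gt):
    """Angle (degrees) between two translation directions.

    E/F matrices only determine t up to sign, so an estimate of -t is
    as good as t: fold the angle so the error never exceeds 90 degrees.
    """
    t_est = np.asarray(t_est, float) / np.linalg.norm(t_est)
    t_gt = np.asarray(t_gt, float) / np.linalg.norm(t_gt)
    ang = np.degrees(np.arccos(np.clip(np.dot(t_est, t_gt), -1.0, 1.0)))
    return min(ang, 180.0 - ang)
```

For example, an estimate pointing exactly opposite to the ground truth gets an error of 0, not 180, which is why the histograms stop at 90 degrees.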
Please let me know if you have further questions. If you would find it useful, I could also add to the repo our test set, although it might take me a bit of time to clean up/prepare that data.
Thanks!
I'm closing this issue now since it has not had any activity for a while. Please feel free to reopen it if you have any other questions!
Thanks a lot!