This is the original paper: <a href="https://www.cs.princeton.edu/cass/papers/mplsh_vl


MultiProbe for L2 Similarity (elastiknn issue, closed)

alexklibisz commented on May 18, 2024

MultiProbe for L2 Similarity

from elastiknn.

Comments (3)

alexklibisz commented on May 18, 2024

Some rough ideas on how to implement this:

Keep the same L2Lsh Mapping structure. This seems to be strictly a query-time optimization.

Add a parameter probes: Int to the L2Lsh query structure, controlling how many additional hashes of length k (k is called M in the original paper) are generated for each of the L tables.

The hashing function will generate the additional probe hashes right after it generates the standard hash.
Roughly:

for l in range(L):
  generate the original hash for table l and keep the bookkeeping needed below
  for p in range(probes):
    generate one additional (perturbed) hash for table l

Otherwise the query looks exactly the same. It takes the generated hashes and goes on to look them up the same as any other approx. query.

Implement the naive version first (enumerate, score, and sort all possible perturbations). Make sure it reaches equivalent recall on SIFT with fewer tables and, ideally but not necessarily, shorter query time. Then go back and implement the optimized version, which precomputes perturbation sets, estimates the scores, etc. Make sure the optimized version matches the naive one.
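The naive plan above can be sketched roughly as follows. This is an illustration only, not the elastiknn code: the names make_table and hash_with_probes_naive are hypothetical, the hash is assumed to have the usual L2 LSH form floor((a.v + b) / w), each hash is assumed to be perturbed by at most one bucket in either direction, and a perturbation is scored by the sum of squared distances to the bucket boundaries it crosses.

```python
import itertools
import math
import random


def make_table(dims, k, w, rng):
    """One LSH table: k random projections (a, b) for h(v) = floor((a.v + b) / w)."""
    return [([rng.gauss(0.0, 1.0) for _ in range(dims)], rng.uniform(0.0, w))
            for _ in range(k)]


def hash_with_probes_naive(table, w, vec, probes):
    """Return the base hash followed by up to `probes` perturbed hashes, best first."""
    proj = [sum(a_i * v_i for a_i, v_i in zip(a, vec)) + b for a, b in table]
    base = [math.floor(p / w) for p in proj]
    # Distance from each projection to its lower (-1) and upper (+1) bucket boundary.
    lower = [p - h * w for p, h in zip(proj, base)]
    dist = [{-1: lo, 0: 0.0, 1: w - lo} for lo in lower]

    # Enumerate every perturbation vector in {-1, 0, +1}^k and score it.
    scored = []
    for delta in itertools.product((-1, 0, 1), repeat=len(table)):
        if any(delta):  # skip the all-zero vector, which is the base hash itself
            score = sum(dist[j][d] ** 2 for j, d in enumerate(delta))
            scored.append((score, delta))
    scored.sort()  # lowest score first: the most promising neighboring buckets

    hashes = [tuple(base)]
    for _, delta in scored[:probes]:
        hashes.append(tuple(h + d for h, d in zip(base, delta)))
    return hashes


if __name__ == "__main__":
    rng = random.Random(0)
    table = make_table(dims=4, k=2, w=1.0, rng=rng)
    print(hash_with_probes_naive(table, 1.0, [0.1, 0.2, 0.3, 0.4], probes=3))
```

Calling this once per table would give the L * (1 + probes) hashes that the query then looks up as usual.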


alexklibisz commented on May 18, 2024

Implemented the naive version in #123.

As far as I can tell, the naivety is not a bottleneck in the current benchmarking configuration (k = 2, probes = 3).
In this configuration the overwhelming bottleneck is (understandably) still the countMatches method.
The only thing I could find related to the multiprobe implementation was taking 0.1% of the runtime on search threads.

Even with k = 3, probes = 27, any footprint from the hashWithProbes method is minimal compared to countMatches.

So for now there are likely more worthwhile things to implement than the optimized version.
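One plausible reason the naive enumeration stays cheap at these settings: assuming each of the k hashes is perturbed by -1, 0, or +1 (as in the original paper), there are only 3^k - 1 non-trivial perturbation vectors to score and sort, which is 8 for k = 2 and 26 for k = 3. The helper below (candidate_count is a hypothetical name) just evaluates that count for a few values of k.

```python
def candidate_count(k):
    """Number of non-zero perturbation vectors with each entry in {-1, 0, +1}."""
    return 3 ** k - 1


if __name__ == "__main__":
    for k in (2, 3, 8, 16):
        print(k, candidate_count(k))
```

The count grows exponentially in k, so the exhaustive approach would presumably only start to matter for much longer hashes.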


alexklibisz commented on May 18, 2024

🎆 🥳 🎈

Implemented Algorithm 1 from Qin Lv et al. in #124. This generates the perturbation sets iteratively instead of generating all of them exhaustively and then sorting them. I'd say this is good enough for now. There doesn't seem to be a need to implement the expected scores optimization from section 4.5, and it's also not obvious to me why/how it works.
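For readers unfamiliar with the iterative approach, here is a rough sketch of the heap-based idea, using the shift and expand operations described in the paper. It is not the code from #124: perturbation_sets is a hypothetical name, the input is assumed to be the distance of each projection to its lower bucket boundary, and validity is checked directly (each hash perturbed at most once) rather than via the paper's index-pairing rule.

```python
import heapq


def perturbation_sets(lower, w, probes):
    """Yield up to `probes` perturbation sets in order of increasing score.

    `lower[j]` is the distance from projection j to its lower bucket boundary,
    so the distance to the upper boundary is `w - lower[j]`.
    Each yielded set is a list of (hash_index, delta) pairs, delta in {-1, +1}.
    """
    # All 2k single perturbations, sorted by the distance to the boundary crossed.
    z = sorted([(lo, j, -1) for j, lo in enumerate(lower)] +
               [(w - lo, j, 1) for j, lo in enumerate(lower)])
    n = len(z)
    if n == 0:
        return

    def score(idxs):
        return sum(z[i][0] ** 2 for i in idxs)

    def valid(idxs):
        # A set may perturb each hash function at most once.
        used = [z[i][1] for i in idxs]
        return len(used) == len(set(used))

    heap = [(score((0,)), (0,))]  # start from the single cheapest perturbation
    emitted = 0
    while heap and emitted < probes:
        _, idxs = heapq.heappop(heap)
        last = idxs[-1]
        if last + 1 < n:
            shifted = idxs[:-1] + (last + 1,)  # shift: replace the largest index
            expanded = idxs + (last + 1,)      # expand: append the next index
            heapq.heappush(heap, (score(shifted), shifted))
            heapq.heappush(heap, (score(expanded), expanded))
        if valid(idxs):
            emitted += 1
            yield [(z[i][1], z[i][2]) for i in idxs]


if __name__ == "__main__":
    for ps in perturbation_sets([0.2, 0.7, 0.45], w=1.0, probes=5):
        print(ps)
```

Because shift and expand never lower the score, sets come off the heap in non-decreasing score order, so generation can stop as soon as enough valid sets have been emitted.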
