
Comments (3)

alexklibisz avatar alexklibisz commented on May 18, 2024

Some rough ideas on how to implement this:

Keep the same L2Lsh Mapping structure. This seems to be strictly a query-time optimization.

Add a parameter probes: Int to the L2Lsh query structure, controlling how many additional hashes of length k (k is called M in the original paper) are generated for each of the L tables.

The hashing function will generate the additional probe hashes right after it generates the standard hash.
Roughly:

for l in range(L):
  generate the original hash and do some bookkeeping for below
  for p in range(probes):
    generate additional hash

Otherwise the query looks exactly the same: it takes the generated hashes and looks them up just like any other approximate query.
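The loop above could be sketched as runnable code roughly like this. hash_for_table and perturb are hypothetical stand-ins for elastiknn's actual L2LSH hashing internals, not its real API; toy implementations are included only so the sketch runs end to end.

```python
# Sketch of query-time multiprobe hashing.
# hash_for_table and perturb are hypothetical stand-ins, not elastiknn's API.
def multiprobe_hashes(query, L, k, probes, hash_for_table, perturb):
    hashes = []
    for l in range(L):
        h = hash_for_table(query, l, k)        # original k-length hash for table l
        hashes.append((l, h))
        for p in range(probes):
            hashes.append((l, perturb(h, p)))  # additional probe hash
    return hashes

# Toy stand-ins so the sketch is self-contained.
def toy_hash(query, l, k):
    return tuple((int(sum(query)) + l + i) % 7 for i in range(k))

def toy_perturb(h, p):
    return tuple(x + p + 1 for x in h)
```

Each table contributes 1 + probes hashes, so the total number of lookups grows from L to L * (1 + probes).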

Implement the naive version first (enumerate, score, and sort all possible perturbations). Make sure you can get equivalent recall on SIFT with fewer tables and, ideally but not necessarily, shorter query times. Then go back and implement the optimized version, which precomputes perturbation sets, estimates their scores, etc. Make sure the optimized version matches the naive one.
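The naive "enumerate, score, sort" step could look like the following sketch. The function name and the per-coordinate cost arrays are my own notation (scores_neg[i] and scores_pos[i] play the role of x_i(-1) and x_i(+1) from the multiprobe paper); this is an assumption about the shape of the computation, not elastiknn's code.

```python
import itertools

def naive_perturbation_sets(k, scores_neg, scores_pos, probes):
    """Enumerate every perturbation vector in {-1, 0, +1}^k (minus the
    all-zero one), score it as the sum of per-coordinate costs, sort,
    and keep the `probes` cheapest."""
    candidates = []
    for delta in itertools.product((-1, 0, 1), repeat=k):
        if all(d == 0 for d in delta):
            continue
        score = sum(scores_neg[i] if d == -1 else scores_pos[i]
                    for i, d in enumerate(delta) if d != 0)
        candidates.append((score, delta))
    candidates.sort()
    return [delta for _, delta in candidates[:probes]]
```

The naivety is that this materializes all 3^k - 1 candidates, which is why it only stays cheap for small k.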

from elastiknn.

alexklibisz avatar alexklibisz commented on May 18, 2024

Implemented the naive version in #123.

As far as I can tell, the naivety is not a bottleneck in the current benchmarking configuration (k = 2, probes = 3).
In this configuration the overwhelming bottleneck is (understandably) still the countMatches method.
The only thing I could find related to the multiprobe implementation was taking 0.1% of the runtime on search threads.

Even with k = 3, probes = 27, any footprint from the hashWithProbes method is minimal compared to countMatches.

So for now there are likely more worthwhile things to implement than the optimized version.

from elastiknn.

alexklibisz avatar alexklibisz commented on May 18, 2024

🎆 🥳 🎈

Implemented Algorithm 1 from Qin et al. in #124. This generates the perturbation sets iteratively instead of generating them all exhaustively and sorting them. I'd say this is good enough for now. There doesn't seem to be a need to implement the expected-scores optimization from section 4.5, and it's also not obvious to me why/how it works.
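For reference, the shift/expand heap behind Algorithm 1 can be sketched like this. This is my own minimal rendition, not the code from #124: perturbations are (score, coordinate, delta) triples sorted by score, a candidate set is a tuple of indices into that list, and a set is valid when no coordinate is perturbed twice.

```python
import heapq

def gen_perturbation_sets(perturbations, T):
    """Yield the T lowest-scoring valid perturbation sets using the
    shift/expand heap from Algorithm 1 of Qin et al. (sketch)."""
    perturbations = sorted(perturbations)
    n = len(perturbations)

    def score(idxs):
        return sum(perturbations[i][0] for i in idxs)

    def valid(idxs):
        coords = [perturbations[i][1] for i in idxs]
        return len(coords) == len(set(coords))

    out = []
    heap = [(score((0,)), (0,))]
    seen = {(0,)}
    while heap and len(out) < T:
        _, idxs = heapq.heappop(heap)
        if valid(idxs):
            out.append([(perturbations[i][1], perturbations[i][2]) for i in idxs])
        j = idxs[-1]
        if j + 1 < n:
            # shift: replace the max index with the next one;
            # expand: add the next index to the set.
            for nxt in (idxs[:-1] + (j + 1,), idxs + (j + 1,)):
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (score(nxt), nxt))
    return out
```

Starting from the singleton set containing the cheapest perturbation, shift and expand together reach every candidate set exactly once, and the min-heap guarantees the sets come out in nondecreasing score order, so only the T sets actually probed ever get materialized.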

from elastiknn.
