
Comments (5)

maumueller commented on June 9, 2024

By the way, I'm still busy and cannot start implementing a Python wrapper for PQTable. Honestly, PQTable is my previous method, and I'm now implementing a new one that comes with a full Python interface. Could you wait for the new one? (Of course, please feel free to play around with my C++ implementation of PQTable :) but I'd like to focus on the new one in terms of Python bindings.)

Sure! As I said, I just used it to test our wrapper methods. Thanks for the clarification!

from pqtable.

matsui528 commented on June 9, 2024

Hi @maumueller,

Thanks for inviting me. I'd love to include my algorithm in your benchmark! I guess it'd be best to write the Python wrappers with pybind11. As I'm busy this month, I'll work on it in January or February next year.

Best,
Yusuke


maumueller commented on June 9, 2024

Hi @matsui528,

Great to hear! Please keep me up to date and ping me if you have questions about the benchmark.

Best,
Martin


maumueller commented on June 9, 2024

Hi @matsui528,

A small update here: we are currently adding support for other languages to ann-benchmarks, and I chose your implementation to play around with a little. (Basically, it boils down to implementing a wrapper like this: https://github.com/maumueller/pqtable/blob/master/wrapper/wrapper.cpp. We are still working on making it more easily accessible.)

I noticed that while there is support for top-k queries, there doesn't seem to be a parameter to improve the quality of the results. (I think I read something about that in your paper, but I couldn't find it as an option in your code.) For example, I was expecting to see an option that retrieves the top-k' data points from the hash tables (with k' larger than k) and then chooses the k closest of them through exact distance computations.
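The re-ranking step described here could be sketched roughly as follows. This is only an illustration in plain Python, not part of PQTable: the function names `squared_l2` and `rerank`, the toy vectors, and the candidate list are all hypothetical, and the candidate list stands in for whatever top-k' result the hash tables would return.

```python
import heapq


def squared_l2(a, b):
    """Exact squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rerank(query, candidate_ids, vectors, k):
    """Keep the k candidates closest to `query` by exact distance.

    `candidate_ids` plays the role of the approximate top-k' result;
    `vectors` maps an id to its original (uncompressed) vector.
    """
    return heapq.nsmallest(
        k, candidate_ids, key=lambda i: squared_l2(query, vectors[i])
    )


# Toy data (hypothetical): four 2-d vectors and an unordered candidate list.
vectors = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [0.1, 0.0], 3: [5.0, 5.0]}
query = [0.0, 0.1]
candidates = [3, 1, 2, 0]  # pretend this came back from the hash tables (k' = 4)

top2 = rerank(query, candidates, vectors, k=2)
print(top2)
```

Varying k' would then trade query time for result quality, which is the kind of parameter a benchmark can sweep. Note that this requires the original vectors to be kept in memory.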

Since this option is missing, we only get a single result for each dataset, and its quality varies quite a bit depending on the dataset. It would be great to have an option that affects the result quality, maybe as sketched above.

Any thoughts?

Best,
Martin


matsui528 commented on June 9, 2024

Hi @maumueller,

PQTable doesn't have any runtime parameters. This is by design, because I don't want to bother users with lots of parameters :)

As you suggested, late checking through a comparison against the original vectors is one direction. But currently I don't plan to do so, because keeping the original vectors takes additional memory. PQTable was originally developed to handle billion-scale data, and keeping billion-scale original vectors in memory is prohibitively expensive. Switching between search w/ late checking for million-scale data and search w/o late checking for billion-scale data could be a solution, but the design would be a bit difficult (a PR is welcome).

As you pointed out, if each line in the plot is drawn by varying a runtime parameter, I guess only a single point can be plotted for each dataset. I'm sorry about that.

By the way, I'm still busy and cannot start implementing a Python wrapper for PQTable. Honestly, PQTable is my previous method, and I'm now implementing a new one that comes with a full Python interface. Could you wait for the new one? (Of course, please feel free to play around with my C++ implementation of PQTable :) but I'd like to focus on the new one in terms of Python bindings.)
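To give a rough sense of the memory argument, here is a back-of-the-envelope estimate. The concrete numbers are assumptions chosen for illustration and do not come from this thread: one billion vectors, 128 dimensions, 4-byte floats, and 64-bit PQ codes. The helper names are hypothetical, and the estimate ignores codebooks and hash-table overhead.

```python
def raw_vector_bytes(n, dim, bytes_per_value=4):
    """Memory needed to keep n original vectors of `dim` values each."""
    return n * dim * bytes_per_value


def pq_code_bytes(n, code_bits):
    """Memory needed to keep n PQ codes of `code_bits` bits each."""
    return n * code_bits // 8


# Assumed setting: 10^9 vectors, 128-d float32, 64-bit codes.
n = 10**9
raw = raw_vector_bytes(n, dim=128)        # original vectors
codes = pq_code_bytes(n, code_bits=64)    # compressed codes only

print(f"original vectors: {raw / 1e9:.0f} GB")
print(f"PQ codes:         {codes / 1e9:.0f} GB")
print(f"ratio:            {raw // codes}x")
```

Under these assumptions the original vectors need about 64 times the memory of the codes, which is why late checking is far easier to afford at million scale than at billion scale.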
