netrasys / pgann
Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.
License: MIT License
Hello.
I found this wonderful Dockerfile, in which the cube maximum dimension limit was raised to 2048.
https://hub.docker.com/r/expert/postgresql-large-cube
Hi,
Could your approach be improved by using dblink? Example: http://www.programmersought.com/article/76671080348/
Best
Wilhelm
Hey! This is really great, and I want to thank you for looking into using PostgreSQL cube data types and GiST indexes for nearest neighbor queries. I did want to ask, though: did you ever run into any issues where the GiST index actually made performance worse? I ask because I seem to be experiencing that. It's all documented in this Stack Overflow question and also in a related GitHub repo. I'm still researching the problem, but my working theory right now is that without the index PostgreSQL does a parallel sequential scan of the table, whereas with the index it only does a sequential (non-parallel) scan of the index. If that's true, then I need to figure out how to coax it into doing a parallel index scan, if that is possible at all. Will update you as I progress!
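One way to test the theory above is to compare the query plan with and without index scans inside a single transaction. The sketch below only builds the SQL statements; it assumes the `images` table and `embeddings` column from a later post in this thread, and `build_plan_comparison` is a hypothetical helper name. `EXPLAIN (ANALYZE, BUFFERS)` and the `enable_indexscan` planner setting are standard PostgreSQL, but whether the planner then chooses a parallel sequential scan depends on the server configuration.

```python
def build_plan_comparison(table, column, vector, k=25):
    """Return SQL statements that show the KNN plan with and without index scans.

    `table` and `column` are interpolated directly, so they must come from
    trusted code, never from user input (illustrative helper only).
    """
    literal = ", ".join(repr(float(x)) for x in vector)
    knn = (
        f"SELECT id FROM {table} "
        f"ORDER BY {column} <-> cube(ARRAY[{literal}]::float8[]) "
        f"LIMIT {int(k)}"
    )
    explain = f"EXPLAIN (ANALYZE, BUFFERS) {knn};"
    return [
        "BEGIN;",
        explain,                              # plan with the GiST index available
        "SET LOCAL enable_indexscan = off;",  # discourage index scans for this transaction
        explain,                              # plan the planner falls back to
        "ROLLBACK;",
    ]


if __name__ == "__main__":
    for statement in build_plan_comparison("images", "embeddings", [0.5, 1.0]):
        print(statement)
```

Running the printed statements in psql and comparing the two plans (node types, "Workers Launched", execution time) should show whether the index path is the slower one on a given data set.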
Nice to find an ANN which is not RAM based! Thanks!!
Just tried this with 50,000,000 entries of length 90. The table, including the embeddings, is 38 GB. I ran PostgreSQL in a container and searched for a single random embedding with:
sql = "select id,embeddings from images order by embeddings <-> cube({0}) asc limit 25".format(emb_string)
This search took 230 s. I have very good CPU and memory speeds on this computer. Am I doing something wrong, or is this reasonable?
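Separately from the timing question, the query above interpolates the embedding into the SQL text with `str.format`. A minimal sketch of a parameterized version follows, assuming a driver that uses `%s` placeholders and adapts Python lists to SQL arrays (psycopg2 does both); `build_knn_query` is a hypothetical helper name. This changes how the query is passed, not how fast it runs.

```python
def build_knn_query(embedding, k=25):
    """Return (sql, params) for a nearest-neighbor query on images.embeddings.

    The embedding is sent as a query parameter and converted server-side
    with cube(float8[]), instead of being formatted into the SQL string.
    """
    vector = [float(x) for x in embedding]
    sql = (
        "SELECT id, embeddings FROM images "
        "ORDER BY embeddings <-> cube(%s::float8[]) ASC LIMIT %s"
    )
    return sql, (vector, int(k))


if __name__ == "__main__":
    sql, params = build_knn_query([0.1, 0.2, 0.3])
    print(sql)
    print(params)
```

With an open psycopg2 cursor this would be used as `cursor.execute(*build_knn_query(embedding))`.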
My issue is this: the total size of the table is about 38 GB, which means it more or less fits in RAM. Would it be better to use Faiss? If I double the database size, will the search take twice as long?
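As a rough sanity check on the figures above: the cube type stores coordinates as 64-bit floats, so the raw vector payload alone can be estimated from the row count and dimension. The sketch below ignores per-row and per-page overhead, which is an assumption, so the real table will be somewhat larger.

```python
def raw_vector_bytes(rows, dims, bytes_per_value=8):
    """Bytes needed for the coordinates alone (no row or page overhead)."""
    return rows * dims * bytes_per_value


if __name__ == "__main__":
    total = raw_vector_bytes(50_000_000, 90)
    print(f"{total / 1e9:.1f} GB")  # prints "36.0 GB"
```

That is consistent with the reported 38 GB. If the query is answered by scanning every row, the data read grows linearly with the row count, so doubling the table would be expected to roughly double the time; an index-driven search need not scale that way.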
Hi,
Really happy to have stumbled on this and to get an indication that this idea can work at scale. This is awesome, and we'd love to know more about your experience running this in prod. Any chance of a blog post?
Thanks
Why is this an "approximate" nearest neighbor search? The documentation at https://www.postgresql.org/docs/13/cube.html says nothing about distances or search being approximate, and I don't see anything in https://github.com/postgres/postgres/blob/472e518a44eacd9caac7d618f1b6451672ca4481/contrib/cube/cube.c to indicate it's anything other than a typical kd-tree search.
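One way to probe this question empirically is to compute exact neighbors for a small sample outside the database and compare them with what the `<->` query returns. The sketch below is a plain brute-force reference, not the repository's method; it assumes Euclidean distance, which is what the cube documentation gives for `<->`, and `exact_nearest` is a hypothetical helper name.

```python
import heapq
import math


def exact_nearest(query, rows, k=25):
    """Return the k closest (distance, id) pairs by exact Euclidean distance.

    `rows` is an iterable of (id, vector) pairs; every vector is compared
    against `query`, so the result is exact by construction.
    """
    scored = ((math.dist(query, vector), row_id) for row_id, vector in rows)
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    sample = [(1, [0.0, 0.0]), (2, [3.0, 4.0]), (3, [1.0, 0.0])]
    print(exact_nearest([0.0, 0.0], sample, k=2))  # [(0.0, 1), (1.0, 3)]
```

If the ids returned by the database for the same sample and query always match this reference, the search is behaving as an exact one on that data.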