pgann's Issues

Worse performance with GIST?

Hey! This is really great, and thank you for looking into using PostgreSQL cube data types and GiST indexes for nearest-neighbor queries. Did you ever run into a case where the GiST index actually made performance worse? I ask because I seem to be experiencing that. It's all documented in this StackOverflow question and also in a related GitHub repo. I'm still researching the problem, but my working theory is that without the index, PostgreSQL does a parallel sequential scan of the table, whereas with the index it does only a non-parallel scan of the index. If that's true, I need to figure out how to coax it into doing a parallel index scan, if that is possible at all. Will update you as I progress!
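One way to test the theory above is to compare the plan PostgreSQL picks by default with the plan it picks when index scans are discouraged. The sketch below only builds the SQL statements; the helper name `plan_comparison_statements` is hypothetical, and running the statements through a database driver is left to the reader. `EXPLAIN (ANALYZE, BUFFERS)` and the `enable_indexscan` planner setting are standard PostgreSQL. Note that the PostgreSQL documentation describes parallel index scans as supported only for B-tree indexes, which would be consistent with the GiST scan running in a single process.

```python
from typing import List


def plan_comparison_statements(query: str) -> List[str]:
    """Build statements that EXPLAIN `query` twice inside one transaction:
    once with default planner settings, once with index scans discouraged."""
    explain = "EXPLAIN (ANALYZE, BUFFERS) " + query.rstrip().rstrip(";")
    return [
        "BEGIN",
        explain,                             # plan the planner chooses by default
        "SET LOCAL enable_indexscan = off",  # only affects this transaction
        explain,                             # plan without the index scan path
        "ROLLBACK",                          # discard the SET LOCAL change
    ]
```

If the second plan shows a Gather node over a Parallel Seq Scan and finishes faster than the first, that would support the theory that parallelism, not the index itself, explains the difference.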

Performance

Nice to find an ANN that is not RAM-based! Thanks!!

Just tried this with 50,000,000 entries of length 90. The table, including the embeddings, is 38 GB. I ran PostgreSQL in a container and searched for a single random embedding with:

sql = "select id, embeddings from images order by embeddings <-> cube({0}) asc limit 25".format(emb_string)

This search took 230 s. I have very good CPU and memory speeds on this computer. Am I doing something wrong, or is this reasonable?

My issue is this: the total size of the table is about 38 GB, which means it sort of fits in RAM. Is it better to use Faiss? If I double the database size, will the search take twice as long?
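For reference, the query in this issue could also be built with driver-side parameters instead of string formatting. This is a minimal sketch, assuming a psycopg2-style driver (`%s` placeholders, Python lists adapted to PostgreSQL arrays); the helper name `knn_query` is hypothetical, while the table and column names come from the query above. It relies on the `cube(float8[])` constructor from the cube extension.

```python
from typing import List, Sequence, Tuple


def knn_query(embedding: Sequence[float],
              limit: int = 25) -> Tuple[str, Tuple[List[float], int]]:
    """Return (sql, params) for a nearest-neighbor search on images.embeddings."""
    sql = (
        "SELECT id, embeddings FROM images "
        "ORDER BY embeddings <-> cube(%s::float8[]) ASC "  # Euclidean distance
        "LIMIT %s"
    )
    # Coerce to plain floats so the driver sends a float8[] array.
    params = ([float(x) for x in embedding], int(limit))
    return sql, params
```

With an open cursor `cur` (assumed, not shown), this would be used as `cur.execute(*knn_query(vector))`. Parameterizing the query does not change the plan, but it avoids building SQL from raw strings.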

Can we get a blog post

Hi,
Really happy to have stumbled on this and to get an indication that this idea can work at scale. This is awesome, and we'd love to know more about your experience running this in production. Any chance of a blog post?
Thanks
