Comments (12)

erikbern commented on May 22, 2024

No hesitations, but it's probably not trivial. You would have to store a mapping that works well with mmap, meaning it has to be packed in contiguous memory. Strings can have different lengths etc., so I'm not sure how to do that easily.

You could probably prepend the index with something like <id1><string1>NULL<id2><string2>NULL... and binary search in that index... I don't know
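A minimal sketch of that idea in Python, for illustration only. It assumes one thing the comment does not spell out: because the records are variable-length, a small table of fixed-width offsets is placed in front of them so the binary search can jump to the middle record. The function names `pack_key_index` and `lookup` and the byte layout are hypothetical, not part of annoy.

```python
import struct


def pack_key_index(key_to_id):
    """Pack {str: int} into one contiguous blob, sorted by key bytes.

    Assumed layout (not annoy's format):
      uint32 count | count * uint32 offsets | records
    where each record is: uint32 id | utf-8 key | NUL
    """
    records = bytearray()
    offsets = []
    for key in sorted(key_to_id, key=lambda k: k.encode("utf-8")):
        raw = key.encode("utf-8")
        if b"\x00" in raw:
            raise ValueError("keys must not contain NUL bytes")
        offsets.append(len(records))
        records += struct.pack("<I", key_to_id[key]) + raw + b"\x00"
    header = struct.pack("<I", len(offsets))
    table = struct.pack("<%dI" % len(offsets), *offsets)
    return header + table + bytes(records)


def lookup(blob, key):
    """Binary-search the packed blob for key; return its id or None.

    blob may be bytes or an mmap object: only unpack_from, find and
    slicing are used.
    """
    target = key.encode("utf-8")
    (count,) = struct.unpack_from("<I", blob, 0)
    base = 4 + 4 * count  # first byte of the record area
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        (off,) = struct.unpack_from("<I", blob, 4 + 4 * mid)
        start = base + off + 4  # skip the 4-byte id
        end = blob.find(b"\x00", start)
        candidate = bytes(blob[start:end])
        if candidate == target:
            (item_id,) = struct.unpack_from("<I", blob, base + off)
            return item_id
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Each lookup costs O(log n) string comparisons and touches only a few pages of the mapped file.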

from annoy.

erikbern commented on May 22, 2024

The other solution is to just put it in a separate file and use some existing hashmap format like bdb or TC.
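A rough sketch of the separate-file approach, using Python's standard-library `dbm.dumb` purely as a stand-in for a real on-disk hashmap such as bdb. The helper names and the `k:` / `i:` key prefixes are hypothetical, and `dbm.dumb` is slow and not mmap-based, so this only shows the shape of the solution.

```python
import dbm.dumb


def write_key_map(path, keys):
    """Store both directions for keys[i] <-> integer id i in one db file."""
    with dbm.dumb.open(path, "n") as db:
        for item_id, key in enumerate(keys):
            raw_key = key.encode("utf-8")
            raw_id = str(item_id).encode("ascii")
            db[b"k:" + raw_key] = raw_id   # key -> id
            db[b"i:" + raw_id] = raw_key   # id -> key


def id_for_key(db, key):
    """Return the integer id for a string key, or None if unknown."""
    value = db.get(b"k:" + key.encode("utf-8"))
    return None if value is None else int(value.decode("ascii"))


def key_for_id(db, item_id):
    """Return the string key for an integer id, or None if unknown."""
    value = db.get(b"i:" + str(item_id).encode("ascii"))
    return None if value is None else value.decode("utf-8")
```

The integer ids here would be the same ones passed to the index when adding items, so a query result can be translated back with `key_for_id`.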

a1k0n commented on May 22, 2024

Or just store an indirect pointer alongside the item vectors, and have it point to a separate variable-sized chunk of memory after the vectors but before the tree nodes.
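For illustration, a toy version of that layout in Python. It is not annoy's actual file format: the record layout (vector, then key offset and key length), the header, and the names `pack_items` and `key_of` are all assumptions made for the sketch. Tree nodes are left out; in the layout described above they would follow the key heap.

```python
import struct


def pack_items(items, dim):
    """Pack [(key, vector), ...] into one contiguous blob.

    Assumed layout:
      uint32 count | uint32 dim | fixed-size records | key heap
    where each record is: dim * float32 | uint32 key offset | uint32 key length
    """
    rec = struct.Struct("<%dfII" % dim)
    fixed = bytearray()
    heap = bytearray()
    for key, vec in items:
        raw = key.encode("utf-8")
        # The "indirect pointer" is an offset into the heap that follows.
        fixed += rec.pack(*vec, len(heap), len(raw))
        heap += raw
    return struct.pack("<II", len(items), dim) + bytes(fixed) + bytes(heap)


def key_of(blob, item_id):
    """Follow the pointer stored next to item_id's vector to its key."""
    count, dim = struct.unpack_from("<II", blob, 0)
    if not 0 <= item_id < count:
        raise IndexError(item_id)
    rec = struct.Struct("<%dfII" % dim)
    fields = rec.unpack_from(blob, 8 + item_id * rec.size)
    off, length = fields[dim], fields[dim + 1]
    heap_start = 8 + count * rec.size
    raw = blob[heap_start + off:heap_start + off + length]
    return bytes(raw).decode("utf-8")
```

Because the records stay fixed-size, item `i` is still found by plain arithmetic; only the key bytes are variable-length.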

erikbern commented on May 22, 2024

Yes, but that only gives you the mapping in one direction (item to key). What about the other direction? You need some kind of binary search or hashing to make that work.

erikbern commented on May 22, 2024

I guess instead of using a contiguous sequence of nodes you can treat the whole annoy index as a hashmap and use something like cuckoo hashing? That way you just need to store the mapping from nodes to keys, not the other way around.

a1k0n commented on May 22, 2024

Oh, I see, Chris is asking about general keys. Yeah, then it needs to be pretty much like a sparkey hash.

MrChrisJohnson commented on May 22, 2024

Given that Sparkey already supports this with mmapping, how would you feel about solving it by having annoy depend on Sparkey?

erikbern commented on May 22, 2024

I don't think Sparkey is available on PyPI, so I'm not sure we want to depend on it.

MrChrisJohnson commented on May 22, 2024

So maybe we can hold off on implementing this for the Python version of annoy at the moment, but it should be fine for the Java version. In the meantime I can also try to get Sparkey submitted to PyPI. If we get Sparkey on PyPI, does this seem like a clean enough solution?

erikbern commented on May 22, 2024

Sure. If you rely on Sparkey you could actually ignore the C++ part and implement this as a pure Python solution.

MrChrisJohnson commented on May 22, 2024

Just realized that there is a C library for Sparkey, so it should be possible to do this outside of Python altogether. It would be nice to manage the dependencies through Travis, but I'm not sure how that would work with the travis.yml being language: python. Any ideas?

erikbern commented on May 22, 2024

I don't know either, maybe a git submodule?

It sounds like a lot of work to get this working compared to the benefit, tbh... It's not super hard to just store a hash table separately. I think this is also how the other ANN libraries work (flann, sklearn, panns, nearpy).
