Comments (12)
No hesitations, but it's probably not trivial. You would have to store a mapping that works well with mmap, meaning it has to be packed in contiguous memory. Strings can have different lengths, etc., so I'm not sure how to do that easily.
You could probably prepend the index with something like <id1><string1>NULL<id2><string2>NULL... and binary search in that index. I don't know.
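For what it's worth, here is a rough sketch of how a packed, mmap-friendly key table along those lines might look. Everything in it is an assumption for illustration, not annoy's actual file format: the layout (a count, then fixed-width `(id, offset)` entries, then NUL-terminated strings), the little-endian encoding, and the names `write_key_table` and `lookup`. The fixed-width entry table is what makes binary search possible even though the strings have different lengths. Keys must not contain NUL bytes.

```python
import mmap
import struct

HEADER = struct.Struct("<I")   # number of entries
ENTRY = struct.Struct("<II")   # (item id, byte offset of the key string)


def write_key_table(path, key_to_id):
    """Write keys sorted by their UTF-8 bytes, each followed by a NUL byte."""
    items = sorted((k.encode("utf-8"), i) for k, i in key_to_id.items())
    blob_start = HEADER.size + ENTRY.size * len(items)
    entries, blob = [], bytearray()
    for key, item_id in items:
        entries.append(ENTRY.pack(item_id, blob_start + len(blob)))
        blob += key + b"\x00"
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(items)))
        f.write(b"".join(entries))
        f.write(blob)


def lookup(buf, key):
    """Binary search the fixed-width entry table; return the id or None."""
    target = key.encode("utf-8")
    (count,) = HEADER.unpack_from(buf, 0)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        item_id, offset = ENTRY.unpack_from(buf, HEADER.size + mid * ENTRY.size)
        end = buf.find(b"\x00", offset)      # key ends at its NUL terminator
        candidate = buf[offset:end]
        if candidate == target:
            return item_id
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid
    return None
```

To use it, open the file and map it read-only with `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)`, then pass the mapping as `buf`. This only covers the key-to-id direction; the reverse direction would need a second table indexed by id.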
from annoy.
The other solution is to just put it in a separate file and use some existing hashmap format like bdb, TC, etc.
Or just store an indirect pointer alongside the item vectors, and have it point to a separate variable-sized chunk of memory after the vectors but before the tree nodes.
Yes, but that only gives you the mapping in one direction. What about the other direction? You need some kind of binary search or hashing to make that work.
I guess instead of using a contiguous sequence of nodes, you could treat the whole annoy index as a hashmap and use something like cuckoo hashing? That way you just need to store the mapping from nodes to keys, not the other way around.
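To make the cuckoo hashing idea concrete, here is a toy sketch. It is not how annoy works today: the class `CuckooSlots`, the choice of BLAKE2b for the two hash functions, and the idea that a key's slot number doubles as its item index are all assumptions for illustration. Each key has two candidate slots; inserting into an occupied slot evicts the occupant to its other candidate. Each slot stores only its key, so the key-to-slot direction comes from hashing alone.

```python
import hashlib


class CuckooSlots:
    """Toy cuckoo table: a key's slot number doubles as its item index."""

    def __init__(self, capacity, max_kicks=64):
        self.slots = [None] * capacity   # slot -> key (the only stored mapping)
        self.max_kicks = max_kicks

    def _candidates(self, key):
        """Derive the two candidate slots for a key from one 16-byte digest."""
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()
        n = len(self.slots)
        first = int.from_bytes(digest[:8], "little") % n
        second = int.from_bytes(digest[8:], "little") % n
        return first, second

    def find(self, key):
        """Return the slot holding key, or None if it is not stored."""
        for slot in self._candidates(key):
            if self.slots[slot] == key:
                return slot
        return None

    def insert(self, key):
        if self.find(key) is not None:
            return
        slot = self._candidates(key)[0]
        for _ in range(self.max_kicks):
            # Place the key in hand and pick up whatever was in the slot.
            key, self.slots[slot] = self.slots[slot], key
            if key is None:
                return
            # Send the evicted key to its other candidate slot.
            first, second = self._candidates(key)
            slot = second if slot == first else first
        raise RuntimeError("too many evictions; rebuild with a larger table")
```

Note that an eviction moves an existing key to a different slot, so in a real index the item's vector would have to move with it, and a failed insert would require rebuilding the table with a larger capacity.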
Oh, I see, Chris is asking about general keys. Yeah, then it needs to be pretty much like a sparkey hash.
Given that Sparkey already supports this with mmapping, how would you feel about solving it with an annoy dependency on Sparkey?
I don't think Sparkey is available on PyPI, so I'm not sure we want to depend on it.
So maybe we can hold off on implementing this for Python annoy at the moment, but it should be fine for the Java version. In the meantime I can also try to get Sparkey submitted to PyPI. If we get Sparkey on PyPI, does this seem like a clean enough solution?
Sure. If you rely on Sparkey, you could actually ignore the C++ part and implement this as a pure Python solution.
Just realized that there is a C library for Sparkey, so it should be possible to do this outside of Python altogether. It would be nice to manage the dependencies through Travis, but I'm not sure how that would work with the travis.yml being language: python. Any ideas?
I don't know either, maybe a git submodule?
It sounds like a lot of work to get this working compared to the benefit, tbh. It's not super hard to just store a hash table separately. I think this is also how the other ANN libraries work (flann, sklearn, panns, nearpy).
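As a minimal sketch of the "store a hash table separately" approach: the class name `KeyMap` and the JSON sidecar file are assumptions for illustration, not something annoy ships. It keeps both directions (key to id and id to key) and assigns dense integer ids in insertion order.

```python
import json


class KeyMap:
    """Bidirectional mapping between string keys and dense integer ids."""

    def __init__(self, keys=()):
        self.keys = []   # id -> key
        self.ids = {}    # key -> id
        for key in keys:
            self.add(key)

    def add(self, key):
        """Return the id for key, assigning the next free id if it is new."""
        if key not in self.ids:
            self.ids[key] = len(self.keys)
            self.keys.append(key)
        return self.ids[key]

    def save(self, path):
        """Persist the id -> key list; the reverse dict is rebuilt on load."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.keys, f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))
```

The idea would be to pass `key_map.add(key)` as the integer id when calling `add_item`, save the map next to the index file, and translate the ids returned by a nearest-neighbor query back through `key_map.keys`.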