
nearpy's Introduction

NearPy

NearPy is a Python framework for fast (approximated) nearest neighbour search in high dimensional vector spaces using different locality-sensitive hashing methods.

It lets you experiment with and evaluate new methods, but is also production-ready. It comes with a Redis storage adapter.

To install, simply run pip install NearPy. This will also install the scipy, numpy and redis packages.

Both dense and sparse (scipy.sparse) vectors are supported right now.

Read more here: http://pixelogik.github.io/NearPy/

Principle

To find approximate nearest neighbours for a query vector, the vectors to be searched must first be indexed. For each vector that is indexed ('stored'), a hash is generated, which is a string value. This hash is used as the bucket key of the bucket the vector is then stored in. Buckets are in most cases just lists of vectors; 'bucket' is simply the terminology used in these applications.

On its own this would not help with finding nearest neighbours to a query vector. The 'secret' of this mechanism is to use so-called locality-sensitive hashes (LSHs). These hashes take spatiality into account, so they tend to generate identical hash values (bucket keys) for vectors that are close to each other. This makes it very fast to retrieve close vectors for a given query vector. Because this is a rough approach it is called approximate nearest neighbour search: you might not get the true nearest neighbours. In many applications this is fine, because you just want, say, 20 vectors that are 'close enough'.
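
The sketch below illustrates the principle with the kind of hash described later for RandomBinaryProjections: the sign pattern of a vector against a few random hyperplanes forms the bucket key, so nearby vectors usually end up with the same key. This is only an illustration of the idea, not NearPy's internal implementation.

import numpy

# Illustration of the LSH bucketing idea (not NearPy internals):
# the sign pattern against a few random hyperplanes forms the bucket key.
numpy.random.seed(0)
dim, n_projections = 100, 10
planes = numpy.random.randn(n_projections, dim)

def bucket_key(v):
    # '1' if v lies on the positive side of a hyperplane, '0' otherwise
    return ''.join('1' if numpy.dot(p, v) > 0 else '0' for p in planes)

a = numpy.random.randn(dim)
b = a + 0.01 * numpy.random.randn(dim)  # a vector very close to a

# Close vectors usually share a bucket key; distant ones usually do not.
print(bucket_key(a) == bucket_key(b))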

Engine

When using NearPy you will mostly work with an Engine object, which you configure and then use. Engines are configurable pipelines for approximate nearest neighbour search (ANNS) using locality-sensitive hashes (LSHs).

(Figure: the engine pipeline)

Engines are configured using the constructor that accepts the different components along the pipeline:

def __init__(self, dim, lshashes=[RandomBinaryProjections('default', 10)],
             distance=CosineDistance(),
             vector_filters=[NearestFilter(10)],
             storage=MemoryStorage()):

The ANNS pipeline is configured for a fixed dimensionality of the feature space, set via the dim parameter of the constructor. This must be a positive integer.

The engine can use multiple LSHs, which it takes from the lshashes parameter; this must be a list of LSHash objects.

Depending on the filters used during querying, a distance measure can be specified. This is only needed for filters that require a distance (such as NearestFilter or DistanceThresholdFilter).

Filters are applied as the last step when querying for nearest neighbours. Existing implementations are NearestFilter, DistanceThresholdFilter and UniqueFilter.
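
For example, a pipeline that keeps the 20 nearest candidates by Euclidean distance could be configured like this (a minimal sketch; the dimensionality of 100 and the count of 20 are arbitrary choices):

from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections
from nearpy.filters import NearestFilter
from nearpy.distances import EuclideanDistance

# Engine with an explicit distance and a filter that keeps the 20 nearest candidates
engine = Engine(100,
                lshashes=[RandomBinaryProjections('rbp', 10)],
                distance=EuclideanDistance(),
                vector_filters=[NearestFilter(20)])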

There is another parameter to the engine called fetch_vector_filters, which is applied after fetching candidate vectors from the buckets and before distances or vector filters are applied. By default this is a single UniqueFilter, and it should normally stay that way. It is kept as a parameter in case people want to experiment with it.

The engine supports different ways of storing the indexed vectors (and the buckets). Current storage implementations are MemoryStorage and RedisStorage.

There are two main methods of the engine:

store_vector(self, v, data=None)
neighbours(self, v)

store_vector() hashes vector v with all configured LSHs and stores it in all matching buckets in the storage. The optional data argument must be JSON-serializable. It is stored with the vector and will be returned in search results. It is best practice to use only database ids as attached 'data' and store the actual data in a database.

neighbours() hashes vector v with all configured LSHs, collects all candidate vectors from the matching buckets in storage, applies the (optional) distance function and finally the (optional) filter function to construct the returned list of either (vector, data, distance) tuples or (vector, data) tuples.
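
A minimal round trip with these two methods might look like this (a sketch; the 20-dimensional space and the attached id are just for illustration):

import numpy
from nearpy import Engine

# Default pipeline: RandomBinaryProjections('default', 10), CosineDistance, NearestFilter(10)
engine = Engine(20)

v = numpy.random.randn(20)
engine.store_vector(v, data='item-42')  # attach a database id as 'data'

# With a distance configured, results are (vector, data, distance) tuples
for vector, data, distance in engine.neighbours(numpy.random.randn(20)):
    print(data, distance)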

To remove indexed vectors and their data from the engine these two methods can be used:

clean_all_buckets(self)
clean_buckets(self, hash_name)
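
Continuing the sketch above, where the default hash is named 'default':

engine.clean_buckets('default')  # remove only the buckets of the hash named 'default'
engine.clean_all_buckets()       # remove all buckets of all configured hashes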

Hashes

All LSH implementations in NearPy subclass nearpy.hashes.LSHash, which has one main method besides the constructor and reset methods.

    hash_vector(self, v)

hash_vector() hashes the specified vector and returns a list of bucket keys with one or more entries.
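
Hashes can also be used on their own, outside an engine. A small sketch (reset(dim) sets the dimension the hash projects from, which the engine normally calls for you):

import numpy
from nearpy.hashes import RandomBinaryProjections

rbp = RandomBinaryProjections('rbp', 10)
rbp.reset(100)  # dimension of the vectors to be hashed

v = numpy.random.randn(100)
print(rbp.hash_vector(v))  # a list with one bucket key, e.g. ['0110100011']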

The LSH RandomBinaryProjections projects the specified vector onto n random normalized vectors in the feature space and returns a string of zeros and ones. If v lies on the positive side of the n-th normal vector, the n-th character in the string is '1'; if v lies on the negative side, the n-th character is '0'. This way the LSH maps each possible vector of the feature space ('input space') into one of many possible buckets.

The LSH RandomDiscretizedProjections is almost identical to RandomBinaryProjections. The only difference is that it divides each projection value by a bin width and uses the resulting bin index of each random projection as part of the bucket key. Given the same number of random projection vectors as RandomBinaryProjections, this results in more buckets for the same vector set. The density of buckets along the projections is controlled by the bin width, which is a constructor parameter.
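
For example (a sketch, assuming the bin width is the third constructor argument as described above; the value 2.0 is arbitrary):

from nearpy import Engine
from nearpy.hashes import RandomDiscretizedProjections

# 10 random projections, each discretized into bins of width 2.0
rdp = RandomDiscretizedProjections('rdp', 10, 2.0)
engine = Engine(100, lshashes=[rdp])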

The LSH PCABinaryProjections is trained with a training set of vectors specified in the constructor. It performs PCA (principal component analysis) to find the directions of highest variance in the training set and then uses the first n principal components as projection vectors (i.e. the dimensions of the subspace being projected into). The idea is that this makes it more likely to get a good distribution of the vectors among the buckets. I do not have any tests on this and don't know if it makes sense at all.

The LSH PCADiscretizedProjections is the PCA version of RandomDiscretizedProjections, using not random vectors but the first n principal components of the training set, as PCABinaryProjections does.
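
A construction sketch for the PCA hashes (the training-set layout of one vector per row is an assumption here; check the class docstrings for the exact expected shape):

import numpy
from nearpy import Engine
from nearpy.hashes import PCABinaryProjections

# Assumed layout: 1000 training vectors of dimension 100, one per row
training_set = numpy.random.randn(1000, 100)

pca = PCABinaryProjections('pca', 10, training_set)
engine = Engine(100, lshashes=[pca])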

Hash configurations

To save your index in Redis and re-use it after the indexing process is done, you should persist your hash configurations, so that afterwards you can re-create the same engine for further indexing or querying.

This is done like this:

# Imports needed for this snippet
from redis import Redis
from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections
from nearpy.storage import RedisStorage

# Create redis storage adapter
redis_object = Redis(host='localhost', port=6379, db=0)
redis_storage = RedisStorage(redis_object)

# Get hash config from redis
config = redis_storage.load_hash_configuration('MyHash')

if config is None:
    # Config does not exist, so create a new hash with 10 projections
    lshash = RandomBinaryProjections('MyHash', 10)
else:
    # Config exists, so create the hash with dummy (None) parameters
    lshash = RandomBinaryProjections(None, None)
    # Apply configuration loaded from redis
    lshash.apply_config(config)

# Create engine for feature space of 100 dimensions and use our hash.
# This will set the dimension of the lshash only the first time, not when
# using the configuration loaded from redis. Use redis storage to store
# buckets.
engine = Engine(100, lshashes=[lshash], storage=redis_storage)

# Do some stuff like indexing or querying with the engine...

# Finally store hash configuration in redis for later use
redis_storage.store_hash_configuration(lshash)

===========

Example usage:

import numpy

from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections

# Dimension of our vector space
dimension = 500

# Create a random binary hash with 10 bits
rbp = RandomBinaryProjections('rbp', 10)

# Create engine with pipeline configuration
engine = Engine(dimension, lshashes=[rbp])

# Index 100000 random vectors (set their data to a unique string)
for index in range(100000):
    v = numpy.random.randn(dimension)
    engine.store_vector(v, 'data_%d' % index)

# Create random query vector
query = numpy.random.randn(dimension)

# Get nearest neighbours
N = engine.neighbours(query)

===========

nearpy's People

Contributors

adityapatadia, amorgun, erramuzpe, filitchp, giacobenin, mariusvniekerk, mikluko, owo, pixelogik, saketkc, shixing, timgates42, toogle, wanji


nearpy's Issues

nearpy.io down

Seems like it has been out of service for more than a week.

Thanks for the project, it is really nice :)

Sparse matrix support

Hi,
The readme says that sparse vectors are currently supported. Is that true?
I've just tried the recent version from GitHub and I'm getting ValueError: dimension mismatch when it calls the hash_vector function.
It could just be my misunderstanding of how to work with this library. I can provide the code where I reproduced the error.

Memory Usage

Hi,
I have a problem with memory usage. My vector dim is 2**14; I know it's very large, but in #3 you mention turning vectors into sparse matrices.
This is my Redis memory usage when my vector count is 239:

   # Memory
   used_memory:178110184
   used_memory_human:169.86M
   used_memory_rss:181440512
   used_memory_peak:178248520
   used_memory_peak_human:169.99M
   used_memory_lua:33792
   mem_fragmentation_ratio:1.02
   mem_allocator:jemalloc-3.0.0

Why does it use about 169M of RAM? Is that normal for my data size?

Retrieving `k` nearest neighbours?

Is there a process/parameter setting that will make sure queries return no less than k ANN? (and preferably also not much more than k... exactly k would be ideal)

Like when the user asks for exactly top-20 most similar items.

I'm imagining something like searching nearby buckets if there are too few candidates. I haven't checked the code yet, maybe it's already there :)

A bug in class Engine __init__() function

In engine.py, the class Engine __init__() function, the default value for storage is MemoryStorage().
However, in this case, if I create two Engine instances without passing a storage instance (i.e. using the default value), the two engine instances share the same storage instance. I am not sure if you want that behaviour, but to me it's a little bit awkward.

I recommend changing it to this:

def __init__(..., storage=None):
    if storage is None:
        storage = MemoryStorage()
    self.storage = storage

Use python table storage instead of pickle for hashes

Hi,

I've been doing some testing with NearPy, and I think it would be a good idea to replace the pickle storage with Python tables for the hash matrices, to be able to work with data in large dimensions.
Real-life problems (CF recommendation etc.) have billions of dimensions, and the more dimensions we have, the more we need hashing methods as well.
Pickle starts to struggle when it comes to storing those large matrices (while numpy can handle matrices with billions of rows and thousands of columns without problems, if you have the memory for it).
Pickle also has performance and memory issues compared to tables:
http://www.shocksolution.com/2010/01/storing-large-numpy-arrays-on-disk-python-pickle-vs-hdf5adsf/

I guess more scalable solutions are probably Hadoop-based:
https://github.com/takahi-i/likelike
https://github.com/mrsqueeze/spark-hash

(still a very handy lib though!)

Thanks!

Mismatch between docstring and functionality

There is at least one mismatch that I'm noticing right now. Specifically, the functionality specified in the docstring of Engine doesn't match what happens.

For example, in engine.py, you say:

There are four different modes of the engine:

. . .

2) No distance - The distance argument is None.
                In this case only the vector filters are applied to
                the bucket contents and the result is a list of
                filtered (vector, data) tuples.

But you're specifically replacing all incoming None arguments with default values:

def __init__(self, ...):
    #. . .
    if distance is None: distance = CosineDistance()
    self.distance = distance

Then, in Engine.neighbors, you check for self.distance being None and execute based on that:

def neighbors(self, v):
    #. . .
    # Apply distance implementation if specified
    if self.distance:
        candidates = self._append_distances(v, self.distance, candidates)

multiple hash tables

In LSH there are two factors that affect the accuracy: the number of bits in the code (projections) and the number of hash tables.
When you create an engine like:

rbp = RandomBinaryProjections('rbp', 10)
engine = Engine(dimension, lshashes=[rbp])

Does it only do the random projection once?
Can we redo the hashing multiple times to increase the accuracy?

TypeError: unhashable type: 'numpy.ndarray'

Hi,
I'm trying NearPy and I got an error. Below is some code and the error message.

import scipy.io as sio

mat_file = sio.loadmat("C:\\Users\\RA2\\Desktop\\Workspaces\\LightenedCNN_A_lfw.mat")
fea = mat_file["features"]
labels = mat_file["labels_original"]
img_path = mat_file["image_path"]

from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections

# Dimension of our vector space
dimension = 256

n_d = 10
rbp1 = RandomBinaryProjections('rbp', n_d)


# Create engine with pipeline configuration
engine = Engine(dimension, lshashes=[rbp1])

for v, i in zip(fea[1:,:], range(1, len(fea)+1)):
    engine.store_vector(v, img_path[i])

query = fea[0]

# Get nearest neighbours
N = engine.neighbours(query)
File "C:\Users\RA2\Anaconda3\lib\site-packages\nearpy\filters\uniquefilter.py", line 51, in filter_vectors
    unique_dict[v[1]] = v

TypeError: unhashable type: 'numpy.ndarray'

The code crashes when n_d is below 15.
Is this a bug?

Thanks
Zehao Shi

Storage wrapper for Postgres?

In my opinion, Postgres is a widely used database and there should be storage support for it as well. I am willing to take up this task.
I am planning to use the Psycopg2 interface.

ValueError: dimension mismatch, When using space coo matrix

Hi there,
I get this error after updating NearPy to the new version:

    neighbours = nearpy_engine.neighbours(vector)
  File "/home/vahid/dev/mirad/backend/venv/local/lib/python2.7/site-packages/nearpy/engine.py", line 153, in neighbours
    nv = v / np.linalg.norm(v)
  File "/home/vahid/dev/mirad/backend/venv/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 2126, in norm
    sqnorm = dot(x, x)
  File "/home/vahid/dev/mirad/backend/venv/local/lib/python2.7/site-packages/scipy/sparse/base.py", line 317, in __mul__
    raise ValueError('dimension mismatch')
ValueError: dimension mismatch

my vector is:

    basic_vector = vectorizer.transform(
        ['some text'])
    vector = basic_vector.transpose().tocoo()

and vectorizer is a sklearn hashing vectorizer (n_features=2**18). np.linalg.norm calls dot(v, v), and since v is a sparse matrix this error occurs.

Python 3 compatibility

I've done some work in this direction. Tests in run_tests.py are passing correctly. A pull request is coming.

ImportError: No module named permutation.hashpermutations

I'm trying to get the latest version of NearPy so I'm doing the following:

root@67f7a4cf4faf:/# pip install git+git://github.com/pixelogik/NearPy.git
*** cut ***
root@67f7a4cf4faf:/# pip freeze
NearPy==0.1.2
argparse==1.2.1
bitarray==0.8.1
numpy==1.9.1
redis==2.10.3
scipy==0.14.0
virtualenv==1.11.6
wsgiref==0.1.2

No virtualenv in my case. Everything is fine until I try to use it:

root@67f7a4cf4faf:/# python -c 'import nearpy'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nearpy/__init__.py", line 20, in <module>
    from nearpy.engine import Engine
  File "/usr/local/lib/python2.7/dist-packages/nearpy/engine.py", line 27, in <module>
    from nearpy.hashes import RandomBinaryProjections
  File "/usr/local/lib/python2.7/dist-packages/nearpy/hashes/__init__.py", line 31, in <module>
    from nearpy.hashes.permutation.hashpermutations import HashPermutations
ImportError: No module named permutation.hashpermutations

I believe this could be related to 90239b7.

Use Hilbert space filling curve?

I'm curious about the choice of random binary projections versus Hilbert curve. Does rbp outperform it?

Would it make sense to add another hasher?

engine.store_vector

For a large number of vectors, it takes a long time to call "engine.store_vector()" for all the data points.
Is there a faster way to index the vectors?

BTW, thank you for answering quickly.

Nearest Neighbours in Cosine distance

A vector V is hashed into a bucket B, and then V is compared with all vectors in B. Using only this single bucket B usually leads to bad performance.

A good way to solve this is to find the several closest neighbour buckets in terms of Hamming distance. E.g. for B = "1001", the neighbour buckets could be "1000", "1011", ... Then compare V with all vectors in these buckets. A sketch of the idea follows below.
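
A minimal sketch of generating such neighbour bucket keys by flipping single bits (not part of NearPy; the assumed key format is the binary string produced by RandomBinaryProjections):

def neighbour_bucket_keys(key):
    # The key itself plus all keys within Hamming distance 1
    keys = [key]
    for i in range(len(key)):
        keys.append(key[:i] + ('0' if key[i] == '1' else '1') + key[i+1:])
    return keys

print(neighbour_bucket_keys('1001'))
# ['1001', '0001', '1101', '1011', '1000']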

What do you think about it? I've implemented an extension class based on your Engine class and would like to incorporate it into your code.

Thanks,
Xing

Crashing in storing and retrieve when data is string or list of string

Try this library when data is a string or a list of strings.
Expected: storage succeeds and nearest neighbours etc. are returned upon querying.

Code:
import numpy
from numpy import random
from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections
from nearpy.filters import NearestFilter
from nearpy.distances import EuclideanDistance

dimension=5000
encoding_length = 10
rbp = RandomBinaryProjections('rbp', encoding_length) #or PCABinaryProjection
engine = Engine(dimension, lshashes=[rbp],
                distance=EuclideanDistance(),
                vector_filters=[NearestFilter(encoding_length)])
engine.clean_all_buckets()
i = numpy.random.randn(dimension)
s = 'abcde'
engine.store_vector(i, s)
y = engine.neighbours(s)

CRASH:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/nearpy/engine.py", line 93, in neighbours
for bucket_key in lshash.hash_vector(v):
File "/usr/local/lib/python2.7/dist-packages/nearpy/hashes/randombinaryprojections.py", line 63, in hash_vector
return [''.join(['1' if x > 0.0 else '0' for x in projection])]
TypeError: 'NotImplementedType' object is not iterable
ret = engine.neighbours(s)

Struggle in sparse vector

I used csr_matrix from the scipy library to represent a sparse vector, but it does not work with the store_vector function of this library, because a vector in scipy is still a 1xn matrix, so its dimension of 1 mismatches the lshash. Could you give me an example, or let me know a suitable library for computing with sparse vectors that can be used with this library?

Possibility to update vectors

Hi,

I think it would be very useful to have the possibility to update the vector index.
Typically, a CF application would need to update movie/user vectors regularly as users buy/see movies. From my understanding of the model, this would require 1) vectorizing the movie before the update, 2) retrieving the buckets containing the movie vector and removing it from the set, 3) vectorizing the movie after the update and pushing it.
While it's certainly doable, there might be some intermediary data structures that could help with that task.

Thanks!

Support for generating and searching in L hash tables

Hi,
I didn't see this feature while experimenting with NearPy API, so apologies if I missed something...
As I understand it, NearPy.Engine creates a single LSH hash table. Most ANN algorithms use a number of hash tables to reduce the false-negative rate (missing a close neighbour).
Is it possible to create a simple API for this in NearPy?

Thanks!

Improvement on README about role of RandomBinaryProjections

The information about the role of RandomBinaryProjections is sparse.
Could you please improve that?

The experiment code is clearer about what it is doing than the example usage.
I suggest improving the example usage. Also, add a folder 'example' containing self-contained samples.

Add function to `engine` to remove an item from the index

I would like to be able to remove things from a bucket.

I think the function should look like this:

engine.remove(id)
# what should it return?
# what should happen if `id` is not in the index?

So simple I might come back with a PR.

scipy.sparse.issparse(v) object has no attribute 'sparse'

Hi, pixelogik. I'm very interested in locality-sensitive hashing and plan to use it for image retrieval. I installed the NearPy and redis packages. Then I tested the following example:

import numpy
from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections

# Dimension of our vector space
dimension = 500

# Create a random binary hash with 10 bits
rbp = RandomBinaryProjections('rbp', 10)

# Create engine with pipeline configuration
engine = Engine(dimension, lshashes=[rbp])

# Index 1000000 random vectors (set their data to a unique string)
for index in range(100000):
    v = numpy.random.randn(dimension)
    engine.store_vector(v, 'data_%d' % index)

# Create random query vector
query = numpy.random.randn(dimension)

# Get nearest neighbours
N = engine.neighbours(query)

It throws this error:

C:\Python27\python.exe D:/python/projects/lsh/lsh-image-search.py
Traceback (most recent call last):
  File "D:/python/projects/lsh/lsh-image-search.py", line 17, in <module>
    engine.store_vector(v, 'data_%d' % index)
  File "C:\Python27\lib\site-packages\nearpy\engine.py", line 79, in store_vector
    for bucket_key in lshash.hash_vector(v):
  File "C:\Python27\lib\site-packages\nearpy\hashes\randombinaryprojections.py", line 65, in hash_vector
    if scipy.sparse.issparse(v):
AttributeError: 'module' object has no attribute 'sparse'

This happens because the scipy module doesn't have any attribute named sparse. That attribute only gets defined when you import scipy.sparse.

So in randombinaryprojections.py it would be good to import scipy.sparse. After I did this, it succeeded.

Number of returned neighbors

I have the problem that I always get 10 results from the engine.neighbours() function, no matter how many projections I have or what the bucket size is.

Am I missing something? I could not find any setting for the number of neighbours returned.

Adding new images and removing an existing image's index

I'm new to using the NearPy package. I have a set of images whose feature vectors are already indexed. If I get a new image, is there a function that allows me to add it to a specific bucket according to its similarity to the images already indexed?

Also, if I need to remove the index entry of a specific image that I deleted from my dataset, is there a function for that?

Thanks

Please fix the Redis store_many_vectors_case

@xieqihui Sorry for the hassle.

Could you please create a new pull request for your store_many_vectors addition and make the little adjustment I suggested, so that store_many_vectors works even if Redis is not used for storage?

nearest neighbour list is empty

On running the example code, the list of nearest neighbours is empty.
N = engine.neighbours(query)

len(N) # 0

Can you explain why that is?

Empty Sequence returned by Nearest Neighbour query

Hi,

I am using the example usage code snippet to hash vectors into buckets and find the nearest neighbour. I have stored 20 vectors of 4096 dimensions each in the engine. Querying the engine for the nearest neighbour of a test vector occasionally returns an empty sequence.

I tried increasing the number of random projections to 4000 with the same results. Can you help me to understand why this is happening?

Thanks!
Deshana

what's the effect of permutations hash object?

At first, I only put the RandomBinaryProjections hash in my code, and it did not yield good results. But I could repeatedly run my query function without including the index-vector function, e.g. engine.store_vector(vector, data);

When I add a HashPermutations object to my code, such as:
permutations = HashPermutations('p')
rbp_conf = {'num_permutation':50, 'beam_size':10, 'num_neighbour':10}
permutations.add_child_hash(rbp_perm, rbp_conf)
......
redis_storage.store_hash_configuration(rbp_perm)
permutations.build_permuted_index()
Finally, the result is better than before.
But if I want to run the query function, I can't remove 'engine.store_vector(vector, data);'.
So how can I execute only the query code, without the store-vector code?
Just like this:

redis_object = redis.Redis(host='localhost', port=0, db=0, unix_socket_path='/tmp/redis.sock')
redis_storage = RedisStorage(redis_object)
config = redis_storage.load_hash_configuration('v')
if config is None:
    rbp_perm = RandomBinaryProjections('v', 30)
else:
    rbp_perm = RandomBinaryProjections(None, None)
    rbp_perm.apply_config(config)

 engine.candidate_count(query)
 engine.neighbours(query)

I hope you will soon answer my question. Thanks very much.
