u1234x1234 / pynanoflann
Unofficial Python wrapper to the nanoflann k-d tree
License: BSD 2-Clause "Simplified" License
Hey,
Thanks for the very good library; it is quite fast (the fastest Python implementation I've found so far). Unfortunately, it seems there is a memory leak somewhere.
If I run the following code memory usage keeps slowly increasing:
import pynanoflann
import numpy as np
for i in range(100):
    print(i)
    nn_search = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
    nn_search.fit(np.random.rand(10000000, 3))
    _, nn_idx = nn_search.kneighbors(np.random.rand(1000, 3))
    del nn_search, nn_idx
I'm using version 0.0.1 of pynanoflann. The problem limits the usability of the library, since I have to create a new kd-tree in each loop iteration and eventually run out of memory.
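For what it's worth, here is a minimal harness that could confirm the growth by sampling peak RSS each iteration via the stdlib resource module (Unix-only); the pynanoflann calls are left as comments so the sketch runs on its own:

```python
# Minimal leak-check harness: sample peak RSS once per iteration (Unix-only).
import resource

def peak_rss_kib():
    # ru_maxrss is the process's peak resident set size
    # (KiB on Linux, bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

samples = []
for i in range(5):
    # In the real reproduction, build and query the tree here:
    # nn_search = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
    # nn_search.fit(np.random.rand(10_000_000, 3))
    # _, nn_idx = nn_search.kneighbors(np.random.rand(1000, 3))
    samples.append(peak_rss_kib())

# With a leak, consecutive samples keep growing by roughly the
# tree's footprint; without one, they flatten out after warm-up.
print(samples)
```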
Best regards
Juho
Hey there, wondering if there's a way to append elements without refitting the dataset every time.
Thanks!
/**
* Called during search to add an element matching the criteria.
* @return true if the search should be continued, false if the results are
* sufficient
*/
bool addPoint(DistanceType dist, IndexType index)
{
    printf(
        "addPoint() called: dist=%f index=%u\n", dist,
        static_cast<unsigned int>(index));
    if (dist < radius) m_indices_dists.emplace_back(index, dist);
    return true;
}

DistanceType worstDist() const { return radius; }
};
Hey,
Another feature I have in mind: would it be possible to implement parallelized multicore processing for queries? For instance, in SciPy's cKDTree (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query) you can set the number of cores with the n_jobs parameter.
What I mean is that we could pass an 'n_cores' parameter when executing the query:
import pynanoflann
import numpy as np
target = np.random.rand(10000, 3)
query = np.random.rand(2000, 3)
kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_tree.fit(target)
d, nn_idx = kd_tree.kneighbors(query, n_cores=10)
This would build a single kd-tree for the data in 'target' and then split the data in 'query' across multiple cores for the queries.
A Python implementation of this introduces considerable overhead when working with relatively small data sets. I assume it would be much faster implemented on the C++ side.
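Until something like this lands on the C++ side, the intended query split can at least be sketched in Python. Below is a rough illustration of the idea, using a brute-force NumPy search as a stand-in for the fitted kd-tree (so it runs without pynanoflann); 'nn_chunk' and the chunking scheme are made-up for the sketch:

```python
# Sketch of splitting the query set across workers against one shared index.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((1000, 3))   # the data the single kd-tree would be fit on
query = rng.random((200, 3))

def nn_chunk(chunk):
    # Stand-in for kd_tree.kneighbors(chunk): brute-force 1-NN in NumPy.
    d2 = ((chunk[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return np.sqrt(d2[np.arange(len(chunk)), idx]), idx

n_cores = 4
with ThreadPoolExecutor(max_workers=n_cores) as pool:
    parts = list(pool.map(nn_chunk, np.array_split(query, n_cores)))

# Reassemble the per-chunk results in query order.
d = np.concatenate([p[0] for p in parts])
nn_idx = np.concatenate([p[1] for p in parts])
```

Done natively (e.g. an OpenMP loop over query rows), the same split would avoid both the chunking overhead and the GIL.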
Best regards
Juho
Hey,
I'm wondering whether it would be possible to add multi-core support for processing multiple batches of data simultaneously.
My current approach is:
import pynanoflann
import numpy as np
n_batches = 10
target = np.random.rand(n_batches, 10000, 3)
query = np.random.rand(n_batches, 2000, 3)
for i in range(n_batches):
    pts_target = target[i]
    pts_query = query[i]
    kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
    kd_tree.fit(pts_target)
    d, nn_idx = kd_tree.kneighbors(pts_query)
Instead, I'd like to do something like this:
import pynanoflann
import numpy as np
n_batches = 10
target = np.random.rand(n_batches, 10000, 3)
query = np.random.rand(n_batches, 2000, 3)
kd_trees = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_trees.fit(target)
distances, nn_indexes = kd_trees.kneighbors(query)
This would create a kd-tree for each batch in 'target'. The corresponding batches in 'query' would then be used to run nearest-neighbor searches against the matching kd-trees.
Currently, if I want this kind of parallelized processing I have to implement it in Python. For large data sets this is not a problem, but for smaller data sets it causes too much overhead; I assume it would be much faster to implement on the C++ side.
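As a point of comparison, the per-batch variant can already be parallelized from Python today. This is only a rough sketch of the intended semantics, with a brute-force NumPy search standing in for the per-batch kd-tree so the example runs without pynanoflann:

```python
# One independent nearest-neighbor search per batch, run on a thread pool.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(1)
n_batches = 4
target = rng.random((n_batches, 500, 3))
query = rng.random((n_batches, 100, 3))

def search_batch(i):
    # Stand-in for: fit a kd-tree on target[i], then query with query[i].
    d2 = ((query[i][:, None, :] - target[i][None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return np.sqrt(d2[np.arange(len(idx)), idx]), idx

with ThreadPoolExecutor(max_workers=n_batches) as pool:
    results = list(pool.map(search_batch, range(n_batches)))

distances = np.stack([r[0] for r in results])   # shape (n_batches, 100)
nn_indexes = np.stack([r[1] for r in results])  # shape (n_batches, 100)
```

The per-call overhead this illustrates is exactly what a C++-side loop over batches would eliminate.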
Best regards
Juho
Do you have any plans to add pynanoflann to PyPI to make it easier to include in other projects? I've made some updates and written a GitHub workflow to build wheels of pynanoflann, which I can submit as a merge if you are interested. Otherwise, I can also upload my fork to PyPI (possibly under a different name) if you're not interested.
Hello,
Thanks for this great tool.
I tried to use your code and realized that there is no package named 'nanoflann_ext', so import nanoflann_ext fails.
Can I get some help with this? Thanks in advance.
Hey! I really like the new multiprocessing capabilities of kneighbors. For one of my projects I need to use radius queries; would it be possible to get the same feature for radius_neighbors?
Your library is great, by the way, I have found it to be the fastest option for processing very large point clouds, so thank you!
First, thanks for this great tool.
I'm a little confused by the radius parameter:
nn = pynanoflann.KDTree(n_neighbors=5, metric='L1', radius=100)
It seems like I can set this to a tiny value and still get correct results when I query.
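If it helps anyone hitting the same confusion: my understanding (an assumption, not confirmed from this thread) is that radius only affects radius_neighbors, while kneighbors ignores it and always returns the k closest points. The two query semantics, illustrated with a brute-force NumPy stand-in:

```python
# k-NN vs. radius-query semantics, shown with brute force for illustration.
import numpy as np

rng = np.random.default_rng(2)
pts = rng.random((500, 3))
q = np.array([0.5, 0.5, 0.5])

dist = np.linalg.norm(pts - q, axis=1)

# k-NN: always the k closest points; a radius setting plays no role.
knn_idx = np.argsort(dist)[:5]

# Radius query: every point strictly inside the ball, however many that is.
radius = 0.1
in_ball = np.where(dist < radius)[0]
```

That would explain why a tiny radius still yields "correct" kneighbors results.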
Hi @u1234x1234,
Thanks for your awesome wrapper. I would like to use it for radius search in 3D point clouds. In my implementation, I would like to be able to save a KDTree in a text file and then reload it afterwards.
I tried to use pickle, but nanoflann_ext.KDTree32 and nanoflann_ext.KDTree64 objects are not supported.
Do you think there is a way to save these objects to files? Or enable pickle to serialize them?
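A possible workaround while the extension objects aren't picklable (just a sketch, assuming rebuilding on load is acceptable): persist the raw points plus the constructor parameters, and refit after loading instead of deserializing the C++ tree:

```python
# Save/restore by storing the tree's inputs, not the built tree itself.
import pickle
import numpy as np

points = np.random.rand(1000, 3)
state = {"points": points, "metric": "L2", "leaf_size": 20}

blob = pickle.dumps(state)        # write this to a file in practice
restored = pickle.loads(blob)

# Rebuild rather than deserialize (commented: needs pynanoflann installed):
# tree = pynanoflann.KDTree(metric=restored["metric"],
#                           leaf_size=restored["leaf_size"])
# tree.fit(restored["points"])
```

The cleaner long-term fix might be exposing nanoflann's own saveIndex/loadIndex through the wrapper, if that is feasible.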
Best,
Hugues