u1234x1234 / pynanoflann
Unofficial Python wrapper to the nanoflann k-d tree
License: BSD 2-Clause "Simplified" License
Hey,
Thanks for the very good library; it is quite fast (the fastest Python implementation I've found so far). Unfortunately, it seems there is a memory leak somewhere.
If I run the following code memory usage keeps slowly increasing:
import pynanoflann
import numpy as np
for i in range(100):
    print(i)
    nn_search = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
    nn_search.fit(np.random.rand(10000000, 3))
    _, nn_idx = nn_search.kneighbors(np.random.rand(1000, 3))
    del nn_search, nn_idx
I'm using version 0.0.1 of pynanoflann. The problem limits the usability of the library, since I have to create a new kd-tree in each loop iteration and eventually run out of memory.
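For what it's worth, here is a minimal harness that could confirm the growth by sampling peak RSS each iteration via the stdlib resource module (Unix-only); the pynanoflann calls are left as comments so the sketch runs on its own:

```python
# Minimal leak-check harness: sample peak RSS once per iteration (Unix-only).
import resource

def peak_rss_kib():
    # ru_maxrss is the process's peak resident set size
    # (KiB on Linux, bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

samples = []
for i in range(5):
    # In the real reproduction, build and query the tree here:
    # nn_search = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
    # nn_search.fit(np.random.rand(10_000_000, 3))
    # _, nn_idx = nn_search.kneighbors(np.random.rand(1000, 3))
    samples.append(peak_rss_kib())

# With a leak, consecutive samples keep growing by roughly the
# tree's footprint; without one, they flatten out after warm-up.
print(samples)
```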
Best regards
Juho
Hey there, wondering if there's a way to append elements without refitting the dataset every time.
Thanks!
/**
* Called during search to add an element matching the criteria.
* @return true if the search should be continued, false if the results are
* sufficient
*/
bool addPoint(DistanceType dist, IndexType index)
{
    printf(
        "addPoint() called: dist=%f index=%u\n", dist,
        static_cast<unsigned int>(index));
    if (dist < radius) m_indices_dists.emplace_back(index, dist);
    return true;
}

DistanceType worstDist() const { return radius; }
};
Hey,
Another feature I have in mind: would it be possible to implement parallelized multicore processing for queries? For instance, in SciPy's cKDTree (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query) you can set the number of cores with the n_jobs parameter.
What I mean is that we could pass an 'n_cores' parameter when executing the query:
import pynanoflann
import numpy as np
target = np.random.rand(10000, 3)
query = np.random.rand(2000, 3)
kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_tree.fit(target)
d, nn_idx = kd_tree.kneighbors(query, n_cores=10)
This would build a single kd-tree for the data in 'target' and then split the data in 'query' across multiple cores for the queries.
A Python implementation of this introduces considerable overhead when working with relatively small data sets. I assume it would be much faster implemented on the C++ side.
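Until something like this lands on the C++ side, the intended query split can at least be sketched in Python. Below is a rough illustration of the idea, using a brute-force NumPy search as a stand-in for the fitted kd-tree (so it runs without pynanoflann); 'nn_chunk' and the chunking scheme are made-up for the sketch:

```python
# Sketch of splitting the query set across workers against one shared index.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((1000, 3))   # the data the single kd-tree would be fit on
query = rng.random((200, 3))

def nn_chunk(chunk):
    # Stand-in for kd_tree.kneighbors(chunk): brute-force 1-NN in NumPy.
    d2 = ((chunk[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return np.sqrt(d2[np.arange(len(chunk)), idx]), idx

n_cores = 4
with ThreadPoolExecutor(max_workers=n_cores) as pool:
    parts = list(pool.map(nn_chunk, np.array_split(query, n_cores)))

# Reassemble the per-chunk results in query order.
d = np.concatenate([p[0] for p in parts])
nn_idx = np.concatenate([p[1] for p in parts])
```

Done natively (e.g. an OpenMP loop over query rows), the same split would avoid both the chunking overhead and the GIL.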
Best regards
Juho
Hey,
I'm wondering whether it would be possible to add multi-core support for processing multiple batches of data simultaneously.
My current approach is:
import pynanoflann
import numpy as np
n_batches = 10
target = np.random.rand(n_batches, 10000, 3)
query = np.random.rand(n_batches, 2000, 3)
for i in range(n_batches):
    pts_target = target[i]
    pts_query = query[i]
    kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
    kd_tree.fit(pts_target)
    d, nn_idx = kd_tree.kneighbors(pts_query)
Instead, I'd like to do something like this:
import pynanoflann
import numpy as np
n_batches = 10
target = np.random.rand(n_batches, 10000, 3)
query = np.random.rand(n_batches, 2000, 3)
kd_trees = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_trees.fit(target)
distances, nn_indexes = kd_trees.kneighbors(query)
This would create a kd-tree for each batch in 'target'. The corresponding batches in 'query' would then be used to run nearest-neighbor searches against the matching kd-trees.
Currently, if I want this kind of parallelized processing I have to implement it in Python. For large data sets this is not a problem, but for smaller data sets it causes too much overhead; I assume it would be much faster to implement on the C++ side.
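As a point of comparison, the per-batch variant can already be parallelized from Python today. This is only a rough sketch of the intended semantics, with a brute-force NumPy search standing in for the per-batch kd-tree so the example runs without pynanoflann:

```python
# One independent nearest-neighbor search per batch, run on a thread pool.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(1)
n_batches = 4
target = rng.random((n_batches, 500, 3))
query = rng.random((n_batches, 100, 3))

def search_batch(i):
    # Stand-in for: fit a kd-tree on target[i], then query with query[i].
    d2 = ((query[i][:, None, :] - target[i][None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return np.sqrt(d2[np.arange(len(idx)), idx]), idx

with ThreadPoolExecutor(max_workers=n_batches) as pool:
    results = list(pool.map(search_batch, range(n_batches)))

distances = np.stack([r[0] for r in results])   # shape (n_batches, 100)
nn_indexes = np.stack([r[1] for r in results])  # shape (n_batches, 100)
```

The per-call overhead this illustrates is exactly what a C++-side loop over batches would eliminate.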
Best regards
Juho
Do you have any plans to add pynanoflann to PyPI to make it easier to include in other projects? I've made some updates and written a GitHub workflow to build wheels of pynanoflann, which I can submit as a merge if you are interested. Otherwise, I can also upload my fork to PyPI (possibly under a different name) if you're not interested.
Hello,
Thanks for this great tool.
I tried to use your code and realized that there is no package named 'nanoflann_ext', so import nanoflann_ext fails.
Can I get some help with this? Thanks in advance.
Hey! I really like the new multiprocessing capabilities of kneighbors. For one of my projects I need to use radius queries; would it be possible to get the same feature for radius_neighbors?
Your library is great, by the way, I have found it to be the fastest option for processing very large point clouds, so thank you!
First, thanks for this great tool.
I'm a little confused by the radius parameter:
nn = pynanoflann.KDTree(n_neighbors=5, metric='L1', radius=100)
It seems like I can set this to a tiny value and still get correct results when I query.
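If it helps anyone hitting the same confusion: my understanding (an assumption, not confirmed from this thread) is that radius only affects radius_neighbors, while kneighbors ignores it and always returns the k closest points. The two query semantics, illustrated with a brute-force NumPy stand-in:

```python
# k-NN vs. radius-query semantics, shown with brute force for illustration.
import numpy as np

rng = np.random.default_rng(2)
pts = rng.random((500, 3))
q = np.array([0.5, 0.5, 0.5])

dist = np.linalg.norm(pts - q, axis=1)

# k-NN: always the k closest points; a radius setting plays no role.
knn_idx = np.argsort(dist)[:5]

# Radius query: every point strictly inside the ball, however many that is.
radius = 0.1
in_ball = np.where(dist < radius)[0]
```

That would explain why a tiny radius still yields "correct" kneighbors results.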
Hi @u1234x1234,
Thanks for your awesome wrapper. I would like to use it for radius search in 3D point clouds. In my implementation, I would like to be able to save a KDTree in a text file and then reload it afterwards.
I tried to use pickle, but nanoflann_ext.KDTree32 and nanoflann_ext.KDTree64 objects are not supported.
Do you think there is a way to save these objects to files? Or enable pickle to serialize them?
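A possible workaround while the extension objects aren't picklable (just a sketch, assuming rebuilding on load is acceptable): persist the raw points plus the constructor parameters, and refit after loading instead of deserializing the C++ tree:

```python
# Save/restore by storing the tree's inputs, not the built tree itself.
import pickle
import numpy as np

points = np.random.rand(1000, 3)
state = {"points": points, "metric": "L2", "leaf_size": 20}

blob = pickle.dumps(state)        # write this to a file in practice
restored = pickle.loads(blob)

# Rebuild rather than deserialize (commented: needs pynanoflann installed):
# tree = pynanoflann.KDTree(metric=restored["metric"],
#                           leaf_size=restored["leaf_size"])
# tree.fit(restored["points"])
```

The cleaner long-term fix might be exposing nanoflann's own saveIndex/loadIndex through the wrapper, if that is feasible.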
Best,
Hugues