mourner / kdbush Goto Github PK

View Code? Open in Web Editor NEW

620.0 24.0 70.0 51 KB

A fast static index for 2D points

License: ISC License

JavaScript 100.00%

javascript algorithm spatial-index computational-geometry 2d-points fast

kdbush's Introduction

KDBush

A very fast static spatial index for 2D points based on a flat KD-tree. Compared to RBush:

Points only — no rectangles.
Static — you can't add/remove items after initial indexing.
Faster indexing and search, with lower memory footprint.
Index is stored as a single array buffer (so you can transfer it between threads or store it as a compact file).

If you need a static index for rectangles, not only points, see Flatbush. When indexing points, KDBush has the advantage of taking ~2x less memory than Flatbush.

Usage

// initialize KDBush for 1000 items
const index = new KDBush(1000);

// fill it with 1000 points
for (const {x, y} of items) {
    index.add(x, y);
}

// perform the indexing
index.finish();

// make a bounding box query
const foundIds = index.range(minX, minY, maxX, maxY);

// map ids to original items
const foundItems = foundIds.map(i => items[i]);

// make a radius query
const neighborIds = index.within(x, y, 5);

// instantly transfer the index from a worker to the main thread
postMessage(index.data, [index.data]);

// reconstruct the index from a raw array buffer
const index = KDBush.from(e.data);

Install

Install with NPM: npm install kdbush, then import as a module:

import KDBush from 'kdbush';

Or use as a module directly in the browser with jsDelivr:

<script type="module">
    import KDBush from 'https://cdn.jsdelivr.net/npm/kdbush/+esm';
</script>

Alternatively, there's a browser bundle with a KDBush global variable:

<script src="https://cdn.jsdelivr.net/npm/kdbush"></script>

API

new KDBush(numItems[, nodeSize, ArrayType])

Creates an index that will hold a given number of points (numItems). Additionally accepts:

nodeSize: Size of the KD-tree node, 64 by default. Higher means faster indexing but slower search, and vise versa.
ArrayType: Array type to use for storing coordinate values. Float64Array by default, but if your coordinates are integer values, Int32Array makes the index faster and smaller.

index.add(x, y)

Adds a given point to the index. Returns a zero-based, incremental number that represents the newly added point.

index.range(minX, minY, maxX, maxY)

Finds all items within the given bounding box and returns an array of indices that refer to the order the items were added (the values returned by index.add(x, y)).

index.within(x, y, radius)

Finds all items within a given radius from the query point and returns an array of indices.

`KDBush.from(data)`

Recreates a KDBush index from raw ArrayBuffer data (that's exposed as index.data on a previously indexed KDBush instance). Very useful for transferring or sharing indices between threads or storing them in a file.

Properties

data: array buffer that holds the index.
numItems: number of stored items.
nodeSize: number of items in a KD-tree node.
ArrayType: array type used for internal coordinates storage.
IndexArrayType: array type used for internal item indices storage.

kdbush's People

Contributors

Stargazers

Watchers

kdbush's Issues

[Feature Request] Support 3D Space

As the title says, would it be possible to extend this module for 3d coordinates? Not sure if the change would affect the performance.

I'm asking for 3d extension as the project I'm working on needs to deal with massive points in 3d spaces with complex calculation. Having this tree structure to limit the set of points to process will help greatly.

Please let me know the direction to go. I'm also happy to submit a PR if the feature can be done.

Thanks.

Tests

Needs basic tests.

Visualization

Something similar to RBush viz: http://mourner.github.io/rbush/viz/viz-cluster.html

Java port

I am trying to port Mapbox supercluster algorithm in Java. Is there any Java port for kdbush ?

Using kdbush with lat long coordinates

How do I use kdbush with a set of lat, long co ordinates . Is there a easy way to convert from lat-long to x,y .

Deterministic queries

Hello,

I've been trying to use kdbsuh (v3) as an index for geographical points to be shown in a Leaflet map for a thesis. Given the amount of markers to display I'm trying to implement a pseudoclustering by myself. I tried already different libraries like supercluster but the I didn't like the result.
My use case is that the stored points can change by fetching more and merging with the already existing one when the user move on the map.

My solution was to create an index every time the markers changed (use case is to have an upper bound of maximum of 150k markers but it's kinda a really low probability use case) and compute the visible points in such way:

function getVisibleMarkers(index, markers, bounds, zoom) {
             const markersInBounds = index.range(bounds.sw.lng, bounds.sw.lat, bounds.ne.lng, bounds.ne.lat)
                                           .map((index) =>  index.points[index]);
             
             const overlappingMarkers = {};
             const markersToRender  = [];
        
           markersInBounds.forEach((marker) => {
               if(marker.id in overlappingMarkers) return;
              
              // we start with a 1 mil. meters radius, this is relative to the level of zoom and a dispersion factor
              // more details and more zoom means a lower radius to detect overlapping markers
              
               const radius = (1.000.000 / Math.pow(2, zoom * -1)) * 2.4;
               const nearbyPoints = geokdbush.around(index, marker.lng, marker.lat, Infinity, radius / 1000)  // convert meters to km
               nearbyPoints.forEach(nearbyPoint) => markersToIgnore[nearbyPoint.id] = nearbyPoint);
               markersToRender.push(marker);
   }
   
   return markersToRender;
 }

This is computed when the bounds change or the new markers are merged with the current one, it seems to work quite fine and fast but it's producing different results. For example, when retrieving new markers the previously shown markers are changing in favor of others even when calculating the visible markers in the same order.

I was wondering if could be that when rebuilding the index the order changes, if that's the case what could be a solution?
Thanks!

[Feature Request] Import from /Export to Array Buffer, like Flatbush

Hi,

Thanks for the work on this very helpful library!

I landed here from a comment in the companion library, which has methods that support serializing the index to disk (via the data property on an index instance), and loading that serialized result back conveniently (via Flatbush.from). It would be really convenient to have those same methods available for this library too.

mourner/flatbush#19 (comment)

Use an appropriate type for indices regardless of data type

First of all, thanks for providing this awesome lib! This is more a question rather than a bug report.

I notice that one can specify the dtype using the 4th parameter of the constructor, which is used for the index and coords array: https://github.com/mourner/kdbush/blob/master/src/index.js#L18-L19

Would it make sense to store the indices in anything other than a typed int array? I am asking myself this because storing the points as floats would automatically set this.ids to floats as well.

As an idea, we could fix this.ids to its most efficient dtype using something like this

  const bits = Math.log2(points.length);
  let IdxArrayType = Uint8Array;
  if (bits > 8) IdxArrayType = Uint16Array;
  if (bits > 16) IdxArrayType = Uint32Array;

  this.ids = new IdxArrayType(points.length);
  this.coords = new ArrayType(points.length * 2);

kNN search

"tape" should be in "devDependencies"

"tape" is listed in "dependencies" but it should be in "devDependencies"

Should range request include all boundaries?

Is it intent that range function captures points on both left and right (or top and bottom) boundaries? When I look at argument names (minX, maxX) than this behavior is understandable, but it can lead to bugs when used in map tilification - one point can be assigned in two tiles.

Btw. thank you for great open source libraries!

Tomas Zamecnik
Windy.com

range function returns with an empty array.

Red points are the min/max points of the boundary box.
Blue points are my coordinates in the tree.
As you can see there are a few points that are inside the boundary box, but KDBush returns with an empty array.
Why could this happen?

`.within()` queries using different radii in each dimension

This is a fantastic library--thank you for your great work!

I just ran into a situation where I had Date().getTime() objects in one dimension and floats scaled 1:100 in the other dimension, and couldn't figure out why my .within() queries were not returning results. It seems the query radius is currently uniform across both dimensions, so a value that might make sense for one of my dimensions did not make sense for the other!

It could be interesting to allow for one search radius in one dimension and another radius in the other dimension in case other fools like myself bump into this kind of situation. This challenge can of course be overcome by using a .range() query or by scaling one's data, but it could be nice to perform radius-driven queries without scaling one's data!

KDBush does not run in IE 10 or IE 11

Looks like all the code exported is in ES6 syntax. Currently I am running babel against the package to transpile it. Is there a way we can add this step to the build process so that the code consumed is in ES% format and can run cross browser?

Publish browserified builds on NPM

Wikipedia version of the algorithm is slow

https://stackoverflow.com/questions/29592546/floyd-rivest-vs-introselect-algorithm-performance says:

The original Floyd-Rivest paper gives an example implementation of their algorithm (this is the implementation listed on Wikipedia, at the time of writing this). However, this is a simplified version. From my tests, the simplified version is actually pretty slow! If you want a fast implementation, I think you'll need to implement the full algorithm. I recommend reading the paper "On Floyd and Rivest's SELECT algorithm" by Krzysztof C. Kiwiel. He gives a pretty good description of how to implement a fast Floyd-Rivest selection.

I also found a more correct version here:
https://softwareengineering.stackexchange.com/questions/284767/kth-selection-routine-floyd-algorithm-489

However, there may also be bugs. At least with FloydWirth_kth, something is wrong.

NN search

Similar to #3 but simpler in implementation since no priority queue is needed.

Behaviour when X wraps at 1/0

Hi,

First of all thanks for kdbush, it is indeed very fast! I have been browsing the code, and one question comes to mind: is there any provision for X (aka longitude) wraps at the 1/0 boundary?

Thanks,
Jeremy

Missing CJS

Hi,

Thank you for working on the project, I really appreciate it.

I was updating all my libraries as usual, but it seems like the new v4.0.1 doesn't work with Common JS anymore. Is this intentional or is it a bug? I had to roll back to version 3.0.0, which works as intended.

Thank you!

How to retrieve bounding boxes/generate bounding boxes

I am looking to serialize indexes into smaller chunks for better performance in a KV store.
If there are boxes at each tree level, I would like to be able to get the pre-configured bounding box coordinates from the tree root. Then I will construct a kdbush from just those coordinates.

I'm still not 100% clear on the rtree algorithm. If there is a node size of 16, does that mean that the first level has 16 leaves?

I would be happy to use a combination of flatbush and kdbush.

Fast theoretical insertion/removal

I've been thinking about implementing a fast algorithm for insertion and removal, which would allow this spatial index to be used in dynamic use cases. It might get a bit complicated, and it won't be as fast as optimal log(n), but I think it can be made at least log^2(n).

Let's look at a single point insertion. We need to reorder elements in our flat KD array to make it valid after addition of an item, but do this selectively, avoiding sorting the whole array again.

We have log(n) axis splits that partition the array into KD "nodes".
When we insert a new point, each split will either stay at its current index (if the number of items was odd), or increase by 1.
For each split, the new point will belong either to the left half, the right half, or become the new median.
Let's add the point to the end of the array. Either way, to rearrange the array with respect to the current split, we'll need a constant number of swaps involving the new point, the current median, and a point adjacent to the median (either on the left or on the right).
For subsequent splits of each half, we perform rearrangement recursively to account for a potential new point (divide & conquer).
For one axis step, we have to perform the search of point adjacent to the median (because they're not sorted), which is log(n).
The total complexity of insertion will be O(log(n)) + 2 * O(log(n/2)) + 4 * O(log(n/4)) + ... ~= O(log^2(n).

Does this sound feasible, or are some of my quick assumptions wrong?

cc @anandthakker @mikolalysenko

GeoJSON implementation - geojson-kdbush?

This library is awesome 🥇!

@mourner any interest of having a GeoJSON implementation, similar to geojson-rbush?

Let me know and I can aim to have it as minimalistic like your repos.

👍/ 👎 the idea and close this issue whenever you like.