tidwall / rtree Goto Github PK

An R-tree implementation for Go

License: MIT License

Go 100.00%

rtree's Introduction

rtree

This package provides an in-memory R-Tree implementation for Go. It's designed for Tile38 and is optimized for fast rect inserts and replacements.

Usage

Installing

To start using rtree, install Go and run go get:

$ go get -u github.com/tidwall/rtree

Basic operations

// create a 2D RTree
var tr rtree.RTree

// insert a point
tr.Insert([2]float64{-112.0078, 33.4373}, [2]float64{-112.0078, 33.4373}, "PHX")

// insert a rect
tr.Insert([2]float64{10, 10}, [2]float64{20, 20}, "rect")

// search 
tr.Search([2]float64{-112.1, 33.4}, [2]float64{-112.0, 33.5}, 
 	func(min, max [2]float64, data interface{}) bool {
		println(data.(string)) // prints "PHX"
	},
)

// delete 
tr.Delete([2]float64{-112.0078, 33.4373}, [2]float64{-112.0078, 33.4373}, "PHX")

Support for Generics (Go 1.18+)

// create a 2D RTree
var tr rtree.RTreeG[string]

// insert a point
tr.Insert([2]float64{-112.0078, 33.4373}, [2]float64{-112.0078, 33.4373}, "PHX")

// insert a rect
tr.Insert([2]float64{10, 10}, [2]float64{20, 20}, "rect")

// search 
tr.Search([2]float64{-112.1, 33.4}, [2]float64{-112.0, 33.5}, 
 	func(min, max [2]float64, data string) bool {
		println(data) // prints "PHX"
	},
)

// delete 
tr.Delete([2]float64{-112.0078, 33.4373}, [2]float64{-112.0078, 33.4373}, "PHX")

Support for generic numeric types, like int, float32, etc.

// create a 2D RTree
var tr rtree.RTreeGN[float32, string]

// insert a point
tr.Insert([2]float32{-112.0078, 33.4373}, [2]float32{-112.0078, 33.4373}, "PHX")

// insert a rect
tr.Insert([2]float32{10, 10}, [2]float32{20, 20}, "rect")

// search 
tr.Search([2]float32{-112.1, 33.4}, [2]float32{-112.0, 33.5}, 
 	func(min, max [2]float32, data string) bool {
		println(data) // prints "PHX"
	},
)

// delete 
tr.Delete([2]float32{-112.0078, 33.4373}, [2]float32{-112.0078, 33.4373}, "PHX")

Algorithms

This implementation is a variant of the original paper:
R-TREES. A DYNAMIC INDEX STRUCTURE FOR SPATIAL SEARCHING

Inserting

Similar to the original paper. From the root to the leaf, the rects which will incur the least enlargment are chosen. Ties go to rects with the smallest area.

Added to this implementation: when a rect does not incur any enlargement at all, it's chosen immediately and without further checks on other rects in the same node. Also added is all child rectangles in every node are ordered by their minimum x value. This can dramatically speed up searching for intersecting rectangles on most modern hardware.

Deleting

A target rect is searched for from root to the leaf, and if found it's deleted. When there are no more child rects in a node, that node is immedately removed from the tree.

Searching

Same as the original algorithm.

Splitting

This is a custom algorithm. It attempts to minimize intensive operations such as pre-sorting the children and comparing overlaps & area sizes. The desire is to do simple single axis distance calculations each child only once, with a target 50/50 chance that the child might be moved in-memory.

When a rect has reached it's max number of entries it's largest axis is calculated and the rect is split into two smaller rects, named left and right. Each child rects is then evaluated to determine which smaller rect it should be placed into. Two values, min-dist and max-dist, are calcuated for each child.

min-dist is the distance from the parent's minumum value of it's largest axis to the child's minumum value of the parent largest axis.
max-dist is the distance from the parent's maximum value of it's largest axis to the child's maximum value of the parent largest axis.

When the min-dist is less than max-dist then the child is placed into the left rect. When the max-dist is less than min-dist then the child is placed into the right rect. When the min-dist is equal to max-dist then the child is placed into an equal bucket until all of the children are evaluated. Each equal rect is then one-by-one placed in either left or right, whichever has less children.

Finally, sort all the rects in the parent node of the split rect by their minimum x value.

License

rtree source code is available under the MIT License.

rtree's People

Contributors

Stargazers

Watchers

rtree's Issues

Is rbang threadsafe, or planned to be threadsafe?

I tried to use https://github.com/Workiva/go-datastructures/tree/master/rtree, but it seems buggy when you are changing some members' coordinates; deleting and inserting doesn't seem to work correctly.

However, it claims that the RTree can be used threadsafe. From looking at the source, I think rbang is not threadsafe. Is that correct? Any plans to make it threadsafe?

Best practice regarding different (0,0) origins

I have a graphics lib, where the origin (0,0) is at the top/left, and rects use a top/left and width/height notation.

rbang origin (0,0) is bottom/left and uses a bottom/left, top/right notation.

So, when inserting rects from the graphics lib, I use code like this to update the same rectangle:

r = rectangle
xll = x lower-left
yll = y lower-left
xur = x upper-right
your = y upper-right

rbang.Replace(
	[2]float64{
		float64(r.xll),
		float64(r.yll)},
	[2]float64{
		float64(r.xur),
		float64(r.yur)},
	r,
	[2]float64{
		float64(r.GetLeft()),
		float64(r.GetTop())},
	[2]float64{
		float64(r.GetLeft() + r.GetWidth()),
		float64(r.GetTop() + r.GetHeight())}, r)

The problem is, that I have a very simple three rects layout (root, two childs with 50% width and 100% height of root), where a search with a point inside c2, returns the root node and c1... and not the root node and c2.

Replacing the root node with a bigger size than before, re-inserts the root node and grows the Rtree.

If the size of the root node, which is the first node inserted, changes size in that the new size is bigger than the old one the delete won't delete because tr.root.contains(&item) returns false and hence is truein the code below.

Because the Replace function is just a Delete and Insert, this leads to a bug where the root node is now inserted once again. So after the Replace we have RTree.Len() + 1 , which is wrong. RTree.Len() has to be the same before and after the Replace.

https://github.com/tidwall/rbang/blob/ecaccd2e0367c5dd8e735fe40d6780dfa38f7806/rbang.go#L370-L375

Is this library thread safe?

I am wondering if I can do parallel inserts into this and whether it'll lead to any issues.

Performing deep copy?

I have a potentially expensive rtree query operation. Since the rtree is not thread safe I currently use a mutex to perform this.
Since the rtree is being continually updated this has a measured performance impact.
Is it possible to get a deep copy of the rtree, or some other 'thread safe' access to a fixed view of the tree for searching? Or is this what the shallow copy actually achieves?

a few performance notes

Hi! I came here via a twitter post about performance. I was curious, so I poked around a bit for fun. A few minor comments, in the hopes that they are useful.

First, BenchmarkRandomInsert can be a bit misleading. If you make it run faster, it gets to a higher b.N, but at higher b.N, each iteration has to do more work. Thus code improvements don't necessarily show up in improved benchmark results. Picking a fixed number of iterations to do each time would be better. Perhaps like:

func BenchmarkRandomInsert(b *testing.B) {
	const iters = 10000
	boxes := randBoxes(iters)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var tr RTree
		for iter := 0; iter < iters; iter++ {
			tr.Insert(boxes[iter].min[:], boxes[iter].max[:], iter)
		}
	}
}

Second, I can get a ~15% speedup on BenchmarkRandomInsert on my machine by unrolling the loops in chooseLeastEnlargement. Here's the 3d version:

// min is a sloppy float min function.
// It is faster than math.Min, at the cost
// of sloppy handling of the usual
// floating point delights: NaN and Inf.
func min(x, y float64) float64 {
	if x < y {
		return x
	}
	return y
}

// max is like min, only max.
func max(x, y float64) float64 {
	if x < y {
		return y
	}
	return x
}

func (r *box) chooseLeastEnlargement(b *box) int {
	j, jenlargement, jarea := -1, 0.0, 0.0
	n := r.data.(*node)
	for i := 0; i < n.count; i++ {
		area := (n.boxes[i].max[0] - n.boxes[i].min[0]) *
			(n.boxes[i].max[1] - n.boxes[i].min[1]) *
			(n.boxes[i].max[2] - n.boxes[i].min[2])

		enlargedArea := (max(b.max[0], n.boxes[i].max[0]) - min(b.min[0], n.boxes[i].min[0])) *
			(max(b.max[1], n.boxes[i].max[1]) - min(b.min[1], n.boxes[i].min[1])) *
			(max(b.max[2], n.boxes[i].max[2]) - min(b.min[2], n.boxes[i].min[2]))

		enlargement := enlargedArea - area

		if j == -1 || enlargement < jenlargement || (enlargement == jenlargement && area < jarea) {
			j, jenlargement, jarea = i, enlargement, area
		}
	}
	return j
}

IMHO, it also improves readability a bit. Of course, it makes your code generation more complicated.