Giter VIP home page Giter VIP logo

rs_teardown_tree's Introduction

teardown_tree

docs crates travis

A BST (binary search tree) written in Rust that supports efficient query and teardown scenarios, i.e. the typical usage pattern is to build a master copy of the tree, then

  1. clone the master copy to a new tree
  2. tear the tree down with a series of delete-range operations (and do something with the retrieved items), interspersed with range queries
  3. rinse, repeat

Two data structures are currently implemented: TeardownTree and IntervalTeardownTree (an augmented Interval Tree), both with conventional Map and Set interfaces.

The tree does not use any kind of self-balancing and does not support insert operation.

Details

The tree is pointer-free, meaning that nodes do not store explicit pointers to their children. The only thing we store in a node is your data. This is similar to how binary heaps work: all nodes in the tree reside in an array, the root always at index 0, and given a node with index i, its left/right children are found at indices 2*i+1 and 2*i+2. Thus no dynamic memory allocation or deallocation is required. This makes it possible to implement a fast clone operation: instead of traversing the tree, allocating and copying each node individually, we are able to allocate the whole array in a single call and efficiently copy the entire content. The tree also supports a refill operation (currently only implemented for T: Copy), which copies the contents of the master tree into self without allocating at all.

As to delete-range operation, we use a custom algorithm running in O(k + log n) time, where k is the number of items deleted (and returned) and n is the initial size of the tree. Detailed description.

An exhaustive automated test for delete-range has been written and is found in lib.rs. I have tested all trees up to the size n=10. All the other supported operations have been tested extensively for every variation of the tree (we use slightly different algorithms for IntervalTree and for the filtered variants). In general, I feel the quality is already pretty good. If you find any bugs, please open an issue.

The library has been optimized for speed based on profiling. A significant amount of unsafe code is employed. Every occurrence of unsafe code has been carefully reviewed, most have been annotated with comments elaborating on safety.

Usage

As a library

Add to your Cargo.toml:

[dependencies]
teardown_tree = "0.6.6"

And to your crate's root:

extern crate teardown_tree;

To run the benchmarks

  1. Install Rust and Cargo (any recent version will do, stable or nightly).
  2. git clone https://github.com/kirillkh/rs_teardown_tree.git
  3. cd rs_teardown_tree/benchmarks
  4. cargo run --release

Performance

Complexity

All query and delete operations work in O(k + log n) time, where k is the number of items queried/deleted/returned and n is the initial size of the tree. Note that if you have already deleted m items, the next delete_range operation will still take O(n), not O(n-m) time as in many other data structures. However, this distinction only matters in practice when delete's are interspersed with insert's, and TeardownTree does not support insert's.

The amount of memory consumed by a TeardownTree built with n items, each of size s is n*s + n + 2*log_2(n) + z bytes, where z is a small constant. (The first term is the size of an array holding just your data; the second -- of an array of flags that are unset for removed items; the third -- of two auxiliary arrays used internally by the delete_range algorithm).

Benchmarks

TeardownTree vs other data structures: full refill/teardown cycle in bulks of 1000

The first set of benchmarks compares TeardownTree::delete_range() against:

  1. BTreeSet::remove() in Rust's standard library
  2. Treap
  3. SplayTree
  4. TeardownSet::delete(), which deletes a single element
  5. UnbalancedBST, a pointer-based Binary Search Tree that uses an efficient delete_range() algorithm

I made straightforward modifications to Treap and SplayTree in order to add support for delete_range, however BTreeSet lacks an equivalent operation (it has an O(log n) split, but not merge, see Rust #34666), therefore BTreeSet::remove() is used instead.

As the graph above shows, on my machine the whole refill/teardown sequence on a tree of 1,000,000 u64 items (we refill the tree with elements from the master copy, then delete 1000 items at a time until the tree is empty), is ~20 times faster with TeardownTreeSet<usize>::delete_range() than with BTreeSet<usize>::remove(). It also consumes 45% less memory.

Variations of TeardownTree: full refill/teardown cycle in bulks of 1000

Another set of benchmarks exposes the overhead of several variations of the data structure and its algorithms.

For details, please see the benchmarks page.

Drawbacks

  1. No insert operation (yet?..)
  2. The storage is not deallocated until the structure is dropped.
  3. Fine print regarding complexity: delete_range works in O(k + log n) time, where n is the initial size of the tree, not its current size (if you have already deleted m items, the next delete_range operation will still take O(log n) time in the worst case, not necessarily O(log(n-m))). The same applies to the other logarithmic operations.
  4. Performance is sensitive to the size of your data. Starting from a certain size, it is faster to use a TeardownSet<Box<(Key,Value)>> or store the key-value pairs separately and use external handles as keys (e.g. TeardownSet<INDEX_INTO_EXTERNAL_VEC> or TeardownSet<&MyKey>). It's probably a good idea to run some benchmarks to know what's best in your case.

rs_teardown_tree's People

Contributors

kirillkh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

sbnair

rs_teardown_tree's Issues

Double free in mir will violate exception safety in this crate.

We detected several double free bugs in your crate via static analysis.
Double free will appear when these function unwind, mainly caused by Vec::from_raw_parts & mem::forget.
In Rust Mir, inserting code between Vec::from_raw_parts & mem::forget will violate exception safety. Because when these code unwind, the Vec generated will drop as well as the entity which ptr pointed to.

let slots = unsafe { Vec::from_raw_parts(self.slots, self.nslots, self.capacity) };

mem::forget(slots);

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.