Giter VIP home page Giter VIP logo

order-statistics-trees's Introduction

Order Statistics Trees

STL-style implementation of in situ min-max heaps and order statistics trees.

Min-Max Heaps

Min-max heaps are complete binary trees with the following properties.

  • The smallest element of the heap can be found in the root node. Thus guaranteeing O(1) access to the minimum.
  • The largest element of the heap can be found in one of the child nodes of the root element. Thus the maximum can also be accessed in O(1).
  • Every level of the tree is labelled a min (even) or max (odd) level and the root node of a subtree starting at a min (max) level has the minimum (maximum) value of that subtree.
  • Insertion has a worst case complexity of O(log n).
  • Deletion has a worst case complexity of O(log n). In particular deleting the minimum or the maximum has a worst case complexity of O(log n).

Analoguos a max-min heap starts with the maximum value in the root and the minimum in one of the root node's child nodes.

This library defines the following operations on min-max heaps similar to the heap operations provided by the C++ Standard Library

Operation Min-Max Heap Max-Min Heap Min Heap (C++ Standard Library)
Create order_statistics::make_mm_heap(first, last) order_statistics::make_mm_heap(first, last, std::greater<>{}) std::make_heap(first, last)
Insert order_statistics::push_mm_heap(first, last) order_statistics::push_mm_heap(first, last, std::greater<>{}) std::make_heap(first, last)
Delete order_statistics::pop_mm_heap(first, last) order_statistics::pop_mm_heap(first, last, std::greater<>{}) std::make_heap(first, last)
Check order_statistics::is_mm_heap(first, last) order_statistics::is_mm_heap(first, last, std::greater<>{}) std::is_heap(first, last)

Constant Time Access to the Minimum Element

order_statistics::make_mm_heap(begin(container), end(container));

const auto min{*begin(container)};

Constant Time Access to the Maximum Element

// Expects that the heap has already been created.
assert(order_statistics::is_mm_heap(begin(container), end(container)));
// Expects that there are 2 child nodes. In case of a single child the comparison is not needed.
assert(std::distance(begin(container), end(container)) > 2);

const auto max{std::max(*(begin(container) + 1), *(begin(container) + 2))};

Logarithmic Time Insertion of a New Element

Append the new element to the container of the heap, then call order_statistics::push_mm_heap.

assert(order_statistics::is_mm_heap(begin(container), end(container) - 1));

order_statistics::push_mm_heap(begin(container), end(container));

Logarithmic Time Deletion of the Minimum Element

assert(order_statistics::is_mm_heap(begin(container), end(container)));

order_statistics::pop_mm_heap(begin(container), end(container));

assert(std::min_element(begin(container), end(container)) == (end(container) - 1));

Order Statistics Trees

An Order Statistics Tree is a data type that supports finding the element at rank k_i from a set of predefined ranks K = {k_i, i in 1 .. m | k_i < k_(i+1)} in O(1) (rank-query). It also supports finding all elements (unsorted) in the range [k_i, k_j], i < j in O(outputsize) (range-query).

This library provides the following operations to create, maintain and query order statistics trees in flat memory (in situ).

Operation Order Statistics Tree
Create order_statistics::make_order_statistics_tree(first, last, ranks_first, ranks_last)
Insert order_statistics::push_order_statistics_tree(first, last, ranks_first, ranks_last)
Delete order_statistics::pop_order_statistics_tree(first, last, ranks_first, ranks_last)
rank-query implicitly defined
range-query implicitly defined

In statistics there is also the concept of quantiles that divide the set of samples into equal intervals. Some of those quantiles are of special interest, e.g., Q1 (25th percentile, k_i = n/4), Q2 (median, 50th percentile, k_i = n/2), Q3 (75th percentile, k_i = n*3/4) to describe a distribution.

Rank-Query the Quantiles Q1, Q2 (Median), Q3 in Constant Time

const auto size{std::distance(begin(container), end(container))};
const typename container_type::iterator ranks[3]{
  begin(container) + size / 4, // Q1
  begin(container) + size / 2, // Q2
  begin(container) + size * 3 / 4 // Q3
};

order_statistics::make_order_statistics_tree(begin(container), end(container), begin(ranks), end(ranks));

// *ranks[0] is Q1
// *ranks[1] is Q2
// *ranks[2] is Q3

Range-Query all Elements Smaller than the Median in Linear Time

const auto size{std::distance(begin(container), end(container))};
const typename container_type::iterator ranks[1]{
  begin(container) + size / 2, // Median
};

order_statistics::make_order_statistics_tree(begin(container), end(container), begin(ranks), end(ranks));

std::for_each(begin(container), ranks[0], ...);

Implementation Details

Conceptually, an order statistics tree is a hybrid data structure that features one vector V[1..m] holding the elements of rank k_i and multiple min-max heaps H_0, ..., H_m where H_i holds the elements V[i] and V[i+1] with V[0] = -inf and V[m+1] = +inf.

To construct the min-max heaps, the order_statistics::make_mm_heap functions are used. To find the V[i] the function std::nth_element is used. The min-max heaps are already flat data structures, the elements V[i] are just placed in between those memory sequences.


References

  • Min-Max Heaps and Generalized Priority Queues (1986) by M.D. Atkinson, J.-R. Sack, N. Santoro, and T. Strothotte
  • Minmaxheaps, Orderstatisticstrees and Their Application to the Coursemarks Problem (1985) by M.D. Atkinson, J.-R. Sack, N. Santoro, and T. Strothotte

order-statistics-trees's People

Contributors

patschkowski avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.