Giter VIP home page Giter VIP logo

interval-tree's Introduction

com.dean.interval-tree

This library provides a collection of data structures implemented using a modular, extensible, foldable, weight balanced persistent binary tree: ordered-sets, ordered-maps, interval-sets, and interval-maps.

tests Clojars Project

Usage

To install, add the following dependency to your project or build file:

[com.dean/interval-tree "0.1.2"]

Public API

The public api resides in the top-level com.dean.interval-tree.core namespace:

(require '[com.dean.interval-tree.core :as dean])

The basic operation of this library is as a drop-in replacement for clojure.core/sorted-set and clojure.core/sorted-map.

Constructors

  • (dean/ordered-set coll)
  • (dean/ordered-set-by pred coll)
  • (dean/ordered-map coll)
  • (dean/ordered-map-by pred coll)
  • (dean/interval-set coll)
  • (dean/interval-map coll)

Topics

What is an Interval Map?

Imagine you'd like to associate values with members of a set of intervals over some continuous domain such as time or real numbers. An example of this is shown below. An interval map answers the question, which intervals overlap at some point on the domain. At 3.14159, in this case, would be x4 and x7. The interval map is sparse itself, of course, and would only need to contain the 8 constituent intervals.

 x8:                         +-----+
 x7:                   +-----------------------------------+
 x6:                                                       +
 x5:                                     +-----------+
 x4: +-----------------------------+
 x3:                                                 +-----+
 x2:                         +-----------------+
 x1:       +-----------+

     0=====1=====2=====3=====4=====5=====6=====7=====8=====9

This corresponds to the following example code:

(def x (dean/interval-map {[1 3] :x1
                           [4 7] :x2
                           [8 9] :x3
                           [0 5] :x4
                           [6 8] :x5
                           [9 9] :x6
                           [3 9] :x7
                           [4 5] :x8}))

(x 3.141592654) ;; =>  [:x4 :x7]
(x [5 5])       ;; =>  [:x4 :x7 :x8 :x2]

(get x 9)       ;; =>  [:x7 :x3 :x6]
(get x 9.00001) ;; =>  nil
(get x [1 4])   ;; =>  [:x4 :x1 :x7 :x8 :x2]

Efficient Set Operations

This library implements a diverse collection of efficent set operations on foldably parallel ordered sets:

  (def foo (shuffle (range 500000)))
  (def bar (shuffle (range 1000000)))

  (def s0 (shuffle (range 0 1000000 2)))
  (def s1 (shuffle (range 0 1000000 3)))

;;
;;; dean/ordered-set
;;

  (time (def x (ordered-set foo)))         ;; 500K: "Elapsed time: 564.248517 msecs"
  (time (def y (ordered-set bar)))         ;;   1M: "Elapsed time: 1187.734211 msecs"

  (time (def s (dean/intersection
                 (ordered-set s0)
                 (ordered-set s1))))       ;; 833K: "Elapsed time: 1242.961445 msecs"

  (time (r/fold + + y))                    ;;   1M: "Elapsed time: 54.363545 msecs"


;;
;;; clojure.core/sorted-set
;;

  (time (def v (into (sorted-set) foo)))   ;; 500K: "Elapsed time: 839.188189 msecs"
  (time (def w (into (sorted-set) bar)))   ;;   1M: "Elapsed time: 1974.798286 msecs"

  (time (def s (clojure.set/intersection
                 (into (sorted-set) s0)
                 (into (sorted-set) s1)))) ;; 833K: "Elapsed time: 1589.786106 msecs"

  (time (r/fold + + w))                    ;;   1M: "Elapsed time: 167.916539 msecs"

Testing

Testing is accomplished with the standard lein test

$ time lein test

lein test com.dean.interval-tree.interval-map-test

lein test com.dean.interval-tree.interval-set-test

lein test com.dean.interval-tree.interval-test

lein test com.dean.interval-tree.ordered-map-test

lein test com.dean.interval-tree.ordered-set-test

lein test com.dean.interval-tree.tree-test

Ran 30 tests containing 98214 assertions.
0 failures, 0 errors.

real     5m34.487s
user    10m21.397s
sys      0m5.047s

Modularity

This data structure library is designed around the following concepts of modularity and extensibility.

Clojure/Java Interfaces

The top level collections are built on the standard Clojure/Java interfaces, so, for example, working with an ordered-set is identical to working with Clojure's sorted-set, using all of the same standard collection functions, for the 99% case: meta, nth, seq, rseq, assoc(-in), get(-in), invoke, compare, to-array, empty, .indexOf, .lastIndexof, size, iterator-seq, first, last, =, count, empty, contains, conj. disj, cons, fold, and many old friends will just work, using an efficient implementation that takes full advantage of the capabilities of our underlying tree index.

PExtensibleset

An exception to the above is due to the fact that clojure.set does not provide interfaces for extensible sets. So, we provide our own intersection, union, difference, subset, and superset. These operators work most efficiently on com.dean.interval-tree collections and provide support for backward interoperability with clojure (or possibly other) set datatypes.

Root Container

The individual collection types (ordered-set, ordered-map, interval-set, interval-map} are defined by their individual Class (clojure deftype) of top level container that holds the root of an indexed tree. This container describes the behavior of the underlying tree data structure along several architectural dimensions.

INodeCollection

The fundamental collection of nodes provides an interface to node allocation machinery and to the root contained node. A variant based on persistent (on-disk) storage, for example, has been built with customizations at this layer.

IBalancedCollection

For functional balanced trees, provides an interface to the stitch function that returns a new, properly balanced tree containing one newly allocated node adjoined. The provided algorithm is weight balanced however others may be used. We've experimented with red-black trees, in particular, as variants at this layer.

IOrderedCollection

Ordered collections define a comparator and predicates to determine the underlying algorithmic compatibility of other collections. Interval Collections are a special type of OrderedCollection.

Tree

The heart of the library is our persistent tree.

The code is well documented and explains in more detail the efficiencies of the internal collection operators.

This species of binary tree supports representations of sets, maps, and vectors. In addition to indexed key and range query, it supports the nth operation to return nth node from the beginning of the ordered tree, and node-rank to return the rank (sequential position) of a given key within the ordered tree, both in logarithmic time.

The axes of exstensibility of the tree implemntation (*compare*,*n-join*, *t-join*) correspond to the interfaces described above.

Inspiration

This implementation of a weight-balanced binary interval-tree data structure was inspired by the following:

License

The use and distribution terms for this software are covered by the Eclipse Public License 1.0, which can be found in the file LICENSE.txt at the root of this distribution. By using this software in any fashion, you are agreeing to be bound by the terms of this license. You must not remove this notice, or any other, from this software.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.