Giter VIP home page Giter VIP logo

bplustree's Introduction

bplustree


Maven Central
codecov

Status: beta

Disk based B+-tree in java using memory mapped files (size limited only by available disk space).

Features

  • size only limited by available disk
  • supports range queries
  • optionally supports duplicate keys
  • much faster read and write than H2 file-based database (because no transactions and different persistence model).

Requirements

  • fast read time for range queries by time and key
  • fast insert time
  • single node implementation (not distributed)
  • use memory-mapped files for speed
  • fixed size keys
  • variable size values
  • very large size storage (>2GB of keys or values)
  • optimized for insert in approximate index order
  • single threaded
  • no transactions
  • delete not supported (?)

Getting started

Add this to your pom.xml:

<dependency>
  <groupId>com.github.davidmoten</groupId>
  <artifactId>bplustree</artifactId>
  <version>VERSION_HERE</version>
</dependency>

Example

Lets create a file based index of timestamped strings (for example lines from a log). Timestamps don't have to be unique.

BPlusTree<Long, String> tree = 
  BPlusTree 
    .file()
    .directory(indexDirectory)
    .maxLeafKeys(32)
    .maxNonLeafKeys(8)
    .segmentSizeMB(1)
    .keySerializer(Serializer.LONG)
    .valueSerializer(Serializer.utf8())
    .naturalOrder();
    
// insert some values    
tree.insert(1000L, "hello");
tree.insert(2000L, "there");

// search the tree for values with keys between 0 and 3000
// and print out key value pairs
tree.findEntries(0, 3000).forEach(System.out.println);

// search the tree for values with keys between 0 and 3000
// and print out values only
tree.find(0, 3000).forEach(System.out.println);

Duplicate keys

Duplicate keys are allowed by default. You can force overwrite of keyed values by setting .unique(false) in the builder.

Note that for efficiency values with duplicate keys are entered into the tree in reverse insert order so to extract the values retaining insert order a special method is used:

tree.findOrderPreserving(0, 3000);

Using bplustree for String keys

Suppose you want to create a B-+ tree with String keys and those keys can have effectively arbitrary length. Keys are stored as fixed size records (unlike values which can be arbitrary in length). You can use hashes to get good find performance and keep the keys small (4 bytes of hash code) by making a tree of type:

BPlusTree<Integer, StringWithValue> tree = ...

So you insert the String hashcode in the key and combine the String with the value. You find records using the hashcode of the String key and then filter the results based on an exact match of the String component of StringAndValue.

Design

B+-tree index is stored across multiple files (of fixed size). Pointers to values are stored in the tree and the values are stored across a separate set of files (of fixed size).

A LargeByteBuffer abstracts access via Memory Mapped Files to a set of files (ByteBuffer only offers int positions which restricts size to 2GB, LargeByteBuffer offers long positions with no effective limit of size (apart from available disk)).

bplustree's People

Contributors

davidmoten avatar dependabot[bot] avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.