andylamp / bplustree Goto Github PK

An efficient, conscise, and simple implementation of a purely on-disk B+ Tree data structure

License: Apache License 2.0

Java 100.00%

bplustree btree disk java b-plus-tree bplus-tree on-disk

bplustree's Introduction

A Purely On-Disk Implementation of a B+ Tree

After quite a few hours that included developing the thing as well as testing it here is a (fully) functional implementation of a B+ Tree data structure purely on the disk. This was developed mostly for educational reasons that stemmed from the fact that I could not find a B+ Tree implementation that met the following:

Purely Disk based implementation
Used strict paging which the user could tune (256k, 1K, 4K etc)
Was a (Key, Value) storage not just for Keys (B-Tree)
Implemented deleting from the data structure
Supported duplicate entries (if required)
was adequately tested (so that I know it would work)

I came about needing to implement a B+ Tree for a side project of mine... and I was really surprised to see that there were no implementations that met the above requirements; not only that but I was not able to find a reference code/pseudocode that had a working delete implementation and support for duplicate keys. To my dismay even my trusty (and beloved) CLRS didn't include an implementation of a B+ Tree but only for the B-Tree with no duplicates and delete pseudocode (well, one could say that they gave the steps...but that's not the point).

It has to be noted that I could find some B+ Tree implementations that had delete and duplicate key support, mainly in open-source database projects. This unfortunately meant that these implementations were deeply integrated into their parent projects and as a result they were optimized for their particular use-cases. Finally the code bases where significantly larger (hence making the code reading/understanding much harder than it should be!).

So I went about to implement mine (cool stuff, many hours of head scratching were involved!) while also putting effort in creating this "tuned" down version which mainly cuts down on features for simplicity, enhanced code clarity and clutter reduction.

Ease of use features

As I said above this project was done mainly to create a working example of a B+ Tree data structure purely on the disk so the code is well commented (I think) and can be understood easily; that said... we have some "delicacies" that make working and testing this project a bit easier, which are outlined below

it uses maven, so most modern IDE's can import it without hassle...
it requires jdk v8 (for some parts, change them to have backwards support)
it uses a dedicated tester as well as JUnit tests
has an interactive menu that the user can individually perform the operations

Implementation details

Insertions

For insertions we use a modified version of the method provided by CLRS with modifications to support duplicate keys (which will be covered below). Complexity is not altered from the usual case and we require one pass down the tree as well.

Searching

This is assumed to be the most common operation on the B+ Tree; which support two modes of searching:

Singular Key searches
Range Queries

Here two distinct functions are used to cover these two cases:

searchKey is used for singular value searches (unique or not)
rangeSearch is used for performing range queries

Both of these methods require only one pass over the tree to find the results (if any). Additionally, since we store the keys in a sorted order we can exploit binary search to reduce the total node lookup time significantly. This is done along with a slight modification to the search algorithm to return the lower bound instead of failure.

Deletes

We again use one pass down the tree deletes but this is a bit more complex than the other two operations... it again requires only one pass to delete the key (if found) but as it descends down the tree it performs any redistribution/merging that needs to happen in order to keep our data structure valid.

Handling of duplicate keys

In order to avoid hurting the search performance functionality which is (assumed to be) the most common operation in a B+ Tree data structure the following scheme was implemented to support duplicate keys (at the cost of a bit more page reads).

The tree only actually indexes singular keys but each key has an overflow pointer which leads to its overflow page that has all of the duplicate entries stored as a linked list. If needed, multiple trailing overflow pages per key are created to accommodate for these extra insertions should they exceed the capacity of one page. The downside is that we use a bit more space per page as well as reads in order to read these overflow pages.

Page lookup table

To avoid moving around things too much we keep each page into a free page pool that has all of the available pages so far; this in turn let's us create an index very fast without having to pay costly reads if we wanted to have a clustered tree (although we again use more space, usually).

Payload sizes

Due to inherent limitations of how B+ Tree is designed to work the payload contained within each entry must be of a specific size. This can be configured at initialization but cannot be changed after creation.

This means that if the payload is less than the entry size it will be padded with whitespaces to fill in the bucket whereas if it is more it will be truncuated to the size of the bucket.

License

This work, at its current version, is licensed under the Apache 2.0 license.

Final words

Hopefully I'll create a GitHub page for this... where I explain my implementation a bit more but until then this will suffice! Oh and I really hope this implementation is clear and concise enough so that it can make the notions of B+ Trees crystal clear!

bplustree's People

Contributors

Stargazers

Watchers

bplustree's Issues

Licensing for C# port

I'm looking to port this to C# for use in a commercial product. I'm happy to release the ported code, but I'm curious if you would mind allowing me to release it with the MIT license? Or possibly also to consider licensing this code as MIT? Unfortunately, the GPL is an unpopular set of licenses with our legal department.

GPL or Apache License 2.0?

Hi,
I am interested in this implementation.
I'd like to know which License does this library use? The readme.md says it is GPLv3 while it is Apache License 2.0 on the github repo.

According to https://www.apache.org/legal/resolved.html#category-x, if you use GPLv3, then for a project which use Apache 2.0 license, the project can not use your library because they are not compatible..

For example, if Apache Cassandra/ Apache IoTDB databases want to use this implemention, they are not allowed.

I have a question for you!Can you help me?

Hello, have a great weekend! at 1590 lines of code
the ds.bplus.bptreehandleLeafNodeRedistributionOrMerging.BPlusTree#handleLeafNodeRedistributionOrMerging method，I don't quite understand this condition，
code:

int snum = canRedistribute(splitNode);
if (snum > 0) {
  code of redistributing next node with elements from split
}

method canRedistribute is to determine whether a node will still be greater than or equal to M / 2-1 after removing a Key. However, the condition for entering the current method is that the number of keys of the node must be less than or equal to M / 2-1,So I don't think this greater than 0 condition is going to hold, What do you think?I am looking forward to your reply! (^_^)

Because my English is not very good,So the above English is translated by me using software,please forgive me if I have incorrect grammar or offended you! 😀

call out limitations of the library in README

Thanks for this library. Just a note that you should also call out the limitations of this library. e.g., the library truncates values stored into the b+ tree.

private String conditionString(String s) {
        if(s == null) {
            s = " ";
            //System.out.println("Cannot have a null string");
        }

        if(s.length() > conf.getEntrySize()) {
            System.out.println("Satellite length can't exceed " +
                    conf.getEntrySize() + " trimming...");
            s = s.substring(0, conf.getEntrySize());
        } else if(s.length() < conf.getEntrySize()) {
            //System.out.println("Satellite length can't be less than" +
            //        conf.getEntrySize() + ", adding whitespaces to make up");
            int add = conf.getEntrySize() - s.length();
            for(int i = 0; i < add; i++) {s = s + " ";}
        }
        return(s);
    }

as a user I feel better if I know the limitations upfront vs. discovering them by surprise which makes me feel cheated.

Can you explain the structure of the tree?

I don't understand several page types，for example，"overflow page" and "lookup page overflow page"? Also, I don't quite understand the parameters of class BPlusConfiguration.

make unique an option for the whole tree

I think whether a tree is unique or not. But currently, I find that in every insertion we can choose to be unique or not. This is not a common choice.

Nice!

I just wanted to say that i had a look at your code and i wanted to thank you!
Good job you did there, cost you some time for sure. I love it. Couldnt help myself but reading everything. (Docs n sht)

Anyways 1 issue though... the randomaccesfile object sucks for writing xD. The seeks cost to much 2! But i fixed that by just caching the nodes in a linkedhashmap for N seconds, untill another thread cleans it up if its out of time.

Didn't enjoy a project this much in a while :3

potential performance gain

Hey, maybe you can try out some techniques like writing int / double etc into a byte array and then write it into the file rather than invoking treeFile.writeInt/writeDouble etc for several times. For my project, such an optimization saves 90% time.

Why only positive keys are allowed?

In your project, it says:
@throws NumberFormatException is thrown when we have an invalid key value (we only allow >= 0 as keys

Why?

bug in writeNode?

In the writeNode of TreeLeaf, there is this code:

        if(this.isRoot()) {
            r.seek(conf.getHeaderSize()-8L);
            r.writeLong(getPageIndex());
        }

But in BPlusTree.writeFileHeader:

        treeFile.seek(0L);
        treeFile.writeInt(conf.getHeaderSize());
        treeFile.writeInt(conf.getPageSize());
        treeFile.writeInt(conf.getEntrySize());
        treeFile.writeInt(conf.getKeySize());
        treeFile.writeLong(totalTreePages);
        treeFile.writeLong(maxPageNumber);
        treeFile.writeLong(root.getPageIndex());
        treeFile.writeLong(this.firstPoolNextPointer);

I believe conf.getHeaderSize()-8L is where firstPoolNextPointer lives and it should be:

        if(this.isRoot()) {
            r.seek(conf.getHeaderSize()-16L);
            r.writeLong(getPageIndex());
        }

Thread safety?

hey @andylamp great work here!

Is this tree thread safe?