Giter VIP home page Giter VIP logo

limousine's Introduction

Limousine

Rust Latest Version

limousine is an exploration into the world of hybrid key-value stores. Traditional indices -- such as BTrees -- have been optimized for decades, offering consistent performance for both inserts and reads. On the other hand, learned indices, which leverage statistical models to approximate the locations of keys, bring massive benefits in memory usage and read performance. However, these newer structures come with their own set of trade-offs; most notably there isn't a canonical or efficient algorithm for performing inserts.

This project experiments with hybrid key-value stores -- data structures which consist of a combination of traditional BTree nodes and learned, statistical model nodes. The goal is to harness the strengths of both indexing methods, in addition to improving the state of the art for learned index insertion. While developing efficient and mutable hybrid indexes is an active area of research, this crate offers a fully-functioning prototype, capable of turning a layout specification into a working design.

Most of our work with learned indexes was inspired by PGM Index.

This is mostly a prototype project. Although the generated key-value stores are functional and fairly efficient, they lack many features such as efficient removal of entries, batch inserts, multithreaded insertion, transactions, etc. Eventually, we hope that we are able to implement these features, however there are a variety of challenges associated with dynamic code generation of novel data structures and this is still an active area of research.

Overview

limousine_engine provides a procedural macro to automatically generate an hybrid key-value store design consisting of both classical and learned components.

As of the current version, learned components are not yet fully supported.

use limousine_engine::prelude::*;

create_kv_store! {
    name: ExampleStore,
    layout: [
        btree_top(),
        pgm(epsilon = 8),
        pgm(epsilon = 8),
        btree(fanout = 32),
        btree(fanout = 32, persist),
        btree(fanout = 64, persist)   
    ]
}

To generate a design, we provide a name for the structure and a layout description which consists of a stack of components. For instance in the above example, the key-value store consists of a base layer of on-disk BTree nodes of fanout 64, underneath a
layer of on on-disk BTree nodes with fanout 32, underneath an in-memory layer of BTree nodes with fanout 32. On top of this, we have two in-memory PGM learned layers with epsilon parameters of 8, and a tiny in-memory BTree as a top layer.

Since learned components are not yet fully supported, the above example will not compile. To get a working key-value store in the current version, we should only use BTree components.

use limousine_engine::prelude::*;

create_kv_store! {
    name: ExampleStore,
    layout: [
        btree_top(),
        btree(fanout = 8),
        btree(fanout = 8),
        btree(fanout = 32),
        btree(fanout = 32, persist),
        btree(fanout = 64, persist)   
    ]
}

We can then use these generated data structures to perform queries:

// Load the first two layer of the index from memory
let index: ExampleStore<u128, u128> = ExampleStore::open("data/index")?;

index.insert(10, 50)?;
index.insert(20, 60)?;
index.insert(30, 70)?;
index.insert(40, 80)?;

assert_eq!(index.search(10)?, Some(50));

limousine's People

Contributors

dependabot[bot] avatar levkruglyak avatar mpekala23 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

limousine's Issues

MVP for persistent components

  • The macro should support the persist attribute, ensure that persistent layers are always at the bottom
  • The macro should materialize different method signatures depending on if any persistent layers were specified
  • There should be some common methods for serialization and deserialization, perhaps similar to LinkedList
  • Should decide on a serialization format and directory structure for persistent indexes
  • Working Persistent BTree Internal and Base layers

Automated cargo publish

  • Script should increment version, update dependencies of limousine_engine
  • Automated testing for compatibility, not just in individual crates, this should be a blocker for commits to main.

Macro can read layout files

Currently, the macro only supports expressions of the form:

create_hybrid_index! {
    name: ExampleHybridIndex,
    layout: [
        btree_top(),
        pgm(epsilon = 4),
        btree(fanout = 32),
        btree(fanout = 32),
        btree(fanout = 1024, persist),
    ]
}

We would also like to support expressions of the form:

create_hybrid_index! {
    name: ExampleHybridIndex,
    layout: include!("example.layout")
}

or

create_hybrid_index! {
    name: ExampleHybridIndex,
    layout: "example.layout"
}

Simplify Code and Unit Test

  • Reduce the overall complexity of the code base for milestone 1, not sacrificing performance significantly.
  • Verify that all components are working properly, adding explicit guarantees alongside unit tests
  • Simplification might involve splitting up the project into multiple crates, especially independent components like storage implementations, data structures like the stackmap, etc.
  • Provide documentation for each individual component, as well as how they fit together.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.