Giter VIP home page Giter VIP logo

wavelet-trie's Introduction

Wavelet trie

Build Status

A wavelet trie implementation in Rust, based on the paper by Grossi et al. [1] (link).

In short, it is a succinct data structure that allows fast exact or prefix-search on sequences of binary strings. Note that the strings have to be prefix-free, i.e., no string can be a prefix of another. Append a terminator symbol to each string to avoid this problem. This implementation provides some methods to ease working with the &str type.

Documentation and examples are coming up in the near future. Until then, take a look at the tests to see how to use it.

[1] Grossi, Roberto & Ottaviano, Giuseppe. (2012). The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space. Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. . 10.1145/2213556.2213586.

Examples

A simple example with strings:

extern crate wavelet_trie;

use wavelet_trie::wavelet_trie::WaveletTrie;

// create a wavelet trie with two strings:
let mut wt = WaveletTrie::new();
wt.append_str("Hello world!");
wt.append_str("Hello everybody!");

// See on which positions the strings start with "Hell"
assert_eq!(vec![0, 1], wt.select_all_str("Hell"));

// get the string at position 1
assert_eq!("Hello everybody!", wt.access_str(1).unwrap());

Creating a wavelet trie from 'static' binary content, and performing some queries:

extern crate dyn_bit_vec;
extern crate wavelet_trie;

use dyn_bit_vec::DBVec;
use wavelet_trie::wavelet_trie::WaveletTrie;

// We start with some binary sequences to put into the wavelet trie
let sequence1 = DBVec::from_bytes(&[0b00001000]);
let sequence2 = DBVec::from_bytes(&[0b10000000]);
let sequence3 = DBVec::from_bytes(&[0b10000100]);
let sequence4 = DBVec::from_bytes(&[0b11000100, 0b10000000]);

// Here we create a wavelet trie from these sequences
let wt = WaveletTrie::from_sequences(&[sequence1, sequence2, sequence3, sequence4]);

// There should be 4 sequences in the trie now:
assert_eq!(4, wt.len());

// Let's see at which positions prefix '001' occurs (should be 2 and 3).
let mut prefix_001 = DBVec::new();
prefix_001.push(false);
prefix_001.push(false);
prefix_001.push(true);
assert_eq!(vec![2, 3], wt.select_all(&prefix_001));

Features at this moment

  • Dynamic: insert or delete a string at any position
  • Fast (prefix) count
  • Fast (prefix) search

Features planned

  • Exact count & search
  • Helper methods to work with texts.
  • Range methods
  • Many optimisations!

A note on performance

Since this library heavily depends on the CPU feature POPCNT, it might be a good idea to include a .cargo/configfile in your project and enable certain compiler flags and boost performance. The commented-out example in this repository optimises the code for the CPU it is compiled on. This makes the resulting binary not very portable though.

See also https://doc.rust-lang.org/cargo/reference/config.html.

wavelet-trie's People

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

the-locksmith

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.