Giter VIP home page Giter VIP logo

bbhash's Introduction

Build Status

BBHash

BBHash is a simple library for building minimal perfect hash function. It is designed to handle large scale datasets. The function is just a little bit larger than other state-of-the-art libraries, it takes approximately 3 bits / elements (compared to 2.62 bits/elem for the emphf lib), but construction is faster and does not require additional memory.

It is easy to include in other projects (just include a single .h file) and has no dependencies.

Citation

A preprint paper is available on arXiv: https://arxiv.org/abs/1702.03154

Usage

Here is a simple example showing how to build and query a mphf with input keys in a std::vector<u_int64_t> . Input keys can also be read from a disk file, or from some user-defined iterator.

 #include "BooPHF.h"
 //tells bbhash to use included hash function working on u_int64_t input keys :
 typedef boomphf::SingleHashFunctor<u_int64_t>  hasher_t;
 typedef boomphf::mphf<  u_int64_t, hasher_t  > boophf_t;
 
std::vector<u_int64_t> input_keys;
//
... fill the input_keys vector
//build the mphf  
boophf_t * bphf = new boomphf::mphf<u_int64_t,hasher_t>(input_keys.size(),input_keys,nthreads);
 
 //query the mphf :
 uint64_t  idx = bphf->lookup(input_keys[0]);

Types supported

The master branch works with Plain Old Data types only (POD). To work with other types, use the "alltypes" branch (it runs slighlty slower). The alltypes branch includes a sample code with strings. The "internal_hash" branch allows to work with types that do not support copy or assignment operators.

How to run test

A sample usage is provided in file example.cpp, compile and run with: ( params are nb_elements nb_threads)

make
./example 100000000 1

File Bootest.cpp contains more options, use ./Bootest with -check to check correctness of the hash function, and -bench to benchmark lookup performance.

Here is a sample output :

$./Bootest 100000000 8 -check -bench
key file generated 
Construct a BooPHF with  100000000 elements  
[Building BooPHF]  100  %   elapsed:   0 min 31 sec   remaining:   0 min 0  sec
BooPHF constructed perfect hash for 100000000 keys in 30.95s
Bitarray       305808384  bits (99.49 %)   (array + ranks )
final hash       1557024  bits (0.51 %) (nb in final hash 4634)
boophf  bits/elem : 3.073654
 --- boophf working correctly --- 
bench lookups  sample size 9999872 
BBhash bench lookups average 258.06 ns +- stddev  5.55 %   (fingerprint 5000111281850410) 

Authors

Guillaume Rizk, Antoine Limasset, Rayan Chikhi

[email protected]

bbhash's People

Contributors

malfoy avatar pierrepeterlongo avatar rchikhi avatar rizkg avatar rob-p avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.