Giter VIP home page Giter VIP logo

go-hll's Introduction

HyperLogLog in golang. Docs. Build Status codecov

What

A go implementation of HypeLogLog data structure with a twist. See HyperLogLog in Practice paper by Stefan Heule, Marc Nunkesser, Alex Hall.

Ø-serialization

There is no need to serialize/deserialize hll. Everything is stored in a byte slice, which can be memory mapped, passed around over the network as is etc.

Differences from the paper:

  • sparse representation. this implementation does exact counting for small sets.
  • fixed memory usage (even for empty HLL). HLL of a given precision P uses fixed (8 + 3*2^(P-2), 8 byte header + 6 bits per register) size in bytes.
  • thresholds are tuned. different from Sub-Algorithm Threshold.

Why

I wanted an HLL implementation that is

  • simple
  • reasonably fast
  • (almost) non-allocating
  • exact when number of unique elements is small
  • memory-mapped file friendly
  • well tested (90+% coverage)

Usage

Get go-hll:

go get github.com/sasha-s/go-hll

Use it:

s, err := SizeByP(16)
if err != nil {
	log.Panicln(err)
}
h := make(HLL, s)
...
for _, x := range []string{"alpha", "beta"} {
	h.Add(siphash.Hash(2, 57, []byte(x)))
}
log.Println(h.EstimateCardinality())

Use good hash (otherwise accuracy would be poor). Some options:

Speed

Benchmark results on my MacBook Pro (Mid 2014).

Add-8            9.68ns ± 1%
Estimate-8       27.3µs ± 1%
Merge-8          38.0µs ± 1%

AddDense-8       6.73ns ± 3%
MergeDense-8     37.9µs ± 1%
EstimateDense-8  22.9µs ± 1%
Sort-8            108µs ± 1%
AddSparse-8      10.2ns ± 3%

Merge/Estimate etc. are done for P=14.

Other implementations (in no particular order)

go-hll's People

Contributors

sasha-s avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.