Giter VIP home page Giter VIP logo

go-tdigest's Introduction

T-Digest

A fast map-reduce and parallel streaming friendly data-structure for accurate quantile approximation.

This package provides an implementation of Ted Dunning's t-digest data structure in Go.

GoDoc Go Report Card

Project Status

This project is actively maintained. We are happy to collaborate on features and issues if/when they arrive.

Installation

This package uses go modules. Our releases are tagged and signed following the Semantic Versioning scheme.

go get github.com/caio/go-tdigest/v4

Example Usage

package main

import (
	"fmt"
	"math/rand"

	"github.com/caio/go-tdigest/v4"
)

func main() {
	// Analogue to tdigest.New(tdigest.Compression(100))
	t, _ := tdigest.New()

	for i := 0; i < 10000; i++ {
		// Analogue to t.AddWeighted(rand.Float64(), 1)
		t.Add(rand.Float64())
	}

	fmt.Printf("p(.5) = %.6f\n", t.Quantile(0.5))
	fmt.Printf("CDF(Quantile(.5)) = %.6f\n", t.CDF(t.Quantile(0.5)))
}

Configuration

You can configure your digest upon creation with options documented at options.go. Example:

// Construct a digest with compression=200 and its own
// (thread-unsafe) RNG seeded with 0xCA10:
digest, _ := tdigest.New(
        tdigest.Compression(200),
        tdigest.LocalRandomNumberGenerator(0xCA10),
)

References

This is a port of the reference implementation with some ideas borrowed from the python version. If you wanna get a quick grasp of how it works and why it's useful, this video and companion article is pretty helpful.

go-tdigest's People

Contributors

caio avatar christineyen avatar dgryski avatar emfree avatar ianwilkes avatar mikecoop83 avatar nathanleclaire avatar vmihailenco avatar zdylag avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

go-tdigest's Issues

Safe to share tdigest object across goroutines?

I have a use case where I am spawning several producer goroutines that will be emitting latency metrics. Is it safe to expose a single tdigest object globally and have it be written to by each of these producer goroutines? I'm concerned about possible data races and also if there will be significant overhead due to producers waiting to acquire locks to write to the tdigest. Thanks!

Considerations for v3

These are the things planned so far:

  • Make API consistent (single type for integers, single type for floats, compression configuration (accepts uint, sets float))
  • Change default rng to be a local one (ref #29)
  • Change internal counts to uint64 from uint32

TDigest.Compress bug?

t.Summary = newSummary(estimateCapacity(t.Compression), oldTree.Resolution)

# missing
t.Count = 0

t.Add also does t.Count += count, so t.Count is doubled on each TDigest.Compress

Update() leaks goroutines

The AVL iteration loop (spawned from t.summary.Iter()) creates a goroutine; terminating the loop early (via break) leaves a goroutune blocked on sending to the current iteration channel.

There is no way I can see with the current AVL library to terminating an iteration early; perhaps switching to a different key/value data structure or implementation would help.

Next release considerations

Working on #12 I stumbled on several things I would like to change which would trigger api breakage (forcing a 2.0.0 release):

  • Remove the centroid abstraction: it's useless in its current encantation, however it's exposed publicly
  • Get rid of all explicit panic()s (the most annoying one is gone on avl-wip already)
  • Configurable RNG (mostly to benefit testing)

Other things that would be nice to work on and don't necessarily force a major release:

  • (Dropped from v2: doesn't seem worth it) More deterministic tests: currently running them with -run TestName yields different (but still correct of course) results when compared with running them all.
  • Test different distributions: requires writing some code or adding test-only dependencies
  • (Dropped from v2: blocked) Sync up merge formats after the reference implementation changes (issue87)
  • Investigate prefix sum optimization ideas: using a binary indexed tree or a plain cache should improve speed noticeably without a big memory cost: d89c9f2

multiple-value tdigest.New() in single-value context

I am trying to run the Example Usage code with the purpose of understanding how the data-structure works but it throws up multiple-value tdigest.New() in single-value context error.
why would this be happening?

Awful quantile estimation on small sample size

Following the discussion on PR #11 I've isolated a tiny pathological sample:

t := New(10)
data := []float64{0, 279, 2, 281}
for _, f := range data {
    t.Add(f, 1)
}
fmt.Println(t.Quantile(0.25))

Which yields -67.7500. In hindsight, this outrageously wrong number should have raised a major red flag in my head already, but when comparing to Java this gets even more obviously wrong:

public class Ughhh {
    public static void main(String[] args) {
        TDigest t = new AVLTreeDigest(10);
        Arrays.asList(0, 279, 2, 281).forEach(t::add);
        System.out.println(t.quantile(0.25));
    }
}

Prints out 1.0 ๐Ÿ˜… (Note: before the min/max patch on tdunning/t-digest@89bb394 it yields 1.5).

Go Modules

Do you plan to add Go Modules support
When I am using this module, it is showing it as incompatible module

github.com/caio/go-tdigest v3.1.0+incompatible

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.