caio / go-tdigest
mirror of https://caio.co/de/go-tdigest/
Home Page: https://caio.co/de/go-tdigest/
License: MIT License
I am trying to run the Example Usage code to understand how the data structure works, but it fails with the error "multiple-value tdigest.New() in single-value context".
Why would this be happening?
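That compiler error means tdigest.New() returns more than one value (typically a value plus an error) while the calling code only receives one. The sketch below reproduces the pattern with hypothetical stand-ins, digest and newDigest, which are not the go-tdigest API; it only shows the general Go fix of receiving both results and checking the error.

```go
package main

import (
	"errors"
	"fmt"
)

// digest and newDigest are hypothetical stand-ins, not the go-tdigest API.
// newDigest mimics a constructor that returns a value and an error.
type digest struct {
	compression float64
}

func newDigest(compression float64) (*digest, error) {
	if compression <= 0 {
		return nil, errors.New("compression must be positive")
	}
	return &digest{compression: compression}, nil
}

func main() {
	// d := newDigest(100) // does not compile: two results, one variable
	// (the exact compiler message depends on the Go version)

	d, err := newDigest(100) // receive both results and check the error
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.compression) // 100
}
```

If the example in the README was written for an older release whose constructor returned a single value, the same two-value assignment is what adapts it to a constructor that also returns an error.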
Do you plan to add Go Modules support?
When I use this module, it shows up as an incompatible module:
github.com/caio/go-tdigest v3.1.0+incompatible
t.Summary = newSummary(estimateCapacity(t.Compression), oldTree.Resolution)
t.Count = 0 // missing
t.Add also does t.Count += count, so without this reset t.Count is doubled on each TDigest.Compress call.
It looks like the original Java implementation writes two doubles after the encoding code, but the Go port has nothing like that. Am I right to assume that the two encodings are incompatible and there is nothing to be done about it?
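To see why such a difference breaks interoperability, here is a sketch of a hypothetical header: a 4-byte encoding code followed by two 8-byte doubles, written big-endian (the default byte order of Java's ByteBuffer). The header type and its field names are illustrative assumptions, not the documented format of either library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header is a hypothetical layout for illustration only: a 4-byte encoding
// code followed by two 8-byte doubles, big-endian. Field names are made up.
type header struct {
	Encoding int32
	First    float64
	Second   float64
}

func encodeHeader(h header) ([]byte, error) {
	var buf bytes.Buffer
	// binary.Write emits the fixed-size fields in declaration order, without padding.
	if err := binary.Write(&buf, binary.BigEndian, h); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func decodeHeader(b []byte) (header, error) {
	var h header
	err := binary.Read(bytes.NewReader(b), binary.BigEndian, &h)
	return h, err
}

func main() {
	b, err := encodeHeader(header{Encoding: 1, First: 0, Second: 281})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(b)) // 20: 4 bytes of code + 2 x 8 bytes of doubles

	h, err := decodeHeader(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Encoding, h.First, h.Second) // 1 0 281
}
```

A reader that expects centroid data immediately after the 4-byte code would start 16 bytes too early and misread everything that follows, so under this assumption the formats could only be bridged by a converter that knows about the extra fields.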
These are the things planned so far:
Seems like there are some optimizations in https://github.com/honeycombio/go-tdigest that might be nice. I'm not sure if @ianwilkes was planning on upstreaming them.
The AVL iteration loop (spawned from t.summary.Iter()) creates a goroutine; terminating the loop early (via break) leaves that goroutine blocked on sending to the current iteration channel.
I see no way, with the current AVL library, to terminate an iteration early; perhaps switching to a different key/value data structure or implementation would help.
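Two common Go patterns avoid this leak. The sketch below uses plain slices and hypothetical helpers iterChan and forEach, not the AVL library's API: the first gives the producer goroutine a done channel so it can exit when the consumer stops early; the second drops the goroutine entirely in favour of a callback that returns false to stop.

```go
package main

import "fmt"

// iterChan is a goroutine-backed iterator: the producer would block on send
// forever after an early break unless it can be told to stop via done.
func iterChan(xs []int, done <-chan struct{}) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, x := range xs {
			select {
			case ch <- x:
			case <-done: // consumer gave up: exit instead of blocking
				return
			}
		}
	}()
	return ch
}

// forEach is a callback-style alternative: returning false stops the walk,
// and no goroutine is involved at all.
func forEach(xs []int, f func(int) bool) {
	for _, x := range xs {
		if !f(x) {
			return
		}
	}
}

func main() {
	xs := []int{1, 2, 3, 4, 5}

	done := make(chan struct{})
	for x := range iterChan(xs, done) {
		if x == 2 {
			break
		}
	}
	close(done) // unblocks and terminates the producer goroutine

	var seen []int
	forEach(xs, func(x int) bool {
		seen = append(seen, x)
		return x < 3
	})
	fmt.Println(seen) // [1 2 3]
}
```

The callback form is usually simpler for an in-order tree walk, since the recursion can just check the return value and unwind.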
While working on #12 I stumbled on several things I would like to change that would break the API (forcing a 2.0.0 release):
avl-wip
already)
Other things that would be nice to work on and that don't necessarily force a major release:
-run TestName yields different (but still correct, of course) results when compared with running them all.
Following the discussion on PR #11, I've isolated a tiny pathological sample:
t := New(10)
data := []float64{0, 279, 2, 281}
for _, f := range data {
t.Add(f, 1)
}
fmt.Println(t.Quantile(0.25))
Which yields -67.7500. In hindsight, this outrageously wrong number should have raised a major red flag already, but compared with Java it becomes even more obviously wrong:
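import com.tdunning.math.stats.AVLTreeDigest;
import com.tdunning.math.stats.TDigest;
import java.util.Arrays;

public class Ughhh {
public static void main(String[] args) {
TDigest t = new AVLTreeDigest(10);
Arrays.asList(0, 279, 2, 281).forEach(t::add);
System.out.println(t.quantile(0.25));
}
}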
Prints out 1.0.
(Note: before the min/max patch in tdunning/t-digest@89bb394 it yields 1.5.)
The issue #12 leads me to believe there might be even more pathological scenarios, so it's a good idea to take a close look at https://github.com/tdunning/t-digest/tree/master/quality and see what can be learned and reused.
I have a use case where I am spawning several producer goroutines that will be emitting latency metrics. Is it safe to expose a single tdigest object globally and have it be written to by each of these producer goroutines? I'm concerned about possible data races and also if there will be significant overhead due to producers waiting to acquire locks to write to the tdigest. Thanks!
Try adding fmt.Println(digest.CDF(0.71875)) to the end of TestGammaDistribution. I'd expect a number between 0 and 1 (closer to 1), but I get 49999.091465195546.