dynatrace-oss / dynahist
DynaHist: A Dynamic Histogram Library for Java
License: Apache License 2.0
Context: I am creating a Rust implementation. To keep the exercise manageable within the available time and resources, I intend that a 1.0 release:
While I haven't finished, right now it isn't clear to me what the role of null values is in a DynaHist histogram.
Related to this, there is no explicit statement about which data types a DynaHist histogram supports.
As best I can tell, the supported data types are Java's int, long, and IEEE 754 floating-point values, excluding NaN and the infinities.
In DynaHist, null is never a (de)serialized value.
Current code comments in src/dynahist/src/main/java/com/dynatrace/dynahist/serialization/SerializationReader.java:
Implementations should never return {@code null} except for the case {@code null} was really the serialized value.
We're curious to know the context for expecting null in a (de)serialized histogram.
Document a requirement that implementations MUST ensure that null is never a serialized value.
Likely an update to the (de)serialization comments:
Happy to draft a PR but the maintainers will likely find it quicker to add any such details themselves?
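One way such a requirement could be enforced, rather than only documented, is a small wrapper that rejects a null result from a reader. The following is a minimal sketch under stated assumptions: the Reader interface and the requireNonNull helper are hypothetical names defined here for illustration and are not taken from the DynaHist code base.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Objects;

final class NonNullReaders {

  // Hypothetical stand-in for a reader callback; not copied from DynaHist.
  @FunctionalInterface
  interface Reader<T> {
    T read(DataInput in) throws IOException;
  }

  // Wraps a reader so that a null result fails fast instead of propagating.
  static <T> Reader<T> requireNonNull(Reader<T> delegate) {
    Objects.requireNonNull(delegate, "delegate");
    return in -> {
      T value = delegate.read(in);
      if (value == null) {
        throw new IOException("reader returned null; null is not a valid serialized value");
      }
      return value;
    };
  }

  public static void main(String[] args) throws IOException {
    // Four big-endian bytes encoding the int 42.
    DataInput in = new DataInputStream(new ByteArrayInputStream(new byte[] {0, 0, 0, 42}));
    Reader<Integer> reader = requireNonNull(DataInput::readInt);
    System.out.println(reader.read(in));
  }
}
```

With a wrapper of this kind, a port to a language without null (such as Rust) would not need to model a null case at all, assuming the maintainers confirm that null is never written.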
Are implementations free to use their own hash code calculation?
I suppose this is under the rubric of a specification.
Congratulations @oertl on a very interesting and substantial advance to the state of the art.
Quite disappointing that there has not been more widespread adoption and implementation....:
Publish the DynaHist reference implementation.
Publish the Java reference implementation repository to, for example, the Journal of Open Source Software.
Here is an example repository/publication and what the review process can look like.
I've reviewed previously and IMO the implementation is good to go - you can (transparently) suggest reviewers to the editor.
I'd be happy to review, however my PhD is in math-stats & option pricing, so perhaps reaching out to some of the people that implemented HDRHistogram and the Julia community would yield more feedback and interest?
Dear Dynahist team,
I have 100 billion elements stored in a file (100G), one element per line. Each value lies between 0.75 and 1 and has four digits. I want to plot a histogram of this file with a bin size of 0.005, covering 0.75 to 1, and count the number of elements that fall in each bin. I did not find any other library or tool that can do the job. Exact counting is not possible because it requires a huge amount of memory. DynaHist is a Java library and I am not a Java person at all, but I think I have no choice but to do it myself using this library.
I would like to hear your suggestions.
Thanks,
Jianshu
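For the fixed layout described above (0.75 to 1 in steps of 0.005, so 50 bins), one counter per bin may already be enough, since the file can be streamed line by line. The following is a minimal sketch in plain Java that does not use the DynaHist API. It assumes every line holds one decimal value with at most four decimal digits; the class name FixedBinCounter and its constants are hypothetical. Values are converted to integer units of 0.0001 so that bin edges are not affected by floating-point rounding, and 1.0 is counted in the last bin.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class FixedBinCounter {
  static final int MIN_UNITS = 7_500;  // 0.75 expressed in units of 0.0001
  static final int MAX_UNITS = 10_000; // 1.0 expressed in units of 0.0001
  static final int BIN_UNITS = 50;     // bin width 0.005 in units of 0.0001
  static final int BIN_COUNT = (MAX_UNITS - MIN_UNITS) / BIN_UNITS; // 50 bins

  // Streams the input once and keeps only BIN_COUNT long counters in memory.
  static long[] count(BufferedReader lines) throws IOException {
    long[] counts = new long[BIN_COUNT];
    String line;
    while ((line = lines.readLine()) != null) {
      line = line.trim();
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      long units = Math.round(Double.parseDouble(line) * 10_000.0);
      if (units < MIN_UNITS || units > MAX_UNITS) {
        throw new IllegalArgumentException("value out of range: " + line);
      }
      // The upper bound 1.0 would map to index 50, so clamp it into the last bin.
      int bin = (int) Math.min((units - MIN_UNITS) / BIN_UNITS, BIN_COUNT - 1);
      counts[bin]++;
    }
    return counts;
  }

  public static void main(String[] args) throws IOException {
    // Small in-memory sample; a real run would wrap a file reader instead.
    String sample = "0.7500\n0.7549\n0.7550\n0.9999\n1.0000\n";
    long[] counts = count(new BufferedReader(new StringReader(sample)));
    for (int i = 0; i < counts.length; i++) {
      if (counts[i] > 0) {
        double lower = (MIN_UNITS + (long) i * BIN_UNITS) / 10_000.0;
        System.out.println("bin starting at " + lower + ": " + counts[i]);
      }
    }
  }
}
```

For a file on disk, the reader could be obtained with java.nio.file.Files.newBufferedReader. A long counter cannot overflow at 100 billion entries.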
When executing gradlew check, I get the following test failures.
This is on Ubuntu Linux, Java 11, with a German locale set via LANG=de_AT.UTF-8.
org.junit.ComparisonFailure: expected:<-5[,50000000000000000E+00 - -5,]50000000000000000E+0...> but was:<-5[.50000000000000000E+00 - -5.]50000000000000000E+0...>
at org.junit.Assert.assertEquals(Assert.java:117)
at org.junit.Assert.assertEquals(Assert.java:146)
at com.dynatrace.dynahist.examples.HistogramUsage.addSingleValue(HistogramUsage.java:67)
org.junit.ComparisonFailure: expected:<-5[,50000000000000000E+00 - -5,]50000000000000000E+0...> but was:<-5[.50000000000000000E+00 - -5.]50000000000000000E+0...>
at org.junit.Assert.assertEquals(Assert.java:117)
at org.junit.Assert.assertEquals(Assert.java:146)
at com.dynatrace.dynahist.examples.HistogramUsage.addValueWithMultiplicity(HistogramUsage.java:83)
org.junit.ComparisonFailure: expected:< 0[,00000000000000000E+00 - 9,99999999999999900E-01 : *
1,00000000000000000E+03 - 9,99999999999999800E+03 : *****
1,00000000000000000E+04 - 7,]77237591081370300E+0...> but was:< 0[.00000000000000000E+00 - 9.99999999999999900E-01 : *
1.00000000000000000E+03 - 9.99999999999999800E+03 : *****
1.00000000000000000E+04 - 7.]77237591081370300E+0...>
at org.junit.Assert.assertEquals(Assert.java:117)
at org.junit.Assert.assertEquals(Assert.java:146)
at com.dynatrace.dynahist.examples.ResponseTimeExample.mappingResponseTimes1(ResponseTimeExample.java:43)
org.junit.ComparisonFailure: expected:< 0[,00000000000000000E+00 - 9,99999999999999900E-01 : 14
1,00000000000000000E+00 - 9,99999999999999800E+00 : 114
1,00000000000000000E+01 - 9,99999999999999900E+01 : 924
1,00000000000000000E+02 - 9,99999999999999900E+02 : 6971
1,00000000000000000E+03 - 9,99999999999999800E+03 : 47866
1,00000000000000000E+04 - 9,]98000950924521900E+0...> but was:< 0[.00000000000000000E+00 - 9.99999999999999900E-01 : 14
1.00000000000000000E+00 - 9.99999999999999800E+00 : 114
1.00000000000000000E+01 - 9.99999999999999900E+01 : 924
1.00000000000000000E+02 - 9.99999999999999900E+02 : 6971
1.00000000000000000E+03 - 9.99999999999999800E+03 : 47866
1.00000000000000000E+04 - 9.]98000950924521900E+0...>
at org.junit.Assert.assertEquals(Assert.java:117)
at org.junit.Assert.assertEquals(Assert.java:146)
at com.dynatrace.dynahist.examples.ResponseTimeExample.mappingResponseTimes2(ResponseTimeExample.java:65)
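The comma versus dot mismatch in these failures is consistent with number formatting that depends on the default locale on one side of the comparison. The following is a minimal sketch of that effect in plain Java. It does not reproduce the DynaHist tests, and the class name LocaleFormatDemo is hypothetical. Passing an explicit locale such as Locale.ROOT to String.format is one common way to make formatted output independent of LANG.

```java
import java.util.Locale;

final class LocaleFormatDemo {

  // Formats a value in scientific notation with 17 fractional digits,
  // using the decimal separator of the given locale.
  static String format(Locale locale, double value) {
    return String.format(locale, "%.17E", value);
  }

  public static void main(String[] args) {
    double value = -5.5;
    // Locale.ROOT yields a dot as decimal separator regardless of LANG.
    System.out.println("ROOT:    " + format(Locale.ROOT, value));
    // A German locale yields a comma as decimal separator.
    System.out.println("GERMANY: " + format(Locale.GERMANY, value));
    // Whatever the JVM picked up from the environment.
    System.out.println("default: " + format(Locale.getDefault(), value));
  }
}
```

Whether the fix belongs in the tests or in the library's string output is a decision for the maintainers; the sketch only illustrates why the expected and actual strings differ.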