HdrHistogram's Introduction

HdrHistogram

HdrHistogram: A High Dynamic Range (HDR) Histogram

This repository currently includes a Java implementation of HdrHistogram. C, C#/.NET, Python, JavaScript, Rust, Erlang, and Go ports can be found in other repositories, all of which share common concepts and data representation capabilities. Look at the repositories under the HdrHistogram organization for the various implementations and useful tools.

Note: The text below is an excerpt from the Histogram JavaDoc. While much of it applies to other language implementations as well, some details may vary by implementation (e.g. iteration and synchronization), so you should consult the documentation or header information of the specific API library you intend to use.


HdrHistogram supports the recording and analyzing of sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.

For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).
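A minimal sketch of that configuration, using the two-argument Histogram constructor described below (a highestTrackableValue of 3,600,000,000 and a numberOfSignificantValueDigits of 3):

 // Track response times from 1 usec to 1 hour (3,600,000,000 usec),
 // holding quantization error below 1/1,000th of any recorded value:
 Histogram histogram = new Histogram(3600000000L, 3);
 histogram.recordValue(8340); // an ~8.3 msec response time, in usec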

The HdrHistogram package includes the Histogram implementation, which tracks value counts in long fields, and is expected to be the commonly used Histogram form. IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial.

HdrHistogram is designed for recording histograms of value measurements in latency and performance sensitive applications. Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. AbstractHistogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.

A combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it is calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded data is kept at high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data value samples.

A common use example of HdrHistogram would be to record response times in units of microseconds across a dynamic range stretching from 1 usec to over an hour, with a good enough resolution to support later performing post-recording analysis on the collected data. Analysis can include computing, examining, and reporting of distribution by percentiles, linear or logarithmic value buckets, mean and standard deviation, or by any other means that can be easily added by using the various iteration techniques supported by the Histogram. In order to facilitate the accuracy needed for various post-recording analysis techniques, this example can maintain a resolution of ~1 usec or better for times ranging to ~2 msec in magnitude, while at the same time maintaining a resolution of ~1 msec or better for times ranging to ~2 sec, and a resolution of ~1 second or better for values up to 2,000 seconds. This sort of example resolution can be thought of as "always accurate to 3 decimal points." Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000, and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB (see "Footprint estimation" below).
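Putting that example together as a sketch (the sample values are hypothetical, and the 1000.0 output scaling ratio reports the microsecond recordings in milliseconds; in older versions of the Java API the output call is reached via getHistogramData() rather than directly on the histogram):

 Histogram histogram = new Histogram(3600000000L, 3);
 long[] observedResponseTimesUsec = {120, 340, 8340, 51000}; // hypothetical samples
 for (long responseTimeUsec : observedResponseTimesUsec) {
     histogram.recordValue(responseTimeUsec);
 }
 // Print the percentile distribution, scaling usec recordings to msec output:
 histogram.outputPercentileDistribution(System.out, 1000.0);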

Histogram variants and internal representation

The HdrHistogram package includes multiple implementations of the AbstractHistogram class:

  • Histogram, which is the commonly used Histogram form and tracks value counts in long fields.
  • IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial (e.g. systems where tens of thousands of in-memory histograms are being tracked).
  • AtomicHistogram and SynchronizedHistogram (see 'Synchronization and concurrent access' below).

Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating point number representation: using an exponent and a (non-normalized) mantissa to support a wide dynamic range at a high but varying (by exponent value) resolution. AbstractHistogram uses exponentially increasing bucket value ranges (the parallel of the exponent portion of a floating point number), with each bucket containing a fixed-size set of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and numberOfSignificantValueDigits controlling resolution.
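To make the floating point analogy concrete, the following is a simplified, illustrative sketch of the two-level index computation, assuming unit value resolution. The actual library additionally handles a configurable unit magnitude and index normalization, so treat this as a conceptual outline rather than the library's implementation:

 // For 3 significant digits: largestValueWithSingleUnitResolution = 2,000,
 // rounded up to a subBucketCount of 2,048 (2^11).
 static final int SUB_BUCKET_COUNT_MAGNITUDE = 11;

 static int bucketIndexOf(long value) {
     // How many powers of two the value's highest set bit lies beyond
     // the range covered by bucket 0's sub-buckets (the "exponent"):
     int highestBit = 63 - Long.numberOfLeadingZeros(value | 1);
     return Math.max(0, highestBit - (SUB_BUCKET_COUNT_MAGNITUDE - 1));
 }

 static int subBucketIndexOf(long value, int bucketIndex) {
     // The linear sub-bucket within that bucket (the "mantissa"):
     return (int) (value >>> bucketIndex);
 }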

Synchronization and concurrent access

In the interest of keeping value recording cost to a minimum, the commonly used Histogram class and its IntHistogram and ShortHistogram variants are NOT internally synchronized, and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries against Histogram objects should either take care to externally synchronize and/or order their access, or use the ConcurrentHistogram, AtomicHistogram, or SynchronizedHistogram variants.

A common pattern seen in histogram value recording involves recording values in a critical path (multi-threaded or not), coupled with a non-critical path reading the recorded data for summary/reporting purposes. When such continuous non-blocking recording operation (concurrent or not) is desired even when sampling, analyzing, or reporting operations are needed, consider using the Recorder and SingleWriterRecorder recorder variants that were specifically designed for that purpose. Recorders provide a recording API similar to Histogram, and internally maintain and coordinate active/inactive histograms such that recording remains wait-free in the presence of accurate and stable interval sampling.

It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread non-synchronized histograms or SingleWriterRecorders, and use a summary/reporting thread to perform histogram aggregation math across time and/or threads.
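A sketch of that pattern using the Recorder API named above (the interval cadence and value units are illustrative):

 Recorder recorder = new Recorder(3600000000L, 3);
 Histogram accumulatedHistogram = new Histogram(3600000000L, 3);

 // Critical path (possibly multi-threaded) records without
 // coordinating with readers:
 recorder.recordValue(8340); // hypothetical response time, in usec

 // Reporting path, e.g. once per sampling interval: swap in a fresh
 // histogram and take the interval's data for aggregation.
 Histogram intervalHistogram = recorder.getIntervalHistogram();
 accumulatedHistogram.add(intervalHistogram); // histograms are additive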

Iteration

Histograms support multiple convenient forms of iterating through the histogram data set, including linear, logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or each possible value level. The iteration mechanisms are accessible through the HistogramData available through getHistogramData(). Iteration mechanisms all provide HistogramIterationValue data points along the histogram's iterated data set, and are available for the default (corrected) histogram data set via the following HistogramData methods:

  • percentiles: An Iterable<HistogramIterationValue> through the histogram using a PercentileIterator
  • linearBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LinearIterator
  • logarithmicBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LogarithmicIterator
  • recordedValues: An Iterable<HistogramIterationValue> through the histogram using a RecordedValuesIterator
  • allValues: An Iterable<HistogramIterationValue> through the histogram using an AllValuesIterator

Iteration is typically done with a for-each loop statement. E.g.:

 for (HistogramIterationValue v :
      histogram.getHistogramData().percentiles(ticksPerHalfDistance)) {
     ...
 }

or

 for (HistogramIterationValue v :
      histogram.getRawHistogramData().linearBucketValues(unitsPerBucket)) {
     ...
 }

The iterators associated with each iteration method are resettable, such that a caller that would like to avoid allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the histogram. This iterator re-use usually takes the form of a traditional for loop using the Iterator's hasNext() and next() methods.

So to avoid allocating a new iterator object for each iteration loop:

 PercentileIterator iter =
    histogram.getHistogramData().percentiles(ticksPerHalfDistance).iterator();
 ...
 iter.reset(percentileTicksPerHalfDistance);
 while (iter.hasNext()) {
     HistogramIterationValue v = iter.next();
     ...
 }

Equivalent Values and value ranges

Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. HdrHistogram provides methods for determining the lowest and highest equivalent values for any given value, for determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting).
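A short sketch of those queries (method names as provided by the Java Histogram API; the probe values are arbitrary):

 long value = 10007;
 long lowest  = histogram.lowestEquivalentValue(value);  // smallest value counted together with 'value'
 long highest = histogram.highestEquivalentValue(value); // largest value counted together with 'value'
 boolean same = histogram.valuesAreEquivalent(10007, 10008);
 // Step to the next distinct value range, e.g. to avoid double-counting
 // when looping through values:
 long next = histogram.nextNonEquivalentValue(value);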

Corrected vs. Raw value recording calls

In order to support a common use case needed when histogram values are used to track response time distribution, Histogram provides a recordValueWithExpectedInterval() recording variant for recording corrected histogram values. This recording form is useful in [common latency measurement] scenarios where response times may exceed the expected interval between issuing requests, leading to "dropped" response time measurements that would typically correlate with "bad" results.

When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, recorded histogram data will reflect an appropriate number of additional values, linearly decreasing in steps of expectedIntervalBetweenValueSamples, down to the last value that would still be higher than expectedIntervalBetweenValueSamples.
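A sketch of the corrected recording call (values in microseconds; the second argument is the expectedIntervalBetweenValueSamples):

 // A 100 msec response observed where samples were expected every 10 msec.
 // In addition to 100,000, this also records 90,000, 80,000, ..., 20,000 to
 // stand in for the measurements the long response prevented from being taken:
 histogram.recordValueWithExpectedInterval(100000, 10000);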

To illustrate why this corrective behavior is critically needed in order to accurately represent value distribution when large value measurements may lead to missed samples, imagine a system for which response time samples are taken once every 10 msec to characterize response time distribution. The hypothetical system behaves "perfectly" for 100 seconds, with each sample showing a 1 msec response time value (10,000 recorded samples at 1 msec each). The hypothetical system then encounters a 100 sec pause, during which only a single sample is recorded (with a 100 second value). The raw data histogram collected for such a hypothetical system (over the 200 second scenario above) would show ~99.99% of results at 1 msec or below, which is obviously "not right". The same histogram, corrected with the knowledge of an expectedIntervalBetweenValueSamples of 10 msec, will correctly represent the response time distribution: only ~50% of results will be at 1 msec or below, with the remaining ~50% coming from the auto-generated value records covering the missing increments spread between 10 msec and 100 sec.

Data sets recorded with and without an expectedIntervalBetweenValueSamples parameter will differ only if at least one value recorded with the recordValue method was greater than its associated expectedIntervalBetweenValueSamples parameter; if all recorded values were smaller than their associated (and optional) expectedIntervalBetweenValueSamples parameters, the two data sets will be identical.

When used for response time characterization, recording with the optional expectedIntervalBetweenValueSamples parameter will tend to produce data sets that much more accurately reflect the response time distribution that a random, uncoordinated request would have experienced.

Footprint estimation

Due to its dynamic range representation, Histogram is relatively efficient in memory space requirements given the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved for a given highestTrackableValue and numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by its data value recording counts array. The total footprint can be conservatively estimated by:

 largestValueWithSingleUnitResolution =
        2 * (10 ^ numberOfSignificantValueDigits);
 subBucketSize =
        roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);

 expectedHistogramFootprintInBytes = 512 +
      ({primitive type size} / 2) *
      (log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
      subBucketSize

A conservative (high) estimate of a Histogram's footprint in bytes is available via the getEstimatedFootprintInBytes() method.
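As a worked instance of the estimate above (using the 1-hour, 3-significant-digit configuration from earlier, and a long-counts Histogram with an 8-byte primitive type):

 largestValueWithSingleUnitResolution = 2 * 10^3 = 2,000
 subBucketSize = roundedUpToNearestPowerOf2(2,000) = 2,048

 expectedHistogramFootprintInBytes = 512 +
      (8 / 2) * (log2RoundedUp(3,600,000,000 / 2,048) + 2) * 2,048
    = 512 + 4 * (21 + 2) * 2,048
    = 188,928 bytes (~185KB, matching the footprint quoted earlier)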


HdrHistogram's Issues

getTotalValueToThisValue() gone from HistogramIterationValue

The method HistogramIterationValue.getTotalValueToThisValue() seems to be gone. (Parts of my code made good use of it.)

Is this deliberate or a bug? The JavaDoc for HistogramIterationValue still mentions the corresponding property.

If it is deliberate, is there a recommended suggestion for refactoring its use?

I've used this method to produce "moving mean" graphs, plotting totalValueAtThisValue/totalCountAtThisValue over valueIteratedTo. This gives a rather informative graph showing how the mean is influenced by outliers in the data.

ArrayIndexOutOfBounds from outputPercentileDistribution on overflowed Histogram

I am trying to measure the latency of claiming objects from an object pool that I am writing. I have built a benchmark program, and construct my Histogram like this:

AbstractHistogram histogram = new Histogram(3000, 3);

Each individual operation is measured, and the numbers are recorded in the histogram, as well as with some other code I use for reporting these things. My old reporting code gives me the following output:

trials = 1492000
period-ms = 500
ops-sec = 2984000
lat-max = 1
lat-mean = 0.000361
lat-min = 0
lat-stddev = 0.202970

I think the stddev is calculated wrong, though. Then I call hasOverflowed on the histogram, and print a warning if it has overflowed.

Warning: Histogram overflow!

... which apparently it has, in this case. Then I ask the histogram to output a percentile distribution like this:

HistogramData data = histogram.getHistogramData();
data.outputPercentileDistribution(System.out, 1, 1.0);

and I get the following output

Value, Percentile, TotalCountIncludingThisValue

       0.000 0.000000000000    1491160
       0.000 0.500000000000    1491160
       0.000 0.750000000000    1491160
       0.000 0.875000000000    1491160
       0.000 0.937500000000    1491160
       0.000 0.968750000000    1491160
       0.000 0.984375000000    1491160
       0.000 0.992187500000    1491160
       0.000 0.996093750000    1491160
       0.000 0.998046875000    1491160
       0.000 0.999023437500    1491160
       0.000 0.999511718750    1491160
       1.000 0.999755859375    1491698
       1.000 0.999877929688    1491698
       1.000 0.999938964844    1491698
       1.000 0.999969482422    1491698
       1.000 0.999984741211    1491698
       1.000 0.999992370605    1491698
       1.000 0.999996185303    1491698
       1.000 0.999998092651    1491698
       1.000 0.999999046326    1491698
java.lang.ArrayIndexOutOfBoundsException
  at org.HdrHistogram.AbstractHistogramIterator.next(AbstractHistogramIterator.java:108)
  at org.HdrHistogram.PercentileIterator.next(PercentileIterator.java:20)
  at org.HdrHistogram.HistogramData.outputPercentileDistribution(HistogramData.java:301)
  at stormpot.benchmark.Bench.report(Bench.java:76)
  at stormpot.benchmark.Benchmark.trial(Benchmark.java:128)
  at stormpot.benchmark.Benchmark.trial(Benchmark.java:120)
  at stormpot.benchmark.Benchmark.runBenchmark(Benchmark.java:89)
  at stormpot.benchmark.Benchmark.run(Benchmark.java:78)
  at stormpot.benchmark.Main.main(Main.java:47)

A bit suspect how the value 1, which has been determined to be the max, doesn't include all the trials at the 0.99975 percentile.

I am using the latest master version at this time of writing: 94d7c04

Java HDR histogram getMinValue() returns Long.MAX_VALUE on empty histogram.

In a Java version of HDR histogram, a call to getMinValue() on an empty histogram, i.e. one with no recorded values, yields Long.MAX_VALUE. Is this the intended behaviour, or should the getMinValue() return zero?

This is using version 2.1.4 from Maven Central.

Following test fails:

@Test
public void emptyHistogramReturnsZeroAsMinumumValue() {
    Histogram histogram = new org.HdrHistogram.Histogram(2);

    assertThat(histogram.getTotalCount(), is(0L));
    assertThat(histogram.getMinValue(), is(0L));
}

Output:

java.lang.AssertionError: 
Expected: is <0L>
 but: was <9223372036854775807L>

Histograms not equal after re-creation

I was moving the data from histogram to simple arrays and back, using this code:

      AbstractHistogram.AllValues values = histogram1.allValues();
      ArrayList<Long> ranges = new ArrayList<>();
      ArrayList<Long> counts = new ArrayList<>();
      for (HistogramIterationValue value : values) {
         if (value.getCountAddedInThisIterationStep() > 0) {
            ranges.add(value.getValueIteratedTo());
            counts.add(value.getCountAddedInThisIterationStep());
         }
      }

and then

      AbstractHistogram histogram2 = new org.HdrHistogram.Histogram(maxValue, digits);
      for (int i = 0; i < ranges.length; ++i) {
         histogram2.recordValueWithCount(ranges[i], counts[i]);
      }

(note that the ArrayLists are converted into long[])

Sometimes histogram1.equals(histogram2) returns false. All the records are there (since histogram1.getTotalCount() == histogram2.getTotalCount()), but histograms were not equal.
The highest trackable value and number of significant digits is always the same.
Is there anything I am doing wrong, or is this a bug? I use version 1.2.1.

HistogramLogProcessor fails with ArrayIndexOutOfBoundsException

[error] Exception in thread "HistogramLogProcessor" java.lang.ArrayIndexOutOfBoundsException: The other histogram includes values that do not fit in this histogram's range.
[error]     at org.HdrHistogram.AbstractHistogram.add(AbstractHistogram.java:567)
[error]     at org.HdrHistogram.HistogramLogProcessor.run(HistogramLogProcessor.java:207)

The logfile was created using a HistogramLogWriter from four lightly filled interval histograms.

build error

Getting a compile error on master, w/eclipse and also from cmd line -- any suggestions?

TIA!

btmacpro:HdrHistogram btorpey$ ant
Buildfile: /Users/btorpey/miscinfo/blog/code/HdrHistogram/build.xml

init:

clean.module.hdrhistogram:
[delete] Deleting directory /Users/btorpey/miscinfo/blog/code/HdrHistogram/out/production/HdrHistogram

clean:

compile.module.hdrhistogram.production:
[mkdir] Created dir: /Users/btorpey/miscinfo/blog/code/HdrHistogram/out/production/HdrHistogram
[javac] /Users/btorpey/miscinfo/blog/code/HdrHistogram/build.xml:133: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 34 source files to /Users/btorpey/miscinfo/blog/code/HdrHistogram/out/production/HdrHistogram
[javac] /Users/btorpey/miscinfo/blog/code/HdrHistogram/src/main/java/org/HdrHistogram/HistogramLogProcessor.java:50: error: cannot find symbol
[javac] public static final String versionString = "Histogram Log Processor version " + Version.version;
[javac] ^
[javac] symbol: variable Version
[javac] location: class HistogramLogProcessor
[javac] 1 error

BUILD FAILED
/Users/btorpey/miscinfo/blog/code/HdrHistogram/build.xml:133: Compile failed; see the compiler error output for details.

Total time: 1 second

Serialization problem

Hi,

I have a problem when de-serializing a Histogram instance. In the following test the last assert fails:

import org.HdrHistogram.Histogram;
import org.junit.Test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotEquals;
import static org.junit.Assert.assertTrue;

public class HistogramTest {

    public static final int LATENCY_RECORD_COUNT = 5000;
    public static final int MAX_LATENCY = 30000;

    private final Random random = new Random();

    @Test
    public void testHistogramSerialization() throws Exception {
        Histogram original = new Histogram(MAX_LATENCY, 4);
        populateHistogram(original);

        // serialize
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream outputStream = new ObjectOutputStream(byteArrayOutputStream);
        outputStream.writeObject(original);
        byte[] bytes = byteArrayOutputStream.toByteArray();

        // de-serialize
        ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
        ObjectInputStream inputStream = new ObjectInputStream(byteArrayInputStream);
        Histogram read = (Histogram) inputStream.readObject();

        assertEquals(original, read);
        assertTrue(original.equals(read));
        assertNotEquals(original.hashCode(), read.hashCode());
        assertEquals(original.getNeededByteBufferCapacity(), read.copy().getNeededByteBufferCapacity());
        assertEquals(original.getNeededByteBufferCapacity(), read.getNeededByteBufferCapacity());
    }

    private void populateHistogram(Histogram original) {
        for (int i = 0; i < LATENCY_RECORD_COUNT; i++) {
            original.recordValue(random.nextInt(MAX_LATENCY));
        }
    }
}

As it seems, the wordSizeInBytes is not restored correctly; it's 0 instead of 8.

Is this a bug or am I doing something wrong?

Sporadic ArrayIndexOutOfBoundsException even with tight-ranged data

We use HdrHistogram to store microsecond-resolution/precision latency data for our messaging subsystem. We measure every step of our processing pipeline and record all results in a histogram that gets polled for data and reset every 10 seconds.
We occasionally see AIOOBE upon recording values. At first we attributed those occurrences to incorrect values for numberOfSignificantValueDigits and highestTrackableValue, but testing with all ranges of combinations did not yield any positive effect.
Is it possible that HdrHistogram cannot reliably handle continuous values (due to the way it internally stores them)?

support negative values

It would be very useful for me if it were possible to construct a histogram that accepted negative values.

gnuplot histogram question

I'd love to be able to plot a histogram of latencies against SLA, similar to what you use in your presentations, and preferably using gnuplot.

However, maybe I'm dense, but I can't make sense of the format of the SLA data in the gnuplot example. How would I define my own SLA values? Any pointers would be much appreciated!

Assertion error using small histograms

Aborts due to an assertion error

$ clang -c hdr_histogram.c
$ clang++ --std=c++11 bug.cc hdr_histogram.o
$ ./a.out
a.out: hdr_histogram.c:56: int32_t counts_index(struct hdr_histogram *, int32_t, int32_t): Assertion `bucket_index == 0 || (sub_bucket_index >= h->sub_bucket_half_count)' failed.
Aborted (core dumped)

Here is my test case

$ cat bug.cc

#include <iostream>

#include "hdr_histogram.h"

int main(int, char**) {
    struct hdr_histogram* h;
    hdr_init(0, 64*1024, 2, &h);
    hdr_record_value(h, 10);

    return 0;
}

Version of HDRHistogram

commit e05791e
MD5 (src/hdr_histogram.c) = d1a3b058710948967c6f74f169bc7cf6

Note I modified hdr_histogram.h to add "extern C" as follows:
#ifdef __cplusplus
extern "C" {
#endif

More about my environment

$ uname -a
Linux vagrant-ubuntu-trusty-64 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ clang -v
Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ clang++ -v
Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
Target: x86_64-pc-linux-gnu
Thread model: posix

HistogramLogProcessor does not support IntCountsHistogram

When I try to parse the log file with HistogramLogProcessor, I get this error. I see in the code that only DoubleHistogram and Histogram are supported by the LogProcessor. Is there a workaround for this? Is there any other way to read these histogram files?

Exception in thread "HistogramLogProcessor" java.lang.IllegalArgumentException: The buffer's encoded value byte size (4) does not match the Histogram's (8)
at org.HdrHistogram.AbstractHistogram.decodeFromByteBuffer(AbstractHistogram.java:1778)
at org.HdrHistogram.AbstractHistogram.decodeFromCompressedByteBuffer(AbstractHistogram.java:1886)
at org.HdrHistogram.Histogram.decodeFromCompressedByteBuffer(Histogram.java:252)
at org.HdrHistogram.EncodableHistogram.decodeFromCompressedByteBuffer(EncodableHistogram.java:55)
at org.HdrHistogram.HistogramLogReader.nextIntervalHistogram(HistogramLogReader.java:232)
at org.HdrHistogram.HistogramLogReader.nextIntervalHistogram(HistogramLogReader.java:136)
at org.HdrHistogram.HistogramLogProcessor.run(HistogramLogProcessor.java:181)

Compilation error

Hello and thanks for this tool.

I'm trying to build from source, but I get the following error.

[ERROR] ./git/HdrHistogram/src/main/java/org/HdrHistogram/AbstractHistogram.java:[1799,36] error: getEncodingCookie() has private access in AbstractHistogram
[ERROR] ./git/HdrHistogram/src/main/java/org/HdrHistogram/AbstractHistogram.java:[1800,40] error: getV0EncodingCookie() has private access in AbstractHistogram
[ERROR] ./git/HdrHistogram/src/main/java/org/HdrHistogram/AbstractHistogram.java:[1850,17] error: fillCountsArrayFromSourceBuffer(ByteBuffer,int,int) has private access in AbstractHistogram

And this is the environment:
-Xmx4g -XX:MaxPermSize=256m
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

I can fix it locally extending the visibility of those methods, but I thought it was worth making you aware of this.

Thanks,
Ivan Valeriani

LogWriter/Reader have mixed notions of start vs. base time

In HistogramLogWriter we can optionally set a baseTime which will lead to following histograms logged being attributed to a relative timestamp. This base time is set independently from the recommended logging of start time.
On the reader side it is assumed that start time IS base time, as can be seen in the #\[StartTime: scanner check in the reader code below:

            if (scanner.hasNext("\\#.*")) {
                // comment line
                if (scanner.hasNext("#\\[StartTime:")) {
                    scanner.next("#\\[StartTime:");
                    if (scanner.hasNextDouble()) {
                        startTimeSec = scanner.nextDouble(); // start time represented as seconds since epoch
                    }
                }
                scanner.nextLine();
                continue;
            }

            if (scanner.hasNext("\"StartTimestamp\".*")) {
                // Legend line
                scanner.nextLine();
                continue;
            }

            // Decode: startTimestamp, intervalLength, maxTime, histogramPayload

            final double offsetStartTimeStampSec = scanner.nextDouble(); // Timestamp start is expect to be in seconds
            final double absoluteStartTimeStampSec = getStartTimeSec() + offsetStartTimeStampSec;

            final double intervalLengthSec = scanner.nextDouble(); // Timestamp length is expect to be in seconds
            final double offsetEndTimeStampSec = offsetStartTimeStampSec + intervalLengthSec;
            final double absoluteEndTimeStampSec = getStartTimeSec() + offsetEndTimeStampSec;

            final double startTimeStampToCheckRangeOn = absolute ? absoluteStartTimeStampSec : offsetStartTimeStampSec;

            if (startTimeStampToCheckRangeOn < rangeStartTimeSec) {
                scanner.nextLine();
                continue;
            }

            if (startTimeStampToCheckRangeOn > rangeEndTimeSec) {
                return null;
            }

Start time and base time can be set multiple times in the process of reading/writing a log.
I can see the value of reporting relative timestamps, I can also see the value of reporting the log or application start time. The current state leads to confusing scenarios where an application which writes the start time (as recommended by the docs) and goes on to log histograms with the timestamps set will end up with a log that is written with absolute timestamps but read with relative timestamps.
To prevent confusion and displeasure I suggest we set the baseTime when logging start time, and log the start time when setting base time. This will at least reduce confusion. Alternatively we could use a different mechanism to determine base time (i.e a different prefix for the reader to detect) and use it to communicate the base time setting.
Finally, in the absence of a base time the reader should still be able to filter relative ranges in the log, and HistogramLogProcessor should expose that option (currently the default but broken for logs containing the start time).

What does HistogramIterationValue.valueIteratedTo mean?

OK, let's make a DoubleHistogram covering a dynamic range of 10^4 to 3 decimal places.

user=> (def d (org.HdrHistogram.DoubleHistogram. 1e4 3))
#'user/d

Let's throw a few small numbers into that DoubleHistogram.

user=> (def xs [3 5 7 11 13])
#'user/xs
user=> (doseq [x xs] (.recordValue d x))
nil

Percentiles behave like you'd expect:

user=> (.getValueAtPercentile d 0)
3.0
user=> (.getValueAtPercentile d 50)
7.001953125
user=> (.getValueAtPercentile d 100)
13.005859375

But let's say I wanted to iterate over the ranges in this histogram:

user=> (->> d .recordedValues .iterator iterator-seq pprint)
(#<HistogramIterationValue valueIteratedTo:1536, prevValueIteratedTo:0, countAtValueIteratedTo:1, countAddedInThisIterationStep:1, totalCountToThisValue:1, totalValueToThisValue:1536, percentile:20.0, percentileLevelIteratedTo:20.0>
 #<HistogramIterationValue valueIteratedTo:2561, prevValueIteratedTo:1536, countAtValueIteratedTo:1, countAddedInThisIterationStep:1, totalCountToThisValue:2, totalValueToThisValue:4097, percentile:40.0, percentileLevelIteratedTo:40.0>
 #<HistogramIterationValue valueIteratedTo:3585, prevValueIteratedTo:2561, countAtValueIteratedTo:1, countAddedInThisIterationStep:1, totalCountToThisValue:3, totalValueToThisValue:7682, percentile:60.0, percentileLevelIteratedTo:60.0>
 #<HistogramIterationValue valueIteratedTo:5635, prevValueIteratedTo:3585, countAtValueIteratedTo:1, countAddedInThisIterationStep:1, totalCountToThisValue:4, totalValueToThisValue:13316, percentile:80.0, percentileLevelIteratedTo:80.0>
 #<HistogramIterationValue valueIteratedTo:6659, prevValueIteratedTo:5635, countAtValueIteratedTo:1, countAddedInThisIterationStep:1, totalCountToThisValue:5, totalValueToThisValue:19974, percentile:100.0, percentileLevelIteratedTo:100.0>)

What's up with valueIteratedTo? I'd expect numbers in the range [3, 13], but instead they're:

user=> (def hdr-xs (->> d .recordedValues .iterator iterator-seq (map #(.getValueIteratedTo %))))
user=> (prn hdr-xs)
(1536 2561 3585 5635 6659)

Comparing the DoubleHistogram values to the real values shows they're all off by a factor of a little over 512.

user=> (map (comp float /) hdr-xs xs)
(512.0 512.2 512.1429 512.2727 512.2308)

... which is, coincidentally, the IntegerToDoubleValueConversionRatio for d:

user=> (.getIntegerToDoubleValueConversionRatio d)
0.001953125
user=> (float 1/512)
0.001953125

Is this actually the correct behavior and I just misunderstood the API?

I've been trying to trace the code path through AbstractHistogramIterator--which is aware of IntegerToDoubleValueConversionRatio but doesn't seem to actually use it. OTOH, this might arise from DoubleHistogram.highestEquivalentValue. Not quite sure. Any thoughts?

DoubleHistogram doesn't set autoResize on underlying AbstractHistogram

If an empty DoubleHistogram is created with autoResize set to true and then this is used to merge other DoubleHistograms into it, it throws an ArrayIndexOutOfBoundsException from the underlying AbstractHistogram. Given that the DoubleHistogram is set to auto-resize, I would think the underlying AbstractHistogram should auto-resize as well?

Code to reproduce:

package sandbox;

import org.HdrHistogram.DoubleHistogram;

public class HDRHistoTest {

    public static void main(String[] args) {

        DoubleHistogram histo1 = new DoubleHistogram(3);
        histo1.setAutoResize(true);
        histo1.recordValue(6.0);
        histo1.recordValue(1.0);
        histo1.recordValue(5.0);
        histo1.recordValue(8.0);
        histo1.recordValue(3.0);
        histo1.recordValue(7.0);
        DoubleHistogram histo2 = new DoubleHistogram(3);
        histo2.setAutoResize(true);
        histo2.recordValue(9.0);
        DoubleHistogram histo3 = new DoubleHistogram(3);
        histo3.setAutoResize(true);
        histo3.recordValue(4.0);
        histo3.recordValue(2.0);
        histo3.recordValue(10.0);

        DoubleHistogram merged = new DoubleHistogram(3);
        merged.setAutoResize(true);
        merged.add(histo1);
        merged.add(histo2);
        merged.add(histo3);
        System.out.println("SUCCESS!");
    }

}

This code throws the ArrayIndexOutOfBoundsException at merged.add(histo1);

Concurrent write resize issue

Continued from https://gist.github.com/marshallpierce/9e22df2be9c9f42ab875 and https://twitter.com/giltene/status/547905010470641664.

With HdrHistogram @ 3f34467, it's now much less frequent. Most of the jmh runs complete.

With the (default) config:

# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 3 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.mpierce.metrics.reservoir.hdrhistogram.HdrHistogramReservoirJmh.readWhileRecording

I'm seeing only two failures instead of all but two runs failing.

# Run progress: 40.00% complete, ETA 00:04:52
# Fork: 5 of 10
# Warmup Iteration   1: <failure>

java.lang.IndexOutOfBoundsException: index 2688
    at java.util.concurrent.atomic.AtomicLongArray.checkedByteOffset(AtomicLongArray.java:65)
    at java.util.concurrent.atomic.AtomicLongArray.lazySet(AtomicLongArray.java:137)
    at org.HdrHistogram.ConcurrentHistogram.resize(ConcurrentHistogram.java:265)
    at org.HdrHistogram.AbstractHistogram.handleRecordException(AbstractHistogram.java:428)
    at org.HdrHistogram.AbstractHistogram.recordSingleValue(AbstractHistogram.java:418)
    at org.HdrHistogram.AbstractHistogram.recordValue(AbstractHistogram.java:331)
    at org.HdrHistogram.Recorder.recordValue(Recorder.java:98)
    at org.mpierce.metrics.reservoir.hdrhistogram.HdrHistogramReservoir.update(HdrHistogramReservoir.java:58)
    at org.mpierce.metrics.reservoir.hdrhistogram.HdrHistogramReservoirJmh.recordMeasurements(HdrHistogramReservoirJmh.java:28)
    at org.mpierce.metrics.reservoir.hdrhistogram.generated.HdrHistogramReservoirJmh_readWhileRecording.recordMeasurements_thrpt_jmhStub(HdrHistogramReservoirJmh_readWhileRecording.java:167)
    at org.mpierce.metrics.reservoir.hdrhistogram.generated.HdrHistogramReservoirJmh_readWhileRecording.readWhileRecording_Throughput(HdrHistogramReservoirJmh_readWhileRecording.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:198)
    at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:180)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

and

# Run progress: 80.00% complete, ETA 00:01:25
# Fork: 9 of 10
# Warmup Iteration   1: <failure>

java.lang.IndexOutOfBoundsException: index 2816
    at java.util.concurrent.atomic.AtomicLongArray.checkedByteOffset(AtomicLongArray.java:65)
    at java.util.concurrent.atomic.AtomicLongArray.lazySet(AtomicLongArray.java:137)
    at org.HdrHistogram.ConcurrentHistogram.resize(ConcurrentHistogram.java:265)
    at org.HdrHistogram.AbstractHistogram.handleRecordException(AbstractHistogram.java:428)
    at org.HdrHistogram.AbstractHistogram.recordSingleValue(AbstractHistogram.java:418)
    at org.HdrHistogram.AbstractHistogram.recordValue(AbstractHistogram.java:331)
    at org.HdrHistogram.Recorder.recordValue(Recorder.java:98)
    at org.mpierce.metrics.reservoir.hdrhistogram.HdrHistogramReservoir.update(HdrHistogramReservoir.java:58)
    at org.mpierce.metrics.reservoir.hdrhistogram.HdrHistogramReservoirJmh.recordMeasurements(HdrHistogramReservoirJmh.java:28)
    at org.mpierce.metrics.reservoir.hdrhistogram.generated.HdrHistogramReservoirJmh_readWhileRecording.recordMeasurements_thrpt_jmhStub(HdrHistogramReservoirJmh_readWhileRecording.java:167)
    at org.mpierce.metrics.reservoir.hdrhistogram.generated.HdrHistogramReservoirJmh_readWhileRecording.readWhileRecording_Throughput(HdrHistogramReservoirJmh_readWhileRecording.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:198)
    at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:180)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Serializing a DoubleHistogram may drop the lowest value

I've been putting HdrHistogram through test.check, a generative testing system in Clojure, as a part of my work on Tesser. I think I may have found a bug in the serialization for DoubleHistograms, but it might also be a problem in my code. Here's what I think is a minimal failing case, as close to the Java as I can get:

(defspec digest-serialization-spec
  (prop/for-all [xs (gen/vector gen/pos-int)]
                (let [digest (DoubleHistogram. 1e8 3)]
                  ; Fill digest
                  (doseq [x xs] 
                    (.recordValue digest x)) 

                  ; Serialize and deserialize
                  (let [buf     (ByteBuffer/allocate (.getNeededByteBufferCapacity digest))
                        _       (.encodeIntoCompressedByteBuffer digest
                                                                 buf)
                        _       (.rewind buf)
                        digest' (DoubleHistogram/decodeFromCompressedByteBuffer buf 0)]
                    (prn)
                    (prn :a (q/distribution digest))
                    (prn :b (q/distribution digest'))

Here's a range of tests, some of which succeed, and some of which fail. The smallest failing inputs test.check identifies are [2, 1, 0], and [4, 1].

:a ([0.0 1] [1.0 1] [3.0009765625 1])
:b ([0.0 1] [1.0 1] [3.0009765625 1])

:a ([0.0 1] [2.0 1] [3.0 1])
:b ([0.0 1] [2.0 1] [3.0 1])

:a ([0.0 1] [1.0 1] [2.0009765625 1])
:b ([1.0 1] [2.0009765625 1])

FAIL in (digest-serialization-spec) (serialization_test.clj:34)
expected: (= digest digest')
  actual: (not (= #<DoubleHistogram org.HdrHistogram.DoubleHistogram@18e72522> #<DoubleHistogram org.HdrHistogram.DoubleHistogram@6f36e7f3>))

:a ([1.0 1] [2.0009765625 1] [3.0009765625 1])
:b ([1.0 1] [2.0009765625 1] [3.0009765625 1])

:a ([0.0 1] [1.0 1])
:b ([0.0 1] [1.0 1])

:a ([0.0 1] [2.0 1])
:b ([0.0 1] [2.0 1])

:a ([1.0 1] [2.0009765625 1])
:b ([1.0 1] [2.0009765625 1])

:a ([0.0 2] [1.0 1])
:b ([0.0 2] [1.0 1])

:a ([0.0 1] [1.0 2])
:b ([0.0 1] [1.0 2])

:a ([0.0 2] [2.0 1])
:b ([0.0 2] [2.0 1])
{:test-var "digest-serialization-spec", :result false, :seed 1418868764580, :failing-size 7, :num-tests 8, :fail [[0 2 1 3 0]], :shrunk {:total-nodes-visited 10, :depth 2, :result false, :smallest [[2 1 0]]}}

FAIL in (digest-serialization-spec) (clojure_test.clj:18)
expected: result
  actual: false
:a ([0.0 1] [1.0 1])
:b ([0.0 1] [1.0 1])

:a ([1.0 1] [4.0029296875 1])
:b ([4.0029296875 1])

FAIL in (digest-serialization-spec) (serialization_test.clj:34)
expected: (= digest digest')
  actual: (not (= #<DoubleHistogram org.HdrHistogram.DoubleHistogram@6bb2e497> #<DoubleHistogram org.HdrHistogram.DoubleHistogram@2dbdacbe>))

:a ([1.0 1] [6.0029296875 1])
:b ([6.0029296875 1])

FAIL in (digest-serialization-spec) (serialization_test.clj:34)
expected: (= digest digest')
  actual: (not (= #<DoubleHistogram org.HdrHistogram.DoubleHistogram@7f980567> #<DoubleHistogram org.HdrHistogram.DoubleHistogram@558aeea2>))

:a ([1.0 1])
:b ([1.0 1])

:a ([4.0 1])
:b ([4.0 1])

:a ([0.0 1] [1.0 1])
:b ([0.0 1] [1.0 1])

:a ([1.0 1] [2.0009765625 1])
:b ([1.0 1] [2.0009765625 1])

:a ([1.0 1] [3.0009765625 1])
:b ([1.0 1] [3.0009765625 1])

:a ([0.0 1] [4.0 1])
:b ([0.0 1] [4.0 1])
{:test-var "digest-serialization-spec", :result false, :seed 1418868808541, :failing-size 12, :num-tests 13, :fail [[12 0 5 4 11 12 3 10]], :shrunk {:total-nodes-visited 47, :depth 11, :result false, :smallest [[4 1]]}}

FAIL in (digest-serialization-spec) (clojure_test.clj:18)
expected: result
  actual: false

Anything stand out to you here? I'm happy to clone and build a SNAPSHOT release for testing if you want to experiment!

Improve memory efficiency

Ideally, to ensure a given maximum relative error r the bucket boundaries must not differ by more than a factor of (1+r). If we want to cover a range [a,b], we need at least (log(b)-log(a))/log(1+r) bins.

The allocated array sizes of HDR histogram are often much larger than this theoretical limit. For example, ((Histogram)(new DoubleHistogram(130, 4).integerValuesHistogram)).counts.length gives 163840, while the theoretically needed number of buckets is log(130)/log(1.0001), or approximately 48678, which is more than a factor of 3 less.

Of course, using the optimal number of buckets, each with equal width on the logarithmic scale is not feasible, because the index function which maps a given value to the corresponding bucket would require costly evaluations of the logarithm. The key idea of the HDR histogram approach is to use slightly smaller buckets in such a way that the corresponding index function is less expensive. By nature, this optimization is at the expense of memory utilization. There are multiple effects which increase the memory costs of the HDR histogram approach:

  • As far as I understand, the HDR approach is only able to limit the relative error to values that are a power of 1/2. That means if a maximum relative error of 0.01 needs to be guaranteed, the HDR approach must limit the relative error to (1/2)^7 = 0.0078, because (1/2)^6 = 0.0156 is too large.
  • Sequences of buckets of equal width (on the linear scale) range from some value to double that value. To cover the same range with buckets of equal width on the logarithmic scale, approximately 30% fewer buckets would be required.
  • I did not analyze the code in detail. However, I guess there are some other issues that further increase the memory costs, e.g. memory alignment, array resizing, auto-scaling?

Since I really liked the HDR key idea to use smaller bucket sizes to reduce indexing costs, I tried to find another index function that is more optimal regarding memory, but still cheap to evaluate. Here is what I have got so far. The bucket sizes are only slightly reduced which means that not more than approx. 8% additional buckets (compared to the optimal number) are required to cover the given range while keeping the relative error below the specified maximum. First tests gave me an average recording time of about 6ns per value. Maybe the proposed approach can tackle both memory and CPU costs.

ArrayIndexOutOfBoundsException when adding histograms and iterating

We record latency values in several atomic histograms, and then add those into a single Histogram (non atomic as this happens single threaded) to compute aggregated percentiles.
By doing so, we sometimes get ArrayIndexOutOfBoundsException, with no message or stack trace, but according to my debugging it happens inside the AbstractHistogram#add method.
The to/from histograms have same max trackable value and significant digits.
Version is 1.2.1.

Any ideas?

Eliminate HistogramData class, and fold its functionality into AbstractHistogram

The HistogramData class was originally conceived to abstract the data access to the innards of a histogram. This was at a time when the internal representation was able to track multiple data sets (e.g. corrected and uncorrected), and was therefore able to expose multiple instances of HistogramData for the same histogram.

Since early use patterns showed that rather than tracking multiple internal data sets in a single histogram, it made much more sense to have each histogram represent a single data set, and to use multiple Histogram instances to track corrected vs. uncorrected data for those who actually cared. I.e. only carry the burden of storing twice, recording twice, and dereferencing further in cases where someone actually wanted to track both (which is rare, and only tends to happen when people study coordinated omission details).

But HistogramData survived this shift, since it worked just fine. It just makes for cumbersome syntax, and generates the same questions over and over again.

So it's time to get rid of it. I'm planning to fold all of HistogramData's method calls into direct method calls on AbstractHistogram (and all of its subclasses). HistogramData will stick around as a deprecated class, to keep current code working, and we'll see when we can actually get rid of it in the future (a year or two?).

Infinite loop due to integer overflow

The following code never returns:

new Histogram(20000000, 100000000, 5);

It hangs here:

at org.HdrHistogram.AbstractHistogram.getBucketsNeededToCoverValue(AbstractHistogram.java:1440)
at org.HdrHistogram.AbstractHistogram.init(AbstractHistogram.java:189)
at org.HdrHistogram.AbstractHistogram.<init>(AbstractHistogram.java:152)
at org.HdrHistogram.Histogram.<init>(Histogram.java:139)

At line 1440, it sets trackableValue to (subBucketCount - 1) << unitMagnitude which is (262144 - 1) << 24, which overflows to -16777216. As a result the loop condition never returns true, because trackableValue eventually shifts down to 0.

.Net HdrHistogram with multiple writers

First of all thanks @giltene and @mattwarren for the awesome work on HdrHistogram.

I'm trying to implement support for HdrHistogram in my Metrics.Net port of the Java metrics library. The initial version uses a lock to synchronize writes and reads to the histogram, and so far it seems to work fine and is already faster than the default reservoir (exponentially decaying reservoir port).

I've noticed that the java version uses a "recorder" to handle multiple writer threads and "flipping" the active and inactive histograms. Is there anything similar that can be used in .Net? If not, do you think it is worth porting the java recorder to .net?

Also i wonder, are there any other suggestions on how to handle the scenario where multiple threads record values and one (or maybe more) threads need to read the values?

Thanks,
Iulian

Wasteful byte copies when decoding/decompressing

As part of doing the .NET port, I've noticed a few places where (unless I'm mistaken) extra bytes are being copied when decompressing/decoding histogram data.

When decompressing (https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/AbstractHistogram.java#L1364), couldn't it instead do the following?

int decompressedBytes = decompressor.inflate(countsBuffer.array());
histogram.fillCountsArrayFromBuffer(countsBuffer, decompressedBytes);

inflate(..) returns the # of bytes that were uncompressed http://docs.oracle.com/javase/7/docs/api/java/util/zip/Inflater.html#inflate(byte[])

Also when encoding (https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/AbstractHistogram.java#L1249), couldn't we store the relevantLength value like so?

buffer.putLong(getTotalCount()); 
buffer.putInt(relevantLength); // store this, so we can use it when decoding

fillBufferFromCountsArray(buffer, relevantLength);

Then it can be pulled out when decoding, so we know how many bytes to copy here https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/AbstractHistogram.java#L1341.

It's not so much a performance issue, because this doesn't happen on any hot-paths. It just seems a bit redundant to be copying all the extra bytes.

Or am I missing something?

Lossy capture

I would like to be able to take a point-in-time snapshot of a histogram. The snapshot can be lossy, which means counts will need to be recalculated.

Question about possible inconsistency in percentile reporting

I've noticed that the percentile value printed by outputPercentileDistribution() is different from the value at the same percentile as returned from getValueAtPercentile(). Are they expected to be different? I can submit a test case if that would be helpful.

Histogram subtract() method broken for ShortCountsHistogram

An attempt to subtract a ShortCountsHistogram from another results an exception to be thrown with stack trace:

java.lang.IllegalArgumentException: would overflow short integer count
    at org.HdrHistogram.ShortCountsHistogram.addToCountAtIndex(ShortCountsHistogram.java:54)
    at org.HdrHistogram.AbstractHistogram.subtract(AbstractHistogram.java:637)
    at 
    ...

This is obvious from code though. On line 637 of AbstractHistogram.java the subtract() method negates the other histogram's sample count before calling addToCountAtIndex():

addToCountAtIndex(i, -otherCount);

But addToCountAtIndex throws an exception because the value is negative:

@Override
void addToCountAtIndex(final int index, final long value) {
    int normalizedIndex = normalizeIndex(index, normalizingIndexOffset, countsArrayLength);
    short currentCount = counts[normalizedIndex];

    // THE BUG IS HERE, SEE BELOW:
    if ((value < 0) || (value > Short.MAX_VALUE)) {
        throw new IllegalArgumentException("would overflow short integer count");
    }

    ...

Fix Readme

Hi,

I was looking around and the README file you get when you enter Github states these classes are not thread-safe.

I spent a good deal of time wondering and looking for obscure reasons as to why atomic and synched versions of the Histogram were not thread-safe when in fact they are. Later a friend referred me to the javadocs clarifying everything.

I would update the README file to avoid such confusions and refer you to the updated javadocs instead.

cmd line util to parse csv file?

I'm looking for a way to feed data parsed from an existing log file to HdrHistogram. At first, I thought that HistogramLogProcessor would do the trick, but now I'm not so sure.

Is there something like that?

Thanks!

.NET port (work in progress)

@giltene (sorry I don't know a better way to contact you)

Just to let you know I'm working on a .NET port of HdrHistogram, you can see the current progress here https://github.com/mattwarren/HdrHistogram.NET.

At the moment it has all the functionality that was in HdrHistogram before you added HistogramLogProcessor (I plan to add that when I get time).

Hope the license and attribution is okay with you? If you'd rather host it within the main HdrHistogram code base, let me know?

BTW I plan to port jHiccup next (or at least the parts of it you can do in .NET, there isn't an Agent API like Java)

Cheers

Matt

Work on the README

I've just done some formatting changes and fixed a few typos in the README (0307890), hope that's okay? I figured that it's the first thing people see when they look at the GitHub project, so it's worth doing.

One question though: there seems to be an entire paragraph that is repeated, does it need to be? Take a look at the 6th paragraph starting "Internally, AbstractHistogram data is maintained using a concept somewhat similar to that of floating point number..." in this section. It's repeated in this section as "Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating point number..." with only a few minor changes.
