hdrhistogram / hdrhistogram.net

The .NET port of HdrHistogram

License: Other

C# 99.85% Batchfile 0.15%

hdrhistogram.net's Introduction

HdrHistogram

HdrHistogram: A High Dynamic Range (HDR) Histogram

This repository currently includes a Java implementation of HdrHistogram. C, C#/.NET, Python, Javascript, Rust, Erlang, and Go ports can be found in other repositories, all of which share common concepts and data representation capabilities. Look at repositories under the HdrHistogram organization for various implementations and useful tools.

Note: The below is an excerpt from a Histogram JavaDoc. While much of it generally applies to other language implementations as well, some details may vary by implementation (e.g. iteration and synchronization), so you should consult the documentation or header information of the specific API library you intend to use.

HdrHistogram supports the recording and analyzing of sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.

For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

The HdrHistogram package includes the Histogram implementation, which tracks value counts in long fields, and is expected to be the commonly used Histogram form. IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial.

HdrHistogram is designed for recording histograms of value measurements in latency and performance sensitive applications. Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. AbstractHistogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.

A combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded data information is kept in high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data values samples.

A common use example of HdrHistogram would be to record response times in units of microseconds across a dynamic range stretching from 1 usec to over an hour, with a good enough resolution to support later performing post-recording analysis on the collected data. Analysis can include computing, examining, and reporting of distribution by percentiles, linear or logarithmic value buckets, mean and standard deviation, or by any other means that can be easily added by using the various iteration techniques supported by the Histogram. In order to facilitate the accuracy needed for various post-recording analysis techniques, this example can maintain a resolution of ~1 usec or better for times ranging to ~2 msec in magnitude, while at the same time maintaining a resolution of ~1 msec or better for times ranging to ~2 sec, and a resolution of ~1 second or better for values up to 2,000 seconds. This sort of example resolution can be thought of as "always accurate to 3 decimal points." Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000, and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB (see "Footprint estimation" below).

Histogram variants and internal representation

The HdrHistogram package includes multiple implementations of the AbstractHistogram class:

Histogram, which is the commonly used Histogram form and tracks value counts in long fields.
IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial (e.g. systems where tens of thousands of in-memory histogram are being tracked).
AtomicHistogram and SynchronizedHistogram (see 'Synchronization and concurrent access' below)

Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating point number representation: using an exponent and a (non-normalized) mantissa to support a wide dynamic range at a high but varying (by exponent value) resolution. AbstractHistogram uses exponentially increasing bucket value ranges (the parallel of the exponent portion of a floating point number), with each bucket containing a fixed number of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and numberOfSignificantValueDigits controlling resolution.
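The exponent/mantissa idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions (a lowest discernible value of 1, and hypothetical class and method names), not the library's actual code: the bucket index plays the role of the exponent and the sub-bucket index the role of the mantissa.

```java
// Illustrative sketch only: class and method names are hypothetical,
// and a lowest discernible value of 1 is assumed.
class BucketIndexSketch {
    // Smallest power of two that can hold 2 * 10^digits linear sub-buckets
    // (2048 for 3 significant digits).
    static int subBucketCount(int significantDigits) {
        long largestValueWithSingleUnitResolution = 2 * (long) Math.pow(10, significantDigits);
        int magnitude = 64 - Long.numberOfLeadingZeros(largestValueWithSingleUnitResolution - 1);
        return 1 << magnitude;
    }

    // "Exponent": 0 for every value below subBucketCount, then +1 per doubling.
    static int bucketIndex(long value, int subBucketCount) {
        int magnitude = Integer.numberOfTrailingZeros(subBucketCount); // log2 of a power of two
        int leadingZeroCountBase = 64 - magnitude;
        return leadingZeroCountBase - Long.numberOfLeadingZeros(value | (subBucketCount - 1));
    }

    // "Mantissa": the value scaled down to the bucket's resolution.
    static int subBucketIndex(long value, int bucketIndex) {
        return (int) (value >>> bucketIndex);
    }

    public static void main(String[] args) {
        int count = subBucketCount(3);
        for (long value : new long[] {1_000L, 5_000L, 3_600_000_000L}) {
            int bucket = bucketIndex(value, count);
            System.out.println(value + " -> bucket " + bucket
                    + ", sub-bucket " + subBucketIndex(value, bucket)
                    + ", resolution " + (1L << bucket));
        }
    }
}
```

Both indexes are computed directly from the value with a leading-zero count and a shift, which is why recording needs no searching or iteration.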

Synchronization and concurrent access

In the interest of keeping value recording cost to a minimum, the commonly used Histogram class and its IntHistogram and ShortHistogram variants are NOT internally synchronized, and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries against Histogram objects should either take care to externally synchronize and/or order their access, or use the ConcurrentHistogram, AtomicHistogram, or SynchronizedHistogram variants.

A common pattern seen in histogram value recording involves recording values in a critical path (multi-threaded or not), coupled with a non-critical path reading the recorded data for summary/reporting purposes. When such continuous non-blocking recording operation (concurrent or not) is desired even when sampling, analyzing, or reporting operations are needed, consider using the Recorder and SingleWriterRecorder recorder variants that were specifically designed for that purpose. Recorders provide a recording API similar to Histogram, and internally maintain and coordinate active/inactive histograms such that recording remains wait-free in the presence of accurate and stable interval sampling.

It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread non-synchronized histograms or SingleWriterRecorders, and use a summary/reporting thread to perform histogram aggregation math across time and/or threads.
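Why additivity makes per-thread recording practical can be shown with a minimal Java sketch. It assumes each histogram's counts are held in an array with identical layout (same range and precision); the class and method names are hypothetical and stand in for the library's own aggregation support.

```java
// Illustrative sketch only: models each histogram as a plain counts array.
// Assumes all arrays share the same layout (same range and precision).
class AdditiveCountsSketch {
    // Element-wise sum of per-thread counts, as a reporting thread might compute it.
    static long[] aggregate(long[]... perThreadCounts) {
        long[] total = new long[perThreadCounts[0].length];
        for (long[] counts : perThreadCounts) {
            if (counts.length != total.length) {
                throw new IllegalArgumentException("histograms must share one layout");
            }
            for (int i = 0; i < counts.length; i++) {
                total[i] += counts[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] threadA = {1, 0, 2};
        long[] threadB = {3, 4, 0};
        System.out.println(java.util.Arrays.toString(aggregate(threadA, threadB)));
    }
}
```

Because the sum does not depend on the order of recording, each writer can stay unsynchronized and only the aggregation step needs a consistent view of its inputs.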

Iteration

Histograms support multiple convenient forms of iterating through the histogram data set, including linear, logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or each possible value level. The iteration mechanisms are accessible through the HistogramData available through getHistogramData(). Iteration mechanisms all provide HistogramIterationValue data points along the histogram's iterated data set, and are available for the default (corrected) histogram data set via the following HistogramData methods:

percentiles: An Iterable<HistogramIterationValue> through the histogram using a PercentileIterator
linearBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LinearIterator
logarithmicBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LogarithmicIterator
recordedValues: An Iterable<HistogramIterationValue> through the histogram using a RecordedValuesIterator
allValues: An Iterable<HistogramIterationValue> through the histogram using an AllValuesIterator

Iteration is typically done with a for-each loop statement. E.g.:

 for (HistogramIterationValue v :
      histogram.getHistogramData().percentiles(ticksPerHalfDistance)) {
     ...
 }

 for (HistogramIterationValue v :
      histogram.getRawHistogramData().linearBucketValues(unitsPerBucket)) {
     ...
 }

The iterators associated with each iteration method are resettable, such that a caller that would like to avoid allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the histogram. This iterator re-use usually takes the form of a traditional loop using the Iterator's hasNext() and next() methods.

So to avoid allocating a new iterator object for each iteration loop:

 PercentileIterator iter =
    histogram.getHistogramData().percentiles().iterator(ticksPerHalfDistance);
 ...
 iter.reset(percentileTicksPerHalfDistance);
 while (iter.hasNext()) {
     HistogramIterationValue v = iter.next();
     ...
 }

Equivalent Values and value ranges

Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. HdrHistogram provides methods for determining the lowest and highest equivalent values for any given value, as well as determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting).
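A minimal Java sketch of equivalence, assuming 3 significant digits (2,048 sub-buckets) and a lowest discernible value of 1. The method names mirror the concepts above, but the implementation is an illustration rather than the library's code.

```java
// Illustrative sketch only: assumes 3 significant digits and unit resolution of 1.
class EquivalentValueSketch {
    static final int SUB_BUCKET_COUNT = 2048; // 2 * 10^3 rounded up to a power of two

    // Number of low-order bits the histogram cannot distinguish at this magnitude.
    static int indistinguishableBits(long value) {
        return 53 - Long.numberOfLeadingZeros(value | (SUB_BUCKET_COUNT - 1));
    }

    static long lowestEquivalentValue(long value) {
        int bits = indistinguishableBits(value);
        return (value >>> bits) << bits;
    }

    static long highestEquivalentValue(long value) {
        return lowestEquivalentValue(value) + (1L << indistinguishableBits(value)) - 1;
    }

    static boolean valuesAreEquivalent(long a, long b) {
        return lowestEquivalentValue(a) == lowestEquivalentValue(b);
    }

    static long nextNonEquivalentValue(long value) {
        return highestEquivalentValue(value) + 1;
    }

    public static void main(String[] args) {
        // Values 5000..5003 share one count; 5004 starts the next range.
        System.out.println(lowestEquivalentValue(5_001L) + ".." + highestEquivalentValue(5_001L));
        System.out.println(valuesAreEquivalent(5_003L, 5_004L));
    }
}
```

Stepping with nextNonEquivalentValue visits each count exactly once, which is the double-counting concern mentioned above.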

Corrected vs. Raw value recording calls

In order to support a common use case needed when histogram values are used to track response time distribution, Histogram supports the recording of corrected histogram values through a recordValueWithExpectedInterval() variant. This value recording form is useful in [common latency measurement] scenarios where response times may exceed the expected interval between issuing requests, leading to "dropped" response time measurements that would typically correlate with "bad" results.

When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, recorded histogram data will reflect an appropriate number of additional values, linearly decreasing in steps of expectedIntervalBetweenValueSamples, down to the last value that would still be higher than expectedIntervalBetweenValueSamples.

To illustrate why this corrective behavior is critically needed in order to accurately represent value distribution when large value measurements may lead to missed samples, imagine a system for which response time samples are taken once every 10 msec to characterize response time distribution. The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample showing a 1 msec response time value. The hypothetical system then encounters a 100 sec pause during which only a single sample is recorded (with a 100 second value). The raw data histogram collected for such a hypothetical system (over the 200 second scenario above) would show ~99.99% of results at 1 msec or below, which is obviously "not right". The same histogram, corrected with the knowledge of an expectedIntervalBetweenValueSamples of 10 msec, will correctly represent the response time distribution. Only ~50% of results will be at 1 msec or below, with the remaining 50% coming from the auto-generated value records covering the missing increments spread between 10 msec and 100 sec.

Data sets recorded with and without an expectedIntervalBetweenValueSamples parameter will differ only if at least one value recorded with the recordValue method was greater than its associated expectedIntervalBetweenValueSamples parameter. Data sets recorded with an expectedIntervalBetweenValueSamples parameter will be identical to ones recorded without it if all values recorded via the recordValue calls were smaller than their associated (and optional) expectedIntervalBetweenValueSamples parameters.

When used for response time characterization, the recording with the optional expectedIntervalBetweenValueSamples parameter will tend to produce data sets that would much more accurately reflect the response time distribution that a random, uncoordinated request would have experienced.
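The 10 msec / 100 sec scenario can be reproduced with a small Java sketch. It stores counts in a TreeMap instead of a real histogram, works in milliseconds, and uses hypothetical class and helper names; the loop boundary (>=) is an assumption about where the back-filled values stop.

```java
import java.util.TreeMap;

// Illustrative sketch only: a TreeMap stands in for the histogram's counts array.
class CorrectedRecordingSketch {
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    private long totalCount;

    void recordValue(long value) {
        counts.merge(value, 1L, Long::sum);
        totalCount++;
    }

    // Records the value, then back-fills the samples that were presumably missed
    // while the long response was in progress. The >= boundary is an assumption.
    void recordValueWithExpectedInterval(long value, long expectedInterval) {
        recordValue(value);
        if (expectedInterval <= 0) {
            return;
        }
        for (long missing = value - expectedInterval;
             missing >= expectedInterval;
             missing -= expectedInterval) {
            recordValue(missing);
        }
    }

    long totalCount() {
        return totalCount;
    }

    double fractionAtOrBelow(long value) {
        long atOrBelow = 0;
        for (long count : counts.headMap(value, true).values()) {
            atOrBelow += count;
        }
        return totalCount == 0 ? 0.0 : (double) atOrBelow / totalCount;
    }

    // 10,000 samples of 1 msec followed by one 100 sec sample, in milliseconds.
    static CorrectedRecordingSketch scenario(boolean corrected) {
        CorrectedRecordingSketch h = new CorrectedRecordingSketch();
        long expectedInterval = corrected ? 10 : 0;
        for (int i = 0; i < 10_000; i++) {
            h.recordValueWithExpectedInterval(1, expectedInterval);
        }
        h.recordValueWithExpectedInterval(100_000, expectedInterval);
        return h;
    }

    public static void main(String[] args) {
        System.out.println("raw:       " + scenario(false).fractionAtOrBelow(1));
        System.out.println("corrected: " + scenario(true).fractionAtOrBelow(1));
    }
}
```

In this sketch the raw run holds 10,001 samples, almost all at 1 msec, while the corrected run holds 20,000 samples with exactly half at 1 msec, matching the ~99.99% and ~50% figures above.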

Footprint estimation

Due to its dynamic range representation, Histogram is relatively efficient in memory space requirements given the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved for a given highestTrackableValue and numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by its data value recording counts array. The total footprint can be conservatively estimated by:

 largestValueWithSingleUnitResolution =
        2 * (10 ^ numberOfSignificantValueDigits);
 subBucketSize =
        roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);

 expectedHistogramFootprintInBytes = 512 +
      ({primitive type size} / 2) *
      (log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
      subBucketSize

A conservative (high) estimate of a Histogram's footprint in bytes is available via the getEstimatedFootprintInBytes() method.
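The estimate can be worked through for the example used earlier (highestTrackableValue of 3,600,000,000, 3 significant digits, 8-byte long counts). The Java sketch below simply transcribes the formula; the class and method names are hypothetical.

```java
// Illustrative sketch only: a direct transcription of the estimation formula.
class FootprintEstimateSketch {
    static long roundedUpToNearestPowerOf2(long value) {
        return value <= 1 ? 1 : Long.highestOneBit(value - 1) << 1;
    }

    static long log2RoundedUp(double value) {
        return (long) Math.ceil(Math.log(value) / Math.log(2));
    }

    static long estimateFootprintInBytes(long highestTrackableValue,
                                         int numberOfSignificantValueDigits,
                                         int primitiveTypeSizeInBytes) {
        long largestValueWithSingleUnitResolution =
                2 * (long) Math.pow(10, numberOfSignificantValueDigits);
        long subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);
        long bucketTerm = log2RoundedUp((double) highestTrackableValue / subBucketSize) + 2;
        return 512 + (primitiveTypeSizeInBytes / 2) * bucketTerm * subBucketSize;
    }

    public static void main(String[] args) {
        // long counts (8 bytes): 512 + 4 * 23 * 2048 bytes
        System.out.println(estimateFootprintInBytes(3_600_000_000L, 3, 8));
        // int counts (4 bytes): 512 + 2 * 23 * 2048 bytes
        System.out.println(estimateFootprintInBytes(3_600_000_000L, 3, 4));
    }
}
```

For long counts this gives 188,928 bytes, i.e. roughly 185KB, in line with the figure quoted for the example histogram.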

hdrhistogram.net's Issues

Record Scope

Add a feature where you can record the scope of a function call by leveraging the using statement in C#.

It could be used as such:

using(recorder.RecordScope())
{
    await SomeExpensiveCall();
}

This allows recording of tasks. It also simplifies recording of long statements without having to create lambdas/closures.

It would incur the allocation cost of assigning the IDisposable resource, but in theory if there is a Task being involved, that allocation cost should be dwarfed by the async context switch and work.

Code to be added could be like the following (added to src/HdrHistogram/HistogramExtensions.cs):

/// <summary>
/// Records the time to call dispose on the returned token.
/// This can be useful for timing large blocks of code, or for wrapping around an <c>await</c> clause.
/// </summary>
/// <param name="recorder">The <see cref="IRecorder"/> instance to record the latency in.</param>
/// <returns>Returns a token to be disposed once the scope ends.</returns>
public static IDisposable RecordScope(this IRecorder recorder)
{
    return new Timer(recorder);
}

private sealed class Timer : IDisposable
{
    private readonly IRecorder _recorder;
    private readonly long _start;

    public Timer(IRecorder recorder)
    {
        _recorder = recorder;
        _start = Stopwatch.GetTimestamp();
    }

    public void Dispose()
    {
        var elapsed = Stopwatch.GetTimestamp() - _start;
        _recorder.RecordValue(elapsed);
    }
}

Invalid .hgrm output produced

The logic for IsLastValue() is incorrect and can flag multiple values as the last value.

This causes the incorrect output shown below (note multiple lines with the last column missing):

       Value     Percentile TotalCount 1/(1-Percentile)

       1.000 0.000000000000    7604459           1.00
       1.000 0.100000000000    7604459           1.11
       1.000 0.200000000000    7604459           1.25
       1.000 0.300000000000    7604459           1.43
       1.000 0.400000000000    7604459           1.67
       1.000 0.500000000000    7604459           2.00
       1.000 0.550000000000    7604459           2.22
       1.000 0.600000000000    7604459           2.50
<SEVERAL ROWS REMOVED FOR BREVITY>
     383.000 0.999998283386    9999983      582542.22
     453.000 0.999998474121    9999985      655360.00
     511.000 0.999998664856    9999987      748982.86
     537.000 0.999998855591    9999990      873813.33
     672.000 0.999999046326    9999991
     777.000 0.999999141693    9999992
   18143.000 0.999999237061    9999993
  208127.000 0.999999332428    9999994
  224639.000 0.999999427795    9999995
  229759.000 0.999999523163    9999996
  229759.000 0.999999570847    9999996
  230271.000 0.999999618530    9999997
  230271.000 0.999999666214    9999997
  258943.000 0.999999713898    9999998
  258943.000 0.999999761581    9999998
  258943.000 0.999999785423    9999998
  275711.000 0.999999809265    9999999
  275711.000 0.999999833107    9999999
  275711.000 0.999999856949    9999999
  275711.000 0.999999880791    9999999
  275711.000 0.999999892712    9999999
  282111.000 0.999999904633   10000000
  282111.000 1.000000000000   10000000
#[Mean    =        1.714, StdDeviation   =      205.505]
#[Max     =   282111.000, Total count    =     10000000]
#[Buckets =           14, SubBuckets     =         2048]

I'm pretty sure the fix is to make IsLastValue look like this (i.e. using double.Epsilon):

public bool IsLastValue()
{
    return Math.Abs(PercentileLevelIteratedTo - 100.0D) < double.Epsilon;
}

Example of how to Send or save Histograms

This seems like it will become a popular thing and people are already asking for it

https://gitter.im/HdrHistogram/HdrHistogram?at=563f8127c712fe074e4e7101

Create a Recorder

Consider supporting Recorder (which supports multiple concurrent writers), but that will also require a ConcurrentHistogram.

Question: What is the Recorder?

Consistent index error on Windows

When initialising the following LongHistogram

var measurements = new LongHistogram((long) TimeSpan.FromMinutes(15).TotalMilliseconds, 3);

and then in a continuous loop add measurements to it with

measurements.RecordValue(actual - expected);

I consistently get the following exception:

 Index was outside the bounds of the array.
Stack Trace:
   at HdrHistogram.Utilities.Bitwise.Log2(Int32 i)
   at HdrHistogram.Utilities.Bitwise.NumberOfLeadingZeros(Int64 value)
   at HdrHistogram.HistogramBase.RecordSingleValue(Int64 value)

Could this be because actual - expected can be negative – or 0? I had assumed HdrHistogram handled that, but I realise now I might have been incorrect.

CultureInfo used to create output cannot be controlled resulting in invalid .hgrm format

A line in a .hgrm file should look something like this:

       5.02 0.881250000000        888           8.42

Notice the decimal separator which is a dot.

The OutputPercentileDistribution method will format the data using Thread.CurrentThread.CurrentCulture and in many cultures the decimal separator is a comma and not a dot.

       5,02 0,881250000000        888           8,42

This results in a .hgrm file that cannot be parsed correctly.

To work around this you need to set Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture before calling OutputPercentileDistribution. In many cases you will have to change the culture back after the call to avoid situations where other code depends on the current culture of the thread. In general this is a hack and not a sustainable solution.

However, the decimal should not always be a dot. If the output of OutputPercentileDistribution is intended to be read by a human the current choice of using Thread.CurrentThread.CurrentCulture is the right way of formatting the output.

I suggest that an overload of OutputPercentileDistribution accepting a CultureInfo is created. Writing to a .hgrm file would then require specifying CultureInfo.InvariantCulture.

Auto-sizing

Guidance from Gil:

Auto-sizing is another useful thing… Not having to specify an initial range is useful for lazy folks (who are ok with resizing latencies in the recording path). It is also useful as a way to avoid overflowing wrongly-initial-sized histograms: unexpected large values result in a resize rather than an AIOOB exception. If you are ok with taking the latency hit (and potential mem size hit) for that, it's cleaner to code to.

.NET Core support

Ideally HdrHistogram should be available on all supported versions of .NET.
Currently this is 4.5.2 & 4.6.1
It may however be prudent to wait until .NET Core and supporting tooling is available before undertaking this task, as it may incur a lot of rework if the goalposts move by too much.

Examples

I've been using HdrHistogram for a while and have created some C# classes to simplify its usage. They allow me to enable a set of histograms via simple names in an application configuration file. The classes that use the enabled histograms can initialize and use them with only a small amount of code. I also have an F# version that uses F# modules/functions instead of classes. Here's a quick C# example:

// step 1 - initialize the histogram names that will be used in the current run from config file
List<string> names = cfg.getValues("histograms");

foreach (string name in names)
{
    Histograms.Add(name);
    _log.DebugFormat("enabled histogram: {0}", name);
}

// step 2 - initialize an enabled histogram within class it's used in

private static HistogramTimer _hgPostAck = null;
private static bool _hgPostAckEnabled = false;
public static readonly string HG_POSTACK = "postack";

_hgPostAckEnabled = Histograms.isEnabled(HG_POSTACK);

if (_hgPostAckEnabled)
    _hgPostAck = Histograms.makeHistogramTimer(HG_POSTACK, HistogramTimer.NSECS_IN_MIN * 10L, 3, false, 100); // 1 nsec to 10 min, 3 decimal point resolution, don't warmup, only log report every 100 calls

// step 3 - use histogram within class

try
{
    if (_hgPostAckEnabled)
        _hgPostAck.startTimer();

    // do work...
}
finally
{
    if (_hgPostAckEnabled)
    {
        _hgPostAck.recordTime();
        Histograms.logReport(HG_POSTACK, ScaleFactor.MSEC);
    }
}

Let me know if you are interested in these potential contributions. Note that my existing C# and F# code will most likely require some modifications before they can be included.

Peter Santoro

Create extreme packages

Create extreme packages that are designed for extreme performance cases

only have a single implementation of a histogram defined
remove the base class
only implement the interface
sealed class with no inheritance
x64 and x86 (32bit) releases.
no support for synchronization.

e.g. HdrHistogramLongx64.nupkg, HdrHistogramIntx64.nupkg, HdrHistogramShortx86.nupkg

Test to see if they provide significant improvements in

footprint (dll/nuget package)
throughput

These would be targets for .NET platforms that either require extreme throughput (MMOG, Trading, etc.) or require a very small footprint (UWA, Raspberry Pi, etc.)

Correct docs and Scaling regarding Ticks

The comments, documents and scaling helper class all misinterpret the difference between two Stopwatch.GetTimestamp() calls as an elapsed period of ticks, i.e. units of 1/10,000,000 of a second.
This assumption was made from simply thinking that TimeSpan.TicksPerSecond related to the tick values returned from Stopwatch.GetTimestamp().
As per documentation

Gets the current number of ticks in the timer mechanism.

However, the unit is correctly defined by 1 second = Stopwatch.Frequency.

This means the following helper fields should be added

public static readonly double TimeStampToMicroseconds = Stopwatch.Frequency / (1000d * 1000d);
public static readonly double TimeStampToMilliseconds = Stopwatch.Frequency / 1000d;
public static readonly double TimeStampToSeconds = Stopwatch.Frequency;

And documents referring to ticks, should be amended.

https://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.frequency(v=vs.110).aspx
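As a worked example of the scaling, the sketch below assumes the proposed fields hold the number of timestamp units per microsecond, so an elapsed timestamp difference is divided by that factor. It is written in Java to match the README samples; the frequency is passed in as a plain parameter standing in for Stopwatch.Frequency, and the class and method names are hypothetical.

```java
// Illustrative sketch only: ticksPerSecond stands in for Stopwatch.Frequency.
class TimestampScalingSketch {
    // Converts an elapsed timestamp difference into microseconds.
    static double toMicroseconds(long elapsedTimestampUnits, long ticksPerSecond) {
        double unitsPerMicrosecond = ticksPerSecond / (1000d * 1000d);
        return elapsedTimestampUnits / unitsPerMicrosecond;
    }

    public static void main(String[] args) {
        // With a 10 MHz timer, 25,000 units are 2,500 microseconds.
        System.out.println(toMicroseconds(25_000L, 10_000_000L));
        // With a 3 MHz timer, only 3,000 units are already 1,000 microseconds.
        System.out.println(toMicroseconds(3_000L, 3_000_000L));
    }
}
```

The same raw difference therefore means different durations on machines with different timer frequencies, which is why a fixed 1/10,000,000 second assumption gives wrong results.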

.NET Lib to create charts

Currently I believe that the only way for a .NET program to render the captured histograms into a chart is via the web project http://hdrhistogram.github.io/HdrHistogram/plotFiles.html.

It would be nice if this was ported to a .NET process (using the Drawing or WPF libs) to generate this stuff on the fly.

.gitignore does not currently exclude .vs files generated by Visual Studio

Allow thread safe writing/recording

Either create a thread safe implementation of each Histogram (16/32/64 bit) or provide a generic wrapper to allow synchronized access to them

Thread safe writes
Thread safe reads (via a recorder)

Please consider signing your assemblies

Hello,

Could you please consider signing your assemblies so that they can be used in a wider range of projects where strong-name information is required?

Further reading: https://www.pedrolamas.com/2018/09/11/start-strong-naming-your-assemblies/

Non shared values from Enumerables

As identified in #64 , it is non-intuitive that values returned from our Enumerable factories are shared and mutable.

Running benchmarks against master and PR #64 shows no significant performance change, so this looks like it is an unnecessary coding style ported from the Java implementation

Create CI build

Create an automated build that

compiles the code for each platform
runs all the tests
packages and deploys

Consider using AppVeyor?

Document/Visualize the internal bucket model

As asked and answered here https://gitter.im/HdrHistogram/HdrHistogram?at=56db5610ddfe3d431627fa97

Create IntHistogram

A histogram with 32Bit bucket counts

GetPercentileOfValue

How hard would it be to add a method that queries the histogram for a percentile given a sample?

If I have a measurement of 300ms, what percentile does this sample fall under?

Lzct SSE instruction when available

We currently have, at the heart of the hot path, a manual method for finding the leading zero count.
This is used to identify the correct bucket to assign a recorded value.
Frustratingly, this is supported as an intrinsic instruction on most modern CPU architectures.

The code is found in Bitwise.NumberOfLeadingZeros, and has been isolated with the intent that it can be a single place to refactor/optimize if the opportunity arises.

Follow this .NET core issue for progress/resolution

dotnet/corefx#2209
possibly moved from coreFx to CoreCLR https://github.com/dotnet/coreclr/issues/8089
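For comparison, this is roughly what a manual leading-zero count looks like next to a built-in one. The sketch is in Java, where Long.numberOfLeadingZeros is provided by the standard library and is commonly compiled down to a hardware instruction by the JIT; the manual method is a generic binary-search version, not the code in Bitwise.NumberOfLeadingZeros.

```java
// Illustrative sketch only: a generic manual count, not the project's implementation.
class LeadingZerosSketch {
    // Binary search over the bit positions: at most six comparisons for a 64-bit value.
    static int manualNumberOfLeadingZeros(long value) {
        if (value == 0) {
            return 64;
        }
        int zeros = 0;
        if ((value >>> 32) == 0) { zeros += 32; value <<= 32; }
        if ((value >>> 48) == 0) { zeros += 16; value <<= 16; }
        if ((value >>> 56) == 0) { zeros += 8;  value <<= 8; }
        if ((value >>> 60) == 0) { zeros += 4;  value <<= 4; }
        if ((value >>> 62) == 0) { zeros += 2;  value <<= 2; }
        if ((value >>> 63) == 0) { zeros += 1; }
        return zeros;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 2_047L, 2_048L, 3_600_000_000L, Long.MAX_VALUE, -1L};
        for (long sample : samples) {
            System.out.println(sample + ": manual=" + manualNumberOfLeadingZeros(sample)
                    + " builtin=" + Long.numberOfLeadingZeros(sample));
        }
    }
}
```

Keeping the operation behind a single method, as the issue describes, means a built-in equivalent can be swapped in later without touching the bucket-selection logic.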

Create coding standards

To enable the community to contribute to the repository in a consistent and predictable manner, it would be good if there was a set of standards for the project to adopt.

Which style of C# to use (e.g. StyleCop defaults vs Resharper defaults)
Expectation of XML documentation for public api
Test coverage and style expectations.
Platform support that is expected (.NET 20-4.5, Mono, Dnxcore etc?)

These expectations should be documented in the wiki

Create automated Documentation generation

Could use https://readthedocs.org/ to host the generated docs.

Create ShortHistogram

A histogram with 16Bit bucket counts

Provide a simple entry point for ASP.NET recording

Ideally Web devs should be able to just pull a nuget package and add an attribute to controllers that should be measured.

Potentially an ActionFilterAttribute could be provided in a standalone ASP-specific nuget

public class HistogramAttribute : ActionFilterAttribute
{
    private const string StartTimestampKey = "HistogramAttribute.StartTimeStamp";

    public override void OnActionExecuting(ActionExecutingContext context)
    {
        context.HttpContext.Items[StartTimestampKey] = Stopwatch.GetTimestamp();
    }

    public override void OnActionExecuted(ActionExecutedContext context)
    {
        var stopTimestamp = Stopwatch.GetTimestamp();
        var startTimestamp = (long)context.HttpContext.Items[StartTimestampKey];
        var elapsed = stopTimestamp - startTimestamp;
        Log.RecordValue(elapsed);
    }
}

Ideally this would also integrate with tag, recorder and thread safety support.

StrongName Assembly

Hi,

Could you provide a strong named version assembly package on nuget?

Thanks.

Auto release on tag

https://www.appveyor.com/docs/deployment/github/

When we tag the repo, it would be nice if a release was just created

Repo tagged
GitHub "release" created
Nuget package added to the Release
Package published to Nuget.org

Fix line endings

The test files have \n line endings, but currently git applies autocrlf to these files changing their line ending incorrectly to \r\n.
I think the fix is to remove the .gitattributes file.

Update home page/Readme.md to have less detail

Instead of having a dense detailed homepage, link to the content in the wiki

Move Detailed description to wiki
Update code sample to a factory
Reduce the sample output content size

Subtract - additional method rather than issue

Hello,

Would it be possible to add a Subtract function that would subtract one histogram from another. An Add function exists. A Subtract function would also be very helpful.

Apologies if this request is in the wrong place. All help appreciated.

Simon

Benchmarks as part of the build.

Can we use BenchmarkDotNet or NBench to ensure our hot paths stay allocation free and fast?

Investigate SkinnyHistogram

I am not sure what the Skinny Histogram is and what it could offer the .NET project.

Update to Encoding docs

As per https://gitter.im/HdrHistogram/HdrHistogram?at=5731f888b51b0e294850e027

File Format:

the use of "File" is not the best as this construct is not always saved in a file (for example in our app it is stored in a memory buffer and sent over the wire)
add valid values for cookie
add length encoding and valid range

I initially struggled to understand the rationale for this encapsulation, maybe @giltene can provide the history behind this (i.e. why do we have this encapsulation)?

CompressedHeader Format:

the description should specify the byte order for all the 32 and 64 bit fields (A to F)
list of valid cookie values
valid values for B to F

Perhaps a better (final) place is to have that document one level up or in a doc repo since it is language independent, then it'll be easier to have others contribute information to it (for example other people can add info related to encoding size and speed for their language implementation )

Remove usage of "file"
document Cookie
length encoding definition
specify the byte order for compressed header (provide example?)
list valid cookie values
valid values for B to F

Document Unit Tests and Benchmarks

The readme currently doesn't have the basic details on how to run tests!

It should say something like

dotnet test .\HdrHistogram.UnitTests\HdrHistogram.UnitTests.csproj -v=q -c=Release

Add Tag support to Log format

https://gitter.im/HdrHistogram/HdrHistogram?at=572fa2a7f9a53a60793cd710

Create LongHistogram

Create the default implementation with signed 64 bit counts.

It should support

auto-sizing
percentile, linear, and log based iteration
encode into the compressed V2 serialized histogram format (which is what the current Java code will encode to, as well as the current C and Python code bases).
decode V2.
SingleWriter i.e. not thread safe for writes.

I also propose that the current Histogram type be renamed to LongHistogram. The Histogram type can then be used as a factory that can help guide users to which instance they need.

potentially something like

var histogram = Histogram
    .With64bitBucketCount() 
    .AsThreadSafe()
    .Create()

Add support for decoding V0 and V1 encodings

Test logs with V2, V1, and V0 encoded histograms are included in the Java repo under test/resources.

Histogram Factory

I also propose that the current Histogram type be renamed to LongHistogram. The Histogram type can then be used as a factory that can help guide users to which instance they need.

potentially something like

var histogram = Histogram
    .With64bitBucketCount() 
    .AsThreadSafe()
    .Create()

Event tracing for Windows support

It would be nice if there would be an out-of-the-box way to use input from ETW events as input.
I will look into this as I think that things like this should be done out-of-process. As the .NET garbage collector also publishes stats via ETW, it would give a standard way to look at Garbage collection times of programs.

Create documentation standards

Adoption of a library can be greatly improved with quality documentation.

Providing guidelines on how to create consistent documentation can reduce rework and improve the feeling of a quality repository.

example of a set of guidelines : http://dotnet.github.io/orleans/Documentation-Guidelines

Is HistogramLogWriter blocking on purpose?

In order to persist data to an external system over time, we spin up a Timer that runs this:
HistogramLogWriter.Append(someStream)

This code is wrapped in a "PeriodicTask" that has a "Stop" method and a CancellationToken.
The Stream is written through this API:

Is it intentional not to use the async methods, with the possibility of passing a cancellation token so that everything could be stopped "gracefully"?

For example, if the accumulated data is around 500 MB and you are trying to shut down the app, this would probably not go well with blocking code.

Publish the costs to record

Having created benchmarks for each variation of the library, have these published to allow users to make an educated decision about which is the most appropriate way to use the library.

These benchmarks should include a wide variety of CPUs.
Once broad platform support is available, each platform should be included in the results.

Create a Log writer

Support a log writer that would output V2 encoded histograms to a log file compatible with HistogramLogWriter

Rename "header" to envelope in HistogramEncoding

There are cases in the code where we refer to an envelope structure as a header.
This leads you to look for the body/contents, however they are inside the "header".
Thus the header is really an envelope.

Create log reader

Create EBNF definition of the log format

https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form

https://gitter.im/HdrHistogram/HdrHistogram?at=571e8a9d47b4c6480ffa300d

Instrument ASP.NET end point package

I assume that there is a simple way to add a middleware/handler/router/filter/thing to ASP.NET to allow HdrHistogram to record the time taken for the request to be processed.

If there is then this would be good to provide as a separate nuget package that web devs can just add and then wire up in a one-liner
It should

record the service time,
autorotate instances of HdrHistogram when writing to disk (to target directory)
default to using the endpoint/method/action name as the key for grouping/tagging. (histograms can be merged at later date for higher level aggregates)

Benchmark the Library

The unofficial port at [https://github.com/LeeCampbell/HdrHistogram.NET] had some benchmarks that were used to compare it to the existing official implementation and could be used to track performance over versions.

It seems sensible to investigate BenchmarkDotNet as a standard way to measure performance. It should also reduce the amount of code to maintain in this repository.