Support for Prometheus quantiles when using @Timed

micrometer-metrics commented on August 18, 2024

Support for prometheus quantiles when using @Timed

from micrometer.

Comments (10)

jkschneider commented on August 18, 2024

With the current implementation of CKMS that seems widespread (literally copied from place to place, including Prometheus and Netflix), the quantiles are updated after each 500-sample batch. So how often the reported values change depends largely on how many samples arrive within a particular time window.

But the flatness of the CKMS line in this graph just shows how CKMS is more stable and accurate at the expense of computational cost. I imagine that if you want an alert on one of these quantiles, you would set the threshold high enough that the perturbation in a successive-approximation approach like Frugal is probably irrelevant.

At any rate, feel free to play with the sample from which this graph was generated.

jkschneider commented on August 18, 2024

Prometheus uses CKMS, but we can evaluate the instrumentation cost of a range of quantile algorithms against this. Netflix also uses Frugal2U for load balancing. Great illustration of the effect over time here.

jkschneider commented on August 18, 2024

Frugal is substantially faster but, as a successive-approximation algorithm, it benefits from good initial estimates, which let it converge sooner. There is a legitimate question about how to arrive at those initial estimates.

Benchmark                              Mode     Cnt        Score      Error   Units
QuantilesBenchmark.ckmsQuantiles       avgt      30     1233.518 ±  105.064   ns/op
QuantilesBenchmark.frugal2uQuantiles   avgt      30       82.686 ±    2.720   ns/op
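
To make the "successive approximation" point concrete, here is a minimal sketch of the simpler Frugal-1U variant (not the Frugal2U implementation that was benchmarked, and not Micrometer code). The class name `FrugalQuantile`, the fixed step of 1, the seeded `Random`, and the `run` driver are all assumptions made for illustration.

```java
import java.util.Random;

/** Frugal-1U style estimator: a single number of state, nudged by a fixed step of 1. */
class FrugalQuantile {
    private final double quantile;  // target quantile in (0, 1), e.g. 0.5 for the median
    private final Random random;
    private double estimate;        // starts at the caller's initial guess

    FrugalQuantile(double quantile, double initialEstimate, Random random) {
        this.quantile = quantile;
        this.estimate = initialEstimate;
        this.random = random;
    }

    void observe(double value) {
        double r = random.nextDouble();
        if (value > estimate && r > 1 - quantile) {
            estimate += 1;  // step up with probability q when the sample is above the estimate
        } else if (value < estimate && r > quantile) {
            estimate -= 1;  // step down with probability 1 - q when the sample is below it
        }
    }

    double estimate() {
        return estimate;
    }

    /** Hypothetical driver: feeds n values spread evenly over 0..999 and returns the median estimate. */
    static double run(double initialEstimate, int n) {
        FrugalQuantile q = new FrugalQuantile(0.5, initialEstimate, new Random(42));
        for (int i = 0; i < n; i++) {
            q.observe((i * 7919L) % 1000);  // 7919 is coprime to 1000, so every value 0..999 recurs evenly
        }
        return q.estimate();
    }

    public static void main(String[] args) {
        System.out.println("start at 0,   100 samples:     " + run(0, 100));
        System.out.println("start at 500, 100 samples:     " + run(500, 100));
        System.out.println("start at 0,   100,000 samples: " + run(0, 100_000));
    }
}
```

Because each observation moves the estimate by at most one step, a start at 0 cannot get past 100 after 100 samples even though the true median is near 500, while a start near 500 is already in the right neighbourhood. That is the initial-estimate problem in miniature.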

checketts commented on August 18, 2024

I'm not familiar with CKMS/Frugal. Are those algorithms? If the output is pretty much identical, is it something worth pulling upstream?

I know a common caveat about using Summaries in Prometheus concerns the potential 'cost' of the client-side calculations.

Very cool to see the substantial difference.

jkschneider commented on August 18, 2024

The average-time benchmark does hide one somewhat pernicious characteristic of CKMS, which is the effect of the algorithm's "batch observations and calculate every so often" approach on worst-case performance. The implementation in Netflix Ocelli and Prometheus batches 500 observations before computing a result. The p0.999 samples show the effect of this:

Benchmark                                    Mode                  Score         Units
QuantilesBenchmark.ckmsQuantiles:p0.999      sample          1435648.000         ns/op
QuantilesBenchmark.frugal2uQuantiles:p0.999  sample             6693.648         ns/op

Yikes.
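
The shape of that worst case can be sketched without any CKMS math. The following is an illustration only: `BatchedQuantiles` is a hypothetical class, the sorted list is a stand-in for the compressed CKMS summary, and reads here never trigger a flush, which real implementations may do differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Buffers observations and folds them into the summary once per batch. */
class BatchedQuantiles {
    static final int BATCH_SIZE = 500;  // the batch size quoted in the thread

    private final double[] buffer = new double[BATCH_SIZE];
    private int buffered = 0;
    private final List<Double> summary = new ArrayList<>();  // stand-in for the CKMS summary
    int flushes = 0;

    void observe(double value) {
        buffer[buffered++] = value;  // cheap path: 499 of every 500 calls stop here
        if (buffered == BATCH_SIZE) {
            flush();                 // expensive path: the unlucky 500th caller pays for the whole batch
        }
    }

    private void flush() {
        for (int i = 0; i < buffered; i++) {
            summary.add(buffer[i]);
        }
        Collections.sort(summary);   // placeholder for the insert-and-compress work
        buffered = 0;
        flushes++;
    }

    /** Quantile over flushed observations only; NaN until the first batch has been flushed. */
    double quantile(double q) {
        if (summary.isEmpty()) {
            return Double.NaN;
        }
        int index = (int) Math.min(summary.size() - 1, Math.floor(q * summary.size()));
        return summary.get(index);
    }

    public static void main(String[] args) {
        BatchedQuantiles quantiles = new BatchedQuantiles();
        for (int i = 0; i < 1500; i++) {
            quantiles.observe(i);
        }
        System.out.println("flushes: " + quantiles.flushes + ", median: " + quantiles.quantile(0.5));
    }
}
```

In this sketch the reported quantile also only moves when a batch is flushed, so at low sample rates the value stays flat between flushes.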

jkschneider commented on August 18, 2024

An example of both algorithms at work simultaneously on two summaries recording the same values.

checketts commented on August 18, 2024

The flat nature of CKMS seems to imply it lags.

If you don't mind, I would be interested to see how long it would take CKMS to actually register the change if the numbers doubled at some point.

jkschneider commented on August 18, 2024

Ultimately, I built in 4 different quantile algorithms and selected the GK-based sliding window algorithm as the underlying implementation for @Timed, because it requires no tuning per quantile (so it is the simplest for annotation use) and is otherwise most similar to the native Prometheus implementation, which is CKMS wrapped in a sliding window.
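
For readers unfamiliar with the "wrapped in a sliding window" part, here is a rough sketch of one common way to do it: a ring of summaries that all receive every observation, with the oldest one read and periodically reset. This is not the Micrometer or Prometheus source. `SlidingWindowQuantiles` is a hypothetical name, the sorted list of raw values stands in for a GK or CKMS summary, and the clock is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Ring of summaries; every observation goes to all of them and reads use the oldest one. */
class SlidingWindowQuantiles {
    private final List<List<Double>> ring = new ArrayList<>();  // each list stands in for a GK/CKMS summary
    private final long rotateEveryMillis;  // assumes windowMillis >= buckets
    private int current = 0;               // index of the oldest bucket, the one that is read
    private long lastRotateMillis;

    SlidingWindowQuantiles(long windowMillis, int buckets, long nowMillis) {
        for (int i = 0; i < buckets; i++) {
            ring.add(new ArrayList<>());
        }
        this.rotateEveryMillis = windowMillis / buckets;
        this.lastRotateMillis = nowMillis;
    }

    void observe(double value, long nowMillis) {
        rotate(nowMillis);
        for (List<Double> bucket : ring) {
            bucket.add(value);
        }
    }

    /** Quantile over roughly the last window; NaN if nothing was observed in that time. */
    double quantile(double q, long nowMillis) {
        rotate(nowMillis);
        List<Double> values = new ArrayList<>(ring.get(current));
        if (values.isEmpty()) {
            return Double.NaN;
        }
        Collections.sort(values);
        int index = (int) Math.min(values.size() - 1, Math.floor(q * values.size()));
        return values.get(index);
    }

    /** Resets the oldest bucket and advances once per elapsed rotation interval. */
    private void rotate(long nowMillis) {
        while (nowMillis - lastRotateMillis >= rotateEveryMillis) {
            ring.get(current).clear();
            current = (current + 1) % ring.size();
            lastRotateMillis += rotateEveryMillis;
        }
    }

    public static void main(String[] args) {
        SlidingWindowQuantiles window = new SlidingWindowQuantiles(60_000, 3, 0);
        for (int i = 1; i <= 100; i++) {
            window.observe(i, 0);
        }
        System.out.println("median at t=0s:   " + window.quantile(0.5, 0));
        window.observe(1000, 30_000);
        System.out.println("median at t=60s:  " + window.quantile(0.5, 60_000));
        System.out.println("median at t=200s: " + window.quantile(0.5, 200_000));
    }
}
```

The appeal for annotation use is that nothing here is tuned per quantile: the window length and bucket count are the only knobs, and old observations age out on their own.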

sbilello commented on August 18, 2024

If you want to know the number of requests in a certain period of time, how can you achieve that with the given _sum and _count metrics?
I was looking to apply the increase function correctly: https://www.innoq.com/en/blog/prometheus-counters/

checketts commented on August 18, 2024

Please ask this on Stack Overflow.
