openobservability / openmetrics

Evolving the Prometheus exposition format into a standard.

Home Page: https://openmetrics.io

License: Apache License 2.0

Makefile 3.63% HTML 4.92% Go 89.05% Sass 1.68% Dockerfile 0.27% Python 0.35% Ruby 0.10%

openmetrics's Introduction

OpenObservability

This is the umbrella repository for everything within https://github.com/OpenObservability/

openmetrics's People

beorn7 leecalcote manolama pauldix mattbostock tomwilkie opsnull raghu999 pkanna000 curlup atyronesmith lulzzz hbs caniszczyk enterstudio pilhuhn richih sunfriendli mailsahu bt-dschleich simonz130 blyzer etsangsplk d-ulyanov worldup robskillington zhaoxwen trilokgm bhathiya dellemc-k8 joewrightss zubryan shailendersinghchauhan callistoaz keyolk logicalhan dejanu bhks hellowaywewe uno-su aliceinwire xlwh hamasw beautytiger chakra-coder trv-rpeoples wawababe dystudio alexandreyang krzysion psafont timrots newtob shijuc whmountains runlevel-six jeanmachuca lining-github xxxmailk isgasho omowunmi-svg yzhuge devopstoday11 plant99 sarulon mfkiwl theneva mrueg doytsujin ellisab qiaogj1 kevinschweikert mxinden ylz-at jiamaozheng egbertw 32bitkid shakuzen asdffdsa-wq xdmxr jbristowe vipul-mehtalogy izeye jeis4wpi rodmatos clix-dev-llc marvel-works fbreckle jburgess chrischinchilla aabmass nathanawmk hacklschorsch vpranckaitis sinkingpoint moteesh-reddy angelafevi95 criteo-forks philgbr ericfortinsp

openmetrics's Issues

Consensus: Events & metrics need to be correlated by labels, but we want to accommodate a shortcut

Be explicit about scraping, streaming and push

In #11 we mention scraping, but I can imagine that someone might want to use the exposition format for streaming (or pushing) metrics.

We should consider such use cases, their implications and whether the specification should allow for them.

Whether/how to support additional data besides metrics

E.g. a trace ID for a specific quantile.

Code of conduct

We should have a code of conduct.

It'd make sense to use the CNCF Code of Conduct (as Prometheus does). If others agree, then we need a couple of volunteers to be listed as contacts.

Annotations on top of labels?

Chris Larsen would like to have annotations: labels that don't change the identity of a series.

Similar to annotations in Prometheus alerting rules (https://prometheus.io/docs/alerting/rules/).

Start timestamp for points in a cumulative timeseries

Continuing the discussion from this week's meeting. This is not restricted to COUNTER, since histograms can also be cumulative (which is another discussion -- there is a difference between the "type" (int, histogram, etc.) and the "kind" (gauge, cumulative) of a metric).

In this model a cumulative point is a tuple (t_start, t, v) where t is the time when value v was sampled and t_start is when the timeseries had value 0.
The motivation for the start timestamp is that it completely describes the cumulative point and does not assume anything about when the last point was collected. This guards against inaccuracies caused by the monitoring backend losing points or by the monitored source restarting (or clearing its metric state to save memory), and it allows metrics to be pushed to the monitoring backend.

For example, consider a source generating counter points at 0min, 10min and 20min with values 5, 10 and 15. It then crashes, restarts at 57min, and reports a point with value 17 at 60min. We don't want to incorrectly assume that the counter increased from 15 to 17 over the interval [20min, 60min]. With a perfect failure detector (which of course doesn't exist), a backend could narrow that interval to [50min, 60min], but that is still less accurate than the real t_start=57min, which is easy for the source to report.
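A minimal Go sketch of how a backend could use t_start on the example above, assuming times are plain minutes. `CumulativePoint` and `Increase` are hypothetical names, not part of any spec or library, and the first process's t_start is not given in the example, so -5min is only a placeholder.

```go
package main

import "fmt"

// CumulativePoint is a hypothetical encoding of the (t_start, t, v) tuple.
// Times are minutes, matching the worked example.
type CumulativePoint struct {
	Start float64 // t_start: when the series last had value 0
	T     float64 // t: when V was sampled
	V     float64 // cumulative value at T
}

// Increase returns how much the counter grew and the interval that growth is
// known to cover. A changed Start means the source reset in between, so all
// of cur.V accrued over [cur.Start, cur.T]; nothing is attributed to the gap
// between prev.T and cur.Start.
func Increase(prev, cur CumulativePoint) (delta, from, to float64) {
	if cur.Start != prev.Start {
		return cur.V, cur.Start, cur.T
	}
	return cur.V - prev.V, prev.T, cur.T
}

func main() {
	beforeCrash := CumulativePoint{Start: -5, T: 20, V: 15} // -5 is a placeholder t_start
	afterRestart := CumulativePoint{Start: 57, T: 60, V: 17}
	d, from, to := Increase(beforeCrash, afterRestart)
	fmt.Printf("increase %g over [%gmin, %gmin]\n", d, from, to)
}
```

Without t_start, the same two points would be indistinguishable from a counter that crept from 15 to 17 over [20min, 60min].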

Staleness markers / implications of missing time series

For TSDBs that support a concept of staleness, it should be possible to mark a time series as stale, i.e. no longer present.

Either stale or removed would probably make sense as the marker.

Specify process_* ?

https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics is specified for Prometheus client libraries. Do we want to copy this into this spec?

Specify 0..1 vs 0..100 percentages

Current BCP is the former as it's easier to calculate with; we should make this explicit in the format.

Add considerations about metric evolution?

@sumeer said in https://github.com/RichiH/OpenMetrics/issues/2#issuecomment-316855917

"There is still the issue that different versions of the same library may have incompatible metric types if the developer is not careful, but that should be less common, especially if the standard provides guidelines on how users should evolve their metric (such guidelines are beneficial even in a backend that doesn't enforce the type at data ingestion time, for queries to work sanely across library versions)."

Specify spelling of non-real numbers

The spelling of NaN, +inf and -inf varies across languages. We should specify which spellings are acceptable, probably all variants.

Specify metadata extendability

Elsewhere we're talking about enum and bool metadata, and there'll probably be others.

We should probably specify if additional metadata fields are permitted, what to do if you come across them, what names are reserved etc.

Specify the case of empty metrics

It is unclear how empty MetricFamilies should be handled. See prometheus/common#50 for discussion.

This issue is about deciding whether they should be allowed, and then documenting the behavior.

prometheus/docs#544

Specify suffixes convention

Presently there is a Prometheus convention that all Counter names should end in _total.
This should probably be a SHOULD, with verbiage around enforcement.

We should also mention that _sum, _count and _bucket are reserved suffixes for their respective types and should not be used elsewhere.
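A Go sketch of how a linter might apply the proposed conventions, reporting SHOULD-level problems as warnings rather than errors. It assumes lowercase TYPE values such as "counter" and "gauge"; `suffixWarnings` is a hypothetical name, and treating the reserved suffixes as off-limits for every family name is one reading of "should not be used elsewhere".

```go
package main

import (
	"fmt"
	"strings"
)

// reservedSuffixes are generated for histogram/summary sub-series, so under
// the proposal no metric family name should end in them.
var reservedSuffixes = []string{"_sum", "_count", "_bucket"}

// suffixWarnings checks a metric family name against the proposed
// conventions. typ is a TYPE-line value such as "counter" or "gauge".
func suffixWarnings(name, typ string) []string {
	var warnings []string
	if typ == "counter" && !strings.HasSuffix(name, "_total") {
		warnings = append(warnings, name+": counter names SHOULD end in _total")
	}
	for _, s := range reservedSuffixes {
		if strings.HasSuffix(name, s) {
			warnings = append(warnings, name+": "+s+" is a reserved suffix")
		}
	}
	return warnings
}

func main() {
	for _, m := range [][2]string{
		{"http_requests", "counter"},
		{"http_requests_total", "counter"},
		{"request_latency_sum", "gauge"},
	} {
		fmt.Println(m[0], suffixWarnings(m[0], m[1]))
	}
}
```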

Create a Glossary

Since different platforms and users use different terminology, we need an OpenMetrics glossary that defines exactly what each term means.

How to deal with duplicate metrics

prometheus/docs#547

Histograms as first-class metrics

@beorn7 do you want to write down / link your braindump in here? We should discuss this and make a deliberate decision either way.

Include base unit names in metric names?

prometheus/docs#553

I.e. should we recommend, as a SHOULD, that people append _seconds, _bits, etc.?

How to deal with inconsistent timestamps in text format

prometheus/docs#552

Sub-second timestamps

This is already decided; this issue merely documents the TODO.

Clarify what metrics are vs. event logging and other systems

Even in our own discussions we're getting confused about what is and is not in scope for a metrics-based transport format. I expect this to continue to be an issue (it still is within the Prometheus ecosystem, even after a few years of being quite clear about it).

I think we should have an opening section covering the idea that metrics are regular snapshots of state, not statsd-style samples, profiling, or other event-logging-style data.

Termination marker for scrapes

prometheus/docs#546

Support for geo-time series metrics?

Paul briefly touched on it, but there is a huge amount of interest in geo-time series info, i.e. numeric values with coordinates. We should be forward-thinking and account for this. A first-class data type à la histograms would probably be fine.

Reserve a valid character which should not be used by direct instrumentation: `:`

The colon is intended for a naming convention for aggregations on the Prometheus side (recording rules), not for instrumentation. That is to say, it's reserved for users.

We should specify this, as it's not uncommon to see instrumentation use it where an underscore should have been used.
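A Go sketch of the distinction: the first regular expression is the Prometheus metric-name grammar, which allows ':'; the second is the same grammar with ':' removed, which is what direct instrumentation would be limited to if the colon is reserved. The variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// metricNameRE is the Prometheus metric-name grammar, which permits ':'.
var metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// instrumentedNameRE is the same grammar without ':', i.e. what direct
// instrumentation would be limited to if the colon is reserved for users.
var instrumentedNameRE = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

func main() {
	for _, name := range []string{
		"http_requests_total",      // ordinary instrumentation
		"job:http_requests:rate5m", // recording-rule style, left to users
		"http:requests_total",      // instrumentation misusing ':'
	} {
		fmt.Printf("%-26s valid=%-5t ok-for-instrumentation=%t\n",
			name, metricNameRE.MatchString(name), instrumentedNameRE.MatchString(name))
	}
}
```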

Specify grouping/ordering of timeseries in metrics

Currently Prometheus only specifies that histogram buckets must be increasing.

I think we should further specify that a given metric (i.e. same labels) must be together, as that makes things easier for those who support first-class metrics.
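A Go sketch of a check for this rule, assuming a parser has already reduced each line to its family, the labels identifying the child (excluding "le"/"quantile"), and its sub-series suffix. `Sample` and `firstOutOfGroup` are hypothetical names.

```go
package main

import "fmt"

// Sample is a simplified exposition line: the metric family, the labels that
// identify the child (excluding "le"/"quantile"), and the sub-series suffix.
type Sample struct {
	Family string
	Child  string // canonicalised label string, e.g. `{code="200"}`
	Suffix string // "_bucket", "_sum", "_count", or "" for simple types
}

// firstOutOfGroup returns the index of the first sample whose child already
// appeared earlier but was interrupted by another child, or -1 if every
// child's samples are contiguous.
func firstOutOfGroup(samples []Sample) int {
	seen := make(map[string]bool)
	prev := ""
	for i, s := range samples {
		key := s.Family + s.Child
		if key != prev && seen[key] {
			return i
		}
		seen[key] = true
		prev = key
	}
	return -1
}

func main() {
	interleaved := []Sample{
		{"rpc_duration_seconds", `{code="200"}`, "_bucket"},
		{"rpc_duration_seconds", `{code="500"}`, "_bucket"},
		{"rpc_duration_seconds", `{code="200"}`, "_sum"}, // the 200 child is split up
	}
	fmt.Println(firstOutOfGroup(interleaved))
}
```

Requiring contiguity lets a consumer that builds first-class histograms finish one child before starting the next, instead of buffering the whole exposition.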

Specify whitespace handling in text format

prometheus/docs#543

New types

Add type support for values?

UINT64 seems like a given
bool can be represented as a plain number (0/1); Prometheus is good at compressing such values
- This is implementation-specific for Prometheus, so we might want to have the type nonetheless
- bools can be cast as ENUM if people want to see true/false on the UI
Strings? Will be linked to new issue by Sumeer
ENUMs: https://github.com/RichiH/OpenMetrics/issues/3

Document that NaN does not mean "missing"

This has been a confusion a few times in Prometheus, as we're one of the very few systems that support non-real floating point values. NaN should only be used where it makes mathematical sense.

Specify how to handle missing time series in metrics

We know Summaries will permit missing quantiles.

We should also specify what happens if a Summary is missing _count/_sum or if a Histogram is missing _sum as that can happen with some other monitoring systems. I believe we should not expose them. It should also be made clear that NaN should only be used where it makes mathematical sense, not as a general "missing" signal.

Define what are valid characters in metric names and labels

From #28 we discussed that not all characters are valid in metric names.
We should define the exact set of characters that are allowed in metric names.

Similarly for label names.
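As a starting point, a Go sketch using the Prometheus label-name grammar, which, unlike the metric-name grammar, does not allow ':'. Prometheus also reserves label names beginning with "__" for internal use; whether this spec adopts the same rule is open. `checkLabelName` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelNameRE is the Prometheus label-name grammar (no ':' allowed).
var labelNameRE = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// checkLabelName rejects names outside the grammar and, following
// Prometheus, names beginning with "__", which are reserved for internal use.
func checkLabelName(name string) error {
	if !labelNameRE.MatchString(name) {
		return fmt.Errorf("label name %q contains invalid characters", name)
	}
	if strings.HasPrefix(name, "__") {
		return fmt.Errorf("label name %q uses the reserved __ prefix", name)
	}
	return nil
}

func main() {
	for _, n := range []string{"status_code", "status-code", "1st", "__name__"} {
		fmt.Println(n, checkLabelName(n))
	}
}
```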

Specify precisely the semantics of empty label values

prometheus/docs#548

Namespacing

Do we need namespacing? If yes, what form should it take?

Schemes
- Java scheme - works, but long
- "Play nice" - Won't work
- Current Prometheus style, i.e. prefix your library name (grpc_, snmp_, etc)
Possible formats (Java as example)
- metric{__namespace__="io.prometheus.foo.bar..."} 1
- io.prometheus.foo.bar.metric{} 1

Exposition format: charset in content-type

text/plain; version=0.0.4 should probably be extended by the charset: text/plain; version=0.0.4; charset=utf-8. Also, we should clarify that the charset is always UTF-8, even if the text format is used as a fallback for an incomplete or unknown content-type.

prometheus/docs#557
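A Go sketch of what "always UTF-8" could mean for a consumer, using the standard library's `mime.ParseMediaType`: an explicit utf-8 charset, an absent charset, and an unparseable Content-Type (text format as fallback) all decode as UTF-8, while an explicitly different charset is rejected. Rejecting rather than ignoring that case is an assumption; `textFormatCharset` is a hypothetical name.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// textFormatCharset returns the charset to decode a text-format body with.
func textFormatCharset(contentType string) (string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "utf-8", nil // incomplete/unknown Content-Type: fallback is still UTF-8
	}
	cs, ok := params["charset"]
	if !ok || strings.EqualFold(cs, "utf-8") {
		return "utf-8", nil
	}
	return "", fmt.Errorf("charset %q not allowed: the text format is always UTF-8", cs)
}

func main() {
	for _, ct := range []string{
		"text/plain; version=0.0.4; charset=utf-8",
		"text/plain; version=0.0.4",
		"",
		"text/plain; version=0.0.4; charset=iso-8859-1",
	} {
		cs, err := textFormatCharset(ct)
		fmt.Printf("%q -> %q, %v\n", ct, cs, err)
	}
}
```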

Handle/mark unknown kind of data

There was discussion about renaming this, but I forget the conclusion.

Allow leading/trailing whitespace in label values?

It's arguably never a good idea to do this; on the other hand, this can have implications for #14 and possibly others.

To be precise and reproducible, I am somewhat leaning towards "SHOULD NOT expose leading/trailing whitespace" and "ingestor MUST NOT drop leading/trailing whitespace". But such strong wording raises the question of storage, UIs, etc. that cannot properly handle leading/trailing whitespace.

It might be best to prohibit leading/trailing whitespace, along with the implications for #14.

Unify escaping

Currently there are slightly different escaping rules for label values and HELP strings. Implementors are likely to get this wrong, so we should unify on a single approach.

prometheus/docs#550
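In the 0.0.4 text format, HELP text escapes backslash and newline, while label values additionally escape the double quote. A Go sketch of one candidate unification, applying the label-value rules everywhere via the standard library's `strings.NewReplacer`; the choice of rules is an assumption, not a decision.

```go
package main

import (
	"fmt"
	"strings"
)

// escaper applies one scheme to both HELP text and label values:
// backslash, newline and double quote are all escaped.
var escaper = strings.NewReplacer(`\`, `\\`, "\n", `\n`, `"`, `\"`)

func escape(s string) string { return escaper.Replace(s) }

func main() {
	fmt.Println("# HELP demo_info " + escape("First line\nsecond \"quoted\" line"))
	fmt.Println(`demo_info{path="` + escape(`C:\temp`) + `"} 1`)
}
```

A single replacer also means one unescaping routine on the parsing side, which is where the differing rules are most easily confused.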

Should unit be a separate field in the datamodel?

...and potentially appended to the metric name in the text format?

Alternatively should it just be part of the metric name?

Extend CONTRIBUTORS.md

If you're reading this, there's a non-zero chance that you should add yourself so people know who's who.

Multiple values for one label?

We don't really want that, but we should have a place to point as to why we didn't do this.

Specify inconsistent labels

Within a given metric, different children might have different labels. While this is wrong for direct instrumentation, it can be valid once target labels are involved. Thus this must be allowed for in the format, but some words of warning would be good.

Reserved label names? __name__ comes to mind.

Decision about protobuf vs. text format

Do we want to specify both or only one of them? If the latter, should the other be actively discouraged or just excluded from the spec?

@beorn7's essay as a starter for the discussion: https://github.com/RichiH/OpenMetrics/blob/master/protobuf_vs_text.md

Specify label ordering

Currently there's no specification around the ordering of labels in exposition. However, in practice client libraries have a consistent ordering due to unit tests.

Optimisations in Prometheus 2.0 require a consistent ordering, so we should encourage that to be the case.
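A Go sketch of one way an exposer could guarantee a consistent ordering: sort label names before rendering, so the output doesn't depend on Go's randomized map iteration. Sorting by name is one natural choice, not a spec decision; `formatLabels` is a hypothetical name, and values are assumed to be already escaped.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatLabels renders a label set sorted by label name, so the same set
// always produces the same bytes regardless of map iteration order.
// Values are assumed to be already escaped.
func formatLabels(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, name+`="`+labels[name]+`"`)
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	fmt.Println("http_requests_total" + formatLabels(map[string]string{
		"path": "/", "method": "GET", "code": "200",
	}) + " 1027")
}
```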

Specify that nulls are permitted UTF-8 values

This has caught us off guard once or twice, so it should be mentioned explicitly.

We should also probably mandate validation of UTF-8 label values.
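A Go sketch of both points using the standard library's `unicode/utf8`: a NUL byte is the valid code point U+0000 and passes validation, while a stray Latin-1 byte does not. `validateLabelValue` is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validateLabelValue accepts any valid UTF-8, including NUL (U+0000), and
// rejects byte sequences that are not UTF-8.
func validateLabelValue(v string) error {
	if !utf8.ValidString(v) {
		return fmt.Errorf("label value %q is not valid UTF-8", v)
	}
	return nil
}

func main() {
	fmt.Println(validateLabelValue("a\x00b"))  // NUL byte: valid UTF-8
	fmt.Println(validateLabelValue("caf\xe9")) // Latin-1 'é' byte: invalid UTF-8
}
```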

ENUMs

Do we want ENUMs, mapped against the numeric value of the time series?

If yes, we need to make this globally unique to avoid clashes (might be solved by namespacing as per https://github.com/RichiH/OpenMetrics/issues/2)

metric_enum{enum="1:foo,2:bar,3:baz"} 2
metric_enum{enum="foo"} 1, metric_enum{enum="bar"} 0 …
- Have a "# ENUM enum" metadata to hint this to storage
metric_enum{enum_1="foo",enum_2="bar",enum_3="baz"} 1
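A Go sketch of how an exposer could render the one-sample-per-state option (metric_enum{enum="foo"} 1, metric_enum{enum="bar"} 0, ...), which needs no special parsing on the consumer side. `exposeEnum` is a hypothetical helper, the metric and label names are the illustrative ones from the list, and state names are assumed not to need escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// exposeEnum renders one sample per possible state: the current state gets 1
// and every other state gets 0.
func exposeEnum(metric, label string, states []string, current string) string {
	var b strings.Builder
	for _, s := range states {
		v := 0
		if s == current {
			v = 1
		}
		fmt.Fprintf(&b, "%s{%s=\"%s\"} %d\n", metric, label, s, v)
	}
	return b.String()
}

func main() {
	fmt.Print(exposeEnum("metric_enum", "enum", []string{"foo", "bar", "baz"}, "foo"))
}
```

The trade-off versus the packed "1:foo,2:bar,3:baz" form is more series per enum in exchange for states that can be selected and aggregated with ordinary label matching.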