open-telemetry / opentelemetry-proto
OpenTelemetry protocol (OTLP) specification and Protobuf definitions
Home Page: https://opentelemetry.io/docs/specs/otlp/
License: Apache License 2.0
Will we need, at some point, a dictionary of standard attribute names, so that the attributes collection can use indexes instead of names? It could save a lot of bandwidth, but it would require changing the key type from `string` to `AttributeKey`.
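For illustration, such a key type could be sketched like this (hypothetical; the `AttributeKey` message and its field names are not part of the current proto):

// Hypothetical sketch: a key that is either a literal attribute name
// (as today) or an index into a dictionary of standard attribute
// names agreed upon out of band.
message AttributeKey {
  oneof key {
    string name = 1;   // literal attribute name
    uint32 index = 2;  // index into the standard-name dictionary
  }
}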
The Trace part of the protocol is currently declared Beta. According to our maturity definitions this means we can make breaking changes at most once every 3 months.
Breaking changes in the APIs once every 3 months are annoying but can be reasonably dealt with. You can choose your time and update each application that uses the API to the newer version individually when you want.
Unfortunately, the impact of a breaking change in the protocol is much higher. An incompatible protocol change in a deployed fleet of apps and Collectors is a headache: it requires coordinated, simultaneous updating of all participants of the network, which in a large deployment can be very difficult or practically impossible.
This impacts the adoption of OTLP. I have been told by end users that they want to use OTLP because of its benefits but "Beta" guarantees are a showstopper.
I would like us to declare the Traces portion of the protocol "Stable". The last breaking change we made to the Traces portion was 3 months ago, so there is not much churn happening there.
There is currently one outstanding breaking change: #150. We need to make a decision on it (either accept or reject). After that, I suggest we declare the Trace part of the protocol "Stable".
We will still be able to improve OTLP, however only in a backwards-compatible manner (which is still pretty good, given Protobuf's nice handling of newly added fields). Any truly incompatible additions will have to be implemented using versioning and graceful fallbacks. This slightly complicates the work but is not impossible to handle. Given that frequent breaking changes to the protocol are not anticipated, I think this is an acceptable cost to pay to encourage OTLP adoption.
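For illustration, this is the kind of change Protobuf tolerates well (a hypothetical message, not from this repo): decoders generated from the old schema simply skip unknown fields.

// Hypothetical message. Field 2 is a backwards-compatible addition:
// old decoders, which only know field 1, skip the unknown field and
// keep working; old encoders simply never set it.
message Status {
  int32 code = 1;       // existing field, unchanged
  string message = 2;   // newly added field with a new tag number
}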
Note: this only applies to the Traces part of the protocol. Metrics are still under active development and will stay in "Alpha" for now.
I would like to understand what people think about this proposal.
There was no feedback on which scenarios would be affected by #90.
Do we need this field in the proto at all, since it's not in the spec?
Similar to https://pypi.org/project/opencensus-proto/
See #174: no code owner reviews were requested automatically. Maybe something needs to be changed in the repository settings?
The current process requires a local install of protoc, which is far from ideal. Consider creating a stable Docker image with the build tools, similar to https://github.com/jaegertracing/docker-protobuf.
I'm trying to update to the latest proto and I'm having a hard time with the oneof value for AnyValue and testing it. Could you please correct me if I'm wrong?
Say I have a span with the following attributes:
const span = {
  attributes: [{
    myParam: 'foo'
  }]
}
In the previous version the attribute looked like this:
const span = {
  attributes: [{
    key: 'myParam',
    stringValue: 'foo'
  }]
}
What should the span look like now? I tried all the different shapes below; nothing works except the old format.
{
  key: 'myParam',
  value: {
    stringValue: "foo"
  }
}
Error: proto: wrong wireType = 2 for field Type
{
  key: 'myParam',
  stringValue: 'foo'
}
No error, but the value is not present in Zipkin.
{
  key: 'myParam',
  type: 0, // or 'STRING'
  stringValue: 'foo'
}
No error, but the value is not present in Zipkin.
{
  key: 'myParam',
  value: 'foo'
}
Error: proto: wrong wireType = 2 for field Type
{
  key: 'myParam',
  type: 0, // 'STRING'
  value: 'foo'
}
Error: proto: wrong wireType = 2 for field Type
{
  value: {
    key: 'myParam',
    stringValue: 'foo'
  }
}
Error: proto: wrong wireType = 2 for field Type
To me it looks like the new format is not available yet; only the old one is accepted, and even then the tag value is not shown in Zipkin.
And lastly I tried this:
span.setAttribute('mapAndArrayValue', [
  0, 1, 2.25, 'alicja', {
    foo: 'bar',
    baz: 'bara',
    array: [1, 2, 'aaa']
  }
]);
It was accepted, but in Zipkin only the tag name was shown:
'attributes': [{
  'key': 'key',
  'stringValue': 'value'
}, {
  'key': 'mapAndArrayValue',
  'arrayValue': {
    'valuesList': [{ 'doubleValue': 0 }, { 'doubleValue': 1 }, { 'doubleValue': 2.25 }, { 'stringValue': 'alicja' }, {
      'kvlistValue': {
        'valuesList': [{
          'key': 'foo',
          'stringValue': 'bar'
        }, { 'key': 'baz', 'stringValue': 'bara' }, {
          'key': 'array',
          'arrayValue': { 'valuesList': [{ 'doubleValue': 1 }, { 'doubleValue': 2 }, { 'stringValue': 'aaa' }] }
        }]
      }
    }]
  }
}]
So the main question: does the oneof in fact change nothing, or is the new format not fully implemented yet? And why are the tag values not shown in Zipkin?
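For reference, the shape the new format expects follows from the attribute definitions in the common proto, paraphrased below (double-check the field numbers against the generated code you are actually using):

// Paraphrased from opentelemetry/proto/common/v1/common.proto.
message KeyValue {
  string key = 1;
  AnyValue value = 2;
}

message AnyValue {
  oneof value {
    string string_value = 1;
    bool bool_value = 2;
    int64 int_value = 3;
    double double_value = 4;
    ArrayValue array_value = 5;
    KeyValueList kvlist_value = 6;
  }
}

message ArrayValue {
  repeated AnyValue values = 1;
}

message KeyValueList {
  repeated KeyValue values = 1;
}

So { key: 'myParam', value: { stringValue: 'foo' } } is the intended new shape; if that produces the wireType error, the most likely explanation is that the sender and receiver were generated from different revisions of the proto.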
Both OpenCensus and Jaeger have the concept of link type. Not having this concept in the proto makes it difficult to perform lossless translation between the formats, which is one of the use cases for this proto.
Lack of a sampled flag in the Span type results in ambiguity. It implies that non-sampled traces are dropped at the instrumentation level and not forwarded through the system. If using tail-sampling decisions, this means all traces need to be marked sampled up front. In addition, it leaves no flexibility for different routing of traces. For example, routing all traces to a logging exporter and only sampled ones to a full-scale backend.
In addition, other protocols such as Zipkin have a sampled flag. The lack of this data makes it impossible to preserve full fidelity when converting between protocols.
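A minimal sketch of the kind of field being asked for (the field name and tag number are hypothetical, not part of the current proto):

message Span {
  // ... existing fields elided ...

  // Hypothetical: carries the upstream sampling decision so that
  // non-sampled spans can still flow through the pipeline and be
  // routed differently (e.g. to a logging exporter only).
  bool sampled = 32;
}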
The traces proto currently contains `InstrumentationLibrarySpans`, which does not have clearly defined semantics. `InstrumentationLibrarySpans` may contain a number of spans, all associated with an `InstrumentationLibrary`. The nature of this association is not clear.
The `InstrumentationLibrary` has a `name` and a `version`. It is not clear whether these fields are part of Resource identity or are attributes of a Span. Presumably they should be interpreted as attributes of the Span.
I am not aware of any other trace protocols or backends that have an equivalent of the `InstrumentationLibrary` concept. However, ultimately all span data produced by OpenTelemetry libraries will end up in a backend, and the `InstrumentationLibrary` concept must be mapped to an existing concept. Span attributes seem to be the only concept that fits the bill. Using attributes from the start of the collection pipeline removes the need to deal with `InstrumentationLibrary` in all codebases that need to make a mapping decision (Collector, backend ingest points, etc.).
Proposal 1: remove the `InstrumentationLibrary` message type from the protocol and add semantic conventions for recording the instrumentation library in Span attributes.
The benefits of this approach over using `InstrumentationLibrary` are the following:
There is no need for a new concept and new message type at the protocol level. The concept adds unnecessary complexity to all codebases that need to read and write traces but don't care about the instrumentation library (likely the majority of codebases).
It uses the general concept of attributes, which already exists and is well understood, and by doing so makes the semantics of the instrumentation library name clear.
There is potentially a small downside: using the `InstrumentationLibrary` message may be more efficient to encode/decode, especially if there are multiple spans from the same instrumentation library (see Proposal 2 for a potential solution to this).
On the other hand, for Span data that does not have an associated instrumentation library name and version, we incur the cost of having the `InstrumentationLibrarySpans` message even if it does not refer to any `InstrumentationLibrary`. We pay the cost of the concept even when we don't need it. In contrast, recording instrumentation library information in attributes using semantic conventions is zero cost for cases that do not need it.
To illustrate, here is an example. Using `InstrumentationLibrarySpans` we would need to record the following:
resource_spans:
  resource:
    ...
  instrumentation_library_spans:
    - instrumentation_library:
        name: io.opentelemetry.redis
      spans:
        - name: request
          start_time: 123
    - instrumentation_library:
        name: io.opentelemetry.apache.httpd
      spans:
        - name: request
          start_time: 456
After removing the `InstrumentationLibrarySpans` concept we would do this:
resource_spans:
  resource:
    ...
  spans:
    - name: request
      start_time: 123
      attributes:
        - key: instrumentation.library.name
          value: io.opentelemetry.redis
    - name: request
      start_time: 456
      attributes:
        - key: instrumentation.library.name
          value: io.opentelemetry.apache.httpd
Proposal 2: rename `InstrumentationLibrarySpans` to `SpanBatch` and, instead of allowing only the instrumentation library name and version to be recorded, allow recording any span attributes at the batch level:
resource_spans:
  resource:
    ...
  span_batches:
    - common_attributes:
        - key: instrumentation.library.name
          value: io.opentelemetry.redis
      spans:
        - name: request
          start_time: 123
    - common_attributes:
        - key: instrumentation.library.name
          value: io.opentelemetry.apache.httpd
      spans:
        - name: request
          start_time: 456
This makes it possible to represent the instrumentation library, or any other attributes that are common to a batch of spans. The protocol will require that `common_attributes` and Span `attributes` do not contain the same key, to avoid expensive merging during translations.
The benefit of this approach is that instead of a special concept just for the instrumentation library, we get a generic mechanism for recording repeated span attributes efficiently (instrumentation library information being just one of the use cases).
A similar approach will also work nicely for log attributes when we add log support.
Following PR #56.
I'm glad to see that there is only one kind of Event represented here. I think we should now rename TimedEvent to Event. To me, "Timed" implies that a duration is being recorded. These are timestamped events, and since there are no longer two Event types, I think we should just call it Event.
See some discussion about this:
open-telemetry/opentelemetry-specification#43
Here is where this is defined:
https://github.com/open-telemetry/opentelemetry-proto/blob/master/src/opentelemetry/proto/trace/v1/trace.proto#L107
As discussed in #59, there is a question of how to report metrics that have been sampled. Although the metrics API currently does not specify a way to input sampled metrics, certain exporters may wish to perform sampling themselves. Here we propose a solution for representing sampled metrics in the protocol without touching on whether the API should support sampled inputs.
Each of the Int64Value, DoubleValue, SummaryValue, HistogramValue, and HistogramValue.Bucket types would support a new optional `double` field named `weight` which, if omitted, defaults to 1.0, meaning that no sampling took place.
For the Int64Value and DoubleValue types there is an implicit count of 1 already, as these represent one measured value. The other types have explicit `count` fields. In each case, the value of `weight` is logically multiplied by the count, yielding an adjusted count due to sampling.
Consumers SHOULD use the value of `count` times `weight` as the total number of metric events being reported. Consumers MUST interpret `count` as the actual number of events collected, whereas `weight` MUST be considered a count multiplier due to sampling in the metrics export pipeline.
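A minimal sketch of the proposal applied to Int64Value (the tag number is illustrative):

message Int64Value {
  fixed64 start_time_unixnano = 1;
  fixed64 timestamp_unixnano = 2;
  sfixed64 value = 3;

  // Proposed: sampling weight. An unset value (0) is read as 1.0,
  // meaning no sampling took place. The adjusted event count is
  // count * weight (count is implicitly 1 for this type).
  double weight = 4;
}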
The Metric proto currently allows representing multiple data points per metric, i.e. a timeseries with several values of the same metric, one value per timestamp.
However, reading the metric SDK proposal tells me that normally each collection will emit only a single data point per metric: the last aggregate state of the metric. There may be custom exporters which accumulate data points and emit them all in one batch, but that seems to be an exceptional case, not the norm. Typically an exporter will simply send the last aggregate value.
Another use case where we could possibly have multiple data points per metric is in Collector, where we could in theory batch points from the same metric into one timeseries. However, in practice I do not think we will do it since it is a very costly operation that requires maintaining maps of metrics.
Given that a single data point per metric is the most common case, I suggest we optimize the proto for that case. The current proto definition is the following (only int64 is shown for simplicity):
message Metric {
  MetricDescriptor metric_descriptor = 1;
  Resource resource = 2;
  repeated Int64TimeSeries int64_timeseries = 3;
}

message Int64TimeSeries {
  repeated string label_values = 1;
  repeated Int64Value points = 2;
}

message Int64Value {
  fixed64 start_time_unixnano = 1;
  fixed64 timestamp_unixnano = 2;
  sfixed64 value = 3;
}
I propose to change this to the following:
message Metric {
  MetricDescriptor metric_descriptor = 1;
  Resource resource = 2;
  repeated Int64Point int64_points = 3;
}

message Int64Point {
  repeated string label_values = 1;
  fixed64 start_time_unixnano = 2;
  fixed64 timestamp_unixnano = 3;
  sfixed64 value = 4;
}
Preliminary benchmarking shows that for the case when we have 1 int64 data point per metric this change reduces CPU and RAM usage by 25% for encoding and decoding.
Note: this schema still allows representing multiple data points per metric if necessary, although at the cost of repeating label values.
MetricDescriptor.Type enum items currently specify GAUGE and COUNTER types; however, their meaning does not match that of the specification.
The GAUGE enum value in this repository corresponds to the specification's non-monotonic Counters and Gauges.
The COUNTER enum value in this repository corresponds to the specification's monotonic Counters and Gauges.
To avoid confusion I suggest renaming enum values to the following:
enum Type {
  // Represents a monotonic int64 Counter or Gauge
  MONOTONIC_INT64 = 1;
  // Represents a monotonic double Counter or Gauge
  MONOTONIC_DOUBLE = 2;
  // Represents a non-monotonic int64 Counter or Gauge
  NONMONOTONIC_INT64 = 3;
  // Represents a non-monotonic double Counter or Gauge
  NONMONOTONIC_DOUBLE = 4;
}
Alternatively, we can have a finer-grained definition of metric type that includes 3 fields: the data type, the kind of metric as defined in the spec (i.e. Counter, Gauge, Measure), and a flag indicating monotonicity:

enum Kind { COUNTER = 0; GAUGE = 1; MEASURE = 2; }
enum DataType { INT64 = 0; DOUBLE = 1; HISTOGRAM = 2; SUMMARY = 3; }

Kind kind = 1;
DataType data_type = 2;
bool monotonic = 3;
Note that in this case certain combinations will be non-representable with the current protobuf schema, and we need to define precisely which combinations of enums and monotonicity are valid and correspond to the DataPoint types for which we have defined protobuf messages.
See the Go package gen/go/collector/traces generated from the protos, named "traces" (plural). Is this a typo? Everywhere else we use the singular "trace":
This is relevant to open-telemetry/opentelemetry-collector#574
As language SIGs add support for exporting metrics using OTLP, it is being discovered that the native formats they export from do not fit nicely, or at all, into the OTLP structure. This issue is intended to track these friction points, proposed changes, and action items toward resolution.
Ideally, OTLP would natively support the output of every combination of `Instrument`s and `Aggregator`s defined in the OpenTelemetry Specification. Currently, this is not the case.
There is no metric kind for a `MinMaxSumCount` aggregator, which outputs the minimum, maximum, sum, and count of events observed. It has been suggested that the `Summary` metric kind can be used to transport these values. The `Summary` kind does have fields for the count and sum; however, it has no dedicated fields for the maximum or minimum.
The suggested workaround is to send the maximum as the 100th percentile and the minimum as the 0th percentile. Neither is obvious, and the latter is mathematically incorrect (the 0th percentile is the value below which 0% of events occurred, whereas the minimum is the smallest value at which at least 1 event occurred).
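For illustration only, a dedicated kind could be as small as this (a hypothetical message, not part of the current proto):

// Hypothetical data point carrying the MinMaxSumCount aggregator output.
message MinMaxSumCountValue {
  fixed64 start_time_unixnano = 1;
  fixed64 timestamp_unixnano = 2;
  double min = 3;
  double max = 4;
  double sum = 5;
  uint64 count = 6;
}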
As outlined in the linked ticket:
The OTLP metric kinds are centered around instruments, not the output of aggregation:
...
GAUGE_INT64 = 1;
GAUGE_DOUBLE = 2;
GAUGE_HISTOGRAM = 3;
COUNTER_INT64 = 4;
COUNTER_DOUBLE = 5;
CUMULATIVE_HISTOGRAM = 6;
SUMMARY = 7;
The incongruence between the instruments these kinds were modeled after and the actual instruments of the OpenTelemetry Specification is secondary to the fact that they are modeled after instruments in the first place.
If the goal of OTLP is to transport the output of `Instrument` -> `Aggregator`, it should be modeled after the output of the `Aggregator`s: aggregations. And while, yes, the `Histogram` is one of these aggregations, the included histogram kinds are conflated with instrument qualifiers (`Gauge`, `Cumulative`).
A common refrain in the design of OTLP is "We are trying to also stay compatible with the OpenMetrics protocol".
This desire to remain compatible is an admirable one, but it is negatively impacting this project. OTLP, in its pursuit of easy translation to OpenMetrics, has made it difficult to translate from OpenTelemetry into it. Additionally, it has led to a design that is bloated with OpenMetrics metric kinds (`GAUGE_INT64`, `GAUGE_DOUBLE`) that OpenTelemetry does not use.
It seems that if support for the OpenMetrics protocol is desired, the implementations of the OpenTelemetry Specification should support that protocol, instead of having OTLP try to match that project's decisions.
I'm hoping this issue starts the discussion on how we can resolve these problems. I see the metrics proto as needing a major overhaul to provide first-class support for OpenTelemetry, possibly done in a `v2` of the proto.
cc @bogdandrutu, @jmacd, @jkwatson, @c24t
According to the proto documentation, histogram bucket bounds can only be positive, which makes it impossible for OpenTelemetry to be fully compatible with Prometheus, which allows negative bounds as well.
The Prometheus receiver doesn't respect this and sends negative bucket bounds down OpenTelemetry Collector pipelines anyway. This can lead to bugs, as processors/exporters can make incorrect assumptions about the data (in fact, at least the Stackdriver exporter already does).
This is a request to extend the OpenTelemetry data model to treat the first histogram bucket as `(-inf, buckets[0])` instead of `[0, buckets[0])`.
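To make the requested change concrete, here is how explicit bounds of [10, 20] would be interpreted before and after (bounds chosen arbitrarily):

// explicit bounds: [10, 20]  =>  3 buckets
//
// current model:    [0, 10)     [10, 20)   [20, +inf)   // negative values unrepresentable
// requested model:  (-inf, 10)  [10, 20)   [20, +inf)   // first bucket open below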
I think it is important to discuss this. Copying some notes from the thread. Note that the main problem is that the semantics of the field, and how it will be used, are not clearly defined.
Let's treat 0 as "unknown". It has the same practical result, in that it can't be used for validation, but it works better with an optional field.
If one simply forgot to set this property to any value, it will mean 0 children, which may be factually incorrect.
Depending on the semantics of the field and how it should be used, there may be no practical difference. Validation is only useful when this value is > 0.
The description of traceId, spanId, and parentId says they should be bytes; however, it works only if traceId is base64-encoded:
https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/trace/v1/trace.proto#L58
https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/trace/v1/trace.proto#L68
https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/trace/v1/trace.proto#L83
This is misleading, and it was hard to debug what the problem was.
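This is almost certainly the standard proto3 JSON mapping at work: bytes fields are carried as base64 strings in JSON, not as hex. A sketch of the difference, using an illustrative 16-byte trace id:

// trace.proto declares:  bytes trace_id = 1;
// the proto3 JSON mapping therefore expects base64:
//
//   "traceId": "5B8EFFF798038103D269B633813FC60C"   // hex string: rejected
//   "traceId": "W47/95gDgQPSabYzgT/GDA=="           // base64 of the same 16 bytes: accepted

Documenting this next to the id fields would avoid having to discover it on the wire.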
I have benchmarks in my personal repo https://github.com/tigrannajaryan/exp-otelproto
This contains both microbenchmarks for proto encoding/decoding and also full protocol implementations (OTLP, OpenCensus and others).
I am happy to donate this, provided that I get merging rights on this subtree of the repo. I need to be able to iterate quickly on this, and waiting for regular approvals for benchmark changes is not going to work for the upcoming plans I have for iterating on log support in OTLP. It would just slow me down too much.
If we can introduce CODEOWNERS and give me merging rights for a subdirectory where the benchmarks reside I will be glad to move it to this repo so that other people who need to work on the proto can also benchmark their changes.
If there are concerns about giving merging rights I am happy to keep the benchmarks in my repo, no problem.
In case we have any plans to introduce int64 histograms, we may want to specify the type of histogram values now.
One edge case that we should probably document: an implementation may decide to use `CUMULATIVE` but reset the `start_time` every export interval, making this type essentially a `DELTA`. What should we recommend/do in this case?
Originally posted by @bogdandrutu in #140
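A sketch of the edge case in question (timestamps illustrative):

// CUMULATIVE with start_time reset on every export:
//
//   export 1: start_time=T0, timestamp=T1, value=5   // covers [T0, T1]
//   export 2: start_time=T1, timestamp=T2, value=7   // covers only [T1, T2]
//
// Each point then covers exactly one export interval, so the stream
// behaves as a DELTA despite being labeled CUMULATIVE.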
The OpenTelemetry Span tracestate is currently a string, while it is an object in OpenCensus. If a string is to be used in this protocol, it would be good to specify more details about it.
The generated Go code should not be checked into the repository. See also #77 (comment).
Usually you should generate the files only during the build, and maybe publish an artifact with them, but never check them into source control (the same way you don't check in binary executables). I see #74 stated reasons to generate the code in this repository, and I think having the build files here is fine. But having the generated code here seems wrong.
In the draft Metrics API spec update for 0.4, I've written a section on the fact that the choice of cumulative- vs. delta-oriented export is significant. Any time a conversion has to be made from cumulative to delta, or from delta to cumulative, there is a memory requirement: the converter has to track the last value that was current in either case. For example, to turn the cumulative values 10, 25, 40 into deltas, the converter must remember the previous cumulative value in order to emit 10, 15, 15.
The use of these terms was a stumbling block for OTEP 98; filing this issue to resolve the questions raised there. The resolution is to avoid the terms cumulative and delta in the API documentation. The API terms will be "additive synchronous" (Counter and UpDownCounter) or "additive asynchronous" (SumObserver and UpDownSumObserver).
There is still validation to be done, to answer #143. Another way this question was phrased recently in gitter (thanks @jkwatson), was this:
Do we think that anyone in this SIG is in a position to write a full end-to-end description of how metrics would flow from API through to export, including OTLP, statsd, and prometheus, with the views API thrown into the mix? I'm less concerned with the SDK internals, and more with how the data inputs map to data outputs given the various instrument types and exporter flavors.
To close this issue, we should demonstrate an end-to-end metrics export pipeline that includes a memory-less client, OTLP transport, and collector-based exporters for statsd and prometheus that remember last-values in order to convert between delta- and cumulative- values during the export pipeline. Statsd exporters have to remember last-values in order to export deltas. Prometheus exporters have to remember last-values in order to export cumulatives.
Either serviceName is missing from the proto, or https://github.com/open-telemetry/opentelemetry-collector doesn't support it, or both are missing this info.
This causes, for example, the trace in Zipkin to look like this:
See comments for what is intended to be used, and decide if it is needed:
start_time_unixnano is likely going to be the same for all data points in a timeseries. Moving it to the timeseries message instead of putting it in the value (as it is now) will be a space and speed optimization.
The metrics proto currently includes `InstrumentationLibraryMetrics`, which does not have clearly defined semantics. `InstrumentationLibraryMetrics` may contain a number of metrics, all associated with one `InstrumentationLibrary`. The nature of this association is not clear.
The `InstrumentationLibrary` has a `name` and a `version`. It is not clear whether these fields are part of metric identity. For example, if I have 2 different `InstrumentationLibrary`s, each with a different `name` and both containing a `Metric` with the same `MetricDescriptor.name`, are these 2 different timeseries or the same one?
Let's have a look at an example (pardon yaml syntax):
resource_metrics:
  resource:
    ...
  instrumentation_library_metrics:
    - instrumentation_library:
        name: io.opentelemetry.redis
      metrics:
        - metric_descriptor:
            name: request.count
          int64_data_points:
            - value: 10
    - instrumentation_library:
        name: io.opentelemetry.apache.httpd
      metrics:
        - metric_descriptor:
            name: request.count
          int64_data_points:
            - value: 200
Presumably this data describes 2 different timeseries: one is the number of database requests issued to Redis, the other is the number of requests served by the Apache web server. They certainly need to be separate timeseries.
Semantically, the name of the `InstrumentationLibrary` appears to be equivalent to a metric label.
If this is true, then we get the same result by simply recording the name of the instrumentation library as a regular label on the metric, e.g.:
resource_metrics:
  resource:
    ...
  metrics:
    - metric_descriptor:
        name: "request.count"
      int64_data_points:
        - value: 10
          labels:
            - key: instrumentation.library.name
              value: io.opentelemetry.redis
        - value: 200
          labels:
            - key: instrumentation.library.name
              value: io.opentelemetry.apache.httpd
Alternatively, if grouping by metric name is difficult for the metric producer, then this data can be produced:
resource_metrics:
  resource:
    ...
  metrics:
    - metric_descriptor:
        name: "request.count"
      int64_data_points:
        - value: 10
          labels:
            - key: instrumentation.library.name
              value: io.opentelemetry.redis
    - metric_descriptor:
        name: "request.count"
      int64_data_points:
        - value: 200
          labels:
            - key: instrumentation.library.name
              value: io.opentelemetry.apache.httpd
Both of these approaches describe the data correctly using the generic labels concept, and still make it possible for interested parties to find out the sending library by comparing the label key to "instrumentation.library.name" (which should be added to our semantic conventions).
The benefits of this approach over using a dedicated `InstrumentationLibrary` concept are the following:
There is no need for a new concept and new message type at the protocol level. The concept adds unnecessary complexity to all codebases that need to read and write metrics but don't care about the instrumentation library (likely the majority of codebases).
It uses the general concept of metric labels, which already exists and is well understood, and by doing so makes the semantics of the instrumentation library name clear. The instrumentation library name is one of the labels that form the timeseries identifier.
It makes mapping to other metric protocols and backends clearer. I am not aware of any other metric protocol or backend that has an equivalent of the `InstrumentationLibrary` concept. However, ultimately all metric data produced by OpenTelemetry libraries will end up in a backend, and the `InstrumentationLibrary` concept must be mapped to an existing concept. Labels seem to be the only concept that fits the bill. Using labels from the start of the collection pipeline removes the need to deal with `InstrumentationLibrary` in all codebases that need to make a mapping or translation decision (Collector, backend ingest points, etc.).
I suggest removing the `InstrumentationLibrary` message type from the protocol and adding semantic conventions for recording the instrumentation library in metric labels.
There is potentially a minor downside: using the `InstrumentationLibrary` message may be slightly more efficient to encode/decode, especially if there are multiple metrics from the same instrumentation library, but the difference is small and I think it is not worth the complication.
If we want to retain the efficiency and deduplicate common labels, we can put the common labels in the Metric. This was previously the case: we had labels in the Metric, and they were removed due to the lack of clearly specified semantics. Assuming we clearly require that labels in the Metric don't overlap with labels in data points, we can bring back common labels in the Metric.
Currently these messages are defined in the collector directory, but they should probably live directly in the [metrics|trace].proto files, because they define the structure of the data produced by the library.
The same comment applies to Library[Spans|Metrics] if #94 gets merged.
With the newly added ResourceMetrics and ResourceSpans messages, it is not clear what the benefit of having Resource as part of the Metric/Span definition is.
Everyone, please review the current protobuf and metadata definitions and leave comments here.
#103 declared large parts of the protocol as beta, but purely cosmetic yet breaking changes like #120 were still incorporated. What definition of beta is this repository following?
(I think our official definition is a bit over the top: https://github.com/open-telemetry/community/blob/master/maturity-matrix.yaml#L43-L49)
The Metric proto currently allows Labels in 2 places: in the MetricDescriptor and in each DataPoint.
The semantics are that Labels in the MetricDescriptor apply to all DataPoints. This requires complicated and expensive merging of label maps (possibly made even more expensive by the fact that they are stored as lists, requiring either intermediary data structures or O(nm) merging) when translating from OTLP to other formats that have no equivalent notion of label keys per data point and per group of data points (e.g. OpenCensus).
I suggest that we eliminate one or the other and have Labels in one place only.
Alternatively, we can require that Labels in the MetricDescriptor and in DataPoints cannot overlap, i.e. there must not exist any entries with the same key in both places. This way, the Labels in the MetricDescriptor make it possible to avoid repeating common labels for all data points. Merging then becomes plain concatenation, and there is no need for expensive merging that ensures uniqueness of the keys.
This approach is probably less desirable, since it makes it impossible to translate a series of data points from OTLP to an OpenCensus TimeSeries: each DataPoint can have a different set of keys (not possible in OpenCensus).
However, this would still be an improvement over the current state of things if we really want to keep the dual Labels placement.
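To illustrate the non-overlap variant (label values made up):

// MetricDescriptor labels:  service=checkout
// DataPoint labels:         host=a                       // must not repeat descriptor keys
// effective labels:         service=checkout, host=a     // plain concatenation, no dedup pass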
Looking at ... vs ... it seems we need to decide on a convention to follow when naming enum values.
The `Span.tracestate` field is currently structured data that stores a fully decoded Tracestate. However, there does not seem to be a good use case for keeping this in structured form: there are no known applications that work with the elements of this data, other than passing it around whole.
I suggest eliminating the `Tracestate` message type and making `Span.tracestate` a simple string field in the w3c-trace-context format.
This has been previously discussed as part of open-telemetry/oteps#59 (comment) and decided to be addressed in a separate PR.
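A minimal sketch of the proposed field (the tag number is hypothetical); the value would be the raw w3c tracestate header, a comma-separated list of key=value pairs:

// Proposed: replace the structured Tracestate message with the raw
// w3c-trace-context header value,
// e.g. "rojo=00f067aa0ba902b7,congo=t61rcWkgMzE".
string tracestate = 15;  // hypothetical tag number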
Most entities (e.g. #48) use a common Attribute structure, but Resource calls them "labels". This introduces unnecessary cognitive overhead.
Proposal: rename them to "attributes" to be consistent across all entities.
Describe the difference between `Resource` and `Node`.
In the Java client we would like to consume this library as a git submodule and run the build directly from there. The model classes will be published to Maven Central as part of the client build (as a separate module).
I think this approach will work for all clients. This way we don't have to configure publishing to central registries in multiple places, and we also ensure a client uses the same versions of the proto dependencies (e.g. the Java client builds both the Jaeger proto and the otel proto).
cc @bogdandrutu @tigrannajaryan @c24t @reyang @mayurkale22 @vmarchaud @fbogsany @SergeyKanzhelev @lmolkova @tsloughter
Here is an example of using the proto files in Java: open-telemetry/opentelemetry-java#504. At the moment it uses https://github.com/pavolloffay/opentelemetry-proto
There is a `v0.3.0` tag matching Collector v0.3.0 (the previous release), but no `v0.4.0` tag.
If we pin OTLP versioning to the Collector releases, we should tag `v0.4.0` to match the git submodule commit used by Collector v0.4.0 in https://github.com/open-telemetry/opentelemetry-collector/tree/v0.4.0/internal/data (the /opentelemetry-proto git submodule).
The HistogramValue Bucket supports one exemplar per bucket, with exemplar values restricted to string-valued attributes. This is particularly restrictive, for a number of reasons:
I would remove this support entirely until a full set of requirements can be drafted. (1) It is meaningful and reasonable to support encoding exemplars for any kind of metric instrument. (2) It is certainly useful to provide more than one exemplar per instrument/value-range, as a means of conveying a sample distribution. (3) There is a common request (e.g., open-telemetry/opentelemetry-specification#381, https://kccncna19.sched.com/event/UaXX/deep-linking-metrics-and-traces-with-opentelemetry-openmetrics-and-m3-rob-skillington-chronosphere) to include trace context in these exemplars, and forcing it to be encoded as string-attribute lists is onerous (and inefficient); trace context should probably be specialized.
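If exemplar support is later redesigned, one possible shape addressing (1)-(3) might look like this (entirely hypothetical, for discussion only; KeyValue as defined in common.proto):

// Hypothetical richer exemplar: could be attached to any metric kind
// and repeated per bucket/value-range to convey a sample distribution.
message Exemplar {
  double value = 1;
  fixed64 timestamp_unixnano = 2;
  bytes trace_id = 3;                // first-class trace context instead of
  bytes span_id = 4;                 // string-encoded attributes
  repeated KeyValue attributes = 5;
}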
Currently the ExportTraceServiceRequest is missing `node` and `resource`. Are these going to be implemented, were they removed, or something else?
https://github.com/open-telemetry/opentelemetry-js/blob/master/packages/opentelemetry-exporter-collector/src/types.ts#L92
TruncatableString was initially added in OpenCensus to allow implementations to truncate strings at certain sizes if they exceed a certain limit.
This was actually never implemented or requested (there were clear requests for dropping events and attributes for long-living Spans, such as streaming RPCs).
I would suggest that we remove it.
Pros:
See discussion here #51 (comment)
The diagram in metrics.proto does not match the specification. The diagram indicates that the Descriptor contains only keys, while the proto says it contains keys and values that are common to all data points.