connectrpc / otelconnect-go
OpenTelemetry tracing and metrics for Connect.
Home Page: https://connectrpc.com
License: Apache License 2.0
We just switched to using this module and have been very pleased with the auto-instrumented requests, but we've also seen the number of events we send to our provider (Honeycomb) increase quite a bit. Because their billing is based on the number of events, and the interceptor calls span.AddEvent to attach the request and response, a single request between two services now sends 5 events instead of 2.
Is there a specific OTel convention that requires using AddEvent to record the request/response sizes? Could we have an option to include them as attributes on the main request span instead?
For the docs linked:
https://connectrpc.com/docs/go/observability/#enabling-opentelemetry-for-connect
The example is missing an s on the WithInterceptor func; it should be WithInterceptors. Having the packages to import also makes it a little easier to integrate with:
import (
	"context"
	"log"
	"net/http"

	"connectrpc.com/connect"
	"connectrpc.com/otelconnect"
	// plus the generated pingv1 / pingv1connect packages for the Ping service
)
func main() {
mux := http.NewServeMux()
mux.Handle(pingv1connect.NewPingServiceHandler(
&pingv1connect.UnimplementedPingServiceHandler{},
connect.WithInterceptors(
otelconnect.NewInterceptor(),
),
))
http.ListenAndServe("localhost:8080", mux)
}
func makeRequest() {
client := pingv1connect.NewPingServiceClient(
http.DefaultClient,
"http://localhost:8080",
connect.WithInterceptors(
otelconnect.NewInterceptor(),
),
)
resp, err := client.Ping(
context.Background(),
connect.NewRequest(&pingv1.PingRequest{}),
)
if err != nil {
log.Fatal(err)
}
log.Print(resp)
}
Let's follow up on https://github.com/bufbuild/connect-opentelemetry-go/pull/37#issuecomment-1358073237. We shouldn't use the net.peer.ip attribute at all, and servers and clients should both emit net.peer.name and net.peer.port by default. This is high-cardinality, but it matches the OTel specification.
As @joshcarp notes on #37, the AttributeFilter option introduced in #41 lets handlers strip the client port and IP from metrics to reduce cardinality. This seems likely to be a common request, so we should export an option like WithLowerCardinality that does this automatically.
From the spec:
The Span Status MUST be left unset for an OK gRPC status code, and set to Error for all others.
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/rpc.md#grpc-status
Currently spanStatus is set to codes.Ok
https://github.com/bufbuild/connect-opentelemetry-go/blob/6db18fe3823a852ca17875b68d739b035f26d028/interceptor.go#L277
The metrics SDK is also now stable: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.19.0
I don't know if this is an otelconnect issue or a general connectrpc issue (I'm new to both).
I have a mix of interceptors (including this one) and http.Handlers. It'd be very helpful if this interceptor added the request span to the http.Request context.
Right now, to have tracing in the http.Handler middleware, I have to use otelhttp (which adds another span to the trace) or do similar work to "recapture" the traceparent from the incoming headers.
I have a Connect client and another Connect server, and both instances have the otelconnect interceptor. Both of the services have the following setup:
otelInterceptor, err := otelconnect.NewInterceptor(otelconnect.WithTrustRemote())
if err != nil {
logger.Fatal().Err(err).Send()
}
interceptors := connect.WithInterceptors(
otelInterceptor,
logging.NewInterceptor(logger),
validateInterceptor,
)
// -------------------------------------------------------
organizationAuth := vanguard.NewService(authapiconnect.NewOrganizationAuthApiHandler(api, interceptors))
With Jaeger, the traces are split. The docs mention that traces will be split between instances by default unless otelconnect.WithTrustRemote() is added, but adding this option didn't fix the issue.
Could this also be because I'm using Vanguard?
The following attributes have been deprecated:
- net.peer.name -> replaced with server.address on client spans and client.address on server spans
- net.peer.port -> replaced with server.port on client spans and client.port on server spans
- http.status_code -> replaced with http.response.status_code
OpenTelemetry interceptors should propagate baggage; currently connect-opentelemetry-go does not.
The following is a trace of an API request. Note the "uncovered" time at the end. If I turn compression off, this uncovered time shrinks quite a bit (around 2 ms total).
I'd love to see exactly what's making this response take that much longer. It seems like ~10K of data should compress in less than 2 ms, but I'm actually not sure how long that takes.
Per the OTel spec for gRPC, only errors indicating server fault (like INTERNAL) must result in the span being marked as an error:
https://opentelemetry.io/docs/specs/semconv/rpc/grpc/#grpc-status
Errors indicating a client fault (like INVALID_ARGUMENT) must not result in spans being marked as errors. This makes sense: for instance, user errors should not trigger alerts.
otelconnect incorrectly reports all failed RPCs as errors here:
https://github.com/connectrpc/otelconnect-go/blob/main/interceptor.go#L357-L368
Once all the remaining issues are completed, this repo is ready for a beta release.
I'm running into a weird issue setting up otelconnect. When I add the WithTrustRemote option to the interceptor, traces are not exported. When I remove the option, traces are exported and I can see the parent trace successfully attached as a trace.Link.
My client is passing a traceparent header that looks like this:
newHeaders.set('traceparent', `00-${activeSpan.spanContext().traceId}-${activeSpan.spanContext().spanId}-00`)
And my server setup is almost identical to the docs:
otelInterceptor := connect_go.WithInterceptors(
otelconnect.NewInterceptor(otelconnect.WithTrustRemote()),
)
// register rpc handlers
rpc.handle(prodRPC.NewProductionServiceHandler(productionService, otelInterceptor))
import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	"go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0" // adjust to the semconv version in use
)

func setupOTelSDK(ctx context.Context, serviceName, serviceVersion string, logger *slog.Logger) (shutdown func(context.Context) error, err error) {
	var shutdownFuncs []func(context.Context) error

	// shutdown calls cleanup functions registered via shutdownFuncs.
	// The errors from the calls are joined.
	// Each registered cleanup will be invoked once.
	shutdown = func(ctx context.Context) error {
		var err error
		for _, fn := range shutdownFuncs {
			err = errors.Join(err, fn(ctx))
		}
		shutdownFuncs = nil
		return err
	}

	// handleErr calls shutdown for cleanup and makes sure that all errors are returned.
	handleErr := func(inErr error) {
		logger.Info(fmt.Sprintf("OTEL ERROR: %s", inErr.Error()))
		err = errors.Join(inErr, shutdown(ctx))
	}

	// Set up resource.
	res, err := newResource(serviceName, serviceVersion)
	if err != nil {
		handleErr(err)
		return
	}

	// Set up propagator.
	prop := newPropagator()
	otel.SetTextMapPropagator(prop)

	// Set up trace provider.
	tracerProvider, err := newTraceProvider(res)
	if err != nil {
		handleErr(err)
		return
	}
	shutdownFuncs = append(shutdownFuncs, tracerProvider.Shutdown)
	otel.SetTracerProvider(tracerProvider)

	return
}

func newResource(serviceName, serviceVersion string) (*resource.Resource, error) {
	return resource.Merge(resource.Default(),
		resource.NewWithAttributes(semconv.SchemaURL,
			semconv.ServiceName(serviceName),
			semconv.ServiceVersion(serviceVersion),
		))
}

func newPropagator() propagation.TextMapPropagator {
	return propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	)
}

func newTraceProvider(res *resource.Resource) (*trace.TracerProvider, error) {
	traceExporter, err := otlptracegrpc.New(context.Background())
	if err != nil {
		return nil, err
	}
	traceProvider := trace.NewTracerProvider(
		trace.WithBatcher(traceExporter,
			// Default is 5s. Set to 1s for demonstrative purposes.
			trace.WithBatchTimeout(time.Second)),
		trace.WithResource(res),
	)
	return traceProvider, nil
}
Need to implement.
Implementation conditionally needs the following:
- net.peer.port
- net.peer.name
- net.sock.peer.addr, when the above is unset
open-telemetry/opentelemetry-specification#3333 introduced a breaking change to the OpenTelemetry semantic conventions for gRPC, so this should be implemented.
Hello,
I'm trying to receive a trace which seems to be picked up by the otel package; however, it's missed by the interceptor, which then provides a new trace ID to the context.
I've added this to my project before creating the interceptor:
tc := propagation.NewCompositeTextMapPropagator(
propagation.TraceContext{},
propagation.Baggage{},
)
otel.SetTextMapPropagator(tc)
I'm passing a traceparent header and a tracestate header in my HTTP request from a Node.js server.
The headers are received in my Go server in req.Header. When I step through the otel/propagation/trace_context.go Extract function, the header is found and the correct IDs are returned, but they are then not picked up by the interceptor.
Any guidance would be helpful!
Thanks
With the changes in #41, I think we're currently assuming that handlers are internal: if the client has already created a trace, we add new spans to it. (After the holiday break, everything's a little hazy - I could be wrong about this!)
This is fine for RPCs within a trusted system, but it's not the right approach for external-facing servers. External-facing handlers should instead create a new span and attach any span sent by the client as a link (using trace.LinkFromContext). We should look at otelgrpc and otelhttp to make sure we get this right.
I lean toward making the external API behavior the default, since it's safest, but I'm not sure.
There seems to be a problem when trying the following command in projects that import this library. This is the case for all projects I've tried so far.
go get -u ./...
github.com/<user>/<project> imports
github.com/bufbuild/connect-opentelemetry-go imports
go.opentelemetry.io/otel/metric/global: cannot find module providing package go.opentelemetry.io/otel/metric/global
The issue seems to be that the upstream opentelemetry.io library moved the metric directory in one update and then moved it back in another, creating a conflict somewhere.
Could you please take a look and update the dependencies of this library so it pulls in the correct SDK version?
On this line there is a race condition: createSpanOnce.Do(createSpan) doesn't ensure that Do has returned and the span is initialized before the span is actually used.
This means that the span can be nil in the case where OnClose and Send are called simultaneously.
Superset of all changes can be found here: open-telemetry/opentelemetry-go@metric/v0.34.0...metric/v0.35.0
This is a somewhat strange case. We have an application where a legacy gRPC component is instrumented using the OTel gRPC instrumentation, and a newer component uses Connect and is instrumented with otelconnect-go. As expected, both instrumentations produce rpc_client_* metrics that are nearly identical. However, because the Connect instrumentation does not add descriptions to its metrics, the Prometheus exporter takes issue with that and fails, complaining that the descriptions don't match. Here's a trimmed-down example without all the labels cluttering the error message:
collected metric rpc_client_duration_milliseconds ... has help "Measures the duration of inbound RPC." but should have ""
This seems to come from a consistency check in the Prometheus client. Unfortunately, it does the check on the metric family rather than on individual metrics, so having the rpc_system: connect label doesn't help.
I'm not sure what the best solution for this is. Copying the descriptions from the OTel gRPC instrumentation seems to be the quickest fix, but if they ever change upstream, the problem would surface again 🤷
cs := myconsumer.NewConsumerServer()
api := http.NewServeMux()
api.Handle(myconsumerv1connect.NewMyServiceHandler(cs, connect.WithInterceptors(
otelconnect.NewInterceptor())))
gives me the following error:
cannot use otelconnect.NewInterceptor() (value of type error) as connect.Interceptor value in argument to connect.WithInterceptors: error does not implement connect.Interceptor (missing method WrapStreamingClient)
There's a section on adding gRPC request and response metadata as attributes that needs to be implemented:
- rpc.grpc.request.metadata.<key>
- rpc.grpc.response.metadata.<key>