
Comments (16)

hubertlepicki commented on May 26, 2024

@tsloughter you're actually raising an interesting point, by chance: the other Logger statements are generated from telemetry events as well:

https://github.com/phoenixframework/phoenix/blob/ad4cd8b224c60a644090242e6a70e2b6404c9c1a/lib/phoenix/logger.ex#L132-L138

Logger is one thing, and whether or not it's a side-effect-free case, the way Phoenix and the ecosystem embraced the Telemetry library is that it replaced the old Phoenix.Endpoint.Instrumenter. Things like Spandex, AppSignal, the NewRelic agent, etc. previously all hooked into Instrumenter; they now hook into Telemetry events and perform internal data collection/reporting as the events happen.

Here's the first example I found in two minutes: I opened the NewRelic agent and, indeed, it collects and reports data in telemetry event handlers:

https://github.com/newrelic/elixir_agent/blob/master/lib/new_relic/telemetry/ecto/handler.ex#L54

It's probably not the use case Telemetry was designed for, but because Phoenix and the ecosystem replaced their own instrumentation with it, the ecosystem around tracing/errors/logging/reporting switched to using it as well, and I'm not sure that's entirely a good thing.

Part of the work I do is on medical applications, where audits are critical. Currently, we're also listening to telemetry events to log things like request start/stop, as well as LiveView events. This all works, but it adds quite a lot of complexity if you want to do it properly, and in that environment "properly" means several things:

  • audits should fail the thing they're auditing if audits cannot be written
  • there shouldn't be actions with no audits

Telemetry detaches event handlers when they fail, and the system keeps running without them. We had to guard against this by doing very little in the handlers and delegating the event handling to separate processes, but as you can see from the NewRelic example above, not everyone does this.
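The "do very little in the handler" pattern can be sketched like this (module and function names are hypothetical, not our actual code): the function attached to telemetry only casts a message to a GenServer, so any crash happens in that separate process and telemetry has no reason to detach the handler.

```elixir
defmodule Audit.Forwarder do
  # Sketch: a thin telemetry handler that delegates all real work to a
  # separate process. Crashes in handle_cast/2 take down this GenServer
  # (which a supervisor can restart), not the emitting process, and the
  # handler itself never raises, so it is never detached.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  # This is the function you'd pass to :telemetry.attach/4.
  def handle_event(event, measurements, metadata, _config) do
    GenServer.cast(__MODULE__, {:event, event, measurements, metadata})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:event, event, measurements, metadata}, state) do
    # Real audit work would go here (e.g. writing an audit record).
    {:noreply, [{event, measurements, metadata} | state]}
  end
end
```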

Ideally, now that I'm thinking about it, to support the real-life use cases the events are already being used for in the wild, and not just the idealized case where people don't perform side effects in handlers, we'd have:

  • event handlers being optionally "durable", i.e. surviving handler crashes
  • event handlers being executed in a predictable order (with that order not being an implementation detail)
  • the ability for the programmer to insert handlers before/after others to ensure proper ordering, similar to what Absinthe does with its pipelines: https://hexdocs.pm/absinthe/Absinthe.Pipeline.html#insert_after/3
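The last point could look something like this hypothetical sketch (not part of telemetry; names are invented): handlers kept as an ordered `{name, fun}` list, with an `insert_after/3` in the spirit of `Absinthe.Pipeline.insert_after/3`.

```elixir
defmodule HandlerPipeline do
  # Hypothetical ordered-handler pipeline. Dispatch order is the list
  # order, and callers can position a new handler relative to an
  # existing one by name.
  def insert_after(pipeline, after_name, entry) do
    Enum.flat_map(pipeline, fn
      {^after_name, _fun} = existing -> [existing, entry]
      other -> [other]
    end)
  end

  # Dispatch an event to every handler, in list order.
  def run(pipeline, event) do
    Enum.each(pipeline, fn {_name, fun} -> fun.(event) end)
  end
end
```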

I agree we're probably not using Telemetry as intended, but here we are as a community, it seems. I'm just looking for a solution here; maybe the answer is not to use Telemetry but to develop a separate mechanism for instrumenting these things, but it's very tempting to just rely on Telemetry as it is currently.

from telemetry.

josevalim commented on May 26, 2024

The order comes from an implementation detail, and later on we may find more efficient implementations that do not provide consistent ordering; it would be a pity not to be able to change due to that. I also think relying on side effects of telemetry handlers is a poor practice and I definitely don't want to encourage it. :) So I am a strong 👎.

tsloughter commented on May 26, 2024

I was going to suggest another function: telemetry:attach_durable.

josevalim commented on May 26, 2024

In your particular case, those libraries should have a plug that sets those traces and runs before the loggers, or use another non-telemetry hook to set them up.

hubertlepicki commented on May 26, 2024

I know it's an implementation detail, but I suggest we make it an official behavior instead.

The solution to use Plug is okay, but it has to be a macro you `use` within your Endpoint, which then calls super/2 (like https://github.com/appsignal/appsignal-elixir-plug/blob/main/lib/appsignal_plug.ex#L52); otherwise there is no way to squeeze your code in before Phoenix emits the [:phoenix, :endpoint, :start] event and Phoenix.Logger logs "GET /".
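Stripped of the Phoenix/AppSignal specifics, the macro trick relies on Elixir's `defoverridable`/`super` mechanism: the later `use` redefines `call/2` and delegates to the earlier definition via `super/2`, so the wrapping code runs first. A minimal self-contained illustration (all module names invented; `BaseEndpoint` stands in for `Phoenix.Endpoint`):

```elixir
defmodule BaseEndpoint do
  # Stand-in for Phoenix.Endpoint: defines call/2 and marks it
  # overridable so a later `use` can wrap it.
  defmacro __using__(_opts) do
    quote do
      def call(conn, _opts), do: Map.put(conn, :handled, true)
      defoverridable call: 2
    end
  end
end

defmodule TracingPlug do
  # Stand-in for the AppSignal-style plug macro: overrides call/2,
  # sets up tracing context, then hands off to the base pipeline.
  defmacro __using__(_opts) do
    quote do
      def call(conn, opts) do
        # Tracing context is set before the base implementation runs,
        # i.e. before the endpoint's own pipeline (and its logging).
        conn = Map.put(conn, :trace_id, "trace-1")
        super(conn, opts)
      end
    end
  end
end

defmodule MyEndpoint do
  use BaseEndpoint
  use TracingPlug
end
```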

I would love to get span_id/trace_id on the "GET /" line, and on anything that the default/used Plug pipeline produces between request starts and the user code as well.

I don't particularly like using the macro solution described above, but if that's the official way I'll have to stick to it.

bryannaegele commented on May 26, 2024

On the general proposal, I agree with José that this is an implementation detail of telemetry and not something we could facilitate even if we wanted to, given how we'd like to change the implementation. The only reason it's even in ETS is that we had to support OTP <21.

For the specific issue I'd encourage you to bring it up in the opentelemetry channel on the Elixir slack for help.

tsloughter commented on May 26, 2024

I can add a couple thoughts that sadly don't help with the immediate issue:

Why isn't Phoenix just using logger instead of telemetry+logger? I'm guessing the idea was to simply have one line in the code instead of two, since you're going to need the telemetry event anyway? The same filtering functionality could generate logger filters to do the same, I believe.

For handlers that don't have side effects, the order certainly doesn't matter, and as @josevalim says, those are the best use case for telemetry; side effects shouldn't be encouraged. But starting an active span has a side effect, so I still wish there was a different way to wrap code around a Phoenix endpoint/plug (so not with a Plug), so that span starting and stopping could be handled that way and the underlying pdict context could be guaranteed :)

bryannaegele commented on May 26, 2024

@hubertlepicki since handlers can be attached and detached at any time by anything, guaranteeing some form of ordering becomes much more difficult. Questions like which caller's ordering takes precedence then have to be addressed. It would also box us in on other features which have been requested and we want to support which we couldn't efficiently handle before such as registering a handler by prefix, e.g. [:http, :request, *].

I don't mean to discount the functionality you're describing being valuable to someone's use case; it sounds like that kind of thing would be useful to you. However, telemetry was never intended to be a generic event dispatching library, and there are already solutions for that. Perhaps some bridge using telemetry events to feed into a solution based on gen_event and gen_statem, where you could manage more complex flows. But I'm not convinced this library, which has always been meant to be a very basic library to build on, should expand to these kinds of features.
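For what it's worth, the gen_event side of that bridge idea needs nothing beyond OTP: a thin telemetry handler could `:gen_event.notify/2` into a manager, where handlers run in the order they were added and failures can be dealt with explicitly. A minimal sketch (module name and event shape are invented for illustration):

```elixir
defmodule AuditBridgeHandler do
  # Sketch of a :gen_event handler that a thin telemetry handler could
  # feed events into. :gen_event dispatches to its handlers in a defined
  # order, which is one of the properties asked for above.
  @behaviour :gen_event

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_event({:audit, entry}, log) do
    # Real audit persistence would go here.
    {:ok, [entry | log]}
  end

  @impl true
  def handle_call(:entries, log), do: {:ok, Enum.reverse(log), log}
end
```

Usage would look like `:gen_event.start_link/0`, `:gen_event.add_handler/3` to register, then `:gen_event.notify/2` from the telemetry handler.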

Hopefully the new release with the failure events can help you at least handle re-attaching handlers on failure.

hubertlepicki commented on May 26, 2024

@bryannaegele I agree completely. For my use cases, the switch from Phoenix's own instrumenter to telemetry, made by Phoenix and friends, was a step in somewhat the wrong direction, for the very reason that telemetry was never meant to be used this way.

It's almost like we need a second event dispatch mechanism alongside Telemetry, where events are durable (i.e. handlers do not detach), execute in a predictable, defined order, and failures in handlers result in overall request failure as well. I think Logger should probably hook in there too, by the way. The default console logger is rock solid, but there's no guarantee people don't use custom logger backends, and, correct me if I'm wrong, an error in those handlers would result in detaching Logger from Phoenix events altogether (although I'm imagining this; it has never actually happened to me, so maybe it's not an issue).

I can see this being especially useful for LiveView and channels, as with normal HTTP requests a lot of what I am describing here actually can be done with Plug.

hubertlepicki commented on May 26, 2024

@bryannaegele okay, thank you so much. For now your solution seems to work, i.e. I added an event handler that feeds the events I am interested in to another event dispatch mechanism. If a failure happens, I want to crash, so I do, and then re-attach the overall event handler.

I suspect, however, that there's a tiny chance of a race condition here:
https://github.com/beam-telemetry/telemetry/blob/main/src/telemetry.erl#L164:L170

This happens because detach removes the handler from the ETS table, then some instructions execute before we re-attach it, and I think I see this actually happening in the wild, though fairly rarely.

Would adding an option to not detach event handlers on failure, i.e. a "durable handler", be an acceptable solution? I could work on a PR for that, but if it's a deal breaker I won't bother. I can probably live with this race condition; it seems to happen extremely rarely.

tsloughter commented on May 26, 2024

Very good point. Probably a good idea in my opinion.

bryannaegele commented on May 26, 2024

Yeah, I'm not opposed to an opt-in like that. It could be set in the user provided config.

@josevalim what do you think? Allow the user to explicitly opt in to not detach a handler on exception?

telemetry:attach(<<"durable handler">>,
                 [some, event],
                 fun log_response_handler:handle_event/4,
                 #{durable => true}).

The only sticky bit is that the config can be any term, and we'd be forcing people to use a map if they want this feature, but I'm okay with that versus API changes or new functions.

arkgil commented on May 26, 2024

@bryannaegele in my view this changes the purpose of the config. It's supposed to be the config of the handler itself, not the configuration of how telemetry deals with the handler. What do you think about adding another argument? Too verbose?

bryannaegele commented on May 26, 2024

Given those two choices, I'd probably go with the additional map options argument for flexibility and introduce attach/5 and attach_many/5.

hubertlepicki commented on May 26, 2024

It's funny how I went through the same options when looking at the code.

I agree the HandlerConfig is meant to be passed to the handler itself, so it's not a great place to do this. Maybe it's not a great name either, as it's more of a "handler argument" than a config describing what to do with the handler. And modifying it to be a map would break compatibility with current code, so that's not great either.

I was also thinking about attach_durable, but I think I'm leaning towards adding another argument. And maybe renaming the current one from "handler config" to "handler argument", and calling the new one "handler config" or "options"?

hubertlepicki commented on May 26, 2024

I have made a proposal in the PR, but please do check my Erlang style as I'm not used to writing it.
