Comments (19)
Here you can check out my attempt at writing a Telemetry.Metrics/StatsD integration: https://github.com/arkgil/telemetry_metrics_statsd. The code is not something I'd ship to production, but I hope it shows how one can use Telemetry.Metrics metric specs, attach to events, and report metrics to an external system based on that.
@deadtrickster I think that writing integration with Prometheus would be even easier, because the whole infrastructure is already in place. I.e. we would only need to create proper Prometheus metric and bump it from the event handler.
from telemetry.
Hi @binaryseed! We are still working on the full experience but telemetry_metrics is meant to play a big role in that. We are hoping that new Phoenix applications will ship with something like this:
```elixir
defmodule MyApp.Telemetry do
  use Supervisor

  # counter/2 and sum/2 come from Telemetry.Metrics
  import Telemetry.Metrics

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  def init(_arg) do
    children = [
      {Telemetry.StatsD, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  defp metrics do
    [
      counter(
        "http.request",
        metadata: [:controller, :action],
        tags: [:controller, :action]
      ),
      sum(
        "my_app.repo.query",
        metadata: [...]
      )
    ]
  end
end
```
Likely we won't ship with StatsD per se but the idea is exactly that you can replace the StatsD call by anything else (or even have multiple of them):
```elixir
children = [
  {Telemetry.StatsD, metrics: metrics()},
  {NewRelic.Telemetry, metrics: metrics()}
]
```
Or even different metrics per system. Then what `NewRelic.Telemetry` will do is use the information in the metrics to attach to the events behind the scenes and publish them to New Relic in the most efficient way possible. In other words, Telemetry doesn't do any of the wiring together; it is up to each implementation.
In other words, having an Ecto package would be fine for now but in practice you will expect users to tell you exactly what to get out of Ecto.
/cc @tsloughter @deadtrickster
What are the child processes supposed to do? Get the list of metrics in `init`, what else?
Get the list of metrics, then call `:telemetry.attach` on each of them and install the handlers that will aggregate the data in the way that makes most sense for the child process. For example, StatsD may have a pool of sockets and write to those sockets directly on each call. Prometheus may want to write this information to ets or another process (you would know better :D), etc. The important thing is exactly that we don't prescribe how the data is processed or aggregated in any way.
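To make that loop concrete, here is a minimal sketch of a reporter attaching one handler per metric spec. The module name, handler-id scheme, and handler body are illustrative assumptions; only `:telemetry.attach/4` and the `event_name` field come from this thread.

```elixir
# Hypothetical reporter sketch: walk the metric specs and attach a
# handler per source event. Names here are illustrative, not a real API.
defmodule MyStatsDReporter do
  def attach_handlers(metrics) do
    for metric <- metrics do
      :telemetry.attach(
        # any unique term works as a handler id
        {__MODULE__, metric.event_name},
        metric.event_name,
        &__MODULE__.handle_event/4,
        metric
      )
    end
  end

  def handle_event(_event_name, value, metadata, metric) do
    # Aggregate however suits the backend: write to a UDP socket for
    # StatsD, bump an ETS counter for Prometheus, etc.
    IO.inspect({metric.event_name, value, metadata})
  end
end
```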
Yeah, I think I mostly understand the idea (until I start writing the exporter :-)), but what should this supervised child do after calling all the attaches? Just an empty loop? Perhaps a simple callback instead of a child process will be enough...
@deadtrickster that's a very good question. I dunno. You could just have a process doing nothing forever or have a task with type temporary that exits after setting everything up. I guess you could even just run `init` and return `:ignore`.
That is perhaps a question for telemetry to answer because we still don't have a good mechanism to work with detached handlers and processes may be part of it. We will get there. :)
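For completeness, the "set everything up and return `:ignore`" route could look like the sketch below (the `child_spec/1` shape is standard; module and field names are illustrative). The trade-off is that no process remains to detach the handlers on shutdown.

```elixir
defmodule MyApp.TelemetrySetup do
  # Runs under a supervisor, attaches everything, then returns :ignore,
  # so the supervisor keeps no process around afterwards.
  def child_spec(metrics) do
    %{id: __MODULE__, start: {__MODULE__, :start_link, [metrics]}}
  end

  def start_link(metrics) do
    for metric <- metrics do
      :telemetry.attach(
        {__MODULE__, metric.event_name},
        metric.event_name,
        &handle_event/4,
        metric
      )
    end

    :ignore
  end

  def handle_event(_event, _value, _metadata, _metric), do: :ok
end
```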
Since it is something that should only be configured at the highest level (so usually a release), I think it'd be good to support a list of handlers to attach in `sys.config`:

```erlang
{telemetry, [{handlers, [...]}]}
```
Need to think more on how that would look and will try it out with OpenCensus.
Actually, now that I write that, I'm thinking maybe it would be OpenCensus that is configured to attach to telemetry events... So a user who includes OpenCensus can enable attaching to telemetry events through its config. That may be better.
@josevalim yea, I was starting on another reply to say start order sucks, so would definitely want to go with the latter.
I'm still not sure about getting the telemetry defined metrics to opencensus metrics though.
Oh, and I don't think it'd look like the supervision example. I would expect the application (in this case statsd) to start and be able to configure itself based on the application env, and to be updated by calls to it.
For a New Relic exporter, I think it'd be ideal for the exporter to be responsible for attaching to the events, be it from application env config or via config passed to a process.
However, regarding the `telemetry-metrics` part of the picture, the thing that seems a little tricky to me is taking a metric spec and turning it into the right Telemetry event... Say I get a metric spec name like `my_app.repo.query`. I could just assume that the Telemetry event is `[:my_app, :repo, :query]` and attach to that.

But it seems a little brittle - when I handle the event, the only thing I have to rely on is that `query` is the last segment of the event. The repo module could have deeper nesting so the event could have arbitrary length, or it might not actually have `Repo` in the name. It's not possible to pattern match that last `query` segment, which means my `handle_event` function can't really use the `event` to determine what to do.
I do want access to the Repo name because I'd like to include it in the metrics sent to New Relic (otherwise everything gets flattened if there is more than one Repo).
Perhaps that simply means that there shouldn't be app-specific things in the metric name / telemetry event. Maybe instead of `my_app.repo.query` it should always be `ecto.query`, and the repo should be passed into the `metadata`?
Or perhaps it just means that the New Relic exporter should be passed higher-level config, like `repos: [MyApp.Repo]`, and it should be responsible for deciding which metrics to deal with.
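To make the metadata idea concrete, a handler along these lines would keep the repo name. Note that the fixed `[:ecto, :query]` event and the `:repo` metadata key are assumptions for illustration, not Ecto's actual contract, and `report_to_new_relic/3` is a hypothetical helper.

```elixir
# Assumed event shape: Ecto emits [:ecto, :query] with the repo module
# in metadata. This is an illustrative assumption, not Ecto's contract.
def handle_event([:ecto, :query], value, %{repo: repo} = metadata, _config) do
  # Keeping the repo module as a tag means MyApp.Repo and
  # MyApp.OtherRepo are not flattened into one series.
  report_to_new_relic(inspect(repo), value, metadata)
end
```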
If there is a better place to have these broader design conversations, I'd be happy to shift them out of Github Issues...
Great, I'll assume matching on metadata can give us everything we need... I'll open an issue over in `ecto` for that particular case.

I'll also assume for now that we'll start a child process that is passed a list of metrics and attaches to events based on them.
I agree that users should not be concerned with what handlers are attached to what events. After all, they're not interested in having a handler attached, they want to get a metric pushed to StatsD, NewRelic or scraped by Prometheus (possibly via opencensus).
Regarding child processes, in the small StatsD exporter PoC I made some time ago, there was no process at all, only something like:
```elixir
defmodule MyApp.Telemetry do
  import Telemetry.Metrics

  def start do
    TelemetryMetrics.StatsD.start(metrics(), host: "...", port: ...)
  end

  defp metrics do
    [
      counter(
        "http.request",
        metadata: [:controller, :action],
        tags: [:controller, :action]
      ),
      sum(
        "my_app.repo.query",
        metadata: [...]
      )
    ]
  end
end
```
So having a process is not always really necessary, at least until we discover we need one to handle handler errors. My point is, it's not something we should be really concerned about right now.
@binaryseed Regarding your concerns about metric/event names, metric specs provided by `Telemetry.Metrics` already include the information about the expected metric name and the source event name. If I create a metric spec like this:

```elixir
Telemetry.Metrics.counter("my_app.repo.query", name: "my_app.db_query")
```

I expect `[:my_app, :repo, :query]` to be the source event name and `my_app.db_query` to be the metric name in the system we export to (unless the system follows different naming conventions, like separating name segments with hyphens or something like that).

The struct returned by the `Telemetry.Metrics.counter/2` call above contains an `event_name` field with the value `[:my_app, :repo, :query]` and a `metric_name` field with the value `[:my_app, :db_query]`, so you can use them directly when attaching handlers and assembling the "external" metric name.
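Putting those two fields together, a reporter could do something like the following sketch. The handler id, the dot-joining of `metric_name`, and `send_to_backend/2` are illustrative assumptions about the target system, not part of the library.

```elixir
metric = Telemetry.Metrics.counter("my_app.repo.query", name: "my_app.db_query")

:telemetry.attach(
  {:my_reporter, metric.metric_name},  # illustrative handler id
  metric.event_name,                   # [:my_app, :repo, :query]
  fn _event, value, _metadata, spec ->
    # Join segments for a dot-separated backend, e.g. "my_app.db_query"
    external_name = Enum.join(spec.metric_name, ".")
    send_to_backend(external_name, value)  # hypothetical helper
  end,
  metric
)
```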
@arkgil I would recommend keeping a process even if all it does is to start the handlers on init, and then trap exits and remove the handlers on terminate. I think it is important to have the handlers attached to a process that respects a supervision tree shutdown. Otherwise, if handlers are not removed on shutdown and they depend on an application, process or service, we can have failures when certain events are emitted during shutdown and those may be hard to reproduce.
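A minimal GenServer following that advice might look like this sketch (handler ids and struct fields are illustrative):

```elixir
defmodule MyReporter do
  use GenServer

  def start_link(metrics), do: GenServer.start_link(__MODULE__, metrics)

  @impl true
  def init(metrics) do
    # Trap exits so terminate/2 runs on supervision tree shutdown.
    Process.flag(:trap_exit, true)

    handler_ids =
      for metric <- metrics do
        id = {__MODULE__, self(), metric.event_name}
        :telemetry.attach(id, metric.event_name, &__MODULE__.handle_event/4, metric)
        id
      end

    {:ok, handler_ids}
  end

  @impl true
  def terminate(_reason, handler_ids) do
    # Detach on the way down so no handler fires against a stopped service.
    Enum.each(handler_ids, &:telemetry.detach/1)
  end

  def handle_event(_event, _value, _metadata, _metric), do: :ok
end
```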
@josevalim right, that's a good point, I only thought about handling handler failures, but you're right that we need to support supervision tree shutdown.
Oh, thanks for the statsd lib, it should be helpful. I'll try to write a Prometheus integration soonish.
There is also a ConsoleReporter shipping as part of Telemetry.Metrics as an example (as soon as this PR is merged) and @arkgil also wrote a "Writing Reporters" documentation for Telemetry.Metrics. I think we are all covered on this front now. :)