natchez's Introduction

Natchez Trace

Natchez is a distributed tracing library for Scala.

Please proceed to the microsite for more information.

natchez's People

Contributors

alexcardell, andimiller, armanbilge, bpholt, bplommer, christiankjaer, daddykotex, dougc, estebanmarin, etspaceman, evgenyafanasev, gstro, hamnis, irevive, jacoby6000, janstenpickle, keynmol, kubukoz, massimosiani, mergify[bot], mijicd, mpilquist, msosnicki, pauljamescleary, rossabaker, sbly, scala-steward, tbrown1979, tpolecat, typelevel-steward[bot]

natchez's Issues

Empty Span

Hello! I've been using something like this to make Trace[IO] bearable:

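import java.net.URI

import cats.Applicative
import cats.effect.Resource
import natchez.{EntryPoint, Kernel, Span, TraceValue}

// A no-op Span: it records nothing itself, and every child span
// is started as a new root span via the EntryPoint.
def defaultSpan[F[_]: Applicative](ep: EntryPoint[F]): Span[F] =
  new Span[F] {
    def put(fields: (String, TraceValue)*): F[Unit] = Applicative[F].unit

    def kernel: F[Kernel] = Applicative[F].pure(Kernel(Map.empty))

    def span(name: String): Resource[F, Span[F]] = ep.root(name)

    def traceId: F[Option[String]] = Applicative[F].pure(None)

    def spanId: F[Option[String]] = Applicative[F].pure(None)

    def traceUri: F[Option[URI]] = Applicative[F].pure(None)
  }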

The idea is that Trace.ioTrace requires a default Span that it starts from. The combined usage looks like this:

Trace.ioTrace(Spans.defaultSpan(ep)).flatMap { implicit ioTrace: Trace[IO] =>
  ...
}

You can do it once per app, at startup, and pass the Trace[IO] instance everywhere, which is very convenient and doesn't require any wrapping in Kleisli.

I'd happily contribute something similar, or even provide a shortcut that goes from EntryPoint[IO] straight to IO[Trace[IO]] (to avoid the question of what span to use as the default). Does this seem like a good idea?

Enhancements for LogSpan

For the time being, we rely almost exclusively on LogSpan in our applications, and probably will for a while, since backends like Jaeger and Honeycomb are a non-starter for us at the moment. The current LogSpan implementation has some shortcomings that force us to write our own. We would prefer to discuss those here and contribute changes back to the project (if desired).

Current Issues

  1. We cannot control the format of the log line: JSON is the only format, and the encoding is hard-wired in the LogSpan.json method. This makes it impossible to align the log output with our standards.
  2. The root log span and all of its children are output in a single log entry, as opposed to one log entry per span. This complicates log discovery for us. In addition, a very long-running process with many log spans adds memory pressure and delays the point at which we see the log entry.
  3. We need to do additional processing of the span beyond just logging. For example, we would like to generate epimetheus metrics to expose error rate, counter, and latency information to Prometheus.

Recommendations

Allow the user to control the processing of the span

Example: allow the user to provide a handler: (LogSpan[F], ExitCase[Throwable]) => F[Unit] to the LogSpan so they can do anything they want with the span, including logging or metric generation.

Complete each child span immediately, calling release as soon as it is finished

The current finish of a child span looks like the following:

  def finish[F[_]: Sync: Logger]: (LogSpan[F], ExitCase[Throwable]) => F[Unit] = { (span, exitCase) =>
    for {
      n  <- now
      j  <- span.json(n, exitCase)
      _  <- span.parent match {
              case None |
                   Some(Left(_))  => Logger[F].info(Json.fromJsonObject(j).spaces2)
              case Some(Right(s)) => s.children.update(j :: _)
            }
    } yield ()
  }

Instead of explicitly managing children, remove the children altogether and call the handler when any span finishes. The separate finish function could then be dropped; Resource would be sufficient with the handler in scope:

def span(name: String): Resource[F, Span[F]] = {
  Resource.makeCase(createChildHere)(handler).widen
}

Log backend ignores trace ids from Kernel with non-UUID trace ids.

Given a kernel like this

val kernel = Kernel(Map(
  "X-Natchez-Trace-Id" -> "non-uuid-trace-id", 
  "X-Natchez-Parent-Span-Id" -> "non-uuid-span-id"
))

The natchez.log.Log.entryPoint.continueOrElseRoot function will drop the trace id, because it's expecting UUIDs. This makes it impossible to use the Log backend with other backends that produce non-UUID trace ids.
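
To illustrate the failure mode, here is a minimal sketch using only the standard library. It is not the Log backend's actual code: `strictTraceId` and `lenientTraceId` are hypothetical helpers, and the kernel is modelled as a plain `Map[String, String]` as an assumption. The sketch shows how parsing the header with `java.util.UUID.fromString` discards any value that is not a UUID, whereas treating the id as an opaque string would keep it.

```scala
import java.util.UUID
import scala.util.Try

object TraceIdParsing {
  // Header name taken from the kernel in the report above.
  val TraceIdHeader = "X-Natchez-Trace-Id"

  // Hypothetical strict reader: only values that parse as a UUID survive.
  def strictTraceId(headers: Map[String, String]): Option[UUID] =
    headers.get(TraceIdHeader).flatMap(s => Try(UUID.fromString(s)).toOption)

  // Hypothetical lenient reader: any non-empty value is kept as an opaque string.
  def lenientTraceId(headers: Map[String, String]): Option[String] =
    headers.get(TraceIdHeader).map(_.trim).filter(_.nonEmpty)
}
```

With the kernel from this report, the strict reader yields `None` (so a fresh root would be started), while the lenient reader keeps `"non-uuid-trace-id"`.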

Context switching between Natchez and backend instrumentation

Many backends support auto-instrumentation for certain popular (non-Scala-FP) libraries, typically using Java ThreadLocals to maintain the current span. If you're working in an app that has a tagless final / cats-effect portion (where Natchez is a good fit) and also a non-FP Scala portion, it's not clear how best to propagate the span context between the two worlds in the same app.

I don't have a good solution for this, but if one exists, it would be nice to add it to the library, or at least document it. If one doesn't exist, that would also be good to document, so users can avoid frustration.

`ExitCase` lost in several places

Hi!

I noticed that allocated is used several times instead of allocatedCase. If I got it right, this would mean that the ExitCase that the finalizer of a resource receives will always be ExitCase.Succeeded, even if wrapped in another Resource.apply.

The relevant spots are:

t.span(name, options).allocated.map { case (child, release) =>

span.span(name, options).allocated.map { case (child, release) =>

(there are actually more in that file)

I believe for a correct implementation these should be replaced with the case-aware variants. Otherwise, the resources being wrapped will never be closed with a Canceled/Errored exit.

http4s module

Hey @tpolecat would you be open to a PR adding an http4s module for this project? Wrapping a Client, middleware for a server, maybe other things? If so, I'll try to get something created soonish

Natchez Xray Fails to Connect Requests from Service to Service

Problem:
Natchez Xray isn't linking service-to-service HTTP requests in AWS Xray tracing, even though the AWS Xray tracer header is present.

wrong xray
Although the trace id is the same, a request to another service shows up as a separate flow/tree (here, the webdealer service hits the /search endpoint in the search service).

Impact:
This messes up the AWS Xray UI, displaying disjointed request trees instead of a single flow. It's a headache for developers trying to troubleshoot and analyze request journeys.

Solution:
We've identified a fix and opened a PR for it: #1001. Enhancing Natchez Xray to recognize and use the AWS Xray tracer header properly will ensure seamless request tracing.

right xray
Search service is now part of the webdealer flow after webdealer service hits /search endpoint.
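
For readers unfamiliar with the header in question: AWS X-Ray propagates context in the `X-Amzn-Trace-Id` header, whose value is a semicolon-separated list such as `Root=...;Parent=...;Sampled=...`. The sketch below is not the code from the PR; `XRayHeader` and `parse` are hypothetical names, and it only shows how the `Root` (trace id) and `Parent` (calling segment id) fields could be read so that a downstream service can attach its segment to the caller's.

```scala
object XRayHeaderSketch {
  // Hypothetical model of the fields carried by X-Amzn-Trace-Id.
  final case class XRayHeader(root: String, parent: Option[String], sampled: Option[Boolean])

  def parse(value: String): Option[XRayHeader] = {
    // Split "K1=V1;K2=V2" into a map, ignoring malformed parts.
    val fields: Map[String, String] =
      value.split(';').toList.flatMap { part =>
        part.trim.split("=", 2) match {
          case Array(k, v) if k.nonEmpty && v.nonEmpty => Some(k -> v)
          case _                                       => None
        }
      }.toMap

    // Root is required; Parent and Sampled are optional.
    fields.get("Root").map { root =>
      XRayHeader(
        root = root,
        parent = fields.get("Parent"),
        sampled = fields.get("Sampled").collect { case "1" => true; case "0" => false }
      )
    }
  }
}
```

If only `Root` is honoured and `Parent` is ignored, the downstream segment shares the trace id but has no parent, which would be consistent with the disjointed trees described above.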

Action:
We kindly request the maintainers of Natchez to review our PR and consider integrating the proposed solution into the main codebase.

Thanks for your attention to this issue.

`Trace[IO]`

Starting with typelevel/cats-effect#1393, cats-effect (just like other effect systems) has a way to propagate fiber-local state without ReaderT using FiberLocal[F, A].

Additionally, to encapsulate the common pattern of having essentially ReaderT[F, Ref ...], @RaasAhsan has opened typelevel/cats-effect#1822.

We should investigate how/if these could be used to get, say, IO[Trace[IO]].

Is there a way to filter spans with Natchez?

We are using Natchez with Skunk, and our tracing payloads are heavily populated with low-level database operations from Skunk that we generally don't want to include in our traces. Is there a way (as with logging frameworks) to filter out spans that come from dependency packages? Or is it "all or nothing"?

Binary Incompatibility in 0.0.10

Whilst upgrading Natchez from 0.0.8 to 0.0.10 I stumbled across this binary incompatibility introduced by the lightstep-grpc project, specifically caused by an import in the GrpcCollectorClientProvider class.

java.util.ServiceConfigurationError: com.lightstep.tracer.shared.CollectorClientProvider: Provider com.lightstep.tracer.shared.GrpcCollectorClientProvider could not be instantiated
  at java.util.ServiceLoader.fail(ServiceLoader.java:232)
  at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
  at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
  at java.util.ServiceLoader$LazyIterator.access$700(ServiceLoader.java:323)
  at java.util.ServiceLoader$LazyIterator$2.run(ServiceLoader.java:407)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:409)
  at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
  at com.lightstep.tracer.shared.CollectorClientProvider.load(CollectorClientProvider.java:42)
  at com.lightstep.tracer.shared.CollectorClientProvider.<clinit>(CollectorClientProvider.java:19)
  at com.lightstep.tracer.shared.AbstractTracer.<init>(AbstractTracer.java:157)
  at com.lightstep.tracer.jre.JRETracer.<init>(JRETracer.java:31)
      ...
Caused by: java.lang.NoClassDefFoundError: io/grpc/util/RoundRobinLoadBalancerFactory
  at java.lang.Class.getDeclaredConstructors0(Native Method)
  at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
  at java.lang.Class.getConstructor0(Class.java:3075)
  at java.lang.Class.newInstance(Class.java:412)
  at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
  ... 42 more
Caused by: java.lang.ClassNotFoundException: io.grpc.util.RoundRobinLoadBalancerFactory
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:98)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:352)

Publishing a local version of natchez from master seems to resolve the issue, as the problematic dependency has been updated to address the binary incompatibility. Would it be possible to cut a new release that uses the latest dependencies?

Inconsistent behaviour with natchez & opentelemetry

Background

For the last while, I've been using natchez-extras-datadog-stable to send metrics. For various reasons (mostly due to correlating with other datadog-internal generated traces), DataDog support tells me that the preferred solution is to have the instrumented JVM send the metrics, and not do it myself (via the blaze client impl above).

Their recommended solution for me is to create opentelemetry traces. The datadog agent (instrumented in the JVM) will automatically pick these up and convert/dispatch them to datadog.

Issue

I dropped natchez-extras-datadog-stable and am now using just natchez-opentelemetry from this project. I am able to create a trace that does indeed get picked up and sent to datadog - which is great!

The problem starts when I add http4s middleware into the mix.

I've tried both natchez-http4s and natchez-http4s-otel.

No matter what I try, I can't seem to get traces from the instrumented routes to show up, even though this is the same EntryPoint instance that I can directly invoke to get valid traces created.

Reproducing the issue

I did find a hacky way to intercept and view the traces that datadog is ignoring. The difference appears to be that when I use the EntryPoint directly, a valid traceID is set, whereas when I use the middleware, it is 0 (datadog requires a non-zero 64-bit signed int).

If it helps, I created an example application - https://github.com/barryoneill/natchez-otel-datadog-experiments - that:

  • creates a natchez-opentelemetry EntryPoint
  • manually creates a span (A) using the EntryPoint directly
  • starts two http4s services, instrumented using the same EntryPoint as test A, one using natchez-http4s and the other natchez-http4s-otel
  • makes a POST call to each service, which results in an OTel trace being created (B) and (C)

Outcome:

  • A has a valid traceID (and in the real world, shows up in datadog)
  • B and C have a 0 value (and in the real world, don't show up in datadog)

Sample output:

+--------------+-----------------------+-----------------------+----------------------------------------+---------------------+---------------------+--------------------------------------------------------------------+--------------+-----------+---------------+
| When         | Operation Name        | Resource Name         | DD Trace Id                            | DD Span Id          | DDParentSpanId      | Tags                                                               | Service Name | Span Type | Duration (ms) |
+--------------+-----------------------+-----------------------+----------------------------------------+---------------------+---------------------+--------------------------------------------------------------------+--------------+-----------+---------------+
| 15:43:46.337 | Direct_Endpoint_Usage | Direct_Endpoint_Usage |  64b: 000000000000000014576a24b90b8d37 | 4054336080662395000 | 0                   | _dd.profiling.enabled=0                                            | fooSvc       | internal  | 6             |
|              |                       |                       |                                        |                     |                     | _dd.trace_span_attribute_schema=0                                  |              |           |               |
|              |                       |                       |                                        |                     |                     | env=fooEnv                                                         |              |           |               |
|              |                       |                       |                                        |                     |                     | language=jvm                                                       |              |           |               |
|              |                       |                       |                                        |                     |                     | process_id=48250                                                   |              |           |               |
|              |                       |                       |                                        |                     |                     | runtime-id=3fc5a447-09d8-4916-a267-9c0189ba6666                    |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.id=133                                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.name=io-compute-7                                           |              |           |               |
| 15:43:46.352 | /biz                  | POST /biz             | 128b: 00000000000000000000000000000000 | 4308140243084585245 | 0                   | _dd.profiling.enabled=0                                            | fooSvc       | internal  | 961           |
|              |                       |                       |                                        |                     |                     | _dd.trace_span_attribute_schema=0                                  |              |           |               |
|              |                       |                       |                                        |                     |                     | env=fooEnv                                                         |              |           |               |
|              |                       |                       |                                        |                     |                     | http.method=POST                                                   |              |           |               |
|              |                       |                       |                                        |                     |                     | http.status_code=200                                               |              |           |               |
|              |                       |                       |                                        |                     |                     | http.url=/biz                                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | language=jvm                                                       |              |           |               |
|              |                       |                       |                                        |                     |                     | process_id=48250                                                   |              |           |               |
|              |                       |                       |                                        |                     |                     | runtime-id=3fc5a447-09d8-4916-a267-9c0189ba6666                    |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.id=129                                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.name=io-compute-3                                           |              |           |               |
| 15:43:46.597 | fakeBizLogic          | fakeBizLogic          | 128b: 00000000000000000000000000000000 | 5724611567296865609 | 4308140243084585245 | businessAttr=STONKS                                                | fooSvc       | internal  | 417           |
|              |                       |                       |                                        |                     |                     | env=fooEnv                                                         |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.id=129                                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.name=io-compute-3                                           |              |           |               |
| 15:43:47.327 | Http Server - POST    | POST /biz             | 128b: 00000000000000000000000000000000 | 7224963246604818194 | 0                   | _dd.profiling.enabled=0                                            | fooSvc       | internal  | 1240          |
|              |                       |                       |                                        |                     |                     | _dd.trace_span_attribute_schema=0                                  |              |           |               |
|              |                       |                       |                                        |                     |                     | env=fooEnv                                                         |              |           |               |
|              |                       |                       |                                        |                     |                     | exit.case=succeeded                                                |              |           |               |
|              |                       |                       |                                        |                     |                     | http.client_ip=127.0.0.1                                           |              |           |               |
|              |                       |                       |                                        |                     |                     | http.flavor=1.1                                                    |              |           |               |
|              |                       |                       |                                        |                     |                     | http.host=localhost:9000                                           |              |           |               |
|              |                       |                       |                                        |                     |                     | http.method=POST                                                   |              |           |               |
|              |                       |                       |                                        |                     |                     | http.request.header.string.accept=text/*                           |              |           |               |
|              |                       |                       |                                        |                     |                     | http.request.header.string.connection=keep-alive                   |              |           |               |
|              |                       |                       |                                        |                     |                     | http.request.header.string.content_length=0                        |              |           |               |
|              |                       |                       |                                        |                     |                     | http.request.header.string.date=Thu, 13 Jul 2023 19:43:47 GMT      |              |           |               |
|              |                       |                       |                                        |                     |                     | http.request.header.string.host=localhost:9000                     |              |           |               |
|              |                       |                       |                                        |                     |                     | http.request.header.string.user_agent=http4s-ember/0.23.22         |              |           |               |
|              |                       |                       |                                        |                     |                     | http.request_content_length=0                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | http.response.header.string.content_length=13                      |              |           |               |
|              |                       |                       |                                        |                     |                     | http.response.header.string.content_type=text/plain; charset=UTF-8 |              |           |               |
|              |                       |                       |                                        |                     |                     | http.response_content_length=13                                    |              |           |               |
|              |                       |                       |                                        |                     |                     | http.status_code=200                                               |              |           |               |
|              |                       |                       |                                        |                     |                     | http.target=/biz                                                   |              |           |               |
|              |                       |                       |                                        |                     |                     | http.url=/biz                                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | http.user_agent=http4s-ember/0.23.22                               |              |           |               |
|              |                       |                       |                                        |                     |                     | language=jvm                                                       |              |           |               |
|              |                       |                       |                                        |                     |                     | net.peer.ip=127.0.0.1                                              |              |           |               |
|              |                       |                       |                                        |                     |                     | net.peer.port=56766                                                |              |           |               |
|              |                       |                       |                                        |                     |                     | process_id=48250                                                   |              |           |               |
|              |                       |                       |                                        |                     |                     | runtime-id=3fc5a447-09d8-4916-a267-9c0189ba6666                    |              |           |               |
|              |                       |                       |                                        |                     |                     | span.kind=server                                                   |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.id=135                                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.name=io-compute-9                                           |              |           |               |
| 15:43:47.775 | fakeBizLogic          | fakeBizLogic          | 128b: 00000000000000000000000000000000 | 8328755773498611729 | 7224963246604818194 | businessAttr=STONKS                                                | fooSvc       | internal  | 336           |
|              |                       |                       |                                        |                     |                     | env=fooEnv                                                         |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.id=135                                                      |              |           |               |
|              |                       |                       |                                        |                     |                     | thread.name=io-compute-9                                           |              |           |               |
+--------------+-----------------------+-----------------------+----------------------------------------+---------------------+---------------------+--------------------------------------------------------------------+--------------+-----------+---------------+
15:43:48.606 ERR appserver[dd.service=fooSvc, dd.env=fooEnv] 4 out of 5 datadog spans had empty traceIds

JaegerSpan to mark errors

Currently the JaegerSpan does not set the appropriate error flag when an exception happens. The error flag is respected in the Jaeger UI and the effect can be seen here (error at redis.getDriver span).

In addition to setting this flag, some additional logs could be set, according to the spec here

Idea: Allow treating the span as a resource in Trace

Currently, the Trace API has this:

def span[A](name: String)(k: F[A]): F[A]

I believe it would also be beneficial to have a variant that returns a resource:

trait Trace[F[_]] {
  def spanR(name: String): Resource[F, Span[F]]
}

Why? For example, for composing it with other resources...

Trace[F].spanR("my-resource") *> someResource

This would track the lifetime of the resource (and set the span in the composed resource's use too).

Also, you could use it with streams:

fs2.Stream.resource(Trace[F].spanR("my-resource")) *> someStream

This span would also last as long as the stream, and propagate to every effect in the stream.

I managed to get this with some hackery (Resource of Kleisli doesn't really look that good, but it works): #72

Flat code structure

Is there a reason why the source code doesn't follow the usual Scala layout for packages? For example, the code under modules/core/shared/src/main/scala is declared in package natchez. However, it is not actually in a natchez sub-directory, which is the usual Scala convention.

Although Scala doesn't enforce it, most Scala code follows this convention.

Expose child Spans from OpenTracing API to support tracing legacy code?

I have a use-case where we're using http4s to drive legacy code written in Java (which we're not ready to rewrite in the typelevel style).

I would like to share a trace initiated from http4s via natchez (and code written in #5) by passing down a child OpenTracing Span to serve as the root span of my legacy Java calls.

I realize this breaks the nice abstraction / hiding provided by natchez, but there can be reasons for consenting adults to do so from time-to-time.

Are we open to adding extension methods to the entrypoint hooks (like in JaegerTracer) to construct and return a child span from the OpenTracing API, or would that ruin everything?

Interoperability with Honeycomb beelines?

I'm interested in this being interoperable with Honeycomb (Beeline) integrations for other languages, and wanted to check if this was of interest:

Historically Honeycomb used an X-Honeycomb-Trace header, encoding trace ID, span ID, and some attached context. Recent updates have made it possible to also read traces using the W3C Trace Context headers. The outgoing headers still default to using the Honeycomb header, but there are plans to make this configurable.

In line with the above, the new default generated trace ID is a 128-bit identifier, and the span ID is a 64-bit identifier. These are encoded in a hexadecimal representation on the wire (so can be represented by text). This change is so that trace/span IDs can be compatible with both tracing systems.

eg: honeycombio/beeline-go#113 and honeycombio/beeline-go#110

Possible improvements:

  1. Change trace and span IDs to String type, and generate IDs compatible with W3C
  2. Support X-Honeycomb-Trace header format
  3. Support W3C trace header format (actually I'm hoping to use this format, as it makes it much easier to interoperate with other services that don't directly support Honeycomb)
  4. Some Beeline implementations also now support AWS X-Ray headers inbound, but I don't really care about it 😄
  5. It would be nice to be able to configure which serialization is used for inbound or outbound requests, but not clear if that should be in this library, or in http4s integration.
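
For item 3, here is a minimal sketch of rendering and parsing a W3C `traceparent` header with String-typed IDs (128-bit trace ID, 64-bit span ID, lowercase hex, as described above). The `TraceParent` object and its `Ids` type are hypothetical names for illustration and are not part of natchez; only version `00` of the header is handled.

```scala
object TraceParent {
  // Hypothetical holder for the IDs carried by the header.
  final case class Ids(traceId: String, spanId: String, sampled: Boolean)

  // Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
  private val Pattern = "^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$".r

  def render(ids: Ids): String = {
    val flags = if (ids.sampled) "01" else "00"
    s"00-${ids.traceId}-${ids.spanId}-$flags"
  }

  // All-zero trace or span IDs are invalid, so they are rejected here.
  def parse(header: String): Option[Ids] =
    header match {
      case Pattern(traceId, spanId, flags)
          if traceId.exists(_ != '0') && spanId.exists(_ != '0') =>
        Some(Ids(traceId, spanId, (Integer.parseInt(flags, 16) & 1) == 1))
      case _ => None
    }
}
```

Keeping the IDs as hex strings means the same values could be written into either header format without conversion.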

Support for Associating Log Group Name with Trace in Natchez and AWS X-Ray Daemon

We are using Natchez for generating traces for AWS X-Ray. X-Ray allows logs to be attached to the traces being generated, provided the CloudWatch log group name is included in the traces sent to X-Ray through the daemon. However, Natchez does not currently support passing this property along with traces, and neither does the AWS X-Ray Daemon.

In the screenshot below, the log section is empty even though the logs have the trace ID and entity ID present

Screenshot from 2024-05-24 14-51-42

If the trace includes the following details:

"aws": {
  "cloudwatch_logs": [
    {
      "log_group": ""
    }
  ]
}

then logs will appear as shown below:
image

Proposed Solution:
Introduce a mechanism in Natchez to include the CloudWatch log group name in the trace annotations. Specifically, if a field named aws_group_name is present in the annotations, it should be added as the AWS CloudWatch log group in the trace.
I have submitted a PR for this.
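
A rough sketch of the proposed mapping, assuming annotations are available as a plain `Map[String, String]`. The `LogGroupAnnotation` object is hypothetical and does not reflect how the X-Ray module or the submitted PR is actually structured; it only shows the `aws_group_name` field being turned into the `aws` block shown earlier.

```scala
object LogGroupAnnotation {
  // Field name taken from the proposal above.
  val Field = "aws_group_name"

  // Returns the JSON fragment to merge into the segment document,
  // or None when the annotation is absent or empty.
  def awsBlock(annotations: Map[String, String]): Option[String] =
    annotations.get(Field).filter(_.nonEmpty).map { group =>
      // Minimal escaping so the group name stays a valid JSON string.
      val escaped = group.replace("\\", "\\\\").replace("\"", "\\\"")
      s""""aws": {"cloudwatch_logs": [{"log_group": "$escaped"}]}"""
    }
}
```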
Steps to Reproduce

  1. Use Natchez to send X-Ray traces and use the X-Ray Daemon as the collector.
  2. Add trace and span details in the logs.
  3. Verify whether the CloudWatch logs are included with the trace.

natchez-log: parent id is always trace id

Hi, when running a small example using the Log backend, I noticed that the parent ids of child spans are always the trace id, instead of that of their parents. I've provided an example program that uses Log, along with one that uses Jaeger, for contrast:

Log Program

Program:

package com.example

import cats.effect._
import org.typelevel.log4cats._
import org.typelevel.log4cats.slf4j._
import natchez._
import natchez.log.Log

object TracingExample extends IOApp.Simple {
  implicit val log: Logger[IO] =
    Slf4jLogger.getLoggerFromName("example-logger")

  val entryPoint = Log.entryPoint[IO]("my_svc")

  val run =
    run0(entryPoint)

  def run0(entryPoint: EntryPoint[IO]) =
    entryPoint.root("run0").use(run1)

  def run1(parent: Span[IO]) =
    parent.span("run1").use(run2)

  def run2(parent: Span[IO]) =
    parent.span("run2").use_
}

Output:

When running the above program, the output shows that the parent id of each child span is always the trace id, instead of its parent's span id:

[info] running com.example.TracingExample
2021-11-05 08:06:04.329 [io-compute-0] INFO  example-logger - {
  "name" : "run0",
  "service" : "my_svc",
  "timestamp" : "2021-11-05T15:06:04.251796Z",
  "duration_ms" : 72,
  "trace.span_id" : "43e77407-b5bc-4e6c-a9a4-8d6a14364502",
  "trace.parent_id" : null,
  "trace.trace_id" : "7331a334-f8fc-4cd7-9473-b18615adc412",
  "exit.case" : "succeeded",
  "children" : [
    {
      "name" : "run1",
      "service" : "my_svc",
      "timestamp" : "2021-11-05T15:06:04.257590Z",
      "duration_ms" : 65,
      "trace.span_id" : "7f947cd6-dc88-4e3f-99e2-f1b83b568fd4",
      "trace.parent_id" : "7331a334-f8fc-4cd7-9473-b18615adc412",
      "trace.trace_id" : "7331a334-f8fc-4cd7-9473-b18615adc412",
      "exit.case" : "succeeded",
      "children" : [
        {
          "name" : "run2",
          "service" : "my_svc",
          "timestamp" : "2021-11-05T15:06:04.259035Z",
          "duration_ms" : 0,
          "trace.span_id" : "b71cc362-3e56-48b9-8655-bccf57a833a7",
          "trace.parent_id" : "7331a334-f8fc-4cd7-9473-b18615adc412",
          "trace.trace_id" : "7331a334-f8fc-4cd7-9473-b18615adc412",
          "exit.case" : "succeeded",
          "children" : [
          ]
        }
      ]
    }
  ]
}
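
For reference, the linkage I expected can be modelled roughly like this. `SpanIds` is a hypothetical stand-alone model, not the Log backend's actual code: a child keeps the trace id of its parent but records the parent's span id as its parent id.

```scala
import java.util.UUID

object SpanIds {
  final case class Ids(traceId: UUID, spanId: UUID, parentId: Option[UUID])

  // A root span starts a new trace and has no parent.
  def root(): Ids = Ids(UUID.randomUUID(), UUID.randomUUID(), None)

  // A child shares the trace id and points at the parent's span id,
  // not at the trace id.
  def child(parent: Ids): Ids =
    Ids(parent.traceId, UUID.randomUUID(), Some(parent.spanId))
}
```

With this model, run2's parent id would be run1's span id (7f947cd6-… in the output above) rather than the trace id.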

Jaeger Program

Program:

package com.example

import cats.effect._
import natchez._
import natchez.jaeger.Jaeger

object TracingExample extends ResourceApp.Simple {
  val entryPoint = Jaeger.entryPoint[IO]("my_svc")(c =>
    IO(
      c.withSampler(
        io.jaegertracing.Configuration.SamplerConfiguration
          .fromEnv()
          .withType("const")
          .withParam(1)
      ).getTracer()
    )
  )

  val run =
    entryPoint.evalMap(run0)

  def run0(entryPoint: EntryPoint[IO]) =
    entryPoint.root("run0").use(run1)

  def run1(parent: Span[IO]) =
    parent.span("run1").use(run2)

  def run2(parent: Span[IO]) =
    parent.span("run2").use_
}

Output

By contrast, when running the similar Jaeger program, the parent ids of child spans are their parents' ids, as expected:

Screen Shot 2021-11-05 at 8 25 51 AM

`DDSpan` propagate error to parent span

Hi!

I have a project that implements retries on network calls using cats-retry. This led to some surprising behaviour: when at least one retry happened, the whole request was marked as an error even though our service responded with a 200, i.e. "error" -> true was added to the root span of the trace.

image

This seems to be linked to this bit, which propagates a span's error status to its parent:

https://github.com/tpolecat/natchez/blob/08ec0cc98162aa9e21db7a00538251b616edffe1/modules/datadog/src/main/scala/DDSpan.scala#L104-L113

This is a behaviour we see only in the Datadog span. Neither the Jaeger nor the OpenTelemetry spans do this. Is there a reason for it?

Propagate span values to children spans

It should be possible to automatically propagate some values (e.g. correlation ids) from a parent span to its children. I propose making this the default behavior of put, and adding putLocal, which would be used to add span-local values.
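
To make the proposal concrete, here is a small pure model of the intended semantics. `PropagatingSpan` and its `Span` case class are hypothetical and unrelated to the real natchez `Span[F]`; they only show which fields a child would see.

```scala
object PropagatingSpan {
  final case class Span(
      name: String,
      inherited: Map[String, String], // copied into every child
      local: Map[String, String]      // visible on this span only
  ) {
    def put(kv: (String, String)): Span = copy(inherited = inherited + kv)
    def putLocal(kv: (String, String)): Span = copy(local = local + kv)

    // A child starts with the parent's inherited fields and no local ones.
    def child(childName: String): Span = Span(childName, inherited, Map.empty)

    def fields: Map[String, String] = inherited ++ local
  }

  def root(name: String): Span = Span(name, Map.empty, Map.empty)
}
```

Here a correlation id added with put on the parent shows up on the child, while a value added with putLocal does not.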

Http4s middleware - example with a resource being used by routes

Hey all!

I'm trying to understand how to use natchez and natchez-http4s when the route requires a resource (such as database calls). Is there an example of this?

When trying to send the database object to the routes I get the following type error around the resource I'm trying to use in the http routes.

[error] -- [E007] Type Mismatch Error: Application.scala:54:56 
[error] 54 |      ap = ep.liftT(NatchezMiddleware.server(rootRoutes(db))).orNotFound
[error]    |                                                        ^^
[error]    |Found:    (db : modules.Database[cats.effect.IO])
[error]    |Required: modules.Database[
[error]    |  [_] =>> cats.data.Kleisli[cats.effect.IO, natchez.Span[cats.effect.IO], _]]
[error]    |
[error]    | longer explanation available when compiling with `-explain`
[error] one error found

object Application extends IOApp.Simple {
  given logger: Logger[IO] = Slf4jLogger.getLogger[IO]

  val epInIO: EntryPoint[IO] = Log.entryPoint[IO]("app")

  def server(
      xa: HikariTransactor[IO],
      httpConf: HttpConfig
  ): Resource[IO, Server] =
    for
      ep: EntryPoint[IO] <- Resource.pure(epInIO)

      db: Database[IO] = Database(xa)

      ap = ep.liftT(NatchezMiddleware.server(rootRoutes(db))).orNotFound

      server: Server <- EmberServerBuilder
        .default[IO]
        .withHost(httpConf.host)
        .withPort(httpConf.port)
        .withHttpApp(ap)
        .build
    yield server

  override def run: IO[Unit] =
    for
      dbConf: DatabaseConfig <- DatabaseConfig.getConfig[IO]
      httpConf <- HttpConfig.getConfig[IO]

      program: Resource[IO, Unit] = for
        xa: HikariTransactor[IO] <- Database.makePostgresResource[IO](dbConf)
        _x: Server  <- server(xa, httpConf)
      yield ()

      _ <- program.use(_ => IO.never)
    yield ExitCode.Success
}

def rootRoutes[F[_]: Trace: Concurrent: Logger](database: Database[F])(implicit
    ev: MonadError[F, Throwable]
): HttpRoutes[F] = Router(
  "/"            -> (HealthRoutes[F].routes),
  "/v1/security" -> (SecuritiesRoutes[F](database).routes)
)

object Database:
  def apply[F[_]: Async: Logger](
      xa: Transactor[F]
  ): Database[F] = new Database[F](
    securities = SecuritiesRepository[F](xa),
    identifiers = IdentifierRepository[F](xa),
    securityTree = SecurityTreeRepository[F](xa),
    events = EventRepository[F](xa)
  )

  def makePostgresResource[F[_]: Async: Logger](
      config: DatabaseConfig
  ): Resource[F, HikariTransactor[F]] = for
    ec <- ExecutionContexts.fixedThreadPool(config.nThreads)
    xa <- HikariTransactor.newHikariTransactor[F](
      driverClassName = "org.postgresql.Driver",
      url = s"jdbc:${config.url}",
      user = config.user,
      pass = config.pass,
      connectEC = ec,
      logHandler = Some(mkLogger[F])
    )
  yield xa

Tracing resources

Given this Client:

  type Client[F[_]] = Request => Resource[F, Response]

We would like to be able to span the entire resource, with child spans for acquisition and release:

ambient
+- client
  +- acquire
  +- use
  +- release

Inspired by natchez-http4s and #19, we might try:

    def tracedClient[F[_]: MonadCancelThrow: Trace](client: Client[F]): Client[F] = { req =>
      Resource(Trace[F].span("client") {
        Trace[F].span("acquire")(client(req).allocated).map { case (resp, release) =>
          (resp, Trace[F].span("release")(release))
        }
      })
    }

But the span tree looks like this:

ambient
+- client
| +- acquire
+- use
+- release

We could introduce a more powerful TraceResource type class:

trait TraceResource[F[_]] extends Trace[F] {
  def spanResource(name: String): Resource[F, Unit]
}

Tracing a client becomes:

    def tracedClient[F[_]: MonadCancelThrow: TraceResource](client: Client[F]): Client[F] = { req =>
      TraceResource[F].spanResource("client") >>
      Resource(
        Trace[F].span("acquire")(client(req).allocated).map { case (resp, release) =>
          (resp, Trace[F].span("release")(release))
        }
      )
    }

Executable proof of concept:

$ scala-cli test -S 2.13 https://gist.github.com/rossabaker/8872792b06bd84e5be8fae3c9caf8731

We could avoid the second type class by adding the method to Trace. This would be a breaking change. In addition to the new abstract method, we'd have to mapK the Resource on all the transformed instances, which would require a MonadCancel where there are currently weaker constraints. All the base instances except noop already require MonadCancel, so maybe that's okay.

/cc @zmccoy

X-Ray trace finalization fails when serialized size exceeds 64K

When serializing a trace using the X-Ray backend, serialized traces larger than 64KB are rejected by the X-Ray daemon and a java.net.SocketException: Message too long is raised from the DatagramSocket, because X-Ray enforces a maximum segment document size of 64KB.

The documentation suggests "send[ing] subsegments separately … to avoid exceeding the maximum segment document size (64 kB)," but we currently only send data if it's the parent segment.
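
As a stopgap, the size could be checked before the datagram is sent so that the failure is explicit rather than a SocketException. This is only a sketch: `SegmentSize` is a hypothetical helper, and treating the limit as 64 * 1024 bytes of UTF-8 is an assumption (the documentation only says 64 kB, and any daemon header is not counted here).

```scala
import java.nio.charset.StandardCharsets.UTF_8

object SegmentSize {
  // Assumed byte limit for one segment document.
  val MaxBytes: Int = 64 * 1024

  // Right(bytes) when the document fits, Left(reason) when it is too large.
  def encode(document: String): Either[String, Array[Byte]] = {
    val bytes = document.getBytes(UTF_8)
    if (bytes.length <= MaxBytes) Right(bytes)
    else Left(s"segment document is ${bytes.length} bytes, over the $MaxBytes byte limit")
  }
}
```

A Left result would be the point at which subsegments could be sent separately instead.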

Move or Retire non-OpenTelemetry Modules?

As I was working on #688, I noticed that several of the backends supported here now actively encourage users to start with, or switch to, OpenTelemetry instrumentation directly. Honeycomb, Jaeger, and Lightstep are all in this bucket. I get the impression NewRelic and DataDog would prefer you use their libraries (but they do support OTel). AWS X-Ray documents its OTel distro before its proprietary library, and presents the choice in what seems like a pretty balanced way. And of course OpenCensus and OpenTracing merged to form OpenTelemetry, so their preference is clear as well.

It's also a little tricky to work on changes to core when every change has to be immediately propagated across all the backends. Since I have hands-on experience with OTel and X-Ray, it was straightforward enough to figure out the right way to implement #688, but the other backends have slightly different semantics, and I don't really have a good way of testing what happens when sending trace data to commercial services I don't use. It would be better to have maintainers for those modules who use them, who can implement new features as they come.

Maybe we should consider either (1) moving most of these backends to their own repositories, so they can be developed and versioned separately from the core and OTel backend, or (2) retiring them altogether if there's no maintainer interest in keeping them around. And either way, document clearly that new users should start with the OpenTelemetry backend unless they have a specific reason to use one of the others.

Of course, for existing codebases, migrating to a new instrumentation library may not be trivial, so dropping support altogether shouldn't be done lightly. It would be helpful to have download stats or other data to help make this decision, although if we start by moving backends to separate repos, there would at least be an opportunity for interested parties to continue maintenance.

So, I'm curious what the community thinks.

XRayEnvironment.traceId should prioritize the com.amazonaws.xray.traceHeader system property

AWS's documentation states:

For Java runtime versions 17 and later, [the _X_AMZN_TRACE_ID] environment variable is not used. Instead, Lambda stores tracing information in the com.amazonaws.xray.traceHeader system property.

Furthermore, the OTel semantic conventions documentation states:

When instrumenting a Java AWS Lambda, instrumentation SHOULD first try to parse an OpenTelemetry Context out of the system property com.amazonaws.xray.traceHeader … before checking and attempting to parse the environment variable…
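
The lookup order described in these two quotes could be sketched as follows. `TraceHeaderLookup` is a hypothetical name and this is not the current `XRayEnvironment` code; the lookups are passed in as functions so the order can be checked without touching real JVM state.

```scala
object TraceHeaderLookup {
  val Property = "com.amazonaws.xray.traceHeader"
  val EnvVar = "_X_AMZN_TRACE_ID"

  // System property first, environment variable as the fallback.
  def resolve(
      props: String => Option[String],
      env: String => Option[String]
  ): Option[String] =
    props(Property).filter(_.nonEmpty).orElse(env(EnvVar).filter(_.nonEmpty))

  // Convenience wrapper reading the real JVM properties and environment.
  def fromJvm(): Option[String] = resolve(sys.props.get, sys.env.get)
}
```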

Child span exceeding the root span

I am seeing a few cases where the child span extends beyond the root span. Attaching screenshot
Screenshot from 2021-05-09 17-06-20
In this case the child span extends slightly beyond the root. But I have seen cases where the extension is significant.
