I've been reading the code for tokio-trace, and I've got some questions about <code cl

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

tokio-trace: how to use Spans to represent concurrency? about tracing HOT 9 CLOSED

tokio-rs commented on July 17, 2024

tokio-trace: how to use Spans to represent concurrency?

from tracing.

Comments (9)

jsgf commented on July 17, 2024

Ah, I just noticed follows_from. I'm guessing the intention is that each individual state transition is recorded with Span (ie, the actual poll calls), but they're linked together into a high-level operation via follows_from. Is that correct?

from tracing.

carllerche commented on July 17, 2024

I don't think follows_from is the best fit in this case. A good example for follows_from is a task sending a message to a worker and the worker does some work to process that message. The span representing the work to process that message has a parent that is the worker task but follows from the span that sent the message.

The case you describe does sound like a parent / child relationship. The work in the spawned task is entirely in service of the spawner. Being able to represent this as a parent / child was an original design goal. Thinking about it, I wonder if it was lost a bit over iterations.

IIRC, my thinking was that the spawner and the spawned task would get linked via a manual step. There would be an API to say that span A is a child of span B. However, parent / child was removed from -core. My thought was that it could be entirely handled by the subscriber by observing how spans are entered / exited.

Perhaps this isn't sufficient.

There are a few questions in my mind:

How does the instrumentation side of tokio-trace say Span A is a child of Span B.
Is observing parent/child via span enter / exit correct? I don't think not doing this is an option.
- What happens if a child span is nested in Span A, then it is moved and is re-rooted in Span B? I assume a span cannot have multiple parents? But how would this be handled?

from tracing.

hawkw commented on July 17, 2024

@jsgf this is an excellent question, thanks!

The answer right now is that this is something that the core tokio-trace libraries are unopinionated about. Since tracking the current span is something that can be done solely by a Subscriber and isn't necessary to implement anything in tokio-trace-core, we thought this was best left up to Subscriber implementations to determine the correct behaviour for their particular use-case. This is why the docs mention a notion of a "current span" but don't actually define what this means (something which I think ought to be resolved by improving the documentation).

With that said, here are some of my thoughts that have fallen out of experimenting with subscriber implementations:

You're right that if a single global current span is tracked for the entire process, we can't really represent concurrently executing contexts on multiple threads.

I think that in most cases, subscribers which expect to be used in multithreaded contexts should track a separate current span per thread. In my subscriber implementations, I've used something like this:

  thread_local! {
      static CURRENT_SPAN: RefCell<Vec<Span>> = RefCell::new(Vec::new());
  }

  // This can be used for determining the parent of new spans, for determining 
  // the current span for formatting events, etc...
  fn current_span() -> Option<Span> {
      CURRENT_SPAN.with(|stack| {
          stack.borrow().last().map(Span::clone)
      })
  }

  impl Subscriber for MySubscriber {

      fn enter(&self, span: &Span) {
          CURRENT_SPAN.with(|stack| {
              stack.borrow_mut().push(span.clone());
          })
      }

      fn exit(&self, _span: &Span) {
          CURRENT_SPAN.with(|stack| {
              let _ = stack.borrow_mut().pop();
          })
      }

      // ....
  }

Eventually, I'd like to ship a library of composable utilities for writing subscribers, with code for things like "tracking the current span on a per-thread basis" to be used in subscriber implementations. That's something I'm still experimenting with, though.

Typically, I think of the "parent" of a span as being the span in which that span was created, meaning that in most cases, it's likely that there'll be both a stack of the spans the current thread has entered (as in the example above) and a tree of spans with parent-child relationships, but again, this is left up to the subscriber.

I've found that when printing events in a context, the context that makes the most logical sense to someone consuming logs is to print the thread's current span and all its parents. This means that (using the example of a Hyper server) we'll get logs that look like this:

 INFO server{local=V4(127.0.0.1:3000)}: hyper_echo: listening...
 TRACE server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50498)}: hyper_echo: accepted connection
 INFO server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50498)}:request{method=POST uri=/echo/}: hyper_echo: received request
 TRACE server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}: hyper_echo: accepted connection
 INFO server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}:request{method=POST uri=/echo/reversed}: hyper_echo: received request
 DEBUG server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}:request{method=POST uri=/echo/reversed}:response{response_kind="reversed"}: hyper_echo: reversed request body chunk=Ok("hello world") body=Ok("dlrow olleh")
 TRACE server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}:request{method=POST uri=/echo/reversed}:response{response_kind="reversed"}: hyper_echo: response completed

even if the accepting task and the spawned request-handler futures are running on different threads in a threadpool. However, if we care about what individual worker threads are doing, we might also want to inspect the stack of spans that thread has entered as well.

In the use-case you described, representing the parent future with a span and having each spawned future correspond to a new span inside the spawning future's span definitely seems like what I would do.

I would probably do something like this (using the Future::instrument combinator from the tokio-trace-futures crate in the nursery):

use futures::future::{self, Future};
use tokio_trace_futures::Instrument;

fn parent_task(how_many: usize) -> impl Future<Item=(), Error=()> {
    future::lazy(move || {
        info!("spawning subtasks...");
        let subtasks = (1..=how_many).map(|i| {
            debug!(message = "creating subtask;", number = i);
            subtask(i)
        }).collect::<Vec<_>>();
        future::join_all(subtasks)
    })
    .map(|result| {
        debug!("all subtasks completed");
        let sum: usize = result.into_iter().sum();
        info!(sum = sum);
    })
    .instrument(span!("parent_task", subtasks = how_many))
}

fn subtask(number: usize) -> impl Future<Item=usize, Error=()> {
    future::lazy(move || {
        info!("polling subtask...");
        Ok(number)
    }).instrument(span!("subtask", number = number))
}

Running this as an example (using the tokio-trace-fmt subscriber from the nursery) gets logs like this:

 INFO parent_task{subtasks=10}: spawny_thing: spawning subtasks...
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=1
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=2
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=3
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=4
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=5
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=6
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=7
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=8
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=9
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=10
DEBUG parent_task{subtasks=10}:subtask{number=1}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=2}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=3}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=4}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=5}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=6}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=7}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=8}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=9}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=10}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}: spawny_thing: all subtasks completed
 INFO parent_task{subtasks=10}: spawny_thing: sum=55

which seems about like what I would expect to see from this code.

from tracing.

hawkw commented on July 17, 2024

As a side note, sorry that the nursery crates I mentioned above are not currently very well documented or discoverable; they're still works in progress (is is what happens when one tries to materialize an entire library ecosystem from scratch. :P).

from tracing.

jsgf commented on July 17, 2024

Thanks for your explanations. I think I have a clearer idea about how things fit together. Specifically the understanding I have now is:

A Span corresponds to an instance of a Future (or Stream), with entry periods corresponding to calls to poll
- Though Spans are more general and could be used to model similar things
Sub-spans correspond to sub-Future/Streams for sub-parts of the overall job
Span trees are scoped to a Task - it doesn't make sense for a Span to reference one in another task (except by follows_from)
Tasks are always bound to a single thread at a time, so only one Span at a given level of the tree can be entered at a time

Does that sound about right?

from tracing.

carllerche commented on July 17, 2024

@hawkw

My gut is that that using a thread local would be a good default for a subscriber to use if no parent / child relationship is specified, but there should be a way using the instrumentation API to explicitly create a parent / child relationship or to create an unrooted span.

I don't think we use span creation to determine parent / child given that spans are not entered when created. Subscribers probably have to observe the first time a span is entered and find the parent then.

from tracing.

hawkw commented on July 17, 2024

@carllerche

there should be a way using the instrumentation API to explicitly create a parent / child relationship or to create an unrooted span.

I'm open to adding this.

I don't think we use span creation to determine parent / child given that spans are not entered when created. Subscribers probably have to observe the first time a span is entered and find the parent then.

I don't agree --- in my experience, considering the span in which a span was created to be its parent leads to behaviour more closely in line with the user's expectations. For example, consider code like

span!("span_1").enter(|| {

    let fut = foo
        .and_then(...)
         // ... 
        .instrument(span!("span 2"));

    tokio::spawn(fut);
});

Logically, I would expect span_1 to be the parent of span_2. However, if we determine span parenthood based on the current span when a span is entered, then span_2 will likely be rootless*. In my experience, in order to get the nice trace trees we expect to see, where i.e. the task spawned to handle a request is a child of the task that accepts incoming connections, we probably want to determine parenthood based on the current span when a span is created.

But, regardless, this is behaviour that ought to be determined by the subscriber.

*Or, will be a child of whatever span the worker thread that began executing it was currently inside of when the future was polled for the first time, which could be very surprising indeed.

from tracing.

carllerche commented on July 17, 2024

@hawkw

When would such a snippet ever happen? Spans are created in future constructors. In practice, it would be something like this:

TcpStream::connect(...)
    .instrument(span!("child"))
    .and_then(|...| {
    })
    .instrument(span!("parent"))

If the parent / child link happens at construction time, it will fail here.

from tracing.

hawkw commented on July 17, 2024

I think we've put some work into fixing up the docs so that this is a little clearer. I'm closing this ticket for now.

from tracing.

tokio-trace: how to use Spans to represent concurrency? about tracing HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent