Comments (10)

toddlipcon commented on August 15, 2024

The difficulty here is that system calls are often issued directly through inline assembly (for example 'int 0x80'), so LD_PRELOAD interception isn't sufficient. We'd probably need to use ptrace, or else require the program being profiled to annotate its manually implemented synchronization primitives, similar to how the COZ_PROGRESS macro calls into the libcoz runtime.
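As a rough illustration of the annotation approach, here is a minimal Linux-only sketch of a hand-rolled, futex-based event whose blocking and waking points are marked by hand. The hook names (profiler_pre_block, profiler_post_block, profiler_wake_other) and the FutexEvent class are hypothetical; they are not taken from coz or from the gist, and here they only count calls so the sketch runs without a profiler attached.

```cpp
#include <atomic>
#include <chrono>
#include <climits>
#include <thread>

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical hooks: stand-ins for whatever a profiler runtime might export.
// In this sketch they only count how often they are called.
static std::atomic<int> g_pre_block_calls{0};
static std::atomic<int> g_post_block_calls{0};
static std::atomic<int> g_wake_calls{0};
inline void profiler_pre_block() { g_pre_block_calls.fetch_add(1); }
inline void profiler_post_block() { g_post_block_calls.fetch_add(1); }
inline void profiler_wake_other() { g_wake_calls.fetch_add(1); }

static_assert(sizeof(std::atomic<int>) == sizeof(int),
              "futex word must have the size of an int");

// One-shot event built directly on futex(2), bypassing pthread primitives.
class FutexEvent {
 public:
  void Wait() {
    while (state_.load(std::memory_order_acquire) == 0) {
      profiler_pre_block();  // about to block in the kernel
      syscall(SYS_futex, reinterpret_cast<int*>(&state_), FUTEX_WAIT_PRIVATE,
              0, nullptr, nullptr, 0);
      profiler_post_block();  // running again (or woke spuriously)
    }
  }

  void Signal() {
    state_.store(1, std::memory_order_release);
    profiler_wake_other();  // this thread is unblocking other threads
    syscall(SYS_futex, reinterpret_cast<int*>(&state_), FUTEX_WAKE_PRIVATE,
            INT_MAX, nullptr, nullptr, 0);
  }

 private:
  std::atomic<int> state_{0};
};

// Demo: one thread waits, the caller signals. Returns true when the wake hook
// fired exactly once and the pre/post hooks fired in matching pairs.
bool RunDemo() {
  const int wakes_before = g_wake_calls.load();
  FutexEvent ev;
  std::thread waiter([&ev] { ev.Wait(); });
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  ev.Signal();
  waiter.join();
  return g_wake_calls.load() - wakes_before == 1 &&
         g_pre_block_calls.load() == g_post_block_calls.load();
}
```

The point of the design is that every path into and out of the raw futex wait is bracketed by a hook, so a runtime behind those hooks could account for time spent blocked without intercepting the system call itself.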

from coz.

toddlipcon commented on August 15, 2024

I put some hacky code here: https://gist.github.com/toddlipcon/761fa7f8bd9e91f8a8dd
though I'm not getting very good results on my actual application. Hints would be great.

from coz.

ccurtsinger commented on August 15, 2024

I'm not excited about paying ptrace's ~8% overhead all the time, since this would distort the program runtime quite a bit more than Coz already does. Extra instrumentation isn't ideal in terms of user effort, but it should work. Could you post your profile.coz file somewhere so I can take a look at the results?

A long term solution may be to use perf's trace events, which can count context switches and thread blocking counts, regardless of the cause of the blocking/unblocking event.
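For readers unfamiliar with the perf interface, the sketch below shows one way to count context switches for the calling thread using the perf_event_open system call and the PERF_COUNT_SW_CONTEXT_SWITCHES software counter. This is a software counter rather than a tracepoint event, and it is only an assumption about what such a solution could build on, not what coz does. The function name CountContextSwitches is made up for illustration. The call can fail when the kernel's perf permissions are restrictive, in which case the function returns an empty optional.

```cpp
#include <cstdint>
#include <cstring>
#include <functional>
#include <optional>

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: runs `work` and reports how many context switches the
// calling thread experienced while it ran, or nullopt if perf is unavailable.
std::optional<std::uint64_t> CountContextSwitches(
    const std::function<void()>& work) {
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_SOFTWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
  attr.disabled = 1;  // start stopped; enabled explicitly below

  // pid = 0 and cpu = -1: measure the calling thread on any CPU.
  const int fd = static_cast<int>(
      syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0));
  if (fd < 0) {
    return std::nullopt;  // e.g. blocked by perf_event_paranoid or seccomp
  }

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  work();
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  std::uint64_t count = 0;
  const ssize_t n = read(fd, &count, sizeof(count));
  close(fd);
  if (n != static_cast<ssize_t>(sizeof(count))) {
    return std::nullopt;
  }
  return count;
}
```

A caller might wrap a blocking operation, for example `CountContextSwitches([] { usleep(1000); })`, and treat a missing result as "counter unavailable" rather than as zero.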

from coz.

emeryberger commented on August 15, 2024

Another option might be for Coz to expose a mechanism that would let programmers indicate that certain functions correspond to pthread_mutex_lock and pthread_mutex_unlock and condvar friends.

from coz.

ccurtsinger commented on August 15, 2024

@emeryberger: Yeah, that's roughly what the patch above does.

@toddlipcon: your implementation seems like it should be okay. Do you have a simple example where coz seems to do the wrong thing with your extra macros?

from coz.

toddlipcon commented on August 15, 2024

I got a chance to look at this again today (thanks for pinging this issue).

Looking at the profile results, coz seems to pick the same experiment over and over, to the point that it essentially never explores any interesting parts of the code.

In particular, I'm profiling a benchmark program that looks like the following code:

SetUpRPCServer();
for (int i = 0; i < 8; i++) {
  // Each client thread issues RPCs in a loop and reports progress per call.
  threads.emplace_back([] {
    while (true) {
      MakeRPCCallToLocalhost();
      COZ_PROGRESS;
    }
  });
}

The implementation of 'MakeRPCCallToLocalhost' essentially delegates work to another thread (a libev event loop) and then blocks on a mutex/condvar until the response comes back.

The issue seems to be that, in the steady state, most of my threads are blocked in this stack trace waiting for a response. So the task-clock based profile collection means that it's extremely likely to pick this line of code for the experiment:

todd@todd-ThinkPad-T540p:~/git/coz$ grep experiment ../kudu/profile.coz  | awk '{print $2}' | sort | uniq -c | sort -nk1 | tail
      1 selected=/home/todd/git/kudu/thirdparty/gflags-2.1.2/include/gflags/gflags.h:154
      1 selected=/home/todd/git/kudu/thirdparty/installed-deps/include/google/protobuf/io/coded_stream.h:1091
      1 selected=/home/todd/git/kudu/thirdparty/libev-4.20/ev.c:3521
      2 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:75
      2 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:76
      2 selected=/home/todd/git/kudu/thirdparty/glog-0.3.4/src/logging.cc:2034
      3 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.pb.cc:721
      4 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:81
     33 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.proxy.cc:23
   1454 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.proxy.cc:84

(rtest.proxy.cc:84 is the last line within my source code for sending an RPC).
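The shell pipeline above can also be sketched in C++ for anyone who wants to post-process a profile programmatically. This is a minimal sketch that assumes only what the pipeline assumes: experiment records are whitespace-separated lines whose second field is `selected=<file>:<line>`. The function name TallySelectedLines is hypothetical, and unlike `grep experiment` it matches only lines whose first field is exactly `experiment`.

```cpp
#include <algorithm>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Counts how often each "selected=..." field appears on experiment lines and
// returns the tallies sorted by ascending count, like `sort | uniq -c | sort -n`.
std::vector<std::pair<std::string, int>> TallySelectedLines(std::istream& in) {
  std::map<std::string, int> counts;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string kind;
    std::string selected;
    if (!(fields >> kind >> selected)) {
      continue;  // fewer than two fields
    }
    if (kind != "experiment") {
      continue;  // skip any other record type
    }
    ++counts[selected];
  }

  std::vector<std::pair<std::string, int>> sorted(counts.begin(), counts.end());
  std::stable_sort(sorted.begin(), sorted.end(),
                   [](const auto& a, const auto& b) {
                     return a.second < b.second;
                   });
  return sorted;
}
```

The most frequently selected line ends up last in the returned vector, which mirrors the `tail` of the pipeline output.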

It seems almost as if the perf events are only getting collected from the "client" thread and not the "server" threads which are in the same process. Any ideas?

from coz.

toddlipcon commented on August 15, 2024

I just noticed that if I don't pass '-s %%/src/kudu/%%', the experiments are spread much better across other files. Maybe something is wrong with the way experiment lines are being filtered.

from coz.

ccurtsinger commented on August 15, 2024

When you omit the -s flag, do you find lots of samples in source files that don't match that pattern?

If coz gets a sample that's not in the given source scope, it walks back up the stack to find the last callsite that is in scope (if any). I'm guessing your application runtime is dominated by computation that is invoked (indirectly) from the callsite where your hotspot is.
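To make the attribution rule concrete, here is a small sketch of the behavior described above: given a sampled call stack ordered from innermost frame outward, pick the first frame whose source file is in scope. The Frame struct, the function names, and the plain substring match are all illustrative assumptions; this is not coz's actual implementation or its pattern syntax.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical representation of one resolved stack frame.
struct Frame {
  std::string file;
  int line;
};

// Simplified scope test: a frame is in scope if its path contains the given
// substring. Real scope patterns may be richer than this.
bool InScope(const Frame& frame, const std::string& scope_substring) {
  return frame.file.find(scope_substring) != std::string::npos;
}

// `stack` is ordered innermost frame first. Returns the innermost in-scope
// frame, i.e. the nearest in-scope callsite to where the sample landed, or
// nullopt when no frame on the stack is in scope.
std::optional<Frame> AttributeSample(const std::vector<Frame>& stack,
                                     const std::string& scope_substring) {
  for (const Frame& frame : stack) {
    if (InScope(frame, scope_substring)) {
      return frame;
    }
  }
  return std::nullopt;
}
```

Under this rule, every sample that lands in out-of-scope code called from the same in-scope line is charged to that one line, which would be consistent with a single callsite dominating the experiment counts.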

from coz.

ccurtsinger commented on August 15, 2024

The fix for #57 should resolve your second issue, once it's done.

Could you submit the change in your gist as a pull request?

from coz.

vlovich commented on August 15, 2024

Are the pre-block/post-block annotations needed for epoll too? Or is epoll understood properly, and this only applies to condvars and futexes?

I have a complicated multiprocess system (processes spawned via forks of the main process, which is always idle and uninteresting), and I'm trying to see whether coz will be a good fit for figuring out why some complicated RPC between some components is slow.

from coz.
