Comments (10)
The difficulty here is that system calls are often invoked directly through inline assembly (e.g. 'int 0x80'), so LD_PRELOAD interception isn't sufficient. We'd probably need to use ptrace, or otherwise require the program being profiled to annotate these manually implemented synchronization primitives, similar to how the COZ_PROGRESS macro calls into the libcoz runtime.
from coz.
I put some hacky code here: https://gist.github.com/toddlipcon/761fa7f8bd9e91f8a8dd
though I'm not getting very good results on my actual application. Hints would be great.
from coz.
I'm not excited about paying ptrace's ~8% overhead all the time, since this would distort the program runtime quite a bit more than Coz already does. Extra instrumentation isn't ideal in terms of user effort, but it should work. Could you post your profile.coz file somewhere so I can take a look at the results?
A long-term solution may be to use perf's trace events, which can count context switches and thread blocking events regardless of what caused the blocking/unblocking.
from coz.
Another option might be for Coz to expose a mechanism that would let programmers indicate that certain functions correspond to pthread_mutex_lock, pthread_mutex_unlock, and their condvar friends.
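As a rough illustration of what such annotations could look like from the application side, here is a minimal sketch. The hooks `profiler_pre_block()` and `profiler_post_block()` are hypothetical names made up for this example, not part of Coz's API, and they only count calls so that the snippet is self-contained. `std::mutex` and `std::condition_variable` stand in for a hand-rolled primitive that an LD_PRELOAD shim would not see.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical profiler hooks (NOT Coz's real API). A real runtime would
// update the thread's delay accounting here; these stubs only count calls.
static std::atomic<int> g_pre_block_calls{0};
static std::atomic<int> g_post_block_calls{0};
inline void profiler_pre_block() { g_pre_block_calls.fetch_add(1); }
inline void profiler_post_block() { g_post_block_calls.fetch_add(1); }

// Stand-in for a custom blocking primitive. The annotations bracket the
// region where the calling thread may be descheduled.
class AnnotatedEvent {
 public:
  void Wait() {
    profiler_pre_block();  // the thread is about to block
    {
      std::unique_lock<std::mutex> lock(mu_);
      cv_.wait(lock, [this] { return signaled_; });
    }
    profiler_post_block();  // the thread is running again
  }

  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      signaled_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Runs one waiter and one signaler; returns true if each hook fired once.
bool RunAnnotatedEventDemo() {
  AnnotatedEvent ev;
  std::thread waiter([&ev] { ev.Wait(); });
  ev.Signal();
  waiter.join();
  return g_pre_block_calls.load() == 1 && g_post_block_calls.load() == 1;
}
```

The cost of this approach is the user effort mentioned above: every custom primitive has to be bracketed by hand.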
from coz.
@emeryberger: Yeah, that's roughly what the patch above does.
@toddlipcon: your implementation seems like it should be okay. Do you have a simple example where coz seems to do the wrong thing with your extra macros?
from coz.
I got a chance to look at this again today (thanks for pinging this issue).
Looking at the profile results, it looks like coz is just picking the same experiment over and over again to the point that it basically is never exploring any interesting parts of the code.
In particular, I'm profiling a benchmark program that looks like the following code:
SetUpRPCServer();
for (int i = 0; i < 8; i++) {
  threads.emplace_back([] {
    while (true) {
      MakeRPCCallToLocalhost();
      COZ_PROGRESS;
    }
  });
}
The implementation of 'MakeRPCCall' essentially delegates work to another thread (a libev event loop) and then blocks on a mutex/condvar until the response comes back.
The issue seems to be that, in the steady state, most of my threads are blocked in this stack trace waiting for a response. So the task-clock based profile collection means that it's extremely likely to pick this line of code for the experiment:
todd@todd-ThinkPad-T540p:~/git/coz$ grep experiment ../kudu/profile.coz | awk '{print $2}' | sort | uniq -c | sort -nk1 | tail
1 selected=/home/todd/git/kudu/thirdparty/gflags-2.1.2/include/gflags/gflags.h:154
1 selected=/home/todd/git/kudu/thirdparty/installed-deps/include/google/protobuf/io/coded_stream.h:1091
1 selected=/home/todd/git/kudu/thirdparty/libev-4.20/ev.c:3521
2 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:75
2 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:76
2 selected=/home/todd/git/kudu/thirdparty/glog-0.3.4/src/logging.cc:2034
3 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.pb.cc:721
4 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:81
33 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.proxy.cc:23
1454 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.proxy.cc:84
(rtest.proxy.cc:84 is the last line within my source code for sending an RPC).
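For anyone who wants the same tally without the shell pipeline, here is a small sketch in C++. It assumes only what the command above implies: that experiment records start with the word `experiment` and carry a `selected=<file>:<line>` value as their second whitespace-separated field. The function name `CountSelectedLines` is made up for this example.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Counts how often each selected line appears in experiment records and
// returns the result sorted by descending count.
// Assumption: field 1 is "experiment" and field 2 is "selected=<file>:<line>",
// as implied by the grep/awk pipeline in this comment.
std::vector<std::pair<std::string, int>> CountSelectedLines(
    const std::vector<std::string>& profile_lines) {
  const std::string prefix = "selected=";
  std::map<std::string, int> counts;
  for (const std::string& line : profile_lines) {
    std::istringstream fields(line);
    std::string kind, selected;
    if (!(fields >> kind >> selected)) continue;          // fewer than 2 fields
    if (kind != "experiment") continue;                   // not an experiment
    if (selected.compare(0, prefix.size(), prefix) != 0) continue;
    ++counts[selected.substr(prefix.size())];
  }
  std::vector<std::pair<std::string, int>> sorted(counts.begin(), counts.end());
  std::stable_sort(sorted.begin(), sorted.end(),
                   [](const std::pair<std::string, int>& a,
                      const std::pair<std::string, int>& b) {
                     return a.second > b.second;
                   });
  return sorted;
}
```

A heavily skewed result, like the 1454 selections of a single line above, is a sign that the sampler keeps landing on the same place.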
It seems almost as if the perf events are only getting collected from the "client" thread and not the "server" threads which are in the same process. Any ideas?
from coz.
I just noticed that if I don't pass '-s %%/src/kudu/%%' I get a much better spread of experiments across other files. Maybe something is wrong with the way that experiment lines are getting filtered.
from coz.
When you omit the -s flag, do you find lots of samples in source files that don't match that pattern?
If coz gets a sample that's not in the given source scope, it walks back up the stack to find the last callsite that is in scope (if any). I'm guessing your application runtime is dominated by computation that is invoked (indirectly) from the callsite where your hotspot is.
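To make the attribution rule concrete, here is a simplified sketch of it. This is not Coz's actual code: the `Frame` type, the function names, and the plain substring test standing in for a scope pattern are all assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// One entry of a sampled call stack (illustrative type, not from Coz).
struct Frame {
  std::string file;
  int line;
};

// Simplified scope test: a file is in scope if its path contains the given
// substring. This stands in for a real scope pattern.
bool InScope(const Frame& frame, const std::string& scope_substring) {
  return frame.file.find(scope_substring) != std::string::npos;
}

// `stack` is ordered innermost frame first. Returns the innermost frame that
// is in scope, or nullptr if no frame on the stack is in scope.
const Frame* AttributeSample(const std::vector<Frame>& stack,
                             const std::string& scope_substring) {
  for (const Frame& frame : stack) {
    if (InScope(frame, scope_substring)) return &frame;
  }
  return nullptr;
}
```

With a stack of `/app/thirdparty/lib/io.c:10` called from `/app/src/proj/client.cc:84` and a scope of `/src/proj/`, the sample is charged to `client.cc:84`. Every sample taken in out-of-scope code beneath that callsite piles onto the same line, which would explain a profile dominated by one location.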
from coz.
The fix for #57 should resolve your second issue, once it's done.
Could you submit the change in your gist as a pull request?
from coz.
Are the pre block/post block annotations needed for epoll too? Or is epoll understood properly and this only applies to condvar and futexes?
I have a complicated multiprocess system (processes are spawned via forks of the main process, which is always idle and uninteresting), and I'm trying to see whether coz would be a good fit for figuring out why some complicated RPC between components is slow.
from coz.