perfgrind's Issues

PIE not handled properly in pgconvert

I've just cloned the repo and run make check which worked fine, then make open-checkfiles.

cc -std=gnu99  -O2 -Wall -Wextra -g  -D_GNU_SOURCE -o pgcollect  pgcollect.c
g++ -std=c++03 -O2 -Wall -Wextra -g  -o pgconvert  pgconvert.cpp AddressResolver.cpp Profile.cpp -ldw -lelf
g++ -std=c++03 -O2 -Wall -Wextra -g  -o pginfo     pginfo.cpp    AddressResolver.cpp Profile.cpp -ldw -lelf
g++ -std=c++03 -O  -Wall -Wextra -g  -g -fno-omit-frame-pointer -o pginfo_dbg pginfo.cpp    AddressResolver.cpp Profile.cpp -ldw -lelf

collecting some data of ls binary (likely without full symbols)...
./pgcollect check_ls.pgdata -F 2048 -s -- ls -l /usr/bin  1>/dev/null

collecting data of checking that (guaranteed to have symbols for binary pginfo_dbg) ...
./pgcollect check.pgdata    -F 2048 -s -- ./pginfo_dbg callgraph check_ls.pgdata
Setting frequency to 2048
Going to profile process with PID 30725: ./pginfo_dbg callgraph check_ls.pgdata
memory objects: 3
entries: 19

mmap events: 19
good sample events: 15
bad sample events: 0
total sample events: 15
total events: 34
Collection stopped.
Waked up 1 times
Sythetic mmap events: 0
Real mmap events: 43
Sample events: 6
Total 49 events written

checking its infos ...
./pginfo callgraph check.pgdata
memory objects: 5
entries: 14

mmap events: 43
good sample events: 6
bad sample events: 0
total sample events: 6
total events: 49

converting both collections to callgrind format ...
./pgconvert check_ls.pgdata -d object       1> check_ls.grind  # old: stdout
Can't resolve symbol for address 154a1, load base: 55f323480000
Can't resolve symbol for address 15d300, load base: 7feffd84a000
./pgconvert check.pgdata    -d source -i       check.grind     # new: second option

The result looks reasonable so far, but opening it in KCachegrind shows a nearly empty source view, and most attempts to inspect the machine code produce an error. The error is rooted in the objdump calls, which use the addresses from the profile, and those addresses don't exist in the binary.

If I run those manually I get:

objdump -C -d --start-address=0x2F1E --stop-address=0x2F46 /home/build/perfgrind/pginfo_dbg

/home/build/perfgrind/pginfo_dbg:     file format elf64-x86-64

Run without the address range:

 objdump -C -d  /home/build/perfgrind/pginfo_dbg | head -n 20

/home/build/perfgrind/pginfo_dbg:     file format elf64-x86-64


Disassembly of section .init:

0000000000003000 <_init>:
    3000:       48 83 ec 08             sub    $0x8,%rsp
    3004:       48 8b 05 dd 8f 00 00    mov    0x8fdd(%rip),%rax        # bfe8 <__gmon_start__>
    300b:       48 85 c0                test   %rax,%rax
    300e:       74 02                   je     3012 <_init+0x12>
    3010:       ff d0                   callq  *%rax
    3012:       48 83 c4 08             add    $0x8,%rsp
    3016:       c3                      retq

Disassembly of section .plt:

0000000000003020 <.plt>:
    3020:       ff 35 e2 8f 00 00       pushq  0x8fe2(%rip)        # c008 <_GLOBAL_OFFSET_TABLE_+0x8>
    3026:       ff 25 e4 8f 00 00       jmpq   *0x8fe4(%rip)        # c010 <_GLOBAL_OFFSET_TABLE_+0x10

Any idea what's going on here?
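
For anyone reproducing this, here is a minimal sketch of the address translation that a position-independent executable typically needs before a sampled address can be handed to objdump. It is not taken from the perfgrind sources. It assumes the sample address, the start and page offset of the mmap event, and the `p_offset`/`p_vaddr` of the matching `PT_LOAD` segment are all available. The function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of perfgrind: translate a sampled runtime
 * address into the virtual address used inside the ELF file, which is the
 * kind of address objdump --start-address expects.
 *
 *   map_start, map_pgoff:  start address and file offset of the mapping,
 *                          as reported by the mmap event
 *   seg_offset, seg_vaddr: p_offset and p_vaddr of the PT_LOAD segment
 *                          that backs this mapping
 */
static uint64_t runtime_to_elf_vaddr(uint64_t addr,
                                     uint64_t map_start, uint64_t map_pgoff,
                                     uint64_t seg_offset, uint64_t seg_vaddr)
{
    assert(addr >= map_start);

    /* Step 1: runtime address -> offset inside the file. */
    uint64_t file_offset = addr - map_start + map_pgoff;
    assert(file_offset >= seg_offset);

    /* Step 2: file offset -> link-time virtual address. */
    return file_offset - seg_offset + seg_vaddr;
}
```

With purely illustrative values: a mapping at 0x55f323483000 with page offset 0x2000, backed by a segment with `p_offset` 0x2000 and `p_vaddr` 0x3000, translates the sample 0x55f323483f1e to 0x3f1e. If only the first step is applied, the result is 0x2f1e, a file offset rather than an address objdump can find.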

calling "very flat" callgraphs with bad function names

I've pulled current master, then built it (no need for site.mak, as the dependencies were installed via apt on Debian), then followed the docs further (pgcollect with a manually increased frequency, then pgconvert), but ended up with a profile in which all entries are named "func_HASH" and the total is 148%.

Should this still work and produce a callgrind file that contains reasonable function names and percentages?

FR: add `-z` option to pgconvert (compress on the fly via zstd)

similar to perf record:

       -z, --compression-level[=n]
           Produce compressed trace using specified level n (default: 1
           - fastest compression, 22 - smallest trace)

I think this would mean an (optional?) dependency on libzstd, and then "just" wrapping the file output in a compression function that uses the specified or default level, with the pgconvert tool using libzstd to open the file.

The reason for this FR is that the CPU doing the profiling commonly has some spare cycles, and reducing the amount of data written also speeds up I/O (and long profiling runs always tend to fill the disk fast).

FR pgcollect: add `-a` "all cpus" option - or similar

An option to limit the recording (in perf only "by event" or "by user name/id") would be necessary, too (username/id would be fine, "by binary" would be extra cool, "by text match in command line" would be even better).

pgconvert could then use a single "logical" "collection" as root (if necessary) and the PIDs as logical sub-nodes, before adding the appropriate stack; pginfo would provide some details on the PIDs collected.

A feature like this is the part that is missing most compared to "plain perf" or its recording plus conversion to callgrind format (which misses details but, most importantly, "eats CPU like an end-boss").

Should PLT hits be counted "special"?

I've seen ab01a43 but am not sure what the "before" and "after" look like (I know, I could check out the old version and just test...), so this may or may not have been different in old versions.

In any case, when inspecting generated callgrind files I see quite a number of func@plt entries (which possibly only happens when tracing with a high frequency).
When traced via callgrind, those function lookups are not shown. It seems that the collection via perfparser attributes some executions that happen in the real function to the PLT version (but that would need a deeper check to verify).

Question: Should these be shown? I think the ideal version would be to count them as if the call had already completed (= show the real function/module instead of the caller's func@plt version), but I'm not sure that this is even possible. If not, should they be attributed directly to the caller (which could be chosen by the user if #10 is implemented with --collect=*@plt), or do you consider the current version to be the most reasonable?

pgcollect: option to use PID and timestamp of started/attached process as part of the output file

Scenario:

There's already a "runner script" that executes the executable after setting up some environment variables. I've added an environment variable in front of the command, which makes it possible to "only" export the variable to pgcollect. While this works in general, the variable still needs to be set (in this case possibly in the login script), and it is harder than necessary to set a useful output file name...

Originally posted by @GitMensch in #23 (comment)

To solve this I'd like to have either replacement patterns in the specified filename or extra flags that say "add the current timestamp and/or PID to the output name specified".

That would mean something like one of those:

pgcollect filename.pgdata -P-T -- command options
# would result in filename.pgdata.PidOfThatProcess.CurrentTimeStamp
pgcollect filename.%T-%P.pgdata -- command options
# would result in filename.CurrentTimeStamp-PidOfThatProcess.pgdata

(and should also work if -p PID is used)
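
To make the second variant concrete, here is a minimal sketch of how such a template could be expanded. Nothing here exists in pgcollect: the function name, the choice of `%P`/`%T` as the only placeholders, and the use of seconds since the epoch as the timestamp format are all assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: expand %P (PID) and %T (timestamp, here seconds
 * since the epoch) in an output file name template.
 * Returns 0 on success, -1 if the result does not fit into out. */
static int expand_output_name(char *out, size_t outsz, const char *tmpl,
                              pid_t pid, time_t now)
{
    size_t used = 0;

    assert(out != NULL && tmpl != NULL && outsz > 0);

    for (const char *p = tmpl; *p != '\0'; p++) {
        char piece[32];

        if (p[0] == '%' && p[1] == 'P') {
            snprintf(piece, sizeof piece, "%ld", (long)pid);
            p++;                        /* skip the 'P' */
        } else if (p[0] == '%' && p[1] == 'T') {
            snprintf(piece, sizeof piece, "%lld", (long long)now);
            p++;                        /* skip the 'T' */
        } else {
            piece[0] = *p;              /* ordinary character, copied as is */
            piece[1] = '\0';
        }

        size_t len = strlen(piece);
        if (used + len >= outsz)        /* keep room for the terminating NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Since the PID is passed in by the caller, the same helper would work for a spawned process and for one attached to with -p PID. The expansion would have to happen after the PID is known, so after the fork in the spawning case.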

FR: create debian package

This may be a "long term" issue, also given that the build itself is relatively easy.

Nonetheless, an "official" Debian package is very likely to "reach out" to new users, also because Debian packages are a common "upstream" for Debian-based distributions.

A possible starting point is the "Debian Handbook" chapter Learning to Make Packages.

FR: GitHub metadata - Releases

To do so: go to https://github.com/ostash/perfgrind/tags, then start with the 0.1 version on the right side "..." button -> create release

Then choose release title, suggestion "perfgrind 0.1 - initial release", with some release note, suggestion:

* `pgcollect` records samples for `PERF_COUNT_HW_CPU_CYCLES` events in userspace only at frequency 1000;  
both attaching to existing process and spawning new process are supported
* `pgreport` converts collected samples to 'callgrind' format;  
only callgraphs are supported; detail is always 'source file/line' (when debug information is available); instruction dumping is not implemented yet.

then the same for the 0.2 tag, suggestion "perfgrind 0.2" with

* new completely rewritten converter `pgconvert`, old `pgreport` is removed
* flat/callgraph profiles are supported
* different levels of details are supported: object, symbol, source
* instruction dumping is supported
* resulting 'callgrind' files are much smaller now, thanks to grouping hits by object/symbol/source file/source line
* `pginfo` utility for getting some stats about dump file.

and for now last one "perfgrind 0.3" with

* allow to set arbitrary frequency in `pgcollect` via `-F` argument
* handling of hits in PLT added
* properly handle page offset in mmap events

Again, for an example of how this would look:

https://github.com/GitMensch/perfgrind/releases and https://github.com/GitMensch/perfgrind (on the right)

For 0.4 you can then try "generate release news" button ;-)

FR: pgconvert from `perf record` created file

Ideally I'd like to pgconvert profiles coming directly from
LD_LIBRARY_PATH=. perf record -o perf.data --call-graph dwarf,8192 -z --aio --sample-cpu ./main.

Originally posted by @GitMensch in #1 (comment)

Is this possible, or could it be added, either directly in pgconvert or as a separate tool like pdatagconvert?

Failing to start collection with mmap error

Hi, I'm trying to get this to work. I'm on AlmaLinux 8 and it's a relatively locked-down server. I'm seeing

Can't mmap perf events: Operation not permitted perfgrind

Any tips or permissions I have to adjust are appreciated.
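
A first thing to check could be the kernel's perf limits. The sketch below is a small diagnostic, not part of perfgrind, with hypothetical function names. It prints `kernel.perf_event_paranoid` and `kernel.perf_event_mlock_kb`, two settings that are often involved when mmap on a perf event fails with EPERM for an unprivileged user. Whether they are the cause on this particular server is not confirmed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a single integer from a /proc/sys file.
 * Returns 0 on success, -1 if the file is missing or unreadable. */
static int read_sysctl_long(const char *path, long *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = fscanf(f, "%ld", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the two limits; silently skips a value that cannot be read. */
void print_perf_limits(void)
{
    long v;

    if (read_sysctl_long("/proc/sys/kernel/perf_event_paranoid", &v) == 0)
        printf("perf_event_paranoid = %ld\n", v);
    if (read_sysctl_long("/proc/sys/kernel/perf_event_mlock_kb", &v) == 0)
        printf("perf_event_mlock_kb = %ld\n", v);
}
```

If the requested ring buffer exceeds the mlock budget, either a smaller buffer or a larger `perf_event_mlock_kb` (set by an administrator) may help. Which of these applies depends on how many pages pgcollect maps per CPU.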

external addon: script to combine events

This is possibly quite a "special" request: I'd like to combine events that match a prefix/suffix, which would be specified on the command line.

By "combine" I mean that the events would be counted as if they had been seen in the caller.
Given the following four stack traces during the pgcollect run:

main
  func1
    func1_
      intfunc

main
  func1
    intfunc2

main
  func2
    intfunc

main
  func2
    func2_

And a suggested call of pgconvert -d symbol -c "intfunc*,*_" filename.pgdata > callgrind.out.overfiew_pgdata
"-c func count as caller, may use an asterisk to match multiple ones"

The "inspected" call stacks would be (after combining everything that starts with "intfunc" or ends with "_"):

main
  func1

main
  func1

main
  func2

main
  func2

When combining, all costs attached to a matching entry would be counted as if they had happened in the caller... _maybe_ that means the combination switch must be handled (and therefore specified) during recording already?
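
The folding itself can be sketched on an already-resolved stack, which suggests it could be done at conversion time. This is a minimal sketch, assuming the frames are available as symbol names ordered from outermost to innermost. The function name is hypothetical, and matching is done with POSIX `fnmatch`, not with anything taken from pgconvert.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: remove every frame whose name matches one of the
 * glob patterns, so that its cost ends up in the nearest remaining caller.
 * frames[] is ordered outermost first and is compacted in place.
 * Returns the new number of frames. */
static size_t fold_into_caller(const char *frames[], size_t n,
                               const char *const patterns[], size_t np)
{
    size_t kept = 0;

    assert(frames != NULL && patterns != NULL);

    for (size_t i = 0; i < n; i++) {
        int match = 0;

        for (size_t p = 0; p < np && !match; p++)
            match = fnmatch(patterns[p], frames[i], 0) == 0;

        /* The outermost frame has no caller to fold into, so keep it. */
        if (!match || kept == 0)
            frames[kept++] = frames[i];
    }
    return kept;
}
```

Applied with the patterns "intfunc*" and "*_", the first example stack (main, func1, func1_, intfunc) shrinks to main, func1, matching the "inspected" stacks listed above.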

Can't create performance event file descriptor: No such file or directory

Testing on current Debian worked fine, testing on another machine (RHEL 8.5) raised this error.

perfgrind/pgcollect.c

Lines 307 to 314 in 9b1a75f

int fd = perf_event_open(&pe_attr, pid, cpu, -1, 0);
if (fd == -1)
{
  perror("Can't create performance event file descriptor");
  if (state->gogoFD != -1)
    close(state->gogoFD);
  exit(EXIT_FAILURE);
}

As perf record -o perf.data --call-graph dwarf,8192 -z worked fine, I have no idea what to check. Do you have any clue why that system call does not work here?
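
One possible explanation, not verified on that machine: `perf_event_open` can fail with ENOENT when the hardware cycles counter is unavailable (for example in some virtual machines), and perf is known to fall back to the software cpu-clock event in that case. The sketch below shows such a fallback. The function names are hypothetical, and whether this is the right fix for pgcollect is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: if the failed attempt was for hardware CPU cycles
 * and the error was ENOENT, switch attr to the software cpu-clock event.
 * Returns 1 if the caller should retry, 0 otherwise. */
static int fall_back_to_cpu_clock(struct perf_event_attr *attr, int err)
{
    assert(attr != NULL);

    if (err != ENOENT ||
        attr->type != PERF_TYPE_HARDWARE ||
        attr->config != PERF_COUNT_HW_CPU_CYCLES)
        return 0;

    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    return 1;
}

/* Open the event, retrying once with the software clock if needed.
 * Returns the file descriptor, or -1 with errno set. */
int open_sampling_event(struct perf_event_attr *attr, pid_t pid, int cpu)
{
    int fd = (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);

    if (fd == -1 && fall_back_to_cpu_clock(attr, errno))
        fd = (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
    return fd;
}
```

Samples taken with cpu-clock are time-based rather than cycle-based, so the resulting costs would not be directly comparable to a hardware-cycles profile.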

FR: add `install` and `uninstall` targets

Should I handle that in #12?
It would only depend on the three binaries being moved into an internal variable PROGRAMS, and would execute install $(PROGRAMS) $(PREFIX)/bin (with PREFIX defaulting to /usr/local).

The uninstall would then execute an rm in the same place.

Side note: some Makefiles copy the existing binaries during install to the build dir, then restore them on uninstall, but I think that's not needed here. Of course some programs drop the uninstall target completely...

bug in pgcollect: `--` not handled correctly; overwriting executables

If pgcollect does not find an output file name, the search continues past --, which should always stop option parsing.

Example and result:

$> perfgrind/pgcollect -sF 8192 -- myprog myoption
Setting frequency to 8192
Going to profile process with PID 2561483: myoption
Can't exec new process: No such file or directory
Collection stopped.
Waked up 8 times
Sythetic mmap events: 0
Real mmap events: 0
Sample events: 0
Total 0 events written

Afterwards, myprog is zero bytes long because it was treated as the output filename.

compiler warnings concerning formatting for 64bit integers

gcc -std=gnu99 -Wall -g -D_GNU_SOURCE -o pgcollect pgcollect.c
pgcollect.c: In function 'collectExistingMappings':
pgcollect.c:127:17: warning: format '%lx' expects argument of type 'long unsigned int *', but argument 3 has type '__u64 *' {aka 'long long unsigned int *'} [-Wformat=]
     sscanf(buf, "%"PRIx64"-%"PRIx64" %s %"PRIx64" %*x:%*x %*u %s\n", &event.addr, &event.len, prot, &event.pgoff,
                 ^~~                                                  ~~~~~~~~~~~
In file included from pgcollect.c:4:
/usr/include/inttypes.h:121:34: note: format string is defined here
 # define PRIx64  __PRI64_PREFIX "x"
pgcollect.c:127:17: warning: format '%lx' expects argument of type 'long unsigned int *', but argument 4 has type '__u64 *' {aka 'long long unsigned int *'} [-Wformat=]
     sscanf(buf, "%"PRIx64"-%"PRIx64" %s %"PRIx64" %*x:%*x %*u %s\n", &event.addr, &event.len, prot, &event.pgoff,
                 ^~~                                                               ~~~~~~~~~~
In file included from pgcollect.c:4:
/usr/include/inttypes.h:121:34: note: format string is defined here
 # define PRIx64  __PRI64_PREFIX "x"
pgcollect.c:127:17: warning: format '%lx' expects argument of type 'long unsigned int *', but argument 6 has type '__u64 *' {aka 'long long unsigned int *'} [-Wformat=]
     sscanf(buf, "%"PRIx64"-%"PRIx64" %s %"PRIx64" %*x:%*x %*u %s\n", &event.addr, &event.len, prot, &event.pgoff,
                 ^~~                                                                                 ~~~~~~~~~~~~
In file included from pgcollect.c:4:
/usr/include/inttypes.h:121:34: note: format string is defined here
 # define PRIx64  __PRI64_PREFIX "x"
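
The warnings arise because `PRIx64` expands to "lx" on this platform while `__u64` is `unsigned long long`. Also, `PRIx64` is the printf macro; the scanf counterpart is `SCNx64`. One way to silence them is to scan into `uint64_t` temporaries and assign afterwards. This is a minimal sketch; the struct and function are hypothetical stand-ins, not the actual pgcollect types.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-in for the event structure; the unsigned long long
 * fields mimic the kernel's __u64. */
struct mapping {
    unsigned long long addr, len, pgoff;
    char prot[8];
    char path[4096];
};

/* Parse one line of /proc/<pid>/maps.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_maps_line(const char *line, struct mapping *m)
{
    uint64_t start, end, pgoff;

    assert(line != NULL && m != NULL);
    m->path[0] = '\0';                  /* anonymous mappings have no path */

    int n = sscanf(line,
                   "%" SCNx64 "-%" SCNx64 " %7s %" SCNx64 " %*x:%*x %*u %4095s",
                   &start, &end, m->prot, &pgoff, m->path);
    if (n < 4)
        return -1;

    /* Plain assignments convert uint64_t to unsigned long long safely. */
    m->addr = start;
    m->len = end - start;
    m->pgoff = pgoff;
    return 0;
}
```

The field widths (%7s, %4095s) also bound the string conversions, which the original format string leaves open.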

FR: GitHub metadata - "About"

Please go to the main page of this GitHub repo, then click right next to "about".

I suggest using the following short description:

perfgrind - tools for collecting samples from Linux performance events subsystem and converting profiling data to callgrind format, allowing it to be read with KCachegrind

and the following tags (you need to enter them one by one):

linux-kernel perf profiling callgrind kcachegrind callgraph

and lastly disable "packages"

For the result see https://github.com/GitMensch/perfgrind on the right side.