andikleen / pmu-tools


Intel PMU profiling tools

License: GNU General Public License v2.0

Makefile 0.04% C 1.82% C++ 0.01% Python 97.33% Shell 0.52% HTML 0.01% Roff 0.24% Perl 0.03%


pmu-tools's Issues

GenuineIntel-6-4F-core.json not found

On an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz:
Downloading https://download.01.org/perfmon/mapfile.csv to mapfile.csv
Downloading https://download.01.org/perfmon/BDX/BroadwellX_core_V10.json to GenuineIntel-6-4F-core.json
Cannot access event server: HTTP Error 404: Not Found

Still, ocperf.py list does produce a reasonable list of events supported on Broadwell, and ocperf.py stat works with my "standard" list of Broadwell events.

toplev -l4 --user generates invalid event spec on Intel(R) Xeon(R) CPU E5-2630 v3 (Haswell-E)

$ python2 ~/shared/pack/pmu-tools/toplev.py -l4 --user ls

yields

Using level 4.
Nodes Data_Sharing Memory_Bound 1_Port_Utilized Split_Stores L3_Bound
2_Ports_Utilized Contested_Accesses 3m_Ports_Utilized Store_Latency
Lock_Latency L3_Hit_Latency Split_Loads Ports_Utilization Core_Bound
MEM_Bound FB_Full have errata HSM30 HSM31 HSM26, HSM30
perf stat -x\; -e '{cpu/event=0x9c,umask=0x1/u,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xe,umask=0x1/u,cycles:u,cpu/event=0x79,umask=0x30/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cpu/event=0xc5,umask=0x0/u,cpu/event=0xd,umask=0x3,cmask=1/u,instructions:u},{cpu/event=0xa2,umask=0x8/u,cpu/event=0xa3,umask=0x6,cmask=6/u,cpu/event=0xb1,umask=0x2,cmask=1/u,cpu/event=0xb1,umask=0x2,cmask=2/u,cpu/event=0xb1,umask=0x2,cmask=3/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cycles:u,cpu/event=0xa3,umask=0x4,cmask=4/u,cpu/event=0x5e,umask=0x1/u,instructions:u},{cpu/event=0x80,umask=0x4/u,cpu/event=0xab,umask=0x2/u,cpu/event=0xa2,umask=0x8/u,cpu/event=0x87,umask=0x1/u,cpu/event=0x14,umask=0x2/u,cpu/event=0x79,umask=0x30,edge=1,cmask=1/u,cpu/event=0xc1,umask=0x40/u,cycles:u},{cpu/event=0x79,umask=0x24,cmask=4/u,cpu/event=0xa8,umask=0x1,cmask=1/u,cpu/event=0x79,umask=0x24,cmask=1/u,cpu/event=0x85,umask=0x60/u,cpu/event=0x79,umask=0x18,cmask=1/u,cpu/event=0xa8,umask=0x1,cmask=4/u,cycles:u,cpu/event=0x79,umask=0x18,cmask=4/u,cpu/event=0x85,umask=0x10/u},{cpu/event=0xa3,umask=0xc,cmask=12/u,cpu/event=0xd1,umask=0x20/u,cpu/event=0xa3,umask=0x6,cmask=6/u,cpu/event=0xd1,umask=0x4/u,cpu/event=0xa3,umask=0x5,cmask=5/u,cycles:u},{cpu/event=0x80,umask=0x4/u,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/u,cpu/event=0xe6,umask=0x1f/u,cpu/event=0x5e,umask=0x1,edge=1,inv=1,cmask=1/u,cpu/event=0x85,umask=0x60/u,cpu/event=0xc5,umask=0x0/u,cycles:u,cpu/event=0x5e,umask=0x1/u,cpu/event=0x85,umask=0x10/u},{cpu/event=0x60,umask=0x8,cmask=6/u,cpu/event=0x7,umask=0x1/u,cpu/event=0xb7,umask=0x1/puhu,cpu/event=0xd0,umask=0x42/u,cpu/event=0x3,umask=0x2/u,cpu/event=0xb1,umask=0x2,cmask=3/u,cycles:u,cpu/event=0xb2,umask=0x1/u},{cpu/event=0x60,umask=0x8,cmask=1/u,cpu/event=0x8,umask=0x60/u,cpu/event=0xb1,umask=0x2,cmask=2/u,cpu/event=0x60,umask=0x8,cmask=6/u,cpu/event=0xb1,umask=0x2,cmask=1/u,cpu/event=0x49,umask=0x60/u,cpu/event=0x49,umask=0x10/u,cpu/event=0x8,umask=0x10/u,cycles:u},{cpu
/event=0x60,umask=0x4,cmask=1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xb1,umask=0x2,cmask=2/u,cpu/event=0xb1,umask=0x2,cmask=3/u,cpu/event=0xd0,umask=0x82/u,cpu/event=0xc0,umask=0x2/u,cycles:u,cpu/event=0xd0,umask=0x21/u,instructions:u},{cpu/event=0x3,umask=0x8/u,cpu/event=0xd1,umask=0x8/u,cpu/event=0xd1,umask=0x40/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cpu/event=0x48,umask=0x2,cmask=1/u,cycles:u,cpu/event=0xa3,umask=0x4,cmask=4/u,cpu/event=0x5e,umask=0x1/u,cpu/event=0x48,umask=0x1/u},{cpu/event=0xd3,umask=0x1/u,cpu/event=0xd1,umask=0x4/u,cpu/event=0xd2,umask=0x4/u,cpu/event=0xd3,umask=0x4/u,cpu/event=0xd2,umask=0x1/u,cpu/event=0xd1,umask=0x40/u,cpu/event=0xd2,umask=0x2/u,cpu/event=0xd1,umask=0x2/u},{cpu/event=0xd3,umask=0x10/u,cpu/event=0xd3,umask=0x20/u,cycles:u}' ls
invalid or unsupported event: '{[snip]}'
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events

The issue appears to be this event:

cpu/event=0xb7,umask=0x1/puhu

which has a duplicate u specifier.
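
One way to guard against a repeated modifier, sketched here under assumptions: add_modifiers is a hypothetical helper (not a pmu-tools function) and it only handles the cpu/.../mods form shown in the report, not the cycles:u form.

```python
def add_modifiers(event, extra):
    """Append perf modifier letters to an event, skipping ones already present.

    Hypothetical helper for illustration only.
    """
    head, sep, mods = event.rpartition("/")
    if not sep:
        # No slash-delimited spec (for example "cycles"); leave it untouched.
        return event
    for ch in extra:
        if ch not in mods:
            mods += ch
    return head + sep + mods


# The modifier "u" is already present, so it is not appended a second time.
print(add_modifiers("cpu/event=0xb7,umask=0x1/pu", "u"))
```

Merging letters instead of concatenating strings keeps the result valid no matter how many code paths request the same modifier.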

cc @lcw

CYCLE_ACTIVITY.STALLS_L1D_PENDING is always zero

I noticed that level 3 stats printed for memory bound workloads are incorrect on my machine (Xeon E5-2658 v3, Linux 3.19). Here is a sample output with a program that is DRAM bound (Intel MLC):

BE      Backend_Bound:                                90.68% 
BE/Mem  Backend_Bound.Memory_Bound:                   84.30% 
BE/Mem  Backend_Bound.Memory_Bound.L1_Bound:          84.35% 
BE/Mem  Backend_Bound.Memory_Bound.L3_Bound:          22.48% 
BE/Mem  Backend_Bound.Memory_Bound.MEM_Bound:         61.69%

L1_Bound value is incorrect. I traced the issue to perf always reporting zero for CYCLE_ACTIVITY.STALLS_L1D_PENDING. Here is a sample perf output for that event:

perf stat -I 1000 -e cpu/event=0xa3,umask=0xc,cmask=12/ -a sleep 5
#           time             counts unit events
     1.000206434                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     2.000452095                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     3.000657316                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     4.000875653                  0      cpu/event=0xa3,umask=0xc,cmask=12/
     5.001068298                  0      cpu/event=0xa3,umask=0xc,cmask=12/

With cmask=4, a value that seems correct is returned. I double checked SDM Vol3b and it seems that cmask value of 12 (0xc) should be correct. I understand this is not directly a pmu-tools bug, but was hoping to hear back if others are affected too.
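
For anyone who wants to compare cmask values on their own machine, the following sketch shows how the fields combine into a raw event code. It assumes the standard x86 event-select layout (event in bits 0-7, umask in bits 8-15, edge in bit 18, inv in bit 23, cmask in bits 24-31); raw_config is a hypothetical helper name.

```python
def raw_config(event, umask, cmask=0, edge=0, inv=0):
    """Pack event fields into a raw config value (assumed x86 layout)."""
    return ((event & 0xFF)
            | ((umask & 0xFF) << 8)
            | ((edge & 1) << 18)
            | ((inv & 1) << 23)
            | ((cmask & 0xFF) << 24))


# Print raw codes for the two cmask values discussed in the report.
for cmask in (4, 12):
    print("cmask=%d -> r%x" % (cmask, raw_config(0xA3, 0xC, cmask=cmask)))
```

The printed rNNN strings can be passed to perf stat -e as raw events, which makes it easy to check whether the named-field form and the raw form report the same count.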

ucevent CBO.LLC_{DDIO,PCIE}_* events for HSX

Haswell Xeon CBO.LLC_{DDIO,PCIE}_* events are missing. Any plans to add them in the near future?

rdpmc_read() in libjevents returns values > 2^48

I've been trying to do something similar to the interrupts.c code, but was having trouble with rdpmc_read() giving seemingly nonsensical results. I've narrowed it down to the buf->offset field sometimes having a high bit set (1L << 48). PERF_EVENT_IOC_RESET will reset both the counter and the offset to 0, but at some point buf->offset will jump back. Masking off the high bits (or just ignoring offset) solves the problem, but I can't find any reason it should be necessary.

I don't know if this is a bug in rdpmc_read(), a bug in the kernel, or something I'm doing wrong. I'm using a slightly older kernel (4.2.0) on Skylake, so it's also possible this is something that has already been fixed. Test code against current pmu-tools master is here: https://github.com/nkurz/pmu-tools/tree/test-offset. Help or suggestions of a better venue would be appreciated. I can upgrade to the current kernel and retest if necessary. Thanks!
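
The masking workaround described above can be sketched as follows. This only models the arithmetic; it does not read a real counter, and the 48-bit width is an assumption taken from the bit position mentioned in the report. combine is a hypothetical name, and this is the reporter's workaround, not a confirmed fix.

```python
COUNTER_BITS = 48  # assumed counter width, based on the stray bit at 1 << 48


def combine(offset, pmc, width=COUNTER_BITS):
    """Add the kernel-provided offset to the raw counter and drop high bits."""
    mask = (1 << width) - 1
    return (offset + pmc) & mask


# An offset with bit 48 set no longer produces a value above 2^48.
print(combine((1 << 48) + 100, 5))
```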

Scale arg ignored in CSV mode

CSV-enabled output ignores the scale argument. If this is intentional the README should be updated, otherwise a quick workaround would be to modify self.vals in OutputCSV.flush() if args.scale is set.

CPU_STARTING and CPU_DYING no longer used in Linux 4.9

Hi, thanks for the great tool!

simple-pebs/simple-pebs.c uses CPU_STARTING and CPU_DYING to allow CPUs to be hot-plugged, but these macros are no longer used in Linux 4.9.
http://lxr.free-electrons.com/ident?v=4.9;i=CPU_STARTING

As this commit (https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=ee1e714b94521b0bb27b04dfd1728ec51b19d4f0) suggests, we should move to the new state machine mechanism to support hot-plugging for kernel 4.9 or later versions.

For most cases, where CPU hot-plugging never happens, simply deleting the notifier callbacks, as in soramichi@d175a0b, should work.

Download JSON Events File

I am running PMU tools on an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
When I run the event_download.py script, it tries to fetch https://download.01.org/perfmon/HSW/Haswell_core_V14.json, which results in a 404 error.

The file it should fetch seems to be https://download.01.org/perfmon/HSW/Haswell_core_V15.json. Is there any workaround for this problem?
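
To check which file the map file names for a given CPU, a lookup along these lines may help. It is a sketch under assumptions: the column names (Family-model, Version, Filename, EventType) and the sample row are illustrative, and core_json_path is a hypothetical helper, not part of event_download.py.

```python
import csv
import io

# Illustrative sample only; the real mapfile.csv comes from the event server.
SAMPLE_MAPFILE = """Family-model,Version,Filename,EventType
GenuineIntel-6-3C,V15,/HSW/Haswell_core_V15.json,core
"""


def core_json_path(mapfile_text, cpu_id):
    """Return the core event file path listed for cpu_id, or None."""
    for row in csv.DictReader(io.StringIO(mapfile_text)):
        if row["Family-model"] == cpu_id and row["EventType"] == "core":
            return row["Filename"]
    return None


print(core_json_path(SAMPLE_MAPFILE, "GenuineIntel-6-3C"))
```

If the path found this way differs from the one the script requests, a stale local copy of the map file is one possible explanation worth ruling out.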

Fails to compile in CentOS 6.5

addr.c uses the macro PERF_ATTR_SIZE_VER1, which isn't available in CentOS's version of /usr/include/linux/perf_events.h. However, it does have PERF_ATTR_SIZE_VER0, and compiles correctly when that is changed. Not sure if that will cause issues with usage, however.

Also, addr.c doesn't compile with gcc 4.4.7, and I'm reasonably certain that it's because that file uses an anonymous union in a struct, which that version of GCC doesn't support (not even with -std=c11, which isn't a supported option in this version). Upgrading my GCC to one that was compiled in this decade fixes the issue, without any source code modification other than the replaced macro above.

fucking windows newline symbol!!!!

alp@ws207:~/wrk$ ./pmu-tools/list-events.py
bash: ./pmu-tools/list-events.py: /usr/bin/python^M: bad interpreter: No such file or directory
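
The ^M in the message is a carriage return left over from Windows (CRLF) line endings. A local workaround, sketched with a hypothetical helper name, is to rewrite the script with Unix line endings:

```python
def strip_carriage_returns(path):
    """Convert CRLF line endings to LF in place; return True if changed."""
    with open(path, "rb") as f:
        data = f.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != data


# Example call (path taken from the report above):
# strip_carriage_returns("pmu-tools/list-events.py")
```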

toplev crashes on wrong float literal

Everything below was run as root.

# toplev.py -l1 sleep 10
Will measure complete system.
Using level 1.
perf stat -x\; -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,any=1,cmask=1/,cpu/event=0xc2,umask=0x2/}' -A -a sleep 10
Traceback (most recent call last):
  File "/media/dc/B2B200EFB200B9BD/inz/pmu-tools/toplev.py", line 1617, in <module>
    ret = execute(runner, out, rest)
  File "/media/dc/B2B200EFB200B9BD/inz/pmu-tools/toplev.py", line 792, in execute
    env)
  File "/media/dc/B2B200EFB200B9BD/inz/pmu-tools/toplev.py", line 907, in do_execute
    multiplex = float(n[off + 1])
ValueError: invalid literal for float(): 100,00

Below is the result of perf stat ...:

# perf stat -x\; -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,any=1,cmask=1/,cpu/event=0xc2,umask=0x2/}' -A -a sleep 10

CPU0;2241793414;;cpu/event=0x3c,umask=0x0,any=1/;10007127837;100,00
CPU1;2241051974;;cpu/event=0x3c,umask=0x0,any=1/;10007126109;100,00
CPU2;798878574;;cpu/event=0x3c,umask=0x0,any=1/;10007122595;100,00
CPU3;798025029;;cpu/event=0x3c,umask=0x0,any=1/;10007121927;100,00
CPU4;1479869080;;cpu/event=0x3c,umask=0x0,any=1/;10007136940;100,00
CPU5;1479260102;;cpu/event=0x3c,umask=0x0,any=1/;10007135470;100,00
CPU6;1637764499;;cpu/event=0x3c,umask=0x0,any=1/;10007133938;100,00
CPU7;1637043424;;cpu/event=0x3c,umask=0x0,any=1/;10007132916;100,00
CPU0;1778359179;;cpu/event=0xe,umask=0x1/;10007225730;100,00
CPU1;372610005;;cpu/event=0xe,umask=0x1/;10007224789;100,00
CPU2;423892267;;cpu/event=0xe,umask=0x1/;10007221503;100,00
CPU3;159917631;;cpu/event=0xe,umask=0x1/;10007219288;100,00
CPU4;457584393;;cpu/event=0xe,umask=0x1/;10007232528;100,00
CPU5;741543029;;cpu/event=0xe,umask=0x1/;10007230406;100,00
CPU6;1260524783;;cpu/event=0xe,umask=0x1/;10007228798;100,00
CPU7;402408452;;cpu/event=0xe,umask=0x1/;10007227198;100,00
CPU0;3625922836;;cpu/event=0x9c,umask=0x1/;10007284308;100,00
CPU1;153504280;;cpu/event=0x9c,umask=0x1/;10007281630;100,00
CPU2;1325774321;;cpu/event=0x9c,umask=0x1/;10007277765;100,00
CPU3;74342815;;cpu/event=0x9c,umask=0x1/;10007275369;100,00
CPU4;1632602740;;cpu/event=0x9c,umask=0x1/;10007287236;100,00
CPU5;268262892;;cpu/event=0x9c,umask=0x1/;10007284804;100,00
CPU6;2650705954;;cpu/event=0x9c,umask=0x1/;10007284336;100,00
CPU7;154401725;;cpu/event=0x9c,umask=0x1/;10007282072;100,00
CPU0;61193398;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007317348;100,00
CPU1;61193275;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007314139;100,00
CPU2;22095380;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007310171;100,00
CPU3;22095271;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007307061;100,00
CPU4;32942258;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007305560;100,00
CPU5;32942366;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007302497;100,00
CPU6;46513674;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007288378;100,00
CPU7;46513732;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007284929;100,00
CPU0;1471503714;;cpu/event=0xc2,umask=0x2/;10007304010;100,00
CPU1;340643421;;cpu/event=0xc2,umask=0x2/;10007300601;100,00
CPU2;357872568;;cpu/event=0xc2,umask=0x2/;10007296155;100,00
CPU3;126618075;;cpu/event=0xc2,umask=0x2/;10007292269;100,00
CPU4;397747237;;cpu/event=0xc2,umask=0x2/;10007290478;100,00
CPU5;628580576;;cpu/event=0xc2,umask=0x2/;10007286803;100,00
CPU6;1041831779;;cpu/event=0xc2,umask=0x2/;10007272537;100,00
CPU7;355401725;;cpu/event=0xc2,umask=0x2/;10007268529;100,00

I am not sure what the cause is, but maybe it is the locale?

# locale
LANG=pl_PL.utf8
LANGUAGE=en_US
LC_CTYPE="pl_PL.utf8"
LC_NUMERIC="pl_PL.utf8"
LC_TIME="pl_PL.utf8"
LC_COLLATE="pl_PL.utf8"
LC_MONETARY="pl_PL.utf8"
LC_MESSAGES="pl_PL.utf8"
LC_PAPER="pl_PL.utf8"
LC_NAME="pl_PL.utf8"
LC_ADDRESS="pl_PL.utf8"
LC_TELEPHONE="pl_PL.utf8"
LC_MEASUREMENT="pl_PL.utf8"
LC_IDENTIFICATION="pl_PL.utf8"
LC_ALL=pl_PL.utf8

The /usr/bin/python version (shouldn't you use /usr/bin/env python instead?):

Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
[GCC 5.2.1 20151010] on linux2

The issue can probably be solved by setting the locale in Python to the system one:

Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import locale

In [2]: locale.getdefaultlocale()
Out[2]: ('pl_PL', 'UTF-8')

In [3]: locale.atof("23.3")
Out[3]: 23.3

In [4]: locale.atof("23,3")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-132b6afaec24> in <module>()
----> 1 locale.atof("23,3")

/usr/lib/python2.7/locale.pyc in atof(string, func)
    314         string = string.replace(dd, '.')
    315     #finally, parse the string
--> 316     return func(string)
    317 
    318 def atoi(str):

ValueError: invalid literal for float(): 23,3

In [5]: locale.setlocale(locale.LC_ALL, '.'.join(locale.getdefaultlocale())
   ...: )
Out[5]: 'pl_PL.UTF-8'

In [6]: locale.atof("23,3")
Out[6]: 23.3

In [7]: locale.atof("23.3")
Out[7]: 23.3

So in the end it seems that locale.atof should be used instead of float when casting str to float.
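
As the session above shows, locale.atof only accepts the comma after setlocale has been called. A locale-independent alternative is sketched below; parse_number is a hypothetical helper and it assumes the field never contains thousands separators.

```python
def parse_number(text):
    """Parse a float that may use either '.' or ',' as the decimal mark."""
    try:
        return float(text)
    except ValueError:
        # Fall back to treating a comma as the decimal separator.
        return float(text.replace(",", "."))


print(parse_number("100,00"))
print(parse_number("23.3"))
```

Another option might be to run perf itself with LC_ALL=C so that it emits a period in the first place; that avoids touching the parser but has not been verified here.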

No further Backend_Bound output in toplev -l2

Level 1: toplev.py sleep 60
Using level 1.
perf stat -x, -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1,any=1/,cpu/event=0xc2,umask=0x2/}' -A -a sleep 60
S0-C0 FE Frontend_Bound: 35.21%
S0-C0 BE Backend_Bound: 50.81% //maybe the bound.
S0-C1 FE Frontend_Bound: 34.63%
S0-C1 BE Backend_Bound: 43.94%
.....
Level 2: toplev.py -l2 sleep 60
S0-C0 FE Frontend_Bound: 32.92%

S0-C0 FE Frontend_Bound.Frontend_Latency: 27.60%

S0-C0 BE Backend_Bound: 52.92%

S0-C1 FE Frontend_Bound: 36.04%
S0-C1 FE Frontend_Bound.Frontend_Latency: 29.17%
......
S0-C0-T1BE/Mem Backend_Bound.Memory_Bound: 0.00% mismeasured

As you can see, there are no sub-items for S0-C0 BE, whereas the frontend does get one (Frontend_Bound.Frontend_Latency). Why?
My kernel is Ubuntu 3.16.0-31-generic.

I can't get pebs-grabber to install

I was able to build and install simple-pebs without issue.

This is on kernel 4.2.0-19

When I try pebs-grabber, I get an error.

dmesg says:
pebs_grabber: PEBS version 2
pebs_grabber: Cannot register kprobe: -2

KeyError: u'MATRIX_REQUEST'

Hi,

ocperf fails on my machine:

$ ocperf.py stat -e arith.div:k 
Downloading https://download.01.org/perfmon/mapfile.csv to mapfile.csv
Downloading https://download.01.org/perfmon/HSW/Haswell_core_V15.json to GenuineIntel-6-3C-core.json
Downloading https://download.01.org/perfmon/HSW/Haswell_matrix_bit_definitions_V15.json to GenuineIntel-6-3C-offcore.json
Downloading https://download.01.org/perfmon/readme.txt to readme.txt
Traceback (most recent call last):
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 690, in <module>
    emap = find_emap()
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 536, in find_emap
    return json_with_extra(el)
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 482, in json_with_extra
    add_extra_env(emap, el)
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 492, in add_extra_env
    emap.add_offcore(oc)
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 452, in add_offcore
    if row[u"MATRIX_REQUEST"].upper() != "NULL":
KeyError: u'MATRIX_REQUEST'

Same for ocperf.py list:

$ ocperf.py list
Traceback (most recent call last):
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 690, in <module>
    emap = find_emap()
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 524, in find_emap
    emap = json_with_extra(el)
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 482, in json_with_extra
    add_extra_env(emap, el)
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 492, in add_extra_env
    emap.add_offcore(oc)
  File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 452, in add_offcore
    if row[u"MATRIX_REQUEST"].upper() != "NULL":
KeyError: u'MATRIX_REQUEST'

The CPU is Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Issues with ocperf and toplev

I am running PMU-Tools on a Haswell i7 processor with 3.13.0-35-generic kernel (Ubuntu). I am getting some odd behavior in the output of ocperf and toplev.

When I run ocperf.py stat with the same events as toplev.py, many of the counters show up as <not counted>. Is this normal behavior? As I understood it, ocperf.py shouldn't behave this way, because it uses the events directly from Intel's description of the micro-architecture of my computer.

'{cycles,cpu/event=0xc2,umask=0x2/,ref-cycles,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/},{cpu/event=0xa2,umask=0x8/,cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0x9c,umask=0x1/,cpu/event=0x9c,umask=0x1,cmask=4/,cycles,instructions},{cpu/event=0xe,umask=0x1/,cycles,cpu/event=0x79,umask=0x30/,cpu/event=0xc2,umask=0x2/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xb1,umask=0x1,cmask=2/,cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/,cpu/event=0xb1,umask=0x1,cmask=3/},{cpu/event=0xa3,umask=0x6,cmask=6/,cycles,cpu/event=0xa2,umask=0x8/,cpu/event=0x5e,umask=0x1/,instructions},{cpu/event=0xab,umask=0x2/,cpu/event=0x87,umask=0x1/,cycles,cpu/event=0x79,umask=0x30,edge=1,cmask=1/,cpu/event=0x85,umask=0x10/},{cpu/event=0x80,umask=0x4/,cpu/event=0x79,umask=0x24,cmask=4/,cycles,cpu/event=0x79,umask=0x24,cmask=1/,cpu/event=0x85,umask=0x10/},{cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0x79,umask=0x18,cmask=1/,cycles,cpu/event=0xa3,umask=0xc,cmask=12/,cpu/event=0x79,umask=0x18,cmask=4/},{cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0xa3,umask=0xc,cmask=12/,cpu/event=0xa3,umask=0x5,cmask=5/,cpu/event=0xa2,umask=0x8/,cycles},{cpu/event=0xa3,umask=0x5,cmask=5/,cpu/event=0xd1,umask=0x4/,cpu/event=0xd1,umask=0x20/,cycles},{cpu/event=0xc5,umask=0x0/,cpu/event=0xe6,umask=0x1f/,cpu/event=0x5e,umask=0x1/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0x80,umask=0x4/,cycles,cpu/event=0x5e,umask=0x1,edge=1,inv=1,cmask=1/},{cpu/event=0xd1,umask=0x4/,cycles,cpu/event=0xd2,umask=0x2/,cpu/event=0x7,umask=0x1/,cpu/event=0x3,umask=0x2/},{cpu/event=0x8,umask=0x10/,cycles,cpu/event=0x8,umask=0x60/,cpu/event=0x60,umask=0x1,cmask=6/},{cpu/event=0xd2,
umask=0x1/,cpu/event=0x60,umask=0x1,cmask=1/,cycles,cpu/event=0xd2,umask=0x4/,cpu/event=0x60,umask=0x1,cmask=6/},{cpu/event=0xb7,umask=0x1,offcore_rsp=0x10003c0002/,cycles,cpu/event=0xd0,umask=0x42/,cpu/event=0xd2,umask=0x4/,cpu/event=0xd0,umask=0x82/},{cpu/event=0x49,umask=0x60/,cycles,cpu/event=0x49,umask=0x10/},{cpu/event=0xd1,umask=0x8/,cpu/event=0x3,umask=0x8/,cycles,cpu/event=0x48,umask=0x1/}

When I look at the output of this command, many events show up as <not counted>.
Here is a sample of the output:

     1.196793465,<not counted>,cpu/event=0xab,umask=0x2/
     1.196793465,<not counted>,cpu/event=0x87,umask=0x1/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0x79,umask=0x30,edge=1,cmask=1/
     1.196793465,<not counted>,cpu/event=0x85,umask=0x10/
     1.196793465,<not counted>,cpu/event=0x80,umask=0x4/
     1.196793465,<not counted>,cpu/event=0x79,umask=0x24,cmask=4/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0x79,umask=0x24,cmask=1/
     1.196793465,<not counted>,cpu/event=0x85,umask=0x10/
     1.196793465,<not counted>,cpu/event=0xa3,umask=0x6,cmask=6/
     1.196793465,<not counted>,cpu/event=0x79,umask=0x18,cmask=1/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0xa3,umask=0xc,cmask=12/
     1.196793465,<not counted>,cpu/event=0x79,umask=0x18,cmask=4/
     1.196793465,<not counted>,cpu/event=0xa3,umask=0x6,cmask=6/
     1.196793465,<not counted>,cpu/event=0xa3,umask=0xc,cmask=12/
     1.196793465,<not counted>,cpu/event=0xa3,umask=0x5,cmask=5/
     1.196793465,<not counted>,cpu/event=0xa2,umask=0x8/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0xa3,umask=0x5,cmask=5/
     1.196793465,<not counted>,cpu/event=0xd1,umask=0x4/
     1.196793465,<not counted>,cpu/event=0xd1,umask=0x20/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0xc5,umask=0x0/
     1.196793465,<not counted>,cpu/event=0xe6,umask=0x1f/
     1.196793465,<not counted>,cpu/event=0x5e,umask=0x1/
     1.196793465,<not counted>,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/
....
....
     1.196793465,<not counted>,cpu/event=0xd1,umask=0x4/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0xd2,umask=0x2/
     1.196793465,<not counted>,cpu/event=0x7,umask=0x1/
     1.196793465,<not counted>,cpu/event=0x3,umask=0x2/
     1.196793465,<not counted>,cpu/event=0x8,umask=0x10/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0x8,umask=0x60/
     1.196793465,<not counted>,cpu/event=0x60,umask=0x1,cmask=6/
     1.196793465,<not counted>,cpu/event=0xd2,umask=0x1/
     1.196793465,<not counted>,cpu/event=0x60,umask=0x1,cmask=1/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0xd2,umask=0x4/
     1.196793465,<not counted>,cpu/event=0x60,umask=0x1,cmask=6/
     1.196793465,<not counted>,cpu/event=0xb7,umask=0x1,offcore_rsp=0x10003c0002/
     1.196793465,<not counted>,cycles
     1.196793465,<not counted>,cpu/event=0xd0,umask=0x42/
     1.196793465,<not counted>,cpu/event=0xd2,umask=0x4/
     1.196793465,<not counted>,cpu/event=0xd0,umask=0x82/

Could this be the reason toplev.py seems to produce stacked bar plots that do not sum to 100%? For example, what does it mean when a first-level value is zero but the back-end bound metrics in level 2 are non-zero?

toplev crashes

Saw this with a76c89a:

$ ~/software/pmu-tools/repo/toplev.py -l1 sleep 10
Using level 1.
perf stat -x, -e 'task-clock,{cpu/event=0xc2,umask=0x2/,cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cpu/event=0x9c,umask=0x1/,cycles}' sleep 10
Traceback (most recent call last):
  File "/home/steinbac/software/pmu-tools/repo/toplev.py", line 1748, in <module>
    ret = execute(runner, out, rest)
  File "/home/steinbac/software/pmu-tools/repo/toplev.py", line 960, in execute
    print_keys(runner, res, rev, valstats, out, interval, env)
  File "/home/steinbac/software/pmu-tools/repo/toplev.py", line 885, in print_keys
    cores = [key_to_coreid(x) for x in res.keys() if int(x) in runner.allowed_threads]
ValueError: invalid literal for int() with base 10: ''

KeyError: 'Description'

Hi,
I just tried pmu-tools/ocperf.py on a Haswell box:

$ ./ocperf.py
Traceback (most recent call last):
  File "./ocperf.py", line 774, in <module>
    emap = find_emap()
  File "./ocperf.py", line 599, in find_emap
    emap = json_with_extra(el)
  File "./ocperf.py", line 557, in json_with_extra
    add_extra_env(emap, el)
  File "./ocperf.py", line 574, in add_extra_env
    emap.add_uncore(uc)
  File "./ocperf.py", line 551, in add_uncore
    self.uncore_events[name] = UncoreEvent(name, row)
  File "./ocperf.py", line 241, in __init__
    e.desc = row['Description'].strip()
KeyError: 'Description'

It's trying to open ${HOME}/.cache//pmu-events/GenuineIntel-6-3F-uncore.json, which does not contain a Description field anywhere.

Any ideas on how to proceed?
Best -

$ uname -a
Linux islay.mpi-cbg.de 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.1.1503 (Core) 
Release:        7.1.1503
Codename:       Core
$ cat /proc/cpuinfo|grep -i "name"|head -n1
model name      : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
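
A defensive lookup would avoid the KeyError when a field is absent. The sketch below is not the ocperf.py code: event_description is a hypothetical helper, and the fallback key names are assumptions about what such a JSON file may contain.

```python
def event_description(row, keys=("Description", "PublicDescription",
                                 "BriefDescription")):
    """Return the first non-empty description field, or '' if none exists."""
    for key in keys:
        value = row.get(key)
        if value:
            return value.strip()
    return ""


# A row without "Description" no longer raises KeyError.
print(repr(event_description({"BriefDescription": " Example text "})))
print(repr(event_description({})))
```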

ocperf.py event naming doesn't correspond to perf's one

Example:

$ ocperf.py record --event offcore_response.all_reads.l3_hit.hitm_other_core sleep 1
$ perf evlist
offcore_response_all_reads_l3_hit_hitm_other_core

So ocperf.py event names contain dots, while perf's event names contain only underscores.
This confuses tools that use perf and prevents ocperf.py from being used as a drop-in wrapper for perf.

I believe there are a few possible solutions:

Rename all ocperf.py events.
Let users specify perf-style event names (with underscores).
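
The second option could look roughly like this. It is a minimal sketch: perf_style_name and build_alias_map are hypothetical names, and the mapping rule (lowercase, dots to underscores) is inferred only from the single example above.

```python
def perf_style_name(name):
    """Convert a dotted ocperf.py event name to the underscore form."""
    return name.lower().replace(".", "_")


def build_alias_map(ocperf_names):
    """Map perf-style names back to the original dotted names."""
    return {perf_style_name(n): n for n in ocperf_names}


aliases = build_alias_map(
    ["offcore_response.all_reads.l3_hit.hitm_other_core"])
print(aliases["offcore_response_all_reads_l3_hit_hitm_other_core"])
```

With such a map, either spelling given on the command line can be resolved to the same event before it is translated for perf.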

HSM31 on Xeon v3 valid?

I'm currently testing toplev.py on a Xeon v3 (Haswell) and see this output in level 2 system-wide test:

# ./toplev.py -l2 sleep 5
Will measure complete system.
Using level 2.
warning: removing Memory_Bound Core_Bound due to unsupported events in kernel:
CYCLE_ACTIVITY.CYCLES_NO_EXECUTE CYCLE_ACTIVITY.STALLS_LDM_PENDING
Use --force-events to override (may result in wrong measurements)
Nodes Memory_Bound Core_Bound have errata HSM31 and were disabled.
Override with --ignore-errata

Using the --force-events and --ignore-errata options works as a workaround. However, I'm wondering if the error is valid in the first place?

I see HSM31 in the ~/.cache/pmu-events/GenuineIntel-6-3C-core.json
file and find it in the Intel 4th-gen-core mobile specification update
as HSM31: Performance Monitor UOPS_EXECUTED Event May Undercount.

However, I don't see HSM31 or anything UOPS_EXECUTED-related mentioned in the Intel Xeon E5 v3 specification update for my processor at all.
So is this warning valid?

The CPU of my Haswell test system:

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping:              2
CPU MHz:               2888.281
BogoMIPS:              5010.61
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47

Running the latest CentOS 7.2 kernel:

# uname -a
Linux haswell1 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Issues with energy measuring (--power)

I believe I've found two issues with --power functionality in current HEAD.

First, commit 9458aea925a20c19c9d15056c5dc623dc3fdbf12 appears to break power events (and likely some others too), because after that change valid_events_str is computed too early, before valid_events are populated in Runner::collect.

Reverting that commit seems to fix the issue for me on an HT CPU. However, on a non-HT CPU another issue remains: the metrics are not printed unless I add the -A flag to the perf command line.

CPU_CLK_UNHALTED.THREAD{,_P} is same event in ocperf/toplev

These should be different events.

Cannot run toplev.py

I'm running pmu-tools on Intel Xeon E5-2660 (Sandy Bridge). ocperf.py runs fine, but toplev.py always gives me an error "IndexError: list index out of range".

Traceback (most recent call last):
  File "./pmu-tools/toplev.py", line 765, in <module>
    sys.exit(execute(runner.evnum, runner, out, rest))
  File "./pmu-tools/toplev.py", line 461, in execute
    runner.print_res(res[j], rev[j], out, interval, j)
  File "./pmu-tools/toplev.py", line 654, in print_res
    obj.compute(lambda e, level:
  File "/home/fei/pmu-tools/simple_ratios.py", line 36, in compute
    self.val = EV("IDQ_UOPS_NOT_DELIVERED.CORE", 1) / SLOTS(EV)
  File "./pmu-tools/toplev.py", line 655, in <lambda>
    lookup_res(res, rev, e, obj.res_map[(e, level)]))
  File "./pmu-tools/toplev.py", line 482, in lookup_res
    return res[index]
IndexError: list index out of range

parser cannot read samples in perf.data after

PERF=perf315 ./tester works
PERF=perf316 ./tester

does not read any samples and fails

Problems Running Toplev

I am trying to run this command on an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz:

sudo ../pmu-tools/toplev.py -I 100 -l3 --title "GNU grep" --graph grep -r asdf /etc/*

This triggers an AssertionError inside toplev.

Traceback (most recent call last):
  File "../pmu-tools/toplev.py", line 950, in <module>
    ret = execute(runner, out, rest)
  File "../pmu-tools/toplev.py", line 509, in execute
    env)
  File "../pmu-tools/toplev.py", line 549, in do_execute
    runner.print_res(res[j], rev[j], out, prev_interval, j, env)
  File "../pmu-tools/toplev.py", line 806, in print_res
    obj.compute(lambda e, level:
  File "/home/subho/pmu-tools/hsw_client_ratios.py", line 713, in compute
    self.val = BackendBoundAtEXE(EV, 2)- self.MemoryBound.compute(EV )
  File "/home/subho/pmu-tools/hsw_client_ratios.py", line 30, in BackendBoundAtEXE
    return BackendBoundAtEXE_stalls(EV, level) / CLKS(EV, level)
  File "/home/subho/pmu-tools/hsw_client_ratios.py", line 28, in BackendBoundAtEXE_stalls
    return ( EV("CYCLE_ACTIVITY.CYCLES_NO_EXECUTE", level) + EV("UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC", level) - FewUopsExecutedThreshold(EV, level) - EV("RS_EVENTS.EMPTY_CYCLES", level) + EV("RESOURCE_STALLS.SB", level) )
  File "../pmu-tools/toplev.py", line 807, in <lambda>
    lookup_res(res, rev, e, obj, env, level))
  File "../pmu-tools/toplev.py", line 631, in lookup_res
    assert event_rmap(rev[index]) == canon_event(ev)
AssertionError

This might be related to #7. I downloaded the https://download.01.org/perfmon/HSW/Haswell_core_V15.json and put it in my pmu_events folder as GenuineIntel-6-3C-core.json (instead of the V14 file which does not exist there and which event_download.py was looking for).

Puzzling about ICache_Misses

1. Why is the metric "ICache_Misses" not included in jkt_server_ratios.py?
2. What is your view on the difference between the "ICache Misses" formula here and the one in VTune?
VTune: event("ICACHE.MISSES") / query("InstructionsRetired")
pmu-tools: EV("ICACHE.IFETCH_STALL", 3) / CLKS(EV, 3) - ITLB_Miss_Cycles(EV, 3) / CLKS(EV, 3)

Thank you!
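To make the difference between the two formulas quoted above concrete, here is a minimal Python sketch. The function names and the input counts are hypothetical and chosen only for illustration; they are not taken from VTune or from pmu-tools. The point is that the first formula is a rate per retired instruction, while the second is a fraction of cycles, so the two numbers are not directly comparable.

```python
def icache_misses_per_instruction(icache_misses, instructions_retired):
    """Formula quoted for VTune: misses divided by retired instructions."""
    return icache_misses / instructions_retired


def icache_stall_fraction(ifetch_stall_cycles, itlb_miss_cycles, clks):
    """Formula quoted for pmu-tools: fraction of cycles stalled on
    instruction fetch, minus the fraction attributed to ITLB misses."""
    return ifetch_stall_cycles / clks - itlb_miss_cycles / clks


# Hypothetical counter values, for illustration only.
per_insn = icache_misses_per_instruction(1000, 1000000)
stall_frac = icache_stall_fraction(5000, 1000, 100000)
print("misses per instruction: %.4f" % per_insn)
print("fraction of cycles stalled: %.4f" % stall_frac)
```

A program with few but expensive instruction cache misses could score low on the first number and high on the second, which may be one reason the two tools report different values.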

ucevent.py fails with the new perf stat csv output format

The new perf stat csv output (https://lwn.net/Articles/653941/) breaks ucevent.py.

The assertion on line 601 of ucevent.py (assert evp[0] == j) fails because measure() includes the new stats printed after the event name as part of the event name: e.g., in a sample run evp[0] is 'uncore_imc_0/event=0x4,umask=0x3/,103357003,10.39' rather than 'uncore_imc_0/event=0x4,umask=0x3/'.
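One possible way to tolerate the extra fields is sketched below. It assumes the caller already knows the expected event name (j in the assertion) and only needs to strip whatever perf appended after it. The helper name split_event_field is hypothetical and is not part of ucevent.py. Splitting on commas alone would not work, because the event name itself contains commas.

```python
def split_event_field(field, expected):
    """Separate the expected event name from any extra stats after it.

    Returns (event_name, extra_fields). Raises ValueError if the field
    does not begin with the expected event name.
    """
    if field == expected:
        # Old format: the field is exactly the event name.
        return expected, []
    prefix = expected + ","
    if field.startswith(prefix):
        # New format: extra comma-separated stats follow the name.
        return expected, field[len(prefix):].split(",")
    raise ValueError("unexpected event %r, wanted %r" % (field, expected))


name, extra = split_event_field(
    "uncore_imc_0/event=0x4,umask=0x3/,103357003,10.39",
    "uncore_imc_0/event=0x4,umask=0x3/")
print(name, extra)
```

With a helper like this, the check could compare the returned name against j instead of comparing the whole field.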

gen-dot.py can't work with latest ratios files.

gen-dot.py does not work with the latest ratios files, because the Runner instance has no attribute 'metric' and the node instances have no attribute 'parent':

Traceback (most recent call last):
  File "./gen-dot.py", line 45, in <module>
    m.Setup(runner)
  File "/home/yefeng/pmu-tools-master/ivb_client_ratios.py", line 1604, in __init__
    n = Metric_IPC() ; r.metric(n)
AttributeError: Runner instance has no attribute 'metric'

and

Traceback (most recent call last):
  File "./gen-dot.py", line 48, in <module>
    runner.fix_parents()
  File "./gen-dot.py", line 32, in fix_parents
    if not obj.parent:
AttributeError: Frontend_Bound instance has no attribute 'parent'

I think runner.fix_parents() is not needed. After removing that call and modifying runner.finish() as follows, it works:

class Runner:
    def finish(self):
        for n in self.olist:
            if n.level > 1:
                print '"%s" -> "%s";' % (n.parent.name, n.name)
            else:
                print '"%s";' % (n.name)
    def metric(self, n):
        pass

runner = Runner()
m.Setup(runner)
print >>sys.stderr, runner.olist
#runner.fix_parents()
print "digraph {"
print "fontname=\"Courier\";"
runner.finish()
print "}"

NameError: global name 'sample_regs_user' is not defined

perf record -b --call-graph dwarf -- sleep 3
python perfdata.py perf.data

Traceback (most recent call last):
File "/home/ubuntu/Source/pmu-tools/parser/perfdata.py", line 575, in
h = perf_file.parse_stream(f)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 197, in parse_stream
return self._parse(stream, Container())
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 960, in _parse
obj = self.subcon._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 287, in _parse
return self._decode(self.subcon._parse(stream, context), context)
File "/usr/lib/python2.7/dist-packages/construct/adapters.py", line 261, in _decode
return self.inner_subcon._parse(BytesIO(obj), context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 519, in _parse
obj.append(self.subcon._parse(stream, context))
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 659, in _parse
sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 840, in _parse
obj = self.cases.get(key, self.default)._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 270, in _parse
return self.subcon._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 840, in _parse
obj = self.cases.get(key, self.default)._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 430, in _parse
count = self.countfunc(context)
File "/home/ubuntu/Source/pmu-tools/parser/perfdata.py", line 127, in
Array(lambda ctx: sample_regs_user,
NameError: global name 'sample_regs_user' is not defined

#!/usr/bin/python should be #!/usr/bin/env python in toplev.py (as in ocperf.py)

minor issue,

Improper handling of non-consecutive imc uncore pmu dev names

On my machine (running Linux 3.19) the imc uncore pmu dev names are not consecutive: instead of /sys/devices/uncore_imc_{0..3} I have /sys/devices/uncore_imc_{0,1,4,5}. But expand_events (possibly among other places in the code) assumes there is no gap in the naming, so I end up with only two values instead of four.

As a quick/dirty workaround I modified ucexpr.py>expand_events to:

for n in range(10):
    if ucevent.box_exists(...):
        l.append(...)
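Instead of probing a fixed range, the box indices could be read from sysfs directly. The following is a minimal sketch under that assumption. find_box_indices is a hypothetical helper, not an existing function in ucexpr.py or ucevent.py, and the sysfs_root parameter exists only so the function can be tried against a test directory.

```python
import os
import re


def find_box_indices(box, sysfs_root="/sys/devices"):
    """Return the sorted indices N for which uncore_<box>_<N> exists."""
    if not os.path.isdir(sysfs_root):
        return []
    # Match the whole entry name so that e.g. uncore_imc_free_running_0
    # is not mistaken for an imc box.
    name_re = re.compile(r"uncore_%s_(\d+)" % re.escape(box))
    indices = []
    for entry in os.listdir(sysfs_root):
        m = name_re.fullmatch(entry)
        if m:
            indices.append(int(m.group(1)))
    return sorted(indices)


if __name__ == "__main__":
    print(find_box_indices("imc"))
```

On the machine described above this would be expected to yield the indices 0, 1, 4 and 5, with no assumption about gaps or an upper bound.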

ocperf: different values of cpu_clk_unhalted.thread_any within the same core

Hi Andi,

I'm trying to measure the unhalted cycles on a core basis. I therefore used ocperf and selected the above mentioned event.

However, what I'm getting from ocperf is somewhat odd. I was expecting to get the same values for the two threads that share the same core, but this does not seem to be the case.

$ sudo ./ocperf.py stat -e cpu_clk_unhalted.thread_any -a -A sleep 5
perf stat -e cpu/event=0x3c,umask=0x0,any=1,name=cpu_clk_unhalted_thread_any/ -a -A sleep 5

Performance counter stats for 'system wide':

CPU0 627.912.025 cpu_clk_unhalted_thread_any
CPU1 627.248.055 cpu_clk_unhalted_thread_any
CPU2 529.161.153 cpu_clk_unhalted_thread_any
CPU3 812.752.677 cpu_clk_unhalted_thread_any

   5,001079353 seconds time elapsed

Any hint at what might be the culprit here?

Some info on my system:

OS: Ubuntu 14.04
CPU : Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz

Thank you!

Fix in ocperf

Hello,

I have a little fix to propose for the process_args function in ocperf.py:

From 21b152a29f59da03769d4db33df720123218de80 Mon Sep 17 00:00:00 2001
From: Omar Awile <[email protected]>
Date: Mon, 12 Sep 2016 10:17:11 +0200
Subject: [PATCH] Pass along optional argv parameter for this case too

---
 ocperf.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ocperf.py b/ocperf.py
index f9bc904..7b1c068 100755
--- a/ocperf.py
+++ b/ocperf.py
@@ -790,7 +790,7 @@ def process_args(emap, argv=sys.argv):
                                              True if record == yes else False, emap)
             cmd.append(prefix + event)
         elif argv[i][0:2] == '-c':
-            oarg, i, prefix = getarg(i, cmd)
+            oarg, i, prefix = getarg(i, cmd, argv=argv)
             if oarg == "default":
                 if overflow is None:
                     print >>sys.stderr,"""

cheers!

not counted issue on Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

Hi community!

I am running perf as: sudo perf stat -e r00c0,r01c0,r01c0:p,r01c0:pp sleep 1

but the result is

Performance counter stats for 'sleep 1':

       464,229 r00c0                                                       
       464,229 r01c0                                                       
 <not counted> r01c0:p                 
 <not counted> r01c0:pp                

   1.001639901 seconds time elapsed

Why is PEBS not counted here?

Sorry to ask my question here...

Please help me on this.

BDX support

Any plan to add BDX support? Is it safe to use bdxde_*.py files instead, in the meantime?

BDX events are now available on https://download.01.org/perfmon/.

Issue with remote write count

I'm confused by the remote write counter.
I used ocperf.py list to find the remote write counter, and found that the events offcore_response_corewb_llc_miss_any_dram and offcore_response_corewb_llc_hit_any_response might be the write counters.
So I used the command ocperf.py stat -e offcore_response.corewb.llc_miss.any_dram,offcore_response.all_reads.llc_miss.remote_dram,offcore_response.corewb.llc_hit.any_response,mem-stores -I 1000 -C 8 to monitor the system. Then I used numactl to bind milc to physical CPU 8 and remote memory, but the result is confusing.

253.030788963 0 offcore_response_corewb_llc_miss_any_dram (36.40%)
253.030788963 36,177,045 offcore_response_all_reads_llc_miss_remote_dram (36.40%)
253.030788963 0 offcore_response_corewb_llc_hit_any_response (18.18%)
253.030788963 224,853,825 mem-stores (27.21%)
254.030893213 0 offcore_response_corewb_llc_miss_any_dram (36.40%)
254.030893213 35,695,552 offcore_response_all_reads_llc_miss_remote_dram (36.39%)
254.030893213 0 offcore_response_corewb_llc_hit_any_response (18.11%)
254.030893213 230,275,843 mem-stores (27.21%)
255.031004841 0 offcore_response_corewb_llc_miss_any_dram (36.39%)
255.031004841 35,970,716 offcore_response_all_reads_llc_miss_remote_dram (36.31%)
255.031004841 0 offcore_response_corewb_llc_hit_any_response (18.11%)
255.031004841 219,686,387 mem-stores (27.21%)

The result shows that the LLC miss and LLC hit counts are both 0.
So I wonder whether the events I chose are wrong?
Hoping for your reply.

Event duplicity?

Hi all,
I have noticed that on my CPU Intel i7-3537U ocperf lists the following events that seem to be the same (event, umask and any flag), what's the meaning of _p?
Thanks

cpu_clk_unhalted.thread: Core cycles when the thread is not in halt state
cpu/event=0x3c,umask=0x0,name=cpu_clk_unhalted_thread/
cpu_clk_unhalted.thread_p: Thread cycles when thread is not in halt state
cpu/event=0x3c,umask=0x0,name=cpu_clk_unhalted_thread_p/

cpu_clk_unhalted.thread_any: Core cycles when at least one thread on the physical core is not in halt state
cpu/event=0x3c,umask=0x0,any=1,name=cpu_clk_unhalted_thread_any/
cpu_clk_unhalted.thread_p_any: Core cycles when at least one thread on the physical core is not in halt state
cpu/event=0x3c,umask=0x0,any=1,name=cpu_clk_unhalted_thread_p_any/

inst_retired.any: Instructions retired from execution.
cpu/event=0xc0,umask=0x0,name=inst_retired_any/
inst_retired.any_p: Number of instructions retired. General Counter - architectural event
cpu/event=0xc0,umask=0x0,name=inst_retired_any_p/

cannot run toplev.py on fresh haswell box (kernel 3.13 and python 2.7)

Hi,

I am having some trouble running toplev on a new box, so I wanted to let you know.

Here is the output from a fresh clone of the master branch:

satin@satin-phyexp1:/tmp/pmu-tools$ python --version
Python 2.7.6
satin@satin-phyexp1:/tmp/pmu-tools$ uname -r
3.13.0-37-generic
satin@satin-phyexp1:/tmp/pmu-tools$ ./toplev.py -I 100 -l3 --title "GNU grep" --graph grep -r foo /usr/*
Using level 3.
UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC not found
satin@satin-phyexp1:/tmp/pmu-tools$ []
Traceback (most recent call last):
  File "/tmp/pmu-tools//tl-barplot.py", line 185, in <module>
    plt.subplot(numplots, 1, 1)
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 897, in subplot
    a = fig.add_subplot(*args, **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/figure.py", line 914, in add_subplot
    a = subplot_class_factory(projection_class)(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 9251, in __init__
    self._subplotspec = GridSpec(rows, cols)[int(num) - 1]
  File "/usr/lib/pymodules/python2.7/matplotlib/gridspec.py", line 176, in __getitem__
    raise IndexError("index out of range")
IndexError: index out of range

Is that a bug in toplev?
"UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC not found" obviously looks like the culprit. This event is used in hsw_client_ratios.py.
Or am I missing some additional perf libraries, or is my processor (Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz) not fully supported?

Thanks in advance for any hints...

tabs and spaces inconsistent

Hi,
From a fresh checkout from master:
File "pmu-tools/toplev.py", line 147
e = e[:e.find(":")]
^
TabError: inconsistent use of tabs and spaces in indentation

And indeed, some lines are indented with tabs and others with spaces, and Python 3 doesn't like it.
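To locate the offending lines, a small scan of the leading whitespace is enough. The sketch below is a hypothetical helper, not part of pmu-tools. It only reports which lines are indented with tabs and which with spaces, and leaves the decision on how to fix them to the reader.

```python
def indentation_styles(source):
    """Return (tab_lines, space_lines): 1-based line numbers whose
    indentation contains a tab, or consists of spaces only."""
    tab_lines, space_lines = [], []
    for lineno, line in enumerate(source.splitlines(), 1):
        # Leading whitespace is whatever lstrip() removes.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            tab_lines.append(lineno)
        elif indent:
            space_lines.append(lineno)
    return tab_lines, space_lines


sample = "if x:\n\ty = 1\n    z = 2\n"
tabs, spaces = indentation_styles(sample)
if tabs and spaces:
    print("mixed indentation: tabs on lines %s, spaces on lines %s"
          % (tabs, spaces))
```

The standard library's tabnanny module (python -m tabnanny file.py) performs a related, more thorough check.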

toplev chart is empty

Hi,

I have tried toplev in order to produce a chart as you show in the README.

However, I just got an empty figure (see attachment).

Command line was:

toplev.py -I 100 -l3 --title "GNU grep" --graph md5sum ~/Downloads/ubuntu-14.04.3-server-amd64.iso

percentages going over 100?

Hi there. First of all, thank you for the tools. I've learned a lot about how to use perf just by looking at how the pmu-tools do it.

I'm seeing this strange output in commit d70840b, using command line options ../pmu-tools/toplev.py --verbose --no-multiplex -l3 --single-thread -- ./myprogram

I consistently get the following printed output, whose % is > 100, on a particular test program I am running.

BE Backend_Bound: 82.25 % [100.00%]
BE/Mem Backend_Bound.Memory_Bound: 57.41 % [100.00%]
BE/Mem Backend_Bound.Memory_Bound.L1_Bound: 5.58 % [100.00%]
This metric estimates how often the CPU was stalled without
loads missing the L1 data cache...
Sampling events: mem_load_retired.l1_hit:pp mem_load_retired.fb_hit:pp
BE/Mem Backend_Bound.Memory_Bound.L1_Bound.DTLB_Load: 196.05 % below [100.00%]
This metric represents cycles fraction where the TLB was
missed by load instructions...
Sampling events: mem_inst_retired.stlb_miss_loads:p

hsx_server_ratios missing

It seems that hsx_server_ratios is not included in the repo. Could you please add it?

toplev.py fails with --level 2 on haswell

[tgrabiec@muninn ~]$ toplev.py -C 0 sleep 2 --level 2
Using level 2.
perf stat -x, -e '{cycles,cpu/event=0xc2,umask=0x2/,ref-cycles,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/},{cpu/event=0xa2,umask=0x8/,cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0x9c,umask=0x1/,cpu/event=0x9c,umask=0x1,cmask=4/,cycles,instructions},{cpu/event=0xe,umask=0x1/,cycles,cpu/event=0x79,umask=0x30/,cpu/event=0xc2,umask=0x2/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xb1,umask=0x1,cmask=2/,cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/,cpu/event=0xb1,umask=0x1,cmask=3/},{cpu/event=0xa3,umask=0x6,cmask=6/,cycles,cpu/event=0xa2,umask=0x8/,cpu/event=0x5e,umask=0x1/,instructions}' --cpu 0 sleep 2
Traceback (most recent call last):
  File "/home/tgrabiec/src/pmu-tools/toplev.py", line 950, in <module>
    ret = execute(runner, out, rest)
  File "/home/tgrabiec/src/pmu-tools/toplev.py", line 511, in execute
    runner.print_res(res[j], rev[j], out, interval, j, env)
  File "/home/tgrabiec/src/pmu-tools/toplev.py", line 806, in print_res
    obj.compute(lambda e, level:
  File "/home/tgrabiec/src/pmu-tools/hsw_client_ratios.py", line 713, in compute
    self.val = BackendBoundAtEXE(EV, 2)- self.MemoryBound.compute(EV )
  File "/home/tgrabiec/src/pmu-tools/hsw_client_ratios.py", line 30, in BackendBoundAtEXE
    return BackendBoundAtEXE_stalls(EV, level) / CLKS(EV, level)
  File "/home/tgrabiec/src/pmu-tools/hsw_client_ratios.py", line 28, in BackendBoundAtEXE_stalls
    return ( EV("CYCLE_ACTIVITY.CYCLES_NO_EXECUTE", level) + EV("UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC", level) - FewUopsExecutedThreshold(EV, level) - EV("RS_EVENTS.EMPTY_CYCLES", level) + EV("RESOURCE_STALLS.SB", level) )
  File "/home/tgrabiec/src/pmu-tools/toplev.py", line 807, in <lambda>
    lookup_res(res, rev, e, obj, env, level))
  File "/home/tgrabiec/src/pmu-tools/toplev.py", line 631, in lookup_res
    assert event_rmap(rev[index]) == canon_event(ev)
AssertionError

--level 1 seems to work:

[tgrabiec@muninn ~]$ toplev.py -C 0 sleep 2 --level 1
WARNING: HT enabled
Measuring multiple processes/threads on the same core may is not reliable.
Using level 1.
perf stat -x, -e '{cycles,cpu/event=0xc2,umask=0x2/,ref-cycles,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/}' --cpu 0 sleep 2
Backend Bound:                                 49.06% 
    This category reflects slots where no uops are being delivered due to a lack
    of required resources for accepting more uops in the Backend of the pipeline.
Frequency:                                      1.12 metric
    Frequency in Ghz

Problem with performance counters for Xeon D-1540

The Xeon D-1540 appears to have a problem where only 4 of the 8 perf counters per core actually count, whereas the other 4 remain zero (with hyperthreading disabled). I experienced this issue and then saw that other people have had the same issue: https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/560536

I don't know if this affects other processors in that family, but it obviously ends up giving bogus pmu-tools results for this family of processors when hyperthreading is disabled. It might be worth checking for that particular processor and then limiting the number of counters per perf set to only 4.

Note that in addition to only 4 out of 8 counters being available, the LLC counter values also have their own set of problems as described at that page (also confirmed with my CPU). There are actually a lot of counter-related problems with this processor...
http://www.intel.com/content/www/us/en/processors/xeon/xeon-d-1500-specification-update.html

ocperf.py misparses "long" --event option

This fails:

$ ./ocperf.py record --event mem_load_uops_retired.l1_hit echo 1
perf record --event mem_load_uops_retired.l1_hit echo 1
event syntax error: 'mem_load_uops_retired.l1_hit'
                     \___ parser error
Run 'perf list' for a list of valid events

 Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>   event selector. use 'perf list' to list available events

Passes:

$ ocperf.py record -e mem_load_uops_retired.l1_hit echo 1

The parsing code in ocperf.py does not handle long "--event" properly, apparently:

        elif sys.argv[i][0:2] == '-e':  # <--- oops, this is not for "--event"
            event, i, prefix = getarg(i, cmd)
            event, overflow = process_events(event, print_only,
                                             True if record == yes else False)
            cmd.append(prefix + event)
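A minimal sketch of how both spellings could be recognized is shown below. split_event_option is a hypothetical standalone helper. It is not ocperf.py's getarg and does not reproduce its signature. It only illustrates matching -e, -eEVENT, --event and --event=EVENT before handing the event string on for translation.

```python
def split_event_option(argv, i):
    """If argv[i] is an event option, return (event, next_index, prefix).

    prefix is the option text as written on the command line, so the
    caller can rebuild it. Returns None for any other argument.
    """
    arg = argv[i]
    if arg in ("-e", "--event"):
        # The event list is the following argument.
        if i + 1 >= len(argv):
            raise ValueError("%s requires an argument" % arg)
        return argv[i + 1], i + 2, arg
    if arg.startswith("--event="):
        return arg[len("--event="):], i + 1, "--event="
    if arg.startswith("-e"):
        # Short option with the value attached, e.g. -ecycles.
        return arg[2:], i + 1, "-e"
    return None


args = ["record", "--event", "mem_load_uops_retired.l1_hit", "echo", "1"]
print(split_event_option(args, 1))
```

Checking for the exact long option first avoids the problem above, where a two-character prefix test never sees "--event" at all.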

perfpd doesn't seem to handle MMAP2

Running perfpd doesn't seem to properly symbolize perf.data files that contain MMAP2

Can't collect some events on Xeon E5-2630 v3

Currently, I am trying to analyze my application using toplev.py. However, it seems that the Xeon E5-2630 v3 is not supported. Specifically, I could not get the Frontend, Retiring, and Bad Speculation information. In addition, the Backend output does not include detailed information such as Memory Bound and Core Bound.

$] python toplev.py -l5 my_app
28 events not supported
0     BE      Backend_Bound:                67.04%
        This category reflects slots where no uops are being
        delivered due to a lack of required resources for accepting
        more uops in the Backend of the pipeline...
0             CPU utilization:        0.89 CPUs
        Number of CPUs used...
1     BE      Backend_Bound:                67.43%
1             CPU utilization:        0.89 CPUs

(Sorry for cluttering the issue tracker.)

Data collection about HyperThread/SMT

"-p/--pid mode not compatible with SMT. Use sleep in global mode." Why? Is this a limitation of perf?
I got an answer from an Intel VTune engineer that SMT can be compatible with -p.
When computing core actual clocks with smt_enabled set to true, the values of "CPU_CLK_UNHALTED.THREAD:amt1" and "CPU_CLK_UNHALTED.THREAD" are nearly equal. Is this normal?

Core actual clocks

def CORE_CLKS(EV, level):
    return (EV("CPU_CLK_UNHALTED.THREAD:amt1", level) / 2) if smt_enabled else CLKS(EV, level)

Thank you! :)

Issue with '-v' flag

I'm trying to run toplev.py with a docker container as a workload, I use the following command:

python toplev.py --core C0 -l1 -I 1000 -x, -o ../benchmarks/mediaStreamingLevel1I1000msC0.csv taskset -c 0 docker run -t --name=streaming_client -v /path/to/output:/output --volumes-from streaming_dataset --net streaming_network cloudsuite/media-streaming:client 172.18.0.2
When I run this, toplev removes the '-v' flag from the docker command, which causes errors. The output is:

Will measure complete system
Using level 1.
perf stat -x\; -e
'{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,any=1,cmask=1/,cpu/event=0xc2,umask=0x2/}' -I 1000 -C 0,18,36,54 -A -a taskset -c 0 docker run -t --name=streaming_client /path/to/output:/output --volumes-from streaming_dataset --net streaming_network cloudsuite/media-streaming:client 172.18.0.2
Unable to find image '/path/to/output:/output:latest' locally

This might be happening because toplev also has a '-v' flag (--verbose or -v). Without toplev the docker container runs fine.
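One way to avoid this kind of clash is to stop interpreting options at the first word of the workload and pass everything after it through untouched. The sketch below illustrates the idea only. split_tool_and_workload and the options_with_value set are hypothetical and do not describe how toplev.py actually parses its command line.

```python
def split_tool_and_workload(argv, options_with_value):
    """Split argv into (tool_args, workload_args).

    Scanning stops at an explicit "--" or at the first token that is
    neither an option nor the value of a preceding option.
    """
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok == "--":
            return argv[:i], argv[i + 1:]
        if not tok.startswith("-"):
            # First word of the workload: keep the rest verbatim.
            return argv[:i], argv[i:]
        # Skip the option, plus its value if it takes a separate one.
        i += 2 if tok in options_with_value else 1
    return list(argv), []


argv = ["--core", "C0", "-l1", "-I", "1000", "-x,", "-o", "out.csv",
        "taskset", "-c", "0", "docker", "run", "-v", "/path/to/output:/output"]
tool, workload = split_tool_and_workload(argv, {"--core", "-I", "-o"})
print(tool)
print(workload)
```

With this split, the '-v' that belongs to docker stays in the workload part and is never seen by the tool's own option handling.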

The tools supporting cpu framework?

Hi, I want to get the level 2 metrics and some level 3 metrics using toplev.

I would like to confirm which CPU microarchitectures toplev currently supports: SNB, IVB, HSW, BDW?

return res[index][cpuoff] IndexError: tuple index out of range

ubuntu
cpu: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
kernel :Linux ubuntu 3.16.0-31-generic

root@ubuntu:~/pmu-tools# toplev.py -l2 -p 2004
Running in HyperThreading mode. Will measure complete system.
Using level 2.
perf stat -x, -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1,any=1/,cpu/event=0xc2,umask=0x2/},{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xa3,umask=0x6,cmask=6/,cycles,cpu/event=0xa2,umask=0x8/,cpu/event=0x9c,umask=0x1,cmask=4/},{cpu/event=0x3c,umask=0x0,any=1/,instructions,cpu/event=0x9c,umask=0x1/,cycles,cpu/event=0x9c,umask=0x1,cmask=4/},{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x79,umask=0x30/,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xb1,umask=0x1,cmask=2/,cycles,cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/,cpu/event=0xb1,umask=0x1,cmask=3/},{cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0xa2,umask=0x8/,cpu/event=0x5e,umask=0x1/,instructions}' -A -a -p 2004
7 ---->print index in code./pmu-tools/toplev.py 915 line
(11946234493.0,) ----->print res[index]
0 ---->print cpuoff
then repeat
6
(3199005707.0,)
0

7
(11946234493.0,)
1

Traceback (most recent call last):
File "/root/pmu-tools/toplev.py", line 1377, in
ret = execute(runner, out, rest)
File "/root/pmu-tools/toplev.py", line 728, in execute
print_keys(runner, res, rev, out, interval, env)
File "/root/pmu-tools/toplev.py", line 682, in print_keys
runner.print_res(r, rev[cpus[0]], out, interval, core_fmt(core), env, Runner.SMT_yes, stat)
File "/root/pmu-tools/toplev.py", line 1161, in print_res
obj.compute(lambda e, level:
File "/root/pmu-tools/ivb_server_ratios.py", line 637, in compute
self.val = (STALLS_MEM_ANY(EV, 2) + EV("RESOURCE_STALLS.SB", 2)) / CLKS(EV, 2 )
File "/root/pmu-tools/ivb_server_ratios.py", line 71, in STALLS_MEM_ANY
return EV(lambda EV , level : min(EV("CPU_CLK_UNHALTED.THREAD", level) , EV("CYCLE_ACTIVITY.STALLS_LDM_PENDING", level)) , level )
File "/root/pmu-tools/toplev.py", line 1162, in
lookup_res(res, rev, e, obj, env, level, stat.referenced))
File "/root/pmu-tools/toplev.py", line 902, in lookup_res
for off in range(cpu.threads)])
File "/root/pmu-tools/ivb_server_ratios.py", line 71, in
return EV(lambda EV , level : min(EV("CPU_CLK_UNHALTED.THREAD", level) , EV("CYCLE_ACTIVITY.STALLS_LDM_PENDING", level)) , level )
File "/root/pmu-tools/toplev.py", line 901, in
lookup_res(res, rev, ev, obj, env, level, referenced, off), level)
File "/root/pmu-tools/toplev.py", line 919, in lookup_res
return res[index][cpuoff]
IndexError: tuple index out of range

We think cpuoff = 1 is out of range because res[index] has only one member.

How can we fix the bug?

Thanks