tacc / perfexpert

An easy-to-use automatic performance diagnosis and optimization tool for HPC applications

Home Page: http://www.tacc.utexas.edu/perfexpert/

License: Other

Shell 0.38% C 31.30% C++ 40.94% Perl 0.04% PLpgSQL 21.67% HTML 3.74% Makefile 0.58% M4 1.34%

perfexpert's Introduction

PerfExpert

Authors

Antonio Gomez-Iglesias, Leonardo Fialho, Ashay Rane and James Browne

About PerfExpert

PerfExpert combines a simple user interface with a sophisticated analysis engine to:

Detect and diagnose the causes of core-, socket-, and node-level performance bottlenecks in each procedure and loop of an application.
Apply pattern-based software transformations to the application source code to improve performance at the identified bottlenecks.
Provide a performance analysis report and remediation suggestions for the bottlenecks that PerfExpert cannot optimize automatically.

PerfExpert is an open-source project. Funding to keep researchers working on PerfExpert depends on the value of this tool to the scientific community. For that reason, it is important for us to know who is using our tool and where. We would really appreciate it if you could send us a message ([email protected]) telling us the institution (name and country) where you plan to install and test PerfExpert.

Documentation:

See doc/ directory.

Version History

Version 4.2 (still in development):

integration of VTune;
improved support for MACPO;
integration of hotspots and events tables in the database so that all the tools use the same tables

Version 4.1.1:

LLC (last level cache, usually L3) support added;
tools were renamed to perfexpert_something;
perfexpert_analyzer is now C only, thus Apache Ant is no longer a requirement;
improvements and bug fixes on arguments handling:
- prefix, before, and after (-p, -b, -a) arguments are split by space to multiple arguments;
- quoted target program arguments are split by space to multiple arguments (bug fix);
- it is possible to pass a quoted argument to the target program ("this "is valid" too");
several improvements in MACPO;
license updated (UT license);
"compatibility mode" which accepts the old perfexpert_run_exp syntax (only run the experiments);
added the option -n which does not do anything but permits users to check arguments;
the compilation of every single tool is optional using --disable-tool;
option (-o) to sort performance bottlenecks (three sorting options available);
the target program now supports relative path, full path, and path search;
improved memory usage (freeing memory when possible);
moved prerequisites (externals) to a separate branch;
general improvements on database installing and updating;
improved interface with the user:
- analyzer shows the relevance of each module;
- code transformer shows which optimization has been applied;
- analyzer and recommendation reports are now shown for every optimization step;
- adding $PATH search to target programs;
- clearer messages to the user (outside debug mode);
other minor improvements and bug fixes.

Known issues:

The sniffer tool segfaults with PAPI 5.3. A patch for this issue is available in the contrib directory.

COPYRIGHT

Additional copyrights may follow

This file is part of PerfExpert.

PerfExpert is free software: you can redistribute it and/or modify it under the terms of The University of Texas at Austin Research License.

PerfExpert is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

perfexpert's People

Contributors

Stargazers

Watchers

Forkers

jenstimmerman labrick molguin-qc carlosalexsander thomas-yang ugiwgh lgz-t roystgnr mengshanfeng

perfexpert's Issues

PerfExpert v4.2: Improve documentation

PerfExpert v4.2: Naïve thread imbalance module

PerfExpert v4.2: add MIC support to LCPI module

Restore code after running MACPO

MACPO leaves the instrumented code instead of the original code in the source folder. This needs to be changed.

autogen.sh fails with GNU automake 1.15

autogen / autoreconf fail to set up the Makefiles because a warning is treated as an error (GNU automake 1.15):

$ autogen.sh
....
common/Makefile.am: installing 'config/depcomp'
automake: warnings are treated as errors
tools/macpo/Makefile.am:30: warning: source file '$(GTEST_SRC)/gtest-all.cc' is in a subdirectory,
tools/macpo/Makefile.am:30: but option 'subdir-objects' is disabled
...
autoreconf: automake failed with exit status: 1

Attempts to configure fail because Makefile.in is missing:

config.status: creating tools/Makefile
config.status: error: cannot find input file: `tools/perfexpert/Makefile.in'

I managed to fix this error by adding the "subdir-objects" option to AM_INIT_AUTOMAKE in configure.ac.
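
A minimal sketch of that change, written as a small Python helper that appends subdir-objects to the option list of AM_INIT_AUTOMAKE. It assumes the macro is called with a bracketed option list. The function name add_subdir_objects and the sample options "foreign -Wall -Werror" are hypothetical and are not taken from PerfExpert's configure.ac.

```python
import re

# Matches AM_INIT_AUTOMAKE([...]) and captures the option list between brackets.
AM_INIT = re.compile(r"AM_INIT_AUTOMAKE\(\[([^\]]*)\]\)")


def add_subdir_objects(configure_ac: str) -> str:
    """Append 'subdir-objects' to AM_INIT_AUTOMAKE's options if it is missing."""

    def patch(match):
        options = match.group(1).split()
        if "subdir-objects" not in options:
            options.append("subdir-objects")
        return "AM_INIT_AUTOMAKE([" + " ".join(options) + "])"

    return AM_INIT.sub(patch, configure_ac)


# Hypothetical configure.ac line, used only for illustration.
sample = "AM_INIT_AUTOMAKE([foreign -Wall -Werror])\n"
print(add_subdir_objects(sample), end="")
# AM_INIT_AUTOMAKE([foreign -Wall -Werror subdir-objects])
```

After such a change, autogen.sh has to be run again so that the Makefile.in files are regenerated.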

Possibly related to this (or a different bug) is a failure to build macpo due to unresolved variable names:

make[2]: Entering directory '/home/giacomo/hpc/src/perfexpert/tools/macpo'
Makefile:894: ../../contrib/gtest/src/.deps/gtest-all.Plo: No such file or directory
Makefile:895: ../../contrib/gtest/src/.deps/gtest_main.Plo: No such file or directory
Makefile:896: inst/.deps/aligncheck.Po: No such file or directory
Makefile:897: inst/.deps/analysis_profile.Po: No such file or directory

giacomo@giacomo:~/hpc/src/perfexpert/tools/macpo$ ls
$(GTEST_SRC)   $(srcdir)    Makefile.am  analyze   inst      libset       tests
$(MINST_BASE)  $(utestdir)  Makefile.in  common    libmacpo  macpo.sh
$(itestdir)    Makefile     README.md    examples  libmrt    macpo.sh.in

Copying the object files manually to the right folders solved this issue.

MACPO: Clean up analysis code

sniffer fails when using PAPI v5.3.0

The make install of PerfExpert fails when PAPI v5.3.0 is being used, because the sniffer tool is segfaulting:

make[3]: Entering directory `/tmp/perfexpert-4.1.1/tools/sniffer'
test -z "/path/to/PerfExpert/4.1.1/bin" || /bin/mkdir -p "/path/to/PerfExpert/4.1.1/bin"
 /bin/sh ../../libtool   --mode=install /usr/bin/install -c sniffer '/path/to/PerfExpert/4.1.1/bin'
libtool: install: /usr/bin/install -c sniffer /path/toPerfExpert/4.1.1/bin/sniffer
./sniffer
make[3]: *** [install-exec-local] Segmentation fault

The built sniffer also segfaults when I run it manually. Here is what GDB reports:

(gdb) run
Starting program: /path/to/PerfExpert/4.1.1/bin/sniffer 

Program received signal SIGSEGV, Segmentation fault.
_pe_libpfm4_ntv_code_to_name (EventCode=<value optimized out>, ntv_name=0x7fffffff6160 "ITLB_MISSES:WALK_DURATION", len=1024, event_table=0x2aaaaaf20ce0) at components/perf_event/pe_libpfm4_events.c:869
869     components/perf_event/pe_libpfm4_events.c: No such file or directory.
       in components/perf_event/pe_libpfm4_events.c
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.x86_64

@leonardofialho confirmed this issue; it does not occur with PAPI v5.2.0.

Intel not generating correct line numbers

This needs further investigation. There are cases where Intel reports, for example, that a hotspot is located in line 73, when it's actually in line 70 (simply because there are three empty lines between line 70 and 73). I can't always reproduce this.

Fix filenumber for MACPO

In this code:
40 void kernel_cpu( par_str par,
41 dim_str dim,
42 box_str* box,
43 FOUR_VECTOR* rv,
44 fp* qv,
45 FOUR_VECTOR* fv)
46 {

The hotspot we collect says kernel_cpu:40. When this is passed to MACPO, it doesn't know what to do with that line. It actually needs kernel_cpu:46. For this particular case, kernel_cpu without a line number also works.
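
One possible way to map the reported line to the one MACPO expects is to scan forward from the declaration line to the first opening brace. The sketch below is only an illustration of that idea, not MACPO's or PerfExpert's actual logic. The helper name body_start_line is hypothetical, and the scan is naive: it does not skip braces inside comments or string literals.

```python
def body_start_line(source: str, decl_line: int) -> int:
    """Return the 1-based number of the first line at or after decl_line containing '{'."""
    lines = source.splitlines()
    for number in range(decl_line, len(lines) + 1):
        if "{" in lines[number - 1]:
            return number
    raise ValueError(f"no opening brace found at or after line {decl_line}")


# The signature from the issue, without the line-number prefixes.
SIGNATURE = [
    "void kernel_cpu( par_str par,",
    "dim_str dim,",
    "box_str* box,",
    "FOUR_VECTOR* rv,",
    "fp* qv,",
    "FOUR_VECTOR* fv)",
    "{",
]

# Pad with 39 empty lines so that the declaration starts on line 40, as in the issue.
source = "\n".join([""] * 39 + SIGNATURE)
print(f"kernel_cpu:{body_start_line(source, 40)}")  # kernel_cpu:46
```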

MACPO needs to modify environment variables to fully automate the compilation of the code

Right now, MACPO needs the user to change the makefile to include a set of flags and libraries. This can be automated by modifying environment variables, so that the user does not need to modify the makefile.
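
A rough sketch of the environment-variable approach, assuming the target makefile honors CFLAGS and LDFLAGS from the environment. The helper name env_with_flags and the flag values are placeholders; the real flags and libraries that MACPO requires are not listed in this issue.

```python
import os


def env_with_flags(extra, base=None):
    """Return a copy of the environment with extra flags appended to each variable."""
    env = dict(os.environ if base is None else base)
    for name, flags in extra.items():
        current = env.get(name, "")
        env[name] = f"{current} {flags}".strip()
    return env


# Placeholder values only: the actual MACPO include and library paths are assumptions.
MACPO_FLAGS = {
    "CFLAGS": "-I/path/to/macpo/include",
    "LDFLAGS": "-L/path/to/macpo/lib",
}

env = env_with_flags(MACPO_FLAGS, base={"CFLAGS": "-O2"})
print(env["CFLAGS"])   # -O2 -I/path/to/macpo/include
print(env["LDFLAGS"])  # -L/path/to/macpo/lib
```

The resulting dictionary could then be passed to the build, for example with subprocess.run(["make"], env=env_with_flags(MACPO_FLAGS)). This only has an effect when the makefile does not assign these variables unconditionally, because assignments inside a makefile take precedence over the environment by default.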

MACPO: Alignment of data structures

Useful for determining whether a loop can or cannot be vectorized by the compiler.
Instead of monitoring each access in a loop, can we obtain a regex of the stride in terms of the loop index?
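
To make the question concrete: if the address touched in iteration i can be written as base + stride * i, then the whole access pattern is described by two numbers, and alignment can be checked on those instead of on every access. The sketch below illustrates this on a recorded address list. It is not MACPO's analysis; the function names and the 32-byte vector width in the example are assumptions.

```python
def affine_stride(addresses):
    """Return (base, stride) if addresses[i] == base + stride * i for all i, else None."""
    if len(addresses) < 2:
        return None
    base = addresses[0]
    stride = addresses[1] - addresses[0]
    for i, address in enumerate(addresses):
        if address != base + stride * i:
            return None
    return base, stride


def is_aligned_unit_stride(addresses, element_size, vector_bytes):
    """True if accesses are contiguous and the first one is vector-aligned."""
    pattern = affine_stride(addresses)
    if pattern is None:
        return False
    base, stride = pattern
    return stride == element_size and base % vector_bytes == 0


# 8-byte elements, checked against an assumed 32-byte vector width.
print(affine_stride([64, 72, 80, 88]))                  # (64, 8)
print(is_aligned_unit_stride([64, 72, 80, 88], 8, 32))  # True
print(is_aligned_unit_stride([72, 80, 88, 96], 8, 32))  # False
```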

Add --with-boost option to configure script

MACPO requires Rose, which requires Boost to be installed. If the path to Boost headers is not specified via CFLAGS / CPPFLAGS / CXXFLAGS, the configure script terminates saying that rose.h was not found. In reality, though, the config.log file shows that the test program failed to compile because Boost headers were not found. Passing --with-boost with the location of Boost should be a clean(er) solution than manipulating the C[PP|XX]FLAGS environment variables.

Instrument each file only once with MACPO

MACPO: Clean up formatting of output bars

MACPO: broken include path in Makefile.am

A personal folder is referenced in tools/macpo/analyze/Makefile.am:

macpo_analyze_CXXFLAGS = -I$(srcdir)/../common -Wno-deprecated -I/scratch/cluster/ashay/apps/sparsehash/include -fopenmp

This breaks the compilation for regular users.

The following patch (post-installation) seems to fix the issue:

--- PerfExpert/4.1.1/etc/lcpi.conf.orig    2014-05-07 15:42:20.010888000 +0200
+++ PerfExpert/4.1.1/etc/lcpi.conf    2014-05-07 15:45:25.940577000 +0200
@@ -1,6 +1,6 @@
 # LCPI config generated using sniffer
 # version = 1.0
-ratio.floating_point = FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE / PAPI_TOT_INS
+ratio.floating_point = SIMD_FP_256:PACKED_SINGLE + SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE / PAPI_TOT_INS
 ratio.data_accesses = PAPI_LD_INS / PAPI_TOT_INS
 GFLOPS_(%_max).overall = ((SIMD_FP_256:PACKED_SINGLE*8 + (SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE)*4 + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE*2 + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) / PAPI_TOT_CYC) / 8
 GFLOPS_(%_max).packed = ((SIMD_FP_256:PACKED_SINGLE*8 + (SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE)*4 + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE*2) / PAPI_TOT_CYC) / 8
@@ -20,6 +20,6 @@
 branch_instructions.overall = (PAPI_BR_INS * BR_lat + PAPI_BR_MSP * BR_miss_lat) / PAPI_TOT_INS
 branch_instructions.correctly_predicted = PAPI_BR_INS * BR_lat / PAPI_TOT_INS
 branch_instructions.mispredicted = PAPI_BR_MSP * BR_miss_lat / PAPI_TOT_INS
-floating-point_instr.overall = (((FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) + (PAPI_FDV_INS * FP_slow_lat)) / PAPI_TOT_INS
+floating-point_instr.overall = (((SIMD_FP_256:PACKED_SINGLE + SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) + (PAPI_FDV_INS * FP_slow_lat)) / PAPI_TOT_INS
 floating-point_instr.slow_FP_instr = (PAPI_FDV_INS * FP_slow_lat) / PAPI_TOT_INS
 floating-point_instr.fast_FP_instr = ((FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) / PAPI_TOT_INS
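
The effect of the patched floating-point_instr.overall line can be sketched in a few lines of Python. The function below evaluates the formula with and without the two SIMD_FP_256 events. The counter values and the latencies (FP_lat = 4, FP_slow_lat = 20) are invented for illustration only and are not measurements or PerfExpert defaults.

```python
SSE_EVENTS = [
    "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE",
    "FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE",
    "FP_COMP_OPS_EXE:SSE_PACKED_SINGLE",
    "FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE",
]
AVX_EVENTS = ["SIMD_FP_256:PACKED_SINGLE", "SIMD_FP_256:PACKED_DOUBLE"]


def fp_instr_overall(counters, fp_lat, fp_slow_lat, include_avx):
    """Evaluate floating-point_instr.overall as written in lcpi.conf."""
    events = SSE_EVENTS + (AVX_EVENTS if include_avx else [])
    fast = sum(counters.get(event, 0) for event in events) * fp_lat
    slow = counters.get("PAPI_FDV_INS", 0) * fp_slow_lat
    return (fast + slow) / counters["PAPI_TOT_INS"]


# Invented counters for a code whose floating-point work is all 256-bit packed double.
counters = {"PAPI_TOT_INS": 1000, "SIMD_FP_256:PACKED_DOUBLE": 200}

before = fp_instr_overall(counters, fp_lat=4, fp_slow_lat=20, include_avx=False)
after = fp_instr_overall(counters, fp_lat=4, fp_slow_lat=20, include_avx=True)
print(before, after)  # 0.0 0.8
```

With the original formula, a code like this would report no floating-point cost at all, which is the gap the added SIMD_FP_256 terms appear to close.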