
perfexpert's Introduction

PerfExpert

Authors

Antonio Gomez-Iglesias, Leonardo Fialho, Ashay Rane and James Browne

About PerfExpert

PerfExpert combines a simple user interface with a sophisticated analysis engine to:

  • Detect and diagnose the causes of core-, socket-, and node-level performance bottlenecks in each procedure and loop of an application.
  • Apply pattern-based software transformations to the application source code to improve performance at the identified bottlenecks.
  • Provide a performance analysis report and remediation suggestions for the bottlenecks that cannot be optimized automatically.

PerfExpert is an open-source project. Funding to keep researchers working on PerfExpert depends on the value of this tool to the scientific community. For that reason, it is important for us to know where our tool is being used and by whom. We would appreciate it if you could send us a message ([email protected]) telling us the institution (name and country) where you plan to install and test PerfExpert.

Documentation:


See the doc/ directory.

Version History

Version 4.2 (still in development):

  • integration of VTune;
  • improved support for MACPO;
  • integration of hotspots and events tables in the database so that all the tools use the same tables.

Version 4.1.1:

  • LLC (last level cache, usually L3) support added;
  • tools were renamed to perfexpert_something;
  • perfexpert_analyzer is now C only, so Apache Ant is no longer a requirement;
  • improvements and bug fixes in argument handling:
    • prefix, before, and after (-p, -b, -a) arguments are split on spaces into multiple arguments;
    • quoted target program arguments are split on spaces into multiple arguments (bug fix);
    • it is possible to pass a quoted argument to the target program ("this "is valid" too");
  • several improvements in MACPO;
  • license updated (UT license);
  • "compatibility mode", which accepts the old perfexpert_run_exp syntax (only runs the experiments);
  • added the -n option, which runs nothing but lets users check their arguments;
  • the compilation of every single tool is optional using --disable-tool;
  • option (-o) to sort performance bottlenecks (three sorting options available);
  • the target program now supports relative path, full path, and path search;
  • improved memory usage (freeing memory when possible);
  • moved prerequisites (externals) to a separate branch;
  • general improvements to database installation and updating;
  • improved interface with the user:
    • analyzer shows the relevance of each module;
    • code transformer shows which optimization has been applied;
    • analyzer and recommendation reports are now shown for every optimization step;
    • added $PATH search for target programs;
    • clearer messages to the user (outside debug mode);
  • other minor improvements and bug fixes.

Known issues:

  1. The sniffer tool segfaults with PAPI 5.3. A patch for this issue is available in the contrib directory.

COPYRIGHT

Copyright (c) 2011-2015 University of Texas at Austin. All rights reserved.

Additional copyrights may follow

This file is part of PerfExpert.

PerfExpert is free software: you can redistribute it and/or modify it under the terms of The University of Texas at Austin Research License.

PerfExpert is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

perfexpert's People

Contributors

antoniogi, ashay, goyalankit, leonardofialho, sreesurendran


perfexpert's Issues

autogen.sh fails with GNU automake 1.15

autogen.sh / autoreconf fail to set up the Makefiles because a warning is treated as an error (GNU automake 1.15):

$ autogen.sh
....
common/Makefile.am: installing 'config/depcomp'
automake: warnings are treated as errors
tools/macpo/Makefile.am:30: warning: source file '$(GTEST_SRC)/gtest-all.cc' is in a subdirectory,
tools/macpo/Makefile.am:30: but option 'subdir-objects' is disabled
...
autoreconf: automake failed with exit status: 1

Attempts to configure fail because Makefile.in is missing:

config.status: creating tools/Makefile
config.status: error: cannot find input file: `tools/perfexpert/Makefile.in'

I managed to fix this error by adding the "subdir-objects" option to AM_INIT_AUTOMAKE in configure.ac.

Possibly related to this (or a different bug) is a failure to build macpo due to unresolved variable names:

make[2]: Entering directory '/home/giacomo/hpc/src/perfexpert/tools/macpo'
Makefile:894: ../../contrib/gtest/src/.deps/gtest-all.Plo: No such file or directory
Makefile:895: ../../contrib/gtest/src/.deps/gtest_main.Plo: No such file or directory
Makefile:896: inst/.deps/aligncheck.Po: No such file or directory
Makefile:897: inst/.deps/analysis_profile.Po: No such file or directory
giacomo@giacomo:~/hpc/src/perfexpert/tools/macpo$ ls
$(GTEST_SRC)   $(srcdir)    Makefile.am  analyze   inst      libset       tests
$(MINST_BASE)  $(utestdir)  Makefile.in  common    libmacpo  macpo.sh
$(itestdir)    Makefile     README.md    examples  libmrt    macpo.sh.in

Copying the object files manually to the right folders solved this issue.

sniffer fails when using PAPI v5.3.0

The make install step of PerfExpert fails when PAPI v5.3.0 is used, because the sniffer tool segfaults:

make[3]: Entering directory `/tmp/perfexpert-4.1.1/tools/sniffer'
test -z "/path/to/PerfExpert/4.1.1/bin" || /bin/mkdir -p "/path/to/PerfExpert/4.1.1/bin"
 /bin/sh ../../libtool   --mode=install /usr/bin/install -c sniffer '/path/to/PerfExpert/4.1.1/bin'
libtool: install: /usr/bin/install -c sniffer /path/toPerfExpert/4.1.1/bin/sniffer
./sniffer
make[3]: *** [install-exec-local] Segmentation fault

The built sniffer also segfaults when I run it manually; here's what GDB has to say:

(gdb) run
Starting program: /path/to/PerfExpert/4.1.1/bin/sniffer 

Program received signal SIGSEGV, Segmentation fault.
_pe_libpfm4_ntv_code_to_name (EventCode=<value optimized out>, ntv_name=0x7fffffff6160 "ITLB_MISSES:WALK_DURATION", len=1024, event_table=0x2aaaaaf20ce0) at components/perf_event/pe_libpfm4_events.c:869
869     components/perf_event/pe_libpfm4_events.c: No such file or directory.
       in components/perf_event/pe_libpfm4_events.c
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.x86_64

@leonardofialho confirmed this issue; it doesn't occur with PAPI v5.2.0.

Intel not generating correct line numbers

This needs further investigation. There are cases where Intel reports, for example, that a hotspot is located in line 73, when it's actually in line 70 (simply because there are three empty lines between line 70 and 73). I can't always reproduce this.

Fix line number for MACPO

In this code:
40 void kernel_cpu( par_str par,
41 dim_str dim,
42 box_str* box,
43 FOUR_VECTOR* rv,
44 fp* qv,
45 FOUR_VECTOR* fv)
46 {

The hotspot we collect says kernel_cpu:40. When this is passed to MACPO, it doesn't know what to do with that line; it actually needs kernel_cpu:46. In this particular case, kernel_cpu without a line number also works.
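One way to picture the mismatch is a small helper that maps the reported line to the line holding the opening brace. The following is only a sketch: the function name body_start_line is hypothetical, the brace search is deliberately naive (it ignores braces in comments or strings), and whether MACPO should be fed this line in every case is an assumption drawn from the single example above.

```python
def body_start_line(source_lines, hotspot_line):
    """Return the 1-based number of the first line at or after hotspot_line
    that contains an opening brace, or None if there is none."""
    for offset, text in enumerate(source_lines[hotspot_line - 1:]):
        if "{" in text:
            return hotspot_line + offset
    return None


# The excerpt from the issue, padded with blank lines so it starts at line 40.
excerpt = [
    "void kernel_cpu( par_str par,",
    "dim_str dim,",
    "box_str* box,",
    "FOUR_VECTOR* rv,",
    "fp* qv,",
    "FOUR_VECTOR* fv)",
    "{",
]
source_lines = [""] * 39 + excerpt

print(body_start_line(source_lines, 40))  # prints 46
```

With this mapping, the hotspot kernel_cpu:40 would be rewritten to kernel_cpu:46 before being handed over.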

MACPO: Alignment of data structures

Useful for determining whether a loop can or cannot be vectorized by the compiler.
Instead of monitoring each access in a loop, can we obtain a regex of the stride in terms of the loop index?
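As a rough illustration of the idea, the sketch below fits an access pattern of the form address = base + stride * i from a handful of sampled (loop index, address) pairs instead of recording every access. It is not MACPO code: the function name fit_affine, the sample values, and the 32-byte alignment used in the example are all assumptions made for illustration.

```python
def fit_affine(samples):
    """Fit address = base + stride * i to (loop_index, byte_address) samples.

    Returns (stride, base) when every sample matches the affine pattern,
    otherwise None.
    """
    if len(samples) < 2:
        return None
    (i0, a0), (i1, a1) = samples[0], samples[1]
    if i1 == i0 or (a1 - a0) % (i1 - i0) != 0:
        return None
    stride = (a1 - a0) // (i1 - i0)
    base = a0 - stride * i0
    if all(addr == base + stride * i for i, addr in samples):
        return stride, base
    return None


# Hypothetical samples: a double array at address 4096, accessed as a[i].
regular = [(0, 4096), (1, 4104), (5, 4136)]
irregular = [(0, 0), (1, 8), (2, 24)]

print(fit_affine(regular))    # prints (8, 4096)
print(fit_affine(irregular))  # prints None

stride, base = fit_affine(regular)
print(base % 32 == 0)  # prints True: the first access is 32-byte aligned
```

A fitted (stride, base) pair summarizes the whole loop in two numbers, which is the kind of compact description the question above is asking for.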

Add --with-boost option to configure script

MACPO requires Rose, which requires Boost to be installed. If the path to Boost headers is not specified via CFLAGS / CPPFLAGS / CXXFLAGS, the configure script terminates saying that rose.h was not found. In reality, though, the config.log file shows that the test program failed to compile because Boost headers were not found. Passing --with-boost with the location of Boost should be a clean(er) solution than manipulating the C[PP|XX]FLAGS environment variables.

MACPO: broken include path in Makefile.am

A personal folder is referenced in tools/macpo/analyze/Makefile.am:

macpo_analyze_CXXFLAGS = -I$(srcdir)/../common -Wno-deprecated
-I/scratch/cluster/ashay/apps/sparsehash/include -fopenmp

This breaks the compilation for regular users.

Create a VTune module

The current VTune module is not fully implemented. Use the reporting capabilities of VTune to implement this module, so that we don't depend on an undocumented database (VTune's) to use this information.

MACPO: Access strides in terms of index

Detecting strides in terms of array indexes (instead of cache lines) is useful for determining whether a loop can or cannot be vectorized by the compiler.
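To show why the two views differ, the sketch below computes strides from the same address trace twice: once in array elements and once in cache lines. The trace, the 8-byte element size, and the 64-byte cache line are assumptions chosen for the example, and the function names are hypothetical rather than part of MACPO.

```python
from collections import Counter

CACHE_LINE = 64  # bytes; assumed line size


def element_strides(addresses, elem_size):
    """Histogram of strides between consecutive accesses, in array elements."""
    return Counter((b - a) // elem_size for a, b in zip(addresses, addresses[1:]))


def cache_line_strides(addresses, line_size=CACHE_LINE):
    """Histogram of strides between consecutive accesses, in cache lines."""
    return Counter(b // line_size - a // line_size
                   for a, b in zip(addresses, addresses[1:]))


# Hypothetical trace: doubles (8 bytes) accessed as a[2 * i] for i = 0..7,
# with the array starting at address 4096.
addresses = [4096 + 8 * (2 * i) for i in range(8)]

print(element_strides(addresses, 8))  # prints Counter({2: 7})
print(cache_line_strides(addresses))  # prints Counter({0: 6, 1: 1})
```

In element terms the loop has a single, constant stride of 2, which is the information relevant to vectorization; in cache-line terms the same trace looks like mostly stride 0 with an occasional stride 1.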

PerfExpert v4.1.1 ignores AVX floating-point instructions in reported total FP instrs

We noticed that PerfExpert was reporting 0% floating point instructions for a test program that was heavily using AVX FP instructions.

After looking into this with @leonardofialho, it turns out that the ratio.floating_point metric defined in lcpi.conf is missing the SIMD_FP_256 events.

The following patch (post-installation) seems to fix the issue:

--- PerfExpert/4.1.1/etc/lcpi.conf.orig    2014-05-07 15:42:20.010888000 +0200
+++ PerfExpert/4.1.1/etc/lcpi.conf    2014-05-07 15:45:25.940577000 +0200
@@ -1,6 +1,6 @@
 # LCPI config generated using sniffer
 # version = 1.0
-ratio.floating_point = FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE / PAPI_TOT_INS
+ratio.floating_point = SIMD_FP_256:PACKED_SINGLE + SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE / PAPI_TOT_INS
 ratio.data_accesses = PAPI_LD_INS / PAPI_TOT_INS
 GFLOPS_(%_max).overall = ((SIMD_FP_256:PACKED_SINGLE*8 + (SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE)*4 + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE*2 + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) / PAPI_TOT_CYC) / 8
 GFLOPS_(%_max).packed = ((SIMD_FP_256:PACKED_SINGLE*8 + (SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE)*4 + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE*2) / PAPI_TOT_CYC) / 8
@@ -20,6 +20,6 @@
 branch_instructions.overall = (PAPI_BR_INS * BR_lat + PAPI_BR_MSP * BR_miss_lat) / PAPI_TOT_INS
 branch_instructions.correctly_predicted = PAPI_BR_INS * BR_lat / PAPI_TOT_INS
 branch_instructions.mispredicted = PAPI_BR_MSP * BR_miss_lat / PAPI_TOT_INS
-floating-point_instr.overall = (((FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) + (PAPI_FDV_INS * FP_slow_lat)) / PAPI_TOT_INS
+floating-point_instr.overall = (((SIMD_FP_256:PACKED_SINGLE + SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) + (PAPI_FDV_INS * FP_slow_lat)) / PAPI_TOT_INS
 floating-point_instr.slow_FP_instr = (PAPI_FDV_INS * FP_slow_lat) / PAPI_TOT_INS
 floating-point_instr.fast_FP_instr = ((FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) / PAPI_TOT_INS
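The effect of the patch on ratio.floating_point can be checked with a small worked example. The counter values below are made up for illustration, and the sketch assumes the metric is meant as the sum of the listed events divided by PAPI_TOT_INS; how lcpi.conf actually parses the expression is not shown in this report.

```python
SSE_EVENTS = [
    "FP_COMP_OPS_EXE:SSE_PACKED_SINGLE",
    "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE",
    "FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE",
    "FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE",
]
AVX_EVENTS = [
    "SIMD_FP_256:PACKED_SINGLE",
    "SIMD_FP_256:PACKED_DOUBLE",
]


def fp_ratio(counters, events):
    """Sum the given event counts and divide by total instructions.

    Assumption: this is the intended reading of ratio.floating_point.
    """
    return sum(counters.get(event, 0) for event in events) / counters["PAPI_TOT_INS"]


# Hypothetical counts for a program whose FP work is entirely AVX.
counters = {
    "PAPI_TOT_INS": 1_000_000,
    "SIMD_FP_256:PACKED_DOUBLE": 400_000,
}

before = fp_ratio(counters, SSE_EVENTS)               # original definition
after = fp_ratio(counters, SSE_EVENTS + AVX_EVENTS)   # patched definition

print(before)  # prints 0.0
print(after)   # prints 0.4
```

With only the SSE events counted, an AVX-heavy program reports 0% floating-point instructions, which matches the behavior described above; adding the SIMD_FP_256 events makes the AVX work visible.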
