/*
 * Memory System Performance Characterisation
 * ECT memperf - Extended Copy Transfer Characterization
 *
 * Thomas M. Stricker
 * Christian Kurmann
 * http://www.cs.inf.ethz.ch/CoPs/ECT/
 *
 * Changes in Changelog
 */

memperf measures memory bandwidth along two dimensions.
First, it varies the block size, which shows the throughput at
different levels of the memory system hierarchy (different cache
levels). Second, it varies the access pattern from contiguous
blocks to accesses with different strides.

Four different tests are provided:

load sum (test -m 0):
The load sum test measures memory load performance for all block
sizes and access patterns. It accumulates the loaded values into a
sum so that the optimizing compiler cannot eliminate the loads that
the test is meant to measure.
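
A minimal sketch of what a strided load-sum kernel could look like in C.
The function name load_sum and its parameters are illustrative
assumptions and are not taken from lcpy.c; the actual benchmark code may
be organized differently.

```c
#include <stddef.h>

/* Hypothetical strided load-sum kernel (not the code from lcpy.c).
 * Reads n doubles from src, stepping `stride` elements between reads.
 * Returning the sum makes every load observable, so the compiler
 * cannot drop the loop. */
double load_sum(const double *src, size_t n, size_t stride)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += src[i * stride];
    return sum;
}
```

With stride 1 this reads a contiguous block; larger strides touch the
same number of elements spread over a wider address range.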

const store (test -m 1):
The const store test is the counterpart of the load sum test.
It measures the store bandwidth for all block sizes and access
patterns.
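
A minimal sketch of a strided constant-store kernel, assuming the test
writes one fixed value to every visited element. The name const_store
and its signature are hypothetical, not taken from lcpy.c.

```c
#include <stddef.h>

/* Hypothetical strided constant-store kernel (not the code from lcpy.c).
 * Writes `value` to n elements of dst, `stride` elements apart.
 * No memory is read, so only store bandwidth is exercised. */
void const_store(double *dst, size_t n, size_t stride, double value)
{
    for (size_t i = 0; i < n; i++)
        dst[i * stride] = value;
}
```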

load copy (test -m 2):
The load copy test performs strided loads and stores the results
contiguously, which simulates a matrix transpose. It is run for
all block sizes and access patterns.
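
The connection to a matrix transpose can be sketched as follows. Both
function names are hypothetical and the code is an illustration under
the assumption of row-major storage, not the implementation in lcpy.c.

```c
#include <stddef.h>

/* Hypothetical load-copy kernel: strided loads, contiguous stores. */
void load_copy(double *dst, const double *src, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i * stride];
}

/* Transpose a row-major rows x cols matrix. Each output row is one
 * source column, read with stride `cols` and written contiguously. */
void transpose_by_load(double *dst, const double *src,
                       size_t rows, size_t cols)
{
    for (size_t c = 0; c < cols; c++)
        load_copy(dst + c * rows, src + c, rows, cols);
}
```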

copy store (test -m 3):
The copy store test is the opposite of the load copy test: it
loads contiguously and stores the data in strides, so the result
of the operation is the same as in the load copy test. Again,
all block sizes and access patterns are tested.
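
A sketch of the same transpose expressed with contiguous loads and
strided stores. The names are hypothetical and row-major storage is
assumed; this is not the code from lcpy.c.

```c
#include <stddef.h>

/* Hypothetical copy-store kernel: contiguous loads, strided stores. */
void copy_store(double *dst, const double *src, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        dst[i * stride] = src[i];
}

/* Transpose a row-major rows x cols matrix. Each source row is read
 * contiguously and scattered into one output column with stride `rows`. */
void transpose_by_store(double *dst, const double *src,
                        size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        copy_store(dst + r, src + r * cols, cols, rows);
}
```

The output matches a transpose built from strided loads; only the side
on which the strided access occurs changes.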


Usage: memperf -m <mode> [-p] [-s] [-n] [-r] [-i] [-t]
       -m <mode>     : 0 = load sum test
                       1 = const store test
                       2 = load copy test
                       3 = copy store test
                       9 = all of the above tests

       -p <nproc>    : Number of processes (Default: 1 process)
                       (numbers higher than processors in the system
                       make no sense and will give strange results)

       -s <mxstrds>  : Number of strides tested
                       (Default: 22 different strides)

       -n <mxsize>   : Maximum block size tested [2^x double values]
                       (Default: 20 = 8MB)

       -r <minsize>  : Minimum block size tested [2^x double values]
                       (Default: 6 = 512 Bytes)

       -i <mxiters>  : Number of iterations for each test (Default: 16)
                       (the number of iterations is adapted to the block
                       size under test, so this value does not apply to
                       very small and very large blocks)

       -t <tics/us>  : When using the high resolution clock counter the
        (unix only)    program tries to autodetect the clock frequency.
                       This should work on linux/x86 and linux/alpha
                       systems. On other systems the autodetection might
                       not be reliable, especially on MP systems, so you
                       can override it with this option.

       -a <useoptasm>: 0 = don't use optimized functions/special instructions
        (unix only)    1 = use only optimized functions (slower in some cases)
                       2 = both methods (Default: 0)
                       (currently only possible with x86 systems, needs CPU
                       with SSE or Enhanced 3dnow! support)

       -c <nrofrep>  : Number of repetitions of each test (Default: 3)
                       (repetitions make the results more reliable, so
                       avoid using 1, especially on uniprocessor systems;
                       the higher the number, the longer the benchmark
                       takes to complete)

       -o <chartrev> : revert chart output (to make import in certain programs
                       easier)
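
The -n and -r options take exponents, not byte counts. The conversion
behind the stated defaults can be sketched as below; block_bytes is a
hypothetical helper, and the byte figures assume an 8-byte double.

```c
#include <stddef.h>

/* Hypothetical helper: bytes covered by a block of 2^exp doubles.
 * Assumes sizeof(double) == 8, as on common platforms. */
size_t block_bytes(unsigned exp)
{
    return ((size_t)1 << exp) * sizeof(double);
}
```

Under that assumption, exponent 6 gives 64 doubles (512 bytes) and
exponent 20 gives 1048576 doubles (8 MB), matching the defaults listed
for -r and -n.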



Results:

The maximum results of the test are stored in files (one file for each mode).
The naming convention of the files is as follows:
chart.m0.p2.max      this is the maximum result of a mode 0 test with two
                     processors.

If you want the individual results of each repetition of the benchmark,
you need to change the #define chart in lcpy.c; otherwise only the max
files are generated.
The individual results are stored in separate files (one file for each
process, each repetition and each mode).
The naming convention of these files is as follows:
chart.m0.p2.out.r3.2 this is the result of the second process of the third
                     repetition of a mode 0 test with two processors.
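
The two naming patterns can be reproduced with snprintf, for example
when a script needs to locate result files. The helper names below are
hypothetical, and the 1-based repetition and process indices are
inferred from the example file name above.

```c
#include <stdio.h>

/* Hypothetical helper: name of the max-result file for a mode and
 * process count, e.g. chart.m0.p2.max. Returns snprintf's result. */
int max_chart_name(char *buf, size_t len, int mode, int nproc)
{
    return snprintf(buf, len, "chart.m%d.p%d.max", mode, nproc);
}

/* Hypothetical helper: name of a per-repetition, per-process file,
 * e.g. chart.m0.p2.out.r3.2 (repetition 3, process 2). */
int rep_chart_name(char *buf, size_t len, int mode, int nproc,
                   int rep, int proc)
{
    return snprintf(buf, len, "chart.m%d.p%d.out.r%d.%d",
                    mode, nproc, rep, proc);
}
```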

All files have the following format (fixed-width columns, 8 characters each):

Load Sum    0.5 K     1 K     2 K     4 K
       1   327.68  402.06  431.16  449.65
       2   321.25  368.18  412.18  439.84
       3   280.49  344.98  388.57  425.28
       4   309.13  339.56  375.56  417.43
       5   287.10  316.83  350.97  406.10

The first column gives the stride, the first row the block size.
All values are MB/s.
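
A data row of this format can be read back with sscanf. The sketch below
is an assumption about how one might post-process the files; the
function parse_chart_row is hypothetical and not part of memperf.

```c
#include <stdio.h>

/* Hypothetical parser for one data row: a stride followed by up to
 * `max` bandwidth values in MB/s. Returns the number of values read,
 * or -1 if the line does not start with an integer stride (which is
 * the case for the header row). */
int parse_chart_row(const char *line, int *stride, double *vals, int max)
{
    int used = 0, n = 0;

    if (sscanf(line, "%d%n", stride, &used) != 1)
        return -1;
    line += used;

    /* %n reports how many characters each value consumed. */
    while (n < max && sscanf(line, "%lf%n", &vals[n], &used) == 1) {
        line += used;
        n++;
    }
    return n;
}
```

Splitting on whitespace rather than on fixed character offsets keeps the
parser tolerant of small differences in column width.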

Visualization:
We use DeltaGraph 4.0 from Delta Point to visualize the results.
For this purpose we provide a DeltaGraph library, deltagraph.lbr,
containing the chart.
deltagraph.dg4 is an example DeltaGraph file with one chart.
deltagraph.ps is a sample print.
We also provide an Excel Spreadsheet which generates similar charts.
Feel free to modify it.


Papers:
For the theory behind the benchmark, see the following HPCA and ISCA
papers:

T. Stricker and T. Gross. Global Address Space, Non-Uniform Bandwidth:
A Memory System Performance Characterization of Parallel Systems.
Reprint from proceedings of HPCA'97, Feb 1-5, 1997, San Antonio, TX.

T. Stricker and T. Gross. Optimizing Memory System Performance for
Communication in Parallel Computers.
Reprint from proceedings of ISCA'95, June 1995.

Both papers are available at: http://www.cs.inf.ethz.ch/cops/ECT


