noma / hexciton_benchmark Goto Github PK

License: Boost Software License 1.0

Makefile 0.30% Shell 7.89% C++ 58.19% C 30.02% R 3.60%

hexciton_benchmark's Introduction

To build and run simply type 'make'

Note: Readers with older host CPUs will see a core dump at the very end
because some host benchmarks require AVX instructions. This only affects
the host openmp results.

Note: Readers who do not have their home directories automounted will
need to manually run the mic benchmarks (after typing 'make'):
   sh run_mic_nohome.sh bin.mic/benchmark_omp


Note:
=====

You can find the latest version on GitHub:
	https://github.com/noma/hexciton_benchmark

Please open an issue there or send an e-mail to [email protected] if you
have any feedback on the benchmark.

Prepare:
========

Dependencies:
	- Intel C++ Compiler (the OpenCL examples can also be built with other
	  compilers, e.g. g++)
	- an OpenCL SDK
	- a Xeon Phi Coprocessor with MPSS installed (otherwise the MIC examples
	  will not build/run)
	+ everything listed in "thirdparty/Readme"
	
	For Plotting:
	- GNU R
	- pdfcrop
	
Follow the "Readme" in "thirdparty".

Build:
======

Use "make_omp.sh" and "make_ocl.sh" to build the OpenMP 4.0 and OpenCL
benchmarks. Have a look at the scripts, or run them with "-h" for further
details.

For the OpenCL benchmarks:
	./make_ocl.sh -c # Build the CPU version
	./make_ocl.sh -a # Build the Xeon Phi version
	./make_ocl.sh -g # Build the GPU version

For the OpenMP benchmarks:
	./make_omp.sh -c # Build the CPU version
	./make_omp.sh -a # Build the Xeon Phi version

Run:
====

OpenCL: 
	NOTE: Run from this directory, otherwise, the kernel sources won't be
	      found.
	Host (CL_DEVICE_TYPE_CPU): 
		bin/benchmark_ocl_host
	MIC (CL_DEVICE_TYPE_ACCELERATOR): 
		bin/benchmark_ocl_mic
	GPU (CL_DEVICE_TYPE_GPU): 
		bin/benchmark_ocl_gpu


OpenMP: 
	NOTE: If you do not have a shared home between host and MIC, you might have
	      to manually copy files to the MIC file system and manually start the
	      benchmark.
	Host:
		bin/benchmark_omp
	MIC (native):
		./run_mic.sh bin.mic/benchmark_omp

Evaluate:
=========

All benchmark results are written to standard output, all other messages to
standard error. Piping the output into a file will produce a tab-separated
easy-to-work-with table. Examples:

# Benchmark results are piped to results, meesages go to error.log:
bin/benchmark_ocl_mic > ocl_mic.data 2> error.log

# For better readability, pipe the output through column:
cat ocl_mic.data | column -t

# If you are only interested in a subset of the columns, use cut:
cut -f 1,2,4,5 ocl_mic.data | column -t

There are also GNU R scripts for generating PDF plots from the results:

# Plot benchmark results:
Rscript plot_results.r ocl_mic.data ocl_mic.pdf average

# Compare two benchmarks (from the same benchmark binary):
Rscript plot_compare.r ocl_mic_host1.data ocl_mic_host2.data ocl_host1_vs_host2.pdf average

APPENDIX:
=========

Assignment of plot labels to kernel source files.

Figure: runtime improvements per OpenCL optimisation
	"initial kernel"          "commutator_ocl_initial.cl"
	"first refactoring"	      "commutator_ocl_refactored.cl"
	"AoSoA naive"             "commutator_ocl_aosoa_naive.cl"
	"AoSoA 2D-NDRange"        "commutator_ocl_aosoa.cl"
	"compile-time constants"  "commutator_ocl_aosoa_constantscl"
	"manual vectorization"	  "commutator_ocl_manual_aosoa_constants_direct_perm.cl" (CPU)
                          "commutator_ocl_manual_aosoa_constants_direct_prefetch.cl" (MIC)
	
Figure: OpenCL vectorization approaches
	"AoSoA 2D-NDRange"                 "commutator_ocl_aosoa.cl"
		                               "commutator_ocl_manual_aosoa.cl"
	"compile-time constants"           "commutator_ocl_aosoa_constants.cl"
		                               "commutator_ocl_manual_aosoa_constants.cl"
	"direct accumulation"              "commutator_ocl_aosoa_direct.cl"
		                               "commutator_ocl_manual_aosoa_direct.cl"
	"constants + direct write"         "commutator_ocl_aosoa_constants_direct.cl"
		                               "commutator_ocl_manual_aosoa_constants_direct.cl"
	"constants, direct, permuted k,j"  "commutator_ocl_aosoa_constants_direct_perm.cl"
		                               "commutator_ocl_manual_aosoa_constants_direct_perm.cl"

Figure: performance portability across different devices
	"fastest kernel on Sandy Bridge"  "commutator_ocl_aosoa"
	"fastest kernel on Haswell"       "commutator_ocl_manual_aosoa_constants_direct_perm"
	"fastest kernel on MIC"           "commutator_ocl_manual_aosoa_constants_direct_prefetch"
	"fastest kernel on GPGPU"         "commutator_ocl_gpu_final"

Figure: OpenMP kernel runtimes
	"AoSoA base kernel, auto"                                 "commutator_omp_aosoa.cpp"
	"AoSoA base kernel, manual"                               "commutator_omp_manual_aosoa.cpp"
	"constants + direct, auto"                                "commutator_omp_aosoa_constants_direct.cpp"
	"constants + direct, manual"                              "commutator_omp_manual_aosoa_constants_direct.cpp"
	"constants + direct\n+ loop 2->3, auto"                   "commutator_omp_aosoa_constants_direct_perm2to3.cpp"
	"constants + direct + loop 4->5\n+ unroll hints, manual"  "commutator_omp_manual_aosoa_constants_direct_perm_unrollhints.cpp"

hexciton_benchmark's People

Contributors

Stargazers

Watchers

hexciton_benchmark's Issues

Make reference kernel and result check optional

Write result table into stringstream/filestream with build_time as column

deviation question

Could you please list the deviation (host and device results) of all the OpenCL kernels running on a CPU or a GPU ?

Thanks

I've added CMake files to the project, which should simplify building with different SDKs a lot. benchmark_ocl can now be built like follows (NOTE: load cmake module on HLRN; add OpenCL-specific CMake stuff as needed).

mkdir build
cd build

cmake -DCMAKE_BUILD_TYPE=Release -DDEVICE_TYPE=CL_DEVICE_TYPE_CPU -DNUM_ITERATIONS=5 -DNUM_WARMUP=1 -DINTEL_PREFETCH_LEVEL=0 -DVEC_LENGTH=4 -DVEC_LENGTH_AUTO=16 -DOCL_BUILD_OPTIONS=""  ..

make -j

This will create a build of benchmark_ocl and all dependencies inside the folder build. The variables from the old make_ocl.sh are just passed through via -D ... here, so in a cmake_ocl.sh script, the build function can be replaced by a cmake call like above.

The build dir should be called build.<archictecture>.<sdk_suffix>. The script could then take additional input via the environment or command line:

# example values
BUILD_DIR_SUFFIX= # e.g. intel_sdk_for_opencl_2017_7.0.0.2511_x64
CMAKE_OPTIONS="-DOpenCL_FOUND=True -DOpenCL_LIBRARY=${HOME}/Software/ocl_sdks/intel/intel_sdk_for_opencl_2017_7.0.0.2511_x64/opt/intel/opencl/exp-runtime-2.1/lib64/libintelocl_2_1.so"

Good luck. :-)

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.