sandmark's People

Contributors

abbysmal, abdulrahim2567, anmolsahoo25, atuldhiman, ctk21, dinosaure, electreaas, ernestmusong, fabbing, firobe, jberdine, kayceesrk, misterda, moazzammoriani, oliviernicole, punchagan, rikusilvola, ritikbhandari, sadiqj, shakthimaan, shubhamkumar13, singhalshubh, stedolan, sudha247, tmcgilchrist, xavierleroy

sandmark's Issues

[RFC] How should a user configure a sandmark run?

From the point of view of a user who wants to run a benchmarking experiment, the goal is to compose several components to get data. The components they want to compose are:

  • A compiler built with their favourite configure switches
  • A set of benchmarks they want to run (expressed preferably as the end binary rather than needing to know the dependencies)
  • A program that collects stats from the binary (e.g. perf, orun, binary size)
  • An environment configuration for the execution of the stats collection (e.g. taskset, ASLR, OCAMLRUNPARAM)

The user configures (and could do so implicitly from defaults) these components into a run plan of benchmark executions to get data.

The mechanisms for configuring the above have evolved as we've tried to do more with sandmark, and they are currently spread out:

  • the compiler build is specified in the .comp file
  • the benchmarks and their parameters are in run_config.json, but you have to get BUILD_BENCH_TARGET right (e.g. multicore vs serial benchmarks) so that all the required opam packages are installed, because the benchmark programs don't bring their dependent opam packages with them.
  • the stat collection wrappers are in run_config.json
  • the environment configuration is spread across several places: some of it lives in the wrappers in run_config.json, some happens via PRE_BENCH_EXEC (e.g. in crontab scripts), and PARAMWRAPPER handles multicore tasksetting

With this issue I'm hoping to find out if this tuple of:

(
compiler build, 
set of benchmarks (incl dependencies),
executable to collect stats from a benchmark command, 
environment within which to run the executable
)

covers all the use cases we need.

Once we've got the use cases nailed down, we can put together proposals or prototypes that attempt to make the configuration easier to handle than the current mix of environment variables, .comp files and JSON files. This should make it easier for users to run the experiments they want.
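For concreteness, a typical invocation today already touches most of these pieces at once. A rough sketch (the variable values below are illustrative, not a recommended configuration; the compiler build itself comes from the corresponding .comp file):

$ BUILD_BENCH_TARGET=buildbench \
  RUN_CONFIG_JSON=run_config.json \
  RUN_BENCH_TARGET=run_orun \
  PRE_BENCH_EXEC='' \
  make ocaml-versions/4.10.0+multicore.bench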

How to run single-threaded benchmarks alone?

make ocaml-versions/4.06.0.bench fails to compile the kcas package, unsurprisingly. How do I run just the single-threaded benchmarks?

The following actions will be performed:
  ∗ install kcas 0.1.4

<><> Gathering sources ><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
[kcas.0.1.4] found in cache

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
[ERROR] The compilation of kcas failed at "/home/kc/sandmark/_opam/opam-init/hooks/sandbox.sh build dune build -p kcas".

#=== ERROR while compiling kcas.0.1.4 =========================================#
# context     2.0.0 | linux/x86_64 | ocaml-base-compiler.4.06.0 | file:///home/kc/sandmark/dependencies
# path        ~/sandmark/_opam/4.06.0/.opam-switch/build/kcas.0.1.4
# command     ~/sandmark/_opam/opam-init/hooks/sandbox.sh build dune build -p kcas
# exit-code   1
# env-file    ~/sandmark/_opam/log/kcas-8684-1872ec.env
# output-file ~/sandmark/_opam/log/kcas-8684-1872ec.out
### output ###
#       ocamlc src/.kcas.objs/byte/kcas.{cmo,cmt} (exit 2)
# (cd _build/default && /home/kc/sandmark/_opam/4.06.0/bin/ocamlc.opt -w -40 -g -bin-annot -I src/.kcas.objs/byte -intf-suffix .ml -no-alias-deps -open Kcas__ -o src/.kcas.objs/byte/kcas.cmo -c -impl src/kcas.ml)
# File "src/kcas.ml", line 37, characters 2-28:
# Error: Unbound value Obj.compare_and_swap_field
#     ocamlopt src/.kcas.objs/native/kcas.{cmx,o} (exit 2)
# (cd _build/default && /home/kc/sandmark/_opam/4.06.0/bin/ocamlopt.opt -w -40 -g -I src/.kcas.objs/byte -I src/.kcas.objs/native -intf-suffix .ml -no-alias-deps -open Kcas__ -o src/.kcas.objs/native/kcas.cmx -c -impl src/kcas.ml)
# File "src/kcas.ml", line 37, characters 2-28:
# Error: Unbound value Obj.compare_and_swap_field



<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
┌─ The following actions failed
│ λ build kcas 0.1.4
└─ 
╶─ No changes have been performed
opam exec --switch 4.06.0 -- dune build -j 1 --profile=release --workspace=ocaml-versions/.workspace.4.06.0 @bench; \
  ex=$?; find _build/4.06.0_* -name '*.bench' | xargs cat > ocaml-versions/4.06.0.bench; exit $ex

Make benchmark wrapper user configurable

From Tom Kelly on slack: "one thing that would be awesome in sandmark is the ability to configure the benchmark wrapper that collects the stats. Right now we have orun -o <output> -- <program-to-run> <program-arguments> which is static in the dune file. It would be nice if we could have the user configure in a central place <command> -o <output> -- <program-to-run> <program-arguments>. This can be powerful as you can then get off the shelf wrappers in there like ocperf.py and strace. It should also allow the user to define the arguments they want to pass to perf. For example they could record all the benchmarks for a given target."
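For illustration, an off-the-shelf wrapper would keep the same command shape as the current orun invocation; this is only a sketch of the idea, not an existing Sandmark feature:

$ orun -o <output> -- <program-to-run> <program-arguments>
$ perf stat -o <output> -- <program-to-run> <program-arguments>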

[RFC] Header entry attributes for the summary benchmark result file

At present, ocaml_bench_scripts creates the benchmark result files in a hierarchical directory structure that encodes information such as the hostname, GitHub commit, branch and timestamp. One option is to flatten this data, and also add additional meta-information as a header entry to each consolidated .bench result file. The advantages are:

  1. The list of bench files can be used locally with a JupyterHub notebook, without having to create the necessary directory structure.
  2. Each .bench file is self-contained, and can be easily stored and accessed in a file system archive. This allows an Extract, Transform, Load (ETL) tool to push the data to a database or to a visualization pipeline for further analysis.

A useful set of key-value attributes to include in the header entry of the JSON .bench file is:
  • Version
  • Timestamp
  • Hostname
  • Operating System
  • Kernel version
  • Architecture
  • GitHub repository
  • GitHub branch
  • GitHub commit
  • Compiler variant
  • Compiler configure options
  • Compiler runtime options

[RFC] Classifying benchmarks based on running time

Currently, we classify benchmarks using two tags -- macro_bench and runs_in_ci. This is a useful classification, but we've had cases of mislabelling, where benchmarks that run for only a few milliseconds were classified as macro. Moreover, the original idea behind runs_in_ci was to run only those benchmarks that finish reasonably fast, so that we don't exhaust the 1-hour CI limit, but it is unclear whether this is being followed consistently. There are also a few benchmarks that now run for more than 100 seconds but don't bring much value in themselves; they tend to be too long to be useful for an initial benchmarking of a new feature.


To address this I propose the following scheme. We will get rid of macro_bench and runs_in_ci and instead use the following classification:

  • lt_1s - benchmarks that run for less than 1 second
  • 1s_10s - benchmarks that run for at least 1 second but less than 10 seconds
  • 10s_100s - benchmarks that run for at least 10 seconds but less than 100 seconds
  • gt_100s - benchmarks that run for at least 100 seconds

We classify the benchmarks based on their running time on the turing machine: Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz.

For the initial performance benchmarking, all the benchmarks in the 1s to 100s range should be considered. This will be the replacement for macro_bench.

For the CI, we will run benchmarks that are in the 1s to 10s range. This will be the replacement for the runs_in_ci tag.

Any parallel benchmark should have a serial baseline version that runs for at least 10s. Otherwise, the parallelization overheads outweigh the benefit of parallelism.

We should have a hard look at any benchmarks that run for less than 1s and see whether they're giving us any useful signals.
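Assuming the TAG-based filtering used elsewhere in this tracker is kept for the new tags, the CI selection could then look like:

$ TAG='"1s_10s"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench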

BUILD_ONLY needs to exit with error if package installation fails

At present, setting the BUILD_ONLY environment variable stops the run before the benchmarks are executed. We need a way to exit with an error status if the installation of any dependency package fails; currently we use --best-effort, which ignores any build errors and proceeds.

In particular, frama-c (#17) needs to be built successfully for this change to be useful with the CI.
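A minimal sketch of the idea (not the actual Makefile recipe): install the dependencies without --best-effort, so that a failing package such as frama-c makes opam return a non-zero exit status, which in turn fails the BUILD_ONLY target:

$ opam install --switch=4.10.0+multicore --yes <benchmark-dependencies> || exit 1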

simple-tests/capi - Fatal error: "unexpected test name"

The capi tests are failing in the Sandmark master branch (commit aa122e6, January 24, 2021) for the lt_1s and 1s_10s tags with the following error messages:

...
Executing benchmarks with:
  RUN_CONFIG_JSON=run_config_filtered.json
  RUN_BENCH_TARGET=run_orun  (WRAPPER=orun)
  PRE_BENCH_EXEC=
        orun capi.test_few_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_few_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_few_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

/tmp/orun898647stderr
        orun capi.test_many_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_many_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_many_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

/tmp/orunf8dd14stderr
        orun capi.test_no_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_no_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_no_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

You can reproduce the issue using:

$ TAG='"lt_1s"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench

CI should run parallel benchmarks

Currently we only run the sequential benchmarks in the CI. We should run the parallel benchmarks as well. It would be sufficient to run the benchmarks with just 2 domains, removing the taskset and chrt commands; this can be done with jq, as sketched below.
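A rough jq sketch (requires jq 1.6 for walk; the taskset pattern is taken from the logs in this tracker, the chrt flags are assumed, and selecting only the 2-domain entries would need an additional filter on config fields not shown here):

$ jq 'walk(if type == "string"
           then sub("taskset --cpu-list [0-9,-]+ "; "") | sub("chrt -r [0-9]+ "; "")
           else . end)' \
    multicore_parallel_run_config.json > run_config_ci.json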

View results for a set of benchmarks in the nightly notebooks

(This is in the context of benchmark nightly runs on a remote machine)

It would be nice to have a way to select a set of benchmarks and view only the results of those benchmarks. It would be useful to select them based on:

  • Individual benchmarks
  • Bench tags

js_of_ocaml fails to run on multicore

See issue #17 on how to get frama-c building on multicore.

js_of_ocaml was failing with Error: Js_of_ocaml_compiler__Instr.Bad_instruction(983041)

This might be due to additional bytecode instructions added in multicore that are not supported by js_of_ocaml.

Improve microbenchmarks

A few of the microbenchmarks need to be improved. In particular, between multicore and trunk, the finalise microbenchmark measures the performance difference in mark stack overflow handling, and lazy_primes measures the efficiency of the major heap allocator.

Include raw baseline observation in the normalized graphs

The normalized graphs in sandmark currently do not include the raw baseline observations. For example,

[image: normalized running time graph comparing two compiler variants]

The graph shows the normalized running time comparing two compiler variants, but it does not convey how much time the baseline actually took. We've had a few examples where an apparent 20% speedup or slowdown can be explained by the fact that the benchmark only runs for a few milliseconds, so the difference we see is due to expected execution-time variance and noise.

It would be useful to have the baseline time included in the normalized graphs as additional information with the benchmark name. For example,

[image: the same normalized graph with the baseline running time shown next to each benchmark name]

The numbers in parentheses indicate the time in seconds for the baseline runs.

It would be useful to include the raw baseline observation in every normalised graph.

Add the ability to average out the results from several runs

Currently, the Makefile takes the number of iterations to run the benchmarks as an environment variable ITER (default is 1). Each iteration produces a separate folder under _results.

While we have removed much of the noise from the executions and our results are generally reproducible, it would be useful to add the ability to average out the stats from several runs. For example, ASLR is turned on by default on Linux. However, for benchmarking runs, we turn off ASLR as it introduces noise and affects reproducibility. The right thing is to have ASLR on and then get the average of several runs.

That said, an average might not make sense for every metric (max pause time and max resident set size, for example). Now that the Jupyter notebooks are part of the Sandmark repo, we should add the ability in the notebooks to process multiple iterations of the same compiler variant and compute averages (and also the median and standard deviation, when they make sense).
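For reference, ASLR on Linux is controlled by the kernel's randomize_va_space setting, and can be disabled for a single process with setarch; this is one common approach, not necessarily what Sandmark does today:

$ cat /proc/sys/kernel/randomize_va_space      # 2 = full randomization (the default)
$ setarch $(uname -m) -R <benchmark-command>   # run one command with ASLR disabled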

Include major cycle information in Jupyter notebook graph

The major cycle counts and words provide useful information on which benchmarks are GC-heavy. The baseline values, along with their labels, could be included in the graph, as they are more useful than what is available in the normalized graph.

Reimplement broken pausetimes support

In Multicore OCaml 4.12.0, event tracing based on the catapult format (used by chrome://tracing/) is deprecated. Multicore 4.12.0 will soon produce CTF-based traces (ocaml-multicore/ocaml-multicore#527). We need to update the scripts to adapt to the new tracing framework. This involves several tasks:

  1. Modify the pausetimes scripts for multicore and stock to emit the CTF based traces
  2. Rewrite the tail-latency computation scripts to read the CTF format directly. Catapult traces produced by Multicore OCaml quickly grow to gigabytes in size, and it wasn't practical to analyse the pause times of long-running programs due to the large file sizes.
  3. Possibly, the new tail-latency scripts should be implemented in OCaml rather than Python. We observed that Python is really slow at processing even medium-sized eventlog traces, taking multiple minutes for the eventlog of a single program run. If it turns out to be slow at processing the CTF traces as well, we should consider rewriting this in OCaml for speed.

Filtering benchmarks based on tags

We have multiple tags to classify benchmarks in the config files, like macro_bench, runs_in_ci and the tags based on running time. The Makefile currently supports filtering macro and CI benchmarks with custom rules.

It would be useful to generalize this further by taking the following items as inputs:

  • tag
  • source file
  • destination file

All benchmarks with the given tag in the source file should be copied to the destination file.

The following jq filter could be used to do the filtering:

jq '{wrappers : .wrappers, benchmarks: [.benchmarks | .[] | select(.tags | index(<tag>) != null)]}'
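For example, to select the lt_1s benchmarks using the file names that appear elsewhere in this tracker:

$ jq '{wrappers : .wrappers, benchmarks: [.benchmarks | .[] | select(.tags | index("lt_1s") != null)]}' \
    run_config.json > run_config_filtered.json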

Run parallel benchmarks in navajo.ocamllabs.io

Now that we know how to run parallel benchmarks, we should run them on our CB server.

P.S.: Not sure if this is the right place to add this issue; we don't have an issue tracker for CB.

Enrich the functionality of .comp files

Currently, the compiler variants in .comp files only take the URL. We've seen many requests for enriching the functionality of .comp files so that the builds and runs may be customized. Some of these are:

  1. OCAMLRUNPARAM parameters for the runs. Consider the case of running the same compiler variant with different OCAMLRUNPARAM parameters. Currently, there is no way to configure this as part of the variant, and it must be applied externally during the runs.
  2. Compiler configuration flags such as enabling flambda.
  3. Different wrappers (orun, pausetimes, perf). These are now hardcoded in the .json files [1]. It would be better if these were described in the .comp files.

I would consider 1 and 2 to be high priority right now. Once that is done, we can do 3. I recommend using s-expressions for the new .comp file format and parsing them with the help of sexplib [2]. It might also be useful to rename .comp to .var to signify that these are different variants, not compilers (the same compiler might have different variants, c.f. item 1 above).

[1] https://github.com/ocaml-bench/sandmark/blob/master/multicore_parallel_run_config.json#L2-L15
[2] https://github.com/janestreet/sexplib

@shakthimaan

graph500seq: Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

The graph500seq benchmarks are failing in Sandmark (commit aa122e6, January 24, 2021) for the macro_bench tag with the following error messages:

Executing benchmarks with:
  RUN_CONFIG_JSON=run_config_filtered.json
  RUN_BENCH_TARGET=run_orun  (WRAPPER=orun)
  PRE_BENCH_EXEC=
        orun kernel2.12_10.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/graph500seq && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../kernel2.12_10.orun.bench -- taskset --cpu-list 5 ./kernel2.exe 12 10)
Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

/tmp/orun875078stderr
        orun kernel3.12_10_2.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/graph500seq && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../kernel3.12_10_2.orun.bench -- taskset --cpu-list 5 ./kernel3.exe 12 10 2)
Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

You can reproduce the error using:

$ TAG='"macro_bench"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench

In benchmarks/graph500seq/dune, there is:

(alias (name buildbench) (deps kronecker.exe kernel1.exe kernel2.exe kernel3.exe))

Note:

  1. The deps are not built in sequential order, and kronecker.exe produces kronecker.txt, which is a pre-requisite for kernel2 and kernel3.
  2. Also, kernel2 and kernel3 use a linkKernel1 function from kernel1.
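Until the dune rules express this ordering, a manual workaround is to generate kronecker.txt once before the kernels run; a sketch only, since kronecker.exe's arguments are not shown in the logs above and are therefore hypothetical:

$ cd _build/4.10.0+multicore_1/benchmarks/graph500seq
$ ./kronecker.exe 12 10   # hypothetical arguments; writes kronecker.txt
$ ./kernel2.exe 12 10     # now finds kronecker.txt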

Noise in Sandmark

Following the discussion in ocaml/ocaml#9934, I set out to quantify the noise in Sandmark macrobenchmark runs. Before asking complex questions about loop alignments and microarchitectural optimisations as was done in ocaml/ocaml#10039, I wanted to measure the noise between multiple runs of the same code. It is important to note that currently, we only run a single iteration of each variant.

The benchmarking was done on the IITM "turing" machine, an Intel Xeon Gold 5120 CPU machine with isolated cores, the CPU governor set to performance, hyper-threading disabled, turbo boost disabled, and interrupts and rcu_callbacks directed to non-isolated cores, but with ASLR on [1]. The results of two runs of the latest commit from https://github.com/stedolan/ocaml/tree/sweep-optimisation are here:

[image: normalized comparison of the two runs of the same commit]

The outlier is worrisome, and there are differences of up to 2% in both directions. Moving forward, we should consider the following:

  1. Arrive at a measure of statistical significance on a given machine. What would be the minimum difference beyond which a result is statistically significant? This will vary based on the benchmark and the metric (running time, maxRSS).
  2. Run multiple iterations. Sandmark already has an ITER variable which repeats the experiments for multiple runs. The notebooks need to be updated so that the mean (and standard deviation) are computed first and the graphs include error bars. The downside is that benchmarking will take significantly longer; we should choose a representative set of macro benchmarks for quick studies and reserve the full macro benchmark run for the final result. Can we run the sequential macro benchmarks in parallel on different isolated cores? What would be the impact of this on the individual benchmark runs?

[1] https://github.com/ocaml-bench/ocaml_bench_scripts#notes-on-hardware-and-os-settings-for-linux-benchmarking
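Point 2 above can already be exercised with the existing ITER variable; for example (the iteration count is illustrative):

$ ITER=5 make ocaml-versions/4.10.0+multicore.bench   # produces five result sets under _results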

Running benchmarks with varying OCAMLRUNPARAM

Right now there isn't an easy way to run experiments where OCAMLRUNPARAM changes but the compiler build stays fixed.

Suppose we wanted to look at the impact of the minor heap size across our benchmarks with 4.10.0+stock.

The run plan we want is:
4.10.0+stock, OCAMLRUNPARAM=s=1M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=2M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=4M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=8M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=16M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=32M, using orun

One way to do this might be to have a way to specify OCAMLRUNPARAM within a wrapper in the run_config.json; this has the advantage that we get the wrapper naming and reuse of the build for free. There may be other better ways to do it.

(NB: the runparams in the compiler variant here are only applied to the opam build, not to the benchmark runs)
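As a rough sketch of what such a wrapper entry could expand to on the command line, reusing the orun shape quoted elsewhere in this tracker (the wrapper mechanism itself is only a proposal):

$ OCAMLRUNPARAM='s=1M' orun -o <output> -- <program-to-run> <program-arguments>
$ OCAMLRUNPARAM='s=32M' orun -o <output> -- <program-to-run> <program-arguments>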

Add an odoc based benchmark

It would be useful to have more memory-intensive benchmarks. It looks like odoc has dependencies that shouldn't be too complex to add, considering what is already present. It will also build on the multicore 4.12+domains variant.

I'm sure that the odoc team can provide some pointers to a couple of workloads that are fairly memory intensive and representative of the sort of thing they see.

Ability to collect perf stats on a benchmark run

On Linux we would like to be able to collect 'perf stat' on the benchmark process.

We would like to collect the following counters:

  • basic:
    task-clock, instructions, cpu-cycles, stalled-cycles-frontend, branches, branch-misses
  • front-end to back-end pipeline:
    idq_uops_not_delivered.core, lsd.cycles_active
  • memory:
    L1-icache-load-misses, L1-dcache-load-misses, iTLB-load-misses, dTLB-load-misses
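A sketch of the corresponding invocation using the counters listed above (the output path and benchmark command are placeholders, and event availability depends on the CPU):

$ perf stat -o <output> \
    -e task-clock,instructions,cpu-cycles,stalled-cycles-frontend,branches,branch-misses \
    -e idq_uops_not_delivered.core,lsd.cycles_active \
    -e L1-icache-load-misses,L1-dcache-load-misses,iTLB-load-misses,dTLB-load-misses \
    -- <program-to-run> <program-arguments>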

Add Coq benchmarks

Coq Installation on Multicore OCaml

Coq compiles with the multicore compiler now. You'll need this branch of coq. You'll also need a dune > 2.4.0. The easiest way to get an installation going is to install dune.2.4.0 on 4.10.0 and copy the dune binary into the bin folder of the multicore switch. Once that is done, coq can be built with make -f Makefile.dune world.
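The steps above as a shell sketch (switch names and paths are illustrative):

$ opam install --switch=4.10.0 dune.2.4.0
$ cp <4.10.0-switch>/bin/dune <multicore-switch>/bin/dune   # copy the dune binary into the multicore switch
$ cd coq && make -f Makefile.dune world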

Benchmarks

Once coq is built, make ci-all downloads and builds a whole series of libraries. It would be useful to take some of these library builds and make them into benchmarks in Sandmark.

sandmark dune version not compiling with 4.09

sandmark is unable to run the 4.09 branch after this commit:
ocaml/ocaml@a7e7e8e

sandmark is using a custom dune to make it work with multicore:
https://github.com/ocaml-bench/sandmark/blob/master/dependencies/packages/dune/dune.1.7.1/opam

Unfortunately sandmark is unable to compile dune after the 4.09 commit above. We get the following error:

#=== ERROR while compiling dune.1.7.1 =========================================#
# context              2.0.4 | linux/x86_64 | ocaml-base-compiler.4.09.0 | file:///local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/dependencies
# path                 /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/4.09.0/.opam-switch/build/dune.1.7.1
# command              /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/opam-init/hooks/sandbox.sh build ./boot.exe --release -j 5
# exit-code            1
# env-file             /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/log/dune-35511-c4933c.env
# output-file          /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/log/dune-35511-c4933c.out
### output ###
# Error: This expression has type
# [...]
#          (string, string) Dune.Import.result =
#            (string, string) Dune_caml.result
#        Type string list is not compatible with type string
# -> required by src/.dune.objs/native/dune__Watermarks.cmx
# -> required by alias src/lib-dune.cmx-all
# -> required by alias src/lib-dune.cmi-and-.cmx-all
# -> required by bin/.main.objs/native/main.cmx
# -> required by bin/main.a
# -> required by bin/main_dune.exe
# -> required by _boot/install/default/bin/dune
# -> required by dune.install

Making Output location for run statistics 'required'

The command-line interface does not mark the output location as mandatory, so when it is omitted orun fails with the following error:
orun: internal error, uncaught exception: Sys_error(": No such file or directory")

There are two ways to fix this:

  1. Either make this argument here positional (since Cmdliner doesn't provide optional required arguments)
  2. Or change the default string here from "" to something else.

Which idea seems better?

Measurements of code size

For compiler benchmarking it's important to have measurements of code size. The size of the text section of the executable should be fine.
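For example, the text section size can be read with the standard size utility (the first column of its default Berkeley output is the text section):

$ size <path-to-benchmark-executable>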

Deprecate the use of 4.06.1 and 4.10.0 in Sandmark

We no longer support the 4.06.1 and 4.10.0 multicore versions. The task is to ensure that all references to either of these versions are removed from the documentation and the scripts, and to make sure that 4.12.0 works well with the new documentation.

[RFC] Categorize and group by benchmarks

At present, the benchmarks in Sandmark are available in the benchmarks folder:

$ ls benchmarks/
almabench       chameneos  decompress   kb           multicore-effects     multicore-minilight   numerical-analysis  stdlib      zarith
alt-ergo        coq        frama-c      lexifi-g2pp  multicore-gcroots     multicore-numerical   sauvola             thread-lwt
bdd             cpdf       graph500seq  menhir       multicore-grammatrix  multicore-structures  sequence            valet
benchmarksgame  cubicle    irmin        minilight    multicore-lockfree    nbcodec               simple-tests        yojson

What would be a good set of categories to group them together, perhaps something like the following?

library formal numerical graph multicore ...

`OCAMLRUNPARAM` value should be included in the .bench file

Since OCAMLRUNPARAM values affect the execution of the benchmarks, we should include them in the .bench file for each of the runs. That is, each .bench entry should include a new field called OCAMLRUNPARAM whose value is that of the OCAMLRUNPARAM environment variable.

This will require changing the orun tool to include the additional field.

Running sandmark on OS X is broken

Looks like we've inadvertently broken the ability to run sandmark on OS X.
It looks like there are a couple of issues here:

  • some of the opam pinned packages are failing to build on OS X 10.15.x, for example Lwt
  • the benchmarks themselves just fail to run, and sadly the output doesn't give any errors to help:
make: *** [ocaml-versions/4.10.0.bench] Error 1

It feels like the ability to run even just the pure OCaml benchmarks (e.g. benchmarks/multicore-numerical/game_of_life.ml) with no package dependencies is useful.

To fix OS X support there are probably two things to do here:

  • fix the underlying issue(s) so that benchmarks run on OS X
  • add OS X to the CI
