sandmark's People

Contributors

abbysmal, abdulrahim2567, anmolsahoo25, atuldhiman, ctk21, dinosaure, electreaas, ernestmusong, fabbing, firobe, jberdine, kayceesrk, misterda, moazzammoriani, oliviernicole, punchagan, rikusilvola, ritikbhandari, sadiqj, shakthimaan, shubhamkumar13, singhalshubh, stedolan, sudha247, tmcgilchrist, xavierleroy

sandmark's Issues

[RFC] How should a user configure a sandmark run?

From the point of view of a user who wants to run a benchmarking experiment, the goal is to compose several components to get data. The components they want to compose are:

  • A compiler built with their favourite configure switches
  • A set of benchmarks they want to run (expressed preferably as the end binary rather than needing to know the dependencies)
  • A program that collects stats from the binary (e.g. perf, orun, binary size)
  • An environment configuration for the execution of the stats collection (e.g. taskset, ASLR, OCAMLRUNPARAM)

The user configures (and could do so implicitly from defaults) these components into a run plan of benchmark executions to get data.

The mechanisms for configuring the above have evolved as we've tried to do more with sandmark, and they are currently spread out:

  • the compiler build is specified in the .comp file
  • the benchmarks and their parameters are in run_config.json, but you have to get BUILD_BENCH_TARGET right (e.g. multicore vs serial benchmarks) so that all the required opam packages are installed, because the benchmark programs don't bring their dependent opam packages with them.
  • the stat collection wrappers are in run_config.json
  • the environment configuration is spread across several places: some of it lives in the wrappers in run_config.json, some happens via PRE_BENCH_EXEC (e.g. in crontab scripts), and PARAMWRAPPER handles multicore tasksetting

With this issue I'm hoping to find out if this tuple of:

(
compiler build, 
set of benchmarks (incl dependencies),
executable to collect stats from a benchmark command, 
environment within which to run the executable
)

covers all the use cases we need.

Once we've got the use cases nailed down, we can put together proposals or prototypes that attempt to make the configuration easier to handle than the current mix of environment variables, .comp files and JSON files. This should make it easier for users to run the experiments they want.
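For concreteness, a typical invocation today already touches most of these pieces at once. A rough sketch (the variable values below are illustrative, not a recommended configuration; the compiler build itself comes from the corresponding .comp file):

$ BUILD_BENCH_TARGET=buildbench \
  RUN_CONFIG_JSON=run_config.json \
  RUN_BENCH_TARGET=run_orun \
  PRE_BENCH_EXEC='' \
  make ocaml-versions/4.10.0+multicore.bench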

How to run single-threaded benchmarks alone?

make ocaml-versions/4.06.0.bench fails to compile the kcas package, unsurprisingly. How do I run just the single-threaded benchmarks?

The following actions will be performed:
  ∗ install kcas 0.1.4

<><> Gathering sources ><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
[kcas.0.1.4] found in cache

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
[ERROR] The compilation of kcas failed at "/home/kc/sandmark/_opam/opam-init/hooks/sandbox.sh build dune build -p kcas".

#=== ERROR while compiling kcas.0.1.4 =========================================#
# context     2.0.0 | linux/x86_64 | ocaml-base-compiler.4.06.0 | file:///home/kc/sandmark/dependencies
# path        ~/sandmark/_opam/4.06.0/.opam-switch/build/kcas.0.1.4
# command     ~/sandmark/_opam/opam-init/hooks/sandbox.sh build dune build -p kcas
# exit-code   1
# env-file    ~/sandmark/_opam/log/kcas-8684-1872ec.env
# output-file ~/sandmark/_opam/log/kcas-8684-1872ec.out
### output ###
#       ocamlc src/.kcas.objs/byte/kcas.{cmo,cmt} (exit 2)
# (cd _build/default && /home/kc/sandmark/_opam/4.06.0/bin/ocamlc.opt -w -40 -g -bin-annot -I src/.kcas.objs/byte -intf-suffix .ml -no-alias-deps -open Kcas__ -o src/.kcas.objs/byte/kcas.cmo -c -impl src/kcas.ml)
# File "src/kcas.ml", line 37, characters 2-28:
# Error: Unbound value Obj.compare_and_swap_field
#     ocamlopt src/.kcas.objs/native/kcas.{cmx,o} (exit 2)
# (cd _build/default && /home/kc/sandmark/_opam/4.06.0/bin/ocamlopt.opt -w -40 -g -I src/.kcas.objs/byte -I src/.kcas.objs/native -intf-suffix .ml -no-alias-deps -open Kcas__ -o src/.kcas.objs/native/kcas.cmx -c -impl src/kcas.ml)
# File "src/kcas.ml", line 37, characters 2-28:
# Error: Unbound value Obj.compare_and_swap_field



<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
┌─ The following actions failed
│ λ build kcas 0.1.4
└─ 
╶─ No changes have been performed
opam exec --switch 4.06.0 -- dune build -j 1 --profile=release --workspace=ocaml-versions/.workspace.4.06.0 @bench; \
  ex=$?; find _build/4.06.0_* -name '*.bench' | xargs cat > ocaml-versions/4.06.0.bench; exit $ex

Make benchmark wrapper user configurable

From Tom Kelly on slack: "one thing that would be awesome in sandmark is the ability to configure the benchmark wrapper that collects the stats. Right now we have orun -o <output> -- <program-to-run> <program-arguments> which is static in the dune file. It would be nice if we could have the user configure in a central place <command> -o <output> -- <program-to-run> <program-arguments>. This can be powerful as you can then get off the shelf wrappers in there like ocperf.py and strace. It should also allow the user to define the arguments they want to pass to perf. For example they could record all the benchmarks for a given target."
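For illustration, an off-the-shelf wrapper would keep the same command shape as the current orun invocation; this is only a sketch of the idea, not an existing Sandmark feature:

$ orun -o <output> -- <program-to-run> <program-arguments>
$ perf stat -o <output> -- <program-to-run> <program-arguments>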

[RFC] Header entry attributes for the summary benchmark result file

At present, ocaml_bench_scripts creates the benchmark result files in a hierarchical directory structure that encodes information such as the hostname, GitHub commit, branch and timestamp. One option is to flatten this data, and also add additional meta-information as a header entry to each consolidated .bench result file. The advantages are:

  1. The list of bench files can be used locally with a JupyterHub notebook, without having to create the necessary directory structure.
  2. Each .bench file is self-contained, and can be easily stored and accessed in a file system archive. This allows an Extract, Transform, Load (ETL) tool to push the data to a database or to a visualization pipeline for further analysis.

A useful set of key-value attributes to include in the header entry of the JSON .bench file is:
  • Version
  • Timestamp
  • Hostname
  • Operating System
  • Kernel version
  • Architecture
  • GitHub repository
  • GitHub branch
  • GitHub commit
  • Compiler variant
  • Compiler configure options
  • Compiler runtime options

[RFC] Classifying benchmarks based on running time

Currently, we classify benchmarks using two tags -- macro_bench and runs_in_ci. This is a useful classification, but we've had cases of mislabelling, where benchmarks that run for only a few milliseconds were classified as macro. Moreover, the original idea behind runs_in_ci was to run only those benchmarks that finish reasonably fast, so that we don't exhaust the 1-hour CI limit, but it is unclear whether this is being followed consistently. There are also a few benchmarks that now run for more than 100 seconds but don't bring much value in themselves; they tend to be too long to be useful for an initial benchmarking of a new feature.


To address this I propose the following scheme. We will get rid of macro_bench and runs_in_ci and instead use the following classification:

  • lt_1s - benchmarks that run for less than 1 second
  • 1s_10s - benchmarks that run for at least 1 second but less than 10 seconds
  • 10s_100s - benchmarks that run for at least 10 seconds but less than 100 seconds
  • gt_100s - benchmarks that run for at least 100 seconds

We classify the benchmarks based on their running time on the turing machine: Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz.

For the initial performance benchmarking, all the benchmarks in the 1s to 100s range should be considered. This will be the replacement for macro_bench.

For the CI, we will run benchmarks that are in the 1s to 10s range. This will be the replacement for the runs_in_ci tag.

Any parallel benchmark should have a serial baseline version that runs for at least 10s. Otherwise, the parallelization overheads outweigh the benefit of parallelism.

We should have a hard look at any benchmarks that run for less than 1s and see whether they're giving us any useful signals.
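Assuming the TAG-based filtering used elsewhere in this tracker is kept for the new tags, the CI selection could then look like:

$ TAG='"1s_10s"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench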

BUILD_ONLY needs to exit with error if package installation fails

At present, setting the BUILD_ONLY environment variable stops the run before the benchmarks are executed. We need a way to exit with an error status if the installation of any dependency package fails; currently we use --best-effort, which ignores any build errors and proceeds.

In particular, frama-c (#17) needs to be built successfully for this change to be useful with the CI.
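A minimal sketch of the idea (not the actual Makefile recipe): install the dependencies without --best-effort, so that a failing package such as frama-c makes opam return a non-zero exit status, which in turn fails the BUILD_ONLY target:

$ opam install --switch=4.10.0+multicore --yes <benchmark-dependencies> || exit 1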

simple-tests/capi - Fatal error: "unexpected test name"

The capi tests are failing in the Sandmark master branch (commit aa122e6, January 24, 2021) for the lt_1s and 1s_10s tags with the following error messages:

...
Executing benchmarks with:
  RUN_CONFIG_JSON=run_config_filtered.json
  RUN_BENCH_TARGET=run_orun  (WRAPPER=orun)
  PRE_BENCH_EXEC=
        orun capi.test_few_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_few_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_few_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

/tmp/orun898647stderr
        orun capi.test_many_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_many_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_many_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

/tmp/orunf8dd14stderr
        orun capi.test_no_args_noalloc_200_000_000.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/simple-tests && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../capi.test_no_args_noalloc_200_000_000.orun.bench -- taskset --cpu-list 5 ./capi.exe test_no_args_noalloc_200_000_000)
Fatal error: exception Failure("unexpected test name")

You can reproduce the issue using:

$ TAG='"lt_1s"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench

CI should run parallel benchmarks

Currently we only run the sequential benchmarks in the CI. We should run the parallel benchmarks as well. It would be sufficient to run the benchmarks with just 2 domains, removing the taskset and chrt commands; this can be done with jq, as sketched below.
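A rough jq sketch (requires jq 1.6 for walk; the taskset pattern is taken from the logs in this tracker, the chrt flags are assumed, and selecting only the 2-domain entries would need an additional filter on config fields not shown here):

$ jq 'walk(if type == "string"
           then sub("taskset --cpu-list [0-9,-]+ "; "") | sub("chrt -r [0-9]+ "; "")
           else . end)' \
    multicore_parallel_run_config.json > run_config_ci.json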

View results for a set of benchmarks in the nightly notebooks

(This is in the context of benchmark nightly runs on a remote machine)

It would be nice to have a way to select a set of benchmarks and view only the results of those benchmarks. It would be useful to select them based on:

  • Individual benchmarks
  • Bench tags

js_of_ocaml fails to run on multicore

See issue #17 on how to get frama-c building on multicore.

js_of_ocaml was failing with Error: Js_of_ocaml_compiler__Instr.Bad_instruction(983041)

This might be due to additional bytecode instructions added in multicore that are not supported by js_of_ocaml.

Improve microbenchmarks

A few of the microbenchmarks need to be improved. In particular, between multicore and trunk, the finalise microbenchmark measures the performance difference in mark stack overflow handling, and lazy_primes measures the efficiency of the major heap allocator.

Include raw baseline observation in the normalized graphs

The normalized graphs in sandmark currently do not include the raw baseline observations. For example,

[image: normalized running time graph comparing two compiler variants]

The graph shows the normalized running time comparing two compiler variants, but it does not convey how much time the baseline actually took. We've had a few examples where an apparent 20% speedup or slowdown can be explained by the fact that the benchmark only runs for a few milliseconds, so the difference we see is due to expected execution-time variance and noise.

It would be useful to have the baseline time included in the normalized graphs as additional information with the benchmark name. For example,

[image: the same normalized graph with the baseline running time shown next to each benchmark name]

The numbers in parentheses indicate the time in seconds for the baseline runs.

It would be useful to include the raw baseline observation in every normalised graph.

Add the ability to average out the results from several runs

Currently, the Makefile takes the number of iterations to run the benchmarks as an environment variable ITER (default is 1). Each iteration produces a separate folder under _results.

While we have removed much of the noise from the executions and our results are generally reproducible, it would be useful to add the ability to average out the stats from several runs. For example, ASLR is turned on by default on Linux. However, for benchmarking runs, we turn off ASLR as it introduces noise and affects reproducibility. The right thing is to have ASLR on and then get the average of several runs.

That said, an average might not make sense for every metric (max pause time and max resident set size, for example). Now that the Jupyter notebooks are part of the Sandmark repo, we should add the ability in the notebooks to process multiple iterations of the same compiler variant and compute averages (and also the median and standard deviation, when they make sense).
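For reference, ASLR on Linux is controlled by the kernel's randomize_va_space setting, and can be disabled for a single process with setarch; this is one common approach, not necessarily what Sandmark does today:

$ cat /proc/sys/kernel/randomize_va_space      # 2 = full randomization (the default)
$ setarch $(uname -m) -R <benchmark-command>   # run one command with ASLR disabled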

Include major cycle information in Jupyter notebook graph

The major cycle counts and words provide useful information on which benchmarks are GC-heavy. The baseline values, along with their labels, could be included in the graph, as they are more useful than what is available in the normalized graph.

Reimplement broken pausetimes support

In Multicore OCaml 4.12.0, event tracing based on the catapult format (used by chrome://tracing/) is deprecated. Multicore 4.12.0 will soon produce CTF-based traces (ocaml-multicore/ocaml-multicore#527). We need to update the scripts to adapt to the new tracing framework. This involves several tasks:

  1. Modify the pausetimes scripts for multicore and stock to emit the CTF based traces
  2. Rewrite the tail-latency computation scripts to read the CTF format directly. Catapult traces produced by Multicore OCaml quickly grow to gigabytes in size, and it wasn't practical to analyse the pause times of long-running programs due to the large file sizes.
  3. Possibly, the new tail-latency scripts should be implemented in OCaml rather than Python. We observed that Python is really slow at processing even medium-sized eventlog traces, taking multiple minutes for the eventlog of a single program run. If it turns out to be slow at processing the CTF traces as well, we should consider rewriting this in OCaml for speed.

Filtering benchmarks based on tags

We have multiple tags to classify benchmarks in the config files, like macro_bench, runs_in_ci and the tags based on running time. The Makefile currently supports filtering macro and CI benchmarks with custom rules.

It would be useful to generalize this further by taking the following items as inputs:

  • tag
  • source file
  • destination file

All benchmarks with the given tag in the source file should be copied to the destination file.

The following jq filter could be used to do the filtering:

jq '{wrappers : .wrappers, benchmarks: [.benchmarks | .[] | select(.tags | index(<tag>) != null)]}'
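For example, to select the lt_1s benchmarks using the file names that appear elsewhere in this tracker:

$ jq '{wrappers : .wrappers, benchmarks: [.benchmarks | .[] | select(.tags | index("lt_1s") != null)]}' \
    run_config.json > run_config_filtered.json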

Run parallel benchmarks in navajo.ocamllabs.io

Now that we know how to run parallel benchmarks, we should run them on our CB server.

P.S.: Not sure if this is the right place to add this issue; we don't have an issue tracker for CB.

Enrich the functionality of .comp files

Currently, the compiler variants in .comp files only take the URL. We've seen many requests for enriching the functionality of .comp files so that the builds and runs may be customized. Some of these are:

  1. OCAMLRUNPARAM parameters for the runs. Consider the case of running the same compiler variant with different OCAMLRUNPARAM parameters. Currently, there is no way to configure this as part of the variant, and it must be applied externally during the runs.
  2. Compiler configuration flags such as enabling flambda.
  3. Different wrappers (orun, pausetimes, perf). These are now hardcoded in the .json files [1]. It would be better if these were described in the .comp files.

I would consider 1 and 2 to be high priority right now. Once that is done, we can do 3. I recommend using s-expressions for the new .comp file format and parsing them with the help of sexplib [2]. It might also be useful to rename .comp to .var to signify that these are different variants, not compilers (the same compiler might have different variants, c.f. item 1 above).

[1] https://github.com/ocaml-bench/sandmark/blob/master/multicore_parallel_run_config.json#L2-L15
[2] https://github.com/janestreet/sexplib

@shakthimaan

graph500seq: Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

The graph500seq benchmarks are failing in Sandmark (commit aa122e6, January 24, 2021) for the macro_bench tag with the following error messages:

Executing benchmarks with:
  RUN_CONFIG_JSON=run_config_filtered.json
  RUN_BENCH_TARGET=run_orun  (WRAPPER=orun)
  PRE_BENCH_EXEC=
        orun kernel2.12_10.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/graph500seq && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../kernel2.12_10.orun.bench -- taskset --cpu-list 5 ./kernel2.exe 12 10)
Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

/tmp/orun875078stderr
        orun kernel3.12_10_2.orun.bench [4.10.0+multicore_1] (exit 2)
(cd _build/4.10.0+multicore_1/benchmarks/graph500seq && /home/shakthi/testing/sandmark/_opam/4.10.0+multicore/bin/orun -o ../../kernel3.12_10_2.orun.bench -- taskset --cpu-list 5 ./kernel3.exe 12 10 2)
Fatal error: exception Sys_error("kronecker.txt: No such file or directory")

You can reproduce the error using:

$ TAG='"macro_bench"' make run_config_filtered.json
$ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench

In benchmarks/graph500seq/dune, there is:

(alias (name buildbench) (deps kronecker.exe kernel1.exe kernel2.exe kernel3.exe))

Note:

  1. The deps are not built in sequential order, and kronecker.exe produces kronecker.txt, which is a pre-requisite for kernel2 and kernel3.
  2. Also, kernel2 and kernel3 use a linkKernel1 function from kernel1.
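Until the dune rules express this ordering, a manual workaround is to generate kronecker.txt once before the kernels run; a sketch only, since kronecker.exe's arguments are not shown in the logs above and are therefore hypothetical:

$ cd _build/4.10.0+multicore_1/benchmarks/graph500seq
$ ./kronecker.exe 12 10   # hypothetical arguments; writes kronecker.txt
$ ./kernel2.exe 12 10     # now finds kronecker.txt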

Noise in Sandmark

Following the discussion in ocaml/ocaml#9934, I set out to quantify the noise in Sandmark macrobenchmark runs. Before asking complex questions about loop alignments and microarchitectural optimisations as was done in ocaml/ocaml#10039, I wanted to measure the noise between multiple runs of the same code. It is important to note that currently, we only run a single iteration of each variant.

The benchmarking was done on the IITM "turing" machine, an Intel Xeon Gold 5120 CPU machine with isolated cores, the CPU governor set to performance, hyper-threading disabled, turbo boost disabled, and interrupts and rcu_callbacks directed to non-isolated cores, but with ASLR on [1]. The results of two runs of the latest commit from https://github.com/stedolan/ocaml/tree/sweep-optimisation are here:

[image: normalized comparison of the two runs of the same commit]

The outlier is worrisome, and there are differences of up to 2% in both directions. Moving forward, we should consider the following:

  1. Arrive at a measure of statistical significance on a given machine. What would be the minimum difference beyond which a result is statistically significant? This will vary based on the benchmark and the metric (running time, maxRSS).
  2. Run multiple iterations. Sandmark already has an ITER variable which repeats the experiments for multiple runs. The notebooks need to be updated so that the mean (and standard deviation) are computed first and the graphs include error bars. The downside is that benchmarking will take significantly longer; we should choose a representative set of macro benchmarks for quick studies and reserve the full macro benchmark run for the final result. Can we run the sequential macro benchmarks in parallel on different isolated cores? What would be the impact of this on the individual benchmark runs?

[1] https://github.com/ocaml-bench/ocaml_bench_scripts#notes-on-hardware-and-os-settings-for-linux-benchmarking
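Point 2 above can already be exercised with the existing ITER variable; for example (the iteration count is illustrative):

$ ITER=5 make ocaml-versions/4.10.0+multicore.bench   # produces five result sets under _results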

Running benchmarks with varying OCAMLRUNPARAM

Right now there isn't an easy way to run experiments where OCAMLRUNPARAM changes but the compiler build stays fixed.

Suppose we wanted to look at the impact of the minor heap size across our benchmarks with 4.10.0+stock.

The run plan we want is:
4.10.0+stock, OCAMLRUNPARAM=s=1M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=2M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=4M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=8M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=16M, using orun
4.10.0+stock, OCAMLRUNPARAM=s=32M, using orun

One way to do this might be to have a way to specify OCAMLRUNPARAM within a wrapper in the run_config.json; this has the advantage that we get the wrapper naming and reuse of the build for free. There may be other better ways to do it.

(NB: the runparams in the compiler variant here are only applied to the opam build, not to the benchmark runs)
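As a rough sketch of what such a wrapper entry could expand to on the command line, reusing the orun shape quoted elsewhere in this tracker (the wrapper mechanism itself is only a proposal):

$ OCAMLRUNPARAM='s=1M' orun -o <output> -- <program-to-run> <program-arguments>
$ OCAMLRUNPARAM='s=32M' orun -o <output> -- <program-to-run> <program-arguments>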

Add an odoc based benchmark

It would be useful to have more memory-intensive benchmarks. It looks like odoc has dependencies that shouldn't be too complex to add, considering what is already present. It will also build on the multicore 4.12+domains variant.

I'm sure that the odoc team can provide some pointers to a couple of workloads that are fairly memory intensive and representative of the sort of thing they see.

Ability to collect perf stats on a benchmark run

On Linux we would like to be able to collect 'perf stat' on the benchmark process.

We would like to collect the following counters:

  • basic:
    task-clock, instructions, cpu-cycles, stalled-cycles-frontend, branches, branch-misses
  • front-end to back-end pipeline:
    idq_uops_not_delivered.core, lsd.cycles_active
  • memory:
    L1-icache-load-misses, L1-dcache-load-misses, iTLB-load-misses, dTLB-load-misses
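A sketch of the corresponding invocation using the counters listed above (the output path and benchmark command are placeholders, and event availability depends on the CPU):

$ perf stat -o <output> \
    -e task-clock,instructions,cpu-cycles,stalled-cycles-frontend,branches,branch-misses \
    -e idq_uops_not_delivered.core,lsd.cycles_active \
    -e L1-icache-load-misses,L1-dcache-load-misses,iTLB-load-misses,dTLB-load-misses \
    -- <program-to-run> <program-arguments>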

Add Coq benchmarks

Coq Installation on Multicore OCaml

Coq compiles with the multicore compiler now. You'll need this branch of coq. You'll also need a dune > 2.4.0. The easiest way to get an installation going is to install dune.2.4.0 on 4.10.0 and copy the dune binary into the bin folder of the multicore switch. Once that is done, coq can be built with make -f Makefile.dune world.
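The steps above as a shell sketch (switch names and paths are illustrative):

$ opam install --switch=4.10.0 dune.2.4.0
$ cp <4.10.0-switch>/bin/dune <multicore-switch>/bin/dune   # copy the dune binary into the multicore switch
$ cd coq && make -f Makefile.dune world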

Benchmarks

Once coq is built, make ci-all downloads and builds a whole series of libraries. It would be useful to take some of these library builds and make them into benchmarks in Sandmark.

sandmark dune version not compiling with 4.09

sandmark is unable to run the 4.09 branch after this commit:
ocaml/ocaml@a7e7e8e

sandmark is using a custom dune to make it work with multicore:
https://github.com/ocaml-bench/sandmark/blob/master/dependencies/packages/dune/dune.1.7.1/opam

Unfortunately sandmark is unable to compile dune after the 4.09 commit above. We get the following error:

#=== ERROR while compiling dune.1.7.1 =========================================#
# context              2.0.4 | linux/x86_64 | ocaml-base-compiler.4.09.0 | file:///local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/dependencies
# path                 /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/4.09.0/.opam-switch/build/dune.1.7.1
# command              /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/opam-init/hooks/sandbox.sh build ./boot.exe --release -j 5
# exit-code            1
# env-file             /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/log/dune-35511-c4933c.env
# output-file          /local/scratch/ctk21/daily/20190614_0005/4.09/a7e7e8e7454406d5b714bff7971d365b34b653a4/sandmark/_opam/log/dune-35511-c4933c.out
### output ###
# Error: This expression has type
# [...]
#          (string, string) Dune.Import.result =
#            (string, string) Dune_caml.result
#        Type string list is not compatible with type string
# -> required by src/.dune.objs/native/dune__Watermarks.cmx
# -> required by alias src/lib-dune.cmx-all
# -> required by alias src/lib-dune.cmi-and-.cmx-all
# -> required by bin/.main.objs/native/main.cmx
# -> required by bin/main.a
# -> required by bin/main_dune.exe
# -> required by _boot/install/default/bin/dune
# -> required by dune.install

Making Output location for run statistics 'required'

The command-line interface does not mark the output location as mandatory, so when it is omitted orun fails with the following error:
orun: internal error, uncaught exception: Sys_error(": No such file or directory")

There are two ways to fix this:

  1. Either make this argument here positional (since Cmdliner doesn't provide optional required arguments)
  2. Or change the default string here from "" to something else.

Which idea seems better?

Measurements of code size

For compiler benchmarking it's important to have measurements of code size. The size of the text section of the executable should be fine.
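For example, the text section size can be read with the standard size utility (the first column of its default Berkeley output is the text section):

$ size <path-to-benchmark-executable>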

Deprecate the use of 4.06.1 and 4.10.0 in Sandmark

We no longer support the 4.06.1 and 4.10.0 multicore versions. The task is to ensure that all references to either of these versions are removed from the documentation and the scripts, and to make sure that 4.12.0 works well with the new documentation.

[RFC] Categorize and group by benchmarks

At present, the benchmarks in Sandmark are available in the benchmarks folder:

$ ls benchmarks/
almabench       chameneos  decompress   kb           multicore-effects     multicore-minilight   numerical-analysis  stdlib      zarith
alt-ergo        coq        frama-c      lexifi-g2pp  multicore-gcroots     multicore-numerical   sauvola             thread-lwt
bdd             cpdf       graph500seq  menhir       multicore-grammatrix  multicore-structures  sequence            valet
benchmarksgame  cubicle    irmin        minilight    multicore-lockfree    nbcodec               simple-tests        yojson

What would be a good set of categories to group them together, perhaps something like the following?

library formal numerical graph multicore ...

`OCAMLRUNPARAM` value should be included in the .bench file

Since OCAMLRUNPARAM values affect the execution of the benchmarks, we should include them in the .bench file for each of the runs. That is, each .bench entry should include a new field called OCAMLRUNPARAM whose value is that of the OCAMLRUNPARAM environment variable.

This will require changing the orun tool to include the additional field.

Running sandmark on OS X is broken

Looks like we've inadvertently broken the ability to run sandmark on OS X.
It looks like there are a couple of issues here:

  • some of the opam pinned packages are failing to build on OS X 10.15.x, for example Lwt
  • the benchmarks themselves just fail to run, and sadly the output doesn't give any errors to help:
make: *** [ocaml-versions/4.10.0.bench] Error 1

It feels like the ability to run even just the pure OCaml benchmarks (e.g. benchmarks/multicore-numerical/game_of_life.ml) with no package dependencies is useful.

To fix OS X support there are probably two things to do here:

  • fix the underlying issue(s) so that benchmarks run on OS X
  • add OS X to the CI
