llnl / benchpark Goto Github PK

An open collaborative repository for reproducible specifications of HPC benchmarks and cross site benchmarking environments

Home Page: https://software.llnl.gov/benchpark/

License: Apache License 2.0

Shell 0.16% CMake 1.05% Python 96.20% C++ 2.59%

benchmark hpc

benchpark's Introduction

Overview

You can find detailed documentation at software.llnl.gov/benchpark. Benchpark can also be found on GitHub.

Benchpark is an open collaborative repository for reproducible specifications of HPC benchmarks. Benchpark enables cross-site collaboration on benchmarking by providing a mechanism for sharing reproducible, working specifications for the following:

System Specifications (benchmark and experiment agnostic)

Hardware information
System software environment information (available compilers, MPI)
Scheduler and launcher

Benchmark Specifications (system and experiment agnostic)

Source repo and version
Build configuration (with Spack)
Run configuration (with Ramble)

Experiment Specifications (specific benchmark experiment on a system specification)

Programming model (e.g., OpenMP, CUDA, ROCm) for the benchmark on a given system
Parameters for individual runs in a study

Dependencies

Benchpark uses the following open source projects for specifying configurations:

Spack for building benchmark and dependencies
Ramble for run configurations

Community

Benchpark is an open source project. Questions, discussion,and contributions of new specifications as well as updates and improvements to existing specifications are welcome.

We use github discussions for Q&A and discussion.

Contributing

To contribute to Benchpark, please open a pull request to the develop branch. Your PR must pass Benchpark's unit tests, and must be PEP 8 compliant.

Authors and citations

Many thanks to Benchpark's contributors.

Benchpark was created by Olga Pearce, Alec Scott, Greg Becker, Riyaz Haque, and Nathan Hanford.

To cite Benchpark, please use the following citation:

Olga Pearce, Alec Scott, Gregory Becker, Riyaz Haque, Nathan Hanford, Stephanie Brink, Doug Jacobsen, Heidi Poxon, Jens Domke, and Todd Gamblin. 2023. Towards Collaborative Continuous Benchmarking for HPC. In Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023), November 12–17, 2023, Denver, CO, USA. ACM, New York, NY, USA, 9 pages. doi.org/10.1145/3624062.3624135.

License

Benchpark is released under the Apache 2.0 w/ LLVM Exception license. For more details see the LICENSE file.

LLNL-CODE-850629

benchpark's People

Contributors

Stargazers

Watchers

benchpark's Issues

Print spack concretization output on stdout and a separate log file

Currently benchpark setup prints the spack concretization details in <workspace-dir>/workspace/logs/setup.<time>.out file. In case of multiple experiments, this concretization log appears only once per each set of experiments built using the same config options. This makes it difficult to track compilation output for a given experiment.
If will be highly desirable to have the spack concretization output printed on the stdout and its own separate log file. Additionally, it will also be nice to have a mode like benchpark setup --dry-run which simply prints the concretization log for a given set of experiments.

Add documentation on benchpark/repo/library/package.py provides(interface)

Example: https://github.com/LLNL/benchpark/blob/develop/repo/cublas/package.py

This should probably be under "Contributing" - should it be called "Adding a library interface"? @becker33

Is this necessary when:

a benchmark has a dependence that is an interface (define interface in this context?)
a library provides this interface, but with a different name.
if that library has a spack package, do we need to add the "provides("lapack")" to this existing package?
if that library does not have a spack package, is there a doc in spack that provides instructions on how to write this stub package?

Update github actions to re-render docs when bin/benchpark is updated

Current rule is to re-render docs if anything in docs/ or README.rst are updated. We should also include updates to bin/benchpark to capture additional CLI parameters.

Add archspec info to artifact

Add function to call arch spec
-- On my system, so I can add it to system spec in Benchpark
-- (eventally) When creating a system spec in Benchpark
What does arch spec return? Where should we store it?
Add arch spec info to archive

Update default compiler on lassen to clang-ibm instead of xlc

Code groups use clang-ibm as the default compiler on lassen and are planning to drop support for xlc in the near future. The default compiler for lassen should also be switched to clang-ibm. This would involve testing all exsting benchmarks on lassen against the clang-ibm compiler

cannot have cflags in two spots without breaking the concretizer

benchpark/experiments/stream/openmp/ramble.yaml

Line 44 in b3e09e6

spack_spec: '[email protected] +openmp stream_array_size={array_size} ntimes={n_times} cflags="-mcmodel=medium -Ofast -flto"'

and

benchpark/configs/RCCS-Fugaku-Fujitsu-A64FX-TofuD/auxiliary_software_files/compilers.yaml

Line 34 in b3e09e6

cflags: {"-msve-vector-bits=scalable"}

==> *******************************************
==> ********** Running Spack Command **********
==> **     command: /vol0005/mdt3/data/ra000020/u10016/benchpark.llvm/test/spack/bin/spack concretize
==> **     with args: ['-U', '-f']
==> *******************************************
==> 
==> Error: Internal Spack error: the solver completed but produced specs that do not satisfy the request.
	Unsatisfied input specs:
	Input spec: [email protected]%[email protected] cflags='-mcmodel=large' fflags='-mcmodel=large' +openmp ntimes=20 stream_array_size=80000000
	Candidate spec: stream@=5.10%clang@=15.0.3 cflags='-msve-vector-bits=scalable -mcmodel=large' cxxflags='-msve-vector-bits=scalable' fflags='-mcmodel=large' ldflags='-fuse-ld=lld -lelf -ldl' +openmp build_system=makefile ntimes=20 offset=none stream_array_size=80000000 stream_type=none arch=linux-rhel8-a64fx ^[deptypes=build] gmake@=4.4.1%clang@=15.0.3 cflags='-msve-vector-bits=scalable' cxxflags='-msve-vector-bits=scalable' ldflags='-fuse-ld=lld -lelf -ldl' ~guile build_system=generic arch=linux-rhel8-a64fx
==> Error: Command exited with status 1:
    '/vol0005/mdt3/data/ra000020/u10016/benchpark.llvm/test/spack/bin/spack' 'concretize' '-U' '-f'
==> Error: Error running spack command: /vol0005/mdt3/data/ra000020/u10016/benchpark.llvm/test/spack/bin/spack concretize -U -f

ps: i ran sed -i -e 's@cflags=".*"@cflags=-mcmodel=large fflags=-mcmodel=large@g' experiments/stream/openmp/ramble.yaml before doing the test, so the cflags in the error log and the cflags shown in the linked files are slightly different

Parameterized system variables

Options for enabling different values for a variable within a single system config, such as different number of cores or memory per node in AWS instances. Brainstorming notes from discussion with Doug below.

Option 1:

variables.yaml

variables:
  aws_names: [full, half, quarter]
  sys_cpus_per_node: [44, 22, 11]
  sys_gpus_per_node [4, 2, 1]

zips.yaml

zips:
  aws_conf:
  - aws_names
  - sys_cpus_per_node
  - sys_gpus_per_node

ramble workspace setup --where '"{aws_names}" == "full"'

Option 2:

variables.yaml

variables:
  sys_cpus_per_node: 44
  sys_gpus_per_node 4

Should work

ramble -c variables:sys_cpus_per_node:22 -c variables:sys_gpus_per_node:2 workspace setup

Only works when they are part of a config scope.

Config scopes are listed in `ramble config list`

Anything within another config scope won't be merged by the config logic

i.e. command line scopes won't override variables defined in

`ramble:applications:hostname:variables` as they are in a different config section

section. And merging happens after the config scopes are merged.

ramble.yaml

ramble:
  variables: <-- Will work
  applications:
    hostname:
      variables: <-- Won't work
      workloads:
        local:
          variables: <-- Won't work
          experiments:
            test:
              variables:
                sys_cpus_per_node: 44
                sys_gpus_per_node 4

Won't work

ramble -c variables:sys_cpus_per_node:22 -c variables:sys_gpus_per_node:2 workspace setup

Add list of required python packages, add requirements to docs

Benchpark tags

Benchpark list in the docs currently shows all Ramble experiments. Should only be Benchpark experiments.
Add tags to Benchpark application.py (derive from Ramble if needed).
Add tags to Benchpark experiments (cuda, etc.)

missing fflags in stream benchmark definition?

benchpark/experiments/stream/openmp/ramble.yaml

Line 44 in b3e09e6

spack_spec: '[email protected] +openmp stream_array_size={array_size} ntimes={n_times} cflags="-mcmodel=medium -Ofast -flto"'

stream makefile has CFLAGS and FFLAGS and the performance-relevant code is in fortran files, so I would assume we need fflags here too (unless spack populates the fflags with a copy of the cflags ¯\(ツ)/¯ )

I tried adding fflags (via cflags="-mcmodel=large" fflags="-mcmodel=large") but this fails with:

==> Defining Spack variables
==> 
==> *******************************************
==> ********** Running Spack Command **********
==> **     command: /vol0005/mdt3/data/ra000020/u10016/benchpark.fj/test.fj/spack/bin/spack
==> **     with args: ['find', '--format={name}', '[email protected]', '+openmp', 'stream_array_size=80000000', 'ntimes=20', 'cflags="-mcmodel=large"', 'fflags="-mcmodel=large"', '%[email protected]']
==> *******************************************
==> 
==> Error: No package matches the query: [email protected]%[email protected] cflags='"-mcmodel=large"' fflags='"-mcmodel=large"' +openmp ntimes=20 stream_array_size=80000000
==> Error: Command exited with status 1:
    '/vol0005/mdt3/data/ra000020/u10016/benchpark.fj/test.fj/spack/bin/spack' 'find' '--format={name}' '[email protected]' '+openmp' 'stream_array_size=80000000' 'ntimes=20' 'cflags="-mcmodel=large"' 'fflags="-mcmodel=large"' '%[email protected]'

Dropping the " from the line ( -> cflags=-mcmodel=large fflags=-mcmodel=large ) compiles successfully which could be related to GoogleCloudPlatform/ramble#436 or something else ¯\(ツ)/¯

benchpark list

=== Spack has:
spack list
spack list sql //anything that has sql in the name
spack list --search-description documentation
spack info --all mpich
spack versions libelf //all versions of libelf
spack tags
spack tags [-hia] [tag ...] //for a specific tag

=== Ramble has:
ramble list
ramble list [-hd] [--format {name_only,version_json,html}] [--update FILE] [-t TAGS] ...
//can search the description, and by tags
ramble list wrf //all versions of wrf application
ramble info application //info on a given application

=== Benchpark has at least 2 additional needs:

What system definitions are available?
What benchmarks are defined in my repo (not yet upstreamed to Spack and/or Ramble)

Notes from discussing with Doug about #1:

System definitions in Benchpark are template workspaces, so we could point Ramble at the system directory to grab the names, but this won’t work with hierarchical system definitions (e.g., llnl/cts1, llnl/ats4).

We could activate one system and add ramble config to layer an experiment?

Doug wants to at some point add functionality to generate a workspace from another workspace
Have a directory of template workspaces
Ramble workspace from a template
Which might enable more options from the above.

What about the benchmark definitions for #2?

Parameterizing the compiler to use for an experiment

Application.py can give suggestions on where an experiment might be run, but not requirements. However, application.py could define a workload variable that can then be used in experiment variables.

Example of using a variable in the experiment to impact the build requirements:

https://github.com/LLNL/benchpark/tree/develop/experiments/amg2023/cuda/ramble.yaml

applications:
    amg2023:
      workloads:
        problem1:
          variables:
             cuda_arch: '{cuda_arch}'
spack:
    concretized: true
    packages:
      amg2023:
        spack_spec: amg2023@develop +mpi+cuda{modifier_spack_variant} cuda_arch=={cuda_arch} 
        compiler: default-compiler

How should we connect this to what is available on the system?

Note: environments are lazily rendered in Ramble, that is they are only rendered when the experiment phases are started.

benchpark tags

Both Spack and Ramble allow tags, and searching via tags, but do not enforce them.

Would it be possible to create a list of available tags (e.g., in a text file, if nothing else), and in the PR process to have a check for whether a benchmark introduces new tags? If a new tag is introduced, we could ask the PR author to consider the available tags, and if they do not meet his/her needs, proposing to add a new tag to our list.

A starting list of tags I would like to have:
Scale: single-node, sub-node, multi-node, many-node
Language: C, C++, Fortran, Python, Julia
Parallelism: cpu, cpu-openmp, gpu-openmp, raja, kokkos, cuda, hip, sycl, oneapi
Communication: MPI, NCCL, RCCL, UPC, shmem, nvsmem
Application domain: asc, engineering, astrophysics, chemistry, climate, fusion, material-science,
Application type: hydrodynamics, nbody, transport, deterministic, montecarlo, particles, direct, explicit, implicit, FFT, solver, dense-linear-algebra, sparse-linear-algebra, ML, I-O
Mesh representation: structured-grid, block-structured-grid, AMR, unstructured-grid
Communication performance characteristics: network-bandwidth-bound, network-bisection-bandwidth-bound, network-latency-bound, non-local-point-to-point, mpi-collectives
Memory access characteristics: regular-memory-access, irregular-memory-access, high-memory-bandwidth, large-memory-footprint
Performance characteristics: high-fp, simd, atomics, vectorization, high-branching, register-pressure, mixed-precision
I am cutting myself off here, this list will certainly grow, which is why I’d like to figure out some sanity to this madness upfront.

By the way, I’d like to use tags in the system specs as well, to indicate what the system is able to accommodate (e.g., has GPUs, CPUs, particular communication libraries). I imagine people coming in to look for available system definitions and wanting to find something most similar to their system to copy and refine.

Please let me know what you think we should do here.

Add file and line number information to `debug_print`

Add documentation "for the impatient"

The usual "Here's what to do if you know what you're doing."

intermittent xz/openssl fetch issues on LLNL Poodle with nosite-x86_64 saxpy/openmp

Big picture:

For the past several days I've hit intermittent issues with xv and openssl downloads when trying to build a nosite-x86_64 saxpy/openmp instance on LLNL's Poodle cluster.

I've been able to reproduce the issue in an isolated clone of spack. Adjusting the connection timeout might have helped, but in digging further into the logs I wasn't able to nail down what might be taking so long (fetch for xz was taking over a minute whether it succeeded or failed, and it's not that big). Cloning another spack instance into /dev/shm (to remove filesystem lag from the equation) resulted in ssl certificate errors. Note that multiple other packages were being fetched and built correctly.

From the top:

I sourced the following into my environment.

module load python/3.11.5
export PATH=${HOME}/.local/bin:${PATH}
export BPROOT=${PWD}/benchpark
export PATH=${BPROOT}/bin:${PATH}
export WORKSPACE_DIR=${PWD}/workspace
export BPSITE=nosite-x86_64
export BPEXPR=saxpy/openmp
export SPACK_DISABLE_LOCAL_CONFIG=1
alias bp="benchpark"

I executed the following manually.

mkdir ${WORKSPACE_DIR}
git clone [email protected]:LLNL/benchpark.git
cd benchpark
pip install -r requirements.txt
bp setup ${BPEXPR} ${BPSITE} ${WORKSPACE_DIR}
. ${WORKSPACE_DIR}/setup.sh
ramble -P -D ${WORKSPACE_DIR}/${BPEXPR}/${BPSITE}/workspace workspace setup

Output of the ramble command:

rountree@poodle18 ~/w/poodle/bp/benchpark$ ramble -P -D ${WORKSPACE_DIR}/${BPEXPR}/${BPSITE}/workspace workspace setup
==> Warning: The following config sections are deprecated and ignored:
==> Warning:     spack:concretized
==> Warning: Please remove from your configuration files.
==> Streaming details to log:
==>   /g/g24/rountree/w/poodle/bp/workspace/saxpy/openmp/nosite-x86_64/workspace/logs/setup.2024-04-22_13.37.45.out
==>   Setting up 8 out of 8 experiments:
==> Experiment #1 (1/8):
==>     name: saxpy.problem.saxpy_512_1_8_2
==>     root experiment_index: 1
==>     log file: /g/g24/rountree/w/poodle/bp/workspace/saxpy/openmp/nosite-x86_64/workspace/logs/setup.2024-04-22_13.37.45/saxpy.problem.saxpy_512_1_8_2.out                                                                                                                
==> Error: Command exited with status 1:
    '/g/g24/rountree/w/poodle/bp/workspace/spack/bin/spack' 'install' '--add' '--keep-stage'
==> Error: Error running spack command: /g/g24/rountree/w/poodle/bp/workspace/spack/bin/spack install --add --keep-stage
==> Error: For more details, see the log file: /g/g24/rountree/w/poodle/bp/workspace/saxpy/openmp/nosite-x86_64/workspace/logs/setup.2024-04-22_13.37.45/saxpy.problem.saxpy_512_1_8_2.out

Relevant section from the output log:

196 ==> Installing xz-5.4.6-7ve2jkjkz6gqh2bryqvg4knmjc6fzea2 [4/42]
197 ==> No binary for xz-5.4.6-7ve2jkjkz6gqh2bryqvg4knmjc6fzea2 found: installing from source
198 ==> Cannot find version 5.4.6 in url_list
199 ==> Error: FetchError: All fetchers failed for spack-stage-xz-5.4.6-7ve2jkjkz6gqh2bryqvg4knmjc6fzea2
...
443 [+] /usr/WS1/rountree/poodle/bp/workspace/spack/opt/spack/linux-rhel8-icelake/gcc-10.3.1/libpciaccess-0.17-kzqqwk7egks7el6afw7kqkf64tvavd3o
444 ==> Installing openssl-3.2.1-3q2ox7v4kzgnxuvog2cvnevfceicdvb5 [26/42]
445 ==> No binary for openssl-3.2.1-3q2ox7v4kzgnxuvog2cvnevfceicdvb5 found: installing from source
446 ==> Cannot find version 3.2.1 in url_list
447 ==> Error: FetchError: All fetchers failed for spack-stage-openssl-3.2.1-3q2ox7v4kzgnxuvog2cvnevfceicdvb5

The problem is isolated to those two packages. This is the list of package installations that succeeded, grepped from the same log file.

==> gcc-runtime: Successfully installed gcc-runtime-10.3.1-uv5gzwbajf5yzwm445tewllht6dkicag
==> gmake: Successfully installed gmake-4.4.1-zm2ln2tmjxbyh2s6d5ic2ugbmbpr3x2c
==> ca-certificates-mozilla: Successfully installed ca-certificates-mozilla-2023-05-30-z3j2mqnqzocuvqyjk34demou46kijcgn
==> pkgconf: Successfully installed pkgconf-1.9.5-ymi6dpauz3p7b2m72nz5sjbjr7wyih5o
==> libiconv: Successfully installed libiconv-1.17-icayascjkf75gibrlzrapin5xwzjgtmx
==> util-macros: Successfully installed util-macros-1.19.3-5nbpgq55oq6pgjf4pmoibemfn5xkdog6
==> libsigsegv: Successfully installed libsigsegv-2.14-j6yxdcncmoukkqpsrgxatwcz4ukegejo
==> berkeley-db: Successfully installed berkeley-db-18.1.40-njjr2bjediewd42mykbr5tijxqkk7fl7
==> zstd: Successfully installed zstd-1.5.5-7wi2pu5yk3vguf4hnyh4per2o3jnjs6n
==> findutils: Successfully installed findutils-4.9.0-i5fct7erduigjoo6m76bhq3qvyycqmru
==> zlib-ng: Successfully installed zlib-ng-2.1.6-rgyevtirmvkjqwj5tpuykjhlqz3k4pa3
==> nghttp2: Successfully installed nghttp2-1.57.0-zpiv2hlw4lrro4lmxc47ntc5sradgqi4
==> ncurses: Successfully installed ncurses-6.4-sx4iapeisassme466m76hnzjylwgtsuf
==> diffutils: Successfully installed diffutils-3.9-ebqi4azof2nbbxkrnbs4g7vgfld6bznc
==> pigz: Successfully installed pigz-2.8-gygguhocgfqtsx53kaxh6gvls3lspucp
==> libedit: Successfully installed libedit-3.1-20210216-bpwwonjiiiob6pfkgwiq32r3v4xtybv6
==> readline: Successfully installed readline-8.2-bsauefvq3eldqsvhsoiiwn5m7hcfc5u2
==> m4: Successfully installed m4-1.4.19-wl2dfgrcwzd5dwcbve4eu3ykucbk7lhm
==> bzip2: Successfully installed bzip2-1.0.8-mhv4u6qx2mfe4g3mixlr24mkeb3rqrg6
==> gdbm: Successfully installed gdbm-1.23-7jxirliofgdeuflwzworvyjvlnzl3psd
==> libtool: Successfully installed libtool-2.4.7-nyvpl525uscikcgdnnasjocssnnexnun
==> bison: Successfully installed bison-3.8.2-kvualogniepdspedawckn6sajplgbyep
==> perl: Successfully installed perl-5.38.0-65amd752wtcomvxgiazlp5ct7w6vbfuw
==> libpciaccess: Successfully installed libpciaccess-0.17-kzqqwk7egks7el6afw7kqkf64tvavd3o
==> libxcrypt: Successfully installed libxcrypt-4.4.35-oepzpvyswnxpkgf3wevksfwmnpuoiwds
==> autoconf: Successfully installed autoconf-2.72-7pkycl3vmhf2ua3yhqnhz4qgjz3ipz4c
==> automake: Successfully installed automake-1.16.5-ydznsx2aze5zmv4diufemkjecxhk3vfz
==> numactl: Successfully installed numactl-2.0.14-rgwnljzm6z5sfv3x5dtvpsrzzldspkbl

Will update shortly with data from spack experiments.

Object 'saxpy' not found in repository

I am trying to run saxpy on an X86 node with setup saxpy/openmp nosite-x86_64 following step 1-6 of the documentation, but at step 5 I get the following error :

Object 'saxpy' not found in repository '~/benchpark/ramble/var/ramble/repos/builtin'

In step 4, setup.sh is generated with a message to source it to complete the benchpark setup.
But, after doing so, I have this error after moving to saxpy/openmp/nosite-x86_64/workspace/ and executing ramble -P -D . workspace setup.
Am I missing something?

Code crashes with python < 3.7

The code uses the text parameter of subprocess.run, introduced in python 3.7. The minimum python version should be documented and the code should give an error message if someone tries to run it with too old a python.

files missing

./bin/benchpark
Missing arguments. Run rambler.sh

find . -name rambler.sh
<nothing>

contribution model

Need to record maintainers: for systems (one per system config, e.g., nosite-x86_64), and experiments (one per experiment, e.g., amg2023/cuda)
New systems

system owners should define/update the default compilers on the system
dashboard report of all the experiments that were tested

New experiments

dashboard report of all the platforms tested
should we require a working test on a neutral platform (e.g., AWS)?

Spack package.py and Ramble application.py

benchpark/repo is the way to test all of the components together
should commits to the repo trigger PRs to Spack and Ramble? Once those are upstreamed, changes to benchpark/repo need to be made, and ${BENCHMARK1} should point at the upstreamed versions instead of at the repo.

benchpark shows incorrect compile line/flags in logs

while testing stream on fugaku i came across this bug (will fix the real problem later in a PR)

==> No binary for stream-5.10-4aaoueupfe6vflmysh5bbs4xgxnvksmu found: installing from source
==> No patches needed for stream
==> stream: Executing phase: 'edit'
==> stream: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j16'

4 errors found in build log:
     4     ==> [2024-03-09-14:55:37.640554] FILTER FILE: Makefile [replacing "CFLAGS = .*"]
     5     ==> [2024-03-09-14:55:37.649467] FILTER FILE: Makefile [replacing "FFLAGS = .*"]
     6     ==> stream: Executing phase: 'build'
     7     ==> [2024-03-09-14:55:37.701297] 'make' '-j16'
     8     /vol0005/mdt3/data/ra000020/u10016/benchpark.gnu/test.gnu/spack/lib/spack/env/gcc/gcc -O2 -fopenmp -
           DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20   -c -o mysecond.o mysecond.c
     9     /vol0005/mdt3/data/ra000020/u10016/benchpark.gnu/test.gnu/spack/lib/spack/env/gcc/gcc -O2 -fopenmp -
           DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 stream.c -o stream_c.exe
  >> 10    gcc: error: unrecognized argument in option '-mcmodel=medium'
     11    gcc: note: valid arguments to '-mcmodel=' are: large small tiny

as you can see, the compile line does not include "-mcmodel=medium" as the error suggests, so somehow benchpark is hiding the real compiler execution and flags from the log/user which isn't a good approach.

Improve documentation on benchpark/repo

Currently a sub-bullet under "Add a benchmark" - but it is hidden by default, so maybe we need to make it more prominent? Users are clearly not finding it.

It is a little dense at the moment, perhaps can write more clearly?

Explain the staging: package.py and/or application.py in benchpark/repo overwrite Spack and Ramble built-ins. Intent: getting things working.

Once working, the goal is to upstream package.py to Spack (and remove from Benchpark once available in Spack), and application.py to Ramble.

spack concretize gets confused by one meta package providing two (or more) packages

fujitsu-ssl2 includes many features, among them blas, lapack, and fft functions

benchpark/configs/RCCS-Fugaku-Fujitsu-A64FX-TofuD/auxiliary_software_files/packages.yaml

Lines 35 to 37 in b3e09e6

	fujitsu-ssl2:
	externals:
	- spec: "[email protected]%[email protected] arch=linux-rhel8-a64fx"

hence, i specified it to provide all of them

benchpark/configs/RCCS-Fugaku-Fujitsu-A64FX-TofuD/spack.yaml

Lines 20 to 25 in b3e09e6

 blas: 

 spack_spec: fujitsu-ssl2@{default_fj_version} arch=linux-rhel8-a64fx 

 lapack: 

 spack_spec: fujitsu-ssl2@{default_fj_version} arch=linux-rhel8-a64fx 

 fftw: 

 spack_spec: fujitsu-ssl2@{default_fj_version} arch=linux-rhel8-a64fx

gromacs needs lapack and fft:

benchpark/experiments/gromacs/openmp/ramble.yaml

Lines 54 to 58 in b3e09e6

 gromacs: 

 packages: 

 - lapack 

 - default-mpi 

 - fftw

this situaltion cases benchpark (or ramble) to create the following

spack:
  concretizer:
    unify: true
  specs:
  - [email protected] arch=linux-rhel8-a64fx
  - [email protected]%[email protected] arch=linux-rhel8-a64fx
  - [email protected] arch=linux-rhel8-a64fx %[email protected]
  - gromacs@main +mpi+openmp~hwloc %[email protected]
  include:
  - /vol0005/mdt3/data/ra000020/u10016/benchpark.gnu/test.gnu/gromacs/openmp/RCCS-Fugaku-Fujitsu-A64FX-TofuD/workspace/software/gromacs.water_gmx50_adac/compilers.yaml
  - /vol0005/mdt3/data/ra000020/u10016/benchpark.gnu/test.gnu/gromacs/openmp/RCCS-Fugaku-Fujitsu-A64FX-TofuD/workspace/software/gromacs.water_gmx50_adac/packages.yaml

as you can see, fujitsu-ssl2 is listed twice, and oddly enough, it is listed 1x without compiler and 1x with compiler which then leads to the following error with spack's concretize command

==> Concretizing Spack environment
==> 
==> *******************************************
==> ********** Running Spack Command **********
==> **     command: /vol0005/mdt3/data/ra000020/u10016/benchpark.gnu/test.gnu/spack/bin/spack concretize
==> **     with args: ['-U', '-f']
==> *******************************************
==> 
==> Error: concretization failed for the following reasons:

   1. cannot satisfy a requirement for package 'fujitsu-ssl2'.. You could consider setting `concretizer:unify` to `when_possible` or `false` to allow multiple versions of some packages.
==> Error: Command exited with status 1:
    '/vol0005/mdt3/data/ra000020/u10016/benchpark.gnu/test.gnu/spack/bin/spack' 'concretize' '-U' '-f'

i'm not sure which tool is incorrect in this case, but i'm sure someone with deeper benchpark and spack knowledge with figure it out. for now i uncomment the fft support for ssl2 as local workaround

benchpark/configs/RCCS-Fugaku-Fujitsu-A64FX-TofuD/spack.yaml

Lines 24 to 25 in b3e09e6

 fftw: 

 spack_spec: fujitsu-ssl2@{default_fj_version} arch=linux-rhel8-a64fx

Compilers become external packages in Spack

Need to move compilers.yaml to packages.yaml
Need to understand when Ramble will shift to this as well, Spack expected to shift by November 2024.
Compiler-runtimes will show up as dependencies in the dependence graph.
Evaluate impact of conflict: and prefer: syntax changes
Binary compatibility on linux is now based on the libc version - how is this checked?

Hi all,

Spack v0.22 is out with some long-awaited changes to compiler handling, and many other features!

We are in the process of making compilers proper dependencies in Spack, and a number of changes in v0.22 support that effort. You may notice nodes in your dependency graphs for compiler runtime libraries like gcc-runtime or libgfortran, and you may notice that Spack graphs now include libc. We've also begun moving compiler configuration from compilers.yaml to packages.yaml to make it consistent with other externals. We are trying to do this with the least disruption possible, so your existing compilers.yaml files should still work. We expect to be done with this transition by the v0.23 release in November.

Highlights from the release:
Compiler runtime dependencies
Packages compiled with %gcc on Linux, macOS and FreeBSD now depend on a new package, gcc-runtime.
Packages compiled with %oneapi now depend on a new package intel-oneapi-runtime.
Changes to the optimization criteria of the solver improve the hit-rate of buildcaches by a fair amount.
Spack will reuse specs built with compilers that are not explicitly configured in compilers.yaml.
Binary compatibility on linux is now based on the libc version, instead of on the os tag.
Each package that can provide a compiler is now detectable using spack external find. External packages defining compiler paths are effectively used as compilers, and spack external find -t compiler can be used as a substitute for spack compiler find.
Improved spack find UI for Environments
Improved command-line string quoting
Revert default spack install behavior to --reuse
More control over reused specs
New redistribute() directive
New conflict: and prefer: syntax for package preferences
include_concrete in environments: ou may want to build on the concrete contents of another environment without changing that environment. You can now include the concrete specs from another environment's spack.lock with include_concrete.
Improved Python isolation through the use of a new python-venv package that we shim in front of spack-built and external python.
More details in the release notes here:
https://github.com/spack/spack/releases/tag/v0.22.0
-Todd

saxpy binary not copied to experiment directory

sucessful cmd: ./bin/benchpark setup saxpy/openmp RCCS-Fugaku-A64FX-TofuD test
sucessful cmd: ramble -P -D /vol0005/mdt3/data/ra000020/u10016/benchpark/test/saxpy/openmp/RCCS-Fugaku-A64FX-TofuD/workspace on

one log from ramble

==> *******************************************
==> ***** Finished Running Spack Command ******
==> *******************************************
==> 
==>   Executing phase make_experiments
==> Writing template execute_experiment to /vol0005/mdt3/data/ra000020/u10016/benchpark/test/saxpy/openmp/RCCS-Fugaku-A64FX-TofuD/workspace/experiments/saxpy/problem/saxpy_512_1_8_2/execute_experiment
==>   Executing phase write_status
==>   Executing phase write_inventory
==> Phase timing statistics:
==>   get_inputs time: 2e-05 (s)
==>   license_includes time: 0.00835 (s)
==>   software_create_env time: 9.8721 (s)
==>   software_install_requested_compilers time: 5.75531 (s)
==>   software_configure time: 151.70743 (s)
==>   software_install time: 1601.32282 (s)
==>   evaluate_requirements time: 1e-05 (s)
==>   define_package_paths time: 54.19031 (s)
==>   make_experiments time: 0.03606 (s)
==>   write_status time: 0.01214 (s)
==>   write_inventory time: 0.01052 (s)

binary exists:

$ find ./test -type f -name saxpy
./test/spack/opt/spack/linux-rhel8-a64fx/fj-4.8.1/saxpy-1.0.0-n3oc5yv7sfljri2p7zif2gz5vxjljxut/bin/saxpy

binary missing from experiment directory

$ ll /vol0005/mdt3/data/ra000020/u10016/benchpark/test/saxpy/openmp/RCCS-Fugaku-A64FX-TofuD/workspace/experiments/saxpy/problem/saxpy_512_1_8_2/                   
total 12
-rwxrwxr-x 1 u10016 ra010011 1257 Feb 23 07:49 execute_experiment
-rw-r--r-- 1 u10016 ra010011  139 Feb 23 07:49 ramble_inventory.json
-rw-r--r-- 1 u10016 ra010011   34 Feb 23 07:49 ramble_status.json
-rw-r--r-- 1 u10016 ra010011    0 Feb 23 08:03 saxpy_512_1_8_2.out

benchpark setup picks up .swp files from the machine config dir and crashes

Vim swap files from the machine config directory seemed to get picked up and passed to ramble which throws a utf-8 decoding error shown below.

Steps to reproduce below,compilers.yaml was opened in vim before running benchpark setup.

$benchpark setup gromacs/openmp LUMI-C-HPECray-zen3-MI250X-Slingshot ./
[...]

$. ./setup.sh

$ ramble -d -P -D gromacs/openmp/LUMI-C-HPECray-zen3-MI250X-Slingshot/workspace workspace setup
==> [2024-02-22-16:36:41.239885] In workspace init. Root = gromacs/openmp/LUMI-C-HPECray-zen3-MI250X-Slingshot/workspace
==> [2024-02-22-16:36:41.246624] '/usr/bin/git' '-C' '/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble' 'rev-parse' 'HEAD'
==> [2024-02-22-16:36:41.254788] '/usr/bin/git' '-C' '/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble' 'rev-parse' 'HEAD'
==> [2024-02-22-16:36:41.262421] '/usr/bin/git' '-C' '/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/spack' 'rev-parse' 'HEAD'
/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_4/gromacs/openmp/LUMI-C-HPECray-zen3-MI250X-Slingshot/workspace/configs/auxiliary_software_files/packages.yaml
/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_4/gromacs/openmp/LUMI-C-HPECray-zen3-MI250X-Slingshot/workspace/configs/auxiliary_software_files/modules.yaml
/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_4/gromacs/openmp/LUMI-C-HPECray-zen3-MI250X-Slingshot/workspace/configs/auxiliary_software_files/.compilers.yaml.swp
Traceback (most recent call last):
  File "/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble/bin/ramble", line 70, in <module>
    sys.exit(ramble.main.main())
  File "/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble/lib/ramble/ramble/main.py", line 909, in main
    return _main(argv)
  File "/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble/lib/ramble/ramble/main.py", line 810, in _main
    ws = ramble.cmd.find_workspace(args)
  File "/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble/lib/ramble/ramble/cmd/__init__.py", line 280, in find_workspace
    return ramble.workspace.Workspace(ws)
  File "/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble/lib/ramble/ramble/workspace/workspace.py", line 522, in __init__
    self._read()
  File "/pfs/lustrep3/scratch/project_465000810/szpall/benchpark_tests_2/ramble/lib/ramble/ramble/workspace/workspace.py", line 581, in _read
    self._read_auxiliary_software_file(filename, f.read())
  File "/opt/cray/pe/python/3.10.10/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 16: invalid continuation byte

issue with open files

running bin/benchpark setup while packages.yaml is opened in vim results in problems for ramble -P -D ...

==> Error: 'utf-8' codec can't decode byte 0xb3 in position 17: invalid start byte

It seems that some vim temp files are used in the background. When I close the file and run ramble -P -D ... again i get a new error pointing to some .swp file from vim:

==> Error: [Errno 2] No such file or directory: '/vol0005/mdt3/data/ra000020/u10016/benchpark/test.gnu/amg2023/openmp/RCCS-Fugaku-Fujitsu-A64FX-TofuD/workspace/configs/auxiliary_software_files/.packages.yaml.swp'

Set up dry-run tests for experiments in benchpark

This is how this works in Ramble:
https://googlecloudplatform.github.io/ramble/tutorials/2_running_a_simple_gromacs_experiment.html#setting-up-the-experiments
ramble repo add <benchpark_app_repo>
ramble unit-test -k known_applications

Near-term: perform the ramble unit-test in a benchpark workspace, with a list of applications (perhaps the x86 versions to start)
Mid-term: create a list of applications (perhaps the x86 versions to start) and set them up to run in CI
Longer-term: dry-run test when a new experiment is contributed via a PR

ncurses build issue with suspect python version

From the build log:

2 errors found in build log:
     799    -- Looking for nodelay in /dev/shm/bp/workspace/spack/opt/spack/linux-rhel8-icelake/gcc-10.3.1/ncurses-6.4-vaxrsua5gtx2q4ro7m4gndjdf3edy2vf/lib/libncurses.so - not found
     800    -- Found Curses: /dev/shm/bp/workspace/spack/opt/spack/linux-rhel8-icelake/gcc-10.3.1/ncurses-6.4-vaxrsua5gtx2q4ro7m4gndjdf3edy2vf/lib/libncurses.so
     801    -- Looking for use_default_colors
     802    -- Looking for use_default_colors - found
     803    -- Looking for a Fortran compiler
     804    -- Looking for a Fortran compiler - /dev/shm/bp/workspace/spack/lib/spack/env/gcc/gfortran
  >> 805    CMake Error at /usr/apps/python-3.11.5/lib/cmake/Qt5Core/Qt5CoreConfig.cmake:14 (message):
     806      The imported target "Qt5::Core" references the file
     807
     808         "/usr/apps/python-3.11.5/bin/qmake"
     809
     810      but this file does not exist.  Possible reasons include:
     811

     ...

     826      /usr/apps/python-3.11.5/lib/cmake/Qt5Widgets/Qt5WidgetsConfig.cmake:100 (find_package)
     827      Tests/CMakeLists.txt:276 (find_package)
     828
     829
     830    -- Configuring incomplete, errors occurred!
     831    ---------------------------------------------
  >> 832    Error when bootstrapping CMake:
     833    Problem while running initial CMake
     834

Here's the reproducer script. Will update with results from an older version of python. Assuming this is a local issue for now.

#!/bin/bash
export TAG=`date +"%F_%T"`

# Where are we?
export MACHINE=poodle

# Set up directory structure
export TOPDIR=/dev/shm/bp
export TMPDIR=/dev/shm/bp/tmp
export RESULTS_DIR=${HOME}/w/${MACHINE}/bp/results

echo "Starting..." `date`                                                       2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
echo "[bp_test.sh] MACHINE="${MACHINE}                                          2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
echo "[bp_test.sh] TOPDIR="${TOPDIR}                                            2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
echo "[bp_test.sh] RESULTS_DIR="${RESULTS_DIR}                                  2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}

mkdir -p ${TOPDIR}
mkdir -p ${TMPDIR}
mkdir -p ${RESULTS_DIR}

# Modify path
export PATH=${HOME}/.local/bin:${PATH}
export PATH=${TOPDIR}/benchpark/bin:${PATH}
echo "[bp_test.sh] PATH="${PATH}                                                2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}

# Set up software environment
echo "[bp_test.sh] module load python/3.11.5"                                   2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
#module load python/3.11.5                                                       2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
module load python/3.10.8

# Set up benchpark parameters
export BPSITE=nosite-x86_64
export BPEXPR=saxpy/openmp
export WORKSPACE_DIR=${TOPDIR}/workspace
echo "[bp_test.sh] BPSITE="${BPEXPR}                                            2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
echo "[bp_test.sh] BPEXPR="${BPEXPR}                                            2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
echo "[bp_test.sh] WORKSPACE_DIR="${WORKSPACE_DIR}                              2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
mkdir -p ${WORKSPACE_DIR}

# spack rituals
export SPACK_DISABLE_LOCAL_CONFIG=1
echo "[bp_test.sh] SPACK_DISABLE_LOCAL_CONFIG="${SPACK_DISABLE_LOCAL_CONFIG}    2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}

# and we're off....
cd ${TOPDIR}
git clone [email protected]:LLNL/benchpark.git                                     2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
cd ./benchpark
pip install -r requirements.txt                                                 2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}

benchpark setup ${BPEXPR} ${BPSITE} ${WORKSPACE_DIR}                            2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
. ${WORKSPACE_DIR}/setup.sh                                                     2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
ramble -P -D ${WORKSPACE_DIR}/${BPEXPR}/${BPSITE}/workspace workspace setup     2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
#ramble -P -D ${WORKSPACE_DIR}/${BPEXPR}/${BPSITE}/workspace on
echo "Completed" `date`                                                         2>>${RESULTS_DIR}/bp_stderr_${TAG} 1>>${RESULTS_DIR}/bp_stdout_${TAG}
cd ${RESULTS_DIR}

tqdm required by ramble

==> Error: Module tqdm is not found. Ensure requirements.txt are installed.

Should we be automatically be installing Ramble's package requirements as part of benchpark's setup or let the user get to the setup step of benchpark before hitting this error?

benchpark steps

All steps within the benchpark process should ultimately function interactively, in batch mode, under CI, and in the cloud. First, lets iron out the steps for a human user.

Checkout benchpark and needed infrastructure

benchpark setup [--spack=spack_dir] [--ramble=ramble_dir]

Should do everything in step 1-6 in current "running the benchmark"
once/infrequently
interactive/on login node
if point at system spack, what version is it? will this cause compatibility issues, or lack latest package descriptions?
Probably want to be able to pull in benchpark core updates, and corresponding spack and ramble versions. Not clear about pulling in changes to the config files which may conflict with local ones - but may also be necessary....
Errors: invalid directories specified, unable to clone -- all should be displayed in a shell interactively. Need to capture in a file if not running interactively (how to tell the difference?)
Re-entrant?

Browse benchpark to see available options for benchmark/programming_model, system, experiments, modifiers

benchpark list system, benchpark list benchmark, benchpark list modifiers

interactive/on login node
Error: interactive only (will not do this step under CI)
Re-entrant: yes, should have no side effects

(opt) Turn on modifiers, make any other changes ((We need to talk about the policy for what we want committed back - vs. what is a local change, and should these be in the same files?))

edit ramble.yaml, anything else? If it is possible to add a command to do this, this could be merged with step 3.
interactive/on login node
Error?
Re-entrant? I think so, only the user is making changes here

Select an available experiment, set up the workspace, build, and generate run scripts.

benchpark select_experiment --benchmark=benchmark_name/programming_model --system=system_name --workspace=workspace_dir

the user can keep coming back to select different experiments to run
if there are multiple workloads, how and in which step to select one?

ramble -D . workspace setup

Option to run on interactive/on login node or submit to batch
Error? From Spack etc. To fix, would edit the workspace/configs, until fixed. Then, would need to back port those changes to benchpark, which may or may not be straightforward (depending if benchpark combines ramble.yaml and other files before passing them to ramble, the mapping may be tricky to figure out).
Re-entrant: Yes, Ramble will overwrite previously generated files.

ramble -D . on

usually submits to batch
Error: scheduler (interactive?), launcher (file), or job error (file): Need to make easy for user to find
Re-entrant: Yes, but may overwrite the previous run or be confusing which of the files are from which run.
Are there "clean previous run" type calls the user can/should make before re-running?

analyze results

ramble workspace analyze

interactive or in batch
Error?
Re-entrant?

archive results

ramble workspace archive [-t] [--upload-url <mirror_url>]

-t creates a tarball
interactive or in batch
Error?
Re-entrant? probably but might create duplicates or overwrite

Please comment on individual steps.

streamc not building

$ ramble -P -D $(readlink -f $(pwd)/workspace/${BM}/${SYS}/workspace) workspace setup
==> Warning: The following config sections are deprecated and ignored:
==> Warning:     spack:concretized
==> Warning: Please remove from your configuration files.
==> Streaming details to log:
==>   /vol0005/mdt3/data/ra000020/u10016/benchpark.llvm/workspace/streamc/openmp/RCCS-Fugaku-Fujitsu-A64FX-TofuD/workspace/logs/setup.2024-05-02_11.34.27.out
==>   Setting up 6 out of 6 experiments:
==> Experiment #1 (1/6):
==>     name: streamc.streamc.stream_80000000_20_8
==>     root experiment_index: 1
==>     log file: /vol0005/mdt3/data/ra000020/u10016/benchpark.llvm/workspace/streamc/openmp/RCCS-Fugaku-Fujitsu-A64FX-TofuD/workspace/logs/setup.2024-05-02_11.34.27/streamc.streamc.stream_80000000_20_8.out
==> Error: Software spec stream is not defined in environment stream_80000000_20, but is required by the streamc application definition

Need to document spack and ramble version pinning

This can come up when adding or modifying a benchmark - and the package.py is already in Spack, but not yet in Benchpark.

how to find the versions we are on
potential side effects (e.g., version compatibility issues (e.g., package in tip of develop vs. our pinned version))
what might happen if people manually get the tip of develop (and how they could do that if they really needed to)

Need a mechanism to specify maintainers for systems and experiments

Ideally, we should have a maintainer for every system config (e.g., nosite-x86_64) and experiment (e.g., amg2023/cuda). The goals are the same as for Spack maintainers. I did not find a page on maintainers - but Spack's contribution guide is here:

https://spack.readthedocs.io/en/latest/contribution_guide.html

Automated saxpy/openmp CI on x86 in the cloud

Add documentation on spack modules

Where should this live?

A sub-bullet under "Adding a Specific System configuration"
FAQ: I would like to use external dependencies ... Adding a Specific System could point here.

Explain that if installed software does not have correct rpaths (e.g., it was installed by EasyBuild), it needs to be added as a spack external module, not just a spack external.

Point at Spack modules documentation.

OpenMPI build fails on finding xz and openssl source

variable expansion in spack.yaml not working properly

using configs/RCCS-Fugaku-Fujitsu-A64FX-TofuD/spack.yaml:

spack:
  packages:
    default-compiler:
      spack_spec: {default_comp} {sys_arch}
    default-mpi:
      spack_spec: fujitsu-mpi@{fj_comp_version}%{default_comp} {sys_arch}

default_comp is defined as '[email protected]'

it worked for spack_spec: fujitsu-mpi@{fj_comp_version}%{default_comp} {sys_arch}

but for spack_spec: {default_comp} {sys_arch} i'm getting the following error during ramble ... setup

==> Error: Error parsing yaml  in "/vol0005/mdt3/data/ra000020/u10016/benchpark.llvm/configs/RCCS-Fugaku-Fujitsu-A64FX-TofuD/spack.yaml", line 9, column 7: expected <block end>, but found '{'

Make System Configs table searchable

Make the list of systems searchable; consider using IO500 website < https://io500.org/submissions/customize/sc23/ten-production>_ as inspiration.

Add documentation on using Thicket to analyze Caliper data

saxpy not building anymore (after ramble update?)

$ ramble -P -D workspace/saxpy/openmp/RCCS-Fugaku-Fujitsu-A64FX-TofuD/workspace workspace setup
==> Error: Object 'saxpy' not found in repository '/vol0005/mdt3/data/ra000020/u10016/benchpark.llvm/workspace/ramble/var/ramble/repos/builtin'

Ramble: commit bb664f142b2cbdb2b2ea39e70a8535c9f27c1179
Benchpark: commit ea8293e

Need unit tests for python functionality

Multi-level system config

System configs currently contain different types of information, which serves different purposes:

Hardware specification

where defined: system_definition.yaml
systems it applies to: a class of systems at different sites
longevity: duration of the system (or class of systems) lifetime
purpose of record: find a system with the same hardware as my system. May want to record with the experiment.

Software stack: compiler and MPI locations

where defined: Optional?!? compilers.yaml
systems it applies to: just ours? can we autodetect?
longevity: ?
purpose of record: give the users a starting point to running on their system. What errors and guidance for mitigation should we give? Do we want these upstreamed back? Do we want these recorded in the experiment?

Software stack: compiler and MPI versions

where defined: compilers.yaml
systems it applies to: different machines could be at different versions
longevity: new versions can appear any time
purpose: give the users a starting point, also need to record as part of experiment - and use to debug or compare performance. Probably want to let users parameterize - and set up versions to use as part of their suites.

Scheduler, launcher:

where defined: variables.yaml
systems it applies to: many. Probably need a slurm and a flux schedule definition, auto generated for the user when they tell us which it is (can we autodetect?). Probably need to define a few launchers and pick one (mpirun, srun, ...)
longevity: static, except the queue info is baked in here unfortunately.
purpose: give the users a starting point. Probably don't want upstreamed, may not need to record.

Software packages we don't want to keep building

where defined: Optional! packages.yaml
systems it applies to: probably just ours. May be able to find using spack external.
longevity: yeah may want to update versions over time.
purpose: shorten build time. We do not want these upstreamed, but we want to be able to record for our own experiments/CI etc.

We should probably define a graded approach for generating these:

only introduce a new hardware specification if one like it indeed does not exist.
if hardware specification exists, pick scheduler&launcher, and how to start to define the compilers and MPI to use?
versions should be parameterized
optional things can be added later (if desired)

Benchpark diff

Given 2 runs of an experiment (same machine or different machines), what is different in:

systems (spec diff?)
build (ignore system software - compilers (+cmake etc.), mpi, math libraries
run parameters (ignore scheduler, launcher)
performance (we use caliper+thicket to diff these by establishing common call tree and parameters/metadata)

Spack diff notes:
spack find -l # returns package and its hash
ls spack location -I /hash_here/.spack/spec.json. # finds spec.json for the package
spack spec # outputs a list of dependencies, one per line
spack providers # outputs a list of "virtual packages": interfaces that can be provided by other packages,
# e.g., mpi, blas, lapack
# hip is amd's package name for rocm etc.
spack/spack#41711 # added ability to ignore dependencies in Spack diff

How to specify what to ignore? System sw (compilers, mpi, math libraries) but its a longer list:

./spack diff --ignore=gcc-runtime --ignore=hsa-rocr-dev --ignore=rocprim --ignore=rocrand --ignore=rocsparse --ignore=rocthrust --ignore=llvm-amdgpu --ignore=cmake --ignore=gmake --ignore=hip --ignore=blas --ignore=lapack --ignore=mpi /Users/pearce8/Documents/spack_diff/tioga-gtl-spec.json /Users/pearce8/Documents/spack_diff/magma-spec.json

produces:

amg2023 amdgpu_target gfx90a

amg2023 openmp True

amg2023 openmp False

amg2023 rocm False

amg2023 rocm True

hypre fortran True

hypre amdgpu_target gfx90a

hypre openmp True

hypre fortran False

hypre rocm False

hypre openmp False
hypre rocm True

Centralize text common to code and documentation (e.g., required python version numbers)

We'd like to have the benchpark binary be able to check the version of python in use against the minimum version, mention the minimum version in the documentation, and both tied to a common source so there's only a single file that needs to get updated.

rst has a .. literalinclude:: tag that should fit the bill. Perhaps have a file ground_truth/min_python_version.txt that is included via literalinclude in the docs and (with a bit more work) parsed by python in the benchpark script.

This avoids any CI silliness.

Add documentation on caliper modifier

Should be a sub-bullet under "Edit experiments".
Should reference Caliper - possibly the front page of Caliper docs.
Should reference Ramble modifiers page.
Should provide instructions on how to turn on the caliper modifier.

Exception handling

Raise an exception if benchpark setup received incorrect parameters (right now the code just returns).

Save job id for interaction with the scheduler

https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/job_submission.html#job-manipulation

Collaborators are doing similar things with Slurm.

Fails on macOS due to /dev/shm

When running on macOS, ramble workspace setup fails with

PermissionError: [Errno 1] Operation not permitted: '/dev/shm'

I assume this is becuase that directory doesn't exist on macOS.
I recall us discussing getting rid of the /dev/shm dependency, but I suppose this issue is to help track the status of this.

Thanks,
Nate

we need alternative location to ~/.spack

i ran two instances of ramble workspace setup and it broke ~/.spack/bootstrap/config/linux/config.yaml by adding a lonely 6 in the last line:

 57   debug: false
 58   build_jobs: 16
 59 6

consequently some new exec of workspace setup failed:

==> Error: error parsing YAML: near /home/u10016/.spack/bootstrap/config/linux/config.yaml, 58, 0: could not find expected ':'
    while scanning a simple key
  in "/home/u10016/.spack/bootstrap/config/linux/config.yaml", line 59, column 1
could not find expected ':'
  in "/home/u10016/.spack/bootstrap/config/linux/config.yaml", line 60, column 1

To fix that, benchpark shouldn't use ~/.spack but something local like .//.spack (or at least try to lock the files before writing to it, but that is less preferred because i would like to have a "clear" ~/.spack folder for other work, which isn't altered by benchpark).

	blas:
	spack_spec: fujitsu-ssl2@{default_fj_version} arch=linux-rhel8-a64fx
	lapack:
	spack_spec: fujitsu-ssl2@{default_fj_version} arch=linux-rhel8-a64fx
	fftw:
	spack_spec: fujitsu-ssl2@{default_fj_version} arch=linux-rhel8-a64fx