
embench-iot's Introduction

Embench Context Switch and Interrupt Benchmark

The work of this project has now been transferred to the embench-rt repository. Embench benchmarks are now named for the market sector they address, rather than the technology they are based on.

Although all files were removed from this repository in its final commit, the history is preserved both in this repository and (more usefully) in the embench-rt repository.

This repository is now archived.

embench-iot's People

Contributors

3rror, antonblanchard, claziss, davidharrishmc, dtowersm, edward-jones, eosea, hirooih, i-mikan-i, jeremybennett, jmonesti, juliankunkel, kenta2, kohnakagawa, lhtin, luismarques, lyellread, nidalf, olajep, olofk, paolos02, roger-shepherd, simonpcook, sobuch

embench-iot's Issues

Sizes of multiple sections are not easily reported by benchmark_size.py {improvement}

I have been running Embench on AVR with Microchip XC8.

The .elf file created by Microchip XC8 has multiple sections with the same name. Example objdump output from the attached .elf file:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         00000000  00800200  00800200  0000173a  2**0
                  ALLOC, LOAD, DATA
  1 .text         0000014c  00000000  00000000  00000074  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .text         00000004  000016b8  000016b8  0000172c  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  8 .note.gnu.avr.deviceinfo 00000040  00000000  00000000  00003424  2**2
                  CONTENTS, READONLY, DEBUGGING
  9 .text.mulul64 00000180  00000c06  00000c06  00000c7a  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .text.montmul 0000022a  000009dc  000009dc  00000a50  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

To accommodate all the executable sections I made the following changes to benchmark_size.py:

def section_relevant_for_size(section):
    # Count any allocated, executable, non-processor-specific section.
    flags = section['sh_flags']
    if (flags & SH_FLAGS.SHF_ALLOC) == 0:
        return False
    if (flags & SH_FLAGS.SHF_EXECINSTR) == 0:
        return False
    return (flags & SH_FLAGS.SHF_MASKPROC) == 0

def benchmark_size(bench):
        # ... inside the existing loop over ELF files:
        sec_size = 0
        for sec in elf.iter_sections():
            if section_relevant_for_size(sec):
                sec_size += sec['sh_size']

You can analyze the attached .elf file with pyelftools to verify the report.
aha-mont64.zip

This could probably be extended to handle all the metric formats correctly.

Tuning the benchmark repeat rate.

Hi there

Would it be possible to add a switch / knob somewhere in the configure script to tune the repeat rate for each benchmark?

Currently, the number of repetitions is a function of the global CPU_MHZ define and a local (to the benchmark) LOCAL_SCALE_FACTOR (magic?) number. It would be nice to be able to change the LOCAL_SCALE_FACTOR to something very small for testing, or for running simulations of the hardware, where sometimes just tens of repetitions suffice to get an idea of performance.

This would be very useful when analysing hardware under development, to really drill down into why performance bugs occur. Typically one wants access to waveform dumps, which quickly grow too big to handle; running only a few iterations keeps the waveform dumps small enough to use.

I'm happy putting together a patch for this myself, but figured I'd ask here first.

Cheers,
Ben

Benchmark speed reference values and division with CPU_MHZ

For computing benchmark speed, the documentation says:
"For each benchmark divide the time taken by the value used for CPU_MHZ in the configuration to give a normalized time value."

For the computed reference values, it is mentioned that the STM32's default of 16 MHz is used, but neither config/arm/boards/stm32f4-discovery/boardsupport.c nor any other script indicates whether the benchmark time value is divided by CPU_MHZ.

I am running the speed benchmark on a Microchip SAM Cortex-M0+ device at 48 MHz, and based on the speed values I have obtained, I don't think the STM32 reference values have been divided by CPU_MHZ.

Can you please confirm the value of CPU_MHZ used for the reference, and whether the reference values are divided by CPU_MHZ?

If the reference values are not divided by CPU_MHZ, it will cause confusion when computing a relative benchmark score at a different clock frequency.

Could the process and the reference values for speed be documented for future use?
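For reference, the normalization the documentation describes works out as follows (a minimal sketch; the variable names are illustrative, not taken from the Embench scripts):

# Sketch of the documented normalization; names are illustrative only.
CPU_MHZ = 48                          # clock used in the configuration
raw_time_ms = 4800.0                  # measured benchmark time
normalized = raw_time_ms / CPU_MHZ    # frequency-independent value: 100.0

If the reference values skipped this division, scores measured at 48 MHz and at 16 MHz would not be comparable, which is exactly the confusion described above.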

Commercially friendly license

I see currently Embench is licensed under GPLv3.

For Embench to be widely adopted, could we consider switching to a commercially friendly license such as BSD or MIT?

gcc specific attributes are used in the benchmarks

The benchmark code uses gcc-specific attributes to mark parameters as unused and functions as not to be inlined. Using gcc-specific attributes makes the benchmark harder to port to non-gcc compilers, as was seen when building the benchmarks with the IAR compiler. The IAR compiler has a way to enable gcc extensions, but it would be nice if the benchmarks built with any standard C compiler.

Issue Running Embench

I tried to compile the benchmarks using the following command:

python3 build_all.py --arch arm --chip cortex-m4 --board generic

I am running into the errors below:
Warning: Compilation of beebsc.c from source directory /Users/user/embench-iot/support to binary directory /Users/user/embench-iot/bd/support failed
Warning: Compilation of main.c from source directory /Users/user/embench-iot/support to binary directory /Users/user/embench-iot/bd/support failed
Warning: Compilation of chipsupport.c from source directory /Users/user/embench-iot/config/arm/chips/cortex-m4 to binary directory /Users/user/embench-iot/bd/config/arm/chips/cortex-m4 failed
Warning: Compilation of boardsupport.c from source directory /Users/user/embench-iot/config/arm/boards/generic to binary directory /Users/user/embench-iot/bd/config/arm/boards/generic failed
Warning: Compilation of mont64.c from source directory /Users/user/embench-iot/src/aha-mont64 to binary directory /Users/user/embench-iot/bd/src/aha-mont64 failed
Warning: Compilation of crc_32.c from source directory /Users/user/embench-iot/src/crc32 to binary directory /Users/user/embench-iot/bd/src/crc32 failed
Warning: Compilation of libcubic.c from source directory /Users/user/embench-iot/src/cubic to binary directory /Users/user/embench-iot/bd/src/cubic failed
Warning: Compilation of basicmath_small.c from source directory /Users/user/embench-iot/src/cubic to binary directory /Users/user/embench-iot/bd/src/cubic failed
Warning: Compilation of libedn.c from source directory /Users/user/embench-iot/src/edn to binary directory /Users/user/embench-iot/bd/src/edn failed
Warning: Compilation of libhuffbench.c from source directory /Users/user/embench-iot/src/huffbench to binary directory /Users/user/embench-iot/bd/src/huffbench failed
Warning: Compilation of matmult-int.c from source directory /Users/user/embench-iot/src/matmult-int to binary directory /Users/user/embench-iot/bd/src/matmult-int failed
Warning: Compilation of libminver.c from source directory /Users/user/embench-iot/src/minver to binary directory /Users/user/embench-iot/bd/src/minver failed
Warning: Compilation of nbody.c from source directory /Users/user/embench-iot/src/nbody to binary directory /Users/user/embench-iot/bd/src/nbody failed
Warning: Compilation of nettle-aes.c from source directory /Users/user/embench-iot/src/nettle-aes to binary directory /Users/user/embench-iot/bd/src/nettle-aes failed
Warning: Compilation of nettle-sha256.c from source directory /Users/user/embench-iot/src/nettle-sha256 to binary directory /Users/user/embench-iot/bd/src/nettle-sha256 failed
Warning: Compilation of libnsichneu.c from source directory /Users/user/embench-iot/src/nsichneu to binary directory /Users/user/embench-iot/bd/src/nsichneu failed
Warning: Compilation of picojpeg_test.c from source directory /Users/user/embench-iot/src/picojpeg to binary directory /Users/user/embench-iot/bd/src/picojpeg failed
Warning: Compilation of libpicojpeg.c from source directory /Users/user/embench-iot/src/picojpeg to binary directory /Users/user/embench-iot/bd/src/picojpeg failed
Warning: Compilation of qrtest.c from source directory /Users/user/embench-iot/src/qrduino to binary directory /Users/user/embench-iot/bd/src/qrduino failed
Warning: Compilation of qrencode.c from source directory /Users/user/embench-iot/src/qrduino to binary directory /Users/user/embench-iot/bd/src/qrduino failed
Warning: Compilation of qrframe.c from source directory /Users/user/embench-iot/src/qrduino to binary directory /Users/user/embench-iot/bd/src/qrduino failed
Warning: Compilation of combined.c from source directory /Users/user/embench-iot/src/sglib-combined to binary directory /Users/user/embench-iot/bd/src/sglib-combined failed
Warning: Compilation of libslre.c from source directory /Users/user/embench-iot/src/slre to binary directory /Users/user/embench-iot/bd/src/slre failed
Warning: Compilation of libst.c from source directory /Users/user/embench-iot/src/st to binary directory /Users/user/embench-iot/bd/src/st failed
Warning: Compilation of libstatemate.c from source directory /Users/user/embench-iot/src/statemate to binary directory /Users/user/embench-iot/bd/src/statemate failed
Warning: Compilation of libud.c from source directory /Users/user/embench-iot/src/ud to binary directory /Users/user/embench-iot/bd/src/ud failed
Warning: Compilation of libwikisort.c from source directory /Users/user/embench-iot/src/wikisort to binary directory /Users/user/embench-iot/bd/src/wikisort failed

Any idea on how to resolve this?

Thanks,
George O.

Parameter value priority order

This concerns two points about parameter handling that I'm having a little trouble understanding.

  1. In config/native/chips/size-test-gcc it says

# Parameter values which are duplicated in architecture, board, chip or
# command line are used in the following order of priority
# - default value
# - architecture specific value
# - chip specific value
# - board specific value
# - command line value

I assume this list is in order of increasing priority - that is, the command line overrides everything else (see the sketch after this list). I expected the list to be in descending order of priority (highest priority first); unless the order used is an absolutely standard industry convention, and I'm just showing my lack of knowledge, perhaps the intro should say

# Parameter values which are duplicated in architecture, board, chip or
# command line are used in the following order of priority (lowest first)

  2. I can't see how to pass flags through on the command line. Perhaps the mechanism, or lack thereof, could be documented.
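If it helps, the behaviour described in point 1 can be pictured as successive dictionary updates, where each later (higher-priority) layer overwrites the earlier ones. This is only a sketch; the variable names are illustrative, not the actual ones used by the scripts:

params = {}
params.update(default_values)        # lowest priority
params.update(arch_values)
params.update(chip_values)
params.update(board_values)
params.update(command_line_values)   # highest priority: overrides everything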

wikisort benchmark uses gcc extensions

The wikisort benchmark uses the typeof gcc extension, which makes the benchmark fail on non-gcc compilers such as the IAR compiler.

One possible fix is to add the type as an argument to the Var and Swap macros instead of using typeof inside them. typeof is also used in the benchmark_body() function, where it can be replaced by writing out the full type.

Inconsistent Speed Results on Riscv CPU

Hi,

I am not sure if I am doing anything wrong here, but on my platform the speed benchmark results are fairly inconsistent. I am using a VexRiscv core on a Genesys2 FPGA running Linux (125 MHz).

I set up the boardsupport file as follows (the FENCE instruction is not necessary; it was just an attempt to make the results more consistent):

Boardsupport.c

#include <support.h>
#include <boardsupport.h>

void
initialise_board ()
{
  __asm__ volatile ("li a0, 0" : : : "memory");
}

void __attribute__ ((noinline)) __attribute__ ((externally_visible))
start_trigger ()
{
  __asm__ volatile ("FENCE\n\t"
                    "RDTIME %[lo]\n\t"
                    "RDTIMEH %[hi]\n\t"
                    "FENCE\n\t" : [lo]"=r"(lo), [hi]"=r"(hi) : :);
  t_start = (((unsigned long long)hi) << 32) | lo;
}

void __attribute__ ((noinline)) __attribute__ ((externally_visible))
stop_trigger ()
{
  __asm__ volatile ("FENCE\n\t"
                    "RDTIME %[lo]\n\t"
                    "RDTIMEH %[hi]\n\t" : [lo]"=r"(lo), [hi]"=r"(hi) : :);
  t_stop = (((unsigned long long)hi) << 32) | lo;
  unsigned long long normalized = (t_stop - t_start) / CPU_MHZ;
  printf("%llu;\n", normalized);
}

Boardsupport.h

#include <stdio.h>
#define CPU_MHZ 125

unsigned long hi = 0, lo = 0;
unsigned long long t_start = 0;
unsigned long long t_stop = 0;

The chipsupport files are untouched, chip.cfg looks like this:

cflags = [
    '-c',  '-O2', '-march=rv32im', '-mabi=ilp32',
    '-fdata-sections', '-ffunction-sections'
]
ldflags = [
    '-O2', '-march=rv32im', '-mabi=ilp32', '-Wl,-gc-sections'
]
user_libs = ['-lm']

For compilation I used ./build_all.py --arch=riscv32 --cc=riscv32-unknown-linux-gnu-gcc --cflags="-O2 -ffunction-sections" --cpu-mhz=125 --chip=vex --board=genesys2 --logdir=log --builddir=bd2.

The results I am getting vary dramatically and I am not sure why (e.g. the results of crc32 vary between 11472 and 900 cycles). The results of CoreMark and Dhrystone are stable. Is this a bug, or am I doing something wrong?

Thanks for your help!

Python's subprocess.run does not find crosscompilation toolchain

Hello,

I am trying to compile and run the benchmarks against our RISC-V based SoC. On the way there I stumbled upon Python's subprocess.run (present in build_all.py). After populating the list with the arguments, it does not find my installed cross-compilation toolchain. I tried full paths, but nothing seemed to work. When I passed shell=True it found the toolchain, but then, as I found out, it needs the arguments as one big string rather than a list (so a lot more changes to be done). Has anybody else faced this issue? My Python version is 2.7.15. Thank you in advance!
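For what it's worth, subprocess.run was only added in Python 3.5, so it does not exist at all in Python 2.7.15. The list-versus-string distinction works like this (a minimal sketch; the toolchain name is just an example):

import subprocess

# With a list (the default shell=False), each element is one argument and
# the first element is the program to run; PATH is searched as usual.
subprocess.run(['riscv32-unknown-elf-gcc', '--version'])

# With shell=True the command must instead be a single string for the shell.
subprocess.run('riscv32-unknown-elf-gcc --version', shell=True)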

User guide enhancement suggestion

I was caught out by the naming convention for Python modules. For a speed benchmark the documentation says:

--target-module : This mandatory argument specifies a python module in the pylib directory with definitions of routines to run the benchmark.

I naïvely assumed that "specifies" meant the name of the Python file (i.e. xyz.py); it doesn't - it means the name of the module (i.e. xyz). The documentation might make this explicit for those unfamiliar with Python conventions.
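A sketch of the convention, assuming the scripts load the module with a normal Python import (the module name run_gdb here is hypothetical, standing in for a real file pylib/run_gdb.py):

import importlib

# Pass the module name exactly as you would write it in an import statement:
mod = importlib.import_module('run_gdb')     # refers to pylib/run_gdb.py
# importlib.import_module('run_gdb.py')      # wrong: '.py' is not part of the name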

Misleading code size reporting when -ffunction-sections option is used

I have noticed that the code size reported by benchmark_size.py is a bit misleading when the -ffunction-sections compilation option is used. This is because the text section is split into multiple sections named '.text' suffixed by the function name, for example .text.fred, but the script only looks for exact name matches when determining which sections to count. I think the script needs to be modified to scan all section names and include any that match the specified prefix (in this example .text).
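A minimal sketch of the suggested change, assuming the pyelftools section objects that benchmark_size.py already iterates over:

def section_matches(section, name):
    # Count '.text' itself, plus '-ffunction-sections' style names like '.text.fred'.
    return section.name == name or section.name.startswith(name + '.')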

Specify target architecture

I've tried to compile a riscv32 config, but when using --host=riscv64-unknown-elf as a parameter to configure, it expects a riscv64 config. With the current RISC-V GCC toolchain it seems more common to select a 32-bit architecture with -mabi and -march than to build a separate toolchain for 32-bit support.

Is it possible to compile a riscv32 config without putting it in a riscv64 folder?
I've tried using the --target parameter for configure, but it still failed to find my config.

Only last argument logged after benchmark fails or times out

Code starting at https://github.com/embench/embench-iot/blob/master/benchmark_speed.py#L204:

    if succeeded:
        return exec_time
    else:
        for arg in arglist:
            if arg == arglist[0]:
                comm = arg
            elif arg == '-ex':
                comm = ' ' + arg
            else:
                comm = " '" + arg + "'"

        log.debug('Args to subprocess:')
        log.debug(f'{comm}')

This results in logging only the last argument passed to subprocess instead of all of them. I suppose it was intended to be like this (the only difference is the use of '+='):

    if succeeded:
        return exec_time
    else:
        for arg in arglist:
            if arg == arglist[0]:
                comm = arg
            elif arg == '-ex':
                comm += ' ' + arg
            else:
                comm += " '" + arg + "'"

        log.debug('Args to subprocess:')
        log.debug(f'{comm}')

No such file error if 'results' directory doesn't exist when running run_all.py

If the results/ directory already exists then the run_all.py command creates the result file, but if results/ doesn't exist then the directory is not created and you get a "No such file or directory: 'results/***.json'" error when it tries to open the file to output results. This differs from the behaviour with logs: the log/ directory always seems to be created before a log file is created in it.

This could potentially trip up new users, since the error message doesn't accurately describe the underlying problem.
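A one-line fix sketch, assuming the script knows the results directory path before opening the output file:

import os

# Create results/ (and any missing parents); do nothing if it already exists.
os.makedirs('results', exist_ok=True)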

How to reproduce embench for riscv32

Hi there, I found slides here talking about riscv embench results.

These slides say that RISC-V code size is close to Arm's.

I tried to run it using the command below:

./build_all.py --arch riscv32 --chip generic --board ri5cyverilator --cc riscv-nuclei-elf-gcc --cflags="-c -march=rv32imc -mabi=ilp32 -Os -ffunction-sections -fdata-sections" --ldflags="-Wl,-gc-sections -march=rv32imc -mabi=ilp32 -Os --specs=nosys.specs --specs=nano.specs" --user-libs="-lm" --clean

The riscv-nuclei-elf-gcc version 9.2.0 can be found here https://nucleisys.com/download.php

And I checked the code size against the reference data using the command python ./benchmark_size.py:

Benchmark            size
---------            ----
aha-mont64           1.88
crc32                3.98
cubic               22.55
edn                  2.00
huffbench            2.33
matmult-int          3.32
minver               6.37
nbody                8.77
nettle-aes           1.73
nettle-sha256        1.97
nsichneu             1.33
picojpeg             1.39
qrduino              1.30
sglib-combined       1.62
slre                 1.74
st                   8.46
statemate            1.08
ud                   4.81
wikisort             2.87
---------           -----
Geometric mean       2.84
Geometric SD         2.19
Geometric range      4.94
All benchmarks sized successfully

Absolute results using the command python ./benchmark_size.py --absolute:

Benchmark            size
---------            ----
aha-mont64          2,010
crc32               1,130
cubic              35,718
edn                 2,646
huffbench           2,890
matmult-int         1,632
minver              7,442
nbody               8,328
nettle-aes          3,722
nettle-sha256       6,680
nsichneu           15,928
picojpeg            9,678
qrduino             7,554
sglib-combined      3,674
slre                3,822
st                  8,456
statemate           4,838
ud                  3,464
wikisort           12,326
---------           -----
Geometric mean      5,228
Geometric SD            2.27
Geometric range  9,561.58631295538
All benchmarks sized successfully

From the results you can see that the code sizes of cubic, minver and st are much worse than the reference results. Are any of my steps wrong? Please give me some hints.

Thanks
Huaqi

Explanation / Origin of benchmarks

Hello,

I am looking for an overview / explanation of what the benchmarks do, why they were picked, etc.
I tried the link mentioned, http://beebs.eu/, but it seems to redirect to a bad site.

Info on these benchmarks would be very useful.

thanks
Nagendra

Trouble Running Benchmarks

I'm having some issues running the benchmarks--when I run "make check," all of the benchmarks fail with execution scores of 0. Going into the BEEBS log, I see that they never ran because they are expecting to connect to a GDB server. Assuming I have compiled a Verilator emulator of my design, how do I start a GDB server with it? From what I can see in the Rocket Chip debug guide, I have to run a specific program on the emulator to debug with GDB, but that doesn't seem like what embench is trying to do.

Compiler optimization deletes the bodies_energy function call in the nbody benchmark.

In the nbody benchmark, the bodies_energy function returns the total energy, but this return value is not used in the benchmark_body function. Since bodies_energy is a pure function, the call can be removed by compiler optimization when its return value is unused.

I noticed that the clang compiler (Apple clang version 11.0.0 (clang-1100.0.33.8)) detects this and deletes the call to bodies_energy completely.

I attach the disassembly of the binary compiled with the following flags.

# compiler option
-O2 -march=native -Wall -Wextra -fdata-sections -ffunction-sections
; disassembly result
; only the call to offset_momentum remains.
_benchmark_body:
1000017a0:	55 	pushq	%rbp
1000017a1:	48 89 e5 	movq	%rsp, %rbp
1000017a4:	41 56 	pushq	%r14
1000017a6:	53 	pushq	%rbx
1000017a7:	85 ff 	testl	%edi, %edi
1000017a9:	7e 27 	jle	39 <_benchmark_body+0x32>
1000017ab:	89 fb 	movl	%edi, %ebx
1000017ad:	4c 8d 35 7c 08 00 00 	leaq	2172(%rip), %r14
1000017b4:	66 2e 0f 1f 84 00 00 00 00 00 	nopw	%cs:(%rax,%rax)
1000017be:	66 90 	nop
1000017c0:	4c 89 f7 	movq	%r14, %rdi
1000017c3:	be 05 00 00 00 	movl	$5, %esi
1000017c8:	e8 b3 fd ff ff 	callq	-589 <_offset_momentum>
1000017cd:	83 c3 ff 	addl	$-1, %ebx
1000017d0:	75 ee 	jne	-18 <_benchmark_body+0x20>
1000017d2:	5b 	popq	%rbx
1000017d3:	41 5e 	popq	%r14
1000017d5:	5d 	popq	%rbp
1000017d6:	c3 	retq
1000017d7:	66 0f 1f 84 00 00 00 00 00 	nopw	(%rax,%rax)

This deletion of the bodies_energy call is problematic because the main computational work of the nbody benchmark is not performed, so I think this issue should be fixed.

I think one solution is to assign the return value of bodies_energy to a global variable. The following code is such an example.

static double energy = 0.0;

static int __attribute__ ((noinline))
benchmark_body (int rpt)
{
  int j;

  for (j = 0; j < rpt; j++)
    {
      int i;
      offset_momentum (solar_bodies, BODIES_SIZE);
      /*printf("%.9f\n", bodies_energy(solar_bodies, BODIES_SIZE)); */
      for (i = 0; i < 100; ++i)
	energy = bodies_energy (solar_bodies, BODIES_SIZE); // <-- assign the return value to global variable `energy`.
      /*printf("%.9f\n", bodies_energy(solar_bodies, BODIES_SIZE)); */
    }
  return 0;
}

Any comments on this issue? Thanks.

Location/repository for board support files

I've had Embench running on Ibex/OpenTitan for a while now but in special branches and I'm looking at getting the necessary stuff merged into their respective master branches.

The major question is which repository the board support files should live in. With the current Embench build system, board.cfg and boardsupport.c/boardsupport.h at least need to live in config/riscv32/boards/<board_name>, which is in the embench repository. Do you want all such board support files upstreamed into the embench repository?

My preference would be for such files to live in the Ibex/OpenTitan repositories, as that makes maintenance simpler (no need to upstream them into this repository). I can already hack my way around the build system (using the vendoring system we have in Ibex/OpenTitan) to make this happen, but it would be neater if the build system supported it properly.

This would involve adding an option to the python build system to specify a custom boards/ directory. I am happy to do this but first wanted to ask what the preferred option is here, I see it as either:

  • all board support files to be upstreamed into embench-iot
  • alter build system as above to support custom board/chip directories

decoding results in run_native.py

In run_native.py, benchmark runtimes are measured using the time -p command. Let's say we have two runs with durations of 1.5 s and 1.05 s; we can simulate the results and see the difference:

$ time -p sleep 1.5
real 1.50
user 0.00
sys 0.00
$ time -p sleep 1.05
real 1.05
user 0.00
sys 0.00

However, the code in the decode_results function produces the same output for both. This seems to be caused by https://github.com/embench/embench-iot/blob/master/pylib/run_native.py#L66, because:

>>> '{:<03d}'.format(5)
'500'
>>> '{:<03d}'.format(50)
'500'

Is this interpretation of results intentional?
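For reference, '{:<03d}' left-aligns the number and then pads with zeros on the right, which is why 5 and 50 both format as '500'. One possible fix (a sketch only) is to keep the fractional part as a string and right-pad it to three digits instead of converting it to int first:

>>> '{:<03d}'.format(5)    # 1.5 s and 1.05 s both come out as '500'
'500'
>>> '5'.ljust(3, '0')      # 1.5 s  -> 500 ms
'500'
>>> '05'.ljust(3, '0')     # 1.05 s -> 050 ms
'050'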

cubic - quad-precision FP problem (ARM/RISCV)

Dear All,

I compiled the program cubic targeting both ARM and RISC-V.
This program uses the long double type, which is interpreted differently by ARM and RISC-V: ARM treats long double as just a double, i.e. a 64-bit double-precision FP value (http://www.keil.com/support/man/docs/armcc/armcc_chr1359125009502.htm), whereas RISC-V treats long double as a 128-bit quad-precision FP value (https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#c-types).

I think the comparison between the sizes of the two binaries is not completely fair: with real library support, the code size of the RISC-V program is dominated by the quad-precision FP emulation functions, which are present only in the RISC-V code.
With dummy libraries the result is distorted as well, because RISC-V handles "larger" values than ARM, and values larger than 2*XLEN are passed by reference (https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#floating-point-register-convention). I think this is why the RISC-V functions use the stack heavily, with a consequent increase in code size.

Best regards,
Matteo

Python version selection

#!/usr/bin/env python3 at the head of a Python script will pick e.g. python3.5 if that is what the system provides.
Using #!/usr/bin/env python3.6 instead makes sure the right Python version is picked.

Some of the benchmarks are single kernels/algorithms rather than applications.

In some cases I think it makes sense to benchmark a given well-defined algorithm or kernel, but maybe it wouldn't hurt to make some of the benchmarks more realistic.

The nettle-based benchmarks are just running the selected cryptographic algorithms.

Maybe a more interesting benchmark in that realm would be a graft of WireGuard or some similar high performance network + cryptography workload. WireGuard I think is a good candidate since it is fairly small and manageable, and would introduce a very popular stream cipher, ChaCha20 (which is used as the main RNG on Linux, FreeBSD, OpenBSD, and NetBSD), and construction, Poly1305 (which is also one of a couple TLS 1.3 standard AEAD constructions). WireGuard has some cruft in it to link in kernel- and architecture-specific things though, so maybe a bit of work to separate that out and turn it into a self-running benchmark (emulating a connection in a self-contained program); but I think I could give it a shot if that seems like something that would be good.

Benchmark files use inconsistent naming scheme

Many of the benchmarks have the main code in a file of the form src/<benchname>/<benchname>.c. But there are several different benchmarks that violate this:

  • aha-mont64/mont64.c -- why not aha-mont64.c?
  • cubic/libcubic.c -- why not cubic.c? (same goes for many others)
  • qrduino/qrtest.c -- why not qrduino.c? (same goes for sglib-combined)
  • crc32/crc_32.c -- why not crc32.c?

Is there a good reason to NOT rename them in order to ensure a bit of sanity? (I realize they may have come from all sorts of other original sources.)

Changing int data size based on compiler used

For 8- and 16-bit compilers, the size of 'int' is 2 bytes, for example with AVR GCC and the Microchip XC16 compiler. This difference in data type size changes the benchmark values (speed/size).

matmult-int.c and many other source files use the int data type:

void
Multiply (matrix A, matrix B, matrix Res)
{
  register int Outer, Inner, Index;
  ...
}

For an ARM compiler such as ARM GCC, int is 32 bits, but for the AVR GCC/XC16 compilers it is 16 bits.

An 'int' also holds the value which specifies the number of times a benchmark iterates; libstatemate.c has:

#define LOCAL_SCALE_FACTOR 1964
return benchmark_body (LOCAL_SCALE_FACTOR * CPU_MHZ);

static int __attribute__ ((noinline))
benchmark_body (int rpt)
{
..
}

Since int is only 2 bytes for these compilers, if CPU_MHZ is 32 the benchmark should iterate 1964 * 32 = 62848 times, which overflows a 16-bit int, so the iteration count is truncated and produces incorrect values.

So I think it is better to use the fixed-width integer types from <stdint.h> to avoid this confusion.

Repeated attempts to build will fail noisily

Invoking ./build_all.py --arch native --chip default --board default, and then re-invoking it, will cause it to die with messages like this:

AttributeError: 'NoneType' object has no attribute 'stdout'

The reason is that if all files are up to date, the 'res' object is still None.

A quick fix is to indent the 'res' lines, and the preceding "if not succeeded" lines, as shown here:

diff --git a/build_all.py b/build_all.py
index 8bc38be..2b1263d 100755
--- a/build_all.py
+++ b/build_all.py
@@ -483,12 +483,12 @@ def compile_file(f_root, srcdir, bindir, suffix='.c'):
             )
             succeeded = False
 
-    if not succeeded:
-        log.debug('Command was:')
-        log.debug(arglist_to_str(arglist))
+        if not succeeded:
+            log.debug('Command was:')
+            log.debug(arglist_to_str(arglist))
 
-    log.debug(res.stdout.decode('utf-8'))
-    log.debug(res.stderr.decode('utf-8'))
+        log.debug(res.stdout.decode('utf-8'))
+        log.debug(res.stderr.decode('utf-8'))
 
     return succeeded
 

configure fails whether or not --with-chip is added

When I run ./configure according to INSTALL, the output is as follows:
checking build system type... x86_64-pc-mingw64
checking host system type... x86_64-pc-mingw64
checking target system type... x86_64-pc-mingw64
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... no
checking whether make supports nested variables... no
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether make supports nested variables... (cached) no
configure: error: Chip config directory "generic" does not exist

Then I checked the code in configure; line 3142 says that if $srcdir/config/$arch/chips/$chip cannot be found, this error is reported.
So I added the option --with-chip=size-test-gcc, but it still reported an error:
configure: error: Chip config directory "size-test-gcc" does not exist

After that, I tried the --target and --host options. They still did not work. So I wonder whether there is a problem with the script, or whether my usage is wrong.

Formatted string literals are not supported in Python 3.5

Formatted string literals (f-strings) are a feature of Python 3.6. For instance, the following code appears in build_all.py:
log.error(f'ERROR: Architecture "{args.arch}" not found: exiting')

If the code is not intended to support versions below Python 3.6, this is not a problem, but it needs to be declared.
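Note that f-strings fail at parse time, so a version check cannot live in the same file that uses them; it would have to sit in a small launcher that itself contains no f-strings. A sketch:

import sys

# Must run before any module containing f-strings is imported/compiled.
if sys.version_info < (3, 6):
    sys.exit('error: this tool requires Python 3.6 or later')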

-gc-sections is called -dead_strip in Apple's linker

The files support/dummy-libgcc.c, support/dummy-libc.c and support/dummy-libm.c refer to -gc-sections, which is a GNU linker option for removing unused code. The Apple linker uses -dead_strip for apparently the same purpose. Perhaps the comments in the dummy-lib*.c files should be worded to accommodate this.

The usage of -gc-sections in config/native/* may be OK, depending on what "native" is supposed to do.

Error running benchmark

$ python build_all.py --arch=/home/odo/benchmarks/embench-iot/config/riscv32/arch.cfg
File "build_all.py", line 153
log.error(f'ERROR: Architecture "{args.arch}" not found: exiting')
^
SyntaxError: invalid syntax

How do I specify the config file?

Empty Executable Files After Successful Compilation

I have been porting embench-iot to some custom RISC-V 32-bit chips, in this repo. I am compiling with the following parameters:

cflags               : ['-O0', '-g', '-march=rv32im', '-mabi=ilp32', '-nostdlib', '-nostartfiles', '-I/home/galoisuser/gfe/benchmarks/embench-iot/support', '-I/home/galoisuser/gfe/benchmarks/embench-iot/config/riscv32/boards/p1-fpga', '-I/home/galoisuser/gfe/benchmarks/embench-iot/config/riscv32/chips/p1', '-I/home/galoisuser/gfe/benchmarks/embench-iot/config/riscv32', '-DCPU_MHZ=50', '-DWARMUP_HEAT=5']
ldflags              : ['-nostartfiles', '-nostdlib', '-march=rv32im', '-mabi=ilp32', '-lc', '-lgcc']
cc                   : riscv64-unknown-elf-gcc
cc_define1_pattern   : -D{0}
cc_define2_pattern   : -D{0}={1}
cc_incdir_pattern    : -I{0}
cc_input_pattern     : /home/galoisuser/gfe/benchmarks/embench-iot/support/dummy-libc.c
cc_output_pattern    : -o {0}
ld_input_pattern     : /home/galoisuser/gfe/benchmarks/embench-iot/support/link.ld
ld_output_pattern    : -o {0}
user_libs            : {}
dummy_libs           : {}
cpu_mhz              : 50
warmup_heat          : 5
ld                   : riscv64-unknown-elf-gcc

A couple of notes:

  • I force-added dummy-libc.c to cc_input_pattern because, if I add it to dummy_libs, it fails to compile or link.
  • With the above parameters, compilation and linking succeed, but all the files have an undefined _start symbol, as well as many empty sections:
riscv64-unknown-elf-objdump -x ../build/src/aha-mont64/aha-mont64

../build/src/aha-mont64/aha-mont64:     file format elf32-littleriscv
../build/src/aha-mont64/aha-mont64
architecture: riscv:rv32, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0xc0000000

Program Header:
    LOAD off    0x00000000 vaddr 0xc007f000 paddr 0xc007f000 align 2**12
         filesz 0x00000054 memsz 0x00281000 flags rw-

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  c0000000  c0000000  00000054  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  c0080000  c007f054  00000054  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  c0080000  c0080000  00000000  2**0
                  ALLOC
  3 .stack        00140000  c0080000  c0080000  00001000  2**0
                  ALLOC
  4 .heap         00140000  c01c0000  c01c0000  00001000  2**0
                  ALLOC
SYMBOL TABLE:
c0000000 l    d  .text  00000000 .text
c0080000 l    d  .data  00000000 .data
c0080000 l    d  .bss   00000000 .bss
c0080000 l    d  .stack 00000000 .stack
c01c0000 l    d  .heap  00000000 .heap
c0080800 g       *ABS*  00000000 __global_pointer$
c0080000 g       .bss   00000000 __sbss_start
c0080800 g       .bss   00000000 _gp
c0080000 g       .bss   00000000 __SDATA_BEGIN__
c0080000 g       .data  00000000 __rodata_end
c01c0000 g       .stack 00000000 __freertos_irq_stack_top
c0300000 g       .heap  00000000 _heap_end
c0080000 g       .bss   00000000 __bss_end
c02c0000 g       .heap  00000000 _heap_start
c0080000 g       .bss   00000000 __sbss_end
00000000         *UND*  00000000 _start
c0080000 g       .data  00000000 __rodata_start
c0080000 g       .data  00000000 __data_end
c0080000 g       *ABS*  00000000 __BSS_END__
c0080000 g       .bss   00000000 __bss_start
00040000 g       *ABS*  00000000 _STACK_SIZE
00040000 g       *ABS*  00000000 _HEAP_SIZE
c0180000 g       .stack 00000000 _stack_end
c0080000 g       .data  00000000 __DATA_BEGIN__
c01c0000 g       .stack 00000000 __stack
c0080000 g       *ABS*  00000000 _edata
c0300000 g       .heap  00000000 _end
c02c0000 g       .heap  00000000 _heap
c01c0000 g       .stack 00000000 _stack
c0080000 g       .data  00000000 __data_start

Any ideas what could be going on here? Any help or ideas are appreciated.

~Lyell

Control the amount of output

At present the scripts produce a modest amount of output to the console and a larger amount to the log file. It would be useful to have some control over this, for example with --verbose and --quiet options.

Issue filed for future consideration.

Problems with spaces when setting tool flag options

I have noticed that if a user puts extra spaces in their tool options, for example --cflags "-Os  -mtune=size" (two spaces) instead of --cflags "-Os -mtune=size", the build fails complaining about a non-existent file. This can be a bit confusing; it certainly confused me until I realized what was going on. It is due to the split(sep=' ') in build_all.py creating empty elements in the cflags list, e.g. ['-Os', '', '-mtune=size'].

I think all the split(sep=' ') calls just need to be split(), which collapses runs of whitespace by default.
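The difference is easy to demonstrate:

>>> '-Os  -mtune=size'.split(sep=' ')   # explicit separator keeps empty fields
['-Os', '', '-mtune=size']
>>> '-Os  -mtune=size'.split()          # default: runs of whitespace collapse
['-Os', '-mtune=size']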

Standard Citation Template

Hi there

Is it possible to come up with a standardised citation template for Embench? Those of us who reference it in academic publications can create an ad-hoc citation fairly easily, but a consistent one we can copy and paste into our bibliographies would be super helpful.

Currently, my BibTeX entry looks like this:

@misc{embench-iot,
  title = {Embench: Open Benchmarks for Embedded Platforms},
  author = {David Patterson and Jeremy Bennett and Palmer Dabbelt and Cesare Garlati and G. S. Madhusudan and Trevor Mudge},
  howpublished = "\url{https://github.com/embench/embench-iot}"
}

Is this suitable, and could something like it be pasted on the front page for everyone to use?

Cheers,
Ben

simple primes benchmark

I have found the program at the address below useful for about the last four years as a benchmark of machines ranging from the ATmega2560 through various Arm devices (including M3, M7, A7, A9, A15, A53 and A72) and various RISC-V parts, to x86.

On ILP32 or LP64 machines it requires 8004 bytes of globals plus 16 bytes of stack for countPrimes() and 16 or 32 bytes of stack for main(), plus whatever the runtime library needs for printf() and before main().

I hereby submit it for your consideration.

MIT license.

http://hoult.org/primes.txt

Recreating Reference Platform Size Results

Hi,

I am trying to recreate the reference platform scores as reported here: https://github.com/embench/embench-iot/tree/master/doc#reference-platform

I fetched the AArch32 bare-metal target GCC 9.2 found here.

This comes with binutils 2.33.1 (same as reference platform), and newlib nano 3.1.0 (not the same as reference platform).

To get newlib 3.3.0, I cloned the newlib repo and checked out the commit tagged 'newlib-3.3.0'. I then configured and built using these options (which, as I understand it, build 'newlib-nano'):

./configure --target=arm-none-eabi --enable-newlib-reent-small --disable-newlib-fvwrite-in-streamio --disable-newlib-fseek-optimization --disable-newlib-wide-orient --enable-newlib-nano-malloc --disable-newlib-unbuf-stream-opt --enable-lite-exit --enable-newlib-global-atexit --enable-newlib-nano-formatted-io --disable-nls

Is this the correct way to build newlib-nano? The README was not explicit as to what options were used to build newlib nano.

My embench-iot/config/arm/chips/cortex-m4/chip.cfg looks like this:

cc = 'arm-none-eabi-gcc'
cflags = [
     '-c', '-Os', '-march=armv7-m', '-mcpu=cortex-m4', '-ffunction-sections', '-mfloat-abi=soft', '-mthumb'
]
ldflags = [
    '-T/path/to/embench-iot/config/arm/boards/stm32f4-discovery/STM32F407XG.ld',
    '-O2', '-Wl,-gc-sections', '-march=armv7-m', '-mcpu=cortex-m4', '-mfloat-abi=soft', '-mthumb', '-specs=nosys.specs'
]

and my build_all.py line like this:
./build_all.py --builddir build_arm_gcc_size --arch arm --chip cortex-m4 --board stm32f4-discovery

However, when running benchmark_size.py, I get scores that are much larger than 1.00 (upwards of 20!). What am I doing wrong here? Everything seems to fall in line with what's outlined in the README, yet I'm getting such different results.

Edit:
Upon further research, it seems that the benchmarks are pulling in a lot of the C library. When comparing benchmarks built with another toolchain (in this case riscv32), those functions are not included. For example, memcpy and strlen are included in the Arm ELF but are not used anywhere. I think this accounts for the size differences, so the main question is why this is happening.

Thank you,
Joe

Avoiding Full Paths

When I add constraints in chip.cfg, how do I avoid using full paths to files? For example, if I am trying to add the file dummy-libc.c from the support directory to the compilation as input, how do I formulate the path to that file? I am using a build directory located at ../build with respect to the top level of the embench-iot directory. So far, I have tried:

  • cc_input_pattern = "dummy-libc.c"
  • cc_input_pattern = "support/dummy-libc.c"
  • cc_input_pattern = "../support/dummy-libc.c"
  • cc_input_pattern = "../embench-iot/support/dummy-libc.c"
  • cc_input_pattern = "../../embench-iot/supportdummy-libc.c"
  • cc_input_pattern = "embench-iot/support/dummy-libc.c"

All of these fail with the error: riscv64-unknown-elf-gcc: error: dummy-libc.c: No such file or directory.

There's very likely something I am missing here - any ideas?

~Lyell

Getting started issue: Unable to build benchmarks

I may have made a silly error. I cloned the latest repo (from 8/14) and changed nothing by way of the provided code and config. I tried building with:

python3 build_all.py --builddir . --logdir ./log --arch riscv32 --chip generic --board ri5cyverilator

I looked into the log, which reported this:
...
General log

Warning: Compilation of beebsc.c from source directory /home/UNT/ndg0068/embench-iot/support to binary directory /home/UNT/ndg0068/embench-iot/./support failed
Command was:
riscv32-unknown-elf-gcc -I/home/UNT/ndg0068/embench-iot/support -I/home/UNT/ndg0068/embench-iot/config/riscv32/boards/ri5cyverilator -I/home/UNT/ndg0068/embench-iot/config/riscv32/chips/generic -I/home/UNT/ndg0068/embench-iot/config/riscv32 -DCPU_MHZ=1 -DWARMUP_HEAT=1 -o beebsc.o /home/UNT/ndg0068/embench-iot/support/beebsc.c

This is not the correct command line, since the build of beebsc.c must use the '-c' flag to tell the compiler only to compile and not to link.

But I could not find a way to pass the '-c' flag. Adding --cflags -c does not do it; when I tried it, it said:
python3 build_all.py --builddir . --logdir ./log --arch riscv32 --chip generic --board ri5cyverilator --cflags -c
usage: build_all.py [-h] [--builddir BUILDDIR] [--logdir LOGDIR] --arch ARCH
[--chip CHIP] [--board BOARD] [--cc CC] [--ld LD]
[--cflags CFLAGS] [--ldflags LDFLAGS] [--env ENV]
[--cc-define1-pattern CC_DEFINE1_PATTERN]
[--cc-define2-pattern CC_DEFINE2_PATTERN]
[--cc-incdir-pattern CC_INCDIR_PATTERN]
[--cc-input-pattern CC_INPUT_PATTERN]
[--cc-output-pattern CC_OUTPUT_PATTERN]
[--ld-input-pattern LD_INPUT_PATTERN]
[--ld-output-pattern LD_OUTPUT_PATTERN]
[--user-libs USER_LIBS] [--dummy-libs DUMMY_LIBS]
[--cpu-mhz CPU_MHZ] [--warmup-heat WARMUP_HEAT] [-v]
[--clean] [--timeout TIMEOUT]
build_all.py: error: argument --cflags: expected one argument

Does the invocation require the '-c' compiler flag to be passed? If so, it seems it could be included in the build step automatically.

And if the flag has to be passed, how exactly should the --cflags option be used?
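For context, this is standard argparse behaviour: a value that itself starts with '-' is mistaken for a new option unless it is attached with '=', so on the shell --cflags="-c -O2" (note the equals sign) should get through. A sketch:

import argparse

p = argparse.ArgumentParser()
p.add_argument('--cflags')
# p.parse_args(['--cflags', '-c'])    # fails: argument --cflags: expected one argument
args = p.parse_args(['--cflags=-c'])  # works: Namespace(cflags='-c')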

Thanks
Nagendra

Some of benchmarks results off for native default build

I was looking into native builds and tested on MIPS, and I want to check whether there is something I'm missing or some problem I overlooked, because these results don't look like the example written in the documentation. Results were off for some of the size benchmarks, but the others look as expected.

So mips64 gets the following when running the speed benchmark
(cflags '-c', '-O2', '-fdata-sections', '-ffunction-sections';
ldflags '-O2', '-Wl,-gc-sections'):

--absolute							*default*
	Benchmark           Speed			Benchmark           Speed
	---------           -----			---------           -----
	aha-mont64              0			aha-mont64         4000000.00
	crc32                 100			crc32              4013000.00
	cubic                 400			cubic               10.39
	edn                     0			edn                3984000.00
	huffbench               0			huffbench          4108000.00
	matmult-int             0			matmult-int        4020000.00
	minver                800			minver               5.00
	nbody                 300			nbody               12.59
	nettle-aes              0			nettle-aes         3988000.00
	nettle-sha256           0			nettle-sha256      4000000.00
	nsichneu                0			nsichneu           4001000.00
	picojpeg                0			picojpeg           3747000.00
	qrduino                 0			qrduino            4210000.00
	sglib-combined          0			sglib-combined     4025000.00
	slre                    0			slre               4005000.00
	st                    300			st                  13.84
	statemate               0			statemate          4000000.00
	ud                    200			ud                  20.01
	wikisort              300			wikisort            14.36
	---------           -----			---------           -----
	Geometric mean          0			Geometric mean     71672.15
	Geometric SD          436.03			Geometric SD       375.21
	Geometric range        45			Geometric range    26891982.52
	All benchmarks run successfully			All benchmarks run successfully

And for size results are:
(cflags '-c', '-Os', '-fdata-sections', '-ffunction-sections'
ldflags '-Os', '-Wl,-gc-sections', '-nostartfiles', '-nostdlib')

--absolute							*default*
	Benchmark            size			Benchmark            size
	---------            ----			---------            ----
	aha-mont64          1,904			aha-mont64           1.81
	crc32               1,184			crc32                5.15
	cubic              19,344			cubic                7.83
	edn                 3,200			edn                  2.20
	huffbench           3,296			huffbench            2.02
	matmult-int         1,552			matmult-int          3.70
	minver              2,432			minver               2.26
	nbody               1,840			nbody                2.60
	nettle-aes          5,184			nettle-aes           1.80
	nettle-sha256       6,288			nettle-sha256        1.13
	nsichneu           23,904			nsichneu             1.59
	picojpeg           18,896			picojpeg             2.35
	qrduino            13,872			qrduino              2.28
	sglib-combined      5,712			sglib-combined       2.46
	slre                6,256			slre                 2.58
	st                  2,000			st                   2.27
	statemate           8,832			statemate            2.39
	ud                  2,192			ud                   3.12
	wikisort            9,232			wikisort             2.19
	---------           -----			---------           -----
	Geometric mean      4,787			Geometric mean       2.47
	Geometric SD            2.48			Geometric SD         1.51
	Geometric range     9,936			Geometric range      2.08
	All benchmarks sized successfully		All benchmarks sized successfully

It doesn't look like the empty-executable issue.

My testing was based on Embench 0.5rc1 (#42) and arm32. I have tested the same process on a Raspberry Pi and got the expected results.

Queries on Embench benchmarks

Hi,

I want to run the Embench benchmarks for the Arm Cortex-M55F CPU and I am not sure about the board;
could you please let me know what options I need to select?

Should it be arch=arm, chip=cortex-m55f, board=generic?

Any help would be greatly appreciated.

Thank you.

Regards,
Sheena

Run benchmarks on custom system

I'd like to run the benchmarks on a soft-core CPU on an FPGA, so I need to add a custom startup routine and run custom scripts to execute them on the processor. Is this possible with the existing system? Do I need to manually link the benchmark object files with my linker script and startup file to generate the executables, or is there a better way to run a custom benchmark?
