sbc-bench

SBC is short for single-board computer, and this whole repository is about performance considerations around those devices (with an initial focus on energy-efficient server tasks).

This small set of different CPU performance tests focuses on 'headless' operation only (no GPU/display stuff, no floating point number crunching). Unlike many other 'kitchen-sink benchmarks' it tries to produce insights instead of fancy graphs.

It has eight entirely different usage modes:

  • Generate a rough CPU performance assessment for a specific SBC in general (under ideal conditions)
  • Show whether an individual SBC is able to perform the same and, if not, hopefully answer the question 'why?'
  • Help software developers and hardware designers to improve 'thermal performance' when using the -t and/or -T switches (details/discussion, another example)
  • Graph thermal/consumption charts with -g to measure efficiency of settings/devices
  • Generate a controlled environment with appropriate settings for other benchmark suites like Geekbench (sbc-bench -G) or Phoronix (sbc-bench -P)
  • sbc-bench -k shows kernel version info. Stuff like: still supported? BSP or mainline?
  • The review modes (-r and -R) are designed to help reviewers and participants of 'SBC debug parties' quickly identify tunables and bottlenecks that need further attention. sbc-bench reports many performance-relevant settings, switches them to max performance and from then on lurks in the background to monitor other benchmark executions and tests. By comparing with scores made with defaults we are able to directly identify settings that need adjustments
  • Provide basic CLI monitoring functionality through the -m switch

The monitoring now also displays some hardware information when starting:

tk@odroidxu4:~$ sbc-bench -m
Samsung Exynos EXYNOS5800 rev 1, Exynos 5422, Kernel: armv7l, Userland: armhf
CPU sysfs topology (clusters, cpufreq members, clockspeeds)
                 cpufreq   min    max
 CPU    cluster  policy   speed  speed   core type
  0        1        0      200    1400   Cortex-A7 / r0p3
  1        1        0      200    1400   Cortex-A7 / r0p3
  2        1        0      200    1400   Cortex-A7 / r0p3
  3        1        0      200    1400   Cortex-A7 / r0p3
  4        0        4      200    2000   Cortex-A15 / r2p3
  5        0        4      200    2000   Cortex-A15 / r2p3
  6        0        4      200    2000   Cortex-A15 / r2p3
  7        0        4      200    2000   Cortex-A15 / r2p3

Thermal source: /sys/devices/virtual/thermal/thermal_zone0/ (cpu0-thermal)

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
18:18:33:  800/ 500MHz  0.00  18%   0%  17%   0%   0%   0%  25.0°C
18:18:38:  800/ 600MHz  0.00   0%   0%   0%   0%   0%   0%  24.0°C
18:18:43:  700/ 500MHz  0.07   0%   0%   0%   0%   0%   0%  24.0°C
18:18:48:  800/ 600MHz  0.07   0%   0%   0%   0%   0%   0%  24.0°C
^C

The SoCs (system-on-chip) used on today's SBCs are so performant that heat dissipation becomes an issue when running at full load for some time. The strategies to deal with the problem differ by platform and kernel. We've seen CPU cores being shut down when overheating (Allwinner boards running with original Allwinner software), and we know platforms where throttling works pretty well but switching to a different kernel trashes performance on exactly the same hardware. Sometimes it's pretty easy to spot what's going on, sometimes vendors cheat on us and it takes some effort to get a clue what's really happening.

This tool therefore focuses on a controlled environment and intensive monitoring that runs in the background and is added to the results output. The tool returns a brief performance overview (see screenshot above) but the real information is uploaded to an online pasteboard service (Rock 5B example). Without checking this detailed output the numbers are worthless (since we always need to check what really happened).

Execution

You need Debian Stretch/Buster/Bullseye/Bookworm or Ubuntu Bionic/Focal/Jammy. Older variants are not supported (due to distro packages being way too outdated). Then it's

wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
sudo /bin/bash ./sbc-bench.sh -c

You can also try out the new 'review mode' using -r instead of -c:

sudo /bin/bash ./sbc-bench.sh -r

This takes only a few seconds longer but generates a lot of additional insights especially on new platforms/SBC. It also exposes stuff that might invalidate proper benchmark execution (counterfeit SD cards, USB negotiation problems, 'bad settings' and so on). This mode is designed to provide a sane environment for further benchmark testing executed in another/different shell(s) so stopping the script via [ctrl]-[c] is necessary when done.

Unfortunately, execution as root is needed to adjust the cpufreq governor and to collect monitoring data. So do not run this on production systems or if you don't understand what the script is doing.

Which tools are used and why?

I chose mhz, tinymembench, ramlat, cpuminer, stockfish, 7-zip, cpufetch and OpenSSL's AES benchmarks for the following reasons:

mhz is not a benchmark but instead measures real CPU clockspeeds. This is helpful on platforms where cpufreq support is not available (yet) or where we cannot rely on the clockspeed values returned by the kernel. This applies to platforms where vendors are cheating (RPi, Amlogic), where weird clockspeed capping occurs for unknown reasons or where actual clockspeeds are set via jumpers while the clockspeeds available to the kernel are derived from device-tree (DT) entries. On a Clearfog Pro routerboard it will look like this for example (DT defines 666/1332 MHz while I configured 800/1600 MHz via jumper):

Checking cpufreq OPP:

Cpufreq OPP: 1332    Measured: 1599 (1598.621/1598.759/1598.324) (+20%)
Cpufreq OPP:  666    Measured:  799    (799.502/798.295/799.115) (+20%)
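
The percentage at the end of each line is simply the measured clockspeed relative to the OPP the kernel claims. The following is only a sketch of that arithmetic and not sbc-bench's own code: `opp_deviation` is a made-up helper name, and the field positions are an assumption based on the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): read lines of the form
# "Cpufreq OPP: <claimed> Measured: <measured> ..." on stdin and print
# claimed MHz, measured MHz and the deviation in percent.
opp_deviation() {
    awk '/Cpufreq OPP:/ {
        claimed = $3      # third field: clockspeed the kernel reports
        measured = $5     # fifth field: clockspeed mhz really measured
        printf "%d %d %+.1f%%\n", claimed, measured, (measured - claimed) * 100 / claimed
    }'
}
```

Something like `grep 'Cpufreq OPP' logfile | opp_deviation` would then list all OPP with their deviation, where `logfile` stands for wherever you saved the detailed output.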

We call mhz twice: at the beginning of the benchmark, with an idle and cold system, walking through all cpufreq OPP, and directly after the most demanding benchmark has finished, with the device still under full load, to see whether behaviour changes when the SoC is overheated. This is on a Thundercomm Dragonboard 845c. Prior to benchmark execution it looked like this:

Checking cpufreq OPP for cpu4-cpu7 (Qualcomm Kryo 3XX Gold):

Cpufreq OPP: 2803    Measured: 2704 (2705.057/2704.717/2704.717)     (-3.5%)
Cpufreq OPP: 2649    Measured: 2704 (2704.830/2704.717/2704.717)     (+2.1%)

When running the multi-threaded 7zip benchmark, the SoC temperature exceeds 80°C and afterwards the 2803 MHz cpufreq OPP is gone while the reported 2649 MHz are in reality only ~1940:

Checking cpufreq OPP for cpu4-cpu7 (Qualcomm Kryo 3XX Gold):

Cpufreq OPP: 2649    Measured: 1940 (1955.570/1943.795/1922.274)    (-26.8%)

Unlike other 'RAM benchmarks' tinymembench checks for both memory bandwidth and latency in a lot of variations so it's even possible to get some insights about internal cache sizes. It also measures each mode at least two times and if sample standard deviation exceeds 0.1%, it is shown in brackets next to the result. So it's pretty easy to spot background activity ruining benchmark results.

On hybrid systems with different CPU cores (big.LITTLE, DynamicIQ, Alder/Raptor Lake) we pin execution one time to an efficiency/little and one time to a performance/big core to know the difference this makes. For the sake of simplicity we output memcpy and memset numbers at the end of the benchmark. On an overclocked RPi 3 B+ (arm_freq=1570, over_voltage=4, core_freq=500, sdram_freq=510, over_voltage_sdram=2) this will look like this

Memory performance:
memcpy: 1316.0 MB/s (0.8%)
memset: 1933.9 MB/s

On a NanoPC T4 (RK3399, 2xA72/4xA53 CPU cores) this will look like this with mainline kernel and conservative settings without any optimizations yet:

Memory performance:
memcpy: 2054.9 MB/s
memset: 8453.0 MB/s (0.2%)
memcpy: 4238.8 MB/s (0.4%)
memset: 9082.5 MB/s (0.9%)

(first two lines show execution on a little A53 core, the last ones when pinned to an A72 big core)

On ARM SoCs CPU and GPU/VPU usually share memory access so it's worth a try to experiment with disabling HDMI/GPU for headless use cases. Often memory bandwidth and therefore overall performance increases. Same when switching between kernel branches.

ramlat provides some insights about cache sizes/speed and memory latency/bandwidth. Stuff like this.

cpufetch helps identify CPUs/SoCs and also provides detailed info about them in review mode.

Prior to adding stockfish, cpuminer was on most platforms the most demanding benchmark of the six and pretty efficient for checking appropriate heat dissipation and even instabilities under load. It makes heavy use of SIMD optimizations (NEON on ARM and SSE/AVX on x86), therefore generating more heat than unoptimized 'standard' code.

Heavy SIMD optimizations aren't really common and the generated scores depend a lot on compiler version, therefore this test is optional. Unless you execute sbc-bench -c or with MODE=extensive it will be skipped since results can be misleading. So consider this a load generator to check whether your board will start to throttle or become unstable, but take the benchmark numbers with a grain of salt unless you're a programmer and know what NEON, SSE and AVX really are and whether your application can make use of them.

A typical result (Rock 5B with Ubuntu Focal) will look like this:

Cpuminer total scores (5 minutes execution): 25.32,25.31,25.30,25.29,25.28,25.12 kH/s

(result variation in this case is ok since all results are more or less the same. If the board had started throttling or heavy background activity had happened, the later numbers would be much lower than the first ones)

Stockfish (open source chess engine) also makes heavy use of SIMD extensions but is heavy on memory access too putting even more load on devices than cpuminer which doesn't access RAM that much or at all since working set fits inside CPU caches.

As with cpuminer this test is optional (sbc-bench -s or MODE=extensive needed) since it doesn't represent any broader use case but is more of a stressor / load generator exposing thermal and stability issues. Consumption figures are higher compared to cpuminer since stockfish also stresses the DRAM interface, and at least it's sufficient to expose a reliability issue with Rock 5B (most probably with today's RK3588 in general) since running this benchmark reliably freezes my Rock 5B at 2112 MHz DRAM clock.

7-zip's internal benchmark mode is a pretty good representation of 'server workloads in general'. When running on all cores in parallel it doesn't utilize CPU cores fully (at least not on ARM SBC, on x86_64 with Hyperthreading and performant memory controllers it's a different story), it depends somewhat on memory performance (low latency more important than high bandwidth) and amount of available memory. When running fully parallel on systems that have many cores but are low on memory we see just as in reality the kernel either killing processes due to 'out of memory' or starting to swap if configured.

On big.LITTLE systems we start with one run pinned to a little core followed by one pinned to a big core. Then follow 3 consecutive runs using all available cores. The results might look like this:

7-zip total scores (3 consecutive runs): 3313,3285,3050
7-zip total scores (3 consecutive runs): 3613,3598,3633
7-zip total scores (3 consecutive runs): 7382,7407,7426

(this is a RPi 3 B+ with latest firmware update applied destroying performance showing throttling symptoms followed by a Rock64 at 1.4GHz with Armbian standard settings passively cooled by small heatsink followed by an octa-core NanoPi Fire3 also at 1.4 GHz but with heatsink and fan this time)
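
To put a number on how much the scores of consecutive runs dropped, a small helper like the following can be used. It is only a sketch: `score_drop` is a made-up name, and which drop is still acceptable is up to you.

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): take a comma-separated list of
# consecutive scores and print how far the lowest score fell below the first
# one, in percent.
score_drop() {
    printf '%s\n' "$1" | awk -F, '{
        min = $1 + 0
        for (i = 2; i <= NF; i++)
            if ($i + 0 < min) min = $i + 0
        printf "%.1f\n", ($1 - min) * 100 / $1
    }'
}
```

Feeding it the first line of scores above (`score_drop 3313,3285,3050`) gives a drop of several percent while the third line gives none, matching the throttling remark for the RPi 3 B+.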

How to interpret 7-zip MIPS scores: 7-zip is all about integer CPU and memory performance. And by looking at the 'total score' (running on all CPU cores in parallel) you need to keep in mind that only a few use cases are really parallel and limited to 'integer performance'. That's why it's written 'server workloads in general' above since this applies here and overall performance scales well with count of CPU cores.

If your use case is different (desktop, rendering, video editing, number crunching and so on that either depends more on single-threaded performance and/or involves floating point arithmetic, vector extensions or GPGPU), 7-zip MIPS are rather irrelevant for you since they do not even remotely represent your use case!

With 'server workloads' in mind 7-zip MIPS give an estimate of what to expect. A system showing two times more 7-zip MIPS compared to another will be able to run more (maybe twice as much or even more) daemons/tasks as long as the stuff is only CPU bound. How an individual daemon/task performs is a totally different story and needs to be checked (single-core 7-zip MIPS).

With a system scoring 125% compared to another it's a different story and you need to examine individual results and your use case closely (time to switch from staring at numbers to Active Benchmarking).

A nice example is comparing two ARMv8 server designs: 32 Neoverse-N1 cores (Amazon m6g.8xlarge VM) vs. 96 ThunderX1 cores (dual CPU ThunderX CN8890 blade). Both systems share an identical multi-core score (~110000 7-zip MIPS) but any real server workload will perform better on the Neoverse-N1 design. Single-threaded performance there is at least twice as high, memory performance way better and this will make the difference with real-world stuff unless the use case is really all about 100% CPU utilisation on all cores all the time.

If those 7-zip MIPS apply only to a few selected use cases as performance indicator why are they used in sbc-bench?

  • 7-zip's multi-threaded benchmark is so demanding that it can be used to check for power supply issues and thermal/throttling (that's why it's executed 3 times in a row)
  • Results are not that much affected by compiler version, which allows comparing scores made in different years with different OS versions (confirmed with Debian Stretch/Buster/Bullseye and Ubuntu Bionic/Focal/Jammy or in other words: GCC 6.3 - 10.2). The majority of kitchen-sink benchmarks overly depend on compiler version / settings, which usually makes comparing results from different years pointless
  • Also the benchmark is not known to perform completely differently when built for ARMv6, ARMv7 or ARMv8 (the infamous sysbench cpu benchmark on the other hand 'performs' 10-15 times better on a 64-bit Raspbian which is not related to 64-bit vs. 32-bit but just due to ARMv8 ISA having a divide instruction)
  • To be able to get comparable scores spanning different years/libs/compilers submitted results are cherry picked to ensure 7-zip version being 16.02 or lower since on some platforms more recent 7-zip versions perform way better. Starting with v0.9.64 sbc-bench tries to build p7zip 16.02 when a higher version is detected.
  • Unlike many other kitchen-sink benchmarks RAM access / memory performance matters (sysbench cpu for example runs completely inside CPU caches). With this benchmark it's easy to spot memory performance issues like this (after switching bootloaders DDR4 RAM got clocked with just 333 instead of the former 1056 MHz). It's one of the 'cheapest' tools for regression testing but unfortunately not widely used there
  • 7-zip allows spotting different thermal throttling strategies, for example throttling the memory controller instead of or in addition to CPU cores on certain platforms
  • the multi-core test is also nice to spot internal CPU/SoC bottlenecks and/or scheduler improvements

A good example for the latter is Odroid XU4, three times tested with different kernel and OS versions (Stretch, Bionic and Focal which all build packages with different GCC versions). Memory performance remained the same (for a way to quickly check this see included script snippets) but for whatever reasons only the multi-threaded performance fluctuated over time:

Kernel / Compiler       7-zip single   7-zip multi   CPU utilisation compression   CPU utilisation decompression
Kernel 4.9 / GCC 6.3    1622           6370          64%                           78%
Kernel 4.14 / GCC 7.3   1633           7100          64%                           78%
Kernel 5.4 / GCC 9.3    1604           8980          94%                           84%

Smells like a scheduler problem with kernel 4.x. Only more detailed tests with more kernel/GCC combinations or switching to Active Benchmarking could really tell.

The OpenSSL test solely focuses on AES performance (VPN use case, full disk encryption). The test tries to quickly confirm whether an ARM SoC can make use of special crypto engines. Some SoC vendors don't care, some add proprietary engines to their SoCs (Marvell's CESA as an example), some vendors chose to license ARM's 'ARMv8 Crypto Extensions' (see here for some insights). So in case a board runs with a 64-bit ARM SoC this simple test shows whether crypto extensions are present or not.

Results might look like this on an overclocked Raspberry Pi 3 B+ at 1570 MHz lacking any crypto acceleration:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      39393.73k    54173.16k    60220.67k    61720.92k    62518.61k
aes-192-cbc      35676.65k    46311.68k    51358.21k    52840.11k    53157.89k
aes-256-cbc      33339.62k    42962.13k    46476.37k    47619.07k    47925.93k

Vs. an Orange Pi Zero Plus based on Allwinner H5 heavily underclocked at just 816 MHz:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     102568.41k   274205.76k   458456.23k   569923.58k   613422.42k
aes-192-cbc      95781.66k   235775.72k   366295.72k   435745.79k   461294.25k
aes-256-cbc      91725.44k   211677.08k   313433.77k   362907.31k   380482.90k

ARMv8 Crypto Extensions make the difference here. Even at almost half the CPU clockspeed with small data chunks at least 2.5 times faster and up to 9 times faster with larger chunks. Looking at different chunk sizes makes a lot of sense since some proprietary crypto engines suffer from high initialization overhead. See these numbers for a Banana Pi R2 based on a MediaTek MT7623 with proprietary crypto engine after compiling own kernel and OpenSSL (sources):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc        519.15k     1784.13k     6315.78k    25199.27k   124499.22k
aes-192-cbc        512.39k     1794.01k     6375.59k    25382.23k   118693.89k
aes-256-cbc        508.30k     1795.05k     6339.93k    25042.60k   112943.10k
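
The speedup factors mentioned for the Raspberry Pi 3 B+ vs. the Orange Pi Zero Plus can be recalculated from any two rows of such tables. A minimal sketch, assuming both rows are copied verbatim from the output above; `aes_speedup` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): divide each throughput column
# of the row given as $2 by the same column of the row given as $1. Both rows
# are single result lines like "aes-128-cbc 39393.73k 54173.16k ...".
aes_speedup() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF; i++) base[i] = $i + 0; next }
        {
            # "$i + 0" drops the trailing "k" since awk only reads the numeric prefix
            for (i = 2; i <= NF; i++)
                printf "%s%.1f", (i > 2 ? " " : ""), ($i + 0) / base[i]
            print ""
        }'
}
```

Called with the aes-128-cbc row of the Raspberry Pi as first and the one of the Orange Pi Zero Plus as second argument, it prints one factor per chunk size.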

Ensuring proper benchmark execution

Benchmarking a system that is otherwise busy will result in numbers without meaning. Therefore it's important to ensure the system is as idle as possible. That's the reason sbc-bench will only start once the '1 min average load' is reported as below 0.1 or CPU utilization has been less than 2.5% for 30 seconds.

Of course this is not sufficient since background tasks might become active later or cron jobs might result in some peak activity in between. As many such services as possible should be stopped prior to benchmark execution or in the best case a rather minimal image should be used for testing. On the other hand sbc-bench can also easily be used to compare 'desktop' and 'minimal' images.
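
If you want a similar start condition in your own test scripts, the load average half of the check could be sketched like this. This is not how sbc-bench implements it: `load_is_low` is a made-up name, and the CPU utilization half of the condition is left out here.

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): succeed when the 1-minute load
# average (first field of a /proc/loadavg style line passed as $1) is below
# the limit, which is 0.1 unless given as $2.
load_is_low() {
    printf '%s\n' "$1" | awk -v max="${2:-0.1}" '{ exit !($1 + 0 < max + 0) }'
}

# Possible usage, polling every 5 seconds until the system has calmed down:
# until load_is_low "$(cat /proc/loadavg)"; do sleep 5; done
```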

But comparisons only make sense if execution of the benchmark can be observed. That's what sbc-bench's background monitoring is for, which will be appended to the detailed result list. There we can look for the following problems:

Swapping

The 7-zip benchmark when running on all cores can result in the system starting to swap when running low on memory. A good example for an affected board is the inexpensive NanoPi Fire3 with 8 A53 cores but just 1 GB DRAM. When we search in the detailed result output for Swap we'll find 2 occurrences: one check prior to the benchmarks and one afterwards. With a Fire3 this might look like:

Swap:          495M          0B        495M
Swap:          495M         34M        460M

So we know swapping has happened which negatively affected performance to some degree based on how swap is implemented. If swapping to SD card is configured performance will be severely impacted but in this case since it's a recent Armbian image the effects are negligible since Armbian implements zram based swap in the meantime (that's why kind of swap is also recorded in detailed result list).
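
Searching for the two Swap lines can be automated. A minimal sketch, assuming the detailed result output is fed on stdin and contains lines in the format shown above; `swap_check` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): read a detailed result log on
# stdin and compare the "used" column (third field) of the first and the last
# line starting with "Swap:".
swap_check() {
    awk '/^Swap:/ { if (!seen++) before = $3; after = $3 }
         END {
             if (!seen) print "no swap info found"
             else if (before == after) print "swap usage unchanged (" after ")"
             else print "swap usage changed: " before " -> " after
         }'
}
```

The comparison is done on the strings as printed, so it only tells whether usage changed, not by how much.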

While executing the multi-core 7-zip benchmark monitoring looked like this:

System health while running 7-zip multi core benchmark:

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
10:50:25: 1400/1400MHz  6.23   9%   0%   8%   0%   0%   0%  44.0°C
10:50:58: 1400/1400MHz  5.16  50%   0%  50%   0%   0%   0%  54.0°C
10:51:29: 1400/1400MHz  5.63  74%   0%  73%   0%   0%   0%  58.0°C
10:52:00: 1400/1400MHz  6.23  80%   0%  79%   0%   0%   0%  59.0°C
10:52:31: 1400/1400MHz  6.39  72%   0%  71%   0%   0%   0%  56.0°C

Always 0% reported in the %io column so not a big deal. With swap on SD card, especially when using cards with low random IO performance, we would've seen many occurrences of %iowait activity and way lower performance numbers.

Background activity

We have 3 benchmark executions that run completely single threaded: tinymembench, the first 7-zip run limited to a single CPU core and the openssl test. In all these cases the overall %cpu percentage must not exceed 100 divided by the count of CPU cores (the first two lines can be ignored). So on an octa-core board like NanoPi Fire3 it has to show exactly 12% and nothing more:

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
10:40:05: 1400/1400MHz  0.18   2%   0%   0%   0%   1%   0%  40.0°C
10:41:05: 1400/1400MHz  0.63  10%   0%  10%   0%   0%   0%  44.0°C
10:42:05: 1400/1400MHz  0.94  12%   0%  12%   0%   0%   0%  44.0°C
10:43:05: 1400/1400MHz  0.98  12%   0%  12%   0%   0%   0%  40.0°C
10:44:05: 1400/1400MHz  0.99  12%   0%  12%   0%   0%   0%  40.0°C
10:45:05: 1400/1400MHz  1.00  12%   0%  12%   0%   0%   0%  40.0°C
10:46:06: 1400/1400MHz  1.04  12%   0%  12%   0%   0%   0%  40.0°C

On a dual-core board we're talking about 50% max, on hexa-cores it's 16% and on a quad-core board it must not exceed 25% (100 / 4):

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
10:18:10: 1392MHz  1.05  17%   2%  15%   0%   0%   0%  59.5°C
10:19:10: 1392MHz  0.95  21%   0%  21%   0%   0%   0%  62.5°C
10:20:10: 1392MHz  1.02  25%   0%  25%   0%   0%   0%  61.7°C
10:21:10: 1392MHz  1.13  27%   1%  26%   0%   0%   0%  59.5°C
10:22:10: 1392MHz  1.05  25%   0%  25%   0%   0%   0%  60.0°C
10:23:10: 1392MHz  1.09  25%   0%  25%   0%   0%   0%  61.2°C
10:24:10: 1392MHz  1.03  25%   0%  25%   0%   0%   0%  61.7°C

In this case we were able to spot some background activity in this line:

10:21:10: 1392MHz  1.13  27%   1%  26%   0%   0%   0%  59.5°C

$something happened in parallel which will slightly lower the generated benchmark score. While 2% CPU utilisation for other stuff won't hurt that much at least we need to have an eye on this since when there are higher utilisation numbers reported when running the single threaded stuff the system shows way too much background activity to report reasonable benchmark scores. Then we simply generated numbers without meaning.
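
Spotting such lines by eye gets tedious with long logs. The following sketch prints every monitoring line whose %cpu value is higher than what a single busy core can cause. It assumes the line format shown above and takes the first field consisting of digits followed by % as the %cpu column; `flag_background_activity` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): read monitoring lines on stdin
# and print those whose overall %cpu value exceeds 100 / core count.
# $1: number of CPU cores.
flag_background_activity() {
    awk -v cores="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+%$/) {      # first percentage field is %cpu
                if ($i + 0 > 100 / cores) print
                break
            }
    }'
}
```

Only feed it the monitoring section of a single-threaded test since the multi-threaded ones are supposed to exceed this limit.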

Throttling

Depending on settings (kernel or some 'firmware' controlling the hardware) the clockspeeds might be dynamically reduced when the SoC starts to overheat. When clockspeeds are reduced then this obviously slows down operation.

sbc-bench continually monitors the clockspeeds but since we can only query every few seconds we might not catch short clockspeed decreases. That's why we check whether the kernel's cpufreq driver supports statistics. If true we record contents of stats/time_in_state prior to and after benchmark execution and compare afterwards. This way we are able to detect even short amounts of downclocking which will result in a warning like this: ATTENTION: Throttling occured. Check the log for details.

The detailed log then will contain information how much time (in milliseconds) has been spent on which clockspeed while executing the benchmarks. Might look like this on a NanoPC T4 without fan (only vendor's heatsink) after running the full set (NEON test included which resulted in the big cluster clocking down to even 408 MHz):

Throttling statistics (time spent on each cpufreq OPP) for CPUs 4-5:

1800 MHz: 1344.39 sec
1608 MHz:  372.95 sec
1416 MHz:  117.69 sec
1200 MHz:   48.28 sec
1008 MHz:   41.58 sec
 816 MHz:   55.24 sec
 600 MHz:  127.08 sec
 408 MHz:  352.72 sec
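
The comparison of two time_in_state snapshots can be sketched as follows. This is not sbc-bench's own code: `time_in_state_delta` is a made-up name, and the counters are assumed to tick in units of 10 ms as described in the kernel's cpufreq-stats documentation.

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): $1 and $2 are copies of a
# cpufreq stats/time_in_state file taken before and after a benchmark. Each
# line holds a frequency in kHz and a time counter (assumed: 10 ms units).
# Prints the time spent at every frequency that was in use in between.
time_in_state_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         {
             d = $2 - before[$1]
             if (d > 0) printf "%d MHz: %.2f sec\n", $1 / 1000, d / 100
         }' "$1" "$2"
}
```

If anything other than the highest OPP shows up in the output while the system was under full load the whole time, the CPU was clocked down in between.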

Important: to get throttling notifications, running a kernel with CONFIG_CPU_FREQ_STAT=y is needed since otherwise cpufreq statistics are not available. And this will not work on Raspberries since the cpufreq driver there has not the slightest idea what's going on.

And all of this doesn't work reliably on x86_64. Here you need to check 7-zip, cpuminer or stockfish scores. If they got lower during execution your device ran into thermal or powercapping issues.

Unattended execution

If sbc-bench should benchmark in an automated fashion then exporting MODE=unattended prior to execution will prevent warning dialogs but of course sbc-bench will still check whether average load or CPU utilization is too high and refuse to start since benchmarking a busy system is useless.

Everything sent to stdout can be ignored (but parsing for 'check the log' is highly recommended since hinting at too much background activity and/or swapping resulting in numbers without meaning instead of benchmark scores). Full benchmark results are available at /var/log/sbc-bench.log with the last line containing a performance summary. So something like this could be used for regression testing and similar stuff:

MODE=unattended sbc-bench.sh -c | grep -q 'check the log' || tail -n1 /var/log/sbc-bench.log

Extensive mode

When exporting MODE=extensive (not compatible with MODE=unattended so use either/or) then sbc-bench conducts additional tests:

  • the openssl benchmarks will also be executed in parallel on all CPU cores (takes an additional minute)
  • the cpuminer test will be fired up (5 more minutes)
  • the stockfish stress tester will be fired up 3 times to check further for throttling and stability issues
  • on ARM/RISC-V SoCs with clusters of different CPU cores (e.g. RK3399 with 4 x Cortex-A53 and 2 x Cortex-A72) additional multi-threaded 7-zip tests per cluster are done (no duration estimate possible since depends on SoC architecture)

This operation mode will be extended further over time to get insights into SoC internals.

MaxKHz environment variable

If $MaxKHz is exported prior to benchmark execution (e.g. by MODE=extensive MaxKHz=1416000 sbc-bench.sh) then cpufreq OPP higher than this value are skipped. On many platforms this allows CPU core comparisons at same clockspeeds (e.g. limiting all cores to 1.8 GHz on RK3588 or 1.4 GHz on RK3399). For a list of available values check

cat /sys/devices/system/cpu/cpufreq/policy?/scaling_available_frequencies
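
When comparing clusters it helps to pick a value that really exists in each list. A minimal sketch that selects the highest available frequency not exceeding a given limit; `highest_opp_below` is a made-up helper name and the frequency list is passed in as a string:

```shell
#!/bin/sh
# Hypothetical helper (not part of sbc-bench): from a whitespace-separated
# list of frequencies in kHz ($2, e.g. the contents of one
# scaling_available_frequencies file) print the highest one that does not
# exceed the limit given as $1.
highest_opp_below() {
    printf '%s\n' "$2" | awk -v limit="$1" '{
        best = ""
        for (i = 1; i <= NF; i++)
            if ($i + 0 <= limit + 0 && (best == "" || $i + 0 > best + 0))
                best = $i
        if (best != "") print best
    }'
}
```

The printed value could then be exported as MaxKHz. If no frequency fits below the limit nothing is printed.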

CPUINFOFILE environment variable

If $CPUINFOFILE is exported prior to benchmark execution then SoC guessing and similar stuff happens not based on /proc/cpuinfo but on the supplied file that obviously needs to have a compatible format.

ExecuteCommand environment variable

If $ExecuteCommand is exported prior to review mode (-r/-R) then instead of sbc-bench waiting/monitoring external benchmark executions it will execute whatever will be exported. So if you want to do a simple throttling test using stress-ng for example you would execute ExecuteCommand="stress-ng --cpu 0 -t 60m" sbc-bench.sh -R and check the output for throttling afterwards.

Though with such a goal in mind the better approach is running not stressors but benchmarks like 7-zip or cpuminer for a while since scores dropping over time are essentially a throttling indicator and this way you can spot other areas of throttling too (e.g. memory controller on recent Intel designs exceeding thermal thresholds lower than those for cpufreq throttling).

Interpreting results

The whole point of sbc-bench being started in the first place was trying to replace the casual 'fire and forget' benchmarking done by SBC reviewers with a controlled execution of benchmarks in a fully monitored environment to get an idea why benchmark scores are as they are. A lot of stuff can go wrong! And in 'fire and forget' mode almost always unnoticed.

The reasons why monitoring is absolutely necessary and what 'SBC reviewers' (especially the majority of 'Youtubers') usually forget to check/mention are as follows:

Measured clockspeeds

It's plain stupid to trust the clockspeeds a certain device pretends to use. Single-board computers mostly rely on ARM SoCs originating from the 'Android e-waste' world, and there cheating is the norm rather than the exception.

Faking clockspeeds is pretty easy, so we always measure (see above). Before conducting any benchmarks sbc-bench walks through all cpufreq operating points (OPP) to check them. And it does the same for the highest clockspeeds when benchmarking has finished.

Looks like this when clockspeeds are fake:

Tinkerboard: fake 2.0 GHz vs. real 1.8 GHz
Checking cpufreq OPP (Cortex-A17):

Cpufreq OPP: 1992    Measured: 1793 (1796.339/1793.356/1789.489)    (-10.0%)
Cpufreq OPP: 1920    Measured: 1792 (1793.825/1793.713/1789.355)     (-6.7%)
Cpufreq OPP: 1896    Measured: 1793 (1796.309/1793.640/1791.471)     (-5.4%)
Cpufreq OPP: 1800    Measured: 1793 (1794.914/1793.996/1790.777)
Cpufreq OPP: 1704    Measured: 1698 (1699.339/1698.236/1697.833)
...
Amlogic S905L2 TV box: fake 2.0 GHz vs. real 1.2 GHz
Checking cpufreq OPP (Cortex-A53):

Cpufreq OPP: 2016    Measured: 1197 (1197.660/1197.605/1197.411)    (-40.6%)
Cpufreq OPP: 1752    Measured: 1196 (1197.022/1196.370/1195.166)    (-31.7%)
Cpufreq OPP: 1536    Measured: 1196 (1197.286/1197.175/1195.540)    (-22.1%)
Cpufreq OPP: 1416    Measured: 1196 (1197.549/1197.438/1195.637)    (-15.5%)
Cpufreq OPP: 1200    Measured: 1197 (1197.535/1197.480/1197.397)
Cpufreq OPP: 1000    Measured:  996    (998.145/997.446/994.960)
Cpufreq OPP:  667    Measured:  663    (664.189/663.662/661.729)
...
Orange Pi 5: fake 2.4 GHz vs. real 2.2 GHz
Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

Cpufreq OPP: 2400    Measured: 2221 (2221.092/2221.044/2220.997)     (-7.5%)
Cpufreq OPP: 2352    Measured: 2220 (2220.519/2220.472/2220.328)     (-5.6%)
Cpufreq OPP: 2304    Measured: 2219 (2219.994/2219.947/2219.947)     (-3.7%)
Cpufreq OPP: 2256    Measured: 2219 (2219.565/2219.517/2219.422)     (-1.6%)
Cpufreq OPP: 2208    Measured: 2197 (2197.609/2197.562/2197.469)
...

Swapping

Swapping happens if physical RAM gets depleted and RAM contents are either compressed (zram/zswap) or transferred to slow storage (zswap/traditional swap). Both tasks harm performance, especially swap on storage used by SBCs (with low random I/O performance) is horribly slow.

Results are invalid and usually all you can do is to retest on a device with higher RAM capacity. More info on the topic above.

zswap combined with zram

Zswap and zram are mutually exclusive so use either/or. In case zswap is configured on top of zram once swapping starts performance will be more harmed compared to zswap or zram alone since the kernel will compress memory pages twice.

oom-killer

If no swap is configured or swap space is not large enough then the kernel decides it is out of memory (oom) and kills the process in question.

If this happens you won't get benchmark scores and might need to stop memory hungry processes (e.g. temporarily disabling a desktop environment, then rebooting and rechecking). Tools like ps_mem might ease the task.

In case no swap is configured you might change that but will then most probably run into this.

Background activity

It should be obvious that only an absolutely idle system can be benchmarked properly since if the benchmark program has to fight with other processes for CPU or memory resources the scores will suffer.

Results are invalid, more on this above.
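A quick way to see whether a system is idle enough is to look at the 1-minute load average in `/proc/loadavg`. The sketch below assumes a limit of 0.1, which is an arbitrary illustration value and not necessarily the threshold sbc-bench uses; `is_idle` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical idle check (limit and name are assumptions, not sbc-bench's
# actual logic): succeeds if the 1-minute load average is below the limit.
# $1 = content of /proc/loadavg, $2 = limit (default 0.1)
is_idle() {
    printf '%s\n' "$1" | awk -v limit="${2:-0.1}" '{ exit !($1 + 0 < limit + 0) }'
}

if is_idle "$(cat /proc/loadavg)"; then
    echo "System looks idle, benchmarking can start"
else
    echo "Too busy for benchmarking: $(cat /proc/loadavg)"
fi
```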

Powercap

This is Intel/AMD stuff. Their CPUs are restricted by certain limits: thermal throttling or e.g. cores allowed to clock higher with single-threaded loads compared to multi-threaded.

But there are also power limits that can be set by the device maker: a passively cooled notebook might ship with different settings than a huge desktop with plenty of thermal headroom. Since this x86 stuff is kinda off-topic, simply check this review for example and search for 'powercap-info' there.

Thermal throttling

This is an attempt to prevent overheating by reducing consumption, with the immediate effect of reduced performance. If it has happened, results are of course invalid.

Background: one or more thermal sensors in SoC/CPU are used to determine warning and critical temperatures to then take measures:

Normal behaviour

Downclocking CPU cores when temperatures get critical is the usual strategy, for more details see above.

The detailed sbc-bench output contains a monitoring section, and if throttling lasts long enough the reduced clockspeeds can be spotted easily:

Example of a Tinkerboard starting to throttle at 70°C and clocking down to 1200 MHz
##########################################################################

Thermal source: /sys/class/hwmon/hwmon0/ (cpu_thermal)

System health while running tinymembench:

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
10:52:40: 1800MHz  1.87  27%   9%  16%   0%   0%   0%  66.2°C  
10:52:50: 1800MHz  1.89  29%   2%  26%   0%   0%   0%  66.5°C  
10:53:00: 1800MHz  1.91  29%   3%  26%   0%   0%   0%  67.7°C  
10:53:10: 1800MHz  1.77  27%   1%  25%   0%   0%   0%  68.8°C  
10:53:20: 1800MHz  1.65  28%   2%  26%   0%   0%   0%  68.8°C  
10:53:30: 1800MHz  1.70  27%   1%  25%   0%   0%   0%  70.0°C  
10:53:40: 1800MHz  1.67  27%   2%  25%   0%   0%   0%  69.2°C  
10:53:51: 1800MHz  1.57  28%   2%  26%   0%   0%   0%  69.6°C  
10:54:01: 1800MHz  1.56  28%   2%  26%   0%   0%   0%  69.2°C  
10:54:11: 1704MHz  1.63  29%   2%  26%   0%   0%   0%  69.6°C  
10:54:21: 1800MHz  1.77  29%   2%  26%   0%   0%   0%  69.2°C  
10:54:31: 1704MHz  1.96  28%   2%  26%   0%   0%   0%  70.4°C  
10:54:41: 1704MHz  1.81  28%   2%  25%   0%   0%   0%  70.4°C  
10:54:52: 1608MHz  1.68  29%   2%  26%   0%   0%   0%  70.4°C  
10:55:02: 1800MHz  1.66  28%   2%  25%   0%   0%   0%  69.6°C  

System health while running ramlat:

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
10:55:07: 1800MHz  1.61  27%   7%  19%   0%   0%   0%  72.1°C  
10:55:10: 1800MHz  1.56  26%   1%  25%   0%   0%   0%  70.4°C  
10:55:13: 1608MHz  1.56  26%   0%  25%   0%   0%   0%  70.4°C  
10:55:16: 1800MHz  1.59  27%   1%  25%   0%   0%   0%  69.6°C  
10:55:19: 1800MHz  1.59  26%   1%  25%   0%   0%   0%  68.8°C  
10:55:22: 1800MHz  1.70  27%   1%  25%   0%   0%   0%  69.2°C  
10:55:25: 1800MHz  1.73  27%   1%  25%   0%   0%   0%  68.8°C  
10:55:28: 1800MHz  1.73  27%   1%  25%   0%   0%   0%  68.8°C  
10:55:32: 1800MHz  1.67  26%   1%  25%   0%   0%   0%  69.6°C  
10:55:35: 1800MHz  1.62  26%   1%  25%   0%   0%   0%  68.8°C  
10:55:38: 1800MHz  1.62  26%   1%  25%   0%   0%   0%  69.2°C  
10:55:41: 1800MHz  1.57  26%   1%  24%   0%   0%   0%  69.2°C  

System health while running OpenSSL benchmark:

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
10:55:42: 1800MHz  1.57  27%   7%  19%   0%   0%   0%  74.2°C  
10:55:58: 1512MHz  1.44  26%   1%  25%   0%   0%   0%  70.4°C  
10:56:14: 1512MHz  1.41  26%   1%  25%   0%   0%   0%  71.2°C  
10:56:30: 1512MHz  1.37  26%   1%  25%   0%   0%   0%  69.6°C  
10:56:46: 1800MHz  1.43  26%   0%  25%   0%   0%   0%  70.0°C  
10:57:03: 1512MHz  1.34  26%   1%  25%   0%   0%   0%  70.0°C  
10:57:19: 1704MHz  1.33  26%   1%  25%   0%   0%   0%  69.6°C  

System health while running 7-zip single core benchmark:

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
10:57:30: 1512MHz  1.26  27%   6%  20%   0%   0%   0%  70.8°C  
10:57:38: 1704MHz  1.32  26%   1%  25%   0%   0%   0%  70.4°C  
10:57:46: 1608MHz  1.34  26%   1%  25%   0%   0%   0%  69.6°C  
10:57:55: 1704MHz  1.31  26%   1%  25%   0%   0%   0%  70.0°C  
10:58:03: 1608MHz  1.50  26%   1%  25%   0%   0%   0%  69.6°C  
10:58:11: 1608MHz  1.42  26%   1%  25%   0%   0%   0%  69.2°C  
10:58:19: 1800MHz  1.39  26%   1%  24%   0%   0%   0%  70.0°C  
10:58:27: 1704MHz  1.49  26%   1%  24%   0%   0%   0%  69.2°C  
10:58:35: 1704MHz  1.57  26%   1%  24%   0%   0%   0%  70.4°C  
10:58:43: 1512MHz  1.52  26%   1%  24%   0%   0%   0%  70.0°C  
10:58:51: 1704MHz  1.44  26%   1%  24%   0%   0%   0%  69.2°C  
10:58:59: 1800MHz  1.41  27%   2%  24%   0%   0%   0%  70.4°C  
10:59:07: 1704MHz  1.34  26%   1%  25%   0%   0%   0%  70.8°C  

System health while running 7-zip multi core benchmark:

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
10:59:10: 1512MHz  1.32  27%   5%  21%   0%   0%   0%  70.8°C  
10:59:28: 1416MHz  2.06  88%   2%  85%   0%   0%   0%  74.6°C  
10:59:47: 1416MHz  2.79  91%   2%  89%   0%   0%   0%  75.4°C  
11:00:06: 1200MHz  3.13  90%   3%  86%   0%   0%   0%  74.6°C  
11:00:23: 1416MHz  3.57  87%   3%  83%   0%   0%   0%  74.6°C  
11:00:39: 1416MHz  3.73  96%   4%  92%   0%   0%   0%  75.0°C  
11:00:55: 1416MHz  3.81  85%   2%  82%   0%   0%   0%  74.2°C  
11:01:13: 1200MHz  4.01  94%   1%  92%   0%   0%   0%  74.6°C  
11:01:33: 1200MHz  4.02  92%   2%  90%   0%   0%   0%  75.8°C  
11:01:53: 1200MHz  4.23  88%   3%  84%   0%   0%   0%  73.3°C  
11:02:10: 1200MHz  4.29  88%   3%  85%   0%   0%   0%  75.0°C  
11:02:26: 1416MHz  4.10  76%   3%  72%   0%   0%   0%  75.0°C  
11:02:42: 1512MHz  3.84  81%   2%  78%   0%   0%   0%  71.2°C  
11:03:01: 1200MHz  3.94  88%   2%  86%   0%   0%   0%  75.4°C  
11:03:19: 1416MHz  3.94  92%   2%  89%   0%   0%   0%  75.0°C  
11:03:36: 1416MHz  3.82  89%   3%  86%   0%   0%   0%  75.0°C  
11:03:53: 1512MHz  4.02  88%   2%  85%   0%   0%   0%  75.0°C  
11:04:10: 1416MHz  4.02  94%   4%  90%   0%   0%   0%  74.2°C  
11:04:28: 1200MHz  3.94  90%   2%  87%   0%   0%   0%  75.8°C  

##########################################################################

Since the monitoring output only takes samples every few seconds it can't spot short peaks or dips, so we also try to report cpufreq statistics (if available). This might look like this:

Aforementioned Tinkerboard even clocked down shortly to 816 MHz
##########################################################################

Throttling statistics (time spent on each cpufreq OPP):

1800 MHz:  173.64 sec
1704 MHz:   68.23 sec
1608 MHz:   65.95 sec
1512 MHz:  157.73 sec
1416 MHz:  131.39 sec
1200 MHz:  101.11 sec
1008 MHz:   12.39 sec
 816 MHz:    0.20 sec
 696 MHz:       0 sec
 600 MHz:       0 sec
 408 MHz:       0 sec

##########################################################################

Allwinner H5 throttling just for a short amount of time
##########################################################################

Throttling statistics (time spent on each cpufreq OPP):

1368 MHz:  672.97 sec
1296 MHz:    3.62 sec
1200 MHz:       0 sec
1056 MHz:       0 sec
 816 MHz:       0 sec
 648 MHz:       0 sec
 480 MHz:       0 sec

##########################################################################
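Statistics like these can be derived from the kernel's cpufreq `time_in_state` file, which lists one `<frequency in kHz> <time in 10 ms units>` pair per line. The following sketch converts that into a similar layout; the function name and the `policy0` path are assumptions, and this is not claimed to be sbc-bench's own implementation:

```shell
#!/bin/sh
# Sketch: convert cpufreq time_in_state content (lines of "<kHz> <ticks>",
# one tick being 10 ms) into a "MHz: seconds" listing, highest OPP first.
# Reads from stdin so it can be fed from any policy's stats file.
format_time_in_state() {
    awk '{ printf "%4d MHz: %7.2f sec\n", $1 / 1000, $2 / 100 }' | sort -rn
}

# Path is an assumption; it only exists if the kernel exposes cpufreq stats.
stats=/sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state
if [ -r "$stats" ]; then
    format_time_in_state < "$stats"
fi
```

Since the counters accumulate from boot, a before/after difference is needed to get the time spent per OPP during one benchmark run.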

On Raspberries there's another problem: the ARM cores have no idea at which frequency they run, since clockspeeds and throttling are handled in the closed source ThreadX domain (for details start reading from 'The real brain of the Pi is not open source' here). As such, sbc-bench also queries ThreadX in monitoring mode and lists fake and real frequencies next to each other:

Raspberry Pi 3B struggling with temperatures exceeding 80°C
System health while running 7-zip multi core benchmark:

Time        fake/real   load %cpu %sys %usr %nice %io %irq   Temp    VCore
23:19:21: 1200/1200MHz  1.00  12%   0%   5%   0%   5%   0%  61.2°C  1.3062V
23:19:54: 1200/1200MHz  2.17  79%   1%  78%   0%   0%   0%  68.8°C  1.3062V
23:20:25: 1200/1200MHz  2.82  90%   2%  88%   0%   0%   0%  72.0°C  1.3062V
23:20:55: 1200/1200MHz  3.21  90%   2%  88%   0%   0%   0%  73.1°C  1.3062V
23:21:28: 1200/1200MHz  3.53  87%   7%  80%   0%   0%   0%  75.2°C  1.3062V
23:22:01: 1200/1200MHz  3.62  87%  67%  20%   0%   0%   0%  78.4°C  1.3062V
23:22:37: 1200/1195MHz  3.83  86%   4%  81%   0%   0%   0%  77.9°C  1.3062V
23:23:15: 1200/1034MHz  4.07  95%   1%  94%   0%   0%   0%  81.1°C  1.3062V
23:23:50: 1200/1034MHz  3.99  91%   2%  89%   0%   0%   0%  81.7°C  1.3062V
23:24:21: 1200/1200MHz  3.38  50%   2%  48%   0%   0%   0%  76.3°C  1.3062V
23:24:59: 1200/1200MHz  3.97  74%  31%  41%   0%   0%   0%  79.5°C  1.3062V
23:25:29: 1200/1200MHz  3.84  83%   4%  78%   0%   0%   0%  79.5°C  1.3062V
23:25:59: 1200/1141MHz  3.92  83%   1%  81%   0%   0%   0%  80.6°C  1.3062V
23:26:30: 1200/1034MHz  4.00  91%   1%  89%   0%   0%   0%  81.1°C  1.3062V
23:27:00: 1200/1034MHz  4.00  90%   1%  88%   0%   0%   0%  81.7°C  1.3062V
23:27:31: 1200/1200MHz  3.54  48%   2%  46%   0%   0%   0%  77.9°C  1.3062V
23:28:24: 1200/1034MHz  3.89  84%  52%  32%   0%   0%   0%  81.7°C  1.3062V

When all benchmarks have finished we query ThreadX for throttling and under-voltage events (for the latter see below). If only thermal throttling has happened, the result looks like this:

Raspberry Pi 3B throttling and under-voltage summary since last reboot
##########################################################################

Querying ThreadX on RPi for thermal or undervoltage issues:

0100000000000000000
|||             |||_ under-voltage
|||             ||_ currently throttled
|||             |_ arm frequency capped
|||_ under-voltage has occurred since last reboot
||_ throttling has occurred since last reboot
|_ arm frequency capped has occurred since last reboot

##########################################################################

Killed CPU cores

Another attempt to cope with critical temperatures is to simply kill CPU cores to lower consumption/temps under load. Almost a decade ago Allwinner's Android kernels were (in)famous for this, and in recent years Amlogic started doing the same with their Android kernels (but at least they bring the killed CPU cores up again when temperatures settle).

That's why sbc-bench also collects dmesg output while running the benchmarks to spot such problems ruining benchmark scores:

Khadas VIM3 `dmesg` output while killing two cores at 90°C and bringing them back up again below 85°C
[ 2877.094811] thermal thermal_zone0: temp:90000 increase, hyst:5000, trip_temp:90000, hot:1
[ 2877.115730] IRQ33 no longer affine to CPU1
[ 2877.115752] IRQ53 no longer affine to CPU1
[ 2877.115755] IRQ54 no longer affine to CPU1
[ 2877.115758] IRQ55 no longer affine to CPU1
[ 2877.115822] process 11494 (cpuminer) no longer affine to cpu1
[ 2877.115856] CPU1: shutdown
[ 2877.116885] psci: CPU1 killed (polled 0 ms)
[ 2877.163235] process 11496 (cpuminer) no longer affine to cpu3
[ 2877.163264] CPU3: shutdown
[ 2877.164291] psci: CPU3 killed (polled 0 ms)
[ 2877.198777] idx > max freq
[ 2877.302771] idx > max freq
[ 2877.406789] idx > max freq
[ 2877.406809] thermal thermal_zone0: temp:84500 decrease, hyst:5000, trip_temp:90000, hot:0
[ 2877.424153] Detected VIPT I-cache on CPU1
[ 2877.424202] CPU1: update cpu_capacity 631
[ 2877.424204] CPU1: Booted secondary processor [410fd034]
[ 2877.444258] Detected VIPT I-cache on CPU3
[ 2877.444291] CPU3: update cpu_capacity 1192
[ 2877.444293] CPU3: Booted secondary processor [410fd092]
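Spotting such events in a long kernel log by eye is tedious. A minimal sketch that counts them, assuming the message wording from the VIM3 log above (other kernels may phrase these messages differently); `cpu_hotplug_events` is a hypothetical name and not how sbc-bench processes its dmesg capture:

```shell
#!/bin/sh
# Sketch: summarise CPU hotplug events from kernel log text on stdin.
# Cores coming up during normal boot are ignored: a "Booted secondary
# processor" line only counts once at least one shutdown has been seen.
cpu_hotplug_events() {
    awk '
        /CPU[0-9]+: shutdown/ { down++ }
        down && /CPU[0-9]+: Booted secondary processor/ { up++ }
        END { printf "cores killed: %d, brought back up: %d\n", down, up }
    '
}

# dmesg may need root privileges on some systems.
dmesg 2>/dev/null | cpu_hotplug_events
```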

temp_soft_limit

This is a Raspberry Pi 3B Plus and CM3+ speciality with their BCM2837B0 SoC said to clock at 1400 MHz. But unless you set temp_soft_limit=70 in config.txt the SoC will silently be limited to 1200 MHz once 60°C is reached. Since hardly anybody knows this, sbc-bench warns about it.
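To see whether the setting is present, config.txt can simply be searched for it. The sketch below is deliberately simplistic (it ignores `[section]` filters and include files) and the function name `soft_limit` is made up for illustration:

```shell
#!/bin/sh
# Sketch: print the temp_soft_limit value set in a config.txt-style file,
# or "unset" if no such line exists. Commented-out lines are skipped;
# [section] filters and includes are NOT evaluated.
# $1 = path to config.txt
soft_limit() {
    value="$(sed -n 's/^[[:space:]]*temp_soft_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" 2>/dev/null | tail -n 1)"
    echo "${value:-unset}"
}

# /boot/config.txt is the traditional location; newer Raspberry Pi OS
# releases use /boot/firmware/config.txt instead.
soft_limit /boot/config.txt
```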

Frequency capping (under-voltage)

This is another Raspberry Pi speciality caused by their (in)famous 5V powering. Ohm's law also exists in the SBC world and low voltages combined with high currents always result in a voltage drop unless you have a proper power supply with fixed cable that can compensate for this voltage drop (read as: if you buy a Pi then always buy their appropriate USB-C wall wart as well).

We query ThreadX after executing all benchmarks, and if you suffered from voltage drops (input voltage sagging below ~4.65V) then performance is ruined and it looks like this in the detailed output:

RPi reporting under-voltage and frequency capping
1010000000000000000
|||             |||_ under-voltage
|||             ||_ currently throttled
|||             |_ arm frequency capped
|||_ under-voltage has occurred since last reboot
||_ throttling has occurred since last reboot
|_ arm frequency capped has occurred since last reboot

Frequency capping is an attempt to compensate for the voltage drops and prevent the SBC from crashing. The clockspeeds of the SoC's various engines are lowered immediately (ARM cores to 600 MHz with RPi 2-4, now with RPi 5 down to 1000/1500 MHz) and performance will suffer a lot. Fortunately, for a few years now these under-voltage events have also been logged in the kernel ring buffer when running Raspberry Pi Ltd.'s kernels, so sbc-bench's detailed output will contain a section like this:

Multiple voltage drops on a RPi 4B while benchmarking
##########################################################################

dmesg output while running the benchmarks:

[ 1964.179580] hwmon hwmon1: Undervoltage detected!
[ 1974.259710] hwmon hwmon1: Voltage normalised
[ 1982.323967] hwmon hwmon1: Undervoltage detected!
[ 1988.372011] hwmon hwmon1: Voltage normalised
[ 2004.500454] hwmon hwmon1: Undervoltage detected!
[ 2014.580643] hwmon hwmon1: Voltage normalised
[ 2034.741173] hwmon hwmon1: Undervoltage detected!
[ 2050.869645] hwmon hwmon1: Voltage normalised
[ 2058.933690] hwmon hwmon1: Undervoltage detected!
[ 2066.997794] hwmon hwmon1: Voltage normalised
[ 2075.062096] hwmon hwmon1: Undervoltage detected!
[ 2081.110116] hwmon hwmon1: Voltage normalised
[ 2097.238583] hwmon hwmon1: Undervoltage detected!
[ 2101.270584] hwmon hwmon1: Voltage normalised
[ 2125.463233] hwmon hwmon1: Undervoltage detected!
[ 2133.527483] hwmon hwmon1: Voltage normalised
[ 2157.719971] hwmon hwmon1: Undervoltage detected!
[ 2163.767990] hwmon hwmon1: Voltage normalised
[ 2177.880416] hwmon hwmon1: Undervoltage detected!
[ 2183.928438] hwmon hwmon1: Voltage normalised
[ 2208.121072] hwmon hwmon1: Undervoltage detected!
[ 2214.169136] hwmon hwmon1: Voltage normalised

##########################################################################
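How much of a benchmark run was affected can be estimated from the timestamps of these messages. A rough sketch, assuming the message wording shown above and that every "Undervoltage detected!" is eventually followed by a "Voltage normalised"; the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: count under-voltage events in kernel log text on stdin and add up
# the time between each "Undervoltage detected!" and the following
# "Voltage normalised" message.
undervoltage_summary() {
    awk '
        # Pull the numeric timestamp out of the leading "[ 1234.567890]".
        function stamp(line) {
            if (match(line, /[0-9]+\.[0-9]+\]/))
                return substr(line, RSTART, RLENGTH - 1) + 0
            return 0
        }
        /Undervoltage detected!/ { events++; start = stamp($0); pending = 1 }
        /Voltage normalised/     { if (pending) { total += stamp($0) - start; pending = 0 } }
        END { printf "%d events, %.1f sec under-voltage\n", events, total }
    '
}

dmesg 2>/dev/null | undervoltage_summary
```

An event still pending at the end of the log is counted but contributes no duration.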

Frequency capping often gets confused with thermal throttling, but they are different things that can occur in parallel. In that case sbc-bench will hint at both in the detailed output:

under-voltage, frequency capping and throttling while benchmarking
1110000000000000000
|||             |||_ under-voltage
|||             ||_ currently throttled
|||             |_ arm frequency capped
|||_ under-voltage has occurred since last reboot
||_ throttling has occurred since last reboot
|_ arm frequency capped has occurred since last reboot

Silly settings

This is mostly an Armbian issue: for almost two years their OS images for Raspberries lacked arm_boost=1 (a requirement for RPi 4B with BCM2711 C0 or later to automagically increase the maximum clockspeed from 1500 MHz to 1800 MHz w/o overvolting) while at the same time setting over_voltage=2 and arm_freq=1800. That might improve performance on early RPi 4B but often does the opposite on recent RPi 4: with these silly settings the CPU gets overvolted, therefore heats up more quickly and is prone to throttling.

Additional info when in review mode (WIP)

-r/-R

NTFS filesystems

When Linux uses FUSE/userland methods to access NTFS filesystems, performance will be significantly harmed or, on the majority of SBCs, likely be bottlenecked by maxing out one or more CPU cores. When benchmarking with any NTFS it is highly advised to closely monitor CPU utilization or, better, to switch to a 'Linux native' filesystem like ext4, since that represents 'storage performance' a lot more than 'somewhat dealing with a foreign filesystem' as with NTFS.

io_is_busy

When the ondemand cpufreq governor is used it is important to tweak some of this governor's settings, especially io_is_busy. If this is set to 0 (default) then with pure I/O loads CPU clockspeeds aren't ramped up as quickly as needed, or sometimes not at all. As such I/O performance generally suffers, sometimes significantly. See here for an example of the difference that might make.
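Depending on the kernel, the ondemand tunables live either in `cpufreq/ondemand/` or per policy in `cpufreq/policy*/ondemand/`. The following sketch searches both and sets the tunable; it is an illustration with a made-up function name, needs root on a real system, and is not taken from sbc-bench:

```shell
#!/bin/sh
# Sketch: set io_is_busy=1 for every ondemand tunable found below the
# given cpufreq directory.
# $1 = cpufreq sysfs root (default: /sys/devices/system/cpu/cpufreq)
enable_io_is_busy() {
    find "${1:-/sys/devices/system/cpu/cpufreq}" -name io_is_busy 2>/dev/null |
    while read -r node; do
        # Writing requires root; report nodes that could not be changed.
        if { echo 1 > "$node"; } 2>/dev/null; then
            echo "$node: $(cat "$node")"
        else
            echo "$node: not writable" >&2
        fi
    done
}

# Usage (as root, with the ondemand governor active):
#   enable_io_is_busy
```

The setting does not survive a reboot, so it has to be reapplied from a boot script if it should be permanent.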

sbc-bench's People

Contributors

clach04, darinpp, dmole, electrified, g-provost, hydroo, igorpecovnik, lanefu, numbqq, rhjdvsgsgks, thomaskaiser, wenzhuoz, wtarreau

sbc-bench's Issues

Comparison dashboard

Is there a centralized public dashboard that aggregates user results and displays them in comparison charts, like Phoronix? Your ReadMe examples are great and the raw data posts in these issues are great and the pics folder is nice but the ideal would be a public leaderboard so testers don't have to reinvent the wheel by buying and testing hardware others have tested. Thoughts?

Great project, btw

load > 1 for 7-zip single core benchmark

Checking the section "System health while running 7-zip single core benchmark" in any result (for example: here) shows the load is greater than 1, up to the number of cores in the system, during the single core benchmark. The %cpu correctly matches 100%/[# of cores].

It appears the benchmark is being run on a single core, but somehow the benchmark is multithreaded and loading more than one core.

Is this by design? Could it be compromising the results somewhat?

Jetson Xavier AGX

I have noticed that the Jetson Xavier AGX is not represented in your results.
If you are interested, here are some results running Nvidia Jetpack 4.3

| Jetson-AGX | 2265600 MHz | 4.9 | Ubuntu 18.04.3 LTS arm64 | 21590 | 2742 | 853250 | 10910 | 22520 | 26.57 |

http://ix.io/4ebH

ASUS Tinker Board 2

| ASUS Tinker Board 2/2S | 2016/1512 MHz | 4.4 | Debian GNU/Linux 10 (buster) arm64 | 5920 | 365270 | 1136950 | 3210 | 7500 | 10.66 | http://ix.io/40Nw |
| ASUS Tinker Board 2/2S | 2016/1512 MHz | 4.4 | Debian GNU/Linux 10 (buster) arm64 | 6100 | 364430 | 1141770 | 3180 | 7500 | 10.96 | http://ix.io/40NG |
| ASUS Tinker Board 2/2S | 2016/1512 MHz | 4.4 | Debian GNU/Linux 10 (buster) arm64 | 6080 | 361000 | 1138510 | 3170 | 7500 | 11.04 | http://ix.io/40NR |

ASUS Tinker Board 2 has been tested with stock heatsink.
OS: Tinker Board 2 Debian 10 V2.0.4

sbc-bench on arm

Recommendation: don't change the CPU governors in sbc-bench. It can report what's in use, but changing the governor makes it hard to explore this setting in A/B tests.

echo performance >/sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor

N100 with DDR5 and NVIDIA Jetson Orin Nano

N100 with DDR5 http://ix.io/4vxM
NVIDIA Jetson Orin Nano http://ix.io/4vy7

| Device / details | Clockspeed | Kernel | Distro | 7-zip multi | 7-zip single | AES | memcpy | memset | kH/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Stock MINI S / N100 | 3400 MHz | 6.1 | Ubuntu 22.04.2 LTS x86_64 | 14010 | 4020 | 1224220 | 9900 | 8900 | - |
| Loud MINI S / N100 | 3400 MHz | 6.1 | Ubuntu 22.04.2 LTS x86_64 | 14080 | 4025 | 1232700 | 9980 | 8930 | - |
| DDR5 EQ / N100 | 3400 MHz | 6.1 | Ubuntu 23.04 x86_64 | 14150 | 4073 | 1232790 | 11600 | 12270 | 36.24 |
| NVIDIA Orin Nano | 1510 MHz | 5.10 | Ubuntu 20.04.6 LTS arm64 | 13650 | 2153 | 854400 | 6730 | 20240 | 20.68 |

System sudden death when setting unsupported governor

Testing out -r mode on our RK3566 based product.

At line 709 the script matched to rkvenc's governor:

radxa@rock-3c:~$ Governors="$(find /sys -name "*governor" | grep -E -v '/sys/module|cpuidle|watchdog')"
find: ‘/sys/kernel/tracing’: Permission denied
find: ‘/sys/kernel/debug’: Permission denied
find: ‘/sys/fs/pstore’: Permission denied
find: ‘/sys/fs/bpf’: Permission denied
find: ‘/sys/fs/fuse/connections/35’: Permission denied
radxa@rock-3c:~$ echo $Governors 
/sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/governor /sys/devices/platform/fde60000.gpu/devfreq/fde60000.gpu/governor /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

This node provides AvailableGovernorsSysFSNode:

radxa@rock-3c:~$ ls /sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/
available_frequencies  device    max_freq  polling_interval  target_freq
available_governors    governor  min_freq  power             trans_stat
cur_freq               load      name      subsystem         uevent
radxa@rock-3c:~$ cat /sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/available_governors 
venc_ondemand simple_ondemand

However, since its SysFSNode does not contain the word cpufreq, it is getting the default governor resetting treatment.

I have confirmed that echo powersave | sudo tee /sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/governor can reliably bring our device down.
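One conceivable guard against this is to write a governor only if the node lists it as available. The sketch below is hypothetical and not the fix applied in sbc-bench; it assumes that cpufreq nodes expose `scaling_available_governors` and devfreq nodes `available_governors` next to the governor file, as in the directory listing above:

```shell
#!/bin/sh
# Sketch of a guard (hypothetical, not sbc-bench's actual fix): succeeds
# only if the requested governor is listed as available for the node.
# $1 = path to a governor/scaling_governor file, $2 = governor name
governor_supported() {
    dir="$(dirname "$1")"
    for list in "$dir/scaling_available_governors" "$dir/available_governors"; do
        [ -r "$list" ] || continue
        # Compare whole fields so "ondemand" does not match "simple_ondemand".
        awk -v g="$2" '{ for (i = 1; i <= NF; i++) if ($i == g) found = 1 }
                       END { exit !found }' "$list" && return 0
    done
    return 1
}

node=/sys/devices/platform/fdf40000.rkvenc/devfreq/fdf40000.rkvenc/governor
if governor_supported "$node" powersave; then
    echo "powersave is available for $node"
else
    echo "skipping $node: powersave not listed as available"
fi
```

With the rkvenc node from this report (`venc_ondemand simple_ondemand`) the check would fail for `powersave` and the write would be skipped.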

Maybe a way to do the mhz test on gpu?

Hi Tkaiser, this project is essential for understanding ARM SBCs. I was able to prove that my rk3399 was lying to me about the frequencies on Manjaro when going beyond 2.2 GHz on the big cores. I still don't understand who is the one cheating here, because on rk3399 that should be driven by the kernel, unlike on Amlogic platforms and RPi. Maybe it only happens on Armbian, since I wasn't able to get it beyond 2.2 GHz on Armbian (I mean by adding another OPP to go further).

Since the GPU seems to do the same on rk3399, i.e. not telling the truth about its frequency, it would be cool if something similar could be performed on the GPU side of things. I also noted that the RPi 4 GPU fakes its frequencies.

BR, Salvador

I will make a video mentioning this project.

OpenVPN bench

Something to consider... this is more of a system-level test, as it flexes the core and memory and checks the engines available (I suppose one could tweak this to check which engines are available and test across them using the openssl envelope option)

$ openvpn --genkey --secret /tmp/secret
$ time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-256-cbc
Fri Sep 21 16:19:38 2018 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode
Real 0m19.924s
user 0m19.858s
sys 0m0.062s
$ openssl engine
(dynamic) Dynamic engine loading support

With this example on rk3288-tinker with Armbian 5.60, the effective potential throughput is 3200 divided by the user time in seconds, so in this case it would be 161.14 Mbit/sec.
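The rule of thumb quoted here is plain arithmetic and easy to script. A small sketch, with a made-up function name, that takes the `user` time reported by `time`:

```shell
#!/bin/sh
# Rule of thumb from the post: estimated throughput in Mbit/s equals
# 3200 divided by the user time (in seconds) of the --test-crypto run.
# $1 = user time in seconds
openvpn_mbits() {
    awk -v user="$1" 'BEGIN { printf "%.2f\n", 3200 / user }'
}

# User time from the run above.
openvpn_mbits 19.858
```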

Found this over on the pfSense forums, but I suspect it's community knowledge and originator is lost to the forgetful web...

ReadSoCTemp function error with Raspberry Pi OS

I'm running SBC-Bench 0.9.9 in the latest Raspberry Pi OS on the Pi 4, and I get the following error in ReadSoCTemp due to a missing temporary file:

/home/pi/sbc-bench.sh: line 1378: /tmp/sbc-bench.sh.NPKM2L/soctemp: No such file or directory
/home/pi/sbc-bench.sh: line 1379: [: -ge: unary operator expected

Results for OnePlus 5 (`cheeseburger`)

Memory performance (all 2 CPU clusters measured individually):
memcpy: 2846.2 MB/s (Qualcomm Kryo V2)
memset: 13236.3 MB/s (Qualcomm Kryo V2)
memcpy: 9722.5 MB/s (Qualcomm Falkor V1/Kryo)
memset: 14068.2 MB/s (Qualcomm Falkor V1/Kryo)

Cpuminer total scores (5 minutes execution): 12.61,12.58,12.31,12.29,12.27,12.26,12.25,12.24,12.23,12.22,12.20,12.19,12.18,12.17,12.16,12.15,12.14,12.13,12.12,12.11,12.10,12.09,12.00 kH/s

7-zip total scores (3 consecutive runs): 10798,9517,9076, single-threaded: 2474

OpenSSL results (all 2 CPU clusters measured individually):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 140890.64k 436009.94k 895233.37k 1251617.11k 1414859.43k 1426860.71k (Qualcomm Kryo V2)
aes-192-cbc 135269.89k 393412.82k 740241.83k 970246.14k 1065869.31k 1071191.38k (Qualcomm Kryo V2)
aes-256-cbc 131737.98k 364308.61k 644709.38k 813548.89k 880197.63k 883326.98k (Qualcomm Kryo V2)

Full results uploaded to http://ix.io/4fdD

In case this device is not already represented in official sbc-bench results list then please
consider submitting it at https://github.com/ThomasKaiser/sbc-bench/issues with this line:
| OnePlus 5 | 2361600/1900800 MHz | 6.1 | Armbian 22.11.0-trunk Jammy arm64 | 9800 | 2474 | 883330 | 9720 | 14070 | 12.58 | http://ix.io/4fdD |

heztner.com CPX11 (EPYC 2, 2Core 2GB RAM) #2

| AMD EPYC Processor | no cpufreq support | 5.10 | Bullseye x86_64/amd64 | 8420 | 706250 | 873460 | 23630 | 34120 | - | http://ix.io/3K8L |

All is well in this round of bench.
Feature request: please output the result as a file too (maybe sbc-bench.log or something?). My remote connection (SSH) disconnect every n-minutes. So when left unattended, I cannot get the result from my VPS. Thanks!

sbc-bench.sh with MODE=extensive: cpuminer build error

On Raspberry Pi 4B 2GB with Ubuntu Server LTS 22.04 (64-bit), sbc-bench.sh with MODE=extensive reports:

... (can't build cpuminer) Done.

System:

Linux ROS2HH 5.15.0-1016-raspi #18-Ubuntu SMP PREEMPT Wed Sep 28 12:15:55 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

$ cd /usr/local/src/cpuminer-multi
$ sudo ./build.sh

...
crypto/blake2s.c: In function ‘blake2s’:
crypto/blake2s.c:326:9: error: size of array element is not a multiple of its alignment
  326 |         blake2s_state S[1];
      |         ^~~~~~~~~~~~~
make[2]: *** [Makefile:1645: crypto/cpuminer-blake2s.o] Error 1

Uploading the results through a proxy

Hi,

I launched your script on an OrangePi RK3399 (unfortunately behind a corporate proxy), and got this error:

Full results uploaded to <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ERROR: The requested URL could not be retrieved</title>
<style type="text/css"><!--
 --></style>
</head><body>
<div id="titles">
<h1>ERROR</h1>
<h2>The requested URL could not be retrieved</h2>
</div>
<hr>

<div id="content">
<p><b>Invalid Request</b> error was encountered while trying to process the request:</p>

<blockquote id="data">
<pre>POST / HTTP/1.1
Host: ix.io
User-Agent: curl/7.52.1
Accept: */*
Proxy-Connection: Keep-Alive
Content-Length: 32442
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------082c57537f40911f
</pre>
</blockquote>

<p>Some possible problems are:</p>
<ul>
<li><p>Missing or unknown request method.</p></li>
<li><p>Missing URL.</p></li>
<li><p>Missing HTTP Identifier (HTTP/1.0).</p></li>
<li><p>Request is too large.</p></li>
<li><p>Content-Length missing for POST or PUT requests.</p></li>
<li><p>Illegal character in hostname; underscores are not allowed.</p></li>
<li><p>HTTP/1.1 <q>Expect:</q> feature is being asked from an HTTP/1.0 software.</p></li>
</ul>

<p>Your cache administrator is <a href="mailto:root?subject=CacheErrorInfo%20-%20ERR_INVALID_REQ&amp;body=CacheHost%3A%20%0D%0AErrPage%3A%20ERR_INVALID_REQ%0D%0AErr%3A%20%5Bnone%5D%0D%0ATimeStamp%3A%20Tue,%2012%20Nov%202019%2010%3A46%3A04%20GMT%0D%0A%0D%0AClientIP%3A%2010.24.67.227%0D%0A%0D%0AHTTP%20Request%3A%0D%0APOST%20%2F%20HTTP%2F1.1%0AHost%3A%20ix.io%0D%0AUser-Agent%3A%20curl%2F7.52.1%0D%0AAccept%3A%20*%2F*%0D%0AProxy-Connection%3A%20Keep-Alive%0D%0AContent-Length%3A%2032442%0D%0AExpect%3A%20100-continue%0D%0AContent-Type%3A%20multipart%2Fform-data%3B%20boundary%3D------------------------082c57537f40911f%0D%0A%0D%0A%0D%0A">root</a>.</p>
<br>
</div>

<hr>
<div id="footer">
<p>Generated Tue, 12 Nov 2019 10:46:04 GMT by proxy (squid/3.1.10)</p>
<!-- ERR_INVALID_REQ -->
</div>
</body></html>. Please check the log for anomalies (e.g. swapping
or throttling happenend) and otherwise share this URL.

The installation of the tools went fine (I guess thanks to the http_proxy and https_proxy environment variables), but posting the results failed.

Thanks.

Sharing results for AWS Graviton

I wasn't quite sure how to format the results in order to submit a PR. I can if you like. Am I supposed to use the best number out of the group, e.g. for the 7-zip scores?

These are test results from an Amazon a1.xlarge instance, i.e. a 4-core Graviton CPU with 8 GB of RAM.

http://ix.io/2iFY

apt fails to install utils, check failed, reason - missing one of packages

Refreshing some images and found this:

root@NanoPi-M3:~# ./sbc-bench.sh -r
Starting to examine hardware/software for review purposes...

sbc-bench v0.9.36

Installing needed tools: apt -f -qq -y install lm-sensors sysstat dmidecode lshw usbutils mmc-utils stress-ng smartmontools p7zip...No 7-zip binary found and could not be installed. Aborting
root@NanoPi-M3:~# apt install p7zip
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  p7zip-full
The following NEW packages will be installed:
  p7zip
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 273 kB of archives.
After this operation, 961 kB of additional disk space will be used.
Get:1 http://ports.ubuntu.com xenial-security/universe arm64 p7zip arm64 9.20.1~dfsg.1-4.2ubuntu0.1 [273 kB]
Fetched 273 kB in 0s (1026 kB/s)
Selecting previously unselected package p7zip.
(Reading database ... 69869 files and directories currently installed.)
Preparing to unpack .../p7zip_9.20.1~dfsg.1-4.2ubuntu0.1_arm64.deb ...
Unpacking p7zip (9.20.1~dfsg.1-4.2ubuntu0.1) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up p7zip (9.20.1~dfsg.1-4.2ubuntu0.1) ...
root@NanoPi-M3:~# ./sbc-bench.sh -r
Starting to examine hardware/software for review purposes...

Average load and/or CPU utilization too high (too much background activity). Waiting...

Too busy for benchmarking: 11:18:44 up 40 min,  4 users,  load average: 0.27, 0.36, 0.70,  cpu: 8%
Too busy for benchmarking: 11:18:49 up 41 min,  4 users,  load average: 0.25, 0.35, 0.70,  cpu: 0%
Too busy for benchmarking: 11:18:54 up 41 min,  4 users,  load average: 0.23, 0.35, 0.70,  cpu: 0%
Too busy for benchmarking: 11:18:59 up 41 min,  4 users,  load average: 0.21, 0.34, 0.69,  cpu: 0%
Too busy for benchmarking: 11:19:04 up 41 min,  4 users,  load average: 0.20, 0.33, 0.69,  cpu: 0%
Too busy for benchmarking: 11:19:09 up 41 min,  4 users,  load average: 0.18, 0.33, 0.69,  cpu: 0%

sbc-bench v0.9.36

Installing needed tools: apt -f -qq -y install lm-sensors sysstat dmidecode lshw usbutils mmc-utils stress-ng smartmontools, tinymembench, ramlat, mhz, cpuminer..../sbc-bench.sh: line 2888: iostat: command not found                                                                                                                                                                        Done.
Checking cpufreq OPP. Done.
Executing tinymembench. Done.

and it failed again, this time for sysstat, which can easily be installed with apt.
The output of the install command is not checked, so the whole command failed because the mmc-utils package is missing (it is absent in old xenial). I installed everything else via apt (the error is easy to find without -qq) and built mmc-utils from source.

While it's not that hard to figure this out, there are several problems:

  • only some of the installed commands are checked (basically only 7-zip)
  • if any one of the needed packages is unavailable, the whole apt call fails
  • the script continues anyway and later fails on the missing commands
  • the error message is misleading: 7-zip is easy to add, yet nothing is reported for the mmc-utils package that could not be installed
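
The points above could be addressed by installing packages one at a time and verifying the required commands afterwards. The following is only a minimal sketch, not sbc-bench's actual code: the function names missing_commands and install_each are hypothetical, and it assumes an apt-based system.

```shell
# Hypothetical helpers, not taken from sbc-bench.sh.

# Print every command name given as argument that is not found in PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Install packages one by one so that a single unavailable package
# (such as mmc-utils on old xenial) does not abort all other installs.
# Prints the names of the packages that could not be installed.
install_each() {
    local pkg
    for pkg in "$@"; do
        apt-get -qq -y install "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

A caller could run install_each with the package list, then missing_commands with the tools it really needs (iostat, for example), and abort with a message that names exactly what is missing.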

CPU temperature during system health not detected on (some) Intel and AMD mini PCs

After realising the 'missed opportunity' (see my latest comment on 'OS and Memory Impact on Mini PC Gaming Performance') I found that the CPU temperature during system health was not reflected correctly in the latest version of ‘sbc-bench’.

As a temporary work-around I ran the following script:

# Find the hwmon node that belongs to the CPU temperature driver
if lsmod | grep -qsw "^k10temp"; then # AMD
    HWMON=$(grep --include=name -rls /sys/devices/pci* -e "^k10temp$" | sed 's?.*hwmon/??' | sed 's?/name??')
elif lsmod | grep -qsw "^coretemp"; then # Intel
    HWMON=$(ls /sys/devices/platform/coretemp.0/hwmon)
else
    HWMON=""
fi
# Only print the sed command that would patch sbc-bench.sh to symlink this sensor
if [ -f "/sys/class/hwmon/${HWMON}/temp1_input" ]; then
    echo "sed -i 's?echo 0 >?ln -fs /sys/class/hwmon/${HWMON}/temp1_input ?' sbc-bench.sh"
else
    echo "Cannot determine 'hwmon'"
fi

I'm not suggesting this as 'the fix' but rather using it to highlight where the issue is and one possible way of addressing it.

Proposal: add stockfish benchmark

From cnx-software.

The first invocation on Rock 5B in lazy mode (phoronix-test-suite benchmark pts/stockfish-1.4.0) already ended with the board freezing at the 2nd stockfish run. Attaching a fan and repeating the test ended with another freeze during the 2nd stockfish bench 128 8 24 default depth run.

The general problem was already known: on some boards the highest DRAM clock wasn't usable so far, and users needed to switch from 2112 MHz to 1560 MHz for stable operation.

My board hasn't seen any freezes at the highest DRAM clock, so this was a surprise. By updating my Armbian image to the latest version I was hoping to get the most recent boot BLOBs as part of the u-boot package. It now reads ii linux-u-boot-rock-5b-legacy 22.11.0-trunk.0106 arm64 Uboot loader 2017.09 but the problems got even worse: the board now freezes at 2112 MHz DRAM clock already during the 1st benchmark execution. Maybe @amazingfate can comment on whether my OS image is expected to run on the latest BLOBs or not?

With lower DRAM clock everything works as expected but at 2112 MHz DRAM clock the board freezes regardless of the A76's clockspeeds (and as such DVFS/consumption) so it looks solely related to DRAM clock:

A76 clock   DRAM clock   Watts    SoC temp   Nodes per second
2360 MHz    528 MHz      8-9W     40°C       3238057
2360 MHz    1068 MHz     9-10W    43.5°C     4122771
2360 MHz    1560 MHz     10-11W   46°C       4653285
2360 MHz    2112 MHz     12W      46°C       freeze
1800 MHz    2112 MHz     8-9W     39°C       freeze

With other CPU benchmarks I haven't seen consumption exceeding 9W on Rock 5B so stockfish is really a potent load generator / stability tester. On top of making heavy use of SIMD extensions it also is heavy on memory access: walking through the different DRAM clockspeeds ended up with significantly different scores: https://openbenchmarking.org/result/2211099-NE-2211093NE82

Quick check on an AMD EPYC 7232P (8C/16T) thing also hints at stockfish being more demanding than both cpuminer and 7-zip:

First chart is from a NetIO powermeter (measuring at the wall), 2nd is the server's internal BMC showing PSU1 (PSU2 is always in standby on this machine so the whole productive consumption is PSU1's thing), the last two are the BMC measurements for CPU and DRAM separately (though no idea to which number the memory controller contributes):

[Screenshot 2022-11-09 at 19:52:39 (copy)]

Determining Alder Lake N100 performance

(Continuing from Googulator/linux-rk3588-midstream#3 (comment) )

@n2qcn Thanks for providing the test result over at http://ix.io/4sQP

There are a few things I want to ask though:

  • the AZW MINI S is limited to DDR4 (SO-)DIMMs right? Is there one or two such sockets?
  • Have you looked into dealing with powercap? powercap-info -p intel-rapl should tell the limits
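
Where powercap-info is not installed, the same limits can usually be read from the kernel's powercap sysfs interface. A minimal sketch, assuming the intel_rapl driver is loaded and exposes the package zone at /sys/class/powercap/intel-rapl:0; the function name rapl_limits is hypothetical.

```shell
# Print each power limit constraint of a powercap zone in watts.
# $1: zone directory; defaults to the first RAPL package zone (assumption).
rapl_limits() {
    local zone="${1:-/sys/class/powercap/intel-rapl:0}" f name uw
    for f in "$zone"/constraint_*_power_limit_uw; do
        [ -r "$f" ] || continue
        # the matching constraint_N_name file holds e.g. long_term or short_term
        name=$(cat "${f%_power_limit_uw}_name" 2>/dev/null)
        uw=$(cat "$f")
        # sysfs reports microwatts; convert to watts with one decimal
        awk -v n="${name:-unknown}" -v uw="$uw" \
            'BEGIN { printf "%s: %.1f W\n", n, uw / 1000000 }'
    done
}
```

Calling rapl_limits without arguments prints one line per constraint, which makes it easy to compare the configured limits against the consumption seen during the multi-threaded runs.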

Single-threaded, the Gracemont core easily outperforms an A76 in e.g. RK3588 due to 3.4 GHz vs. ~2.4 GHz CPU clock (30% more with stuff like 7-ZIP, similar picture with Geekbench 6), but multi-threaded the N100 is bottlenecked by powercapping. The thermal values suggest that increasing powercap limits isn't enough since then throttling might become an issue.

But I would still love to hear whether with a few tweaks performance can be further improved (with this in mind the obvious move to (LP)DDR5 at 4800 MT/s won't be possible though, right?)

Freezes right on first start

I've tried to use this tool on a newly flashed Nano Pi NEO with Armbian (Ubuntu). All packages updated in advance. After starting this script as root (with -c option), nothing happened. I opened a separate shell and htop shows me this:

image

Doesn't look like it's actually doing something. The package installer might sit somewhere in a cage waiting for me to do something. Ctrl+C doesn't stop it.

Kendryte K510 error

/sbc-bench.sh: line 1572: 3 * 221 / 0 : division by 0 (error token is "0 ")
debian@debian:~$ cat /proc/cpuinfo
hart	: 0
isa	: rv64i2p0m2p0a2p0f2p0d2p0c2p0xv5-0p0
mmu	: sv39

hart	: 1
isa	: rv64i2p0m2p0a2p0f2p0d2p0c2p0xv5-0p0
mmu	: sv39

debian@debian:~$ uname -a
Linux debian 4.17.0 #1 SMP PREEMPT Wed Jun 22 05:46:18 AEST 2022 riscv64 GNU/Linux
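
The message comes from shell arithmetic expansion, which aborts the expression when the divisor evaluates to 0. A minimal sketch of a guard follows; safe_div and its fallback value are hypothetical and not taken from sbc-bench.sh, and the report does not show which value ends up as 0 on the K510.

```shell
# Integer division that prints a fallback instead of aborting when the
# divisor is empty, not a plain number, or zero.
# Usage: safe_div NUMERATOR DIVISOR [FALLBACK]
safe_div() {
    local num="$1" div="$2" fallback="${3:-0}"
    case "$div" in
        ''|*[!0-9]*) echo "$fallback"; return ;;   # empty or non-numeric
    esac
    if [ "$div" -eq 0 ]; then
        echo "$fallback"
    else
        echo $(( $num / $div ))
    fi
}
```

With such a guard the script could print a placeholder for the affected value and keep running instead of stopping at the arithmetic error.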

rock64 and vcgencmd

I knocked up a (poor) vcgencmd implementation for rock64 (https://github.com/clach04/rock64_vcgencmd) with very limited functionality, and it is installed in the same location Raspberry Pi software expects it to be. Because it is incomplete and there are various checks for its presence, sbc-bench runs on rock64 end up with errors, see http://ix.io/2b8U.

I'm not sure what the best way to deal with this is, but I noticed that the same check is repeated in many places rather than being done once to set a value that is tested later. I noticed $CPUs is set here: https://github.com/ThomasKaiser/sbc-bench/blob/master/sbc-bench.sh#L123 - can that be used throughout the script?

I ended up doing a quick hack:

diff sbc-bench.sh hack_sbc-bench.sh
121c121,122
<       if [ -f /usr/bin/vcgencmd ]; then
---
>       #if [ -f /usr/bin/vcgencmd ]; then
>       if [ -f /usr/bin/raspi-config ]; then
482c483,485
<       if [ -f /usr/bin/vcgencmd ]; then
---
>       # FIXME raspberrypi check here instead
>       #if [ -f /usr/bin/vcgencmd ]; then
>       if [ -f /usr/bin/raspi-config ]; then
560c563,565
<                       if [ -f /usr/bin/vcgencmd ]; then
---
>                       # FIXME raspberrypi check here instead
>                       #if [ -f /usr/bin/vcgencmd ]; then
>                       if [ -f /usr/bin/raspi-config ]; then
768c773,775
<       if [ -f /usr/bin/vcgencmd ]; then
---
>       # FIXME pi check here instead
>       #if [ -f /usr/bin/vcgencmd ]; then
>       if [ -f /usr/bin/raspi-config ]; then

but I do not recommend this approach - this was a quick-and-dirty change to make it work :-)
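
The "check once, then test a variable" idea could look like the following. This is a minimal sketch rather than a proposed patch: the function name detect_raspberry_pi, the variable IS_RPI and the use of the device-tree model string are all assumptions.

```shell
# Set IS_RPI once; later code tests the variable instead of repeating
# a file check. $1: model file, defaults to the device-tree model node.
detect_raspberry_pi() {
    local model_file="${1:-/proc/device-tree/model}"
    IS_RPI=false
    # the device-tree string is NUL terminated, so strip NUL bytes first
    if [ -r "$model_file" ] && tr -d '\0' < "$model_file" | grep -q "Raspberry Pi"; then
        IS_RPI=true
    fi
}

detect_raspberry_pi
if [ "$IS_RPI" = true ]; then
    echo "Raspberry Pi detected"
fi
```

Testing the board identity rather than the presence of a vcgencmd binary would keep a third-party vcgencmd on other boards from triggering the Raspberry Pi code paths.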

RK3399 Based TV Box - 4GB / 64GB (OC + FAN)

  • OC (1.8/2.088) within recommended voltages.
  • 50 mm heatsink + 50 mm 5 V fan, high-quality thermal compound (fan power usage: 0.93 W).
  • 5 V / 4 A high-quality power supply (3.36 W idle, 10.54 W at 100% load, including the fan's 0.93 W).
  • LPDDR3 running at 800 MHz. The DRAM chip is a 2133 MHz Samsung LPDDR3 but the processor supports only up to 933 MHz; reaching that frequency requires a custom u-boot build, which I have not managed yet.
  • Headless server (unnecessary kernel modules and hardware removed).
  • Boot from eMMC
  • Armbian 20.11 Focal with Linux 5.9.9-arm-64
# systemd-analyze : 
Startup finished in 6.290s (kernel) + 1.798s (userspace) = 8.089s
graphical.target reached after 1.764s in userspace
# time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-256-cbc
Sat Dec 19 16:37:16 2020 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode

real    0m1.571s
user    0m1.553s
sys     0m0.016s 
# time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-128-cbc
Sat Dec 19 19:36:17 2020 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode

real    0m1.524s
user    0m1.504s
sys     0m0.016s

3200 / 1.553 = 2060 Mbps for AES-256-CBC
3200 / 1.504 = 2127 Mbps for AES-128-CBC?!
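
The two figures follow from dividing the constant 3200 by the user time of each run. A small sketch of that arithmetic; the function name openvpn_mbps is hypothetical and the constant is simply the one used in the lines above.

```shell
# Estimate OpenVPN crypto throughput in Mbps as 3200 / seconds,
# truncated to a whole number like the figures quoted above.
openvpn_mbps() {
    awk -v t="$1" 'BEGIN { if (t > 0) printf "%d\n", 3200 / t; else print "n/a" }'
}

openvpn_mbps 1.553   # user time of the AES-256-CBC run
openvpn_mbps 1.504   # user time of the AES-128-CBC run
```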

Edit: Found this in RK3399 Datasheet:

Embedded dual-channel encryption and decryption engine
  • Support AES 128/192/256 bits key mode, ECB/CBC/CTR/XTS chain mode, Slave/FIFO mode
  • Support DES/3DES (ECB and CBC chain mode), 3DES (EDE/EEE key mode), Slave/FIFO mode
  • Support SHA1/SHA256/MD5 (with hardware padding) HASH function, FIFO mode only
  • Support 160-bit Pseudo Random Number Generator (PRNG)
  • Support 256-bit True Random Number Generator (TRNG)
  • Support PKA 512/1024/2048 bit Exp Modulator

# openssl engine
(dynamic) Dynamic engine loading support
# systemd-analyze blame
371ms networking.service
357ms dev-mmcblk2p2.device
341ms ssh.service
230ms chrony.service
216ms systemd-logind.service
207ms [email protected]
189ms systemd-udev-trigger.service
120ms sysfsutils.service
118ms systemd-journald.service
 98ms e2scrub_reap.service
 82ms rsyslog.service
 74ms systemd-fsck@dev-disk-by\x2dlabel-BOOT_EMMC.service
 57ms systemd-udevd.service
 53ms armbian-hardware-optimize.service
 47ms dev-hugepages.mount
 46ms dev-mqueue.mount
 44ms sys-kernel-debug.mount
 40ms systemd-tmpfiles-setup.service
 37ms armbian-hardware-monitor.service
 37ms systemd-sysctl.service
 35ms fake-hwclock.service
 35ms sysstat.service
 33ms kmod-static-nodes.service
 32ms systemd-remount-fs.service
 30ms rc-local.service
 28ms systemd-tmpfiles-setup-dev.service
 28ms systemd-user-sessions.service
 28ms systemd-sysusers.service
 26ms systemd-update-utmp.service
 24ms systemd-update-utmp-runlevel.service
 22ms systemd-modules-load.service
 21ms boot.mount
 21ms [email protected]
 20ms systemd-journal-flush.service
 19ms setvtrgb.service
 14ms ifupdown-pre.service
 12ms sys-kernel-config.mount
  9ms tmp.mount
# iperf3 -c 192.168.20.150 -d -P 4
[...]
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   9.00-10.00  sec  28.0 MBytes   235 Mbits/sec
[  6]   9.00-10.00  sec  28.2 MBytes   237 Mbits/sec
[  8]   9.00-10.00  sec  28.1 MBytes   236 Mbits/sec
[ 10]   9.00-10.00  sec  28.2 MBytes   237 Mbits/sec
[SUM]   9.00-10.00  sec   113 MBytes   945 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  sender
[  4]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  receiver
[  6]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  sender
[  6]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  receiver
[  8]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  sender
[  8]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  sender
[ 10]   0.00-10.00  sec   282 MBytes   236 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  1.10 GBytes   945 Mbits/sec                  sender
[SUM]   0.00-10.00  sec  1.10 GBytes   945 Mbits/sec                  receiver

iperf Done.

http://ix.io/2ICt
