
poop's Introduction

Performance Optimizer Observation Platform

Stop flushing your performance down the drain.

Overview

This command line tool uses Linux's perf_event_open functionality to compare the performance of multiple commands with a colorful terminal user interface.

[screenshot: poop's colorful terminal output comparing two commands]

Usage

Usage: poop [options] <command1> ... <commandN>

Compares the performance of the provided commands.

Options:
 --duration <ms>    (default: 5000) how long to repeatedly sample each command

Building from Source

Tested with Zig 0.11.0-dev.3883+7166407d8.

zig build
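
The resulting binary lands in zig-out/bin (assuming the default install prefix), so a quick smoke test looks like:

./zig-out/bin/poop 'echo hello' 'echo world'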

Comparison with Hyperfine

Poop (so far) is brand new, whereas Hyperfine is a mature project with more configuration options and generally more polish.

However, poop does report peak memory usage as well as five hardware counters, which I personally find useful when doing performance testing. Hey, maybe it will inspire the Hyperfine maintainers to add the extra data points!

Poop does not run the commands in a shell. This has the upside of not including shell spawning noise in the data points collected, and the downside of not supporting strings inside the commands.

Poop treats the first command as a reference and the subsequent ones relative to it, giving the user the choice of the meaning of the coloring of the deltas. Hyperfine always prints the wall-clock-fastest command first.

While Hyperfine is cross-platform, Poop is Linux-only.

poop's People

Contributors

andrewrk, candrewlee14, figsoda, mlugg, moderation, nektro, vesim987, xdbronch


poop's Issues

machine-friendly output format

Having some common machine-readable format(s) would be helpful for analyzing results. There should be a format that is easy for plotting packages like gnuplot and matplotlib to consume - I expect CSV (which is a subset of the gnuplot text data format) is the right starting point for this.

There might be other formats of interest, maybe JSON that looks something like:

{
    "name": "Benchmark name",
    "optimize": "ReleaseFast",
    "target": "x86-64-linux-gnu",
    "summary": {
        "mean_wall_time": {
            "value": 123438432,
            "stddev": 1234,
            "min": 12314566,
            "max": 2431248383,
        },
        "other_summary_statistic": {
            "value": 12345,
            "stddev": 123432
            "min": 23432,
            "max": 12382132
        }
    },
    "samples": [
        { "wall_time": 123455677, "peak_rss": 1234, "cpu_cycles": 123, "instructions": 1232, "cache_references": 12321, "cache_misses": 234232, "branch_misses": 128234 },
        { "wall_time": 123455677, "peak_rss": 1234, "cpu_cycles": 123, "instructions": 1232, "cache_references": 12321, "cache_misses": 234232, "branch_misses": 128234 },
    ]
}

JSON is pretty verbose as a storage format, but it would compress well.
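
As a sketch of the CSV starting point (column names are illustrative, one row per sample, mirroring the fields in the JSON above):

wall_time,peak_rss,cpu_cycles,instructions,cache_references,cache_misses,branch_misses
123455677,1234,123,1232,12321,234232,128234
123455677,1234,123,1232,12321,234232,128234

A header row like this can be consumed directly by matplotlib/numpy (e.g. numpy.genfromtxt(..., delimiter=",", names=True)) and skipped in gnuplot.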

Build Instructions are poor

  1. Some OSes, including mine, still ship zig-0.9.1, which does not work for building poop.
  2. I was not able to find the specified, tested Zig version at https://ziglang.org/download/.
  3. zig-0.11.0-dev.3786+8dcb4a3dc (for linux_x86_64 in my case) works. I couldn't find a different 0.11 version for direct download.
  4. Build instructions can be found at https://ziglang.org/learn/getting-started/#direct-download, and they are honestly a little awkward to find. On the other hand, one only needs to extract the archive with xz --decompress and add the extracted directory to PATH. I suggest mentioning this.
  5. Users of poop who aren't Zig devs likely do not want to set up a nightly build / nightly download of a prebuilt binary, and the version available for download keeps changing. That is a tough situation and should at least be mentioned.
  6. I did not check whether some zig-0.10 works, but using a more stable version might be more convenient for users of poop who aren't Zig devs.

install poop

Am I supposed to manually download poop from the releases page?

Is there a way to install poop via e.g. a package manager, so I can easily update it without re-downloading manually? Any recommended way to install it?

It would be great to have a few words about installation in the README.

reached unreachable code (in VM)

I just tried running this in a VM where the counters are unavailable and got the following output:

$ ./poop echo
poop [1/1] 'echo'... thread 305373 panic: reached unreachable code
Unable to dump stack trace: debug info stripped
Aborted (core dumped)

I was hoping I might use some functionality of the program even when counters are unavailable. I am assuming the problem is caused by the lack of counters; in an attempt to learn more, I tried to build this with zig 0.10 on nix but got an error about std.Build not being defined.

Welch T-Test

https://en.wikipedia.org/wiki/Welch%27s_t-test

It's basically a way to get a single number you can look up in a table to get the probability that your change made a measurable difference.
Computing t is not difficult, but getting from t to a probability is a little trickier; I'd have to look that part up again, and I don't want to state something wrong.
It would be a good idea to implement this end to end, so that people can use it without a statistics background or a lookup table for t values next to their keyboard.

SciPy's implementation returns this as the p-value.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

It's probably most useful for programs with a large variance in runtime, when you want to test whether a change made a measurable difference with only a few dozen runs instead of thousands to be confident.

The reason I mentioned it in the stream is that someone brought up checking whether a value is within a standard deviation of the other.
This is basically a more sophisticated version of that.
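
For reference, Welch's t statistic and the Welch–Satterthwaite approximation of the degrees of freedom are computed from the two sample means, variances, and run counts:

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},
\qquad
\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}

The tricky step mentioned above is then evaluating the CDF of Student's t distribution with ν degrees of freedom at |t| to get the p-value; that is the part SciPy handles internally.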

add page faults

This is part of rusage. Potentially helpful snippet:

const builtin = @import("builtin");

pub inline fn getPageFaults(rus: ResourceUsageStatistics) ?usize {
    switch (builtin.os.tag) {
        .linux, .macos, .ios => {
            if (rus.rusage) |ru| {
                // ru_minflt counts minor page faults, i.e. those serviced
                // without disk I/O.
                return @as(usize, @intCast(ru.ru_minflt));
            } else {
                return null;
            }
        },
        .windows => {
            if (rus.rusage) |ru| {
                return ru.PageFaultCount;
            } else {
                return null;
            }
        },
        else => return null,
    }
}
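
Assuming this follows the shape of the existing getMaxRss helper in std.ChildProcess.ResourceUsageStatistics, the rusage field is only populated after the child has been waited on (on Linux, wait4 fills it in), so the count would cover the entire run of the command.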

Use some max number of sigfigs

Keeping some max number of sigfigs would be useful to keep columns a consistent comptime-known width. This would make it easier to align columns centrally while maintaining their alignment across benchmarks.
Maybe this requires something like scientific notation. Relevant for #24.

The other reason it'd be useful is that in some cases I get max RSS = "2M" for two different commands, but one is reported to be about 10% lower. The rounding is hiding important info; it'd be nice to see "2.14M" instead.

I'm not sure exactly how many sigfigs would be best, and the ideal number might differ per measurement, but I think 3 or 4 would be a good start.

Why the difference between the counter values reported by poop and perf-stat?

For example:

% perf stat -r5 -e instructions,cycles,cache-references,cache-misses,branches,branch-misses ls
build.zig  LICENSE  README.md  src  zig-cache  zig-out
build.zig  LICENSE  README.md  src  zig-cache  zig-out
build.zig  LICENSE  README.md  src  zig-cache  zig-out
build.zig  LICENSE  README.md  src  zig-cache  zig-out
build.zig  LICENSE  README.md  src  zig-cache  zig-out

 Performance counter stats for 'ls' (5 runs):

         1.696.322      instructions              #    1,10  insn per cycle           ( +-  0,21% )
         1.432.362      cycles                                                        ( +-  9,94% )
           144.573      cache-references                                              ( +-  3,93% )
            35.966      cache-misses              #   26,049 % of all cache refs      ( +-  1,02% )
           363.919      branches                                                      ( +-  0,17% )
            10.495      branch-misses             #    2,89% of all branches          ( +-  1,89% )

          0,001857 +- 0,000170 seconds time elapsed  ( +-  9,18% )

% ./zig-out/bin/poop ls                           
Benchmark 1 (3316 runs): ls
  measurement          mean ± σ            min … max           outliers
  wall_time          1.45ms ±  102us     723us … 2.27ms        323 (10%)        
  peak_rss           2.86MB ± 60.2KB    2.72MB … 2.99MB          0 ( 0%)        
  cpu_cycles          372K  ± 37.1K      347K  …  946K         361 (11%)        
  instructions        421K  ± 61.0       421K  …  421K           0 ( 0%)        
  cache_references   28.0K  ± 2.60K     22.5K  … 34.7K           0 ( 0%)        
  cache_misses       9.02K  ±  482      8.07K  … 10.8K           0 ( 0%)        
  branch_misses      4.72K  ± 67.5      3.99K  … 5.04K         175 ( 5%)

You can see above that, compared with perf stat, poop gives lower values for each hardware counter.
I'm not that familiar with the perf API or how each tool uses it, but I would have expected the same (or close enough) values from both tools when measuring the same command.

mad max mode

Fuck it, race the commands all at the same time, and kill -9 the rest after the first one finishes.

Is this useful? no.

Is this guitar useful? Also no.

[image: a guitar]

integer overflow when trying to print result of subsequent benchmark

Hey, just tried poop for the first time. I wanted to check the peak_rss of different zstd compression levels:

sudo /home/tobias/.local/bin/poop "zstd -3 -f java_error_in_idea_.hprof" "zstd -6 -f java_error_in_idea_.hprof" "zstd -9 -f java_error_in_idea_.hprof"

When finishing the second benchmark (after the third run, when it should print the results), there was a panic with integer overflow.
Running it again (after some time), it got through the second benchmark but crashed on the third 🤷‍♂️

[screenshot of the integer overflow panic]

I am using x86_64-linux-poop 0.3.0 on Ubuntu 22.04.

Let me know if I can provide you any additional information.

flag for setting command line parameters

I would like to be able to specify a set of command line arguments that get applied to all programs to be benchmarked, rather than have to include them in each executable argument. For example at the moment I tend to run benchmarks like this

poop "zig-out-v1/bin/prog arg1 arg2 arg3 arg4 arg5" "zig-out-v2/bin/prog arg1 arg2 arg3 arg4 arg5" "zig-out-v3/bin/prog arg1 arg2 arg3 arg4 arg5" "zig-out-v4/bin/prog arg1 arg2 arg3 arg4 arg5"

or, if I remember the precise way my shell does expansion I can do this:

poop zig-out-{v1,v2,v3,v4}/bin/prog" arg1 arg2 arg3 arg4 arg5"

(making sure there is no space between prog and the first ").

I propose a flag like --arg to accumulate components of argv, allowing the above to be written as:

poop zig-out-{v1,v2,v3,v4}/bin/prog --arg arg1 --arg arg2 --arg arg3 --arg arg4

The argv splitting that is currently done should be preserved; arg1, arg2, arg3, arg4 would then be appended to the argv of each command. This would still allow supplying arguments to a specific command, like this:

poop zig-out-{v1,v2,v3}/bin/prog "zig-out-v4/bin/prog --shiny-new-option" --arg arg1 --arg arg2 --arg arg3 --arg arg4

Alternatively (or in addition?), there could be an --args flag to supply whitespace separated arguments:

poop zig-out-{v1,v2,v3}/bin/prog "zig-out-v4/bin/prog --shiny-new-option" --args "arg1 arg2 arg3 arg4"

Zig standard library tests deadlock when run via `poop`

$ zig test lib/std/std.zig --zig-lib-dir lib --main-pkg-path lib/std --test-no-exec -femit-bin=test
$ poop ./test

Deadlocks for me when trying to print progress. Stacktrace from gdb:

(gdb) backtrace
#0  0x00000000011370db in os.linux.x86_64.syscall3 (number=write, arg1=2, arg2=140724114992364, arg3=4) at /home/ryan/Programming/zig/zig/lib/std/os/linux/x86_64.zig:46
#1  0x0000000001138ff6 in os.linux.write (fd=2, buf=0x7ffce2e2acec "1541\001", count=4) at /home/ryan/Programming/zig/zig/lib/std/os/linux.zig:633
#2  0x0000000000ddded1 in os.write (fd=2, bytes=...) at /home/ryan/Programming/zig/zig/lib/std/os.zig:1133
#3  0x0000000000c9a96b in fs.file.File.write (self=..., bytes=...) at /home/ryan/Programming/zig/zig/lib/std/fs/file.zig:1163
#4  0x000000000082eed9 in io.writer.Writer(fs.file.File,error{DiskQuota,FileTooBig,InputOutput,NoSpaceLeft,DeviceBusy,InvalidArgument,AccessDenied,BrokenPipe,SystemResources,OperationAborted,NotOpenForWriting,LockViolation,WouldBlock,ConnectionResetByPeer,Unexpected},(function 'write')).write (self=<error reading variable: Cannot access memory at address 0x4>, bytes=...)
    at /home/ryan/Programming/zig/zig/lib/std/io/writer.zig:17
#5  0x000000000081edee in io.writer.Writer(fs.file.File,error{DiskQuota,FileTooBig,InputOutput,NoSpaceLeft,DeviceBusy,InvalidArgument,AccessDenied,BrokenPipe,SystemResources,OperationAborted,NotOpenForWriting,LockViolation,WouldBlock,ConnectionResetByPeer,Unexpected},(function 'write')).writeAll (self=..., bytes=...)
    at /home/ryan/Programming/zig/zig/lib/std/io/writer.zig:23
#6  0x0000000000c9aa89 in fmt.formatBuf__anon_66211 (buf=..., options=..., writer=...) at /home/ryan/Programming/zig/zig/lib/std/fmt.zig:1047
#7  0x0000000000cf0e66 in fmt.formatInt__anon_68567 (value=1541, base=10 '\n', case=lower, options=..., writer=...) at /home/ryan/Programming/zig/zig/lib/std/fmt.zig:1460
#8  0x0000000001135296 in fmt.formatIntValue__anon_241031 (value=1541, options=<error reading variable: Cannot access memory at address 0x4>, 
    writer=<error reading variable: Cannot access memory at address 0x40>) at /home/ryan/Programming/zig/zig/lib/std/fmt.zig:784
#9  0x00000000011352d5 in fmt.formatValue__anon_241030 (value=1541, options=<error reading variable: Cannot access memory at address 0x4>, writer=...)
    at /home/ryan/Programming/zig/zig/lib/std/fmt.zig:733
#10 0x0000000000dda2d9 in fmt.formatType__anon_74949 (value=1541, options=<error reading variable: Cannot access memory at address 0x4>, writer=..., max_depth=3)
    at /home/ryan/Programming/zig/zig/lib/std/fmt.zig:487
#11 0x00000000011597a7 in fmt.format__anon_241736 (writer=..., args=...) at /home/ryan/Programming/zig/zig/lib/std/fmt.zig:184
#12 0x0000000000de7911 in io.writer.Writer(fs.file.File,error{DiskQuota,FileTooBig,InputOutput,NoSpaceLeft,DeviceBusy,InvalidArgument,AccessDenied,BrokenPipe,SystemResources,OperationAborted,NotOpenForWriting,LockViolation,WouldBlock,ConnectionResetByPeer,Unexpected},(function 'write')).print__anon_75732 (self=..., 
    args=<error reading variable: Cannot access memory at address 0x4>) at /home/ryan/Programming/zig/zig/lib/std/io/writer.zig:28
#13 0x0000000000cefded in debug.print__anon_68538 (args=...) at /home/ryan/Programming/zig/zig/lib/std/debug.zig:88
#14 0x000000000083ca3a in test_runner.mainTerminal () at test_runner.zig:164
#15 0x0000000000826a7b in test_runner.main () at test_runner.zig:36
#16 0x000000000080e3dd in start.posixCallMainAndExit () at /home/ryan/Programming/zig/zig/lib/std/start.zig:369
#17 0x000000000080df32 in _start () at /home/ryan/Programming/zig/zig/lib/std/start.zig:251

Works fine when run via hyperfine, so I'm considering this a poop bug rather than a Zig bug.

add an option to show ratio instead of percent delta (possibly by default)

I don't think percentages are easier to comprehend than ratios, especially when the delta is quite big. An example:

Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        26.565ms ± 3.741ms     25.269ms … 52.593ms          18 ( 5%)        0%
  peak_rss         16M ± 1K               16M … 16M                    90 (24%)        0%
  cpu_cycles       29460307 ± 352152      27881614 … 34087924          25 ( 7%)        0%
  instructions     68245274 ± 3           68245252 … 68245299           8 ( 2%)        0%
  cache_references 1905677 ± 11890        1885623 … 2050296             4 ( 1%)        0%
  cache_misses     35904 ± 994            34424 … 51464                 7 ( 2%)        0%
  branch_misses    18101 ± 75             18032 … 19280                16 ( 4%)        0%
Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        499.521ms ± 16.373ms   486.703ms … 547.971ms         2 (10%)        💩+1780.3% ±  8.6%
  peak_rss         30M ± 2K               30M … 30M                     0 ( 0%)        💩+ 89.5% ±  0.0%
  cpu_cycles       1436695570 ± 36769236  1385230017 … 1548137633       4 (19%)        💩+4776.7% ± 12.4%
  instructions     443694437 ± 8150479    433293521 … 465060803         2 (10%)        💩+550.1% ±  1.2%
  cache_references 51072489 ± 227383      50490604 … 51378709           1 ( 5%)        💩+2580.0% ±  1.2%
  cache_misses     21754058 ± 27034       21713445 … 21806249           0 ( 0%)        💩+60490.3% ±  7.5%
  branch_misses    4283745 ± 190618       4059269 … 4782911             1 ( 5%)        💩+23565.4% ± 104.1%
Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        168.705ms ± 17.803ms   149.247ms … 226.071ms         1 ( 2%)        💩+535.1% ±  7.6%
  peak_rss         17M ± 2K               17M … 17M                     0 ( 0%)        💩+  8.0% ±  0.0%
  cpu_cycles       483128706 ± 36179722   453889331 … 625311678         6 (10%)        💩+1539.9% ± 12.3%
  instructions     657956378 ± 7          657956360 … 657956401         2 ( 3%)        💩+864.1% ±  0.0%
  cache_references 5418462 ± 3474959      3555473 … 26208274            5 ( 8%)        💩+184.3% ± 18.3%
  cache_misses     563791 ± 85035         512236 … 1009351              9 (15%)        💩+1470.3% ± 23.8%
  branch_misses    1009697 ± 704651       823842 … 4319352             11 (18%)        💩+5478.0% ± 391.1%

Here is the same benchmark run with the worst one first:

Benchmark 1 (10 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        501.108ms ± 12.548ms   487.794ms … 516.363ms         0 ( 0%)        0%
  peak_rss         30M ± 1K               30M … 30M                     2 (20%)        0%
  cpu_cycles       1483284321 ± 13871550  1451769864 … 1497658793       2 (20%)        0%
  instructions     440695501 ± 8102579    432198649 … 459702251         0 ( 0%)        0%
  cache_references 51048638 ± 241437      50737084 … 51402025           0 ( 0%)        0%
  cache_misses     21761058 ± 32075       21698123 … 21794997           0 ( 0%)        0%
  branch_misses    4199931 ± 180500       4013353 … 4614098             0 ( 0%)        0%
Benchmark 2 (31 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        161.333ms ± 13.266ms   143.47ms … 192.347ms          0 ( 0%)        ⚡- 67.8% ±  1.9%
  peak_rss         17M ± 2K               17M … 17M                     0 ( 0%)        ⚡- 43.0% ±  0.0%
  cpu_cycles       461790403 ± 7477095    451871827 … 478106740         0 ( 0%)        ⚡- 68.9% ±  0.5%
  instructions     657956376 ± 5          657956369 … 657956387         0 ( 0%)        💩+ 49.3% ±  0.7%
  cache_references 3920322 ± 257185       3555405 … 4573300             0 ( 0%)        ⚡- 92.3% ±  0.4%
  cache_misses     518445 ± 4596          510041 … 528130               0 ( 0%)        ⚡- 97.6% ±  0.1%
  branch_misses    824224 ± 250           823674 … 824660               0 ( 0%)        ⚡- 80.4% ±  1.5%
Benchmark 3 (147 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        34.088ms ± 12.053ms    24.99ms … 55.173ms            0 ( 0%)        ⚡- 93.2% ±  1.5%
  peak_rss         16M ± 2K               16M … 16M                    38 (26%)        ⚡- 47.2% ±  0.0%
  cpu_cycles       29127566 ± 811043      27831022 … 30678771           0 ( 0%)        ⚡- 98.0% ±  0.1%
  instructions     68245275 ± 2           68245272 … 68245280           3 ( 2%)        ⚡- 84.5% ±  0.3%
  cache_references 1916213 ± 34330        1887064 … 2306377             1 ( 1%)        ⚡- 96.2% ±  0.1%
  cache_misses     36561 ± 922            35153 … 39289                 1 ( 1%)        ⚡- 99.8% ±  0.0%
  branch_misses    18107 ± 57             18030 … 18373                 3 ( 2%)        ⚡- 99.6% ±  0.7%

I think something like this is much easier to grok:

Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         ratio
  wall_time        26.565ms ± 3.741ms     25.269ms … 52.593ms          18 ( 5%)        1x
  peak_rss         16M ± 1K               16M … 16M                    90 (24%)        1x
  cpu_cycles       29460307 ± 352152      27881614 … 34087924          25 ( 7%)        1x
  instructions     68245274 ± 3           68245252 … 68245299           8 ( 2%)        1x
  cache_references 1905677 ± 11890        1885623 … 2050296             4 ( 1%)        1x
  cache_misses     35904 ± 994            34424 … 51464                 7 ( 2%)        1x
  branch_misses    18101 ± 75             18032 … 19280                16 ( 4%)        1x
Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         ratio
  wall_time        499.521ms ± 16.373ms   486.703ms … 547.971ms         2 (10%)        💩18.803x ±  0.086
  peak_rss         30M ± 2K               30M … 30M                     0 ( 0%)        💩 1.895x ±  0.000
  cpu_cycles       1436695570 ± 36769236  1385230017 … 1548137633       4 (19%)        💩48.767x ± 0.124
  instructions     443694437 ± 8150479    433293521 … 465060803         2 (10%)        💩6.501x ±  0.012
  cache_references 51072489 ± 227383      50490604 … 51378709           1 ( 5%)        💩26.800x ±  0.012
  cache_misses     21754058 ± 27034       21713445 … 21806249           0 ( 0%)        💩61.4903x ±  0.075
  branch_misses    4283745 ± 190618       4059269 … 4782911             1 ( 5%)        💩24.5654x ± 1.041
Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         ratio
  wall_time        168.705ms ± 17.803ms   149.247ms … 226.071ms         1 ( 2%)        💩6.351x ±  0.076
  peak_rss         17M ± 2K               17M … 17M                     0 ( 0%)        💩 1.080x ±  0.000
  cpu_cycles       483128706 ± 36179722   453889331 … 625311678         6 (10%)        💩16.399x ± 0.123
  instructions     657956378 ± 7          657956360 … 657956401         2 ( 3%)        💩9.641x ±  0.000
  cache_references 5418462 ± 3474959      3555473 … 26208274            5 ( 8%)        💩2.843x ± 0.183
  cache_misses     563791 ± 85035         512236 … 1009351              9 (15%)        💩15.703x ± 0.238
  branch_misses    1009697 ± 704651       823842 … 4319352             11 (18%)        💩55.780x ± 3.911

(I didn't properly convert the numbers on the confidence intervals to a ratio, so they'll be a bit off)

The ratio would be even easier to read (relative to the delta) if you also truncate some of the less significant figures, in which case the ratio needs fewer digits than the delta to represent the performance differences (assuming we don't want to use scientific notation for the delta).
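
For what it's worth, the two presentations convert directly: a percent delta Δ corresponds to the ratio

r = 1 + Δ/100

so the +1780.3% wall_time delta in the first listing is exactly the 18.803x in the mock-up.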

align stuff better

example:

  measurement      mean ± σ               min … max                  delta
  wall_time        1.254s ± 35.23ms       1.231s … 1.294s            ⚡-3.0%
  peak_rss         169M ± 97K             169M … 169M                  +0.4%
  cpu_cycles       4639444330 ± 42490202  4597639737 … 4682588462    ⚡-1.2%
  instructions     5998472007 ± 1915499   5996326492 … 6000010285      -0.1%
  cache_references 264926528 ± 1161752    263873524 … 266172790        -0.5%
  cache_misses     26841904 ± 359513      26436439 … 27121761        ⚡-2.1%
  branch_misses    35020581 ± 56286       34959963 … 35071194          -0.8%

It would be nice if the ± and the … lined up with each other.
