burntsushi / cargo-benchcmp Goto Github PK

View Code? Open in Web Editor NEW

341.0 341.0 21.0 303 KB

A small utility to compare Rust micro-benchmarks.

License: The Unlicense

Rust 99.82% Makefile 0.18%

cargo-benchcmp's People

Contributors

Stargazers

Watchers

Forkers

apanatshka llogiq fitzgen wdv4758h robmurtha hsivonen mgeisler robinst alexheretic kevincox stevepentland samcday ssomers cuviper isgasho standardgalactic playfloor atouchet swatinem

cargo-benchcmp's Issues

Add suppor for fractional `libtest` output

rust-lang/rust#124774 added support for outputting fractional ns values, which are not parsed properly (yet).

(un)License issue

I noticed this project is Dual-licensed under MIT or the UNLICENSE. That is very generous. However you seem to use crates which are not UNLICENSE'd. I believe that is a problem.

The UNLICENSE includes this text:

Anyone is free to [...] distribute this software, either in source code
form or as a compiled binary, [...]

There are no conditions to this distribution. Now look at prettytable-rs which uses a BSD license which includes:

Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

This is not compatible. You are using prettytable-rs which requires a copyright notice in binary distributions, however the Unlicense allows distributions without copyright notice.

There is no issue with source-code distributions, because the source-code of cargo-benchcmp does not include the source-code of any of the crates it uses. There is also no issue to use a compiled binary, but it will be unredistributable.

This also applies to regex, rustc-serialize and lazy-static.rs which are in the Cargo.toml too.

I believe you should either drop the Unlicense, or stop using crates which do not use the Unlicense.

Windows pipes in UTF-16

One windows, piping the benchmarks like cargo bench > old will save the results in UTF-16. Running cargo benchcmp will then fail with "stream did not contain valid UTF-8".

Include speedup in output

Currently the output include the diff in ns and the diff in %. Here is an example in one of my projects:

 name                                        899c01d.txt ns/iter     0014b09.txt ns/iter     diff ns/iter   diff % 
 argon::forces                               2,380,532 (+/- 2265)    495,065 (+/- 8025)        -1,885,467  -79.20% 
 propane::forces                             1,383,833 (+/- 1628)    446,348 (+/- 26241)         -937,485  -67.75% 
 propane::virial                             1,501,534 (+/- 2487)    1,459,726 (+/- 1119)         -41,808   -2.78% 
 water::cache_move_all_rigid_molecules_wolf  6,998,308 (+/- 877669)  6,531,980 (+/- 809654)      -466,328   -6.66%

What do you think about including speedup too? I can be computed by 1 / (1 + diff%). For example, the output could look like

 name                                        899c01d.txt ns/iter     0014b09.txt ns/iter     diff ns/iter   diff %  speedup
 argon::forces                               2,380,532 (+/- 2265)    495,065 (+/- 8025)        -1,885,467  -79.20%  x 4.81
 propane::forces                             1,383,833 (+/- 1628)    446,348 (+/- 26241)         -937,485  -67.75%  x 3.10
 propane::virial                             1,501,534 (+/- 2487)    1,459,726 (+/- 1119)         -41,808   -2.78%  x 1.03
 water::cache_move_all_rigid_molecules_wolf  6,998,308 (+/- 877669)  6,531,980 (+/- 809654)      -466,328   -6.66%  x 1.07

The diff % looks better for small difference, but gets hard to interpret for big differences. In contrary, the speedup looks better for high speedups, and worst for small differences.

Include results that are not in both result sets

I have a build using cargo benchcmp. Thanks for making this! It's great!

Currently, my results look like this:

name                   previous-benchmark ns/iter  current-benchmark ns/iter  diff ns/iter  diff % 
b01_compile_trivial    2,132                       2,829                               697  32.69% 
b02_compile_large      1,594,916                   2,516,662                       921,746  57.79% 
b03_compile_huge       51,480,362                  86,575,722                   35,095,360  68.17% 
b04_compile_simple     31,399                      44,247                           12,848  40.92% 
b05_compile_slow       225,000                     289,571                          64,571  28.70% 
b06_interpret_trivial  10,325                      10,442                              117   1.13% 
b07_interpret_simple   6,740,466                   7,388,517                       648,051   9.61% 
WARNING: benchmarks in new but not in old: b01_compile_trivial_opt, b02_compile_large_opt, b03_compile_huge_opt, b04_compile_simple_opt, b05_compile_slow_opt, b06_interpret_trivial_opt, b07_interpret_simple_opt

The WARNING on the bottom is because I added a bunch of benchmarks in the pull request I'm looking at. It would be great if instead of the warning, benchcmp listed those benchmarks as part of the results. Just leave the other column blank and don't color the text (leave it white or the default). That way all the results will show up in the table, and I can still clearly see what changed from before to after.

Segmentation fault

I get a segmentation fault when running the command.

➜  agg_perf git:(agg_perf) ✗ cargo benchcmp control variable
fish: Job 1, 'cargo benchcmp control variable' terminated by signal SIGSEGV (Address boundary error)

➜  agg_perf git:(agg_perf) ✗ bash
[pascal@pascal-rogstrixg513qyg513qy agg_perf]$ cargo benchcmp control variable
Segmentation fault (core dumped)

➜  agg_perf git:(agg_perf) ✗ rustc --version
rustc 1.67.0 (fc594f156 2023-01-24)
➜  agg_perf git:(agg_perf) ✗ uname -a
Linux pascal-rogstrixg513qyg513qy 6.0.19-3-MANJARO #1 SMP PREEMPT_DYNAMIC Wed Jan 18 07:57:11 UTC 2023 x86_64 GNU/Linux

➜ coredumpctl gdb 45907
(gdb) bt
#0  0x0000561c3a520410 in prettytable::TableSlice::get_all_column_width ()
#1  0x0000561c3a52084c in prettytable::TableSlice::print_tty ()
#2  0x0000561c3a51216b in cargo_benchcmp::Args::run ()
#3  0x0000561c3a50f64d in cargo_benchcmp::main ()
#4  0x0000561c3a51e7e3 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#5  0x0000561c3a51e7f9 in _ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17hbe8c0291b1f55e55E.llvm.2442119530853037590 ()
#6  0x0000561c3a5e37ac in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/core/src/ops/function.rs:606
#7  std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:483
#8  std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:447
#9  std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:137
#10 std::rt::lang_s
[control.txt](https://github.com/BurntSushi/cargo-benchcmp/files/10742058/control.txt)
tart_internal::{closure#2} () at library/std/src/rt.rs:148
#11 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:483
#12 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:447
#13 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:137
#14 std::rt::lang_start_internal () at library/std/src/rt.rs:148
#15 0x0000561c3a513b35 in main ()

variable.txt
control.txt

Allow running as either `cargo benchcmp` or `cargo-benchcmp`

From #17:

@BurntSushi says:

You need to actually use cargo benchcmp, not cargo-benchcmp.

Yes, this is annoying and yes this should be in the README. It looks like this is Docopt's fault, and the error message is awful. :-(

@fitzgen says:

Ah because when run as cargo benchcmp you get two args and when run as cargo-benchcmp you get one? Would it be possible to detect which situation we're in before passing args to docopt and then do the Right Thing after that?

@BurntSushi says:

I'd wholeheartedly support that, yes. :-)

are colors backwards?

While the colors look pretty, it feels like they are reversed. @Apanatshka was this intentional? In particular, red feels "bad" to me, but in fact, red is good because it indicates a speedup. However, I can also see how red means "negative"...

Getting blank output, not quite sure why?

Hi there,

Thanks for writing this cool tool :) I'm new to Rust, so please excuse any stupid questions.

I'm trying to use benchcmp with my lib, Frunk, using the following:

$ rustup run nightly cargo bench > benches-control && rustup run nightly cargo bench > benches-variable
#.. then 
$ cargo benchcmp benches-control benches-variable

I understand that benchmarking on the same branch will yield zero (or close to) differences, and my intention is to use benchcmp to compare performance before and after applying a PR, so I'll switch branches in between during actual usage.

Unfortunately the second command yields no output and I'm not sure why ? My benches are here, located in a separate directory.

Thanks in advance for your help,

Lloyd

Coloured regressions/improvements

As a side-effect of switching from tabwriter to prettytable-rs, we can now do coloured terminal output. It's a small change to get red lines for regressions and green lines for improvements. It almost makes the --regressions/--improvements flags superfluous :)

I can implement this feature quite easily (I did it before but I can't find the code just now), I may be able to squeeze it into a break or do it right after work. My modem is broken so I'm currently relying on internet access at work 😐

Take account of bench error

Thanks for this utility!

I think it would be nice to indicate differences inside the control error somehow. Perhaps colour them orange/warning-colour instead of red when a subsequent bench is slower than the control but inside the error.

For example a control

test benchmark_a  ... bench:  24,000,000 ns/iter (+/- 500,000)
test benchmark_b  ... bench:  24,000,000 ns/iter (+/- 5,000)

...and the latest bench

test benchmark_a  ... bench:  24,150,000 ns/iter (+/- 480,000)
test benchmark_b  ... bench:  24,150,000 ns/iter (+/- 5,500)

Will produce this output:

 name         control ns/iter   latest ns/iter     diff ns/iter  diff %  speedup 
 benchmark_a  24,000,000        24,150,000             150,000   0.62%   x 0.99 // colour orange
 benchmark_b  24,000,000        24,150,000             150,000   0.62%   x 0.99 // colour red

benchmark_a will be a warning as the diff is inside the control error. benchmark_b will be red as before as the diff is outside the error.

Perhaps you could also allow an option, similar to --threshold, to ignore warnings/ diffs inside errors.

Filter statistically significant changes

It would be great if it was possible to filter such that only statistically significant changes were shown. This is somewhat similar to issue #4.

cargo build --locked doesn't work in 0.4.1

hey!

the latest release ships a Cargo.lock that still points to 0.4.0. cargo probably updates this automatically the next time after cargo is invoked, but the current latest release can't be built with --locked due to this.

This is commonly done in archlinux.

Display a message when --improvements or --regressions is empty

Rather than displaying an empty table (which made me think that I broke cargo benchcmp) we should display a message like "No improvements" or "No regressions".

thread 'main' panicked at 'Cannot print table to standard output : operation not supported by the terminal'

Inside Emacs's M-x shell, the cargo benchcmp command will always panic:

$ cargo benchcmp ~/scratch/before ~/scratch/after
thread 'main' panicked at 'Cannot print table to standard output : operation not supported by the terminal', /Users/fitzgen/.cargo/registry/src/github.com-1ecc6299db9ec823/prettytable-rs-0.6.5/src/lib.rs:173
stack backtrace:
   1:        0x103c9bafa - std::sys::backtrace::tracing::imp::write::h46f28e67d38b4637
   2:        0x103c9dabf - std::panicking::default_hook::{{closure}}::h1d3243f546573ff4
   3:        0x103c9cf85 - std::panicking::default_hook::h96c288d728df3ebf
   4:        0x103c9d536 - std::panicking::rust_panic_with_hook::hb1322e5f2588b4db
   5:        0x103c9d3d4 - std::panicking::begin_panic::hfbeda5aad583dc32
   6:        0x103c9d2f2 - std::panicking::begin_panic_fmt::h4fe9fb9d5109c4bf
   7:        0x103c05962 - prettytable::TableSlice::print_tty::h7f3d5c1f5cfe8a0a
   8:        0x103bff5fd - cargo_benchcmp::Args::run::h5c6cc5028d1347fb
   9:        0x103bfa6e9 - cargo_benchcmp::main::hb31fd77d8a66a286
  10:        0x103c9e07a - __rust_maybe_catch_panic
  11:        0x103c9cac6 - std::rt::lang_start::haaae1186de9de8cb

Ideally, this would check if pretty colors or whatever are supported (which M-x shell actually will support, so maybe just removing whatever check will work?) and not use them if they aren't, rather than panicking.

Pick the best of multiple runs

One relatively simple way to paper over variability of benchmarks (for example cpu warmup-related things) is to pick the best time of multiple runs. The tool could allow having multiple input files for both before and after.

get rid of second_law

It no longer compiles because of a misuse of as_ref in a non-generic context.

We should not be using test frameworks. Building the scaffolding manually on top of process::Command is simple. Let's just do that.

This issue needs to be fixed for the build to be unbroken.

Interest in extension of the tool?

I forked your project and ported it to Rust as an exercise. I don't consider that an improvement, since you now need to compile the tool before you can use it. So I didn't open a PR. Besides, it's in an orphan branch because it doesn't really relate strongly to the Python stuff.

But then I had an idea to add to the tool and I did it to the Rust version because I'm more familiar with that code. Are you interested in pulling that code back into this repo? I prefer contributing/collaborating, even though it's a small tool.

Use in CI

it'd be amazing to be able to use this in CI to reject pull requests that introduce performance regressions.

There are a couple of things missing that would be needed to support this

exit with error code if there are regressions. This would probably be behind a '--check' flag to be consistent with rust fmt.
filter by statistical significance (#39). Filtering by percentage is essentially meaningless. This would be essential to reduce the noise in CI. This could be configurable by either P-value or sigma, with a sensible default. Doesn't have to be anything too fancy- the difference between two averages with known variance is itself a normally distributed random variable. you just need the P-value that the distribution includes 0, or the number of standard deviations that the mean is from 0. plenty of literature out there for doing this.

N-way comparison with plots

This is the big feature I've been working on. I would link you to the branch, but my modem is broken and I forgot to bring the code/laptop to work. Which also means that I cannot currently show you pretty pictures to help make the case for this feature.

What I can do is give a quick overview of the design decisions and effects of this feature.

Effects

Generate images (a bar chart with error bars) 📊
Extra code to maintain 🐞
Dependency on gnuplot for the plotting feature 😕 (couldn't find a pure Rust lib, so I'm piping to whatever is called gnuplot on the PATH)

Design decisions

Currently produces one plot per test name since different benches can have very different values. But maybe a big plot is also nice? Do we give that as an option then, or do we always generate it?
Where to put the plots. Right now it defaults to target/benchcmp. Should it be configurable? What about the issue where you call benchcmp from a subdirectory of your project? AFAIK we can't get the project root from an environment variable.. Do we go dirty and poll the directory structure to find the Cargo.toml file?
Command line interface: my original plan was to put the original stuff under a subcommand table and this new stuff under subcommand plot, but now I'm thinking perhaps it's better to provide a separate cargo-benchplot executable?

Include complete example in documentation

It would be great if there were a complete example that shows how to generate the "old" and "new" files that get passed to cargo benchcmp old new.

I feel like a dummy because none of

$ cargo bench > old
$ cargo bench > new
$ cargo benchcmp old new

nor

$ cargo bench &> old
$ cargo bench &> new
$ cargo benchcmp old new

nor

$ cargo benchcmp ./target/release/bench-12345 ./target/release/bench-67890

are working for me. (Obviously, in a non-toy example, I'd make changes between generating old and new benchmark results).

They all spit back:

Invalid arguments.

Usage:
    cargo benchcmp [options] <old> <new>
    cargo benchcmp [options] <old> <new> <file>
    cargo benchcmp -h | --help
    cargo benchcmp --version

What am I doing wrong here? I'd be happy to submit a PR updating the docs with a complete example once I can figure out how to run cargo benchcmp properly.

use more rigourous statistics

Russ Cox has written https://godoc.org/rsc.io/benchstat which does the same thing but with better math.

Project co-ownership?

Ok, I've rewritten this issue because the original had petty complaints about things I should have expected anyway.

Are you interested in sharing project ownership with me? I have more things I'd love to contribute to the tool to make it even more useful. And I think I can learn a lot from you about Rust. I'm also willing to look into the other issues that are open right now. But I'd like to have a clear sense of the roles and the way we work together.

write tests

Set up test coverage badge

I saw another crate use a badge for test coverage using a service called Coveralls. I have no experience with them, but it looks nice. Might be a good idea when #3 is resolved?