burntsushi / cargo-benchcmp Goto Github PK
View Code? Open in Web Editor NEWA small utility to compare Rust micro-benchmarks.
License: The Unlicense
A small utility to compare Rust micro-benchmarks.
License: The Unlicense
rust-lang/rust#124774 added support for outputting fractional ns
values, which are not parsed properly (yet).
I noticed this project is Dual-licensed under MIT or the UNLICENSE. That is very generous. However you seem to use crates which are not UNLICENSE'd. I believe that is a problem.
The UNLICENSE includes this text:
Anyone is free to [...] distribute this software, either in source code
form or as a compiled binary, [...]
There are no conditions to this distribution. Now look at prettytable-rs which uses a BSD license which includes:
Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
This is not compatible. You are using prettytable-rs which requires a copyright notice in binary distributions, however the Unlicense allows distributions without copyright notice.
There is no issue with source-code distributions, because the source-code of cargo-benchcmp does not include the source-code of any of the crates it uses. There is also no issue to use a compiled binary, but it will be unredistributable.
This also applies to regex, rustc-serialize and lazy-static.rs which are in the Cargo.toml too.
I believe you should either drop the Unlicense, or stop using crates which do not use the Unlicense.
One windows, piping the benchmarks like cargo bench > old
will save the results in UTF-16. Running cargo benchcmp
will then fail with "stream did not contain valid UTF-8".
Currently the output include the diff in ns and the diff in %. Here is an example in one of my projects:
name 899c01d.txt ns/iter 0014b09.txt ns/iter diff ns/iter diff %
argon::forces 2,380,532 (+/- 2265) 495,065 (+/- 8025) -1,885,467 -79.20%
propane::forces 1,383,833 (+/- 1628) 446,348 (+/- 26241) -937,485 -67.75%
propane::virial 1,501,534 (+/- 2487) 1,459,726 (+/- 1119) -41,808 -2.78%
water::cache_move_all_rigid_molecules_wolf 6,998,308 (+/- 877669) 6,531,980 (+/- 809654) -466,328 -6.66%
What do you think about including speedup too? I can be computed by 1 / (1 + diff%)
. For example, the output could look like
name 899c01d.txt ns/iter 0014b09.txt ns/iter diff ns/iter diff % speedup
argon::forces 2,380,532 (+/- 2265) 495,065 (+/- 8025) -1,885,467 -79.20% x 4.81
propane::forces 1,383,833 (+/- 1628) 446,348 (+/- 26241) -937,485 -67.75% x 3.10
propane::virial 1,501,534 (+/- 2487) 1,459,726 (+/- 1119) -41,808 -2.78% x 1.03
water::cache_move_all_rigid_molecules_wolf 6,998,308 (+/- 877669) 6,531,980 (+/- 809654) -466,328 -6.66% x 1.07
The diff % looks better for small difference, but gets hard to interpret for big differences. In contrary, the speedup looks better for high speedups, and worst for small differences.
I have a build using cargo benchcmp
. Thanks for making this! It's great!
Currently, my results look like this:
name previous-benchmark ns/iter current-benchmark ns/iter diff ns/iter diff %
b01_compile_trivial 2,132 2,829 697 32.69%
b02_compile_large 1,594,916 2,516,662 921,746 57.79%
b03_compile_huge 51,480,362 86,575,722 35,095,360 68.17%
b04_compile_simple 31,399 44,247 12,848 40.92%
b05_compile_slow 225,000 289,571 64,571 28.70%
b06_interpret_trivial 10,325 10,442 117 1.13%
b07_interpret_simple 6,740,466 7,388,517 648,051 9.61%
WARNING: benchmarks in new but not in old: b01_compile_trivial_opt, b02_compile_large_opt, b03_compile_huge_opt, b04_compile_simple_opt, b05_compile_slow_opt, b06_interpret_trivial_opt, b07_interpret_simple_opt
The WARNING on the bottom is because I added a bunch of benchmarks in the pull request I'm looking at. It would be great if instead of the warning, benchcmp listed those benchmarks as part of the results. Just leave the other column blank and don't color the text (leave it white or the default). That way all the results will show up in the table, and I can still clearly see what changed from before to after.
I get a segmentation fault when running the command.
➜ agg_perf git:(agg_perf) ✗ cargo benchcmp control variable
fish: Job 1, 'cargo benchcmp control variable' terminated by signal SIGSEGV (Address boundary error)
➜ agg_perf git:(agg_perf) ✗ bash
[pascal@pascal-rogstrixg513qyg513qy agg_perf]$ cargo benchcmp control variable
Segmentation fault (core dumped)
➜ agg_perf git:(agg_perf) ✗ rustc --version
rustc 1.67.0 (fc594f156 2023-01-24)
➜ agg_perf git:(agg_perf) ✗ uname -a
Linux pascal-rogstrixg513qyg513qy 6.0.19-3-MANJARO #1 SMP PREEMPT_DYNAMIC Wed Jan 18 07:57:11 UTC 2023 x86_64 GNU/Linux
➜ coredumpctl gdb 45907
(gdb) bt
#0 0x0000561c3a520410 in prettytable::TableSlice::get_all_column_width ()
#1 0x0000561c3a52084c in prettytable::TableSlice::print_tty ()
#2 0x0000561c3a51216b in cargo_benchcmp::Args::run ()
#3 0x0000561c3a50f64d in cargo_benchcmp::main ()
#4 0x0000561c3a51e7e3 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#5 0x0000561c3a51e7f9 in _ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17hbe8c0291b1f55e55E.llvm.2442119530853037590 ()
#6 0x0000561c3a5e37ac in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/core/src/ops/function.rs:606
#7 std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:483
#8 std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:447
#9 std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:137
#10 std::rt::lang_s
[control.txt](https://github.com/BurntSushi/cargo-benchcmp/files/10742058/control.txt)
tart_internal::{closure#2} () at library/std/src/rt.rs:148
#11 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:483
#12 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:447
#13 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:137
#14 std::rt::lang_start_internal () at library/std/src/rt.rs:148
#15 0x0000561c3a513b35 in main ()
From #17:
@BurntSushi says:
You need to actually use
cargo benchcmp
, notcargo-benchcmp
.Yes, this is annoying and yes this should be in the README. It looks like this is Docopt's fault, and the error message is awful. :-(
@fitzgen says:
Ah because when run as
cargo benchcmp
you get two args and when run ascargo-benchcmp
you get one? Would it be possible to detect which situation we're in before passing args to docopt and then do the Right Thing after that?
@BurntSushi says:
I'd wholeheartedly support that, yes. :-)
While the colors look pretty, it feels like they are reversed. @Apanatshka was this intentional? In particular, red feels "bad" to me, but in fact, red is good because it indicates a speedup. However, I can also see how red means "negative"...
Hi there,
Thanks for writing this cool tool :) I'm new to Rust, so please excuse any stupid questions.
I'm trying to use benchcmp with my lib, Frunk, using the following:
$ rustup run nightly cargo bench > benches-control && rustup run nightly cargo bench > benches-variable
#.. then
$ cargo benchcmp benches-control benches-variable
I understand that benchmarking on the same branch will yield zero (or close to) differences, and my intention is to use benchcmp to compare performance before and after applying a PR, so I'll switch branches in between during actual usage.
Unfortunately the second command yields no output and I'm not sure why ? My benches are here, located in a separate directory.
Thanks in advance for your help,
Lloyd
As a side-effect of switching from tabwriter
to prettytable-rs
, we can now do coloured terminal output. It's a small change to get red lines for regressions and green lines for improvements. It almost makes the --regressions
/--improvements
flags superfluous :)
I can implement this feature quite easily (I did it before but I can't find the code just now), I may be able to squeeze it into a break or do it right after work. My modem is broken so I'm currently relying on internet access at work 😐
Thanks for this utility!
I think it would be nice to indicate differences inside the control error somehow. Perhaps colour them orange/warning-colour instead of red when a subsequent bench is slower than the control but inside the error.
For example a control
test benchmark_a ... bench: 24,000,000 ns/iter (+/- 500,000)
test benchmark_b ... bench: 24,000,000 ns/iter (+/- 5,000)
...and the latest bench
test benchmark_a ... bench: 24,150,000 ns/iter (+/- 480,000)
test benchmark_b ... bench: 24,150,000 ns/iter (+/- 5,500)
Will produce this output:
name control ns/iter latest ns/iter diff ns/iter diff % speedup
benchmark_a 24,000,000 24,150,000 150,000 0.62% x 0.99 // colour orange
benchmark_b 24,000,000 24,150,000 150,000 0.62% x 0.99 // colour red
benchmark_a will be a warning as the diff is inside the control error. benchmark_b will be red as before as the diff is outside the error.
Perhaps you could also allow an option, similar to --threshold
, to ignore warnings/ diffs inside errors.
It would be great if it was possible to filter such that only statistically significant changes were shown. This is somewhat similar to issue #4.
hey!
the latest release ships a Cargo.lock that still points to 0.4.0. cargo probably updates this automatically the next time after cargo is invoked, but the current latest release can't be built with --locked due to this.
This is commonly done in archlinux.
Rather than displaying an empty table (which made me think that I broke cargo benchcmp
) we should display a message like "No improvements" or "No regressions".
Inside Emacs's M-x shell
, the cargo benchcmp
command will always panic:
$ cargo benchcmp ~/scratch/before ~/scratch/after
thread 'main' panicked at 'Cannot print table to standard output : operation not supported by the terminal', /Users/fitzgen/.cargo/registry/src/github.com-1ecc6299db9ec823/prettytable-rs-0.6.5/src/lib.rs:173
stack backtrace:
1: 0x103c9bafa - std::sys::backtrace::tracing::imp::write::h46f28e67d38b4637
2: 0x103c9dabf - std::panicking::default_hook::{{closure}}::h1d3243f546573ff4
3: 0x103c9cf85 - std::panicking::default_hook::h96c288d728df3ebf
4: 0x103c9d536 - std::panicking::rust_panic_with_hook::hb1322e5f2588b4db
5: 0x103c9d3d4 - std::panicking::begin_panic::hfbeda5aad583dc32
6: 0x103c9d2f2 - std::panicking::begin_panic_fmt::h4fe9fb9d5109c4bf
7: 0x103c05962 - prettytable::TableSlice::print_tty::h7f3d5c1f5cfe8a0a
8: 0x103bff5fd - cargo_benchcmp::Args::run::h5c6cc5028d1347fb
9: 0x103bfa6e9 - cargo_benchcmp::main::hb31fd77d8a66a286
10: 0x103c9e07a - __rust_maybe_catch_panic
11: 0x103c9cac6 - std::rt::lang_start::haaae1186de9de8cb
Ideally, this would check if pretty colors or whatever are supported (which M-x shell
actually will support, so maybe just removing whatever check will work?) and not use them if they aren't, rather than panicking.
One relatively simple way to paper over variability of benchmarks (for example cpu warmup-related things) is to pick the best time of multiple runs. The tool could allow having multiple input files for both before and after.
It no longer compiles because of a misuse of as_ref
in a non-generic context.
We should not be using test frameworks. Building the scaffolding manually on top of process::Command
is simple. Let's just do that.
This issue needs to be fixed for the build to be unbroken.
I forked your project and ported it to Rust as an exercise. I don't consider that an improvement, since you now need to compile the tool before you can use it. So I didn't open a PR. Besides, it's in an orphan branch because it doesn't really relate strongly to the Python stuff.
But then I had an idea to add to the tool and I did it to the Rust version because I'm more familiar with that code. Are you interested in pulling that code back into this repo? I prefer contributing/collaborating, even though it's a small tool.
it'd be amazing to be able to use this in CI to reject pull requests that introduce performance regressions.
There are a couple of things missing that would be needed to support this
This is the big feature I've been working on. I would link you to the branch, but my modem is broken and I forgot to bring the code/laptop to work. Which also means that I cannot currently show you pretty pictures to help make the case for this feature.
What I can do is give a quick overview of the design decisions and effects of this feature.
gnuplot
for the plotting feature 😕 (couldn't find a pure Rust lib, so I'm piping to whatever is called gnuplot
on the PATH
)target/benchcmp
. Should it be configurable? What about the issue where you call benchcmp
from a subdirectory of your project? AFAIK we can't get the project root from an environment variable.. Do we go dirty and poll the directory structure to find the Cargo.toml
file?table
and this new stuff under subcommand plot
, but now I'm thinking perhaps it's better to provide a separate cargo-benchplot
executable?It would be great if there were a complete example that shows how to generate the "old" and "new" files that get passed to cargo benchcmp old new
.
I feel like a dummy because none of
$ cargo bench > old
$ cargo bench > new
$ cargo benchcmp old new
nor
$ cargo bench &> old
$ cargo bench &> new
$ cargo benchcmp old new
nor
$ cargo benchcmp ./target/release/bench-12345 ./target/release/bench-67890
are working for me. (Obviously, in a non-toy example, I'd make changes between generating old and new benchmark results).
They all spit back:
Invalid arguments.
Usage:
cargo benchcmp [options] <old> <new>
cargo benchcmp [options] <old> <new> <file>
cargo benchcmp -h | --help
cargo benchcmp --version
What am I doing wrong here? I'd be happy to submit a PR updating the docs with a complete example once I can figure out how to run cargo benchcmp
properly.
Russ Cox has written https://godoc.org/rsc.io/benchstat which does the same thing but with better math.
Ok, I've rewritten this issue because the original had petty complaints about things I should have expected anyway.
Are you interested in sharing project ownership with me? I have more things I'd love to contribute to the tool to make it even more useful. And I think I can learn a lot from you about Rust. I'm also willing to look into the other issues that are open right now. But I'd like to have a clear sense of the roles and the way we work together.
I saw another crate use a badge for test coverage using a service called Coveralls. I have no experience with them, but it looks nice. Might be a good idea when #3 is resolved?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.