Comments (15)
Timeout takes a soft priority over RelStDev. No exceptions are thrown. I've elaborated a bit more in haddocks:
tasty-bench/Test/Tasty/Bench.hs
Lines 881 to 889 in 06fc7c1
from tasty-bench.
It would be nice, but probably too complicated for tasty-bench to accomplish. Is there a Haskell library (snippet / function / whatever) which can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?
I was thinking about implementing something in the spirit of bcompare, which mysteriously disappeared starting from criterion-1.0.0.0. It could be a good step towards moving away from absolute numbers in benchmarks.
We can phrase it differently: is the given function distinguishable from a given asymptote, up to a constant multiple, within some realistic range of inputs? Of course, a pathological example like f(n) + (εn)! may always be invented such that up to a large enough n it looks like f and then suddenly explodes. But in such cases we may as well not have a benchmark in the first place.
It would be nice, but probably too complicated for tasty-bench to accomplish. Is there a Haskell library (snippet / function / whatever) which can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?
I am not aware of any ready-made solution in Haskell (although a quick search reveals prior art in other settings). But how hard can it be?
- Apply inverses of a bunch of functions representing asymptotic classes.
- Fit a linear model to each resulting data set.
- Select a class where the error is the least.
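The three steps above can be sketched in plain Haskell. This is only an illustration: `candidates`, `fitSlope`, and `guessClass` are made-up names, not anything from tasty-bench, and a through-origin least-squares fit is one simple choice of linear model.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Candidate asymptotic classes and their representative functions.
candidates :: [(String, Double -> Double)]
candidates =
  [ ("O(n)",       id)
  , ("O(n log n)", \n -> n * logBase 2 n)
  , ("O(n^2)",     \n -> n * n)
  ]

-- Least-squares slope of a line through the origin: a = sum(x*y) / sum(x*x).
fitSlope :: [(Double, Double)] -> Double
fitSlope xys = sum [x * y | (x, y) <- xys] / sum [x * x | (x, _) <- xys]

-- Total relative error of the best through-origin fit for class f.
relError :: (Double -> Double) -> [(Double, Double)] -> Double
relError f pts = sum [abs (a * x - y) / y | (x, y) <- transformed]
  where
    transformed = [(f n, t) | (n, t) <- pts]
    a = fitSlope transformed

-- Pick the class whose linear fit leaves the least error.
guessClass :: [(Double, Double)] -> String
guessClass pts = fst (minimumBy (comparing (\(_, f) -> relError f pts)) candidates)
```

For example, `guessClass [(n, 3 * n * logBase 2 n) | n <- [256, 512 .. 4096]]` selects `"O(n log n)"`.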
This can be flaky, since some classes are very similar. But we do not really need that much: we only need to tell whether a function plausibly may be in a given class, up to this much error. We really only want to automate and make precise the judgement that we already routinely make with the naked eye. And we already put benchmarks in a test suite; it is a logical next step to make them actually detect failures.
I understand that this would be experimental and anything can go wrong. And we do not have any doctors of statistics on the payroll. So maybe you are right and we should not even try.
It does not take a PhD in statistics, but probably involves opinionated design choices and extra dependencies. tasty-bench is a good citizen of the tasty ecosystem, so one can potentially write another TestReporter, which will perform this kind of analysis. Additional metadata can be passed via a dummy pattern in after. I'm interested to see how this can play out, because I had similar desires in the past.
Actually, even a standalone package that can guess the asymptotics of a function would be nice to have.
I made some progress with bcompare atop tasty's facilities for test dependencies. Documentation is still absent, but here is an example of usage: Bodigrim/mod@cb4b737. It generates a report comparing benchmark results with each other:
Sum
  Data.Mod:           OK (1.46s)
    483 ms ± 41 ms
  Data.Mod.Word:      OK (2.18s)
    145 ms ± 8.2 ms, 0.30x
  finite-field:       OK (12.76s)
    4.26 s ± 133 ms, 8.81x
  finite-typelits:    OK (13.34s)
    4.46 s ± 254 ms, 9.25x
  modular-arithmetic: OK (8.63s)
    2.87 s ± 91 ms, 5.95x
  modular:            OK (45.02s)
    6.53 s ± 543 ms, 13.52x
With bcompare the original example can look like below, making it easier to guesstimate algorithmic complexity:
All
  folds
    foldl'
      1:     OK (3.86s)
        14 ns ± 434 ps, 0.00x
      2:     OK (4.87s)
        18 ns ± 500 ps, 0.00x
      4:     OK (14.16s)
        26 ns ± 586 ps, 0.01x
      8:     OK (6.25s)
        46 ns ± 1.5 ns, 0.01x
      16:    OK (11.60s)
        86 ns ± 1.8 ns, 0.02x
      32:    OK (11.08s)
        165 ns ± 4.8 ns, 0.04x
      64:    OK (4.95s)
        292 ns ± 8.7 ns, 0.07x
      128:   OK (9.77s)
        582 ns ± 12 ns, 0.14x
      256:   OK (9.34s)
        1.1 μs ± 34 ns, 0.26x
      512:   OK (9.43s)
        2.2 μs ± 70 ns, 0.53x
      1024:  OK (4.45s)
        4.2 μs ± 132 ns
      2048:  OK (19.41s)
        8.9 μs ± 313 ns, 2.11x
      4096:  OK (4.64s)
        18 μs ± 654 ns, 4.15x
      8192:  OK (9.36s)
        35 μs ± 677 ns, 8.38x
      16384: OK (4.68s)
        71 μs ± 2.3 μs, 16.90x
      32768: OK (9.45s)
        144 μs ± 3.1 μs, 34.11x
      65536: OK (37.45s)
        283 μs ± 3.5 μs, 67.20x
So, it is comparing everything to the value at 1024? Nice!
But there are two drawbacks:
- It is even more numbers on the screen for my tired eyes to look at.
- The way to define it is via some obscure reference language? I use awk occasionally, but this is, like, anti-Haskell.
So, it is sort of great, but also sort of not.
Yes, tasty uses awk expressions for references between tests. The output above was generated from:
bgroup "foldl'" $
  map
    (\s ->
      (if S.length s == 1024
         then id
         else bcompare "$NF == \"1024\" && $(NF-1) == \"foldl'\"") $
        bench (show $ S.length s) $
          nf (S.foldl' (\acc x -> acc + fromIntegral x) (0 :: Int)) s)
    foldInputs
This is arguably messy, because we stretch tasty to its limits. It would be nice to provide more ergonomic combinators.
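One possible direction: a small helper could assemble the awk pattern from a group name and a benchmark name, so users never write the quoting by hand. `vsPattern` is a hypothetical combinator, not part of tasty-bench's API:

```haskell
-- Hypothetical helper, not in tasty-bench: build the awk reference
-- pattern matching benchmark 'name' inside group 'group'.
vsPattern :: String -> String -> String
vsPattern group name =
  "$NF == \"" ++ name ++ "\" && $(NF-1) == \"" ++ group ++ "\""
```

With such a helper, the reference above could be spelled `bcompare (vsPattern "foldl'" "1024")` instead of an inline awk string.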
I came here with the same feature request :-)
I think what's missing in tasty-bench is a way to time something without wrapping it in a TestTree/Benchmark.
Something like measure :: Benchmarkable -> IO Timings, where Timings would contain the summary statistics (mean, sd) as well as the raw measurements that were used to come up with these statistics.
Then downstream packages could use this to do some more intelligent analyses. (I'd have a go at one.)
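As a rough sketch of what such an API could look like: this is an assumption, not tasty-bench's design. `Timings`, `timeOnce`, and `measure` are invented names, only base's System.CPUTime is used, and details like clock choice and outlier handling are glossed over.

```haskell
import Control.Monad (replicateM)
import System.CPUTime (getCPUTime)

-- Hypothetical result type: summary statistics plus raw samples.
data Timings = Timings
  { tMean  :: Double   -- seconds
  , tStdev :: Double   -- seconds
  , tRaw   :: [Double] -- individual measurements, seconds
  }

-- Time a single run of an action using CPU time from base.
timeOnce :: IO () -> IO Double
timeOnce act = do
  before <- getCPUTime
  act
  after <- getCPUTime
  pure (fromIntegral (after - before) * 1e-12) -- picoseconds -> seconds

-- Run the action n times and summarise.
measure :: Int -> IO () -> IO Timings
measure n act = do
  ts <- replicateM n (timeOnce act)
  let m = sum ts / fromIntegral n
      s = sqrt (sum [(t - m) ^ (2 :: Int) | t <- ts] / fromIntegral n)
  pure (Timings m s ts)
```

A downstream analysis package could then work with the raw samples in `tRaw` rather than only the mean and deviation.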
@kindaro: see here for the explanation of why awk is used for patterns in tasty.
@UnkindPartition what if I expose something like this?
tasty-bench/Test/Tasty/Bench.hs
Lines 788 to 790 in 1aac590
Would it be enough?
It would also be nice to have a way to estimate a "good enough" number of runs, as described in the "Statistical model" section of the README. Otherwise that logic would have to be duplicated between packages.
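To make the concern concrete, the doubling strategy from the README could be approximated downstream along these lines. This is a sketch under assumptions, not tasty-bench's exact algorithm; `timeBatch` and `measureUntilStable` are invented names:

```haskell
import System.CPUTime (getCPUTime)

-- Time n iterations of an action and return the per-iteration estimate.
timeBatch :: Int -> IO () -> IO Double
timeBatch n act = do
  before <- getCPUTime
  mapM_ (const act) [1 .. n]
  after <- getCPUTime
  pure (fromIntegral (after - before) * 1e-12 / fromIntegral n)

-- Keep doubling the batch size until two successive per-iteration
-- estimates agree within the requested relative deviation (or we give up).
measureUntilStable :: Double -> IO () -> IO Double
measureUntilStable relTarget act = do
  first <- timeBatch 1 act
  go 1 first
  where
    go n prev = do
      cur <- timeBatch (2 * n) act
      let rel = abs (cur - prev) / max (abs cur) 1e-12
      if rel < relTarget || n >= 1048576
        then pure cur
        else go (2 * n) cur
```

Having one blessed implementation of this loop exported from tasty-bench would indeed save every consumer from reinventing it.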
@UnkindPartition how about this?
tasty-bench/Test/Tasty/Bench.hs
Lines 847 to 850 in ee6b2d6
This looks good. Would be good to document what the timeout does, exactly. (Would it use the max number of iterations that fit into the timeout even if RelStDev is not satisfied? Would it throw an exception?)
Does this get to be wielded automatically when a benchmark runs in the presence of saved results from a previous run, so that the results of a benchmark can be committed to version control, thereby at once turning it into a true test suite? Also, can we have a default name for benchmark files, so that cabal test just works? Or is this a fantasy yet?
@kindaro not sure what exactly you refer to by "this". Since measureCpuTime is now publicly exposed, one can write a tasty provider to detect and check asymptotic complexity. I think that such a provider is unlikely to be merged into tasty-bench itself, but the beauty of tasty is that you can freely combine providers and ingredients at your own discretion.
You can specify a default name for the CSV report like this:
import Data.Maybe
import Test.Tasty.Bench
import Test.Tasty.Ingredients
import Test.Tasty.Options
import Test.Tasty.Runners
-- 'benches' is your benchmark tree, e. g. benches = bgroup "All" [...]
main :: IO ()
main = do
  opts <- parseOptions benchIngredients benches
  let opts' = changeOption (Just . fromMaybe (CsvPath "foo.csv")) opts
  case tryIngredients benchIngredients opts' benches of
    Nothing -> putStrLn "no ingredient"
    Just mb -> do
      b <- mb
      putStrLn $ if b then "Success" else "Failure"