Comments (15)
Timeout takes a soft priority over RelStDev. No exceptions are thrown. I've elaborated a bit more in haddocks:
tasty-bench/Test/Tasty/Bench.hs
Lines 881 to 889 in 06fc7c1
from tasty-bench.
It would be nice, but probably too complicated for tasty-bench to accomplish. Is there a Haskell library (snippet / function / whatever) which can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?
I was thinking about implementing something in the spirit of bcompare, which mysteriously disappeared starting from criterion-1.0.0.0. It could be a good step towards moving away from absolute numbers in benchmarks.
We can phrase it differently: is the given function distinguishable from a given asymptote, up to a constant multiple, within some realistic range of inputs? Of course, a pathological example like f(n) + (εn)! may always be invented such that up to a large enough n it looks like f and then suddenly explodes. But in such cases we may as well not have a benchmark in the first place.
It would be nice, but probably too complicated for tasty-bench to accomplish. Is there a Haskell library (snippet / function / whatever) which can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?
I am not aware of any ready-made solution in Haskell (although a quick search reveals prior art in other settings). But how hard can it be?
- Apply inverses of a bunch of functions representing asymptotic classes.
- Fit a linear model to each resulting data set.
- Select a class where the error is the least.
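The three steps above can be sketched in plain Haskell. This is only an illustration: `candidates`, `fitSlope`, and `guessClass` are made-up names, not anything from tasty-bench, and a through-origin least-squares fit is one simple choice of linear model.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Candidate asymptotic classes and their representative functions.
candidates :: [(String, Double -> Double)]
candidates =
  [ ("O(n)",       id)
  , ("O(n log n)", \n -> n * logBase 2 n)
  , ("O(n^2)",     \n -> n * n)
  ]

-- Least-squares slope of a line through the origin: a = sum(x*y) / sum(x*x).
fitSlope :: [(Double, Double)] -> Double
fitSlope xys = sum [x * y | (x, y) <- xys] / sum [x * x | (x, _) <- xys]

-- Total relative error of the best through-origin fit for class f.
relError :: (Double -> Double) -> [(Double, Double)] -> Double
relError f pts = sum [abs (a * x - y) / y | (x, y) <- transformed]
  where
    transformed = [(f n, t) | (n, t) <- pts]
    a = fitSlope transformed

-- Pick the class whose linear fit leaves the least error.
guessClass :: [(Double, Double)] -> String
guessClass pts = fst (minimumBy (comparing (\(_, f) -> relError f pts)) candidates)
```

For example, `guessClass [(n, 3 * n * logBase 2 n) | n <- [256, 512 .. 4096]]` selects `"O(n log n)"`.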
This can be flaky, since some classes are very similar. But we do not really need that much: we only need to tell whether a function plausibly may be in a given class, up to this much error. We really only want to automate and make precise the judgement that we already routinely make with the naked eye. And we already put benchmarks in a test suite; it is a logical next step to make them actually detect failures.
I understand that this would be experimental and anything can go wrong. And we do not have any doctors of statistics on the payroll. So maybe you are right and we should not even try.
It does not take a PhD in statistics, but probably involves opinionated design choices and extra dependencies. tasty-bench is a good citizen of the tasty ecosystem, so one can potentially write another TestReporter, which will perform this kind of analysis. Additional metadata can be passed via a dummy pattern in after. I'm interested to see how this can play out, because I had similar desires in the past.
Actually, even a standalone package that can guess the asymptotics of a function would be nice to have.
I made some progress with bcompare atop tasty's facilities for test dependencies. Documentation is still absent, but here is an example of usage: Bodigrim/mod@cb4b737. It generates a report comparing benchmark results with each other:
Sum
  Data.Mod:           OK (1.46s)
    483 ms ± 41 ms
  Data.Mod.Word:      OK (2.18s)
    145 ms ± 8.2 ms, 0.30x
  finite-field:       OK (12.76s)
    4.26 s ± 133 ms, 8.81x
  finite-typelits:    OK (13.34s)
    4.46 s ± 254 ms, 9.25x
  modular-arithmetic: OK (8.63s)
    2.87 s ± 91 ms, 5.95x
  modular:            OK (45.02s)
    6.53 s ± 543 ms, 13.52x
With bcompare the original example can look like below, making it easier to guesstimate algorithmic complexity:
All
  folds
    foldl'
      1:     OK (3.86s)
        14 ns ± 434 ps, 0.00x
      2:     OK (4.87s)
        18 ns ± 500 ps, 0.00x
      4:     OK (14.16s)
        26 ns ± 586 ps, 0.01x
      8:     OK (6.25s)
        46 ns ± 1.5 ns, 0.01x
      16:    OK (11.60s)
        86 ns ± 1.8 ns, 0.02x
      32:    OK (11.08s)
        165 ns ± 4.8 ns, 0.04x
      64:    OK (4.95s)
        292 ns ± 8.7 ns, 0.07x
      128:   OK (9.77s)
        582 ns ± 12 ns, 0.14x
      256:   OK (9.34s)
        1.1 μs ± 34 ns, 0.26x
      512:   OK (9.43s)
        2.2 μs ± 70 ns, 0.53x
      1024:  OK (4.45s)
        4.2 μs ± 132 ns
      2048:  OK (19.41s)
        8.9 μs ± 313 ns, 2.11x
      4096:  OK (4.64s)
        18 μs ± 654 ns, 4.15x
      8192:  OK (9.36s)
        35 μs ± 677 ns, 8.38x
      16384: OK (4.68s)
        71 μs ± 2.3 μs, 16.90x
      32768: OK (9.45s)
        144 μs ± 3.1 μs, 34.11x
      65536: OK (37.45s)
        283 μs ± 3.5 μs, 67.20x
So, it is comparing everything to the value at 1024? Nice!
But there are two drawbacks:
- It is even more numbers on the screen for my tired eyes to look at.
- The way to define it is via some obscure reference language? I use awk occasionally, but this is, like, anti-Haskell.
So, it is sort of great, but also sort of not.
Yes, tasty uses awk expressions for references between tests. The output above was generated from:
bgroup "foldl'" $
  map
    (\s ->
      (if S.length s == 1024
         then id
         else bcompare "$NF == \"1024\" && $(NF-1) == \"foldl'\"") $
        bench (show $ S.length s) $
          nf (S.foldl' (\acc x -> acc + fromIntegral x) (0 :: Int)) s)
    foldInputs
This is arguably messy, because we stretch tasty to its limits. It would be nice to provide more ergonomic combinators.
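One possible direction: a small helper could assemble the awk pattern from a group name and a benchmark name, so users never write the quoting by hand. `vsPattern` is a hypothetical combinator, not part of tasty-bench's API:

```haskell
-- Hypothetical helper, not in tasty-bench: build the awk reference
-- pattern matching benchmark 'name' inside group 'group'.
vsPattern :: String -> String -> String
vsPattern group name =
  "$NF == \"" ++ name ++ "\" && $(NF-1) == \"" ++ group ++ "\""
```

With such a helper, the reference above could be spelled `bcompare (vsPattern "foldl'" "1024")` instead of an inline awk string.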
I came here with the same feature request :-)
I think what's missing in tasty-bench is a way to time something without wrapping it in a TestTree/Benchmark.
Something like measure :: Benchmarkable -> IO Timings, where Timings would contain the summary statistics (mean, sd) as well as the raw measurements that were used to come up with these statistics.
Then downstream packages could use this to do some more intelligent analyses. (I'd have a go at one.)
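As a rough sketch of what such an API could look like: this is an assumption, not tasty-bench's design. `Timings`, `timeOnce`, and `measure` are invented names, only base's System.CPUTime is used, and details like clock choice and outlier handling are glossed over.

```haskell
import Control.Monad (replicateM)
import System.CPUTime (getCPUTime)

-- Hypothetical result type: summary statistics plus raw samples.
data Timings = Timings
  { tMean  :: Double   -- seconds
  , tStdev :: Double   -- seconds
  , tRaw   :: [Double] -- individual measurements, seconds
  }

-- Time a single run of an action using CPU time from base.
timeOnce :: IO () -> IO Double
timeOnce act = do
  before <- getCPUTime
  act
  after <- getCPUTime
  pure (fromIntegral (after - before) * 1e-12) -- picoseconds -> seconds

-- Run the action n times and summarise.
measure :: Int -> IO () -> IO Timings
measure n act = do
  ts <- replicateM n (timeOnce act)
  let m = sum ts / fromIntegral n
      s = sqrt (sum [(t - m) ^ (2 :: Int) | t <- ts] / fromIntegral n)
  pure (Timings m s ts)
```

A downstream analysis package could then work with the raw samples in `tRaw` rather than only the mean and deviation.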
@kindaro: see here for the explanation of why awk is used for patterns in tasty.
@UnkindPartition what if I expose something like this?
tasty-bench/Test/Tasty/Bench.hs
Lines 788 to 790 in 1aac590
Would it be enough?
It would also be nice to have a way to estimate a "good enough" number of runs, as described in the "Statistical model" section of the README. Otherwise that logic would have to be duplicated between packages.
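To make the concern concrete, the doubling strategy from the README could be approximated downstream along these lines. This is a sketch under assumptions, not tasty-bench's exact algorithm; `timeBatch` and `measureUntilStable` are invented names:

```haskell
import System.CPUTime (getCPUTime)

-- Time n iterations of an action and return the per-iteration estimate.
timeBatch :: Int -> IO () -> IO Double
timeBatch n act = do
  before <- getCPUTime
  mapM_ (const act) [1 .. n]
  after <- getCPUTime
  pure (fromIntegral (after - before) * 1e-12 / fromIntegral n)

-- Keep doubling the batch size until two successive per-iteration
-- estimates agree within the requested relative deviation (or we give up).
measureUntilStable :: Double -> IO () -> IO Double
measureUntilStable relTarget act = do
  first <- timeBatch 1 act
  go 1 first
  where
    go n prev = do
      cur <- timeBatch (2 * n) act
      let rel = abs (cur - prev) / max (abs cur) 1e-12
      if rel < relTarget || n >= 1048576
        then pure cur
        else go (2 * n) cur
```

Having one blessed implementation of this loop exported from tasty-bench would indeed save every consumer from reinventing it.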
@UnkindPartition how about this?
tasty-bench/Test/Tasty/Bench.hs
Lines 847 to 850 in ee6b2d6
This looks good. Would be good to document what the timeout does, exactly. (Would it use the max number of iterations that fit into the timeout even if RelStDev is not satisfied? Would it throw an exception?)
Does this get to be wielded automatically when a benchmark runs in the presence of saved results from a previous run, so that the results of a benchmark can be committed to version control, thereby at once turning it into a true test suite? Also, can we have a default name for benchmark files, so that cabal test just works? Or is this a fantasy yet?
@kindaro not sure what exactly you refer to by "this". Since measureCpuTime is now publicly exposed, one can write a tasty provider to detect and check asymptotic complexity. I think that such a provider is unlikely to be merged into tasty-bench itself, but the beauty of tasty is that you can freely combine providers and ingredients at your own discretion.
You can specify a default name for the CSV report like this:
import Data.Maybe
import Test.Tasty.Bench
import Test.Tasty.Ingredients
import Test.Tasty.Options
import Test.Tasty.Runners
-- 'benches' is your benchmark tree, e. g. benches = bgroup "All" [...]
main :: IO ()
main = do
  opts <- parseOptions benchIngredients benches
  let opts' = changeOption (Just . fromMaybe (CsvPath "foo.csv")) opts
  case tryIngredients benchIngredients opts' benches of
    Nothing -> putStrLn "no ingredient"
    Just mb -> do
      b <- mb
      putStrLn $ if b then "Success" else "Failure"