
Comments (15)

Bodigrim commented on May 30, 2024

Timeout takes a soft priority over RelStDev. No exceptions are thrown. I've elaborated a bit more in the haddocks:

-- | An internal routine to measure execution time in seconds
-- for a given timeout (put 'NoTimeout', or 'mkTimeout' 100000000 for 100 seconds)
-- and a target relative standard deviation
-- (put 'RelStDev' 0.05 for 5% or 'RelStDev' (1/0) to run only one iteration).
--
-- 'Timeout' takes soft priority over 'RelStDev': this function prefers
-- to finish in time, even at the cost of precision. However, the timeout
-- is guidance, not a guarantee: 'measureCpuTime' can take longer if there
-- is not enough time to run at least thrice or an iteration takes
-- unusually long.

from tasty-bench.

Bodigrim commented on May 30, 2024

It would be nice, but probably too complicated for tasty-bench to accomplish. Is there a Haskell library (snippet / function / whatever) that can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?

I was thinking about implementing something in the spirit of bcompare, which mysteriously disappeared starting from criterion-1.0.0.0. It could be a good step toward moving away from absolute numbers in benchmarks.

kindaro commented on May 30, 2024

We can phrase it differently: is the given function distinguishable from a given asymptote, up to a constant multiple, within some realistic range of inputs? Of course, a pathological example like f (n) + (εn)! may always be invented, such that it looks like f up to a large enough n and then suddenly explodes. But in such cases we may as well not have a benchmark in the first place.


It would be nice, but probably too complicated for tasty-bench to accomplish. Is there a Haskell library (snippet / function / whatever) that can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?

I am not aware of any ready-made solution in Haskell (although a quick search reveals prior art in other settings). But how hard can it be?

  1. Apply inverses of a bunch of functions representing asymptotic classes.
  2. Fit a linear model to each resulting data set.
  3. Select a class where the error is the least.

This can be flaky, since some classes are very similar. But we do not really need that much. We only need to tell whether a function may plausibly be in a given class, up to this much error. We really only want to automate and make precise the judgement that we already routinely make with the naked eye. And we already put benchmarks in a test suite; it is a logical next step to make them actually detect failures.
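
The three steps above can be sketched with plain base Haskell. Everything below is a hypothetical illustration: the class table, the fitting code and the names `fitResidual` and `guessClass` are made up for the example, not taken from any existing library.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Residual sum of squares after fitting y ~ a*x + b by least squares.
fitResidual :: [Double] -> [Double] -> Double
fitResidual xs ys = sum [ (y - (a * x + b)) ^ (2 :: Int) | (x, y) <- zip xs ys ]
  where
    n  = fromIntegral (length xs)
    mx = sum xs / n
    my = sum ys / n
    a  = sum [ (x - mx) * (y - my) | (x, y) <- zip xs ys ]
       / sum [ (x - mx) ^ (2 :: Int) | x <- xs ]
    b  = my - a * mx

-- Candidate asymptotic classes, each represented by the transformation
-- applied to the input size before the linear fit (step 1).
classes :: [(String, Double -> Double)]
classes =
  [ ("O(n)",       id)
  , ("O(n log n)", \x -> x * logBase 2 x)
  , ("O(n^2)",     \x -> x * x)
  ]

-- Steps 2 and 3: fit each transformed data set and select the class
-- with the smallest fitting error.
guessClass :: [(Double, Double)] -> String
guessClass samples = fst $ minimumBy (comparing snd)
  [ (name, fitResidual (map (g . fst) samples) (map snd samples))
  | (name, g) <- classes
  ]
```

On clean synthetic data this works as expected, e.g. `guessClass [(n, n * n) | n <- [1 .. 16]]` picks `"O(n^2)"`; noisy real measurements will of course make neighbouring classes harder to tell apart.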


I understand that this would be experimental and anything can go wrong. And we do not have any doctors of statistics on the payroll. So maybe you are right and we should not even try.

Bodigrim commented on May 30, 2024

It does not take a PhD in statistics, but it probably involves opinionated design choices and extra dependencies. tasty-bench is a good citizen of the tasty ecosystem, so one can potentially write another TestReporter which would perform this kind of analysis. Additional metadata can be passed via a dummy pattern in after. I'm interested to see how this plays out, because I have had similar desires in the past.

Actually, even a standalone package that can guess the asymptotics of a function would be nice to have.

Bodigrim commented on May 30, 2024

I made some progress with bcompare atop tasty's facilities for test dependencies. Documentation is still absent, but here is an example of usage: Bodigrim/mod@cb4b737. It generates a report comparing benchmark results with each other:

  Sum
    Data.Mod:           OK (1.46s)
      483 ms ±  41 ms
    Data.Mod.Word:      OK (2.18s)
      145 ms ± 8.2 ms, 0.30x
    finite-field:       OK (12.76s)
      4.26 s ± 133 ms, 8.81x
    finite-typelits:    OK (13.34s)
      4.46 s ± 254 ms, 9.25x
    modular-arithmetic: OK (8.63s)
      2.87 s ±  91 ms, 5.95x
    modular:            OK (45.02s)
      6.53 s ± 543 ms, 13.52x

Bodigrim commented on May 30, 2024

With bcompare the original example can look like below, making it easier to guesstimate algorithmic complexity:

All
  folds
    foldl'
      1:     OK (3.86s)
         14 ns ± 434 ps, 0.00x
      2:     OK (4.87s)
         18 ns ± 500 ps, 0.00x
      4:     OK (14.16s)
         26 ns ± 586 ps, 0.01x
      8:     OK (6.25s)
         46 ns ± 1.5 ns, 0.01x
      16:    OK (11.60s)
         86 ns ± 1.8 ns, 0.02x
      32:    OK (11.08s)
        165 ns ± 4.8 ns, 0.04x
      64:    OK (4.95s)
        292 ns ± 8.7 ns, 0.07x
      128:   OK (9.77s)
        582 ns ±  12 ns, 0.14x
      256:   OK (9.34s)
        1.1 μs ±  34 ns, 0.26x
      512:   OK (9.43s)
        2.2 μs ±  70 ns, 0.53x
      1024:  OK (4.45s)
        4.2 μs ± 132 ns
      2048:  OK (19.41s)
        8.9 μs ± 313 ns, 2.11x
      4096:  OK (4.64s)
         18 μs ± 654 ns, 4.15x
      8192:  OK (9.36s)
         35 μs ± 677 ns, 8.38x
      16384: OK (4.68s)
         71 μs ± 2.3 μs, 16.90x
      32768: OK (9.45s)
        144 μs ± 3.1 μs, 34.11x
      65536: OK (37.45s)
        283 μs ± 3.5 μs, 67.20x
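
For the foldl' series above one does not even need a fitting routine to see the linearity: once the size is large enough that constant overhead is negligible, the time roughly doubles whenever the size doubles. A throwaway, stdlib-only check (times transcribed from the tail of the output above, in nanoseconds):

```haskell
-- (size, time in ns) pairs copied from the foldl' output above.
samples :: [(Int, Double)]
samples =
  [ (1024, 4200), (2048, 8900), (4096, 18000), (8192, 35000)
  , (16384, 71000), (32768, 144000), (65536, 283000)
  ]

-- Ratio of each measurement to the previous one; O(n) predicts ~2
-- whenever the input size doubles.
growthRatios :: [(Int, Double)] -> [Double]
growthRatios xs = zipWith (\(_, a) (_, b) -> b / a) xs (tail xs)

-- All ratios fall in a narrow band around 2, consistent with O(n).
looksLinear :: Bool
looksLinear = all (\r -> r > 1.8 && r < 2.3) (growthRatios samples)
```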

kindaro commented on May 30, 2024

So, it is comparing everything to the value at 1024? Nice!

But there are two drawbacks:

  1. It is even more numbers on the screen for my tired eyes to look at.
  2. The way to define it is via some obscure reference language? I use awk occasionally but this is, like, anti-Haskell.

So, it is sort of great, but also sort of not.

Bodigrim commented on May 30, 2024

  1. Yes, tasty uses awk expressions for references between tests. The output above was generated from

bgroup "foldl'" $
  map
    (\s ->
       (if S.length s == 1024 then id else bcompare "$NF == \"1024\" && $(NF-1) == \"foldl'\"") $
         bench (show $ S.length s) $
           nf (S.foldl' (\acc x -> acc + fromIntegral x) (0 :: Int)) s)
    foldInputs

This is arguably messy, because we are stretching tasty to its limits. It would be nice to provide more ergonomic combinators.

UnkindPartition commented on May 30, 2024

I came here with the same feature request :-)

I think what's missing in tasty-bench is a way to time something without wrapping it in a TestTree/Benchmark.

Something like measure :: Benchmarkable -> IO Timings, where Timings would contain the summary statistics (mean, sd) as well as the raw measurements that were used to come up with these statistics.

Then downstream packages could use this to do some more intelligent analyses. (I'd have a go at one.)

@kindaro: see here for the explanation of why awk is used for patterns in tasty.

Bodigrim commented on May 30, 2024

@UnkindPartition what if I expose something like this?

-- | Low-level routine to measure execution time in seconds
-- of a given number of consecutive benchmark runs.
measureTime :: Word64 -> Benchmarkable -> IO Double

Would it be enough?

UnkindPartition commented on May 30, 2024

It would also be nice to have a way to estimate a "good enough" number of runs, as described in the "Statistical model" section of the README. Otherwise that logic would have to be duplicated between packages.
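
The doubling strategy from that README section can be sketched in a few lines of stdlib-only Haskell. This is an illustration of the idea, not tasty-bench's actual implementation: batch sizes double, and the deviation of a batch from twice the previous batch serves as a crude error estimate.

```haskell
import System.CPUTime (getCPUTime)

-- CPU time of n consecutive runs of an action, in seconds
-- (getCPUTime reports picoseconds).
timeBatch :: Int -> IO () -> IO Double
timeBatch n act = do
  start <- getCPUTime
  mapM_ (const act) [1 .. n]
  end <- getCPUTime
  pure (fromIntegral (end - start) / 1e12)

-- Double the batch size until the crude error estimate falls below the
-- target relative standard deviation; return the mean time per run.
adaptiveMeasure :: Double -> IO () -> IO Double
adaptiveMeasure targetRelStDev act = do
  t1 <- timeBatch 1 act
  go 1 t1
  where
    go n tPrev = do
      tCur <- timeBatch (2 * n) act
      -- With noiseless timings, tCur would equal 2 * tPrev exactly.
      let sigma = abs (tCur - 2 * tPrev)
      if tCur > 0 && sigma < targetRelStDev * tCur
        then pure (tCur / fromIntegral (2 * n))
        else go (2 * n) tCur
```

A real implementation also has to honour a timeout, which is exactly what the measureCpuTime signature discussed in this thread bundles in.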

Bodigrim commented on May 30, 2024

@UnkindPartition how about this?

-- | An internal routine to measure execution time in seconds
-- for a given timeout (put 'NoTimeout', or 'mkTimeout' 100000000 for 100 seconds)
-- and a target relative standard deviation (put 'RelStDev' 0.05 for 5%).
measureCpuTime :: Timeout -> RelStDev -> Benchmarkable -> IO Double

UnkindPartition commented on May 30, 2024

This looks good. It would be good to document what exactly the timeout does. (Would it use the maximum number of iterations that fit into the timeout even if RelStDev is not satisfied? Would it throw an exception?)

kindaro commented on May 30, 2024

Does this get wielded automatically when a benchmark runs in the presence of saved results from a previous run, so that benchmark results can be committed to version control, thereby at once turning the benchmarks into a true test suite? Also, can we have a default name for benchmark files, so that cabal test just works? Or is this a fantasy yet?

Bodigrim commented on May 30, 2024

@kindaro I am not sure what exactly you are referring to by "this". Since measureCpuTime is now publicly exposed, one can write a tasty provider to detect and check asymptotic complexity. I think such a provider is unlikely to be merged into tasty-bench itself, but the beauty of tasty is that you can freely combine providers and ingredients at your own discretion.

You can specify a default name for the CSV report like this:

import Data.Maybe
import Test.Tasty.Bench
import Test.Tasty.Ingredients
import Test.Tasty.Options
import Test.Tasty.Runners

main :: IO ()
main = do
  opts <- parseOptions benchIngredients benches
  let opts' = changeOption (Just . fromMaybe (CsvPath "foo.csv")) opts
  case tryIngredients benchIngredients opts' benches of
    Nothing -> putStrLn "no ingredient"
    Just mb -> do
      b <- mb
      putStrLn $ if b then "Success" else "Failure"
