
Comments (10)

PragTob commented on September 2, 2024

As this will discuss API, it's also interesting to delve into #124 / keep it in mind while discussing this to keep the API reasonably consistent within itself.

Feedback is very welcome, although I know it's a long read :)

Problem/Requirements

Basically we need both:

  • invocation before/after every function invocation (i.e. every benchmarking sample)
  • invocation before/after every scenario (job + input, i.e. every distinct category that shows up in the results)

(we don't need before/after all as that can just be done in the script)

Each of those needs some way of knowing which job (and maybe which input) it is running for, as we probably don't want to run them for all of them. E.g. in my example I want to run something before the scenario only for the hound use case, not for wallaby.

Also, bonus points: before functions should have some way of passing their result to the benchmarking function, e.g. they might want to start a process and pass the pid on.
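To make that bonus-points case concrete, a rough sketch of what I mean (MyWorker and the exact shape of the hooks are made up purely for illustration):

before_each = fn ->
  # start a process the benchmark needs (MyWorker is a made-up module)
  {:ok, pid} = MyWorker.start_link()
  pid
end

# the benchmarking function would then somehow receive what before_each returned
benchmark = fn pid -> MyWorker.do_work(pid) end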

Possible solutions

Very much looking forward to other proposals and feedback

Tuple options

(Akin to #124 on some level)

The functions could be provided along with the benchmarking jobs in the initial map by using a tuple with the function and the options.

"My function" => {fn -> magic end, before_each: some_function, after_each: some_other_function, before_scenario: cool_beans}
"Other function" => fn -> dont_need_no_options end

Pros:

  • before/afters are defined together with the job they are for so no/little need to perform any kind of matching

Cons:

  • Possibly annoying if something should apply to all jobs (think cache busting)

General options

A before_each/after_each function could be defined as part of the general options; it could then be handed a Scenario-like struct with information about the job and the input to decide what to do.

before_each: fn(%{job_name: job_name, input_name: input_name}) -> if job_name == ..., do: ... end
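With the hound/wallaby example from above that could look roughly like this (the before_each option and the helper functions are assumptions for illustration, not an existing API):

Benchee.run(%{
  "hound"   => fn -> hound_benchmark() end,
  "wallaby" => fn -> wallaby_benchmark() end
}, before_each: fn %{job_name: job_name} ->
  # only the hound job needs the extra setup, wallaby is left alone
  if job_name == "hound", do: start_hound_session()
end)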

Pros:

  • Stuck together with the other options
  • Easy to do if before/after applies to all jobs

Cons:

  • Might require some matching logic if the befores/afters are only attached to certain jobs

The Job-Module

Instead of just defining a benchmarking function, one could define a whole module that optionally implements the before_/after_ callbacks.

defmodule MyJob do
  @behaviour Benchee.Job

  def before_scenario, do: :ok # something before the whole scenario
  def before_each, do: :ok     # something before every invocation
  def benchmark, do: :ok       # the actual benchmark
end

# ...

"My Job" => MyJob

Pros:

  • attached to a specific job so it's easy to do custom stuff for it
  • Introduces an (optional) module for benchmarking code, which somehow feels right

Cons:

  • bit of overhead if people want to do something before every benchmark
  • Might feel like a loss of simplicity for people

The passing of values

From before_* to the benchmarking function, and from the benchmarking function's return value to after_*.

From before_* to benchmarking function

This is the fun part. We could introduce another (optional) argument to the benchmarking function, but that'd just get confusing because it would always change based on whether an input is supplied, whether a before_* function is supplied and so on. No thanks.

We could pass the return values (and the input?) in a map so we could pattern match on it. I'd have to benchmark this, but I'm slightly cautious about it since that'd add pattern matching overhead to the function invocation, which could have a negative impact on micro benchmarks. Hell, people even benchmark pattern matching ;)

My favorite idea (so far) is to pass the input into before_scenario, treat its return value as the new input, pass that to before_each, and then pass the return value of before_each to the job function. If people just want to add stuff they can put it in a tuple, otherwise they just change the one argument etc...

From benchmarking function to after_*

Rather simply, we just take the return value (which :timer.tc gives us either way) and pass it on. Dunno if this deserves its own section. 😜
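Putting the whole flow together, it'd look roughly like this (a sketch of the idea, not actual benchee code):

scenario_arg = before_scenario.(input)      # e.g. start a server and return {input, pid}
each_arg     = before_each.(scenario_arg)   # e.g. reset some state, pass the value along
return_value = job_function.(each_arg)      # the job gets whatever before_each returned
after_each.(return_value)                   # after_* receives the job's return value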

Here, have a little fresh photo (~1 minute old!) of my little bunny overlords as thanks for reading this far!


PragTob commented on September 2, 2024

I'm gonna be taking a stab/stabs at the before_/after_scenario things :)


devonestes commented on September 2, 2024

I like where this is headed. I think that since this is such a large addition, it might make sense to try and find a sort of MVP for this feature. I would think that MVP could be to allow a user to:

  1. Define a function to run before and/or after every scenario
  2. Define a function to run before and/or after each scenario in a given job

I think we can do that with a sort of combination of two of the options you have above. I would imagine that something like the following wouldn't be too hard for a user:

Benchee.run(%{
  "job 1" => {fn -> needs_setup end, before_each: fn -> do_a_thing end, after_each: fn -> do_another_thing end},
  "job 2" => fn -> does_not_need_setup end
}, before_each: fn -> seed_process_with_data end,
   after_each: fn -> truncate_database end)

I would think that the "global" setup in this case would run before the individual setup (like they do in RSpec). That seems to make sense to me - what do you think?

One thing I feel kind of strongly about is that the Job Module is asking a lot of a user. I would bet that for 85-90% of people, most of the time they use Benchee it'll be for pretty simple usage. Having these more advanced features is really great, but it would make sense to me to optimize for the more simple use case. Basically, make Benchee.run/2 as easy as possible to use for simple benchmarks.

I think it might make sense to keep Benchee.run/2 really simple and maybe not add these kinds of advanced configuration options to that function. We already have the sort of "lower level" data transformation approach available and documented, and it might make sense to only add options there so Benchee.run/2 doesn't keep getting bigger and more complicated.

Basically, I think it's fair to tell a user "If you want to do simple benchmarks, use Benchee.run/2. If you want to start doing advanced configuration, we have another API that sits below Benchee.run/2 with much more fine grained levels of customization that you're welcome to use. It's a little more difficult to use, but it gives you much more flexibility." That way we offer the flexibility for "advanced" users without sacrificing the simplicity of Benchee.run/2 for "beginner" users.


PragTob commented on September 2, 2024

👋 Thanks for the input once again! 👍

I agree with the MVP thought. I wanna get it as far as possible before releasing because I don't wanna break the API again :D Plus, I believe it's actually not that big: functions that don't get passed any data (and have no way to pass data along) can achieve very little if people write pure functions.

Ah, it took me a while to realize what you want to do... so before_each really means before_each, but you can supply it both in the global options and for an individual job? Cool beans 👍. That actually sounds more complicated to implement, but seems far more user friendly than either repeating it all over or having to switch on different scenarios/inputs all over.

Global setup vs. individual setup: global setup definitely runs first! 👍

Job Module: maybe I didn't make my intention clear or I misunderstood the comment; the module would be optional, as in normally you can just hand it a function and be done with it. If you want to do fancier stuff, you provide a module. However, another downside that I see now is that the module would often be defined in a separate file, which makes just posting the benchmarking code harder. Also, if we go with "optional tuples" for the formatters in #124, then I think doing about the same thing makes sense here.

I like the idea of an advanced interface versus a beginner interface :) Although the current "advanced" interface is very verbose and also more aimed at people who wanna substitute or enhance a whole step. If we can still comfortably fit something into Benchee.run then I'd do that :)

Omnomnom changes incoming


PragTob commented on September 2, 2024

I'll try to give this a somewhat time boxed shot :) 🚀


PragTob commented on September 2, 2024

Ok and I hit the first fun edge case...

When iterations are too fast we run them multiple times to get some sort of sensible measurement. Now, we have 2 options:

  • run before_each once before the batch of repeated invocations, meaning we won't exactly run it before each function invocation
  • run before_each before every invocation, thereby adding its runtime to the measured time

I absolutely wanna avoid the latter option personally, but one way or another it's broken when we have to run the function multiple times. Either we run before_each just once before a "batch" of invocations or the before_each time gets added to the measured times. 😓
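Roughly, the two options look like this (repeat_n is what we already use for repeating fast functions; the hook placement is the part up for debate, so treat this as a sketch):

# Option 1: before_each runs once per batch, outside the measurement
before_each.()
{batch_time, _} = :timer.tc(fn -> RepeatN.repeat_n(function, num_iterations) end)

# Option 2: before_each runs before every invocation, but inside the measurement
{inflated_time, _} = :timer.tc(fn ->
  RepeatN.repeat_n(fn ->
    before_each.()
    function.()
  end, num_iterations)
end)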


devonestes commented on September 2, 2024

I don't think we can run before_each before the batch. It needs to actually be run before each invocation of a given function. I can see plenty of ways it would blow up for a user if we did that.

I think we can definitely avoid having the before_each run times added to the actual function run times - it'll require some additional changes. The general shape of it could be like:

def record_runtime(before_each, fun, arg, after_each) do
  before_each.()
  {run_time, _return_value} = :timer.tc(fn -> fun.(arg) end)
  after_each.()
  run_time
end

It'll take some modification, but I think something like the above might work. This will definitely add significant complexity to this operation, though. I can't see any way around that.


PragTob commented on September 2, 2024

@devonestes we can't for the repeated case :( What you're showing is the normal case and that's what I do, but for repeating it's different...

The whole reason we do repeated measurements is that we can't get a good measurement for just one iteration through :timer.tc because it is too fast, so the measurements don't tell us anything. Which is why we repeat the call n times and measure the total time:

    {microseconds, _return_value} = :timer.tc fn ->
      RepeatN.repeat_n(function, num_iterations)
    end

If we also wanted to repeat the before_each, it has to be within that repeat block. Chopping it up and adding the individual measurements together won't help because then the measurements will still be wrong.
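For illustration, the "chopping it up" variant would look roughly like this, and every inner :timer.tc call is then back to measuring a single invocation that is too fast to measure sensibly, which is exactly the problem the repetition exists to work around:

total_microseconds =
  Enum.reduce(1..num_iterations, 0, fn _, acc ->
    before_each.()
    {microseconds, _return_value} = :timer.tc(fn -> function.() end)
    acc + microseconds
  end)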


PragTob commented on September 2, 2024

Updated with progress :)


PragTob commented on September 2, 2024

And we're all done 🎉

