
Comments (31)

pharring commented on July 28, 2024

Just thinking out loud:
I think this is what fixtures are for (setup and teardown of Fact invocations). Note that with xUnit, the class constructor and Dispose methods are called for each Fact.
Would it be too awkward to separate tests into separate classes (you could use nested classes if you need to keep the code logically grouped or need some sharing from a common outer class) based on their setup/teardown needs?

See also "collection fixtures": https://gist.github.com/bradwilson/8423477 (which, incidentally, reminds us to think about parallelism for tests that require delicate or complex setup and teardown).

roncain commented on July 28, 2024

Test fixtures could handle this too and would be more in keeping with the xunit style. We were not able to use this pattern because our tests are required to run on both CoreCLR and .NET Native (and there was minimal xunit support for .NET Native). So perhaps it is best to restate the implementation requirements for this issue to say that however setup is separated from test execution+measurement, it must work for both CoreCLR and .NET Native. We would like a single programming model for test setup, regardless of whether a test is marked [BenchMark].

Our setup story today is having each test call a helper to acquire its external resource to test -- because it worked in all xunit implementations available to us.
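
For comparison with the fixture approach, a rough sketch of that helper-based pattern (the helper and test names are invented):

using System;
using Xunit;

public class HttpEchoTests
{
    [Fact]
    public void EchoSmallPayload()
    {
        // Each test acquires its external resource through a shared helper,
        // which works the same way on CoreCLR and .NET Native.
        using (var server = TestServerHelper.AcquireEchoServer())
        {
            // ...exercise and verify against 'server'
        }
    }
}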

ericeil commented on July 28, 2024

We need to support the full xUnit execution model on .NET Native, eventually. If we have to start with perf tests, then so be it. :) I certainly don't want to end up with a Benchmark-specific setup/teardown mechanism, unless there's some semantic reason why this needs to be done differently for Benchmarks.

@roncain's mention of [Theory] reminds me that we should definitely consider supporting data-driven perf tests, though. I'll open a separate issue for that.

pharring commented on July 28, 2024

Even fixtures don't feel quite right. In Roslyn we have perf tests for things like source code formatting. The formatting operation itself is quick (tens of milliseconds) but the 'setup' is, by comparison, expensive (create a 'compilation', set the options, add some source code). I don't want the setup to be part of the measured scenario, so it has to go outside the [Benchmark] method. The fixture approach would encourage me to have one fixture or test class per Benchmark.

What if, instead, we have helper methods for benchmarks so you could indicate the region of interest? Maybe even wrap the "start/end" of a benchmark in a helper class so you could author like this:

using Xunit;
using Xunit.Performance;

class FormattingTests
{
    [Fact]
    public void VBWhitespace()
    {
        // setup...
        var workspace = CreateVBWorkspaceWithSourceAndOptions(" ... ");
        var document = workspace.GetDocument("Main.vb");

        // measure...
        Benchmark.Measure( () => {
             workspace.FormatDocument(document);
        }, BenchmarkMeasureOptions.ElapsedTime);
    }
}

Thoughts?

pharring commented on July 28, 2024

I should explain that I imagine Benchmark.Measure will have lots of overloads for configuring the iteration strategy (which metrics are used to determine the end condition; confidence level; escape values, etc.), metrics gathered, isolation strategy, etc.
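
To give a feel for the surface area that implies, a purely hypothetical set of overload shapes (none of these exist today):

public static class Benchmark
{
    // Simplest form: measure with default metrics and iteration strategy.
    public static void Measure(Action action);

    // Control the end condition, confidence level, metrics gathered, isolation, etc.
    public static void Measure(Action action, BenchmarkMeasureOptions options);

    // Async variant for awaitable scenarios.
    public static Task MeasureAsync(Func<Task> action, BenchmarkMeasureOptions options = null);
}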

There could also be a Benchmark.MeasureExternalProcess method to address #5.

ericeil commented on July 28, 2024

xUnit also already supports BeforeAfterTestAttribute, which allows "aspect"-like annotation of methods with common setup/teardown needs. Would that work?
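
For reference, BeforeAfterTestAttribute (in Xunit.Sdk) exposes overridable Before/After hooks that run around each decorated test method; the derived attribute in this sketch is invented:

using System.Reflection;
using Xunit.Sdk;

// Aspect-like setup/teardown shared across test methods.
public class WithTempDirectoryAttribute : BeforeAfterTestAttribute
{
    public override void Before(MethodInfo methodUnderTest)
    {
        // ...create per-test resources; note that only the MethodInfo is available,
        // not the test class instance.
    }

    public override void After(MethodInfo methodUnderTest)
    {
        // ...clean them up.
    }
}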

pharring commented on July 28, 2024

For some things, that might work - assuming all the setup/teardown is expressible that way. e.g. it doesn't appear to be possible to access non-static fields of the containing class from within the "Before" or "After" methods, so I don't think the formatting example above could be done.

As a fallback you can always write a single benchmark per test class. That way you can use the instance constructor and IDisposable.Dispose methods to do setup/teardown. But I hope that's the exception and not the norm.
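
Spelled out, that fallback might look like the following, reusing the illustrative helper and type names from the formatting sketch above:

using System;
using Xunit.Performance;

// One benchmark per test class: the constructor and Dispose carry the setup/teardown,
// so the [Benchmark] method body contains only the measured operation.
public class VBWhitespaceFormattingBenchmark : IDisposable
{
    private readonly Workspace _workspace;
    private readonly Document _document;

    public VBWhitespaceFormattingBenchmark()
    {
        _workspace = CreateVBWorkspaceWithSourceAndOptions(" ... ");
        _document = _workspace.GetDocument("Main.vb");
    }

    [Benchmark]
    public void FormatDocument()
    {
        _workspace.FormatDocument(_document);
    }

    public void Dispose()
    {
        _workspace.Dispose();
    }
}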

roncain commented on July 28, 2024

I'd like to add a couple other items to consider:
First issue:
Many of the OSS projects we will test generate .NET Native libraries, and we would like to have a single strategy for setting up tests that works for CoreCLR as well as .NET Native, including normal OuterLoop, perf, and stress. But we don't currently have a story to run the "real" xunit in .NET Native against the OSS test sources. We have a sort of fake xunit, but it doesn't support fixtures or even test class instantiation. I know it is not the charter of this xunit-performance project to solve the .NET Native xunit issue. I'm just observing that a proposal to use the normal xunit setup strategies won't work in .NET Native yet, and this will be an issue for some of us.

Second issue:
In addition to a setup phase, many cross-machine tests also have a separate "warm up" phase before the measurements are taken. Though you could argue it is part of setup, it is more natural to be able to express this directly on the test (e.g. [BenchMark(Warmup_Time_Seconds=500)]).

Combining the 1st and 2nd issues, it sounds important to allow some extensibility mechanisms here. If [BenchMark] can be designed to allow for a method to be called before measurements are taken, we can handle setup and warmup in ways that work best for us. So please consider adding to [BenchMark] something like the [Theory] concepts of being able to supply the name of a method or class responsible for doing work before measurements are taken.
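
To make the request concrete, one purely hypothetical shape for that extensibility point (neither the attribute properties nor their behavior exist today):

public class CrossMachineStressTests
{
    // Hypothetical: the attribute names a method to run before any measurement is taken,
    // much as [MemberData(nameof(...))] names a data source for a [Theory],
    // plus a declarative warm-up window.
    [Benchmark(SetupMethod = nameof(WarmUpConnections), WarmupTimeSeconds = 500)]
    public void EchoUnderLoad()
    {
        // ...only the measured work
    }

    public static void WarmUpConnections()
    {
        // ...open connections, prime caches, JIT the hot paths, etc.
    }
}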

pharring commented on July 28, 2024

Based on the requirements, I don't think an attribute is going to cut it. The anatomy of a benchmark looks like this:

  1. One-time setup
  2. Per-iteration setup
  3. Per-iteration timed action
  4. Per-iteration clean-up
  5. One-time clean-up

You could make these all parameters of a super-duper Benchmark attribute, finding a way to tie it all together, but it wouldn't be 'natural'. As @vancem said yesterday, we should use the power of the language (inheritance, generics, async, etc.) rather than trying to build our own system.

I want to be able to do something like this:

using Xunit.Performance;
using Microsoft.LoadTest; // making this up

class MyPerfTests
{
    [Fact]
    async void LoadTest1()
    {
        using (var clients = await ClientFarm.CreateClientsAsync(100))
        {
           await Benchmark.IterateAsync(
              setupAction: (int iteration) => clients.InitAsync($"MyLoadTest{iteration}"),
              timedAction: (int iteration) => clients.RunTestAsync(),
              cleanupAction: (int iteration) => clients.ResetAsync(),
              strategy: BenchmarkIterationStrategy.MovingAverage,
              options: BenchmarkOptions.ElapsedTime | BenchmarkOptions.IgnoreGC);
        } // end using
    }
}

Several things to note:

  1. The method is async
  2. There is per-test setup and teardown via the using statement (makes for a nice example, but it could be as complicated as you like)
  3. There is per-iteration setup and cleanup in addition to the 'timed' region.
  4. The three actions passed to IterateAsync are all task-returning actions (Func<int, Task>)

pharring commented on July 28, 2024

BTW, I think "options" and "strategy" would be more than simple enum values. They should be combined into a single "options" parameter and represented as a class. We would have static instances of useful defaults but folks should be able to create new iteration strategies by deriving from a common base class.

e.g.

namespace Xunit.Performance
{
    public abstract class BenchmarkIterationStrategy
    {
        // Called before the start of a measured iteration
        protected virtual void OnBeforeMeasuredIteration(int iteration)
        {
        }

        // Called after an iteration has completed. Return true
        // to continue iterating, return false to stop.
        protected abstract bool OnAfterIteration(int iteration);

        // Useful defaults as static instances, plus factory methods for building
        // configured strategies (kept on the same type so call sites can write
        // e.g. BenchmarkIterationStrategy.Default).
        public static readonly BenchmarkIterationStrategy Default = ...;
        public static readonly BenchmarkIterationStrategy SelfTuningElapsedTime = ...;
        public static readonly BenchmarkIterationStrategy TenTimes = FixedIterationCount(10);

        public static BenchmarkIterationStrategy FixedIterationCount(int count)
        {
            return new FixedCountIterationStrategy(count, warmup: false, ...);
        }

        public static BenchmarkIterationStrategy SelfTuning(float confidenceLevel, int minIterations, int maxIterations /*, etc. */)
        {
            return new SelfTuningIterationStrategy(confidenceLevel, minIterations, maxIterations, ...);
        }

        // etc.
    }
}

I think that gives us lots of flexibility to extend to other iteration strategies.

ericeil commented on July 28, 2024

FWIW, I added support for [Theory]-style benchmarks with f380e89. For the kinds of tests @pharring mentions, which (if I understand correctly) need to set up some data, then time some operation on that data, perhaps they can/should be structured as a Benchmark parameterized over that data. We'll set up the data prior to beginning timing, and then time the operation on that data. In any case, the ability to write data-driven perf tests seems useful (though it would "just work" with Paul's proposal, as you could just mark the method as a [Theory]).
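
Assuming [Benchmark] composes with xUnit's standard data attributes (the commit above may differ in detail), a data-driven perf test might look like this:

using Xunit;
using Xunit.Performance;

public class FormattingBenchmarks
{
    [Benchmark]
    [InlineData(100)]
    [InlineData(10000)]
    public void FormatDocument(int lineCount)
    {
        // The harness supplies lineCount before timing begins, so building the
        // input data stays outside the measured operation.
        // ...format a document with 'lineCount' lines
    }
}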

I need to think more about the rest of this thread before commenting further.

pharring commented on July 28, 2024

I should point out that one disadvantage of the utility class technique I sketched out above is that it doesn't have the test method name to hand when writing ETW events. You could get somewhere with [CallerMemberName], but it's not the same.

karelz commented on July 28, 2024

@pharring The method is async -- Why is that important?

pharring commented on July 28, 2024

Answering my own concern from above, if we continue to use the Benchmark attribute, then we can record/retrieve the currently-executing test case via the logical call context (CallContext.LogicalSetData/LogicalGetData). This is somewhat appealing: You can use the "Benchmark" static helper class only when the method is attributed with [Benchmark].
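
A minimal sketch of that idea; CallContext is .NET Framework only (AsyncLocal&lt;T&gt; is the modern equivalent), and the type and key names here are arbitrary:

using System.Runtime.Remoting.Messaging;

internal static class BenchmarkContext
{
    private const string Key = "Xunit.Performance.CurrentTestCase";

    // Set by the [Benchmark] infrastructure just before invoking the test method;
    // the value flows across awaits because it lives in the logical call context.
    public static void SetCurrentTestCase(string testCaseName)
    {
        CallContext.LogicalSetData(Key, testCaseName);
    }

    // Read by the static Benchmark helpers, e.g. when writing ETW events.
    public static string GetCurrentTestCase()
    {
        return (string)CallContext.LogicalGetData(Key);
    }
}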

pharring commented on July 28, 2024

It's important because you might be testing asynchronous (awaitable) APIs. Using the expressiveness of the language and the framework as they're designed to be used makes your test cases look more natural.

Maybe I should have made it obvious by using a simpler non-async example, but there will be non-async versions and plenty of overloads to supply sensible defaults.

e.g.

    [Benchmark]
    void FormatWhitespace()
    {
        // setup
        var workspace = CreateWorkspace(...);
        var document = workspace.AddDocument(...);

        // iterate
        Benchmark.Iterate(
            timedAction: (int ignored) => workspace.FormatDocument(document)
        );
    }

Note that there's no per-iteration setup/cleanup, just the timedAction. Everything else is defaulted. There's no async. The code looks natural.

ericeil commented on July 28, 2024

Coming back to this, finally. :)

It's become clear to me, after some conversations with @schaabs about "load testing," and with @mellinoe and others about CoreFx testing, that we really do need this. And in particular, it's looking like load testing will force us down a path where a test ends up with quite a bit of control over how "iterations" are executed. So I spent a little time yesterday trying to come up with a simple mechanism for giving the tests more control, but also keeping simple tests simple.

The idea is to have a new type, call it Benchmark, which represents an executing test case. Tests can get at the current instance via Benchmark.Current. There would be two canonical uses of this type to control iterations. First, the fully-general pattern, which allows full control over when an "iteration" happens, and what happens around it:

        [Benchmark]
        static void TestWithSetupAndTeardown()
        {
            // ...any per-testcase setup can go here.

            while (Benchmark.Current.NeedMoreIterations)
            {
                // ...per-iteration setup goes here

                using (Benchmark.Current.NextIteration())
                {
                    // ...only code in here is actually measured
                }

                // ...per-iteration cleanup here
            }

            // ...per-testcase cleanup here
        }

Now, I still don't want trivial tests to have to be this verbose. So we also make Benchmark implement the IEnumerable pattern (logically, it represents a sequence of iterations). That allows a more compact syntax, for cases where we just want to measure everything on every iteration:

        [Benchmark]
        static void SuperSimpleTest()
        {
            foreach (var iteration in Benchmark.Current)
                DoSomething();
        }

This is compact enough, and hopefully "foolproof" enough, that I'm comfortable with requiring one or the other of these patterns for every perf test. I.e., [Benchmark] would no longer imply automatic execution of the whole test method multiple times per run. In fact, the only role of BenchmarkAttribute in this new scheme would be to supply a Benchmark.Current pre-initialized with the test name.

Thoughts?

ericeil commented on July 28, 2024

Another thought is that Benchmark.NextIteration could take optional arguments specifying the kinds of things I've been trying to communicate through attributes thus far. For example:

Benchmark.NextIteration(collectGarbage: false, skipWarmup: true)

pharring commented on July 28, 2024

I'm OK with the direction.

I think it's confusing to have two kinds of iteration (both IEnumerable and NeedMoreIterations) on the same type. Instead, make Benchmark an abstract base class, implementing IEnumerable so both loops can be "foreach". Then have clients choose whether they want the 'simple' or the 'expanded' version. The simple version has no additional methods. The expanded version (only) has the "NextIteration" method.

I think the parameters (collectGarbage, skipWarmup, etc.) should be on a factory method for the outer loop. i.e. where you have Benchmark.Current it should be Benchmark.CreateExpandedBenchmark(collectGarbage: false, skipWarmup: true)

i.e.

[Benchmark]
void SuperSimpleTest()
{
    foreach (var iteration in Benchmark.Simple)
        DoSomething();
}

[Benchmark]
void TestWithSetupAndTeardown()
{
    // ...any per-testcase setup can go here.

    foreach (var iteration in Benchmark.CreateExpandedBenchmark(collectGarbage: false, skipWarmup: true))
    {
        // ...per-iteration setup goes here

        using (iteration.BeginMeasuredOperation())
        {
            // ...only code in here is actually measured
        }

        // ...per-iteration cleanup here
    }

    // ...per-testcase cleanup here
}

Note: Now that I've written it out, I realized I could put BeginMeasuredOperation (your NextIteration() method) on the iteration variable rather than the Benchmark. I think that makes "Benchmark" a static class (not abstract as I said above) with a couple of factory methods.

ericeil commented on July 28, 2024

Yeah, I agree that having two iteration patterns is weird. I like your idea; even if we only had the "expanded" iteration pattern, the simplest use is pretty simple:

        [Benchmark]
        static void SuperSimpleTest()
        {
            foreach (var iteration in Benchmark.GetIterations())
                using (iteration.BeginMeasurement())
                    DoSomething();
        }

I dislike that there's more "plumbing" than actual test code here, but it's probably the case that in practice very few benchmarks are actually this simple.

pharring commented on July 28, 2024

If you want to make the simple stuff look simple then:

[Benchmark]
void SuperSimpleTest()
{
    Benchmark.Iterate( () => DoSomething() );
}

ericeil commented on July 28, 2024

Sold. :) I'll try to get this put together before there are too many tests in the wild that need to change.

ericeil commented on July 28, 2024

Anyone have thoughts on the naming, etc.? Here's a proposal:

public static class Benchmark
{
    public static IEnumerable<BenchmarkIteration> GetIterations(bool collectGarbage = true, bool skipWarmup = false);
    public static void Iterate(Action measuredMethod);
    public static async Task IterateAsync(Func<Task> measuredMethod);
}

public struct BenchmarkIteration
{
    public BenchmarkIterationMeasurement StartMeasurement();
}

public struct BenchmarkIterationMeasurement : IDisposable
{
    public void Dispose();
}

And some sample usage:

        [Benchmark]
        static void TestWithSetupAndTeardown()
        {
            // ...any per-testcase setup can go here.

            foreach (var iteration in Benchmark.GetIterations(collectGarbage: true, skipWarmup: false))
            {
                // ...per-iteration setup goes here

                using (iteration.StartMeasurement())
                {
                    // ...only code in here is actually measured
                }

                // ...per-iteration cleanup here
            }

            // ...per-testcase cleanup here
        }

        [Benchmark]
        static void SuperSimpleTest()
        {
            Benchmark.Iterate(() => DoSomething());
        }

        [Benchmark]
        static async Task SuperSimpleAsyncTest()
        {
            await Benchmark.IterateAsync(() => DoSomethingAsync());
        }

karelz commented on July 28, 2024

The collectGarbage name is a bit confusing. Is it asking the infrastructure to force a GC? Is it a hint that the test may trigger a GC?

ericeil commented on July 28, 2024

I think I'll just leave out the options for now (unless someone here already has a need for something). Maybe we should, eventually, go with something like a BenchmarkIterationOptions object that gets passed to Benchmark.GetIterations.
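
If it ever becomes necessary, such an options type could be as simple as the following (all names invented):

public sealed class BenchmarkIterationOptions
{
    // Force a GC before each measured iteration so collections triggered by earlier
    // iterations don't show up in the measurement.
    public bool CollectGarbageBetweenIterations { get; set; } = true;

    // Skip the unmeasured warm-up iterations entirely.
    public bool SkipWarmup { get; set; }
}

// Usage sketch: foreach (var iteration in Benchmark.GetIterations(options)) { ... }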

ianhays commented on July 28, 2024

The overall structure of this looks useful to me, and I think the naming is fine as well. However, I agree with the above sentiment that having two distinct patterns for iteration is potentially confusing for little payoff. The simple case of

        [Benchmark]
        static void SuperSimpleTest()
        {
            foreach (var iteration in Benchmark.GetIterations())
                using (iteration.BeginMeasurement())
                    DoSomething();
        }

is hardly worse than:

[Benchmark]
void SuperSimpleTest()
{
    Benchmark.Iterate( () => DoSomething() );
}

But the simplified loop version is consistent with the expanded loop version, which is more valuable in my opinion than the savings that the "SuperSimpleTest" pattern provides. I can of course write my tests in the former pattern regardless of the existence of the latter if I so choose, but I'd argue consistency is better than the minimal code savings.

Ian

karelz commented on July 28, 2024

I think there is still value in having a super-simple option of having "pure functional" test run as a perf test:

[Benchmark]
void SuperSimpleTest()
{
    DoSomething();
}

ianhays commented on July 28, 2024

I think there is still value in having a super-simple option of having "pure functional" test run as a perf test:

I'd agree with that notion as well. It's the half-functional half-expanded syntax that I think is potentially unfriendly.

Without looking at the code, would it be possible to have a system whereby:

  • Tests marked as [Benchmark] that do not make any static Benchmark.* calls within the method function the same way they currently do.
  • Tests marked as [Benchmark] that make a Benchmark.GetIterations() call follow the "expanded" syntax.

Or if that would not work, perhaps we could split [Benchmark] into two attributes where one is simple and the other provides fine-grained timing control?
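
For the second idea, the split might look something like this (both attribute names are invented):

// Simple flavor: the whole method body is the measured operation,
// and the harness repeats it as it does today.
[SimpleBenchmark]
public static void SuperSimpleTest()
{
    DoSomething();
}

// Expanded flavor: the test drives iterations and marks the measured region itself.
[IterationBenchmark]
public static void TestWithSetup()
{
    foreach (var iteration in Benchmark.GetIterations())
        using (iteration.StartMeasurement())
            DoSomething();
}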

ericeil commented on July 28, 2024

I've submitted a PR with a proposed implementation. I've left out the "pure functional" iteration methods for now; I propose that we consider adding these only if experience tells us they'll be really handy.

pharring commented on July 28, 2024

I think the string tests are telling us they (the "pure functional" iteration methods) would be handy. I think the repeated boilerplate hides the operations under test and, if I were maintaining that test code (and didn't have access to xunit.performance source), I'm sure I'd write it myself.

ericeil commented on July 28, 2024

The string tests are not particularly realistic examples of tests, unfortunately. For example, each iteration takes so little time that the resolution of the system timer creates tons of artificial noise. Having learned this, I'm reluctant to encourage tests to be written this way. Also, I like that the "expanded" syntax highlights that there is a distinct region where the measurement is happening, and that code could go outside that region; this seems like it might be important as a test is modified over time, as it encourages the contributor to think about whether a given change should go inside or outside of that region.

But more than all that, as @ianhays points out, the "expanded" syntax isn't that bad; given that, I think one API is better than two (or three), unless experience teaches us otherwise.

ericeil commented on July 28, 2024

I'm calling this closed for now.
