
underscore-benchmark

Benchmarking stats calculation from offen/offen against different underscore versions.

Default setup

To run the benchmark from this repository, you first need to install the project's dependencies. Next, run the setup script, which does the following:

  • pull the underscore versions to compare and put them in the vendor directory
  • create fixture data for the benchmark to run against

npm install
npm run setup

Once this has finished, you are ready to run the benchmark in Node.js:

npm test

You can also run the benchmark in a browser:

npm start

budo will start a local server on port 9966; loading http://localhost:9966 in a browser will run the benchmark. Results will be printed to your browser's console.


If used with defaults, the benchmark will use commit c9b4b63fd08847281260205b995ae644f6f2f4d2 as the baseline and eaba5b58fa8fd788a5be1cf3b66e81f8293f70f9 as the comparison for the benchmark.

Adjusting the benchmark

Running against a single version only

If you do not want to run the benchmark against both the baseline and the comparison version at the same time, you can choose to run a single test only.

Using the CLI, this is done by passing the desired test as the first command-line argument:

npm t -- baseline

If you are using the browser, append the name of the desired case as the URL hash and reload the page, e.g. http://localhost:9966/#baseline

Running against different versions of underscore

If you want to run this benchmark against different versions of underscore, you can use the ./scripts/pull-versions.js script to pull any versions you want to compare from the jashkenas/underscore repository.

The script expects two arguments, the baseline ref and the comparison ref:

node ./scripts/pull-versions.js master some-feature-branch

Each argument can be either a branch name or a commit hash in that repository.

Running against more/less user data

By default, the benchmark runs against randomly created user data for 5000 users. This is an average real-world workload for the code under test, so it's probably a sane default. If you want to benchmark how two versions of underscore compare with more or less data being processed, you can re-run the fixture creation, passing the non-default number of users as the first argument:

node ./scripts/generate-fixtures.js 2500 ./fixtures/events.json

Reverting your adjustments

If you want to revert your benchmark setup to the default state after making adjustments, you can use the setup command:

npm run setup


underscore-benchmark's Issues

Add a way to run the benchmark code with only one version of Underscore

When testers find a significant difference, it is nice if they can profile the benchmark code, first with the baseline version and then with the comparison version, in order to determine which Underscore functions have slowed down (or sped up) most. For this to produce useful numbers, it is important that the benchmark runs with only one version at a time; otherwise, the profiler is likely to just average the results.
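
One possible workflow, once single-case runs are available, is to profile each case separately with Node's built-in V8 profiler. The entry script name bench.js below is hypothetical and only stands in for whatever file npm test actually invokes:

# profile the baseline case only, then turn the V8 log into a readable report
node --prof bench.js baseline
node --prof-process isolate-*-v8.log > baseline-profile.txt

# move or delete the isolate-*-v8.log file, then repeat for the other case
node --prof bench.js comparison
node --prof-process isolate-*-v8.log > comparison-profile.txt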

Windows support

As discussed in #1, running this benchmark on Windows also needs to be possible (excluding Windows would cut off a lot of testing targets).

We specifically need to find a way to replace the pull.sh script with a cross-platform solution, either by using git submodules or by reimplementing the behavior as a Node.js script.
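
For illustration, a reimplementation could shell out to git via child_process, which works on Windows as long as git is on the PATH. This is only a rough sketch; the vendor file names and the use of underscore.js as the copied artifact are assumptions, not the repository's actual layout:

// Sketch of a cross-platform pull script: clone jashkenas/underscore and copy
// one file per ref into the vendor directory. Paths here are assumptions.
const { execFileSync } = require('child_process')
const fs = require('fs')
const os = require('os')
const path = require('path')

function pullVersion (ref, targetFile) {
  const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'underscore-'))
  execFileSync('git', ['clone', 'https://github.com/jashkenas/underscore.git', tmp], { stdio: 'inherit' })
  execFileSync('git', ['checkout', ref], { cwd: tmp, stdio: 'inherit' })
  fs.mkdirSync('vendor', { recursive: true })
  fs.copyFileSync(path.join(tmp, 'underscore.js'), path.join('vendor', targetFile))
}

const [baseline, comparison] = process.argv.slice(2)
pullVersion(baseline, 'baseline.js')
pullVersion(comparison, 'comparison.js')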

Discussion: benchmark package from NPM

The benchmark package does (at least) two questionable things:

  1. In order to decide whether the performance difference is significant, it uses the Mann-Whitney U test, which is meant for ordinal scale data, instead of Student's t test, which is more appropriate for ratio scale data such as the performance measurements we're making here.
  2. It recompiles JavaScript code from a string on every test run in order to prevent engine optimizations. Usually we're interested in performance with engine optimizations, since this is how code tends to run in the real world. Besides that, since most of the benchmark code is probably out of reach for this compilation trick, optimization is only partly disabled, producing inconsistent optimization characteristics. Maybe this behavior can be disabled; this is worth investigating.

I can think of a couple of options (from least to most effort):

  • Accept the quirks and do nothing.
  • Switch to an alternative benchmark framework that doesn't have these quirks. I'm not (yet) aware of an alternative.
  • Forego a convenient benchmark framework. Instead, repeat the benchmark code a fixed number of times (say, 10) for each version of Underscore and report each individual result, so that people can compute their own statistics (a rough sketch of this option follows the list).
  • Fork the benchmark package, fix the issues, submit a PR. Use our own version regardless of whether the PR is accepted. If not accepted, publish as a separate package on NPM.
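
To make the third option concrete, here is a minimal sketch of a framework-free runner. The module paths and the runStats function are placeholders for the actual stats calculation under test, not the repository's real API:

// Framework-free benchmark sketch: time a fixed number of runs per version
// and print every raw sample so readers can compute their own statistics.
const baseline = require('./vendor/baseline.js')     // hypothetical path
const comparison = require('./vendor/comparison.js') // hypothetical path
const events = require('./fixtures/events.json')
const runStats = require('./stats')                  // placeholder for the code under test

const RUNS = 10
const versions = [['baseline', baseline], ['comparison', comparison]]

for (const [name, underscore] of versions) {
  for (let i = 0; i < RUNS; i++) {
    const start = process.hrtime.bigint()
    runStats(underscore, events)
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6
    console.log(`${name} run ${i + 1}: ${elapsedMs.toFixed(2)} ms`)
  }
}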

How representative are the generated fixture data?

I see that generate-fixtures.js spends some effort to ensure that one third of events are mobile and a quarter have a referrer. Is this based on observations from real-world data?

Otherwise, the randomized values seem to be drawn from a flat distribution. All events are of type PAGEVIEW. Are these choices approximately realistic? For example, are other event types rare?

I'm asking because the distribution of values may affect the proportions of time spent in alternative branches of the code, which in turn may affect the running time.

If we end up concluding that the data are not very representative, one possible solution may be to take real data and replace each unique string by another, non-identifying string (using wholesale search/replace) in order to arrive at a single, censored-but-representative dataset.
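
As an illustration of that last idea, the replacement could be a consistent mapping from each unique string to a meaningless pseudonym, so that the shape of the data (cardinalities, repetitions) is preserved while identifying values are removed. The input file name and per-event handling below are purely hypothetical:

// Censoring sketch: map every unique string to a stable pseudonym so that
// value distributions are preserved but nothing identifying remains.
const pseudonyms = new Map()

function censor (value) {
  if (typeof value !== 'string') return value
  if (!pseudonyms.has(value)) {
    pseudonyms.set(value, 'string-' + pseudonyms.size)
  }
  return pseudonyms.get(value)
}

function censorEvent (event) {
  const censored = {}
  for (const [key, value] of Object.entries(event)) {
    censored[key] = censor(value)
  }
  return censored
}

// Usage (input file name is hypothetical):
// const realEvents = require('./real-events.json')
// const censoredEvents = realEvents.map(censorEvent)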
