zarr-developers / perfcapture

Capture the performance of a computer system whilst running a set of benchmark workloads.

License: MIT License
Need to standardise on words like "recipes", what's a single "run", etc. Maybe a run should be a single execution of `Workload.run()` against a specific `Dataset`.
In today's Benchmarking meeting, we all agreed that it'd be great to get a simple benchmarking solution implemented ASAP, so we can get on with the fun work of trying to speed things up! To start with, we just need something simple that'll allow us to run benchmark workloads on our machines, figure out how close the current code is to the theoretical IO performance of the hardware, and compare performance between git commits.
Here's an early attempt to specify a very simple framework for defining workloads, along with a quick example of what's required to implement a simple dataset and workload.

To implement a new workload, you'd write a class which inherits from `Workload` and override `init_dataset` and `run`. To implement a new dataset, you'd write a class which inherits from `Dataset` and override `prepare`.
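To make that concrete, here's a minimal sketch of what those subclasses might look like. The base-class shape follows the description above, but the concrete subclasses, their names, and all method signatures are hypothetical, not perfcapture's actual API:

```python
# Illustrative sketch only: base classes mimic the proposed API;
# the concrete subclasses and signatures are hypothetical.

class Dataset:
    """A dataset that workloads run against."""
    def prepare(self) -> None:
        """Create the dataset (override in subclasses)."""
        raise NotImplementedError


class Workload:
    """A benchmark workload."""
    def init_dataset(self) -> Dataset:
        raise NotImplementedError

    def run(self, dataset: Dataset) -> None:
        raise NotImplementedError


class RandomIntsDataset(Dataset):
    """Hypothetical dataset; in reality this might write a Zarr array to disk."""
    def prepare(self) -> None:
        self.data = list(range(1_000))


class SumAllValues(Workload):
    """Hypothetical workload: read the whole dataset and sum it."""
    def init_dataset(self) -> Dataset:
        return RandomIntsDataset()

    def run(self, dataset: RandomIntsDataset) -> None:
        self.total = sum(dataset.data)
```

The framework would then discover these subclasses, call `prepare` once per dataset, and time each `run`.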
Does this look OK? Is there anything missing?
The (very simple) framework would then take care of discovering & running the workloads, whilst also recording relevant metrics to disk as JSON, and printing a summary.
The framework isn't ready yet! But the idea is that the MVP will expose a very simple CLI that allows you to run all benchmarks, or a specific benchmark. It'll automatically record metadata about the system and the git commit. And it'll make it easy to compare performance between different git commits.
The idea is to make it super-easy for anyone to submit a new workload via a PR. And easy for us to share "human-readable" metrics ("My new PR speeds up Zarr-Python on workload x by 10x on my machine! Hurray!") and share machine-readable metrics (JSON).
I'd imagine moving this code to zarr-developers if folks agree that the approach is OK.
(See zarr-developers/zarr-benchmark#1 for a discussion of why I'm currently leaning towards the idea of implementing our own benchmarking tool. But feel free to make the case for using an existing tool! I'm just not sure that any existing tool would allow us to measure IO performance.)
Implement a `get_results() -> pd.DataFrame` method where the counter name is the column, and the run_id is the row. Remove `__str__` from each `PerfCounter`. `run_workloads()` should return a dict where the values are these results DataFrames.
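A minimal sketch of the proposed `get_results()` shape, assuming counters are accumulated per run (the `PerfCounterManager` class and `record` method are invented for illustration; only `get_results` and the row/column layout come from the proposal above):

```python
import pandas as pd


class PerfCounterManager:
    """Hypothetical accumulator for per-run counter values."""

    def __init__(self) -> None:
        # {run_id: {counter_name: value}}
        self._records: dict[str, dict[str, float]] = {}

    def record(self, run_id: str, counter_name: str, value: float) -> None:
        self._records.setdefault(run_id, {})[counter_name] = value

    def get_results(self) -> pd.DataFrame:
        # Counter names become columns; run_id becomes the row index.
        df = pd.DataFrame.from_dict(self._records, orient="index")
        df.index.name = "run_id"
        return df


counters = PerfCounterManager()
counters.record("run_0", "runtime_secs", 1.2)
counters.record("run_0", "bytes_read", 4096)
counters.record("run_1", "runtime_secs", 1.1)
df = counters.get_results()
```

Runs that didn't record a given counter would simply get `NaN` in that column.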
I'm not entirely happy with the name `perfcapture`. `perfcapture` felt right when I was thinking of capturing a timeseries of performance metrics for each workload (e.g. sampling every 100 ms). But now that we're just capturing total metrics at the end of each workload run, `perfcapture` doesn't feel quite right. It's not a terrible name, but it doesn't quite sit right with me. Something more like `iobench` feels better, except there are already multiple projects called `iobench`!
`pip install` installs the CLI, so it's available from all paths.
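The standard way to get that behaviour in Python packaging is a console-script entry point, something like the following in `pyproject.toml` (the exact script name and module path here are assumptions):

```toml
# Illustrative; the actual entry-point module path is an assumption.
[project.scripts]
perfcapture = "perfcapture.cli:main"
```

After installation, the `perfcapture` command is placed on the user's `PATH` regardless of the working directory.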
In today's meeting for "Zarr Performance & Benchmarking (Europe-friendly time)", @joshmoore described the Zarr Implementations project: The Zarr Implementations project collects "data in zarr / n5 format written by different implementations" and tests for compatibility. Zarr Implementations is related to - although distinct from - benchmarking. Specifically: it might be nice to benchmark Zarr Implementations.
With `perfcapture`'s current API, it should be possible to call Zarr Implementations from `perfcapture.Workload.run` (probably using one `perfcapture.Workload` class per Zarr implementation).

But it might also be nice to harmonize the API, such that both `perfcapture` and Zarr Implementations use the same YAML structures to define the workloads.
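Purely as a strawman, a shared YAML structure might look something like this (every field name below is invented; no such schema exists yet in either project):

```yaml
# Hypothetical shared workload definition - invented for illustration.
workloads:
  - name: read_all_chunks
    dataset:
      format: zarr
      chunk_shape: [1000, 1000]
    implementations:
      - zarr-python
      - tensorstore
```

Each project would then interpret the same definition for its own purpose: compatibility testing in Zarr Implementations, timing in `perfcapture`.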
Related:
xref: #15
The plan is to implement a benchmarking tool which automatically runs a suite of "Zarr workloads" across a range of compute platforms, storage media, chunk sizes, and Zarr implementations.
What would we like to measure for each workload?
Existing benchmarking tools only measure the runtime of each workload. That doesn't feel sufficient for Zarr because one of our main questions during benchmarking is whether the Zarr implementation is able to saturate the IO subsystem, and how much CPU and RAM is required to saturate the IO.
I'd propose that it'd be great to measure these parameters each time each workload is run:
(Each run would also capture a bunch of metadata about the environment such as the compute environment, storage media, chunk sizes, Zarr implementation name and version, etc.)
I had previously gotten over-excited and started thinking about capturing a full "trace" during the execution of each workload, e.g. capturing a timeseries of the IO utilization every 100 milliseconds. This might be useful, but it makes the benchmarking code rather more complex, and maybe doesn't tell us much more than the "totals per workload" tell us. And some benchmark workloads might run for less than 100 ms. And psutil's documentation states that some of its counters aren't reliable when polled more frequently than 10 times a second.
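To make the "totals per workload" idea concrete, here's a minimal sketch using psutil. The choice of counters is an assumption on my part, and real code would also need to handle platforms where `psutil.disk_io_counters()` returns `None`:

```python
import time

import psutil


def run_with_counters(workload_fn) -> dict[str, float]:
    """Capture simple totals for one workload run: wall-clock runtime,
    process CPU time, and system-wide disk IO deltas."""
    proc = psutil.Process()
    cpu_before = proc.cpu_times()
    io_before = psutil.disk_io_counters()  # may be None on some platforms
    t0 = time.perf_counter()

    workload_fn()

    runtime = time.perf_counter() - t0
    cpu_after = proc.cpu_times()
    io_after = psutil.disk_io_counters()
    return {
        "runtime_secs": runtime,
        "cpu_user_secs": cpu_after.user - cpu_before.user,
        "cpu_system_secs": cpu_after.system - cpu_before.system,
        "disk_read_bytes": io_after.read_bytes - io_before.read_bytes,
        "disk_write_bytes": io_after.write_bytes - io_before.write_bytes,
    }
```

Because these are before/after deltas rather than polled samples, the 10-Hz polling caveat in psutil's documentation doesn't apply, and sub-100 ms workloads are still measured correctly (modulo counter granularity).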
What do you folks think? Do we need to record a full "trace" during each workload? Or is it sufficient to just capture totals per workload? Are there any changes you'd make to the list of parameters I proposed above?
xref: #6 (comment)