perfcapture's Issues

Simple API for specifying each benchmark workload

In today's Benchmarking meeting, we all agreed that it'd be great to get a simple benchmarking solution implemented ASAP, so we can get on with the fun work of trying to speed things up 🙂. To start with, we just need something simple that'll allow us to run benchmark workloads on our machines, figure out how close the current code is to the theoretical IO performance of the hardware, and compare performance between git commits.

Here's an early attempt at a very simple framework for specifying workloads, together with a quick example of what's required to implement a simple dataset and workload.

To implement a new workload, you'd subclass `Workload` and override `init_dataset` and `run`. To implement a new dataset, you'd subclass `Dataset` and override `prepare`.
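A minimal sketch of what that might look like, assuming the class and method names described above; the import paths, the `path` argument, and the dataset contents are illustrative assumptions, not perfcapture's actual API:

```python
import pathlib

import numpy as np
import zarr

# NOTE: the import path and method signatures below are assumptions based on
# the description above, not perfcapture's final API.
from perfcapture import Dataset, Workload


class RandomZarrArray(Dataset):
    """Hypothetical dataset: a small chunked Zarr array on local disk."""

    def prepare(self, path: pathlib.Path) -> None:
        arr = np.random.random((1024, 1024))
        zarr.save_array(str(path / "data.zarr"), arr, chunks=(256, 256))


class ReadWholeArray(Workload):
    """Hypothetical workload: read the entire array back into memory."""

    def init_dataset(self) -> Dataset:
        return RandomZarrArray()

    def run(self, path: pathlib.Path) -> None:
        _ = zarr.open(str(path / "data.zarr"))[:]  # materialise as a numpy array
```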

Does this look OK? Is there anything missing?

The (very simple) framework would then take care of discovering & running the workloads, whilst also recording relevant metrics to disk as JSON, and printing a summary.

The framework isn't ready yet! But the idea is that the MVP will expose a very simple CLI that allows you to run all benchmarks, or a specific benchmark. It'll automatically record metadata about the system and the git commit. And it'll make it easy to compare performance between different git commits.
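For illustration, the core of the runner might boil down to something like this; the function names and the JSON layout are assumptions, not the final design:

```python
import json
import platform
import subprocess
import time
from pathlib import Path


def run_workload(workload, dataset_path: Path) -> dict:
    """Time a single workload run and return a machine-readable record."""
    start = time.perf_counter()
    workload.run(dataset_path)
    elapsed = time.perf_counter() - start
    return {
        "workload": type(workload).__name__,
        "total_secs": elapsed,
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip(),
        "platform": platform.platform(),
    }


def save_results(results: list[dict], path: Path = Path("results.json")) -> None:
    """Dump all workload records to disk as JSON for later comparison."""
    path.write_text(json.dumps(results, indent=2))
```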

The idea is to make it super-easy for anyone to submit a new workload via a PR. And easy for us to share "human-readable" metrics ("My new PR speeds up Zarr-Python on workload x by 10x on my machine! Hurray!") and share machine-readable metrics (JSON).

I'd imagine moving this code to zarr-developers if folks agree that the approach is OK.

(See zarr-developers/zarr-benchmark#1 for a discussion of why I'm currently leaning towards the idea of implementing our own benchmarking tool. But feel free to make the case for using an existing tool! I'm just not sure that any existing tool would allow us to measure IO performance.)

Consider different names for project (instead of `perfcapture`)?

I'm not entirely happy with the name perfcapture. perfcapture felt right when I was thinking of capturing a timeseries of performance metrics for each workload (e.g. sampling every 100 ms). But now that we're just capturing total metrics at the end of each workload run, perfcapture doesn't feel quite right. It's not a terrible name. But it doesn't quite sit right with me.

Something more like iobench feels better, except there are already multiple projects called iobench!

Consider defining workloads & benchmarks in yaml to be consistent with `zarr_implementations`

In today's meeting for "Zarr Performance & Benchmarking (Europe-friendly time)", @joshmoore described the Zarr Implementations project, which collects "data in zarr / n5 format written by different implementations" and tests for compatibility. Zarr Implementations is related to - although distinct from - benchmarking. Specifically: it might be nice to benchmark Zarr Implementations.

With perfcapture's current API, it should be possible to call Zarr Implementations from perfcapture.Workload.run (probably using one perfcapture.Workload class per Zarr implementation).
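A rough sketch of that idea, assuming zarr_implementations exposes per-implementation writer scripts that can be invoked as subprocesses; the script path and arguments here are hypothetical:

```python
import pathlib
import subprocess

# NOTE: import path and method signature are assumptions, as above; the
# script name inside zarr_implementations is hypothetical.
from perfcapture import Workload


class ZarrPythonImplementation(Workload):
    """One Workload subclass per Zarr implementation, as suggested above."""

    def run(self, path: pathlib.Path) -> None:
        # Invoke the zarr_implementations writer for this implementation,
        # directing its output to the benchmark's dataset directory.
        subprocess.run(
            ["python", "generate_data/generate_zarr_python.py", str(path)],
            check=True,
        )
```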

But it might also be nice to harmonize the API, such that both perfcapture and Zarr Implementations use the same YAML structures to define the workloads.

Related:

What to measure during benchmarking?

The plan is to implement a benchmarking tool which automatically runs a suite of "Zarr workloads" across a range of compute platforms, storage media, chunk sizes, and Zarr implementations.

What would we like to measure for each workload?

Existing benchmarking tools only measure the runtime of each workload. That doesn't feel sufficient for Zarr because one of our main questions during benchmarking is whether the Zarr implementation is able to saturate the IO subsystem, and how much CPU and RAM is required to saturate the IO.

I'd propose measuring these parameters each time a workload is run:

  • Total execution time of the workload
  • Total bytes read / written for disk / network
  • Total IO operations
  • Total bytes in final numpy array
  • Average CPU utilization (per CPU)
  • Max RAM usage during the execution of the workload
  • CPU cache hit ratio

(Each run would also capture a bunch of metadata about the environment such as the compute environment, storage media, chunk sizes, Zarr implementation name and version, etc.)
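Most of the totals above could be captured with psutil plus the standard library. A rough sketch, with the caveat that disk/network counters from psutil are system-wide rather than per-process, `ru_maxrss` units differ between Linux (KiB) and macOS (bytes), and CPU cache hit ratio would need hardware perf counters (e.g. `perf stat`) rather than psutil:

```python
import resource
import time

import psutil


def measure(run_workload) -> dict:
    """Run a workload callable and return the proposed per-run totals."""
    disk_before = psutil.disk_io_counters()
    net_before = psutil.net_io_counters()
    psutil.cpu_percent(percpu=True)  # reset per-CPU utilisation counters
    start = time.perf_counter()

    run_workload()

    elapsed = time.perf_counter() - start
    disk_after = psutil.disk_io_counters()
    net_after = psutil.net_io_counters()
    return {
        "total_secs": elapsed,
        "disk_bytes_read": disk_after.read_bytes - disk_before.read_bytes,
        "disk_bytes_written": disk_after.write_bytes - disk_before.write_bytes,
        "disk_io_ops": (disk_after.read_count + disk_after.write_count)
        - (disk_before.read_count + disk_before.write_count),
        "net_bytes_recv": net_after.bytes_recv - net_before.bytes_recv,
        "avg_cpu_percent_per_cpu": psutil.cpu_percent(percpu=True),
        "max_rss": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
    }
```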

I had previously gotten over-excited and started thinking about capturing a full "trace" during the execution of each workload, e.g. capturing a timeseries of the IO utilization every 100 milliseconds. This might be useful, but makes the benchmarking code rather more complex, and maybe doesn't tell us much more than the "totals per workload" tell us. And some benchmark workloads might run for less than 100 ms. And psutil's documentation states that some of its counters aren't reliable when polled more frequently than 10 times a second.

What do you folks think? Do we need to record a full "trace" during each workload? Or is it sufficient to just capture totals per workload? Are there any changes you'd make to the list of parameters I proposed above?
