elastic / rally Goto Github PK

View Code? Open in Web Editor NEW

1.9K 1.9K 314.0 9.18 MB

Macrobenchmarking framework for Elasticsearch

License: Apache License 2.0

Python 98.19% Shell 1.48% Makefile 0.20% Jinja 0.12%

elasticsearch

rally's Introduction

Rally

Rally is the macrobenchmarking framework for Elasticsearch

What is Rally?

You want to benchmark Elasticsearch? Then Rally is for you. It can help you with the following tasks:

Setup and teardown of an Elasticsearch cluster for benchmarking
Management of benchmark data and specifications even across Elasticsearch versions
Running benchmarks and recording results
Finding performance problems by attaching so-called telemetry devices
Comparing performance results

We have also put considerable effort in Rally to ensure that benchmarking data are reproducible.

Quick Start

Rally is developed for Unix and is actively tested on Linux and macOS. Rally supports benchmarking Elasticsearch clusters running on Windows but Rally itself needs to be installed on machines running Unix.

Installing Rally

Note: If you actively develop on Elasticsearch, we recommend that you install Rally in development mode instead as Elasticsearch is fast moving and Rally always adapts accordingly to the latest main version.

Install Python 3.8+ including pip3, git 1.9+ and an appropriate JDK to run Elasticsearch. Be sure that JAVA_HOME points to that JDK. Then run the following command, optionally prefixed by sudo if necessary:

pip3 install esrally

If you have any trouble or need more detailed instructions, please look in the detailed installation guide.

Run your first race

Now we're ready to run our first race:

esrally race --distribution-version=6.0.0 --track=geonames

This will download Elasticsearch 6.0.0 and run Rally's default track - the geonames track - against it. After the race, a summary report is written to the command line:

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                         Metric |                 Task |     Value |   Unit |
|-------------------------------:|---------------------:|----------:|-------:|
|            Total indexing time |                      |   28.0997 |    min |
|               Total merge time |                      |   6.84378 |    min |
|             Total refresh time |                      |   3.06045 |    min |
|               Total flush time |                      |  0.106517 |    min |
|      Total merge throttle time |                      |   1.28193 |    min |
|               Median CPU usage |                      |     471.6 |      % |
|             Total Young Gen GC |                      |    16.237 |      s |
|               Total Old Gen GC |                      |     1.796 |      s |
|                     Index size |                      |   2.60124 |     GB |
|                  Total written |                      |   11.8144 |     GB |
|         Heap used for segments |                      |   14.7326 |     MB |
|       Heap used for doc values |                      |  0.115917 |     MB |
|            Heap used for terms |                      |   13.3203 |     MB |
|            Heap used for norms |                      | 0.0734253 |     MB |
|           Heap used for points |                      |    0.5793 |     MB |
|    Heap used for stored fields |                      |  0.643608 |     MB |
|                  Segment count |                      |        97 |        |
|                 Min Throughput |         index-append |   31925.2 | docs/s |
|              Median Throughput |         index-append |   39137.5 | docs/s |
|                 Max Throughput |         index-append |   39633.6 | docs/s |
|      50.0th percentile latency |         index-append |   872.513 |     ms |
|      90.0th percentile latency |         index-append |   1457.13 |     ms |
|      99.0th percentile latency |         index-append |   1874.89 |     ms |
|       100th percentile latency |         index-append |   2711.71 |     ms |
| 50.0th percentile service time |         index-append |   872.513 |     ms |
| 90.0th percentile service time |         index-append |   1457.13 |     ms |
| 99.0th percentile service time |         index-append |   1874.89 |     ms |
|  100th percentile service time |         index-append |   2711.71 |     ms |
|                           ...  |                  ... |       ... |    ... |
|                           ...  |                  ... |       ... |    ... |
|                 Min Throughput |     painless_dynamic |   2.53292 |  ops/s |
|              Median Throughput |     painless_dynamic |   2.53813 |  ops/s |
|                 Max Throughput |     painless_dynamic |   2.54401 |  ops/s |
|      50.0th percentile latency |     painless_dynamic |    172208 |     ms |
|      90.0th percentile latency |     painless_dynamic |    310401 |     ms |
|      99.0th percentile latency |     painless_dynamic |    341341 |     ms |
|      99.9th percentile latency |     painless_dynamic |    344404 |     ms |
|       100th percentile latency |     painless_dynamic |    344754 |     ms |
| 50.0th percentile service time |     painless_dynamic |    393.02 |     ms |
| 90.0th percentile service time |     painless_dynamic |   407.579 |     ms |
| 99.0th percentile service time |     painless_dynamic |   430.806 |     ms |
| 99.9th percentile service time |     painless_dynamic |   457.352 |     ms |
|  100th percentile service time |     painless_dynamic |   459.474 |     ms |

----------------------------------
[INFO] SUCCESS (took 2634 seconds)
----------------------------------

Getting help

Quick help: esrally --help
Look in Rally's user guide for more information
Ask questions about Rally in the Rally Discuss forum.
File improvements or bug reports in our Github repo.

How to Contribute

See all details in the contributor guidelines.

License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

rally's People

Contributors

Stargazers

Watchers

Forkers

jasontedor threatstop monk-ee danielmitterdorfer mingyitianxia jpountz quanpinjie nicholaskuechler dakrone lw309637554 pavelnikolov fengshao0907 joola aswinm ift8 qiuyuanfeng up1 cdahlqvist jomenxiao spiegela fatelei pschuermann sstults ddoloroi ggchangan levylll keyur9 80nianmo adinede zumo64 jimczi sangrho tom-deng cominvent aalfonso-stratio dustydecapod dbarthe honzakral shaharmor dongjunqiang dnhatn mhko saliormoon ruflin chlsmile trendsoa hexinw lukas-vlcek shaogx jumolis gdmello songfj sen0120 zvictorino gpdream mveitas hatdropper1977 ddillinger yanghongwu kesslerm openbl 3838438org tomcallahan dliappis sonecabr felfilali vidhyaa-yelp yileye suhuaguo alinazemian paulcoghlan iris-qq ducas trongtan124 krishkoneru dkul108 marks-chan 1008711 bigfishman akhil-rane tuniu1985 yodasantu chung0921 christophwong sumit-gupta-sgt 06094051 weichaojie joway msufa pengdake acartag7 bzcai ebadyano vladmasarik ynuosoft reece15 raghu999 imotov jiangew quartzinquartz

rally's Issues

Recreate former benchmarks from core

In elastic/elasticsearch#15356, we have removed some benchmarks from ES core as they are outdated and have also some conceptual problems. We should go through these benchmarks, check which of these would make sense as macrobenchmarks and add them to Rally.

Run Rally on EC2

We run the nightly benchmarks also on EC2. Rally should be able to do this too. Hint: It could turn out that we just have to have a provisioning script in place and there is not much to do in Rally.

Use metrics store in graph reporter (reporter.py)

After we've introduced a metrics store in #8, we have to make use of it in the graph reporter.

Note for nightly benchmarks: We will not support multiple metrics store implementations but rather migrate the existing files to the new structure in a one-time effort.

Allow to define a custom logging configuration

We want to customize the logging configuration for certain benchmarks. Previously this was possible in a limited fashion with some command line flags (e.g. verboseIW). We want to allow users to define custom logging snippets in a track specification.

Note that depending on the log configuration, logs could get huge so we should also compress them (by default). However, this is not in the scope of this ticket. This will be tackled in #17.

Physically isolate the benchmark candidate

We currently run the benchmark driver on the same physical machine as the benchmark candidate almost certainly skews results (how much is subject to further analysis, see also #9). Even in the local case we should strive to minimize interference as much as possible (pinning?)

Report latency percentile distribution

We should not report mean latency (which is misleading) but rather percentile distributions.

Clean up all paths when multiple data paths are specified

A user can specify multiple data paths but we currently clean up only the main data path which leads to misleading failures in the follow-up runs:

Racing on track 'Geonames' with setup '4gheap'
Traceback (most recent call last):
  File "/usr/local/bin/esrally", line 9, in <module>
    load_entry_point('esrally==0.0.3.dev0', 'console_scripts', 'esrally')()
  File "/home/ec2-user/rally/rally/rally.py", line 205, in main
    race_control.start(subcommand)
  File "/home/ec2-user/rally/rally/racecontrol.py", line 35, in start
    p.do(track)
  File "/home/ec2-user/rally/rally/racecontrol.py", line 85, in do
    self._driver.setup(cluster, track, track_setup)
  File "/home/ec2-user/rally/rally/driver.py", line 37, in setup
    cluster.client().indices.create(index=track.index_name)
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/client/utils.py", line 69, in _wrapped
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/client/indices.py", line 105, in create
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/transport.py", line 329, in perform_request
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/connection/http_urllib3.py", line 106, in perform_request
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/connection/base.py", line 105, in _raise_error
elasticsearch.exceptions.RequestError: TransportError(400, 'index_already_exists_exception', 'already exists')

Support secure and authenticated connection to ES metrics store

Currently we support only unencrypted communication with the metrics cluster. To properly support Found as metrics store, we should add secure and authenticated connections.

Write command line summary report to log file

Currently, the command line report is only dumped to the console but investigation across runs would be easier for users if we also write it to disk.

Rally should not remove the elasticsearch install when finished

Could rally not remove the install directory once a benchmark is finished? This would allow for further understanding of the benchmark results, for instance to see the distribution of the sizes of segments, how much disk is used for doc values, etc.

Add support to run a specific track setup

We should allow users to run only specific track setups. This is mainly needed for the nightlies but it would also allow local users to specify other track setups than the default one.

Add full-text dataset

We should also add a full-text dataset to Rally.

Hint: @polyfractal has already started looking into it.

Disk usage measurement is not portable

Like in the original version, we use du -s to determine the final index size and assume that the output always gives the size of the data directory in KB.

According to the man page of du "Display values are in units of the first available SIZE from --block-size, and the DU_BLOCK_SIZE, BLOCK_SIZE and BLOCKSIZE environment variables. Otherwise, units default to 1024 bytes (or 512 if POSIXLY_CORRECT is set)."

On my Mac I get on a directory with one file < 4K in it:

dm@io:scratch/test $ du -s
8   .
dm@io:scratch/test $ du -hs
4.0K    .

while on a Linux box, I get identical results.

On a Mac we would then mistakenly report an index size of 8K although it is 4K.

Apart from that we should decide whether we want to report the "real" file size in bytes or the file size based on the number of consumed filesystem blocks. I'd tend to measure the former. Implementation hint for the latter: du -sk.

/cc @mikemccand

Compress log files

Related to #16: We want to compress log files so in case somebody enables trace logging we don't leave huge log files behind.

Separate track specification from track execution

We should not put tracks (i.e. the actual benchmark specification) directly into core but should provide some kind of benchmark repository (think Maven central but for benchmarks). However we still must be able to develop benchmarks locally.

We should also introduce a logical URL schema for track data URLs (e.g. $ROOT_URL/$benchmark-name/$index-name/$type-name/)

Improve flexibility of benchmarks

Vision (Short Term)

After we have implemented everything concerning this meta-ticket, users will be able to define their own tracks separately from Rally. Tracks will then be able to move at a different pace than Rally. Also users can have their own private tracks if they want to and tracks can be also run on different versions of Elasticsearch (but concerning just the track-specific stuff, we do not cover version-aware cluster provisioning here (i.e. if the startup options format changes you're still screwed)).

Maintenance of tracks across multiple versions of Elasticsearch may involve a bit of manual work but if needed we could add tool support (I'd just not consider it a priority at the moment, we have larger issues to tackle (but happy to receive PRs)). So we solve the maintenance part primarily by documenting on how to do it.

These are the relevant sub-tickets:

#61: Allow more flexibility over what exact steps are executed overall in Rally
#52: Allow more control over how the benchmark candidate is configured
#26: Separate track specification from track execution
#69 Support tracks across multiple versions of Elasticsearch
#99 Support multiple track repositories

Provisioned cluster name should account for setups on multiple machines

Currently, the provisioner takes the host name into account when creating a cluster name. We should change that to something that is independent of the host in order to allow Rally to provision benchmark candidates spreading over multiple machines (e.g. use invocation timestamp or track and track-setup)

Allow uploading of generated reports to S3

Rally should be able to upload benchmark reports to S3 (needed for nightly builds)

Allow the user to define which steps Rally performs

This is related to #5 and is intended as a long-term solution for that ticket. Consider users which have already prepared their cluster, indexed lots of data but want to benchmark e.g. search performance. We should allow them to use Rally for benchmarking. This means that we need to introduce more flexibility into the stages that Rally performs (checkout, build, provisioning, launch, benchmarking).

Some disadvantages that we need to consider:

We lose the ability to attach certain profilers to the benchmark candidate if we do not launch them from Rally (e.g. the JFR profiler)
From a conceptual point of view, we cannot guarantee reproducible results as we have no control whatsoever about the environment in which the benchmark candidate is launched and it is probably also not possible to gather enough data about the runtime environment of the benchmark candidate (OS, CPU, memory, launch settings, etc. etc.)

Allow to specify the effective start time

Currently, Rally assumes "now" as start time which has been ok until now. But as we like to support back testing etc. we want to have control over the assumed effective start date.

Therefore, we will add a (undocumented) command line option to override the effective start date. It is undocumented because it is not only useless to the intended user base but also confusing.

Allow to override the specified source directory on the command line

We could maybe also just check the cwd and use that as a source directory

Expose benchmark script

We implement more or less implicitly two benchmark plans in driver.py: bulk indexing and searching. Users may want to define their own benchmark plans. However, we need to prepare the infrastructure for that before tackling this.

By having this in place we could benchmark all kind of situations like single-shot query latency.

Unzip the file after downloading

the bz2 uncompress is eating some CPU so it's not stressing elasticsearch as much as possible, the observed difference is small, but it is there...

esrally configure could merge a previous configuration

When a user runs esrally configure for the second time, Rally will overwrite the existing configuration (but issue a warning before-hand). We could use the existing configuration and use the values stored there as default.

Consider also the following scenario:

esrally configure --advanced-config
esrally configure

In this case we should just pass on the advanced config values from the first run although the user doesn't configure them in the second run. Otherwise, the would be reset to their defaults which is probably surprising to the user.

Support multiple data paths

Currently, the benchmark candidate's config option path.data in elasticsearch.yml is allowed to have only one entry. We should allow the user to specify more than one.

Split metrics gathering from reporting

Currently there is no data model whatsoever for metrics but just plain log files. Each reporter just parses these log files and creates reports from them. We should define a dedicated metrics data model. Reporters should just be responsible for rendering these metrics.

Sub tasks:

Define a data model. For now, we will just allow simple key, value pairs in the context of the triple (invocation_timestamp, track, track_setup).
Implement a metrics store based on a dedicated Elasticsearch instance.
Store metrics in the metrics store. This will be properly implemented in #21.
Use metrics store in reporting. In the first step, we will only use the metrics store for summary reports. Graphs will follow in #46.

Separate content and logic in reporter.py

We should externalize all templates in reporter.py and use some template engine like Jinja2. Note: We also need to define a metrics data model (see #8) but these two tickets are independent.

Benchmark different memory configurations

We should consider benchmarking with different memory settings per node. The intention is that below 4GB, compressed oops use byte offsets, then object offsets (division by 8) and above 32GB they are uncompressed, i.e. 64 bits wide (source). The intention behind measuring it at all is to show that it is a bad idea to run at large heaps, not so much the concrete numbers (and we should mention that there are also other side effects like that it (obviously) affects GC times in HotSpot).

Allow pluggable profilers

We should have a possibility to plug in different profilers which gather metrics, like CPU, memory, JVM statistics.

Support environment-specific node configuration

There are some occasions when we want to provide an environment specific node configuration to Rally. Currently, this would be necessary for specifying multiple data paths properly (see also #11) which we work around with an undocumented command line option for now.

As the setting is environment-specific it should not be hard-coded in the track setup but rather provided externally via a separate configuration file.

Check for accidental bottlenecks

We should check for accidental bottlenecks in the benchmark driver, problems with system setup (like #3) etc. We should also check the overhead of different profilers (see #19) and the metrics store (see #8).

This just serves as a reminder ticket and it is expected that lots of related tickets are created after initial analysis.

Some concrete things to check:

Behavior during indexing: Effects of number of indexing threads, what numbers are reported and how are they calculated? how are time intervals measured? What is the effect of different bulk sizes? Where are our bottlenecks?
Latency: What is the overhead for latency measurement? Are we prone to coordinated omission (I almost certainly think so and getting experimental evidence is easier once #62 is in)? What numbers are gathered and reported? This is tracked in the follow-up ticket #64
How are system metrics gathered and reported (e.g. CPU stats)? Can we cross-validate their correctness (e.g. GC times can be cross-validated with Java flight recorder)?
What is the overhead of different profilers in Rally? (Postponed and separately tracked in #66)
Do we account for proper warmup so we only measure a stable system?

Experimental setup

As a first step, I have recorded all HTTP requests and responses that are issued during the benchmark and have mocked the benchmark candidate with an nginx returning static responses (under the assumption that the bottleneck is now the benchmark driver).

The assumption that the benchmark driver is the bottleneck is supported by the result of the following benchmark with wrk against nginx:

dm@io:rally/performance_verification $ wrk -t8 -c8 -d128s -s post.lua http://127.0.0.1:8200/geonames/type/_bulk
Running 2m test @ http://127.0.0.1:8200/geonames/type/_bulk
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.12ms  427.35us   2.78ms   72.78%
    Req/Sec   174.00    337.51     1.04k    88.89%
  158 requests in 2.13m, 124.35MB read

post.lua represents one bulk request with 5000 documents. So, considering that we reach on average 174 requests / second, this is the equivalent of 174 * 5.000 = 870.000 documents / second. If we reach numbers in this range, we should consider our mock Elasticsearch (nginx) the bottleneck, else Rally.

8 test threads were chosen because this is currently the maximum number of client threads that are used by Rally on the benchmark machine for this ticket (i.e. my notebook).

Introduce benchmark "stages" that can be invoked separately

We should split benchmarking into multiple stages which can also be invoked separately (by providing a command line option). For starters, we should expose these stages:

race: (for now): set up of the benchmark candidate and the actual benchmarking stage (generating metrics but without generating reports). We could split this phase further in some kind of setup phase but we'll keep it simple for now.
report: Just generates reports from existing data
all: Meta-command that invokes all stages, intended for local runs (which should also be the default if possible)
configure to configure Rally either initially or reconfigure it later.

Download documents.bz2 to file to .tmp and rename

I had a truncated .bz2 file and hit scary exceptions running esrally, I think because I ctrl+c'd once while the download was happening and then when I ran again esrally assumed the file was complete.

We should just download to temp file and rename in the end to prevent this?

Show test details

From a customer request:

Any chance the annotations could be made "live" so we can see what the underlying change was?

(https://elasticsearch.zendesk.com/agent/tickets/16553)

Add perf profiler

With the profiler infrastructure in place (see #19), we could add a perf profiler to get CPU level information. This is also a first step towards flamegraph profiling (see #29).

Consider adding an Uber-benchmark

In a blog post an analysis of 1.1 billion Uber rides is discussed. The data are also available at the nyc-taxi-data Github project. We should check whether this would be a worthwhile benchmark. Also check the licensing model in more detail.

Open source Rally

Before we open source Rally, we have to tackle a few things:

Measure merge parts time

In the nightlies, we have a track setup 'defaults_verbose_iw', which provides the raw data for the merge part chart.

We need three things for this:

A custom log configuration in logging.yml: index.engine.lucene.iw: TRACE
Disable auto-throttling in elasticsearch.yml: index.merge.scheduler.auto_throttle: false
A new profiler which analyzes the log files after the benchmark has run and extracts the relevant metrics.

As this will need support to customize logs (#16) and compress them (#17) as trace logging creates large log files, this is currently blocked by the aforementioned tickets.

Rethink directory structure

We have to ensure we keep a logical directory structure for all the files that are written considering all of the newly introduced features. Things to keep in mind: we have multiple tracks, multiple setups, multiple profilers (see #19) and we probably also want to keep the install directory around (see #5)

Evaluate provisioning of benchmark machines with Ansible

We should evaluate whether we can leverage the Elasticsearch Ansible playbook to provision machines with Ansible instead of doing it manually.

Benefits:

Reduces complexity of Rally
Easier to support more complex scenarios like multi-node, plugin installation etc. (hopefully)

As Ansible is also written in Python, we should be able to use the Ansible Python API.

Port new merge parts chart from nightly build

See https://benchmarks.elastic.co

Implement a dygraph reporter

Although we now have implemented an integration with Kibana (see #46), we still want to be able to use the dygraphs library for the nightlies. Therefore, we have to implement a new reporter, which reads data from the metrics store and produces an HTML report.

Add Java Flight Recorder as a profiler

Related to #19 we can add the Java Flight Recorder as a proof of concept profiler. This has the advantage that we get already time-series data for the first time when running a macrobenchmark and not just summary statistics.

Move metrics to new profiler infrastructure

Currently, metrics gathering is completely tied to the benchmark runner (i.e. driver.py). We should refactor this and create individual profilers. I'd expect that the profiler infrastructure will also change due to the refactoring efforts in this ticket.

Allow to define specific revisions

Rally supports the parameter --revision with the two "meta-revisions" current and latest. A user should be able to specify also:

A git commit hash
A timestamp

Add JIT profiler

We should add a profiler to gather JIT compiler logs. This is needed for verification of proper warmup times (among other things).

Check whether benchmarked cluster is successfully started

Currently, we just blindly try to launch the cluster but do not check whether it worked. We should also provide some user feedback in this case.

Allow different lucene configurations

In comparing the index and query performance of geo_point types with different versions of Lucene @mikemccand noted a difference in the number of lucene segments reported even though the number of test documents is fixed. This effectively makes query performance a bogus comparison. The following image illustrates the differences:

It would be nice to define different lucene configurations (e.g., force merges) in a TrackSpecification to improve comparisons.

Replace existing reporting graphics with Kibana

Add flamegraph profiler

With the profiler infrastructure in place (see #19), we could add flamegraphs to get deeper insights down to CPU level.