
rally's Introduction


Rally

Rally is the macrobenchmarking framework for Elasticsearch

What is Rally?

You want to benchmark Elasticsearch? Then Rally is for you. It can help you with the following tasks:

  • Setup and teardown of an Elasticsearch cluster for benchmarking
  • Management of benchmark data and specifications even across Elasticsearch versions
  • Running benchmarks and recording results
  • Finding performance problems by attaching so-called telemetry devices
  • Comparing performance results

We have also put considerable effort into Rally to ensure that benchmarking data are reproducible.

Quick Start

Rally is developed for Unix and is actively tested on Linux and macOS. Rally supports benchmarking Elasticsearch clusters running on Windows but Rally itself needs to be installed on machines running Unix.

Installing Rally

Note: If you actively develop on Elasticsearch, we recommend that you install Rally in development mode instead, as Elasticsearch is fast-moving and Rally continuously adapts to the latest main version.

Install Python 3.8+ including pip3, git 1.9+ and an appropriate JDK to run Elasticsearch. Be sure that JAVA_HOME points to that JDK. Then run the following command, optionally prefixed by sudo if necessary:

pip3 install esrally

If you have any trouble or need more detailed instructions, please look in the detailed installation guide.
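If you are unsure whether your environment meets these requirements, a quick sanity check along the following lines can help (a minimal sketch; it only verifies the versions above and that JAVA_HOME points to an existing directory):

import os
import shutil
import sys

# Rally needs Python 3.8+ with pip3, git and a JDK referenced by JAVA_HOME.
assert sys.version_info >= (3, 8), "Python 3.8 or later is required"
assert shutil.which("pip3"), "pip3 is not on the PATH"
assert shutil.which("git"), "git is not on the PATH"

java_home = os.environ.get("JAVA_HOME")
assert java_home and os.path.isdir(java_home), "JAVA_HOME must point to a JDK"
print("Prerequisites look fine - run: pip3 install esrally")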

Run your first race

Now we're ready to run our first race:

esrally race --distribution-version=6.0.0 --track=geonames

This will download Elasticsearch 6.0.0 and run Rally's default track - the geonames track - against it. After the race, a summary report is written to the command line:

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                         Metric |                 Task |     Value |   Unit |
|-------------------------------:|---------------------:|----------:|-------:|
|            Total indexing time |                      |   28.0997 |    min |
|               Total merge time |                      |   6.84378 |    min |
|             Total refresh time |                      |   3.06045 |    min |
|               Total flush time |                      |  0.106517 |    min |
|      Total merge throttle time |                      |   1.28193 |    min |
|               Median CPU usage |                      |     471.6 |      % |
|             Total Young Gen GC |                      |    16.237 |      s |
|               Total Old Gen GC |                      |     1.796 |      s |
|                     Index size |                      |   2.60124 |     GB |
|                  Total written |                      |   11.8144 |     GB |
|         Heap used for segments |                      |   14.7326 |     MB |
|       Heap used for doc values |                      |  0.115917 |     MB |
|            Heap used for terms |                      |   13.3203 |     MB |
|            Heap used for norms |                      | 0.0734253 |     MB |
|           Heap used for points |                      |    0.5793 |     MB |
|    Heap used for stored fields |                      |  0.643608 |     MB |
|                  Segment count |                      |        97 |        |
|                 Min Throughput |         index-append |   31925.2 | docs/s |
|              Median Throughput |         index-append |   39137.5 | docs/s |
|                 Max Throughput |         index-append |   39633.6 | docs/s |
|      50.0th percentile latency |         index-append |   872.513 |     ms |
|      90.0th percentile latency |         index-append |   1457.13 |     ms |
|      99.0th percentile latency |         index-append |   1874.89 |     ms |
|       100th percentile latency |         index-append |   2711.71 |     ms |
| 50.0th percentile service time |         index-append |   872.513 |     ms |
| 90.0th percentile service time |         index-append |   1457.13 |     ms |
| 99.0th percentile service time |         index-append |   1874.89 |     ms |
|  100th percentile service time |         index-append |   2711.71 |     ms |
|                           ...  |                  ... |       ... |    ... |
|                           ...  |                  ... |       ... |    ... |
|                 Min Throughput |     painless_dynamic |   2.53292 |  ops/s |
|              Median Throughput |     painless_dynamic |   2.53813 |  ops/s |
|                 Max Throughput |     painless_dynamic |   2.54401 |  ops/s |
|      50.0th percentile latency |     painless_dynamic |    172208 |     ms |
|      90.0th percentile latency |     painless_dynamic |    310401 |     ms |
|      99.0th percentile latency |     painless_dynamic |    341341 |     ms |
|      99.9th percentile latency |     painless_dynamic |    344404 |     ms |
|       100th percentile latency |     painless_dynamic |    344754 |     ms |
| 50.0th percentile service time |     painless_dynamic |    393.02 |     ms |
| 90.0th percentile service time |     painless_dynamic |   407.579 |     ms |
| 99.0th percentile service time |     painless_dynamic |   430.806 |     ms |
| 99.9th percentile service time |     painless_dynamic |   457.352 |     ms |
|  100th percentile service time |     painless_dynamic |   459.474 |     ms |

----------------------------------
[INFO] SUCCESS (took 2634 seconds)
----------------------------------

Getting help

How to Contribute

See all details in the contributor guidelines.

License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.

Copyright 2015-2021 Elasticsearch https://www.elastic.co

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

rally's People

Contributors

alexsapran, b-deam, bartier, cdahlqvist, danielmitterdorfer, dependabot[bot], djrickyb, dliappis, dnhatn, drawlerr, ebadyano, ebuildy, favilo, gareth-ellis, gbanasiak, honzakral, hub-cap, inqueue, j-bennet, jimczi, kesslerm, kquick, michaelbaamonde, mikemccand, nik9000, openbl, paulcoghlan, pquentin, probakowski, ywelsch


rally's Issues

Run Rally on EC2

We also run the nightly benchmarks on EC2. Rally should be able to do this too. Hint: it could turn out that we just need a provisioning script in place and there is not much to do in Rally itself.

Use metrics store in graph reporter (reporter.py)

After we've introduced a metrics store in #8, we have to make use of it in the graph reporter.

Note for nightly benchmarks: We will not support multiple metrics store implementations but rather migrate the existing files to the new structure in a one-time effort.

Allow to define a custom logging configuration

We want to customize the logging configuration for certain benchmarks. Previously this was possible in a limited fashion with some command line flags (e.g. verboseIW). We want to allow users to define custom logging snippets in a track specification.

Note that depending on the log configuration, logs could get huge so we should also compress them (by default). However, this is not in the scope of this ticket. This will be tackled in #17.
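To illustrate the direction (the snippet format is hypothetical and not an existing Rally or track feature), such a snippet could simply map logger names to levels and be merged into the Python logging configuration that Rally uses:

import logging.config

# Hypothetical per-track logging snippet: logger name -> level.
track_logging_snippet = {"elasticsearch": "DEBUG", "rally.driver": "INFO"}

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "loggers": {
        name: {"level": level, "handlers": ["console"]}
        for name, level in track_logging_snippet.items()
    },
})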

Physically isolate the benchmark candidate

We currently run the benchmark driver on the same physical machine as the benchmark candidate, which almost certainly skews results (how much is subject to further analysis, see also #9). Even in the local case we should strive to minimize interference as much as possible (pinning?).
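For the pinning idea, one Linux-only option is to restrict the driver process to a dedicated set of cores so it competes less with the benchmark candidate (a sketch; the chosen core set is arbitrary and os.sched_setaffinity is not available on macOS):

import os

# Pin the benchmark driver (this process) to cores 0 and 1 and leave the
# remaining cores to the benchmark candidate. Linux only.
driver_cores = {0, 1}
os.sched_setaffinity(0, driver_cores)
print("driver restricted to cores:", sorted(os.sched_getaffinity(0)))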

Clean up all paths when multiple data paths are specified

A user can specify multiple data paths but we currently clean up only the main data path, which leads to misleading failures in follow-up runs:

Racing on track 'Geonames' with setup '4gheap'
Traceback (most recent call last):
  File "/usr/local/bin/esrally", line 9, in <module>
    load_entry_point('esrally==0.0.3.dev0', 'console_scripts', 'esrally')()
  File "/home/ec2-user/rally/rally/rally.py", line 205, in main
    race_control.start(subcommand)
  File "/home/ec2-user/rally/rally/racecontrol.py", line 35, in start
    p.do(track)
  File "/home/ec2-user/rally/rally/racecontrol.py", line 85, in do
    self._driver.setup(cluster, track, track_setup)
  File "/home/ec2-user/rally/rally/driver.py", line 37, in setup
    cluster.client().indices.create(index=track.index_name)
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/client/utils.py", line 69, in _wrapped
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/client/indices.py", line 105, in create
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/transport.py", line 329, in perform_request
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/connection/http_urllib3.py", line 106, in perform_request
  File "/usr/local/lib/python3.4/site-packages/elasticsearch-2.2.0-py3.4.egg/elasticsearch/connection/base.py", line 105, in _raise_error
elasticsearch.exceptions.RequestError: TransportError(400, 'index_already_exists_exception', 'already exists')
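A fix could simply iterate over every configured data path instead of only the main one (a minimal sketch; data_paths stands for whatever list Rally reads from its configuration and the example paths are made up):

import os
import shutil

def clean_data_paths(data_paths):
    # Wipe the contents of every configured data path, not just the first one.
    for path in data_paths:
        if os.path.isdir(path):
            shutil.rmtree(path)
        os.makedirs(path, exist_ok=True)

clean_data_paths(["/var/lib/es/data0", "/var/lib/es/data1"])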

Rally should not remove the elasticsearch install when finished

Could rally not remove the install directory once a benchmark is finished? This would allow for further understanding of the benchmark results, for instance to see the distribution of the sizes of segments, how much disk is used for doc values, etc.

Add support to run a specific track setup

We should allow users to run only specific track setups. This is mainly needed for the nightlies but it would also allow local users to specify other track setups than the default one.

Disk usage measurement is not portable

Like in the original version, we use du -s to determine the final index size and assume that the output always gives the size of the data directory in KB.

According to the man page of du "Display values are in units of the first available SIZE from --block-size, and the DU_BLOCK_SIZE, BLOCK_SIZE and BLOCKSIZE environment variables. Otherwise, units default to 1024 bytes (or 512 if POSIXLY_CORRECT is set)."

On my Mac I get on a directory with one file < 4K in it:

dm@io:scratch/test $ du -s
8   .
dm@io:scratch/test $ du -hs
4.0K    .

while on a Linux box, I get identical results.

On a Mac we would then mistakenly report an index size of 8K although it is 4K.

Apart from that we should decide whether we want to report the "real" file size in bytes or the file size based on the number of consumed filesystem blocks. I'd tend to measure the former. Implementation hint for the latter: du -sk.

/cc @mikemccand
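Reporting the "real" size in bytes can be done portably from Python instead of parsing du output (a sketch; it ignores symlinks and counts logical file sizes, not filesystem blocks):

import os

def directory_size_in_bytes(root):
    # Sum of logical file sizes below root, independent of the block size
    # that du would report.
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

print(directory_size_in_bytes("scratch/test"), "bytes")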

Compress log files

Related to #16: We want to compress log files so in case somebody enables trace logging we don't leave huge log files behind.
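Compressing each log file after the benchmark could look roughly like this (a sketch using gzip from the standard library; the log directory is illustrative):

import glob
import gzip
import os
import shutil

def compress_logs(log_dir):
    # Replace every *.log file with a gzip-compressed copy.
    for log_file in glob.glob(os.path.join(log_dir, "*.log")):
        with open(log_file, "rb") as src, gzip.open(log_file + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(log_file)

compress_logs("logs/")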

Separate track specification from track execution

We should not put tracks (i.e. the actual benchmark specification) directly into core but should provide some kind of benchmark repository (think Maven central but for benchmarks). However we still must be able to develop benchmarks locally.

We should also introduce a logical URL schema for track data URLs (e.g. $ROOT_URL/$benchmark-name/$index-name/$type-name/)

Improve flexibility of benchmarks

Vision (Short Term)

After we have implemented everything concerning this meta-ticket, users will be able to define their own tracks separately from Rally. Tracks will then be able to move at a different pace than Rally. Users can also have their own private tracks if they want, and tracks can be run against different versions of Elasticsearch. Note that this covers only the track-specific parts; version-aware cluster provisioning is out of scope here (i.e. if the startup options format changes, you're still screwed).

Maintenance of tracks across multiple versions of Elasticsearch may involve a bit of manual work, but if needed we could add tool support (I'd just not consider it a priority at the moment; we have larger issues to tackle, but we're happy to receive PRs). So we solve the maintenance part primarily by documenting how to do it.

These are the relevant sub-tickets:

  • #61: Allow more flexibility over what exact steps are executed overall in Rally
  • #52: Allow more control over how the benchmark candidate is configured
  • #26: Separate track specification from track execution
  • #69 Support tracks across multiple versions of Elasticsearch
  • #99 Support multiple track repositories

Provisioned cluster name should account for setups on multiple machines

Currently, the provisioner takes the host name into account when creating a cluster name. We should change that to something that is independent of the host in order to allow Rally to provision benchmark candidates spreading over multiple machines (e.g. use invocation timestamp or track and track-setup)

Allow the user to define which steps Rally performs

This is related to #5 and is intended as a long-term solution for that ticket. Consider users which have already prepared their cluster, indexed lots of data but want to benchmark e.g. search performance. We should allow them to use Rally for benchmarking. This means that we need to introduce more flexibility into the stages that Rally performs (checkout, build, provisioning, launch, benchmarking).

Some disadvantages that we need to consider:

  • We lose the ability to attach certain profilers to the benchmark candidate if we do not launch it from Rally (e.g. the JFR profiler).
  • From a conceptual point of view, we cannot guarantee reproducible results, as we have no control whatsoever over the environment in which the benchmark candidate is launched, and it is probably also not possible to gather enough data about its runtime environment (OS, CPU, memory, launch settings, etc.).

Allow to specify the effective start time

Currently, Rally assumes "now" as the start time, which has been fine so far. But as we would like to support back testing etc., we want control over the assumed effective start date.

Therefore, we will add an (undocumented) command line option to override the effective start date. It is undocumented because it is not only useless to the intended user base but also potentially confusing.
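A sketch of how the override could look on the command line (argparse-based; the exact option name and timestamp format are illustrative, not a committed interface):

import argparse
import datetime

parser = argparse.ArgumentParser()
# Undocumented override; defaults to "now" when the flag is not given.
parser.add_argument(
    "--effective-start-date",
    type=lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    default=datetime.datetime.utcnow(),
)
args = parser.parse_args(["--effective-start-date", "2016-01-01 00:00:00"])
print(args.effective_start_date)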

Expose benchmark script

We implement, more or less implicitly, two benchmark plans in driver.py: bulk indexing and searching. Users may want to define their own benchmark plans, but we need to prepare the infrastructure for that before tackling this.

With this in place we could benchmark all kinds of situations, such as single-shot query latency.

Unzip the file after downloading

The bz2 decompression eats some CPU, so the driver is not stressing Elasticsearch as much as possible. The observed difference is small, but it is there...
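Decompressing the corpus once, right after the download, keeps that CPU cost out of the measurement phase (a sketch using the standard bz2 module; the file names are illustrative):

import bz2
import shutil

# Decompress the corpus once after downloading so the benchmark itself
# streams the plain file and spends no CPU on bz2 during measurement.
with bz2.open("documents.json.bz2", "rb") as src, open("documents.json", "wb") as dst:
    shutil.copyfileobj(src, dst)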

esrally configure could merge a previous configuration

When a user runs esrally configure for the second time, Rally will overwrite the existing configuration (but issue a warning beforehand). We could instead read the existing configuration and use the values stored there as defaults.

Consider also the following scenario:

esrally configure --advanced-config
esrally configure

In this case we should just carry over the advanced config values from the first run even though the user doesn't configure them in the second run. Otherwise, they would be reset to their defaults, which would probably surprise the user.
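Merging could read the existing configuration first and only overwrite the keys the user actually answers in the new run (a sketch with configparser; Rally's real configuration layout is not assumed here):

import configparser

def merge_config(config_file, new_values):
    # Start from the existing configuration; keys that the user did not
    # answer in this run keep their previous values.
    config = configparser.ConfigParser()
    config.read(config_file)
    for section, values in new_values.items():
        if not config.has_section(section):
            config.add_section(section)
        for key, value in values.items():
            config.set(section, key, value)
    with open(config_file, "w") as f:
        config.write(f)

merge_config("rally.ini", {"system": {"root.dir": "/home/user/.rally"}})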

Support multiple data paths

Currently, the benchmark candidate's config option path.data in elasticsearch.yml is allowed to have only one entry. We should allow the user to specify more than one.
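For reference, the multiple entries would eventually have to end up in the candidate's elasticsearch.yml; a tiny sketch of rendering that setting (the paths are made up, and a YAML list would work just as well as the comma-separated form shown here):

def render_data_paths(paths):
    # path.data accepts multiple entries, e.g. as a comma-separated list.
    return "path.data: " + ",".join(paths)

print(render_data_paths(["/mnt/disk0/data", "/mnt/disk1/data"]))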

Split metrics gathering from reporting

Currently there is no data model whatsoever for metrics but just plain log files. Each reporter just parses these log files and creates reports from them. We should define a dedicated metrics data model. Reporters should just be responsible for rendering these metrics.

Sub tasks:

  • Define a data model. For now, we will just allow simple key-value pairs in the context of the triple (invocation_timestamp, track, track_setup); a sketch of such a record follows this list.
  • Implement a metrics store based on a dedicated Elasticsearch instance.
  • Store metrics in the metrics store. This will be properly implemented in #21.
  • Use metrics store in reporting. In the first step, we will only use the metrics store for summary reports. Graphs will follow in #46.
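A single record in that data model could look roughly as follows (a sketch; the index name and field names are illustrative, and depending on the elasticsearch-py version the keyword may be body instead of document):

import datetime

from elasticsearch import Elasticsearch

# One metric sample in the context of (invocation_timestamp, track, track_setup).
metric = {
    "invocation_timestamp": datetime.datetime.utcnow().isoformat(),
    "track": "geonames",
    "track_setup": "defaults",
    "name": "indexing_throughput",
    "value": 39137.5,
    "unit": "docs/s",
}

metrics_store = Elasticsearch("http://localhost:9200")
metrics_store.index(index="rally-metrics", document=metric)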

Benchmark different memory configurations

We should consider benchmarking with different memory settings per node. The background is that below 4 GB, compressed oops are plain byte offsets; up to about 32 GB they are object offsets (the address divided by 8); and above 32 GB they are uncompressed, i.e. 64 bits wide (source). The intention behind measuring this at all is to show that it is a bad idea to run with very large heaps, not so much to produce concrete numbers (and we should mention that there are also other side effects, e.g. that larger heaps obviously affect GC times in HotSpot).

Allow pluggable profilers

We should have a possibility to plug in different profilers which gather metrics, like CPU, memory, JVM statistics.
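The plug-in contract could be as small as attach/detach/report (a sketch; the class and method names are made up for illustration and do not reflect an existing Rally API):

import abc

class Profiler(abc.ABC):
    """Hypothetical plug-in contract for metrics-gathering profilers."""

    @abc.abstractmethod
    def attach(self, pid):
        """Start gathering data for the candidate process with this pid."""

    @abc.abstractmethod
    def detach(self):
        """Stop gathering data."""

    @abc.abstractmethod
    def report(self):
        """Return a dict of metric name -> value."""

# Concrete profilers (CPU, memory, JVM stats, ...) would register here and
# Rally would attach all of them to the benchmark candidate.
registered_profilers = []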

Support environment-specific node configuration

There are some occasions when we want to provide an environment-specific node configuration to Rally. Currently, this would be necessary for specifying multiple data paths properly (see also #11), which we work around with an undocumented command line option for now.

As the setting is environment-specific it should not be hard-coded in the track setup but rather provided externally via a separate configuration file.

Check for accidental bottlenecks

We should check for accidental bottlenecks in the benchmark driver, problems with system setup (like #3) etc. We should also check the overhead of different profilers (see #19) and the metrics store (see #8).

This just serves as a reminder ticket and it is expected that lots of related tickets are created after initial analysis.

Some concrete things to check:

  • Behavior during indexing: Effects of number of indexing threads, what numbers are reported and how are they calculated? how are time intervals measured? What is the effect of different bulk sizes? Where are our bottlenecks?
  • Latency: What is the overhead of latency measurement? Are we prone to coordinated omission? (I almost certainly think so, and getting experimental evidence is easier once #62 is in; see the sketch after this list.) What numbers are gathered and reported? This is tracked in the follow-up ticket #64
  • How are system metrics gathered and reported (e.g. CPU stats)? Can we cross-validate their correctness (e.g. GC times can be cross-validated with Java flight recorder)?
  • What is the overhead of different profilers in Rally? (Postponed and separately tracked in #66)
  • Do we account for proper warmup so we only measure a stable system?
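To make the coordinated-omission point concrete: latency should be measured against the intended request schedule, not against the moment the client actually got around to sending the request (a sketch; all numbers are made up):

# Service time: duration of the request itself.
# Latency: service time plus the time the request waited because the client
# fell behind its intended schedule. Reporting only service time hides
# coordinated omission.
scheduled_start = 10.0   # when the request should have been sent (seconds)
actual_start = 11.2      # when the client actually sent it
actual_end = 11.3        # when the response arrived

service_time = actual_end - actual_start   # 0.1 s
latency = actual_end - scheduled_start     # 1.3 s, includes the waiting time
print(f"service time={service_time:.1f}s latency={latency:.1f}s")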

Experimental setup

As a first step, I have recorded all HTTP requests and responses that are issued during the benchmark and have mocked the benchmark candidate with an nginx returning static responses (under the assumption that the bottleneck is now the benchmark driver).

The assumption that the benchmark driver is the bottleneck is supported by the result of the following benchmark with wrk against nginx:

dm@io:rally/performance_verification $ wrk -t8 -c8 -d128s -s post.lua http://127.0.0.1:8200/geonames/type/_bulk
Running 2m test @ http://127.0.0.1:8200/geonames/type/_bulk
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.12ms  427.35us   2.78ms   72.78%
    Req/Sec   174.00    337.51     1.04k    88.89%
  158 requests in 2.13m, 124.35MB read

post.lua represents one bulk request with 5,000 documents. So, considering that we reach on average 174 requests / second, this is the equivalent of 174 * 5,000 = 870,000 documents / second. If Rally reaches numbers in this range, we should consider our mock Elasticsearch (nginx) the bottleneck; otherwise, Rally itself is the bottleneck.

8 test threads were chosen because this is currently the maximum number of client threads that are used by Rally on the benchmark machine for this ticket (i.e. my notebook).

Introduce benchmark "stages" that can be invoked separately

We should split benchmarking into multiple stages which can also be invoked separately (by providing a command line option). For starters, we should expose these stages (a command-line sketch follows the list):

  • race (for now): sets up the benchmark candidate and runs the actual benchmark (generating metrics but no reports). We could split this phase further into some kind of setup phase, but we'll keep it simple for now.
  • report: just generates reports from existing data.
  • all: meta-command that invokes all stages, intended for local runs (this should also be the default if possible).
  • configure: configures Rally initially, or reconfigures it later.
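On the command line, these stages could map onto subcommands (a sketch with argparse; only the dispatch skeleton is shown, the real wiring into Rally is omitted):

import argparse

parser = argparse.ArgumentParser(prog="esrally")
subparsers = parser.add_subparsers(dest="stage", required=True)
for stage in ("race", "report", "all", "configure"):
    subparsers.add_parser(stage)

args = parser.parse_args(["report"])
# "all" would run setup, benchmarking and reporting in sequence.
print("running stage:", args.stage)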

Download documents.bz2 to a .tmp file and rename

I had a truncated .bz2 file and hit scary exceptions running esrally. I think this happened because I hit Ctrl+C once while the download was in progress, and when I ran esrally again it assumed the file was complete.

We should just download to a temporary file and rename it at the end to prevent this.
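The download-then-rename approach could look like this (a sketch with urllib; the URL is illustrative, and the final rename is atomic on the same filesystem, so an interrupted download never ends up under the expected name):

import os
import urllib.request

def download(url, target):
    # Download to a temporary name first; only a completed download is
    # renamed to the real target, so Ctrl+C never leaves a truncated file
    # that a later run would mistake for a finished one.
    tmp = target + ".tmp"
    urllib.request.urlretrieve(url, tmp)
    os.rename(tmp, target)

download("https://example.org/documents.json.bz2", "documents.json.bz2")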

Add perf profiler

With the profiler infrastructure in place (see #19), we could add a perf profiler to get CPU level information. This is also a first step towards flamegraph profiling (see #29).
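Wrapping perf could be as simple as attaching perf record to the already running candidate and stopping it after the benchmark (a sketch; it assumes a Linux host with perf installed, and the pid is illustrative):

import signal
import subprocess

def start_perf(pid, output="perf.data"):
    # Attach `perf record` (with call graphs) to the Elasticsearch process.
    return subprocess.Popen(["perf", "record", "-g", "-p", str(pid), "-o", output])

perf = start_perf(pid=12345)
# ... run the benchmark ...
perf.send_signal(signal.SIGINT)  # perf writes perf.data on SIGINT
perf.wait()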

Open source Rally

Before we open source Rally, we have to tackle a few things:

  • Check if docs are helpful enough and possibly provide more docs
  • Check if error and feedback messages are helpful
  • Check if config defaults should be changed to account for non-core devs
  • Add README.rst for PyPI
  • Upload to PyPI
  • Upload user documentation to readthedocs
  • Add some kind of periodic update check, as Rally will probably be updated quite often and we want users to upgrade early (e.g. update_checker); we don't want to bug users, though, so we'll leave this for now
  • Check if everything regarding licensing is correctly done (also: License files for benchmarks!)
  • Optional: Generate Pydocs (we ignore this for now as Rally is currently meant to be used as a tool rather than a library, so the API docs are mainly relevant for developers working on Rally itself)
  • Announce on Discuss (anywhere else?)
  • Write a blog post about Rally
  • Document contribution guidelines
  • Set up CLA check (ask Karel)

Measure merge parts time

In the nightlies, we have a track setup 'defaults_verbose_iw', which provides the raw data for the merge part chart.

We need three things for this:

  1. A custom log configuration in logging.yml: index.engine.lucene.iw: TRACE
  2. Disable auto-throttling in elasticsearch.yml: index.merge.scheduler.auto_throttle: false
  3. A new profiler which analyzes the log files after the benchmark has run and extracts the relevant metrics.

As this will need support to customize logs (#16) and compress them (#17) as trace logging creates large log files, this is currently blocked by the aforementioned tickets.

Rethink directory structure

We have to ensure we keep a logical directory structure for all the files that are written considering all of the newly introduced features. Things to keep in mind: we have multiple tracks, multiple setups, multiple profilers (see #19) and we probably also want to keep the install directory around (see #5)

Evaluate provisioning of benchmark machines with Ansible

We should evaluate whether we can leverage the Elasticsearch Ansible playbook to provision machines with Ansible instead of doing it manually.

Benefits:

  • Reduces complexity of Rally
  • Easier to support more complex scenarios like multi-node, plugin installation etc. (hopefully)

As Ansible is also written in Python, we should be able to use the Ansible Python API.

Implement a dygraph reporter

Although we now have implemented an integration with Kibana (see #46), we still want to be able to use the dygraphs library for the nightlies. Therefore, we have to implement a new reporter, which reads data from the metrics store and produces an HTML report.

Move metrics to new profiler infrastructure

Currently, metrics gathering is completely tied to the benchmark runner (i.e. driver.py). We should refactor this and create individual profilers. I'd expect that the profiler infrastructure will also change due to the refactoring efforts in this ticket.

Allow to define specific revisions

Rally supports the parameter --revision with the two "meta-revisions" current and latest. A user should also be able to specify:

  • A git commit hash
  • A timestamp
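Telling the new forms apart from the existing meta-revisions could be done with a simple check (a sketch; the timestamp convention shown here is just one possibility):

import re

def parse_revision(value):
    # "current" and "latest" keep their existing meaning; otherwise try to
    # interpret the value as a git commit hash or a timestamp.
    if value in ("current", "latest"):
        return ("meta", value)
    if re.fullmatch(r"[0-9a-f]{7,40}", value):
        return ("commit", value)
    if re.fullmatch(r"@\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", value):
        return ("timestamp", value)
    raise ValueError("unrecognized revision: " + value)

print(parse_revision("latest"))
print(parse_revision("66b5ed0"))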

Add JIT profiler

We should add a profiler to gather JIT compiler logs. This is needed for verification of proper warmup times (among other things).

Allow different lucene configurations

While comparing the index and query performance of geo_point types across different versions of Lucene, @mikemccand noted a difference in the number of reported Lucene segments even though the number of test documents is fixed. This effectively makes the query performance comparison bogus. The following image illustrates the differences:

[image: newbenchmarks]

It would be nice to define different lucene configurations (e.g., force merges) in a TrackSpecification to improve comparisons.
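Until that exists in the track specification, segment counts can already be normalized by force merging before the query benchmarks run (a sketch using the elasticsearch-py client; index name and endpoint are illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
# Force merge down to a single segment so that query comparisons across
# Lucene versions are not skewed by differing segment counts.
es.indices.forcemerge(index="geonames", max_num_segments=1)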
