
ombt-orchestrator's Introduction

Message bus evaluation framework

Context

This is a framework to benchmark the communication middleware supported by oslo.messaging. Its primary goal is to address the evaluation described in https://docs.openstack.org/performance-docs/latest/test_plans/massively_distribute_rpc/plan.html.

It is built on top of:

  • EnOSlib. This library describes the experimental workflow and enforces it: from the deployment to the performance metrics analysis.
  • ombt. This tool coordinates the benchmark once all the agents are up and running.

From a high-level point of view, the framework is able to:

  • deploy a communication bus (e.g. RabbitMQ, qdr aka qpid-dispatch-router),
  • deploy a set of clients and servers that will communicate through it,
  • start a benchmark while gathering metrics.

A typical test consists of the following components:

Client 1---------+      +----------------------+     +-----> Server 1
                 |      |                      |     |
                 +----> |  Communication       | ----+-----> Server 2
Client 2--------------> |  Middleware          |     |
                 +----> |  (e.g qdr, rabbitmq) |     |
...              |      |                      |     |
                 |      +----------------------+     +------> Server n
Client n---------+              |                             /
  \                             |                           /
    \                           |                         /
      \  --  --  --  --  -- Monitoring --  --  --  --  --

Installation

  • Clone the repository:
git clone https://github.com/msimonin/ombt-orchestrator
cd ombt-orchestrator
  • Install the dependencies:
pip install -U pip
pip install -e .

On Grid'5000 you can run these commands from any frontend.

Configuration

The default configuration is currently defined in the conf.yaml file.
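The exact schema is defined by that file; the following is only a hypothetical sketch, with key names assumed from the providers and CLI options used throughout this document:

# hypothetical sketch, NOT the shipped conf.yaml: every key name and value
# below is an assumption inferred from this document
g5k:                      # provider section (a vagrant section would be similar)
  resources:
    machines:
      - roles: [bus]
        cluster: paravance
        nodes: 4
test_case_1:              # per-test-case defaults
  nbr_clients: 10
  nbr_servers: 2
  pause: 0.0
  call_type: rpc-call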

Command line interface

> oo
Usage: oo [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  backup       Backup environment logs [after test_case_*].
  campaign     Perform a TEST according to the (swept)...
  deploy       Claim resources from a PROVIDER and configure...
  destroy      Destroy all the running dockers (keeping...
  g5k          Claim resources on Grid'5000 (frontend).
  inventory    Generate the Ansible inventory [after g5k,...
  prepare      Configure available resources [after g5k,...
  test_case_1  Run the test case 1: one single large...
  test_case_2  Run the test case 2: multiple distributed...
  test_case_3  Run the test case 3: one single large...
  test_case_4  Run the test case 4: multiple distributed...
  vagrant      Claim resources on vagrant (localhost).

Workflow to run a test case

  • Deploying and launching the benchmark (the default driver/broker is defined in the configuration file)
# the default conf.yaml in $PWD will be read
> oo deploy --driver=broker vagrant

# Launch one benchmark
> oo test_case_1 --nbr_clients 10 --nbr_servers 2

Adapt the provider to your environment (e.g. g5k).

  • Real-time metrics visualisation

Grafana is available on port 3000 of the control node (check the inventory file).

  • Backing up the environment
> oo backup

The files retrieved by this action are located in the current/backup directory by default.

  • Some cleaning and preparation for the next run
# Preparing the next run by cleaning the environment
> oo destroy
> oo deploy vagrant

# Next run
> oo test_case_1 --nbr_clients 20 --nbr_servers 2

It's possible to force an experimentation directory with --env mydir.

Note that scripting from Python is also possible, using the functions defined in tasks.py.
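For instance, a minimal scripting sketch could look like the following; the function names and signatures are assumptions inferred from the oo CLI commands and from the tasks.py calls visible in the tracebacks below, not a documented API:

# hypothetical sketch: tasks.deploy/test_case_1/backup/destroy are assumed
# to mirror the oo CLI commands of the same names
import tasks

tasks.deploy(provider="vagrant", driver="broker", env="mydir")
tasks.test_case_1(nbr_clients=10, nbr_servers=2, env="mydir")
tasks.backup(env="mydir")
tasks.destroy(env="mydir")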

Workflow to run a campaign

  • A campaign is a batch execution of several configurations for a given test case. The deployment and execution of a benchmark are read from a configuration file. For example, to run the first test case enabled in the framework:
> oo campaign --provider g5k test_case_1
  • Alternatively, a campaign can be executed in an incremental mode, in which deployments are performed only when a different driver or call_type is defined. Incremental campaigns apply a different semantics to the parameters defined in the configuration: instead of crossing all values, the parameter lists are combined position-wise, as a dot product (i.e., a zip operation between the lists of parameters; see the sketch after this section). The parameters swept this way are defined per test case as follows:

    • Test case 1: nbr_clients, nbr_servers and pause
    • Test case 2: nbr_topics and pause
    • Test case 3: nbr_clients, nbr_servers and pause (only rpc-cast calls)
    • Test case 4: nbr_topics and pause (only rpc-cast calls)
  • To execute an incremental campaign, be sure to use the ombt version msimonin/ombt:singleton instead of the default one, and execute:

> oo campaign --incremental --provider g5k test_case_1
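For illustration, with the incremental option the parameter lists are combined position-wise instead of as a cross product (all values below are hypothetical):

# hypothetical parameter lists for test case 1
nbr_clients = [10, 20, 50]
nbr_servers = [2, 2, 4]
pause = [0.0, 0.1, 0.1]

# incremental mode: one run per position (3 runs), not one per combination (27)
runs = list(zip(nbr_clients, nbr_servers, pause))
# -> [(10, 2, 0.0), (20, 2, 0.1), (50, 4, 0.1)]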

Misc.

  • Bound clients or servers to specific bus agents:

To bind ombt-clients to a specific bus instance, you can declare the following roles: [bus, bus-client].

Following the same idea, ombt-servers can be bound to a specific bus instance using the roles [bus, bus-server], as sketched below.
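A hypothetical provider snippet (assuming an EnOSlib-style resources description; the cluster name and node counts are placeholders):

g5k:
  resources:
    machines:
      - roles: [bus, bus-client]   # bus agents the ombt-clients bind to
        cluster: paravance
        nodes: 1
      - roles: [bus, bus-server]   # bus agents the ombt-servers bind to
        cluster: paravance
        nodes: 1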

ombt-orchestrator's People

Contributors

avankemp  jrbalderrama  msimonin


ombt-orchestrator's Issues

remove client/server log backup

Remove the tasks associated with client/server log backups: these logs are currently empty, and in large experiments this means copying thousands of empty files unnecessarily.

`./cli.py prepare` should reload the broker option from the env

Currently, calling ./cli.py prepare from the command line applies the default configuration of the broker (a qdr complete graph of size 4). We don't want to end up in this situation:

./cli.py deploy rabbitmq
./cli.py destroy
# The following will install qdr instead of rabbitmq (usually we don't want to change the bus when iterating)
./cli.py prepare 

Generate the iteration name once

The iteration string used to generate directory or log names is (partially) generated twice.
The main test_case function could receive this string and avoid generating another one.

Further increase ombt-agent deployment velocity

Currently we send ombt_confs to Ansible, and it looks like:

{
  "controller": [list of confs for every controller],
  "rpc-server": [list of confs for every server],
  "rpc-client": [list of confs for every client]
}

We could send:

{
  "inventory_hostname1": [list of confs for this host only],
  "inventory_hostname2": [list of confs for this host only]
}

and let each node load only the part it is interested in. This should speed up the deployment and avoid skipped tasks.
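A minimal Python sketch of the proposed regrouping (the "machine" field naming the target host is hypothetical; the real confs carry more data):

from collections import defaultdict

# role-keyed confs, as sent today (illustrative content)
ombt_confs = {
    "controller": [{"machine": "node1", "name": "ctrl-1"}],
    "rpc-server": [{"machine": "node2", "name": "srv-1"}],
    "rpc-client": [{"machine": "node3", "name": "cli-1"}],
}

# host-keyed confs, so each node only loads its own entries
by_host = defaultdict(list)
for confs in ombt_confs.values():
    for conf in confs:
        by_host[conf["machine"]].append(conf)
# by_host["node1"] now holds only what node1 needs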

Rename qpidd -> qdr

qpidd refers to the qpid broker, not qpid-dispatch-router.
To avoid confusing anyone, use qdr instead (command line, internal roles, ...).

campaign-like but with incremental starts

Our current workflow is:

  • given a set of parameters: params1, params2, ...
  • deploy and bench with params_{i}, then destroy, deploy and bench with params_{i+1}

To scale fast, we probably want to avoid destroying everything between two rounds.
I propose to start thinking about/writing a campaign-like script that allows a test_case to be scaled incrementally.

complete test campaigns as external functions

External functions from the CLI are supported for tc1 and tc2 only. For consistency, all test cases should be supported; otherwise campaign executions fall back to default values.

python-docker v3.0.0 makes docker_container fail

Step to reproduce

./cli.py deploy (whatever the provider is)
./cli.py test_case_1 (any option)

The ombt-controller tasks fail when getting the benchmark result.

Step to fix

The newest version of python-docker (>=3.0.0) seems to make the Ansible docker_container module fail, but here we don't constrain the installation to be <3.0.0 (we could use 2.7.0).

- name: Install some python bindings
  pip:
    name: "{{ item }}"
  with_items:
    - docker
    - influxdb

and makes this fail:

docker_container:
  image: "{{ ombt_version }}"
  command: "{{ item.command }}"
  name: "{{ item.name }}"
  detach: "{{ item.detach }}"
  network_mode: host
  state: started
  volumes:
    - "{{ item.log }}:{{ item.docker_log }}"
with_items: "{{ ombt_confs[inventory_hostname] }}"
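A sketch of the proposed constraint, pinning the binding directly in the pip task:

- name: Install some python bindings
  pip:
    name: "{{ item }}"
  with_items:
    - docker<3.0.0   # or docker==2.7.0, as suggested above
    - influxdb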

Bind clients and servers to different bus agents

Currently we bind ombt agents in a round-robin fashion on all the bus agents.
Using several bus agents thus leads to several clients and servers being bound to the same agent.
To force messages to go through the bus, we'd like to bind clients to one subset of agents and servers to a different subset:

Example:

clients -> router1 <-> router2 -> servers

Expose the broker parameters somewhere

E.g. the qdr options are hard-coded in tasks.py. This should be easily customizable.

One idea would be to put everything (the options for all supported brokers) in the configuration file, the same way we are doing for the providers.

Use mount module instead of cmd

When executing the 'mount' task, the following warning pops up:

[WARNING]: Consider using mount module rather than running mount
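A minimal sketch of the module-based equivalent (device, mount point and fstype are placeholders):

- name: Mount the volume
  mount:
    src: /dev/sdb1     # placeholder device
    path: /mnt/data    # placeholder mount point
    fstype: ext4
    state: mounted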

Sweeping over different bus configuration

For now, the bus is static during a campaign. This feature would allow running experiments with different buses during the same experimental campaign (and thus we could easily compare things during the post-mortem analysis).

add AttributeError to list of managed exceptions during campaign

Traceback (most recent call last):
  File "./cli.py", line 301, in <module>
    cli()
  File "/home/jarojasbalderrama/.pyenv/versions/ombt-2.7.14/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/jarojasbalderrama/.pyenv/versions/ombt-2.7.14/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/jarojasbalderrama/.pyenv/versions/ombt-2.7.14/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jarojasbalderrama/.pyenv/versions/ombt-2.7.14/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jarojasbalderrama/.pyenv/versions/ombt-2.7.14/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "./cli.py", line 291, in campaign
    env=env)
  File "/home/jarojasbalderrama/workspace/ombt-orchestrator/campaign.py", line 285, in incremental_campaign
    t.prepare(driver=current_driver, env=env_dir)
  File "/home/jarojasbalderrama/.pyenv/versions/ombt-2.7.14/lib/python2.7/site-packages/enoslib/task.py", line 50, in decorated
    fn(*args, **kwargs)
  File "/home/jarojasbalderrama/workspace/ombt-orchestrator/tasks.py", line 357, in prepare
    ansible_bus_conf = generate_ansible_conf(config, 'bus')
  File "/home/jarojasbalderrama/workspace/ombt-orchestrator/tasks.py", line 348, in generate_ansible_conf
    bus_conf = generate_bus_conf(configuration, machines)
  File "/home/jarojasbalderrama/workspace/ombt-orchestrator/tasks.py", line 325, in generate_bus_conf
    bus_conf = get_conf(graph, machines, round_robin)
  File "/home/jarojasbalderrama/workspace/ombt-orchestrator/qpid_dispatchgen.py", line 14, in get_conf
    for node, nbrdict in graph.adjacency_iter():
AttributeError: 'Graph' object has no attribute 'adjacency_iter'
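The attribute disappeared with networkx 2.0, where Graph.adjacency_iter() was replaced by Graph.adjacency(). Besides catching the exception in the campaign code, a small compatibility shim in qpid_dispatchgen.py could handle both versions (sketch):

def iter_adjacency(graph):
    # networkx < 2.0 exposes adjacency_iter(); networkx >= 2.0 renamed
    # it to adjacency(), which returns the equivalent iterator
    try:
        return graph.adjacency_iter()
    except AttributeError:
        return graph.adjacency()

# then, in get_conf:
# for node, nbrdict in iter_adjacency(graph):
#     build the router configuration as before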

Support for qdr broken

We have mainly focused on RabbitMQ lately; it's time to reintroduce qdr support.

#19 will track the progress and will eventually be merged.

Add support for network constraints

EnOSlib allows specifying network_constraints (see [1]).
In the conf we could have a tc section corresponding to the object taken as input by EnOSlib:

Example:

tc:
  enable: true
  default_delay: 20ms
  default_rate: 1gbit
  groups: [ombt-client, bus]

And a dedicated task tc that can be called from the command line:

  • cli tc enforces the constraints
  • cli tc --reset resets the constraints

Add tree based topology for qdr

Currently we only test the complete_graph topology.
We'd like to test tree-based deployments:


       +----r2
       |
r1-----+----r3
       |
       +----r4

This corresponds to realistic scenarios where:

  1. compute-to-conductor communication: calls on a single Target (report state).
    In this case the servers are on r1 and the clients are on r2 ... rn.

  2. neutron-server-to-agents communication (push policies): fanout on a single Target.
    In this case the servers are on r2 ... rn and the client is on r1.
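Assuming get_conf keeps accepting an arbitrary networkx graph (as the traceback above suggests: it iterates over the graph's adjacency), the depicted topology could be generated as a sketch:

import networkx as nx

# the figure above is a depth-1 tree (a star): r1 at the center,
# r2..r4 as leaves; star_graph(3) creates nodes 0..3 with 0 as center
graph = nx.star_graph(3)

# deeper trees are also available, e.g. a binary tree of height 2:
deep = nx.balanced_tree(2, 2)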

Pause default value should be a float

╰─$ ./cli.py test_case_1 --nbr_clients 1 --nbr_servers 1 --nbr_calls 10000 --pause 0.1 --call_type rpc-cast --timeout 7200
Usage: cli.py test_case_1 [OPTIONS]

Error: Invalid value for "--pause": 0.1 is not a valid integer

This is here:

PAUSE = 0

Setting PAUSE = 0.0 will fix that.
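Click infers an option's type from its default value, which is why the integer default rejects 0.1. A minimal sketch of the fix (the option wiring is assumed from the CLI output above):

import click

PAUSE = 0.0  # was PAUSE = 0, which made click infer an integer type

@click.command()
@click.option("--pause", default=PAUSE, type=float,
              help="pause between calls, in seconds")
def test_case_1(pause):
    click.echo(pause)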

Increase IdleTimeout in qdrouterd.conf

When dealing with a certain number of clients, the collectd plugin throws a timeout:

Condition('amqp:resource-limit-exceeded', 'local-idle-timeout expired').

This is probably due to idleTimeoutSeconds being too small (see qdrouterd.conf).
We could set idleTimeoutSeconds: 60 in both the listener and connector blocks of the qdrouterd.conf file.
It's located here:

{% for listener in item.listeners %}
listener {
{% if listener.role == "inter-router" %}
    host: {{ hostvars[listener.host]['ansible_' + internal_network]['ipv4']['address'] }}
{% else %}
    host: {{ hostvars[listener.host]['ansible_' + control_network]['ipv4']['address'] }}
{% endif %}
    port: {{ listener.port }}
    role: {{ listener.role }}
{% if listener.authenticatePeer is defined %}    authenticatePeer: {{ listener.authenticatePeer }}{% endif %}
{% if listener.saslMechanisms is defined %}    saslMechanisms: {{ listener.saslMechanisms }}{% endif %}
}
{% endfor %}
{% for connector in item.connectors %}
connector {
    host: {{ hostvars[connector.host]['ansible_' + internal_network]['ipv4']['address'] }}
    port: {{ connector.port }}
    role: {{ connector.role }}
}
{% endfor %}
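A sketch of the proposed change on the connector block; the listener block would receive the same line:

{% for connector in item.connectors %}
connector {
    host: {{ hostvars[connector.host]['ansible_' + internal_network]['ipv4']['address'] }}
    port: {{ connector.port }}
    role: {{ connector.role }}
    idleTimeoutSeconds: 60
}
{% endfor %}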
