Storm Benchmark

How we measure Storm performance

The benchmark set contains 10 workloads in two categories. The first category, "simple resource benchmarks", tests how Storm performs when a particular resource is under pressure. The second category measures how Storm performs in typical real-life use cases.

  • Simple resource benchmarks:

    • wordcount, CPU sensitive
    • sol, network sensitive
    • rollingsort, memory sensitive
  • Typical use-case benchmarks:

    • rollingcount
    • trident
    • uniquevisitor
    • pageview
    • grep
    • dataclean
    • drpc

In real-life use cases, Kafka is often used for data ingestion. To account for that, most use-case benchmarks read data from Kafka. They can be grouped by the data generator that feeds them:

  • data generated by PageViewKafkaProducer

    • dataclean
    • drpc
    • pageview
    • uniquevisitor
  • data generated by FileReadKafkaProducer

    • grep
    • trident

The data generators are provided with the benchmark and are themselves Storm applications.

How to use

We assume a Storm cluster is already set up locally.

  1. Build.

First, build storm-benchmark.

  git clone https://github.com/manuzhang/storm-benchmark.git
  cd storm-benchmark
  mvn package
  2. Run. We use SOL as an example.
  bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/sol.yaml -c topology.workers=2 storm.benchmark.tools.Runner storm.benchmark.benchmarks.SOL
  • -storm tells stormbench where to find the storm command
  • -jar sets the benchmark jar, which bundles all dependencies
  • -conf provides a YAML conf file, like storm/conf/storm.yaml. See the storm-benchmark/conf folder, which already contains conf files for the existing benchmarks
  • -c sets a conf value on the command line, so conf files need not be modified for every run
  3. Check. The benchmark results are stored at the path configured by METRICS_PATH (default: reports). They contain throughput and latency data for the whole cluster.

The result of a SOL run consists of two files:

1. `SOL_metrics_1402148415021.csv`. Performance data.
2. `SOL_metrics_1402148415021.yaml`. The config used to run this test.
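Because every run writes a new pair of files, it can help to pick out the most recent pair for a benchmark. The following is a minimal sketch, not part of storm-benchmark. It assumes that result files are named `<BENCHMARK>_metrics_<number>.csv` / `.yaml` as in the SOL example, and that a larger numeric suffix means a later run (the suffix looks like a millisecond timestamp, but the source does not say so). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate the newest result CSV for a benchmark in the reports directory.
# Assumption: files are named <BENCHMARK>_metrics_<number>.csv|.yaml and a
# larger <number> means a later run. Helper names are illustrative only.

# Print the numeric suffix of a result file name.
result_timestamp() {
  local name
  name="$(basename "$1")"
  name="${name%.*}"              # drop the .csv / .yaml extension
  printf '%s\n' "${name##*_}"    # keep what follows the last underscore
}

# Print the benchmark name prefix of a result file name.
result_benchmark() {
  local name
  name="$(basename "$1")"
  printf '%s\n' "${name%%_metrics_*}"
}

# Print the path of the newest CSV for a benchmark; return 1 if none exists.
latest_result_csv() {
  local dir="$1" bench="$2" best="" best_ts=-1 f ts
  for f in "$dir"/"${bench}"_metrics_*.csv; do
    [ -e "$f" ] || continue      # the glob matched nothing
    ts="$(result_timestamp "$f")"
    if [ "$ts" -gt "$best_ts" ]; then
      best="$f"
      best_ts="$ts"
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

For example, `csv="$(latest_result_csv reports SOL)"` would select the newest SOL performance file in the default reports directory, and `"${csv%.csv}.yaml"` would then name the config file recorded for the same run.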

How to run a benchmark ingesting data from Kafka

We assume Storm and Kafka have been set up locally and that Storm Benchmark has been built successfully. There is no need to create the Kafka topic beforehand, since it can be auto-created when the producer sends messages to Kafka.

Here's how we run uniquevisitor, for instance.

  1. Launch PageViewKafkaProducer.
  bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/pageview_producer.yaml storm.benchmark.tools.Runner storm.benchmark.tools.producer.kafka.PageViewKafkaProducer
  2. Launch UniqueVisitor.
  bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/uniquevisitor.yaml storm.benchmark.tools.Runner storm.benchmark.benchmarks.UniqueVisitor

Then check the metrics data as described in the previous section.
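The producer and benchmark launches share the same stormbench invocation shape and differ only in the conf file and the class name. The sketch below captures that shape in a small helper that prints the command instead of running it, so it can be reviewed first. The helper name `stormbench_cmd` is hypothetical, and passing several `-c` overrides in one call is an assumption; the source only shows a single `-c`.

```shell
#!/usr/bin/env bash
# Sketch: print the stormbench command line for a conf file and a class.
# STORM_HOME and VERSION must be set by the caller, as in the commands above.
# Assumption: -c may be repeated; the README only shows it used once.

stormbench_cmd() {
  local conf="$1" main_class="$2"
  shift 2
  local cmd="bin/stormbench -storm ${STORM_HOME}/bin/storm"
  cmd="$cmd -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar"
  cmd="$cmd -conf ${conf}"
  local override
  for override in "$@"; do       # optional key=value overrides, each passed via -c
    cmd="$cmd -c ${override}"
  done
  cmd="$cmd storm.benchmark.tools.Runner ${main_class}"
  printf '%s\n' "$cmd"
}
```

With `STORM_HOME` and `VERSION` exported, `stormbench_cmd ./conf/pageview_producer.yaml storm.benchmark.tools.producer.kafka.PageViewKafkaProducer` prints the producer launch, and `stormbench_cmd ./conf/uniquevisitor.yaml storm.benchmark.benchmarks.UniqueVisitor` prints the benchmark launch. Run the printed commands in that order from the storm-benchmark directory.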

Support

Please contact:

Acknowledgement

We use the SOL benchmark code (https://github.com/yahoo/storm-perf-test) from Yahoo. Thanks.
