Giter VIP home page Giter VIP logo

tpcds-run-tool's Introduction

tpcds-run-tool

Make sure to run
* export RNGSEED_DATASET=42069
before running the script :^)

This is a very basic set of scripts to automate the tasks of

  • Downloading/compiling the server (MariaDB, MySQL and PostgreSQL are supported)
  • Generation of TPC-DS dataset and its read queries (that is, this is not the whole benchmark)
  • Loading the dataset and running the queries.

Geting TPC-DS to run

MySQL/MariaDB cannot run all of the TPC-DS queries (See https://jira.mariadb.org/browse/MDEV-17802 for details). The scripts here patch tpc-ds-tool:

  • Some queries are modified to run. The modification is trivial
  • Queries that could not be executed are excluded from the comparison.

See

  • mariadb-tpcds-tooling2/adapt-queries-for-mariadb-v4.diff
  • mariadb-tpcds-tooling2/adapt-queries-for-pg-v4.diff for details.

In order for SUM(query_time) value to be comparable, PostgreSQL also only runs the queries that I could get to work with MariaDB.

Preparing the host

Look at (or run) setup_server/setup-os-ubuntu.sh to see what packages need to be installed.

Running

Just run run-all-mariadb-10.4.sh to do all the steps with MariaDB 10.4.

Run run-all-pg.sh to do all the steps with PostgreSQL 11.

Benchmark results

The results are saved in my_tpcds_result table in tpcds database. The table is defined as follows:

create table my_tpcds_result(
  query_name varchar(64),
  query_stream varchar(64),
  query_start datetime,
  query_time_ms bigint
);

Here:

  • query_stream is number of query streams. TPC-DS generates N (we use 10) streams, each stream has instances of all 100 queries.
  • query_name is the name of the query template (queryNNN.tpl)
  • query_start - datetime when query started running. This is only for diagnostic.
  • query_time_ms - how much the query took, in milliseconds.

This way, the total time it took to run the query stream can be computed as follows:

select 
  query_stream, count(*), sum(query_time_ms) 
from
  my_tpcds_result
group by
  query_stream;

and one can also get details about individual query run times.

Other notes

Random query generator seed for the dataset seems to affect execution time A LOT.

It is currently hard-coded in mariadb-tpcds-tooling2/01-generate-dataset-pg.sh and "mariadb-tpcds-tooling2/01-generate-dataset-pg.sh.

tpcds-run-tool's People

Contributors

spetrunia avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.