benchto's People

Contributors

amalakar, anusudarsan, arturgajowy, cawallin, electrum, facebook-github-bot, fiedukow, findepi, kokosing, losipiuk, pnowojski, sopel39, szymonm, upbram123, yingsu00

benchto's Issues

Queries has not been accessed since - Presto

Hi
I am running queries on a Presto cluster and after about 10 minutes I get the following error. Any idea? Is there a time limit in Presto after which a query gets killed by itself, causing the issue below?

I have a similar issue: com.facebook.presto.spi.PrestoException: Query 20191111_162609_00141_vrkhx has not been accessed since 2019-11-11T16:32:05.517Z: currentTime 2019-11-11T16:37:06.260Z

Store the explain plan and query text in benchto

When investigating slow queries further, it is very valuable to have the explain plan to see what changed. It is also not currently possible to see the text of the query from the benchto UI.

EXPLAIN (TYPE DISTRIBUTED) would be most helpful.

At least initially, it's ok if it's just stored in the benchto database. However, eventually either it needs to make its way to the benchto UI, or the benchto database needs to be cleaned up so that it's easier to find things.
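
A minimal sketch of capturing the distributed plan alongside a benchmark query, using plain JDBC against Presto (the class name and the idea of storing the returned text in the benchto database are illustrative, not existing benchto code):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: run EXPLAIN (TYPE DISTRIBUTED) for a benchmark query and return the
// plan text so the driver could store it next to the measurements.
public final class ExplainPlanCollector
{
    private ExplainPlanCollector() {}

    public static String collectDistributedPlan(Connection connection, String sql)
            throws SQLException
    {
        StringBuilder plan = new StringBuilder();
        try (Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery("EXPLAIN (TYPE DISTRIBUTED) " + sql)) {
            while (resultSet.next()) {
                plan.append(resultSet.getString(1)).append('\n');
            }
        }
        return plan.toString();
    }
}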

SQL is not an update statement exception

See the log below; a sketch of handling both statement kinds follows it:

00:16:57.976 Exception in thread "defaultTaskExecutor-1" 06:45:10.548 ERROR [defaultTaskExecutor-2] c.t.b.d.l.LoggingBenchmarkExecutionListener - Query failed: q15 (0/6), execution error: SQL is not an update statement: /* TPC_H Query 15 - Create View for Top Supplier Query */
00:16:57.976 WITH revenue0 AS (
00:16:57.976 SELECT
00:16:57.976 l.suppkey as supplier_no,
00:16:57.976 sum(l.extendedprice*(1-l.discount)) as total_revenue
00:16:57.976 FROM
00:16:57.976 "hive"."tpch_10gb_orc"."lineitem" l
00:16:57.976 WHERE
00:16:57.976 l.shipdate >= DATE '1996-01-01'
00:16:57.976 AND l.shipdate < DATE '1996-01-01' + INTERVAL '3' MONTH
00:16:57.976 GROUP BY
00:16:57.976 l.suppkey
00:16:57.976 )
00:16:57.976 
00:16:57.976 /* TPC_H Query 15 - Top Supplier */
00:16:57.976 SELECT
00:16:57.976 s.suppkey,
00:16:57.976 s.name,
00:16:57.976 s.address,
00:16:57.976 s.phone,
00:16:57.976 total_revenue
00:16:57.976 FROM
00:16:57.976 "hive"."tpch_10gb_orc"."supplier" s,
00:16:57.976 revenue0
00:16:57.976 WHERE
00:16:57.976 s.suppkey = supplier_no
00:16:57.976 AND total_revenue = (SELECT max(total_revenue) FROM revenue0)
00:16:57.976 ORDER BY
00:16:57.976 s.suppkey
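
This typically happens when the driver sends a SELECT through an update-only execution path. A minimal JDBC sketch of handling both statement kinds generically (illustrative only, not benchto's actual executor):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: Statement.execute() works both for statements that return rows
// (SELECT / WITH ... SELECT) and for statements that return an update count,
// so a query like TPC-H Q15 no longer fails with
// "SQL is not an update statement".
public final class GenericStatementRunner
{
    private GenericStatementRunner() {}

    public static long run(Connection connection, String sql)
            throws SQLException
    {
        try (Statement statement = connection.createStatement()) {
            if (statement.execute(sql)) {
                long rows = 0;
                try (ResultSet resultSet = statement.getResultSet()) {
                    while (resultSet.next()) {
                        rows++;
                    }
                }
                return rows;
            }
            return statement.getUpdateCount();
        }
    }
}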

Add tests with multiple queries for the concurrency test

DriverAppIntegrationTest.testConcurrentBenchmark() uses test_concurrent_benchmark.yaml, which contains only one query. We need to test the case where there are many queries; each thread should pick up a roughly equal amount of work. A sketch of one way to distribute the work is below.
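
A minimal sketch of how a multi-query concurrent benchmark could spread work evenly, with workers draining a shared queue (names are illustrative, not the actual test code):

import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: N worker threads drain a shared queue of queries, so the work is
// split roughly evenly regardless of how many queries the benchmark defines.
public final class ConcurrentQueryDistribution
{
    private ConcurrentQueryDistribution() {}

    public static void run(List<String> queries, int threads)
            throws InterruptedException
    {
        Queue<String> work = new ConcurrentLinkedQueue<>(queries);
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.submit(() -> {
                String sql;
                while ((sql = work.poll()) != null) {
                    execute(sql); // placeholder for the actual query execution
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void execute(String sql)
    {
        // hypothetical query execution
    }
}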

add possibility to add notes to benchmark from UI

When interpreting benchmark results it would be helpful to be able to attach a note to a benchmark or query execution,
for example that a particular execution was run with a certain property set in the software under test.

Handle duplicate classes during build

Currently we are getting a lot of duplicate class warnings during the build, like:

[WARNING] benchto-generator-1.0.0-SNAPSHOT.jar, guava-15.0.jar define 1622 overlapping classes:
[WARNING]   - com.google.common.collect.ImmutableMapValues$1
[WARNING]   - com.google.common.io.LineProcessor
[WARNING]   - com.google.common.util.concurrent.AbstractService$5
[WARNING]   - com.google.common.io.BaseEncoding$StandardBaseEncoding$2
[WARNING]   - com.google.common.io.ByteProcessor
[WARNING]   - com.google.common.math.package-info
[WARNING]   - com.google.common.util.concurrent.SimpleTimeLimiter
[WARNING]   - com.google.common.io.GwtWorkarounds$5
[WARNING]   - com.google.common.cache.AbstractCache$StatsCounter
[WARNING]   - com.google.common.util.concurrent.CycleDetectingLockFactory$Policies
[WARNING]   - 1612 more...
[WARNING] benchto-generator-1.0.0-SNAPSHOT.jar, hive-apache-0.14.jar define 13614 overlapping classes:
[WARNING]   - parquet.column.page.PageReader
[WARNING]   - com.facebook.presto.hive.$internal.org.codehaus.jackson.map.ser.BeanSerializer
[WARNING]   - org.apache.hadoop.hive.common.LogUtils
[WARNING]   - parquet.format.converter.ParquetMetadataConverter$1
[WARNING]   - org.apache.hadoop.hive.ql.plan.DDLWork
[WARNING]   - com.facebook.presto.hive.$internal.org.codehaus.jackson.map.introspect.AnnotatedField
[WARNING]   - parquet.it.unimi.dsi.fastutil.Stack
[WARNING]   - com.facebook.presto.hive.$internal.org.codehaus.jackson.map.jsontype.TypeIdResolver
[WARNING]   - org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_database_args$get_database_argsStandardSchemeFactory
[WARNING]   - org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterCharColumnNotBetween
[WARNING]   - 13604 more...
[WARNING] automaton-1.11-8.jar, benchto-generator-1.0.0-SNAPSHOT.jar define 25 overlapping classes:
[WARNING]   - dk.brics.automaton.AutomatonMatcher
[WARNING]   - dk.brics.automaton.ShuffleOperations$ShuffleConfiguration
[WARNING]   - dk.brics.automaton.RegExp$Kind
[WARNING]   - dk.brics.automaton.RunAutomaton
[WARNING]   - dk.brics.automaton.Automaton
[WARNING]   - dk.brics.automaton.RegExp
[WARNING]   - dk.brics.automaton.AutomatonProvider
[WARNING]   - dk.brics.automaton.RegExp$1
[WARNING]   - dk.brics.automaton.MinimizationOperations$StateListNode
[WARNING]   - dk.brics.automaton.State
[WARNING]   - 15 more...

We need to fix that and treat duplicate classes as an error.

Health check query results are never retrieved

Benchto executes health check queries between executions, but never retrieves the results.

This causes the queries to hang in Presto at 100% progress until they time out with:

com.facebook.presto.spi.PrestoException: Query 20170714_200842_00168_kz7bj has not been accessed since 2017-07-14T16:08:42.616-04:00: currentTime 2017-07-14T16:13:42.917-04:00
    at com.facebook.presto.execution.SqlQueryManager.failAbandonedQueries(SqlQueryManager.java:584)
    at com.facebook.presto.execution.SqlQueryManager$1.run(SqlQueryManager.java:180)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The queries being executed are like the one below (a sketch of draining the results follows the query):

select nodes_count,
case nodes_count
when 9 then 1
else 1/0
end
from (
select count(*) nodes_count from system.runtime.nodes where state = 'active'
)
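
One way to avoid the abandoned-query timeout is to fully drain the result set of every health check query. A minimal JDBC sketch, not benchto's actual implementation:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: execute the health check query and consume every row, so the Presto
// coordinator sees the client fetch results and does not abandon the query.
public final class HealthCheckQueryRunner
{
    private HealthCheckQueryRunner() {}

    public static void run(Connection connection, String sql)
            throws SQLException
    {
        try (Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery(sql)) {
            while (resultSet.next()) {
                // drain results; the health check only needs the query to finish
            }
        }
    }
}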

No need to log query results for each query execution

If a benchmark runs a query multiple times, there is no need to log the results each time; it clutters the log file without adding any value.

Instead, there could be a check which validates that all executions returned the same results. Note that logging and results verification should not be included in the measured benchmark duration.
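
A minimal sketch of what such a check could look like: compute a cheap fingerprint per execution and compare fingerprints instead of logging full results (names hypothetical):

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import java.util.Objects;

// Sketch: fingerprint each execution's result set and verify all executions
// agree, outside of the measured benchmark duration.
public final class ResultConsistencyChecker
{
    private ResultConsistencyChecker() {}

    public static long fingerprint(ResultSet resultSet)
            throws SQLException
    {
        long hash = 0;
        int columns = resultSet.getMetaData().getColumnCount();
        while (resultSet.next()) {
            for (int column = 1; column <= columns; column++) {
                hash = 31 * hash + Objects.hashCode(resultSet.getObject(column));
            }
        }
        return hash;
    }

    public static boolean allRunsMatch(List<Long> fingerprints)
    {
        return fingerprints.stream().distinct().count() <= 1;
    }
}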

Run measurements must be pulled from Graphite after delay to be complete

Currently benchto pulls environment utilization measurements as soon as a run execution ends. However, at least when using Graphite, the stats are not available in real time and should be pulled after some delay. A bonus would be to detect when the stats are not yet fully available, but that may not be needed if a fixed delay is sufficient.
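
A minimal sketch of the delayed pull, with a fixed configurable delay (the constant and default value are hypothetical):

import java.time.Duration;

// Sketch: wait for a configurable period after the run ends before pulling
// utilization measurements, so late-arriving Graphite datapoints are included.
public final class DelayedGraphiteFetch
{
    private static final Duration GRAPHITE_METRICS_DELAY = Duration.ofSeconds(60); // hypothetical default

    private DelayedGraphiteFetch() {}

    public static void fetchAfterDelay(Runnable fetchMeasurements)
            throws InterruptedException
    {
        Thread.sleep(GRAPHITE_METRICS_DELAY.toMillis());
        fetchMeasurements.run();
    }
}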

Differentiate `Benchmark` and `BenchmarkPhase`

Benchto can load and run multiple benchmarks. A benchmark may contain multiple phases, e.g. TPC-H has 1) load phase, 2) power phase, 3) refresh function phase 1, 4) throughput phase, 5) refresh function phase 2.

We would like to distinguish these two concepts.

Add something like `product test` to travis

The idea is to add a simple mechanism that sets up a test benchto environment so we can run benchto-driver with sample benchmarks and check that the benchmarks were collected properly by benchto-service. Additionally, it could check (UI test) that the JavaScript benchto-service web pages render properly for these sample benchmarks.

CC @fiedukow

Add timeout for tests execution

In case of an error that causes the test to get stuck, a timeout should abort the query execution and mark the benchmark as failed.
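
A minimal sketch of such a timeout using a Future, so a stuck query cannot block the whole suite (illustrative, not the driver's actual code):

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: run the benchmark body on a worker thread and cancel it when it does
// not finish within the timeout; the caller then marks the benchmark as failed.
public final class BenchmarkTimeoutRunner
{
    private BenchmarkTimeoutRunner() {}

    public static boolean runWithTimeout(Runnable benchmark, long timeoutSeconds)
            throws InterruptedException
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(benchmark);
        try {
            future.get(timeoutSeconds, TimeUnit.SECONDS);
            return true; // finished in time
        }
        catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck execution
            return false;
        }
        catch (ExecutionException e) {
            return false; // the benchmark itself failed
        }
        finally {
            executor.shutdownNow();
        }
    }
}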

Display query in UI

It is useful to be able to see the query statement for a particular benchmark.

NPE while updating graphite

00:24:53.068 Exception in thread "defaultTaskExecutor-2" java.lang.NullPointerException
00:24:53.068    at com.teradata.benchto.driver.graphite.GraphiteClient$GraphiteEventRequest$GraphiteEventRequestBuilder.when(GraphiteClient.java:149)
00:24:53.068    at com.teradata.benchto.driver.listeners.GraphiteEventExecutionListener.executionFinished(GraphiteEventExecutionListener.java:80)
00:24:53.068    at com.teradata.benchto.driver.listeners.benchmark.BenchmarkStatusReporter.lambda$reportExecutionFinished$3(BenchmarkStatusReporter.java:60)
00:24:53.068    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
00:24:53.068    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
00:24:53.068    at java.lang.Thread.run(Thread.java:745)

Support more run modes

We shall support at least these run modes:

  1. Power
    Run one query at a time, in a pre-designated query order. For example, TPC-H requires the Power test to run 1 stream of 22 queries, with the query order 14 2 9 20 6 17 18 8 21 13 3 22 16 4 11 15 1 10 19 5 7 12. In regular tracking we may want to do cold/warm runs, so each query would be run runs times, with the first execution being cold and the rest warm. We take the last warm execution as the reported result. Before the first execution of each query we need to run macros to clean the caches. E.g. if runs = 3, the queries would be run: clean caches, 14 (cold), 14, 14 (warm), clean caches, 2 (cold), 2, 2 (warm)...

  2. Throughput
    Run s streams concurrently, where each stream follows its specified query order (see https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf for the query orders used in TPC-H tests). In Throughput mode we usually don't do the cold/warm runs of the Power phase, but rather run each query once. There can be a pre-warm run before the real run, usually the Power run, so the runs parameter can effectively be ignored.

  3. Concurrent
    Run many queries using a pre-defined query concurrency (threads), e.g. 1000 queries run on 40 threads. Each thread just picks up a random query from the query set. Each query is executed runs times, but the order is randomized; usually runs = 1. This simulates a real workload on customers' clusters. Note that we will need to be able to supply randomly chosen queries from the query history, if we have any.

  4. Concurrent-Repeating
    This is what #68 does. With a pre-defined concurrency level, each thread executes the same listed queries runs times, so each query is executed runs * concurrency times in total. This mode is usually used when we want to run a single query on all threads for a period of time, possibly to reproduce a bug.

The current implementation supports 1 and 4 but not 2 and 3. We shall support all 4 modes.
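
A minimal sketch of the missing Concurrent mode (mode 3): a fixed thread pool where every execution picks a random query from the query set (names and structure are illustrative only):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Sketch: run totalExecutions query executions on a fixed number of threads,
// each execution picking a random query, to simulate a mixed customer workload.
public final class ConcurrentModeRunner
{
    private ConcurrentModeRunner() {}

    public static void run(List<String> querySet, int threads, int totalExecutions)
            throws InterruptedException
    {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < totalExecutions; i++) {
            executor.submit(() -> {
                String sql = querySet.get(ThreadLocalRandom.current().nextInt(querySet.size()));
                execute(sql); // placeholder for the actual query execution
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.DAYS);
    }

    private static void execute(String sql)
    {
        // hypothetical query execution
    }
}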

SQL health check does not work

Even though the query below fails:

select nodes_count,
case nodes_count
when 9 then 1
else 1/0
end
from (
select count(*) nodes_count from system.runtime.nodes where state = 'active'
)

with:

com.facebook.presto.spi.PrestoException: / by zero

the driver continues its work (a sketch of the expected fail-fast behaviour follows the log):

00:12:08.372 08:41:18.885 INFO  [main] c.t.b.d.e.BenchmarkExecutionDriver - Running health check macros: [run-health-check-script, health_check/9_nodes_active.sql]
00:12:08.373 08:41:18.887 INFO  [main] c.t.b.d.m.s.ShellMacroExecutionDriver - Executing macro: 'hfab -P -R all other:querygrid_hdp_perf_cluster node.shell:"sh health_check.sh" && hfab -P -R all other:querygrid_td_perf_cluster node.shell:"sh health_check.sh"'
00:12:11.552 08:41:22.068 INFO  [main] c.t.b.d.m.q.QueryMacroExecutionDriver - Executing macro query: select nodes_count,
00:12:11.552 case nodes_count
00:12:11.552 when 9 then 1
00:12:11.552 else 1/0
00:12:11.552 end
00:12:11.552 from (
00:12:11.552 select count(*) nodes_count from system.runtime.nodes where state = 'active'
00:12:11.552 )
00:12:12.317 08:41:22.834 INFO  [main] c.t.b.d.e.BenchmarkExecutionDriver - [1 of 100] processing benchmark: Benchmark{name=presto/tpcds, 
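
A minimal sketch of the expected behaviour: let a failing health check query abort the run instead of being silently ignored (illustrative, not the actual QueryMacroExecutionDriver code):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: propagate a failure from the health check query (e.g. "/ by zero")
// so the driver stops instead of continuing with the benchmarks.
public final class FailFastHealthCheck
{
    private FailFastHealthCheck() {}

    public static void run(Connection connection, String sql)
    {
        try (Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery(sql)) {
            while (resultSet.next()) {
                // drain results
            }
        }
        catch (SQLException e) {
            throw new RuntimeException("Health check query failed, aborting benchmark run", e);
        }
    }
}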

Make benchmarks hardware agnostic

Currently benchmarks contain attributes that are specific to the hardware and cluster:

  • data size (10GB, 100GB, 1TB)
  • number of runs, number of prewarms
  • file format

This makes reusing benchmarks on different clusters problematic, since those elements need to be fine-tuned to a given execution environment.

Major requirements of benchmarks are:

  • results should have low variability
  • execution time of a benchmark should be reasonable (e.g. between 30 s and 1 min) in order to reduce static query overhead (e.g. client-server latency)

Given those requirements, we propose to refactor benchto-driver and the benchmarks in the following way:

  • add a runs_scale_factor driver parameter. Increasing this parameter would reduce the variability of benchmark runtimes (this addresses your issue, @idemura)
  • use catalog (e.g. tpch) and data_size parameters in benchmarks instead of schema. data_size would be a categorical variable (e.g. tiny, small, medium, large, huge). The purpose of the variable is to describe how "difficult" a benchmark is to execute without specifying an exact data size. This allows different mappings in various benchmark environments (e.g. tpch_medium=tpch_10gb_text). The most appropriate data size should be used for each environment so that the benchmark runtime is reasonable. Also, one benchmark definition can be used to test different file formats. A sketch of such a mapping is shown after this list.

FYI: @findepi @idemura
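
A minimal sketch of the proposed categorical data_size mapping, resolved per environment (all names and mappings are hypothetical):

import java.util.Map;

// Sketch: an environment maps categorical sizes to concrete schemas, so one
// benchmark definition (catalog=tpch, data_size=medium) can run on any cluster.
public final class DataSizeMapping
{
    private static final Map<String, String> PERF_CLUSTER_SCHEMAS = Map.of(
            "tiny", "tpch_1gb_orc",
            "medium", "tpch_10gb_text",
            "large", "tpch_100gb_orc");

    private DataSizeMapping() {}

    public static String resolveSchema(String dataSize)
    {
        String schema = PERF_CLUSTER_SCHEMAS.get(dataSize);
        if (schema == null) {
            throw new IllegalArgumentException("No schema mapped for data_size: " + dataSize);
        }
        return schema;
    }
}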
