hive-testbench's Introduction

hive-testbench

A testbench for experimenting with Apache Hive at any data scale.

Overview

The hive-testbench is a data generator and set of queries that lets you experiment with Apache Hive at scale. The testbench allows you to experience base Hive performance on large datasets, and gives an easy way to see the impact of Hive tuning parameters and advanced settings.

Prerequisites

You will need:

  • Hadoop 2.2 or later cluster or Sandbox.
  • Apache Hive.
  • Between 15 minutes and 2 days to generate data (depending on the Scale Factor you choose and available hardware).
  • If you plan to generate 1TB or more of data, using Apache Hive 13+ to generate the data is STRONGLY suggested.

Install and Setup

All of these steps should be carried out on your Hadoop cluster.

  • Step 1: Prepare your environment.

    In addition to Hadoop and Hive, before you begin ensure gcc is installed and available on your system path. If your system does not have it, install it using yum or apt-get.
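
    For example, a quick check-and-install sketch (assuming a yum- or apt-based system; adjust to your package manager):

     # Verify gcc is on the PATH; install it if missing.
     which gcc || sudo yum install -y gcc       # RHEL/CentOS
     which gcc || sudo apt-get install -y gcc   # Debian/Ubuntu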

  • Step 2: Decide which test suite(s) you want to use.

    hive-testbench comes with data generators and sample queries based on both the TPC-DS and TPC-H benchmarks. You can choose to use either or both of these benchmarks for experimentation. More information about these benchmarks can be found at the Transaction Processing Council homepage.

  • Step 3: Compile and package the appropriate data generator.

    For TPC-DS, ./tpcds-build.sh downloads, compiles and packages the TPC-DS data generator. For TPC-H, ./tpch-build.sh downloads, compiles and packages the TPC-H data generator.
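
    For example, to build both generators from the testbench root:

     ./tpcds-build.sh
     ./tpch-build.sh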

  • Step 4: Decide how much data you want to generate.

    You need to decide on a "Scale Factor" which represents how much data you will generate. Scale Factor roughly translates to gigabytes, so a Scale Factor of 100 is about 100 gigabytes and one terabyte is Scale Factor 1000. Decide how much data you want and keep it in mind for the next step. If you have a cluster of 4-10 nodes or just want to experiment at a smaller scale, scale 1000 (1 TB) of data is a good starting point. If you have a large cluster, you may want to choose Scale 10000 (10 TB) or more. The notion of scale factor is similar between TPC-DS and TPC-H.

    If you want to generate a large amount of data, you should use Hive 13 (i.e., Apache Hive 0.13) or later. Hive 13 introduced an optimization that allows far more scalable data partitioning. Hive 12 and lower will likely crash if you generate more than a few hundred GB of data, and tuning around the problem is difficult. You can generate text or RCFile data in Hive 13 and use it in multiple versions of Hive.

  • Step 5: Generate and load the data.

    The scripts tpcds-setup.sh and tpch-setup.sh generate and load data for TPC-DS and TPC-H, respectively. General usage is tpcds-setup.sh scale_factor [directory] or tpch-setup.sh scale_factor [directory].

    Some examples:

    Build 1 TB of TPC-DS data: ./tpcds-setup.sh 1000

    Build 1 TB of TPC-H data: ./tpch-setup.sh 1000

    Build 100 TB of TPC-DS data: ./tpcds-setup.sh 100000

    Build 30 TB of text formatted TPC-DS data: FORMAT=textfile ./tpcds-setup.sh 30000

    Build 30 TB of RCFile formatted TPC-DS data: FORMAT=rcfile ./tpcds-setup.sh 30000

    Also check the other parameters in the setup scripts; an important one is BUCKET_DATA. A sketch combining these options is shown below.
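
    A sketch of combining these options, assuming BUCKET_DATA is read from the environment the same way FORMAT is (check the setup scripts for its exact effect):

     # Generate 100 GB of text-format TPC-H data under a custom HDFS directory:
     FORMAT=textfile ./tpch-setup.sh 100 /tmp/tpch-generate
     # Hypothetically, enable bucketing the same way (verify in tpcds-setup.sh):
     # BUCKET_DATA=true ./tpcds-setup.sh 1000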

  • Step 6: Run queries.

    More than 50 sample TPC-DS queries and all TPC-H queries are included for you to try. You can use hive, beeline or the SQL tool of your choice. The testbench also includes a set of suggested settings.

    This example assumes you have generated 1 TB of TPC-DS data during Step 5:

     cd sample-queries-tpcds
     hive -i testbench.settings
     hive> use tpcds_bin_partitioned_orc_1000;
     hive> source query55.sql;
    

    Note that the database is named based on the Data Scale chosen in Step 4. At Data Scale 10000, your database will be named tpcds_bin_partitioned_orc_10000. At Data Scale 1000 it would be named tpcds_bin_partitioned_orc_1000. You can always run show databases to get a list of available databases.
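
    For example:

     hive> show databases;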

    Similarly, if you generated 1 TB of TPC-H data during Step 5:

     cd sample-queries-tpch
     hive -i testbench.settings
     hive> use tpch_flat_orc_1000;
     hive> source tpch_query1.sql;
    

Feedback

If you have questions, comments or problems, visit the Hortonworks Hive forum.

If you have improvements, pull requests are accepted.

hive-testbench's People

Contributors

amitagarwal06, cartershanklin, chetnachaudhari, dchiguruvada, dongjoon-hyun, jcamachor, ndembla, rajeshbalamohan, rbalamohan, t3rmin4t0r, ttmahdy

hive-testbench's Issues

./tpcds-build.sh fails, access denied error

Trying to build the tools, I get an access-denied error. I tried from different clients; none seem to work.

./tpcds-build.sh
Building TPC-DS Data Generator
curl http://dev.hortonworks.com.s3.amazonaws.com/hive-testbench/tpcds/README
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A0EC229B461934D8</RequestId><HostId>4xhSEQCANRAYks4zmXeXETDA7OYzXppuB0+CyjWiblxcT9WKhXNNaQJctVxF8IGiQQg0itPjFo8=</HostId></Error>curl --output tpcds_kit.zip http://dev.hortonworks.com.s3.amazonaws.com/hive-testbench/tpcds/TPCDS_Tools.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   243    0   243    0     0    997      0 --:--:-- --:--:-- --:--:--  1000
mkdir -p target/
cp tpcds_kit.zip target/tpcds_kit.zip
test -d target/tools/ || (cd target; unzip tpcds_kit.zip)
Archive:  tpcds_kit.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of tpcds_kit.zip or
        tpcds_kit.zip.zip, and cannot find tpcds_kit.zip.ZIP, period.
make: *** [target/tools/dsdgen] Error 9

Unzip doesn't work

When I run ./tpcds-build.sh, it gives me an unzip error, as below:

Building TPC-DS Data Generator
test -d target/tools/ || (cd target; unzip tpcds_kit.zip)
Archive:  tpcds_kit.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of tpcds_kit.zip or
        tpcds_kit.zip.zip, and cannot find tpcds_kit.zip.ZIP, period.
make: *** [Makefile:19: target/tools/dsdgen] Error 9
TPC-DS Data Generator built, you can now use tpcds-setup.sh to generate data.

Does this mean the zip file is corrupt? Can anyone help me here? Thanks in advance.

./tpcds-setup.sh 2 fails

Hello, I am running:

  • Hadoop 2.7.1
  • Hive 1.2.2

and I am trying to generate test data.

I did:

./tpcds-build.sh
./tpcds-setup.sh 2

The latter then fails with this message:

[...]
TPC-DS text data generation complete.
Loading text data into external tables.
0: jdbc:hive2://localhost:2181/ (closed)> create database if not exists ${DB};

make: *** [load_orc_2.mk:3: date_dim] Error 2
Loading constraints
0: jdbc:hive2://localhost:2181/ (closed)> -- set hivevar:DB=tpcds_bin_partitioned_orc_10000
0: jdbc:hive2://localhost:2181/ (closed)>
0: jdbc:hive2://localhost:2181/ (closed)> alter table customer_address add constraint ${DB}_pk_ca primary key (ca_address_sk) disable novalidate rely;

Data loaded into database tpcds_bin_partitioned_orc_2.

But when I check, the database does not exist. How can I resolve this issue?

Does it support version 3?

Does it support version 3? I am using Hadoop 3 and Hive 3 for testing. I can generate data, but I can't build the Hive database and tables.

./tpch-setup.sh execution failed

I am using tpch-setup.sh on CDH 6.3.2. The environment variables are below:

XDG_SESSION_ID=104
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../spark
HOSTNAME=node01.cdh6.citms.cn
TERM=xterm
SHELL=/bin/bash
HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
HISTSIZE=1000
SSH_CLIENT=192.168.0.99 57577 22
HADOOP_PREFIX=/opt/cloudera/parcels/CDH/lib/hadoop
SSH_TTY=/dev/pts/0
USER=hdfs
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hbase
MAIL=/var/spool/mail/root
PATH=.:/usr/local/mysql/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs
HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hive
JAVA=/usr/java/jdk1.8.0_271-amd64/bin/java
kylin_hadoop_conf_dir=/usr/local/apache-kylin-4.0.0-bin-spark2/hadoop_conf
PWD=/root/hive-testbench
HADOOP_YARN_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn
JAVA_HOME=/usr/java/jdk1.8.0_271-amd64
HADOOP_CONF_DIR=/etc/hadoop/conf
LANG=zh_CN.UTF-8
HISTCONTROL=ignoredups
SHLVL=5
HOME=/root
HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce
KYLIN_HOME=/usr/local/apache-kylin-4.0.0-bin-spark2
LOGNAME=hdfs
SSH_CONNECTION=192.168.0.99 57577 192.168.10.146 22
LESSOPEN=||/usr/bin/lesspipe.sh %s
MYSQL_HOME=/usr/local/mysql
XDG_RUNTIME_DIR=/run/user/0
_=/usr/bin/env
OLDPWD=/root/hive-testbench/tpch-gen

the output:

[root@node01 hive-testbench]# ./tpch-setup.sh 2
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
ls: `/tmp/tpch-generate/2/lineitem': No such file or directory
Generating data at scale factor 2.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:348)
   at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
   at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3151)
   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3196)
   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3235)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3286)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3254)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:478)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
   at org.notmysock.tpch.GenTable.genInput(GenTable.java:171)
   at org.notmysock.tpch.GenTable.run(GenTable.java:98)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
   at org.notmysock.tpch.GenTable.main(GenTable.java:54)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
ls: `/tmp/tpch-generate/2/lineitem': No such file or directory
Data generation failed, exiting.

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.

Commands in scripts:
runcommand "$HIVE -f ddl-tpcds/bin_partitioned/add_constraints.sql --hivevar DB=${DATABASE}"

Error description:
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException(message:Parent table not found: customer_address)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException(message:Parent table not found: customer_address) (state=08S01,code=1)

But I see that the tables in the database exist!

Has anyone ever encountered this problem?

testbench.settings file not found

Hi,
I am trying to run this benchmark on my setup, but while running a sample query, the current repository is unable to find the "testbench.settings" file at the below location:


Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hive-common-1.1.0-cdh5.14.2.jar!/hive-log4j.properties
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
Exception in thread "main" java.io.FileNotFoundException: File file:/bench/hive-testbench/sample-queries-tpcds/testbench.settings does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:598)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:344)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:784)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:437)
	at org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:449)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:723)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:634)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
root@sn1:/bench/hive-testbench/sample-queries-tpcds# ls /bench/hive-testbench/sample-queries-tpcds/testbench.settings 
ls: cannot access /bench/hive-testbench/sample-queries-tpcds/testbench.settings: No such file or directory
root@sn1:/bench/hive-testbench/sample-queries-tpcds# 

Table not found 'date_dim'

While generating TPC-DS data, I got a make error. When "export DEBUG_SCRIPT=X" was set, it gave a slightly more detailed error, as below. Does anyone know what's going on?

jdbc:hive2://localhost:2181/> create table date_dim
jdbc:hive2://localhost:2181/> stored as ${FILE}
jdbc:hive2://localhost:2181/> as select * from ${SOURCE}.date_dim;
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 3:17 Table not found 'date_dim' (state=42S02,code=10001)

Closing: 0: jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2?tez.queue.name=default
make: *** [date_dim] Error 2

Hive 13. Typo?

Hi all,

I just got started learning big data, Hadoop, etc.

When it's written:

If you want to generate a large amount of data, you should use Hive 13 or later.

What is it? The client's version?
Or does it mean Apache Hive version 3?

https://github.com/apache/hive

Makefile error for TPC-DS

I have been trying to set up hive-testbench to generate TPC-DS data. Upon running ./tpcds-setup.sh, I observed the following error.

I'm using Hadoop 3.3.6 and Hive 4.0.0-alpha-1.

gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -o dsdgen s_brand.o s_customer_address.o s_call_center.o s_catalog.o s_catalog_order.o s_catalog_order_lineitem.o s_catalog_page.o s_catalog_promotional_item.o s_catalog_returns.o s_category.o s_class.o s_company.o s_customer.o s_division.o s_inventory.o s_item.o s_manager.o s_manufacturer.o s_market.o s_pline.o s_product.o s_promotion.o s_purchase.o s_reason.o s_store.o s_store_promotional_item.o s_store_returns.o s_subcategory.o s_subclass.o s_warehouse.o s_web_order.o s_web_order_lineitem.o s_web_page.o s_web_promotinal_item.o s_web_returns.o s_web_site.o s_zip_to_gmt.o w_call_center.o w_catalog_page.o w_catalog_returns.o w_catalog_sales.o w_customer_address.o w_customer.o w_customer_demographics.o w_datetbl.o w_household_demographics.o w_income_band.o w_inventory.o w_item.o w_promotion.o w_reason.o w_ship_mode.o w_store.o w_store_returns.o w_store_sales.o w_timetbl.o w_warehouse.o w_web_page.o w_web_returns.o w_web_sales.o w_web_site.o dbgen_version.o address.o build_support.o date.o decimal.o dist.o driver.o error_msg.o genrand.o join.o list.o load.o misc.o nulls.o parallel.o permute.o pricing.o print.o r_params.o StringBuffer.o tdef_functions.o tdefs.o text.o scd.o scaling.o release.o sparse.o validate.o -lm
/usr/bin/ld: s_purchase.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_purchase.c:55: multiple definition of `nItemIndex'; s_catalog_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_order.c:56: multiple definition of `nItemIndex'; s_catalog_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order_lineitem.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_order_lineitem.c:54: multiple definition of `g_s_web_order_lineitem'; s_web_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_order.c:54: first defined here
/usr/bin/ld: w_catalog_page.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/w_catalog_page.c:52: multiple definition of `g_w_catalog_page'; s_catalog_page.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_catalog_page.c:51: first defined here
/usr/bin/ld: w_warehouse.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/w_warehouse.c:53: multiple definition of `g_w_warehouse'; s_warehouse.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_warehouse.c:51: first defined here
/usr/bin/ld: w_web_site.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/w_web_site.c:59: multiple definition of `g_w_web_site'; s_web_site.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_site.c:51: first defined here
collect2: error: ld returned 1 exit status
make[1]: *** [makefile:233: dsdgen] Error 1
make[1]: Leaving directory '/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools'
make: *** [Makefile:23: target/tools/dsdgen] Error 2
TPC-DS Data Generator built, you can now use tpcds-setup.sh to generate data.
hadoop@f1953c80b0cb:/work/tpcds-insights/hive-testbench$

Could you please help me resolve this issue?

SQL parsing errors (HDP 2.6)

I got many parsing errors when executing benchmarks on both TPC-DS and TPC-H.
However, the same SQL can be parsed and executed via the UI of Hive View 2.

Platform: HDP 2.6, Hive +LLAP
Error messages:
NoViableAltException(26@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1114) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:447) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1160) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

TPC-DS Table Generation Execution Error

Sometimes this happens at "large" scale factors: the optimization step fails at table 17 or 18, and make reports a recipe error. The specific error is:

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Is this some kind of permission error, and how could I fix it?

hive configuration hive.optimize.sort.dynamic.partition.threshold does not exists

tpch-setup.sh fails to execute on Hive 2.3.5 and Hadoop 2.8.5:

DEBUG_SCRIPT=ON FORMAT=textfile ./tpch-setup.sh 100 /data
Logging initialized using configuration in file:/usr/local/service/hive/conf/hive-log4j2.properties Async: true
Query returned non-zero code: 1, cause: hive configuration hive.optimize.sort.dynamic.partition.threshold does not exists.

The sample tables did not appear (make error)

sudo -u hdfs ./tpcds-setup.sh 100
TPC-DS text data generation complete.
Loading text data into external tables.
make: *** [date_dim] Error 1
Loading constraints
Data loaded into database tpcds_bin_partitioned_orc_100.

The directory /tmp/tpcds-generate/100 is empty.

make fails in tpcds-gen

Compilation fails for me with:

/usr/bin/ld: s_purchase.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_purchase.c:55: multiple definition of `nItemIndex'; s_catalog_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_order.c:56: multiple definition of `nItemIndex'; s_catalog_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order_lineitem.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_order_lineitem.c:54: multiple definition of `g_s_web_order_lineitem'; s_web_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_order.c:54: first defined here
/usr/bin/ld: w_catalog_page.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/w_catalog_page.c:52: multiple definition of `g_w_catalog_page'; s_catalog_page.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_catalog_page.c:51: first defined here
/usr/bin/ld: w_warehouse.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/w_warehouse.c:53: multiple definition of `g_w_warehouse'; s_warehouse.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_warehouse.c:51: first defined here
/usr/bin/ld: w_web_site.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/w_web_site.c:59: multiple definition of `g_w_web_site'; s_web_site.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_site.c:51: first defined here
collect2: error: ld returned 1 exit status

Any clue?

TPC-DS SQL ERRORS

When I use source query1.sql to test, I get the error FAILED: SemanticException Line 0:-1 Unsupported SubQuery Expression 'ctr_store_sk': Only SubQuery expressions that are top level conjuncts are allowed. The same problem occurs in queries 6, 9, 10, 16, 23, 24, and 32, and for query 8 the error is FAILED: NullPointerException null. Can anyone help me? Thank you a lot.

SQL query typos

All the queries with "interval #num days" will fail: "days" is a typo and should be "day". A hedged fix sketch follows the list of problematic queries below.

Problematic queries:

query12.sql
query16.sql
query20.sql
query21.sql
query32.sql
query37.sql
query40.sql
query5.sql
query77.sql
query80.sql
query82.sql
query92.sql
query94.sql
query95.sql
query98.sql
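
A minimal fix sketch, assuming GNU sed and that the typo matches the "interval <num> days" pattern quoted above (verify the pattern against your copy of the queries first; .bak backups are kept so the change is easy to review and revert):

# Rewrite "interval <num> days" to "interval <num> day" in the affected queries.
cd sample-queries-tpcds
sed -i.bak -E "s/(interval[[:space:]]+'?[0-9]+'?[[:space:]]+)days/\1day/g" \
  query5.sql query12.sql query16.sql query20.sql query21.sql query32.sql \
  query37.sql query40.sql query77.sql query80.sql query82.sql query92.sql \
  query94.sql query95.sql query98.sql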

load-partitioned.sql doesn't inherit shell value for ${REDUCERS}

The intent of the tpch-setup.sh script is to pass the REDUCERS variable into the final stage, which checks the table creation in the line:

hive -i settings/load-${SCHEMA_TYPE}.sql -f ddl-tpch/bin_${SCHEMA_TYPE}/analyze.sql --database ${DATABASE};

But the ${REDUCERS} variable is not substituted in load-partitioned.sql in the line:

set hive.exec.reducers.max=${REDUCERS};

This results in a Hive execution error. A possible workaround is sketched below.
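
A possible workaround, sketched on the assumption that Hive resolves a bare ${REDUCERS} from a --hivevar definition, the same way the testbench passes ${DB} elsewhere (not the project's official fix):

# Pass REDUCERS explicitly so "set hive.exec.reducers.max=${REDUCERS};" can be substituted.
hive --hivevar REDUCERS=${REDUCERS} \
  -i settings/load-${SCHEMA_TYPE}.sql \
  -f ddl-tpch/bin_${SCHEMA_TYPE}/analyze.sql --database ${DATABASE}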

Maven does not work with http any more

Hi, the current version uses the http protocol to interact with the Maven repository, which no longer works (Maven Central now requires https). Could you please update to the latest Maven version and verify?

Compile error

config.h:137: warning: "HUGE_TYPE" redefined
137 | #define HUGE_TYPE __int64
|
config.h:103: note: this is the location of the previous definition
103 | #define HUGE_TYPE int64_t
|
config.h:139: warning: "HUGE_FORMAT" redefined
139 | #define HUGE_FORMAT "%I64d"
|
config.h:104: note: this is the location of the previous definition
104 | #define HUGE_FORMAT "%lld"
|
In file included from mkheader.c:37:
porting.h:46:10: fatal error: values.h: No such file or directory
46 | #include <values.h>
| ^~~~~~~~~~
compilation terminated.
make: *** [: mkheader.o] Error 1

hive-testbench doesn't generate database

Hello,

I'm using hive-testbench-hdp3 on HDP 3.1.4. A bunch of files were generated in the target HDFS directory after running tpcds-setup.sh, but no database was created. Any advice on how to address the issue?

My steps were as follows:

  1. Run tpcds-build.sh.

  2. Run 'FORMAT=parquet ./tpcds-setup.sh 10 /benchmarks/tpcds'. Note that it reported an error, as below:

TPC-DS text data generation complete.
Loading text data into external tables.
make: *** [date_dim] Error 1
Loading constraints
Data loaded into database tpcds_bin_partitioned_parquet_10.

  3. Check databases in Hive - there is no new database created.

Error when running sample-queries-tpcds/query1.sql

CDH 6.0.0, Hive 2.1.1.
When running the script hive-testbench/sample-queries-tpcds/query1.sql, it shows this error:

source query1.sql;
FAILED: ParseException line 15:31 cannot recognize input near '(' 'select' 'avg' in expression specification

query2.sql does not work either.
