hive-testbench's Introduction

hive-testbench

A testbench for experimenting with Apache Hive at any data scale.

Overview

The hive-testbench is a data generator and set of queries that lets you experiment with Apache Hive at scale. The testbench allows you to experience base Hive performance on large datasets, and gives an easy way to see the impact of Hive tuning parameters and advanced settings.

Prerequisites

You will need:

  • Hadoop 2.2 or later cluster or Sandbox.
  • Apache Hive.
  • Between 15 minutes and 2 days to generate data (depending on the Scale Factor you choose and available hardware).
  • If you plan to generate 1TB or more of data, using Apache Hive 13+ to generate the data is STRONGLY suggested.

Install and Setup

All of these steps should be carried out on your Hadoop cluster.

  • Step 1: Prepare your environment.

    In addition to Hadoop and Hive, before you begin ensure gcc is installed and available on your system path. If your system does not have it, install it using yum or apt-get.
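
    For example, a quick check-and-install sketch (assuming a yum- or apt-based system; adjust to your package manager):

     # Verify gcc is on the PATH; install it if missing.
     which gcc || sudo yum install -y gcc       # RHEL/CentOS
     which gcc || sudo apt-get install -y gcc   # Debian/Ubuntu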

  • Step 2: Decide which test suite(s) you want to use.

    hive-testbench comes with data generators and sample queries based on both the TPC-DS and TPC-H benchmarks. You can choose to use either or both of these benchmarks for experimentation. More information about these benchmarks can be found at the Transaction Processing Council homepage.

  • Step 3: Compile and package the appropriate data generator.

    For TPC-DS, ./tpcds-build.sh downloads, compiles and packages the TPC-DS data generator. For TPC-H, ./tpch-build.sh downloads, compiles and packages the TPC-H data generator.
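
    For example, to build both generators from the testbench root:

     ./tpcds-build.sh
     ./tpch-build.sh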

  • Step 4: Decide how much data you want to generate.

    You need to decide on a "Scale Factor" which represents how much data you will generate. Scale Factor roughly translates to gigabytes, so a Scale Factor of 100 is about 100 gigabytes and one terabyte is Scale Factor 1000. Decide how much data you want and keep it in mind for the next step. If you have a cluster of 4-10 nodes or just want to experiment at a smaller scale, scale 1000 (1 TB) of data is a good starting point. If you have a large cluster, you may want to choose Scale 10000 (10 TB) or more. The notion of scale factor is similar between TPC-DS and TPC-H.

    If you want to generate a large amount of data, you should use Hive 13 (i.e., Apache Hive 0.13) or later. Hive 13 introduced an optimization that allows far more scalable data partitioning. Hive 12 and lower will likely crash if you generate more than a few hundred GB of data, and tuning around the problem is difficult. You can generate text or RCFile data in Hive 13 and use it in multiple versions of Hive.

  • Step 5: Generate and load the data.

    The scripts tpcds-setup.sh and tpch-setup.sh generate and load data for TPC-DS and TPC-H, respectively. General usage is tpcds-setup.sh scale_factor [directory] or tpch-setup.sh scale_factor [directory].

    Some examples:

    Build 1 TB of TPC-DS data: ./tpcds-setup.sh 1000

    Build 1 TB of TPC-H data: ./tpch-setup.sh 1000

    Build 100 TB of TPC-DS data: ./tpcds-setup.sh 100000

    Build 30 TB of text formatted TPC-DS data: FORMAT=textfile ./tpcds-setup.sh 30000

    Build 30 TB of RCFile formatted TPC-DS data: FORMAT=rcfile ./tpcds-setup.sh 30000

    Also check the other parameters in the setup scripts; an important one is BUCKET_DATA. A sketch combining these options is shown below.
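
    A sketch of combining these options, assuming BUCKET_DATA is read from the environment the same way FORMAT is (check the setup scripts for its exact effect):

     # Generate 100 GB of text-format TPC-H data under a custom HDFS directory:
     FORMAT=textfile ./tpch-setup.sh 100 /tmp/tpch-generate
     # Hypothetically, enable bucketing the same way (verify in tpcds-setup.sh):
     # BUCKET_DATA=true ./tpcds-setup.sh 1000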

  • Step 6: Run queries.

    More than 50 sample TPC-DS queries and all TPC-H queries are included for you to try. You can use hive, beeline or the SQL tool of your choice. The testbench also includes a set of suggested settings.

    This example assumes you have generated 1 TB of TPC-DS data during Step 5:

     cd sample-queries-tpcds
     hive -i testbench.settings
     hive> use tpcds_bin_partitioned_orc_1000;
     hive> source query55.sql;
    

    Note that the database is named based on the Data Scale chosen in Step 4. At Data Scale 10000, your database will be named tpcds_bin_partitioned_orc_10000. At Data Scale 1000 it would be named tpcds_bin_partitioned_orc_1000. You can always run show databases to get a list of available databases.
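
    For example:

     hive> show databases;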

    Similarly, if you generated 1 TB of TPC-H data during Step 5:

     cd sample-queries-tpch
     hive -i testbench.settings
     hive> use tpch_flat_orc_1000;
     hive> source tpch_query1.sql;
    

Feedback

If you have questions, comments or problems, visit the Hortonworks Hive forum.

If you have improvements, pull requests are accepted.

hive-testbench's People

Contributors

amitagarwal06, cartershanklin, chetnachaudhari, dchiguruvada, dongjoon-hyun, jcamachor, ndembla, rajeshbalamohan, rbalamohan, t3rmin4t0r, ttmahdy

hive-testbench's Issues

./tpcds-build.sh fails, access denied error

Trying to build the tools, I get an access-denied error. I tried from different clients; none seem to work.

./tpcds-build.sh
Building TPC-DS Data Generator
curl http://dev.hortonworks.com.s3.amazonaws.com/hive-testbench/tpcds/README
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A0EC229B461934D8</RequestId><HostId>4xhSEQCANRAYks4zmXeXETDA7OYzXppuB0+CyjWiblxcT9WKhXNNaQJctVxF8IGiQQg0itPjFo8=</HostId></Error>curl --output tpcds_kit.zip http://dev.hortonworks.com.s3.amazonaws.com/hive-testbench/tpcds/TPCDS_Tools.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   243    0   243    0     0    997      0 --:--:-- --:--:-- --:--:--  1000
mkdir -p target/
cp tpcds_kit.zip target/tpcds_kit.zip
test -d target/tools/ || (cd target; unzip tpcds_kit.zip)
Archive:  tpcds_kit.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of tpcds_kit.zip or
        tpcds_kit.zip.zip, and cannot find tpcds_kit.zip.ZIP, period.
make: *** [target/tools/dsdgen] Error 9

Unzip doesn't work

When I run ./tpcds-build.sh, it gives me an unzip error, as below:

Building TPC-DS Data Generator
test -d target/tools/ || (cd target; unzip tpcds_kit.zip)
Archive:  tpcds_kit.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of tpcds_kit.zip or
        tpcds_kit.zip.zip, and cannot find tpcds_kit.zip.ZIP, period.
make: *** [Makefile:19: target/tools/dsdgen] Error 9
TPC-DS Data Generator built, you can now use tpcds-setup.sh to generate data.

Does this mean the zip file is corrupt? Can anyone help me here? Thanks in advance.

./tpcds-setup.sh 2 fails

Hello, I am running:

  • Hadoop 2.7.1
  • Hive 1.2.2

and I am trying to generate test data.

I did:

./tpcds-build.sh
./tpcds-setup.sh 2

The latter then fails with this message:

[...]
TPC-DS text data generation complete.
Loading text data into external tables.
0: jdbc:hive2://localhost:2181/ (closed)> create database if not exists ${DB};

make: *** [load_orc_2.mk:3: date_dim] Error 2
Loading constraints
0: jdbc:hive2://localhost:2181/ (closed)> -- set hivevar:DB=tpcds_bin_partitioned_orc_10000
0: jdbc:hive2://localhost:2181/ (closed)>
0: jdbc:hive2://localhost:2181/ (closed)> alter table customer_address add constraint ${DB}_pk_ca primary key (ca_address_sk) disable novalidate rely;

Data loaded into database tpcds_bin_partitioned_orc_2.

But when I check, the database does not exist. How can I resolve this issue?

Does it support version 3?

Does it support version 3? I am using Hadoop 3 and Hive 3 for testing. I can generate data, but I can't build the Hive database and tables.

./tpch-setup.sh execution failed

I am using tpch-setup.sh on CDH 6.3.2. The environment variables are below:

XDG_SESSION_ID=104
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../spark
HOSTNAME=node01.cdh6.citms.cn
TERM=xterm
SHELL=/bin/bash
HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
HISTSIZE=1000
SSH_CLIENT=192.168.0.99 57577 22
HADOOP_PREFIX=/opt/cloudera/parcels/CDH/lib/hadoop
SSH_TTY=/dev/pts/0
USER=hdfs
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hbase
MAIL=/var/spool/mail/root
PATH=.:/usr/local/mysql/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs
HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hive
JAVA=/usr/java/jdk1.8.0_271-amd64/bin/java
kylin_hadoop_conf_dir=/usr/local/apache-kylin-4.0.0-bin-spark2/hadoop_conf
PWD=/root/hive-testbench
HADOOP_YARN_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn
JAVA_HOME=/usr/java/jdk1.8.0_271-amd64
HADOOP_CONF_DIR=/etc/hadoop/conf
LANG=zh_CN.UTF-8
HISTCONTROL=ignoredups
SHLVL=5
HOME=/root
HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce
KYLIN_HOME=/usr/local/apache-kylin-4.0.0-bin-spark2
LOGNAME=hdfs
SSH_CONNECTION=192.168.0.99 57577 192.168.10.146 22
LESSOPEN=||/usr/bin/lesspipe.sh %s
MYSQL_HOME=/usr/local/mysql
XDG_RUNTIME_DIR=/run/user/0
_=/usr/bin/env
OLDPWD=/root/hive-testbench/tpch-gen

the output:

[root@node01 hive-testbench]# ./tpch-setup.sh 2
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
ls: `/tmp/tpch-generate/2/lineitem': No such file or directory
Generating data at scale factor 2.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:348)
   at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
   at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3151)
   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3196)
   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3235)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3286)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3254)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:478)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
   at org.notmysock.tpch.GenTable.genInput(GenTable.java:171)
   at org.notmysock.tpch.GenTable.run(GenTable.java:98)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
   at org.notmysock.tpch.GenTable.main(GenTable.java:54)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
ls: `/tmp/tpch-generate/2/lineitem': No such file or directory
Data generation failed, exiting.

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.

Commands in scripts:
runcommand "$HIVE -f ddl-tpcds/bin_partitioned/add_constraints.sql --hivevar DB=${DATABASE}"

Error description:
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException(message:Parent table not found: customer_address)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException(message:Parent table not found: customer_address) (state=08S01,code=1)

But I see that the tables in the database exist!

Has anyone ever encountered this problem?

testbench.settings file not found

Hi,
I am trying to run this benchmark on my setup, but while running a sample query, the current repository is unable to find the "testbench.settings" file at the below location:


Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hive-common-1.1.0-cdh5.14.2.jar!/hive-log4j.properties
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
Exception in thread "main" java.io.FileNotFoundException: File file:/bench/hive-testbench/sample-queries-tpcds/testbench.settings does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:598)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:344)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:784)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:437)
	at org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:449)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:723)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:634)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
root@sn1:/bench/hive-testbench/sample-queries-tpcds# ls /bench/hive-testbench/sample-queries-tpcds/testbench.settings 
ls: cannot access /bench/hive-testbench/sample-queries-tpcds/testbench.settings: No such file or directory
root@sn1:/bench/hive-testbench/sample-queries-tpcds# 

Table not found 'date_dim'

While generating TPC-DS data, I got a make error. When "export DEBUG_SCRIPT=X" was set, it gave a slightly more detailed error, as below. Does anyone know what's going on?

jdbc:hive2://localhost:2181/> create table date_dim
jdbc:hive2://localhost:2181/> stored as ${FILE}
jdbc:hive2://localhost:2181/> as select * from ${SOURCE}.date_dim;
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 3:17 Table not found 'date_dim' (state=42S02,code=10001)

Closing: 0: jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2?tez.queue.name=default
make: *** [date_dim] Error 2

Hive 13. Typo?

Hi all,

I just got started learning big data, Hadoop, etc.

When it's written:

If you want to generate a large amount of data, you should use Hive 13 or later.

What is it? The client's version?
Or does it mean Apache Hive version 3?

https://github.com/apache/hive

Makefile error for TPC-DS

I have been trying to set up hive-testbench to generate TPC-DS data. Upon running ./tpcds-setup.sh, I observed the following error.

I'm using Hadoop 3.3.6 and Hive 4.0.0-alpha-1.

gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -o dsdgen s_brand.o s_customer_address.o s_call_center.o s_catalog.o s_catalog_order.o s_catalog_order_lineitem.o s_catalog_page.o s_catalog_promotional_item.o s_catalog_returns.o s_category.o s_class.o s_company.o s_customer.o s_division.o s_inventory.o s_item.o s_manager.o s_manufacturer.o s_market.o s_pline.o s_product.o s_promotion.o s_purchase.o s_reason.o s_store.o s_store_promotional_item.o s_store_returns.o s_subcategory.o s_subclass.o s_warehouse.o s_web_order.o s_web_order_lineitem.o s_web_page.o s_web_promotinal_item.o s_web_returns.o s_web_site.o s_zip_to_gmt.o w_call_center.o w_catalog_page.o w_catalog_returns.o w_catalog_sales.o w_customer_address.o w_customer.o w_customer_demographics.o w_datetbl.o w_household_demographics.o w_income_band.o w_inventory.o w_item.o w_promotion.o w_reason.o w_ship_mode.o w_store.o w_store_returns.o w_store_sales.o w_timetbl.o w_warehouse.o w_web_page.o w_web_returns.o w_web_sales.o w_web_site.o dbgen_version.o address.o build_support.o date.o decimal.o dist.o driver.o error_msg.o genrand.o join.o list.o load.o misc.o nulls.o parallel.o permute.o pricing.o print.o r_params.o StringBuffer.o tdef_functions.o tdefs.o text.o scd.o scaling.o release.o sparse.o validate.o -lm
/usr/bin/ld: s_purchase.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_purchase.c:55: multiple definition of `nItemIndex'; s_catalog_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_order.c:56: multiple definition of `nItemIndex'; s_catalog_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order_lineitem.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_order_lineitem.c:54: multiple definition of `g_s_web_order_lineitem'; s_web_order.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_order.c:54: first defined here
/usr/bin/ld: w_catalog_page.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/w_catalog_page.c:52: multiple definition of `g_w_catalog_page'; s_catalog_page.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_catalog_page.c:51: first defined here
/usr/bin/ld: w_warehouse.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/w_warehouse.c:53: multiple definition of `g_w_warehouse'; s_warehouse.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_warehouse.c:51: first defined here
/usr/bin/ld: w_web_site.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/w_web_site.c:59: multiple definition of `g_w_web_site'; s_web_site.o:/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools/s_web_site.c:51: first defined here
collect2: error: ld returned 1 exit status
make[1]: *** [makefile:233: dsdgen] Error 1
make[1]: Leaving directory '/work/tpcds-insights/hive-testbench/tpcds-gen/target/tools'
make: *** [Makefile:23: target/tools/dsdgen] Error 2
TPC-DS Data Generator built, you can now use tpcds-setup.sh to generate data.
hadoop@f1953c80b0cb:/work/tpcds-insights/hive-testbench$

Could you please help me resolve this issue?

SQL parsing errors (HDP 2.6)

I got many parsing errors when executing benchmarks on both TPC-DS and TPC-H.
However, the same SQL can be parsed and executed via the UI of Hive View 2.

Platform: HDP 2.6, Hive +LLAP
Error messages:
NoViableAltException(26@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1114) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:447) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1160) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

TPC-DS Table Generation Execution Error

Sometimes this happens at "large" scale factors: the optimization step fails at table 17 or 18, and make reports a recipe error. The specific error is:

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Is this some kind of permission error, and how could I fix it?

hive configuration hive.optimize.sort.dynamic.partition.threshold does not exists

tpch-setup.sh fails to execute on Hive 2.3.5 and Hadoop 2.8.5:

DEBUG_SCRIPT=ON FORMAT=textfile ./tpch-setup.sh 100 /data
Logging initialized using configuration in file:/usr/local/service/hive/conf/hive-log4j2.properties Async: true
Query returned non-zero code: 1, cause: hive configuration hive.optimize.sort.dynamic.partition.threshold does not exists.

The sample tables did not appear (make error)

sudo -u hdfs ./tpcds-setup.sh 100
TPC-DS text data generation complete.
Loading text data into external tables.
make: *** [date_dim] Error 1
Loading constraints
Data loaded into database tpcds_bin_partitioned_orc_100.

The directory /tmp/tpcds-generate/100 is empty.

make fails in tpcds-gen

Compilation fails for me with:

/usr/bin/ld: s_purchase.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_purchase.c:55: multiple definition of `nItemIndex'; s_catalog_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_order.c:56: multiple definition of `nItemIndex'; s_catalog_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_catalog_order.c:56: first defined here
/usr/bin/ld: s_web_order_lineitem.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_order_lineitem.c:54: multiple definition of `g_s_web_order_lineitem'; s_web_order.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_order.c:54: first defined here
/usr/bin/ld: w_catalog_page.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/w_catalog_page.c:52: multiple definition of `g_w_catalog_page'; s_catalog_page.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_catalog_page.c:51: first defined here
/usr/bin/ld: w_warehouse.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/w_warehouse.c:53: multiple definition of `g_w_warehouse'; s_warehouse.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_warehouse.c:51: first defined here
/usr/bin/ld: w_web_site.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/w_web_site.c:59: multiple definition of `g_w_web_site'; s_web_site.o:/home/stefano/Repositories/hive-testbench/tpcds-gen/target/tools/s_web_site.c:51: first defined here
collect2: error: ld returned 1 exit status

Any clue?

TPC-DS SQL ERRORS

When I use source query1.sql to test, I get the error FAILED: SemanticException Line 0:-1 Unsupported SubQuery Expression 'ctr_store_sk': Only SubQuery expressions that are top level conjuncts are allowed. The same problem occurs in queries 6, 9, 10, 16, 23, 24, and 32, and for query 8 the error is FAILED: NullPointerException null. Can anyone help me? Thank you a lot.

SQL query typos

All the queries with "interval #num days" will fail: "days" is a typo and should be "day". A hedged fix sketch follows the list of problematic queries below.

Problematic queries:

query12.sql
query16.sql
query20.sql
query21.sql
query32.sql
query37.sql
query40.sql
query5.sql
query77.sql
query80.sql
query82.sql
query92.sql
query94.sql
query95.sql
query98.sql
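
A minimal fix sketch, assuming GNU sed and that the typo matches the "interval <num> days" pattern quoted above (verify the pattern against your copy of the queries first; .bak backups are kept so the change is easy to review and revert):

# Rewrite "interval <num> days" to "interval <num> day" in the affected queries.
cd sample-queries-tpcds
sed -i.bak -E "s/(interval[[:space:]]+'?[0-9]+'?[[:space:]]+)days/\1day/g" \
  query5.sql query12.sql query16.sql query20.sql query21.sql query32.sql \
  query37.sql query40.sql query77.sql query80.sql query82.sql query92.sql \
  query94.sql query95.sql query98.sql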

load-partitioned.sql doesn't inherit shell value for ${REDUCERS}

The intent of the tpch-setup.sh script is to pass the REDUCERS variable into the final stage, which checks the table creation in the line:

hive -i settings/load-${SCHEMA_TYPE}.sql -f ddl-tpch/bin_${SCHEMA_TYPE}/analyze.sql --database ${DATABASE};

But the ${REDUCERS} variable is not substituted in load-partitioned.sql in the line:

set hive.exec.reducers.max=${REDUCERS};

This results in a Hive execution error. A possible workaround is sketched below.
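
A possible workaround, sketched on the assumption that Hive resolves a bare ${REDUCERS} from a --hivevar definition, the same way the testbench passes ${DB} elsewhere (not the project's official fix):

# Pass REDUCERS explicitly so "set hive.exec.reducers.max=${REDUCERS};" can be substituted.
hive --hivevar REDUCERS=${REDUCERS} \
  -i settings/load-${SCHEMA_TYPE}.sql \
  -f ddl-tpch/bin_${SCHEMA_TYPE}/analyze.sql --database ${DATABASE}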

Maven does not work with http any more

Hi, the current version uses the http protocol to interact with the Maven repository, which no longer works (Maven Central now requires https). Could you please update to the latest Maven version and verify?

Compile error

config.h:137: warning: "HUGE_TYPE" redefined
137 | #define HUGE_TYPE __int64
|
config.h:103: note: this is the location of the previous definition
103 | #define HUGE_TYPE int64_t
|
config.h:139: warning: "HUGE_FORMAT" redefined
139 | #define HUGE_FORMAT "%I64d"
|
config.h:104: note: this is the location of the previous definition
104 | #define HUGE_FORMAT "%lld"
|
In file included from mkheader.c:37:
porting.h:46:10: fatal error: values.h: No such file or directory
46 | #include <values.h>
| ^~~~~~~~~~
compilation terminated.
make: *** [: mkheader.o] Error 1

hive-testbench doesn't generate database

Hello,

I'm using hive-testbench-hdp3 on HDP 3.1.4. A bunch of files were generated in the target HDFS directory after running tpcds-setup.sh, but no database was created. Any advice on how to address the issue?

My steps were as follows:

  1. Run tpcds-build.sh.

  2. Run 'FORMAT=parquet ./tpcds-setup.sh 10 /benchmarks/tpcds'. Note that it reported an error, as below:

TPC-DS text data generation complete.
Loading text data into external tables.
make: *** [date_dim] Error 1
Loading constraints
Data loaded into database tpcds_bin_partitioned_parquet_10.

  3. Check databases in Hive - there is no new database created.

Error when running sample-queries-tpcds/query1.sql

CDH 6.0.0, Hive 2.1.1.
When running the script hive-testbench/sample-queries-tpcds/query1.sql, it shows this error:

source query1.sql;
FAILED: ParseException line 15:31 cannot recognize input near '(' 'select' 'avg' in expression specification

query2.sql does not work either.
