Comments (1)
Pasted convert_submit_cpu_iceberg.template:
root@9a55836b2913:/spark-rapids-benchmarks/nds# cat convert_submit_cpu_iceberg.template
#
# SPDX-FileCopyrightText: Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
export SPARK_HOME=${SPARK_HOME:-/usr/lib/spark}
# 1. The iceberg-spark-runtime-3.2_2.12:0.13.1 only works on Spark 3.2.x
# Please refer to https://iceberg.apache.org/releases/ for other Spark versions.
# 2. The Iceberg catalog and tables are expected to be under the current directory;
# see `spark.sql.catalog.spark_catalog.warehouse`.
export SPARK_CONF=("--master" "spark://$HOSTNAME:7077"
"--deploy-mode" "client"
"--driver-memory" "10G"
"--num-executors" "1"
"--executor-memory" "40G"
"--executor-cores" "12"
"--conf" "spark.task.cpus=1"
"--packages" "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1"
"--conf" "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
"--conf" "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog"
"--conf" "spark.sql.catalog.spark_catalog.type=hadoop"
"--conf" "spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog"
"--conf" "spark.sql.catalog.local.type=hadoop"
"--conf" "spark.sql.catalog.spark_catalog.warehouse=$PWD/spark-warehouse")
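For readers unfamiliar with this template style, here is a minimal sketch of how a `SPARK_CONF` bash array like the one above could be expanded into a `spark-submit` command line. It is an illustration under assumptions, not the repository's actual wrapper: `build_cmd` and `my_job.py` are hypothetical names, and the array is a shortened stand-in for the full template. The command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: a shortened stand-in for the template's SPARK_CONF array.
SPARK_HOME="${SPARK_HOME:-/usr/lib/spark}"
SPARK_CONF=("--master" "spark://${HOSTNAME:-localhost}:7077"
            "--deploy-mode" "client"
            "--conf" "spark.task.cpus=1")

# build_cmd is a hypothetical helper: it stores the full command in CMD,
# one array element per argument so values with spaces stay intact.
build_cmd() {
    CMD=("$SPARK_HOME/bin/spark-submit" "${SPARK_CONF[@]}" "$@")
}

# my_job.py is a placeholder application name, not a script from the repo.
build_cmd my_job.py

# Print the command in shell-quoted form instead of running it.
printf '%q ' "${CMD[@]}"
echo
```

Note that bash does not pass arrays to child processes through `export`, so a template like this only takes effect when it is sourced into the shell that builds the command.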