Spark Excel Library
A library for querying Excel files with Apache Spark, for Spark SQL and DataFrames.
Requirements
This library requires Spark 1.4 or later.
Linking
You can link against this library in your program at the following coordinates:
Scala 2.10
groupId: com.crealytics
artifactId: spark-excel_2.10
version: 0.8.2
Scala 2.11
groupId: com.crealytics
artifactId: spark-excel_2.11
version: 0.8.2
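For an SBT project, the coordinates above could be declared roughly as follows. This is a minimal sketch, assuming a standard build.sbt; the `%%` operator appends the project's Scala binary version to the artifact name, so the same line should resolve to spark-excel_2.10 or spark-excel_2.11 depending on `scalaVersion`.

```scala
// build.sbt (sketch): pulls in spark-excel for the project's Scala version.
// "%%" expands the artifactId to spark-excel_2.10 or spark-excel_2.11.
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.8.2"
```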
Using with Spark shell
This package can be added to Spark using the --packages command line option. For example, to include it when starting the Spark shell:
Spark compiled with Scala 2.11
$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-excel_2.11:0.8.2
Spark compiled with Scala 2.10
$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-excel_2.10:0.8.2
Features
This package allows querying Excel spreadsheets as Spark DataFrames.
Scala API
Spark 1.4+:
Create a DataFrame from an Excel file:
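import org.apache.spark.sql.SQLContext

// sc is the SparkContext provided by the Spark shell
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
    .format("com.crealytics.spark.excel")
    .option("location", "Worktime.xlsx")          // path to the Excel file
    .option("sheetName", "Daily")                 // sheet to read
    .option("useHeader", "true")                  // use the first row as column names
    .option("treatEmptyValuesAsNulls", "true")    // read empty cells as null
    .option("inferSchema", "true")                // infer column types from the data
    .option("addColorColumns", "true")            // add extra columns describing cell colors
    .load()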
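Once loaded, the result is an ordinary DataFrame, so it can be inspected and queried with Spark SQL. The following is a sketch rather than part of the library's documentation: it assumes it runs in the Spark shell (where `sc` is predefined) with the package loaded, that Worktime.xlsx with a sheet named Daily exists in the working directory, and the table name `worktime` is a hypothetical name chosen for illustration.

```scala
import org.apache.spark.sql.SQLContext

// Assumes the Spark shell, where `sc` (the SparkContext) already exists.
val sqlContext = new SQLContext(sc)

// Load the sheet using the options shown in this README.
val df = sqlContext.read
  .format("com.crealytics.spark.excel")
  .option("location", "Worktime.xlsx")
  .option("sheetName", "Daily")
  .option("useHeader", "true")
  .load()

// Show the column names and types that were read from the sheet.
df.printSchema()

// Register the DataFrame under a hypothetical table name and query it with SQL.
df.registerTempTable("worktime")
sqlContext.sql("SELECT COUNT(*) AS row_count FROM worktime").show()
```

`registerTempTable` matches the Spark 1.4+ API targeted here; on Spark 2.0 and later, `createOrReplaceTempView` is the preferred equivalent.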
Building From Source
This library is built with SBT. To build a JAR file, run sbt assembly from the project root.
The build configuration includes support for both Scala 2.10 and 2.11.