Giter VIP home page Giter VIP logo

spark-root's Introduction

Spark ROOT

Under rapid development ๐Ÿ‘

Current Release is on Maven Central: 0.1.13

DOI

Connect ROOT to ApacheSpark to be able to read ROOT TTrees, infer the schema and manipulate the data via Spark's DataFrames/Datasets/RDDs.

Current Limitations

  • Pointers are currently not well supported

Requirements

  • Apache Spark 2.0.
  • Scala 2.11
  • root4j - available on Maven Central

Test Example - Schema Inferral

./spark-shell --packages org.diana-hep:spark-root_2.11:0.1.0

import org.dianahep.sparkroot._

The file used here is available in the resources of the repo
val df = spark.sqlContext.read.root("path/to/spark-root/src/test/resources/test_basicTypes_NDArrays.root")

The ROOT file contains:
- Simple Numeric Types + Char
- Fixed Dim 1D Arrays of these types
- Fixed Dim ND Arrays of these types

scala> df.printSchema
root
 |-- a: integer (nullable = true)
 |-- b: double (nullable = true)
 |-- c: float (nullable = true)
 |-- d: byte (nullable = true)
 |-- f: boolean (nullable = true)
 |-- arr1: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- arr2: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- arr3: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- arr4: array (nullable = true)
 |    |-- element: byte (containsNull = true)
 |-- arr5: array (nullable = true)
 |    |-- element: boolean (containsNull = true)
 |-- multi1: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: array (containsNull = true)
 |    |    |    |-- element: integer (containsNull = true)
 |-- multi2: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: array (containsNull = true)
 |    |    |    |-- element: double (containsNull = true)
 |-- multi3: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: array (containsNull = true)
 |    |    |    |-- element: float (containsNull = true)
 |-- multi4: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: array (containsNull = true)
 |    |    |    |-- element: byte (containsNull = true)
 |-- multi5: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: array (containsNull = true)
 |    |    |    |-- element: boolean (containsNull = true)


scala> df.show
+---+----+----+---+-----+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|  a|   b|   c|  d|    f|                arr1|                arr2|                arr3|                arr4|                arr5|              multi1|              multi2|              multi3|              multi4|              multi5|
+---+----+----+---+-----+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|  0| 0.0| 0.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  1| 1.0| 1.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  2| 2.0| 2.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  3| 3.0| 3.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  4| 4.0| 4.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  5| 5.0| 5.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  6| 6.0| 6.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  7| 7.0| 7.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  8| 8.0| 8.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
|  9| 9.0| 9.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 10|10.0|10.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 11|11.0|11.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 12|12.0|12.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 13|13.0|13.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 14|14.0|14.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 15|15.0|15.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 16|16.0|16.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 17|17.0|17.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 18|18.0|18.0|120|false|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
| 19|19.0|19.0|120| true|[0, 1, 2, 3, 4, 5...|[0.0, 1.0, 2.0, 3...|[0.0, 1.0, 2.0, 3...|[0, 1, 2, 3, 4, 5...|[false, true, fal...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|[WrappedArray(Wra...|
+---+----+----+---+-----+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
only showing top 20 rows

spark-root's People

Contributors

jpivarski avatar vkhristenko avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.