
hbutani / spark-druid-olap

285 stars · 50 watchers · 96 forks · 129.89 MB

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Home Page: http://sparklinedata.com/

License: Apache License 2.0

Scala 93.80% Shell 6.20%
spark business-intelligence olap-cube sparksql query-optimization

spark-druid-olap's Introduction

Sparkline BI Accelerator

Latest release: 0.4.0
Documentation: Overview, Quick Start Guide, User Guide, Dev. Guide
Mailing List: User Mailing List
License: Apache 2.0
Continuous Integration: Build Status
Company: Sparkline Data

The Sparkline BI Accelerator is a Spark-native Business Intelligence stack geared towards providing fast ad-hoc querying over a Logical Cube (aka Star Schema). It simplifies how enterprises can provide an ad-hoc query layer on top of a Hadoop/Spark (Big Open Data) stack.

  • We provide the ad-hoc query capability by extending the Spark SQL layer, through SQL extensions and an extended Optimizer (both logical and physical optimizations); see the sketch after this list.
  • We use OLAP indexing rather than pre-materialization as the technique to achieve query performance. OLAP indexing is a well-known technique that is far superior to materialized views for supporting ad-hoc querying. We utilize another open-source, Apache-licensed Big Data component for the OLAP indexing capability.
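
As an illustration of how the extended SQL layer is used, here is a minimal registration-and-query sketch in the style of the Quick Start Guide (the Druid datasource "tpch", the base table "orderLineItemPartSupplierBase", and the host/port are assumptions taken from the quick-start examples that appear later on this page):

// Register a Druid-backed logical table over a base (raw) Spark table.
sql("""CREATE TEMPORARY TABLE orderLineItemPartSupplier
      USING org.sparklinedata.druid
      OPTIONS (sourceDataframe "orderLineItemPartSupplierBase",
               timeDimensionColumn "l_shipdate",
               druidDatasource "tpch",
               druidHost "localhost",
               druidPort "8082",
               starSchema '{ "factTable" : "orderLineItemPartSupplier", "relations" : [] }')""")

// Ad-hoc aggregations against this table are rewritten by the extended
// optimizer into Druid queries where possible.
sql("""select l_returnflag, sum(l_extendedprice) as s
      from orderLineItemPartSupplier
      group by l_returnflag""").show()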

Overall Picture

spark-druid-olap's People

Contributors

hbutani, jpullokkaran, rrbutani, sdesikan6, zimingwang


spark-druid-olap's Issues

Followup on UnresolvedException in Spark for aggregation of the form fnX(fnY(..),..)

Followup on the following Spark SQL issue: if there is an aggregation expression of the form fnX(fnY(..),..) (the top-level function has a function invocation as its child expression), then:

  • The ResolveFunctions rule skips the top-level invocation; the expression is left as an UnresolvedFunction expression.
  • In the ResolveGroupingAnalytics rule, the UnresolvedFunction is handled as an Alias.
  • In the Expand operator, when the projections are computed and this expression is masked as a null, a call is made to get its dataType (basicOperators:291). This causes an UnresolvedException to be thrown.

See DruidRewriteCubeTest::ShipDateYearAggCube for an example.

This can be reproduced by replacing the shipDtYrGroup groupByExpr with "concat(concat(l_linestatus, 'a'), 'b')"; more than one level of function invocation is needed, so "concat(l_linestatus, 'a')" on its own works fine. A sketch of the failing shape follows.
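
A hedged sketch of the shape described above, reusing the TPCH table from the other examples on this page (the exact setup in DruidRewriteCubeTest is an assumption):

// Grouping-set/cube query whose grouping expression is a nested function
// invocation (a concat of a concat); analysis fails with an UnresolvedException.
sql("""select concat(concat(l_linestatus, 'a'), 'b') as ls, count(*) as cnt
      from orderLineItemPartSupplier
      group by concat(concat(l_linestatus, 'a'), 'b') with cube""").show()

// A single level of function invocation analyzes fine:
sql("""select concat(l_linestatus, 'a') as ls, count(*) as cnt
      from orderLineItemPartSupplier
      group by concat(l_linestatus, 'a') with cube""").show()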

aggregation when joining with another table

I have an in-memory table click_cached, and I try to join it with a Druid table cl_events_test and aggregate through Druid, like this:

select count(1), cast(cl_events_test.timestamp as date) as theday
from cl_events_test, click_cached
where click_cached.customerId = cl_events_test.customerId
group by cast(cl_events_test.timestamp as date)

But I found that the Druid index is not used in this case.

explain select count(1),cast(cl_events_test.timestamp as date) as theday from cl_events_test, click_cached where click_cached.customerId=cl_events_test.customerId group by cast(cl_events_test.timestamp as date);
== Physical Plan ==
TungstenAggregate(key=[cast(timestamp#318 as date)#473], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#456L,theday#448])
+- TungstenExchange hashpartitioning(cast(timestamp#318 as date)#473,200), None
   +- TungstenAggregate(key=[cast(timestamp#318 as date) AS cast(timestamp#318 as date)#473], functions=[(count(1),mode=Partial,isDistinct=false)], output=[cast(timestamp#318 as date)#473,count#475L])
      +- Project [timestamp#318]
         +- BroadcastHashJoin [customerId#316L], [customerId#453L], BuildRight
            :- Project [timestamp#318,customerId#316L]
            :  +- Scan DruidRelationInfo(fullName = DruidRelationName(cl_events_test,10.25.2.91,cl_events_test), sourceDFName = cl_events_base, timeDimensionCol = timestamp, options = DruidRelationOptions(1000000,100000,true,true,true,30000,true,/druid,true,false,1,true,None))[event#313,targetId#314,targetName#315,customerId#316L,source#317,timestamp#318]
            +- InMemoryColumnarTableScan [customerId#453L], InMemoryRelation [_c0#323L,theday#322,customerId#453L], true, 10000, StorageLevel(true, true, false, true, 1), Project [alias-2#325L AS _c0#323L,cast(alias-1#324 as date) AS theday#322,cast(customerId#316 as bigint) AS customerId#316L], Some(click_cached)

count(distinct(dimension)) cannot translate to hyperUnique

I created a batch ingestion spec and defined a hyperUnique metric in it. When I query data from the dataSource using Sparkline, I found that count(distinct(dimension)) does not translate into a hyperUnique aggregation. Is this a bug, a misuse on my part, or is this feature simply not supported yet?
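
For context, a hedged sketch of the query shape in question (the table and dimension names here are illustrative, not taken from the report):

// The reporter's expectation: with a hyperUnique metric defined over user_id in the
// batch ingestion spec, this distinct count would be rewritten into Druid's
// hyperUnique (approximate cardinality) aggregator rather than an exact count.
sql("""select count(distinct user_id) from events""").show()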

Enable rewrites against the underlying Fact table

For example, for TPCH queries: allow the query to be written against lineitembase (an illustration follows below).
The original thought was to add table properties to the underlying table, recording the fact that it is associated with a Druid Index/Datasource. But this will not work, because these table properties are not exposed in BaseRelation, so at the time of the query rewrite we have lost the association between a LogicalRelation operator and the underlying table.
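
A hedged illustration of the desired behavior, reusing TPCH column names from other queries on this page: a query written directly against the raw fact table should still be rewritten into a Druid query.

// Written against the raw fact table (lineitembase), not the Druid-backed table;
// the requested enhancement is that this, too, gets rewritten to Druid.
sql("""select l_returnflag, sum(l_extendedprice) as s
      from lineitembase
      group by l_returnflag""").show()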

Druid Query Plan not generated when raw dataset is JSON

I have a raw dataset that contains JSON objects. I am able to load it into Druid and query it using Druid queries. However, when I try to run a "groupBy" command using the accelerator (using 2.0), the Druid query plan does not get generated; instead, the query runs against the raw dataset:
sql("""CREATE TEMPORARY TABLE click_summary USING org.apache.spark.sql.json OPTIONS (path '/tmp/test/part-r-00000')""".stripMargin).printSchema() sql("""SELECT count(*) from click_summary""".stripMargin).show() sql(""" CREATE TEMPORARY TABLE clicksummarized USING org.sparklinedata.druid OPTIONS (sourceDataframe "click_summary", timeDimensionColumn "processingTime", druidDatasource "clickenhanced", druidHost "localhost", zkQualifyDiscoveryNames "true", queryHistoricalServers "true", numProcessingThreadsPerHistorical '1', starSchema ' { "factTable" : "clicksummarized", "relations" : [] } ')""".stripMargin) sql("""SELECT adIdChain.advertiser_guid, sum(clickCounters.total_click_count) as clicks from clicksummarized group by adIdChain.advertiser_guid""".stripMargin).show()

I tried specifying a columnMapping, but that doesn't help either:
With field names in Druid matching the nested naming of JSON
columnMapping '{"adIdChain.advertiser_guid" : "adIdChain.advertiser_guid","clickCounters.total_click_count" : "clickCounters.total_click_count"}',

With field names in Druid different from the nested naming of JSON
.option("columnMapping", "{\"adIdChain.advertiser_guid\" : \"adIdChain__advertiser_guid\"," + "\"clickCounters.total_click_count\" : \"clickCounters__total_click_count\"}")

handle spark datetime expressions in where clause on time dimension

"""
|SELECT Sum(lineitem.l_extendedprice) AS
| sum_l_extendedprice_ok,
| Cast(Concat(To_date(Cast(
| Concat(To_date(lineitem.l_shipdate), ' 00:00:00') AS
| TIMESTAMP)), ' 00:00:00') AS TIMESTAMP) AS
| tdy_l_shipdate_ok
|FROM (SELECT *
| FROM lineitem) lineitem
|
|WHERE ( ( Cast(Concat(To_date(lineitem.l_shipdate), ' 00:00:00') AS TIMESTAMP)
| >= Cast(
| '1993-05-19 00:00:00' AS TIMESTAMP) )
| AND ( Cast(Concat(To_date(lineitem.l_shipdate), ' 00:00:00') AS
| TIMESTAMP) <=
| Cast(
| '1998-08-02 00:00:00' AS TIMESTAMP) ) )
|GROUP BY Cast(Concat(To_date(Cast(
| Concat(To_date(lineitem.l_shipdate), ' 00:00:00') AS
| TIMESTAMP)), ' 00:00:00') AS TIMESTAMP)
""".stripMargin

Non-SQL df queries failing

Group by on columns of a DataFrame is not working.

scala> q1OLAP.groupBy("l_returnflag").count().show()

16/01/15 11:21:59 ERROR Executor: Exception in task 0.0 in stage 16.0 (TID 1285)
org.sparklinedata.druid.DruidDataSourceException: Unexpected response status: HTTP/1.1 500 Internal Server Error
at org.sparklinedata.druid.client.DruidClient$$anonfun$3$$anonfun$apply$1.apply(DruidClient.scala:86)
at org.sparklinedata.druid.client.DruidClient$$anonfun$3$$anonfun$apply$1.apply(DruidClient.scala:81)
at scala.util.Try$.apply(Try.scala:161)
at org.sparklinedata.druid.client.DruidClient$$anonfun$3.apply(DruidClient.scala:81)
at org.sparklinedata.druid.client.DruidClient$$anonfun$3.apply(DruidClient.scala:70)
at scala.util.Success.flatMap(Try.scala:200)
at org.sparklinedata.druid.client.DruidClient.perform(DruidClient.scala:70)
at org.sparklinedata.druid.client.DruidClient.post(DruidClient.scala:101)
at org.sparklinedata.druid.client.DruidClient.executeQuery(DruidClient.scala:150)
at org.sparklinedata.druid.DruidRDD.compute(DruidRDD.scala:46)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/01/15 11:21:59 WARN ThrowableSerializationWrapper: Task exception could not be deserialized
java.lang.ClassNotFoundException: org.sparklinedata.druid.DruidDataSourceException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/01/15 11:21:59 ERROR TaskResultGetter: Could not deserialize TaskEndReason: ClassNotFound with classloader org.apache.spark.repl.SparkIMain$TranslatingClassLoader@41a64f33
16/01/15 11:21:59 WARN TaskSetManager: Lost task 0.0 in stage 16.0 (TID 1285, localhost): UnknownReason
16/01/15 11:21:59 ERROR TaskSetManager: Task 0 in stage 16.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: Lost task 0.0 in stage 16.0 (TID 1285, localhost): UnknownReason
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1903)
at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1384)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1314)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1377)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:178)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:401)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:362)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:370)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:23)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:28)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:30)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:32)
at $iwC$$iwC$$iwC$$iwC.(:34)
at $iwC$$iwC$$iwC.(:36)
at $iwC$$iwC.(:38)
at $iwC.(:40)
at (:42)
at .(:46)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Executing the same query multiple times fetches the same result, even though new data has been ingested into Druid.

Hi Team,

I am executing the same query multiple times within the same session (same Spark context), while new data is simultaneously being ingested into Druid, but I am not getting updated results. Is there a way to clear the cache and fetch updated results from Druid?

If I kill the current session and start a new one, I do get the updated results.

Let me know if any fix or workaround is available for this issue.

Thanks,
Senthil

No such Table exception

When I run multiple Tableau users with many sheets, I get the following error. The table exists and the error is intermittent; the same query, executed 10-15 seconds after the error, goes through.

org.spark-project.guava.util.concurrent.UncheckedExecutionException: org.apache.spark.sql.catalyst.analysis.NoSuchTableException
at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:387)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog.org$apache$spark$sql$hive$sparklinedata$SparklineMetastoreCatalog$$super$lookupRelation(SparklineDataContext.scala:86)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog$$anonfun$lookupRelation$2.apply(SparklineDataContext.scala:86)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog$$anonfun$lookupRelation$2.apply(SparklineDataContext.scala:86)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog.lookupRelation(SparklineDataContext.scala:86)
at org.apache.spark.sql.hive.sparklinedata.SparklineDataContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(SparklineDataContext.scala:74)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
at org.apache.spark.sql.hive.sparklinedata.SparklineDataContext$$anon$1.lookupRelation(SparklineDataContext.scala:74)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:303)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:315)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:310)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:265)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:305)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:265)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:305)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:310)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:300)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException
at org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:122)
at org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:122)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:122)
at org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:384)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog.org$apache$spark$sql$hive$sparklinedata$SparklineMetastoreCatalog$$super$lookupRelation(SparklineDataContext.scala:86)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog$$anonfun$lookupRelation$2.apply(SparklineDataContext.scala:86)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog$$anonfun$lookupRelation$2.apply(SparklineDataContext.scala:86)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.sparklinedata.SparklineMetastoreCatalog.lookupRelation(SparklineDataContext.scala:86)
at org.apache.spark.sql.hive.sparklinedata.SparklineDataContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(SparklineDataContext.scala:74)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
at org.apache.spark.sql.hive.sparklinedata.SparklineDataContext$$anon$1.lookupRelation(SparklineDataContext.scala:74)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)
at org.sparklinedata.druid.DefaultSource.createRelation(DefaultSource.scala:41)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:180)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:124)
at org.spark-project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4880)
... 79 more
16/06/05 08:41:26 ERROR SparkExecuteStatementOperation: Error running hive query:

handle spark datetime functions as grouping expressions

  1. Show by YEAR in Tableau:
    SELECT YEAR(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)) AS yr_l_shipdate_ok FROM ( select * from lineitembase ) lineitem JOIN ( select * from orders ) orders ON (lineitem.l_orderkey = orders.o_orderkey) JOIN ( select * from customer ) customer ON (orders.o_custkey = customer.c_custkey) JOIN ( select * from custnation ) custnation ON (customer.c_nationkey = custnation.cn_nationkey) JOIN ( select * from custregion ) custregion ON (custnation.cn_regionkey = custregion.cr_regionkey) GROUP BY YEAR(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP))

  2. SELECT CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), 'yyyy-MM-01 00:00:00') AS TIMESTAMP) AS tmn_l_shipdate_ok FROM ( select * from lineitembase ) lineitem JOIN ( select * from orders ) orders ON (lineitem.l_orderkey = orders.o_orderkey) JOIN ( select * from customer ) customer ON (orders.o_custkey = customer.c_custkey) JOIN ( select * from custnation ) custnation ON (customer.c_nationkey = custnation.cn_nationkey) JOIN ( select * from custregion ) custregion ON (custnation.cn_regionkey = custregion.cr_regionkey) GROUP BY CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), 'yyyy-MM-01 00:00:00') AS TIMESTAMP)

  3. SELECT SUM(lineitem.l_extendedprice) AS sum_l_extendedprice_ok, CAST(CONCAT(TO_DATE(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), ' 00:00:00') AS TIMESTAMP) AS tdy_l_shipdate_ok FROM ( select * from lineitembase ) lineitem JOIN ( select * from orders ) orders ON (lineitem.l_orderkey = orders.o_orderkey) JOIN ( select * from customer ) customer ON (orders.o_custkey = customer.c_custkey) JOIN ( select * from custnation ) custnation ON (customer.c_nationkey = custnation.cn_nationkey) JOIN ( select * from custregion ) custregion ON (custnation.cn_regionkey = custregion.cr_regionkey) GROUP BY CAST(CONCAT(TO_DATE(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), ' 00:00:00') AS TIMESTAMP)

Generating Denormalized TPCH Dataset

This is the command I used, following https://github.com/SparklineData/spark-druid-olap/wiki/Generating-Denormalized-TPCH-Dataset:

spark yingyang$ bin/spark-submit --packages com.databricks:spark-csv_2.10:1.1.0,SparklineData:spark-datetime:0.0.2,SparklineData:spark-druid-olap:0.0.2 --class org.sparklinedata.tpch.TpchGenMain /Users/yingyang/Downloads/tpch-spark-druid-master/tpchData/target/scala-2.10/tpchdata_2.10-0.0.1.jar /Users/yingyang/Downloads/data_dbgen --scale 1

I got an error:
Ivy Default Cache set to: /Users/yingyang/.ivy2/cache
The jars for the packages stored in: /Users/yingyang/.ivy2/jars
:: loading settings :: url = jar:file:/Users/yingyang/Downloads/spark/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
SparklineData#spark-datetime added as a dependency
SparklineData#spark-druid-olap added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.10;1.1.0 in list
found org.apache.commons#commons-csv;1.1 in list
found com.univocity#univocity-parsers;1.5.1 in list
found SparklineData#spark-datetime;0.0.2 in spark-packages
found com.github.nscala-time#nscala-time_2.10;1.6.0 in list
found joda-time#joda-time;2.5 in list
found org.joda#joda-convert;1.2 in list
found SparklineData#spark-druid-olap;0.0.2 in spark-packages
found org.apache.httpcomponents#httpclient;4.5 in central
found org.apache.httpcomponents#httpcore;4.4.1 in central
found commons-logging#commons-logging;1.2 in central
found commons-codec#commons-codec;1.9 in central
found org.json4s#json4s-ext_2.10;3.2.10 in central
found org.joda#joda-convert;1.6 in central
found com.github.scopt#scopt_2.10;3.3.0 in list
downloading http://dl.bintray.com/spark-packages/maven/SparklineData/spark-datetime/0.0.2/spark-datetime-0.0.2.jar ...
[SUCCESSFUL ] SparklineData#spark-datetime;0.0.2!spark-datetime.jar (426ms)
downloading http://dl.bintray.com/spark-packages/maven/SparklineData/spark-druid-olap/0.0.2/spark-druid-olap-0.0.2.jar ...
[SUCCESSFUL ] SparklineData#spark-druid-olap;0.0.2!spark-druid-olap.jar (501ms)
downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.5/httpclient-4.5.jar ...
[SUCCESSFUL ] org.apache.httpcomponents#httpclient;4.5!httpclient.jar (99ms)
downloading https://repo1.maven.org/maven2/org/json4s/json4s-ext_2.10/3.2.10/json4s-ext_2.10-3.2.10.jar ...
[SUCCESSFUL ] org.json4s#json4s-ext_2.10;3.2.10!json4s-ext_2.10.jar (19ms)
downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpcore/4.4.1/httpcore-4.4.1.jar ...
[SUCCESSFUL ] org.apache.httpcomponents#httpcore;4.4.1!httpcore.jar (75ms)
downloading https://repo1.maven.org/maven2/commons-logging/commons-logging/1.2/commons-logging-1.2.jar ...
[SUCCESSFUL ] commons-logging#commons-logging;1.2!commons-logging.jar (18ms)
downloading https://repo1.maven.org/maven2/commons-codec/commons-codec/1.9/commons-codec-1.9.jar ...
[SUCCESSFUL ] commons-codec#commons-codec;1.9!commons-codec.jar (69ms)
downloading https://repo1.maven.org/maven2/org/joda/joda-convert/1.6/joda-convert-1.6.jar ...
[SUCCESSFUL ] org.joda#joda-convert;1.6!joda-convert.jar (21ms)
:: resolution report :: resolve 4244ms :: artifacts dl 1239ms
:: modules in use:
SparklineData#spark-datetime;0.0.2 from spark-packages in [default]
SparklineData#spark-druid-olap;0.0.2 from spark-packages in [default]
com.databricks#spark-csv_2.10;1.1.0 from list in [default]
com.github.nscala-time#nscala-time_2.10;1.6.0 from list in [default]
com.github.scopt#scopt_2.10;3.3.0 from list in [default]
com.univocity#univocity-parsers;1.5.1 from list in [default]
commons-codec#commons-codec;1.9 from central in [default]
commons-logging#commons-logging;1.2 from central in [default]
joda-time#joda-time;2.5 from list in [default]
org.apache.commons#commons-csv;1.1 from list in [default]
org.apache.httpcomponents#httpclient;4.5 from central in [default]
org.apache.httpcomponents#httpcore;4.4.1 from central in [default]
org.joda#joda-convert;1.6 from central in [default]
org.json4s#json4s-ext_2.10;3.2.10 from central in [default]
:: evicted modules:
org.joda#joda-convert;1.2 by [org.joda#joda-convert;1.6] in [default]
joda-time#joda-time;2.3 by [joda-time#joda-time;2.5] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 17 | 8 | 8 | 2 || 14 | 8 |
---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
module not found: com.github.SparklineData#spark-datetime;bf5693a575a1dea5b663e4e8b30a0ba94c21d62d

==== local-m2-cache: tried

  file:/Users/yingyang/.m2/repository/com/github/SparklineData/spark-datetime/bf5693a575a1dea5b663e4e8b30a0ba94c21d62d/spark-datetime-bf5693a575a1dea5b663e4e8b30a0ba94c21d62d.pom

  -- artifact com.github.SparklineData#spark-datetime;bf5693a575a1dea5b663e4e8b30a0ba94c21d62d!spark-datetime.jar:

  file:/Users/yingyang/.m2/repository/com/github/SparklineData/spark-datetime/bf5693a575a1dea5b663e4e8b30a0ba94c21d62d/spark-datetime-bf5693a575a1dea5b663e4e8b30a0ba94c21d62d.jar

==== local-ivy-cache: tried

  /Users/yingyang/.ivy2/local/com.github.SparklineData/spark-datetime/bf5693a575a1dea5b663e4e8b30a0ba94c21d62d/ivys/ivy.xml

==== central: tried

  https://repo1.maven.org/maven2/com/github/SparklineData/spark-datetime/bf5693a575a1dea5b663e4e8b30a0ba94c21d62d/spark-datetime-bf5693a575a1dea5b663e4e8b30a0ba94c21d62d.pom

  -- artifact com.github.SparklineData#spark-datetime;bf5693a575a1dea5b663e4e8b30a0ba94c21d62d!spark-datetime.jar:

  https://repo1.maven.org/maven2/com/github/SparklineData/spark-datetime/bf5693a575a1dea5b663e4e8b30a0ba94c21d62d/spark-datetime-bf5693a575a1dea5b663e4e8b30a0ba94c21d62d.jar

==== spark-packages: tried

  http://dl.bintray.com/spark-packages/maven/com/github/SparklineData/spark-datetime/bf5693a575a1dea5b663e4e8b30a0ba94c21d62d/spark-datetime-bf5693a575a1dea5b663e4e8b30a0ba94c21d62d.pom

  -- artifact com.github.SparklineData#spark-datetime;bf5693a575a1dea5b663e4e8b30a0ba94c21d62d!spark-datetime.jar:

  http://dl.bintray.com/spark-packages/maven/com/github/SparklineData/spark-datetime/bf5693a575a1dea5b663e4e8b30a0ba94c21d62d/spark-datetime-bf5693a575a1dea5b663e4e8b30a0ba94c21d62d.jar

    ::::::::::::::::::::::::::::::::::::::::::::::

    ::          UNRESOLVED DEPENDENCIES         ::

    ::::::::::::::::::::::::::::::::::::::::::::::

    :: com.github.SparklineData#spark-datetime;bf5693a575a1dea5b663e4e8b30a0ba94c21d62d: not found

    ::::::::::::::::::::::::::::::::::::::::::::::

:::: ERRORS
unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver sbt-chain

unknown resolver null

unknown resolver sbt-chain

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

unknown resolver null

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.github.SparklineData#spark-datetime;bf5693a575a1dea5b663e4e8b30a0ba94c21d62d: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1068)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:287)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Setting up the dataset as part of the quick start on spark-druid throws a JSON parser error

I have implemented the steps listed in the quick start guide to test Spark SQL with Druid. While executing the command to set up the index over the raw data, I get a JSON object error. Could you please help identify what is causing the issue? I have imported Jackson to parse JSON in Spark. Below is the error message.

scala> sql("""
| CREATE TEMPORARY TABLE orderLineItemPartSupplier
| USING org.sparklinedata.druid
| OPTIONS (sourceDataframe "orderLineItemPartSupplierBase",
| timeDimensionColumn "l_shipdate",
| druidDatasource "tpch",
| druidHost "localhost",
| druidPort "8082",
| columnMapping '{ "l_quantity" : "sum_l_quantity", "ps_availqty" : "sum_ps_availqty", "cn_name" : "c_nation", "cr_name" : "c_region", "sn_name" : "s_nation", "sr_name" : "s_region" } ',
| functionalDependencies '[ {"col1" : "c_name", "col2" : "c_address", "type" : "1-1"}, {"col1" : "c_phone", "col2" : "c_address", "type" : "1-1"}, {"col1" : "c_name", "col2" : "c_mktsegment", "type" : "n-1"}, {"col1" : "c_name", "col2" : "c_comment", "type" : "1-1"}, {"col1" : "c_name", "col2" : "c_nation", "type" : "n-1"}, {"col1" : "c_nation", "col2" : "c_region", "type" : "n-1"} ] ',
| starSchema ' { "factTable" : "orderLineItemPartSupplier", "relations" : [] } ')
| """.stripMargin
| )
org.json4s.package$MappingException: Do not know how to convert JObject(List()) into class java.lang.String
at org.json4s.Extraction$.convert(Extraction.scala:559)
at org.json4s.Extraction$.extract(Extraction.scala:331)
at org.json4s.Extraction$.extract(Extraction.scala:42)
at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
at org.sparklinedata.druid.client.DruidClient.timeBoundary(DruidClient.scala:122)
at org.sparklinedata.druid.client.DruidClient.metadata(DruidClient.scala:130)
at org.sparklinedata.druid.metadata.DruidRelationInfo$.apply(DruidRelationInfo.scala:62)
at org.sparklinedata.druid.DefaultSource.createRelation(DefaultSource.scala:89)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.execution.datasources.CreateTempTableUsing.run(ddl.scala:93)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
at org.apache.spark.sql.DataFrame.(DataFrame.scala:144)
at org.apache.spark.sql.DataFrame.(DataFrame.scala:129)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:47)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:49)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:53)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:55)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:57)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:59)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:61)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:63)
at $iwC$$iwC$$iwC$$iwC.(:65)
at $iwC$$iwC$$iwC.(:67)
at $iwC$$iwC.(:69)
at $iwC.(:71)
at (:73)
at .(:77)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Supporting next_day function for druid push down

The following SQL needs to get pushed down to Druid; currently it does not.

SELECT v.campaign_name AS campaign_name,
sum(v.conversions) AS sum_conversions_ok,
sum(v.impressions) AS sum_impressions_ok,
cast(date_add(next_day(cast(v.date_string AS timestamp),'SU'),-7) AS timestamp) AS twk_date_string_ok
FROM (
SELECT *
FROM sparkline_viewability_2_4 ) v
WHERE (
v.advertiser_name = 'AMEX Personal Savings')
GROUP BY v.campaign_name,
cast(date_add(next_day(cast(v.date_string AS timestamp),'SU'),-7) AS timestamp)

Rewrite to Druid not happening when tables are cached in Spark

The query rewrite to Druid is not happening when the Spark tables are cached.

explain select
  c_mktsegment,
  sum(l_extendedprice) as price
from customer,
     orders,
     lineitem
where dateIsBefore(dateTime(`o_orderdate`), dateTime("1995-03-15"))
  and dateIsAfter(dateTime(`l_shipdate`), dateTime("1995-03-15"))
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
group by c_mktsegment

== Physical Plan ==
[screenshot of the physical plan attached to the original issue; not reproduced here]

handle spark datetime expressions in where clause on non-time columns

"""SELECT
|sn_name,
|sum(l_extendedprice) as revenue
|FROM
|customer,
|orders,
|lineitem,
|partsupp,
|supplier,
|suppnation,
|suppregion
|WHERE
|c_custkey = o_custkey
|AND l_orderkey = o_orderkey
|and l_suppkey = ps_suppkey
|and l_partkey = ps_partkey
|and ps_suppkey = s_suppkey
|AND s_nationkey = sn_nationkey
|AND sn_regionkey = sr_regionkey
|AND sr_name = 'ASIA'
|AND o_orderdate >= date '1994-01-01'
|AND o_orderdate < date '1994-01-01' + interval '1' year
|GROUP BY
|sn_name
|ORDER BY
|revenue desc""".stripMargin

Sparklinedata connector errors while using with Spark1.6.0

We are doing a POC with Sparkline Data, running queries against TPCH data. Using the Sparklinedata connector with Spark 1.6.0 causes the following error.

scala> sql("""
| CREATE TEMPORARY TABLE orderLineItemPartSupplier
| USING org.sparklinedata.druid
| OPTIONS (sourceDataframe "orderLineItemPartSupplierBase",
| timeDimensionColumn "l_shipdate",
| druidDatasource "tpch",
| druidHost "10.100.1.57",
| druidPort "8082",
| columnMapping '{ "l_quantity" : "sum_l_quantity", "ps_availqty" : "sum_ps_availqty", "cn_name" : "c_nation", "cr_name" : "c_region", "sn_name" : "s_nation", "sr_name" : "s_region" } ',
| functionalDependencies '[ {"col1" : "c_name", "col2" : "c_address", "type" : "1-1"}, {"col1" : "c_phone", "col2" : "c_address", "type" : "1-1"}, {"col1" : "c_name", "col2" : "c_mktsegment", "type" : "n-1"}, {"col1" : "c_name", "col2" : "c_comment", "type" : "1-1"}, {"col1" : "c_name", "col2" : "c_nation", "type" : "n-1"}, {"col1" : "c_nation", "col2" : "c_region", "type" : "n-1"} ] ',
| starSchema ' { "factTable" : "orderLineItemPartSupplier", "relations" : [] } ')
| """.stripMargin
| )
res3: org.apache.spark.sql.DataFrame = []

scala> sql("""
| select l_returnflag as r, l_linestatus as ls,
| count(*), sum(l_extendedprice) as s, max(ps_supplycost) as m, avg(ps_availqty) as a
| from orderLineItemPartSupplier
| group by l_returnflag, l_linestatus
| order by s, ls, r
| limit 3""".stripMargin
| ).show()
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/AggregateExpression
at org.apache.spark.sql.sources.druid.AggregateTransform$$anonfun$4$$anonfun$apply$1.applyOrElse(AggregateTransform.scala:87)
at org.apache.spark.sql.sources.druid.AggregateTransform$$anonfun$4$$anonfun$apply$1.applyOrElse(AggregateTransform.scala:87)
at scala.PartialFunction$Lifted.apply(PartialFunction.scala:218)
at scala.PartialFunction$Lifted.apply(PartialFunction.scala:214)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:136)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:136)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:95)
at org.apache.spark.sql.catalyst.trees.TreeNode.collect(TreeNode.scala:136)
at org.apache.spark.sql.sources.druid.AggregateTransform$$anonfun$4.apply(AggregateTransform.scala:87)
at org.apache.spark.sql.sources.druid.AggregateTransform$$anonfun$4.apply(AggregateTransform.scala:87)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.sql.sources.druid.AggregateTransform$class.org$apache$spark$sql$sources$druid$AggregateTransform$$transformSingleGrouping(AggregateTransform.scala:87)
at org.apache.spark.sql.sources.druid.AggregateTransform$$anonfun$7$$anonfun$apply$10.apply(AggregateTransform.scala:202)
at org.apache.spark.sql.sources.druid.AggregateTransform$$anonfun$7$$anonfun$apply$10.apply(AggregateTransform.scala:201)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:90)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:89)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3$$anonfun$apply$1.apply(GenTraversableViewLike.scala:91)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:90)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:89)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.SeqLike$$anon$2.foreach(SeqLike.scala:635)
at scala.collection.GenTraversableViewLike$FlatMapped$class.foreach(GenTraversableViewLike.scala:89)
at scala.collection.SeqViewLike$$anon$4.foreach(SeqViewLike.scala:79)
at scala.collection.GenTraversableViewLike$FlatMapped$class.foreach(GenTraversableViewLike.scala:89)
at scala.collection.SeqViewLike$$anon$4.foreach(SeqViewLike.scala:79)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:90)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:89)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.SeqLike$$anon$2.foreach(SeqLike.scala:635)
at scala.collection.GenTraversableViewLike$FlatMapped$class.foreach(GenTraversableViewLike.scala:89)
at scala.collection.SeqViewLike$$anon$4.foreach(SeqViewLike.scala:79)
at scala.collection.GenTraversableViewLike$Mapped$class.foreach(GenTraversableViewLike.scala:80)
at scala.collection.SeqViewLike$$anon$3.foreach(SeqViewLike.scala:78)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:629)
at scala.collection.SeqViewLike$AbstractTransformed.to(SeqViewLike.scala:43)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.SeqViewLike$AbstractTransformed.toList(SeqViewLike.scala:43)
at org.apache.spark.sql.sources.druid.LimitTransfom$$anonfun$1.apply(DruidTransforms.scala:54)
at org.apache.spark.sql.sources.druid.LimitTransfom$$anonfun$1.apply(DruidTransforms.scala:40)
at org.apache.spark.sql.sources.druid.DruidPlanner$$anonfun$plan$1.apply(DruidPlanner.scala:41)
at org.apache.spark.sql.sources.druid.DruidPlanner$$anonfun$plan$1.apply(DruidPlanner.scala:41)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:90)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:89)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.SeqLike$$anon$2.foreach(SeqLike.scala:635)
at scala.collection.GenTraversableViewLike$FlatMapped$class.foreach(GenTraversableViewLike.scala:89)
at scala.collection.SeqViewLike$$anon$4.foreach(SeqViewLike.scala:79)
at scala.collection.GenTraversableViewLike$Mapped$class.foreach(GenTraversableViewLike.scala:80)
at scala.collection.SeqViewLike$$anon$3.foreach(SeqViewLike.scala:78)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:629)
at scala.collection.SeqViewLike$AbstractTransformed.to(SeqViewLike.scala:43)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.SeqViewLike$AbstractTransformed.toList(SeqViewLike.scala:43)
at org.apache.spark.sql.sources.druid.LimitTransfom$$anonfun$1.apply(DruidTransforms.scala:61)
at org.apache.spark.sql.sources.druid.LimitTransfom$$anonfun$1.apply(DruidTransforms.scala:40)
at org.apache.spark.sql.sources.druid.DruidPlanner$$anonfun$plan$1.apply(DruidPlanner.scala:41)
at org.apache.spark.sql.sources.druid.DruidPlanner$$anonfun$plan$1.apply(DruidPlanner.scala:41)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:90)
at scala.collection.GenTraversableViewLike$FlatMapped$$anonfun$foreach$3.apply(GenTraversableViewLike.scala:89)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.SeqLike$$anon$2.foreach(SeqLike.scala:635)
at scala.collection.GenTraversableViewLike$FlatMapped$class.foreach(GenTraversableViewLike.scala:89)
at scala.collection.SeqViewLike$$anon$4.foreach(SeqViewLike.scala:79)
at scala.collection.GenTraversableViewLike$FlatMapped$class.foreach(GenTraversableViewLike.scala:89)
at scala.collection.SeqViewLike$$anon$4.foreach(SeqViewLike.scala:79)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:629)
at scala.collection.SeqViewLike$AbstractTransformed.to(SeqViewLike.scala:43)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.SeqViewLike$AbstractTransformed.toList(SeqViewLike.scala:43)
at org.apache.spark.sql.sources.druid.DruidStrategy.apply(DruidStrategy.scala:84)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:171)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:394)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:355)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
at $iwC$$iwC$$iwC.<init>(<console>:61)
at $iwC$$iwC.<init>(<console>:63)
at $iwC.<init>(<console>:65)
at <init>(<console>:67)
at .<init>(<console>:71)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.expressions.AggregateExpression
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 171 more

handle tableau pattern for quarter aggregation

Show by QUARTER in Tableau

SELECT CAST(CONCAT(YEAR(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), (CASE WHEN MONTH(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP))<4 THEN '-01' WHEN MONTH(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP))<7 THEN '-04' WHEN MONTH(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP))<10 THEN '-07' ELSE '-10' END), '-01 00:00:00') AS TIMESTAMP) AS tqr_l_shipdate_ok FROM ( select * from lineitembase ) lineitem JOIN ( select * from orders ) orders ON (lineitem.l_orderkey = orders.o_orderkey) JOIN ( select * from customer ) customer ON (orders.o_custkey = customer.c_custkey) JOIN ( select * from custnation ) custnation ON (customer.c_nationkey = custnation.cn_nationkey) JOIN ( select * from custregion ) custregion ON (custnation.cn_regionkey = custregion.cr_regionkey) GROUP BY CAST(CONCAT(YEAR(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), (CASE WHEN MONTH(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP))<4 THEN '-01' WHEN MONTH(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP))<7 THEN '-04' WHEN MONTH(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP))<10 THEN '-07' ELSE '-10' END), '-01 00:00:00') AS TIMESTAMP)
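For reference, the quarter-start timestamp that this Tableau-generated CASE expression computes can be written directly; a minimal Scala sketch of the semantics (illustration only, not project code):

import java.time.LocalDate

// Quarter start that the CASE WHEN MONTH(...) < 4/7/10 ... pattern above computes.
def quarterStart(d: LocalDate): LocalDate = {
  val firstMonthOfQuarter = ((d.getMonthValue - 1) / 3) * 3 + 1
  LocalDate.of(d.getYear, firstMonthOfQuarter, 1)
}

// quarterStart(LocalDate.parse("1995-08-17")) == LocalDate.parse("1995-07-01")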

/ by zero when running a query on the sample retail dataset

I got an error when I run "select count(*) from sp_demo_retail;" in beeline.

The error message is:

Error: java.lang.ArithmeticException: / by zero (state=,code=0)
java.sql.SQLException: java.lang.ArithmeticException: / by zero
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
at org.apache.hive.beeline.Commands.execute(Commands.java:848)
at org.apache.hive.beeline.Commands.sql(Commands.java:713)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467)

Here is my ddl:

CREATE TABLE sp_demo_retail_base (
invoiceno string
,stockcode string
,description string
, quantity bigint
, invoicedate string
, unitprice double
, customerid string
, country string
, count int
)
USING com.databricks.spark.csv
OPTIONS (path "/opt/retails.csv",
header "false", delimiter ",")

CREATE TABLE sp_demo_retail
USING org.sparklinedata.druid
OPTIONS (
sourceDataframe "sp_demo_retail_base",
timeDimensionColumn "invoicedate",
druidDatasource "retail",
druidHost "10.25.2.91",
zkQualifyDiscoveryNames "false",
queryHistoricalServers "true",
numSegmentsPerHistoricalQuery "1",
columnMapping '{ } ',
functionalDependencies '[] ',
starSchema ' { "factTable" : "sp_demo_retail_base", "relations" : [] } ')

0: jdbc:hive2://localhost:10000/> explain select * from sp_demo_retail limit 10;
Getting log thread is interrupted, since query is done!
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| == Physical Plan == |
| Limit 10 |
| +- ConvertToSafe |
| +- Project [invoiceno#9,stockcode#10,description#11,quantity#12L,invoicedate#13,unitprice#14,customerid#15,country#16,count#17] |
| +- Scan DruidRelationInfo(fullName = DruidRelationName(sp_demo_retail_base,10.25.2.91,retail), sourceDFName = sp_demo_retail_base, |
| timeDimensionCol = invoicedate, |
| options = DruidRelationOptions(1000000,100000,true,true,true,30000,true,/druid,true,false,1,None))[invoiceno#9,stockcode#10,description#11,quantity#12L,invoicedate#13,unitprice#14,customerid#15,country#16,count#17] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
7 rows selected (0.144 seconds)
0: jdbc:hive2://localhost:10000/> explain select count(1) from sp_demo_retail;
Getting log thread is interrupted, since query is done!
+-------------------------------------------+--+
| plan |
+-------------------------------------------+--+
| == Physical Plan == |
| java.lang.ArithmeticException: / by zero |
+-------------------------------------------+--+
2 rows selected (0.069 seconds)
0: jdbc:hive2://localhost:10000/>

Support GBy queries with no Aggregates

The following query errors out. Query as seen on the Spark UI monitoring page (port 4040):
SELECT customer.c_mktsegment AS c_mktsegment FROM ( select * from lineitemindexed ) lineitem JOIN ( select * from orders ) orders ON (lineitem.l_orderkey = orders.o_orderkey) JOIN ( select * from customer ) customer ON (orders.o_custkey = customer.c_custkey) JOIN ( select * from custnation ) custnation ON (customer.c_nationkey = custnation.cn_nationkey) JOIN ( select * from custregion ) custregion ON (custnation.cn_regionkey = custregion.cr_regionkey) GROUP BY customer.c_mktsegment

Job aborted due to stage failure: Task 0 in stage 18.0 failed 1 times, most recent failure: Lost task 0.0 in stage 18.0 (TID 1061, localhost): UnknownReason
Driver stacktrace:
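A GROUP BY with no aggregates is just a distinct over the grouping keys, so in principle it can still be answered from the index; a minimal Scala sketch of the equivalence (illustration only, not project code):

// GROUP BY c_mktsegment with no aggregates returns one row per distinct key,
// i.e. the same result as SELECT DISTINCT c_mktsegment.
val segments    = Seq("BUILDING", "MACHINERY", "BUILDING", "AUTOMOBILE")
val viaGroupBy  = segments.groupBy(identity).keys.toSeq.sorted
val viaDistinct = segments.distinct.sorted
assert(viaGroupBy == viaDistinct)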

why do we need sourceDataframe?

hi:
As I understand it, Druid has already indexed the data, so why do we still need a sourceDataframe like orderLineItemPartSupplierBase that provides a data source path?
Creating a table schema that maps to the Druid internal schema should be enough. Could somebody explain this?

Additional aggregation cols produce exception

select o_orderstatus as x, cast(o_orderdate as date) as y, count(*) as z
from orderLineItemPartSupplierBase
where o_orderdate <= '1993-06-30'
group by o_orderstatus, cast(o_orderdate as date)
order by x, y, z

Detailed logging

Sparkline should write to a log showing the SQL and the related Druid queries, and also record whether a query used the index or not.
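Until such logging exists, one stop-gap is to raise the log level for the planner packages from the driver; a minimal sketch using the log4j 1.x API that Spark 1.x ships with. The package names are an assumption taken from the class names in the stack traces above, not a documented logging contract:

import org.apache.log4j.{Level, Logger}

// Package names below are assumptions based on the stack traces in this issue list.
Logger.getLogger("org.sparklinedata.druid").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.sql.sources.druid").setLevel(Level.DEBUG)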

Filter on decimal data type causes java.util.NoSuchElementException: None.get

Filtering on a decimal data type causes a java.util.NoSuchElementException: None.get error.

For example:
select name, type, count(1) from table where decimal = 10 group by name, type; -> java.util.NoSuchElementException: None.get error message, but
select name, type, count(1) from table where decimal in (10) group by name, type; -> OK

This is true for the following operators: =, >=, <=, >, <

With the Tungsten engine, all of the queries run without error.

Spark version: 1.6.1

Using longsum vs count for queries.

select date_int, count(*) from MyTable group by date_int order by date_int produces the following JSON
The aggregation type should be longSum (over the "count" metric) to get the exact number of raw rows in the table.

......
"aggregations" : [ {
"jsonClass" : "FunctionAggregationSpec",
"type" : "count",
"name" : "alias-1",
"fieldName" : "count"
} ],
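The distinction matters because of Druid roll-up: each Druid row can stand for several raw rows, and the ingested "count" metric records how many. A small Scala illustration (not project code):

// Each element models the "count" metric of one rolled-up Druid row.
val rolledUpCounts   = Seq(3L, 5L, 2L)
val countAggregator  = rolledUpCounts.size // Druid "count": counts Druid rows -> 3
val longSumOverCount = rolledUpCounts.sum  // "longSum" on "count": raw rows   -> 10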

Is there an option available to use the Druid index for any kind of query (including plain select * queries)?

I have created a Spark underlying table and a Druid datasource table. I am planning not to store the raw data for the underlying table, and I am looking for an option to use the Druid index data for all kinds of queries. Please let me know if there is such an option to fetch data from Druid for all queries.

Currently a 'select * from dds_table' query fetches 0 results (since no raw data is stored), while
'select os, count(*) from dds_table group by os' fetches the actual result (from Druid).

Please suggest.

Query on the Last segment of a datasource is not returning the expected results

: jdbc:hive2://spl08.dev.dw.sc.gwallet.com:1> explain select count(*) from moat_daily where cast(ts_local as timestamp) >= cast ("2016-08-19 00:00:00Z" as timestamp);
+-----------------------------------------------------------------------------------------------------------------------+--+
| plan |
+-----------------------------------------------------------------------------------------------------------------------+--+
| == Physical Plan == |
| Project [alias-3#740L AS _c0#739L] |
| +- SortBasedAggregate(key=[], functions=[(sum(alias-3#740L),mode=Complete,isDistinct=false)], output=[alias-3#740L]) |
| +- ConvertToSafe |
| +- TungstenExchange SinglePartition, None |
| +- ConvertToUnsafe |
| +- Scan DruidQuery(19982889): { |
| "q" : { |
| "jsonClass" : "TimeSeriesQuerySpec", |
| "queryType" : "timeseries", |
| "dataSource" : "moat_daily", |
| "intervals" : [ "2016-08-19T00:00:00.000Z/2016-08-19T00:00:01.000Z" ], |
| "granularity" : "all", |
| "aggregations" : [ { |
| "jsonClass" : "FunctionAggregationSpec", |
| "type" : "longSum", |
| "name" : "alias-3", |
| "fieldName" : "count" |
| } ] |
| }, |
| "useSmile" : true, |
| "queryHistoricalServer" : true, |
| "numSegmentsPerQuery" : 2, |
| "intervalSplits" : [ { |
| "start" : 1471564800000, |
| "end" : 1471564801000 |
| } ], |
| "outputAttrSpec" : [ { |
| "exprId" : { |
| "id" : 740, |
| "jvmId" : { } |
| }, |
| "name" : "alias-3", |
| "dataType" : { }, |
| "tf" : "toLong" |
| } ] |
| }[alias-3#740L] |
+-----------------------------------------------------------------------------------------------------------------------+--+
37 rows selected (0.049 seconds)

The interval end does not include the last segment (it should be 2016-08-20T00:00:00.000Z)
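A minimal sketch of the expected versus produced scan interval, using joda-time; variable names are illustrative, not project code:

import org.joda.time.{DateTime, DateTimeZone, Interval}

// The filter only bounds the start of the scan; the end should come from the
// datasource's index interval, not from start + 1 second.
val filterStart = new DateTime("2016-08-19T00:00:00.000Z", DateTimeZone.UTC)
val indexEnd    = new DateTime("2016-08-20T00:00:00.000Z", DateTimeZone.UTC)
val produced    = new Interval(filterStart, filterStart.plusSeconds(1)) // what the plan shows
val expected    = new Interval(filterStart, indexEnd)                   // covers the last segment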

Using spark-druid-olap in an external project

I couldn't find documentation on how to use this custom DataSource implementation and optimizations in an external project.

(a) Is the jar file published to an external repo such that it can be referred to via the build.sbt file in my project?

OR

(b) Do I need to clone the repo and manually build the jar myself?

Thanks,
Jithin
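For option (b), the usual sbt route is to build the jar locally and treat it as an unmanaged dependency; a hedged build.sbt sketch (the Scala/Spark versions below match the 1.6.x-era issues in this list and are assumptions, as is the exact jar name sbt produces):

// build.sbt sketch for option (b): clone https://github.com/hbutani/spark-druid-olap,
// run `sbt package`, and drop the produced jar into this project's lib/ directory --
// sbt picks up jars in lib/ automatically as unmanaged dependencies.
scalaVersion := "2.10.6"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1" % "provided"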

handle tableau pattern for spark datetime filters on non-time columns

----Date filters----
SELECT SUM(lineitem.l_extendedprice) AS sum_l_extendedprice_ok, CAST(CONCAT(TO_DATE(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), ' 00:00:00') AS TIMESTAMP) AS tdy_l_shipdate_ok FROM ( select * from lineitembase ) lineitem JOIN ( select * from orders ) orders ON (lineitem.l_orderkey = orders.o_orderkey) JOIN ( select * from customer ) customer ON (orders.o_custkey = customer.c_custkey) JOIN ( select * from custnation ) custnation ON (customer.c_nationkey = custnation.cn_nationkey) JOIN ( select * from custregion ) custregion ON (custnation.cn_regionkey = custregion.cr_regionkey) WHERE ((CAST(CONCAT(TO_DATE(orders.o_orderdate),' 00:00:00') AS TIMESTAMP) >= CAST('1993-05-19 00:00:00' AS TIMESTAMP)) AND (CAST(CONCAT(TO_DATE(orders.o_orderdate),' 00:00:00') AS TIMESTAMP) <= CAST('1998-08-02 00:00:00' AS TIMESTAMP))) GROUP BY CAST(CONCAT(TO_DATE(CAST(CONCAT(TO_DATE(lineitem.l_shipdate),' 00:00:00') AS TIMESTAMP)), ' 00:00:00') AS TIMESTAMP)
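The CAST(CONCAT(TO_DATE(col), ' 00:00:00') AS TIMESTAMP) pattern is just day-level truncation, so the two predicates above amount to a closed date range on o_orderdate; a minimal Scala sketch of the semantics (illustration only, not project code):

import java.time.{LocalDate, LocalDateTime}

// CAST(CONCAT(TO_DATE(col), ' 00:00:00') AS TIMESTAMP) == start of the day of col
def dayStart(d: LocalDate): LocalDateTime = d.atStartOfDay

// The WHERE clause above is equivalent to this range test on o_orderdate:
def inRange(orderDate: LocalDate): Boolean =
  !dayStart(orderDate).isBefore(LocalDateTime.parse("1993-05-19T00:00:00")) &&
    !dayStart(orderDate).isAfter(LocalDateTime.parse("1998-08-02T00:00:00"))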

Support regex functions like RLIKE, as in the query below.

SELECT viewability_5.campaign_name AS campaign_name, viewability_5.country AS country, viewability_5.creative_size AS creative_size FROM viewability2.viewability_5 viewability_5 WHERE ((viewability_5.advertiser_name = 'XXX') AND (CAST(viewability_5.date_string AS TIMESTAMP) >= CAST('2016-03-01 16:00:00' AS TIMESTAMP)) AND (CAST(viewability_5.date_string AS TIMESTAMP) <= CAST('2016-03-21 23:59:59' AS TIMESTAMP)) AND LOWER(viewability_5.line_name) RLIKE CONCAT('.', 'YYY', '.')) GROUP BY viewability_5.campaign_name, viewability_5.country, viewability_5.creative_size
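RLIKE is an unanchored Java-regex match, which maps naturally onto Druid's regex dimension filter; a minimal Scala sketch of the SQL semantics (illustration only, not project code):

// value RLIKE pattern <=> the Java regex `pattern` matches somewhere inside `value`
val rlike: (String, String) => Boolean =
  (value, pattern) => pattern.r.findFirstIn(value).isDefined

// e.g. a predicate like LOWER(line_name) RLIKE CONCAT('.', 'yyy', '.'):
rlike("my yyy line".toLowerCase, "." + "yyy" + ".")   // true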

Allow mapping a metric column to different metrics in Druid

For example, suppose the raw data captures wind_speed and the index is at hourly grain. At the hourly level the index captures min, max, sum, and count. The translation from SQL must use these metrics appropriately.

Druid's metrics are:
{
  "type" : "doubleMax",
  "name" : "?",
  "fieldName" : "wind_speed"
},
{
  "type" : "doubleMin",
  "name" : "?",
  "fieldName" : "wind_speed"
},
{
  "type" : "doubleSum",
  "name" : "?",
  "fieldName" : "wind_speed"
},
{
  "type" : "longSum",
  "name" : "?",
  "fieldName" : "wind_speed"
}
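A minimal sketch of the kind of mapping the SQL-to-Druid translation would need; the metric names are hypothetical, since the names in the issue are left as "?":

// Route a SQL aggregate on the logical column wind_speed to the hourly metric
// that pre-aggregates it (metric names are hypothetical).
val metricFor: PartialFunction[String, (String, String)] = {
  case "max"   => ("doubleMax", "wind_speed_max")
  case "min"   => ("doubleMin", "wind_speed_min")
  case "sum"   => ("doubleSum", "wind_speed_sum")
  case "count" => ("longSum",   "wind_speed_count")
}
// avg(wind_speed) has no direct metric; it must be rewritten as
// sum(wind_speed_sum) / sum(wind_speed_count).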

Avoid Druid Broker Bottleneck

Eliminate the broker as a bottleneck in cases where a large amount of data needs to be pulled out of Druid for subsequent processing in Spark. One possible solution is to talk directly to the Historical nodes (see the sketch after the example below).

For example:
SELECT c_name,
bal,
sales_prospects_amount
FROM (SELECT c_name,
Sum(c_acctbal) bal
FROM orderlineitempartsupplier
GROUP BY c_name
HAVING Sum(c_acctbal) > 1000)r1
JOIN (SELECT c_name AS cname,
Sum(sales_prospects_amount) AS sales_prospects_amount
FROM sales_leads
GROUP BY c_name) r2
ON r1.c_name = r2.cname
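A conceptual Scala sketch of the broker-bypass idea referenced above: fan the Druid query out over the Historical servers and combine the partial results in Spark, rather than funneling everything through the single Broker. queryHistorical and the host/segment assignments are stand-ins, not project code:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for a per-segment query against a Historical server.
def queryHistorical(host: String, segment: String): Seq[(String, Double)] = Seq.empty

// One task per (Historical, segment) assignment; results are unioned afterwards.
val assignments = Seq(("hist-1", "seg-a"), ("hist-1", "seg-b"), ("hist-2", "seg-c"))
val partials    = assignments.map { case (h, s) => Future(queryHistorical(h, s)) }
val combined    = Await.result(Future.sequence(partials), 5.minutes).flatten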

handle generic grouping expressions on a single dimension

SELECT avg(lineitem.l_extendedprice) AS avg_l_extendedprice_ok,
customers.c_mktsegment AS c_mktsegment,
custnation.cn_name AS cn_name,
custregion.cr_name AS cr_name,
(((year(cast(lineitem.l_commitdate AS timestamp)) * 10000) + (month(cast(lineitem.l_commitdate AS timestamp)) * 100)) + day(cast(lineitem.l_commitdate AS timestamp))) AS md_l_commitdate_ok,
cast((month(cast(lineitem.l_commitdate AS timestamp)) - 1) / 3 + 1 AS BIGINT) AS qr_l_commitdate_ok
FROM (
SELECT *
FROM lineitemindexed ) lineitem
JOIN
(
SELECT *
FROM orders ) orders
ON (
lineitem.l_orderkey = orders.o_orderkey)
JOIN
(
SELECT *
FROM customer ) customers
ON (
orders.o_custkey = customers.c_custkey)
JOIN
(
SELECT *
FROM custnation ) custnation
ON (
customers.c_nationkey = custnation.cn_nationkey)
JOIN
(
SELECT *
FROM custregion ) custregion
ON (
custnation.cn_regionkey = custregion.cr_regionkey)
GROUP BY customers.c_mktsegment,
custnation.cn_name,
custregion.cr_name,
(((year(cast(lineitem.l_commitdate AS timestamp)) * 10000) + (month(cast(lineitem.l_commitdate AS timestamp)) * 100)) + day(cast(lineitem.l_commitdate AS timestamp))),
cast((month(cast(lineitem.l_commitdate AS timestamp)) - 1) / 3 + 1 AS BIGINT)
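Both grouping expressions are functions of the single dimension l_commitdate; a minimal Scala sketch of what they compute (illustration only, not project code):

import java.time.LocalDate

// (year*10000 + month*100 + day): a yyyymmdd key for the commit date.
def mdKey(d: LocalDate): Int = d.getYear * 10000 + d.getMonthValue * 100 + d.getDayOfMonth

// (month - 1) / 3 + 1: the quarter of the year (the SQL casts it to BIGINT).
def quarter(d: LocalDate): Int = (d.getMonthValue - 1) / 3 + 1

// mdKey(LocalDate.parse("1995-08-17")) == 19950817; quarter of that date == 3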
