sameeragarwal / blinkdb
BlinkDB: Sub-Second Approximate Queries on Very Large Data.
Home Page: http://blinkdb.cs.berkeley.edu/
License: Apache License 2.0
Hi,
Any help would be appreciated!
I followed the instructions at https://github.com/sameeragarwal/blinkdb/wiki/Running-BlinkDB-on-a-Cluster to build BlinkDB. However, when I first build the hive_blinkdb submodule with ant, the build fails at the line
"$ ant package"
The version is 0.2.0.
$ cd blinkdb
$ git submodule init
$ git submodule update
$ cd hive_blinkdb
$ ant package
The build exits with the following errors (I changed none of the BlinkDB files or settings):
..................
ivy-retrieve:
[echo] Project: shims
compile:
[echo] Project: shims
[echo] Building shims 0.20
build_shims:
[echo] Project: shims
[echo] Compiling /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/common/java;/home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20/java against hadoop 0.20.2 (/home/ljz/blinkDB/blinkdb/hive_blinkdb/build/hadoopcore/hadoop-0.20.2)
ivy-init-settings:
[echo] Project: shims
ivy-resolve-hadoop-shim:
[echo] Project: shims
[ivy:resolve] :: loading settings :: file = /home/ljz/blinkDB/blinkdb/hive_blinkdb/ivy/ivysettings.xml
ivy-retrieve-hadoop-shim:
[echo] Project: shims
[echo] Building shims 0.20S
build_shims:
[echo] Project: shims
[echo] Compiling /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/common/java;/home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/common-secure/java;/home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java against hadoop 1.0.0 (/home/ljz/blinkDB/blinkdb/hive_blinkdb/build/hadoopcore/hadoop-1.0.0)
ivy-init-settings:
[echo] Project: shims
ivy-resolve-hadoop-shim:
[echo] Project: shims
[ivy:resolve] :: loading settings :: file = /home/ljz/blinkDB/blinkdb/hive_blinkdb/ivy/ivysettings.xml
ivy-retrieve-hadoop-shim:
[echo] Project: shims
[javac] Compiling 13 source files to /home/ljz/blinkDB/blinkdb/hive_blinkdb/build/shims/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:21: error: package org.mortbay.jetty.bio does not exist
[javac] import org.mortbay.jetty.bio.SocketConnector;
[javac] ^
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:22: error: package org.mortbay.jetty.handler does not exist
[javac] import org.mortbay.jetty.handler.RequestLogHandler;
[javac] ^
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:23: error: package org.mortbay.jetty.webapp does not exist
[javac] import org.mortbay.jetty.webapp.WebAppContext;
[javac] ^
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:34: error: package org.mortbay.jetty does not exist
[javac] private static class Server extends org.mortbay.jetty.Server implements JettyShims.Server {
[javac] ^
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:27: error: Jetty20SShims is not abstract and does not override abstract method startServer(String,int) in JettyShims
[javac] public class Jetty20SShims implements JettyShims {
[javac] ^
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:28: error: startServer(String,int) in Jetty20SShims cannot implement startServer(String,int) in JettyShims
[javac] public Server startServer(String listen, int port) throws IOException {
[javac] ^
[javac] return type org.apache.hadoop.hive.shims.Jetty20SShims.Server is not compatible with org.apache.hadoop.hive.shims.JettyShims.Server
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:34: error: org.apache.hadoop.hive.shims.Jetty20SShims.Server is not abstract and does not override abstract method stop() in org.apache.hadoop.hive.shims.JettyShims.Server
[javac] private static class Server extends org.mortbay.jetty.Server implements JettyShims.Server {
[javac] ^
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:36: error: cannot find symbol
[javac] WebAppContext wac = new WebAppContext();
[javac] ^
[javac] symbol: class WebAppContext
[javac] location: class Server
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:36: error: cannot find symbol
[javac] WebAppContext wac = new WebAppContext();
[javac] ^
[javac] symbol: class WebAppContext
[javac] location: class Server
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:39: error: cannot find symbol
[javac] RequestLogHandler rlh = new RequestLogHandler();
[javac] ^
[javac] symbol: class RequestLogHandler
[javac] location: class Server
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:39: error: cannot find symbol
[javac] RequestLogHandler rlh = new RequestLogHandler();
[javac] ^
[javac] symbol: class RequestLogHandler
[javac] location: class Server
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:47: error: cannot find symbol
[javac] SocketConnector connector = new SocketConnector();
[javac] ^
[javac] symbol: class SocketConnector
[javac] location: class Server
[javac] /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Jetty20SShims.java:47: error: cannot find symbol
[javac] SocketConnector connector = new SocketConnector();
[javac] ^
[javac] symbol: class SocketConnector
[javac] location: class Server
[javac] Note: /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: /home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 13 errors
[javac] 1 warning
BUILD FAILED
/home/ljz/blinkDB/blinkdb/hive_blinkdb/build.xml:319: The following error occurred while executing this line:
/home/ljz/blinkDB/blinkdb/hive_blinkdb/build.xml:169: The following error occurred while executing this line:
/home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/build.xml:92: The following error occurred while executing this line:
/home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/build.xml:95: The following error occurred while executing this line:
/home/ljz/blinkDB/blinkdb/hive_blinkdb/shims/build.xml:84: Compile failed; see the compiler error output for details.
Hi,
We are evaluating BlinkDB to support interactive count(*) queries on a table with various filters.
As per https://github.com/sameeragarwal/blinkdb/wiki/Running-BlinkDB-on-a-Cluster, we need to "Copy the Spark and BlinkDB directories to slaves".
However, from a few BlinkDB talks I have seen, it looks like only the Spark client needs to be modified, so that it rewrites the query plan to use sample tables instead of the original table. That would mean we need to build and install BlinkDB on only one machine.
Please correct me if I am wrong about this.
Since our organization is huge, it would be difficult to ask our infrastructure team to apply a patch to Spark to support BlinkDB. We are currently using Spark 2.1.0. It would be easier to ask the infrastructure team to upgrade the Spark version.
So I am wondering when native support for BlinkDB will be available in Spark.
Also, we would be grateful if someone could point us to documentation for building a proof of concept around BlinkDB.
When I run simple SQL queries like
Select approx_count(*) from table1 where A = 'xx' within 0.1 seconds
I run into an error: Parse Error: mismatched input 'within' expecting EOF near ''table1''
Does BlinkDB support clauses like "within xx seconds", which the paper uses in its running examples?
BlinkDB fails when a SERDE is used, for example
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
The JsonSerDe class cannot be found, because BlinkDB converts the command to lowercase:
ROW FORMAT SERDE 'org.openx.data.jsonserde.jsonserde'
I suggest modifying SharkDriver.scala, line 175.
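A minimal sketch of one possible approach, assuming the problem is that the whole command is lowercased before parsing: lowercase only the text outside quoted literals, so the class name survives. It is written in Python for illustration, not in the Scala of SharkDriver.scala, and the function name `lowercase_outside_quotes` is hypothetical (it is not part of BlinkDB or Shark). Escaped quotes inside literals are not handled.

```python
import re

# Matches a single-quoted or double-quoted literal (no escape handling).
_QUOTED = re.compile(r"'[^']*'|\"[^\"]*\"")


def lowercase_outside_quotes(command: str) -> str:
    """Lowercase a command while leaving quoted literals untouched."""
    parts = []
    last = 0
    for match in _QUOTED.finditer(command):
        # Text before the literal is safe to lowercase.
        parts.append(command[last:match.start()].lower())
        # The literal itself (e.g. a class name) is kept verbatim.
        parts.append(match.group(0))
        last = match.end()
    # Lowercase whatever follows the last literal.
    parts.append(command[last:].lower())
    return "".join(parts)


cmd = "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'"
print(lowercase_outside_quotes(cmd))
# row format serde 'org.openx.data.jsonserde.JsonSerDe'
```

With this kind of handling, keywords can still be matched case-insensitively while the SerDe class name keeps its original case.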
15/12/14 18:06:03 ERROR hive.log: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:136)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:151)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getDefaultDatabasePath(HiveMetaStore.java:475)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:353)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:371)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:278)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:248)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:114)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1009)
at shark.memstore2.TableRecovery$.reloadRdds(TableRecovery.scala:49)
at shark.SharkCliDriver.<init>(SharkCliDriver.scala:273)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.ipc.RemoteException Server IPC version 9 cannot communicate with client version 4)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1011)
at shark.memstore2.TableRecovery$.reloadRdds(TableRecovery.scala:49)
at shark.SharkCliDriver.<init>(SharkCliDriver.scala:273)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: MetaException(message:Got exception: org.apache.hadoop.ipc.RemoteException Server IPC version 9 cannot communicate with client version 4)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:785)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:106)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:136)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:151)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getDefaultDatabasePath(HiveMetaStore.java:475)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:353)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:371)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:278)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:248)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:114)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1009)
... 4 more
Please add a cluster setup wiki page (standalone/ec2/etc.) and license file.
Thanks
I was trying to run the approx functions locally. However, they behaved differently from what is shown in the tutorial. approx_sum needs 3 arguments, and I am not sure what to pass in order to get an approximation. Can someone help me use them? Also, I cannot use error, samplewith...
shark> describe rand5000;
OK
numbers int
Time taken: 0.143 seconds
shark> select approx_sum(numbers) from rand5000;
62.169: [GC (Metadata GC Threshold) 179982K->30429K(1005056K), 0.0271976 secs]
62.196: [Full GC (Metadata GC Threshold) 30429K->19981K(1005056K), 0.0866642 secs]
63.093: [GC (System.gc()) 45909K->21763K(1005056K), 0.0042580 secs]
63.097: [Full GC (System.gc()) 21763K->10910K(1005056K), 0.1983384 secs]
63.299: [GC (System.gc()) 15782K->11078K(1005056K), 0.0017398 secs]
63.301: [Full GC (System.gc()) 11078K->9947K(1005056K), 0.0761869 secs]
FAILED: Error in semantic analysis: Exactly one argument is expected.
shark> select approx_sum(numbers, numbers, numbers) from rand5000;
130.911: [GC (Allocation Failure) 264388K->30538K(1005056K), 0.0253560 secs]
132.398: [GC (Allocation Failure) 292682K->51941K(1005056K), 0.0284526 secs]
132.821: [GC (Allocation Failure) 309431K->30135K(1005056K), 0.0112377 secs]
OK
2471570.0 +/- NaN (99% Confidence)
Time taken: 4.154 seconds
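For reference, a minimal sketch of how a sum with an error bar can be estimated from a uniform random sample in general. This is textbook statistics under a normal approximation, not BlinkDB's implementation, and it does not explain the NaN reported above. The function name `approx_sum_from_sample` and its parameters are hypothetical.

```python
import math


def approx_sum_from_sample(sample, population_size, z=2.576):
    """Estimate a population sum from a uniform random sample.

    Returns (estimate, half_width). z=2.576 corresponds to roughly
    99% confidence under a normal approximation.
    """
    n = len(sample)
    if n == 0:
        return float("nan"), float("nan")
    mean = sum(sample) / n
    estimate = population_size * mean
    if n < 2:
        # The sample variance is undefined for a single row.
        return estimate, float("nan")
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    # Standard error of the mean, scaled up to the population sum.
    half_width = z * population_size * math.sqrt(variance / n)
    return estimate, half_width


estimate, half_width = approx_sum_from_sample([1, 2, 3, 4], population_size=8)
print(f"{estimate} +/- {half_width:.2f} (99% Confidence)")
```

The error bar shrinks as the sample grows, which is the trade-off between response time and accuracy that approximate queries rely on.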
Are there plans to make BlinkDB work with Spark 1.0.0/1.0.1?
Hi!
Do you have any experience deploying on a CDH5 cluster with the Spark that ships with it?
Is standalone Spark strictly necessary?
Thanks
I thought BlinkDB was a very promising project. I do not see any activity on this fork or any other fork more recent than June 2014.
What's the current status? Is this project dead?
Hey,
I am trying to build BlinkDB using the command
sbt/sbt package
but it keeps throwing the following error:
[info] Resolving org.scala-tools.testing#test-interface;0.5 ...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.spark#spark-core_2.9.3;0.8.0-SNAPSHOT: not found
[warn] :: org.apache.spark#spark-repl_2.9.3;0.8.0-SNAPSHOT: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.9.3;0.8.0-SNAPSHOT: not found
unresolved dependency: org.apache.spark#spark-repl_2.9.3;0.8.0-SNAPSHOT: not found
I tried rebuilding my Spark 0.8.0 using the following command,
SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 sbt/sbt publish-local
but all in vain.
Any help?
@sameeragarwal
I use the BlinkDB CLI and run simple SQL queries like:
select count(1) from src within 1 seconds;
I run into an error:
FAILED: Parse Error: line 1:25 Failed to recognize predicate 'within'. Failed rule: 'kwInner' in join type specifier