nexr / RHive
RHive is an R extension facilitating distributed computing via Apache Hive.
Home Page: http://nexr.github.io/RHive
I recently installed Hive (0.11.0) and Hadoop (1.2.1) on Ubuntu 13.10 x64. R 3.0.1 ("Good Sport") is installed correctly and works well with other packages. I also installed the RHive (2.0.0) package, but when I try to connect to Hive from R it shows the error message below. Please help me with this.
Reported by Haven.
Design a distributed aggregate function similar to R's aggregate function.
Implement an RHive API to connect to HDFS and to read/write data there.
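A minimal sketch of how such an API might be used, following the rhive.hdfs.* naming that appears elsewhere in this tracker (the exact signatures here are assumptions, not a shipped API):

# Hypothetical HDFS round trip from R; signatures are assumptions.
library(RHive)
rhive.init()
rhive.connect()
rhive.hdfs.ls("/")                                      # list the HDFS root
rhive.hdfs.put("/tmp/local.csv", "/user/me/data.csv")   # local -> HDFS
rhive.hdfs.get("/user/me/data.csv", "/tmp/copy.csv")    # HDFS -> local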
Hi, this is what I get after installing version 2.0.0
rhive.init()
rhive.env()
hadoop home: /usr/local/hadoop
hive home: /usr/local/hive
rhive.connect(host='master', port='10000')
Error: class not found
Please note that version 0.0.7 (the previous version I used) worked just fine. The environment I got from 0.0.7 was:
Hive Home Directory : /usr/local/hive
Hadoop Home Directory : /usr/local/hadoop
Hadoop Conf Directory :
Default RServe List
master slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10
master : RHIVE_DATA = /home/hduser/RData/
slave1 : RHIVE_DATA = /home/hduser/RData
slave2 : RHIVE_DATA = /home/hduser/RData
slave3 : RHIVE_DATA = /home/hduser/RData
slave4 : RHIVE_DATA = /home/hduser/RData
slave5 : RHIVE_DATA = /home/hduser/RData
slave6 : RHIVE_DATA = /home/hduser/RData
slave7 : RHIVE_DATA = /home/hduser/RData
slave8 : RHIVE_DATA = /home/hduser/RData
slave9 : RHIVE_DATA = /home/hduser/RData
slave10 : RHIVE_DATA = /home/hduser/RData
Connected HiveServer : master:10000
I am getting the following error with Hive 0.9.0:
Error in rdata[[i]] : subscript out of bounds
traceback() reveals:
3: FUN(X[[25L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})
What does this error mean? Can this be fixed? Thanks in advance!
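For what it's worth, the traceback suggests one plausible cause (an assumption, not a confirmed diagnosis): rdata holds one slot per result column, and rdata[[i]] is indexed by the fields of each tab-split record, so any row that splits into more fields than there are columns (for example, a value containing an embedded tab) pushes i past the end of rdata:

# Minimal reproduction of that failure mode with hypothetical data.
rdata  <- list(col1 = numeric(0), col2 = character(0))  # 2 result columns
record <- strsplit("1\tfoo\tbar", "\t")                  # row splits into 3 fields
length(record[[1]])  # 3, but rdata has only 2 slots
rdata[[3]]           # Error in rdata[[3]] : subscript out of bounds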
RHive should handle huge results from Hive.
Hello All,
I installed RHive 0.7 and CDH 4.4, and rhive.connect() worked.
But when I run rhive.exportAll('scoring'), it prints the following error:
rhive.exportAll('scoring')
Error in RSeval(rcon, command) : remote evaluation failed
In addition: Warning messages:
1: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2
2: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2
I want to know what the problem is. I'm looking forward to your reply.
Hi,
I ran into an error when building a package that depends on RHive 0.0-3, as shown below:
installing source package ‘clog’ ...
** R
** data
** inst
** preparing package for lazy loading
Warning in file(file, "rt") :
cannot open file '/srv/clog/hadoop-0.20.203.0/conf/slaves': No such file or directory
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
Error : package ‘RHive’ could not be loaded
ERROR: lazy loading failed for package ‘clog’
The above error occurs when I run 'R CMD check packagefile', so I can't finish building the package.
I think it's caused by RHive returning an error code when it is loaded via library().
I have used rhive.query with no problem. But when I use rhive.write.table(myTableName), the following error occurs:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.FileNotFoundException: File myTableName.rhive does not exist.
[1] "myTableName"
It created an empty table with the correct column names. I double-checked that the class of myTableName is a data frame.
Has anyone come across this problem? Thanks in advance.
In mrapply, mapapply, and reduceapply, the user should be able to supply a custom environment,
and use that custom environment in the mapper and reducer functions.
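Until that lands, one workaround sketch is to export the objects the mapper needs before the map/reduce call, using the rhive.assign/rhive.exportAll pattern shown elsewhere in this tracker; whether the exported values are actually visible inside the mapper on every node is the assumption being tested here:

threshold <- 4.5                                  # value the mapper needs
mapper <- function(key, value) { value[value > threshold] }
rhive.assign("threshold", threshold)              # stage the object
rhive.assign("mapper", mapper)
rhive.exportAll("mapper")                         # ship everything assigned so far
result <- rhive.mrapply("iris", mapper, function(key, values) values)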
In RUDF and RUDAF, null data is passed to the R function, but R cannot handle this data directly, because a database null is not the same thing as R's NULL.
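A defensive wrapper on the R side illustrates the point; how a Hive null actually arrives in the R function (NULL, NA, or a zero-length value) is exactly the ambiguity this issue describes, so all three are normalized:

# Sketch: normalize whatever "null" arrives as before computing.
safe_scoring <- function(sal) {
  if (is.null(sal) || length(sal) == 0 || is.na(sal)) {
    return(NA_real_)  # map any null-ish input to an explicit R NA
  }
  sal * 1.1
}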
[root@hadoop RHive]# ant build
Buildfile: build.xml
compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Is there anyone who can help me?
A Windows setup guide is necessary, because many people use Windows as their operating system. We need to test on Windows and write a guide for those users.
Design a distributed apply function similar to R's apply function.
Error in 1:listStatus$length : argument of length 0
even though I can run select queries and rhive.desc.table('table') commands.
Develop RHive apply functions using RUDF.
We designed two apply functions that differ by return type.
Their syntax is:
[n|s]apply(hive-tablename, FUN, col1, ...)
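Hypothetical usage under that syntax, matching the rhive.napply call that appears later in this tracker; the sapply variant is assumed to be the string-returning counterpart of the proposal:

rhive.napply("iris", function(col) { col * 10 }, "sepallength")  # numeric result
rhive.sapply("iris", function(col) { toupper(col) }, "species")  # string result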
Hi,
I had a serious problem when trying to use RHive on a just-recovered cluster.
The problem was that rhive.connect never finished, no matter how long I waited.
I figured out that the cause was that the MySQL server backing Hive was down.
I am not sure whether this can be solved inside RHive, but I think a timeout parameter with a sensible default may be necessary in rhive.connect().
Thanks.
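Until such a parameter exists, a client-side workaround sketch is to bound the call with a wall-clock limit; using the R.utils package for this is an assumption, not part of RHive:

library(R.utils)
conn <- tryCatch(
  withTimeout(rhive.connect(host = "master"), timeout = 30),  # seconds
  TimeoutException = function(e) {
    stop("rhive.connect() did not return within 30s; is the Hive metastore up?")
  }
)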
rhive.write.table(iris)
[1] "iris"
rhive.desc.table("iris")
col_name data_type comment
1 rowname string
2 sepallength double
3 sepalwidth double
4 petallength double
5 petalwidth double
6 species string
rhive.napply('iris', function(column1) { column1 * 10}, 'sepallength')
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:40 cannot recognize input near 'CREATE' 'TABLE' 'iris_napply1328157031_table' in select clause
, errorCode:11, SQLState:42000)
Please check.
Recommend using the rhive.big.query function.
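Hypothetical usage of that recommendation; rhive.big.query's tuning parameters are not shown anywhere in this tracker, so only the basic call shape is sketched:

# Sketch: the same query shape routed through the function recommended
# for large results; 'iris' is just the example table from above.
big <- rhive.big.query("select * from iris")
head(big)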
Hello,
When I do a simple
rhive.query("select * from X limit 10000")
it takes 90 s to get the answer back once the query has completed on the hiveserver (OK displayed on the console).
The time increases linearly with data size, always exactly 9 ms per line, and does not depend on the line length.
That is several orders of magnitude slower than any other kind of data transfer between R and anything else. My guess is that there is some kind of timeout somewhere.
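Until the per-row overhead is found, one workaround sketch is to have Hive materialize the result on HDFS and pull the file down in a single bulk transfer; the paths, the output file name, and the rhive.hdfs.get signature here are assumptions:

rhive.query("INSERT OVERWRITE DIRECTORY '/tmp/x_dump' SELECT * FROM X")
rhive.hdfs.get("/tmp/x_dump", "/tmp/x_dump_local")   # one bulk copy
res <- read.delim("/tmp/x_dump_local/000000_0",      # Hive's default output
                  header = FALSE, sep = "\001")      # \001 field separator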
rhive.connect prints a long message when I use it,
and I don't know what the message means.
Could you provide a way to save these messages to a log file, to hide this hard-to-read output, and explain what it is?
rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hive-0.7.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop-0.20.203.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
RHive uses the Hive Thrift service, so error logs drop into the Thrift server's stdout.
This is inconvenient for users.
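As an interim sketch, the R-level portion of the noise can be diverted to a file with sink(); note this will not catch output the JVM writes directly to stderr (such as the SLF4J lines above), so it is only a partial fix and an assumption worth testing:

log <- file("rhive_connect.log", open = "wt")
sink(log, type = "message")   # divert R-level messages and warnings
conn <- rhive.connect()
sink(type = "message")        # restore the console
close(log)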
Hi There,
Will RHive work with HiveServer2 when Kerberos security is enabled?
When I try to connect to Hive, I get the following exception in my RStudio console:
rhive.connect(host="hostname.domain.com/default;principal=hive/[email protected]",defaultFS="hdfs://namenode.domain.com:8020/user/me",hiveServer2=TRUE)
14/05/05 15:10:52 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "Thread-35" java.lang.IllegalArgumentException: Kerberos principal should have 3 parts: hive/[email protected]:10000/default
at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:64)
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:198)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:138)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:123)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at com.nexr.rhive.hive.DatabaseConnection.connect(DatabaseConnection.java:51)
at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.connect(HiveJdbcClient.java:330)
at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.run(HiveJdbcClient.java:322)
Error: java.lang.IllegalStateException: Not connected to hiveserver
Thanks,
Prabhu.
An R-UDF function should accept variables of multiple types as arguments.
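For illustration, this extends the R('scoring', sepallength, 0.0) call shape used elsewhere in this tracker to mixed argument types; 'label_row' is a hypothetical function name, and whether mixed types are accepted is exactly the ask here:

# Hypothetical mixed-type RUDF call: one string column, one numeric
# column, and a default value.
rhive.query("select R('label_row', species, sepallength, '') from iris")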
1. Query: select logdata from ulog limit 1
2. Hive console result:
{"{"body":"SEQ_ID":"20120709160001430307","HOST_NAME":"u2dlpweb01","LOG_TIME":"20120709160001","REQ_TIME":"20120709160001","LOG_KIND":"SVC","KT_USER_ID":"","KT_SVC_ID":"X","SESSION_KEY":"","FILE_ID":"X","RT_CODE":"1","DIVIDE1":"211.55.29.102","DIVIDE2":"http://gate2.ucloud.com/api/1/pcclient/pcauth","DIVIDE3":"POST","DIVIDE4":"200","DIVIDE5":"0000","DIVIDE6":"WIN","DIVIDE7":"7","DIVIDE8":"uCloud","DIVIDE9":"1.0.2","DIVIDE10":"personal","DIVIDE11":"GATEWAY","DIVIDE12":"4000","DIVIDE13":"X","DIVIDE14":"X","DIVIDE15":"X","DIVIDE16":"X","DIVIDE17":"X","DIVIDE18":"uCloud/1.0.2 WIN/7 PC personal","DIVIDE19":"X","DIVIDE20":"X","DIVIDE21":"X","DIVIDE22":"X","DIVIDE23":"X","DIVIDE24":"X","DIVIDE26":"X","DIVIDE26":"X","DIVIDE27":"X","DIVIDE28":"X","DIVIDE29":"X","DIVIDE30":"X","timestamp":1341820971766,"pri":"INFO","nanos":784667370878488,"host":"u2dlpweb01","fields":{"AckTag":"20120709-170250760+0900.784666364804488.00000018","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000:\u001F짰I","tailSrcFile":"ucloud-003.log","rolltag":"20120709-170444530+0900.524399892735665.00000020"}}"}
Time taken: 14.89 seconds
3. RStudio rhive.query:
rhive.query('select logdata from ulog limit 1')
logdata
1 NA
Warning message:
NAs introduced by coercion
4. ulog table script:
CREATE EXTERNAL TABLE IF NOT EXISTS ulog (
logdata MAP<STRING,STRING>
)
PARTITIONED BY(logdt STRING)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '='
LOCATION '/ucloud/collected/ucloudpersonal'
rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}
rhive.assign('coefficient', coefficient)
[1] TRUE
rhive.assign('scoring', scoring)
[1] TRUE
rhive.exportAll('scoring')
[1] TRUE
rhive.query("select * from iris limit 10")
rhive_row sepallength sepalwidth petallength petalwidth species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
test2 <- rhive.query("select rhive_row, sepallength, sepalwidth from iris limit 10")
test2
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
test2 <- rhive.write.table(test2)
rhive.desc.table(test2)
col_name data_type comment
1 rhive_row string
2 sepallength double
3 sepalwidth double
rhive.query("select * from test2")
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
rhive.query("select R('scoring', sepallength, 0.0) from test2")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:9, SQLState:08S01)
Has anyone faced the same problem? Can anyone give me some advice on how to solve it? Thanks very much!
My RHive package version is RHive_0.0-6.
Need rhive.hdfs.chown and rhive.hdfs.chmod functions (not urgent).
rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")
Error in rdata[[i]] : subscript out of bounds
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
traceback()
3: FUN(X[[1L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})
1: rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")
Please check.
rhive.mrapply
function (tablename, mapperFUN, reducerFUN, mapinput = NULL,
mapoutput = NULL, by = NULL, reduceinput = NULL, reduceoutput = NULL,
mapper_args = NULL, reducer_args = NULL, buffersize = -1L,
verbose = FALSE, hiveclient = rhive.defaults("hiveclient"))
.....
...
..
Please modify the function parameters, especially everything after reducerFUN, so that they are truly optional,
e.g. rhive.mrapply("weights", map, reduce) to apply map to all columns.
I am trying to use RHive with CDH4, and rhive.connect() gives me the following error:
WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
Error in .jfindClass(as.character(class)) : class not found
Any ideas on this?
I downloaded the RHive 2.0 package, cd'd into the working directory, and ran 'ant build'; it failed:
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml
compile:
[mkdir] Created dir: /opt/RHive/build/classes
[javac] Compiling 21 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:44: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!stat.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:115: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!src.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:147: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] length[i] = items[i].isDir() ? fs.getContentSummary(items[i].getPath()).getLength() : items[i].getLen();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/JobManager.java:33: warning: [deprecation] getUsedMemory() in org.apache.hadoop.mapred.ClusterStatus has been deprecated
[javac] clusterStatus.getUsedMemory();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
[javac] 4 warnings
BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Total time: 3 seconds
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml
compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Is there anyone who can help me?
Hello there. I recently started using your RHive package and have been mostly happy so far. One concern I have is with the code of rhive.hdfs.connect: it tries to make changes in the root HDFS directory. In most real systems, including ours, that is disallowed. Would it make sense to make changes under /tmp instead of /? I had to change your code to make it work with our system. Thanks!
= Yakov
I'm trying to work with RHive on Amazon EMR and I'm getting an error with rhive.connect, but the connection seems to be working:
{code}
library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-6. For overview type ‘?RHive’.
HIVE_HOME=/home/hadoop/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
rhive.connect(port=10003)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2012-10-16 21:15:43,446 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:121)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:225)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:190)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1330)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1348)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:246)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,497 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,517 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:702)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:242)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,546 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:355)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:211)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,563 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3079)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:598)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:548)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:529)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:229)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,593 INFO [LeaseChecker] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.renewLease(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:1235)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1247)
at java.lang.Thread.run(Thread.java:662)
2012-10-16 21:15:43,609 INFO [Thread-7] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3338)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3202)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2415)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2656)
2012-10-16 21:15:43,656 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,659 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,664 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,668 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,681 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3725)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3640)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:96)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:50)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
{code}
Hive lets users run any script inside a Hive query; it uses Hadoop streaming to provide this.
RHive could provide a function to run an R script in HQL through this Hive feature.
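A sketch of what that could look like, issuing Hive's TRANSFORM ... USING streaming clause through rhive.query; shipping score.R with ADD FILE and having Rscript on every task node's PATH are assumptions:

rhive.query("ADD FILE /tmp/score.R")        # distribute the script
rows <- rhive.query("
  SELECT TRANSFORM (sepallength, sepalwidth)
  USING 'Rscript score.R'
  AS (scored STRING)
  FROM iris")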
I tried to connect to a HiveServer2 instance, and it got stuck forever at the rhive.connect() call.
I was able to connect to a HiveServer1 instance.
However, the latest Cloudera Manager (4.5) has support for managing only HiveServer2.
Are there any plans for HiveServer2 support?
I have a table created using "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';".
While rhive.query("SELECT * FROM serdetable") works,
selecting a specific column with rhive.query("SELECT col1 FROM serdetable")
returns:
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
I tried to run the same query directly from the hive shell and it works, which means the jar containing the org.openx.data.jsonserde.JsonSerDe class was loaded by Hive.
I should mention that trying another table created with the default Regex SerDe returns the same error.
Any help would be appreciated!
It's hard to debug and find where errors occur.
We need an appropriate way to capture the remote debug messages, and it should work per dedicated account session.
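As an interim sketch, an exported function can log its own failures to a per-account file before RHive offers anything built in; the RHIVE_DATA path is borrowed from the environment listings earlier on this page and is only an example:

scoring_logged <- function(sal) {
  tryCatch(scoring(sal), error = function(e) {
    log <- file.path("/home/hduser/RData",
                     paste0("rhive_", Sys.info()[["user"]], ".log"))
    cat(format(Sys.time()), conditionMessage(e), "\n",
        file = log, append = TRUE)                 # append to per-user log
    NA_real_                                       # keep the job running
  })
}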
rhive.write.table(editweights)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:575 mismatched input '0' expecting Identifier near ',' in column specification
, errorCode:11, SQLState:42000)
head(editweights)
V1 a A b B c C d D e E f F g G h H i I j J
1 0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1
2 -1 0.0 -0.3 -2.0 -3.0 -1.5 -3.0 -1.0 -3.0 -1.0 -1 -1.5 -3.0 -2.0 -3.0 -2.0 -3.0 -2.0 -3 -2 -3
3 -1 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -3
4 -1 -2.0 -3.0 0.0 -0.3 -1.0 -3.0 -1.5 -3.0 -2.0 -2 -1.0 -3.0 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1 -3
5 -1 -3.0 -3.0 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -0.5 -0.5 -0.5 -0.5 -3.0 -3 -3 -3
6 -1 -1.5 -3.0 -1.0 -3.0 0.0 -0.3 -0.5 -0.5 -0.5 -1 -0.5 -0.5 -1.0 -3.0 -1.5 -3.0 -2.0 -3 -2 -3
k K l L m M n N o O p P q Q r R s S t T u U v
1 -1.0 -1 -1 -1 -1 -1 -1.0 -1.0 -1 -1 -1 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1.0
2 -2.0 -3 -2 -3 -2 -3 -2.0 -3.0 -2 -2 -2 -2 -0.5 -0.5 -1.5 -1.5 -0.5 -0.5 -2.0 -2.0 -2 -3 -2.0
3 -3.0 -3 -3 -3 -3 -3 -3.0 -3.0 -3 -3 -3 -3 -0.5 -0.5 -3.0 -3.0 -0.5 -0.5 -3.0 -3.0 -3 -3 -3.0
4 -1.5 -3 -2 -3 -1 -3 -0.5 -0.5 -2 -2 -2 -2 -2.0 -2.0 -1.5 -1.5 -2.0 -3.0 -1.0 -1.0 -1 -3 -0.5
5 -3.0 -3 -3 -3 -3 -3 -0.5 -0.5 -3 -3 -3 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -0.5
6 -2.0 -3 -2 -3 -2 -3 -1.5 -3.0 -2 -2 -2 -2 -2.0 -2.0 -1.0 -1.0 -1.0 -3.0 -0.5 -0.5 -2 -3 -0.5
V w W x X y Y z Z 0 1 2 3 4 5 6 7 8 9
1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0
2 -3.0 -0.5 -0.5 -1.0 -3.0 -2.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
3 -3.0 -0.5 -0.5 -3.0 -3.0 -3.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
4 -0.5 -2.0 -2.0 -1.5 -3.0 -1.0 -3 -2.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
5 -0.5 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
6 -0.5 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
_ $ #
1 -0.5 -0.15 -0.2
2 -1.5 -100.00 -100.0
3 -1.5 -100.00 -100.0
4 -1.5 -100.00 -100.0
5 -1.5 -100.00 -100.0
6 -1.5 -100.00 -100.0
Please fix as soon as possible.
issue case:
aaa = rhive.big.query("select *,
CASE
WHEN petallength < 2.45 THEN 'first'
WHEN petallength >= 2.45 THEN 'second'
END as separation
from iris_3")
expected output:
1 5.1 3.5 1.4 0.2 setosa first
2 4.9 3.0 1.4 0.2 setosa first
.
.
50 5.0 3.3 1.4 0.2 setosa first
51 7.0 3.2 4.7 1.4 versicolor second
.
.
When I run any kind of query, the results returned by HiveClient are not tab-separated but delimited with the literal string "\001", resulting in improper results. Is this happening to anyone else?
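A client-side sketch that re-splits such rows; whether the separator arrives as the control character \001 or as the four-character text "\001" is precisely what this report describes, so both cases are handled:

fix_row <- function(row) {
  if (grepl("\001", row, fixed = TRUE)) {
    strsplit(row, "\001", fixed = TRUE)[[1]]    # real ctrl-A separator
  } else {
    strsplit(row, "\\001", fixed = TRUE)[[1]]   # literal backslash text
  }
}
fix_row("a\001b\001c")   # "a" "b" "c"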
Hi,
We are trying to run the examples (https://github.com/nexr/RHive/wiki/RHive-example-code).
When we execute the query, our jobs fail with a KryoException.
It seems that a UDF instance is serialized even though it contains converters that are not designed for serialization (no default constructor).
We are using Hadoop 2.2 and Hive 0.12 (Hortonworks distribution).
Are those examples still correct?
Do you have an idea of the cause of our error?
Regards,
Philippe
The example:
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}
rhive.assign('coefficient',coefficient)
rhive.assign('scoring',scoring)
rhive.exportAll('scoring')
rhive.query("select R('scoring',col_sal,0.0) from emp")
Exception :
Error: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter
Serialization trace:
converters (com.nexr.rhive.hive.udf.RUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:314)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:263)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:376)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:552)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:167)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter
Serialization trace:
converters (com.nexr.rhive.hive.udf.RUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109)
at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:367)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:276)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:810)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:720)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:733)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:287)
... 13 more
We need a progress bar or verbose messages in order to know the ETA when fetching big data from a Hive query result with rhive.query. Sometimes rhive.query takes a very long time; we need some sort of indicator.
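A rough client-side sketch of such an indicator, paging through the result with LIMIT and a text progress bar; whether the installed Hive accepts the offset form of LIMIT is an assumption:

fetch_with_progress <- function(table, total, chunk = 10000) {
  pb  <- txtProgressBar(min = 0, max = total, style = 3)
  out <- list()
  for (start in seq(0, total - 1, by = chunk)) {
    q <- sprintf("select * from %s limit %d,%d", table, start, chunk)
    out[[length(out) + 1]] <- rhive.query(q)     # fetch one page
    setTxtProgressBar(pb, min(start + chunk, total))
  }
  close(pb)
  do.call(rbind, out)                            # reassemble the data frame
}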
Develop an RHive aggregate function similar to R's aggregate function.
Its syntax is:
FUN(table-name, hiveFUN, col, ..., groups)
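Hypothetical usage under that syntax: the average sepal length per species. The function and argument names follow the proposal above, not a shipped API:

rhive.aggregate("iris", "avg", "sepallength", groups = "species")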