nexr / rhive Goto Github PK

View Code? Open in Web Editor NEW

122.0 122.0 63.0 3.72 MB

RHive is an R extension facilitating distributed computing via Apache Hive.

Home Page: http://nexr.github.io/RHive

R 50.49% Java 49.51%

rhive's People

Contributors

Stargazers

Watchers

rhive's Issues

Connection error with 2.0.0

I recently install Hive(0.11.0) and Hadoop(1.2.1) on ubuntu(13.10 x64). R(3.0.1 "Good Sport") is installed correctly and working well with other packages. I also installed Rhive(2.0.0) package, when I tried connecting to hive from R it is showing error message as shown below. please help me on the same.

rhive.big.query function cause a number overflow against what rhive.query do.

reported from Haven

design aggregate function

design distributed aggregate function similar to R aggregate function

integrate hadoop hdfs

implement RHive api to connect hdfs and to read/write data to hdfs.

version 2.0.0 Error at init

Hi, this is what I get after installing version 2.0.0

rhive.init()
rhive.env()
hadoop home: /usr/local/hadoop
hive home: /usr/local/hive

rhive.connect(host='master',port='10000')
Error: class not found

Please consider that version 0.0.7 (the previous version I used) worked just fine. The environment I got from 0.0.7 was

Hive Home Directory : /usr/local/hive
Hadoop Home Directory : /usr/local/hadoop
Hadoop Conf Directory :
Default RServe List
master slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10master : RHIVE_DATA = /home/hduser/RData/
slave1 : RHIVE_DATA = /home/hduser/RData
slave2 : RHIVE_DATA = /home/hduser/RData
slave3 : RHIVE_DATA = /home/hduser/RData
slave4 : RHIVE_DATA = /home/hduser/RData
slave5 : RHIVE_DATA = /home/hduser/RData
slave6 : RHIVE_DATA = /home/hduser/RData
slave7 : RHIVE_DATA = /home/hduser/RData
slave8 : RHIVE_DATA = /home/hduser/RData
slave9 : RHIVE_DATA = /home/hduser/RData
slave10 : RHIVE_DATA = /home/hduser/RData

Connected HiveServer : master:10000

rhive.query resulting in subscript out of bounds

I am getting the following error with Hive-0.9.0:11:
Error in rdata[[i]] : subscript out of bounds

traceback() reveals:

3: FUN(X[[25L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})

What does this error mean? Can this be fixed? Thanks in advance!

handle big result from Hive

RHive should handle huge result from Hive.

rhive.exportAll('scoring') Error

Hello All,
I installed RHive-0.7 and CDH4.4,rhive.connect() is worked.
when I run 'rhive.exportAll('scoring')',its print the Error as follow:

rhive.exportAll('scorint')
Error in RSeval(rcon, command) : remote evaluation failed
In addition: Warning messages:
1: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2
2: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2

I want to know what the problem ......I'm looking forward to your reply.

Checking Hadoop slaves shoud be added

Hi,
I met an error when I build a package which have dependency on RHive 0.0-3 as like below

installing source package ‘clog’ ...
** R
** data
** inst
** preparing package for lazy loading
Warning in file(file, "rt") :
cannot open file '/srv/clog/hadoop-0.20.203.0/conf/slaves': No such file or directory
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
Error : package ‘RHive’ could not be loaded
ERROR: lazy loading failed for package ‘clog’

removing ‘/Users/aidenhong/Documents/workspace/cog_git/src/main/R/cog.Rcheck/clog’

Above error is occurred when I run 'R CMD check packagefile' and I can't finish building package.
I thinks it's cause by returning error code when RHive is loaded by library.

rhive.write.table error

I have used rhive.query with no problem. But when I use rhive.write.table(myTableName), the following error occurs:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.FileNotFoundException: File myTableName.rhive does not exist.
[1] "myTableName"

It created an empty table with correct column names. I double checked that the class of myTableName is a data frame.

Does anyone come across this problem? Thanks in advance.

support custom environment using mrapply

In mrapply, mapapply and reduceapply, user can use custom environment.
user can use this custom environment in mapper and reduce function.

convert java null to R NULL

In RUDF and RUDAF, null data is passed to R function. but R can not handle this data because null in R is not null but NULL.

ant build Faild

[root@hadoop RHive]# ant build
Buildfile: build.xml

compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.

Is there anyone who can help me??

Windows setup guide is necessary

Windows setup guide is necessary. because many people use windows as their operating system. we need test and make a guide for that people.

design apply function

design distributed apply function similar to R apply function

rhive.load.table('table') command throwing following error:

Error in 1:listStatus$length : argument of length 0

though i can run select query and rhive.desc.table('table') commands

implement apply function

develop RHive apply function using RUDF.
We designed two apply function depended on return type.

napply : numeric type
sapply : string type

these functions syntax is below :
[n|s]apply(hive-tablename, FUN, col1, ...)

Timeout checking in rhive.connect()

Hi,
I had a serious problem when I try to use RHive on just recovered custler.
The problem was rhive.connect was not finished never for long long time.
I figured it out, the cause of problem is that mysql server for Hive was down.
I am not sure this problem can be solved in RHive or not but anyway, I think timeout parameter may be necessary with default value in rhive.connect() function.

Thanks.

rhive.napply

rhive.write.table(iris)
[1] "iris"
rhive.desc.table("iris")
col_name data_type comment
1 rowname string
2 sepallength double
3 sepalwidth double
4 petallength double
5 petalwidth double
6 species string
rhive.napply('iris', function(column1) { column1 * 10}, 'sepallength')
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:40 cannot recognize input near 'CREATE' 'TABLE' 'iris_napply1328157031_table' in select clause
, errorCode:11, SQLState:42000)

plz check.

rhive.query function show warning message when query result is too big

recommanding use rhive.big.query function

RHive query results transfer extremely slow

Hello,

When I do a simple
rhive.query("select * from X limit 10000")

it takes 90s to answer once the query is completed on the hiveserver (OK displayed on the console).

It increases linearily with data size, always exactly 9 ms per line, it does not depend on the line length.

It is several order of magnitude slower than any other kind of data transfer between R and whatever. My guess is that there is some kind of timeout somewhere.

rhive.connect makes long log messages

rhive.connect makes long message when I use.
I don't know what is the mean of the message.
can you guys make a way to save these to log file to hide the hard message and can you explain what is this?
rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hive-0.7.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop-0.20.203.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

rhive omit HIVE error log

rhive use hive-trift service so error log drop into hive-trift's stdout
this cause inconvenience to user

RHive with hiveserver2 + kerberos

Hi There,

Is the RHive will work with hiveserver2 which is enabled with kerberos security ?

When I try to connect to Hive I am getting following exception in my R Studio console

rhive.connect(host="hostname.domain.com/default;principal=hive/[email protected]",defaultFS="hdfs://namenode.domain.com:8020/user/me",hiveServer2=TRUE)
14/05/05 15:10:52 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "Thread-35" java.lang.IllegalArgumentException: Kerberos principal should have 3 parts: hive/[email protected]:10000/default
        at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:64)
        at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:198)
        at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:138)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:123)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at com.nexr.rhive.hive.DatabaseConnection.connect(DatabaseConnection.java:51)
        at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.connect(HiveJdbcClient.java:330)
        at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.run(HiveJdbcClient.java:322)
Error: java.lang.IllegalStateException: Not connected to hiveserver

Thanks,
Prabhu.

support multi type arguments of R-UDF function

R-UDF function should take multi type variables as arguments.

rhive.query function dose not handle "hive map table" properly

1.query : select logdata from ulog limit1

2.hive console result
{"{"body":"SEQ_ID":"20120709160001430307","HOST_NAME":"u2dlpweb01","LOG_TIME":"20120709160001","REQ_TIME":"20120709160001","LOG_KIND":"SVC","KT_USER_ID":"","KT_SVC_ID":"X","SESSION_KEY":"","FILE_ID":"X","RT_CODE":"1","DIVIDE1":"211.55.29.102","DIVIDE2":"http://gate2.ucloud.com/api/1/pcclient/pcauth","DIVIDE3":"POST","DIVIDE4":"200","DIVIDE5":"0000","DIVIDE6":"WIN","DIVIDE7":"7","DIVIDE8":"uCloud","DIVIDE9":"1.0.2","DIVIDE10":"personal","DIVIDE11":"GATEWAY","DIVIDE12":"4000","DIVIDE13":"X","DIVIDE14":"X","DIVIDE15":"X","DIVIDE16":"X","DIVIDE17":"X","DIVIDE18":"uCloud/1.0.2 WIN/7 PC personal","DIVIDE19":"X","DIVIDE20":"X","DIVIDE21":"X","DIVIDE22":"X","DIVIDE23":"X","DIVIDE24":"X","DIVIDE26":"X","DIVIDE26":"X","DIVIDE27":"X","DIVIDE28":"X","DIVIDE29":"X","DIVIDE30":"X","timestamp":1341820971766,"pri":"INFO","nanos":784667370878488,"host":"u2dlpweb01","fields":{"AckTag":"20120709-170250760+0900.784666364804488.00000018","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000:\u001F짰I","tailSrcFile":"ucloud-003.log","rolltag":"20120709-170444530+0900.524399892735665.00000020"}}"}
Time taken: 14.89 seconds

3.Rstudio rhive.query
rhive.query('select logdata from ulog limit 1')
logdata
1 NA
Warning message:
NAs introduced by coercion

4.ulog table script
CREATE EXTERNAL TABLE IF NOT EXISTS ulog (
logdata MAP<STRING,STRING>
)
PARTITIONED BY(logdt STRING)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '='
LOCATION '/ucloud/collected/ucloudpersonal'

use RUDF and export R Object with rhive error!

rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}

rhive.assign('coefficient', coefficient)
[1] TRUE
rhive.assign('scoring', scoring)
[1] TRUE
rhive.exportAll('scoring')
[1] TRUE
rhive.query("select * from iris limit 10")
rhive_row sepallength sepalwidth petallength petalwidth species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
test2 <- rhive.query("select rhive_row, sepallength, sepalwidth from iris limit 10")
test2
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
test2 <- rhive.write.table(test2)
rhive.desc.table(test2)
col_name data_type comment
1 rhive_row string
2 sepallength double
3 sepalwidth double
rhive.query("select * from test2")
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
rhive.query("select R('scoring', sepallength, 0.0) from test2")
error .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:9, SQLState:08S01)

Anyone face the same problem? And who can give me some advise to solve it? Thanks very much!
My rhive package version is RHive_0.0-6.

rhive.hdfs.chmod

Need rhive.hdfs.chown, rhive.hdfs.chmod functions (not much strongly)

rhive.query

rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")
Error in rdata[[i]] : subscript out of bounds
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion

traceback()
3: FUN(X[[1L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})
1: rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")

please check.

Remove Rserver

rhive.mrapply

rhive.mrapply
function (tablename, mapperFUN, reducerFUN, mapinput = NULL,
mapoutput = NULL, by = NULL, reduceinput = NULL, reduceoutput = NULL,
mapper_args = NULL, reducer_args = NULL, buffersize = -1L,
verbose = FALSE, hiveclient = rhive.defaults("hiveclient"))
.....
...
..

Plz modify function parameters especially all remains after reducerFUN to be neglectable parameters.

like "rhive.mrapply("weights", map, reduce)" to apply map for all columns.

RHive not working with CDH4

I am trying to use RHive with CDH4. And at rhive.connect() it gives me the following error-

WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
Error in .jfindClass(as.character(class)) : class not found

Any ideas on this?

RHive 2.0 Compile Failed!

I download the RHive 2.0 package,cd into the work directory,then run 'ant build' ,it faild:
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml

compile:
[mkdir] Created dir: /opt/RHive/build/classes
[javac] Compiling 21 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:44: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!stat.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:115: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!src.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:147: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] length[i] = items[i].isDir() ? fs.getContentSummary(items[i].getPath()).getLength() : items[i].getLen();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/JobManager.java:33: warning: [deprecation] getUsedMemory() in org.apache.hadoop.mapred.ClusterStatus has been deprecated
[javac] clusterStatus.getUsedMemory();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
[javac] 4 warnings

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.

Total time: 3 seconds
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Is there anyone can help me ?

Unnecessary attempt to modify HDFS root directory

Hello there. I recently started using your RHive package, and have been mostly happy so far. One concern I have is with code of "rhive.hdfs.connect". It tries to make changes in the root HDFS directory. In most real systems, including ours, it would be disallowed. Would it make sense to make changes in /tmp instead of /? I had to change your code to make it work with our system. Thanks!

= Yakov

export function can be overwritten by other RHive users when they use same name

rhive.connect Error but it still works?

I'm trying to work with RHive on Amazon EMR and I'm getting an error with rhive.connect, but the connection seems to be working:

{code}

library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-6. For overview type ‘?RHive’.
HIVE_HOME=/home/hadoop/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
rhive.connect(port=10003)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2012-10-16 21:15:43,446 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:121)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:225)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:190)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1330)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1348)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:246)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,497 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,517 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:702)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:242)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,546 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:355)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:211)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,563 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.(DFSClient.java:3079)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:598)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:548)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:529)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:229)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,593 INFO [LeaseChecker] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.renewLease(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:1235)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1247)
at java.lang.Thread.run(Thread.java:662)
2012-10-16 21:15:43,609 INFO [Thread-7] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3338)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3202)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2415)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2656)
2012-10-16 21:15:43,656 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,659 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,664 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,668 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,681 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3725)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3640)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:96)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:50)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
{code}

rhive.query function dose not handle "hive map table" properly

use RUDF and export R Object with rhive error!

rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
coefficient <- 1.1
scoring <- function(sal) {

coefficient * sal

}
rhive.assign('coefficient', coefficient)
[1] TRUE
rhive.assign('scoring', scoring)
[1] TRUE
rhive.exportAll('scoring')
[1] TRUE
rhive.query("select * from iris limit 10")
rhive_row sepallength sepalwidth petallength petalwidth species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
test2 <- rhive.query("select rhive_row, sepallength, sepalwidth from iris limit 10")
test2
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
test2 <- rhive.write.table(test2)
rhive.desc.table(test2)
col_name data_type comment
1 rhive_row string
2 sepallength double
3 sepalwidth double
rhive.query("select * from test2")
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
rhive.query("select R('scoring', sepallength, 0.0) from test2")
error .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:9, SQLState:08S01)

Anyone face the same problem? And who can give me some advise to solve it? Thanks very much!

rhive.query function show warning message when query result is too big

recommand use rhive.big.query function

support R script with mapreduce script function of HIVE

Hive provides user to use any script in hive query. Hive uses hadoop streaming to provide it.
RHive can provide function to run R script in HQL with this Hive function.

RHive doesn't seem to support HiveServer2

I tried to connect to a HiveServer2 instance, and it remained stuck at the rhive.connect() call.

I was able to connect to a HiveServer1 instance.

However, the latest Cloudera Manager (4.5) has support for managing only HiveServer2.

Are there any plans for HiveServer2 support?

quering SerDe tables

I have a table created using "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';".
While rhive.query("SELECT * FROM serdetable") works,
selecting special column rhive.query("SELECT col1 FROM serdetable")
returns
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
I tried to run the same query directly from hive shell and it works - which means that the jar that contains org.openx.data.jsonserde.JsonSerDe class was loaded by the hive.
I have to mention that trying on other table created with default Regex SerDe returns the same error.
Any help would be appriciated!

Need a way to get debug message from remote node

It's hard to debug and find error poins.
We need an appropriate way to catch the remote debug messages and It should works for dedicate account session.

rhive.write.table

rhive.write.table(editweights)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:575 mismatched input '0' expecting Identifier near ',' in column specification
, errorCode:11, SQLState:42000)

head(editweights)
V1 a A b B c C d D e E f F g G h H i I j J
1 0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1
2 -1 0.0 -0.3 -2.0 -3.0 -1.5 -3.0 -1.0 -3.0 -1.0 -1 -1.5 -3.0 -2.0 -3.0 -2.0 -3.0 -2.0 -3 -2 -3
3 -1 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -3
4 -1 -2.0 -3.0 0.0 -0.3 -1.0 -3.0 -1.5 -3.0 -2.0 -2 -1.0 -3.0 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1 -3
5 -1 -3.0 -3.0 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -0.5 -0.5 -0.5 -0.5 -3.0 -3 -3 -3
6 -1 -1.5 -3.0 -1.0 -3.0 0.0 -0.3 -0.5 -0.5 -0.5 -1 -0.5 -0.5 -1.0 -3.0 -1.5 -3.0 -2.0 -3 -2 -3
k K l L m M n N o O p P q Q r R s S t T u U v
1 -1.0 -1 -1 -1 -1 -1 -1.0 -1.0 -1 -1 -1 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1.0
2 -2.0 -3 -2 -3 -2 -3 -2.0 -3.0 -2 -2 -2 -2 -0.5 -0.5 -1.5 -1.5 -0.5 -0.5 -2.0 -2.0 -2 -3 -2.0
3 -3.0 -3 -3 -3 -3 -3 -3.0 -3.0 -3 -3 -3 -3 -0.5 -0.5 -3.0 -3.0 -0.5 -0.5 -3.0 -3.0 -3 -3 -3.0
4 -1.5 -3 -2 -3 -1 -3 -0.5 -0.5 -2 -2 -2 -2 -2.0 -2.0 -1.5 -1.5 -2.0 -3.0 -1.0 -1.0 -1 -3 -0.5
5 -3.0 -3 -3 -3 -3 -3 -0.5 -0.5 -3 -3 -3 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -0.5
6 -2.0 -3 -2 -3 -2 -3 -1.5 -3.0 -2 -2 -2 -2 -2.0 -2.0 -1.0 -1.0 -1.0 -3.0 -0.5 -0.5 -2 -3 -0.5
V w W x X y Y z Z 0 1 2 3 4 5 6 7 8 9
1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0
2 -3.0 -0.5 -0.5 -1.0 -3.0 -2.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
3 -3.0 -0.5 -0.5 -3.0 -3.0 -3.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
4 -0.5 -2.0 -2.0 -1.5 -3.0 -1.0 -3 -2.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
5 -0.5 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
6 -0.5 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
_ $ #
1 -0.5 -0.15 -0.2
2 -1.5 -100.00 -100.0
3 -1.5 -100.00 -100.0
4 -1.5 -100.00 -100.0
5 -1.5 -100.00 -100.0
6 -1.5 -100.00 -100.0

plz fix as soon as possible.

rhive.query function dose not handle hive's "CASE ~ WHEN ~ END" syntax properly

issue case:
aaa = rhive.big.query("select *,
CASE
WHEN petallength < 2.45 THEN 'first'
WHEN petallength >= 2.45 THEN 'second'
END as separation
from iris_3")

expected output:
1 5.1 3.5 1.4 0.2 setosa first
2 4.9 3.0 1.4 0.2 setosa first
.
.
50 5.0 3.3 1.4 0.2 setosa first
51 7.0 3.2 4.7 1.4 versicolor second
.
.

Results not split correctly?

When I run any kind of query, the results returned by HiveClient are not tab-separated, but delimitted with the literal string "\001" resulting in improper results. Is this happening to anyone else?

RUDF rhive.query failed due to serialisation exception

Hi,

We are trying to execute the exemples (https://github.com/nexr/RHive/wiki/RHive-example-code).
When trying to execute the query, our jobs failed with a KryoException.
It seems that an UDF instance is serialized even it contained converters that are not designed for serialization (no default constructor).

We are using Hadoop 2.2 and hive 0.12 (Horton distribution).

Are those examples still correct ?
Do you have an idea of the cause of our error ?

Regards,
Philippe

The exemple :
coefficient <- 1.1 scoring <- function(sal) { coefficient * sal } rhive.assign('coefficient',coefficient) rhive.assign('scoring',scoring) rhive.exportAll(‘scoring’) rhive.query("select R('scoring',col_sal,0.0) from emp")

Exception :
Error: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter Serialization trace: converters (com.nexr.rhive.hive.udf.RUDF) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:314) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:263) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:376) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:552) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:167) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157) Caused by: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter Serialization trace: converters (com.nexr.rhive.hive.udf.RUDF) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097) at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109) at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:367) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:276) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672) at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:810) at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:720) at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:733) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:287) ... 13 more

FUN(table-name, hiveFUN, col, ..., groups)

nexr / rhive Goto Github PK

rhive's People

Contributors

Stargazers

Watchers

Forkers

rhive's Issues

Recommend Projects

Recommend Topics

Recommend Org