rhive's People

Contributors

bluemir, echiu64, euriion, jakemoon, ssshow16


rhive's Issues

Connection error with 2.0.0

I recently installed Hive (0.11.0) and Hadoop (1.2.1) on Ubuntu 13.10 x64. R (3.0.1 "Good Sport") is installed correctly and works well with other packages. I also installed the RHive (2.0.0) package, but when I try connecting to Hive from R it shows the error message below. Please help me with this.
(screenshot of the error message)

version 2.0.0 Error at init

Hi, this is what I get after installing version 2.0.0

rhive.init()
rhive.env()
hadoop home: /usr/local/hadoop
hive home: /usr/local/hive

rhive.connect(host='master',port='10000')
Error: class not found

Please consider that version 0.0.7 (the previous version I used) worked just fine. The environment I got from 0.0.7 was

Hive Home Directory : /usr/local/hive
Hadoop Home Directory : /usr/local/hadoop
Hadoop Conf Directory :
Default RServe List
master slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10
master : RHIVE_DATA = /home/hduser/RData/
slave1 : RHIVE_DATA = /home/hduser/RData
slave2 : RHIVE_DATA = /home/hduser/RData
slave3 : RHIVE_DATA = /home/hduser/RData
slave4 : RHIVE_DATA = /home/hduser/RData
slave5 : RHIVE_DATA = /home/hduser/RData
slave6 : RHIVE_DATA = /home/hduser/RData
slave7 : RHIVE_DATA = /home/hduser/RData
slave8 : RHIVE_DATA = /home/hduser/RData
slave9 : RHIVE_DATA = /home/hduser/RData
slave10 : RHIVE_DATA = /home/hduser/RData

Connected HiveServer : master:10000

rhive.query resulting in subscript out of bounds

I am getting the following error with Hive-0.9.0:11:
Error in rdata[[i]] : subscript out of bounds

traceback() reveals:

3: FUN(X[[25L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})

What does this error mean? Can this be fixed? Thanks in advance!

rhive.exportAll('scoring') Error

Hello All,
I installed RHive 0.7 and CDH 4.4, and rhive.connect() worked.
When I run rhive.exportAll('scoring'), it prints the error below:

rhive.exportAll('scorint')
Error in RSeval(rcon, command) : remote evaluation failed
In addition: Warning messages:
1: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2
2: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2

I want to know what the problem is. I'm looking forward to your reply.

Checking Hadoop slaves should be added

Hi,
I get an error when I build a package that has a dependency on RHive 0.0-3, as shown below:

installing source package ‘clog’ ...
** R
** data
** inst
** preparing package for lazy loading
Warning in file(file, "rt") :
cannot open file '/srv/clog/hadoop-0.20.203.0/conf/slaves': No such file or directory
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
Error : package ‘RHive’ could not be loaded
ERROR: lazy loading failed for package ‘clog’

  • removing ‘/Users/aidenhong/Documents/workspace/cog_git/src/main/R/cog.Rcheck/clog’

The above error occurs when I run 'R CMD check packagefile', and I can't finish building the package.
I think it's caused by RHive returning an error code when it is loaded via library().

rhive.write.table error

I have used rhive.query with no problem. But when I use rhive.write.table(myTableName), the following error occurs:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.FileNotFoundException: File myTableName.rhive does not exist.
[1] "myTableName"

It created an empty table with correct column names. I double checked that the class of myTableName is a data frame.

Has anyone come across this problem? Thanks in advance.

convert java null to R NULL

In RUDF and RUDAF, null data is passed to the R function, but R cannot handle it because the missing value in R is NULL, not null.
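
Until this is fixed, a UDF can guard against missing input itself. A minimal sketch, assuming the bridge currently hands the R function something like NULL or NA for a missing value (illustrative only, reusing the scoring example from other issues here):

scoring <- function(sal) {
  # guard against a missing value coming from the Java side
  if (is.null(sal) || is.na(sal)) return(NA_real_)
  sal * 1.1
}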

ant build Failed

[root@hadoop RHive]# ant build
Buildfile: build.xml

compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.

Is there anyone who can help me??

Windows setup guide is necessary

A Windows setup guide is necessary because many people use Windows as their operating system. We need to test on Windows and write a guide for those users.

implement apply function

Develop RHive apply functions using RUDF.
We designed two apply functions, depending on the return type:

  1. napply : numeric type
  2. sapply : string type

The syntax of these functions is:
[n|s]apply(hive-tablename, FUN, col1, ...)
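
A possible call following this syntax, modeled on the rhive.napply usage shown in a later issue (the table and column names, and the rhive.sapply name for the string variant, are assumptions for illustration):

# numeric result: multiply a numeric Hive column by 10
rhive.napply('iris', function(sepallength) { sepallength * 10 }, 'sepallength')
# string result: upper-case a string column (assuming the string variant is rhive.sapply)
rhive.sapply('iris', function(species) { toupper(species) }, 'species')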

Timeout checking in rhive.connect()

Hi,
I had a serious problem when I tried to use RHive on a cluster that had just been recovered.
The problem was that rhive.connect never finished, even after a very long time.
I figured out that the cause was that the MySQL server backing Hive was down.
I am not sure whether this can be solved in RHive, but I think a timeout parameter with a default value may be necessary in the rhive.connect() function.
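
Until such a parameter exists, a rough client-side guard is possible. This is only a sketch: R's time limit is not guaranteed to interrupt a Java call that blocks inside rJava, and the host/port values are placeholders.

# try to bound the connect call; may not interrupt a blocked JNI call
ok <- tryCatch({
  setTimeLimit(elapsed = 30, transient = TRUE)
  rhive.connect(host = 'master', port = '10000')
  TRUE
}, error = function(e) FALSE)
setTimeLimit(elapsed = Inf)  # clear the limit again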

Thanks.

rhive.napply

rhive.write.table(iris)
[1] "iris"
rhive.desc.table("iris")
col_name data_type comment
1 rowname string
2 sepallength double
3 sepalwidth double
4 petallength double
5 petalwidth double
6 species string
rhive.napply('iris', function(column1) { column1 * 10}, 'sepallength')
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:40 cannot recognize input near 'CREATE' 'TABLE' 'iris_napply1328157031_table' in select clause
, errorCode:11, SQLState:42000)

Please check.

RHive query results transfer extremely slow

Hello,

When I do a simple
rhive.query("select * from X limit 10000")

it takes 90 s to return the result once the query has completed on the HiveServer (OK displayed on the console).

The delay increases linearly with data size, always exactly 9 ms per line, and it does not depend on the line length.

It is several orders of magnitude slower than any other kind of data transfer between R and anything else. My guess is that there is some kind of timeout somewhere.

rhive.connect makes long log messages

rhive.connect prints a long message when I use it.
I don't know what the message means.
Can you add a way to save these messages to a log file so the noisy output is hidden, and can you explain what it is?
rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hive-0.7.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop-0.20.203.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

RHive omits the Hive error log

RHive uses the Hive Thrift service, so the error log goes to the Thrift server's stdout.
This is inconvenient for the user.

RHive with hiveserver2 + kerberos

Hi There,

Will RHive work with HiveServer2 when Kerberos security is enabled?

When I try to connect to Hive, I get the following exception in my RStudio console:

rhive.connect(host="hostname.domain.com/default;principal=hive/[email protected]",defaultFS="hdfs://namenode.domain.com:8020/user/me",hiveServer2=TRUE)
14/05/05 15:10:52 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "Thread-35" java.lang.IllegalArgumentException: Kerberos principal should have 3 parts: hive/[email protected]:10000/default
        at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:64)
        at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:198)
        at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:138)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:123)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at com.nexr.rhive.hive.DatabaseConnection.connect(DatabaseConnection.java:51)
        at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.connect(HiveJdbcClient.java:330)
        at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.run(HiveJdbcClient.java:322)
Error: java.lang.IllegalStateException: Not connected to hiveserver

Thanks,
Prabhu.

rhive.query function does not handle "hive map table" properly

1. Query: select logdata from ulog limit 1

2. Hive console result
{"{"body":"SEQ_ID":"20120709160001430307","HOST_NAME":"u2dlpweb01","LOG_TIME":"20120709160001","REQ_TIME":"20120709160001","LOG_KIND":"SVC","KT_USER_ID":"","KT_SVC_ID":"X","SESSION_KEY":"","FILE_ID":"X","RT_CODE":"1","DIVIDE1":"211.55.29.102","DIVIDE2":"http://gate2.ucloud.com/api/1/pcclient/pcauth","DIVIDE3":"POST","DIVIDE4":"200","DIVIDE5":"0000","DIVIDE6":"WIN","DIVIDE7":"7","DIVIDE8":"uCloud","DIVIDE9":"1.0.2","DIVIDE10":"personal","DIVIDE11":"GATEWAY","DIVIDE12":"4000","DIVIDE13":"X","DIVIDE14":"X","DIVIDE15":"X","DIVIDE16":"X","DIVIDE17":"X","DIVIDE18":"uCloud/1.0.2 WIN/7 PC personal","DIVIDE19":"X","DIVIDE20":"X","DIVIDE21":"X","DIVIDE22":"X","DIVIDE23":"X","DIVIDE24":"X","DIVIDE26":"X","DIVIDE26":"X","DIVIDE27":"X","DIVIDE28":"X","DIVIDE29":"X","DIVIDE30":"X","timestamp":1341820971766,"pri":"INFO","nanos":784667370878488,"host":"u2dlpweb01","fields":{"AckTag":"20120709-170250760+0900.784666364804488.00000018","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000:\u001F짰I","tailSrcFile":"ucloud-003.log","rolltag":"20120709-170444530+0900.524399892735665.00000020"}}"}
Time taken: 14.89 seconds

3. RStudio rhive.query
rhive.query('select logdata from ulog limit 1')
logdata
1 NA
Warning message:
NAs introduced by coercion

4. ulog table script
CREATE EXTERNAL TABLE IF NOT EXISTS ulog (
logdata MAP<STRING,STRING>
)
PARTITIONED BY(logdt STRING)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '='
LOCATION '/ucloud/collected/ucloudpersonal'

use RUDF and export R Object with rhive error!

rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}

rhive.assign('coefficient', coefficient)
[1] TRUE
rhive.assign('scoring', scoring)
[1] TRUE
rhive.exportAll('scoring')
[1] TRUE
rhive.query("select * from iris limit 10")
rhive_row sepallength sepalwidth petallength petalwidth species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
test2 <- rhive.query("select rhive_row, sepallength, sepalwidth from iris limit 10")
test2
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
test2 <- rhive.write.table(test2)
rhive.desc.table(test2)
col_name data_type comment
1 rhive_row string
2 sepallength double
3 sepalwidth double
rhive.query("select * from test2")
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
rhive.query("select R('scoring', sepallength, 0.0) from test2")
error .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:9, SQLState:08S01)

Has anyone faced the same problem? Can anyone give me some advice on how to solve it? Thanks very much!
My rhive package version is RHive_0.0-6.

rhive.hdfs.chmod

We need rhive.hdfs.chown and rhive.hdfs.chmod functions (not a strong requirement).
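
As a stopgap, one can shell out to the Hadoop CLI from R. This is only a sketch: the helper name, mode, and path are hypothetical, it assumes the hadoop binary is on the PATH, and it does not go through RHive's HDFS connection.

# hypothetical helper until native functions exist
hdfs.chmod <- function(mode, path) {
  system2('hadoop', c('fs', '-chmod', mode, path))
}
hdfs.chmod('755', '/rhive/lib')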

rhive.query

rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")
Error in rdata[[i]] : subscript out of bounds
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion

traceback()
3: FUN(X[[1L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})
1: rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")

Please check.

rhive.mrapply

rhive.mrapply
function (tablename, mapperFUN, reducerFUN, mapinput = NULL,
mapoutput = NULL, by = NULL, reduceinput = NULL, reduceoutput = NULL,
mapper_args = NULL, reducer_args = NULL, buffersize = -1L,
verbose = FALSE, hiveclient = rhive.defaults("hiveclient"))
.....
...
..

Please modify the function parameters, especially all of those after reducerFUN, so that they can be omitted.

For example, rhive.mrapply("weights", map, reduce) should apply the map function to all columns.

RHive not working with CDH4

I am trying to use RHive with CDH4, and rhive.connect() gives me the following error:

WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
Error in .jfindClass(as.character(class)) : class not found

Any ideas on this?

RHive 2.0 Compile Failed!

I downloaded the RHive 2.0 package, cd'd into the working directory, then ran 'ant build', and it failed:
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml

compile:
[mkdir] Created dir: /opt/RHive/build/classes
[javac] Compiling 21 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:44: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!stat.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:115: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!src.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:147: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] length[i] = items[i].isDir() ? fs.getContentSummary(items[i].getPath()).getLength() : items[i].getLen();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/JobManager.java:33: warning: [deprecation] getUsedMemory() in org.apache.hadoop.mapred.ClusterStatus has been deprecated
[javac] clusterStatus.getUsedMemory();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
[javac] 4 warnings

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.

Total time: 3 seconds
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml

compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Is there anyone who can help me?

Unnecessary attempt to modify HDFS root directory

Hello there. I recently started using your RHive package and have been mostly happy so far. One concern I have is with the code of "rhive.hdfs.connect": it tries to make changes in the root HDFS directory. On most real systems, including ours, that is disallowed. Would it make sense to make the changes in /tmp instead of /? I had to change your code to make it work with our system. Thanks!

= Yakov

rhive.connect Error but it still works?

I'm trying to work with RHive on Amazon EMR and I'm getting an error with rhive.connect, but the connection seems to be working:

{code}

library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-6. For overview type ‘?RHive’.
HIVE_HOME=/home/hadoop/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
rhive.connect(port=10003)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2012-10-16 21:15:43,446 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:121)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:225)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:190)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1330)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1348)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:246)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,497 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,517 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:702)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:242)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,546 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:355)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:211)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,563 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.(DFSClient.java:3079)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:598)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:548)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:529)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:229)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,593 INFO [LeaseChecker] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.renewLease(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:1235)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1247)
at java.lang.Thread.run(Thread.java:662)
2012-10-16 21:15:43,609 INFO [Thread-7] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3338)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3202)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2415)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2656)
2012-10-16 21:15:43,656 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,659 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,664 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,668 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,681 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3725)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3640)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:96)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:50)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
{code}


RHive doesn't seem to support HiveServer2

I tried to connect to a HiveServer2 instance, and it remained stuck at the rhive.connect() call.

I was able to connect to a HiveServer1 instance.

However, the latest Cloudera Manager (4.5) has support for managing only HiveServer2.

Are there any plans for HiveServer2 support?

querying SerDe tables

I have a table created using "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';".
While rhive.query("SELECT * FROM serdetable") works,
selecting a specific column with rhive.query("SELECT col1 FROM serdetable")
returns
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
I tried running the same query directly from the Hive shell and it works, which means the jar that contains the org.openx.data.jsonserde.JsonSerDe class was loaded by Hive.
I have to mention that trying another table created with the default Regex SerDe returns the same error.
Any help would be appreciated!
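
One thing that may be worth trying, assuming rhive.query can run arbitrary HiveQL statements and that the jar path below is replaced with the real one, is to register the SerDe jar in the session so it is shipped to the MapReduce tasks (SELECT * needs no MapReduce job, which would explain why only the column query fails):

rhive.query("ADD JAR /path/to/json-serde.jar")   # placeholder path
rhive.query("SELECT col1 FROM serdetable")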

rhive.write.table

rhive.write.table(editweights)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:575 mismatched input '0' expecting Identifier near ',' in column specification
, errorCode:11, SQLState:42000)

head(editweights)
V1 a A b B c C d D e E f F g G h H i I j J
1 0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1
2 -1 0.0 -0.3 -2.0 -3.0 -1.5 -3.0 -1.0 -3.0 -1.0 -1 -1.5 -3.0 -2.0 -3.0 -2.0 -3.0 -2.0 -3 -2 -3
3 -1 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -3
4 -1 -2.0 -3.0 0.0 -0.3 -1.0 -3.0 -1.5 -3.0 -2.0 -2 -1.0 -3.0 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1 -3
5 -1 -3.0 -3.0 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -0.5 -0.5 -0.5 -0.5 -3.0 -3 -3 -3
6 -1 -1.5 -3.0 -1.0 -3.0 0.0 -0.3 -0.5 -0.5 -0.5 -1 -0.5 -0.5 -1.0 -3.0 -1.5 -3.0 -2.0 -3 -2 -3
k K l L m M n N o O p P q Q r R s S t T u U v
1 -1.0 -1 -1 -1 -1 -1 -1.0 -1.0 -1 -1 -1 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1.0
2 -2.0 -3 -2 -3 -2 -3 -2.0 -3.0 -2 -2 -2 -2 -0.5 -0.5 -1.5 -1.5 -0.5 -0.5 -2.0 -2.0 -2 -3 -2.0
3 -3.0 -3 -3 -3 -3 -3 -3.0 -3.0 -3 -3 -3 -3 -0.5 -0.5 -3.0 -3.0 -0.5 -0.5 -3.0 -3.0 -3 -3 -3.0
4 -1.5 -3 -2 -3 -1 -3 -0.5 -0.5 -2 -2 -2 -2 -2.0 -2.0 -1.5 -1.5 -2.0 -3.0 -1.0 -1.0 -1 -3 -0.5
5 -3.0 -3 -3 -3 -3 -3 -0.5 -0.5 -3 -3 -3 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -0.5
6 -2.0 -3 -2 -3 -2 -3 -1.5 -3.0 -2 -2 -2 -2 -2.0 -2.0 -1.0 -1.0 -1.0 -3.0 -0.5 -0.5 -2 -3 -0.5
V w W x X y Y z Z 0 1 2 3 4 5 6 7 8 9
1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0
2 -3.0 -0.5 -0.5 -1.0 -3.0 -2.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
3 -3.0 -0.5 -0.5 -3.0 -3.0 -3.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
4 -0.5 -2.0 -2.0 -1.5 -3.0 -1.0 -3 -2.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
5 -0.5 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
6 -0.5 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
_ $ #
1 -0.5 -0.15 -0.2
2 -1.5 -100.00 -100.0
3 -1.5 -100.00 -100.0
4 -1.5 -100.00 -100.0
5 -1.5 -100.00 -100.0
6 -1.5 -100.00 -100.0

Please fix this as soon as possible.
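
A possible workaround, assuming the parse error comes from column names such as "0", "$" and "#" that are not valid Hive identifiers (the generated names below are just an example), is to rename the columns before writing:

# replace the problematic names with simple generated identifiers
names(editweights) <- paste0("col", seq_along(editweights))
rhive.write.table(editweights)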

rhive.query function does not handle hive's "CASE ~ WHEN ~ END" syntax properly

issue case:
aaa = rhive.big.query("select *,
CASE
WHEN petallength < 2.45 THEN 'first'
WHEN petallength >= 2.45 THEN 'second'
END as separation
from iris_3")

expected output:
1 5.1 3.5 1.4 0.2 setosa first
2 4.9 3.0 1.4 0.2 setosa first
.
.
50 5.0 3.3 1.4 0.2 setosa first
51 7.0 3.2 4.7 1.4 versicolor second
.
.

Results not split correctly?

When I run any kind of query, the results returned by HiveClient are not tab-separated but delimited with the literal string "\001", resulting in improper results. Is this happening to anyone else?

RUDF rhive.query failed due to serialisation exception

Hi,

We are trying to execute the examples (https://github.com/nexr/RHive/wiki/RHive-example-code).
When we try to execute the query, our jobs fail with a KryoException.
It seems that a UDF instance is serialized even though it contains converters that are not designed for serialization (no default constructor).

We are using Hadoop 2.2 and Hive 0.12 (Hortonworks distribution).

Are those examples still correct?
Do you have an idea of the cause of our error?

Regards,
Philippe

The example:

coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}
rhive.assign('coefficient',coefficient)
rhive.assign('scoring',scoring)
rhive.exportAll('scoring')
rhive.query("select R('scoring',col_sal,0.0) from emp")

Exception:

Error: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter
Serialization trace:
converters (com.nexr.rhive.hive.udf.RUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:314)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:263)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:376)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:552)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:167)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter
Serialization trace:
converters (com.nexr.rhive.hive.udf.RUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109)
at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:367)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:276)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:810)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:720)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:733)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:287)
... 13 more

Need verbose mode when fetching big result

We need a progress bar or verbose messages in order to know the ETA when fetching a big result from a Hive query with the "rhive.query" function. Sometimes "rhive.query" takes a very long time; we need some sort of indicator.

implement aggregate function

Develop an RHive aggregate function similar to R's aggregate function.

The syntax of this function is:

FUN(table-name, hiveFUN, col, ..., groups)
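
For example, a call under this syntax might look like the following (the name rhive.aggregate and the table/column names are assumptions used only to illustrate the proposed signature):

# group average of sepallength by species,
# i.e. SELECT species, avg(sepallength) FROM iris GROUP BY species
rhive.aggregate('iris', 'avg', 'sepallength', 'species')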
