hive-third-functions's Introduction

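Hi 👋, I'm Aaron. I work on big data at Meituan. I currently focus on solving problems in the algorithm domain from a data perspective, and I am responsible for the design, planning, and capability building of algorithm delivery systems.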

hive-third-functions's People

Contributors

aaronshan, dependabot[bot], yiuterran

hive-third-functions's Issues

array_intersect bug

array_intersect has an array index out-of-bounds bug, and the code is also complicated. A simplified version:
`@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object leftArray = arguments[0].get();
    Object rightArray = arguments[1].get();

    int leftArrayLength = leftArrayOI.getListLength(leftArray);
    int rightArrayLength = rightArrayOI.getListLength(rightArray);

    // Return null if either array is null (a negative length also signals a null list)
    if (leftArray == null || rightArray == null || leftArrayLength < 0 || rightArrayLength < 0) {
        return null;
    }

    // An empty input means an empty intersection
    if (leftArrayLength == 0) {
        return leftArray;
    }
    if (rightArrayLength == 0) {
        return rightArray;
    }

    List<?> leftList = leftArrayOI.getList(leftArray);
    // Read the right array through its own inspector (the original used leftArrayOI here)
    List<?> rightList = rightArrayOI.getList(rightArray);
    // Intersect through a hash set: duplicates are dropped and element order is not preserved
    HashSet<?> result_set = Sets.newHashSet(leftList);
    result_set.retainAll(rightList);

    return new ArrayList<Object>(result_set);
}`
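
To make the behaviour of the set-based approach easier to check outside Hive, here is a minimal sketch on plain Java lists. The class and method names are hypothetical and nothing here comes from the repository. It swaps `HashSet` for `LinkedHashSet`, an assumption made only so that the output order is predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetIntersectSketch {
    // Hypothetical helper: intersects two plain lists with retainAll,
    // without any Hive ObjectInspector involved.
    static <T> List<T> intersect(List<T> left, List<T> right) {
        if (left == null || right == null) {
            return null;
        }
        // LinkedHashSet drops duplicates and keeps the left list's order;
        // a plain HashSet would return elements in an unspecified order.
        Set<T> result = new LinkedHashSet<>(left);
        result.retainAll(right);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("39236600", "38943350", "39007633");
        List<String> right = Arrays.asList("39236600", "38943350", "39007633", "38593565");
        System.out.println(intersect(left, right));
    }
}
```

Note that `retainAll` against a `List` calls `contains` on that list for every element, so for large right-hand arrays it may be worth copying the right list into a set first.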

Some functions do not work

0: jdbc:hive2://XX> create temporary function wgs_distance as 'com.github.aaronshan.functions.geo.UDFGeoWgsDistance';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoWgsDistance does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function gcj_to_bd as 'com.github.aaronshan.functions.geo.UDFGeoGcjToBd';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoGcjToBd does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function bd_to_gcj as 'com.github.aaronshan.functions.geo.UDFGeoBdToGcj';
No rows affected (0.014 seconds)
0: jdbc:hive2://XX> create temporary function wgs_to_gcj as 'com.github.aaronshan.functions.geo.UDFGeoWgsToGcj';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoWgsToGcj does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function gcj_to_wgs as 'com.github.aaronshan.functions.geo.UDFGeoGcjToWgs';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoGcjToWgs does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://XX> create temporary function gcj_extract_wgs as 'com.github.aaronshan.functions.geo.UDFGeoGcjExtractWgs';
ERROR : FAILED: Class com.github.aaronshan.functions.geo.UDFGeoGcjExtractWgs does not implement UDF, GenericUDF, or UDAF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)

The jar comes from your 2.2.0 release. Is there a problem with it?

[BUG] Null pointer error when applying the UDF

Hello,

I tried to apply these UDFs in spark with hive support. Here is the code:

// register UDF
spark.sql("create temporary function id_card_province as 'cc.shanruifeng.functions.card.UDFChinaIdCardProvince'");
        	
// get file
Dataset<Row> rawdata = spark.read().csv("./src/main/resources/starM.csv");

// use UDF
rawdata.createOrReplaceTempView("starM");
Dataset<Row> udfModified = spark.sql("SELECT *, id_card_province(_c13) FROM starM");
udfModified.show();

and I got this error:

org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text cc.shanruifeng.functions.card.UDFChinaIdCardProvince.evaluate(org.apache.hadoop.io.Text)  on object cc.shanruifeng.functions.card.UDFChinaIdCardProvince@5c622859 of class cc.shanruifeng.functions.card.UDFChinaIdCardProvince with arguments {652423184510291234:org.apache.hadoop.io.Text} of size 1
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:981)
	at org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:91)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_6$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:235)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:957)
	... 18 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.io.Text.encode(Text.java:450)
	at org.apache.hadoop.io.Text.set(Text.java:198)
	at cc.shanruifeng.functions.card.UDFChinaIdCardProvince.evaluate(UDFChinaIdCardProvince.java:23)
	... 23 more

Could you please give me some advice on this problem? The column is not null anyway.
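
One possible reading of the trace, offered as an assumption rather than a confirmed diagnosis: the argument shown is non-null, yet the `NullPointerException` is raised inside `Text.set` called from `evaluate`, which would be consistent with the UDF passing a null `String` to `Text.set` (for instance, a lookup that found no match). The sketch below is plain Java with a hypothetical lookup table and method name; it only illustrates returning null from the lookup step so that a caller can skip `Text.set` instead of crashing.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeLookupSketch {
    // Hypothetical lookup table; the real UDF's data is not shown in the issue.
    private static final Map<String, String> PROVINCE_BY_PREFIX = new HashMap<>();

    static {
        PROVINCE_BY_PREFIX.put("11", "Beijing");
        PROVINCE_BY_PREFIX.put("31", "Shanghai");
    }

    // Returns null (instead of throwing) when the input is missing, too short,
    // or has a prefix that is not in the table.
    static String provinceOrNull(String idCard) {
        if (idCard == null || idCard.length() < 2) {
            return null;
        }
        return PROVINCE_BY_PREFIX.get(idCard.substring(0, 2));
    }

    public static void main(String[] args) {
        String province = provinceOrNull("110101199001011234");
        // A UDF would check for null here and return null to Hive
        // rather than handing the value to Text.set.
        System.out.println(province);
        System.out.println(provinceOrNull("00"));
    }
}
```

With a guard like this, an unmatched value would surface as NULL in the query result, which makes the offending rows easy to find with a `WHERE ... IS NULL` filter.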

The logic in UDFArrayIntersect is wrong: it can throw an array index out-of-bounds exception or return incorrect results.

https://github.com/aaronshan/hive-third-functions/blame/f98fef86d328882c85ea40b69b14375d90d44201/src/main/java/com/github/aaronshan/functions/array/UDFArrayIntersect.java#L160
select default.array_intersect(array("39236600","38943350","39007633"),array("39236600","38943350","39007633","38593565","39165420","39119191","39223090","39273131","39113697","39264583","38643724","39243639","39273301","39153039","39152750","38422867","39194210"));
The return value should be {"39236600","38943350","39007633"}, but only one element is actually returned.

After reading the code, I found that the logic of the compare method is wrong. It should be fixed as follows:
private int compare(ListObjectInspector arrayOI, Object array, int[] positions, int position1, int position2) {
    ObjectInspector arrayElementOI = arrayOI.getListElementObjectInspector();
    // Resolve both indices through the sorted positions array before reading the elements
    Object arrayElementTmp1 = arrayOI.getListElement(array, positions[position1]);
    Object arrayElementTmp2 = arrayOI.getListElement(array, positions[position2]);
    return ObjectInspectorUtils.compare(arrayElementTmp1, arrayElementOI, arrayElementTmp2, arrayElementOI);
}
That is, add an int[] positions parameter and pass in leftPositions or rightPositions; the places where the method is called need to be updated accordingly.
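
To illustrate why both indices must go through the positions array, here is a minimal sketch of the general sort-then-merge technique on plain string arrays. It is not the repository's code: the class, the helper names, and the use of `Integer[]` instead of `int[]` (needed only so that `Arrays.sort` can take a comparator) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedPositionsIntersectSketch {
    // Returns the indices 0..n-1 ordered by the values they point to.
    static Integer[] sortedPositions(final String[] values) {
        Integer[] positions = new Integer[values.length];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, (a, b) -> values[a].compareTo(values[b]));
        return positions;
    }

    // Mirrors the proposed signature: position1 and position2 are indices into
    // positions, not into the array itself.
    static int compare(String[] array, Integer[] positions, int position1, int position2) {
        return array[positions[position1]].compareTo(array[positions[position2]]);
    }

    static List<String> intersect(String[] left, String[] right) {
        Integer[] leftPositions = sortedPositions(left);
        Integer[] rightPositions = sortedPositions(right);
        List<String> result = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < left.length && r < right.length) {
            int cmp = left[leftPositions[l]].compareTo(right[rightPositions[r]]);
            if (cmp < 0) {
                l++;
            } else if (cmp > 0) {
                r++;
            } else {
                result.add(left[leftPositions[l]]);
                l++;
                r++;
                // Skip duplicates of the value just emitted, in sorted order.
                while (l < left.length && compare(left, leftPositions, l, l - 1) == 0) {
                    l++;
                }
                while (r < right.length && compare(right, rightPositions, r, r - 1) == 0) {
                    r++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] left = {"39236600", "38943350", "39007633"};
        String[] right = {"39236600", "38943350", "39007633", "38593565", "39165420"};
        System.out.println(intersect(left, right));
    }
}
```

If the duplicate-skipping step compared `array[position1]` with `array[position2]` directly, it would be comparing neighbours in the original, unsorted order, so unrelated elements could be skipped or kept by accident.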

What is the syntax for json path?

create table temp.test_explode (userid string, log string) partitioned by (day string) stored as orc;
insert into table temp.test_explode partition (day = '2017-11-01') values ('u1', '[{  "action": "a" },  {  "action": "b"} ]');
insert into table temp.test_explode partition (day = '2017-11-02') values ('u2', '[{  "action": "a" , "arg1": "1"},  {  "action": "b", "arg2": "2"} ]');


create temporary function udf_json_array_extract as 'cc.shanruifeng.functions.json.UDFJsonArrayExtract';

create temporary function udf_json_array_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonExtractScalar';


select userid, udf_json_array_extract ( log, '$.action' ) from  temp.test_explode  where day = '2017-11-01';

-- result
u1	["\"a\"","\"b\""]

select userid, udf_json_array_extract_scalar ( log, '$.action' ) from  temp.test_explode  where day = '2017-11-01';

-- result
u1	NULL

I want to get an array like ["a","b"], but the second query above returns NULL.

Am I using the wrong syntax for the JSON path? What is the correct syntax for the path?
