Comments (18)

FlxRobin commented on May 17, 2024

What is the status of this issue? The latest packaged release (0.56) does not seem to support this.

Query 20140114_114351_00007_iv7th failed: Error opening Hive split hdfs://path/to/file (offset=0, length=25658859) using org.apache.hadoop.hive.ql.io.RCFileInputFormat: Unknown codec: com.hadoop.compression.lzo.LzoCodec

from presto.

damiencarol commented on May 17, 2024

@FlxRobin
With the master version, I only had to copy the file hadoop-lzo-0.4.20-SNAPSHOT.jar to the plugin directory.

cp hadoop-1.2.1/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar ./presto-server-0.57-SNAPSHOT/plugin/hive-hadoop1/

When I do that, it works like a charm. (You need to stop and start the cluster.)

Presto loads the jars in the directory of each plugin it loads.

In Hive (taken from the official docs):

CREATE TABLE test_lzo2
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
AS
SELECT transaction_id AS id,
       profil AS label
FROM transaction
LIMIT 5

In Presto:

presto:casino> select * from test_lzo2;
    id    |  label
----------+----------
 13701376 | Fidélisé
 13701377 | Fidélisé
 13701378 | Fidélisé
 13701379 | Fidélisé
 13701380 | Fidélisé
(5 rows)

Query 20140114_164727_00004_xc7ir, FINISHED, 2 nodes
Splits: 2 total, 2 done (100.00%)
0:01 [10 rows, 220B] [12 rows/s, 278B/s]

FlxRobin commented on May 17, 2024

Thanks Damien, I had two minor issues while enabling this:

  1. You have to add the "native-lzo" library to Presto. You can do this by adding the line below to your presto-server/etc/jvm.config file:
    -Djava.library.path=/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64

  2. If you use CDH4, the plugin path is not presto-server/plugin/hive-hadoop1/ but presto-server/plugin/hive-cdh4/.
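
The two points above (the codec jar must be visible, and the native library directory must be on java.library.path) can be checked with a small diagnostic. This is only a sketch: the class name LzoSetupCheck and its methods are made up for illustration, and because Presto loads each plugin in its own class loader, running it standalone only tells you about the classpath you start it with, not about what the Hive plugin sees.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Presto or hadoop-lzo.
final class LzoSetupCheck {
    // Returns true if the named class can be found by this class's class loader.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, LzoSetupCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Splits a java.library.path value into its non-empty directory entries.
    static List<String> libraryPathEntries(String libraryPath) {
        List<String> entries = new ArrayList<>();
        if (libraryPath == null) {
            return entries;
        }
        for (String entry : libraryPath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Codec class name taken from the "Unknown codec" error in this thread.
        String codec = "com.hadoop.compression.lzo.LzoCodec";
        System.out.println(codec + " loadable: " + isClassAvailable(codec));
        for (String dir : libraryPathEntries(System.getProperty("java.library.path"))) {
            System.out.println("native library dir: " + dir
                    + " (exists: " + new File(dir).isDirectory() + ")");
        }
    }
}
```

Running it with the hadoop-lzo jar on the classpath and the same -Djava.library.path value as in jvm.config should report the codec as loadable and list the native directory as existing.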

lvxin1986 commented on May 17, 2024

I am using Presto 0.75, and it doesn't seem to support LZO compression. Does anyone know whether Presto supports LZO?

yuananf commented on May 17, 2024

@electrum Hi, I did this and saw that Presto can now read LZO-compressed text files.
However, it does not support reading LZO index files or splitting LZO text files. I understand this is due to the modification of the hadoop-lzo source code, but it still confuses me.
Hive handles LZO index files well without any modification to the hadoop-lzo source code. Do you have any idea how Hive does this? What should I do to support LZO indexes and splitting of LZO files?

electrum commented on May 17, 2024

That patch modifies the getRecordReader() method to effectively skip index files. Presto doesn't use the InputFormat's split computation (it has its own). Assuming that patch makes it work at all, it is likely that the splitting doesn't work (I would be wary of wrong answers).
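
To make the concern concrete, here is a minimal sketch of an engine-side split planner that skips index files and never cuts a file it cannot safely split. It is not Presto's actual code: the class SplitPlanSketch, its methods, and the ".index" suffix check are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical split planner; illustrates the idea only.
final class SplitPlanSketch {
    // A byte range of one file that is handed to a single reader.
    static final class Split {
        final String path;
        final long offset;
        final long length;

        Split(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + " [offset=" + offset + ", length=" + length + "]";
        }
    }

    // Assumption: index files sit next to the data files and end in ".index".
    static boolean isIndexFile(String path) {
        return path.endsWith(".index");
    }

    // Index files yield no splits; non-splittable files yield exactly one split
    // covering the whole file; splittable files are cut into fixed-size ranges.
    static List<Split> plan(String path, long fileLength, boolean splittable, long targetSplitSize) {
        if (targetSplitSize <= 0) {
            throw new IllegalArgumentException("targetSplitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        if (isIndexFile(path) || fileLength == 0) {
            return splits;
        }
        if (!splittable) {
            splits.add(new Split(path, 0, fileLength));
            return splits;
        }
        for (long offset = 0; offset < fileLength; offset += targetSplitSize) {
            splits.add(new Split(path, offset, Math.min(targetSplitSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long size = 25_658_859L; // file length from the error message earlier in this thread
        System.out.println(plan("file.lzo", size, false, 8L * 1024 * 1024));
        System.out.println(plan("file.lzo.index", 1024, false, 8L * 1024 * 1024));
    }
}
```

The risk described above corresponds to taking the splittable branch for an LZO file when the reader has no usable index: each range would start at an arbitrary byte rather than at a compressed block boundary.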

In any case, we recommend switching to a modern format like ORC, RCFile or Parquet. These formats, especially ORC, will be substantially faster for a variety of reasons and have better compression.

yuananf commented on May 17, 2024

Thanks for explaining!

wbchn commented on May 17, 2024

@FlxRobin
I am using Presto 0.86.
It does not work after setting jvm.config and copying hadoop-lzo.jar to plugin/hive-hadoop2 (I use hive-hadoop2).

Is there another solution for versions after 0.57?

yuananf commented on May 17, 2024

@FlxRobin The hadoop-lzo.jar should be built from https://github.com/klbostee/hadoop-lzo.
Works fine for presto 0.96.

wbchn commented on May 17, 2024

@yuananf
Thanks for your reply.

I tried https://github.com/klbostee/hadoop-lzo, but got the same error:

2015-03-05T22:13:35.567-0800    DEBUG   query-execution-4   com.facebook.presto.execution.SqlStageExecution Stage 20150306_061333_00006_y9z6v.1 is FAILED
2015-03-05T22:13:35.570-0800    DEBUG   query-execution-0   com.facebook.presto.execution.SqlStageExecution Stage 20150306_061333_00006_y9z6v.0 is FAILED
2015-03-05T22:13:35.573-0800    ERROR   hive-hive-3 com.google.common.util.concurrent.Futures$CombinedFuture    input future failed.
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:na]
    at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:148) ~[presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.HiveSplitSourceProvider$3.process(HiveSplitSourceProvider.java:258) ~[presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.AsyncWalker.doWalk(AsyncWalker.java:95) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.AsyncWalker.access$000(AsyncWalker.java:34) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.AsyncWalker$1.run(AsyncWalker.java:72) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.SuspendingExecutor$1.run(SuspendingExecutor.java:67) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.reflect.InvocationTargetException: null
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
    at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:145) ~[presto-hive-0.72.jar:0.72]
    ... 14 common frames omitted
Caused by: java.lang.NullPointerException: null
    at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:101) ~[hadoop-lzo.jar:na]
    ... 19 common frames omitted
2015-03-05T22:13:35.573-0800    DEBUG   query-execution-1   com.facebook.presto.execution.QueryStateMachine Query 20150306_061333_00006_y9z6v is FAILED

~~~~ more  ~~~~

2015-03-05T22:13:35.591-0800    ERROR   hive-hive-12    com.google.common.util.concurrent.Futures$CombinedFuture    input future failed.
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:na]
    at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:148) ~[presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.HiveSplitSourceProvider$3.process(HiveSplitSourceProvider.java:258) ~[presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.AsyncWalker.doWalk(AsyncWalker.java:95) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.AsyncWalker.access$000(AsyncWalker.java:34) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.AsyncWalker$1.run(AsyncWalker.java:72) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.SuspendingExecutor$1.run(SuspendingExecutor.java:67) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
    at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.reflect.InvocationTargetException: null
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
    at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:145) ~[presto-hive-0.72.jar:0.72]
    ... 14 common frames omitted
Caused by: java.lang.NullPointerException: null
    at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:101) ~[hadoop-lzo.jar:na]
    ... 19 common frames omitted
2015-03-05T22:13:35.620-0800     INFO   query-execution-1   com.facebook.presto.event.query.QueryMonitor    TIMELINE: Query 20150306_061333_00006_y9z6v :: elapsed 1811.00ms :: planning 273.56ms :: scheduling 1537.00ms :: running 0.00ms :: finishing 1537.00ms :: begin 2015-03-05T22:13:33.760-08:00 :: end 2015-03-05T22:13:35.571-08:00

It seems like some jar conflicts with others after copying hadoop-lzo.jar to the plugin directory, but I have no idea how to solve this. Could you help me?

By the way, I use an AWS EMR environment with Java 7, Hadoop 2.4.0, and Hive 0.13.1.
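
The trace above shows HiveUtil.isSplittable calling isSplitable through reflection, which is why the NullPointerException arrives wrapped in an InvocationTargetException. The sketch below reproduces only that wrapping pattern. The IndexedFormat class is a hypothetical stand-in, and the idea that the input format depends on state filled in by a separate listing step is an assumption, not something this thread confirms about hadoop-lzo.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reproduction of the exception chain; not Presto or hadoop-lzo code.
final class ReflectiveSplittableSketch {
    // Stand-in input format whose isSplitable relies on a map that is only
    // populated when listFiles has been called for the same path first.
    static final class IndexedFormat {
        private final Map<String, Boolean> indexes = new HashMap<>();

        void listFiles(String path, boolean hasIndex) {
            indexes.put(path, hasIndex);
        }

        boolean isSplitable(String path) {
            // Unboxing a missing (null) entry throws NullPointerException.
            return indexes.get(path);
        }
    }

    // Invokes isSplitable reflectively and returns the underlying cause if the
    // call failed, or null if it completed normally.
    static Throwable causeOfReflectiveCall(IndexedFormat format, String path) {
        try {
            Method method = IndexedFormat.class.getDeclaredMethod("isSplitable", String.class);
            method.setAccessible(true);
            method.invoke(format, path);
            return null;
        } catch (InvocationTargetException e) {
            // The exception thrown inside the target method is carried as the cause.
            return e.getCause();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IndexedFormat format = new IndexedFormat();
        System.out.println("without listing: " + causeOfReflectiveCall(format, "a.lzo"));
        format.listFiles("a.lzo", true);
        System.out.println("after listing:   " + causeOfReflectiveCall(format, "a.lzo"));
    }
}
```

When reading such traces, the innermost "Caused by" line is the one to act on. The reflection frames in between only show how the call was made.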

electrum commented on May 17, 2024

This will be fixed soon with the new native LZO implementation in https://github.com/airlift/aircompressor

yuananf commented on May 17, 2024

Great! Looking forward to it.

mombergm commented on May 17, 2024

@electrum Do you know which future release will include aircompressor?

xcrossed commented on May 17, 2024

@electrum Hello, have you fixed this problem? Does anyone know how to fix it?

Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:253)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:233)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:69)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:167)
    at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
    at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$14(ResumableTasks.java:33)
    at com.facebook.presto.hive.util.ResumableTasks$$Lambda$487/1333014523.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.executeOrMerge(BoundedExecutor.java:69)
    at io.airlift.concurrent.BoundedExecutor.access$000(BoundedExecutor.java:28)
    at io.airlift.concurrent.BoundedExecutor$1.run(BoundedExecutor.java:40)
    ... 3 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:250)
    ... 12 more
Caused by: java.lang.NullPointerException
    at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:103)
    ... 17 more

xcrossed commented on May 17, 2024

The approach described in https://groups.google.com/forum/#!topic/presto-users/_Fe9YrZ3gYg fixes it.

xmly commented on May 17, 2024

@electrum @mombergm Is aircompressor being used in Presto now?

Thanks

electrum commented on May 17, 2024

@xmly Are you interested in text files? I filed a new issue for that: #7348

Please comment on the new issue with specific use cases, examples of how to create the tables in Hive, etc.

I'm going to close this issue since we now have our own LZO implementation and will use that instead.

electrum commented on May 17, 2024

If you're interested in LZO for SequenceFile, please file a new issue for that, as that's a completely different implementation.
