Comments (18)
What is the status of this issue? Latest packaged release (0.56) does not seem to support this.
Query 20140114_114351_00007_iv7th failed: Error opening Hive split hdfs://path/to/file (offset=0, length=25658859) using org.apache.hadoop.hive.ql.io.RCFileInputFormat: Unknown codec: com.hadoop.compression.lzo.LzoCodec
from presto.
@FlxRobin
With the master version, I only had to copy hadoop-lzo-0.4.20-SNAPSHOT.jar into the plugin directory:
cp hadoop-1.2.1/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar ./presto-server-0.57-SNAPSHOT/plugin/hive-hadoop1/
After that, it works like a charm (you need to stop and start the cluster).
Presto loads the jars found in the directory of each plugin it loads.
In Hive (taken from the official docs):
CREATE TABLE test_lzo2
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
AS
SELECT transaction_id as id,
profil as label
from transaction
limit 5
In Presto:
presto:casino> select * from test_lzo2;
id | label
----------+----------
13701376 | Fidélisé
13701377 | Fidélisé
13701378 | Fidélisé
13701379 | Fidélisé
13701380 | Fidélisé
(5 rows)
Query 20140114_164727_00004_xc7ir, FINISHED, 2 nodes
Splits: 2 total, 2 done (100.00%)
0:01 [10 rows, 220B] [12 rows/s, 278B/s]
Thanks Damien, I had two minor issues while enabling this:
- You have to add the native LZO library to Presto. You can do this by adding the line below to your presto-server/etc/jvm.config file:
  -Djava.library.path=/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
- If you use CDH4, the plugin path is of course not presto-server/plugin/hive-hadoop1/ but presto-server/plugin/hive-cdh4/
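The steps described so far in this thread (copy the hadoop-lzo jar into the Hive plugin directory, then point the JVM at the native LZO library) could be scripted roughly as below. This is only a sketch: the function name install_lzo and its arguments are hypothetical and not part of Presto, the paths are the ones quoted in this thread and will differ per install, and it would need to run on every node before the cluster is restarted.

```shell
#!/bin/sh
# Hypothetical helper, not part of Presto: copies the hadoop-lzo jar into a
# Hive plugin directory and adds the native library path to etc/jvm.config.
# Usage: install_lzo <hadoop-lzo jar> <presto-server dir> <native lib dir> [plugin name]
install_lzo() {
    jar=$1
    presto_home=$2
    native_dir=$3
    plugin=${4:-hive-hadoop1}   # e.g. hive-cdh4 on CDH4

    plugin_dir="$presto_home/plugin/$plugin"
    jvm_config="$presto_home/etc/jvm.config"

    [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
    [ -d "$plugin_dir" ] || { echo "plugin dir not found: $plugin_dir" >&2; return 1; }

    cp "$jar" "$plugin_dir/" || return 1

    # Append the option only if jvm.config does not already set java.library.path.
    if ! grep -q -- '-Djava.library.path=' "$jvm_config" 2>/dev/null; then
        echo "-Djava.library.path=$native_dir" >> "$jvm_config"
    fi
}

# Example (paths taken from this thread; adjust to your install):
#   install_lzo hadoop-1.2.1/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar \
#       ./presto-server-0.57-SNAPSHOT \
#       /usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
```

The function deliberately leaves an existing java.library.path setting alone rather than overwriting it, so if one is already present you would have to merge the native directory into it by hand.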
I am using Presto 0.75, and it does not seem to support LZO compression. Does anyone know whether Presto supports LZO?
@electrum Hi, I did this and saw that Presto can now read LZO-compressed text files.
However, it does not support reading LZO index files or splitting LZO text files. I understand this is due to the modification of the hadoop-lzo source code, but it still confuses me.
I saw that Hive handles LZO index files well without any change to the hadoop-lzo source code. Do you have any idea how Hive does this? What should I do to support LZO index files and splitting of LZO files?
That patch modifies the getRecordReader() method to effectively skip index files. Presto doesn't use the InputFormat's split computation (it has its own). Assuming that patch makes it work at all, it is likely that the splitting doesn't work (I would be wary of wrong answers).
In any case, we recommend switching to a modern format like ORC, RCFile or Parquet. These formats, especially ORC, will be substantially faster for a variety of reasons and have better compression.
Thanks for explaining!
@FlxRobin
I am using Presto 0.86.
It does not work after setting jvm.config and copying hadoop-lzo.jar to plugin/hive-hadoop2 (I use hive-hadoop2).
Is there another solution for versions after 0.57?
@FlxRobin The hadoop-lzo.jar should be built from https://github.com/klbostee/hadoop-lzo.
That works fine with Presto 0.96.
@yuananf
Thanks for your reply.
I tried https://github.com/klbostee/hadoop-lzo, but got the same error:
2015-03-05T22:13:35.567-0800 DEBUG query-execution-4 com.facebook.presto.execution.SqlStageExecution Stage 20150306_061333_00006_y9z6v.1 is FAILED
2015-03-05T22:13:35.570-0800 DEBUG query-execution-0 com.facebook.presto.execution.SqlStageExecution Stage 20150306_061333_00006_y9z6v.0 is FAILED
2015-03-05T22:13:35.573-0800 ERROR hive-hive-3 com.google.common.util.concurrent.Futures$CombinedFuture input future failed.
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:na]
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:148) ~[presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.HiveSplitSourceProvider$3.process(HiveSplitSourceProvider.java:258) ~[presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.AsyncWalker.doWalk(AsyncWalker.java:95) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.AsyncWalker.access$000(AsyncWalker.java:34) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.AsyncWalker$1.run(AsyncWalker.java:72) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.SuspendingExecutor$1.run(SuspendingExecutor.java:67) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:145) ~[presto-hive-0.72.jar:0.72]
... 14 common frames omitted
Caused by: java.lang.NullPointerException: null
at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:101) ~[hadoop-lzo.jar:na]
... 19 common frames omitted
2015-03-05T22:13:35.573-0800 DEBUG query-execution-1 com.facebook.presto.execution.QueryStateMachine Query 20150306_061333_00006_y9z6v is FAILED
~~~~ more ~~~~
2015-03-05T22:13:35.591-0800 ERROR hive-hive-12 com.google.common.util.concurrent.Futures$CombinedFuture input future failed.
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:na]
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:148) ~[presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.HiveSplitSourceProvider$3.process(HiveSplitSourceProvider.java:258) ~[presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.AsyncWalker.doWalk(AsyncWalker.java:95) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.AsyncWalker.access$000(AsyncWalker.java:34) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.AsyncWalker$1.run(AsyncWalker.java:72) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.SuspendingExecutor$1.run(SuspendingExecutor.java:67) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.executeOrMerge(BoundedExecutor.java:82) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor.access$000(BoundedExecutor.java:41) [presto-hive-0.72.jar:0.72]
at com.facebook.presto.hive.util.BoundedExecutor$1.run(BoundedExecutor.java:53) [presto-hive-0.72.jar:0.72]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:145) ~[presto-hive-0.72.jar:0.72]
... 14 common frames omitted
Caused by: java.lang.NullPointerException: null
at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:101) ~[hadoop-lzo.jar:na]
... 19 common frames omitted
2015-03-05T22:13:35.620-0800 INFO query-execution-1 com.facebook.presto.event.query.QueryMonitor TIMELINE: Query 20150306_061333_00006_y9z6v :: elapsed 1811.00ms :: planning 273.56ms :: scheduling 1537.00ms :: running 0.00ms :: finishing 1537.00ms :: begin 2015-03-05T22:13:33.760-08:00 :: end 2015-03-05T22:13:35.571-08:00
It seems like some jars conflict with others after copying hadoop-lzo.jar into the plugin directory, but I have no idea how to solve this. Could you help me?
By the way, I am using an AWS EMR environment with Java 7, Hadoop 2.4.0, and Hive 0.13.1.
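When reading a nested Java trace like the one above, the last "Caused by:" line is the innermost cause. Here that is the NullPointerException raised in DeprecatedLzoTextInputFormat.isSplitable, which HiveUtil.isSplittable reaches through reflection. A minimal sketch for pulling that line out of a log follows; root_cause is a hypothetical helper name, and the log file to pass in depends on your setup.

```shell
#!/bin/sh
# Hypothetical helper: print the last "Caused by:" line found in the given
# log files (or on stdin when no file is given). In a single Java stack
# trace, that line names the innermost exception.
root_cause() {
    grep -h 'Caused by:' "$@" | tail -n 1
}

# Example (the log path is an assumption; use your server's log file):
#   root_cause /path/to/server.log
```

If the log holds several unrelated traces, this only reports the last one, so it is best run on a single pasted trace.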
This will be fixed soon with the new native LZO implementation in https://github.com/airlift/aircompressor
Great! Looking forward to it.
@electrum Do you know which future release will include aircompressor?
@electrum Hello, has this problem been fixed? Does anyone know how to fix it?
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:253)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:233)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:69)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:167)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$14(ResumableTasks.java:33)
at com.facebook.presto.hive.util.ResumableTasks$$Lambda$487/1333014523.run(Unknown Source)
at io.airlift.concurrent.BoundedExecutor.executeOrMerge(BoundedExecutor.java:69)
at io.airlift.concurrent.BoundedExecutor.access$000(BoundedExecutor.java:28)
at io.airlift.concurrent.BoundedExecutor$1.run(BoundedExecutor.java:40)
... 3 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:250)
... 12 more
Caused by: java.lang.NullPointerException
at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:103)
... 17 more
The approach described at https://groups.google.com/forum/#!topic/presto-users/_Fe9YrZ3gYg can fix it.
@electrum @mombergm Is aircompressor being used in Presto now?
Thanks
@xmly Are you interested in text files? I filed a new issue for that: #7348
Please comment on the new issue with specific use cases, examples of how to create the tables in Hive, etc.
I'm going to close this issue since we now have our own LZO implementation and will use that instead.
If you're interested in LZO for SequenceFile, please file a new issue for that, as that's a completely different implementation.