Comments (4)
We have not seen this in our self-hosted environment. Might be due to something EC2-specific. Do you have any theories about the root cause?
gszjulcsi wrote: We use the distributed LZO indexer on EMR (Hadoop version 1.0.3), with files stored on Amazon S3.
Sometimes (observed twice so far) we hit the following issue:
All lzo.index files are generated, but some of the lzo.index.tmp files are not deleted and cause problems when we process the data with Pig. No exception or error is thrown during indexing, and the job is reported as having run successfully.
from hadoop-lzo.
Meanwhile, we have noticed that these index.tmp files have disappeared. We
suspect this was an S3 eventual consistency issue: it took S3 a very long
time (ca. 7 hours) to become consistent.
2014-01-29 dvryaboy wrote:
We have not seen this in our self-hosted environment. Might be due to
something EC2-specific. Do you have any theories about the root cause?
from hadoop-lzo.
I see. Well, perhaps it would make sense to add a filter to the LZO input formats so that they ignore these temp files and you don't get an error. Feel free to send a pull request with such a change; we will be happy to take a look.
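A minimal sketch of the kind of filter suggested here, assuming the leftover files can be recognized purely by the `.lzo.index.tmp` suffix mentioned in this thread. The class and method names are hypothetical and are not part of hadoop-lzo; the sketch works on plain file-name strings so it runs without Hadoop on the classpath. In a real input format the same check would more likely live in an implementation of Hadoop's `org.apache.hadoop.fs.PathFilter`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hadoop-lzo: decides which file names
// an LZO input format should consider at all.
final class LzoTempFileFilter {
    // Suffix of the leftover temporary index files reported in this thread.
    static final String TEMP_INDEX_SUFFIX = ".lzo.index.tmp";

    private LzoTempFileFilter() {}

    // Returns false for leftover temporary index files, true for everything else.
    static boolean accept(String fileName) {
        return !fileName.endsWith(TEMP_INDEX_SUFFIX);
    }

    // Keeps only the accepted names, preserving their original order.
    static List<String> filter(List<String> fileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (accept(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

Matching on the full suffix rather than on `.tmp` alone is a deliberate choice in this sketch: it avoids hiding unrelated user files that happen to end in `.tmp`. Whether that is the right trade-off for the real input formats is a question for the pull request.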
from hadoop-lzo.
Excluding .tmp files is a good fix.
There are other subtle issues with S3 caused by these delays, e.g. https://github.com/kevinweil/elephant-bird/issues/309
from hadoop-lzo.