
Comments (4)

dvryaboy commented on August 12, 2024

We have not seen this in our self-hosted environment. It might be due to something EC2-specific. Do you have any theories about the root cause?

gszjulcsi wrote:

> We use the distributed LZO indexer on EMR (Hadoop version 1.0.3), with files stored on Amazon S3.
>
> Sometimes (observed twice so far) we have the following issue:
>
> All lzo.index files are generated, but some of the lzo.index.tmp files are not deleted and cause problems when they are processed with Pig. No exception or error is thrown during indexing, and the job is reported to have run successfully.


from hadoop-lzo.

gszjulcsi commented on August 12, 2024

Meanwhile, we have noticed that these index.tmp files have disappeared. We
suspect that it was an S3 eventual consistency issue: it took S3 a long
time (approx. 7 hours) to become consistent.

2014-01-29 dvryaboy wrote:

> We have not seen this in our self-hosted environment. It might be due to
> something EC2-specific. Do you have any theories about the root cause?



dvryaboy commented on August 12, 2024

I see. Well, perhaps it would make sense to add a filter to the LZO input formats so that they ignore these temp files and you don't get an error. Feel free to send a pull request with such a change; we will be happy to take a look.
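A minimal sketch of what such a filter might look like, written against plain file names so that it runs without Hadoop on the classpath. The class `LzoInputNameFilter` and its methods are hypothetical and not part of hadoop-lzo; the suffixes are the ones mentioned in this thread. In a real job the same check would presumably be wired into the input format's file listing, for example through Hadoop's `PathFilter` interface, and that wiring is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; this class is not part of hadoop-lzo.
final class LzoInputNameFilter {
    // Suffixes as named in the thread: finished indexes and leftover temporary indexes.
    static final String INDEX_SUFFIX = ".lzo.index";
    static final String TMP_INDEX_SUFFIX = ".lzo.index.tmp";

    private LzoInputNameFilter() {}

    // Accept a file name unless it is a finished index or a leftover temporary index.
    static boolean accept(String fileName) {
        return !fileName.endsWith(INDEX_SUFFIX) && !fileName.endsWith(TMP_INDEX_SUFFIX);
    }

    // Keep only the names that pass accept(), preserving their original order.
    static List<String> keepReadable(List<String> fileNames) {
        return fileNames.stream()
                .filter(LzoInputNameFilter::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example listing with one index file left in its temporary state.
        List<String> listing = Arrays.asList(
                "part-00000.lzo",
                "part-00000.lzo.index",
                "part-00001.lzo",
                "part-00001.lzo.index.tmp");
        System.out.println(keepReadable(listing));
    }
}
```

Matching on the file name suffix keeps the check independent of whether S3 has finished removing the temporary file, which is the situation described above.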


rangadi commented on August 12, 2024

Excluding .tmp files is a good fix.

There are other subtle issues with S3 because of these delays, e.g. https://github.com/kevinweil/elephant-bird/issues/309

