Comments (3)
@jgerman it sounds like your issue is unrelated to missing lookups when loading checkpoints.
Check the backpressure on your tasks. It is the most frustrating and hard-to-identify cause of random peer timeouts. Despite the brief section in the user guide, Onyx 0.14 does not handle backpressure all that well. When an upstream task pushes data out faster than the downstream tasks can consume it, the affected peer can stop processing heartbeat messages.
If you have a lot of backpressure, consider writing to an intermediate Kafka topic to decouple the producer from the slower consumer. Tuning the idle-sleeps in the config, as the user guide suggests, is not that effective in my experience.
http://www.onyxplatform.org/docs/user-guide/0.14.x/#backpressure
from onyx.
We actually see this issue in our long-running jobs even when no changes to the job have been made. The stack trace is almost identical, except that you're using S3 and we're using ZK.
I'm pretty sure ours is due to a task rebooting but haven't figured out why it happens, or why it appears to consistently be the same output task.
from onyx.
Thanks! Any suggestions for measuring backpressure? I've only looked at it indirectly, via consumer lag on our Kafka topics.
I've definitely seen issues with checkpoints not being written, which I assumed was due to heartbeats (barriers?) not being processed.
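As a rough sketch of the consumer-lag approach mentioned above: lag per partition is the log end offset minus the committed offset, and a total that keeps rising across samples suggests the downstream side is not keeping up. The function names, the `(topic, partition)` key shape, and the choice to treat a missing committed offset as 0 are all assumptions for illustration. The offsets themselves would have to come from whatever Kafka tooling you already use, and none of this is an Onyx API.

```python
from typing import Dict, List, Tuple

# Hypothetical key shape: (topic name, partition number).
Partition = Tuple[str, int]


def consumer_lag(
    end_offsets: Dict[Partition, int],
    committed_offsets: Dict[Partition, int],
) -> Dict[Partition, int]:
    """Return per-partition lag: log end offset minus committed offset.

    Assumption: a partition with no committed offset is treated as
    committed at 0, so its lag equals its end offset.
    """
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero in case the two snapshots were taken at slightly
        # different times and the committed offset is ahead.
        lag[partition] = max(end - committed, 0)
    return lag


def lag_is_growing(samples: List[Dict[Partition, int]], min_increase: int = 1) -> bool:
    """Return True if total lag rose by at least min_increase between
    every consecutive pair of samples (needs two or more samples)."""
    totals = [sum(sample.values()) for sample in samples]
    if len(totals) < 2:
        return False
    return all(later - earlier >= min_increase
               for earlier, later in zip(totals, totals[1:]))
```

Sampling the lag every few seconds and passing the recent samples to `lag_is_growing` gives a crude signal only. Steady or shrinking lag does not rule out backpressure between tasks inside the job, since this measures just the input side.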
from onyx.