Comments (9)
Yes, the probability of multiple executions will be slightly higher, but not unmanageably so. Unexpected netsplits and/or node deaths are not that common, especially if a graceful exit with job draining is implemented. I think this is probably an unavoidable cost of increased throughput.
As for the lock, I'm not sure you have fully understood my original proposal. In this new scenario, there will be one and exactly one advisory lock taken by one global dispatcher. Workers will not be required to take any locks at all.
It will not need the same query; it can work with a simple lockless SELECT LIMIT, which would be hyper fast. There will be some code shared, around enqueuing and deleting jobs. I imagine it being implemented as a separate dispatcher module, so the user can choose which one they boot in their supervision tree, e.g. MultiDispatcher or SingletonDispatcher.
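To make the "lockless SELECT LIMIT" idea concrete, here is a minimal sketch, not Rihanna code. It uses Python's built-in sqlite3 in place of Postgres, and the `jobs` table, its columns, and the helper names `fetch_batch` and `delete_job` are all hypothetical. The point is only that a single dispatcher can read a batch with a plain ordered SELECT and no per-row locks, because nothing else is competing for the rows.

```python
import sqlite3


def fetch_batch(conn, limit):
    """Read up to `limit` pending jobs, oldest first, taking no row locks."""
    cur = conn.execute(
        "SELECT id, payload FROM jobs ORDER BY id LIMIT ?", (limit,)
    )
    return cur.fetchall()


def delete_job(conn, job_id):
    """Remove a job once it has been executed."""
    conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO jobs (payload) VALUES (?)", [("a",), ("b",), ("c",)]
)

# The single dispatcher reads a batch, hands it out, then deletes finished jobs.
batch = fetch_batch(conn, 2)
for job_id, _payload in batch:
    delete_job(conn, job_id)

remaining = fetch_batch(conn, 2)
```

This is only safe if exactly one dispatcher is reading at a time, which is what the single advisory lock is meant to guarantee.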
from rihanna.
This architecture would need to perform some kind of durable state management that can handle nodes and workers going down, including the master. I'm sceptical that this would be orders of magnitude faster than the same thing implemented in Postgres, though I have no data to back up that feeling.
How do you intend the state management and failure detection to work with this design?
from rihanna.
Nope, no durable state management is required. If a node goes down, we receive a down message to the dispatcher and simply retry the job on a new node.
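A rough sketch of that bookkeeping, as an in-memory simulation rather than real Erlang monitoring: the `Dispatcher` class, its method names, and the round-robin node choice are all assumptions made for illustration. The dispatcher only remembers which jobs are in flight on which node, and when a node is reported down it hands those jobs to a node that is still alive.

```python
from collections import defaultdict


class Dispatcher:
    """Tracks in-flight jobs per node, in memory only (hypothetical design)."""

    def __init__(self, nodes):
        self.alive = list(nodes)
        self.in_flight = defaultdict(list)  # node name -> list of job ids
        self._next = 0

    def dispatch(self, job_id):
        """Assign a job to a live node, round-robin, and return that node."""
        if not self.alive:
            raise RuntimeError("no live nodes to dispatch to")
        node = self.alive[self._next % len(self.alive)]
        self._next += 1
        self.in_flight[node].append(job_id)
        return node

    def job_done(self, node, job_id):
        """Forget a job once its worker reports success."""
        self.in_flight[node].remove(job_id)

    def node_down(self, node):
        """Stand-in for receiving a down message: retry the node's jobs elsewhere."""
        self.alive.remove(node)
        orphaned = self.in_flight.pop(node, [])
        return {job_id: self.dispatch(job_id) for job_id in orphaned}


d = Dispatcher(["a", "b"])
d.dispatch(1)              # goes to "a"
d.dispatch(2)              # goes to "b"
d.dispatch(3)              # goes to "a"
d.job_done("a", 1)
moved = d.node_down("a")   # job 3 was still running on "a"
```

If node "a" was only partitioned rather than dead, job 3 may complete there as well as on "b", which is the duplicate execution discussed in this thread.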
from rihanna.
What happens when the global lock process goes down?
What happens when a node is isolated from the global lock process by a network partition?
from rihanna.
First case scenario:
A new singleton will be booted which will re-acquire the global lock and start reading jobs again. Some jobs may be executed twice.
Second case scenario:
Erlang's built-in monitoring will realise the network partition, interpret that node as down and assume none of the jobs it was running have been executed. The global lock process will re-dispatch these jobs to a node that is alive. Some jobs may be executed twice.
from rihanna.
What happens to the workers? All killed by the exit from the global lock process? In cloud environments network partitions are common (Erlang was designed for more reliable networks) so this may cause some disruption. I'm not sure how fast global links are, would be cool to test this.
In the network partition situation if we're using global processes we'll end up with at least two nodes running the global lock process. Would this be safe? If we're still running the same SQL query to the database it would be but I'm unsure if that was the intention.
All sounds fun so far :) I'd suggest that (at some point) it'd be worth doing some preliminary benchmarking so we can get a better understanding.
from rihanna.
Workers on the partitioned node may continue to run, which is why some jobs may execute twice. There was always a conscious design choice in Rihanna to guarantee at-least-once execution, hence this failure mode.
We will never have two nodes running the global lock process, because Postgres will only ever grant the advisory lock once. If a netsplit produces two master nodes, one of them will fail to take the lock and simply do nothing.
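As a sketch of that behaviour, the following simulates a try-lock in memory. `FakeAdvisoryLocks` is a stand-in for Postgres session-level advisory locks (in real Postgres the non-blocking call is `pg_try_advisory_lock`, and the lock is released when the session ends); the `Dispatcher` class, the lock key, and the session names are hypothetical.

```python
class FakeAdvisoryLocks:
    """In-memory stand-in for session-level advisory locks."""

    def __init__(self):
        self._owners = {}  # lock key -> owning session

    def try_lock(self, key, session):
        """Non-blocking: True if `session` holds the lock afterwards."""
        owner = self._owners.setdefault(key, session)
        return owner == session

    def release_session(self, session):
        """Simulate a session ending, which frees every lock it held."""
        for key in [k for k, o in self._owners.items() if o == session]:
            del self._owners[key]


class Dispatcher:
    LOCK_KEY = 42  # arbitrary key chosen for this example

    def __init__(self, session, locks):
        self.session = session
        self.locks = locks
        self.active = False

    def boot(self):
        """Become active only if the single global lock is granted."""
        self.active = self.locks.try_lock(self.LOCK_KEY, self.session)
        return self.active


locks = FakeAdvisoryLocks()
first = Dispatcher("node-a", locks)
second = Dispatcher("node-b", locks)

first_won = first.boot()             # takes the lock
second_won = second.boot()           # lock already held, so it does nothing
locks.release_session("node-a")      # first dispatcher's session dies
second_won_later = second.boot()     # a retry now succeeds
```

The last two lines also cover the earlier case where the lock holder goes down: once its database session is gone, a newly booted singleton can take the lock and resume reading jobs.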
from rihanna.
It's the same guarantee, but the likelihood of multiple delivery would increase substantially, so it's one to document well.
> We will never have two nodes running the global lock process, because postgres will only ever grant the advisory lock once. In the event of a netsplit and two master nodes occurring, one of them will fail to take the lock and simply do nothing.
Would there be an additional database lock then?
If that's the case we wouldn't even need the same iterative query. I feel like there wouldn't actually be that much code shared with the current Rihanna.
from rihanna.
> especially if a graceful exit with job draining is implemented.
I think that in the event of the dispatcher/lock process dying we want to brutally kill workers rather than shut them down gracefully; otherwise multiple delivery is guaranteed.
> As for the lock, I'm not sure you have fully understood my original proposal. In this new scenario, there will be one and exactly one advisory lock taken by one global dispatcher. Workers will not be required to take any locks at all.
I see, much clearer now :)
from rihanna.