Comments (4)
recluster maintains optrespawn, a badly named internal variable that keeps the current respawn delay.
When a worker dies, we run workerReplace to replace the worker. At the beginning of workerReplace, we first cap the respawn delay so that it is not greater than the backoff option:
optrespawn = Math.min(optrespawn, opt.backoff);
Then we calculate the next moment that a worker is allowed to respawn, based on the time of the previous respawn plus the respawn delay.
Of course, that moment cannot be in the past. If the last respawn was hours ago, we need to make sure that the next one will be at least now, via Math.max. Combining the two, we get:
nextSpawn = Math.max(now, lastSpawn + optrespawn * 1000)
Then we can calculate the delay of the timer that will respawn the process. If the last respawn was indeed ages ago, the next one will happen immediately (now - now):
time = nextSpawn - now;
Then we can update the moment of the last spawn for next time:
lastSpawn = nextSpawn;
Once everything is calculated, we multiply the current delay by two (it will be capped the next time a respawn happens, so we need not worry about it overshooting) and run a timer to reset it back via delayedDecreaseBackoff.
Finally, we proceed with logging the delay and running the respawn timer.
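The steps above can be sketched as a single pure function. This is only an illustration of the description, not recluster's actual source: the function name scheduleRespawn and the shape of the state object are assumptions, while optrespawn, lastSpawn, nextSpawn, time and opt.backoff follow the names used in the comment.

```javascript
// Hypothetical sketch of the calculation described above.
// `state` holds { lastSpawn (ms timestamp), optrespawn (seconds) }.
function scheduleRespawn(state, opt, now) {
  // 1. Cap the current delay so it never exceeds the backoff option.
  const delay = Math.min(state.optrespawn, opt.backoff);
  // 2. Next allowed respawn: previous respawn plus delay, never in the past.
  const nextSpawn = Math.max(now, state.lastSpawn + delay * 1000);
  // 3. Delay of the respawn timer, in milliseconds.
  const time = nextSpawn - now;
  // 4. Remember the scheduled moment and double the delay for next time
  //    (it is capped again in step 1 of the next call).
  return {
    time,
    state: { lastSpawn: nextSpawn, optrespawn: delay * 2 },
  };
}

// Usage: two deaths at the same instant, long after the last respawn.
const now = Date.now();
const first = scheduleRespawn({ lastSpawn: 0, optrespawn: 1 }, { backoff: 10 }, now);
const second = scheduleRespawn(first.state, { backoff: 10 }, now);
console.log(first.time, second.time);
```

The first respawn is immediate because the last one was long ago; the second has to wait for the doubled delay after the first one.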
delayedDecreaseBackoff (better name: debouncedDecreaseBackoff) is a debounced function in the sense of lodash's _.debounce. Its effect is applied opt.backoff seconds after it was last called, but if it is called again sooner than that, it will reset the timer.
The timer is set to the maximum possible delay (opt.backoff) to ensure that the backoff is not decreased if respawns keep happening at intervals shorter than opt.backoff. Once the delay hits maximum, it will be kept at that maximum (1) as long as worker respawns are requested within opt.backoff seconds or less.
If respawn requests stop happening, delayedDecreaseBackoff will have a chance to execute and decrease the current delay by half. It ensures that the delay is not smaller than the minimum. Otherwise, if it's larger, it schedules itself again (it needs to be decreased further).
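A minimal sketch of this debounced decrease follows. It is an assumption-laden illustration rather than recluster's code: createBackoff, bump and the injectable timers object are hypothetical names, and for brevity the cap is applied when the delay is doubled instead of at the start of the next respawn.

```javascript
// Hypothetical sketch of a debounced "decrease backoff" timer.
// opt.respawn is the minimum delay, opt.backoff the maximum (both in seconds).
function createBackoff(opt, timers) {
  // Timer functions are injectable so the logic can be exercised without waiting.
  const t = timers || {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
  };
  let current = opt.respawn;
  let timer = null;

  function decrease() {
    timer = null;
    // Halve the delay, but never drop below the minimum.
    current = Math.max(opt.respawn, current / 2);
    // Still above the minimum: schedule another decrease.
    if (current > opt.respawn) delayedDecrease();
  }

  function delayedDecrease() {
    // Debounce: each call restarts the opt.backoff-second countdown.
    if (timer !== null) t.clear(timer);
    timer = t.set(decrease, opt.backoff * 1000);
  }

  return {
    get current() {
      return current;
    },
    // Called on every respawn request: double the delay and re-arm the timer.
    bump() {
      current = Math.min(current * 2, opt.backoff);
      delayedDecrease();
    },
  };
}
```

As long as bump() keeps being called at intervals shorter than opt.backoff, the countdown is restarted each time and decrease() never runs.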
Example: Given opt.respawn = 1s, opt.backoff = 10s, we should get something like this if workers keep dying:
1 -> 2 -> 4 -> 8 -> 10 -> 10 -> 10 -> ...
Once workers stop dying, the delay will start decreasing gradually by 1/2 every opt.backoff = 10 seconds:
10 -> 5 -> 2.5 -> 1.25 -> 1 (timer stops)
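Both sequences can be reproduced with a few lines of arithmetic. The helper names growthSequence and decaySequence are made up for this illustration; only opt.respawn and opt.backoff come from the discussion.

```javascript
// Illustrative only: recomputes the two sequences from the example.
function growthSequence(opt, deaths) {
  const seq = [];
  let delay = opt.respawn;
  for (let i = 0; i < deaths; i++) {
    delay = Math.min(delay, opt.backoff); // cap before the delay is used
    seq.push(delay);
    delay *= 2; // double for the next respawn
  }
  return seq;
}

function decaySequence(opt, start) {
  const seq = [start];
  let delay = start;
  while (delay > opt.respawn) {
    // One step per opt.backoff seconds without a respawn request.
    delay = Math.max(opt.respawn, delay / 2);
    seq.push(delay);
  }
  return seq;
}

const opt = { respawn: 1, backoff: 10 };
console.log(growthSequence(opt, 7).join(" -> ")); // 1 -> 2 -> 4 -> 8 -> 10 -> 10 -> 10
console.log(decaySequence(opt, 10).join(" -> ")); // 10 -> 5 -> 2.5 -> 1.25 -> 1
```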
I'm not sure what's going on in your case. Since the current delay is always capped before being used and displayed, I have no idea where the bug might be.
(1): Well, it will oscillate between max and 1/2 max, given that workers don't die immediately but only after running for at least a while. So when the delay is max, the next worker death happens at max+someDelta and the timer gets a chance to run and decrease the backoff by 1/2. If we added a couple of seconds to opt.backoff for this case, we could ensure that the delay stays at max.
from recluster.
It's a great, detailed explanation. I totally understand the debouncedDecreaseBackoff (delayedDecreaseBackoff) function.
In this case, I found that the reason the respawn time exceeds backoff is that when the workers die immediately, lastSpawn is assigned the calculated value and optrespawn is multiplied by 2. The same process runs many times within a short period. Finally, it results in:
- optrespawn is set to the max value (backoff)
- lastSpawn is based on the previous lastSpawn, but the respawn scheduled for the previous lastSpawn may not have finished yet, so the interval accumulates more and more.
When I set the server error timeout to 5000 instead of 500 and the number of workers to 1 instead of 4, or use an error timeout of 10000 with 2 workers, the result is right.
Maybe this particular condition (the server dies too fast) triggers this result.
Can we add
if (opt.backoff) {
    nextSpawn = Math.min(opt.backoff * 1000 + now, nextSpawn);
}
after
var nextSpawn = Math.max(now, lastSpawn + optrespawn * 1000);
to ensure that the time between respawns does not exceed backoff?
from recluster.
I think I see what you mean. Your issue happens with more workers, after a few of them die. Since the death occurred between now and lastSpawn, the next spawn is scheduled even later in the future.
But that's the correct behavior, isn't it?
Let's say recluster is already at maximum delay and two workers die at almost the same time. Since a respawn is already scheduled in (e.g.) 10 seconds, we must schedule the next one in 20 seconds to ensure 10s of breathing space in between. Otherwise, if 16 workers die in a row, there will be a flood of 16 respawns in a row (all happening at once after 10 seconds), which is precisely what we want to avoid. This way (at maximum delay), they will be evenly spaced opt.backoff seconds apart.
The delay shown by the logger isn't the time between the two respawns; it's the time between now and the currently scheduled respawn. That value may get bigger than opt.respawn, but the time between respawns doesn't.
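The difference between the logged delay and the gap between respawns can be seen in a small simulation. This is a hypothetical sketch, not recluster's code: it assumes the delay is already at opt.backoff, that a worker was respawned at the moment the others die, and that all deaths happen at the same instant; simulateSimultaneousDeaths is an invented name.

```javascript
// Hypothetical simulation: `count` workers die at the same moment `now`
// while the respawn delay is already at its maximum (opt.backoff seconds).
function simulateSimultaneousDeaths(count, opt, now) {
  let lastSpawn = now; // assumption: the previous respawn happened just now
  const schedule = [];
  for (let i = 0; i < count; i++) {
    const nextSpawn = Math.max(now, lastSpawn + opt.backoff * 1000);
    schedule.push({
      loggedDelay: (nextSpawn - now) / 1000, // what the logger reports
      gap: (nextSpawn - lastSpawn) / 1000, // time since the previous respawn
    });
    lastSpawn = nextSpawn;
  }
  return schedule;
}

console.log(simulateSimultaneousDeaths(4, { backoff: 10 }, Date.now()));
```

Under these assumptions the logged delay grows with every additional death, while the gap between consecutive respawns stays at opt.backoff.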
from recluster.
Yes, I think I understand what happens now. I misunderstood the meaning of time in the log. You really helped me understand the problem.
Thank you ;)
from recluster.