
Comments (4)

spion commented on June 28, 2024

recluster maintains optrespawn, a badly named internal variable that holds the current respawn delay (in seconds).

When a worker dies, we run workerReplace to replace it. At the beginning of workerReplace, we first cap the respawn delay so that it is no greater than the backoff option:

optrespawn = Math.min(optrespawn, opt.backoff);

Then we calculate the next moment at which a worker is allowed to respawn, based on the time of the previous respawn plus the respawn delay.

Of course, that moment cannot be in the past. If the last respawn was hours ago, we need to make sure that the next one will be at least now via Math.max. Combining the two, we get:

nextSpawn = Math.max(now, lastSpawn + optrespawn * 1000)

Then we can calculate the delay of the timer that will respawn the process. If the last respawn was indeed ages ago, the next one will happen immediately (now - now = 0):

time = nextSpawn - now;

Then we can update the moment of the last spawn for next time

lastSpawn = nextSpawn;

Once everything is calculated, we multiply the current delay by two (it will be capped the next time a respawn happens, so we need not worry about it overshooting) and run a timer to reset it back via delayedDecreaseBackoff.

Finally, we proceed with logging the delay and running the respawn timer
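The steps above can be sketched as a single function. This is only an illustration of the arithmetic described in this comment, not recluster's actual source: the `scheduleRespawn` name, the `state` object and the explicit `now` argument are hypothetical, and the call to delayedDecreaseBackoff, the logging and the real timer are left out.

```javascript
// Hypothetical standalone version of the scheduling arithmetic described above.
// state.optrespawn is in seconds, state.lastSpawn and now are in milliseconds.
function scheduleRespawn(state, opt, now) {
  // Cap the current delay at the configured maximum.
  state.optrespawn = Math.min(state.optrespawn, opt.backoff);
  // Earliest moment the next worker may spawn; never in the past.
  const nextSpawn = Math.max(now, state.lastSpawn + state.optrespawn * 1000);
  // Timer delay in milliseconds; 0 when the previous spawn was long ago.
  const time = nextSpawn - now;
  // Remember the scheduled moment for the next call.
  state.lastSpawn = nextSpawn;
  // Double the delay; the cap is applied at the start of the next call.
  state.optrespawn *= 2;
  return time;
}

const state = { optrespawn: 1, lastSpawn: 0 };
const opt = { backoff: 10 };
const first = scheduleRespawn(state, opt, 100000);  // last spawn was long ago
const second = scheduleRespawn(state, opt, 100000); // another death at the same instant
console.log(first, second); // 0 2000
```

The first call returns 0 because the previous spawn is far in the past; the second returns 2000 because the delay has already doubled to 2 seconds and the previous spawn is scheduled for "now".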

delayedDecreaseBackoff (better name: debouncedDecreaseBackoff) is a debounced function in the sense of lodash's _.debounce. Its effect is applied opt.backoff seconds after it was last called; if it is called again sooner than that, the timer is reset.

The timer is set to the maximum possible delay (opt.backoff) to ensure that the backoff is not decreased if respawns keep happening at intervals shorter than opt.backoff. Once the delay hits the maximum, it stays there (1) as long as worker respawns keep being requested at intervals of opt.backoff seconds or less.

If respawn requests stop happening, delayedDecreaseBackoff gets a chance to execute and halve the current delay. It ensures that the delay does not drop below the minimum. If the delay is still larger than the minimum, it schedules itself again (the delay needs to be decreased further).

Example: Given opt.respawn = 1s, opt.backoff = 10s, we should get something like this if workers keep dying:

1 -> 2 -> 4 -> 8 -> 10 -> 10 -> 10 -> ...

Once workers stop dying, the delay will start decreasing gradually, halving every opt.backoff = 10 seconds:

10 -> 5 -> 2.5 -> 1.25 -> 1 (timer stops)
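Both sequences can be reproduced without real timers. The sketch below only replays the arithmetic; `growthSequence` and `decaySequence` are hypothetical helpers, not recluster functions, and each decay step stands for one quiet period of opt.backoff seconds.

```javascript
// Hypothetical helpers that replay the delay arithmetic without timers.
function growthSequence(respawn, backoff, deaths) {
  const seen = [];
  let delay = respawn;
  for (let i = 0; i < deaths; i++) {
    delay = Math.min(delay, backoff); // cap before the delay is used
    seen.push(delay);
    delay *= 2;                       // double for the next death
  }
  return seen;
}

function decaySequence(start, respawn) {
  const seen = [start];
  let delay = start;
  while (delay > respawn) {
    delay = Math.max(delay / 2, respawn); // halve, but never below the minimum
    seen.push(delay);
  }
  return seen; // the loop ending corresponds to "timer stops"
}

const growth = growthSequence(1, 10, 7);
const decay = decaySequence(10, 1);
console.log(growth.join(' -> ')); // 1 -> 2 -> 4 -> 8 -> 10 -> 10 -> 10
console.log(decay.join(' -> '));  // 10 -> 5 -> 2.5 -> 1.25 -> 1
```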

I'm not sure what's going on in your case. Since the current delay is always capped before being used and displayed, I have no idea where the bug might be.

(1): Well, it will oscillate between max and 1/2 max, given that workers don't die immediately but only after running for at least a while. So when the delay is at max, the next worker death happens at max + someDelta, and the timer gets a chance to run and halve the backoff. If we added a couple of seconds to opt.backoff for this case, we could ensure that the delay stays at max.

from recluster.

frankLife commented on June 28, 2024

That's a great, detailed explanation. I now fully understand the debouncedDecreaseBackoff (delayedDecreaseBackoff) function.

In my case, I found the reason why the respawn time exceeds backoff: when the workers die immediately, lastSpawn is assigned the calculated value and optrespawn is multiplied by 2. The same process runs many times within a short period. In the end, this results in:

  1. optrespawn is set to its maximum value (backoff).
  2. Each lastSpawn is based on the previous lastSpawn, but the respawn scheduled for that previous lastSpawn may not have happened yet, so the interval accumulates more and more.

When I set the server error timeout to 5000 instead of 500 and the number of workers to 1 instead of 4 (or an error timeout of 10000 with 2 workers), the result is correct.

Maybe this particular condition (the server dies too fast) triggers this result.

Can we add

    if(opt.backoff) {
        nextSpawn = Math.min(opt.backoff * 1000 + now,nextSpawn);
    }

after

    var nextSpawn = Math.max(now, lastSpawn + optrespawn * 1000);

to ensure that the time between respawns does not exceed backoff?


spion commented on June 28, 2024

I think I see what you mean. Your issue happens with more workers, after a few of them die. Since the death occurred between now and lastSpawn, the next spawn is scheduled even later in the future.

But that's the correct behavior, isn't it?

Let's say recluster is already at maximum delay and two workers die at almost the same time. Since a respawn is already scheduled in (e.g.) 10 seconds, we must schedule the next one in 20 seconds to ensure 10s of breathing space in between. Otherwise, if 16 workers die in a row, there will be a flood of 16 respawns in a row (all happening at once after 10 seconds), which is precisely what we want to avoid. This way (at maximum delay), they will be evenly spaced opt.backoff seconds apart.

The delay shown by the logger isn't the time between two respawns; it's the time between now and the currently scheduled respawn. That value may get bigger than opt.backoff, but the time between respawns doesn't.
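A small replay makes the difference between the logged delay and the spacing between respawns visible. It is a sketch under stated assumptions, not recluster code: `replay` is a hypothetical helper, the delay is assumed to be already at its maximum, all deaths are assumed to happen at the same instant, and a worker is assumed to have been spawned just now. The optional clamp is the Math.min line proposed earlier in this thread.

```javascript
// Hypothetical replay of several workers dying at the same instant
// while the respawn delay is already at its maximum (backoffSeconds).
function replay(deaths, backoffSeconds, clampToBackoff) {
  const now = 1000000;  // arbitrary timestamp in milliseconds
  let lastSpawn = now;  // assumption: a worker was spawned just now
  const logged = [];    // delay reported for each death, in seconds
  const scheduled = []; // absolute time of each scheduled respawn
  for (let i = 0; i < deaths; i++) {
    let nextSpawn = Math.max(now, lastSpawn + backoffSeconds * 1000);
    if (clampToBackoff) {
      // The clamp proposed earlier in the thread.
      nextSpawn = Math.min(backoffSeconds * 1000 + now, nextSpawn);
    }
    logged.push((nextSpawn - now) / 1000);
    scheduled.push(nextSpawn);
    lastSpawn = nextSpawn;
  }
  // Seconds between consecutive scheduled respawns.
  const gaps = scheduled.slice(1).map((t, i) => (t - scheduled[i]) / 1000);
  return { logged, gaps };
}

const spaced = replay(3, 10, false);
const clamped = replay(3, 10, true);
console.log(spaced.logged, spaced.gaps);   // [ 10, 20, 30 ] [ 10, 10 ]
console.log(clamped.logged, clamped.gaps); // [ 10, 10, 10 ] [ 0, 0 ]
```

Without the clamp, the logged delay grows past the backoff while the respawns stay 10 seconds apart. With the clamp, every logged delay is 10 seconds, but all respawns land on the same moment, which is the flood described above.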


frankLife commented on June 28, 2024

Yes, I think I now understand what happens. I misread the meaning of the time in the log. You really helped me understand the problem.

Thank you ;)

