I'm proposing a feature addition to chaoskube that would add the ability to suspend the c

Observe non-working times feature about chaoskube HOT 6 CLOSED

linki commented on May 28, 2024

Observe non-working times feature

from chaoskube.

Comments (6)

twildeboer commented on May 28, 2024

Generally speaking, I suggest resisting the temptation to over-engineer features. Rather, design and implement what you know is needed, then see how that goes and whether there is demand for more or for something different.

Regarding this feature specifically, and speaking only for our own use case, we have no need for both detailed "off-time" and "on-time" specifications. Our team has typical work hours and an on-call rotation for non-working hours. I imagine that describes the majority of chaoskube users. Since chaoskube is (from our perspective) intended to be run as an ongoing stability test, all we care about is being able to limit which services are impacted and not making on-call life unnecessarily harder for anyone. You may notice that chaosmonkey does not provide such detailed scheduling, AFAIK. If someone wants to run chaoskube on the weekend, they can just deploy another instance of it to do whatever they want. The scheduling will never be perfect anyway, since at least the holidays will need to be updated from time to time. Finally, we view chaoskube as a tool that gives us confidence in the resilience of our systems, but it is not critical to our infrastructure and does not need precise scheduling capabilities.

Another reason to avoid precise scheduling capability is that it is significantly more difficult to implement correctly. You will have to include all kinds of logic to handle periods that span midnight and Daylight Saving jumps. And you will have to try to find a way to support such configuration that is not confusing. People will get confused about what their configuration really means, no matter how carefully you write your documentation, and then you will get all kinds of bug reports that are actually user-error or user misunderstanding.
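To illustrate the midnight case mentioned above, here is a minimal sketch (not chaoskube's code; the function name `in_daily_window` and the fixed-offset `CET` constant are made up for this example). It treats a window whose start is later than its end as wrapping past midnight. The fixed offset does not follow daylight saving time; passing a DST-aware `tzinfo` instead would change the result around the transitions, which is exactly the kind of subtlety described above.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def in_daily_window(now: datetime, start: time, end: time, tz: tzinfo) -> bool:
    """Return True if the aware datetime `now` falls in [start, end) as seen in `tz`."""
    local = now.astimezone(tz).time()
    if start <= end:
        # Ordinary same-day window; start == end yields an empty window.
        return start <= local < end
    # start > end means the window wraps past midnight, e.g. 22:00-06:00.
    return local >= start or local < end


# Hypothetical fixed UTC+1 offset standing in for CET; it does NOT follow DST.
CET = timezone(timedelta(hours=1), "CET")

night = (time(22, 0), time(6, 0))
print(in_daily_window(datetime(2024, 5, 28, 23, 30, tzinfo=CET), *night, CET))  # True
print(in_daily_window(datetime(2024, 5, 28, 12, 0, tzinfo=CET), *night, CET))   # False
```

Even this small helper has to pick a convention for the boundary (here the end is exclusive) and for `start == end`, and each such choice is something users can misread.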

You could, perhaps, if the need was shown to be significant, add the ability to override each global "off-time" attribute with service-specific ones through annotations. But I would wait and see if this is a real need, because it adds complexity. Our team does not need this.

klautcomputing commented on May 28, 2024

I already did some of the things you proposed in my PR. If you want to extend it with the things I don't have, that would be the easiest route for you.

linki commented on May 28, 2024

@twildeboer @klautcomputing I'll try to look at the PR again over the weekend. At first sight, both the way the time frame is specified and the implementation seemed quite complicated to me.

@klautcomputing do you think defining the range similarly to https://github.com/hjacobs/kube-downscaler#configuration would simplify usage as well as implementation and still capture your use cases?

For example, being active during work hours as well as around midday on weekends would be:

--active-at "Fri-Fri 10:00-16:00 CET, Sat-Sun 10:00-12:00 CET"
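As a rough sketch of how a value in this `<day>-<day> <HH:MM>-<HH:MM> <zone>` shape could be parsed and checked (this is neither chaoskube's nor kube-downscaler's code; `parse_active_at`, `is_active`, and `Window` are hypothetical names, and the time zone is kept as plain text rather than resolved):

```python
import re
from datetime import time
from typing import List, NamedTuple

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
SPEC = re.compile(
    r"^([A-Za-z]{3})-([A-Za-z]{3}) (\d{1,2}):(\d{2})-(\d{1,2}):(\d{2}) (\S+)$"
)


class Window(NamedTuple):
    first_day: int  # 0 = Monday, inclusive
    last_day: int   # inclusive
    start: time     # inclusive
    end: time       # exclusive
    tz_name: str    # kept as text; resolving it is left to the caller


def parse_active_at(value: str) -> List[Window]:
    """Parse a comma-separated list of day/time ranges into Window tuples."""
    windows = []
    for part in value.split(","):
        m = SPEC.match(part.strip())
        if m is None:
            raise ValueError(f"cannot parse time range: {part.strip()!r}")
        d1, d2, h1, m1, h2, m2, tz = m.groups()
        # WEEKDAYS.index and time() raise ValueError on unknown days or bad times.
        windows.append(Window(WEEKDAYS.index(d1), WEEKDAYS.index(d2),
                              time(int(h1), int(m1)), time(int(h2), int(m2)), tz))
    return windows


def is_active(windows: List[Window], weekday: int, local: time) -> bool:
    """Check a weekday and wall-clock time that are already in the windows' zone."""
    for w in windows:
        if w.first_day <= w.last_day:
            day_ok = w.first_day <= weekday <= w.last_day
        else:  # day range wraps around the week, e.g. Sat-Mon
            day_ok = weekday >= w.first_day or weekday <= w.last_day
        if day_ok and w.start <= local < w.end:
            return True
    return False


windows = parse_active_at("Fri-Fri 10:00-16:00 CET, Sat-Sun 10:00-12:00 CET")
print(is_active(windows, WEEKDAYS.index("Fri"), time(11, 0)))  # True
print(is_active(windows, WEEKDAYS.index("Mon"), time(11, 0)))  # False
```

Read literally, `Fri-Fri` is a one-day range, so under this sketch Monday through Thursday are never active. The sketch also assumes every window uses the same time zone and that no time range crosses midnight.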

klautcomputing commented on May 28, 2024

@linki could you leave a couple of comments on my code where you think my implementation is too complicated?

--active-at "Fri-Fri 10:00-16:00 CET, Sat-Sun 10:00-12:00 CET"

Did you maybe mean Mon-Fri? Otherwise I don't see how that format is meant to work. If so, I think that would be easily doable. Thinking about it again, we might not want this as a flag for chaoskube but rather as a label in the manifest, which would allow individual teams to specify their own schedule.

This raises the general question of whether we want chaoskube to be purely opt-in. Chaos engineering is not something that should surprise a team; they should have made an active decision to test their systems with chaos. Given that, opt-in might be the right choice. It would also get rid of --percentage in my code and make it a little simpler.

twildeboer commented on May 28, 2024

@linki - a PR for this feature is waiting for you.

linki commented on May 28, 2024

@klautcomputing @twildeboer Thank you for all your input.

The above feature is part of v0.8.0, so I'm going to close this issue. I think we found a fairly easy way to configure it, although the equivalent of --workhours is defined not as a time range to include but as one to exclude, similar to --offdays and --holidays.
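The "excluding" semantics can be sketched as follows. This is only an illustration under assumptions, not the v0.8.0 implementation: `is_suspended` and its parameters are hypothetical names, holidays are modelled as (month, day) pairs, and `now` is taken to be already in the configured time zone.

```python
from datetime import datetime, time
from typing import Iterable, Set, Tuple


def is_suspended(
    now: datetime,
    off_weekdays: Set[int],                    # 0 = Monday ... 6 = Sunday
    off_periods: Iterable[Tuple[time, time]],  # daily ranges, both ends inclusive
    holidays: Set[Tuple[int, int]],            # (month, day) pairs
) -> bool:
    """Return True if any exclusion rule matches; otherwise chaos may proceed."""
    if now.weekday() in off_weekdays:
        return True
    if (now.month, now.day) in holidays:
        return True
    t = now.time()
    return any(start <= t <= end for start, end in off_periods)


weekend = {5, 6}
outside_work = [(time(0, 0), time(9, 0)), (time(17, 0), time.max)]
christmas_eve = {(12, 24)}

print(is_suspended(datetime(2024, 5, 29, 12, 0), weekend, outside_work, christmas_eve))  # False
print(is_suspended(datetime(2024, 5, 25, 12, 0), weekend, outside_work, christmas_eve))  # True
```

With every rule expressed as an exclusion, the three settings combine with a plain "or", and an empty configuration means chaoskube is never suspended.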

I also think that at some point some configuration should be overridable by annotations or moved entirely to annotations, e.g. for users defining a mean-time-to-failure on a per-pod basis and independent of the cluster size (the "percentage" feature, #20).
