
Comments (40)

tomwilkie avatar tomwilkie commented on June 26, 2024 20

Sorry for the slow response. This alert was added exactly for this reason: with low limits, spiky workloads can have low averages and still be throttled. Consider this: if we sample every 15s and do a rate[1m], even 12 seconds of maxed-out CPU will appear as only 20% CPU utilisation.

What we've found is that raising our container CPU limits (whilst keeping container CPU requests close to the 95th-percentile "average" usage*) has allowed us to have lower throttling and decent utilisation.

If you don't want this, you can set the threshold to something >25% in the _config field.

  • avg(quantile_over_time(0.95, namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="default", container_name="node-exporter"}[1h]))
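
For reference, a minimal jsonnet sketch of raising that threshold via _config (the import path is an assumption about your setup; cpuThrottlingPercent is the config key discussed later in this thread, with 25 as the default):

  // Sketch: raise the CPUThrottlingHigh threshold to 50% throttled periods.
  local kubernetesMixin = import 'kubernetes-mixin/mixin.libsonnet';  // adjust path to your vendoring

  kubernetesMixin {
    _config+:: {
      cpuThrottlingPercent: 50,  // default is 25
    },
  }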

from kubernetes-mixin.

paulfantom avatar paulfantom commented on June 26, 2024 10

While this is a very useful alert to have, especially during debugging, it is also a very chatty one (as can be seen by the number of issues linked here). In many cases this alert is not actionable (apart from silencing it), because the application is not latency-sensitive and can work without problems even when throttled. Additionally, this alert is based on a cause and not a symptom. I propose reducing the alert severity to info.

from kubernetes-mixin.

chiluk avatar chiluk commented on June 26, 2024 9

Am I reading it correctly that your CPU request and CPU limit are set to .07 and .08 respectively? Think about what is going on here. Whenever your application is runnable, it is only able to execute for 8ms every 100ms on only 1 CPU before it hits throttling. Assuming a 3GHz CPU clock, this is similar to giving your application a 210MHz single-core CPU (this reminds me of my days back in the 90s with a Cyrix 166+).
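
To make the arithmetic explicit, a small jsonnet sketch (the 100ms CFS period and the 3GHz clock are the assumptions from this comment; the numbers are illustrative, not measured):

  // CFS quota arithmetic for request=70m, limit=80m.
  local requestCores = 0.07;
  local limitCores = 0.08;
  local cfsPeriodMs = 100;   // kernel-default CFS period
  local clockMHz = 3000;     // assumed 3GHz clock

  {
    quotaMsPerPeriod: limitCores * cfsPeriodMs,      // 8ms of runtime per 100ms period
    effectiveMHzAtRequest: requestCores * clockMHz,  // 210 "MHz" worth of a single core
    effectiveMHzAtLimit: limitCores * clockMHz,      // 240 "MHz" at the limit
  }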

Depending on what it is or isn't doing, the in-kernel context-switch time could potentially be that expensive without your application doing anything (you can thank Spectre/Meltdown for that). Basically, your requests and limits are bounded too tightly. They are set well below the threshold at which the kernel can reasonably account for them with useful results.

I don't know what the minimum limit should be, but I do think you are well below it, based on the throttling percentages you are seeing. This issue is solved; your expectations of what can reasonably be accomplished with existing kernel constructs and hardware need to be re-evaluated.

from kubernetes-mixin.

aantn avatar aantn commented on June 26, 2024 8

This isn't a false alarm and it isn't due to CFS kernel bugs!

I've written a whole wiki page on this and on how to respond to each subset of this alert.

The gist of it is that processes are being deprived of CPU when they need it, and that this can happen even when CPU is available. I know people consider it a best practice to set CPU limits, but if you use CPU requests for everything, then the simple and safe action here is to remove the limits.

Unless this is happening on metrics-server in which case it's a whole different story...

from kubernetes-mixin.

chiluk avatar chiluk commented on June 26, 2024 7

The math @benjaminhuo pointed at looks correct. I suspect @bgagnon is probably hitting the inadvertent throttling covered in my talk, resulting in the increased throttled-period percentages he's seeing. I suspect installing kernels with the fixes will alleviate some of the throttling, such that the alert threshold can be decreased. Hopefully, if these patches ever get accepted, bursty applications can have tighter limits with decreased throttling.

from kubernetes-mixin.

cbeneke avatar cbeneke commented on June 26, 2024 6

Correct me if I'm wrong: as far as I understand, node_exporter only uses CPU cycles when being scraped (which for a default Prometheus setup is every 15 or 30 seconds). This means that on average the pod has a very flat line of CPU usage. But the problem is that container_cpu_cfs_periods_total only increases when the pod actually uses CPU (on my private cluster I see increases of around 4-12 periods per scrape, which equals 0.4-1.2 seconds of running time). Since container_cpu_cfs_throttled_periods_total increases almost equally - the pod, when running, hits the throttling limit in almost every period - the alert fires.

To be honest, I have no idea how to build general-purpose alerts for this (or whether it is even relevant), since it depends heavily on the application. In the case of node-exporter it should be irrelevant (Prometheus doesn't care about some overhead in scraping, and I've not yet seen a cluster where the node_exporter scrape was slowed to more than 3 seconds).

from kubernetes-mixin.

chiluk avatar chiluk commented on June 26, 2024 6

We do not use Prometheus, so I don't know about the default values etc. @cbeneke has the right idea. Since the pod only uses CPU sporadically, it always hits throttling when it is running. If you don't care about the response times of this pod or how long it takes to "gather metrics", I would leave requests where they are and increase the limit until you no longer see throttling. That way the pod would only be scheduled by the kernel when nothing else is able to run. This is similar to how we schedule our batch jobs with low requests but a high limit. That way they rarely pre-empt latency-sensitive applications, but they are allowed to use a ton of CPU time that would otherwise be sacrificed away to the idle-process gods.

from kubernetes-mixin.

aantn avatar aantn commented on June 26, 2024 6

In case anyone is curious, I'll elaborate on the previous comment with a specific example:

Here's a numerical example of throttling when average CPU is far below the limit.

Assumptions

  1. An HTTP server handles one HTTP request per second, and each request takes 30 milliseconds of CPU time to handle
  2. The server runs inside a container with request=limit=130m
  3. The node's kernel parameters are default - specifically, the CFS scheduling period is configured as 100ms

Outcome

  1. Average CPU usage of 30 millicores, i.e. 3% of a core (the server runs 30 milliseconds every second)
  2. The server is allowed to run for 13 consecutive milliseconds every 100 milliseconds (a limit of 130m is 13% of a CPU-second. With a CFS period of 100ms that means 13% of each CFS period - i.e. 13ms)
  3. When the server gets a request it needs to run for 30 milliseconds. In the first CFS period it runs for 13ms then waits 87 ms for the period to end. In the second CFS period, the same. In the third period it runs the remaining 4ms and finishes running.
  4. Hence, the server was throttled in 2 out of 3 CFS periods it ran in. Therefore it was throttled 66% of the time.
  5. The Prometheus alert CPUThrottlingHigh fires. (The metrics it uses are taken from kernel stats nr_throttled/nr_periods)

This is not a false-positive alert. There is a real user-facing impact. A server is getting one HTTP request per second, which should take only 30ms to handle, yet each request takes 204ms instead! Real latency was introduced here: performance got 6.8x worse, despite the pod having a limit of 130m, far above its average CPU usage of 30 millicores (3% of a core).
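
For anyone who wants to replay these numbers, a small jsonnet sketch of the same arithmetic (all inputs are the assumptions listed above, not measurements):

  local limitCores = 0.130;  // request = limit = 130m
  local cfsPeriodMs = 100;   // default CFS period
  local workMs = 30;         // CPU time needed per request

  local quotaMs = limitCores * cfsPeriodMs;        // 13ms of runtime per period
  local periodsUsed = std.ceil(workMs / quotaMs);  // 3 periods to finish 30ms of work
  local throttledPeriods = periodsUsed - 1;        // throttled in 2 of them

  {
    throttledRatio: throttledPeriods / periodsUsed,  // ~0.66, above the 0.25 alert threshold
    // two periods of waiting (period minus quota) plus the 30ms of actual work:
    latencyMs: throttledPeriods * (cfsPeriodMs - quotaMs) + workMs,  // 204ms
  }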

In short, as always, remove those darn limits if you can.

from kubernetes-mixin.

szymonpk avatar szymonpk commented on June 26, 2024 4

@metalmatze Disabling cfs-quota or removing cpu limits for containers with small limits and spiky workloads.

from kubernetes-mixin.

alibo avatar alibo commented on June 26, 2024 4

I think the title is a little misleading; I don't think these are false positives. Even with an updated kernel, applications still suffer from throttled CPU periods and perform much slower when many processes or threads are running at the same time. The situation is much worse for applications that handle each request in a separate thread or process (such as php-fpm based apps), or whose average response time is more than (available quota per period in ms) / (number of threads or processes running).

For Golang apps such as node-exporter, you can set GOMAXPROCS to a value lower than the node's CPU core count, or use Uber's automaxprocs library, to mitigate the CPU throttling issue:

https://github.com/uber-go/automaxprocs

Benchmarks:
uber-go/automaxprocs#12

from kubernetes-mixin.

metalmatze avatar metalmatze commented on June 26, 2024 3

Talking to @gouthamve again, I am now running this alert in my cluster with cpuThrottlingPercent: 50. Having bumped the node-exporter CPU limit from 102m to 200m, the alert isn't trigger-happy anymore. The current default is:

cpuThrottlingPercent: 25,

Therefore the question: should we set cpuThrottlingPercent in these mixins to 50, 60, or even 75 by default? What do you think?

from kubernetes-mixin.

szymonpk avatar szymonpk commented on June 26, 2024 3

@metalmatze I think your process is still throttled and it may affect its performance. So it is just hiding the real issue.

from kubernetes-mixin.

AndrewSav avatar AndrewSav commented on June 26, 2024 3

I guess we need to define what we call a "false positive" here. IMO a false positive in this context is an alert that is not actionable, i.e. not indicative of a real problem that requires an action. So far I have not been able to deduce why these alerts trigger and disappear many times a day, or how they help me.

from kubernetes-mixin.

gouthamve avatar gouthamve commented on June 26, 2024 2

I've found this issue here: kubernetes/kubernetes#67577

I might dig into this later though. See this for some more info: https://twitter.com/putadent/status/1047808685840334848

from kubernetes-mixin.

bgagnon avatar bgagnon commented on June 26, 2024 2

@chiluk's recent talk at KubeCon19 revealed all the intricate details of CFS and throttling. Details about the kernel patch are now widely documented (see kubernetes/kubernetes#67577), but the bit that caught my eye is that calculating the throttling percentage based on seconds is apparently wrong:

Throttling seconds accumulate on the counter for every running thread. As such, one cannot come up with a percentage value without also knowing the number of threads at the time. Instead, the alert should be based on the ratio of periods, which are global for all cores, not on seconds.

I'm thinking the alert in this repo should be changed in that direction. Thoughts?

from kubernetes-mixin.

brancz avatar brancz commented on June 26, 2024 2

FYI there is also already the cpuThrottlingSelector configuration that allows you to scope or exclude certain containers/namespaces/etc.
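
A hedged jsonnet sketch of using it (the matcher string below is an illustrative assumption; check how cpuThrottlingSelector is spliced into the alert expression in resource_alerts.libsonnet before relying on the exact syntax):

  // Sketch: scope the CPUThrottlingHigh alert away from a known-noisy container.
  local kubernetesMixin = import 'kubernetes-mixin/mixin.libsonnet';  // adjust path to your vendoring

  kubernetesMixin {
    _config+:: {
      cpuThrottlingSelector: 'container!="node-exporter"',  // illustrative matcher, not verified
    },
  }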

from kubernetes-mixin.

chiluk avatar chiluk commented on June 26, 2024 2

@irizzant and @alibo are correct. It's highly unlikely that you are receiving false positives. However, it is likely that you are getting positives for very short bursts. I don't know enough about the monitor, but it might be useful to put a threshold on it so that it only triggers if the application is throttled for more than x% of the last N periods. I'd expect most well-written applications to be throttled at some point in time. It also might be useful to be able to put such a threshold in the pod spec itself, so it could be twiddled per pod. Alright, that's my attempt at thought leadering here. Hopefully cgroups v2 will make some of this mess "better" without creating a whole new range of issues.

If you'd rather not read the long blog post I wrote that @KlavsKlavsen linked, I also gave a talk on this subject a few years back:
https://www.youtube.com/watch?v=UE7QX98-kO0

from kubernetes-mixin.

chiluk avatar chiluk commented on June 26, 2024 2

Another possibility would be to create a kernel scheduler config such that runnable throttled applications would receive run time when the idle process would otherwise be run. That might really muddy the accounting metrics in the kernel, and would probably take a herculean effort to get scheduler dev approval.

from kubernetes-mixin.

chiluk avatar chiluk commented on June 26, 2024 2

Without limits, a misbehaving or crashlooping application can theoretically eat all the available CPU, which would adversely affect the performance of other applications on the system. Even when requests are operating correctly as a minimum guarantee, an application using 100% of all cores can cause thermal throttling on the CPU itself, which can lead to lower performance for well-behaved collocated applications. Additionally, it might delay scheduling of a well-behaved application by a CPU time slice (~5ms).

For this reason, my recommendation for interactive/request servicing applications is to set cpu limits large enough so as to avoid throttling, but not so large that a misbehaving application can eat an entire box.

from kubernetes-mixin.

cbeneke avatar cbeneke commented on June 26, 2024 1

Hmm, that's interesting. But from what I read, that's actually a bug in the kernel/CFS. Especially taking https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1 into account, spiky workloads are throttled for no reason. I'm not getting how to mitigate this alert, though. Afaict the only real mitigation is to just disable limits (which is not really an option).
Question here: how is the alert supposed to be helpful then? I have pods running at 90-95% CPU throttling (according to this calculation) which do calculations only once a minute: they run at their CPU limit for 3-5 seconds and do nothing the rest of the time.

Imho the alert is more misleading / too trigger-friendly as long as the mentioned bug(s) is/are not fixed (thanks for linking those).

from kubernetes-mixin.

metalmatze avatar metalmatze commented on June 26, 2024 1

Sure. What do you propose instead @szymonpk?
My comment was more about people silencing or removing this alert completely at the moment and how to temporarily mitigate that. πŸ™‚

from kubernetes-mixin.

omerlh avatar omerlh commented on June 26, 2024 1

Thank you for your detailed information. I should have thought about that...

At first I used the default kube-prometheus resources (102m request / 250m limit), but I still experienced throttling. So I increased the resources to 800m, which solved the issue - but I noticed node-exporter does not use that much (it needs something like 0.1m). So I reduced the resources back - and now I'm fighting with this alert.

from kubernetes-mixin.

alibo avatar alibo commented on June 26, 2024 1

Another possibility would be to create a kernel scheduler config such that runnable throttled applications would receive run time when the idle process would otherwise be run. That might really muddy the accounting metrics in the kernel, and would probably take a herculean effort to get scheduler dev approval.

@chiluk The burstable CFS controller introduced in kernel 5.14 (not released yet!) can mitigate this issue a little, and especially improves P90+ response times a lot, based on the benchmarks provided.

However, it's not implemented in CRI-based container runtimes yet.

from kubernetes-mixin.

paulfantom avatar paulfantom commented on June 26, 2024 1

I guess we need to define what we call a "false positive" here. IMO a false positive in this context is an alert that is not actionable, i.e. not indicative of a real problem that requires an action.

In that context, and when the application is not experiencing any issues manifested by other alerts, it is a false positive. For exactly this reason CPUThrottlingHigh ships with severity: info and not warning or critical. The idea is that with a recent Alertmanager you can configure an inhibition rule to prevent this alert from firing when no other alert is firing for the specified label sets. The issue in kube-prometheus has a bit more detail: prometheus-operator/kube-prometheus#861.

from kubernetes-mixin.

metalmatze avatar metalmatze commented on June 26, 2024

I have the same situation on my personal cluster with the overall load (1 - avg(rate(node_cpu{mode="idle"}[1m]))) being at ~20%.

/cc @gouthamve

from kubernetes-mixin.

metalmatze avatar metalmatze commented on June 26, 2024

It would be nice to actually debug this on the CFS (completely fail scheduler) layer.
Sadly, I have no clue how to do that, but others might. Anyone? ☺️

from kubernetes-mixin.

metalmatze avatar metalmatze commented on June 26, 2024

Alright. Thanks a bunch for the further info. I'll look into that for my personal cluster and try to get a better idea.

from kubernetes-mixin.

dkozlov avatar dkozlov commented on June 26, 2024

helm/charts#14801

from kubernetes-mixin.

benjaminhuo avatar benjaminhuo commented on June 26, 2024

@bgagnon this alert is already based on periods, so I don't think we need to change that.
Please refer to https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/resource_alerts.libsonnet#L108
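
Paraphrasing that definition, the expression is a ratio of throttled periods to total periods, roughly of the shape below (label names, window, and joins differ in the real file; treat this as an approximation, not the canonical rule):

  // Approximate shape of CPUThrottlingHigh; see resource_alerts.libsonnet for the exact definition.
  {
    alert: 'CPUThrottlingHigh',
    expr: |||
      sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container, pod, namespace)
        /
      sum(increase(container_cpu_cfs_periods_total[5m])) by (container, pod, namespace)
        > (25 / 100)
    |||,
    labels: { severity: 'info' },
  }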

from kubernetes-mixin.

bgagnon avatar bgagnon commented on June 26, 2024

Thanks @benjaminhuo and @chiluk, I must have misread the alert definition!

from kubernetes-mixin.

omerlh avatar omerlh commented on June 26, 2024

We actually just deployed a cluster with the fix (kernel version 4.14.154-128.181.amzn2.x86_64) and are still seeing the same issue with node-exporter:

[screenshot: throttling graph for node-exporter]

While the actual CPU usage is very low:

[screenshot: CPU usage graph]

I think there is another issue, because the actual usage is very low - around 6% of the request.

from kubernetes-mixin.

AndrewSav avatar AndrewSav commented on June 26, 2024

So does this explanation mean that the default values set by kube-prometheus are nonsensical / wrong?

from kubernetes-mixin.

omerlh avatar omerlh commented on June 26, 2024

I don't know, to be honest... there is a discussion here: prometheus-operator/kube-prometheus#214.

It looks like the values make sense - node-exporter uses almost no CPU, so giving it very little resources is reasonable - but I do wonder why it still gets throttled...

from kubernetes-mixin.

omerlh avatar omerlh commented on June 26, 2024

So maybe we can add a selector for excluding containers from the alerts? So users could easily ignore such containers?

from kubernetes-mixin.

brancz avatar brancz commented on June 26, 2024

Info level severity sounds good to me

from kubernetes-mixin.

metalmatze avatar metalmatze commented on June 26, 2024

Everybody, please leave a review on #453. Thanks!

from kubernetes-mixin.

KlavsKlavsen avatar KlavsKlavsen commented on June 26, 2024

Issues as described here https://engineering.indeedblog.com/blog/2019/12/cpu-throttling-regression-fix/ seem to be what this alert shows.

from kubernetes-mixin.

irizzant avatar irizzant commented on June 26, 2024

I totally agree with @alibo, this is not misleading. I initially disabled the alert and then found myself hunting down the reason for extremely slow and/or failing pods!

CPU throttling is a serious issue in clusters, and blindly removing limits can also cause further problems.

A very nice to have feature in dashboards would be a graph showing CPU waste, based on CPU requests.

from kubernetes-mixin.

chiluk avatar chiluk commented on June 26, 2024

@aantn understands.

However, removing the limits is not strictly "safe" if you have untrustworthy apps or poor developers.

from kubernetes-mixin.

levsha avatar levsha commented on June 26, 2024

@aantn understands.

However, removing the limits is not strictly "safe" if you have untrustworthy apps or poor developers.

How exactly is this unsafe?

from kubernetes-mixin.
