
cluster-proportional-autoscaler's Introduction

Horizontal cluster-proportional-autoscaler container


Overview

This container image watches over the number of schedulable nodes and cores of the cluster and resizes the number of replicas for the required resource. This functionality may be desirable for applications that need to be autoscaled with the size of the cluster, such as DNS and other services that scale with the number of nodes/pods in the cluster.

Usage of cluster-proportional-autoscaler:

      --alsologtostderr[=false]: log to standard error as well as files
      --configmap="": ConfigMap containing our scaling parameters.
      --default-params=map[]: Default parameters(JSON format) for auto-scaling. Will create/re-create a ConfigMap with this default params if ConfigMap is not present.
      --log-backtrace-at=:0: when logging hits line file:N, emit a stack trace
      --log-dir="": If non-empty, write log files in this directory
      --logtostderr[=false]: log to standard error instead of files
      --namespace="": Namespace for all operations, fallback to the namespace of this autoscaler(through MY_POD_NAMESPACE env) if not specified.
      --poll-period-seconds=10: The time, in seconds, to check cluster status and perform autoscale.
      --stderrthreshold=2: logs at or above this threshold go to stderr
      --target="": Target to scale. In format: deployment/*, replicationcontroller/* or replicaset/* (not case sensitive).
      --v=0: log level for V logs
      --version[=false]: Print the version and exit.
      --vmodule=: comma-separated list of pattern=N settings for file-filtered logging
      --nodelabels=: NodeLabels for filtering search of nodes and its cpus by LabelSelectors. Input format is a comma separated list of keyN=valueN LabelSelectors. Usage example: --nodelabels=label1=value1,label2=value2.
      --max-sync-failures=[0]: Number of consecutive polling failures before exiting. Default value of 0 will allow for unlimited retries.
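
For example, a typical invocation might look like the following (the namespace, ConfigMap name, target, and parameter values here are illustrative, not defaults):

      cluster-proportional-autoscaler \
          --namespace=kube-system \
          --configmap=dns-autoscaler \
          --target=deployment/coredns \
          --default-params='{"linear":{"coresPerReplica":2,"nodesPerReplica":1,"min":1}}' \
          --logtostderr=true \
          --v=2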

Installation with helm

Add the cluster-proportional-autoscaler Helm repository:

helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler
helm repo update

Then install a release using the chart. The chart's default values file provides some commented-out examples for setting some of the values. There are several required values; if any are missing, helm should fail with messages indicating which value is missing.

helm upgrade --install cluster-proportional-autoscaler \
    cluster-proportional-autoscaler/cluster-proportional-autoscaler --values <<name_of_your_values_file>>.yaml

Examples

Please try out the examples in the examples folder.

Implementation Details

The code in this module is a Kubernetes Golang API client that, using the default service account credentials available to Golang clients running inside pods, connects to the API server and polls for the number of nodes and cores in the cluster.

The scaling parameters and data points are provided to the autoscaler via a ConfigMap, and the autoscaler refreshes its parameters table every poll interval so it stays up to date with the latest desired scaling parameters.
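
As an illustration, here is a minimal sketch of such a polling loop (this is not the project's actual source; it assumes the standard client-go and apimachinery packages):

package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Use the default service account credentials available inside the pod.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	for {
		nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			fmt.Println("error listing nodes:", err)
		} else {
			schedulableNodes, cores := 0, int64(0)
			for _, n := range nodes.Items {
				if n.Spec.Unschedulable {
					continue
				}
				schedulableNodes++
				cores += n.Status.Allocatable.Cpu().Value()
			}
			fmt.Printf("schedulable nodes: %d, cores: %d\n", schedulableNodes, cores)
		}
		time.Sleep(10 * time.Second) // corresponds to --poll-period-seconds
	}
}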

Calculation of number of replicas

The desired number of replicas is computed by using the number of cores and nodes as input of the chosen controller.

It currently supports linear and ladder modes, and may later be extended to more complex interpolation or exponential scaling schemes.

Control patterns and ConfigMap formats

The ConfigMap provides the configuration parameters, allowing on-the-fly changes (including the control mode) without rebuilding or restarting the scaler containers/pods.

The two currently supported ConfigMap keys are ladder and linear, corresponding to the two supported control modes.

Linear Mode

The parameters in the ConfigMap must be JSON and use linear as the key. The sub-keys are shown below:

data:
  linear: |-
    {
      "coresPerReplica": 2,
      "nodesPerReplica": 1,
      "min": 1,
      "max": 100,
      "preventSinglePointFailure": true,
      "includeUnschedulableNodes": true
    }

The equations for the linear control mode are:

replicas = max( ceil( cores * 1/coresPerReplica ) , ceil( nodes * 1/nodesPerReplica ) )
replicas = min(replicas, max)
replicas = max(replicas, min)

When preventSinglePointFailure is set to true, the controller ensures at least 2 replicas if there is more than one node.

For instance, consider a cluster with 4 nodes and 13 cores. With the above parameters, each replica can take care of 1 node, so we need ceil(4 / 1) = 4 replicas to cover all 4 nodes; each replica can also take care of 2 cores, so we need ceil(13 / 2) = 7 replicas to cover all 13 cores. The controller chooses the greater of the two, 7 in this case, as the result.

When includeUnschedulableNodes is set to true, the replicas will scale based on the total number of nodes. Otherwise, the replicas will only scale based on the number of schedulable nodes (i.e., cordoned and draining nodes are excluded.)

Either coresPerReplica or nodesPerReplica may be omitted. All of min, max, preventSinglePointFailure and includeUnschedulableNodes are optional. If not set, min defaults to 1, and preventSinglePointFailure and includeUnschedulableNodes default to false.

Side notes:

  • Both coresPerReplica and nodesPerReplica are floats.
  • The lowest replica count will be set to 1 when min is less than 1.
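
To make the calculation concrete, here is a minimal Go sketch of the linear formula above (illustrative only, not the project's actual controller code):

package main

import (
	"fmt"
	"math"
)

type linearParams struct {
	CoresPerReplica           float64
	NodesPerReplica           float64
	Min                       int
	Max                       int
	PreventSinglePointFailure bool
}

func linearReplicas(cores, nodes int, p linearParams) int {
	replicas := 1
	if p.CoresPerReplica > 0 {
		replicas = int(math.Ceil(float64(cores) / p.CoresPerReplica))
	}
	if p.NodesPerReplica > 0 {
		if byNodes := int(math.Ceil(float64(nodes) / p.NodesPerReplica)); byNodes > replicas {
			replicas = byNodes
		}
	}
	// Ensure at least 2 replicas on multi-node clusters when requested.
	if p.PreventSinglePointFailure && nodes > 1 && replicas < 2 {
		replicas = 2
	}
	if p.Max > 0 && replicas > p.Max {
		replicas = p.Max
	}
	if replicas < p.Min {
		replicas = p.Min
	}
	return replicas
}

func main() {
	p := linearParams{CoresPerReplica: 2, NodesPerReplica: 1, Min: 1, Max: 100, PreventSinglePointFailure: true}
	fmt.Println(linearReplicas(13, 4, p)) // prints 7, matching the worked example above
}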

Ladder Mode

The parameters in the ConfigMap must be JSON and use ladder as the key. The sub-keys are shown below:

data:
  ladder: |-
    {
      "coresToReplicas":
      [
        [ 1, 1 ],
        [ 64, 3 ],
        [ 512, 5 ],
        [ 1024, 7 ],
        [ 2048, 10 ],
        [ 4096, 15 ]
      ],
      "nodesToReplicas":
      [
        [ 1, 1 ],
        [ 2, 2 ]
      ],
      "includeUnschedulableNodes": false
    }

The ladder controller computes the desired replica count using a step function. The step ladder function uses the core and node scaling data points from the ConfigMap. The lookup that yields the higher number of replicas is used as the target scaling number.

For instance, consider a cluster with 100 nodes and 400 cores that uses the above ConfigMap. The replica count derived from coresToReplicas would be 3 (because 64 <= 400 < 512), and the count derived from nodesToReplicas would be 2 (because 100 >= 2). The larger of the two, 3, is chosen.

When includeUnschedulableNodes is set to true, the replicas will scale based on the total number of nodes or cores. Otherwise, the replicas will only scale based on the number of schedulable nodes (i.e., cordoned and draining nodes are excluded).

Either coresToReplicas or nodesToReplicas may be omitted. All elements in them must be integers. includeUnschedulableNodes defaults to false.
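
To illustrate the lookup, here is a minimal Go sketch of the step function described above (illustrative only, not the project's actual controller code):

package main

import (
	"fmt"
	"sort"
)

// lookup returns the replica count of the largest step whose threshold is <= value.
func lookup(steps [][2]int, value int) int {
	sort.Slice(steps, func(i, j int) bool { return steps[i][0] < steps[j][0] })
	replicas := 0
	for _, step := range steps {
		if value >= step[0] {
			replicas = step[1]
		}
	}
	return replicas
}

func ladderReplicas(cores, nodes int, coresToReplicas, nodesToReplicas [][2]int) int {
	byCores := lookup(coresToReplicas, cores)
	byNodes := lookup(nodesToReplicas, nodes)
	if byCores > byNodes {
		return byCores
	}
	return byNodes
}

func main() {
	coresToReplicas := [][2]int{{1, 1}, {64, 3}, {512, 5}, {1024, 7}, {2048, 10}, {4096, 15}}
	nodesToReplicas := [][2]int{{1, 1}, {2, 2}}
	// 400 cores -> 3 replicas, 100 nodes -> 2 replicas; the larger value (3) wins,
	// matching the worked example above.
	fmt.Println(ladderReplicas(400, 100, coresToReplicas, nodesToReplicas))
}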

Replicas can be set to 0 (unlike in linear mode).

Scaling to 0 replicas could be used to enable optional features as a cluster grows. For example, this ladder would create a single replica once the cluster reaches six nodes.

data:
  ladder: |-
    {
      "nodesToReplicas":
      [
        [ 0, 0 ],
        [ 6, 1 ]
      ]
    }

Comparisons to the Horizontal Pod Autoscaler feature

The Horizontal Pod Autoscaler is a top-level Kubernetes API resource. It is a closed feedback loop autoscaler which monitors CPU utilization of the pods and scales the number of replicas automatically. It requires the CPU resources to be defined for all containers in the target pods and also requires heapster to be running to provide CPU utilization metrics.

This horizontal cluster proportional autoscaler is a DIY container (it is not a Kubernetes API resource) that provides a simple control loop that watches the cluster size and scales the target controller. The actual CPU or memory utilization of the target controller's pods is not an input to the control loop; the sole inputs are the number of schedulable cores and nodes in the cluster. There is no requirement to run heapster or provide CPU resource limits as with HPAs.

The ConfigMap provides the operator with the ability to tune the replica scaling explicitly.

Using NodeLabels

--nodelabels is an optional parameter that counts only the nodes (and their CPUs) that match the given label selectors. This is useful when a nodeSelector is used on the target pods' controller, so only the nodes tagged with those labels are taken into account when calculating the number of replicas. When the parameter is omitted, the cluster-proportional-autoscaler counts all schedulable nodes and their CPUs.
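
For example (the label key and value below are illustrative), if the target controller's pods use a nodeSelector such as pool=dns, the autoscaler can be started with a matching selector so that only those nodes and their CPUs are counted:

      --nodelabels=pool=dns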

cluster-proportional-autoscaler's People

Contributors

bowei, chmodrs, chotiwat, dependabot[bot], gemoya, gjtempleton, jingyuanliang, johngmyers, joschi36, jsravn, k8s-ci-robot, krmichel, kundan2707, lemonli, liggitt, mbssaiakhil, mrfreezeex, mrhohn, oliviassss, palasanu, patrickshan, schleyfox, simony-gke, sozercan, spiffxp, uesyn, warmchang, wojtek-t, yuwenma, zhuxiaow0

cluster-proportional-autoscaler's Issues

Unable to compute integer values of schedulable cores in the cluster.

Found on Kubernetes 1.13 and 1.14 servers; upgrading Autoscaler from 1.7.1 to 1.8.0.

The autoscaler errors with:

"1 autoscaler_server.go:108] Error while getting cluster status: unable to compute integer values of schedulable cores in the cluster"

It looks like the change in autoscaler_server.go didn't account for the return value being a string instead of an integer:

kubectl get node ip-some-node.foo.bar -o json | jq .status.capacity.cpu
"8"

kubectl get node ip-some-node.foo.bar -o json | jq .status.allocatable.cpu
"7910m"

release 1.7.0 not scaling

Hi,

Latest version 1.7.0 doesn't seem to be working at all (tested in k8s v1.11.9, v1.13.10 and v1.15.2 with RBAC and wasn't able to make it autoscale anything - it only shows the initial log message autoscaler.go:49] Scaling Namespace: default, Target: deployment/nginx-autoscale-example and that's it).
Exactly the same deployment (e.g. examples/linear.yaml) works fine when using cluster-proportional-autoscaler release 1.5.0 or 1.6.0. You just need to run kubectl set image deployment/nginx-autoscaler autoscaler=gcr.io/google_containers/cluster-proportional-autoscaler-amd64:1.6.0 and all is good.

As a side note: the deployments in the example YAML files are also broken, since selector is mandatory for API version apps/v1: error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec

Thanks!

Allow horizontally scaling statefulsets

Hi,

We'd like to add the ability to proportionally scale stateful sets. Is there a particular reason this is a really bad idea?

Our use-case is for docker registry mirror pods and prometheus instances where we use a shared git repo of kubernetes manifests. On minikube/tiny clusters, we run out of resources that we would have in production.

Might need to be a little more careful, but should be doable https://kubernetes.io/docs/tasks/run-application/scale-stateful-set

README.md had a little bit error

In the Linear Mode section of README.md, a ',' character is needed after "preventSinglePointFailure": true to make sure the ConfigMap is created correctly.

not work with kops?

It seems cluster-proportional-autoscaler accesses the api-server service to get nodes. If the cluster is created using kops, it is not allowed to call the api-server service directly. In my environment, I used kops to create the k8s cluster in AWS, and there is an ELB for API access.
Is it possible to add a parameter to specify the api endpoint for cluster-proportional-autoscaler?

The README is unclear (and possibly the configuration is indirect)

Disclaimer: I am/was confused in so many ways that it was a struggle to even write this issue down - I think I only started understanding how the autoscaler works while writing this, but I'm still not really confident I understand it now.

The documentation should explain why:

  1. The parameter "coresPerReplica" is called so. If I want to decide how many replicas I want per core in the cluster, I want "replicasPerCore". Cores per replica seems like leaking implementation details.

  2. "nodesPerReplica" is even more obscure - from my struggles I understand that we're trying to calculate the number of replicas from the number of cluster nodes - why isn't it "replicasPerNode"?

The presented formula:

replicas = max( ceil( cores * 1/coresPerReplica ) , ceil( nodes * 1/nodesPerReplica ) )

is beautifully exposing this issue. It could be rewritten to

replicas = max( ceil( cores * replicasPerCore ) , ceil( nodes * replicasPerNode ) )

For instance, given a cluster has 4 nodes and 13 cores.

A regular old example cluster would never have a number of cores indivisible by the number of nodes - maybe except for MixedInstancesPolicy on AWS, but that's a very advanced topic. One may start thinking the README is talking about used capacity.

  1. The README should state if it's talking about used cores or total core capacity.

SchedulableCores doubles unexpectedly

CPA Version

1.7.1.

Kubernetes Version

1.11.8

What happens?

During the normal course of operations, seemingly with no change to the environment, the number of detected SchedulableCores doubles (while the number of detected nodes remains the same). This results in the number of DNS pods doubling and then being chopped in half again (resulting in some occasional blips in DNS service).

Here's a sampling of the logs:

I0906 16:30:32.691040       1 k8sclient.go:277] Cluster status: SchedulableNodes[45], SchedulableCores[90]
I0906 16:30:32.691065       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 11 to 12
I0906 16:34:12.693949       1 k8sclient.go:277] Cluster status: SchedulableNodes[44], SchedulableCores[88]
I0906 16:34:12.693973       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 12 to 11
I0906 16:38:12.693949       1 k8sclient.go:277] Cluster status: SchedulableNodes[45], SchedulableCores[90]
I0906 16:38:12.693975       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 11 to 12
I0906 16:42:02.691179       1 k8sclient.go:277] Cluster status: SchedulableNodes[44], SchedulableCores[88]
I0906 16:42:02.691461       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 12 to 11
I0906 19:32:32.696697       1 k8sclient.go:277] Cluster status: SchedulableNodes[44], SchedulableCores[182]
I0906 19:32:32.696721       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 11 to 23
I0906 19:34:52.692216       1 k8sclient.go:277] Cluster status: SchedulableNodes[46], SchedulableCores[186]
I0906 19:34:52.692243       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 23 to 24
I0906 19:49:02.690834       1 k8sclient.go:277] Cluster status: SchedulableNodes[47], SchedulableCores[94]
I0906 19:49:02.690859       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 24 to 12
I0906 19:52:12.693694       1 k8sclient.go:277] Cluster status: SchedulableNodes[47], SchedulableCores[188]
I0906 19:52:12.693721       1 k8sclient.go:278] Replicas are not as expected : updating replicas from 12 to 24

I have not yet been able to reliably reproduce this issue but it does seem to happen a few times an hour, possibly coinciding with minor cluster scaling operations (initiated, in our case, by the cluster-autoscaler).

Mount ConfigMap into container instead of fetching it from Apiserver

Kubernetes already provides a ConfigMap mounting feature --- the mounted ConfigMap is dynamically reloaded on disk when it changes. It seems unwise to re-implement a mechanism the cluster already provides.

Besides, with the current implementation the autoscaler has to be granted ConfigMap read access to the cluster. That also seems unfit, because only one specific ConfigMap is needed to provide the scaling parameters.

However, one problem with the ConfigMap mounting solution is that the container will not start if the ConfigMap does not exist (mount error), which means we would need an initial process to create it ahead of time. This is not desired either, because we want the autoscaler itself to handle the entire lifecycle of the ConfigMap.

The good news is that folks have already started working on the optional ConfigMap feature (kubernetes/community#175), which is also needed by kube-dns. We should consider rewriting the ConfigMap polling logic here after this feature is implemented.

add /healthz endpoint

re: issue #27

I'd like to add a healthz endpoint for a general liveness check (i.e., check whether the configmap can be loaded).
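
A minimal sketch of what such an endpoint could look like (illustrative only; the variable names and port are assumptions, not the project's implementation):

package main

import (
	"net/http"
	"sync/atomic"
)

// lastPollOK would be updated by the ConfigMap polling loop.
var lastPollOK atomic.Bool

func main() {
	lastPollOK.Store(true) // pretend the last ConfigMap fetch succeeded
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if lastPollOK.Load() {
			w.WriteHeader(http.StatusOK)
			w.Write([]byte("ok"))
			return
		}
		http.Error(w, "configmap poll failing", http.StatusServiceUnavailable)
	})
	http.ListenAndServe(":8080", nil)
}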

Should provide other ways to specify the scaling target beside name.

Since the scaling target's name could change, the autoscaler should provide other ways for the user to specify what the target is. One candidate would be a label selector.

Taking kube-dns as an example, we could use the label k8s-app: kube-dns to select the target ReplicationController/Deployment, and we would not need to restart the autoscaler to change the input argument every time the target name changes.

Expose prometheus metrics

Hi,

Currently there is no way to monitor the autoscaler for errors.

A prometheus metrics endpoint should be exposed, similar to other k8s components.

Autoscaler examples are broken because lack of proper permissions.

Kubernetes upstream made a bunch of updates to authorization, and it broke the autoscaler examples: by default, a pod created in the default namespace does not have the authorization to list nodes or modify a target resource. Error log below:

E0107 00:30:07.411820       1 autoscaler_server.go:96] Error while getting cluster status: the server does not allow access to the requested resource (get nodes)

Need to properly assign permissions to autoscaler pod as a fix.

Publish multi-arch image manifest

We're working on adding ARM64 support in Kops but in order to do that we need to use multi-architecture images for all of our addons because the nodes they could be scheduled on could span multiple architectures. Currently the CPA releases have separate images per architecture but it would be great to have a single image manifest for all architectures.

Adding support for this could be as simple as adding these docker manifest commands to the Makefile. Any objection to supporting this? We can provide assistance if needed.

Support for checking node and cores with current NodeAffinity

From the documentation it appears that if I have more than one node group and have NodeAffinity configured instead of node selectors, it might end up counting the total nodes/total cores for all the nodes available in the cluster.

It would be great if it took NodeAffinity into account when calculating the current total of nodes and cores.

Should have a way to print all supported modes and available params

There is currently no way to know what modes are supported, what parameters are available for each mode and how to use them, unless you read the READMEs in this repo.

We should be able to print these out via the binary as well. Probably through:

$ cluster-proportional-autoscaler --show-modes
# Show available modes, each has one line description.
$ cluster-proportional-autoscaler --show-usage $MODE_NAME
# Show available params and usage.

dns autoscaler does not support virtual kubelet node

In the Azure AKS environment, when using a virtual kubelet node (https://docs.microsoft.com/en-us/azure/aks/virtual-kubelet), the default CPU count reported is 800 (https://github.com/virtual-kubelet/virtual-kubelet/blob/e98d3ad2ae4767bce647e8e2f62d261f881fa242/providers/azure/aci.go#L269-L271). This leads the DNS autoscaler to calculate a larger number of replicas for the DNS pods than actually required.

This issue asks for native support of virtual kubelet nodes by making the DNS autoscaler aware of the virtual kubelet node and the meaning of the reported CPU count.

cluster-proportional-autoscaler or DaemonSet

Hi there,

This is Feilong from the OpenStack Magnum team. Magnum is a service that helps customers deploy production-ready k8s. We're working on an enhancement for DNS HA, and I have a question about cluster-proportional-autoscaler vs. DaemonSet. Based on my understanding, a DaemonSet ensures that all nodes run a copy of a pod. So what extra benefit can cluster-proportional-autoscaler provide? And for my case, DNS HA, do you still recommend using cluster-proportional-autoscaler, or just a DaemonSet? Thanks.

Include cordoned nodes in CPA calculations

CPA only considers schedulable nodes when determining how many replicas to run. However, cordoned nodes often contribute to load since a cordoned node may still run workloads which were scheduled before the cordoning.

An option to include cordoned nodes in the autoscaling calculation would allow critical services (e.g. DNS) to scale based on the total number of nodes rather than the schedulable Nodes.

Use a newer client-go version

We are using an old client-go library (v1.4). It may not be compatible with a v1.7 k8s server, as upstream only guarantees a maximum of two major-version differences.

Should update to use a newer version before v1.7 code freeze.

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Unbalanced number of replicas per node

Hi there. I'm seeing behavior with this autoscaler that doesn't align with my expectation. My expectation is that with linear mode, coresPerReplica: 1, a 6 node cluster with 8 cores per node would then see 8 replicas on each node, for a total of 48. This would provide an evenly distributed CPU request buffer across the cluster, where each node would have ~20% of its allocated CPU requests filled with "pause" pods.

However, when deploying the autoscaler it seems that I am seeing an uneven distribution, and different results each time with respect to distribution. I am seeing 48 replicas in total, but the distribution is not what I expected. In particular, I'm seeing that it consistently puts no replicas (or very few) on a node with about 48% of CPU requests/limits allocated. But with a 200m pause pod replica and 1 core per replica, there is plenty of space for the overprovisioning autoscaler to allocate 200m * 8 replicas on this node.

Let me know what additional information you would like, if any. Thanks!

Being able to patch individual fields of linear value

Linear values are JSON-formatted, but stored as a string in the YAML config, which makes it impossible to kubectl patch a single field. For example:

data:
  linear: |-
    {
      "coresPerReplica": 2,
      "nodesPerReplica": 1,
      "min": 1,
      "max": 100,
      "preventSinglePointFailure": true
      "includeUnschedulableNodes": true
    }

If I want to change max to 50, I can't do kubectl patch with:

data:
  linear:
    max: 50

I can do:

data:
  linear: |-
    {
      "coresPerReplica": 2,
      "nodesPerReplica": 1,
      "min": 1,
      "max": 50,
      "preventSinglePointFailure": true
      "includeUnschedulableNodes": true
    }

But this is not viable in the long term, especially since this config is a system config that could be updated to have more or fewer fields, which will make the custom copy above diverge.
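
As a workaround today, the whole string value has to be replaced in the patch (the ConfigMap name below is a placeholder):

      kubectl patch configmap <configmap-name> --type merge -p \
        '{"data":{"linear":"{\"coresPerReplica\":2,\"nodesPerReplica\":1,\"min\":1,\"max\":50,\"preventSinglePointFailure\":true,\"includeUnschedulableNodes\":true}"}}'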

Bring vertical autoscaling feature into current autoscaler.

Although the original purpose of this repo is to provide a method for horizontal autoscaling, it would be great if we also brought in a vertical autoscaling feature, as this could be done in the same pattern. The vertical autoscaling feature could be modeled on the Addon Resizer project, which basically monitors the cluster status and modifies the scaled resources (like the CPU and memory resources in Deployment specs) as needed.

One thought is that we could implement vertical autoscaling controllers just like the horizontal ones, and we could also restructure the code to enable running multiple controllers simultaneously (one scenario would be running one horizontal controller and one vertical controller together). kube-dns would be a real use case, as we may need to bump up its resource requests when the cluster grows big enough, while also horizontally scaling it in the meantime.

Another thought is that we may need to collect different information as the cluster status (rather than only the number of nodes and cores) when different types of controllers start to come in. #10 may be a feasible starting point.

cc @bowei

Can CPA adjust PODs replica count on the basis of configured date and time

Use case:
We want to scale the cluster at a particular date and time, so we expect CPA to be able to modify/change the replica count at a specific date and time, which will cause CA to scale the cluster.

There are other things I am looking at before adopting CPA; some of them I have already discussed and had answered by @MrHohn, and I thought I would track them here.

1- I see it is for managing pods, in the sense of maintaining some number of pods according to cluster size, using policies like per-node, exponential, linear, etc. Can a single CPA pod manage two different kinds of pods deployed on different node groups?
For example, I have two node groups in my cluster, A and B, and I want to maintain one set of pods for each; can a single CPA manage that?

Reply from @MrHohn: the short answer is that CPA currently doesn’t support managing two sets of pods.

2- Is CPA compatible with all k8s cluster versions, or is it a one-to-one mapping like cluster-autoscaler?

Reply from @MrHohn: it should be compatible with k8s 1.9+.

3- Can we build CPA and push the built image to a cloud provider for their users?
Reply from @MrHohn: I’m not an expert in licensing, but I don’t spot any big issue with doing that, given this project uses pretty much the same license as kubernetes/kubernetes (Apache License 2.0).

Make CoresPerReplica and NodesPerReplica `float` instead of `int`

The current implementation of linearParams is as below:

type linearParams struct {
	CoresPerReplica int `json:"coresPerReplica"`
	NodesPerReplica int `json:"nodesPerReplica"`
	Min             int `json:"min"`
	Max             int `json:"max"`
}

By making CoresPerReplica and NodesPerReplica float, we can have more precise control. Example:

CoresPerReplica = 2.5
CurrentSchedulableCores = 5
ExpectedReplicas = ceil(CurrentSchedulableCores/CoresPerReplica) = 2
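
A sketch of what the proposed struct change might look like (illustrative, not a committed API):

type linearParams struct {
	CoresPerReplica float64 `json:"coresPerReplica"`
	NodesPerReplica float64 `json:"nodesPerReplica"`
	Min             int     `json:"min"`
	Max             int     `json:"max"`
}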

cluster-proportional-autoscaler conflicts with HPA CPU

While using cluster-proportional-autoscaler for CoreDNS scaling - https://gist.github.com/MrHohn/1198bccc2adbd8cf3b066ab37ccd8355#enable-dns-horizontal-autoscaling-feature - it seems that CPA conflicts with a CPU-based CoreDNS HPA.

I.e.:
CPA scales up from 3 to 4: Replicas are not as expected : updating replicas from 3 to 4
But the CoreDNS CPU HPA immediately scales back down from 4 to 3: Normal SuccessfulRescale 3m21s (x158 over 40d) horizontal-pod-autoscaler New size: 3; reason: All metrics below target

How should I manage this issue?

More control patterns are needed beside `ladder` mode.

For this cluster-proportional-autoscaler, the desired number of replicas of an RC/Deployment/RS is decided by the controller plugin. With the progress made in #3, the only currently supported controller is the ladder controller, which computes replica counts by looking up the number of cores and nodes via the step ladder function.

Although the starting point of this project is to simply scale resources up based on cluster size, more handy yet simple control patterns would be welcome. Hence, this component may later be extended to more complex interpolation or linear/exponential scaling schemes.

Rate limit replica scale up after parameter change

After a configuration update, it is possible for the number of replicas to experience a large change (e.g. from 1k replicas => 2k), overwhelming the other parts of the infrastructure.

We should have a tunable scale up rate of change parameter such as "scale_up_replicas_per_second" that can control how fast we increase replica counts.

Note: it can default to infinity which is the policy today.

Need to abstract GetClusterStatus to be part of the controller plugin

The ultimate goal of this autoscaler is to support any type of cluster status for controllers. In the current implementation, only the cluster size (node count and core count) is considered as the cluster status. This will soon become insufficient, because cluster size is not always an appropriate parameter for scaling applications.

Another obvious option for scaling is the number of pods combined with QPS per pod, or even metrics retrieved from other applications in the same cluster. To support any possible type of cluster status, I think abstracting GetClusterStatus() into the controller as well would be a good choice.

JSON broken in README.md

Hi,
I have to report that there is an error in the JSON provided as an example of the linear object in the default configuration.
The comma and quotes are necessary for a proper config.

Is now:

data:
  linear: |-
    {
      "coresPerReplica": 2,
      "nodesPerReplica": 1,
      "min": 1,
      "max": 100,
      "preventSinglePointFailure": true
      "includeUnschedulableNodes": true
    }

Should be:

data:
  linear: |-
    {
      "coresPerReplica": 2,
      "nodesPerReplica": 1,
      "min": 1,
      "max": 100,
      "preventSinglePointFailure": "true",
      "includeUnschedulableNodes": "true"
    }

Sorry, I have created a (probably wrong) PR with those changes in #101.

Support switching control pattern on-the-fly.

As we are adding more control patterns to autoscaler like #5, it would be great to also support switching control pattern without restarting the autoscaler.

Currently the control mode is passed to the autoscaler as an input argument. One thought for achieving this is to also parse the control mode out of the ConfigMap.

Provide filter mechanics for nodes / cores used in scaling calculation

Deployments managed by this autoscaler may only qualify on certain nodes (e.g. using taints/tolerations/labels).

In order to ensure the autoscaling calculation is appropriate for the nodes/cores available to the deployment, it'd be useful to have a filtering mechanic that ensures only specific nodes qualify during calculations.

Let me know if this requirement makes sense and thanks for the work on this project.

Operating from outside a cluster

I was wondering whether a new feature would be welcome that allows users to run the cluster-proportional-autoscaler outside of the cluster it is autoscaling.

The use case is as follows:

  • Given a managed Kubernetes service, I want the cluster proportional autoscaler to automatically scale my CoreDNS deployment based on the number of nodes/load.

In this case, the cluster consumer will benefit from the autoscaling logic but will not see the cluster-proportional-autoscaler, since this component will be running outside the cluster (provided with a kubeconfig file through a new --config or --kubeconfig argument).

If this argument is not provided, the cluster-proportional-autoscaler will fall back to regular in-cluster operation.

Please let me know, if there's interest I can provide a PR.

Need to support maintaining default ConfigMap params

The current autoscaler is only responsible for fetching the ConfigMap from the apiserver. When no ConfigMap has been created, or the previous ConfigMap on the apiserver has been accidentally deleted, the autoscaler will fail. It would be good if this autoscaler also provided the ability to create a default ConfigMap when one is not present, with the default ConfigMap params provided by the user.

Here is one specific use case of the autoscaler with the kube-dns addon of Kubernetes:
The plan for using this autoscaler with kube-dns is to make it an addon as well and put its manifest into the /addon folder. By doing so, the addon manager will manage its creation and updates. A problem arises with the ConfigMap if we also treat it as an addon resource: the addon manager assumes ownership over all supplied fields in the initial config, which means the user would not be able to modify the autoscale params (ConfigMap) unless they have permission to modify the manifests located on the master node. Without the ability to configure the scaling params, this autoscaler is much less useful.

An obvious way to solve the above problem is to create the ConfigMap only once on startup instead of handing it over to the addon manager. But that leads to another issue: what if the ConfigMap gets accidentally deleted by the user? The autoscaler will be non-functional if no one re-creates the ConfigMap params.

To make the autoscaling feature both flexible and robust, a quick proposal below:

  • Add one more flag (like --default-params) for the user to provide the default params for scaling. The autoscaler will create/update the corresponding ConfigMap on startup if the flag is present.
  • The working logic remains the same; the user can still modify the ConfigMap params on the fly. But if fetching the ConfigMap from the apiserver fails, the autoscaler should re-create it with the given default params.
  • If default params are not provided, the autoscaler acts as before and will not manage the ConfigMap.

@bowei @thockin @bprashanth

support for zero in Replicas.Min

From the documentation:
The lowest replicas will be set to 1 when min is less than 1.

However, we have a use case where we would like to set "replicas.min" to 0 in lower environments. We maintain a handful of clusters with multiple autoscalers for different components.

The idea is to let the autoscaler do the scaling on demand and save some costs (even for a single instance). This will also help us keep the configuration consistent across environments.

It should be possible to use only the cores of nodes with specific labels

I’ve been using the proportional autoscaler with a low priority pause container deployment to maintain relative headroom before scaling is triggered in the normal recommended way.

This, however, breaks down for me when I want to maintain this relative headroom for individual worker pools. In particular, the proportional autoscaler doesn’t have a concept of only responding to certain node labels or worker pools.

That is to say, if I have two worker pools, one with 20 CPUs and one with 10, the proportional autoscaler will treat the number of CPUs as 30, rather than 20 or 10, and it is not currently possible to maintain a dynamic amount of headroom per worker pool.

The proportional autoscaler only understands the total amount of CPU within a cluster, and it is not possible to partition that.
