
cluster-proportional-vertical-autoscaler (BETA)

Overview

This container image watches over the number of nodes and cores of the cluster and resizes the resource limits and requests for a DaemonSet, ReplicaSet, or Deployment. This functionality may be desirable for applications whose resources, such as CPU and memory, need to be autoscaled with the size of the cluster.

Usage of cluster-proportional-vertical-autoscaler:

      --alsologtostderr[=false]: log to standard error as well as files
      --config-file: A config file (in JSON format), which overrides the --default-config.
      --default-config: The default configuration (in JSON format).
      --kube-config="": Path to a kubeconfig. Only required if running out-of-cluster.
      --log-backtrace-at=:0: when logging hits line file:N, emit a stack trace
      --log-dir="": If non-empty, write log files in this directory
      --logtostderr[=false]: log to standard error instead of files
      --namespace="": The Namespace of the --target. Defaults to ${MY_NAMESPACE}.
      --poll-period-seconds=10: The period, in seconds, to poll cluster size and perform autoscaling.
      --stderrthreshold=2: logs at or above this threshold go to stderr
      --target="": Target to scale. In format: deployment/*, replicaset/* or daemonset/* (not case sensitive).
      --v=0: log level for V logs
      --version[=false]: Print the version and exit.
      --vmodule=: comma-separated list of pattern=N settings for file-filtered logging

Examples

Please try out the examples in the examples folder.

Implementation Details

The code in this module is a Kubernetes Golang API client that, using the default service account credentials available to clients running inside pods, connects to the API server and polls for the number of nodes and cores in the cluster.

The scaling parameters and data points are provided to the autoscaler via a config file in JSON format. The autoscaler refreshes its parameters table every poll interval, so it stays up to date with the latest desired scaling parameters.

Calculation of resource requests and limits

The resource requests and limits are computed by using the number of cores and nodes as input as well as the provided step values bounded by provided base and max values.

Example:

Base = 10
Max = 100
Step = 2
CoresPerStep = 4
NodesPerStep = 2

The core and node counts are rounded up to the next whole step.

If we find 64 cores and 4 nodes we get scalars of:
  by-cores: 10 + (2 * (round(64, 4)/4)) = 10 + 32 = 42
  by-nodes: 10 + (2 * (round(4, 2)/2)) = 10 + 4 = 14
  
The larger is by-cores, and it is less than Max, so the final value is 42.

If we find 3 cores and 3 nodes we get scalars of:
  by-cores: 10 + (2 * (round(3, 4)/4)) = 10 + 2 = 12
  by-nodes: 10 + (2 * (round(3, 2)/2)) = 10 + 4 = 14

The larger is by-nodes, and it is less than Max, so the final value is 14.
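The calculation above can be sketched in Go. The `scale` helper is a hypothetical illustration, not the project's actual code; it assumes integer counts, the documented round-up-to-the-next-whole-step behaviour, and capping at Max.

```go
package main

import "fmt"

// scale applies the documented formula: round count up to the next whole
// multiple of perStep, add step per whole step, and cap the result at max.
// Illustrative only; names follow the example parameters above.
func scale(base, max, step, perStep, count int) int {
	// Integer ceiling division rounds count up to the next whole step.
	steps := (count + perStep - 1) / perStep
	v := base + step*steps
	if v > max {
		v = max
	}
	return v
}

func main() {
	// Base=10, Max=100, Step=2, CoresPerStep=4, NodesPerStep=2.
	fmt.Println(scale(10, 100, 2, 4, 64)) // by-cores for 64 cores: 42
	fmt.Println(scale(10, 100, 2, 2, 4))  // by-nodes for 4 nodes:  14
	fmt.Println(scale(10, 100, 2, 4, 3))  // by-cores for 3 cores:  12
	fmt.Println(scale(10, 100, 2, 2, 3))  // by-nodes for 3 nodes:  14
}
```

Taking the larger of the by-cores and by-nodes scalars (42 and 14, or 12 and 14) reproduces the final values in the two worked examples.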

Config parameters

The configuration should be in JSON format and supports the following parameters:

  • base The baseline quantity required.
  • max The maximum allowed quantity.
  • step The amount of additional resources to grow by. If this is too fine-grained, the resizing action will happen too frequently.
  • coresPerStep The number of cores required to trigger an increase.
  • nodesPerStep The number of nodes required to trigger an increase.

Example:

"containerA": {
  "requests": {
    "cpu": {
      "base": "10m", "step":"1m", "coresPerStep":1
    },
    "memory": {
      "base": "8Mi", "step":"1Mi", "coresPerStep":1
    }
  }
"containerB": {
  "requests": {
    "cpu": {
      "base": "250m", "step":"100m", "coresPerStep":10
    },
  }
}
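A config of this shape can be decoded with Go's standard `encoding/json`. The `Params` and `Config` types and the `parseConfig` helper below are assumptions based on the documented JSON keys, not the project's actual types; this is a minimal parsing sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Params holds the per-resource scaling knobs documented above.
// Field names mirror the JSON keys; the types are illustrative.
type Params struct {
	Base         string `json:"base,omitempty"`
	Max          string `json:"max,omitempty"`
	Step         string `json:"step,omitempty"`
	CoresPerStep int    `json:"coresPerStep,omitempty"`
	NodesPerStep int    `json:"nodesPerStep,omitempty"`
}

// Config maps container name -> resource class ("requests"/"limits")
// -> resource name ("cpu"/"memory") -> Params.
type Config map[string]map[string]map[string]Params

// parseConfig decodes the JSON config into the nested map structure.
func parseConfig(raw []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(raw, &c)
	return c, err
}

func main() {
	raw := []byte(`{
	  "containerA": {
	    "requests": {
	      "cpu":    {"base": "10m", "step": "1m", "coresPerStep": 1},
	      "memory": {"base": "8Mi", "step": "1Mi", "coresPerStep": 1}
	    }
	  }
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg["containerA"]["requests"]["cpu"].Base) // 10m
}
```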

Running the cluster-proportional-vertical-autoscaler

This repo includes example YAML files in the "examples" directory that demonstrate how to use the vertical autoscaler.

For example, consider a Deployment that needs to scale its resources (cpu, memory, etc...) proportional to the number of cores in a cluster.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: thing
  namespace: kube-system
  labels:
    k8s-app: thing
spec:
  replicas: 3
  selector:
    matchLabels:
      k8s-app: thing
  template:
    metadata:
      labels:
        k8s-app: thing
    spec:
      containers:
      - image: nginx
        name: thing

kubectl create -f thing.yaml

The config below will scale the above Deployment's CPU request by a step size of "100m" for every 10 nodes added to the cluster.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: thing-autoscaler
  namespace: kube-system
  labels:
    k8s-app: thing-autoscaler
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: thing-autoscaler
  template:
    metadata:
      labels:
        k8s-app: thing-autoscaler
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: autoscaler
        image: registry.k8s.io/cpvpa-amd64:v0.8.1
        resources:
          requests:
            cpu: "20m"
            memory: "10Mi"
        command:
          - /cpvpa
          - --target=deployment/thing
          - --namespace=kube-system
          - --logtostderr=true
          - --poll-period-seconds=10
          - --default-config={"thing":{"requests":{"cpu":{"base":"250m","step":"100m","nodesPerStep":10}}}}
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      serviceAccountName: thing-autoscaler


cluster-proportional-vertical-autoscaler's Issues

Changelog not updated

The cluster-proportional-vertical-autoscaler changelog has not been updated for the current release: the last release covered is 0.0.0, while the current release is 0.8.3.

CPVPA fails if some API is not discoverable

If there are APIServices that are not discoverable, then CPVPA crash-loops:

$ kubectl get apiservice  | grep metrics-ad
v1beta1.custom.metrics.k8s.io          kube-system/kube-metrics-adapter   False (MissingEndpoints)   26d
v1beta1.external.metrics.k8s.io        kube-system/kube-metrics-adapter   False (MissingEndpoints)   26d

$ kubectl -n kube-system logs calico-typha-vertical-autoscaler-5557c6d7d-2sd7c
I0420 04:39:57.408180       1 autoscaler.go:46] Scaling namespace: kube-system, target: deployment/calico-typha-deploy
E0420 04:39:59.408093       1 autoscaler.go:49] failed to discover apigroup for kind "Deployment": unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request, external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Is this desired behaviour? If yes, why does the CPVPA need to discover the full API?

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Support for Statefulset

We would like an autoscaler for StatefulSets that scales based on the number of nodes. I have a couple of scenarios in my architecture where I think it would be useful, and I am not aware of any such solution available in OSS. I think it would be a good idea to extend this to StatefulSets. I am happy to help out with the PR, but I would like to know about any concerns regarding this proposal that I should be aware of.

Cannot patch apps/v1 deployment

In the logs I see the following entry repeated:

E1018 09:08:03.582625       1 autoscaler_server.go:153] Update failure: patch failed: Deployment.apps "foo" is invalid: spec.template.spec.containers[0].image: Required value

Kubernetes version - v1.16.2, v1.15.6
Deployment apiVersion - apps/v1
kubernetes-incubator/cluster-proportional-vertical-autoscaler version - v0.8.1 (k8s.gcr.io/cpvpa-amd64:v0.8.1)

Strategic Merge Patch changes the order of the containers list

Problem

The patch is generated from a map, which is unordered, so the order of the containers list can change:

func (k *k8sClient) UpdateResources(resources map[string]apiv1.ResourceRequirements) error {
	ctrs := []interface{}{}
	for ctrName, res := range resources {
		ctrs = append(ctrs, map[string]interface{}{
			"name":      ctrName,
			"resources": res,
		})
	}

This creates a perpetual diff, which is especially noticeable in tools that monitor drift at all times, such as Argo CD.

Potential solutions

I have thought of 4 potential solutions:

  1. Load the configuration as an ordered map using a library that supports it; it may be challenging to find one that parses JSON directly into such a structure.
  2. Add an optional configuration field to specify the order, for example:
"containerA": {
  // ...
  "order": 1
}
"containerB": {
  // ...
  "order": 2
}
  3. Get the order from the deployment. This would not require any additional configuration from the user, but would require get permission on the deployment.
  4. Change the configuration from a map to a list (breaking change):
[
  {
    "name": "containerA",
    "requests": {
      "cpu": {
        "base": "10m", "step": "1m", "coresPerStep": 1
      },
      "memory": {
        "base": "8Mi", "step": "1Mi", "coresPerStep": 1
      }
    }
  },
  {
    "name": "containerB",
    "requests": {
      "cpu": {
        "base": "250m", "step": "100m", "coresPerStep": 10
      }
    }
  }
]
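The last option above (a list instead of a map) can be sketched with standard `encoding/json`: a JSON array decodes into a Go slice, which preserves the file's container order and so yields a stable patch. The types and `parseList` helper are illustrative only, not a proposed implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourceParams and ContainerConfig sketch a list-based config entry.
// Illustrative types; field names follow the JSON shown above.
type ResourceParams struct {
	Base         string `json:"base,omitempty"`
	Step         string `json:"step,omitempty"`
	CoresPerStep int    `json:"coresPerStep,omitempty"`
}

type ContainerConfig struct {
	Name     string                    `json:"name"`
	Requests map[string]ResourceParams `json:"requests"`
}

// parseList decodes the JSON array; unlike a map, the resulting slice
// keeps the containers in the order they appear in the file.
func parseList(raw []byte) ([]ContainerConfig, error) {
	var cfgs []ContainerConfig
	err := json.Unmarshal(raw, &cfgs)
	return cfgs, err
}

func main() {
	raw := []byte(`[
	  {"name": "containerA", "requests": {"cpu": {"base": "10m", "step": "1m", "coresPerStep": 1}}},
	  {"name": "containerB", "requests": {"cpu": {"base": "250m", "step": "100m", "coresPerStep": 10}}}
	]`)
	cfgs, err := parseList(raw)
	if err != nil {
		panic(err)
	}
	for _, c := range cfgs {
		fmt.Println(c.Name) // prints containerA then containerB, every run
	}
}
```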

Additional information

This is similar to the issue faced here: kubernetes/kubernetes#62830

Log necessary info on scale event and avoid logging when doing nothing

The autoscaler currently doesn't do a very good job of logging, from what we have observed. Specifically, it would be helpful to log the current node/CPU count when a scale event happens, so that it is clear how the decision was made. It would also be great to not log too frequently when doing nothing, to reduce the noise.

cc @lzang

Add support for arm64 images

What would you like to be added:

Images built and published through this repository are only amd64-compatible. We should enable a multi-arch Docker build to also offer arm64 images.

cpva fails with "unknown target kind: Tap"

What happened:
cpva fails to start when the deployment resource is served by multiple API groups.

How to reproduce it (as minimally and precisely as possible):

  1. Install linkerd. See https://linkerd.io/2/getting-started/

  2. Ensure that there are multiple API groups serving resource deployments:

$ k api-resources
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND
# ...
daemonsets                        ds           apps                           true         DaemonSet
deployments                       deploy       apps                           true         Deployment
# ...
daemonsets                        ds           tap.linkerd.io                 true         Tap
deployments                       deploy       tap.linkerd.io                 true         Tap
  3. Ensure that cpva fails with:
$ k logs cpva -n kube-system
I0217 20:41:29.699612       1 autoscaler.go:46] Scaling namespace: kube-system, target: deployment/calico-typha-deploy
E0217 20:41:30.799782       1 autoscaler.go:49] unknown target kind: Tap

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
$ k version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T23:41:24Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Image: k8s.gcr.io/cpvpa-amd64:v0.8.1

Multiple targets for scale up

Currently, as far as I understand, the component allows scaling only one target per deployment, i.e. one instance of CPVA can scale one deployment/replicaset. Can we allow specifying a list of targets in --targets along with a list of config maps? This would save resources by not launching separate containers for each scaling requirement. I can help implement this; I would like to know if there are any concerns or suggestions around this feature.

Vertical autoscaler doesn't support apps/v1

Noticed the issue when trying to upgrade Calico for the latest k8s and saw the vertical autoscaler crash looping. It seems that the current vertical autoscaler doesn't support apps/v1.
