
CPU Pooler for Kubernetes


Overview

CPU Pooler for Kubernetes is a solution for Kubernetes to manage predefined, distinct CPU pools of Kubernetes Nodes, and physically separate the CPU resources of the containers connecting to the various pools.

Two explicit types of CPU pools are supported: exclusive and shared. If a container does not explicitly ask for resources from these pools, it belongs to the default pool.

A request from the exclusive pool allocates CPU(s) for latency-sensitive containers requiring exclusive access.

Containers content with a limited level of sharing, but still appreciating best-effort protection from noisy neighbours, can request resources from the shared pool. Shared requests are fractional CPUs, with one resource unit corresponding exactly to one thousandth of a CPU core (millicpu).

The default pool is synonymous with the shared pool that Kubelet manages by default. Note that the CPU sets and characteristics of the shared and default pools are distinct. By supporting the "original" CPU allocation method in parallel with the enhanced one, 3rd party containers that are totally content with the default CPU management policy of Kubernetes can be instantiated on a CPU-Pooler upgraded system without any configuration changes.

Components of the CPU-Pooler project

The CPU-Pooler project contains 4 core components:

  • a Kubernetes standard Device Plugin seamlessly integrating the CPU pools to Kubernetes as schedulable resources
  • a Kubernetes standard Informer making sure containers belonging to different CPU pools are always physically isolated from each other
  • a process starter binary capable of pinning specific processes to specific cores, even within the confines of a container
  • a mutating admission webhook for the Kubernetes core Pod API, validating and mutating CPU pool specific user requests

The Device Plugin's job is to advertise the pools as consumable resources to Kubelet through the existing DPAPI. The CPUs allocated by the plugin are communicated to the container as environment variables containing a list of physical core IDs. By default the application can set its processes' CPU affinity according to the given CPU list, or can leave it up to the standard Linux Completely Fair Scheduler.

For the edge case where an application does not implement functionality to set the CPU affinity of its processes, CPU-Pooler provides a mechanism to set it on behalf of the application. This opt-in functionality is enabled by configuring the application's process information in the annotation field of its Pod spec. A mutating admission controller webhook is provided with the project to mutate the Pod's specification according to the needs of the starter binary (mounts, environment variables etc.). The process-starter binary has to be installed to the host file system, in the /opt/bin directory.

Lastly, the CPUSetter sub-component implements total physical separation of containers via Linux cpusets. This Informer constantly watches the Pod API of Kubernetes, and is triggered whenever a Pod is created or changes its state (e.g. is restarted). CPUSetter first calculates the appropriate cpuset for the container: the allocated CPUs in case of exclusive, the shared pool in case of shared, or the default pool in case the container did not explicitly ask for any pooled resources. CPUSetter then provisions the calculated set into the relevant parameter of the container's cgroupfs filesystem (cpuset.cpus). As CPUSetter is triggered by all Pods on all Nodes, we can be sure that no container can ever, even accidentally, access CPU resources not meant for it!

Using the allocated CPUs

For running the application on allocated CPUs, or for pinning application processes/threads to allocated CPUs, the following options are available:

cpuset

The cpuset basically just restricts the container to run on certain CPU(s), so relying on cpuset isolation alone is suitable for plain shared CPU allocation. This option is also suitable for allocating one exclusive CPU to a single-threaded application.

pod annotation

If the container has multiple processes, and multiple exclusive CPUs are allocated or different pool types are used, the pod annotation can be used to configure the processes and the amount of CPUs for each of them. In this case CPU-Pooler takes care of pinning the processes to the allocated CPUs. This is suitable for single-threaded processes running on exclusive CPUs. The container can also have process(es) running on shared CPUs.

use environment variables for pinning

CPU-Pooler publishes the allocated exclusive CPUs in the EXCLUSIVE_CPUS environment variable and the allocated shared CPUs in the SHARED_CPUS environment variable. Both contain the CPU(s) as a comma-separated list. The application can read the allocated CPU(s) from these variables and pin its threads/processes to them. This option must be used e.g. for multi-threaded processes with exclusive CPUs.

With this option there is a synchronization problem between CPUSetter setting the cpuset and the application doing the pinning (by setting the CPU affinity): if the application pins itself immediately after startup, it is highly likely that CPUSetter has not finished its job yet, so the cpuset is written after the application process has already set its CPU affinity. In that case the affinity setting is lost. This problem is solved by the process-starter, which waits for the cpuset configuration to finish and starts the container process only after that. The process-starter can be used only if the command property is configured in the container's pod manifest; if the command is not configured, the process-starter does not know which process to start in the container. If the command cannot be configured, the application must make sure that the cpuset is configured (i.e. matches the union of SHARED_CPUS and EXCLUSIVE_CPUS) before setting the CPU affinity.
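
As an illustration, a minimal pod manifest with the command property set might look like the following sketch (the image name, binary path and pool name are hypothetical; the resource name format is explained in the Pod spec section below):

apiVersion: v1
kind: Pod
metadata:
  name: shared-example
spec:
  containers:
  - name: worker
    image: example/worker:latest      # hypothetical image
    command: ["/usr/bin/worker"]      # required so the process-starter knows what to launch
    resources:
      requests:
        nokia.k8s.io/shared_pool1: "500"
      limits:
        nokia.k8s.io/shared_pool1: "500"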

Configuration

Kubelet

Kubelet treats Devices transparently, so it never realizes that the CPU-Pooler managed resources are actually CPUs. In order to avoid double bookkeeping of CPU resources in the cluster, the Kubelet of every Node hosting the CPU-Pooler Device Plugin should be configured according to the following formula:

--system-reserved = <TOTAL_CPU_CAPACITY> - SIZEOF(DEFAULT_POOL) + <DEFAULT_SYSTEM_RESERVED>

This setting effectively tells Kubelet to discount the capacity belonging to the CPU-Pooler managed shared and exclusive pools.
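
As a worked example with assumed numbers: on a 32-core Node with a 4-core default pool and an original system reservation of 1 CPU, the formula gives 32 - 4 + 1 = 29 reserved CPUs, so only the remainder of the default pool stays schedulable as native cpu resources. This could be expressed as --system-reserved=cpu=29 on the Kubelet command line, or equivalently in a KubeletConfiguration file:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Reserve 29 of the 32 cores so that native "cpu" requests only land on the
# default pool; exclusive and shared pool CPUs are advertised separately by
# the Device Plugin. All numbers here are assumptions for illustration only.
systemReserved:
  cpu: "29"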

CPU pools

CPU pools are configured with a configMap named cpu-pooler-configmap. The schema of configMap is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cpu-pooler-configmap
data:
  poolconfig-<name>.yaml: |
    pools:
      exclusive_<poolname1>:
        cpus : "<list of cpus>"
      exclusive_<poolname2>:
        cpus : "<list of cpus>"
      shared_<poolname3>:
        cpus : "<list of cpus>"
      default:
        cpus : "<list of cpus>"
    nodeSelector:
      <key> : <value>

The CPU pools are defined in poolconfig-<name>.yaml files in the data section; there must be at least one such file. The pool name from the config will be the resource part of the fully qualified resource name (nokia.k8s.io/<pool name>). The pool name must have a pool type prefix - 'exclusive' for an exclusive CPU pool or 'shared' for a shared CPU pool. A CPU pool not having either of these special prefixes is considered to be the cluster-wide 'default' CPU pool, and as such, CPU cores belonging to this pool are not advertised to the Device Manager as schedulable resources. The nodeSelector tells which node a pool configuration file belongs to: the CPU-Pooler and CPUSetter components both read the node labels and select the config whose nodeSelector matches them.

In the deployment directory there is a sample pool config with two exclusive pools (both have two cpus) and one shared pool (one cpu). Nodes for the pool configurations are selected by the nodeType label, as sketched below.
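
For orientation, a pool configuration matching that description might look like the following sketch (pool names, CPU IDs and the nodeType value are hypothetical; the file shipped in the deployment directory is authoritative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cpu-pooler-configmap
data:
  poolconfig-worker.yaml: |
    pools:
      exclusive_pool1:
        cpus: "2,3"
      exclusive_pool2:
        cpus: "4,5"
      shared_pool1:
        cpus: "6"
      default:
        cpus: "0,1"
    nodeSelector:
      nodeType: worker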

Please note: currently only one shared pool is supported per Node!

Pod spec

The cpu-device-plugin advertises the resources of the exclusive and shared CPU pools as name: nokia.k8s.io/<poolname>, where <poolname> is the pool name configured in cpu-pooler-configmap. The CPUs are requested in the resources section of the container in the pod spec, as in the sketch below.
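
A truncated pod spec fragment requesting pooled CPUs could look like this (pool names are the hypothetical ones from the example config above; exclusive CPUs are requested as whole units, shared CPUs in thousandths of a core):

spec:
  containers:
  - name: exclusivetestcontainer
    image: busyloop                      # hypothetical test image
    resources:
      requests:
        nokia.k8s.io/exclusive_pool1: "2"
      limits:
        nokia.k8s.io/exclusive_pool1: "2"
  - name: sharedtestcontainer
    image: busyloop
    resources:
      requests:
        nokia.k8s.io/shared_pool1: "500"
      limits:
        nokia.k8s.io/shared_pool1: "500"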

Annotation:

The name of the annotation is nokia.k8s.io/cpus and its schema is the following, the pool being the advertised resource (pool) name.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/container"
  },
  "definitions": {
    "container": {
      "type": "object",
      "required": [
        "container",
        "processes"
      ],
      "properties": {
        "container": {
          "type": "string"
        },
        "processes": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/process"
          }
        }
      }
    },
    "process": {
      "type": "object",
      "required": [
        "process",
        "args",
        "pool",
        "cpus"
      ],
      "properties": {
        "process": {
          "type": "string"
        },
        "args": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "pool": {
          "type": "string"
        },
        "cpus": {
          "type": "number"
        }
      }
    }
  }
}

An example is provided in the cpu-test.yaml pod manifest in the deployment folder.
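
As a rough sketch (container name, process path, arguments and pool name are hypothetical; cpu-test.yaml is the authoritative example), such an annotation could look like this in a pod manifest:

metadata:
  annotations:
    nokia.k8s.io/cpus: |
      [{
        "container": "exclusivetestcontainer",
        "processes": [{
          "process": "/usr/bin/busyloop",
          "args": ["--interval", "10"],
          "pool": "exclusive_pool1",
          "cpus": 2
        }]
      }]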

Restrictions

The following restrictions apply when allocating CPUs from pools and configuring pools:

  • There can be only one shared pool in the node
  • Resources belonging to the default pool are not advertised. The default pool definition is only used by the CPUSetter component

Build

Mutating webhook

$ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t cpu-device-webhook -f build/Dockerfile.webhook  .

The device plugin

$ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t cpudp -f build/Dockerfile.cpudp  .

Process starter

$ dep ensure --vendor-only
$ CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' github.com/nokia/CPU-Pooler/cmd/process-starter

CPUSetter

$ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t cpusetter -f build/Dockerfile.cpusetter  .

Installation

Install process starter to host file system:

$ cp process-starter /opt/bin/

Create cpu-device-plugin config and daemonset:

$ kubectl create -f deployment/cpu-pooler-config.yaml
$ kubectl create -f deployment/cpu-dev-ds.yaml

There is a helper script ./scripts/generate_cert.sh that generates certificate and key for the webhook admission controller. The script deployment/create-webhook-conf.sh can be used to create the webhook configuration from the provided manifest file (webhook-conf.yaml).

Create CPUSetter daemonset:

$ kubectl create -f deployment/cpusetter-ds.yaml

The following steps create the webhook server with the necessary configuration (including the certificate and key):

$ ./scripts/generate-cert.sh
$ kubectl create -f deployment/webhook-svc-depl.yaml
$ deployment/create-webhook-conf.sh deployment/webhook-conf.yaml

The cpu-device-plugin with the webhook should be running now. Test the installation with a CPU test pod. First create the image for the test container:

$ docker build -t busyloop test/

Start the test container:

$ kubectl create -f ./deployment/cpu-test.yaml

Upon instantiation you can observe that the cpuset.cpus parameter in the created containers' cpuset cgroupfs hierarchy is set to the right value:

  • to the configured cpuset belonging to the shared pool for sharedtestcontainer
  • to a cpuset containing only the IDs of the 2 chosen cores for exclusivetestcontainer
  • to the configured cpuset belonging to the default pool for defaulttestcontainer
