Kubenurse

kubenurse is a little service that monitors all network connections in a Kubernetes cluster. Kubenurse measures request durations, records errors and exports those metrics in Prometheus format.

Deployment

You can get the Docker image from Docker Hub. The examples directory contains manifests which can be used to deploy kubenurse to the kube-system namespace of your cluster.

Helm deployment

You can also deploy kubenurse with Helm; the chart can be found in the helm/kubenurse directory.
The following command installs kubenurse with Helm:

    helm upgrade [RELEASE_NAME] --install -f values.yaml ./helm/kubenurse/

Configuration settings

| Setting | Description | Default |
|---|---|---|
| daemonset.image.repository | The repository name | postfinance/kubenurse |
| daemonset.image.tag | The tag/version of the image | v1.4.0 |
| namespace | The namespace where kubenurse will be deployed | kube-system |
| serviceAccount.name | The name of the service account which is used | kubenurse |
| service.name | The name of service which exposes the kubenurse application | 8080-8080 |
| service.port | The port number of the service | 8080 |
| ingress.enabled | Enable/disable the ingress | true |
| ingress.className | The class name of the ingress controller (e.g. the nginx ingress controller) | nginx |
| ingress.url | The URL of the ingress, e.g. kubenurse.westeurope.cloudapp.azure.com | dummy.kubenurse.com |
| rbac.allow_unschedulable.enabled | Configure a clusterrole and clusterrolebinding if env KUBENURSE_ALLOW_UNSCHEDULABLE is set to false | false |

After everything is set up and Prometheus scrapes the kubenurse instances, you can build dashboards that show network latencies and errors, or use the metrics for alerting.

[Figures: Grafana ingress view; Grafana path view]

Configuration

kubenurse is configured with environment variables:

  • KUBENURSE_INGRESS_URL: A URL to the kubenurse, used to check the ingress
  • KUBENURSE_SERVICE_URL: A URL to the kubenurse, used to check the Kubernetes service
  • KUBENURSE_INSECURE: If "true", TLS connections will not validate the certificate
  • KUBENURSE_EXTRA_CA: Additional CA cert path for TLS connections
  • KUBENURSE_NAMESPACE: Namespace in which to look for the neighbour kubenurses
  • KUBENURSE_NEIGHBOUR_FILTER: A Kubernetes label selector (e.g. app=kubenurse) to filter neighbour kubenurses
  • KUBENURSE_ALLOW_UNSCHEDULABLE: If this is "true", path checks to neighbouring kubenurses are made even if they are running on unschedulable nodes.
  • KUBENURSE_USE_TLS: If this is "true", enable TLS endpoint on port 8443
  • KUBENURSE_CERT_FILE: Certificate to use with TLS endpoint
  • KUBENURSE_CERT_KEY: Key to use with TLS endpoint

The following variables are injected into the Pod by Kubernetes and should not be defined manually:

  • KUBERNETES_SERVICE_HOST: Host to communicate to the kube-apiserver
  • KUBERNETES_SERVICE_PORT: Port to communicate to the kube-apiserver

The HTTP client used for the checks also appends the certificate /var/run/secrets/kubernetes.io/serviceaccount/ca.crt if that file is found.
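
As an illustration of the settings listed above, the following is a minimal sketch of how such environment variables could be read into a typed structure. It is not kubenurse's own code: the `KubenurseEnv` class, the `load_env` function, the empty-string defaults, and the rule that only the exact string "true" enables a flag are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class KubenurseEnv:
    """Hypothetical container for the documented KUBENURSE_* variables."""
    ingress_url: str
    service_url: str
    insecure: bool
    extra_ca: str
    namespace: str
    neighbour_filter: str
    allow_unschedulable: bool
    use_tls: bool
    cert_file: str
    cert_key: str


def load_env(env: Mapping[str, str] = os.environ) -> KubenurseEnv:
    """Read the documented variables from a mapping (os.environ by default)."""

    def flag(name: str) -> bool:
        # Assumption: only the exact string "true" turns a flag on.
        return env.get(name, "") == "true"

    def text(name: str) -> str:
        # Assumption: unset variables fall back to an empty string.
        return env.get(name, "")

    return KubenurseEnv(
        ingress_url=text("KUBENURSE_INGRESS_URL"),
        service_url=text("KUBENURSE_SERVICE_URL"),
        insecure=flag("KUBENURSE_INSECURE"),
        extra_ca=text("KUBENURSE_EXTRA_CA"),
        namespace=text("KUBENURSE_NAMESPACE"),
        neighbour_filter=text("KUBENURSE_NEIGHBOUR_FILTER"),
        allow_unschedulable=flag("KUBENURSE_ALLOW_UNSCHEDULABLE"),
        use_tls=flag("KUBENURSE_USE_TLS"),
        cert_file=text("KUBENURSE_CERT_FILE"),
        cert_key=text("KUBENURSE_CERT_KEY"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the sketch easy to try out without touching the real process environment.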

HTTP Endpoints

The kubenurse service listens for HTTP requests on port 8080 (optionally HTTPS on port 8443) and exposes the following endpoints:

  • /: Redirects to /alive
  • /alive: Returns pretty-printed JSON with the check results, described below
  • /alwayshappy: Returns HTTP 200; kubenurse uses this endpoint to test itself
  • /metrics: Exposes Prometheus metrics

The /alive endpoint returns JSON like the following, with status code 200 if everything is OK and 500 otherwise:

{
  "api_server_direct": "ok",
  "api_server_dns": "ok",
  "me_ingress": "ok",
  "me_service": "ok",
  "hostname": "kubenurse-1234-x2bwx",
  "neighbourhood_state": "ok",
  "neighbourhood": [
   {
    "PodName": "kubenurse-1234-8fh2x",
    "PodIP": "10.10.10.67",
    "HostIP": "10.12.12.66",
    "NodeName": "k8s-66.example.com",
    "Phase": "Running"
   },
   {
    "PodName": "kubenurse-1234-ffjbs",
    "PodIP": "10.10.10.138",
    "HostIP": "10.12.12.89",
    "NodeName": "k8s-89.example.com",
    "Phase": "Running"
   }
  ],
  "headers": {
   "Accept": [
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
   ],
   "Accept-Encoding": [
    "gzip, deflate, br"
   ],
   ...
  }
}
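
A consumer of /alive might want to list the checks that are not healthy. The sketch below parses a response body shaped like the example above. The function name `failing_checks` is hypothetical, and it assumes that a failed check carries some value other than "ok"; the source only shows the "ok" case.

```python
import json
from typing import Dict

# Top-level keys from the example response that hold a check result.
CHECK_KEYS = (
    "api_server_direct",
    "api_server_dns",
    "me_ingress",
    "me_service",
    "neighbourhood_state",
)


def failing_checks(alive_body: str) -> Dict[str, str]:
    """Return the check results whose value is not "ok" (assumed failure marker)."""
    data = json.loads(alive_body)
    return {
        key: data[key]
        for key in CHECK_KEYS
        if key in data and data[key] != "ok"
    }
```

An empty dictionary then corresponds to the all-OK case in which the endpoint answers with status code 200.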

Health Checks

Every five seconds and on every access of /alive, the checks described below are run. Check results are cached for 3 seconds in order to prevent excessive network traffic.
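
The caching behaviour described above can be sketched as a small time-based cache. This is an illustration only, not the project's implementation: the `CachedChecks` class, its `get` method, and the injectable `clock` parameter are names invented for this example.

```python
import time
from typing import Callable, Dict, Optional


class CachedChecks:
    """Re-runs the checks only when the cached result is at least ttl seconds old."""

    def __init__(
        self,
        run_checks: Callable[[], Dict[str, str]],
        ttl: float = 3.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_checks = run_checks
        self._ttl = ttl
        self._clock = clock
        self._result: Optional[Dict[str, str]] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Dict[str, str]:
        now = self._clock()
        # Run the checks on first use or once the cached result has expired.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._result = self._run_checks()
            self._fetched_at = now
        return self._result
```

With a TTL of 3 seconds, a burst of requests to /alive would trigger at most one round of network checks per 3-second window, while a periodic caller with a 5-second interval would always see an expired cache and run the checks again.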

The following diagram illustrates the communication that occurs:

[Figure: Communication]

API Server Direct

Checks the /version endpoint of the Kubernetes API Server through the direct link (KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT).

Metric type: api_server_direct

API Server DNS

Checks the /version endpoint of the Kubernetes API Server through the Cluster DNS URL https://kubernetes.default.svc:$KUBERNETES_SERVICE_PORT. This also verifies a working kube-dns deployment.

Metric type: api_server_dns
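
To make the difference between the two API server checks concrete, this sketch builds both target URLs from the injected variables. The function name `api_server_urls` is hypothetical, and the use of https for the direct link is an assumption; the source only states the scheme for the DNS variant. IPv6 hosts, which would need brackets, are not handled.

```python
from typing import Dict, Mapping


def api_server_urls(env: Mapping[str, str]) -> Dict[str, str]:
    """Map each API server check type to the /version URL it would request."""
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    return {
        # Direct link: host and port as injected by Kubernetes (https assumed).
        "api_server_direct": f"https://{host}:{port}/version",
        # DNS link: needs a working cluster DNS to resolve the service name.
        "api_server_dns": f"https://kubernetes.default.svc:{port}/version",
    }
```

If the direct check succeeds while the DNS check fails, that points towards a cluster DNS problem rather than an API server outage.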

Me Ingress

Checks if the kubenurse is reachable at the /alwayshappy endpoint behind the ingress. This address is provided by the environment variable KUBENURSE_INGRESS_URL that could look like https://kubenurse.example.com. This also verifies a correct upstream DNS resolution.

Metric type: me_ingress

Me Service

Checks if the kubenurse is reachable at the /alwayshappy endpoint through the Kubernetes service. The address is provided by the environment variable KUBENURSE_SERVICE_URL that could look like http://kubenurse.mynamespace.svc:8080. This also verifies a working kube-proxy setup.

Metric type: me_service

Neighbourhood

Checks if every neighbour kubenurse is reachable at the /alwayshappy endpoint. Neighbours are discovered by querying the kube-apiserver for every Pod in the KUBENURSE_NAMESPACE namespace that matches the label selector KUBENURSE_NEIGHBOUR_FILTER. Requests go directly to the Pod IP (port 8080, or 8443 if TLS is enabled). The metric type consists of the prefix path_ followed by the hostname of the kubelet on which the neighbour kubenurse runs. By default, only kubenurses on schedulable nodes are considered neighbours; set KUBENURSE_ALLOW_UNSCHEDULABLE="true" to change this.

Metric type: path_$KUBELET_HOSTNAME
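
The neighbour selection and naming rules can be sketched as follows. The neighbour dictionaries use the field names from the /alive example; everything else is an assumption for illustration: the `neighbour_targets` function, the `unschedulable_nodes` set standing in for whatever the kube-apiserver reports, the use of NodeName as the kubelet hostname, and the https scheme when TLS is enabled.

```python
from typing import Dict, FrozenSet, Iterable, Mapping


def neighbour_targets(
    neighbours: Iterable[Mapping[str, str]],
    unschedulable_nodes: FrozenSet[str] = frozenset(),
    allow_unschedulable: bool = False,
    use_tls: bool = False,
) -> Dict[str, str]:
    """Map each path_<node> metric type to the neighbour URL that would be checked."""
    scheme, port = ("https", 8443) if use_tls else ("http", 8080)
    targets = {}
    for neighbour in neighbours:
        node = neighbour["NodeName"]
        # Skip neighbours on unschedulable nodes unless explicitly allowed.
        if not allow_unschedulable and node in unschedulable_nodes:
            continue
        targets[f"path_{node}"] = f"{scheme}://{neighbour['PodIP']}:{port}/alwayshappy"
    return targets
```

Called with the two neighbours from the /alive example and no unschedulable nodes, this yields one entry per node, keyed path_k8s-66.example.com and path_k8s-89.example.com.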

Metrics

All performed checks expose metrics which can be used to monitor/alert:

  • SDN network latencies and errors
  • kubelet-to-kubelet network latencies and errors
  • pod-to-apiserver communication
  • Ingress roundtrip latencies and errors
  • Service roundtrip latencies and errors (kube-proxy)
  • Major kube-apiserver issues
  • kube-dns (or CoreDNS) errors
  • External DNS resolution errors (ingress URL resolution)

At /metrics you will find these:

  • kubenurse_errors_total: Kubenurse error counter partitioned by error type
  • kubenurse_request_duration: Kubenurse request duration partitioned by error type, summary over one minute
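
For a quick look at the error counter without a full Prometheus setup, the text returned by /metrics can be parsed directly. The sketch below sums kubenurse_errors_total per error type. The function name `errors_by_type` is hypothetical, and the label name "type" is an assumption, so check the actual /metrics output before relying on it. The simple regular expressions do not handle label values that contain escaped quotes or closing braces.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches sample lines such as: kubenurse_errors_total{type="me_ingress"} 3
_SAMPLE = re.compile(r'^kubenurse_errors_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def errors_by_type(metrics_text: str, label: str = "type") -> Dict[str, float]:
    """Sum kubenurse_errors_total samples per value of the given label."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line)
        if match is None:
            continue  # comment lines and other metrics are ignored
        labels = dict(_LABEL.findall(match.group("labels")))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)
```

In practice a Prometheus query over the scraped series is the more robust route; this parser is only meant for ad hoc inspection of a single endpoint.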
