Giter VIP home page Giter VIP logo

pushing-netperf-metrics-to-prometheus's Introduction

pushing-netperf-metrics-to-prometheus

Repository for the netperf component used by the Network-Aware framework for the Kubernetes platform based the Scheduler Framework.

Usage & Prerequisites

The goal is to perform netperf measurements between cluster nodes in Kubernetes and have the values available as metrics in Prometheus to make scheduling decisions based on latency.

First, Prometheus needs to be installed in the k8s-cluster. In our experiments, we have installed kube-prometheus. Please follow their guidelines on how to properly set up kube-prometheus in your cluster.

Then, we need to install the Prometheus PushGateway since we are pushing the metrics to Prometheus via its PushGateway.

To install the PushGateway alongside kube-prometheus, you can use helm to deploy it in your cluster:

helm install prometheus-pushgateway prometheus-community/prometheus-pushgateway --namespace monitoring --set serviceMonitor.enabled=true

You need to enable the service monitor to make sure kube-prometheus gets the data from the Pushgateway.

After installing kube-prometheus and the PushGateway, your monitoring namespace should look like this:

kubectl get pods -n monitoring

NAME                                      READY   STATUS    RESTARTS   AGE
alertmanager-main-0                       2/2     Running   0          43s
alertmanager-main-1                       2/2     Running   0          43s
alertmanager-main-2                       2/2     Running   0          43s
blackbox-exporter-6798fb5bb4-t6szj        3/3     Running   0          42s
grafana-7476b4c65b-k7lng                  1/1     Running   0          40s
kube-state-metrics-74964b6cd4-xgh8f       3/3     Running   0          39s
node-exporter-29gh8                       2/2     Running   0          37s
node-exporter-2pvnr                       2/2     Running   0          37s
node-exporter-bw87p                       2/2     Running   0          38s
node-exporter-dbfjq                       2/2     Running   0          38s
node-exporter-dk9hg                       2/2     Running   0          37s
node-exporter-fbhkg                       2/2     Running   0          37s
node-exporter-gblnq                       2/2     Running   0          37s
node-exporter-hhgl5                       2/2     Running   0          38s
node-exporter-njpp5                       2/2     Running   0          37s
node-exporter-nwrpb                       2/2     Running   0          37s
node-exporter-pkwgx                       2/2     Running   0          38s
node-exporter-szwpp                       2/2     Running   0          37s
node-exporter-w49j6                       2/2     Running   0          38s
node-exporter-z5ft9                       2/2     Running   0          38s
node-exporter-zqrx7                       2/2     Running   0          38s
prometheus-adapter-5b8db7955f-c7n99       1/1     Running   0          34s
prometheus-adapter-5b8db7955f-g2nkh       1/1     Running   0          34s
prometheus-k8s-0                          2/2     Running   0          32s
prometheus-k8s-1                          2/2     Running   0          32s
prometheus-operator-75d9b475d9-t4txw      2/2     Running   0          88s
prometheus-pushgateway-59b66c656c-nzlc6   1/1     Running   0          78s

Before running the netperf component, you should run the port-forward command to access the PushGateway via http://localhost:9090:

kubectl port-forward prometheus-pushgateway-59b66c656c-nzlc6 -n monitoring 9091

Netperf Component

The netperf component is based on the k8s-netperf open-sourced here

First, you should run kubectl apply -f k8s-netperf.yaml to deploy the netperf pods as daemon sets. After deployment, the number of pods in the cluster depends on the number of nodes in your cluster:

kubectl get pods -n default

NAME                           READY   STATUS    RESTARTS   AGE
netperf-host-2hngp             1/1     Running   0          15s
netperf-host-47jj6             1/1     Running   0          15s
netperf-host-4wbfq             1/1     Running   0          15s
netperf-host-7ltcw             1/1     Running   0          15s
netperf-host-9stmq             1/1     Running   0          15s
netperf-host-dfvmq             1/1     Running   0          15s
netperf-host-ftssm             1/1     Running   0          15s
netperf-host-hdzwq             1/1     Running   0          15s
netperf-host-hlvsh             1/1     Running   0          15s
netperf-host-jznvh             1/1     Running   0          15s
netperf-host-knnts             1/1     Running   0          15s
netperf-host-qfhms             1/1     Running   0          15s
netperf-host-qqhhg             1/1     Running   0          15s
netperf-host-srjz6             1/1     Running   0          15s
netperf-host-xj8lm             1/1     Running   0          16s
netperf-pod-2pk86              1/1     Running   0          16s
netperf-pod-77c9846498-gpfbv   1/1     Running   0          16s
netperf-pod-77c9846498-jrv7w   1/1     Running   0          16s
netperf-pod-8jhrg              1/1     Running   0          16s
netperf-pod-chlwx              1/1     Running   0          16s
netperf-pod-fhz9m              1/1     Running   0          16s
netperf-pod-gmhp2              1/1     Running   0          16s
netperf-pod-k6xfb              1/1     Running   0          16s
netperf-pod-mz6xh              1/1     Running   0          16s
netperf-pod-p6knm              1/1     Running   0          16s
netperf-pod-s29p4              1/1     Running   0          16s
netperf-pod-sd7k5              1/1     Running   0          16s
netperf-pod-tg5gw              1/1     Running   0          16s
netperf-pod-vbqj6              1/1     Running   0          16s
netperf-pod-w955l              1/1     Running   0          16s
netperf-pod-wt9cg              1/1     Running   0          16s
netperf-pod-z9r86              1/1     Running   0          16s

Then, you run our script ./runTest.sh to:

  • run the netperf measurements.
  • read the .csv and send the data to the Pushgateway.
  • repeat the loop, a timer can be set up (i.e., sleep)

The k8s-netperf perl script has been modified to run netperf tests between nodes and save the results in a .csv file. Two options exist:

  1. runNetperf.pl: It runs a test per node to another random node.
  2. runNetperfV2.pl: It runs a test for all nodes to every possible destination.

The first option is faster but not all data will be available at a given moment:

The second option takes longer but all data is available in Prometheus.

The goal is to send to the Pushgateway the latency percentiles between nodes (i.e., 50th, 90th and 99th percentile latency in microseconds).

A python program netperf_reporter.py sends the data to the PushGateway based on the netperf reporter written for Cilium open-sourced here

Examples

Example of a results.csv file:

origin,destination,netperf_p50_latency_microseconds,netperf_p90_latency_microseconds,netperf_p99_latency_microseconds
n5,n1,458,610,737
n4,n3,435,525,595
n3,n2,375,457,496
n2,n5,478,586,695
n1,n4,420,497,635

Metrics sent via the netperf reporter to the Prometheus PushGateway:

Query example in Prometheus:

Contact

For questions or support, please use GitHub's issue system.

pushing-netperf-metrics-to-prometheus's People

Contributors

jpedro1992 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.