
kube-events-exporter's Introduction

Red Hat Observability Service

This project holds the configuration files for our internal Red Hat Observability Service based on Observatorium.

See our website for more information about RHOBS.

Requirements

  • Go 1.17+

macOS

  • findutils (for GNU xargs)
  • gnu-sed

Both can be installed using Homebrew: brew install gnu-sed findutils. Afterwards, update the SED and XARGS variables in the Makefile to use gsed and gxargs or replace them in your environment.

Usage

This repository contains Jsonnet configuration for generating the Kubernetes objects that compose the RHOBS service and its observability.

RHOBS service

The Jsonnet files for the RHOBS service can be found in the services directory. To compose the RHOBS service, we import Jsonnet libraries from several open source repositories:

  • kube-thanos for Thanos components
  • Observatorium for the Observatorium, Minio, Memcached, Gubernator, and Dex components
  • thanos-receive-controller for the Thanos receive controller component
  • parca for the Parca component
  • observatorium api for the API component
  • observatorium up for the up component
  • rules-objstore for the rules-objstore component

Currently, RHOBS components are rendered as OpenShift Templates that allow parameters. This is how we deploy to multiple clusters: they share the same configuration core but differ in details such as resources or names.

This is why there might be a gap between vanilla Observatorium and RHOBS. We plan to resolve this gap in the future.

Running make manifests generates all required files into the resources/services directory.

Observability

Similarly, in order to have observability (alerts, recording rules, dashboards) for our service, we import mixins from various projects and compose them all together in the observability directory.

Running make prometheusrules grafana generates all required files into the resources/observability directory.

Updating Dependencies

An up-to-date list of Jsonnet dependencies can be found in jsonnetfile.json. Fetching all dependencies is done through the make vendor_jsonnet target.

To update a dependency, the usual process is:

make vendor_jsonnet # This installs dependencies like `jb` thanks to the Bingo project.
JB=`ls $(go env GOPATH)/bin/jb-* -t | head -1`

# Updates `kube-thanos` to main and sets the new hash in `jsonnetfile.lock.json`.
$JB update https://github.com/thanos-io/kube-thanos/jsonnet/kube-thanos@main

# Updates all dependencies to their latest revisions and sets the new hashes in `jsonnetfile.lock.json`.
$JB update

Testing cluster

The purpose of the RHOBS testing cluster is to experiment with changes before they are rolled out to the staging and production environments. The objects in the cluster are managed by app-interface; however, the testing cluster uses a different set of namespaces: observatorium{-logs,-metrics,-traces}-testing.

Changes can be applied to the cluster manually; however, they will be overridden by app-interface during the next deployment cycle.

Refresh token

The refresh token can be obtained via token-refresher.

./token-refresher --url=https://observatorium.apps.rhobs-testing.qqzf.p1.openshiftapps.com  --oidc.client-id=observatorium-rhobs-testing  --oidc.client-secret=<token> --log.level=debug --oidc.issuer-url=https://sso.redhat.com/auth/realms/redhat-external --oidc.audience=observatorium-telemeter-testing --file /tmp/token
cat /tmp/token

App Interface

Our deployments are managed by the Red Hat AppSRE team.

Updating Dashboards

Staging: once a PR containing dashboard changes is merged to main, it goes directly to the stage environment, because the telemeter-dashboards resourceTemplate refers to the main branch here.

Production: update the commit hash ref in the saas file in the telemeterDashboards resourceTemplate for the production environment.

Prometheus Rules and Alerts

Use synchronize.sh to create an MR against app-interface that updates the Prometheus rules and alerts.

Components - Deployments, ServiceMonitors, ConfigMaps etc...

Staging: update the commit hash ref in https://gitlab.cee.redhat.com/service/app-interface/blob/master/data/services/telemeter/cicd/saas.yaml

Production: update the commit hash ref in https://gitlab.cee.redhat.com/service/app-interface/blob/master/data/services/telemeter/cicd/saas.yaml

CI Jobs

Job runs are posted in:

  • #sd-app-sre-info for Grafana dashboards
  • #team-monitoring-info for everything else

Troubleshooting

  1. Enable port forwarding for a user - example
  2. Add a pod name to the allowed list for port forwarding - example

kube-events-exporter's People

Contributors

dgrisonnet, lilic


kube-events-exporter's Issues

Spike: support events.k8s.io api group

Description

This project was originally meant to only support core/v1 Events; however, as the new Event API is planned to go GA in Kubernetes v1.19, we might want to consider supporting it.

Impactful changes

EventRecorder

The core API EventRecorder was designed to update the apiserver every time an Event was generated. This was really useful to this project because it meant that, by watching the resource on the apiserver, we could be notified right away whenever an Event was generated.

However, in order to reduce the number of apiserver calls made by the EventRecorder, this behavior was changed in the new implementation, where three mechanisms update the apiserver:

  • A singleton Event is created
  • An EventSeries is broken (no new isomorphic Event in the last 6 minutes)
  • A 30-minute heartbeat sends updates about the existing EventSeries

These changes reduce the precision we can achieve on Events, as we will only receive updates about them every 30 minutes. This might require redefining the goals of this project.

Event structure

The InvolvedObject field was renamed Regarding to make the semantics more obvious. Since we were using the InvolvedObject naming in our labels, we might also want to consider renaming:

involved_object_kind => regarding_kind
involved_object_namespace => regarding_namespace

--
Event API proposal
Event API GA

Fix kube_events_total inconsistencies when isomorphic Events are created

Description

When an Event is created, we increment the kube_events_total counter by 1. However, in some cases, a single Event creation can result in multiple isomorphic Events being created.

When an Event is updated, this is handled correctly because we add the difference between the counts of the old Event and the new one to the counter.
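The update path described above amounts to adding the count difference to the counter. A minimal sketch, with a hypothetical Event stand-in and helper name (the real handler operates on corev1.Event objects inside the exporter's collector):

```go
package main

import "fmt"

// Event is a minimal stand-in for the fields the exporter cares about here.
type Event struct {
	Reason string
	Count  int32
}

// updateDelta returns how much to add to kube_events_total when an Event is
// updated: the difference between the new count and the old one.
func updateDelta(old, cur Event) int32 {
	return cur.Count - old.Count
}

func main() {
	old := Event{Reason: "BackOff", Count: 4}
	cur := Event{Reason: "BackOff", Count: 7}
	// Three isomorphic Events occurred since the last observation.
	fmt.Println(updateDelta(old, cur))
}
```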

Event creation

"k8s.io/client-go/tools/record/" EventRecorder

There are 2 cases where an Event is created with this recorder:

  • The Event count is equal to 1
  • The recorder tried to patch an Event that does not exist anymore on the apiserver

With this recorder, each Event generation results in one request to the apiserver. Thus, the assumption of one Create event resulting in one Event creation is correct.

"k8s.io/client-go/tools/events/" EventRecorder

There are 2 cases where an Event is created with this recorder:

  • The Event series is nil
  • The recorder tried to patch an Event that does not exist anymore on the apiserver

Both behaviours are really similar. However, in this particular case, in order to reduce the number of calls made to the apiserver, it was chosen to only update the Events on the apiserver every 30 minutes via a heartbeat (goroutine), meaning that all updates during these 30 minutes get merged into a single request. With a PATCH request, knowing how many Events were generated during this period is not a problem, since we can get both the previous count and the new one. However, if the PATCH request fails, the Event will be recreated and we won't be able to know the previous count.

Potential solutions

  • When an Event is created add the Event count to the counter.

We can't do that because, in the case of a deleted Event, the next update will result in the Event being recreated with a count equal to all the older Events plus the new ones.

Example: Delete Ev{Count=100} (retention) => Update Ev{Count=101} => Fail (does not exist) => Create Ev{Count=101}

  • When an Event is seen for the first time, increment the counter by one and store the Event and its count in an internal cache. We then keep the data up to date in the internal cache and use it as the old Event count when updating the counter.

We will have inconsistencies when multiple Events are created with the same apiserver request and seen for the first time by the exporter.

  • Have a sync period before starting collecting metrics in order to save existing Event counts in an internal cache.

This sync period needs to be longer than the period between two heartbeats (more than 30 minutes). However, this method does not produce any inconsistencies.

--
Current event handler in charge of updating the kube_events_total counter.

Fix exporter polling in e2e tests

Currently, the polling mechanism used in the e2e tests to wait for the exporter to be running is based on the ready-replicas count of the exporter Deployment. This introduces flakes when Events are created right after the exporter is created, as the informer might not be running yet.

In order to fix that, we should poll the /healthz endpoint instead.

Add Kubernetes Event Collector

The collector should handle the following Counter metric:

kube_events_total{type="", involved_object_namespace="", involved_object_kind="", reason=""}
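Semantically, that counter is one series per combination of the four labels. A library-free sketch of the keying (the real collector would register a prometheus/client_golang CounterVec instead; the type and method names here are made up):

```go
package main

import "fmt"

// eventsTotal mimics kube_events_total: a counter keyed by the four labels
// (type, involved_object_namespace, involved_object_kind, reason).
type eventsTotal map[[4]string]float64

// inc bumps the series identified by the label values.
func (m eventsTotal) inc(typ, namespace, kind, reason string, by float64) {
	m[[4]string{typ, namespace, kind, reason}] += by
}

func main() {
	m := eventsTotal{}
	m.inc("Warning", "default", "Pod", "BackOff", 1)
	m.inc("Warning", "default", "Pod", "BackOff", 2)
	// Increments with identical label values land on the same series.
	fmt.Println(m[[4]string{"Warning", "default", "Pod", "BackOff"}])
}
```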

Expose metrics about the exporter received requests

The exporter should expose the following metrics about itself:

  • kube_events_exporter_version
  • kube_events_exporter_requests_total
  • kube_events_exporter_requests_in_flight
  • kube_events_exporter_request_duration_seconds_sum
  • kube_events_exporter_request_duration_seconds_count
  • kube_events_exporter_request_duration_seconds_bucket
