aws-devopsguru-eks-test-harness's Introduction

DevOps Guru EKS Test Harness

This project lets you deploy an EKS cluster in your AWS account and trigger various failure modes through a test client, in order to demonstrate how DevOps Guru works with a Kubernetes cluster.

Requirements

In order to operate this test harness you will need the following:

Installing the harness

In order to provision the cluster and install all the necessary elements:

  • Authenticate to your AWS account using credentials that have mutating permissions, that is, permission to create and modify resources.
aws configure
  • Run the bootstrap script in the root folder of the repository.
./bootstrap.sh

Inspecting the cluster

If you would like to inspect the contents of the deployed EKS cluster, start kubectl proxy using the script in the root of the repository:

./start_proxy.sh

This will allow you to view:

To stop the proxy process, run:

./stop_proxy.sh

To get an access token for the Kubernetes dashboard, run:

./get_dashboard_token.sh

Running tests

Before running tests, please make sure that your cluster has been running for at least 60 minutes, to give DevOps Guru a chance to ingest and index all the metrics.

To run test cases, make sure you have a Python 3.6+ interpreter installed, then run:

./run_test.sh <test_name>

Currently supported test scenarios:

  • alb_4xx - triggers a series of 4XX errors in the test API, producing ApplicationELB HTTPCode_Target_4XX_Count Anomalous insights in DevOps Guru. Keep in mind that this can take 15-20 minutes to trigger.
  • alb_5xx - triggers a series of 5XX errors in the test API, producing ApplicationELB HTTPCode_Target_5XX_Count Anomalous insights in DevOps Guru. Keep in mind that this can take 15-20 minutes to trigger.
  • stop_instance - stops one of the underlying EC2 instances in the EKS node group, producing a ContainerInsights cluster_failed_node_count Anomalous In Stack eksctl-DevOpsGuruTestCluster-cluster insight in DevOps Guru.
  • restart_instance - restarts all the underlying EC2 instances in the EKS node group, ending the anomaly caused by stop_instance.
  • enable_cpu_stress_test - enables CPU stress test mode, which brings overall cluster CPU utilization above 90%. After 30 minutes, this produces an anomaly that does not generate a separate insight but is shown as part of the alb_5xx, alb_4xx and stop_instance insights. Before enabling this mode, make sure that the cluster has been running for at least 60 minutes to establish a utilization baseline.
  • disable_cpu_stress_test - disables the CPU stress test mode enabled by enable_cpu_stress_test.
  • trigger_pod_crash - installs a misconfigured deployment that induces a rolling pod crash due to a failing probe, to demonstrate pod_number_of_container_restarts insights.
  • disable_pod_crash - restores the normal deployment configuration after trigger_pod_crash.
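
The scenario names listed above can be checked before the test script is invoked. The following is a minimal sketch of such a wrapper, assuming run_test.sh is in the current directory and accepts exactly one of those names. The helpers is_supported_test and run_scenario are hypothetical and are not part of the repository.

```shell
#!/bin/sh
# Scenario names copied from the list above.
SUPPORTED_TESTS="alb_4xx alb_5xx stop_instance restart_instance enable_cpu_stress_test disable_cpu_stress_test trigger_pod_crash disable_pod_crash"

# Hypothetical helper: succeeds only if $1 matches one of the names above.
is_supported_test() {
  for _name in $SUPPORTED_TESTS; do
    [ "$_name" = "$1" ] && return 0
  done
  return 1
}

# Hypothetical wrapper: rejects unknown names, otherwise hands off to run_test.sh.
run_scenario() {
  if ! is_supported_test "$1"; then
    echo "unknown test scenario: $1" >&2
    return 2
  fi
  ./run_test.sh "$1"
}
```

With these functions sourced into the shell, `run_scenario alb_5xx` would call `./run_test.sh alb_5xx`, while a misspelled name is rejected before anything is sent to the cluster.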

Anomalous metric values can be confirmed in the CloudWatch console, and the anomalies produced by DevOps Guru can be seen in the DevOps Guru console.

Cleaning up test resources

To clean up the test harness resources from your account, run:

./cleanup.sh

If the cleanup script fails, you can attempt to manually delete the CloudFormation stack named eksctl-DevOpsGuruTestCluster-cluster.
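
A manual deletion of that stack could look like the sketch below, which uses the AWS CLI commands `aws cloudformation delete-stack` and `aws cloudformation wait stack-delete-complete`. The stack name is the one given above. The DRY_RUN variable and the run and delete_cluster_stack helpers are hypothetical conveniences for previewing the commands, not part of the repository.

```shell
#!/bin/sh
# Stack name as given in the cleanup instructions above.
STACK_NAME="eksctl-DevOpsGuruTestCluster-cluster"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: request deletion, then block until CloudFormation
# reports that the stack is gone.
delete_cluster_stack() {
  run aws cloudformation delete-stack --stack-name "$STACK_NAME" &&
  run aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME"
}
```

Running `DRY_RUN=1 delete_cluster_stack` only prints the two commands, which makes it possible to review them before running `delete_cluster_stack` against the account.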

aws-devopsguru-eks-test-harness's People

Contributors

dependabot[bot], mirelap-amazon, lukasvoda, hjarrell, amazon-auto, anthonymykhailenko
