
krkn-hub's Issues

Add demo to the readme

We need to add a short demo of krkn-hub to the GitHub README; this will give a quick overview of the tooling's capabilities without having to go through the docs. https://asciinema.org/ might help with this.

CPU/MEM Hog scenarios - pod generated in default namespace is not cleaned/removed

When we trigger a CPU/memory hog scenario, a pod is scheduled in the default namespace. After the scenario completes, this pod lingers and is never removed.

In my view, this pod should be cleaned up once the scenario test is completed.

To verify, after the scenario is completed run $ oc get pods -n default
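Until the scenario cleans up after itself, a small helper can pick the leftover pod out of the listing. This is a workaround sketch; the "hog" name pattern is an assumption, so check the actual pod name in your cluster first:

```shell
# Reads "oc get pods -o name"-style lines on stdin and prints the ones that
# look like leftover hog pods ("hog" pattern is an assumption).
find_hog_pods() {
  grep 'hog' || true
}

# Usage against a live cluster:
#   oc get pods -n default -o name | find_hog_pods | xargs -r oc delete -n default
```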

[ERROR] scenario: scenarios/pvc_scenario.yaml failed with exception: <class 'UnboundLocalError'> file: /root/kraken/kraken/pvc/pvc_scenario.py line: 141

scenarios/pvc_scenario.yaml failed with exception: <class 'UnboundLocalError'>

podman run -it --rm --name=disk --net=host --env-host=true -v $KUBECONFIG:/root/.kube/config -v $SCENARIO:/root/kraken/scenarios/pvc_scenario.yaml -d krkn-hub:pvc-scenarios

2024-07-17 02:25:10,381 [INFO] Starting kraken
2024-07-17 02:25:10,390 [INFO] Initializing client to talk to the Kubernetes cluster
2024-07-17 02:25:10,390 [INFO] Generated a uuid for the run: 232d86a6-04ad-4d5e-b5de-8187b0f8a239
2024-07-17 02:25:20,834 [INFO] Fetching cluster info
2024-07-17 02:25:22,498 [INFO] Cluster version is 4.12.32
2024-07-17 02:25:22,498 [INFO] Server URL: https://<abc.com>:6443
2024-07-17 02:25:22,499 [INFO] Daemon mode not enabled, will run through 1 iterations

2024-07-17 02:25:22,499 [INFO] Executing scenarios for iteration 0
2024-07-17 02:25:22,499 [INFO] Running PVC scenario
2024-07-17 02:25:22,501 [INFO] Input params:
pvc_name: ''
pod_name: 'virt-launcher-rodan-223249-137'
namespace: 'virtualmachines'
target_fill_percentage: '75%'
duration: '60s'
2024-07-17 02:25:43,240 [INFO] Volume name: os-disk
2024-07-17 02:25:43,241 [INFO] PVC name: rodan-223249-137-os
2024-07-17 02:25:43,241 [ERROR] scenario: scenarios/pvc_scenario.yaml failed with exception: <class 'UnboundLocalError'> file: /root/kraken/kraken/pvc/pvc_scenario.py line: 141

$ cat scenarios/pvc_scenario.yaml
pvc_scenario:
  pvc_name:
  pod_name: virt-launcher-rodan-223249-137
  namespace: virtualmachines
  fill_percentage: 75
  duration: 60
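For reference, this is the classic shape of an UnboundLocalError: a variable assigned only inside a conditional branch is referenced unconditionally later, which blows up when the branch never runs (here, plausibly because pvc_name is empty and the lookup finds nothing). A hypothetical sketch, not the actual pvc_scenario.py code:

```python
# Sketch of the failure mode (illustrative function, not pvc_scenario.py):
# mount_path is only bound when a line matches, so the return statement
# raises UnboundLocalError when no line contains "/dev/".
def resolve_mount_path(df_output: str) -> str:
    for line in df_output.splitlines():
        if "/dev/" in line:
            mount_path = line.split()[-1]
    return mount_path  # UnboundLocalError if the loop never matched


try:
    resolve_mount_path("tmpfs 100 0 100 0% /tmp")
except UnboundLocalError as exc:
    print(f"reproduced: {exc}")
```

The fix is to initialize the variable up front (or fail early with a clear error when the lookup comes back empty) rather than letting the bare name escape the branch.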

power-outage scenario container image exits prematurely

The container exits prematurely because KUBECONFIG is not set properly inside the container. But when I run the container with /bin/bash, I can see the kubeconfig mounted under /root/.kube/config, and I'm not sure why unset KUBECONFIG is run, per the logs below.

$ podman run -it --rm --name=power --net=host --env-host=true -v /tmp/kubeconfig-42:/root/.kube/config:Z quay.io/krkn-chaos/krkn-hub:power-outages

  • source /root/main_env.sh
    ++ export CERBERUS_ENABLED=False
    ++ CERBERUS_ENABLED=False
    ++ export CERBERUS_URL=http://0.0.0.0:8080
    ++ CERBERUS_URL=http://0.0.0.0:8080
    ++ export KRKN_KUBE_CONFIG=/root/.kube/config
    ++ KRKN_KUBE_CONFIG=/root/.kube/config
    ++ export WAIT_DURATION=60
    ++ WAIT_DURATION=60
    ++ export ITERATIONS=1
    ++ ITERATIONS=1
    ++ export DAEMON_MODE=False
    ++ DAEMON_MODE=False
    ++ export RETRY_WAIT=120
    ++ RETRY_WAIT=120
    ++ export PUBLISH_KRAKEN_STATUS=False
    ++ PUBLISH_KRAKEN_STATUS=False
    ++ export SIGNAL_ADDRESS=0.0.0.0
    ++ SIGNAL_ADDRESS=0.0.0.0
    ++ export PORT=8081
    ++ PORT=8081
    ++ export SIGNAL_STATE=RUN
    ++ SIGNAL_STATE=RUN
    ++ export DEPLOY_DASHBOARDS=False
    ++ DEPLOY_DASHBOARDS=False
    ++ export CAPTURE_METRICS=False
    ++ CAPTURE_METRICS=False
    ++ export ENABLE_ALERTS=False
    ++ ENABLE_ALERTS=False
    ++ export ALERTS_PATH=config/alerts
    ++ ALERTS_PATH=config/alerts
    ++ export ES_SERVER=http://0.0.0.0:9200
    ++ ES_SERVER=http://0.0.0.0:9200
    ++ export CHECK_CRITICAL_ALERTS=False
    ++ CHECK_CRITICAL_ALERTS=False
    ++ export KUBE_BURNER_URL=https://github.com/cloud-bulldozer/kube-burner/releases/download/v1.7.0/kube-burner-1.7.0-Linux-x86_64.tar.gz
    ++ KUBE_BURNER_URL=https://github.com/cloud-bulldozer/kube-burner/releases/download/v1.7.0/kube-burner-1.7.0-Linux-x86_64.tar.gz
    ++ export TELEMETRY_ENABLED=False
    ++ TELEMETRY_ENABLED=False
    ++ export TELEMETRY_API_URL=https://ulnmf9xv7j.execute-api.us-west-2.amazonaws.com/production
    ++ TELEMETRY_API_URL=https://ulnmf9xv7j.execute-api.us-west-2.amazonaws.com/production
    ++ export TELEMETRY_USERNAME=redhat-chaos
    ++ TELEMETRY_USERNAME=redhat-chaos
    ++ export TELEMETRY_PASSWORD=
    ++ TELEMETRY_PASSWORD=
    ++ export TELEMETRY_PROMETHEUS_BACKUP=True
    ++ TELEMETRY_PROMETHEUS_BACKUP=True
    ++ export TELEMTRY_FULL_PROMETHEUS_BACKUP=False
    ++ TELEMTRY_FULL_PROMETHEUS_BACKUP=False
    ++ export TELEMETRY_BACKUP_THREADS=5
    ++ TELEMETRY_BACKUP_THREADS=5
    ++ export TELEMETRY_ARCHIVE_PATH=/tmp
    ++ TELEMETRY_ARCHIVE_PATH=/tmp
    ++ export TELEMETRY_MAX_RETRIES=0
    ++ TELEMETRY_MAX_RETRIES=0
    ++ export TELEMETRY_RUN_TAG=chaos
    ++ TELEMETRY_RUN_TAG=chaos
    ++ export TELEMETRY_ARCHIVE_SIZE=1000
    ++ TELEMETRY_ARCHIVE_SIZE=1000
    ++ export TELEMETRY_LOGS_BACKUP=False
    ++ TELEMETRY_LOGS_BACKUP=False
    ++ export 'TELEMETRY_FILTER_PATTERN=["(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2}\.\d+).+","kinit (\d+/\d+/\d+\s\d{2}:\d{2}:\d{2})\s+","(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z).+"]'
    ++ TELEMETRY_FILTER_PATTERN='["(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2}\.\d+).+","kinit (\d+/\d+/\d+\s\d{2}:\d{2}:\d{2})\s+","(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z).+"]'
    ++ export TELEMETRY_CLI_PATH=
    ++ TELEMETRY_CLI_PATH=
    ++ export TELEMETRY_EVENTS_BACKUP=True
    ++ TELEMETRY_EVENTS_BACKUP=True
    ++ unset KUBECONFIG
  • source /root/env.sh
    ++ export SHUTDOWN_DURATION=1200
    ++ SHUTDOWN_DURATION=1200
    ++ export CLOUD_TYPE=aws
    ++ CLOUD_TYPE=aws
    ++ export TIMEOUT=180
    ++ TIMEOUT=180
    ++ export SCENARIO_TYPE=cluster_shut_down_scenarios
    ++ SCENARIO_TYPE=cluster_shut_down_scenarios
    ++ export 'SCENARIO_FILE=- scenarios/cluster_shut_down_scenario.yml'
    ++ SCENARIO_FILE='- scenarios/cluster_shut_down_scenario.yml'
    ++ export SCENARIO_POST_ACTION=
    ++ SCENARIO_POST_ACTION=
  • source /root/common_run.sh
  • config_setup
  • envsubst
  • checks
  • check_oc
  • log 'Checking if OpenShift client is installed'
    ++ date +%d-%m-%YT%H:%M:%S
  • echo -e '\033[1m10-07-2024T02:45:06 Checking if OpenShift client is installed\033[0m'
    10-07-2024T02:45:06 Checking if OpenShift client is installed
  • which oc
  • alias
  • eval declare -f
    ++ declare -f
  • /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot oc
    /usr/local/bin/oc
  • check_kubectl
  • log 'Checking if kubernetes client is installed'
    ++ date +%d-%m-%YT%H:%M:%S
  • echo -e '\033[1m10-07-2024T02:45:06 Checking if kubernetes client is installed\033[0m'
    10-07-2024T02:45:06 Checking if kubernetes client is installed
  • which kubectl
  • alias
  • eval declare -f
    ++ declare -f
  • /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot kubectl
    /usr/local/bin/kubectl
  • check_cluster_version
  • kubectl version
    WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
    Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"33a7a8bcccdd1c7c0e2f51609d832d31232d2f26", GitTreeState:"clean", BuildDate:"2023-12-13T22:07:37Z", GoVersion:"go1.20.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
    Kustomize Version: v5.0.1
    Unable to connect to the server: EOF
  • log 'Unable to connect to the cluster, please check if it'''s up and make sure the KUBECONFIG is set correctly'
    ++ date +%d-%m-%YT%H:%M:%S
  • echo -e '\033[1m10-07-2024T02:45:17 Unable to connect to the cluster, please check if it'''s up and make sure the KUBECONFIG is set correctly\033[0m'
    10-07-2024T02:45:17 Unable to connect to the cluster, please check if it's up and make sure the KUBECONFIG is set correctly
  • exit 1

simulate a disk failure on the cluster node (full or partial)

Sometimes a physical disk failure, be it full or partial, can bring down the overall I/O performance of the cluster, so is there a way to simulate disk failure in Kraken?

Here, partial failure means a predictive failure or medium errors (a few sectors have gone bad) where the disk is still accessible by the kernel/filesystem/application.
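One possible approach (not currently in Kraken, sketched here under the assumption of root access on the node and a scratch /dev/sdb): the device-mapper "error" target can make a range of sectors fail while the rest of the device stays readable, which approximates a few bad sectors on an otherwise-live disk.

```shell
# Device-mapper table (format: start length target args): sectors 2048-4095
# of the mapped device return I/O errors, the rest pass through to /dev/sdb.
# Device name and sector ranges are illustrative.
table='0 2048 linear /dev/sdb 0
2048 2048 error
4096 204800 linear /dev/sdb 4096'
echo "$table"
# As root on the node: echo "$table" | dmsetup create bad-disk
```

A full failure could be simulated the same way with a single all-"error" table line covering the whole device.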

Create common_run bash script for all run files to use

We want to add a common run bash file at the base level with common functions that each run.sh script will use, to avoid duplication.

We can start with the following functions, but there could be more:

# Check if oc is installed
log "Checking if OpenShift client is installed"
if ! command -v oc &>/dev/null; then
  log "Looks like OpenShift client is not installed, please install before continuing"
  log "Exiting"
  exit 1
fi

# Check if kubectl is installed
log "Checking if kubernetes client is installed"
if ! command -v kubectl &>/dev/null; then
  log "Looks like Kubernetes client is not installed, please install before continuing"
  log "Exiting"
  exit 1
fi

# Check if the cluster is reachable and print the clusterversion under test
if ! kubectl get clusterversion; then
  log "Unable to connect to the cluster, please check if it's up and make sure the KUBECONFIG is set correctly"
  exit 1
fi
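The two near-identical client checks could also collapse into a single parameterized helper inside common_run.sh. A minimal sketch, with illustrative function names (the log format mirrors what the run.sh scripts print today):

```shell
# Bold, timestamped log line, matching the existing run.sh output style.
log() {
  echo -e "\033[1m$(date +%d-%m-%YT%H:%M:%S) $*\033[0m"
}

# check_binary <command> <friendly name>: fail if the command is missing.
check_binary() {
  log "Checking if $2 is installed"
  if ! command -v "$1" &>/dev/null; then
    log "Looks like $2 is not installed, please install before continuing"
    log "Exiting"
    exit 1
  fi
}

check_binary sh "POSIX shell"
```

Each run.sh would then just source common_run.sh and call check_binary oc "OpenShift client", check_binary kubectl "Kubernetes client", etc.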

Add vmware node scenario support in krkn-hub

Kraken now supports node scenarios for nodes/clusters in VMware; it would be nice to add this support to krkn-hub so node scenarios can be run through the krkn-hub wrapper as well.

Parameterize internal image names for node/mem/io hog scenarios to support disconnected environments

In the node/mem/io hog scenarios, the two images below are pulled internally by the parent image:

  • quay.io/arcalot/arcaflow-plugin-kubeconfig:0.2.0
  • quay.io/arcalot/arcaflow-plugin-stressng:0.3.1

In disconnected environments, these images will be pulled from a connected host and mirrored onto a local registry. The image names will have to be configurable so they can be pulled from the local mirror instead of Quay.
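One way this could look (the variable names are illustrative, not existing krkn-hub settings): default the image references from environment variables so a disconnected environment can point them at its local mirror at podman run time.

```shell
# Default to the Quay images unless the caller overrides them, e.g.
#   -e STRESSNG_PLUGIN_IMAGE=mirror.local/arcalot/arcaflow-plugin-stressng:0.3.1
# (variable names are illustrative)
export KUBECONFIG_PLUGIN_IMAGE="${KUBECONFIG_PLUGIN_IMAGE:-quay.io/arcalot/arcaflow-plugin-kubeconfig:0.2.0}"
export STRESSNG_PLUGIN_IMAGE="${STRESSNG_PLUGIN_IMAGE:-quay.io/arcalot/arcaflow-plugin-stressng:0.3.1}"
echo "stressng image: ${STRESSNG_PLUGIN_IMAGE}"
```

The workflow templates would then reference the variables instead of the hard-coded image strings.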

OCM/ACM chaos scenarios integration

Kraken now supports OCM/ACM chaos scenarios (krkn-chaos/krkn#370); we will need to get them into krkn-hub as well to be able to run them using podman without having to carry around or tweak config files - especially useful for the CI use case.

Node selector doesn't work for node memory hog scenario

While testing the node memory hog scenario, the container always gets created on a random node. On inspection, this is because the input.yaml template file has the selector hard-coded to none ({}).

https://github.com/redhat-chaos/krkn-hub/blob/main/node-memory-hog/input.yaml.template

The equivalent file in the CPU hog scenario assigns the selector from a variable and works as expected. The memory hog template has to be updated the same way so the memory stress lands on the intended node.
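A possible fix, mirroring how the CPU hog template wires in its selector (the variable name below is illustrative and would be substituted by envsubst during config setup):

```yaml
# node-memory-hog/input.yaml.template fragment sketch: take the selector
# from the environment instead of hard-coding {}
node_selector: ${NODE_SELECTOR}
```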
