krkn-chaos / krkn-hub
Containerized wrapper around https://github.com/krkn-chaos/krkn to inject failures into Kubernetes clusters with minimal configuration.
License: Apache License 2.0
Add CI to validate Kraken-hub scenarios and environment variables.
We need to add a short demo of Kraken-hub to the GitHub README. This will give a quick overview of the tool's capabilities without having to go through the docs. https://asciinema.org/ might help with this.
Kraken-hub needs to support the metrics collection and evaluation features in Kraken to be able to capture metrics of interest from in-cluster prometheus and also evaluate them to determine pass/fail: https://github.com/cloud-bulldozer/kraken#scraping-and-storing-metrics-long-term and https://github.com/cloud-bulldozer/kraken#alerts.
When we trigger a cpu/mem hog scenario, one pod is scheduled in the default namespace. After the scenario completes, this pod is left behind and is not removed.
In my view, this pod should be cleaned up once the scenario test is completed.
To verify, run $ oc get pods -n default after the scenario has completed.
https://github.com/redhat-chaos/krkn-hub/blob/main/pod-scenarios/pod_scenario.yaml.template needs to be replaced now that we use Arcaflow based pod-scenarios: krkn-chaos/krkn#280.
Misspelled variable name in the following link:
https://github.com/redhat-chaos/krkn-hub/blob/main/docs/node-memory-hog.md#supported-parameters
Says LIMTUS_INSTALL (vs LITMUS_INSTALL)
scenarios/pvc_scenario.yaml failed with exception: <class 'UnboundLocalError'>
podman run -it --rm --name=disk --net=host --env-host=true -v $KUBECONFIG:/root/.kube/config -v $SCENARIO:/root/kraken/scenarios/pvc_scenario.yaml -d krkn-hub:pvc-scenarios
2024-07-17 02:25:10,381 [INFO] Starting kraken
2024-07-17 02:25:10,390 [INFO] Initializing client to talk to the Kubernetes cluster
2024-07-17 02:25:10,390 [INFO] Generated a uuid for the run: 232d86a6-04ad-4d5e-b5de-8187b0f8a239
2024-07-17 02:25:20,834 [INFO] Fetching cluster info
2024-07-17 02:25:22,498 [INFO] Cluster version is 4.12.32
2024-07-17 02:25:22,498 [INFO] Server URL: https://<abc.com>:6443
2024-07-17 02:25:22,499 [INFO] Daemon mode not enabled, will run through 1 iterations
2024-07-17 02:25:22,499 [INFO] Executing scenarios for iteration 0
2024-07-17 02:25:22,499 [INFO] Running PVC scenario
2024-07-17 02:25:22,501 [INFO] Input params:
pvc_name: ''
pod_name: 'virt-launcher-rodan-223249-137'
namespace: 'virtualmachines'
target_fill_percentage: '75%'
duration: '60s'
2024-07-17 02:25:43,240 [INFO] Volume name: os-disk
2024-07-17 02:25:43,241 [INFO] PVC name: rodan-223249-137-os
2024-07-17 02:25:43,241 [ERROR] scenario: scenarios/pvc_scenario.yaml failed with exception: <class 'UnboundLocalError'> file: /root/kraken/kraken/pvc/pvc_scenario.py line: 141
$ cat scenarios/pvc_scenario.yaml
pvc_scenario:
  pvc_name:
  pod_name: virt-launcher-rodan-223249-137
  namespace: virtualmachines
  fill_percentage: 75
  duration: 60
Kraken-hub should support ingress-based network chaos scenarios now that Kraken supports them - krkn-chaos/krkn#299.
Add the ability to pass the wait time parameter to the container scenario in Kraken-hub.
After PR krkn-chaos/krkn#395 is merged, we will need to update the pointers of the cpu, memory and io hog to the new scenarios and add the new parameters for them.
We will also want to take out any references to litmus.
In the document https://github.com/redhat-chaos/krkn-hub/blob/main/docs/cerberus.md#cerberus
The history port should be 8080, so:
"It exposes the go/no-go signal at http://0.0.0.0:8080/ and metrics API at http://0.0.0.0:8080/history."
By the way, the document is really helpful. Thank you for sharing!
We need a mechanism in place to build container images based on the pull request commits to be able to test it for every PR instead of doing the same manually.
NOTE: https://github.com/arcalot/arcaflow-plugin-image-builder can be used as a reference for implementation.
The container exits prematurely because KUBECONFIG is not set properly inside the container. However, when I run the container with /bin/bash, I can see the kubeconfig mounted under /root/.kube/config, and I am not sure why unset KUBECONFIG is run as per the logs below.
$ podman run -it --rm --name=power --net=host --env-host=true -v /tmp/kubeconfig-42:/root/.kube/config:Z quay.io/krkn-chaos/krkn-hub:power-outages
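One way to narrow down a failure like this is to check the kubeconfig on the host before starting the container. The sketch below is not part of krkn-hub: `check_kubeconfig` is a hypothetical helper, and the world-readable check is based on the assumption that the process inside the container may run as a different user than the owner of the file on the host.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of krkn-hub).
# Returns 0 if the kubeconfig looks usable, 1 if it is missing or empty,
# 2 if it exists but is not readable by "other" users.
check_kubeconfig() {
  local path="${1:-${KUBECONFIG:-}}"
  if [[ -z "$path" ]]; then
    echo "KUBECONFIG is not set and no path was given" >&2
    return 1
  fi
  if [[ ! -f "$path" || ! -s "$path" ]]; then
    echo "kubeconfig '$path' is missing or empty" >&2
    return 1
  fi
  # ls -l prints the mode as e.g. "-rw-r--r--"; index 7 is the "other read" bit.
  local perms
  perms=$(ls -ldL "$path")
  if [[ "${perms:7:1}" != "r" ]]; then
    echo "kubeconfig '$path' is not world-readable; a non-owner user in the container may not be able to read it" >&2
    return 2
  fi
  return 0
}
```

It could be chained in front of the run command, for example `check_kubeconfig /tmp/kubeconfig-42 && podman run ...`, so that a missing or unreadable file is reported before the container starts.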
Sometimes a physical disk failure, whether full or partial, can bring down the overall IO performance of the cluster. Is there a way to simulate disk failure in Kraken?
Here, partial failure means a predictive failure or medium errors (a few sectors have gone bad) where the disk is still accessible by the kernel/fs/application.
We want to add a common run bash file at the base level with common functions that each run.sh script will use, to avoid duplication.
We can start with the following functions, but there could be more:
# NOTE: log is assumed to be defined elsewhere in the common file.

# Check if oc is installed
log "Checking if OpenShift client is installed"
if ! command -v oc &>/dev/null; then
  log "Looks like OpenShift client is not installed, please install before continuing"
  log "Exiting"
  exit 1
fi

# Check if kubectl is installed
log "Checking if kubernetes client is installed"
if ! command -v kubectl &>/dev/null; then
  log "Looks like Kubernetes client is not installed, please install before continuing"
  log "Exiting"
  exit 1
fi

# Check if cluster exists and print the clusterversion under test
if ! kubectl get clusterversion; then
  log "Unable to connect to the cluster, please check if it's up and make sure the KUBECONFIG is set correctly"
  exit 1
fi
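As a rough illustration of how these checks could be factored into a shared file, here is a minimal sketch. The function names (`log`, `require_command`, `check_cluster`) and the timestamp format are assumptions made for this example, not existing krkn-hub code. The functions return a status instead of calling `exit`, so each run.sh can decide how to react.

```shell
#!/usr/bin/env bash
# Hypothetical shared helpers; each run.sh would source this file.

# Print a message prefixed with a UTC timestamp.
log() {
  echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') $*"
}

# Check that a command is on PATH.
# $1: command name, $2: optional human-readable description.
require_command() {
  local cmd="$1" description="${2:-$1}"
  log "Checking if ${description} is installed"
  if ! command -v "$cmd" &>/dev/null; then
    log "Looks like ${description} is not installed, please install before continuing"
    return 1
  fi
}

# Check that the cluster is reachable and print the clusterversion under test.
check_cluster() {
  if ! kubectl get clusterversion; then
    log "Unable to connect to the cluster, please check if it's up and make sure the KUBECONFIG is set correctly"
    return 1
  fi
}
```

A run.sh script could then use `require_command oc "OpenShift client" || exit 1` followed by `check_cluster || exit 1`, which keeps the exit behaviour of the snippets above while removing the duplicated blocks.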
Kraken now supports node scenarios for nodes/clusters in VMware. It would be nice to add this support to krkn-hub so that node scenarios can be run through the krkn-hub wrapper.
In the node/mem/io hog scenarios, the two images below are called internally from the parent image, i.e.,
In disconnected environments, these images will be pulled from a connected host and mirrored to a local registry. The image names will have to be changed in order to pull them from the local mirror instead of Quay.
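One possible shape for this is to let an environment variable override the registry host of each image reference. The sketch below is only an illustration: `mirror_image` and `MIRROR_REGISTRY` are hypothetical names, the image reference in the usage note is a placeholder rather than one of the real hog images, and the helper assumes every reference starts with a registry host followed by `/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the registry host of an image reference.
# If MIRROR_REGISTRY is unset or empty, the reference is printed unchanged.
mirror_image() {
  local image="$1" mirror="${MIRROR_REGISTRY:-}"
  if [[ -z "$mirror" ]]; then
    printf '%s\n' "$image"
    return 0
  fi
  # Drop everything up to the first "/" (assumed to be the registry host)
  # and prepend the mirror, without a trailing slash.
  printf '%s/%s\n' "${mirror%/}" "${image#*/}"
}
```

For example, with `MIRROR_REGISTRY=registry.local:5000`, calling `mirror_image quay.io/example/hog:latest` prints `registry.local:5000/example/hog:latest`.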
Kraken now supports OCM/ACM chaos scenarios - krkn-chaos/krkn#370. We will need to get them into Kraken-hub as well, so they can be run using podman without having to carry around or tweak config files. This is especially useful for the CI use case.
While testing the node memory hog scenario, the container always gets created on a random node. On inspection, this is because the input.yaml template file has the selector hard-coded as none {}.
https://github.com/redhat-chaos/krkn-hub/blob/main/node-memory-hog/input.yaml.template
The equivalent file in the cpu hog scenario has the variable assigned in the YAML and works as expected. The memory hog template has to be updated to make sure the memory stress lands on the required node.
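To illustrate the idea of a templated selector, the sketch below substitutes a placeholder with a value from the environment and falls back to `{}` when none is given. Everything here is assumed for the example: the `${NODE_SELECTOR}` placeholder, the `render_template` helper, and the use of `sed` do not reflect the actual contents of input.yaml.template or of the cpu hog scenario.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a template with ${NODE_SELECTOR} filled in.
# Falls back to an empty selector ({}) when NODE_SELECTOR is unset or empty.
# Note: the value must not contain "|" or "&", which are special to sed here.
render_template() {
  local template_file="$1"
  local selector="${NODE_SELECTOR:-}"
  if [[ -z "$selector" ]]; then
    selector='{}'
  fi
  # \$ keeps the dollar sign literal so sed matches the placeholder text itself.
  sed "s|\${NODE_SELECTOR}|${selector}|g" "$template_file"
}
```

With a template line such as `node_selector: ${NODE_SELECTOR}`, exporting `NODE_SELECTOR='{"kubernetes.io/hostname": "worker-1"}'` before calling `render_template` would pin the workload to that node, while leaving it unset keeps the current behaviour.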