krkn-chaos / arcaflow-plugin-kill-pod
Chaos Engineering - Kill Pod scenario plugin for Arcaflow
License: Apache License 2.0
Integrated the new arcaflow-lib-kubernetes in kill-pod
Kraken and chaos-engineering
2022-10-18T09:42:59Z info python code style and quality check found these issues for arcaflow-plugin-kill-pod version dev-build
2022-10-18T09:42:59Z info (%!w(string=/github/workspace/arcaflow_plugin_kill_pod.py:13:1: F401 'arcaflow_plugin_sdk.schema' imported but unused
from arcaflow_plugin_sdk import validation, plugin, schema
^
/github/workspace/arcaflow_plugin_kill_pod.py:45:80: E501 line too long (84 > 79 characters)
if (name_pattern is None or name_pattern.match(pod.metadata.name)) and \
namespace_pattern.match(pod.metadata.namespace):
^
/github/workspace/arcaflow_plugin_kill_pod.py:64:80: E501 line too long (111 > 79 characters)
"description": "Map between timestamps and the pods removed. The timestamp is provided in nanoseconds."
^
/github/workspace/arcaflow_plugin_kill_pod.py:84:80: E501 line too long (103 > 79 characters)
"""
This is a configuration structure specific to pod kill scenario. It describes which pod from which
namespace(s) to select for killing and how many pods to kill.
"""
^
/github/workspace/arcaflow_plugin_kill_pod.py:98:80: E501 line too long (99 > 79 characters)
"description": "Regular expression for target pods. Required if label_selector is not set."
^
/github/workspace/arcaflow_plugin_kill_pod.py:103:80: E501 line too long (110 > 79 characters)
metadata={"name": "Number of pods to kill", "description": "How many pods should we attempt to kill?"}
^
/github/workspace/arcaflow_plugin_kill_pod.py:112:80: E501 line too long (110 > 79 characters)
"description": "Kubernetes label selector for the target pods. Required if name_pattern is not set.\n"
^
/github/workspace/arcaflow_plugin_kill_pod.py:113:80: E501 line too long (115 > 79 characters)
"See https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ for details."
^
/github/workspace/arcaflow_plugin_kill_pod.py:118:80: E501 line too long (84 > 79 characters)
"description": "Path to your Kubeconfig file. Defaults to ~/.kube/config.\n"
^
/github/workspace/arcaflow_plugin_kill_pod.py:119:80: E501 line too long (119 > 79 characters)
"See https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/ for "
^
/github/workspace/arcaflow_plugin_kill_pod.py:125:80: E501 line too long (88 > 79 characters)
"description": "Timeout to wait for the target pod(s) to be removed in seconds."
^
/github/workspace/arcaflow_plugin_kill_pod.py:130:80: E501 line too long (91 > 79 characters)
"description": "How many seconds to wait between checks for the target pod status."
^
/github/workspace/arcaflow_plugin_kill_pod.py:140:80: E501 line too long (107 > 79 characters)
def kill_pods(cfg: KillPodConfig) -> typing.Tuple[str, typing.Union[PodKillSuccessOutput, PodErrorOutput]]:
^
/github/workspace/arcaflow_plugin_kill_pod.py:146:80: E501 line too long (99 > 79 characters)
pods = _find_pods(core_v1, cfg.label_selector, cfg.name_pattern, cfg.namespace_pattern)
^
/github/workspace/arcaflow_plugin_kill_pod.py:149:80: E501 line too long (120 > 79 characters)
"Not enough pods match the criteria, expected {} but found only {} pods".format(cfg.kill, len(pods))
^
/github/workspace/arcaflow_plugin_kill_pod.py:159:80: E501 line too long (110 > 79 characters)
core_v1.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace, body=V1DeleteOptions(
^
/github/workspace/arcaflow_plugin_kill_pod.py:177:25: F841 local variable 'read_pod' is assigned to but never used
read_pod = core_v1.read_namespaced_pod(p.name, p.namespace)
^
/github/workspace/arcaflow_plugin_kill_pod.py:177:80: E501 line too long (83 > 79 characters)
read_pod = core_v1.read_namespaced_pod(p.name, p.namespace)
^
/github/workspace/arcaflow_plugin_kill_pod.py:185:80: E501 line too long (99 > 79 characters)
return "error", PodErrorOutput("Timeout while waiting for pods to be removed.")
^
/github/workspace/arcaflow_plugin_kill_pod.py:215:80: E501 line too long (98 > 79 characters)
metadata={"name": "Pod count", "description": "Wait for at least this many pods to exist"}
^
/github/workspace/arcaflow_plugin_kill_pod.py:220:80: E501 line too long (84 > 79 characters)
metadata={"name": "Timeout", "description": "How many seconds to wait for?"}
^
/github/workspace/arcaflow_plugin_kill_pod.py:225:80: E501 line too long (91 > 79 characters)
"description": "How many seconds to wait between checks for the target pod status."
^
/github/workspace/arcaflow_plugin_kill_pod.py:237:80: E501 line too long (115 > 79 characters)
def wait_for_pods(cfg: WaitForPodsConfig) -> typing.Tuple[str, typing.Union[PodWaitSuccessOutput, PodErrorOutput]]:
^
/github/workspace/arcaflow_plugin_kill_pod.py:245:80: E501 line too long (103 > 79 characters)
pods = _find_pods(core_v1, cfg.label_selector, cfg.name_pattern, cfg.namespace_pattern)
^
/github/workspace/arcaflow_plugin_kill_pod.py:248:80: E501 line too long (118 > 79 characters)
return "success", \
PodWaitSuccessOutput(list(map(lambda p: Pod(p.metadata.namespace, p.metadata.name), pods)))
^
))
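The findings above are mechanical style fixes. A hedged sketch of the two fix patterns (the wrapped string below is one of the flagged literals; the unused import is simply dropped):

```python
# F401 fix: drop the unused name from the import line, e.g.
#   from arcaflow_plugin_sdk import validation, plugin
# (the `schema` import was flagged as unused).

# E501 fix: split long metadata strings with implicit string
# concatenation so each source line stays under 79 characters.
description = (
    "Kubernetes label selector for the target pods. "
    "Required if name_pattern is not set.\n"
    "See https://kubernetes.io/docs/concepts/overview/"
    "working-with-objects/labels/ for details."
)
```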
When running the tests on a 4.12 OpenShift cluster, we see the error below because of the new restrictions on creating pods in restricted namespaces:
test_watch (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/Cellar/[email protected]/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/Cellar/[email protected]/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1378, in run
self.function(*self.args, **self.kwargs)
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/tests/test_arcaflow_plugin_kill_pod.py", line 162, in create_test_pod
core_v1.create_namespaced_pod("default", V1Pod(
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 7356, in create_namespaced_pod
return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) # noqa: E501
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 7455, in create_namespaced_pod_with_http_info
return self.api_client.call_api(
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 391, in request
return self.rest_client.POST(url,
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/rest.py", line 276, in POST
return self.request("POST", url,
File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/rest.py", line 235, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '75dd2079-c7f0-4fdc-a032-9e5c284f0ebc', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '51138c75-8252-4d1c-b9c4-9e29122004fc', 'X-Kubernetes-Pf-Prioritylevel-Uid': '1bbd872d-d7fb-48f8-9a2c-7105d44806dd', 'Date': 'Thu, 27 Oct 2022 15:21:33 GMT', 'Content-Length': '688'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"watch-test-irtwgirf\" is forbidden: violates PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (container \"test\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container \"test\" must set securityContext.capabilities.drop=[\"ALL\"]), runAsNonRoot != true (pod or container \"test\" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container \"test\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")","reason":"Forbidden","details":{"name":"watch-test-irtwgirf","kind":"pods"},"code":403}
4.12 OpenShift cluster
This likely also applies to Kubernetes 1.25 clusters.
python -m coverage run -a -m unittest discover -s tests -v
https://kubernetes.io/docs/concepts/security/pod-security-standards/
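Each violation in the 403 body above maps to a securityContext field required by the "restricted" Pod Security Standard linked above. A sketch of a compliant test pod, written as a plain manifest dict (field names follow the Kubernetes API; the pod name, container name, and image are hypothetical stand-ins for the test fixture):

```python
# Pod manifest satisfying the "restricted" PodSecurity profile.
# Each securityContext field below answers one violation from the
# 403 response: allowPrivilegeEscalation, capabilities.drop,
# runAsNonRoot, and seccompProfile.type.
test_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "watch-test", "namespace": "default"},
    "spec": {
        "securityContext": {
            "runAsNonRoot": True,
            "seccompProfile": {"type": "RuntimeDefault"},
        },
        "containers": [
            {
                "name": "test",
                # Hypothetical image; any non-root image works here.
                "image": "registry.access.redhat.com/ubi8/ubi-minimal",
                "command": ["sleep", "3600"],
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                },
            }
        ],
    },
}
```

The same fields can be passed to the Python client via `V1SecurityContext`/`V1PodSecurityContext` when building the `V1Pod` in the test.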
Running the unit tests, we get some resource allocation warnings:
test_kill_pod (test_arcaflow_plugin_kill_pod.KillPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 52444), raddr=('127.0.0.1', 38689)>
result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/unittest/case.py:597: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 52434), raddr=('127.0.0.1', 38689)>
self.doCleanups()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_not_enough_pods (test_arcaflow_plugin_kill_pod.KillPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 48206), raddr=('127.0.0.1', 38689)>
result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_serialization (test_arcaflow_plugin_kill_pod.KillPodTest) ... ok
test_serialization (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... ok
test_timeout (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 48218), raddr=('127.0.0.1', 38689)>
result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_watch (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 48228), raddr=('127.0.0.1', 38689)>
result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/unittest/case.py:597: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 37830), raddr=('127.0.0.1', 38689)>
self.doCleanups()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
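The warnings suggest the Kubernetes API client opened by the tests is never closed, so its urllib3 connection-pool sockets leak. A minimal sketch of the usual fix (the fake client below stands in for `kubernetes.client.ApiClient`, which also exposes `close()`; wiring this into the real fixtures is an assumption):

```python
import unittest


class _FakeApiClient:
    # Stand-in for kubernetes.client.ApiClient; the real client's
    # close() releases its underlying connection pool.
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ExampleTest(unittest.TestCase):
    def setUp(self):
        self.api_client = _FakeApiClient()
        # addCleanup runs even when a test fails or errors, so the
        # sockets behind the client are always released.
        self.addCleanup(self.api_client.close)

    def test_client_open_during_test(self):
        self.assertFalse(self.api_client.closed)
```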
Run unit tests.
In ROSA testing on Prow, I found the following error. We need to check that the pod status is not None, and probably that the pod itself is not None, before iterating over the container statuses.
2023-05-25 20:15:53,175 [INFO] Executing scenarios for iteration 0
2023-05-25 20:15:53,175 [INFO] scenario pod-scenarios/pod_scenario.yaml
2023-05-25 20:15:54,398 [INFO] {
"output_id": "success",
"output_data": {
"pods": {
"1685045753389784028": {
"namespace": "openshift-etcd",
"name": "etcd-ip-10-0-178-150.ec2.internal"
}
}
}
}
2023-05-25 20:17:07,192 [INFO] {
"output_id": "error",
"output_data": {
"error": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.9/site-packages/arcaflow_plugin_kill_pod.py\", line 292, in wait_for_pods\n for container in pod.status.container_statuses:\nTypeError: 'NoneType' object is not iterable\n"
}
}
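The traceback shows `pod.status.container_statuses` being `None` while a pod is being created or torn down. A sketch of the missing guard (the helper name is hypothetical; the pod objects mirror the Kubernetes client's `V1Pod` shape):

```python
def containers_ready(pod) -> bool:
    # pod.status and pod.status.container_statuses can both be None
    # for pods that are still initializing or terminating, so guard
    # before iterating instead of assuming a populated status.
    if pod is None or pod.status is None:
        return False
    statuses = pod.status.container_statuses
    if not statuses:
        return False
    return all(cs.ready for cs in statuses)
```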
It looks like the Quay secrets are not properly set up for this repository to run the Carpenter build. Not sure if we need this?
Run github actions on PR
https://github.com/arcalot/arcaflow-plugin-kill-pod/actions/runs/3387965194/jobs/5637657081
2022-11-04T07:29:00Z info QUAY_USERNAME is empty
2022-11-04T07:29:00Z info QUAY_PASSWORD is empty
2022-11-04T07:29:00Z info QUAY_NAMESPACE is empty
2022-11-04T07:29:00Z info Missing credentials for quay.io
2022/11/04 07:29:00 failed requirements check, not building: arcaflow-plugin-kill-pod dev-build
The built-in tests are currently failing, and the test automation and code coverage steps are disabled in the Dockerfile as a result. Tests should be corrected and coverage re-enabled.
The pod-killing scenario should only kill pods that are Running, not Completed or in any other state. We need to add a parameter to the pod-selection call to filter on pod state.
Running with the scenario file below a couple of times (3 runs total), 2 of the iterations killed Completed pods:
- id: kill-pods
config:
namespace_pattern: openshift-.*
name_pattern: .*
kill: 1
timeout: 180
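A possible fix is to filter candidates on `pod.status.phase` before choosing victims; a sketch (the pod objects mirror the client's `V1Pod` shape, and wiring this into the plugin's `_find_pods` helper is an assumption):

```python
def filter_running(pods):
    # Keep only pods in the Running phase; Completed pods report
    # phase "Succeeded" (or "Failed") and should not be killed.
    return [
        pod for pod in pods
        if pod.status is not None and pod.status.phase == "Running"
    ]
```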
Note that the installer-6 and installer-8 pods are gone (they were deleted during the run):
% oc get pods -n openshift-kube-controller-manager
NAME READY STATUS RESTARTS AGE
installer-7-master-00.pubenda-sno.qe.devcluster.openshift.com 0/1 Completed 0 97m
kube-controller-manager-master-00.pubenda-sno.qe.devcluster.openshift.com 4/4 Running 3 (61m ago) 89m
revision-pruner-6-master-00.pubenda-sno.qe.devcluster.openshift.com 0/1 Completed 0 98m
revision-pruner-7-master-00.pubenda-sno.qe.devcluster.openshift.com 0/1 Completed 0 97m
revision-pruner-8-master-00.pubenda-sno.qe.devcluster.openshift.com 0/1 Completed 0 90m
Output of the run is below:
10-27 11:04:08.301 2022-10-27 15:04:08,046 [INFO] Starting kraken
10-27 11:04:08.301 2022-10-27 15:04:08,053 [INFO] Initializing client to talk to the Kubernetes cluster
10-27 11:04:13.659 2022-10-27 15:04:13,552 [INFO] Fetching cluster info
10-27 11:04:17.886 2022-10-27 15:04:17,611 [INFO] Cluster version is 4.12.0-0.nightly-2022-10-25-210451
10-27 11:04:17.886 2022-10-27 15:04:17,612 [INFO] Server URL: https://api.pubenda-sno.qe.devcluster.openshift.com:6443/
10-27 11:04:17.886 2022-10-27 15:04:17,612 [INFO] Generated a uuid for the run: 5bf946f0-5594-4be1-b5e9-952feb2790ec
10-27 11:04:17.886 2022-10-27 15:04:17,612 [INFO] Daemon mode not enabled, will run through 3 iterations
10-27 11:04:17.887
10-27 11:04:17.887 2022-10-27 15:04:17,612 [INFO] Executing scenarios for iteration 0
10-27 11:04:20.406 2022-10-27 15:04:19,909 [INFO] {
10-27 11:04:20.406 "output_id": "success",
10-27 11:04:20.406 "output_data": {
10-27 11:04:20.406 "pods": {
10-27 11:04:20.406 "1666883058879308015": {
10-27 11:04:20.406 "namespace": "openshift-marketplace",
10-27 11:04:20.406 "name": "redhat-marketplace-2jt78"
10-27 11:04:20.406 }
10-27 11:04:20.406 }
10-27 11:04:20.406 }
10-27 11:04:20.406 }
10-27 11:04:20.406
10-27 11:04:20.406 2022-10-27 15:04:19,909 [INFO]
10-27 11:04:20.406 2022-10-27 15:04:19,909 [INFO] Executing scenarios for iteration 1
10-27 11:04:22.355 2022-10-27 15:04:22,140 [INFO] {
10-27 11:04:22.356 "output_id": "success",
10-27 11:04:22.356 "output_data": {
10-27 11:04:22.356 "pods": {
10-27 11:04:22.356 "1666883061109950333": {
10-27 11:04:22.356 "namespace": "openshift-kube-controller-manager",
10-27 11:04:22.356 "name": "installer-6-master-00.pubenda-sno.qe.devcluster.openshift.com"
10-27 11:04:22.356 }
10-27 11:04:22.356 }
10-27 11:04:22.356 }
10-27 11:04:22.356 }
10-27 11:04:22.356
10-27 11:04:22.356 2022-10-27 15:04:22,140 [INFO]
10-27 11:04:22.356 2022-10-27 15:04:22,140 [INFO] Executing scenarios for iteration 2
10-27 11:04:24.939 2022-10-27 15:04:24,387 [INFO] {
10-27 11:04:24.939 "output_id": "success",
10-27 11:04:24.939 "output_data": {
10-27 11:04:24.939 "pods": {
10-27 11:04:24.939 "1666883063354512640": {
10-27 11:04:24.939 "namespace": "openshift-kube-controller-manager",
10-27 11:04:24.939 "name": "installer-8-master-00.pubenda-sno.qe.devcluster.openshift.com"
10-27 11:04:24.939 }
10-27 11:04:24.939 }
10-27 11:04:24.939 }
10-27 11:04:24.939 }
10-27 11:04:24.939
10-27 11:04:24.939 2022-10-27 15:04:24,388 [INFO]
10-27 11:04:24.940 2022-10-27 15:04:24,388 [INFO] Successfully finished running Kraken. UUID for the run: 5bf946f0-5594-4be1-b5e9-952feb2790ec. Report generated at /home/jenkins/ws/workspace/_ci_paige-e2e-multibranch_kraken/kraken/kraken.report. Exiting
When we have updates to this plugin, we need to rebuild the krkn image to load the new code and push it to quay.io. This will be very helpful for Prow imaging to pick up updates without human interaction.
We need to add a reference in this documentation to the specific Python version required to run the plugin. Without it, a user has a hard time figuring out why the code is not running properly.
The Kraken documentation states that Python 3.9 is required, but nowhere here does it say that, which makes issues hard to debug when they arise.
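Until the docs spell out the requirement, a startup guard makes the failure obvious instead of cryptic; a minimal sketch (the 3.9 floor is taken from the Kraken docs mentioned above):

```python
import sys

MIN_PYTHON = (3, 9)
if sys.version_info < MIN_PYTHON:
    # Fail fast with a clear message instead of an obscure syntax or
    # import error later on.
    raise SystemExit(
        "arcaflow-plugin-kill-pod requires Python "
        f"{'.'.join(map(str, MIN_PYTHON))}+, "
        f"but {sys.version.split()[0]} was found."
    )
```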