
arcaflow-plugin-kill-pod's People

Contributors

chaitanyaenr, dependabot[bot], dustinblack, jdowni000, paigerube14, sandrobonazzola, shahsahil264, tsebastiani

arcaflow-plugin-kill-pod's Issues

python code style and quality check fails

Describe the bug

2022-10-18T09:42:59Z	info		python code style and quality check found these issues for arcaflow-plugin-kill-pod version dev-build
2022-10-18T09:42:59Z	info		(%!w(string=/github/workspace/arcaflow_plugin_kill_pod.py:13:1: F401 'arcaflow_plugin_sdk.schema' imported but unused
from arcaflow_plugin_sdk import validation, plugin, schema
^
/github/workspace/arcaflow_plugin_kill_pod.py:45:80: E501 line too long (84 > 79 characters)
            if (name_pattern is None or name_pattern.match(pod.metadata.name)) and \
                    namespace_pattern.match(pod.metadata.namespace):
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:64:80: E501 line too long (111 > 79 characters)
        "description": "Map between timestamps and the pods removed. The timestamp is provided in nanoseconds."
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:84:80: E501 line too long (103 > 79 characters)
    """
    This is a configuration structure specific to pod kill  scenario. It describes which pod from which
    namespace(s) to select for killing and how many pods to kill.
    """
       
                                                                       ^
/github/workspace/arcaflow_plugin_kill_pod.py:98:80: E501 line too long (99 > 79 characters)
        "description": "Regular expression for target pods. Required if label_selector is not set."
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:103:80: E501 line too long (110 > 79 characters)
        metadata={"name": "Number of pods to kill", "description": "How many pods should we attempt to kill?"}
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:112:80: E501 line too long (110 > 79 characters)
        "description": "Kubernetes label selector for the target pods. Required if name_pattern is not set.\n"
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:113:80: E501 line too long (115 > 79 characters)
                       "See https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ for details."
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:118:80: E501 line too long (84 > 79 characters)
        "description": "Path to your Kubeconfig file. Defaults to ~/.kube/config.\n"
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:119:80: E501 line too long (119 > 79 characters)
                       "See https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/ for "
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:125:80: E501 line too long (88 > 79 characters)
        "description": "Timeout to wait for the target pod(s) to be removed in seconds."
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:130:80: E501 line too long (91 > 79 characters)
        "description": "How many seconds to wait between checks for the target pod status."
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:140:80: E501 line too long (107 > 79 characters)
def kill_pods(cfg: KillPodConfig) -> typing.Tuple[str, typing.Union[PodKillSuccessOutput, PodErrorOutput]]:
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:146:80: E501 line too long (99 > 79 characters)
            pods = _find_pods(core_v1, cfg.label_selector, cfg.name_pattern, cfg.namespace_pattern)
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:149:80: E501 line too long (120 > 79 characters)
                    "Not enough pods match the criteria, expected {} but found only {} pods".format(cfg.kill, len(pods))
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:159:80: E501 line too long (110 > 79 characters)
                core_v1.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace, body=V1DeleteOptions(
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:177:25: F841 local variable 'read_pod' is assigned to but never used
                        read_pod = core_v1.read_namespaced_pod(p.name, p.namespace)
                        ^
/github/workspace/arcaflow_plugin_kill_pod.py:177:80: E501 line too long (83 > 79 characters)
                        read_pod = core_v1.read_namespaced_pod(p.name, p.namespace)
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:185:80: E501 line too long (99 > 79 characters)
                    return "error", PodErrorOutput("Timeout while waiting for pods to be removed.")
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:215:80: E501 line too long (98 > 79 characters)
        metadata={"name": "Pod count", "description": "Wait for at least this many pods to exist"}
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:220:80: E501 line too long (84 > 79 characters)
        metadata={"name": "Timeout", "description": "How many seconds to wait for?"}
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:225:80: E501 line too long (91 > 79 characters)
        "description": "How many seconds to wait between checks for the target pod status."
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:237:80: E501 line too long (115 > 79 characters)
def wait_for_pods(cfg: WaitForPodsConfig) -> typing.Tuple[str, typing.Union[PodWaitSuccessOutput, PodErrorOutput]]:
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:245:80: E501 line too long (103 > 79 characters)
                pods = _find_pods(core_v1, cfg.label_selector, cfg.name_pattern, cfg.namespace_pattern)
                                                                               ^
/github/workspace/arcaflow_plugin_kill_pod.py:248:80: E501 line too long (118 > 79 characters)
                    return "success", \
                           PodWaitSuccessOutput(list(map(lambda p: Pod(p.metadata.namespace, p.metadata.name), pods)))
                                       
                                       ^
))
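
A minimal sketch of the two kinds of fixes flake8 is asking for above, using illustrative names rather than the actual plugin code: drop the unused import (F401) and keep lines at 79 characters or fewer (E501) via implicit string concatenation and parenthesized conditions.

import re
import typing

# F401: import only the names that are actually used, e.g.
#   from arcaflow_plugin_sdk import validation, plugin
# and drop 'schema' if nothing references it.

# E501: split long description strings with implicit concatenation so
# each source line stays at or under 79 characters.
POD_MAP_DESCRIPTION = (
    "Map between timestamps and the pods removed. "
    "The timestamp is provided in nanoseconds."
)


def pod_matches(
    name_pattern: typing.Optional[re.Pattern],
    namespace_pattern: re.Pattern,
    name: str,
    namespace: str,
) -> bool:
    # E501: wrap the compound condition in parentheses instead of
    # continuing the line with a backslash.
    return (
        name_pattern is None or name_pattern.match(name) is not None
    ) and namespace_pattern.match(namespace) is not None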

4.12 Pod Creation needs namespace creation privileges

Describe the bug

When running the tests on a 4.12 OpenShift cluster, the error below appears because of the new restrictions on creating pods in restricted namespaces:

test_watch (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/[email protected]/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/[email protected]/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1378, in run
    self.function(*self.args, **self.kwargs)
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/tests/test_arcaflow_plugin_kill_pod.py", line 162, in create_test_pod
    core_v1.create_namespaced_pod("default", V1Pod(
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 7356, in create_namespaced_pod
    return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 7455, in create_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/rest.py", line 276, in POST
    return self.request("POST", url,
  File "/Users/prubenda/PycharmProjects/arcaflow-plugin-kill-pod/venv3/lib/python3.10/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '75dd2079-c7f0-4fdc-a032-9e5c284f0ebc', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '51138c75-8252-4d1c-b9c4-9e29122004fc', 'X-Kubernetes-Pf-Prioritylevel-Uid': '1bbd872d-d7fb-48f8-9a2c-7105d44806dd', 'Date': 'Thu, 27 Oct 2022 15:21:33 GMT', 'Content-Length': '688'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"watch-test-irtwgirf\" is forbidden: violates PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (container \"test\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container \"test\" must set securityContext.capabilities.drop=[\"ALL\"]), runAsNonRoot != true (pod or container \"test\" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container \"test\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")","reason":"Forbidden","details":{"name":"watch-test-irtwgirf","kind":"pods"},"code":403}

To reproduce

4.12 OpenShift cluster
This likely also applies to Kubernetes 1.25 clusters.

python -m coverage run -a -m unittest discover -s tests -v

Background information

https://kubernetes.io/docs/concepts/security/pod-security-standards/
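
For reference, a hedged sketch of a test pod body that satisfies the "restricted" PodSecurity profile the 403 above spells out; the pod/container names and image are illustrative, not the actual test fixtures.

from kubernetes.client import (
    V1Capabilities,
    V1Container,
    V1ObjectMeta,
    V1Pod,
    V1PodSpec,
    V1SeccompProfile,
    V1SecurityContext,
)


def restricted_test_pod(name: str, image: str) -> V1Pod:
    # Each field below maps to one clause of the PodSecurity violation
    # message: allowPrivilegeEscalation, capabilities.drop, runAsNonRoot,
    # and seccompProfile.type.
    return V1Pod(
        metadata=V1ObjectMeta(name=name),
        spec=V1PodSpec(
            containers=[
                V1Container(
                    name="test",
                    image=image,
                    security_context=V1SecurityContext(
                        allow_privilege_escalation=False,
                        run_as_non_root=True,
                        capabilities=V1Capabilities(drop=["ALL"]),
                        seccomp_profile=V1SeccompProfile(
                            type="RuntimeDefault"
                        ),
                    ),
                )
            ]
        ),
    )


# The failing create call would then be something like:
#   core_v1.create_namespaced_pod("default", restricted_test_pod(...))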

ResourceWarning: unclosed ssl.SSLSocket

Describe the bug

Running the unit tests produces several resource allocation warnings:

test_kill_pod (test_arcaflow_plugin_kill_pod.KillPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 52444), raddr=('127.0.0.1', 38689)>
  result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/unittest/case.py:597: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 52434), raddr=('127.0.0.1', 38689)>
  self.doCleanups()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_not_enough_pods (test_arcaflow_plugin_kill_pod.KillPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 48206), raddr=('127.0.0.1', 38689)>
  result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_serialization (test_arcaflow_plugin_kill_pod.KillPodTest) ... ok
test_serialization (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... ok
test_timeout (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 48218), raddr=('127.0.0.1', 38689)>
  result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_watch (test_arcaflow_plugin_kill_pod.WaitForPodTest) ... /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py:5266: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 48228), raddr=('127.0.0.1', 38689)>
  result = self._handler(params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/unittest/case.py:597: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 37830), raddr=('127.0.0.1', 38689)>
  self.doCleanups()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok

To reproduce

Run unit tests.
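
A hedged sketch of one way to silence the warnings, assuming the tests construct their own CoreV1Api: close the underlying ApiClient in teardown so its SSL sockets are released deterministically.

import unittest

from kubernetes import client, config


class KillPodTest(unittest.TestCase):
    def setUp(self):
        config.load_kube_config()
        self.api_client = client.ApiClient()
        self.core_v1 = client.CoreV1Api(self.api_client)

    def tearDown(self):
        # Explicitly close the client; otherwise its pooled SSL sockets
        # are only released by the GC, which triggers ResourceWarning.
        self.api_client.close()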

No found pods, error out

In ROSA testing on Prow I found the following error. We need to check that the pod status is not None before iterating over it, and probably validate the pod object overall.

2023-05-25 20:15:53,175 [INFO] Executing scenarios for iteration 0
2023-05-25 20:15:53,175 [INFO] scenario pod-scenarios/pod_scenario.yaml
2023-05-25 20:15:54,398 [INFO] {
	"output_id": "success",
	"output_data": {
		"pods": {
			"1685045753389784028": {
				"namespace": "openshift-etcd",
				"name": "etcd-ip-10-0-178-150.ec2.internal"
			}
		}
	}
}

2023-05-25 20:17:07,192 [INFO] {
	"output_id": "error",
	"output_data": {
		"error": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.9/site-packages/arcaflow_plugin_kill_pod.py\", line 292, in wait_for_pods\n    for container in pod.status.container_statuses:\nTypeError: 'NoneType' object is not iterable\n"
	}
}
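
A minimal sketch of the None-guard, assuming a kubernetes V1Pod-like object; the real wait_for_pods loop differs.

def ready_container_count(pod) -> int:
    # pod.status and pod.status.container_statuses can both be None while
    # a pod is still scheduling or terminating, so guard before iterating.
    if pod.status is None or pod.status.container_statuses is None:
        return 0
    return sum(
        1 for container in pod.status.container_statuses if container.ready
    )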

Set up Quay secrets

Describe the bug

It looks like the Quay secrets are not properly set up for this repository, so the carpenter build cannot run. It is unclear whether we need this.

To reproduce

Run github actions on PR
https://github.com/arcalot/arcaflow-plugin-kill-pod/actions/runs/3387965194/jobs/5637657081

Additional context

2022-11-04T07:29:00Z	info		QUAY_USERNAME is empty
2022-11-04T07:29:00Z	info		QUAY_PASSWORD is empty
2022-11-04T07:29:00Z	info		QUAY_NAMESPACE is empty
2022-11-04T07:29:00Z	info		Missing credentials for quay.io
2022/11/04 07:29:00 failed requirements check, not building: arcaflow-plugin-kill-pod dev-build

Tests failing and currently disabled

The built-in tests are currently failing, and the test automation and code coverage steps are disabled in the Dockerfile as a result. Tests should be corrected and coverage re-enabled.

Need to only kill pods that are running

Describe the bug

The pod-killing scenario should only kill pods that are running, not pods that are completed or in any other state. We need to add a parameter to the pod-selection call that filters on pod state; a sketch follows.
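
A hedged sketch of the filter, assuming the selection goes through the kubernetes client: a field selector makes the API server drop non-Running pods before they ever become kill candidates.

from kubernetes import client


def list_running_pods(core_v1: client.CoreV1Api, namespace: str) -> list:
    # "status.phase=Running" excludes Completed/Succeeded/Pending pods
    # server-side, so only live pods can be selected for killing.
    return core_v1.list_namespaced_pod(
        namespace, field_selector="status.phase=Running"
    ).items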

To reproduce

Run the scenario file below a few times. I ran it 3 times, and in 2 of the iterations the killed pods were already completed.

  - id: kill-pods
    config:
     namespace_pattern: openshift-.*
     name_pattern: .*
     kill: 1
     timeout: 180

Additional context

Note that the installer-6 and installer-8 pods are gone (they were deleted in the run):
% oc get pods -n openshift-kube-controller-manager
NAME                                                                        READY   STATUS      RESTARTS      AGE
installer-7-master-00.pubenda-sno.qe.devcluster.openshift.com               0/1     Completed   0             97m
kube-controller-manager-master-00.pubenda-sno.qe.devcluster.openshift.com   4/4     Running     3 (61m ago)   89m
revision-pruner-6-master-00.pubenda-sno.qe.devcluster.openshift.com         0/1     Completed   0             98m
revision-pruner-7-master-00.pubenda-sno.qe.devcluster.openshift.com         0/1     Completed   0             97m
revision-pruner-8-master-00.pubenda-sno.qe.devcluster.openshift.com         0/1     Completed   0             90m

Output below of run

10-27 11:04:08.301  2022-10-27 15:04:08,046 [INFO] Starting kraken
10-27 11:04:08.301  2022-10-27 15:04:08,053 [INFO] Initializing client to talk to the Kubernetes cluster
10-27 11:04:13.659  2022-10-27 15:04:13,552 [INFO] Fetching cluster info
10-27 11:04:17.886  2022-10-27 15:04:17,611 [INFO] Cluster version is 4.12.0-0.nightly-2022-10-25-210451
10-27 11:04:17.886  2022-10-27 15:04:17,612 [INFO] Server URL: https://api.pubenda-sno.qe.devcluster.openshift.com:6443/
10-27 11:04:17.886  2022-10-27 15:04:17,612 [INFO] Generated a uuid for the run: 5bf946f0-5594-4be1-b5e9-952feb2790ec
10-27 11:04:17.886  2022-10-27 15:04:17,612 [INFO] Daemon mode not enabled, will run through 3 iterations
10-27 11:04:17.887  
10-27 11:04:17.887  2022-10-27 15:04:17,612 [INFO] Executing scenarios for iteration 0
10-27 11:04:20.406  2022-10-27 15:04:19,909 [INFO] {
10-27 11:04:20.406  	"output_id": "success",
10-27 11:04:20.406  	"output_data": {
10-27 11:04:20.406  		"pods": {
10-27 11:04:20.406  			"1666883058879308015": {
10-27 11:04:20.406  				"namespace": "openshift-marketplace",
10-27 11:04:20.406  				"name": "redhat-marketplace-2jt78"
10-27 11:04:20.406  			}
10-27 11:04:20.406  		}
10-27 11:04:20.406  	}
10-27 11:04:20.406  }
10-27 11:04:20.406  
10-27 11:04:20.406  2022-10-27 15:04:19,909 [INFO] 
10-27 11:04:20.406  2022-10-27 15:04:19,909 [INFO] Executing scenarios for iteration 1
10-27 11:04:22.355  2022-10-27 15:04:22,140 [INFO] {
10-27 11:04:22.356  	"output_id": "success",
10-27 11:04:22.356  	"output_data": {
10-27 11:04:22.356  		"pods": {
10-27 11:04:22.356  			"1666883061109950333": {
10-27 11:04:22.356  				"namespace": "openshift-kube-controller-manager",
10-27 11:04:22.356  				"name": "installer-6-master-00.pubenda-sno.qe.devcluster.openshift.com"
10-27 11:04:22.356  			}
10-27 11:04:22.356  		}
10-27 11:04:22.356  	}
10-27 11:04:22.356  }
10-27 11:04:22.356  
10-27 11:04:22.356  2022-10-27 15:04:22,140 [INFO] 
10-27 11:04:22.356  2022-10-27 15:04:22,140 [INFO] Executing scenarios for iteration 2
10-27 11:04:24.939  2022-10-27 15:04:24,387 [INFO] {
10-27 11:04:24.939  	"output_id": "success",
10-27 11:04:24.939  	"output_data": {
10-27 11:04:24.939  		"pods": {
10-27 11:04:24.939  			"1666883063354512640": {
10-27 11:04:24.939  				"namespace": "openshift-kube-controller-manager",
10-27 11:04:24.939  				"name": "installer-8-master-00.pubenda-sno.qe.devcluster.openshift.com"
10-27 11:04:24.939  			}
10-27 11:04:24.939  		}
10-27 11:04:24.939  	}
10-27 11:04:24.939  }
10-27 11:04:24.939  
10-27 11:04:24.939  2022-10-27 15:04:24,388 [INFO] 
10-27 11:04:24.940  2022-10-27 15:04:24,388 [INFO] Successfully finished running Kraken. UUID for the run: 5bf946f0-5594-4be1-b5e9-952feb2790ec. Report generated at /home/jenkins/ws/workspace/_ci_paige-e2e-multibranch_kraken/kraken/kraken.report. Exiting

Krkn build image update when arca plugin pushes

When we have updates to this plugin, we need to rebuild the krkn image so the new code is loaded into it, then push it to quay. This will be very helpful for Prow imaging to pick up updates without human interaction.

Add information to README for needing python3.9 or higher

Please describe what you would like to see in this project

The documentation should state that a specific Python version is required to run this plugin. Without that information, a user has no way to figure out why the code is not running properly.

Please describe your use case

The Kraken documentation states that Python 3.9 is required to run it, but nothing here states that, which makes issues hard to debug when they arise.

Additional context

Install the dependencies outlined in krkn.
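
Beyond the README note, a hedged sketch of an early interpreter guard the plugin entry point could add, so the failure is explicit instead of an opaque syntax or runtime error:

import sys

# Fail fast with a clear message on unsupported interpreters.
if sys.version_info < (3, 9):
    sys.exit("arcaflow-plugin-kill-pod requires Python 3.9 or newer.")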
