Comments (10)
I haven't explicitly tried this with evicted pods. I'll set up a test and see if I can get this to work (either as is or with a code change) as soon as I can!
from pod-reaper.
@brianberzins sure, I appreciate your quick response. It will be really helpful, since Kubernetes starts evicting pods once the cluster reaches its max capacity in order to accommodate higher-priority pods, resulting in a lot of evicted pods in the cluster.
from pod-reaper.
Okay. I found a way to replicate this without COMPLETELY messing with a cluster (also because you can't exactly drain a node on a single node minikube setup).
Basically I created a deployment that just ran sleeps and added an emptyDir volume mount, exec-ed into the pod, and ran cat /dev/urandom into the dir until it used up all available space -- after which the pod was evicted. Note: it appears that emptyDir.sizeLimit is not currently being honored, as per kubernetes/kubernetes#63641.
Now I should be able to test this properly.
-- more details here --
I just confirmed that, much to my surprise, it actually is skipping over Evicted pods. I suspect that something is preventing them from being returned by the call to get pods, since an explicit delete pod command (which pod-reaper does) usually does clean up evicted pods. More to come, but it looks like this will require a code change of some variety.
from pod-reaper.
Alright. I know what's going on.
To summarize reasonably, let's say you run kubectl get pods and get something that looks like this:
NAME                       READY   STATUS    RESTARTS   AGE
busybox-6fc7f6b4cf-ncwhk   0/1     Evicted   0          6h
busybox-6fc7f6b4cf-qnfv2   0/1     Error     0          6h
busybox-6fc7f6b4cf-m6vw6   1/1     Running   0          6h
The STATUS column in this case is populated from 3 different places in code. The CONTAINER_STATUSES option of pod-reaper is currently capable of finding the Error pod because that Error is actually a "container status reason" (specifically a ContainerStateTerminated.Reason).
The Evicted status is different and actually comes directly from the pod (specifically the PodStatus.Reason).
So here's the plan: I'm going to make another rule specifically for the pod status. The logic is similar, but it's looking at a different thing, even though the two look the same in the kubectl get pods output.
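To make the distinction concrete, here is a minimal Python sketch of the idea, not pod-reaper's actual Go code. It assumes pods are plain dicts shaped like the Kubernetes pod JSON (status.reason, status.containerStatuses[].state.waiting/terminated.reason). The function names and the display_status approximation of the STATUS column are hypothetical and simplified.

```python
from typing import List


def container_reasons(pod: dict) -> List[str]:
    """Collect waiting/terminated reasons from each container status."""
    reasons = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        for kind in ("waiting", "terminated"):
            reason = state.get(kind, {}).get("reason")
            if reason:
                reasons.append(reason)
    return reasons


def pod_reason(pod: dict) -> str:
    """Return the pod-level reason (where Evicted lives), or an empty string."""
    return pod.get("status", {}).get("reason", "")


def display_status(pod: dict) -> str:
    """Simplified approximation of the STATUS column (an assumption, not kubectl's code).

    Start from the phase, prefer the pod-level reason, then let any
    container-level reason override it.
    """
    shown = pod.get("status", {}).get("phase", "Unknown")
    if pod_reason(pod):
        shown = pod_reason(pod)
    for reason in container_reasons(pod):
        shown = reason
    return shown


# Hypothetical sample pods mirroring the three rows in the listing above.
evicted = {"status": {"phase": "Failed", "reason": "Evicted"}}
errored = {
    "status": {
        "phase": "Failed",
        "containerStatuses": [{"state": {"terminated": {"reason": "Error"}}}],
    }
}
running = {
    "status": {
        "phase": "Running",
        "containerStatuses": [{"state": {"running": {}}}],
    }
}

for pod in (evicted, errored, running):
    print(display_status(pod), container_reasons(pod), repr(pod_reason(pod)))
```

In this sketch the evicted pod has no container-level reason at all, which is why a rule that only inspects container statuses would never flag it.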
I built an image to prove this out. Here's the log line of interest:
{"level":"info","msg":"reaping pod","pod":"busybox-6fc7f6b4cf-ncwhk","reasons":["has pod status Evicted"],"time":"2019-02-16T03:38:18Z"}
From here, it's just a matter of adding documentation and a bit of code cleanup. I'm hoping to have this all wrapped up with a new version for you in the next couple hours.
Nice find 👍
from pod-reaper.
@cbharathnoor version 2.3.0 and a new latest include the ability to kill evicted pods!
Readme has been updated to reflect the new configuration you can use to kill those pesky pods: https://github.com/target/pod-reaper#pod-status
Let me know if this works for you! I did a full functional test with the new version and it killed the pod that I forced into an Evicted status.
from pod-reaper.
@brianberzins thank you, pod-reaper is able to delete pods that are in the "Evicted" state. It's working absolutely fine!
One observation: when we configure container statuses along with pod statuses in a single deployment template (see the reference template below), pod-reaper is not able to delete pods based on container statuses. At any point in time, pod-reaper deletes pods based on either pod status or container status, and having separate cleanup pods for each of them may result in resource overhead. Is there a way to have a single pod-reaper in place that deletes pods based on pod status as well as container status?
Kindly share your inputs.
Example:
containers:
  - name: pod-cleanup
    image: target/pod-reaper:2.3.0
    env:
      # Check pod status every 3 minutes
      - name: SCHEDULE
        value: "*/3 * * * *"
      - name: POD_STATUSES
        value: "Evicted"
      - name: CONTAINER_STATUSES
        value: "Completed,Error,ImagePullBackOff,ErrImagePull"
restartPolicy: Always
terminationGracePeriodSeconds: 30
from pod-reaper.
This is easily the most counter-intuitive part of pod-reaper. In order for a pod to be reaped, EVERY loaded rule needs to flag the pod. So in order to get some "or" behavior, I have been running multiple pod-reaper containers (with different configurations) in the same pod.
containers:
  - image: target/pod-reaper:2.3.0
    name: pod-reaper-pod-status
    env:
      - name: POD_STATUSES
        value: Evicted
      - name: SCHEDULE
        value: "*/3 * * * *"
  - image: target/pod-reaper:2.3.0
    name: pod-reaper-container-status
    env:
      - name: CONTAINER_STATUSES
        value: "Completed,Error,ImagePullBackOff,ErrImagePull"
      - name: SCHEDULE
        value: "*/3 * * * *"
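The "every loaded rule must flag the pod" behavior can be sketched roughly as follows. This is an illustrative Python model, not pod-reaper's Go implementation. load_rules, should_reap, and the dict-shaped pods are hypothetical, and the sketch assumes a rule is only loaded when its environment variable is set.

```python
from typing import Callable, Dict, List, Set

Rule = Callable[[dict], bool]


def pod_status_rule(wanted: Set[str]) -> Rule:
    """Flag pods whose pod-level status reason is in `wanted`."""
    def rule(pod: dict) -> bool:
        return pod.get("status", {}).get("reason", "") in wanted
    return rule


def container_status_rule(wanted: Set[str]) -> Rule:
    """Flag pods with any container waiting/terminated reason in `wanted`."""
    def rule(pod: dict) -> bool:
        for cs in pod.get("status", {}).get("containerStatuses", []):
            state = cs.get("state", {})
            for kind in ("waiting", "terminated"):
                if state.get(kind, {}).get("reason") in wanted:
                    return True
        return False
    return rule


def load_rules(env: Dict[str, str]) -> List[Rule]:
    """Load one rule per configured variable (assumed behavior)."""
    rules: List[Rule] = []
    if env.get("POD_STATUSES"):
        rules.append(pod_status_rule(set(env["POD_STATUSES"].split(","))))
    if env.get("CONTAINER_STATUSES"):
        rules.append(container_status_rule(set(env["CONTAINER_STATUSES"].split(","))))
    return rules


def should_reap(pod: dict, rules: List[Rule]) -> bool:
    """A pod is reaped only if ALL loaded rules flag it (logical AND)."""
    return bool(rules) and all(rule(pod) for rule in rules)


# Hypothetical sample pods.
evicted = {"status": {"phase": "Failed", "reason": "Evicted"}}
errored = {
    "status": {
        "phase": "Failed",
        "containerStatuses": [{"state": {"terminated": {"reason": "Error"}}}],
    }
}

# One reaper with both variables set: the rules are ANDed together.
combined = load_rules({"POD_STATUSES": "Evicted", "CONTAINER_STATUSES": "Completed,Error"})
# Two reapers with one variable each: together they behave like an OR.
pod_only = load_rules({"POD_STATUSES": "Evicted"})
container_only = load_rules({"CONTAINER_STATUSES": "Completed,Error"})

for name, pod in (("evicted", evicted), ("errored", errored)):
    reaped_by_either = should_reap(pod, pod_only) or should_reap(pod, container_only)
    print(name, should_reap(pod, combined), reaped_by_either)
```

Under these assumptions the combined configuration reaps neither sample pod, because each one satisfies only one of the two rules, while the two single-rule reapers between them catch both.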
Given that you've been running into quota limits: note that pod-reaper is literally just a Linux binary installed on top of a scratch (completely empty) container image, so you can restrict the resources you give it quite a lot.
Think this will work for you?
from pod-reaper.
Checking in: how's this working for you?
from pod-reaper.
@brianberzins Hey, apologies for the late response, I was not around for a few days. The above implementation works fine for me: pod-reaper is able to delete pods based on container and pod statuses. I have tested this out on GKE, and pod-reaper's behavior looks fine.
from pod-reaper.
@cbharathnoor Awesome!
Glad I could help!
from pod-reaper.