descheduler's Introduction

descheduler

Descheduler for Kubernetes

Scheduling in Kubernetes is the process of binding pending pods to nodes, and is performed by a component of Kubernetes called kube-scheduler. The scheduler's decisions, whether or where a pod can or cannot be scheduled, are guided by its configurable policy, which comprises a set of rules called predicates and priorities. The scheduler's decisions are influenced by its view of the Kubernetes cluster at the point in time when a new pod appears for scheduling. Because Kubernetes clusters are very dynamic and their state changes over time, there may be a desire to move already-running pods to other nodes for various reasons:

  • Some nodes are under or over utilized.
  • The original scheduling decision no longer holds true, for example because taints or labels were added to or removed from nodes and pod/node affinity requirements are no longer satisfied.
  • Some nodes failed and their pods moved to other nodes.
  • New nodes are added to clusters.

Consequently, there might be several pods scheduled on less desirable nodes in a cluster. Descheduler, based on its policy, finds pods that can be moved and evicts them. Please note that, in the current implementation, the descheduler does not schedule replacements for evicted pods but relies on the default scheduler for that.

⚠️ Documentation Versions by Release

If you are using a published release of Descheduler (such as registry.k8s.io/descheduler/descheduler:v0.26.1), follow the documentation in that version's release branch, as listed below:

Descheduler Version Docs link
v0.29.x release-1.29
v0.28.x release-1.28
v0.27.x release-1.27
v0.26.x release-1.26
v0.25.x release-1.25
v0.24.x release-1.24

The master branch is considered in-development and the information presented in it may not work for previous versions.

Quick Start

The descheduler can be run as a Job, CronJob, or Deployment inside a Kubernetes cluster, which allows it to run repeatedly without user intervention. The descheduler pod is run as a critical pod in the kube-system namespace to avoid being evicted by itself or by the kubelet.

Run As A Job

kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/job/job.yaml

Run As A CronJob

kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/cronjob/cronjob.yaml

Run As A Deployment

kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/deployment/deployment.yaml

Install Using Helm

Starting with release v0.18.0 there is an official helm chart that can be used to install the descheduler. See the helm chart README for detailed instructions.

The descheduler helm chart is also listed on the artifact hub.
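
For reference, a typical Helm-based install looks like the following. This is a minimal sketch: the chart repository URL, release name, and namespace are assumptions based on the chart's documentation and may differ for your setup.

helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/
helm repo update

# Install the chart into kube-system (the chart's default workload kind is assumed to be a CronJob).
helm install descheduler descheduler/descheduler --namespace kube-system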

Install Using Kustomize

You can use kustomize to install descheduler. See the Kustomize resources documentation for detailed instructions.

Run As A Job

kustomize build 'github.com/kubernetes-sigs/descheduler/kubernetes/job?ref=v0.26.1' | kubectl apply -f -

Run As A CronJob

kustomize build 'github.com/kubernetes-sigs/descheduler/kubernetes/cronjob?ref=v0.26.1' | kubectl apply -f -

Run As A Deployment

kustomize build 'github.com/kubernetes-sigs/descheduler/kubernetes/deployment?ref=v0.26.1' | kubectl apply -f -

User Guide

See the user guide in the /docs directory.

Policy, Default Evictor and Strategy plugins

⚠️ v1alpha1 configuration is still supported, but deprecated (and will soon be removed). Please consider migrating to v1alpha2 (described below). For the previous v1alpha1 documentation, go to docs/deprecated/v1alpha1.md ⚠️

The Descheduler Policy is configurable and includes default strategy plugins that can be enabled or disabled. It includes a common eviction configuration at the top level, as well as configuration from the Evictor plugin (Default Evictor, if not specified otherwise). Top-level configuration and Evictor plugin configuration are applied to all evictions.

Top Level configuration

These are top level keys in the Descheduler Policy that you can use to configure all evictions.

Name                           | Type   | Default Value | Description
nodeSelector                   | string | nil           | limiting the nodes which are processed. Only used when nodeFit=true and only by the PreEvictionFilter Extension Point
maxNoOfPodsToEvictPerNode      | int    | nil           | maximum number of pods evicted from each node (summed through all strategies)
maxNoOfPodsToEvictPerNamespace | int    | nil           | maximum number of pods evicted from each namespace (summed through all strategies)

Evictor Plugin configuration (Default Evictor)

The Default Evictor Plugin is used by default for filtering pods before processing them in a strategy plugin, or for applying a PreEvictionFilter to pods before eviction. You can also create your own Evictor Plugin or use the default one provided by Descheduler. Other uses for the Evictor plugin are to sort, filter, validate, or group pods by different criteria, which is why this is handled by a plugin rather than configured in the top-level config.

Name                    | Type                 | Default Value | Description
nodeSelector            | string               | nil           | limiting the nodes which are processed
evictLocalStoragePods   | bool                 | false         | allows eviction of pods with local storage
evictSystemCriticalPods | bool                 | false         | [Warning: will evict Kubernetes system pods] allows eviction of pods with any priority, including system pods like kube-dns
ignorePvcPods           | bool                 | false         | set whether PVC pods should be evicted or ignored
evictFailedBarePods     | bool                 | false         | allow eviction of pods without owner references that are in the Failed phase
labelSelector           | metav1.LabelSelector |               | (see label filtering)
priorityThreshold       | priorityThreshold    |               | (see priority filtering)
nodeFit                 | bool                 | false         | (see node fit filtering)
minReplicas             | uint                 | 0             | ignore eviction of pods where the owner (e.g. ReplicaSet) has fewer replicas than this threshold

Example policy

As part of the policy, you first decide which top-level configuration to use, then which Evictor plugin to use (your own if you have one, the Default Evictor otherwise), followed by the configuration passed to that Evictor plugin. By default, the Default Evictor is enabled for both the filter and preEvictionFilter extension points. After that, you enable or disable the eviction strategy plugins and configure them appropriately.

See each strategy plugin section for details on available parameters.

Policy:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
nodeSelector: "node=node1" # you don't need to set this, if not set all will be processed
maxNoOfPodsToEvictPerNode: 5000 # you don't need to set this, unlimited if not set
maxNoOfPodsToEvictPerNamespace: 5000 # you don't need to set this, unlimited if not set
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "DefaultEvictor"
      args:
        evictSystemCriticalPods: true
        evictFailedBarePods: true
        evictLocalStoragePods: true
        nodeFit: true
        minReplicas: 2
    plugins:
      # DefaultEvictor is enabled for both `filter` and `preEvictionFilter`
      # filter:
      #   enabled:
      #     - "DefaultEvictor"
      # preEvictionFilter:
      #   enabled:
      #     - "DefaultEvictor"
      deschedule:
        enabled:
          - ...
      balance:
        enabled:
          - ...
      [...]

The following diagram provides a visualization of most of the strategies to help categorize how strategies fit together.

Strategies diagram

The following sections provide an overview of the different strategy plugins available. These plugins are grouped based on their implementation of extension points: Deschedule or Balance.

Deschedule Plugins: These plugins process pods one by one, and evict them in a sequential manner.

Balance Plugins: These plugins process all pods, or groups of pods, and determine which pods to evict based on how the group was intended to be spread.

Name Extension Point Implemented Description
RemoveDuplicates Balance Spreads replicas
LowNodeUtilization Balance Spreads pods according to pods resource requests and node resources available
HighNodeUtilization Balance Spreads pods according to pods resource requests and node resources available
RemovePodsViolatingInterPodAntiAffinity Deschedule Evicts pods violating pod anti affinity
RemovePodsViolatingNodeAffinity Deschedule Evicts pods violating node affinity
RemovePodsViolatingNodeTaints Deschedule Evicts pods violating node taints
RemovePodsViolatingTopologySpreadConstraint Balance Evicts pods violating TopologySpreadConstraints
RemovePodsHavingTooManyRestarts Deschedule Evicts pods having too many restarts
PodLifeTime Deschedule Evicts pods that have exceeded a specified age limit
RemoveFailedPods Deschedule Evicts pods with certain failed reasons and exit codes

RemoveDuplicates

This strategy plugin makes sure that there is only one pod associated with a ReplicaSet (RS), ReplicationController (RC), StatefulSet, or Job running on the same node. If there are more, those duplicate pods are evicted for better spreading of pods in a cluster. This issue could happen if some nodes went down for whatever reason and their pods were moved to other nodes, leading to more than one pod associated with, for example, an RS or RC running on the same node. Once the failed nodes are ready again, this strategy can be enabled to evict those duplicate pods.

It provides one optional parameter, excludeOwnerKinds, which is a list of OwnerRef Kinds. If a pod has any of these Kinds listed as an OwnerRef, that pod will not be considered for eviction. Note that pods created by Deployments are considered for eviction by this strategy. The excludeOwnerKinds parameter should include ReplicaSet to have pods created by Deployments excluded.

Parameters:

Name Type
excludeOwnerKinds list(string)
namespaces (see namespace filtering)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemoveDuplicates"
      args:
        excludeOwnerKinds:
          - "ReplicaSet"
    plugins:
      balance:
        enabled:
          - "RemoveDuplicates"

LowNodeUtilization

This strategy finds nodes that are under utilized and evicts pods, if possible, from other nodes in the hope that recreation of evicted pods will be scheduled on these underutilized nodes. The parameters of this strategy are configured under nodeResourceUtilizationThresholds.

Node underutilization is determined by a configurable threshold, thresholds. The threshold thresholds can be configured for cpu, memory, number of pods, and extended resources in terms of percentage (the percentage is calculated as the current resources requested on the node vs total allocatable. For pods, this means the number of pods on the node as a fraction of the pod capacity set for that node).

If a node's usage is below thresholds for all of cpu, memory, number of pods, and extended resources, the node is considered underutilized. Currently, pods' resource requests are used to compute node resource utilization.

There is another configurable threshold, targetThresholds, that is used to compute the potential nodes from which pods could be evicted. If a node's usage is above targetThresholds for any of cpu, memory, number of pods, or extended resources, the node is considered overutilized. Any node between thresholds and targetThresholds is considered appropriately utilized and is not considered for eviction. targetThresholds can also be configured for cpu, memory, and number of pods in terms of percentage.

These thresholds, thresholds and targetThresholds, can be tuned as per your cluster requirements. Note that this strategy evicts pods from overutilized nodes (those with usage above targetThresholds) to underutilized nodes (those with usage below thresholds); it will abort if the number of underutilized nodes or the number of overutilized nodes is zero.

Additionally, the strategy accepts a useDeviationThresholds parameter. If that parameter is set to true, the thresholds are considered as percentage deviations from mean resource usage. thresholds will be deducted from the mean among all nodes and targetThresholds will be added to the mean. A resource consumption above (resp. below) this window is considered as overutilization (resp. underutilization).

NOTE: Node resource consumption is determined by the requests and limits of pods, not actual usage. This approach is chosen in order to maintain consistency with the kube-scheduler, which follows the same design for scheduling pods onto nodes. This means that resource usage as reported by Kubelet (or commands like kubectl top) may differ from the calculated consumption, due to these components reporting actual usage metrics. Implementing metrics-based descheduling is currently TODO for the project.

Parameters:

Name Type
useDeviationThresholds bool
thresholds map(string:int)
targetThresholds map(string:int)
numberOfNodes int
evictableNamespaces (see namespace filtering)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "LowNodeUtilization"
      args:
        thresholds:
          "cpu" : 20
          "memory": 20
          "pods": 20
        targetThresholds:
          "cpu" : 50
          "memory": 50
          "pods": 50
    plugins:
      balance:
        enabled:
          - "LowNodeUtilization"

Policy should pass the following validation checks:

  • Three basic native types of resources are supported: cpu, memory and pods. If any of these resource types is not specified, all its thresholds default to 100% to avoid nodes going from underutilized to overutilized.
  • Extended resources are supported. For example, resource type nvidia.com/gpu is specified for GPU node utilization. Extended resources are optional, and will not be used to compute node's usage if it's not specified in thresholds and targetThresholds explicitly.
  • Neither thresholds nor targetThresholds can be nil, and they must configure exactly the same types of resources.
  • The valid range of a resource's percentage value is [0, 100].
  • The percentage value in thresholds cannot be greater than the one in targetThresholds for the same resource.

There is another parameter associated with the LowNodeUtilization strategy, called numberOfNodes. This parameter can be configured to activate the strategy only when the number of underutilized nodes is above the configured value. This can be helpful in large clusters where a few nodes go underutilized frequently or for a short period of time. By default, numberOfNodes is set to zero.
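
As a rough sketch, useDeviationThresholds and numberOfNodes can be combined with the thresholds as follows; the specific values are illustrative, not recommendations.

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "LowNodeUtilization"
      args:
        useDeviationThresholds: true # thresholds become deviations from the mean node usage
        numberOfNodes: 3             # only act when at least 3 nodes are underutilized
        thresholds:
          "cpu" : 10
          "memory": 10
        targetThresholds:
          "cpu" : 10
          "memory": 10
    plugins:
      balance:
        enabled:
          - "LowNodeUtilization"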

HighNodeUtilization

This strategy finds nodes that are under utilized and evicts pods from the nodes in the hope that these pods will be scheduled compactly into fewer nodes. Used in conjunction with node auto-scaling, this strategy is intended to help trigger down scaling of under utilized nodes. This strategy must be used with the scheduler scoring strategy MostAllocated. The parameters of this strategy are configured under nodeResourceUtilizationThresholds.

Note: On GKE, it is not possible to customize the default scheduler config. Instead, you can use the optimize-utilization autoscaling strategy, which has the same effect as enabling the MostAllocated scheduler plugin. Alternatively, you can deploy a second custom scheduler and edit that scheduler's config yourself.
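
Where the scheduler configuration can be customized, the MostAllocated scoring strategy is set through the NodeResourcesFit plugin arguments of the kube-scheduler. A minimal sketch of such a configuration (the resource weights are illustrative):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated # score nodes higher the more of their resources are already requested
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1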

Node underutilization is determined by a configurable threshold, thresholds. The threshold thresholds can be configured for cpu, memory, number of pods, and extended resources in terms of percentage. The percentage is calculated as the current resources requested on the node vs total allocatable. For pods, this means the number of pods on the node as a fraction of the pod capacity set for that node.

If a node's usage is below thresholds for all of cpu, memory, number of pods, and extended resources, the node is considered underutilized. Currently, pods' resource requests are used to compute node resource utilization. Any node above thresholds is considered appropriately utilized and is not considered for eviction.

The thresholds param can be tuned as per your cluster requirements. Note that this strategy evicts pods from underutilized nodes (those with usage below thresholds) so that they can be recreated on appropriately utilized nodes. The strategy will abort if the number of underutilized nodes or the number of appropriately utilized nodes is zero.

NOTE: Node resource consumption is determined by the requests and limits of pods, not actual usage. This approach is chosen in order to maintain consistency with the kube-scheduler, which follows the same design for scheduling pods onto nodes. This means that resource usage as reported by Kubelet (or commands like kubectl top) may differ from the calculated consumption, due to these components reporting actual usage metrics. Implementing metrics-based descheduling is currently TODO for the project.

Parameters:

Name Type
thresholds map(string:int)
numberOfNodes int
evictableNamespaces (see namespace filtering)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "HighNodeUtilization"
      args:
        thresholds:
          "cpu" : 20
          "memory": 20
          "pods": 20
        evictableNamespaces:
          exclude:
          - "kube-system"
          - "namespace1"
    plugins:
      balance:
        enabled:
          - "HighNodeUtilization"

Policy should pass the following validation checks:

  • Three basic native types of resources are supported: cpu, memory and pods. If any of these resource types is not specified, all its thresholds default to 100%.
  • Extended resources are supported. For example, resource type nvidia.com/gpu is specified for GPU node utilization. Extended resources are optional, and will not be used to compute node's usage if it's not specified in thresholds explicitly.
  • thresholds can not be nil.
  • The valid range of the resource's percentage value is [0, 100]

There is another parameter associated with the HighNodeUtilization strategy, called numberOfNodes. This parameter can be configured to activate the strategy only when the number of under utilized nodes is above the configured value. This could be helpful in large clusters where a few nodes could go under utilized frequently or for a short period of time. By default, numberOfNodes is set to zero.

RemovePodsViolatingInterPodAntiAffinity

This strategy makes sure that pods violating inter-pod anti-affinity are removed from nodes. For example, if there is podA on a node, and podB and podC (running on the same node) have anti-affinity rules which prohibit them from running on the same node as podA, then podA will be evicted from the node so that podB and podC can run. This issue could happen when the anti-affinity rules for podB and podC are created while they are already running on the node.

Parameters:

Name Type
namespaces (see namespace filtering)
labelSelector (see label filtering)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingInterPodAntiAffinity"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingInterPodAntiAffinity"

RemovePodsViolatingNodeAffinity

This strategy makes sure all pods violating node affinity are eventually removed from nodes. Node affinity rules allow a pod to specify requiredDuringSchedulingIgnoredDuringExecution and/or preferredDuringSchedulingIgnoredDuringExecution.

The requiredDuringSchedulingIgnoredDuringExecution type tells the scheduler to respect node affinity when scheduling the pod, but tells the kubelet to ignore it if the node changes over time and no longer satisfies the affinity. When enabled, the strategy serves as a temporary implementation of requiredDuringSchedulingRequiredDuringExecution and evicts pods on nodes that no longer satisfy their node affinity.

For example, podA is scheduled on nodeA, which satisfies the node affinity rule requiredDuringSchedulingIgnoredDuringExecution at the time of scheduling. Over time, nodeA stops satisfying the rule. When the strategy gets executed and there is another node available that satisfies the node affinity rule, podA gets evicted from nodeA.

The preferredDuringSchedulingIgnoredDuringExecution type tells the scheduler to respect node affinity when scheduling if that's possible. If not, the pod gets scheduled anyway. It may happen that, over time, the state of the cluster changes and now the pod can be scheduled on a node that actually fits its preferred node affinity. When enabled, the strategy serves as a temporary implementation of preferredDuringSchedulingPreferredDuringExecution, so the pod will be evicted if it can be scheduled on a "better" node.
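
For context, the node affinity this strategy evaluates is the standard affinity field of the pod spec. A minimal sketch of a pod requiring a node label that the node may later lose (the pod name, label key/value, and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: pod-a # placeholder name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype    # placeholder label key
            operator: In
            values:
            - ssd            # placeholder label value
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9 # placeholder image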

Parameters:

Name Type
nodeAffinityType list(string)
namespaces (see namespace filtering)
labelSelector (see label filtering)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingNodeAffinity"
      args:
        nodeAffinityType:
        - "requiredDuringSchedulingIgnoredDuringExecution"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingNodeAffinity"

RemovePodsViolatingNodeTaints

This strategy makes sure that pods violating NoSchedule taints on nodes are removed. For example, suppose pod podA tolerates the taint key=value:NoSchedule and is scheduled and running on the tainted node. If the node's taint is subsequently updated or removed, the taint is no longer matched by the pod's tolerations and podA will be evicted.

Node taints can be excluded from consideration by specifying a list of excludedTaints. If a node taint key or key=value matches an excludedTaints entry, the taint will be ignored.

For example, excludedTaints entry "dedicated" would match all taints with key "dedicated", regardless of value. excludedTaints entry "dedicated=special-user" would match taints with key "dedicated" and value "special-user".

If a list of includedTaints is provided, a taint will be considered if and only if it matches an included key or key=value from the list. Otherwise it will be ignored. Leaving includedTaints unset will include any taint by default.

Parameters:

Name Type
excludedTaints list(string)
includedTaints list(string)
includePreferNoSchedule bool
namespaces (see namespace filtering)
labelSelector (see label filtering)

Example:

Setting excludedTaints

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingNodeTaints"
      args:
        excludedTaints:
        - dedicated=special-user # exclude taints with key "dedicated" and value "special-user"
        - reserved # exclude all taints with key "reserved"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingNodeTaints"

Setting includedTaints

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingNodeTaints"
      args:
        includedTaints:
        - decommissioned=end-of-life # include only taints with key "decommissioned" and value "end-of-life"
        - reserved # include all taints with key "reserved"
    plugins:
      deschedule:
        enabled:
          - "RemovePodsViolatingNodeTaints"

RemovePodsViolatingTopologySpreadConstraint

This strategy makes sure that pods violating topology spread constraints are evicted from nodes. Specifically, it tries to evict the minimum number of pods required to balance topology domains to within each constraint's maxSkew. This strategy requires k8s version 1.18 at a minimum.

By default, this strategy only includes hard constraints; you can explicitly set constraints as shown below to include both hard and soft constraints:

constraints:
- DoNotSchedule
- ScheduleAnyway

The topologyBalanceNodeFit arg is used when balancing topology domains while the Default Evictor's nodeFit is used in pre-eviction to determine if a pod can be evicted.

topologyBalanceNodeFit: false

Strategy parameter labelSelector is not utilized when balancing topology domains and is only applied during eviction to determine if the pod can be evicted.

Supported Constraints fields:

Name Supported?
maxSkew Yes
minDomains No
topologyKey Yes
whenUnsatisfiable Yes
labelSelector Yes
matchLabelKeys Yes
nodeAffinityPolicy Yes
nodeTaintsPolicy Yes

Parameters:

Name Type
namespaces (see namespace filtering)
labelSelector (see label filtering)
constraints (see whenUnsatisfiable)
topologyBalanceNodeFit bool

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingTopologySpreadConstraint"
      args:
        constraints:
          - DoNotSchedule
    plugins:
      balance:
        enabled:
          - "RemovePodsViolatingTopologySpreadConstraint"

RemovePodsHavingTooManyRestarts

This strategy makes sure that pods having too many restarts are removed from nodes. For example, a pod whose EBS/PD volume cannot be attached to the instance should be rescheduled to another node. Its parameters include podRestartThreshold, which is the number of restarts (summed over all eligible containers) at which a pod should be evicted, and includingInitContainers, which determines whether init container restarts should be factored into that calculation.

You can also specify the states parameter to only evict pods in certain states.

If a value for states or podStatusPhases is not specified, pods in any state (even Running) are considered for eviction.

Parameters:

Name Type
podRestartThreshold int
includingInitContainers bool
namespaces (see namespace filtering)
labelSelector (see label filtering)
states list(string)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsHavingTooManyRestarts"
      args:
        podRestartThreshold: 100
        includingInitContainers: true
    plugins:
      deschedule:
        enabled:
          - "RemovePodsHavingTooManyRestarts"

PodLifeTime

This strategy evicts pods that are older than maxPodLifeTimeSeconds.

You can also specify the states parameter to only evict pods matching the following conditions:

  • Pod Phase status of: Running, Pending, Unknown
  • Pod Reason reasons of: NodeAffinity, NodeLost, Shutdown, UnexpectedAdmissionError
  • Container State Waiting condition of: PodInitializing, ContainerCreating, ImagePullBackOff, CrashLoopBackOff, CreateContainerConfigError, ErrImagePull, CreateContainerError, InvalidImageName

If a value for states or podStatusPhases is not specified, Pods in any state (even Running) are considered for eviction.

Parameters:

Name Type Notes
maxPodLifeTimeSeconds int
states list(string) Only supported in v0.25+
namespaces (see namespace filtering)
labelSelector (see label filtering)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
        states:
        - "Pending"
        - "PodInitializing"
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"

RemoveFailedPods

This strategy evicts pods that are in the Failed status phase. You can provide optional parameters to filter by the failed pods' and containers' reasons and exitCodes. exitCodes apply only to failed pods' containers in the terminated state. reasons and exitCodes can be expanded to include those of init containers as well by setting the optional parameter includingInitContainers to true. You can specify an optional parameter minPodLifetimeSeconds to evict only pods that are older than the specified number of seconds. Lastly, you can specify the optional parameter excludeOwnerKinds; if a pod has any of these Kinds listed as an OwnerRef, that pod will not be considered for eviction.

Parameters:

Name Type
minPodLifetimeSeconds uint
excludeOwnerKinds list(string)
reasons list(string)
exitCodes list(int32)
includingInitContainers bool
namespaces (see namespace filtering)
labelSelector (see label filtering)

Example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemoveFailedPods"
      args:
        reasons:
        - "NodeAffinity"
        exitCodes:
        - 1
        includingInitContainers: true
        excludeOwnerKinds:
        - "Job"
        minPodLifetimeSeconds: 3600
    plugins:
      deschedule:
        enabled:
          - "RemoveFailedPods"

Filter Pods

Namespace filtering

The following strategies accept a namespaces parameter which allows you to specify a list of included or excluded namespaces:

  • PodLifeTime
  • RemovePodsHavingTooManyRestarts
  • RemovePodsViolatingNodeTaints
  • RemovePodsViolatingNodeAffinity
  • RemovePodsViolatingInterPodAntiAffinity
  • RemoveDuplicates
  • RemovePodsViolatingTopologySpreadConstraint
  • RemoveFailedPods

The following strategies accept an evictableNamespaces parameter which allows you to specify a list of excluded namespaces:

  • LowNodeUtilization and HighNodeUtilization (Only filtered right before eviction)

In the following example, PodLifeTime is executed only over namespace1 and namespace2.

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
        namespaces:
          include:
          - "namespace1"
          - "namespace2"
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"

The same holds for the exclude field. In the following example, the strategy is executed over all namespaces except namespace1 and namespace2.

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
        namespaces:
          exclude:
          - "namespace1"
          - "namespace2"
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"

It is not allowed to combine the include and exclude fields.

Priority filtering

A priority threshold can be configured via the Default Evictor filter, and only pods under the threshold can be evicted. You can specify this threshold by setting the priorityThreshold.name parameter (setting the threshold to the value of the given priority class) or the priorityThreshold.value parameter (directly setting the threshold). By default, this threshold is set to the value of the system-cluster-critical priority class.

Note: Setting evictSystemCriticalPods to true disables priority filtering entirely.

E.g.

Setting priorityThreshold value

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "DefaultEvictor"
      args:
        priorityThreshold:
          value: 10000
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"

Setting Priority Threshold Class Name (priorityThreshold.name)

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "DefaultEvictor"
      args:
        priorityThreshold:
          name: "priorityClassName1"
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"

Note that you cannot configure both priorityThreshold.name and priorityThreshold.value. If the given priority class does not exist, the descheduler won't create it and will throw an error.

Label filtering

The following strategies can configure a standard kubernetes labelSelector to filter pods by their labels:

  • PodLifeTime
  • RemovePodsHavingTooManyRestarts
  • RemovePodsViolatingNodeTaints
  • RemovePodsViolatingNodeAffinity
  • RemovePodsViolatingInterPodAntiAffinity
  • RemovePodsViolatingTopologySpreadConstraint
  • RemoveFailedPods

This allows restricting strategies to only the pods the descheduler is interested in.

For example:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
        labelSelector:
          matchLabels:
            component: redis
          matchExpressions:
            - {key: tier, operator: In, values: [cache]}
            - {key: environment, operator: NotIn, values: [dev]}
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"

Node Fit filtering

NodeFit can be configured via the Default Evictor Filter. If set to true the descheduler will consider whether or not the pods that meet eviction criteria will fit on other nodes before evicting them. If a pod cannot be rescheduled to another node, it will not be evicted. Currently the following criteria are considered when setting nodeFit to true:

  • A nodeSelector on the pod
  • Any tolerations on the pod and any taints on the other nodes
  • nodeAffinity on the pod
  • Resource requests made by the pod and the resources available on other nodes
  • Whether any of the other nodes are marked as unschedulable
  • Any podAntiAffinity between the pod and the pods on the other nodes

E.g.

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "DefaultEvictor"
      args:
        nodeFit: true
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
    plugins:
      deschedule:
        enabled:
          - "PodLifeTime"

Note that node fit filtering references the current pod spec, and not that of its owner. Thus, if the pod is owned by a ReplicationController (and that ReplicationController was modified recently), the pod may be running with an outdated spec, which the descheduler will reference when determining node fit. This is expected behavior as the descheduler is a "best-effort" mechanism.

Using Deployments instead of ReplicationControllers provides an automated rollout of pod spec changes, therefore ensuring that the descheduler has an up-to-date view of the cluster state.

Pod Evictions

When the descheduler decides to evict pods from a node, it employs the following general mechanism:

  • Critical pods (with priorityClassName set to system-cluster-critical or system-node-critical) are never evicted (unless evictSystemCriticalPods: true is set).
  • Pods (static or mirrored pods or standalone pods) not part of a ReplicationController, ReplicaSet (Deployment), StatefulSet, or Job are never evicted because these pods won't be recreated. (Standalone pods in the Failed status phase can be evicted by setting evictFailedBarePods: true.)
  • Pods associated with DaemonSets are never evicted (unless evictDaemonSetPods: true is set).
  • Pods with local storage are never evicted (unless evictLocalStoragePods: true is set).
  • Pods with PVCs are evicted (unless ignorePvcPods: true is set).
  • In LowNodeUtilization and RemovePodsViolatingInterPodAntiAffinity, pods are evicted by their priority from low to high, and if they have same priority, best effort pods are evicted before burstable and guaranteed pods.
  • All types of pods with the annotation descheduler.alpha.kubernetes.io/evict are eligible for eviction (see the example after this list). This annotation is used to override checks which prevent eviction, letting users select which pods are evicted. Users should know how and whether the pod will be recreated. The annotation only affects internal descheduler checks. The anti-disruption protection provided by the /eviction subresource is still respected.
  • Pods with a non-nil DeletionTimestamp are not evicted by default.
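
For illustration, the eviction override annotation mentioned above is set in the pod's metadata. A minimal sketch (the pod name and image are placeholders; it is the presence of the annotation that the descheduler checks):

apiVersion: v1
kind: Pod
metadata:
  name: example-pod # placeholder name
  annotations:
    # Opt this pod in to descheduler eviction even if internal checks would normally skip it.
    descheduler.alpha.kubernetes.io/evict: "true"
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9 # placeholder image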

Setting --v=4 or greater on the Descheduler will log all reasons why any pod is not evictable.

Pod Disruption Budget (PDB)

Pods subject to a Pod Disruption Budget (PDB) are not evicted if descheduling would violate the PDB. Pods are evicted via the eviction subresource, which handles PDBs.

High Availability

In High Availability mode, the descheduler starts a leader election process in Kubernetes. You can activate HA mode if you choose to deploy the descheduler as a Deployment.

The Deployment starts with 1 replica by default. If you want to use more than 1 replica, you must enable High Availability mode, since multiple descheduler pods should not run their strategies simultaneously.

Configure HA Mode

The leader election process can be enabled by setting --leader-elect in the CLI. You can also set --set=leaderElection.enabled=true flag if you are using Helm.
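
For example, when installing with Helm, leader election can be enabled together with a higher replica count. A minimal sketch; the kind and replicas value names are assumptions about the chart's values and may differ in your chart version:

helm install descheduler descheduler/descheduler \
  --namespace kube-system \
  --set kind=Deployment \
  --set replicas=2 \
  --set leaderElection.enabled=true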

To get the best results from HA mode, some additional configuration may be required:

  • Configure a podAntiAffinity rule if you want to schedule onto a node only if that node is in the same zone as at least one already-running descheduler
  • Set the replica count greater than 1

Metrics

name type description
build_info gauge constant 1
pods_evicted CounterVec total number of pods evicted

The metrics are served through https://localhost:10258/metrics by default. The address and port can be changed by setting --binding-address and --secure-port flags.
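
For a quick check, the metrics endpoint can be reached by port-forwarding to the descheduler. A minimal sketch assuming a Deployment named descheduler in kube-system (names are placeholders, and the secure port may require authentication depending on your configuration):

kubectl -n kube-system port-forward deployment/descheduler 10258:10258
# In another terminal; -k skips verification of the self-signed serving certificate.
curl -k https://localhost:10258/metrics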

Compatibility Matrix

The below compatibility matrix shows the k8s client package (client-go, apimachinery, etc.) versions that descheduler is compiled with. At this time descheduler does not have a hard dependency on a specific k8s release. However, a particular descheduler release is only tested against the three latest k8s minor versions. For example, descheduler v0.18 should work with k8s v1.18, v1.17, and v1.16.

Starting with descheduler release v0.18 the minor version of descheduler matches the minor version of the k8s client packages that it is compiled with.

Descheduler Supported Kubernetes Version
v0.27 v1.27
v0.26 v1.26
v0.25 v1.25
v0.24 v1.24
v0.23 v1.23
v0.22 v1.22
v0.21 v1.21
v0.20 v1.20
v0.19 v1.19
v0.18 v1.18
v0.10 v1.17
v0.4-v0.9 v1.9+
v0.1-v0.3 v1.7-v1.8

Getting Involved and Contributing

Are you interested in contributing to descheduler? We, the maintainers and community, would love your suggestions, contributions, and help! Also, the maintainers can be contacted at any time to learn more about how to get involved.

To get started writing code see the contributor guide in the /docs directory.

In the interest of getting more new people involved, we tag issues with good first issue. These are typically issues that have smaller scope but are good ways to start getting acquainted with the codebase.

We also encourage ALL active community participants to act as if they are maintainers, even if you don't have "official" write permissions. This is a community effort, we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power! Don't assume that the only people who can get things done around here are the "maintainers".

We also would love to add more "official" maintainers, so show us what you can do!

This repository uses the Kubernetes bots. See a full list of the bot commands in the Prow documentation.

Communicating With Contributors

You can reach the contributors of this project through the Kubernetes community channels. Learn how to engage with the Kubernetes community on the community page.

Roadmap

This roadmap is not in any particular order.

  • Consideration of pod affinity
  • Strategy to consider number of pending pods
  • Integration with cluster autoscaler
  • Integration with metrics providers for obtaining real load metrics
  • Consideration of Kubernetes's scheduler's predicates

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

descheduler's People

Contributors

a7i, aveshagarwal, binacs, bytetwin, concaf, damemi, dentrax, dongjiang1989, farah, garrybest, harshanarayana, ingvagabund, invidian, janeliul, jelmersnoeck, jklaw90, k8s-ci-robot, kevinz857, knelasevero, lixiang233, pravarag, ravisantoshgudimetla, ryandevlin, seanmalloy, sharkannon, spike-liu, stephan2012, tammert, tioxy, xiaoanyunfei

descheduler's Issues

Serviceaccount descheduler-sa have no permission to evict pod

Go through the README and got

I1130 06:29:15.559480       1 duplicates.go:59] Error when evicting pod: "nginx-1-55kh7" (&errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:""}, Status:"Failure", Message:"pods \"nginx-1-55kh7\" is forbidden: User \"system:serviceaccount:kube-system:descheduler-sa\" cannot create pods/eviction in the namespace \"default\": User \"system:serviceaccount:kube-system:descheduler-sa\" cannot create pods/eviction in project \"default\"", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc4202d77a0), Code:403}})

Unable to specify only one or two of cpu, memory or pods for LowNodeUtilization

Hi,
I was just playing around with the project.

My policy file looks like -

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
     enabled: true
     params:
       nodeResourceUtilizationThresholds:
         thresholds:
           "cpu" : 20
         targetThresholds:
           "cpu" : 50

and I run the descheduler like -

$ _output/bin/descheduler --kubeconfig-file /var/run/kubernetes/admin.kubeconfig --policy-config-file examples/policy.yaml  -v 5
I1123 17:14:37.581631   13825 reflector.go:198] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1123 17:14:37.581785   13825 node.go:50] node lister returned empty list, now fetch directly
I1123 17:14:37.582104   13825 reflector.go:236] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1123 17:14:38.069287   13825 lownodeutilization.go:104] no target resource threshold for pods is configured

The exit code is 0, but I'm not sure if the descheduler actually went ahead and processed the nodes, etc, because it might have stopped while seeing that there is no targetThreshold for pods.

If this is the case, does it make sense to make all three parameters (pods, memory, and cpu) mandatory for the descheduler to take decisions? Why can I not set the parameter to just cpu, or memory, or pods?

Does this make sense?

LowNodeUtilization not working in k8s 1.9 (GKE non-alpha cluster)

I have the following config but the descheduler does not remove any pods

      LowNodeUtilization:
         enabled: true
         params:
           nodeResourceUtilizationThresholds:
             thresholds:
               cpu: 30
               memory: 30
               pods: 30
             targetThresholds:
               cpu: 50
               memory: 50
               pods: 50
I0508 07:06:40.389008       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-2mj6" is over utilized with usage: api.ResourceThresholds{"memory":63.35539318178952, "pods":10, "cpu":29.98741346758968}
I0508 07:06:40.389076       1 lownodeutilization.go:149] allPods:11, nonRemovablePods:5, bePods:1, bPods:2, gPods:3
I0508 07:06:40.389225       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-1qsh" is appropriately utilized with usage: api.ResourceThresholds{"cpu":31.623662680931403, "memory":38.651985111461904, "pods":10}
I0508 07:06:40.389265       1 lownodeutilization.go:149] allPods:11, nonRemovablePods:5, bePods:1, bPods:4, gPods:1
I0508 07:06:40.389353       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-8fc1" is appropriately utilized with usage: api.ResourceThresholds{"pods":5.454545454545454, "cpu":33.38577721837634, "memory":46.87748520961707}
I0508 07:06:40.389375       1 lownodeutilization.go:149] allPods:6, nonRemovablePods:4, bePods:0, bPods:1, gPods:1
I0508 07:06:40.389508       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-0s07" is over utilized with usage: api.ResourceThresholds{"cpu":43.14033983637508, "memory":61.846487904599, "pods":8.181818181818182}
I0508 07:06:40.389535       1 lownodeutilization.go:149] allPods:9, nonRemovablePods:4, bePods:0, bPods:2, gPods:3
I0508 07:06:40.389712       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-nq13" is over utilized with usage: api.ResourceThresholds{"cpu":84.36123348017621, "memory":86.35326222824197, "pods":10}
I0508 07:06:40.389744       1 lownodeutilization.go:149] allPods:11, nonRemovablePods:5, bePods:0, bPods:2, gPods:4
I0508 07:06:40.389853       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-sb0v" is over utilized with usage: api.ResourceThresholds{"pods":7.2727272727272725, "cpu":40.308370044052865, "memory":65.15821416455076}
I0508 07:06:40.389879       1 lownodeutilization.go:149] allPods:8, nonRemovablePods:5, bePods:0, bPods:2, gPods:1
I0508 07:06:40.389965       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-3290" is appropriately utilized with usage: api.ResourceThresholds{"cpu":27.72183763373191, "memory":45.02291850404409, "pods":6.363636363636363}
I0508 07:06:40.389988       1 lownodeutilization.go:149] allPods:7, nonRemovablePods:5, bePods:0, bPods:2, gPods:0
I0508 07:06:40.390079       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-7w01" is appropriately utilized with usage: api.ResourceThresholds{"cpu":37.161736941472626, "memory":43.96316610085953, "pods":6.363636363636363}
I0508 07:06:40.390114       1 lownodeutilization.go:149] allPods:7, nonRemovablePods:5, bePods:0, bPods:0, gPods:2
I0508 07:06:40.390311       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-pcz2" is over utilized with usage: api.ResourceThresholds{"memory":59.747681387354575, "pods":12.727272727272727, "cpu":67.36941472624292}
I0508 07:06:40.390335       1 lownodeutilization.go:149] allPods:14, nonRemovablePods:6, bePods:1, bPods:5, gPods:2
I0508 07:06:40.390425       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-rw0t" is appropriately utilized with usage: api.ResourceThresholds{"cpu":33.38577721837634, "memory":38.39946598414058, "pods":6.363636363636363}
I0508 07:06:40.390452       1 lownodeutilization.go:149] allPods:7, nonRemovablePods:4, bePods:0, bPods:2, gPods:1
I0508 07:06:40.390580       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-36ae422e-wnp4" is over utilized with usage: api.ResourceThresholds{"cpu":65.48143486469478, "memory":81.88036194839464, "pods":8.181818181818182}
I0508 07:06:40.390617       1 lownodeutilization.go:149] allPods:9, nonRemovablePods:4, bePods:0, bPods:3, gPods:2
I0508 07:06:40.390701       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-150f" is appropriately utilized with usage: api.ResourceThresholds{"cpu":27.847702957835118, "memory":39.989094588917425, "pods":5.454545454545454}
I0508 07:06:40.390722       1 lownodeutilization.go:149] allPods:6, nonRemovablePods:4, bePods:0, bPods:1, gPods:1
I0508 07:06:40.390907       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-x9lg" is over utilized with usage: api.ResourceThresholds{"memory":76.15314534759058, "pods":12.727272727272727, "cpu":47.860289490245435}
I0508 07:06:40.390929       1 lownodeutilization.go:149] allPods:14, nonRemovablePods:6, bePods:0, bPods:5, gPods:3
I0508 07:06:40.391013       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-mcvm" is appropriately utilized with usage: api.ResourceThresholds{"cpu":27.72183763373191, "memory":45.02291850404409, "pods":6.363636363636363}
I0508 07:06:40.391032       1 lownodeutilization.go:149] allPods:7, nonRemovablePods:5, bePods:0, bPods:2, gPods:0
I0508 07:06:40.391230       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-tpck" is over utilized with usage: api.ResourceThresholds{"cpu":79.32662051604783, "memory":86.72583143248654, "pods":15.454545454545455}
I0508 07:06:40.391254       1 lownodeutilization.go:149] allPods:17, nonRemovablePods:8, bePods:1, bPods:6, gPods:2
I0508 07:06:40.391429       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-bftv" is over utilized with usage: api.ResourceThresholds{"cpu":67.0547514159849, "memory":40.020142022604475, "pods":12.727272727272727}
I0508 07:06:40.391454       1 lownodeutilization.go:149] allPods:14, nonRemovablePods:5, bePods:0, bPods:5, gPods:4
I0508 07:06:40.391560       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-zdd4" is over utilized with usage: api.ResourceThresholds{"cpu":41.63624921334173, "memory":81.68473886035868, "pods":7.2727272727272725}
I0508 07:06:40.391579       1 lownodeutilization.go:149] allPods:8, nonRemovablePods:7, bePods:0, bPods:0, gPods:1
I0508 07:06:40.391630       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-ffq9" is over utilized with usage: api.ResourceThresholds{"pods":4.545454545454546, "cpu":28.351164254247955, "memory":58.79969974544339}
I0508 07:06:40.391659       1 lownodeutilization.go:149] allPods:5, nonRemovablePods:5, bePods:0, bPods:0, gPods:0
I0508 07:06:40.391865       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-0plb" is over utilized with usage: api.ResourceThresholds{"cpu":69.57205789804908, "memory":26.510368710913788, "pods":12.727272727272727}
I0508 07:06:40.391891       1 lownodeutilization.go:149] allPods:14, nonRemovablePods:5, bePods:1, bPods:2, gPods:6
I0508 07:06:40.391985       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-36ae422e-32s4" is over utilized with usage: api.ResourceThresholds{"cpu":45.972309628697296, "memory":55.872961663211015, "pods":7.2727272727272725}
I0508 07:06:40.392007       1 lownodeutilization.go:149] allPods:8, nonRemovablePods:4, bePods:0, bPods:3, gPods:1
I0508 07:06:40.392185       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-v103" is over utilized with usage: api.ResourceThresholds{"cpu":88.82945248584015, "memory":65.58252909160707, "pods":11.818181818181818}
I0508 07:06:40.392206       1 lownodeutilization.go:149] allPods:13, nonRemovablePods:7, bePods:0, bPods:3, gPods:3
I0508 07:06:40.392274       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-290fc974-pwh6" is appropriately utilized with usage: api.ResourceThresholds{"cpu":22.183763373190686, "memory":36.015023076975325, "pods":5.454545454545454}
I0508 07:06:40.392297       1 lownodeutilization.go:149] allPods:6, nonRemovablePods:5, bePods:0, bPods:1, gPods:0
I0508 07:06:40.392450       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-fxpk" is over utilized with usage: api.ResourceThresholds{"cpu":47.23096286972939, "memory":56.65535699212463, "pods":10}
I0508 07:06:40.392479       1 lownodeutilization.go:149] allPods:11, nonRemovablePods:5, bePods:0, bPods:6, gPods:0
I0508 07:06:40.392632       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-fsr8" is appropriately utilized with usage: api.ResourceThresholds{"pods":11.818181818181818, "cpu":35.08495909376967, "memory":38.59402990191275}
I0508 07:06:40.392652       1 lownodeutilization.go:149] allPods:13, nonRemovablePods:4, bePods:1, bPods:7, gPods:1
I0508 07:06:40.392727       1 lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-290fc974-rq6s" is over utilized with usage: api.ResourceThresholds{"cpu":34.64443045940843, "memory":62.24389505579321, "pods":5.454545454545454}
I0508 07:06:40.392753       1 lownodeutilization.go:149] allPods:6, nonRemovablePods:6, bePods:0, bPods:0, gPods:0
I0508 07:06:40.392759       1 lownodeutilization.go:65] Criteria for a node under utilization: CPU: 30, Mem: 30, Pods: 30
I0508 07:06:40.392782       1 lownodeutilization.go:69] No node is underutilized, nothing to do here, you might tune your thersholds further
kubectl top nodes

NAME                                                  CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
gke-asia-northeast1-std--default-pool-36ae422e-32s4   199m         1%        12555Mi         12%       
gke-asia-northeast1-std--default-pool-290fc974-pwh6   101m         0%        10892Mi         11%       
gke-asia-northeast1-std--default-pool-290fc974-2mj6   218m         1%        8947Mi          9%        
gke-asia-northeast1-std--default-pool-290fc974-bftv   372m         2%        17092Mi         17%       
gke-asia-northeast1-std--default-pool-290fc974-pcz2   279m         1%        44959Mi         46%       
gke-asia-northeast1-std--default-pool-36ae422e-mcvm   286m         1%        29233Mi         30%       
gke-asia-northeast1-std--default-pool-290fc974-0s07   120m         0%        9409Mi          9%        
gke-asia-northeast1-std--default-pool-290fc974-ffq9   164m         1%        13839Mi         14%       
gke-asia-northeast1-std--default-pool-290fc974-sb0v   404m         2%        11927Mi         12%       
gke-asia-northeast1-std--default-pool-290fc974-fxpk   211m         1%        30067Mi         31%       
gke-asia-northeast1-std--default-pool-290fc974-v103   1337m        8%        42334Mi         43%       
gke-asia-northeast1-std--default-pool-36ae422e-wnp4   291m         1%        19506Mi         20%       
gke-asia-northeast1-std--default-pool-36ae422e-fsr8   532m         3%        22507Mi         23%       
gke-asia-northeast1-std--default-pool-36ae422e-3290   235m         1%        33359Mi         34%       
gke-asia-northeast1-std--default-pool-290fc974-rq6s   78m          0%        34039Mi         35%       
gke-asia-northeast1-std--default-pool-36ae422e-8fc1   112m         0%        10349Mi         10%       
gke-asia-northeast1-std--default-pool-36ae422e-7w01   185m         1%        10906Mi         11%       
gke-asia-northeast1-std--default-pool-36ae422e-150f   162m         1%        11357Mi         11%       
gke-asia-northeast1-std--default-pool-290fc974-x9lg   333m         2%        13055Mi         13%       
gke-asia-northeast1-std--default-pool-290fc974-0plb   137m         0%        22509Mi         23%       
gke-asia-northeast1-std--default-pool-36ae422e-1qsh   269m         1%        20021Mi         20%       
gke-asia-northeast1-std--default-pool-290fc974-nq13   256m         1%        56451Mi         58%       
gke-asia-northeast1-std--default-pool-290fc974-tpck   435m         2%        40776Mi         42%       
gke-asia-northeast1-std--default-pool-290fc974-zdd4   88m          0%        27627Mi         28%       
gke-asia-northeast1-std--default-pool-36ae422e-rw0t   120m         0%        26677Mi         27%

Stark difference between what is reported in the logs vs reported by kubectl top

lownodeutilization.go:144] Node "gke-asia-northeast1-std--default-pool-36ae422e-32s4" is over utilized with usage: api.ResourceThresholds{"cpu":45.972309628697296, "memory":55.872961663211015, "pods":7.2727272727272725}

vs

gke-asia-northeast1-std--default-pool-36ae422e-32s4   199m         1%        12555Mi         12%       
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.4-gke.1", GitCommit:"10e47a740d0036a4964280bd663c8500da58e3aa", GitTreeState:"clean", BuildDate:"2018-03-13T18:00:36Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Pods do not get evicted while logs say "evicting pods from node"

So, if I understood correctly,

  • any node below the percentages in nodeResourceUtilizationThresholds.thresholds is considered underutilized
  • any node above the percentages in nodeResourceUtilizationThresholds.targetThresholds is considered overutilized
  • any node between the two ranges above is considered appropriately utilized by the descheduler and is not taken into consideration

If this is correct, the following happens -

I have 4 nodes, 1 master node and 3 worker nodes -

$ kubectl get nodes
NAME                           STATUS                     ROLES     AGE       VERSION
kubernetes-master              Ready,SchedulingDisabled   <none>    6h        v1.10.0-alpha.0.456+f85649c6cd2032-dirty
kubernetes-minion-group-1vp4   Ready                      <none>    6h        v1.10.0-alpha.0.456+f85649c6cd2032-dirty
kubernetes-minion-group-frgx   Ready                      <none>    6h        v1.10.0-alpha.0.456+f85649c6cd2032-dirty
kubernetes-minion-group-k7c7   Ready                      <none>    6h        v1.10.0-alpha.0.456+f85649c6cd2032-dirty

I tainted and then uncordoned node kubernetes-minion-group-1vp4, which means there are no pods or Kubernetes resources on that node -

$ kubectl get all -o wide | grep kubernetes-minion-group-1vp4
$

and the allocated resources on this node are -

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  200m (10%)    0 (0%)      200Mi (2%)       300Mi (4%)

while on the other 2 worker nodes the allocated resources are -

--
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  1896m (94%)   446m (22%)  1133952Ki (15%)  1441152Ki (19%)
--
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  1840m (92%)   300m (15%)  1130Mi (15%)     1540Mi (21%)

So with the right DeschedulerPolicy, pods should have been descheduled from the nodes that are over-utilized and scheduled on the fresh node.

I wrote the following DeschedulerPolicy -

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
     enabled: true
     params:
       nodeResourceUtilizationThresholds:
         thresholds:  # any node below the following percentages is considered underutilized
           "cpu" : 40
           "memory": 40
           "pods": 40
         targetThresholds: # any node above the following percentages is considered overutilized
           "cpu" : 30
           "memory": 2
           "pods": 1

I run the descheduler as follows -

$ _output/bin/descheduler --kubeconfig-file /var/run/kubernetes/admin.kubeconfig --policy-config-file examples/policy.yaml  -v 5             
I1123 22:12:27.298937    9381 reflector.go:198] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1123 22:12:27.299080    9381 node.go:50] node lister returned empty list, now fetch directly
I1123 22:12:27.299230    9381 reflector.go:236] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1123 22:12:31.596854    9381 lownodeutilization.go:115] Node "kubernetes-master" usage: api.ResourceThresholds{"cpu":95, "memory":11.575631035804197, "pods":8.181818181818182}
I1123 22:12:31.597019    9381 lownodeutilization.go:115] Node "kubernetes-minion-group-1vp4" usage: api.ResourceThresholds{"memory":2.764226588836412, "pods":1.8181818181818181, "cpu":10}
I1123 22:12:31.597508    9381 lownodeutilization.go:115] Node "kubernetes-minion-group-frgx" usage: api.ResourceThresholds{"cpu":94.8, "memory":15.305177094063607, "pods":16.363636363636363}
I1123 22:12:31.597910    9381 lownodeutilization.go:115] Node "kubernetes-minion-group-k7c7" usage: api.ResourceThresholds{"cpu":92, "memory":15.617880226925726, "pods":14.545454545454545}
I1123 22:12:31.597955    9381 lownodeutilization.go:163] evicting pods from node "kubernetes-minion-group-frgx" with usage: api.ResourceThresholds{"cpu":94.8, "memory":15.305177094063607, "pods":16.363636363636363}
I1123 22:12:31.597993    9381 lownodeutilization.go:163] evicting pods from node "kubernetes-minion-group-k7c7" with usage: api.ResourceThresholds{"cpu":92, "memory":15.617880226925726, "pods":14.545454545454545}
I1123 22:12:31.598017    9381 lownodeutilization.go:163] evicting pods from node "kubernetes-master" with usage: api.ResourceThresholds{"cpu":95, "memory":11.575631035804197, "pods":8.181818181818182}
$

It seems like the descheduler made the decision to evict pods from the over-utilized nodes, but when I check the cluster, nothing on the old nodes was terminated and nothing popped up on the fresh node -

$ kubectl get all -o wide | grep kubernetes-minion-group-1vp4
$

What am I doing wrong? :(

Does max-pods-to-evict-per-node default to 0? And does the log level default to 0?

Hi

I found two things:

  1. When run with no -v option, the descheduler pods have no output, so does the log level default to 0?
    Command:
      /bin/descheduler
    Args:
      --policy-config-file=/policy-dir/policy.yaml
      --dry-run

And I checked the help info and found no such thing mentioned.

  -v, --v Level                          log level for V logs
  2. When run in non-dry-run mode with no --max-pods-to-evict-per-node option, no pod is evicted, so does the flag default to 0? There is also no declaration in the help info.
# oc logs -f descheduler-cronjob-1523411400-z2z8z 
I0411 01:50:37.761513       1 round_trippers.go:436] GET https://172.30.0.1:443/api 200 OK in 124 milliseconds
I0411 01:50:37.882412       1 round_trippers.go:436] GET https://172.30.0.1:443/apis 200 OK in 15 milliseconds
I0411 01:50:37.898564       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1 200 OK in 15 milliseconds
I0411 01:50:37.903192       1 reflector.go:202] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0411 01:50:37.903215       1 reflector.go:240] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0411 01:50:37.919025       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0 200 OK in 15 milliseconds
I0411 01:50:37.963552       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/nodes?resourceVersion=8694&timeoutSeconds=481&watch=true 200 OK in 25 milliseconds
I0411 01:50:38.011943       1 duplicates.go:50] Processing node: "ip-172-18-7-158.ec2.internal"
I0411 01:50:38.028600       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-7-158.ec2.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded 200 OK in 16 milliseconds
I0411 01:50:38.433484       1 duplicates.go:54] "ReplicationController/hello-1"
I0411 01:50:38.433510       1 duplicates.go:50] Processing node: "ip-172-18-14-173.ec2.internal"
I0411 01:50:38.461836       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-14-173.ec2.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded 200 OK in 28 milliseconds
I0411 01:50:38.479347       1 duplicates.go:54] "ReplicationController/hello-1"
I0411 01:50:38.495887       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-7-158.ec2.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded 200 OK in 16 milliseconds
I0411 01:50:38.568027       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-14-173.ec2.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded 200 OK in 15 milliseconds
I0411 01:50:38.569526       1 lownodeutilization.go:141] Node "ip-172-18-7-158.ec2.internal" is under utilized with usage: api.ResourceThresholds{"cpu":30, "memory":14.27776271919991, "pods":5.6}
I0411 01:50:38.569571       1 lownodeutilization.go:149] allPods:14, nonRemovablePods:9, bePods:4, bPods:1, gPods:0
I0411 01:50:38.569603       1 lownodeutilization.go:141] Node "ip-172-18-14-173.ec2.internal" is under utilized with usage: api.ResourceThresholds{"cpu":20, "memory":11.422210175359927, "pods":4.4}
I0411 01:50:38.569616       1 lownodeutilization.go:149] allPods:11, nonRemovablePods:3, bePods:8, bPods:0, gPods:0
I0411 01:50:38.569623       1 lownodeutilization.go:65] Criteria for a node under utilization: CPU: 40, Mem: 40, Pods: 40
I0411 01:50:38.569630       1 lownodeutilization.go:72] Total number of underutilized nodes: 2
I0411 01:50:38.569635       1 lownodeutilization.go:80] all nodes are underutilized, nothing to do here
I0411 01:50:38.569644       1 pod_antiaffinity.go:45] Processing node: "ip-172-18-7-158.ec2.internal"
I0411 01:50:38.585039       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-7-158.ec2.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded 200 OK in 15 milliseconds
I0411 01:50:38.595997       1 pod_antiaffinity.go:45] Processing node: "ip-172-18-14-173.ec2.internal"
I0411 01:50:38.635903       1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-14-173.ec2.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded 200 OK in 39 milliseconds
I0411 01:50:38.659140       1 node_affinity.go:31] Evicted 0 pods
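
If those defaults are indeed the case, passing both flags explicitly removes the ambiguity; something along these lines (the flag names come from this report, the values are only illustrative):

    Command:
      /bin/descheduler
    Args:
      --policy-config-file=/policy-dir/policy.yaml
      --max-pods-to-evict-per-node=10
      -v=4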

`LowNodeUtilization` policy needs all thresholds to be violated

I am testing out the LowNodeUtilization policy with the following values:

           nodeResourceUtilizationThresholds:
             thresholds:
               cpu: 60
               memory: 60
               pods: 5
             targetThresholds:
               cpu: 100
               memory: 100
               pods: 1000

However, all nodes are reported as appropriately utilized.
E.g.:

I0514 20:35:09.877111       1 lownodeutilization.go:147] Node "gke-asia-northeast1-std--default-pool-36ae422e-wnp4" is appropriately utilized with usage: api.ResourceThresholds{"memory":48.0697066631997, "pods":12.727272727272727, "cpu":34.64443045940843}

For the above node
"memory":48.0697066631997, < 60
"cpu":34.64443045940843 < 60
But "pods":12.727272727272727 > 5

I checked the code and it looks like IsNodeWithLowUtilization will return false if any threshold is not violated - https://github.com/kubernetes-incubator/descheduler/blob/master/pkg/descheduler/strategies/lownodeutilization.go#L298

This means that ALL thresholds need to be violated instead of ANY. Is that by design?
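
For reference, the check linked above boils down to something like this simplified Go sketch (not the actual function; names and types are illustrative):

package main

import "fmt"

// isNodeUnderutilized mirrors the ALL-thresholds behaviour described above:
// a node only counts as underutilized when its usage is below the configured
// threshold for every resource (cpu, memory, pods, ...).
func isNodeUnderutilized(usage, thresholds map[string]float64) bool {
    for name, threshold := range thresholds {
        if usage[name] > threshold {
            // a single resource above its threshold disqualifies the node
            return false
        }
    }
    return true
}

func main() {
    usage := map[string]float64{"cpu": 34.6, "memory": 48.1, "pods": 12.7}
    thresholds := map[string]float64{"cpu": 60, "memory": 60, "pods": 5}
    fmt.Println(isNodeUnderutilized(usage, thresholds)) // false: "pods" exceeds its threshold
}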

Add support for node affinity strategy

From the little that I read about node affinity, does adding the following strategy make sense -

For pods with node affinity set using preferredDuringSchedulingIgnoredDuringExecution, it might be possible that the preferred node was unavailable during scheduling and the pod was scheduled on another node. In this case, if the descheduler is run, it does the following -

  1. checks for all the pods with nodeAffinity defined using preferredDuringSchedulingIgnoredDuringExecution
  2. checks if the pod is actually scheduled on the preferred node or not
  3. if not, descheduler checks if the preferred node is available and is schedulable
  4. if such a node is found, descheduler evicts the pod (and hopefully the scheduler schedules it on the preferred node 🎉)

Maybe we could have a policy file describing the strategy like -

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
     enabled: true

@aveshagarwal @ravisantoshgudimetla if this makes sense, can I take a stab at a PoC for this?
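
For context, a pod this strategy would act on carries a preferred node affinity along these lines (a sketch; the disktype=ssd preference is just a placeholder):

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd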

descheduler and devices/latency sensitive pods

If a pod consumes high-value devices (GPUs, hugepages) or has latency benefits via CPU pinning, we should avoid descheduling it, as it's not known whether the new pods will actually get a better fit.

PVC Consideration

Project looks interesting! One item on the "Future Roadmap" that would be worth considering is fault domains and the PVCs that are associated with a pod.

Evicting a pod with a PVC due to LowNodeUtilization on another node would not result in actual re-placement of that pod, so it shouldn't be attempted.

Usage with Openshift

It would be very helpful to get documentation about how to use and set up the descheduler job within an OpenShift environment.

I tried to follow the README within my OpenShift cluster, but when creating the ClusterRole I get the following error:
error: unable to recognize "STDIN": no matches for rbac.authorization.k8s.io/, Kind=ClusterRole

When calling make on my macOS or CentOS machine, the build also fails:

go build -ldflags "-X github.com/kubernetes-incubator/descheduler/cmd/descheduler/app.version=`git describe --tags` -X github.com/kubernetes-incubator/descheduler/cmd/descheduler/app.buildDate=`date +%FT%T%z` -X github.com/kubernetes-incubator/descheduler/cmd/descheduler/app.gitCommit=`git rev-parse HEAD`" -o _output/bin/descheduler github.com/kubernetes-incubator/descheduler/cmd/descheduler

Status of this incubator project?

Hello! This feature seems fundamental to strong bin packing, but it's been months since the last update.

Is this project still active, and is there a timeline to have it merged into an official Kubernetes release?

Confusion with the "RemovePodsViolatingInterPodAntiAffinity" strategy

hi:

I just want to make it work, and I found that this strategy does not work as I expect.

descheduler version
Descheduler version {Major:0 Minor:4+ GitCommit:d3c2f256852874fdca4682c3c94bc30624979036 GitVersion:v0.4.0 BuildDate:2018-01-10T13:23:09+0800 GoVersion:go1.8.5 Compiler:gc Platform:linux/amd64}

My original attempt followed these steps:

  1. keep only one node schedulable
  2. create an rc
    oc run hello --image=openshift/hello-openshift:latest
  3. create another rc with anti-affinity
affinity:
   podAntiAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
     - labelSelector:
         matchExpressions:
         - key: key
           operator: In
           values: ["value"]
       topologyKey: kubernetes.io/hostname
  4. wait until all pods of the rcs are running
  5. label the pod of the first rc with
    oc label pod <pod_name> key=value
  6. set up the descheduler and try to evict pods

Then I found that no pod had been evicted.

Then I went through all the unit tests to find a demo of this strategy,
and tried to reproduce the test in https://github.com/kubernetes-incubator/descheduler/blob/master/pkg/descheduler/strategies/pod_antiaffinity_test.go

The reproduced steps are:

  1. keep only one node schedulable
  2. create an rc with anti-affinity
affinity:
   podAntiAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
     - labelSelector:
         matchExpressions:
         - key: key
           operator: In
           values: ["value"]
       topologyKey: kubernetes.io/hostname
  3. create another rc with the same anti-affinity as in step 2
  4. wait until all pods of the rcs are running
  5. label the pod of the first rc with
    oc label pod <pod_name> key=value
  6. set up the descheduler and try to evict pods

Then I found that one pod had been evicted.

So what I want to discuss here is whether only this one scenario (the one in the unit test) is supported by the strategy.
And why do my original steps not work? Is it a bug?

Thanks!

Builds failing.

Not too sure if this is an issue for anyone else, but in order to build and run out of the box I had to update the Dockerfile to build FROM debian:stretch-slim so it can run with /bin/sh.
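
For reference, the change described above amounts to swapping the final base image in the multi-stage Dockerfile, roughly like this (a sketch of the workaround, not the upstream Dockerfile):

# final stage: a slim Debian base instead of scratch, so /bin/sh is available
FROM debian:stretch-slim

COPY --from=0 /go/src/github.com/kubernetes-incubator/descheduler/_output/bin/descheduler /bin/descheduler

CMD ["/bin/descheduler", "--help"]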

Add an Ascii-art [before/after] diagram and fix some typos in the rescheduler README.md

This is just a starter issue to get ramped up on contributing. It's been a while since I've contributed anything to upstream :).

For this issue we'd like to:

  • Update the ASCII diagram of a before/after rescheduling scenario (specifically, one for the low node usage / bin packing case, as that's strategic for us at this time).
  • Fix some minor nits and typos in the README.md.

all nodes are under target utilization, nothing to do here

Given the following policy:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
     enabled: true
     params:
       nodeResourceUtilizationThresholds:
         thresholds:
           "cpu" : 50
           "memory": 50
           "pods": 10
         targetThresholds:
           "cpu" : 50
           "memory": 50
           "pods": 50

I am confused by this output:

./_output/bin/descheduler --kubeconfig ~/.kube/config --policy-config-file policy.yaml --node-selector beta.kubernetes.io/instance-type=n1-highmem-4 -v 4
I0814 11:56:02.699491   27948 reflector.go:202] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0814 11:56:02.699629   27948 reflector.go:240] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0814 11:56:02.799560   27948 node.go:51] node lister returned empty list, now fetch directly
I0814 11:56:04.839125   27948 request.go:480] Throttling request took 122.384366ms, request: GET:https://x.x.x.x.x/api/v1/pods?fieldSelector=spec.nodeName%3Dgke-node-bf2a5a1e-mr9b%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0814 11:56:05.039313   27948 request.go:480] Throttling request took 82.857788ms, request: GET:https://x.x.x.x.x/api/v1/pods?fieldSelector=spec.nodeName%3Dgke-node-bf2a5a1e-ps8k%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0814 11:56:05.239153   27948 request.go:480] Throttling request took 65.339548ms, request: GET:https://x.x.x.x.x/api/v1/pods?fieldSelector=spec.nodeName%3Dgke-node-bf2a5a1e-qcpq%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0814 11:56:05.439252   27948 request.go:480] Throttling request took 126.223138ms, request: GET:https://x.x.x.x.x/api/v1/pods?fieldSelector=spec.nodeName%3Dgke-node-bf2a5a1e-qg9g%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0814 11:56:05.639253   27948 request.go:480] Throttling request took 128.039815ms, request: GET:https://x.x.x.x.x/api/v1/pods?fieldSelector=spec.nodeName%3Dgke-node-bf2a5a1e-tw62%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0814 11:56:05.839299   27948 request.go:480] Throttling request took 111.435987ms, request: GET:https://x.x.x.x.x/api/v1/pods?fieldSelector=spec.nodeName%3Dgke-node-bf2a5a1e-w7n7%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0814 11:56:05.912036   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-kfh8" is under utilized with usage: api.ResourceThresholds{"cpu":21.428571428571427, "memory":7.093371019678181, "pods":7.2727272727272725}
I0814 11:56:05.912091   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:0, bPods:2, gPods:0
I0814 11:56:05.912148   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-mr9b" is under utilized with usage: api.ResourceThresholds{"pods":9.090909090909092, "cpu":33.92857142857143, "memory":3.773439230136872}
I0814 11:56:05.912160   27948 lownodeutilization.go:149] allPods:10, nonRemovablePods:7, bePods:0, bPods:3, gPods:0
I0814 11:56:05.912219   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-qcpq" is under utilized with usage: api.ResourceThresholds{"cpu":15.561224489795919, "memory":7.322094578473096, "pods":9.090909090909092}
I0814 11:56:05.912230   27948 lownodeutilization.go:149] allPods:10, nonRemovablePods:6, bePods:0, bPods:4, gPods:0
I0814 11:56:05.912264   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-qg9g" is under utilized with usage: api.ResourceThresholds{"cpu":27.551020408163264, "memory":7.681984428891371, "pods":7.2727272727272725}
I0814 11:56:05.912273   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:0, bPods:2, gPods:0
I0814 11:56:05.912513   27948 lownodeutilization.go:147] Node "gke-node-bf2a5a1e-9b15" is appropriately utilized with usage: api.ResourceThresholds{"memory":13.213583436348788, "pods":10.909090909090908, "cpu":22.372448979591837}
I0814 11:56:05.912537   27948 lownodeutilization.go:149] allPods:12, nonRemovablePods:6, bePods:0, bPods:6, gPods:0
I0814 11:56:05.912613   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-cs2l" is under utilized with usage: api.ResourceThresholds{"memory":2.9337443495765223, "pods":7.2727272727272725, "cpu":27.551020408163264}
I0814 11:56:05.912631   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:0, bPods:2, gPods:0
I0814 11:56:05.912749   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-dk7j" is under utilized with usage: api.ResourceThresholds{"cpu":15.051020408163266, "memory":6.403485637657102, "pods":7.2727272727272725}
I0814 11:56:05.912770   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:1, bPods:1, gPods:0
I0814 11:56:05.912828   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-ggdw" is under utilized with usage: api.ResourceThresholds{"cpu":19.387755102040817, "memory":1.9656336753240788, "pods":7.2727272727272725}
I0814 11:56:05.912855   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:0, bPods:2, gPods:0
I0814 11:56:05.912897   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-tw62" is under utilized with usage: api.ResourceThresholds{"cpu":20.918367346938776, "memory":2.4510629446719583, "pods":6.363636363636363}
I0814 11:56:05.912909   27948 lownodeutilization.go:149] allPods:7, nonRemovablePods:5, bePods:0, bPods:2, gPods:0
I0814 11:56:05.920135   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-3rt5" is under utilized with usage: api.ResourceThresholds{"cpu":27.551020408163264, "memory":7.68198179540342, "pods":7.2727272727272725}
I0814 11:56:05.920172   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:0, bPods:2, gPods:0
I0814 11:56:05.920269   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-50rb" is under utilized with usage: api.ResourceThresholds{"cpu":28.316326530612244, "memory":8.43523410990875, "pods":10}
I0814 11:56:05.920288   27948 lownodeutilization.go:149] allPods:11, nonRemovablePods:6, bePods:0, bPods:5, gPods:0
I0814 11:56:05.920354   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-hmq1" is under utilized with usage: api.ResourceThresholds{"cpu":27.806122448979593, "memory":7.93306590023853, "pods":8.181818181818182}
I0814 11:56:05.920370   27948 lownodeutilization.go:149] allPods:9, nonRemovablePods:6, bePods:0, bPods:3, gPods:0
I0814 11:56:05.920444   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-ps8k" is under utilized with usage: api.ResourceThresholds{"cpu":27.040816326530614, "memory":7.1798135857332, "pods":5.454545454545454}
I0814 11:56:05.920467   27948 lownodeutilization.go:149] allPods:6, nonRemovablePods:6, bePods:0, bPods:0, gPods:0
I0814 11:56:05.920580   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-w7n7" is under utilized with usage: api.ResourceThresholds{"cpu":23.724489795918366, "memory":4.948809588699222, "pods":8.181818181818182}
I0814 11:56:05.920632   27948 lownodeutilization.go:149] allPods:9, nonRemovablePods:5, bePods:0, bPods:4, gPods:0
I0814 11:56:05.920674   27948 lownodeutilization.go:147] Node "gke-node-bf2a5a1e-1t8l" is appropriately utilized with usage: api.ResourceThresholds{"memory":0.8776025543719349, "pods":3.6363636363636362, "cpu":7.653061224489796}
I0814 11:56:05.920690   27948 lownodeutilization.go:149] allPods:4, nonRemovablePods:4, bePods:0, bPods:0, gPods:0
I0814 11:56:05.920733   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-fjbd" is under utilized with usage: api.ResourceThresholds{"pods":5.454545454545454, "cpu":27.040816326530614, "memory":7.1798135857332}
I0814 11:56:05.920745   27948 lownodeutilization.go:149] allPods:6, nonRemovablePods:6, bePods:0, bPods:0, gPods:0
I0814 11:56:05.921125   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-kxb8" is under utilized with usage: api.ResourceThresholds{"cpu":27.29591836734694, "memory":7.43089769056831, "pods":6.363636363636363}
I0814 11:56:05.921145   27948 lownodeutilization.go:149] allPods:7, nonRemovablePods:6, bePods:0, bPods:1, gPods:0
I0814 11:56:05.921205   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-l4xf" is under utilized with usage: api.ResourceThresholds{"memory":2.8898642218579256, "pods":7.2727272727272725, "cpu":22.193877551020407}
I0814 11:56:05.921220   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:0, bPods:2, gPods:0
I0814 11:56:05.921279   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-cvsc" is under utilized with usage: api.ResourceThresholds{"cpu":8.418367346938776, "memory":1.4256836686734968, "pods":7.2727272727272725}
I0814 11:56:05.921297   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:5, bePods:1, bPods:1, gPods:1
I0814 11:56:05.921388   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-kntr" is under utilized with usage: api.ResourceThresholds{"cpu":12.525510204081632, "memory":3.879292539212722, "pods":6.363636363636363}
I0814 11:56:05.921407   27948 lownodeutilization.go:149] allPods:7, nonRemovablePods:5, bePods:0, bPods:2, gPods:0
I0814 11:56:05.921478   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-5415" is under utilized with usage: api.ResourceThresholds{"cpu":21.428571428571427, "memory":7.093371019678181, "pods":7.2727272727272725}
I0814 11:56:05.921495   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:6, bePods:0, bPods:2, gPods:0
I0814 11:56:05.921562   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-5dd0" is under utilized with usage: api.ResourceThresholds{"cpu":27.806122448979593, "memory":7.93306590023853, "pods":8.181818181818182}
I0814 11:56:05.921580   27948 lownodeutilization.go:149] allPods:9, nonRemovablePods:6, bePods:0, bPods:3, gPods:0
I0814 11:56:05.921636   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-74z1" is under utilized with usage: api.ResourceThresholds{"cpu":21.1734693877551, "memory":2.7021479758398677, "pods":7.2727272727272725}
I0814 11:56:05.921652   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:5, bePods:0, bPods:3, gPods:0
I0814 11:56:05.923369   27948 lownodeutilization.go:141] Node "gke-node-bf2a5a1e-bvsh" is under utilized with usage: api.ResourceThresholds{"cpu":21.1734693877551, "memory":2.7021470495070683, "pods":7.2727272727272725}
I0814 11:56:05.923431   27948 lownodeutilization.go:149] allPods:8, nonRemovablePods:5, bePods:0, bPods:3, gPods:0
I0814 11:56:05.923442   27948 lownodeutilization.go:65] Criteria for a node under utilization: CPU: 50, Mem: 50, Pods: 10
I0814 11:56:05.923478   27948 lownodeutilization.go:72] Total number of underutilized nodes: 22
I0814 11:56:05.923493   27948 lownodeutilization.go:85] all nodes are under target utilization, nothing to do here

According to the descheduler, Total number of underutilized nodes: 22 and all nodes are under target utilization, yet nothing to do here. None of my underutilized nodes get drained.

How can I instruct the descheduler to drain the underutilized nodes?
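
For comparison, a policy where targetThresholds sit strictly above thresholds gives the strategy both an under-utilized band to fill and an over-utilized band to evict from; the numbers below are only illustrative:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
     enabled: true
     params:
       nodeResourceUtilizationThresholds:
         thresholds:        # nodes below all of these are considered underutilized
           "cpu": 30
           "memory": 30
           "pods": 30
         targetThresholds:  # nodes above any of these are considered over-utilized
           "cpu": 60
           "memory": 60
           "pods": 60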

Enable profiling in descheduler

/cc @aveshagarwal - As per our offline discussion, I think the first step would be to enable profiling. I am planning to add flag(s) which enable profiling. I will try to avoid starting HTTP-server-based profiling in the initial stages.

Add support for inter-pod affinity strategy

For pods with podAffinity set using preferredDuringSchedulingIgnoredDuringExecution, it might be possible that at the time of scheduling on the current node, no pod with the matching labels was running, but the pod still got scheduled on the current node since the nature of the affinity was preferred and not required.

In such a case, if the descheduler is run, it can do the following -

  1. finds pods running with podAffinity set using preferredDuringSchedulingIgnoredDuringExecution
  2. checks if the pods found in 1 are scheduled on the desired node or not
  3. if not, the descheduler checks whether there are other schedulable nodes running the desired pods, where this podAffinity condition can be met
  4. if such a node is found, descheduler evicts the pod (and hopefully the scheduler schedules it on the desired node 🎉)

Maybe we could have a policy file describing the strategy like -

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingPodAffinity":
     enabled: true

Does it make sense to support such a strategy?
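
For context, a pod this strategy would act on carries a preferred pod affinity roughly like this (a sketch; the app: cache selector is just a placeholder):

affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: cache
        topologyKey: kubernetes.io/hostname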

RemoveDuplicates should not evict pods when other schedulable nodes are not available

When I run the following policy config file -

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
     enabled: true

with the RemoveDuplicates strategy enabled, and if there is only one schedulable node available (on which the pods are already running), the descheduler still evicts the pods, only for them to be scheduled on the same node again. This would lead to disruption of service without any gain.

$ descheduler --kubeconfig $KUBECONFIG --policy-config-file policy.yaml -v 5
I0120 02:49:28.828911   13141 reflector.go:202] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I0120 02:49:28.828993   13141 reflector.go:240] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I0120 02:49:28.929080   13141 node.go:50] node lister returned empty list, now fetch directly
I0120 02:49:30.343166   13141 duplicates.go:49] Processing node: "kubernetes-master"
I0120 02:49:30.873924   13141 duplicates.go:49] Processing node: "kubernetes-minion-group-5xwf"
I0120 02:49:31.123129   13141 duplicates.go:53] "ReplicaSet/wordpress-57f4bb46bf"
I0120 02:49:31.367105   13141 duplicates.go:61] Evicted pod: "wordpress-57f4bb46bf-fn8qm" (<nil>)
I0120 02:49:31.607484   13141 duplicates.go:61] Evicted pod: "wordpress-57f4bb46bf-k6tqz" (<nil>)
I0120 02:49:31.865925   13141 duplicates.go:61] Evicted pod: "wordpress-57f4bb46bf-rvwd9" (<nil>)
I0120 02:49:32.155498   13141 duplicates.go:61] Evicted pod: "wordpress-57f4bb46bf-v9bzq" (<nil>)
I0120 02:49:32.155526   13141 duplicates.go:49] Processing node: "kubernetes-minion-group-cxg1"
I0120 02:49:32.433999   13141 duplicates.go:49] Processing node: "kubernetes-minion-group-v738"

How about the descheduler only evicting pods if other schedulable nodes are available?

/bin/sh not found when using this image in kubernetes job

The descheduler will be run as a job in the kube-system namespace, and the command is

    Command:
      /bin/sh
      -ec
      /bin/descheduler --policy-config-file /policy-dir/policy.yaml

So, there should be a /bin/sh binary in the container, but the image was built from scratch and didn't include it. We can see this from the Dockerfile:

FROM scratch

MAINTAINER Avesh Agarwal <[email protected]>

COPY --from=0 /go/src/github.com/kubernetes-incubator/descheduler/_output/bin/descheduler /bin/descheduler

CMD ["/bin/descheduler", "--help"]

And we got the Error:

Error: failed to start container "descheduler": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: "/bin/sh": stat /bin/sh: no such file or directory"

This makes the pod run into the ContainerCannotRun state and the job creates a new pod immediately;
several minutes later I had hundreds of pods and my small cluster finally went down, unresponsive.
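
Until the image ships a shell, one workaround is to invoke the binary directly in the Job spec instead of going through /bin/sh, along these lines:

    Command:
      /bin/descheduler
    Args:
      --policy-config-file=/policy-dir/policy.yaml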

Eviction of stateful set

Does the descheduler evict pods created by a StatefulSet?

I've got 3 pods from the same StatefulSet on one node, but this is not picked up by the duplicates strategy.

Putting unsupported resource names in thresholds does not throw an error

Currently, the descheduler only supports cpu, memory and pods, but if we put another resource name or an invalid resource name, then we do not get an error -

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
     enabled: true
     params:
       nodeResourceUtilizationThresholds:
         thresholds:
           "cpu" : 40
           "memory": 40
           "pods": 40
           "storage": 25 # unsupported value
         targetThresholds:
           "cpu" : 30
           "memory": 2
           "pods": 1

should throw an error, but it does not -

$ _output/bin/descheduler --kubeconfig-file /var/run/kubernetes/admin.kubeconfig --policy-config-file examples/custom.yaml -v 5 
I1124 15:07:19.211499   16232 reflector.go:198] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1124 15:07:19.211643   16232 node.go:50] node lister returned empty list, now fetch directly
I1124 15:07:19.211789   16232 reflector.go:236] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1124 15:07:21.982785   16232 lownodeutilization.go:115] Node "kubernetes-minion-group-j57l" usage: api.ResourceThresholds{"cpu":74.5, "memory":12.991864967531136, "pods":13.636363636363637}
I1124 15:07:21.983109   16232 lownodeutilization.go:115] Node "kubernetes-minion-group-jk62" usage: api.ResourceThresholds{"memory":17.931192353458197, "pods":12.727272727272727, "cpu":87.3}
I1124 15:07:21.983179   16232 lownodeutilization.go:115] Node "kubernetes-minion-group-vkv3" usage: api.ResourceThresholds{"cpu":10, "memory":2.764226588836412, "pods":1.8181818181818181}
I1124 15:07:21.983320   16232 lownodeutilization.go:115] Node "kubernetes-master" usage: api.ResourceThresholds{"pods":8.181818181818182, "cpu":95, "memory":11.575631035804197}
I1124 15:07:21.983374   16232 lownodeutilization.go:163] evicting pods from node "kubernetes-minion-group-jk62" with usage: api.ResourceThresholds{"cpu":87.3, "memory":17.931192353458197, "pods":12.727272727272727}
I1124 15:07:21.983402   16232 lownodeutilization.go:163] evicting pods from node "kubernetes-master" with usage: api.ResourceThresholds{"cpu":95, "memory":11.575631035804197, "pods":8.181818181818182}
I1124 15:07:21.983485   16232 lownodeutilization.go:163] evicting pods from node "kubernetes-minion-group-j57l" with usage: api.ResourceThresholds{"cpu":74.5, "memory":12.991864967531136, "pods":13.636363636363637}

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Add a strategy for taints and tolerations

Recently one of the users requested a strategy for taints and tolerations. While I don't have cycles to work on this, I would be more than happy to review if anyone in the community is interested in working on it.

Allow descheduling of Pods which have hostDirs

Currently the descheduler only checks for the kubernetes.io/created-by annotation in order to proceed with descheduling. It also ignores every pod which has a hostDir volume mounted.

Would it be possible to allow descheduling of Pods which have hostDirs (maybe configurable, based on the content of kubernetes.io/created-by)?

No Auth Provider found for name "gcp"

Hi,

My k8s cluster is running on GKE.

I tried using the descheduler, but after compiling I get this error:

$ ./bin/descheduler --dry-run --kubeconfig ~/.kube/config
E0521 16:26:53.978025    4226 server.go:46] No Auth Provider found for name "gcp"

Apparently the code in descheduler/pkg/descheduler/client/client.go needs to import _ "k8s.io/client-go/plugin/pkg/client/auth/gcp" (or _ "k8s.io/client-go/plugin/pkg/client/auth" to support other auth providers)

After adding _ "k8s.io/client-go/plugin/pkg/client/auth/gcp" to the import list I was able to authenticate against GKE.

Terminology confusion around utilization

The current configuration types describe NodeResourceUtilizationThresholds.

I think utilization implies observed usage, not what is scheduled or allocated.

If we use the term utilization, it should mean the decision is based on metrics.

If we use the term allocated or node scheduling thresholds, it should mean the decision is based on pod resource requests, and not observed usage.

deschedule pods that fail to start or restart too often

It is not uncommon that pods get scheduled on nodes that are not able to start them.
For example, a node may have network issues and be unable to mount a networked persistent volume, or cannot pull a docker image, or has some docker configuration issue which only shows up on container startup.

Another common issue is when a container gets restarted by its liveness check because of some local node issue (e.g. wrong routing table, slow storage, network latency or packet drop). In that case, a pod is unhealthy most of the time and hangs in a restart state forever without a chance of being migrated to another node.

As of now, there is no possibility to re-schedule pods with faulty containers. It may be helpful to introduce two new Strategies:

  • container-restart-rate: re-schedule a pod if it has been unhealthy for $notReadyPeriod seconds and one of its containers has been restarted $maxRestartCount times.
  • pod-startup-failure: a pod was scheduled on a node, but was unable to start all of its containers within $maxStartupTime seconds. (A hypothetical policy sketch follows below.)

A similar issue is filed against Kubernetes: kubernetes/kubernetes#13385
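
Purely hypothetically, the two proposed strategies might be configured along these lines; neither strategy exists in the descheduler today, and the names and parameters are taken only from this proposal:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "ContainerRestartRate":      # hypothetical, not an implemented strategy
     enabled: true
     params:
       notReadyPeriod: 300     # seconds the pod has been unhealthy
       maxRestartCount: 5      # restarts of any single container
  "PodStartupFailure":         # hypothetical, not an implemented strategy
     enabled: true
     params:
       maxStartupTime: 600     # seconds allowed for all containers to start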

Is it based on kubectl top node CPU & memory?

While using the descheduler, I have noticed that the log shows exactly the same memory utilization number for many nodes. Also, each node shows exactly the same CPU and memory utilization numbers in the log. It seems like the descheduler is calculating the utilization from resource requests and limits?
I was hoping it uses kubectl top nodes to calculate the current utilization (which would be reflected in the log as dynamically changing CPU and memory percentages). Please clarify how it calculates the current node resource utilization.

e.g. here is the data I am talking about: in the log, I see this: Node "172.16.4.3" is appropriately utilized with usage: api.ResourceThresholds{"cpu":52.5, "memory":32.080248132547204, "pods":41.25}, but kubectl top node shows 172.16.4.3 2025m 25% 8699Mi 55% - meaning CPU 25%, memory 55% utilized.
Also, many of my pods are showing exactly the same memory utilization, "memory":4.193744917801392

HighNodeUtilization strategy

I was reading the Rescheduler-Design-Implementation document (https://docs.google.com/document/d/1KXw02Q0cOF1MUrdpPNiug0yGZlixvPg2SwBycrT5DkE/edit) and saw that the descheduler should support also the HighNodeUtilization strategy option.

Meaning that the descheduler should evict pods from nodes that reached high thresholds.

This is what I am trying to achieve: balancing pods from heavily loaded nodes onto low-utilized nodes, but I cannot seem to get that to work :(

Any idea how a policy that balances high-utilization nodes should be defined? Or is it not implemented in the code? Is it a feature that can be added?

Thank you for any kind of help

Roiy

Improve the test coverage

Running go test -cover:

?   	github.com/kubernetes-incubator/descheduler/cmd/descheduler	[no test files]
?   	github.com/kubernetes-incubator/descheduler/cmd/descheduler/app	[no test files]
?   	github.com/kubernetes-incubator/descheduler/cmd/descheduler/app/options	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/api	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/api/install	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/api/v1alpha1	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/apis/componentconfig	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/apis/componentconfig/install	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/apis/componentconfig/v1alpha1	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/descheduler	[no test files]
?   	github.com/kubernetes-incubator/descheduler/pkg/descheduler/client	[no test files]
ok  	github.com/kubernetes-incubator/descheduler/pkg/descheduler/evictions	0.273s	coverage: 50.0% of statements
?   	github.com/kubernetes-incubator/descheduler/pkg/descheduler/evictions/utils	[no test files]
ok  	github.com/kubernetes-incubator/descheduler/pkg/descheduler/node	0.268s	coverage: 72.5% of statements
ok  	github.com/kubernetes-incubator/descheduler/pkg/descheduler/pod	0.144s	coverage: 33.3% of statements
?   	github.com/kubernetes-incubator/descheduler/pkg/descheduler/scheme	[no test files]
ok  	github.com/kubernetes-incubator/descheduler/pkg/descheduler/strategies	0.077s	coverage: 73.6% of statements
?   	github.com/kubernetes-incubator/descheduler/pkg/utils	[no test files]
?   	github.com/kubernetes-incubator/descheduler/test	[no test files]

Some packages are missing tests completely.

Another feature I love about Go :). More about test coverage at https://blog.golang.org/cover
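
For reference, a per-package coverage profile can be generated and browsed with the standard Go tooling, for example:

go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out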

Nodes with scheduling disabled should not be taken into consideration for LowNodeUtilization

I have the following nodes -

$ kubectl get nodes
NAME                           STATUS                     ROLES     AGE       VERSION
kubernetes-master              Ready,SchedulingDisabled   <none>    56m       v1.8.4-dirty
kubernetes-minion-group-5rrh   Ready                      <none>    56m       v1.8.4-dirty
kubernetes-minion-group-fb8c   Ready                      <none>    56m       v1.8.4-dirty
kubernetes-minion-group-t1r3   Ready,SchedulingDisabled   <none>    56m       v1.8.4-dirty

The worker node kubernetes-minion-group-t1r3 was cordoned and marked as unschedulable, however it fulfilled the criteria for being an underutilized node according to the following policy file -

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
     enabled: true
     params:
       nodeResourceUtilizationThresholds:
         thresholds:  # any node below the following percentages is considered underutilized
           "cpu" : 40
           "memory": 40
           "pods": 40
         targetThresholds: # any node above the following percentages is considered overutilized
           "cpu" : 30
           "memory": 2
           "pods": 1

When I ran the descheduler, kubernetes-minion-group-t1r3 (the cordoned node) was taken into account and marked as underutilized, and multiple pods were evicted from other nodes in the hope that the scheduler would schedule them on kubernetes-minion-group-t1r3, but that never happened since the node was cordoned.

Does it make sense to not take a cordoned node into consideration while looking for underutilized nodes?

I ran the descheduler as follows -

$ _output/bin/descheduler --kubeconfig-file /var/run/kubernetes/admin.kubeconfig --policy-config-file examples/custom.yaml -v 5 
I1125 18:58:46.014381    2813 reflector.go:198] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1125 18:58:46.016167    2813 node.go:50] node lister returned empty list, now fetch directly
I1125 18:58:46.017010    2813 reflector.go:236] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:83
I1125 18:58:47.834184    2813 lownodeutilization.go:115] Node "kubernetes-master" usage: api.ResourceThresholds{"cpu":95, "memory":11.575631035804197, "pods":8.181818181818182}
I1125 18:58:47.834986    2813 lownodeutilization.go:115] Node "kubernetes-minion-group-5rrh" usage: api.ResourceThresholds{"cpu":90.5, "memory":6.932161992316314, "pods":17.272727272727273}
I1125 18:58:47.835701    2813 lownodeutilization.go:115] Node "kubernetes-minion-group-fb8c" usage: api.ResourceThresholds{"cpu":96.5, "memory":14.0975556030657, "pods":17.272727272727273}
I1125 18:58:47.835783    2813 lownodeutilization.go:115] Node "kubernetes-minion-group-t1r3" usage: api.ResourceThresholds{"cpu":10, "memory":2.764226588836412, "pods":1.8181818181818181}
I1125 18:58:47.835819    2813 lownodeutilization.go:163] evicting pods from node "kubernetes-minion-group-fb8c" with usage: api.ResourceThresholds{"cpu":96.5, "memory":14.0975556030657, "pods":17.272727272727273}
I1125 18:58:48.096681    2813 lownodeutilization.go:194] Evicted pod: "database-6f97f65956-6pxp5" (<nil>)
I1125 18:58:48.098323    2813 lownodeutilization.go:208] updated node usage: api.ResourceThresholds{"cpu":91.5, "memory":14.0975556030657, "pods":16.363636363636363}
I1125 18:58:48.361411    2813 lownodeutilization.go:194] Evicted pod: "wordpress-57f4bb46bf-g27k6" (<nil>)
I1125 18:58:48.361522    2813 lownodeutilization.go:208] updated node usage: api.ResourceThresholds{"cpu":86.5, "memory":14.0975556030657, "pods":15.454545454545455}
I1125 18:58:48.623304    2813 lownodeutilization.go:194] Evicted pod: "wordpress-57f4bb46bf-m62cm" (<nil>)
I1125 18:58:48.623330    2813 lownodeutilization.go:208] updated node usage: api.ResourceThresholds{"cpu":81.5, "memory":14.0975556030657, "pods":14.545454545454547}
I1125 18:58:48.894712    2813 lownodeutilization.go:194] Evicted pod: "wordpress-57f4bb46bf-mblx7" (<nil>)
I1125 18:58:48.894832    2813 lownodeutilization.go:208] updated node usage: api.ResourceThresholds{"cpu":76.5, "memory":14.0975556030657, "pods":13.636363636363638}
I1125 18:58:48.894991    2813 lownodeutilization.go:163] evicting pods from node "kubernetes-master" with usage: api.ResourceThresholds{"cpu":95, "memory":11.575631035804197, "pods":8.181818181818182}
I1125 18:58:48.895063    2813 lownodeutilization.go:163] evicting pods from node "kubernetes-minion-group-5rrh" with usage: api.ResourceThresholds{"cpu":90.5, "memory":6.932161992316314, "pods":17.272727272727273}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4-dirty", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"dirty", BuildDate:"2017-11-25T12:04:44Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4-dirty", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"dirty", BuildDate:"2017-11-25T11:54:10Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

pod antiaffinity strategy evicts all pods.

The pod anti-affinity strategy doesn't check the type of pod to be evicted, so it can evict critical and mirror pods. As this is functionality that needs to be respected by all strategies in the descheduler, I am planning to move it to pods.go to avoid code duplication, so that people implementing strategies won't have to think about it.

Evictions found but pods are not deleted

I have this weird issue where the descheduler correctly spots which pods should be evicted, but no pods are actually deleted.

Could it be a permission issue? I'm using RBAC and have set up the roles as described in the README.

I0608 11:47:47.133914       1 reflector.go:202] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0608 11:47:47.133965       1 reflector.go:240] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0608 11:47:47.234083       1 duplicates.go:50] Processing node: "ip-172-20-34-164.eu-west-2.compute.internal"
I0608 11:47:47.342677       1 duplicates.go:54] "ReplicaSet/notification-service-v1-b66596d89"
I0608 11:47:47.342721       1 duplicates.go:50] Processing node: "ip-172-20-80-152.eu-west-2.compute.internal"
I0608 11:47:47.354665       1 duplicates.go:54] "ReplicaSet/rabbitmq-k8s-0-5b87cfcfbd"
I0608 11:47:47.354688       1 duplicates.go:54] "ReplicaSet/revenue-modeller-data-store-v2-97fcdc568"
I0608 11:47:47.354697       1 duplicates.go:54] "ReplicaSet/entitlement-service-v1-8454ccd585"
I0608 11:47:47.354705       1 duplicates.go:50] Processing node: "ip-172-20-104-255.eu-west-2.compute.internal"
I0608 11:47:47.407949       1 duplicates.go:54] "ReplicaSet/alert-service-v1-7dc6ddcf8d"
I0608 11:47:47.408001       1 duplicates.go:54] "ReplicaSet/hazelcast-k8s-0-7466b7cb4f"
I0608 11:47:47.438606       1 lownodeutilization.go:141] Node "ip-172-20-34-164.eu-west-2.compute.internal" is under utilized with usage: api.ResourceThresholds{"cpu":32.5, "memory":20.6892852865826, "pods":5.454545454545454}
I0608 11:47:47.438649       1 lownodeutilization.go:149] allPods:6, nonRemovablePods:2, bePods:0, bPods:2, gPods:2
I0608 11:47:47.438798       1 lownodeutilization.go:144] Node "ip-172-20-80-152.eu-west-2.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":99, "memory":86.26743748074475, "pods":16.363636363636363}
I0608 11:47:47.438821       1 lownodeutilization.go:149] allPods:18, nonRemovablePods:6, bePods:1, bPods:10, gPods:1
I0608 11:47:47.438990       1 lownodeutilization.go:144] Node "ip-172-20-104-255.eu-west-2.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":99.5, "memory":92.91303949597655, "pods":15.454545454545455}
I0608 11:47:47.439014       1 lownodeutilization.go:149] allPods:17, nonRemovablePods:8, bePods:0, bPods:6, gPods:3
I0608 11:47:47.439023       1 lownodeutilization.go:65] Criteria for a node under utilization: CPU: 74, Mem: 68, Pods: 12
I0608 11:47:47.439034       1 lownodeutilization.go:72] Total number of underutilized nodes: 1
I0608 11:47:47.439047       1 lownodeutilization.go:89] Criteria for a node above target utilization: CPU: 77, Mem: 75, Pods: 14
I0608 11:47:47.439061       1 lownodeutilization.go:91] Total number of nodes above target utilization: 2
I0608 11:47:47.439077       1 lownodeutilization.go:183] Total capacity to be moved: CPU:1780, Mem:9.083513856e+09, Pods:9.4
I0608 11:47:47.439093       1 lownodeutilization.go:184] ********Number of pods evicted from each node:***********
I0608 11:47:47.439101       1 lownodeutilization.go:191] evicting pods from node "ip-172-20-104-255.eu-west-2.compute.internal" with usage: api.ResourceThresholds{"pods":15.454545454545455, "cpu":99.5, "memory":92.91303949597655}
I0608 11:47:47.439125       1 lownodeutilization.go:202] 0 pods evicted from node "ip-172-20-104-255.eu-west-2.compute.internal" with usage map[cpu:99.5 memory:92.91303949597655 pods:15.454545454545455]
I0608 11:47:47.439152       1 lownodeutilization.go:191] evicting pods from node "ip-172-20-80-152.eu-west-2.compute.internal" with usage: api.ResourceThresholds{"cpu":99, "memory":86.26743748074475, "pods":16.363636363636363}
I0608 11:47:47.439175       1 lownodeutilization.go:202] 0 pods evicted from node "ip-172-20-80-152.eu-west-2.compute.internal" with usage map[cpu:99 memory:86.26743748074475 pods:16.363636363636363]
I0608 11:47:47.439195       1 lownodeutilization.go:94] Total number of pods evicted: 0
I0608 11:47:47.439203       1 pod_antiaffinity.go:45] Processing node: "ip-172-20-34-164.eu-west-2.compute.internal"
I0608 11:47:47.446324       1 pod_antiaffinity.go:45] Processing node: "ip-172-20-80-152.eu-west-2.compute.internal"
I0608 11:47:47.455917       1 pod_antiaffinity.go:45] Processing node: "ip-172-20-104-255.eu-west-2.compute.internal"
I0608 11:47:47.492859       1 node_affinity.go:31] Evicted 0 pods

This is using Kubernetes v1.10.3
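
One thing worth double-checking in this situation is that the ClusterRole bound to the descheduler's service account grants the eviction subresource; roughly (a sketch, not the exact manifest from the repo):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: descheduler-cluster-role   # name is illustrative
rules:
- apiGroups: [""]
  resources: ["nodes", "pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]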

Better control for Critical Pods

Right now the control to mark pods as critical is very basic and requires changing the annotations on many pods.

Proposal 1 - Non-critical annotation
If I have 100 pods but I want the descheduler to consider only 20 of them "non-critical", that means I have to add annotations to 80 pods. We could instead have a "non-critical" annotation to mark only those 20 pods. This could be controlled with an argument, --non-critical-pod-matcher=true (default false).

Proposal 2 - Consider current labels as critical
If I already have an annotation on my running applications that I know identifies a set of critical pods, it would be nice to be able to say "Pods with this custom annotation and value are considered critical". With this, no changes would have to be applied at all to make the descheduler run. Personally, I have an annotation called "layer" with values (backend|monitoring|data|frontend). I consider my data and monitoring Pods critical; if I already have this annotation, why add another?

It could be done with --extra-critical-annotations="layer=data,layer=monitoring,k8s-app=prometheus". And if --non-critical-pod-matcher is set to true, then --extra-non-critical-annotations="....".
