gocrane / crane-scheduler
Crane scheduler is a Kubernetes scheduler that can schedule pods based on actual node load.
License: Apache License 2.0
hi, everyone. Is this project still active or has it been migrated? Is this scheduler suitable for production use?
Hi, does crane-scheduler provide an API or SDK for third-party access? I'd like to call some of crane-scheduler's functionality from my own code.
Hello, I deployed crane-scheduler using the helm chart. My Prometheus service address is as below:
but there are error logs in the controller pod:
Post "192.168.15.25/api/v1/query": unsupported protocol scheme ""
Can someone explain why this happens? Thanks.
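The error suggests the configured Prometheus address has no URL scheme. A sketch of the fix on the controller's args, assuming the address is passed via a `--prometheus-address` flag (verify the flag name against your deployment):

```yaml
containers:
- name: controller
  args:
  # The client rejects "192.168.15.25/api/v1/query" because the address
  # lacks a scheme; prefix it with http:// (or https://).
  - --prometheus-address=http://192.168.15.25
```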
[root@zcsmaster1 manifests]# cat kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - /scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=192.168.40.180
    - --config=/etc/kubernetes/kube-scheduler/scheduler-config.yaml
    - --leader-elect=true
    image: docker.io/gocrane/crane-scheduler:0.0.20
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 192.168.40.180
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 12
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 192.168.40.180
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - name: scheduler-config
      mountPath: /etc/kubernetes/kube-scheduler
      readOnly: true
    - name: dynamic-scheduler-policy
      mountPath: /etc/kubernetes
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
[root@zcsmaster1 manifests]#
Hi, I've been trying to replace the default scheduler this way and it keeps failing. Could you publish more detailed documentation?
Events:
Type Reason Age From Message
Normal Scheduled 93s default-scheduler Successfully assigned kube-system/kube-scheduler to zcsnode2
Normal Pulled 36s (x4 over 93s) kubelet Container image "docker.io/gocrane/crane-scheduler:0.0.20" already present on machine
Normal Created 36s (x4 over 93s) kubelet Created container kube-scheduler
Normal Started 36s (x4 over 93s) kubelet Started container kube-scheduler
Warning BackOff 3s (x10 over 91s) kubelet Back-off restarting failed container
[root@zcsmaster1 manifests]# kubectl describe pod kube-scheduler -n kube-system
Events:
Type Reason Age From Message
Normal Scheduled 53m default-scheduler Successfully assigned kube-system/crane-scheduler-controller-7845b4cbf7-dhrkm to zcsnode2
Normal Pulled 52m (x2 over 53m) kubelet Container image "docker.io/gocrane/crane-scheduler-controller:0.0.23" already present on machine
Normal Created 52m (x2 over 53m) kubelet Created container controller
Normal Started 52m (x2 over 53m) kubelet Started container controller
Normal Killing 52m kubelet Container controller failed liveness probe, will be restarted
Warning Unhealthy 51m (x5 over 53m) kubelet Liveness probe failed: Get "http://10.244.234.118:8090/healthz": dial tcp 10.244.234.118:8090: connect: connection refused
Warning BackOff 8m29s (x116 over 46m) kubelet Back-off restarting failed container
Warning Unhealthy 3m40s (x138 over 53m) kubelet Readiness probe failed: Get "http://10.244.234.118:8090/healthz": dial tcp 10.244.234.118:8090: connect: connection refused
[root@zcsmaster1 manifests]#
When a pod uses a PVC with a WaitForFirstConsumer StorageClass, crane-scheduler does not have sufficient permission to update the PVC's annotations. The scheduler needs permission to update PVCs.
Expected: the pod is successfully scheduled.
To reproduce: create a pod that uses a PVC whose StorageClass has volumeBindingMode: WaitForFirstConsumer.
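A minimal RBAC sketch granting the missing verb (the ClusterRole name is illustrative; bind it to whatever ServiceAccount the scheduler runs as):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: crane-scheduler-pvc-update   # illustrative name
rules:
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update", "patch"]
```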
k8s version: 1.21.10
kube-scheduler-master error log:
[root@master ~]# kubectl describe pod kube-scheduler-master -n kube-system
Name: kube-scheduler-master
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: master/192.168.189.100
Start Time: Sun, 12 Feb 2023 19:10:12 +0800
Labels: component=kube-scheduler
tier=control-plane
Annotations: kubernetes.io/config.hash: 456d6f68d333532ade0a5a2a7823efaf
kubernetes.io/config.mirror: 456d6f68d333532ade0a5a2a7823efaf
kubernetes.io/config.seen: 2023-03-02T16:52:39.104787713+08:00
kubernetes.io/config.source: file
Status: Running
IP: 192.168.189.100
IPs:
IP: 192.168.189.100
Controlled By: Node/master
Containers:
kube-scheduler:
Container ID: docker://80e22d215ac0eccdce39322f85307f46c558d84d70346a12c89ad45150b440c7
Image: gocrane/crane-scheduler:0.0.23
Image ID: docker-pullable://gocrane/crane-scheduler@sha256:9ba6d11b20794b29d35661998e806b5711b36f49f5b57e8bd32af2ca8426c928
Port: <none>
Host Port: <none>
Command:
kube-scheduler
--authentication-kubeconfig=/etc/kubernetes/scheduler.conf
--authorization-kubeconfig=/etc/kubernetes/scheduler.conf
--bind-address=127.0.0.1
--kubeconfig=/etc/kubernetes/scheduler.conf
--leader-elect=true
--port=0
--config=/etc/kubernetes/scheduler-config.yaml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Message: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "kube-scheduler": executable file not found in $PATH: unknown
Exit Code: 127
Started: Thu, 02 Mar 2023 16:53:13 +0800
Finished: Thu, 02 Mar 2023 16:53:13 +0800
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Liveness: http-get https://127.0.0.1:10259/healthz delay=10s timeout=15s period=10s #success=1 #failure=8
Startup: http-get https://127.0.0.1:10259/healthz delay=10s timeout=15s period=10s #success=1 #failure=24
Environment: <none>
Mounts:
/etc/kubernetes/scheduler-config.yaml from schedulerconfig (ro)
/etc/kubernetes/scheduler.conf from kubeconfig (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubeconfig:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/scheduler.conf
HostPathType: FileOrCreate
schedulerconfig:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/scheduler-config.yaml
HostPathType: FileOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 17s (x3 over 39s) kubelet Container image "gocrane/crane-scheduler:0.0.23" already present on machine
Normal Created 17s (x3 over 39s) kubelet Created container kube-scheduler
Warning Failed 17s (x3 over 39s) kubelet Error: failed to start container "kube-scheduler": Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "kube-scheduler": executable file not found in $PATH: unknown
Warning BackOff 1s (x6 over 38s) kubelet Back-off restarting failed container
The version can't be modified; it reports that scheduler-config is not found, probably because the kube-scheduler-master pod was never created.
[root@master ~]# KUBE_EDITOR="sed -i 's/v1beta2/v1beta1/g'" kubectl edit cm scheduler-config -n crane-system && KUBE_EDITOR="sed -i 's/0.0.23/0.0.20/g'" kubectl edit deploy crane-scheduler -n crane-system
Error from server (NotFound): configmaps "scheduler-config" not found
Contents of the modified kube-scheduler.yaml (the image was pulled to the node in advance):
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --port=0
    - --config=/etc/kubernetes/scheduler-config.yaml
    # image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.21.10
    image: gocrane/crane-scheduler:0.0.23
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler-config.yaml
      name: schedulerconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/scheduler-config.yaml
      type: FileOrCreate
    name: schedulerconfig
status: {}
Contents of the modified scheduler-config.yaml:
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
- schedulerName: default-scheduler
  plugins:
    filter:
      enabled:
      - name: Dynamic
    score:
      enabled:
      - name: Dynamic
        weight: 3
  pluginConfig:
  - name: Dynamic
    args:
      policyConfigPath: /etc/kubernetes/policy.yaml
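Note that `kubescheduler.config.k8s.io/v1beta2` only exists from Kubernetes 1.22 onward; a 1.21.10 scheduler cannot parse it. For 1.21 the header would be (rest of the file unchanged):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1  # v1beta2 requires k8s >= 1.22
kind: KubeSchedulerConfiguration
```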
Can this be used on k8s 1.16? Does it rely on features only available from 1.18?
Per the graduation criteria description, the scheduling framework has been supported since 1.16.
[root@zcsmaster1 manifests]# cat kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
Could you tell me where the problem is?
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 18 Oct 2022 09:49:30 +0800
Finished: Tue, 18 Oct 2022 09:49:30 +0800
Ready: False
Restart Count: 0
Add livenessProbe and readinessProbe for controller
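A sketch of what such probes could look like on the controller container; the /healthz endpoint on port 8090 is assumed from the probe failures reported elsewhere on this page (verify against the controller's actual serving port):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8090
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 8090
  periodSeconds: 10
```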
k8s: v1.21.5
E0625 08:39:09.924340 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed
E0625 08:39:09.924391 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed" pod="dev-app/base-v1-web-5f9b4fb6fc-wqbcl"
E0625 08:39:09.940324 1 scheduler.go:379] scheduler cache AssumePod failed: pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed
E0625 08:39:09.940364 1 factory.go:338] "Error scheduling pod; retrying" err="pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed" pod="dev-app/jz-digital-attendance-mobile-web-5f69dd6645-hpgqm"
W0625 08:39:23.173299 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
E0625 08:40:09.914104 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed
E0625 08:40:09.914157 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed" pod="dev-app/xiaofang-auth-admin-web-64bbd7fd4-sct6d"
E0625 08:40:39.915075 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed
E0625 08:40:39.915127 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed" pod="dev-app/base-v1-web-5f9b4fb6fc-wqbcl"
E0625 08:40:39.927121 1 scheduler.go:379] scheduler cache AssumePod failed: pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed
E0625 08:40:39.927172 1 factory.go:338] "Error scheduling pod; retrying" err="pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed" pod="dev-app/jz-digital-attendance-mobile-web-5f69dd6645-hpgqm"
E0625 08:41:09.915082 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed
E0625 08:41:09.915123 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed" pod="dev-app/xiaofang-auth-admin-web-64bbd7fd4-sct6d"
E0625 08:42:09.915837 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed
E0625 08:42:09.915879 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed" pod="dev-app/base-v1-web-5f9b4fb6fc-wqbcl"
E0625 08:42:09.925737 1 scheduler.go:379] scheduler cache AssumePod failed: pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed
E0625 08:42:09.925772 1 factory.go:338] "Error scheduling pod; retrying" err="pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed" pod="dev-app/jz-digital-attendance-mobile-web-5f69dd6645-hpgqm"
E0625 08:42:09.936894 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed
E0625 08:42:09.936970 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed" pod="dev-app/xiaofang-auth-admin-web-64bbd7fd4-sct6d"
E0625 08:43:09.917671 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed
E0625 08:43:09.917714 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed" pod="dev-app/base-v1-web-5f9b4fb6fc-wqbcl"
E0625 08:43:39.918409 1 scheduler.go:379] scheduler cache AssumePod failed: pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed
E0625 08:43:39.918449 1 factory.go:338] "Error scheduling pod; retrying" err="pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed" pod="dev-app/jz-digital-attendance-mobile-web-5f69dd6645-hpgqm"
E0625 08:43:39.930036 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed
E0625 08:43:39.930072 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed" pod="dev-app/xiaofang-auth-admin-web-64bbd7fd4-sct6d"
E0625 08:43:50.288516 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed
E0625 08:43:50.302026 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed" pod="dev-app/xiaofang-auth-admin-web-64bbd7fd4-sct6d"
E0625 08:44:09.919255 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed
E0625 08:44:09.919303 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed" pod="dev-app/base-v1-web-5f9b4fb6fc-wqbcl"
E0625 08:44:39.920148 1 scheduler.go:379] scheduler cache AssumePod failed: pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed
E0625 08:44:39.920193 1 factory.go:338] "Error scheduling pod; retrying" err="pod 50450750-6476-4e89-8232-f3f756483a11 is in the cache, so can't be assumed" pod="dev-app/jz-digital-attendance-mobile-web-5f69dd6645-hpgqm"
E0625 08:45:09.920842 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed
E0625 08:45:09.920881 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0a64a369-5b40-41f3-b354-d056f79b5a81 is in the cache, so can't be assumed" pod="dev-app/xiaofang-auth-admin-web-64bbd7fd4-sct6d"
E0625 08:45:09.931887 1 scheduler.go:379] scheduler cache AssumePod failed: pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed
E0625 08:45:09.931959 1 factory.go:338] "Error scheduling pod; retrying" err="pod 0ad913e1-30bb-48e7-b563-78ee26bee313 is in the cache, so can't be assumed" pod="dev-app/base-v1-web-5f9b4fb6fc-wqbcl"
Using crane-scheduler without changing the pods' default scheduler, only adding the filter and score plugins: can a node end up running pods beyond what their resource requests allow?
crane version: helm chart scheduler-0.2.2
k8s version: 1.24
A single k8s node, 16 cores / 32 GB
Node load annotations:
Annotations: alpha.kubernetes.io/provided-node-ip: 172.30.64.34
cpu_usage_avg_5m: 0.63012,2023-10-17T15:04:32Z
cpu_usage_max_avg_1d: 0.63666,2023-10-17T14:03:36Z
cpu_usage_max_avg_1h: 0.63654,2023-10-17T15:01:29Z
mem_usage_avg_5m: 0.21519,2023-10-17T15:04:34Z
mem_usage_max_avg_1d: 0.21614,2023-10-17T14:02:41Z
mem_usage_max_avg_1h: 0.21700,2023-10-17T15:01:53Z
node.alpha.kubernetes.io/ttl: 0
node_hot_value: 0,2023-10-17T15:04:34Z
Node requests:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 15367m (96%) 1770m (11%)
memory 25943Mi (91%) 1500Mi (5%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Created a test service with 8 pod replicas; each pod generates 2 cores / 1 GB of load under stress, with requests of 3 cores / 5 GB.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-nginx
  namespace: demo
  labels:
    app: demo-nginx
spec:
  replicas: 8
  selector:
    matchLabels:
      app: demo-nginx
  template:
    metadata:
      labels:
        app: demo-nginx
    spec:
      schedulerName: crane-scheduler
      containers:
      - name: demo-nginx
        image: xxxxxx/stress:latest
        command: ["stress", "-c", "1", "--vm", "1", "--vm-bytes", "1G", "--vm-keep"]
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 3
            memory: 5Gi
Only 5 pods are actually running; at least 6 should be running.
$ kgp -A -o wide|grep demo-nginx
demo demo-nginx-69db9d45df-4j2rh 1/1 Running 0 3h20m 172.30.64.199 ip-172-30-64-34.ap-northeast-1.compute.internal <none> <none>
demo demo-nginx-69db9d45df-4jc5h 0/1 Pending 0 3h20m <none> <none> <none> <none>
demo demo-nginx-69db9d45df-6p4jz 0/1 Pending 0 3h20m <none> <none> <none> <none>
demo demo-nginx-69db9d45df-7fdn2 1/1 Running 0 3h20m 172.30.64.111 ip-172-30-64-34.ap-northeast-1.compute.internal <none> <none>
demo demo-nginx-69db9d45df-b75mz 1/1 Running 0 3h20m 172.30.64.78 ip-172-30-64-34.ap-northeast-1.compute.internal <none> <none>
demo demo-nginx-69db9d45df-vsp6g 1/1 Running 0 3h20m 172.30.64.97 ip-172-30-64-34.ap-northeast-1.compute.internal <none> <none>
demo demo-nginx-69db9d45df-xxrsb 1/1 Running 0 3h20m 172.30.64.10 ip-172-30-64-34.ap-northeast-1.compute.internal <none> <none>
demo demo-nginx-69db9d45df-zgkjr 0/1 Pending 0 8m56s <none> <none> <none> <none>
Predicate configuration:
$ k get cm dynamic-scheduler-policy -n crane-system -o yaml
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: scheduler.policy.crane.io/v1alpha1
    kind: DynamicSchedulerPolicy
    spec:
      syncPolicy:
        ##cpu usage
        - name: cpu_usage_avg_5m
          period: 3m
        - name: cpu_usage_max_avg_1h
          period: 15m
        - name: cpu_usage_max_avg_1d
          period: 3h
        ##memory usage
        - name: mem_usage_avg_5m
          period: 3m
        - name: mem_usage_max_avg_1h
          period: 15m
        - name: mem_usage_max_avg_1d
          period: 3h
      predicate:
        ##cpu usage
        - name: cpu_usage_avg_5m
          maxLimitPecent: 0.90
        - name: cpu_usage_max_avg_1h
          maxLimitPecent: 0.95
        ##memory usage
        - name: mem_usage_avg_5m
          maxLimitPecent: 0.90
        - name: mem_usage_max_avg_1h
          maxLimitPecent: 0.95
      priority:
        ###score = sum(() * weight) / len, 0 <= score <= 10
        ##cpu usage
        - name: cpu_usage_avg_5m
          weight: 0.2
        - name: cpu_usage_max_avg_1h
          weight: 0.3
        - name: cpu_usage_max_avg_1d
          weight: 0.5
        ##memory usage
        - name: mem_usage_avg_5m
          weight: 0.2
        - name: mem_usage_max_avg_1h
          weight: 0.3
        - name: mem_usage_max_avg_1d
          weight: 0.5
      hotValue:
        - timeRange: 5m
          count: 20
        - timeRange: 1m
          count: 10
crane-scheduler schedules pods based on actual node load. The node's memory load is 0.21 and CPU load is 0.63, and no predicate thresholds were triggered, yet only 5 pods are running. Going by the node's remaining ~25 GB of memory ((1-0.21)*32) and ~5 CPU cores ((1-0.63)*16), at least 6 pods should fit. Why?
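A sanity check of the arithmetic (a sketch; note that adding the Dynamic filter/score plugins does not remove the default NodeResourcesFit filter, which still rejects pods whose *requests* exceed the node's allocatable):

```python
import math

# Node: 16 cores / 32 GiB; usage ratios taken from the node annotations above
cores, mem_gib = 16, 32
cpu_usage, mem_usage = 0.63012, 0.21519

# Usage-based headroom -- what the Dynamic plugin's thresholds look at
free_cores = (1 - cpu_usage) * cores   # ~5.9 cores
free_mem = (1 - mem_usage) * mem_gib   # ~25.1 GiB

# Request-based capacity -- what NodeResourcesFit still enforces
# (15367m total requested = 5 pods * 3000m + ~367m of system pods)
system_cpu_req = 0.367
fit_by_cpu = math.floor((cores - system_cpu_req) / 3.0)  # each pod requests 3 cores
fit_by_mem = math.floor(mem_gib / 5.0)                   # each pod requests 5 GiB
print(min(fit_by_cpu, fit_by_mem))  # 5 -- requests, not live usage, cap the count
```

So 5 running pods is consistent with request-based accounting; to pack by live usage, the pods' requests themselves would have to be lowered.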
Scheduling failed
I1121 03:06:27.345666 1 plugins.go:92] [crane] Node[dev-monitoring]'s finalscore is 69, while score is 69 and hotvalue is 0.000000
I1121 03:06:27.345752 1 plugins.go:92] [crane] Node[dev-qchen]'s finalscore is 81, while score is 81 and hotvalue is 0.000000
I1121 03:06:27.345751 1 plugins.go:92] [crane] Node[bqdev02]'s finalscore is 72, while score is 72 and hotvalue is 0.000000
I1121 03:06:27.345775 1 plugins.go:92] [crane] Node[bqdev01]'s finalscore is 85, while score is 85 and hotvalue is 0.000000
I1121 03:06:27.345780 1 plugins.go:92] [crane] Node[bqdev03]'s finalscore is 74, while score is 74 and hotvalue is 0.000000
I1121 03:06:27.345787 1 plugins.go:92] [crane] Node[dev-node4]'s finalscore is 67, while score is 67 and hotvalue is 0.000000
I1121 03:06:27.345797 1 plugins.go:92] [crane] Node[dev-xyli]'s finalscore is 67, while score is 67 and hotvalue is 0.000000
I1121 03:06:27.345790 1 plugins.go:92] [crane] Node[dev-master3]'s finalscore is 79, while score is 79 and hotvalue is 0.000000
I1121 03:06:27.345810 1 plugins.go:92] [crane] Node[dev-node5]'s finalscore is 83, while score is 83 and hotvalue is 0.000000
I1121 03:06:27.345821 1 plugins.go:92] [crane] Node[dev-whliao]'s finalscore is 73, while score is 73 and hotvalue is 0.000000
E1121 03:06:27.358217 1 framework.go:1000] "Failed running Bind plugin" err="Operation cannot be fulfilled on pods/binding \"cpu-stress-59f8597545-7bdrq\": pod cpu-stress-59f8597545-7bdrq is already assigned to node \"dev-node5\"" plugin="DefaultBinder" pod="crane-system/cpu-stress-59f8597545-7bdrq"
E1121 03:06:27.358235 1 scheduler.go:610] "scheduler cache ForgetPod failed" err="pod c2cae006-2ae2-4ca6-b2f6-6af43faaa972 wasn't assumed so cannot be forgotten"
E1121 03:06:27.358250 1 factory.go:225] "Error scheduling pod; retrying" err="binding rejected: running Bind plugin \"DefaultBinder\": Operation cannot be fulfilled on pods/binding \"cpu-stress-59f8597545-7bdrq\": pod cpu-stress-59f8597545-7bdrq is already assigned to node \"dev-node5\"" pod="crane-system/cpu-stress-59f8597545-7bdrq"
I1121 03:06:27.358258 1 factory.go:238] "Pod has been assigned to node. Abort adding it back to queue." pod="crane-system/cpu-stress-59f8597545-7bdrq" node="dev-node5"
The controller keeps restarting and never stays Running. The logs show the following, yet kubectl describe node shows the annotations were added successfully:
I0201 18:00:43.153543 1 node.go:75] Finished syncing node event "node-2/mem_usage_max_avg_1d" (20.320214ms)
I0201 18:00:43.175135 1 node.go:75] Finished syncing node event "master/mem_usage_max_avg_1d" (21.563645ms)
I0201 18:00:43.197964 1 node.go:75] Finished syncing node event "node-1/mem_usage_max_avg_1d" (22.784592ms)
I0201 18:00:53.119482 1 node.go:75] Finished syncing node event "node-2/cpu_usage_avg_5m" (2.02963ms)
W0201 18:00:53.119507 1 node.go:61] failed to sync this node ["node-2/cpu_usage_avg_5m"]: can not annotate node[node-2]: failed to get data cpu_usage_avg_5m{node-2=}:
I0201 18:00:53.120460 1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (939.612µs)
W0201 18:00:53.120483 1 node.go:61] failed to sync this node ["master/cpu_usage_avg_5m"]: can not annotate node[master]: failed to get data cpu_usage_avg_5m{master=}:
When adding load annotations to a node, query by InternalIP first; if that returns nothing, fall back to querying the metric by node name.
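A minimal sketch of that fallback logic (the `query` callable and the sample shape are hypothetical stand-ins for the controller's Prometheus client):

```python
def node_usage(query, metric, internal_ip, node_name):
    """Return usage samples for a node.

    Queries by InternalIP first; if the result is empty, falls back to
    the node name. `query` is any callable taking a PromQL string and
    returning a list of samples (shape is illustrative).
    """
    samples = query(f'{metric}{{node="{internal_ip}"}}')
    if not samples:
        samples = query(f'{metric}{{node="{node_name}"}}')
    return samples


# Usage with a fake backend that only knows the node by name:
fake = lambda q: [0.63] if 'node="node-2"' in q else []
print(node_usage(fake, "cpu_usage_avg_5m", "172.30.64.34", "node-2"))  # [0.63]
```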
I replaced the k8s scheduler with crane-scheduler and then created a new pod. The new pod stays "Pending", with no related event info.
... ...
QoS Class:      Burstable
Node-Selectors: <none>
Tolerations:    node-role.kubernetes.io/master:NoSchedule
                node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                node.kubernetes.io/not-ready:NoExecute op=Exists
                node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                node.kubernetes.io/unreachable:NoExecute op=Exists
                node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:         <none>
I can only find some useful info in the logs, as follows:
I0614 02:18:42.492977 1 eventhandlers.go:118] "Add event for unscheduled pod" pod="kube-system/kubernetes-dashboard-jqhhq"
I wonder whether the new pod is ever popped from the SchedulingQueue, and how I can solve this problem.
As shown in the screenshot, the highest-scoring host is 1.250, but the pod was actually scheduled onto the 0.21 machine. Why?
Provide release for kubernetes 1.18 support.
Installed crane-scheduler via helm as a second scheduler. Testing with the example from the official docs, the pod is never scheduled and stays stuck in "Pending":
1. Deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"
2. Pod details:
Name:          cpu-stress-cc8656b6c-b5hhz
Namespace:     default
Priority:      0
Node:          <none>
Labels:        app=cpu-stress
               pod-template-hash=cc8656b6c
Annotations:   <none>
Status:        Pending
IP:
IPs:           <none>
Controlled By: ReplicaSet/cpu-stress-cc8656b6c
Containers:
  stress:
    Image:      docker.io/gocrane/stress:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      stress
      -c
      1
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9nwd5 (ro)
Volumes:
  kube-api-access-9nwd5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
3. crane-scheduler logs:
I0824 00:50:47.247851 1 serving.go:331] Generated self-signed cert in-memory
W0824 00:50:48.025758 1 options.go:330] Neither --kubeconfig nor --master was specified. Using default API client. This might not work.
W0824 00:50:48.073470 1 authorization.go:47] Authorization is disabled
W0824 00:50:48.073495 1 authentication.go:40] Authentication is disabled
I0824 00:50:48.073517 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0824 00:50:48.080823 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0824 00:50:48.080862 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0824 00:50:48.080915 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.080927 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.080957 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.080968 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.081199 1 secure_serving.go:197] Serving securely on [::]:10259
I0824 00:50:48.081270 1 tlsconfig.go:240] Starting DynamicServingCertificateController
W0824 00:50:48.091287 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 00:50:48.146624 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
I0824 00:50:48.182865 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0824 00:50:48.183903 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.184059 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.284088 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...
W0824 00:57:30.128689 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:02:45.130884 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:08:48.133483 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:14:31.135801 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:20:24.138959 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:30:10.141873 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
4. crane-scheduler-controller logs:
I0824 08:46:16.647776 1 server.go:61] Starting Controller version v0.0.0-master+$Format:%H$
I0824 08:46:16.648237 1 leaderelection.go:248] attempting to acquire leader lease crane-system/crane-scheduler-controller...
I0824 08:46:16.706891 1 leaderelection.go:258] successfully acquired lease crane-system/crane-scheduler-controller
I0824 08:46:16.807546 1 controller.go:72] Caches are synced for controller
I0824 08:46:16.807631 1 node.go:46] Start to reconcile node events
I0824 08:46:16.807653 1 event.go:30] Start to reconcile EVENT events
I0824 08:46:16.885698 1 node.go:75] Finished syncing node event "node6/cpu_usage_avg_5m" (77.952416ms)
I0824 08:46:16.973162 1 node.go:75] Finished syncing node event "node4/cpu_usage_avg_5m" (87.371252ms)
I0824 08:46:17.045250 1 node.go:75] Finished syncing node event "master2/cpu_usage_avg_5m" (72.023298ms)
I0824 08:46:17.109260 1 node.go:75] Finished syncing node event "master3/cpu_usage_avg_5m" (63.673389ms)
I0824 08:46:17.192332 1 node.go:75] Finished syncing node event "node1/cpu_usage_avg_5m" (83.005155ms)
I0824 08:46:17.529495 1 node.go:75] Finished syncing node event "node2/cpu_usage_avg_5m" (337.099052ms)
I0824 08:46:17.927163 1 node.go:75] Finished syncing node event "node3/cpu_usage_avg_5m" (397.603044ms)
I0824 08:46:18.327978 1 node.go:75] Finished syncing node event "node5/cpu_usage_avg_5m" (400.749476ms)
I0824 08:46:18.746391 1 node.go:75] Finished syncing node event "master1/cpu_usage_avg_5m" (418.360885ms)
I0824 08:46:19.129081 1 node.go:75] Finished syncing node event "node6/cpu_usage_max_avg_1h" (382.635495ms)
I0824 08:46:19.524508 1 node.go:75] Finished syncing node event "node4/cpu_usage_max_avg_1h" (395.361539ms)
I0824 08:46:19.948035 1 node.go:75] Finished syncing node event "master2/cpu_usage_max_avg_1h" (423.453672ms)
I0824 08:46:20.332014 1 node.go:75] Finished syncing node event "master3/cpu_usage_max_avg_1h" (383.909395ms)
I0824 08:46:20.737296 1 node.go:75] Finished syncing node event "node1/cpu_usage_max_avg_1h" (405.102002ms)
I0824 08:46:21.245055 1 node.go:75] Finished syncing node event "node2/cpu_usage_max_avg_1h" (507.697871ms)
I0824 08:46:21.573490 1 node.go:75] Finished syncing node event "node3/cpu_usage_max_avg_1h" (328.368489ms)
I0824 08:46:21.937814 1 node.go:75] Finished syncing node event "node5/cpu_usage_max_avg_1h" (364.254837ms)
I0824 08:46:22.335988 1 node.go:75] Finished syncing node event "master1/cpu_usage_max_avg_1h" (397.952357ms)
I0824 08:46:22.724851 1 node.go:75] Finished syncing node event "master2/cpu_usage_max_avg_1d" (388.771915ms)
I0824 08:46:23.126059 1 node.go:75] Finished syncing node event "master3/cpu_usage_max_avg_1d" (401.156708ms)
I0824 08:46:23.528329 1 node.go:75] Finished syncing node event "node6/cpu_usage_max_avg_1d" (402.208827ms)
I0824 08:46:23.937560 1 node.go:75] Finished syncing node event "node4/cpu_usage_max_avg_1d" (409.165081ms)
I0824 08:46:24.331730 1 node.go:75] Finished syncing node event "node5/cpu_usage_max_avg_1d" (394.024206ms)
I0824 08:46:24.730137 1 node.go:75] Finished syncing node event "master1/cpu_usage_max_avg_1d" (398.33551ms)
I0824 08:46:25.127074 1 node.go:75] Finished syncing node event "node1/cpu_usage_max_avg_1d" (396.798913ms)
I0824 08:46:25.528844 1 node.go:75] Finished syncing node event "node2/cpu_usage_max_avg_1d" (401.701104ms)
I0824 08:46:25.932684 1 node.go:75] Finished syncing node event "node3/cpu_usage_max_avg_1d" (403.762529ms)
I0824 08:46:26.330458 1 node.go:75] Finished syncing node event "node4/mem_usage_avg_5m" (397.710372ms)
I0824 08:46:26.736576 1 node.go:75] Finished syncing node event "master2/mem_usage_avg_5m" (406.060927ms)
crane-scheduler-controller version: 0.0.23
craned version: 0.5.0
Kubernetes version: 1.21.10
Docker version: 19.3.14
OS: Ubuntu 20.04.3 LTS
Scheduler pod logs:
E0920 09:23:09.244915 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:23:09.244970 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
E0920 09:24:33.207711 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:24:33.207754 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
E0920 09:25:10.865375 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:25:10.865408 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
E0920 09:26:07.905010 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:26:07.905083 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
E0920 09:27:33.211667 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:27:33.211720 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
E0920 09:28:33.213730 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:28:33.213767 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
E0920 09:29:33.214463 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:29:33.214499 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
E0920 09:30:33.215495 1 scheduler.go:379] scheduler cache AssumePod failed: pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed
E0920 09:30:33.215532 1 factory.go:338] "Error scheduling pod; retrying" err="pod 3ed0ea4b-407f-427e-a92d-1c1d2adbc55c is in the cache, so can't be assumed" pod="testpods-test/testpods-test-test-pods-65494bf66c-c8k6t"
Hello, when I deployed version 0.0.20 it reported that the image could not be found. Is there a problem with my deployment process, or something else?
Kubernetes version: v1.21
Helm version: v3.3.3
Deployment steps:
I cloned the project and deployed rbac.yaml successfully with kubectl.exe apply -f rbac.yaml, then tried to change the versions on the Kubernetes server with the command below, which failed:
KUBE_EDITOR="sed -i 's/v1beta2/v1beta1/g'" kubectl edit cm scheduler-config -n crane-system && KUBE_EDITOR="sed -i 's/0.0.23/0.0.20/g'" kubectl edit deploy crane-scheduler -n crane-system
Deploying by editing the YAML files directly also reported that the image could not be found.
I changed v1beta2 to v1beta1 in git\crane-scheduler\deploy\manifests\scheduler-config.yaml:
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
leaderElection:
......
I changed 0.0.23 to 0.0.20 in git\crane-scheduler\deploy\controller\deployment.yaml:
......
  command:
    - /controller
    - --policy-config-path=/data/policy.yaml
    - --prometheus-address=PROMETHEUS_ADDRESS
  image: docker.io/gocrane/crane-scheduler-controller:0.0.20
  imagePullPolicy: IfNotPresent
  volumeMounts:
    - mountPath: /data
      name: dynamic-scheduler-policy
......
The function NewPodTopologyCache is responsible for building a common cache for the NodeResourceTopologyMatch plugin, and it is called in the plugin's New func as follows:
func New(args runtime.Object, handle framework.Handle) (framework.Plugin, error) {
    ...
    topologyMatch := &TopologyMatch{
        // here initializing the cache
        PodTopologyCache:       NewPodTopologyCache(ctx, 30*time.Minute),
        handle:                 handle,
        lister:                 lister,
        topologyAwareResources: sets.NewString(cfg.TopologyAwareResources...),
    }
    return topologyMatch, nil
}
Once the NodeResourceTopologyMatch plugin appears in multiple profiles of a single scheduler instance, the plugin is initialized multiple times, which means the New func above is triggered more than once. Consequently, multiple PodTopologyCache instances exist in one scheduler process. Are there any potential risks in this situation (e.g. a data race)?
@Garrybest @qmhu PTAL, thanks
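One way to sidestep having multiple caches would be to construct the cache once per process and hand the same instance to every profile's New call. This is a minimal sketch, not the project's actual code; the guard function and the empty PodTopologyCache struct are hypothetical stand-ins:

```go
package main

import (
	"fmt"
	"sync"
)

// PodTopologyCache stands in for the plugin's shared cache (fields omitted).
type PodTopologyCache struct{}

var (
	cacheOnce   sync.Once
	sharedCache *PodTopologyCache
)

// getSharedPodTopologyCache returns one process-wide cache no matter how
// many profiles (and therefore New calls) the scheduler is configured with.
func getSharedPodTopologyCache() *PodTopologyCache {
	cacheOnce.Do(func() {
		sharedCache = &PodTopologyCache{}
	})
	return sharedCache
}

func main() {
	a := getSharedPodTopologyCache() // first profile's New()
	b := getSharedPodTopologyCache() // second profile's New()
	fmt.Println(a == b)              // both profiles share one cache: true
}
```

With a single shared instance, the remaining concern is only whether the cache's own methods are safe for concurrent use, rather than whether several caches drift apart.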
Events:
Type Reason Age From Message
Normal Scheduled 5m27s default-scheduler Successfully assigned crane-system/crane-scheduler-controller-5c85f47c45-trmzp to 192.168.227.164
Normal Pulled 5m37s kubelet Container image "docker.io/gocrane/crane-scheduler-controller:0.0.23" already present on machine
Normal Created 5m37s kubelet Created container crane-scheduler-controller
Normal Started 5m36s kubelet Started container crane-scheduler-controller
Warning Unhealthy 32s (x31 over 5m32s) kubelet Readiness probe failed: Get "http://10.244.27.203:8090/healthz": dial tcp 10.244.27.203:8090: connect: connection refused
Both the readiness and liveness probes failed; the pod only started normally after I commented them out. Please fix this issue.
E1018 09:42:10.621700 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=mem_usage_max_avg_1h, float64=33.699000000000005)
E1018 09:42:10.621708 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=mem_usage_max_avg_1d, float64=33.699000000000005)
I1018 09:42:10.621717 1 plugins.go:92] [crane] Node[zcsnode2]'s finalscore is 6, while score is 16 and hotvalue is 1.000000
E1018 09:48:25.615198 1 stats.go:128] [crane] failed to get node 's score: zcsmaster1%!(EXTRA string=cpu_usage_max_avg_1d, float64=45.77980000000001)
E1018 09:48:25.615301 1 stats.go:128] [crane] failed to get node 's score: zcsmaster1%!(EXTRA string=mem_usage_max_avg_1d, float64=71.2381)
I1018 09:48:25.615339 1 plugins.go:92] [crane] Node[zcsmaster1]'s finalscore is 35, while score is 35 and hotvalue is 0.000000
E1018 09:48:25.615397 1 stats.go:128] [crane] failed to get node 's score: zcsnode1%!(EXTRA string=cpu_usage_max_avg_1d, float64=47.9795)
E1018 09:48:25.615417 1 stats.go:128] [crane] failed to get node 's score: zcsnode1%!(EXTRA string=mem_usage_max_avg_1d, float64=75.73570000000001)
I1018 09:48:25.615424 1 plugins.go:92] [crane] Node[zcsnode1]'s finalscore is 37, while score is 37 and hotvalue is 0.000000
E1018 09:48:25.615447 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=cpu_usage_max_avg_1d, float64=47.5513)
E1018 09:48:25.615461 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=mem_usage_max_avg_1d, float64=83.1259)
I1018 09:48:25.615468 1 plugins.go:92] [crane] Node[zcsnode2]'s finalscore is 41, while score is 41 and hotvalue is 0.000000
E1018 09:48:56.352200 1 stats.go:128] [crane] failed to get node 's score: zcsmaster1%!(EXTRA string=cpu_usage_max_avg_1d, float64=45.77980000000001)
E1018 09:48:56.352275 1 stats.go:128] [crane] failed to get node 's score: zcsmaster1%!(EXTRA string=mem_usage_max_avg_1d, float64=71.2381)
I1018 09:48:56.352287 1 plugins.go:92] [crane] Node[zcsmaster1]'s finalscore is 35, while score is 35 and hotvalue is 0.000000
E1018 09:48:56.352346 1 stats.go:128] [crane] failed to get node 's score: zcsnode1%!(EXTRA string=cpu_usage_max_avg_1d, float64=47.9795)
E1018 09:48:56.352368 1 stats.go:128] [crane] failed to get node 's score: zcsnode1%!(EXTRA string=mem_usage_max_avg_1d, float64=75.73570000000001)
I1018 09:48:56.352379 1 plugins.go:92] [crane] Node[zcsnode1]'s finalscore is 37, while score is 37 and hotvalue is 0.000000
E1018 09:48:56.352415 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=cpu_usage_max_avg_1d, float64=47.5513)
E1018 09:48:56.352455 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=mem_usage_max_avg_1d, float64=83.1259)
I1018 09:48:56.352466 1 plugins.go:92] [crane] Node[zcsnode2]'s finalscore is 41, while score is 41 and hotvalue is 0.000000
E1018 09:51:48.506156 1 stats.go:128] [crane] failed to get node 's score: zcsmaster1%!(EXTRA string=cpu_usage_max_avg_1d, float64=45.854200000000006)
E1018 09:51:48.506282 1 stats.go:128] [crane] failed to get node 's score: zcsmaster1%!(EXTRA string=mem_usage_max_avg_1d, float64=71.34190000000001)
I1018 09:51:48.506296 1 plugins.go:92] [crane] Node[zcsmaster1]'s finalscore is 35, while score is 35 and hotvalue is 0.000000
E1018 09:51:48.506329 1 stats.go:128] [crane] failed to get node 's score: zcsnode1%!(EXTRA string=cpu_usage_max_avg_1d, float64=48.017900000000004)
E1018 09:51:48.506357 1 stats.go:128] [crane] failed to get node 's score: zcsnode1%!(EXTRA string=mem_usage_max_avg_1d, float64=75.80170000000001)
I1018 09:51:48.506364 1 plugins.go:92] [crane] Node[zcsnode1]'s finalscore is 37, while score is 37 and hotvalue is 0.000000
E1018 09:51:48.506390 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=cpu_usage_max_avg_1d, float64=47.545500000000004)
E1018 09:51:48.506408 1 stats.go:128] [crane] failed to get node 's score: zcsnode2%!(EXTRA string=mem_usage_max_avg_1d, float64=83.0675)
I1018 09:51:48.506416 1 plugins.go:92] [crane] Node[zcsnode2]'s finalscore is 41, while score is 41 and hotvalue is 0.000000
templates/scheduler-deployment.yaml in the Helm chart has a syntax error; the if block can be fixed as follows:
containers:
  - command:
      - /scheduler
      - --leader-elect=false
      - --config=/etc/kubernetes/kube-scheduler/scheduler-config.yaml
    {{- if ge .Capabilities.KubeVersion.Minor "22" }}
    image: "{{ .Values.scheduler.image.repository }}:0.0.23"
    {{- else }}
    image: "{{ .Values.scheduler.image.repository }}:0.0.20"
    {{- end }}
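Note that `ge` on `.Capabilities.KubeVersion.Minor` compares strings, which can misbehave for single-digit minors ("9" sorts after "22" lexically) or suffixed values like "22+" on some managed clusters. A hedged alternative, using Sprig's semverCompare against the full version string, might look like:

```
{{- if semverCompare ">=1.22-0" .Capabilities.KubeVersion.Version }}
image: "{{ .Values.scheduler.image.repository }}:0.0.23"
{{- else }}
image: "{{ .Values.scheduler.image.repository }}:0.0.20"
{{- end }}
```

The `-0` prerelease suffix makes the constraint match versions like v1.22.0-eks as well.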
applying score defaultWeights on Score plugins: plugin "Dynamic" returns an invalid score -8, it should in the range of [0, 100] after normalizing
The pod can still be scheduled successfully. This problem occurs when the Prometheus query times out, even though the hotValue is normal.
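Judging from the log lines above ("finalscore is 6, while score is 16 and hotvalue is 1.000000"), the final score appears to be the raw score minus ten times the hotValue; when the Prometheus query times out, the raw score can be small enough that the result goes negative (the -8 in the error), which the scheduling framework rejects. A minimal sketch of clamping the result into [0, 100], assuming that inferred formula:

```go
package main

import "fmt"

// finalScore mirrors the relationship visible in the logs
// (finalscore = score - 10*hotvalue) and clamps the result into the
// [0, 100] range the scheduling framework requires. The formula is
// inferred from the log output, not taken from the plugin's source.
func finalScore(score int64, hotValue float64) int64 {
	final := score - int64(hotValue*10)
	if final < 0 {
		return 0
	}
	if final > 100 {
		return 100
	}
	return final
}

func main() {
	fmt.Println(finalScore(16, 1.0)) // matches the log line: 6
	fmt.Println(finalScore(2, 1.0))  // would be -8 unclamped; now 0
}
```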
At present there is relatively little documentation available.
crane-scheduler-controller version: 0.0.23
craned version: 0.5.0
Kubernetes version: 1.21.10
Docker version: 19.3.14
OS: Ubuntu 20.04.3 LTS
The corresponding metrics can be retrieved by calling the Prometheus API manually:
curl -g http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query?query=cpu_usage_avg_5m
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"name":"cpu_usage_avg_5m","instance":"ceph-01"},"value":[1656488784.456,"2.7104166665715894"]},{"metric":{"name":"cpu_usage_avg_5m","instance":"ceph-02"},"value":[1656488784.456,"1.9583333333351618"]},{"metric":{"name":"cpu_usage_avg_5m","instance":"ceph-03"},"value":[1656488784.456,"2.6000000000931323"]},{"metric":{"name":"cpu_usage_avg_5m","instance":"node-01"},"value":[1656488784.456,"4.0291666666841195"]},{"metric":{"name":"cpu_usage_avg_5m","instance":"node-04"},"value":[1656488784.456,"6.870833333426461"]},{"metric":{"name":"cpu_usage_avg_5m","instance":"ykj"},"value":[1656488784.456,"5.891666666672492"]}]}}/ #
curl -g http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query?query=mem_usage_avg_5m
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"name":"mem_usage_avg_5m","instance":"ceph-01","job":"node-exporter","namespace":"monitoring","pod":"node-exporter-sn9lp"},"value":[1656488826.549,"32.75862684328356"]},{"metric":{"name":"mem_usage_avg_5m","instance":"ceph-02","job":"node-exporter","namespace":"monitoring","pod":"node-exporter-dgd54"},"value":[1656488826.549,"15.044355868789062"]},{"metric":{"name":"mem_usage_avg_5m","instance":"ceph-03","job":"node-exporter","namespace":"monitoring","pod":"node-exporter-td7k2"},"value":[1656488826.549,"34.21244570563606"]},{"metric":{"name":"mem_usage_avg_5m","instance":"node-01","job":"node-exporter","namespace":"monitoring","pod":"node-exporter-zzxmd"},"value":[1656488826.549,"57.21168005976536"]},{"metric":{"name":"mem_usage_avg_5m","instance":"node-04","job":"node-exporter","namespace":"monitoring","pod":"node-exporter-2zkgk"},"value":[1656488826.549,"72.4792896090607"]},{"metric":{"name":"mem_usage_avg_5m","instance":"ykj","job":"node-exporter","namespace":"monitoring","pod":"node-exporter-xfq4n"},"value":[1656488826./
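For reference, the instant-query replies above can be decoded into an instance-to-value map with a few lines of Go. This is a sketch against the standard Prometheus instant-query response shape, with minimal types covering only the fields used here:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// queryResponse covers only the fields we read from an instant-query reply.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

// parseVector maps each series' "instance" label to its sample value.
func parseVector(body []byte) (map[string]float64, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make(map[string]float64)
	for _, r := range resp.Data.Result {
		s, ok := r.Value[1].(string) // sample values are JSON strings
		if !ok {
			continue
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		out[r.Metric["instance"]] = v
	}
	return out, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[` +
		`{"metric":{"instance":"ceph-01"},"value":[1656488784.456,"2.71"]},` +
		`{"metric":{"instance":"node-01"},"value":[1656488784.456,"4.03"]}]}}`)
	m, err := parseVector(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["ceph-01"], m["node-01"]) // 2.71 4.03
}
```

If the controller logs show query errors while this kind of manual decode succeeds, the problem is more likely the configured Prometheus address (e.g. a missing http:// scheme) than the metrics themselves.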