I created a pod with vCUDA, and the pod has been running successfully. But I noticed that the gpu-admission component logged some errors.
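For context, this is roughly the pod spec I used, reconstructed from the `ExtenderArgs` dump in the log below; treat it as a sketch rather than the exact manifest:

```yaml
# Reconstructed from the gpu-admission log; field values come from the
# ExtenderArgs pod dump (name, image, command, vcuda requests, hostPath volume).
apiVersion: v1
kind: Pod
metadata:
  name: tts1
  annotations:
    tencent.com/vcuda-core-limit: "50"
spec:
  restartPolicy: Never
  hostNetwork: true
  containers:
  - name: nvidia
    image: nvidia/cuda:9.1-devel-centos7
    command: ["./start.sh"]
    workingDir: /tts
    env:
    - name: LOGGER_LEVEL
      value: "5"
    resources:
      requests:
        tencent.com/vcuda-core: 50    # half of one GPU's cores
        tencent.com/vcuda-memory: 14  # vcuda-memory units, not bytes
      limits:
        tencent.com/vcuda-core: 50
        tencent.com/vcuda-memory: 14
    volumeMounts:
    - name: test
      mountPath: /tts
  volumes:
  - name: test
    hostPath:
      path: /data/shengxu8/vcuda/xtts2.0.1029.P4.1
      type: Directory
```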
I0810 16:56:47.809528 11350 util.go:53] Determine if the pod tts1 needs GPU resource
I0810 16:56:47.809971 11350 gpu_predicate.go:379] Quota for namespace default is {Quota:map[] Pool:[]}
I0810 16:56:47.810000 11350 gpu_predicate.go:353] No GPU quota limit for default
I0810 16:56:47.810043 11350 nodeInfo.go:39] debug: NewNodeInfo() creates nodeInfo for turing-02-no.01.novalocal
I0810 16:56:47.810131 11350 util.go:71] Determine if the container nvidia needs GPU resource
I0810 16:56:47.810145 11350 share.go:58] Pick up 0 , cores: 100, memory: 29
I0810 16:56:47.815362 11350 routes.go:71] GPUQuotaPredicate: ExtenderArgs = {Pod:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:tts1,GenerateName:,Namespace:default,SelfLink:/api/v1/namespaces/default/pods/tts1,UID:bdb78144-4ee9-4301-b071-9e31abb3cd41,ResourceVersion:674848,Generation:0,CreationTimestamp:2020-08-10 16:56:47 +0800 CST,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{tencent.com/vcuda-core-limit: 50,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[{kubectl Update v1 2020-08-10 16:56:47 +0800 CST nil}],},Spec:PodSpec{Volumes:[{test {HostPathVolumeSource{Path:/data/shengxu8/vcuda/xtts2.0.1029.P4.1,Type:*Directory,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {default-token-rl2vc {nil nil nil nil nil &SecretVolumeSource{SecretName:default-token-rl2vc,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{nvidia nvidia/cuda:9.1-devel-centos7 [./start.sh] [] /tts [] [] [{LOGGER_LEVEL 5 nil}] {map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}] map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}]} [{test false /tts <nil> } {default-token-rl2vc true /var/run/secrets/kubernetes.io/serviceaccount <nil> }] [] nil nil nil /dev/termination-log File IfNotPresent nil false false 
false}],RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:true,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],WindowsOptions:nil,},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc000408480} {node.kubernetes.io/unreachable Exists NoExecute 0xc0004084a0}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} Nodes:&NodeList{ListMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Items:[{{ } {turing-02-no.01.novalocal /api/v1/nodes/turing-02-no.01.novalocal 93b77b65-b112-4d36-ab2c-0e34efe8d097 674660 0 2020-08-06 15:12:19 +0800 CST <nil> <nil> map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux kubernetes.io/arch:amd64 kubernetes.io/hostname:turing-02-no.01.novalocal kubernetes.io/os:linux node-role.kubernetes.io/master: nvidia-device-enable:enable] map[kubeadm.alpha.kubernetes.io/cri-socket:/var/run/dockershim.sock node.alpha.kubernetes.io/ttl:0 projectcalico.org/IPv4Address:172.31.236.28/24 projectcalico.org/IPv4IPIPTunnelAddr:100.77.205.64 volumes.kubernetes.io/controller-managed-attach-detach:true] [] nil [] [{kubeadm Update v1 2020-08-06 15:12:22 +0800 CST nil} {kubectl Update v1 2020-08-06 15:31:43 +0800 CST nil} {calico-node Update v1 2020-08-07 10:17:19 
+0800 CST nil} {kube-controller-manager Update v1 2020-08-10 16:54:34 +0800 CST nil} {kubelet Update v1 2020-08-10 16:55:04 +0800 CST nil}]} {100.64.0.0/24 false [] nil } {map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{53465821184 0} {<nil>} 52212716Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>} BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}] map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{52392079360 0} {<nil>} 51164140Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>} BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}] [{NetworkUnavailable False 2020-08-07 10:17:19 +0800 CST 2020-08-07 10:17:19 +0800 CST CalicoIsUp Calico is running on this node} {MemoryPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2020-08-10 16:55:04 +0800 CST 2020-08-10 16:54:35 +0800 CST KubeletReady kubelet is posting ready status}] [{InternalIP 172.31.236.28} {Hostname turing-02-no.01.novalocal}] {{10250}} {da54aa58da54aa58da54aa58da54aa58 587DF4B2-1B95-4717-9775-F696ED79D950 06b1a58b-09b9-4068-9777-7dce4cfea189 3.10.0-957.el7.x86_64 CentOS Linux 7 (Core) docker://18.9.6 v1.18.0 v1.18.0 linux amd64} [{[hub.iflytek.com/turing/ssd_face@sha256:758fda54e4b87e1790ccbc93d842fd077e6fa09c04b7277b20d66a3f82405a13 
hub.iflytek.com/turing/ssd_face:0.1] 5654144739} {[hub.iflytek.com/turing/gaea:1.0.4] 5636811356} {[am_train:1.0] 4440398750} {[honk:1.0] 4107130912} {[172.16.59.153/dlaas/pytorch-py36-cuda9@sha256:1cecf1e8b8f75e16e5514260190e677eb644c0c3d21630b75c0db5e1e5f02521 172.16.59.153/dlaas/pytorch-py36-cuda9:1.0.0] 3789072257} {[nvidia/cuda:10.1-cudnn7-devel-centos7] 3519393408} {[reg.deeplearning.cn/dlaas/cv_dist_openmpi:0.1] 3498923930} {[registry.turing.com:5000/dlaas/pytorch@sha256:1f79d91afb716ef6da8f167829c5fd101959b284028435b92fa3778719a0439b registry.turing.com:5000/dlaas/pytorch:1.5-cuda10.1-cudnn7-runtime] 3140164483} {[registry.turing.com:5000/dlaas/commonjob_centos7.2.1511@sha256:ad1e7f41fb15e4b400e0f4492b56c3b46ef077d7fa110e7515d4446f045eaa91 registry.turing.com:5000/dlaas/commonjob_centos7.2.1511:1.0.4] 3086599556} {[nvidia/cuda:10.1-devel-centos7] 2686646338} {[nvidia/cuda@sha256:86899043c7c1182046cdf9b89ce94ccf35c584b33875ba78316acd883bc6faf8 nvidia/cuda:9.1-devel-centos7] 1834703805} {[tkestack/gpu-manager:1.1.0] 437779384} {[k8s.gcr.io/etcd:3.4.3-0] 288425539} {[centos@sha256:62d9e1c2daa91166139b51577fe4f4f6b4cc41a3a2c7fc36bd895e2a17a3e4e6 centos:7.6.1810] 201756323} {[calico/node:v3.8.2] 188832890} {[k8s.gcr.io/kube-apiserver:v1.18.0] 172962942} {[k8s.gcr.io/kube-controller-manager:v1.18.0] 162366590} {[calico/cni@sha256:4922215c127c18b00c8f5916997259589577c132260181a2c50a093a78564c90 calico/cni:v3.8.2] 157232722} {[k8s.gcr.io/kube-proxy:v1.18.0] 116531001} {[k8s.gcr.io/kube-scheduler:v1.18.0] 95274110} {[calico/kube-controllers:v3.8.2] 46809393} {[k8s.gcr.io/coredns:1.6.7] 43781501} {[calico/pod2daemon-flexvol:v3.8.2] 9366797} {[k8s.gcr.io/pause:3.2] 682696}] [] [] nil}}],} NodeNames:<nil>}
I0810 16:56:47.816226 11350 routes.go:81] GPUQuotaPredicate: extenderFilterResult = {"Nodes":{"metadata":{},"items":[{"metadata":{"name":"turing-02-no.01.novalocal","selfLink":"/api/v1/nodes/turing-02-no.01.novalocal","uid":"93b77b65-b112-4d36-ab2c-0e34efe8d097","resourceVersion":"674660","creationTimestamp":"2020-08-06T07:12:19Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"turing-02-no.01.novalocal","kubernetes.io/os":"linux","node-role.kubernetes.io/master":"","nvidia-device-enable":"enable"},"annotations":{"kubeadm.alpha.kubernetes.io/cri-socket":"/var/run/dockershim.sock","node.alpha.kubernetes.io/ttl":"0","projectcalico.org/IPv4Address":"172.31.236.28/24","projectcalico.org/IPv4IPIPTunnelAddr":"100.77.205.64","volumes.kubernetes.io/controller-managed-attach-detach":"true"},"managedFields":[{"manager":"kubeadm","operation":"Update","apiVersion":"v1","time":"2020-08-06T07:12:22Z"},{"manager":"kubectl","operation":"Update","apiVersion":"v1","time":"2020-08-06T07:31:43Z"},{"manager":"calico-node","operation":"Update","apiVersion":"v1","time":"2020-08-07T02:17:19Z"},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2020-08-10T08:54:34Z"},{"manager":"kubelet","operation":"Update","apiVersion":"v1","time":"2020-08-10T08:55:04Z"}]},"spec":{"podCIDR":"100.64.0.0/24"},"status":{"capacity":{"cpu":"32","ephemeral-storage":"52212716Ki","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"263972712Ki","pods":"110","tencent.com/vcuda-core":"100","tencent.com/vcuda-memory":"29"},"allocatable":{"cpu":"32","ephemeral-storage":"51164140Ki","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"263972712Ki","pods":"110","tencent.com/vcuda-core":"100","tencent.com/vcuda-memory":"29"},"conditions":[{"type":"NetworkUnavailable","status":"False","lastHeartbeatTime":"2020-08-07T02:17:19Z","lastTransitionTime":"2020-08-07T02:17:19Z","reason":"CalicoIsUp","message":"Calico is 
running on this node"},{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-06T07:36:32Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-06T07:36:32Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-06T07:36:32Z","reason":"KubeletHasSufficientPID","message":"kubelet has sufficient PID available"},{"type":"Ready","status":"True","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-10T08:54:35Z","reason":"KubeletReady","message":"kubelet is posting ready status"}],"addresses":[{"type":"InternalIP","address":"172.31.236.28"},{"type":"Hostname","address":"turing-02-no.01.novalocal"}],"daemonEndpoints":{"kubeletEndpoint":{"Port":10250}},"nodeInfo":{"machineID":"da54aa58da54aa58da54aa58da54aa58","systemUUID":"587DF4B2-1B95-4717-9775-F696ED79D950","bootID":"06b1a58b-09b9-4068-9777-7dce4cfea189","kernelVersion":"3.10.0-957.el7.x86_64","osImage":"CentOS Linux 7 
(Core)","containerRuntimeVersion":"docker://18.9.6","kubeletVersion":"v1.18.0","kubeProxyVersion":"v1.18.0","operatingSystem":"linux","architecture":"amd64"},"images":[{"names":["hub.iflytek.com/turing/ssd_face@sha256:758fda54e4b87e1790ccbc93d842fd077e6fa09c04b7277b20d66a3f82405a13","hub.iflytek.com/turing/ssd_face:0.1"],"sizeBytes":5654144739},{"names":["hub.iflytek.com/turing/gaea:1.0.4"],"sizeBytes":5636811356},{"names":["am_train:1.0"],"sizeBytes":4440398750},{"names":["honk:1.0"],"sizeBytes":4107130912},{"names":["172.16.59.153/dlaas/pytorch-py36-cuda9@sha256:1cecf1e8b8f75e16e5514260190e677eb644c0c3d21630b75c0db5e1e5f02521","172.16.59.153/dlaas/pytorch-py36-cuda9:1.0.0"],"sizeBytes":3789072257},{"names":["nvidia/cuda:10.1-cudnn7-devel-centos7"],"sizeBytes":3519393408},{"names":["reg.deeplearning.cn/dlaas/cv_dist_openmpi:0.1"],"sizeBytes":3498923930},{"names":["registry.turing.com:5000/dlaas/pytorch@sha256:1f79d91afb716ef6da8f167829c5fd101959b284028435b92fa3778719a0439b","registry.turing.com:5000/dlaas/pytorch:1.5-cuda10.1-cudnn7-runtime"],"sizeBytes":3140164483},{"names":["registry.turing.com:5000/dlaas/commonjob_centos7.2.1511@sha256:ad1e7f41fb15e4b400e0f4492b56c3b46ef077d7fa110e7515d4446f045eaa91","registry.turing.com:5000/dlaas/commonjob_centos7.2.1511:1.0.4"],"sizeBytes":3086599556},{"names":["nvidia/cuda:10.1-devel-centos7"],"sizeBytes":2686646338},{"names":["nvidia/cuda@sha256:86899043c7c1182046cdf9b89ce94ccf35c584b33875ba78316acd883bc6faf8","nvidia/cuda:9.1-devel-centos7"],"sizeBytes":1834703805},{"names":["tkestack/gpu-manager:1.1.0"],"sizeBytes":437779384},{"names":["k8s.gcr.io/etcd:3.4.3-0"],"sizeBytes":288425539},{"names":["centos@sha256:62d9e1c2daa91166139b51577fe4f4f6b4cc41a3a2c7fc36bd895e2a17a3e4e6","centos:7.6.1810"],"sizeBytes":201756323},{"names":["calico/node:v3.8.2"],"sizeBytes":188832890},{"names":["k8s.gcr.io/kube-apiserver:v1.18.0"],"sizeBytes":172962942},{"names":["k8s.gcr.io/kube-controller-manager:v1.18.0"],"sizeBytes":162366590},{"name
s":["calico/cni@sha256:4922215c127c18b00c8f5916997259589577c132260181a2c50a093a78564c90","calico/cni:v3.8.2"],"sizeBytes":157232722},{"names":["k8s.gcr.io/kube-proxy:v1.18.0"],"sizeBytes":116531001},{"names":["k8s.gcr.io/kube-scheduler:v1.18.0"],"sizeBytes":95274110},{"names":["calico/kube-controllers:v3.8.2"],"sizeBytes":46809393},{"names":["k8s.gcr.io/coredns:1.6.7"],"sizeBytes":43781501},{"names":["calico/pod2daemon-flexvol:v3.8.2"],"sizeBytes":9366797},{"names":["k8s.gcr.io/pause:3.2"],"sizeBytes":682696}]}}]},"NodeNames":null,"FailedNodes":{},"Error":""}
I0810 16:56:47.818814 11350 util.go:53] Determine if the pod tts1 needs GPU resource
I0810 16:56:47.819235 11350 gpu_predicate.go:379] Quota for namespace default is {Quota:map[] Pool:[]}
I0810 16:56:47.819251 11350 gpu_predicate.go:353] No GPU quota limit for default
I0810 16:56:47.819261 11350 routes.go:71] GPUQuotaPredicate: ExtenderArgs = {Pod:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:tts1,GenerateName:,Namespace:default,SelfLink:/api/v1/namespaces/default/pods/tts1,UID:bdb78144-4ee9-4301-b071-9e31abb3cd41,ResourceVersion:674849,Generation:0,CreationTimestamp:2020-08-10 16:56:47 +0800 CST,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{tencent.com/gpu-assigned: false,tencent.com/predicate-gpu-idx-0: 0,tencent.com/predicate-node: turing-02-no.01.novalocal,tencent.com/predicate-time: 1597049807810150433,tencent.com/vcuda-core-limit: 50,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[{gpu-admission Update v1 2020-08-10 16:56:47 +0800 CST nil} {kubectl Update v1 2020-08-10 16:56:47 +0800 CST nil}],},Spec:PodSpec{Volumes:[{test {HostPathVolumeSource{Path:/data/shengxu8/vcuda/xtts2.0.1029.P4.1,Type:*Directory,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {default-token-rl2vc {nil nil nil nil nil &SecretVolumeSource{SecretName:default-token-rl2vc,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{nvidia nvidia/cuda:9.1-devel-centos7 [./start.sh] [] /tts [] [] [{LOGGER_LEVEL 5 nil}] {map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}] map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}]} [{test false /tts <nil> } {default-token-rl2vc true /var/run/secrets/kubernetes.io/serviceaccount <nil> }] [] nil nil nil /dev/termination-log File IfNotPresent nil false false 
false}],RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:true,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],WindowsOptions:nil,},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc00060c4e0} {node.kubernetes.io/unreachable Exists NoExecute 0xc00060c500}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} Nodes:&NodeList{ListMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Items:[{{ } {turing-02-no.01.novalocal /api/v1/nodes/turing-02-no.01.novalocal 93b77b65-b112-4d36-ab2c-0e34efe8d097 674660 0 2020-08-06 15:12:19 +0800 CST <nil> <nil> map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux kubernetes.io/arch:amd64 kubernetes.io/hostname:turing-02-no.01.novalocal kubernetes.io/os:linux node-role.kubernetes.io/master: nvidia-device-enable:enable] map[kubeadm.alpha.kubernetes.io/cri-socket:/var/run/dockershim.sock node.alpha.kubernetes.io/ttl:0 projectcalico.org/IPv4Address:172.31.236.28/24 projectcalico.org/IPv4IPIPTunnelAddr:100.77.205.64 volumes.kubernetes.io/controller-managed-attach-detach:true] [] nil [] [{kubeadm Update v1 2020-08-06 15:12:22 +0800 CST nil} {kubectl Update v1 2020-08-06 15:31:43 +0800 CST nil} {calico-node Update v1 2020-08-07 10:17:19 
+0800 CST nil} {kube-controller-manager Update v1 2020-08-10 16:54:34 +0800 CST nil} {kubelet Update v1 2020-08-10 16:55:04 +0800 CST nil}]} {100.64.0.0/24 false [] nil } {map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{53465821184 0} {<nil>} 52212716Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>} BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}] map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{52392079360 0} {<nil>} 51164140Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>} BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}] [{NetworkUnavailable False 2020-08-07 10:17:19 +0800 CST 2020-08-07 10:17:19 +0800 CST CalicoIsUp Calico is running on this node} {MemoryPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2020-08-10 16:55:04 +0800 CST 2020-08-10 16:54:35 +0800 CST KubeletReady kubelet is posting ready status}] [{InternalIP 172.31.236.28} {Hostname turing-02-no.01.novalocal}] {{10250}} {da54aa58da54aa58da54aa58da54aa58 587DF4B2-1B95-4717-9775-F696ED79D950 06b1a58b-09b9-4068-9777-7dce4cfea189 3.10.0-957.el7.x86_64 CentOS Linux 7 (Core) docker://18.9.6 v1.18.0 v1.18.0 linux amd64} [{[hub.iflytek.com/turing/ssd_face@sha256:758fda54e4b87e1790ccbc93d842fd077e6fa09c04b7277b20d66a3f82405a13 
hub.iflytek.com/turing/ssd_face:0.1] 5654144739} {[hub.iflytek.com/turing/gaea:1.0.4] 5636811356} {[am_train:1.0] 4440398750} {[honk:1.0] 4107130912} {[172.16.59.153/dlaas/pytorch-py36-cuda9@sha256:1cecf1e8b8f75e16e5514260190e677eb644c0c3d21630b75c0db5e1e5f02521 172.16.59.153/dlaas/pytorch-py36-cuda9:1.0.0] 3789072257} {[nvidia/cuda:10.1-cudnn7-devel-centos7] 3519393408} {[reg.deeplearning.cn/dlaas/cv_dist_openmpi:0.1] 3498923930} {[registry.turing.com:5000/dlaas/pytorch@sha256:1f79d91afb716ef6da8f167829c5fd101959b284028435b92fa3778719a0439b registry.turing.com:5000/dlaas/pytorch:1.5-cuda10.1-cudnn7-runtime] 3140164483} {[registry.turing.com:5000/dlaas/commonjob_centos7.2.1511@sha256:ad1e7f41fb15e4b400e0f4492b56c3b46ef077d7fa110e7515d4446f045eaa91 registry.turing.com:5000/dlaas/commonjob_centos7.2.1511:1.0.4] 3086599556} {[nvidia/cuda:10.1-devel-centos7] 2686646338} {[nvidia/cuda@sha256:86899043c7c1182046cdf9b89ce94ccf35c584b33875ba78316acd883bc6faf8 nvidia/cuda:9.1-devel-centos7] 1834703805} {[tkestack/gpu-manager:1.1.0] 437779384} {[k8s.gcr.io/etcd:3.4.3-0] 288425539} {[centos@sha256:62d9e1c2daa91166139b51577fe4f4f6b4cc41a3a2c7fc36bd895e2a17a3e4e6 centos:7.6.1810] 201756323} {[calico/node:v3.8.2] 188832890} {[k8s.gcr.io/kube-apiserver:v1.18.0] 172962942} {[k8s.gcr.io/kube-controller-manager:v1.18.0] 162366590} {[calico/cni@sha256:4922215c127c18b00c8f5916997259589577c132260181a2c50a093a78564c90 calico/cni:v3.8.2] 157232722} {[k8s.gcr.io/kube-proxy:v1.18.0] 116531001} {[k8s.gcr.io/kube-scheduler:v1.18.0] 95274110} {[calico/kube-controllers:v3.8.2] 46809393} {[k8s.gcr.io/coredns:1.6.7] 43781501} {[calico/pod2daemon-flexvol:v3.8.2] 9366797} {[k8s.gcr.io/pause:3.2] 682696}] [] [] nil}}],} NodeNames:<nil>}
I0810 16:56:47.819866 11350 routes.go:81] GPUQuotaPredicate: extenderFilterResult = {"Nodes":null,"NodeNames":null,"FailedNodes":null,"Error":"pod tts1 had been predicated!"}
The pod also has a FailedScheduling warning event.
Why does the pod show FailedScheduling even though it is actually running? Is there anything I am misunderstanding?