
gpu-admission's Introduction

GPU admission

gpu-admission is a Kubernetes scheduler extender for GPU admission. It provides the following features:

  • provides quota limits according to GPU device type
  • avoids fragmented allocation on a node by working with gpu-manager (see the example pod spec below)
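
Below is a minimal pod spec sketch illustrating the kind of workload gpu-admission schedules. The tencent.com/vcuda-core and tencent.com/vcuda-memory resource names come from gpu-manager; the pod name, image, and values here are only examples (100 vcuda-core corresponds to one physical GPU).

apiVersion: v1
kind: Pod
metadata:
  name: vcuda-example                  # hypothetical name
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:10.1-devel-centos7
    resources:
      requests:
        tencent.com/vcuda-core: "50"   # half of one physical GPU (100 = one GPU)
        tencent.com/vcuda-memory: "14"
      limits:                          # extended resources require limits == requests
        tencent.com/vcuda-core: "50"
        tencent.com/vcuda-memory: "14"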

For more details, please refer to the documents in the docs directory of this project.

1. Build

$ make build

2. Run

2.1 Run gpu-admission.

$ bin/gpu-admission --address=127.0.0.1:3456 --v=4 --kubeconfig <your kubeconfig> --logtostderr=true

Other options

      --address string                   The address it will listen (default "127.0.0.1:3456")
      --alsologtostderr                  log to standard error as well as files
      --kubeconfig string                Path to a kubeconfig. Only required if out-of-cluster.
      --log-backtrace-at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
      --log-dir string                   If non-empty, write log files in this directory
      --log-flush-frequency duration     Maximum number of seconds between log flushes (default 5s)
      --logtostderr                      log to standard error instead of files (default true)
      --master string                    The address of the Kubernetes API server. Overrides any value in kubeconfig. Only required if out-of-cluster.
      --pprofAddress string              The address for debug (default "127.0.0.1:3457")
      --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
  -v, --v Level                          number for the log level verbosity
      --version version[=true]           Print version information and quit
      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging

2.2 Configure the kube-scheduler policy file and run your Kubernetes cluster.

Example for scheduler-policy-config.json:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {
      "name": "PodFitsHostPorts"
    },
    {
      "name": "PodFitsResources"
    },
    {
      "name": "NoDiskConflict"
    },
    {
      "name": "MatchNodeSelector"
    },
    {
      "name": "HostName"
    }
  ],
  "extenders": [
    {
      "urlPrefix": "http://<gpu-admission ip>:<gpu-admission port>/scheduler",
      "apiVersion": "v1beta1",
      "filterVerb": "predicates",
      "enableHttps": false,
      "nodeCacheCapable": false
    }
  ],
  "hardPodAffinitySymmetricWeight": 10,
  "alwaysCheckAllPredicates": false
}

Do not forget to add the corresponding flags to kube-scheduler: --policy-config-file=XXX --use-legacy-policy-config=true, where XXX is the path to the policy file above. Keep this extender as the last of all scheduler extenders.
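
For example, when kube-scheduler is started directly from the command line, the extra flags might look like the following (the kubeconfig and policy file paths are assumptions; adjust them to your cluster):

$ kube-scheduler \
    --kubeconfig=/etc/kubernetes/scheduler.conf \
    --policy-config-file=/etc/kubernetes/scheduler-policy-config.json \
    --use-legacy-policy-config=true

If kube-scheduler runs as a static pod, the same flags typically go into its command list in /etc/kubernetes/manifests/kube-scheduler.yaml, and the policy file must be readable from inside that pod.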

gpu-admission's People

Contributors

genedna, lixiaocheng18, mymneo

gpu-admission's Issues

Scheduling failures on a node with 3 GPUs

I want to implement what is described in this paper: https://ieeexplore.ieee.org/abstract/document/8672318. After a successful deployment, scheduling works fine on a node with four GPUs, but on another machine with only three GPUs pods end up Pending: there are clearly enough resources, yet the scheduler says:

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/4 nodes are available: 1 Insufficient tencent.com/vcuda-core, 3 node(s) didn't match node selector.

Here is the output of kubectl describe node:

[two screenshots of the kubectl describe node output]

As you can see, only two of the GPUs have been allocated.

The node only has three GPUs because one card had a problem and was masked off. The scheduling failure can be worked around to some extent by forcibly restarting kube-scheduler; after a restart things are usually fine for a while, but the same kind of problem comes back later.

Running nvidia-smi topo -mp on the three-GPU machine gives:

	GPU0	GPU1	GPU2	CPU Affinity
GPU0	 X 	SYS	SYS	0-11
GPU1	SYS	 X 	PIX	0-11
GPU2	SYS	PIX	 X 	0-11

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  • Kubernetes version: 1.17.3
  • NVIDIA driver version: 440.59

where docs?

I want to know more details but cannot find any docs. Where are the documents in this project?

pod xxx had been predicated!

I created a pod with vcuda and the pod is running successfully, but I noticed that the gpu-admission component has some error logs.

I0810 16:56:47.809528   11350 util.go:53] Determine if the pod tts1 needs GPU resource
I0810 16:56:47.809971   11350 gpu_predicate.go:379] Quota for namespace default is {Quota:map[] Pool:[]}
I0810 16:56:47.810000   11350 gpu_predicate.go:353] No GPU quota limit for default
I0810 16:56:47.810043   11350 nodeInfo.go:39] debug: NewNodeInfo() creates nodeInfo for turing-02-no.01.novalocal
I0810 16:56:47.810131   11350 util.go:71] Determine if the container nvidia needs GPU resource
I0810 16:56:47.810145   11350 share.go:58] Pick up 0 , cores: 100, memory: 29
I0810 16:56:47.815362   11350 routes.go:71] GPUQuotaPredicate: ExtenderArgs = {Pod:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:tts1,GenerateName:,Namespace:default,SelfLink:/api/v1/namespaces/default/pods/tts1,UID:bdb78144-4ee9-4301-b071-9e31abb3cd41,ResourceVersion:674848,Generation:0,CreationTimestamp:2020-08-10 16:56:47 +0800 CST,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{tencent.com/vcuda-core-limit: 50,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[{kubectl Update v1 2020-08-10 16:56:47 +0800 CST nil}],},Spec:PodSpec{Volumes:[{test {HostPathVolumeSource{Path:/data/shengxu8/vcuda/xtts2.0.1029.P4.1,Type:*Directory,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {default-token-rl2vc {nil nil nil nil nil &SecretVolumeSource{SecretName:default-token-rl2vc,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{nvidia nvidia/cuda:9.1-devel-centos7 [./start.sh] [] /tts [] [] [{LOGGER_LEVEL 5 nil}] {map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}] map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}]} [{test false /tts  <nil> } {default-token-rl2vc true /var/run/secrets/kubernetes.io/serviceaccount  <nil> }] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}],RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:true,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],WindowsOptions:nil,},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists  NoExecute 0xc000408480} {node.kubernetes.io/unreachable Exists  NoExecute 0xc0004084a0}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} Nodes:&NodeList{ListMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Items:[{{ } {turing-02-no.01.novalocal   /api/v1/nodes/turing-02-no.01.novalocal 93b77b65-b112-4d36-ab2c-0e34efe8d097 674660 0 2020-08-06 15:12:19 +0800 CST <nil> <nil> map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux kubernetes.io/arch:amd64 kubernetes.io/hostname:turing-02-no.01.novalocal kubernetes.io/os:linux node-role.kubernetes.io/master: nvidia-device-enable:enable] map[kubeadm.alpha.kubernetes.io/cri-socket:/var/run/dockershim.sock node.alpha.kubernetes.io/ttl:0 projectcalico.org/IPv4Address:172.31.236.28/24 projectcalico.org/IPv4IPIPTunnelAddr:100.77.205.64 volumes.kubernetes.io/controller-managed-attach-detach:true] [] nil []  [{kubeadm Update v1 2020-08-06 15:12:22 +0800 CST nil} {kubectl Update v1 2020-08-06 15:31:43 +0800 CST nil} 
{calico-node Update v1 2020-08-07 10:17:19 +0800 CST nil} {kube-controller-manager Update v1 2020-08-10 16:54:34 +0800 CST nil} {kubelet Update v1 2020-08-10 16:55:04 +0800 CST nil}]} {100.64.0.0/24  false [] nil } {map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{53465821184 0} {<nil>} 52212716Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>}  BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}] map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{52392079360 0} {<nil>} 51164140Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>}  BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}]  [{NetworkUnavailable False 2020-08-07 10:17:19 +0800 CST 2020-08-07 10:17:19 +0800 CST CalicoIsUp Calico is running on this node} {MemoryPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2020-08-10 16:55:04 +0800 CST 2020-08-10 16:54:35 +0800 CST KubeletReady kubelet is posting ready status}] [{InternalIP 172.31.236.28} {Hostname turing-02-no.01.novalocal}] {{10250}} {da54aa58da54aa58da54aa58da54aa58 587DF4B2-1B95-4717-9775-F696ED79D950 06b1a58b-09b9-4068-9777-7dce4cfea189 3.10.0-957.el7.x86_64 CentOS Linux 7 (Core) docker://18.9.6 v1.18.0 v1.18.0 linux amd64} [{[hub.iflytek.com/turing/ssd_face@sha256:758fda54e4b87e1790ccbc93d842fd077e6fa09c04b7277b20d66a3f82405a13 hub.iflytek.com/turing/ssd_face:0.1] 5654144739} {[hub.iflytek.com/turing/gaea:1.0.4] 5636811356} {[am_train:1.0] 4440398750} {[honk:1.0] 4107130912} {[172.16.59.153/dlaas/pytorch-py36-cuda9@sha256:1cecf1e8b8f75e16e5514260190e677eb644c0c3d21630b75c0db5e1e5f02521 172.16.59.153/dlaas/pytorch-py36-cuda9:1.0.0] 3789072257} {[nvidia/cuda:10.1-cudnn7-devel-centos7] 3519393408} {[reg.deeplearning.cn/dlaas/cv_dist_openmpi:0.1] 3498923930} {[registry.turing.com:5000/dlaas/pytorch@sha256:1f79d91afb716ef6da8f167829c5fd101959b284028435b92fa3778719a0439b registry.turing.com:5000/dlaas/pytorch:1.5-cuda10.1-cudnn7-runtime] 3140164483} {[registry.turing.com:5000/dlaas/commonjob_centos7.2.1511@sha256:ad1e7f41fb15e4b400e0f4492b56c3b46ef077d7fa110e7515d4446f045eaa91 registry.turing.com:5000/dlaas/commonjob_centos7.2.1511:1.0.4] 3086599556} {[nvidia/cuda:10.1-devel-centos7] 2686646338} {[nvidia/cuda@sha256:86899043c7c1182046cdf9b89ce94ccf35c584b33875ba78316acd883bc6faf8 nvidia/cuda:9.1-devel-centos7] 1834703805} {[tkestack/gpu-manager:1.1.0] 437779384} {[k8s.gcr.io/etcd:3.4.3-0] 288425539} {[centos@sha256:62d9e1c2daa91166139b51577fe4f4f6b4cc41a3a2c7fc36bd895e2a17a3e4e6 centos:7.6.1810] 201756323} {[calico/node:v3.8.2] 188832890} {[k8s.gcr.io/kube-apiserver:v1.18.0] 172962942} {[k8s.gcr.io/kube-controller-manager:v1.18.0] 162366590} {[calico/cni@sha256:4922215c127c18b00c8f5916997259589577c132260181a2c50a093a78564c90 calico/cni:v3.8.2] 157232722} {[k8s.gcr.io/kube-proxy:v1.18.0] 116531001} 
{[k8s.gcr.io/kube-scheduler:v1.18.0] 95274110} {[calico/kube-controllers:v3.8.2] 46809393} {[k8s.gcr.io/coredns:1.6.7] 43781501} {[calico/pod2daemon-flexvol:v3.8.2] 9366797} {[k8s.gcr.io/pause:3.2] 682696}] [] [] nil}}],} NodeNames:<nil>}
I0810 16:56:47.816226   11350 routes.go:81] GPUQuotaPredicate: extenderFilterResult = {"Nodes":{"metadata":{},"items":[{"metadata":{"name":"turing-02-no.01.novalocal","selfLink":"/api/v1/nodes/turing-02-no.01.novalocal","uid":"93b77b65-b112-4d36-ab2c-0e34efe8d097","resourceVersion":"674660","creationTimestamp":"2020-08-06T07:12:19Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"turing-02-no.01.novalocal","kubernetes.io/os":"linux","node-role.kubernetes.io/master":"","nvidia-device-enable":"enable"},"annotations":{"kubeadm.alpha.kubernetes.io/cri-socket":"/var/run/dockershim.sock","node.alpha.kubernetes.io/ttl":"0","projectcalico.org/IPv4Address":"172.31.236.28/24","projectcalico.org/IPv4IPIPTunnelAddr":"100.77.205.64","volumes.kubernetes.io/controller-managed-attach-detach":"true"},"managedFields":[{"manager":"kubeadm","operation":"Update","apiVersion":"v1","time":"2020-08-06T07:12:22Z"},{"manager":"kubectl","operation":"Update","apiVersion":"v1","time":"2020-08-06T07:31:43Z"},{"manager":"calico-node","operation":"Update","apiVersion":"v1","time":"2020-08-07T02:17:19Z"},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2020-08-10T08:54:34Z"},{"manager":"kubelet","operation":"Update","apiVersion":"v1","time":"2020-08-10T08:55:04Z"}]},"spec":{"podCIDR":"100.64.0.0/24"},"status":{"capacity":{"cpu":"32","ephemeral-storage":"52212716Ki","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"263972712Ki","pods":"110","tencent.com/vcuda-core":"100","tencent.com/vcuda-memory":"29"},"allocatable":{"cpu":"32","ephemeral-storage":"51164140Ki","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"263972712Ki","pods":"110","tencent.com/vcuda-core":"100","tencent.com/vcuda-memory":"29"},"conditions":[{"type":"NetworkUnavailable","status":"False","lastHeartbeatTime":"2020-08-07T02:17:19Z","lastTransitionTime":"2020-08-07T02:17:19Z","reason":"CalicoIsUp","message":"Calico is running on this node"},{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-06T07:36:32Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-06T07:36:32Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-06T07:36:32Z","reason":"KubeletHasSufficientPID","message":"kubelet has sufficient PID available"},{"type":"Ready","status":"True","lastHeartbeatTime":"2020-08-10T08:55:04Z","lastTransitionTime":"2020-08-10T08:54:35Z","reason":"KubeletReady","message":"kubelet is posting ready status"}],"addresses":[{"type":"InternalIP","address":"172.31.236.28"},{"type":"Hostname","address":"turing-02-no.01.novalocal"}],"daemonEndpoints":{"kubeletEndpoint":{"Port":10250}},"nodeInfo":{"machineID":"da54aa58da54aa58da54aa58da54aa58","systemUUID":"587DF4B2-1B95-4717-9775-F696ED79D950","bootID":"06b1a58b-09b9-4068-9777-7dce4cfea189","kernelVersion":"3.10.0-957.el7.x86_64","osImage":"CentOS Linux 7 
(Core)","containerRuntimeVersion":"docker://18.9.6","kubeletVersion":"v1.18.0","kubeProxyVersion":"v1.18.0","operatingSystem":"linux","architecture":"amd64"},"images":[{"names":["hub.iflytek.com/turing/ssd_face@sha256:758fda54e4b87e1790ccbc93d842fd077e6fa09c04b7277b20d66a3f82405a13","hub.iflytek.com/turing/ssd_face:0.1"],"sizeBytes":5654144739},{"names":["hub.iflytek.com/turing/gaea:1.0.4"],"sizeBytes":5636811356},{"names":["am_train:1.0"],"sizeBytes":4440398750},{"names":["honk:1.0"],"sizeBytes":4107130912},{"names":["172.16.59.153/dlaas/pytorch-py36-cuda9@sha256:1cecf1e8b8f75e16e5514260190e677eb644c0c3d21630b75c0db5e1e5f02521","172.16.59.153/dlaas/pytorch-py36-cuda9:1.0.0"],"sizeBytes":3789072257},{"names":["nvidia/cuda:10.1-cudnn7-devel-centos7"],"sizeBytes":3519393408},{"names":["reg.deeplearning.cn/dlaas/cv_dist_openmpi:0.1"],"sizeBytes":3498923930},{"names":["registry.turing.com:5000/dlaas/pytorch@sha256:1f79d91afb716ef6da8f167829c5fd101959b284028435b92fa3778719a0439b","registry.turing.com:5000/dlaas/pytorch:1.5-cuda10.1-cudnn7-runtime"],"sizeBytes":3140164483},{"names":["registry.turing.com:5000/dlaas/commonjob_centos7.2.1511@sha256:ad1e7f41fb15e4b400e0f4492b56c3b46ef077d7fa110e7515d4446f045eaa91","registry.turing.com:5000/dlaas/commonjob_centos7.2.1511:1.0.4"],"sizeBytes":3086599556},{"names":["nvidia/cuda:10.1-devel-centos7"],"sizeBytes":2686646338},{"names":["nvidia/cuda@sha256:86899043c7c1182046cdf9b89ce94ccf35c584b33875ba78316acd883bc6faf8","nvidia/cuda:9.1-devel-centos7"],"sizeBytes":1834703805},{"names":["tkestack/gpu-manager:1.1.0"],"sizeBytes":437779384},{"names":["k8s.gcr.io/etcd:3.4.3-0"],"sizeBytes":288425539},{"names":["centos@sha256:62d9e1c2daa91166139b51577fe4f4f6b4cc41a3a2c7fc36bd895e2a17a3e4e6","centos:7.6.1810"],"sizeBytes":201756323},{"names":["calico/node:v3.8.2"],"sizeBytes":188832890},{"names":["k8s.gcr.io/kube-apiserver:v1.18.0"],"sizeBytes":172962942},{"names":["k8s.gcr.io/kube-controller-manager:v1.18.0"],"sizeBytes":162366590},{"names":["calico/cni@sha256:4922215c127c18b00c8f5916997259589577c132260181a2c50a093a78564c90","calico/cni:v3.8.2"],"sizeBytes":157232722},{"names":["k8s.gcr.io/kube-proxy:v1.18.0"],"sizeBytes":116531001},{"names":["k8s.gcr.io/kube-scheduler:v1.18.0"],"sizeBytes":95274110},{"names":["calico/kube-controllers:v3.8.2"],"sizeBytes":46809393},{"names":["k8s.gcr.io/coredns:1.6.7"],"sizeBytes":43781501},{"names":["calico/pod2daemon-flexvol:v3.8.2"],"sizeBytes":9366797},{"names":["k8s.gcr.io/pause:3.2"],"sizeBytes":682696}]}}]},"NodeNames":null,"FailedNodes":{},"Error":""}
I0810 16:56:47.818814   11350 util.go:53] Determine if the pod tts1 needs GPU resource
I0810 16:56:47.819235   11350 gpu_predicate.go:379] Quota for namespace default is {Quota:map[] Pool:[]}
I0810 16:56:47.819251   11350 gpu_predicate.go:353] No GPU quota limit for default
I0810 16:56:47.819261   11350 routes.go:71] GPUQuotaPredicate: ExtenderArgs = {Pod:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:tts1,GenerateName:,Namespace:default,SelfLink:/api/v1/namespaces/default/pods/tts1,UID:bdb78144-4ee9-4301-b071-9e31abb3cd41,ResourceVersion:674849,Generation:0,CreationTimestamp:2020-08-10 16:56:47 +0800 CST,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{tencent.com/gpu-assigned: false,tencent.com/predicate-gpu-idx-0: 0,tencent.com/predicate-node: turing-02-no.01.novalocal,tencent.com/predicate-time: 1597049807810150433,tencent.com/vcuda-core-limit: 50,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[{gpu-admission Update v1 2020-08-10 16:56:47 +0800 CST nil} {kubectl Update v1 2020-08-10 16:56:47 +0800 CST nil}],},Spec:PodSpec{Volumes:[{test {HostPathVolumeSource{Path:/data/shengxu8/vcuda/xtts2.0.1029.P4.1,Type:*Directory,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {default-token-rl2vc {nil nil nil nil nil &SecretVolumeSource{SecretName:default-token-rl2vc,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{nvidia nvidia/cuda:9.1-devel-centos7 [./start.sh] [] /tts [] [] [{LOGGER_LEVEL 5 nil}] {map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}] map[tencent.com/vcuda-core:{{50 0} {<nil>} 50 DecimalSI} tencent.com/vcuda-memory:{{14 0} {<nil>} 14 DecimalSI}]} [{test false /tts  <nil> } {default-token-rl2vc true /var/run/secrets/kubernetes.io/serviceaccount  <nil> }] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}],RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:true,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],WindowsOptions:nil,},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists  NoExecute 0xc00060c4e0} {node.kubernetes.io/unreachable Exists  NoExecute 0xc00060c500}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} Nodes:&NodeList{ListMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Items:[{{ } {turing-02-no.01.novalocal   /api/v1/nodes/turing-02-no.01.novalocal 93b77b65-b112-4d36-ab2c-0e34efe8d097 674660 0 2020-08-06 15:12:19 +0800 CST <nil> <nil> map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux kubernetes.io/arch:amd64 kubernetes.io/hostname:turing-02-no.01.novalocal kubernetes.io/os:linux node-role.kubernetes.io/master: nvidia-device-enable:enable] map[kubeadm.alpha.kubernetes.io/cri-socket:/var/run/dockershim.sock node.alpha.kubernetes.io/ttl:0 projectcalico.org/IPv4Address:172.31.236.28/24 
projectcalico.org/IPv4IPIPTunnelAddr:100.77.205.64 volumes.kubernetes.io/controller-managed-attach-detach:true] [] nil []  [{kubeadm Update v1 2020-08-06 15:12:22 +0800 CST nil} {kubectl Update v1 2020-08-06 15:31:43 +0800 CST nil} {calico-node Update v1 2020-08-07 10:17:19 +0800 CST nil} {kube-controller-manager Update v1 2020-08-10 16:54:34 +0800 CST nil} {kubelet Update v1 2020-08-10 16:55:04 +0800 CST nil}]} {100.64.0.0/24  false [] nil } {map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{53465821184 0} {<nil>} 52212716Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>}  BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}] map[cpu:{{32 0} {<nil>} 32 DecimalSI} ephemeral-storage:{{52392079360 0} {<nil>} 51164140Ki BinarySI} hugepages-1Gi:{{0 0} {<nil>} 0 DecimalSI} hugepages-2Mi:{{0 0} {<nil>} 0 DecimalSI} memory:{{270308057088 0} {<nil>}  BinarySI} pods:{{110 0} {<nil>} 110 DecimalSI} tencent.com/vcuda-core:{{100 0} {<nil>} 100 DecimalSI} tencent.com/vcuda-memory:{{29 0} {<nil>} 29 DecimalSI}]  [{NetworkUnavailable False 2020-08-07 10:17:19 +0800 CST 2020-08-07 10:17:19 +0800 CST CalicoIsUp Calico is running on this node} {MemoryPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2020-08-10 16:55:04 +0800 CST 2020-08-06 15:36:32 +0800 CST KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2020-08-10 16:55:04 +0800 CST 2020-08-10 16:54:35 +0800 CST KubeletReady kubelet is posting ready status}] [{InternalIP 172.31.236.28} {Hostname turing-02-no.01.novalocal}] {{10250}} {da54aa58da54aa58da54aa58da54aa58 587DF4B2-1B95-4717-9775-F696ED79D950 06b1a58b-09b9-4068-9777-7dce4cfea189 3.10.0-957.el7.x86_64 CentOS Linux 7 (Core) docker://18.9.6 v1.18.0 v1.18.0 linux amd64} [{[hub.iflytek.com/turing/ssd_face@sha256:758fda54e4b87e1790ccbc93d842fd077e6fa09c04b7277b20d66a3f82405a13 hub.iflytek.com/turing/ssd_face:0.1] 5654144739} {[hub.iflytek.com/turing/gaea:1.0.4] 5636811356} {[am_train:1.0] 4440398750} {[honk:1.0] 4107130912} {[172.16.59.153/dlaas/pytorch-py36-cuda9@sha256:1cecf1e8b8f75e16e5514260190e677eb644c0c3d21630b75c0db5e1e5f02521 172.16.59.153/dlaas/pytorch-py36-cuda9:1.0.0] 3789072257} {[nvidia/cuda:10.1-cudnn7-devel-centos7] 3519393408} {[reg.deeplearning.cn/dlaas/cv_dist_openmpi:0.1] 3498923930} {[registry.turing.com:5000/dlaas/pytorch@sha256:1f79d91afb716ef6da8f167829c5fd101959b284028435b92fa3778719a0439b registry.turing.com:5000/dlaas/pytorch:1.5-cuda10.1-cudnn7-runtime] 3140164483} {[registry.turing.com:5000/dlaas/commonjob_centos7.2.1511@sha256:ad1e7f41fb15e4b400e0f4492b56c3b46ef077d7fa110e7515d4446f045eaa91 registry.turing.com:5000/dlaas/commonjob_centos7.2.1511:1.0.4] 3086599556} {[nvidia/cuda:10.1-devel-centos7] 2686646338} {[nvidia/cuda@sha256:86899043c7c1182046cdf9b89ce94ccf35c584b33875ba78316acd883bc6faf8 nvidia/cuda:9.1-devel-centos7] 1834703805} {[tkestack/gpu-manager:1.1.0] 437779384} {[k8s.gcr.io/etcd:3.4.3-0] 288425539} {[centos@sha256:62d9e1c2daa91166139b51577fe4f4f6b4cc41a3a2c7fc36bd895e2a17a3e4e6 centos:7.6.1810] 201756323} {[calico/node:v3.8.2] 188832890} {[k8s.gcr.io/kube-apiserver:v1.18.0] 172962942} 
{[k8s.gcr.io/kube-controller-manager:v1.18.0] 162366590} {[calico/cni@sha256:4922215c127c18b00c8f5916997259589577c132260181a2c50a093a78564c90 calico/cni:v3.8.2] 157232722} {[k8s.gcr.io/kube-proxy:v1.18.0] 116531001} {[k8s.gcr.io/kube-scheduler:v1.18.0] 95274110} {[calico/kube-controllers:v3.8.2] 46809393} {[k8s.gcr.io/coredns:1.6.7] 43781501} {[calico/pod2daemon-flexvol:v3.8.2] 9366797} {[k8s.gcr.io/pause:3.2] 682696}] [] [] nil}}],} NodeNames:<nil>}
I0810 16:56:47.819866   11350 routes.go:81] GPUQuotaPredicate: extenderFilterResult = {"Nodes":null,"NodeNames":null,"FailedNodes":null,"Error":"pod tts1 had been predicated!"}

The pod also has the following warning event.

Events:
  Type     Reason            Age                   From                                Message
  ----     ------            ----                  ----                                -------
  Warning  FailedScheduling  <unknown>             default-scheduler                   pod tts1 had been predicated!

Why does the pod show FailedScheduling even though it is running? Is there anything I'm misunderstanding?

Error when running build-img.sh

I ran build-img.sh in the hack folder but got the following error:

+++ [1008 04:24:10] Generating image...
Sending build context to Docker daemon  9.728kB
Step 1/21 : FROM centos:7 as build
 ---> eeb6ee3f44bd
Step 2/21 : ARG version
 ---> Using cache
 ---> 531c2de10efc
Step 3/21 : ARG commit
 ---> Using cache
 ---> 6fc182aa9c69
Step 4/21 : RUN yum install -y rpm-build make
 ---> Using cache
 ---> 31807ee37af1
Step 5/21 : ENV GOLANG_VERSION 1.13.4
 ---> Using cache
 ---> 40954c916844
Step 6/21 : RUN curl -sSL https://dl.google.com/go/go${GOLANG_VERSION}.linux-amd64.tar.gz     | tar -C /usr/local -xz
 ---> Using cache
 ---> 0449fd06b22d
Step 7/21 : ENV GOPATH /go
 ---> Using cache
 ---> 9f0244b0b788
Step 8/21 : ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
 ---> Using cache
 ---> 4c390d59d877
Step 9/21 : RUN mkdir -p /root/rpmbuild/{SPECS,SOURCES}
 ---> Using cache
 ---> 8ceb116f7df8
Step 10/21 : COPY gpu-admission.spec /root/rpmbuild/SPECS
 ---> Using cache
 ---> f9c2b9e3038d
Step 11/21 : COPY gpu-admission-source.tar.gz /root/rpmbuild/SOURCES
 ---> 27e7f6056f67
Step 12/21 : RUN echo '%_topdir /root/rpmbuild' > /root/.rpmmacros           && echo '%__os_install_post %{nil}' >> /root/.rpmmacros                   && echo '%debug_package %{nil}' >> /root/.rpmmacros
 ---> Running in 9a7e7e8c1fe8
Removing intermediate container 9a7e7e8c1fe8
 ---> 05e83bb47665
Step 13/21 : WORKDIR /root/rpmbuild/SPECS
 ---> Running in c5dd17e7b575
Removing intermediate container c5dd17e7b575
 ---> 13b2d9326e03
Step 14/21 : RUN rpmbuild -ba --quiet   --define 'version '${version}''   --define 'commit '${commit}''   gpu-admission.spec
 ---> Running in dc272d3f908d
/usr/bin/tar: Removing leading `/' from member names
make: *** No rule to make target `all'.  Stop.
error: Bad exit status from /var/tmp/rpm-tmp.ExwRfK (%build)
    Bad exit status from /var/tmp/rpm-tmp.ExwRfK (%build)
The command '/bin/sh -c rpmbuild -ba --quiet   --define 'version '${version}''   --define 'commit '${commit}''   gpu-admission.spec' returned a non-zero code: 1

How can I solve this?
Thank you.

I couldn't understand how to run the scheduler.

Hi there, I'm new to this field and couldn't fully understand the README file. Could you give me some pointers on a couple of questions?

The first question is: how do I run the scheduler with --policy-config-file=XXX --use-legacy-policy-config=true?

The second is: what is XXX in --policy-config-file=XXX?

Any help would be appreciated!

Pod stays Pending while the node has enough resources

What happened:

We created a GPU deployment gpu-work; the resources requested by the deployment are:

    Limits:
      tencent.com/vcuda-core:    10
      tencent.com/vcuda-memory:  1
    Requests:
      cpu:                       200m
      memory:                    256Mi
      tencent.com/vcuda-core:    10
      tencent.com/vcuda-memory:  1

When we scale the replicas from 0 to 3, 5, and 7 in sequence, we end up with 5 running pods and 2 pending pods, even though the node has enough resources.

NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE                 NOMINATED NODE   READINESS GATES
gpu-work-85bb88797b-2fqht   1/1     Running   0          9s    10.244.99.247   workstation-master   <none>           <none>
gpu-work-85bb88797b-2gdst   1/1     Running   0          9s    10.244.99.196   workstation-master   <none>           <none>
gpu-work-85bb88797b-8j4lt   0/1     Pending   0          7s    <none>          <none>               <none>           <none>
gpu-work-85bb88797b-gvzd4   1/1     Running   0          10s   10.244.99.241   workstation-master   <none>           <none>
gpu-work-85bb88797b-m292c   0/1     Pending   0          7s    <none>          <none>               <none>           <none>
gpu-work-85bb88797b-t5qtr   1/1     Running   0          10s   10.244.99.198   workstation-master   <none>           <none>
gpu-work-85bb88797b-ws5wl   1/1     Running   0          10s   10.244.99.252   workstation-master   <none>           <none>

The event for pod gpu-work-85bb88797b-8j4lt is:

16s         Warning   FailedScheduling    pod/gpu-work-85bb88797b-8j4lt    0/5 nodes are available: 4 Insufficient tencent.com/vcuda-core, 5 Insufficient tencent.com/vcuda-memory.

However, the node workstation-master has enough resources:

Name:               workstation-master
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=workstation-master
                    kubernetes.io/os=linux
                    nvidia-device-enable=enable
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.31.140/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 24 May 2020 16:24:02 +0800
Taints:             tencent.com/vcuda-core=1:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  workstation-master
  AcquireTime:     <unset>
  RenewTime:       Wed, 24 Jun 2020 11:14:31 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 09 Jun 2020 15:39:46 +0800   Tue, 09 Jun 2020 15:39:46 +0800   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Wed, 24 Jun 2020 11:11:27 +0800   Thu, 18 Jun 2020 16:24:47 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Wed, 24 Jun 2020 11:11:27 +0800   Thu, 18 Jun 2020 16:38:00 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Wed, 24 Jun 2020 11:11:27 +0800   Thu, 18 Jun 2020 16:24:47 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Wed, 24 Jun 2020 11:11:27 +0800   Thu, 18 Jun 2020 16:24:47 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.31.140
  Hostname:    workstation-master
Capacity:
  cpu:                       24
  ephemeral-storage:         307175Mi
  hugepages-1Gi:             0
  hugepages-2Mi:             0
  memory:                    65807052Ki
  pods:                      110
  tencent.com/vcuda-core:    100
  tencent.com/vcuda-memory:  7
Allocatable:
  cpu:                       24
  ephemeral-storage:         306975Mi
  hugepages-1Gi:             0
  hugepages-2Mi:             0
  memory:                    65807052Ki
  pods:                      110
  tencent.com/vcuda-core:    100
  tencent.com/vcuda-memory:  7
System Info:
  Machine ID:                 c52fbc796b1a437c91a15fffc3cedcd9
  System UUID:                03000200-0400-0500-0006-000700080009
  Boot ID:                    f3c21d46-bb36-4b42-8eeb-1801c328ef84
  Kernel Version:             4.19.12-1.el7.elrepo.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.8
  Kubelet Version:            v1.17.3
  Kube-Proxy Version:         v1.17.3
PodCIDR:                      10.244.6.0/24
PodCIDRs:                     10.244.6.0/24
Non-terminated Pods:          (10 in total)
  Namespace                   Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                              ------------  ----------  ---------------  -------------  ---
  gpu                         gpu-work-85bb88797b-2fqht         200m (0%)     0 (0%)      256Mi (0%)       0 (0%)         2m46s
  gpu                         gpu-work-85bb88797b-2gdst         200m (0%)     0 (0%)      256Mi (0%)       0 (0%)         2m46s
  gpu                         gpu-work-85bb88797b-gvzd4         200m (0%)     0 (0%)      256Mi (0%)       0 (0%)         2m47s
  gpu                         gpu-work-85bb88797b-t5qtr         200m (0%)     0 (0%)      256Mi (0%)       0 (0%)         2m47s
  gpu                         gpu-work-85bb88797b-ws5wl         200m (0%)     0 (0%)      256Mi (0%)       0 (0%)         2m47s
  kube-system                 calico-node-6sfwf                 250m (1%)     0 (0%)      0 (0%)           0 (0%)         30d
  kube-system                 gpu-admission-5dfb8754fd-fhz6f    0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  kube-system                 gpu-manager-daemonset-pw79l       0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d15h
  kube-system                 kube-proxy-jcw4b                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         30d
  monitoring                  node-exporter-jktjx               112m (0%)     270m (1%)   200Mi (0%)       220Mi (0%)     5d18h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                  Requests     Limits
  --------                  --------     ------
  cpu                       1362m (5%)   270m (1%)
  memory                    1480Mi (2%)  220Mi (0%)
  ephemeral-storage         0 (0%)       0 (0%)
  hugepages-1Gi             0 (0%)       0 (0%)
  hugepages-2Mi             0 (0%)       0 (0%)
  tencent.com/vcuda-core    50           50
  tencent.com/vcuda-memory  5            5

What you expected to happen:

The pods should be scheduled successfully.

How to reproduce it (as minimally and precisely as possible):

The problem appears under the following conditions:

  1. We have extended the default scheduler's predicate interface.
  2. We consecutively scale up the GPU deployment in steps of 2.

Environment:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Others

We think the root cause of this problem is the same as in this issue. When gpu-admission receives a predicate request for a pod whose annotations it has already set, it responds with an empty filteredMap, which triggers this issue.

Fail on startup

Hello!

I'm trying to use gpu-admission with gpu-manager, installing it on the master node of my cluster with gpu-admission.yaml.

But the pod fails with an error.

F1026 16:50:08.270952       1 main.go:83] Failed to new gpu quota filter: invalid GPUFilter config in file /etc/kubernetes/gpu-admission.config
goroutine 1 [running]:
k8s.io/klog.stacks(0xc0002f9e00, 0xc000312000, 0x90, 0xe5)
	/go/pkg/mod/k8s.io/[email protected]/klog.go:855 +0xb8
k8s.io/klog.(*loggingT).output(0x1ebf960, 0xc000000003, 0xc00030a4d0, 0x1e29c93, 0x7, 0x53, 0x0)
	/go/pkg/mod/k8s.io/[email protected]/klog.go:806 +0x351
k8s.io/klog.(*loggingT).printf(0x1ebf960, 0x3, 0x13ed31a, 0x22, 0xc00016df10, 0x1, 0x1)
	/go/pkg/mod/k8s.io/[email protected]/klog.go:705 +0x14b
k8s.io/klog.Fatalf(...)
	/go/pkg/mod/k8s.io/[email protected]/klog.go:1256
main.main()
	/root/rpmbuild/BUILD/gpu-admission-unknown/main.go:83 +0x3ff

I am using the default gpu-admission.config:

{
	"QuotaConfigMapName": "gpuquota",
	"QuotaConfigMapNamespace": "kube-system",
	"GPUModelLabel": "gaia.tencent.com/gpu-model",
	"GPUPoolLabel": "gaia.tencent.com/gpu-pool"
}

Can anyone help me?

Why choose only one node?

Hey guys, can anyone tell me why deviceFilter chooses one and only one node to fulfill the request? I don't think that makes sense.

Scheduler doesn't work with affinity

Hi! Can anyone help me with this?

When I use this scheduler-policy-config.json file:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {
      "name": "PodFitsHostPorts"
    },
    {
      "name": "PodFitsResources"
    },
    {
      "name": "NoDiskConflict"
    },
    {
      "name": "MatchNodeSelector"
    },
    {
      "name": "HostName"
    }
  ],
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:3456/scheduler",
      "apiVersion": "v1beta1",
      "filterVerb": "predicates",
      "enableHttps": false,
      "nodeCacheCapable": false
    }
  ],
  "hardPodAffinitySymmetricWeight": 10,
  "alwaysCheckAllPredicates": false
}

The scheduler doesn't use or respect my pod affinity rule:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: release
            operator: In
            values:
            - appgpu
        topologyKey: kubernetes.io/hostname
      weight: 100

I don't want more than one replica pod on the same node.

Thanks!

The active pods considered by gpu-admission and gpu-manager are inconsistent

Hi @mYmNeo, I sometimes find that if a GPU pod is created while some GPU pods are being deleted or terminating, UnexpectedAdmissionError appears a little more frequently. I observed that the logic gpu-admission uses to get the active GPU pods on a node is different from gpu-manager's. When gpu-admission gets the active pods, it seems to treat pods that are being deleted as still occupying their GPUs, while gpu-manager excludes these pods. So I think their logic for getting active pods should be made consistent, to reduce the occurrence of UnexpectedAdmissionError caused by inconsistent GPU selection.
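
A minimal sketch of the kind of filtering being suggested here, assuming k8s.io/api/core/v1: treat pods that are terminating or already finished as no longer occupying GPUs, which is roughly how gpu-manager appears to count them. This illustrates the idea only and is not the actual gpu-admission code.

import v1 "k8s.io/api/core/v1"

// isActiveGPUPod reports whether a pod should still be counted as occupying
// GPU resources on a node. Pods with a DeletionTimestamp (terminating) or in
// a terminal phase are excluded.
func isActiveGPUPod(pod *v1.Pod) bool {
    if pod.DeletionTimestamp != nil {
        return false
    }
    if pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed {
        return false
    }
    return true
}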

More detailed documentation for running on a kubernetes cluster

Attempting to run gpu-admission on a Kubernetes cluster doesn't appear to work right out of the box, and how to do it isn't described in the documentation.
I built the image from the Dockerfile, but the run command in it is:

CMD ["/bin/bash", "-c", "/usr/bin/gpu-admission --kubeconfig=/etc/kubernetes/kube-scheduler/kubeconfig --config=/etc/kubernetes/gpu-admission.config --address=0.0.0.0:3456 --v=$LOG_LEVEL --logtostderr=false --log-dir=/var/log/gpu-admission $EXTRA_FLAGS"]

Does this have to run on the control plane? I am attempting to run this on the managed Kubernetes service EKS and can't access the control plane nodes. To get around this I am running a second scheduler on a worker node. The config does not exist at /etc/kubernetes/gpu-admission.config when I deploy using the YAML file. Does the Dockerfile need to be modified to create this directory and file on the node?

Why use Capacity instead of Allocatable?

Why use node.Status.Capacity instead of node.Status.Allocatable? Doesn't Allocatable determine what is actually schedulable? I feel there will be a problem when node.Status.Allocatable is less than node.Status.Capacity (perhaps due to a health check by the device plugin), because gpu-admission then thinks there is still a GPU that can be scheduled when in fact there is not. Wouldn't that cause problems when a container is allocated a GPU on that node?

func GetCapacityOfNode(node *v1.Node, resourceName string) int {
    val, ok := node.Status.Capacity[v1.ResourceName(resourceName)]
    if !ok {
        return 0
    }
    return int(val.Value())
}

// GetGPUDeviceCountOfNode returns the number of GPU devices
func GetGPUDeviceCountOfNode(node *v1.Node) int {
    val, ok := node.Status.Capacity[VCoreAnnotation]
    if !ok {
        return 0
    }
    return int(val.Value()) / HundredCore
}
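
For comparison, a sketch of what an Allocatable-based variant might look like, assuming the same package, imports, and VCoreAnnotation/HundredCore constants as the code quoted above. This is the reporter's suggestion, not the project's current behavior.

// GetAllocatableOfNode is a hypothetical variant of GetCapacityOfNode that
// reads node.Status.Allocatable, so devices removed by the device plugin
// (for example after a failed health check) are not counted as schedulable.
func GetAllocatableOfNode(node *v1.Node, resourceName string) int {
    val, ok := node.Status.Allocatable[v1.ResourceName(resourceName)]
    if !ok {
        return 0
    }
    return int(val.Value())
}

// GetAllocatableGPUCountOfNode would then report the number of usable GPUs.
func GetAllocatableGPUCountOfNode(node *v1.Node) int {
    val, ok := node.Status.Allocatable[VCoreAnnotation]
    if !ok {
        return 0
    }
    return int(val.Value()) / HundredCore
}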

update pod annotation failed

When I test gpu-admission on k8s v1.13.5, I get the following error:

I0814 08:36:19.986356       1 gpu_predicate.go:493] failed to add annotation map[tencent.com/gpu-assigned:false tencent.com/predicate-gpu-idx-0:0 tencent.com/predicate-node:ai-1080ti-15 tencent.com/predicate-time:1597394179983794058] to pod 9a3a7c36-dd45-11ea-8e57-6c92bf66acae due to pods "test33" not found
I0814 08:36:19.986380       1 util.go:71] Determine if the container test33 needs GPU resource
I0814 08:36:19.986394       1 share.go:58] Pick up 0 , cores: 100, memory: 43
I0814 08:36:19.988944       1 gpu_predicate.go:493] failed to add annotation map[tencent.com/gpu-assigned:false tencent.com/predicate-gpu-idx-0:0 tencent.com/predicate-node:ai-1080ti-57 tencent.com/predicate-time:1597394179986399567] to pod 9a3a7c36-dd45-11ea-8e57-6c92bf66acae due to pods "test33" not found
I0814 08:36:19.988971       1 util.go:71] Determine if the container test33 needs GPU resource
I0814 08:36:19.988986       1 share.go:58] Pick up 0 , cores: 100, memory: 43
I0814 08:36:19.991268       1 gpu_predicate.go:493] failed to add annotation map[tencent.com/gpu-assigned:false tencent.com/predicate-gpu-idx-0:0 tencent.com/predicate-node:ai-1080ti-62 tencent.com/predicate-time:1597394179988992239] to pod 9a3a7c36-dd45-11ea-8e57-6c92bf66acae due to pods "test33" not found
...
I0814 08:36:19.992368       1 routes.go:81] GPUQuotaPredicate: extenderFilterResult = {"Nodes":{"metadata":{},"items":[]},"NodeNames":null,"FailedNodes":{"ai-1080ti-15":"update pod annotation failed","ai-1080ti-57":"update pod annotation failed","ai-1080ti-62":"update pod annotation failed"},"Error":""}

  • The pod YAML:
Name:               test33
Namespace:          danlu-efficiency
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             <none>
Annotations:        <none>
Status:             Pending
IP:
NominatedNodeName:  ai-1080ti-62
Containers:
  test33:
    Image:      danlu/tensorflow:tf1.9.0_py2_gpu_v0.1
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      sleep 100000000
    Limits:
      tencent.com/vcuda-core:    10
      tencent.com/vcuda-memory:  30
    Requests:
      tencent.com/vcuda-core:    10
      tencent.com/vcuda-memory:  30
    Environment:                 <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p6lfp (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-p6lfp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p6lfp
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                     From           Message
  ----     ------            ----                    ----           -------
  Warning  FailedScheduling  4m56s (x1581 over 19h)  gpu-admission  0/16 nodes are available: 1 node(s) were unschedulable, 12 Insufficient tencent.com/vcuda-core, 12 Insufficient tencent.com/vcuda-memory, 3 update pod annotation failed.

Nodes information:

  • ai-1080ti-15
qa-jenkins@fuxi-qa-3:~/vgpu$ kubectl get nodes
NAME           STATUS                     ROLES             AGE    VERSION
ai-1080ti-15   Ready                      nvidia            463d   v1.13.3
ai-1080ti-57   Ready                      1080ti            463d   v1.13.3
ai-1080ti-62   Ready                      nvidia418         442d   v1.13.5
fuxi-dl-42     Ready                      <none>            302d   v1.13.5
fuxi-dl-46     Ready                      <none>            464d   v1.13.3
fuxi-dl-47     Ready                      <none>            464d   v1.13.3
fuxi-dl-48     Ready                      <none>            442d   v1.13.5
fuxi-qa-10g    Ready                      1080ti,training   414d   v1.13.5
fuxi-qa-12g    Ready                      nvidia            414d   v1.13.5
fuxi-qa-14     Ready                      <none>            353d   v1.13.5
fuxi-qa-15     Ready                      <none>            353d   v1.13.5
fuxi-qa-16     Ready                      <none>            309d   v1.13.5
fuxi-qa-3      Ready,SchedulingDisabled   master            603d   v1.13.5
fuxi-qa-4      Ready                      <none>            464d   v1.13.3
fuxi-qa-5      Ready                      <none>            464d   v1.13.3
fuxi-qa-8g     Ready                      nvidia            464d   v1.13.3

NOTE: The nodes beginning with 'ai' are GPU nodes and are labeled with 'nvidia-device-enable=enable'. Some information about the GPU nodes follows:

Name:               ai-1080ti-15
Roles:              nvidia
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    hardware=NVIDIAGPU
                    hardware-type=NVIDIAGPU
                    kubernetes.io/hostname=ai-1080ti-15
                    node-role.kubernetes.io/nvidia=GPU
                    nvidia-device-enable=enable
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.200.0.72/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 08 May 2019 14:28:18 +0800
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------    -----------------                 ------------------                ------                       -------
  MemoryPressure   False     Fri, 14 Aug 2020 17:54:08 +0800   Thu, 30 Apr 2020 18:26:46 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False     Fri, 14 Aug 2020 17:54:08 +0800   Thu, 30 Apr 2020 18:26:46 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False     Fri, 14 Aug 2020 17:54:08 +0800   Thu, 30 Apr 2020 18:26:46 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True      Fri, 14 Aug 2020 17:54:08 +0800   Thu, 30 Apr 2020 18:26:46 +0800   KubeletReady                 kubelet is posting ready status
  OutOfDisk        Unknown   Wed, 08 May 2019 14:28:18 +0800   Wed, 08 May 2019 14:33:54 +0800   NodeStatusNeverUpdated       Kubelet never posted node status.
Addresses:
  InternalIP:  10.200.0.72
  Hostname:    ai-1080ti-15
Capacity:
 cpu:                       56
 ephemeral-storage:         1153070996Ki
 hugepages-1Gi:             0
 hugepages-2Mi:             0
 memory:                    264030000Ki
 nvidia.com/gpu:            8
 pods:                      110
 tencent.com/vcuda-core:    800
 tencent.com/vcuda-memory:  349
Allocatable:
 cpu:                       53
 ephemeral-storage:         1041195391675
 hugepages-1Gi:             0
 hugepages-2Mi:             0
 memory:                    251344688Ki
 nvidia.com/gpu:            8
 pods:                      110
 tencent.com/vcuda-core:    800
 tencent.com/vcuda-memory:  349
System Info:
 Machine ID:                   2030b7c755d0458cbe03ef3b39b9412b
 System UUID:                  00000000-0000-0000-0000-ac1f6b27b26a
 Boot ID:                      c9dce882-9bc3-478d-a81d-1a8dcfd02a4f
 Kernel Version:               4.19.0-0.bpo.8-amd64
 OS Image:                     Debian GNU/Linux 9 (stretch)
 Operating System:             linux
 Architecture:                 amd64
 Container Runtime Version:    docker://18.6.2
 Kubelet Version:              v1.13.3
 Kube-Proxy Version:           v1.13.3
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                  Requests            Limits
  --------                  --------            ------
  cpu                       51320m (96%)        93300m (176%)
  memory                    101122659840 (39%)  240384047Ki (95%)
  ephemeral-storage         0 (0%)              0 (0%)
  nvidia.com/gpu            7                   7
  tencent.com/vcuda-core    0                   0
  tencent.com/vcuda-memory  0                   0

  • ai-1080ti-57
Name:               ai-1080ti-57
Roles:              1080ti
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    hardware=NVIDIAGPU
                    hardware-type=NVIDIAGPU
                    kubernetes.io/hostname=ai-1080ti-57
                    node-role.kubernetes.io/1080ti=1080ti
                    nvidia-device-enable=enable
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.90.1.126/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 08 May 2019 14:47:50 +0800
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------    -----------------                 ------------------                ------                       -------
  MemoryPressure   False     Fri, 14 Aug 2020 17:56:33 +0800   Wed, 12 Aug 2020 19:44:29 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False     Fri, 14 Aug 2020 17:56:33 +0800   Wed, 12 Aug 2020 19:44:29 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False     Fri, 14 Aug 2020 17:56:33 +0800   Wed, 12 Aug 2020 19:44:29 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True      Fri, 14 Aug 2020 17:56:33 +0800   Wed, 12 Aug 2020 19:44:29 +0800   KubeletReady                 kubelet is posting ready status
  OutOfDisk        Unknown   Wed, 08 May 2019 14:47:50 +0800   Fri, 09 Aug 2019 11:58:18 +0800   NodeStatusNeverUpdated       Kubelet never posted node status.
Addresses:
  InternalIP:  10.90.1.126
  Hostname:    ai-1080ti-57
Capacity:
 cpu:                       56
 ephemeral-storage:         1152148172Ki
 hugepages-1Gi:             0
 hugepages-2Mi:             0
 memory:                    264029980Ki
 nvidia.com/gpu:            8
 pods:                      110
 tencent.com/vcuda-core:    800
 tencent.com/vcuda-memory:  349
Allocatable:
 cpu:                       53
 ephemeral-storage:         1040344917078
 hugepages-1Gi:             0
 hugepages-2Mi:             0
 memory:                    251344668Ki
 nvidia.com/gpu:            8
 pods:                      110
 tencent.com/vcuda-core:    800
 tencent.com/vcuda-memory:  349
System Info:
 Machine ID:                   3ff54e221e0d475bacbe8a68bd0dd2e2
 System UUID:                  00000000-0000-0000-0000-ac1f6b91d6e8
 Boot ID:                      efc67001-4c66-4fca-946c-d13f0931fcc2
 Kernel Version:               4.19.0-0.bpo.8-amd64
 OS Image:                     Debian GNU/Linux 9 (stretch)
 Operating System:             linux
 Architecture:                 amd64
 Container Runtime Version:    docker://18.6.2
 Kubelet Version:              v1.13.3
 Kube-Proxy Version:           v1.13.3
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                  Requests            Limits
  --------                  --------            ------
  cpu                       47730m (90%)        73100m (137%)
  memory                    111630921472 (43%)  186532353536 (72%)
  ephemeral-storage         0 (0%)              0 (0%)
  nvidia.com/gpu            7                   7
  tencent.com/vcuda-core    0                   0
  tencent.com/vcuda-memory  0                   0
Events:                     <none>
  • ai-1080ti-62
Name:               ai-1080ti-62
Roles:              nvidia418
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    hardware=NVIDIAGPU
                    hardware-type=NVIDIAGPU
                    kubernetes.io/hostname=ai-1080ti-62
                    node-role.kubernetes.io/nvidia418=nvidia418
                    nvidia-device-enable=enable
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.90.1.131/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 29 May 2019 18:02:54 +0800
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 14 Aug 2020 17:57:17 +0800   Thu, 30 Jul 2020 16:42:25 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 14 Aug 2020 17:57:17 +0800   Thu, 30 Jul 2020 16:42:25 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 14 Aug 2020 17:57:17 +0800   Thu, 30 Jul 2020 16:42:25 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 14 Aug 2020 17:57:17 +0800   Thu, 30 Jul 2020 16:42:25 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.90.1.131
  Hostname:    ai-1080ti-62
Capacity:
 cpu:                       56
 ephemeral-storage:         1152148172Ki
 hugepages-1Gi:             0
 hugepages-2Mi:             0
 memory:                    264029984Ki
 nvidia.com/gpu:            8
 pods:                      110
 tencent.com/vcuda-core:    800
 tencent.com/vcuda-memory:  349
Allocatable:
 cpu:                       53
 ephemeral-storage:         1040344917078
 hugepages-1Gi:             0
 hugepages-2Mi:             0
 memory:                    251344672Ki
 nvidia.com/gpu:            8
 pods:                      110
 tencent.com/vcuda-core:    800
 tencent.com/vcuda-memory:  349
System Info:
 Machine ID:                 bf90cb25500346cb8178be49909651e4
 System UUID:                00000000-0000-0000-0000-ac1f6b93483c
 Boot ID:                    97927469-0e92-4816-880c-243a64ef293a
 Kernel Version:             4.19.0-0.bpo.8-amd64
 OS Image:                   Debian GNU/Linux 9 (stretch)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.6.2
 Kubelet Version:            v1.13.5
 Kube-Proxy Version:         v1.13.5
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                  Requests           Limits
  --------                  --------           ------
  cpu                       51063m (96%)       83248m (157%)
  memory                    99256222976 (38%)  132428537Ki (52%)
  ephemeral-storage         0 (0%)             0 (0%)
  nvidia.com/gpu            7                  7
  tencent.com/vcuda-core    0                  0
  tencent.com/vcuda-memory  0                  0
Events:                     <none>

The go.mod:

module tkestack.io/gpu-admission

go 1.13

replace (
        //k8s.io/api => github.com/kubernetes/kubernetes/staging/src/k8s.io/api v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/api => k8s.io/api kubernetes-1.13.5

        k8s.io/apiextensions-apiserver => github.com/kubernetes/kubernetes/staging/src/k8s.io/apiextensions-apiserver v0.0.0-20190816231410-2d3c76f9091b
        
        //k8s.io/apimachinery => github.com/kubernetes/kubernetes/staging/src/k8s.io/apimachinery v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/apimachinery => k8s.io/apimachinery kubernetes-1.13.5
        
        k8s.io/apiserver => github.com/kubernetes/kubernetes/staging/src/k8s.io/apiserver v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/cli-runtime => github.com/kubernetes/kubernetes/staging/src/k8s.io/cli-runtime v0.0.0-20190816231410-2d3c76f9091b

        //k8s.io/client-go => github.com/kubernetes/kubernetes/staging/src/k8s.io/client-go v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/client-go => k8s.io/client-go kubernetes-1.13.5


        k8s.io/cloud-provider => github.com/kubernetes/kubernetes/staging/src/k8s.io/cloud-provider v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/cluster-bootstrap => github.com/kubernetes/kubernetes/staging/src/k8s.io/cluster-bootstrap v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/code-generator => github.com/kubernetes/kubernetes/staging/src/k8s.io/code-generator v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/component-base => github.com/kubernetes/kubernetes/staging/src/k8s.io/component-base v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/cri-api => github.com/kubernetes/kubernetes/staging/src/k8s.io/cri-api v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/csi-translation-lib => github.com/kubernetes/kubernetes/staging/src/k8s.io/csi-translation-lib v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/kube-aggregator => github.com/kubernetes/kubernetes/staging/src/k8s.io/kube-aggregator v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/kube-controller-manager => github.com/kubernetes/kubernetes/staging/src/k8s.io/kube-controller-manager v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/kube-proxy => github.com/kubernetes/kubernetes/staging/src/k8s.io/kube-proxy v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/kube-scheduler => github.com/kubernetes/kubernetes/staging/src/k8s.io/kube-scheduler v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/kubelet => github.com/kubernetes/kubernetes/staging/src/k8s.io/kubelet v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/legacy-cloud-providers => github.com/kubernetes/kubernetes/staging/src/k8s.io/legacy-cloud-providers v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/metrics => github.com/kubernetes/kubernetes/staging/src/k8s.io/metrics v0.0.0-20190816231410-2d3c76f9091b
        k8s.io/sample-apiserver => github.com/kubernetes/kubernetes/staging/src/k8s.io/sample-apiserver v0.0.0-20190816231410-2d3c76f9091b
)

require (
        github.com/gogo/protobuf v1.1.1 // indirect
        github.com/golang/protobuf v1.3.2 // indirect
        github.com/json-iterator/go v1.1.7 // indirect
        github.com/julienschmidt/httprouter v1.3.1-0.20191005171706-08a3b3d20bbe
        github.com/spf13/pflag v1.0.5
        golang.org/x/net v0.0.0-20191109021931-daa7c04131f5 // indirect
        golang.org/x/sys v0.0.0-20191010194322-b09406accb47 // indirect
        k8s.io/api v0.0.0
        k8s.io/apimachinery v0.0.0
        k8s.io/client-go v0.0.0
        k8s.io/component-base v0.0.0
        k8s.io/klog v0.3.1
        k8s.io/kubernetes v1.15.5
)
