GPU Sharing Scheduler for Kubernetes Cluster

License: Apache License 2.0

Dockerfile 3.94% Go 77.33% Shell 14.73% Mustache 2.20% Makefile 1.80%

gpushare-scheduler-extender's Introduction

GPU Sharing Scheduler Extender in Kubernetes

Overview

More and more data scientists run their Nvidia GPU based inference tasks on Kubernetes. Some of these tasks can be run on the same Nvidia GPU device to increase GPU utilization. So one important challenge is how to share GPUs between the pods. The community is also very interested in this topic.

Now there is a GPU sharing solution on native Kubernetes: it is based on scheduler extenders and device plugin mechanism, so you can reuse this solution easily in your own Kubernetes.

Prerequisites

Kubernetes 1.11+
golang 1.19+
NVIDIA drivers ~= 361.93
Nvidia-docker version > 2.0 (see how to install and it's prerequisites)
Docker configured with Nvidia as the default runtime.

Design

For more details about the design of this project, please read this Design document.

Setup

You can follow this Installation Guide. If you are using Alibaba Cloud Kubernetes, please follow this doc to install with Helm Charts.

User Guide

You can check this User Guide.

Developing

Scheduler Extender

git clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git && cd gpushare-scheduler-extender
make build-image

Device Plugin

git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git && cd gpushare-device-plugin
docker build -t cheyang/gpushare-device-plugin .

Kubectl Extension

golang > 1.10

mkdir -p $GOPATH/src/github.com/AliyunContainerService
cd $GOPATH/src/github.com/AliyunContainerService
git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git
cd gpushare-device-plugin
go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go

Demo

- Demo 1: Deploy multiple GPU Shared Pods and schedule them on the same GPU device in binpack way

- Demo 2: Avoid GPU memory requests that fit at the node level, but not at the GPU device level

Related Project

gpushare device plugin

Roadmap

Integrate Nvidia MPS as the option for isolation
Automated Deployment for the Kubernetes cluster which is deployed by kubeadm
Scheduler Extener High Availablity
Generic Solution for GPU, RDMA and other devices

Adopters

If you are intrested in GPUShare and would like to share your experiences with others, you are warmly welcome to add your information on ADOPTERS.md page. We will continuousely discuss new requirements and feature design with you in advance.

Acknowledgments

GPU sharing solution is based on Nvidia Docker2, and their gpu sharing design is our reference. The Nvidia Community is very supportive and We are very grateful.

gpushare-scheduler-extender's People

Contributors

Stargazers

Watchers

Forkers

cheyang lcasi isgasho yuyangbj wym42 chaosju awesome-archive bnulwh yuxijin-tobeyjin marcplouhinec k7 fzuwill rafmonteiro gavinljj huaizhengzhang monstercy zxthunter oiooj cheng0214 gbtyy ldrmcml tommenx sakuralbj melvynpan alasdairtran gfandada tjnh05 riverzhang adarshjain007 19930711sx shhui manifoldyu hellolijj louhwz wbq1995 artemzholus sheep5665 cdyangzhenyu life1347 alger7w reaminjocye godki1eraron magiczhao xiaocaoxu lxyzhangqing ylhsiehitri yashdusing davidstack jasperzhong jear qiankai-kwai watercolor dst1213 646677064 myonlyzzy airlovelq luolian0 chelarua xingbu110 makisekurisucn box9527 happy2048 yupengzte shanshuiguochu buttonman vcding xjas staugust mozhata zhuanght kobkrit lhty24 icefed troycheng lijingyuwudi soolaugust pan87232494 liaixiong githubstack iamsunguangzhi axing620 kuluso97 zhaogaolong nautilusshell danieltanyouzu syhawk zerocurve aland-zhang morningsong vic0777 edkfjc43782 igorzan yu3peng wenxinax apgpavel kevinwang2011 noliaoliao nic720 flomeworld vivisidea

gpushare-scheduler-extender's Issues

Dependencies added into the source code

First, thank you for the scheduler, it helps a lot when managing gpus in k8s.

The dependencies are currently added directly in the code and versionned. They should be downloaded at compile time. This would simplify analysis of the code.

Can I apply for memory in MiB？such as 100MiB,300MiB.

When I have deployed according to this scheme.GiB is default metric.How to use MiB as default metric instead of GiB.

运行时报错 OCI runtime create failed

当我运行时报错nvidia-container-cli: device error: unknown device id: no-gpu-has-1024MiB-to-run,但是我运行nvidia-device-plugin-daemonset可以正常通过测试

Questions regarding the User Guide

Hi,

Firstly, thank you guys so much for making this open sourced.

I'm a newbie to Kubernetes. I set up Kube as well as this gpushare scheduler using the instructions, however, I did encounter some problems and have questions regarding each of the sections of the User Guide

When I try to query allocation status of the shared GPU, I only get the allocation but no info about the names, IP address, etc. Is this normal?

NAME IPADDRESS GPU Memory()
Allocated/Total GPU Memory In Cluster: 0/0 (0%)

I'm not sure which file I should modify for requesting gpu sharing. It says "specify aliyun.com/gpu-mem" but in what file?
In what file do I limit gpu usage using the statements below?
ALIYUN_COM_GPU_MEM_DEV=15
ALIYUN_COM_GPU_MEM_CONTAINER=3

Thanks a lot in advance! :)

Modify scheduler configuration in minikube

I use minikube to start a local k8s cluster. I try to share gpus in my local minikube.
In this link, https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md#2-modify-scheduler-configuration, I don't know how to modify scheduler configuration. So I skipped this step.
Finally, I execute the command kubectl inspect gpushare, but I got a empty resource.

[root@localhost gpushare]# kubectl inspect gpushare
NAME  IPADDRESS  GPU Memory()

Allocated/Total GPU Memory In Cluster:
0/0 (0%)

Do I have to do modify scheduler configuration step?, but how can i speify the scheduler configuration in miniukube? thanks for your time.

Other GPU sharing strategies besides bin packing

Are there other GPU sharing strategies, besides bin packing, supported? If not, would you consider adding those as a feature?

What I want to achieve is that my containers are spread evenly among the existing GPU's in the cluster, instead of filling one GPU before switching to the next. So, basically, placing a new container on the GPU with the least allocated memory, instead of the most allocated one.

The motivation is that I want to run one training per GPU. One workaround is (artificially) claiming the whole GPU memory: then no other container will ever be placed on the GPU. But that would block the entire GPU for other users of the cluster, which I'd want to avoid.

the question about the controller

I see the source code, but didn't know what's the purpuse of the controller class, and it sync pod information for what?

I have a master and a GPU node, After I create a gpu pod, I get problem.

Error: failed to start container "binpack-1": Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "process_linux.go:339: running prestart hook 0 caused "error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=no-gpu-has-2MiB-to-run --compute --utility --require=cuda>=8.0 --pid=260909 /var/lib/docker/overlay2/64f498e224fd0a93b0e15b8769699a97527a3acab3c6288c4c8d939bbe4ca82c/merged]\nnvidia-container-cli: device error: unknown device id: no-gpu-has-2MiB-to-run\n"

does one container can request two GPUs?

I noticed that "Each container can request one or more GPUs " using "nvidia.com/gpu: " as a schedulable resource, (https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/), since the goals of gpushare-scheduler-extender is allowing users to express requests for sharing a resource, can the " aliyun.com/gpu-mem" still allow one container to request one or more GPUs ?

GPU-mem is the whole GB value, not MB value

Right now on a g3s.xlarge instance I'm seeing the gpu-mem value being set to 7 though the host has 1 GPU with 7GB of memory (7618MiB according to nvidia-smi).

If I try to schedule a fraction of gpu-mem (1.5 for example) I'm told I need to use a whole integer.

Should the plugin be exporting 7618 as the gpu-mem value?

Fail to create kube-scheduler

Hi,

I tried to create the kube-scheduler with kubectl create -f https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/config/kube-scheduler.yaml, but failed.

$ kubectl get pods -n kube-system
NAME                                     READY   STATUS
kube-scheduler                           0/1     CrashLoopBackOff
kube-scheduler-leadtek-gs4820            1/1     Running
gpushare-device-plugin-ds-hs4kt          1/1     Running
gpushare-schd-extender-978bd945b-sqhzj   1/1     Running
...

$ kubectl logs -n kube-system kube-scheduler
failed to create listener: failed to listen on 127.0.0.1:10251: listen tcp 127.0.0.1:10251: bind: address already in use

Even if I remove the livenessProbe section in the aforementioned kube-scheduler.yaml, the kubectl logs still shows the same error.

What's going wrong...?

Thanks!

error with gpushare

I follow the Installation Guide, when i apply a yaml file with gpu resource, the status is always RunContainerError, and extender scheduler's log is "pod gpushare in ns default is not assigned to any node, skip", and this is pod imformation: " &Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:gpushare,GenerateName:,Namespace:default,SelfLink:/api/v1/namespaces/default/pods/gpushare,UID:1679f323-4474-11e9-bb2f-246e96b68028,ResourceVersion:788724,Generation:0,CreationTimestamp:2019-03-12 03:08:19 +0000 UTC,DeletionTimestamp:,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"gpushare","namespace":"default"},"spec":{"containers":[{"image":"cr.d.xiaomi.net/jishaomin/pause:2.0","name":"test","resources":{"limits":{"aliyun.com/gpu-mem":"10G","cpu":2,"memory":"4G"}}}],"restartPolicy":"Always"}}
,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-76r29 {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-76r29,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{test cr.d.xiaomi.net/jishaomin/pause:2.0 [] [] [] [] [] {map[cpu:{{2 0} {} 2 DecimalSI} memory:{{4 9} {} 4G DecimalSI} aliyun.com/gpu-mem:{{10 9} {} 10G DecimalSI}] map[memory:{{4 9} {} 4G DecimalSI} aliyun.com/gpu-mem:{{10 9} {} 10G DecimalSI} cpu:{{2 0} {} 2 DecimalSI}]} [{default-token-76r29 true /var/run/secrets/kubernetes.io/serviceaccount }] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[],HostAliases:[],PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:,ContainerStatuses:[],QOSClass:Guaranteed,InitContainerStatuses:[],NominatedNodeName:,},}
"

Adapting for use with managed control plane

I have an EKS cluster and am hoping to adapt this to run as a second scheduler since I can't edit the default kube-scheduler as called for in your installation instructions (I don't believe, but correct me if I am wrong).

I have edited the yaml slightly to be in line with the guide at the below link:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/

But it doesn't seem to be working (will admit I knew this was wishful thinking). Any ideas what else I need to do? I am very new to go so struggling to dig into the source code.

kind: ServiceAccount
apiVersion: v1
metadata:
  name: gpu-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: gpu-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1 #extensions/v1beta1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: gpu-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
    spec:
      serviceAccountName: gpu-scheduler
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpu-scheduler:1.11-d170d8a
        name: gpu-scheduler
        env:
        - name: LOG_LEVEL
          value: debug
        - name: PORT
          value: "12345"
      hostNetwork: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: Exists
        key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
         node-role.kubernetes.io/master: ""

No assume timestamp for pod tf-jupyter-794b84bb56-sdthx in namespace default, so it's not GPUSharedAssumed assumed pod.

Hi,
I remember gpushare did work. Now I have a bug.

Has anybody has an idea on this bug?

Thanks.

[root@node1 demo_gpu_share]# k describe pod tf-jupyter-794b84bb56-sdthx

Events:
Type Reason Age From Message

Warning FailedScheduling 50s (x4 over 2m10s) default-scheduler 0/4 nodes are available: 3 node(s) didn't match node selector, 4 Insufficient aliyun.com/gpu-mem.
Normal Scheduled 47s default-scheduler Successfully assigned default/tf-jupyter-794b84bb56-sdthx to bdworker-gpu1
Normal Pulled 27s (x3 over 40s) kubelet, bdworker-gpu1 Container image "tensorflow/tensorflow:1.12.0-gpu" already present on machine
Normal Created 27s (x3 over 40s) kubelet, bdworker-gpu1 Created container
Warning Failed 27s (x3 over 40s) kubelet, bdworker-gpu1 Error: failed to start container "tensorflow": Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: unknown device id: no-gpu-has-4000MiB-to-run\\n\""": unknown
Warning BackOff 7s (x3 over 20s) kubelet, bdworker-gpu1 Back-off restarting failed container

[root@bdworker-gpu1 ~]# docker logs -f 704d1e0b733e

I1113 15:27:32.916827 1 allocate.go:46] ----Allocating GPU for gpu mem is started----
I1113 15:27:32.916853 1 allocate.go:57] RequestPodGPUs: 4000
I1113 15:27:32.916865 1 allocate.go:61] checking...
I1113 15:27:32.931910 1 podmanager.go:112] all pod list [{{ } {tf-jupyter-794b84bb56-sdthx tf-jupyter-794b84bb56- default /api/v1/namespaces/default/pods/tf-jupyter-794b84bb56-sdthx e77dd94a-0629-11ea-afd3-9a39de800a0f 33973 0 2019-11-13 15:26:03 +0000 UTC map[app:tf-jupyter pod-template-hash:794b84bb56] map[] [{apps/v1 ReplicaSet tf-jupyter-794b84bb56 e77b8eb8-0629-11ea-afd3-9a39de800a0f 0xc42062f88a 0xc42062f88b}] nil [] } {[{bin {&HostPathVolumeSource{Path:/usr/bin,Type:,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {lib {&HostPathVolumeSource{Path:/usr/lib,Type:,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {default-token-z4trp {nil nil nil nil nil &SecretVolumeSource{SecretName:default-token-z4trp,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}] [] [{tensorflow tensorflow/tensorflow:1.12.0-gpu [] [] [{ 0 8888 TCP }] [] [] {map[aliyun.com/gpu-mem:{{4 3} {} 4k DecimalSI}] map[aliyun.com/gpu-mem:{{4 3} {} 4k DecimalSI}]} [{bin false /usr/local/nvidia/bin } {lib false /usr/local/nvidia/lib } {default-token-z4trp true /var/run/secrets/kubernetes.io/serviceaccount }] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}] Always 0xc42062f978 ClusterFirst map[accelerator:nvidia-tesla-m6] default default bdworker-gpu1 false false false &PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],} [] nil default-scheduler [{node.kubernetes.io/not-ready Exists NoExecute 0xc42062fa40} {node.kubernetes.io/unreachable Exists NoExecute 0xc42062fa60}] [] 0xc42062fa70 nil []} {Pending [{PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2019-11-13 15:27:26 +0000 UTC }] [] [] BestEffort}}]
I1113 15:27:32.932261 1 podmanager.go:123] list pod tf-jupyter-794b84bb56-sdthx in ns default in node bdworker-gpu1 and status is Pending
I1113 15:27:32.932281 1 podutils.go:81] No assume timestamp for pod tf-jupyter-794b84bb56-sdthx in namespace default, so it's not GPUSharedAssumed assumed pod.
W1113 15:27:32.932296 1 allocate.go:152] invalid allocation requst: request GPU memory 4000 can't be satisfied.

Is not kubectl inspect gpushare, is kubectl-inspect-gpushare

In gpushare-scheduler-extender/docs/userguide.md, Is not kubectl inspect gpushare, is kubectl-inspect-gpushare!

About Pod with multiple containers supports

Hi @cheyang,

I copy the issue from device pluign repo and have a question about the case that one pod have multiple containers.
I trace the code in kubelet and find that it called the allocate function for each container in the pod.

for _, container := range pod.Spec.Containers {
		if err := m.allocateContainerResources(pod, &container, devicesToReuse); err != nil {
			return err
		}
		m.podDevices.removeContainerAllocatedResources(string(pod.UID), container.Name, devicesToReuse)
	}
		devs := allocDevices.UnsortedList()
		// TODO: refactor this part of code to just append a ContainerAllocationRequest
		// in a passed in AllocateRequest pointer, and issues a single Allocate call per pod.
		klog.V(3).Infof("Making allocation request for devices %v for device plugin %s", devs, resource)
		resp, err := eI.e.allocate(devs)

this case may corrupt the finding pod logic in device plugin allocate function .

	// podReqGPU = uint(0)
	for _, req := range reqs.ContainerRequests {
		podReqGPU += uint(len(req.DevicesIDs))
	}

	if getGPUMemoryFromPodResource(pod) == podReqGPU {

Do you have any idea to resolve this issue?

questions about devs in nodeinfo.

Is there a problem when different gpu cards exist on the same node?

For example, two gpu cards with 8G and 16G memory exist on one node, so the result of devmap is that they both have 12GiB. I don't quite understand it.

Not Support Assign nodename at the yaml configfile

Pod Status pending . Not Support Assign nodename at the yaml configfile

NAME IPADDRESS GPU0(Allocated/Total) GPU1(Allocated/Total) GPU2(Allocated/Total) GPU3(Allocated/Total) GPU4(Allocated/Total) GPU5(Allocated/Total) GPU6(Allocated/Total) GPU7(Allocated/Total) PENDING(Allocated) GPU Memory(MiB)
10.116.109.37 10.116.109.37 10/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 10/248
10.116.109.29 10.116.109.29 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/248
10.116.109.30 10.116.109.30 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/248
10.116.109.31 10.116.109.31 5/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 5/248
10.116.109.33 10.116.109.33 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/248
10.116.109.34 10.116.109.34 9/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 30 39/248
10.116.109.35 10.116.109.35 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/248
10.116.109.36 10.116.109.36 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/31 0/248

Allocated/Total GPU Memory In Cluster:
54/1984 (2%)

gpushare获取到的GPU总内存

我的显卡内存是6g,但是gpushare获取到的内存总资源是5g,请问这个是有1g内存的预留吗? 还是预留的内存是按照总内存的比例来进行预留的? 期待您的回复.

how to build the Kubectl Extension ?

go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/.go
err:
stat cmd/inspect/.go: no such file or directory

Pod consumes more GPU resources than the given limit

Hi,

I have the following issue:

Suppose I have 2 deployments. Deployment "A" has a limit of 1024MiB. Deployment "B" has a limit of 5000MiB. The total memory of my GPU is 11GiB.

If I run deployment "A" only, it works just fine, no crash in the container.
Now if I try to run also deployment "B", B crashes instantly with out of gpu memory errors.

If I run deployment "B" only, it works just fine, no crash in the container.
Now if I try to run also deployment "A", A crashes instantly with out of gpu memory errors.

So it seems that the first pod that runs is reserving more memory than it is set in the yml file.
But when I run kubectl inspect gpushare I see the correct sum of requested gpu (6024/11000MiB).

Any idea of the reasons?

Thanks in advance.

Can schedule full GPUs and partial GPUs side-by-side

Hi,

When scheduling GPUs I can schedule a partial GPU request, and a full GPU on the same GPU.

Allocatable:
 aliyun.com/gpu-count:  1
 aliyun.com/gpu-mem:    7

Allocated resources:
  Resource              Requests     Limits
  --------              --------     ------
  aliyun.com/gpu-count  1            1
  aliyun.com/gpu-mem    7            7

I have a total of 3 pods running on this machine, 1 requesting 1 gpu-mem, 1 requesting 6 gpu-mem, and another requesting 1 gpu-count.

I would expect that the gpu-sched would deduct a full GPUs worth of memory from the allocatable once the full GPU pod has been scheduled.

how to change scheduler config, when I start k8s with rke?

I have get gpushare-schd-extender-886d94bf6-fl5mf running, but as you can see, when using rke to start up k8s, there is no scheduler config I can change for scheduler container. Can you suggest how to include the json?

[root@k8s-demo-slave1 kubernetes]# pwd
/etc/kubernetes
[root@k8s-demo-slave1 kubernetes]# ls
scheduler-policy-config.json  ssl
[root@k8s-demo-slave1 kubernetes]# cd ssl
[root@k8s-demo-slave1 ssl]# ls
kube-apiserver-key.pem                   kube-apiserver-requestheader-ca.pem           kubecfg-kube-controller-manager.yaml  kube-controller-manager-key.pem  kube-etcd-192-168-2-229.pem  kube-scheduler-key.pem
kube-apiserver.pem                       kube-ca-key.pem                               kubecfg-kube-node.yaml                kube-controller-manager.pem      kube-node-key.pem            kube-scheduler.pem
kube-apiserver-proxy-client-key.pem      kube-ca.pem                                   kubecfg-kube-proxy.yaml               kube-etcd-192-168-2-140-key.pem  kube-node.pem                kube-service-account-token-key.pem
kube-apiserver-proxy-client.pem          kubecfg-kube-apiserver-proxy-client.yaml      kubecfg-kube-scheduler.yaml           kube-etcd-192-168-2-140.pem      kube-proxy-key.pem           kube-service-account-token.pem
kube-apiserver-requestheader-ca-key.pem  kubecfg-kube-apiserver-requestheader-ca.yaml  kubecfg-kube-scheduler.yaml.bak       kube-etcd-192-168-2-229-key.pem  kube-proxy.pem

[root@k8s-demo-slave1 ssl]# cat kubecfg-kube-scheduler.yaml
apiVersion: v1
kind: Config
clusters:
- cluster:
    api-version: v1
    certificate-authority: /etc/kubernetes/ssl/kube-ca.pem
    server: "https://127.0.0.1:6443"
  name: "local"
contexts:
- context:
    cluster: "local"
    user: "kube-scheduler-local"
  name: "local"
current-context: "local"
users:
- name: "kube-scheduler-local"
  user:
    client-certificate: /etc/kubernetes/ssl/kube-scheduler.pem
    client-key: /etc/kubernetes/ssl/kube-scheduler-key.pem


[root@k8s-demo-slave1 ssl]# kubectl get pods -A

NAMESPACE       NAME                                      READY   STATUS      RESTARTS   AGE
ingress-nginx   default-http-backend-5bcc9fd598-8ggs8     0/1     Evicted     0          17d
ingress-nginx   default-http-backend-5bcc9fd598-ch87f     1/1     Running     0          17d
ingress-nginx   default-http-backend-5bcc9fd598-jbw26     0/1     Evicted     0          21d
ingress-nginx   nginx-ingress-controller-df7sh            1/1     Running     0          21d
ingress-nginx   nginx-ingress-controller-mr89d            1/1     Running     0          17d
kube-system     canal-2bflt                               2/2     Running     0          16d
kube-system     canal-h5sjc                               2/2     Running     0          16d
kube-system     coredns-799dffd9c4-vzvrw                  1/1     Running     0          21d
kube-system     coredns-autoscaler-84766fbb4-5xpk8        1/1     Running     0          21d
kube-system     gpushare-schd-extender-886d94bf6-fl5mf    1/1     Running     0          28m
kube-system     metrics-server-59c6fd6767-2ct2h           1/1     Running     0          17d
kube-system     metrics-server-59c6fd6767-tphk6           0/1     Evicted     0          17d
kube-system     metrics-server-59c6fd6767-vlbgp           0/1     Evicted     0          21d
kube-system     rke-coredns-addon-deploy-job-8dbml        0/1     Completed   0          21d
kube-system     rke-ingress-controller-deploy-job-7h6vd   0/1     Completed   0          21d
kube-system     rke-metrics-addon-deploy-job-sbrlp        0/1     Completed   0          21d
kube-system     rke-network-plugin-deploy-job-5r7d6       0/1     Completed   0          21d

失误，看错了

这一句 # go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go

How to modify scheduler configuration when I start k8s with kubeadm

actual gpu mem in use exceeds gpu mem limit

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorrt-inference-server
  labels:
    app: tensorrt-inference-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorrt-inference-server
  template:
    metadata:
      labels:
        app: tensorrt-inference-server
    spec:
      volumes:
        - name: model-data
          hostPath:
            path: /home/nocturne/Repos/tensorrt-inference-server/docs/examples/model_repository
      containers:
        - name: tensorrt-inference-server
          image: nvcr.io/nvidia/tensorrtserver:19.07-py3
          imagePullPolicy: IfNotPresent
          args: ["trtserver", "--model-store=/models"]
          volumeMounts:
            - mountPath: /models
              name: model-data
          resources:
            limits:
              aliyun.com/gpu-mem: 1024
          ports:
            - containerPort: 8000
              name: http
            - containerPort: 8001
              name: grpc
            - containerPort: 8002
              name: metrics
          livenessProbe:
            httpGet:
              path: /api/health/live
              port: http
          readinessProbe:
            initialDelaySeconds: 5
            periodSeconds: 5
            httpGet:
              path: /api/health/ready
              port: http

I limit the GPU mem in 1024M. However, the pod exceeds this limit.

-> % nvidia-smi 
Fri Aug  2 20:31:24 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX250       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   70C    P0    N/A /  N/A |   1428MiB /  2002MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     20378      C   trtserver                                   1418MiB |
+-----------------------------------------------------------------------------+

But the kubectl extender tells me that only 1024M mem is allocated.

-> # kubectl inspect gpushare
NAME   IPADDRESS      GPU0(Allocated/Total)  GPU Memory(MiB)
quark  10.68.229.152  1024/2002              1024/2002
------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
1024/2002 (51%)

Is this a bug? Or I do something wrong?

Other info:

-> # kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master     Ready    master   14h   v1.15.0
quark      Ready    <none>   19h   v1.15.0
worker-1   Ready    <none>   37h   v1.15.0

-> # kubectl get po
NAME                                        READY   STATUS    RESTARTS   AGE
tensorrt-inference-server-dc97bf8b8-9ldgc   1/1     Running   0          46m

-> # kubectl get svc
NAME                        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                        AGE
kubernetes                  ClusterIP   10.250.0.1     <none>        443/TCP                                        6h18m
tensorrt-inference-server   NodePort    10.250.0.134   <none>        8000:30261/TCP,8001:31320/TCP,8002:30896/TCP   19m

-> # kubectl describe po tensorrt-inference-server                                                                                                                                       
Name:           tensorrt-inference-server-dc97bf8b8-9ldgc                                                                                                                                
Namespace:      default                                                                                                                                                                  
Priority:       0                                                                                                                                                                        
Node:           quark/10.68.229.152                                                                                                                                                      
Start Time:     Fri, 02 Aug 2019 07:53:12 -0400                                                                                                                                          
Labels:         app=tensorrt-inference-server                                                                                                                                            
                pod-template-hash=dc97bf8b8                                                                                                                                              
Annotations:    ALIYUN_COM_GPU_MEM_ASSIGNED: true                                                                                                                                        
                ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1564746792686234334                                                                                                                      
                ALIYUN_COM_GPU_MEM_DEV: 2002                                                                                                                                             
                ALIYUN_COM_GPU_MEM_IDX: 0                                                                                                                                                
                ALIYUN_COM_GPU_MEM_POD: 1024                                                                                                                                             
Status:         Running                                                                                                                                                                  
IP:             10.244.1.41                                                                                                                                                              
Controlled By:  ReplicaSet/tensorrt-inference-server-dc97bf8b8                                                                                                                           
Containers:                                                                                                                                                                              
  tensorrt-inference-server:                                                                                                                                                             
    Container ID:  docker://5943c8c239814a891508281cafdf18a6b319762b31e0bceefad975b4dfe998b6                                                                                             
    Image:         nvcr.io/nvidia/tensorrtserver:19.07-py3                                                                                                                               
    Image ID:      docker-pullable://nvcr.io/nvidia/tensorrtserver@sha256:014cebe2a440d4f6f761e3c6ddb3d8d72f75275301fc13424a7613583ac8509f                                               
    Ports:         8000/TCP, 8001/TCP, 8002/TCP                                             
    Host Ports:    0/TCP, 0/TCP, 0/TCP        
    Args:                                     
      trtserver                               
      --model-store=/models                   
    State:          Running                   
      Started:      Fri, 02 Aug 2019 07:53:13 -0400                                         
    Ready:          True                      
    Restart Count:  0                         
    Limits:                                   
      aliyun.com/gpu-mem:  1024               
    Requests:                                 
      aliyun.com/gpu-mem:  1024               
    Liveness:              http-get http://:http/api/health/live delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:             http-get http://:http/api/health/ready delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:           <none>             
    Mounts:                                   
      /models from model-data (rw)            
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gsjlx (ro)                                                                                                        
Conditions:                                   
  Type              Status                    
  Initialized       True                      
  Ready             True                        
  ContainersReady   True                        
  PodScheduled      True                        
Volumes:                                      
  model-data:                                 
    Type:          HostPath (bare host directory volume)                                    
    Path:          /home/nocturne/Repos/tensorrt-inference-server/docs/examples/model_repository                                                                                         
    HostPathType:                             
  default-token-gsjlx:                        
    Type:        Secret (a volume populated by a Secret)                                    
    SecretName:  default-token-gsjlx          
    Optional:    false                        
QoS Class:       BestEffort                   
Node-Selectors:  <none>                       
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s                            
                 node.kubernetes.io/unreachable:NoExecute for 300s                          
Events:                                       
  Type    Reason     Age        From               Message                                  
  ----    ------     ----       ----               -------                                  
  Normal  Scheduled  47m        default-scheduler  Successfully assigned default/tensorrt-inference-server-dc97bf8b8-9ldgc to quark
  Normal  Pulled     <invalid>  kubelet, quark     Container image "nvcr.io/nvidia/tensorrtserver:19.07-py3" already present on machine
  Normal  Created    <invalid>  kubelet, quark     Created container tensorrt-inference-server                                                                                           
  Normal  Started    <invalid>  kubelet, quark     Started container tensorrt-inference-server

Need deploy gpushare components automatically

Fixed by #25

Error kube-scheduler image registry url

when exec kubectl apply -f config/kube-scheduler.yaml cmd，can't pull image。
and docker pull registry-vpc.cn-shanghai.aliyuncs.com/acs/kube-scheduler-amd64:v1.11.2
report:
^[[OError response from daemon: Get https://registry-vpc.cn-shanghai.aliyuncs.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
modify registry addr，can access
docker pull docker pull registry.cn-hangzhou.aliyuncs.com/acs/kube-scheduler-amd64:v1.11.2

it seems that here are some repeated codes

when i read source code, i find may be there are some repeated codes.

https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/pkg/gpushare/controller.go#L118-L128

is it necessary to delete these?

Failed to create GPU deployment

Hi,

I try to do the example and I got this error

`Events:
Type Reason Age From Message

Warning FailedScheduling 6s (x5 over 3m1s) default-scheduler 0/2 nodes are available: 2 Insufficient aliyun.com/gpu-mem`

What does it mean?

Is there any demos that using this gpushare-scheduler-extender in jupyter with tensorflow or spark?

Hi,
I noticed that there are demos using command lines, yet , is there any demos that using this gpushare-scheduler-extender in jupyter with tensorflow or spark? Because data scientists may not familiar with k8s yaml.

Confused in NodeInfo

i notice deviceinfo , it means every device`s memory is same in one node ?

How to do it, can aliyun.com/gpu-mem and nvidia.com/gpu are compatible ?

I hope that the two resources both aliyun.com/gpu-mem and nvidia.com/gpu can coexist in k8s system.

Currently, pods using aliyun.com/gpu-mem resources and pods using nvidia.com/gpu resources are actually applying for the same physical device GPU and do not perceive each other, that is, actually Its are not compatible.

I hope that the two will coexist. What is the idea of the community? Thanks !

Scheduling GPUs on GKE: No access to master node

Trying to deploy the scheduler on GKE, but the GKE cluster does not have the master node in the nodes we can reach.

No nodes match the master node label. It appears the master node is completely managed by GKE.

We get these logs in the scheduler:

I0518 23:57:53.520315       1 controller_utils.go:1025] Waiting for caches to sync for scheduler controller
I0518 23:57:53.620497       1 controller_utils.go:1032] Caches are synced for scheduler controller
I0518 23:57:53.620576       1 leaderelection.go:185] attempting to acquire leader lease  kube-system/kube-scheduler...

Is there any way around having to use the master node? The master node is accessible via kubernetes APIs.

gpu-scheduler-extender run error

部署 gpu-shared-extender时，查看对应的pod的日志：
[ info ] 2019/07/08 11:11:10 main.go:65: Log level was set to DEBUG
[ info ] 2019/07/08 11:11:10 controller.go:63: Creating event broadcaster
[ info ] 2019/07/08 11:11:10 controller.go:116: begin to wait for cache
E0708 11:11:10.926213 121 reflector.go:205] github.com/AliyunContainerService/gpushare-scheduler-extender/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Node: Unauthorized

这个错误是因为哪里的证书设置出错了？ kube-apiserver/kubelet/kube-controller-manager/kube-scheduler都设置过并确认是正确的。

gpushare-scheduler-extender/vendor/github.com/docker LICENSE file

Hi,

I found GPL licenses are presents in 5 directories :

vendor/github.com/moby/moby/contrib/selinux/docker-engine-selinux
vendor/github.com/moby/moby/contrib/selinux-fedora-24/docker-engine-selinux
vendor/github.com/moby/moby/contrib/selinux-oraclelinux-7/docker-engine-selinux
vendor/github.com/docker/docker/contrib/selinux-fedora-24/docker-engine-selinux
vendor/github.com/docker/docker/contrib/selinux-oraclelinux-7/docker-engine-selinux

I was wondering why an Apache-2.0 licensed project would include GPL License files.

Describe the results you received:
GPL licenses found

Describe the results you expected:
No GPL licenses

Note:
docker\docker\contrib\selinux LICENSE file
In moby contrib, there are no selinux\selinux-fedora-24\selinux-oraclelinux-7.

@cheyang @xlgao-zju @oiooj @ringtail @BSWANG

Post http://127.0.0.1:32766/gpushare-scheduler/filter: dial tcp 127.0.0.1:32766: connect: connection refused

Thanks for your device plugin. I have installed gpu share device plugin. But when I am creating the sample binpack-1-0 is in pending state.

kubectl decribe pod binpack1-0
Events:
Type Reason Age From Message

Warning FailedScheduling 44s (x2 over 44s) default-scheduler Post http://127.0.0.1:32766/gpushare-scheduler/filter: dial tcp 127.0.0.1:32766: connect: connection refused

root@kmaster:~# kubectl-inspect-gpushare
NAME IPADDRESS GPU Memory(GiB)
node2 192.168.193.239 0/8

Allocated/Total GPU Memory In Cluster:
0/8 (0%)

I am able to see the above result for gpu share. Please help me to fix this issue.
Thanks

Question: how does the GPU device number count in the container with "gpu-count"?

Given a server with 8 GPUs, if we start a pod with "aliyun.com/gpu-count:2", and the scheduler assign GPU3 and GPU7 to this pod, what is the GPU number for these 2 GPU cards in the pod? 0 and 1?

Total GPU Memory In Cluster - should appreciate SchedulingDisabled node status

Hi There,

Thank you for the wonderful work! I was able to configure gpushare-scheduler-extender, it is really useful for addressing GPU memory based load scheduling.

It would be even better, if the scheduler is appreciative of node status. Let say we have two GPU nodes (11Gib + 11 Gib) and one of them is SchedulingDisabled. I should only be allowed to schedule or allocate 11Gib memory to the cluster, since one of the nodes is SchedulingDisabled

Thanks!

how to deploy it in kubernetes 1.13

As it seen in the official document

it has been DEPRECATED，how to deploy scheduler-extender in v1.13 higher version

helm install error: no image recoverDevicePluginImage value

when i install with helm
helm install --name gpushare --namespace kube-system --set kubeVersion=1.11.5 --set masterCount=3 gpushare-installer

it return error

[root@iZuf69zddmom136duk79quZ chart]# helm install --name gpushare --namespace kube-system --set kubeVersion=1.11.5 --set masterCount=3 gpushare-installer
Error: release gpushare failed: DaemonSet.apps "device-plugin-recover-ds" is invalid: spec.template.spec.containers[0].image: Required value

i find deployer/chart/gpushare-installer/value.yaml lose recoverDevicePluginImage value in deed

gpushare-schd-extender not getting scheduled

Using minikube on p3.2xlarge. gpushare-schd-extender-x is not getting scheduled.

$ kubectl inspect gpushare

NAME      IPADDRESS     GPU0(Allocated/Total)  GPU Memory(GiB)
minikube  1.2.3.4  0/15                   0/15
----------------------------------------------------
Allocated/Total GPU Memory In Cluster:
0/15 (0%)

$ kubectl get po -n kube-system -l app=gpushare

NAME                                      READY   STATUS    RESTARTS   AGE
gpushare-device-plugin-ds-vlxrz           1/1     Running   0          11m
gpushare-schd-extender-5b869fb687-d479d   0/1     Pending   0          2m54s

$ kubectl describe node minikube

Name:               minikube
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    gpushare=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=minikube
                    kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 06 Jun 2019 23:01:28 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 06 Jun 2019 23:22:28 +0000   Thu, 06 Jun 2019 23:01:27 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 06 Jun 2019 23:22:28 +0000   Thu, 06 Jun 2019 23:01:27 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 06 Jun 2019 23:22:28 +0000   Thu, 06 Jun 2019 23:01:27 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 06 Jun 2019 23:22:28 +0000   Thu, 06 Jun 2019 23:01:27 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  1.2.3.4
  Hostname:    minikube
Capacity:
 aliyun.com/gpu-count:  1
 aliyun.com/gpu-mem:    15
 cpu:                   8
 ephemeral-storage:     101583780Ki
 hugepages-1Gi:         0
 hugepages-2Mi:         0
 memory:                62873108Ki
 pods:                  110
Allocatable:
 aliyun.com/gpu-count:  1
 aliyun.com/gpu-mem:    15
 cpu:                   8
 ephemeral-storage:     93619611493
 hugepages-1Gi:         0
 hugepages-2Mi:         0
 memory:                62770708Ki
 pods:                  110
System Info:
 Machine ID:                 b2025f76037542639ab86f04ce815234
 System UUID:                EC2FFBCB-B1BD-718E-424D-5F3C92864897
 Boot ID:                    3991d985-f340-4d1f-a8a4-a6f430da359d
 Kernel Version:             4.15.0-1040-aws
 OS Image:                   Ubuntu 18.04.2 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.9.6
 Kubelet Version:            v1.14.2
 Kube-Proxy Version:         v1.14.2
Non-terminated Pods:         (10 in total)
  Namespace                  Name                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                ------------  ----------  ---------------  -------------  ---
  kube-system                coredns-fb8b8dccf-fw9mr             100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     20m
  kube-system                coredns-fb8b8dccf-r2f8p             100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     20m
  kube-system                etcd-minikube                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         20m
  kube-system                gpushare-device-plugin-ds-vlxrz     1 (12%)       1 (12%)     300Mi (0%)       300Mi (0%)     13m
  kube-system                kube-addon-manager-minikube         5m (0%)       0 (0%)      50Mi (0%)        0 (0%)         19m
  kube-system                kube-apiserver-minikube             250m (3%)     0 (0%)      0 (0%)           0 (0%)         19m
  kube-system                kube-controller-manager-minikube    200m (2%)     0 (0%)      0 (0%)           0 (0%)         19m
  kube-system                kube-proxy-bm6mt                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         20m
  kube-system                kube-scheduler-minikube             100m (1%)     0 (0%)      0 (0%)           0 (0%)         15m
  kube-system                storage-provisioner                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         20m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource              Requests     Limits
  --------              --------     ------
  cpu                   1755m (21%)  1 (12%)
  memory                490Mi (0%)   640Mi (1%)
  ephemeral-storage     0 (0%)       0 (0%)
  aliyun.com/gpu-count  0            0
  aliyun.com/gpu-mem    0            0
Events:
  Type    Reason                   Age                From                  Message
  ----    ------                   ----               ----                  -------
  Normal  Starting                 21m                kubelet, minikube     Starting kubelet.
  Normal  NodeAllocatableEnforced  21m                kubelet, minikube     Updated Node Allocatable limit across pods
  Normal  NodeHasSufficientMemory  21m (x8 over 21m)  kubelet, minikube     Node minikube status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    21m (x8 over 21m)  kubelet, minikube     Node minikube status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     21m (x7 over 21m)  kubelet, minikube     Node minikube status is now: NodeHasSufficientPID
  Normal  Starting                 20m                kube-proxy, minikube  Starting kube-proxy.

$ kubectl describe -n kube-system po gpushare-schd-extender-5b869fb687-d479d

Name:               gpushare-schd-extender-5b869fb687-d479d
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=gpushare
                    component=gpushare-schd-extender
                    pod-template-hash=5b869fb687
Annotations:        scheduler.alpha.kubernetes.io/critical-pod: 
Status:             Pending
IP:                 
Controlled By:      ReplicaSet/gpushare-schd-extender-5b869fb687
Containers:
  gpushare-schd-extender:
    Image:      registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a
    Port:       <none>
    Host Port:  <none>
    Environment:
      LOG_LEVEL:  debug
      PORT:       12345
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from gpushare-schd-extender-token-w2xm9 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  gpushare-schd-extender-token-w2xm9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gpushare-schd-extender-token-w2xm9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  19s (x2 over 95s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.

question about healthy check

Do we reduce the GPU memory from the allocatable memory pool when unhealthy card is detected?
it seems that extender schedule do not have related information.

About gpu util

If I have two pods sharing the same gpu , echo one applys 50%. How could I figure out the actual gpu util of them.
Any suggestions?

创建pod报错nvidia-container-cli: device error: unknown device id: no-gpu-has-10MiB-to-run

我使用的集群是1.16版本，按照教程安装完成后，aliyun.com/gpu-count: 2 gpu-mem:22
创建pod时aliyun.com/gpu-mem: 10，但是会报错stderr: nvidia-container-cli: device error: unknown device id: no-gpu-has-10MiB-to-run; 如果指定gpu-count: 1 报错Back-off restarting failed container

可能是什么原因？

环境：
docker 19.03.5

modify memory-unit do not work

At first, I build by GiB.Then I set memory-unit=MiB,but it seem like "limits:aliyun.com/gpu-mem: 3 " still deploy by GiB.

Deploy Node error

kubernetes 1.14.3
nvidia-docker2.0
nvidia driver 384.111
Description:
After I successfully deploy one node, it run successfully.
But the error happened when I deploy the second node, which seems not likely work right.
kubectl describe the node would get the information:

The log of device-plugin-ds seems okay.
So what could it happend?

GPU is not being displayed completely.

Setup:

Minikube
EC2 p3.2xlarge

Problem: The GPU is not being consumed completely.

NAME:       minikube
IPADDRESS:  172.30.0.163

NAME                        NAMESPACE  GPU0(Allocated)  
binpack-3-79998566f8-tzlv7  default    1                
cuda-1-56d4f8bbb-xwdfv      default    2                
Allocated :                 3 (20%)    
Total :                     15

Only 15 GiB ~ 15360MiB is being displayed instead of capacity of 16130MiB

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   51C    P0    42W / 300W |  15341MiB / 16130MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     27745      C   python                                     15331MiB |
+-----------------------------------------------------------------------------+

and, sometimes there are two processes using the GPU

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   56C    P0    45W / 300W |  15906MiB / 16130MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     26038      C   python                                     15331MiB |
|    0     26294      C   python                                       565MiB |
+-----------------------------------------------------------------------------+

$ sudo kubectl inspect gpushare
NAME      IPADDRESS     GPU0(Allocated/Total)  GPU Memory(GiB)
minikube  172.30.0.163  3/15                   3/15
----------------------------------------------------
Allocated/Total GPU Memory In Cluster:
3/15 (20%)

Also, how to show memory in MiB instead of GiB in kubectl inspect gpushare?

deploy the gpushare-scheduler and device plugin on master and node, respectively?

Hello,

Thank you for your nice work. I have a silly question: do I have to deploy the gpushare-scheduler and device plugin on master and node, respectively? Is there any requirement on the GPU type/capability?

Thank you!
Best,
Zhendong

The problem about kubectl inspect gpushare

in the cluster node1 ,the total GPU Memory is 1997Mib(GPU0:Quadro P620)+6072Mib(GPU1:GeForce GTX 1060),following like this:

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2157 G /usr/lib/xorg/Xorg 94MiB |
| 0 6768 G compiz 62MiB |
| 0 7184 G fcitx-qimpanel 6MiB |
| 0 8539 G /usr/lib/firefox/firefox 1MiB |
| 0 8759 G /usr/lib/firefox/firefox 1MiB |
+-----------------------------------------------------------------------------+

but when i excute the command :sudo kubectl inspect gpushare, the output is :
NAME IPADDRESS GPU0(Allocated/Total) GPU1(Allocated/Total) GPU Memory(MiB)
compute1 192.168.1.3 0/1997 0/1997 0/3994

Allocated/Total GPU Memory In Cluster:
0/3994 (0%)

why the GPU1 only has 1997Mib ,does the output of GPU1 has some realtions to the output of GPU0 ?
Can someone explain this problem?
thank you!!

aliyuncontainerservice / gpushare-scheduler-extender Goto Github PK