
open-local's Introduction

Open-Local


English | 简体中文

Open-Local is a local disk management system composed of multiple components. With Open-Local, using local storage in Kubernetes becomes as simple as using centralized storage.

Features

  • Local storage pool management
  • Dynamic volume provisioning
  • Extended scheduler
  • Volume expansion
  • Volume snapshot
  • Volume metrics
  • Raw block volume
  • IO Throttling (direct-io only)
  • Ephemeral inline volume

Open-Local Feature Matrix

Feature                              | Open-Local Version | K8S Version
Node Disk pooling                    | v0.1.0+            | 1.18-1.20
Dynamic Provisioning                 | v0.1.0+            | 1.20-1.22
Volume Expansion                     | v0.1.0+            | 1.20-1.22
Volume Snapshot                      | v0.1.0+            | 1.20-1.22
LVM/Block Device/Mountpoints as fs   | v0.1.0+            | 1.18-1.20
Raw Block Device (volumeMode: Block) | v0.3.0+            | 1.20-1.22
IO-Throttling                        | v0.4.0+            | 1.20-1.22
CSI ephemeral volumes                | v0.5.0+            | 1.20-1.22
IPv6 Support                         | v0.5.3+            | 1.20-1.22
SPDK host device                     | v0.6.0+            | 1.20-1.22
Read-write snapshot                  | v0.7.0+            | 1.20-1.22

Overall Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│ Master                                                                      │
│                   ┌───┬───┐           ┌────────────────┐                    │
│                   │Pod│PVC│           │   API-Server   │                    │
│                   └───┴┬──┘           └────────────────┘                    │
│                        │ bound                ▲                             │
│                        ▼                      │ watch                       │
│                      ┌────┐           ┌───────┴────────┐                    │
│                      │ PV │           │ Kube-Scheduler │                    │
│                      └────┘         ┌─┴────────────────┴─┐                  │
│                        ▲            │     open-local     │                  │
│                        │            │ scheduler-extender │                  │
│                        │      ┌────►└────────────────────┘◄───┐             │
│ ┌──────────────────┐   │      │               ▲               │             │
│ │ NodeLocalStorage │   │create│               │               │  callback   │
│ │    InitConfig    │  ┌┴──────┴─────┐  ┌──────┴───────┐  ┌────┴────────┐    │
│ └──────────────────┘  │  External   │  │   External   │  │  External   │    │
│          ▲            │ Provisioner │  │   Resizer    │  │ Snapshotter │    │
│          │ watch      ├─────────────┤  ├──────────────┤  ├─────────────┤    │
│    ┌─────┴──────┐     ├─────────────┴──┴──────────────┴──┴─────────────┤GRPC│
│    │ open-local │     │                 open-local                     │    │
│    │ controller │     │             CSI ControllerServer               │    │
│    └─────┬──────┘     └────────────────────────────────────────────────┘    │
│          │ create                                                           │
└──────────┼──────────────────────────────────────────────────────────────────┘
           │
┌──────────┼──────────────────────────────────────────────────────────────────┐
│ Worker   │                                                                  │
│          │                                                                  │
│          ▼                ┌───────────┐                                     │
│ ┌──────────────────┐      │  Kubelet  │                                     │
│ │ NodeLocalStorage │      └─────┬─────┘                                     │
│ └──────────────────┘            │ GRPC                     Shared Disks     │
│          ▲                      ▼                          ┌───┐  ┌───┐     │
│          │              ┌────────────────┐                 │sdb│  │sdc│     │
│          │              │   open-local   │ create volume   └───┘  └───┘     │
│          │              │ CSI NodeServer ├───────────────► VolumeGroup      │
│          │              └────────────────┘                                  │
│          │                                                                  │
│          │                                                 Exclusive Disks  │
│          │                ┌─────────────┐                  ┌───┐            │
│          │ update         │ open-local  │  init device     │sdd│            │
│          └────────────────┤    agent    ├────────────────► └───┘            │
│                           └─────────────┘                  Block Device     │
│                                                                             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Open-Local contains four types of components:

  • Scheduler extender: an extension of the Kubernetes scheduler that adds local-storage-aware scheduling algorithms (a minimal sketch of such an extender endpoint follows this list)
  • CSI plugins: provide the ability to create/delete volumes, expand volumes, and take volume snapshots
  • Agent: runs on every node in the K8s cluster, initializes storage devices according to the configuration list, and reports local storage device information to the Scheduler extender
  • Controller: fetches the cluster-wide initial storage configuration and delivers a detailed configuration list to the Agents running on each node
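
For illustration only, the sketch below shows the kind of HTTP predicate (filter) endpoint a scheduler extender exposes, written against the upstream k8s.io/kube-scheduler/extender/v1 types. The route path, port, and the hasEnoughLocalStorage check are placeholder assumptions, not Open-Local's actual implementation.

package main

import (
    "encoding/json"
    "log"
    "net/http"

    corev1 "k8s.io/api/core/v1"
    extenderv1 "k8s.io/kube-scheduler/extender/v1"
)

// filterHandler decodes the scheduler's ExtenderArgs, keeps only the nodes whose
// local storage can satisfy the pod's PVCs, and returns an ExtenderFilterResult.
func filterHandler(w http.ResponseWriter, r *http.Request) {
    var args extenderv1.ExtenderArgs
    if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    result := extenderv1.ExtenderFilterResult{FailedNodes: extenderv1.FailedNodesMap{}}
    if args.NodeNames != nil {
        fits := []string{}
        for _, node := range *args.NodeNames {
            if hasEnoughLocalStorage(args.Pod, node) {
                fits = append(fits, node)
            } else {
                result.FailedNodes[node] = "Insufficient local storage"
            }
        }
        result.NodeNames = &fits
    }
    _ = json.NewEncoder(w).Encode(&result)
}

// hasEnoughLocalStorage is a placeholder for the real check against the
// extender's NodeLocalStorage cache (per-node VG/device capacity).
func hasEnoughLocalStorage(pod *corev1.Pod, node string) bool { return true }

func main() {
    http.HandleFunc("/filter", filterHandler)
    log.Fatal(http.ListenAndServe(":23000", nil))
}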

Open-Local also includes a monitoring dashboard.

Who uses Open-Local

Open-Local has been widely used in production environments; products that currently use it include:

  • ACK Distro
  • Alibaba Cloud ECP (Enterprise Container Platform)
  • Alibaba Cloud ADP (Cloud-Native Application Delivery Platform)
  • CNStack Products
  • AntStack Plus Products

User guide

More details here

Collecting User Cases

Before adopting Open-Local in production, Kubernetes users usually want to know how others use it. Please send us a PR updating the Use Cases list with your company, use case, and the date you adopted it, to help wider adoption.

Contact

Join us from DingTalk: Group No.34118035

License

Apache 2.0 License

open-local's People

Contributors

alibaba-oss, allencloud, caiwenzh, caoyingjunz, chzhj, coolhok, dongjiang1989, hyschumi, j4ckstraw, jenting, laurencelizhixin, lihezhong93, luokp, peter-wangxu, rafi, thebeatles1994, tzzcfrank, vincecui, withlin, ypnuaa037

open-local's Issues

[bug] path is still mounted in host after umounting successfully in csi-plugin

Ⅰ. Issue Description

image

It is suspected that this is related to the mount propagation mechanism (host <-> container); mount synchronization in this environment may be slow.

image

The orphaned-Pod cleanup mechanism needs to be extended to cover this case. However, after the path is cleaned up, the ephemeral logical volume can no longer be deleted (see the NodeUnpublishVolume logic); the volume information has to be persisted locally to solve this problem completely.

Ⅱ. Describe what happened

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

[feature] Support Scheduling Framework

Why you need it?

The current scheduling extension is implemented as an Extender, which has some drawbacks:

  • It involves HTTP calls, which hurts scheduling performance
  • The Extender takes a lock before every cache update. In a stress test on a large cluster (500 nodes, creating 10000 Pods that use volumes), the TPS is low, mainly because every request spends time waiting for the lock to be released.

How it could be?

Switching to the Scheduling Framework would solve both problems (see the sketch after this list):

  • The HTTP calls disappear entirely
  • The Framework can obtain the final Score result (the Extender does not expose such an interface, only Filter, Score and Bind). After the node is chosen, the name of the selected storage device can be marked on the NLS (or possibly another resource) as a label, which removes the callback from the CSI components to the Extender and further reduces lock usage.
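
For illustration, a Filter plugin written against the upstream scheduling framework would look roughly like the hedged sketch below; the plugin name, package layout, and the nodeHasEnoughLocalStorage check are assumptions rather than an agreed design.

package openlocal

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/kubernetes/pkg/scheduler/framework"
)

// openLocalPlugin is a hypothetical sketch of an open-local Filter plugin.
type openLocalPlugin struct {
    handle framework.Handle
}

var _ framework.FilterPlugin = &openLocalPlugin{}

func (p *openLocalPlugin) Name() string { return "OpenLocal" }

// Filter runs in-process, so the Extender's HTTP round trip disappears and the
// plugin can read the local-storage cache directly, without the extra callback.
func (p *openLocalPlugin) Filter(ctx context.Context, state *framework.CycleState,
    pod *corev1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
    if !nodeHasEnoughLocalStorage(pod, nodeInfo.Node()) {
        return framework.NewStatus(framework.Unschedulable, "Insufficient local storage")
    }
    return framework.NewStatus(framework.Success)
}

// New is the factory that would be registered with the scheduler (e.g. via
// app.NewSchedulerCommand(app.WithPlugin("OpenLocal", New)) in a custom build).
func New(_ runtime.Object, h framework.Handle) (framework.Plugin, error) {
    return &openLocalPlugin{handle: h}, nil
}

// nodeHasEnoughLocalStorage is a placeholder for the real VG/device capacity check.
func nodeHasEnoughLocalStorage(pod *corev1.Pod, node *corev1.Node) bool { return true }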

Other related information

Rename: VG --> StoragePool; LogicalVolume to LocalVolume

Ⅰ. Issue Description

The project is named open-local, not open-lvm; it is supposed to support local storage solutions beyond LVM.

Consider refactoring the code to make it more generic and prepare it to accept new local storage types.

This seems to be a big job, especially since some of this naming is used in YAML files, which would likely break backward compatibility.

Not sure if this is feasible.

[bug] create inline volume error

Ⅰ. Issue Description

create inline volume error

Ⅱ. Describe what happened

  Normal   Scheduled    18s               default-scheduler  Successfully assigned default/file-server-6b9b66fb7c-bqbgl to block4
  Warning  FailedMount  1s (x6 over 18s)  kubelet            MountVolume.SetUp failed for volume "webroot" : rpc error: code = Internal desc = NodePublishVolume(mountLvmFS): mount lvm volume csi-993f2d33ec8c5892c833a038cfafa0e364df02be647edc83bc2f66d5435871ae with path /var/lib/kubelet/pods/f65659a8-7271-4600-b43b-39d692967276/volumes/kubernetes.io~csi/webroot/mount with error: rpc error: code = Internal desc = Failed to run cmd: /bin/nsenter --mount=/proc/1/ns/mnt --ipc=/proc/1/ns/ipc --net=/proc/1/ns/net --uts=/proc/1/ns/uts  lvcreate -n csi-993f2d33ec8c5892c833a038cfafa0e364df02be647edc83bc2f66d5435871ae -L 1024m open-local-pool-0, with out: WARNING: ext4 signature detected on /dev/open-local-pool-0/csi-993f2d33ec8c5892c833a038cfafa0e364df02be647edc83bc2f66d5435871ae at offset 1080. Wipe it? [y/n]: [n]
  Aborted wiping of ext4.
  1 existing signature left on the device.
  Failed to wipe signatures on logical volume open-local-pool-0/csi-993f2d33ec8c5892c833a038cfafa0e364df02be647edc83bc2f66d5435871ae.
  Aborting. Failed to wipe start of new LV.
, with error: exit status 5

Ⅲ. Anything else we need to know?

createLvm(inline volume)

cmd := fmt.Sprintf("%s lvcreate -n %s -L %d%s %s", localtype.NsenterCmd, volumeID, pvSize, unit, vgName)

CreateLV

args := []string{localtype.NsenterCmd, "lvcreate", "-n", name, "-L", fmt.Sprintf("%db", size), "-W", "y", "-y"}
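
A hedged guess at the fix, for comparison: give the inline-volume path the same -W y -y flags that CreateLV already passes, so lvcreate wipes the stale ext4 signature instead of prompting and aborting. The helper below is illustrative only, not the project's actual patch.

package main

import "fmt"

// buildInlineLVCreateCmd mirrors CreateLV's flags in the inline-volume path:
// "-W y -y" makes lvcreate wipe a leftover filesystem signature non-interactively
// instead of failing with "Aborted wiping of ext4" (exit status 5).
func buildInlineLVCreateCmd(nsenterCmd, volumeID string, pvSize int64, unit, vgName string) string {
    return fmt.Sprintf("%s lvcreate -n %s -L %d%s -W y -y %s",
        nsenterCmd, volumeID, pvSize, unit, vgName)
}

func main() {
    fmt.Println(buildInlineLVCreateCmd("nsenter --mount=/proc/1/ns/mnt", "csi-example", 1024, "m", "open-local-pool-0"))
}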

[bug] Pod scheduling problem

Ⅰ. Issue Description

Pods with multiple pvc are scheduled to nodes with insufficient capacity, while other nodes can meet the capacity requirements of PVCs.

Ⅱ. Describe what happened

I deployed an example sts-nginx with 3 replicas on 4 nodes. Each pod mounts two PVCs, 1T and 100G, which are created on the node's volume groups 'vgdaasdata' and 'vgdaaslogs' respectively.

  1. Initially, the capacity of the two volume groups is 1.7T and 560G, which means that only one pod can be scheduled on each node; the scheduler-extender cache is shown in Figure 1.

  2. Figure 2 shows that 5 PVs were successfully created, and one was not created. The reason is that there are two pods scheduled to the same node.
    At this time, the scheduler-extender cache is shown in Figures 3 and 4.

  3. Then I deleted the STS and these PVCs; the cache is shown in Figure 5.

Ⅲ. Describe what you expected to happen

Only one pod can be scheduled on each node in this situation.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Deploy a workload as I described.

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version: 0.5.5
  • OS (e.g. from /etc/os-release): centos 7.9
  • Kernel (e.g. uname -a):
  • Install tools: helm 3.0
  • Others:
    Kube-scheduler logs are shown in Figure 6.

[bug] Extender fail to update some nls after the nls resource is deleted and rebuilt in a large-scale cluster

After deleting nls objects in bulk, the nls are recreated normally, but the extender reports errors when patching:

time="2021-11-26T14:50:31+08:00" level=debug msg="get update on node local cache izbp1277upijzx9vn1t003z"
time="2021-11-26T14:50:31+08:00" level=debug msg="added vgs: []string{\"yoda-pool0\"}"
time="2021-11-26T14:50:31+08:00" level=error msg="local storage CRD update Status FilteredStorageInfo error: Operation cannot be fulfilled on nodelocalstorages.csi.aliyun.com \"izbp1277upijzx9vn1t003z\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-11-26T14:50:31+08:00" level=debug msg="get update on node local cache izbp1277upijzwo2cqyljpz"
time="2021-11-26T14:50:31+08:00" level=debug msg="added vgs: []string{\"yoda-pool0\"}"
time="2021-11-26T14:50:31+08:00" level=error msg="local storage CRD update Status FilteredStorageInfo error: Operation cannot be fulfilled on nodelocalstorages.csi.aliyun.com \"izbp1277upijzwo2cqyljpz\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-11-26T14:50:31+08:00" level=debug msg="get update on node local cache izbp1277upijzwo2cqyli4z"
time="2021-11-26T14:50:31+08:00" level=debug msg="added vgs: []string{\"yoda-pool0\"}"
time="2021-11-26T14:50:31+08:00" level=error msg="local storage CRD update Status FilteredStorageInfo error: Operation cannot be fulfilled on nodelocalstorages.csi.aliyun.com \"izbp1277upijzwo2cqyli4z\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-11-26T14:50:31+08:00" level=debug msg="get update on node local cache izbp1277upijzx9vn1t01pz"
time="2021-11-26T14:50:31+08:00" level=debug msg="added vgs: []string{\"yoda-pool0\"}"
time="2021-11-26T14:50:32+08:00" level=error msg="local storage CRD update Status FilteredStorageInfo error: Operation cannot be fulfilled on nodelocalstorages.csi.aliyun.com \"izbp1277upijzx9vn1t01pz\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-11-26T14:50:32+08:00" level=debug msg="get update on node local cache izbp1277upijzwo2cqyljmz"
time="2021-11-26T14:50:32+08:00" level=debug msg="added vgs: []string{\"yoda-pool0\"}"
time="2021-11-26T14:50:32+08:00" level=error msg="local storage CRD update Status FilteredStorageInfo error: Operation cannot be fulfilled on nodelocalstorages.csi.aliyun.com \"izbp1277upijzwo2cqyljmz\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-11-26T14:50:32+08:00" level=debug msg="get update on node local cache izbp14kyqi4fdsb7ax48itz"
time="2021-11-26T14:50:32+08:00" level=debug msg="added vgs: []string{\"yoda-pool0\"}"
time="2021-11-26T14:50:32+08:00" level=error msg="local storage CRD update Status FilteredStorageInfo error: Operation cannot be fulfilled on nodelocalstorages.csi.aliyun.com \"izbp14kyqi4fdsb7ax48itz\": the object has been modified; please apply your changes to the latest version and try again"

The impact is that the extender cannot update the nls status, so applications cannot use the storage devices on those nodes.

This was tested in a large-scale scenario.

Use an existing VG?

Question

Is it possible to use an existing VG with this project? I already have a PV and VG created, and the VG has 100GB free to create LVs.

Would it be possible to configure open-local to create new LVs in the existing VG? If it's possible, would appreciate any help.

[feature] Wanted feature

If you find Open-Local cannot satisfy your local storage needs, please create an issue to let us know what feature you want.

kube-scheduler flag policy-config-file removed from v1.23

Ⅰ. Issue Description

The current open-local helm chart uses a job to append the policy-config-file flag to kube-scheduler; that flag was removed in v1.23, which causes an error.

Ⅱ. Describe what happened

Scheduler crash

Error: unknown flag: --policy-config-file

https://sourcegraph.com/github.com/kubernetes/kubernetes/-/blob/CHANGELOG/CHANGELOG-1.23.md?L1974

The legacy scheduler policy config is removed in v1.23, the associated flags policy-config-file, policy-configmap, policy-configmap-namespace and use-legacy-policy-config are also removed. Migrate to Component Config instead, see https://kubernetes.io/docs/reference/scheduling/config/ for details. (#105424, @kerthcet) [SIG Scheduling and Testing]

Ⅲ. Describe what you expected to happen

Scheduler runs successfully.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

  1. helm template openlocal ./helm > openlocal.yaml
  2. kubectl apply -f open-local.yaml

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • k8s version: v1.23.6
  • Open-Local version: v0.5.4
  • OS (e.g. from /etc/os-release): centos8
  • Kernel (e.g. uname -a): Linux 4.18.0-305.3.1.el8.x86_64 SMP Tue Jun 1 16:14:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

[feature] Extend to support zfs

Why you need it?

open-local is a general name, yet only LVM is covered; extending it to support ZFS seems to make it better deserve the name ;)

LVM has better performance and is less resource demanding, but ZFS has some nice features, like ensuring data integrity, that keep it a valid choice in many datacenters.

How it could be?

Since LVM and ZFS are so similar from an end-user perspective, I am wondering if it is possible to add a ZFS driver side by side, making open-local support ZFS as a backend storage option. I see OpenEBS has an option named local-zfs, but it looks pretty messy there. This project seems to provide much neater code and could be a better base for implementing a ZFS CSI driver. If ZFS is included, I believe the project will get much more attention.

What do you think @TheBeatles1994?

Other related information

[bug] open-local agent may have memory leak

Ⅰ. Issue Description

yoda-agent-z4kts                                  39m          208Mi
yoda-agent-48nqc                                  42m          192Mi
yoda-agent-zmgxz                                  50m          190Mi
yoda-agent-8r7vq                                  38m          181Mi
yoda-agent-5lbnj                                  37m          175Mi
yoda-agent-42xqb                                  47m          102Mi
yoda-agent-hrj5g                                  59m          84Mi
yoda-agent-89glg                                  59m          77Mi
yoda-agent-brzpb                                  51m          53Mi

It is weird that the open-local agent uses so much memory (about 200Mi).

Ⅱ. Describe what happened

It may be difficult to troubleshoot because there is no more information.

pvc is pending when there are available devices on node

Issue Description

After running kubectl apply -f example/devicests-fs.yaml, the PVC occasionally ends up in the Pending state, with events as follows:

Events:
  Type     Reason                Age                   From                                                                               Message
  ----     ------                ----                  ----                                                                               -------
  Normal   Provisioning          20s (x8 over 2m27s)   local.csi.aliyun.com_iZ0xibum6107bq2rnzisvrZ_d0a2edfe-6682-41a8-84db-13096bca2e49  External provisioner is provisioning volume for claim "default/html-nginx-device-7"
  Warning  ProvisioningFailed    20s (x8 over 2m27s)   local.csi.aliyun.com_iZ0xibum6107bq2rnzisvrZ_d0a2edfe-6682-41a8-84db-13096bca2e49  failed to provision volume with StorageClass "open-local-device-hdd": rpc error: code = InvalidArgument desc = Parse Device part schedule info error rpc error: code = InvalidArgument desc = device schedule with error Get Response StatusCode 500, Response: failed to allocate local storage for pvc default/html-nginx-device-7: Insufficient Device(hdd) storage, pod requested pvc count is 1, node available device count is 0, node device total is 4
  Normal   ExternalProvisioning  12s (x11 over 2m27s)  persistentvolume-controller                                                        waiting for a volume to be created, either by external provisioner "local.csi.aliyun.com" or manually created by system administrator

However, looking at the extender's metrics, the cached local_device_bind value is correct, i.e. 0.

open-local agent produces too many error messages if volume group auto-creation fails

Ⅰ. Issue Description

The open-local agent produces too many error messages if volume group auto-creation fails.

Ⅱ. Describe what happened

If the agent is unable to create the volume group, the error is generated again and again
image

These errors may hide the informational messages of the system

Ⅲ. Describe what you expected to happen

It would be good to introduce some backoff mechanism to eliminate the noisy messages (see the sketch below)
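
A minimal sketch of one possible backoff, assuming a per-VG in-memory state inside the agent's sync loop; the type and function names are made up for illustration and are not open-local's actual code.

package main

import (
    "log"
    "time"
)

// vgCreateBackoff tracks, per VG name, when the next creation attempt (and the
// accompanying error log) is allowed, doubling the delay on every failure.
type vgCreateBackoff struct {
    nextTry map[string]time.Time
    delay   map[string]time.Duration
}

func newVGCreateBackoff() *vgCreateBackoff {
    return &vgCreateBackoff{nextTry: map[string]time.Time{}, delay: map[string]time.Duration{}}
}

// shouldTry reports whether the agent may retry (and log) for this VG now.
func (b *vgCreateBackoff) shouldTry(vg string) bool {
    return time.Now().After(b.nextTry[vg])
}

// failed doubles the delay after a failed attempt, capped at 10 minutes.
func (b *vgCreateBackoff) failed(vg string) {
    d := b.delay[vg]
    if d == 0 {
        d = 10 * time.Second
    } else {
        d *= 2
    }
    if d > 10*time.Minute {
        d = 10 * time.Minute
    }
    b.delay[vg] = d
    b.nextTry[vg] = time.Now().Add(d)
}

// succeeded clears the state once the VG is finally created.
func (b *vgCreateBackoff) succeeded(vg string) {
    delete(b.delay, vg)
    delete(b.nextTry, vg)
}

func main() {
    b := newVGCreateBackoff()
    for i := 0; i < 3; i++ {
        if b.shouldTry("open-local-pool-0") {
            // the real sync loop would attempt VG creation here and only
            // log the error when a new attempt is actually made
            log.Println("creating vg open-local-pool-0 failed, backing off")
            b.failed("open-local-pool-0")
        }
        time.Sleep(time.Second)
    }
}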

Ⅳ. How to reproduce it (as minimally and precisely as possible)

  1. specify init configuration to create vg with an occupied disk
  2. check the log in the agent

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version: v0.4.0
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

[bug] RESTORESIZE of volumesnapshot is not correct

Ⅰ. Issue Description

image

image

Ⅱ. Describe what happened

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

After a new PV is added to the VG, the logical volume of stripe type still fails to be created

I tried to create a 3Gi 'striped' PVC, but it failed; the log is as follows:

E0811 01:33:55.208394 1 controller.go:920] error syncing claim "90e63057-6723-40fd-83d2-39199018c149":
failed to provision volume with StorageClass "test": rpc error: code = Unknown desc = Create Lvm with error rpc error: code = Internal desc = failed to create lv: Failed to run cmd: /nsenter --mount=/proc/1/ns/mnt --ipc=/proc/1/ns/ipc --net=/proc/1/ns/net --uts=/proc/1/ns/uts lvcreate -n disk-90e63057-6723-40fd-83d2-39199018c149 -L 3221225472b -W y -y -i 2 vg_test with out: Using default stripesize 64.00 KiB.
Insufficient suitable allocatable extents for logical volume disk-90e63057-6723-40fd-83d2-39199018c149: 494 more required, with error: exit status 5

PV info

[root@node-219 ~]# pvs -S vg_name=vg_test
PV VG Fmt Attr PSize PFree
/dev/sdb vg_test lvm2 a-- <20.00g 540m
/dev/sdc vg_test lvm2 a-- <30.00g <10.54g

After I added a 30G PV (/dev/sdd) to the VG, creation still failed; the log is as follows:

E0811 01:33:55.208394 1 controller.go:920] error syncing claim "4b3a6936-8526-41f0-aa3d-f9e04bad4f3f":
failed to provision volume with StorageClass "test": rpc error: code = Unknown desc = Create Lvm with error rpc error: code = Internal desc = failed to create lv: Failed to run cmd: /nsenter --mount=/proc/1/ns/mnt --ipc=/proc/1/ns/ipc --net=/proc/1/ns/net --uts=/proc/1/ns/uts lvcreate -n disk-4b3a6936-8526-41f0-aa3d-f9e04bad4f3f -L 3221225472b -W y -y -i 3 vg_test with out: Using default stripesize 64.00 KiB.
Insufficient suitable allocatable extents for logical volume disk-4b3a6936-8526-41f0-aa3d-f9e04bad4f3f: 238 more required, with error: exit status 5

PV info

[root@node-219 ~]# pvs -S vg_name=vg_test
PV VG Fmt Attr PSize PFree
/dev/sdb vg_test lvm2 a-- <20.00g 540m
/dev/sdc vg_test lvm2 a-- <30.00g <10.54g
/dev/sdd vg_test lvm2 a-- <30.00g <30.00g

Then I looked into the problem and found it was a bug. No matter how many new PVs I added, I couldn't create a 'striped' volume; it depends on the PV with the lowest free capacity.

The code is as follows:

func getPVNumber(vgName string) int {

If you're creating a 'striped' volume, I don't think it's right to use the number of all PVs in the VG as the '-i' parameter. I've optimized the code; I hope it helps.

	if striping {
		pvCount, err := getRequiredPVNumber(vg, size)
		if err != nil {
			return "", err
		}
		if pvCount == 0 {
			return "", fmt.Errorf("could not create `striping` logical volume, not enough space")
		}
		args = append(args, "-i", strconv.Itoa(pvCount))
	}
func getRequiredPVNumber(vgName string, lvSize uint64) (int, error) {
	pvs, err := ListPV(vgName)
	if err != nil {
		return 0, err
	}
	// calculate pv count
	pvCount := len(pvs)
	for pvCount > 0 {
		avgPvRequest := lvSize / uint64(pvCount)
		for num, pv := range pvs {
			if pv.FreeSize < avgPvRequest {
				pvs = append(pvs[:num], pvs[num+1:]...)
			}
		}

		if pvCount == len(pvs) {
			break
		}
		pvCount = len(pvs)
	}
	return pvCount, nil
}

Not Found csi-provisioner, csi-resizer ...

Question

Hi @TheBeatles1994 @caiwenzh

I can't find how to build these images ecp_builder/csi-node-driver-registrar, ecp_builder/csi-provisioner, ecp_builder/csi-resizer,ecp_builder/csi-snapshotter, ecp_builder/snapshot-controller

can you tell me how to do it? is this part of the content released?

expand pvc error

time="2021-11-09T11:37:09+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:37:09+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:37:09+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:37:09+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=22544384000 bytes), "
time="2021-11-09T11:37:09+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 392985313280"
time="2021-11-09T11:37:09+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:37:09+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:37:09+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:37:09+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=22544384000 bytes), "
time="2021-11-09T11:37:09+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 392985313280"
time="2021-11-09T11:37:10+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:37:10+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:37:10+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:37:10+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=22544384000 bytes), "
time="2021-11-09T11:37:10+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 392985313280"
time="2021-11-09T11:37:14+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:37:14+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:37:14+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:37:14+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=22544384000 bytes), "
time="2021-11-09T11:37:14+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 392985313280"
time="2021-11-09T11:37:22+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:37:22+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:37:22+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:37:22+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=22544384000 bytes), "
time="2021-11-09T11:37:22+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 392985313280"
time="2021-11-09T11:37:22+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:37:22+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:37:22+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:37:22+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=22544384000 bytes), "
time="2021-11-09T11:37:22+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 392985313280"
time="2021-11-09T11:37:38+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:37:38+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:37:38+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:37:38+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=-95567216640 bytes), "
time="2021-11-09T11:37:38+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 274873712640"
time="2021-11-09T11:38:42+08:00" level=info msg="expand pvc size acs-minio/minio-data-minio-5 from 42949672960 to 161061273600"
time="2021-11-09T11:38:42+08:00" level=info msg="expanding pvc acs-minio/minio-data-minio-5"
time="2021-11-09T11:38:42+08:00" level=info msg="pvc acs-minio/minio-data-minio-5 old size is 42949672960, new size 161061273600"
time="2021-11-09T11:38:42+08:00" level=info msg="matching pvc acs-minio/minio-data-minio-5 on vg yoda-pool0(left=-95567216640 bytes), "
time="2021-11-09T11:38:42+08:00" level=error msg="failed to expand pvc acs-minio/minio-data-minio-5: failed to extend pvc, vg yoda-pool0 is not enough, requested total 488552529920, capacity 274873712640"

When expanding the PVC, the 'left' value becomes negative.

make failed

Ⅰ. Issue Description

I checked out the repo and ran make; it seems to fail out of the box. Am I missing something?

Ⅱ. Describe what happened

lee@ubuntu:~/workspace/picloud/open-local$ make
go test -v ./...
?       github.com/alibaba/open-local/cmd       [no test files]
?       github.com/alibaba/open-local/cmd/agent [no test files]
?       github.com/alibaba/open-local/cmd/controller    [no test files]
?       github.com/alibaba/open-local/cmd/csi   [no test files]
?       github.com/alibaba/open-local/cmd/doc   [no test files]
time="2022-06-08T12:15:26+02:00" level=info msg="test noResyncPeriodFunc"
time="2022-06-08T12:15:26+02:00" level=info msg="test noResyncPeriodFunc"
time="2022-06-08T12:15:26+02:00" level=info msg="test noResyncPeriodFunc"
time="2022-06-08T12:15:26+02:00" level=info msg="Waiting for informer caches to sync"
time="2022-06-08T12:15:26+02:00" level=info msg="starting http server on port 23000"
time="2022-06-08T12:15:26+02:00" level=info msg="all informer caches are synced"
=== RUN   TestVGWithName
time="2022-06-08T12:15:26+02:00" level=info msg="predicating pod testpod with nodes [[node-192.168.0.1 node-192.168.0.2 node-192.168.0.3 node-192.168.0.4]]"
time="2022-06-08T12:15:26+02:00" level=info msg="predicating pod default/testpod with node node-192.168.0.1"
time="2022-06-08T12:15:26+02:00" level=info msg="got pvc default/pvc-vg as lvm pvc"
time="2022-06-08T12:15:26+02:00" level=info msg="allocating lvm volume for pod default/testpod"
time="2022-06-08T12:15:26+02:00" level=error msg="Insufficient LVM storage on node node-192.168.0.1, vg is ssd, pvc requested 150Gi, vg used 0, vg capacity 100Gi"
time="2022-06-08T12:15:26+02:00" level=info msg="fits: false,failReasons: [Insufficient LVM storage on node node-192.168.0.1, vg is ssd, pvc requested 150Gi, vg used 0, vg capacity 100Gi], err: Insufficient LVM storage on node node-192.168.0.1, vg is ssd, pvc requested 150Gi, vg used 0, vg capacity 100Gi"
time="2022-06-08T12:15:26+02:00" level=info msg="pod=default/testpod, node=node-192.168.0.1,fits: false,failReasons: [Insufficient LVM storage on node node-192.168.0.1, vg is ssd, pvc requested 150Gi, vg used 0, vg capacity 100Gi], err: <nil>"
time="2022-06-08T12:15:26+02:00" level=info msg="node node-192.168.0.1 is not suitable for pod default/testpod, reason: [Insufficient LVM storage on node node-192.168.0.1, vg is ssd, pvc requested 150Gi, vg used 0, vg capacity 100Gi] "
time="2022-06-08T12:15:26+02:00" level=info msg="predicating pod default/testpod with node node-192.168.0.2"
time="2022-06-08T12:15:26+02:00" level=info msg="got pvc default/pvc-vg as lvm pvc"
time="2022-06-08T12:15:26+02:00" level=info msg="allocating lvm volume for pod default/testpod"
time="2022-06-08T12:15:26+02:00" level=info msg="node node-192.168.0.2 is capable of lvm 1 pvcs"
time="2022-06-08T12:15:26+02:00" level=info msg="got pvc default/pvc-vg as lvm pvc"
time="2022-06-08T12:15:26+02:00" level=info msg="fits: true,failReasons: [], err: <nil>"
time="2022-06-08T12:15:26+02:00" level=info msg="pod=default/testpod, node=node-192.168.0.2,fits: true,failReasons: [], err: <nil>"
time="2022-06-08T12:15:26+02:00" level=info msg="predicating pod default/testpod with node node-192.168.0.3"
time="2022-06-08T12:15:26+02:00" level=info msg="got pvc default/pvc-vg as lvm pvc"
time="2022-06-08T12:15:26+02:00" level=info msg="allocating lvm volume for pod default/testpod"
time="2022-06-08T12:15:26+02:00" level=info msg="node node-192.168.0.3 is capable of lvm 1 pvcs"
time="2022-06-08T12:15:26+02:00" level=info msg="got pvc default/pvc-vg as lvm pvc"
time="2022-06-08T12:15:26+02:00" level=info msg="fits: true,failReasons: [], err: <nil>"
time="2022-06-08T12:15:26+02:00" level=info msg="pod=default/testpod, node=node-192.168.0.3,fits: true,failReasons: [], err: <nil>"
time="2022-06-08T12:15:26+02:00" level=info msg="predicating pod default/testpod with node node-192.168.0.4"
time="2022-06-08T12:15:26+02:00" level=info msg="got pvc default/pvc-vg as lvm pvc"
time="2022-06-08T12:15:26+02:00" level=info msg="allocating lvm volume for pod default/testpod"
time="2022-06-08T12:15:26+02:00" level=error msg="no vg(LVM) named ssd in node node-192.168.0.4"
time="2022-06-08T12:15:26+02:00" level=info msg="fits: false,failReasons: [no vg(LVM) named ssd in node node-192.168.0.4], err: no vg(LVM) named ssd in node node-192.168.0.4"
time="2022-06-08T12:15:26+02:00" level=info msg="pod=default/testpod, node=node-192.168.0.4,fits: false,failReasons: [no vg(LVM) named ssd in node node-192.168.0.4], err: <nil>"
time="2022-06-08T12:15:26+02:00" level=info msg="node node-192.168.0.4 is not suitable for pod default/testpod, reason: [no vg(LVM) named ssd in node node-192.168.0.4] "
unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x46845f]

goroutine 91 [running]:
runtime.throw({0x178205e?, 0x18?})
        /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc0004d71e8 sp=0xc0004d71b8 pc=0x4380b1
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:825 +0x305 fp=0xc0004d7238 sp=0xc0004d71e8 pc=0x44e485
aeshashbody()
        /usr/local/go/src/runtime/asm_amd64.s:1343 +0x39f fp=0xc0004d7240 sp=0xc0004d7238 pc=0x46845f
runtime.mapiternext(0xc000788780)
        /usr/local/go/src/runtime/map.go:934 +0x2cb fp=0xc0004d72b0 sp=0xc0004d7240 pc=0x411beb
runtime.mapiterinit(0x0?, 0x8?, 0x1?)
        /usr/local/go/src/runtime/map.go:861 +0x228 fp=0xc0004d72d0 sp=0xc0004d72b0 pc=0x4118c8
reflect.mapiterinit(0xc000039cf8?, 0xc0004d7358?, 0x461365?)
        /usr/local/go/src/runtime/map.go:1373 +0x19 fp=0xc0004d72f8 sp=0xc0004d72d0 pc=0x464b79
github.com/modern-go/reflect2.(*UnsafeMapType).UnsafeIterate(...)
        /home/lee/workspace/picloud/open-local/vendor/github.com/modern-go/reflect2/unsafe_map.go:112
github.com/json-iterator/go.(*sortKeysMapEncoder).Encode(0xc00058f230, 0xc000497f00, 0xc000039ce0)
        /home/lee/workspace/picloud/open-local/vendor/github.com/json-iterator/go/reflect_map.go:291 +0x225 fp=0xc0004d7468 sp=0xc0004d72f8 pc=0x8553e5
github.com/json-iterator/go.(*structFieldEncoder).Encode(0xc00058f350, 0x1436da0?, 0xc000039ce0)
        /home/lee/workspace/picloud/open-local/vendor/github.com/json-iterator/go/reflect_struct_encoder.go:110 +0x56 fp=0xc0004d74e0 sp=0xc0004d7468 pc=0x862b36
github.com/json-iterator/go.(*structEncoder).Encode(0xc00058f3e0, 0x0?, 0xc000039ce0)
        /home/lee/workspace/picloud/open-local/vendor/github.com/json-iterator/go/reflect_struct_encoder.go:158 +0x765 fp=0xc0004d75c8 sp=0xc0004d74e0 pc=0x863545
github.com/json-iterator/go.(*OptionalEncoder).Encode(0xc00013bb80?, 0x0?, 0x0?)
        /home/lee/workspace/picloud/open-local/vendor/github.com/json-iterator/go/reflect_optional.go:70 +0xa4 fp=0xc0004d7618 sp=0xc0004d75c8 pc=0x85a744
github.com/json-iterator/go.(*onePtrEncoder).Encode(0xc0004b3210, 0xc000497ef0, 0xc000497f50?)
        /home/lee/workspace/picloud/open-local/vendor/github.com/json-iterator/go/reflect.go:219 +0x82 fp=0xc0004d7650 sp=0xc0004d7618 pc=0x84d7c2
github.com/json-iterator/go.(*Stream).WriteVal(0xc000039ce0, {0x158a3e0, 0xc000497ef0})
        /home/lee/workspace/picloud/open-local/vendor/github.com/json-iterator/go/reflect.go:98 +0x158 fp=0xc0004d76c0 sp=0xc0004d7650 pc=0x84cad8
github.com/json-iterator/go.(*frozenConfig).Marshal(0xc00013bb80, {0x158a3e0, 0xc000497ef0})
        /home/lee/workspace/picloud/open-local/vendor/github.com/json-iterator/go/config.go:299 +0xc9 fp=0xc0004d7758 sp=0xc0004d76c0 pc=0x843d89
github.com/alibaba/open-local/pkg/scheduler/server.PredicateRoute.func1({0x19bfee0, 0xc00019c080}, 0xc000318000, {0x203000?, 0xc00062b928?, 0xc00062b84d?})
        /home/lee/workspace/picloud/open-local/pkg/scheduler/server/routes.go:83 +0x326 fp=0xc0004d7878 sp=0xc0004d7758 pc=0x132d5e6
github.com/alibaba/open-local/pkg/scheduler/server.DebugLogging.func1({0x19cafb0?, 0xc0005a80e0}, 0xc000056150?, {0x0, 0x0, 0x0})
        /home/lee/workspace/picloud/open-local/pkg/scheduler/server/routes.go:217 +0x267 fp=0xc0004d7988 sp=0xc0004d7878 pc=0x132e4a7
github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc0000b0de0, {0x19cafb0, 0xc0005a80e0}, 0xc000318000)
        /home/lee/workspace/picloud/open-local/vendor/github.com/julienschmidt/httprouter/router.go:387 +0x82b fp=0xc0004d7a98 sp=0xc0004d7988 pc=0x12d61ab
net/http.serverHandler.ServeHTTP({0x19bc700?}, {0x19cafb0, 0xc0005a80e0}, 0xc000318000)
        /usr/local/go/src/net/http/server.go:2916 +0x43b fp=0xc0004d7b58 sp=0xc0004d7a98 pc=0x7e87fb
net/http.(*conn).serve(0xc0001da3c0, {0x19cbab0, 0xc0001b68a0})
        /usr/local/go/src/net/http/server.go:1966 +0x5d7 fp=0xc0004d7fb8 sp=0xc0004d7b58 pc=0x7e3cb7
net/http.(*Server).Serve.func3()
        /usr/local/go/src/net/http/server.go:3071 +0x2e fp=0xc0004d7fe0 sp=0xc0004d7fb8 pc=0x7e914e
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0004d7fe8 sp=0xc0004d7fe0 pc=0x46b061
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:3071 +0x4db

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000103ba0, {0x178cc75?, 0x516ac5?}, 0x18541b0)
        /usr/local/go/src/testing/testing.go:1487 +0x37a
testing.runTests.func1(0xc0001b69c0?)
        /usr/local/go/src/testing/testing.go:1839 +0x6e
testing.tRunner(0xc000103ba0, 0xc00064bcd8)
        /usr/local/go/src/testing/testing.go:1439 +0x102
testing.runTests(0xc00050a0a0?, {0x2540700, 0x7, 0x7}, {0x7fa22c405a68?, 0x40?, 0x2557740?})
        /usr/local/go/src/testing/testing.go:1837 +0x457
testing.(*M).Run(0xc00050a0a0)
        /usr/local/go/src/testing/testing.go:1719 +0x5d9
main.main()
        _testmain.go:59 +0x1aa

goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x0?)
        /home/lee/workspace/picloud/open-local/vendor/k8s.io/klog/v2/klog.go:1169 +0x6a
created by k8s.io/klog/v2.init.0
        /home/lee/workspace/picloud/open-local/vendor/k8s.io/klog/v2/klog.go:417 +0xf6

goroutine 92 [IO wait]:
internal/poll.runtime_pollWait(0x7fa204607b38, 0x72)
        /usr/local/go/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc0003c6100?, 0xc00050c2e1?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc0003c6100, {0xc00050c2e1, 0x1, 0x1})
        /usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc0003c6100, {0xc00050c2e1?, 0xc000613628?, 0xc00061e000?})
        /usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc000612180, {0xc00050c2e1?, 0xc0005147a0?, 0x985846?})
        /usr/local/go/src/net/net.go:183 +0x45
net/http.(*connReader).backgroundRead(0xc00050c2d0)
        /usr/local/go/src/net/http/server.go:672 +0x3f
created by net/http.(*connReader).startBackgroundRead
        /usr/local/go/src/net/http/server.go:668 +0xca

goroutine 43 [select]:
net/http.(*persistConn).roundTrip(0xc00056a360, 0xc0006420c0)
        /usr/local/go/src/net/http/transport.go:2620 +0x974
net/http.(*Transport).roundTrip(0x25410e0, 0xc0004c6600)
        /usr/local/go/src/net/http/transport.go:594 +0x7c9
net/http.(*Transport).RoundTrip(0x40f405?, 0x19b3900?)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x19
net/http.send(0xc0004c6600, {0x19b3900, 0x25410e0}, {0x172b2a0?, 0x178c601?, 0x0?})
        /usr/local/go/src/net/http/client.go:252 +0x5d8
net/http.(*Client).send(0x2556ec0, 0xc0004c6600, {0xd?, 0x1788f4f?, 0x0?})
        /usr/local/go/src/net/http/client.go:176 +0x9b
net/http.(*Client).do(0x2556ec0, 0xc0004c6600)
        /usr/local/go/src/net/http/client.go:725 +0x8f5
net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:593
net/http.(*Client).Post(0x17b1437?, {0xc000492480?, 0xc00054bdc8?}, {0x178f761, 0x10}, {0x19b0fe0?, 0xc0001b6a20?})
        /usr/local/go/src/net/http/client.go:858 +0x148
net/http.Post(...)
        /usr/local/go/src/net/http/client.go:835
github.com/alibaba/open-local/cmd/scheduler.predicateFunc(0xc0000f9800, {0x253ebe0, 0x4, 0x4})
        /home/lee/workspace/picloud/open-local/cmd/scheduler/extender_test.go:348 +0x1e8
github.com/alibaba/open-local/cmd/scheduler.TestVGWithName(0x4082b9?)
        /home/lee/workspace/picloud/open-local/cmd/scheduler/extender_test.go:135 +0x17e
testing.tRunner(0xc000103d40, 0x18541b0)
        /usr/local/go/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:1486 +0x35f

goroutine 87 [IO wait]:
internal/poll.runtime_pollWait(0x7fa204607d18, 0x72)
        /usr/local/go/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00003a580?, 0xc000064000?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc00003a580)
        /usr/local/go/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc00003a580)
        /usr/local/go/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc0001301e0)
        /usr/local/go/src/net/tcpsock_posix.go:139 +0x28
net.(*TCPListener).Accept(0xc0001301e0)
        /usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0000dc2a0, {0x19cada0, 0xc0001301e0})
        /usr/local/go/src/net/http/server.go:3039 +0x385
net/http.(*Server).ListenAndServe(0xc0000dc2a0)
        /usr/local/go/src/net/http/server.go:2968 +0x7d
net/http.ListenAndServe(...)
        /usr/local/go/src/net/http/server.go:3222
github.com/alibaba/open-local/pkg/scheduler/server.(*ExtenderServer).InitRouter.func1()
        /home/lee/workspace/picloud/open-local/pkg/scheduler/server/web.go:185 +0x157
created by github.com/alibaba/open-local/pkg/scheduler/server.(*ExtenderServer).InitRouter
        /home/lee/workspace/picloud/open-local/pkg/scheduler/server/web.go:182 +0x478

goroutine 49 [IO wait]:
internal/poll.runtime_pollWait(0x7fa204607c28, 0x72)
        /usr/local/go/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00003a800?, 0xc000639000?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc00003a800, {0xc000639000, 0x1000, 0x1000})
        /usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00003a800, {0xc000639000?, 0x17814b4?, 0x0?})
        /usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc000495a38, {0xc000639000?, 0x19ce530?, 0xc000370ea0?})
        /usr/local/go/src/net/net.go:183 +0x45
net/http.(*persistConn).Read(0xc00056a360, {0xc000639000?, 0x40757d?, 0x60?})
        /usr/local/go/src/net/http/transport.go:1929 +0x4e
bufio.(*Reader).fill(0xc000522a80)
        /usr/local/go/src/bufio/bufio.go:106 +0x103
bufio.(*Reader).Peek(0xc000522a80, 0x1)
        /usr/local/go/src/bufio/bufio.go:144 +0x5d
net/http.(*persistConn).readLoop(0xc00056a360)
        /usr/local/go/src/net/http/transport.go:2093 +0x1ac
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1750 +0x173e

goroutine 178 [select]:
net/http.(*persistConn).writeLoop(0xc00056a360)
        /usr/local/go/src/net/http/transport.go:2392 +0xf5
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1751 +0x1791
FAIL    github.com/alibaba/open-local/cmd/scheduler     0.177s
?       github.com/alibaba/open-local/cmd/version       [no test files]
?       github.com/alibaba/open-local/pkg       [no test files]
?       github.com/alibaba/open-local/pkg/agent/common  [no test files]
=== RUN   TestNewAgent

Ⅲ. Describe what you expected to happen

make should run through.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

  1. git clone https://github.com/alibaba/open-local.git
  2. cd open-local
  3. make
  4. failed

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version: main branch
  • OS (e.g. from /etc/os-release): ubuntu 22.04
  • Kernel (e.g. uname -a): 5.15.0-33
  • Install tools:
  • Others:

open-local does not work with kubernetes after v1.23

Ⅰ. Issue Description

Ⅱ. Describe what happened

I deployed open-local to my k8s cluster, and kube-scheduler failed to restart.
The error log:

Error: unknown flag: --policy-config-file

I use kubernetes version v1.23.6

According to the Kubernetes documentation on Scheduling Policies,
the policy-config-file flag is no longer supported as of v1.23:

In Kubernetes versions before v1.23, a scheduling policy can be used to specify the predicates and priorities process. For example, you can set a scheduling policy by running kube-scheduler --policy-config-file or kube-scheduler --policy-configmap .

This scheduling policy is not supported since Kubernetes v1.23. Associated flags policy-config-file, policy-configmap, policy-configmap-namespace and use-legacy-policy-config are also not supported. Instead, use the Scheduler Configuration to achieve similar behavior.

It seems the following script in helm/templates/init-job.yaml, which adds '--policy-config-file' to kube-scheduler, causes the problem:

            if ! grep "^\    - --policy-config-file=*" /etc/kubernetes/manifests/kube-scheduler.yaml; then
                sed -i "/    - --kubeconfig=/a \    - --policy-config-file=/etc/kubernetes/scheduler-policy-config.json" /etc/kubernetes/manifests/kube-scheduler.yaml
            fi

Ⅲ. Describe what you expected to happen

kube-scheduler keeps running after deploying open-local

Ⅳ. How to reproduce it (as minimally and precisely as possible)

  1. use a Kubernetes version of v1.23 or newer
  2. deploy open-local
  3. use "kubectl get componentstatus" to check the scheduler status

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version: v0.5.4
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: helm
  • Others:

Unable to install open-local on minikube

Hello,

I followed the installation guide here

When I typed kubectl get po -nkube-system -l app=open-local the output was:

NAME                                              READY   STATUS      RESTARTS   AGE
open-local-agent-p2xdq                            3/3     Running     0          13m
open-local-csi-provisioner-59cd8644ff-n52xc       1/1     Running     0          13m
open-local-csi-resizer-554f54b5b4-xkw97           1/1     Running     0          13m
open-local-csi-snapshotter-64dff4b689-9g9wl       1/1     Running     0          13m
open-local-init-job--1-f9vzz                      0/1     Completed   0          13m
open-local-init-job--1-j7j8b                      0/1     Completed   0          13m
open-local-init-job--1-lmvqd                      0/1     Completed   0          13m
open-local-scheduler-extender-5dc8d8bb49-n44pn    1/1     Running     0          13m
open-local-snapshot-controller-846c8f6578-2bfhx   1/1     Running     0          13m

However, when I typed kubectl get nodelocalstorage, I got this output:

NAME       STATE   PHASE   AGENTUPDATEAT   SCHEDULERUPDATEAT   SCHEDULERUPDATESTATUS
minikube                                                       

According to the installation guide, the STATE column should display DiskReady.

And if I typed kubectl get nls -o yaml, the output was:

apiVersion: v1
items:
- apiVersion: csi.aliyun.com/v1alpha1
  kind: NodeLocalStorage
  metadata:
    creationTimestamp: "2021-09-20T13:37:09Z"
    generation: 1
    name: minikube
    resourceVersion: "615"
    uid: 6f193362-e2b2-4053-a6e6-81de35c96eaf
  spec:
    listConfig:
      devices: {}
      mountPoints:
        include:
        - /mnt/open-local/disk-[0-9]+
      vgs:
        include:
        - open-local-pool-[0-9]+
    nodeName: minikube
    resourceToBeInited:
      vgs:
      - devices:
        - /dev/sdb
        name: open-local-pool-0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

I am running minikube on my desktop computer, which has an SSD.

Thank you for your help.

VG Create Error

Ⅰ. Issue Description

NodeLocalStorage's status is null.
image

Ⅱ. Describe what happened

I installed Open-Local via the Helm chart and added a raw device to a Kubernetes worker, but the VG on that worker was not created. Checking the NLS status, I found it was null.

Ⅲ. Describe what you expected to happen

The VG is created correctly on the worker's raw device.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

  1. helm install open-local
  2. vgs on the worker
  3. kubectl get nls

Ⅴ. Anything else we need to know?

1、Scheduler Process
image
2、scheduler-policy-config.json
image
3、driver-registrar Logs
image
4、Agent Logs
image
5、scheduler-extender Logs
image
6、NodeLocalStorageInitConfig
image
7、Raw Device Of Worker
image

Ⅵ. Environment:

  • Kubernetes version:
image

  • Open-Local version:
    image

  • OS (e.g. from /etc/os-release):

image

  • Kernel (e.g. uname -a):

image

  • Install tools:

image

Thanks

[bug] kubelet report "failed to get plugin info using RPC GetInfo at socket"

Ⅰ. Issue Description

image

Ⅱ. Describe what happened

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

[feature] Open-Local Monitor Dashboard

We also need to optimize the internal metrics update logic: currently the metrics are updated whenever the cache is updated, which is hard to maintain.

This could be changed so that the metrics are only refreshed when the /metrics endpoint is called (see the sketch below).
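
A minimal sketch of that approach, assuming the Prometheus client library: implement prometheus.Collector so the values are computed from the cache only at scrape time. The metric name, labels, port, and the readCacheSnapshot helper are illustrative, not Open-Local's actual metrics.

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// cacheCollector computes metrics lazily: Collect reads the scheduler cache
// only when /metrics is scraped, instead of updating gauges on every cache change.
type cacheCollector struct {
    vgFree *prometheus.Desc
}

func newCacheCollector() *cacheCollector {
    return &cacheCollector{
        vgFree: prometheus.NewDesc("open_local_vg_free_bytes", // illustrative metric name
            "Free bytes of a VG as seen by the extender cache",
            []string{"node", "vg"}, nil),
    }
}

func (c *cacheCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.vgFree }

func (c *cacheCollector) Collect(ch chan<- prometheus.Metric) {
    // readCacheSnapshot stands in for a read-locked snapshot of the extender cache.
    for _, e := range readCacheSnapshot() {
        ch <- prometheus.MustNewConstMetric(c.vgFree, prometheus.GaugeValue, e.free, e.node, e.vg)
    }
}

type vgEntry struct {
    node, vg string
    free     float64
}

func readCacheSnapshot() []vgEntry {
    return []vgEntry{{node: "node1", vg: "open-local-pool-0", free: 42e9}}
}

func main() {
    reg := prometheus.NewRegistry()
    reg.MustRegister(newCacheCollector())
    http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
    _ = http.ListenAndServe(":23001", nil)
}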

[feature] support SPDK

Why you need it?

vhost-user-blk/scsi is a highly efficient way to transport data in virtualized environments. Open-Local currently doesn't support vhost-user-blk/scsi.

How it could be?

The Storage Performance Development Kit (SPDK) can provide vhost support. To support vhost-user-blk/scsi in open-local, the node CSI driver should communicate with SPDK. The following is a brief description:

  • NodeStageVolume / NodeUnStageVolume
    n/a
  • NodePublishVolume
    - Create bdev
        # scripts/rpc.py bdev_aio_create <path_to_host_block_dev> <bdev_name>
        # scripts/rpc.py bdev_lvol_create_lvstore <bdev_name> <lvs_name>
        # scripts/rpc.py bdev_lvol_create <lvol_name> <size> -l <lvs_name>
    - Create vhost device
        # scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhostblk0 <bdev_name>
        # mknod /var/run/kata-containers/vhost-user/block/devices/vhostblk0 b 241 0
        # mount --bind [...] /var/run/kata-containers/vhost-user/block/devices/vhostblk0 <target_path>
  • NodeUnPublishVolume
        # umount <target_path>
        # scripts/rpc.py bdev_lvol_delete <lvol_name>
        # rm /var/run/kata-containers/vhost-user/block/devices/vhostblk0

Besides, we need to add a field in nlsc and nls to indicate whether the storage is provided by SPDK.

image

Other related information

Helm install CSIDriver error

System info

Via uname -a && kubectl version && helm version && apt-show-versions lvm2 | grep amd:

Linux master-node 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T17:57:25Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"clean", GoVersion:"go1.16.5"}
lvm2:amd64/bionic-updates 2.02.176-4.1ubuntu3.18.04.3 uptodate

Bug Description

Setup

wget https://github.com/alibaba/open-local/archive/refs/tags/v0.1.1.zip
unzip v0.1.1.zip
cd open-local-0.1.1

The Problem

Using release 0.1.1 of open-local and following the current user guide's instructions, when I run

helm install open-local ./helm

-- I get the following error:

Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "CSIDriver" in version "storage.k8s.io/v1beta1"

device schedule with error Get Response StatusCode 500

Ⅰ. Issue Description

An error is reported when creating a new PVC:

  Normal   WaitForFirstConsumer  2m13s                 persistentvolume-controller                                      waiting for first consumer to be created before binding
  Normal   ExternalProvisioning  11s (x11 over 2m13s)  persistentvolume-controller                                      waiting for a volume to be created, either by external provisioner "local.csi.aliyun.com" or manually created by system administrator
  Normal   Provisioning          6s (x8 over 2m13s)    local.csi.aliyun.com_node1_080636cb-a68d-4ee8-a3a3-db5ae5634cbb  External provisioner is provisioning volume for claim "demo/pvc-open-local-device-hdd-test2-0-d0"
  Warning  ProvisioningFailed    6s (x8 over 2m13s)    local.csi.aliyun.com_node1_080636cb-a68d-4ee8-a3a3-db5ae5634cbb  failed to provision volume with StorageClass "open-local-device-hdd": rpc error: code = InvalidArgument desc = Parse Device part schedule info error rpc error: code = InvalidArgument desc = device schedule with error Get Response StatusCode 500, Response: failed to allocate local storage for pvc demo/pvc-open-local-device-hdd-test2-0-d0: Insufficient Device storage, requested 0, available 0, capacity 0

Ⅱ. Describe what happened

I deployed open-local in a k3s cluster, but because k3s has no kube-scheduler manifest/configuration files, the init-job cannot run properly.

modifying kube-scheduler.yaml...
grep: /etc/kubernetes/manifests/kube-scheduler.yaml: No such file or directory
+ sed -i '/  hostNetwork: true/a \  dnsPolicy: ClusterFirstWithHostNet' /etc/kubernetes/manifests/kube-scheduler.yaml
sed: can't read /etc/kubernetes/manifests/kube-scheduler.yaml: No such file or directory

Everything else related is running well:

NAME                                              READY   STATUS    RESTARTS   AGE
open-local-agent-7sd9d                            3/3     Running   0          22h
open-local-csi-provisioner-785b7f99bd-hlqdv       1/1     Running   0          22h
open-local-agent-8kg4r                            3/3     Running   0          22h
open-local-agent-jljlv                            3/3     Running   0          22h
open-local-scheduler-extender-5d48bc465c-r42pn    1/1     Running   0          22h
open-local-snapshot-controller-785987975c-hhgr7   1/1     Running   0          22h
open-local-csi-snapshotter-5f797c4596-wml76       1/1     Running   0          22h
open-local-csi-resizer-7c9698976f-f7tzz           1/1     Running   0          22h

master1 [~]$ kubectl get nodelocalstorage -ojson master1|jq .status.filteredStorageInfo
{
  "updateStatusInfo": {
    "lastUpdateTime": "2021-11-12T11:02:56Z",
    "updateStatus": "accepted"
  },
  "volumeGroups": [
    "open-local-pool-0"
  ]
}
master1 [~]$ kubectl get nodelocalstorage -ojson node1|jq .status.filteredStorageInfo
{
  "updateStatusInfo": {
    "lastUpdateTime": "2021-11-12T11:01:56Z",
    "updateStatus": "accepted"
  },
  "volumeGroups": [
    "open-local-pool-0"
  ]
}
master1 [~]$ kubectl get nodelocalstorage -ojson node2|jq .status.filteredStorageInfo
{
  "updateStatusInfo": {
    "lastUpdateTime": "2021-11-12T11:02:56Z",
    "updateStatus": "accepted"
  },
  "volumeGroups": [
    "open-local-pool-0"
  ]
}

So I suspect that this scheduling problem is what prevents the PVC/PV from being created properly.

Ⅲ. Describe what you expected to happen

I hope open-local can be deployed and run properly on k3s.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

  1. It is suggested to implement the scheduling customization the way k8s-scheduler-extender does, to be compatible with k3s and other platforms

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Open-Local version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
