cloud-director-named-disk-csi-driver's Introduction

Container Storage Interface (CSI) driver for VMware Cloud Director Named Independent Disks

This repository contains the source code and build methods for a Kubernetes CSI driver that provisions VMware Cloud Director Named Independent Disks as a storage solution for Kubernetes applications. The driver uses the VMware Cloud Director API and therefore needs an appropriate VMware Cloud Director installation. It enables common scenarios with persistent volumes and StatefulSets using VMware Cloud Director shareable named disks.

The VMware Cloud Director API and installation versions compatible with a given CSI container image are listed in the following compatibility matrix:

| CSI Version | CSE Version | VMware Cloud Director API | VMware Cloud Director Installation | Notes | Kubernetes Versions | Docs |
| --- | --- | --- | --- | --- | --- | --- |
| main | 4.2.0+ | 37.2+ | 10.4.2+ | | 1.27, 1.26, 1.25, 1.24, 1.23, 1.22 | CSI main docs |
| 1.5.0 | 4.2.0+ | 36.0+ | 10.3.3.4+ | Bump gopkg.in/yaml.v3 version (#221); changes to the testing framework (multiple PRs) | 1.27, 1.26, 1.25, 1.24, 1.23, 1.22 | CSI main docs |
| 1.4.1 | 4.1.0 | 36.0+ | 10.3.3.4+ | Bump gopkg.in/yaml.v3 version (#221); changes to the testing framework (multiple PRs) | 1.25, 1.24, 1.23, 1.22, 1.21 | CSI 1.4.z docs |
| 1.4.0 | 4.1.0 | 36.0+ | 10.3.3.4+ | Support for packaging CSI CRS in a container for the CSE airgap workflow; testing framework added; run CSI only on control-plane nodes; change the CSI controller from a StatefulSet to a Deployment; support newer capvcdCluster RDE version; fix issues in XFS mount (support XFS); set the description of named disks to the cluster ID; upgrade Golang to 1.19; optimize the size of the CSI container image | 1.25, 1.24, 1.23, 1.22, 1.21 | CSI 1.4.z docs |
| 1.3.2 | 4.0.z | 36.0+ | 10.3.1+ * | | 1.22, 1.21, 1.20, 1.19 | CSI 1.3.z docs |
| 1.3.1 | 4.0.0 | 36.0+ | 10.3.1+ * | Fixed an issue where CSI failed to mount a persistent volume to a node if the SCSI buses inside the node were not rescanned | 1.22, 1.21, 1.20, 1.19 | CSI 1.3.z docs |
| 1.3.0 | 4.0.0 | 36.0+ | 10.3.1+ * | Support for fsGroup; support for volume metrics; added a secret-based way to get the cluster ID for CRS | 1.22, 1.21, 1.20, 1.19 | CSI 1.3.z docs |
| 1.2.1 | 3.1.x | 36.0+ | 10.3.1+ * | | 1.22, 1.21, 1.20, 1.19 | CSI 1.2.x docs |
| 1.2.0 | 3.1.x | 36.0+ | 10.3.1+ * | Added support for Kubernetes 1.22; small VCD URL parsing fixes | 1.22, 1.21, 1.20, 1.19 | CSI 1.2.x docs |
| 1.1.1 | 3.1.x | 36.0+ | 10.3.1+ * | Fixed a refresh-token-based authentication issue observed when VCD cells are fronted by a load balancer (fixes #26) | 1.21, 1.20, 1.19 | CSI 1.1.x docs |
| 1.1.0 | 3.1.x | 36.0+ | 10.3.1+ * | Removed legacy Kubernetes dependencies; support for CAPVCD RDEs | 1.21, 1.20, 1.19 | CSI 1.1.x docs |
| 1.0.0 | 3.1.x | 36.0+ | 10.3.1+ * | First cut with support for Named Independent Disks | 1.21, 1.20, 1.19 | CSI 1.0.0 docs |

* 10.3.1 needs a hot-patch to prevent VCD cell crashes in multi-cell environments.

This extension is intended to be installed into a Kubernetes cluster that uses VMware Cloud Director as its cloud provider, by a user who has the rights described in the sections below.

cloud-director-named-disk-csi-driver is distributed as a container image hosted at Distribution Harbor as projects.registry.vmware.com/vmware-cloud-director/cloud-director-named-disk-csi-driver:<CSI version>

This driver is in a GA state and will be supported in production.

Note: This driver is not impacted by the Apache Log4j open source component vulnerability.

CSI Feature matrix

| Feature | Support Scope |
| --- | --- |
| Storage Type | Independent Shareable Named Disks of VCD |
| Provisioning | Static Provisioning; Dynamic Provisioning (see the example below the matrix) |
| Access Modes | ReadOnlyMany; ReadWriteOnce |
| Volume | Block |
| VolumeMode | FileSystem; Block |
| Volume Expansion Support | OFFLINE; ONLINE |
| Topology | Static Provisioning reuses VCD topology capabilities; Dynamic Provisioning places the disk in the OVDC of the ClusterAdminUser based on the specified StorageProfile |
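For illustration, here is a minimal sketch of dynamic provisioning with this driver. The provisioner name and the storageProfile/filesystem parameters follow the StorageClass samples shown further down this page; the storage profile name "SATA" and the object names are placeholders for your environment:

# Hypothetical StorageClass backed by the named-disk CSI driver; a PVC using it
# dynamically provisions a named independent disk in the OVDC.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vcd-named-disk
provisioner: named-disk.csi.cloud-director.vmware.com
reclaimPolicy: Delete
parameters:
  storageProfile: "SATA"   # placeholder VCD storage profile name
  filesystem: "ext4"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: vcd-named-disk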

Terminology

  1. VCD: VMware Cloud Director
  2. ClusterAdminRole: This is the role that has enough rights to create and administer a Kubernetes cluster in VCD. This role can be created by cloning the vApp Author role and then adding the following rights (details on adding these rights can be found in the CSE docs):
    1. Full Control: VMWARE:CAPVCDCLUSTER
    2. Edit: VMWARE:CAPVCDCLUSTER
    3. View: VMWARE:CAPVCDCLUSTER
  3. ClusterAdminUser: For CSI functionality, a set of additional rights must be added to the ClusterAdminRole, as described in the "Additional Rights for CSI" section below. The Kubernetes cluster needs to be created by a user with this enhanced ClusterAdminRole. For convenience, we refer to this user as the ClusterAdminUser.

VMware Cloud Director Configuration

In this section, we assume that the Kubernetes cluster is created using the Container Service Extension 4.0. However, that is not a mandatory requirement.

Additional Rights for CSI

The ClusterAdminUser should have view access to the vApp containing the Kubernetes cluster. Since the ClusterAdminUser itself creates the cluster, it will have this access by default. This ClusterAdminUser needs to be created from a ClusterAdminRole with the following additional rights:

  1. Access Control =>
    1. User => Manage user's own API TOKEN
  2. Organization VDC => Create a Shared Disk

Troubleshooting

Log VCD requests and responses

Execute the following commands to log HTTP requests to and HTTP responses from VCD:

kubectl set env -n kube-system Deployment/csi-vcd-controllerplugin -c vcd-csi-plugin GOVCD_LOG_ON_SCREEN=true -oyaml
kubectl set env -n kube-system DaemonSet/csi-vcd-nodeplugin -c vcd-csi-plugin GOVCD_LOG_ON_SCREEN=true -oyaml

(For CSI 1.3.x and earlier, where the controller plugin runs as a StatefulSet, use StatefulSet/csi-vcd-controllerplugin in place of Deployment/csi-vcd-controllerplugin.)

Once the above commands are executed, the CSI containers will start logging the HTTP requests and HTTP responses made via the go-vcloud-director SDK. The container logs can be obtained using the command kubectl logs -n kube-system <CSI pod name>

To stop logging the HTTP requests and responses from VCD, execute the following commands:

kubectl set env -n kube-system Deployment/csi-vcd-controllerplugin -c vcd-csi-plugin GOVCD_LOG_ON_SCREEN-
kubectl set env -n kube-system DaemonSet/csi-vcd-nodeplugin -c vcd-csi-plugin GOVCD_LOG_ON_SCREEN-

NOTE: Please make sure to collect the logs before and after enabling the wire log. The above commands update the CSI controller Deployment and the CSI node-plugin DaemonSet, which creates new CSI pods; the logs present in the old pods will be lost.

Upgrade CSI

To upgrade the Container Storage Interface (CSI) driver from version v1.2.0, v1.2.1, v1.3.0, v1.3.1, or v1.3.2, it is recommended to follow these steps:

  1. Remove the current StatefulSet:
kubectl delete statefulset -n kube-system csi-vcd-controllerplugin
  2. Apply the CSI 1.4 Controller CRS:
kubectl apply -f https://github.com/vmware/cloud-director-named-disk-csi-driver/blob/1.4.z/manifests/csi-controller-crs.yaml

NOTE:

  1. These steps ensure a successful upgrade of CSI to the latest version (v1.4.0) and guarantee that the new CSI Deployment is properly installed within the Kubernetes environment.
  2. It is recommended not to manually delete any Persistent Volumes (PVs) or Persistent Volume Claims (PVCs) associated with a StatefulSet.

Contributing

Please see CONTRIBUTING.md for instructions on how to contribute.

License

Apache-2.0

cloud-director-named-disk-csi-driver's People

Contributors

abaruni, anirudh9794, arunmk, carlory, dependabot[bot], dkoshkin, dtarasov7, erkanerol, goelaashima, ltimothy7, lzichong, mate201, rliang88, rocknes, sahithi, sakthisunda, slimm609, tahajahangir, ymo24

cloud-director-named-disk-csi-driver's Issues

Support / documentation for installation on clusters not managed via VCD-CSE

Is your feature request related to a problem? Please describe.

Currently our VCD installations have a small number of Kubernetes clusters of different flavours (e.g. OpenShift, micro Kubernetes) which have been installed manually. As an admin of those clusters, I'd like to be able to use the CSI driver rather than relying on hostpath- or NFS-based storage.

Describe the solution you'd like

Instructions on using the CSI driver outside of CSE. I'm assuming this may require some modifications to the driver as well (e.g. the cluster ID is no longer usable).

Describe alternatives you've considered

Some of the clusters have been provisioned due to vendor software constraints. This makes moving to TKG harder (but not impossible). The preferred solution would be to manage workloads via TKG & CSE deployed clusters.

Additional context

No response

Volume metrics

Is your feature request related to a problem? Please describe.

I am trying to monitor PVs and am not able to get the kubelet_volume_stats_available_bytes because it is no longer supported by default in the node exporter.

Describe the solution you'd like

Include the metrics similar to how the AWS EBS CSI driver did it. I think CSI drivers themselves have needed to implement these metrics since around k8s v1.13, though I could be wrong.

Issue: kubernetes-sigs/aws-ebs-csi-driver#524
PR: kubernetes-sigs/aws-ebs-csi-driver#677

Digital Ocean CSI PR : digitalocean/csi-digitalocean#197

VSphere Implementation : https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/d288ab214ed6d62b2d29e4e0c6e3cf0522d4fc16/pkg/csi/service/node.go#L232

Describe alternatives you've considered

I've been trying to get my relabel configs right to monitor a group of PVs, but I am not able to get the PV name out of the node exporter to match node_filesystem_free_bytes to the right PV. This is broadly needed to monitor sensitive PVs running on VKE that might need volumes added to the array as they fill up.

Additional context

No response

Mount failed: exit status 32 (mount point does not exist)

Describe the bug

We are trying to install this CSI driver on Kubernetes version 1.24.6; all nodes run Ubuntu 22.04. Installation from your manifests succeeds. Next, we create the PVC and a pod. A disk of the required size is created on the node (/dev/sdd with 12Mi), but it is not mounted. The csi-vcd-nodeplugin container logs are as follows:

I0120 02:35:17.154778       1 node.go:479] Checking file: [/dev/disk/by-path/pci-0000:0b:00.0-scsi-0:0:1:0] => [/dev/sdd]
I0120 02:35:17.160375       1 node.go:508] Obtained matching disk [/dev/sdd]
I0120 02:35:17.213342       1 node.go:155] Mounting device [/dev/sdd] to folder [/var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount] of type [ext4] with flags [[rw]]
time="2023-01-20T02:35:17Z" level=info msg="attempting to mount disk" fsType=ext4 options="[rw defaults]" source=/dev/sdd target=/var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount
time="2023-01-20T02:35:17Z" level=info msg="mount command" args="-t ext4 -o rw,defaults /dev/sdd /var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount" cmd=mount
time="2023-01-20T02:35:17Z" level=error msg="mount Failed" args="-t ext4 -o rw,defaults /dev/sdd /var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount" cmd=mount error="exit status 32" output="mount: /var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount: mount point does not exist.\n"
time="2023-01-20T02:35:17Z" level=info msg="checking if disk is formatted using lsblk" args="[-n -o FSTYPE /dev/sdd]" disk=/dev/sdd
E0120 02:35:17.243250       1 driver.go:172] GRPC error: function [/csi.v1.Node/NodeStageVolume] req [&csi.NodeStageVolumeRequest{VolumeId:"pvc-72ad8d45-a48c-4898-a0ba-41e9fcb8adc3", PublishContext:map[string]string{"diskID":"pvc-72ad8d45-a48c-4898-a0ba-41e9fcb8adc3", "diskUUID":"6000c29b-da53-f4bc-f0e4-9af9c4f15aea", "filesystem":"ext4", "vmID":"dev-k8s-worker02"}, StagingTargetPath:"/var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount", VolumeCapability:(*csi.VolumeCapability)(0xc000042500), Secrets:map[string]string(nil), VolumeContext:map[string]string{"busSubType":"VirtualSCSI", "busType":"SCSI", "diskID":"urn:vcloud:disk:5b42562f-d15e-42b6-96af-f7e3d7b2636e", "filesystem":"ext4", "storage.kubernetes.io/csiProvisionerIdentity":"1674143428117-8081-named-disk.csi.cloud-director.vmware.com", "storageProfile":"SATA"}, XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}]: [rpc error: code = Internal desc = unable to format and mount device [/dev/sdd] at path [/var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount] with fs [[rw]] and flags [mount failed: exit status 32
mounting arguments: -t ext4 -o rw,defaults /dev/sdd /var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount
output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount: mount point does not exist.
]: [%!v(MISSING)]]

If we go to the node (where the pod is created), manually mounting the /dev/sdd disk, for example to the /mnt directory, succeeds. If we try to create the 2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount directories (from the log file) under /var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com, they are deleted after a few seconds.
Has anyone faced a similar problem?

Reproduction steps

  1. Install csi-driver
  2. create kind: StorageClass
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  name: vcd-disk-dev
provisioner: named-disk.csi.cloud-director.vmware.com
reclaimPolicy: Delete
parameters:
  storageProfile: "SATA"
  filesystem: "ext4"
  3. create kind: PersistentVolumeClaim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 12Mi
  storageClassName: "vcd-disk-dev"
  4. create pod with persistentVolumeClaim:
---
apiVersion: v1
kind: Pod
...
spec:
  volumes:
    - name: my-pod-storage
      persistentVolumeClaim:
        claimName: my-pvc1
  containers:
    - name: my-pod-container
...
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: my-pod-storage

Expected behavior

Disk /dev/sdd have to mount in /var/lib/kubelet/plugins/kubernetes.io/csi/named-disk.csi.cloud-director.vmware.com/2aa75d1684f43d1029e9db0bd05c5739be8957e9a70d85d9449c69f8e34c145f/globalmount

Additional context

Kubernetes version: 1.24.6
CSI Version: 1.3.1
Node OS: Ubuntu 22.04 LTS

[Question]Independent Disk

Hey guys,

lovely to see this ongoing. Is it possible to test this already with 10.3, as well as the AVI load balancer integration?

I have questions about independent disk persistence. You can't take snapshots of these volumes, so how do you back them up at the volume scope?
Normally PVCs are the volumes that databases write to; it should be possible to back these up at the volume level if possible, or do you have to rely on backup technology at the application level?

https://docs.vmware.com/en/VMware-Cloud-Director/10.3/VMware-Cloud-Director-Tenant-Portal-Guide/GUID-8F8BFCD3-071A-4E45-BAC0-A9B78F2C19CE.html

At the moment, VMs with independent disks are backed up at the file-level scope with agents.

best regards

Add priorityClassName to csi-node and csi controller

Is your feature request related to a problem? Please describe.

I tested the VCD CSI on a single-master, single-worker cluster.
If, for example, node disk pressure occurs, the pod gets evicted.

I saw that node taints for node disk pressure exist, but I don't know why they don't prevent this.

Describe the solution you'd like

I added "priorityClassName: system-node-critical " tocsi-vcd-nodeplugin DaemonSet

And "priorityClassName: system-cluster-critical" to Deployment "csi-vcd-controllerplugin"

And then pod wasn't evicted.
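For reference, a minimal, hypothetical sketch of where such a priorityClassName could go in the node-plugin manifest (the workload and container names follow this issue and the troubleshooting commands above; treat the exact manifest layout as an assumption, not the shipped manifest):

# Hypothetical excerpt: marking the csi-vcd-nodeplugin DaemonSet pods as critical
# so they are not evicted under node pressure.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-vcd-nodeplugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-vcd-nodeplugin
  template:
    metadata:
      labels:
        app: csi-vcd-nodeplugin
    spec:
      # Use system-cluster-critical for the csi-vcd-controllerplugin Deployment instead.
      priorityClassName: system-node-critical
      containers:
        - name: vcd-csi-plugin
          image: projects.registry.vmware.com/vmware-cloud-director/cloud-director-named-disk-csi-driver:<CSI version>

The same field can also be added to an existing installation with kubectl edit or kubectl patch rather than re-applying the full manifest.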

Describe alternatives you've considered

No response

Additional context

No response

CSI volume snapshot

Describe the bug

Hi Team,
Just want to know whether CSI volume snapshots are supported with this driver ("cloud-director-named-disk-csi-driver"). I can't find any documentation.

Please suggest.

Regards,
Balaji Vedagiri

Reproduction steps

1.CSI Volume Snapshot with Velero
2.
3.
...

Expected behavior

No volumesnapshotclasses CRD in the vmware tkg-m CSE cluster.

Additional context

No response

Retag csi images following SemVer syntax

Is your feature request related to a problem? Please describe.

Some automation that follows SemVer syntax to validate image tags fails because the images are tagged with .latest appended.

Describe the solution you'd like

Remove .latest in image tags.

Describe alternatives you've considered

No response

Additional context

No response

Cannot build image using Dockerfile

Describe the bug

Hi everyone, I have modified this code a little and want to build an image to test, but I can't pull the photonos-docker-local.artifactory.eng.vmware.com/photon4:4.0-GA image referenced in the Dockerfile. Is there a mistake in the image name in the Dockerfile?

Reproduction steps

1.Modify code
2. `docker build .` under `cloud-director-named-disk-csi-driver` directory => failed
3. `docker pull photonos-docker-local.artifactory.eng.vmware.com/photon4:4.0-GA` => failed
..

Expected behavior

can build image successfully.

Additional context

No response

Duplicate Authorization headers cause requests to fail when vCD is behind L7 load balancers

Describe the bug

As per the bugs raised in vmware/go-vcloud-director#425 and vmware/cloud-provider-for-cloud-director#37, when vCD is behind an L7 Load Balancer authentication fails due to duplicate Authorization headers being sent.

Only one Authorization header can be sent; otherwise the request is invalid.

Reproduction steps

1. Place vCD behind an L7 load balancer such as NSX-ALB (Avi)
2. Attempt to use the CSI driver with vCD

Expected behavior

Communication to the vCD API works behind an L7 Load Balancer

Additional context

No response

Filesystem XFS is mounted as ext4 (fsType: ext4 / filesystem: xfs)

Describe the bug

Hi there,

we configured the Kubernetes cluster to use XFS as the filesystem; after we deployed the cluster, everything looked as expected.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sk3-standard
parameters:
  filesystem: xfs
  storageProfile: SK3 Standard

After we created the first PVCs/PVs and took a look at the PV, the fsType is ext4 and the filesystem is declared as xfs.

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: named-disk.csi.cloud-director.vmware.com
  creationTimestamp: "2022-12-09T12:53:15Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/named-disk-csi-cloud-director-vmware-com
  name: pvc-a088b019-258a-4ac6-955b-c19f80930e5f
  resourceVersion: "103190"
  uid: de5843da-678a-4d2e-be5b-0266ec77cc81
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Mi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: myvol-centostate-2
    namespace: default
    resourceVersion: "103096"
    uid: a088b019-258a-4ac6-955b-c19f80930e5f
  csi:
    driver: named-disk.csi.cloud-director.vmware.com
    fsType: ext4
    volumeAttributes:
      busSubType: VirtualSCSI
      busType: SCSI
      diskID: urn:vcloud:disk:7a8fb790-f4f5-4422-a054-3dc34ce58742
      filesystem: xfs
      storage.kubernetes.io/csiProvisionerIdentity: 1670585734394-8081-named-disk.csi.cloud-director.vmware.com
      storageProfile: SK3 Standard
    volumeHandle: pvc-a088b019-258a-4ac6-955b-c19f80930e5f
  persistentVolumeReclaimPolicy: Delete
  storageClassName: sk3-standard
  volumeMode: Filesystem
status:
  phase: Bound

But if we look at the node or the container, the mounted filesystem is ext4:

❯ k exec -ti centostate-0 -- /bin/bash
[root@centostate-0 /]# df -T
Filesystem     Type    1K-blocks    Used Available Use% Mounted on
/dev/sdb       ext4        95054    1550     86336   2% /vol

Unfortunately, I don't currently understand how the filesystem parameter and the fsType are related; I only see that ext4 is mounted on the nodes and in the containers.

I would like to understand whether this is intended behavior.

Reproduction steps

  1. Cloud Director -> Kubernetes Container Clusters -> New
  2. Kubernetes Storage -> Filesystem -> xfs
  3. Deploy the Cluster

Expected behavior

The expected behaviour is that the volumes are formatted and mounted as XFS.

Additional context

No response

Attaching a disk uses nodeID to find the VM, which fails if hostname in cluster differs from VM name in VMware

Describe the bug

The hostname of the k8s nodes is an FQDN, while the VM name in VMware is the short name.
Attaching the volume fails because the VM to attach the volume to is not found.

AttachVolume.Attach failed for volume "pvc-7bbe77fc-bcbb-4413-8b0e-443acb33bb4a" : rpc error: code = Unknown desc = unable to find VM for node [exp-k8s-prod-worker-0001.mydomain.com]: [unable to find vm [exp-k8s-prod-worker-0001.mydomain.com] in vApp [k8s_prod]: [[ENF] entity not found]]

The VM name is just exp-k8s-prod-worker-0001

The VM lookup seems to just use the NodeID

nodeID := req.GetNodeId()
 
vdcManager.FindVMByName(cs.VAppName, nodeID)

Reproduction steps

Use different k8s node name and vm name

Expected behavior

something else :)

Additional context

No response

Documentation of necessary Role Rights

Is your feature request related to a problem? Please describe.

The documentation states that it requires a user with permissions based on the vApp Author role plus additional rights from CSE. Since we did not use CSE to create the cluster, we assumed that having a user with the vApp Author role would be enough. Unfortunately, the vApp Author user results in the following error in K8s:

Warning FailedAttachVolume 2s (x7 over 37s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-25afded9-68ef-44d9-9ab2-7c461629e170" : rpc error: code = Unknown desc = unable to find VM for node [rosed-k8s-0001]: [unable to find vApp [RosedDev] by name: [[ENF] entity not found]]

The named disk gets created fine, but the driver is unable to find the vApp/VM. When the user's role is changed to Organization Administrator, the process works fine.

Describe the solution you'd like

Documentation of the necessary rights a role needs to use the CSI in a K8s cluster which was not created using VMware tools.

Describe alternatives you've considered

No response

Additional context

No response

SecurityContext problem

Describe the bug

I am using version 1.2.0 of this CSI driver. When any disk is added, the "mount" folder on the disk is created with "root" ownership. When we use this disk in a pod running as a non-root user, a "Permission denied" error is received. How can I solve this problem?

(Screenshot: CleanShot 2022-06-06 at 17 55 03)

This is a screenshot of the pod; its securityContext runs as user 1000:

(Screenshot: CleanShot 2022-06-06 at 17 55 40)
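A minimal, hypothetical sketch of the usual Kubernetes-side approach: set fsGroup in the pod securityContext so the kubelet applies group ownership to the mounted volume. Note that this relies on fsGroup support in the driver, which, per the compatibility matrix above, was added in CSI 1.3.0; the object names below are placeholders:

# Hypothetical example pod running as a non-root user with fsGroup set, so the
# mounted named-disk volume is made writable by group 1000.
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-with-pvc
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-pvc1   # placeholder PVC name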

Reproduction steps

I installed the contents with the manifests and got the attached errors.

Expected behavior

""

Additional context

No response

Can not pull image cloud-director-named-disk-csi-driver from Harbor

Describe the bug

Can not pull image cloud-director-named-disk-csi-driver from Harbor

Reproduction steps

1. Install the cloud-director-named-disk-csi-driver from GitHub for VMware Cloud Director Kubernetes; the image then cannot be pulled from Harbor
2. 
3.
...

Expected behavior

The image should be pullable from Harbor.

Additional context

No response

Pods with volume stuck in ContainerCreating with Multi-Attach error due to dangling volumeattachments

Describe the bug

I dynamically provisioned a volume and then attached it to a deployment. When I delete a node and let Cluster API provision a new node, it might happen that dangling volumeattachments are left behind. In a future drain of a node, pods might get stuck in ContainerCreating state due to this.

k describe po -n kube-system  nginx-59d9859785-lm97h
Name:             nginx-59d9859785-lm97h
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             kubermatic-v3-test-worker-57ccd5c88c-55n5f/10.70.27.38
Start Time:       Mon, 11 Dec 2023 08:52:50 +0100
Labels:           app=nginx
                  pod-template-hash=59d9859785
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/nginx-59d9859785
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from csi-data-vcdplugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dn4fn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  csi-data-vcdplugin:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc-vcdplugin
    ReadOnly:   false
  kube-api-access-dn4fn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age    From                     Message
  ----     ------              ----   ----                     -------
  Warning  FailedScheduling    9m55s  default-scheduler        0/5 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 1 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
  Normal   Scheduled           4m58s  default-scheduler        Successfully assigned kube-system/nginx-59d9859785-lm97h to kubermatic-v3-test-worker-57ccd5c88c-55n5f
  Warning  FailedAttachVolume  4m59s  attachdetach-controller  Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount         70s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition

Dangling volumeattachments:

k get volumeattachments.storage.k8s.io | grep pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600   named-disk.csi.cloud-director.vmware.com   pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2   kubermatic-v3-test-worker-57ccd5c88c-65zg2   true       2d19h
csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f   named-disk.csi.cloud-director.vmware.com   pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2   kubermatic-v3-test-worker-57ccd5c88c-54m5f   true       2d19h
k describe volumeattachments.storage.k8s.io csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600
Name:         csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600
Namespace:
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: kubermatic-v3-test-worker-57ccd5c88c-65zg2
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:  2023-12-08T12:52:33Z
  Finalizers:
    external-attacher/named-disk-csi-cloud-director-vmware-com
  Managed Fields:
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:csi.alpha.kubernetes.io/node-id:
        f:finalizers:
          .:
          v:"external-attacher/named-disk-csi-cloud-director-vmware-com":
    Manager:      csi-attacher
    Operation:    Update
    Time:         2023-12-08T12:52:33Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:attacher:
        f:nodeName:
        f:source:
          f:persistentVolumeName:
    Manager:      kube-controller-manager
    Operation:    Update
    Time:         2023-12-08T12:52:33Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:attached:
        f:attachmentMetadata:
          .:
          f:diskID:
          f:diskUUID:
          f:filesystem:
          f:vmID:
    Manager:         csi-attacher
    Operation:       Update
    Subresource:     status
    Time:            2023-12-08T12:54:31Z
  Resource Version:  990246
  UID:               652890d8-7577-40d0-867a-86d795365ba4
Spec:
  Attacher:   named-disk.csi.cloud-director.vmware.com
  Node Name:  kubermatic-v3-test-worker-57ccd5c88c-65zg2
  Source:
    Persistent Volume Name:  pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
Status:
  Attached:  true
  Attachment Metadata:
    Disk ID:     pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
    Disk UUID:   6000c295-355f-da02-a25a-f852b7ce31d8
    Filesystem:  ext4
    Vm ID:       kubermatic-v3-test-worker-57ccd5c88c-65zg2
Events:          <none>
k describe volumeattachments.storage.k8s.io csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f
Name:         csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f
Namespace:
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: kubermatic-v3-test-worker-57ccd5c88c-54m5f
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:             2023-12-08T12:48:14Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2023-12-08T12:52:26Z
  Finalizers:
    external-attacher/named-disk-csi-cloud-director-vmware-com
  Managed Fields:
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:csi.alpha.kubernetes.io/node-id:
        f:finalizers:
          .:
          v:"external-attacher/named-disk-csi-cloud-director-vmware-com":
    Manager:      csi-attacher
    Operation:    Update
    Time:         2023-12-08T12:48:14Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:attacher:
        f:nodeName:
        f:source:
          f:persistentVolumeName:
    Manager:      kube-controller-manager
    Operation:    Update
    Time:         2023-12-08T12:48:14Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:attached:
        f:attachmentMetadata:
          .:
          f:diskID:
          f:diskUUID:
          f:filesystem:
          f:vmID:
        f:detachError:
          .:
          f:message:
          f:time:
    Manager:         csi-attacher
    Operation:       Update
    Subresource:     status
    Time:            2023-12-11T08:05:27Z
  Resource Version:  1984691
  UID:               583797f0-5ebf-46c0-82f6-27c01478c085
Spec:
  Attacher:   named-disk.csi.cloud-director.vmware.com
  Node Name:  kubermatic-v3-test-worker-57ccd5c88c-54m5f
  Source:
    Persistent Volume Name:  pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
Status:
  Attached:  true
  Attachment Metadata:
    Disk ID:     pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
    Disk UUID:   6000c295-355f-da02-a25a-f852b7ce31d8
    Filesystem:  ext4
    Vm ID:       kubermatic-v3-test-worker-57ccd5c88c-54m5f
  Detach Error:
    Message:  rpc error: code = NotFound desc = Could not find VM with nodeID [kubermatic-v3-test-worker-57ccd5c88c-54m5f] from which to detach [pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2]
    Time:     2023-12-11T08:05:27Z
Events:       <none>

Reproduction steps

...

Expected behavior

I expected the pod to successfully start on another node.

Additional context

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc-vcdplugin
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
          - mountPath: /usr/share/nginx/html
            name: csi-data-vcdplugin
      volumes:
      - name: csi-data-vcdplugin
        persistentVolumeClaim:
          claimName: csi-pvc-vcdplugin
          readOnly: false

Apply the manifest.

k apply -n default -f nginx.yaml
persistentvolumeclaim/csi-pvc-vcdplugin created
deployment.apps/nginx created

Once the pod is running, drain and delete a node, let Cluster API provision a new one.

k drain --ignore-daemonsets --delete-emptydir-data node/kubermatic-v3-test-worker-57ccd5c88c-54m5f
k delete machine.cluster.k8s.io/kubermatic-v3-test-worker-57ccd5c88c-54m5f

Drain a node.

k drain --ignore-daemonsets --delete-emptydir-data node/node/kubermatic-v3-test-worker-57ccd5c88c-65zg2

Verify you have dangling volumeattachments.

k get volumeattachments.storage.k8s.io --sort-by .spec.source.persistentVolumeName -o custom-columns=PV:.spec.source.persistentVolumeName --no-headers | uniq -c
      2 pvc-8c4fdada-0815-4448-bf89-e68f36a1ace9
      2 pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
      1 pvc-29051ee3-506a-405e-9021-d02b09cc86c0
      1 pvc-d1e7cc95-5653-4307-a271-f7262889e614
      1 pvc-e53d0a3c-df72-4ef4-9658-ef715d763ce4

It may take a few attempts as it does not happen every time.

I can delete the ContainerCreating pod with the --force flag. But the pod still does not start.

k delete po -n kube-system  nginx-59d9859785-lm97h --force
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "nginx-59d9859785-lm97h" force deleted
k describe po -n kube-system  nginx-59d9859785-gbprn
Name:             nginx-59d9859785-gbprn
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             kubermatic-v3-test-worker-57ccd5c88c-55n5f/10.70.27.38
Start Time:       Mon, 11 Dec 2023 09:23:56 +0100
Labels:           app=nginx
                  pod-template-hash=59d9859785
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/nginx-59d9859785
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from csi-data-vcdplugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwr9q (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  csi-data-vcdplugin:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc-vcdplugin
    ReadOnly:   false
  kube-api-access-dwr9q:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           13s   default-scheduler        Successfully assigned kube-system/nginx-59d9859785-gbprn to kubermatic-v3-test-worker-57ccd5c88c-55n5f
  Warning  FailedAttachVolume  14s   attachdetach-controller  Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already exclusively attached to one node and can't be attached to another

When I try to delete the previous volumeattachments:

k delete volumeattachments.storage.k8s.io csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600 csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f
volumeattachment.storage.k8s.io "csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600" deleted
volumeattachment.storage.k8s.io "csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f" deleted

Only the one on the just drained and still existing node is deleted. The one on the non existing node remains:

k describe volumeattachments.storage.k8s.io csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f
Name:         csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f
Namespace:
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: kubermatic-v3-test-worker-57ccd5c88c-54m5f
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:             2023-12-08T12:48:14Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2023-12-08T12:52:26Z
  Finalizers:
    external-attacher/named-disk-csi-cloud-director-vmware-com
  Managed Fields:
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:csi.alpha.kubernetes.io/node-id:
        f:finalizers:
          .:
          v:"external-attacher/named-disk-csi-cloud-director-vmware-com":
    Manager:      csi-attacher
    Operation:    Update
    Time:         2023-12-08T12:48:14Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:attacher:
        f:nodeName:
        f:source:
          f:persistentVolumeName:
    Manager:      kube-controller-manager
    Operation:    Update
    Time:         2023-12-08T12:48:14Z
    API Version:  storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:attached:
        f:attachmentMetadata:
          .:
          f:diskID:
          f:diskUUID:
          f:filesystem:
          f:vmID:
        f:detachError:
          .:
          f:message:
          f:time:
    Manager:         csi-attacher
    Operation:       Update
    Subresource:     status
    Time:            2023-12-11T08:34:24Z
  Resource Version:  1992050
  UID:               583797f0-5ebf-46c0-82f6-27c01478c085
Spec:
  Attacher:   named-disk.csi.cloud-director.vmware.com
  Node Name:  kubermatic-v3-test-worker-57ccd5c88c-54m5f
  Source:
    Persistent Volume Name:  pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
Status:
  Attached:  true
  Attachment Metadata:
    Disk ID:     pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
    Disk UUID:   6000c295-355f-da02-a25a-f852b7ce31d8
    Filesystem:  ext4
    Vm ID:       kubermatic-v3-test-worker-57ccd5c88c-54m5f
  Detach Error:
    Message:  rpc error: code = NotFound desc = Could not find VM with nodeID [kubermatic-v3-test-worker-57ccd5c88c-54m5f] from which to detach [pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2]
    Time:     2023-12-11T08:34:24Z
Events:       <none>

The new pod will start:

k describe po -n kube-system  nginx-59d9859785-gbprn
Name:             nginx-59d9859785-gbprn
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             kubermatic-v3-test-worker-57ccd5c88c-55n5f/10.70.27.38
Start Time:       Mon, 11 Dec 2023 09:23:56 +0100
Labels:           app=nginx
                  pod-template-hash=59d9859785
Annotations:      <none>
Status:           Running
IP:               10.244.10.63
IPs:
  IP:           10.244.10.63
Controlled By:  ReplicaSet/nginx-59d9859785
Containers:
  nginx:
    Container ID:   containerd://ed9a503d4249f0a9b837c73a0c7063b83b98e781599f0727755225ef900cd927
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:10d1f5b58f74683ad34eb29287e07dab1e90f10af243f151bb50aa5dbb4d62ee
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 11 Dec 2023 09:30:42 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from csi-data-vcdplugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwr9q (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  csi-data-vcdplugin:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc-vcdplugin
    ReadOnly:   false
  kube-api-access-dwr9q:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               7m18s               default-scheduler        Successfully assigned kube-system/nginx-59d9859785-gbprn to kubermatic-v3-test-worker-57ccd5c88c-55n5f
  Warning  FailedAttachVolume      7m18s               attachdetach-controller  Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount             3m (x2 over 5m15s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  49s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2"
  Normal   Pulling                 48s                 kubelet                  Pulling image "nginx"
  Normal   Pulled                  32s                 kubelet                  Successfully pulled image "nginx" in 15.20183854s (15.20186002s including waiting)
  Normal   Created                 32s                 kubelet                  Created container nginx
  Normal   Started                 32s                 kubelet                  Started container nginx

A new volumeattachment is created, but the dangling one remains:

k get volumeattachments.storage.k8s.io | grep pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
csi-c53775fdb3196e08803afeddc59a6fb79f4e3a054241fbcd629ecd09a18b28af   named-disk.csi.cloud-director.vmware.com   pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2   kubermatic-v3-test-worker-57ccd5c88c-55n5f   true       5m10s
csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f   named-disk.csi.cloud-director.vmware.com   pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2   kubermatic-v3-test-worker-57ccd5c88c-54m5f   true       2d19h

If I now drain again, I have the same issue again:

k describe po -n kube-system  nginx-59d9859785-6k2nx
Name:             nginx-59d9859785-6k2nx
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             kubermatic-v3-test-worker-57ccd5c88c-65zg2/10.70.27.39
Start Time:       Mon, 11 Dec 2023 09:36:39 +0100
Labels:           app=nginx
                  pod-template-hash=59d9859785
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/nginx-59d9859785
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from csi-data-vcdplugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bqg5w (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  csi-data-vcdplugin:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc-vcdplugin
    ReadOnly:   false
  kube-api-access-bqg5w:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           3m3s  default-scheduler        Successfully assigned kube-system/nginx-59d9859785-6k2nx to kubermatic-v3-test-worker-57ccd5c88c-65zg2
  Warning  FailedAttachVolume  3m4s  attachdetach-controller  Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already used by pod(s) nginx-59d9859785-gbprn
  Warning  FailedMount         61s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition

Now when I remove the finalizers of the dangling volumeattachment:

kubectl patch volumeattachments.storage.k8s.io csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f -p '{"metadata":{"finalizers":null}}' --type=merge
volumeattachment.storage.k8s.io/csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f patched

Delete the volumeattachment of the just drained node:

k delete volumeattachments.storage.k8s.io csi-c53775fdb3196e08803afeddc59a6fb79f4e3a054241fbcd629ecd09a18b28af
volumeattachment.storage.k8s.io "csi-c53775fdb3196e08803afeddc59a6fb79f4e3a054241fbcd629ecd09a18b28af" deleted

A new volumeattachment is created:

k get volumeattachments.storage.k8s.io | grep pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600   named-disk.csi.cloud-director.vmware.com   pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2   kubermatic-v3-test-worker-57ccd5c88c-65zg2   true       20s

And the pod is able to start:

k describe po -n kube-system  nginx-59d9859785-6k2nx
Name:             nginx-59d9859785-6k2nx
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             kubermatic-v3-test-worker-57ccd5c88c-65zg2/10.70.27.39
Start Time:       Mon, 11 Dec 2023 09:36:39 +0100
Labels:           app=nginx
                  pod-template-hash=59d9859785
Annotations:      <none>
Status:           Running
IP:               10.244.9.246
IPs:
  IP:           10.244.9.246
Controlled By:  ReplicaSet/nginx-59d9859785
Containers:
  nginx:
    Container ID:   containerd://d2467ccd11fa6a0524208e248eeabf2ceafd81e48bfa6e3c192a1c4528a1907a
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:10d1f5b58f74683ad34eb29287e07dab1e90f10af243f151bb50aa5dbb4d62ee
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 11 Dec 2023 09:43:09 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from csi-data-vcdplugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bqg5w (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  csi-data-vcdplugin:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  csi-pvc-vcdplugin
    ReadOnly:   false
  kube-api-access-bqg5w:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               6m52s                  default-scheduler        Successfully assigned kube-system/nginx-59d9859785-6k2nx to kubermatic-v3-test-worker-57ccd5c88c-65zg2
  Warning  FailedAttachVolume      6m53s                  attachdetach-controller  Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already used by pod(s) nginx-59d9859785-gbprn
  Warning  FailedMount             2m36s (x2 over 4m50s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  26s                    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2"
  Normal   Pulling                 24s                    kubelet                  Pulling image "nginx"
  Normal   Pulled                  23s                    kubelet                  Successfully pulled image "nginx" in 1.073543173s (1.073553964s including waiting)
  Normal   Created                 23s                    kubelet                  Created container nginx
  Normal   Started                 23s                    kubelet                  Started container nginx

When I now drain again, the pod is able to start without issues and manual intervention.

Kubernetes version:

kubectl version
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.10", GitCommit:"b8609d4dd75c5d6fba4a5eaa63a5507cb39a6e99", GitTreeState:"clean", BuildDate:"2023-10-18T11:44:31Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7", GitCommit:"07a61d861519c45ef5c89bc22dda289328f29343", GitTreeState:"clean", BuildDate:"2023-10-18T11:33:23Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider
VMware Cloud Director version: 10.4.2.21954589

OS version:

cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

uname -a
Linux kubermatic-v3-test-cp-0 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Install tools:
KubeOne v1.7.1

Container runtime (CRI) and version (if applicable)
containerd://1.6.25

Related plugins (CNI, CSI, ...) and versions (if applicable)

csi-vcd-controllerplugin version:
        image: registry.k8s.io/sig-storage/csi-attacher:v3.2.1
        image: registry.k8s.io/sig-storage/csi-provisioner:v2.2.2
        image: projects.registry.vmware.com/vmware-cloud-director/cloud-director-named-disk-csi-driver:1.4.0

csi-vcd-nodeplugin version:
        image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.2.0
        image: projects.registry.vmware.com/vmware-cloud-director/cloud-director-named-disk-csi-driver:1.4.0

Logs:
csi-vcd-nodeplugin-zg6b4.zip
csi-vcd-controllerplugin-76cff99975-xh8vc.zip
kubelet-journal-kubermatic-v3-test-worker-57ccd5c88c-65zg2.zip

Cannot mount disk into host directory because diskUUID is empty string

Describe the bug

Hi everyone, I know which vCD API version and vCD installation version are supported by this CSI driver, but my vCD infrastructure version is 10.2 and the API version is 35.0 (we will upgrade vCD in the near future). I changed this source code a little (vcdclient only authenticates with vCD by user/pass, the API version constraint is >= 35.0, and the CSE extension is not used). But I am facing the error diskUUID should not be an empty string when describing the test pod (I use my-pvc.yaml and pvc-debugger.yaml from the samples directory). I wrote a test file under the pkg/vcdclient directory:

/*
   Copyright 2021 VMware, Inc.
   SPDX-License-Identifier: Apache-2.0
*/

package vcdclient

import (
	"fmt"
	"testing"

	"github.com/google/uuid"
	"github.com/stretchr/testify/require"
)

func Test1(t *testing.T) {

	// get client
	vcdClient, err := getTestVCDClient(map[string]interface{}{
		"getVdcClient": true,
	})
	if err != nil {
		panic(err)
	}
	_, err = vcdClient.vdc.FindStorageProfileReference("Premium-SSD")

	if err != nil {
		fmt.Errorf("cannot get storage profile Premium-SSD")
	}


	diskName := fmt.Sprintf("test-pvc-%s", uuid.New().String())
	disk, err := vcdClient.CreateDisk(diskName, 100, VCDBusTypeSCSI, VCDBusSubTypeVirtualSCSI,
		"", "Premium-SSD", false)
	if err != nil {
		fmt.Errorf("cannot create disk")
	}
	fmt.Println("UUID of disk is")
	fmt.Println(disk.UUID)
	fmt.Println("ID of disk is")
	fmt.Println(disk.Id)
	fmt.Println("Shareable of disk is")
	fmt.Println(disk.Shareable)
	fmt.Println()

	// get VM
	nodeID := "fke-worker1"
	vm, err := vcdClient.FindVMByName(nodeID)
	require.NoError(t, err, "unable to find VM [%s] by name", nodeID)
	require.NotNil(t, vm, "vm should not be nil")
	if err != nil {
		fmt.Errorf("cannot find VM by name")
	}
	vcdClient.AttachVolume(vm, disk)
	vcdClient.govcdAttachedVM(disk)
	vcdClient.DetachVolume(vm, disk.Name)
	vcdClient.DeleteDisk(diskName)

	return
}

Then I ran the make test command. This is the output of the Test1 function:

=== RUN   Test1
UUID of disk is

ID of disk is
urn:vcloud:disk:3b9c03c8-5a40-45f8-abe8-ae727c4bede2
Shareable of disk is
false

--- PASS: Test1 (39.56s)

The UUID value of the disk is empty.
Is this an error caused by my vCD API and vCD installation versions? Which features of vCD API version 36.0 does this code use?
Thanks in advance!

Reproduction steps

1. Change this source code slightly, as mentioned above (vCD API version 35.0, vCD installation 10.2)
2. Observe the empty diskUUID error
3.
...

Expected behavior

The UUID of the disk can be retrieved.

Additional context

Do you need any more information about my issue?

Are named disks across multiple organizations supported?

My manually installed Kubernetes cluster needs to be spread across VDCs. Does this driver support that? I understand that named disks can't be moved, but pods could be pinned to a specific VDC.

Support for IDs inside vcloud-csi-config.yaml

Is your feature request related to a problem? Please describe.

https://github.com/vmware/cloud-director-named-disk-csi-driver/blob/main/manifests/vcloud-csi-config.yaml

Right now vcloud-csi-config.yaml requires (?) names instead of IDs for the org, VDC and vApp. Those names can easily change outside of the Kubernetes context (e.g. by renaming the vApp) and would then also have to be adjusted inside the config.

Describe the solution you'd like

Support for org/VDC/vApp IDs, which stay static regardless of renames.
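To illustrate the request (purely hypothetical; the orgId/vdcId/vAppId keys below do not exist in the current config format), an ID-based variant of those fields might look like this:

# Hypothetical sketch only; these keys are not supported today.
vcd:
  host: https://vcd.example.com                                   # illustrative endpoint
  orgId: urn:vcloud:org:11111111-2222-3333-4444-555555555555      # hypothetical key
  vdcId: urn:vcloud:vdc:66666666-7777-8888-9999-000000000000      # hypothetical key
  vAppId: urn:vcloud:vapp:aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee    # hypothetical key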

Describe alternatives you've considered

No response

Additional context

No response

Allow dynamic "maxVolumesPerNode" like in vSphere's CSI driver

Is your feature request related to a problem? Please describe.

We have a customer that needs 120 volumes per node but is currently limited to 15, as described here.

Describe the solution you'd like

Could you use the same logic that vSphere's CSI driver uses, as described here?
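To make the request concrete, here is a purely hypothetical sketch of what a configurable limit could look like on the csi-vcd-nodeplugin DaemonSet; the MAX_VOLUMES_PER_NODE variable name and the container name are invented for illustration and do not exist in this driver today:

# Hypothetical sketch only; this driver currently has no such knob.
containers:
  - name: vcd-csi-plugin    # illustrative container name
    image: projects.registry.vmware.com/vmware-cloud-director/cloud-director-named-disk-csi-driver:1.4.0
    env:
      - name: MAX_VOLUMES_PER_NODE   # hypothetical variable, modeled on what the requester describes
        value: "120"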

Describe alternatives you've considered

Create more nodes, but that means higher costs for our customer.

Additional context

No response

Unable to mount the PVC to the pod in RKE cluster

Describe the bug

Hi,

I am able to install the driver on an RKE cluster and provision the PVC, but mounting it into the pod fails with the error below:

MountVolume.MountDevice failed for volume "pvc-fd87939e-f5b2-40c8-a842-0b868eec4511" : rpc error: code = Internal desc = unable to format and mount device [] at path [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fd87939e-f5b2-40c8-a842-0b868eec4511/globalmount] with fs [[rw]] and flags [exit status 32]: [%!!(MISSING)!(MISSING)v(MISSING)]

Unable to attach or mount volumes: unmounted volumes=[testpvc], unattached volumes=[testpvc kube-api-access-6mbqt]: timed out waiting for the condition

Regards,
Balaji

Reproduction steps

1. Install all the manifest files on a vanilla cluster or RKE cluster
2. Install the driver
3. Create the PVC
4. Create a deployment pod with PVC
...

Expected behavior

The PVC should mount into the pod. Instead, below are the errors in the container log:

MountVolume.MountDevice failed for volume "pvc-fd87939e-f5b2-40c8-a842-0b868eec4511" : rpc error: code = Internal desc = unable to format and mount device [] at path [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fd87939e-f5b2-40c8-a842-0b868eec4511/globalmount] with fs [[rw]] and flags [exit status 32]: [%!!(MISSING)!(MISSING)v(MISSING)]

Unable to attach or mount volumes: unmounted volumes=[testpvc], unattached volumes=[testpvc kube-api-access-6mbqt]: timed out waiting for the condition
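One thing that may be worth checking (an observation, not a confirmed diagnosis): the error reports fs [[rw]], i.e. the filesystem string the node plugin received looks like a mount option rather than a filesystem type. For comparison, a minimal StorageClass for this driver carries an actual filesystem type; the class and storage profile names below are illustrative:

# Sketch for comparison only; names are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vcd-disk
provisioner: named-disk.csi.cloud-director.vmware.com
reclaimPolicy: Delete
parameters:
  storageProfile: "example-profile"   # illustrative storage profile name
  filesystem: "ext4"                  # a filesystem type (ext4/xfs), not a mount flag like "rw"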

Additional context

No response

Add support for fsGroup to named-disk-driver

Is your feature request related to a problem? Please describe.

The named-disk driver is currently missing fsGroup support, which makes it hard to run workloads as non-root.

Describe the solution you'd like

Support for fsGroup in the CSI driver for VCD.
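For reference, a minimal sketch of how fsGroup handling is usually surfaced once a driver supports it, via the fsGroupPolicy field of the CSIDriver object (whether attachRequired is appropriate for this driver is an assumption in this sketch):

# Sketch only; depends on the driver actually honoring fsGroup.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: named-disk.csi.cloud-director.vmware.com
spec:
  attachRequired: true   # assumption for this sketch
  fsGroupPolicy: File    # kubelet applies the pod's fsGroup to the mounted volume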

Describe alternatives you've considered

No response

Additional context

No response

Snapshot Support

Is your feature request related to a problem? Please describe.

It would be useful for the driver to support volume snapshots so that volumes can be backed up, e.g. by Velero or Kasten.

Kasten primer check:

Kubernetes Version Check:
  Valid kubernetes version (v1.21.2+vmware.1)  -  OK

RBAC Check:
  Kubernetes RBAC is enabled  -  OK

Aggregated Layer Check:
  The Kubernetes Aggregated Layer is enabled  -  OK

CSI Capabilities Check:
  Using CSI GroupVersion snapshot.storage.k8s.io/v1  -  OK

Validating Provisioners:
named-disk.csi.cloud-director.vmware.com:
  Is a CSI Provisioner  -  OK
  CSI Provisioner doesn't have VolumeSnapshotClass  -  Error
  Storage Classes:
    pure
      Valid Storage Class  -  OK

Velero also requires the addition of PersistentVolume.Spec.PersistentVolumeSource.CSI (https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/csi-snapshots.md).

Describe the solution you'd like

Volume Snapshot support.
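For reference, once the driver exposes snapshot capability, the piece Kasten flags as missing would be a VolumeSnapshotClass along the lines of the following sketch (the class name is illustrative; this does not work with the driver today):

# Sketch only; requires driver-side snapshot support that does not exist yet.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: vcd-named-disk-snapclass   # illustrative name
driver: named-disk.csi.cloud-director.vmware.com
deletionPolicy: Delete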

Describe alternatives you've considered

Use an alternate storage class
Use Velero with Restic

Additional context

No response

Volume Expansion

Describe the bug

I have created a storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: pure
provisioner: named-disk.csi.cloud-director.vmware.com
reclaimPolicy: Delete
parameters:
  storageProfile: "Pure"
  filesystem: "ext4"
allowVolumeExpansion: true

I can create a PVC:

NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test   Bound    pvc-acd8f9a9-dcd3-4735-9888-7d2d9748c659   1Gi        RWO            pure           12m

However, if I edit the PVC and change spec.resources.requests.storage to a new size, e.g. 2Gi, the change doesn't happen and the following events are logged:

Events:
  Type     Reason                 Age                   From                                                                                                      Message
  ----     ------                 ----                  ----                                                                                                      -------
  Normal   Provisioning           8m5s                  named-disk.csi.cloud-director.vmware.com_csi-vcd-controllerplugin-0_097b7dca-fddc-4972-bcc2-b0b6e9d30b62  External provisioner is provisioning volume for claim "default/test"
  Normal   ExternalProvisioning   7m54s (x4 over 8m5s)  persistentvolume-controller                                                                               waiting for a volume to be created, either by external provisioner "named-disk.csi.cloud-director.vmware.com" or manually created by system administrator
  Normal   ProvisioningSucceeded  7m47s                 named-disk.csi.cloud-director.vmware.com_csi-vcd-controllerplugin-0_097b7dca-fddc-4972-bcc2-b0b6e9d30b62  Successfully provisioned volume pvc-acd8f9a9-dcd3-4735-9888-7d2d9748c659
  Warning  ExternalExpanding      7m11s                 volume_expand                                                                                             Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.

I can navigate to Datacenter -> Storage -> Named Disks in the vCD UI, select the disk, edit it, and change the size; this succeeds in vCenter. But the PVC is never updated.

Reproduction steps

1. Create PVC
2. Edit PVC and change size - logs that plugin not capable
3. Edit PVC via vCD - not updated in k8s

Expected behavior

It should be possible to expand the disk by editing the PVC.
Alternatively, if the disk is expanded via vCD, the change should be reflected in the Kubernetes PVC.
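For context (a sketch under assumptions, not this driver's actual manifests): PVC-driven expansion generally requires the driver to implement and advertise volume expansion, plus an external csi-resizer sidecar running next to the controller plugin. Something like the container below would be added to the controller pod; the image tag, volume name and socket path are illustrative and would have to match the existing manifests:

# Illustrative sidecar only; it has no effect until the driver itself supports expansion.
- name: csi-resizer
  image: registry.k8s.io/sig-storage/csi-resizer:v1.8.0   # tag is illustrative
  args:
    - "--csi-address=$(ADDRESS)"
    - "--timeout=175s"
  env:
    - name: ADDRESS
      value: /var/lib/csi/sockets/pluginproxy/csi.sock    # must match the controller plugin's socket
  volumeMounts:
    - name: socket-dir                                    # assumed volume name
      mountPath: /var/lib/csi/sockets/pluginproxy/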

Additional context

No response

Prepend cluster name in PVC name

Is your feature request related to a problem? Please describe.

It is difficult to tell which PVCs are no longer in use or merely detached, because there is no way to identify which cluster they belong to.

Describe the solution you'd like

Can we prepend the cluster name to the PVC name?

Describe alternatives you've considered

No response

Additional context

No response

[BUG] in deployment Manifests?

Describe the bug

Hi there,

I deployed the CSI driver with the manifest manifests/csi-node.yaml, and afterwards I got a lot of failures in my cluster:

reflector.go:324] k8s.io/client-go/informers/factory.go:134: failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:kube-system:csi-vcd-node-sa" cannot list resource "persistentvolumes" in API group "" at the cluster scope
reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-system:csi-vcd-node-sa" cannot list resource "pods" in API group "" at the cluster scope

I think the problem is that the default ClusterRole for the csi-node plugin only grants the following:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: csi-nodeplugin-role
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]

I would fix this myself, but I don't know what the csi-node pods really need. :(

Would you like to take a look please?
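For reference, a minimal sketch of a possible RBAC addition, assuming the two forbidden errors quoted above are the only missing permissions (the real requirement may be broader):

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: csi-nodeplugin-role
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  # Added to match the forbidden errors quoted above.
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]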

Reproduction steps

  1. kubectl apply -f https://raw.githubusercontent.com/vmware/cloud-director-named-disk-csi-driver/1.6.0/manifests/csi-controller.yaml

Expected behavior

I expect that if I install or update the CSI driver, everything works.

Additional context

No response
