
vsphere-charts's Introduction

vSphere Charts

This repository provides Helm charts for the vSphere CSI and CPI, based on the upstream repositories and the manifests they publish.

Prerequisites

  • Helm 3.x

vSphere CSI Chart

This chart is produced using the following repository: https://github.com/kubernetes-sigs/vsphere-csi-driver

The manifests are located in that repository. The workflow is to compare the existing Helm templates against those manifests whenever a new version is released, make any changes required to bring the templates to parity with the manifests, and then submit a PR.

Any images consumed by this chart need to be mirrored first via the rancher/image-mirror repository.

vSphere CPI Charts

This chart is produced using the following repository: https://github.com/kubernetes/cloud-provider-vsphere/

The manifests are located in that repository. The workflow is to compare the existing Helm templates against those manifests whenever a new version is released, make any changes required to bring the templates to parity with the manifests, and then submit a PR.

Any images consumed by this chart need to be mirrored first via the rancher/image-mirror repository.

Using charts in rancher/charts and rancher/rke2-charts

Charts from this repository should be consumed by commit hash, based on the version and features that you want included.
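For example, a rancher/charts package definition might pin this repository to a specific commit roughly as sketched below; the file path and field names are assumptions based on how rancher/charts packages are typically structured, not an authoritative reference.

# packages/rancher-vsphere-csi/package.yaml (hypothetical path)
url: https://github.com/rancher/vsphere-charts.git
subdirectory: charts/rancher-vsphere-csi
commit: <commit hash with the version/features you want>  # placeholder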



vsphere-charts's Issues

Make imagePullPolicy configurable via values

Is your feature request related to a problem? Please describe.
In an air-gapped environment, an imagePullPolicy of "Always" is not ideal. Currently, the following templates hard-code this value:

https://github.com/rancher/vsphere-charts/blob/main/charts/rancher-vsphere-csi/templates/controller/deployment.yaml#L127

https://github.com/rancher/vsphere-charts/blob/main/charts/rancher-vsphere-csi/templates/node/daemonset.yaml#L100

Describe the solution you'd like
This value should be configurable via the chart values, e.g. set to "Never".
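A minimal sketch of how this could look, assuming a new values key (the key name below is hypothetical and not part of the chart's current schema):

# values.yaml (hypothetical key)
csiNode:
  imagePullPolicy: IfNotPresent  # e.g. "Never" for air-gapped clusters

# templates/node/daemonset.yaml
          imagePullPolicy: {{ .Values.csiNode.imagePullPolicy | default "IfNotPresent" }}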

CPI deployment issue

vSphere Server Info

  • vSphere version: 8.0.2.00300

Rancher Server Setup

  • Rancher version: 2.8.5
  • Installation option (Docker install/Helm Chart): app chart install

Information about the Cluster

  • Kubernetes version: 1.28.10+rke2r1

Describe the bug
failed to create listener: failed to listen on 0.0.0.0:10260: listen tcp 0.0.0.0:10260: bind: address already in use

To Reproduce

Result
adding "--webhook-secure-port=10264 --secure-port=10274" to daemonset arguments config

Expected Result
Allow the ports to be configured from the chart screen, or resolve the bug so that the webhook does not use the same port as the secure port.
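For reference, a hedged sketch of where those arguments would land in the CPI DaemonSet spec (the container name and the existing argument list are assumptions):

spec:
  template:
    spec:
      containers:
        - name: rancher-vsphere-cpi-cloud-controller-manager  # name assumed
          args:
            - --cloud-provider=vsphere
            - --webhook-secure-port=10264  # move the webhook off the conflicting port
            - --secure-port=10274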

Cluster member nodes are removed after reboot

vSphere Server Info

  • vSphere version: 7.0.3.00700

Rancher Server Setup

  • Rancher version: 2.7.3
  • Installation option (Docker install/Helm Chart):
    • RKE2 1.25.9
  • Proxy/Cert Details: Valid Wildcard SSL Cert issued by Certum Certification Authority

Information about the Cluster

  • Kubernetes version: 1.25.9
  • Cluster Type (Local/Downstream): Local

Custom self hosted cluster running on Ubuntu 22.04.2 LTS nodes

Describe the bug
As soon as I install vSphere CPI (102.0.0+up1.4.2) on a new cluster and reboot any worker or control-plane node, the node is deleted and is unable to rejoin the cluster.
Corresponding rke2-server log:
May 03 15:03:54 testserver01 rke2[857]: time="2023-05-03T15:03:54Z" level=error msg="error syncing 'testagent03': handler node: Operation cannot be fulfilled on nodes \"testagent03\": StorageError: invalid object, Code: 4, Key: /registry/minions/testagent03, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: efd660aa-50e2-4f9a-8483-64f43cd2204e, UID in object meta: , requeuing"

After uninstalling vSphere CPI (102.0.0+up1.4.2), the node rejoins again without issues.
Log:

May 03 15:07:28 testserver02 rke2[857]: time="2023-05-03T15:07:28Z" level=info msg="certificate CN=testagent03 signed by CN=rke2-server-ca@1683122526: notBefore=2023-05-03 14:02:06 +0000 UTC notAfter=2024-05-02 15:07:28 +0000 UTC"
May 03 15:07:28 testserver02 rke2[857]: time="2023-05-03T15:07:28Z" level=info msg="certificate CN=system:node:testagent03,O=system:nodes signed by CN=rke2-client-ca@1683122526: notBefore=2023-05-03 14:02:06 +0000 UTC notAfter=2024-05-02 15:07:28 +0000 UTC"
May 03 15:07:31 testserver02 rke2[857]: time="2023-05-03T15:07:31Z" level=info msg="Handling backend connection request [testagent03]"

To Reproduce
Provision new rke2 cluster

# kubectl get nodes
NAME           STATUS   ROLES                       AGE   VERSION
testagent01    Ready    <none>                      72m   v1.25.9+rke2r1
testagent02    Ready    <none>                      72m   v1.25.9+rke2r1
testagent03    Ready    <none>                      10m   v1.25.9+rke2r1
testserver01   Ready    control-plane,etcd,master   74m   v1.25.9+rke2r1
testserver02   Ready    control-plane,etcd,master   72m   v1.25.9+rke2r1
testserver03   Ready    control-plane,etcd,master   73m   v1.25.9+rke2r1 

Install vSphere CPI (102.0.0+up1.4.2) via Rancher UI with "Define vSphere Tags" option. See settings in screenshot.
Reboot any cluster node.
Check rke2-server logs.

Result

Expected Result
Nodes should not be removed when rebooting.

Screenshots
(screenshot of the chart settings not included here)

Additional context
All nodes are provisioned with the vSphere parameter "disk.enableUUID=TRUE" set before installing rke2.
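As a side note, that parameter can be verified or set with govc before installing rke2, roughly like this (the VM inventory path is a placeholder):

# show the VM's extra config and check for disk.enableUUID
govc vm.info -e /<datacenter>/vm/testagent01 | grep disk.enableUUID
# set it if missing
govc vm.change -vm /<datacenter>/vm/testagent01 -e disk.enableUUID=TRUE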

Controller does not start when upgrading to latest (use-gocsi flag issue)

vSphere Server Info

  • vSphere version: Latest

Rancher Server Setup

  • Rancher version: 2.7.5
  • Installation option (Docker install/Helm Chart): Rancher helm charts through terraform
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE1
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: V1.23.16
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Imported

Describe the bug
What happened:

When trying to upgrade the vSphere CSI driver on Rancher RKE1 from 2.5.1 to the latest version, the vsphere-csi-controller container in the vsphere-csi-controller pod does not start, failing with the following error:

flag provided but not defined: -use-gocsi
Usage of /bin/vsphere-csi:
-fss-name string
Name of the feature state switch configmap
-fss-namespace string
Namespace of the feature state switch configmap
-kubeconfig string
Paths to a kubeconfig. Only required if out-of-cluster.
-supervisor-fss-name string
Name of the feature state switch configmap in supervisor cluster
-supervisor-fss-namespace string
Namespace of the feature state switch configmap in supervisor cluster
-version
Print driver version and exit

To Reproduce

Clean install (or upgrade) of the latest Rancher Helm charts for vsphere-csi and vsphere-cpi.

Result

vsphere-csi-controller is not starting

Expected Result

vsphere-csi-controller is starting

Screenshots
N/A

Additional context
I think it is related to this:

Please check also: kubernetes-sigs/vsphere-csi-driver#2439 (comment)
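A hedged way to confirm that the deployed controller still carries the removed flag (namespace and object name are assumptions; adjust to your install):

kubectl -n kube-system get deployment vsphere-csi-controller -o yaml | grep -- 'use-gocsi'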

Missing the CSI Snapshotter sidecar - blockSnapshots will never work

vSphere Server Info

  • vSphere version: 7.0.3

Rancher Server Setup

  • Rancher version: 2.7.3

Information about the Cluster

  • Kubernetes version: 1.25.7 RKE2

Describe the bug
RKE2 has added the snapshot controllers, and vSphere CSI supports VolumeSnapshots. In fact, this chart added a feature flag to enable that support; however, it seems to have been forgotten that the CSI Snapshotter sidecar is also required, as per VMware's installation instructions: https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.4.0/manifests/vanilla/deploy-csi-snapshot-components.sh

Notice they apply the following patch:

# Update the vSphere CSI driver to add the snapshot side car.
tmpdir=$(mktemp -d)
echo "creating patch file in tmpdir ${tmpdir}"
cat <<EOF >> "${tmpdir}"/patch.yaml
spec:
  template:
    spec:
      containers:
        - name: csi-snapshotter
          image: 'k8s.gcr.io/sig-storage/csi-snapshotter:${release}'
          args:
            - '--v=4'
            - '--timeout=300s'
            - '--csi-address=\$(ADDRESS)'
            - '--leader-election'
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
EOF

echo -e "Patching vSphere CSI driver.."
kubectl patch deployment vsphere-csi-controller -n vmware-system-csi --patch "$(cat "${tmpdir}"/patch.yaml)"
echo -e "✅ Successfully patched vSphere CSI driver, please wait till deployment is updated.."
echo -e "\n✅ Successfully deployed all components for CSI Snapshot feature.\n"

To Reproduce
Create a VolumeSnapshotDriver and a VolumeSnapshot with blockSnapshots enabled. You will notice that the snapshot will never complete.

Expected Result
The snapshot should complete and the CSI Snapshotter should be added as a sidecar
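Translated to the chart, the missing piece would be an extra container entry in the controller Deployment template, roughly like the sketch below (the mirrored image name and tag are assumptions):

        - name: csi-snapshotter
          image: rancher/mirrored-sig-storage-csi-snapshotter:v6.2.2  # image/tag assumed
          args:
            - --v=4
            - --timeout=300s
            - --csi-address=$(ADDRESS)
            - --leader-election
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir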

CSI-Snapshotter sidecar container not available as part of vSphere CSI Controller

Rancher version :
2.6.x and 2.7.x

RKE2 Version:
v1.27.10+rke2r1

Node(s) CPU architecture, OS, and Version:
Linux xxxx 5.3.18-150300.59.144-default #1 SMP Tue Dec 5 15:20:50 UTC 2023 (5dc33fd) x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 master 2 workers

Describe the bug:
The vsphere-csi-controller pod running in the kube-system namespace doesn't have the csi-snapshotter sidecar container that orchestrates the creation of volume snapshots. The VolumeSnapshot-related CRDs are present. This sidecar container is one of the prerequisites mentioned in the official Volume Snapshot documentation.

Steps To Reproduce:
Install RKE2 and check the vsphere-csi-controller pods; there won't be a csi-snapshotter sidecar container.

Expected behavior:
The CSI snapshotter sidecar container should be part of the vSphere CSI controller pods. Since we are using the Helm-based deployment from Rancher, this should be part of the Helm chart.

Actual behavior:
The csi-snapshotter sidecar is not part of the vSphere CSI controller.

Additional context / logs:
VolumeSnapshot-related CRDs are available from release 1.26.x, but for snapshots to work, the CSI snapshotter sidecar container is necessary, as per the CSI volume snapshot documentation. The Rancher chart for the vSphere CSI doesn't have this sidecar container; we have added it manually to test the functionality.

Screenshots
We added this sidecar manually to the vSphere CSI controller pod (screenshot not included here).

Use priorityClassName on vSphere CSI to avoid eviction on node pressure

Is your feature request related to a problem? Please describe.
vSphere CSI uses a DaemonSet to ensure a pod is present on every node so that storage can be mounted and made available to pods that require storage on that node.
If a node comes under pressure (memory, PID, and possibly disk), Kubernetes may evict pods in order to relieve the pressure and regain control.
If the CSI pod is evicted, the node can no longer mount CSI volumes from vSphere.

In my case, this is what happened:

  • one of the nodes went down
  • Kubernetes tried to reschedule that node's pods onto another node
  • the new node could not handle all the pods and went into a memory pressure condition
  • Kubernetes started to evict some pods, and the vSphere CSI pod was one of them
  • eviction of other pods with volume bindings was not possible, since the node was not releasing the CSI volumes (the CSI pod was not running)
  • the node was unable to exit memory pressure by itself; I had to manually delete some pods so that Kubernetes could reschedule the CSI pod onto the node and mount the CSI volumes again

Describe the solution you'd like
This could be avoided by setting an appropriate PriorityClass on the DaemonSet, signalling to Kubernetes that this pod has higher priority and should not be evicted.

Kubernetes already ships with two PriorityClasses, system-cluster-critical and system-node-critical.
These are common classes used to ensure that critical components are always scheduled first.

It should be sufficient to set spec.template.spec.priorityClassName to system-node-critical in https://github.com/rancher/vsphere-charts/blob/main/charts/rancher-vsphere-csi/templates/node/daemonset.yaml
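A minimal sketch of the change in the node DaemonSet template:

spec:
  template:
    spec:
      priorityClassName: system-node-critical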

Automate image mirroring and chart image update steps

Currently there are two steps involved in updating images for the vsphere charts:

  • Lookup new images and create a PR in rancher/image-mirror to mirror the upstream images to the rancher organization in Docker Hub
  • Update the images in the chart(s) (and possibly updates from upstream charts, which is out of scope for now)

The idea is to automate both steps. The first is to add automation to rancher/image-mirror so that PRs are created automatically when new images are available.

The second step can be automated using Renovate, which will use Docker Hub as the source for new images and, when a new one is found, will create a PR to update the chart(s) with the new image(s).

Resources for all containers

Is it possible to allow configuring resources for all of the containers deployed by CSI and CPI?
For example, I would like to set a custom memory limit for the csi-attacher container.

Not sure if this issue belongs here or at the upstream repo :)
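For illustration, a hypothetical values layout for per-container resources; the key names below are not the chart's current schema:

csiController:
  containers:
    csiAttacher:
      resources:
        requests:
          cpu: 50m
          memory: 128Mi
        limits:
          memory: 256Mi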

With Windows Server 2022, the sockets for CSI are not correct

vSphere Server Info

  • vSphere version: 7.0.3

Rancher Server Setup

  • Rancher version: v2.7.1
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE2
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: v1.24.10+rke2r1
  • Cluster Type (Local/Downstream): Hosted vSphere
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):

Describe the bug
We installed the CSI driver via the Rancher GUI and use Windows Server 2022 Standard.
In the vsphere-csi-node-windows pods, the node-driver-registrar crashes due to an unreachable socket.
The pod logs print the following:

I0307 10:51:09.178916    8264 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix \\\\var\\\\lib\\\\kubelet\\\\plugins\\\\csi.vsphere.vmware.com\\\\csi.sock: connect: A socket operation was attempted to an unreachable network.",}

A workaround is to edit the vsphere-csi-node-windows DaemonSet and replace the doubled backslashes (\\) with single backslashes (\) in the container's environment.
It looks like this afterwards:

        env:
        - name: ADDRESS
          value: unix://C:\csi\csi.sock
        - name: DRIVER_REG_SOCK_PATH
          value: \var\lib\kubelet\plugins\csi.vsphere.vmware.com\csi.sock

Add support for k8s versions 1.29 and 1.30

Is your feature request related to a problem? Please describe.
Looking to upgrade to 1.29/1.30, but the chart currently only provides overrides for versions up to 1.28.
Describe the solution you'd like
Bump the overrides to cover the new versions.

Describe alternatives you've considered
Overriding it myself, but that doesn't help everyone else
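If the overrides are keyed off the cluster's Kubernetes version, bumping them could look roughly like the template sketch below (how the chart actually selects overrides is an assumption):

{{- if semverCompare ">=1.29.0-0 <1.31.0-0" .Capabilities.KubeVersion.Version }}
# sidecar image tags validated for Kubernetes 1.29/1.30 would go here
{{- end }}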

CPI/CSI controller pods scheduled onto worker nodes, upstream restricts to control plane

vSphere Server Info

  • vSphere version: 7.0.3.00700

Rancher Server Setup

  • Rancher version: 2.6.6
  • Installation option (Docker install/Helm Chart): helm chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE1 1.3.11, kubernetes v1.23.6
  • Proxy/Cert Details: private CA

Information about the Cluster

  • Kubernetes version: v1.23.6
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): vSphere

Describe the bug
Upstream Helm charts for vSphere CPI and CSI use nodeAffinity and nodeSelector to run the cloud-controller-manager and vsphere-csi-controller on control plane nodes. The Rancher charts run them on any non-Windows node.

To Reproduce
Deploy vSphere CPI and CSI using the "Apps" -> "Charts" menu in Rancher.

Result
Some cloud-controller-manager / vsphere-csi-controller pods are scheduled on worker nodes. In the case of the cloud-controller-manager DaemonSet, pods are scheduled on all nodes in the cluster.

Expected Result
cloud-controller-manager / vsphere-csi-controller pods scheduled only on control plane nodes

Screenshots
Happy to supply on request

Additional context
The upstream CPI cloud-controller-manager DaemonSet runs only on control plane nodes.

The upstream CSI vsphere-csi-controller Deployment runs only on control plane nodes, and the vSphere docs state "If you deploy vSphere Container Storage Plug-in in a single control plane setup, you can edit the following field to change the number of replicas to one.".

I think Rancher's addition of matchExpressions to exclude Windows nodes here and here mistakenly assumed that they would combine with the control plane matchExpressions to mean "any non-Windows control plane node". However, what actually happens is that Kubernetes ORs the matchExpressions, and the rule becomes "any control plane node OR any non-Windows node" (see the Kubernetes docs: "If you specify multiple terms in nodeSelectorTerms associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified terms can be satisfied (terms are ORed). If you specify multiple expressions in a single matchExpressions field associated with a term in nodeSelectorTerms, then the Pod can be scheduled onto a node only if all the expressions are satisfied (expressions are ANDed).").
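A hedged sketch of the intended "non-Windows control plane node" rule, with both expressions in a single nodeSelectorTerm so they are ANDed (the label keys follow the upstream convention and may differ from what the chart uses):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
            - key: kubernetes.io/os
              operator: NotIn
              values:
                - windows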

Would like to submit a pull request to add labels to vsphere charts

Is your feature request related to a problem? Please describe.
Currently there is no way to add custom pod labels for these charts unless I fork the chart and add them; they are hardcoded.

Describe the solution you'd like
I would like to add a helper function for common Helm chart labels, as well as a value that can be overridden to add pod labels.

Describe alternatives you've considered
This is an easy update and doesn't really need an alternative; the alternative would be to fork the chart and add the objects yourself.

Additional context
I am creating this issue so that I can use it to track my pull request. I am waiting for my company to approve the contribution, and then I will submit a PR with what I have in mind.
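For illustration, a minimal sketch of such a helper and value (the names are hypothetical):

{{/* templates/_helpers.tpl */}}
{{- define "vsphere-csi.podLabels" -}}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- with .Values.podLabels }}
{{ toYaml . }}
{{- end }}
{{- end }}

# referenced from the pod template metadata in each workload:
      labels:
        {{- include "vsphere-csi.podLabels" . | nindent 8 }}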

make Charts available for Kubernetes 1.24+

Hello,

At the moment the charts are not available in a cluster running Kubernetes 1.24.
Would it be possible to make them compatible with it?

I know that the original vSphere charts are already compatible with Kubernetes 1.24, but for upgrade reasons I'm waiting for them to become available through Rancher.

Kind Regards
