
local-storage-operator's Introduction

local-storage-operator

Operator for local storage

Deploying with OLM

Instructions to deploy on OCP >= 4.2 using OLM can be found here

Using the must-gather image with the local storage operator

Instructions for using the local storage operator's must-gather image can be found here

local-storage-operator's People

Contributors

ashishranjan738, bertinatto, childsb, derek-pryor, dobsonj, eggfoobar, elbehery, gnufied, huffmanca, j-griffith, joepvd, jsafrane, lack, liangxia, mpatlasov, odedviner, openshift-bot, openshift-ci-robot, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, priyanka19-98, romanbednar, saravanastoragenetwork, sp98, thegreyd, thiagoalessio, vitus133, ximinhan, yselkowitz


local-storage-operator's Issues

LSO fails to create PVs

When creating a new LocalVolumeSet (lvset) named nvme, LSO fails to create PVs, but no error is reported in the UI or in the logs;
more details: https://access.redhat.com/support/cases/#/case/03191850

Log shows :

2022-04-08T14:20:10.179Z        INFO    localvolumeset-symlink-controller       provisioning succeeded  {"Request.Namespace": "openshift-local-storage", "Request.Name": "nvme", "Device.Name": "nvme2n1"}
2022-04-08T14:20:10.179Z        INFO    localvolumeset-symlink-controller       total devices provisioned       {"Request.Namespace": "openshift-local-storage", "Request.Name": "nvme", "count": 4, "storageClass.Name": "nvme"} 

On the node, in /mnt/local-storage/:

sh-4.4# ls
nvme  sc-ceph  storage.class    <-- none of these should be here except for nvme
sh-4.4# ls -alh
total 0
drwxr-xr-x. 5 root root  54 Apr  6 22:52 .
drwxr-xr-x. 3 root root  27 Mar 30 19:10 ..
drwxr-xr-x. 2 root root   6 Apr  6 22:52 nvme
drwxr-xr-x. 2 root root   6 Apr  6 13:14 sc-ceph
drwxr-xr-x. 2 root root 196 Apr  1 14:06 storage.class
sh-4.4# ls -alh nvme/ storage.class/
nvme/:
total 0
drwxr-xr-x. 2 root root  6 Apr  6 22:52 .
drwxr-xr-x. 6 root root 65 Apr  8 14:36 ..
 
storage.class/:
total 0
drwxr-xr-x. 2 root root 196 Apr  1 14:06 .
drwxr-xr-x. 6 root root  65 Apr  8 14:36 ..
lrwxrwxrwx. 1 root root  46 Apr  1 14:06 ata-VR000240GXBBL_2026292D4F9A -> /dev/disk/by-id/ata-VR000240GXBBL_2026292D4F9A
lrwxrwxrwx. 1 root root  46 Mar 30 19:19 nvme-KCD6XLUL3T84_4170A00HTDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_4170A00HTDT8
lrwxrwxrwx. 1 root root  46 Mar 30 19:19 nvme-KCD6XLUL3T84_4170A00RTDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_4170A00RTDT8
lrwxrwxrwx. 1 root root  46 Mar 30 19:19 nvme-KCD6XLUL3T84_4170A00WTDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_4170A00WTDT8
lrwxrwxrwx. 1 root root  46 Mar 30 19:19 nvme-KCD6XLUL3T84_41E0A029TDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_41E0A029TDT8
sh-4.4# rm -rf *

The problem seems to be old, uncleaned storage class directories or symlinks in the local-storage folder (I don't know why they are here in the first place); the workaround is to clean up/delete the unwanted files:

> oc debug nodes/node1
> chroot /host
> ls /mnt/local-storage/
## if you notice any garbage or old storage classes delete them
> rm -rf nvme sc-ceph storage.class

and then try to recreate the lvset workers-nvme:

sh-4.4# cd workers-nvme/
sh-4.4# ls
nvme-KCD6XLUL3T84_4170A001TDT8  nvme-KCD6XLUL3T84_4170A007TDT8  nvme-KCD6XLUL3T84_4170A00DTDT8  nvme-KCD6XLUL3T84_4170A00MTDT8
sh-4.4# exit
> oc get pv | grep local
local-pv-3238eb37          3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s
local-pv-57fc01e           3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s
local-pv-672f73bd          3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s
local-pv-6a74daf3          3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s
local-pv-8d6bb503          3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s
local-pv-8f2ffb78          3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s
local-pv-de76809a          3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s
local-pv-f93b075d          3576Gi     RWO            Delete           Available                                                      workers-nvme            7m50s

local storage operator metrics target down after upgrade.

Prometheus cannot scrape metrics from the local-storage-operator pod after upgrading to OCP 4.9:

"lastError": "Get "http://:8383/metrics": dial tcp :8383: connect: connection refused",

"lastError": "Get "http://:8686/metrics": dial tcp :8686: connect: connection refused",

Checking the config, I can verify the IP address is exactly the one where Prometheus cannot connect:

local-storage-operator-76f878db87-qngn4 1/1 Running 0 11h

The serviceMonitor is showing:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2021-11-04T08:33:19Z"
  generation: 1
  labels:
    name: local-storage-operator
  name: local-storage-operator-metrics
  namespace: openshift-local-storage
spec:
  endpoints:
  - bearerTokenSecret:
      key: ""
    port: http-metrics
  - bearerTokenSecret:
      key: ""
    port: cr-metrics
  namespaceSelector: {}
  selector:
    matchLabels:
      name: local-storage-operator

The service is showing:

spec:
  clusterIP:
  clusterIPs:
  -
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-metrics
    port: 8383
    protocol: TCP
    targetPort: 8383
  - name: cr-metrics
    port: 8686
    protocol: TCP
    targetPort: 8686
  selector:
    name: local-storage-operator
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

But the pod is not listening at all on those ports (8383 / 8686).
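
A couple of checks that may help narrow this down; the Service name local-storage-operator-metrics is an assumption based on the ServiceMonitor above, and ss may not be present in the operator image:

# Do the metrics Service/Endpoints actually resolve to the operator pod?
oc -n openshift-local-storage get endpoints local-storage-operator-metrics -o wide

# Is anything listening on 8383/8686 inside the operator pod?
# (ss may be missing from the image; curl http://localhost:8383/metrics is an alternative)
oc -n openshift-local-storage exec deploy/local-storage-operator -- ss -tlnp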

Thanks.

disk ID change is not handled by local-storage

Version used

local-storage-operator.v4.12.0-202305101515 (channel stable).

Steps to reproduce:

Create a LocalVolume:

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: infra-1-sdb-prom-core
  namespace: openshift-local-storage
spec:
  logLevel: Normal
  managementState: Managed
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - cat-fch8-infra-1
  storageClassDevices:
  - devicePaths:
    - /dev/sdb
    fsType: xfs
    storageClassName: local-storage-prom-core
    volumeMode: Filesystem
  tolerations:
    - effect: NoExecute
      operator: Exists
      key: node-role.kubernetes.io/infra

Create a PVC (the cluster-monitoring config in OpenShift will do it for us; we just specify the storageClass)...
Everything works perfectly.

We then migrate our control-plane and infra node VMs in our Nutanix cluster from site A to site B while the VMs are powered on.
The migration changes the UUID/SERIAL of the attached disks.

Current:

After a few minutes, everything works on the cluster except the monitoring stack with the local-storage PVC:

There are 2 symlinks; the first points to an ID (disk serial number) which no longer exists on the host, so it shows as broken (blinks red):

[core@cat-fch8-infra-1 ~]$ ll /mnt/local-storage/local-storage-prom-core/
total 0
lrwxrwxrwx. 1 root root 80 Aug 25 17:03 scsi-1NUTANIX_NFS_3_0_20023_ee3e4928_d368_4a67_b2dd_d18fdbf99650 -> /dev/disk/by-id/scsi-1NUTANIX_NFS_3_0_20023_ee3e4928_d368_4a67_b2dd_d18fdbf99650
lrwxrwxrwx. 1 root root 79 Sep 11 12:48 scsi-1NUTANIX_NFS_3_0_7705_f084d7d7_958c_4756_9dc1_6298abbf942e -> /dev/disk/by-id/scsi-1NUTANIX_NFS_3_0_7705_f084d7d7_958c_4756_9dc1_6298abbf942e

The disk sdb is present on the host...

[core@cat-fch8-infra-1 ~]$ lsblk -o NAME,SERIAL
NAME   SERIAL
sda    NFS_3_0_7690_b9a6cf2e_fc17_4c0e_896e_ef9df859e41a
├─sda1 
├─sda2 
├─sda3 
└─sda4 
sdb    NFS_3_0_7705_f084d7d7_958c_4756_9dc1_6298abbf942e
sr0    QM00001

...but no disk is mounted on the VM:

[core@cat-fch8-infra-1 ~]$ sudo df -h | grep local

We now have 2 PVs instead of 1. The new one is in the "Available" state:

openshift@cat-fch8-bastion ~]$ oc get pv -l storage.openshift.com/owner-name=infra-1-sdb-prom-core
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                     STORAGECLASS              REASON   AGE
local-pv-7c13822d   15Gi       RWO            Delete           Bound       openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0   local-storage-prom-core            16d
local-pv-812b91bb   15Gi       RWO            Delete           Available                                                             local-storage-prom-core            3h13m
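
A possible manual recovery, under the assumption that the data on sdb is intact and only the by-id serial/symlink changed (names taken from the output above; verify before deleting anything, and treat the Prometheus data as at risk):

# remove the dead symlink that still points at the old serial
oc debug node/cat-fch8-infra-1 -- chroot /host rm /mnt/local-storage/local-storage-prom-core/scsi-1NUTANIX_NFS_3_0_20023_ee3e4928_d368_4a67_b2dd_d18fdbf99650

# delete the stale bound PV, then delete the PVC and let the monitoring stack
# recreate it so it can bind to the new Available PV
oc delete pv local-pv-7c13822d
oc -n openshift-monitoring delete pvc prometheus-k8s-db-prometheus-k8s-0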

Expected:

Prometheus pods are running. Their PVC (from openshift-monitoring) is bound to a PV that points to the disk attached as sdb on the host, and the local-storage operator handles the ID/SERIAL change of the disk.

LocalVolumeDiscovery on OKD4.5

We are trying to deploy the local-storage-operator on OKD 4.5 using the manifest (https://github.com/openshift/local-storage-operator/blob/master/examples/olm/catalog-create-subscribe.yaml) provided in this repository. We managed to install the operator and create a LocalVolumeDiscovery, but the LocalVolumeDiscoveryResult resource stays empty. When we consult the logs of the diskmaker-discovery-xyz pods, we observe the following log entries:

I1126 15:11:15.003791       1 event.go:255] Event(v1.ObjectReference{Kind:"LocalVolumeDiscovery", Namespace:"openshift-local-storage", Name:"auto-discover-devices", UID:"bf8aa84b-5bab-4e85-8628-daa49ce90e61", APIVersion:"local.storage.openshift.io/v1alpha1", ResourceVersion:"84114511", FieldPath:""}): type: 'Warning' reason: 'ErrorUpdatingDiscoveryResultObject' c-0011.host.name - failed to update LocalVolumeDiscoveryResult status. Error: LocalVolumeDiscoveryResult.local.storage.openshift.io "discovery-result-c-0011.host.name" is invalid: status.discoveredDevices.size: Invalid value: "integer": status.discoveredDevices.size in body must be of type string: "integer"
failed to update the device status in the LocalVolumeDiscoveryResult resource

This seems to indicate that diskmaker-discovery tries to update the resource with a data type the API does not expect.

Client Version: 4.5.0-0.okd-2020-08-12-020541
Server Version: 4.5.0-0.okd-2020-10-15-235428
Kubernetes Version: v1.18.3

Is this a bug, or are we doing something wrong?

Resizing volumes

Is it possible to resize volumes provisioned with this operator?

I've tried the following:

  1. Resize the (VMware-backed) disk from 200 GiB to 210 GiB.
  2. echo 1 > /sys/class/block/sdb/device/rescan on the node. The new size shows up with e.g. lsblk.
  3. Edit the PV and set .spec.capacity.storage=210Gi.
  4. Edit the PVC and set .spec.resources.requests.storage=210Gi.

The PVC is stuck with the following event, even after restarting the pod:

Type     Reason             Age   From           Message
----     ------             ----  ----           -------
Warning  ExternalExpanding  40m   volume_expand  Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
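
For reference, steps 3 and 4 expressed as oc commands (resource names are placeholders). The ExternalExpanding warning appears to be expected for statically provisioned local PVs, since no in-tree plugin or external resizer handles their expansion:

# step 3: bump the PV capacity (placeholder PV name)
oc patch pv <local-pv-name> -p '{"spec":{"capacity":{"storage":"210Gi"}}}'

# step 4: bump the PVC request (placeholder names)
oc -n <namespace> patch pvc <pvc-name> -p '{"spec":{"resources":{"requests":{"storage":"210Gi"}}}}'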

local-provisioner cannot get nodes

After the last automatic update, the local-provisioner DaemonSet crash-loops with the following message:

oc logs -p local-nvmes-local-provisioner-pntrk
I0611 13:14:46.274851       1 common.go:320] StorageClass "local-nvme" configured with MountDir "/mnt/local-storage/local-nvme", HostDir "/mnt/local-storage/local-nvme", VolumeMode "Filesystem", FsType "xfs", BlockCleanerCommand ["/scripts/quick_reset.sh"]
I0611 13:14:46.274965       1 main.go:63] Loaded configuration: {StorageClassConfig:map[local-nvme:{HostDir:/mnt/local-storage/local-nvme MountDir:/mnt/local-storage/local-nvme BlockCleanerCommand:[/scripts/quick_reset.sh] VolumeMode:Filesystem FsType:xfs}] NodeLabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Duration:5m0s} UseNodeNameOnly:false LabelsForPV:map[storage.openshift.com/local-volume-owner-name:local-nvmes storage.openshift.com/local-volume-owner-namespace:local-storage]}
I0611 13:14:46.274990       1 main.go:64] Ready to run...
W0611 13:14:46.274996       1 main.go:73] MY_NAMESPACE environment variable not set, will be set to default.
W0611 13:14:46.275004       1 main.go:79] JOB_CONTAINER_IMAGE environment variable not set.
I0611 13:14:46.275288       1 common.go:382] Creating client using in-cluster config
I0611 13:14:46.282826       1 main.go:126] Could not get node information (remaining retries: 2): nodes "node-006.schaeffer-ag.de" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope
I0611 13:14:47.283826       1 main.go:126] Could not get node information (remaining retries: 1): nodes "<nodename>" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope
I0611 13:14:48.284873       1 main.go:126] Could not get node information (remaining retries: 0): nodes "<nodename>" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope
F0611 13:14:48.284900       1 main.go:129] Could not get node information: nodes "<nodename>" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope

It is using quay.io/openshift/origin-local-storage-static-provisioner:latest, which corresponds to image ID 9f53bcaa098060147bf0b69c4dccc288aae2dcef38caad4fe9504d8558e0dac3 and digest sha256:421ea9b9117615bd68eadee74f3f5be64fc1fdfe2050d2309141820fbbb26875.
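
A hedged workaround sketch: grant the provisioner service account (named in the error above) read access to nodes. This only papers over the missing permission; the proper fix presumably belongs in the operator-managed RBAC:

oc apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-storage-provisioner-node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-storage-provisioner-node-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-storage-provisioner-node-reader
subjects:
- kind: ServiceAccount
  name: local-storage-admin
  namespace: local-storage
EOF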

documentation: local-provisioner comparison

Hello everyone! ✋

I'm very interested in using local storage in Kubernetes for applications like distributed databases. I am currently developing an operator for Cassandra and ScyllaDB with Rook, and both of those databases would benefit greatly from using local storage.

I am wondering what the difference is between this project and the local volume provisioner. I would love to know the scope of the project and see if it aligns with my goals, so that I can also contribute to it. 😄

Where is the documentation for this? How to create PVs or PV sets in Filesystem volume mode without code

Where can I find the documentation for this? I am looking to understand how to use this to create PVs or PV sets in Filesystem volume mode without coding.
The following link mentioned in the internal doc is broken:
"
Warning: These docs are for internal development and testing. Use https://docs.openshift.com/container-platform/latest/storage/persistent_storage/persistent-storage-local.html docs for installation on OCP
"

After experimenting with the GUI, I have successfully created block mode PV sets by following these steps:

  1. Mounting an unformatted disk (in VM),
  2. Creating a discovery instance,
  3. Setting up a volume set to use block mode, which is the default setting,
  4. Consequently, a PV is automatically created.

However, if I format the disk (e.g. with mkfs.xfs /dev/vdb), the procedure above no longer works.
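
If the discovery/volume-set logic is skipping the device because it now carries a filesystem signature (an assumption on my part), wiping the signature makes the disk look unformatted again. This destroys any data on the device:

# on the node, e.g. via `oc debug node/<name>` and `chroot /host`
wipefs -a /dev/vdb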

publish on operatorhub

It would be great to have a working operator on OperatorHub for non-Red Hat customers.

Hiding the StorageClass for users

Hi,

We're using the local-storage-operator to discover local volumes which will be consumed by Ceph via the OCS operator. This all works perfectly.

Now when a user tries to add a volume to their pod, they see all available StorageClasses, including the storage class that backs the Ceph volumes. Selecting this storage class doesn't work for a user, and we only want to offer dynamic persistent storage offerings.

Is there a way to hide the storage class? We want to hide the ceph-hdd storage class shown in the screenshot below:
[screenshot: StorageClass selection dialog showing ceph-hdd]

local-must-gather image not available?

On OCP 4.6.1, I'm trying to debug an issue with the operator, and I can't get local-must-gather working.

From the pod created by oc adm must-gather --image=quay.io/openshift/local-must-gather:latest:

    state:
      waiting:
        message: 'rpc error: code = Unknown desc = Error reading manifest latest in
          quay.io/openshift/local-must-gather: unauthorized: access to the requested
          resource is not authorized'
        reason: ErrImagePull

At first I thought this might be because I'm on a ppc64le cluster, but I can't find the image among the 287 listed at https://quay.io/organization/openshift.

Unable to deploy local-storage

There still seems to be a mismatch somewhere with the versioning changes that were added; I thought #19 would sync that up.

But trying to run just now, I'm still seeing the same error when trying to create a new CR:

no matches for kind "LocalVolume" in version "local.storage.openshift.io/v1alpha1"
and it doesn't appear that the local-storage CRD is ever installed.

Pull failure using CSV

Trying to install/enable the local-storage operator reports:

Normal Pulling 8m15s (x4 over 9m52s) kubelet, ovirt-nbx5h-storage-0-4vc2h Pulling image "registry.redhat.io/openshift4/ose-local-storage-operator@sha256:9b09d75a9c6970f4d1e65fa6f6b9af69ec462d399cccb0796963417fdff3f1de"
Warning Failed 8m13s (x4 over 9m50s) kubelet, ovirt-nbx5h-storage-0-4vc2h Error: ErrImagePull
Warning Failed 8m13s kubelet, ovirt-nbx5h-storage-0-4vc2h Failed to pull image "registry.redhat.io/openshift4/ose-local-storage-operator@sha256:9b09d75a9c6970f4d1e65fa6f6b9af69ec462d399cccb0796963417fdff3f1de": rpc error: code = Unknown desc = Error reading manifest sha256:9b09d75a9c6970f4d1e65fa6f6b9af69ec462d399cccb0796963417fdff3f1de in registry.redhat.io/openshift4/ose-local-storage-operator: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<TITLE>Error</TITLE>\nAn error occurred while processing your request.

\nReference #132.b5e13217.1596825417.9166acf7\n\n"

Install via OLM fails on OKD 4.13

Trying to install via OLM on OKD 4.13 (presumably also OCP 4.13) seems to hit version issues.

Environment:

❯ oc get clusterversion
NAME      VERSION                          AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.okd-2022-12-18-054128   True        False         20h     Cluster version is 4.13.0-0.okd-2022-12-18-054128

❯ oc exec -n openshift-operator-lifecycle-manager olm-operator-69cc6d5cc4-p2gbw -- olm --version
OLM version: 0.19.0
git commit: b73f64b354a65d4629ac13323adf58e0b6ef29c8

Issue

Following the install instructions here: https://github.com/bshephar/local-storage-operator/blob/master/docs/deploy-with-olm.md

Results in the following error:

❯ oc logs localstorage-operator-manifests-dst5m
Error: open /etc/nsswitch.conf: permission denied
Usage:
  opm registry serve [flags]

Flags:
  -d, --database string          relative path to sqlite db (default "bundles.db")
      --debug                    enable debug logging
  -h, --help                     help for serve
  -p, --port string              port number to serve on (default "50051")
      --skip-migrate             do  not attempt to migrate to the latest db revision when starting
  -t, --termination-log string   path to a container termination log file (default "/dev/termination-log")
      --timeout-seconds string   Timeout in seconds. This flag will be removed later. (default "infinite")

Global Flags:
      --skip-tls   skip TLS certificate verification for container image registries while pulling bundles or index

This appears to be related to the version of OLM in-use.

The container it's using by default is:

❯ oc get po -o json | jq '.items[].spec.containers[].image'
"quay.io/gnufied/gnufied-index:1.0.0"

Trying different gnufied versions

This is quite old looking at: https://quay.io/repository/gnufied/gnufied-index?tab=tags

So I thought maybe it's just that old version. Trying with the latest:
quay.io/gnufied/gnufied-index@sha256:5c76a091fde6fdf5ccad0ee109f1da81b435e1d3634a3f751dce64fd3db04a45

This gives a new permission related error:

❯ oc logs localstorage-operator-manifests-b4kj4
time="2022-12-22T01:28:59Z" level=warning msg="\x1b[1;33mDEPRECATION NOTICE:\nSqlite-based catalogs and their related subcommands are deprecated. Support for\nthem will be removed in a future release. Please migrate your catalog workflows\nto the new file-based catalog format.\x1b[0m"
Error: open db-895998666: permission denied
Usage:
  opm registry serve [flags]

Flags:
  -d, --database string          relative path to sqlite db (default "bundles.db")
      --debug                    enable debug logging
  -h, --help                     help for serve
  -p, --port string              port number to serve on (default "50051")
      --skip-migrate             do  not attempt to migrate to the latest db revision when starting
  -t, --termination-log string   path to a container termination log file (default "/dev/termination-log")
      --timeout-seconds string   Timeout in seconds. This flag will be removed later. (default "infinite")

Global Flags:
      --skip-tls-verify   skip TLS certificate verification for container image registries while pulling bundles
      --use-http          use plain HTTP for container image registries while pulling bundles

Question:

Do we need to add some additional RBAC rules for local-storage-operator to work with 4.13, or do we need to rebuild images with some additional changes?

Unable to use LVM volumes

With the following configuration:

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: "lvm-disks"
namespace: "local-storage"
spec:
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker-1
storageClassDevices:
- storageClassName: "fast-lvm"
volumeMode: Filesystem
fsType: xfs
devicePaths:
- /dev/mapper/storagepool-first

I'm getting

E1119 10:23:47.808238 1 diskmaker.go:164] found empty matching device list

Changing the devicePaths to /dev/dm-0 or /dev/disk/by-id/dm-name-storagepool-first does not work either.
Is LVM not supported in the local-storage-operator?

local-storage-operator needs POD_NAME environment variable

Using https://github.com/openshift/local-storage-operator/blob/master/examples/olm/catalog-create-subscribe.yaml (with the channel fixed to "4.5", because 4.4 is not available), I hit the problem that local-storage-operator errors out because the POD_NAME environment variable is not set.

After setting it via the downward API, it at least starts.
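
For reference, a minimal sketch of the downward-API entry, assuming the operator Deployment is named local-storage-operator in the subscription's namespace; note that OLM may revert direct Deployment edits, in which case the entry has to be added to the CSV instead:

# assumes the container already defines an env list
oc -n local-storage patch deployment local-storage-operator --type=json -p '[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/env/-",
   "value": {"name": "POD_NAME",
             "valueFrom": {"fieldRef": {"fieldPath": "metadata.name"}}}}
]'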

It is still reporting the following error, though:

Could not create metrics Sevice","error":"failed to create or get service for metrics: services \"local-storage-operator-metrics\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}

storageclass doesn't get created on OCP 4.4

Steps taken:

1. oc create -f https://raw.githubusercontent.com/openshift/local-storage-operator/master/examples/olm/catalog-create-subscribe.yaml

2. [kni@provisionhost-0 ~]$ cat << EOF | oc create -f -
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-0
        - worker-1
  storageClassDevices:
  - storageClassName: "local-sc"
    volumeMode: Filesystem
    fsType: xfs
    devicePaths:
    - /dev/sdb
EOF

localvolume.local.storage.openshift.io/local-disks created

oc get sc
No resources found in default namespace.

Expected to see:

oc get sc
NAME       PROVISIONER                    AGE
local-sc   kubernetes.io/no-provisioner   8h

This works fine on OCP 4.3.

Ability to specify default storage class

I would like the ability to set a field in the CR to make the created storage class the default. I can patch it in after the SC is created, of course, but being able to do it from the CR would be nice.
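
For reference, the patch mentioned above as a sketch, assuming the operator-created storage class is named local-sc:

oc patch storageclass local-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'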

Bubble up errors in diskmaker as events on the CR

If there is an error in the diskmaker pod, you currently have to go look at its logs, which is not very user friendly. We should bubble up the errors (at least non-recoverable ones) from diskmaker as events on the LocalVolume CR.

Clean-up of path "/mnt/local-storage/" needs to be done by LSO after Local volume is deleted

I am trying to create local volumes using LSO in an OCP 4.3 cluster. I have a local SSD disk (/dev/sdc) on my node and created 2 partitions on it, /dev/sdc1 and /dev/sdc2. I need to use sdc1 with volumeMode: Filesystem and sdc2 with volumeMode: Block. Pasting the YAML files for reference. The Filesystem mode works fine, but the Block mode creates a local volume even for the sdc1 device, although I specified only sdc2 in the YAML.

Local Block volume:

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-block
  namespace: local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: cluster.ocs.openshift.io/openshift-storage
          operator: In
          values:
          - worker
  storageClassDevices:
    - storageClassName: localblock
      volumeMode: Block
      devicePaths:
        - /dev/sdc2

Local File volume:

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-file
  namespace: local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: cluster.ocs.openshift.io/openshift-storage
          operator: In
          values:
          - ""
  storageClassDevices:
    - storageClassName: localfile
      fsType: ext4
      volumeMode: Filesystem
      devicePaths:
        - /dev/sdc1

RBAC error when creating local volume

Just deployed the local-storage-operator on a three-node, master-only virtualized cluster created by dev-scripts. Attached a qcow2 image as block device sdf to master-0. I then created a local volume. This resulted in localvolume/elastic being created, but with the following error:

error syncing local storage: error applying pv cluster role binding local-storage-provisioner-pv-binding: clusterrolebindings.rbac.authorization.k8s.io "local-storage-provisioner-pv-binding" is forbidden: user "system:serviceaccount:local-storage:local-storage-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:local-storage" "system:authenticated"]) is attempting to grant RBAC permissions not currently held: {APIGroups:["events.k8s.io"], Resources:["events"], Verbs:["create" "patch" "update"]}

This is what's visible in the event stream: https://i.imgur.com/6CaEcSw.png

All the yaml files used are in this gist: https://gist.github.com/brainfunked/8cc3609c6845bf4829e03b4f7d497de4
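
A hedged workaround sketch: grant the operator's service account (named in the error above) the events permissions it is trying to delegate, so the RBAC escalation check passes. The proper fix presumably belongs in the operator's bundled RBAC:

oc apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-storage-operator-events
rules:
- apiGroups: ["events.k8s.io"]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-storage-operator-events
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-storage-operator-events
subjects:
- kind: ServiceAccount
  name: local-storage-operator
  namespace: local-storage
EOF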

occasionally volumes for non-existent nodes are not removed

I have a cluster in which I have created and destroyed machines selected by the local storage operator. I ended up in a situation in which some volumes are available on nodes that do not exist, for example:

# oc --context cluster1 get pv local-pv-8cfbb08f -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-ip-10-0-154-68.ec2.internal-8e2ed93b-6789-4e52-a925-5c075b926c6c
  creationTimestamp: "2021-06-27T15:39:40Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    storage.openshift.com/local-volume-owner-name: local-disks
    storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
  name: local-pv-8cfbb08f
  resourceVersion: "3018778"
  selfLink: /api/v1/persistentvolumes/local-pv-8cfbb08f
  uid: 8a4bf8b1-09af-47de-aba1-e0f57bf8afa7
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1844Gi
  local:
    fsType: xfs
    path: /mnt/local-storage/local-sc/nvme-Amazon_EC2_NVMe_Instance_Storage_AWSC674E5E529ABDB903
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - ip-10-0-154-68
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-sc
  volumeMode: Filesystem
status:
  phase: Available

So this volume is supposedly available on node ip-10-0-154-68, but here are the nodes I have:

oc --context cluster1 get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-129-211.ec2.internal   Ready    master   4d18h   v1.20.0+2817867
ip-10-0-141-135.ec2.internal   Ready    worker   20h     v1.20.0+2817867
ip-10-0-143-43.ec2.internal    Ready    worker   4d17h   v1.20.0+2817867
ip-10-0-151-14.ec2.internal    Ready    master   4d18h   v1.20.0+2817867
ip-10-0-158-68.ec2.internal    Ready    worker   4d17h   v1.20.0+2817867
ip-10-0-159-55.ec2.internal    Ready    worker   20h     v1.20.0+2817867
ip-10-0-162-236.ec2.internal   Ready    master   4d18h   v1.20.0+2817867
ip-10-0-170-241.ec2.internal   Ready    worker   4d17h   v1.20.0+2817867
ip-10-0-170-94.ec2.internal    Ready    worker   20h     v1.20.0+2817867
ip-10-0-59-172.ec2.internal    Ready    worker   4d17h   v1.20.0+2817867
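
A possible manual cleanup, assuming the node is permanently gone and nothing still claims the volume:

oc --context cluster1 delete pv local-pv-8cfbb08f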

Create LocalVolume with persistentVolumeReclaimPolicy: Retain

Hi there,

My apologies if I am posting this query to the wrong forum.

I need to create a LocalVolume with persistentVolumeReclaimPolicy: Retain.

The StorageClass and PersistentVolume created by the LocalVolume default to a reclaimPolicy of Delete, and I need to override it.

Something like this:

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: noi-local-storage-observer-worker-0
namespace: local-storage
labels:
release: noi
spec:
persistentVolumeReclaimPolicy: Retain
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker-0
storageClassDevices:

  • devicePaths:
    • /dev/tnca/observer
      fsType: xfs
      storageClassName: local-storage-observer
      volumeMode: Filesystem
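
As far as I know, LocalVolume does not expose persistentVolumeReclaimPolicy, so the field above would be ignored; a possible workaround is to patch the PVs the operator creates (PV name is a placeholder):

oc patch pv <local-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'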

Thank you

[RFE] Automate or document cleanup.

Current:
AFAIK, there is no documentation or automation for cleaning up symlinks on nodes.
Creating a LocalVolume that reuses a previous StorageClass name after reinstalling LSO will result in old disks (now referenced by no LocalVolume) being provisioned.

Expected:

There should be documentation or automation for cleaning symlinks on nodes.

Opening because I've seen multiple people run into this issue recently.
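
Until such automation or documentation exists, a manual cleanup sketch along the lines of the workaround reported in the first issue above (delete the owning LocalVolume/LocalVolumeSet and any leftover PVs first; directory names are examples):

> oc debug nodes/<node-name>
> chroot /host
> ls /mnt/local-storage/
## remove only directories belonging to storage classes that no longer exist
> rm -rf /mnt/local-storage/<old-storage-class>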

ownNamespace installMode causing conflicts

Hello,

We have an operator that depends on both cert-manager and local-storage. This causes a conflict because cert-manager uses the AllNamespaces installMode, whereas local-storage uses OwnNamespace.
According to some posts online, the goal is to move toward singleton operators to avoid issues like these. Does it make sense to update the CSV to use AllNamespaces going forward to avoid such conflicts?

Sources:
https://groups.google.com/g/operator-framework/c/0KxBa-caG1U
operator-framework/operator-lifecycle-manager#1790

Include additional device metadata in the discovery result

For rook/ceph, there are several other fields we'd like to see for each device:

  • vendor/model/serial. Perhaps just including all of the udevadm info ... fields, or a subset of those fields, would be the simplest (ID_*, DEV*, etc.)
  • device path on the bus, e.g., the DEVPATH udev field
  • ssd vs hdd
  • availability of SES services; whether the ident/fault LEDs can be controlled; current LED status

Warning `/dev/disk/by-id/scsi-...-scsi2 was defined in devicePaths, but expected a path in /dev/` since update to `4.9.0-202211280956`

Since updating to 4.9.0-202211280956 we see the following warning events:

Generated from localvolume-symlink-controller
w02 - /dev/disk/by-id/scsi-...-scsi2 was defined in devicePaths, but expected a path in /dev/

The warning does not make sense to me since this is clearly a path in /dev/. Here is the relevant localvolume:

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  creationTimestamp: "2021-08-18T13:26:48Z"
  finalizers:
  - storage.openshift.com/local-volume-protection
  generation: 1
  name: my-local-storage
  namespace: openshift-local-storage
spec:
  logLevel: Normal
  managementState: Managed
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - w02
  storageClassDevices:
  - devicePaths:
    - /dev/disk/by-id/scsi-...-scsi2
    fsType: xfs
    storageClassName: my-local-storage
    volumeMode: Filesystem
status:
  conditions:
  - lastTransitionTime: "2022-01-22T18:02:47Z"
    message: Ready
    status: "True"
    type: Available
  generations:
  - group: apps
    lastGeneration: 21
    name: diskmaker-manager
    namespace: openshift-local-storage
    resource: DaemonSet
  managementState: Managed
  observedGeneration: 1
  readyReplicas: 0

unable to run localvolumediscovery on vanilla k8s

I'm running a 2-node cluster on minikube and deploy the LSO with:
kubectl apply -k config/default

To run discovery I do:
kubectl apply -k config/samples/local_v1alpha1_localvolumediscovery.yaml

The localvolumediscovery CR is created but no diskmaker-discovery pod is created and so no results are created either.

Here is what the operator logs look like:

2021-07-22T19:11:23.006Z	INFO	controllers.LocalVolumeDiscovery	Reconciling LocalVolumeDiscovery	{"Request.Namespace": "default", "Request.Name": "auto-discover-devices"}
time="2021-07-22T19:11:23Z" level=info msg="Reconciling metrics exporter serviceNamespacedNamedefault/local-storage-discovery-metrics" source="exporter.go:100"
time="2021-07-22T19:11:23Z" level=info msg="creating service monitorNamespacedNamedefault/local-storage-discovery-metrics" source="exporter.go:126"
2021-07-22T19:11:25.071Z	ERROR	controllers.LocalVolumeDiscovery	failed to create service and servicemonitors	{"Request.Namespace": "default", "Request.Name": "auto-discover-devices", "object": "auto-discover-devices", "error": "failed to enable service monitor. failed to retrieve servicemonitor default/local-storage-discovery-metrics. no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-07-22T19:11:25.071Z	ERROR	controller-runtime.manager.controller.localvolumediscovery	Reconciler error	{"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolumeDiscovery", "name": "auto-discover-devices", "namespace": "default", "error": "failed to enable service monitor. failed to retrieve servicemonitor default/local-storage-discovery-metrics. no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
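
The reconcile is failing because the monitoring.coreos.com/v1 CRDs (ServiceMonitor) are not installed on a vanilla minikube cluster. One way to get them, assuming you are fine installing the upstream Prometheus Operator bundle (URL taken from the prometheus-operator repository; pin a release as appropriate):

kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml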

Typo for the link to the official OpenShift documentation

There is a typo in the link to the official OpenShift documentation; the link does not work correctly.
Just replace '-' with an underscore '_' in the file local-storage-operator/docs/deploy-with-olm.md:

https://docs.openshift.com/container-platform/latest/storage/persistent-storage/persistent-storage-local.html ==>
https://docs.openshift.com/container-platform/latest/storage/persistent_storage/persistent-storage-local.html

For me it would make more sense to use '-', so let's discuss with the OpenShift Docs team what separator to use :). It looks like they use underscores for directories, so to make the link work now it should be changed to an underscore.

"required env POD_NAME not set, please configure downward API"

Environment:
OKD 4.5.0-0.okd-2020-07-12-134038-rc
which comes with OLM 0.15

Steps to reproduce:
Start with fresh OKD environment (3 masters, 3 compute).
Install operator with oc apply -f https://raw.githubusercontent.com/openshift/local-storage-operator/master/examples/olm/catalog-create-subscribe.yaml

Expected result:
local-storage-operator pod starts normally and is usable after rollout.

Actual result:
local-storage-operator pod is in a crash loop with the following log output:
{"level":"info","ts":1594901910.1498752,"logger":"cmd","msg":"Operator Version: 0.0.1"} {"level":"info","ts":1594901910.1499276,"logger":"cmd","msg":"Go Version: go1.13.5"} {"level":"info","ts":1594901910.1499577,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"} {"level":"info","ts":1594901910.1499753,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"} {"level":"info","ts":1594901910.1504426,"logger":"leader","msg":"Trying to become the leader."} {"level":"error","ts":1594901912.5641682,"logger":"cmd","msg":"","error":"required env POD_NAME not set, please configure downward API","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/local-storage-operator/cmd/manager/main.go:91\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

What am I missing here?
