Operator for local storage
Instructions to deploy on OCP >= 4.2 using OLM can be found here
Instructions for using the local storage must-gather image can be found here
License: Apache License 2.0
When creating a new LocalVolumeSet (lvset) named nvme,
LSO fails to create PVs, but no error is reported in the UI or in the logs.
More details: https://access.redhat.com/support/cases/#/case/03191850
The log shows:
2022-04-08T14:20:10.179Z INFO localvolumeset-symlink-controller provisioning succeeded {"Request.Namespace": "openshift-local-storage", "Request.Name": "nvme", "Device.Name": "nvme2n1"}
2022-04-08T14:20:10.179Z INFO localvolumeset-symlink-controller total devices provisioned {"Request.Namespace": "openshift-local-storage", "Request.Name": "nvme", "count": 4, "storageClass.Name": "nvme"}
On the node, in /mnt/local-storage/:
sh-4.4# ls
nvme sc-ceph storage.class <-- none of these should be here except for nvme
sh-4.4# ls -alh
total 0
drwxr-xr-x. 5 root root 54 Apr 6 22:52 .
drwxr-xr-x. 3 root root 27 Mar 30 19:10 ..
drwxr-xr-x. 2 root root 6 Apr 6 22:52 nvme
drwxr-xr-x. 2 root root 6 Apr 6 13:14 sc-ceph
drwxr-xr-x. 2 root root 196 Apr 1 14:06 storage.class
sh-4.4# ls -alh nvme/ storage.class/
nvme/:
total 0
drwxr-xr-x. 2 root root 6 Apr 6 22:52 .
drwxr-xr-x. 6 root root 65 Apr 8 14:36 ..
storage.class/:
total 0
drwxr-xr-x. 2 root root 196 Apr 1 14:06 .
drwxr-xr-x. 6 root root 65 Apr 8 14:36 ..
lrwxrwxrwx. 1 root root 46 Apr 1 14:06 ata-VR000240GXBBL_2026292D4F9A -> /dev/disk/by-id/ata-VR000240GXBBL_2026292D4F9A
lrwxrwxrwx. 1 root root 46 Mar 30 19:19 nvme-KCD6XLUL3T84_4170A00HTDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_4170A00HTDT8
lrwxrwxrwx. 1 root root 46 Mar 30 19:19 nvme-KCD6XLUL3T84_4170A00RTDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_4170A00RTDT8
lrwxrwxrwx. 1 root root 46 Mar 30 19:19 nvme-KCD6XLUL3T84_4170A00WTDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_4170A00WTDT8
lrwxrwxrwx. 1 root root 46 Mar 30 19:19 nvme-KCD6XLUL3T84_41E0A029TDT8 -> /dev/disk/by-id/nvme-KCD6XLUL3T84_41E0A029TDT8
sh-4.4# rm -rf *
The problem seems to be the old, uncleaned storage classes or symlinks in the local-storage folder (I don't know why these are here in the first place); the workaround is to clean up/delete the unwanted files:
> oc debug nodes/node1
> chroot /host
> ls /mnt/local-storage/
## if you notice any garbage or old storage classes, delete them
> rm -rf nvme sc-ceph storage.class
and then recreate the lvset workers-nvme:
sh-4.4# cd workers-nvme/
sh-4.4# ls
nvme-KCD6XLUL3T84_4170A001TDT8 nvme-KCD6XLUL3T84_4170A007TDT8 nvme-KCD6XLUL3T84_4170A00DTDT8 nvme-KCD6XLUL3T84_4170A00MTDT8
sh-4.4# exit
> oc get pv | grep local
local-pv-3238eb37 3576Gi RWO Delete Available workers-nvme 7m50s
local-pv-57fc01e 3576Gi RWO Delete Available workers-nvme 7m50s
local-pv-672f73bd 3576Gi RWO Delete Available workers-nvme 7m50s
local-pv-6a74daf3 3576Gi RWO Delete Available workers-nvme 7m50s
local-pv-8d6bb503 3576Gi RWO Delete Available workers-nvme 7m50s
local-pv-8f2ffb78 3576Gi RWO Delete Available workers-nvme 7m50s
local-pv-de76809a 3576Gi RWO Delete Available workers-nvme 7m50s
local-pv-f93b075d 3576Gi RWO Delete Available workers-nvme 7m50s
Prometheus cannot scrape metrics from the local-storage-operator pod after upgrading to OCP 4.9
"lastError": "Get "http://:8383/metrics": dial tcp :8383: connect: connection refused",
"lastError": "Get "http://:8686/metrics": dial tcp :8686: connect: connection refused",
Checking the config, I can verify the IP address is exactly the one where Prometheus cannot connect:
local-storage-operator-76f878db87-qngn4 1/1 Running 0 11h
The serviceMonitor is showing:
The service is showing:
spec:
clusterIP:
clusterIPs:
-
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- name: http-metrics
port: 8383
protocol: TCP
targetPort: 8383
- name: cr-metrics
port: 8686
protocol: TCP
targetPort: 8686
selector:
name: local-storage-operator
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
But the pod is not listening at all on those ports: 8383 / 8686.
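For reference, a quick external check that reproduces this (a sketch; pod name as listed above):
oc -n openshift-local-storage port-forward pod/local-storage-operator-76f878db87-qngn4 8383:8383 8686:8686 &
# each curl fails with "connection refused" while the pod is not listening
curl -sf http://localhost:8383/metrics >/dev/null || echo "nothing listening on 8383"
curl -sf http://localhost:8686/metrics >/dev/null || echo "nothing listening on 8686"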
Thanks.
local-storage-operator.v4.12.0-202305101515 (channel stable).
Create a LocalVolume:
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
name: infra-1-sdb-prom-core
namespace: openshift-local-storage
spec:
logLevel: Normal
managementState: Managed
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- cat-fch8-infra-1
storageClassDevices:
- devicePaths:
- /dev/sdb
fsType: xfs
storageClassName: local-storage-prom-core
volumeMode: Filesystem
tolerations:
- effect: NoExecute
operator: Exists
key: node-role.kubernetes.io/infra
Create a PVC (the cluster-monitoring config in OpenShift will do it for us; we just specify the storageClass)...
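For context, the monitoring ConfigMap that triggers the PVC creation looks roughly like this (a sketch; only the storageClassName and size are specific to our setup):
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage-prom-core
          resources:
            requests:
              storage: 15Gi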
All is working perfectly.
We migrated our control-plane and infra node VMs in our Nutanix cluster from site A to site B, with the VMs powered on.
The migration changed the UUID/SERIAL of the attached disks.
After a few minutes, everything on the cluster was working except the monitoring stack with its local-storage PVC:
There are 2 symlinks; the first is linked to an ID (disk serial number) which no longer exists on the host. It blinks red.
[core@cat-fch8-infra-1 ~]$ ll /mnt/local-storage/local-storage-prom-core/
total 0
lrwxrwxrwx. 1 root root 80 Aug 25 17:03 scsi-1NUTANIX_NFS_3_0_20023_ee3e4928_d368_4a67_b2dd_d18fdbf99650 -> /dev/disk/by-id/scsi-1NUTANIX_NFS_3_0_20023_ee3e4928_d368_4a67_b2dd_d18fdbf99650
lrwxrwxrwx. 1 root root 79 Sep 11 12:48 scsi-1NUTANIX_NFS_3_0_7705_f084d7d7_958c_4756_9dc1_6298abbf942e -> /dev/disk/by-id/scsi-1NUTANIX_NFS_3_0_7705_f084d7d7_958c_4756_9dc1_6298abbf942e
The disk sdb is present on the host ...
[core@cat-fch8-infra-1 ~]$ lsblk -o NAME,SERIAL
NAME SERIAL
sda NFS_3_0_7690_b9a6cf2e_fc17_4c0e_896e_ef9df859e41a
├─sda1
├─sda2
├─sda3
└─sda4
sdb NFS_3_0_7705_f084d7d7_958c_4756_9dc1_6298abbf942e
sr0 QM00001
... But there is no disk mounted on the VM:
[core@cat-fch8-infra-1 ~]$ sudo df -h | grep local
We now have 2 PVs instead of 1. The new one is in "Available" state:
openshift@cat-fch8-bastion ~]$ oc get pv -l storage.openshift.com/owner-name=infra-1-sdb-prom-core
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-7c13822d 15Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0 local-storage-prom-core 16d
local-pv-812b91bb 15Gi RWO Delete Available local-storage-prom-core 3h13m
Expected: Prometheus pods are running, their PVC (from openshift-monitoring) is linked to a PV which is linked to the disk mounted as sdb on the host, and the local-storage operator manages the ID/SERIAL change of the disk.
Regarding this issue: OCS-Operator 104 issue
It still happens, could you please take a look?
We are trying to deploy the local-storage-operator on OKD 4.5 using the operator (https://github.com/openshift/local-storage-operator/blob/master/examples/olm/catalog-create-subscribe.yaml) provided in this repository. We managed to install the operator and create a LocalVolumeDiscovery. But the resource LocalVolumeDiscoveryResult stays empty. When we consult the logs of the diskmaker-discovery-xyz pods, we observe the following log entries:
I1126 15:11:15.003791 1 event.go:255] Event(v1.ObjectReference{Kind:"LocalVolumeDiscovery", Namespace:"openshift-local-storage", Name:"auto-discover-devices", UID:"bf8aa84b-5bab-4e85-8628-daa49ce90e61", APIVersion:"local.storage.openshift.io/v1alpha1", ResourceVersion:"84114511", FieldPath:""}): type: 'Warning' reason: 'ErrorUpdatingDiscoveryResultObject' c-0011.host.name - failed to update LocalVolumeDiscoveryResult status. Error: LocalVolumeDiscoveryResult.local.storage.openshift.io "discovery-result-c-0011.host.name" is invalid: status.discoveredDevices.size: Invalid value: "integer": status.discoveredDevices.size in body must be of type string: "integer"
failed to update the device status in the LocalVolumeDiscoveryResult resource
This seems to indicate that diskmaker-discovery tries to update the resource with a data type that the API does not expect.
Client Version: 4.5.0-0.okd-2020-08-12-020541
Server Version: 4.5.0-0.okd-2020-10-15-235428
Kubernetes Version: v1.18.3
Is this a bug, or are we doing something wrong?
Please add gofmt / go vet (/ golint?) checks to the unit prow job to make sure the code looks sane.
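For reference, a minimal sketch of the checks being asked for (plain commands; the actual prow job wiring is left as an assumption for the repo owners):
test -z "$(gofmt -l . | grep -v vendor)"   # fail if any file is not gofmt-clean
go vet ./...
golint ./...   # optional; note golint is deprecated upstream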
Is it possible to resize volumes provisioned with this operator?
I've tried the following:
echo 1 > /sys/class/block/sdb/device/rescan
on the node. The new size shows up with e.g. lsblk. I then updated:
- the PV: spec.capacity.storage=210Gi
- the PVC: spec.resources.requests.storage=210Gi
The PVC is stuck with the following event, even after restarting the pod:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ExternalExpanding 40m volume_expand Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
After the last automatic update, the local-provisioner DaemonSet crashloops with the following message:
oc logs -p local-nvmes-local-provisioner-pntrk
I0611 13:14:46.274851 1 common.go:320] StorageClass "local-nvme" configured with MountDir "/mnt/local-storage/local-nvme", HostDir "/mnt/local-storage/local-nvme", VolumeMode "Filesystem", FsType "xfs", BlockCleanerCommand ["/scripts/quick_reset.sh"]
I0611 13:14:46.274965 1 main.go:63] Loaded configuration: {StorageClassConfig:map[local-nvme:{HostDir:/mnt/local-storage/local-nvme MountDir:/mnt/local-storage/local-nvme BlockCleanerCommand:[/scripts/quick_reset.sh] VolumeMode:Filesystem FsType:xfs}] NodeLabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Duration:5m0s} UseNodeNameOnly:false LabelsForPV:map[storage.openshift.com/local-volume-owner-name:local-nvmes storage.openshift.com/local-volume-owner-namespace:local-storage]}
I0611 13:14:46.274990 1 main.go:64] Ready to run...
W0611 13:14:46.274996 1 main.go:73] MY_NAMESPACE environment variable not set, will be set to default.
W0611 13:14:46.275004 1 main.go:79] JOB_CONTAINER_IMAGE environment variable not set.
I0611 13:14:46.275288 1 common.go:382] Creating client using in-cluster config
I0611 13:14:46.282826 1 main.go:126] Could not get node information (remaining retries: 2): nodes "node-006.schaeffer-ag.de" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope
I0611 13:14:47.283826 1 main.go:126] Could not get node information (remaining retries: 1): nodes "<nodename>" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope
I0611 13:14:48.284873 1 main.go:126] Could not get node information (remaining retries: 0): nodes "<nodename>" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope
F0611 13:14:48.284900 1 main.go:129] Could not get node information: nodes "<nodename>" is forbidden: User "system:serviceaccount:local-storage:local-storage-admin" cannot get resource "nodes" in API group "" at the cluster scope
It is using quay.io/openshift/origin-local-storage-static-provisioner:latest, which corresponds to ID 9f53bcaa098060147bf0b69c4dccc288aae2dcef38caad4fe9504d8558e0dac3 and digest sha256:421ea9b9117615bd68eadee74f3f5be64fc1fdfe2050d2309141820fbbb26875.
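Until the operator ships the right RBAC again, a stop-gap is to grant the service account from the error message read access to nodes; a minimal sketch (object names here are illustrative, not shipped by the operator):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-storage-admin-node-reader   # illustrative name
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-storage-admin-node-reader   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-storage-admin-node-reader
subjects:
- kind: ServiceAccount
  name: local-storage-admin      # from the error message above
  namespace: local-storage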
Hello everyone! ✋
I'm very interested in using local storage in Kubernetes for applications like distributed databases. I am currently developing an operator for Cassandra and ScyllaDB with Rook, and both of those databases would benefit greatly from using local storage.
I am wondering what's the difference between this project and local volume provisioner. I would love to know the scope of the project and see if it aligns with my goals, so that I can also contribute to it. 😄
Now that we are fully branched for 4.5, please prepare your operator to supply a 4.5 bundle, so that 4.5 operator publishing works and doesn't overwrite 4.4 bundles. This means at least updating the package.yaml under https://github.com/openshift/local-storage-operator/blob/master/manifests/local-storage-operator.package.yaml
Reference: Get OLM operator owners to update their CSV channels
Where can I find the documentation for this? I am looking to understand how to use this to create PV or PV sets in filesystem volume mode without coding.
The following link mentioned in the internal doc is broken:
"
Warning: These docs are for internal development and testing. Use https://docs.openshift.com/container-platform/latest/storage/persistent_storage/persistent-storage-local.html docs for installation on OCP
"
After experimenting with the GUI, I have successfully created block mode PV sets by following these steps:
However, if I format the disk with mkfs.xfs /dev/vdb, the aforementioned procedure ceases to function.
It would be great to have a working operator on OperatorHub for non-Red Hat customers.
Hi,
We're using the local-storage-operator to discover local volumes which will be consumed by Ceph via the OCS operator. This all works perfectly.
Now when a user tries to add a volume to their pod, they see all available StorageClasses, including the storage class in which the Ceph volumes are offered. Selecting this storage class doesn't work for a user, and we only want to offer dynamic persistent storage offerings.
Is there a way to hide the storage class? We want to hide the ceph-hdd storage class shown in the screenshot.
On OCP 4.6.1, I'm trying to debug an issue with the operator, and I can't get local-must-gather working.
From the pod created by oc adm must-gather --image=quay.io/openshift/local-must-gather:latest :
state:
waiting:
message: 'rpc error: code = Unknown desc = Error reading manifest latest in
quay.io/openshift/local-must-gather: unauthorized: access to the requested
resource is not authorized'
reason: ErrImagePull
At first I thought this might be because I'm on a ppc64le cluster, but I can't find the image among the 287 listed at https://quay.io/organization/openshift .
There still seems to be a mismatch somewhere with the versioning changes that were added; I'm not sure, as I thought that #19 would sync that up.
But trying to run just now, I'm still seeing the same error when trying to create a new CR:
no matches for kind "LocalVolume" in version "local.storage.openshift.io/v1alpha1"
and it doesn't appear that the local-storage CRD is ever installed.
local-storage-operator/manifests/4.3/local-storage-operator.v4.3.0.clusterserviceversion.yaml does not exist as defined in art.yaml.
Trying to install/enable the local-storage operator reports:
Normal Pulling 8m15s (x4 over 9m52s) kubelet, ovirt-nbx5h-storage-0-4vc2h Pulling image "registry.redhat.io/openshift4/ose-local-storage-operator@sha256:9b09d75a9c6970f4d1e65fa6f6b9af69ec462d399cccb0796963417fdff3f1de"
Warning Failed 8m13s (x4 over 9m50s) kubelet, ovirt-nbx5h-storage-0-4vc2h Error: ErrImagePull
Warning Failed 8m13s kubelet, ovirt-nbx5h-storage-0-4vc2h Failed to pull image "registry.redhat.io/openshift4/ose-local-storage-operator@sha256:9b09d75a9c6970f4d1e65fa6f6b9af69ec462d399cccb0796963417fdff3f1de": rpc error: code = Unknown desc = Error reading manifest sha256:9b09d75a9c6970f4d1e65fa6f6b9af69ec462d399cccb0796963417fdff3f1de in registry.redhat.io/openshift4/ose-local-storage-operator: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<TITLE>Error</TITLE>\nAn error occurred while processing your request.
\nReference #132.b5e13217.1596825417.9166acf7\n\n"
It would be great to support LVM-based LocalVolumeSet volumes. It would need a way to set the volume group to use.
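A hypothetical shape for such an API, purely for illustration (the volumeGroup field below does not exist in the operator today):
apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeSet
metadata:
  name: lvm-backed
  namespace: openshift-local-storage
spec:
  storageClassName: lvm-local
  volumeMode: Block
  volumeGroup: storagepool   # hypothetical field: carve PVs from this LVM VG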
https://github.com/openshift/local-storage-operator/blob/master/pkg/apis/local/v1alpha1/types.go#L44 is duplicating the persistent volume mode defined in corev1, which is sort of weird since corev1 is being imported in that file.
There may be a reason for this duplication, but it seemed weird to me.
Trying to install via OLM on OKD 4.13 (presumably also OCP 4.13) seems to hit version issues.
❯ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.0-0.okd-2022-12-18-054128 True False 20h Cluster version is 4.13.0-0.okd-2022-12-18-054128
❯ oc exec -n openshift-operator-lifecycle-manager olm-operator-69cc6d5cc4-p2gbw -- olm --version
OLM version: 0.19.0
git commit: b73f64b354a65d4629ac13323adf58e0b6ef29c8
Following the install instructions here: https://github.com/bshephar/local-storage-operator/blob/master/docs/deploy-with-olm.md
Results in the following error:
❯ oc logs localstorage-operator-manifests-dst5m
Error: open /etc/nsswitch.conf: permission denied
Usage:
opm registry serve [flags]
Flags:
-d, --database string relative path to sqlite db (default "bundles.db")
--debug enable debug logging
-h, --help help for serve
-p, --port string port number to serve on (default "50051")
--skip-migrate do not attempt to migrate to the latest db revision when starting
-t, --termination-log string path to a container termination log file (default "/dev/termination-log")
--timeout-seconds string Timeout in seconds. This flag will be removed later. (default "infinite")
Global Flags:
--skip-tls skip TLS certificate verification for container image registries while pulling bundles or index
This appears to be related to the version of OLM in use.
The container it's using by default is:
❯ oc get po -o json | jq '.items[].spec.containers[].image'
"quay.io/gnufied/gnufied-index:1.0.0"
This is quite old, judging by https://quay.io/repository/gnufied/gnufied-index?tab=tags
So I thought maybe it's just that old version. Trying with the latest:
quay.io/gnufied/gnufied-index@sha256:5c76a091fde6fdf5ccad0ee109f1da81b435e1d3634a3f751dce64fd3db04a45
This gives a new permission related error:
❯ oc logs localstorage-operator-manifests-b4kj4
time="2022-12-22T01:28:59Z" level=warning msg="\x1b[1;33mDEPRECATION NOTICE:\nSqlite-based catalogs and their related subcommands are deprecated. Support for\nthem will be removed in a future release. Please migrate your catalog workflows\nto the new file-based catalog format.\x1b[0m"
Error: open db-895998666: permission denied
Usage:
opm registry serve [flags]
Flags:
-d, --database string relative path to sqlite db (default "bundles.db")
--debug enable debug logging
-h, --help help for serve
-p, --port string port number to serve on (default "50051")
--skip-migrate do not attempt to migrate to the latest db revision when starting
-t, --termination-log string path to a container termination log file (default "/dev/termination-log")
--timeout-seconds string Timeout in seconds. This flag will be removed later. (default "infinite")
Global Flags:
--skip-tls-verify skip TLS certificate verification for container image registries while pulling bundles
--use-http use plain HTTP for container image registries while pulling bundles
Do we need to add some additional RBAC rules for local-storage-operator to work with 4.13, or do we need to rebuild images with some additional changes?
We need log levels and more granular control over logging. The operator should also respect the log level set in the CR.
With the following configuration:
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: "lvm-disks"
namespace: "local-storage"
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker-1
storageClassDevices:
- storageClassName: "fast-lvm"
volumeMode: Filesystem
fsType: xfs
devicePaths:
- /dev/mapper/storagepool-first
I'm getting
E1119 10:23:47.808238 1 diskmaker.go:164] found empty matching device list
Changing the devicePaths to /dev/dm-0 or /dev/disk/by-id/dm-name-storagepool-first does not work either.
Is LVM not supported in the local-storage-operator?
Using https://github.com/openshift/local-storage-operator/blob/master/examples/olm/catalog-create-subscribe.yaml (with the channel fixed to "4.5", because 4.4 is not available), I run into the problem that local-storage-operator errors out because it needs the POD_NAME environment variable.
Setting it with the downward API, it at least starts.
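For reference, the downward API wiring that satisfies the operator (a standard Kubernetes snippet, added to the operator Deployment's container spec):
env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name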
It is reporting the following error, though:
Could not create metrics Sevice","error":"failed to create or get service for metrics: services \"local-storage-operator-metrics\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
Tried to follow the instructions here:
https://github.com/openshift/local-storage-operator/blob/master/docs/deploy-with-olm.md
oc create -f create-cr.yaml fails with
error: unable to recognize "create-cr.yaml": no matches for kind "LocalVolume" in version "local.storage.openshift.io/v1"
The code that creates the top-level symlink directory /mnt/local-storage/<storageclassname> should be moved out of the second for loop, to avoid calling it multiple times when a node has multiple devices.
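A minimal sketch of the suggested restructuring (illustrative names only, not the actual operator code): create the directory once per storage class, then symlink each device inside the loop.
package diskmaker

import (
	"os"
	"path/filepath"
)

// linkDevices creates the per-storage-class directory once, then
// symlinks every matched device into it; previously the MkdirAll
// call sat inside the device loop.
func linkDevices(storageClassName string, devicePaths []string) error {
	symlinkDir := filepath.Join("/mnt/local-storage", storageClassName)
	// hoisted out of the loop: one MkdirAll per storage class
	if err := os.MkdirAll(symlinkDir, 0755); err != nil {
		return err
	}
	for _, dev := range devicePaths {
		link := filepath.Join(symlinkDir, filepath.Base(dev))
		if err := os.Symlink(dev, link); err != nil && !os.IsExist(err) {
			return err
		}
	}
	return nil
}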
https://github.com/openshift/local-storage-operator/tree/release-4.3/manifests
file not found: manifests/4.3/image-references
Unable to install operator. Install plan failed: api-server resource not found installing CustomResourceDefinition localvolumediscoveries.local.storage.openshift.io: GroupVersionKind apiextensions.k8s.io/v1beta1, Kind=CustomResourceDefinition not found on the cluster. This API may have been deprecated and removed, see https://kubernetes.io/docs/reference/using-api/deprecation-guide/ for more information.
Steps taken:
1. oc create -f https://raw.githubusercontent.com/openshift/local-storage-operator/master/examples/olm/catalog-create-subscribe.yaml
2. [kni@provisionhost-0 ~]$ cat << EOF | oc create -f -
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: "local-disks"
namespace: "local-storage"
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker-0
- worker-1
storageClassDevices:
- storageClassName: "local-sc"
volumeMode: Filesystem
fsType: xfs
devicePaths:
- /dev/sdb
EOF
localvolume.local.storage.openshift.io/local-disks created
oc get sc
No resources found in default namespace.
Expected to see:
oc get sc
NAME PROVISIONER AGE
local-sc kubernetes.io/no-provisioner 8h
This works fine on OCP4.3
I would like to have the ability to set a field in the CR to make the created storage class the default. I can patch it in after the SC is created, of course, but being able to do it from the CR would be nice.
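For reference, the post-creation patch mentioned above (assuming the generated class is named local-sc):
oc patch storageclass local-sc -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'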
It would be great to be able to specify storage devices with wildcards.
See e.g. https://rook.io/docs/rook/v1.3/ceph-cluster-crd.html#storage-selection-settings for how Rook allows matching devices.
If there is an error in the diskmaker pod, currently you have to go look at its logs, which is not very user friendly. We should bubble up the errors (at least the non-recoverable ones) from diskmaker as events on the LocalVolume CR.
I am trying to create a local volume using LSO on an OCP 4.3 cluster. I have a local SSD disk (/dev/sdc) on my node. I created 2 partitions on that disk, named /dev/sdc1 and /dev/sdc2. I need to use sdc1 with volumeMode: Filesystem and sdc2 with volumeMode: Block. Pasting the yaml files for reference. The file mode works fine, but block mode creates a local volume even for the sdc1 device, while I have specified only sdc2 in the yaml.
Local Block volume:
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
name: local-block
namespace: local-storage
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: cluster.ocs.openshift.io/openshift-storage
operator: In
values:
- worker
storageClassDevices:
- storageClassName: localblock
volumeMode: Block
devicePaths:
- /dev/sdc2
Local File volume:
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
name: local-file
namespace: local-storage
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: cluster.ocs.openshift.io/openshift-storage
operator: In
values:
- ""
storageClassDevices:
- storageClassName: localfile
fsType: ext4
volumeMode: Filesystem
devicePaths:
- /dev/sdc1
Just deployed the local-storage-operator on a three-node, master-only virtualized cluster created by dev-scripts. I attached a qcow2 image as block device sdf to master-0, then created a local volume. This resulted in localvolume/elastic being created, but with the following error:
error syncing local storage: error applying pv cluster role binding local-storage-provisioner-pv-binding: clusterrolebindings.rbac.authorization.k8s.io "local-storage-provisioner-pv-binding" is forbidden: user "system:serviceaccount:local-storage:local-storage-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:local-storage" "system:authenticated"]) is attempting to grant RBAC permissions not currently held: {APIGroups:["events.k8s.io"], Resources:["events"], Verbs:["create" "patch" "update"]}
This is what's visible in the event stream: https://i.imgur.com/6CaEcSw.png
All the yaml files used are in this gist: https://gist.github.com/brainfunked/8cc3609c6845bf4829e03b4f7d497de4
I have a cluster in which I have created and destroyed machines selected by the local storage operator. I ended up in a situation in which some volumes are available on nodes which no longer exist, for example:
# oc --context cluster1 get pv local-pv-8cfbb08f -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
annotations:
pv.kubernetes.io/provisioned-by: local-volume-provisioner-ip-10-0-154-68.ec2.internal-8e2ed93b-6789-4e52-a925-5c075b926c6c
creationTimestamp: "2021-06-27T15:39:40Z"
finalizers:
- kubernetes.io/pv-protection
labels:
storage.openshift.com/local-volume-owner-name: local-disks
storage.openshift.com/local-volume-owner-namespace: openshift-local-storage
name: local-pv-8cfbb08f
resourceVersion: "3018778"
selfLink: /api/v1/persistentvolumes/local-pv-8cfbb08f
uid: 8a4bf8b1-09af-47de-aba1-e0f57bf8afa7
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 1844Gi
local:
fsType: xfs
path: /mnt/local-storage/local-sc/nvme-Amazon_EC2_NVMe_Instance_Storage_AWSC674E5E529ABDB903
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- ip-10-0-154-68
persistentVolumeReclaimPolicy: Delete
storageClassName: local-sc
volumeMode: Filesystem
status:
phase: Available
So this volume is supposedly available on ip-10-0-154-68, but here are the nodes I have:
oc --context cluster1 get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-129-211.ec2.internal Ready master 4d18h v1.20.0+2817867
ip-10-0-141-135.ec2.internal Ready worker 20h v1.20.0+2817867
ip-10-0-143-43.ec2.internal Ready worker 4d17h v1.20.0+2817867
ip-10-0-151-14.ec2.internal Ready master 4d18h v1.20.0+2817867
ip-10-0-158-68.ec2.internal Ready worker 4d17h v1.20.0+2817867
ip-10-0-159-55.ec2.internal Ready worker 20h v1.20.0+2817867
ip-10-0-162-236.ec2.internal Ready master 4d18h v1.20.0+2817867
ip-10-0-170-241.ec2.internal Ready worker 4d17h v1.20.0+2817867
ip-10-0-170-94.ec2.internal Ready worker 20h v1.20.0+2817867
ip-10-0-59-172.ec2.internal Ready worker 4d17h v1.20.0+2817867
Hi there,
My apologies if I am posting the query to the wrong forum.
I need to create LocalStorage with persistentVolumeReclaimPolicy: Retain.
The StorageClass and PersistentVolume created by a LocalVolume default to a ReclaimPolicy of Delete, and I need to override it.
Something like this:
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: noi-local-storage-observer-worker-0
namespace: local-storage
labels:
release: noi
spec:
persistentVolumeReclaimPolicy: Retain
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker-0
storageClassDevices:
Thank you
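For what it's worth, a per-PV workaround exists in standard Kubernetes (not an LSO feature): patch the reclaim policy on each PV after it is created, e.g.
oc patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'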
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.17
release-4.18
For more information, see the branching documentation.
Current:
AFAIK, there is no documentation or automation for cleaning symlinks on nodes.
Creating a LocalVolume with a previous StorageClass name after reinstalling LSO will result in old disks (no longer referenced by any LocalVolume) being provisioned.
Expected:
There should be documentation or automation for cleaning symlinks on nodes.
Opening because I've seen multiple people run into this issue recently.
The local-storage-operator cannot be installed on OCP 4.1, but our docs[2] reference it with OCP >= 4.1, which is not correct. Thanks to @liangxia for spotting this.
Hello,
We have an operator that has dependencies on both cert-manager and local-storage. This causes a conflict because cert-manager uses the AllNamespaces installMode whereas local-storage uses OwnNamespace.
According to some posts online, the goal is to move toward singleton operators to avoid issues like these. Does it make sense to update the CSV to use AllNamespaces going forward to avoid such conflicts?
Sources:
https://groups.google.com/g/operator-framework/c/0KxBa-caG1U
operator-framework/operator-lifecycle-manager#1790
For rook/ceph, there are several other fields we'd like to see for each device:
- udevadm info fields, or a subset of those fields, would be the simplest (ID_*, DEV*, etc.)
- the DEVPATH udev field
udev fieldNow that we are fully branched for 4.4, please prepare your operator to supply a 4.4 bundle, so that 4.4 operator publishing works and doesn't overwrite 4.3 bundles. This means at least updating the package.yaml under https://github.com/openshift/local-storage-operator/tree/master/manifests
@gnufied
Since updating to 4.9.0-202211280956 we see the following warning events:
Generated from localvolume-symlink-controller
w02 - /dev/disk/by-id/scsi-...-scsi2 was defined in devicePaths, but expected a path in /dev/
The warning does not make sense to me, since this is clearly a path in /dev/. Here is the relevant LocalVolume:
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
creationTimestamp: "2021-08-18T13:26:48Z"
finalizers:
- storage.openshift.com/local-volume-protection
generation: 1
name: my-local-storage
namespace: openshift-local-storage
spec:
logLevel: Normal
managementState: Managed
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- w02
storageClassDevices:
- devicePaths:
- /dev/disk/by-id/scsi-...-scsi2
fsType: xfs
storageClassName: my-local-storage
volumeMode: Filesystem
status:
conditions:
- lastTransitionTime: "2022-01-22T18:02:47Z"
message: Ready
status: "True"
type: Available
generations:
- group: apps
lastGeneration: 21
name: diskmaker-manager
namespace: openshift-local-storage
resource: DaemonSet
managementState: Managed
observedGeneration: 1
readyReplicas: 0
I'm running a 2-node cluster on minikube and deploy the LSO with:
kubectl apply -k config/default
To run discovery I do:
kubectl apply -k config/samples/local_v1alpha1_localvolumediscovery.yaml
The localvolumediscovery CR is created but no diskmaker-discovery pod is created and so no results are created either.
Here is what the operator logs look like:
2021-07-22T19:11:23.006Z INFO controllers.LocalVolumeDiscovery Reconciling LocalVolumeDiscovery {"Request.Namespace": "default", "Request.Name": "auto-discover-devices"}
time="2021-07-22T19:11:23Z" level=info msg="Reconciling metrics exporter serviceNamespacedNamedefault/local-storage-discovery-metrics" source="exporter.go:100"
time="2021-07-22T19:11:23Z" level=info msg="creating service monitorNamespacedNamedefault/local-storage-discovery-metrics" source="exporter.go:126"
2021-07-22T19:11:25.071Z ERROR controllers.LocalVolumeDiscovery failed to create service and servicemonitors {"Request.Namespace": "default", "Request.Name": "auto-discover-devices", "object": "auto-discover-devices", "error": "failed to enable service monitor. failed to retrieve servicemonitor default/local-storage-discovery-metrics. no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-07-22T19:11:25.071Z ERROR controller-runtime.manager.controller.localvolumediscovery Reconciler error {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolumeDiscovery", "name": "auto-discover-devices", "namespace": "default", "error": "failed to enable service monitor. failed to retrieve servicemonitor default/local-storage-discovery-metrics. no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
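The 'no matches for kind "ServiceMonitor"' error means the monitoring.coreos.com CRDs (shipped with the Prometheus Operator, present on OpenShift but not on a stock minikube) are missing. A minimal sketch of a fix, assuming the upstream CRD manifest path is still current:
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml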
Now that we are fully branched for 4.7, please prepare your operator to supply a 4.7 bundle, so that 4.7 operator publishing works and doesn't overwrite 4.6 bundles. This means at least updating the package.yaml under
https://github.com/openshift/local-storage-operator/tree/master/manifests
Reference: openshift-eng/ocp-build-data#708
I see that the operator expects the nodes to be labeled as workers.
Can I override that in the spec of the LocalVolume to use, for example, nodes labeled as storage, or is it mandatory to use worker?
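For reference, the nodeSelector in a LocalVolume accepts arbitrary node labels (as the LocalVolume examples elsewhere on this page show); a sketch assuming your storage nodes carry a node-role.kubernetes.io/storage label:
nodeSelector:
  nodeSelectorTerms:
  - matchExpressions:
    - key: node-role.kubernetes.io/storage
      operator: In
      values:
      - ""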
You have a typo in the link to the OpenShift documentation; the link is not working correctly.
Just replace '-' with underscore '_' in the file local-storage-operator/docs/deploy-with-olm.md:
https://docs.openshift.com/container-platform/latest/storage/persistent-storage/persistent-storage-local.html ==>
https://docs.openshift.com/container-platform/latest/storage/persistent_storage/persistent-storage-local.html
But for me it makes sense to use '-', so let's discuss with the OpenShift Docs team what separator to use :). It looks like they use underscores for directories. To make it work now, it should be replaced with an underscore.
Environment:
OKD 4.5.0-0.okd-2020-07-12-134038-rc
which comes with OLM 0.15
Steps to reproduce:
Start with fresh OKD environment (3 masters, 3 compute).
Install the operator with oc apply -f https://raw.githubusercontent.com/openshift/local-storage-operator/master/examples/olm/catalog-create-subscribe.yaml
Expected result:
local-storage-operator pod starts normally and is usable after rollout.
Actual result:
local-storage-operator pod is in a crash loop with the following log output:
{"level":"info","ts":1594901910.1498752,"logger":"cmd","msg":"Operator Version: 0.0.1"} {"level":"info","ts":1594901910.1499276,"logger":"cmd","msg":"Go Version: go1.13.5"} {"level":"info","ts":1594901910.1499577,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"} {"level":"info","ts":1594901910.1499753,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"} {"level":"info","ts":1594901910.1504426,"logger":"leader","msg":"Trying to become the leader."} {"level":"error","ts":1594901912.5641682,"logger":"cmd","msg":"","error":"required env POD_NAME not set, please configure downward API","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/local-storage-operator/cmd/manager/main.go:91\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
What am I missing here?