
piraeus-operator's Issues

Failed to install with Helm and Percona

We have tried to install Piraeus with a MariaDB database instead of ETCD. Installation with ETCD passed without any problem.

We have tried various commands and users; nothing worked. The commands we tried are:
helm install -n linstor piraeus-op ./charts/piraeus --set "operator.controller.dbConnectionURL=jdbc:mysql://user:[email protected]:3306/linstor?createDatabaseIfNotExist=true&useMysqlMetadata=true"
and variations on this, such as:

  • --set "operator.controller.dbConnectionURL=jdbc:mariadb://user:[email protected]/linstor?createDatabaseIfNotExist=true"
  • --set "operator.controller.dbConnectionURL=jdbc:mariadb://devops-haproxy.mysql-ha/linstor?user=user&password=pass"
    And more variations, like adding and removing the port, moving the username and password from the end to the beginning of the URL, etc. (see also the values-file sketch below).
    Etcd is disabled in values.yaml.
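For reference, the same option can also be set through a values file instead of --set (a sketch with placeholder credentials and host; this only restates the setting already used above, it is not a fix by itself):

# my-values.yaml
operator:
  controller:
    dbConnectionURL: "jdbc:mysql://<user>:<password>@<host>:3306/linstor?createDatabaseIfNotExist=true&useMysqlMetadata=true"

# then: helm install -n linstor piraeus-op ./charts/piraeus -f my-values.yaml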

Here is the log from the cs-controller:


I0129 12:20:12.423886       1 leaderelection.go:242] attempting to acquire leader lease  linstor/piraeus-op-cs...

time="2021-01-29T12:20:12Z" level=info msg="long live our new leader: 'piraeus-op-cs-controller-7d99c4fff5-vv4bx'!"

I0129 12:20:42.998653       1 leaderelection.go:252] successfully acquired lease linstor/piraeus-op-cs

time="2021-01-29T12:20:42Z" level=info msg="long live our new leader: 'piraeus-op-cs-controller-7d99c4fff5-zvm6w'!"

time="2021-01-29T12:20:43Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[startController]'"

LINSTOR, Module Controller

Version:            1.11.1 (fe95a94d86c66c6c9846a3cf579a1a776f95d3f4)

Build time:         2021-01-13T08:34:55+00:00

Java Version:       11

Java VM:            Debian, Version 11.0.9.1+1-post-Debian-1deb10u2

Operating system:   Linux, Version 5.4.0-64-generic

Environment:        amd64, 1 processors, 1925 MiB memory reserved for allocations

System components initialization in progress

12:20:43.951 [main] INFO  LINSTOR/Controller - SYSTEM - ErrorReporter DB first time init.

12:20:43.953 [main] INFO  LINSTOR/Controller - SYSTEM - Log directory set to: '/var/log/linstor-controller'

12:20:43.988 [main] WARN  io.sentry.dsn.Dsn - *** Couldn't find a suitable DSN, Sentry operations will do nothing! See documentation: https://docs.sentry.io/clients/java/ ***

12:20:43.999 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading API classes started.

12:20:44.332 [Main] INFO  LINSTOR/Controller - SYSTEM - API classes loading finished: 332ms

12:20:44.332 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection started.

12:20:44.344 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"

12:20:44.345 [Main] INFO  LINSTOR/Controller - SYSTEM - Dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" was successful

WARNING: An illegal reflective access operation has occurred

WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/linstor-server/lib/guice-4.2.3.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)

WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1

WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations

WARNING: All illegal access operations will be denied in a future release

12:20:45.308 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection finished: 976ms

12:20:45.604 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing authentication subsystem

12:20:45.717 [Main] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking using SQL driver

12:20:45.719 [Main] INFO  LINSTOR/Controller - SYSTEM - SpaceTrackingService: Instance added as a system service

12:20:45.720 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService

12:20:45.721 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing the database connection pool

12:20:45.767 [Main] ERROR LINSTOR/Controller - SYSTEM - Database initialization error [Report number 6013FD9B-00000-000000]

12:20:45.774 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Shutdown in progress

12:20:45.778 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Shutting down service instance 'SpaceTrackingService' of type SpaceTrackingService

12:20:45.780 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Waiting for service instance 'SpaceTrackingService' to complete shutdown

12:20:45.780 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Shutting down service instance 'TaskScheduleService' of type TaskScheduleService

12:20:45.781 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Waiting for service instance 'TaskScheduleService' to complete shutdown

12:20:45.781 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Shutting down service instance 'DatabaseService' of type DatabaseService

12:20:45.782 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Waiting for service instance 'DatabaseService' to complete shutdown

12:20:45.782 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Shutting down service instance 'TimerEventService' of type TimerEventService

12:20:45.783 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Waiting for service instance 'TimerEventService' to complete shutdown

12:20:45.784 [Thread-2] INFO  LINSTOR/Controller - SYSTEM - Shutdown complete

time="2021-01-29T12:20:45Z" level=fatal msg="failed to run" err="exit status 20" ```

Allow enabling the topology feature for CSI

Linstor CSI supports the CSI topology feature. However, it is currently disabled in the operator.

One reason why someone might want to disable this feature: piraeusdatastore/linstor-csi#54
In short: the way it is implemented might be unsuitable for large clusters. (I have a feeling this is just a limitation in the way Linstor CSI implemented this feature, but I haven't investigated further)

Adding a new value to the CSI resource to toggle this flag shouldn't be an issue. This leaves the following questions:

Do we want to enable the topology feature?

My opinion: yes, it's a useful feature

If yes, do we want it configurable?

My opinion: yes, as there are certain problems with this feature, as linked above

If yes, what should the default be?

My opinion: Default to "yes". A default installation of the operator should provide the fullest feature set for users that just want to try it out.

Opinions @alexzhc @JoelColledge ?
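For illustration only, one hypothetical shape for such a toggle on the LinstorCSIDriver resource (the enableTopology field does not exist yet; it is just a sketch of the proposal, and the apiVersion is shown for illustration):

apiVersion: piraeus.linbit.com/v1
kind: LinstorCSIDriver
metadata:
  name: piraeus-op
spec:
  enableTopology: true    # hypothetical field; the default value is the open question above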

[Feature Request] Linstor advanced TLS support

Description

As a developer, I want to generate certificates for encrypted communication inside Linstor using Kubernetes/OpenShift capabilities.

Requirements

  1. Linstor supports certificates stored in secrets of type kubernetes.io/tls
  2. Linstor supports certificates in PKCS#8 (tls.key) + X.509 (tls.crt) format
  3. Linstor supports certificate replacement on secret change, without downtime

Details

1. Linstor supports certificates stored in secrets of type kubernetes.io/tls

Kubernetes and OpenShift provide the ability to generate certificates in the following format:

[root@utility1 ~]# kubectl get secret node-secret -n cert-manager
NAME                                      TYPE                                  DATA   AGE
node-secret                               kubernetes.io/tls                     5      6d
[root@utility1 ~]# kubectl describe secret node-secret -n cert-manager
Name:         node-secret
Namespace:    cert-manager
Labels:       <none>
Annotations:  cert-manager.io/alt-names: 
              cert-manager.io/certificate-name: node-secret
              cert-manager.io/common-name: node-secret
              cert-manager.io/ip-sans: 
              cert-manager.io/issuer-kind: Issuer
              cert-manager.io/issuer-name: selfsigned-issuer
              cert-manager.io/uri-sans: 

Type:  kubernetes.io/tls

Data
====
tls.key:         1675 bytes
truststore.jks:  801 bytes
ca.crt:          1062 bytes
keystore.jks:    2867 bytes
keystore.p12:    3127 bytes
tls.crt:         1062 bytes

Currently Linstor doesn't have unified naming for certificate secrets:
Linstor components use keystore.jks, truststore.jks
Linstor API uses keystore.jks, ca.pem, client.key, truststore.jks
ETCD uses cert.pem, key.pem, ca.pem, client.cert, client.key

Inside Kubernetes it is always tls.crt/ca.crt/tls.key, and Linstor should support this naming convention.
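For example, a cert-manager Certificate along these lines produces exactly such a kubernetes.io/tls secret with tls.crt/tls.key/ca.crt, and can optionally emit the JKS/PKCS#12 keystores seen above (a sketch; only the names visible in the annotations above are taken from this cluster, the keystore-password secret is an assumption):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: node-secret
  namespace: cert-manager
spec:
  secretName: node-secret
  commonName: node-secret
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
  keystores:
    jks:
      create: true                  # emits keystore.jks / truststore.jks as in the secret above
      passwordSecretRef:
        name: keystore-password     # assumed secret holding the keystore password
        key: password
    pkcs12:
      create: true                  # emits keystore.p12
      passwordSecretRef:
        name: keystore-password
        key: password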

2. Linstor supports certificates in PKCS#8 (tls.key) + X.509 (tls.crt) format

Currently Linstor doesn't have a unified certificate format:
Linstor components - stored in JKS format
Linstor API - stored in PKCS#12 format
ETCD cluster - PKCS#8 + X.509

Linstor should use certificates and keys in a unified format, as in Kubernetes.

3. Linstor supports certificate replacement on secret change, without downtime

Certificate validity periods are limited. Linstor should detect a secret update and automatically apply the new certificate without downtime.

Add podAntiAffinity for cs-controller and csi-controller

Hi, @WanzenBug

As cs-controller and csi-controller may both have multiple replicas for HA, it is important to add podAntiAffinity so that the pods land on different nodes.

For cs-controller, I can do:

  affinity: 
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: role
              operator: In
              values:
              - piraeus-controller
          topologyKey: kubernetes.io/hostname

However, the csi-controller labels are not specific enough for affinity use:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-11-02T08:05:18Z"
  generation: 1
  labels:
    app: piraeus-op
  name: piraeus-op-csi-controller
  namespace: storage-system

Recommendations:

  1. Add a label role: piraeus-csi-controller to op-csi-controller (see the sketch after this list)
  2. Make podAntiAffinity the default for both controllers?
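A sketch of what (1) would enable for the csi-controller, mirroring the cs-controller example above (the role: piraeus-csi-controller label is the proposed one, not an existing default):

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: role
            operator: In
            values:
            - piraeus-csi-controller
        topologyKey: kubernetes.io/hostname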

Error writing controller logs to /var/log/linstor-controller, when running on OpenShift

Can I fix this issue without running the controller pod with root privileges?

log:

time="2020-11-18T19:13:48Z" level=info msg="running k8s-await-election" version=refs/tags/v0.2.0
I1118 19:13:48.722244       1 leaderelection.go:242] attempting to acquire leader lease  piraeus-operator/piraeus-cs...
time="2020-11-18T19:13:51Z" level=info msg="long live our new leader: 'piraeus-cs-controller-5f998d764b-btsjm'!"
I1120 12:17:59.154742       1 leaderelection.go:252] successfully acquired lease piraeus-operator/piraeus-cs
time="2020-11-20T12:17:59Z" level=info msg="long live our new leader: 'piraeus-cs-controller-5f998d764b-zz46s'!"
time="2020-11-20T12:17:59Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[startController]'"
LINSTOR, Module Controller
Version:            1.9.0 (678acd24a8b9b73a735407cd79ca33a5e95eb2e2)
Build time:         2020-09-23T09:33:23+00:00
Java Version:       11
Java VM:            Debian, Version 11.0.8+10-post-Debian-1deb10u1
Operating system:   Linux, Version 5.6.19-300.fc32.x86_64
Environment:        amd64, 1 processors, 29694 MiB memory reserved for allocations


System components initialization in progress

12:17:59,493 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
12:17:59,494 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
12:17:59,494 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/usr/share/linstor-server/lib/conf/logback.xml]
12:17:59,549 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
12:17:59,551 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
12:17:59,553 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
12:17:59,556 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
12:17:59,568 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
12:17:59,569 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILE]
12:17:59,573 |-INFO in ch.qos.logback.core.rolling.FixedWindowRollingPolicy@2de23121 - Will use zip compression
12:17:59,575 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Active log file name: /var/log/linstor-controller/linstor-Controller.log
12:17:59,575 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - File property is set to [/var/log/linstor-controller/linstor-Controller.log]
12:17:59,576 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Failed to create parent directories for [/var/log/linstor-controller/linstor-Controller.log]
12:17:59,576 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - openFile(/var/log/linstor-controller/linstor-Controller.log,true) call failed. java.io.FileNotFoundException: /var/log/linstor-controller/linstor-Controller.log (No such file or directory)
	at java.io.FileNotFoundException: /var/log/linstor-controller/linstor-Controller.log (No such file or directory)
	at 	at java.base/java.io.FileOutputStream.open0(Native Method)
	at 	at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
	at 	at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
	at 	at ch.qos.logback.core.recovery.ResilientFileOutputStream.<init>(ResilientFileOutputStream.java:26)
	at 	at ch.qos.logback.core.FileAppender.openFile(FileAppender.java:204)
	at 	at ch.qos.logback.core.FileAppender.start(FileAppender.java:127)
	at 	at ch.qos.logback.core.rolling.RollingFileAppender.start(RollingFileAppender.java:100)
	at 	at ch.qos.logback.core.joran.action.AppenderAction.end(AppenderAction.java:90)
	at 	at ch.qos.logback.core.joran.spi.Interpreter.callEndAction(Interpreter.java:309)
	at 	at ch.qos.logback.core.joran.spi.Interpreter.endElement(Interpreter.java:193)
	at 	at ch.qos.logback.core.joran.spi.Interpreter.endElement(Interpreter.java:179)
	at 	at ch.qos.logback.core.joran.spi.EventPlayer.play(EventPlayer.java:62)
	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:165)
	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:152)
	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:110)
	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:53)
	at 	at ch.qos.logback.classic.util.ContextInitializer.configureByResource(ContextInitializer.java:75)
	at 	at ch.qos.logback.classic.util.ContextInitializer.autoConfig(ContextInitializer.java:150)
	at 	at org.slf4j.impl.StaticLoggerBinder.init(StaticLoggerBinder.java:84)
	at 	at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:55)
	at 	at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
	at 	at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
	at 	at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:417)
	at 	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:362)
	at 	at com.linbit.linstor.logging.StdErrorReporter.<init>(StdErrorReporter.java:75)
	at 	at com.linbit.linstor.core.Controller.main(Controller.java:450)
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [LINSTOR/Controller] to INFO
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [LINSTOR/Controller] to false
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[LINSTOR/Controller]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [LINSTOR/Satellite] to INFO
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [LINSTOR/Satellite] to false
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[LINSTOR/Satellite]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [LINSTOR/TESTS] to OFF
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [LINSTOR/TESTS] to false
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[LINSTOR/TESTS]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
12:17:59,577 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6913c1fb - Registering current configuration as safe fallback point

12:17:59.580 [main] ERROR LINSTOR/Controller - SYSTEM - Unable to create log directory: /var/log/linstor-controller
org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
	at org.h2.message.DbException.get(DbException.java:168)
	at org.h2.message.TraceSystem.logWritingError(TraceSystem.java:289)
	at org.h2.message.TraceSystem.openWriter(TraceSystem.java:310)
	at org.h2.message.TraceSystem.writeFile(TraceSystem.java:258)
	at org.h2.message.TraceSystem.write(TraceSystem.java:242)
	at org.h2.message.Trace.error(Trace.java:196)
	at org.h2.engine.Database.openDatabase(Database.java:314)
	at org.h2.engine.Database.<init>(Database.java:280)
	at org.h2.engine.Engine.openSession(Engine.java:66)
	at org.h2.engine.Engine.openSession(Engine.java:179)
	at org.h2.engine.Engine.createSessionAndValidate(Engine.java:157)
	at org.h2.engine.Engine.createSession(Engine.java:140)
	at org.h2.engine.Engine.createSession(Engine.java:28)
	at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:351)
	at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:124)
	at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:103)
	at org.h2.Driver.connect(Driver.java:69)
	at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
	at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:115)
	at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:665)
	at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544)
	at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
	at com.linbit.linstor.logging.H2ErrorReporter.setupErrorDB(H2ErrorReporter.java:66)
	at com.linbit.linstor.logging.H2ErrorReporter.<init>(H2ErrorReporter.java:61)
	at com.linbit.linstor.logging.StdErrorReporter.<init>(StdErrorReporter.java:107)
	at com.linbit.linstor.core.Controller.main(Controller.java:450)
Caused by: org.h2.jdbc.JdbcSQLException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
	at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
	... 27 more
Caused by: org.h2.message.DbException: Error while creating file "/var/log/linstor-controller" [90062-197]
	at org.h2.message.DbException.get(DbException.java:179)
	at org.h2.message.DbException.get(DbException.java:155)
	at org.h2.store.fs.FilePathDisk.createDirectory(FilePathDisk.java:271)
	at org.h2.store.fs.FileUtils.createDirectory(FileUtils.java:42)
	at org.h2.store.fs.FileUtils.createDirectories(FileUtils.java:312)
	at org.h2.message.TraceSystem.openWriter(TraceSystem.java:300)
	... 24 more
Caused by: org.h2.jdbc.JdbcSQLException: Error while creating file "/var/log/linstor-controller" [90062-197]
	at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
	... 30 more
12:18:02.218 [main] ERROR LINSTOR/Controller - SYSTEM - Unable to operate the error-reports database: java.sql.SQLException: Cannot create PoolableConnectionFactory (Error while creating file "/var/log/linstor-controller" [90062-197])
12:18:02.218 [main] INFO  LINSTOR/Controller - SYSTEM - Log directory set to: '/var/log/linstor-controller'
12:18:02.242 [main] WARN  io.sentry.dsn.Dsn - *** Couldn't find a suitable DSN, Sentry operations will do nothing! See documentation: https://docs.sentry.io/clients/java/ ***
12:18:02.250 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading API classes started.
12:18:02.486 [Main] INFO  LINSTOR/Controller - SYSTEM - API classes loading finished: 236ms
12:18:02.486 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection started.
12:18:02.501 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
12:18:02.501 [Main] INFO  LINSTOR/Controller - SYSTEM - Extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" is not installed
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/linstor-server/lib/guice-4.2.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
12:18:03.175 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection finished: 689ms
12:18:03.375 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing authentication subsystem
12:18:03.459 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
12:18:03.460 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing the etcd database
12:18:04.008 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'ETCDDatabaseService' of type ETCDDatabaseService
12:18:04.009 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading security objects
12:18:04.090 [Main] INFO  LINSTOR/Controller - SYSTEM - Current security level is NO_SECURITY
12:18:04.144 [Main] INFO  LINSTOR/Controller - SYSTEM - Core objects load from database is in progress
12:18:06.684 [Main] INFO  LINSTOR/Controller - SYSTEM - Core objects load from database completed
12:18:06.703 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'TaskScheduleService' of type TaskScheduleService
12:18:06.704 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing network communications services
12:18:06.704 [Main] WARN  LINSTOR/Controller - SYSTEM - The SSL network communication service 'DebugSslConnector' could not be started because the keyStore file (/etc/linstor/ssl/keystore.jks) is missing
Unable to create error report file for error report 5FB7B3F7-00000-000000:
/var/log/linstor-controller/ErrorReport-5FB7B3F7-00000-000000.log (No such file or directory)
The error report will be written to the standard error stream instead.

ERROR REPORT 5FB7B3F7-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID:                           678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T09:33:23+00:00
Error time:                         2020-11-20 12:18:06
Node:                               piraeus-cs-controller-5f998d764b-zz46s

============================================================

Reported error:
===============

Category:                           Error
Class name:                         ImplementationError
Class canonical name:               com.linbit.ImplementationError
Generated at:                       Method 'execute', Source file 'TaskScheduleService.java', Line #289

Error message:                      Unhandled exception caught in com.linbit.linstor.tasks.TaskScheduleService

Error context:
    This exception was generated in the service thread of the service 'TaskScheduleService'

Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.tasks.TaskScheduleService:289
    run                                      N      com.linbit.linstor.tasks.TaskScheduleService:196
    run                                      N      java.lang.Thread:834

Caused by:
==========

Category:                           RuntimeException
Class name:                         LinStorRuntimeException
Class canonical name:               com.linbit.linstor.LinStorRuntimeException
Generated at:                       Method 'archiveLogDirectory', Source file 'StdErrorReporter.java', Line #730

Error message:                      Unable to list log directory

Call backtrace:

    Method                                   Native Class:Line number
    archiveLogDirectory                      N      com.linbit.linstor.logging.StdErrorReporter:730
    run                                      N      com.linbit.linstor.tasks.LogArchiveTask:49
12:18:06.709 [Main] INFO  LINSTOR/Controller - SYSTEM - Created network communication service 'PlainConnector'
12:18:06.709 [Main] WARN  LINSTOR/Controller - SYSTEM - The SSL network communication service 'SslConnector' could not be started because the keyStore file (/etc/linstor/ssl/keystore.jks) is missing
12:18:06.709 [Main] INFO  LINSTOR/Controller - SYSTEM - Created network communication service 'SslConnector'
    execute                                  N      com.linbit.linstor.tasks.TaskScheduleService:282
    run                                      N      com.linbit.linstor.tasks.TaskScheduleService:196
    run                                      N      java.lang.Thread:834

Caused by:
==========

Category:                           Exception
Class name:                         NoSuchFileException
Class canonical name:               java.nio.file.NoSuchFileException
Generated at:                       Method 'translateToIOException', Source file 'UnixException.java', Line #92

Error message:                      /var/log/linstor-controller

Call backtrace:

    Method                                   Native Class:Line number
    translateToIOException                   N      sun.nio.fs.UnixException:92
    rethrowAsIOException                     N      sun.nio.fs.UnixException:111
    rethrowAsIOException                     N      sun.nio.fs.UnixException:116
    newDirectoryStream                       N      sun.nio.fs.UnixFileSystemProvider:432
    newDirectoryStream                       N      java.nio.file.Files:471
    list                                     N      java.nio.file.Files:3698
    archiveLogDirectory                      N      com.linbit.linstor.logging.StdErrorReporter:652
    run                                      N      com.linbit.linstor.tasks.LogArchiveTask:49
12:18:06.711 [Main] INFO  LINSTOR/Controller - SYSTEM - Reconnecting to previously known nodes
    execute                                  N      com.linbit.linstor.tasks.TaskScheduleService:282
    run                                      N      com.linbit.linstor.tasks.TaskScheduleService:196
    run                                      N      java.lang.Thread:834


END OF ERROR REPORT.
12:18:06.730 [Main] INFO  LINSTOR/Controller - SYSTEM - Reconnect requests sent
Nov 20, 2020 12:18:07 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [[::]:3370]
Nov 20, 2020 12:18:07 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
12:18:07.656 [Main] INFO  LINSTOR/Controller - SYSTEM - Controller initialized

org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
	at org.h2.message.DbException.get(DbException.java:168)
	at org.h2.message.TraceSystem.logWritingError(TraceSystem.java:289)
	at org.h2.message.TraceSystem.openWriter(TraceSystem.java:310)
	at org.h2.message.TraceSystem.writeFile(TraceSystem.java:258)
	at org.h2.message.TraceSystem.write(TraceSystem.java:242)
	at org.h2.message.Trace.error(Trace.java:196)
	at org.h2.engine.Database.openDatabase(Database.java:314)
	at org.h2.engine.Database.<init>(Database.java:280)
	at org.h2.engine.Engine.openSession(Engine.java:66)
	at org.h2.engine.Engine.openSession(Engine.java:179)
	at org.h2.engine.Engine.createSessionAndValidate(Engine.java:157)
	at org.h2.engine.Engine.createSession(Engine.java:140)
	at org.h2.engine.Engine.createSession(Engine.java:28)
	at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:351)
	at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:124)
	at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:103)
	at org.h2.Driver.connect(Driver.java:69)
	at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
	at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:115)
	at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:665)
	at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544)
	at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
	at com.linbit.linstor.logging.H2ErrorReporter.writeErrorReportToDB(H2ErrorReporter.java:105)
	at com.linbit.linstor.logging.StdErrorReporter.reportErrorImpl(StdErrorReporter.java:301)
	at com.linbit.linstor.logging.StdErrorReporter.reportError(StdErrorReporter.java:271)
	at com.linbit.linstor.tasks.TaskScheduleService.execute(TaskScheduleService.java:286)
	at com.linbit.linstor.tasks.TaskScheduleService.run(TaskScheduleService.java:196)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.h2.jdbc.JdbcSQLException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
	at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
	... 29 more
Caused by: org.h2.message.DbException: Error while creating file "/var/log/linstor-controller" [90062-197]
	at org.h2.message.DbException.get(DbException.java:179)
	at org.h2.message.DbException.get(DbException.java:155)
	at org.h2.store.fs.FilePathDisk.createDirectory(FilePathDisk.java:271)
	at org.h2.store.fs.FileUtils.createDirectory(FileUtils.java:42)
	at org.h2.store.fs.FileUtils.createDirectories(FileUtils.java:312)
	at org.h2.message.TraceSystem.openWriter(TraceSystem.java:300)
	... 26 more
Caused by: org.h2.jdbc.JdbcSQLException: Error while creating file "/var/log/linstor-controller" [90062-197]
	at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
	... 32 more
12:18:09.291 [TaskScheduleService] ERROR LINSTOR/Controller - SYSTEM - Unable to write error report to DB: Cannot create PoolableConnectionFactory (Error while creating file "/var/log/linstor-controller" [90062-197])
12:18:09.291 [TaskScheduleService] ERROR LINSTOR/Controller - SYSTEM - Unhandled exception caught in com.linbit.linstor.tasks.TaskScheduleService [Report number 5FB7B3F7-00000-000000]

linstor volume list command fails with NullPointerException

root@piraeus-op-cs-controller-595cdf94d6-nrzkz:/# linstor v l
ERROR:
Show reports:
linstor error-reports show 5FA8E4BB-00000-000031
root@piraeus-op-cs-controller-595cdf94d6-nrzkz:/# linstor error-reports show 5FA8E4BB-00000-000031
ERROR REPORT 5FA8E4BB-00000-000031

============================================================

Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T09:33:23+00:00
Error time: 2020-11-09 06:52:57
Node: piraeus-op-cs-controller-595cdf94d6-nrzkz

============================================================

Reported error:

Category: RuntimeException
Class name: NullPointerException
Class canonical name: java.lang.NullPointerException
Generated at: Method 'deviceProviderKindAsString', Source file 'Json.java', Line #73

Call backtrace:

Method                                   Native Class:Line number
deviceProviderKindAsString               N      com.linbit.linstor.api.rest.v1.serializer.Json:73
apiToVolume                              N      com.linbit.linstor.api.rest.v1.serializer.Json:664
lambda$apiToResourceWithVolumes$2        N      com.linbit.linstor.api.rest.v1.serializer.Json:477
accept                                   N      java.util.stream.ReferencePipeline$3$1:195
forEachRemaining                         N      java.util.ArrayList$ArrayListSpliterator:1655
copyInto                                 N      java.util.stream.AbstractPipeline:484
wrapAndCopyInto                          N      java.util.stream.AbstractPipeline:474
evaluateSequential                       N      java.util.stream.ReduceOps$ReduceOp:913
evaluate                                 N      java.util.stream.AbstractPipeline:234
collect                                  N      java.util.stream.ReferencePipeline:578
apiToResourceWithVolumes                 N      com.linbit.linstor.api.rest.v1.serializer.Json:506
lambda$listVolumesApiCallRcWithToResponse$1 N      com.linbit.linstor.api.rest.v1.View:112
accept                                   N      java.util.stream.ReferencePipeline$3$1:195
forEachRemaining                         N      java.util.ArrayList$ArrayListSpliterator:1655
copyInto                                 N      java.util.stream.AbstractPipeline:484
wrapAndCopyInto                          N      java.util.stream.AbstractPipeline:474
evaluateSequential                       N      java.util.stream.ReduceOps$ReduceOp:913
evaluate                                 N      java.util.stream.AbstractPipeline:234
collect                                  N      java.util.stream.ReferencePipeline:578
lambda$listVolumesApiCallRcWithToResponse$2 N      com.linbit.linstor.api.rest.v1.View:113
onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:378
onNext                                   N      reactor.core.publisher.FluxContextStart$ContextStartSubscriber:96
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
onNext                                   N      reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:385
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribeInner                         N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:143
onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:237
trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:191
subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
subscribe                                N      reactor.core.publisher.Flux:8311
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe                                N      reactor.core.publisher.MonoCurrentContext:35
subscribe                                N      reactor.core.publisher.Flux:8325
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
onNext                                   N      reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
complete                                 N      reactor.core.publisher.Operators$MonoSubscriber:1755
onComplete                               N      reactor.core.publisher.MonoStreamCollector$StreamCollectorSubscriber:167
onComplete                               N      reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
onComplete                               N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
checkTerminated                          N      reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop                                N      reactor.core.publisher.FluxFlatMap$FlatMapMain:600
innerComplete                            N      reactor.core.publisher.FluxFlatMap$FlatMapMain:909
onComplete                               N      reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:136
onComplete                               N      reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
onComplete                               N      reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
complete                                 N      reactor.core.publisher.FluxCreate$BaseSink:438
drain                                    N      reactor.core.publisher.FluxCreate$BufferAsyncSink:784
complete                                 N      reactor.core.publisher.FluxCreate$BufferAsyncSink:732
drainLoop                                N      reactor.core.publisher.FluxCreate$SerializedSink:239
drain                                    N      reactor.core.publisher.FluxCreate$SerializedSink:205
complete                                 N      reactor.core.publisher.FluxCreate$SerializedSink:196
apiCallComplete                          N      com.linbit.linstor.netcom.TcpConnectorPeer:455
handleComplete                           N      com.linbit.linstor.proto.CommonMessageProcessor:363
handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:287
doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3                N      com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe                                N      reactor.core.publisher.FluxDefer:46
subscribe                                N      reactor.core.publisher.Flux:8325
onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused                               N      reactor.core.publisher.UnicastProcessor:286
drain                                    N      reactor.core.publisher.UnicastProcessor:322
onNext                                   N      reactor.core.publisher.UnicastProcessor:401
next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call                                     N      reactor.core.scheduler.WorkerTask:84
call                                     N      reactor.core.scheduler.WorkerTask:37
run                                      N      java.util.concurrent.FutureTask:264
run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
run                                      N      java.lang.Thread:834

END OF ERROR REPORT.

Display the node name the controller is on, instead of the pod name

Currently, the operator uses the controller pod name and IP to register it, which is unintuitive and does not show its locality directly:

# linstor node list 
+------------------------------------------------------------------------------------------------+
| Node                                      | NodeType   | Addresses                    | State  |
|================================================================================================|
| k8s-worker-1                              | SATELLITE  | 192.168.176.191:3366 (PLAIN) | Online |
| k8s-worker-2                              | SATELLITE  | 192.168.176.192:3366 (PLAIN) | Online |
| k8s-worker-3                              | SATELLITE  | 192.168.176.193:3366 (PLAIN) | Online |
| piraeus-op-cs-controller-6fbd7b7888-ngs45 | CONTROLLER | 172.29.69.195:3366 (PLAIN)   | Online |
+------------------------------------------------------------------------------------------------+

The ideal would be to display the node name the controller is on, instead of the pod name.
I tried to tweak it to use spec.nodeName and status.hostIP, but somehow LINSTOR does not allow registration using the containerPort. Changing the controller to use hostNetwork solves the problem, but could be overkill.
Is there any way to do it cleanly?
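For reference, spec.nodeName and status.hostIP can be exposed to the controller container through the downward API (a generic Kubernetes sketch, not something the operator currently does):

env:
- name: NODE_NAME              # name of the node the pod is scheduled on
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
- name: HOST_IP                # IP of that node
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP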

Error on deploying a new PVC; the existing one works.

Got the following errors when running kubectl describe pvc <name>.
The PVC won't get created.

Warning ProvisioningFailed 34s linstor.csi.linbit.com_piraeus-op-csi-controller-5c65956bf4-s4t5h_d02bd753-670a-47cf-a6e4-4aed355fb8e8 failed to provision volume with StorageClass "linstor-sc-hdd": rpc error: code = Internal desc = CreateVolume failed for pvc-56019475-789d-469c-9694-278d711fb079: Message: 'Successfully set property key(s): StorPoolName' next error: Message: 'Successfully set property key(s): StorPoolName' next error: Message: 'Resource 'pvc-56019475-789d-469c-9694-278d711fb079' successfully autoplaced on 2 nodes'; Details: 'Used nodes (storage pool name): 'k8smaster (lvm-hdd)', 'k8w6 (lvm-hdd)'' next error: Message: 'Tie breaker resource 'pvc-56019475-789d-469c-9694-278d711fb079' created on k8w2' next error: Message: 'Resource-definition property 'DrbdOptions/Resource/quorum' updated from 'null' to 'majority' by auto-quorum' next error: Message: 'Resource-definition property 'DrbdOptions/Resource/on-no-quorum' updated from 'null' to 'io-error' by auto-quorum' next error: Message: '(Node: 'k8w2') Failed to adjust DRBD resource pvc-56019475-789d-469c-9694-278d711fb079'; Reports: '[5F99F200-4BC88-000024]' next error: Message: 'Created resource 'pvc-56019475-789d-469c-9694-278d711fb079' on 'k8w6'' next error: Message: 'Created resource 'pvc-56019475-789d-469c-9694-278d711fb079' on 'k8smaster''

Allow setting LINSTOR properties via secrets

It should be possible to:

  • Set (controller) properties from our CRDs
  • Set "secret" properties, i.e. the value should reside in a kubernetes secret. The property will not be set directly, but via an env variables (needs support from LINSTOR, should be included in the next release)

Proposal:

  • Add an additionalEnv section to the LinstorController CRD. This is concatenated with the environment we already set and
    added to the controller pod template. The value would be an array of env items, just like the env: section of a containerspec.
  • Add an additionalProperties section to the LinstorController CRD. It would look like this:
    additionalProperties:
    - key: StorDriver/Foobar/UsernameEnv
      value: MY_ENV

In conjunction, these features would allow setting a property value from a secret:

additionalEnv:
- name: MY_PASSWORD 
  valueFrom:
    secretKeyRef:
      name: my-k8s-secret
      key: password
additionalProperties:
- key: StorDriver/Foobar/PasswordEnv
  value: MY_PASSWORD

Note: what's missing in the LINSTOR Controller is the special "*Env" properties. They should be available in the next feature release for selected variables.

Operator fails to update CR status if reconcile timeout is exceeded

We use a fixed 1-minute timeout for the whole reconcile loop.
This includes:

  1. Setting default values
  2. Deploying resources
  3. Interacting with the deployed resources
  4. Updating the status section of the CR

We should always update our status field, but this may not happen if steps 1-3 run into a timeout. Step (3) in particular is prone to such long pauses.

Proposed solution

The quickest solution is to update the status by passing a fresh context to Status().Update(ctx, ...).
We don't have any external contexts to worry about, so there is no "external" deadline or expected cancellation.

@JoelColledge opinions?

pv-hostpath helm install failed when hostname is long

Hi,

When I tried the pv-hostpath helm chart, it failed because the job name was too long.

  • on AWS, the hostname can be as long as 50 characters, which caused this error.
[root@ip-172-31-38-220 piraeus-operator]# helm install linstor-etcd ./charts/pv-hostpath --set "nodes={ip-172-31-37-227.ap-northeast-1.compute.internal}"
Error: failed post-install: warning: Hook post-install pv-hostpath/templates/pv.yaml failed: Job.batch "linstor-etcd-ip-172-31-37-227.ap-northeast-1.compute.internal-chown" is invalid: spec.template.labels: Invalid value: "linstor-etcd-ip-172-31-37-227.ap-northeast-1.compute.internal-chown": must be no more than 63 characters
[root@ip-172-31-38-220 piraeus-operator]#

Adding trunc to the hostname part of the job name resolved this (see the sketch below).

  • since the 'linstor-etcd-' prefix and '-chown' suffix consume roughly 20 characters, I set the truncation size to 43
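A rough sketch of the kind of change, not the chart's actual template (the variable name is illustrative):

metadata:
  # truncate the hostname part so the full job/label name stays within the 63-character limit
  name: linstor-etcd-{{ $nodeName | trunc 43 }}-chown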

Monitoring metrics

Hello,

First of all, thank you for all the hard work you are putting into this product. We are testing the operator and managed to set everything up as expected. Only setting up monitoring is proving problematic.

Judging from the code in

func addMetrics(ctx context.Context, cfg *rest.Config) {
it should create the monitoring.coreos.com/v1 ServiceMonitor resources by itself, but for some reason it does not. Creating a ServiceMonitor manually does not seem to work either.

kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
spec:
  endpoints:
    - port: http-metrics
  namespaceSelector:
    matchNames:
      - infra-storage
  selector:
    matchLabels:
      name: piraeus-operator

Could you give me some pointers on what I am doing wrong?

Deprecated API version of CRDs

Hi Everyone,

I'm about to try out Piraeus, and I've just seen that the APIs used by the CRDs are deprecated in my k8s version. When I tried to upgrade them with a simple replacement, I got validation errors, so further work will be needed to make them compatible. Here are my logs:

$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcsidrivers_crd.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/linstorcsidrivers.piraeus.linbit.com created
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorsatellitesets_crd.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/linstorsatellitesets.piraeus.linbit.com created
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcontrollers_crd.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/linstorcontrollers.piraeus.linbit.com created

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:41:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}

$ sed -i -E 's#apiextensions\.k8s\.io/v1beta1#apiextensions.k8s.io/v1#g' piraeus.linbit.com_*
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcsidrivers_crd.yaml
error: error validating "piraeus.linbit.com_linstorcsidrivers_crd.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "additionalPrinterColumns" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorsatellitesets_crd.yaml
error: error validating "piraeus.linbit.com_linstorsatellitesets_crd.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcontrollers_crd.yaml
error: error validating "piraeus.linbit.com_linstorcontrollers_crd.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false

kernel-module-injector fails on Ubuntu 18.04 & 20.04 (Rancher 2.5.1)

Good day!

I'm getting the following log output from the kernel-module-injector container of the piraeus-op-ns-node-xxx pod.

SORRY, kernel makefile not found.
You need to tell me a correct KDIR,
Or install the neccessary kernel source packages.

Makefile:125: recipe for target 'check-kdir' failed
make: *** [check-kdir] Error 1

Could not find the expexted *.ko, see stderr for more details

Isn't the kernel source package installed by the kernel-module-injector? Or what am I doing wrong?

Thank you in advance.

Stephan

drbd-kernel-module-injector fails when using drbd9-centos7:v9.0.23 on CentOS 7.8

Compile fails when using drbd9-centos7:v9.0.23 on CentOS 7.8

# kubectl logs -f piraeus-op-ns-node-mf7zf -c drbd-kernel-module-injector
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory `/tmp/pkg/drbd-9.0.23-1/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/3.10.0-1127.10.1.el7.x86_64/build

make -C /lib/modules/3.10.0-1127.10.1.el7.x86_64/build   M=/tmp/pkg/drbd-9.0.23-1/drbd  modules
  COMPAT  before_4_13_kernel_read
  COMPAT  alloc_workqueue_takes_fmt
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  drbd_release_returns_void
  COMPAT  genl_policy_in_ops
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_atomic_dec_if_positive_linux
  COMPAT  have_atomic_in_flight
  COMPAT  have_bd_claim_by_disk
  COMPAT  have_bd_unlink_disk_holder
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_flush
  COMPAT  have_bio_free
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_rw
  COMPAT  have_bioset_create_front_pad
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_q_bio
/bin/bash: /usr/bin/mkdir: Argument list too long
/bin/bash: /usr/bin/tr: Argument list too long
make[3]: execvp: /bin/bash: Argument list too long
make[3]: *** [/tmp/pkg/drbd-9.0.23-1/drbd/.compat_test.3.10.0-1127.10.1.el7.x86_64/have_blk_queue_split_q_bio_bioset.result] Error 127
make[3]: *** Waiting for unfinished jobs....
/bin/bash: /usr/bin/tr: Argument list too long
  COMPAT  have_blk_queue_plugged
/bin/bash: /usr/bin/tr: Argument list too long
/bin/bash: /usr/bin/tr: Argument list too long
make[2]: *** [_module_/tmp/pkg/drbd-9.0.23-1/drbd] Error 2
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.23-1/drbd'
make: *** [module] Error 2

Could not find the expexted *.ko, see stderr for more details

Controller is not able to connect to etcd. The current scope has already been entered.

Since today, LINSTOR is not working anymore; the cs-controller cannot communicate with etcd.

What does "The current scope has already been entered" means?

kubectl -n piraeus-system exec deployment/piraeus-op-cs-controller -- linstor err show 5FA5BD9C-00000-000001
ERROR REPORT 5FA5BD9C-00000-000001

============================================================

Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T09:33:23+00:00
Error time: 2020-11-06 21:18:23
Node: piraeus-op-cs-controller-5456d5cddd-nfnfl

============================================================

Reported error:

Category: RuntimeException
Class name: IllegalStateException
Class canonical name: java.lang.IllegalStateException
Generated at: Method 'checkState', Source file 'Preconditions.java', Line #508

Error message: The current scope has already been entered

Call backtrace:

Method                                   Native Class:Line number
checkState                               N      com.google.common.base.Preconditions:508
enter                                    N      com.linbit.linstor.api.LinStorScope:54
initialize                               N      com.linbit.linstor.systemstarter.PreConnectInitializer:64
startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:87
start                                    N      com.linbit.linstor.core.Controller:337
main                                     N      com.linbit.linstor.core.Controller:556

END OF ERROR REPORT.

Volume group not found error when using symlink devices

I think the storage preparation section in the docs (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md#preparing-physical-devices) should warn against using symlink devices like the ones in /dev/disk/by-id/ or other /dev/disk/by-X folders.

The current 3 requirements should list a 4th one, namely that the device must not be a symlink (see the resolution check sketched after this list):

  • are a root device (no partition)
  • do not contain partition information
  • have more than 1 GiB
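
To illustrate the check, resolving the path first reveals the real device; a minimal sketch, assuming readlink is available on the nodes:

# resolve a by-id symlink to the real block device before putting it into devicePaths
$ readlink -f /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
/dev/sdb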

Although this change could solve provisioning problems for others, the real solution would be to support persistent device names.

Details

I tried to avoid using /dev/sdX in my devicePaths list for preparing devices as these names are known to be not safe for long-term usage, and one should use persistent device names: https://wiki.archlinux.org/index.php/persistent_block_device_naming

When the operator was deployed with a persistent device name (/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1), all seemed fine. However, when I added a StorageClass and tried to provision a PVC with it, the PVC remained unbound. I checked the CSI logs and found these lines:

csi-provisioner I1225 09:05:54.115790       1 controller.go:645] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Finished: rpc error: code = Internal desc = CreateVolume fail
ed for pvc-0797f634-b26b-4d82-b5e0-d35014deb438: Message: 'Not enough available nodes'; Details: 'Not enough nodes fulfilling the following auto-place criteria:
csi-provisioner  * has a deployed storage pool named TransactionList [thinpool]
csi-provisioner  * the storage pools have to have at least '5242880' free space
csi-provisioner  * the current access context has enough privileges to use the node and the storage pool
csi-provisioner  * the node is online
csi-provisioner Auto-place configuration details:
csi-provisioner   Additional place count: 3
csi-provisioner   Don't place with resource (List): [pvc-0797f634-b26b-4d82-b5e0-d35014deb438]
csi-provisioner   Storage pool name: TransactionList [thinpool]
csi-provisioner   Layer stack: [DRBD, STORAGE]
csi-provisioner Auto-placing resource: pvc-0797f634-b26b-4d82-b5e0-d35014deb438'
csi-provisioner I1225 09:05:54.115819       1 controller.go:1084] Final error received, removing PVC 0797f634-b26b-4d82-b5e0-d35014deb438 from claims in progress
csi-provisioner W1225 09:05:54.115827       1 controller.go:943] Retrying syncing claim "0797f634-b26b-4d82-b5e0-d35014deb438", failure 7

I started to dig deeper and found these errors via the linstor CLI:

$ linstor storage-pool l  
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node  ┊ Driver   ┊ PoolName                  ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊
┊ lvm-thin             ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊        0 KiB ┊         0 KiB ┊ True         ┊ Error ┊
┊ lvm-thin             ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊        0 KiB ┊         0 KiB ┊ True         ┊ Error ┊
┊ lvm-thin             ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊        0 KiB ┊         0 KiB ┊ True         ┊ Error ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
    Node: 'node1', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
    Volume group 'linstor_thinpool' not found
ERROR:
Description:
    Node: 'node2', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
    Volume group 'linstor_thinpool' not found
ERROR:
Description:
    Node: 'node3', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
    Volume group 'linstor_thinpool' not found

The related error log:

$ cat /var/log/linstor-satellite/ErrorReport-5FE58E5C-1F7FF-000000.log 
ERROR REPORT 5FE58E5C-1F7FF-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.11.0
Build ID:                           3367e32d0fa92515efe61f6963767700a8701d98
Build time:                         2020-12-18T08:40:35+00:00
Error time:                         2020-12-25 07:02:59
Node:                               node3

============================================================

Reported error:
===============

Description:
    Volume group 'linstor_thinpool' not found

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'checkVgExists', Source file 'LvmUtils.java', Line #398

Error message:                      Volume group 'linstor_thinpool' not found

Call backtrace:

    Method                                   Native Class:Line number
    checkVgExists                            N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:398
    checkVolumeGroupEntry                    N      com.linbit.linstor.layer.storage.utils.StorageConfigReader:63
    checkConfig                              N      com.linbit.linstor.layer.storage.lvm.LvmProvider:549
    checkStorPool                            N      com.linbit.linstor.layer.storage.StorageLayer:396
    getSpaceInfo                             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:913
    getSpaceInfo                             N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1225
    getStoragePoolSpaceInfo                  N      com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:279
    applyChanges                             N      com.linbit.linstor.core.apicallhandler.StltStorPoolApiCallHandler:235
    applyFullSync                            N      com.linbit.linstor.core.apicallhandler.StltApiCallHandler:332
    execute                                  N      com.linbit.linstor.api.protobuf.FullSync:94
    executeNonReactive                       N      com.linbit.linstor.proto.CommonMessageProcessor:525
    lambda$execute$13                        N      com.linbit.linstor.proto.CommonMessageProcessor:500
    doInScope                                N      com.linbit.linstor.core.apicallhandler.ScopeRunner:147
    lambda$fluxInScope$0                     N      com.linbit.linstor.core.apicallhandler.ScopeRunner:75
    call                                     N      reactor.core.publisher.MonoCallable:91
    trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:126
    subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N      reactor.core.publisher.Flux:8343
    onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2344
    onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N      reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    slowPath                                 N      reactor.core.publisher.FluxArray$ArraySubscription:126
    request                                  N      reactor.core.publisher.FluxArray$ArraySubscription:99
    onSubscribe                              N      reactor.core.publisher.FluxFlatMap$FlatMapMain:363
    subscribe                                N      reactor.core.publisher.FluxMerge:69
    subscribe                                N      reactor.core.publisher.Flux:8357
    onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
    subscribe                                N      reactor.core.publisher.FluxConcatArray:80
    subscribe                                N      reactor.core.publisher.InternalFluxOperator:62
    subscribe                                N      reactor.core.publisher.FluxDefer:54
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N      reactor.core.publisher.UnicastProcessor:286
    drain                                    N      reactor.core.publisher.UnicastProcessor:329
    onNext                                   N      reactor.core.publisher.UnicastProcessor:408
    next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:373
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:834


END OF ERROR REPORT.

I also found this list, and started to wonder how these became sdb:

$ linstor physical-storage l
╭───────────────────────────────────────────╮
┊ Size       ┊ Rotational ┊ Nodes           ┊
╞═══════════════════════════════════════════╡
┊ 8589934592 ┊ True       ┊ node1[/dev/sdb] ┊
┊            ┊            ┊ node2[/dev/sdb] ┊
┊            ┊            ┊ node3[/dev/sdb] ┊
╰───────────────────────────────────────────╯

It seems that it followed the symlink to sdb:

$ ls -la /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
lrwxrwxrwx. 1 root root 9 Dec 25 10:58 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb

Just to make sure that the device is compatible with the 3 required points, I checked fdisk:

$ fdisk /dev/sdb

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xd9639fdb.

Command (m for help): m

Help:

  DOS (MBR)
   a   toggle a bootable flag
   b   edit nested BSD disklabel
   c   toggle the dos compatibility flag

  Generic
   d   delete a partition
   F   list free unpartitioned space
   l   list known partition types
   n   add a new partition
   p   print the partition table
   t   change a partition type
   v   verify the partition table
   i   print information about a partition

  Misc
   m   print this menu
   u   change display/entry units
   x   extra functionality (experts only)

  Script
   I   load disk layout from sfdisk script file
   O   dump disk layout to sfdisk script file

  Save & Exit
   w   write table to disk and exit
   q   quit without saving changes

  Create a new label
   g   create a new empty GPT partition table
   G   create a new empty SGI (IRIX) partition table
   o   create a new empty DOS partition table
   s   create a new empty Sun partition table


Command (m for help): F
Unpartitioned space /dev/sdb: 8 GiB, 8588886016 bytes, 16775168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes

Start      End  Sectors Size
 2048 16777215 16775168   8G

Command (m for help): p
Disk /dev/sdb: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xd9639fdb

Command (m for help): v
Remaining 16777215 unallocated 512-byte sectors.

Command (m for help): i
No partition is defined yet!

Command (m for help): q

So I started to suspect that something is problematic with the persistent name being a symlink. Since I have no experience with Piraeus/Linstor device migration (if it is even possible), I removed everything from the cluster that is related to these, removed all etcd host path volumes, and cleaned up the cluster. Then I re-deployed the Piraeus operator with just the following change in the Helm values:

   storagePools:
     lvmThinPools:
     - devicePaths:
-      - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
+      - /dev/sdb
       name: lvm-thin
       thinVolume: thinpool
       volumeGroup: ""

And it now works fine, all statuses are green:

$ linstor storage-pool l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node  ┊ Driver   ┊ PoolName                  ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊
┊ lvm-thin             ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊     7.98 GiB ┊      7.98 GiB ┊ True         ┊ Ok    ┊
┊ lvm-thin             ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊     7.98 GiB ┊      7.98 GiB ┊ True         ┊ Ok    ┊
┊ lvm-thin             ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊     7.98 GiB ┊      7.98 GiB ┊ True         ┊ Ok    ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Is there a way to safely mount a Piraeus PV on one of the k8s nodes?

I was wondering if there is a way to safely mount a Piraeus PV on the host for init or backup access. By safely, I mean without breaking any replication logic provided by Linstor/DRBD.

When migrating workloads from host path volumes to Piraeus PVs, it would be great to easily copy existing data into the new PVs, and it could also provide backup access. I think backup would be less problematic as it is just reading data; initializing the volumes, on the other hand, means lots of write operations.

I tried to mount the new logical volumes in multiple ways without success:

# from one k8s node

$ mkdir /tmp/test-mount
$ mount /dev/mapper/linstor_thinpool-pvc--3d1a3ebb--7a61--4307--b253--1ec13d5c473e_00000 /tmp/test-mount/
mount: /tmp/test-mount: /dev/mapper/linstor_thinpool-pvc--3d1a3ebb--7a61--4307--b253--1ec13d5c473e_00000 already mounted or mount point busy.
$ mount | grep 3d1a3ebb
/dev/drbd1001 on /var/lib/kubelet/pods/9c4ab752-d7d8-4eac-abdc-b27486b147b7/volumes/kubernetes.io~csi/pvc-3d1a3ebb-7a61-4307-b253-1ec13d5c473e/mount type ext4 (rw,noatime,seclabel,discard,stripe=16)
$ mount /dev/mapper/linstor_thinpool-pvc--9e920c35--8a7b--4022--8597--c8b34dfa4dc1_00000 /tmp/test-mount/
mount: /tmp/test-mount: /dev/mapper/linstor_thinpool-pvc--9e920c35--8a7b--4022--8597--c8b34dfa4dc1_00000 already mounted or mount point busy.
$ mount | grep 9e920c35
$ # no output, this LV was mounted on a different node where the related pod was running, still not mountable here

# from the Proxmox host

$ mkdir /tmp/test-mount
$ mount /dev/mapper/pve-vm--118--disk--1 /tmp/test-mount/
mount: /tmp/test-mount: unknown filesystem type 'LVM2_member'.
# this is probably related to the disk containing a separate LVM structure other than the one the Proxmox host uses,
# so it can't mount the second disk of the VM that is passed to Linstor/DRBD
# otherwise, it would be this easy: https://forum.proxmox.com/threads/how-to-mount-lvm-disk-of-vm.25218/#post-126333

As a workaround, I mounted both the new Piraeus PV and the old host path PV into the pod under migration, and copied the old data to the new PV within the pod's shell. Then I removed the old host path PV and redeployed the pod using only the new Piraeus PV. I think this workaround satisfies the "safe" requirement as all LV access is made through the CSI; however, it is a bit of a hassle to do it every time one needs to copy something to or from the LV.
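
One variant of this workaround is a throwaway copy pod with both volumes mounted; a rough sketch with made-up names and paths (not the exact manifest I used):

apiVersion: v1
kind: Pod
metadata:
  name: pv-migrate                      # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: copy
    image: busybox
    command: ["sh", "-c", "cp -a /old/. /new/"]
    volumeMounts:
    - name: old-data
      mountPath: /old
    - name: new-data
      mountPath: /new
  volumes:
  - name: old-data
    hostPath:
      path: /srv/old-app-data           # hypothetical old host path
  - name: new-data
    persistentVolumeClaim:
      claimName: my-new-piraeus-pvc     # hypothetical PVC name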

Isn't there an easier but a still safe way to achieve the same from one of the k8s nodes or from the machine hosting the nodes?

Upgrade to 1.1.0 failed with error: unable to recognize ""

Got the following error when trying to upgrade from 1.0.0 to 1.1.0.

helm upgrade piraeus-op -f ./helm-tobg/values.yaml ./piraeus-operator/charts/piraeus
Error: UPGRADE FAILED: [unable to recognize "": no matches for kind "LinstorController" in version "piraeus.linbit.com/v1", unable to recognize "": no matches for kind "LinstorSatelliteSet" in version "piraeus.linbit.com/v1"]
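
For what it's worth, Helm does not install or update CRDs on helm upgrade, so the renamed CRDs apparently have to be applied manually before upgrading; a hedged sketch, assuming the new manifests ship with the chart under charts/piraeus/crds:

# apply the new v1 CRDs before running helm upgrade
kubectl apply -f ./piraeus-operator/charts/piraeus/crds/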

Feature request: Encrypted communication

Please consider providing an option to enable encrypted communication (https) between Linstor components.
Ideally, this could also include the communication within and to an etcd database cluster, which is used by a Linstor Controller.

CSI cloning support

While reading through the snapshot description (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/optional-components.md#snapshot-support-components) I also found that K8S supports cloning of an existing PVC: https://kubernetes.io/docs/concepts/storage/volume-pvc-datasource/
I haven't found anything in the Piraeus documentation; only the Linstor guide describes two modes for cloning a resource (https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-clone-mode).

So my question is: Does the Piraeus CSI driver also support PVC creation from an existing PVC (aka cloning):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: existing-src-pvc-name
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

And if yes, how is it technically implemented (snapshot+creation from snapshot or new volume+copying with dd)?

In case cloning is currently not implemented in the CSI driver, it would be a wonderful addition.

Improve install instructions for host preparation

Added a separate issue for #142 (comment)

Looks like the instructions are incomplete for the host set-up. Basically the problem is the missing usermode_helper=disabled parameter. On first install it will most likely work, as the module is only loaded when the injector image runs, and that takes care of setting this option. On restart, I guess the module gets loaded sooner, now without this parameter set.
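
For reference, the host-side piece that seems to be missing from the docs would be something along these lines (a sketch; assumes the module is loaded via modprobe at boot):

# /etc/modprobe.d/drbd.conf
options drbd usermode_helper=disabled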

I'm really looking forward to going through it again :)

Operator depends on presence of legacy CRDs

The operator won't start if the legacy CRDs are not installed (linstorcontrollerset and linstornodeset).

The legacy controllers "depend" on the old CRDs and produce an error in the set-up step if the CRDs are not installed. In this case we should just skip adding the legacy controllers, as without the old CRDs there is nothing to "upgrade".

Make replica count for dependent deployments/stateful sets configurable

For a high-availability deployment, we want all components to be in some way redundant.
Currently the following components are not:

  • csi-controller, is a deployment with replica count 1, has built-in leader election capabilities
  • snapshot-controller, is a statefulset with replica count 1, has built-in leader election capabilities
  • etcd, already working with multiple replicas, default is set to 1
  • piraeus-operator, is a deployment with replica count 1, has built-in leader election capabilities
  • stork, already working with multiple replicas, default is set to 1
  • linstor-controller, currently capped at 1 replica in deployment, needs leader-election mechanism -> see #56
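
For the components that already accept a replica count, a values.yaml roughly like the following seems to be the way to get redundancy today (keys as used elsewhere in this tracker; treat it as a sketch, some of them may only exist in newer chart versions):

operator:
  replicas: 3
  controller:
    replicas: 3
csi:
  controllerReplicas: 3
haController:
  replicas: 3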

add "piraeus-op-csi-" prefix to snapshot-controller pods

For the sake of naming conformity, please consider adding a piraeus-op-csi- prefix to the snapshot-controller pods:

NAME                                          READY   STATUS    RESTARTS   AGE
piraeus-op-cs-controller-5c4cf45bdf-vvgc7     1/1     Running   1          10m

piraeus-op-csi-controller-b6dcf95fb-6z29m     5/5     Running   0          10m
piraeus-op-csi-controller-b6dcf95fb-89rq4     5/5     Running   0          10m
piraeus-op-csi-controller-b6dcf95fb-px9vh     5/5     Running   0          10m

piraeus-op-csi-node-d78s5                     2/2     Running   0          10m
piraeus-op-csi-node-dddtj                     2/2     Running   0          10m
piraeus-op-csi-node-m446w                     2/2     Running   0          10m

piraeus-op-etcd-0                             1/1     Running   0          11m
piraeus-op-etcd-1                             1/1     Running   0          8m54s
piraeus-op-etcd-2                             1/1     Running   0          8m

piraeus-op-ns-node-57dgm                      1/1     Running   0          10m
piraeus-op-ns-node-7sh99                      1/1     Running   0          10m
piraeus-op-ns-node-rpscv                      1/1     Running   0          10m

piraeus-op-operator-55b5cf4d5-bqc8s           1/1     Running   0          11m
piraeus-op-operator-55b5cf4d5-nvsh5           1/1     Running   0          11m
piraeus-op-operator-55b5cf4d5-ssmmn           1/1     Running   0          11m

piraeus-op-stork-7994f6f9d4-phwcf             1/1     Running   0          11m
piraeus-op-stork-7994f6f9d4-rzdrq             1/1     Running   0          11m
piraeus-op-stork-7994f6f9d4-xqn5q             1/1     Running   0          11m
piraeus-op-stork-scheduler-77759446d8-kzjtg   1/1     Running   0          11m
piraeus-op-stork-scheduler-77759446d8-tmcv2   1/1     Running   0          11m
piraeus-op-stork-scheduler-77759446d8-txvn5   1/1     Running   0          11m

snapshot-controller-7d674d7-4j5fj             1/1     Running   0          11m
snapshot-controller-7d674d7-67dln             1/1     Running   0          11m
snapshot-controller-7d674d7-nkqww             1/1     Running   0          11m

More darksite tests please: remove "$ref" link in values.schema.json

Hi, team

Let's do a "darksite (just wipe /etc/resolv.conf) " test before each release, so that issues such as below can be spotted:

charts/csi-snapshotter/values.schema.json added a $ref:

  "resources": {
      "description": "resource requirements for the snapshotter container",
      "$ref": "https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.18/api/openapi-spec/swagger.json#/definitions/io.k8s.api.core.v1.ResourceRequirements"
    }  

helm actually tries to fetch those addresses. In an offline environment (or, like my place, without VPN and with no stable access to GitHub), helm will fail with the following error:

# helm install piraeus-op ./charts/piraeus
Error: values don't meet the specifications of the schema(s) in the following chart(s):
csi-snapshotter:
Get https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.18/api/openapi-spec/swagger.json: dial tcp 151.101.108.133:443: i/o timeout

The solution is to remove the $ref line.
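
For example, the $ref could be replaced with a local schema so nothing is fetched at install time; a minimal sketch of what the de-referenced field might look like:

"resources": {
  "description": "resource requirements for the snapshotter container",
  "type": "object"
}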

Suspicious satellite logs

I suspect the WARN below is thrown because we run Linstor dockerized and not on the host; is that right?

However, why do all satellite pods report the capacity as 0 when they have 8G disks of which approx. 7G is still free? Shouldn't they say something other than 0? This log has been there since I first started experimenting with Piraeus; it's not new, but I was always curious what it means and why 0 capacity is logged.

 linstor-satellite 09:50:08.376 [DeviceManager] WARN  LINSTOR/Satellite - SYSTEM - Not calling 'systemd-notify' as NOTIFY_SOCKET is null
 linstor-satellite 09:51:27.362 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - SpaceTracking: Satellite aggregate capacity is 0 kiB, no errors
 linstor-satellite 01:15:00.542 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - SpaceTracking: Satellite aggregate capacity is 0 kiB, no errors
 linstor-satellite 09:49:27.043 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-satellite
 linstor-satellite 09:49:27.044 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - LogArchive: No logs to archive.
 linstor-satellite 01:15:00.564 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - SpaceTracking: Satellite aggregate capacity is 0 kiB, no errors

Synchronize names with LINSTOR

What

Before releasing v1.0, we want to synchronize names between the operator and LINSTOR.
The current proposal is to rename these CRDs:

  • LinstorNodeSet -> LinstorSatelliteSet
  • LinstorControllerSet -> LinstorController

Why

Some names used by the operator obscure their actual use. LinstorNodeSet should actually be LinstorSatelliteSet. LinstorControllerSet only ever manages 1 (active) controller, so it is not a set.

For whom

  • For our users, to get coherent documentation between LINSTOR and the operator
  • For us, to have similar concepts named similarly

Done when:

  • The renamed CRDs are used by the operator
  • There is documentation on the renaming
  • There is an automatic upgrade path

Piraeus won't start on CentOS Stream

I've upgraded my k8s nodes from CentOS 8 to CentOS Stream with the officially recommended commands: https://www.centos.org/centos-stream/. Seemingly, there were only minor changes in package versions, nothing serious.

After rebooting the nodes, some of Piraeus' internal services wouldn't start up and are in a constant crash loop. It seems there is a problem with the kernel-module-injector container.

I've attached all logs I could think of as relevant to help you solve this issue. Please advise if I should enable further debug options for Piraeus (and how).

$ k get all
NAME                                             READY   STATUS                  RESTARTS   AGE
pod/piraeus-op-cs-controller-cfb475c85-cngdm     1/1     Running                 3          47m
pod/piraeus-op-csi-controller-6fb7f7c5d6-hmspq   6/6     Running                 11         51m
pod/piraeus-op-csi-node-c94w2                    3/3     Running                 12         5d23h
pod/piraeus-op-csi-node-mcvw6                    3/3     Running                 10         5d23h
pod/piraeus-op-csi-node-vk9nj                    3/3     Running                 11         5d23h
pod/piraeus-op-etcd-0                            1/1     Running                 3          5d23h
pod/piraeus-op-etcd-1                            1/1     Running                 1          66m
pod/piraeus-op-etcd-2                            1/1     Running                 1          57m
pod/piraeus-op-ns-node-7lqmk                     0/1     Init:CrashLoopBackOff   5          7m7s
pod/piraeus-op-ns-node-djmtm                     0/1     Init:CrashLoopBackOff   5          7m10s
pod/piraeus-op-ns-node-wlnsj                     0/1     Init:CrashLoopBackOff   5          7m8s
pod/piraeus-op-operator-7466ddd49c-bbkgm         1/1     Running                 6          58m

NAME                      TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)             AGE
service/piraeus-op-cs     ClusterIP   10.43.60.86   <none>        3370/TCP            18d
service/piraeus-op-etcd   ClusterIP   None          <none>        2380/TCP,2379/TCP   18d

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/piraeus-op-csi-node   3         3         3       3            3           <none>          18d
daemonset.apps/piraeus-op-ns-node    3         3         0       3            0           <none>          18d

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/piraeus-op-cs-controller    1/1     1            1           18d
deployment.apps/piraeus-op-csi-controller   1/1     1            1           18d
deployment.apps/piraeus-op-operator         1/1     1            1           18d

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/piraeus-op-cs-controller-54b4444965    0         0         0       5d23h
replicaset.apps/piraeus-op-cs-controller-7fb6c98656    0         0         0       18d
replicaset.apps/piraeus-op-cs-controller-cfb475c85     1         1         1       5d23h
replicaset.apps/piraeus-op-csi-controller-565c954d87   0         0         0       18d
replicaset.apps/piraeus-op-csi-controller-6fb7f7c5d6   1         1         1       51m
replicaset.apps/piraeus-op-operator-7466ddd49c         1         1         1       5d23h
replicaset.apps/piraeus-op-operator-7bc4759c9d         0         0         0       18d

NAME                               READY   AGE
statefulset.apps/piraeus-op-etcd   3/3     18d

NAME                                     COMPLETIONS   DURATION   AGE
job.batch/linstor-etcd-rke-node1-chown   1/1           2s         18d
job.batch/linstor-etcd-rke-node2-chown   1/1           2s         18d
job.batch/linstor-etcd-rke-node3-chown   1/1           1s         18d
job.batch/piraeus-op-test-cs-svc         1/1           38s        18d

$ k logs daemonset.apps/piraeus-op-ns-node kernel-module-injector --previous
Found 3 pods, using pod/piraeus-op-ns-node-djmtm
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.0.25-1/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/4.18.0-257.el8.x86_64/build

make -C /lib/modules/4.18.0-257.el8.x86_64/build   M=/tmp/pkg/drbd-9.0.25-1/drbd  modules
  COMPAT  alloc_workqueue_takes_fmt
  COMPAT  before_4_13_kernel_read
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  drbd_release_returns_void
  COMPAT  genl_policy_in_ops
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_atomic_dec_if_positive_linux
  COMPAT  have_atomic_in_flight
  COMPAT  have_bd_claim_by_disk
  COMPAT  have_bd_unlink_disk_holder
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_flush
  COMPAT  have_bio_free
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_rw
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_create_front_pad
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_plugged
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_blkdev_get_by_path
  COMPAT  have_d_inode
  COMPAT  have_file_inode
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_family_parallel_ops
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_alloc
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_max_send_recv_sge
  COMPAT  have_netlink_cb_portid
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_part_stat_h
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_prandom_u32
  COMPAT  have_proc_create_single
  COMPAT  have_ratelimit_state_init
  COMPAT  have_rb_augment_functions
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_same
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_prio
  COMPAT  have_req_write
  COMPAT  have_req_write_same
  COMPAT  have_security_netlink_recv
  COMPAT  have_shash_desc_zero
  COMPAT  have_signed_nla_put
  COMPAT  have_simple_positive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_kernel_param_ops
  COMPAT  have_struct_size
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  hlist_for_each_entry_has_three_parameters
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  kmap_atomic_page_only
  COMPAT  need_make_request_recursion
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  UPD     /tmp/pkg/drbd-9.0.25-1/drbd/compat.4.18.0-257.el8.x86_64.h
  UPD     /tmp/pkg/drbd-9.0.25-1/drbd/compat.h
./drbd-kernel-compat/gen_compat_patch.sh: line 12: spatch: command not found
./drbd-kernel-compat/gen_compat_patch.sh: line 45: hash: spatch: not found
  INFO: no suitable spatch found; trying spatch-as-a-service;
  be patient, may take up to 10 minutes
  if it is in the server side cache it might only take a second
  SPAAS    1c20515525cffc698b58b76a5d936660
Successfully connected to SPAAS ('d35a4b17210dab1336de2725b997f300e9acd297')
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5591  100   772    0  4819   8674  54146 --:--:-- --:--:-- --:--:-- 62820
  You can create a new .tgz including this pre-computed compat patch
  by calling "make unpatch ; echo drbd-9.0.25-1/drbd/drbd-kernel-compat/cocci_cache/1c20515525cffc698b58b76a5d936660/compat.patch >>.filelist ; make tgz"
  PATCH
patching file drbd_sender.c
patching file drbd_debugfs.c
patching file drbd_receiver.c
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_dax_pmem.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_debugfs.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_bitmap.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_proc.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_sender.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_receiver.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_req.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_actlog.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/lru_cache.o
  CC [M]  /tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.o
/tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.c: In function 'drbd_create_device':
/tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.c:3713:6: error: implicit declaration of function 'blk_alloc_queue'; did you mean 'blk_alloc_queue_rh'? [-Werror=implicit-function-declaration]
  q = blk_alloc_queue(drbd_make_request, NUMA_NO_NODE);
      ^~~~~~~~~~~~~~~
      blk_alloc_queue_rh
/tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.c:3713:4: warning: assignment to 'struct request_queue *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
  q = blk_alloc_queue(drbd_make_request, NUMA_NO_NODE);
    ^
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:316: /tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.o] Error 1
make[2]: *** [Makefile:1545: _module_/tmp/pkg/drbd-9.0.25-1/drbd] Error 2
make[1]: Leaving directory '/tmp/pkg/drbd-9.0.25-1/drbd'
make[1]: *** [Makefile:132: kbuild] Error 2
make: *** [Makefile:135: module] Error 2

Could not find the expexted *.ko, see stderr for more details

$ k version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:09:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}

$ docker version
Client: Docker Engine - Community
 Version:           20.10.1
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        831ebea
 Built:             Tue Dec 15 04:34:30 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       f001486
  Built:            Tue Dec 15 04:32:21 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ cat /etc/centos-release        
CentOS Stream release 8

$ cat /etc/os-release 
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

$ dnf install -y kmod-drbd90 drbd90-utils
Last metadata expiration check: 1:14:10 ago on Sat 19 Dec 2020 07:45:54 AM CET.
Package kmod-drbd90-9.0.25-2.el8_3.elrepo.x86_64 is already installed.
Package drbd90-utils-9.13.1-1.el8.elrepo.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!

$ dnf info kmod-drbd90 drbd90-utils
Last metadata expiration check: 1:15:29 ago on Sat 19 Dec 2020 07:45:54 AM CET.
Installed Packages
Name         : drbd90-utils
Version      : 9.13.1
Release      : 1.el8.elrepo
Architecture : x86_64
Size         : 5.8 M
Source       : drbd90-utils-9.13.1-1.el8.elrepo.src.rpm
Repository   : @System
From repo    : elrepo
Summary      : Management utilities for DRBD
URL          : http://www.drbd.org/
License      : GPLv2+
Description  : DRBD mirrors a block device over the network to another machine.
             : Think of it as networked raid 1. It is a building block for
             : setting up high availability (HA) clusters.
             : 
             : This packages includes the DRBD administration tools and integration
             : scripts for heartbeat, pacemaker, rgmanager and xen.

Name         : kmod-drbd90
Version      : 9.0.25
Release      : 2.el8_3.elrepo
Architecture : x86_64
Size         : 1.3 M
Source       : kmod-drbd90-9.0.25-2.el8_3.elrepo.src.rpm
Repository   : @System
From repo    : elrepo
Summary      : drbd90 kernel module(s)
URL          : http://www.drbd.org/
License      : GPLv2
Description  : DRBD is a distributed replicated block device. It mirrors a
             : block device over the network to another machine. Think of it
             : as networked raid 1. It is a building block for setting up
             : high availability (HA) clusters.
             : This package provides the drbd90 kernel module(s).
             : It is built to depend upon the specific ABI provided by a range of releases
             : of the same variant of the Linux kernel and not on any one specific build.

$ git log
commit 4ee8b6e6a556cb64877a966bd857050b00834caa (HEAD -> master, origin/master, origin/HEAD)
Author: Moritz "WanzenBug" Wanzenböck <...>
Date:   Wed Nov 18 13:59:08 2020 +0100

    Prepare next dev cycle

commit 5068780fda8ce603a6ea32ee70b57b4e6b4e1f23 (tag: v1.2.0)
Author: Moritz "WanzenBug" Wanzenböck <...>
Date:   Wed Nov 18 13:58:51 2020 +0100

    Release v1.2.0

Would a Piraeus git repo update and redeploy perhaps solve it? As there have been no releases in the meantime, I'm running the latest version, 1.2.0.

I've also seen #134, but the mentioned files do not exist on the nodes, and the make error seems to be different.

$ cat /sys/module/drbd/parameters/usermode_helper
cat: /sys/module/drbd/parameters/usermode_helper: No such file or directory

$ cat /etc/modprobe.d/drbd.conf
cat: /etc/modprobe.d/drbd.conf: No such file or directory

PVC property in linstor volume/resource definition

For debugging, we often have to check the status of the volume/resource in the linstor CLI. To do so, we first have to look up the PV name the CSI driver generated for the PVC. With the PV name we can then execute linstor v l | grep <pv_name> to see the actual status of the volume/resource in DRBD.
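
For context, the lookup today boils down to two commands (PVC name and namespace are placeholders):

# find the PV name generated for the PVC, then look it up in LINSTOR
PV=$(kubectl get pvc my-pvc -o jsonpath='{.spec.volumeName}')
kubectl -n <operator-namespace> exec deployment/piraeus-op-cs-controller -- linstor v l | grep "$PV"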

To ease that process it would be helpful to write the PVC name to a custom property of the volume and/or resource. In addition, the linstor command should be extended to add a column for additional properties specified via an argument (but this feature request has to be raised at https://github.com/LINBIT/linstor-server).
With that you could easily find out which ResourceName (PV name) and DeviceName belong to the PVC name, which would significantly speed up analysis.

Is that a feature of general interest?

A volume definition with the number 0 already exists in resource definition

We get this error when the PVC is being bound; its status stays Pending forever.

Warning ProvisioningFailed 7m43s linstor.csi.linbit.com_piraeus-op-csi-controller-66fd956c6d-7kv2n_f5010b7b-8089-4db2-9f7d-b7987f9898a9 failed to provision volume with StorageClass "linstor-sc-hdd": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal Provisioning 3m26s (x9 over 7m53s) linstor.csi.linbit.com_piraeus-op-csi-controller-66fd956c6d-7kv2n_f5010b7b-8089-4db2-9f7d-b7987f9898a9 External provisioner is provisioning volume for claim "default/hdd-pvc-zabbix-mysql01-8bb0a103-7b21-4e4d-822d-71da9bdc08b3"
Warning ProvisioningFailed 3m26s (x8 over 7m42s) linstor.csi.linbit.com_piraeus-op-csi-controller-66fd956c6d-7kv2n_f5010b7b-8089-4db2-9f7d-b7987f9898a9 failed to provision volume with StorageClass "linstor-sc-hdd": rpc error: code = Internal desc = CreateVolume failed for pvc-44c6873d-4875-4c86-9588-a55c2a520302: Message: 'A volume definition with the number 0 already exists in resource definition 'pvc-44c6873d-4875-4c86-9588-a55c2a520302'.'; Cause: 'The VolumeDefinition already exists'; Details: 'Volume definitions of resource: pvc-44c6873d-4875-4c86-9588-a55c2a520302'
Normal ExternalProvisioning 102s (x26 over 7m53s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "linstor.csi.linbit.com" or manually created by system administrator

linstor v l shows this PVC with status:
┊ k8smaster ┊ pvc-44c6873d-4875-4c86-9588-a55c2a520302 ┊ lvm-hdd ┊ 0 ┊ 1132 ┊ /dev/drbd1132 ┊ 9.05 GiB ┊ Unused ┊ UpToDate ┊
┊ k8w6      ┊ pvc-44c6873d-4875-4c86-9588-a55c2a520302 ┊ lvm-hdd ┊ 0 ┊ 1132 ┊ /dev/drbd1132 ┊ 6.25 GiB ┊ Unused ┊ Outdated ┊

Resource not being deleted after PVC and PV deletion

When the PVC and PV get deleted, linstor resource list still shows all of the PVCs.

My setup:

    - helm upgrade piraeus-op ./charts/piraeus
      --install
      --create-namespace
      --namespace piraeus
      --values ./values.yaml
      --set etcd.enabled=false
      --set operator.controller.dbConnectionURL=jdbc:postgresql://10.10.6.5:5432/linstordb?user=linstoruser&password=123

Helm Values file:

operator:
  replicas: 3
  satelliteSet:
    kernelModuleInjectionImage: quay.io/piraeusdatastore/drbd9-focal:v9.0.27
    storagePools:
      lvmThinPools:
      - name: lvm-thin-hdd
        thinVolume: thinpoolhdd
        volumeGroup: "linstor_thinpoolhdd"
      - name: lvm-thin-ssd
        thinVolume: thinpoolssd
        volumeGroup: "linstor_thinpoolssd"
  controller:
    replicas: 3
csi:
  enableTopology: true
  controllerReplicas: 3
haController:
  replicas: 3

Created custom StorageClass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: linstor-hdd-r1
  annotations:
    storageclass.kubernetes.io/is-default-class: 'true'
provisioner: linstor.csi.linbit.com
parameters:
  allowRemoteVolumeAccess: 'true'
  autoPlace: '1'
  csi.storage.k8s.io/fstype: xfs
  disklessOnRemaining: 'false'
  mountOpts: 'noatime,discard'
  placementPolicy: FollowTopology
  resourceGroup: linstor-hdd-r1
  storagePool: lvm-thin-hdd
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Deployed a StatefulSet, it provisions nicely, then completely uninstalled it including namespace.

Linstor still has old Resources:
[screenshot: linstor resource list still showing the old PVC resources]

Is this expected behavior, a bug, or am I doing something wrong?

Configure LINSTOR to also log to files

Right now LINSTOR only logs to stdout, i.e. the logs are only available via Kubernetes logs. This is a problem when trying to access the logs from inside the container, e.g. for linstor sos-report create.

We should configure controller and satellite to actually log in their log directory. This probably needs to be done via the logback.xml config. A useful snippet may be:

sed 's|<!-- <appender-ref ref="FILE" /> -->|<appender-ref ref="FILE" />|g' logback.xml

Document setting operator.satelliteSet.kernelModuleInjectionImage

Install the appropriate kernel headers package for your distribution. Then the operator will compile and load the required modules.

This sentence implies that the operator will sort out the modules and there is nothing more the user needs to do. However, when they are not on Ubuntu Bionic, they also need to set operator.satelliteSet.kernelModuleInjectionImage.

This has long been a stumbling block. We should fix it.
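
The docs could show the setting right next to that sentence; for example (image tag taken from another issue in this tracker, pick the image matching your distribution):

helm install piraeus-op ./charts/piraeus \
  --set operator.satelliteSet.kernelModuleInjectionImage=quay.io/piraeusdatastore/drbd9-focal:v9.0.27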

[Feature Request] Global registry override for dark sites

As there are quite a few images and registries in values.yaml, it would be more convenient to have a global registry override for dark sites, where a single intranet registry is usually employed to get the job done.

It would help if something like the below were possible (overriding all of the quay.io, docker.io, and gcr.io registries):

helm install piraeus-op piraeus/piraeus --set global.registryOverride=myreg.io/piraeus

Securing components

Hi! I am testing Piraeus now and I got it working, but without configuring TLS etc. between components or for etcd. That part of the setup is a little complicated for me, and I am not sure how to proceed...

So the question is, what are the risks? I am using this with virtual servers from a cheap provider that doesn't offer a private network. Do I risk something if I use the setup as-is? If yes, is there a more detailed guide on securing the components?

Thanks!

Feature request: NVMe layer support

Please consider providing an option to support the NVMe layer for Linstor Satellites.
For example, a Satellite's pod might access modules required for the NVMe layer from a host, or maybe modules could be provided as a sidecar container.

Questions related to Piraeus usage

Hi Everyone,

I've managed to start up the Piraeus Operator in an RKE k8s cluster with 3 CentOS 8 nodes, etcd with 3 replicas and persistence through hostPath volumes, CSI snapshotter and Stork turned off, the kernel injector replaced for CentOS 8, and the hosts prepared with drbd and kernel headers as required. The 3 VMs live on LVM Thin provisioned by Proxmox, and all VMs have one device each (sda) for the OS and local node data. The backing storage is NVMe.

As far as I see, the storage documentation (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md) supports managing ZFS and LVM (Thin) pools through the operator. Since I'm not running the k8s cluster on bare metal, I'd need to create new disks in Proxmox (sdb) and attach them to the VMs. This way I could use the "Preparing physical devices" section to add these new disks to Piraeus. Question 1: is this "multi-level" or nested LVM thin provisioning supported? I think these sub-LVM Thin volumes would not mess with the external LVM Thin as these can contain any filesystem or LVM partition on their own. Am I right?

Question 2: My second question is related to the host disks (sda). Could I use these disks to provision storage to the pods running in k8s by using a folder on the host disks? What I originally wanted is to use the host VM filesystem to store data from pods in a highly available manner, replicated across VMs: like a hostPath or LocalVolume PV, but one that wouldn't depend on a single node. I'd like to store SQLite databases which would each be accessed by only one pod. I've found network file storage highly unreliable with SQLite (DB file corruption, locking issues, etc.), but since I have an NVMe disk and Linstor/Piraeus seems to be fast, I hope I could achieve this. Maybe I'm just not familiar enough with Piraeus/Linstor to know this is not possible and I need to stick to disks. If the folder would contain binary files storing the written blocks, and not the actual files, that would also work. As a workaround, maybe a loopback device pointing to a file that would contain the LVM Thin storage on each host would work (see the sketch after the diagram below)?

+------------------+
|                  |
|  VM       +------+--------+
|  running  |               |
|  RKE      |   sda (OS)    +<------+
|           |               |       |
|           +------+--------+       |
|                  |                |
|           +------+--------+       |
|           |               |       |
|           |   sdb (data)  +<---+  |
|           |               |    |  |
|           +------+--------+    |  |
|                  |             |  |
+------------------+             |  |
                                 |  |
+------------------+             |  |
|                  |             |  |
|  VM       +------+--------+    |  |
|  running  |               |    |  |
|  RKE      |   sda (OS)    +<------+
|           |               |    |  |
|           +------+--------+    |  |
|                  |             |  |
|           +------+--------+    |  |
|           |               |    |  |
|           |   sdb (data)  +<---+  |
|           |               |    |  |
|           +------+--------+    |  |
|                  |             |  |
+------------------+             |  |
                                 |  |
+------------------+             |  |
|                  |             |  |
|  VM       +------+--------+    |  |
|  running  |               |    |  |
|  RKE      |   sda (OS)    +<------+
|           |               |    |  |
|           +------+--------+    |  |
|                  |             |  |
|           +------+--------+    |  |
|           |               |    |  |
|           |   sdb (data)  +<---+  |
|           |               |    |  |
|           +------+--------+    |  |
|                  |             |  |
+------------------+             +  +

                           Linstor (Piraeus)
                           1. the whole sdb dev?
                           2. a folder on sda?
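
Regarding the loopback idea from Question 2, what I have in mind is roughly the following (untested, paths made up; loop devices would also need to be re-created after every reboot, e.g. via a systemd unit):

# create a backing file on the sda filesystem and expose it as a block device
truncate -s 50G /var/lib/piraeus-backing.img
LOOPDEV=$(losetup --find --show /var/lib/piraeus-backing.img)
echo "$LOOPDEV"   # e.g. /dev/loop10; this path would then go into devicePaths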

Question 3: Since I haven't provisioned storage through satellite sets yet, is this why I can't see any new StorageClass created in the cluster? If I check the contents of the non-operator Piraeus deployment (https://raw.githubusercontent.com/piraeusdatastore/piraeus/master/deploy/all.yaml), there are multiple storage classes defined. Does the operator only create these once the satellites are configured with storage?
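
My guess is that the operator does not create any StorageClass on its own and one has to be added manually once a storage pool exists; something roughly like this is what I would try (pool name hypothetical):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-lvm-thin
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer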

$ kubectl exec -it deployment.apps/piraeus-op-cs-controller -- bash

root@piraeus-op-cs-controller-7fb6c98656-zvx49:/# linstor n list
╭──────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node                                      ┊ NodeType   ┊ Addresses                  ┊ State  ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ piraeus-op-cs-controller-7fb6c98656-zvx49 ┊ CONTROLLER ┊ 10.42.1.72:3366 (PLAIN)    ┊ Online ┊
┊ node1                                     ┊ SATELLITE  ┊ 192.168.1.201:3366 (PLAIN) ┊ Online ┊
┊ node2                                     ┊ SATELLITE  ┊ 192.168.1.202:3366 (PLAIN) ┊ Online ┊
┊ node3                                     ┊ SATELLITE  ┊ 192.168.1.203:3366 (PLAIN) ┊ Online ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

root@piraeus-op-cs-controller-7fb6c98656-zvx49:/# linstor n info
╭─────────────────────────────────────────────────────────────╮
┊ Node      ┊ Diskless ┊ LVM ┊ LVMThin ┊ ZFS/Thin ┊ File/Thin ┊
╞═════════════════════════════════════════════════════════════╡
┊ node1     ┊ +        ┊ +   ┊ +       ┊ -        ┊ +         ┊
┊ node2     ┊ +        ┊ +   ┊ +       ┊ -        ┊ +         ┊
┊ node3     ┊ +        ┊ +   ┊ +       ┊ -        ┊ +         ┊
╰─────────────────────────────────────────────────────────────╯
Unsupported storage providers:
 node1: 
  SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
  ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
  ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
 node2: 
  SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
  ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
  ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
 node3: 
  SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
  ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
  ZFS: 'cat /sys/module/zfs/version' returned with exit code 1

╭──────────────────────────────────────────╮
┊ Node      ┊ DRBD ┊ LUKS ┊ NVMe ┊ Storage ┊
╞══════════════════════════════════════════╡
┊ node1     ┊ +    ┊ -    ┊ +    ┊ +       ┊
┊ node2     ┊ +    ┊ -    ┊ +    ┊ +       ┊
┊ node3     ┊ +    ┊ -    ┊ +    ┊ +       ┊
╰──────────────────────────────────────────╯
Unsupported resource layers:
 node1: 
  LUKS: IO exception occured when running 'cryptsetup --version': Cannot run program "cryptsetup": error=2, No such file or directory
 node2: 
  LUKS: IO exception occured when running 'cryptsetup --version': Cannot run program "cryptsetup": error=2, No such file or directory
 node3: 
  LUKS: IO exception occured when running 'cryptsetup --version': Cannot run program "cryptsetup": error=2, No such file or directory

root@piraeus-op-cs-controller-7fb6c98656-zvx49:/# linstor v list
╭──────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════╡
╰──────────────────────────────────────────────────────────────────────────────────────────╯

Note: I have cryptsetup installed on the nodes, so I don't know why it's reported as unavailable. However, my Helm values have an empty operator.controller.luksSecret value; maybe that's why.

Note: this is a home k8s cluster, not affiliated with my work, and I'm on holiday today. The problem that started all this is Sonarr/Radarr corrupting their SQLite DBs on NFS storage provided through nfs-client-provisioner. I'm looking for a "more cloud-native" solution than Ceph, more performant than OpenEBS Jiva/cStor and Longhorn, and more stable than OpenEBS Mayastor. Linstor/Piraeus seemed like a really good alternative based on multiple benchmarks, and easy to set up. It indeed looks promising, and I really hope I can use it once I get to know it :) Rancher has a host-path provisioner project that could share host storage with pods, but it would still be tied to the node holding the shared folder, and I'd like data replication among nodes.

Two controllers?

Hi, I just tried the new piraeus-operator and found that it deploys two controllers, one as kind: Deployment and another as kind: StatefulSet:

# k get deploy
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
piraeus-op-cs-controller     1/1     1            1           12h
piraeus-op-csi-controller    1/1     1            1           10d
piraeus-op-operator          1/1     1            1           10d
piraeus-op-stork             1/1     1            1           10d
piraeus-op-stork-scheduler   1/1     1            1           10d

# k get sts
NAME                       READY   AGE
piraeus-op-cs-controller   0/1     10d
piraeus-op-etcd            1/1     10d
snapshot-controller        1/1     10d

# k get pod -l app=piraeus-op-cs,role=piraeus-controller
NAME                                        READY   STATUS    RESTARTS   AGE
piraeus-op-cs-controller-0                  1/1     Running   0          5m25s
piraeus-op-cs-controller-75fc99d964-c5z87   1/1     Running   0          12h
root@piraeus-op-cs-controller-0:/# linstor n l
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node                                      ┊ NodeType   ┊ Addresses                 ┊ State                     ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ ks1                                       ┊ SATELLITE  ┊ 172.16.0.6:3366 (PLAIN)   ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ ks2                                       ┊ SATELLITE  ┊ 172.16.0.7:3366 (PLAIN)   ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ ks3                                       ┊ SATELLITE  ┊ 172.16.0.8:3366 (PLAIN)   ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ ks4                                       ┊ SATELLITE  ┊ 172.16.0.9:3366 (PLAIN)   ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ piraeus-op-cs-controller-0                ┊ CONTROLLER ┊ 10.244.3.216:3366 (PLAIN) ┊ OFFLINE                   ┊
┊ piraeus-op-cs-controller-75fc99d964-c5z87 ┊ CONTROLLER ┊ 10.244.2.152:3366 (PLAIN) ┊ Online                    ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Pod resources

What

We want to make all pods configurable with resource requests and limits.

Why

Some k8s distributions require resource requests and limits to be set before they will schedule pods; pods without them will never start.
On the other hand, for quick tests it is often desirable to leave requests out entirely (the test machine may not be able to provide the requested resources).

For whom

  • For users with k8s distributions that require a resource specification
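
A sketch of what such Helm values could look like. The key paths (operator.resources, operator.controller.resources, operator.satelliteSet.resources, csi.resources) are hypothetical, being exactly what this request asks for; only the nested requests/limits block is the standard Kubernetes resource schema. An empty map would keep the current "no requests, no limits" behaviour for quick tests.

operator:
  resources:                     # hypothetical: the operator deployment itself
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  controller:
    resources: {}                # empty = no requests/limits (test clusters)
  satelliteSet:
    resources: {}
csi:
  resources: {}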

How to on existing ZFS pool?

I'm new to Linstor and the related tooling, and was wondering if someone could show me the YAML config needed to add an existing ZFS pool to a satellite via the piraeus-operator. There seem to be lots of docs on LVM but not much on ZFS.
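
For reference, a sketch of what the values might look like, assuming the satellite set exposes a zfsPools list alongside the LVM pools. The field names (name, zPool, thin) are assumptions modelled on the LVM pool entries, and "tank" is a placeholder for the existing zpool; check doc/storage.md for the authoritative schema of your chart version.

operator:
  satelliteSet:
    storagePools:
      zfsPools:                 # assumed analogue of lvmPools/lvmThinPools
        - name: zfs-data        # LINSTOR storage pool name
          zPool: tank           # existing ZFS pool on the node
          thin: true            # back the pool with ZFS_THIN instead of ZFS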

rpc error: code = DeadlineExceeded desc = context deadline exceeded

I just did a re-installation of Linstor on my 5-node k8s cluster (Kubernetes 1.18.9, DRBD 9.0.25).
It is not possible to create any PVC; I always get a DeadlineExceeded error.
Any solution? What can I do to debug this? Thanks.

This is the error from csi-plugin:

time="2020-10-13T11:32:42Z" level=error msg="method failed" func="github.com/sirupsen/logrus.(*Entry).Error" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:297" error="rpc error: code = Internal desc = CreateVolume failed for pvc-a36918fc-0bb3-40ce-aace-ce86288d93d6: unable to determine volume topology: unable to determine AccessibleTopologies: context canceled" linstorCSIComponent=driver method=/csi.v1.Controller/CreateVolume nodeID=k8w2 provisioner=linstor.csi.linbit.com req="name:"pvc-a36918fc-0bb3-40ce-aace-ce86288d93d6" capacity_range:<required_bytes:1000000000 > volume_capabilities:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > parameters:<key:"autoPlace" value:"1" > parameters:<key:"storagePool" value:"lvm-hdd" > " resp= version=v0.9.0

[Bug] REST-API https port 3371 does not come up unless "--rest-bind=0.0.0.0:3370" is removed from piraeus-server entry.sh

Currently, piraeus-server starts by default with the command argument --rest-bind=0.0.0.0:3370, as in https://github.com/piraeusdatastore/piraeus/blob/63dc06e13e7841607a2ec433349ef717ed2af498/dockerfiles/piraeus-server/entry.sh#L12

As a result, when controller.linstorHttpsControllerSecret is set, the HTTPS port 3371 cannot come up.

Aug 13, 2020 11:23:00 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:23:00 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:23:01 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:3370]
Aug 13, 2020 11:23:01 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
11:23:01.150 [Main] ERROR LINSTOR/Controller - SYSTEM - Unable to start grizzly http server on 0.0.0.0:3370.
11:23:01.152 [Main] INFO  LINSTOR/Controller - SYSTEM - Controller initialized

After removing --rest-bind=0.0.0.0:3370 from the command arguments, port 3371 comes up.

Aug 13, 2020 11:13:18 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:13:18 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
11:13:19.060 [Main] ERROR LINSTOR/Controller - SYSTEM - Unable to start grizzly http server on [::]:3370.
11:13:19.061 [Main] INFO  LINSTOR/Controller - SYSTEM - Trying to start grizzly http server on fallback ipv4: 0.0.0.0
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:3370]
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-1] Started.
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:3371]
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-2] Started.
11:13:20.539 [Main] INFO  LINSTOR/Controller - SYSTEM - Controller initialized

Feature request: Storage Pools configuration options

Please consider providing an option to configure storage pools independently per node.
Currently a drbdpool storage pool is created on every node, but I only need it on selected nodes; on the other nodes I would like to have just a diskless storage pool.

Please also consider adding a way to set specific properties on storage pools, for example PrefNic.
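
Purely to illustrate the request, a hypothetical values sketch; the nodes and properties keys are invented here and do not exist in the chart today:

operator:
  satelliteSet:
    storagePools:
      lvmPools:
        - name: drbdpool
          volumeGroup: drbdpool
          nodes: [node2, node3]     # hypothetical: only create the pool on these nodes
          properties:               # hypothetical: pass-through of LINSTOR pool properties
            PrefNic: nic1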

Read only filesystem after reboot k8s nodes

Some of my PVCs are read-only after a reboot of the k8s nodes.
linstor v l shows the volumes as UpToDate.

root@piraeus-op-cs-controller-667d5b46fb-rx9nz:/# linstor v l | grep 2bb1
| k8w1 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | DfltDisklessStorPool | 0 | 1037 | /dev/drbd1037 | | InUse | Diskless |
| k8w2 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | lvm-any50 | 0 | 1037 | /dev/drbd1037 | 4.70 GiB | Unused | UpToDate |
| k8w3 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | lvm-any50 | 0 | 1037 | /dev/drbd1037 | 296.69 MiB | Unused | UpToDate |
| k8w4 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | lvm-any50 | 0 | 1037 | /dev/drbd1037 | 303.25 MiB | Unused | UpToDate |

How can I fix this?

Storage Pool raidLevel property not accepting any LVM raid level value

I was setting up Piraeus in my homelab and I'm trying to set the raidLevel property[1] of an LVM pool, but none of the values I supply are accepted.
I've tried values that should be supported by LVM, such as striped, raid0, mirror and raid1, and even literal RAID numbers (e.g. "0"), but they all fail at runtime with an error similar to the following in the operator logs:

time="2021-01-30T23:16:14Z" level=info msg="satellite Reconcile: reconcile loop end" Controller=linstorsatelliteset controller=LinstorSatelliteSet err="multiple errors: Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017344]'|Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017345]'" requestName=piraeus-ns requestNamespace=storage-operators result="{false 0s}"
{"level":"error","ts":1612048574.3175702,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"LinstorSatelliteSet-controller","request":"storage-operators/piraeus-ns","error":"multiple errors: Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017344]'|Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017345]'","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/runner/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90"}

Going through LINBIT's linstor-server repo to locate the mentioned enum[2], it looks like the field can't actually do anything right now. It seems the only value you can use would be jbod, which is the default anyway.
Or am I missing something and is this supposed to work?

[1] https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md#lvmpools-configuration
[2] https://github.com/LINBIT/linstor-server/blob/master/server/src/main/java/com/linbit/linstor/storage/kinds/RaidLevel.java
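
For context, a minimal sketch of the kind of pool definition being attempted, following the lvmPools layout from [1]; the pool and volume-group names are placeholders, and the raidLevel line is exactly what produces the rejection above.

operator:
  satelliteSet:
    storagePools:
      lvmPools:
        - name: lvm-raid
          volumeGroup: drbdpool
          raidLevel: raid0      # rejected with "No enum constant ...RaidLevel.RAID0"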
