piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
Home Page: https://piraeus.io/
License: Apache License 2.0
We tried to install Piraeus with a MariaDB database instead of etcd. Installation with etcd completed without any problem.
We tried various commands and users; nothing worked. Commands we tried include:
helm install -n linstor piraeus-op ./charts/piraeus --set "operator.controller.dbConnectionURL=jdbc:mysql://user:[email protected]:3306/linstor?createDatabaseIfNotExist=true&useMysqlMetadata=true"
and variations on it.
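One thing worth ruling out is shell handling of the `&` in the connection URL, and LINSTOR ships the MariaDB JDBC driver, so the `jdbc:mariadb://` scheme may be needed instead of `jdbc:mysql://` (both points are assumptions, not confirmed fixes). Passing the URL through a values file sidesteps the quoting question entirely:

```shell
# Sketch: put the connection URL in a values file instead of --set,
# so the '&' in the URL cannot be mangled by the shell.
# 'jdbc:mariadb://' is an assumption; the original attempt used 'jdbc:mysql://'.
cat > /tmp/mariadb-values.yaml <<'EOF'
operator:
  controller:
    dbConnectionURL: "jdbc:mariadb://user:[email protected]:3306/linstor?createDatabaseIfNotExist=true&useMysqlMetadata=true"
EOF
# Then (not run here):
# helm install -n linstor piraeus-op ./charts/piraeus -f /tmp/mariadb-values.yaml
cat /tmp/mariadb-values.yaml
```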
Here is the log from the cs-controller:
I0129 12:20:12.423886 1 leaderelection.go:242] attempting to acquire leader lease linstor/piraeus-op-cs...
time="2021-01-29T12:20:12Z" level=info msg="long live our new leader: 'piraeus-op-cs-controller-7d99c4fff5-vv4bx'!"
I0129 12:20:42.998653 1 leaderelection.go:252] successfully acquired lease linstor/piraeus-op-cs
time="2021-01-29T12:20:42Z" level=info msg="long live our new leader: 'piraeus-op-cs-controller-7d99c4fff5-zvm6w'!"
time="2021-01-29T12:20:43Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[startController]'"
LINSTOR, Module Controller
Version: 1.11.1 (fe95a94d86c66c6c9846a3cf579a1a776f95d3f4)
Build time: 2021-01-13T08:34:55+00:00
Java Version: 11
Java VM: Debian, Version 11.0.9.1+1-post-Debian-1deb10u2
Operating system: Linux, Version 5.4.0-64-generic
Environment: amd64, 1 processors, 1925 MiB memory reserved for allocations
System components initialization in progress
12:20:43.951 [main] INFO LINSTOR/Controller - SYSTEM - ErrorReporter DB first time init.
12:20:43.953 [main] INFO LINSTOR/Controller - SYSTEM - Log directory set to: '/var/log/linstor-controller'
12:20:43.988 [main] WARN io.sentry.dsn.Dsn - *** Couldn't find a suitable DSN, Sentry operations will do nothing! See documentation: https://docs.sentry.io/clients/java/ ***
12:20:43.999 [Main] INFO LINSTOR/Controller - SYSTEM - Loading API classes started.
12:20:44.332 [Main] INFO LINSTOR/Controller - SYSTEM - API classes loading finished: 332ms
12:20:44.332 [Main] INFO LINSTOR/Controller - SYSTEM - Dependency injection started.
12:20:44.344 [Main] INFO LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
12:20:44.345 [Main] INFO LINSTOR/Controller - SYSTEM - Dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" was successful
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/linstor-server/lib/guice-4.2.3.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
12:20:45.308 [Main] INFO LINSTOR/Controller - SYSTEM - Dependency injection finished: 976ms
12:20:45.604 [Main] INFO LINSTOR/Controller - SYSTEM - Initializing authentication subsystem
12:20:45.717 [Main] INFO LINSTOR/Controller - SYSTEM - SpaceTracking using SQL driver
12:20:45.719 [Main] INFO LINSTOR/Controller - SYSTEM - SpaceTrackingService: Instance added as a system service
12:20:45.720 [Main] INFO LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
12:20:45.721 [Main] INFO LINSTOR/Controller - SYSTEM - Initializing the database connection pool
12:20:45.767 [Main] ERROR LINSTOR/Controller - SYSTEM - Database initialization error [Report number 6013FD9B-00000-000000]
12:20:45.774 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutdown in progress
12:20:45.778 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutting down service instance 'SpaceTrackingService' of type SpaceTrackingService
12:20:45.780 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Waiting for service instance 'SpaceTrackingService' to complete shutdown
12:20:45.780 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutting down service instance 'TaskScheduleService' of type TaskScheduleService
12:20:45.781 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Waiting for service instance 'TaskScheduleService' to complete shutdown
12:20:45.781 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutting down service instance 'DatabaseService' of type DatabaseService
12:20:45.782 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Waiting for service instance 'DatabaseService' to complete shutdown
12:20:45.782 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutting down service instance 'TimerEventService' of type TimerEventService
12:20:45.783 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Waiting for service instance 'TimerEventService' to complete shutdown
12:20:45.784 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutdown complete
time="2021-01-29T12:20:45Z" level=fatal msg="failed to run" err="exit status 20"
Linstor CSI supports the CSI Topology feature. However, it is disabled. See here
One reason someone might want to disable this feature: piraeusdatastore/linstor-csi#54
In short: the way it is implemented might be unsuitable for large clusters. (I suspect this is just a limitation of how Linstor CSI implemented the feature, but I haven't investigated further.)
Adding a new value to the CSI resource to toggle this flag shouldn't be an issue. That leaves the following questions:
My opinion: yes, it's a useful feature.
My opinion: yes, as there are certain problems with this feature, as linked above.
My opinion: default to "yes". A default installation of the operator should provide the fullest feature set for users who just want to try it out.
Opinions @alexzhc @JoelColledge?
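For illustration, such a toggle on the LinstorCSIDriver resource might look like this (the field name `enableTopology` is an assumption, not an existing API, and the API group/version may differ):

```yaml
apiVersion: piraeus.linbit.com/v1
kind: LinstorCSIDriver
metadata:
  name: piraeus-op
spec:
  # hypothetical field; per the proposal above it would default to "yes"
  enableTopology: true
```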
As a developer I want to generate certificates for encrypted communication inside Linstor (link) using Kubernetes/OpenShift capabilities.
type: kubernetes.io/tls
Kubernetes and OpenShift provide the ability to generate certificates in the following format:
[root@utility1 ~]# kubectl get secret node-secret -n cert-manager
NAME TYPE DATA AGE
node-secret kubernetes.io/tls 5 6d
[root@utility1 ~]# kubectl describe secret node-secret -n cert-manager
Name: node-secret
Namespace: cert-manager
Labels: <none>
Annotations: cert-manager.io/alt-names:
cert-manager.io/certificate-name: node-secret
cert-manager.io/common-name: node-secret
cert-manager.io/ip-sans:
cert-manager.io/issuer-kind: Issuer
cert-manager.io/issuer-name: selfsigned-issuer
cert-manager.io/uri-sans:
Type: kubernetes.io/tls
Data
====
tls.key: 1675 bytes
truststore.jks: 801 bytes
ca.crt: 1062 bytes
keystore.jks: 2867 bytes
keystore.p12: 3127 bytes
tls.crt: 1062 bytes
Currently Linstor doesn't have unified naming for certificate secrets:
Linstor components - use keystore.jks, truststore.jks
Linstor API - uses keystore.jks, ca.pem, client.key, truststore.jks
ETCD - uses cert.pem, key.pem, ca.pem, client.cert, client.key
Inside Kubernetes the naming is always tls.crt/ca.crt/tls.key, and Linstor should support this convention.
Currently Linstor doesn't have a unified certificate format:
Linstor components - stored in JKS format
Linstor API - stored in PKCS12 format
ETCD cluster - PKCS8 + X.509
Linstor should use certificates and keys in a unified format, as in Kubernetes.
Certificate validity periods are limited. Linstor should detect secret updates and automatically apply the new certificate without downtime.
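For context, the JKS/PKCS12 entries shown in the secret above can be generated by cert-manager's keystores feature; a sketch, reusing the issuer and names from the example secret (the password secret `jks-password` is an assumption):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: node-secret
  namespace: cert-manager
spec:
  secretName: node-secret
  commonName: node-secret
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
  keystores:
    jks:
      create: true          # adds keystore.jks and truststore.jks to the secret
      passwordSecretRef:    # assumed secret holding the keystore password
        name: jks-password
        key: password
    pkcs12:
      create: true          # adds keystore.p12 to the secret
      passwordSecretRef:
        name: jks-password
        key: password
```

cert-manager always writes tls.crt/tls.key/ca.crt as well, so one Certificate can serve both the Kubernetes-native naming and the JKS-based components.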
Hi, @WanzenBug
Since cs-controller and csi-controller may both have multiple replicas for HA, it is important to add podAntiAffinity so that pods land on different nodes.
For cs-controller, I can do:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: role
          operator: In
          values:
          - piraeus-controller
      topologyKey: kubernetes.io/hostname
However, the csi-controller's labels are not articulated enough for affinity use:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-11-02T08:05:18Z"
  generation: 1
  labels:
    app: piraeus-op
  name: piraeus-op-csi-controller
  namespace: storage-system
Recommendations:
add the label role: piraeus-csi-controller to op-csi-controller
add podAntiAffinity for op-csi-controller
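Applied to the csi-controller Deployment, this could look like the following sketch (label key/value as proposed above, not current chart output):

```yaml
spec:
  template:
    metadata:
      labels:
        role: piraeus-csi-controller   # proposed label, matched by the selector below
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: role
                operator: In
                values:
                - piraeus-csi-controller
            topologyKey: kubernetes.io/hostname
```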
make this the default for both controllers?

Can I fix this issue without running the controller pod with root privileges?
log:
time="2020-11-18T19:13:48Z" level=info msg="running k8s-await-election" version=refs/tags/v0.2.0
I1118 19:13:48.722244 1 leaderelection.go:242] attempting to acquire leader lease piraeus-operator/piraeus-cs...
time="2020-11-18T19:13:51Z" level=info msg="long live our new leader: 'piraeus-cs-controller-5f998d764b-btsjm'!"
I1120 12:17:59.154742 1 leaderelection.go:252] successfully acquired lease piraeus-operator/piraeus-cs
time="2020-11-20T12:17:59Z" level=info msg="long live our new leader: 'piraeus-cs-controller-5f998d764b-zz46s'!"
time="2020-11-20T12:17:59Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[startController]'"
LINSTOR, Module Controller
Version: 1.9.0 (678acd24a8b9b73a735407cd79ca33a5e95eb2e2)
Build time: 2020-09-23T09:33:23+00:00
Java Version: 11
Java VM: Debian, Version 11.0.8+10-post-Debian-1deb10u1
Operating system: Linux, Version 5.6.19-300.fc32.x86_64
Environment: amd64, 1 processors, 29694 MiB memory reserved for allocations
System components initialization in progress
12:17:59,493 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
12:17:59,494 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
12:17:59,494 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/usr/share/linstor-server/lib/conf/logback.xml]
12:17:59,549 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
12:17:59,551 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
12:17:59,553 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
12:17:59,556 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
12:17:59,568 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
12:17:59,569 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILE]
12:17:59,573 |-INFO in ch.qos.logback.core.rolling.FixedWindowRollingPolicy@2de23121 - Will use zip compression
12:17:59,575 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Active log file name: /var/log/linstor-controller/linstor-Controller.log
12:17:59,575 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - File property is set to [/var/log/linstor-controller/linstor-Controller.log]
12:17:59,576 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Failed to create parent directories for [/var/log/linstor-controller/linstor-Controller.log]
12:17:59,576 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - openFile(/var/log/linstor-controller/linstor-Controller.log,true) call failed. java.io.FileNotFoundException: /var/log/linstor-controller/linstor-Controller.log (No such file or directory)
at java.io.FileNotFoundException: /var/log/linstor-controller/linstor-Controller.log (No such file or directory)
at at java.base/java.io.FileOutputStream.open0(Native Method)
at at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at at ch.qos.logback.core.recovery.ResilientFileOutputStream.<init>(ResilientFileOutputStream.java:26)
at at ch.qos.logback.core.FileAppender.openFile(FileAppender.java:204)
at at ch.qos.logback.core.FileAppender.start(FileAppender.java:127)
at at ch.qos.logback.core.rolling.RollingFileAppender.start(RollingFileAppender.java:100)
at at ch.qos.logback.core.joran.action.AppenderAction.end(AppenderAction.java:90)
at at ch.qos.logback.core.joran.spi.Interpreter.callEndAction(Interpreter.java:309)
at at ch.qos.logback.core.joran.spi.Interpreter.endElement(Interpreter.java:193)
at at ch.qos.logback.core.joran.spi.Interpreter.endElement(Interpreter.java:179)
at at ch.qos.logback.core.joran.spi.EventPlayer.play(EventPlayer.java:62)
at at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:165)
at at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:152)
at at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:110)
at at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:53)
at at ch.qos.logback.classic.util.ContextInitializer.configureByResource(ContextInitializer.java:75)
at at ch.qos.logback.classic.util.ContextInitializer.autoConfig(ContextInitializer.java:150)
at at org.slf4j.impl.StaticLoggerBinder.init(StaticLoggerBinder.java:84)
at at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:55)
at at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
at at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
at at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:417)
at at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:362)
at at com.linbit.linstor.logging.StdErrorReporter.<init>(StdErrorReporter.java:75)
at at com.linbit.linstor.core.Controller.main(Controller.java:450)
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [LINSTOR/Controller] to INFO
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [LINSTOR/Controller] to false
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[LINSTOR/Controller]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [LINSTOR/Satellite] to INFO
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [LINSTOR/Satellite] to false
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[LINSTOR/Satellite]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [LINSTOR/TESTS] to OFF
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [LINSTOR/TESTS] to false
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[LINSTOR/TESTS]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
12:17:59,576 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
12:17:59,576 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
12:17:59,577 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6913c1fb - Registering current configuration as safe fallback point
12:17:59.580 [main] ERROR LINSTOR/Controller - SYSTEM - Unable to create log directory: /var/log/linstor-controller
org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.TraceSystem.logWritingError(TraceSystem.java:289)
at org.h2.message.TraceSystem.openWriter(TraceSystem.java:310)
at org.h2.message.TraceSystem.writeFile(TraceSystem.java:258)
at org.h2.message.TraceSystem.write(TraceSystem.java:242)
at org.h2.message.Trace.error(Trace.java:196)
at org.h2.engine.Database.openDatabase(Database.java:314)
at org.h2.engine.Database.<init>(Database.java:280)
at org.h2.engine.Engine.openSession(Engine.java:66)
at org.h2.engine.Engine.openSession(Engine.java:179)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:157)
at org.h2.engine.Engine.createSession(Engine.java:140)
at org.h2.engine.Engine.createSession(Engine.java:28)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:351)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:124)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:103)
at org.h2.Driver.connect(Driver.java:69)
at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:115)
at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:665)
at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
at com.linbit.linstor.logging.H2ErrorReporter.setupErrorDB(H2ErrorReporter.java:66)
at com.linbit.linstor.logging.H2ErrorReporter.<init>(H2ErrorReporter.java:61)
at com.linbit.linstor.logging.StdErrorReporter.<init>(StdErrorReporter.java:107)
at com.linbit.linstor.core.Controller.main(Controller.java:450)
Caused by: org.h2.jdbc.JdbcSQLException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
... 27 more
Caused by: org.h2.message.DbException: Error while creating file "/var/log/linstor-controller" [90062-197]
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.store.fs.FilePathDisk.createDirectory(FilePathDisk.java:271)
at org.h2.store.fs.FileUtils.createDirectory(FileUtils.java:42)
at org.h2.store.fs.FileUtils.createDirectories(FileUtils.java:312)
at org.h2.message.TraceSystem.openWriter(TraceSystem.java:300)
... 24 more
Caused by: org.h2.jdbc.JdbcSQLException: Error while creating file "/var/log/linstor-controller" [90062-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
... 30 more
12:18:02.218 [main] ERROR LINSTOR/Controller - SYSTEM - Unable to operate the error-reports database: java.sql.SQLException: Cannot create PoolableConnectionFactory (Error while creating file "/var/log/linstor-controller" [90062-197])
12:18:02.218 [main] INFO LINSTOR/Controller - SYSTEM - Log directory set to: '/var/log/linstor-controller'
12:18:02.242 [main] WARN io.sentry.dsn.Dsn - *** Couldn't find a suitable DSN, Sentry operations will do nothing! See documentation: https://docs.sentry.io/clients/java/ ***
12:18:02.250 [Main] INFO LINSTOR/Controller - SYSTEM - Loading API classes started.
12:18:02.486 [Main] INFO LINSTOR/Controller - SYSTEM - API classes loading finished: 236ms
12:18:02.486 [Main] INFO LINSTOR/Controller - SYSTEM - Dependency injection started.
12:18:02.501 [Main] INFO LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
12:18:02.501 [Main] INFO LINSTOR/Controller - SYSTEM - Extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" is not installed
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/linstor-server/lib/guice-4.2.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
12:18:03.175 [Main] INFO LINSTOR/Controller - SYSTEM - Dependency injection finished: 689ms
12:18:03.375 [Main] INFO LINSTOR/Controller - SYSTEM - Initializing authentication subsystem
12:18:03.459 [Main] INFO LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
12:18:03.460 [Main] INFO LINSTOR/Controller - SYSTEM - Initializing the etcd database
12:18:04.008 [Main] INFO LINSTOR/Controller - SYSTEM - Starting service instance 'ETCDDatabaseService' of type ETCDDatabaseService
12:18:04.009 [Main] INFO LINSTOR/Controller - SYSTEM - Loading security objects
12:18:04.090 [Main] INFO LINSTOR/Controller - SYSTEM - Current security level is NO_SECURITY
12:18:04.144 [Main] INFO LINSTOR/Controller - SYSTEM - Core objects load from database is in progress
12:18:06.684 [Main] INFO LINSTOR/Controller - SYSTEM - Core objects load from database completed
12:18:06.703 [Main] INFO LINSTOR/Controller - SYSTEM - Starting service instance 'TaskScheduleService' of type TaskScheduleService
12:18:06.704 [Main] INFO LINSTOR/Controller - SYSTEM - Initializing network communications services
12:18:06.704 [Main] WARN LINSTOR/Controller - SYSTEM - The SSL network communication service 'DebugSslConnector' could not be started because the keyStore file (/etc/linstor/ssl/keystore.jks) is missing
Unable to create error report file for error report 5FB7B3F7-00000-000000:
/var/log/linstor-controller/ErrorReport-5FB7B3F7-00000-000000.log (No such file or directory)
The error report will be written to the standard error stream instead.
ERROR REPORT 5FB7B3F7-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T09:33:23+00:00
Error time: 2020-11-20 12:18:06
Node: piraeus-cs-controller-5f998d764b-zz46s
============================================================
Reported error:
===============
Category: Error
Class name: ImplementationError
Class canonical name: com.linbit.ImplementationError
Generated at: Method 'execute', Source file 'TaskScheduleService.java', Line #289
Error message: Unhandled exception caught in com.linbit.linstor.tasks.TaskScheduleService
Error context:
This exception was generated in the service thread of the service 'TaskScheduleService'
Call backtrace:
Method Native Class:Line number
execute N com.linbit.linstor.tasks.TaskScheduleService:289
run N com.linbit.linstor.tasks.TaskScheduleService:196
run N java.lang.Thread:834
Caused by:
==========
Category: RuntimeException
Class name: LinStorRuntimeException
Class canonical name: com.linbit.linstor.LinStorRuntimeException
Generated at: Method 'archiveLogDirectory', Source file 'StdErrorReporter.java', Line #730
Error message: Unable to list log directory
Call backtrace:
Method Native Class:Line number
archiveLogDirectory N com.linbit.linstor.logging.StdErrorReporter:730
run N com.linbit.linstor.tasks.LogArchiveTask:49
execute N com.linbit.linstor.tasks.TaskScheduleService:282
run N com.linbit.linstor.tasks.TaskScheduleService:196
run N java.lang.Thread:834
Caused by:
==========
Category: Exception
Class name: NoSuchFileException
Class canonical name: java.nio.file.NoSuchFileException
Generated at: Method 'translateToIOException', Source file 'UnixException.java', Line #92
Error message: /var/log/linstor-controller
Call backtrace:
Method Native Class:Line number
translateToIOException N sun.nio.fs.UnixException:92
rethrowAsIOException N sun.nio.fs.UnixException:111
rethrowAsIOException N sun.nio.fs.UnixException:116
newDirectoryStream N sun.nio.fs.UnixFileSystemProvider:432
newDirectoryStream N java.nio.file.Files:471
list N java.nio.file.Files:3698
archiveLogDirectory N com.linbit.linstor.logging.StdErrorReporter:652
run N com.linbit.linstor.tasks.LogArchiveTask:49
execute N com.linbit.linstor.tasks.TaskScheduleService:282
run N com.linbit.linstor.tasks.TaskScheduleService:196
run N java.lang.Thread:834
END OF ERROR REPORT.
12:18:06.709 [Main] INFO LINSTOR/Controller - SYSTEM - Created network communication service 'PlainConnector'
12:18:06.709 [Main] WARN LINSTOR/Controller - SYSTEM - The SSL network communication service 'SslConnector' could not be started because the keyStore file (/etc/linstor/ssl/keystore.jks) is missing
12:18:06.709 [Main] INFO LINSTOR/Controller - SYSTEM - Created network communication service 'SslConnector'
12:18:06.711 [Main] INFO LINSTOR/Controller - SYSTEM - Reconnecting to previously known nodes
12:18:06.730 [Main] INFO LINSTOR/Controller - SYSTEM - Reconnect requests sent
Nov 20, 2020 12:18:07 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [[::]:3370]
Nov 20, 2020 12:18:07 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
12:18:07.656 [Main] INFO LINSTOR/Controller - SYSTEM - Controller initialized
org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
org.h2.message.DbException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.TraceSystem.logWritingError(TraceSystem.java:289)
at org.h2.message.TraceSystem.openWriter(TraceSystem.java:310)
at org.h2.message.TraceSystem.writeFile(TraceSystem.java:258)
at org.h2.message.TraceSystem.write(TraceSystem.java:242)
at org.h2.message.Trace.error(Trace.java:196)
at org.h2.engine.Database.openDatabase(Database.java:314)
at org.h2.engine.Database.<init>(Database.java:280)
at org.h2.engine.Engine.openSession(Engine.java:66)
at org.h2.engine.Engine.openSession(Engine.java:179)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:157)
at org.h2.engine.Engine.createSession(Engine.java:140)
at org.h2.engine.Engine.createSession(Engine.java:28)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:351)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:124)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:103)
at org.h2.Driver.connect(Driver.java:69)
at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:115)
at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:665)
at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
at com.linbit.linstor.logging.H2ErrorReporter.writeErrorReportToDB(H2ErrorReporter.java:105)
at com.linbit.linstor.logging.StdErrorReporter.reportErrorImpl(StdErrorReporter.java:301)
at com.linbit.linstor.logging.StdErrorReporter.reportError(StdErrorReporter.java:271)
at com.linbit.linstor.tasks.TaskScheduleService.execute(TaskScheduleService.java:286)
at com.linbit.linstor.tasks.TaskScheduleService.run(TaskScheduleService.java:196)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.h2.jdbc.JdbcSQLException: Log file error: "/var/log/linstor-controller/error-report.trace.db", cause: "org.h2.message.DbException: Error while creating file ""/var/log/linstor-controller"" [90062-197]" [90034-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
... 29 more
Caused by: org.h2.message.DbException: Error while creating file "/var/log/linstor-controller" [90062-197]
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.store.fs.FilePathDisk.createDirectory(FilePathDisk.java:271)
at org.h2.store.fs.FileUtils.createDirectory(FileUtils.java:42)
at org.h2.store.fs.FileUtils.createDirectories(FileUtils.java:312)
at org.h2.message.TraceSystem.openWriter(TraceSystem.java:300)
... 26 more
Caused by: org.h2.jdbc.JdbcSQLException: Error while creating file "/var/log/linstor-controller" [90062-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
... 32 more
12:18:09.291 [TaskScheduleService] ERROR LINSTOR/Controller - SYSTEM - Unable to write error report to DB: Cannot create PoolableConnectionFactory (Error while creating file "/var/log/linstor-controller" [90062-197])
12:18:09.291 [TaskScheduleService] ERROR LINSTOR/Controller - SYSTEM - Unhandled exception caught in com.linbit.linstor.tasks.TaskScheduleService [Report number 5FB7B3F7-00000-000000]
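All of the failures above come down to the controller being unable to create /var/log/linstor-controller as a non-root user. Assuming the image only needs a writable log directory (an assumption, not a verified fix), one way to avoid root is to mount an emptyDir there; a sketch against a generic Deployment, not the operator's actual API:

```yaml
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000                  # assumed non-root group; adjust to the image's user
      containers:
      - name: linstor-controller
        volumeMounts:
        - name: controller-logs
          mountPath: /var/log/linstor-controller
      volumes:
      - name: controller-logs
        emptyDir: {}
```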
root@piraeus-op-cs-controller-595cdf94d6-nrzkz:/# linstor v l
ERROR:
Show reports:
linstor error-reports show 5FA8E4BB-00000-000031
root@piraeus-op-cs-controller-595cdf94d6-nrzkz:/# linstor error-reports show 5FA8E4BB-00000-000031
ERROR REPORT 5FA8E4BB-00000-000031
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T09:33:23+00:00
Error time: 2020-11-09 06:52:57
Node: piraeus-op-cs-controller-595cdf94d6-nrzkz
============================================================
Category: RuntimeException
Class name: NullPointerException
Class canonical name: java.lang.NullPointerException
Generated at: Method 'deviceProviderKindAsString', Source file 'Json.java', Line #73
Call backtrace:
Method Native Class:Line number
deviceProviderKindAsString N com.linbit.linstor.api.rest.v1.serializer.Json:73
apiToVolume N com.linbit.linstor.api.rest.v1.serializer.Json:664
lambda$apiToResourceWithVolumes$2 N com.linbit.linstor.api.rest.v1.serializer.Json:477
accept N java.util.stream.ReferencePipeline$3$1:195
forEachRemaining N java.util.ArrayList$ArrayListSpliterator:1655
copyInto N java.util.stream.AbstractPipeline:484
wrapAndCopyInto N java.util.stream.AbstractPipeline:474
evaluateSequential N java.util.stream.ReduceOps$ReduceOp:913
evaluate N java.util.stream.AbstractPipeline:234
collect N java.util.stream.ReferencePipeline:578
apiToResourceWithVolumes N com.linbit.linstor.api.rest.v1.serializer.Json:506
lambda$listVolumesApiCallRcWithToResponse$1 N com.linbit.linstor.api.rest.v1.View:112
accept N java.util.stream.ReferencePipeline$3$1:195
forEachRemaining N java.util.ArrayList$ArrayListSpliterator:1655
copyInto N java.util.stream.AbstractPipeline:484
wrapAndCopyInto N java.util.stream.AbstractPipeline:474
evaluateSequential N java.util.stream.ReduceOps$ReduceOp:913
evaluate N java.util.stream.AbstractPipeline:234
collect N java.util.stream.ReferencePipeline:578
lambda$listVolumesApiCallRcWithToResponse$2 N com.linbit.linstor.api.rest.v1.View:113
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:378
onNext N reactor.core.publisher.FluxContextStart$ContextStartSubscriber:96
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
onNext N reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:385
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
request N reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribeInner N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:143
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:237
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:191
subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.Flux:8311
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N reactor.core.publisher.Operators$ScalarSubscription:2317
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N reactor.core.publisher.MonoCurrentContext:35
subscribe N reactor.core.publisher.Flux:8325
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
onNext N reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
complete N reactor.core.publisher.Operators$MonoSubscriber:1755
onComplete N reactor.core.publisher.MonoStreamCollector$StreamCollectorSubscriber:167
onComplete N reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
onComplete N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
checkTerminated N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
innerComplete N reactor.core.publisher.FluxFlatMap$FlatMapMain:909
onComplete N reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
onComplete N reactor.core.publisher.FluxMap$MapSubscriber:136
onComplete N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
onComplete N reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
complete N reactor.core.publisher.FluxCreate$BaseSink:438
drain N reactor.core.publisher.FluxCreate$BufferAsyncSink:784
complete N reactor.core.publisher.FluxCreate$BufferAsyncSink:732
drainLoop N reactor.core.publisher.FluxCreate$SerializedSink:239
drain N reactor.core.publisher.FluxCreate$SerializedSink:205
complete N reactor.core.publisher.FluxCreate$SerializedSink:196
apiCallComplete N com.linbit.linstor.netcom.TcpConnectorPeer:455
handleComplete N com.linbit.linstor.proto.CommonMessageProcessor:363
handleDataMessage N com.linbit.linstor.proto.CommonMessageProcessor:287
doProcessInOrderMessage N com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3 N com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8325
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:322
onNext N reactor.core.publisher.UnicastProcessor:401
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:834
END OF ERROR REPORT.
We got the following error on the node which was updated to DRBD 9.0.25:
Readiness probe failed: Get http://192.168.246.176:3370/: dial tcp 192.168.246.176:3370: connect: connection refused
Port 3370 is responsive on the node.
Currently, the operator registers the controller using the pod name and IP, which is unintuitive and does not directly show its locality:
# linstor node list
+------------------------------------------------------------------------------------------------+
| Node | NodeType | Addresses | State |
|================================================================================================|
| k8s-worker-1 | SATELLITE | 192.168.176.191:3366 (PLAIN) | Online |
| k8s-worker-2 | SATELLITE | 192.168.176.192:3366 (PLAIN) | Online |
| k8s-worker-3 | SATELLITE | 192.168.176.193:3366 (PLAIN) | Online |
| piraeus-op-cs-controller-6fbd7b7888-ngs45 | CONTROLLER | 172.29.69.195:3366 (PLAIN) | Online |
+------------------------------------------------------------------------------------------------+
Ideally, the node name that the controller runs on would be displayed instead of the pod name.
I tried to tweak it to use spec.nodeName and status.hostIP, but somehow LINSTOR does not allow registration using the containerPort. Changing the controller to use hostNetwork solves the problem but could be overkill.
Is there any way to do this cleanly?
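For reference, the Downward API fields mentioned above can be injected into a container like this (a generic sketch, not the operator's actual manifest):

```yaml
# Exposes the scheduled pod's node name and host IP as env vars.
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
```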
Got the following errors when running kubectl describe pvc <name>. The PVC won't get created.
Warning ProvisioningFailed 34s linstor.csi.linbit.com_piraeus-op-csi-controller-5c65956bf4-s4t5h_d02bd753-670a-47cf-a6e4-4aed355fb8e8 failed to provision volume with StorageClass "linstor-sc-hdd": rpc error: code = Internal desc = CreateVolume failed for pvc-56019475-789d-469c-9694-278d711fb079: Message: 'Successfully set property key(s): StorPoolName' next error: Message: 'Successfully set property key(s): StorPoolName' next error: Message: 'Resource 'pvc-56019475-789d-469c-9694-278d711fb079' successfully autoplaced on 2 nodes'; Details: 'Used nodes (storage pool name): 'k8smaster (lvm-hdd)', 'k8w6 (lvm-hdd)'' next error: Message: 'Tie breaker resource 'pvc-56019475-789d-469c-9694-278d711fb079' created on k8w2' next error: Message: 'Resource-definition property 'DrbdOptions/Resource/quorum' updated from 'null' to 'majority' by auto-quorum' next error: Message: 'Resource-definition property 'DrbdOptions/Resource/on-no-quorum' updated from 'null' to 'io-error' by auto-quorum' next error: Message: '(Node: 'k8w2') Failed to adjust DRBD resource pvc-56019475-789d-469c-9694-278d711fb079'; Reports: '[5F99F200-4BC88-000024]' next error: Message: 'Created resource 'pvc-56019475-789d-469c-9694-278d711fb079' on 'k8w6'' next error: Message: 'Created resource 'pvc-56019475-789d-469c-9694-278d711fb079' on 'k8smaster''
It should be possible to set LINSTOR properties from values stored in Kubernetes secrets.
Proposal:
Add an additionalEnv section to the LinstorController CRD. It is concatenated with the environment we already set, and follows the env: section of a container spec.
Add an additionalProperties section to the LinstorController CRD. It would look like this:
additionalProperties:
- key: StorDriver/Foobar/UsernameEnv
value: MY_ENV
In conjunction, these features would allow setting a property value from a secret:
additionalEnv:
- name: MY_PASSWORD
valueFrom:
secretKeyRef:
name: my-k8s-secret
key: password
additionalProperties:
- key: StorDriver/Foobar/PasswordEnv
value: MY_PASSWORD
Note: what's missing in the LINSTOR Controller is the special "*Env" properties. They should be available in the next feature release for selected variables.
We use a fixed 1-minute time-out for the whole reconcile loop.
This includes:
We should always update our status field, but this may not happen if 1-3 run into a time-out. Especially (3) is prone to such long pauses.
The quickest solution is to update the status by passing a fresh context to Status().Update(ctx, ...).
We don't have any external contexts to worry about, so there is no "external" deadline or expected cancellation.
@JoelColledge opinions?
Hi,
When I tried the pv-hostpath Helm chart, it failed because the job name was too long.
[root@ip-172-31-38-220 piraeus-operator]# helm install linstor-etcd ./charts/pv-hostpath --set "nodes={ip-172-31-37-227.ap-northeast-1.compute.internal}"
Error: failed post-install: warning: Hook post-install pv-hostpath/templates/pv.yaml failed: Job.batch "linstor-etcd-ip-172-31-37-227.ap-northeast-1.compute.internal-chown" is invalid: spec.template.labels: Invalid value: "linstor-etcd-ip-172-31-37-227.ap-northeast-1.compute.internal-chown": must be no more than 63 characters
[root@ip-172-31-38-220 piraeus-operator]#
Adding trunc to the hostname part of the job name resolved this.
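A sketch of such a fix in the chart template (names hypothetical; the actual template may differ):

```yaml
# Truncate the variable part so name + "-chown" stays within the
# 63-character limit for label values (57 + len("-chown") = 63).
metadata:
  name: {{ printf "%s-%s" .Release.Name $nodeName | trunc 57 | trimSuffix "-" }}-chown
```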
Hello,
First off, thank you for all the hard work you are putting into this product. We are testing the operator and managed to set everything up as expected; only setting up monitoring is proving problematic.
Judging from the code in piraeus-operator/cmd/manager/main.go (line 162 in e46d5c4), I created the following ServiceMonitor:
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
spec:
endpoints:
- port: http-metrics
namespaceSelector:
matchNames:
- infra-storage
selector:
matchLabels:
name: piraeus-operator
Could you give me some pointers on what I am doing wrong?
Hi Everyone,
I'm about to try out Piraeus, and I've just seen that the APIs used by the CRDs are deprecated in my Kubernetes version. When I tried to upgrade them with a simple replacement, I got validation errors, so further work is needed to make them compatible. Here are my logs:
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcsidrivers_crd.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/linstorcsidrivers.piraeus.linbit.com created
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorsatellitesets_crd.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/linstorsatellitesets.piraeus.linbit.com created
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcontrollers_crd.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/linstorcontrollers.piraeus.linbit.com created
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:41:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
$ sed -i -E 's#apiextensions\.k8s\.io/v1beta1#apiextensions.k8s.io/v1#g' piraeus.linbit.com_*
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcsidrivers_crd.yaml
error: error validating "piraeus.linbit.com_linstorcsidrivers_crd.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "additionalPrinterColumns" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorsatellitesets_crd.yaml
error: error validating "piraeus.linbit.com_linstorsatellitesets_crd.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false
$ kubectl apply -n piraeus -f piraeus.linbit.com_linstorcontrollers_crd.yaml
error: error validating "piraeus.linbit.com_linstorcontrollers_crd.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false
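For what it's worth, the migration is not a mechanical rename: in apiextensions.k8s.io/v1, the spec.version, spec.validation, spec.subresources and spec.additionalPrinterColumns fields all moved under per-version entries. A rough sketch of the v1 shape:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: linstorcontrollers.piraeus.linbit.com
spec:
  group: piraeus.linbit.com
  names:
    kind: LinstorController
    plural: linstorcontrollers
  scope: Namespaced
  versions:
    - name: v1          # was spec.version
      served: true
      storage: true
      schema:           # was spec.validation
        openAPIV3Schema:
          type: object
      subresources:     # was top-level spec.subresources
        status: {}
```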
Good day!
I'm getting the following log output from the kernel-module-injector container of the piraeus-op-ns-node-xxx pod.
SORRY, kernel makefile not found.
You need to tell me a correct KDIR,
Or install the neccessary kernel source packages.
Makefile:125: recipe for target 'check-kdir' failed
make: *** [check-kdir] Error 1
Could not find the expexted *.ko, see stderr for more details
Isn't the kernel source package installed by the kernel-module-injector? Or what am I doing wrong?
Thank you in advance.
Stephan
Good day!
As far as I have seen, there is only a Bionic DRBD image available. Would it be possible to have one for Focal as well? One already exists on drbd.io!
Thank you and have a nice day!
Stephan
Compile fails when using drbd9-centos7:v9.0.23 on CentOS 7.8
# kubectl logs -f piraeus-op-ns-node-mf7zf -c drbd-kernel-module-injector
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory `/tmp/pkg/drbd-9.0.23-1/drbd'
Calling toplevel makefile of kernel source tree, which I believe is in
KDIR=/lib/modules/3.10.0-1127.10.1.el7.x86_64/build
make -C /lib/modules/3.10.0-1127.10.1.el7.x86_64/build M=/tmp/pkg/drbd-9.0.23-1/drbd modules
COMPAT before_4_13_kernel_read
COMPAT alloc_workqueue_takes_fmt
COMPAT blkdev_issue_zeroout_discard
COMPAT drbd_release_returns_void
COMPAT genl_policy_in_ops
COMPAT have_SHASH_DESC_ON_STACK
COMPAT have_WB_congested_enum
COMPAT have_allow_kernel_signal
COMPAT have_atomic_dec_if_positive_linux
COMPAT have_atomic_in_flight
COMPAT have_bd_claim_by_disk
COMPAT have_bd_unlink_disk_holder
COMPAT have_bio_bi_bdev
COMPAT have_bio_bi_error
COMPAT have_bio_bi_opf
COMPAT have_bio_bi_status
COMPAT have_bio_clone_fast
COMPAT have_bio_flush
COMPAT have_bio_free
COMPAT have_bio_op_shift
COMPAT have_bio_set_op_attrs
COMPAT have_bio_rw
COMPAT have_bioset_create_front_pad
COMPAT have_bioset_init
COMPAT have_bioset_need_bvecs
COMPAT have_blk_check_plugged
COMPAT have_blk_qc_t_make_request
COMPAT have_blk_queue_flag_set
COMPAT have_blk_queue_make_request
COMPAT have_blk_queue_merge_bvec
COMPAT have_blk_queue_split_q_bio
/bin/bash: /usr/bin/mkdir: Argument list too long
/bin/bash: /usr/bin/tr: Argument list too long
make[3]: execvp: /bin/bash: Argument list too long
make[3]: *** [/tmp/pkg/drbd-9.0.23-1/drbd/.compat_test.3.10.0-1127.10.1.el7.x86_64/have_blk_queue_split_q_bio_bioset.result] Error 127
make[3]: *** Waiting for unfinished jobs....
/bin/bash: /usr/bin/tr: Argument list too long
COMPAT have_blk_queue_plugged
/bin/bash: /usr/bin/tr: Argument list too long
/bin/bash: /usr/bin/tr: Argument list too long
make[2]: *** [_module_/tmp/pkg/drbd-9.0.23-1/drbd] Error 2
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.23-1/drbd'
make: *** [module] Error 2
Could not find the expexted *.ko, see stderr for more details
Since today LINSTOR is not working anymore; the cs-controller cannot communicate with etcd.
What does "The current scope has already been entered" mean?
kubectl -n piraeus-system exec deployment/piraeus-op-cs-controller -- linstor err show 5FA5BD9C-00000-000001
ERROR REPORT 5FA5BD9C-00000-000001
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time: 2020-09-23T09:33:23+00:00
Error time: 2020-11-06 21:18:23
Node: piraeus-op-cs-controller-5456d5cddd-nfnfl
============================================================
Category: RuntimeException
Class name: IllegalStateException
Class canonical name: java.lang.IllegalStateException
Generated at: Method 'checkState', Source file 'Preconditions.java', Line #508
Error message: The current scope has already been entered
Call backtrace:
Method Native Class:Line number
checkState N com.google.common.base.Preconditions:508
enter N com.linbit.linstor.api.LinStorScope:54
initialize N com.linbit.linstor.systemstarter.PreConnectInitializer:64
startSystemServices N com.linbit.linstor.core.ApplicationLifecycleManager:87
start N com.linbit.linstor.core.Controller:337
main N com.linbit.linstor.core.Controller:556
END OF ERROR REPORT.
I think the storage preparation section in the docs (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md#preparing-physical-devices) should warn against using symlink devices like the ones in /dev/disk/by-id/ or other /dev/disk/by-X folders.
The current 3 requirements should list a 4th one: the device must not be a symlink.
Although this change could solve provisioning problems for others, the real solution would be to support persistent device names.
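For illustration, a by-id path is just a symlink that resolves to the unstable kernel name, which appears to be what LINSTOR ends up storing:

```shell
# readlink -f follows the symlink chain to the real device node,
# e.g. /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> /dev/sdb
readlink -f /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
```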
I tried to avoid using /dev/sdX in my devicePaths list for preparing devices, as these names are known to be unsafe for long-term usage; one should use persistent device names instead: https://wiki.archlinux.org/index.php/persistent_block_device_naming
When the operator was deployed with a persistent device name (/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1), all seemed fine. However, when I added a StorageClass and tried to provision a PVC with it, the PVC remained unbound. I checked the CSI logs and found these lines:
csi-provisioner I1225 09:05:54.115790 1 controller.go:645] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Finished: rpc error: code = Internal desc = CreateVolume fail
ed for pvc-0797f634-b26b-4d82-b5e0-d35014deb438: Message: 'Not enough available nodes'; Details: 'Not enough nodes fulfilling the following auto-place criteria:
csi-provisioner * has a deployed storage pool named TransactionList [thinpool]
csi-provisioner * the storage pools have to have at least '5242880' free space
csi-provisioner * the current access context has enough privileges to use the node and the storage pool
csi-provisioner * the node is online
csi-provisioner Auto-place configuration details:
csi-provisioner Additional place count: 3
csi-provisioner Don't place with resource (List): [pvc-0797f634-b26b-4d82-b5e0-d35014deb438]
csi-provisioner Storage pool name: TransactionList [thinpool]
csi-provisioner Layer stack: [DRBD, STORAGE]
csi-provisioner Auto-placing resource: pvc-0797f634-b26b-4d82-b5e0-d35014deb438'
csi-provisioner I1225 09:05:54.115819 1 controller.go:1084] Final error received, removing PVC 0797f634-b26b-4d82-b5e0-d35014deb438 from claims in progress
csi-provisioner W1225 09:05:54.115827 1 controller.go:943] Retrying syncing claim "0797f634-b26b-4d82-b5e0-d35014deb438", failure 7
I started to dig deeper, and found these errors via the linstor CLI:
$ linstor storage-pool l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ lvm-thin ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
┊ lvm-thin ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
┊ lvm-thin ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
Node: 'node1', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
ERROR:
Description:
Node: 'node2', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
ERROR:
Description:
Node: 'node3', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
The related error log:
$ cat /var/log/linstor-satellite/ErrorReport-5FE58E5C-1F7FF-000000.log
ERROR REPORT 5FE58E5C-1F7FF-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.11.0
Build ID: 3367e32d0fa92515efe61f6963767700a8701d98
Build time: 2020-12-18T08:40:35+00:00
Error time: 2020-12-25 07:02:59
Node: node3
============================================================
Reported error:
===============
Description:
Volume group 'linstor_thinpool' not found
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'checkVgExists', Source file 'LvmUtils.java', Line #398
Error message: Volume group 'linstor_thinpool' not found
Call backtrace:
Method Native Class:Line number
checkVgExists N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:398
checkVolumeGroupEntry N com.linbit.linstor.layer.storage.utils.StorageConfigReader:63
checkConfig N com.linbit.linstor.layer.storage.lvm.LvmProvider:549
checkStorPool N com.linbit.linstor.layer.storage.StorageLayer:396
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:913
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1225
getStoragePoolSpaceInfo N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:279
applyChanges N com.linbit.linstor.core.apicallhandler.StltStorPoolApiCallHandler:235
applyFullSync N com.linbit.linstor.core.apicallhandler.StltApiCallHandler:332
execute N com.linbit.linstor.api.protobuf.FullSync:94
executeNonReactive N com.linbit.linstor.proto.CommonMessageProcessor:525
lambda$execute$13 N com.linbit.linstor.proto.CommonMessageProcessor:500
doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call N reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:126
subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.Flux:8343
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N reactor.core.publisher.Operators$ScalarSubscription:2344
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N reactor.core.publisher.MonoCurrentContext:35
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
slowPath N reactor.core.publisher.FluxArray$ArraySubscription:126
request N reactor.core.publisher.FluxArray$ArraySubscription:99
onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:363
subscribe N reactor.core.publisher.FluxMerge:69
subscribe N reactor.core.publisher.Flux:8357
onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
subscribe N reactor.core.publisher.FluxConcatArray:80
subscribe N reactor.core.publisher.InternalFluxOperator:62
subscribe N reactor.core.publisher.FluxDefer:54
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:329
onNext N reactor.core.publisher.UnicastProcessor:408
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:834
END OF ERROR REPORT.
I also found this list, and started to wonder how these became sdb:
$ linstor physical-storage l
╭───────────────────────────────────────────╮
┊ Size ┊ Rotational ┊ Nodes ┊
╞═══════════════════════════════════════════╡
┊ 8589934592 ┊ True ┊ node1[/dev/sdb] ┊
┊ ┊ ┊ node2[/dev/sdb] ┊
┊ ┊ ┊ node3[/dev/sdb] ┊
╰───────────────────────────────────────────╯
It seems that it followed the symlink to sdb:
$ ls -la /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
lrwxrwxrwx. 1 root root 9 Dec 25 10:58 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb
Just to make sure that the device is compatible with the 3 required points, I checked fdisk:
$ fdisk /dev/sdb
Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xd9639fdb.
Command (m for help): m
Help:
DOS (MBR)
a toggle a bootable flag
b edit nested BSD disklabel
c toggle the dos compatibility flag
Generic
d delete a partition
F list free unpartitioned space
l list known partition types
n add a new partition
p print the partition table
t change a partition type
v verify the partition table
i print information about a partition
Misc
m print this menu
u change display/entry units
x extra functionality (experts only)
Script
I load disk layout from sfdisk script file
O dump disk layout to sfdisk script file
Save & Exit
w write table to disk and exit
q quit without saving changes
Create a new label
g create a new empty GPT partition table
G create a new empty SGI (IRIX) partition table
o create a new empty DOS partition table
s create a new empty Sun partition table
Command (m for help): F
Unpartitioned space /dev/sdb: 8 GiB, 8588886016 bytes, 16775168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
Start End Sectors Size
2048 16777215 16775168 8G
Command (m for help): p
Disk /dev/sdb: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xd9639fdb
Command (m for help): v
Remaining 16777215 unallocated 512-byte sectors.
Command (m for help): i
No partition is defined yet!
Command (m for help): q
So I started to suspect that something is problematic with the persistent name being a symlink. Since I have no experience with Piraeus/LINSTOR device migration (or whether it is even possible), I removed everything related to it from the cluster, removed all etcd host path volumes, and cleaned up the cluster. Then I re-deployed the Piraeus operator with just the following change in the Helm values:
storagePools:
lvmThinPools:
- devicePaths:
- - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
+ - /dev/sdb
name: lvm-thin
thinVolume: thinpool
volumeGroup: ""
And it now works fine, all statuses are green:
$ linstor storage-pool l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ lvm-thin ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
┊ lvm-thin ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
┊ lvm-thin ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
I was wondering if there is a way to safely mount a Piraeus PV on the host for init or backup access. By safely, I mean without breaking any replication logic provided by LINSTOR/DRBD.
When migrating workloads from host path volumes to Piraeus PVs, it would be great to easily copy existing data into the new PVs; the same mechanism could also provide backup access. I think backup would be less problematic, as it is just reading data; initializing, on the other hand, involves lots of write operations.
I tried to mount the new logical volumes in multiple ways, without success:
# from one k8s node
$ mkdir /tmp/test-mount
$ mount /dev/mapper/linstor_thinpool-pvc--3d1a3ebb--7a61--4307--b253--1ec13d5c473e_00000 /tmp/test-mount/
mount: /tmp/test-mount: /dev/mapper/linstor_thinpool-pvc--3d1a3ebb--7a61--4307--b253--1ec13d5c473e_00000 already mounted or mount point busy.
$ mount | grep 3d1a3ebb
/dev/drbd1001 on /var/lib/kubelet/pods/9c4ab752-d7d8-4eac-abdc-b27486b147b7/volumes/kubernetes.io~csi/pvc-3d1a3ebb-7a61-4307-b253-1ec13d5c473e/mount type ext4 (rw,noatime,seclabel,discard,stripe=16)
$ mount /dev/mapper/linstor_thinpool-pvc--9e920c35--8a7b--4022--8597--c8b34dfa4dc1_00000 /tmp/test-mount/
mount: /tmp/test-mount: /dev/mapper/linstor_thinpool-pvc--9e920c35--8a7b--4022--8597--c8b34dfa4dc1_00000 already mounted or mount point busy.
$ mount | grep 9e920c35
$ # no output, this LV was mounted on a different node where the related pod was running, still not mountable here
# from the Proxmox host
$ mkdir /tmp/test-mount
$ mount /dev/mapper/pve-vm--118--disk--1 /tmp/test-mount/
mount: /tmp/test-mount: unknown filesystem type 'LVM2_member'.
# this is probably related to the disk containing a separate LVM structure other than the one the Proxmox host uses,
# so it can't mount the second disk of the VM that is passed to Linstor/DRBD
# otherwise, it would be this easy: https://forum.proxmox.com/threads/how-to-mount-lvm-disk-of-vm.25218/#post-126333
As a workaround, I mounted both the new Piraeus PV and the old host path PV into the pod under migration, and copied the old data to the new PV within the pod's shell. Then I removed the old host path PV and redeployed the pod using only the new Piraeus PV. I think this workaround satisfies the "safe" requirement, as all LV access goes through the CSI; however, it is a bit of a hassle to do this every time one needs to copy something to or from the LV.
Isn't there an easier but a still safe way to achieve the same from one of the k8s nodes or from the machine hosting the nodes?
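A slightly less manual variant of the same workaround could be a one-off pod that mounts both claims and copies the data inside the cluster. This is just a sketch; the claim names and image are illustrative, not from an actual setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pv-migrate
spec:
  restartPolicy: Never
  containers:
    - name: copy
      image: busybox
      # copy everything, preserving attributes, from the old PV to the new one
      command: ["sh", "-c", "cp -a /old/. /new/"]
      volumeMounts:
        - name: old-data
          mountPath: /old
        - name: new-data
          mountPath: /new
  volumes:
    - name: old-data
      persistentVolumeClaim:
        claimName: old-hostpath-pvc   # placeholder: the existing host path PVC
    - name: new-data
      persistentVolumeClaim:
        claimName: new-piraeus-pvc   # placeholder: the new Piraeus PVC
```

All volume access still goes through the CSI driver, so this keeps the "safe" property of the original workaround.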
Got the following error when trying to upgrade from 1.0.0 to 1.1.0.
helm upgrade piraeus-op -f ./helm-tobg/values.yaml ./piraeus-operator/charts/piraeus
Error: UPGRADE FAILED: [unable to recognize "": no matches for kind "LinstorController" in version "piraeus.linbit.com/v1", unable to recognize "": no matches for kind "LinstorSatelliteSet" in version "piraeus.linbit.com/v1"]
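Presumably this is the usual Helm v3 limitation: CRDs are only installed on first install, never on upgrade, so the new `LinstorController`/`LinstorSatelliteSet` kinds would have to be applied manually before upgrading. A sketch (the CRD path is an assumption based on the repo layout):

```shell
# Apply the new CRDs shipped with the chart, then retry the upgrade
kubectl apply -f ./piraeus-operator/charts/piraeus/crds/
helm upgrade piraeus-op -f ./helm-tobg/values.yaml ./piraeus-operator/charts/piraeus
```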
Please consider providing an option to enable encrypted communication (https) between Linstor components.
Ideally, this could also include the communication within and to the etcd database cluster used by the LINSTOR controller.
While reading through the snapshot description (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/optional-components.md#snapshot-support-components) I also found that K8S supports cloning of an existing PVC: https://kubernetes.io/docs/concepts/storage/volume-pvc-datasource/
I haven't found anything in the Piraeus documentation; only the LINSTOR guide describes two modes for cloning a resource (https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-clone-mode).
So my question is: does the Piraeus CSI driver also support PVC creation from an existing PVC (aka cloning)?
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: existing-src-pvc-name
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
And if yes, how is it technically implemented (snapshot+creation from snapshot or new volume+copying with dd)?
In case cloning is currently not implemented in the CSI driver, it would be a wonderful addition.
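If cloning turns out not to be supported, a clone could presumably be approximated via the snapshot support components: take a snapshot of the source PVC and create the new PVC from it. A sketch, with illustrative names and the v1beta1 snapshot API of this Kubernetes era:

```yaml
# Step 1: snapshot the source PVC
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: src-snap
spec:
  volumeSnapshotClassName: my-snapshot-class   # placeholder
  source:
    persistentVolumeClaimName: existing-src-pvc-name
---
# Step 2: create the "clone" from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: src-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```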
Added a separate issue for #142 (comment)
Looks like the instructions for the host set-up are incomplete. The root problem is the missing usermode_helper=disabled module parameter. On first install it will most likely work, as the module is only loaded when the injector image runs, which takes care of setting this option. On restart, I guess the module gets loaded sooner, now without this parameter set.
I'm really looking forward to go through it again :)
The operator won't start if the legacy CRDs (linstorcontrollerset and linstornodeset) are not installed.
The legacy controllers "depend" on the old CRDs and produce an error in the set-up step if the CRDs are not installed. In this case we should just skip adding the legacy controllers, as without the CRDs there is nothing to "upgrade".
For a high-availability deployment, we want all components to be in some way redundant.
Currently the following components are not:
- csi-controller, a deployment with replica count 1, has built-in leader election capabilities
- snapshot-controller, a statefulset with replica count 1, has built-in leader election capabilities
- etcd, already working with multiple replicas, default is set to 1
- piraeus-operator, a deployment with replica count 1, has built-in leader election capabilities
- stork, already working with multiple replicas, default is set to 1
- linstor-controller, currently capped at 1 replica in the deployment, needs a leader-election mechanism -> see #56

For the sake of naming conformity, please consider adding the piraeus-op-csi- prefix to the snapshot-controller pods:
NAME READY STATUS RESTARTS AGE
piraeus-op-cs-controller-5c4cf45bdf-vvgc7 1/1 Running 1 10m
piraeus-op-csi-controller-b6dcf95fb-6z29m 5/5 Running 0 10m
piraeus-op-csi-controller-b6dcf95fb-89rq4 5/5 Running 0 10m
piraeus-op-csi-controller-b6dcf95fb-px9vh 5/5 Running 0 10m
piraeus-op-csi-node-d78s5 2/2 Running 0 10m
piraeus-op-csi-node-dddtj 2/2 Running 0 10m
piraeus-op-csi-node-m446w 2/2 Running 0 10m
piraeus-op-etcd-0 1/1 Running 0 11m
piraeus-op-etcd-1 1/1 Running 0 8m54s
piraeus-op-etcd-2 1/1 Running 0 8m
piraeus-op-ns-node-57dgm 1/1 Running 0 10m
piraeus-op-ns-node-7sh99 1/1 Running 0 10m
piraeus-op-ns-node-rpscv 1/1 Running 0 10m
piraeus-op-operator-55b5cf4d5-bqc8s 1/1 Running 0 11m
piraeus-op-operator-55b5cf4d5-nvsh5 1/1 Running 0 11m
piraeus-op-operator-55b5cf4d5-ssmmn 1/1 Running 0 11m
piraeus-op-stork-7994f6f9d4-phwcf 1/1 Running 0 11m
piraeus-op-stork-7994f6f9d4-rzdrq 1/1 Running 0 11m
piraeus-op-stork-7994f6f9d4-xqn5q 1/1 Running 0 11m
piraeus-op-stork-scheduler-77759446d8-kzjtg 1/1 Running 0 11m
piraeus-op-stork-scheduler-77759446d8-tmcv2 1/1 Running 0 11m
piraeus-op-stork-scheduler-77759446d8-txvn5 1/1 Running 0 11m
snapshot-controller-7d674d7-4j5fj 1/1 Running 0 11m
snapshot-controller-7d674d7-67dln 1/1 Running 0 11m
snapshot-controller-7d674d7-nkqww 1/1 Running 0 11m
Hi, team
Let's do a "dark site" test (just wipe /etc/resolv.conf) before each release, so that issues like the one below can be spotted:
charts/csi-snapshotter/values.schema.json has added a remote $ref:
"resources": {
"description": "resource requirements for the snapshotter container",
"$ref": "https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.18/api/openapi-spec/swagger.json#/definitions/io.k8s.api.core.v1.ResourceRequirements"
}
Helm actually tries to fetch those URLs. In an offline environment (or, like in my case, without a VPN and with no stable access to GitHub), helm fails with the following error:
# helm install piraeus-op ./charts/piraeus
Error: values don't meet the specifications of the schema(s) in the following chart(s):
csi-snapshotter:
Get https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.18/api/openapi-spec/swagger.json: dial tcp 151.101.108.133:443: i/o timeout
The solution is to remove the $ref line.
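A self-contained alternative would be to inline a minimal schema instead of referencing the upstream swagger definition. This is only a sketch of the idea, not the exact validation the chart may want:

```json
"resources": {
  "description": "resource requirements for the snapshotter container",
  "type": "object"
}
```

This keeps basic type checking while removing the network dependency entirely.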
I suspect the WARN below is thrown because we run LINSTOR dockerized and not on the host; is that right?
However, why do all satellite pods report a capacity of 0 when they have 8G disks of which approx. 7G is still free? Shouldn't they report something other than 0? This log has been there since I first started experimenting with Piraeus; it's not new, but I was always curious what it means and why a capacity of 0 is logged.
linstor-satellite 09:50:08.376 [DeviceManager] WARN LINSTOR/Satellite - SYSTEM - Not calling 'systemd-notify' as NOTIFY_SOCKET is null
linstor-satellite 09:51:27.362 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - SpaceTracking: Satellite aggregate capacity is 0 kiB, no errors
linstor-satellite 01:15:00.542 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - SpaceTracking: Satellite aggregate capacity is 0 kiB, no errors
linstor-satellite 09:49:27.043 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-satellite
linstor-satellite 09:49:27.044 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - LogArchive: No logs to archive.
linstor-satellite 01:15:00.564 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - SpaceTracking: Satellite aggregate capacity is 0 kiB, no errors
Before releasing v1.0, we want to synchronize names between the operator and LINSTOR.
The current proposal is to rename these CRDs:
LinstorNodeSet -> LinstorSatelliteSet
LinstorControllerSet -> LinstorController
Some names used by the operator obscure their actual use: LinstorNodeSet should really be LinstorSatelliteSet, and a LinstorControllerSet only ever contains 1 (active) controller, so it is not a set.
I've upgraded my k8s nodes from CentOS 8 to CentOS Stream with the official recommended commands: https://www.centos.org/centos-stream/ Seemingly, there were only minor changes in package versions, nothing serious.
After rebooting the nodes, some of Piraeus' internal services wouldn't start up and are in a constant crash loop. It seems there is a problem with the kernel-module-injector container.
I've attached all logs I could think of as relevant to let you solve this issue. Please advise if I should enable further debug options for Piraeus (and how).
$ k get all
NAME READY STATUS RESTARTS AGE
pod/piraeus-op-cs-controller-cfb475c85-cngdm 1/1 Running 3 47m
pod/piraeus-op-csi-controller-6fb7f7c5d6-hmspq 6/6 Running 11 51m
pod/piraeus-op-csi-node-c94w2 3/3 Running 12 5d23h
pod/piraeus-op-csi-node-mcvw6 3/3 Running 10 5d23h
pod/piraeus-op-csi-node-vk9nj 3/3 Running 11 5d23h
pod/piraeus-op-etcd-0 1/1 Running 3 5d23h
pod/piraeus-op-etcd-1 1/1 Running 1 66m
pod/piraeus-op-etcd-2 1/1 Running 1 57m
pod/piraeus-op-ns-node-7lqmk 0/1 Init:CrashLoopBackOff 5 7m7s
pod/piraeus-op-ns-node-djmtm 0/1 Init:CrashLoopBackOff 5 7m10s
pod/piraeus-op-ns-node-wlnsj 0/1 Init:CrashLoopBackOff 5 7m8s
pod/piraeus-op-operator-7466ddd49c-bbkgm 1/1 Running 6 58m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/piraeus-op-cs ClusterIP 10.43.60.86 <none> 3370/TCP 18d
service/piraeus-op-etcd ClusterIP None <none> 2380/TCP,2379/TCP 18d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/piraeus-op-csi-node 3 3 3 3 3 <none> 18d
daemonset.apps/piraeus-op-ns-node 3 3 0 3 0 <none> 18d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/piraeus-op-cs-controller 1/1 1 1 18d
deployment.apps/piraeus-op-csi-controller 1/1 1 1 18d
deployment.apps/piraeus-op-operator 1/1 1 1 18d
NAME DESIRED CURRENT READY AGE
replicaset.apps/piraeus-op-cs-controller-54b4444965 0 0 0 5d23h
replicaset.apps/piraeus-op-cs-controller-7fb6c98656 0 0 0 18d
replicaset.apps/piraeus-op-cs-controller-cfb475c85 1 1 1 5d23h
replicaset.apps/piraeus-op-csi-controller-565c954d87 0 0 0 18d
replicaset.apps/piraeus-op-csi-controller-6fb7f7c5d6 1 1 1 51m
replicaset.apps/piraeus-op-operator-7466ddd49c 1 1 1 5d23h
replicaset.apps/piraeus-op-operator-7bc4759c9d 0 0 0 18d
NAME READY AGE
statefulset.apps/piraeus-op-etcd 3/3 18d
NAME COMPLETIONS DURATION AGE
job.batch/linstor-etcd-rke-node1-chown 1/1 2s 18d
job.batch/linstor-etcd-rke-node2-chown 1/1 2s 18d
job.batch/linstor-etcd-rke-node3-chown 1/1 1s 18d
job.batch/piraeus-op-test-cs-svc 1/1 38s 18d
$ k logs daemonset.apps/piraeus-op-ns-node kernel-module-injector --previous
Found 3 pods, using pod/piraeus-op-ns-node-djmtm
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.0.25-1/drbd'
Calling toplevel makefile of kernel source tree, which I believe is in
KDIR=/lib/modules/4.18.0-257.el8.x86_64/build
make -C /lib/modules/4.18.0-257.el8.x86_64/build M=/tmp/pkg/drbd-9.0.25-1/drbd modules
COMPAT alloc_workqueue_takes_fmt
COMPAT before_4_13_kernel_read
COMPAT blkdev_issue_zeroout_discard
COMPAT drbd_release_returns_void
COMPAT genl_policy_in_ops
COMPAT have_SHASH_DESC_ON_STACK
COMPAT have_WB_congested_enum
COMPAT have_allow_kernel_signal
COMPAT have_atomic_dec_if_positive_linux
COMPAT have_atomic_in_flight
COMPAT have_bd_claim_by_disk
COMPAT have_bd_unlink_disk_holder
COMPAT have_bio_bi_bdev
COMPAT have_bio_bi_error
COMPAT have_bio_bi_opf
COMPAT have_bio_bi_status
COMPAT have_bio_clone_fast
COMPAT have_bio_flush
COMPAT have_bio_free
COMPAT have_bio_op_shift
COMPAT have_bio_rw
COMPAT have_bio_set_op_attrs
COMPAT have_bio_start_io_acct
COMPAT have_bioset_create_front_pad
COMPAT have_bioset_init
COMPAT have_bioset_need_bvecs
COMPAT have_blk_check_plugged
COMPAT have_blk_qc_t_make_request
COMPAT have_blk_queue_flag_set
COMPAT have_blk_queue_make_request
COMPAT have_blk_queue_merge_bvec
COMPAT have_blk_queue_plugged
COMPAT have_blk_queue_split_q_bio
COMPAT have_blk_queue_split_q_bio_bioset
COMPAT have_blk_queue_write_cache
COMPAT have_blkdev_get_by_path
COMPAT have_d_inode
COMPAT have_file_inode
COMPAT have_generic_start_io_acct_q_rw_sect_part
COMPAT have_generic_start_io_acct_rw_sect_part
COMPAT have_genl_family_parallel_ops
COMPAT have_ib_cq_init_attr
COMPAT have_ib_get_dma_mr
COMPAT have_idr_alloc
COMPAT have_idr_is_empty
COMPAT have_inode_lock
COMPAT have_ktime_to_timespec64
COMPAT have_kvfree
COMPAT have_max_send_recv_sge
COMPAT have_netlink_cb_portid
COMPAT have_nla_nest_start_noflag
COMPAT have_nla_parse_deprecated
COMPAT have_nla_put_64bit
COMPAT have_part_stat_h
COMPAT have_pointer_backing_dev_info
COMPAT have_prandom_u32
COMPAT have_proc_create_single
COMPAT have_ratelimit_state_init
COMPAT have_rb_augment_functions
COMPAT have_refcount_inc
COMPAT have_req_hardbarrier
COMPAT have_req_noidle
COMPAT have_req_nounmap
COMPAT have_req_op_write
COMPAT have_req_op_write_same
COMPAT have_req_op_write_zeroes
COMPAT have_req_prio
COMPAT have_req_write
COMPAT have_req_write_same
COMPAT have_security_netlink_recv
COMPAT have_shash_desc_zero
COMPAT have_signed_nla_put
COMPAT have_simple_positive
COMPAT have_struct_bvec_iter
COMPAT have_struct_kernel_param_ops
COMPAT have_struct_size
COMPAT have_time64_to_tm
COMPAT have_timer_setup
COMPAT have_void_make_request
COMPAT hlist_for_each_entry_has_three_parameters
COMPAT ib_alloc_pd_has_2_params
COMPAT ib_device_has_ops
COMPAT ib_post_send_const_params
COMPAT ib_query_device_has_3_params
COMPAT kmap_atomic_page_only
COMPAT need_make_request_recursion
COMPAT queue_limits_has_discard_zeroes_data
COMPAT rdma_create_id_has_net_ns
COMPAT sock_create_kern_has_five_parameters
COMPAT sock_ops_returns_addr_len
UPD /tmp/pkg/drbd-9.0.25-1/drbd/compat.4.18.0-257.el8.x86_64.h
UPD /tmp/pkg/drbd-9.0.25-1/drbd/compat.h
./drbd-kernel-compat/gen_compat_patch.sh: line 12: spatch: command not found
./drbd-kernel-compat/gen_compat_patch.sh: line 45: hash: spatch: not found
INFO: no suitable spatch found; trying spatch-as-a-service;
be patient, may take up to 10 minutes
if it is in the server side cache it might only take a second
SPAAS 1c20515525cffc698b58b76a5d936660
Successfully connected to SPAAS ('d35a4b17210dab1336de2725b997f300e9acd297')
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5591 100 772 0 4819 8674 54146 --:--:-- --:--:-- --:--:-- 62820
You can create a new .tgz including this pre-computed compat patch
by calling "make unpatch ; echo drbd-9.0.25-1/drbd/drbd-kernel-compat/cocci_cache/1c20515525cffc698b58b76a5d936660/compat.patch >>.filelist ; make tgz"
PATCH
patching file drbd_sender.c
patching file drbd_debugfs.c
patching file drbd_receiver.c
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_dax_pmem.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_debugfs.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_bitmap.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_proc.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_sender.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_receiver.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_req.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_actlog.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/lru_cache.o
CC [M] /tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.o
/tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.c: In function 'drbd_create_device':
/tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.c:3713:6: error: implicit declaration of function 'blk_alloc_queue'; did you mean 'blk_alloc_queue_rh'? [-Werror=implicit-function-declaration]
q = blk_alloc_queue(drbd_make_request, NUMA_NO_NODE);
^~~~~~~~~~~~~~~
blk_alloc_queue_rh
/tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.c:3713:4: warning: assignment to 'struct request_queue *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
q = blk_alloc_queue(drbd_make_request, NUMA_NO_NODE);
^
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:316: /tmp/pkg/drbd-9.0.25-1/drbd/drbd_main.o] Error 1
make[2]: *** [Makefile:1545: _module_/tmp/pkg/drbd-9.0.25-1/drbd] Error 2
make[1]: Leaving directory '/tmp/pkg/drbd-9.0.25-1/drbd'
make[1]: *** [Makefile:132: kbuild] Error 2
make: *** [Makefile:135: module] Error 2
Could not find the expexted *.ko, see stderr for more details
$ k version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:09:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
$ docker version
Client: Docker Engine - Community
Version: 20.10.1
API version: 1.41
Go version: go1.13.15
Git commit: 831ebea
Built: Tue Dec 15 04:34:30 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.1
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: f001486
Built: Tue Dec 15 04:32:21 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ cat /etc/centos-release
CentOS Stream release 8
$ cat /etc/os-release
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
$ dnf install -y kmod-drbd90 drbd90-utils
Last metadata expiration check: 1:14:10 ago on Sat 19 Dec 2020 07:45:54 AM CET.
Package kmod-drbd90-9.0.25-2.el8_3.elrepo.x86_64 is already installed.
Package drbd90-utils-9.13.1-1.el8.elrepo.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
$ dnf info kmod-drbd90 drbd90-utils
Last metadata expiration check: 1:15:29 ago on Sat 19 Dec 2020 07:45:54 AM CET.
Installed Packages
Name : drbd90-utils
Version : 9.13.1
Release : 1.el8.elrepo
Architecture : x86_64
Size : 5.8 M
Source : drbd90-utils-9.13.1-1.el8.elrepo.src.rpm
Repository : @System
From repo : elrepo
Summary : Management utilities for DRBD
URL : http://www.drbd.org/
License : GPLv2+
Description : DRBD mirrors a block device over the network to another machine.
: Think of it as networked raid 1. It is a building block for
: setting up high availability (HA) clusters.
:
: This packages includes the DRBD administration tools and integration
: scripts for heartbeat, pacemaker, rgmanager and xen.
Name : kmod-drbd90
Version : 9.0.25
Release : 2.el8_3.elrepo
Architecture : x86_64
Size : 1.3 M
Source : kmod-drbd90-9.0.25-2.el8_3.elrepo.src.rpm
Repository : @System
From repo : elrepo
Summary : drbd90 kernel module(s)
URL : http://www.drbd.org/
License : GPLv2
Description : DRBD is a distributed replicated block device. It mirrors a
: block device over the network to another machine. Think of it
: as networked raid 1. It is a building block for setting up
: high availability (HA) clusters.
: This package provides the drbd90 kernel module(s).
: It is built to depend upon the specific ABI provided by a range of releases
: of the same variant of the Linux kernel and not on any one specific build.
$ git log
commit 4ee8b6e6a556cb64877a966bd857050b00834caa (HEAD -> master, origin/master, origin/HEAD)
Author: Moritz "WanzenBug" Wanzenböck <...>
Date: Wed Nov 18 13:59:08 2020 +0100
Prepare next dev cycle
commit 5068780fda8ce603a6ea32ee70b57b4e6b4e1f23 (tag: v1.2.0)
Author: Moritz "WanzenBug" Wanzenböck <...>
Date: Wed Nov 18 13:58:51 2020 +0100
Release v1.2.0
Would upgrading the Piraeus git repo and redeploying solve it, maybe? As there were no releases in the meantime, I'm running the latest 1.2.0 version.
I've also seen #134 and the mentioned files do not exist on the nodes, and also the make error seems to be different.
$ cat /sys/module/drbd/parameters/usermode_helper
cat: /sys/module/drbd/parameters/usermode_helper: No such file or directory
$ cat /etc/modprobe.d/drbd.conf
cat: /etc/modprobe.d/drbd.conf: No such file or directory
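Given the missing usermode_helper parameter discussed above, a host-side fix might be to pin the module option so it survives reboots. This is an assumption on my part, not a verified procedure:

```shell
# Ensure DRBD is always loaded with the parameter the injector expects
echo "options drbd usermode_helper=disabled" > /etc/modprobe.d/drbd.conf
# Reload so the running module picks it up
# (only safe while no DRBD resources are active)
modprobe -r drbd && modprobe drbd
cat /sys/module/drbd/parameters/usermode_helper
```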
For debugging we often have to check the status of the volume/resource in the LINSTOR CLI. To do that, we first have to look up the PV name the CSI driver generated for the PVC. With the PV name we can then execute linstor v l | grep <pv_name> to see the actual status of the volume/resource in DRBD.
To ease that process it would be helpful to write the PVC name to a custom property of the volume and/or resource. In addition, the linstor command should be extended with a column for additional properties specified via an argument (but this feature request has to be raised at https://github.com/LINBIT/linstor-server).
With that you could easily find out which ResourceName (PV name) and DeviceName belong to which PVC name, which would significantly speed up analysis.
Is that a feature of general interest?
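For reference, the current two-step lookup can be scripted; this is just a convenience sketch of the existing workflow (the PVC name is a placeholder), not the requested feature:

```shell
# Resolve the generated PV name from the PVC, then grep the LINSTOR volume list
PV=$(kubectl get pvc my-pvc -o jsonpath='{.spec.volumeName}')
linstor volume list | grep "$PV"
```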
We get this error on PVC binding. The status of the PVC stays Pending forever.
Warning ProvisioningFailed 7m43s linstor.csi.linbit.com_piraeus-op-csi-controller-66fd956c6d-7kv2n_f5010b7b-8089-4db2-9f7d-b7987f9898a9 failed to provision volume with StorageClass "linstor-sc-hdd": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal Provisioning 3m26s (x9 over 7m53s) linstor.csi.linbit.com_piraeus-op-csi-controller-66fd956c6d-7kv2n_f5010b7b-8089-4db2-9f7d-b7987f9898a9 External provisioner is provisioning volume for claim "default/hdd-pvc-zabbix-mysql01-8bb0a103-7b21-4e4d-822d-71da9bdc08b3"
Warning ProvisioningFailed 3m26s (x8 over 7m42s) linstor.csi.linbit.com_piraeus-op-csi-controller-66fd956c6d-7kv2n_f5010b7b-8089-4db2-9f7d-b7987f9898a9 failed to provision volume with StorageClass "linstor-sc-hdd": rpc error: code = Internal desc = CreateVolume failed for pvc-44c6873d-4875-4c86-9588-a55c2a520302: Message: 'A volume definition with the number 0 already exists in resource definition 'pvc-44c6873d-4875-4c86-9588-a55c2a520302'.'; Cause: 'The VolumeDefinition already exists'; Details: 'Volume definitions of resource: pvc-44c6873d-4875-4c86-9588-a55c2a520302'
Normal ExternalProvisioning 102s (x26 over 7m53s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "linstor.csi.linbit.com" or manually created by system administrator
linstor v l shows this PVC with status:
┊ k8smaster ┊ pvc-44c6873d-4875-4c86-9588-a55c2a520302 ┊ lvm-hdd ┊ 0 ┊ 1132 ┊ /dev/drbd1132 ┊ 9.05 GiB ┊ Unused ┊ UpToDate ┊
┊ k8w6 ┊ pvc-44c6873d-4875-4c86-9588-a55c2a520302 ┊ lvm-hdd ┊ 0 ┊ 1132 ┊ /dev/drbd1132 ┊ 6.25 GiB ┊ Unused ┊ Outdated ┊
Please consider adding an option to set the value of Topology feature gate parameter using a variable.
When the PVC and PV get deleted, linstor resource list still shows all of the PVCs.
My setup:
- helm upgrade piraeus-op ./charts/piraeus \
    --install \
    --create-namespace \
    --namespace piraeus \
    --values ./values.yaml \
    --set etcd.enabled=false \
    --set operator.controller.dbConnectionURL='jdbc:postgresql://10.10.6.5:5432/linstordb?user=linstoruser&password=123'
Helm Values file:
operator:
  replicas: 3
  satelliteSet:
    kernelModuleInjectionImage: quay.io/piraeusdatastore/drbd9-focal:v9.0.27
    storagePools:
      lvmThinPools:
        - name: lvm-thin-hdd
          thinVolume: thinpoolhdd
          volumeGroup: "linstor_thinpoolhdd"
        - name: lvm-thin-ssd
          thinVolume: thinpoolssd
          volumeGroup: "linstor_thinpoolssd"
  controller:
    replicas: 3
csi:
  enableTopology: true
  controllerReplicas: 3
haController:
  replicas: 3
Created custom StorageClass:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: linstor-hdd-r1
  annotations:
    storageclass.kubernetes.io/is-default-class: 'true'
provisioner: linstor.csi.linbit.com
parameters:
  allowRemoteVolumeAccess: 'true'
  autoPlace: '1'
  csi.storage.k8s.io/fstype: xfs
  disklessOnRemaining: 'false'
  mountOpts: 'noatime,discard'
  placementPolicy: FollowTopology
  resourceGroup: linstor-hdd-r1
  storagePool: lvm-thin-hdd
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
I deployed a StatefulSet, it provisioned nicely, then I completely uninstalled it, including the namespace.
Linstor still has old Resources:
Is this expected behavior or a bug or am I doing something wrong?
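One thing worth checking (an assumption based on the StorageClass above, not a confirmed diagnosis): with reclaimPolicy: Retain, deleting a PVC intentionally leaves the PV and its backing LINSTOR resource in place, so they have to be cleaned up by hand, e.g.:

```shell
# Released PVs are kept under Retain; remove them and the backing resource manually
kubectl get pv                                 # look for STATUS=Released
kubectl delete pv pvc-<uuid>                   # placeholder PV name
linstor resource-definition delete pvc-<uuid>  # placeholder resource name
```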
Right now LINSTOR only logs to stdout, i.e. the logs are only available via Kubernetes logs. This is a problem when trying to access the logs from inside the container, e.g. via linstor sos-report create.
We should configure controller and satellite to actually log to their log directory. This probably needs to be done via the logback.xml config. A useful snippet may be (using | as the sed delimiter, since the pattern contains slashes):
sed 's|<!-- <appender-ref ref="FILE" /> -->|<appender-ref ref="FILE" />|g' logback.xml
Install the appropriate kernel headers package for your distribution. Then the operator will compile and load the required modules.
This sentence implies that the operator will sort out the modules and there is nothing more the user needs to do. However, when they are not on Ubuntu Bionic, they also need to set operator.satelliteSet.kernelModuleInjectionImage.
This has long been a stumbling block. We should fix it.
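For example, the docs could show the extra step explicitly. The image name below is illustrative (it follows the naming of the focal injector image mentioned elsewhere in this tracker; I have not verified the exact tag):

```shell
# Pick a kernel module injector image matching the node distribution
helm install piraeus-op ./charts/piraeus \
  --set operator.satelliteSet.kernelModuleInjectionImage=quay.io/piraeusdatastore/drbd9-centos8:v9.0.27
```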
As there are quite a few images and registries in values.yaml, it would be more convenient to have a global registry override for dark sites, where a single intranet registry is usually employed to get the job done.
It would be great if something like the following were possible (overriding all of the quay.io, docker.io, and gcr.io registries at once):
helm install piraeus-op piraeus/piraeus --set global.registryOverride=myreg.io/piraeus
Hi! I am testing Piraeus now and I got it working, but without configuring TLS etc between components or for etcd. That part of the setup is a little complicated for me and I am not sure of how to proceed...
So the question is: what are the risks? I am using this on virtual servers with a cheap provider that doesn't offer a private network. Do I risk something if I use the setup as it is? If yes, is there a more detailed guide on securing the components?
Thanks!
Please consider providing an option to support the NVMe layer for Linstor Satellites.
For example, a Satellite's pod might access modules required for the NVMe layer from a host, or maybe modules could be provided as a sidecar container.
Hi Everyone,
I've managed to start up the Piraeus Operator in an RKE k8s cluster with 3 CentOS 8 nodes, etcd with 3 replicas and persistence through hostPath volume, CSI snapshotter and Stork turned off, kernel injector replaced for CentOS 8, hosts prepared with drbd and kernel headers as required. The 3 VM live on LVM Thin provisioned by Proxmox, and all VMs have one device each (sda) for the OS and local node data. The backing storage is NVMe.
As far as I see, the storage documentation (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md) supports managing ZFS and LVM (Thin) pools through the operator. Since I'm not running the k8s cluster on bare metal, I'd need to create new disks in Proxmox (sdb) and attach them to the VMs. This way I could use the "Preparing physical devices" section to add these new disks to Piraeus. Question 1: is this kind of "multi-level" or nested LVM thin provisioning supported? I think these sub-LVM-Thin volumes would not mess with the outer LVM Thin layer, as they can contain any filesystem or LVM partition of their own. Am I right?
Question 2: My second question is related to the host disks (sda). Could I use these disks to provision storage to the pods running in k8s by using a folder on the host disks? What I originally wanted is to use the host VM filesystem to store data from pods in a highly available manner, replicated across VMs: like a hostPath or LocalVolume PV, but one that wouldn't depend on a single node. I'd like to store SQLite databases which would each be accessed by only one pod. I've found network file storage highly unreliable with SQLite (DB file corruption, locking issues, etc.), but since I have an NVMe disk and Linstor/Piraeus seems to be fast, I hope I could achieve this. Maybe I'm just not familiar enough with Piraeus/Linstor to know this is not possible, and I need to stick to disks. If the folder contained binary files storing the written blocks, rather than the actual files, that would also work. As a workaround, maybe a loopback device pointing to a file containing the LVM Thin storage on each host would work?
+------------------+
| |
| VM +------+--------+
| running | |
| RKE | sda (OS) +<------+
| | | |
| +------+--------+ |
| | |
| +------+--------+ |
| | | |
| | sdb (data) +<---+ |
| | | | |
| +------+--------+ | |
| | | |
+------------------+ | |
| |
+------------------+ | |
| | | |
| VM +------+--------+ | |
| running | | | |
| RKE | sda (OS) +<------+
| | | | |
| +------+--------+ | |
| | | |
| +------+--------+ | |
| | | | |
| | sdb (data) +<---+ |
| | | | |
| +------+--------+ | |
| | | |
+------------------+ | |
| |
+------------------+ | |
| | | |
| VM +------+--------+ | |
| running | | | |
| RKE | sda (OS) +<------+
| | | | |
| +------+--------+ | |
| | | |
| +------+--------+ | |
| | | | |
| | sdb (data) +<---+ |
| | | | |
| +------+--------+ | |
| | | |
+------------------+ + +
Linstor (Piraeus)
1. the whole sdb dev?
2. a folder on sda?
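To sketch the loopback workaround mentioned in Question 2 (purely hypothetical; paths and sizes are placeholders, and I don't know whether LINSTOR behaves well on loop devices):

```shell
# Create a file-backed loop device on the OS disk and build an LVM thin pool on it
truncate -s 20G /var/lib/linstor-backing.img
LOOP=$(losetup --find --show /var/lib/linstor-backing.img)
pvcreate "$LOOP"
vgcreate linstor_vg "$LOOP"
lvcreate --type thin-pool -l 100%FREE -n thinpool linstor_vg
```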
Question 3: Since I haven't provisioned storage through satellite sets yet, is this why I can't see any new StorageClass created in the cluster? If I check the contents of the non-operator Piraeus deployment (https://raw.githubusercontent.com/piraeusdatastore/piraeus/master/deploy/all.yaml), there are multiple storage classes defined. Does the operator only create these once the satellites are configured with storage?
$ kubectl exec -it deployment.apps/piraeus-op-cs-controller -- bash
root@piraeus-op-cs-controller-7fb6c98656-zvx49:/# linstor n list
╭──────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ piraeus-op-cs-controller-7fb6c98656-zvx49 ┊ CONTROLLER ┊ 10.42.1.72:3366 (PLAIN) ┊ Online ┊
┊ node1 ┊ SATELLITE ┊ 192.168.1.201:3366 (PLAIN) ┊ Online ┊
┊ node2 ┊ SATELLITE ┊ 192.168.1.202:3366 (PLAIN) ┊ Online ┊
┊ node3 ┊ SATELLITE ┊ 192.168.1.203:3366 (PLAIN) ┊ Online ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
root@piraeus-op-cs-controller-7fb6c98656-zvx49:/# linstor n info
╭─────────────────────────────────────────────────────────────╮
┊ Node ┊ Diskless ┊ LVM ┊ LVMThin ┊ ZFS/Thin ┊ File/Thin ┊
╞═════════════════════════════════════════════════════════════╡
┊ node1 ┊ + ┊ + ┊ + ┊ - ┊ + ┊
┊ node2 ┊ + ┊ + ┊ + ┊ - ┊ + ┊
┊ node3 ┊ + ┊ + ┊ + ┊ - ┊ + ┊
╰─────────────────────────────────────────────────────────────╯
Unsupported storage providers:
node1:
SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
node2:
SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
node3:
SPDK: IO exception occured when running 'rpc.py get_spdk_version': Cannot run program "rpc.py": error=2, No such file or directory
ZFS_THIN: 'cat /sys/module/zfs/version' returned with exit code 1
ZFS: 'cat /sys/module/zfs/version' returned with exit code 1
╭──────────────────────────────────────────╮
┊ Node ┊ DRBD ┊ LUKS ┊ NVMe ┊ Storage ┊
╞══════════════════════════════════════════╡
┊ node1 ┊ + ┊ - ┊ + ┊ + ┊
┊ node2 ┊ + ┊ - ┊ + ┊ + ┊
┊ node3 ┊ + ┊ - ┊ + ┊ + ┊
╰──────────────────────────────────────────╯
Unsupported resource layers:
node1:
LUKS: IO exception occured when running 'cryptsetup --version': Cannot run program "cryptsetup": error=2, No such file or directory
node2:
LUKS: IO exception occured when running 'cryptsetup --version': Cannot run program "cryptsetup": error=2, No such file or directory
node3:
LUKS: IO exception occured when running 'cryptsetup --version': Cannot run program "cryptsetup": error=2, No such file or directory
root@piraeus-op-cs-controller-7fb6c98656-zvx49:/# linstor v list
╭──────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════╡
╰──────────────────────────────────────────────────────────────────────────────────────────╯
Note: I have cryptsetup installed on the nodes, so I don't know why it's not available. However, my Helm values have an empty operator.controller.luksSecret value; maybe that's why.
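(Side note for anyone hitting the same thing: as far as I can tell, LINSTOR probes for these tools inside the satellite container, not on the host, so a host install of cryptsetup would not be visible to it. A quick way to check — the namespace and label selector below are guesses based on my deployment, adjust to yours:)

```shell
# Check whether cryptsetup is visible inside each satellite container.
# Label selector is a guess based on my deployment; adjust to yours.
for pod in $(kubectl -n linstor get pods -l app=piraeus-op-ns-node -o name); do
  echo "== $pod =="
  kubectl -n linstor exec "$pod" -- sh -c 'command -v cryptsetup || echo "cryptsetup not found in container"'
done
```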
Note: this is a home k8s cluster, not affiliated with my work, and I'm on holiday today. The problem that started all this is Sonarr/Radarr corrupting their SQLite DBs on NFS storage provisioned through nfs-client-provisioner. I'm looking for a solution that is more "cloud-native" than Ceph, more performant than OpenEBS Jiva/cStor and Longhorn, and more stable than OpenEBS Mayastor. LINSTOR/Piraeus looked like a really good alternative in multiple benchmarks, and easy to set up. It does look promising, and I really hope I can keep using it once I get to know it :) Rancher has a host-path provisioner project that could share host storage with pods, but that data would still be tied to the node holding the shared folder, and I'd like replication among nodes.
Hi, I just tried the new piraeus-operator and found that it deploys two controllers, one as kind: Deployment and another as kind: StatefulSet:
# k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
piraeus-op-cs-controller 1/1 1 1 12h
piraeus-op-csi-controller 1/1 1 1 10d
piraeus-op-operator 1/1 1 1 10d
piraeus-op-stork 1/1 1 1 10d
piraeus-op-stork-scheduler 1/1 1 1 10d
# k get sts
NAME READY AGE
piraeus-op-cs-controller 0/1 10d
piraeus-op-etcd 1/1 10d
snapshot-controller 1/1 10d
# k get pod -l app=piraeus-op-cs,role=piraeus-controller
NAME READY STATUS RESTARTS AGE
piraeus-op-cs-controller-0 1/1 Running 0 5m25s
piraeus-op-cs-controller-75fc99d964-c5z87 1/1 Running 0 12h
root@piraeus-op-cs-controller-0:/# linstor n l
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ ks1 ┊ SATELLITE ┊ 172.16.0.6:3366 (PLAIN) ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ ks2 ┊ SATELLITE ┊ 172.16.0.7:3366 (PLAIN) ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ ks3 ┊ SATELLITE ┊ 172.16.0.8:3366 (PLAIN) ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ ks4 ┊ SATELLITE ┊ 172.16.0.9:3366 (PLAIN) ┊ OFFLINE(OTHER_CONTROLLER) ┊
┊ piraeus-op-cs-controller-0 ┊ CONTROLLER ┊ 10.244.3.216:3366 (PLAIN) ┊ OFFLINE ┊
┊ piraeus-op-cs-controller-75fc99d964-c5z87 ┊ CONTROLLER ┊ 10.244.2.152:3366 (PLAIN) ┊ Online ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
We want to make resource requests and limits configurable for all pods.
Some Kubernetes distributions require resource requests and limits in order to schedule pods; pods without them will never start.
On the other hand, for quick tests it is often desirable to omit those requests, since the test machine may not be able to provide the requested resources.
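A sketch of what this could look like in the Helm values, assuming per-component resources keys are added; the key names below are illustrative proposals, not the chart's current schema:

```yaml
# Hypothetical values.yaml fragment -- these keys are a proposal,
# not the chart's existing schema.
operator:
  controller:
    resources:
      requests:
        cpu: 100m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 1Gi
  satelliteSet:
    resources: {}  # empty means no requests/limits, useful for quick tests
```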
I'm new to LINSTOR and the related tooling. Could someone show me the YAML config needed to add an existing ZFS pool to a satellite via piraeus-operator? There seem to be lots of docs on LVM, but not much on ZFS.
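Not an authoritative answer, but based on the operator's storage.md, a LinstorSatelliteSet spec along these lines should register an existing zpool. The resource name, pool name, and zpool path are placeholders, and the apiVersion may differ between operator versions:

```yaml
apiVersion: piraeus.linbit.com/v1
kind: LinstorSatelliteSet
metadata:
  name: piraeus-op-ns
spec:
  storagePools:
    zfsPools:
      - name: zfs-pool        # LINSTOR storage pool name (placeholder)
        zPool: tank/linstor   # existing zpool/dataset on the node (placeholder)
        thin: true            # register as ZFS_THIN instead of ZFS
```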
I just did a re-installation of LINSTOR on my 5-node Kubernetes 1.18.9 cluster with DRBD 9.0.25.
It is not possible to create any PVC; I always get the error DeadlineExceeded.
Any solution? What can I do to debug this? Thanks.
This is the error from csi-plugin:
time="2020-10-13T11:32:42Z" level=error msg="method failed" func="github.com/sirupsen/logrus.(*Entry).Error" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:297" error="rpc error: code = Internal desc = CreateVolume failed for pvc-a36918fc-0bb3-40ce-aace-ce86288d93d6: unable to determine volume topology: unable to determine AccessibleTopologies: context canceled" linstorCSIComponent=driver method=/csi.v1.Controller/CreateVolume nodeID=k8w2 provisioner=linstor.csi.linbit.com req="name:"pvc-a36918fc-0bb3-40ce-aace-ce86288d93d6" capacity_range:<required_bytes:1000000000 > volume_capabilities:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > parameters:<key:"autoPlace" value:"1" > parameters:<key:"storagePool" value:"lvm-hdd" > " resp= version=v0.9.0
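Not a solution, but a few commands that usually narrow such failures down. The namespace and workload names below match the default chart install and may differ in yours:

```shell
# LINSTOR's own view of the cluster, from inside the controller pod:
kubectl -n linstor exec deploy/piraeus-op-cs-controller -- linstor node list
kubectl -n linstor exec deploy/piraeus-op-cs-controller -- linstor storage-pool list
# LINSTOR error reports often contain the real cause:
kubectl -n linstor exec deploy/piraeus-op-cs-controller -- linstor error-reports list
# CSI controller logs for the failing CreateVolume calls:
kubectl -n linstor logs deploy/piraeus-op-csi-controller --all-containers --tail=100
```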
Currently, piraeus-server starts by default with the command argument --rest-bind=0.0.0.0:3370, as in https://github.com/piraeusdatastore/piraeus/blob/63dc06e13e7841607a2ec433349ef717ed2af498/dockerfiles/piraeus-server/entry.sh#L12
As a result, when controller.linstorHttpsControllerSecret is set, the HTTPS port 3371 cannot come up.
Aug 13, 2020 11:23:00 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:23:00 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:23:01 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:3370]
Aug 13, 2020 11:23:01 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
11:23:01.150 [Main] ERROR LINSTOR/Controller - SYSTEM - Unable to start grizzly http server on 0.0.0.0:3370.
11:23:01.152 [Main] INFO LINSTOR/Controller - SYSTEM - Controller initialized
After removing --rest-bind=0.0.0.0:3370 from the command arguments, port 3371 comes up:
Aug 13, 2020 11:13:18 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:13:18 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
11:13:19.060 [Main] ERROR LINSTOR/Controller - SYSTEM - Unable to start grizzly http server on [::]:3370.
11:13:19.061 [Main] INFO LINSTOR/Controller - SYSTEM - Trying to start grizzly http server on fallback ipv4: 0.0.0.0
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.accesslog.FileAppender <init>
INFO: Access log file "/var/log/linstor-controller/rest-access.log" opened
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:3370]
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-1] Started.
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:3371]
Aug 13, 2020 11:13:20 AM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-2] Started.
11:13:20.539 [Main] INFO LINSTOR/Controller - SYSTEM - Controller initialized
Please consider providing an option to configure storage pools independently on different nodes.
Currently a drbdpool storage pool is created on every node, but I only need it on selected nodes; on the others I would like just a diskless storage pool.
Also, please consider adding a way to set specific properties on storage pools, for example PrefNic.
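Until the operator exposes these, such properties can at least be set manually with the LINSTOR client from the controller pod. The node, pool, NIC name, and address below are placeholders:

```shell
# Register a dedicated NIC on the node and prefer it for this storage pool:
linstor node interface create node1 data-nic 192.168.10.11
linstor storage-pool set-property node1 drbdpool PrefNic data-nic
# Verify the property took effect:
linstor storage-pool list-properties node1 drbdpool
```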
I have some PVCs that are read-only after a reboot of the k8s nodes.
linstor v l shows them as UpToDate.
root@piraeus-op-cs-controller-667d5b46fb-rx9nz:/# linstor v l | grep 2bb1
| k8w1 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | DfltDisklessStorPool | 0 | 1037 | /dev/drbd1037 | | InUse | Diskless |
| k8w2 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | lvm-any50 | 0 | 1037 | /dev/drbd1037 | 4.70 GiB | Unused | UpToDate |
| k8w3 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | lvm-any50 | 0 | 1037 | /dev/drbd1037 | 296.69 MiB | Unused | UpToDate |
| k8w4 | pvc-60be9b00-f476-420b-bc07-a6e87f972bb1 | lvm-any50 | 0 | 1037 | /dev/drbd1037 | 303.25 MiB | Unused | UpToDate |
How can I fix this?
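In case it helps anyone debugging the same symptom, checking the DRBD device directly on the node where the volume is InUse (k8w1 in the listing above) shows whether the block device or only the filesystem went read-only:

```shell
# DRBD's view of the resource:
drbdadm status pvc-60be9b00-f476-420b-bc07-a6e87f972bb1
# Is the block device itself read-only? (1 = yes)
blockdev --getro /dev/drbd1037
# Is the filesystem mounted read-only?
mount | grep drbd1037
# Kernel messages often show why the fs was remounted read-only:
dmesg | grep -iE 'drbd|ext4' | tail -n 20
```

If only the filesystem is read-only, deleting the consuming pod so the volume gets remounted is often enough.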
I was setting up Piraeus in my homelab and tried to set the raidLevel property [1] of an LVM pool, but none of the values I supply are accepted.
I've tried setting it to values that should be supported by LVM, such as striped, raid0, mirror, and raid1, and even literal RAID numbers (e.g. "0"), but they all fail at runtime with an error similar to the following in the operator logs:
time="2021-01-30T23:16:14Z" level=info msg="satellite Reconcile: reconcile loop end" Controller=linstorsatelliteset controller=LinstorSatelliteSet err="multiple errors: Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017344]'|Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017345]'" requestName=piraeus-ns requestNamespace=storage-operators result="{false 0s}"
{"level":"error","ts":1612048574.3175702,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"LinstorSatelliteSet-controller","request":"storage-operators/piraeus-ns","error":"multiple errors: Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017344]'|Message: 'An unknown error occurred.'; Details: 'No enum constant com.linbit.linstor.storage.kinds.RaidLevel.RAID0'; Reports: '[6015BCC9-00000-017345]'","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/runner/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90"}
Going through LINBIT's linstor-server repo to locate the mentioned enum [2], it looks like the field can't actually do anything right now: the only value you can use is jbod, which is the default anyway.
Or am I missing something, and is this supposed to be working?
[1] https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md#lvmpools-configuration
[2] https://github.com/LINBIT/linstor-server/blob/master/server/src/main/java/com/linbit/linstor/storage/kinds/RaidLevel.java
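For comparison, a pool definition that does get accepted is one that omits raidLevel entirely (or sets it to jbod). The pool and volume group names here are placeholders:

```yaml
spec:
  storagePools:
    lvmPools:
      - name: lvm-pool            # placeholder pool name
        volumeGroup: vg-linstor   # placeholder VG name
        # raidLevel: jbod         # jbod appears to be the only accepted value
```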