
zookeeper-k8s-operator's Introduction

Charmed ZooKeeper K8s Operator


Overview

The Charmed ZooKeeper K8s Operator delivers automated operations management, from day 0 to day 2, for the Apache ZooKeeper server, which provides highly reliable distributed coordination, deployed on top of a Kubernetes cluster. It is an open-source, end-to-end, production-ready data platform built on cloud-native technologies.

This operator charm comes with features such as:

  • Horizontal scaling for high availability out of the box
  • Server-server and client-server authentication, both enabled by default
  • Access control management supported with user-provided ACLs

The ZooKeeper K8s Operator uses the latest upstream ZooKeeper binaries released by the Apache Software Foundation that come with Kafka, made available using the dataplatformoci/zookeeper OCI image distributed by Canonical.

Requirements

For production environments, it is recommended to deploy at least 5 nodes for ZooKeeper. While the following requirements are meant for production, the charm can be deployed in smaller environments.

  • 4-8GB of RAM
  • 2-4 cores
  • 1 storage device, 64GB

Config options

To get a description of all config options available, please refer to the config.yaml file.

Options can be changed by using the juju config command:

juju config zookeeper-k8s <config_option_1>=<value> [<config_option_2>=<value>]
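
For example, to raise the workload log level using the log-level option (the same option referenced in the issues further down):

juju config zookeeper-k8s log-level=DEBUG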

Usage

Basic usage

The ZooKeeper operator may be deployed using the Juju command line as follows:

juju deploy zookeeper-k8s -n 5

To watch the process, juju status can be used. Once all the units show as active|idle, the credentials for the admin user can be queried with:

juju run-action zookeeper-k8s/leader get-super-password --wait 
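
The run-action/--wait syntax shown in this README is the Juju 2.9 form; on a Juju 3.x client (the form used in the issue reports further down) the equivalent would be:

juju run zookeeper-k8s/leader get-super-password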

Replication

Scaling application

The charm can be scaled using the juju scale-application command.

juju scale-application zookeeper-k8s <num_of_servers_to_scale_to>

This will add or remove servers to match the required number. For example, to scale a deployment with 3 ZooKeeper units up to 5, run:

juju scale-application zookeeper-k8s 5

Password rotation

Internal users

The Charmed ZooKeeper K8s Operator has two internal users:

  • super: admin user for the cluster. Used mainly with the Kafka operator.
  • sync: specific to the internal quorum handling.

The set-password action can be used to rotate the password of one of them. If no username is passed, it will default to the super user.

# to set a specific password for the sync user
juju run-action zookeeper-k8s/leader set-password username=sync password=<password> --wait

# to randomly generate a password for the super user
juju run-action zookeeper-k8s/leader set-password --wait

Relations

Supported relations:

tls-certificates interface:

The tls-certificates interface is used with the tls-certificates-operator charm.

To enable TLS:

# deploy the TLS charm 
juju deploy tls-certificates-operator --channel=edge
# add the necessary configurations for TLS
juju config tls-certificates-operator generate-self-signed-certificates="true" ca-common-name="Test CA" 
# to enable TLS relate the application 
juju relate tls-certificates-operator zookeeper-k8s
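
Once the relation is established, the active relations can be confirmed with a generic Juju command:

# show relations alongside the model status
juju status --relations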

Updates to private keys for certificate signing requests (CSR) can be made via the set-tls-private-key action.

# Updates can be done with auto-generated keys with
juju run-action zookeeper-k8s/0 set-tls-private-key --wait
juju run-action zookeeper-k8s/1 set-tls-private-key --wait
juju run-action zookeeper-k8s/2 set-tls-private-key --wait

Keys should be passed to the internal-key parameter base64-encoded with base64 -w0, not as raw file contents with cat. With three servers, this scheme should be followed:

# generate shared internal key
openssl genrsa -out internal-key.pem 3072
# apply keys on each unit
juju run-action zookeeper-k8s/0 set-tls-private-key "internal-key=$(base64 -w0 internal-key.pem)"  --wait
juju run-action zookeeper-k8s/1 set-tls-private-key "internal-key=$(base64 -w0 internal-key.pem)"  --wait
juju run-action zookeeper-k8s/2 set-tls-private-key "internal-key=$(base64 -w0 internal-key.pem)"  --wait

To disable TLS, remove the relation:

juju remove-relation zookeeper-k8s tls-certificates-operator

Note: The TLS settings shown here use self-signed certificates, which are not recommended for production clusters. The tls-certificates-operator charm offers a variety of configurations; read more on the TLS charm here.

Monitoring

The Charmed ZooKeeper K8s Operator comes with several exporters by default. The metrics can be queried by accessing the following endpoints:

  • JMX exporter: http://<pod-ip>:9998/metrics
  • ZooKeeper metrics: http://<pod-ip>:7000/metrics
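
For example, assuming curl is available on a host that can reach the pod IP:

# query the JMX exporter
curl http://<pod-ip>:9998/metrics
# query the ZooKeeper metrics provider
curl http://<pod-ip>:7000/metrics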

Additionally, the charm provides integration with the Canonical Observability Stack.

Deploy the cos-lite bundle in a Kubernetes environment. This can be done by following the deployment tutorial. The endpoints of the COS relations need to be offered; the offers-overlay can be used for this, and the step is shown in the COS tutorial.
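
A minimal sketch of that step, assuming a dedicated cos model and that offers-overlay.yaml has been downloaded from the cos-lite bundle repository:

juju add-model cos
juju deploy cos-lite --trust --overlay ./offers-overlay.yaml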

Once COS is deployed, we can find the offers from the zookeeper model:

# We are on the `cos` model. Switch to `zookeeper` model
juju switch <zookeeper_model_name>
juju find-offers <k8s_controller_name>:

Output similar to the following should appear, assuming micro is the K8s controller name and cos is the model where cos-lite has been deployed:

Store  URL                   Access  Interfaces                         
micro  admin/cos.grafana     admin   grafana_dashboard:grafana-dashboard
micro  admin/cos.prometheus  admin   prometheus_scrape:metrics-endpoint
. . .

Now, integrate zookeeper with the metrics-endpoint, grafana-dashboard and logging relations:

juju relate micro:admin/cos.prometheus zookeeper-k8s
juju relate micro:admin/cos.grafana zookeeper-k8s
juju relate micro:admin/cos.loki zookeeper-k8s

After this is complete, Grafana will show a new dashboard: ZooKeeper Metrics.

Security

Security issues in the Charmed ZooKeeper K8s Operator can be reported through LaunchPad. Please do not file GitHub issues about security issues.

Contributing

Please see the Juju SDK docs for guidelines on enhancements to this charm following best practice guidelines, and CONTRIBUTING.md for developer guidance.

License

The Charmed ZooKeeper K8s Operator is free software, distributed under the Apache Software License, version 2.0. See LICENSE for more information.

zookeeper-k8s-operator's People

Contributors

batalex, carlcsaposs-canonical, davigar15, deusebio, evildmp, jardon, marcoppenheimer, taurus-forever, welpaolo, zmraul


zookeeper-k8s-operator's Issues

Uncaught exception on `healthy` check

Handling an update_status event while the service is down triggers an exception in the src/workload.py -> healthy check. The 10-second timeout expires, the resulting exception is not caught, and the hook does not finish normally.

This issue happened during a full-cluster-crash HA test.
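
For context, the probe that times out is equivalent to the following one-liner, taken verbatim from the traceback below and run inside the workload container (e.g. after juju ssh --container zookeeper zookeeper-k8s/2); when the service is down it hangs until timeout exits with code 124:

timeout 10s bash -c "echo 'ruok' | (exec 3<>/dev/tcp/localhost/2181; cat >&3; cat <&3; exec 3<&-)"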

Log output

INFO     pytest_operator.plugin:plugin.py:784 Model status:

Model         Controller                Cloud/Region        Version  SLA          Timestamp
test-ha-784h  github-pr-538bf-microk8s  microk8s/localhost  3.1.6    unsupported  11:09:27Z

App            Version  Status   Scale  Charm          Channel  Rev  Address         Exposed  Message
zookeeper-k8s           waiting      3  zookeeper-k8s             0  10.152.183.206  no       waiting for units to settle down

Unit              Workload  Agent  Address      Ports  Message
zookeeper-k8s/0*  active    idle   10.1.209.80         
zookeeper-k8s/1   active    idle   10.1.209.78         
zookeeper-k8s/2   error     idle   10.1.209.79         hook failed: "update-status"


INFO     pytest_operator.plugin:plugin.py:790 Juju error logs:

unit-zookeeper-k8s-0: 10:58:35 ERROR unit.zookeeper-k8s/0.juju-log Cluster upgrade failed, ensure pre-upgrade checks are ran first.
unit-zookeeper-k8s-0: 10:58:53 ERROR unit.zookeeper-k8s/0.juju-log zookeeper service is unreachable or not serving requests
unit-zookeeper-k8s-0: 10:59:02 ERROR unit.zookeeper-k8s/0.juju-log zookeeper service is unreachable or not serving requests
unit-zookeeper-k8s-2: 11:08:02 ERROR unit.zookeeper-k8s/2.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/./src/charm.py", line 457, in <module>
    main(ZooKeeperCharm)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/main.py", line 436, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/framework.py", line 942, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/./src/charm.py", line 229, in _on_cluster_relation_changed
    if self.state.unit_server.started and not self.workload.healthy:
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/src/workload.py", line 92, in healthy
    ruok_response = self.exec(command=timeout + ruok)
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/src/workload.py", line 60, in exec
    return str(self.container.exec(command, working_dir=working_dir).wait_output())
  File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/pebble.py", line 1441, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 124 executing ['timeout', '10s', 'bash', '-c', "echo 'ruok' | (exec 3<>/dev/tcp/localhost/2181; cat >&3; cat <&3; exec 3<&-)"], stdout='', stderr=''
unit-zookeeper-k8s-2: 11:08:02 ERROR juju.worker.uniter.operation hook "update-status" (via hook dispatching script: dispatch) failed: exit status 1

No zookeeper log files anywhere on the container

Steps to reproduce

  1. juju deploy zookeeper-k8s --channel edge -n 3
  2. juju ssh --container zookeeper zookeeper-k8s/0
  3. find / -name zookeeper.log

Expected behavior

Logs to be in /var/log/zookeeper

Actual behavior

Logs are not present anywhere on the container.

Upgrade from rev41 not working

I'm facing some issues when upgrading the charm. I'm currently using revision 41, deployed using the following bundle:

bundle: kubernetes
name: kafka-k8s-bundle
applications:
  tls:
    charm: self-signed-certificates
    channel: latest/edge
    revision: 75
    scale: 1
    options:
      ca-common-name: Canonical
    constraints: arch=amd64
  zookeeper-k8s:
    charm: zookeeper-k8s
    channel: 3/edge
    revision: 41
    scale: 3
    trust: true
    constraints: arch=amd64
    resources:
      zookeeper-image: 28
relations:
- - zookeeper-k8s:certificates
  - tls:certificates

I have identified some strange hints/issues:

  1. The pre-upgrade-check action fails, although the logs do not show this, and there are only INFO logs
  2. The peer-relation upgrade databag does not have any upgrade stack, although the logs say Building upgrade stack for VM (the logging here could also be improved, since we are not on a VM); see the inspection example after this list
  3. If I try to do juju refresh, the upgrade of the first unit (zookeeper-k8s/2) does not successfully go through, and the state of the unit reports zookeeper service is unreachable or not serving requests, therefore basically halting the upgrade process
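
For reference, the peer relation databag mentioned in point 2 can be inspected with a generic Juju command (not specific to this charm):

juju show-unit zookeeper-k8s/0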

Steps to reproduce and Actual behavior

  1. juju deploy ./bundle.yaml --trust

(wait for units to come up healthy)

  2. juju run zookeeper-k8s/leader pre-upgrade-check --format yaml

The action provides the following output:

Running operation 1 with 1 task
  - task 2 on unit-zookeeper-k8s-0

Waiting for task 2...
zookeeper-k8s/0:
  id: "2"
  message: Unknown error found.
  results:
    return-code: 0
  status: failed
  timing:
    completed: 2024-02-14 23:29:05 +0000 UTC
    enqueued: 2024-02-14 23:29:03 +0000 UTC
    started: 2024-02-14 23:29:03 +0000 UTC
  unit: zookeeper-k8s/0

Also, the upgrade peer relation databag does not show any upgrade stack.

  3. If I try to upgrade anyway, with juju refresh zookeeper-k8s (effectively bumping to rev45), the upgrade of the first unit goes into error with the following state (from juju status):
Model  Controller  Cloud/Region        Version  SLA          Timestamp
tests  micro       microk8s/localhost  3.1.7    unsupported  23:43:54Z

App            Version  Status   Scale  Charm                     Channel      Rev  Address         Exposed  Message
tls                     active       1  self-signed-certificates  latest/edge   75  10.152.183.111  no
zookeeper-k8s           waiting      3  zookeeper-k8s             3/edge        45  10.152.183.126  no       installing agent

Unit              Workload  Agent  Address      Ports  Message
tls/0*            active    idle   10.1.63.214
zookeeper-k8s/0*  active    idle   10.1.63.203
zookeeper-k8s/1   active    idle   10.1.63.202
zookeeper-k8s/2   blocked   idle   10.1.63.207         zookeeper service is unreachable or not serving requests

Expected behavior

Upgrade goes through cleanly.

Versions

Operating system: Ubuntu 22.04 LTS

Juju CLI: 3.1.7

Juju agent: 3.1.7

Charm revision: 41 upgrade to 50

microk8s: 1.29-strict/stable

installed:               v1.29.0             (6370) 168MB -

Log output

Juju debug log (starting from the pre-upgrade-check action and on)

logs.txt

Upgrade and pod rescheduling failing with TLS

During upgrade tests, we noticed that TLS-related files are not correctly created during pod rescheduling, therefore not allowing the cluster to recover.

Steps to reproduce

  1. build from 5d9bb11e61174f4680ff9effeb56c7be18b03c18
  2. juju deploy ./zookeeper-k8s_ubuntu-22.04-amd64.charm -n 3 --trust --resource zookeeper-image=ghcr.io/canonical/charmed-zookeeper@sha256:dbdbd8367bf6d813b9aae1e15a6c1743f909db7555a47995b6b5d259e87f2af1
  3. juju deploy self-signed-certificates
  4. juju relate zookeeper-k8s self-signed-certificates
  5. juju run zookeeper-k8s/leader pre-upgrade-check --format yaml
Running operation 1 with 1 task
  - task 2 on unit-zookeeper-k8s-0

Waiting for task 2...
zookeeper-k8s/0:
  id: "2"
  results:
    return-code: 0
  status: completed
  timing:
    completed: 2024-02-16 11:52:50 +0000 UTC
    enqueued: 2024-02-16 11:52:48 +0000 UTC
    started: 2024-02-16 11:52:48 +0000 UTC
  unit: zookeeper-k8s/0
  6. juju refresh zookeeper-k8s --path ./zookeeper-k8s_ubuntu-22.04-amd64.charm

Expected behavior

Upgrade works fine and the units recover from pod rescheduling.

Actual behavior

Juju status at the end:

...
Unit                         Workload  Agent  Address      Ports  Message
self-signed-certificates/0*  active    idle   10.1.63.209
zookeeper-k8s/0*             active    idle   10.1.63.227
zookeeper-k8s/1              active    idle   10.1.63.232
zookeeper-k8s/2              blocked   idle   10.1.63.231
...

Juju debug-log:

unit-zookeeper-k8s-2: 11:56:14 INFO unit.zookeeper-k8s/2.juju-log Running legacy hooks/upgrade-charm.
unit-zookeeper-k8s-2: 11:56:16 INFO unit.zookeeper-k8s/2.juju-log zookeeper-k8s/2 initializing...
unit-zookeeper-k8s-2: 11:56:17 INFO unit.zookeeper-k8s/2.juju-log zookeeper-k8s/2 started
unit-self-signed-certificates-0: 11:56:24 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-zookeeper-k8s-1: 11:56:41 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-zookeeper-k8s-2: 11:58:00 ERROR unit.zookeeper-k8s/2.juju-log Not all application units are connected and broadcasting in the quorum
unit-zookeeper-k8s-2: 11:58:00 CRITICAL unit.zookeeper-k8s/2.juju-log Unit failed to upgrade and requires manual rollback to previous stable version.
    1. Re-run `pre-upgrade-check` action on the leader unit to enter 'recovery' state
    2. Run `juju refresh` to the previously deployed charm revision
unit-zookeeper-k8s-2: 11:58:00 INFO juju.worker.uniter.operation ran "upgrade-charm" hook (via hook dispatching script: dispatch)
unit-zookeeper-k8s-2: 11:58:00 INFO juju.worker.uniter found queued "config-changed" hook

Zookeeper logs show:

2024-02-16T12:07:59.538Z [zookeeper] 12:07:59.538 [QuorumConnectionThread-[myid=3]-25] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Opening channel to server 2
2024-02-16T12:07:59.538Z [zookeeper] 12:07:59.538 [QuorumConnectionThread-[myid=3]-25] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open secure channel to 2 at election address zookeeper-k8s-1.zookeeper-k8s-endpoints/10.1.63.232:3888
2024-02-16T12:07:59.538Z [zookeeper] org.apache.zookeeper.common.X509Exception$SSLContextException: Failed to create KeyManager
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.createSSLContextAndOptionsFromConfig(X509Util.java:371)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.createSSLContextAndOptions(X509Util.java:349)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.createSSLContextAndOptions(X509Util.java:303)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.getDefaultSSLContextAndOptions(X509Util.java:283)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.createSSLSocket(X509Util.java:574)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:379)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
2024-02-16T12:07:59.538Z [zookeeper]    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2024-02-16T12:07:59.538Z [zookeeper]    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2024-02-16T12:07:59.538Z [zookeeper]    at java.base/java.lang.Thread.run(Thread.java:833)
2024-02-16T12:07:59.538Z [zookeeper] Caused by: org.apache.zookeeper.common.X509Exception$KeyManagerException: java.io.FileNotFoundException: /etc/zookeeper/keystore.p12 (No such file or directory)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:492)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.createSSLContextAndOptionsFromConfig(X509Util.java:369)
2024-02-16T12:07:59.538Z [zookeeper]    ... 9 common frames omitted
2024-02-16T12:07:59.538Z [zookeeper] Caused by: java.io.FileNotFoundException: /etc/zookeeper/keystore.p12 (No such file or directory)
2024-02-16T12:07:59.538Z [zookeeper]    at java.base/java.io.FileInputStream.open0(Native Method)
2024-02-16T12:07:59.538Z [zookeeper]    at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
2024-02-16T12:07:59.538Z [zookeeper]    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.StandardTypeFileKeyStoreLoader.loadKeyStore(StandardTypeFileKeyStoreLoader.java:53)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.loadKeyStore(X509Util.java:425)
2024-02-16T12:07:59.538Z [zookeeper]    at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:481)
2024-02-16T12:07:59.538Z [zookeeper]    ... 10 common frames omitted
root@zookeeper-k8s-2:/# cd /etc/zookeeper/

Versions

Operating system: Ubuntu 22.04 LTS

Juju CLI: 3.1.7

Juju agent: 3.1.7

Charm revision: 41 upgrade to 50

microk8s: 1.29-strict/stable

installed:               v1.29.0             (6370) 168MB -

Pod rescheduling will fail to recover the unit.

Steps to reproduce

juju add-model zk
juju deploy zookeeper-k8s --channel 3/edge -n 3
# wait for idle
kubectl delete pod zookeeper-k8s-0 -n zk

Expected behavior

Deleted pod can rejoin cluster without failures.

Actual behavior

Unit will error:

Unit              Workload  Agent  Address       Ports  Message
zookeeper-k8s/0   error     idle   10.1.146.158         hook failed: "restart-relation-changed"
zookeeper-k8s/1*  active    idle   10.1.146.178                
zookeeper-k8s/2   active    idle   10.1.146.162             

Log output

Traceback (most recent call last):                                                                                                 
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/./src/charm.py", line 471, in <module>                                     
    main(ZooKeeperK8sCharm)                                                                                                        
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/main.py", line 441, in main                                       
    _emit_charm_event(charm, dispatcher.event_name)                                                                                
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/main.py", line 149, in _emit_charm_event                          
    event_to_emit.emit(*args, **kwargs)                                                                                            
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/framework.py", line 354, in emit                                  
    framework._emit(event)                                                                                                         
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/framework.py", line 830, in _emit                                 
    self._reemit(event_path)                                                                                                       
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/framework.py", line 919, in _reemit                               
    custom_handler(event)                                                                                                          
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 327, in _on_relation_changed
    self.charm.on[self.name].run_with_lock.emit()                                                                                  
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/framework.py", line 354, in emit                                  
    framework._emit(event)                                                                                                         
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/framework.py", line 830, in _emit                                 
    self._reemit(event_path)                                                                                                       
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/framework.py", line 919, in _reemit                               
    custom_handler(event)                                                                                                          
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 385, in _on_run_with_lock   
    self._callback(event)                                                                                                          
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/./src/charm.py", line 201, in _restart                                     
    self.container.restart(CONTAINER)                                                                                              
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/model.py", line 1902, in restart                                  
    self._pebble.start_services(service_names)                                                                                     
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/pebble.py", line 1598, in start_services                          
    return self._services_action('start', services, timeout, delay)                                                                
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/pebble.py", line 1654, in _services_action                        
    resp = self._request('POST', '/v1/services', body=body)                                                                        
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/pebble.py", line 1458, in _request                                
    response = self._request_raw(method, path, query, headers, data)                                                               
  File "/var/lib/juju/agents/unit-zookeeper-k8s-0/charm/venv/ops/pebble.py", line 1502, in _request_raw                            
    raise APIError(body, code, status, message)                                                                                    
ops.pebble.APIError: cannot start services: service "zookeeper" does not exist                                                     

Additional information

This happens because the unit will still have its state flag set to started:

│ unit data             │ ╭─ zookeeper-k8s/zookeeper-k8s/0 ─╮
│                       │ │                                 │
│                       │ │  quorum  default - non-ssl      │
│                       │ │  state   started                │
│                       │ ╰─────────────────────────────────╯

When events are triggered, the cluster seems ok from the charm point of view, so there is no call to recreate the pebble layer of the unit.
A minimal fix looks something like:

def _on_upgrade(self, event):
    self.unit_peer_data.update({"state": "", "quorum": ""})

Changing the `log-level` config option doesn't affect the logs

Steps to reproduce

Pending on merging #73

1. juju config zookeeper-k8s log-level=DEBUG
2. # (inside zookeeper container) cat /var/log/zookeeper.log | grep DEBUG 

Expected behavior

Logs with DEBUG level should appear

Actual behavior

No DEBUG logs

Cause

This is happening because the log_level change depends on the Pebble layer. When updating the config, the charm restarts the service, but doesn't re-plan it.

Solution

A possible solution is to update the _restart logic:

def _restart(self, event: EventBase) -> None:
    """Handler for emitted restart events."""
    self._set_status(self.state.stable)
    if not isinstance(self.unit.status, ActiveStatus):
        event.defer()
        return

    current_plan = self.workload.container.get_plan()
    if current_plan.services != self._layer.services:
        self.workload.start(layer=self._layer)
    else:
        self.workload.restart()
    . . .

Note that checks are also dynamically built, so in addition to plan.services we could also compare plan.checks.
