admission-webhook-operator's Introduction

Admission Webhook Operator

Overview

This charm encompasses the Kubernetes Python operator for Kubeflow's Admission Webhook (see CharmHub).

Install

To install the Admission Webhook, run:

juju deploy admission-webhook

For more information, see https://juju.is/docs

admission-webhook-operator's Issues

Upgrade test is failing in CI due to authorization issues

When upgrading from podspec to sidecar, the upgrade tests fail due to authorization issues. Logs:
ERROR juju.worker.caasoperator could not get pod "unit-admission-webhook-1" "c2d8b455-fd59-4dd1-8d34-3ab2cb6d2d21" Unauthorized

The upgrade can be tested manually with the following steps:

  1. juju deploy admission-webhook --channel=1.7/stable --trust
  2. juju refresh admission-webhook --channel latest/edge
  3. juju trust admission-webhook --scope=cluster

Admission-webhook failed to start service Error: open /etc/webhook/certs/cert.pem: no such file or directory

Bug Description

When creating a Notebook, I get the error:
statefulset/tst-note-t: create Pod tst-note-t-0 in StatefulSet tst-note-t failed error: Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": failed to call webhook: Post "https://admission-webhook.kubeflow.svc:4443/apply-poddefault?timeout=10s": dial tcp 10.233.59.161:4443: connect: connection refused
Screenshot 2024-02-13 at 15 28 37

To Reproduce

snap install juju
juju bootstrap my-k8s
juju add-model kubeflow
juju deploy kubeflow --trust

Environment

K8s: v1.28.6
OS: Ubuntu 22.04
Juju: 3.3.1-genericlinux-amd64

admission-webhook: 1.8/stable
argo-controller: 3.3.10/stable
dex-auth: 2.36/stable
envoy: 2.0/stable
istio-ingressgateway: 1.17/stable
istio-pilot: 1.17/stable
jupyter-controller: 1.8/stable
jupyter-ui: 1.8/stable
katib-controller: 0.16/stable
katib-db: 8.0/stable
katib-db-manager: 0.16/stable
katib-ui: 0.16/stable
kfp-api: 2.0/stable
kfp-db: 8.0/stable
kfp-metadata-writer: 2.0/stable
kfp-persistence: 2.0/stable
kfp-profile-controller: 2.0/stable
kfp-schedwf: 2.0/stable
kfp-ui: 2.0/stable
kfp-viewer: 2.0/stable
kfp-viz: 2.0/stable
knative-eventing: 1.10/stable
knative-operator: 1.10/stable
knative-serving: 1.10/stable
kserve-controller: 0.11/stable
kubeflow-dashboard: 1.8/stable
kubeflow-profiles: 1.8/stable
kubeflow-roles: 1.8/stable
kubeflow-volumes: 1.8/stable
metacontroller-operator: 3.0/stable
minio: ckf-1.8/stable
mlmd: 1.14/stable
oidc-gatekeeper: ckf-1.8/stable
pvcviewer-operator: 1.8/stable
seldon-controller-manager: 1.17/stable
tensorboard-controller: 1.8/stable
tensorboards-web-app: 1.8/stable
training-operator: 1.7/stable

Relevant Log Output

Unit                          Workload     Agent  Address         Ports          Message
admission-webhook/0*          maintenance  idle   10.233.118.141                 Workload failed health check
kubectl logs admission-webhook-0 -n kubeflow
<info messages...>
2024-02-13T11:53:36.729Z [container-agent] 2024-02-13 11:53:36 ERROR juju-log Traceback (most recent call last):
2024-02-13T11:53:36.729Z [container-agent]   File "/var/lib/juju/agents/unit-admission-webhook-0/charm/venv/charmed_kubeflow_chisme/pebble/_update_layer.py", line 31, in update_layer
2024-02-13T11:53:36.729Z [container-agent]     container.replan()
2024-02-13T11:53:36.729Z [container-agent]   File "/var/lib/juju/agents/unit-admission-webhook-0/charm/venv/ops/model.py", line 1915, in replan
2024-02-13T11:53:36.729Z [container-agent]     self._pebble.replan_services()
2024-02-13T11:53:36.729Z [container-agent]   File "/var/lib/juju/agents/unit-admission-webhook-0/charm/venv/ops/pebble.py", line 1680, in replan_services
2024-02-13T11:53:36.729Z [container-agent]     return self._services_action('replan', [], timeout, delay)
2024-02-13T11:53:36.729Z [container-agent]   File "/var/lib/juju/agents/unit-admission-webhook-0/charm/venv/ops/pebble.py", line 1761, in _services_action
2024-02-13T11:53:36.729Z [container-agent]     raise ChangeError(change.err, change)
2024-02-13T11:53:36.729Z [container-agent] ops.pebble.ChangeError: cannot perform the following tasks:
2024-02-13T11:53:36.729Z [container-agent] - Start service "admission-webhook" (cannot start service: exited quickly with code 255)
2024-02-13T11:53:36.729Z [container-agent] ----- Logs from task 0 -----
2024-02-13T11:53:36.729Z [container-agent] 2024-02-13T11:53:36Z INFO Most recent service output:
2024-02-13T11:53:36.729Z [container-agent]     F0213 11:53:36.706006      14 config.go:46] config=main.Config{CertFile:"/etc/webhook/certs/cert.pem", KeyFile:"/etc/webhook/certs/key.pem"} Error: open /etc/webhook/certs/cert.pem: no such file or directory
2024-02-13T11:53:36.729Z [container-agent] 2024-02-13T11:53:36Z ERROR cannot start service: exited quickly with code 255
<...info messages>

Additional Context

No response
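
The log above shows the service exiting because /etc/webhook/certs/cert.pem does not exist when Pebble starts it. A minimal sketch of a pre-start check follows, assuming the charm can test for both files before replanning. The helper name missing_cert_files and its exists parameter are hypothetical and not part of the charm.

```python
import os


def missing_cert_files(cert_dir, exists=os.path.isfile):
    """Return the certificate files that are absent from cert_dir.

    The file names come from the log output (cert.pem and key.pem).
    `exists` is injectable so the check can be exercised without a real
    filesystem; it is a hypothetical convenience, not a charm API.
    """
    required = ("cert.pem", "key.pem")
    paths = [f"{cert_dir.rstrip('/')}/{name}" for name in required]
    return [path for path in paths if not exists(path)]


if __name__ == "__main__":
    missing = missing_cert_files("/etc/webhook/certs")
    if missing:
        print("not starting service, missing:", ", ".join(missing))
```

With a check like this, a charm could report a waiting or blocked status that names the missing files instead of failing with exit code 255.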

Integration CI "passes" even if charm goes to `Error`

#16 has integration tests that "pass" in GitHub even though the logs show the charm in Error. These tests should fail, because the PR enables k8s 1.22 while the workload (still v1.4 at the time of running) uses a resource that was deprecated in k8s 1.22.

I think this happens sometimes in other repos as well, possibly in kubeflow dashboards or profiles.
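
One way to make such a run fail is to assert explicitly on unit status at the end of the test. The sketch below assumes the test harness can produce a mapping of unit name to workload status. That mapping and both function names are hypothetical and are not taken from this repository's tests.

```python
def units_in_error(status):
    """Return the sorted names of units whose workload status is 'error'.

    `status` is a hypothetical mapping of unit name -> workload status
    string, for example collected from `juju status` by the test harness.
    """
    return sorted(name for name, workload in status.items() if workload == "error")


def assert_no_unit_errors(status):
    """Raise AssertionError if any unit reported an error status."""
    failed = units_in_error(status)
    if failed:
        raise AssertionError(f"units in error state: {', '.join(failed)}")
```

Calling assert_no_unit_errors() as the last step of an integration test would turn an Error that is only visible in the logs into a failed CI job.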

poddefaults go into wrong namespace

This charm supports a pod-defaults relation, which causes it to create pod-default objects that Kubeflow notebooks respect, for example to inject environment variables into notebooks.

These pod-defaults get created in the kubeflow namespace (or whichever namespace the juju model is configured to use, I suppose); however, they need to be created in the user namespace (e.g. admin) in order to take effect.

Related code:

def set_pod_spec(self, event):
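
A minimal sketch of the intended behaviour, assuming the target namespace is passed in explicitly rather than taken from the juju model. The function name build_pod_default, the selector label, and the description text are hypothetical. The apiVersion and kind are the ones used by the upstream PodDefault resource as far as I know.

```python
def build_pod_default(name, namespace, env):
    """Build a PodDefault manifest as a plain dict.

    `namespace` should be the user's namespace (e.g. "admin"), not the
    namespace of the juju model. `env` maps variable names to values.
    """
    return {
        "apiVersion": "kubeflow.org/v1alpha1",
        "kind": "PodDefault",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Hypothetical label: pods carrying it receive the defaults.
            "selector": {"matchLabels": {name: "true"}},
            "desc": f"Inject settings for {name}",
            "env": [{"name": key, "value": value} for key, value in sorted(env.items())],
        },
    }


if __name__ == "__main__":
    manifest = build_pod_default("example-defaults", "admin", {"EXAMPLE_VAR": "1"})
    print(manifest["metadata"]["namespace"])
```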

Fix update status handler

Description

Fixes for update status handler:

  • we need to check that the unit is leader before checking status of the workload in refresh_status()
  • remove redundant logging in update status handler since we already do logging in refresh_status()
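
The two fixes can be sketched as follows. This is an illustration with stand-in classes, not the charm's code: FakeUnit and WebhookCharmSketch are hypothetical, and only is_leader() mirrors a real method of the ops Unit object.

```python
class FakeUnit:
    """Stand-in for an ops Unit; only is_leader() is modelled."""

    def __init__(self, leader):
        self._leader = leader

    def is_leader(self):
        return self._leader


class WebhookCharmSketch:
    """Hypothetical charm skeleton showing the handler structure."""

    def __init__(self, unit):
        self.unit = unit
        self.workload_checks = 0

    def refresh_status(self):
        # Leadership is checked first, so non-leaders never probe the workload.
        if not self.unit.is_leader():
            return "waiting"
        self.workload_checks += 1
        return "active"

    def on_update_status(self, event=None):
        # No logging here; refresh_status() is the single place that reports.
        return self.refresh_status()
```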

`mutatingwebhookconfiguration`/`validatingwebhookconfiguration` objects left behind after application removal

Sometimes (always?) when doing juju remove-application admission-webhook, mutatingwebhookconfiguration/validatingwebhookconfiguration objects are left on the cluster. This will block all pod creation because the webhooks will fail (as the services/pods they point to are removed). This can be seen in the kubernetes event logs as timeouts on calls to admission-webhook.kubeflow.org:

$ kubectl get events -n kubeflow
LAST SEEN   TYPE      REASON                  OBJECT                                                        MESSAGE
93s         Warning   FailedCreate            statefulset/kfp-api-operator                                  create Pod kfp-api-operator-0 in StatefulSet kfp-api-operator failed error: Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": Post "https://admission-webhook.kubeflow.svc:443/apply-poddefault?timeout=30s": Service Unavailable
92s         Warning   FailedCreate            statefulset/kubeflow-volumes-operator                         create Pod kubeflow-volumes-operator-0 in StatefulSet kubeflow-volumes-operator failed error: Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": Post "https://admission-webhook.kubeflow.svc:443/apply-poddefault?timeout=30s": Service Unavailable

Possible resolutions:

  • ensure juju tracks these objects (do they have the typical juju metadata needed to destroy with an application?)
  • handle removal of these in a charm remove hook
  • add parentage on these objects so if the primary admission-webhook objects go down so do these (although that could tear these down accidentally)
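
For the remove-hook option, the charm would first need to identify which webhook configurations belong to it. A sketch under the assumption that the objects are available as parsed manifests (plain dicts). The function name is hypothetical and the actual deletion call is left out.

```python
WEBHOOK_KINDS = {"MutatingWebhookConfiguration", "ValidatingWebhookConfiguration"}


def leftover_webhook_configs(objects, service_name, namespace):
    """Return names of webhook configurations that call the given service.

    `objects` is a list of parsed Kubernetes manifests. A configuration is
    selected if any of its webhooks has a clientConfig.service that matches
    `service_name` and `namespace`.
    """
    leftovers = []
    for obj in objects:
        if obj.get("kind") not in WEBHOOK_KINDS:
            continue
        for hook in obj.get("webhooks", []):
            service = hook.get("clientConfig", {}).get("service", {})
            if service.get("name") == service_name and service.get("namespace") == namespace:
                leftovers.append(obj["metadata"]["name"])
                break
    return leftovers
```

A remove hook could pass the returned names to whatever Kubernetes client the charm already uses, so that pod creation is not blocked once the service is gone.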

Make charm's images configurable in track/<last-version> branch

Description

The goal of this task is to make all images configurable so that when this charm is deployed in an airgapped environment, all image resources are pulled from an arbitrary local container image registry (avoiding pulling images from the internet).
This serves as a tracking issue for the required changes and backports to the latest stable track/* GitHub branch.

TL;DR

Mark this task as done when all charms have the following:

  • Required changes (in metadata.yaml, config.yaml, src/charm.py)
  • Required tools/get-images.sh script in place
  • Test on airgap environment
  • Publish to /stable

Required changes

WARNING: No breaking changes should be backported into the track/<version> branch. A breaking change is anything that requires extra steps, beyond a plain juju refresh, to refresh from the previous /stable. Please avoid these situations at all costs.

The following files have to be modified and/or verified to enable image configuration:

  • metadata.yaml - the container image(s) of the workload containers have to be specified in this file. This only applies to sidecar charms. Example:
containers:
  training-operator:
    resource: training-operator-image
resources:
  training-operator-image:
    type: oci-image
    description: OCI image for training-operator
    upstream-source: kubeflow/training-operator:v1-855e096
  • config.yaml - applies when the charm deploys containers that are used by resource(s) the operator creates. Example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: seldon-config
  namespace: {{ namespace }}
data:
  predictor_servers: |-
    {
        "TENSORFLOW_SERVER": {
          "protocols" : {
            "tensorflow": {
              "image": "tensorflow/serving", <--- this image should be configurable
              "defaultImageVersion": "2.1.0"
              },
            "seldon": {
              "image": "seldonio/tfserving-proxy",
              "defaultImageVersion": "1.15.0"
              }
            }
        },
...
  • tools/get-images.sh - is a bash script that returns a list of all the images that are used by this charm. In the case of a multi-charm repo, this is located at the root of the repo and gathers images from all charms in it.

  • src/charm.py - verify that nothing inside the charm code is calling a subprocess that requires internet connection.
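
As an illustration of what an image listing has to gather from metadata.yaml, here is a minimal sketch. It assumes the file has already been parsed into a dict, and the function name images_from_metadata is hypothetical. The real tools/get-images.sh is a bash script and may work differently.

```python
def images_from_metadata(metadata):
    """Return the sorted upstream-source images of all oci-image resources.

    `metadata` is the parsed content of metadata.yaml as a plain dict.
    """
    resources = metadata.get("resources", {})
    return sorted(
        resource["upstream-source"]
        for resource in resources.values()
        if resource.get("type") == "oci-image" and "upstream-source" in resource
    )


if __name__ == "__main__":
    example = {
        "resources": {
            "training-operator-image": {
                "type": "oci-image",
                "description": "OCI image for training-operator",
                "upstream-source": "kubeflow/training-operator:v1-855e096",
            }
        }
    }
    for image in images_from_metadata(example):
        print(image)
```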

Testing

  1. Spin up an airgap environment following canonical/bundle-kubeflow#682 and canonical/bundle-kubeflow#703 (comment)

  2. Build the charm making sure that all the changes for airgap are in place.

  3. Deploy the charms manually and observe the charm go to active and idle.

  4. Additionally, run integration tests or simulate them. For instance, creating a workload (like a PytorchJob, a SeldonDeployment, etc.).

Publishing

After completing the changes and testing, this charm has to be published to its stable risk in Charmhub. For that you must wait for the charm to be published to /edge, which is the revision to be promoted to /stable. Use the workflow dispatch for this (Actions>Release charm to other tracks...>Run workflow).

`MutatingWebhookConfiguration` conflict when upgrading from 1.6 to 1.7

The name of the MutatingWebhookConfiguration resource that the admission-webhook charm creates changed from admission-webhook in 1.6 to admission-webhook-mutating-webhook-configuration in 1.7.
In the case of upgrading CKF 1.6 to 1.7, this leads to 2 mutatingWebhookConfiguration objects existing in the cluster for the same webhook. As a result, when creating a notebook, the webhook gets triggered twice from both mutatingWebhookConfiguration objects and the notebook gets stuck in FailedCreate with a conflict on the PodDefaults being applied to the notebook pod.
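
The duplicate can be detected from the object names alone. A minimal sketch, assuming the names of the existing MutatingWebhookConfiguration objects have already been listed. The function name stale_webhook_configs is hypothetical, while the two resource names are the ones given above.

```python
OLD_NAME = "admission-webhook"  # name used by the 1.6 charm
NEW_NAME = "admission-webhook-mutating-webhook-configuration"  # name used by 1.7


def stale_webhook_configs(existing_names):
    """Return the old configuration name if both old and new objects exist.

    When only one of the two is present there is no conflict, so nothing
    is reported as stale.
    """
    names = set(existing_names)
    if OLD_NAME in names and NEW_NAME in names:
        return [OLD_NAME]
    return []
```

An upgrade hook could delete whatever this returns, so that the webhook is only triggered once per pod.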

Admission webhook using incorrect certificate after restart

After restarting an EC2 instance with a Kubeflow deployment, the admission webhook seems to be using an incorrect certificate.
This leads to issues with notebook servers: already existing ones can't be started and new ones can't be created due to error Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": failed to call webhook: Post "https://admission-webhook.kubeflow.svc:4443/apply-poddefault?timeout=10s": x509: certificate signed by unknown authority

The admission-webhook is throwing TLS handshake errors:

I0824 12:11:49.489999       1 main.go:616] About to start serving webhooks: &http.Server{Addr:":4443", Handler:http.Handler(nil), TLSConfig:(*tls.Config)(0xc0004c6480), ReadTimeout:0, ReadHeaderTimeout:0, WriteTimeout:0, IdleTimeout:0, MaxHeaderBytes:0, TLSNextProto:map[string]func(*http.Server, *tls.Conn, http.Handler)(nil), ConnState:(func(net.Conn, http.ConnState))(nil), ErrorLog:(*log.Logger)(nil), BaseContext:(func(net.Listener) context.Context)(nil), ConnContext:(func(context.Context, net.Conn) context.Context)(nil), inShutdown:0, disableKeepAlives:0, nextProtoOnce:sync.Once{done:0x0, m:sync.Mutex{state:0, sema:0x0}}, nextProtoErr:error(nil), mu:sync.Mutex{state:0, sema:0x0}, listeners:map[*net.Listener]struct {}(nil), activeConn:map[*http.conn]struct {}(nil), doneChan:(chan struct {})(nil), onShutdown:[]func()(nil)}
2022/08/24 12:16:24 http: TLS handshake error from 172.31.73.87:38080: remote error: tls: bad certificate
2022/08/24 12:16:24 http: TLS handshake error from 172.31.73.87:34560: remote error: tls: bad certificate

It tries to connect to the controller-service API server, but that is unreachable after the restart.

$ juju debug-log --replay --include admission-webhook
application-admission-webhook: 12:08:11 INFO juju.cmd running jujud [2.9.33 e83d2a73f904080c5cdf4aaed2821abd4f58253a gc go1.18.5]
application-admission-webhook: 12:08:11 DEBUG juju.cmd   args: []string{"/var/lib/juju/tools/jujud", "caasoperator", "--application-name=admission-webhook", "--debug"}
application-admission-webhook: 12:08:11 DEBUG juju.agent read agent config, format "2.0"
application-admission-webhook: 12:08:11 INFO juju.worker.upgradesteps upgrade steps for 2.9.33 have already been run.
application-admission-webhook: 12:08:11 INFO juju.cmd.jujud caas operator application-admission-webhook start (2.9.33 [gc])
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "clock" manifold worker started at 2022-08-24 12:08:11.040627232 +0000 UTC
application-admission-webhook: 12:08:11 DEBUG juju.worker.introspection introspection worker listening on "@jujud-application-admission-webhook"
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "upgrade-steps-gate" manifold worker started at 2022-08-24 12:08:11.041395951 +0000 UTC
application-admission-webhook: 12:08:11 DEBUG juju.worker.introspection stats worker now serving
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "caas-units-manager" manifold worker started at 2022-08-24 12:08:11.041710838 +0000 UTC
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "agent" manifold worker started at 2022-08-24 12:08:11.041746533 +0000 UTC
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "caas-units-manager" manifold worker completed successfully
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "caas-units-manager" manifold worker started at 2022-08-24 12:08:11.050878063 +0000 UTC
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "api-config-watcher" manifold worker started at 2022-08-24 12:08:11.052175801 +0000 UTC
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "upgrade-steps-flag" manifold worker started at 2022-08-24 12:08:11.053470761 +0000 UTC
application-admission-webhook: 12:08:11 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:08:11 DEBUG juju.worker.dependency "migration-fortress" manifold worker started at 2022-08-24 12:08:11.071644351 +0000 UTC
application-admission-webhook: 12:08:14 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:14 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:08:14 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:08:14 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:17 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:08:20 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:20 DEBUG juju.api no error, but not connected, probably cancelled before we started
application-admission-webhook: 12:08:21 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:08:21 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:08:21 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:25 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:08:28 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:28 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:08:28 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:08:28 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:34 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:08:37 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:37 DEBUG juju.api no error, but not connected, probably cancelled before we started
application-admission-webhook: 12:08:37 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:08:37 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:08:37 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:43 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:08:46 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:46 DEBUG juju.api no error, but not connected, probably cancelled before we started
application-admission-webhook: 12:08:46 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:08:46 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: try was stopped
stack trace:
try was stopped
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:08:46 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: try was stopped
application-admission-webhook: 12:08:54 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:08:57 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:08:57 DEBUG juju.api no error, but not connected, probably cancelled before we started
application-admission-webhook: 12:08:57 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:08:57 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:08:57 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:06 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:09:09 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:09 DEBUG juju.api no error, but not connected, probably cancelled before we started
application-admission-webhook: 12:09:09 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:09:09 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:09:09 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:21 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:09:24 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:24 DEBUG juju.api no error, but not connected, probably cancelled before we started
application-admission-webhook: 12:09:24 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:09:24 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:09:24 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:36 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:09:39 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:39 DEBUG juju.api no error, but not connected, probably cancelled before we started
application-admission-webhook: 12:09:39 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:09:39 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:09:39 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:54 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:09:57 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:09:57 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:09:57 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
stack trace:
github.com/juju/juju/api.(*addressProvider).next:803: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/api.recordTryError.func1:1057: 
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:09:57 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
application-admission-webhook: 12:10:15 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:10:15 DEBUG juju.api looked up controller-service.controller-my-controller.svc.cluster.local -> [10.152.183.228]
application-admission-webhook: 12:10:15 DEBUG juju.worker.apicaller [646c5e] failed to connect
application-admission-webhook: 12:10:15 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: unable to connect to API: dial tcp 10.152.183.228:17070: connect: connection refused
stack trace:
dial tcp 10.152.183.228:17070: connect: connection refused
github.com/juju/juju/api.gorillaDialWebsocket:737: 
github.com/juju/juju/api.dialer.dial1:1157: 
github.com/juju/juju/api.dialer.dial:1132: unable to connect to API
github.com/juju/juju/api.dialWebsocketMulti:1028: 
github.com/juju/juju/api.dialAPI:686: 
github.com/juju/juju/api.Open:218: 
github.com/juju/juju/worker/apicaller.connectFallback:161: 
github.com/juju/juju/worker/apicaller.OnlyConnect:58: 
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api
application-admission-webhook: 12:10:15 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: unable to connect to API: dial tcp 10.152.183.228:17070: connect: connection refused
application-admission-webhook: 12:10:38 DEBUG juju.worker.apicaller connecting with old password
application-admission-webhook: 12:10:38 DEBUG juju.api successfully dialed "wss://10.152.183.228:17070/model/646c5e31-e9b1-4440-8487-421d41eeec13/api"
application-admission-webhook: 12:10:38 INFO juju.api cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: operation was canceled
application-admission-webhook: 12:10:38 INFO juju.api connection established to "wss://10.152.183.228:17070/model/646c5e31-e9b1-4440-8487-421d41eeec13/api"
application-admission-webhook: 12:10:38 INFO juju.worker.apicaller [646c5e] "application-admission-webhook" successfully connected to "10.152.183.228:17070"
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "api-caller" manifold worker started at 2022-08-24 12:10:38.080910968 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "caas-units-manager" manifold worker completed successfully
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "caas-units-manager" manifold worker started at 2022-08-24 12:10:38.089304877 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "upgrader" manifold worker started at 2022-08-24 12:10:38.09049801 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "log-sender" manifold worker started at 2022-08-24 12:10:38.091603492 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "migration-minion" manifold worker started at 2022-08-24 12:10:38.091723727 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "upgrade-steps-runner" manifold worker started at 2022-08-24 12:10:38.091823131 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "upgrade-steps-runner" manifold worker completed successfully
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "migration-inactive-flag" manifold worker started at 2022-08-24 12:10:38.097327819 +0000 UTC
application-admission-webhook: 12:10:38 INFO juju.worker.caasupgrader abort check blocked until version event received
application-admission-webhook: 12:10:38 DEBUG juju.worker.caasupgrader current agent binary version: 2.9.33
application-admission-webhook: 12:10:38 INFO juju.worker.caasupgrader unblocking abort check
application-admission-webhook: 12:10:38 INFO juju.worker.migrationminion migration phase is now: NONE
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "charm-dir" manifold worker started at 2022-08-24 12:10:38.107536846 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.logger initial log config: "<root>=DEBUG"
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "logging-config-updater" manifold worker started at 2022-08-24 12:10:38.108163289 +0000 UTC
application-admission-webhook: 12:10:38 INFO juju.worker.logger logger worker started
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "api-address-updater" manifold worker started at 2022-08-24 12:10:38.108504014 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.dependency "proxy-config-updater" manifold worker started at 2022-08-24 12:10:38.108835072 +0000 UTC
application-admission-webhook: 12:10:38 DEBUG juju.worker.logger reconfiguring logging from "<root>=DEBUG" to "<root>=INFO"
application-admission-webhook: 12:10:38 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
application-admission-webhook: 12:10:38 INFO juju.worker.caasoperator.charm downloading ch:amd64/focal/admission-webhook-42 from API server
application-admission-webhook: 12:10:38 INFO juju.downloader downloading from ch:amd64/focal/admission-webhook-42
application-admission-webhook: 12:10:38 INFO juju.downloader download complete ("ch:amd64/focal/admission-webhook-42")
application-admission-webhook: 12:10:38 INFO juju.downloader download verified ("ch:amd64/focal/admission-webhook-42")
application-admission-webhook: 12:10:57 INFO juju.worker.caasoperator operator "admission-webhook" started
application-admission-webhook: 12:10:57 INFO juju.worker.caasoperator.runner start "admission-webhook/4"
application-admission-webhook: 12:10:57 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-admission-webhook-4
application-admission-webhook: 12:10:57 INFO juju.worker.leadership admission-webhook/4 promoted to leadership of admission-webhook
application-admission-webhook: 12:10:57 INFO juju.worker.caasoperator.uniter.admission-webhook/4 unit "admission-webhook/4" started
application-admission-webhook: 12:10:57 INFO juju.worker.caasoperator.uniter.admission-webhook/4 hooks are retried true
application-admission-webhook: 12:10:57 INFO juju.worker.caasoperator.uniter.admission-webhook/4.charm downloading ch:amd64/focal/admission-webhook-42 from API server
application-admission-webhook: 12:10:57 INFO juju.downloader downloading from ch:amd64/focal/admission-webhook-42
application-admission-webhook: 12:10:57 INFO juju.downloader download complete ("ch:amd64/focal/admission-webhook-42")
application-admission-webhook: 12:10:57 INFO juju.downloader download verified ("ch:amd64/focal/admission-webhook-42")
application-admission-webhook: 12:11:18 INFO juju.worker.caasoperator.uniter.admission-webhook/4 found queued "upgrade-charm" hook
application-admission-webhook: 12:11:24 INFO unit.admission-webhook/4.juju-log Running legacy hooks/upgrade-charm.
application-admission-webhook: 12:11:30 WARNING unit.admission-webhook/4.upgrade-charm Generating RSA private key, 2048 bit long modulus (2 primes)
application-admission-webhook: 12:11:30 WARNING unit.admission-webhook/4.upgrade-charm .............+++++
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm .........................................................................................+++++
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm e is 65537 (0x010001)
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm Generating RSA private key, 2048 bit long modulus (2 primes)
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm ...........+++++
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm ................................................................................+++++
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm e is 65537 (0x010001)
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm Signature ok
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm subject=C = GB, ST = Canonical, L = Canonical, O = Canonical, OU = Canonical, CN = 127.0.0.1
application-admission-webhook: 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm Getting CA Private Key
application-admission-webhook: 12:11:33 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation ran "upgrade-charm" hook (via hook dispatching script: dispatch)
application-admission-webhook: 12:11:33 INFO juju.worker.caasoperator.uniter.admission-webhook/4 found queued "config-changed" hook
application-admission-webhook: 12:11:37 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation ran "config-changed" hook (via hook dispatching script: dispatch)
application-admission-webhook: 12:11:37 INFO juju.worker.caasoperator started pod init on "admission-webhook/4"
application-admission-webhook: 12:15:42 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation ran "update-status" hook (via hook dispatching script: dispatch)
application-admission-webhook: 12:21:33 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation ran "update-status" hook (via hook dispatching script: dispatch)
application-admission-webhook: 12:26:02 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation ran "update-status" hook (via hook dispatching script: dispatch)
application-admission-webhook: 12:30:38 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation ran "update-status" hook (via hook dispatching script: dispatch)

API server logs:

2022-08-24 12:10:38 INFO juju.apiserver.connection request_notifier.go:96 agent login: application-admission-webhook for 646c5e31-e9b1-4440-8487-421d41eeec13
5353a87f-64ff-44da-8664-389bc91fcbf6: controller-0 2022-08-24 12:10:38 INFO juju.apiserver.connection request_notifier.go:96 agent login: application-admission-webhook for 646c5e31-e9b1-4440-8487-421d41eeec13 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 INFO juju.cmd supercommand.go:56 running jujud [2.9.33 e83d2a73f904080c5cdf4aaed2821abd4f58253a gc go1.18.5] 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.cmd supercommand.go:57   args: []string{"/var/lib/juju/tools/jujud", "caasoperator", "--application-name=admission-webhook", "--debug"} 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.agent agent.go:603 read agent config, format "2.0" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 2.9.33 have already been run. 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 INFO juju.cmd.jujud caasoperator.go:204 caas operator application-admission-webhook start (2.9.33 [gc]) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "clock" manifold worker started at 2022-08-24 12:08:11.040627232 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.introspection worker.go:135 introspection worker listening on "@jujud-application-admission-webhook" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "upgrade-steps-gate" manifold worker started at 2022-08-24 12:08:11.041395951 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "caas-units-manager" manifold worker started at 2022-08-24 12:08:11.041710838 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "agent" manifold worker started at 2022-08-24 12:08:11.041746533 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.introspection worker.go:161 stats worker now serving 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:601 "caas-units-manager" manifold worker completed successfully 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "caas-units-manager" manifold worker started at 2022-08-24 12:08:11.050878063 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "api-config-watcher" manifold worker started at 2022-08-24 12:08:11.052175801 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "upgrade-steps-flag" manifold worker started at 2022-08-24 12:08:11.053470761 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.apicaller connect.go:129 connecting with old password 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:11 DEBUG juju.worker.dependency engine.go:578 "migration-fortress" manifold worker started at 2022-08-24 12:08:11.071644351 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:14 INFO juju.api apiclient.go:1055 cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:14 DEBUG juju.worker.apicaller connect.go:160 [646c5e] failed to connect 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:14 DEBUG juju.worker.dependency engine.go:616 "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:14 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:17 DEBUG juju.worker.apicaller connect.go:129 connecting with old password 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:20 INFO juju.api apiclient.go:1055 cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:20 DEBUG juju.api apiclient.go:1137 no error, but not connected, probably cancelled before we started 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:21 DEBUG juju.worker.apicaller connect.go:160 [646c5e] failed to connect 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:21 DEBUG juju.worker.dependency engine.go:616 "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:21 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:25 DEBUG juju.worker.apicaller connect.go:129 connecting with old password 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:28 INFO juju.api apiclient.go:1055 cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:28 DEBUG juju.worker.apicaller connect.go:160 [646c5e] failed to connect 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:28 DEBUG juju.worker.dependency engine.go:616 "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:28 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:34 DEBUG juju.worker.apicaller connect.go:129 connecting with old password 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:37 INFO juju.api apiclient.go:1055 cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:08:37 DEBUG juju.api apiclient.go:1137 no error, but not connected, probably cancelled before we started 
[the same errors repeat several more times]
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:39 DEBUG juju.worker.apicaller connect.go:160 [646c5e] failed to connect 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:39 DEBUG juju.worker.dependency engine.go:616 "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:39 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:54 DEBUG juju.worker.apicaller connect.go:129 connecting with old password 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:57 INFO juju.api apiclient.go:1055 cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:57 DEBUG juju.worker.apicaller connect.go:160 [646c5e] failed to connect 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:57 DEBUG juju.worker.dependency engine.go:616 "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:09:57 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: i/o timeout 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:15 DEBUG juju.worker.apicaller connect.go:129 connecting with old password 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:15 DEBUG juju.api apiclient.go:806 looked up controller-service.controller-my-controller.svc.cluster.local -> [10.152.183.228] 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:15 DEBUG juju.worker.apicaller connect.go:160 [646c5e] failed to connect 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:15 DEBUG juju.worker.dependency engine.go:616 "api-caller" manifold worker stopped: [646c5e] "application-admission-webhook" cannot open api: unable to connect to API: dial tcp 10.152.183.228:17070: connect: connection refused
github.com/juju/juju/worker/apicaller.ManifoldConfig.startFunc.func1:97: [646c5e] "application-admission-webhook" cannot open api 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:15 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [646c5e] "application-admission-webhook" cannot open api: unable to connect to API: dial tcp 10.152.183.228:17070: connect: connection refused 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.apicaller connect.go:129 connecting with old password 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://10.152.183.228:17070/model/646c5e31-e9b1-4440-8487-421d41eeec13/api" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.api apiclient.go:1055 cannot resolve "controller-service.controller-my-controller.svc.cluster.local": lookup controller-service.controller-my-controller.svc.cluster.local: operation was canceled 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.api apiclient.go:688 connection established to "wss://10.152.183.228:17070/model/646c5e31-e9b1-4440-8487-421d41eeec13/api" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.worker.apicaller connect.go:163 [646c5e] "application-admission-webhook" successfully connected to "10.152.183.228:17070" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "api-caller" manifold worker started at 2022-08-24 12:10:38.080910968 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:601 "caas-units-manager" manifold worker completed successfully 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "caas-units-manager" manifold worker started at 2022-08-24 12:10:38.089304877 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "upgrader" manifold worker started at 2022-08-24 12:10:38.09049801 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "log-sender" manifold worker started at 2022-08-24 12:10:38.091603492 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "migration-minion" manifold worker started at 2022-08-24 12:10:38.091723727 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "upgrade-steps-runner" manifold worker started at 2022-08-24 12:10:38.091823131 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:601 "upgrade-steps-runner" manifold worker completed successfully 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "migration-inactive-flag" manifold worker started at 2022-08-24 12:10:38.097327819 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.caasupgrader upgrader.go:128 current agent binary version: 2.9.33 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "charm-dir" manifold worker started at 2022-08-24 12:10:38.107536846 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.logger logger.go:65 initial log config: "<root>=DEBUG" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "logging-config-updater" manifold worker started at 2022-08-24 12:10:38.108163289 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.worker.logger logger.go:120 logger worker started 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "api-address-updater" manifold worker started at 2022-08-24 12:10:38.108504014 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.dependency engine.go:578 "proxy-config-updater" manifold worker started at 2022-08-24 12:10:38.108835072 +0000 UTC 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 DEBUG juju.worker.logger logger.go:93 reconfiguring logging from "<root>=DEBUG" to "<root>=INFO" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 WARNING juju.worker.proxyupdater proxyupdater.go:282 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: "" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.worker.caasoperator.charm bundles.go:78 downloading ch:amd64/focal/admission-webhook-42 from API server 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.downloader download.go:110 downloading from ch:amd64/focal/admission-webhook-42 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.downloader download.go:93 download complete ("ch:amd64/focal/admission-webhook-42") 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:38 INFO juju.downloader download.go:173 download verified ("ch:amd64/focal/admission-webhook-42") 
5353a87f-64ff-44da-8664-389bc91fcbf6: controller-0 2022-08-24 12:10:52 INFO juju.kubernetes.provider admissionregistration.go:59 ensuring mutating webhook "kubeflow-admission-webhook" with version "v1beta1" 
2022-08-24 12:10:52 INFO juju.kubernetes.provider admissionregistration.go:59 ensuring mutating webhook "kubeflow-admission-webhook" with version "v1beta1"
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.worker.caasoperator caasoperator.go:424 operator "admission-webhook" started 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.worker.caasoperator.runner runner.go:556 start "admission-webhook/4" 
2022-08-24 12:10:57 INFO juju.worker.raft.raftforwarder target.go:174 claiming lease "646c5e31-e9b1-4440-8487-421d41eeec13:application-leadership#admission-webhook#" for "admission-webhook/4"
5353a87f-64ff-44da-8664-389bc91fcbf6: controller-0 2022-08-24 12:10:57 INFO juju.worker.raft.raftforwarder target.go:174 claiming lease "646c5e31-e9b1-4440-8487-421d41eeec13:application-leadership#admission-webhook#" for "admission-webhook/4" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-admission-webhook-4 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.worker.leadership tracker.go:194 admission-webhook/4 promoted to leadership of admission-webhook 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.worker.caasoperator.uniter.admission-webhook/4 uniter.go:326 unit "admission-webhook/4" started 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.worker.caasoperator.uniter.admission-webhook/4 uniter.go:344 hooks are retried true 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.worker.caasoperator.uniter.admission-webhook/4.charm bundles.go:78 downloading ch:amd64/focal/admission-webhook-42 from API server 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.downloader download.go:110 downloading from ch:amd64/focal/admission-webhook-42 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.downloader download.go:93 download complete ("ch:amd64/focal/admission-webhook-42") 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:10:57 INFO juju.downloader download.go:173 download verified ("ch:amd64/focal/admission-webhook-42") 
2022-08-24 12:11:10 INFO juju.kubernetes.provider admissionregistration.go:59 ensuring mutating webhook "kubeflow-katib.kubeflow.org" with version "v1beta1"
5353a87f-64ff-44da-8664-389bc91fcbf6: controller-0 2022-08-24 12:11:10 INFO juju.kubernetes.provider admissionregistration.go:59 ensuring mutating webhook "kubeflow-katib.kubeflow.org" with version "v1beta1" 
2022-08-24 12:11:11 INFO juju.kubernetes.provider admissionregistration.go:249 ensuring validating webhook "kubeflow-katib.kubeflow.org" with version "v1beta1"
5353a87f-64ff-44da-8664-389bc91fcbf6: controller-0 2022-08-24 12:11:11 INFO juju.kubernetes.provider admissionregistration.go:249 ensuring validating webhook "kubeflow-katib.kubeflow.org" with version "v1beta1" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:18 INFO juju.worker.caasoperator.uniter.admission-webhook/4 resolver.go:149 found queued "upgrade-charm" hook 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:24 INFO unit.admission-webhook/4.juju-log server.go:316 Running legacy hooks/upgrade-charm. 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:30 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 Generating RSA private key, 2048 bit long modulus (2 primes) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:30 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 .............+++++ 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 .........................................................................................+++++ 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 e is 65537 (0x010001) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 Generating RSA private key, 2048 bit long modulus (2 primes) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 ...........+++++ 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 ................................................................................+++++ 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 e is 65537 (0x010001) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 Signature ok 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 subject=C = GB, ST = Canonical, L = Canonical, O = Canonical, OU = Canonical, CN = 127.0.0.1 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:31 WARNING unit.admission-webhook/4.upgrade-charm logger.go:60 Getting CA Private Key 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:33 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation runhook.go:146 ran "upgrade-charm" hook (via hook dispatching script: dispatch) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:33 INFO juju.worker.caasoperator.uniter.admission-webhook/4 resolver.go:149 found queued "config-changed" hook 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:37 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation runhook.go:146 ran "config-changed" hook (via hook dispatching script: dispatch) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:11:37 INFO juju.worker.caasoperator initializer.go:116 started pod init on "admission-webhook/4" 
2022-08-24 12:12:37 INFO juju.kubernetes.provider admissionregistration.go:59 ensuring mutating webhook "kubeflow-admission-webhook" with version "v1beta1"
5353a87f-64ff-44da-8664-389bc91fcbf6: controller-0 2022-08-24 12:12:37 INFO juju.kubernetes.provider admissionregistration.go:59 ensuring mutating webhook "kubeflow-admission-webhook" with version "v1beta1" 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:15:42 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation runhook.go:146 ran "update-status" hook (via hook dispatching script: dispatch) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:21:33 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation runhook.go:146 ran "update-status" hook (via hook dispatching script: dispatch) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:26:02 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation runhook.go:146 ran "update-status" hook (via hook dispatching script: dispatch) 
646c5e31-e9b1-4440-8487-421d41eeec13: application-admission-webhook 2022-08-24 12:30:38 INFO juju.worker.caasoperator.uniter.admission-webhook/4.operation runhook.go:146 ran "update-status" hook (via hook dispatching script: dispatch) 

Steps to reproduce

Deploy 1.6 bundle
Connect to dashboard
Create a notebook
Restart the instance
Start an existing notebook server or create a new one

Workaround

Restart the deployment:
kubectl rollout restart deployment/admission-webhook -n kubeflow

incorrect on_remove logs

The messages logged by the on_remove event handler are swapped: it logs "Failed to remove K8s resources" when removing the CRDs fails, and vice versa.
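
A minimal sketch of one way to keep each failure message tied to the step it describes, assuming the handler removes the two groups of resources in separate steps. The names remove_all, delete_k8s_resources and delete_crds are hypothetical and not taken from the charm:

```python
import logging


def remove_all(delete_k8s_resources, delete_crds, logger):
    """Run both removal steps and log a message naming the step that failed."""
    # Pair each label with its action so a message cannot drift to the wrong step.
    steps = [
        ("K8s resources", delete_k8s_resources),
        ("CRDs", delete_crds),
    ]
    failed = []
    for label, delete in steps:
        try:
            delete()
        except Exception as err:  # sketch only: a real handler would catch a narrower error
            logger.warning("Failed to remove %s: %s", label, err)
            failed.append(label)
    return failed
```

Because the label and the callable live in the same tuple, swapping the two messages would require swapping the actions as well.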

admission workload doesn't restart after charm's reconfiguration.

Bug Description

The workload doesn't restart after the charm is reconfigured. If the workload is misconfigured, it is restarted once it hits the health check's failure threshold, around three and a half minutes after deployment (threshold * (period + timeout)). The workload's unhealthy state only becomes visible to the user once an update-status event fires, which is 5 minutes after deployment. If the user then reconfigures the charm, the change has no effect on the running workload, because the workload isn't restarted. It keeps its initial configuration, and the logs keep showing health check failures against the previous port.

The opposite scenario exposes the same issue: if the port configuration is changed to an incorrect value after the charm is deployed, the workload is unaffected and the charm remains healthy. This shows that the workload isn't restarted and still uses the port it was initially given.

Update-status

During update-status, the charm's status is set to Maintenance (Workload failed health check) and it logs the message below. The message is inaccurate: the workload isn't restarted every time an update-status event is received, only when the failure threshold is hit.

unit-admission-webhook-0: 14:54:48 ERROR unit.admission-webhook/0.juju-log Container admission-webhook failed health check. It will be restarted.

Questions

  1. Is Pebble expected to restart the workload only once, when the health check failure threshold is hit, and never again? Asked about this in a Matrix thread.
  2. Should config-changed events restart the workload? I understand that they should, since the charm updates its layer. For comparison, when oidc-gatekeeper's public-url is reconfigured, its workload is actually restarted. I'm not sure how this may interact with its service_patch, which is configured during __init__().
  3. What should we log?

To Reproduce

  • Deploy admission-webhook charm with its port misconfigured e.g. with argument --config port=3333
    juju deploy admission-webhook --channel latest/edge --trust --config port=3333  
    
  • Wait until the health check is down. This takes around 3 and a half minutes. You can view its checks with
    kubectl -n kubeflow exec admission-webhook-0 -c admission-webhook -- /charm/bin/pebble checks
    
  • Reconfigure its port
    juju config admission-webhook port=4443
    
  • Observe the workload logs

Environment

╰─$ microk8s version
MicroK8s v1.26.11 revision 6237

╰─$ juju version --all
version: 3.1.7-genericlinux-amd64
git-commit: 0cd207d999fef1fc8b965c410e9f58fafe7ee335
git-tree-state: archive

Relevant Log Output

╰─$ kubectl -n kubeflow logs admission-webhook-0 -c admission-webhook
2024-02-15T12:52:03.914Z [pebble] HTTP API server listening on ":38813".
2024-02-15T12:52:03.914Z [pebble] Started daemon.
2024-02-15T12:52:07.731Z [pebble] POST /v1/files 3.479094ms 200
2024-02-15T12:52:07.735Z [pebble] POST /v1/files 3.194301ms 200
2024-02-15T12:52:09.290Z [pebble] GET /v1/plan?format=yaml 176.341µs 200
2024-02-15T12:52:09.292Z [pebble] POST /v1/layers 390.801µs 200
2024-02-15T12:52:09.304Z [pebble] POST /v1/services 4.234492ms 202
2024-02-15T12:52:09.307Z [pebble] Service "admission-webhook" starting: /webhook
2024-02-15T12:52:09.327Z [admission-webhook] {"level":"info","ts":"2024-02-15T12:52:09Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
2024-02-15T12:52:09.327Z [admission-webhook] I0215 12:52:09.327074      14 main.go:771] About to start serving webhooks: &http.Server{Addr:":4443", Handler:http.Handler(nil), DisableGeneralOptionsHandler:false, TLSConfig:(*tls.Config)(0xc0006829c0), ReadTimeout:0, ReadHeaderTimeout:0, WriteTimeout:0, IdleTimeout:0, MaxHeaderBytes:0, TLSNextProto:map[string]func(*http.Server, *tls.Conn, http.Handler)(nil), ConnState:(func(net.Conn, http.ConnState))(nil), ErrorLog:(*log.Logger)(nil), BaseContext:(func(net.Listener) context.Context)(nil), ConnContext:(func(context.Context, net.Conn) context.Context)(nil), inShutdown:atomic.Bool{_:atomic.noCopy{}, v:0x0}, disableKeepAlives:atomic.Bool{_:atomic.noCopy{}, v:0x0}, nextProtoOnce:sync.Once{done:0x0, m:sync.Mutex{state:0, sema:0x0}}, nextProtoErr:error(nil), mu:sync.Mutex{state:0, sema:0x0}, listeners:map[*net.Listener]struct {}(nil), activeConn:map[*http.conn]struct {}(nil), onShutdown:[]func()(nil), listenerGroup:sync.WaitGroup{noCopy:sync.noCopy{}, state:atomic.Uint64{_:atomic.noCopy{}, _:atomic.align64{}, v:0x0}, sema:0x0}}
2024-02-15T12:52:09.327Z [admission-webhook] {"level":"info","ts":"2024-02-15T12:52:09Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
2024-02-15T12:52:10.313Z [pebble] GET /v1/changes/1/wait?timeout=4.000s 1.008495385s 200
2024-02-15T12:52:12.305Z [pebble] GET /v1/plan?format=yaml 155.159µs 200
2024-02-15T12:52:39.292Z [pebble] Check "admission-webhook-up" failure 1 (threshold 4): dial tcp [::1]:3333: connect: connection refused
2024-02-15T12:53:09.294Z [pebble] Check "admission-webhook-up" failure 2 (threshold 4): dial tcp [::1]:3333: connect: connection refused
2024-02-15T12:53:39.293Z [pebble] Check "admission-webhook-up" failure 3 (threshold 4): dial tcp [::1]:3333: connect: connection refused
2024-02-15T12:54:09.294Z [pebble] Check "admission-webhook-up" failure 4 (threshold 4): dial tcp [::1]:3333: connect: connection refused


2024-02-15T12:54:09.294Z [pebble] Check "admission-webhook-up" failure threshold 4 hit, triggering action
2024-02-15T12:54:09.294Z [pebble] Service "admission-webhook" on-check-failure action is "restart", terminating process before restarting
2024-02-15T12:54:09.297Z [pebble] Service "admission-webhook" exited after check failure, restarting
2024-02-15T12:54:09.297Z [pebble] Service "admission-webhook" on-check-failure action is "restart", waiting ~500ms before restart (backoff 1)
2024-02-15T12:54:09.826Z [pebble] Service "admission-webhook" starting: /webhook
2024-02-15T12:54:09.850Z [admission-webhook] {"level":"info","ts":"2024-02-15T12:54:09Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
2024-02-15T12:54:09.850Z [admission-webhook] I0215 12:54:09.850506      24 main.go:771] About to start serving webhooks: &http.Server{Addr:":4443", Handler:http.Handler(nil), DisableGeneralOptionsHandler:false, TLSConfig:(*tls.Config)(0xc0002871e0), ReadTimeout:0, ReadHeaderTimeout:0, WriteTimeout:0, IdleTimeout:0, MaxHeaderBytes:0, TLSNextProto:map[string]func(*http.Server, *tls.Conn, http.Handler)(nil), ConnState:(func(net.Conn, http.ConnState))(nil), ErrorLog:(*log.Logger)(nil), BaseContext:(func(net.Listener) context.Context)(nil), ConnContext:(func(context.Context, net.Conn) context.Context)(nil), inShutdown:atomic.Bool{_:atomic.noCopy{}, v:0x0}, disableKeepAlives:atomic.Bool{_:atomic.noCopy{}, v:0x0}, nextProtoOnce:sync.Once{done:0x0, m:sync.Mutex{state:0, sema:0x0}}, nextProtoErr:error(nil), mu:sync.Mutex{state:0, sema:0x0}, listeners:map[*net.Listener]struct {}(nil), activeConn:map[*http.conn]struct {}(nil), onShutdown:[]func()(nil), listenerGroup:sync.WaitGroup{noCopy:sync.noCopy{}, state:atomic.Uint64{_:atomic.noCopy{}, _:atomic.align64{}, v:0x0}, sema:0x0}}
2024-02-15T12:54:09.850Z [admission-webhook] {"level":"info","ts":"2024-02-15T12:54:09Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
2024-02-15T12:54:39.293Z [pebble] Check "admission-webhook-up" failure 5 (threshold 4): dial tcp [::1]:3333: connect: connection refused
2024-02-15T12:54:48.114Z [pebble] GET /v1/checks?names=admission-webhook-up 175.272µs 200
2024-02-15T12:55:03.823Z [pebble] GET /v1/plan?format=yaml 195.019µs 200
2024-02-15T12:55:09.294Z [pebble] Check "admission-webhook-up" failure 6 (threshold 4): dial tcp [::1]:3333: connect: connection refused
2024-02-15T12:55:39.294Z [pebble] Check "admission-webhook-up" failure 7 (threshold 4): dial tcp [::1]:3333: connect: connection refused


# ran `juju config admission-webhook port=4443` here
2024-02-15T12:56:09.295Z [pebble] Check "admission-webhook-up" failure 8 (threshold 4): dial tcp [::1]:3333: connect: connection refused
2024-02-15T12:56:39.294Z [pebble] Check "admission-webhook-up" failure 9 (threshold 4): dial tcp [::1]:3333: connect: connection refused
2024-02-15T12:56:44.325Z [pebble] GET /v1/checks?names=admission-webhook-up 82.388µs 200

Additional Context

No response

Missing aggregation ClusterRoles

since admission webhook was converted to sidecar, the application-specific ClusterRoles can now be moved from kubeflow-roles operator to admission-webhook operator.

pebble service cannot start with `Error: open /etc/webhook/certs/cert.pem: no such file or directory`

This issue is to document the investigation of the bug reported in #126

Description

the admission-webhook pebble service does not start, with the error:

config=main.Config{CertFile:"/etc/webhook/certs/cert.pem", KeyFile:"/etc/webhook/certs/key.pem"} Error: open /etc/webhook/certs/cert.pem: no such file or directory
2024-02-13T11:53:36.729Z [container-agent] 2024-02-13T11:53:36Z ERROR cannot start service: exited quickly with code 255

The webhook service needs to find the certificate files (cert.pem and key.pem) in the workload container directory /etc/webhook/certs.
When these cert files are not found, the workload ends up in an unusable state where the webhook service is not up. This is problematic because users cannot create Notebooks, as reported in the issue.

Case where the flow works correctly

In the charm, we upload the certificates to the container on pebble-ready event, so the event sequence for this to be successful is:

  • install -> leader_elected -> admission_webhook_pebble_ready -> config_changed
    the admission_webhook_pebble_ready handler is:
    def _on_pebble_ready(self, event):
        """Configure started container."""
        # upload certs to container
        self._upload_certs_to_container(event)
        # proceed with other actions
        self._on_event(event)

    where on_event method is:
    def _on_event(self, event, force_conflicts: bool = False) -> None:
        """Perform all required actions for the Charm.

        Args:
            force_conflicts (bool): Should only be used when need to resolved conflicts on K8S
                resources.
        """
        try:
            self._check_leader()
            self._apply_k8s_resources(force_conflicts=force_conflicts)
            update_layer(
                self._container_name,
                self._container,
                self._admission_webhook_layer,
                self.logger,
            )
        except ErrorWithStatus as err:
            self.model.unit.status = err.status
            self.logger.error(f"Failed to handle {event} with error: {err}")
            return
        self.model.unit.status = ActiveStatus()

and the config_changed handler is set to on_event

The key thing to observe here is when update_layer is first called, because that is when the charm first attempts to start the service.
chisme's update_layer starts the Pebble service only if the Pebble service definition has changed. The method in chisme:

def update_layer(container_name: str, container: Container, new_layer: Layer, logger: Logger):
    """Updates the Pebble configuration layer if changed.

    Args:
        container_name (str): The name of the container to update layer.
        container (ops.model.Container): The container object to update layer.
        new_layer (ops.pebble.Layer): The layer object to be updated to the container.
        logger (logging.Logger): A logger to use for logging.
    """
    if not container.can_connect():
        raise ErrorWithStatus("Waiting for pod startup to complete", MaintenanceStatus)

    current_layer = container.get_plan()

    if current_layer.services != new_layer.services:          # HERE: this is the check referred to
        container.add_layer(container_name, new_layer, combine=True)
        try:
            logger.info("Pebble plan updated with new configuration, replanning")
            container.replan()
        except ChangeError:
            logger.error(traceback.format_exc())
            raise ErrorWithStatus("Failed to replan", BlockedStatus)

the code flow would be:

  1. pebble_ready handler executes -> certificates are pushed to container -> on_event is called -> update_layer first call so the service is started correctly
  2. config_changed handler executes on_event -> update_layer is called for the second time -> the services are unchanged, so the inequality check is False and the service is not restarted

Cases where it can go wrong

Case 1

update_layer gets called for the first time before the cert files are uploaded. This can happen when both of the following conditions are true:

  1. any event that calls update_layer gets fired before pebble_ready
  2. container.can_connect() returns True, so the update_layer function does not raise an error and proceeds to execute container.replan(), which starts the service
  • We know that condition 1 is possible during charm execution; for example, config_changed can happen before pebble_ready.
  • For condition 2 to apply, the pebble_ready event must have been fired as well, BUT there is no guarantee that the pebble_ready handler has executed, so a race condition is possible here.

Race condition

the error-prone scenario is like this:

  1. config_changed is fired
  2. config_changed handler execution starts
  3. before config_changed handler execution reaches the container.can_connect() check in update_layer, the container becomes ready
  4. pebble_ready event is queued, but the config_changed handler is in the middle of execution so it continues
  5. config_changed handler container.can_connect() check in update_layer returns True
  6. config_changed handler adds the layer and calls container.replan()
  7. the error is hit because the cert files are not found
  8. pebble_ready handler executes
  9. the service equivalence check if current_layer.services != new_layer.services: returns False because the service is the same
  10. pebble_ready handler skips adding the layer and container.replan() i.e. it does not restart the service

In this state, the charm is unable to recover from the error, because every handler will skip restarting the service.
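
The two orderings can be replayed with a small in-memory model. This is only a sketch: FakeContainer, SERVICES and run are hypothetical stand-ins, not the ops or chisme APIs, and the model assumes the webhook fails to start exactly when cert.pem is missing at replan time:

```python
class FakeContainer:
    """In-memory stand-in for the workload container (not the real ops API)."""

    def __init__(self):
        self.files = set()        # paths pushed so far
        self.plan_services = {}   # services currently in the Pebble plan
        self.running = False      # whether the webhook process is up

    def replan(self):
        # The webhook exits immediately if its certificate is missing.
        self.running = "/etc/webhook/certs/cert.pem" in self.files


SERVICES = {"admission-webhook": {"command": "/webhook"}}


def update_layer(container, new_services):
    """Mimic the check quoted above: replan only when the services differ."""
    if container.plan_services != new_services:
        container.plan_services = dict(new_services)
        container.replan()


def on_config_changed(container):
    update_layer(container, SERVICES)


def on_pebble_ready(container):
    container.files.add("/etc/webhook/certs/cert.pem")
    container.files.add("/etc/webhook/certs/key.pem")
    update_layer(container, SERVICES)


def run(handlers):
    """Run the handlers in order on a fresh container and report service state."""
    container = FakeContainer()
    for handler in handlers:
        handler(container)
    return container.running


good_order = run([on_pebble_ready, on_config_changed])
racy_order = run([on_config_changed, on_pebble_ready])
print("pebble_ready first, service running:", good_order)
print("config_changed first, service running:", racy_order)
```

In the second ordering the certificates are present by the end, but nothing triggers another replan, which matches steps 9 and 10 above.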

Case 2

The pebble_ready handler executes, but somehow the cert files are not uploaded.
Initially, we thought this could happen if the Pebble push API were non-blocking, meaning that the charm continues to the next handler before the files have finished copying.
To eliminate this possibility, I tested pushing files of different sizes to the container while measuring the time each push took. I observed that bigger files take more time to push. I also checked that the Pebble pull API does not raise an error right after the push call.
I therefore conclude that this case is unlikely.
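
A rough sketch of that kind of experiment, not the actual test that was run. FakeContainer is a hypothetical in-memory stand-in whose push and pull only mirror the shape of the ops Container methods; against a real container, a missing file would surface as a Pebble path error rather than FileNotFoundError:

```python
import io
import time


class FakeContainer:
    """In-memory stand-in with push/pull shaped like ops.model.Container."""

    def __init__(self):
        self._files = {}

    def push(self, path, source, make_dirs=False):
        self._files[path] = source

    def pull(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return io.StringIO(self._files[path])


def push_and_verify(container, path, content):
    """Push content, then read it back; return the seconds the push took."""
    start = time.monotonic()
    container.push(path, content, make_dirs=True)
    elapsed = time.monotonic() - start
    # If push returned before the file was written, this read would fail or differ.
    if container.pull(path).read() != content:
        raise RuntimeError(f"{path} was not fully written when push returned")
    return elapsed


container = FakeContainer()
for size in (1_000, 1_000_000):
    seconds = push_and_verify(container, "/etc/webhook/certs/cert.pem", "x" * size)
    print(f"{size} bytes pushed in {seconds:.6f}s")
```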

`MutatingWebhook` namespace selector missing, giving it wider scope than intended

Upstream Kubeflow applies a namespace selector in their manifests of:

  namespaceSelector:
    matchLabels:
      app.kubernetes.io/part-of: kubeflow-profile

which restricts the admission-webhook to apply only to namespaces that have been created by kubeflow-profiles for users. Currently, our webhook applies to all pods in all namespaces. I think this configuration is why, if we break our admission-webhook workload (e.g. delete the deployment but not the mutatingwebhook that uses it), all pods on the cluster are blocked from deploying.
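
As a simplified illustration of how a matchLabels selector narrows the scope, the following sketch evaluates the selector against a namespace's labels. The helper namespace_matches is hypothetical and handles only matchLabels, not matchExpressions:

```python
def namespace_matches(selector, namespace_labels):
    """Return True if every matchLabels pair is present on the namespace."""
    wanted = selector.get("matchLabels", {})
    return all(namespace_labels.get(key) == value for key, value in wanted.items())


selector = {"matchLabels": {"app.kubernetes.io/part-of": "kubeflow-profile"}}

profile_ns = {"app.kubernetes.io/part-of": "kubeflow-profile"}
system_ns = {"kubernetes.io/metadata.name": "kube-system"}

print(namespace_matches(selector, profile_ns))  # labelled profile namespace
print(namespace_matches(selector, system_ns))   # unlabelled system namespace
print(namespace_matches({}, system_ns))         # empty selector, i.e. no restriction
```

The last call models the current situation: with no selector, every namespace is in scope.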

We can adopt upstream's configuration by changing our charm's mutatingwebhookconfiguration from:

"mutatingWebhookConfigurations": [
    {
        "name": "admission-webhook",
        "webhooks": [
            {
                "name": "admission-webhook.kubeflow.org",
                "failurePolicy": "Fail",
                "clientConfig": {
                    "caBundle": ca_bundle,
                    "service": {
                        "name": hookenv.service_name(),
                        "namespace": model,
                        "path": "/apply-poddefault",
                        "port": 4443,
                    },
                },
                "objectSelector": {
                    "matchExpressions": [
                        {
                            "key": "juju-app",
                            "operator": "NotIn",
                            "values": ["admission-webhook"],
                        },
                        {
                            "key": "app.kubernetes.io/name",
                            "operator": "NotIn",
                            "values": ["admission-webhook"],
                        },
                        {
                            "key": "juju-operator",
                            "operator": "NotIn",
                            "values": ["admission-webhook"],
                        },
                        {
                            "key": "operator.juju.is/name",
                            "operator": "NotIn",
                            "values": ["admission-webhook"],
                        },
                    ]
                },
                "rules": [
                    {
                        "apiGroups": [""],
                        "apiVersions": ["v1"],
                        "operations": ["CREATE"],
                        "resources": ["pods"],
                    }
                ],
            },
        ],
    }
],

to:

"mutatingWebhookConfigurations": [
    {
        "name": "admission-webhook",
        "webhooks": [
            {
                "name": "admission-webhook.kubeflow.org",
                "failurePolicy": "Fail",
                "clientConfig": {
                    "caBundle": ca_bundle,
                    "service": {
                        "name": hookenv.service_name(),
                        "namespace": model,
                        "path": "/apply-poddefault",
                        "port": 4443,
                    },
                },
                "namespaceSelector": {
                    "matchLabels": {
                        "app.kubernetes.io/part-of": "kubeflow-profile",
                    },
                },
                "rules": [
                    {
                        "apiGroups": [""],
                        "apiVersions": ["v1"],
                        "operations": ["CREATE"],
                        "resources": ["pods"],
                    }
                ],
            },
        ],
    }
],

(remove objectSelector, add namespaceSelector). Strictly speaking, the objectSelector could stay, but I believe it is unnecessary with the namespaceSelector added.
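
The change can also be expressed as a small transformation on one webhook entry. This is a sketch with a hypothetical helper name (use_namespace_selector) and a trimmed-down example entry, not code from the charm:

```python
import copy


def use_namespace_selector(webhook, part_of="kubeflow-profile"):
    """Return a copy of a webhook entry scoped by namespace instead of by object."""
    updated = copy.deepcopy(webhook)
    updated.pop("objectSelector", None)
    updated["namespaceSelector"] = {
        "matchLabels": {"app.kubernetes.io/part-of": part_of},
    }
    return updated


before = {
    "name": "admission-webhook.kubeflow.org",
    "failurePolicy": "Fail",
    "objectSelector": {
        "matchExpressions": [
            {"key": "juju-app", "operator": "NotIn", "values": ["admission-webhook"]},
        ]
    },
}
after = use_namespace_selector(before)
print(sorted(after))
```

The input entry is left untouched, so the two variants can be compared side by side in a test.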

To test this:

  • juju deploy admission-webhook
  • create a namespace (note that if the selectors are changed to a namespace selector as described above, we must have the correct label in the namespace metadata)
apiVersion: v1
kind: Namespace
metadata:
  name: user
  labels:
    app.kubernetes.io/part-of: kubeflow-profile
  • create a PodDefault, for example (the content of this poddefault doesn't matter - this was taken from our kfp charm):
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: user
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org      
  volumeMounts:
    - mountPath: /var/run/secrets/kubeflow/pipelines
      name: volume-kf-pipeline-token
      readOnly: true
  env:
    - name: KF_PIPELINES_SA_TOKEN_PATH
      value: /var/run/secrets/kubeflow/pipelines/token
  • create a pod with the above cited label of access-ml-pipeline: "true":
apiVersion: v1
kind: Pod
metadata:
  labels:
    access-ml-pipeline: "true"
  name: testpod
  namespace: user
spec:
  containers:
  - args:
    - "while true; do sleep 3600; done"
    command: ["/bin/bash", "-c", "--"]
    image: ubuntu:latest
    imagePullPolicy: Always
    name: ubuntu

  • exec a shell in the above pod (kubectl exec -it testpod -n user -- bash) and confirm that /var/run/secrets/kubeflow/pipelines/token exists

Something like the procedure above should be added as an integration test so we confirm PodDefaults actually work correctly

kubeflow deploy deadlocks after deploying admission-webhook

Running the following script:
https://github.com/mlopsworks/charms/blob/main/integration_test_microk8s

On Ubuntu 20.10, I get a kubeflow setup that's stuck:

image (5)

kubectl get events -A

is full of:

controller-micro   4m23s       Warning   FailedCreate              statefulset/argo-controller-operator                          create Pod argo-controller-operator-0 in StatefulSet argo-controller-operator failed error: Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": Post "https://admission-webhook.controller.svc:443/apply-poddefault?timeout=30s": service "admission-webhook" not found
controller-micro   3m55s       Warning   FailedCreate              statefulset/dex-auth-operator                                 create Pod dex-auth-operator-0 in StatefulSet dex-auth-operator failed error: Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": Post "https://admission-webhook.controller.svc:443/apply-poddefault?timeout=30s": service "admission-webhook" not found
controller-micro   3m53s       Warning   FailedCreate              statefulset/istio-ingressgateway-operator                     create Pod istio-ingressgateway-operator-0 in StatefulSet istio-ingressgateway-operator failed error: Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": Post "https://admission-webhook.controller.svc:443/apply-poddefault?timeout=30s": service "admission-webhook" not found
controller-micro   3m50s       Warning   FailedCreate              statefulset/istio-pilot-operator                              create Pod istio-pilot-operator-0 in StatefulSet istio-pilot-operator failed error: Internal error occurred: failed calling webhook "admission-webhook.kubeflow.org": Post "https://admission-webhook.controller.svc:443/apply-poddefault?timeout=30s": service "admission-webhook" not found

the admission-webhook pod seems fine:

luke@mind:~/pc/charms$ kubectl logs admission-webhook-5bb76f79f7-7k5jk -n controller-micro
I0521 15:25:54.607026       1 main.go:552] About to start serving webhooks: &http.Server{Addr:":443", Handler:http.Handler(nil), TLSConfig:(*tls.Config)(0xc0001e1b00), ReadTimeout:0, ReadHeaderTimeout:0, WriteTimeout:0, IdleTimeout:0, MaxHeaderBytes:0, TLSNextProto:map[string]func(*http.Server, *tls.Conn, http.Handler)(nil), ConnState:(func(net.Conn, http.ConnState))(nil), ErrorLog:(*log.Logger)(nil), disableKeepAlives:0, inShutdown:0, nextProtoOnce:sync.Once{m:sync.Mutex{state:0, sema:0x0}, done:0x0}, nextProtoErr:error(nil), mu:sync.Mutex{state:0, sema:0x0}, listeners:map[*net.Listener]struct {}(nil), activeConn:map[*http.conn]struct {}(nil), doneChan:(chan struct {})(nil), onShutdown:[]func()(nil)}

there is an admission-webhook service...

luke@mind:~/pc/charms$ kubectl get services -A|grep admission-webhook
controller-micro   admission-webhook-operator           ClusterIP   10.152.183.69    <none>        30666/TCP                17m
controller-micro   admission-webhook                    ClusterIP   10.152.183.238   <none>        443/TCP                  16m

so I'm not sure why k8s is not finding it!
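
One thing the output above allows checking is whether the namespace embedded in the webhook URL matches the namespace the service actually lives in. The sketch below only compares strings copied from the events and from the service listing; the helper name is hypothetical and this is not a confirmed root cause:

```python
from urllib.parse import urlsplit


def service_ref_from_url(url):
    """Split a <service>.<namespace>.svc webhook URL into (service, namespace)."""
    host = urlsplit(url).hostname
    service, namespace = host.split(".")[:2]
    return service, namespace


url = "https://admission-webhook.controller.svc:443/apply-poddefault?timeout=30s"
service, namespace = service_ref_from_url(url)
actual_namespace = "controller-micro"  # from the `kubectl get services -A` output above

print(service, namespace)
print("namespace matches:", namespace == actual_namespace)
```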

testing in air-gapped: TLS handshake error

Bug Description

When deploying the admission-webhook charm in an air-gapped environment, the charm goes to active, but the workload container logs TLS handshake errors. This indicates that the certificates configured in the MutatingWebhookConfiguration cannot be verified, and the charm will not act as expected in the bundle.

To Reproduce

After setting up the air-gapped environment, go to the directory where the charm file is, and deploy the charm with --resource oci-image set to the image in the local registry.
juju deploy ./admission-webhook_98aac65.charm --resource oci-image=172.17.0.2:5000/kubeflownotebookswg/poddefaults-webhook:v1.7.0 --trust

Environment

following the script in canonical/bundle-kubeflow#682 (comment)

Relevant Log Output

microk8s kubectl logs admission-webhook-0 -c admission-webhook -nkubeflow
2023-08-29T14:09:04.466Z [pebble] HTTP API server listening on ":38813".
2023-08-29T14:09:04.466Z [pebble] Started daemon.
2023-08-29T14:09:15.820Z [pebble] POST /v1/files 5.123433ms 200
2023-08-29T14:09:15.824Z [pebble] POST /v1/files 3.933732ms 200
2023-08-29T14:09:16.800Z [pebble] GET /v1/plan?format=yaml 302.664µs 200
2023-08-29T14:09:16.802Z [pebble] POST /v1/layers 415.125µs 200
2023-08-29T14:09:16.811Z [pebble] POST /v1/services 4.067584ms 202
2023-08-29T14:09:16.815Z [pebble] Service "admission-webhook" starting: /webhook
2023-08-29T14:09:16.832Z [admission-webhook] I0829 14:09:16.832366      13 main.go:768] About to start serving webhooks: &http.Server{Addr:":4443", Handler:http.Handler(nil), TLSConfig:(*tls.Config)(0xc000496480), ReadTimeout:0, ReadHeaderTimeout:0, WriteTimeout:0, IdleTimeout:0, MaxHeaderBytes:0, TLSNextProto:map[string]func(*http.Server, *tls.Conn, http.Handler)(nil), ConnState:(func(net.Conn, http.ConnState))(nil), ErrorLog:(*log.Logger)(nil), BaseContext:(func(net.Listener) context.Context)(nil), ConnContext:(func(context.Context, net.Conn) context.Context)(nil), inShutdown:0, disableKeepAlives:0, nextProtoOnce:sync.Once{done:0x0, m:sync.Mutex{state:0, sema:0x0}}, nextProtoErr:error(nil), mu:sync.Mutex{state:0, sema:0x0}, listeners:map[*net.Listener]struct {}(nil), activeConn:map[*http.conn]struct {}(nil), doneChan:(chan struct {})(nil), onShutdown:[]func()(nil)}
2023-08-29T14:09:17.825Z [pebble] GET /v1/changes/1/wait?timeout=4.000s 1.01415687s 200
2023-08-29T14:09:19.456Z [pebble] GET /v1/plan?format=yaml 367.09µs 200
2023-08-29T14:09:46.804Z [admission-webhook] 2023/08/29 14:09:46 http: TLS handshake error from [::1]:36654: EOF
2023-08-29T14:10:16.803Z [admission-webhook] 2023/08/29 14:10:16 http: TLS handshake error from [::1]:37596: EOF
2023-08-29T14:10:46.803Z [admission-webhook] 2023/08/29 14:10:46 http: TLS handshake error from [::1]:41522: EOF
2023-08-29T14:11:00.597Z [pebble] GET /v1/plan?format=yaml 189.604µs 200
2023-08-29T14:11:05.386Z [pebble] GET /v1/checks 107.768µs 200
2023-08-29T14:11:16.803Z [admission-webhook] 2023/08/29 14:11:16 http: TLS handshake error from [::1]:44022: EOF
2023-08-29T14:11:35.521Z [pebble] GET /v1/plan?format=yaml 166.75µs 200
2023-08-29T14:11:41.220Z [pebble] GET /v1/checks 43.539µs 200
2023-08-29T14:11:46.804Z [admission-webhook] 2023/08/29 14:11:46 http: TLS handshake error from [::1]:33436: EOF
2023-08-29T14:12:16.803Z [admission-webhook] 2023/08/29 14:12:16 http: TLS handshake error from [::1]:46958: EOF
2023-08-29T14:12:46.803Z [admission-webhook] 2023/08/29 14:12:46 http: TLS handshake error from [::1]:34576: EOF
2023-08-29T14:13:16.804Z [admission-webhook] 2023/08/29 14:13:16 http: TLS handshake error from [::1]:41790: EOF
2023-08-29T14:13:46.802Z [admission-webhook] 2023/08/29 14:13:46 http: TLS handshake error from [::1]:53510: EOF
2023-08-29T14:13:51.783Z [pebble] GET /v1/checks?names=admission-webhook-up 52.829µs 200
2023-08-29T14:14:16.803Z [admission-webhook] 2023/08/29 14:14:16 http: TLS handshake error from [::1]:42772: EOF
2023-08-29T14:14:46.803Z [admission-webhook] 2023/08/29 14:14:46 http: TLS handshake error from [::1]:55118: EOF
2023-08-29T14:15:16.802Z [admission-webhook] 2023/08/29 14:15:16 http: TLS handshake error from [::1]:49968: EOF
2023-08-29T14:15:46.803Z [admission-webhook] 2023/08/29 14:15:46 http: TLS handshake error from [::1]:42028: EOF
2023-08-29T14:16:16.802Z [admission-webhook] 2023/08/29 14:16:16 http: TLS handshake error from [::1]:54382: EOF
2023-08-29T14:16:46.803Z [admission-webhook] 2023/08/29 14:16:46 http: TLS handshake error from [::1]:60010: EOF
2023-08-29T14:17:16.803Z [admission-webhook] 2023/08/29 14:17:16 http: TLS handshake error from [::1]:52832: EOF
2023-08-29T14:17:46.803Z [admission-webhook] 2023/08/29 14:17:46 http: TLS handshake error from [::1]:60104: EOF
2023-08-29T14:18:16.803Z [admission-webhook] 2023/08/29 14:18:16 http: TLS handshake error from [::1]:37882: EOF
2023-08-29T14:18:46.803Z [admission-webhook] 2023/08/29 14:18:46 http: TLS handshake error from [::1]:36522: EOF
2023-08-29T14:18:47.005Z [pebble] GET /v1/checks?names=admission-webhook-up 47.519µs 200
2023-08-29T14:19:16.803Z [admission-webhook] 2023/08/29 14:19:16 http: TLS handshake error from [::1]:39050: EOF
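
A quick way to see where these errors come from and how often they recur is to summarize the log. The following is a minimal Python sketch, not part of the charm. The function name `summarize_handshake_errors` and the regular expression are assumptions written against the line format shown above.

```python
import re
from datetime import datetime

# Matches the workload lines in the log above, for example:
# 2023-08-29T14:09:46.804Z [admission-webhook] ... http: TLS handshake error from [::1]:36654: EOF
HANDSHAKE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z \[admission-webhook\] "
    r".*TLS handshake error from (?P<peer>\S+): EOF"
)


def summarize_handshake_errors(log_text):
    """Return (error count, set of peer hosts, rounded gaps in seconds between errors)."""
    stamps, hosts = [], set()
    for line in log_text.splitlines():
        match = HANDSHAKE_RE.match(line.strip())
        if not match:
            continue  # Pebble lines and other workload output are ignored.
        stamps.append(datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%f"))
        # Drop the trailing ":port" so "[::1]:36654" becomes "[::1]".
        hosts.add(match["peer"].rsplit(":", 1)[0])
    gaps = [round((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
    return len(stamps), hosts, gaps


# Three handshake lines and one Pebble line copied from the log above.
sample = """\
2023-08-29T14:09:15.820Z [pebble] POST /v1/files 5.123433ms 200
2023-08-29T14:09:46.804Z [admission-webhook] 2023/08/29 14:09:46 http: TLS handshake error from [::1]:36654: EOF
2023-08-29T14:10:16.803Z [admission-webhook] 2023/08/29 14:10:16 http: TLS handshake error from [::1]:37596: EOF
2023-08-29T14:10:46.803Z [admission-webhook] 2023/08/29 14:10:46 http: TLS handshake error from [::1]:41522: EOF
"""
count, hosts, gaps = summarize_handshake_errors(sample)
print(count, hosts, gaps)
```

On the log above, every handshake error comes from the loopback address ([::1]) and the errors are roughly 30 seconds apart. What that cadence implies about the client opening the connection is not established by the log alone.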

Admission Webhook charm rewrite using sidecar pattern

Work items are tracked in Jira

Design

The design is specified in the corresponding spec.

Summary of the main design points:

  • Use the sidecar pattern and a Pebble layer.
  • Port the management of K8s resources from Pod Spec to the Pebble framework.
  • Add Pebble layer checks on the admission-webhook service.
  • Handle upgrade and update-status events.
  • Handle and test the remove event.
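
To make the Pebble layer and its service check more concrete, here is a minimal sketch of what such a layer could look like as a plain Python dictionary. It is not the charm's actual layer. The function name `build_layer` is hypothetical, and the check type, period, timeout and threshold are assumptions. The command `/webhook`, port 4443 and the check name `admission-webhook-up` are the values that appear in logs elsewhere on this page.

```python
def build_layer(service_name="admission-webhook", port=4443):
    """Return a Pebble layer dict with one service and one health check (illustrative only)."""
    return {
        "summary": f"{service_name} layer",
        "services": {
            service_name: {
                "override": "replace",
                "summary": "Entrypoint of the admission-webhook image",
                "command": "/webhook",  # command reported by Pebble in the logs on this page
                "startup": "enabled",
            }
        },
        "checks": {
            f"{service_name}-up": {
                "override": "replace",
                "period": "30s",        # assumed value
                "timeout": "20s",       # assumed value
                "threshold": 4,         # assumed value
                "tcp": {"port": port},  # assumed check type; an http or exec check is also possible
            }
        },
    }


layer = build_layer()
print(sorted(layer["checks"]))
```

In a charm built on the ops library, a dictionary of this shape can be wrapped in `ops.pebble.Layer` and passed to `Container.add_layer`, followed by a replan. Whether this charm does exactly that is not stated here.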

Testing

  • Integration tests verify functionality.
  • Unit tests cover charm functionality.

latest/edge charm stuck in maintenance with `Workload failed health check`

Bug Description

There is an issue with the latest/edge charm: it gets stuck in maintenance with the message `Workload failed health check`. It came up in this run of the kubeflow-profiles-operator CI.

To Reproduce

juju deploy admission-webhook --channel=latest/edge --trust

Environment

microk8s 1.25-strict/stable
juju 3.1/stable

Relevant Log Output

juju ssh admission-webhook/0

# PEBBLE_SOCKET=/charm/containers/admission-webhook/pebble.socket /charm/bin/pebble checks
Check                 Level  Status  Failures
admission-webhook-up  -      down    33/4
# PEBBLE_SOCKET=/charm/containers/admission-webhook/pebble.socket /charm/bin/pebble changes
ID   Status  Spawn               Ready               Summary
1    Error   today at 09:15 UTC  today at 09:15 UTC  Replan service "admission-webhook"

# PEBBLE_SOCKET=/charm/containers/admission-webhook/pebble.socket /charm/bin/pebble tasks 1
Status  Spawn               Ready               Summary
Error   today at 09:15 UTC  today at 09:15 UTC  Start service "admission-webhook"

......................................................................
Start service "admission-webhook"

2023-09-12T09:15:21Z INFO Most recent service output:
    F0912 09:15:21.267022      14 config.go:46] config=main.Config{CertFile:"/etc/webhook/certs/cert.pem", KeyFile:"/etc/webhook/certs/key.pem"} Error: open /etc/webhook/certs/cert.pem: no such file or directory
2023-09-12T09:15:21Z ERROR cannot start service: exited quickly with code 255
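
The service exits because the certificate file is absent when it starts. One way to surface this earlier is to check for the files before starting the service. The sketch below is an assumption-laden illustration rather than the charm's code: `missing_files` and `REQUIRED_FILES` are hypothetical names, and only the two paths are taken from the config dump above.

```python
import os

# Paths copied from the CertFile and KeyFile values in the service output above.
REQUIRED_FILES = (
    "/etc/webhook/certs/cert.pem",
    "/etc/webhook/certs/key.pem",
)


def missing_files(exists, paths=REQUIRED_FILES):
    """Return the paths for which the supplied exists(path) callable is false."""
    return [path for path in paths if not exists(path)]


if __name__ == "__main__":
    # Local filesystem check; inside a charm, the workload container's own
    # existence check would be passed in instead of os.path.exists.
    missing = missing_files(os.path.exists)
    if missing:
        print("not starting service, missing:", ", ".join(missing))
    else:
        print("certificate and key files are present")
```

Passing the existence check in as a callable keeps the helper usable both locally (`os.path.exists`) and from a sidecar charm, where `Container.exists` from the ops library checks the workload container's filesystem.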

Additional Context

No response

Missing pod-defaults interface

The pod-defaults interface was unintentionally removed in #68.
I came across this when trying to integrate with mlflow:

juju relate mlflow-server:pod-defaults admission-webhook:pod-defaults
ERROR application "admission-webhook" has no "pod-defaults" relation
