Comments (4)
I should note that:
- This affects ALL nodes in the cluster.
- I have restarted microk8s on all nodes using `sudo snap restart microk8s`, but it did not fix anything.
from microk8s.
Ok, so I managed to isolate the corrupted deployment configuration. Somehow there is a corrupted protocol buffer in the dqlite database.
Isolate the corrupted service
On any of the nodes, run
sudo /snap/microk8s/current/bin/dqlite \
--cert /var/snap/microk8s/current/var/kubernetes/backend/cluster.crt \
--key /var/snap/microk8s/current/var/kubernetes/backend/cluster.key \
--servers file:///var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml \
k8s
Then, at the dqlite prompt, run:
dqlite> select name from kine where name like '%deployments/default%';
I then copied the deployment names, dropped them into Sublime Text, and created a script with a bunch of lines that look like this:
echo "search" && microk8s kubectl get deployments/search-worker -o yaml | grep "apiVersion:"
This will error on the specific deployment that is causing the problem, and print `apiVersion: apps/v1` for everything else.
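The hand-built script can also be written as a loop. This is a minimal sketch, assuming the deployment names from the query are fed one per line on standard input; `check_deployments` is a hypothetical helper name, not something microk8s ships:

```shell
# Hypothetical helper: reads deployment names (one per line) from stdin and
# reports which ones the API server can still return as YAML.
check_deployments() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue   # skip blank lines
    # </dev/null keeps kubectl from consuming the list of names on stdin.
    if microk8s kubectl get "deployments/$name" -o yaml </dev/null 2>/dev/null \
        | grep -q "apiVersion:"; then
      echo "ok   $name"
    else
      echo "FAIL $name"
    fi
  done
}
```

For example, `printf 'search-worker\n' | check_deployments` checks a single deployment; a record the API server cannot read should show up as a `FAIL` line.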
View the configuration
Back in dqlite, grab the BLOB data for that particular bad registry entry, and the BLOB data for a good record while you're at it. The data is printed as a buffer of decimal byte values, most of which decode as ASCII.
The bad record's data starts with `107 56 115 0 10 21 10 7 97 112 112 115 16 118 49`, the latter part of which reads as `apps\x10v1`.
The good record's data starts with `107 56 115 0 10 21 10 7 97 112 112 115 47 118 49`, the latter part of which reads as `apps/v1`, which is what we want.
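To make the difference easier to see, the decimal bytes can be turned back into text. This is a small sketch in POSIX shell, where `decode_bytes` is a hypothetical helper that prints printable ASCII as-is and everything else as a `\xNN` escape:

```shell
# Hypothetical helper: takes decimal byte values as arguments and prints
# printable ASCII characters literally, everything else as \xNN.
decode_bytes() {
  for b in "$@"; do
    if [ "$b" -ge 32 ] && [ "$b" -le 126 ]; then
      # Convert the decimal value to octal and let printf emit the character.
      printf "\\$(printf '%03o' "$b")"
    else
      printf '\\x%02x' "$b"
    fi
  done
  printf '\n'
}

# Bad record prefix:  k8s\x00\x0a\x15\x0a\x07apps\x10v1
decode_bytes 107 56 115 0 10 21 10 7 97 112 112 115 16 118 49
# Good record prefix: k8s\x00\x0a\x15\x0a\x07apps/v1
decode_bytes 107 56 115 0 10 21 10 7 97 112 112 115 47 118 49
```

The two prefixes differ in a single byte: `16` (`\x10`) where `47` (`/`) should be.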
There doesn't appear to be any other corruption in here, but even if there is, it's this first part of this protocol buffer that I need to fix. Then I can just delete and recreate the deployment through the API as expected.
Basically, I either need to patch that `16` with a `47` in the dqlite database, or find a way to remove that registry entry. However, I'm not sure how to do this in a way where the change will propagate to the other nodes like it's supposed to.
from microk8s.
Explicitly deleting that record in the dqlite database unstuck the deployment lifecycle across the entire cluster, and things are now back in working order.
However, someone from the microk8s team should look into this, since it feels very wrong to me that a corrupted protocol buffer could ever find its way into the dqlite database, especially if this corruption results in completely knocking out basic reliability/recovery functionality.
from microk8s.
Basically, the root cause here seems to be the dqlite record being persisted with a resource type + version combination that does not exist in `kubectl api-resources`.
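As a rough illustration of that kind of check, the sketch below tests whether an apiVersion string is one the cluster actually serves. It uses the output of `kubectl api-versions` (which lists group/version strings such as `apps/v1`) rather than `api-resources`, and `is_known_api_version` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds only if the apiVersion given as $1 appears as
# an exact, whole line in the list of served versions read from stdin.
is_known_api_version() {
  grep -Fxq -- "$1"
}
```

For example, `microk8s kubectl api-versions | is_known_api_version "apps/v1"` would succeed, while the corrupted string `apps\x10v1` would not match any served version.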
Feels like the solution here is two-fold:
- The resource `KIND` + `APIVERSION` combination should be validated prior to persistence.
- `apiserver` should be updated to be more resilient to record corruption like this. Just because a single `deployment` record could not be read should not prevent things like `list` commands from succeeding.
I don't know if microk8s has its own `apiserver` implementation, or if this issue really belongs in the Kubernetes mainline, but a single corrupted byte in a single record in the dqlite database shouldn't have such an outsized effect on the platform.
from microk8s.