Comments (2)
@smokfyz For k3s auto-upgrades, basically nothing happens: the binary is replaced without Kubernetes even going down. Node upgrades are more invasive. Each node is drained, rebooted, and then uncordoned, one after another, so it is very important to be in HA.
I guess this would be problematic unless you have Longhorn correctly set up in an HA fashion. If you are in HA, it should be seamless IMHO, as Longhorn would have replicated the data across the cluster. Same for the StatefulSet: it should not be a problem if it is configured in an HA fashion. But I must admit I am not an expert in working with them.
@kube-hetzner/core Please correct me if I'm wrong.
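To make the point about replicas concrete, here is a minimal Python sketch of the check you would want to pass before a rolling node upgrade: for each node, which volumes would have no replica left while that node is drained? The volume names, node names, and the `volumes_at_risk` helper are all hypothetical; this does not talk to Longhorn or Kubernetes, it only illustrates the reasoning.

```python
from typing import Dict, List, Set


def volumes_at_risk(
    replica_nodes: Dict[str, Set[str]], nodes: List[str]
) -> Dict[str, List[str]]:
    """Map each node to the volumes whose only replicas live on that node.

    replica_nodes maps a volume name to the set of nodes holding a replica.
    These are assumed inputs, not data read from a real cluster.
    """
    at_risk: Dict[str, List[str]] = {}
    for node in nodes:
        # A volume becomes unavailable if every replica it has sits on `node`.
        lost = sorted(
            volume
            for volume, holders in replica_nodes.items()
            if holders and holders <= {node}
        )
        if lost:
            at_risk[node] = lost
    return at_risk


# Hypothetical layout: one volume replicated three ways, one not replicated.
replicas = {
    "pg-data": {"node-a", "node-b", "node-c"},
    "cache": {"node-b"},
}
result = volumes_at_risk(replicas, ["node-a", "node-b", "node-c"])
print(result)
```

An empty result means every volume keeps at least one replica no matter which single node is down, which is the situation where a one-at-a-time drain should be harmless for storage.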
from terraform-hcloud-kube-hetzner.
Here's what I was able to get out of GPT-4:
Node upgrades and the management of StatefulSets in a High Availability (HA) Kubernetes cluster, particularly with automatic node upgrades, present a complex but manageable scenario. Here are some thoughts and considerations:
- Importance of HA in Node Upgrades: In an HA cluster, automatic node upgrades are less likely to cause service disruptions. HA setups typically involve multiple replicas of nodes and pods, ensuring that even if one node is taken down for an upgrade, the others can continue to handle the workload. This redundancy is crucial for maintaining service availability.
- Challenges with StatefulSets: StatefulSets, which are used for managing stateful applications (like databases), have their own complexities. They maintain a sticky identity for each of their pods. During node upgrades, care must be taken to ensure that the state (like persistent data) is not lost or corrupted. This requires a well-thought-out data replication and backup strategy.
- Graceful Handling of Pod Eviction: During automatic node upgrades, pods are evicted as nodes are drained. For StatefulSets, Kubernetes tries to reschedule the pods to other nodes while maintaining their identity and connection to the right data volumes. Ensuring that other nodes have the capacity and configuration to accept these pods is essential.
- Storage Considerations with Longhorn or Similar Solutions: If you're using a distributed storage solution like Longhorn, it's crucial to ensure that your storage layer is also highly available and resilient to node failures. Data should be replicated across multiple nodes so that if one node goes down during an upgrade, the data remains accessible.
- Testing and Monitoring: Before implementing automatic upgrades in production, it's vital to test the process in a staging environment. Monitoring tools should be in place to quickly identify and address any issues that arise during the upgrade process.
- Version Compatibility and Rollback Plans: Ensure that the new versions of Kubernetes or other components are compatible with your existing workloads. Have a rollback plan in case the upgrade leads to unexpected issues.
- Coordination and Scheduling: While automatic upgrades are convenient, it might be beneficial to have some control over the scheduling to ensure upgrades occur during low-traffic periods, thereby minimizing potential impacts.
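The scheduling point can be sketched as a simple gate that only allows an upgrade or reboot inside a low-traffic window. This is an illustration in plain Python, not the actual configuration of kured or of this module; the `in_upgrade_window` name and the example window are assumptions.

```python
from datetime import datetime, time
from typing import Set


def in_upgrade_window(
    now: datetime, start: time, end: time, weekdays: Set[int]
) -> bool:
    """Return True if `now` falls inside the allowed upgrade window.

    weekdays uses datetime.weekday() numbering (Monday = 0) and is checked
    against the calendar day of `now`. If `end` is earlier than `start`,
    the window is treated as wrapping past midnight.
    """
    if now.weekday() not in weekdays:
        return False
    current = now.time()
    if start <= end:
        return start <= current < end
    # Overnight window, e.g. 23:00 to 04:00.
    return current >= start or current < end


# Hypothetical policy: weekends only, between 02:00 and 05:00.
weekend = {5, 6}
allowed = in_upgrade_window(datetime(2024, 1, 6, 3, 30), time(2, 0), time(5, 0), weekend)
print(allowed)
```

Keeping the check as a pure function of `now` makes it easy to test the policy against specific dates before trusting it with real reboots.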
In summary, while automatic node upgrades in an HA Kubernetes cluster offer many benefits in terms of reducing manual overhead and ensuring up-to-date systems, they require careful planning, robust infrastructure setup (especially for stateful applications), and thorough testing to ensure smooth operations without significant service disruptions.
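On the capacity point raised under pod eviction, a rough way to reason about it is to ask whether the pods evicted from one drained node fit into the free resources of the remaining nodes. The sketch below is a simplified first-fit-decreasing check on CPU requests only; the function name and the numbers are hypothetical, and the real Kubernetes scheduler also weighs memory, affinity, taints, and volume placement, so treat a positive answer as necessary rather than sufficient.

```python
from typing import Dict, List


def can_absorb_drain(free_millicpu: Dict[str, int], evicted_requests: List[int]) -> bool:
    """Heuristic check that evicted pods fit on the remaining nodes.

    free_millicpu maps each remaining node to its unreserved CPU (millicores).
    evicted_requests lists the CPU requests of the pods being evicted.
    Uses first-fit decreasing, so it can be pessimistic in rare packings.
    """
    remaining = dict(free_millicpu)  # copy so the caller's data is untouched
    for request in sorted(evicted_requests, reverse=True):
        target = next(
            (node for node, free in remaining.items() if free >= request), None
        )
        if target is None:
            return False
        remaining[target] -= request
    return True


# Hypothetical cluster: draining node-a evicts three pods onto node-b and node-c.
fits = can_absorb_drain({"node-b": 1000, "node-c": 500}, [600, 400, 300])
print(fits)
```

Note that total free capacity is not enough on its own: two nodes with 500m free each cannot take pods requesting 400m, 400m, and 200m, even though the totals match.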