Comments (14)
Personally I like the idea of using bird
. And with the experience we got provisioning /etc/network/interfaces
to persist the elastic IP on the host, I am confident we can make it work.
Something it presumably does not support is the situation where the host is up, but the api-server (or something else not host-related) is down in a way that prevents the control plane from working correctly. In that scenario, bird
won't re-allocate the IP.
In general, if we go with BGP support I would like to add a field to PacketCluster to make it optional. Something like: PacketCluster.elasticIPRecoveryStrategy=none|bgp
. Currently, this is not a smooth process, because cluster-template
is not that flexible (we need to configure BGP/bird, or not, based on that field).
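A rough sketch of the optional API field proposed above. The field name and the none|bgp values come from this comment; the type names, the spec shape, and the kubebuilder markers are assumptions, not the current PacketCluster API.

```go
package main

import "fmt"

// ElasticIPRecoveryStrategy selects how the control plane elastic IP is
// recovered after a failure.
// +kubebuilder:validation:Enum=none;bgp
type ElasticIPRecoveryStrategy string

const (
	ElasticIPRecoveryNone ElasticIPRecoveryStrategy = "none"
	ElasticIPRecoveryBGP  ElasticIPRecoveryStrategy = "bgp"
)

// PacketClusterSpec shows only the proposed field; the real spec has more.
type PacketClusterSpec struct {
	// ElasticIPRecoveryStrategy is optional; when empty it behaves as "none",
	// and cluster-template would configure BGP/bird only when it is "bgp".
	// +optional
	ElasticIPRecoveryStrategy ElasticIPRecoveryStrategy `json:"elasticIPRecoveryStrategy,omitempty"`
}

func main() {
	fmt.Println("valid elasticIPRecoveryStrategy values:", ElasticIPRecoveryNone, ElasticIPRecoveryBGP)
}
```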
I will have a chat with people from cluster-api to see if they have API tips for me.
from cluster-api-provider-packet.
There is a guide to running bird for BGP on Packet here
The ccm deploys MetalLB (well, almost; it is disabled, but will be enabled as an option in the next week or so) to create load balancers for services of type=LoadBalancer
. We could approach the metallb folks about an option to have it deploy for other things too. It would be nice to have something k8s-native do this part as well.
from cluster-api-provider-packet.
I gathered some information about how we can make this work, and this is what I got:
- We can use keepalived/watchdog and bird via BGP.
- PRO: It uses BGP and the foundation we provide. I think it is a good way to showcase how to use those technologies.
- CONS: metallb and bird will use the same BGP "channel" (not sure if that is the right term) and this will make BGP unhappy, so they can't both run on the control plane. @c0dyhi11 can probably elaborate.
- NOTE: keepalived, watchdog, and BGP are new to me, and even Cody is not 100% sure we can make it work, but 99% it is a YES! So this is a small CONS.
- kube-vip
- CONS: It won't work as it is, because it uses L2 via ARP.
- PRO: Dan (the maintainer) told me that it won't take much to make it work without L2.
- NOTE: never used it :D
- metallb control plane: the idea is to use a k8s addon to spin up, during kubeadm init, a static pod for metallb that has to be removed as soon as a worker joins the cluster.
- PRO: at the end of the process you have metallb and a cluster up and running. Pretty clean.
- CONS: the process of deploying a static pod, watching its state, and removing it looks a bit flaky, and it requires shipping a dedicated program (it can be a container by itself).
- Control loop that runs as part of the cluster-api-provider process
- CONS: it is not really a standard approach and it has to be maintained
- CONS: I didn't find a way to leverage client-go and Kubernetes to write this loop so that it is as scalable and reliable as the Shared Informers are.
- PRO: not that hard to write
- NOTE: I know Go
from cluster-api-provider-packet.
Commenting on each of them in turn:
We can use keepalived/watchdog and bird via BGP
What this does is reinvent metallb, somewhat outside of the k8s native structure. The distinct upside is that all of the control nodes are active at once, which is ideal. No switchover time. I would want to compare this to just running metallb on the control plane nodes. If the latter works, it obviously is easier to manage.
kube-vip
I think it is worth a conversation with @thebsdbox about how that would work. It isn't just a question of choosing to enable/disable the IP on the various nodes, but also a question of informing the upstream router that a particular server will be able to handle that IP. Since all of Packet is L3, the upstream has to know which ones have it. That usually is done via BGP, which brings us back to the original conflict question.
metallb control plane
If we can get this to work, it would be great. However, there might be the same conflicts. The question is, why would we want to remove the metallb pod once a worker has spun up? Why can we not leave it there?
Also remember that the metallb config can have different peers with different selectors (which we make use of), and different address pools with different selectors as well as restricted address pools. It might be possible to utilize a single metallb deployment for all of this.
The other question is bootstrapping. Most bootstrapping mechanisms (kubeadm, k3s, by extension cluster-api itself) need the IP to provide to each node as it bootstraps. If we can get it to work correctly that way, it will be pretty good.
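The per-peer and per-pool selectors mentioned above could look something like the sketch below, using metallb's legacy ConfigMap format. The peer address and ASNs shown are the usual Packet BGP defaults (local ASN 65000, peer 169.254.255.1, peer ASN 65530), and the address pool value is only a placeholder for the cluster's elastic IP; treat all of it as illustrative, not a working config.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 169.254.255.1
      peer-asn: 65530
      my-asn: 65000
      # Only the control plane nodes speak for this peer
      node-selectors:
      - match-labels:
          node-role.kubernetes.io/master: ""
    address-pools:
    - name: control-plane-endpoint
      protocol: bgp
      auto-assign: false
      addresses:
      - 147.75.x.x/32   # placeholder for the cluster's elastic IP
```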
control loop in cluster-api process
The ccm already has control loops in goroutines to do exactly this (but for load balancer allocation), so it wouldn't be too hard. I probably would do it in ClusterReconciler.SetupWithManager().
A fifth potential approach (combining some of those) is to do it in the CCM, where we already have control loops and are modifying metallb configs. I am not 100% sure that moving this down a layer, though, makes sense.
from cluster-api-provider-packet.
A fifth potential approach (combining some of those) is to do it in the CCM, where we already have control loops and are modifying metallb configs. I am not 100% sure that moving this down a layer, though, makes sense.
It looks like having this logic as part of the CCM would allow Kubernetes HA to work even without cluster-api. That is probably a good benefit.
I agree, having metallb working sounds ideal
from cluster-api-provider-packet.
I see two problems with the "control loop in CAPP" solution. I am not as concerned with setting up a separate loop, as we can have a goroutine with a timer to keep an eye on things. Not perfect, but it works. My concerns are:
- The control plane now depends on CAPP running to work in the case of control plane failure. Normally, you rely on CAPP only during changes, so its required level of reliability can be lower; now you need to depend on it for close-to-normal running. It isn't awful - we do rely on the MachineDeployment to control the number of machines - but it is less than perfect.
- It would have to be a timer loop, which means we might not know when something changed.
The key question now is, what is the fastest way to do this? I suspect metallb, in or outside of CCM, but I am not sure. Thoughts @gianarb?
from cluster-api-provider-packet.
I have the same question; that's why I don't have an answer yet. When you say "I suspect metallb, in or outside of CCM", do you mean the addon static pod? Because it does not look like something quick to do if we have to manage its lifecycle (we would have to build and ship a new container/binary).
from cluster-api-provider-packet.
Looking at this conundrum, I've seen it approached in a few different ways. I think the simplest method is something like a daemonset
that is tagged to run on control plane nodes. These containers use client-go
for leader election and use the GoBGP client to announce which node is the leader on a new election event.
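The daemonset approach described above could be sketched as below, using client-go's leaderelection package with a Lease lock. This is a hedged sketch that needs a real cluster to run; advertiseElasticIP and withdrawElasticIP are hypothetical hooks for the BGP (e.g. GoBGP) announce/withdraw logic, and the lease name/namespace are arbitrary choices.

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// advertiseElasticIP and withdrawElasticIP are hypothetical: they stand in
// for announcing/withdrawing the elastic IP over BGP.
func advertiseElasticIP(ctx context.Context) {}
func withdrawElasticIP()                     {}

func main() {
	// In-cluster config, since this runs as a daemonset on control plane nodes.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	hostname, _ := os.Hostname()
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "bgp-speaker", Namespace: "kube-system"},
		Client:     clientset.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			// Announce while we hold the lease; withdraw when we lose it.
			OnStartedLeading: advertiseElasticIP,
			OnStoppedLeading: withdrawElasticIP,
		},
	})
}
```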
from cluster-api-provider-packet.
Thanks for hopping in @thebsdbox.
Why does there need to be a leader? Like with an ELB, it can connect to multiple nodes at once, as long as unreachable nodes (apiserver is down) are not publishing.
In any case, is that easier than doing metallb?
from cluster-api-provider-packet.
Ah, apologies, I didn't fully understand the plan, I guess. Perhaps a better alternative would be to use a client-go
watcher for nodes (with the control plane label
) and for node health. I'd still use leaderElection
so there is a single source of truth, and then have that update an ELB API or send BGP client broadcasts. I don't know the metallb
source code, but I think it uses a number of components, including a "broadcaster", so it might be a bit more challenging to engineer it into this solution.
from cluster-api-provider-packet.
didn't fully understand the plan I guess
That is the problem, there is no plan. We are figuring it out (now with your help).
arp doesn't work because Packet is L3. BGP does, but (like arp) it needs a way of controlling it. If an entire node goes down, so does the speaker, and no traffic will be routed there. The intermediate state - node up, apiserver down - is the one we need to handle. So either a single node responds to the IP (like kube-vip) or all do (like an ELB).
to use a client-go watcher for nodes (w/control plane label) and of the node health
Basically like a watchdog tailored for k8s (or like an ELB health check, same idea) that then modifies the BGP advertisement?
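The ELB-health-check flavor of that watchdog could be sketched as below: probe the local apiserver's /readyz endpoint and decide whether to keep advertising the elastic IP. This is a minimal sketch; the URL, skipping TLS verification for the local probe, and the advertise/withdraw decision being a log line are all simplifying assumptions.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// apiserverHealthy probes the kube-apiserver the way an ELB health check
// would; the decision to advertise or withdraw the elastic IP hangs off it.
func apiserverHealthy(url string) bool {
	client := &http.Client{
		Timeout: 3 * time.Second,
		// The apiserver serving cert is usually not valid for a plain
		// localhost probe, so skip verification for the health check only.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	if apiserverHealthy("https://localhost:6443/readyz") {
		fmt.Println("apiserver healthy: keep advertising the elastic IP")
	} else {
		fmt.Println("apiserver unhealthy: withdraw the elastic IP")
	}
}
```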
from cluster-api-provider-packet.
The client-go watchdog could look something like this:
// Use a restartable watcher, as this should help in the event of etcd or timeout issues
rw, err := watchtools.NewRetryWatcher("1", &cache.ListWatch{
    WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
        // Set the selector on the options the retry watcher passes in, so the
        // resume resourceVersion it manages is preserved across restarts
        options.LabelSelector = "node-role.kubernetes.io/master"
        return clientset.CoreV1().Nodes().Watch(options)
    },
})
if err != nil {
    return fmt.Errorf("error creating watcher: %s", err.Error())
}
ch := rw.ResultChan()
log.Infof("Beginning watching Kubernetes Control Plane Nodes")
for event := range ch {
    // Inspect the event and get the node status from it
    switch event.Type {
    case watch.Added, watch.Modified:
        node, ok := event.Object.(*corev1.Node)
        if !ok {
            return fmt.Errorf("unable to parse Node from watcher")
        }
        for x := range node.Status.Conditions {
            if node.Status.Conditions[x].Type == corev1.NodeReady {
                if node.Status.Conditions[x].Status != corev1.ConditionTrue {
                    // remove it
                }
            }
        }
    }
    ... etc ...
This could also act on watch.Deleted
, in the event that we do a kubeadm reset
and a kubectl delete node
when managing control plane nodes.
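That watch.Deleted branch could slot into the switch in the snippet above as something like the following fragment (not standalone code); withdrawRoute is a hypothetical hook for whatever pulls the BGP advertisement:

    case watch.Deleted:
        node, ok := event.Object.(*corev1.Node)
        if !ok {
            return fmt.Errorf("unable to parse Node from watcher")
        }
        // A deleted control plane node must stop attracting traffic immediately
        withdrawRoute(node.Name)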
from cluster-api-provider-packet.
Thanks a lot @thebsdbox! Let me elaborate on what I am after, because I think I didn't explain it well (looking at both suggestions).
I am fine with a watch tool or a shared informer, but they do not solve what I am looking for when thinking about an implementation that controls and allocates the ControlPlaneEndpoint based on api-server
availability.
Watching nodes is fine, but it is not enough, because I would like a distributed control loop that runs every x (say 10) seconds for every cluster. Almost like a health check.
I wrapped my head around this because CAPP is deployed as a Deployment, which means it can technically have one or more replicas (@deitch suggested to me that this should not be done, and if that is an assumption we can make, that's fine); it means we need some sort of coordination, otherwise every replica will run all the control loops.
As I said, this is a problem only if we consider the possibility of running multiple replicas of CAPP, CCM, and so on, but that looks like something to avoid. That means a goroutine is more than enough, and I do not need to watch Kubernetes. We can, if we want, trigger the loop based on K8s node events, but it is not mandatory at the moment.
I had a look at the "External Remediation Proposal" and I do not think it works for the Control Plane, because external remediation works only when a Machine is unhealthy. The MachineHealthCheck's responsibility is to mark a Machine as unhealthy, but the book says:
Control Plane Machines are currently not supported and will not be remediated if they are unhealthy
from cluster-api-provider-packet.
Closed this in favor of kubernetes-sigs/cloud-provider-equinix-metal#57
from cluster-api-provider-packet.