
Comments (14)

gianarb commented on June 17, 2024

Personally I like the idea of using bird. And with the experience we gained provisioning /etc/network/interfaces to persist the elastic IP on the host, I am confident we can make it work.

One situation I presume it does not cover: the host is up, but the api-server (or something else not host-related) is down in a way that prevents the control plane from working correctly. In that scenario, bird won't re-allocate the IP.

In general, if we go with BGP support I would like to add a field to the PacketCluster to make it optional, something like PacketCluster.elasticIPRecoveryStrategy=none|bgp. Currently this is not a smooth process, because the cluster-template is not that flexible (we would need to configure, or not configure, BGP/bird based on that field).
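
For illustration, such an optional field could look roughly like this on the PacketCluster spec (type and field names here are a sketch, not the actual CAPP API):

	// EIPRecoveryStrategy is a hypothetical knob: "none" keeps today's behaviour,
	// "bgp" opts in to bird/BGP-based recovery of the control-plane elastic IP.
	type EIPRecoveryStrategy string

	const (
		EIPRecoveryNone EIPRecoveryStrategy = "none"
		EIPRecoveryBGP  EIPRecoveryStrategy = "bgp"
	)

	type PacketClusterSpec struct {
		// ... existing fields ...

		// ElasticIPRecoveryStrategy selects how the control-plane elastic IP is
		// re-attached when its current holder becomes unhealthy.
		// +optional
		ElasticIPRecoveryStrategy EIPRecoveryStrategy `json:"elasticIPRecoveryStrategy,omitempty"`
	}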

I will have a chat with people from cluster-api to see if they have API tips for me.

deitch commented on June 17, 2024

There is a guide to running bird for BGP on Packet here.

The CCM deploys MetalLB (well, almost; it is disabled, but will be enabled as an option in the next week or so) to create load balancers for services of type=LoadBalancer. We could approach the metallb folks about an option to have it deploy for other things too. It would be nice to have something k8s-native do this part as well.

gianarb commented on June 17, 2024

I gathered some information about how we can make this work, and this is what I got:

  1. We can use keepalived/watchdog and bird via BGP.

    • PRO: It uses BGP and the foundation we provide. I think it is a good way to showcase how to use those technologies.
    • CONS: metallb and bird will use the same BGP "channel" (not sure that is the right term) and this will make BGP unhappy, so they can't both run on the control plane. @c0dyhi11 can probably elaborate.
    • NOTE: keepalived, watchdog, and BGP are new for me, and even Cody is not 100% sure we can make it work, but it is a 99% yes! So this is a small CONS.
  2. kube-vip

    • CONS: It won't work as-is, because it is L2 via ARP.
    • PRO: Dan (the maintainer) told me that it won't take much to make it work without L2.
    • NOTE: never used it :D
  3. metallb on the control plane: the idea is to use a k8s addon to spin up, during kubeadm init, a static pod for metallb that has to be removed as soon as a worker joins the cluster.

    • PRO: at the end of the process you have metallb and a cluster up and running. Pretty clean.
    • CONS: the process of deploying a static pod, watching its state, and removing it looks a bit flaky, and it requires shipping a dedicated program (it could be a container by itself).
  4. Control loop that runs as part of the cluster-api-provider process (see the sketch after this list)

    • CONS: it is not really a standard approach and it has to be maintained.
    • CONS: I didn't find a way to leverage client-go and Kubernetes to write this loop in a way that is as scalable and reliable as the Shared Informers are.
    • PRO: not that hard to write.
    • NOTE: I know Go

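To make option 4 a bit more concrete, here is a minimal sketch of such a loop; the reassignControlPlaneEIP helper and the health-check details are hypothetical, purely illustrative:

	package eipwatcher

	import (
		"context"
		"crypto/tls"
		"log"
		"net/http"
		"time"
	)

	// reassignControlPlaneEIP is a placeholder for the logic that would move the
	// elastic IP to a healthy control-plane node via the Packet API.
	func reassignControlPlaneEIP(ctx context.Context, clusterName string) error {
		// ... call the Packet API here ...
		return nil
	}

	// watchControlPlaneEndpoint probes the cluster's ControlPlaneEndpoint every
	// 10 seconds and triggers an EIP re-assignment when the api-server is down.
	func watchControlPlaneEndpoint(ctx context.Context, clusterName, endpoint string) {
		// We only care about reachability here; a real implementation would use
		// the cluster CA instead of skipping verification.
		client := &http.Client{
			Timeout:   5 * time.Second,
			Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
		}

		ticker := time.NewTicker(10 * time.Second)
		defer ticker.Stop()

		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				resp, err := client.Get("https://" + endpoint + "/healthz")
				if err == nil {
					resp.Body.Close()
					continue // api-server reachable, nothing to do
				}
				log.Printf("cluster %s: api-server unreachable (%v), re-assigning EIP", clusterName, err)
				if err := reassignControlPlaneEIP(ctx, clusterName); err != nil {
					log.Printf("cluster %s: EIP re-assignment failed: %v", clusterName, err)
				}
			}
		}
	}

One such goroutine would run per workload cluster.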

deitch commented on June 17, 2024

Commenting on each of them in turn:

We can use keepalived/watchdog and bird via BGP

What this does is reinvent metallb, somewhat outside of the k8s native structure. The distinct upside is that all of the control nodes are active at once, which is ideal. No switchover time. I would want to compare this to just running metallb on the control plane nodes. If the latter works, it obviously is easier to manage.

kube-vip

I think it is worth a conversation with @thebsdbox about how that would work. It isn't just a question of choosing to enable/disable the IP on the various nodes, but also a question of informing the upstream router that a particular server will be able to handle that IP. Since all of Packet is L3, the upstream has to know which ones have it. That usually is done via BGP, which brings us back to the original conflict question.

metallb control plane

If we can get this to work, it would be great. However, there might be the same conflicts. The question is, why would we want to remove the metallb pod once a worker has spun up? Why can we not leave it there?

Also remember that the metallb config can have different peers with different selectors (which we make use of), and different address pools with different selectors as well as restricted address pools. It might be possible to utilize a single metallb deployment for all of this.

The other question is bootstrapping. Most bootstrapping mechanisms (kubeadm, k3s, and by extension cluster-api itself) need the IP up front, to provide to each node as it bootstraps. If we can get it to work correctly that way, it will be pretty good.

control loop in cluster-api process

The ccm already has control loops in goroutines to do exactly this (but for load balancer allocation), so it wouldn't be too hard. I probably would do it in ClusterReconciler.SetupWithManager().
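
For what it's worth, a rough sketch of hooking such a loop into the manager via controller-runtime's Runnable support (using a recent controller-runtime, where RunnableFunc takes a context) could look like the following; checkControlPlaneEndpoints is a hypothetical helper, and the PacketCluster import path is illustrative:

	import (
		"context"
		"time"

		ctrl "sigs.k8s.io/controller-runtime"
		"sigs.k8s.io/controller-runtime/pkg/manager"

		// illustrative import path for the PacketCluster API types
		infrav1 "sigs.k8s.io/cluster-api-provider-packet/api/v1alpha3"
	)

	func (r *ClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
		// Register a long-running goroutine with the manager; it starts when the
		// manager starts and stops when its context is cancelled.
		if err := mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
			ticker := time.NewTicker(10 * time.Second)
			defer ticker.Stop()
			for {
				select {
				case <-ctx.Done():
					return nil
				case <-ticker.C:
					// Check each workload cluster's api-server and re-assign the
					// elastic IP if needed (hypothetical helper).
					r.checkControlPlaneEndpoints(ctx)
				}
			}
		})); err != nil {
			return err
		}

		// The usual reconciler wiring stays as-is.
		return ctrl.NewControllerManagedBy(mgr).
			For(&infrav1.PacketCluster{}).
			Complete(r)
	}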

A fifth potential approach (combining some of those) is to do it in the CCM, where we already have control loops and are modifying metallb configs. I am not 100% sure that moving this down a layer, though, makes sense.

gianarb commented on June 17, 2024

A fifth potential approach (combining some of those) is to do it in the CCM, where we already have control loops and are modifying metallb configs. I am not 100% sure that moving this down a layer, though, makes sense.

It looks like having this logic as part of the CCM will allow Kubernetes HA to work even without cluster-api. That is probably a good benefit.

I agree, having metallb working sounds ideal.

deitch commented on June 17, 2024

I see two problems with the "control loop in CAPP" solution. I am not as concerned with setting up a separate loop, as we can have a goroutine with a timer to keep an eye on things. Not perfect, but it works. My concerns are:

  • The control plane now depends on CAPP running to work in the case of control plane failure. Normally, you rely on CAPP only during changes, so the level of reliability can be lower. Now, you need to depend on it for close to normal running. It isn't awful - we do rely on the MachineDeployment to control the number of machines - but it is less than perfect.
  • It would have to be a timer loop, which means we might not find out promptly when something changes.

The key question now is, what is the fastest way to do this? I suspect metallb, in or outside of CCM, but I am not sure. Thoughts @gianarb?

gianarb commented on June 17, 2024

I have the same question; that's why I don't have an answer yet. When you say "I suspect metallb, in or outside of CCM", do you mean the addon static pod? Because it does not look like something quick to do if we have to manage its lifecycle (we would have to build and ship a new container/binary).

thebsdbox commented on June 17, 2024

Looking at this conundrum, I've seen it approached in a few different ways. I think the simplest method is something like a DaemonSet that is pinned to the control plane nodes. These containers use client-go for leader election and use the GoBGP client to announce who the leader is on a new election event.
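
For reference, the leader-election part with client-go could look roughly like this minimal sketch (the announce/withdraw helpers standing in for the BGP announcement are hypothetical):

	package main

	import (
		"context"
		"os"
		"time"

		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"k8s.io/client-go/kubernetes"
		"k8s.io/client-go/rest"
		"k8s.io/client-go/tools/leaderelection"
		"k8s.io/client-go/tools/leaderelection/resourcelock"
	)

	// announceVIP and withdrawVIP are placeholders for the BGP (or Packet API)
	// calls that would attract or release traffic for the control-plane IP.
	func announceVIP(ctx context.Context, id string) {}
	func withdrawVIP(id string)                      {}

	func main() {
		cfg, err := rest.InClusterConfig()
		if err != nil {
			panic(err)
		}
		clientset := kubernetes.NewForConfigOrDie(cfg)

		id, _ := os.Hostname()
		lock := &resourcelock.LeaseLock{
			LeaseMeta:  metav1.ObjectMeta{Name: "control-plane-vip", Namespace: "kube-system"},
			Client:     clientset.CoordinationV1(),
			LockConfig: resourcelock.ResourceLockConfig{Identity: id},
		}

		leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
			Lock:            lock,
			LeaseDuration:   15 * time.Second,
			RenewDeadline:   10 * time.Second,
			RetryPeriod:     2 * time.Second,
			ReleaseOnCancel: true,
			Callbacks: leaderelection.LeaderCallbacks{
				OnStartedLeading: func(ctx context.Context) {
					// This replica won the election: announce the VIP and keep
					// holding it until the context is cancelled.
					announceVIP(ctx, id)
					<-ctx.Done()
				},
				OnStoppedLeading: func() {
					// Lost the lease: stop announcing / withdraw the route.
					withdrawVIP(id)
				},
			},
		})
	}

A DaemonSet on the control plane nodes would then run one replica of this per node, with only the elected leader announcing the IP.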

deitch commented on June 17, 2024

Thanks for hopping in @thebsdbox

Why does there need to be a leader? Like with an ELB, it can connect to multiple nodes at once, as long as unreachable nodes (apiserver is down) are not publishing.

In any case, is that easier than doing metallb?

thebsdbox commented on June 17, 2024

Ah, apologies, I didn't fully understand the plan, I guess. Perhaps a better alternative would then be to use a client-go watcher for nodes (with the control plane label) and for node health. I'd still use leader election so there is a single source of truth, and then have that update an ELB API or send BGP announcements. I don't know the metallb source code, but I think it uses a number of components, including a "broadcaster", so it might be a bit more challenging to engineer it into this solution.

deitch commented on June 17, 2024

didn't fully understand the plan I guess

That is the problem, there is no plan. We are figuring it out (now with your help).

ARP doesn't work because Packet is all L3. BGP does, but (like ARP) it needs a way of controlling it. If an entire node goes down, so will the speaker, and no traffic should be routed there. The intermediate case - node up, apiserver down - is the one we need to handle. So either a single node responds to the IP (like kube-vip) or all of them do (like an ELB).

to use a client-go watcher for nodes (w/control plane label) and of the node health

Basically like a watchdog, tailored for k8s (or like an ELB health check, same idea), which then updates the announcement accordingly?

thebsdbox commented on June 17, 2024

The client-go watchdog could look something like the code below.

	import (
		"fmt"

		log "github.com/sirupsen/logrus"
		corev1 "k8s.io/api/core/v1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"k8s.io/apimachinery/pkg/watch"
		"k8s.io/client-go/kubernetes"
		"k8s.io/client-go/tools/cache"
		watchtools "k8s.io/client-go/tools/watch"
	)

	func watchControlPlaneNodes(clientset kubernetes.Interface) error {
		opts := metav1.ListOptions{}
		opts.LabelSelector = "node-role.kubernetes.io/master"

		// Watch function
		// Use a restartable watcher, as this should help in the event of etcd or timeout issues
		rw, err := watchtools.NewRetryWatcher("1", &cache.ListWatch{
			WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
				// (with client-go >= 0.18 this call takes a context as the first argument)
				return clientset.CoreV1().Nodes().Watch(opts)
			},
		})
		if err != nil {
			return fmt.Errorf("error creating watcher: %s", err.Error())
		}

		ch := rw.ResultChan()
		log.Infof("Beginning watching Kubernetes Control Plane Nodes")

		for event := range ch {
			// We need to inspect the event and get the node status from it
			switch event.Type {
			case watch.Added, watch.Modified:
				node, ok := event.Object.(*corev1.Node)
				if !ok {
					return fmt.Errorf("unable to parse Node from watcher")
				}
				for x := range node.Status.Conditions {
					if node.Status.Conditions[x].Type == corev1.NodeReady &&
						node.Status.Conditions[x].Status != corev1.ConditionTrue {
						// remove it (node is not Ready: withdraw the IP for this node)
					}
				}
			// ... etc ...
			}
		}
		return nil
	}

This could also act on a watch.Deleted event, in the case where we do a kubeadm reset and a kubectl delete node when managing control plane nodes.

gianarb commented on June 17, 2024

Thanks a lot @thebsdbox! Let me elaborate on what I am after, because I think I didn't explain it well (looking at both suggestions).

I am fine with a watch tool or a shared informer, but they do not solve what I am looking for when it comes to the implementation that controls and allocates the ControlPlaneEndpoint based on api-server availability.

Watching nodes is fine, but it is not enough, because I would like a distributed control loop that runs every x (say 10) seconds for every cluster. Almost like a health check.

I have been wrapping my head around this because CAPP is deployed as a Deployment, which means it can technically have one or more replicas (@deitch suggested to me that this should not be done, and if that is an assumption we can make, that's fine). That means we would need some sort of coordination, otherwise every replica will run all the control loops.

As I said, this is a problem only if we consider the possibility of running multiple replicas of CAPP, CCM, and so on, but that looks like something to avoid; in that case a goroutine is more than enough and I do not need to watch Kubernetes. If we want, we can trigger the loop based on K8s node events, but it is not mandatory at the moment.

I had a look at the "📖 External Remediation Proposal" and I do not think it works for the control plane, because external remediation only kicks in when a Machine is unhealthy. The MachineHealthCheck's responsibility is to mark a Machine as unhealthy, but the book says:

Control Plane Machines are currently not supported and will not be remediated if they are unhealthy

gianarb commented on June 17, 2024

Closed this in favor of kubernetes-sigs/cloud-provider-equinix-metal#57
