Kubernetes Cloud Provider for Equinix Metal (formerly Packet Cloud Controller Manager)

Home Page: https://deploy.equinix.com/labs/cloud-provider-equinix-metal

License: Apache License 2.0

Makefile 2.80% Go 94.50% Dockerfile 0.73% Smarty 1.16% Starlark 0.81%

kubernetes cloud-providers packet controller-manager cloud equinix-metal

cloud-provider-equinix-metal's People

Stargazers

Watchers

cloud-provider-equinix-metal's Issues

Add ability to run ccm locally

Similar to what we do with cluster-api-provider for packet, where you can set a KUBECONFIG and then are able to run it locally, we need the ability to run the ccm locally. This will speed up development cycles.

Reconsider MetalLB deployment management

The CCM's LoadBalancer feature, when enabled, creates and manages a MetalLB deployment. This includes the RBAC and service records.

While convenient, this managed installation expressing many opinions that are not configurable, including:

namespace
resource names
RBAC settings
all aspects of MetalLB configuration

Rather than managing the complete deployment, PacketCCM could offer an option to manage only the configmap used by a user-managed deployment.

Remove deployment management (daemonset, services, rbac) for MetalLB resources from PacketCCM
Remove RBAC packet-ccm uses in support of the controller management responsibility
Keep ConfigMap management
- ConfigMap location should be a configuration option to the PacketCCM (PacketCCM's deployment would need to run with a clusterrole to access configmaps in any namespace)
- ConfigMap type could be an option (to more easily support various versions of MetalLB or KubeVIP, assuming either can be managed by a single configmap)
- ConfigMap management approach could be an option (patch vs create/replace, allowing users to provide base configuration that PacketCCM ignores)
Optional Helm chart flags could be used to install MetalLB or KubeVIP
- If we alternatively offer a kustomize based install, users could apply their own patches to the MetalLB yaml we provide
- ^ may also be possible with Helm (I don't know Helm well enough to say)
- Or these instructions could be left to the user, so the PacketCCM helm chart does not need to reproduce the helm chart of other applications. (the experience would be something like: helm install metallb; helm install packet-ccm --options-for-metallb)

Everything above still applies if we move to KubeVIP.

Document or script release specific changes

Missing release artifacts are easy when the process is not documented, scripted, or automated. The current release of packet-ccm is v1.0.1, however the latest deployment recommended is v1.0.0. The deploy/release path was not updated in the latest release.

Update the documentation or scripts to make release deployment a better process for the engineer performing the steps.

Rebrand: Packet to Equinix Metal

Packet was acquired by Equinix in March. To help make our vision for global, interconnected bare metal a reality, we've rebranded Packet as Equinix Metal™ and introduced new locations and features. Login, sign up, and say hello at metal.equinix.com. Example rebrands:

Packet --> Equinix Metal
[email protected] --> [email protected]
https://slack.packet.com/ --> https://slack.equinixmetal.com/
Freenode IRC #packethost --> Freenode IRC #equinixmetal
packet.com --> metal.equinix.com
https://app.packet.net/ --> https://console.equinix.com/

Retry configuring controllers or exit on setup error

If API is unavailable during setup and for example we fail to fetch kube-apiserver namespace like in logs below, then CCM is running, but no Service will get IP address assigned until CCM is restarted.

Setup should either be retried or process should exit to avoid getting stuck in broken state.

Logs below confirm it:

Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0128 21:09:41.741041       1 main.go:235] authToken: '<masked>'
I0128 21:09:41.741123       1 main.go:235] projectID: '0361e5d1-8d07-44af-8ad0-f88c8c0578a2'
I0128 21:09:41.741148       1 main.go:235] load balancer config: ''%!s(MISSING)
I0128 21:09:41.741206       1 main.go:235] metallb://
I0128 21:09:41.741257       1 main.go:235] facility: 'ewr1'
I0128 21:09:41.741296       1 main.go:235] peer ASN: '65530'
I0128 21:09:41.741355       1 main.go:235] local ASN: '65000'
I0128 21:09:41.741438       1 main.go:235] Elastic IP Tag: ''
I0128 21:09:41.741463       1 main.go:235] API Server Port: '6443'
I0128 21:09:41.741486       1 main.go:235] BGP Node Selector: ''
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0128 21:09:44.575227       1 serving.go:331] Generated self-signed cert in-memory
I0128 21:09:45.652794       1 controllermanager.go:127] Version: v0.0.0-master+$Format:%h$
I0128 21:09:45.663835       1 secure_serving.go:197] Serving securely on [::]:10258
I0128 21:09:45.667055       1 tlsconfig.go:240] Starting DynamicServingCertificateController
E0128 21:09:46.693789       1 cloud.go:110] could not initialize loadbalancer: failed to get kube-system namespace: Get "https://kube-apiserver/api/v1/namespaces/kube-system": dial tcp 100.66.57.80:443: connect: connection refused
I0128 21:09:46.694130       1 node_controller.go:108] Sending events to api server.
W0128 21:09:46.694174       1 cloud.go:152] The Equinix Metal cloud provider does not support InstancesV2

After re-creating CCM pod, Services correctly gets IP assigned.

Controller not able to report election events

When running from latest master, following errors can be found in the logs:

E1111 08:23:54.704129       1 event.go:319] Could not construct reference to: '&v1.Lease{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"cloud-controller-manager", GenerateName:"", Namespace:"kube-system", SelfLink:"/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager", UID:"fc70e414-ece0-48f7-8924-ce0d420d560f", ResourceVersion:"9981", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63740678790, loc:(*time.Location)(0x2851be0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"packet-cloud-controller-manager", Operation:"Update", APIVersion:"coordination.k8s.io/v1", Time:(*v1.Time)(0xc00080acc0), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc00080ace0)}}}, Spec:v1.LeaseSpec{HolderIdentity:(*string)(nil), LeaseDurationSeconds:(*int32)(nil), AcquireTime:(*v1.MicroTime)(nil), RenewTime:(*v1.MicroTime)(nil), LeaseTransitions:(*int32)(nil)}}' due to: 'no kind is registered for the type v1.Lease in scheme "pkg/runtime/scheme.go:101"'. Will not report event: 'Normal' 'LeaderElection' '<removed> became leader'

And:

E1111 08:23:54.706793       1 event.go:263] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"cloud-controller-manager.16466684168b47fd", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"cloud-controller-manager", UID:"20b807eb-9efc-4fd8-ae9a-56f463d9fb70", APIVersion:"v1", ResourceVersion:"9980", FieldPath:""}, Reason:"LeaderElection", Message:"l8e-mat-controller-0_6ffa29ed-6b28-4dc9-8b57-3ec89b97a1ef became leader", Source:v1.EventSource{Component:"cloud-controller-manager", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbfe305c6a9f7c3fd, ext:17295922338, loc:(*time.Location)(0x2851be0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbfe305c6a9f7c3fd, ext:17295922338, loc:(*time.Location)(0x2851be0)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot create resource "events" in API group "" in the namespace "kube-system"' (will not retry!)

Unify secrets with how csi does it

packet-csi supports env vars, but also config file passed via --config=/.... This should be unified between the two.

EIP can conflict between 2 clusters in same project due to tag naming

ccm uses EM EIP tags to determine which service an EIP is attached to (or intended for). That way it can stop, start, etc. and the affiliation will be appropriate. You even can pre-assign an EIP (which @c0dyhi11 wants to do with Anthos).

The logic when syncing services inside ccm is roughly (ignoring removal, etc.):

Find all services of type=LoadBalancer
Filter out any that already have Spec.LoadBalancerIP set
Create two tags:
- usage=packet-ccm-auto
- service=<service-identifier> (see below)
Find any EIP that have the above tags set
- if found, use it
- if not, create one, add the tags
Assign the EIP by setting Spec.LoadBalancerIP=<eip>

The <service-identifier> is created as follows (pseudo-code):

identifier = sha256( serviceNamespace + "/" +serviceName )

We use sha256 instead of just the clear text, because some customers did not want the names of their services exposed on the project configuration; it should remain internal to the cluster only.

The problem arises when you have 2 or more clusters in the same project. If both have a Service with the same name, they will conflict. It will try to assign the EIP to both of them, which is not what you want.

LoadBalancer enabling behaviours - what is the correct UX

Currently, the ccm has an option to manage loadbalancers, specifically metallb.

The ccm takes the job of:

interfacing with the Packet API to get an EIP to assign to to a Service type=LoadBalancer
attaching that EIP to the Service, specifically Spec.Status.LoadBalancerIP
interfacing with the Packet API to enable BGP on the project and per-device
placing BGP information on each node's annotations

Note that none of the above has anything to do with a specific implementation (metallb). Packet is unique in that it could have multiple different loadbalancer implementations. Unlike AWS (ELB) or other providers, it isn't tightly tied to one.

In addition, because the (current) LB implementation is metallb, it also does:

manage the metallb configmap by placing the bgp and service information in it

Since the two - general lb requirements (list of first 4); metallb-specific (last 1) - have been closely intertwined, ccm has used the existence of the configmap as a "switch" to determine whether or not to do the rest. The UI looks something like this. It is controlled via the value of the configmap flag, env var PACKET_LB_CONFIGMAP or equivalent.

disabled - do nothing, do not try to manage the metallb configmap (item 5), but also don't do any non-metallb stuff (items 1-4)
`` (empty) - default to the metallb default, metallb-system:config
<something:else> - use that as the config map namespace+name

If the field is not disabled and not empty, then:

if the configmap exists, manage that configmap (item 5) and the other non-metallb stuff (items 1-4)
if the configmap does not exist, do not manage any configmap (item 5) or any of the other non-metallb stuff (items 1-4)

My contention is that we should be able to separate the two lists of metallb-specific and non-metallb-specific, and have a choice to:

disable everything - we have this now with PACKET_LB_CONFIGMAP, which is a bad name for this
enable just the first list (items 1-4) - does not exist
enable the first list (items 1-4) and the second (item 5) - we have this now if the configmap exists and the env var is not disabled

Question: what should this configuration look like. How should we tell ccm to pick option a, b or c? Obviously env vars / configs / cli flags, as usual, but what specifically should they be and what should the options be set to?

Let's keep in mind that we may support other LB implementations in the future.

If we can come up with a sane UI with the right flexibility, then we can implement it quickly.

cc @thebsdbox @gianarb @displague @detiber @jmarhee

Support automatic creation of the MetalLB config when nodes are added/updated/deleted

Currently the only way to have a LoadBalancer functionality on Packet is using MetalLB. It won't be an overstatement to say that the Kubernetes clusters running on Packet have to use MetalLB.

MetalLB is cool and works perfectly, the only thing that is error prone in MetalLB is that it needs a configuration. And sometimes folks forget to create/update the config as the newer nodes are added or removed from the cluster.

So I think this feature is a good candidate for the cloud controller manager to take care of.

Task breakdown:

Accept a configMap which has information of the EIPs to use and nodeSelectors, it can look something like this:

metallb:
  # List of all the EIP blocks you have purchased from Packet
  eips:
  - 147.23.89.56/28
  - 147.23.90.6/32
  nodeSelectors:
    # Nodes where the MetalLB pods should be running
    node-role.kubernetes.io/node: ""

The controller watches node objects and creates/updates the metallb config in the metallb-system namespace. For this it will have to talk to the Packet API to fetch the information like what is the peer IP of the node in the private IP range.
Also update the existing secret config that is fed to the ccm pod to also take bgp Password, if it is set on the packet console.

...
stringData:
  apiKey: "abc123abc123abc123"
  projectID: "abc123abc123abc123"
  bgpPassword: "abc123abc123abc123"

make build fails on OSX

It fails with the following info:

➜ make build                                                                                     
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build -v -o dist/bin/packet-cloud-controller-manager-darwin-amd64 -L/usr/local/opt/libffi/lib ./
flag provided but not defined: -L/usr/local/opt/libffi/lib
usage: go build [-o output] [-i] [build flags] [packages]
Run 'go help build' for details.
make: *** [dist/bin/packet-cloud-controller-manager-darwin-amd64] Error 2

Stop Releasing Container Images with Commit SHA

As this project is consumed by customers, we should stop publishing each commit as a unique tag. This pollutes the tag list for customers and adds various layers of confusion to something that should be a simple docker search.

If we continue to release main / master, then this should be published using an mutable tag, such as latest - though its usage should be discouraged.

Each tagged release should be tagged with a semantic version.

Create and manage Helm chart for packet-ccm

It would be nice if there is a helm chart for the deployment and loadbalancer converted to approproate files. What do you think @deitch?

Adjust CCM Load Balancer so that it adds all peering IPs returned by the Packet API / metadata

This #62 PR changed the way the CCM creates peers by pulling the BGP info through the Packet API. However, one drawback is that it only adds the first peering IP of the array returned by the API.

https://github.com/packethost/packet-ccm/blob/30ebae601a916e3b620e4e547642b37bf9344c33/packet/loadbalancers.go#L140

Depending on the ToR router, the API will sometimes return more than 1 peering IP so we should loop through the array and add a peer config for each peering IP returned by the API.

Set k8s built-in labels through the CCM

https://kubernetes.io/docs/reference/kubernetes-api/labels-annotations-taints/

We should be setting some of those built-in labels like node.kubernetes.io/instance-type and topology.kubernetes.io/zone so that they match with packet naming.

Unable to list configmaps in metallb-system namespace by cloud-controller-manager SA

I am trying to integrate packet-ccm with Lokomotive. I tried to run it standalone by running the deployment from v1.1.0 release and for load balancer using Lokomotive's MetalLb component. But when I deploy it logs says

failed to update and sync nodes: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"

Can someone please tell, if I am doing something wrong?

Steps to reproduce:

Deployed cluster using Lokomotive latest release v0.5.0 by setting --cloud-provider=external for kubelet
created secret as specfied in packet-ccm docs
applied v1.1.0 deployment
applied lokomotive MetalLb component

re-applying Lokomotive's MetalLB fails ccm

Steps to reproduce

Apply secrets
Apply deployment on cluster with kubelet configured.
Apply Lokomotive MetalLb component.

If component is re-applied, ccm does not pick configMap correctly and throws error that configMap is not correct, which should not happen. You have to delete the pod manually to get it to work again.

failed to update and sync nodes: failed to get metallb config map: could not parse config: yaml: unmarshal errors:
  line 1: field peer-autodiscovery not found in type metallb.ConfigFile

cc @deitch

https://github.com/packethost/packet-ccm#control-plane-loadbalancer-implementation "this section" link broken

It seems https://github.com/packethost/packet-ccm#Elastic_IP_as_Control_Plane_Endpoint no longer exists.

Upgrade client to kubernetes 1.15+

References to DO are here

Expose BGP-related information as node annotations

General

In order for k8s clusters to maintain BGP peering with Packet ToR switches using solutions such as MetalLB, it is necessary for k8s controllers to be able to retrieve the following information about a Packet device:

The local ASN
The remote ASN
The peer address

The information above is available on the Packet API.

I suggest we add logic in the CCM to "enrich" Node objects with BGP-related information using annotations since this information is non identifying.

NOTE: In cases where a BGP password is configured for a project, this information is necessary as well in order to successfully establish BGP sessions. However, storing clear-text passwords in Node objects isn't a very good idea. A better approach would be to store the password in a k8s secret, however secrets are namespace-scoped and the CCM can't know which applications are going to need access to the password. Therefore, I guess we should simply ignore the BGP password at the CCM level. Since BGP passwords aren't node-specific, it seems reasonable to me that users configure a secret themselves and wire the relevant application to use that secret.

cc: @deitch

Implementation

I've discussed this proposal with SIG Cloud Provider as to date there doesn't seem to be a CCM which adds annotations to Node objects. Here is the answer I've received:

None of the core controllers add labels, but if you need to, you can create a custom one and run it as part of Initialize (https://github.com/kubernetes/cloud-provider/blob/master/cloud.go#L44-L47)

So looks like this ^ is the right place within the CCM to include the new logic.

Following the k8s guidelines for label structure, we should select a prefix. The following assumes the chosen prefix is packet.com, however I have no opinion regarding the actual prefix as long as it's clear it is Packet-specific.

Once implemented, a user who runs a k8s cluster on Packet with the CCM deployed should expect annotations as in the following example to exist on any Node object before the node.cloudprovider.kubernetes.io/uninitialized taint is removed:

packet.com/customer-asn: 65000
packet.com/peer-asn: 65530
packet.com/peer-address: 10.80.3.126

Add ability to select on which nodes BGP will be enabled

In some large clusters, one may want to have dedicated pool of Ingress nodes, which will be handling ingress traffic. CCM currently enables BGP on all nodes in the cluster.

In Lokomotive we allow users to disable BGP per-worker pool, if needed, for example for security reasons.

It would be great if one could configure a node selector in the CCM, which would select on which nodes BGP should be enabled.

Docker multi arch images are not totally right

Hello
I think this screenoshot from Docker hub explains the issue:

The image tagged as arm64 gets detected as amd64 arch on hub, I think this prevents images to work on arm.

Makefile: platforms joining doesn't work

Currently, trying to run locally make release fails with the following error for me:

bgp-node-selector-amd64: digest: sha256:bf058f9eef5ce126feb1d8d097f2d6597917d997481c770fd132772125e37212 size: 739
make[2]: Leaving directory '/home/invidian/repos/kinvolk/packet-ccm'
# path to credentials based on manifest-tool's requirements here https://github.com/estesp/manifest-tool#sample-usage
/home/invidian/go/bin/manifest-tool push from-args --platforms linux/arm64 linux/amd64, --template quay.io/invidian/packet-ccm:bgp-node-selector-ARCH --target quay.io/invidian/packet-ccm:bgp-node-selector
FATA[0000] You must specify all three arguments --platforms, --template and --target
make[1]: *** [Makefile:198: push-manifest] Error 1
make[1]: Leaving directory '/home/invidian/repos/kinvolk/packet-ccm'
make: *** [Makefile:267: release] Error 2

Service watcher is started even though there is nothing to do for it

Currently, even if elastic IP tag is not set, CCM tries to watch Service objects and every loop prints similar messages:

failed to update and sync service for add default/kubernetes: elastic ip tag is empty. Nothing to do

I think if there is no loops accessing the Service objects, watch could not even be started to save resources. Optionally, RBAC grant can then also be reduced to not need to read service objects, following least privileged principle.

Restructure docs

The docs on the main page are more focused on building than deploying. We need to fix the docs as follows:

switch so the main readme is about deploying, and building is linked further
include a link to ccm general information at https://kubernetes.io

See the csi repo for an example

Reduce RBAC grants for CCM

Even though CCM is rather highly privileged process on the cluster, for the sake of lease privilege principle and to better describe what CCM actually touches, current RBAC rules could be reduced. For example:

Currently ClusterRole is used for endpoints objects. This could be moved to Role in deployment namespace.
Currently ClusterRole is used for coordination.k8s.io objects. This could be moved to Role in deployment namespace.
Currently ClusterRole is used for configmaps objects to manage MetalLB ConfigMap. This could possibly be reduced to only metallb-system namespace. Do note, that this will also require binding extension-apiserver-authentication-reader system role to be able to read extension-apiserver-authentication ConfigMap.

Add annotations to nodes for BGP Peering "Config" type

Packet has two types of switches in their fleet today. (With possibly more to come) These types today are Juniper and Arista.
These switches require different BGP peering options. In order for MetalLB to know the peering options, we'll need to add annotations to the nodes to specify which options are needed for peering.

The API that is hit is: https://api.packet.net/devices/<Instance_Id>/bgp/neighbors
The result looks like this:

{
  "bgp_neighbors": [
    {
      "address_family": 4,
      "customer_as": 65000,
      "customer_ip": "10.99.2.129",
      "md5_enabled": false,
      "md5_password": null,
      "multihop": false,
      "peer_as": 65530,
      "peer_ips": [
        "10.99.2.128"
      ],
      "routes_in": [
        {
          "route": "10.99.2.128/25",
          "exact": false
        }
      ],
      "routes_out": []
    }
  ],
  "device": {
    "href": "/devices/a9660769-d216-4bab-9068-8e5763e7f962"
  }
}

Remove loadbalancer.yaml?

Is there any good reason to keep our own copy of metallb's loadbalancer.yaml?

It is a (slightly out-of-date) copy of what is linked from the official metallb site.

Is there any reason not to remove it, and just tell people to install from there?

Cannot deploy loadbalancer.yaml - server reported 404 Not Found, status code=404

Whilst following the README I ran into the following error:

root@ccm-test-1:~# kubectl apply -f https://github.com/packethost/packet-ccm/releases/download/${RELEASE}/loadbalancer.yaml
error: unable to read URL "https://github.com/packethost/packet-ccm/releases/download/v1.1.0/loadbalancer.yaml", server reported 404 Not Found, status code=404
root@ccm-test-1:~#

Release v1.1.0 does not have loadbalancer.yaml

curl https://github.com/packethost/packet-ccm/releases/download/v1.1.0/loadbalancer.yaml throws not found.

“unexpected providerID format” Error

Hey folks.

Working with the CCM. I got it installed and it looks like it tagged the master node just fine. But my worker node gives a goofy error in the logs and isn't tagged.

E0229 07:11:54.037457       1 node_controller.go:140] unexpected providerID format: eae05540-a992-43d3-9b01-5c6d61184de2, format should be: packet://device-id
I0229 07:11:55.278356       1 node_controller.go:315] Successfully initialized node k8s-us-west-1-master with cloud provider
E0229 07:16:56.975066       1 node_controller.go:140] unexpected providerID format: eae05540-a992-43d3-9b01-5c6d61184de2, format should be: packet://device-id

@deitch mentioned he’d seen this before.

Elastic IP management for control plane

Initially, this issue got discussed in CAPP #141.

The problem statement: No Load balancer

Packet does not provide a load balancer, it means that you have to figure out a way to route traffic between Control Planes when running Kubernetes in HA otherwise you will end up having a single point of failure (SPOF) even if you run k8s HA. Because only the Node that has the Elastic IP attached to it gets traffic.

When looking at how other cloud providers solve this issue it is clear that an externally managed load balancer will make this implementation straightforward. Your cluster gets a load balancer and all the control planes are registered there.

But we do not have a load balancer

We evaluated a couple of different ways originally to get around this problem, I will summarize them here

BGP at Control Plane device level with bird or equivalent. This will may work but it will conflict with MetalLB. The load balancer implementation that we ship with Kubernetes via CCM #53 because we will end up having two BGP communications ongoing from the same node, one from bird and one from MetalLB. This can potentially confuse the router.
MetalLB for Control Plane. This is a nice and clean solution in theory, but in practice, it is hard to implement. You need to have a control plane up and running in order to place a Service type=LoadBalancer (managed by MetalLB) in front of the API Servers. But you can not have a reachable API Server until you get the right ElasticIP assigned to the Load Balancer. 🔁
Write a control loop based on time (every 6sec for example) that will check the status of the ElasticIP, if it does not respond it can be attached to another control plane because it means that the current one is broken.

We discussed the pros and cons in the other issue linked above.

This is a CCM issue... maybe

We are moving the conversation here because we think this is a better home for this kind of feature. Kubernetes HA on Packet is something that does not require ClusterAPI. Kubernetes clusters can be provisioned via Terraform or other tools. Fixing this issue here will make everybody using Kubernetes on Packet and CCM capable of using this feature.

Update Kubernetes dependencies to fix printing "unable to load initial CA bundle for" warning

Specifically to include kubernetes/kubernetes@413960e#diff-cede7efa8beef64baa0651bd73b39e27ad9508763719b48b4d88483a61964c89, as currently following warning is being printed for no reason:

W1203 10:38:04.249751       1 configmap_cafile_content.go:102] unable to load initial CA bundle for: "client-ca::kube-system::extension-apiserver-authentication::client-ca-file" due to: configmap "extension-apiserver-authentication" not found
W1203 10:38:04.249802       1 configmap_cafile_content.go:102] unable to load initial CA bundle for: "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" due to: configmap "extension-apiserver-authentication" not found

Add option to use a specific range of IPs

This is a necessary use case for customers that bring their own IPs, or Packet global anycast IPs.

Implement

Currently we don't implement the loadbalancer status reply from the GetLoadBalancer function, this means that unless something else does it (metal) then there is no way of moving a service from <pending> or as below in the service spec:

status:
  loadBalancer: {}

To remedy this we simply need to update the GetLoadBalancer to do:

func (l *loadBalancers) GetLoadBalancer(ctx context.Context, clusterName string, service *v1.Service) (status *v1.LoadBalancerStatus, exists bool, err error) {
     if service.Spec.LoadBalancerIP == "SOMETHING that exists" {
			return &v1.LoadBalancerStatus{
				Ingress: []v1.LoadBalancerIngress{
					{
						IP: service.Spec.LoadBalancerIP,
					},
				},
			}, true, nil
    }
   return nil, false, nil
}

Add packet secret template to releases

https://github.com/packethost/packet-ccm/blob/master/deploy/template/secret.yaml

It would be helpful to have this secret template used by the CCM in the releases page:

https://github.com/packethost/packet-ccm/releases

Logs spammed with "The Packet cloud provider does not support InstancesV2"

It seems running CCM from image docker.io/packethost/packet-ccm:bba111cc65391734bb78dbaf99ed2f56a9a1463d produces a lot of spammy logs like shown below:

W0125 09:42:37.963982       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:37 [DEBUG] GET https://api.equinix.com/metal/v1/devices/aaf4679f-13f4-46c2-8519-ddbc4184438e?include=facility
W0125 09:42:38.138011       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:38 [DEBUG] GET https://api.equinix.com/metal/v1/devices/aaf4679f-13f4-46c2-8519-ddbc4184438e?include=facility
W0125 09:42:38.321460       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:38 [DEBUG] GET https://api.equinix.com/metal/v1/devices/8af3e9af-e1cd-4311-9c53-740a5c2e470b?include=facility
W0125 09:42:38.512576       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:38 [DEBUG] GET https://api.equinix.com/metal/v1/devices/8af3e9af-e1cd-4311-9c53-740a5c2e470b?include=facility
W0125 09:42:38.687375       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:38 [DEBUG] GET https://api.equinix.com/metal/v1/devices/ba5666bd-3c48-498b-bc9c-bc2e288900d4?include=facility
W0125 09:42:38.864293       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:38 [DEBUG] GET https://api.equinix.com/metal/v1/devices/ba5666bd-3c48-498b-bc9c-bc2e288900d4?include=facility
W0125 09:42:44.038321       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:44 [DEBUG] GET https://api.equinix.com/metal/v1/devices/ba5666bd-3c48-498b-bc9c-bc2e288900d4?include=facility
W0125 09:42:44.220514       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:44 [DEBUG] GET https://api.equinix.com/metal/v1/devices/ba5666bd-3c48-498b-bc9c-bc2e288900d4?include=facility
W0125 09:42:49.403057       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:49 [DEBUG] GET https://api.equinix.com/metal/v1/devices/ba5666bd-3c48-498b-bc9c-bc2e288900d4?include=facility
W0125 09:42:49.577594       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:42:49 [DEBUG] GET https://api.equinix.com/metal/v1/devices/ba5666bd-3c48-498b-bc9c-bc2e288900d4?include=facility
W0125 09:46:31.563386       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:46:31 [DEBUG] GET https://api.equinix.com/metal/v1/devices/aaf4679f-13f4-46c2-8519-ddbc4184438e?include=facility
W0125 09:46:32.027907       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:46:32 [DEBUG] GET https://api.equinix.com/metal/v1/devices/8af3e9af-e1cd-4311-9c53-740a5c2e470b?include=facility
W0125 09:46:32.238768       1 cloud.go:154] The Packet cloud provider does not support InstancesV2
2021/01/25 09:46:32 [DEBUG] GET https://api.equinix.com/metal/v1/devices/ba5666bd-3c48-498b-bc9c-bc2e288900d4?include=facility

Is this something which could be silenced?

Report UserAgent in API calls

The packngo API calls should report a CCM specific user-agent so the CCM usage can be quantified.

Currently, the User-Agent reported is the default packngo/v0.1.0 UA

CCM unable to get node info

When running, it complains about getting node info.

In k3s:

I1112 14:43:00.227384       1 controllermanager.go:244] Started "cloud-node-lifecycle"
E1112 14:43:00.239786       1 node_controller.go:140] provider name from providerID should be packet: k3s://ccm01
E1112 14:43:00.983278       1 node_controller.go:140] provider name from providerID should be packet: k3s://ccm02

In k8s:

I1112 15:17:52.834928       1 controllermanager.go:244] Started "cloud-node-lifecycle"
E1112 15:17:53.263606       1 node_controller.go:140] unexpected providerID format: ef6660bc-6578-490c-9d49-3ddadb2c483e, format should be: packet://device-id
E1112 15:17:54.178116       1 node_controller.go:140] unexpected providerID format: e197e6a0-4a2b-4256-9b7a-c1ea510579d4, format should be: packet://device-id

In all cases, it appears it doesn't send anything consistent.

CCM Load Balancer doesn't remove node peers from the metalLB configmap when deleting nodes and missing the notification

The CCM has logic to remove node peers from the configmap, but only if it catches the nodes when they are being deleted. If it misses the delete node notification (like when CCM has to get restarted due to rescheduling from a deleted node to a healthy node), it has no provision for checking on each future loop (unlike adding, which does).

Update to ccm v1alpha2

v1alpha2 is a major rewrite of how ccm is implemented. See details here and here

ccm crashes due to gce flag

This bug is manifesting itself. Either fix it manually in main.go as digital ocean did or upgrade to k8s 1.15.0 or higher, which includes that fix, as DO did

Ideally we will get 1.15 working, and this will go away.

Fix Load Balancer config formatting

Right now, following is printed in logs:

I0128 21:37:41.355102       1 main.go:235] load balancer config: ''%!s(MISSING)
I0128 21:37:41.355109       1 main.go:235] metallb://
I0128 21:37:41.355113       1 main.go:235] facility: 'ewr1'

When using version v3.0.0.

CCM removing BGP password on startup w/ load balancer disabled

Using v1.1.0 and the below config-sa.json. On startup, the project's BGP password is reset to empty causing the MetalLB deployment to fail BGP communication (which uses a password). Appears to be only on startup as I can put the password back and have it stay until the next time CCM restarts.

config-sa.json

{
  "apiKey": "...",
  "projectID": "...",
  "disableLoadBalancer": "true",
  "eipTag": "cluster-api-provider-packet:cluster-id:dev-ewr2-mine-k8s-game"
}

Logs

$ k logs packet-cloud-controller-manager-65c4c67f64-2fnmz
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0802 22:33:24.193929       1 serving.go:312] Generated self-signed cert in-memory
W0802 22:33:24.363280       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0802 22:33:24.364088       1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
W0802 22:33:24.364096       1 controllermanager.go:132] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0802 22:33:24.364445       1 secure_serving.go:178] Serving securely on [::]:10258
I0802 22:33:24.364485       1 tlsconfig.go:219] Starting DynamicServingCertificateController
2020/08/02 22:33:24 [DEBUG] POST https://api.packet.net/projects/faab0f5b-488a-4928-bb54-15ceeb9c4237/bgp-configs

Add config struct for newBGP() function

It currently takes 7 parameters and #101 adds 8th, so it's definitely time to move that to dedicated struct.

Packet Uniform Standards Request

Hello!

We believe this repository is Maintained and therefore needs the following files updated:

If you feel the repository should be experimental or end of life or that you'll need assistance to update these files, please let us know by filing an issue with https://github.com/packethost/standards.

The Uniform Standards Project

Packet maintains a number of public repositories that help customers to run various workloads on Packet. These repositories are in various states of completeness and quality, and being public, developers often find them and start using them. This creates problems:

Developers using low-quality repositories may infer that Packet generally provides a low quality experience.
Many of our repositories are put online with no formal communication with, or training for, customer success. This leads to a below average support experience when things do go wrong.
We spend a huge amount of time supporting users through various channels when with better upfront planning, documentation and testing much of this support work could be eliminated.

To that end, we propose three tiers of repositories: Private, Experimental, and Maintained.

As a resource and example of a maintained repository, we've created https://github.com/packethost/standards. This is also where you can file any requests for assistance or modification of scope.

The Goal

Our repositories should be the example from which adjacent, competing, projects look for inspiration.

Each repository should not look entirely different from other repositories in the ecosystem, having a different layout, a different testing model, or a different logging model, for example, without reason or recommendation from the subject matter experts from the community.

We should share our improvements with each ecosystem while seeking and respecting the feedback of these communities.

Whether or not strict guidelines have been provided for the project type, our repositories should ensure that the same components are offered across the board. How these components are provided may vary, based on the conventions of the project type. GitHub provides general guidance on this which they have integrated into their user experience.

Thanks so much for your time!

~Rain.

Add documentation for building development images

It would be nice to have some quick explanation how to build development Docker images, supporting multi-arch.

Current workflow I figured out is to create a local git tag, then run make release BUILD_IMAGE=<your registry> CONFIRM=true or running make release BUILD_IMAGE=<registry> RELEASE_TAG=bgp-node-selector-test CONFIRM=true.

Race when using ccm via cluster-api

Hello!
The cluster api does not require a CNI plugin to be installed straight away in the cluster. But until networking is setup right the manager is in NotReady state:

$ kubectl --kubeconfig /tmp/ciao get node
NAME            STATUS     ROLES    AGE   VERSION
cncf-master-0   NotReady   master   35s   v1.18.5

The CCM does not deploy when a master is NotReady:

$ kubectl --kubeconfig /tmp/ciao get pod -n kube-system packet-cloud-controller-manager-77d965697-z4jzf -o json | jq .status
{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2020-06-29T08:19:51Z",
      "message": "0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.",
      "reason": "Unschedulable",
      "status": "False",
      "type": "PodScheduled"
    }
  ],
  "phase": "Pending",
  "qosClass": "Burstable"
}

This is not expected because when I look at the ccm spec:

        - key: "node-role.kubernetes.io/master"
          effect: NoSchedule

This toleration should make it to work

Errors when LoadBalancer is not deployed?

The README says that the LoadBalancer is optional, however when I deploy the CCM without the LoadBalancer I see a lot of errors in the CCM logs asking to edit MetalLB resources.

Full logs are provided below.

What would you suggest?

root@ccm-test-1:~# RELEASE=v1.1.0
root@ccm-test-1:~# kubectl apply -f https://github.com/packethost/packet-ccm/releases/download/${RELEASE}/deployment.yaml
deployment.apps/packet-cloud-controller-manager created
serviceaccount/cloud-controller-manager created
clusterrole.rbac.authorization.k8s.io/system:cloud-controller-manager created
clusterrolebinding.rbac.authorization.k8s.io/system:cloud-controller-manager created

root@ccm-test-1:~# kubectl logs deployment.apps/packet-cloud-controller-manager -n kube-system
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I1203 14:39:04.122130       1 serving.go:312] Generated self-signed cert in-memory
W1203 14:39:06.168099       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1203 14:39:06.174304       1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
W1203 14:39:06.174352       1 controllermanager.go:132] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I1203 14:39:06.176071       1 secure_serving.go:178] Serving securely on [::]:10258
I1203 14:39:06.176213       1 tlsconfig.go:219] Starting DynamicServingCertificateController
2020/12/03 14:39:06 [DEBUG] POST https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/bgp-configs
E1203 14:39:06.791637       1 cloud.go:210] failed to update and sync nodes for handler: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:39:06.791672       1 cloud.go:210] failed to update and sync nodes for handler: elastic ip tag is empty. Nothing to do
I1203 14:39:06.791773       1 cloud.go:237] nodes watcher started
2020/12/03 14:39:06 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:39:06.802529       1 cloud.go:222] failed to update and sync node for add ccm-test-1 for handler: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:39:06.802605       1 bgp.go:79] could not ensure BGP enabled for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:39:06.802649       1 bgp.go:88] bgp.reconcileNodes(): could not get BGP info for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:39:06.802682       1 cloud.go:222] failed to update and sync node for add ccm-test-1 for handler: elastic ip tag is empty. Nothing to do
E1203 14:39:07.353793       1 cloud.go:260] failed to update and sync services: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
I1203 14:39:07.353895       1 cloud.go:285] services watcher started
I1203 14:39:07.353934       1 core.go:101] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W1203 14:39:07.353977       1 controllermanager.go:244] Skipping "route"
I1203 14:39:07.357382       1 node_controller.go:110] Sending events to api server.
I1203 14:39:07.357568       1 controllermanager.go:247] Started "cloud-node"
2020/12/03 14:39:07 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:39:07.360403       1 node_controller.go:237] provider name from providerID should be packet, was k3s
2020/12/03 14:39:07 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/devices?include=facility
I1203 14:39:07.360804       1 node_lifecycle_controller.go:77] Sending events to api server
I1203 14:39:07.360885       1 controllermanager.go:247] Started "cloud-node-lifecycle"
E1203 14:39:07.364310       1 core.go:90] Failed to start service controller: the cloud provider does not support external load balancers
W1203 14:39:07.364347       1 controllermanager.go:244] Skipping "service"
E1203 14:39:08.260536       1 cloud.go:271] failed to update and sync service for add kube-system/metrics-server: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
2020/12/03 14:39:08 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:39:08.927122       1 cloud.go:271] failed to update and sync service for add kube-system/traefik-prometheus: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
2020/12/03 14:39:08 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:39:09.630507       1 cloud.go:271] failed to update and sync service for add kube-system/traefik: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
2020/12/03 14:39:09 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:39:10.133549       1 cloud.go:271] failed to update and sync service for add default/kubernetes: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
2020/12/03 14:39:10 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:39:10.733212       1 cloud.go:271] failed to update and sync service for add kube-system/kube-dns: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
2020/12/03 14:40:07 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:40:07.930101       1 cloud.go:302] failed to update and sync services: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:40:07.931886       1 cloud.go:311] failed to update and sync nodes: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:40:07.931976       1 bgp.go:79] could not ensure BGP enabled for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:40:07.932028       1 bgp.go:88] bgp.reconcileNodes(): could not get BGP info for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:40:07.932059       1 cloud.go:311] failed to update and sync nodes: elastic ip tag is empty. Nothing to do
2020/12/03 14:41:07 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:41:08.629716       1 cloud.go:302] failed to update and sync services: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:41:08.631329       1 cloud.go:311] failed to update and sync nodes: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:41:08.631425       1 bgp.go:79] could not ensure BGP enabled for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:41:08.631481       1 bgp.go:88] bgp.reconcileNodes(): could not get BGP info for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:41:08.631507       1 cloud.go:311] failed to update and sync nodes: elastic ip tag is empty. Nothing to do
2020/12/03 14:42:08 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:42:09.276806       1 cloud.go:302] failed to update and sync services: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:42:09.278411       1 cloud.go:311] failed to update and sync nodes: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:42:09.278514       1 bgp.go:79] could not ensure BGP enabled for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:42:09.278566       1 bgp.go:88] bgp.reconcileNodes(): could not get BGP info for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:42:09.278596       1 cloud.go:311] failed to update and sync nodes: elastic ip tag is empty. Nothing to do
2020/12/03 14:43:09 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:43:10.005317       1 cloud.go:302] failed to update and sync services: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:43:10.007068       1 cloud.go:311] failed to update and sync nodes: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:43:10.007162       1 bgp.go:79] could not ensure BGP enabled for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:43:10.007212       1 bgp.go:88] bgp.reconcileNodes(): could not get BGP info for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:43:10.007255       1 cloud.go:311] failed to update and sync nodes: elastic ip tag is empty. Nothing to do
E1203 14:44:09.083326       1 node_controller.go:237] provider name from providerID should be packet, was k3s
2020/12/03 14:44:09 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/devices?include=facility
2020/12/03 14:44:10 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:44:10.837322       1 cloud.go:302] failed to update and sync services: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:44:10.839266       1 cloud.go:311] failed to update and sync nodes: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:44:10.839357       1 bgp.go:79] could not ensure BGP enabled for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:44:10.839415       1 bgp.go:88] bgp.reconcileNodes(): could not get BGP info for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:44:10.839462       1 cloud.go:311] failed to update and sync nodes: elastic ip tag is empty. Nothing to do
2020/12/03 14:45:10 [DEBUG] GET https://api.packet.net/projects/66ae0069-7d03-4db5-9af1-6b14036e380a/ips?
E1203 14:45:11.829765       1 cloud.go:302] failed to update and sync services: unable to retrieve metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:45:11.831526       1 cloud.go:311] failed to update and sync nodes: failed to get metallb config map: unable to get metallb configmap config: configmaps "config" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "metallb-system"
E1203 14:45:11.831622       1 bgp.go:79] could not ensure BGP enabled for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:45:11.831678       1 bgp.go:88] bgp.reconcileNodes(): could not get BGP info for node ccm-test-1: provider name from providerID should be packet, was k3s
E1203 14:45:11.831719       1 cloud.go:311] failed to update and sync nodes: elastic ip tag is empty. Nothing to do

Periodically reconcile all nodes

To make environment more robust, I think CCM should periodically verify, that all nodes has correct spec.providerID set, has BGP enabled etc.

Right now CCM only reacts when node joins or gets removed from the cluster, which is not ideal, as such events might be missed, e.g. when CCM gets updated and it's not available for some time.

Having periodic reconciliation would also help in scenarios when CCM is installed on existing clusters, as right now when one does so, the CCM won't set providerID for existing nodes, to workaround that, one must remove and re-register all nodes in the cluster.

kubernetes-sigs / cloud-provider-equinix-metal Goto Github PK