sh4d1 / scaleway-k8s-vpc

A Kubernetes Controller for VPC/Private Network on a Scaleway Kubernetes cluster

License: Other

Makefile 6.45% Go 93.55%

scaleway k8s kubernetes vpc private-network controller

scaleway-k8s-vpc's Introduction

Scaleway K8S VPC

Note: This is just a proof of concept; it is not suited for production use.

Scaleway K8S VPC is a controller for Kubernetes clusters running on Scaleway that leverages CRDs to use Scaleway Private Networks in the cluster.

Getting started

Install the controller and the node daemon with:

kubectl create -k https://github.com/Sh4d1/scaleway-k8s-vpc/config/default

Create and enter your Scaleway credentials with:

kubectl create -f https://raw.githubusercontent.com/Sh4d1/scaleway-k8s-vpc/main/secret.yaml --edit --namespace scaleway-k8s-vpc-system

You can now create the following PrivateNetwork object:

apiVersion: vpc.scaleway.com/v1alpha1
kind: PrivateNetwork
metadata:
  name: my-privatenetwork
spec:
  id: <private network ID>
  ipam:
    type: Static
    static:
      cidr: 192.168.0.0/24
  routes:
  - to: 1.2.3.4/16
    via: 192.168.0.10

This will attach the private network to all nodes in the cluster, set up the interfaces with IPs in the range, and add the routes if needed.
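The route entry sends traffic for 1.2.3.4/16 through 192.168.0.10. A next hop is normally only usable if it is reachable on the attached subnet, so one quick sanity check is that `via` falls inside the static CIDR. The sketch that follows (Go standard library `net/netip`, Go 1.18+) performs that check and also normalizes the destination, because 1.2.3.4/16 has host bits set and denotes the network 1.2.0.0/16. `checkRoute` is a hypothetical helper written for illustration; whether the controller itself validates routes this way is not stated here.

```go
package main

import (
	"fmt"
	"net/netip"
)

// checkRoute is a hypothetical helper: it verifies that the next hop lies
// inside the private network's CIDR and returns the destination with its
// host bits cleared.
func checkRoute(cidr, to, via string) (netip.Prefix, error) {
	subnet, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Prefix{}, fmt.Errorf("cidr: %w", err)
	}
	dst, err := netip.ParsePrefix(to)
	if err != nil {
		return netip.Prefix{}, fmt.Errorf("to: %w", err)
	}
	gw, err := netip.ParseAddr(via)
	if err != nil {
		return netip.Prefix{}, fmt.Errorf("via: %w", err)
	}
	// A gateway outside the attached subnet cannot be reached directly.
	if !subnet.Contains(gw) {
		return netip.Prefix{}, fmt.Errorf("next hop %s is outside %s", gw, subnet)
	}
	// 1.2.3.4/16 describes the network 1.2.0.0/16.
	return dst.Masked(), nil
}

func main() {
	dst, err := checkRoute("192.168.0.0/24", "1.2.3.4/16", "192.168.0.10")
	fmt.Println(dst, err)
}
```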

If a DHCP server is running in the private network, you can use it to assign IPs:

apiVersion: vpc.scaleway.com/v1alpha1
kind: PrivateNetwork
metadata:
  name: my-privatenetwork
spec:
  id: <private network ID>
  ipam:
    type: DHCP
  routes:
  - to: 1.2.3.4/16
    via: 192.168.0.10

Contribution

Feel free to submit any issue, feature request or pull request 😄!

scaleway-k8s-vpc's People

lucasboisserie jeansebastienh wouter0100 linkurious jmc000 paulden

scaleway-k8s-vpc's Issues

Multiple CIDR configuration

Hello,

First, thank you for this project; it comes at just the right time for me!
Here is my issue: I'm setting up a VPC on the CIDR 192.168.0.0/24, exposed to the Internet through a gateway whose IP is 192.168.0.1. Using your controller, I attach my Kubernetes cluster nodes to the VPC with static IPs. The IP range for the cluster nodes must be [192.168.0.2-192.168.0.255], which corresponds to the following CIDR list:

192.168.0.2/31
192.168.0.4/30
192.168.0.8/29
192.168.0.16/28
192.168.0.32/27
192.168.0.64/26
192.168.0.128/25
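The seven prefixes above are the range 192.168.0.2-192.168.0.255 split on CIDR boundaries. As a sanity check, the following sketch (Go `net/netip`; `lastAddr` and `coverage` are hypothetical helpers, not part of the controller) confirms the prefixes are contiguous and reports the first and last address they cover. It also rejects prefixes with host bits set: 192.168.0.3/31, which appears in the comma-separated attempt quoted in this issue, actually denotes 192.168.0.2/31.

```go
package main

import (
	"fmt"
	"net/netip"
)

// lastAddr returns the highest IPv4 address in p: the network address with
// every host bit set to 1.
func lastAddr(p netip.Prefix) netip.Addr {
	a := p.Masked().Addr().As4()
	v := uint32(a[0])<<24 | uint32(a[1])<<16 | uint32(a[2])<<8 | uint32(a[3])
	v |= (uint32(1) << (32 - p.Bits())) - 1
	return netip.AddrFrom4([4]byte{byte(v >> 24), byte(v >> 16), byte(v >> 8), byte(v)})
}

// coverage parses IPv4 CIDRs in order and returns the first and last address
// covered, failing on non-canonical prefixes or gaps between prefixes.
func coverage(cidrs []string) (netip.Addr, netip.Addr, error) {
	var first, last netip.Addr
	for i, s := range cidrs {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return netip.Addr{}, netip.Addr{}, err
		}
		if !p.Addr().Is4() {
			return netip.Addr{}, netip.Addr{}, fmt.Errorf("%s is not IPv4", s)
		}
		if p.Masked() != p {
			return netip.Addr{}, netip.Addr{}, fmt.Errorf("%s has host bits set (did you mean %s?)", s, p.Masked())
		}
		if i == 0 {
			first = p.Addr()
		} else if p.Addr() != last.Next() {
			return netip.Addr{}, netip.Addr{}, fmt.Errorf("gap before %s", s)
		}
		last = lastAddr(p)
	}
	return first, last, nil
}

func main() {
	ranges := []string{
		"192.168.0.2/31", "192.168.0.4/30", "192.168.0.8/29", "192.168.0.16/28",
		"192.168.0.32/27", "192.168.0.64/26", "192.168.0.128/25",
	}
	first, last, err := coverage(ranges)
	fmt.Println(first, last, err) // 192.168.0.2 192.168.0.255 <nil>

	_, _, err = coverage([]string{"192.168.0.3/31"})
	fmt.Println(err)
}
```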

I tried to use a list for the .spec.ipam.static.cidr value, but only strings are allowed. Then I tried the following comma-separated string:
cidr: "192.168.0.3/31,192.168.0.4/30,192.168.0.8/29,192.168.0.16/28,192.168.0.32/27,192.168.0.64/26,192.168.0.128/25"
without success: no static IPs were assigned to the nodes on the ens5 network interface.
Here is the manifest I'm using:

apiVersion: vpc.scaleway.com/v1alpha1
kind: PrivateNetwork
metadata:
  name: vpc-ens5
spec:
  id: <VPC-ID>
  ipam:
    type: Static
    static:
      cidr: "192.168.0.3/32,192.168.0.4/30,192.168.0.8/29,192.168.0.16/28,192.168.0.32/27,192.168.0.64/26,192.168.0.128/25"
  routes:
  - to: 192.168.1.0/24
    via: 192.168.0.1

Is it possible to set up this kind of CIDR configuration? Did I make a mistake or misunderstand some concepts?

Thank you in advance for your valuable assistance!

IPAM change from Static to DHCP: multiple addresses

Hello,

If you edit the IPAM of a PrivateNetwork (in my case, switching from Static to DHCP), multiple addresses end up attached to the private network interface, resulting in these errors:

E0403 13:06:09.344009 1 networkinterface_controller.go:172] controllers/NetworkInterface "msg"="unable to configure link" "error"="found 3 address for link ens5 instead of 1" "networkinterface"={"Namespace":"","Name":"xxxx"}
E0403 13:06:09.344138 1 controller.go:246] controller "msg"="Reconciler error" "error"="found 3 address for link ens5 instead of 1" "controller"="networkinterface" "name"="xxxxx" "namespace"="" "reconcilerGroup"="vpc.scaleway.com" "reconcilerKind"="NetworkInterface

Rebooting the nodes solved the issue.

IPv4 addresses not released from IPAM

Yesterday we had an outage caused by IPv4 addresses not being assigned to network interfaces: the go-ipam library believed there were no IPv4 addresses left.

However, we have assigned multiple /24s (3 in total), giving 759 usable IPv4 addresses, and we have never had anywhere near that many nodes in K8s at the same time. This points to a bug in releasing IPv4 addresses back to the pool.
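To illustrate why missed releases look like exhaustion, here is a minimal, hypothetical static allocator (not go-ipam and not the controller's code; `pool`, `Acquire` and `Release` are illustrative names). Like the cleanup script in this issue, it keeps the network and broadcast addresses reserved. Once every host address has been handed out, `Acquire` fails until some address is released, regardless of how many nodes actually exist.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

var errExhausted = errors.New("no IPv4 addresses left")

// pool is a hypothetical, minimal static IPAM over one IPv4 prefix.
type pool struct {
	prefix netip.Prefix
	used   map[netip.Addr]bool
}

func newPool(cidr string) *pool {
	return &pool{prefix: netip.MustParsePrefix(cidr).Masked(), used: map[netip.Addr]bool{}}
}

// Acquire returns the lowest free host address, skipping the network
// address (first) and the broadcast address (last).
func (p *pool) Acquire() (netip.Addr, error) {
	for a := p.prefix.Addr().Next(); p.prefix.Contains(a.Next()); a = a.Next() {
		if !p.used[a] {
			p.used[a] = true
			return a, nil
		}
	}
	return netip.Addr{}, errExhausted
}

// Release returns an address to the pool. If a deleted node's address is
// never released, the pool eventually reports exhaustion.
func (p *pool) Release(a netip.Addr) { delete(p.used, a) }

func main() {
	p := newPool("172.21.0.0/30") // two usable hosts: .1 and .2
	a, _ := p.Acquire()
	b, _ := p.Acquire()
	_, err := p.Acquire()
	fmt.Println(a, b, err) // 172.21.0.1 172.21.0.2 no IPv4 addresses left

	p.Release(a) // e.g. the node owning .1 was deleted
	c, _ := p.Acquire()
	fmt.Println(c) // 172.21.0.1
}
```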

We fixed the outage fairly quickly by moving the application to external IPv4 addresses. Once it was back online, we started cleaning up the go-ipam storage, and in the end we wrote a script to do it for us. To use it, scale the scaleway-k8s-vpc controller down to 0 replicas (so that it makes no changes), then copy each base64-encoded string from scaleway-k8s-vpc-ipam into the script one by one and run it.

Code to clean up the go-ipam storage

package main

import (
	"encoding/base64"
	"log"

	goipam "github.com/metal-stack/go-ipam"
	"inet.af/netaddr"
)

// IPs currently assigned to a NetworkInterface; these stay allocated.
var inUseIPsStrings = []string{
	"172.21.1.176",
	"172.21.0.165",
	"172.21.1.185",
	"172.21.0.1",
	"172.21.0.118",
	"172.21.0.2",
	"172.21.2.253",
	"172.21.0.137",
	"172.21.2.250",
	"172.21.2.254",
	"172.21.0.55",
	"172.21.0.3",
}

// Paste one base64-encoded prefix copied from scaleway-k8s-vpc-ipam here.
var storageBase64 = ""

func main() {
	// Build a set of in-use IPs for constant-time lookups.
	inUseIPs := make(map[netaddr.IP]bool, len(inUseIPsStrings))
	for _, s := range inUseIPsStrings {
		ip, err := netaddr.ParseIP(s)
		if err != nil {
			log.Panicln(err)
		}
		inUseIPs[ip] = true
	}

	log.Printf("%+v", inUseIPs)

	data, err := base64.StdEncoding.DecodeString(storageBase64)
	if err != nil {
		log.Panicln(err)
	}

	// Decode the stored prefix (its CIDR plus the allocated IPs).
	prefix := &goipam.Prefix{}
	err = prefix.GobDecode(data)
	if err != nil {
		log.Panicln(err)
	}

	// Load it into an in-memory storage so that ReleaseIP can operate on it.
	memory := goipam.NewMemory()

	*prefix, err = memory.CreatePrefix(*prefix)
	if err != nil {
		log.Panicln(err)
	}

	ipnet, err := netaddr.ParseIPPrefix(prefix.Cidr)
	if err != nil {
		log.Panicln(err)
	}

	ipam := goipam.NewWithStorage(memory)

	// GetIPs is custom code (added to go-ipam) that returns prefix.ips, so we can loop over it.
	for ip, assigned := range prefix.GetIPs() {
		if !assigned {
			continue
		}

		temp, err := netaddr.ParseIP(ip)
		if err != nil {
			log.Panicln(err)
		}

		if temp.Compare(ipnet.Range().From()) == 0 || temp.Compare(ipnet.Range().To()) == 0 {
			// Keep the first and last IPs reserved (network + broadcast).
			continue
		}

		// Skip IPs that are still in use.
		if inUseIPs[temp] {
			continue
		}

		prefix, err = ipam.ReleaseIP(&goipam.IP{
			IP:           temp,
			ParentPrefix: prefix.Cidr,
		})
		if err != nil {
			log.Panicln(err)
		}
	}

	log.Printf("%+v", prefix.Usage())

	// Re-encode the cleaned prefix so it can be written back to the storage.
	encode, err := prefix.GobEncode()
	if err != nil {
		log.Panicln(err)
	}

	log.Printf("%+v", base64.StdEncoding.EncodeToString(encode))
}

Added to go-ipam/prefix.go:

func (p Prefix) GetIPs() map[string]bool {
	return p.ips
}

I'm now investigating why the IPv4 addresses were not properly released, and I'll keep this thread updated as I debug further.

IP reuse within the CIDR range

Hi,

I started using this CRD six months ago and it has worked fine until now. However, I am concerned that the total number of nodes ever provisioned by the cluster autoscaler over the lifetime of my Kapsule cluster has exceeded the number of IPs in the CIDR specified in the PrivateNetwork resource.

Here is the relevant part of my configuration:

static:
  cidr: "192.168.0.0/24"
  availableRanges:
  - 192.168.0.3/32
  - 192.168.0.4/30
  - 192.168.0.8/29
  - 192.168.0.16/28
  - 192.168.0.32/27
  - 192.168.0.64/26
  - 192.168.0.128/25

It corresponds to the 192.168.0.3-254 range.

When I check the NetworkInterfaces in my cluster, only a few nodes have an attached IP:

NAME                        ADDRESS
vpc-ens5-4jtnp              192.168.0.206/24
vpc-ens5-4nldl
vpc-ens5-7rrnb              192.168.0.11/24
vpc-ens5-6948w              192.168.0.33/24
vpc-ens5-bhcw6
vpc-ens5-c5ktx
vpc-ens5-cknmd
vpc-ens5-jw9cw
vpc-ens5-kc5k8              192.168.0.10/24
vpc-ens5-kq8nq
vpc-ens5-m9jfr
vpc-ens5-qzm4g
vpc-ens5-rxtnp
vpc-ens5-v6s4f              192.168.0.254/24
vpc-ens5-vzc8z              192.168.0.17/24
vpc-ens5-wwwlw              192.168.0.13/24
vpc-ens5-xc2wh

Since 192.168.0.254 has been handed out, it looks like every available IP has been used at some point, and I cannot reuse an IP that is no longer in use (but was used before).
I provisioned new nodes and observed that none of them were attached to the VPC.

Thank you in advance for your help!

IP not assigned when scaling large amount of nodes

We have been using this VPC operator for quite some time now, and we're really happy with it. The only problem we have encountered is that a node is sometimes not assigned an IPv4 address. As a workaround, I replace the node in the Kapsule console.

❯ k get networkinterfaces
NAME                                ADDRESS          NODE NAME                                        MAC ADDRESS         LINK NAME
hiddenfromyou-staging-db-pn-4lrnd   172.21.0.11/24   scw-hiddenfromyou-stagin-dev1-m-ctd-no--ff7c21                       
hiddenfromyou-staging-db-pn-5fbgw   172.21.0.1/24    scw-hiddenfromyou-stagin-dev1-m-ctd-no--0b8d14   02:00:00:00:49:a4   ens5
hiddenfromyou-staging-db-pn-7g6lq   172.21.0.8/24    scw-hiddenfromyou-stagin-dev1-m-ctd-no--775f4b   02:00:00:00:49:a6   ens5
hiddenfromyou-staging-db-pn-86q5x   172.21.0.4/24    scw-hiddenfromyou-stagin-dev1-m-ctd-no--792f2f   02:00:00:00:49:a5   ens5
hiddenfromyou-staging-db-pn-9vlkc   172.21.0.12/24   scw-hiddenfromyou-stagin-dev1-m-ctd-no--30f4d6   02:00:00:00:49:aa   ens5
hiddenfromyou-staging-db-pn-j9724   172.21.0.7/24    scw-hiddenfromyou-staging-monitoring-ae5ad8244   02:00:00:00:49:15   
hiddenfromyou-staging-db-pn-kjg55   172.21.0.3/24    scw-hiddenfromyou-staging-dev1-m-ctd-pg-5550ce   02:00:00:00:49:12   ens5
hiddenfromyou-staging-db-pn-mm59j   172.21.0.2/24    scw-hiddenfromyou-staging-dev1-m-ctd-pg-502fae   02:00:00:00:49:11   ens5
hiddenfromyou-staging-db-pn-pvvrp   172.21.0.6/24    scw-hiddenfromyou-staging-dev1-m-ctd-pg-23e920   02:00:00:00:49:14   ens5
hiddenfromyou-staging-db-pn-q68lv   172.21.0.9/24    scw-hiddenfromyou-stagin-dev1-m-ctd-no--158b22   02:00:00:00:49:a7   ens5
hiddenfromyou-staging-db-pn-v68w5   172.21.0.10/24   scw-hiddenfromyou-stagin-dev1-m-ctd-no--ddc8f0   02:00:00:00:49:a8   ens5

❯ k get pods -o wide | grep scw-hiddenfromyou-stagin-dev1-m-ctd-no--ff7c21
scaleway-k8s-vpc-node-n2lcw                  1/1     Running   0          12m   10.18.226.113   scw-hiddenfromyou-stagin-dev1-m-ctd-no--ff7c21   <none>           <none>

❯ k logs scaleway-k8s-vpc-node-n2lcw
I1203 09:37:12.113097       1 listener.go:44] controller-runtime/metrics "msg"="metrics server is starting to listen"  "addr"=":8080"
I1203 09:37:12.535973       1 node.go:108] setup "msg"="starting manager"  
I1203 09:37:12.536413       1 internal.go:406] controller-runtime/manager "msg"="starting metrics server"  "path"="/metrics"
I1203 09:37:12.537088       1 controller.go:142] controller "msg"="Starting EventSource" "controller"="networkinterface" "reconcilerGroup"="vpc.scaleway.com" "reconcilerKind"="NetworkInterface" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"id":"","nodeName":""},"status":{"linkName":"","macAddress":""}}}
I1203 09:37:12.637693       1 controller.go:142] controller "msg"="Starting EventSource" "controller"="networkinterface" "reconcilerGroup"="vpc.scaleway.com" "reconcilerKind"="NetworkInterface" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"id":""},"status":{}}}
I1203 09:37:12.739127       1 controller.go:149] controller "msg"="Starting Controller" "controller"="networkinterface" "reconcilerGroup"="vpc.scaleway.com" "reconcilerKind"="NetworkInterface" 
I1203 09:37:12.739202       1 controller.go:176] controller "msg"="Starting workers" "controller"="networkinterface" "reconcilerGroup"="vpc.scaleway.com" "reconcilerKind"="NetworkInterface" "worker count"=1

❯ k logs scaleway-k8s-vpc-controller-8c55f447-4gs4q | grep hiddenfromyou-staging-db-pn-4lrnd
E1203 09:36:22.244447       1 privatenetwork_controller.go:223] controllers/PrivateNetwork "msg"="could not update networkInterface status" "error"="Operation cannot be fulfilled on networkinterfaces.vpc.scaleway.com \"hiddenfromyou-staging-db-pn-4lrnd\": the object has been modified; please apply your changes to the latest version and try again" "privatenetwork"={"Namespace":"","Name":"hiddenfromyou-staging-db-pn"} 
E1203 09:36:22.244562       1 controller.go:246] controller "msg"="Reconciler error" "error"="Operation cannot be fulfilled on networkinterfaces.vpc.scaleway.com \"hiddenfromyou-staging-db-pn-4lrnd\": the object has been modified; please apply your changes to the latest version and try again" "controller"="privatenetwork" "name"="hiddenfromyou-staging-db-pn" "namespace"="" "reconcilerGroup"="vpc.scaleway.com" "reconcilerKind"="PrivateNetwork"

What puzzles me is why the object was modified at all. The failing update seems to be the one that adds the MAC address to the custom NetworkInterface resource, and nothing else seems to update it. Could this be some kind of race condition?

The "object has been modified" error also appears on successful deployments, but there the logs are slightly different:

I1203 09:35:39.740506       1 privatenetwork_controller.go:226] controllers/PrivateNetwork "msg"="Successfully created networkInterface hiddenfromyou-staging-db-pn-5fbgw on node scw-hiddenfromyou-stagin-dev1-m-ctd-no--0b8d14" "privatenetwork"={"Namespace":"","Name":"hiddenfromyou-staging-db-pn"} 
E1203 09:35:39.793400       1 networkinterface_controller.go:143] controllers/NetworkInterface "msg"="failed to update networkInterface hiddenfromyou-staging-db-pn-5fbgw" "error"="Operation cannot be fulfilled on networkinterfaces.vpc.scaleway.com \"hiddenfromyou-staging-db-pn-5fbgw\": the object has been modified; please apply your changes to the latest version and try again" "networkinterface"={"Namespace":"","Name":"hiddenfromyou-staging-db-pn-5fbgw"} 
E1203 09:35:39.793600       1 controller.go:246] controller "msg"="Reconciler error" "error"="Operation cannot be fulfilled on networkinterfaces.vpc.scaleway.com \"hiddenfromyou-staging-db-pn-5fbgw\": the object has been modified; please apply your changes to the latest version and try again" "controller"="networkinterface" "name"="hiddenfromyou-staging-db-pn-5fbgw" "namespace"="" "reconcilerGroup"="vpc.scaleway.com" "reconcilerKind"="NetworkInterface"

It is definitely a recurring problem: I've run into it several times already (5+).