
cni-migration's Introduction

cni-migration is a CLI tool for migrating a Kubernetes cluster's CNI solution from Flannel (Canal) to Cilium. The tool works by running both CNIs at the same time using multus-cni. All pods are first updated to attach a network interface from both CNIs, and then each node is migrated to run only Cilium. This ensures that all pods are able to communicate with both networks at all times during the migration.

How

The following are the steps taken to migrate the CNI. During and after each step, inter-pod communication is regularly tested using knet-stress, which sends an HTTP request to all other knet-stress instances on all nodes. This verifies bi-directional network connectivity across the cluster.
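The connectivity test described above amounts to an all-to-all probe: because every instance requests every other instance, a fully successful round shows connectivity in both directions. The following is a minimal Python sketch of one instance's side of that check, not knet-stress's actual implementation. The function names, the peer URLs and the injectable `probe` callable are assumptions made for illustration.

```python
import http.client
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to url answers with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply, ...
        return False


def check_peers(
    peers: Iterable[str], probe: Callable[[str], bool] = http_probe
) -> Dict[str, bool]:
    """Probe every peer once and record which ones answered."""
    return {peer: probe(peer) for peer in peers}


def all_reachable(results: Dict[str, bool]) -> bool:
    """Connectivity holds only if every single peer answered."""
    return all(results.values())
```

Passing the probe as a parameter keeps the sketch testable without a cluster: a fake probe can simulate one unreachable peer and confirm that `all_reachable` reports the failure.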

  1. This step involves installing both CNIs on all nodes and labelling the nodes accordingly.
  • Label all nodes with node-role.kubernetes/canal-cilium=true and patch the canal DaemonSet to have a node selector on this label.
  • Label all nodes with node-role.kubernetes/cni-priority-canal=true.
  • Deploy two knet-stress DaemonSets that run two knet-stress instances on each node.
  • Deploy two DaemonSet instances of Cilium.
    • The first has a node selector on the label node-role.kubernetes/cilium-canal=true and writes its CNI config to 99-cilium.conf. This then runs on all nodes.
    • The second has a node selector on the label node-role.kubernetes/cilium=true and writes its CNI config to 00-cilium.conf. This will not run until a node is being migrated.
  • Deploy two DaemonSet instances of Multus.
    • Deploy a multus DaemonSet with the node selector node-role.kubernetes/cni-priority-canal=true. This has a static config that uses the Flannel CNI config for the main Pod IP network interface, with Cilium attached as an extra network interface. The resulting CNI config is written to 00-multus.conflist. This CNI config will be chosen by the Kubelet until the node has been migrated.
    • Deploy a multus DaemonSet with the node selector node-role.kubernetes/cni-priority-cilium=true. This multus is the same as the previous one, except that it swaps the primary Pod IP to that of Cilium rather than Flannel.
  2. This step ensures that all workloads on the cluster are running with network interfaces from both CNIs. The "sbr" chaining CNI plugin is used so that the default route inside each pod goes through Cilium, while the Pod IP remains in Flannel's range.
  • Roll all nodes in the cluster one by one. This step ensures that every pod in the cluster is reassigned an IP, meaning that all pods will have a network interface from both CNIs applied using multus.
  • Check knet-stress connectivity after every node roll.
  • At this stage, all pods on the cluster have both CNI network interfaces attached. All nodes are running the two CNIs which are controlled by multus.
  3. This step reverses the order of priority of the CNIs, so that Cilium provides the primary Pod IP, with an extra Flannel network interface attached.
  • Relabel all the nodes on the cluster with the label node-role.kubernetes/cni-priority-cilium=true and roll them. This changes the priority of the CNI on each node to Cilium and puts each Pod IP in Cilium's range.
  • Check knet-stress connectivity after every node roll.
  • At this stage, all pods on the cluster have both CNI network interfaces attached, but the Pod IP is now in Cilium's range rather than Flannel's.
  4. This step is iterative: the same operation is performed on each node in turn until all nodes have been migrated.
  • First, the selected node is drained and tainted, and all pods on it are deleted. The label node-role.kubernetes/cilium-canal=true is removed from the node. The taint added uses the label node-role.kubernetes/cilium=true, which terminates the first Cilium DaemonSet on the node and replaces it with the second. This second Cilium DaemonSet writes its CNI config to 00-cilium.conf, which makes it the first CNI config to be selected and used by the Kubelet, so the node now uses only the Cilium CNI rather than multus (Cilium and Canal).
  • The node is untainted, which allows workloads to be rescheduled onto it. These pods will have only Cilium CNI network interfaces attached and should still be reachable by all other pods in the cluster.
  • The node has the label node-role.kubernetes/migrated=true added which signals that this node has been migrated.
  5. After all nodes have been migrated, the old resources are cleaned up.
  • The now unused and non-scheduled Multus, Canal, and first Cilium DaemonSets are deleted.
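The first and fourth steps above depend on file naming: the config that sorts first in the CNI config directory is the one that gets used, which is why 00-multus.conflist wins over 99-cilium.conf, and why 00-cilium.conf takes over on a migrated node. The following Python sketch illustrates that selection rule. It assumes plain lexicographic ordering over files with a recognised CNI extension, and the function name is hypothetical; it is not code from the tool or from the Kubelet.

```python
from typing import Iterable, Optional

# File extensions treated as CNI configuration files in this sketch.
CNI_EXTENSIONS = (".conf", ".conflist", ".json")


def select_cni_config(filenames: Iterable[str]) -> Optional[str]:
    """Return the CNI config file that sorts first by name, or None if there is none."""
    candidates = sorted(name for name in filenames if name.endswith(CNI_EXTENSIONS))
    return candidates[0] if candidates else None


# Before a node is migrated: multus is selected ahead of the first Cilium config.
before = select_cni_config(["99-cilium.conf", "00-multus.conflist"])

# On a migrated node: 00-cilium.conf sorts ahead of any remaining files.
after = select_cni_config(["99-cilium.conf", "00-multus.conflist", "00-cilium.conf"])
```

Under these assumptions `before` is 00-multus.conflist and `after` is 00-cilium.conf, since "00-c" sorts ahead of "00-m".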

The cluster should now be fully migrated from Canal to Cilium CNI.

Requirements

The following requirements apply in order to run the migration.

Firewall

  • Cilium uses Geneve as its backend mode and as such needs UDP port 6081 open to communicate across nodes. This port must be opened before the migration. Note: Cilium cannot run in VXLAN mode, since it has not been possible to run two separate VXLAN interfaces on each host (one for Flannel and one for Cilium).
  • All Kubernetes NetworkPolicies will remain active and applied during and after the migration, as they are compatible with Cilium. No action is needed.

Images

  • docker.io/cilium/cilium:v1.7.2
  • docker.io/cilium/operator:v1.7.2
  • nfvpe/multus:v3.4.1
  • gcr.io/jetstack-josh/knet-stress:cli (preferably a private image is built from source and used)

Configuration

The cni-migration tool takes an input configuration file (default --config config.yaml) that holds options for the migration.

labels

This holds the label keys, and the single value shared by all of them, that are used to signal each step of the migration:

  canal-cilium: node-role.kubernetes.io/canal-cilium
  cni-priority-canal: node-role.kubernetes.io/priority-canal
  cni-priority-cilium: node-role.kubernetes.io/priority-cilium
  rolled: node-role.kubernetes.io/rolled
  cilium: node-role.kubernetes.io/cilium
  migrated: node-role.kubernetes.io/migrated
  value: "true" # used as the value to each label key
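As a rough illustration of how these options fit together, each label key can be paired with the shared value to form a key=value node label or selector. The Python sketch below assumes that pairing; the `LABELS` dictionary mirrors the example values above, while the helper names are hypothetical and not part of the tool.

```python
from typing import Dict

# Values copied from the example configuration above.
LABELS: Dict[str, str] = {
    "canal-cilium": "node-role.kubernetes.io/canal-cilium",
    "cni-priority-canal": "node-role.kubernetes.io/priority-canal",
    "cni-priority-cilium": "node-role.kubernetes.io/priority-cilium",
    "rolled": "node-role.kubernetes.io/rolled",
    "cilium": "node-role.kubernetes.io/cilium",
    "migrated": "node-role.kubernetes.io/migrated",
    "value": "true",
}


def selector(labels: Dict[str, str], name: str) -> str:
    """Build the key=value selector string for one configured signal."""
    return f"{labels[name]}={labels['value']}"


def node_has(node_labels: Dict[str, str], labels: Dict[str, str], name: str) -> bool:
    """Check whether a node's label set carries the given signal."""
    return node_labels.get(labels[name]) == labels["value"]
```

For example, `selector(LABELS, "migrated")` yields node-role.kubernetes.io/migrated=true, which is the form a node selector or a label query would use.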

paths

The file paths for each manifest bundle:

  cilium: ./resources/cilium.yaml
  multus: ./resources/multus.yaml
  knet-stress: ./resources/knet-stress.yaml

preflightResources

List of resources that must exist before beginning the migration.

  daemonsets:
    knet-stress:
    - knet-stress
    - knet-stress-2
  deployments:
  statefulsets:

watchedResources

List of resources which must be ready when checked throughout the migration before continuing:

  daemonsets:
    kube-system:
    - canal
    - cilium
    - cilium-migrated
    - kube-multus-canal
    - kube-multus-cilium
    - kube-controller-manager
    - kube-scheduler
    knet-stress:
    - knet-stress
    - knet-stress-2
  deployments:
  statefulsets:
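The exact readiness rule the tool applies is not spelled out here. One plausible reading, sketched in Python below, is that a DaemonSet counts as ready when its status reports as many ready pods as it wants scheduled, using the standard `desiredNumberScheduled` and `numberReady` status fields. The function names and this readiness rule are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def flatten(resources: Optional[Dict[str, Optional[List[str]]]]) -> List[Tuple[str, str]]:
    """Turn {namespace: [names]} into (namespace, name) pairs.

    An empty YAML key such as `deployments:` parses to None, so None is
    treated as "nothing to watch".
    """
    if not resources:
        return []
    return [(ns, name) for ns, names in resources.items() for name in (names or [])]


def daemonset_ready(status: Dict[str, int]) -> bool:
    """Ready when every pod the DaemonSet wants scheduled is reported ready."""
    desired = status.get("desiredNumberScheduled", 0)
    return status.get("numberReady", 0) >= desired


watched = flatten(
    {
        "kube-system": ["canal", "cilium", "cilium-migrated"],
        "knet-stress": ["knet-stress", "knet-stress-2"],
    }
)
```

Under this rule a DaemonSet that currently selects no nodes, as cilium-migrated does before any node is migrated, reports zero desired and zero ready pods and therefore counts as ready.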

cleanUpResources

List of resources which will be removed after completing the migration successfully:

  daemonsets:
    kube-system:
    - canal
    - cilium
    - kube-multus-canal
    - kube-multus-cilium
    knet-stress:
    - knet-stress
    - knet-stress-2
  deployments:
  statefulsets:

cni-migration's People

Contributors

joshvanl

cni-migration's Issues

[Question] Why you have decided to use --ipv4-range=x.x.x.x

First thanks for sharing your work, highly appreciate it!

My question: why did you decide to use --ipv4-range=172.29.0.0/16 as an arg on the Cilium DS instead of using, for example, ipam: kubernetes or ipam: cluster-pool to get per-node subnets for the pods, together with:

  cluster-pool-ipv4-cidr: "x.x.x.x"
  cluster-pool-ipv4-mask-size: "24"

Is there any particular reason? @JoshVanL

Thanks in advance!

Using `cni-migration` for migration with "plain" Calico as CNI

First of all, thank you so much for your awesome work 👍 and your blog post on the Cilium website!
I'd like to use cni-migration for migrating a k8s cluster to Cilium as the only CNI (I'd like to avoid CNI chaining). I have just Calico as the CNI, not Canal (Calico on top of Flannel) as in your scenario.

If I understand it correctly, basically all I have to do is adapt the resources (multus.yaml), in particular:

  • ConfigMap multus-cni-config, to reflect the definition of Calico by changing/adapting cni-conf-flannel.json:
cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.0",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
            "type": "host-local",
            "subnet": "usePodCidr"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }

I would also change the DaemonSet to calico in the config.yaml.

But I'm not quite sure if there's more to it.
