
Comments (105)

mnencia avatar mnencia commented on May 25, 2024 6

This works, and the resulting system runs Ignition:

MICROOS_DISK="https://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2"
curl -sL "${MICROOS_DISK}" -o /tmp/MicroOS.qcow2
qemu-img convert -p -f qcow2 -O host_device /tmp/MicroOS.qcow2 /dev/sda

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024 4

Alright, folks, initial logic is done and has been pushed to the staging branch, more testing is required now.

The key to unlocking this last part was finding out how k3s is installed on MicroOS, thanks to https://build.opensuse.org/package/show/openSUSE:Factory/k3s.

Also, I made use of the config.yaml option to configure k3s, as explained in the k3s configuration file documentation.

What remains is installing Kured on the cluster itself and configuring it if needed. For that, someone needs to install Kubic (also a MicroOS derivative, which uses kubeadm), as it is already configured to work with transactional-updates, and dump the file /usr/share/kured/kured-<version>.yaml, as explained here, to see what the config params are.

Any PRs, comments, or tweaks are always welcome! 🙏

[screenshot: ksnip_20220206-083941]


mnencia avatar mnencia commented on May 25, 2024 4

I've opened #48 to improve messages while we are waiting for ssh to be available


sysrich avatar sysrich commented on May 25, 2024 3


mnencia avatar mnencia commented on May 25, 2024 3

I've tried with the volume and MicroOS based on Leap 15.3 (Experimental) and it works

Image: https://download.opensuse.org/repositories/openSUSE:/Leap:/Micro:/5.1/images/openSUSE-Leap-Micro.x86_64-Default.raw.xz


mnencia avatar mnencia commented on May 25, 2024 3

This list of commands works for me.

I had to remove the gpg and sha256sum verification because they do not match.

[
      "set -ex",
      "gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys 0x22C07BA534178CD02EFE22AAB88B2FD43DBDC284",
      "MIRROR_URL='https://mirrorcache.opensuse.org/tumbleweed/appliances/'",
      "IMAGE_NAME='openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2'",
      "IMAGE_URL=$MIRROR_URL/$IMAGE_NAME",
      "curl --progress-bar -L -o $IMAGE_NAME $IMAGE_URL",
      "curl --progress-bar -L -o $IMAGE_NAME.sha256 $IMAGE_URL.sha256",
      "curl --progress-bar -L -o $IMAGE_NAME.sha256.asc $IMAGE_URL.sha256.asc",
      # "gpg --verify $IMAGE_NAME.sha256.asc", # TODO: this doesn't match
      # "sha256sum -c $IMAGE_NAME.sha256", # TODO: this doesn't match
      "qemu-img convert -p -f qcow2 -O host_device $IMAGE_NAME /dev/sda",
      "sgdisk -e /dev/sda",
      "partprobe /dev/sda",
      "parted -s /dev/sda resizepart 4 99%",
      "parted -s /dev/sda mkpart primary ext2 99% 100%",
      "mount /dev/sda4 /mnt/ && btrfs filesystem resize max /mnt && umount /mnt",
      "mke2fs -L ignition /dev/sda5",
      "mount /dev/sda5 /mnt",
      "mkdir /mnt/ignition",
      "cp /root/config.ign /mnt/ignition/config.ign",
      "umount /mnt",
      "shutdown -r +1",
      "sleep 1",
      "exit 0"
]


mysticaltech avatar mysticaltech commented on May 25, 2024 3

Done for today - Remember, if you advance, please start by pulling the staging branch, and when you are done open a PR there - Or just shoot ideas here!

Everything is welcome to move this needle forward, and thanks one more time for the great job today @phaer and @mnencia ! You folks saved the day 🙏


mnencia avatar mnencia commented on May 25, 2024 3

Perfect, have replaced udevadm settle with what is needed.

About the shutdown -r +1, maybe you understood already. Terraform needs to hear back a success; that's why we schedule the reboot in the future, hence the above command, and then exit 0. Unfortunately, the shutdown command only supports minutes.

I'm not sure I understand your command above, @mnencia, especially the '(sleep 2; reboot)&' part. Could you please explain?

It's a trick to have a successful exit code from a reboot command via ssh. I'll explain briefly:

() creates a subshell that the & sends to the background. The sleep 2 gives the connection some time to terminate, and then the reboot takes effect. The trick works because the ssh command doesn't allocate a tty, so the processes are not signaled when the connection terminates.
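
The background-subshell trick can be tried safely without rebooting anything. This is only a minimal sketch: run_deferred is a hypothetical helper name, and a harmless touch of a marker file stands in for the reboot call.

```shell
#!/bin/sh
# Hypothetical helper: run a command after a delay in a background subshell,
# returning to the caller immediately with exit status 0.
# In the thread the deferred command is `reboot`; any command can be used here.
run_deferred() {
  delay=$1
  shift
  # ( ... ) starts a subshell; the trailing & puts it in the background.
  # stdout/stderr are redirected so the background job does not keep the
  # caller's output streams open.
  ( sleep "$delay"; "$@" ) >/dev/null 2>&1 &
  return 0
}

# Harmless stand-in for `reboot`: create a marker file after one second.
marker=$(mktemp -u)
run_deferred 1 touch "$marker"
echo "caller returned with status $?"
```

The caller gets its exit status right away, which is what Terraform needs; the deferred command only runs once the delay has elapsed.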


mysticaltech avatar mysticaltech commented on May 25, 2024 2

Folks, recap on the advances. Both methods above have failed.

  • The one with dd created a read-only filesystem, even for '/boot/writable', which is not supposed to be read-only, probably because of the sensitivity of btrfs. So we cannot add the ignition file for SSH.
  • The one from hetzner-microos used kexec to launch the Tumbleweed YaST installer. It is all manual, which is not what we want, and it does not give a proper MicroOS, just Tumbleweed with transactional updates if you choose so during the setup. So no go!

However, the good news is that MicroOS has images already loaded with k3s, so we have this option https://downloadcontent.opensuse.org/tumbleweed/appliances/iso/openSUSE-MicroOS.x86_64-k3s-SelfInstall.iso

For now, attempts to install it via kexec in the rescue environment have failed because of space limitations, even when I load it on volumes. But there are still things to try, and I have already asked Hetzner support to add it to their list of ISOs (as a plan B); hopefully, they'll do so.

Let's see! 🤞 If you can think of something, do not hesitate!


mnencia avatar mnencia commented on May 25, 2024 2
  • The one with dd created a read-only filesystem, even for '/boot/writable' that is not supposed to be read-only. Probably because of the sensitivity of btrfs. So we cannot add the ignition file for SSH.

I think that's on purpose. You can mark the filesystem as rw by running

btrfs property set /mnt ro false


mnencia avatar mnencia commented on May 25, 2024 2

I tried it, but Ignition does not read the configuration, probably because the partition is not labeled "ignition".

What about trying to use cloud-init? It should not require modifying the image, but only defining the right user-data content.


mnencia avatar mnencia commented on May 25, 2024 2

Creating ignition/config.ign in the root doesn't work. Let's try with /boot/writable/ignition/config.ign


phaer avatar phaer commented on May 25, 2024 2

Hi, my approach did not work with the k3s self install image, but it did work with the one based on Leap 15.3, as recommended by @mnencia.

I have not yet succeeded in installing k3s on this, as the default repo seems unavailable after boot.

Here's the terraform snippet I used to provision my test server. /root/config.ign was provisioned beforehand via the poseidon/ct provider:

  provisioner "remote-exec" {
    inline = [
      "set -ex",
      "gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys 0x22C07BA534178CD02EFE22AAB88B2FD43DBDC284",
      "export MIRROR_URL='https://download.opensuse.org/repositories/openSUSE:/Leap:/Micro:/5.1/images'",
      "export IMAGE_NAME='openSUSE-Leap-Micro.x86_64-Default.raw.xz'",
      "export IMAGE_URL=$MIRROR_URL/$IMAGE_NAME",
      "curl --progress-bar -L -o $IMAGE_NAME $IMAGE_URL",
      "curl --progress-bar -L -o $IMAGE_NAME.sha256 $IMAGE_URL.sha256",
      "curl --progress-bar -L -o $IMAGE_NAME.sha256.asc $IMAGE_URL.sha256.asc",
      "gpg --verify $IMAGE_NAME.sha256.asc",
      "sha256sum -c $IMAGE_NAME.sha256",
      "cat $IMAGE_NAME | xz -d | dd of=/dev/sda status=progress",
      "sgdisk -e /dev/sda",
      "partprobe /dev/sda",
      "parted -s /dev/sda resizepart 3 99%",
      "parted -s /dev/sda mkpart primary ext2 99% 100%",
      "mount /dev/sda3 /mnt/ && btrfs filesystem resize max /mnt && umount /mnt",
      "mke2fs -L ignition /dev/sda4",
      "mount /dev/sda4 /mnt",
      "mkdir /mnt/ignition",
      "cp /root/config.ign /mnt/ignition/config.ign",
      "umount /mnt",
      "shutdown -r +1",
      "sleep 1",
      "exit 0"
    ]
  }


mnencia avatar mnencia commented on May 25, 2024 2

@phaer You are right. Using the URL you provided works perfectly. Probably the mirrorcache URL was redirecting the download to different mirrors with different images.


mnencia avatar mnencia commented on May 25, 2024 2

Because that means using the mirrors anyway:

$ curl -I https://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220202.qcow2
HTTP/2 302
location: https://mirrorcache.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220202.qcow2
content-type: text/html; charset=iso-8859-1
date: Fri, 04 Feb 2022 21:24:08 GMT
server: Apache
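
To check quickly where a given URL is redirected, the Location header can be pulled out of that kind of output. A small sketch, where redirect_target is a hypothetical helper name and the headers quoted above are fed in by hand instead of calling curl:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the Location header from
# `curl -I` style output read on stdin (header names are case-insensitive).
redirect_target() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Real usage would be: curl -sI "$URL" | redirect_target
# Here the headers quoted in the thread are fed in directly.
printf '%s\r\n' \
  'HTTP/2 302' \
  'location: https://mirrorcache.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220202.qcow2' \
  'server: Apache' | redirect_target
```

Running it against the image, the .sha256 and the .sha256.asc URLs would show whether all three files end up on the same host.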


mnencia avatar mnencia commented on May 25, 2024 2

What about using

TMP_DIR=$(mktemp -d)
aria2c --dir=$TMP_DIR --follow-metalink=mem https://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4

It is safer to isolate the download in a dedicated directory
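
With a dedicated directory, the downloaded image can also be located without parsing ls output. A minimal sketch, assuming the image keeps its .qcow2 extension; find_image is a hypothetical helper, and empty files stand in for the real download:

```shell
#!/bin/sh
# Hypothetical helper: print the first *.qcow2 file found in a directory,
# or fail if there is none. This avoids parsing `ls` output.
find_image() {
  for candidate in "$1"/*.qcow2; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Stand-in for the aria2c download: empty files with plausible names.
TMP_DIR=$(mktemp -d)
: > "$TMP_DIR/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2"
: > "$TMP_DIR/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4"
find_image "$TMP_DIR"
rm -rf "$TMP_DIR"
```

The result could then be handed to qemu-img convert, for example as "$(find_image "$TMP_DIR")".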


mnencia avatar mnencia commented on May 25, 2024 2

Added eth1 and hostname in #40


phaer avatar phaer commented on May 25, 2024 2

@mysticaltech Your last link, about Kured's configuration, links to the k3s docs a second time; I think that's a mistake?

Just installed Kubic in a VM, but /usr/share/kured/kured-<version>.yaml does not exist after boot. find / -iname '*kured*' only yields the following file:

/usr/share/k8s-yaml/kured/kured.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
# Allow kured to read spec.unschedulable
# Allow kubectl to drain/uncordon
#
# NB: These permissions are tightly coupled to the bundled version of kubectl; the ones below
# match https://github.com/kubernetes/kubernetes/blob/v1.19.4/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go
#
- apiGroups: [""]
  resources: ["nodes"]
  verbs:     ["get", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs:     ["list","delete","get"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs:     ["get"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs:     ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
- kind: ServiceAccount
  name: kured
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
# Allow kured to lock/unlock itself
- apiGroups:     ["apps"]
  resources:     ["daemonsets"]
  resourceNames: ["kured"]
  verbs:         ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
- kind: ServiceAccount
  namespace: kube-system
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured            # Must match `--ds-name`
  namespace: kube-system # Must match `--ds-namespace`
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
   type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      hostPID: true # Facilitate entering the host mount namespace via init
      restartPolicy: Always
      containers:
        - name: kured
          image: registry.opensuse.org/kubic/kured:1.9.1
                 # If you find yourself here wondering why there is no
                 # :latest tag on Docker Hub,see the FAQ in the README
          imagePullPolicy: Always
          securityContext:
            privileged: true # Give permission to nsenter /proc/1/ns/mnt
          env:
            # Pass in the name of the node on which this pod is scheduled
            # for use with drain/uncordon operations and lock acquisition
            - name: KURED_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command:
            - /usr/bin/kured
#            - --force-reboot=false
#            - --drain-grace-period=-1
#            - --skip-wait-for-delete-timeout=0
#            - --drain-timeout=0
#            - --period=1h
#            - --ds-namespace=kube-system
#            - --ds-name=kured
#            - --lock-annotation=weave.works/kured-node-lock
#            - --lock-ttl=0
#            - --prometheus-url=http://prometheus.monitoring.svc.cluster.local
#            - --alert-filter-regexp=^RebootRequired$
#            - --alert-firing-only=false
#            - --reboot-sentinel=/var/run/reboot-required
#            - --prefer-no-schedule-taint=""
#            - --reboot-sentinel-command=""
#            - --slack-hook-url=https://hooks.slack.com/...
#            - --slack-username=prod
#            - --slack-channel=alerting
#            - --notify-url="" # See also shoutrrr url format
#            - --message-template-drain=Draining node %s
#            - --message-template-drain=Rebooting node %s
#            - --blocking-pod-selector=runtime=long,cost=expensive
#            - --blocking-pod-selector=name=temperamental
#            - --blocking-pod-selector=...
#            - --reboot-days=sun,mon,tue,wed,thu,fri,sat
#            - --reboot-delay=90s
#            - --start-time=0:00
#            - --end-time=23:59:59
#            - --time-zone=UTC
#            - --annotate-nodes=false
#            - --lock-release-delay=30m
#            - --log-format=text

I can't really investigate further atm, but that does look more like an example than a working config.


mnencia avatar mnencia commented on May 25, 2024 2

Maybe we should go with the SUSE registry, if they have this patch applied: https://build.opensuse.org/package/view_file/openSUSE:Factory/kured/systemctl-path.patch?expand=1


mnencia avatar mnencia commented on May 25, 2024 2

I would have used udevadm settle instead of the sleep command.


mnencia avatar mnencia commented on May 25, 2024 2

Fix for servicelb in #42


mnencia avatar mnencia commented on May 25, 2024 2

I found the initial issue with the restart command. Once you see it, you cannot believe you missed it until then:

diff --git a/kured/patch.yaml b/kured/patch.yaml
index bfec414..bf72a0c 100644
--- a/kured/patch.yaml
+++ b/kured/patch.yaml
@@ -17,4 +17,4 @@ spec:
         - name: kured
           command:
             - /usr/bin/kured
-            - --reboot-command="/usr/bin/systemctl reboot"
+            - --reboot-command=/usr/bin/systemctl reboot

In pr #51
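
A plausible reading of why the quotes matter: the command entries of a Kubernetes manifest are passed to the program without going through a shell, so the double quotes are never stripped and stay part of the flag value. Assuming kured then splits that value with shell-like quoting rules (an assumption, not verified against its source here), the quoted form collapses into a single word. The effect can be reproduced in plain shell; count_words is a hypothetical helper:

```shell
#!/bin/sh
# Count how many words a command string yields under shell-style,
# quote-aware splitting. `eval` is used only to demonstrate the parsing;
# it must never be fed untrusted input.
count_words() {
  eval "set -- $1"
  echo "$#"
}

# The flag values before and after the fix, as they reach the program
# when no shell has stripped the quotes.
count_words '"/usr/bin/systemctl reboot"'   # prints 1: treated as one file name
count_words '/usr/bin/systemctl reboot'     # prints 2: program plus argument
```

One word means the whole string is looked up as an executable path, which matches the "No such file or directory" error reported by nsenter.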


spigell avatar spigell commented on May 25, 2024 1

Watching your progress, thanks for sharing! I am also stuck with k3os on my bare metal setup, and I am looking for similar functionality from another OS.


mnencia avatar mnencia commented on May 25, 2024 1

About installing MicroOS on Hetzner, did you see https://github.com/balta3/hetzner-microos? It could give some useful hints in the direction of an automated install of MicroOS on hetzner.cloud.


mysticaltech avatar mysticaltech commented on May 25, 2024 1

Thanks, @mnencia, yes, thanks for sharing, I saw it too. It is interesting. But that method seems simpler https://major.io/2021/08/20/deploy-fedora-coreos-in-hetzner-cloud/.

I believe openSUSE MicroOS took a good deal of inspiration from Fedora CoreOS, as it even supports "ignition"; maybe it even shares some code, so that method should work! 🤞

Will keep you folks posted, ASAP. Hopefully this week.


mnencia avatar mnencia commented on May 25, 2024 1

I tried putting it under /boot/writable, mounted with the command mount -o subvol=/@/boot/writable /dev/sda4 /mnt, but it doesn't work anyway. Probably there is some subtle requirement that is missing.


phaer avatar phaer commented on May 25, 2024 1

If it's possible to resize the btrfs filesystem ourselves from Hetzner's rescue system, we could maybe just add a small, labeled partition at the end? That one wouldn't require an external volume. Going to test that :)


mnencia avatar mnencia commented on May 25, 2024 1

I think the add-label option in hcloud is about Hetzner metadata. However, Ignition doesn't run even when creating a volume like the following:

/dev/sdb1: LABEL="ignition" UUID="3dc811c0-9891-4e0f-8b94-1cc38f3967aa" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="ignition" PARTUUID="8fbab7ef-6be0-46b6-b594-53d9847a9e8d"

Looking at the system and the initrd content, it looks like there is no Ignition handling code running in the SelfInstall flavor.


mysticaltech avatar mysticaltech commented on May 25, 2024 1

Now the next steps are very clear:

  • Check the partition size, resize to max if necessary (like @phaer did above)
  • Verify networking, especially the private network interface, which needs to work well before anything else can happen. It should receive DHCP, and maybe we need to add routes to the Hetzner GW to it like done in the current code.
  • Then, we have to run k3s for each of the three configurations, starting with the control plane.

I will definitely look at this as soon as I can but if you continue the work, please do not hesitate to share your success and failures, so we advance faster!

The last step will be to configure the cluster with kured for automatic safe rebooting after upgrades, along with transactional-updates, which luckily for us fully supports kured.


phaer avatar phaer commented on May 25, 2024 1

I had to remove the gpg and sha256sum verification because they do not match.

Interesting, I had the same problem when I tried that image, but just failed to reproduce it. Could be that those keys were just updated, or maybe there's inconsistent state between mirrors and we got different redirects.

@mnencia Could you check whether verification works if you use the following mirror for all 3 files (qcow, sha256sum, gpg signature)? If so, I would report that issue upstream.

MIRROR_URL='http://mirror.easyname.at/opensuse/tumbleweed/appliances/'

It is of course most important to get it to work at all, so great that it works without verification as a first step! 🎉
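
When comparing mirrors, it can help to check the digest of the downloaded image directly against the value in the .sha256 file fetched from the same place. A minimal sketch, assuming sha256sum from coreutils is available; verify_sha256 is a hypothetical helper, and the demo uses an empty file because its SHA-256 digest is well known:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex SHA-256).
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$2" ]
}

# Self-contained demo on an empty file.
sample=$(mktemp)
if verify_sha256 "$sample" e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855; then
  echo "checksum OK"
else
  echo "checksum mismatch"
fi
rm -f "$sample"
```

For the real image, the expected value would be the first field of the downloaded .sha256 file.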


mnencia avatar mnencia commented on May 25, 2024 1

I tried running kured using the manifest provided by @phaer, and it looks like it is working:

time="2022-02-06T19:04:44Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
time="2022-02-06T19:04:44Z" level=info msg="Kubernetes Reboot Daemon: 1.9.1"
time="2022-02-06T19:04:44Z" level=info msg="Node ID: k3s-control-plane-0"
time="2022-02-06T19:04:44Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2022-02-06T19:04:44Z" level=info msg="Lock TTL not set, lock will remain until being released"
time="2022-02-06T19:04:44Z" level=info msg="Lock release delay not set, lock will be released immediately after rebooting"
time="2022-02-06T19:04:44Z" level=info msg="PreferNoSchedule taint: "
time="2022-02-06T19:04:44Z" level=info msg="Blocking Pod Selectors: []"
time="2022-02-06T19:04:44Z" level=info msg="Reboot schedule: SunMonTueWedThuFriSat between 00:00 and 23:59 UTC"
time="2022-02-06T19:04:44Z" level=info msg="Reboot check command: [test -f /var/run/reboot-required] every 1h0m0s"
time="2022-02-06T19:04:44Z" level=info msg="Reboot command: [/usr/bin/systemctl reboot]"

I've checked and the transactional-update system looks already configured correctly:

k3s-control-plane-0:~ # cat /etc/transactional-update.conf
REBOOT_METHOD=kured
static:~ # systemctl status transactional-update.timer
● transactional-update.timer - Daily update of the system
     Loaded: loaded (/usr/lib/systemd/system/transactional-update.timer; enabled; vendor preset: enabled)
     Active: active (waiting) since Sun 2022-02-06 17:55:00 UTC; 1h 23min ago
    Trigger: Mon 2022-02-07 00:35:47 UTC; 5h 17min left
   Triggers: ● transactional-update.service
       Docs: man:transactional-update(8)

Feb 06 17:55:00 static systemd[1]: Started Daily update of the system.
k3s-control-plane-0:~ # cat /usr/lib/systemd/system/transactional-update.timer
[Unit]
Description=Daily update of the system
Documentation=man:transactional-update(8)
After=network.target local-fs.target

[Timer]
OnCalendar=daily
AccuracySec=1m
RandomizedDelaySec=2h
#Persistent=true

[Install]
WantedBy=timers.target


mnencia avatar mnencia commented on May 25, 2024 1

The recipe to build the kured.yaml file is here.

mkdir -p %{buildroot}%{_datadir}/k8s-yaml/kured
cat kured-rbac.yaml kured-ds.yaml > %{buildroot}%{_datadir}/k8s-yaml/kured/kured.yaml
chmod 644  %{buildroot}%{_datadir}/k8s-yaml/kured/kured.yaml
sed -i -e 's|image: .*|image: registry.opensuse.org/kubic/kured:%{version}|g' %{buildroot}%{_datadir}/k8s-yaml/kured/kured.yaml

It's the concatenation of kured-rbac.yaml and kured-ds.yaml, with the image repointed from docker.io/weaveworks/kured to registry.opensuse.org/kubic/kured.
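
The same recipe can be reproduced outside the RPM build. A minimal sketch, where build_kured_yaml is a hypothetical function name and two tiny stand-in files replace the real kured-rbac.yaml and kured-ds.yaml:

```shell
#!/bin/sh
# Sketch of the spec-file recipe as a standalone function.
# Arguments: RBAC manifest, DaemonSet manifest, version, output file.
build_kured_yaml() {
  cat "$1" "$2" |
    sed -e "s|image: .*|image: registry.opensuse.org/kubic/kured:$3|g" > "$4"
  chmod 644 "$4"
}

# Tiny stand-in manifests for illustration only.
work=$(mktemp -d)
printf 'kind: ClusterRole\n---\n' > "$work/rbac.yaml"
printf 'kind: DaemonSet\n          image: docker.io/weaveworks/kured:1.9.1\n' > "$work/ds.yaml"
build_kured_yaml "$work/rbac.yaml" "$work/ds.yaml" 1.9.1 "$work/kured.yaml"
grep 'image:' "$work/kured.yaml"
rm -rf "$work"
```

The sed expression rewrites every line containing "image: ", so it relies on the manifests having no other field that ends in that text.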


mnencia avatar mnencia commented on May 25, 2024 1

That is exactly the content of https://github.com/weaveworks/kured/releases/download/1.9.1/kured-1.9.1-dockerhub.yaml, except for the imagePullPolicy and image fields in the DaemonSet spec. Using the GitHub release file directly is fine, but if we want to have the same behavior as Kubic, we can easily accomplish it using kustomize.


phaer avatar phaer commented on May 25, 2024 1

Yes, either udevadm settle or just a partprobe /dev/sda (where the parameter is even optional) should solve that. I started working on reducing sleep calls in the provisioners locally as provisioning new machines takes quite a while already, so each sleep we can avoid is a win IMO.


mnencia avatar mnencia commented on May 25, 2024 1

One thing could be to reduce the reboot time by avoiding shutdown -r +1 and executing the following provisioner after the init script instead.

provisioner "local-exec" {
  command = "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -l root ${self.ipv4_address} '(sleep 2; reboot)&'"
}


mysticaltech avatar mysticaltech commented on May 25, 2024 1

Basically, everything now seems to check out. Let's just wait for a reboot to be confirmed without issue (the kured logs should show it), and then we will be good to go! It updates along with Tumbleweed as it's a superset of it, so that should happen anytime now! 🤞


mysticaltech avatar mysticaltech commented on May 25, 2024 1

Thanks, @mnencia! Now it's perfect 🙏


mysticaltech avatar mysticaltech commented on May 25, 2024 1

Ah, just understood @DimStar77 you approve requests, but @ggardet you created them! Please, we need fresher versions of k3s for this project, how can we help streamline this? Please see the above proposal. Thanks!


mnencia avatar mnencia commented on May 25, 2024 1

This is what happened on my control node:

time="2022-02-09T07:19:03Z" level=info msg="Reboot required"
time="2022-02-09T07:19:03Z" level=info msg="Acquired reboot lock"
time="2022-02-09T07:19:03Z" level=info msg="Draining node k3s-control-plane-0"
WARNING: ignoring DaemonSet-managed Pods: kube-system/hcloud-csi-node-6xvnz, kube-system/kured-8wklm
evicting pod kube-system/hcloud-cloud-controller-manager-fcc9fb55c-spqqt
time="2022-02-09T07:19:04Z" level=info msg="Running command: [/usr/bin/nsenter -m/proc/1/ns/mnt -- /usr/bin/systemctl reboot] for node: k3s-control-plane-0"
time="2022-02-09T07:19:04Z" level=warning msg="nsenter: can't execute '/usr/bin/systemctl reboot': No such file or directory" cmd=/usr/bin/nsenter std=err
time="2022-02-09T07:19:04Z" level=fatal msg="Error invoking reboot command: exit status 127"

It looks like it is trying to execute /usr/bin/systemctl reboot as a file.

That's weird, as https://github.com/weaveworks/kured/blob/main/cmd/kured/main.go#L666 should correctly split it into subparts.


mnencia avatar mnencia commented on May 25, 2024 1

The nsenter command seems to work when executed from inside the pod

/ # /usr/bin/nsenter -m/proc/1/ns/mnt -- /usr/bin/systemctl is-system-running
running


mnencia avatar mnencia commented on May 25, 2024 1

If you put quotes around the command, you get a similar error:

/ # /usr/bin/nsenter -m/proc/1/ns/mnt -- '/usr/bin/systemctl is-system-running'
nsenter: can't execute '/usr/bin/systemctl is-system-running': No such file or directory


phaer avatar phaer commented on May 25, 2024 1

Can't contribute anything to the reboot command atm, but

Something else we need to fix is SSH hardening. It seems that password auth is still active. And the ports that are hit, do not make sense to me, as those are supposed to be blocked by the firewall, just checked.

Disabling password auth would definitely be a good idea. But those ports seem harmless. The logged ones are source ports, so the first log line means that 221.131.165.75 is connecting from port 56702 to our local port 22 (implicitly, because that's the only one our sshd listens on).


mnencia avatar mnencia commented on May 25, 2024 1

Regarding the restart of control-plane nodes, the main issue here is that there are only two nodes. Etcd does not work well with only two nodes: if one restarts, the other loses quorum anyway, so the cluster is unusable until the restarted node is back. There are also other issues:

  • The kubeconfig.yaml file only points to k3s-control-plane-0, so if that node dies for any reason, you have to repoint it to the second node.
  • In Terraform, the k3s-control-plane-0 node is special: if you need to reinstall it (for example because of disk corruption), it is really difficult to replace.
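
The quorum point above can be checked with a little arithmetic: a majority quorum for N voting members is floor(N/2) + 1. A minimal sketch with two hypothetical helper functions:

```shell
#!/bin/sh
# Majority quorum for a cluster of N voting members: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can be down while a majority is still reachable.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With two members the quorum is two, so no failure is tolerated; three members is the smallest size that survives one node restarting.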


mnencia avatar mnencia commented on May 25, 2024 1

I have a cluster with 3 masters, let's see how it reacts to the restart.


mysticaltech avatar mysticaltech commented on May 25, 2024

@sysrich I saw your article here https://rootco.de/2020-12-09-microos-pi-network-monitor/. I know it does not explain how to run MicroOS on Hetzner, but it says that Hetzner has included the ISO, unfortunately running hcloud iso list does not show it anymore. I am left with thinking about doing something like this in rescue mode:

export MICROOS_DISK="https://downloadcontent.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-ContainerHost-SelfInstall.raw.xz"
curl -sL "$MICROOS_DISK" | xz -d | dd of=/dev/sda status=progress
  • I will try it out, but wanted to get your opinion if possible, do you think it will work?
  • After that, I just plan to issue a shutdown -r now command to reboot. Do you think networking will work OOTB?
  • And does MicroOS support Hetzner user-data?
  • If not, I can mount the disk just after dd finishes, but where do I embed the ignition or combustion file? Basically, I want to install my SSH public key and k3s, or at least the former, if I can install k3s via SSH later on.

Last but not least, if you have done this before or have any user-data example or general guidance, it would be very much appreciated.

I am trying to move away from k3os as soon as possible, and I know that MicroOS is a much better solution (thanks to your hard work in part, I would imagine)! 🙏


mysticaltech avatar mysticaltech commented on May 25, 2024

Oh wow, will try ASAP @mnencia! 🙏


mysticaltech avatar mysticaltech commented on May 25, 2024

Awesome! Did you create a folder in /boot/writable called ignition, and copy your generated config.ign to it? See https://en.opensuse.org/Portal:MicroOS/Ignition and https://en.opensuse.org/Portal:MicroOS/Design.

That's the format of mine for instance:
{"ignition":{"version":"3.0.0"},"passwd":{"users":[{"name":"root","sshAuthorizedKeys":["ssh-ed25519 ___ [email protected]"]}]}}
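
A config in that format can be generated from a public key with a one-line template. This is only a sketch: make_ignition is a hypothetical helper, the key shown is a placeholder, and it assumes the key contains no double quotes or backslashes, since nothing is escaped.

```shell
#!/bin/sh
# Hypothetical helper: emit an Ignition config in the format shown above,
# authorizing one SSH public key for root.
# Assumes the key contains no double quotes or backslashes.
make_ignition() {
  printf '{"ignition":{"version":"3.0.0"},"passwd":{"users":[{"name":"root","sshAuthorizedKeys":["%s"]}]}}\n' "$1"
}

# Placeholder key for illustration only.
make_ignition 'ssh-ed25519 AAAAEXAMPLE user@example'
```

Redirecting the output to config.ign gives the file that the provisioning steps copy into the ignition directory.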

Also, with the Hetzner console, you can basically "plug a virtual monitor into the server" and watch the boot process, which will tell you whether the Ignition config is picked up and what errors are generated. If it goes too fast, just use screen recording software.

[screenshot: ksnip_20220204-105333]


mysticaltech avatar mysticaltech commented on May 25, 2024

Also, if you are going to retry, do not forget to resize the btrfs after dd, so it takes the whole space left on the disk.

Last but not least, this image is probably best, because it has k3s installed in it already, we just need to see how it's done and how it updates, with transactional updates too or not. If it uses transactional updates then great, if not we can just install the k3s tumbleweed package in the vanilla MicroOS (above), and have it update in a transactional manner along with other OS updates.

https://mirrorcache.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-16.0.0-k3s-SelfInstall-Snapshot20220201.raw.xz

Source: https://mirrorcache.opensuse.org/tumbleweed/appliances/


mysticaltech avatar mysticaltech commented on May 25, 2024

@mnencia About cloud-init, it could work and is worth a shot too. My initial attempts failed, but I did not try everything.

About ignition, another thing worth a shot is creating ignition/config.ign on the root of an attached volume!


mysticaltech avatar mysticaltech commented on May 25, 2024

The docs are not great, but https://en.opensuse.org/Portal:MicroOS/Design reveals the proper ignition location! The above should work 🤞
[screenshot: ksnip_20220204-111717]


mysticaltech avatar mysticaltech commented on May 25, 2024

Thank you @sysrich, really appreciate the guidance! 🙏 I obviously had completely missed that! 🤦

Will attempt it with an attached volume. @mnencia If you are going to try this before me, just wanted to share this new finding of mine, the hcloud cli has an add-label command that could be handy here (if you didn't know already).

[screenshot: ksnip_20220204-120030]


mysticaltech avatar mysticaltech commented on May 25, 2024

@mnencia Great find, and well done. If we can find the Default raw image for MicroOS based on Tumbleweed that works, that's it!

Maybe the ignition logic for these is located somewhere else? And does not support volumes.

@phaer Please let us know if you try with partitioning and it works with the k3s self install image; that would be ideal!

Just FYI folks, when I ran dd myself yesterday, I had to run parted -l to correct a size mismatch error that was showing up in fdisk -l /dev/sda.

Also, I believe it's important to resize the btrfs to max. Saying all this because it could perhaps influence the ignition seeking mechanisms on Tumbleweed based images?!

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Wonderful! You made my day @mnencia! :-) Well done everyone. Now serious business can begin!

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Beautiful! Yes, image verification is not a big issue for now.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Why not just do this without mirrors folks? Just use https://download.opensuse.org/opensuse/tumbleweed/appliances/. That's a personal preference at least.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Ok I see, yeah makes sense!

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Turns out we can simplify even more just by using the metalink and aria2 (thank you get.opensuse.org, I wish we had found that before).

It takes care of mirrors and even does automatic checksum checks, and corrections if I understand correctly. Basically, it just gets the job done without headaches, just by issuing:

aria2c https://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-kvm-and-xen.qcow2.meta4

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Ok, that should work:

[
      "set -ex",
      "apt install -y aria2",
      "aria2c https://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4",
      "qemu-img convert -p -f qcow2 -O host_device $(ls -a | grep MicroOS | grep -v meta4) /dev/sda",
      "sgdisk -e /dev/sda",
      "partprobe /dev/sda",
      "parted -s /dev/sda resizepart 4 99%",
      "parted -s /dev/sda mkpart primary ext2 99% 100%",
      "mount /dev/sda4 /mnt/ && btrfs filesystem resize max /mnt && umount /mnt",
      "mke2fs -L ignition /dev/sda5",
      "mount /dev/sda5 /mnt",
      "mkdir /mnt/ignition",
      "cp /root/config.ign /mnt/ignition/config.ign",
      "umount /mnt",
      "shutdown -r +1",
      "sleep 1",
      "exit 0"
]

from terraform-hcloud-kube-hetzner.

mnencia avatar mnencia commented on May 25, 2024

Wouldn't it be better to use the -o option to set a fixed destination name?

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

No, unfortunately, it doesn't work for metalinks and torrents, because the name is dynamic.

But anyways, got it working and it's beautiful, thank you so much for the amazing work today, that is art! :) Will push a staging branch as soon as I have something solid.

ksnip_20220205-003632

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Love the --follow-metalink=mem definitely cleaner! 🙏

About the temp dir, I just feel it's overkill because we are on the rescue env anyway. But to make it safer, I made the grep a lot more selective with a proper regex that matches the full thing, without room for doubt!

ksnip_20220205-010058
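For anyone reading later, here is a sketch of that stricter selection, using the same pattern as the final config posted in this thread. The wrapper find_microos_image is hypothetical, and failing when there is not exactly one match is my own addition:

```shell
# Hypothetical helper: print the single downloaded MicroOS k3s qcow2 image in a directory.
# Fails if zero or several files match, so qemu-img never gets an ambiguous argument.
find_microos_image() (
  dir=${1:-.}
  matches=$(ls -a "$dir" | grep -ie '^opensuse.*microos.*k3s.*qcow2$')
  # Count non-empty lines; an empty result counts as 0.
  count=$(printf '%s\n' "$matches" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one image in $dir, found $count" >&2
    exit 1
  fi
  printf '%s\n' "$matches"
)
```

The .meta4 file is skipped because the pattern is anchored on the qcow2 suffix.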

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Have created a new branch called staging and have pushed the initial work in there.

Please note that agents.tf, servers.tf, and output.tf are stored in the temp folder for the moment. I am adding to the root folder only the necessary files needed to reach the next step.

Please do not hesitate to submit PR there.

For now, the master instance is created, we are able to access it with SSH, and k3s is there so this is great.

But we have to find a way to configure eth1 because it comes DOWN. If you have any ideas, please shoot. I believe we may need to add a simple combustion file, in the extra partition, renaming the label to "combustion" (they allow both ignition and combustion, so that is great).

ksnip_20220205-012116

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

So ip link set eth1 up does turn the interface UP. But no DHCP happens, and I do not know how to provoke it. Maybe we have to install new packages?! Seems unlikely.

Also wicked ifup all returns the following, without eth1, as if it does not have it in its config!

ksnip_20220205-024632

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Indeed, it's not configured. That article, I believe, says how to copy the configuration file. Apparently /etc is writable, so the config should persist; no need for "combustion" for now.

ksnip_20220205-024855
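A rough sketch of what such a config file could look like. The /etc/sysconfig/network path and the BOOTPROTO/STARTMODE keys are the usual openSUSE ifcfg conventions, but I have not confirmed them against this image, and write_ifcfg_dhcp is a made-up helper name:

```shell
# Hypothetical helper: write a DHCP ifcfg file for one interface.
# $1 = interface name, $2 = config directory (assumed default: /etc/sysconfig/network)
write_ifcfg_dhcp() (
  iface=$1
  conf_dir=${2:-/etc/sysconfig/network}
  # BOOTPROTO='dhcp' requests a lease; STARTMODE='auto' brings the interface up at boot.
  printf "BOOTPROTO='dhcp'\nSTARTMODE='auto'\n" > "$conf_dir/ifcfg-$iface"
)
```

After write_ifcfg_dhcp eth1, running wicked ifup eth1 should pick the interface up, assuming wicked is the network manager in use.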

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Wonderful, thank you @mnencia, simple and elegant solution! 🙏

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Yes, now corrected! Thanks for catching @phaer :)

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Great find folks, so that basically confirms that MicroOS, and Kubic do create the default "indicator" file /var/run/reboot-required when it needs to reboot. And Kured picks this up, and issues the standard systemctl reboot command when it's ready. They kept it simple! :)

Will just go with the Github release directly for now, as I do not see the default behavior changing anytime soon.
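To check by hand whether a node is waiting for a reboot, the same sentinel file can be tested from a shell. This is only an illustration of the mechanism described above, not how Kured is implemented; reboot_required is a hypothetical name:

```shell
# Hypothetical helper: succeed if the reboot sentinel file exists.
# $1 = sentinel path (defaults to the file mentioned above)
reboot_required() {
  [ -f "${1:-/var/run/reboot-required}" ]
}
```

For example: if reboot_required; then echo "reboot pending"; fi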

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Yes, that makes sense, on it @mnencia! 🙏

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Probably they did the patch because before we could not specify a custom reboot-command for kured, but now we can, so just did a "kustomization" and added the right command (still using the official kured from github).

Let's give it a shot, normally it should work! 🤞

ksnip_20220207-075544

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

About the initial partitioning: it was not working consistently. It turns out the disk just needed some "breathing" time before mounting, so I added sleep 5 && fdisk -l /dev/sda in between, and that seems to have solved the problem.

Screenshots of the issues that were happening:
ksnip_20220207-082239
ksnip_20220207-082205
ksnip_20220207-082147
ksnip_20220207-081920
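An alternative to a fixed sleep, which I have not tried on Hetzner, would be to poll until the device node shows up. A sketch with a hypothetical wait_for_path helper and an assumed default timeout of 10 seconds:

```shell
# Hypothetical helper: wait until a path exists, checking once per second.
# $1 = path to wait for (e.g. a partition device node), $2 = timeout in seconds
wait_for_path() (
  path=$1
  timeout=${2:-10}
  elapsed=0
  while [ ! -e "$path" ]; do
    # Give up once the timeout is reached.
    [ "$elapsed" -ge "$timeout" ] && exit 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
)
```

In the partitioning script this could read wait_for_path /dev/sda5 30 before the mke2fs step.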

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Here's the final working config of the initial disk partitioning for ignition (for others looking at this later on):

[
    "set -ex",
    "apt-get install -y aria2",
    "aria2c --follow-metalink=mem https://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4",
    "qemu-img convert -p -f qcow2 -O host_device $(ls -a | grep -ie '^opensuse.*microos.*k3s.*qcow2$') /dev/sda",
    "sgdisk -e /dev/sda",
    "partprobe /dev/sda",
    "parted -s /dev/sda resizepart 4 99%",
    "parted -s /dev/sda mkpart primary ext2 99% 100%",
    "sleep 5 && fdisk -l /dev/sda",
    "mount /dev/sda4 /mnt/ && btrfs filesystem resize max /mnt && umount /mnt",
    "mke2fs -L ignition /dev/sda5",
    "mount /dev/sda5 /mnt",
    "mkdir /mnt/ignition",
    "cp /root/config.ign /mnt/ignition/config.ign",
    "umount /mnt",
    "shutdown -r +1",
    "sleep 1",
    "exit 0"
  ]

from terraform-hcloud-kube-hetzner.

mnencia avatar mnencia commented on May 25, 2024

Pull request to reduce reboot time #41

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Perfect, have replaced udevadm settle with what is needed.

About the shutdown -r +1, maybe you understood already: Terraform needs to hear back a success, which is why we schedule the reboot in the future with the above command and then exit 0. Unfortunately, the shutdown command only supports minutes.

I'm not sure I understand your command above, @mnencia, especially that part '(sleep 2; reboot)&, could you please explain?

Anyways, will push my changes now and you folks can enhance.
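As far as I understand the (sleep 2; reboot)& idiom, it starts a subshell in the background that waits and then runs the command, so the calling script can return right away. A generic sketch, where run_later is a hypothetical wrapper and the output redirection is my own assumption to keep the SSH session from waiting on open file descriptors:

```shell
# Hypothetical helper: run a command after a delay without blocking the caller.
# $1 = delay in seconds, remaining arguments = command to run
run_later() {
  delay=$1
  shift
  # The subshell is sent to the background with "&"; the caller continues immediately.
  ( sleep "$delay"; "$@" ) >/dev/null 2>&1 &
}
```

In a provisioner this would be run_later 2 reboot followed by exit 0, which gives second-level granularity instead of the one-minute minimum of shutdown -r +1.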

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Oh wow, beautiful, thanks for that! That's some dark bash magic haha.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Just pushed the kured logic and the udevadm settle suggestion.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Just noticed that there's something up with the load-balancer, as if the disable: servicelb in the k3s config.yaml did not take effect (which would be worrisome because it could imply that the whole config file is maybe not read).

Will investigate that later tonight, but as always, don't hesitate to submit PRs! 🙏

ksnip_20220207-095147

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Actually, now that I am seeing it, it would be interesting to make the servicelb an option, what do you think?

ksnip_20220207-095932

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

By the way, I had to use the config.yaml route because putting the options in the default SERVER_OPTS and AGENTS_OPTS variables (as proposed in the conf files of the k3s package, which are also present on the system at /etc/rancher/k3s) did not work, especially for SERVER_OPTS.

As if the list was too long and it was getting truncated. Basically, they were not parsed correctly and k3s would not start. So left these env variables empty and added a config.yaml file like mentioned here https://rancher.com/docs/k3s/latest/en/installation/install-options/#configuration-file.
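For reference, a minimal sketch of such a file, written from a shell helper. Only the disable: servicelb entry comes from this thread; write_k3s_config is a hypothetical name, and the real config in the staging branch contains more keys:

```shell
# Hypothetical helper: write a minimal k3s config file.
# $1 = target path (defaults to the standard k3s location)
write_k3s_config() (
  conf=${1:-/etc/rancher/k3s/config.yaml}
  mkdir -p "$(dirname "$conf")" || exit 1
  cat > "$conf" <<'EOF'
# Each key mirrors a k3s CLI flag without the leading dashes.
disable:
  - servicelb
EOF
)
```

Keeping the options in a file avoids the quoting and length problems seen with the environment variables.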

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Ah yes, just remembered: the config.yaml does get picked up for sure, because otherwise the servers, i.e. the additional control planes, would not have joined the cluster. See https://github.com/kube-hetzner/kube-hetzner/blob/staging/templates/server_config.yaml.tpl

Someone probably needs to dive down the k3s logs using journalctl -u k3s-server (or journalctl -u k3s-agent on agent nodes), and see why the service lb has not been disabled successfully. I can do this tonight.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Fantastic! Let's continue to test this folks, and if all checks out, we merge it to master and will expose a k3os branch before for those that still want it.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

In the meantime, that's a great presentation of MicroOS by @sysrich. We are literally standing on the shoulders of giants, it's just amazing engineering! 🤯 🙏

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Hey folks, something I noticed is that the k3s package releases are very rare (the last one was 3 months ago), this is not ideal, it seems safer to have it follow the k3s official release schedule.

I tried to open a discussion with the lead maintainer (in the discussion section here) and asked how we could help.

It's easy to register an account on https://build.opensuse.org/ and with it, we can clone the repo with: osc -A https://api.opensuse.org checkout openSUSE:Factory/k3s

Also seems easy to upgrade the packages:

  • Replace the tar.gz file (the new one can probably be downloaded from the k3s releases page; it seems to be the raw source).
  • Replace the version everywhere with the latest tag on GitHub, today it's 1.23.3+k3s1 (just stripping the leading "v").
  • Update the doc and release notes, by copying them from k3s.
  • And push with osc up according to osc's readme.
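The version handling in the steps above can be sketched in shell. Both helper names are hypothetical, and the "+" to "-" replacement simply mirrors the %setup line of the package's spec file, so treat it as an assumption about how the tarball unpacks:

```shell
# Hypothetical helper: turn a k3s Git tag into the package version by dropping the leading "v".
tag_to_version() {
  printf '%s\n' "${1#v}"
}

# Hypothetical helper: directory name the source tarball is expected to unpack to,
# following the spec file's use of tr '+' '-' on the version.
version_to_srcdir() {
  printf 'k3s-%s\n' "$(printf '%s' "$1" | tr '+' '-')"
}
```

For example, tag_to_version v1.23.3+k3s1 gives the string to put in the Version: field.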

I believe that would end up creating requests here https://build.opensuse.org/package/requests/openSUSE:Factory/k3s, and make the maintainers' life easier, they probably could test it and approve it.

We should try I guess. What do you think?

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

@sysrich & @ibuildthecloud, As SUSE is now k3s's main patron and maintainer, IMHO, it would make sense that the k3s openSUSE:Factory package is always streamlined and synced with the official stable k3s releases. This could also probably be automated fairly easily via a CD system.

Please, folks, little projects like ours depend on a stable flow of k3s releases!

If we can help, just tell us what to do! Thanks and keep up the good work 🙏 ✨

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Thank you @phaer, for the nice reorganization of the k3s configs in terraform, without templates! Love it, it looks great and easier to understand :)

ksnip_20220208-162155

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

@DimStar77 Finally found you! :) One of your packages, the k3s one is essential to our project! Thank you for your good work 🙏

Could we help you maintain fresher versions of k3s, ideally we could follow minor releases from https://github.com/k3s-io/k3s/releases? Would the osc flow proposed at #35 (comment) work to submit requests?

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Turns out some kind of CD is already in place, this is great news folks! That's a screenshot of the .osc/_service file in the openSUSE k3s package.

ksnip_20220208-231040

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

The build is already almost completely automated.

#
# spec file for package k3s
#
# Copyright (c) 2021 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


# To workaround https://github.com/rancher/k3s/issues/231, build kubectl
%define build_kubectl 1

# baseversion - version of kubernetes for this package
%define baseversion 1.20

Name:           k3s
Version:        1.22.3+k3s1
Release:        0
Summary:        A container orchestration system based on a reduced Kubernetes feature set
License:        Apache-2.0
Group:          System/Management
URL:            https://k3s.io
Source0:        https://github.com/k3s-io/k3s/archive/v%{version}.tar.gz#/%{name}-%{version}.tar.gz
Source1:        k3s-server.service
Source2:        k3s-agent.service
Source3:        server.conf
Source4:        agent.conf
BuildRequires:  c_compiler
BuildRequires:  golang-packaging
BuildRequires:  pkgconfig
BuildRequires:  systemd-rpm-macros
BuildRequires:  golang(API) >= 1.16
BuildRequires:  pkgconfig(sqlite3)
%ifarch aarch64
BuildRequires:  binutils-gold
%endif
Requires:       cni-plugins
Requires:       conntrack-tools
Requires:       containerd
Requires:       iptables
Requires:       runc
# Conflicts:      cri-tools
Conflicts:      kubectl
Conflicts:      kubernetes-client
Conflicts:      kubernetes-client-provider
Requires(post): update-alternatives
Requires(postun):update-alternatives
%{?systemd_requires}

%description
k3s is a container orchestration system for automating application
deployment, scaling, and management. It is a Kubernetes-compliant
distribution that differs from the original Kubernetes (colloquially
"k8s") in that:

  * Legacy, alpha, or non-default features are removed.
  * Most in-tree plugins (cloud providers and storage plugins) were
    removed, since they can be replaced with out-of-tree addons.
  * sqlite3 is the default storage mechanism.
    etcd3 is still available, but not the default.
  * There is a new launcher that handles a lot of the complexity of
    TLS and options.

%prep
%setup -q -n %{name}-%(echo %{version} | tr '+' '-')
sed -i 's#exec.LookPath("host-local")#exec.LookPath("%{_libexecdir}/cni/host-local")#' pkg/agent/config/config.go

%build
%{goprep} github.com/rancher/k3s
%{gobuild}
%if %{build_kubectl}
%{gobuild} ./cmd/kubectl
%endif

%install
%{goinstall}
%if %{build_kubectl}
%{goinstall} ./cmd/kubectl
mv -v %{buildroot}%{_bindir}/kubectl %{buildroot}%{_bindir}/kubectl%{baseversion}
%endif

# Install symlinks
pushd %{buildroot}%{_bindir}
%if !%{build_kubectl}
ln -s k3s kubectl%{baseversion}
%endif
# ln -s k3s crictl
popd

mkdir -p %{buildroot}%{_localstatedir}/lib/rancher/k3s
mkdir -p %{buildroot}%{_localstatedir}/lib/rancher/k3s/server/manifests

mkdir -p %{buildroot}%{_sysconfdir}/rancher/k3s
install -D -m 644 %{SOURCE3} %{buildroot}%{_sysconfdir}/rancher/k3s/server.conf
install -D -m 644 %{SOURCE4} %{buildroot}%{_sysconfdir}/rancher/k3s/agent.conf

mkdir -p %{buildroot}%{_unitdir}
install -D -m 644 %{SOURCE1} %{buildroot}%{_unitdir}/k3s-server.service
install -D -m 644 %{SOURCE2} %{buildroot}%{_unitdir}/k3s-agent.service
mkdir -p %{buildroot}%{_sbindir}
ln -s %{_sbindir}/service %{buildroot}%{_sbindir}/rck3s-server
ln -s %{_sbindir}/service %{buildroot}%{_sbindir}/rck3s-agent

# alternatives
ln -s -f %{_sysconfdir}/alternatives/kubectl %{buildroot}%{_bindir}/kubectl

%pre
%service_add_pre k3s-server.service k3s-agent.service

%post
export baseversion="%{baseversion}"
%{_sbindir}/update-alternatives \
  --install %{_bindir}/kubectl kubectl %{_bindir}/kubectl%{baseversion} ${baseversion/./}

%service_add_post k3s-server.service k3s-agent.service

%preun
%service_del_preun k3s-server.service k3s-agent.service

%postun
if [ ! -f %{_bindir}/kubectl%{baseversion} ] ; then
  update-alternatives --remove kubectl %{_bindir}/kubectl%{baseversion}
fi

%service_del_postun k3s-server.service k3s-agent.service

%files
%license LICENSE
%doc README.md
# %{_bindir}/crictl
%{_bindir}/k3s
%{_bindir}/kubectl
%{_bindir}/kubectl%{baseversion}
%{_localstatedir}/lib/rancher
%config %{_sysconfdir}/rancher
%config(noreplace) %{_sysconfdir}/rancher/k3s/agent.conf
%config(noreplace) %{_sysconfdir}/rancher/k3s/server.conf
%{_unitdir}/k3s-agent.service
%{_unitdir}/k3s-server.service
%{_sbindir}/rck3s-agent
%{_sbindir}/rck3s-server
%ghost %_sysconfdir/alternatives/kubectl

%changelog

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

More good news team, someone was kind enough to reply and explain how to do this! 🍾

ksnip_20220208-233321

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

So this is the proper package to work with: https://build.opensuse.org/package/show/devel:kubic/k3s

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Was able to execute some of the commands, it re-packaged v1.23.3+k3s1, but the test build failed on my machine.

Some weird errors about failed DNS requests to proxy.golang.org (see screenshot). I tried to disable the proxy but was not successful yet. I suspect that my environment itself is not compatible, as I am on Fedora and that build process is ideally tuned for Tumbleweed machines. So maybe I will try from a live openSUSE USB at some point.

Anyways, there is a path forward to try to speed up releases, now we know what the flow is, and how to submit upgrade requests ourselves.

ksnip_20220209-011238

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

About node upgrades: one came through, and /var/run/reboot-required showed up as planned. Also, kured uncordoned a few nodes, but I believe an error happens before that, because the pod itself restarts but not the node.

Have replaced the image with the one from openSUSE as initially suggested by @mnencia, this one registry.opensuse.org/kubic/kured:1.9.1, and will report back tomorrow if the node rebooted or not.

ksnip_20220209-025639

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Something else we need to fix is SSH hardening. It seems that password auth is still active. And the ports that are hit, do not make sense to me, as those are supposed to be blocked by the firewall, just checked.

ksnip_20220209-024947

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

@mnencia Good news, the registry.opensuse.org/kubic/kured:1.9.1 worked in rebooting my node, because we do not have to enter the reboot command manually, which apparently seems tricky! Should have listened to you since the beginning :)

ksnip_20220209-101221

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

Just pushed a new kured.yaml, created from the file you dumped on Kubic, @phaer; it apparently works in rebooting the nodes. You can pull it on staging.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

However, it rendered my control plane inaccessible. So I guess Kured is more for worker nodes; I also saw that in the docs yesterday. I will comb the k3s logs for more details.

ksnip_20220209-101917

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

@phaer About the SSH ports, thanks for explaining! Definitely we can remove password auth through ignition normally.
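Whatever mechanism ends up writing it (ignition or otherwise), the sshd setting itself is a one-liner. A sketch that writes it as a drop-in file; the drop-in path is an assumption, since I have not checked whether the MicroOS sshd_config includes sshd_config.d, and disable_ssh_password_auth is a hypothetical name:

```shell
# Hypothetical helper: write an sshd drop-in that turns off password logins.
# $1 = target file (assumed default path; verify that sshd_config includes this directory)
disable_ssh_password_auth() (
  conf=${1:-/etc/ssh/sshd_config.d/10-no-password.conf}
  mkdir -p "$(dirname "$conf")" || exit 1
  printf 'PasswordAuthentication no\n' > "$conf"
)
```

sshd needs a reload afterwards for the change to apply.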

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

FYI, this is from when my first_control_plane stopped, "correctly" it seems, to reboot. I am accessing these logs through journalctl -u k3s-server | less; if you know a better way, please let me know.

ksnip_20220209-103623

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

And then there is this error, again and again! I also tried to re-run k3s-server without the cluster-init:true param in the config, but to no avail; same thing.

ksnip_20220209-104119

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

My guess is that we first ban control-plane nodes from kured, make sure it works correctly on the workers, and then, see what can be done with the former.

from terraform-hcloud-kube-hetzner.

mysticaltech avatar mysticaltech commented on May 25, 2024

If we exhaust all avenues, maybe we could go back to https://github.com/rancher/system-upgrade-controller, like used in k3os!

from terraform-hcloud-kube-hetzner.
