Comments (6)

billimek commented on May 25, 2024

Updates are currently being handled from a base host and 'pushing' new config to the k3s nodes, for example:

for node in f g h; NIX_SSHOPTS="-A" nixos-rebuild switch --flake .#k3s-$node --target-host nix@k3s-$node --use-remote-sudo; end

This, in conjunction with a scheduled reboot checker and associated /usr/bin symlinks, ensures that kured will properly drain and reboot the nodes when required.
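
A scheduled reboot checker of the kind mentioned above might look roughly like the following NixOS module. This is only a sketch under assumptions, not the configuration actually used: the unit name `reboot-required-check` and the hourly schedule are made up for illustration, and it assumes kured is watching its default sentinel file, /var/run/reboot-required. The /usr/bin symlinks are not covered here.

```nix
{ ... }:
{
  # Hypothetical unit name; the real checker from this setup is not shown in the thread.
  systemd.services.reboot-required-check = {
    description = "Flag the node for reboot when the booted kernel is stale";
    serviceConfig.Type = "oneshot";
    script = ''
      # Compare the kernel that was booted with the kernel of the active generation.
      booted="$(readlink /run/booted-system/kernel)"
      current="$(readlink /run/current-system/kernel)"
      if [ "$booted" != "$current" ]; then
        # Assumption: kured uses its default sentinel path.
        touch /var/run/reboot-required
      fi
    '';
  };

  # A timer with the same name triggers the service above; the schedule is an assumption.
  systemd.timers.reboot-required-check = {
    wantedBy = [ "timers.target" ];
    timerConfig.OnCalendar = "hourly";
  };
}
```

The check only compares kernels; a fuller version would probably also compare the initrd and kernel modules.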

from k8s-gitops.

billimek commented on May 25, 2024

Using an Intel N100 'T9 Plus' mini PC from China as the node.

Installing NixOS via the USB-based installer. Most of this is documented here, so I won't repeat it in this write-up.

Using Nix to install and configure k3s via this configuration.
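
For readers without access to the linked configuration, a minimal k3s module could look something like the sketch below. It is not the linked configuration: the agent role, the server URL, and the token path are all assumptions chosen for illustration.

```nix
{ ... }:
{
  services.k3s = {
    enable = true;
    # Assumption: this node joins an existing cluster as an agent.
    role = "agent";
    # Hypothetical server address; replace with the real control-plane URL.
    serverAddr = "https://k3s-server.example:6443";
    # Hypothetical path to a file holding the cluster join token.
    tokenFile = "/var/lib/k3s/token";
  };
}
```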

billimek commented on May 25, 2024

Issues encountered and resolutions:

| Issue | Resolution |
|---|---|
| Password problems | This is more of a NixOS issue, but when initially creating a user, the password was being 'unset', so access was only possible via SSH with the associated SSH keys. It wasn't possible to log in at the physical terminal. The resolution was to make the user 'mutable', set a default password, and then set the real password (using passwd) at the time of initial bootstrapping (ref) |
| Swap device exists | swapDevices = lib.mkForce [ ]; doesn't seem to work as expected (ref). Following additional documentation here and making the GPT partition not automount fixed the issue |
| VLAN configuration | Intended to have the host configure itself to use a VLAN and not require switch configuration. Spent a lot of time trying different configurations. Ultimately was not successful in having the host set the VLAN and needed to configure this on the switch instead. Will likely revisit this in a more controlled environment like a VM |
| ceph rbd issues | csi-cephfsplugin pods were crashlooping. Eventually determined the issue to be the lack of the rbd kernel module. Resolved by configuring boot.kernelModules = [ "kvm-intel" "rbd" ]; (ref) |
| no logs | Oddly, couldn't view any logs from pods running on this node (e.g. kubectl logs -f <some pod>). No issues with pods running on other nodes. Also saw entries in dmesg output suggesting that a firewall was blocking some traffic. Eventually disabled the firewall (via networking.firewall.enable = false;) and this issue was resolved (ref) |
| NFS issues | Workloads requiring NFS access (e.g. plex) complained about mounting an NFS volume. Investigation led me to understand that rpcbind is required. Resolved by setting services.rpcbind.enable = true (ref) |
| system-upgrade-controller | system-upgrade-controller was failing to operate properly on the new node. Determined that when Nix is controlling the k3s installation (as is the case here), it prevents the binary from being modified. As system-upgrade-controller is trying to do an in-place replacement of the k3s binary on the host, this fails. Resolution was to switch to the 'unstable' nix branch for k3s so that the k3s version will (currently) match the version that system-upgrade-controller handles on the other nodes (ref). Long-term, this probably needs to be handled consistently. Two options are: 1. use NixOS for all nodes, or 2. install and manage k3s out-of-band from Nix |
| no automated OS updates | Ubuntu can do auto upgrades for security vulnerabilities, and it even flags the node for reboot when required via kured. NixOS can do this too via system.autoUpgrade, but I need to understand how to properly configure it and, if possible, support kured for safe reboots as well. (ref) Resolved partly via this and this |
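
The NixOS-side resolutions listed above (password handling, the rbd kernel module, the firewall, and rpcbind) could be collected into one module along these lines. This is a sketch, not the actual configuration: the `nix` user name is taken from the SSH target used for updates, while the placeholder password and the `wheel` group membership are assumptions.

```nix
{ ... }:
{
  # Password problems: keep users mutable so passwd can set the real password later.
  users.mutableUsers = true;
  users.users.nix = {
    isNormalUser = true;
    # Assumption: sudo access through the wheel group.
    extraGroups = [ "wheel" ];
    # Placeholder only; change it with passwd during initial bootstrapping.
    initialPassword = "changeme";
  };

  # ceph rbd issues: load the rbd kernel module at boot.
  boot.kernelModules = [ "kvm-intel" "rbd" ];

  # no logs: the firewall was blocking traffic, so it was disabled entirely.
  networking.firewall.enable = false;

  # NFS issues: NFS mounts from workloads need rpcbind.
  services.rpcbind.enable = true;
}
```

Disabling the firewall outright is what resolved the problem in this case; opening only the required ports would be a narrower alternative, but which ports those are is not covered in this thread.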

billimek commented on May 25, 2024

This is going well so far. Pretty easily added two more nodes running NixOS.

However, the issue of k3s upgrades became apparent again when I discovered that system-upgrade-controller had upgraded the rest of the cluster to v1.27.4+k3s1 while the NixOS nodes were still running v1.27.3+k3s1. I was able to run nixos-rebuild switch to get them upgraded, but it would be better if this were more automated.
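
One way to take k3s from the unstable branch while leaving the rest of the system alone is sketched below. It rests on assumptions that are not confirmed by this thread: a flake input named `nixpkgs-unstable` and an `inputs` argument passed to modules through `specialArgs` are both hypothetical.

```nix
{ pkgs, inputs, ... }:
let
  # Assumption: the flake has an input called nixpkgs-unstable, exposed to
  # modules via specialArgs = { inherit inputs; }.
  unstable = import inputs.nixpkgs-unstable {
    system = pkgs.stdenv.hostPlatform.system;
  };
in
{
  # Only the k3s package comes from unstable; everything else stays on the default nixpkgs.
  services.k3s.package = unstable.k3s;
}
```

The k3s version still only changes when the flake inputs are updated and nixos-rebuild switch is run, so this does not remove the version drift described above by itself.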

I still don't have a good solution for automating nixos-rebuild runs, especially with secrets involved. Still pondering.
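
For reference, the built-in system.autoUpgrade module can rebuild from a flake on a schedule. The sketch below is a possible starting point, not a solution to the secrets problem: the flake reference and the schedule are hypothetical, and it assumes reboots should stay with kured.

```nix
{ ... }:
{
  system.autoUpgrade = {
    enable = true;
    # Hypothetical flake reference; point this at the real repository and host attribute.
    flake = "github:example/nixos-config#k3s-f";
    # Assumption: a daily rebuild at 04:00 is acceptable.
    dates = "04:00";
    # Leave reboots to kured so that nodes are drained first.
    allowReboot = false;
  };
}
```

With this approach each node pulls its own configuration instead of having it pushed from a base host, so any secrets the build needs would have to be available on the node itself.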
