
k8s-gitops's Introduction

GitOps Workflow for Kubernetes Cluster

Kubernetes

📖  Overview

Leverage Flux2 to automate cluster state using code residing in this repo

💻  Infrastructure

See the k3s setup in the homelab-infrastructure repo for more detail about hardware and infrastructure

⚙️  Setup

See setup for more detail about bootstrapping a new cluster

🔧  Workloads (by namespace)

🤖  Automation

  • Renovate keeps workloads up-to-date by scanning the repo and opening pull requests when it detects a new container image or helm chart version
  • Kured automatically drains & reboots nodes when OS patches that require a reboot are applied
  • System Upgrade Controller automatically upgrades k3s to new versions as they are released, driven by Plan resources (see the sketch below)
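
As an illustration of the last item, k3s upgrades are driven by Plan resources; the following is a minimal sketch modeled on the upstream system-upgrade-controller examples, not necessarily the exact Plan used in this repo:

    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: server-plan
      namespace: system-upgrade
    spec:
      concurrency: 1                  # upgrade one node at a time
      cordon: true                    # cordon the node before upgrading
      serviceAccountName: system-upgrade
      channel: https://update.k3s.io/v1-release/channels/stable
      nodeSelector:
        matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists
      upgrade:
        image: rancher/k3s-upgrade    # upstream k3s upgrade image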

🤝  Community

There is a k8s@home Discord for this community.

k8s-gitops's People

Contributors

billimek · renovate-bot · renovate[bot]


k8s-gitops's Issues

split infrastructure config to a separate repo

After operating this for a while, it seems 'better' to keep the infrastructure setup in a separate, dedicated repo. This will likely include the contents of the setup/cluster directory as well as more detail about the router and the (for now) dedicated haproxy configuration.

The terraforming steps (yet to be created - see #33) will also be added to the new infrastructure repo.

Add velero annotations to all necessary deployments

See #15 for some background.

Deployments to annotate for velero restic backups (an example annotation is sketched after the list):

  • home-assistant
  • unifi
  • plex
  • node-red
  • hass-mysql
  • mc-minecraft
  • mcsv-minecraft
  • nzbget
  • radarr
  • rtorrent-flood
  • sonarr
  • grafana
  • influxdb
  • nextcloud
  • loki
  • alertmanager
  • chronograf
  • prometheus
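
For reference, annotating a deployment for velero's restic integration looks roughly like the sketch below; this is a hedged example rather than a manifest from this repo, and the deployment, volume, and PVC names are illustrative:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: home-assistant              # illustrative workload name
    spec:
      selector:
        matchLabels:
          app: home-assistant
      template:
        metadata:
          labels:
            app: home-assistant
          annotations:
            # tell velero's restic integration which pod volumes to back up
            backup.velero.io/backup-volumes: config
        spec:
          containers:
            - name: home-assistant
              image: homeassistant/home-assistant:stable
              volumeMounts:
                - name: config
                  mountPath: /config
          volumes:
            - name: config
              persistentVolumeClaim:
                claimName: home-assistant-config   # hypothetical existing PVC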

vault bootstrapping challenges

When creating a brand-new cluster, there is a challenge with manual steps required to access vault:

  1. A lot of helm charts and other workloads depend on cluster secrets that the vault-secrets-operator creates
  2. vault-secrets-operator needs access to vault in order to create the required secrets
  3. vault must have the values already populated for vault-secrets-operator to work
  4. vault must be running, unsealed, and accessible in order to populate with the necessary secrets

Because vault itself runs in the same cluster, a number of manual steps are necessary in order to satisfy numbers 3 & 4 above:

  • Flux must have already installed vault to the cluster
  • The vault instance must already be initialized and unsealed
  • The vault instance must already have the seed values written to it for all the cluster secrets

Furthermore, it seems that no matter what, some manual steps are going to be necessary to set up vault-secrets-operator access to vault, unless a pre-determined token is used (sketched below).
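
As a point of reference, the 'pre-determined token' approach would amount to seeding a secret like the sketch below before the operator starts; the secret name, namespace, and key are illustrative and depend on how the operator deployment is configured:

    apiVersion: v1
    kind: Secret
    metadata:
      name: vault-secrets-operator      # illustrative; match whatever the operator expects
      namespace: vault-secrets-operator
    type: Opaque
    stringData:
      # a pre-created vault token with read access to the KV paths
      # that hold the cluster secrets
      VAULT_TOKEN: s.xxxxxxxxxxxxxxxx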

Investigate:

  • Is it possible to automate/script the initial setup of vault when bootstrapping a new cluster?
  • Is it possible to run vault externally, on the local network and already (mostly) set up, with fewer manual bootstrapping steps involved?
  • Is it possible to run vault externally in google cloud, accessed via an IAP tunnel and already (mostly) set up, with fewer manual bootstrapping steps involved?

Create matrix of mixed architecture support for workloads

For each thing running in this cluster (e.g. prometheus-operator, plex, grafana, unifi, etc), create a matrix showing which things natively support running in a mixed-architecture environment.

This would specifically be workloads that run container images built with multi-arch support. There are also a number of workloads with 'pending' multi-arch support that should be noted too (with a reference to the relevant issue or PR).

explore local storage for elasticsearch

Problem

Currently, running elasticsearch backed by ceph/rbd-based storage is overkill:

  • The three elasticsearch pods are replicating data across each other
  • The ceph/rbd-backed storage is further replicating the data across three nodes
  • This 'looks like' a classic write amplification scenario and is unnecessary

Proposal

Explore using kubernetes native local storage to hold the data 'locally' per node

See also this repo and associated helm chart
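
A minimal sketch of what Kubernetes-native local storage would look like, assuming a hypothetical /mnt/disks/es-data directory and node name (not taken from this cluster); one PersistentVolume would be created per node/disk:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage
    provisioner: kubernetes.io/no-provisioner   # statically provisioned local volumes
    volumeBindingMode: WaitForFirstConsumer     # bind only once a pod is scheduled
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: es-data-k3s-a                       # one PV per node/disk
    spec:
      capacity:
        storage: 30Gi
      accessModes: ["ReadWriteOnce"]
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage
      local:
        path: /mnt/disks/es-data                # hypothetical local disk path
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values: ["k3s-a"]             # hypothetical node name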

migrate workload storage from nfs to ceph rbd

See #3 for context

  • chronograf-chronograf
  • grafana
  • hass-mysql
  • influxdb-influxdb
  • nzbget-config
  • plex-kube-plex-config
  • prometheus-alertmanager
  • prometheus-server
  • radarr-config
  • sonarr-config
  • unifi
  • data-elasticsearch-data-0
  • data-elasticsearch-data-1
  • data-elasticsearch-master-0
  • data-elasticsearch-master-1
  • data-elasticsearch-master-2
  • deluge-config
  • hass-home-assistant
  • mc-minecraft-datadir
  • mcsv-minecraft-datadir
  • node-red
  • rutorrent-config
  • datadir-consul-0
  • datadir-consul-1
  • datadir-consul-2

storage 👎

Persistent Storage is a really big pain in the ass

NFS

  • Centralized NFS has worked really well with few or no problems
  • Challenges with this approach are:
    • This is file-level, shared-access storage, which is apparently a bad fit for things like databases (sqlite, mariadb, postgres, elasticsearch, etc.) that need block-level storage
    • All of the NFS storage is centralized on the proxmox node, which means the other nodes are effectively 'down' if that node needs to go down. Realistically, not sure how big of an issue this is

ceph

proxmox-provided ceph

  • ancient version of ceph and difficult to change (don't want to go beyond proxmox-provided ceph provider)
  • Need to do some special steps to make it work in the cluster which I don't care for
  • if I'm going to do ceph, I'd prefer to use rook
  • problems encountered:
    • observed big issues with the entire ceph system (OSDs?) being non-responsive after a node reboot, requiring a further reboot of the other nodes or manually stopping/removing/adding/starting the OSD to get into a recovery state
    • after running untouched for a week, something suddenly happened with ceph and the OSDs started writing a ton of crap to the logs; the /var/log filesystem (also used by the proxmox root filesystem) filled up, the mons detected no disk space and shut themselves down, and the result was a completely unusable ceph system
    • when ceph 'goes down' (which seems to happen frequently), any VMs with ceph-backed storage completely lock up, making them unusable until a manual ceph recovery - this is unacceptable

rook-provided ceph

  • always being updated which is nice and can pin to recent versions of ceph (mimic for example)
  • requires direct passthrough of drives to the VMs - not a big deal but an extra step is required during VM setup
  • problems encountered:
    • without warning or apparent reason, the OSDs get into a state where they think they cannot see each other and the entire ceph system locks up; a reboot of the node is required to recover
    • If using the same network as the rest of the k8s cluster and LAN, when ceph gets into a problem state, tcp connections build up until the haproxy loadbalancer runs out of tcp connections and ALL of haproxy stops responding, which completely fucks the entire network. This is NOT COOL
    • k8s nodes cannot be rebooted without draining first because otherwise the node will hang forever with libceph errors being spit out in dmesg. Not cool

longhorn

It's 'alpha' and I should have known better:

  • randomly loses a node and you have to manually click on a bunch of things in the UI console to get it repaired
  • randomly went read-only, completely breaking the application relying on it

Create velero backup and schedule yamls for all the necessary things

See #15 for background.

Deployments for which to create velero backup and schedule yamls. Run velero schedule create ... -o=yaml to dump the necessary yamls to put into files for gitops (an example Schedule is sketched after the list):

  • hass-home-assistant
  • unifi
  • plex
  • node-red
  • hass-mysql
  • mc-minecraft
  • mcsv-minecraft
  • nzbget
  • radarr
  • rtorrent-flood
  • sonarr
  • grafana
  • influxdb
  • nextcloud
  • loki
  • alertmanager
  • chronograf
  • prometheus
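
For reference, the dumped yaml would look roughly like the following sketch; the namespace, cron schedule, and TTL are illustrative rather than values from this repo:

    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: home-assistant            # illustrative schedule name
      namespace: velero
    spec:
      schedule: "0 3 * * *"           # run the backup daily at 03:00
      template:
        includedNamespaces:
          - home-assistant            # hypothetical workload namespace
        ttl: 720h0m0s                 # keep backups for 30 days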

re-implement rook

Armed with more knowledge, it is possible to deploy rook such that the OSDs can be scheduled to the master nodes via the placement tolerations for just OSDs.

This will allow a scenario where all of the rook operator and ceph components are scheduled to run on the worker nodes, but the OSDs themselves can be isolated to run on only the master nodes.

The reason for running the OSDs on different nodes is to avoid the issues that arise when the OSD workloads and the ceph/rbd client workloads share the same kernel space.
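
A minimal sketch of the relevant placement stanza, assuming the master nodes carry the standard node-role.kubernetes.io/master label and taint; the ceph image tag and other cluster settings are illustrative, not this repo's actual config:

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: ceph/ceph:v13.2.2      # illustrative mimic-era image
      dataDirHostPath: /var/lib/rook
      mon:
        count: 3
      placement:
        osd:
          # pin OSDs to the master nodes only
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: node-role.kubernetes.io/master
                      operator: Exists
          # allow OSDs to schedule despite the master taint
          tolerations:
            - key: node-role.kubernetes.io/master
              operator: Exists
              effect: NoSchedule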

reorganize repo structure

  • The directories should probably exactly match the namespace names they get deployed to. This is the case for monitoring and logging but not so much the other stuff (e.g. the deployments directory should probably be default).
  • The sealed-secret setup living under setup/manual-steps/values-to-encrypt may be more effective colocated with the actual deployment. Right now it's a bit awkward to set up part of a helmrelease in one file while the sealed-secret parts have to be configured and 'processed' separately. Still need to think through that workflow.
  • Need to figure out where to handle the setup/manual-steps/yamls stuff.
  • Consider also moving away from sealed-secrets and leveraging vault. Will handle vault exploration in a different story.

Design cluster migration plan

Migrating from one k8s cluster to another is particularly challenging because of a few factors:

  1. Persistent Storage: the 'old cluster' has many workloads with persistent storage (25 in fact) that need to be migrated somehow to the replacement workloads in the 'new cluster' without losing data
  2. LoadBalancer IPs: Services leveraging MetalLB (12 of them) all have some external thing referencing those static IPs. Ideally we want to retain the same MetalLB IP when migrating from the 'old cluster' to the 'new cluster'.
  3. domain names for ingress: The traefik ingress domain name in the 'new' cluster needs to match the same domain name as the 'old cluster'

Items 2 & 3 are relatively quick and easy to migrate. Item 1 (storage) is going to be problematic and time-consuming unless there is a good solution - maybe leveraging velero or stash to 'backup & restore', or scripting/automating the actual storage migration. (A sketch for pinning a LoadBalancer IP, per item 2, follows below.)
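
For item 2, keeping a service on the same address in the new cluster is mostly a matter of pinning the IP rather than letting MetalLB pick one; a hedged sketch follows, with the service name, port, and IP all illustrative:

    apiVersion: v1
    kind: Service
    metadata:
      name: unifi                      # illustrative service
      namespace: default
    spec:
      type: LoadBalancer
      loadBalancerIP: 10.0.7.50        # hypothetical static IP from the MetalLB pool
      selector:
        app: unifi
      ports:
        - name: controller
          port: 8080
          targetPort: 8080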

Resolve issues with stash

The current version of stash seems to have an issue in my specific case. I didn't trace down the actual root cause, but the symptoms are that it isn't possible to add new stash sidecars to deployments and the stash CRD seems to be blocking flux from working properly.

As a result, I uninstalled stash and am looking for a fix/update. Also explored other options (ark) but don't see anything else out there right now. I want a way to regularly back up persistent volumes for deployed services.

See this slack thread for some more context

expand elasticsearch data storage from 30Gi to 60Gi

With this commit, a change was introduced to the elasticsearch helm chart to use a 60Gi data volume instead of the default 30Gi. The reason is that the 30 days of data I want to retain takes up roughly the entire current allowance.

However, the nature of the ceph provisioner block storage prevents dynamic resizing of a PVC. Therefore, the entire chart and PVC must be destroyed and re-created.

This issue is to track and document that process for future learning.
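
The change itself is just a storage request bump in the chart values, roughly like the sketch below (this assumes the elastic helm chart's volumeClaimTemplate values key, and the storage class name is illustrative); the pain is that applying it requires deleting and re-creating the chart and PVCs:

    # values override for the elasticsearch helm chart (sketch, not the actual file in this repo)
    volumeClaimTemplate:
      accessModes: ["ReadWriteOnce"]
      storageClassName: rook-ceph-block   # illustrative ceph rbd storage class
      resources:
        requests:
          storage: 60Gi                   # up from the default 30Gi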

add Intel GPU Device Plugin

See this repo. It should only be necessary to:

  • add the intel-gpu-plugin daemonset to the cluster
  • add documentation about setting up the proxmox host(s) to properly enable device virtualization for VMs
  • add documentation about setting up the k8s VM(s) to properly consume the virtualized GPU device
  • figure out how to label/taint/annotate/whatever only the nodes that will actually run the GPU workload - otherwise, the pods deployed by the daemonset will crash over and over on non-eligible nodes (see the nodeSelector sketch below)
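
A minimal sketch of the scheduling constraint; the daemonset spec is trimmed down, the image tag is illustrative, and intel-gpu=true is a hypothetical label applied by hand (e.g. kubectl label node <gpu-node> intel-gpu=true):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: intel-gpu-plugin
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          name: intel-gpu-plugin
      template:
        metadata:
          labels:
            name: intel-gpu-plugin
        spec:
          nodeSelector:
            intel-gpu: "true"             # only schedule onto nodes labeled as GPU-capable
          containers:
            - name: intel-gpu-plugin
              image: intel/intel-gpu-plugin   # image/tag illustrative; see the upstream repo
              volumeMounts:
                - name: devfs
                  mountPath: /dev/dri
                  readOnly: true
                - name: kubeletsockets
                  mountPath: /var/lib/kubelet/device-plugins
          volumes:
            - name: devfs
              hostPath:
                path: /dev/dri
            - name: kubeletsockets
              hostPath:
                path: /var/lib/kubelet/device-plugins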

automate the creation of github deploy keys

When installing flux, it is necessary to

  1. extract the flux SSH key from the newly-installed flux workload
  2. manually upload/configure the github repo to use the new key

It would be great if creating the github deploy key could be automated. See this gist for inspiration:

        # POST the new public key to the GitHub deploy-keys API; expects
        # $github_access_token, $reponame, $repo_id, and $keyfile to already be set
        echo -n ">> "
        {
                curl \
                        -i \
                        -H "Authorization: token $github_access_token" \
                        --data @- "https://api.github.com/repos/noah/$reponame/keys" << EOF
        {
                "title" : "$repo_id $(date)",
                "key" : "$(cat $keyfile.pub)",
                "read_only" : false
        }
EOF
        } 2>/dev/null | head -1 # status code should be 201
