kube-hetzner / terraform-hcloud-kube-hetzner
Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
License: MIT License
I've set up terraform.tfvars with my hcloud_token, public_key, private_key, and:
location = "nbg1" # change to `ash` for us-east Ashburn, Virginia location
network_region = "eu-central" # change to `us-east` if location is ash
agent_server_type = "cx11"
control_plane_server_type = "cpx11"
lb_server_type = "lb11"
And as a result of terraform apply I can see:
hcloud_server.first_control_plane (local-exec): Executing: ["/bin/sh" "-c" "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i /Users/....user_name..../.ssh/id_rsa root@...IP... '(sleep 2; reboot)&'; sleep 3"]
hcloud_server.first_control_plane (local-exec): Warning: Permanently added '...IP...' (ED25519) to the list of known hosts.
hcloud_server.first_control_plane (local-exec): Connection to ...IP... closed by remote host.
hcloud_server.first_control_plane: Still creating... [2m10s elapsed]
hcloud_server.first_control_plane: Provisioning with 'local-exec'...
hcloud_server.first_control_plane (local-exec): Executing: ["/bin/sh" "-c" "until ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i /Users/....user_name..../.ssh/id_rsa -o ConnectTimeout=2 root@...IP... true 2> /dev/null\ndo\n echo \"Waiting for MicroOS to reboot and become available...\"\n sleep 3\ndone\n"]
hcloud_server.first_control_plane (local-exec): Waiting for MicroOS to reboot and become available...
...
hcloud_server.first_control_plane: Still creating... [3m30s elapsed]
hcloud_server.first_control_plane: Provisioning with 'file'...
hcloud_server.first_control_plane: Still creating... [3m40s elapsed]
...
hcloud_server.first_control_plane: Still creating... [8m30s elapsed]
╷
│ Error: file provisioner error
│
│ with hcloud_server.first_control_plane,
│ on master.tf line 55, in resource "hcloud_server" "first_control_plane":
│ 55: provisioner "file" {
│
│ timeout - last error: dial tcp ...IP...:22: connect: operation timed out
╵
I've tried to ssh to this machine:
$ ssh root@...IP... -o StrictHostKeyChecking=no
ssh: connect to host ...IP... port 22: Operation timed out
I tried again after manually restarting the machine from the Hetzner web console, and unfortunately it looks like the machine is still not responding.
Do you have an idea how to diagnose it or what could be the problem?
Thanks :)
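For what it's worth, the provisioner's until ssh ... loop in the output above retries forever, which makes a dead node look like an endless "Still creating". A bounded retry helper fails fast instead; this is only a debugging sketch (the wait_for name and the retry limits are mine, not part of the module):

```shell
#!/bin/sh
# wait_for CMD MAX: retry CMD up to MAX times, one second apart.
# Prints "ready" on success, "timeout" (and exits nonzero) on giving up.
wait_for() {
  cmd="$1"; max="$2"; i=0
  until $cmd 2>/dev/null; do
    i=$((i + 1))
    if [ "$i" -ge "$max" ]; then
      echo "timeout"
      return 1
    fi
    sleep 1
  done
  echo "ready"
}

# Example: give up after ~2 minutes instead of waiting forever.
# wait_for "ssh -o ConnectTimeout=2 -o StrictHostKeyChecking=no root@<IP> true" 60
```

If this times out while the Hetzner console still shows a login prompt, the problem is likely SSH/network configuration on the node rather than the provisioner itself.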
@mnencia Like you, I have created a cluster of 3 control planes and 2 agents. To accelerate the process, I issued touch /var/run/reboot-required
on all five of them, to simulate a post-update scenario.
I will report back on what happens after that. Please do not hesitate to share your findings here too.
Hi guys
Thanks to all contributors to this amazing project!
Is it possible to disable the Traefik ingress controller at all? I would prefer using nginx-ingress or istio-gateway as the ingress solution. In that case, I wouldn't need the load balancer or the Traefik installation.
I'm also open to contributing such a toggle function but would need some input on how best to implement it. :)
cheers,
Johann Schley
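For context, k3s itself can skip its bundled add-ons at install time via the --disable flag, so a module toggle would mostly need to thread extra flags into the generated k3s server command. A rough sketch of the flags involved (the K3S_EXTRA_FLAGS variable is hypothetical; the flag names are k3s's own):

```shell
# k3s supports skipping bundled components with --disable. A hypothetical
# module toggle could append these to the k3s server command it generates:
K3S_EXTRA_FLAGS="--disable traefik --disable servicelb"
echo "k3s server $K3S_EXTRA_FLAGS"
```

With Traefik disabled this way, nginx-ingress or an Istio gateway could then be installed separately.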
I followed the instructions from the README file and the error
local-exec provisioner error Error running command 'kubectl -n kube-system create secret generic hcloud exit status 1. Output: Unable to connect to the server: dial tcp
pops up.
Am I missing anything?
Recently, the master branch has been quite unstable and required users to either stay on an unsupported, older commit of kube-hetzner or to re-provision their whole cluster.
I believe that the time would be right to agree on a versioning scheme and implement at least a minimum of a release process to communicate breaking changes more clearly.
My proposal would be to just use https://semver.org/ and start to define a process after which we could release a 1.0.0 (or 0.1.0 if you prefer ;). Ideally we would end up with a git tag, an auto-generated github release & a ready-to-use module published on registry.terraform.io.
I also started a GitHub project regarding the whole thing, you can find it linked in the sidebar of this issue or at https://github.com/orgs/kube-hetzner/projects/1
Eager to hear what @mysticaltech, @mnencia, and others are thinking!
I've set up terraform.tfvars with my hcloud_token, public_key, private_key, and:
location = "nbg1" # change to `ash` for us-east Ashburn, Virginia location
network_region = "eu-central" # change to `us-east` if location is ash
agent_server_type = "cx11"
control_plane_server_type = "cpx11"
lb_server_type = "lb11"
servers_num = 1
agents_num = 0
# only one server is chosen because all servers have the same issue, so I focused on just one
Just after the successful steps in host/main.tf:
# Install MicroOS
# Issue a reboot command
I can see:
module.first_control_plane.hcloud_server.server (local-exec): Executing: ["/bin/sh" "-c" "until ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i /Users/drackowski/.ssh/id_rsa -o ConnectTimeout=2 root@........ true 2> /dev/null\ndo\n echo \"Waiting for MicroOS to reboot and become available...\"\n sleep 3\ndone\n"]
module.first_control_plane.hcloud_server.server (local-exec): Waiting for MicroOS to reboot and become available...
module.first_control_plane.hcloud_server.server: Still creating... [2m10s elapsed]
module.first_control_plane.hcloud_server.server (local-exec): Waiting for MicroOS to reboot and become available...
...
...
...
module.first_control_plane.hcloud_server.server: Provisioning with 'remote-exec'...
module.first_control_plane.hcloud_server.server (remote-exec): Connecting to remote host via SSH...
module.first_control_plane.hcloud_server.server (remote-exec): Host: .......
module.first_control_plane.hcloud_server.server (remote-exec): User: root
module.first_control_plane.hcloud_server.server (remote-exec): Password: false
module.first_control_plane.hcloud_server.server (remote-exec): Private key: true
module.first_control_plane.hcloud_server.server (remote-exec): Certificate: false
module.first_control_plane.hcloud_server.server (remote-exec): SSH Agent: true
module.first_control_plane.hcloud_server.server (remote-exec): Checking Host Key: false
module.first_control_plane.hcloud_server.server (remote-exec): Target Platform: unix
Error response is:
╷
│ Error: remote-exec provisioner error
│
│ with module.first_control_plane.hcloud_server.server,
│ on modules/host/main.tf line 60, in resource "hcloud_server" "server":
│ 60: provisioner "remote-exec" {
│
│ timeout - last error: dial tcp ........:22: i/o timeout
╵
When I try to ssh to this machine, it is not responding ("timed out" after a long time).
On the server console I can see the welcome message from openSUSE with a few SSH host keys and "static login: ".
Do you know what else I can check there to diagnose what happened? 😅
The full terraform apply console output is in the attachment full console output.txt
I followed the Readme and am getting this error. It seems to have created the 3 control planes, network, and firewall, but not the nodepool/nodes or load balancer.
Error: invalid input in field 'name' (invalid_input): [name => [Name must be a valid hostname.]]
│
│ with module.agents["myname_nodes-1"].hcloud_server.server,
│ on modules/host/main.tf line 1, in resource "hcloud_server" "server":
│ 1: resource "hcloud_server" "server" {
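Hetzner requires server names to be valid hostnames, and the underscore in myname_nodes-1 is the likely culprit: underscores are not allowed in hostname labels. A quick local check (the is_hostname helper is my own sketch, roughly following RFC 1123 label rules):

```shell
#!/bin/sh
# is_hostname NAME: succeed iff NAME is a valid hostname label
# (lowercase letters, digits, hyphens; no leading/trailing hyphen;
# at most 63 characters).
is_hostname() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$'
}

is_hostname "myname_nodes-1" || echo "invalid: underscore not allowed"
is_hostname "myname-nodes-1" && echo "ok"
```

Renaming the nodepool to use hyphens instead of underscores should satisfy the API's name validation.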
The option to add an existing certificate to the Hetzner load balancer.
In my case I have a Cloudflare origin server certificate.
I recently contacted Hetzner to increase my limits to deploy 3 master nodes and 3 worker nodes, and after the limit increase I executed Terraform, but the script exited with an unknown error:
╷
│ Error: hcloud/setRescue: hcclient/WaitForActions: action 382332309 failed: Unknown Error (unknown_error)
│
│ with hcloud_server.control_planes[0],
│ on servers.tf line 1, in resource "hcloud_server" "control_planes":
│ 1: resource "hcloud_server" "control_planes" {
│
here is my terraform.tfvars
# You need to replace these
hcloud_token = "my-token"
public_key = "/home/user/.ssh/id_ed25519.pub"
# Must be "private_key = null" when you want to use ssh-agent, for a Yubikey like device auth or an SSH key-pair with passphrase
private_key = "/home/user/.ssh/id_ed25519"
# These can be customized, or left with the default values
# For Hetzner locations see https://docs.hetzner.com/general/others/data-centers-and-connection/
# For Hetzner server types see https://www.hetzner.com/cloud
location = "fsn1" # change to `ash` for us-east Ashburn, Virginia location
network_region = "eu-central" # change to `us-east` if location is ash
agent_server_type = "cx41"
control_plane_server_type = "cx21"
lb_server_type = "lb21"
# At least 3 server nodes is recommended for HA, otherwise you need to turn off automatic upgrade (see ReadMe).
servers_num = 3
# For agent nodes, at least 2 is recommended for HA, but you can keep automatic upgrades.
agents_num = 3
# If you want to use a specific Hetzner CCM and CSI version, set them below, otherwise leave as is for the latest versions
# hetzner_ccm_version = ""
# hetzner_csi_version = ""
# If you want to kustomize the Hetzner CCM and CSI containers with the "latest" tags and imagePullPolicy Always,
# to have them automatically update when the nodes themselves get updated via the rancher system upgrade controller, the default is "false".
# If you choose to keep the default of "false", you can always use ArgoCD to monitor the CSI and CCM manifest for new releases,
# that is probably the more "vanilla" option to keep these components always updated.
# hetzner_ccm_containers_latest = true
# hetzner_csi_containers_latest = true
# If you want to use letsencrypt with tls Challenge, the email address is used to send you certificates expiration notices
traefik_acme_tls = true
traefik_acme_email = "my-email"
# If you want to allow non-control-plane workloads to run on the control-plane nodes set "true" below. The default is "false".
# allow_scheduling_on_control_plane = true
I have not edited any other file.
This issue provides visibility into Renovate updates and their statuses.
This repository currently has no open or pending branches.
Hi,
the metrics-server in my cluster is unable to scrape metrics from nodes:
I0305 08:15:26.352996 1 serving.go:341] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0305 08:15:26.719418 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0305 08:15:26.719485 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0305 08:15:26.719421 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0305 08:15:26.719520 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0305 08:15:26.719496 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0305 08:15:26.719646 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0305 08:15:26.720159 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0305 08:15:26.720267 1 secure_serving.go:202] Serving securely on :4443
I0305 08:15:26.720356 1 tlsconfig.go:240] Starting DynamicServingCertificateController
E0305 08:15:26.723158 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.2.0.1:10250/stats/summary?only_cpu_and_memory=true\": x509: certificate is valid for 127.0.0.1, 88.198.105.71, not 10.2.0.1" node="agent-big-0"
I0305 08:15:26.820176 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0305 08:15:26.820185 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0305 08:15:26.820191 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0305 08:15:27.267832 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
...
E0305 08:16:26.707787 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.2.0.1:10250/stats/summary?only_cpu_and_memory=true\": x509: certificate is valid for 127.0.0.1, 88.198.105.71, not 10.2.0.1" node="agent-big-0"
My cluster only consists of that one big agent and 3 control nodes. Any idea what's happening here?
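The kubelet's serving certificate here only covers 127.0.0.1 and the public IP, so scraping the private 10.2.0.1 address fails TLS verification. A common workaround, sketched below and not specific to this module, is to point metrics-server at an address the certificate covers, or to skip kubelet certificate verification (both flags are standard metrics-server options; weigh --kubelet-insecure-tls against your security requirements):

```shell
# Assumes the metrics-server container already has an args list.
kubectl -n kube-system patch deployment metrics-server --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-",
   "value": "--kubelet-preferred-address-types=ExternalIP,Hostname"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-",
   "value": "--kubelet-insecure-tls"}
]'
```

The cleaner long-term fix is getting the node's private IP into the kubelet certificate's SANs, but the patch above is the quick way to confirm the diagnosis.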
One of the nodes was tainted. I tried to reapply Terraform via tf apply.
That didn't work, so I deleted the node (agent-3) completely in the Hetzner UI and tried a tf plan and tf apply --auto-approve:
hcloud_server.agents[3]: Creating...
╷
│ Error: hcloud/inlineAttachServerToNetwork: attach server to network: provided IP is not available (ip_not_available)
│
│ with hcloud_server.agents[3],
│ on agents.tf line 1, in resource "hcloud_server" "agents":
│ 1: resource "hcloud_server" "agents" {
Agent-3 was created but not attached to Kubernetes.
After that I tried to increase the node count from 3 agents to 5.
Node agent-4 was created and attached, but agent-3 was still not able to attach:
complete output:
tf apply --auto-approve
random_password.k3s_token: Refreshing state... [id=none]
local_file.traefik_config: Refreshing state... [id=25ba84696ee16d68f5b98f6ea6b70bb14c3c530c]
hcloud_placement_group.k3s_placement_group: Refreshing state... [id=19653]
hcloud_ssh_key.default: Refreshing state... [id=5492430]
hcloud_network.k3s: Refreshing state... [id=1352333]
hcloud_firewall.k3s: Refreshing state... [id=290151]
hcloud_network_subnet.k3s: Refreshing state... [id=1352333-10.0.0.0/16]
local_file.hetzner_csi_config: Refreshing state... [id=aa232912bcf86722e32b698e1e077522c7f02a9d]
local_file.hetzner_ccm_config: Refreshing state... [id=f5ec6cb5689cb5830d04857365d567edae562174]
hcloud_server.first_control_plane: Refreshing state... [id=17736249]
hcloud_server.control_planes[0]: Refreshing state... [id=17736377]
hcloud_server.control_planes[1]: Refreshing state... [id=17736378]
hcloud_server.agents[5]: Refreshing state... [id=17861319]
hcloud_server.agents[3]: Refreshing state... [id=17869801]
hcloud_server.agents[0]: Refreshing state... [id=17736379]
hcloud_server.agents[1]: Refreshing state... [id=17736385]
hcloud_server.agents[4]: Refreshing state... [id=17858945]
hcloud_server.agents[2]: Refreshing state... [id=17736383]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply":
# hcloud_placement_group.k3s_placement_group has been changed
~ resource "hcloud_placement_group" "k3s_placement_group" {
id = "19653"
name = "k3s-placement-group"
~ servers = [
+ 17869801,
# (8 unchanged elements hidden)
]
# (2 unchanged attributes hidden)
}
# hcloud_server.agents[3] has been changed
~ resource "hcloud_server" "agents" {
+ datacenter = "fsn1-dc14"
id = "17869801"
+ ipv4_address = "78.47.82.149"
+ ipv6_address = "2a01:4f8:c17:8d4a::1"
+ ipv6_network = "2a01:4f8:c17:8d4a::/64"
name = "k3s-agent-3"
+ status = "running"
# (12 unchanged attributes hidden)
- network {
- alias_ips = [] -> null
- ip = "10.0.0.8" -> null
- network_id = 1352333 -> null
}
}
# hcloud_firewall.k3s has been changed
~ resource "hcloud_firewall" "k3s" {
id = "290151"
name = "k3s-firewall"
# (1 unchanged attribute hidden)
+ apply_to {
+ server = 17869801
}
# (21 unchanged blocks hidden)
}
Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may
include actions to undo or respond to these changes.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# hcloud_server.agents[3] is tainted, so must be replaced
-/+ resource "hcloud_server" "agents" {
+ backup_window = (known after apply)
~ datacenter = "fsn1-dc14" -> (known after apply)
~ id = "17869801" -> (known after apply)
~ ipv4_address = "78.47.82.xxx" -> (known after apply)
~ ipv6_address = "2a01:4f8:c17:xxxx::1" -> (known after apply)
~ ipv6_network = "2a01:4f8:c17:xxxx::/64" -> (known after apply)
name = "k3s-agent-3"
~ status = "running" -> (known after apply)
# (12 unchanged attributes hidden)
+ network {
+ alias_ips = []
+ ip = "10.0.0.8"
+ mac_address = (known after apply)
+ network_id = 1352333
}
}
Plan: 1 to add, 0 to change, 1 to destroy.
Changes to Outputs:
~ agents_public_ip = [
# (2 unchanged elements hidden)
"138.201.246.xxx",
+ (known after apply),
+ "78.46.163.xxx",
+ "49.12.100.xxx",
]
hcloud_server.agents[3]: Destroying... [id=17869801]
hcloud_server.agents[3]: Destruction complete after 2s
hcloud_server.agents[3]: Creating...
hcloud_server.agents[3]: Still creating... [10s elapsed]
╷
│ Error: hcloud/inlineAttachServerToNetwork: attach server to network: provided IP is not available (ip_not_available)
│
│ with hcloud_server.agents[3],
│ on agents.tf line 1, in resource "hcloud_server" "agents":
│ 1: resource "hcloud_server" "agents" {
│
╵
How can I get agent-3 working again?
Thank you in advance.
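A possible recovery path, assuming the private IP 10.0.0.8 is still held by a stale attachment left over from the UI-side delete: inspect what currently exists, drop the half-created agent from Terraform state so it is recreated from scratch, and re-apply. This is a sketch, not a guaranteed fix:

```shell
# See which servers still exist on the Hetzner side.
hcloud server list
# Forget the broken agent in Terraform state (resource address taken
# from the plan output above), then let Terraform recreate it.
terraform state rm 'hcloud_server.agents[3]'
terraform apply
```

If the IP is still reported as unavailable afterwards, checking the network's attached servers in the Hetzner console for a leftover holder of 10.0.0.8 would be the next step.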
Just apply the following command: kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/v1.6.0/deploy/kubernetes/hcloud-csi.yml
Somehow terraform destroy keeps hanging on destroying the network:
hcloud_placement_group.k3s_placement_group: Destruction complete after 0s
hcloud_firewall.k3s: Destruction complete after 0s
...
..
.
hcloud_network_subnet.k3s: Still destroying... [id=1352246-10.0.0.0/16, 10m40s elapsed]
...
hcloud_network_subnet.k3s: Still destroying... [id=1352246-10.0.0.0/16, 11m40s elapsed]
I also tried to re-run terraform destroy.
When I manually delete the network in the UI, it finishes a few seconds later.
Can someone please tell me what I am doing wrong?
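One plausible explanation, consistent with the manual delete unblocking things: resources created outside Terraform, such as load balancers provisioned by the Hetzner CCM or leftover CSI volumes, can keep the network attached, so the subnet delete waits indefinitely. A sketch of what to check before destroying:

```shell
# Look for leftovers that Terraform does not manage but that may still
# reference the k3s network.
hcloud load-balancer list
hcloud volume list
# Delete any that belong to the cluster, then retry:
terraform destroy
```

Deleting cluster-created load balancers and volumes first should let the network and subnet destroy complete on their own.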
❯ tf apply --auto-approve
Terraform used the selected providers to generate the following execution plan. Resource actions are
indicated with the following symbols:
+ create
<= read (data resources)
Terraform will perform the following actions:
# data.remote_file.kubeconfig will be read during apply
# (config refers to values not yet known)
<= data "remote_file" "kubeconfig" {
+ content = (known after apply)
+ id = (known after apply)
+ path = "/etc/rancher/k3s/k3s.yaml"
+ conn {
+ agent = false
+ host = (known after apply)
+ port = 22
+ private_key = (sensitive value)
+ user = "root"
}
}
# hcloud_firewall.k3s will be created
+ resource "hcloud_firewall" "k3s" {
+ id = (known after apply)
+ labels = (known after apply)
+ name = "k3s"
+ rule {
+ destination_ips = [
+ "0.0.0.0/0",
]
+ direction = "out"
+ protocol = "icmp"
+ source_ips = []
}
+ rule {
+ destination_ips = [
+ "0.0.0.0/0",
]
+ direction = "out"
+ port = "123"
+ protocol = "udp"
+ source_ips = []
}
+ rule {
+ destination_ips = [
+ "0.0.0.0/0",
]
+ direction = "out"
+ port = "443"
+ protocol = "tcp"
+ source_ips = []
}
+ rule {
+ destination_ips = [
+ "0.0.0.0/0",
]
+ direction = "out"
+ port = "53"
+ protocol = "tcp"
+ source_ips = []
}
+ rule {
+ destination_ips = [
+ "0.0.0.0/0",
]
+ direction = "out"
+ port = "53"
+ protocol = "udp"
+ source_ips = []
}
+ rule {
+ destination_ips = [
+ "0.0.0.0/0",
]
+ direction = "out"
+ port = "80"
+ protocol = "tcp"
+ source_ips = []
}
+ rule {
+ destination_ips = []
+ direction = "in"
+ protocol = "icmp"
+ source_ips = [
+ "0.0.0.0/0",
]
}
+ rule {
+ destination_ips = []
+ direction = "in"
+ protocol = "icmp"
+ source_ips = [
+ "10.0.0.0/8",
+ "127.0.0.1/32",
+ "169.254.169.254/32",
+ "213.239.246.1/32",
]
}
+ rule {
+ destination_ips = []
+ direction = "in"
+ port = "22"
+ protocol = "tcp"
+ source_ips = [
+ "0.0.0.0/0",
]
}
+ rule {
+ destination_ips = []
+ direction = "in"
+ port = "6443"
+ protocol = "tcp"
+ source_ips = [
+ "0.0.0.0/0",
]
}
+ rule {
+ destination_ips = []
+ direction = "in"
+ port = "any"
+ protocol = "tcp"
+ source_ips = [
+ "10.0.0.0/8",
+ "127.0.0.1/32",
+ "169.254.169.254/32",
+ "213.239.246.1/32",
]
}
+ rule {
+ destination_ips = []
+ direction = "in"
+ port = "any"
+ protocol = "udp"
+ source_ips = [
+ "10.0.0.0/8",
+ "127.0.0.1/32",
+ "169.254.169.254/32",
+ "213.239.246.1/32",
]
}
}
# hcloud_network.k3s will be created
+ resource "hcloud_network" "k3s" {
+ delete_protection = false
+ id = (known after apply)
+ ip_range = "10.0.0.0/8"
+ name = "k3s"
}
# hcloud_network_subnet.k3s will be created
+ resource "hcloud_network_subnet" "k3s" {
+ gateway = (known after apply)
+ id = (known after apply)
+ ip_range = "10.0.0.0/16"
+ network_id = (known after apply)
+ network_zone = "eu-central"
+ type = "cloud"
}
# hcloud_placement_group.k3s will be created
+ resource "hcloud_placement_group" "k3s" {
+ id = (known after apply)
+ labels = {
+ "engine" = "k3s"
+ "provisioner" = "terraform"
}
+ name = "k3s"
+ servers = (known after apply)
+ type = "spread"
}
# hcloud_server.agents[0] will be created
+ resource "hcloud_server" "agents" {
+ backup_window = (known after apply)
+ backups = false
+ datacenter = (known after apply)
+ delete_protection = false
+ firewall_ids = (known after apply)
+ id = (known after apply)
+ image = "ubuntu-20.04"
+ ipv4_address = (known after apply)
+ ipv6_address = (known after apply)
+ ipv6_network = (known after apply)
+ keep_disk = false
+ labels = {
+ "engine" = "k3s"
+ "provisioner" = "terraform"
}
+ location = "fsn1"
+ name = "k3s-agent-0"
+ placement_group_id = (known after apply)
+ rebuild_protection = false
+ rescue = "linux64"
+ server_type = "cpx21"
+ ssh_keys = (known after apply)
+ status = (known after apply)
+ network {
+ alias_ips = []
+ ip = "10.0.1.1"
+ mac_address = (known after apply)
+ network_id = (known after apply)
}
}
# hcloud_server.agents[1] will be created
+ resource "hcloud_server" "agents" {
+ backup_window = (known after apply)
+ backups = false
+ datacenter = (known after apply)
+ delete_protection = false
+ firewall_ids = (known after apply)
+ id = (known after apply)
+ image = "ubuntu-20.04"
+ ipv4_address = (known after apply)
+ ipv6_address = (known after apply)
+ ipv6_network = (known after apply)
+ keep_disk = false
+ labels = {
+ "engine" = "k3s"
+ "provisioner" = "terraform"
}
+ location = "fsn1"
+ name = "k3s-agent-1"
+ placement_group_id = (known after apply)
+ rebuild_protection = false
+ rescue = "linux64"
+ server_type = "cpx21"
+ ssh_keys = (known after apply)
+ status = (known after apply)
+ network {
+ alias_ips = []
+ ip = "10.0.1.2"
+ mac_address = (known after apply)
+ network_id = (known after apply)
}
}
# hcloud_server.control_planes[0] will be created
+ resource "hcloud_server" "control_planes" {
+ backup_window = (known after apply)
+ backups = false
+ datacenter = (known after apply)
+ delete_protection = false
+ firewall_ids = (known after apply)
+ id = (known after apply)
+ image = "ubuntu-20.04"
+ ipv4_address = (known after apply)
+ ipv6_address = (known after apply)
+ ipv6_network = (known after apply)
+ keep_disk = false
+ labels = {
+ "engine" = "k3s"
+ "provisioner" = "terraform"
}
+ location = "fsn1"
+ name = "k3s-control-plane-1"
+ placement_group_id = (known after apply)
+ rebuild_protection = false
+ rescue = "linux64"
+ server_type = "cpx11"
+ ssh_keys = (known after apply)
+ status = (known after apply)
+ network {
+ alias_ips = []
+ ip = "10.0.0.3"
+ mac_address = (known after apply)
+ network_id = (known after apply)
}
}
# hcloud_server.control_planes[1] will be created
+ resource "hcloud_server" "control_planes" {
+ backup_window = (known after apply)
+ backups = false
+ datacenter = (known after apply)
+ delete_protection = false
+ firewall_ids = (known after apply)
+ id = (known after apply)
+ image = "ubuntu-20.04"
+ ipv4_address = (known after apply)
+ ipv6_address = (known after apply)
+ ipv6_network = (known after apply)
+ keep_disk = false
+ labels = {
+ "engine" = "k3s"
+ "provisioner" = "terraform"
}
+ location = "fsn1"
+ name = "k3s-control-plane-2"
+ placement_group_id = (known after apply)
+ rebuild_protection = false
+ rescue = "linux64"
+ server_type = "cpx11"
+ ssh_keys = (known after apply)
+ status = (known after apply)
+ network {
+ alias_ips = []
+ ip = "10.0.0.4"
+ mac_address = (known after apply)
+ network_id = (known after apply)
}
}
# hcloud_server.first_control_plane will be created
+ resource "hcloud_server" "first_control_plane" {
+ backup_window = (known after apply)
+ backups = false
+ datacenter = (known after apply)
+ delete_protection = false
+ firewall_ids = (known after apply)
+ id = (known after apply)
+ image = "ubuntu-20.04"
+ ipv4_address = (known after apply)
+ ipv6_address = (known after apply)
+ ipv6_network = (known after apply)
+ keep_disk = false
+ labels = {
+ "engine" = "k3s"
+ "provisioner" = "terraform"
}
+ location = "fsn1"
+ name = "k3s-control-plane-0"
+ placement_group_id = (known after apply)
+ rebuild_protection = false
+ rescue = "linux64"
+ server_type = "cpx11"
+ ssh_keys = (known after apply)
+ status = (known after apply)
+ network {
+ alias_ips = []
+ ip = "10.0.0.2"
+ mac_address = (known after apply)
+ network_id = (known after apply)
}
}
# hcloud_ssh_key.k3s will be created
+ resource "hcloud_ssh_key" "k3s" {
+ fingerprint = (known after apply)
+ id = (known after apply)
+ name = "k3s"
+ public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5mH6iwpbJY+ssGIJUVsClE5LO/e9/YhA2k+oOP6VzxK2f9GutJu6wYNd6re5Ma1BRZL1ld95QKs/k1F1HWq75y1VJMawD+72+7OR6eT1nwJyrFDVk801UgCuOPJtLGAjNXx9uT2AMKZ08crnRGap3XzjLynVxoeETndINMew3LKnaL3zGkrDRRZnysrIoB3c8ywS9WlQxB5M3zdMICQ6aqsonIHChDybHnKb+wEKFUbND5ga/V1VG2GUR18uNGu01Zpxxof566C+26owSfrnA9R7KllUI/+/zYTqFRt5a2F3B/k0I+5WhSsAuRbI/eundl1oTP4sAtJ8qKBt20VYL [email protected]"
}
# local_file.kubeconfig will be created
+ resource "local_file" "kubeconfig" {
+ directory_permission = "0777"
+ file_permission = "600"
+ filename = "kubeconfig.yaml"
+ id = (known after apply)
+ sensitive_content = (sensitive value)
}
# random_password.k3s_token will be created
+ resource "random_password" "k3s_token" {
+ id = (known after apply)
+ length = 48
+ lower = true
+ min_lower = 0
+ min_numeric = 0
+ min_special = 0
+ min_upper = 0
+ number = true
+ result = (sensitive value)
+ special = false
+ upper = true
}
Plan: 12 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ agents_public_ip = [
+ (known after apply),
+ (known after apply),
]
+ controlplanes_public_ip = [
+ (known after apply),
+ (known after apply),
+ (known after apply),
]
+ kubeconfig = (sensitive value)
+ kubeconfig_file = (sensitive value)
random_password.k3s_token: Creating...
random_password.k3s_token: Creation complete after 0s [id=none]
hcloud_network.k3s: Creating...
hcloud_placement_group.k3s: Creating...
hcloud_ssh_key.k3s: Creating...
hcloud_firewall.k3s: Creating...
hcloud_placement_group.k3s: Creation complete after 1s [id=21532]
hcloud_ssh_key.k3s: Creation complete after 1s [id=5557860]
hcloud_network.k3s: Creation complete after 1s [id=1370757]
hcloud_network_subnet.k3s: Creating...
hcloud_firewall.k3s: Creation complete after 1s [id=300569]
hcloud_network_subnet.k3s: Creation complete after 1s [id=1370757-10.0.0.0/16]
hcloud_server.first_control_plane: Creating...
hcloud_server.first_control_plane: Still creating... [10s elapsed]
hcloud_server.first_control_plane: Provisioning with 'file'...
hcloud_server.first_control_plane: Still creating... [20s elapsed]
hcloud_server.first_control_plane: Still creating... [30s elapsed]
hcloud_server.first_control_plane: Provisioning with 'remote-exec'...
hcloud_server.first_control_plane (remote-exec): Connecting to remote host via SSH...
hcloud_server.first_control_plane (remote-exec): Host: 49.12.221.176
hcloud_server.first_control_plane (remote-exec): User: root
hcloud_server.first_control_plane (remote-exec): Password: false
hcloud_server.first_control_plane (remote-exec): Private key: true
hcloud_server.first_control_plane (remote-exec): Certificate: false
hcloud_server.first_control_plane (remote-exec): SSH Agent: true
hcloud_server.first_control_plane (remote-exec): Checking Host Key: false
hcloud_server.first_control_plane (remote-exec): Target Platform: unix
hcloud_server.first_control_plane (remote-exec): Connected!
hcloud_server.first_control_plane (remote-exec): + apt-get install -y aria2
hcloud_server.first_control_plane: Still creating... [40s elapsed]
hcloud_server.first_control_plane (remote-exec): Reading package lists... Done
hcloud_server.first_control_plane (remote-exec): Building dependency tree... Done
hcloud_server.first_control_plane (remote-exec): Reading state information... Done
hcloud_server.first_control_plane (remote-exec): The following additional packages will be installed:
hcloud_server.first_control_plane (remote-exec): libaria2-0 libc-ares2
hcloud_server.first_control_plane (remote-exec): The following NEW packages will be installed:
hcloud_server.first_control_plane (remote-exec): aria2 libaria2-0 libc-ares2
hcloud_server.first_control_plane (remote-exec): 0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
hcloud_server.first_control_plane (remote-exec): Need to get 1,571 kB of archives.
hcloud_server.first_control_plane (remote-exec): After this operation, 6,225 kB of additional disk space will be used.
hcloud_server.first_control_plane (remote-exec): Get:1 http://mirror.hetzner.com/debian/packages bullseye/main amd64 libc-ares2 amd64 1.17.1-1+deb11u1 [102 kB]
hcloud_server.first_control_plane (remote-exec): Get:2 http://mirror.hetzner.com/debian/packages bullseye/main amd64 libaria2-0 amd64 1.35.0-3 [1,107 kB]
hcloud_server.first_control_plane (remote-exec): Get:3 http://mirror.hetzner.com/debian/packages bullseye/main amd64 aria2 amd64 1.35.0-3 [362 kB]
hcloud_server.first_control_plane (remote-exec): Fetched 1,571 kB in 0s (4,481 kB/s)
hcloud_server.first_control_plane (remote-exec): Selecting previously unselected package libc-ares2:amd64.
hcloud_server.first_control_plane (remote-exec): (Reading database ... 62163 files and directories currently installed.)
hcloud_server.first_control_plane (remote-exec): Preparing to unpack .../libc-ares2_1.17.1-1+deb11u1_amd64.deb ...
hcloud_server.first_control_plane (remote-exec): Unpacking libc-ares2:amd64 (1.17.1-1+deb11u1) ...
hcloud_server.first_control_plane (remote-exec): Selecting previously unselected package libaria2-0:amd64.
hcloud_server.first_control_plane (remote-exec): Preparing to unpack .../libaria2-0_1.35.0-3_amd64.deb ...
hcloud_server.first_control_plane (remote-exec): Unpacking libaria2-0:amd64 (1.35.0-3) ...
hcloud_server.first_control_plane (remote-exec): Selecting previously unselected package aria2.
hcloud_server.first_control_plane (remote-exec): Preparing to unpack .../aria2_1.35.0-3_amd64.deb ...
hcloud_server.first_control_plane (remote-exec): Unpacking aria2 (1.35.0-3) ...
hcloud_server.first_control_plane (remote-exec): Setting up libc-ares2:amd64 (1.17.1-1+deb11u1) ...
hcloud_server.first_control_plane (remote-exec): Setting up libaria2-0:amd64 (1.35.0-3) ...
hcloud_server.first_control_plane (remote-exec): Setting up aria2 (1.35.0-3) ...
hcloud_server.first_control_plane (remote-exec): Processing triggers for man-db (2.9.4-2) ...
hcloud_server.first_control_plane (remote-exec): Processing triggers for libc-bin (2.31-13+deb11u2) ...
hcloud_server.first_control_plane (remote-exec): + aria2c --follow-metalink=mem https://download.opensuse.org/tumbleweed/appliances/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4
hcloud_server.first_control_plane (remote-exec): 02/13 07:16:42 [NOTICE] Downloading 1 item(s)
hcloud_server.first_control_plane (remote-exec): 02/13 07:16:43 [NOTICE] Download complete: [MEMORY]openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220210.qcow2.meta4
hcloud_server.first_control_plane (remote-exec): 02/13 07:16:54 [NOTICE] Download complete: /root/openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220210.qcow2
hcloud_server.first_control_plane (remote-exec): Download Results:
hcloud_server.first_control_plane (remote-exec): gid |stat|avg speed |path/URI
hcloud_server.first_control_plane (remote-exec): ======+====+===========+=======================================================
hcloud_server.first_control_plane (remote-exec): 19122d|OK | 141KiB/s|[MEMORY]openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220210.qcow2.meta4
hcloud_server.first_control_plane (remote-exec): 3aa26e|OK | 53MiB/s|/root/openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220210.qcow2
hcloud_server.first_control_plane (remote-exec): Status Legend:
hcloud_server.first_control_plane (remote-exec): (OK):download completed.
hcloud_server.first_control_plane (remote-exec): + + grep -ie ^opensuse.*microos.*k3s.*qcow2$
hcloud_server.first_control_plane (remote-exec): ls -a
hcloud_server.first_control_plane (remote-exec): + qemu-img convert -p -f qcow2 -O host_device openSUSE-MicroOS.x86_64-16.0.0-k3s-kvm-and-xen-Snapshot20220210.qcow2 /dev/sda
hcloud_server.first_control_plane (remote-exec): (0.00/100%)
hcloud_server.first_control_plane (remote-exec): (100.00/100%)
hcloud_server.first_control_plane (remote-exec): + sgdisk -e /dev/sda
hcloud_server.first_control_plane (remote-exec): The operation has completed successfully.
hcloud_server.first_control_plane (remote-exec): + parted -s /dev/sda resizepart 4 99%
hcloud_server.first_control_plane (remote-exec): + parted -s /dev/sda mkpart primary ext2 99% 100%
hcloud_server.first_control_plane (remote-exec): + partprobe /dev/sda
hcloud_server.first_control_plane (remote-exec): + udevadm settle
hcloud_server.first_control_plane (remote-exec): + fdisk -l /dev/sda
hcloud_server.first_control_plane (remote-exec): Disk /dev/sda: 38.15 GiB, 40961572864 bytes, 80003072 sectors
hcloud_server.first_control_plane (remote-exec): Disk model: QEMU HARDDISK
hcloud_server.first_control_plane (remote-exec): Units: sectors of 1 * 512 = 512 bytes
hcloud_server.first_control_plane (remote-exec): Sector size (logical/physical): 512 bytes / 512 bytes
hcloud_server.first_control_plane (remote-exec): I/O size (minimum/optimal): 512 bytes / 512 bytes
hcloud_server.first_control_plane (remote-exec): Disklabel type: gpt
hcloud_server.first_control_plane (remote-exec): Disk identifier: EC33AA26-C0DC-4B6C-AF09-4CA8108C7753
hcloud_server.first_control_plane (remote-exec): Device Start End Sectors Size Type
hcloud_server.first_control_plane (remote-exec): /dev/sda1 2048 6143 4096 2M BIOS
hcloud_server.first_control_plane (remote-exec): /dev/sda2 6144 47103 40960 20M EFI
hcloud_server.first_control_plane (remote-exec): /dev/sda3 47104 31438847 31391744 15G Linu
hcloud_server.first_control_plane (remote-exec): /dev/sda4 31438848 79203041 47764194 22.8G Linu
hcloud_server.first_control_plane (remote-exec): /dev/sda5 79204352 80001023 796672 389M Linu
hcloud_server.first_control_plane (remote-exec): + mount /dev/sda4 /mnt/
hcloud_server.first_control_plane (remote-exec): + btrfs filesystem resize max /mnt
hcloud_server.first_control_plane (remote-exec): Resize '/mnt' of 'max'
hcloud_server.first_control_plane (remote-exec): + umount /mnt
hcloud_server.first_control_plane (remote-exec): + mke2fs -L ignition /dev/sda5
hcloud_server.first_control_plane (remote-exec): mke2fs 1.46.2 (28-Feb-2021)
hcloud_server.first_control_plane (remote-exec): Discarding device blocks: done
hcloud_server.first_control_plane (remote-exec): Creating filesystem with 398336 1k blocks and 99960 inodes
hcloud_server.first_control_plane (remote-exec): Filesystem UUID: 8a3cd038-472e-4812-abe5-ad2f7a5980ef
hcloud_server.first_control_plane (remote-exec): Superblock backups stored on blocks:
hcloud_server.first_control_plane (remote-exec): 8193, 24577, 40961, 57345, 73729, 204801, 221185
hcloud_server.first_control_plane (remote-exec): Allocating group tables: done
hcloud_server.first_control_plane (remote-exec): Writing inode tables: done
hcloud_server.first_control_plane (remote-exec): Writing superblocks and filesystem accounting information: done
hcloud_server.first_control_plane (remote-exec): + mount /dev/sda5 /mnt
hcloud_server.first_control_plane (remote-exec): + mkdir /mnt/ignition
hcloud_server.first_control_plane (remote-exec): + cp /root/config.ign /mnt/ignition/config.ign
hcloud_server.first_control_plane (remote-exec): + umount /mnt
hcloud_server.first_control_plane: Provisioning with 'local-exec'...
hcloud_server.first_control_plane (local-exec): Executing: ["/bin/sh" "-c" "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa [email protected] '(sleep 2; reboot)&'; sleep 3"]
hcloud_server.first_control_plane: Still creating... [1m10s elapsed]
hcloud_server.first_control_plane (local-exec): Warning: Permanently added '49.12.221.176' (ECDSA) to the list of known hosts.
hcloud_server.first_control_plane (local-exec): Connection to 49.12.221.176 closed by remote host.
hcloud_server.first_control_plane: Provisioning with 'local-exec'...
hcloud_server.first_control_plane (local-exec): Executing: ["/bin/sh" "-c" "until ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa -o ConnectTimeout=2 [email protected] true 2> /dev/null\ndo\n echo \"Waiting for MicroOS to reboot and become available...\"\n sleep 2\ndone\n"]
hcloud_server.first_control_plane (local-exec): Waiting for MicroOS to reboot and become available...
hcloud_server.first_control_plane: Still creating... [1m20s elapsed]
hcloud_server.first_control_plane: Still creating... [1m50s elapsed]
hcloud_server.first_control_plane (local-exec): Waiting for MicroOS to reboot and become available...
hcloud_server.first_control_plane: Provisioning with 'file'...
hcloud_server.first_control_plane: Still creating... [2m0s elapsed]
hcloud_server.first_control_plane: Still creating... [6m50s elapsed]
╷
│ Error: file provisioner error
│
│ with hcloud_server.first_control_plane,
│ on master.tf line 54, in resource "hcloud_server" "first_control_plane":
│ 54: provisioner "file" {
│
│ timeout - last error: dial tcp 49.12.221.176:22: connect: operation timed out
╵
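The until-loop in the local-exec output above is a generic retry-until-success pattern. A minimal standalone sketch, with the `ssh ... true` probe replaced by a stub (`probe`, a hypothetical function that fails twice before succeeding):

```shell
#!/bin/sh
# Retry-until-success loop, as in the provisioner's local-exec script.
# `probe` stands in for `ssh -o ConnectTimeout=2 root@<ip> true`.
attempts=0
probe() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]    # succeed on the third try
}

until probe 2> /dev/null
do
  echo "Waiting for MicroOS to reboot and become available..."
  sleep 1   # the real script waits 2-3 seconds between probes
done
echo "Host is up after $attempts attempts"
```

The loop only exits when the probe returns success, which is why a host that never comes back up (as in the timeout error above) leaves Terraform hanging until the provisioner's own timeout fires.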
Hi again :) So... I thought I understood, but I continue to have issues setting up simple ingress routes. I understand if this is out of scope for the kube-hetzner project, as it may well just be a Traefik configuration that I don't understand. In this scenario, I have installed the whoami helm chart:
helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm install my-release cowboysysop/whoami
And I can see the ingress, service, and pod all deployed correctly into the default namespace. Port-forwarding to the service or pod displays the whoami information, but the external route (https://whoami.site.com) returns an empty reply, as if the request was never 'caught' by the ingress. I keep getting these messages in the Traefik pod's log:
level=error msg="Skipping service: no endpoints found" ingress=whoami-1645104801 namespace=default serviceName=whoami-1645104801 servicePort="&ServiceBackendPort{Name:,Number:80,}" providerName=kubernetes
As always, any help is much appreciated.
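Traefik's "no endpoints found" usually means the Service's selector matches no ready pods (a label mismatch, or pods failing their readiness probe). A quick way to check against a live cluster (the release name below is taken from the log line above; adjust to yours):

```sh
# Does the Service have any endpoints at all?
kubectl get endpoints whoami-1645104801 -n default
# Compare the Service's selector with the actual pod labels
kubectl get svc whoami-1645104801 -n default -o jsonpath='{.spec.selector}'
kubectl get pods -n default --show-labels
```

If the endpoints list is empty while the pod is Running, the selector and the pod labels do not match, or the pod is not yet Ready.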
@mnencia @phaer At some point I was getting these errors:
It wouldn't even let me curl the links. nslookup would return the IP, and from my personal machine that IP responds over HTTPS, but from the node, total silence. In other words, the IP had downloaded so much that it was blacklisted.
I had to temporarily "host" the meta4 file over at https://raw.githubusercontent.com/kube-hetzner/kube-hetzner/staging/.files/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4
It works like a charm and is 10x faster to download (I have no idea why 🤯); the only downside is that it would be hard to keep the images up to date that way.
Hello, after a fresh terraform apply at Hetzner, all my pods are stuck with the following error message:
0/4 nodes are available: 4 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
Running the command:
kubectl taint nodes --all node.cloudprovider.kubernetes.io/uninitialized-
fixed the problem, but the load balancer is not created.
Do you know how I can fix it, please?
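The `uninitialized` taint is normally removed by the Hetzner cloud-controller-manager once it starts and can reach the hcloud API; if the CCM never comes up (for example because its token secret is missing or invalid), the taint stays and no load balancer is created, so removing the taint by hand only masks the underlying problem. A few checks worth running (the deployment and secret names below are the usual hcloud-ccm defaults; adjust if your install differs):

```sh
# Is the CCM running, and what does it log?
kubectl -n kube-system get pods -l app=hcloud-cloud-controller-manager
kubectl -n kube-system logs deployment/hcloud-cloud-controller-manager
# Does the token secret exist? (usual default name: hcloud)
kubectl -n kube-system get secret hcloud
```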
Hi - first, THANK YOU so much for all the effort you've put into this repository @mysticaltech. I am very new to Kubernetes and the whole k3s ecosystem, and your effort to help others is really wonderful. Kudos to you, sir :)
Second - I have a small issue with my TLS and I cannot get it to work. I simply want blog.domain.com to be secured, and though there are lots of ways to get a cert, I'm simply trying to use a wildcard of my own. I've successfully created this using:
kubectl create secret tls domain-tls --cert ./domain.crt --key ./domain.key
And I have the ingress set up like this (bound to a ghost deployment):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: domain-ingress-blog
spec:
  tls:
    - hosts:
        - blog.domain.com
      secretName: domain-tls
  rules:
    - host: blog.domain.com
      http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: domain-blog-ghost
                port:
                  number: 80
And I've created an A record pointing the blog.domain.com to the traefik load balancer IP with my dns provider.
I am missing something, though, because the default Traefik cert is always shown when I hit blog.domain.com:
My Traefik rendered yaml template is below, resulting from the terraform apply:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    service:
      enabled: true
      type: LoadBalancer
      annotations:
        "load-balancer.hetzner.cloud/name": "traefik"
        # make hetzners load-balancer connect to our nodes via our private k3s-net.
        "load-balancer.hetzner.cloud/use-private-ip": "true"
        # keep hetzner-ccm from exposing our private ingress ip, which in general isn't routeable from the public internet.
        "load-balancer.hetzner.cloud/disable-private-ingress": "true"
        # disable ipv6 by default, because external-dns doesn't support AAAA for hcloud yet https://github.com/kubernetes-sigs/external-dns/issues/2044
        "load-balancer.hetzner.cloud/ipv6-disabled": "false"
        "load-balancer.hetzner.cloud/location": "ash"
        "load-balancer.hetzner.cloud/type": "lb11"
        "load-balancer.hetzner.cloud/uses-proxyprotocol": "true"
        # "load-balancer.hetzner.cloud/http-redirect-http": "true"
        "load-balancer.hetzner.cloud/http-sticky-sessions": "true"
    additionalArguments:
      - "--entryPoints.web.proxyProtocol.trustedIPs=127.0.0.1/32,10.0.0.0/8"
      - "--entryPoints.websecure.proxyProtocol.trustedIPs=127.0.0.1/32,10.0.0.0/8"
      - "--entryPoints.web.forwardedHeaders.trustedIPs=127.0.0.1/32,10.0.0.0/8"
      - "--entryPoints.websecure.forwardedHeaders.trustedIPs=127.0.0.1/32,10.0.0.0/8"
Any help would be greatly appreciated. Thanks again :)
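On the default-certificate symptom: Traefik falls back to its built-in default certificate whenever it cannot resolve the secret referenced by an Ingress, most commonly because the secret lives in a different namespace than the Ingress (TLS secrets are not shared across namespaces). A sketch of what the secret should look like, assuming the Ingress above sits in the `default` namespace:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: domain-tls
  namespace: default   # must match the Ingress's namespace
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded key>
```

If the secret is missing or malformed, the Traefik pod's log usually contains an error naming that secret, which is a quick way to confirm the cause.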
When I set up a cluster, the servers do not appear to be created with a placement group of type "spread".
This should be common practice, though, to maximise availability should a host machine fail.
There is a fairly recent tutorial on Hetzner Community that mentions this.
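For reference, a sketch of how a spread placement group could be wired up (resource names here are illustrative, not the module's actual code):

```terraform
resource "hcloud_placement_group" "k3s" {
  name = "k3s"
  type = "spread"   # spread servers across different physical hosts
}

resource "hcloud_server" "control_plane" {
  name               = "k3s-control-plane-0"
  server_type        = "cpx11"
  image              = "ubuntu-20.04"
  placement_group_id = hcloud_placement_group.k3s.id
}
```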
I am trying to create a small cluster with 1 control plane and 2 agents. I already increased the timeout of the bash script to 500 and I'm still having the issue. I also tried creating a new project and generating a new API token, but the result is the same.
Here's the loop output:
null_resource.first_control_plane: Still creating... [10m50s elapsed]
null_resource.first_control_plane (remote-exec): Waiting for load-balancer to get an IP...
null_resource.first_control_plane (remote-exec): Waiting for load-balancer to get an IP...
null_resource.first_control_plane (remote-exec): Waiting for load-balancer to get an IP...
Here is my vars file
location = "fsn1"
network_region = "eu-central"
agent_server_type = "cpx21"
control_plane_server_type = "cpx11"
lb_server_type = "lb11"
servers_num = 1
agents_num = 2
We need to remove SSH password auth, ideally through Ignition, or if that's not possible, through Combustion. We should also do some basic hardening of that service if needed.
First and foremost, we need to find the location of the SSH config file.
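For context, MicroOS ships a standard OpenSSH server, so assuming the image honors the usual `sshd_config.d` include (worth verifying on a node), the hardening written out via Ignition/Combustion could be a small drop-in like:

```
# /etc/ssh/sshd_config.d/50-hardening.conf — hypothetical drop-in; all options are standard OpenSSH
PasswordAuthentication no
KbdInteractiveAuthentication no
PermitRootLogin prohibit-password
```

Otherwise the main config itself would be at /etc/ssh/sshd_config.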
Recently Rancher, the creator of k3s and k3os, was bought by SUSE, and in the process official support for k3os was dropped (k3s, on the other hand, is thriving and has been separated from Rancher).
I went on to contact Jacob Blain Christen, the lead maintainer of k3os, and he told me that he'll continue to do releases on the weekends and that the project could live on if the community maintained it.
However, that is not a stable backing for this project, so I did my own research and concluded that openSUSE MicroOS has HUGE backing, as it piggybacks on Tumbleweed, a major openSUSE distro, and has stable and automated transactional updates. As such, it's now the best OS to replace k3os.
What I did:
git clone https://github.com/kube-hetzner/kube-hetzner.git
git checkout staging
export TF_VAR_hcloud_token="my-hcloud-token"
Create a terraform.tfvars file:
# You need to replace these
public_key = "~/.ssh/id_ed25519.pub"
# Must be "private_key = null" when you want to use ssh-agent, for Yubikey-like device auth or an SSH key-pair with a passphrase
private_key = "~/.ssh/id_ed25519"
# These can be customized, or left with the default values
# For Hetzner locations see https://docs.hetzner.com/general/others/data-centers-and-connection/
# For Hetzner server types see https://www.hetzner.com/cloud
location = "nbg1" # change to `ash` for us-east Ashburn, Virginia location
network_region = "eu-central" # change to `us-east` if location is ash
agent_server_type = "cpx21"
control_plane_server_type = "cpx31"
lb_server_type = "lb11"
servers_num = 3
agents_num = 0
# If you want to use a specific Hetzner CCM and CSI version, set them below, otherwise leave as is for the latest versions
# hetzner_ccm_version = ""
# hetzner_csi_version = ""
# If you want to kustomize the Hetzner CCM and CSI containers with the "latest" tags and imagePullPolicy Always,
# to have them automatically update when the nodes themselves get updated via the Rancher system upgrade controller, the default is "false".
# If you choose to keep the default of "false", you can always use ArgoCD to monitor the CSI and CCM manifest for new releases,
# that is probably the more "vanilla" option to keep these components always updated.
# hetzner_ccm_containers_latest = true
# hetzner_csi_containers_latest = true
# If you want to use Let's Encrypt with the TLS challenge, the email address is used to send you certificate expiration notices
traefik_acme_tls = true
traefik_acme_email = "[email protected]"
# If you want to allow non-control-plane workloads to run on the control-plane nodes set "true" below. The default is "false".
allow_scheduling_on_control_plane = true
What happened:
All resources are properly scheduled, but the load balancer does not point to the control planes.
What I expect:
I expect a reference from the load balancer to the control planes.
Hi,
it would be great if we could enable IPv6 on the load balancer via a variable and get the v6 address just like the v4 (hcloud_load_balancer.traefik.ipv6)
Thanks
and great project!
Hi @mysticaltech !
Finally got time to test this beast repo this weekend ;)
This might sound a bit silly, but I'm kind of stuck on the remote-exec for the initialization of the first_control_plane.
I did clone and create a new terraform.tfvars
with the corresponding token, public key and private key.
The server spins up fine, but Terraform seems unable to connect to it.
However, if I manually connect from the host with ssh [email protected], this works!
Let me know what I'm missing - and great stuff! thank you for sharing this repo.
The TLS example configures a secret while at the same time providing annotations to let Traefik request a certificate from Let's Encrypt:
https://github.com/kube-hetzner/kube-hetzner/blob/6f6de884ec1baace14b894e5ee1917ffa947e1ca/examples/tls/ingress.yaml#L12
If I understand this correctly, the line referencing the secret is wrong here? If that's the case, I could open a pull request to fix the documentation and example.
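If the conclusion is that the annotations alone should drive the certificate, a minimal version of that example might drop the secretName entirely and keep only the resolver annotation (the resolver name, host, and service below are placeholders; the annotation key is from Traefik's Kubernetes ingress docs):

```yaml
# Hypothetical trimmed-down ingress: TLS via Traefik's ACME resolver, no pre-created secret
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    traefik.ingress.kubernetes.io/router.tls.certresolver: "le"  # placeholder resolver name
spec:
  rules:
    - host: example.com  # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-service  # placeholder
                port:
                  number: 80
```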
Running terraform apply on an already existing cluster fails because IP changes:
command:
terraform apply -var-file=prod.tfvars -var hcloud_token="REDACTED"
error:
module.kubernetes.hcloud_server.first_control_plane: Modifying... [id=18032205]
╷
│ Error: hcloud/updateServerInlineNetworkAttachments: hcloud/inlineAttachServerToNetwork: attach server to network: provided IP is not available (ip_not_available)
│
│ with module.kubernetes.hcloud_server.first_control_plane,
│ on .terraform/modules/kubernetes/master.tf line 1, in resource "hcloud_server" "first_control_plane":
│ 1: resource "hcloud_server" "first_control_plane" {
│
plan output:
$ terraform plan -var-file=prod.tfvars -var hcloud_token="REDACTED"
module.kubernetes.hcloud_ssh_key.k3s: Refreshing state... [id=5585976]
module.kubernetes.random_password.k3s_token: Refreshing state... [id=none]
module.kubernetes.hcloud_network.k3s: Refreshing state... [id=REDACTED]
module.kubernetes.hcloud_placement_group.k3s: Refreshing state... [id=22200]
module.kubernetes.hcloud_firewall.k3s: Refreshing state... [id=303856]
module.kubernetes.hcloud_network_subnet.k3s: Refreshing state... [id=REDACTED-10.0.0.0/16]
module.kubernetes.hcloud_server.first_control_plane: Refreshing state... [id=18032205]
module.kubernetes.hcloud_server.control_planes[0]: Refreshing state... [id=18036735]
module.kubernetes.hcloud_server.agents[2]: Refreshing state... [id=18076430]
module.kubernetes.hcloud_server.agents[0]: Refreshing state... [id=18032235]
module.kubernetes.hcloud_server.agents[1]: Refreshing state... [id=18036736]
module.kubernetes.hcloud_server.control_planes[1]: Refreshing state... [id=18032231]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
<= read (data resources)
Terraform will perform the following actions:
# module.kubernetes.data.remote_file.kubeconfig will be read during apply
# (config refers to values not yet known)
<= data "remote_file" "kubeconfig" {
+ content = (known after apply)
+ id = (known after apply)
+ path = "/etc/rancher/k3s/k3s.yaml"
+ conn {
+ agent = (sensitive)
+ host = "REDACTED"
+ port = 22
+ user = "root"
}
}
# module.kubernetes.hcloud_server.agents[0] will be updated in-place
~ resource "hcloud_server" "agents" {
id = "18032235"
name = "k3s-agent-0"
# (17 unchanged attributes hidden)
- network {
- alias_ips = [] -> null
- ip = "10.0.1.1" -> null
- mac_address = "86:00:00:04:6a:7d" -> null
- network_id = REDACTED -> null
}
+ network {
+ alias_ips = []
+ ip = "10.0.2.1"
+ mac_address = (known after apply)
+ network_id = REDACTED
}
}
# module.kubernetes.hcloud_server.agents[1] will be updated in-place
~ resource "hcloud_server" "agents" {
id = "18036736"
name = "k3s-agent-1"
# (17 unchanged attributes hidden)
- network {
- alias_ips = [] -> null
- ip = "10.0.1.2" -> null
- mac_address = "86:00:00:04:70:79" -> null
- network_id = REDACTED -> null
}
+ network {
+ alias_ips = []
+ ip = "10.0.2.2"
+ mac_address = (known after apply)
+ network_id = REDACTED
}
}
# module.kubernetes.hcloud_server.agents[2] will be updated in-place
~ resource "hcloud_server" "agents" {
id = "18076430"
name = "k3s-agent-2"
# (17 unchanged attributes hidden)
- network {
- alias_ips = [] -> null
- ip = "10.0.1.3" -> null
- mac_address = "86:00:00:04:9c:3f" -> null
- network_id = REDACTED -> null
}
+ network {
+ alias_ips = []
+ ip = "10.0.2.3"
+ mac_address = (known after apply)
+ network_id = REDACTED
}
}
# module.kubernetes.hcloud_server.control_planes[0] will be updated in-place
~ resource "hcloud_server" "control_planes" {
id = "18036735"
name = "k3s-control-plane-1"
# (17 unchanged attributes hidden)
- network {
- alias_ips = [] -> null
- ip = "10.0.0.3" -> null
- mac_address = "86:00:00:04:70:78" -> null
- network_id = REDACTED -> null
}
+ network {
+ alias_ips = []
+ ip = "10.0.1.2"
+ mac_address = (known after apply)
+ network_id = REDACTED
}
}
# module.kubernetes.hcloud_server.control_planes[1] will be updated in-place
~ resource "hcloud_server" "control_planes" {
id = "18032231"
name = "k3s-control-plane-2"
# (17 unchanged attributes hidden)
- network {
- alias_ips = [] -> null
- ip = "10.0.0.4" -> null
- mac_address = "86:00:00:04:6a:7a" -> null
- network_id = REDACTED -> null
}
+ network {
+ alias_ips = []
+ ip = "10.0.1.3"
+ mac_address = (known after apply)
+ network_id = REDACTED
}
}
# module.kubernetes.hcloud_server.first_control_plane will be updated in-place
~ resource "hcloud_server" "first_control_plane" {
id = "18032205"
name = "k3s-control-plane-0"
# (17 unchanged attributes hidden)
+ network {
+ alias_ips = []
+ ip = "10.0.1.1"
+ mac_address = (known after apply)
+ network_id = REDACTED
}
}
# module.kubernetes.local_file.kubeconfig will be created
+ resource "local_file" "kubeconfig" {
+ directory_permission = "0777"
+ file_permission = "600"
+ filename = "kubeconfig.yaml"
+ id = (known after apply)
+ sensitive_content = (sensitive value)
}
Hi all - interestingly, I've left the k3s cluster running for some time, and twice now the cluster has become completely unreachable. This happens after a couple of hours, but I am not sure how many. I think it's somehow tied to the auto-rebooting nature of kured,
but that's a guess. If I restart the servers one by one via the Hetzner UI, the cluster comes back online.
This is the error I'm getting:
The connection to the server 5.161.69.37:6443 was refused - did you specify the right host or port?
And when I SSH into that box, this is the status I see for the k3s service:
static:~ # systemctl status k3s-server.service
× k3s-server.service - Lightweight Kubernetes
Loaded: loaded (/usr/lib/systemd/system/k3s-server.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2022-02-18 00:27:29 UTC; 2h 16min ago
Docs: https://k3s.io
Process: 1478 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 1484 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 1485 ExecStart=/usr/bin/k3s server ${SERVER_OPTS} (code=exited, status=2)
Main PID: 1485 (code=exited, status=2)
Tasks: 82
CPU: 1h 2min 31.811s
CGroup: /system.slice/k3s-server.service
├─2215 /usr/sbin/containerd-shim-runc-v2 -namespace k8s.io -id e579d5aad3973a0ce14cbb971f1415469a925718cc2ed07f3556c70688f631f9 -address /run/k3s/containerd/containerd.sock
├─2218 /usr/sbin/containerd-shim-runc-v2 -namespace k8s.io -id a8b619e61d6205f9eb9ae4b7714dc9e5ed68042dcdac8230072682e2f984e972 -address /run/k3s/containerd/containerd.sock
├─2432 /usr/sbin/containerd-shim-runc-v2 -namespace k8s.io -id a31940dafe0a541b55356cdaf11845155b4261ea756ecc253eb4f3d17bacb571 -address /run/k3s/containerd/containerd.sock
├─2513 /usr/sbin/containerd-shim-runc-v2 -namespace k8s.io -id 6b6eb08633e07d89b0485d57bff93bdfd4f768e8da5d8e87edb8bcb58d7c7086 -address /run/k3s/containerd/containerd.sock
├─2656 /usr/sbin/containerd-shim-runc-v2 -namespace k8s.io -id 49b259381dae18c8113ff298eeab50130d6c6836aca4c2a42d57bb577bd0687d -address /run/k3s/containerd/containerd.sock
└─2856 /usr/sbin/containerd-shim-runc-v2 -namespace k8s.io -id a2b2f399521c8270574ab6f5b505806a4775ff4a76a44af53d87b534503a2088 -address /run/k3s/containerd/containerd.sock
Feb 18 00:27:29 static k3s[1485]: /home/abuild/rpmbuild/BUILD/k3s-1.22.3-k3s1/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:272 +0x745
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Failed with result 'exit-code'.
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Unit process 2215 (containerd-shim) remains running after unit stopped.
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Unit process 2218 (containerd-shim) remains running after unit stopped.
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Unit process 2432 (containerd-shim) remains running after unit stopped.
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Unit process 2513 (containerd-shim) remains running after unit stopped.
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Unit process 2656 (containerd-shim) remains running after unit stopped.
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Unit process 2856 (containerd-shim) remains running after unit stopped.
Feb 18 00:27:29 static systemd[1]: k3s-server.service: Consumed 1h 2min 29.664s CPU time.
Happy to dig in to find more detail, but I am pretty new to k3s (and k8s). I would suggest leaving a cluster running for a bit and seeing if this also happens to you. My terraform.tfvars
is pretty simple:
# You need to replace these
hcloud_token = "xxxx"
public_key = "./xxx-kube.pub"
# Must be "private_key = null" when you want to use ssh-agent, for Yubikey-like device auth or an SSH key-pair with a passphrase
private_key = "./xxx-kube"
# These can be customized, or left with the default values
# For Hetzner locations see https://docs.hetzner.com/general/others/data-centers-and-connection/
# For Hetzner server types see https://www.hetzner.com/cloud
location = "ash" # change to `ash` for us-east Ashburn, Virginia location
network_region = "us-east" # change to `us-east` if location is ash
agent_server_type = "cpx31"
control_plane_server_type = "cpx11"
lb_server_type = "lb11"
# At least 3 server nodes is recommended for HA, otherwise you need to turn off automatic upgrade (see ReadMe).
servers_num = 3
# For agent nodes, at least 2 is recommended for HA, but you can keep automatic upgrades.
agents_num = 3
# If you want to use a specific Hetzner CCM and CSI version, set them below, otherwise leave as is for the latest versions
# hetzner_ccm_version = ""
# hetzner_csi_version = ""
# If you want to allow non-control-plane workloads to run on the control-plane nodes set "true" below. The default is "false".
# allow_scheduling_on_control_plane = true
As always, thanks for the help! I hope this is just something with my setup, and not universal, but I thought I should report it now that I've seen it happen twice.
Hello, when I run the following command:
terraform apply -auto-approve
I get this error:
This is my configuration:
# Only the first values starting with a * are obligatory, the rest can remain with their default values, or you
# could adapt them to your needs.
#
# Note that some values, notably "location" and "public_key", have no effect after the initial cluster has been set up.
# This is in order to keep Terraform from re-provisioning all nodes at once, which would lose data. If you want to update
# those, you should instead change the value here and then manually re-provision each node one by one. Grep for "lifecycle".
# * Your Hetzner project API token
hcloud_token = "🤐"
# * Your public key
public_key = "id_rsa.pub"
# * Your private key; must be "private_key = null" when you want to use ssh-agent, for Yubikey-like device auth or an SSH key-pair with a passphrase
private_key = "id_rsa"
# These can be customized, or left with the default values
# For Hetzner locations see https://docs.hetzner.com/general/others/data-centers-and-connection/
# For Hetzner server types see https://www.hetzner.com/cloud
location = "fsn1" # change to `ash` for us-east Ashburn, Virginia location
network_region = "eu-central" # change to `us-east` if location is ash
# You can have as many subnets as you want (preferably in the form of 10.X.0.0/16),
# their primary use is to logically separate the nodes.
# The control_plane network is mandatory.
network_ipv4_subnets = {
control_plane = "10.1.0.0/16"
agent_big = "10.2.0.0/16"
agent_small = "10.3.0.0/16"
}
# At least 3 server nodes is recommended for HA, otherwise you need to turn off automatic upgrade (see ReadMe).
# As per the Rancher docs, it must always be an odd number, never even! See https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
# For instance, 1 is ok (non-HA), 2 not ok, 3 is ok (becomes HA).
control_plane_count = 3
# The type of control plane nodes, see https://www.hetzner.com/cloud, the minimum instance supported is cpx11 (just a few cents more than cx11)
control_plane_server_type = "cpx11"
# As for the agent nodepools, below is just an example. If you do not want multiple nodepools, just use one,
# and change the name to whatever you want; it need not be "agent-big" or "agent-small". Also give each the subnet you prefer.
# For single-node clusters, set this equal to {}
agent_nodepools = {
# agent-big = {
# server_type = "cpx21",
# count = 1,
# subnet = "agent_big",
# }
agent-small = {
server_type = "cpx11",
count = 2,
subnet = "agent_small",
}
}
# That will depend on how much load you want it to handle, see https://www.hetzner.com/cloud/load-balancer
load_balancer_type = "lb11"
### The following values are fully optional
# It's best to leave the network range as is, unless you know what you are doing. The default is "10.0.0.0/8".
# network_ipv4_range = "10.0.0.0/8"
# If you want to use a specific Hetzner CCM and CSI version, set them below, otherwise leave as is for the latest versions
# hetzner_ccm_version = ""
# hetzner_csi_version = ""
# If you want to use Let's Encrypt with the TLS challenge, the email address is used to send you certificate expiration notices
traefik_acme_tls = true
traefik_acme_email = "🤐"
# If you want to allow non-control-plane workloads to run on the control-plane nodes set "true" below. The default is "false".
# Also good for single node clusters.
/* allow_scheduling_on_control_plane = true */
# If you want to disable automatic upgrade of k3s, you can set this to false, default is "true".
# automatically_upgrade_k3s = false
# Allows you to specify either stable, latest, or testing (defaults to stable), see https://rancher.com/docs/k3s/latest/en/upgrades/basic/
# initial_k3s_channel = "latest"
# Adding extra firewall rules, like opening a port
# In this example we allow TCP port 5432 for a Postgres service that we will expose via a NodePort
# More info on the format here https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs/resources/firewall
# extra_firewall_rules = [
# {
# direction = "in"
# protocol = "tcp"
# port = "5432"
# source_ips = [
# "0.0.0.0/0"
# ]
# },
# ]
Do you know how I can fix this problem, please?
Hi - thanks so much for this project. When I attempt to deploy to the ash
region using the instructions, I get this after the terraform apply -auto-approve
command:
Error: hcloud/inlineAttachServerToNetwork: attach server to network: no subnet or IP available (service_error)
with hcloud_server.first_control_plane,
on master.tf line 1, in resource "hcloud_server" "first_control_plane":
1: resource "hcloud_server" "first_control_plane" {
Hi,
I'm trying to deploy a cluster with the same config as the template, but when the servers reboot, the deployment is unable to connect to them via SSH.
module.control_planes[1].hcloud_server.server (remote-exec): Connecting to remote host via SSH...
module.control_planes[1].hcloud_server.server (remote-exec):   Host: XXXXX
module.control_planes[1].hcloud_server.server (remote-exec):   User: root
module.control_planes[1].hcloud_server.server (remote-exec):   Password: false
module.control_planes[1].hcloud_server.server (remote-exec):   Private key: true
module.control_planes[1].hcloud_server.server (remote-exec):   Certificate: false
module.control_planes[1].hcloud_server.server (remote-exec):   SSH Agent: true
module.control_planes[1].hcloud_server.server (remote-exec):   Checking Host Key: false
module.control_planes[1].hcloud_server.server (remote-exec):   Target Platform: unix
module.agents["agent-small-0"].hcloud_server.server: Still creating... [4m30s elapsed]
module.agents["agent-big-0"].hcloud_server.server: Still creating... [4m30s elapsed]
With https://console.hetzner.cloud/projects/.../security/certificates there is a convenient interface for managing HTTPS certificates across different contexts.
Is there any way to use them with kube-hetzner?
So I spun up a cluster with 3 control planes and 2 agents, then noticed that only 2 servers and 2 agents were present. But hcloud server list
was listing them all.
It turns out one failed to join because of "too many learner", as follows:
So I issued systemctl start k3s-server
another time and it worked. Meaning we have to wait and make sure the servers have actually started, retrying if necessary, before returning success.
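The "retry before returning success" idea above could be sketched as a tiny helper (hypothetical, not the repo's actual code): re-run a check until it succeeds or attempts run out.

```shell
#!/bin/sh
# retry <attempts> <command...> — minimal sketch of a retry-until-success helper.
retry() {
  attempts=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "failed after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
}

# e.g. after starting the service, only report success once it is really active:
#   systemctl start k3s-server
#   retry 20 systemctl is-active --quiet k3s-server
```

The systemctl usage at the end is the intended application; the helper itself is generic.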
@mnencia @phaer Had a very interesting conversation with Richard Brown, he says no RPM is needed and that the btrfs sub-volumes are writable so we can just swap the binary, and voila!
So we can go back to vanilla MicroOS and just use the k3s binaries from https://github.com/k3s-io/k3s/releases as is. Maybe have a timer that checks for a new release; if there is one, touch /var/run/reboot-required, Kured drains the node and reboots it, and on reboot a small script does the swap :)
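The timer idea could be sketched like this (hypothetical, not an actual implementation; in real use `current` would come from `k3s --version` and `latest` from the k3s GitHub releases API — both are plain parameters here so the logic stays self-contained):

```shell
#!/bin/sh
# Flag a reboot when a newer k3s release exists than the one running.
flag_reboot_if_new() {
  current=$1
  latest=$2
  flag=${3:-/var/run/reboot-required}
  if [ -n "$latest" ] && [ "$current" != "$latest" ]; then
    # Kured watches this file: it drains the node and reboots it; on reboot a
    # small script would swap in the freshly downloaded binary.
    touch "$flag"
    echo "k3s $latest available (running $current): reboot flagged"
  fi
}
```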
Let's see - I will try to give it a shot this weekend, but please do not hesitate if you feel inspired.
Also, welcome to the team if you'll accept, just sent you the invitation :) 🍾
Hi there,
thank you very much for your awesome help in my last ticket regarding the TLS configuration. It works now :)
Now I'm wondering how exposing a service with a NodePort works. If I understand it correctly, the firewall and load balancer should get configured to forward the port once I create a NodePort Service in my cluster, but that does not happen.
Is my assumption correct? Is NodePort not possible with kube-hetzner?
Thank you very much
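For what it's worth, the Hetzner cloud controller manager only provisions a cloud load balancer for Services of type LoadBalancer; a plain NodePort on its own creates neither an LB entry nor a firewall opening (for the latter, see the extra_firewall_rules example elsewhere in this project). A sketch of the LoadBalancer route, with names and ports as placeholders:

```yaml
# Hypothetical Service that makes hcloud-ccm create and configure a Hetzner LB
apiVersion: v1
kind: Service
metadata:
  name: my-app  # placeholder
  annotations:
    load-balancer.hetzner.cloud/location: "fsn1"
spec:
  type: LoadBalancer
  selector:
    app: my-app  # placeholder
  ports:
    - port: 443
      targetPort: 8443  # placeholder container port
```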
Hi,
I really appreciate your work and have successfully created my own k8s cluster on the Hetzner cloud :)
Now I wanted to add TLS/HTTPS support and have the TLS connection terminate on the load balancer. Automatically retrieved certificates from Let's Encrypt seem fine. However, the load balancer does not seem to work when I change its service
from
"[tcp] 443 -> 31028"
to
"[https] 443 -> 30468"
I have completely removed the tcp service for port 80, because I think I will not need it.
The loadbalancer shows 'unhealthy' for this service and I cannot access any ingress anymore.
Can someone please advise me on how to achieve TLS support with the Hetzner load balancer and Traefik ingress? :) Thanks!
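One possible direction, assuming Traefik stays behind a CCM-managed Service: hcloud-cloud-controller-manager exposes annotations for terminating TLS on the load balancer itself, referencing a certificate from the Hetzner console. A sketch (the certificate reference is a placeholder; double-check the annotation keys against the CCM docs for your version):

```yaml
# Hypothetical annotations on the Traefik Service for LB-side TLS termination
metadata:
  annotations:
    load-balancer.hetzner.cloud/protocol: "https"
    load-balancer.hetzner.cloud/http-certificates: "my-cert"  # placeholder certificate name/ID
```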
commit e7f016f
staging is not available for me:
hcloud_server.first_control_plane (remote-exec): 02/10 12:20:52 [ERROR] CUID#7 - Download aborted. URI=https://raw.githubusercontent.com/kube-hetzner/kube-hetzner/staging/.files/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4
hcloud_server.first_control_plane (remote-exec): Exception: [AbstractCommand.cc:351] errorCode=3 URI=https://raw.githubusercontent.com/kube-hetzner/kube-hetzner/staging/.files/openSUSE-MicroOS.x86_64-k3s-kvm-and-xen.qcow2.meta4
hcloud_server.first_control_plane (remote-exec): -> [HttpSkipResponseCommand.cc:218] errorCode=3 Resource not found
I replaced staging with master; now it is running.
Hi!
For users that don't have the repo added to helm:
provisioner "local-exec" {
command = "helm repo add cilium https://helm.cilium.io/ "
}
Sometimes Hetzner nodes just fail to enter the rescue mode and the node stays off.
Suggest tainting the master node (control-plane-0) with node-role.kubernetes.io/master=true:NoSchedule,
as currently it is not tainted.
I.e. adding
# Taint Control Plane as master node to avoid scheduling of workloads here
provisioner "local-exec" {
command = <<-EOT
kubectl taint nodes "${self.name}" node-role.kubernetes.io/master=true:NoSchedule
EOT
}
to master.tf
Because it is not tainted, workload pods can get placed on control-plane-0
and disrupt the cluster
(e.g. a Cassandra or HDFS pod placed on control-plane-0
disrupts the cluster with almost 100% certainty).
Hello folks, there was an error in the definition of the agents launched from Feb 10 to Feb 15. Here's the fix; you have two options:
1/ Scale down to 0 agents, apply, then scale back up and apply again.
2/ Log in via SSH to each agent and issue a few commands to fix them:
hcloud server list
ssh root@IP -i ~/.ssh/id_ed25519 -o StrictHostKeyChecking=no
systemctl disable k3s-server
systemctl stop k3s-server
systemctl --now enable k3s-agent
In its current form, it's only possible to create a cluster with equally sized nodes.
It would be great to be able to have differently sized nodes, like this:
pools:
- id: "memory-pool"
count: 3
size: CX51
- id: "worker-pool"
count: 8
size: CX11
If we want to go further, it may be possible to spread the cluster across different physical locations too. But I think that would be a really hard nut to crack, because of private networks and so forth.
pools:
- id: "memory-pool"
location: "fsn1"
count: 3
size: CX51
- id: "worker-pool"
location: "hel1"
count: 8
size: CX11
If you get the following error:
Error: hcloud/setRescue: hcclient/WaitForActions: action 347680047 failed: Unknown Error (unknown_error)
Please run terraform apply -auto-approve
again. It happens because, on rare occasions, Hetzner Cloud takes a while to enter rescue mode.
[Fixed] k3s failed to start, see journalctl -u k3s; that error sometimes happens on first_control_plane when the eth1 network interface is not present. This bug is rare enough, and we believe it comes from Hetzner, randomly.
If it happens, destroy and re-apply with Terraform.
After running the cluster creation I get this:
╷
│ Error: local-exec provisioner error
│
│ with hcloud_server.first_control_plane,
│ on master.tf line 44, in resource "hcloud_server" "first_control_plane":
│ 44: provisioner "local-exec" {
│
│ Error running command 'sleep 60 && ping 138.201.89.68 | grep --line-buffered "bytes from" | head -1 && sleep 100 &&
│ scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./keys/id_rsa
│ [email protected]:/etc/rancher/k3s/k3s.yaml ./kubeconfig.yaml
│ sed -i -e 's/127.0.0.1/138.201.89.68/g' ./kubeconfig.yaml
│ ': exit status 1. Output: 'sleep' is not recognized as an internal or external command,
│ operable program or batch file.
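For context, the "'sleep' is not recognized" message means the local-exec provisioner ran the command through cmd.exe, the default shell on Windows. One possible workaround — assuming Git Bash is installed at its default path, which is an assumption about your machine — is local-exec's documented interpreter argument:

```hcl
# Hypothetical override: run local-exec commands under a POSIX shell on Windows
provisioner "local-exec" {
  interpreter = ["C:/Program Files/Git/bin/bash.exe", "-c"]
  command     = "sleep 60 && ..."  # the original command, unchanged
}
```

Running Terraform from within WSL should also avoid the issue entirely.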
I'm trying to access the server but I have another problem: I generated keys using PuTTYgen and can't connect to the control plane. Does anyone know how to export a key from PuTTYgen in the proper (OpenSSH) format? I'm using Windows 10.
Just a suggestion (but a really nice-to-have):
in order to keep variables where they should be, and never touch the .tf
files,
it would be nice if there were a place where custom ports can be managed.
Currently I do this in main.tf:
resource "hcloud_firewall" "k3s" {
name = "k3s-firewall"
## My Custom firewall rule
# Postgres
rule {
direction = "out"
protocol = "tcp"
port = "5432"
destination_ips = [
"0.0.0.0/0"
]
}
}
Maybe that part could be "outsourced" into a firewall.tf
file, or into a nested array in variables.
edit: removed the "multiple features" part, as mentioned in the first comment
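The "nested array in variables" idea can be wired up with Terraform's dynamic blocks, so main.tf stays untouched. A sketch under these assumptions: attribute names follow the hcloud provider's hcloud_firewall resource, and the variable name mirrors the extra_firewall_rules example used elsewhere in this project (the actual repo wiring may differ):

```hcl
variable "extra_firewall_rules" {
  type    = list(any)
  default = []
}

resource "hcloud_firewall" "k3s" {
  name = "k3s-firewall"

  # Splice each user-supplied rule into the firewall resource.
  dynamic "rule" {
    for_each = var.extra_firewall_rules
    content {
      direction       = rule.value.direction
      protocol        = rule.value.protocol
      port            = rule.value.port
      source_ips      = try(rule.value.source_ips, [])
      destination_ips = try(rule.value.destination_ips, [])
    }
  }
}
```

Callers then pass rules as a list of objects in terraform.tfvars, exactly like the commented extra_firewall_rules example above.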