Comments (12)

dperetti commented on June 20, 2024

This Terraform thing is too flaky, I'm afraid. As far as I'm concerned, it worked for a few days. Now I cannot create node pools anymore. It's stuck in the creating state even though the servers are up and running in the Hetzner console.
On the servers I get:
"Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"

from terraform-hcloud-kube-hetzner.

andi0b commented on June 20, 2024

Might be connected to the (unresolved) discussion I started recently, #1287.

I'm seeing weird behaviour after updating the nodes with a recent MicroOS update: network connectivity issues that I couldn't figure out yet (I just rolled back and disabled updates for now).

Edit: I also saw some "503 Service Unavailable" and "connection refused" in my logs. I know those are very generic errors, but still.

mateuszlewko commented on June 20, 2024

Now, when running it again, I see the following logs from systemctl status k3s-agent:


el=info msg="Waiting to retrieve agent configuration; server is not ready: CA cert validation failed: Get \"https://127.0.0.1:6444/cacerts\": EOF"
el=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
el=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
el=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
el=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
el=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
el=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
el=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 503 Service Unavailable"
el=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 503 Service Unavailable"
el=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 503 Service Unavailable"

valkenburg-prevue-ch commented on June 20, 2024

Hmm, I might have had the same thing: I had nodes unable to come back to life after a reboot following a k3s upgrade. I replaced the nodes (long live Longhorn) and turned off upgrades. I have not verified that this was the real problem, but I haven't seen it happen again either. No need to roll back anything though: the fresh nodes are on the latest k3s and have MicroOS updating weekly without issues. They just don't upgrade k3s automatically.

mateuszlewko commented on June 20, 2024

I disabled WireGuard and recreated the cluster some time later. I haven't checked whether WireGuard works better with Cilium, or whether WireGuard was the actual problem.

andi0b commented on June 20, 2024

@mysticaltech

Did you manage to make it work? What about you @andi0b?

No, I'm currently on Easter holiday and didn't investigate it further. I just disabled kured (I think with something like kubectl -n kube-system annotate ds kured weave.works/kured-node-lock='{"nodeID":"manual"}') and rolled back the nodes to the last working snapshot (I think with transactional-update rollback [number]).

mysticaltech commented on June 20, 2024

Folks, this was probably due to a bug in the system upgrade controller, which is now fixed. Make sure to upgrade with terraform init -upgrade. If such an issue comes up again, please don't hesitate to open another one with your kube.tf. Closing this one for now.

mysticaltech commented on June 20, 2024

@kube-hetzner/core Any ideas?

@mateuszlewko Try with cni_plugin = "cilium"; I would guess it works better with WireGuard.
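
For anyone following along, that suggestion might look roughly like this in kube.tf. This is only a sketch: the module block name is hypothetical, all other settings are omitted, and enable_wireguard is assumed to be the name of the WireGuard toggle, so check it against the module's variables before relying on it.

```hcl
# Hypothetical kube.tf fragment; every other setting is omitted here.
module "kube-hetzner" {
  source = "kube-hetzner/kube-hetzner/hcloud"

  # (existing settings stay unchanged)

  cni_plugin       = "cilium" # the CNI suggested in this comment
  enable_wireguard = true     # assumed variable name for the WireGuard toggle
}
```

After changing the CNI, a terraform plan should show what would be recreated before anything is applied.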

kimdre commented on June 20, 2024

Hi, I got the same error, "timed out waiting for the condition on deployments/system-upgrade-controller", with a very similar configuration and Cilium enabled.

mysticaltech commented on June 20, 2024

Considering this an occasional hiccup, but I will monitor the situation.

mysticaltech commented on June 20, 2024

@kimdre Could you share your kube.tf, please?

mysticaltech commented on June 20, 2024

@mateuszlewko Did you manage to make it work? What about you @andi0b?
