Comments (10)

M4t7e commented on June 20, 2024 3

Hey @tobiasehlert, HCCM is already hinting at what's wrong here:

Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj: hcloud/CreateRoute: invalid gateway (invalid_input)

Overview:

  • network_ipv4_cidr = 10.20.128.0/17 (Hetzner Network)
    • Starting from this network onwards: Agent Node Subnets
    • At the end of this network: Server Node Subnets
  • cluster_ipv4_cidr = 10.20.128.0/20 (Reserved for K8s Pod Networks -> HCCM RouteController)

k3s-01-agent-small-nbg1-vvj: 10.20.128.101

HCCM RouteController tried to add the Pod network route 10.20.128.0/24 (probably matching 1:1 with the subnet of the server itself) with 10.20.128.101 as the gateway:

  1. The gateway IP cannot be contained in the destination range (the only exception is a default route with 0.0.0.0/0)
  2. The Pod IP range is probably clashing with the Hetzner Network Subnets for Agent Nodes
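
Both points can be checked against the values from the log with Python's standard `ipaddress` module. This is only a sketch: the helper name `gateway_inside_destination` is made up for illustration and is not part of HCCM or the Hetzner API.

```python
import ipaddress

def gateway_inside_destination(gateway: str, destination: str) -> bool:
    """Return True when the gateway IP lies inside the route's destination range."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_network(destination)

# Values taken from the HCCM log in this thread.
node_ip = "10.20.128.101"
pod_route = "10.20.128.0/24"

# Point 1: the node IP used as gateway sits inside the Pod route it should serve.
print(gateway_inside_destination(node_ip, pod_route))  # True

# Point 2: the node IP also sits inside cluster_ipv4_cidr, so Pod ranges
# and node addresses are drawn from the same space.
print(gateway_inside_destination(node_ip, "10.20.128.0/20"))  # True
```

If either call returns True for a node IP and a Pod range, the two address spaces overlap and a route of that shape is likely to be rejected.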

You have to leave enough space at the beginning and at the end of network_ipv4_cidr for Hetzner Networks, so that they don't collide with Pod and Service CIDRs (especially at the beginning of the ranges).

from terraform-hcloud-kube-hetzner.

tobiasehlert commented on June 20, 2024 2

> @tobiasehlert When you change IP ranges, you really have to know what you are doing and get a good look at what it affects within the code. For most scenarios, you can just keep the defaults as they are proven to work well.

Yeah, I saw that note about changing CIDRs, but I had to due to some overlapping CIDRs :(
But yeah, thanks to @M4t7e it works now.. I was unaware of how to portion up the subnets, but now it rocks :D

tobiasehlert commented on June 20, 2024 1

I also found an event that looks relevant for one of my nodes (k3s-01-agent-small-nbg1-vvj) in the cluster:

Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj after 398.38809ms: hcloud/CreateRoute: invalid gateway (invalid_input)

When looking at the pod logs of hcloud-cloud-controller-manager it looks like there is some routing issue..

2024-02-22T13:42:11+01:00 I0222 12:42:11.829375       1 route_controller.go:216] action for Node "k3s-01-control-plane-hel1-iwm" with CIDR "10.20.132.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829410       1 route_controller.go:216] action for Node "k3s-01-control-plane-nbg1-oze" with CIDR "10.20.131.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829422       1 route_controller.go:216] action for Node "k3s-01-agent-small-nbg1-vvj" with CIDR "10.20.128.0/24": "add"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829433       1 route_controller.go:216] action for Node "k3s-01-agent-small-nbg1-yiv" with CIDR "10.20.129.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829445       1 route_controller.go:216] action for Node "k3s-01-control-plane-fsn1-ywt" with CIDR "10.20.130.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829459       1 route_controller.go:290] route spec to be created: &{ k3s-01-agent-small-nbg1-vvj false [{InternalIP 10.20.128.101} {Hostname k3s-01-agent-small-nbg1-vvj} {ExternalIP XX.XX.XX.XX}] 10.20.128.0/24 false}
2024-02-22T13:42:11+01:00 I0222 12:42:11.829493       1 route_controller.go:304] Creating route for node k3s-01-agent-small-nbg1-vvj 10.20.128.0/24 with hint fac268fa-acab-4287-bc7f-5008bb1790cf, throttled 12.44µs
2024-02-22T13:42:12+01:00 E0222 12:42:12.401242       1 route_controller.go:329] Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj: hcloud/CreateRoute: invalid gateway (invalid_input)
2024-02-22T13:42:12+01:00 I0222 12:42:12.401365       1 route_controller.go:387] Patching node status k3s-01-agent-small-nbg1-vvj with false previous condition was:&NodeCondition{Type:NetworkUnavailable,Status:False,LastHeartbeatTime:2024-02-22 12:42:00 +0000 UTC,LastTransitionTime:2024-02-22 12:42:00 +0000 UTC,Reason:CiliumIsUp,Message:Cilium is running on this node,}
2024-02-22T13:42:12+01:00 I0222 12:42:12.401535       1 event.go:307] "Event occurred" object="k3s-01-agent-small-nbg1-vvj" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj after 571.712557ms: hcloud/CreateRoute: invalid gateway (invalid_input)"

Has anyone experienced this before?

tobiasehlert commented on June 20, 2024 1

> I suspect it's because of the cilium routing mode "native", @tobiasehlert please remove that line and let us know 🙏

Yes, from what I've seen so far it looks exactly like that.. I just removed the whole cluster and created a new one, and it's not working with cilium_routing_mode set to tunnel. But there was no difference at all @mysticaltech

To me it looks like it's the hcloud csi things that are the issue in this case.. but I can't get my head around the issue.

M4t7e commented on June 20, 2024 1

@tobiasehlert Yeah, sure. Here are some considerations for the subnetting...

You need enough space for Hetzner Subnets. Total limit today is 50 Subnets per Network (see https://docs.hetzner.com/cloud/networks/faq#are-there-any-limits-on-how-networks-can-be-used).

For routing configuration simplicity, it's best if cluster_ipv4_cidr falls within network_ipv4_cidr. The cluster_ipv4_cidr will use most IPs since they are allocated for the Pods, and Hetzner CCM reserves larger ranges for the Nodes, adding the Pod routes with the corresponding Node IP as the gateway. A maximum of 100 routes per Network is possible (see the Hetzner FAQ). service_ipv4_cidr typically requires less space than the Pods.

Hetzner Subnets and Pod Networks are both allocated in ascending order. Therefore, we could disregard the Server Node Subnets at the end (it's highly unlikely they will ever be used) if we aim to save space.
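
To see which per-node Pod ranges a given cluster_ipv4_cidr yields in ascending order, here is a small sketch. It assumes one /24 per node, which is what the route controller log in this thread shows; the function name `first_pod_ranges` is hypothetical.

```python
import ipaddress
from itertools import islice

def first_pod_ranges(cluster_cidr: str, count: int, node_prefix: int = 24) -> list[str]:
    """List the first `count` per-node Pod ranges of a cluster CIDR, lowest first."""
    pods = ipaddress.ip_network(cluster_cidr)
    # subnets() yields the ranges in ascending address order.
    return [str(s) for s in islice(pods.subnets(new_prefix=node_prefix), count)]

# The original cluster_ipv4_cidr from this thread, with five nodes:
print(first_pod_ranges("10.20.128.0/20", 5))
# ['10.20.128.0/24', '10.20.129.0/24', '10.20.130.0/24', '10.20.131.0/24', '10.20.132.0/24']
```

These are the same five ranges that appear in the route controller log, which is why the lowest ones are the first to collide with node subnets placed at the start of the network.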

One example could be like this:

  • network_ipv4_cidr = 10.0.0.0/16 (sufficient for 64 /24 Subnets -> you can treat only 10.0.0.0/18 as reserved for it)
  • service_ipv4_cidr = 10.0.64.0/18 (half size of cluster_ipv4_cidr)
  • cluster_dns_ipv4 = 10.0.64.10 (has to be in service_ipv4_cidr)
  • cluster_ipv4_cidr = 10.0.128.0/17 (biggest range for Pods -> more than 100 /24 networks/routes for Pods)
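
A plan like this can be sanity-checked before applying it. The following is a minimal sketch using Python's standard `ipaddress` module. The function `check_plan` and its result keys are invented for illustration, and the rule that both CIDRs sit inside network_ipv4_cidr is an assumption that mirrors the examples in this thread rather than a documented requirement.

```python
import ipaddress

def check_plan(network: str, service: str, cluster: str, dns: str) -> dict:
    """Check a CIDR plan and report how much room it leaves."""
    net = ipaddress.ip_network(network)
    svc = ipaddress.ip_network(service)
    pods = ipaddress.ip_network(cluster)

    problems = []
    if not pods.subnet_of(net):
        problems.append("cluster_ipv4_cidr is outside network_ipv4_cidr")
    if not svc.subnet_of(net):
        problems.append("service_ipv4_cidr is outside network_ipv4_cidr")
    if svc.overlaps(pods):
        problems.append("service_ipv4_cidr overlaps cluster_ipv4_cidr")
    if ipaddress.ip_address(dns) not in svc:
        problems.append("cluster_dns_ipv4 is not in service_ipv4_cidr")

    # /24 blocks left free at the start of the network for Hetzner subnets.
    first_used = min(svc.network_address, pods.network_address)
    head_24s = (int(first_used) - int(net.network_address)) // 256
    # Number of /24 Pod ranges (one route each) the cluster CIDR can hold.
    pod_24s = 2 ** (24 - pods.prefixlen) if pods.prefixlen <= 24 else 0

    return {"problems": problems, "head_24s": head_24s, "pod_24s": pod_24s}

print(check_plan("10.0.0.0/16", "10.0.64.0/18", "10.0.128.0/17", "10.0.64.10"))
# {'problems': [], 'head_24s': 64, 'pod_24s': 128}
```

Here head_24s is the space left at the beginning of the network for Hetzner subnets, and pod_24s shows whether the Pod range covers the 100 routes per Network mentioned above.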

mysticaltech commented on June 20, 2024 1

Thanks @M4t7e, excellent! Should've had a better look at the kube.tf.

@tobiasehlert When you change IP ranges, you really have to know what you are doing and get a good look at what it affects within the code. For most scenarios, you can just keep the defaults as they are proven to work well.

mysticaltech commented on June 20, 2024

Thanks for sharing @tobiasehlert, @M4t7e FYI happening in cilium.

I suspect it's because of the cilium routing mode "native", @tobiasehlert please remove that line and let us know 🙏

mysticaltech commented on June 20, 2024

@tobiasehlert Weird, it's the first time we hear of that. Please inspect and share your hcloud ccm and csi logs then if you suspect this is causing the issue. Also please have a look at our readme's debug section and try to do some general node level debug just in case. Also, the hcloud cli can be useful here to inspect the routes and such.

tobiasehlert commented on June 20, 2024

Thanks for your response @M4t7e!

What size should the Service and Cluster CIDRs each be? Do you have some suggestions there?

tobiasehlert commented on June 20, 2024

Thanks @M4t7e!

I'll go for this then :)

network_ipv4_cidr = "10.20.128.0/17"
service_ipv4_cidr = "10.20.160.0/19"
cluster_ipv4_cidr = "10.20.192.0/18"
cluster_dns_ipv4  = "10.20.160.10"