Comments (10)
Hey @tobiasehlert, HCCM is already hinting at what's wrong here:
Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj: hcloud/CreateRoute: invalid gateway (invalid_input)
Overview:
- network_ipv4_cidr = 10.20.128.0/17 (Hetzner Network)
  - Starting from this network onwards: Agent Node Subnets
  - At the end of this network: Server Node Subnets
- cluster_ipv4_cidr = 10.20.128.0/20 (Reserved for K8s Pod Networks -> HCCM RouteController)
k3s-01-agent-small-nbg1-vvj: 10.20.128.101
HCCM RouteController tried to add the Pod network route 10.20.128.0/24 (probably matching 1:1 with the subnet of the server itself) with 10.20.128.101 as the gateway:
- The gateway IP can not be contained in destination range (only exception are default routes with 0.0.0.0/0)
- The Pod IP range is probably clashing with the Hetzner Network Subnets for Agent Nodes
You have to leave enough space at the beginning and at the end of network_ipv4_cidr for Hetzner Networks, so that they don't collide with Pod and Service CIDRs (especially at the beginning of the ranges).
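To make the gateway rule concrete, here is a minimal Python sketch using only the standard-library ipaddress module. The function name gateway_inside_destination is made up for illustration, and the check only mirrors the rule as described above; it is not the actual validation code of the Hetzner API.

```python
import ipaddress


def gateway_inside_destination(gateway: str, destination: str) -> bool:
    """Return True if the gateway IP lies inside the route's destination range.

    Hypothetical helper: it mirrors the rule described in this thread,
    not Hetzner's real server-side validation.
    """
    dest = ipaddress.ip_network(destination)
    # Default routes (0.0.0.0/0) are the exception mentioned above.
    if dest.prefixlen == 0:
        return False
    return ipaddress.ip_address(gateway) in dest


# Values taken from the log in this issue: node IP as gateway, Pod CIDR as destination.
print(gateway_inside_destination("10.20.128.101", "10.20.128.0/24"))
# Same gateway with a Pod CIDR outside the node subnets.
print(gateway_inside_destination("10.20.128.101", "10.20.192.0/24"))
```

The first call reports a conflict because the node IP sits inside the Pod route it is supposed to be the gateway for; the second does not.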
from terraform-hcloud-kube-hetzner.
@tobiasehlert When you change IP ranges, you really have to know what you are doing and get a good look at what it affects within the code. For most scenarios, you can just keep the defaults as they are proven to work well.
Yeah I saw that note about changing CIDRs, but had to due to some overlapping CIDR :(
But yeah, thanks to @M4t7e it works now.. was unaware how to portion up the subnets, but now it rocks :D
Found also some event that looks reasonable for one of my nodes (k3s-01-agent-small-nbg1-vvj) in the cluster:
Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj after 398.38809ms: hcloud/CreateRoute: invalid gateway (invalid_input)
When looking at the pod logs of hcloud-cloud-controller-manager it looks like there is some routing issue..
2024-02-22T13:42:11+01:00 I0222 12:42:11.829375 1 route_controller.go:216] action for Node "k3s-01-control-plane-hel1-iwm" with CIDR "10.20.132.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829410 1 route_controller.go:216] action for Node "k3s-01-control-plane-nbg1-oze" with CIDR "10.20.131.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829422 1 route_controller.go:216] action for Node "k3s-01-agent-small-nbg1-vvj" with CIDR "10.20.128.0/24": "add"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829433 1 route_controller.go:216] action for Node "k3s-01-agent-small-nbg1-yiv" with CIDR "10.20.129.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829445 1 route_controller.go:216] action for Node "k3s-01-control-plane-fsn1-ywt" with CIDR "10.20.130.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829459 1 route_controller.go:290] route spec to be created: &{ k3s-01-agent-small-nbg1-vvj false [{InternalIP 10.20.128.101} {Hostname k3s-01-agent-small-nbg1-vvj} {ExternalIP XX.XX.XX.XX}] 10.20.128.0/24 false}
2024-02-22T13:42:11+01:00 I0222 12:42:11.829493 1 route_controller.go:304] Creating route for node k3s-01-agent-small-nbg1-vvj 10.20.128.0/24 with hint fac268fa-acab-4287-bc7f-5008bb1790cf, throttled 12.44µs
2024-02-22T13:42:12+01:00 E0222 12:42:12.401242 1 route_controller.go:329] Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj: hcloud/CreateRoute: invalid gateway (invalid_input)
2024-02-22T13:42:12+01:00 I0222 12:42:12.401365 1 route_controller.go:387] Patching node status k3s-01-agent-small-nbg1-vvj with false previous condition was:&NodeCondition{Type:NetworkUnavailable,Status:False,LastHeartbeatTime:2024-02-22 12:42:00 +0000 UTC,LastTransitionTime:2024-02-22 12:42:00 +0000 UTC,Reason:CiliumIsUp,Message:Cilium is running on this node,}
2024-02-22T13:42:12+01:00 I0222 12:42:12.401535 1 event.go:307] "Event occurred" object="k3s-01-agent-small-nbg1-vvj" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj after 571.712557ms: hcloud/CreateRoute: invalid gateway (invalid_input)"
Someone experienced this before?
I suspect it's because of the cilium routing mode "native", @tobiasehlert please remove that line and let us know
Yes, from what I've seen so far it looks exactly like that.. just removed the whole cluster and created a new one and it's not working with cilium_routing_mode set to tunnel. But there was no difference at all @mysticaltech
To me it looks like it's the hcloud csi things that are the issue in this case.. but I can't get my head around the issue.
@tobiasehlert Yeah, sure. Here some considerations for the subnetting...
You need enough space for Hetzner Subnets. Total limit today is 50 Subnets per Network (see https://docs.hetzner.com/cloud/networks/faq#are-there-any-limits-on-how-networks-can-be-used).
For routing configuration simplicity, it's best if cluster_ipv4_cidr falls within network_ipv4_cidr. The cluster_ipv4_cidr will use most IPs since they are allocated for the Pods, and Hetzner CCM reserves larger ranges for the Nodes, adding the Pod routes with the corresponding Node IP as the gateway. Max 100 routes per Network are possible (see Hetzner faq). service_ipv4_cidr typically requires less space compared to the Pods.
Hetzner Subnets and Pod Networks are both allocated in ascending order. Therefore, we could disregard the Server Node Subnets at the end (it's highly unlikely they will ever be used) if we aim to save space.
One example could be like this:
- network_ipv4_cidr = 10.0.0.0/16 (sufficient for 64 /24 Subnets -> you can treat only 10.0.0.0/18 as reserved for it)
- service_ipv4_cidr = 10.0.64.0/18 (half size of cluster_ipv4_cidr)
- cluster_dns_ipv4 = 10.0.64.10 (has to be in service_ipv4_cidr)
- cluster_ipv4_cidr = 10.0.128.0/17 (biggest range for Pods -> more than 100 /24 networks/routes for Pods)
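The sizes in the example above can be double-checked with a few lines of Python. This is only a sketch: the constant names and the count_slash24 helper are made up for illustration, and the two limits are simply the figures quoted in this comment, so they may change on Hetzner's side.

```python
import ipaddress

# Limits as quoted in this thread (Hetzner Networks FAQ); assumptions that may change.
MAX_SUBNETS_PER_NETWORK = 50
MAX_ROUTES_PER_NETWORK = 100


def count_slash24(cidr: str) -> int:
    """Number of /24 blocks that fit into the given range (hypothetical helper)."""
    net = ipaddress.ip_network(cidr)
    return sum(1 for _ in net.subnets(new_prefix=24))


subnet_space = count_slash24("10.0.0.0/18")   # part treated as reserved for Hetzner Subnets
pod_space = count_slash24("10.0.128.0/17")    # cluster_ipv4_cidr
service = ipaddress.ip_network("10.0.64.0/18")
pods = ipaddress.ip_network("10.0.128.0/17")

# Enough /24 blocks for the subnet limit?
print(subnet_space, subnet_space >= MAX_SUBNETS_PER_NETWORK)
# Enough /24 Pod networks for the route limit?
print(pod_space, pod_space >= MAX_ROUTES_PER_NETWORK)
# Is the Service range half the size of the Pod range?
print(service.num_addresses * 2 == pods.num_addresses)
```

With these values the /18 holds 64 /24 blocks and the /17 holds 128, so both ranges cover the quoted limits.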
Thanks @M4t7e, excellent! Should've had a better look at the kube.tf.
@tobiasehlert When you change IP ranges, you really have to know what you are doing and get a good look at what it affects within the code. For most scenarios, you can just keep the defaults as they are proven to work well.
Thanks for sharing @tobiasehlert, @M4t7e FYI happening in cilium.
I suspect it's because of the cilium routing mode "native", @tobiasehlert please remove that line and let us know
@tobiasehlert Weird, it's the first time we hear of that. Please inspect and share your hcloud ccm and csi logs then if you suspect this is causing the issue. Also please have a look at our readme's debug section and try to do some general node level debug just in case. Also, the hcloud cli can be useful here to inspect the routes and such.
Thanks for your response @M4t7e!
What size should the Service and Cluster CIDRs each be? Do you have some suggestions there?
Thanks @M4t7e!
I'll go for this then :)
network_ipv4_cidr = "10.20.128.0/17"
service_ipv4_cidr = "10.20.160.0/19"
cluster_ipv4_cidr = "10.20.192.0/18"
cluster_dns_ipv4 = "10.20.160.10"
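For anyone picking their own ranges, a quick sanity check of a layout like this one could look as follows. It is a sketch under the assumptions discussed in this thread: check_layout is a hypothetical helper, not part of the module, and keeping the Service and Pod ranges inside network_ipv4_cidr is treated as the recommendation given above rather than a hard requirement.

```python
import ipaddress


def check_layout(network, service, cluster, dns, node_ips=()):
    """Return a list of problems found in a CIDR layout (hypothetical helper)."""
    net = ipaddress.ip_network(network)
    svc = ipaddress.ip_network(service)
    pods = ipaddress.ip_network(cluster)
    problems = []
    # Recommendation from this thread: keep both ranges inside the Hetzner Network.
    if not svc.subnet_of(net):
        problems.append("service_ipv4_cidr is outside network_ipv4_cidr")
    if not pods.subnet_of(net):
        problems.append("cluster_ipv4_cidr is outside network_ipv4_cidr")
    if svc.overlaps(pods):
        problems.append("service_ipv4_cidr overlaps cluster_ipv4_cidr")
    if ipaddress.ip_address(dns) not in svc:
        problems.append("cluster_dns_ipv4 is not in service_ipv4_cidr")
    # Node IPs must not fall into the Pod range, or the Pod route gateway is rejected.
    for ip in node_ips:
        if ipaddress.ip_address(ip) in pods:
            problems.append(f"node IP {ip} is inside cluster_ipv4_cidr")
    return problems


# Values from the comment above; the node IP is the one from the earlier log.
print(check_layout(
    network="10.20.128.0/17",
    service="10.20.160.0/19",
    cluster="10.20.192.0/18",
    dns="10.20.160.10",
    node_ips=["10.20.128.101"],
))
```

An empty list means none of the checks fired; it does not prove the layout is valid for every setup.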