Comments (48)
Make sure your nodes are using eth0 as the first network interface name. Ubuntu typically does not use eth0; it might be ens18, enp0s1, or something similar. To check, SSH into a node and type ip a
to view all interfaces, then look for the one with your LAN IP and verify its name.
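That check can also be scripted; a minimal sketch, assuming iproute2 is installed (the awk just pulls the name after "dev" out of the route output):

```shell
# Print the interface carrying the default route to the internet;
# its name (eth0, ens18, enp0s18, ...) is what the vars file expects.
default_iface() {
  ip route get 8.8.8.8 2>/dev/null |
    awk '{for (i = 1; i < NF; i++) if ($i == "dev") {print $(i+1); exit}}'
}
default_iface
```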
from k3s-ansible-traefik-rancher.
Is this eth0 or enp0s18?
I tried with the latter and that caused all masters to fail and the script to exit.
server@k3s-master-02:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether ce:fc:22:6b:12:1c brd ff:ff:ff:ff:ff:ff
altname enp0s18
inet 192.168.4.102/24 metric 100 brd 192.168.4.255 scope global dynamic eth0
valid_lft 75942sec preferred_lft 75942sec
inet6 fe80::ccfc:22ff:fe6b:121c/64 scope link
valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 3e:34:33:84:b9:16 brd ff:ff:ff:ff:ff:ff
inet 10.42.2.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::3c34:33ff:fe84:b916/64 scope link
valid_lft forever preferred_lft forever
4: cni0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 0a:2f:dc:ba:3d:2a brd ff:ff:ff:ff:ff:ff
inet 10.42.1.1/24 brd 10.42.1.255 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::82f:dcff:feba:3d2a/64 scope link
valid_lft forever preferred_lft forever
from k3s-ansible-traefik-rancher.
I did an ip route get 8.8.8.8 and it looks like all nodes are on eth0, unless it's being tricky or something.
server@k3s-worker-03:~$ ip route get 8.8.8.8
8.8.8.8 via 192.168.4.1 dev eth0 src 192.168.4.113 uid 1000
cache
from k3s-ansible-traefik-rancher.
Okay, so it is eth0; it must be something else. I haven't re-tested with Ubuntu since I don't use it anymore personally; I run everything on Debian. I can fire up my test server, see if I get the same result, and let you know.
from k3s-ansible-traefik-rancher.
Cool, that would be appreciated. Maybe it's something with Ubuntu 24.04 being so new...
It seems to always fail on the TASK [k3s_server_post : Wait for MetalLB resources], if that helps.
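For reference, that task is essentially running kubectl waits against the MetalLB controller deployment; reproducing it by hand on a master shows which resource is stuck. A sketch, assuming the upstream MetalLB namespace and labels:

```shell
# Manual equivalent of the "Wait for MetalLB resources" task (run on a master).
wait_metallb() {
  k3s kubectl wait deployment controller \
    --namespace=metallb-system \
    --for=condition=Available=True --timeout=120s
  # If that times out, this usually shows why (ImagePullBackOff, Pending, ...):
  k3s kubectl get pods -n metallb-system -o wide
}
# Usage (on a master node): wait_metallb
```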
from k3s-ansible-traefik-rancher.
Yeah, it could also be a MetalLB version conflict with the k3s version.
from k3s-ansible-traefik-rancher.
I haven't changed the defaults from the repo for the k3s version or the MetalLB version, unless some update changed them.
I will start setting up Debian VMs instead.
Do you also prefer/recommend LXC containers over VMs?
from k3s-ansible-traefik-rancher.
I use VMs to test with; my prod cluster is bare metal. I haven't tried using LXC other than to test functionality. I usually run Debian 12 cloud-init custom-built images, but a vanilla Debian 12 install would work as well. I don't customize it too much; one of my other repos has the Packer files for it to build from the net-install ISO.
from k3s-ansible-traefik-rancher.
Gotcha, I can try setting up a Debian cloud-init.
I just tried using the latest versions of k3s/kube-vip/MetalLB with no luck; can't seem to get this working on Ubuntu 24.04 :/
from k3s-ansible-traefik-rancher.
Welp, I tried setting up 24.04 cloud-init VMs to test, and it won't even get past downloading the k3s binary: something about an HTTPS connection missing a parameter. Doesn't make sense; that task hasn't been changed. I tried another release version to no avail as well. Must be something in the Canonical-provided Ubuntu cloud-init images.
from k3s-ansible-traefik-rancher.
Fixed that. Needed to update Ansible, haha.
from k3s-ansible-traefik-rancher.
I did change it to 1.28.8 when trying to get the k3s binary to download. I ran against 3 masters using the Ubuntu 22.04 LTS cloud-init image from Canonical within Proxmox, and the cluster fully installed, past the MetalLB error you were getting. I wonder what else could be causing the problem for you. Maybe it's because you have some agent nodes too; I can try to check with that, but MetalLB wouldn't go on the agents anyway since it's a critical addon.
from k3s-ansible-traefik-rancher.
So I set up 6 Debian 12 containers and got a bit farther, but still hit an error on the cert-manager install:
TASK [cert-manager : apply cert-manager CRDs] *******************************************************************************
FAILED - RETRYING: [192.168.5.101]: apply cert-manager CRDs (2 retries left).
FAILED - RETRYING: [192.168.5.101]: apply cert-manager CRDs (1 retries left).
fatal: [192.168.5.101]: FAILED! => {"attempts": 2, "changed": true, "cmd": ["kubectl", "apply", "-f", "https://github.com/jetstack/cert-manager/releases/download/v1.13.2/cert-manager.crds.yaml"], "delta": "0:00:00.081072", "end": "2024-05-14 23:10:15.135764", "msg": "non-zero return code", "rc": 1, "start": "2024-05-14 23:10:15.054692", "stderr": "The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}
PLAY RECAP ******************************************************************************************************************
192.168.5.101 : ok=55 changed=28 unreachable=0 failed=1 skipped=25 rescued=0 ignored=0
192.168.5.102 : ok=35 changed=17 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.103 : ok=35 changed=17 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.111 : ok=11 changed=7 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.112 : ok=11 changed=7 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.113 : ok=11 changed=7 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
Is this just due to a version conflict? I'll give it more attempts today.
from k3s-ansible-traefik-rancher.
Okay, it looks like it's not getting the kubeconfig from the environment, although it should. I might need to revisit the command being run and specify the path to the config as an argument. It's reverting to the default and looking for a k3s API at 127.0.0.1 rather than using either a master server IP or the VIP, preferably the VIP, which would be in the kubeconfig from prior scripted actions. There is a part that copies it from /etc into the ansible user's home dir.
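A manual workaround while that fix lands: point kubectl at the k3s-generated kubeconfig explicitly. A sketch, assuming the default k3s path of /etc/rancher/k3s/k3s.yaml; this stops kubectl falling back to 127.0.0.1:6443:

```shell
# Use the k3s admin kubeconfig instead of the client default.
export KUBECONFIG=${KUBECONFIG:-/etc/rancher/k3s/k3s.yaml}
echo "using kubeconfig: $KUBECONFIG"
# kubectl get nodes        # now targets the server/VIP named in that file
# kubectl apply -f ...     # same for the CRD apply that was failing
```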
from k3s-ansible-traefik-rancher.
Gotcha, I will wait for those updates then, if that's the case. I appreciate it!
from k3s-ansible-traefik-rancher.
You're welcome. I will take a look into it, probably this afternoon.
from k3s-ansible-traefik-rancher.
Hey, sorry it took me until today to look at this. I just made a bunch of updates/fixes and tested; it's running smooth on the Ubuntu cloud-init test VMs I set up. Between the kubectl command and then Traefik updating things in their values files that I missed even in my own main documentation... it should be good now. Next I need to adjust the default versions for k3s and MetalLB in the repo, so changing them YMMV, but other than that, let me know if more issues arise.
from k3s-ansible-traefik-rancher.
No worries, that was fast as is!
I brought in the new code and made my updates to the files needed, but this time ran into an error on the "Wait for MetalLB resources" section with this. It's past midnight here, so I'll have to take a deeper look at these errors tomorrow (running on Debian 12):
TASK [k3s_server_post : Wait for MetalLB resources] **********************************************************************************************************************************************************failed: [192.168.5.101] (item=controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "controller", "--for", "condition=Available=True", "--timeout=120s"], "delta": "0:02:10.206022", "end": "2024-05-17 00:06:50.899498", "item": {"condition": "--for condition=Available=True", "description": "controller", "name": "controller", "resource": "deployment"}, "msg": "non-zero return code", "rc": 1, "start": "2024-05-17 00:04:40.693476", "stderr": "E0517 00:05:41.908297 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server is currently unable to handle the request\nW0517 00:05:42.962668 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:05:42.962692 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready\nW0517 00:05:46.149432 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:05:46.149453 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready\nW0517 00:05:50.371827 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:05:50.371848 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed 
to list *unstructured.Unstructured: apiserver not ready\nW0517 00:05:58.570020 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:05:58.570051 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready\nerror: timed out waiting for the condition on deployments/controller", "stderr_lines": ["E0517 00:05:41.908297 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server is currently unable to handle the request", "W0517 00:05:42.962668 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready", "E0517 00:05:42.962692 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "W0517 00:05:46.149432 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready", "E0517 00:05:46.149453 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "W0517 00:05:50.371827 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready", "E0517 00:05:50.371848 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "W0517 00:05:58.570020 32755 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: 
apiserver not ready", "E0517 00:05:58.570051 32755 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "error: timed out waiting for the condition on deployments/controller"], "stdout": "", "stdout_lines": []}
ok: [192.168.5.101] => (item=webhook service)
failed: [192.168.5.101] (item=pods in replica sets) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "pod", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for", "condition=Ready", "--timeout=120s"], "delta": "0:02:06.534153", "end": "2024-05-17 00:08:59.172505", "item": {"condition": "--for condition=Ready", "description": "pods in replica sets", "resource": "pod", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2024-05-17 00:06:52.638352", "stderr": "error: timed out waiting for the condition on pods/controller-586bfc6b59-s8sfj", "stderr_lines": ["error: timed out waiting for the condition on pods/controller-586bfc6b59-s8sfj"], "stdout": "", "stdout_lines": []}
failed: [192.168.5.101] (item=ready replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "replicaset", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.readyReplicas}=1", "--timeout=120s"], "delta": "0:01:00.088430", "end": "2024-05-17 00:10:12.166949", "item": {"condition": "--for=jsonpath='{.status.readyReplicas}'=1", "description": "ready replicas of controller", "resource": "replicaset", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2024-05-17 00:09:12.078519", "stderr": "Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get replicasets.apps)", "stderr_lines": ["Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get replicasets.apps)"], "stdout": "", "stdout_lines": []}
ok: [192.168.5.101] => (item=fully labeled replicas of controller)
failed: [192.168.5.101] (item=available replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "replicaset", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.availableReplicas}=1", "--timeout=120s"], "delta": "0:02:00.647835", "end": "2024-05-17 00:12:26.566429", "item": {"condition": "--for=jsonpath='{.status.availableReplicas}'=1", "description": "available replicas of controller", "resource": "replicaset", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2024-05-17 00:10:25.918594", "stderr": "E0517 00:11:38.158727 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server is currently unable to handle the request\nW0517 00:11:39.017024 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:11:39.017145 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready\nW0517 00:11:41.637903 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:11:41.637928 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready\nW0517 00:11:45.512648 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:11:45.512670 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready\nW0517 
00:11:57.735361 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready\nE0517 00:11:57.735381 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready\nerror: timed out waiting for the condition on replicasets/controller-586bfc6b59", "stderr_lines": ["E0517 00:11:38.158727 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server is currently unable to handle the request", "W0517 00:11:39.017024 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready", "E0517 00:11:39.017145 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "W0517 00:11:41.637903 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready", "E0517 00:11:41.637928 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "W0517 00:11:45.512648 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready", "E0517 00:11:45.512670 33785 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "W0517 00:11:57.735361 33785 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: apiserver not ready", "E0517 00:11:57.735381 33785 
reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready", "error: timed out waiting for the condition on replicasets/controller-586bfc6b59"], "stdout": "", "stdout_lines": []}
NO MORE HOSTS LEFT *******************************************************************************************************************************************************************************************
PLAY RECAP ***************************************************************************************************************************************************************************************************
192.168.5.101 : ok=43 changed=16 unreachable=0 failed=1 skipped=20 rescued=0 ignored=0
192.168.5.102 : ok=34 changed=10 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.103 : ok=34 changed=10 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.111 : ok=11 changed=3 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.112 : ok=11 changed=3 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.113 : ok=11 changed=3 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
from k3s-ansible-traefik-rancher.
Is that due to the NIC device name in the vars file this go-around?
from k3s-ansible-traefik-rancher.
Nah, I'm still on eth0.
from k3s-ansible-traefik-rancher.
Weird. What MetalLB version?
from k3s-ansible-traefik-rancher.
13.12
from k3s-ansible-traefik-rancher.
k3s version?
from k3s-ansible-traefik-rancher.
v1.26.10+k3s1
from k3s-ansible-traefik-rancher.
Okay, I forgot to bump that in the sample vars file. Try it with v1.28.8+k3s1; that's what I tested it with.
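A small helper for bumping the pin, assuming the vars file uses a top-level k3s_version key (the group_vars/all.yml path in the usage line is a guess; adjust to your inventory layout):

```shell
# Rewrite the pinned k3s version in a vars file and echo the resulting line.
set_k3s_version() {
  vars_file="$1"
  sed -i 's/^k3s_version:.*/k3s_version: v1.28.8+k3s1/' "$vars_file"
  grep '^k3s_version:' "$vars_file"
}
# Usage: set_k3s_version group_vars/all.yml
```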
from k3s-ansible-traefik-rancher.
We are getting closer!
It made it past that, but the 1st master is still failing, now at the Apply metallb CRs step:
TASK [k3s_server_post : Apply metallb CRs] *******************************************************************************************************************************************************************
FAILED - RETRYING: [192.168.5.101]: Apply metallb CRs (5 retries left).
FAILED - RETRYING: [192.168.5.101]: Apply metallb CRs (4 retries left).
FAILED - RETRYING: [192.168.5.101]: Apply metallb CRs (3 retries left).
FAILED - RETRYING: [192.168.5.101]: Apply metallb CRs (2 retries left).
FAILED - RETRYING: [192.168.5.101]: Apply metallb CRs (1 retries left).
fatal: [192.168.5.101]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["k3s", "kubectl", "apply", "-f", "/tmp/k3s/metallb-crs.yaml", "--timeout=120s"], "delta": "0:00:00.090814", "end": "2024-05-19 20:05:39.450825", "msg": "non-zero return code", "rc": 1, "start": "2024-05-19 20:05:39.360011", "stderr": "The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}
NO MORE HOSTS LEFT *******************************************************************************************************************************************************************************************
PLAY RECAP ***************************************************************************************************************************************************************************************************
192.168.5.101 : ok=45 changed=16 unreachable=0 failed=1 skipped=20 rescued=0 ignored=0
192.168.5.102 : ok=34 changed=10 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.103 : ok=34 changed=10 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.111 : ok=11 changed=3 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.112 : ok=11 changed=3 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.113 : ok=11 changed=3 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
from k3s-ansible-traefik-rancher.
Zero issues with that on my end.
from k3s-ansible-traefik-rancher.
Just ran another test on Ubuntu 24.04 LTS and also Debian 12, both successful with no errors.
from k3s-ansible-traefik-rancher.
Make sure you run the reset playbook, then try to run k3s-uninstall.sh on each node (if it's not found, that's good) to make sure it's all cleaned out, reboot, and then try fresh. In one of my tests a reset somehow left some k3s files behind; k3s-uninstall was still there, so I ran it to clean the rest out.
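The per-node check can be scripted; a sketch, assuming the /usr/local/bin paths where the k3s installer normally drops its scripts:

```shell
# Report whether k3s's cleanup scripts are still present on this node.
# "clean" for both is what you want to see after the reset playbook.
check_leftover() {
  if [ -x "$1" ]; then
    echo "found: $1 (run it to finish cleanup)"
  else
    echo "clean: $1"
  fi
}
check_leftover /usr/local/bin/k3s-uninstall.sh
check_leftover /usr/local/bin/k3s-killall.sh
```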
from k3s-ansible-traefik-rancher.
I have run the kill-all.sh, but I don't see a k3s-uninstall.sh?
from k3s-ansible-traefik-rancher.
When you SSH to each node: if it's not there in the tab auto-complete, then that's good; it's all cleaned out.
from k3s-ansible-traefik-rancher.
I didn't find any k3s-uninstall.sh on any of the VMs.
It couldn't be because I'm running the deploy.sh from Ubuntu when all my VMs are now Debian 12, could it?
from k3s-ansible-traefik-rancher.
Nah. The only things I would make sure of: that you installed Ansible from pip and not from the apt repo, so it's the newest version; that you can SSH to each node as the ansible user using SSH key auth; and that the user has sudo rights without needing a password to be re-entered, since that is typically how a cloud-init image is built.
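Those prerequisites can be smoke-tested from the control machine. A sketch; the "ansible" user name and the host IP in the usage line are placeholders:

```shell
# Install/upgrade ansible via pip rather than apt:
#   python3 -m pip install --user --upgrade ansible
# Then verify key auth plus passwordless sudo on a node:
check_node() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "ansible@$1" 'sudo -n true' \
    && echo "$1: key auth + passwordless sudo OK" \
    || echo "$1: check failed"
}
# Usage: check_node 192.168.5.101
```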
from k3s-ansible-traefik-rancher.
Damn, it made it farther than ever this time:
TASK [cert-manager : deploy cert-manager using helm without internal CA] **************************************************************************************************************************fatal: [192.168.5.101]: FAILED! => {"changed": true, "cmd": ["helm", "install", "cert-manager", "jetstack/cert-manager", "--namespace", "cert-manager", "--version", "v1.13.2", "--wait"], "delta": "0:00:15.008485", "end": "2024-05-21 21:54:28.767310", "msg": "non-zero return code", "rc": 1, "start": "2024-05-21 21:54:13.758825", "stderr": "Error: INSTALLATION FAILED: Unable to continue with install: could not get information about the resource ServiceAccount \"cert-manager-webhook\" in namespace \"cert-manager\": Get \"https://192.168.5.50:6443/api/v1/namespaces/cert-manager/serviceaccounts/cert-manager-webhook\": dial tcp 192.168.5.50:6443: connect: connection refused - error from a previous attempt: unexpected EOF", "stderr_lines": ["Error: INSTALLATION FAILED: Unable to continue with install: could not get information about the resource ServiceAccount \"cert-manager-webhook\" in namespace \"cert-manager\": Get \"https://192.168.5.50:6443/api/v1/namespaces/cert-manager/serviceaccounts/cert-manager-webhook\": dial tcp 192.168.5.50:6443: connect: connection refused - error from a previous attempt: unexpected EOF"], "stdout": "", "stdout_lines": []}
PLAY RECAP ****************************************************************************************************
192.168.5.101 : ok=57 changed=30 unreachable=0 failed=1 skipped=27 rescued=0 ignored=0
192.168.5.102 : ok=35 changed=17 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.103 : ok=35 changed=17 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
192.168.5.111 : ok=11 changed=7 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.112 : ok=11 changed=7 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
192.168.5.113 : ok=11 changed=7 unreachable=0 failed=0 skipped=19 rescued=0 ignored=0
server@k3s-admin:~/k3s-ansible-traefik-rancher$ ansible --version
ansible [core 2.16.6]
config file = /home/server/k3s-ansible-traefik-rancher/ansible.cfg
configured module search path = ['/home/server/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = /home/server/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True
Maybe it's because of my Ansible version? It looks like I'm on 2.16.6.
from k3s-ansible-traefik-rancher.
I'm on 2.15.11.
from k3s-ansible-traefik-rancher.
Is this a slower system you are trying to deploy to? Or maybe all on spinning drives for storage? It seems like resources aren't becoming ready within the time they should, and that's causing errors. That, or things going up and down while the install is still in progress is causing issues. k3s, and Kubernetes in general, is very sensitive to latency.
from k3s-ansible-traefik-rancher.
It's on an NVMe that all my other Proxmox VMs also run on; not sure if that would be the issue or not.
from k3s-ansible-traefik-rancher.
Could be, if there is a lot of IO overhead.
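If you want numbers rather than a hunch, an fsync-latency benchmark with fio gives a feel for the write latency the datastore sees. A sketch; the flags follow the common etcd-style fio test, fio must be installed, and the directory argument is an assumption for wherever the k3s data lives:

```shell
# Probe fdatasync write latency in a given directory (needs fio installed).
fsync_probe() {
  fio --name=etcd-probe --directory="${1:-.}" \
      --rw=write --ioengine=sync --fdatasync=1 \
      --size=22m --bs=2300 --runtime=60
}
# Usage (on a master node): fsync_probe /var/lib/rancher
```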
from k3s-ansible-traefik-rancher.
Hmm, could be. I will be installing 4 x 2TB NVMe's in either 2 TrueNAS pools (both mirrors) or in 1 pool with 2 mirrored vdevs.
Then I can test it on there with no other VMs running.
from k3s-ansible-traefik-rancher.
Technically, if it's all on the same host, you don't need to run worker nodes; just allocate more resources to the server nodes. My bare metal cluster of 5 are all masters; each node has all roles. I can tolerate 2 of them going down and everything will still keep running.
from k3s-ansible-traefik-rancher.
The VMs are running on 3 separate servers, but running from the same NFS share.
But I shall try just running 3 masters across them instead and see.
from k3s-ansible-traefik-rancher.
Well, that could be the issue right there: storing VM disks over NFS.
from k3s-ansible-traefik-rancher.
My test server is a Dell R620 with 96GB RAM and both SAS SSDs and some 10k 2.5" SAS HDDs; the HDDs are in a ZFS pool. I have put all 3 test VMs on the same SAS SSD with no issue there in testing.
from k3s-ansible-traefik-rancher.
My Proxmox is in HA, so it needs shared persistent storage outside of the servers, residing in my TrueNAS 45Drives Q30.
It runs multiple Docker VMs pretty flawlessly, but maybe between that and this, the IOPS isn't there.
But I will get the mirrored NVMe pool up and see how that goes when I get the time.
While installing that, I'll be replacing a backplane that had a defective slot for an HDD and swapping mobo/CPU/RAM to a Supermicro board, so lots of updates in the queue!
from k3s-ansible-traefik-rancher.
Nice. I haven't done much research on using NFS for VM storage; I'd be inclined to try it with iSCSI targets, though. In Proxmox, each target would be assigned to the VM (I think, from my test), unless you go into the Proxmox CLI to configure the iSCSI share, then mount it somewhere under /mnt and add it as a directory type in Proxmox under Datacenter -> Storage. Not sure if iSCSI is faster or not, but you typically do need fast storage for the etcd databases in k3s. Since k3s has HA in itself, another option is to exclude those VMs from any live migrations on the Proxmox servers and just use local storage on each Proxmox node for them.
from k3s-ansible-traefik-rancher.
Hey @ChrisThePCGeek!
It's been a busy time with work, travel, and life, BUT I finally got time to set up my containers on local storage instead of as HA containers in Proxmox, and I finally have your script working without failure!
So, note to others: if you run your Proxmox VMs in HA from a shared pool (even with mine being an NVMe pool), this might not work no matter what.
Once the VMs were running on local storage on each of my servers, this worked flawlessly.
Thanks again for all your support and explanations here!
from k3s-ansible-traefik-rancher.
I haven't been able to get to Rancher yet by going to 192.168.4.60, though; I just get a 404 page not found. Am I missing something?
server@k3s-master-01:~$ kubectl get svc --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cattle-fleet-system gitjob ClusterIP 10.43.45.81 <none> 80/TCP 70m app=gitjob
cattle-provisioning-capi-system capi-webhook-service ClusterIP 10.43.194.147 <none> 443/TCP 69m cluster.x-k8s.io/provider=cluster-api
cattle-system rancher ClusterIP 10.43.225.226 <none> 80/TCP,443/TCP 71m app=rancher
cattle-system rancher-webhook ClusterIP 10.43.22.84 <none> 443/TCP 69m app=rancher-webhook
cert-manager cert-manager ClusterIP 10.43.37.110 <none> 9402/TCP 72m app.kubernetes.io/component=controller,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager cert-manager-webhook ClusterIP 10.43.191.82 <none> 443/TCP 72m app.kubernetes.io/component=webhook,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=webhook
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 74m <none>
kube-system kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 74m k8s-app=kube-dns
kube-system metrics-server ClusterIP 10.43.218.3 <none> 443/TCP 74m k8s-app=metrics-server
kube-system traefik LoadBalancer 10.43.236.73 192.168.4.60 80:31371/TCP,443:31919/TCP 72m app.kubernetes.io/instance=traefik-kube-system,app.kubernetes.io/name=traefik
kube-system traefik-external LoadBalancer 10.43.245.211 192.168.4.61 80:31377/TCP,443:31794/TCP 71m app.kubernetes.io/instance=traefik-external-kube-system,app.kubernetes.io/name=traefik
metallb-system webhook-service ClusterIP 10.43.232.32 <none> 443/TCP 74m component=controller
from k3s-ansible-traefik-rancher.
I was able to get to it by adding an entry pointing at 192.168.4.60 in my Windows hosts file, yet I can't reach it from just the IP.
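That behavior is expected with Traefik: routing is done on the Host header, so hitting the bare IP matches no router and falls through to the 404. You can confirm with curl by supplying the hostname yourself. A sketch; rancher.example.com stands in for whatever hostname you configured:

```shell
# Probe a LoadBalancer IP with a chosen Host header and print the HTTP code.
probe() {  # probe <ip> <hostname>
  curl -ks -o /dev/null -w '%{http_code}\n' -H "Host: $2" "https://$1/"
}
# probe 192.168.4.60 rancher.example.com   # should hit the Rancher router
# probe 192.168.4.60 wrong.example.com     # falls through to Traefik's 404
```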
from k3s-ansible-traefik-rancher.