banzaicloud / pke
PKE is an extremely simple CNCF certified Kubernetes installer and distribution, designed to work on any cloud, VM or bare metal.
License: Apache License 2.0
Describe the bug
The Cilium components and CoreDNS don't start after spinning up the Vagrant box with the CentOS 8 operating system.
Steps to reproduce the issue:
I slightly reworked the centos8-single.sh script:
#!/bin/bash -e
# build latest pke tool
#GOOS=linux make pke
KUBERNETES_VERSION="${1:-v1.18.9}"
export VAGRANT_VAGRANTFILE=Vagrantfile-centos8
vagrant up centos1
vagrant ssh centos1 -c "sudo yum install -y ca-certificates"
vagrant ssh centos1 -c "sudo update-ca-trust enable"
vagrant ssh centos1 -c "sudo cp /vagrant/crt/* /etc/pki/ca-trust/source/anchors/"
vagrant ssh centos1 -c "sudo update-ca-trust extract"
vagrant ssh centos1 -c "sudo curl -vL https://banzaicloud.com/downloads/pke/latest -o /usr/local/bin/pke"
vagrant ssh centos1 -c "sudo chmod +x /usr/local/bin/pke"
vagrant ssh centos1 -c "sudo /scripts/pke-single.sh '$KUBERNETES_VERSION' '192.168.64.11:6443' containerd cilium"
vagrant ssh centos1 -c 'sudo cat /etc/kubernetes/admin.conf' > pke-single-config.yaml
export KUBECONFIG=$PWD/pke-single-config.yaml
echo ""
echo "You can access your PKE cluster either:"
echo "- from your host machine accessing the cluster via kubectl. Please run:"
echo "export KUBECONFIG=$PWD/pke-single-config.yaml"
echo ""
echo "- or starting a shell in the virtual machine. Please run:"
echo "vagrant ssh centos1 -c 'sudo -s'"
I commented out the build of the pke utility and instead downloaded it directly from the latest release.
After that I ran the script and waited for all Kubernetes components to be installed.
[root@centos1 ~]# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/auto-approver-6cdf7bb44f-p886f 0/1 ContainerCreating 0 4m4s
kube-system pod/cilium-gt6ck 0/1 CrashLoopBackOff 4 4m4s
kube-system pod/cilium-operator-65d6bd4cf9-vrmh4 1/1 Running 0 4m4s
kube-system pod/coredns-66985b8c8d-prhmz 0/1 ContainerCreating 0 4m4s
kube-system pod/coredns-66985b8c8d-wxqnz 0/1 ContainerCreating 0 4m4s
kube-system pod/etcd-centos1 1/1 Running 0 4m19s
kube-system pod/kube-apiserver-centos1 1/1 Running 0 4m19s
kube-system pod/kube-controller-manager-centos1 1/1 Running 0 4m18s
kube-system pod/kube-proxy-7qjds 1/1 Running 0 4m4s
kube-system pod/kube-scheduler-centos1 1/1 Running 0 4m18s
kube-system pod/local-path-provisioner-7b69d654d-k8fl4 0/1 ContainerCreating 0 4m4s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.10.0.1 <none> 443/TCP 4m21s
kube-system service/kube-dns ClusterIP 10.10.0.10 <none> 53/UDP,53/TCP,9153/TCP 4m19s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/cilium 1 1 0 1 0 <none> 4m17s
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 4m19s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/auto-approver 0/1 1 0 4m18s
kube-system deployment.apps/cilium-operator 1/1 1 1 4m17s
kube-system deployment.apps/coredns 0/2 2 0 4m19s
kube-system deployment.apps/local-path-provisioner 0/1 1 0 4m18s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/auto-approver-6cdf7bb44f 1 1 0 4m4s
kube-system replicaset.apps/cilium-operator-65d6bd4cf9 1 1 1 4m4s
kube-system replicaset.apps/coredns-66985b8c8d 2 2 0 4m4s
kube-system replicaset.apps/local-path-provisioner-7b69d654d 1 1 0 4m4s
[root@centos1 ~]# kubectl logs -n kube-system pod/cilium-gt6ck
Error from server: Get https://192.168.64.11:10250/containerLogs/kube-system/cilium-gt6ck/cilium-agent: remote error: tls: internal error
[root@centos1 containers]# cat cilium-gt6ck_kube-system_cilium-agent-514b0e54b301c8db3a37fbd497d2869d3c301c0e003508d4aa781563dbfeea33.log
...65
2020-10-19T17:00:32.431721961Z stderr F level=info msg="Annotating k8s node" subsys=daemon v4CiliumHostIP.IPv4=10.20.0.69 v4Prefix=10.20.0.0/24 v4healthIP.IPv4=10.20.0.210 v6CiliumHostIP.IPv6="<nil>" v6Prefix="<nil>" v6healthIP.IPv6="<nil>"
2020-10-19T17:00:32.431728564Z stderr F level=info msg="Initializing identity allocator" subsys=identity-cache
2020-10-19T17:00:32.431730942Z stderr F level=info msg="Cluster-ID is not specified, skipping ClusterMesh initialization" subsys=daemon
2020-10-19T17:00:32.431956193Z stderr F level=info msg="Adding local node to cluster" subsys=nodediscovery
2020-10-19T17:00:32.441482266Z stderr F level=info msg="Setting up base BPF datapath" subsys=daemon
2020-10-19T17:00:34.398110106Z stderr F level=info msg="Blacklisting local route as no-alloc" route=10.0.2.0/24 subsys=ipam
2020-10-19T17:00:34.398170967Z stderr F level=info msg="Blacklisting local route as no-alloc" route=192.168.64.0/24 subsys=ipam
2020-10-19T17:00:34.400694344Z stderr F level=error msg="Command execution failed" cmd="[iptables -w 5 -t filter -S]" error="exit status 3" subsys=iptables
2020-10-19T17:00:34.400705923Z stderr F level=warning msg="modprobe: ERROR: could not insert 'ip_tables': Exec format error" subsys=iptables
2020-10-19T17:00:34.400709297Z stderr F level=warning msg="iptables v1.6.1: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)" subsys=iptables
2020-10-19T17:00:34.40071334Z stderr F level=warning msg="Perhaps iptables or your kernel needs to be upgraded." subsys=iptables
2020-10-19T17:00:34.406168382Z stderr F level=error msg="Command execution failed" cmd="[iptables -w 5 -t filter -N CILIUM_TRANSIENT_FORWARD]" error="exit status 3" subsys=iptables
2020-10-19T17:00:34.406180964Z stderr F level=warning msg="modprobe: ERROR: could not insert 'ip_tables': Exec format error" subsys=iptables
2020-10-19T17:00:34.406183959Z stderr F level=warning msg="iptables v1.6.1: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)" subsys=iptables
2020-10-19T17:00:34.406192082Z stderr F level=warning msg="Perhaps iptables or your kernel needs to be upgraded." subsys=iptables
2020-10-19T17:00:34.406194308Z stderr F level=error msg="Error while initializing daemon" error="cannot add custom chain CILIUM_TRANSIENT_FORWARD: exit status 3" subsys=daemon
2020-10-19T17:00:34.406196664Z stderr F level=fatal msg="Error while creating daemon" error="cannot add custom chain CILIUM_TRANSIENT_FORWARD: exit status 3" subsys=daemon
If I run yum update first, everything works fine.
Expected behavior
Everything works out of the box.
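As a workaround until the base box ships matching kernel modules, the script could update the guest before running pke. The modprobe "Exec format error" suggests the installed iptables kernel modules do not match the running kernel, which a yum update plus reboot would resolve. A rough sketch of the extra steps, assuming the same centos1 box and environment as the script above:
# Hypothetical workaround: update the guest and reboot so the kernel and
# its iptables modules match, before invoking pke-single.sh.
vagrant ssh centos1 -c "sudo yum update -y"
vagrant reload centos1
vagrant ssh centos1 -c "sudo modprobe ip_tables && lsmod | grep ip_tables"
vagrant ssh centos1 -c "sudo /scripts/pke-single.sh '$KUBERNETES_VERSION' '192.168.64.11:6443' containerd cilium"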
Create the following commands:
Is your feature request related to a problem? Please describe.
I want PKE to install Docker for me.
Describe the solution you'd like to see
Install Docker if it's not already installed on the machine.
Describe alternatives you've considered
The current alternative is to use images with pre-installed docker.
Additional context
While we prefer containerd, Docker is still a requirement in some cases (e.g. for GPU workloads).
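A rough sketch of what such a check could look like on an RPM-based host (the repo setup and package names are illustrative, not necessarily what pke would implement):
# Hypothetical detection/install step; Docker's upstream repo and package
# names are assumptions, not the proposed pke behaviour.
if ! command -v dockerd >/dev/null 2>&1; then
  yum install -y yum-utils
  yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  yum install -y docker-ce docker-ce-cli containerd.io
  systemctl enable --now docker
fi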
Kubeadm does not retry on connection refused.
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: Get https://pke-x-nlb-3dfa0a01002de621.elb.eu-central-1.amazonaws.com:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 52.58.170.154:6443: connect: connection refused
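Until kubeadm itself retries, a wrapper could wait for the load balancer endpoint to answer before starting the join. A minimal sketch, assuming the endpoint from the error above; any HTTP response (even 403) is enough to prove reachability:
# Hypothetical pre-check: wait until the API endpoint accepts connections
# before letting kubeadm run its check-etcd phase.
ENDPOINT="https://pke-x-nlb-3dfa0a01002de621.elb.eu-central-1.amazonaws.com:6443"
for i in $(seq 1 30); do
  curl -sk --max-time 5 "$ENDPOINT/healthz" && break
  echo "API endpoint not reachable yet, retrying ($i/30)..."
  sleep 10
done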
Is your feature request related to a problem? Please describe.
Currently we install PKE "manually" by downloading the binary.
Describe the solution you'd like to see
Let's build RPM/DEB packages.
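One possible approach, sketched with the fpm packaging tool (illustrative only; the version, license metadata and paths are placeholders, not a decided packaging setup):
# Hypothetical packaging step using fpm; version and paths are placeholders.
VERSION=0.9.0
fpm -s dir -t rpm -n pke -v "$VERSION" --license "Apache-2.0" \
    ./build/pke=/usr/local/bin/pke
fpm -s dir -t deb -n pke -v "$VERSION" --license "Apache-2.0" \
    ./build/pke=/usr/local/bin/pke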
The ComponentStatus resource is deprecated as of Kubernetes 1.19: kubernetes/kubernetes#93570
We use it to check if the API server is available: https://github.com/banzaicloud/pke/blob/master/cmd/pke/app/phases/kubeadm/controlplane/controlplane.go#L1018
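A possible replacement is the API server's own health/readiness endpoints, which are not deprecated. A sketch of the equivalent check, not necessarily how pke will implement it:
# Check API server availability without ComponentStatus:
kubectl get --raw='/readyz?verbose'
# or, for older servers, the simpler health endpoint:
kubectl get --raw='/healthz'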
Support automatic keepalived installation/configuration
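For context, the kind of configuration this would automate is a VRRP virtual IP in front of the control plane. A minimal illustrative keepalived setup; the interface, router ID and VIP are placeholders:
# Illustrative only: a minimal keepalived VRRP config for an API server VIP.
cat <<'EOF' > /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.64.100
    }
}
EOF
systemctl enable --now keepalived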
Describe the bug
Using the ubuntu-single-docker.sh script, the installation fails on Ubuntu 20.04.
Steps to reproduce the issue:
Run ./ubuntu-single-docker.sh
Expected behavior
The installation should complete successfully.
Additional context
ubuntu-docker: E: Version '1.2.13-1' for 'containerd.io' was not found
ubuntu-docker: E: Version '5:19.03.8~3-0~ubuntu-focal' for 'docker-ce' was not found
ubuntu-docker: E: Version '5:19.03.8~3-0~ubuntu-focal' for 'docker-ce-cli' was not found
ubuntu-docker: /tmp/vagrant-shell: line 20: /etc/docker/daemon.json: No such file or directory
ubuntu-docker: Failed to restart docker.service: Unit docker.service not found.
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
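The pinned versions appear not to exist in Docker's focal repository. A quick way to see which versions are actually available (illustrative commands, run inside the Ubuntu 20.04 guest; the version string below is only an example, pick one from the madison output):
# List the versions Docker's apt repo actually provides for focal,
# then install one of them instead of the hard-coded values.
apt-cache madison docker-ce docker-ce-cli containerd.io
sudo apt-get install -y \
    docker-ce=5:19.03.12~3-0~ubuntu-focal \
    docker-ce-cli=5:19.03.12~3-0~ubuntu-focal \
    containerd.io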
Currently only the built-in kubeadm versions are supported.
Support latest Ubuntu LTS (18.04) as a base operating system for PKE.
--platform=auto|ubuntu|rhel
Support for external etcd configuration
--etcd-prefix string
Describe the bug
Upgrade fails with:
err> [preflight] Some fatal errors occurred:
err> [ERROR CoreDNSUnsupportedPlugins]: there are unsupported plugins in the CoreDNS Corefile
Steps to reproduce the issue:
pke upgrade master --kubernetes-version=1.16.4
Expected behavior
Upgrade succeeds.
Additional context
See related issue in Kubernetes: kubernetes/kubernetes#82889
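Per the linked Kubernetes issue, the check typically trips on plugins that newer CoreDNS releases dropped (for example the old proxy plugin, replaced by forward). A way to inspect the Corefile before upgrading; the exact edit depends on what the preflight check reports:
# Show the Corefile kubeadm will migrate and look for deprecated plugins.
kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
# If it still uses the removed "proxy" plugin, edit the ConfigMap and
# replace it with "forward" before re-running the upgrade.
kubectl -n kube-system edit configmap coredns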
Skip installing Kubernetes and the container runtime based on the PKE config
See #98
Is your feature request related to a problem? Please describe.
I would like to be able to collect audit logs using the logging-operator.
Describe the solution you'd like to see
I would like to see the audit log written to a hostPath, so that the fluent-bit daemonset could read those logs from there as usual.
Describe alternatives you've considered
We could push the audit logs to a webhook that writes them to stdout, so they could be collected the usual way, but I believe that would be overkill.
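For reference, the upstream kube-apiserver flags that write the audit log to a file on the host look roughly like this; the paths and rotation settings are illustrative, and how pke wires them into the static pod manifest is the open question:
# Illustrative kube-apiserver audit flags (paths are examples):
#   --audit-policy-file=/etc/kubernetes/audit-policy.yaml
#   --audit-log-path=/var/log/kubernetes/audit/audit.log
#   --audit-log-maxage=30
#   --audit-log-maxbackup=10
#   --audit-log-maxsize=100
# A minimal policy file that logs request metadata only:
cat <<'EOF' > /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
EOF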
Update PKE to send the error output and the result of the bootstrapping process. Depends on the API endpoint added in banzaicloud/pipeline#3174
fixed in #52
Is your feature request related to a problem? Please describe.
When creating a cluster using a pre-cached image with PKE already installed, I don't want:
Describe the solution you'd like to see
Store some PKE configuration in images that PKE can use to configure the node (e.g. container runtime). This configuration could be created by a user data script on base images without PKE.
Is your feature request related to a problem? Please describe.
Installing PKE on CentOS 8/RHEL 8
This message indicates a configuration state and should not be logged as a warning:
[WARNING][pipeline-ready] Skipping phase due to missing Pipeline API endpoint credentials. missing pipeline-cluster-id: validation failed
Improvement idea (easier maintenance):
Looking at the code there's a bunch of inlined text, like this:
https://github.com/banzaicloud/pke/blob/master/cmd/pke/app/phases/kubeadm/node/kubeadm.go#L112
For easier maintenance you could consider embedding static assets instead of defining them inline as strings:
https://github.com/avelino/awesome-go#resource-embedding
Hyperkube was deprecated and no official images are shipped as of 1.19.
We don't seem to use it since 1.18, but it's worth noting, as it will be removed in the kubeadm v1beta3 API.
Some references:
If no cloud provider is specified, no default storage class is defined.
Describe the bug
The advertise address is wrong after the new kubeadm config has been generated.
Steps to reproduce the issue:
Run pke upgrade on a cluster that uses different addresses for advertise-address and control-plane-endpoint.
Expected behavior
Pke upgrade should run without error.
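A quick way to see which address the regenerated config ended up with (a sketch; on the affected versions the per-node advertise address is recorded in the kubeadm-config ConfigMap's ClusterStatus section):
# Inspect the advertise address recorded for this control plane node:
kubectl -n kube-system get configmap kubeadm-config -o yaml | grep -A2 -i advertise
# Compare it with the address the API server actually binds to:
grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml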
Describe the bug
When you run banzai cp up --provider pke on a Linux box without Docker or containerd, the installer exits with an error message about ctr missing from the path.
Expected behavior
PKE installation starts, and all steps requiring ctr run only after it has been installed.
log:
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
[WARNING HTTPProxy]: Connection to "https://192.168.150.50" uses proxy "http://172.18.24.90:22222". If that is not intended, adjust your proxy settings
[WARNING HTTPProxyCIDR]: connection to "10.10.0.0/16" uses proxy "http://172.18.24.90:22222". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[WARNING HTTPProxyCIDR]: connection to "10.20.0.0/16" uses proxy "http://172.18.24.90:22222". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
/bin/kubeadm [init --config=/etc/kubernetes/kubeadm.conf] err: exit status 1 4m26.70353459s
Resetting kubeadm...
/bin/kubeadm [reset --force --cri-socket=unix:///run/containerd/containerd.sock]
[preflight] running pre-flight checks
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
W0319 18:46:33.298246 6055 reset.go:213] [reset] Unable to fetch the kubeadm-config ConfigMap, using etcd pod spec as fallback: failed to get config map: Get https://192.168.150.50:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: net/http: TLS handshake timeout
/bin/kubeadm [reset --force --cri-socket=unix:///run/containerd/containerd.sock] err: <nil> 10.096034565s
Error: exit status 1
[root@kubesphere kubernetes]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─0-containerd.conf, 10-kubeadm.conf
Active: active (running) since Tue 2019-03-19 18:58:20 CST; 1s ago
Docs: https://kubernetes.io/docs/
Main PID: 7246 (kubelet)
Tasks: 18
Memory: 26.0M
CGroup: /system.slice/kubelet.service
└─7246 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.457124 7246 kubelet.go:2266] node "kubesphere" not found
Mar 19 18:58:21 kubesphere kubelet[7246]: I0319 18:58:21.459905 7246 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
Mar 19 18:58:21 kubesphere kubelet[7246]: I0319 18:58:21.461490 7246 kubelet_node_status.go:72] Attempting to register node kubesphere
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.461995 7246 kubelet_node_status.go:94] Unable to register node "kubesphere" with API server: Post https://192.168.150.50:6443/api/v1/nodes: dial tcp 192.168.150.50:6443: connect: connection refused
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.557409 7246 kubelet.go:2266] node "kubesphere" not found
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.657547 7246 kubelet.go:2266] node "kubesphere" not found
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.746162 7246 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://192.168.150.50:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.150.50:6443: connect: connection refused
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.746904 7246 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.150.50:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubesphere&limit=500&resourceVersion=0: dial tcp 192.168.150.50:6443: connect: connection refused
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.747999 7246 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://192.168.150.50:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkubesphere&limit=500&resourceVersion=0: dial tcp 192.168.150.50:6443: connect: connection refused
Mar 19 18:58:21 kubesphere kubelet[7246]: E0319 18:58:21.757721 7246 kubelet.go:2266] node "kubesphere" not found
Is your feature request related to a problem? Please describe.
Sometimes we need to start a K8s cluster without the DenyEscalatingExec admission plugin.
Describe the solution you'd like to see
Add a --deny-escalating-exec flag to control whether the admission plugin is enabled during install.
Describe alternatives you've considered
We can SSH into the master node and edit the API server manifest.
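For reference, that manual alternative boils down to adjusting the admission plugin list in the API server's static pod manifest, roughly like this (the plugin list shown is illustrative):
# On the master node, edit the static pod manifest and adjust the flag,
# e.g. to add (or drop) DenyEscalatingExec; the kubelet restarts the pod.
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
#   --enable-admission-plugins=NodeRestriction,DenyEscalatingExec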
Do we still use it?
Is your feature request related to a problem? Please describe.
Support Kubernetes version 1.20
The Kubernetes API server fails to start on additional master nodes because the admission-control configuration is missing.
Is your feature request related to a problem? Please describe.
Implement a test for the K8s upgrade flow.
Describe the solution you'd like to see
It should run as a CircleCI job.
This issue was automatically created by Allstar.
Security Policy Violation
Dismiss stale reviews not configured for branch master
This issue will auto resolve when the policy is in compliance.
Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.