att-comdev / halcyon-kubernetes
Ansible playbooks for a kubeadm-based kubernetes deployment, supporting any cloud and any kubeadm-enabled OS.
License: Apache License 2.0
The proxy_enable setting added in #28 is not defined in group_vars, which leads to the following error:
fatal: [ravi-kube196]: FAILED! => {"failed": true, "msg": "The conditional check 'docker_shared_mounts or proxy_enable' failed. The error was: error while evaluating conditional (docker_shared_mounts or proxy_enable): 'proxy_enable' is undefined\n\nThe error appears to have been in '/Users/rmehra/esupport/code/halcyon-vagrant-kubernetes/halcyon-kubernetes/kube-deploy/roles/deploy-kube/tasks/ubuntu.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: setting up docker unit drop-in dir\n ^ here\n"}
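A minimal sketch of a fix, assuming the playbook keeps its defaults in group_vars (the all.yml file name is an assumption):

# group_vars/all.yml -- give the new setting a default so the conditional can evaluate
proxy_enable: false

Alternatively, the conditional itself could tolerate the missing variable:

when: docker_shared_mounts or (proxy_enable | default(false))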
This is more of a usability issue. I was running this playbook multiple times and resetting in-between. I forgot to remove the deployed /etc/systemd/system/kubelet.service.d/15-hostname-override.conf file and was constantly hitting #35. It would be nice if the file were removed (state: absent) when kubelet_hostname_override: false, as in the sketch below.
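A minimal sketch of such a cleanup task, assuming the existing kubelet_hostname_override variable gates it:

- name: remove the kubelet hostname-override drop-in when disabled
  file:
    path: /etc/systemd/system/kubelet.service.d/15-hostname-override.conf
    state: absent
  when: not kubelet_hostname_override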
When the kubelet_hostname_override setting is enabled, kube-proxy gives the following startup errors:
E0112 06:07:10.274956 1 server.go:421] Can't get Node "kube1", assuming iptables proxy, err: nodes "kube1" not found
I0112 06:07:10.276098 1 server.go:215] Using iptables Proxier.
W0112 06:07:10.277165 1 server.go:468] Failed to retrieve node info: nodes "kube1" not found
W0112 06:07:10.277296 1 proxier.go:249] invalid nodeIP, initialize kube-proxy with 127.0.0.1 as nodeIP
W0112 06:07:10.277347 1 proxier.go:254] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0112 06:07:10.277415 1 server.go:227] Tearing down userspace rules.
I0112 06:07:10.287037 1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0112 06:07:10.287505 1 conntrack.go:66] Setting conntrack hashsize to 32768
I0112 06:07:10.287732 1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0112 06:07:10.287818 1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I'm not sure of the symptoms of this error. I was having multiple networking issues, so it was hard to correlate which error messages caused which issues. The errors can be cleared, though, by passing the --hostname-override setting to kube-proxy as well as kubelet, which is apparently required according to kubernetes/kubernetes#18104 (comment).
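That fix isn't confirmed as this project's method, but on kubeadm clusters kube-proxy typically runs as a DaemonSet in kube-system, so a hedged sketch is to add the flag to its container command (fragment only; the existing flags are elided, and the node name is just the example from the logs above):

# fragment of the kube-proxy DaemonSet pod spec (kubectl -n kube-system edit ds kube-proxy)
containers:
- name: kube-proxy
  command:
  - kube-proxy
  # existing flags elided; use the same override value the kubelet is given
  - --hostname-override=kube1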
Self-explanatory: Reviewable causes more confusion than it solves in reviews.
I have two IP addresses on my "public_iface" interface. Commands like ip addr show {{ public_iface }} | grep "inet\b" | awk '{print $2}' | cut -d/ -f1
will grab both addresses, which makes subsequent commands fail.
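A hedged sketch of a more robust lookup that takes only the first IPv4 address; the task name and registered variable are illustrative, not the playbook's actual names:

- name: get the first IPv4 address on the public interface
  shell: ip -4 -o addr show {{ public_iface }} | awk '{print $4}' | cut -d/ -f1 | head -n 1
  register: public_ipv4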
one nice (and extremely simple) change that we can make is to have kubectl auto-completion. this can be done with a simple ansible lineinfile that adds source <(kubectl completion bash) to ~/.bashrc.
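a minimal sketch of that task (the rc file path and task name are assumptions):

- name: enable kubectl bash auto-completion
  lineinfile:
    path: ~/.bashrc          # the connecting user's shell rc; adjust per target user
    line: source <(kubectl completion bash)
    state: present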
i can get back to this later, but if someone in the community wants to add to the playbooks...that's fine also. i just wanted to create the feature request before i forget it (as i'm working on something else).
similar to CoreOS and #54, a lot of people would like to see support for Atomic as well.
this issue is just to track this effort and let the community know that there is an intention to support more base OSes. Ceph support could potentially be easier on Atomic than on CoreOS, and it's something we'd like to have in place in order to support both openstack-helm and kolla-kubernetes.
Kubernetes 1.5 now checks for the --cluster-cidr flag and will give a warning without it. See kubernetes/kubernetes#39440. There is some discussion of how kubeadm could resolve it automatically in the future, but for now it looks like we'll have to set it ourselves.
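Until kubeadm handles this automatically, one hedged way to set it ourselves is to append --cluster-cidr to the kube-proxy command in its DaemonSet, matching the pod network CIDR of whichever SDN is deployed (the 10.244.0.0/16 value below is just flannel's conventional default, an assumption):

# fragment of the kube-proxy DaemonSet command; other flags elided
- --cluster-cidr=10.244.0.0/16   # must match the deployed SDN's pod network CIDR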
As a result of moby/moby#30083, pulls from gcr.io are not working in all cases.
Thanks for the great project. Do you have any plans to support k8s 1.5+ in an offline environment?
add ci testing to this repo/deployment.
It was determined that support for KVM was necessary for some individuals who wish to use the halcyon-kubernetes project for their development needs.
i think we need to add multiple cni-enabled sdn providers to the set of playbooks. i would like to see this done in the following (somewhat opinionated) way:
role:
  kube-sdn:
    tasks:
      main.yml (referring to other sdn playbooks)
      calico.yml
      canal.yml
      romana.yml
      weave.yml
i would like to add an "sdn bootstrapped" artifact at /etc/kubernetes/halcyon/network/.sdn, and include the following output that can then be used later to identify which SDN is deployed:
ubuntu@kube1:~$ cat /etc/kubernetes/halcyon/network/.sdn
# Halcyon Network Deployment:
kube_sdn_deploy: {{ kube_sdn_deploy }}
kube_sdn: {{ kube_sdn }}
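a minimal sketch of tasks that could write that artifact (the module choice and task names are assumptions, not the project's confirmed approach):

- name: ensure the halcyon network state dir exists
  file:
    path: /etc/kubernetes/halcyon/network
    state: directory

- name: record which sdn was bootstrapped
  copy:
    dest: /etc/kubernetes/halcyon/network/.sdn
    content: |
      # Halcyon Network Deployment:
      kube_sdn_deploy: {{ kube_sdn_deploy }}
      kube_sdn: {{ kube_sdn }}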
if any of the sdn folks are interested in this, that's fine...i can pick this up later as well.
When deploying Romana on worker nodes spun up with vagrant, the romana agent pods remain in a CrashLoopBackOff state. Describing the pods shows:
8m 8m 1 {kubelet 172.16.35.13} spec.containers{romana-agent} Normal Started Started container with docker id 1241efd23e9a
8m 8m 1 {kubelet 172.16.35.13} spec.containers{romana-agent} Normal Created Created container with docker id 1241efd23e9a; Security:[seccomp=unconfined]
8m 5m 13 {kubelet 172.16.35.13} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "romana-agent" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=romana-agent pod=romana-agent-dgth5_kube-system(0eada120-aad2-11e6-831b-02d0b043c29f)"
The docker logs for the container show:
ubuntu@kube1:~$ sudo docker logs 99bda40ea34c
Error: Unable to fetch list of hosts using 'http://10.99.99.99:9600'
run-romana-agent: entrypoint for romana services container.
Options:
-h or --help: print usage
--romana-root: URL for Romana Root service, eg: http://127.0.0.1:9600
--interface: Interface for IP Address lookup, instead of eth0
--nat: Add NAT rule for traffic coming from pods
--nat-interface: Interface that NATs traffic, instead of eth0 or --interface setting.
--cluster-ip-cidr: CIDR for cluster IPs. Excluded from NAT rule.
--pod-to-host: Permit pods to connect to the host server
The vagrant-proxyconf plugin doesn't properly set up the docker proxy for CentOS and requires a vagrant reload after docker is installed, which causes problems with the ansible playbook.
@mwgiles (https://github.com/mwgiles) has proposed a fix (https://github.com/portdirect/halcyon-kubernetes/pull/1) that should be brought into this repo to resolve this issue.
for some reason, people see "vagrant" and immediately think that this is only for a local lab, when in fact we're allowing for multiple vagrant provider deployments. so in order to address this concern, i think these playbooks need to be broken away from the vagrant (or any terraform) solution. so the repos would look like this:
halcyon-terraform-kubernetes:
- terraform deployments that use halcyon-kubernetes as a submodule.
halcyon-vagrant-kubernetes:
- vagrant deployments that use halcyon-kubernetes as a submodule.
halcyon-kubernetes:
- just the necessary ansible playbooks that are portable and can be used as a Galaxy role, or as a submodule for other deployments.
i think this will make the most sense for other people long term.
It would be great if the project could help deploy kubernetes to other architectures like arm/arm64. The project as-is already almost gets a working install since recent kubeadm versions have good multi-platform support built in.
Here are my post-playbook steps for getting to a happy state.
sudo kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed "s/amd64/arm64/g" | sudo kubectl create -f -
sudo kubectl delete -f https://rawgit.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml
curl -sSL https://rawgit.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml | sed "s/amd64/arm64/g" | sudo kubectl create -f -
sudo kubectl delete pods --all --namespace=kube-system && sudo kubectl delete pods --all
Helm/tiller doesn't currently have arm64 support, but it does have 32-bit arm support and it can be deployed with:
curl -L http://storage.googleapis.com/kubernetes-helm/helm-canary-linux-arm.tar.gz | tar zxv --strip 1 -C /tmp; chmod +x /tmp/helm; sudo mv /tmp/helm /usr/local/bin/helm
sudo /usr/local/bin/helm init
To make these steps work with this project, I'm thinking either ansible could detect the platform with uname -m, or it could be user-specified in group_vars/all.yml. Then ansible would just modify the dashboard and flannel manifests with the specified architecture before installing them, as in the sketch below.
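A hedged sketch of the detection side, using the ansible_architecture fact instead of shelling out to uname -m; the variable name, arch map, and manifest path are assumptions:

- name: map the machine architecture to a kubernetes image arch
  set_fact:
    kube_arch: "{{ {'x86_64': 'amd64', 'aarch64': 'arm64', 'armv7l': 'arm'}.get(ansible_architecture, ansible_architecture) }}"

- name: rewrite the flannel manifest for this architecture
  replace:
    path: /tmp/kube-flannel.yml   # hypothetical local copy of the manifest
    regexp: amd64
    replace: "{{ kube_arch }}"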
I got a fatal error message during "vagrant up"; below is the error message:
➜ halcyon-vagrant-kubernetes git:(master) ✗ ./setup-halcyon.sh --k8s-config kolla --k8s-version v1.5.2 --guest-os centos
➜ halcyon-vagrant-kubernetes git:(master) ✗ vagrant up
...
TASK [kube-init : initialize the kubernetes master] ****************************
fatal: [kube1]: FAILED! => {
"changed":true,
"cmd":"kubeadm init --token ac7da3.b2cofcda6ab01976 --use-kubernetes-version v1.5.2 --api-advertise-addresses 172.16.35.11",
"delta":"0:00:00.067388",
"end":"2017-04-17 06:39:06.573168",
"failed":true,
"rc":1,
"start":"2017-04-17 06:39:06.505780",
"stderr":"Error: unknown flag: --use-kubernetes-version\nUsage:\n kubeadm init [flags]\n\nFlags:\n --apiserver-advertise-address string The IP address the API Server will advertise it's listening on. 0.0.0.0 means the default network interface's address.\n --apiserver-bind-port int32 Port for the API Server to bind to (default 6443)\n --apiserver-cert-extra-sans stringSlice Optional extra altnames to use for the API Server serving cert. Can be both IP addresses and dns names.\n --cert-dir string The path where to save and store the certificates (default "/etc/kubernetes/pki")\n --config string Path to kubeadm config file (WARNING: Usage of a configuration file is experimental)\n --kubernetes-version string Choose a specific Kubernetes version for the control plane (default "v1.6.0")\n --pod-network-cidr string Specify range of IP addresses for the pod network; if set, the control plane will automatically allocate CIDRs for every node\n --service-cidr string Use alternative range of IP address for service VIPs (default "10.96.0.0/12")\n --service-dns-domain string Use alternative domain for services, e.g. "myorg.internal" (default "cluster.local")\n --skip-preflight-checks Skip preflight checks normally run before modifying the system\n --token string The token to use for establishing bidirectional trust between nodes and masters.\n --token-ttl duration The duration before the bootstrap token is automatically deleted. 0 means 'never expires'.",
"stderr_lines":[
"Error: unknown flag: --use-kubernetes-version",
"Usage:",
" kubeadm init [flags]",
"",
"Flags:",
" --apiserver-advertise-address string The IP address the API Server will advertise it's listening on. 0.0.0.0 means the default network interface's address.",
" --apiserver-bind-port int32 Port for the API Server to bind to (default 6443)",
" --apiserver-cert-extra-sans stringSlice Optional extra altnames to use for the API Server serving cert. Can be both IP addresses and dns names.",
" --cert-dir string The path where to save and store the certificates (default "/etc/kubernetes/pki")",
" --config string Path to kubeadm config file (WARNING: Usage of a configuration file is experimental)",
" --kubernetes-version string Choose a specific Kubernetes version for the control plane (default "v1.6.0")",
" --pod-network-cidr string Specify range of IP addresses for the pod network; if set, the control plane will automatically allocate CIDRs for every node",
" --service-cidr string Use alternative range of IP address for service VIPs (default "10.96.0.0/12")",
" --service-dns-domain string Use alternative domain for services, e.g. "myorg.internal" (default "cluster.local")",
" --skip-preflight-checks Skip preflight checks normally run before modifying the system",
" --token string The token to use for establishing bidirectional trust between nodes and masters.",
" --token-ttl duration The duration before the bootstrap token is automatically deleted. 0 means 'never expires'."
],
"stdout":"",
"stdout_lines":[
]
}
...
Then I accessed my kube1 VM and checked the kubeadm version. Below is the kubeadm version information from my kube1 VM:
[vagrant@kube1 ~]$ kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Please check the changelog below; the "--use-kubernetes-version" flag was changed:
https://github.com/kubernetes/kops/blob/master/vendor/k8s.io/kubernetes/CHANGELOG.md
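For reference, a hedged sketch of the equivalent task against kubeadm 1.6, using the renamed flags shown in the usage text above (--kubernetes-version and --apiserver-advertise-address), with the token and address copied from the failing run:

- name: initialize the kubernetes master
  command: >
    kubeadm init
    --token ac7da3.b2cofcda6ab01976
    --kubernetes-version v1.5.2
    --apiserver-advertise-address 172.16.35.11
  # note: kubeadm v1.6 may also refuse to deploy a v1.5.x control plane,
  # so the requested version may need bumping alongside the flag rename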
many people would like to see this feature added, especially for projects like openstack-helm and kolla-kubernetes.
i'd like to see CoreOS supported for the project, and this issue is just to track this and let the community know that this is the project's intention. i know we're going to run into issues with Ceph support, but it's possible that this could get easier with things like a Ceph Helm Chart.
i would like to add centos tasks to these playbooks as well. this should be done in a somewhat opinionated way (i've sort of started the framework):
roles:
  playbook:
    main.yml (referring to centos/ubuntu tasks)
    centos.yml
    ubuntu.yml
    {{ future_os_support.yml }}
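a minimal sketch of what that main.yml dispatch could look like (lowercasing the distribution fact to match the task file names is an assumption):

# roles/<role>/tasks/main.yml -- hypothetical dispatch
- name: include os-specific tasks
  include_tasks: "{{ ansible_distribution | lower }}.yml"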