
vagrant-coreos's Introduction

This experiment is fully coded in this bash script. When you run the script, it creates a Vagrant cluster of CoreOS VMs, following the steps in this tutorial.

Pitfalls

Need to Wait Minutes for Kubernetes to Start

After the Vagrant virtual cluster starts, it takes a few minutes for Kubernetes to start, as described here. Before that, the kubectl command will complain with something like

The connection to the server 172.17.4.99:443 was refused - did you specify the right host or port?

Use the Right Client Configuration

We need to let kubectl know how to connect to the cluster by

  1. specifying a configuration file via something like export KUBECONFIG="$(pwd)/kubeconfig", and
  2. specifying a context in that configuration file via kubectl config use-context vagrant-multi.
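
Putting the two steps together with a quick check (a minimal sketch, using the kubeconfig file and context named above):

export KUBECONFIG="$(pwd)/kubeconfig"
kubectl config use-context vagrant-multi
kubectl get nodes    # should list the cluster VMs once the api-server is up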

I once forgot to update the environment variable KUBECONFIG and used a configuration file that describes a cluster whose VMs were not running. This caused kubectl to complain

The connection to the server 172.17.4.99:443 was refused - did you specify the right host or port?

No Load Balancer for Vagrant Cluster

Note that we cannot create a service of type "LoadBalancer" when using a Vagrant cluster, because load balancers are not provided by Kubernetes itself but by a cloud service such as AWS, and Vagrant does not provide one. More details are here: #2

Among the three ways for Kubernetes to expose services:

  • ClusterIP: use a cluster-internal IP only - this is the default and is discussed above. Choosing this value means that you want this service to be reachable only from inside of the cluster.
  • NodePort : on top of having a cluster-internal IP, expose the service on a port on each node of the cluster (the same port on each node). You’ll be able to contact the service on any :NodePort address.
  • LoadBalancer: on top of having a cluster-internal IP and exposing service on a NodePort also, ask the cloud provider for a load balancer which forwards to the Service exposed as a :NodePort for each Node.

we can use NodePort.

The last few lines in the bash script create a WildFly pod and expose it by creating a NodePort-type service. The script also shows how to find the VM that runs the WildFly pod, so the WildFly service can be accessed from outside the VM cluster.
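
For illustration, the NodePort approach looks roughly like this (a sketch, not the exact contents of the script; the image name, port, and label are illustrative):

kubectl run wildfly --image=jboss/wildfly --port=8080
kubectl expose deployment wildfly --type=NodePort --port=8080
kubectl get service wildfly               # note the node port assigned from the 30000-32767 range
kubectl get pod -o wide -l run=wildfly    # find which VM runs the pod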

According to k8sp/issues#13, we need to figure out how to create a load balancer for a bare-metal cluster.

Different Versions of Kubernetes Client and Server

At one point I had installed an old version (1.0.1) of the Kubernetes client but used a 1.2.3 server. When I ran kubectl run, it created only pods but no deployment. This was solved after I upgraded the client. More details are here: #1
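
A quick way to spot such a mismatch is to print both versions (a sketch):

kubectl version    # shows Client Version and Server Version; they should be close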

GFW

Following the document https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant-single.html, vagrant up works fine, but kubectl get nodes returns the error: The connection to the server 172.17.4.99:443 was refused - did you specify the right host or port? After vagrant ssh into the VM, docker ps hangs with no output; Ctrl + C can terminate it.

@typhoonzero described: Found something here: https://github.com/coreos/coreos-kubernetes/blob/master/Documentation/kubernetes-on-vagrant.md NOTE: When the cluster is first launched, it must download all container images for the cluster components (Kubernetes, dns, heapster, etc). Depending on the speed of your connection, it can take a few minutes before the Kubernetes api-server is available. Before the api-server is running, the kubectl command above may show output similar to:

The connection to the server 172.17.4.101:443 was refused - did you specify the right host or port? Maybe we need a different docker registry in China, or try to use proxies?

A similar issue: coreos/coreos-kubernetes#393

Preliminary conclusion: the failures to download Docker images can be resolved by going through a proxy that bypasses the GFW.

If the first vagrant up was run without a proxy, downloading the Flannel image fails, and a whole series of subsequent steps fail as a result. At that point, running only vagrant halt (rather than vagrant destroy) and then vagrant up again on a proxied network does not fix the problem. The reason: "When the cluster is first launched, it must download all container images for the cluster components."

Summary:

  1. In a GFW environment, first set up a proxy to get past the firewall.
  2. On the proxied network, vagrant destroy all the VMs and run vagrant up again, as sketched below.
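
A minimal sketch of the recovery steps, assuming the proxy is already configured for the host network:

vagrant destroy -f    # throw away the half-initialized VMs
vagrant up            # re-provision so all component images are downloaded through the proxy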

kubectl get pods Returns "Pending" When Running Guestbook on a multi-node Cluster

After running kubectl create -f examples/guestbook/, kubectl get services returns normal output, but kubectl get pods shows some pods in the Pending state. In that case, run kubectl describe pod <NAME> to see the exact reason; the last few lines of the output contain messages such as: Node didn't have enough resource: Memory, requested: xxxx, use: xxxx, capacity: xxxxxx or: Node didn't have enough resource: CPU, requested: xxxx, use: xxxx, capacity: xxxxxx

This is caused by insufficient VM resources. Increase them by editing the Vagrantfile, e.g.:

$worker_vm_memory = 2048

Note:

Changing the setting directly in the VirtualBox GUI does not work: every vagrant up resets it to the value in the Vagrantfile.
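
To apply the change, have Vagrant re-read the Vagrantfile and restart the VMs (a sketch; w1 is the worker VM name used in the multi-node setup, so adjust it to your cluster):

vagrant reload w1                  # restart the worker with the new memory setting
# or, for a completely clean start:
vagrant destroy -f && vagrant up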

vagrant-coreos's Issues

flanneld restarts multiple times at CoreOS boot time

I am using the VM cluster defined in https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/vagrant with minor changes:

  1. Each VM (c1, w1 and e1) has 2GB memory
  2. All VMs run CoreOS stable (VERSION=1010.5.0)

After starting the cluster, I SSHed into c1, and the login prompt showed that coreos-cloudinit-xxx.service had failed:

$ vagrant ssh c1
CoreOS stable (1010.5.0)
Failed Units: 2
  coreos-cloudinit-120749834.service
  update-engine.service

systemctl list-units also shows that coreos-cloudinit-120749834.service failed.

systemctl status coreos-cloudinit-120749834.service shows log messages saying

Jun 03 23:37:40 c1 bash[1172]: Job for flanneld.service failed because the control process exited with error code. See "systemctl status flanneld.service" and "journalctl -xe" for details.

So I ran systemctl status flanneld.service, which showed that the flanneld Docker image had been downloaded and the service restarted multiple times. It doesn't make sense to me to restart flanneld that many times.

journalctl -xe shows more details:

core@c1 ~ $ journalctl -xe
Jun 03 23:44:02 c1 systemd[1]: flanneld.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 03 23:44:02 c1 systemd[1]: Failed to start Network fabric for containers.
-- Subject: Unit flanneld.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has failed.
-- 
-- The result is failed.
Jun 03 23:44:02 c1 systemd[1]: flanneld.service: Unit entered failed state.
Jun 03 23:44:02 c1 systemd[1]: flanneld.service: Failed with result 'exit-code'.
Jun 03 23:44:05 c1 dockerd[1300]: time="2016-06-03T23:44:05.198857494Z" level=info msg="Pull session cancelled"
Jun 03 23:44:08 c1 systemd[1]: flanneld.service: Service hold-off time over, scheduling restart.
Jun 03 23:44:08 c1 systemd[1]: Stopped Network fabric for containers.
-- Subject: Unit flanneld.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has finished shutting down.
Jun 03 23:44:08 c1 systemd[1]: Starting Network fabric for containers...
-- Subject: Unit flanneld.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has begun starting up.
Jun 03 23:44:08 c1 dockerd[1300]: time="2016-06-03T23:44:08.308888684Z" level=error msg="Handler for POST /v1.22/containers/create returned error: No such image: quay.io/coreos/flannel:0.5.5"
Jun 03 23:44:08 c1 sdnotify-proxy[1502]: Unable to find image 'quay.io/coreos/flannel:0.5.5' locally
Jun 03 23:44:15 c1 sdnotify-proxy[1502]: 0.5.5: Pulling from coreos/flannel
Jun 03 23:44:15 c1 sdnotify-proxy[1502]: 7bfac8493465: Pulling fs layer
Jun 03 23:44:15 c1 sdnotify-proxy[1502]: a3ed95caeb02: Pulling fs layer
Jun 03 23:44:15 c1 sdnotify-proxy[1502]: fdaeea203ca1: Pulling fs layer
Jun 03 23:44:15 c1 sdnotify-proxy[1502]: 1f0ee8606937: Pulling fs layer
Jun 03 23:44:15 c1 sdnotify-proxy[1502]: 1f0ee8606937: Waiting
Jun 03 23:44:20 c1 sdnotify-proxy[1502]: a3ed95caeb02: Verifying Checksum
Jun 03 23:44:20 c1 sdnotify-proxy[1502]: a3ed95caeb02: Download complete
Jun 03 23:44:23 c1 sdnotify-proxy[1502]: 1f0ee8606937: Verifying Checksum
Jun 03 23:44:23 c1 sdnotify-proxy[1502]: 1f0ee8606937: Download complete
Jun 03 23:45:38 c1 systemd[1]: flanneld.service: Start operation timed out. Terminating.
Jun 03 23:45:38 c1 systemd[1]: flanneld.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 03 23:45:38 c1 systemd[1]: Failed to start Network fabric for containers.
-- Subject: Unit flanneld.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has failed.
-- 
-- The result is failed.
Jun 03 23:45:38 c1 systemd[1]: flanneld.service: Unit entered failed state.
Jun 03 23:45:38 c1 systemd[1]: flanneld.service: Failed with result 'exit-code'.
Jun 03 23:45:39 c1 dockerd[1300]: time="2016-06-03T23:45:39.867910333Z" level=info msg="Pull session cancelled"
Jun 03 23:45:43 c1 systemd[1]: flanneld.service: Service hold-off time over, scheduling restart.
Jun 03 23:45:43 c1 systemd[1]: Stopped Network fabric for containers.
-- Subject: Unit flanneld.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has finished shutting down.
Jun 03 23:45:43 c1 systemd[1]: Starting Network fabric for containers...
-- Subject: Unit flanneld.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has begun starting up.
Jun 03 23:45:43 c1 dockerd[1300]: time="2016-06-03T23:45:43.798850628Z" level=error msg="Handler for POST /v1.22/containers/create returned error: No such image: quay.io/coreos/flannel:0.5.5"
Jun 03 23:45:43 c1 sdnotify-proxy[1533]: Unable to find image 'quay.io/coreos/flannel:0.5.5' locally

kubectl run doesn't create deployment

I tried to follow this tutorial to run Kubernetes on a Vagrant cluster of multiple CoreOS nodes. I can start the cluster, but when I run

kubectl run nginx --image=nginx --port=80

I got only pods but no deployments, which means that kubectl get pods -l run=nginx lists a pod, but kubectl get deployments lists no deployment.

The problem can be recreated by running https://github.com/k8sp/vagrant-coreos/blob/master/run.sh

I have the same problem if I run the above experiment on a single-node cluster, as shown in https://github.com/k8sp/vagrant-coreos/blob/master/run-single.sh

kubectl get nodes: The connection to the server 172.17.4.99:443 was refused - did you specify the right host or port?

When following the document https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant-single.html, vagrant up works fine, but kubectl get nodes returns the error:
The connection to the server 172.17.4.99:443 was refused - did you specify the right host or port?

Environment

Ubuntu 16.04 64-bit (Thinkpad X230i, dual-booting Windows 10 and Ubuntu)

Steps:

wget "https://releases.hashicorp.com/vagrant/1.8.1/vagrant_1.8.1_x86_64.deb"
sudo dpkg -i vagrant_1.8.1_x86_64.deb
curl -O https://storage.googleapis.com/kubernetes-release/release/v1.2.3/bin/linux/amd64/kubectl
chmod +x kubectl
sudo mv kubectl /usr/local/bin/kubectl
sudo apt-get install virtualbox
git clone https://github.com/coreos/coreos-kubernetes.git
cd coreos-kubernetes/single-node/
sed -i 's/alpha/beta/g' Vagrantfile
vagrant box update
vagrant up

The download was too slow, so I switched to downloading the box file manually: http://beta.release.core-os.net/amd64-usr/1010.3.0/coreos_production_vagrant.box

vagrant box add my-coreos /home/liangjiameng/Downloads/coreos_production_vagrant.box

Edit the Vagrantfile: set config.vm.box = "my-coreos" and comment out the following two lines (the box version and url)

vagrant up

After vagrant up, VirtualBox reported an error. Run the following two lines:

sudo modprobe vboxdrv
sudo modprobe vboxnetadp

Running vagrant up again then starts the VM normally:

vagrant up

Configure kubectl

export KUBECONFIG="${KUBECONFIG}:$(pwd)/kubeconfig"
kubectl config use-context vagrant-single
kubectl get nodes

This returns

The connection to the server 172.17.4.99:443 was refused - did you specify the right host or port?

Other information:

$ ping 172.17.4.99 succeeds (the host is reachable)
$ vagrant ssh
Last login: Sun May 15 14:59:07 2016 from 10.0.2.2
CoreOS beta (1010.3.0)
Failed Units: 2
coreos-cloudinit-137139417.service
update-engine.service

core@localhost ~ $ systemctl | grep kube produces no output
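
A few commands that can help diagnose this state from inside the VM (a sketch; the flanneld and kubelet unit names follow the coreos-kubernetes setup and are assumptions here):

systemctl status flanneld kubelet               # are the core units running?
journalctl -u flanneld --no-pager | tail -n 20  # look for image-pull failures
docker images                                   # have the component images been pulled yet?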

No external IP after creating service

I ran

kubectl run nginx --image=nginx --port=80

and created a deployment nginx. Then I ran

kubectl expose deployment nginx --type="LoadBalancer"

and created the service nginx. However, this service doesn't have an external IP:

$ kubectl get service
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.3.0.1     <none>        443/TCP   16m
nginx        10.3.0.212                 80/TCP    4m

I found the IP of the VM that runs the nginx pod using kubectl get pod -o wide. However, curl http://<ip>:80 complains "empty reply from server".
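
As the README section above notes, a Vagrant cluster has no cloud load balancer, so a NodePort service is the workable alternative (a sketch; the nginx-np name is made up here to avoid clashing with the existing service):

kubectl expose deployment nginx --type=NodePort --port=80 --name=nginx-np
kubectl get service nginx-np      # note the node port Kubernetes assigned
# then curl http://<node-ip>:<node-port>/ from the host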
