coreos-distro's People

Contributors

mattma, yichengq

Forkers

yichengq, openxxs

coreos-distro's Issues

Using images/containers instead of curl/wget binaries

Instead of curling the Kubernetes binaries onto the host, we should consider using a Docker or rkt image instead.

Something like the following:

# kube-apiserver.service

[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=etcd2.service setup-network-environment.service
After=etcd2.service setup-network-environment.service

[Service]
EnvironmentFile=/etc/network-environment
ExecStartPre=-/usr/bin/docker kill api-server
ExecStartPre=-/usr/bin/docker rm api-server
ExecStartPre=/usr/bin/docker pull mattma/kube-apiserver:1.0.1
ExecStart=/usr/bin/docker run --name api-server mattma/kube-apiserver:1.0.1
ExecStop=/usr/bin/docker stop api-server
Restart=always
RestartSec=10

[X-Fleet]
Global=true
MachineMetadata=role=master

The mattma/kube-apiserver:1.0.1 image is located here. To build this image, I used a configuration that matches exactly what we currently have.

The tricky thing to note: when the Docker container is created, it runs in an isolated network environment, so how does it talk to the outside world? In the current implementation the binary runs directly on the host machine, so it never runs into this situation. One option is sketched below.
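One possible answer (an assumption on my part, not what the repo currently ships) is to run the container in the host's network namespace, so the apiserver can reach etcd on 127.0.0.1 and remains reachable on the host's port 8080:

# Sketch: host networking for the containerized apiserver
ExecStart=/usr/bin/docker run --net=host --name api-server mattma/kube-apiserver:1.0.1

With --net=host the container shares the host's interfaces, which matches how the binary behaves today; the trade-off is that this container loses network isolation.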

Confirm Docker lands on flannel's network

flannel.service starts first to set up the flannel network. All Docker containers subsequently started on Node machines should land on the flannel network.

It seems to work currently without any configuration, but we are not sure that is actually the case. Someone needs to confirm that docker.service works as expected; one way to check is sketched below.
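A quick check (a sketch; the subnet value is illustrative): flannel writes its lease to /run/flannel/subnet.env, and the docker0 bridge address should fall inside that subnet once the Docker daemon has picked it up:

core@node-01 ~ $ cat /run/flannel/subnet.env   # written by flannel.service
FLANNEL_SUBNET=10.244.45.1/24                  # example value
FLANNEL_MTU=1472
core@node-01 ~ $ ip -4 addr show docker0
    inet 10.244.45.1/24 scope global docker0   # should match FLANNEL_SUBNET

If docker0 still sits on Docker's stock 172.17.0.0/16 range, docker.service came up before flannel configured it.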

Three-master setup

Update the README.md and etcd-user-data with three-master setup instructions.

A single master node works as expected; three master nodes do not.

Question 1: when all three machines boot with the etcd-user-data file as their cloud-init settings, they all load the static value listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001. See here.

Following the current README, on the first master I get:

coreos-01 core # etcdctl   cluster-health
cluster is healthy
member 6ae27f9fa2984b1d is healthy

But it did not detect the second master, even though I have the setting below (and ran daemon-reload on both machines after writing the new value to initial-cluster.conf):

# /etc/systemd/system/etcd2.service.d/initial-cluster.conf
[Service]
Environment="ETCD_INITIAL_CLUSTER=784767c97ce5410f931f0cfee19523f8=http://172.17.8.101:2380,c1f07ef72ede4399b9fc35d14e5db0a8=http://172.17.9.101:2380"

On the second machine I followed the same steps as for a single master, with the setting above in initial-cluster.conf. So in this case masters one and two have exactly the same value in /etc/systemd/system/etcd2.service.d/initial-cluster.conf.

etcd on master two appears to run fine, but I get the error below when running:

node-01 core # etcdctl cluster-health
Error:  cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379

I believe the static values in the cloud-init file are the issue and may need to be updated as well; a sketch of per-machine values follows.
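As a sketch of what per-machine cloud-init values for three masters might look like (the member names and the second and third IPs are illustrative, not the repo's committed values):

#cloud-config
coreos:
  etcd2:
    name: master-01
    initial-cluster: master-01=http://172.17.8.101:2380,master-02=http://172.17.8.102:2380,master-03=http://172.17.8.103:2380
    initial-cluster-state: new
    advertise-client-urls: http://172.17.8.101:2379
    initial-advertise-peer-urls: http://172.17.8.101:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://172.17.8.101:2380

The key point is that name and the advertise URLs must differ per machine while initial-cluster is identical everywhere and lists every member. Note also that etcd only reads the initial-cluster settings on first bootstrap; once a member has written its data directory (/var/lib/etcd2), editing initial-cluster.conf and reloading has no effect until that directory is wiped.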

Another question: if I set the ROLE environment variable, e.g. ROLE=master with vagrant up, it loads etcd-user-data, and ROLE=node loads user-data. This is legacy behavior, right? Will it always load etcd-user-data in this case?

Kube-kubelet does not set up correctly with `dns`

core@kube-node-02 ~ $ sudo journalctl -u kube-kubelet
-- Logs begin at Fri 2015-08-21 18:05:09 UTC, end at Fri 2015-08-21 19:40:15 UTC. --
Aug 21 18:11:10 kube-node-02 systemd[1]: [/run/fleet/units/kube-kubelet.service:15] Unknown lvalue '--cluster_dns' in section 'Service'
Aug 21 18:11:10 kube-node-02 systemd[1]: [/run/fleet/units/kube-kubelet.service:16] Unknown lvalue '--cluster_domain' in section 'Service'
Aug 21 18:11:10 kube-node-02 systemd[1]: Starting Kubernetes Kubelet...
Aug 21 18:11:10 kube-node-02 rm[1452]: /usr/bin/rm: cannot remove '/opt/bin/kubelet': No such file or directory
Aug 21 18:11:10 kube-node-02 curl[1454]: Warning: Illegal date format for -z, --timecond (and not a file name).
Aug 21 18:11:10 kube-node-02 curl[1454]: Warning: Disabling time condition. See curl_getdate(3) for valid date syntax.
Aug 21 18:11:10 kube-node-02 curl[1454]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Aug 21 18:11:10 kube-node-02 curl[1454]: Dload  Upload   Total   Spent    Left  Speed
Aug 21 18:11:14 kube-node-02 curl[1454]: [471B blob data]
Aug 21 18:11:14 kube-node-02 systemd[1]: Started Kubernetes Kubelet.
Aug 21 18:11:14 kube-node-02 kubelet[1476]: W0821 18:11:14.227446    1476 server.go:462] Could not load kubeconfig file /var/lib/kubelet/kubeconfig: stat /var/lib/kubelet/kubeconfig: no such file or directory. Trying auth path instead.
Aug 21 18:11:14 kube-node-02 kubelet[1476]: W0821 18:11:14.227710    1476 server.go:424] Could not load kubernetes auth path /var/lib/kubelet/kubernetes_auth: stat /var/lib/kubelet/kubernetes_auth: no such file or directory. Continuing with defaults.
Aug 21 18:11:14 kube-node-02 kubelet[1476]: I0821 18:11:14.227882    1476 manager.go:127] cAdvisor running in container: "/system.slice"
Aug 21 18:11:14 kube-node-02 kubelet[1476]: I0821 18:11:14.228296    1476 fs.go:93] Filesystem partitions: map[/dev/sda9:{mountpoint:/ major:8 minor:9} /dev/sda3:{mountpoint:/usr major:8 minor:3} /dev/sda6:{mountpoint:/usr/share/oem major:8 minor:6}]
Aug 21 18:11:14 kube-node-02 kubelet[1476]: I0821 18:11:14.229182    1476 manager.go:156] Machine: {NumCores:1 CpuFrequency:2798419 MemoryCapacity:1045966848 MachineID:305215dab8894a50a74d3e5a305c8396 SystemUUID:C7BCCB48-62B8-4A24-B7A6-1428FCEEF09D BootID
Aug 21 18:11:14 kube-node-02 kubelet[1476]: I0821 18:11:14.231838    1476 manager.go:163] Version: {KernelVersion:4.1.5-coreos ContainerOsVersion:CoreOS 779.0.0 DockerVersion:1.7.1 CadvisorVersion:0.15.1}
Aug 21 18:11:14 kube-node-02 kubelet[1476]: I0821 18:11:14.232189    1476 plugins.go:69] No cloud provider specified.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.076494    1476 docker.go:295] Connecting to docker on unix:///var/run/docker.sock
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.076870    1476 server.go:661] Watching apiserver
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.121402    1476 plugins.go:56] Registering credential provider: .dockercfg
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.127058    1476 server.go:623] Started kubelet
Aug 21 18:11:15 kube-node-02 kubelet[1476]: E0821 18:11:15.127933    1476 kubelet.go:682] Image garbage collection failed: unable to find data for container /
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.133172    1476 kubelet.go:702] Running in container "/kubelet"
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.133251    1476 server.go:63] Starting to listen on 0.0.0.0:10250
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.291570    1476 factory.go:226] System is using systemd
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.292315    1476 factory.go:234] Registering Docker factory
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.292856    1476 factory.go:89] Registering Raw factory
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.307898    1476 kubelet.go:821] Successfully registered node 172.17.8.102
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.342549    1476 manager.go:946] Started watching for new ooms in manager
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.342760    1476 oomparser.go:183] oomparser using systemd
Aug 21 18:11:15 kube-node-02 kubelet[1476]: I0821 18:11:15.345129    1476 manager.go:243] Starting recovery of all containers
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.359541    1476 container.go:255] Failed to create summary reader for "/system.slice/sys-kernel-debug.mount": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.360516    1476 container.go:255] Failed to create summary reader for "/system.slice/system-systemd\\x2dfsck.slice": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.362517    1476 container.go:255] Failed to create summary reader for "/system.slice/systemd-vconsole-setup.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.364128    1476 container.go:255] Failed to create summary reader for "/system.slice/tmp.mount": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.368040    1476 container.go:255] Failed to create summary reader for "/system.slice/dev-mqueue.mount": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.368972    1476 container.go:255] Failed to create summary reader for "/system.slice/etcd2.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.370248    1476 container.go:255] Failed to create summary reader for "/system.slice/fleet.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.371185    1476 container.go:255] Failed to create summary reader for "/system.slice/kube-kubelet.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.372476    1476 container.go:255] Failed to create summary reader for "/system.slice/ldconfig.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.373742    1476 container.go:255] Failed to create summary reader for "/system.slice/rpc-statd.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.379591    1476 container.go:255] Failed to create summary reader for "/system.slice/boot.mount": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.383116    1476 container.go:255] Failed to create summary reader for "/user.slice": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.385733    1476 container.go:255] Failed to create summary reader for "/system.slice/rpcbind.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.388105    1476 container.go:255] Failed to create summary reader for "/system.slice/systemd-tmpfiles-setup.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.389048    1476 container.go:255] Failed to create summary reader for "/system.slice/system-addon\\x2dconfig.slice": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.391693    1476 container.go:255] Failed to create summary reader for "/system.slice/systemd-journal-catalog-update.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.395006    1476 container.go:255] Failed to create summary reader for "/system.slice/audit-rules.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.396578    1476 container.go:255] Failed to create summary reader for "/system.slice/docker.service": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.398537    1476 container.go:255] Failed to create summary reader for "/system.slice/media.mount": none of the resources are being tracked.
Aug 21 18:11:15 kube-node-02 kubelet[1476]: W0821 18:11:15.398992    1476 container.go:255] Failed to create summary reader for "/system.slice/setup-network-environment.service": none of the resources are being tracked.
core@kube-node-02 ~ $ sudo systemctl status kube-kubelet
● kube-kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/run/fleet/units/kube-kubelet.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Fri 2015-08-21 18:11:14 UTC; 1h 31min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes,http://kubernetes.io/v1.0/docs/admin/kubelet.html
  Process: 1475 ExecStartPre=/usr/bin/mkdir -p /opt/kubernetes/manifests/ (code=exited, status=0/SUCCESS)
  Process: 1472 ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet (code=exited, status=0/SUCCESS)
  Process: 1454 ExecStartPre=/usr/bin/curl -L -o /opt/bin/kubelet -z /opt/bin/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.0.1/bin/linux/amd64/kubelet (code=exited, status=0/SUCCESS)
  Process: 1452 ExecStartPre=/usr/bin/rm /opt/bin/kubelet (code=exited, status=1/FAILURE)
  Process: 1449 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)
 Main PID: 1476 (kubelet)
   CGroup: /system.slice/kube-kubelet.service
           ├─1476 /opt/bin/kubelet --address=0.0.0.0 --port=10250 --hostname_override=172.17.8.102 --api_servers=http://172.17.8.100:8080 --allow_privileged=true # cluster_dns matches `setup/dns/dns-service.yaml` @ `spec.clusterIP`
           └─1492 journalctl -f

Aug 21 19:33:16 kube-node-02 kubelet[1476]: W0821 19:33:16.134458    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:34:16 kube-node-02 kubelet[1476]: W0821 19:34:16.154107    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:35:16 kube-node-02 kubelet[1476]: W0821 19:35:16.194820    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:36:16 kube-node-02 kubelet[1476]: W0821 19:36:16.212533    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:37:16 kube-node-02 kubelet[1476]: W0821 19:37:16.232000    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:38:16 kube-node-02 kubelet[1476]: W0821 19:38:16.247220    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:39:16 kube-node-02 kubelet[1476]: W0821 19:39:16.269459    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:40:16 kube-node-02 kubelet[1476]: W0821 19:40:16.281662    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:41:16 kube-node-02 kubelet[1476]: W0821 19:41:16.299104    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
Aug 21 19:42:16 kube-node-02 kubelet[1476]: W0821 19:42:16.314775    1476 container.go:255] Failed to create summary reader for "/system.slice/motdgen.service": none of the resources are being tracked.
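A likely cause (my reading of the log, not confirmed): the "Unknown lvalue '--cluster_dns'" and "'--cluster_domain'" warnings at unit lines 15-16 mean systemd parsed those flags as standalone directives in [Service] rather than as part of ExecStart, and the CGroup line in the status output shows the running command ends at --allow_privileged=true with the comment glued onto it. systemd does not strip trailing comments, so the inline comment most likely broke the line continuation. Moving the comment onto its own line and keeping a trailing backslash on every flag line should fix it; a sketch (the DNS IP is illustrative and should match setup/dns/dns-service.yaml @ spec.clusterIP):

# cluster_dns matches `setup/dns/dns-service.yaml` @ `spec.clusterIP`
ExecStart=/opt/bin/kubelet \
  --address=0.0.0.0 \
  --port=10250 \
  --hostname_override=172.17.8.102 \
  --api_servers=http://172.17.8.100:8080 \
  --allow_privileged=true \
  --cluster_dns=10.100.0.10 \
  --cluster_domain=cluster.local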

Latest branch fails to start kube-proxy and kubelet on Node machines

The branch is the latest and greatest. Whatever is in the repo now works, since kube-proxy and kubelet still use static binaries instead of images. To reproduce the issue, simply replace the current kube-proxy unit with the service file below.

# kube-proxy.service
[Unit]
Description=Kubernetes Proxy
Documentation=https://github.com/GoogleCloudPlatform/kubernetes,http://kubernetes.io/v1.0/docs/admin/kube-proxy.html
Requires=setup-network-environment.service
After=setup-network-environment.service

[Service]
EnvironmentFile=/etc/sysconfig/kubernetes-config
ExecStartPre=-/usr/bin/docker kill kube-proxy
ExecStartPre=-/usr/bin/docker rm kube-proxy
ExecStartPre=/usr/bin/docker pull mattma/kube-proxy:${KUBERNETES_VERSION}
ExecStart=/usr/bin/docker run \
  --net=host \
  --name kube-proxy \
  mattma/kube-proxy:${KUBERNETES_VERSION} \
  --master=http://${API_SERVER_IP}:${INSECURE_PORT} \
  --logtostderr=true
Restart=always
RestartSec=10

[X-Fleet]
Global=true
MachineMetadata=role=node

Error response from daemon: Cannot start container f4e9a61799493a04da1f0fea940ce1d12dd08ac6724684b0d18d51fd01e4c147: [8] System error: no such file or directory
core@kube-node-02 ~ $ docker logs f4e9a6179949
no such file or directory

Which file or directory is missing?

All instructions work on the Master machine, but a similar setup does not work on the Node machine; see the error above. I could not figure out why. @yichengq One way to narrow it down is sketched below.
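A hedged way to investigate (the container ID comes from the log above):

core@kube-node-02 ~ $ docker inspect --format '{{.Path}} {{.Args}}' f4e9a6179949

If the image is built FROM scratch, the entrypoint binary must be statically linked: a dynamically linked kube-proxy fails with exactly this "no such file or directory" error on start, because its ELF interpreter does not exist inside the image even though the binary itself does.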

Road map to Stable v1.0 release

  • Currently, if you run the update-demo example in the cluster, it works in all cases, except that it cannot access the pod data through the API server endpoint via proxy/namespaces/default/pods/" + server.podId + "/data.json. Compare with the original update-demo in the Kubernetes repo.
  • Using images/containers instead of curl/wget binaries. See Issue 5
  • Configuration instead of hard-coded values. See Issue 7
  • Users need authentication to access the API server, via secret, SSH key, etc.
  • Test on production machines. Form a cluster and see how it performs via kubectl commands from a local machine.
  • Write documentation on how to deploy to a production environment, e.g. DigitalOcean, AWS, etc.
  • Three master nodes for high availability, based on the design goal
  • Provide an easy update path to newer Kubernetes releases via an environment variable
  • The current implementation of the DNS controller is borrowed from here. Someone knowledgeable on this topic should review it.
  • HTTP proxy service. Do we need this proxy service?
  • Do we need rpcbind.service and rpc-statd.service?
  • Someone with an in-depth Kubernetes background should review all the unit files.
  • Support for service accounts and tokens. See here

Kube-proxy gets error status on WatchServices and WatchEndpoints

core@kube-node-02 ~ $ sudo systemctl status -l kube-proxy
● kube-proxy.service - Kubernetes Proxy
   Loaded: loaded (/run/fleet/units/kube-proxy.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Thu 2015-08-20 03:53:41 UTC; 1h 22min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes,http://kubernetes.io/v1.0/docs/admin/kube-proxy.html
  Process: 4010 ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy (code=exited, status=0/SUCCESS)
  Process: 3994 ExecStartPre=/usr/bin/curl -L -o /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.0.1/bin/linux/amd64/kube-proxy (code=exited, status=0/SUCCESS)
  Process: 3992 ExecStartPre=/usr/bin/rm /opt/bin/kube-proxy (code=exited, status=0/SUCCESS)
  Process: 3989 ExecStartPre=/usr/bin/mkdir -p /opt/bin (code=exited, status=0/SUCCESS)
 Main PID: 4013 (kube-proxy)
   Memory: 4.3M
      CPU: 9.706s
   CGroup: /system.slice/kube-proxy.service
           └─4013 /opt/bin/kube-proxy --master=http://172.17.8.100:8080 --logtostderr=true

Aug 20 03:53:13 kube-node-02 systemd[1]: Starting Kubernetes Proxy...
Aug 20 03:53:13 kube-node-02 curl[3994]: Warning: Illegal date format for -z, --timecond (and not a file name).
Aug 20 03:53:13 kube-node-02 curl[3994]: Warning: Disabling time condition. See curl_getdate(3) for valid date syntax.
Aug 20 03:53:13 kube-node-02 curl[3994]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Aug 20 03:53:13 kube-node-02 curl[3994]: Dload  Upload   Total   Spent    Left  Speed
Aug 20 03:53:38 kube-node-02 curl[3994]: [1.9K blob data]
Aug 20 03:53:41 kube-node-02 curl[3994]: [320B blob data]
Aug 20 03:53:41 kube-node-02 systemd[1]: Started Kubernetes Proxy.
Aug 20 05:02:04 kube-node-02 kube-proxy[4013]: W0820 05:02:04.397432    4013 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [58876/51366]) [59875] Reason: Details:<nil> Code:0}
Aug 20 05:06:22 kube-node-02 kube-proxy[4013]: W0820 05:06:22.938093    4013 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [59748/59285]) [60747] Reason: Details:<nil> Code:0}
core@kube-node-02 ~ $ sudo journalctl -u kube-proxy
-- Logs begin at Tue 2015-08-18 17:51:34 UTC, end at Thu 2015-08-20 05:23:19 UTC. --
Aug 18 17:57:16 kube-node-02 systemd[1]: Starting Kubernetes Proxy...
Aug 18 17:57:16 kube-node-02 rm[1358]: /usr/bin/rm: cannot remove '/opt/bin/kube-proxy': No such file or directory
Aug 18 17:57:16 kube-node-02 curl[1361]: Warning: Illegal date format for -z, --timecond (and not a file name).
Aug 18 17:57:16 kube-node-02 curl[1361]: Warning: Disabling time condition. See curl_getdate(3) for valid date syntax.
Aug 18 17:57:16 kube-node-02 curl[1361]: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Aug 18 17:57:16 kube-node-02 curl[1361]: Dload  Upload   Total   Spent    Left  Speed
Aug 18 17:57:17 kube-node-02 curl[1361]: [234B blob data]
Aug 18 17:57:17 kube-node-02 systemd[1]: Started Kubernetes Proxy.
Aug 18 18:51:38 kube-node-02 kube-proxy[1364]: W0818 18:51:38.734295    1364 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested in
Aug 18 19:05:36 kube-node-02 kube-proxy[1364]: W0818 19:05:36.534740    1364 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested i
Aug 18 20:01:28 kube-node-02 kube-proxy[1364]: W0818 20:01:28.246425    1364 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested in
Aug 18 20:04:27 kube-node-02 kube-proxy[1364]: W0818 20:04:27.595492    1364 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested i
Aug 18 22:35:56 kube-node-02 kube-proxy[1364]: W0818 22:35:56.749712    1364 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested in
Aug 18 23:03:50 kube-node-02 kube-proxy[1364]: W0818 23:03:50.948246    1364 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested i
Aug 19 05:43:46 kube-node-02 kube-proxy[1364]: W0819 05:43:46.742629    1364 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested i
Aug 19 05:46:01 kube-node-02 kube-proxy[1364]: W0819 05:46:01.276691    1364 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested in
Aug 19 06:30:58 kube-node-02 kube-proxy[1364]: W0819 06:30:58.203745    1364 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested in
Aug 19 06:40:10 kube-node-02 kube-proxy[1364]: W0819 06:40:10.843763    1364 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested i
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.830407    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.831744    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.831787    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.831823    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.833005    1364 proxysocket.go:133] Failed to connect to balancer: failed to connect to an endpoint.
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.834083    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.834127    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.834162    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.834194    1364 proxysocket.go:99] Dial failed: dial tcp 10.244.45.2:27017: connection refused
Aug 20 03:46:46 kube-node-02 kube-proxy[1364]: E0820 03:46:46.834201    1364 proxysocket.go:133] Failed to connect to balancer: failed to connect to an endpoint.
....

These warnings mean the watch tried to resume from an etcd index that has already been compacted; kube-proxy recovers by re-listing, so they may be benign. Refer to issue 9713 and issue 9310.

Configuration instead of hard-coded values

Case 1:

localhost usage: it is used throughout the api-server, controller-manager, and scheduler:

 --etcd_servers=http://127.0.0.1:2379,http://127.0.0.1:4001 \

Should we use an environment variable instead, like $private_ipv4?

Case 2:

The insecure port is the open port on the master node, so a user can hit http://172.17.8.100:8080 to check api-server health or schedule something. Several other services use it to talk to the api-server. We should make the port configurable instead of hard-coding the value.

Case 3:

The cluster IP range is used by the flannel network overlay and is saved in etcd after flannel.service starts. It should be a variable here.

Case 4:

The api-server public address. Could this be something like --api_servers=http://${ETH1_IPV4}:8080?

Case 5:

cluster-domain and cluster-dns are shared by YAML and service files; how can we introduce a variable that works across the different file formats? One possible approach is sketched below.
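One hedged approach (the template placeholders are mine, not the repo's): keep a template and render the YAML at deploy time with sed, while the unit files read the same values from an EnvironmentFile:

$ export CLUSTER_DNS=10.100.0.10 CLUSTER_DOMAIN=cluster.local
$ sed -e "s|__CLUSTER_DNS__|${CLUSTER_DNS}|g" \
      -e "s|__CLUSTER_DOMAIN__|${CLUSTER_DOMAIN}|g" \
      setup/dns/dns-service.yaml.tmpl > setup/dns/dns-service.yaml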

In general, we should prefer environment variables over static values, and prefer derived environment variables over user-configurable values; if we absolutely need a user-configurable value, we should expose it cleanly and intuitively.

In conclusion, we should make it much easier to go to production with fewer configurable values. I think the only absolutely required values are the master IP address (used to talk to the api-server) and the master machine ID (used in the node etcd cluster setting). A sketch of the shared-EnvironmentFile pattern follows.
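As a concrete sketch of the pattern (the kube-proxy unit above already references /etc/sysconfig/kubernetes-config; the variable names beyond the ones that unit consumes are illustrative):

# /etc/sysconfig/kubernetes-config
KUBERNETES_VERSION=1.0.1
API_SERVER_IP=172.17.8.100
INSECURE_PORT=8080
CLUSTER_IP_RANGE=10.244.0.0/16
CLUSTER_DNS=10.100.0.10
CLUSTER_DOMAIN=cluster.local

Each unit then consumes it:

[Service]
EnvironmentFile=/etc/sysconfig/kubernetes-config
ExecStart=/opt/bin/kube-proxy --master=http://${API_SERVER_IP}:${INSECURE_PORT} --logtostderr=true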

etcd2.service status=1/FAILURE

To reproduce the issue, follow these steps:

  1. sudo systemctl cat etcd2
# /etc/systemd/system/etcd2.service
[Install]
WantedBy=default.target

[Unit]
Description=etcd2
Conflicts=etcd.service

[Service]
User=etcd
Environment=ETCD_DATA_DIR=/var/lib/etcd2
Environment=ETCD_NAME=%m
ExecStart=/usr/bin/etcd2
Restart=always
RestartSec=10s
LimitNOFILE=40000

# /run/systemd/system/etcd2.service.d/20-cloudinit.conf
[Service]
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://172.17.8.101:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://172.17.8.101:2380"
Environment="ETCD_INITIAL_CLUSTER_STATE=new"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379,http://0.0.0.0:4001"
Environment="ETCD_LISTEN_PEER_URLS=http://172.17.8.101:2380,http://172.17.8.101:7001"
  2. sudo systemctl start etcd2

Success: the command returns nothing.

  3. sudo systemctl enable etcd2
Created symlink from /etc/systemd/system/default.target.wants/etcd2.service to /etc/systemd/system/etcd2.service.
  4. sudo systemctl status etcd2
● etcd2.service - etcd2
   Loaded: loaded (/etc/systemd/system/etcd2.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/etcd2.service.d
           └─20-cloudinit.conf
   Active: activating (auto-restart) (Result: exit-code) since Fri 2015-07-31 02:17:52 UTC; 11s ago
 Main PID: 1078 (code=exited, status=1/FAILURE)
   Memory: 0B
      CPU: 0
   CGroup: /system.slice/etcd2.service

Jul 31 02:17:52 coreos-01 systemd[1]: etcd2.service: Main process exited, code=exited, status=1/FAILURE
Jul 31 02:17:52 coreos-01 systemd[1]: etcd2.service: Unit entered failed state.
Jul 31 02:17:52 coreos-01 systemd[1]: etcd2.service: Failed with result 'exit-code'.
  5. sudo journalctl -u etcd2

The failure repeats:

Jul 31 02:17:52 coreos-01 systemd[1]: Starting etcd2...
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: listening for peers on http://172.17.8.101:2380
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: listening for peers on http://172.17.8.101:7001
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: listening for client requests on http://0.0.0.0:2379
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: listening for client requests on http://0.0.0.0:4001
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: stopping listening for client requests on http://0.0.0.0:4001
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: stopping listening for client requests on http://0.0.0.0:2379
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: stopping listening for peers on http://172.17.8.101:7001
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: stopping listening for peers on http://172.17.8.101:2380
Jul 31 02:17:52 coreos-01 etcd2[1078]: 2015/07/31 02:17:52 etcdmain: advertise URLs of "36cdb64424934ae7a3fcc2ec8b2d44ea" do not match in --initial-advertise-peer-urls [http://172.17.8.101:2380] and --initial-cluster [http://localhost:2380 http://localhos
Jul 31 02:17:52 coreos-01 systemd[1]: etcd2.service: Main process exited, code=exited, status=1/FAILURE
Jul 31 02:17:52 coreos-01 systemd[1]: etcd2.service: Unit entered failed state.
Jul 31 02:17:52 coreos-01 systemd[1]: etcd2.service: Failed with result 'exit-code'.
Jul 31 02:18:05 coreos-01 systemd[1]: etcd2.service: Service hold-off time over, scheduling restart.
Jul 31 02:18:05 coreos-01 systemd[1]: Started etcd2.
Jul 31 02:18:05 coreos-01 systemd[1]: Starting etcd2...
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: listening for peers on http://172.17.8.101:2380
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: listening for peers on http://172.17.8.101:7001
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: listening for client requests on http://0.0.0.0:2379
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: listening for client requests on http://0.0.0.0:4001
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: stopping listening for client requests on http://0.0.0.0:4001
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: stopping listening for client requests on http://0.0.0.0:2379
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: stopping listening for peers on http://172.17.8.101:7001
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: stopping listening for peers on http://172.17.8.101:2380
Jul 31 02:18:05 coreos-01 etcd2[1113]: 2015/07/31 02:18:05 etcdmain: advertise URLs of "36cdb64424934ae7a3fcc2ec8b2d44ea" do not match in --initial-advertise-peer-urls [http://172.17.8.101:2380] and --initial-cluster [http://localhost:2380 http://localhos
Jul 31 02:18:05 coreos-01 systemd[1]: etcd2.service: Main process exited, code=exited, status=1/FAILURE
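A likely cause (my reading of the "advertise URLs ... do not match" message, not confirmed): no ETCD_INITIAL_CLUSTER is set anywhere in the unit or its drop-ins, so etcd falls back to its built-in default of default=http://localhost:2380,default=http://localhost:7001, which does not contain the advertised peer URL http://172.17.8.101:2380. A drop-in that names this member consistently should clear it (the member name below is the machine ID from the log):

# /etc/systemd/system/etcd2.service.d/30-initial-cluster.conf
[Service]
Environment="ETCD_INITIAL_CLUSTER=36cdb64424934ae7a3fcc2ec8b2d44ea=http://172.17.8.101:2380"

followed by sudo systemctl daemon-reload && sudo systemctl restart etcd2.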
