
bootkube's Introduction

Bootkube


Bootkube is a tool for launching self-hosted Kubernetes clusters.

When launched, bootkube will deploy a temporary Kubernetes control-plane (api-server, scheduler, controller-manager), which operates long enough to bootstrap a replacement self-hosted control-plane.

Additionally, bootkube can be used to generate all of the necessary assets for use in bootstrapping a new cluster. These assets can then be modified to support any additional configuration options.

Details of self-hosting

Guides

Usage

Bootkube has two main commands: render and start.

There is a third, experimental command, recover, which can help reboot a downed cluster (see below).

Render assets

Bootkube can be used to render all of the assets necessary for bootstrapping a self-hosted Kubernetes cluster. This includes generation of TLS assets, Kubernetes object manifests, and a kubeconfig to connect to the bootstrapped cluster.

To see available options, run:

bootkube render --help

Example:

bootkube render --asset-dir=my-cluster

The resulting assets can be inspected / modified in the generated asset-dir.

Start bootkube

To start bootkube use the start subcommand.

To see available options, run:

bootkube start --help

Example:

bootkube start --asset-dir=my-cluster

When bootkube start is creating Kubernetes resources from manifests, the following order is used:

  1. Any Namespace objects are created, in lexicographical order.
  2. Any CustomResourceDefinition objects are created, in lexicographical order.
  3. Any remaining resources are created, in lexicographical order.

Recover a downed cluster

In the case of a partial or total control-plane outage (e.g. due to lost master nodes), an experimental recover command can extract and write manifests from a backup location. These manifests can then be used by the start command to reboot the cluster. Recovery is currently supported from a running apiserver, from an external running etcd cluster, or from an etcd backup taken from the self-hosted etcd cluster.

For more details and examples see disaster recovery documentation.

Development

See Documentation/development.md for more information.

Getting Involved

Want to contribute to bootkube? Have questions? We are looking for active participation from the community.

You can find us at the #bootkube channel on Kubernetes slack.

Related Links

License

bootkube is under the Apache 2.0 license. See the LICENSE file for details.


bootkube's Issues

bootkube --asset-dir should be able to parse k8s objects directly

Rather than needing a specific directory structure, we should teach bootkube how to parse Kubernetes objects directly. This way we could just parse "api-server flags" and "api-server secrets" for use in bootkube's temporary apiserver, and we wouldn't need to specify them (or etcd) separately.

Support checkpointing configmaps

Chatted with @derekparker about this.

Currently bootkube checkpoints secrets for the bootstrapped API server. This lets the API server refer to secrets in its manifest (e.g. for TLS assets, config flags, etc.). It doesn't support configmaps, which might be more appropriate for non-secret configuration. In my case, this would be a policy file.
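For illustration, roughly what would need to be supported is a pod that mounts a configmap the same way the checkpointed secrets are mounted today (a sketch only; the names, image tag, and paths are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
    command:
    - /hyperkube
    - apiserver
    - --authorization-policy-file=/etc/kubernetes/policy/policy.jsonl
    volumeMounts:
    - mountPath: /etc/kubernetes/policy
      name: policy
      readOnly: true
  volumes:
  - name: policy
    configMap:            # today only secret volumes are rewritten to hostPath checkpoints;
      name: abac-policy   # the checkpointer would need to do the same for configMap volumes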

Provide a generic mechanism for flag customizations

My initial concern is that I don't want to plumb every possible configurable through bootkube -- it just becomes unnecessarily complex. However, people have differing needs, so we need a sane way of allowing customization.

I was thinking of breaking the rendering steps into multiple pieces

init

command: bootkube init

This would output configMap objects (Kubernetes objects themselves) with all of the default values we want for each component. The api-server, for example, would have a configMap with a key=value entry for each of the flags we default.

At this stage a deployer can just go in and modify each component configMap to the values they would like. This will also help us in the future, because eventually all core components will be able to retrieve their config from an API object (componentConfig).
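As a rough sketch of what the init output might look like (the ConfigMap name and keys here are hypothetical, not an implemented format; the values mirror defaults used elsewhere in the rendered manifests):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-apiserver-flags   # hypothetical name
  namespace: kube-system
data:
  secure-port: "443"
  allow-privileged: "true"
  service-cluster-ip-range: 10.3.0.0/24
  admission-control: NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota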

validate

command: bootkube validate

This step could be run standalone against the configMap objects, and/or as part of the next bootkube render step.

Here we will validate that options provided in the configMaps are compatible / recommended. For example:

Setting controller-manager --service-cluster-ip-range=10.0.0.0/16 and kubelet --cluster-dns=172.14.0.1 is likely an error and we can warn.

Or

Setting --cloud-provider=aws on the controller-manager, but not on the kubelet & api-server, is likely a misconfiguration.

This way, flags and their inter-related dependencies can be modeled as a suite of tests -- rather than us trying to plumb this logic through bootkube / templating. And an end user has the same flexibility to modify any flags they want with no code changes necessary (and if validation issues arise - it is just a matter of adding additional tests).

render

command: bootkube render

This step will be modified to get all configuration from the configMaps created above (and run the same validation from above). After validation, we do the same thing we were doing before: render all of the necessary component manifests -- but have them reference the configMaps for their flags.

To use the configMaps as flags directly (until it is supported upstream), we would need to figure out the best way to convert the key=values to --key=value for use in the command line of the manifest (e.g. command: /hyperkube apiserver $flags).

Or, as another option, we just let the render step do the actual conversion from the configMap object to the manifest -- and we don't directly use the configMap objects via the API until supported upstream.
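For the first option, one possible shape (a sketch only, not a committed design) is to source individual keys from the ConfigMap via env vars and rely on the $(VAR) substitution the kubelet already performs on container commands:

containers:
- name: kube-apiserver
  image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
  command:
  - /hyperkube
  - apiserver
  - --service-cluster-ip-range=$(SERVICE_CIDR)   # expanded from the env var below
  env:
  - name: SERVICE_CIDR
    valueFrom:
      configMapKeyRef:
        name: kube-apiserver-flags   # hypothetical ConfigMap produced by the init step
        key: service-cluster-ip-range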

Self-host Flannel

We should see how realistic it is to self-host the flannel components. This may just be deploying flannel-server (deployment) and flannel-client (daemonset), then determining if we can get the kubelet to coordinate around "CNI is configured".

Simple scheduler & controller-manager disaster recovery

There are potential failure cases, where you have permanently lost all schedulers and/or all controller-managers, in which recovery leaves you in a chicken-and-egg state:

For example, assume you have lost all schedulers - but still have a functioning api-server which contains the scheduler deployment object:

You need a controller-manager to convert the deployment into unscheduled pods, and you need a scheduler to then schedule those pods to nodes (scheduler to schedule the scheduler, if you will).

While these types of situations should be mitigated by deploying across failure domains, it is still something we need to cover.

In the short term this could mean documenting, for example, how to create a temporary scheduler pod pre-assigned to a node (once a single scheduler exists, it will then schedule the rest of the scheduler pods).
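For example, such a documented recovery step could boil down to a pod manifest with spec.nodeName pre-set, which places the pod without any scheduler involvement (a sketch; the image tag, node name, and flags are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler-recovery
  namespace: kube-system
spec:
  nodeName: node1.example.com   # pre-assigned, so no scheduler is needed to place this pod
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080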

Another option might be to build a tool which knows how to do this for you based on parsing an existing object:

kube-recover deployment kube-controller-manager --target=<node-ip>
kube-recover deployment kube-scheduler --target=<node-ip>

Where the kube-recover tool could read the object from the api-server (or from disk), parse out the podSpec, and pre-assign a pod to the target node (bypassing the need for both the controller-manager and the scheduler).

Another service could win race for dns service IP allocation

We need a pre-assigned service IP for the Kubernetes DNS service - but it's possible, when creating all assets (the equivalent of kubectl create -f cluster/manifests), that another service is randomly assigned this IP (defaults to 10.3.0.10).

One option would be just to force the kube-dns service to be created first - but this seems less than ideal.
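For context, the pre-assignment itself is just an explicit spec.clusterIP on the kube-dns service (sketched below with abbreviated ports); the race is that a service created earlier can be randomly allocated 10.3.0.10 first, at which point creating this one fails:

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  clusterIP: 10.3.0.10   # pre-assigned DNS service IP
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP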

self-hosted kubelet using host mount ns can't find cni binaries

Using nsenter to join the host mount namespace before executing the self-hosted kubelet causes "/" to be the host rootfs, which does not contain the CNI binaries normally shipped in the hyperkube image.

Options here are to wait for mount propagation options to be plumbed through the podSpec, or to run the self-hosted kubelet via rkt rather than Docker (using the rkt fly stage1 for executing the kubelet pod).

Mostly just need to track this issue somewhere. Also related: #96

kube-api-checkpoint pod does not resolve localhost in a single-node self-hosted environment

Hi there!

I'm trying to deploy a self-hosted k8s cluster on a single node. My environment consists of a CoreOS (1068.10.0) virtual machine that sits on an ESXi server, and I'm using a cloud-config file to deploy the k8s cluster, in which I have defined 2 different service units along with the kubelet unit:

bootkube-render service, renders k8s specs:

[Unit]
Requires=docker.service
After=docker.service
ConditionPathIsDirectory=!/home/core/cluster
[Service]
Type=oneshot
RemainAfterExit=yes
Environment=BOOTKUBE_VERSION=v0.1.4
ExecStart=/usr/bin/docker run -v /home/core:/home/core quay.io/coreos/bootkube:${BOOTKUBE_VERSION} /bootkube render --asset-dir=/home/core/cluster --api-servers=https://{MY_IP}:443 
ExecStart=/bin/mkdir -p /etc/kubernetes
ExecStart=/bin/chown -R core:core /home/core/cluster
ExecStart=/bin/cp /home/core/cluster/auth/kubeconfig /etc/kubernetes
ExecStart=/bin/chmod 0644 /etc/kubernetes/kubeconfig

bootkube-start service, starts bootkube:

[Unit]
Requires=kubelet.service bootkube-render.service
After=kubelet.service 
[Service]
Type=oneshot
RemainAfterExit=yes
Environment=BOOTKUBE_VERSION=v0.1.4
ExecStart=/usr/bin/docker run --net=host -v /home/core:/home/core quay.io/coreos/bootkube:${BOOTKUBE_VERSION} /bootkube start --asset-dir=/home/core/cluster

Everything starts, the cluster is created, and I can use it regularly to deploy other applications, but ... I'm having problems with the kube-api-checkpoint pod: it doesn't start and remains in CrashLoopBackOff state. I can see the following in the logs:

kubectl logs pod/kube-api-checkpoint-192.168.0.144 --namespace=kube-system        
I0901 11:17:14.355626       1 main.go:52] begin apiserver checkpointing...
E0901 11:17:14.394681       1 main.go:101] Get http://localhost:10255/pods: dial tcp: lookup localhost on 8.8.8.8:53: no such host
F0901 11:17:14.394784       1 main.go:61] unexpected end of JSON input

So, apparently there is something wrong with DNS resolution in the container. I've been searching for what could be the cause, and according to this discussion: https://github.com/gliderlabs/docker-alpine/issues/8 , the problem is related to Alpine-based containers (the pod-checkpointer container is based on Alpine). According to that discussion, the right version that works well on k8s is Alpine Linux 3.4.0, which contains musl-libc version 1.1.13 or greater.

I've checked the different tags of quay.io/coreos/pod-checkpointer on the CoreOS container repository:

pod-checkpointer:f9949bfecdabc573517487d1a5af1e0cd22acf52:

docker run -ti --entrypoint="/bin/sh" quay.io/coreos/pod-checkpointer:f9949bfecdabc573517487d1a5af1e0cd22acf52
/ # cat /etc/o
opt/        os-release
/ # cat /etc/os-release 
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.4.0
PRETTY_NAME="Alpine Linux v3.4"
HOME_URL="http://alpinelinux.org"
BUG_REPORT_URL="http://bugs.alpinelinux.org"
/ # ldd --version
musl libc (x86_64)
Version 1.1.14
Dynamic Program Loader
Usage: ldd [options] [--] pathname

pod-checkpointer:91d5a311eee40d579a8f0549c10eeed57979d6c4:

docker run -ti --entrypoint="/bin/sh" quay.io/coreos/pod-checkpointer:91d5a311eee40d579a8f0549c10eeed57979d6c4
/ # cat /etc/os-release 
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.3.3
PRETTY_NAME="Alpine Linux v3.3"
HOME_URL="http://alpinelinux.org"
BUG_REPORT_URL="http://bugs.alpinelinux.org"
/ # ldd --version
musl libc
Version 1.1.12
Dynamic Program Loader
Usage: ldd [options] [--] pathname

The only version that has Alpine 3.4.0 is pod-checkpointer:f9949bfecdabc573517487d1a5af1e0cd22acf52

I think what is going on here is that the templates generated by bootkube (kube-apiserver, for instance) are generated with the wrong pod-checkpointer container tag (reviewing the code, templates contain the following tag: 91d5a31).

So, I've tried to manually change the pod-checkpointer container tag to the right one on my virtual machine (I just modified the pod manifest under /etc/kubernetes/manifests), but it doesn't work either, so right now I'm running out of ideas.

Am I missing something?

I'm new to the Go language, CoreOS, systemd, k8s... and so on :-) so I could be wrong in my assumptions.

Thank you in advance for your attention.

checkpointer doesn't checkpoint all secrets referenced by the API server

I have two secrets referenced by my API server.

  • "kube-apiserver" is the one generated by bootkube regularly
  • "abac-policy" is an ABAC policy that I generated and put in the assets/manifests directory before running bootkube (v0.1.5)

After my API server dies, it's clear that the checkpointer correctly rewrites the first secret to a host volume, but not the custom one I provided.

cc @dghubble @aaronlevy @derekparker @pbx0

My API server:

core@node1 ~ $ cat assets/manifests/kube-apiserver.yaml 
apiVersion: "extensions/v1beta1"
kind: DaemonSet
metadata:
  name: kube-apiserver
  namespace: kube-system
  labels:
    k8s-app: kube-apiserver
    version: v1.4.0_coreos.0
spec:
  template:
    metadata:
      labels:
        k8s-app: kube-apiserver
        version: v1.4.0_coreos.0
    spec:
      nodeSelector:
        master: "true"
      hostNetwork: true
      containers:
      - name: checkpoint-installer
        image: quay.io/coreos/pod-checkpointer:969e207f005a78d1823e88bb10be34386eea473f
        command:
        - /checkpoint-installer.sh
        volumeMounts:
        - mountPath: /etc/kubernetes/manifests
          name: etc-k8s-manifests
      - name: kube-apiserver
        image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
        command:
        - /hyperkube
        - apiserver
        - --bind-address=0.0.0.0
        - --secure-port=443
        - --insecure-port=8080
        - --etcd-servers=http://node1.example.com:2379
        - --allow-privileged=true
        - --service-cluster-ip-range=10.3.0.0/24
        - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota
        - --runtime-config=api/all=true
        - --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt
        - --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key
        - --service-account-key-file=/etc/kubernetes/secrets/service-account.pub
        - --client-ca-file=/etc/kubernetes/secrets/ca.crt
        - --authorization-mode=ABAC,RBAC
        - --authorization-rbac-super-user=system:serviceaccount:kube-system:default
        - --runtime-config=rbac.authorization.k8s.io/v1alpha1
        - --authorization-policy-file=/etc/kubernetes/authz/policy.jsonl
        - --oidc-issuer-url=https://cluster.example.com:32000/identity
        - --oidc-client-id=tectonic-kubectl
        - --oidc-username-claim=email
        - --oidc-ca-file=/etc/kubernetes/secrets/ca.crt

        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ssl-certs-host
          readOnly: true
        - mountPath: /etc/kubernetes/secrets
          name: secrets
          readOnly: true
        - mountPath: /etc/kubernetes/authz
          name: policy
          readOnly: true
      volumes:
      - name: ssl-certs-host
        hostPath:
          path: /usr/share/ca-certificates
      - name: etc-k8s-manifests
        hostPath:
          path: /etc/kubernetes/manifests
      - name: secrets
        secret:
          secretName: kube-apiserver
      - name: policy
        secret:
          secretName: abac-policy

My custom secret (which may or may not matter).

$ cat assets/manifests/abac-policy.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: abac-policy
  namespace: kube-system
data:
  "policy.jsonl": eyJraW5kIjoiUG9saWN5IiwiYXBpVmVyc2lvbiI6ImFiYWMuYXV0aG9yaXphdGlvbi5rdWJlcm5ldGVzLmlvL3YxYmV0YTEiLCJzcGVjIjp7InVzZXIiOiIqIiwiZ3JvdXAiOiIqIiwicmVhZG9ubHkiOnRydWUsIm5vblJlc291cmNlUGF0aCI6Ii92ZXJzaW9uIn19Cnsia2luZCI6IlBvbGljeSIsImFwaVZlcnNpb24iOiJhYmFjLmF1dGhvcml6YXRpb24ua3ViZXJuZXRlcy5pby92MWJldGExIiwic3BlYyI6eyJ1c2VyIjoiKiIsImdyb3VwIjoiKiIsInJlYWRvbmx5Ijp0cnVlLCJub25SZXNvdXJjZVBhdGgiOiIvYXBpIn19Cnsia2luZCI6IlBvbGljeSIsImFwaVZlcnNpb24iOiJhYmFjLmF1dGhvcml6YXRpb24ua3ViZXJuZXRlcy5pby92MWJldGExIiwic3BlYyI6eyJ1c2VyIjoiKiIsImdyb3VwIjoiKiIsInJlYWRvbmx5Ijp0cnVlLCJub25SZXNvdXJjZVBhdGgiOiIvYXBpLyoifX0KeyJraW5kIjoiUG9saWN5IiwiYXBpVmVyc2lvbiI6ImFiYWMuYXV0aG9yaXphdGlvbi5rdWJlcm5ldGVzLmlvL3YxYmV0YTEiLCJzcGVjIjp7InVzZXIiOiIqIiwiZ3JvdXAiOiIqIiwicmVhZG9ubHkiOnRydWUsIm5vblJlc291cmNlUGF0aCI6Ii9hcGlzIn19Cnsia2luZCI6IlBvbGljeSIsImFwaVZlcnNpb24iOiJhYmFjLmF1dGhvcml6YXRpb24ua3ViZXJuZXRlcy5pby92MWJldGExIiwic3BlYyI6eyJ1c2VyIjoiKiIsImdyb3VwIjoiKiIsInJlYWRvbmx5Ijp0cnVlLCJub25SZXNvdXJjZVBhdGgiOiIvYXBpcy8qIn19Cnsia2luZCI6IlBvbGljeSIsImFwaVZlcnNpb24iOiJhYmFjLmF1dGhvcml6YXRpb24ua3ViZXJuZXRlcy5pby92MWJldGExIiwic3BlYyI6eyJ1c2VyIjoia3ViZWxldCIsImFwaUdyb3VwIjoiKiIsInJlc291cmNlIjoiKiIsIm5hbWVzcGFjZSI6IioifX0KeyJraW5kIjoiUG9saWN5IiwiYXBpVmVyc2lvbiI6ImFiYWMuYXV0aG9yaXphdGlvbi5rdWJlcm5ldGVzLmlvL3YxYmV0YTEiLCJzcGVjIjp7InVzZXIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCIsImFwaUdyb3VwIjoiKiIsInJlc291cmNlIjoiKiIsIm5hbWVzcGFjZSI6IioifX0K

The checkpointed API server. Notice that the "kube-apiserver" secret is correctly converted to a host path, but the abac-policy secret isn't.

core@node1 ~ $ cat /srv/kubernetes/manifests/apiserver.json | jq .
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "temp-apiserver",
    "namespace": "kube-system",
    "creationTimestamp": null
  },
  "spec": {
    "volumes": [
      {
        "name": "ssl-certs-host",
        "hostPath": {
          "path": "/usr/share/ca-certificates"
        }
      },
      {
        "name": "etc-k8s-manifests",
        "hostPath": {
          "path": "/etc/kubernetes/manifests"
        }
      },
      {
        "name": "secrets",
        "hostPath": {
          "path": "/etc/kubernetes/checkpoint-secrets/temp-apiserver/kube-apiserver"
        }
      },
      {
        "name": "policy",
        "secret": {
          "secretName": "abac-policy"
        }
      }
    ],
    "containers": [
      {
        "name": "checkpoint-installer",
        "image": "quay.io/coreos/pod-checkpointer:969e207f005a78d1823e88bb10be34386eea473f",
        "command": [
          "/checkpoint-installer.sh"
        ],
        "resources": {},
        "volumeMounts": [
          {
            "name": "etc-k8s-manifests",
            "mountPath": "/etc/kubernetes/manifests"
          }
        ],
        "terminationMessagePath": "/dev/termination-log",
        "imagePullPolicy": "IfNotPresent"
      },
      {
        "name": "kube-apiserver",
        "image": "quay.io/coreos/hyperkube:v1.4.0_coreos.0",
        "command": [
          "/hyperkube",
          "apiserver",
          "--bind-address=0.0.0.0",
          "--secure-port=443",
          "--insecure-port=8081",
          "--etcd-servers=http://node1.example.com:2379",
          "--allow-privileged=true",
          "--service-cluster-ip-range=10.3.0.0/24",
          "--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota",
          "--runtime-config=api/all=true",
          "--tls-cert-file=/etc/kubernetes/secrets/apiserver.crt",
          "--tls-private-key-file=/etc/kubernetes/secrets/apiserver.key",
          "--service-account-key-file=/etc/kubernetes/secrets/service-account.pub",
          "--client-ca-file=/etc/kubernetes/secrets/ca.crt",
          "--authorization-mode=ABAC,RBAC",
          "--authorization-rbac-super-user=system:serviceaccount:kube-system:default",
          "--runtime-config=rbac.authorization.k8s.io/v1alpha1",
          "--authorization-policy-file=/etc/kubernetes/authz/policy.jsonl",
          "--oidc-issuer-url=https://cluster.example.com:32000/identity",
          "--oidc-client-id=tectonic-kubectl",
          "--oidc-username-claim=email",
          "--oidc-ca-file=/etc/kubernetes/secrets/ca.crt"
        ],
        "resources": {},
        "volumeMounts": [
          {
            "name": "ssl-certs-host",
            "readOnly": true,
            "mountPath": "/etc/ssl/certs"
          },
          {
            "name": "secrets",
            "readOnly": true,
            "mountPath": "/etc/kubernetes/secrets"
          },
          {
            "name": "policy",
            "readOnly": true,
            "mountPath": "/etc/kubernetes/authz"
          }
        ],
        "terminationMessagePath": "/dev/termination-log",
        "imagePullPolicy": "IfNotPresent"
      }
    ],
    "restartPolicy": "Always",
    "terminationGracePeriodSeconds": 30,
    "dnsPolicy": "ClusterFirst",
    "nodeSelector": {
      "master": "true"
    },
    "serviceAccountName": "default",
    "serviceAccount": "default",
    "nodeName": "node1.example.com",
    "hostNetwork": true,
    "securityContext": {}
  },
  "status": {}
}

Here's the checkpoint-secrets directory.

core@node1 ~ $ ls -R /etc/kubernetes/checkpoint-secrets
/etc/kubernetes/checkpoint-secrets:
temp-apiserver

/etc/kubernetes/checkpoint-secrets/temp-apiserver:
kube-apiserver

/etc/kubernetes/checkpoint-secrets/temp-apiserver/kube-apiserver:
apiserver.crt  apiserver.key  ca.crt  service-account.pub

unable to get GCE master running

I'm trying to follow the GCE Quickstart and I'm stuck on the init-master.sh script.

It seems that k8s can't find a node to deploy to.
Is something missing from the documentation?

> $ gcloud compute instances create k8s-core1 --image https://www.googleapis.com/compute/v1/projects/coreos-cloud/global/images/coreos-alpha-1068-0-0-v20160607 --machine-type n1-standard-1
Created [https://www.googleapis.com/compute/v1/projects/cloud-test-1470325509499/zones/europe-west1-d/instances/k8s-core1].
NAME       ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
k8s-core1  europe-west1-d  n1-standard-1               10.132.0.5   <SOME_IP>  RUNNING

> $ gcloud compute instances add-tags k8s-core1 --tags apiserver
Updated [https://www.googleapis.com/compute/v1/projects/cloud-test-1470325509499/zones/europe-west1-d/instances/k8s-core1].

> $ gcloud compute firewall-rules create api-443 --target-tags=apiserver --allow tcp:443
Created [https://www.googleapis.com/compute/v1/projects/cloud-test-1470325509499/global/firewalls/api-443].
NAME     NETWORK  SRC_RANGES  RULES    SRC_TAGS  TARGET_TAGS
api-443  default  0.0.0.0/0   tcp:443            apiserver

> $ IDENT=~/.ssh/gkey ./init-master.sh SOME_IP
kubelet.master                                                                                                                                     100%  943     0.9KB/s   00:00
init-master.sh
.....
<MORE LOGS>
.....
[  678.500433] bootkube[5]: I0810 08:55:19.236758       5 status.go:87] Pod status kubelet: DoesNotExist
[  678.501210] bootkube[5]: I0810 08:55:19.237609       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  678.501623] bootkube[5]: I0810 08:55:19.238023       5 status.go:87] Pod status kube-scheduler: Pending
[  678.501958] bootkube[5]: I0810 08:55:19.238358       5 status.go:87] Pod status kube-controller-manager: Pending
[  683.500447] bootkube[5]: I0810 08:55:24.236800       5 status.go:87] Pod status kubelet: DoesNotExist
[  683.501129] bootkube[5]: I0810 08:55:24.237529       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  683.501537] bootkube[5]: I0810 08:55:24.237937       5 status.go:87] Pod status kube-scheduler: Pending
[  683.501869] bootkube[5]: I0810 08:55:24.238270       5 status.go:87] Pod status kube-controller-manager: Pending
[  688.500444] bootkube[5]: I0810 08:55:29.236796       5 status.go:87] Pod status kubelet: DoesNotExist
[  688.500834] bootkube[5]: I0810 08:55:29.237233       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  688.501055] bootkube[5]: I0810 08:55:29.237455       5 status.go:87] Pod status kube-scheduler: Pending
[  688.501382] bootkube[5]: I0810 08:55:29.237782       5 status.go:87] Pod status kube-controller-manager: Pending
[  693.500453] bootkube[5]: I0810 08:55:34.236802       5 status.go:87] Pod status kubelet: DoesNotExist
[  693.501157] bootkube[5]: I0810 08:55:34.237557       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  693.501578] bootkube[5]: I0810 08:55:34.237977       5 status.go:87] Pod status kube-scheduler: Pending
[  693.501918] bootkube[5]: I0810 08:55:34.238318       5 status.go:87] Pod status kube-controller-manager: Pending
[  698.500453] bootkube[5]: I0810 08:55:39.236802       5 status.go:87] Pod status kubelet: DoesNotExist
[  698.501167] bootkube[5]: I0810 08:55:39.237564       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  698.501579] bootkube[5]: I0810 08:55:39.237979       5 status.go:87] Pod status kube-scheduler: Pending
[  698.501929] bootkube[5]: I0810 08:55:39.238329       5 status.go:87] Pod status kube-controller-manager: Pending
[  703.500399] bootkube[5]: I0810 08:55:44.236752       5 status.go:87] Pod status kubelet: DoesNotExist
[  703.501034] bootkube[5]: I0810 08:55:44.237410       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  703.501437] bootkube[5]: I0810 08:55:44.237837       5 status.go:87] Pod status kube-scheduler: Pending
[  703.501789] bootkube[5]: I0810 08:55:44.238189       5 status.go:87] Pod status kube-controller-manager: Pending
[  704.127074] bootkube[5]: I0810 08:55:44.863436       5 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-controller-manager-2834499578-cexfh", UID:"ee649929-5ed6-11e6-bcbe-42010a800002", APIVersion:"v1", ResourceVersion:"40", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' no nodes available to schedule pods
[  704.202313] bootkube[5]: I0810 08:55:44.938681       5 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v11-2259792283-dw54e", UID:"ee7243b4-5ed6-11e6-bcbe-42010a800002", APIVersion:"v1", ResourceVersion:"52", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' no nodes available to schedule pods
[  704.482294] bootkube[5]: I0810 08:55:45.218604       5 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-scheduler-4136156790-13a95", UID:"ee9aa5c1-5ed6-11e6-bcbe-42010a800002", APIVersion:"v1", ResourceVersion:"70", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' no nodes available to schedule pods
[  708.500412] bootkube[5]: I0810 08:55:49.236759       5 status.go:87] Pod status kubelet: DoesNotExist
[  708.501237] bootkube[5]: I0810 08:55:49.237637       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  708.501653] bootkube[5]: I0810 08:55:49.238053       5 status.go:87] Pod status kube-scheduler: Pending
[  708.502045] bootkube[5]: I0810 08:55:49.238446       5 status.go:87] Pod status kube-controller-manager: Pending
[  713.500440] bootkube[5]: I0810 08:55:54.236792       5 status.go:87] Pod status kubelet: DoesNotExist
[  713.501138] bootkube[5]: I0810 08:55:54.237537       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  713.501719] bootkube[5]: I0810 08:55:54.238120       5 status.go:87] Pod status kube-scheduler: Pending
[  713.502066] bootkube[5]: I0810 08:55:54.238467       5 status.go:87] Pod status kube-controller-manager: Pending
[  718.500455] bootkube[5]: I0810 08:55:59.236813       5 status.go:87] Pod status kubelet: DoesNotExist
[  718.501116] bootkube[5]: I0810 08:55:59.237517       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  718.501552] bootkube[5]: I0810 08:55:59.237952       5 status.go:87] Pod status kube-scheduler: Pending
[  718.501885] bootkube[5]: I0810 08:55:59.238282       5 status.go:87] Pod status kube-controller-manager: Pending
[  723.500445] bootkube[5]: I0810 08:56:04.236795       5 status.go:87] Pod status kubelet: DoesNotExist
[  723.501113] bootkube[5]: I0810 08:56:04.237512       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  723.501519] bootkube[5]: I0810 08:56:04.237920       5 status.go:87] Pod status kube-scheduler: Pending
[  723.501866] bootkube[5]: I0810 08:56:04.238266       5 status.go:87] Pod status kube-controller-manager: Pending
[  728.500440] bootkube[5]: I0810 08:56:09.236794       5 status.go:87] Pod status kubelet: DoesNotExist
[  728.501099] bootkube[5]: I0810 08:56:09.237499       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  728.501516] bootkube[5]: I0810 08:56:09.237916       5 status.go:87] Pod status kube-scheduler: Pending
[  728.501942] bootkube[5]: I0810 08:56:09.238342       5 status.go:87] Pod status kube-controller-manager: Pending
[  733.500448] bootkube[5]: I0810 08:56:14.236798       5 status.go:87] Pod status kubelet: DoesNotExist
[  733.501098] bootkube[5]: I0810 08:56:14.237497       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  733.501497] bootkube[5]: I0810 08:56:14.237897       5 status.go:87] Pod status kube-scheduler: Pending
[  733.501826] bootkube[5]: I0810 08:56:14.238226       5 status.go:87] Pod status kube-controller-manager: Pending
[  738.500501] bootkube[5]: I0810 08:56:19.236733       5 status.go:87] Pod status kubelet: DoesNotExist
[  738.501182] bootkube[5]: I0810 08:56:19.237552       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  738.506116] bootkube[5]: I0810 08:56:19.237973       5 status.go:87] Pod status kube-scheduler: Pending
[  738.506423] bootkube[5]: I0810 08:56:19.237984       5 status.go:87] Pod status kube-controller-manager: Pending
[  738.506678] bootkube[5]: I0810 08:56:19.238000       5 status.go:87] Pod status kubelet: DoesNotExist
[  738.506962] bootkube[5]: I0810 08:56:19.238006       5 status.go:87] Pod status kube-apiserver: DoesNotExist
[  738.507398] bootkube[5]: I0810 08:56:19.238012       5 status.go:87] Pod status kube-scheduler: Pending
[  738.507673] bootkube[5]: I0810 08:56:19.238017       5 status.go:87] Pod status kube-controller-manager: Pending
[  738.507939] bootkube[5]: Error: error while checking pod status: timed out waiting for the condition
[  738.508240] bootkube[5]: error while checking pod status: timed out waiting for the condition

Provide bootkube flags table with example arguments

bootkube render commands aren't directly discoverable just from using --help. I read through the hack examples and figured them out. A better way would be to mention reasonable examples for each argument in the help output and/or to add a table with example valid arguments.

Should the IPs for SANs be given as key=value pairs, i.e. IP.1=172.15.0.2,IP.2=10.3.0.1, or just the bare IPs (no)? What separator? The API server is specified by IP address. The etcd servers expect a scheme and port. Ideally users should not have to guess at or piece together the tool's argument assumptions.

Use latest kube-dns

kube-dns got a rewrite in 1.3, and is required for features like federation to work properly. It's also lighter-weight and simpler. Bootkube is deploying the old kube-dns; it should get upgraded (and while at it, the file should be renamed with a -deployment suffix since it's no longer an RC).

allow CA/cert expiration to be configurable

And have a valid option (possibly default) to not expire at all.

This has been tripping users up and they've been getting locked out of their clusters (in kube-aws etc). Administrators that care about security should be able to rotate CA/Certs on their own after the cluster is up. We shouldn't make this expiry decision for them. This should be a config parameter when generating the CAs/Certs.

Some of the initial work is already done in:
https://github.com/coreos/pkg/blob/master/k8s-tlsutil/k8s-tlsutil.go
But the option to not expire needs to be added there.

Build a bootkube container image

With the first v0.1.0 release we should build a container image so this can be run via:

sudo rkt run quay.io/coreos/bootkube:v0.1.0 --exec bootkube

unable to deploy the latest version on baremetal

I'm trying to deploy a fresh k8s cluster using coreos-baremetal and bootkube, and I keep getting the same error: unable to create pods: No API token found for service account "default", retry after the token is automatically created and added to the service account.
The API server works - at least I can get a list of nodes - so it looks like only the pods can't be deployed.

I'm using the latest version of coreos-baremetal, and tried bootkube from both this repository and dghubble/bootkube

Previously everything worked fine (at the time when v1.3.0_coreos.0 was used in manifests)

I0831 16:17:24.945130    1974 status.go:88] Pod status kubelet: DoesNotExist
I0831 16:17:24.945280    1974 status.go:88] Pod status kube-apiserver: DoesNotExist
I0831 16:17:24.945314    1974 status.go:88] Pod status kube-scheduler: DoesNotExist
I0831 16:17:24.945341    1974 status.go:88] Pod status kube-controller-manager: DoesNotExist
I0831 16:17:29.945171    1974 status.go:88] Pod status kube-apiserver: DoesNotExist
I0831 16:17:29.945232    1974 status.go:88] Pod status kube-scheduler: DoesNotExist
I0831 16:17:29.945237    1974 status.go:88] Pod status kube-controller-manager: DoesNotExist
I0831 16:17:29.945241    1974 status.go:88] Pod status kubelet: DoesNotExist
E0831 16:17:31.606982    1974 controller.go:547] unable to create pods: No API token found for service account "default", retry after the token is automatically created and added to the service account
E0831 16:17:31.607255    1974 controller.go:547] unable to create pods: No API token found for service account "default", retry after the token is automatically created and added to the service account
I0831 16:17:31.607328    1974 event.go:216] Event(api.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kubelet", UID:"001fdb3b-6f96-11e6-81ac-000c2966c9e7", APIVersion:"extensions", ResourceVersion:"151", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: No API token found for service account "default", retry after the token is automatically created and added to the service account
I0831 16:17:31.607453    1974 event.go:216] Event(api.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kubelet", UID:"001fdb3b-6f96-11e6-81ac-000c2966c9e7", APIVersion:"extensions", ResourceVersion:"151", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: No API token found for service account "default", retry after the token is automatically created and added to the service account
E0831 16:17:31.607575    1974 controller.go:547] unable to create pods: No API token found for service account "default", retry after the token is automatically created and added to the service account
I0831 16:17:31.607642    1974 event.go:216] Event(api.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kubelet", UID:"001fdb3b-6f96-11e6-81ac-000c2966c9e7", APIVersion:"extensions", ResourceVersion:"151", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: No API token found for service account "default", retry after the token is automatically created and added to the service account

simplified bootkube status output

Bootkube currently displays a lot of log output from apiserver+scheduler+controller-manager. This can be confusing and makes it difficult to determine if things are "going well".

We should provide some simplified output of bootkube state (although I think we still need to preserve the log output, as that is the primary means for debugging issues).

The simplified output might be something like:

control-plane started
manifests created
pods pending
done

One option is to split log vs state output on stdout and stderr.

/cc @sym3tri

Support deploying self-hosted etcd

From the README:

When you start bootkube, you must also give it the addresses of your etcd servers, and enough information for bootkube to create an ssh tunnel to the node that will become a member of the master control plane. Upon startup, bootkube will create a reverse proxy using an ssh connection, which will allow a bootstrap kubelet to contact the apiserver running as part of bootkube.

In the original prototype we had a built in etcd. Why is that no longer part of this?

Remove log-flush-frequency flag

bootkube indirectly imports k8s.io/kubernetes/pkg/util.logs, which creates the global flag:

var logFlushFreq = pflag.Duration("log-flush-frequency", 5*time.Second, "Maximum number of seconds between log flushes")

As far as I can guess, this flag will not be used in bootkube, but it shows up through the Usage function.

$ ./bootkube version -h
Output version information

Usage:
  bootkube version [flags]

Global Flags:
      --log-flush-frequency=5s: Maximum number of seconds between log flushes

Can the flag be removed?

Self-hosted kubelet breaks port-forwarding

I found that trying to use kubectl port-forward does not work with the self-hosted cluster. I experienced this at first on 1.3.0, and then tried again after the 1.3.4 PR got merged earlier today. As far as I can tell this has likely always been an issue.

In my gist below, I try to port-forward Prometheus, and that doesn't work. I try to port-forward the api-server, and that doesn't work. So I delete the kubelet daemonset and then port-forward the api-server again; this time it works.

https://gist.github.com/chancez/bb993765348612f6ee47b59e0e01a8c2

Allow cluster members identified by DNS name

Background:

We're migrating provisioning of cluster members to use DNS names instead of IPs to ensure reference clusters are similar to production (https://github.com/coreos/coreos-baremetal/issues/291). This allows network settings to be provided by the network DHCP server, since the gateway, DNS server, IP, etc. can change over the life of the machine. I'll note we're using static MAC-address-to-IP mappings, which are fine for now; this is really about moving away from hardcoding networkd configurations and instead respecting the network setup.

Problem:

There are a few places in bootkube which still assume IPs will be used (--hostname-override=$(MY_POD_IP) via the downward API), and I think we'll need the ability to toggle that assumption. In the absence of an explicit --hostname-override, the kubelet execs uname -n to determine the hostname to use. During provisioning we can enforce that /etc/hostname is populated with the correct name to satisfy the kubelet, but we need kubelet.yaml to leave off the flag. kube-apiserver.yaml also uses the downward API to set --advertise-address=$(MY_POD_IP), which can be left off.
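For reference, the pattern in question looks roughly like this in the rendered kubelet daemonset (a sketch, not the exact manifest; the image tag is illustrative). For a DNS-only cluster the flag would simply be left off so the kubelet falls back to uname -n:

containers:
- name: kubelet
  image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
  command:
  - /hyperkube
  - kubelet
  - --hostname-override=$(MY_POD_IP)   # assumes IPs; would need to be optional
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP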

Desired Changes:

I'll share the tweaks I made which allowed bootkube to bootstrap a DNS-only cluster. The current behavior is that nodes are registered with their DNS name by the on-host kubelet and then with their IP by the pod kubelet, and bootkube start always times out.

Flaky bring-up: apiserver fails

Most of the time bootkube reports that components have started and the cluster is accessible via kubectl. Occasionally, components are reported to be Running, but the apiserver container exits and does not restart. Reporting logs from a cluster where this occurred. Master node 172.15.0.21 is up.

apiserver Docker logs

I0606 20:35:39.439996       1 server.go:188] Will report 172.15.0.21 as public IP address.
I0606 20:35:39.440050       1 plugins.go:71] No cloud provider specified.
I0606 20:35:39.440103       1 server.go:112] constructing etcd storage interface.
  sv: v1
  mv: __internal
I0606 20:35:39.440146       1 genericapiserver.go:82] Adding storage destination for group 
I0606 20:35:39.440152       1 server.go:296] Configuring extensions/v1beta1 storage destination
I0606 20:35:39.440156       1 server.go:112] constructing etcd storage interface.
  sv: extensions/v1beta1
  mv: extensions/__internal
I0606 20:35:39.440176       1 genericapiserver.go:82] Adding storage destination for group extensions
I0606 20:35:39.440179       1 genericapiserver.go:82] Adding storage destination for group autoscaling
I0606 20:35:39.440182       1 genericapiserver.go:82] Adding storage destination for group batch
I0606 20:35:39.440186       1 server.go:325] Configuring autoscaling/v1 storage destination
I0606 20:35:39.440190       1 server.go:339] Using autoscaling/v1 for autoscaling group storage version
I0606 20:35:39.440193       1 server.go:112] constructing etcd storage interface.
  sv: autoscaling/v1
  mv: extensions/__internal
I0606 20:35:39.440216       1 genericapiserver.go:82] Adding storage destination for group autoscaling
I0606 20:35:39.440221       1 server.go:352] Configuring batch/v1 storage destination
I0606 20:35:39.440225       1 server.go:366] Using batch/v1 for batch group storage version
I0606 20:35:39.440228       1 server.go:112] constructing etcd storage interface.
  sv: batch/v1
  mv: extensions/__internal
I0606 20:35:39.440254       1 genericapiserver.go:82] Adding storage destination for group batch
I0606 20:35:39.440631       1 genericapiserver.go:393] Node port range unspecified. Defaulting to 30000-32767.
[restful] 2016/06/06 20:35:39 log.go:30: [restful/swagger] listing is available at https://172.15.0.21:443/swaggerapi/
[restful] 2016/06/06 20:35:39 log.go:30: [restful/swagger] https://172.15.0.21:443/swaggerui/ is mapped to folder /swagger-ui/
I0606 20:35:39.755027       1 genericapiserver.go:692] Serving securely on 0.0.0.0:443
I0606 20:35:39.755048       1 genericapiserver.go:734] Serving insecurely on 127.0.0.1:8080
E0606 20:35:40.019931       1 genericapiserver.go:716] Unable to listen for secure (listen tcp 0.0.0.0:443: bind: address already in use); will try again.

kubelet Docker logs

E0606 22:03:13.811908    1841 event.go:202] Unable to write event: 'dial tcp 172.15.0.21:443: connection refused' (may retry after sleeping)
W0606 22:03:14.299181    1841 manager.go:408] Failed to update status for pod "_()": Get https://172.15.0.21:443/api/v1/namespaces/kube-system/pods/kube-apiserver-6taow: dial tcp 172.15.0.21:443: connection refused
W0606 22:03:14.299421    1841 manager.go:408] Failed to update status for pod "_()": Get https://172.15.0.21:443/api/v1/namespaces/kube-system/pods/kube-proxy-sbzwr: dial tcp 172.15.0.21:443: connection refused
W0606 22:03:14.299602    1841 manager.go:408] Failed to update status for pod "_()": Get https://172.15.0.21:443/api/v1/namespaces/kube-system/pods/kubelet-pipaa: dial tcp 172.15.0.21:443: connection refused

Host kubelet

# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubelet via Hyperkube ACI
Requires=flanneld.service
After=flanneld.service
[Service]
Environment=KUBELET_ACI=quay.io/aaron_levy/hyperkube
Environment=KUBELET_VERSION=v1.2.2_runonce.0
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --runonce \
  --runonce-timeout=60s \
  --api-servers=https://172.15.0.21:443 \
  --kubeconfig=/etc/kubernetes/kubeconfig \
  --lock-file=/var/run/lock/kubelet.lock \
  --allow-privileged \
  --hostname-override=172.15.0.21 \
  --node-labels=master=true \
  --minimum-container-ttl-duration=3m0s \
  --cluster_dns=10.3.0.10 \
  --cluster_domain=cluster.local
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

Generated apiserver manifest

apiVersion: "extensions/v1beta1"
kind: DaemonSet
metadata:
  name: kube-apiserver
  namespace: kube-system
  labels:
    k8s-app: kube-apiserver
    version: v1.2.2_coreos.0
spec:
  template:
    metadata:
      labels:
        k8s-app: kube-apiserver
        version: v1.2.2_coreos.0
    spec:
      nodeSelector:
        master: "true"
      hostNetwork: true
      containers:
      - name: kube-apiserver
        image: quay.io/peanutbutter/hyperkube:v1.2.4_inotify.3
        command:
        - /hyperkube
        - apiserver
        - --bind-address=0.0.0.0
        - --secure-port=443
        - --insecure-port=8080
        - --advertise-address=$(MY_POD_IP)
        - --etcd-servers=http://172.15.0.21:2379
        - --allow-privileged=true
        - --service-cluster-ip-range=10.3.0.0/24
        - --admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota
        - --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt
        - --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key
        - --service-account-key-file=/etc/kubernetes/secrets/service-account.pub
        - --client-ca-file=/etc/kubernetes/secrets/ca.crt
        env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
        ports:
        - containerPort: 443
          hostPort: 443
          name: https
        - containerPort: 8080
          hostPort: 8080
          name: local
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ssl-certs-host
          readOnly: true
        - mountPath: /etc/kubernetes/secrets
          name: secrets
          readOnly: true
      volumes:
      - name: ssl-certs-host
        hostPath:
          path: /usr/share/ca-certificates
      - name: secrets
        secret:
          secretName: kube-apiserver

Allow a CA to be provided to bootkube render

Bare metal examples could be simpler if a pre-existing CA cert could be provided to bootkube render so that only manifests and other credentials are generated.

Currently, I generate the CA with bootkube, paste it into different machine metadata declarations, then provision nodes to be bootkube-ready. It might also be valuable to pass a desired token to bootkube, depending on how long we're going to be using tokens.

Link to related upstream proposals for self-hosted in README

I've been getting more questions about self-hosted as 1.3 approaches and since I've been talking about self-hosted in a couple of SIGs.

I've gotten feedback that this is awesome work, but the README in the repo doesn't really provide any information about the actual implementation/upstream work involved.

It would be great if we just documented this in some more detail, as well as linking to the upstream proposals this project is related to. I might submit a PR for this when I get some free time, but opening here to document the feedback.

README: provide enough information for someone to get started

Currently the README for bootkube isn't something that you can dive into and get a sense for how the project works. We probably need two things:

  1. Some diagrams to explain the theory of operation. We can lift diagrams from one of the many presentations that we have given on self hosted. For example: https://speakerdeck.com/philips/pushing-kubernetes-forward?slide=23

  2. A document in a Documentation folder that gives a top to bottom guide on how to use it. Starting from curl of a bootkube release to your laptop and some pre-req configuration of some machines

cc @joshix

Allow an on-host mode of executing bootkube

This would allow for rendering assets likely directly to etcd, then running bootkube locally on a host (no ssh-tunnel) using those assets. The end result I'd probably like to see is something along the lines of:

sudo rkt run quay.io/coreos/bootkube start --etcd=https://example.com:2379

/cc @chancez @dghubble

bootkube flag to tell which k8s version to install

It would be very handy to have a bootkube flag to specify which k8s version to install; right now it looks like it is hardcoded.

We have tested bootkube with corectl, and there is a PR #87 showing how to use it.
I'm looking to swap the Go binaries for bootkube in kube-solo and kube-cluster to take advantage of an easy k8s setup, but having no way to specify which k8s version to use makes that much less useful.

Deploy component-health services

ref: #64

The /componentstatuses endpoint has some hard-coded expectations which we don't necessarily follow. We should provide an alternative way to easily inspect cluster health.

One option discussed is deploying cluster-health services which can then be more consistently queried by external tools.

Something like:

Controller-manager and scheduler health can be implemented as normal services

etcd (I believe) would need to be implemented as a selector-less service (populating the endpoints with known etcd locations). Once we self-host etcd, this wouldn't be necessary.
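A selector-less service is just a Service with no selector plus a manually managed Endpoints object pointing at the known etcd members; a minimal sketch (name, namespace, and address are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: etcd-health
  namespace: kube-system
spec:
  ports:
  - name: client
    port: 2379
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-health        # must match the Service name
  namespace: kube-system
subsets:
- addresses:
  - ip: 172.15.0.21        # known etcd member(s), maintained manually
  ports:
  - name: client
    port: 2379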

checkpointer can't find checkpointed apiserver manifest

I'm seeing some issues with running bootkube using the coreos-baremetal instructions. bootkube runs and reports that it's working, but the API server is never actually available. The bootstrapped API server fails with something like

I0916 17:54:22.088920       1 genericapiserver.go:690] Serving securely on 0.0.0.0:443
I0916 17:54:22.089037       1 genericapiserver.go:735] Serving insecurely on 127.0.0.1:8080
E0916 17:54:22.089963       1 genericapiserver.go:716] Unable to listen for secure (listen tcp 0.0.0.0:443: bind: address already in use); will try again.

(but then exits despite promising to try again)

Meanwhile, when the checkpointer gets brought back up, it complains that it can't find a checkpointed manifest:

I0916 18:18:23.712503       1 main.go:52] begin apiserver checkpointing...
I0916 18:18:24.048986       1 main.go:83] no apiserver running, installing temp apiserver static manifest
E0916 18:18:24.049506       1 main.go:86] open /srv/kubernetes/manifests/apiserver.json: no such file or directory
I0916 18:19:24.052212       1 main.go:83] no apiserver running, installing temp apiserver static manifest
E0916 18:19:24.052395       1 main.go:86] open /srv/kubernetes/manifests/apiserver.json: no such file or directory
I0916 18:20:24.055698       1 main.go:83] no apiserver running, installing temp apiserver static manifest
E0916 18:20:24.055852       1 main.go:86] open /srv/kubernetes/manifests/apiserver.json: no such file or directory
I0916 18:21:24.057407       1 main.go:83] no apiserver running, installing temp apiserver static manifest
E0916 18:21:24.057584       1 main.go:86] open /srv/kubernetes/manifests/apiserver.json: no such file or directory

Is there some ordering in which these components would be expected to get into this state? E.g. is there something that could fail during an install and then cause it? I've seen this several times today and yesterday and am trying to figure out if there's something I'm doing wrong.

This is with both bootkube master, and bootkube referring to quay.io/peanutbutter/hyperkube:v1.3.7_eric.0 which was built from coreos/kubernetes#76

cc @derekparker @aaronlevy

kube-proxy often comes up in a broken state and does not correct itself

TL;DR: run the hack/multi-node setup and check the logs of your kube-proxy pods.

Running the vanilla, default hack/multi-node setup, I find that a large majority of the time kube-proxy comes up, cannot connect to the kube-apiserver, and never succeeds. Usually all kube-proxies are affected, but not always. The logs are in the gist below.

This severely breaks all functionality within the cluster, as all control components will use the kubernetes.default hostname or 10.3.0.1, which is the API server's cluster IP. This means the scheduler, controller-manager, etc. cannot function if their node's local kube-proxy is in this broken state. As a result, this bug tends to prevent the cluster from working entirely, as the scheduler/controller-manager stop functioning. Deleting/restarting the kube-proxy can resolve this, as it tends to come back up in a working state, but if the scheduler was not working when the kube-proxy pod was deleted, the proxy pod will not come back and the cluster remains broken.

This issue seems to have something to do with the pivot of the bootkube API server to the self-hosted API server, but I cannot say for sure. The self-hosted API server logs seem normal.

https://gist.github.com/chancez/b802af6729bfd03a6b629639fa76cd5d

I've got one idea for potentially fixing this: configure the controller-manager/scheduler to use the external IP/hostname instead of the service. I'm not fond of that, nor sure it would resolve the problem - it would mostly just keep the cluster from getting into an unrecoverable state. Ideally, we find out why kube-proxy never fixes itself, as that seems to be the root issue here.

Build guidelines to avoid missing P224

Kubernetes build issues such as kubernetes/kubernetes#29534 propagate into bootkube, since Kubernetes is a vendored component. RHEL/Fedora don't include P224 for license reasons, and builds in this environment fail:

Name        : golang
Version     : 1.6.3
Release     : 2.fc24
Architecture: x86_64
make install
mkdir -p _output/bin/linux/
GOOS=linux go build -ldflags "-X github.com/kubernetes-incubator/bootkube/pkg/version.Version=v0.2.0" -o _output/bin/linux/bootkube github.com/kubernetes-incubator/bootkube/cmd/bootkube
# github.com/kubernetes-incubator/bootkube/vendor/k8s.io/kubernetes/pkg/util/certificates
vendor/k8s.io/kubernetes/pkg/util/certificates/csr.go:96: undefined: elliptic.P224
# github.com/kubernetes-incubator/bootkube/vendor/github.com/google/certificate-transparency/go/x509
vendor/github.com/google/certificate-transparency/go/x509/x509.go:342: undefined: elliptic.P224
vendor/github.com/google/certificate-transparency/go/x509/x509.go:355: undefined: elliptic.P224
vendor/github.com/google/certificate-transparency/go/x509/x509.go:1461: undefined: elliptic.P224
Makefile:27: recipe for target '_output/bin/linux/bootkube' failed
make: *** [_output/bin/linux/bootkube] Error 2

Using the binary from the go1.7.1 linux/amd64 tarball does seem to allow builds to succeed as reported upstream. It doesn't appear much could be done within bootkube to address this except re-vendoring when upstream reaches a conclusion.

Perhaps I'll add a Go version note to the README for now?

cc @gtank

Vagrant examples using outdated user-data.sample

The user-data.sample file within the hack/ directory for Vagrant isn't up to date, it seems. It's missing some things relating to the checkpointer's checkpoint-secrets directory, and I'm not sure what else, but the checkpointer isn't running when I attempt to use Vagrant.

some k8s components report unhealthy status after cluster bootstrap

curl 127.0.0.1:8080/api/v1/componentstatuses

{
  "kind": "ComponentStatusList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/componentstatuses"
  },
  "items": [
    {
      "metadata": {
        "name": "scheduler",
        "selfLink": "/api/v1/componentstatuses/scheduler",
        "creationTimestamp": null
      },
      "conditions": [
        {
          "type": "Healthy",
          "status": "False",
          "message": "Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: getsockopt: connection refused"
        }
      ]
    },
    {
      "metadata": {
        "name": "controller-manager",
        "selfLink": "/api/v1/componentstatuses/controller-manager",
        "creationTimestamp": null
      },
      "conditions": [
        {
          "type": "Healthy",
          "status": "False",
          "message": "Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: getsockopt: connection refused"
        }
      ]
    },
    {
      "metadata": {
        "name": "etcd-0",
        "selfLink": "/api/v1/componentstatuses/etcd-0",
        "creationTimestamp": null
      },
      "conditions": [
        {
          "type": "Healthy",
          "status": "True",
          "message": "{\"health\": \"true\"}"
        }
      ]
    }
  ]
}

How to upstream?

Hi,
I'd like to know your thoughts on upstreaming.
How is this going to be integrated with the new turnup UX that's being developed in sig-cluster-lifecycle?

Really nice project - I'd like to make it even more portable.

scheduler/controller-manager leader election identity uses hostname - but could have multiple per host

We use deployments for both the scheduler and controller-manager, so it is possible for multiple copies of these components to end up on the same host. Because the leader-election identity is based on hostname, each pod on the same host could think it is the master (I haven't 100% verified this, but that's my reading from a skim of the code).

Scheduler: https://github.com/kubernetes/kubernetes/blob/v1.4.3/plugin/cmd/kube-scheduler/app/server.go#L136

Controller-manager: https://github.com/kubernetes/kubernetes/blob/v1.4.3/cmd/kube-controller-manager/app/controllermanager.go#L176

We could try making use of anti-affinity to keep them from being scheduled on the same host. These deployments aren't meant for throughput but rather for limiting failure / total loss of schedulers/CMs, so multiple copies on the same host don't make a lot of sense.

And/or we should come up with a better ID than the hostname.
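For reference, a sketch of what required pod anti-affinity on the scheduler deployment could look like. Note this uses the affinity field; on the Kubernetes versions in use here this was still exposed as an alpha annotation, so treat it as illustrative only:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  replicas: 2
  template:
    metadata:
      labels:
        k8s-app: kube-scheduler
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: kube-scheduler
            topologyKey: kubernetes.io/hostname   # never co-schedule two schedulers on one host
      containers:
      - name: kube-scheduler
        image: quay.io/coreos/hyperkube:v1.4.0_coreos.0
        command:
        - /hyperkube
        - scheduler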
