
vault-operator's Introduction

Vault Operator

Project status: beta

The basic features have been completed, and while no breaking API changes are currently planned, the API can change in a backwards incompatible way before the project is declared stable.

Overview

The Vault operator deploys and manages Vault clusters on Kubernetes. Vault instances created by the Vault operator are highly available and support automatic failover and upgrade.

Getting Started

Prerequisites

  • Kubernetes 1.8+

Configuring RBAC

Consult the RBAC guide on how to configure RBAC for the Vault operator.
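If you just want to try things out in the default namespace, the RBAC template can be rendered and applied roughly like this (a sketch based on the example/rbac-template.yaml workflow shown later on this page; substitute your own namespace and service account):

    sed -e 's/<namespace>/default/g' -e 's/<service-account>/default/g' example/rbac-template.yaml > example/rbac.yaml
    kubectl create -f example/rbac.yaml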

Deploying the etcd operator

The Vault operator employs the etcd operator to deploy an etcd cluster as the storage backend.

  1. Create the etcd operator Custom Resource Definitions (CRD):

    kubectl create -f example/etcd_crds.yaml
    
  2. Deploy the etcd operator:

    kubectl -n default create -f example/etcd-operator-deploy.yaml

Deploying the Vault operator

  1. Create the Vault CRD:

    kubectl create -f example/vault_crd.yaml
    
  2. Deploy the Vault operator:

    kubectl -n default create -f example/deployment.yaml
    
  3. Verify that the operators are running:

    $ kubectl -n default get deploy
    NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    etcd-operator    1         1         1            1           5m
    vault-operator   1         1         1            1           5m
    

Deploying a Vault cluster

A Vault cluster can be deployed by creating a VaultService Custom Resource (CR). For each Vault cluster, the Vault operator also creates an etcd cluster as the storage backend.
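The example_vault.yaml manifest used in step 1 below contains a VaultService along these lines (a sketch assembled from the CRs shown elsewhere on this page; the node count and version in your checkout may differ):

    apiVersion: "vault.security.coreos.com/v1alpha1"
    kind: "VaultService"
    metadata:
      name: "example"
    spec:
      nodes: 2
      version: "0.9.1-0"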

  1. Create a Vault CR that deploys a 2 node Vault cluster in high availability mode:

    kubectl -n default create -f example/example_vault.yaml
    
  2. Wait until the example-... pods for the etcd and Vault cluster are up:

    $ kubectl -n default get pods
    NAME                              READY     STATUS    RESTARTS   AGE
    etcd-operator-78899f87f6-qdn5h    3/3       Running   0          10m
    example-7678c8f49c-kfx2w          1/2       Running   0          2m
    example-7678c8f49c-pqrj8          1/2       Running   0          2m
    example-etcd-7lpjg7n76d           1/1       Running   0          2m
    example-etcd-dhxrksssgx           1/1       Running   0          2m
    example-etcd-s7mzhffz92           1/1       Running   0          2m
    vault-operator-5976f74f84-pxkf6   1/1       Running   0          10m
    
  3. Get the Vault pods:

    $ kubectl -n default get pods -l app=vault,vault_cluster=example
    NAME                       READY     STATUS    RESTARTS   AGE
    example-7678c8f49c-kfx2w   1/2       Running   0          2m
    example-7678c8f49c-pqrj8   1/2       Running   0          2m
    
  4. Check the Vault CR status:

    $ kubectl -n default get vault example -o yaml
    apiVersion: vault.security.coreos.com/v1alpha1
    kind: VaultService
    metadata:
        name: example
        namespace: default
        ...
    spec:
        nodes: 2
        version: 0.9.1-0
        ...
    status:
        initialized: false
        phase: Running
        updatedNodes:
        - example-7678c8f49c-kfx2w
        - example-7678c8f49c-pqrj8
        vaultStatus:
            active: ""
            sealed:
            - example-7678c8f49c-kfx2w
            - example-7678c8f49c-pqrj8
            standby: null
        ...
    

    The Vault CR status shows the cluster is currently uninitialized and sealed.

Using the Vault cluster

See the Vault usage guide on how to initialize, unseal, and use the deployed Vault cluster.
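As a rough sketch of what that guide covers (command names depend on the Vault version: 0.9.x uses vault init/vault unseal, newer releases use vault operator init/vault operator unseal), you can port-forward to one of the sealed pods and initialize the cluster:

    kubectl -n default get vault example -o jsonpath='{.status.vaultStatus.sealed[0]}'
    kubectl -n default port-forward <sealed-pod-name> 8200
    # in another shell:
    export VAULT_ADDR='https://localhost:8200'
    export VAULT_SKIP_VERIFY='true'   # the operator's default server certificate is signed by its own CA
    vault init                        # prints the unseal keys and the initial root token
    vault unseal                      # repeat until the unseal threshold is reached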

Consult the monitoring guide on how to monitor and alert on a Vault cluster with Prometheus.

See the recovery guide on how to back up and restore Vault cluster data using the etcd operator.

For an overview of the default TLS configuration or how to specify custom TLS assets for a Vault cluster see the TLS setup guide.
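For reference, custom TLS assets are referenced from the CR spec as Kubernetes secrets, roughly like this (field names taken from the VaultService examples shown elsewhere on this page; the secrets themselves must be created first, as described in the guide):

    spec:
      TLS:
        static:
          serverSecret: vault-server-tls
          clientSecret: vault-client-tls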

Uninstalling Vault operator

  1. Delete the Vault custom resource:

    kubectl -n default delete -f example/example_vault.yaml
    
  2. Delete the operators and other resources:

    kubectl -n default delete deploy vault-operator etcd-operator
    kubectl -n default delete -f example/rbac.yaml
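
  3. Optionally, delete the CRDs as well. They are cluster-scoped and may be shared, so skip this step if anything else still uses them (names as created by example/etcd_crds.yaml and example/vault_crd.yaml):

    kubectl delete crd vaultservices.vault.security.coreos.com
    kubectl delete crd etcdclusters.etcd.database.coreos.com etcdbackups.etcd.database.coreos.com etcdrestores.etcd.database.coreos.com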
    

vault-operator's People

Contributors

chrisricci, crawford, ecordell, fanminshi, hasbro17, hongchaodeng, philips, radhikapc, robszumski, xiang90, zbwright


vault-operator's Issues

etcd-operator fails with "a container name must be specified for pod etcd-operator-764f7ff957-w7shx, choose one of: [etcd-operator etcd-backup-operator etcd-restore-operator]"

a container name must be specified for pod etcd-operator-764f7ff957-w7shx,
choose one of:
[etcd-operator etcd-backup-operator etcd-restore-operator]

Hi, I am just following the README to try this out for the first time and it is failing at the kubectl create -f example/etcd-operator-deploy.yaml step. Only 2/3 pods come up and the last one is stuck in CrashLoopBackoff.
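The message itself comes from running kubectl logs against a multi-container pod; naming one of the containers listed in the error should at least retrieve the logs (a hedged example using the names from the message):

kubectl -n default logs etcd-operator-764f7ff957-w7shx -c etcd-operator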

Add CHANGELOG

We need to keep a CHANGELOG to track and showcase major changes for each release.

Looking for maintainers

As the current maintainers, we are focusing our efforts on tools that help all engineers build Operators with the Operator Framework, and spending less time on specific Operators such as Vault. Building the Vault Operator was key to learning which abstractions to provide in our SDK. As a result of our commitments to the new projects we are not able to give enough time to the Vault Operator.

With our realigned focus, we explored a number of options (including with Hashicorp) to ensure the health of this project. At this point, we're looking for maintainers to help us realize the vision of this project. We will stay involved, with the limited time we can make available, for reviewing code and so on as those maintainers get up to speed. If you're interested, please let us know by commenting below so we can reach out to you.

vault HA configuration

  • Enable the HA setup in Vault and test it manually
  • Test that the HA setup actually works with an integration test

Deployable Operator on Kubernetes

The next major milestone for the project is a deployable Operator container. This should largely be inspired by the etcd Operator and have the following workflow:

Install the Vault Operator via kubectl:

$ kubectl create -f Documentation/examples/vault-operator-deployment.yaml

Vault Operator will automatically create a Kubernetes Custom Resource Definition (CRD):

$ kubectl get customresourcedefinitions
NAME               DESCRIPTION              VERSION(S)
vault.coreos.com   Managed vault instance   v1alpha1

Custom service account

At the moment, when vault-operator creates pods, it uses the default service account. I would like to define a custom service account so I can, for example, define Pod Security Policies only for the vault pods.

I can probably provide a PR for this but would like to get some feedback first. Is there anything I should keep in mind?

no-op after creation of the CRD on openshift 3.10

On OpenShift (3.10), I see no resources created after I create the vault CRD.

  • The vault operator is healthy:
➜  vault-operator git:(openshift) ✗ oc get pods --all-namespaces | grep vau
vault                               vault-operator-694f678549-x6jb4                       1/1       Running            0          1h
  • The logs when creating a CRD look like this:
ERROR: logging before flag.Parse: I1010 17:12:13.892378       1 leaderelection.go:184] successfully acquired lease vault/vault-operator
time="2018-10-10T17:12:13Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"vault\", Name:\"vault-operator\", UID:\"a1fa7cf5-ccaf-11e8-b49a-005056b9215d\", APIVersion:\"v1\", ResourceVersion:\"1628479\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' vault-operator-694f678549-x6jb4 became leader"
time="2018-10-10T17:12:13Z" level=info msg="starting Vaults controller"
ERROR: logging before flag.Parse: W1010 17:28:12.478461       1 reflector.go:334] github.com/coreos-inc/vault-operator/pkg/operator/controller.go:35: watch of *v1alpha1.VaultService ended with: The resourceVersion for the provided watch is too old.

Creating new CRDs after this point doesn't seem to lead to any log messages. Not sure why.

Persistent Volume Claims option?

Hi,

I'm pretty new to k8s and Vault. I followed the README and it works great! But is there a reason you are not using Persistent Volume Claims for the etcd storage? Or is that planned for a future release?

Please let me know if there is a better place to ask this type of question. Thank you.

Auto initialize vault

Vault should be auto-initialized and the keys should be sent to AWS KMS. Either kube2iam can be used to pass the AWS credentials, or an accessKey/secretKey pair can be used.

An initial check should be made to see whether Vault was already initialized. If yes, then #308; otherwise continue with init.

Ingress to access Vault

I have deployed the vault example, but I can't configure an Ingress to access it externally.

Vault

apiVersion: "vault.security.coreos.com/v1alpha1"
kind: "VaultService"
metadata:
  name: "example"
spec:
  nodes: 1
  version: "0.10.4"

Ingress

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  generation: 6
  labels:
    app: vault
  name: kube-vault
  namespace: vault-production
spec:
  rules:
  - host: secret.devops.agilitynetworks.com.br
    http:
      paths:
      - backend:
          serviceName: example
          servicePort: 8200
        path: /
  tls:
  - hosts:
    - secret.devops.agilitynetworks.com.br
    secretName: tls-secret-wildcard

Secret

kubectl get secret tls-secret-wildcard 
NAME                  TYPE                DATA      AGE
tls-secret-wildcard   kubernetes.io/tls   2         5d

Document RBAC setup guide

The current RBAC templates are too broad and were only meant for demo purposes.

There should be a detailed RBAC setup guide with permissions specific to the actual resources needed by the vault-operator.

Upgrading the vault cluster without an Active Node results in an extra vault Node

I noticed that upgrading a newly created vault cluster that doesn't have any Active node creates one more node than the size indicated in the vault Custom Resource (CR). The extra node persists until one of the nodes becomes Active; then the extra node is terminated.

Ideally, updating a Vault Cluster without an Active Node should have the same number of nodes before and after.

Vault as statefulset

I'm working on automated deployment of Vault with the operator. What I noticed is that the operator runs the instances as a Deployment. It makes unsealing particular instances a bit troublesome (I need to check the k8s API for the IP of a particular instance and then connect to it). I believe migrating to a StatefulSet with known DNS names for each instance would make automated unsealing much easier.
What do you think about this?

Thanks.

Vault CRD definition

Things it must include:

  • Cluster size
  • TLS certificate configuration
  • Listener configuration
  • Service and ingress configuration
  • Version information

Design: How to interact with etcd Operator

The Vault Operator will take a hard dependency on the etcd Operator. We need to figure out how these interact. Some possible options:

  1. Vault Operator deploys an etcd cluster by creating the etcd cluster resource
  2. Vault Operator requires the user to already have an etcd cluster resource and point at it
  3. Vault Operator takes a hostname and TLS cert of an etcd cluster and doesn't know about etcd Operator
  4. Vault Operator has a tool to generate an etcd cluster resource based on some configuration

Auto unseal vault

Vault can be auto-unsealed by using the keys from AWS KMS. See #307. Credentials can be passed via kube2iam or via an accessKey/secretKey pair.

is the IPC_LOCK capability really needed?

Vault operator creates a vault deployment requesting the IPC_LOCK capability.
But in Kubernetes swap is mandatorily disabled (the kubelet now doesn't start if swap is active).
So, if Vault can be set to run with disable_mlock=true, then the IPC_LOCK capability can probably be removed.
This would make deployment simpler in those organizations where pod security contexts (Kubernetes) or SCCs (OpenShift) are closely scrutinized.
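For context, disable_mlock is a standard Vault server configuration setting. If the operator's configMapName field can be used to supply extra configuration, a sketch might look like the following (the ConfigMap key name and whether the operator merges this setting into its generated config are assumptions to verify):

apiVersion: v1
kind: ConfigMap
metadata:
  name: vault-custom-config
data:
  vault.hcl: |
    # assumed key name; check the operator docs for the expected key
    disable_mlock = true

The ConfigMap would then be referenced from the VaultService spec via configMapName: vault-custom-config.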

vaultStatus - Active node: pod doesnt exist any more

Hi there

I have an issue with my vault cluster.

Currently, i don't have any more an active vault pod.

kubectl -n default get vault poc-vault -o jsonpath='{.status.vaultStatus.active}' | xargs -0 -I {} kubectl -n default port-forward {} 8200
Error from server (NotFound): pods "poc-vault-d66f8f747-24f5m" not found
k get pods | grep vault

poc-vault-d66f8f747-lqt76                                      1/2       Running   0          2d
poc-vault-d66f8f747-rgdth                                      1/2       Running   2          2d
poc-vault-etcd-2q9tl8m4hn                                      1/1       Running   0          3d
poc-vault-etcd-gqtp5sgs4p                                      1/1       Running   0          2d
poc-vault-etcd-x6ht4ckrpz                                      1/1       Running   0          2d
kubectl -n default get vault poc-vault -o json
{
    "apiVersion": "vault.security.coreos.com/v1alpha1",
    "kind": "VaultService",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"vault.security.coreos.com/v1alpha1\",\"kind\":\"VaultService\",\"metadata\":{\"annotations\":{},\"name\":\"poc-vault\",\"namespace\":\"default\"},\"spec\":{\"TLS\":{\"static\":{\"clientSecret\":\"vault-client-tls\",\"serverSecret\":\"vault-server-tls\"}},\"nodes\":2,\"version\":\"0.9.1-0\"}}\n"
        },
        "clusterName": "",
        "creationTimestamp": "2018-05-28T13:55:46Z",
        "generation": 0,
        "name": "poc-vault",
        "namespace": "default",
        "resourceVersion": "5029789",
        "selfLink": "/apis/vault.security.coreos.com/v1alpha1/namespaces/default/vaultservices/poc-vault",
        "uid": "d2174cf2-627e-11e8-950f-fa163e1a205a"
    },
    "spec": {
        "TLS": {
            "static": {
                "clientSecret": "vault-client-tls",
                "serverSecret": "vault-server-tls"
            }
        },
        "baseImage": "quay.io/coreos/vault",
        "configMapName": "",
        "nodes": 2,
        "version": "0.9.1-0"
    },
    "status": {
        "clientPort": 8200,
        "initialized": true,
        "phase": "Running",
        "serviceName": "poc-vault",
        "updatedNodes": [
            "poc-vault-d66f8f747-lqt76",
            "poc-vault-d66f8f747-rgdth"
        ],
        "vaultStatus": {
            "active": "poc-vault-d66f8f747-24f5m",
            "sealed": [
                "poc-vault-d66f8f747-lqt76",
                "poc-vault-d66f8f747-rgdth"
            ],
            "standby": null
        }
    }
}

The active pod recorded in vaultStatus doesn't exist any more...

I checked the operator logs:

k logs -f vault-operator-67d5846657-4zdfq
time="2018-06-01T14:02:44Z" level=info msg="Go Version: go1.9.2"
time="2018-06-01T14:02:44Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-06-01T14:02:44Z" level=info msg="vault-operator Version: 0.1.9"
time="2018-06-01T14:02:44Z" level=info msg="Git SHA: 43a1dd7"
ERROR: logging before flag.Parse: I0601 14:02:44.832229       1 leaderelection.go:174] attempting to acquire leader lease...
ERROR: logging before flag.Parse: I0601 14:03:02.263682       1 leaderelection.go:184] successfully acquired lease default/vault-operator
time="2018-06-01T14:03:02Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"default\", Name:\"vault-operator\", UID:\"ce489616-627b-11e8-950f-fa163e1a205a\", APIVersion:\"v1\", ResourceVersion:\"4822438\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' vault-operator-67d5846657-4zdfq became leader"
time="2018-06-01T14:03:02Z" level=info msg="starting Vaults controller"
time="2018-06-01T14:03:02Z" level=info msg="Vault CR (default/poc-vault) is created"
time="2018-06-01T14:03:12Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-rgdth): Get https://10-2-2-76.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp 10.2.2.76:8200: getsockopt: connection refused"
time="2018-06-01T14:03:22Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-rgdth): Get https://10-2-2-76.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp 10.2.2.76:8200: getsockopt: connection refused"
time="2018-06-02T01:38:44Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-24f5m): Get https://10-2-0-67.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp: i/o timeout"
time="2018-06-02T01:39:14Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-rgdth): Get https://10-2-2-76.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp: i/o timeout"
ERROR: logging before flag.Parse: W0602 13:29:46.715365       1 reflector.go:334] github.com/coreos-inc/vault-operator/pkg/operator/controller.go:35: watch of *v1alpha1.VaultService ended with: too old resource version: 5029789 (5239818) 

I deleted my 2 previous pods:

k delete pod poc-vault-d66f8f747-lqt76 poc-vault-d66f8f747-rgdth

They were recreated, but poc-vault seems to be stuck on poc-vault-d66f8f747-24f5m:

kubectl -n default get vault poc-vault -o jsonpath='{.status.vaultStatus.active}'
poc-vault-d66f8f747-24f5m

Any ideas?
How can I force it to check the other pods again and elect a new active node?

Thanks for your help

Can someone show an example of how to use curl/HTTP API?

I have everything set up and am able to read/write secrets, e.g. vault read secret/foo. However, when I try the curl commands, e.g. curl --header "X-Vault-Token: $VAULT_TOKEN" http://localhost:8200/v1/secret/foo and http://example.default.svc:8200/v1/secret/foo, they return blank. I checked the $VAULT_TOKEN and it seems to be OK.
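One thing worth checking (a guess based on the default TLS setup described in the README above): the operator serves Vault over HTTPS, so a plain http:// request may silently return nothing. Something along these lines might behave differently:

curl -k --header "X-Vault-Token: $VAULT_TOKEN" https://example.default.svc:8200/v1/secret/foo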

support for "service serving secret" capability

OpenShift has the capability to generate certificates on the fly using its internal PKI:
https://docs.openshift.com/container-platform/3.9/dev_guide/secrets.html#service-serving-certificate-secrets
It would be nice if the operator could work with this capability (it's a matter of setting the right annotation on the service fronting Vault).
This makes it possible to create certificates that are automatically trusted by other pods running in the cluster.
In the current implementation of the operator, certificate distribution is either manual or automatic via a new PKI that is not known/trusted by the rest of the cluster.
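For illustration, on OpenShift 3.x this is usually a single annotation on the Service fronting Vault, roughly as below (annotation name as documented for the 3.9 service serving certificate feature; whether the operator preserves annotations on the Services it manages is an open question):

apiVersion: v1
kind: Service
metadata:
  name: example
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: example-vault-serving-cert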

etcd Cluster fails to start

Hi guys,
thank you for your effort. I've tried diving into this following the example, however, my etcd cluster fails to start.

First it looks like this:

kubectl -n default get pods
NAME                              READY     STATUS     RESTARTS   AGE
etcd-operator-779446c7d8-t2hm9    3/3       Running    0          28m
example-etcd-f4rhsm64d4           0/1       Init:0/1   0          9s
vault-operator-7dc8b55b4d-mkz5p   1/1       Running    0          28m

Then it errors out, without providing any output:

 nvarz:~/playground/vault-operator (master *%=)$ kubectl -n default get pods -w
NAME                              READY     STATUS    RESTARTS   AGE
etcd-operator-779446c7d8-t2hm9    3/3       Running   0          29m
example-etcd-9x68tmdxl7           0/1       Error     0          40s
example-etcd-f4rhsm64d4           0/1       Running   0          56s
vault-operator-7dc8b55b4d-mkz5p   1/1       Running   0          29m
^C nvarz:~/playground/vault-operator (master *%=)$ kubectl describe -n default example-etcd-9x68tmdxl7
the server doesn't have a resource type "example-etcd-9x68tmdxl7"
 nvarz:~/playground/vault-operator (master *%=)$ kubectl describe -n default example-etcd-f4rhsm64d4
the server doesn't have a resource type "example-etcd-f4rhsm64d4"

Container Logs of example-etcd-f4rhsm64d4

WARNING: 2018/10/22 17:34:08 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 0.0.0.0:2379: getsockopt: connection refused"; Reconnecting to {0.0.0.0:2379 0  <nil>}
2018-10-22 17:34:08.883317 I | raft: 850a5ffb901b8769 is starting a new election at term 119
2018-10-22 17:34:08.883345 I | raft: 850a5ffb901b8769 became candidate at term 120
2018-10-22 17:34:08.883354 I | raft: 850a5ffb901b8769 received MsgVoteResp from 850a5ffb901b8769 at term 120
2018-10-22 17:34:08.883362 I | raft: 850a5ffb901b8769 [logterm: 2, index: 5] sent MsgVote request to 3922346fe0f3212c at term 120
2018-10-22 17:34:09.883322 I | raft: 850a5ffb901b8769 is starting a new election at term 120
2018-10-22 17:34:09.883353 I | raft: 850a5ffb901b8769 became candidate at term 121
2018-10-22 17:34:09.883361 I | raft: 850a5ffb901b8769 received MsgVoteResp from 850a5ffb901b8769 at term 121
2018-10-22 17:34:09.883369 I | raft: 850a5ffb901b8769 [logterm: 2, index: 5] sent MsgVote request to 3922346fe0f3212c at term 121
2018-10-22 17:34:10.168368 W | rafthttp: health check for peer 3922346fe0f3212c could not connect: dial tcp 10.42.8.6:2380: getsockopt: connection refused
WARNING: 2018/10/22 17:34:10 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 0.0.0.0:2379: getsockopt: connection refused"; Reconnecting to {0.0.0.0:2379 0  <nil>}
2018-10-22 17:34:10.578366 I | etcdserver: skipped leadership transfer for stopping non-leader member
WARNING: 2018/10/22 17:34:10 grpc: addrConn.transportMonitor exits due to: context canceled
2018-10-22 17:34:10.578447 I | rafthttp: stopping peer 3922346fe0f3212c...
2018-10-22 17:34:10.578471 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (writer)
2018-10-22 17:34:10.578485 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (writer)
2018-10-22 17:34:10.578522 I | rafthttp: stopped HTTP pipelining with peer 3922346fe0f3212c
2018-10-22 17:34:10.578533 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (stream MsgApp v2 reader)
2018-10-22 17:34:10.578538 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (stream Message reader)
2018-10-22 17:34:10.578544 I | rafthttp: stopped peer 3922346fe0f3212c

What I end up with

kubectl -n default get pods -w
NAME                              READY     STATUS      RESTARTS   AGE
etcd-operator-779446c7d8-t2hm9    3/3       Running     0          33m
example-668f9f8f7d-76mh7          1/2       Running     0          3m
example-668f9f8f7d-7m4vd          1/2       Running     0          3m
example-668f9f8f7d-n7vgr          1/2       Running     0          3m
example-etcd-9x68tmdxl7           0/1       Error       0          4m
example-etcd-f4rhsm64d4           0/1       Completed   0          4m
vault-operator-7dc8b55b4d-mkz5p   1/1       Running     0          33m

Nothing listed under sealed

kubectl -n default get vault example -o yaml
apiVersion: vault.security.coreos.com/v1alpha1
kind: VaultService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"vault.security.coreos.com/v1alpha1","kind":"VaultService","metadata":{"annotations":{},"name":"example","namespace":"default"},"spec":{"nodes":3,"version":"0.9.1-0"}}
  creationTimestamp: 2018-10-22T17:31:07Z
  generation: 1
  name: example
  namespace: default
  resourceVersion: "2377045"
  selfLink: /apis/vault.security.coreos.com/v1alpha1/namespaces/default/vaultservices/example
  uid: 42467cbb-d620-11e8-8fcd-0050568b2ddd
spec:
  TLS:
    static:
      clientSecret: example-default-vault-client-tls
      serverSecret: example-default-vault-server-tls
  baseImage: quay.io/coreos/vault
  configMapName: ""
  nodes: 3
  version: 0.9.1-0
status:
  clientPort: 8200
  initialized: false
  phase: Running
  serviceName: example
  vaultStatus:
    active: ""
    sealed: null
    standby: null

Performance issues with vaults instances in standby

I installed vault-operator using the stable helm chart. I simply did the default install with a 2 node cluster as per the readme.

I unsealed both vaults; the first became active and the 2nd went on standby as expected. I did a simple write to the backend and left the cluster as is. Nothing within the cluster was set to interact with the vault cluster other than whatever the chart set up.

What I noticed a few hours later is that the standby node, as well as the etcd pod running with it, started to have increased CPU and memory usage over time. Once I killed the standby vault pod, the CPU and memory decreased back to normal. k8s restarted the killed pod and I left that vault sealed, while the active vault instance was left unsealed. The metrics remained stable for hours. Once I unsealed the vault again, the CPU and memory usage steadily increased again.

Is this a known issue or a misconfiguration on my end?

Attached are both the k8s node metrics and the offending vault metrics; it should be pretty clear when the vault instance was unsealed and went on standby, when it was killed, and when it was unsealed once more.

[node and vault metrics screenshots attached in the original issue]

How to retrieve secrets from app using REST

I created a Vault service "cos-private" in the "default" namespace by following the instructions, and was able to write and read secrets (vault write secret/my-test key1=value1 key2=value2 ..., and vault read secret/my-test). Now I want to be able to retrieve the my-test secrets from my app using REST. What is the URL to get the secrets?
I tried "curl -k https://cos-private.default.svc:8200/secret/my-test", but got "404 page not found".

-cluster-wide flag?

etcd-operator has a -cluster-wide flag which allows you to run only one operator in the whole cluster and create an EtcdCluster in any namespace.

At the moment you need to run vault-operator in every namespace. Are there any plans to add similar support to vault-operator?

vault-operator erroneously "updates" (kills) active node if it can't be reached/is unhealthy

If Vaults.updateLocalVaultCRStatus() can't query a node or determine that it's healthy, Vaults.syncUpgrade() will:

  1. Assume an update is in progress.
  2. Erroneously determine that the active node is the only non-updated node.
  3. Kill it to "complete" the update.

This causes disruption while a standby node takes over and (in installations without auto-unsealing) reduces resiliency by eliminating one of the unsealed, standby nodes.

Allow setting security context for vault and etcd pods

A k8s cluster can have admission control policies that can restrict the allowed pod security policy.

The vault-operator should allow a user to set the pod security policy for a vault cluster's pods via the PodPolicy field.
https://github.com/coreos-inc/vault-operator/blob/master/pkg/apis/vault/v1alpha1/types.go#L79

Similarly for the EtcdCluster created by the vault-operator for a vault cluster, the PodSecurityPolicy should be configurable through the VaultService CR.
https://github.com/coreos/etcd-operator/releases/tag/v0.9.2

Allow alternate (proxy) docker images for vault and statsd exporter

Cool project. Thanks for open sourcing it!

I was reading the code and noticed a couple of images are pinned to external docker registries. It would be nice to offer an override for environments which don't allow external docker registries. My use case is nothing more than a direct "retag" of the images used.

This would obviously open up a support nightmare if people used alternate vault/statsd images, but that could be left as unsupported.

Namespace vault-operator error

Hi,

When I try to create vault-operator per namespace, it gives the following error: Error: release vault-operator failed: customresourcedefinitions.apiextensions.k8s.io "vaultservices.vault.security.coreos.com" already exists.

Is this known issue?

Thanks

auto backup and recovery

As of now, the flow documented in the vault-operator docs is a manual step. Automating it would be really beneficial in production scenarios. Having said that, I guess this requires certain features to be completed in etcd-operator, and probably most of the work required for this task has to be done there. Vault-operator might just need configuration changes, or maybe no work at all.

Recommended way to monitor sealed/unsealed status?

I have a vault-operator-installed cluster that I'm monitoring with Prometheus. The StatsD exporter provides some metrics, but I am looking for a way to track the number of sealed/unsealed vault instances. (For example: I'd like to trigger an alert when a certain number of nodes are unsealed.)

I know that vault-operator keeps the VaultService resource's "status" section up to date with this information, but I haven't found a convenient way to get this information into Prometheus.

I've found the vault_exporter tool, perhaps that can be installed as a sidecar alongside the StatsD exporter?
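As a stopgap until a proper exporter is wired up, the status the operator already writes to the CR can be polled from a script (a sketch using the same jsonpath pattern that appears elsewhere on this page):

kubectl -n default get vault example -o jsonpath='{.status.vaultStatus.sealed}'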

vault replication across regions

It would be great to have a way to specify a secondary vault cluster running in a different region (DC) via vault-operator, similar to Vault's secondary performance/DR replication.

Right now, as there is no way to specify an existing etcd cluster (#303) to be used by vault-operator, it is not possible to solve the DR problem by using the etcd make-mirror tool.

So if #303 is solved, replication for DR will be possible, but some work is still needed in vault-operator to enable performance or keys-only mirroring for the deployment config below:

  • 1 active writable
  • n active readable across regions
  • n standby across regions

Support Vault 0.11 and greater

I love the vault operator; it simplifies a lot of the issues around deploying vault in a redundant way with ephemeral storage. The simplicity of backing up and restoring is great too.

The only issue is that we're stuck on an old version of Vault and we want to upgrade. Is there a reason the operator can't just run a publicly available vault image? As far as I can tell, all the Vault API endpoints in use haven't changed between v0.9 and v0.11, and I'm not aware of any custom modifications to Vault that allow the operator to function.

Is there a reason we can't support the newer versions of vault?

Vault-operator Not Creating Vault Cluster

Thanks for this project. I tried checking it out, but unfortunately the operator is not able to create the vault cluster on my Kubernetes 1.10.2 installation. I also do not have pod security policies set up. I am able to create the etcd-operator, vault-operator and an etcd deployment without issues. Then I apply the yaml for the vault cluster and nothing happens. I guess this might be related to IPC_LOCK, which the vault-operator sets for the vault pods? How do I get this running?

Kubectl version

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:26:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Etcd Operator

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: etcd-operator
rules:
- apiGroups:
  - etcd.database.coreos.com
  resources:
  - etcdclusters
  - etcdbackups
  - etcdrestores
  verbs:
  - "*"
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - "*"
# The following permissions can be removed if not using S3 backup and TLS
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: etcd-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: etcd-operator
subjects:
- kind: ServiceAccount
  name: etcd-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: etcd-operator
  namespace: default
rules:
- apiGroups:
  - etcd.database.coreos.com
  resources:
  - etcdclusters
  - etcdbackups
  - etcdrestores
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - "*"
# The following permissions can be removed if not using S3 backup and TLS
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: etcd-operator
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: etcd-operator
subjects:
- kind: ServiceAccount
  name: etcd-operator
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: etcd-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-operator
  labels:
    app: etcd-operator
spec:
  selector:
    matchLabels:
      app: etcd-operator
  replicas: 3
  template:
    metadata:
      labels:
        app: etcd-operator
    spec:
      serviceAccountName: etcd-operator
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.2
        command:
        - etcd-operator
        - -cluster-wide
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-restore-operator
  labels:
    app: etcd-restore-operator
spec:
  selector:
    matchLabels:
      app: etcd-restore-operator
  replicas: 3
  template:
    metadata:
      labels:
        app: etcd-restore-operator
    spec:
      serviceAccountName: etcd-operator
      containers:
      - name: etcd-restore-operator
        image: quay.io/coreos/etcd-operator:v0.9.2
        command:
        - etcd-restore-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-backup-operator
  labels:
    app: etcd-backup-operator
spec:
  selector:
    matchLabels:
      app: etcd-backup-operator
  replicas: 3
  template:
    metadata:
      labels:
        app: etcd-backup-operator
    spec:
      serviceAccountName: etcd-operator
      containers:
      - name: etcd-backup-operator
        image: quay.io/coreos/etcd-operator:v0.9.2
        command:
        - etcd-backup-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name

Vault Operator

kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: vault-operator-role
rules:
- apiGroups:
  - etcd.database.coreos.com
  resources:
  - etcdclusters
  - etcdbackups
  - etcdrestores
  verbs:
  - "*"
- apiGroups:
  - vault.security.coreos.com
  resources:
  - vaultservices
  verbs:
  - "*"
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - "*"
- apiGroups:
  - "" # "" indicates the core API group
  resources:
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  - configmaps
  - secrets
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - "*"

---

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: vault-operator-rolebinding
subjects:
- kind: ServiceAccount
  name: vault-operator
  namespace: default
roleRef:
  kind: Role
  name: vault-operator-role
  apiGroup: rbac.authorization.k8s.io

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: vaultservices.vault.security.coreos.com
spec:
  group: vault.security.coreos.com
  names:
    kind: VaultService
    listKind: VaultServiceList
    plural: vaultservices
    shortNames:
    - vault
    singular: vaultservice
  scope: Namespaced
  version: v1alpha1

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: etcdclusters.etcd.database.coreos.com
spec:
  group: etcd.database.coreos.com
  names:
    kind: EtcdCluster
    listKind: EtcdClusterList
    plural: etcdclusters
    shortNames:
    - etcd
    singular: etcdcluster
  scope: Namespaced
  version: v1beta2
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: etcdbackups.etcd.database.coreos.com
spec:
  group: etcd.database.coreos.com
  names:
    kind: EtcdBackup
    listKind: EtcdBackupList
    plural: etcdbackups
    singular: etcdbackup
  scope: Namespaced
  version: v1beta2
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: etcdrestores.etcd.database.coreos.com
spec:
  group: etcd.database.coreos.com
  names:
    kind: EtcdRestore
    listKind: EtcdRestoreList
    plural: etcdrestores
    singular: etcdrestore
  scope: Namespaced
  version: v1beta2

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-operator

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vault-operator
  labels:
    app: vault-operator
spec:
  selector:
    matchLabels:
      app: vault-operator
  replicas: 3
  template:
    metadata:
      labels:
        app: vault-operator
    spec:
      serviceAccountName: vault-operator
      containers:
      - name: vault-operator
        image: quay.io/coreos/vault-operator:0.1.9
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name

Etcd Deployment

apiVersion: v1
kind: Namespace
metadata:
  name: vault

---

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "vault-etcd"
  namespace: vault
  annotations:
   etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.16"
  pod:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "2379"
    nodeSelector:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: etcd_cluster
              operator: In
              values: ["vault-etcd"]
          topologyKey: kubernetes.io/hostname
    etcdEnv:
    - name: ETCD_AUTO_COMPACTION_RETENTION
      value: "1"
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 200m
        memory: 100Mi

Vault Deployment

apiVersion: "vault.security.coreos.com/v1alpha1"
kind: "VaultService"
metadata:
  name: "central"
  namespace: vault
spec:
  nodes: 3
  version: "0.9.1-0"

consul Operator interview questions

Consul and Kubernetes Service Discovery

  • Is there anything consul provides that Kubernetes / Tectonic does not that you hope to use?

Consul Cluster Management

  • How do you operate Consul today?
  • Where do datastore backups go? How is the backup job run?
  • How do you manage the CAs for consul clustering, and consul client?
  • How do you handle upgrades?
  • Is there a global Consul cluster? Consul cluster per DC? Consul per app?
  • What are you happy with?
  • What are you unhappy with?
  • Do you use any Consul Enterprise features? If so which ones? https://www.hashicorp.com/products/consul/
  • How do you backup Consul today?
  • What is the SLA of Consul? What is your scaling architecture?
  • How do you monitor Consul to ensure SLAs?

Integration with Apps

  • How do you handle application registration and configuration to use Consul?
  • How do applications consume Consul service discovery?
  • Do you use the Consul key-value store or only the service discovery features?
  • How do you envision consul being used with your Kubernetes clusters?
  • Do you intend to register pods or services with Consul or neither?
  • Would you like to see auto-registration of services into Consul? If so, what is your tolerance for stale data?

customer interview questions

These are questions that we want to ask customers as we explore the creation of a Vault operator.

How do you operate Vault in production today?

  • Where do datastore backups go? How is the backup job run?
  • Do you use HA? If so why? What is the uptime SLA?
  • What is your process for managing the unsealing of Vault instances?
  • How do you manage the CAs for datastore, datastore/vault, & vault/client?
  • How do you handle upgrades?
  • How do you monitor vault? How do you monitor the data store?
  • Is there a global vault or multiple vaults inside of the organization?
  • What are you happy with?
  • What are you unhappy with?
  • Which features of Vault pro or premium - if any - do you use now or expect to use in the future? https://www.hashicorp.com/products/vault

How do application developers test against Vault?

  • How close to production is this setup?
  • What are you happy with?
  • What are you unhappy with?

How are applications integrated with Vault

  • How do you create vault/client certificates? What is the revocation policy?
  • Which Vault backends do you use?
  • Which Vault audit backends do you use?
  • How do you manage tokens for applications? What are the TTLS?
  • How do apps consume Vault secrets (e.g. Vault API, configuration files, environment variables)?

Vault cluster doesn't start on Macbook->Openshift->myproject

Hi, I'm trying out the vault+etcd-operator on a Macbook running Docker 18.03.1-ce-mac65 (24312) and Openshift origin v3.9.0. Starting from a clean installation and master branch of vault-operator and etcd-operator repos:

Roberts-MacBook-Pro:Desktop rwipfel$ ./runVault.sh
++ oc login -u system:admin
Logged into "https://127.0.0.1:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

    default
    kube-public
    kube-system
  * myproject
    openshift
    openshift-infra
    openshift-node
    openshift-web-console

Using project "myproject".
++ oc patch scc restricted -p '{"fsGroup":{"type":"RunAsAny"}}'
securitycontextconstraints "restricted" patched
++ oc patch scc restricted -p '{"runAsUser":{"type":"RunAsAny"}}'
securitycontextconstraints "restricted" patched
++ cd /Users/rwipfel/git/etcd-operator/
++ example/rbac/create_role.sh --namespace=myproject
Creating role with ROLE_NAME=etcd-operator, NAMESPACE=myproject
clusterrole.rbac.authorization.k8s.io "etcd-operator" created
Creating role binding with ROLE_NAME=etcd-operator, ROLE_BINDING_NAME=etcd-operator, NAMESPACE=myproject
clusterrolebinding.rbac.authorization.k8s.io "etcd-operator" created
++ cd /Users/rwipfel/git/vault-operator/
++ sed -e 's/<namespace>/myproject/g' -e 's/<service-account>/default/g' example/rbac-template.yaml
++ kubectl create -f example/rbac.yaml
role.rbac.authorization.k8s.io "vault-operator-role" created
rolebinding.rbac.authorization.k8s.io "vault-operator-rolebinding" created
++ kubectl create -f example/etcd_crds.yaml
customresourcedefinition.apiextensions.k8s.io "etcdclusters.etcd.database.coreos.com" created
customresourcedefinition.apiextensions.k8s.io "etcdbackups.etcd.database.coreos.com" created
customresourcedefinition.apiextensions.k8s.io "etcdrestores.etcd.database.coreos.com" created
++ kubectl create -f example/etcd-operator-deploy.yaml
deployment.extensions "etcd-operator" created
++ kubectl create -f example/vault_crd.yaml
customresourcedefinition.apiextensions.k8s.io "vaultservices.vault.security.coreos.com" created
++ kubectl create -f example/deployment.yaml
deployment.extensions "vault-operator" created
++ sleep 5
++ kubectl get deploy
NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
etcd-operator    1         1         1            0           6s
vault-operator   1         1         1            0           6s
++ kubectl create -f example/example_vault.yaml
vaultservice.vault.security.coreos.com "example" created
++ sleep 5
++ kubectl get pods
NAME                              READY     STATUS              RESTARTS   AGE
etcd-operator-7bf6b58cdf-j5sk2    3/3       Running             0          12s
vault-operator-67d5846657-bcsd2   0/1       ContainerCreating   0          12s

Roberts-MacBook-Pro:Desktop rwipfel$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
etcd-operator-7bf6b58cdf-j5sk2    3/3       Running   0          40s
example-etcd-mf52q4mwlr           1/1       Running   0          9s
example-etcd-tvglk9h5fk           1/1       Running   0          25s
vault-operator-67d5846657-bcsd2   1/1       Running   0          40s

There isn't anything obviously wrong in the logs. The etcd cluster is running properly.

Roberts-MacBook-Pro:Desktop rwipfel$ kubectl logs vault-operator-67d5846657-bcsd2
time="2018-05-03T14:23:25Z" level=info msg="Go Version: go1.9.2"
time="2018-05-03T14:23:25Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-05-03T14:23:25Z" level=info msg="vault-operator Version: 0.1.9"
time="2018-05-03T14:23:25Z" level=info msg="Git SHA: 43a1dd7"
ERROR: logging before flag.Parse: I0503 14:23:25.710514       1 leaderelection.go:174] attempting to acquire leader lease...
ERROR: logging before flag.Parse: I0503 14:23:25.724311       1 leaderelection.go:184] successfully acquired lease myproject/vault-operator
time="2018-05-03T14:23:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"myproject\", Name:\"vault-operator\", UID:\"8abd113d-4edd-11e8-9c89-025000000001\", APIVersion:\"v1\", ResourceVersion:\"1477\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' vault-operator-67d5846657-bcsd2 became leader"
time="2018-05-03T14:23:25Z" level=info msg="starting Vaults controller"
time="2018-05-03T14:23:25Z" level=info msg="Vault CR (myproject/example) is created"
Roberts-MacBook-Pro:Desktop rwipfel$ kubectl logs etcd-operator-7bf6b58cdf-j5sk2 etcd-operator
time="2018-05-03T14:23:21Z" level=info msg="etcd-operator Version: 0.8.3"
time="2018-05-03T14:23:21Z" level=info msg="Git SHA: 85c37511"
time="2018-05-03T14:23:21Z" level=info msg="Go Version: go1.9.2"
time="2018-05-03T14:23:21Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-05-03T14:23:21Z" level=info msg="Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"myproject", Name:"etcd-operator", UID:"887acaa2-4edd-11e8-9c89-025000000001", APIVersion:"v1", ResourceVersion:"1428", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' etcd-operator-7bf6b58cdf-j5sk2 became leader"
2018-05-03 14:23:27.078742 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
time="2018-05-03T14:23:27Z" level=info msg="creating cluster with Spec:" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="{" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="    "size": 3," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="    "repository": "quay.io/coreos/etcd"," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="    "version": "3.2.13"," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="    "pod": {" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="        "resources": {}," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="        "etcdEnv": [" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="            {" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="                "name": "ETCD_AUTO_COMPACTION_RETENTION"," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="                "value": "1"" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="            }" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="        ]" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="    }," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="    "TLS": {" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="        "static": {" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="            "member": {" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="                "peerSecret": "example-etcd-peer-tls"," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="                "serverSecret": "example-etcd-server-tls"" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="            }," cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="            "operatorSecret": "example-etcd-client-tls"" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="        }" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="    }" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="}" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="cluster created with seed member (example-etcd-tvglk9h5fk)" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:27Z" level=info msg="start running..." cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:35Z" level=info msg="skip reconciliation: running ([]), pending ([example-etcd-tvglk9h5fk])" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:43Z" level=info msg="Start reconciling" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:43Z" level=info msg="running members: example-etcd-tvglk9h5fk" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:43Z" level=info msg="cluster membership: example-etcd-tvglk9h5fk" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:43Z" level=info msg="added member (example-etcd-mf52q4mwlr)" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:43Z" level=info msg="Finish reconciling" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:51Z" level=info msg="Start reconciling" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:51Z" level=info msg="running members: example-etcd-tvglk9h5fk,example-etcd-mf52q4mwlr" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:51Z" level=info msg="cluster membership: example-etcd-tvglk9h5fk,example-etcd-mf52q4mwlr" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:51Z" level=info msg="Finish reconciling" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:51Z" level=error msg="failed to reconcile: fail to add new member (example-etcd-gs9vq5sjs5): etcdserver: unhealthy cluster" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:59Z" level=info msg="Start reconciling" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:59Z" level=info msg="running members: example-etcd-mf52q4mwlr,example-etcd-tvglk9h5fk" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:59Z" level=info msg="cluster membership: example-etcd-mf52q4mwlr,example-etcd-tvglk9h5fk" cluster-name=example-etcd pkg=cluster
time="2018-05-03T14:23:59Z" level=info msg="added member (example-etcd-8xsqs4nc8j)" cluster-name=example-etcd pkg=cluster

The vault-operator shows this:

Roberts-MacBook-Pro:Desktop rwipfel$ kubectl logs vault-operator-67d5846657-bcsd2
time="2018-05-03T14:23:25Z" level=info msg="Go Version: go1.9.2"
time="2018-05-03T14:23:25Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-05-03T14:23:25Z" level=info msg="vault-operator Version: 0.1.9"
time="2018-05-03T14:23:25Z" level=info msg="Git SHA: 43a1dd7"
ERROR: logging before flag.Parse: I0503 14:23:25.710514       1 leaderelection.go:174] attempting to acquire leader lease...
ERROR: logging before flag.Parse: I0503 14:23:25.724311       1 leaderelection.go:184] successfully acquired lease myproject/vault-operator
time="2018-05-03T14:23:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"myproject\", Name:\"vault-operator\", UID:\"8abd113d-4edd-11e8-9c89-025000000001\", APIVersion:\"v1\", ResourceVersion:\"1477\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' vault-operator-67d5846657-bcsd2 became leader"
time="2018-05-03T14:23:25Z" level=info msg="starting Vaults controller"
time="2018-05-03T14:23:25Z" level=info msg="Vault CR (myproject/example) is created"

I'm not sure where to look next.

(As a guess I tried creating custom TLS certificates per https://github.com/coreos/vault-operator/blob/master/doc/user/tls_setup.md but that made no difference)

I'd be grateful for any help, and willing to contribute once I learn more about how to operate these operators :)

Don't use deployment resource for vault Upgrade?

I think the Deployment resource is typically for stateless applications such as web servers, where there isn't any dependency between instances; hence upgrading via the Deployment's rolling upgrade is simple because the order of the upgrade doesn't matter.

However, for a cluster that has a specific ordering on how its instances are upgraded, the Deployment resource might not provide a simple and sufficient mechanism.

A Vault cluster belongs to the latter category. Even though the Deployment resource in the vault operator is able to provide the upgrade ordering through some clever tricks, it is probably worth thinking about how to manage a stateful application such as Vault without using a Deployment.

Support for VaultSecret CRD

It would be useful to have a VaultSecret CRD which only references a secret in Vault. The actual Vault secret would be provisioned by the operator as a Kubernetes Secret whenever a VaultSecret resource is created.

In this way, no secret data would need to be stored encrypted in git (e.g. for helm chart values using the helm-secrets plugin).
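A purely hypothetical sketch of what such a resource could look like (none of these fields exist today; the kind and field names are illustrative only):

apiVersion: "vault.security.coreos.com/v1alpha1"
kind: "VaultSecret"
metadata:
  name: "my-app-credentials"
spec:
  # hypothetical fields
  vaultPath: "secret/my-app"          # path of the secret in Vault
  targetSecretName: "my-app-creds"    # Kubernetes Secret the operator would create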

I am very interested to contribute this feature, but before starting any work, I would really appreciate your feedback.

cc @rawlingsj @jstrachan

Passing spec configuration to etcd operator

Hi,

How can I define custom options for the etcd cluster? Specifically, the etcd operator supports persistent volume claims, besides other things I might want to customize. Is there any way to pass this config through to the etcd operator?

Thank you,
Alexei Daniline

Wrong namespace in example/k8s_auth/vault-tokenreview-binding.yaml

It looks like the namespace on line 13 of vault-tokenreview-binding.yaml should be "default" instead of "vault-services". I followed the documentation steps to deploy vault-operator and configure k8s auth, but failed to authenticate. Changing the namespace resolved the issue.

On using an existing etcd cluster

Hi guys,

First of all, thank you so much for open sourcing the Vault operator. This is a great milestone for the community and the overall security story of Kubernetes!

It seems that the Vault operator leverages the automation provided by the etcd operator for Vault’s datastore, which is a great integration point for most people!

However, for those who deploy Vault as a base Kubernetes service, it would be beneficial to use the already existing and managed etcd cluster that backs the Kubernetes API, as it is often closely monitored (etcd metrics, node metrics, ...) and already has alerts, backups, a disaster recovery plan, etc.

Is supporting an existing secured etcd cluster something that you might consider adding to the operator in the short term?

Thanks again.
