
Comments (40)

tvoran commented on August 12, 2024

A lot of these issues sound like what we've seen happen with private GKE clusters, for example: hashicorp/vault-helm#214 (comment)

So if that matches your setup, please try adding a firewall rule to allow the master to access 8080 on the nodes: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules
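
For reference, on GKE that rule can be created with something like the following; the rule name, network, master CIDR, and node tag here are placeholders you'd replace with your cluster's values:

    # Look up your cluster's master CIDR and the nodes' network tag first.
    gcloud compute firewall-rules create allow-master-to-injector \
      --network my-vpc \
      --source-ranges 172.16.0.0/28 \
      --target-tags gke-my-cluster-node \
      --allow tcp:8080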

If it doesn't, then it would help to know where your k8s cluster is running and how it's configured. If the configurations are too varied we might need to break this up into separate issues for clarity. Cheers!

jasonodonnell commented on August 12, 2024

Hi @mateipopa, these are the correct builds. Release engineering had not yet completed the official build pipeline, so they are being built internally using the dev builds.

One thing you might investigate is the firewall rules on your GKE nodes. We've seen similar injection issues caused by 8080 being blocked: #46

pravargauba commented on August 12, 2024

A lot of these issues sound like what we've seen happen with private GKE clusters, for example: hashicorp/vault-helm#214 (comment)

So if that matches your setup, please try adding a firewall rule to allow the master to access 8080 on the nodes: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules

If it doesn't, then it would help to know where your k8s cluster is running and how it's configured. If the configurations are too varied we might need to break this up into separate issues for clarity. Cheers!

This worked like a charm!!! Thanks @tvoran

Kampe commented on August 12, 2024

Having this same issue with GKE; opening port 8080 to my apiserver did not do the trick for me.

mikemowgli commented on August 12, 2024

I have the same issue:

  • I am running Vault chart v0.4.0
  • I verified with "openssl s_client -servername vault-agent-injector-svc.vault.svc -connect vault-agent-injector-svc.vault.svc:443" that the vault-agent-injector exposes the custom certificate and intermediate CA I provided
  • I verified that the caBundle of the MutatingWebhookConfiguration is the CA that issued the intermediate CA (a quick way to check this is sketched after this list)
  • I have the same log line in my vault-agent-injector; I followed this guide: https://github.com/hashicorp/vault-guides/tree/bd7eaa007d9124f87549986b070bbe19315895bb/operations/provision-vault/kubernetes/minikube/vault-agent-sidecar
  • I checked the logs of the Kubernetes API server and the Controller while recreating alternately the MutatingWebhookConfiguration, the vault-agent-injector pod, and the consumer app pod, but there is nothing related to the agent injector.
  • I changed AGENT_INJECT_LOG_LEVEL to 'debug', with no effect
  • Even changing the MutatingWebhookConfiguration failurePolicy from 'Ignore' to 'Fail' didn't prevent the consumer app pod from being started.
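
A quick, rough way to do the caBundle check mentioned above, assuming the Helm chart's default names (vault-agent-injector-cfg, and the service in the vault namespace):

    # The CA the apiserver will trust when calling the webhook...
    kubectl get mutatingwebhookconfiguration vault-agent-injector-cfg \
      -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | \
      openssl x509 -noout -subject -issuer -dates
    # ...versus the certificate the injector actually serves:
    openssl s_client -servername vault-agent-injector-svc.vault.svc \
      -connect vault-agent-injector-svc.vault.svc:443 </dev/null 2>/dev/null | \
      openssl x509 -noout -subject -issuer -dates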

I wonder whether the problem is Kubernetes not contacting the webhook, or the webhook not contacting Vault.

How can I troubleshoot further?

mikemowgli commented on August 12, 2024

I found the issue: as I'm running on OpenShift 3.11 (Kubernetes 1.11), the API config had to be changed so it supports admission controllers.

    MutatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig
    ValidatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig

This block must be present in master-config.yml under the admissionConfig.pluginConfig section. After restarting the apiserver, the webhook started to kick in. But the sidecar was still not injected because of permission issues. Granting the consumer app's service account cluster-admin permissions or access to the privileged SCC (the OpenShift equivalent of a PSP) helped, but that also introduces other security issues.
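
For clarity, the full nesting in master-config.yml (typically /etc/origin/master/master-config.yaml on 3.11) is:

    admissionConfig:
      pluginConfig:
        MutatingAdmissionWebhook:
          configuration:
            apiVersion: v1
            disable: false
            kind: DefaultAdmissionConfig
        ValidatingAdmissionWebhook:
          configuration:
            apiVersion: v1
            disable: false
            kind: DefaultAdmissionConfig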

pksurferdad commented on August 12, 2024

Thanks for responding, @jasonodonnell. Well, the init logs were certainly helpful and led me to my problem: an incorrect secret path. Thanks so much, it's working now. I'm going to undo some of the AWS networking changes I made to see if they were even necessary.

vault-agent-init logs

2020-10-10T18:27:17.282Z [INFO]  sink.file: creating file sink
2020-10-10T18:27:17.282Z [INFO]  sink.file: file sink configured: path=/home/vault/.vault-token mode=-rw-r-----
2020-10-10T18:27:17.283Z [INFO]  auth.handler: starting auth handler
2020-10-10T18:27:17.283Z [INFO]  auth.handler: authenticating
2020-10-10T18:27:17.283Z [INFO]  template.server: starting template server
2020/10/10 18:27:17.283255 [INFO] (runner) creating new runner (dry: false, once: false)
2020-10-10T18:27:17.283Z [INFO]  sink.server: starting sink server
2020/10/10 18:27:17.283831 [WARN] (clients) disabling vault SSL verification
2020/10/10 18:27:17.283843 [INFO] (runner) creating watcher
2020-10-10T18:27:17.297Z [INFO]  auth.handler: authentication successful, sending token to sinks
2020-10-10T18:27:17.297Z [INFO]  auth.handler: starting renewal process
2020-10-10T18:27:17.297Z [INFO]  sink.file: token written: path=/home/vault/.vault-token
2020-10-10T18:27:17.297Z [INFO]  sink.server: sink server stopped
2020-10-10T18:27:17.297Z [INFO]  sinks finished, exiting
2020-10-10T18:27:17.297Z [INFO]  template.server: template server received new token
2020/10/10 18:27:17.297652 [INFO] (runner) stopping
2020/10/10 18:27:17.297677 [INFO] (runner) creating new runner (dry: false, once: false)
2020/10/10 18:27:17.297800 [WARN] (clients) disabling vault SSL verification
2020/10/10 18:27:17.297825 [INFO] (runner) creating watcher
2020/10/10 18:27:17.297863 [INFO] (runner) starting
2020-10-10T18:27:17.306Z [INFO]  auth.handler: renewed auth token
2020/10/10 18:27:17.314963 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 1 after "250ms")
2020/10/10 18:27:17.572730 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 2 after "500ms")
2020/10/10 18:27:18.080373 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 3 after "1s")
2020/10/10 18:27:19.088366 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 4 after "2s")
2020/10/10 18:27:21.096020 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 5 after "4s")
2020/10/10 18:27:25.104668 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 6 after "8s")
2020/10/10 18:27:33.112358 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 7 after "16s")
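
(Worth noting for anyone who lands on the same "no secret exists" retries: if the mount is KV version 2, the raw read path includes a data/ segment, so a template path like secrets/dev/poc-secret must be written as secrets/data/dev/poc-secret. A quick sanity check with this example's paths:)

    vault kv get secrets/dev/poc-secret       # the kv helper adds data/ on v2 mounts
    vault read secrets/data/dev/poc-secret    # the raw path a v2 template read resolves to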

agates4 commented on August 12, 2024

Alright folks -

I codified in Terraform the entire process of getting a sidecar injector working, even on external clusters.
sethvargo/vault-on-gke#98

^ Fully documented in this PR 👍

I hope this helps someone! It took me quite a bit of diving in to get this fully working out of the box!

jasonodonnell commented on August 12, 2024

Hi @ky0shiro, based on the logs you sent, it seems like the request never made it to your injector. You would get a log entry looking like this:

2020-01-06T15:10:18.658Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s

Can you provide the following:

kubectl describe service vault-agent-injector-svc
kubectl describe mutatingwebhookconfigurations vault-agent-injector-cfg

ky0shiro commented on August 12, 2024

@jasonodonnell :
service:

Name:              vault-agent-injector-svc
Namespace:         my-namespace
Labels:            app.kubernetes.io/instance=vault
                   app.kubernetes.io/managed-by=Tiller
                   app.kubernetes.io/name=vault-agent-injector
Annotations:       flux.weave.works/antecedent: my-namespace:helmrelease/vault
Selector:          app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault-agent-injector,component=webhook
Type:              ClusterIP
IP:                10.210.4.175
Port:              <unset>  443/TCP
TargetPort:        8080/TCP
Endpoints:         10.16.0.198:8080
Session Affinity:  None
Events:            <none>

mutatingwebhookconfigurations:

Name:         vault-agent-injector-cfg
Namespace:    
Labels:       app.kubernetes.io/instance=vault
              app.kubernetes.io/managed-by=Tiller
              app.kubernetes.io/name=vault-agent-injector
Annotations:  flux.weave.works/antecedent: my-namespace:helmrelease/vault
API Version:  admissionregistration.k8s.io/v1beta1
Kind:         MutatingWebhookConfiguration
Metadata:
  Creation Timestamp:  2020-01-06T13:55:54Z
  Generation:          2
  Resource Version:    56445806
  Self Link:           /apis/admissionregistration.k8s.io/v1beta1/mutatingwebhookconfigurations/vault-agent-injector-cfg
  UID:                 4195285e-308c-11ea-8917-4201ac10000a
Webhooks:
  Client Config:
    Ca Bundle:  << REDACTED >>
    Service:
      Name:        vault-agent-injector-svc
      Namespace:   my-namespace
      Path:        /mutate
  Failure Policy:  Ignore
  Name:            vault.hashicorp.com
  Namespace Selector:
  Rules:
    API Groups:
      
    API Versions:
      v1
    Operations:
      CREATE
      UPDATE
    Resources:
      pods
  Side Effects:  Unknown
Events:          <none>

jasonodonnell commented on August 12, 2024

What version of Kube are you using?

Are you using a managed Kube service such as GKE/EKS or did you deploy your own?

ky0shiro commented on August 12, 2024

The version is 1.13.11 and it is GKE:

Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+70132b0f13", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}

jasonodonnell commented on August 12, 2024

@ky0shiro Interesting, our acceptance testing runs on a GKE cluster and is working fine. What you showed me looks correct but the request doesn't seem to make it to the injector. Do you have access to the Kube apiserver logs? I wonder if an error can be found there when Kube tries to contact the webhook.

jasonodonnell commented on August 12, 2024

@ky0shiro Can you also provide me the output of the following command by execing into the Vault Injector container?

cat /etc/resolv.conf

ky0shiro commented on August 12, 2024

@jasonodonnell
Here is /etc/resolv.conf

nameserver 10.210.0.10
search my-namespace.svc.cluster.local svc.cluster.local cluster.local c.my-project.internal google.internal
options ndots:5

ky0shiro commented on August 12, 2024

@jasonodonnell logs from /logs/kube-apiserver.log (only the lines containing the word "injector"):
injector.txt

popamatei commented on August 12, 2024

I've upgraded my vault helm chart from 0.2.1 to 0.3.3 on a GKE cluster - everything was working fine before since I was using the vault-agent and consul-template sidecar containers to render the secrets on the pod.

Now that I've upgraded, I can't get the vault-k8s to work. Is there any chance we're somehow ending up with a development version of hashicorp/vault-k8s:v0.1.2?

I'm in the exact same situation as ky0shiro, and looking at the Dockerfiles and the vault-server-agent-injector container I end up with, it seems to be running the development version:

$ ps xa
PID   USER     TIME  COMMAND
    1 vault     0:14 /vault-k8s agent-inject 2>&1
   62 vault     0:00 sh
   74 vault     0:00 ps xa
/ $
$ /vault-k8s --version
0.1.2-dev
/ $ curl
sh: curl: not found
/ $ 

Is this the way it should be? Maybe we're not seeing any connections on the vault-server-agent-injector because we're missing the certs altogether and the API server can't actually connect.

Thanks

popamatei commented on August 12, 2024

Hello @jasonodonnell, thanks for pointing me in the right direction. Although I didn't get any errors in stackdriver, connections were indeed blocked by the firewall. Adding a rule to allow traffic from the master to the worker nodes solved the problem and requests now reach the injector. Thanks again!

Kampe commented on August 12, 2024

Ensure all the components of the vault-injector are installed in the same namespace where you're looking to retrieve your secrets.

mikemowgli commented on August 12, 2024

@Kampe, this would mean that in every namespace I'd like to fetch secrets from, I'd have to deploy new vault-injector components. I can test your suggestion, but it can't be the officially recommended solution, right?

Kampe commented on August 12, 2024

Check out #15 (comment)

pravargauba commented on August 12, 2024

@ky0shiro @jasonodonnell any luck yet? I am facing exactly the same problem @ky0shiro described and have double-checked everything. After patching the annotations onto the deployment, the new pod again came up with only 1 container, when I was hoping for 2 (the second being the Vault sidecar). Is this problem specific to certain Vault charts?
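
(For concreteness, the patch was along these lines; the deployment and role names are placeholders:)

    kubectl patch deployment my-app --type merge -p \
      '{"spec":{"template":{"metadata":{"annotations":{"vault.hashicorp.com/agent-inject":"true","vault.hashicorp.com/role":"my-role"}}}}}'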

jdbohrman commented on August 12, 2024

I'm experiencing this as well. Like @Kampe, I updated the firewall to no avail. I'm getting logs almost exactly like @ky0shiro's.

I'm on GKE as well... and I'm beginning to see a pattern.

h0x91b-wix commented on August 12, 2024

My 2 cents:

It happens to me when GKE replaces a node (upgrade/maintenance), and in my case the cluster is public.

h0x91b-wix commented on August 12, 2024

Same situation right now:

❯ kubectl get nodes
NAME                                          STATUS   ROLES    AGE   VERSION
gke-ero-cluster-ero-node-pool-2cf384d7-79b3   Ready    <none>   62m   v1.16.8-gke.15
gke-ero-cluster-ero-node-pool-6430b315-un3c   Ready    <none>   20h   v1.16.8-gke.15
gke-ero-cluster-ero-node-pool-b00f5513-o3c7   Ready    <none>   21h   v1.16.8-gke.15

Google replaced one of the nodes 62 minutes ago, and then:

❯ kubectl get pods
NAME                                   READY   STATUS             RESTARTS   AGE
ero-app-d85b548c4-bfk9s                2/2     Running            0          21h
ero-app-d85b548c4-df864                0/1     CrashLoopBackOff   25         62m
ero-app-d85b548c4-jv9b5                0/1     CrashLoopBackOff   25         62m
ero-app-d85b548c4-nr7b4                0/1     CrashLoopBackOff   25         62m
ero-app-d85b548c4-q5q4j                0/1     CrashLoopBackOff   25         62m
ero-app-d85b548c4-x4zbj                2/2     Running            0          21h

To recover, I need to scale the deployment down to 2 and then back up to 6; this happens on every Kubernetes node replacement.
This bug happens almost every day; tell me if you want me to run something the next time it occurs...
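
(Roughly, using the deployment from the pod list above:)

    kubectl scale deployment ero-app --replicas=2
    kubectl scale deployment ero-app --replicas=6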

h0x91b-wix commented on August 12, 2024

The reason for this behaviour: hashicorp/vault-helm#238

The vault-agent-injector was also recreated, and all rescheduled pods come back without the Vault container inside.

rchenzheng commented on August 12, 2024

I'm using the latest version of the vault-helm chart, 0.6.0, and this issue still seems to be happening (Kubernetes v1.15.11-gke.5).

However, unlike @ky0shiro, I am seeing the handler requests, since I whitelisted 8080:

2020-06-22T18:47:00.891Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
2020-06-22T18:47:00.893Z [DEBUG] handler: checking if should inject agent..

rchenzheng commented on August 12, 2024

Looks like I had to put the annotations in the right spot.

Annotations: https://www.vaultproject.io/docs/platform/k8s/injector/examples
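
For anyone else hitting this: the injector mutates pods, so the annotations belong on the Deployment's pod template, not on the Deployment's own metadata. A minimal sketch with placeholder names:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
          annotations:                               # pod template, not Deployment metadata
            vault.hashicorp.com/agent-inject: "true"
            vault.hashicorp.com/role: "my-role"
        spec:
          containers:
            - name: app
              image: my-app:latest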

HariNarayananMohan commented on August 12, 2024

I found the issue: as I'm running on OpenShift 3.11 (Kubernetes 1.11), the API config had to be changed so it supports admission controllers.

    MutatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig
    ValidatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig

This block must be present in the master-config.yml in the section admissionConfig.pluginConfig. After restarting the apiserver, the webhook started to kick in. But the sidecar was still not injected, because of some permission issues. Granting the consumer app's service account cluster-admin permissions or access to the privileged SCC (equivalent of PSP) helped, but then also introduces other security issues.

@mikemowgli Thanks for the info. I added that block of lines to master-config.yaml, but my OpenShift cluster's logs say it is not enabled. Can you tell me how you enabled it?

I0719 07:28:30.711521       1 plugins.go:84] Registered admission plugin "MutatingAdmissionWebhook"
I0719 07:28:31.408404       1 plugins.go:84] Registered admission plugin "MutatingAdmissionWebhook"
I0719 07:28:32.187811       1 register.go:151] Admission plugin MutatingAdmissionWebhook is not enabled.  It will not be started.
I0719 07:28:32.361736       1 plugins.go:84] Registered admission plugin "MutatingAdmissionWebhook"

HariNarayananMohan commented on August 12, 2024

I was able to make it work; I had to restart a few services on the master node after making these changes. I followed this link:
https://access.redhat.com/solutions/3869391
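
For OpenShift 3.11's static-pod control plane, the restarts from that solution boil down to running, on each master:

    master-restart api
    master-restart controllers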

bienkma commented on August 12, 2024

Opening ports 8080 and 443 in your VPC's firewall resolves the problem. It worked fine for me.

pksurferdad commented on August 12, 2024

I've been fighting this same issue for several days as well: the agent injector does not finish initializing and the container does not start. I'm running on an AWS EKS cluster, so if it's a port issue between the control plane and the nodes, does anyone know how to enable 8080 on AWS EKS?

pksurferdad commented on August 12, 2024

Following these instructions from AWS, I added an inbound rule to the node security group to allow all TCP traffic on ports 0-65535 from the control plane security group, but no luck with the sample deployment initializing. Below is some log data from Vault and the Vault injector, as well as a kubectl describe of the sample deployment. I could definitely use some guidance on how to troubleshoot this further.
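
(The CLI equivalent of that rule, with placeholder security-group IDs — the first being the nodes' SG and the second the control plane's:)

    aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol tcp --port 0-65535 \
      --source-group sg-0fedcba9876543210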

Vault Logs

identity: creating a new entity: alias="id:"892e70b6-508d-6b11-0fe7-4e3d273cb868" canonical_id:"fc7332af-4001-bc99-9252-eaebaa41b826" mount_type:"kubernetes" mount_accessor:"auth_kubernetes_edb6b310" mount_path:"auth/kubernetes/" metadata:{key:"service_account_name" value:"vault-auth"} metadata:{key:"service_account_namespace" value:"default"} metadata:{key:"service_account_secret_name" value:"vault-auth-token-jl7h4"} metadata:{key:"service_account_uid" value:"1a40077e-f3ae-4953-bc6d-9f742d0278d2"} name:"1a40077e-f3ae-4953-bc6d-9f742d0278d2" creation_time:{seconds:1602346759 nanos:237072421} last_update_time:{seconds:1602346759 nanos:237072421} namespace_id:"root""

Injector Logs

Registering telemetry path on "/metrics"
2020-10-10T16:18:12.311Z [INFO]  handler: Starting handler..
Listening on ":8080"...
Updated certificate bundle received. Updating certs...
2020-10-10T16:19:14.074Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
2020-10-10T16:19:14.076Z [DEBUG] handler: checking if should inject agent..
2020-10-10T16:19:14.076Z [DEBUG] handler: checking namespaces..
2020-10-10T16:19:14.076Z [DEBUG] handler: setting default annotations..
2020-10-10T16:19:14.077Z [DEBUG] handler: creating new agent..
2020-10-10T16:19:14.077Z [DEBUG] handler: validating agent configuration..
2020-10-10T16:19:14.077Z [DEBUG] handler: creating patches for the pod..

kubectl describe from the sample deployment

Name:           app-d6d9b9755-2l856
Namespace:      default
Priority:       0
Node:           ip-192-168-26-157.ec2.internal/192.168.26.157
Start Time:     Sat, 10 Oct 2020 11:19:14 -0500
Labels:         app=vault-agent-demo
                pod-template-hash=d6d9b9755
Annotations:    kubernetes.io/psp: eks.privileged
                vault.hashicorp.com/agent-inject: true
                vault.hashicorp.com/agent-inject-secret-poc-secret: secrets/dev/poc-secret
                vault.hashicorp.com/agent-inject-status: injected
                vault.hashicorp.com/role: app-user
                vault.hashicorp.com/tls-skip-verify: true
Status:         Pending
IP:             192.168.9.35
Controlled By:  ReplicaSet/app-d6d9b9755
Init Containers:
  vault-agent-init:
    Container ID:  docker://6ab5f0688f5dea6a416fa5ad8fc5395675ebba37ea1f54a1b4f7e1b56d4cb768
    Image:         vault:1.5.2
    Image ID:      docker-pullable://vault@sha256:9aa46d9d9987562013bfadce166570e1705de619c9ae543be7c61953f3229923
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -ec
    Args:
      echo ${VAULT_CONFIG?} | base64 -d > /home/vault/config.json && vault agent -config=/home/vault/config.json
    State:          Running
      Started:      Sat, 10 Oct 2020 11:19:19 -0500
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    Environment:
      VAULT_LOG_LEVEL:  info
      VAULT_CONFIG:     eyJhdXRvX2F1dGgiOnsibWV0aG9kIjp7InR5cGUiOiJrdWJlcm5ldGVzIiwibW91bnRfcGF0aCI6ImF1dGgva3ViZXJuZXRlcyIsImNvbmZpZyI6eyJyb2xlIjoiYXBwLXVzZXIifX0sInNpbmsiOlt7InR5cGUiOiJmaWxlIiwiY29uZmlnIjp7InBhdGgiOiIvaG9tZS92YXVsdC8udmF1bHQtdG9rZW4ifX1dfSwiZXhpdF9hZnRlcl9hdXRoIjp0cnVlLCJwaWRfZmlsZSI6Ii9ob21lL3ZhdWx0Ly5waWQiLCJ2YXVsdCI6eyJhZGRyZXNzIjoiaHR0cHM6Ly92YXVsdC52YXVsdC5zdmM6ODIwMCIsInRsc19za2lwX3ZlcmlmeSI6dHJ1ZX0sInRlbXBsYXRlIjpbeyJkZXN0aW5hdGlvbiI6Ii92YXVsdC9zZWNyZXRzL3BvYy1zZWNyZXQiLCJjb250ZW50cyI6Int7IHdpdGggc2VjcmV0IFwic2VjcmV0cy9kZXYvcG9jLXNlY3JldFwiIH19e3sgcmFuZ2UgJGssICR2IDo9IC5EYXRhIH19e3sgJGsgfX06IHt7ICR2IH19XG57eyBlbmQgfX17eyBlbmQgfX0iLCJsZWZ0X2RlbGltaXRlciI6Int7IiwicmlnaHRfZGVsaW1pdGVyIjoifX0ifV19
    Mounts:
      /home/vault from home-init (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vault-auth-token-jl7h4 (ro)
      /vault/secrets from vault-secrets (rw)
Containers:
  app:
    Container ID:   
    Image:          jweissig/app:0.0.1
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from vault-auth-token-jl7h4 (ro)
      /vault/secrets from vault-secrets (rw)
  vault-agent:
    Container ID:  
    Image:         vault:1.5.2
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -ec
    Args:
      echo ${VAULT_CONFIG?} | base64 -d > /home/vault/config.json && vault agent -config=/home/vault/config.json
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    Environment:
      VAULT_LOG_LEVEL:  info
      VAULT_CONFIG:     eyJhdXRvX2F1dGgiOnsibWV0aG9kIjp7InR5cGUiOiJrdWJlcm5ldGVzIiwibW91bnRfcGF0aCI6ImF1dGgva3ViZXJuZXRlcyIsImNvbmZpZyI6eyJyb2xlIjoiYXBwLXVzZXIifX0sInNpbmsiOlt7InR5cGUiOiJmaWxlIiwiY29uZmlnIjp7InBhdGgiOiIvaG9tZS92YXVsdC8udmF1bHQtdG9rZW4ifX1dfSwiZXhpdF9hZnRlcl9hdXRoIjpmYWxzZSwicGlkX2ZpbGUiOiIvaG9tZS92YXVsdC8ucGlkIiwidmF1bHQiOnsiYWRkcmVzcyI6Imh0dHBzOi8vdmF1bHQudmF1bHQuc3ZjOjgyMDAiLCJ0bHNfc2tpcF92ZXJpZnkiOnRydWV9LCJ0ZW1wbGF0ZSI6W3siZGVzdGluYXRpb24iOiIvdmF1bHQvc2VjcmV0cy9wb2Mtc2VjcmV0IiwiY29udGVudHMiOiJ7eyB3aXRoIHNlY3JldCBcInNlY3JldHMvZGV2L3BvYy1zZWNyZXRcIiB9fXt7IHJhbmdlICRrLCAkdiA6PSAuRGF0YSB9fXt7ICRrIH19OiB7eyAkdiB9fVxue3sgZW5kIH19e3sgZW5kIH19IiwibGVmdF9kZWxpbWl0ZXIiOiJ7eyIsInJpZ2h0X2RlbGltaXRlciI6In19In1dfQ==
    Mounts:
      /home/vault from home-sidecar (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vault-auth-token-jl7h4 (ro)
      /vault/secrets from vault-secrets (rw)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  vault-auth-token-jl7h4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vault-auth-token-jl7h4
    Optional:    false
  home-init:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  home-sidecar:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  vault-secrets:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      Memory
    SizeLimit:   <unset>
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From                                     Message
  ----    ------     ----       ----                                     -------
  Normal  Scheduled  <unknown>  default-scheduler                        Successfully assigned default/app-d6d9b9755-2l856 to ip-192-168-26-157.ec2.internal
  Normal  Pulling    12m        kubelet, ip-192-168-26-157.ec2.internal  Pulling image "vault:1.5.2"
  Normal  Pulled     12m        kubelet, ip-192-168-26-157.ec2.internal  Successfully pulled image "vault:1.5.2"
  Normal  Created    12m        kubelet, ip-192-168-26-157.ec2.internal  Created container vault-agent-init
  Normal  Started    12m        kubelet, ip-192-168-26-157.ec2.internal  Started container vault-agent-init

jasonodonnell commented on August 12, 2024

@pksurferdad This all looks good; it did indeed inject. You need to check the vault-agent-init logs to see what's wrong (likely permissions on your Vault role):

kubectl logs <your app pod> -c vault-agent-init

I see you're trying to get a KV secret. Which version of KV is this (1 or 2)? If you're not sure provide the output from:

vault secrets list -detailed

Additionally you should provide the policy that you're attaching to app-user so I can verify you have the correct permissions.

If you're getting login, permission denied errors, there could be something wrong from Vault's end (like the K8s auth method wasn't configured correctly). Please provide the Vault server logs.
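
For reference, a minimal policy for this example's secret, assuming a KV version 2 mount at secrets/ (the data/ segment is required on v2):

    vault policy write app-user - <<EOH
    path "secrets/data/dev/poc-secret" {
      capabilities = ["read"]
    }
    EOH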

pksurferdad commented on August 12, 2024

I also confirmed that the AWS security group changes I made here (#32 (comment)) were not necessary. It looks like the default AWS EKS cluster deployment using eksctl doesn't require any additional inbound or outbound security group rules.

aRestless commented on August 12, 2024

I found the issue: as I'm running on OpenShift 3.11 (Kubernetes 1.11), the API config had to be changed so it supports admission controllers.

    MutatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig
    ValidatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig

This block must be present in the master-config.yml in the section admissionConfig.pluginConfig. After restarting the apiserver, the webhook started to kick in. But the sidecar was still not injected, because of some permission issues. Granting the consumer app's service account cluster-admin permissions or access to the privileged SCC (equivalent of PSP) helped, but then also introduces other security issues.

I too am running OpenShift 3.11. The resulting error @mikemowgli hinted at, which comes up if the privileged SCC isn't set, is: Error creating: pods "<podname>" is forbidden: unable to validate against any pod security policy: [].

Adding the privileged SCC to the pod's service account worked for me, but that's not an option for production. The cluster-admin permission implies the privileged SCC, which is why adding that role also works.

Upon further investigation I'm convinced this relates to kubernetes/kubernetes#65716 and may have changed in newer Kubernetes versions. The way I understand it, multiple admission hooks are called before Kubernetes spins up the pod, and the last in that row is the hook that checks against the Pod Security Policy, i.e. it answers the question: "is this pod allowed to run in this configuration, which may have been altered by the other hooks?"

Apparently on OpenShift 3.11 / Kubernetes 1.11, while or after executing the MutatingAdmissionWebhook, the securityContext of the resulting pod is lost or not available. This also explains why the list of pod security policies is empty ([]) in the error message.

Knowing from @mikemowgli's answer that it could be fixed with an SCC, I played around and found that, to avoid the error, the requiredDropCapabilities property in the SCC must be empty. It is not a specific entry in the list that makes the check fail; I think that if there is any entry in the list, a check is executed that is then missing the aforementioned context.

I was able to copy the restricted SCC, set requiredDropCapabilities: [], assign the SCC to my pod's service account and the pod with the injector came up.
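
In oc terms that was roughly the following; the SCC name, service account, and namespace are placeholders:

    oc get scc restricted -o yaml > injector-scc.yaml
    # edit injector-scc.yaml: give it a new name and set requiredDropCapabilities: []
    oc create -f injector-scc.yaml
    oc adm policy add-scc-to-user injector-scc -z my-app -n my-namespace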

This is not as bad as assigning the privileged SCC, but it certainly has its security implications and I'm not sure yet if that's okay for production. The capabilities being dropped by default are SET_UID, SET_GID, MKNOD, and KILL.

If anyone could shed more light on this that would be great. Otherwise there's probably nothing else left than upgrading to OpenShift 4.x to use the vault injector.

agates4 commented on August 12, 2024

Hey @jasonodonnell, I see you are a great resource for updating configurations.

Intention

I am working on getting all of the Vault setup steps (all that is left is to include the injector) fully automated with Terraform.
sethvargo/vault-on-gke#98

Problem

The above PR shows my changes to getting the vault-injector up and running via this terraform project.

I added this firewall rule: https://github.com/sethvargo/vault-on-gke/pull/98/files#diff-833c22bd299aef6aabfe1b427e9ee5f6fe6ca27f9f54ef81f2fb9fb32a5ddb8dR389-R406
which allows mutating requests to come into the sidecar injector:

2021-08-13T15:33:42.096Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-13T15:33:42.105Z [DEBUG] handler: checking if should inject agent..
2021-08-13T15:33:42.106Z [DEBUG] handler: checking namespaces..
2021-08-13T15:33:42.106Z [DEBUG] handler: setting default annotations..
2021-08-13T15:33:42.106Z [DEBUG] handler: creating new agent..
2021-08-13T15:33:42.107Z [DEBUG] handler: validating agent configuration..
2021-08-13T15:33:42.107Z [DEBUG] handler: creating patches for the pod..

However, no patches were applied to the pod:

Annotations:  cni.projectcalico.org/podIP: 10.0.94.28/32
              cni.projectcalico.org/podIPs: 10.0.94.28/32
              vault.hashicorp.com/agent-inject: true
              vault.hashicorp.com/agent-inject-secret-foo: secret/foo
              vault.hashicorp.com/role: internal-app

^ the pod still has the same annotations, no additional annotations added, and no secrets injected.

To replicate

On this PR, sethvargo/vault-on-gke#98: clone it locally and run the README instructions.

Then, run these CLI commands (after the README's export-environment-variables instructions):

# enable secrets, add a secret, write a new policy
vault secrets enable -path=secret -version=2 kv
vault kv put secret/foo a=b
vault policy write internal-app - <<EOH
path "secret/*" {
  capabilities = ["read"]
}
EOH

# get into the vault container
gcloud container clusters get-credentials vault --region us-central1
kubectl exec -n vault -it vault-0 --container vault /bin/sh

-- inside container --
# enable service to service auth via kubernetes
export VAULT_TOKEN="put in master token"
vault auth enable kubernetes
vault write auth/kubernetes/config \
    token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
    kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
    kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
--

# add a specific role
vault write auth/kubernetes/role/internal-app \
    bound_service_account_names=internal-app \
    bound_service_account_namespaces=vault \
    policies=internal-app \
    ttl=24h

and then, I simply deploy this helm app which defines the annotations
https://github.com/agates4/sample-vault-helm-template
inside the repo,
helm install python-service .

then I check all the logs and see the problem I first described above ^

Hypothesis

  1. maybe the vault address I listed for the vault injector app is wrong
  2. maybe the vault injector is blocked by firewall rules to the vault app
  3. random config mess ups?

My ask

@jasonodonnell, do you think you can help me update this Terraform project so it works fully out of the box? Can you point me in the right direction?

Thank you!

agates4 commented on August 12, 2024

UPDATE

The problem was the MutatingWebhookConfiguration I created in terraform was using

admission_review_versions = ["v1", "v1beta"]

when it should be using

admission_review_versions = ["v1beta1"]

thanks to
https://githubmemory.com/repo/hashicorp/vault-k8s/issues?cursor=Y3Vyc29yOnYyOpK5MjAyMS0wNS0yMVQxNjozNjoyOSswODowMM41g65h&pagination=next&page=2

now the sidecar injects a vault-init container within the deployed python-starter pod.

however I am getting this error on the init container:

error authenticating: error="context deadline exceeded" backoff=1s

This error means the init container is stuck forever, and the python-starter app is never fully deployed or ready...

Diving into this...

pcgeek86 commented on August 12, 2024

I'm having the same symptom; however, in my case the MutatingWebhookConfiguration resource never gets created by the Helm chart release.

PS > kubectl get mutatingwebhookconfigurations
NAME                                    WEBHOOKS   AGE
linkerd-proxy-injector-webhook-config   1          46d
linkerd-tap-injector-webhook-config     1          46d
webhook.pipeline.tekton.dev             1          6d20h

As you can see from the above output, I ran kubectl from PowerShell, and there is no mutating webhook for Vault, even though I installed it with the Helm Chart.

EDIT: The issue, at least in my case, was that I had installed the Vault Helm Chart in different namespaces, and had deleted one of them. That caused the MutatingWebhookConfiguration to be deleted, even though I still had a valid Helm release in a different namespace.

tomhjp commented on August 12, 2024

I'm going to close this as it seems the original issue is resolved. Please feel free to post in our discuss forum if anyone is still having issues debugging their deployment: https://discuss.hashicorp.com/c/vault/30
