
kubergrunt

kubergrunt is a standalone Go binary with a collection of commands that attempt to fill in the gaps between Terraform, Helm, and kubectl for managing a Kubernetes cluster.

Some of the features of kubergrunt include:

  • Configuring kubectl to authenticate with a given EKS cluster. Learn more about authenticating kubectl to EKS in our production deployment guide.
  • Managing Helm and associated TLS certificates on any Kubernetes cluster.
  • Setting up Helm client with TLS certificates on any Kubernetes cluster.
  • Generating TLS certificate key pairs and storing them as Kubernetes Secrets on any Kubernetes cluster.

Installation

The binaries are built as part of the CI pipeline on each release of the package and are attached to the corresponding release on the Releases Page. You can download the binary for your platform from there.

Alternatively, you can install kubergrunt using the Gruntwork Installer. For example, to install version v0.5.13:

gruntwork-install --binary-name "kubergrunt" --repo "https://github.com/gruntwork-io/kubergrunt" --tag "v0.5.13"

3rd party package managers

Note that third-party Kubergrunt packages may not be updated with the latest version, but are often close. Please check your version against the latest available on the Releases Page.

Chocolatey (Windows):

choco install kubergrunt

AWS CLI authentication

You need to authenticate with the AWS CLI before you can run kubergrunt or eksctl commands; please see our authentication guide.
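As a quick sanity check that your credentials are set up, you can, for example, print the identity the AWS CLI is currently using:

# Prints the account, user ID, and ARN of the active credentials
aws sts get-caller-identity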

Building from source

The main package is in cmd. To build the binary, you can run:

go build -o bin/kubergrunt ./cmd

If you need to set a version on the binary (so that kubergrunt --version works), you can use ldflags to set the version string on the compiled binary:

go build -o kubergrunt -ldflags "-X main.VERSION=v0.7.6 -extldflags '-static'" ./cmd

Commands

The following commands are available as part of kubergrunt:

  1. eks
  2. k8s
  3. tls
  4. Deprecated commands

eks

The eks subcommand of kubergrunt is used to set up the operator machine to interact with a Kubernetes cluster running on EKS.

verify

This subcommand verifies that the specified EKS cluster is up and ready. An EKS cluster is considered ready when:

  • The cluster status reaches ACTIVE state.
  • The cluster Kubernetes API server endpoint responds to HTTP requests.

When passed --wait, this command will wait until the EKS cluster reaches the ready state, or it times out. The timeout parameters are configurable with the --max-retries and --sleep-between-retries options, where --max-retries specifies the number of times the command will try to verify a specific condition before giving up, and --sleep-between-retries specifies the duration of time (e.g., 10m = 10 minutes) to wait between each try. So, for example, if you ran the command:

kubergrunt eks verify --eks-cluster-arn $EKS_CLUSTER_ARN --wait --max-retries 10 --sleep-between-retries 15s

and the cluster was not yet active, this command would query the AWS API up to 10 times, waiting 15 seconds between each try, for a total of 150 seconds (2.5 minutes) before timing out.

Run kubergrunt eks verify --help to see all the available options.

Similar Commands:

  • AWS CLI (aws eks wait): This command will wait until the EKS cluster reaches the ACTIVE state. Note that oftentimes the Kubernetes API endpoint has a delay in accepting traffic even after reaching the ACTIVE state. We have observed it take up to 1.5 minutes after the cluster becomes ACTIVE before we can have a valid TCP connection with the Kubernetes API endpoint.
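For reference, the AWS CLI command mentioned above might be invoked like this (the cluster name is a placeholder):

# Blocks until the cluster status reaches ACTIVE (but not until the API endpoint accepts traffic)
aws eks wait cluster-active --name my-eks-cluster --region us-east-2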

configure

This subcommand will set up the installed kubectl with config contexts that allow it to authenticate to a specified EKS cluster by leveraging the kubergrunt eks token command. It is designed to be used as part of one of the modules in the package, although it can also be run as a standalone binary. For example, it might be used to set up a new operator machine to talk to an existing EKS cluster.

For example, to set up a kubectl installation on an operator machine to authenticate with EKS:

kubergrunt eks configure --eks-cluster-arn $EKS_CLUSTER_ARN

Run kubergrunt eks configure --help to see all the available options.

Similar Commands:

  • AWS CLI (aws eks update-kubeconfig): This command will configure kubeconfig in a similar manner. Instead of using kubergrunt eks token, this version will use the get-token subcommand built into the AWS CLI.
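For comparison, a typical invocation of the AWS CLI equivalent might look like this (names are placeholders):

# Writes or updates a context for the cluster in your kubeconfig using the AWS CLI's get-token exec plugin
aws eks update-kubeconfig --name my-eks-cluster --region us-east-2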

token

This subcommand is used by kubectl to retrieve an authentication token using the AWS API authenticated with IAM credentials. This token is then used to authenticate to the Kubernetes cluster. This command embeds the aws-iam-authenticator tool into kubergrunt so that operators don't have to install a separate tool to manage authentication into Kubernetes.

The configure subcommand of kubergrunt eks assumes you will be using this method to authenticate with the Kubernetes cluster provided by EKS. If you wish to use aws-iam-authenticator instead, replace the auth info clause of the kubectl config context.
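As a rough sketch (not something kubergrunt generates for you), a kubeconfig user entry that delegates to aws-iam-authenticator might look like the following, where my-eks-cluster is a placeholder cluster name:

# Sketch of a kubeconfig "users" entry using aws-iam-authenticator via exec auth
users:
- name: my-eks-cluster
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws-iam-authenticator
      args:
        - token
        - -i
        - my-eks-cluster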

This subcommand also supports outputting the token in a format that is consumable by terraform as an external data source when you pass in the --as-tf-data CLI arg. You can then pass the token directly into the kubernetes provider configuration. For example:

# NOTE: Terraform does not allow you to interpolate resources in a provider config. We work around this by using the
# template_file data source as a means to compute the resource interpolations.
provider "kubernetes" {
  load_config_file       = false
  host                   = "${data.template_file.kubernetes_cluster_endpoint.rendered}"
  cluster_ca_certificate = "${base64decode(data.template_file.kubernetes_cluster_ca.rendered)}"
  token                  = "${lookup(data.external.kubernetes_token.result, "token_data")}"
}

data "template_file" "kubernetes_cluster_endpoint" {
  template = "${module.eks_cluster.eks_cluster_endpoint}"
}

data "template_file" "kubernetes_cluster_ca" {
  template = "${module.eks_cluster.eks_cluster_certificate_authority}"
}

data "external" "kubernetes_token" {
  program = ["kubergrunt", "--loglevel", "error", "eks", "token", "--as-tf-data", "--cluster-id", "${module.eks_cluster.eks_cluster_name}"]
}

This will configure the kubernetes provider in Terraform without setting up kubeconfig, allowing you to do everything in Terraform without side effects to your local machine.

Similar Commands:

  • AWS CLI (aws eks get-token): This command will do the same thing, but does not provide any specific optimizations for terraform.
  • Terraform aws_eks_cluster_auth data source: This data source can be used to retrieve a temporary auth token for EKS in Terraform (see the sketch after this list). This can only be used in Terraform.
  • aws-iam-authenticator: This is a standalone binary that can be used to fetch a temporary auth token.
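A minimal sketch of the aws_eks_cluster_auth alternative, written in Terraform 0.12+ syntax and assuming an EKS cluster named my-eks-cluster:

# Look up the cluster and a short-lived auth token, then wire both into the kubernetes provider
data "aws_eks_cluster" "cluster" {
  name = "my-eks-cluster"
}

data "aws_eks_cluster_auth" "cluster" {
  name = "my-eks-cluster"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}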

oidc-thumbprint

This subcommand will take the EKS OIDC Issuer URL and retrieve the root CA thumbprint. This is used to set the trust relation for any certificates signed by that CA for the issuer domain. This is necessary to setup the OIDC provider, which is used for the IAM Roles for Service Accounts feature of EKS.

You can read more about the general procedure for retrieving the root CA thumbprint of an OIDC Provider in the official documentation.

To retrieve the thumbprint, call the command with the issuer URL:

kubergrunt eks oidc-thumbprint --issuer-url $ISSUER_URL

This will output the thumbprint to stdout in JSON format, with the key thumbprint.
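For illustration, the output is a single JSON object along the lines of the following (the fingerprint value here is made up):

{"thumbprint": "9e99a48a9960b14926bb7f3b02e22da2b0ab7280"}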

Run kubergrunt eks oidc-thumbprint --help to see all the available options.

Similar Commands:

  • You can use openssl to retrieve the thumbprint as described by the official documentation (see the sketch after this list).
  • eksctl provides routines for directly configuring the OIDC provider so you don't need to retrieve the thumbprint.
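A rough sketch of that manual openssl procedure (the issuer hostname is a placeholder):

# 1. Print the certificate chain served by the OIDC issuer; save the last (root CA) certificate
#    in the chain to a file, e.g. root-ca.crt
openssl s_client -servername oidc.eks.us-east-1.amazonaws.com -showcerts \
    -connect oidc.eks.us-east-1.amazonaws.com:443 </dev/null

# 2. Compute the SHA-1 fingerprint of that root certificate (strip the colons to get the thumbprint)
openssl x509 -in root-ca.crt -fingerprint -sha1 -noout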

deploy

This subcommand will initiate a rolling deployment of the current AMI config to the EC2 instances in your EKS cluster. This command will not deploy or update an application deployed on your Kubernetes cluster (e.g., Deployment resource, Pod resource, etc). We provide Helm charts that you can use to deploy your applications onto a Kubernetes cluster. See our helm-kubernetes-services repo for more info. Instead, this command is for managing and deploying an update to the EC2 instances underlying your EKS cluster.

Terraform and AWS do not provide a way to automatically roll out a change to the Instances in an EKS Cluster. Due to Terraform limitations (see here for a discussion), there is currently no way to implement this purely in Terraform code. Therefore, we've created this subcommand that can do a zero-downtime roll out for you.

To deploy a change (such as rolling out a new AMI) to all EKS workers using this command:

  1. Make sure the cluster_max_size is at least twice the size of cluster_min_size. The extra capacity will be used to deploy the updated instances.
  2. Update the Terraform code with your changes (e.g. update the cluster_instance_ami variable to a new AMI).
  3. Run terraform apply.
  4. Run the command:
kubergrunt eks deploy --region REGION --asg-name ASG_NAME

When you call the command, it will:

  1. Double the desired capacity of the Auto Scaling Group that powers the EKS Cluster. This will launch new EKS workers with the new launch configuration.
  2. Wait for the new nodes to be ready for Pod scheduling in Kubernetes. This includes waiting for the new nodes to be registered to any external load balancers managed by Kubernetes.
  3. Cordon the old instances in the ASG so that they won't schedule new Pods.
  4. Drain the pods scheduled on the old EKS workers (using the equivalent of kubectl drain), so that they will be rescheduled on the new EKS workers.
  5. Wait for all the pods to migrate off of the old EKS workers.
  6. Set the desired capacity down to the original value and remove the old EKS workers from the ASG.

Note that to minimize service disruption from this command, your services should set up a PodDisruptionBudget, configure a readiness probe that fails on container shutdown events, and implement graceful handling of SIGTERM in the container. You can learn more about these features in our blog post series covering them.
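For example, a minimal PodDisruptionBudget that keeps at least two replicas of a hypothetical my-service Deployment available during the drain might look like this:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service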

Currently kubergrunt does not check that these features are implemented. However, in the future, we plan to bake checks into the deployment command that verify all services have a disruption budget set, and warn the user about any services that do not.

eks deploy recovery file

Due to the nature of a rolling update, the deploy subcommand performs multiple sequential actions that depend on the success of the previous operations. To mitigate intermittent failures, the deploy subcommand creates a recovery file in the working directory for storing the current deploy state. The recovery file is updated after each stage, and if the deploy subcommand fails for some reason, execution resumes from the last successful state. The existing recovery file can also be ignored with the --ignore-recovery-file flag, in which case the recovery file will be re-initialized.
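For example, to discard a previous run's state and start the rollout from scratch:

kubergrunt eks deploy --region REGION --asg-name ASG_NAME --ignore-recovery-file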

sync-core-components

This subcommand will sync the core components of an EKS cluster to match the deployed Kubernetes version by following the steps listed in the official documentation.

The core components managed by this command are:

  • kube-proxy
  • Amazon VPC CNI plug-in
  • CoreDNS

By default, this command will rotate the images without waiting for the Pods to be redeployed. You can use the --wait option to force the command to wait until all the Pods have been replaced.

Example:

kubergrunt eks sync-core-components --eks-cluster-arn EKS_CLUSTER_ARN

cleanup-security-group

This subcommand cleans up the leftover AWS-managed security groups that are associated with an EKS cluster you intend to destroy. It accepts:

  • --eks-cluster-arn: the ARN of the EKS cluster
  • --security-group-id: a known security group ID associated with the EKS cluster
  • --vpc-id: the VPC ID where the cluster is located

It also looks for other security groups associated with the EKS cluster, such as the security group created by the AWS Load Balancer Controller. To safely delete these resources, it detaches and deletes any associated AWS Elastic Network Interfaces.

Example:

kubergrunt eks cleanup-security-group --eks-cluster-arn EKS_CLUSTER_ARN --security-group-id SECURITY_GROUP_ID \
--vpc-id VPC_ID

schedule-coredns

This subcommand can be used to toggle whether the CoreDNS service is scheduled on Fargate or EC2 worker types. During the creation of an EKS cluster that uses Fargate, schedule-coredns fargate will annotate the CoreDNS deployment so that EKS can schedule it onto Fargate nodes. To switch back to EC2, you can run schedule-coredns ec2 to reset the annotations so that CoreDNS is scheduled onto EC2 nodes again.

This command is useful when creating Fargate-only EKS clusters. By default, EKS will schedule the CoreDNS service assuming EC2 workers. You can use this command to force the service to run on Fargate.

You can also use this command in local-exec provisioners on an aws_eks_fargate_profile resource so you can schedule the CoreDNS service after creating the profile, and revert back when destroying the profile (see the Terraform sketch below).

Currently fargate and ec2 are the only subcommands that schedule-coredns accepts.

Examples:

kubergrunt eks schedule-coredns fargate --eks-cluster-name EKS_CLUSTER_NAME --fargate-profile-arn FARGATE_PROFILE_ARN
kubergrunt eks schedule-coredns ec2 --eks-cluster-name EKS_CLUSTER_NAME --fargate-profile-arn FARGATE_PROFILE_ARN
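A rough sketch of the local-exec wiring described above, using an aws_eks_fargate_profile resource (the resource arguments are elided; self.cluster_name and self.arn are the profile's own attributes):

resource "aws_eks_fargate_profile" "coredns" {
  # ... cluster_name, fargate_profile_name, pod_execution_role_arn, subnet_ids, selector ...

  # Move CoreDNS onto Fargate once the profile exists
  provisioner "local-exec" {
    command = "kubergrunt eks schedule-coredns fargate --eks-cluster-name ${self.cluster_name} --fargate-profile-arn ${self.arn}"
  }

  # Revert CoreDNS back to EC2 scheduling when the profile is destroyed
  provisioner "local-exec" {
    when    = destroy
    command = "kubergrunt eks schedule-coredns ec2 --eks-cluster-name ${self.cluster_name} --fargate-profile-arn ${self.arn}"
  }
}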

drain

This subcommand can be used to drain Pods from the instances in the provided Auto Scaling Groups. This can be used to gracefully retire existing Auto Scaling Groups by ensuring the Pods are evicted in a manner that respects disruption budgets.

You can read more about the drain operation in the official documentation.

To drain the Auto Scaling Group my-asg in the region us-east-2:

kubergrunt eks drain --asg-name my-asg --region us-east-2

You can drain multiple ASGs by providing the --name option multiple times:

kubergrunt eks drain --asg-name my-asg-a --name my-asg-b --name my-asg-c --region us-east-2

k8s

The k8s subcommand of kubergrunt includes commands that directly interact with the Kubernetes resources.

wait-for-ingress

This subcommand waits for the Ingress endpoint to be provisioned. It will monitor the Ingress resource, continuously checking until the endpoint is allocated to the Ingress resource or it times out. By default, this will try for up to 5 minutes (60 retries with a sleep of 5 seconds between tries).

You can configure the timeout settings using the --max-retries and --sleep-between-retries CLI args. This will check --max-retries times, sleeping for --sleep-between-retries between tries.

For example, if you ran the command:

kubergrunt k8s wait-for-ingress \
    --ingress-name $INGRESS_NAME \
    --namespace $NAMESPACE \
    --max-retries 10 \
    --sleep-between-retries 15s

this command will query the Kubernetes API to check the Ingress resource up to 10 times, waiting 15 seconds between each try, for a total of 150 seconds (2.5 minutes) before timing out.

Run kubergrunt k8s wait-for-ingress --help to see all the available options.

kubectl

This subcommand will call out to kubectl with a temporary file that acts as the kubeconfig, set up from the parameters --kubectl-server-endpoint, --kubectl-certificate-authority, and --kubectl-token. Unlike using kubectl directly, this command lets you pass in the base64-encoded certificate authority data directly rather than as a file.

To forward args to kubectl, pass all the args you wish to forward after a --. For example, the following command runs kubectl get pods -n kube-system:

kubergrunt k8s kubectl \
  --kubectl-server-endpoint $SERVER_ENDPOINT \
  --kubectl-certificate-authority $SERVER_CA \
  --kubectl-token $TOKEN \
  -- get pods -n kube-system

Run kubergrunt k8s kubectl --help to see all the available options.

tls

The tls subcommand of kubergrunt is used to manage TLS certificate key pairs as Kubernetes Secrets.

gen

This subcommand will generate new TLS certificate key pairs based on the provided configuration arguments. Once the certificates are generated, they will be stored on your targeted Kubernetes cluster as Secrets. This supports features such as:

  • Generating a new CA key pair and storing the generated key pair in your Kubernetes cluster.
  • Issuing a new signed TLS certificate key pair using an existing CA stored in your Kubernetes cluster.
  • Replacing the stored certificate key pair in your Kubernetes cluster with a newly generated one.
  • Controlling which Namespace the Secrets are stored in.

For example, to generate a new CA key pair and then use it to issue a TLS certificate key pair, storing them as the Secrets ca-keypair and tls-keypair respectively:

# Generate the CA key pair
kubergrunt tls gen \
    --namespace kube-system \
    --secret-name ca-keypair \
    --ca \
    --tls-common-name kiam-ca \
    --tls-org Gruntwork \
    --tls-org-unit IT \
    --tls-city Phoenix \
    --tls-state AZ \
    --tls-country US \
    --secret-annotation "gruntwork.io/version=v1"
# Generate a signed TLS key pair using the previously created CA
kubergrunt tls gen \
    --namespace kube-system \
    --secret-name tls-keypair \
    --ca-secret-name ca-keypair \
    --tls-common-name kiam-server \
    --tls-org Gruntwork \
    --tls-org-unit IT \
    --tls-city Phoenix \
    --tls-state AZ \
    --tls-country US \
    --secret-annotation "gruntwork.io/version=v1"

The first command will generate a CA key pair and store it as the Secret ca-keypair. The --ca argument signals to kubergrunt that the TLS certificate is for a CA.

The second command uses the generated CA key pair to issue a new TLS key pair. The --ca-secret-name signals kubergrunt to use the CA key pair stored in the Kubernetes Secret ca-keypair.
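Once stored, the generated key pairs are ordinary Kubernetes Secrets, so you can inspect them with kubectl, for example:

# View the TLS key pair created by the second command above
kubectl get secret tls-keypair --namespace kube-system -o yaml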

This command should be run by a cluster administrator to ensure that access to the Secrets is tightly controlled.

See the command help for all the available options: kubergrunt tls gen --help.

Deprecated commands

helm

The helm subcommand contained utilities for managing Helm v2, and is not necessary for Helm v3. This subcommand was removed as of kubergrunt version v0.6.0 with Helm v2 reaching end of life.

Who maintains this project?

kubergrunt is maintained by Gruntwork. If you are looking for help or commercial support, send an email to [email protected].

Gruntwork can help with:

  • Setup, customization, and support for this project.
  • Modules and submodules for other types of infrastructure in major cloud providers, such as VPCs, Docker clusters, databases, and continuous integration.
  • Modules and Submodules that meet compliance requirements, such as HIPAA.
  • Consulting & Training on AWS, GCP, Terraform, and DevOps.

How do I contribute?

Contributions are very welcome! Check out the Contribution Guidelines for instructions.

How is this project versioned?

This project follows the principles of Semantic Versioning. You can find each new release, along with the changelog, in the Releases Page.

During initial development, the major version will be 0 (e.g., 0.x.y), which indicates the code does not yet have a stable API. Once we hit 1.0.0, we will make every effort to maintain a backwards compatible API and use the MAJOR, MINOR, and PATCH versions on each release to indicate any incompatibilities.

License

Please see LICENSE and NOTICE for how the code in this repo is licensed.

Copyright © 2020 Gruntwork, Inc.


kubergrunt's Issues

Problem with registering instances to Load Balancers

Hello.
I use kubergrunt v0.6.10 to redeploy nodes after a cluster upgrade.

My command:
kubergrunt eks deploy --region eu-west-1 --asg-name dasdhashsa

I have a problem: kubergrunt is waiting for the ELB, but I get an error:

INFO[2021-03-16T11:17:36+01:00] Waiting for at least one instance to be in service for elb a883c71036f1d4dd6a3c03c772b9  name=kubergrunt
INFO[2021-03-16T11:27:33+01:00] ERROR: error waiting for any instance to be in service for elb a883c71036f1d4dd6a3603c772b9  name=kubergrunt

When I check the LoadBalancer, I see the new running EC2 instance in the Target Group as Healthy.

Cannot build: vbom.ml domain no longer exists

The vbom.ml domain is currently down. As a result, fetching dependencies fails for:

name = "vbom.ml/util"

(17/92) Failed to write github.com/ghodss/[email protected]
grouped write of manifest, lock and vendor: error while writing out vendor tree: failed to write dep tree: failed to export vbom.ml/util: unable to deduce repository and source type for "vbom.ml/util": unable to read metadata: unable to fetch raw metadata: failed HTTP request to URL "http://vbom.ml/util?go-get=1": Get "http://vbom.ml/util?go-get=1": dial tcp: lookup vbom.ml: no such host
Command failed: GOARCH=amd64 GOOS=darwin CC=/usr/bin/clang dep ensure -v -vendor-only
Exit code: 1

It looks like it should be replaced with: https://github.com/fvbommel/util

kubergrunt eks verify does not work behind HTTP Proxies

Hello,

I cannot verify the EKS cluster state from behind a corporate network that uses an HTTP proxy.

After v0.6.14 I can successfully retrieve the OIDC thumbprint, but not verify the cluster state. The following command sits there indefinitely, waiting until the client times out. I'm not 100% sure where the issue lies in the source code, though. I suspect you're using the AWS SDK for EKS in here.

❯kubergrunt eks verify --eks-cluster-arn arn:aws:eks:eu-west-1:<AccountID>:cluster/<ClusterName> --wait
[] INFO[2021-05-05T18:50:04+01:00] Checking if EKS cluster arn:aws:eks:eu-west-1:<AccountID>:cluster/<ClusterName> exists  name=kubergrunt
[] INFO[2021-05-05T18:50:04+01:00] Retrieving details for EKS cluster arn:aws:eks:eu-west-1:<AccountID>:cluster/<ClusterName>  name=kubergrunt
[] INFO[2021-05-05T18:50:04+01:00] Detected cluster deployed in region eu-west-1  name=kubergrunt
[] INFO[2021-05-05T18:50:05+01:00] Successfully retrieved EKS cluster details    name=kubergrunt
[] INFO[2021-05-05T18:50:05+01:00] Verified EKS cluster arn:aws:eks:eu-west-1:<AccountID>:cluster/<ClusterName> is in active state.  name=kubergrunt
[] INFO[2021-05-05T18:50:05+01:00] Checking EKS cluster arn:aws:eks:eu-west-1:<AccountID>:cluster/<ClusterName> Kubernetes API endpoint.  name=kubergrunt
[] INFO[2021-05-05T18:50:05+01:00] Checking EKS cluster info                     name=kubergrunt
[] INFO[2021-05-05T18:50:05+01:00] Retrieving details for EKS cluster arn:aws:eks:eu-west-1:<AccountID>:cluster/  name=kubergrunt
[] INFO[2021-05-05T18:50:05+01:00] Detected cluster deployed in region eu-west-1  name=kubergrunt
[] INFO[2021-05-05T18:50:05+01:00] Successfully retrieved EKS cluster details    name=kubergrunt
(....)
 WARN[2021-05-05T18:54:25+01:00] Error retrieiving info from endpoint: Head "https://<ID>gr7.eu-west-1.eks.amazonaws.com": dial tcp <IPv4>443: connect: connection timed out  name=kubergrunt
[] WARN[2021-05-05T18:54:25+01:00] Marking api server as not ready               name=kubergrunt
(...)

Should rolling deploy command leverage cluster-autoscaler?

Way it might work:

  1. Taint all the nodes in the cluster with NoSchedule so new pods won't be scheduled on the existing nodes.
  2. In a rolling fashion, evict the pods on a set of nodes so that they need to be rescheduled. This should trigger new instances to be provisioned, because there are no nodes available for the pods to be scheduled on.
  3. Since the pods are being evicted from the old nodes, those nodes should be scaled in as they become idle.

Also from Jim

One more thought:

  • The deployer can schedule itself as a Job in the K8S cluster
  • The deployer can then checkpoint its progress (e.g., in a ConfigMap)
    That way, if it is paused or restarted, it can just read the state and pick up where it left off

Roll out cluster update command in kubergrunt should validate PodDisruptionBudgets

Currently kubergrunt does not check that minimal-downtime features are implemented on the Kubernetes resources deployed on the cluster. As part of the rollout command, we should validate that the disruption control features (readiness probes and Pod Disruption Budgets) are implemented on the deployed resources before continuing with the rollout.

RBAC Error when running `helm deploy` on GKE

Hi Yori,

I'm trying to deploy Helm to a GKE cluster and it looks like I'm running into some RBAC issues.

Any ideas?

It could be a GKE security feature or limitation?

Here's the full trace:

❯ ./cmd helm deploy \
    --tiller-namespace tiller-world \
    --resource-namespace dev \
    --service-account tiller \
    --tls-common-name tiller \
    --tls-org Gruntwork \
    --tls-org-unit IT \
    --tls-city Phoenix \
    --tls-state AZ \
    --tls-country US \
    --rbac-group admin \
    --client-tls-common-name admin \
    --client-tls-org Gruntwork \
    --kubectl-context-name $(kubectl config current-context)

INFO[2019-02-07T14:12:05+01:00] No kube config path provided. Using default (/Users/robbym/.kube/config)  name=kubergrunt
INFO[2019-02-07T14:12:05+01:00] Validating required resources exist.          name=kubergrunt
INFO[2019-02-07T14:12:05+01:00] Validating the Namespace tiller-world exists  name=kubergrunt
INFO[2019-02-07T14:12:05+01:00] Loading Kubernetes Client with config /Users/robbym/.kube/config and context gke_dev-sandbox-228703_europe-west3_example-cluster  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Found Namespace tiller-world                  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Validating the ServiceAccount tiller exists in the Namespace tiller-world  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Loading Kubernetes Client with config /Users/robbym/.kube/config and context gke_dev-sandbox-228703_europe-west3_example-cluster  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Found ServiceAccount tiller                   name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] All required resources exist.                 name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Generating certificate key pairs              name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Using /var/folders/4y/w8n6q5x525v9rkl7pt5pwwl40000gn/T/671821986 as temp path for storing certificates  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Generating CA TLS certificate key pair        name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Done generating CA TLS certificate key pair   name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Generating Tiller TLS certificate key pair (used to identify server)  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Generating signed TLS certificate key pair    name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Done generating signed TLS Certificate key pair  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Successfully generated Tiller TLS certificate key pair (used to identify server)  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Done generating certificate key pairs         name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Uploading CA certificate key pair as a secret  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Loading Kubernetes Client with config /Users/robbym/.kube/config and context gke_dev-sandbox-228703_europe-west3_example-cluster  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Successfully uploaded CA certificate key pair as a secret  name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Deploying Helm Server (Tiller)                name=kubergrunt
INFO[2019-02-07T14:12:06+01:00] Running command: helm --kube-context gke_dev-sandbox-228703_europe-west3_example-cluster --kubeconfig /Users/robbym/.kube/config init --home /Users/robbym/.helm --override spec.template.spec.containers[0].command={/tiller,--storage=secret} --tiller-tls --tiller-tls-verify --tiller-tls-cert /var/folders/4y/w8n6q5x525v9rkl7pt5pwwl40000gn/T/671821986/tiller_tiller-world.crt --tiller-tls-key /var/folders/4y/w8n6q5x525v9rkl7pt5pwwl40000gn/T/671821986/tiller_tiller-world.pem --tls-ca-cert /var/folders/4y/w8n6q5x525v9rkl7pt5pwwl40000gn/T/671821986/tiller_tiller-world_ca.crt --tiller-namespace tiller-world --service-account tiller --wait
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm/repository
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm/repository/cache
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm/repository/local
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm/plugins
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm/starters
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm/cache/archive
INFO[2019-02-07T14:12:06+01:00] Creating /Users/robbym/.helm/repository/repositories.yaml
INFO[2019-02-07T14:12:06+01:00] Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
INFO[2019-02-07T14:12:09+01:00] Adding local repo with URL: http://127.0.0.1:8879/charts
INFO[2019-02-07T14:12:09+01:00] $HELM_HOME has been configured at /Users/robbym/.helm.
INFO[2019-02-07T14:12:09+01:00]
INFO[2019-02-07T14:12:09+01:00] Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
INFO[2019-02-07T14:12:21+01:00] Happy Helming!
INFO[2019-02-07T14:12:21+01:00] Successfully deployed Tiller in namespace tiller-world with service account tiller  name=kubergrunt
INFO[2019-02-07T14:12:21+01:00] Granting access Tiller server deployed in namespace tiller-world to:  name=kubergrunt
INFO[2019-02-07T14:12:21+01:00]         - the RBAC groups [admin]                    name=kubergrunt
INFO[2019-02-07T14:12:21+01:00] Checking if Tiller is deployed in the namespace.  name=kubergrunt
INFO[2019-02-07T14:12:21+01:00] Loading Kubernetes Client with config /Users/robbym/.kube/config and context gke_dev-sandbox-228703_europe-west3_example-cluster  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Found a valid Tiller instance in the namespace tiller-world.  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Downloading the CA TLS certificates for Tiller deployed in namespace tiller-world.  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Using /var/folders/4y/w8n6q5x525v9rkl7pt5pwwl40000gn/T/050160537 as temp path for storing certificates  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Loading Kubernetes Client with config /Users/robbym/.kube/config and context gke_dev-sandbox-228703_europe-west3_example-cluster  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Successfully downloaded CA TLS certificates for Tiller deployed in namespace tiller-world.  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Granting access to deployed Tiller in namespace tiller-world to RBAC groups  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Generating and storing certificate key pair for admin (1 of 1)  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Using /var/folders/4y/w8n6q5x525v9rkl7pt5pwwl40000gn/T/986521124 as temp path for storing client certificates  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Generating client certificates for entity admin  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Generating signed TLS certificate key pair    name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Done generating signed TLS Certificate key pair  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Successfully generated client certificates for entity admin  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Uploading client certificate key pair as secret in namespace tiller-world with name tiller-client-21232f297a57a5a743894a0e4a801fc3-certs  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Loading Kubernetes Client with config /Users/robbym/.kube/config and context gke_dev-sandbox-228703_europe-west3_example-cluster  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Successfully uploaded client certificate key pair as a secret  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Successfully generated and stored certificate key pair for admin  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Creating and binding RBAC roles to admin      name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Loading Kubernetes Client with config /Users/robbym/.kube/config and context gke_dev-sandbox-228703_europe-west3_example-cluster  name=kubergrunt
INFO[2019-02-07T14:12:22+01:00] Creating RBAC role to grant access to Tiller in namespace tiller-world to admin  name=kubergrunt
ERRO[2019-02-07T14:12:22+01:00] Error creating RBAC role to grant access to Tiller: roles.rbac.authorization.k8s.io "admin-tiller-world-tiller-access" is forbidden: attempt togrant extra privileges: [{[get] [] [pods] [] []} {[list] [] [pods] [] []} {[get] [] [secrets] [tiller-client-21232f297a57a5a743894a0e4a801fc3-certs] []} {[create] [] [pods/portforward] [] []}] user=&{[email protected]  [system:authenticated] map[user-assertion.cloud.google.com:[APTNk9SdKe8IhVYtn/TR9Ua8Qrnlwvkx5D1eORaixPPyFtQD9/9ggHhR0oa7C2fUpHJUPrHB30HeKcLEOS7rrLix66cYGPDr9+tMdUdK3ofoWMPAXryiFEQA9KaJsKBL8jx7D+nyrmuqzXv5DJEQwW/fhYaglXEJUMx8kpP18TcEybphCg9YexGZ4l1iqVAnn4iREHHV63jSXpvkTRpuoB9nLNJMr/UUDIk=]]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews selfsubjectrulesreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /openapi /openapi/* /swagger-2.0.0.pb-v1 /swagger.json /swaggerapi /swaggerapi/* /version /version/]}] ruleResolutionErrors=[]  name=kubergrunt
ERRO[2019-02-07T14:12:22+01:00] Error creating and binding RBAC roles to admin  name=kubergrunt
ERRO[2019-02-07T14:12:22+01:00] Error granting access to deployed Tiller in namespace tiller-world to RBAC groups: roles.rbac.authorization.k8s.io "admin-tiller-world-tiller-access" is forbidden: attempt to grant extra privileges: [{[get] [] [pods] [] []} {[list] [] [pods] [] []} {[get] [] [secrets] [tiller-client-21232f297a57a5a743894a0e4a801fc3-certs] []} {[create] [] [pods/portforward] [] []}] user=&{[email protected]  [system:authenticated] map[user-assertion.cloud.google.com:[APTNk9SdKe8IhVYtn/TR9Ua8Qrnlwvkx5D1eORaixPPyFtQD9/9ggHhR0oa7C2fUpHJUPrHB30HeKcLEOS7rrLix66cYGPDr9+tMdUdK3ofoWMPAXryiFEQA9KaJsKBL8jx7D+nyrmuqzXv5DJEQwW/fhYaglXEJUMx8kpP18TcEybphCg9YexGZ4l1iqVAnn4iREHHV63jSXpvkTRpuoB9nLNJMr/UUDIk=]]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews selfsubjectrulesreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /openapi /openapi/* /swagger-2.0.0.pb-v1 /swagger.json /swaggerapi /swaggerapi/* /version /version/]}] ruleResolutionErrors=[]  name=kubergrunt
ERROR: roles.rbac.authorization.k8s.io "admin-tiller-world-tiller-access" is forbidden: attempt to grant extra privileges: [{[get] [] [pods] [] []} {[list] [] [pods] [] []} {[get] [] [secrets] [tiller-client-21232f297a57a5a743894a0e4a801fc3-certs] []} {[create] [] [pods/portforward] [] []}] user=&{[email protected]  [system:authenticated] map[user-assertion.cloud.google.com:[APTNk9SdKe8IhVYtn/TR9Ua8Qrnlwvkx5D1eORaixPPyFtQD9/9ggHhR0oa7C2fUpHJUPrHB30HeKcLEOS7rrLix66cYGPDr9+tMdUdK3ofoWMPAXryiFEQA9KaJsKBL8jx7D+nyrmuqzXv5DJEQwW/fhYaglXEJUMx8kpP18TcEybphCg9YexGZ4l1iqVAnn4iREHHV63jSXpvkTRpuoB9nLNJMr/UUDIk=]]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews selfsubjectrulesreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /openapi /openapi/* /swagger-2.0.0.pb-v1 /swagger.json /swaggerapi /swaggerapi/* /version /version/]}] ruleResolutionErrors=[]

Add support for multiple ASGs in rolling deployment command

Currently, kubergrunt requires a single --asg-name when running eks deploy but in many clusters you'll have multiple ASGs for your worker nodes.

It would be great if this supported providing a list of ASGs and running rolling deployments on each of them.

Errors you may encounter when upgrading the library

(The purpose of this report is to alert gruntwork-io/kubergrunt to the possible problems when gruntwork-io/kubergrunt tries to upgrade the following dependencies)

An error will happen when upgrading the library urfave/cli:

github.com/urfave/cli

-Latest Version: v2.2.0 (Latest commit d648edd on 6 Mar)
-Where did you use it:
https://github.com/gruntwork-io/kubergrunt/search?q=urfave%2Fcli&unscoped_q=urfave%2Fcli
-Detail:

github.com/urfave/cli/go.mod

module github.com/urfave/cli/v2
go 1.11
require (
	github.com/BurntSushi/toml v0.3.1
	github.com/cpuguy83/go-md2man/v2 v2.0.0-20190314233015-f79a8a8ca69d	
	…
)

github.com/urfave/cli/docs.go

package cli
import (
	"github.com/cpuguy83/go-md2man/v2/md2man"
)

This problem was introduced in urfave/cli v1.22.1 (committed c71fbce on 12 Sep 2019). You currently use version v1.21.0. If you try to upgrade urfave/cli to version v1.22.1 or above, you will get an error: no package exists at "github.com/cpuguy83/go-md2man/v2"

I investigated the release information of these libraries (urfave/cli >= v1.22.1) and found that the root cause of this issue is:

  1. These dependencies all added Go modules in the recent versions.

  2. They all comply with the specification of "Releasing Modules for v2 or higher" available in the Modules documentation. Quoting the specification:

A package that has migrated to Go Modules must include the major version in the import path to reference any v2+ modules. For example, Repo github.com/my/module migrated to Modules on version v3.x.y. Then this repo should declare its module path with MAJOR version suffix "/v3" (e.g., module github.com/my/module/v3), and its downstream project should use "github.com/my/module/v3/mypkg" to import this repo’s package.

  1. This "github.com/my/module/v3/mypkg" is not the physical path. So earlier versions of Go (including those that don't have minimal module awareness) plus all tooling (like dep, glide, govendor, etc) don't have minimal module awareness as of now and therefore don't handle import paths correctly See golang/dep#1962, golang/dep#2139.

Note: creating a new branch is not required. If instead you have been previously releasing on master and would prefer to tag v3.0.0 on master, that is a viable option. (However, be aware that introducing an incompatible API change in master can cause issues for non-modules users who issue a go get -u given the go tool is not aware of semver prior to Go 1.11 or when module mode is not enabled in Go 1.11+).
Pre-existing dependency management solutions such as dep currently can have problems consuming a v2+ module created in this way. See for example dep#1962.
https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher

Solution

1. Migrate to Go Modules.

Go Modules is the general trend of ecosystem, if you want a better upgrade package experience, migrating to Go Modules is a good choice.

Migrate to modules will be accompanied by the introduction of virtual paths(It was discussed above).

This "github.com/my/module/v3/mypkg" is not the physical path. So Go versions older than 1.9.7 and 1.10.3 plus all third-party dependency management tools (like dep, glide, govendor, etc) don't have minimal module awareness as of now and therefore don't handle import paths correctly.

Then the downstream projects might be negatively affected in their building if they are module-unaware (Go versions older than 1.9.7 and 1.10.3; Or use third-party dependency management tools, such as: Dep, glide, govendor…).

2. Maintaining v2+ libraries that use Go Modules in Vendor directories.

If gruntwork-io/kubergrunt wants to keep using dependency management tools (like dep, glide, govendor, etc.) and still wants to upgrade the dependencies, you can choose this fix strategy: manually download the dependencies into the vendor directory and handle compatibility yourself (materialize the virtual path or delete the virtual part of the path). Avoid fetching the dependencies via virtual import paths. This may add some maintenance overhead compared to using modules.

As the import paths have different meanings between projects adopting module repos and non-module repos, materializing the virtual path is a better way to solve the issue while ensuring compatibility with downstream module users. A textbook example provided by the repo github.com/moby/moby is here:
https://github.com/moby/moby/blob/master/VENDORING.md
https://github.com/moby/moby/blob/master/vendor.conf
In the vendor directory, github.com/moby/moby adds the /vN subdirectory in the corresponding dependencies.
This will help more downstream module users to work well with your package.

3. Request upstream to do compatibility processing.

urfave/cli has 1694 module-unaware users on GitHub, such as: alena1108/cluster-controller, nathan-jenan-rancher/example-kontainer-engine-driver, containerd/containerd…
https://github.com/search?q=urfave%2Fcli+filename%3Avendor.conf+filename%3Avendor.json+filename%3Aglide.toml+filename%3AGodep.toml+filename%3AGodep.json

Summary

You can make a choice when you meet this dependency-management issue by balancing your own development schedule/mode against the effects on the downstream projects.

For this issue, Solution 1 maximizes your benefits with minimal impact on your downstream projects and the ecosystem.

References

Do you plan to upgrade the libraries in the near future?
Hope this issue report can help you ^_^
Thank you very much for your attention.

Best regards,
Kate

Add support for 1.25 and drop support of 1.21

Describe the solution you'd like
EKS has released k8s 1.25 support and 1.21 reached EOL on 2/15/2023. EKS update is ready to merge, but kubergrunt tests fail unless we add 1.25 support to kubergrunt.

Tiller-less version

Could we make tiller deprecated or at least optional?

No need for it on Helm 3.

Add ability to use environment variables for kubectl config

Hi,

Just making a feature request for kubergrunt to have the ability to read the kubectl config path from the KUBECONFIG environment variable as well as from the CLI flags. I couldn't find anything in past issues, so I apologise if you have already addressed this.

I am using KIND to build the Kubernetes clusters and then use direnv to load the environment variable KUBECONFIG based on my different testing environments.

export KUBECONFIG="$(kind get kubeconfig-path --name="infra")"

Terraform and kubectl both read which config to use from this KUBECONFIG environment variable, and it would be awesome if kubergrunt did too, making it even easier to use within Terraform with local-exec. At least for me it would.

Thanks
Kyle

Support sync command with EKS 1.23

Describe the solution you'd like
EKS Kubernetes version 1.23 support was announced recently, so we should update kubergrunt eks sync to support that.

kubergrunt eks configure should allow you to force update an existing context

When you reuse EKS cluster names, the ARNs stay the same if the EKS cluster is recreated, but since the server parameters change (e.g., the endpoint and the CA certificate), you need to reconfigure the kubeconfig. However, kubergrunt currently errors out if a context and config already exist for the server with the same ARN.

The workaround is to delete the existing config, but kubergrunt eks configure should also expose a --force option to overwrite the existing config instead of erroring out.

Kubergrunt Support for EKS 1.19

Today I tested an EKS upgrade to 1.19, as we are building a new cluster and wanted to test upgrading with Terraform. I noticed that kubergrunt generated the following error:

module.eks_cluster.null_resource.sync_core_components[0] (local-exec): ERROR: 1.19 is not a supported version for kubergrunt eks upgrade. Please contact [email protected] for more info.

I know that 1.19 is super new for AWS but what needs to be updated in Kubergrunt to support 1.19? 1.20 is also a few months away.

Thanks.

eks deploy should be robust to eventual consistency

It should avoid errors like the following:

ERRO[2021-02-08T21:01:38-04:00] Error retrieving detailed about the instances  name=kubergrunt
ERRO[2021-02-08T21:01:38-04:00] Undo by terminating all the new instances and trying again  name=kubergrunt
ERROR: InvalidInstanceID.NotFound: The instance ID 'i-028ec69838e751d18' does not exist
        status code: 400, request id: 9c623f38-d818-43f1-a335-d370a2a5ad76

missing ENABLE_IPv4 and ENABLE_IPv6 variable for cni version 1.10

Describe the bug
AWS CNI 1.10.x has two new variables:

ENABLE_IPv4=true
ENABLE_IPv6=false

Upgrading my 1.9.x manifests to run a 1.10.x container image gets SIGSEGV because the above two variables are missing:

Defaulted container "aws-node" out of: aws-node, aws-vpc-cni-init (init)
{"level":"info","ts":"2022-08-19T07:51:24.172Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-08-19T07:51:24.173Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-08-19T07:51:24.185Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-08-19T07:51:24.188Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-08-19T07:51:26.195Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x39 pc=0x5564aa055418]

goroutine 422 [running]:
github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd.(*IPAMContext).StartNodeIPPoolManager(0xc00022c768)
	/go/src/github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go:640 +0x18
created by main._main
	/go/src/github.com/aws/amazon-vpc-cni-k8s/cmd/aws-k8s-agent/main.go:64 +0x2bb
{"level":"info","ts":"2022-08-19T07:51:28.200Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-08-19T07:51:30.206Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

To Reproduce
run kubergrunt eks sync-core-components on EKS 1.21
see aws/amazon-vpc-cni-k8s#1854


Expected behavior
It would be better to patch the aws-node DaemonSet with the above new env vars, or add a note about the required change to the release notes or to kubergrunt.


[eks deploy] Configurable rolling deployment for controlled roll out

The current approach with "kubergrunt eks deploy" causes downtime because all nodes in a group are drained simultaneously, so all replicas of the affected apps (and likewise the ingress controller, stateful sets, etc.) start migrating at the same time.
The update procedure needs to be improved to avoid this downtime.

No --version flag when building from source

Hi All,

When following the directions to build from source, the resultant binary does not have the --version flag available. This causes downstream problems when working with other Gruntwork modules, as most modules check the version of kubergrunt before using it. I am compiling with:

go build -o bin/kubergrunt ./cmd

Is there another flag or build config that I should use to make sure the --version flag gets included in kubergrunt?

For reference, here is the error encountered with kubergrunt built from latest source:

$ /usr/bin/kubergrunt --version
Incorrect Usage. flag provided but not defined: -version
Usage: kubergrunt [--loglevel] [--help] command [options] [args]
A CLI tool to help setup and manage a Kubernetes cluster.
Commands:
   eks      Helper commands to configure EKS.
   k8s      Helper scripts for managing Kubernetes resources directly.
   tls      Helper commands to manage TLS certificate key pairs as Kubernetes Secrets.
   help, h  Shows a list of commands or help for one command
ERROR: flag provided but not defined: -version

Helm tools: Consider separating managed namespace from tiller-namespace

Right now a lot of the helm features were built with tiller managing resources in the tiller-namespace that it was deployed to.

However, this is a potential risk because an admin user in the tiller-namespace can now bypass the authentication mechanisms of Tiller by inheriting the ServiceAccount. The worse risk is that read-only users can grab the tiller-secret to obtain the server-side TLS certs, which can be used to successfully authenticate to Tiller (due to the shared CA for server and client verification) and then get write access.

To mitigate this, Tiller should be deployed into a separate namespace, which then manages resources in the main namespace. This way, read only users in the main namespace do not get tiller access unless they are also explicitly granted access to the tiller namespace.

In this model, granting access to Tiller now comprises:

  • Getting client side certs for tiller.
  • Getting read only access to the tiller pod to look up the network.

And the service account needs:

  • R/W access to secrets in the tiller namespace
  • R/W access to all resources in the main namespace

Allow kubergrunt to continue even when it can't verify load balancer state.

Describe the solution you'd like
I would like a command line flag for kubergrunt eks deploy called something like --ignore-loadbalancer-state
The purpose of this flag is to prevent kubergrunt from failing even when it is not able to get the load balancer state.
This can, for instance, happen if you have unused service Load Balancers in a Pending state in your cluster.
In a perfect world, you would ask your cloud developers to remove services they no longer use, but that is not always possible, so it would be nice to have an administrator feature like --ignore-loadbalancer-state that keeps kubergrunt doing the node rollover instead of failing in those cases.

Describe alternatives you've considered
Ask cloud developers to ensure they have active and functioning service load balancers.

Additional context
This error can be seen with:

[] INFO[2023-02-07T16:04:11+01:00] Found 1 LoadBalancer services of 22 services in kubernetes.  name=kubergrunt
[] ERRO[2023-02-07T16:04:11+01:00] Error retrieving associated ELB names of the Kubernetes services.  name=kubergrunt
[] ERRO[2023-02-07T16:04:11+01:00] Undo by terminating all the new instances and trying again  name=kubergrunt
[] ERRO[2023-02-07T16:04:11+01:00] Error while waiting for new nodes to be ready.  name=kubergrunt
[] ERRO[2023-02-07T16:04:11+01:00] Either resume with the recovery file or terminate the new instances.  name=kubergrunt

I have already made a fix in a fork, but I will only feed that back to upstream if you agree this is a usable feature.

kubergrunt eks deploy doesn't gracefully decommission nodes if drain fails

I was trying out the kubergrunt eks deploy command today to try and upgrade/roll my ASGs with a new version.

I have several pods with local data, so the drain failed; however, it still went ahead and forcefully terminated the nodes:

[] INFO[2020-12-21T11:02:04-04:00] Cordoning old instances in cluster ASG dev-eks-client-spot-us-east-1b20190530170243598800000004 to prevent Pod scheduling  name=kubergrunt
[] INFO[2020-12-21T11:02:05-04:00] Running command: kubectl --kubeconfig /Users/adam.leclerc/.kube/config cordon ip-172-25-114-254.ec2.internal
[] INFO[2020-12-21T11:02:05-04:00] Running command: kubectl --kubeconfig /Users/adam.leclerc/.kube/config cordon ip-172-25-116-251.ec2.internal
[] INFO[2020-12-21T11:02:07-04:00] node/ip-172-25-116-251.ec2.internal cordoned
[] INFO[2020-12-21T11:02:07-04:00] node/ip-172-25-114-254.ec2.internal cordoned
[] INFO[2020-12-21T11:02:07-04:00] Successfully cordoned old instances in cluster ASG dev-eks-client-spot-us-east-1b20190530170243598800000004  name=kubergrunt
[] INFO[2020-12-21T11:02:07-04:00] Draining Pods on old instances in cluster ASG dev-eks-client-spot-us-east-1b20190530170243598800000004  name=kubergrunt
[] INFO[2020-12-21T11:02:07-04:00] Running command: kubectl --kubeconfig /Users/adam.leclerc/.kube/config drain ip-172-25-116-251.ec2.internal --ignore-daemonsets --timeout 15m0s
[] INFO[2020-12-21T11:02:07-04:00] Running command: kubectl --kubeconfig /Users/adam.leclerc/.kube/config drain ip-172-25-114-254.ec2.internal --ignore-daemonsets --timeout 15m0s
[] INFO[2020-12-21T11:02:09-04:00] node/ip-172-25-116-251.ec2.internal already cordoned
[] INFO[2020-12-21T11:02:09-04:00] node/ip-172-25-114-254.ec2.internal already cordoned
[] INFO[2020-12-21T11:02:10-04:00] error: unable to drain node "ip-172-25-116-251.ec2.internal", aborting command...
[] INFO[2020-12-21T11:02:10-04:00]
[] INFO[2020-12-21T11:02:10-04:00] There are pending nodes to be drained:
[] INFO[2020-12-21T11:02:10-04:00]  ip-172-25-116-251.ec2.internal
[] INFO[2020-12-21T11:02:10-04:00] error: cannot delete Pods with local storage (use --delete-local-data to override): <<<< REDACTED>>>>
[] INFO[2020-12-21T11:02:10-04:00] error: unable to drain node "ip-172-25-114-254.ec2.internal", aborting command...
[] INFO[2020-12-21T11:02:10-04:00]
[] INFO[2020-12-21T11:02:10-04:00] There are pending nodes to be drained:
[] INFO[2020-12-21T11:02:10-04:00]  ip-172-25-114-254.ec2.internal
[] INFO[2020-12-21T11:02:10-04:00] error: cannot delete Pods with local storage (use --delete-local-data to override): <<<< REDACTED>>>>
[] INFO[2020-12-21T11:02:10-04:00] Successfully drained all scheduled Pods on old instances in cluster ASG dev-eks-client-spot-us-east-1b20190530170243598800000004  name=kubergrunt
[] INFO[2020-12-21T11:02:10-04:00] Removing old nodes from ASG dev-eks-client-spot-us-east-1b20190530170243598800000004  name=kubergrunt
[] INFO[2020-12-21T11:02:10-04:00] Detaching 2 instances from ASG dev-eks-client-spot-us-east-1b20190530170243598800000004  name=kubergrunt
[] INFO[2020-12-21T11:02:11-04:00] Detached 2 instances from ASG dev-eks-client-spot-us-east-1b20190530170243598800000004  name=kubergrunt
[] INFO[2020-12-21T11:02:11-04:00] Terminating 2 instances, in groups of up to 1000 instances  name=kubergrunt
[] INFO[2020-12-21T11:02:11-04:00] Terminated 2 instances from batch 0           name=kubergrunt
[] INFO[2020-12-21T11:02:11-04:00] Waiting for 2 instances to shut down from batch 0  name=kubergrunt
[] INFO[2020-12-21T11:03:42-04:00] Successfully shutdown 2 instances from batch 0  name=kubergrunt
[] INFO[2020-12-21T11:03:42-04:00] Successfully shutdown all 2 instances         name=kubergrunt
[] INFO[2020-12-21T11:03:42-04:00] Successfully removed old nodes from ASG dev-eks-client-spot-us-east-1b20190530170243598800000004  name=kubergrunt
[] INFO[2020-12-21T11:03:42-04:00] Successfully finished roll out for EKS cluster worker group dev-eks-client-spot-us-east-1b20190530170243598800000004 in us-east-1  name=kubergrunt

I have two questions:
1. Is this expected, or is it a bug?
2. Is there a way to allow the drain to pass the `--delete-local-data` flag?

Kubergrunt using the wrong version for EKS 1.18

[] INFO[2021-08-19T14:05:49-05:00] Successfully retrieved EKS cluster details name=kubergrunt
[] INFO[2021-08-19T14:05:49-05:00] Syncing Kubernetes Applications to: name=kubergrunt
[] INFO[2021-08-19T14:05:49-05:00] kube-proxy: 1.18.8-eksbuild.2 name=kubergrunt

Per:
https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html#kube-proxy-default-versions-table

The currently supported release for 1.18 EKS clusters is:

1.18.8-eksbuild.1

1.18.8-eksbuild.2 doesn't exist and throws:
Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.18.8-eksbuild.2": rpc error: code = Unknown desc = Error response from daemon: manifest for 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.18.8-eksbuild.2 not found: manifest unknown: Requested image not found

Support sync command with EKS 1.24

Describe the solution you'd like

EKS Kubernetes version 1.24 support was announced, so we should update kubergrunt eks sync to support that.

Actively deregister instances from target groups during `eks deploy`

For people using in-cluster mechanisms to manage ELB attachments (e.g., aws-alb-ingress-controller), detaching instances from ASGs does not actually deregister the instances from the ELBs. This can be a source of downtime during the rollout procedure, because instances must fail the health checks before they are dropped from the target list, which means they can continue to receive requests while the servers are being drained and shut down.
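A rough sketch of what active deregistration could look like with the AWS CLI (the target group ARN and instance ID below are placeholders): deregister the instance from the target group before detaching and terminating it, then wait for deregistration (connection draining) to finish.

aws elbv2 deregister-targets --target-group-arn <target-group-arn> --targets Id=<instance-id>
aws elbv2 wait target-deregistered --target-group-arn <target-group-arn> --targets Id=<instance-id>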

cleanup-security-group misses elb security groups

Describe the bug
I ran this tool and it deleted the cluster's security group but not groups created for load balancers.

To Reproduce

  • Create an EKS Service of type LoadBalancer
  • Delete the Service (the LB is removed automatically, but in certain situations it leaves orphaned security groups behind)
  • Call kubergrunt from a local-exec provisioner, e.g.:
resource "aws_eks_cluster" "eks" {
   ...
  provisioner "local-exec" {
    when    = destroy
    command = "kubergrunt eks cleanup-security-group --eks-cluster-arn ${self.arn} --security-group-id ${self.vpc_config.0.cluster_security_group_id} --vpc-id ${self.vpc_config.0.vpc_id}"
  }
}

Expected behavior
Kubergrunt deletes the orphaned load balancer security groups.

Nice to have

  • Terminal output
module.eks_common.aws_eks_cluster.eks (local-exec): Executing: ["/bin/sh" "-c" "kubergrunt eks cleanup-security-group --eks-cluster-arn arn:aws:eks:us-east-1:xxxx:cluster/dpedu5-eks --security-group-id sg-0a10673de28a8a38f --vpc-id vpc-07c666f30e4bebbef"]
module.eks_common.aws_eks_cluster.eks (local-exec): [] time="2022-04-27T12:30:25-07:00" level=info msg="Successfully authenticated with AWS" name=kubergrunt
module.eks_common.aws_eks_cluster.eks (local-exec): [] time="2022-04-27T12:30:26-07:00" level=info msg="Deleting security group sg-0a10673de28a8a38f" name=kubergrunt
module.eks_common.aws_eks_cluster.eks (local-exec): [] time="2022-04-27T12:30:26-07:00" level=info msg="Security group sg-0a10673de28a8a38f already deleted." name=kubergrunt

Additional context
The orphaned security groups have a tag like: kubernetes.io/cluster/<clustername>=owned. See kubernetes/kubernetes#109698 for the conditions under which these orphaned groups are created and not deleted.
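For reference, finding (and, once nothing references them, deleting) those orphaned groups by tag with the AWS CLI might look like this. The cluster name is taken from the output above; the group ID is a placeholder:

aws ec2 describe-security-groups --filters "Name=tag:kubernetes.io/cluster/dpedu5-eks,Values=owned" --query 'SecurityGroups[].GroupId' --output text
aws ec2 delete-security-group --group-id <orphaned-group-id>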

Enhance `README.md` with commands for creating resources

Hi Yori,

I was trying to use Kubergrunt to deploy Helm to a GKE cluster. Would it make sense to add a few lines to the README.md file to illustrate how to create a dedicated ServiceAccount and namespace for Helm?

e.g:

❯ kubectl create namespace tiller-world
namespace "tiller-world" created

❯ kubectl create serviceaccount tiller --namespace tiller-world
serviceaccount "tiller" created

That way the user can get started more quickly and avoid errors like ERROR: namespaces "tiller-world" not found when they then run:

kubergrunt helm deploy \
    --tiller-namespace tiller-world \
    --resource-namespace dev \
    --service-account tiller \
    --tls-common-name tiller \
    --tls-org Gruntwork \
    --tls-org-unit IT \
    --tls-city Phoenix \
    --tls-state AZ \
    --tls-country US \
    --rbac-group admin \
    --client-tls-common-name admin \
    --client-tls-org Gruntwork

It would also remove the ambiguity of the following statement:

Note: This command does not create Namespaces or ServiceAccounts, delegating that responsibility to other systems.

Add debug logs if kubergrunt cannot access Kubernetes cluster

Describe the bug
I recently used kubergrunt eks deploy to roll out a worker update and it suddenly failed partway through (see logs):

[] INFO[2023-02-10T12:57:02+01:00] Successfully launched new nodes with new launch config on ASG app-workers-eks-data-asg-20210305120833070500000007  name=kubergrunt
[] INFO[2023-02-10T12:57:02+01:00] Waiting for 3 nodes in Kubernetes to reach ready state  name=kubergrunt
[] INFO[2023-02-10T12:57:02+01:00] Loading Kubernetes Client                     name=kubergrunt
[] INFO[2023-02-10T12:57:02+01:00] Using config on disk and context.             name=kubergrunt
[] INFO[2023-02-10T12:57:02+01:00] Checking if nodes ready                       name=kubergrunt
[] ERRO[2023-02-10T12:57:04+01:00] Timed out waiting for the instances to reach ready state in Kubernetes.  name=kubergrunt
[] ERRO[2023-02-10T12:57:04+01:00] Undo by terminating all the new instances and trying again  name=kubergrunt
[] ERRO[2023-02-10T12:57:04+01:00] Error while waiting for new nodes to be ready.  name=kubergrunt
[] ERRO[2023-02-10T12:57:04+01:00] Either resume with the recovery file or terminate the new instances.  name=kubergrunt

The reason for this was that my kubectl was not switched to the correct context, so neither kubectl nor kubergrunt could authenticate to the correct cluster. But, as you can see from the logs, there is no way to tell that from the output.
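A quick pre-flight check before running kubergrunt eks deploy would have made the misconfiguration obvious, e.g. confirming the current context and node list, or re-pointing kubeconfig at the intended cluster first (cluster name and region are placeholders):

kubectl config current-context
kubectl get nodes
aws eks update-kubeconfig --name <cluster-name> --region <region>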

To Reproduce
Steps to reproduce the behavior including the relevant Terraform/Terragrunt/Packer version number and any code snippets and module inputs you used.

Expected behavior
Display meaningful errors if kubergrunt cannot authenticate to Kubernetes cluster.

Nice to have

  • Terminal output
  • Screenshots

Additional context
Add any other context about the problem here.

Fault tolerance in the eks deploy command

The eks deploy command should track where it is in the state of the deployment and recover from failures.

For example, it is fairly painful to recover if the deploy command fails after the ASG was expanded and during the drain call. In that failure mode, the user has to manually drain all the nodes, and then terminate them. We should figure out a way for the deploy command to remember where it failed, and then provide the ability to retry starting from that step.
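Until that exists, the manual recovery for that failure mode looks roughly like the following, repeated for each old node. The node and instance IDs are placeholders; --should-decrement-desired-capacity shrinks the ASG back down as each old instance is removed:

kubectl drain <old-node-name> --ignore-daemonsets
aws autoscaling terminate-instance-in-auto-scaling-group --instance-id <old-instance-id> --should-decrement-desired-capacity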

CoreDNS 1.8.3 update requires clusterrole patch

I just upgraded my lab EKS cluster (managed by Gruntwork's terraform-aws-eks modules) from 1.19 to 1.20 and ran into some issues with kubergrunt's CoreDNS update.

After the module's provisioner had executed eks sync-core-components .., DNS resolution inside the cluster stopped working and the new CoreDNS pods started logging errors like:

E0611 10:20:52.181166       1 reflector.go:138] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:167: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kube-system:coredns" cannot list resource "endpointslices" in API group "discovery.k8s.io" at the cluster scope

Quick googling led me to AWS' docs (see step 5):
https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html#updating-coredns-add-on

It turns out that the CoreDNS 1.8.3 update requires additional permissions.

Manually patching the system:coredns clusterrole resolved my issues, but I think it would make sense to have kubergrunt handle this kind of thing automatically (since it's already patching the image version and configmap anyway).
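For anyone hitting this before kubergrunt handles it, the manual fix is roughly the following JSON patch, equivalent to the edit described in the AWS docs linked above (it grants CoreDNS list/watch on EndpointSlices):

kubectl patch clusterrole system:coredns --type='json' -p='[{"op":"add","path":"/rules/-","value":{"apiGroups":["discovery.k8s.io"],"resources":["endpointslices"],"verbs":["list","watch"]}}]'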

versions:
terraform v1.0.0
terragrunt v0.29.2
kubergrunt v0.7.1
terraform-aws-eks v0.41.0

helm rotate command that rotates the certs

This command should:

  • Generate a new CA
  • Generate new server-side certs and do an in-place update on the existing Secret for Tiller.
  • Generate new client-side certs for each client previously granted access and do an in-place update on their existing Secrets.
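A rough manual sketch of those steps with openssl and kubectl follows; the Secret name, namespace, and key names are assumptions about how the Tiller Secret is laid out, not something kubergrunt guarantees:

openssl req -x509 -newkey rsa:4096 -nodes -keyout ca.key -out ca.crt -subj "/CN=tiller-ca" -days 365
openssl req -newkey rsa:4096 -nodes -keyout tiller.key -out tiller.csr -subj "/CN=tiller"
openssl x509 -req -in tiller.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out tiller.crt -days 365
kubectl -n tiller-world create secret generic tiller-secret --from-file=ca.crt=ca.crt --from-file=tls.crt=tiller.crt --from-file=tls.key=tiller.key --dry-run=client -o yaml | kubectl apply -f -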

Configurable rolling deployment

kubergrunt eks deploy should support a configurable parameter that specifies how many nodes to rotate at a time. This would make the deployment less disruptive at the expense of rollout speed.

EKS deploy error - LoadBalancer hostname is in an unexpected format

Hi, just trying this out for the first time today and ran into this error.

[] DEBU[2021-08-23T12:13:55+01:00] Node ip-10-0-10-39.ec2.internal is ready      name=kubergrunt
[] DEBU[2021-08-23T12:13:55+01:00] Node ip-10-0-11-26.ec2.internal is ready      name=kubergrunt
[] INFO[2021-08-23T12:13:55+01:00] Getting all LoadBalancers from services in kubernetes  name=kubergrunt
[] INFO[2021-08-23T12:13:55+01:00] Loading Kubernetes Client                     name=kubergrunt
[] INFO[2021-08-23T12:13:55+01:00] Using config on disk and context.             name=kubergrunt
[] INFO[2021-08-23T12:13:56+01:00] Found 3 LoadBalancer services of 21 services in kubernetes.  name=kubergrunt
[] ERRO[2021-08-23T12:13:56+01:00] Error retrieving associated ELB names of the Kubernetes services.  name=kubergrunt
[] ERRO[2021-08-23T12:13:56+01:00] Undo by terminating all the new instances and trying again  name=kubergrunt
ERROR: LoadBalancer hostname is in an unexpected format: k8s-ingestwo-analytic-5e28a64f65-f648fd4a39e5698e.elb.us-east-1.amazonaws.com

The service is using the AWS ALB ingress controller.
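If it helps narrow things down, the command below confirms that the hostname belongs to a v2 load balancer (ALB/NLB) rather than a classic ELB, which is presumably the format the parser expects. The DNS name is the one from the error above:

aws elbv2 describe-load-balancers --query "LoadBalancers[?DNSName=='k8s-ingestwo-analytic-5e28a64f65-f648fd4a39e5698e.elb.us-east-1.amazonaws.com'].[LoadBalancerName,Type]" --output text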

Please let me know if there's more info I can provide to help troubleshoot further.

Getting secret name tiller-client-certs

Hi,
Thanks for the awesome tool. I have started to use kubergrunt, but I was wondering if there is a way to get or set a custom name for the tiller-client-certs Secret. Currently, when running kubergrunt helm grant, it creates certs with the name tiller-client-<random-number>-certs.

My use case is to get the secret name and then use it with fluxcd, without having to read the secrets via a Terraform data source and recreate them with kubernetes_secret, which of course would expose the secrets in the state file.

Thanks

`helm deploy` is not idempotent

If helm deploy fails partway through a deployment, you can end up in a state where the work was only partially applied, and there is no way to pick it back up from where it left off.
