particuleio / teks
Full feature EKS cluster with Terragrunt/Terraform
Home Page: https://particuleio.github.io/teks/
License: Apache License 2.0
When running this, the assumed user role (assumedRole) is not authorised when the second after_hook in the eks module executes.
after_hook "kubeconfig" {
commands = ["apply"]
execute = ["bash", "-c", "terraform output kubeconfig 2>/dev/null > ${get_terragrunt_dir()}/kubeconfig"]
}
after_hook "kube-system-label" {
commands = ["apply"]
execute = ["bash", "-c", "kubectl --kubeconfig ${get_terragrunt_dir()}/kubeconfig label ns kube-system name=kube-system --overwrite"]
}
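A common cause is that the IAM role running Terragrunt is not mapped into the cluster's aws-auth ConfigMap, so the kubectl call in the second hook is rejected. A minimal sketch, assuming an older terraform-aws-eks version that manages aws-auth via the map_roles input (the account ID, role name, and username below are placeholders, not values from this repo):

```hcl
inputs = {
  # Map the role Terragrunt assumes into the cluster so after_hook kubectl
  # commands are authorised. Placeholders, adjust to your environment.
  map_roles = [
    {
      rolearn  = "arn:aws:iam::111111111111:role/assumedRole"
      username = "terragrunt"
      groups   = ["system:masters"]
    },
  ]
}
```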
When deploying without a NAT gateway, what is the expected pattern for supplying the list of required VPC endpoints, together with the EKS cluster security group ID(s) and, presumably, policies (aws_iam_policy_document)?
To my understanding, the endpoints input would be the right place to add them?
FTR, I want to follow that guide to deploy on a private VPC with Fargate workers instead of EC2, so I need the following PrivateLinks (VPC endpoints):
What I couldn't figure out is how (and whether) I should specify the other module inputs for endpoints, namely aws_iam_policy_document and security_group, as shown in the example.
Please confirm whether it can be done like this:
cluster_security_group_id = "sg-xxxx"
and setting create_cluster_security_group = false
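For reference, the endpoint wiring can be sketched against the terraform-aws-vpc endpoints submodule; a hedged example where the service name, subnet and security group references, and the policy document are all illustrative:

```hcl
module "endpoints" {
  source = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"

  vpc_id = module.vpc.vpc_id

  endpoints = {
    # One entry per required PrivateLink; ecr.api shown as an example.
    ecr_api = {
      service             = "ecr.api"
      private_dns_enabled = true
      subnet_ids          = module.vpc.private_subnets
      security_group_ids  = ["sg-xxxx"] # e.g. the EKS cluster security group
      policy              = data.aws_iam_policy_document.endpoint.json
    }
  }
}
```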
Hello @ArchiFleKs,
how would you manage secrets that are part of the extra_values in the Helm addons you extend in the terragrunt.hcl files?
My preference would be AWS SSM rather than Vault (because Vault would itself be part of the cluster).
Thank you for your advice.
First I would like to thank for this great repository and the code here, it's pure gold.
Sadly I'm experiencing an error which, for a beginner like me, isn't easy to diagnose.
Error: Failed to instantiate provider "kubectl" to obtain schema: fork/exec /Users/dnetzer/Repositories/Vonage/vgai-studio-infra/live/dev/eu-west-1/eks/.terragrunt-cache/Ug5X0EQf-ySvHn_k0tDzaYfjHuo/vo8pQqWUeCu_1_TBy7LGvx51SW0/terraform-provider-kubectl: exec format error
This is the error I get when I run terragrunt plan, after a successful terragrunt init in the EKS module.
I tried using kubectl 1.17.11 (similar to what you use in the repo), and I've also tried 1.19.0 (latest stable).
EDIT 1:
I updated terragrunt to v0.23.40, which uses terraform v0.13.2, and now terragrunt init fails with the following error:
Error: Failed to install provider
Error while installing hashicorp/kubectl: provider registry
registry.terraform.io does not have a provider named
registry.terraform.io/hashicorp/kubectl
Any help would be much appreciated.
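For what it's worth, under Terraform >= 0.13 the kubectl provider is not published in the hashicorp registry namespace, so it has to be declared explicitly. A minimal sketch (the version constraint is illustrative):

```hcl
terraform {
  required_providers {
    # The kubectl provider lives under the gavinbunney namespace,
    # not hashicorp, so Terraform 0.13+ needs this explicit source.
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}
```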
Hi,
With the configuration provided in this repo, when a (user-scheduled) Pod without CPU limits puts a lot of CPU pressure on a node, critical pods are denied the CPU shares required to correctly pass readiness or even liveness checks (I've observed this in production; it is reproducible).
This results in nginx-ingress, calico-node and kiam not receiving traffic, or even restarting.
I've noticed that these components do not have CPU requests set. I guess that's the reason why the health checks time out under CPU pressure.
Should we add CPU requests for all these Pods? Is there a better way to fix this issue?
Thanks!
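For reference, CPU requests can be set through the addons' extra_values; a minimal sketch for ingress-nginx, where the keys follow the upstream ingress-nginx chart and the numbers are purely illustrative:

```hcl
ingress-nginx = {
  enabled = true
  extra_values = <<-EXTRA_VALUES
    controller:
      resources:
        requests:
          cpu: 100m      # guarantees CPU shares under node pressure
          memory: 128Mi
  EXTRA_VALUES
}
```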
The S3 bucket is handled automatically with Terragrunt, but with plain Terraform the Cloudposse module is used, which requires first setting up a local backend and then copying the state over to S3. This is detailed here, but it should be clarified in our docs, or at least linked to the original documentation.
Hello, I'm trying to use your template to create an EKS cluster, but when I run the terragrunt plan command from the vpc directory, it returns the error below.
datasources is a dependency of /home/user/projects/vitta/terraform-live/aws-eks/development/us-east-1/vpc/terragrunt.hcl but detected no outputs. Either the target module has not been applied yet, or the module has no outputs. If this is expected, set the skip_outputs flag to true on the dependency block.
Even when running terragrunt run-all plan I got the same error.
I tried setting the skip_outputs flag to true on the dependency block, but without success.
Could you let me know if I'm doing something wrong? I'm following the README but still not succeeding.
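A common pattern when planning before dependencies are applied is to provide mock outputs for them rather than skipping them entirely; a hedged sketch, in which the output names and values are illustrative:

```hcl
dependency "datasources" {
  config_path = "../../datasources"

  # Used only while the dependency has no real outputs yet.
  mock_outputs = {
    aws_region = "us-east-1"
  }
  # Restrict mocks to read-only commands so apply still requires real outputs.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
```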
Hi TEKS team,
I am encountering an error with eks-addons: the error below comes up numerous times after updating to the latest addons / teks release. I get it both when destroying eks-addons-critical and when applying it to an already existing deployment.
Am I missing something or doing something wrong?
╷
│ Error: Invalid Configuration for Read-Only Attribute
│
│ with tls_cert_request.thanos-tls-querier-cert-csr,
│ on thanos-tls-querier.tf line 138, in resource "tls_cert_request" "thanos-tls-querier-cert-csr":
│ 138: key_algorithm = "ECDSA"
│
│ Cannot set value for this attribute as the provider has marked it as
│ read-only. Remove the configuration line setting the value.
│
│ Refer to the provider documentation or contact the provider developers for
│ additional information about configurable and read-only attributes that are
│ supported.
╵
╷
│ Error: Invalid Configuration for Read-Only Attribute
│
│ with tls_self_signed_cert.thanos-tls-querier-ca-cert,
│ on thanos.tf line 350, in resource "tls_self_signed_cert" "thanos-tls-querier-ca-cert":
│ 350: key_algorithm = "ECDSA"
│
│ Cannot set value for this attribute as the provider has marked it as
│ read-only. Remove the configuration line setting the value.
│
│ Refer to the provider documentation or contact the provider developers for
│ additional information about configurable and read-only attributes that are
│ supported.
terragrunt version v0.38.6
Terraform version v1.2.5
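This matches the hashicorp/tls provider 4.x behaviour, where key_algorithm on tls_cert_request and tls_self_signed_cert became read-only (it is now derived from the private key). A hedged sketch of the fix, with resource names shortened for illustration:

```hcl
resource "tls_cert_request" "example" {
  # key_algorithm = "ECDSA"   # remove: read-only in tls provider >= 4.0,
  #                           # it is inferred from private_key_pem instead
  private_key_pem = tls_private_key.example.private_key_pem

  subject {
    common_name = "thanos-tls-querier"
  }
}
```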
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
.github/workflows/mkdocs.yml
actions/checkout v4
.github/workflows/renovate.yml
actions/checkout v4
actions/setup-node v4
ubuntu 22.04
.github/workflows/terraform.yml
actions/checkout v4
asdf-vm/actions v3
actions/setup-python v5
aws-actions/configure-aws-credentials v4
pre-commit/action v3.0.1
voxmedia/github-action-slack-notify-build v2
actions/checkout v4
cycjimmy/semantic-release-action v2
voxmedia/github-action-slack-notify-build v2
voxmedia/github-action-slack-notify-build v2
ubuntu 22.04
ubuntu 22.04
terragrunt/modules/datasources/main.tf
aws >= 3.72
hashicorp/terraform >= 1.0
terragrunt/provider-config/aws/aws.tf
terragrunt/provider-config/eks-addons/eks-addons.tf
terragrunt/provider-config/eks/eks.tf
terragrunt/live/production/eu-west-1/clusters/demo/ebs-encryption/terragrunt.hcl
github.com/terraform-aws-modules/terraform-aws-kms v2.2.1
terragrunt/live/production/eu-west-1/clusters/demo/eks-addons-critical/terragrunt.hcl
github.com/particuleio/terraform-kubernetes-addons v15.3.0
terragrunt/live/production/eu-west-1/clusters/demo/eks-addons/terragrunt.hcl
github.com/particuleio/terraform-kubernetes-addons v15.3.0
terragrunt/live/production/eu-west-1/clusters/demo/eks/terragrunt.hcl
github.com/terraform-aws-modules/terraform-aws-eks v20.13.0
terragrunt/live/production/eu-west-1/clusters/demo/vpc-endpoints/terragrunt.hcl
github.com/terraform-aws-modules/terraform-aws-vpc v5.8.1
terragrunt/live/production/eu-west-1/clusters/demo/vpc/terragrunt.hcl
github.com/terraform-aws-modules/terraform-aws-vpc v5.8.1
terragrunt/live/production/eu-west-1/datasources/terragrunt.hcl
terragrunt/live/production/terragrunt.hcl
Looks like they added data "aws_default_tags" "current" {}, which is already part of the provider-aws.tf file, and because of that we get this error:
│ Error: Duplicate data "aws_default_tags" configuration
│
│ on provider-aws.tf line 12:
│ 12: data "aws_default_tags" "current" {}
│
│ A aws_default_tags data resource named "current" was already declared at
│ main.tf:3,1-34. Resource names must be unique per type in each module.
Can anyone please help me here?
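Data-source names must be unique per type within a module, so one of the two declarations has to go. A minimal sketch, assuming the generated provider-aws.tf is the one to trim and that the declaration in main.tf is kept:

```hcl
# Keep a single declaration (here the one from main.tf) and have both files
# consume it through the same instance.
data "aws_default_tags" "current" {}

locals {
  default_tags = data.aws_default_tags.current.tags
}
```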
I ran the command:
~/observer/eks-addons$ terragrunt apply
kubernetes_namespace.ingress-nginx[0]: Creating...
kubernetes_namespace.ingress-nginx[0]: Creation complete after 2s [id=ingress-nginx]
kubernetes_network_policy.ingress-nginx_default_deny[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_namespace[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_control_plane[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_ingress[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_monitoring[0]: Creating...
kubernetes_network_policy.ingress-nginx_allow_monitoring[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-monitoring]
kubernetes_network_policy.ingress-nginx_allow_ingress[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-ingress]
kubernetes_network_policy.ingress-nginx_default_deny[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-default-deny]
kubernetes_network_policy.ingress-nginx_allow_namespace[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-namespace]
kubernetes_network_policy.ingress-nginx_allow_control_plane[0]: Creation complete after 0s [id=ingress-nginx/ingress-nginx-allow-control-plane]
helm_release.kube-prometheus-stack[0]: Modifying... [id=kube-prometheus-stack]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 10s elapsed]
helm_release.kube-prometheus-stack[0]: Still modifying... [id=kube-prometheus-stack, 20s..1m30s elapsed, repeated every 10s]
helm_release.kube-prometheus-stack[0]: Modifications complete after 1m39s [id=kube-prometheus-stack]
helm_release.ingress-nginx[0]: Creating...
helm_release.ingress-nginx[0]: Still creating... [10s elapsed]
helm_release.ingress-nginx[0]: Still creating... [20s..12m10s elapsed, repeated every 10s]
helm_release.ingress-nginx[0]: Still creating... [12m20s elapsed]
╷
│ Error: Kubernetes cluster unreachable: the server has asked for the client to provide credentials
│
│ with helm_release.ingress-nginx[0],
│ on ingress-nginx.tf line 131, in resource "helm_release" "ingress-nginx":
│ 131: resource "helm_release" "ingress-nginx" {
│
╵
ERRO[0976] Hit multiple errors:
Hit multiple errors:
exit status 1
The ingress-nginx-controller Service is stuck in a pending state, and I am not sure how to debug this. Can someone help here?
~/observer/eks-addons$ kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 172.20.19.197 <pending> 80:30064/TCP,443:30531/TCP 18m
ingress-nginx-controller-admission ClusterIP 172.20.32.162 <none> 443/TCP 18m
ingress-nginx-controller-metrics ClusterIP 172.20.49.233 <none> 10254/TCP 18m
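A LoadBalancer Service stuck in <pending> often means the load balancer controller cannot find usable subnets for the cluster. A hedged sketch of the usual subnet role tags on the terraform-aws-vpc module (whether these are already set in your copy is worth checking):

```hcl
module "vpc" {
  # ...

  # Subnets must be tagged so Kubernetes/AWS can place load balancers in them.
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}
```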
Only 2 hours of data are visible in the Grafana dashboard. I have checked Prometheus and it is also returning only 2 hours of data, even though data is being pushed to the S3 bucket.
My terragrunt.hcl file in the eks-addons folder is:
include {
path = "${find_in_parent_folders()}"
}
terraform {
source = "github.com/particuleio/terraform-kubernetes-addons.git//modules/aws?ref=v2.1.0"
}
dependency "eks" {
config_path = "../eks"
mock_outputs = {
cluster_id = "cluster-name"
cluster_oidc_issuer_url = "https://oidc.eks.eu-west-3.amazonaws.com/id/0000000000000000"
}
}
dependency "vpc" {
config_path = "../vpc"
mock_outputs = {
private_subnets_cidr_blocks = [
"privateip.cidr",
"privateip.cidr"
]
}
}
generate "provider" {
path = "provider.tf"
if_exists = "overwrite"
contents = <<-EOF
provider "aws" {
region = "${local.aws_region}"
}
provider "kubectl" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
load_config_file = false
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
}
data "aws_eks_cluster" "cluster" {
name = var.cluster-name
}
data "aws_eks_cluster_auth" "cluster" {
name = var.cluster-name
}
EOF
}
locals {
aws_region = yamldecode(file("${find_in_parent_folders("region_values.yaml")}"))["aws_region"]
custom_tags = merge(
yamldecode(file("${find_in_parent_folders("global_tags.yaml")}")),
yamldecode(file("${find_in_parent_folders("env_tags.yaml")}"))
)
default_domain_name = yamldecode(file("${find_in_parent_folders("global_values.yaml")}"))["default_domain_name"]
default_domain_suffix = "${local.custom_tags["Env"]}.${local.custom_tags["Project"]}.${local.default_domain_name}"
}
inputs = {
cluster-name = dependency.eks.outputs.cluster_id
tags = merge(
local.custom_tags
)
eks = {
"cluster_oidc_issuer_url" = dependency.eks.outputs.cluster_oidc_issuer_url
}
aws-ebs-csi-driver = {
enabled = true
is_default_class = true
}
aws-for-fluent-bit = {
enabled = true
}
aws-load-balancer-controller = {
enabled = true
}
aws-node-termination-handler = {
enabled = false
}
calico = {
enabled = true
}
cert-manager = {
enabled = false
acme_email = "[email protected]"
acme_http01_enabled = true
acme_http01_ingress_class = "nginx"
acme_dns01_enabled = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
experimental_csi_driver = true
}
cluster-autoscaler = {
enabled = true
}
cni-metrics-helper = {
enabled = false
}
external-dns = {
external-dns = {
enabled = true
},
}
ingress-nginx = {
enabled = true
use_l7 = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
}
istio-operator = {
enabled = false
}
karma = {
enabled = false
}
keycloak = {
enabled = false
}
kong = {
enabled = false
}
kube-prometheus-stack = {
enabled = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
thanos_sidecar_enabled = true
thanos_bucket_force_destroy = true
extra_values = <<-EXTRA_VALUES
grafana:
deploymentStrategy:
type: Recreate
ingress:
enabled: true
#paths:
# - /grafana
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: "letsencrypt"
hosts:
- grafana.${local.default_domain_suffix}
#tls:
# - secretName: grafana.${local.default_domain_suffix}
# hosts:
# - grafana.${local.default_domain_suffix}
persistence:
enabled: true
storageClassName: ebs-sc
accessModes:
- ReadWriteOnce
size: 1Gi
prometheus:
ingress:
enabled: true
#paths:
# - /prometheus
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: "letsencrypt"
hosts:
- prometheus.${local.default_domain_suffix}
#tls:
# - secretName: prometheus.${local.default_domain_suffix}
# hosts:
# - prometheus.${local.default_domain_suffix}
prometheusSpec:
additionalScrapeConfigs:
- job_name: 'divum'
scrape_interval: 5s
ec2_sd_configs:
- region: ap-south-1
port: 9100
# This should not be here!
# check: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config, prometheus/prometheus#5738, https://www.robustperception.io/automatically-monitoring-ec2-instances
access_key: xxxxxxxxxx
secret_key: xyz
relabel_configs:
- source_labels: [__meta_ec2_tag_Name]
action: keep
- source_labels: [__meta_ec2_tag_Name]
target_label: instance
- source_labels: [__meta_ec2_public_ip]
target_label: ip
- source_labels: [__meta_ec2_tag_release_env,__meta_ec2_tag_service_name]
separator: ' | '
target_label: job
replicas: 1
retention: 2d
retentionSize: "6GB"
ruleSelectorNilUsesHelmValues: false
serviceMonitorSelectorNilUsesHelmValues: false
podMonitorSelectorNilUsesHelmValues: false
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: ebs-sc
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
alertmanager:
ingress:
enabled: true
#paths:
# - /alert-manager
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: "letsencrypt"
hosts:
- alert-manager.${local.default_domain_suffix}
#tls:
# - secretName: alert-manager.${local.default_domain_suffix}
# hosts:
# - alert-manager.${local.default_domain_suffix}
EXTRA_VALUES
}
loki-stack = {
enabled = false
bucket_force_destroy = true
}
metrics-server = {
enabled = true
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
}
npd = {
enabled = false
}
sealed-secrets = {
enabled = false
}
thanos = {
enabled = true
generate_ca = true
bucket_force_destroy = true
}
}
The Thanos sidecar runs with --prometheus.url=http://127.0.0.1:9090/, alongside Thanos Query and Thanos Store.
Could you please help me out, because this is running in a production EKS cluster?
In Grafana, the datasource is Prometheus and the URL is http://thanos-query-frontend:9090.
Hi,
Thank you for a great project.
While I was following the getting started guide, I did not fully understand the procedure.
➜ tg apply -auto-approve
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/eks] 2021/01/27 17:19:49 Running command: terraform --version
[terragrunt] 2021/01/27 17:19:49 Terraform version: 0.14.5
[terragrunt] 2021/01/27 17:19:49 Reading Terragrunt config file at /Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/eks/terragrunt.hcl
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc] 2021/01/27 17:19:49 Generated file /Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc/.terragrunt-cache/569303351/backend.tf.
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc] 2021/01/27 17:19:49 Running command: terraform init -get=false -get-plugins=false
[terragrunt] [/Users/amitkarpe/code/terraform/tg/first/teks/terragrunt/live/mycluster/us-east-1/clusters/full/vpc] 2021/01/27 17:19:53 Running command: terraform output -json
Failed to load state: AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-west-3'
status code: 400, request id: E0C2281C2098748E, host id: yOzy5JSeIdmHlj/m2M+rkKD9KY86uCrLQ+1xtp+Rp+jmYFYVaDmixcwaiy3KdvvcwKxLWkFvEJ0=
[terragrunt] 2021/01/27 17:20:01 exit status 1
I cloned the repo, then copied the demo folder into a mycluster folder.
While running terragrunt apply, it failed with the errors above. Could you please let me know how to follow the guide?
Also, do you have a Slack channel or a community forum to discuss further contributions?
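The AuthorizationHeaderMalformed error suggests the S3 state bucket lives in a different region (eu-west-3, the demo default) than the one the copied configuration targets. A hedged sketch of the remote_state block in the root terragrunt.hcl to adjust (the bucket name is a placeholder):

```hcl
remote_state {
  backend = "s3"
  config = {
    bucket  = "my-terragrunt-state-bucket" # placeholder
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-1" # must match the region where the bucket actually lives
    encrypt = true
  }
}
```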
Thanks for providing this configuration example. It helped me enormously in understanding how to configure EKS with Terraform and Terragrunt.
But when trying to create an EKS cluster in a new AWS account, I found that the AWS KMS key for EKS root volume encryption could not be created, with the error message:
error creating KMS Key: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.
The reason, as far as I can tell, is that in github.com/particuleio/terraform-aws-kms.git the following role is referenced by the EKS root volume encryption policy:
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
But this role is not available in a new AWS account; it is only created once an EC2 Auto Scaling group is created. I could work around the issue by creating an EC2 Auto Scaling group and then deleting it again.
I think the role can also be created directly by Terraform, but I am no expert. Otherwise, adding some information about this pitfall to the documentation could help others trying to use this template.
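Indeed, the service-linked role can be created up front; a minimal sketch, assuming nothing else in the account has created it yet:

```hcl
# Creates AWSServiceRoleForAutoScaling so KMS key policies can reference it
# before any Auto Scaling group exists. Fails if the role already exists.
resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}
```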
I am getting an error after deploying cert-manager and an ingress with TLS; it works fine over plain HTTP.
My terragrunt.hcl file is:
```
cert-manager = {
enabled = true
acme_email = "[email protected]"
acme_http01_enabled = true
acme_http01_ingress_class = "nginx"
acme_dns01_enabled = true
allowed_cidrs = local.public_subnets_cidr_blocks
experimental_csi_driver = true
}
kube-prometheus-stack = {
enabled = true
allowed_cidrs = local.public_subnets_cidr_blocks
thanos_sidecar_enabled = true
thanos_bucket_force_destroy = true
extra_values = <<-EXTRA_VALUES
grafana:
deploymentStrategy:
type: Recreate
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: "letsencrypt"
kubernetes.io/tls-acme: "true"
ingress.kubernetes.io/force-ssl-redirect: "true"
hosts:
- grafana.${local.default_domain_suffix}
tls:
- secretName: grafana.${local.default_domain_suffix}
hosts:
- grafana.${local.default_domain_suffix}
persistence:
enabled: true
storageClassName: ebs-sc
accessModes:
- ReadWriteOnce
size: 1Gi
}
```
------
Logs of nginx:
```
"networking.k8s.io/v1beta1", ResourceVersion:"18258834", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0721 17:45:08.508516 6 backend_ssl.go:46] Error obtaining X.509 certificate: no object matching key "monitoring/prometheus.thanos.prom-stack.blackbucklabs.net" in local store
W0721 17:45:08.510744 6 controller.go:1196] Error getting SSL certificate "monitoring/prometheus.thanos.prom-stack.blackbucklabs.net": local SSL certificate monitoring/prometheus.thanos.prom-stack.blackbucklabs.net was not found. Using default certificate
```
The certificate is present in the nginx pod, but it is serving the default one; both nginx and the secret are in the same namespace.
kubectl logs -f cert-manager-8df74bb89-t6d4z -n cert-manager
I0722 17:32:09.523111 1 start.go:74] cert-manager "msg"="starting controller" "git-commit"="614438aed00e1060870b273f2238794ef69b60ab" "version"="v1.3.1"
W0722 17:32:09.523200 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0722 17:32:09.524185 1 controller.go:171] cert-manager/controller/build-context "msg"="configured acme dns01 nameservers" "nameservers"=["172.20.0.10:53"]
I0722 17:32:09.524773 1 controller.go:72] cert-manager/controller "msg"="enabled controllers: [certificaterequests-approver certificaterequests-issuer-acme certificaterequests-issuer-ca certificaterequests-issuer-selfsigned certificaterequests-issuer-vault certificaterequests-issuer-venafi certificates-issuing certificates-key-manager certificates-metrics certificates-readiness certificates-request-manager certificates-revision-manager certificates-trigger challenges clusterissuers ingress-shim issuers orders]"
I0722 17:32:09.525467 1 controller.go:131] cert-manager/controller "msg"="starting leader election"
I0722 17:32:09.526315 1 metrics.go:166] cert-manager/controller/build-context/metrics "msg"="listening for connections on" "address"={"IP":"::","Port":9402,"Zone":""}
I0722 17:32:09.526724 1 leaderelection.go:243] attempting to acquire leader lease kube-system/cert-manager-controller...
I0722 17:33:27.726101 1 leaderelection.go:253] successfully acquired lease kube-system/cert-manager-controller
I0722 17:33:27.728026 1 reflector.go:207] Starting reflector *v1.Secret (5m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.228590 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="challenges"
I0722 17:33:29.228839 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-approver"
I0722 17:33:29.229009 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-venafi"
I0722 17:33:29.229135 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-revision-manager"
I0722 17:33:29.229282 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="ingress-shim"
I0722 17:33:29.229423 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-vault"
I0722 17:33:29.229479 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-issuing"
I0722 17:33:29.229519 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-request-manager"
I0722 17:33:29.229561 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-acme"
I0722 17:33:29.229599 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-ca"
I0722 17:33:29.229641 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="issuers"
I0722 17:33:29.229683 1 reflector.go:207] Starting reflector *v1.Order (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.229848 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-selfsigned"
I0722 17:33:29.230078 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-key-manager"
I0722 17:33:29.230183 1 reflector.go:207] Starting reflector *v1.CertificateRequest (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230317 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-metrics"
I0722 17:33:29.230437 1 reflector.go:207] Starting reflector *v1.Certificate (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230568 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-readiness"
I0722 17:33:29.230702 1 reflector.go:207] Starting reflector *v1.Pod (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230829 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="certificates-trigger"
I0722 17:33:29.229434 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="clusterissuers"
I0722 17:33:29.229351 1 reflector.go:207] Starting reflector *v1beta1.Ingress (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.229391 1 controller.go:105] cert-manager/controller "msg"="starting controller" "controller"="orders"
I0722 17:33:29.230084 1 reflector.go:207] Starting reflector *v1.Challenge (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230154 1 reflector.go:207] Starting reflector *v1.ClusterIssuer (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230663 1 reflector.go:207] Starting reflector *v1.Secret (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230921 1 reflector.go:207] Starting reflector *v1.Service (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:33:29.230119 1 reflector.go:207] Starting reflector *v1.Issuer (10h0m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
W0722 17:33:29.273356 1 warnings.go:67] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0722 17:33:29.299606 1 warnings.go:67] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
I0722 17:34:16.731919 1 setup.go:90] cert-manager/controller/clusterissuers "msg"="generating acme account private key" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:16.738384 1 setup.go:90] cert-manager/controller/clusterissuers "msg"="generating acme account private key" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:16.990767 1 setup.go:178] cert-manager/controller/clusterissuers "msg"="ACME server URL host and ACME private key registration host differ. Re-checking ACME account registration" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:17.239203 1 setup.go:178] cert-manager/controller/clusterissuers "msg"="ACME server URL host and ACME private key registration host differ. Re-checking ACME account registration" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.771611 1 setup.go:270] cert-manager/controller/clusterissuers "msg"="verified existing registration with ACME server" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.771803 1 conditions.go:95] Setting lastTransitionTime for Issuer "letsencrypt-staging" condition "Ready" to 2021-07-22 17:34:18.771785968 +0000 UTC m=+129.276599161
I0722 17:34:18.835926 1 setup.go:270] cert-manager/controller/clusterissuers "msg"="verified existing registration with ACME server" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.835965 1 conditions.go:95] Setting lastTransitionTime for Issuer "letsencrypt" condition "Ready" to 2021-07-22 17:34:18.835958833 +0000 UTC m=+129.340771996
I0722 17:34:18.934824 1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:18.957978 1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:21.994663 1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-staging" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="" "resource_version"="v1"
I0722 17:34:22.239811 1 setup.go:170] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt" "related_resource_namespace"="cert-manager" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt" "resource_namespace"="" "resource_version"="v1"
W0722 17:38:52.301824 1 warnings.go:67] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
I0722 17:42:48.343983 1 conditions.go:182] Setting lastTransitionTime for Certificate "grafana.thanos.prom-stack.blackbucklabs.net" condition "Ready" to 2021-07-22 17:42:48.343976156 +0000 UTC m=+638.848789319
kubectl logs -f cert-manager-webhook-86f4bbc997-kcfwx -n cert-manager
W0722 17:32:07.986393 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0722 17:32:07.989574 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0722 17:32:07.989798 1 webhook.go:69] cert-manager/webhook "msg"="using dynamic certificate generating using CA stored in Secret resource" "secret_name"="cert-manager-webhook-ca" "secret_namespace"="cert-manager"
I0722 17:32:07.990506 1 server.go:148] cert-manager/webhook "msg"="listening for insecure healthz connections" "address"=":6080"
I0722 17:32:07.990585 1 server.go:161] cert-manager/webhook "msg"="listening for secure connections" "address"=":10260"
I0722 17:32:07.990614 1 server.go:187] cert-manager/webhook "msg"="registered pprof handlers"
I0722 17:32:07.992240 1 reflector.go:207] Starting reflector *v1.Secret (1m0s) from external/io_k8s_client_go/tools/cache/reflector.go:156
I0722 17:32:09.127841 1 dynamic_source.go:199] cert-manager/webhook "msg"="Updated serving TLS certificate"
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Name: kube-prometheus-stack-grafana
Namespace: monitoring
Address: ae6a3fd83c00a490c92975527b65c33a-500584658.ap-south-1.elb.amazonaws.com
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
grafana.thanos.prom-stack.blackbucklabs.net terminates grafana.thanos.prom-stack.blackbucklabs.net
Rules:
Host Path Backends
---- ---- --------
grafana.thanos.prom-stack.blackbucklabs.net
/ kube-prometheus-stack-grafana:80 (10.32.37.79:3000)
Annotations: cert-manager.io/cluster-issuer: letsencrypt
ingress.kubernetes.io/force-ssl-redirect: true
kubernetes.io/ingress.class: nginx
kubernetes.io/tls-acme: true
meta.helm.sh/release-name: kube-prometheus-stack
meta.helm.sh/release-namespace: monitoring
Events: <none>
Hi, I'm using your project to create an EKS cluster, but I'm getting this error. Can you tell me if I'm doing something wrong?
Error: "block_device_mappings.0.ebs.0.kms_key_id" (arn:::aws) is an invalid ARN: arn: not enough sections
with module.eks_managed_node_group["default-a"].aws_launch_template.this[0],
on modules/eks-managed-node-group/main.tf line 45, in resource "aws_launch_template" "this":
45: resource "aws_launch_template" "this" {
Error: "block_device_mappings.1.ebs.0.kms_key_id" (arn:::aws) is an invalid ARN: arn: not enough sections
with module.eks_managed_node_group["default-a"].aws_launch_template.this[0],
on modules/eks-managed-node-group/main.tf line 45, in resource "aws_launch_template" "this":
45: resource "aws_launch_template" "this" {
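For reference, a KMS key ARN has six colon-separated sections (arn:partition:service:region:account:resource); the value arn:::aws in the error above has too few, which suggests the interpolation building it resolved to empty strings. A minimal sketch of what the input should look like for the launch template in the error path (the ARN and input shape below are placeholders; adjust to your module version):

```hcl
# Placeholder example only: pass a fully qualified key ARN (or the ARN output
# of a KMS key created elsewhere) rather than a partially interpolated string.
block_device_mappings = {
  xvda = {
    ebs = {
      kms_key_id = "arn:aws:kms:eu-west-1:123456789012:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
  }
}
```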
Hello All,
I am trying to spin up the demo cluster and, by default, the managed worker nodes fail to create. Is this a bug in the latest AMI, or is it an issue with something I am doing?
Hello @ArchiFleKs, looks like I found another issue: when I provide multiple subnet_ids for a managed node group
subnet_ids = [dependency.vpc.outputs.private_subnets[0], dependency.vpc.outputs.private_subnets[1], dependency.vpc.outputs.private_subnets[2]]
I get this error from eks-asg-tags.tf:
│ Error: Invalid function argument
│
│ on eks-asg-tags.tf line 44, in resource "null_resource" "node_groups_asg_tags":
│ 44: "Value" : one(data.aws_autoscaling_group.node_groups[each.key].availability_zones),
│ ├────────────────
│ │ data.aws_autoscaling_group.node_groups is object with 4 attributes
│ │ each.key is "gpu"
│
│ Invalid value for "list" parameter: must be a list, set, or tuple value
│ with either zero or one elements.
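For what it's worth, one() only accepts a collection with zero or one elements, and an ASG spanning three subnets reports three availability zones, so the call is guaranteed to fail for multi-subnet node groups. A hypothetical workaround (not the upstream fix) would be to join the set instead:

```hcl
# Hypothetical sketch: emit all AZs as a single comma-separated tag value
# instead of assuming the ASG lives in exactly one zone.
"Value" : join(",", sort(data.aws_autoscaling_group.node_groups[each.key].availability_zones)),
```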
Hi Particule IO Team!
We are getting ready to implement Terragrunt and Kubernetes in Azure and/or GCP. We love what you have done with teks and appreciate all the hard work. Do you have any suggestions or a template for doing a teks-style Terragrunt setup on Azure and GCP?
Thank you for all your support so far!
Hi All,
Just wanted to get some of your recommendations for the terragrunt folder structure if we had prod environments in:
Same Account
Different Accounts
I was thinking we could have us-west-2 and us-china environments in the same folder structure under terragrunt/live/prod, but it looks like that groups stacks together in the same AWS account. So it would need to be terragrunt/live/china and terragrunt/live/gov.
Just wanted to ping here for your recommendations, @ArchiFleKs.
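One possible layout (directory names purely illustrative): give each AWS account its own top-level directory under live/, each carrying its own account-level configuration, so stacks never share an account by accident:

```
terragrunt/
└── live/
    ├── prod/        # main AWS account (us-west-2, ...)
    ├── china/       # separate AWS China account
    └── gov/         # separate GovCloud account
```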
In the latest release, the sample directory and files are not compatible with the latest changes and improvements to the code.
In addition, the latest version of Terragrunt (v0.18.7) is not compatible with Terraform 0.12.1 (see the linked issue).
For testing out tEKS I don't want to use KMS for EBS volume encryption; however, the module insists on creating the resources and fails.
This is the failing resource:
# aws_kms_key.this will be created
+ resource "aws_kms_key" "this" {
+ arn = (known after apply)
+ bypass_policy_lockout_safety_check = false
+ customer_master_key_spec = "SYMMETRIC_DEFAULT"
+ description = "EKS Secret Encryption Key for my-foo-cluster"
+ enable_key_rotation = true
+ id = (known after apply)
+ is_enabled = true
+ key_id = (known after apply)
+ key_usage = "ENCRYPT_DECRYPT"
+ multi_region = false
+ policy = jsonencode(
{
+ Statement = [
+ {
+ Action = "kms:*"
+ Effect = "Allow"
+ Principal = {
+ AWS = "arn:aws:iam::123456789012:root"
}
+ Resource = "*"
+ Sid = "Enable IAM User Permissions"
},
+ {
+ Action = [
+ "kms:ReEncrypt*",
+ "kms:GenerateDataKey*",
+ "kms:Encrypt",
+ "kms:DescribeKey",
+ "kms:Decrypt",
]
+ Effect = "Allow"
+ Principal = {
+ AWS = "arn:aws:iam::123456789012:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
}
+ Resource = "*"
+ Sid = "Allow service-linked role use of the CMK"
},
+ {
+ Action = "kms:CreateGrant"
+ Condition = {
+ Bool = {
+ "kms:GrantIsForAWSResource" = [
+ "true",
]
}
}
+ Effect = "Allow"
+ Principal = {
+ AWS = "arn:aws:iam::123456789012:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
}
+ Resource = "*"
+ Sid = "Allow attachment of persistent resources"
},
]
+ Version = "2012-10-17"
}
)
+ tags = {
+ "Environment" = "foo"
+ "Owner" = "me"
+ "Project" = "teks"
}
+ tags_all = {
+ "Environment" = "foo"
+ "Owner" = "me"
+ "Project" = "teks"
}
}
And this is the error message I get:
╷
│ Error: error creating KMS Key: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.
│
│ with aws_kms_key.this,
│ on main.tf line 1, in resource "aws_kms_key" "this":
│ 1: resource "aws_kms_key" "this" {
│
╵
The role AWSServiceRoleForAutoScaling does not exist yet.
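A possible workaround, assuming the account has never used EC2 Auto Scaling before: create the service-linked role explicitly so the principal in the KMS key policy can resolve (sketch only; the role is created once per account):

```hcl
# Creates the Auto Scaling service-linked role if it does not exist yet,
# so the KMS key policy statements referencing it stop failing.
resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}
```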
I have tried to let teks create loki via the terraform-kubernetes-addons repo.
However, I get this error:
╷
│ Error: failed to create resource: Ingress.extensions "loki" is invalid: spec.rules[0].host: Invalid value: "map[host:logz.my.domain.tld paths:[/]]": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
│
│ with helm_release.loki-stack[0],
│ on loki-stack.tf line 136, in resource "helm_release" "loki-stack":
│ 136: resource "helm_release" "loki-stack" {
│
╵
The hostname matches the given regex, so I don't see why it should fail.
What I noticed, however, is that it seems to render a Go map as the hostname (which doesn't make sense to me when looking at the templated manifests).
Or am I interpreting this incorrectly?
I am a little confused anyway, because when I render the loki-stack chart at the given version (2.8.2) with the given values, it does not create an ingress resource at all.
That is because the loki-stack helm chart has the loki helm chart as a dependency and thus expects its configuration under loki.*.
When I do change that in the extra_values, it still doesn't create an ingress resource, so Terraform's helm provider and my local helm (v3.10.0) seem to behave differently?
I only removed the annotations from the terragrunt.hcl for the loki stack, as I don't want to use the nginx ingress class.
So my extra_values looks like this:
extra_values = <<-VALUES
resources:
requests:
cpu: 1
memory: 2Gi
limits:
cpu: 2
memory: 4Gi
config:
limits_config:
ingestion_rate_mb: 320
ingestion_burst_size_mb: 512
max_streams_per_user: 100000
chunk_store_config:
max_look_back_period: 2160h
table_manager:
retention_deletes_enabled: true
retention_period: 2160h
ingress:
enabled: true
hosts:
- host: logz.${include.root.locals.merged.default_domain_name}
paths: ["/"]
tls:
- secretName: logz.${include.root.locals.merged.default_domain_name}
hosts:
- logz.${include.root.locals.merged.default_domain_name}
VALUES
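Given the rendered error above ("map[host:... paths:[/]]" being used as the hostname), one guess is that this chart version templates each hosts entry directly as a string, so plain hostnames rather than host/paths maps might be expected. A hypothetical variant of the ingress section to try (everything else unchanged):

```hcl
extra_values = <<-VALUES
  ingress:
    enabled: true
    # Plain string hostnames, in case the chart renders `{{ . }}` directly:
    hosts:
      - logz.${include.root.locals.merged.default_domain_name}
    tls:
      - secretName: logz.${include.root.locals.merged.default_domain_name}
        hosts:
          - logz.${include.root.locals.merged.default_domain_name}
  VALUES
```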
I'd be really glad for some hints on where to look at.
I am trying out this template for EKS cluster creation right now.
While doing the apply, I was wondering why my VPC endpoint resources did not show up in the new sub-account that I created.
It turns out they were created in the main account I was using, even though I set aws_account_id to the sub-account.
That is not ideal or obvious to a new user, and I assume it is also a bug?
These are the resources I can see in my main account, which should be in the new sub-account instead:
I see that the requirements say [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) configured with the account you want to deploy into; however, my assumption was that my profile only needs the permissions to create the resources. Why else would there be an aws_account_id variable?
It seems I will have to use the iam_role option to enforce where the resources are created; I will check that out.
When destroying the incorrectly created resources, I now get:
╷
│ Error: expected "url" url to not be empty, got
│
│ with data.flux_sync.main[0],
│ on flux2.tf line 103, in data "flux_sync" "main":
│ 103: url = local.flux2["github_url"]
│
╵
╷
│ Error: error reading EKS Cluster (cluster-name): couldn't find resource
│
│ with data.aws_eks_cluster.cluster,
│ on provider-local.tf line 33, in data "aws_eks_cluster" "cluster":
│ 33: data "aws_eks_cluster" "cluster" {
│
╵
Not ideal, because we wanted to use flux2 without GitHub. I will try that again with a demo URL set.
Setting a demo URL did not actually let me remove the resources, so I removed them manually.
I'm trying to run EKS, and after many errors, I managed to finish the apply once.
Now, if I try to destroy or to apply again, it always shows this error:
helm_release.cert-manager[0]: Modifying... [id=cert-manager]
>
> Error: "cert-manager" has no deployed releases
>
> with helm_release.cert-manager[0],
> on cert-manager.tf line 117, in resource "helm_release" "cert-manager":
> 117: resource "helm_release" "cert-manager" {
Has anyone had the same issue?
When I run terragrunt run-all apply I get this error:
INFO[0708] Executing hook: kubeconfig prefix=[/Users/ramesh/zinka-monitoring/prod-deployment-2/terragrunt/live/thanos/ap-south-1/clusters/observer/eks]
ERRO[0714] Error running hook kubeconfig with message: exit status 1 prefix=[/Users/ramesh/zinka-monitoring/prod-deployment-2/terragrunt/live/thanos/ap-south-1/clusters/observer/eks]
ERRO[0714] Module /Users/ramesh/zinka-monitoring/prod-deployment-2/terragrunt/live/thanos/ap-south-1/clusters/observer/eks has finished with an error: 4 errors occurred:
* exit status 1
* exit status 1
* exit status 1
* exit status 1
EKS terragrunt file:
include {
path = "${find_in_parent_folders()}"
}
terraform {
source = "github.com/terraform-aws-modules/terraform-aws-eks?ref=master"
after_hook "kubeconfig" {
commands = ["apply"]
execute = ["bash", "-c", "terraform output --raw kubeconfig 2>/dev/null > ${get_terragrunt_dir()}/kubeconfig"]
}
after_hook "kubeconfig-tg" {
commands = ["apply"]
execute = ["bash", "-c", "terraform output --raw kubeconfig 2>/dev/null > kubeconfig"]
}
after_hook "kube-system-label" {
commands = ["apply"]
execute = ["bash", "-c", "kubectl --kubeconfig kubeconfig label ns kube-system name=kube-system --overwrite"]
}
after_hook "undefault-gp2" {
commands = ["apply"]
execute = ["bash", "-c", "kubectl --kubeconfig kubeconfig patch storageclass gp2 -p '{\"metadata\": {\"annotations\":{\"storageclass.kubernetes.io/is-default-class\":\"false\"}}}'"]
}
}
locals {
aws_region = yamldecode(file("${find_in_parent_folders("region_values.yaml")}"))["aws_region"]
env = yamldecode(file("${find_in_parent_folders("env_tags.yaml")}"))["Env"]
prefix = yamldecode(file("${find_in_parent_folders("global_values.yaml")}"))["prefix"]
name = yamldecode(file("${find_in_parent_folders("cluster_values.yaml")}"))["name"]
custom_tags = merge(
yamldecode(file("${find_in_parent_folders("global_tags.yaml")}")),
yamldecode(file("${find_in_parent_folders("env_tags.yaml")}"))
)
cluster_name = "${local.prefix}-${local.env}-${local.name}"
vpc_id = "xxxxxxxx"
# these should be private subnets
subnet_ids = [
"subnet-xxxxxxxxxxx",
"subnet-xxxxxxxxxxx",
"subnet-xxxxxxxx",
]
}
generate "provider" {
path = "provider.tf"
if_exists = "overwrite"
contents = <<-EOF
provider "aws" {
region = "${local.aws_region}"
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
data "aws_eks_cluster" "cluster" {
name = aws_eks_cluster.this[0].id
}
data "aws_eks_cluster_auth" "cluster" {
name = aws_eks_cluster.this[0].id
}
EOF
}
inputs = {
aws = {
"region" = local.aws_region
}
tags = merge(
local.custom_tags
)
cluster_name = local.cluster_name
subnet_ids = local.subnet_ids
vpc_id = local.vpc_id
write_kubeconfig = true
enable_irsa = true
kubeconfig_aws_authenticator_command = "aws"
kubeconfig_aws_authenticator_command_args = [
"eks",
"get-token",
"--cluster-name",
local.cluster_name
]
kubeconfig_aws_authenticator_additional_args = []
cluster_version = "1.19"
cluster_enabled_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
# Should contain security groups for Office Access only
# https://aws.amazon.com/blogs/containers/upcoming-changes-to-ip-assignment-for-eks-managed-node-groups/
node_groups = {
"default-${local.aws_region}" = {
create_launch_template = true
public_ip = true
key_name = "awsKeyName"
desired_capacity = 3
max_capacity = 5
min_capacity = 3
instance_types = ["m5a.large"]
disk_size = 30
k8s_labels = {
pool = "default"
}
capacity_type = "ON_DEMAND"
}
}
}
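If the kubeconfig hooks keep exiting with status 1, one possible alternative (assumptions: the AWS CLI is on the PATH and the caller has eks:DescribeCluster) is to generate the kubeconfig with the AWS CLI instead of the module's kubeconfig output, which does not exist in all module versions:

```hcl
after_hook "kubeconfig" {
  commands = ["apply"]
  # Uses the AWS CLI rather than `terraform output kubeconfig`; the cluster
  # name here assumes the same local as the rest of this file.
  execute  = ["bash", "-c", "aws eks update-kubeconfig --name ${local.cluster_name} --kubeconfig ${get_terragrunt_dir()}/kubeconfig"]
}
```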
Hi, could you please help me figure out how to properly define a data source for the cluster critical addons? I'm getting this error on the terragrunt run-all plan command:
│ Error: error reading EKS Cluster (cluster-name): couldn't find resource
│
│ with data.aws_eks_cluster.cluster,
│ on provider-local.tf line 22, in data "aws_eks_cluster" "cluster":
│ 22: data "aws_eks_cluster" "cluster" {
and this is my critical addons terragrunt.hcl file:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "${get_parent_terragrunt_dir()}/_modules/remote/critical_addons/modules/aws"
}
locals {
environment_vars = read_terragrunt_config(find_in_parent_folders("env.hcl"))
region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
env = local.environment_vars.locals.environment
general_name = local.environment_vars.locals.general_name
mock_commands = local.environment_vars.locals.mock_commands
cluster_version = local.environment_vars.locals.cluster_version
region = local.region_vars.locals.aws_region
extra_values = local.environment_vars.locals.extra_values
kube_system_namespace = local.environment_vars.locals.kube_system_namespace
}
generate "provider" {
path = "provider-local.tf"
if_exists = "overwrite"
contents = <<EOF
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "kubectl" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}
}
data "aws_eks_cluster" "cluster" {
name = var.cluster-name
}
data "aws_eks_cluster_auth" "cluster" {
name = var.cluster-name
}
EOF
}
dependency "eks" {
config_path = find_in_parent_folders("eks")
mock_outputs_allowed_terraform_commands = local.mock_commands
mock_outputs = {
cluster_id = "cluster-name"
cluster_oidc_issuer_url = "https://oidc.eks.us-east-1.amazonaws.com/id/0000000000000000"
}
}
dependency "vpc" {
config_path = find_in_parent_folders("vpc")
mock_outputs_allowed_terraform_commands = local.mock_commands
mock_outputs = {
private_subnets_cidr_blocks = ["fake","fake"]
}
}
dependencies {
paths = ["../iam/oidc"]
}
inputs = {
cluster-name = dependency.eks.outputs.cluster_id
eks = {
"cluster_oidc_issuer_url" = dependency.eks.outputs.cluster_oidc_issuer_url
}
metrics-server = {
enabled = true
extra_values = local.extra_values
namespace = local.kube_system_namespace
allowed_cidrs = dependency.vpc.outputs.private_subnets_cidr_blocks
}
cluster-autoscaler = {
enabled = true
namespace = local.kube_system_namespace
extra_values = local.extra_values
}
keda = {
enabled = true
extra_values = local.extra_values
create_ns = true
}
}
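One pattern that avoids the failing aws_eks_cluster lookup at plan time is to inject the endpoint and CA from the eks dependency outputs instead of a data source. Sketch only: the output names (cluster_endpoint, cluster_certificate_authority_data) are assumptions that depend on the EKS module version, and mock_outputs would need matching entries for plan to succeed:

```hcl
generate "provider" {
  path      = "provider-local.tf"
  if_exists = "overwrite"
  contents  = <<EOF
provider "kubernetes" {
  host                   = "${dependency.eks.outputs.cluster_endpoint}"
  cluster_ca_certificate = base64decode("${dependency.eks.outputs.cluster_certificate_authority_data}")
  token                  = data.aws_eks_cluster_auth.cluster.token
}
data "aws_eks_cluster_auth" "cluster" {
  name = "${dependency.eks.outputs.cluster_id}"
}
EOF
}
```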