
aws-eks-best-practices's Introduction

Amazon Elastic Kubernetes Service (Amazon EKS) Best Practices

A best practices guide for day 2 operations, including operational excellence, security, reliability, performance efficiency, and cost optimization.

Return to Live Docs.

Contributing

While the best practices were originally authored by AWS employees, we encourage and welcome contributions from the Kubernetes user community. If you have a best practice that you would like to share, please review the Contributing Guidelines before submitting a PR.

License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.

The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.

aws-eks-best-practices's People

Contributors

alanty, andrewcr7, awsbpfeiff, bellkev, chipzoller, cmanikandan, dependabot[bot], ellistarn, federicaciuffo, geoffcline, hackmd-deploy, jamesiri, jicomusic, jicowan, jimmyraywv, kbiton, liwadman, lukemwila, marciogmorales, realvz, rodrigobersa, rothgar, senatoredu, sheetaljoshi, simyung, sotoiwa, svennam92, tzneal, urbanadventurer, wasiqaws


aws-eks-best-practices's Issues

Private EKS Cluster not accessible

Describe the problem
Hi, this is Srinivasa. I created an EKS cluster in AWS using eksctl. By default it creates a cluster with a public API server endpoint, but I need it to be private, so I changed the endpoint access to private from the AWS console. After the change, I can no longer access the cluster from the server where I installed kubectl and eksctl; I get a `tcp <ip>:443: i/o timeout` error. That server is in a private subnet, and all my worker nodes are in private subnets as well, but I don't know why I'm getting this error. Please help me troubleshoot; I can provide any additional information you need.
EKS-version 1.15
thank you

References
Please include a link to the lines where the error appears.

Windows Caveats

Speaking as someone who is attempting to run a mixed EKS cluster with both Windows and Linux workloads, there are several caveats and best practices that are specific to running Windows nodes that it would be helpful to highlight. It would also be beneficial to point out if certain tools or recommendations are incompatible with Windows.

Perhaps the best solution to this overall is an entire section dedicated to Windows EKS best practices, but intermingling notes about running Windows with existing sections could work as well.

I hope this feedback is helpful, even though it didn't quite fit the defined template.

Recommendation to use lifecycle policy in ECR

Is your idea request related to a problem that you've solved? Please describe.
NIST SP800-190 (Application Container Security Guide) lists "3.2.2 Stale images in registries" as a registry risk. As a countermeasure, it recommends automating the removal of insecure images in "4.2.2 Stale images in registries".

Describe the best practice
Amazon ECR lifecycle policies enable you to specify the lifecycle management of images in a repository. Consider using lifecycle policies to automate the removal of images for older generations that may be insecure.
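
A minimal sketch of such a lifecycle policy, assuming a hypothetical repository and a 14-day window for untagged images (both are assumptions, not values from the guide):

{
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire untagged images older than 14 days",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 14
            },
            "action": { "type": "expire" }
        }
    ]
}

It could be applied with aws ecr put-lifecycle-policy --repository-name <repo> --lifecycle-policy-text file://policy.json.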

egress-operator && kube-scan

Is your idea request related to a problem that you've solved? Please describe.
A clear and concise description of the problem.

Describe the best practice
A clear and concise description of the best practice you developed along with any code and/or projects you used to solve the problem.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the idea here.

Whitelist the image registry

Is your idea request related to a problem that you've solved? Please describe.
In NIST SP800-190, "3.1.5 Use of untrusted images" is listed as an image risk, and "Enforcement to ensure that all hosts in the environment only run images from these approved lists" is written as a countermeasure example.

The CIS EKS Benchmark also mentions "5.1.4 Minimize Container Registries to only those approved".

The EKS Workshop has an example of whitelisting the registry with OPA.

Actually, there are sample policies for OPA, Gatekeeper, and Kyverno that whitelist the registry in this best practices guide repository.

The current EKS Best Practices Guide does not clearly mention whitelisting the container registry as a recommendation, so how about mentioning it?

Describe the best practice
Consider only allowing images from approved image registries to run. Policy solutions such as OPA and Kyverno can be used for this purpose. Example policies can be found here.
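
As one illustration, a minimal Kyverno ClusterPolicy sketch that only admits images from a single registry; the policy name, account ID, and region are placeholders, not values from the guide:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: enforce
  rules:
  - name: validate-registries
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Images must come from the approved ECR registry."
      pattern:
        spec:
          containers:
          - image: "111122223333.dkr.ecr.us-west-2.amazonaws.com/*"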

TODO: Auditing additions

  • Audit changes to the aws-auth ConfigMap
  • Monitor increases in 403 Forbidden and 401 Unauthorized response codes (we already have Log Insights queries in the doc; need to add timeframes)
  • Anonymous calls to the API server
  • Alert when there's an increase in 403 Forbidden responses; show the host, sourceIPs, and k8s_user.username attributes (see the sample query below)
  • Misconfigured RBAC policies, unusual API calls
  • 401s: identify authentication issues (e.g., expired certificates or malformed tokens)
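
A rough CloudWatch Logs Insights sketch for the 403 alerting item; the field paths assume the standard Kubernetes audit event schema and may need adjusting:

fields @timestamp, @message
| filter @logStream like /kube-apiserver-audit/
| filter responseStatus.code = 403
| stats count(*) as count by requestURI, sourceIPs.0, user.username
| sort count desc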

Fargate pod incorrectly treated as "Task"

Describe the problem
In the infrastructure page, a Fargate Pod is incorrectly referred to as a "Task". This common occurrence leads to confusion with the ECS service:

References
Please include a link to the lines where the error appears.

With EKS Fargate, AWS will automatically update the underlying infrastructure as updates become available. Oftentimes this can be done seamlessly, but there may be times when an update will cause your task to be rescheduled.

https://aws.github.io/aws-eks-best-practices/hosts/#treat-your-infrastructure-as-immutable-and-automate-the-replacement-of-your-worker-nodes

Include content about OIDC authentication for EKS

Is your idea request related to a problem that you've solved? Please describe.
A clear and concise description of the problem.

Describe the best practice
A clear and concise description of the best practice you developed along with any code and/or projects you used to solve the problem.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the idea here.

In 1.19, it is not necessary to specify the securityContext when using IRSA in non-root containers

Describe the problem

According to the section referenced below, when using IRSA, non-root containers need to specify fsGroup in the securityContext to set the file permissions for the web identity token. In Kubernetes 1.19 this is no longer required, so it would be good to add this point.

This is documented in the EKS documentation.

https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-1.19

You're no longer required to provide a security context for non-root containers that need to access the web identity token file for use with IAM roles for service accounts. For more information, see IAM roles for service accounts and the proposal for file permission handling in projected service account volume on GitHub.

References
https://aws.github.io/aws-eks-best-practices/security/docs/iam/#run-the-application-as-a-non-root-user

RFE: Provide more guidance concerning network policies

@jicowan Please provide more guidance and clarification regarding the many options available for enforcing network policies.

The network security section recommends several network policies and mentions CNI plugins (e.g., Cilium or Calico), or a service mesh (e.g. AWS App Mesh) as means of enforcement.

How to choose between these options?

Additional clarification regarding general applicability of Security Groups for Pods and App Mesh would also be helpful:

  • Are security groups for pods a viable means of enforcing all of the recommended traffic controls? They were designed to control egress (right?) and it is unclear how generally applicable they are.
  • Now that AWS App Mesh handles ingress, egress, and virtual gateways, is it a viable one-stop solution?

Related:

awsdocs/amazon-eks-user-guide#88

Write blurb on using OPA as an alternative to PSPs

Is your idea request related to a problem that you've solved? Please describe.
A clear and concise description of the problem.

Describe the best practice
A clear and concise description of the best practice you developed along with any code and/or projects you used to solve the problem.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the idea here.
based on content from https://www.infracloud.io/kubernetes-pod-security-policies-opa/

Duplicate recommendations about running containers as non-root user

Describe the problem
There are duplicate recommendations about running the container as a non-root user in both the IAM category and the Pod Security category.

The descriptions in the IAM and Pod Security categories are duplicated, and since they are about the securityContext, I think it is better to keep it only in the Pod Security category.

It's also mentioned in the Image category, but I think that's fine as it's about specifying the USER in the Dockerfile.

References
https://aws.github.io/aws-eks-best-practices/security/docs/iam/#run-the-application-as-a-non-root-user
https://aws.github.io/aws-eks-best-practices/security/docs/pods/#do-not-run-processes-in-containers-as-root
https://aws.github.io/aws-eks-best-practices/security/docs/image/#add-the-user-directive-to-your-dockerfiles-to-run-as-a-non-root-user

TODO: Tracee for finding evasive malware

Image scanning can find vulnerabilities and malware
In libraries, packages, etc.
Compare SHA of the files with SHAs of known malware
Scanners can detect misconfigurations, e.g. secrets embedded in the container image

Chain of trust is important
Evasive malware
Scanning is important but not sufficient
Malware can be hiding dormant in container images
In tar files or nested tar files; decompressed when the image is run
Need runtime security because static analysis is not enough

Honeypot
Shift left (run containers in a sandbox)
eBPF allows you to plug into the kernel to handle evasive malware: the malware will make calls into the kernel where eBPF is waiting. Instrument parts of the kernel and make an assessment.

Tracee - CLI tool to detect malware in the sandbox.
Need to know what to look for
DTA (Dynamic Threat Analysis) wraps the session into a product: run the container in a sandbox, run Tracee in there, and present the findings in a dashboard. Assigns a risk score to the container.
Free to start.
Can be combined with Elasticsearch to look for specific findings.
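
For experimentation, Tracee can be run as a privileged container; the exact flags and image tag vary by Tracee version, so treat this invocation as a sketch:

docker run --rm -it \
  --pid=host --cgroupns=host --privileged \
  -v /etc/os-release:/etc/os-release-host:ro \
  aquasec/tracee:latest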

Recommendation to create an EKS cluster with a dedicated IAM role

Is your idea request related to a problem that you've solved? Please describe.
The IAM user/role that creates an EKS cluster always has admin access. User management for the cluster is configured through the aws-auth ConfigMap; however, this user/role is not present in that file. Unless access to this user/role is protected and monitored, it can be used to gain privileged access to the cluster.

Describe the best practice
A good solution to this problem is to create a custom IAM role that is used exclusively to create the EKS cluster. Controls can be put in place to restrict who can assume this role. Additionally, once the cluster's aws-auth ConfigMap has been configured and additional users have been granted access, this role can be deleted for extra protection, provided that it can be recreated with the same ARN. This ensures that this backdoor entry to the cluster does not remain, but that it can later be recreated to regain access in an emergency / break-glass situation. Recreating the role also gives an additional audit trail, which is especially useful for controlling user access to production clusters that do not usually have direct user access configured.
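
A hedged shell sketch of the workflow; the role and cluster names are hypothetical, and the trust policy file is assumed to exist:

# Create the dedicated cluster-creator role; its trust policy controls who can assume it.
aws iam create-role --role-name eks-cluster-creator \
    --assume-role-policy-document file://trust-policy.json

# Create the cluster while operating under this role, e.g. with eksctl.
eksctl create cluster --name prod-cluster

# Once aws-auth grants other admins access, detach any policies and delete the role.
# Recreating it later with the same name yields the same ARN for break-glass access.
aws iam delete-role --role-name eks-cluster-creator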

Describe alternatives you've considered
Other alternatives would also exist if this initial root access could be configured, but this is not currently supported by EKS. As an additional security control when this root role is recreated, an automated function could delete the role an hour after it is created, to ensure access to production clusters is automatically revoked after a period of time.

PR incoming with suggested wording

EKS support for FlexVolume Plugin

Is there a way to support FlexVolume plugin in EKS?

Currently, the only supported CSI drivers on EKS are the EBS, EFS, and FSx for Lustre storage classes, but these can't be used by Windows pods that require access to file shares (file shares on Windows natively use the SMB protocol).

Kubernetes has an alternative way to support SMB storage classes: the FlexVolume plugin. The issue is that this plugin must be installed on both master and worker nodes, but since EKS doesn't give access to the control plane ("master" nodes), it is difficult to install the plugin.

The installation guide for FlexVolume can be found here.

It requires this command to be run on each node:

VOLUME_PLUGIN_DIR="/usr/libexec/kubernetes/kubelet-plugins/volume/exec"
mkdir -p "$VOLUME_PLUGIN_DIR/fstab~cifs"
cd "$VOLUME_PLUGIN_DIR/fstab~cifs"
curl -L -O https://raw.githubusercontent.com/fstab/cifs/master/cifs
chmod 755 cifs

This installation guide works well on a self-managed K8s cluster.
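
On EKS, one possible workaround is to skip the control plane entirely and install the driver only on the worker nodes via a DaemonSet that writes the script into the kubelet plugin directory. This is a hypothetical sketch (names and images are assumptions, and the hosts still need cifs-utils and jq installed):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cifs-flexvolume-installer
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cifs-flexvolume-installer
  template:
    metadata:
      labels:
        app: cifs-flexvolume-installer
    spec:
      initContainers:
      # Copies the fstab/cifs driver script into the node's kubelet plugin directory.
      - name: install
        image: amazonlinux:2
        command:
        - sh
        - -c
        - |
          mkdir -p /flexmnt/fstab~cifs
          curl -L -o /flexmnt/fstab~cifs/cifs https://raw.githubusercontent.com/fstab/cifs/master/cifs
          chmod 755 /flexmnt/fstab~cifs/cifs
        volumeMounts:
        - name: flexvolume-dir
          mountPath: /flexmnt
      containers:
      # Keeps the DaemonSet pod alive after the install step completes.
      - name: pause
        image: registry.k8s.io/pause:3.9
      volumes:
      - name: flexvolume-dir
        hostPath:
          path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec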

apt requires `apt update` before `apt upgrade`

Hi,
Thanks for writing:

You should include RUN apt-get upgrade in your Dockerfiles to upgrade the packages in your images.

Could you revise two things?

  1. Please use apt-get update && apt-get upgrade. update is required before upgrade; skipping it is one of the most frequently seen mistakes.
  2. apt clean is also useful. It cleans up the files in /var/cache/apt/archives. (A combined Dockerfile sketch is shown below.)
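
A sketch of what the combined Dockerfile instruction might look like (the base image is an arbitrary example):

FROM debian:bullseye-slim
# update must run first so upgrade sees current package lists;
# clean plus removing the lists keeps the layer small.
RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*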

References
https://aws.github.io/aws-eks-best-practices/security/docs/image/#update-the-packages-in-your-container-images

https://www.debian.org/doc/manuals/debian-handbook/sect.apt-get.en.html

Flesh out more Auth Federation details

It would be great to have additional information about the mechanics of federating with an AD or LDAP provider.

I think it’s worth expanding a bit on how things fit together: how the federated roles are mapped to IAM roles which, in turn, are used in the ConfigMap; how that ConfigMap gets created/updated, etc.
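
For example, a minimal aws-auth sketch assuming a hypothetical FederatedAdminRole that the identity provider maps users into; the account ID and group are placeholders:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/FederatedAdminRole
      username: federated-admin:{{SessionName}}
      groups:
      - system:masters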

Network Policy can be used to selectively allow metadata access to the pods

Is your idea request related to a problem that you've solved? Please describe.

The topic referenced above describes how to block access to instance metadata.

In my experience, the Kinesis Client Library used by some Pods does not support IRSA, so I could not block metadata access: using only IMDSv2 with the hop count set to 1, or using iptables, would affect all the Pods on the node.

However, I could use a Kubernetes Network Policy to selectively allow metadata access for specific pods.

How about adding a description of how to block metadata access using Network Policy?

Describe the best practice

You can use a Kubernetes Network Policy to block metadata access globally and selectively allow it for specific pods.

First, block access to the metadata service from all pods by adding the following policy.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-access
  namespace: example
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32

Then allow access from specific pods by adding the following policy.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metadata-access
  namespace: example
spec:
  podSelector:
    matchLabels:
      app: myapp  
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 169.254.169.254/32

CNI custom networking for managed node groups

Describe the problem
The web page section https://aws.github.io/aws-eks-best-practices/reliability/docs/networkmanagement/#cni-custom-networking claims "EKS managed node groups currently don’t support custom networking option." This conflicts with AWS official documentation.

References
Web page section: https://aws.github.io/aws-eks-best-practices/reliability/docs/networkmanagement/#cni-custom-networking : Claims "EKS managed node groups currently don’t support custom networking option."

AWS official docs: https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html shows example with managed node groups.

Recommendation to include sessionName when mapping roles in aws-auth ConfigMap

Is your idea request related to a problem that you've solved? Please describe.
When accessing the EKS cluster as an IAM entity mapped in the aws-auth ConfigMap, the username specified in the aws-auth ConfigMap is recorded in the user field of the Kubernetes audit log. If you're using an IAM role, the actual users who assume that role aren't recorded and can't be audited.

Describe the best practice
When assigning K8s RBAC permissions to an IAM role using mapRoles in the aws-auth ConfigMap, you should include {{SessionName}} in the username. That way, the audit log will record the session name, so you can track which actual user assumed the role by correlating with the CloudTrail log.

- rolearn: arn:aws:iam::XXXXXXXXXXXX:role/testRole
  username: testRole:{{SessionName}}
  groups:
    - system:masters

Accessing the IRSA service account token as a non-root user

If you run your application as a non-root user (a best practice), you cannot access the IRSA service account token because it is assigned 0600 (root-only) permissions by default. If you update the securityContext for your container to include fsGroup: 65534 (nobody), the container will be able to read the token.

spec:
  securityContext:
    fsGroup: 65534

This is supposed to be fixed in an upcoming release of k8s, kubernetes/enhancements#1598.

Add seccomp.security.alpha.kubernetes.io/pod: "runtime/default"

Is your idea request related to a problem that you've solved? Please describe.
A clear and concise description of the problem.

Describe the best practice
This implements the runtime defaults of Docker or another CRI.

Describe alternatives you've considered
After 1.19, it's part of the securityContext for the Pod or container (see the sketch below).
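
A minimal sketch of the post-1.19 form, using an arbitrary pod name and image:

apiVersion: v1
kind: Pod
metadata:
  name: seccomp-default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: nginx:latest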

Additional context
Add any other context or screenshots about the idea here.

Conflicting advice about mixed instance policies

Describe the problem
The spot instances section states this in the first paragraph: "Mixed Instance Policies with Spot Instances are a great way to increase diversity without increasing the number of node groups...". Later, in the third paragraph, the statement is: "It's recommended to isolate On-Demand and Spot capacity into separate EC2 Auto Scaling groups. This is preferred over using a base capacity strategy because the scheduling properties are fundamentally different...". It is unclear what exactly the recommendation is: mixed instance policies or separate ASGs. I think it would be best to remove the statement about mixed instance policies being "great".

References
https://github.com/aws/aws-eks-best-practices/blob/master/content/cluster-autoscaling/cluster-autoscaling.md#spot-instances

Embellish section on Forensics

Describe the best practice
Customers want additional information about how to do a forensics investigation involving containers.

This is an evolving space. Performing a forensic investigation against a container is challenging because containers are often ephemeral; by the time you realize a container has been compromised, it has been replaced. You can compensate for this by running software that warns of suspicious behavior while the container is running, but additional guidance is necessary for capturing evidence of a breach.

Cannot apply PodSecurityPolicy configurations in 'content/security/docs/pods.md'

Describe the problem

I cannot apply PodSecurityPolicy configurations in https://aws.github.io/aws-eks-best-practices/pods/

Here is the output of applying the PSP named "eks.privileged".

$ cat << EOF | k apply -f -
> apiVersion: extensions/v1beta1
> kind: PodSecurityPolicy
> metadata:
>   annotations:
>     kubernetes.io/description: privileged allows full unrestricted access to pod features,
>       as if the PodSecurityPolicy controller was not enabled.
>     seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
>   labels:
>     eks.amazonaws.com/component: pod-security-policy
>     kubernetes.io/cluster-service: "true"
>   name: eks.privileged
> spec:
>   allowPrivilegeEscalation: true
>   allowedCapabilities:
>   - '*'
>   fsGroup:
>     rule: RunAsAny
>   hostIPC: true
>   hostNetwork: true
>   hostPID: true
>   hostPorts:
>   - max: 65535
>     min: 0
>   privileged: true
>   runAsUser:
>     rule: RunAsAny
>   seLinux:
>     rule: RunAsAny
>   supplementalGroups:
>     rule: RunAsAny
>   volumes:
>   - '*'
> EOF
error: unable to recognize "STDIN": no matches for kind "PodSecurityPolicy" in version "extensions/v1beta1"

The apiVersion should be policy/v1beta1 instead of extensions/v1beta1.
In the PSP named "restricted", it is already policy/v1beta1.

Also, I cannot apply the "restricted" PSP.

$ cat <<EOF | k apply -f -
> apiVersion: policy/v1beta1
> kind: PodSecurityPolicy
> metadata:
>     name: restricted
>     annotations:
>     seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
>     apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
>     seccomp.security.alpha.kubernetes.io/defaultProfileName:  'runtime/default'
>     apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
> spec:
>     privileged: false
>     # Required to prevent escalations to root.
>     allowPrivilegeEscalation: false
>     # This is redundant with non-root + disallow privilege escalation,
>     # but we can provide it for defense in depth.
>     requiredDropCapabilities:
>     - ALL
>     # Allow core volume types.
>     volumes:
>     - 'configMap'
>     - 'emptyDir'
>     - 'projected'
>     - 'secret'
>     - 'downwardAPI'
>     # Assume that persistentVolumes set up by the cluster admin are safe to use.
>     - 'persistentVolumeClaim'
>     hostNetwork: false
>     hostIPC: false
>     hostPID: false
>     runAsUser:
>     # Require the container to run without root privileges.
>     rule: 'MustRunAsNonRoot'
>     seLinux:
>     # This policy assumes the nodes are using AppArmor rather than SELinux.
>     rule: 'RunAsAny'
>     supplementalGroups:
>     rule: 'MustRunAs'
>     ranges:
>         # Forbid adding the root group.
>         - min: 1
>         max: 65535
>     fsGroup:
>     rule: 'MustRunAs'
>     ranges:
>         # Forbid adding the root group.
>         - min: 1
>         max: 65535
>     readOnlyRootFilesystem: false
> EOF
error: error parsing STDIN: error converting YAML to JSON: yaml: line 40: did not find expected '-' indicator

It seems that some of the indents in the configuration are incorrect (e.g., annotations, rule).
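
For reference, this is roughly what the restricted policy looks like with consistent indentation, following the upstream Kubernetes example:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    - min: 1
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false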

Here are my cluster versions:

$ k version
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-eks-e16311", GitCommit:"e163110a04dcb2f39c3325af96d019b4925419eb", GitTreeState:"clean", BuildDate:"2020-03-27T22:40:13Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-eks-e16311", GitCommit:"e163110a04dcb2f39c3325af96d019b4925419eb", GitTreeState:"clean", BuildDate:"2020-03-27T22:37:12Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

References
https://aws.github.io/aws-eks-best-practices/pods/

Calico and eBPF for large cluster

In the Running large clusters section, we mention the ipvs mode of kube-proxy.

There is an AWS blog about replacing kube-proxy with Calico in eBPF mode, which seems to perform even better. EKS 1.19 and above runs on the Amazon Linux 2 AMI with kernel 5.4, so Calico in eBPF mode seems to be available. How about mentioning this AWS blog?
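
For context, the ipvs mode mentioned above is set in kube-proxy's KubeProxyConfiguration (on EKS this lives in the kube-proxy ConfigMap in kube-system; the exact layout varies by platform version, so this excerpt is a sketch):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"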

Run apt-get upgrade to upgrade OS packages in your image

Is your idea request related to a problem that you've solved? Please describe.
Reference this blog: https://pythonspeed.com/articles/security-updates-in-docker/

Describe the best practice
A clear and concise description of the best practice you developed along with any code and/or projects you used to solve the problem.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the idea here.

Stream Kubernetes audit logs to S3 using CWL subscriptions and Firehose

Is your idea request related to a problem that you've solved? Please describe.
It can be difficult to analyze the audit logs when they are in CloudWatch Logs (CWL). If they're streamed to S3, you can use Athena, Glue, SageMaker, and other AWS services to analyze them. You can also use tools like audit2rbac to create RBAC policies (e.g., roles, rolebindings, clusterroles, and clusterrolebindings) from observed behavior in the logs.

Describe the best practice

  • Create S3 bucket
  • Create IAM policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket>",
                "arn:aws:s3:::<bucket>/*"
            ]
        }
    ]
}
  • Create Lambda function from Lambda kinesis-firehose-cloudwatch-logs-processor-python blueprint
  • Configure Firehose stream to deliver logs to your bucket. Enable source record transformation and specify the Lambda function you created from the blueprint.
  • Create IAM policy to allow CWL to put data onto the stream:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<account>:role/CWLtoKinesisFirehoseRole"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "firehose:*",
            "Resource": "arn:aws:firehose:us-west-2:<account>:*"
        }
    ]
}
  • Create a subscription filter for the log group where your audit logs can be found:
aws logs put-subscription-filter --log-group-name "<log group>" --filter-name "Destination" --destination-arn "<firehose_arn>" --role-arn "<role_arn>" --filter-pattern ""

Logs streamed from CWL to S3 with Firehose are automatically compressed. The Lambda function decompresses the logs before they're written to S3.

Describe alternatives you've considered
https://github.com/rafaelpereyra/ekscw-export

Additional context
Once the logs are in S3 you can run the following:

audit2rbac --filename audit-delivery-stream-2-2020-06-19-19-01-58-c91bfdd2-d182-4803-8ba8-bca8284a5aaf --user=bob

This will generate RBAC permissions for user Bob based on observed behavior in the logs.

Fargate Profile

Describe the problem
Write a blurb on the Fargate pod execution role and issues with the path and the aws-auth ConfigMap.

References
Please include a link to the lines where the error appears.

More guidance for the upcoming deprecation of PSPs

@jicowan Currently, OPA Gatekeeper is only mentioned in two links in a Tools and Resources section at the bottom of the page. Is this adequate guidance? (I ask this naively not rhetorically.)

Given that PSP is deprecated, I'm trying to determine what the best practice should be regarding pod security. Can you discuss the decision of whether to replace and/or augment PSP with Gatekeeper or Kyverno in the body of this section? I would appreciate it if you could recommend a course of action. Or are we to assume that we should stick with PSP for now, even if we are creating a new cluster?

Originally posted by @joebowbeer in #16 (comment)

strace (seccomp) && docker profile

Is your idea request related to a problem that you've solved? Please describe.
A clear and concise description of the problem.

Describe the best practice
A clear and concise description of the best practice you developed along with any code and/or projects you used to solve the problem.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the idea here.

Fix statement about kube-bench reporting false positives

Many URLs are not linked

Describe the problem
While reading, I noticed a number of plain URLs not actually being links.

References
Submitting a PR to linkify all the URLs I could find.

Broken link to the k8s audit policy that EKS uses as a base

Describe the problem
At the top of the Detective Controls section, the guide mentions the k8s audit policy that EKS uses. The range of lines in the linked code is no longer correct because the referenced configure-helper.sh has been updated.

The guide refers to L983-L1108, but L1116-L1241 seems to be correct at this moment.

Is EKS still using the latest updated GCP configure-helper.sh as a base? The EKS user guide does not mention the k8s audit policy that EKS uses.

References
https://aws.github.io/aws-eks-best-practices/security/docs/detective.html

PSPs EOL: Transition to Policy as Code (PaC) solutions and/or Pod Security Standards (PSS)

Is your idea request related to a problem that you've solved? Please describe.
A clear and concise description of the problem.

Describe the best practice
A clear and concise description of the best practice you developed along with any code and/or projects you used to solve the problem.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the idea here.

Monitoring Control Plane Metrics shows guidance for deprecated metrics

Describe the problem

Metric | Notes
etcd_request_latencies_summary | Deprecated: kubernetes/kubernetes#76496; replaced with etcd_request_duration_seconds
etcd_helper_cache_entry_total | Deprecated: kubernetes/kubernetes#79520
etcd_helper_cache_hit_total | Deprecated: kubernetes/kubernetes#79520
etcd_helper_cache_miss_total | Deprecated: kubernetes/kubernetes#79520
etcd_request_cache_get_duration_seconds | Deprecated: kubernetes/kubernetes#79520
etcd_request_cache_add_duration_seconds | Deprecated: kubernetes/kubernetes#79520

The PR/code changes for this deprecation do not list alternatives.

Can anyone provide alternatives for these deprecated metrics?
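
For the first row, the linked PR names etcd_request_duration_seconds as the replacement. A PromQL sketch for p99 request latency using it (the operation label is an assumption based on upstream code; the cache metrics from #79520 appear to have no direct replacements):

histogram_quantile(0.99,
  sum(rate(etcd_request_duration_seconds_bucket[5m])) by (operation, le))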

References
https://aws.github.io/aws-eks-best-practices/reliability/docs/controlplane/#monitor-control-plane-metrics

Upstream Kubernetes lifecycle changes

Describe the problem
The support window for minor versions of upstream Kubernetes has been extended to one year, starting with 1.19.
The release cadence for minor versions of upstream Kubernetes has changed to three releases per year, starting with 1.22.
I think these changes need to be reflected in the EKS best practices documentation.

https://kubernetes.io/blog/2020/08/31/kubernetes-1-19-feature-one-year-support/
https://kubernetes.io/blog/2021/07/20/new-kubernetes-release-cadence/

References
https://aws.github.io/aws-eks-best-practices/reliability/docs/controlplane/#handling-cluster-upgrades
