aws / aws-node-termination-handler Goto Github PK

View Code? Open in Web Editor NEW

1.6K 21.0 263.0 1.96 MB

Gracefully handle EC2 instance shutdown within Kubernetes

Home Page: https://aws.amazon.com/ec2

License: Apache License 2.0

Dockerfile 0.28% Makefile 1.18% Go 55.94% Shell 40.34% Mustache 1.55% PowerShell 0.71%

kubernetes aws-ec2 spot-instances eks golang maintenance-events

aws-node-termination-handler's Introduction

AWS Node Termination Handler

Gracefully handle EC2 instance shutdown within Kubernetes

Project Summary

This project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as EC2 maintenance events, EC2 Spot interruptions, ASG Scale-In, ASG AZ Rebalance, and EC2 Instance Termination via the API or Console. If not handled, your application code may not stop gracefully, take longer to recover full availability, or accidentally schedule work to nodes that are going down.

The aws-node-termination-handler (NTH) can operate in two different modes: Instance Metadata Service (IMDS) or the Queue Processor.

The aws-node-termination-handler Instance Metadata Service Monitor will run a small pod on each host to perform monitoring of IMDS paths like /spot or /events and react accordingly to drain and/or cordon the corresponding node.

The aws-node-termination-handler Queue Processor will monitor an SQS queue of events from Amazon EventBridge for ASG lifecycle events, EC2 status change events, Spot Interruption Termination Notice events, and Spot Rebalance Recommendation events. When NTH detects an instance is going down, we use the Kubernetes API to cordon the node to ensure no new work is scheduled there, then drain it, removing any existing work. The termination handler Queue Processor requires AWS IAM permissions to monitor and manage the SQS queue and to query the EC2 API.

You can run the termination handler on any Kubernetes cluster running on AWS, including self-managed clusters and those created with Amazon Elastic Kubernetes Service. If you're using EKS managed node groups, you don't need the aws-node-termination-handler.

Major Features

Both modes (IMDS and Queue Processor) monitor for events affecting your EC2 instances, but each supports different types of events. Both modes have the following:

Helm installation and event configuration support
Webhook feature to send shutdown or restart notification messages
Unit & integration tests

Instance Metadata Service Processor

Must be deployed as a Kubernetes DaemonSet.

Monitors EC2 Instance Metadata for:

Queue Processor

Must be deployed as a Kubernetes Deployment. Also requires some additional infrastructure setup (including SQS queue, EventBridge rules).

Monitors an SQS Queue for:
- Spot Instance Termination Notifications
- Scheduled Events (via AWS Health)
- Instance Rebalance Recommendations
- ASG Termination Lifecycle Hooks to handle the following:
  - ASG Scale-In
  - Availability Zone Rebalance
  - Unhealthy Instances, and more
- Instance State Change events

We can use the Queue Processor for both ASG Lifecycle Termination Hooks and Instance State Change Events for termination of nodes. Below listed are the details on how AWS EC2 takes actions for graceful shutdowns. You can pick one that is best suitable for your use, based on the configuration and workloads.

Queue Processor with ASG Lifecycle Hooks

When using the ASG Lifecycle Hooks, ASG first sends the lifecycle action notification then waits until it has been completed or times out. This allows time for NTH to receive the notification via SQS, cordon and drain the node, and then complete the lifecycle action. Once the ASG receives the completion it then instructs EC2 to terminate the instance.

Queue Processor with Instance State Change Events

When using the EC2 Console or EC2 API to terminate the instance, a state-change notification is sent and the instance termination is started. EC2 does not wait for a "continue" signal before beginning to terminate the instance. When you terminate an EC2 instance, it should trigger a graceful operating system shutdown which will send a SIGTERM to the kubelet, which will in-turn start shutting down pods by propagating that SIGTERM to the containers on the node. If the containers do not shut down by the kubelet's podTerminationGracePeriod (k8s default is 30s), then it will send a SIGKILL to forcefully terminate the containers. Setting the podTerminationGracePeriod to a max of 90sec (probably a bit less than that) will delay the termination of pods, which helps in graceful shutdown.

Which one should I use?

Feature	IMDS Processor	Queue Processor
Spot Instance Termination Notifications (ITN)	✅	✅
Scheduled Events	✅	✅
Instance Rebalance Recommendation	✅	✅
AZ Rebalance Recommendation	❌	✅
ASG Termination Lifecycle Hooks	❌	✅
Instance State Change Events	❌	✅

Kubernetes Compatibility

NTH Release	K8s v1.30	K8s v1.29	K8s v1.28	K8s v1.27	K8s v1.26	K8s v1.25	K8s v1.24	K8s v1.23
v1.22.1	✅	✅	✅	✅	✅	✅	✅	✅
v1.22.0	✅	✅	✅	✅	✅	✅	✅	✅
v1.21.0	❌	✅	✅	✅	✅	✅	✅	✅
v1.20.0	❌	❌	✅	✅	✅	✅	✅	✅
v1.19.0	❌	❌	❌	❌	❌	❌	✅	✅

A ✅ indicates that a specific aws-node-termination-handler release has been tested with a specific Kubernetes version. A ❌ indicates that a specific aws-node-termination-handler release has not been tested with a specific Kubernetes version.

Installation and Configuration

The aws-node-termination-handler can operate in two different modes: IMDS Processor and Queue Processor. The enableSqsTerminationDraining helm configuration key or the ENABLE_SQS_TERMINATION_DRAINING environment variable are used to enable the Queue Processor mode of operation. If enableSqsTerminationDraining is set to true, then IMDS paths will NOT be monitored. If the enableSqsTerminationDraining is set to false, then IMDS Processor Mode will be enabled. Queue Processor Mode and IMDS Processor Mode cannot be run at the same time.

IMDS Processor Mode allows for a fine-grained configuration of IMDS paths that are monitored. There are currently 3 paths supported that can be enabled or disabled by using the following helm configuration keys:

enableSpotInterruptionDraining
enableRebalanceMonitoring
enableScheduledEventDraining

By default, IMDS mode will only Cordon in response to a Rebalance Recommendation event (all other events are Cordoned and Drained). Cordon is the default for a rebalance event because it's not known if an ASG is being utilized and if that ASG is configured to replace the instance on a rebalance event. If you are using an ASG w/ rebalance recommendations enabled, then you can set the enableRebalanceDraining flag to true to perform a Cordon and Drain when a rebalance event is received.

Rebalance Recommendation is an early indicator to notify the Spot Instances that they can be interrupted soon. Node Termination Handler supports AZ Rebalance Recommendation only in Queue Processor mode using ASG Lifecycle Hooks. For AZ rebalances the instances are just terminated, using Lifecycle Hooks and EventBridge rule for EC2 Instance-terminate Lifecycle Action we can handle OD Instances.

The enableSqsTerminationDraining must be set to false for these configuration values to be considered.

The Queue Processor Mode does not allow for fine-grained configuration of which events are handled through helm configuration keys. Instead, you can modify your Amazon EventBridge rules to not send certain types of events to the SQS Queue so that NTH does not process those events. All events when operating in Queue Processor mode are Cordoned and Drained unless the cordon-only flag is set to true.

The enableSqsTerminationDraining flag turns on Queue Processor Mode. When Queue Processor Mode is enabled, IMDS mode will be disabled, even if you explicitly enabled any of the IMDS configuration keys. NTH cannot respond to queue events AND monitor IMDS paths. In this case, it is safe to disable IMDS for the NTH pod.

AWS Node Termination Handler - IMDS Processor

Installation and Configuration

The termination handler DaemonSet installs into your cluster a ServiceAccount, ClusterRole, ClusterRoleBinding, and a DaemonSet. All four of these Kubernetes constructs are required for the termination handler to run properly.

Pod Security Admission

When using Kubernetes Pod Security Admission it is recommended to assign the [privileged](https://kubernetes.io/docs/concepts/security/pod-security-standards/#privileged) level.

Kubectl Apply

You can use kubectl to directly add all of the above resources with the default configuration into your cluster.

kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.22.1/all-resources.yaml

For a full list of releases and associated artifacts see our releases page.

Helm

The easiest way to configure the various options of the termination handler is via helm. The chart for this project is hosted in helm/aws-node-termination-handler

To get started you need to authenticate your helm client

aws ecr-public get-login-password \
  --region us-east-1 | helm registry login \
  --username AWS \
  --password-stdin public.ecr.aws

Once that is complete you can install the termination handler. We've provided some sample setup options below. Make sure to replace CHART_VERSION with the version you want to install.

Zero Config:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

Enabling Features:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining="true" \
  --set enableRebalanceMonitoring="true" \
  --set enableScheduledEventDraining="false" \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

The enable* configuration flags above enable or disable IMDS monitoring paths.

Running Only On Specific Nodes:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set nodeSelector.lifecycle=spot \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

Webhook Configuration:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set webhookURL=https://hooks.slack.com/services/YOUR/SLACK/URL \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

Alternatively, pass Webhook URL as a Secret:

WEBHOOKURL_LITERAL="webhookurl=https://hooks.slack.com/services/YOUR/SLACK/URL"

kubectl create secret -n kube-system generic webhooksecret --from-literal=$WEBHOOKURL_LITERAL

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set webhookURLSecretName=webhooksecret \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

For a full list of configuration options see our Helm readme.

AWS Node Termination Handler - Queue Processor (requires AWS IAM Permissions)

Infrastructure Setup

The termination handler requires some infrastructure prepared before deploying the application. In a multi-cluster environment, you will need to repeat the following steps for each cluster.

You'll need the following AWS infrastructure components:

Amazon Simple Queue Service (SQS) Queue
AutoScaling Group Termination Lifecycle Hook
Instance Tagging
Amazon EventBridge Rule
IAM Role for the aws-node-termination-handler Queue Processing Pods

Optional AWS infrastructure components:

AutoScaling Group Launch Lifecycle Hook

1. Create an SQS Queue:

Here is the AWS CLI command to create an SQS queue to hold termination events from ASG and EC2, although this should really be configured via your favorite infrastructure-as-code tool like CloudFormation (template here) or Terraform:

## Queue Policy
QUEUE_POLICY=$(cat <<EOF
{
    "Version": "2012-10-17",
    "Id": "MyQueuePolicy",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Service": ["events.amazonaws.com", "sqs.amazonaws.com"]
        },
        "Action": "sqs:SendMessage",
        "Resource": [
            "arn:aws:sqs:${AWS_REGION}:${ACCOUNT_ID}:${SQS_QUEUE_NAME}"
        ]
    }]
}
EOF
)

## make sure the queue policy is valid JSON
echo "$QUEUE_POLICY" | jq .

## Save queue attributes to a temp file
cat << EOF > /tmp/queue-attributes.json
{
  "MessageRetentionPeriod": "300",
  "Policy": "$(echo $QUEUE_POLICY | sed 's/\"/\\"/g' | tr -d -s '\n' " ")",
  "SqsManagedSseEnabled": "true"
}
EOF

aws sqs create-queue --queue-name "${SQS_QUEUE_NAME}" --attributes file:///tmp/queue-attributes.json

If you are sending Lifecycle termination events from ASG directly to SQS, instead of through EventBridge, then you will also need to create an IAM service role to give Amazon EC2 Auto Scaling access to your SQS queue. Please follow these linked instructions to create the IAM service role: link. Note the ARNs for the SQS queue and the associated IAM role for Step 2.

There are some caveats when using server side encryption with SQS:

using SSE-KMS with a customer managed key requires changing the KMS key policy to allow EventBridge to publish events to SQS.
using SSE-KMS with an AWS managed key is not supported as the KMS key policy can't be updated to allow EventBridge to publish events to SQS.
using SSE-SQS doesn't require extra setup and works out of the box as SQS queues without encryption at rest.

2. Create an ASG Termination Lifecycle Hook:

Here is the AWS CLI command to create a termination lifecycle hook on an existing ASG when using EventBridge, although this should really be configured via your favorite infrastructure-as-code tool like CloudFormation or Terraform:

aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name=my-k8s-term-hook \
  --auto-scaling-group-name=my-k8s-asg \
  --lifecycle-transition=autoscaling:EC2_INSTANCE_TERMINATING \
  --default-result=CONTINUE \
  --heartbeat-timeout=300

If you want to avoid using EventBridge and instead send ASG Lifecycle events directly to SQS, instead use the following command, using the ARNs from Step 1:

aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name=my-k8s-term-hook \
  --auto-scaling-group-name=my-k8s-asg \
  --lifecycle-transition=autoscaling:EC2_INSTANCE_TERMINATING \
  --default-result=CONTINUE \
  --heartbeat-timeout=300 \
  --notification-target-arn <your queue ARN here> \
  --role-arn <your SQS access role ARN here>

3. Tag the Instances:

By default the aws-node-termination-handler will only manage terminations for instances tagged with key=aws-node-termination-handler/managed. The value of the key does not matter.

To tag ASGs and propagate the tags to your instances (recommended):

aws autoscaling create-or-update-tags \
  --tags ResourceId=my-auto-scaling-group,ResourceType=auto-scaling-group,Key=aws-node-termination-handler/managed,Value=,PropagateAtLaunch=true

To tag an individual EC2 instance:

aws ec2 create-tags \
    --resources i-1234567890abcdef0 \
    --tags 'Key="aws-node-termination-handler/managed",Value='

Tagging your EC2 instances in this way is helpful if you only want aws-node-termination-handler to manage the lifecycle of instances in certain ASGs. For example, if your account also has other ASGs that do not contain Kubernetes nodes, this tagging mechanism will ensure that NTH does not manage the lifecycle of any instances in those non-Kubernetes ASGs.

However, if the only ASGs in your account are for your Kubernetes cluster, then you can turn off the tag check by setting the flag --check-tag-before-draining=false or environment variable CHECK_TAG_BEFORE_DRAINING=false.

You can also control what resources NTH manages by adding the resource ARNs to your Amazon EventBridge rules.

Take a look at the docs on how to create rules that only manage certain ASGs, and read about all the supported ASG events.

4. Create Amazon EventBridge Rules

You may skip this step if sending events from ASG to SQS directly.

If we use ASG with capacity-rebalance enabled on ASG, then we do not need Spot and Rebalance events enabled with EventBridge. ASG will send a termination lifecycle hook for spot interrruptions while it's launching a new instance and for Rebalance events ASG will send a termination lifecycle hook after it brings a new node in the ASG.

If we use ASG without capacity-rebalance enabled, then spot interruptions will cause a termination lifecycle hook after the interruption occurs but not while launching the new instance.

Here are AWS CLI commands to create Amazon EventBridge rules so that ASG termination events, Spot Interruptions, Instance state changes, Rebalance Recommendations, and AWS Health Scheduled Changes are sent to the SQS queue created in the previous step. This should really be configured via your favorite infrastructure-as-code tool like CloudFormation (template here) or Terraform:

aws events put-rule \
  --name MyK8sASGTermRule \
  --event-pattern "{\"source\":[\"aws.autoscaling\"],\"detail-type\":[\"EC2 Instance-terminate Lifecycle Action\"]}"

aws events put-targets --rule MyK8sASGTermRule \
  --targets "Id"="1","Arn"="arn:aws:sqs:us-east-1:123456789012:MyK8sTermQueue"

aws events put-rule \
  --name MyK8sSpotTermRule \
  --event-pattern "{\"source\": [\"aws.ec2\"],\"detail-type\": [\"EC2 Spot Instance Interruption Warning\"]}"

aws events put-targets --rule MyK8sSpotTermRule \
  --targets "Id"="1","Arn"="arn:aws:sqs:us-east-1:123456789012:MyK8sTermQueue"

aws events put-rule \
  --name MyK8sRebalanceRule \
  --event-pattern "{\"source\": [\"aws.ec2\"],\"detail-type\": [\"EC2 Instance Rebalance Recommendation\"]}"

aws events put-targets --rule MyK8sRebalanceRule \
  --targets "Id"="1","Arn"="arn:aws:sqs:us-east-1:123456789012:MyK8sTermQueue"

aws events put-rule \
  --name MyK8sInstanceStateChangeRule \
  --event-pattern "{\"source\": [\"aws.ec2\"],\"detail-type\": [\"EC2 Instance State-change Notification\"]}"

aws events put-targets --rule MyK8sInstanceStateChangeRule \
  --targets "Id"="1","Arn"="arn:aws:sqs:us-east-1:123456789012:MyK8sTermQueue"

aws events put-rule \
  --name MyK8sScheduledChangeRule \
  --event-pattern "{\"source\": [\"aws.health\"],\"detail-type\": [\"AWS Health Event\"],\"detail\": {\"service\": [\"EC2\"],\"eventTypeCategory\": [\"scheduledChange\"]}}"

aws events put-targets --rule MyK8sScheduledChangeRule \
  --targets "Id"="1","Arn"="arn:aws:sqs:us-east-1:123456789012:MyK8sTermQueue"

5. Create an IAM Role for the Pods

There are many different ways to allow the aws-node-termination-handler pods to assume a role:

IAM Policy for aws-node-termination-handler Deployment:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:CompleteLifecycleAction",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeTags",
                "ec2:DescribeInstances",
                "sqs:DeleteMessage",
                "sqs:ReceiveMessage"
            ],
            "Resource": "*"
        }
    ]
}

1. Handle ASG Instance Launch Lifecycle Notifications (optional):

NTH can monitor for new instances launched by an ASG and notify the ASG when the instance is available in the EKS cluster.

NTH will need to receive notifications of new instance launches within the ASG. We can add a lifecycle hook to the ASG that will send instance launch notifications via EventBridge:

aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name=my-k8s-launch-hook \
  --auto-scaling-group-name=my-k8s-asg \
  --lifecycle-transition=autoscaling:EC2_INSTANCE_LAUNCHING \
  --default-result="ABANDON" \
  --heartbeat-timeout=300

Alternatively, ASG can send the instance launch notification directly to an SQS Queue:

aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name=my-k8s-launch-hook \
  --auto-scaling-group-name=my-k8s-asg \
  --lifecycle-transition=autoscaling:EC2_INSTANCE_LAUNCHING \
  --default-result="ABANDON" \
  --heartbeat-timeout=300 \
  --notification-target-arn <your queue ARN here> \
  --role-arn <your SQS access role ARN here>

When NTH receives a launch notification, it will periodically check for a node backed by the EC2 instance to join the cluster and for the node to have a status of 'ready.' Once a node becomes ready, NTH will complete the lifecycle hook, prompting the ASG to proceed with terminating the previous instance. If the lifecycle hook is not completed before the timeout, the ASG will take the default action. If the default action is 'ABANDON', the new instance will be terminated, and the notification process will be repeated with another new instance.

Installation

Pod Security Admission

When using Kubernetes Pod Security Admission it is recommended to assign the [baseline](https://kubernetes.io/docs/concepts/security/pod-security-standards/#baseline) level.

Helm

The easiest way to configure the various options of the termination handler is via helm. The chart for this project is hosted in helm/aws-node-termination-handler

To get started you need to authenticate your helm client

aws ecr-public get-login-password \
     --region us-east-1 | helm registry login \
     --username AWS \
     --password-stdin public.ecr.aws

Once that is complete you can install the termination handler. We've provided some sample setup options below. Make sure to replace CHART_VERSION with the version you want to install.

Minimal Config:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.us-east-1.amazonaws.com/0123456789/my-term-queue \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

Webhook Configuration:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.us-east-1.amazonaws.com/0123456789/my-term-queue \
  --set webhookURL=https://hooks.slack.com/services/YOUR/SLACK/URL \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

Alternatively, pass Webhook URL as a Secret:

WEBHOOKURL_LITERAL="webhookurl=https://hooks.slack.com/services/YOUR/SLACK/URL"

kubectl create secret -n kube-system generic webhooksecret --from-literal=$WEBHOOKURL_LITERAL

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.us-east-1.amazonaws.com/0123456789/my-term-queue \
  --set webhookURLSecretName=webhooksecret \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

For a full list of configuration options see our Helm readme.

Single Instance vs Multiple Replicas

The Helm chart, by default, will deploy a single instance of Amazon Node Termination Handler. With the minimizing of resource usage, a single instance still provides good responsiveness in processing SQS messages.

When should multiple instances of Amazon Node Termination Handler be used?

Responsiveness: Amazon Node Termination Handler may be taking longer than desired to process certain events, potentially in processing numerous concurrent events or taking too long to drain Pods. The deployment of multiple Amazon Node Termination Handler instances may help.
Availability: The deployment of multiple Amazon Node Termination Handler instances provides mitigation in the case that Amazon Node Termination Handler itself is drained. Replica Amazon Node Termination Handlers will process SQS messages, avoiding a delay until the Deployment can start another instance.

Notes

Running multiple instances of Amazon Node Termination Handler will not load balance responding to events. Each instance will greedily consume and respond to events.
Logs from multiple instances of Amazon Node Termination Handler are not aggregated.
Multiple instances of Amazon Node Termination Handler may respond to the same event, if it takes longer than 20s to process. This is not an error case, only the first response will have an affect.

Kubectl Apply

Queue Processor needs an SQS queue URL to function; therefore, manifest changes are REQUIRED before using kubectl to directly add all of the above resources into your cluster.

Minimal Config:

curl -L https://github.com/aws/aws-node-termination-handler/releases/download/v1.22.1/all-resources-queue-processor.yaml -o all-resources-queue-processor.yaml
<open all-resources-queue-processor.yaml and update QUEUE_URL value>
kubectl apply -f ./all-resources-queue-processor.yaml

For a full list of releases and associated artifacts see our releases page.

Use with Kiam

If you are using IMDS mode which defaults to hostNetworking: true, or if you are using queue-processor mode, then this section does not apply. The configuration below only needs to be used if you are explicitly changing NTH IMDS mode to hostNetworking: false .

To use the termination handler alongside Kiam requires some extra configuration on Kiam's end. By default Kiam will block all access to the metadata address, so you need to make sure it passes through the requests the termination handler relies on.

To add a whitelist configuration, use the following fields in the Kiam Helm chart values:

agent.whiteListRouteRegexp: '^\/latest\/meta-data\/(spot\/instance-action|events\/maintenance\/scheduled|instance-(id|type)|public-(hostname|ipv4)|local-(hostname|ipv4)|placement\/availability-zone)|\/latest\/dynamic\/instance-identity\/document$'

Or just pass it as an argument to the kiam agents:

kiam agent --whitelist-route-regexp='^\/latest\/meta-data\/(spot\/instance-action|events\/maintenance\/scheduled|instance-(id|type)|public-(hostname|ipv4)|local-(hostname|ipv4)|placement\/availability-zone)|\/latest\/dynamic\/instance-identity\/document$'

Metadata endpoints

The termination handler relies on the following metadata endpoints to function properly:

/latest/dynamic/instance-identity/document
/latest/meta-data/spot/instance-action
/latest/meta-data/events/recommendations/rebalance
/latest/meta-data/events/maintenance/scheduled
/latest/meta-data/instance-id
/latest/meta-data/instance-life-cycle
/latest/meta-data/instance-type
/latest/meta-data/public-hostname
/latest/meta-data/public-ipv4
/latest/meta-data/local-hostname
/latest/meta-data/local-ipv4
/latest/meta-data/placement/availability-zone

Building

For build instructions please consult BUILD.md.

Metrics

Available Prometheus metrics:

Metric name	Description
`actions`	Number of actions
`actions_node`	Number of actions per node (Deprecated: Use actions metric instead)
`events_error`	Number of errors in events processing

Communication

If you've run into a bug or have a new feature request, please open an issue.
You can also chat with us in the Kubernetes Slack in the #provider-aws channel
Check out the open source Amazon EC2 Spot Instances Integrations Roadmap to see what we're working on and give us feedback!

Contributing

Contributions are welcome! Please read our guidelines and our Code of Conduct

License

This project is licensed under the Apache-2.0 License.

aws-node-termination-handler's People

Contributors

Stargazers

Watchers

Forkers

mustafakirimli sharmaansh21 jaypipes kniec dgamo bwagner5 tinchi eddycharly daitc2004 kkrampa jeffwan pdk27 opuzaman dasydong nithu0115 ldgithub007 athiwatp brychcy haugenj carles-figuerola jillmon shankarramshivram kavichu lemass enxebre blackbookusa mazzy89 coderanger andredurao gladiatr72 jnjmarte marcincuber qiaoxingli cjerad diversario eduardozimelewicz lazar-basiq tjto ssheff manute ozkologlu vladh91 mwconceicao aavi flant kavicse87 leosunmo limed yurrriq takanabe dheerajjoshi rifelpet toneill818 junneyang paulopontesm dp19 tdien105 stevenbressey madhubysani01 leewalter gsuryatej wanwenli reddynitheeesh stasta kppullin yahiakhidr blakestoddard cosmos0703 shashisingh hangchan mikesplain mikesplainsonos derdanne taisyo7333 patrickdeutsch-wk universam1 o11n dbacker-atg jarrettprosser yuri-1987 pobek boostrack gabegorelick jalawala tingpingliudd01 patthesilent fawadkhaliq sliaptsou inarongrit tyrken azenk mvisonneau victorboissiere adevhammer imuqtadir james-callahan engineering-bjs taharah oluwaseunmoshood marciogmorales

aws-node-termination-handler's Issues

JSON Log Output

Is there any possibilities to implement a JSON flag in order to log with JSON format instead of human readable format?
zerolog with zerolog.ConsoleWriter could be used in order to do that?

Thanks

Error getting response from instance metadata

Hello.

I am trying to test node-termination-handler. It has been running for a couple of weeks and I didn't see anything in logs except two problems. One is described in #20, another error is:

Error getting response from instance metadata Get http://169.254.169.254/latest/meta-data/spot/instance-action: dial tcp 169.254.169.254:80: connect: connection refused

After the error the handler died and was restarted. I cannot really understand if the tool works somehow or not. 🤔 Is the error critical or it is something temporary? Do I need to react on it?

I would appreciate any instructions how to use the handler and how its logs really must look like.

Support IMDSv2

Support the recently released instance metadata service v2 (https://aws.amazon.com/about-aws/whats-new/2019/11/announcing-updates-amazon-ec2-instance-metadata-service/)

Change namespace used

It'd be good if this didn't use the default namespace.

I had first thought it would be good to just remove the namespace entirely from the YAML manifest (see #2) but it is required for the ClusterRoleBinding so I think it might make more sense to change the namespace to something more appropriate.
My suggestion would either be default it to kube-system or have it create a aws-node-termination-handler namespace to use solely for this.

Handle instance retirement events

From time to time AWS is retiring EC2 instances due to underlying hardware or software failures https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html#types-of-scheduled-events

I would be awesome if the aws-node-termination-handler could watch the instance meta data service to react automatically to those events https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html#viewing_scheduled_events and drain the affected instance.

Would this be something in scope of this project?

Emulate Spot Instance ITN events

It would be nice to be able to emulate the sorts of events this daemonset reacts to. We are interested in doing more testing around implementing this daemonset, and how it would interact with our application, but I don't see an obvious path forward from a test perspective.

Node replacement on our spot pool is relatively infrequent. I would be open to sending a PR if the team has interest. My initial thought was a signal handler (SIGQUIT, SIGHUP, SIGUSR1/2, etc). If that's too out-of-left-field, maybe webhook or webhook response?
Thanks!

Missing DrainEvent metadata for SpotITN

The spot instance termination notice endpoint is only populating the timestamp of the event in the DrainEvent struct.

You can see this in the logs:

Got drain event from channel {EventID: Kind:SPOT_ITN Description:Spot ITN received. will be at 2020-03-02T18:20:55Z

The documentation for the notice API also only indicates a time is returned: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html#instance-action-metadata

Can you confirm that this is working as expected? If so, will probably need to add another step to fetch more instance metadata during these notices, as at the moment the logs (and webhook notifications) are pretty sparse.

Thanks!

Another example of a webhook notice showing the DrainEvent fields not being populated:

How does this compare to the similar kube-aws project?

We're currently using https://github.com/kube-aws/kube-spot-termination-notice-handler. What are the differences between this project and that one? Are there reasons to use one over the other, either for techincal or community reasons?

Use with managed node groups

Hi,

Since managed node groups were introduced, according to AWS:

Amazon EKS automatically drains nodes using the Kubernetes API during terminations or updates. Updates respect the pod disruption budgets that you set for your pods.

Does this mean aws-node-termination-handler is not required when used in conjunction with managed node groups during maintenance events? Does anyone have some experience around that?

Also with managed node groups backed by spot instances in the horizon it would be good to get some clarity in case there is overlapping functionality.

Thank you

Cluster With On-Demand And Spot Instances

I am creating an EKS cluster with On-Demand and Spot Instances. This module run a pod on on-demand instances as well. There should be a way to run pods only on spot instances.

aws-node-termination-handler Network I/O issues with pod

I don't have any way to reproduce this issue, but found it after taking a look at bills. Few days ago something starting eating ~100$/day after investigation - it was a NAT gateway and I found a source of traffic.

First screen shows overall picture and second - exact timeframe when it started.
Hopefully the form of graph will help to identify an issue.
Here is a part of log. It's full of that:

2020/06/16 11:54:04 Trying to get token from IMDSv2
2020/06/16 11:54:04 Got token from IMDSv2
2020/06/16 12:53:52 Trying to get token from IMDSv2
2020/06/16 12:53:52 Got token from IMDSv2
2020/06/16 13:53:40 Trying to get token from IMDSv2
2020/06/16 13:53:40 Got token from IMDSv2

Which is actually the same as everywhere else.. And I couldn't find anything from logs.

After deleting pod - new one scheduled pod has same issue with Network IO

Document required IAM Policy

I cannot seem to find the IAM Permissions that are required by the application in the form of an IAM Policy. Does this already exist?

Add configurable delay to interruption handling

Currently the daemonset immediately cordons and drains a node upon ITN detection.

We should allow users to define a period to wait before initializing cordon and drain such that they can still perform work in that window. e.g. if you can successfully cordon and drain in 30 seconds, delay the process 90 seconds.

Add support for Instance Refresh for EC2 Auto Scaling

Can the AWS Node Termination Handler handle the events triggered by Instance Refresh for EC2 Auto Scaling to provide a seamless worker upgrade path? If not is this something that could be supported?

Spot nodes not labeled as such

I'm not seeing the lifecycle: Ec2Spot label associated with my nodes running on spot instances. Is there something in addition to this project that needs to run for this label to appear?

Running EKS 1.14, all nodes running on spot instances.

Prometheus metric path

Hi,

Thanks for this project, it's very useful.

Looking at the documentation I could not find any prometheus path, so Can it be added or do you think is not needed?

I can work on a PR if that's ok.

[edited: typo]

Fix log output on e2e tests when a failure occurs

When running the E2E test suites, the NTH logs are printed on a successful test run, but they are not print when a test failure occurs. NTH logs should be printed when the tests succeed or fail.

Provide k8s yaml resource files in addition to helm chart

Some users would rather use the Kubernetes resource yaml files directly rather than installing via helm charts. This presents a problem with synchronizing the resource files and the helm chart.

We already store a local copy of the helm chart in this repo and the distributed copy in the github.com/aws/eks-charts repo. The synchronization between these two locations is done manually but enforced via a travis test which diffs the two locations.

A possible solution is to treat the helm chart as the source of truth and use helm template to generate the yaml resource files and then upload as a tar'd artifact in our github releases. This could all be done via travis (we're already automatically generating binaries and uploading to releases via travis). The generation process would ensure that the default values are the same between the plain-yaml files and the helm default values.

Expand to include non-spot instances

We currently use a lambda to drain our instances, would prefer to use this.

helm3 does not have --name flag

Slack notification support

We are running several EKS clusters on spot instances. So far we are using https://github.com/kube-aws/kube-spot-termination-notice-handler to handle node termination but would love to switch to this project.

However, we need support for slack notifications.

/cc @ecktom

More DrainEvent metadata to potentially use in the webhook template

Hello, thank you for open sourcing this, it seems to be working well!

I noticed that the webhook template allows you to use fields from the associated DrainEvent. It would be helpful if more fields were available, like node name, instance type, and potentially instance id. It seems like these could all come from the Node object

Create Helm Chart

Feature Request: Detatch Instance from ASG upon Termination Event

The other termination-handler project at https://github.com/kube-aws/kube-spot-termination-notice-handler is capable of detaching the instance from its ASG when a termination event occurs. Link

This feature is useful to compensate for the extremely long lag time between instance termination and another ASG replacement coming online. If the instance is detached from the asg immediately, a new node can be starting up during the 120s grace period.

Currently, my only method of ensuring minimum compute capacity is to run an extra node or two at all times so that pods get rescheduled onto other nodes when one of them is drained. If a new instance comes online fast enough for the drained pods to reschedule, the problem is solved.

TL;DR: Remove instance from ASG early to bring new nodes online faster.

Cannot send slack webhook call via proxy

We only allow external traffic via an outgoing proxy. We should be able to pass ENV variables to the daemonset, so we can pass in https_proxy, and slack call should use this value if it is set. Otherwise, the slack calls are not getting out of the network.

RBAC question: DaemonSet delete

Your sample deployment manifest includes delete permissions for daemonsets. Is this actually used? It looks like maybe it's only needed for running your test suite, and not for normal usage?

Tighten Default RBAC Rules

First, great work on this! It seems like the RBAC could be paired down to a much tighter set of controls.

It looks like these have some overlapping rules, namely having apiGroups: "", resources: "*"verbs: "*" as well as all the rest of the declarations. I also wouldn't have apiGroups: rbac.authorization.k8s.io, resources: "*", verbs: "*"

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: node-termination-handler
  namespace: default
rules:
- apiGroups:
  - "apps"
  resources:
  - "daemonsets"
  verbs:
  - get
  - delete
- apiGroups:
  - ""
  resources:
  - "*"
  verbs:
  - "*"
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - "*"
  verbs:
  - "*"
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - list
  - watch
  - create
  - delete

node getting marked as SchedulingDisabled, but does not drain

1. What happened
The aws-node-termination-handler marked the node as SchedulingDisabled but, it did not drain the node.

2. What did you expect to happen?
Expected the drain command to evict pods running on the node.

3. What commands did you run? What is the simplest way to reproduce this issue?

Running this version of aws-node-termination-handler:

https://github.com/aws/aws-node-termination-handler/releases/download/v1.3.1/all-resources.yaml

updated the manifest with (which can also be seen in the beginning of pod logs) these values:

        - name: DELETE_LOCAL_DATA
          value: "false"
        - name: IGNORE_DAEMON_SETS
          value: "true"
        - name: POD_TERMINATION_GRACE_PERIOD
          value: "-1"
        - name: DRY_RUN
          value: "false"
        - name: ENABLE_SPOT_INTERRUPTION_DRAINING
          value: "true"
        - name: ENABLE_SCHEDULED_EVENT_DRAINING
          value: "true"

pod starts correctly (pod logs):
kubectl -n kube-system logs -f pod/aws-node-termination-handler-mbl5q

aws-node-termination-handler arguments:
	dry-run: false,
	node-name: ip-10-34-5-239.us-west-2.compute.internal,
	metadata-url: http://169.254.169.254,
	kubernetes-service-host: 10.255.0.1,
	kubernetes-service-port: 443,
	delete-local-data: false,
	ignore-daemon-sets: true,
	pod-termination-grace-period: -1,
	node-termination-grace-period: 120,
	enable-scheduled-event-draining: true,
	enable-spot-interruption-draining: true,
	metadata-tries: 3,
2020/04/28 22:46:42 Trying to get token from IMDSv2
2020/04/28 22:46:42 Got token from IMDSv2
ip-10-34-5-239.us-west-2.compute.internal
2020/04/28 22:46:42 Startup Metadata Retrieved: {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239}
2020/04/28 22:46:42 Started watching for drain events
2020/04/28 22:46:42 Kubernetes AWS Node Termination Handler has started successfully!
2020/04/28 22:46:42 Started monitoring for Scheduled Maintenance events
2020/04/28 22:46:42 Started watching for event cancellations
2020/04/28 22:46:42 Started monitoring for Spot ITN events

To test this, I am using the project: https://github.com/Shogan/ec2-spot-termination-simulator
Once the simulator starts, followed the readme to get it to respond with a 200 OK for the endpoints below:
root@ip-10-34-5-239:~# curl -I http://169.254.169.254/latest/meta-data/spot/instance-action

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: application/json; charset=utf-8
Content-Length: 56
ETag: W/"38-VYfXrrgMPYYVvHpYfu17uf0cEUo"
Date: Tue, 28 Apr 2020 23:02:05 GMT
Connection: keep-alive

root@ip-10-34-5-239:~# curl -I http://169.254.169.254/latest/meta-data/spot/termination-time

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html; charset=utf-8
Content-Length: 24
ETag: W/"18-19RUhUkAJcUow5fWuYAuXtXIqWw"
Date: Tue, 28 Apr 2020 23:03:00 GMT
Connection: keep-alive

The handler catches this, and issues the drain command (pod logs):

2020/04/28 23:00:29 There was a problem monitoring for Spot ITN events: There was a problem checking for spot ITNs: Unable to parse metadata response: Unable to get a response from IMDS: Get "http://169.254.169.254/latest/meta-data/spot/instance-action": dial tcp 169.254.169.254:80: connect: connection refused
2020/04/28 23:00:29 Request failed. Attempts remaining: 2
2020/04/28 23:00:29 Sleep for 2.455292045s seconds
2020/04/28 23:00:30 Request failed. Attempts remaining: 1
2020/04/28 23:00:30 Sleep for 5.658607551s seconds
2020/04/28 23:00:32 Request failed. Attempts remaining: 1
2020/04/28 23:00:32 Sleep for 7.07003247s seconds
2020/04/28 23:00:35 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Unable to parse metadata response: Unable to get a response from IMDS: Get "http://169.254.169.254/latest/meta-data/events/maintenance/scheduled": dial tcp 169.254.169.254:80: connect: connection refused
2020/04/28 23:00:35 Request failed. Attempts remaining: 2
2020/04/28 23:00:35 Sleep for 2.162389136s seconds
2020/04/28 23:00:37 Request failed. Attempts remaining: 1
2020/04/28 23:00:37 Sleep for 5.854815506s seconds
2020/04/28 23:00:39 There was a problem monitoring for Spot ITN events: There was a problem checking for spot ITNs: Unable to parse metadata response: Unable to get a response from IMDS: Get "http://169.254.169.254/latest/meta-data/spot/instance-action": dial tcp 169.254.169.254:80: connect: connection refused
2020/04/28 23:00:39 Request failed. Attempts remaining: 2
2020/04/28 23:00:39 Sleep for 2.346887455s seconds
2020/04/28 23:00:41 Request failed. Attempts remaining: 1
2020/04/28 23:00:41 Sleep for 4.813488343s seconds
2020/04/28 23:00:43 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:43 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:44 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:46 Sending drain event to the drain channel
2020/04/28 23:00:46 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-e3fc043a893b0aa781dd6cf767711a4256f38164a35cb76e260a3f51681ea65d Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:46.243Z
 State: StartTime:2020-04-28 23:02:46.243 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
2020/04/28 23:00:46 Sending drain event to the drain channel
2020/04/28 23:00:46 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-180911fc5ea28cddd3d3f22b95f796141b4573a1e4e5e39307116a034d72d18e Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:46.246Z
 State: StartTime:2020-04-28 23:02:46.246 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
ip-10-34-5-239.us-west-2.compute.internal
2020/04/28 23:00:46 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:46 Sending drain event to the drain channel
2020/04/28 23:00:46 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-8f856801ec2e9f3e9fb5351e098d285d0cc4071072d32ea7e4fbca3d64c9a921 Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:46.262Z
 State: StartTime:2020-04-28 23:02:46.262 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
2020/04/28 23:00:46 Node "ip-10-34-5-239.us-west-2.compute.internal" successfully drained.
2020/04/28 23:00:48 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:48 Sending drain event to the drain channel
2020/04/28 23:00:48 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-259e1dd39c74d627008861a953d00e6353199484342611434ee81fa38d55d4f8 Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:48.262Z
 State: StartTime:2020-04-28 23:02:48.262 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
ip-10-34-5-239.us-west-2.compute.internal
2020/04/28 23:00:49 Node "ip-10-34-5-239.us-west-2.compute.internal" successfully drained.
2020/04/28 23:00:50 Sending drain event to the drain channel
2020/04/28 23:00:50 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-356298b6a07c6a6c21aca8644929e6446a3fc6f8223be8222a9ff473e1b68707 Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:50.261Z
 State: StartTime:2020-04-28 23:02:50.261 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
2020/04/28 23:00:50 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
ip-10-34-5-239.us-west-2.compute.internal
2020/04/28 23:00:51 Node "ip-10-34-5-239.us-west-2.compute.internal" successfully drained.
2020/04/28 23:00:52 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:52 Sending drain event to the drain channel
2020/04/28 23:00:52 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-5cebb6c8509f8ae9517eaf79136bf66b6fcf586c3e6d8814c212aaf98a29e9db Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:52.262Z
 State: StartTime:2020-04-28 23:02:52.262 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
ip-10-34-5-239.us-west-2.compute.internal
2020/04/28 23:00:53 Node "ip-10-34-5-239.us-west-2.compute.internal" successfully drained.
2020/04/28 23:00:54 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:54 Sending drain event to the drain channel
2020/04/28 23:00:54 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-6a8a5df95f6bb5d63d03f5da495c402629bd2e3ec1a58187cf4e39e98847c10c Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:54.262Z
 State: StartTime:2020-04-28 23:02:54.262 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
ip-10-34-5-239.us-west-2.compute.internal
2020/04/28 23:00:55 Node "ip-10-34-5-239.us-west-2.compute.internal" successfully drained.
2020/04/28 23:00:56 There was a problem monitoring for Scheduled Maintenance events: Unable to parse metadata response: Metadata request received http status code: 404
2020/04/28 23:00:56 Sending drain event to the drain channel
2020/04/28 23:00:56 Got drain event from channel {InstanceID:i-027a7181c7dcdf2ff InstanceType:m5n.4xlarge PublicHostname: PublicIP: LocalHostname:ip-10-34-5-239.us-west-2.compute.internal LocalIP:10.34.5.239} {EventID:spot-itn-4a2b3bf424ab3a1d8d7e456e4564f8c4847fdd27b4871b0a98e7ced199c91c2f Kind:SPOT_ITN Description:Spot ITN received. Instance will be interrupted at 2020-04-28T23:02:56.262Z
 State: StartTime:2020-04-28 23:02:56.262 +0000 UTC EndTime:0001-01-01 00:00:00 +0000 UTC Drained:false PreDrainTask:<nil>}
ip-10-34-5-239.us-west-2.compute.internal
2020/04/28 23:00:57 Node "ip-10-34-5-239.us-west-2.compute.internal" successfully drained.

The logs keep looping with these messages.

4. What happened after the commands executed?

The node was marked as SchedulingDisabled
kubectl get nodes -l kops.k8s.io/instancegroup=nodes

NAME                                        STATUS                     ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION   CONTAINER-RUNTIME
ip-10-34-3-108.us-west-2.compute.internal   Ready                      node    44m   v1.15.5   10.34.3.108   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-11-amd64   docker://18.6.3
ip-10-34-5-239.us-west-2.compute.internal   Ready,SchedulingDisabled   node    42m   v1.15.5   10.34.5.239   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-11-amd64   docker://18.6.3

kubelet logs form the node:
root@ip-10-34-5-239:~# journalctl -u kubelet.service -f

Apr 28 22:55:12 ip-10-34-5-239 kubelet[1815]: I0428 22:55:12.435800    1815 kube_docker_client.go:345] Stop pulling image "shoganator/ec2-spot-termination-simulator:1.0.1": "Status: Downloaded newer image for shoganator/ec2-spot-termination-simulator:1.0.1"
Apr 28 22:55:13 ip-10-34-5-239 kubelet[1815]: I0428 22:55:13.361318    1815 kubelet.go:1933] SyncLoop (PLEG): "spot-term-simulator-59b8bb69b7-2tsjc_default(01978ef9-eaa1-4467-948e-20febc853e46)", event: &pleg.PodLifecycleEvent{ID:"01978ef9-eaa1-4467-948e-20febc853e46", Type:"ContainerStarted", Data:"d99b10f487e0afe0527f7537eb5884381ead2c4feb926e9523524746a3d97364"}
Apr 28 22:56:16 ip-10-34-5-239 kubelet[1815]: I0428 22:56:16.186344    1815 container_manager_linux.go:457] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
.
.
.
.
Apr 28 22:59:56 ip-10-34-5-239 kubelet[1815]: I0428 22:59:56.666917    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/meta-data/network/interfaces/macs/: dial tcp 169.254.169.254:80: connect: connection refused"
Apr 28 23:00:06 ip-10-34-5-239 kubelet[1815]: I0428 23:00:06.948970    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/meta-data/network/interfaces/macs/: dial tcp 169.254.169.254:80: connect: connection refused"
Apr 28 23:00:17 ip-10-34-5-239 kubelet[1815]: I0428 23:00:17.293101    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/meta-data/network/interfaces/macs/: dial tcp 169.254.169.254:80: connect: connection refused"
Apr 28 23:00:27 ip-10-34-5-239 kubelet[1815]: I0428 23:00:27.637188    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/meta-data/network/interfaces/macs/: dial tcp 169.254.169.254:80: connect: connection refused"
Apr 28 23:00:37 ip-10-34-5-239 kubelet[1815]: I0428 23:00:37.896355    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "RequestError: send request failed\ncaused by: Get http://169.254.169.254/latest/meta-data/network/interfaces/macs/: dial tcp 169.254.169.254:80: connect: connection refused"
Apr 28 23:00:46 ip-10-34-5-239 kubelet[1815]: I0428 23:00:46.600960    1815 kubelet_node_status.go:471] Recording NodeNotSchedulable event message for node ip-10-34-5-239.us-west-2.compute.internal
Apr 28 23:00:47 ip-10-34-5-239 kubelet[1815]: I0428 23:00:47.898161    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "EC2MetadataError: failed to make EC2Metadata request\ncaused by: <!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot GET /latest/meta-data/network/interfaces/macs/</pre>\n</body>\n</html>\n"
Apr 28 23:00:57 ip-10-34-5-239 kubelet[1815]: I0428 23:00:57.899614    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "EC2MetadataError: failed to make EC2Metadata request\ncaused by: <!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot GET /latest/meta-data/network/interfaces/macs/</pre>\n</body>\n</html>\n"
Apr 28 23:01:07 ip-10-34-5-239 kubelet[1815]: I0428 23:01:07.901001    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "EC2MetadataError: failed to make EC2Metadata request\ncaused by: <!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot GET /latest/meta-data/network/interfaces/macs/</pre>\n</body>\n</html>\n"
Apr 28 23:01:16 ip-10-34-5-239 kubelet[1815]: I0428 23:01:16.186573    1815 container_manager_linux.go:457] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Apr 28 23:01:17 ip-10-34-5-239 kubelet[1815]: I0428 23:01:17.901951    1815 cloud_request_manager.go:115] Node addresses from cloud provider for node "ip-10-34-5-239.us-west-2.compute.internal" not collected: error querying AWS metadata for "network/interfaces/macs": "EC2MetadataError: failed to make EC2Metadata request\ncaused by: <!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot GET /latest/meta-data/network/interfaces/macs/</pre>\n</body>\n</html>\n"

Pods are still running on that node:
kubectl get pods -n kube-addons -o wide | grep my-nginx | grep ip-10-34-5-239.us-west-2.compute.internal

my-nginx-c58d9b7db-24qlj                                  1/1     Running   0          19m     10.255.236.66    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-2d4hn                                  1/1     Running   0          18m     10.255.236.73    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-4f2l2                                  1/1     Running   0          18m     10.255.236.74    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-8sh8d                                  1/1     Running   0          19m     10.255.236.69    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-cksnt                                  1/1     Running   0          18m     10.255.236.72    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-jgk57                                  1/1     Running   0          18m     10.255.236.75    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-jkm9p                                  1/1     Running   0          18m     10.255.236.76    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-rjvmw                                  1/1     Running   0          19m     10.255.236.70    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-rpr7m                                  1/1     Running   0          19m     10.255.236.68    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-rzbkk                                  1/1     Running   0          19m     10.255.236.67    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>
my-nginx-c58d9b7db-zj7gn                                  1/1     Running   0          18m     10.255.236.71    ip-10-34-5-239.us-west-2.compute.internal   <none>           <none>

5. What did you expect to happen?
Expected the pods to be evicted from that node.

6. Anything else do we need to know?

Found nothing in api-server pod logs.

kube-apiserver-ip-10-34-7-89.us-west-2.compute.internal kube-apiserver I0428 23:00:23.893775       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
kube-apiserver-ip-10-34-7-37.us-west-2.compute.internal kube-apiserver I0428 23:00:23.895238       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
kube-apiserver-ip-10-34-7-181.us-west-2.compute.internal kube-apiserver I0428 23:00:23.893754       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
kube-apiserver-ip-10-34-7-37.us-west-2.compute.internal kube-apiserver I0428 23:00:26.311504       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.external.metrics.k8s.io
kube-apiserver-ip-10-34-7-37.us-west-2.compute.internal kube-apiserver E0428 23:00:26.313190       1 controller.go:114] loading OpenAPI spec for "v1beta1.external.metrics.k8s.io" failed with: OpenAPI spec does not exist
kube-apiserver-ip-10-34-7-37.us-west-2.compute.internal kube-apiserver I0428 23:00:26.313264       1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.external.metrics.k8s.io: Rate Limited Requeue.
kube-apiserver-ip-10-34-7-89.us-west-2.compute.internal kube-apiserver I0428 23:01:23.898571       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
kube-apiserver-ip-10-34-7-181.us-west-2.compute.internal kube-apiserver I0428 23:01:23.898618       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
kube-apiserver-ip-10-34-7-37.us-west-2.compute.internal kube-apiserver I0428 23:01:23.901611       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io

7. Environment:

k8s cluster (1.15.5) on AWS using KOPS (1.15.0)
k8s cluster configuration: 3-master. nodes (one per ASG in each AZ), and 1 MixedInstanceType worker ASG which runs 2 spot Instances.
Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-23T14:21:36Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:07:57Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider or hardware configuration:
AWS InstanceType:m5.4xlarge
OS (e.g: cat /etc/os-release):

root@ip-10-34-5-239:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@ip-10-34-5-239:~#

Kernel (e.g. uname -a):

root@ip-10-34-5-239:~# uname -a
Linux ip-10-34-5-239 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20) x86_64 GNU/Linux
root@ip-10-34-5-239:~#

Install tools:

Network plugin and version:
calico

cannot use when restricting access to metadata service

In order to prevent pods from assuming the IAM role of the node, we restrict access to the metadata service for pods as follows:

yum install -y iptables-services
iptables --insert FORWARD 1 --in-interface eni+ --destination 169.254.169.254/32 --jump DROP
iptables-save | tee /etc/sysconfig/iptables 
systemctl enable --now iptables

This is recommended in the AWS EKS documentation for worker nodes.

When running the termination handler, the following is logged:

2020/01/28 00:33:47 Request to instance metadata failed. Retrying.
2020/01/28 00:34:17 Error getting response from instance metadata  Get http://169.254.169.254/latest/meta-data/spot/instance-action: dial tcp 169.254.169.254:80: i/o timeout

Mostly logging this so that other people who run into the same know this isn't supported.. but are there any plans to add support for gathering the termination events through CloudWatch events when restricting access to the metadata service? Or should we just use an alternate termination handler?

Better Docs

As the node-termination-handler includes more features and configurable values, we should have some nice docs.

I would propose https://www.mkdocs.org/ for user docs.

Initial list of some useful sections in the docs:

All the different configuration values allowed and some more detail on each
Helm configurations
Using the node-termination-handler as an on-host binary (not a daemonset) - maybe baked into an AMI?

Taint nodes when termination notification is detected

Similar to how cluster-autoscaler uses taints when marking nodes for scale down, aws-node-termination-handler should taint nodes which will be terminated, in addition to everything it does currently. This issue is slightly similar to #123 but the proposal is to add to the current behavior.

The reason for this is to make it possible to detect programmatically that a spot node becoming unschedulable in k8s is due to termination vs due to cluster issues. For example, when using prometheus-operator, default KubeNodeUnreachable alert looks like this:

kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"}

which often misfires for nodes being scaled down by the cluster-autoscaler. To fix the problem, we can drop nodes with a taint:

kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="node.kubernetes.io/unreachable"} unless on(node) kube_node_spec_taint{effect="NoSchedule",job="kube-state-metrics",key="ToBeDeletedByClusterAutoscaler"}

If aws-node-termination-handler tainted nodes that taint could be incorporated into the alert as well.

I'm happy to PR this if there are no objections.

Support windows spot instances

Looks like the current daemonset and image is only for linux, is there a plan on the roadmap to support windows spot instances or should a pull request be made?

question

The documentation states it will cordon the node and drain it. Does it do that on its' own and how does that play with the existing auto scaling group and the lifecycle hooks we have configured for drains? We watch to make sure certain daemonsets/deploy/pods have been terminated before continuing etc.

Manifests in the repository

Hi,

I apologize if the suggestions is dumb but we have following problem:

We are starting to use ArgoCD as a tool to deploy cluster addons. Node termination handler is one of them. Currently we are not making any changes to it and deploy it as it is in releases. We are using helm for some addons where we add different configuration values but for this addons helm chart seems like unnecessary dependency. Ideally we would specify the manifests directly, maybe as a base in kustomize and overlay something if needed.

The problem is this addon doesn't have manifests in the repository so we cannot refer to them. I haven't found a way to refer to the release package. If there isn't something I missed, would it be possible to publish and tag every new version in the repo?

Thanks!

How can I get the termination information via webhook?

I have deployed aws-node-termination-handler as DaemonSet in an EKS cluster. I set the configuration of flags in order to receive notification via webhook when a spot instance terminates.
To test whether the flags will work, I only set the flag WEBHOOK_URL with a url which is listening for HTTP requests and leave blanks for other flags as default.
I did receive an POST request from a terminating spot instance in EKS cluster terminates. It is expected a JSON like {"text":"[NTH][Instance Interruption] EventID: {{ .EventID }} - Kind: {{ .Kind }} - Description: {{ .Description }} - State: {{ .State }} - Start Time: {{ .StartTime }}"} which provided by the official documentation. But I can't see anything in the request payload.

Chart Daemonset should tolerate all taints by default

The issue is if a node has a taint, currently the handler will not be run in that node. Then if termination happens, services may not be drained properly. Therefore, it should by default tolerate all taints by adding to the Daemonset?

...
tolerations:
- operator: "Exists"
...

Design Question: Drain Node vs Add Taints

Currently, most of the termination handler including this one, drains the node when termination notice is received.

TaintBasedEviction has been enabled since 1.13 (and TaintNodesByCondition since 1.12). Has anyone thought about using taints instead to approach this as below?

On receiving Termination Notice, taint the affected node with "termination-notice:NoSchedule" and "termination-notice:NoExecute".

In that case, any pods that do not have the toleration will automatically be evicted.
It will then give back the decision (and flexibility) to the cluster admin whether a pod can tolerate this or not.

This will provide more flexibility and scalability to say what combination of pods should or should not be evicted per the user's need.

Notes

Taints by Condition also seems to be how k8s handles out-of-resource eviction: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-nodes-by-condition

Provide dry-run mode

Is your feature request related to a problem? Please describe.
I would like to compare aws-node-termination-handler with our current approach of listening to the CloudWatch event in terms of timing.

Describe the solution you would like
The application should take a --dry-run flag that will disable the actual execution of the draining but still print the log messages. During startup it should log out that the application is running in dry-run mode.

Thanks
hrzbrg

Feature Request: Read WEBHOOK_URL from a secret instead of specificying in an env variable

WEBHOOK_URL is treated as a secret/private by some organizations since it gives the ability to send a message to a group of people e.g. in a slack channel.

Currently the manifests in https://github.com/aws/aws-node-termination-handler/releases/download/v1.4.0/all-resources.yaml require to hard code the WEBHOOK_URL value which some organization can not afford to put in Github.

Please provide the flexibility in the published all-resources.yaml to use either of the options: e.g. either use WEBHOOK_URL or WEBHOOK_URL_FROM_SECRET where yaml can look like

...
      - name: WEBHOOK_URL_FROM_SECRET
        valueFrom:
          secretKeyRef:
            name: secret-webhook-url
            key: address

Feature Request: Add lifecycle labels to nodes

We run ASGs with mixed on-demand and spot instances. Given the nature of spot instances and some of our workloads, we need the ability to use Node Affinity rules to ensure we get a good spread between the on-demand nodes and the spot nodes (or avoid spot instances altogether for certain workloads).

Would it be feasible to have NTH query the metadata API and retrieve the lifecycle of the running node and attach a lifecycle=spot or lifecycle=on-demand label to the node upon startup?

Unable to parsed scheduled event start time

Hello,
I recently deployed aws-node-termination-handler in my cluster and now I have my first scheduled termination. The node hasn't drain nor cordoned and the application logs show the following message repeatedly every 2-3 seconds:

2020/04/30 12:19:25 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:27 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:29 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:32 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:34 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:35 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:37 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:39 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"
2020/04/30 12:19:42 There was a problem monitoring for Scheduled Maintenance events: Unable to parsed scheduled event start time: parsing time "8 May 2020 16:00:00 GMT" as "02 Jan 2006 15:04:05 GMT": cannot parse "8 May 2020 16:00:00 GMT" as "02"

What might the problem be?

Thanks,
Yosi

Improve InterruptionEvent with more details

When an InterruptionEvent occurs (used to post on webhook), it would be nice to have information such as:

instance-id
instance-type
node name
availability zone

This way we can adapt the WEBHOOK_TEMPLATE to also send that information.

Support cordon only

Would be possible to support cordon only instead of also drain? I'm keen to wire PR and make a contribution. I'm just asking here before doing that if the feature would be well accepted.

My use case is that I have CI jobs running in Spot instances and cordon and drain the node would result in a poor experience for the devs because their jobs would be killed beforehand.

Instead of only cordon, the node would make any new jobs non-schedulable in the spot instances that are going to be killed.

Non-Helm install files still include helm annotations

When "helm template" is used to generate the plain yaml files the helm specific labels are left on the resources. The build step should remove lines like the bolded below after generation.

Example annotation labels
helm.sh/chart: aws-node-termination-handler-0.5.0
app.kubernetes.io/managed-by: Helm

Example yaml from 1.2.0 release

apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-node-termination-handler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: aws-node-termination-handler
    helm.sh/chart: aws-node-termination-handler-0.5.0
    app.kubernetes.io/instance: aws-node-termination-handler
    k8s-app: aws-node-termination-handler
    app.kubernetes.io/version: "1.2.0"
    app.kubernetes.io/managed-by: Helm

Ever increasing memory usage.

I deployed v1.3.0 of the node termination handler using the official helm chart. I noticed that the memory usage is ever increasing and eventually it reaches the pod memory limit and is OOMKilled. Is there a memory leak somewhere?

Logs does not look anything out of the ordinary:

$ kubectl logs -f -n kube-system node-termination-handler-hmbs7
aws-node-termination-handler arguments: 
	dry-run: false,
	node-name: ip-x-x-x-x.ap-southeast-1.compute.internal,
	metadata-url: http://169.254.169.254,
	kubernetes-service-host: 172.20.0.1,
	kubernetes-service-port: 443,
	delete-local-data: true,
	ignore-daemon-sets: true,
	pod-termination-grace-period: -1,
	node-termination-grace-period: 120,
	enable-scheduled-event-draining: false,
	enable-spot-interruption-draining: true,
	metadata-tries: 3,
2020/03/17 14:11:36 Trying to get token from IMDSv2
2020/03/17 14:11:36 Got token from IMDSv2
2020/03/17 14:11:36 Startup Metadata Retrieved: {InstanceID:i-xxxx InstanceType:m5.large PublicHostname: PublicIP: LocalHostname:ip-x-x-x-x.ap-southeast-1.compute.internal LocalIP:10.17.16.6}
2020/03/17 14:11:36 Started watching for drain events
2020/03/17 14:11:36 Kubernetes AWS Node Termination Handler has started successfully!
2020/03/17 14:11:36 Started watching for event cancellations
2020/03/17 14:11:36 Started monitoring for Spot ITN events

Support other schedulers like Hashicorp Nomad

It would be great that this service supports other schedulers like Hashicorp Nomad or Mesos.

Something like this:

/node-termination-handler --type nomad/k8s/mesos...

Update PULL_REQUEST_TEMPLATE to match other AWS repos

Currently the GH PR template language contains the following:

"By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice."

This is incorrect, since clearly there isn't a "terms of your choice" component here :) Instead, we should correct this to reference the Apache 2.0 License:

"By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license."

Flags "delete-local-data" & "ignore-daemonsets"

Could please add ability to configure this flags

--ignore-daemonsets=false: Ignore DaemonSet-managed pods
--delete-local-data=false: Continue even if there are pods using emptyDir (local data that will be deleted when the node is drained)

Provide handling for ASG 'rebalance' events

Background:

We run EKS cluster on SPOT instances managed behind an ASG with K8S cluster-autoscaler.
This is pretty obvious use case for SPOTs, leveraging Kubernetes self-healing and self-regulating features and elasticity provided by ASG.

With around 9-12 instances, we often get intensive SPOT rotation events, where we loose multiple SPOTs within 10-20 mins - sometimes as much as half the cluster.

When SPOT instances are lost, they are quickly replaced by ASG. However subjected to spare capacity across Availability Zones, instances often come back in the numbers but unbalanced across Zones.

ASG has a rebalancing mechanism and over time it will bring new SPOTs to rebalance ASG across AZs. Rebalanced instances are terminated by ASG from an overflowing AZ.

Issue:
While aws-node-termination-handler is handling SPOT terminations well, draining the pods and all, it remains completely oblivious to rebalancing events that ensue.

Need to push git tags for at least v1.0.0 bits

There are no git tags on the aws/aws-node-termination-handler repository, therefore there are no releases listed. However, there is a Docker image tag of v1.0.0 in dockerhub. There should be a corresponding git tag of v1.0.0 as well.

Docker image tags should really be derived from the git tag, making the git history the source of truth for all artifact versioning.

I can work on fixing up the Makefile to ensure proper git tags are used for tagging the Docker images.

aws / aws-node-termination-handler Goto Github PK

aws-node-termination-handler's Introduction

AWS Node Termination Handler

Gracefully handle EC2 instance shutdown within Kubernetes

Project Summary

Major Features

Instance Metadata Service Processor

Queue Processor

Queue Processor with ASG Lifecycle Hooks

Queue Processor with Instance State Change Events

Which one should I use?

Kubernetes Compatibility

Installation and Configuration

Installation and Configuration

Pod Security Admission

Kubectl Apply

Helm

Infrastructure Setup

1. Create an SQS Queue:

2. Create an ASG Termination Lifecycle Hook:

3. Tag the Instances:

4. Create Amazon EventBridge Rules

5. Create an IAM Role for the Pods

1. Handle ASG Instance Launch Lifecycle Notifications (optional):

Installation

Pod Security Admission

Helm

Single Instance vs Multiple Replicas

Kubectl Apply

Use with Kiam

Metadata endpoints

Building

Metrics

Communication

Contributing

License

aws-node-termination-handler's People

Contributors

Stargazers

Watchers

Forkers

aws-node-termination-handler's Issues

Recommend Projects

Recommend Topics

Recommend Org