aws / amazon-cloudwatch-agent-operator
The Amazon CloudWatch Agent Operator is software developed to manage the CloudWatch Agent on Kubernetes.
License: Apache License 2.0
Users may wish to run the CloudWatch Agent using pod-based IAM roles via IRSA or EKS Pod Identities. This was recently enabled (see the PR below): when the environment variable RUN_WITH_IRSA=true is set on the agent pod, the agent uses the default provider chain for AWS authentication.
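For reference, on a self-managed agent pod this is just a container environment variable, along the lines of the following sketch (only the RUN_WITH_IRSA name comes from the PR below):

env:
  - name: RUN_WITH_IRSA
    value: "true"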
However, the EKS Addon for AWS CloudWatch Observability creates a managed AmazonCloudWatchAgent configuration, so adding environment variables to it is unsafe: there is no guarantee they won't be overridden.
CloudWatch Agent PR:
Running the EKS Addon for AWS CloudWatch Observability with pod-based IAM should work by default.
Instead, the agent fails, and there is no knob available to users to make it work.
Either of these solutions would address this:
- The AmazonCloudWatchAgent custom resource should allow enabling the RUN_WITH_IRSA mode directly.
- The cwagentconfig.json, which is managed by the add-on, should accept a configuration key to enable the RUN_WITH_IRSA mode.
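As a sketch of what the second option could look like in cwagentconfig.json (the run_with_irsa key is purely hypothetical; no such key exists today):

{
  "agent": {
    "run_with_irsa": true
  }
}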
Describe the bug
Not all of the pods in the cloudwatch-agent DaemonSet are restarted when running kubectl rollout restart ds cloudwatch-agent -n amazon-cloudwatch. Only one pod is restarted.
Steps to reproduce
Created a cluster of version 1.28 and installed the Amazon CloudWatch Observability add-on, version v1.2.2-eksbuild.1.
Initially we have 2 pods:
kubectl get pods -A -l app.kubernetes.io/component=amazon-cloudwatch-agent -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
amazon-cloudwatch cloudwatch-agent-hdbmv 1/1 Running 0 7s 172.31.78.188 ip-172-31-67-14.ec2.internal <none> <none>
amazon-cloudwatch cloudwatch-agent-ttfbd 1/1 Running 0 7s 172.31.1.111 ip-172-31-5-6.ec2.internal <none> <none>
1st Restart:
kubectl rollout restart ds cloudwatch-agent -n amazon-cloudwatch
daemonset.apps/cloudwatch-agent restarted
We can see that only one pod got restarted; the other pod is still running:
kubectl get pods -A -l app.kubernetes.io/component=amazon-cloudwatch-agent -o wide -w
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
amazon-cloudwatch cloudwatch-agent-hdbmv 1/1 Running 0 33s 172.31.78.188 ip-172-31-67-14.ec2.internal <none> <none>
amazon-cloudwatch cloudwatch-agent-l2mgm 1/1 Running 0 8s 172.31.0.110 ip-172-31-5-6.ec2.internal <none> <none>
Same behaviour every time:
kubectl rollout restart ds cloudwatch-agent -n amazon-cloudwatch
daemonset.apps/cloudwatch-agent restarted
kubectl get pods -A -l app.kubernetes.io/component=amazon-cloudwatch-agent -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
amazon-cloudwatch cloudwatch-agent-hdbmv 1/1 Running 0 71s 172.31.78.188 ip-172-31-67-14.ec2.internal <none> <none>
amazon-cloudwatch cloudwatch-agent-st4p9 1/1 Running 0 4s 172.31.1.111 ip-172-31-5-6.ec2.internal <none> <none>
What did you expect to see?
I expected all of the pods in the DaemonSet to be restarted.
What did you see instead?
Instead, only one pod is restarted.
What version did you use?
v1.2.2-eksbuild.1
What config did you use?
NA
Environment
Tried with cluster versions 1.26, 1.27, and 1.28.
Additional context
I could observe a difference in the creation of the ControllerRevisions.
For a sample DaemonSet where rollout restart works perfectly fine, one new ControllerRevision is created when we perform a rollout restart:
% kubectl get controllerrevision -A
NAMESPACE NAME CONTROLLER REVISION AGE
amazon-cloudwatch cloudwatch-agent-6ddd78df4 daemonset.apps/cloudwatch-agent 1 34m
amazon-cloudwatch fluent-bit-57659b7864 daemonset.apps/fluent-bit 1 34m
default web-79dc58f667 statefulset.apps/web 1 46d
kube-system aws-node-5b47bbc5c8 daemonset.apps/aws-node 2 16d
kube-system aws-node-5bdc4b45f4 daemonset.apps/aws-node 3 16d
kube-system aws-node-7845867c85 daemonset.apps/aws-node 1 31d
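For context, kubectl rollout restart works by stamping a kubectl.kubernetes.io/restartedAt annotation into the pod template, so a healthy controller produces exactly one new revision per restart. The equivalent manual patch would be (timestamp value illustrative):

kubectl patch ds cloudwatch-agent -n amazon-cloudwatch \
  -p '{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"2024-01-01T00:00:00Z"}}}}}'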
Whereas in the case of the cloudwatch-agent pods, the first ControllerRevision is deleted and two new ControllerRevisions are created; the third one is the same as the first. Below is the pattern:
$kubectl get controllerrevision -A | grep watch
amazon-cloudwatch cloudwatch-agent-5f44485c55 daemonset.apps/cloudwatch-agent 1 20m
$kubectl get controllerrevision -A | grep watch
amazon-cloudwatch cloudwatch-agent-5f44485c55 daemonset.apps/cloudwatch-agent 2 36m
amazon-cloudwatch cloudwatch-agent-746f576ff6 daemonset.apps/cloudwatch-agent 3 47m
$kubectl get controllerrevision -A | grep watch
amazon-cloudwatch cloudwatch-agent-5f44485c55 daemonset.apps/cloudwatch-agent 2 40m
amazon-cloudwatch cloudwatch-agent-746f576ff6 daemonset.apps/cloudwatch-agent 5 51m
amazon-cloudwatch cloudwatch-agent-cd885487d daemonset.apps/cloudwatch-agent 4 16s
$kubectl get controllerrevision -A | grep watch
amazon-cloudwatch cloudwatch-agent-5f44485c55 daemonset.apps/cloudwatch-agent 2 42m
amazon-cloudwatch cloudwatch-agent-746f576ff6 daemonset.apps/cloudwatch-agent 7 53m
amazon-cloudwatch cloudwatch-agent-779d495df4 daemonset.apps/cloudwatch-agent 6 4s
amazon-cloudwatch cloudwatch-agent-cd885487d daemonset.apps/cloudwatch-agent 4 2m2s
$kubectl get controllerrevision -A | grep watch
amazon-cloudwatch cloudwatch-agent-5f44485c55 daemonset.apps/cloudwatch-agent 2 42m
amazon-cloudwatch cloudwatch-agent-746f576ff6 daemonset.apps/cloudwatch-agent 9 53m
amazon-cloudwatch cloudwatch-agent-779d495df4 daemonset.apps/cloudwatch-agent 6 21s
amazon-cloudwatch cloudwatch-agent-84df56d566 daemonset.apps/cloudwatch-agent 8 3s
amazon-cloudwatch cloudwatch-agent-cd885487d daemonset.apps/cloudwatch-agent 4 2m19s
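To see what keeps changing between revisions, the stored templates can be dumped and diffed; if anything other than the restartedAt annotation differs, something else is rewriting the pod template (revision names taken from the output above):

kubectl get controllerrevision cloudwatch-agent-746f576ff6 -n amazon-cloudwatch -o yaml > rev-a.yaml
kubectl get controllerrevision cloudwatch-agent-cd885487d -n amazon-cloudwatch -o yaml > rev-b.yaml
diff rev-a.yaml rev-b.yaml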
We are in the process of adopting the amazon-cloudwatch-observability EKS add-on in our EKS clusters. This add-on seems to use the amazon-cloudwatch-agent-operator project under the hood.
We noticed that the Fluent Bit DaemonSet is enabled even when containerLogs.enabled is set to false in the Helm chart. According to our understanding, the containerLogs.enabled setting only modifies the resource limits and logging configuration for Fluent Bit, but does not control the creation of the DaemonSet itself.
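For reference, this is the values fragment we are setting:

containerLogs:
  enabled: false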
We would like to continue using our existing Fluent Bit deployment model and avoid creating a separate, unnecessary DaemonSet on the EKS nodes when it is not used.
Would it be possible to modify the Helm chart to skip creating the Fluent Bit DaemonSet resource when containerLogs.enabled is set to false? For example, by adding the following condition at line 1 of the helm/templates/linux/fluent-bit-daemonset.yaml file:
{{- if .Values.containerLogs.enabled }}
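together with a matching closer at the end of the same file, so the whole manifest is guarded (sketch):

{{- if .Values.containerLogs.enabled }}
# ... existing DaemonSet manifest ...
{{- end }}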
If you have no objections, we are more than happy to create a pull request for the modification.