amazon-cloudwatch-agent-operator's Issues

Agent fails with credential errors, cannot use IAM Roles for Service Accounts (IRSA) or EKS Pod Identities with EKS Addon

Users may want to run the CloudWatch Agent with pod-based IAM roles, using either IRSA or EKS Pod Identities. This was recently enabled (see the PR below): when the environment variable RUN_WITH_IRSA=true is set on the agent pod, the agent uses the default AWS credential provider chain for authentication.

However, the EKS Addon for AWS CloudWatch Observability creates a managed AmazonCloudWatchAgent configuration. Adding environment variables to it is unsafe, because there is no guarantee that they won't be overridden.
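For reference, the manual change described above might look roughly like the sketch below. It only builds the JSON merge patch body; the resource name cloudwatch-agent and the spec.env field on the AmazonCloudWatchAgent custom resource are assumptions, not something this issue confirms, and the helper name env_patch is made up for illustration.

```shell
# Hypothetical helper: print a JSON merge patch that sets one env var on the
# custom resource. Assumes the CR exposes a spec.env list (not confirmed here).
# Naive on purpose: name and value must not contain quotes or backslashes.
env_patch() {
  # $1: variable name, $2: variable value
  printf '{"spec":{"env":[{"name":"%s","value":"%s"}]}}\n' "$1" "$2"
}

# Possible usage against a cluster (resource name is an assumption):
#   kubectl patch amazoncloudwatchagent cloudwatch-agent -n amazon-cloudwatch \
#     --type merge -p "$(env_patch RUN_WITH_IRSA true)"
```

Two caveats with this approach: a JSON merge patch replaces the whole env list rather than appending to it, and since the add-on manages the resource, the change may be reverted at any time. That is exactly the gap this issue is about.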

Background

CloudWatch Agent PR:

Expected behavior

Running the EKS Addon for AWS CloudWatch Observability with pod-based IAM should work by default.

Actual behavior

The agent fails, and there is no knob available to users to ensure the agent works.

Proposal

Either of these solutions would address this:

  • The Addon and Operator should permit additional configuration that merges either individual env vars or arbitrary config into the AmazonCloudWatchAgent custom resource.
  • The Agent's configuration file, cwagentconfig.json, which is managed by the add-on, should accept a configuration key to enable the RUN_WITH_IRSA mode.

Cloudwatch agent pods don't get restarted when doing rollout-restart

Describe the bug

Not all pods of the cloudwatch-agent DaemonSet (ds) are restarted when running kubectl rollout restart ds cloudwatch-agent -n amazon-cloudwatch. Only one pod is restarted.

Steps to reproduce

Created a version 1.28 cluster and installed the Amazon CloudWatch Observability add-on, version v1.2.2-eksbuild.1.

Initially we have 2 pods:

kubectl get pods  -A -l app.kubernetes.io/component=amazon-cloudwatch-agent -o wide   
NAMESPACE           NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE                           NOMINATED NODE   READINESS GATES
amazon-cloudwatch   cloudwatch-agent-hdbmv   1/1     Running   0          7s    172.31.78.188   ip-172-31-67-14.ec2.internal   <none>           <none>
amazon-cloudwatch   cloudwatch-agent-ttfbd   1/1     Running   0          7s    172.31.1.111    ip-172-31-5-6.ec2.internal     <none>           <none>

1st Restart:

kubectl rollout restart ds cloudwatch-agent -n amazon-cloudwatch                      
daemonset.apps/cloudwatch-agent restarted

Only one pod was restarted; the other pod is still running:

kubectl get pods  -A -l app.kubernetes.io/component=amazon-cloudwatch-agent -o wide  -w
NAMESPACE           NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE                           NOMINATED NODE   READINESS GATES
amazon-cloudwatch   cloudwatch-agent-hdbmv   1/1     Running   0          33s   172.31.78.188   ip-172-31-67-14.ec2.internal   <none>           <none>
amazon-cloudwatch   cloudwatch-agent-l2mgm   1/1     Running   0          8s    172.31.0.110    ip-172-31-5-6.ec2.internal     <none>           <none>

Same behaviour every time:

kubectl rollout restart ds cloudwatch-agent -n amazon-cloudwatch                       
daemonset.apps/cloudwatch-agent restarted


kubectl get pods  -A -l app.kubernetes.io/component=amazon-cloudwatch-agent -o wide  
NAMESPACE           NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE                           NOMINATED NODE   READINESS GATES
amazon-cloudwatch   cloudwatch-agent-hdbmv   1/1     Running   0          71s   172.31.78.188   ip-172-31-67-14.ec2.internal   <none>           <none>
amazon-cloudwatch   cloudwatch-agent-st4p9   1/1     Running   0          4s    172.31.1.111    ip-172-31-5-6.ec2.internal     <none>           <none>
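The comparison done by eye above can also be scripted. The following is a minimal sketch that takes two saved pod listings (before and after the restart) and prints the pods whose name did not change, meaning they were not replaced. The function name and the idea of saving the listings to files are illustrative assumptions; it only relies on the pod name being the second column, as in the kubectl get pods -A output shown above.

```shell
# Print the name of every pod that appears in both listings, i.e. pods that
# were NOT replaced between the two snapshots.
# $1: listing saved before the restart, $2: listing saved after it.
# Both files are expected in `kubectl get pods -A` layout (NAME is column 2).
unrestarted_pods() {
  awk '
    FNR == 1     { next }                 # skip the header row of each file
    NR == FNR    { seen[$2] = 1; next }   # first file: remember pod names
    ($2 in seen) { print $2 }             # second file: same name still present
  ' "$1" "$2"
}

# Possible usage:
#   kubectl get pods -A -l app.kubernetes.io/component=amazon-cloudwatch-agent > before.txt
#   kubectl rollout restart ds cloudwatch-agent -n amazon-cloudwatch
#   kubectl get pods -A -l app.kubernetes.io/component=amazon-cloudwatch-agent > after.txt
#   unrestarted_pods before.txt after.txt
```

With the listings from this report, the only name that survives every restart is cloudwatch-agent-hdbmv.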

What did you expect to see?
I expected every pod of the DaemonSet to be restarted.

What did you see instead?
Instead, I see that only 1 pod is getting restarted

What version did you use?
v1.2.2-eksbuild.1

What config did you use?
NA

Environment
Tried for cluster version 1.26, 1.27 & 1.28

Additional context

I observed a difference in how the controllerrevisions are created.

For a sample DaemonSet where rollout restart works as expected, one new controllerrevision is created each time we perform a rollout restart:

% kubectl get controllerrevision -A
NAMESPACE           NAME                              CONTROLLER                            REVISION   AGE
amazon-cloudwatch   cloudwatch-agent-6ddd78df4        daemonset.apps/cloudwatch-agent       1          34m
amazon-cloudwatch   fluent-bit-57659b7864             daemonset.apps/fluent-bit             1          34m
default             web-79dc58f667                    statefulset.apps/web                  1          46d
kube-system         aws-node-5b47bbc5c8               daemonset.apps/aws-node               2          16d
kube-system         aws-node-5bdc4b45f4               daemonset.apps/aws-node               3          16d
kube-system         aws-node-7845867c85               daemonset.apps/aws-node               1          31d

For the cloudwatch-agent DaemonSet, by contrast, the first controllerrevision is deleted and two new controllerrevisions are created; the third is the same as the first. The pattern is shown below:

$kubectl get controllerrevision -A | grep watch               
amazon-cloudwatch   cloudwatch-agent-5f44485c55       daemonset.apps/cloudwatch-agent       1          20m

$kubectl get controllerrevision -A | grep watch               
amazon-cloudwatch   cloudwatch-agent-5f44485c55       daemonset.apps/cloudwatch-agent       2          36m
amazon-cloudwatch   cloudwatch-agent-746f576ff6       daemonset.apps/cloudwatch-agent       3          47m

$kubectl get controllerrevision -A | grep watch    
amazon-cloudwatch   cloudwatch-agent-5f44485c55       daemonset.apps/cloudwatch-agent       2          40m
amazon-cloudwatch   cloudwatch-agent-746f576ff6       daemonset.apps/cloudwatch-agent       5          51m
amazon-cloudwatch   cloudwatch-agent-cd885487d        daemonset.apps/cloudwatch-agent       4          16s

$kubectl get controllerrevision -A | grep watch                  
amazon-cloudwatch   cloudwatch-agent-5f44485c55       daemonset.apps/cloudwatch-agent       2          42m
amazon-cloudwatch   cloudwatch-agent-746f576ff6       daemonset.apps/cloudwatch-agent       7          53m
amazon-cloudwatch   cloudwatch-agent-779d495df4       daemonset.apps/cloudwatch-agent       6          4s
amazon-cloudwatch   cloudwatch-agent-cd885487d        daemonset.apps/cloudwatch-agent       4          2m2s

$kubectl get controllerrevision -A | grep watch                  
amazon-cloudwatch   cloudwatch-agent-5f44485c55       daemonset.apps/cloudwatch-agent       2          42m
amazon-cloudwatch   cloudwatch-agent-746f576ff6       daemonset.apps/cloudwatch-agent       9          53m
amazon-cloudwatch   cloudwatch-agent-779d495df4       daemonset.apps/cloudwatch-agent       6          21s
amazon-cloudwatch   cloudwatch-agent-84df56d566       daemonset.apps/cloudwatch-agent       8          3s
amazon-cloudwatch   cloudwatch-agent-cd885487d        daemonset.apps/cloudwatch-agent       4          2m19s
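To make the pattern easier to follow, here is a small sketch that reads a controllerrevision listing like the ones above and prints the entry with the highest REVISION number for a given controller. The function name is made up for illustration, and the column positions (NAME second, CONTROLLER third, REVISION fourth) are taken from the kubectl get controllerrevision -A output in this report.

```shell
# Read `kubectl get controllerrevision -A` output on stdin and print the name
# and number of the highest revision owned by the given controller.
# $1: controller exactly as shown in the CONTROLLER column.
latest_revision() {
  awk -v ctl="$1" '
    $3 == ctl && $4 + 0 > max { max = $4 + 0; name = $2 }  # keep the largest REVISION
    END { if (name != "") print name, max }
  '
}

# Possible usage:
#   kubectl get controllerrevision -A | latest_revision daemonset.apps/cloudwatch-agent
```

In the listings above, cloudwatch-agent-746f576ff6 holds the highest revision number after each restart (3, 5, 7, 9), while each newly created revision sits one below it. One possible reading is that the DaemonSet template is changed back to an earlier version right after the restart, but this report does not confirm the cause.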

Fluent Bit DaemonSet created even when containerLogs.enabled is false.

We are starting to use the amazon-cloudwatch-observability EKS Addon in our EKS clusters. This addon seems to use the amazon-cloudwatch-agent-operator project under the hood.

Issue

We noticed that the Fluent Bit DaemonSet is created even when containerLogs.enabled is set to false in the Helm chart. According to our understanding, the containerLogs.enabled setting only modifies the resource limits and logging configuration for Fluent Bit, but does not control the creation of the DaemonSet itself.

Initially, we would like to continue using our existing Fluent Bit deployment model and avoid creating separate and unnecessary DaemonSets for the EKS nodes if they are not used.

Question

Would it be possible to modify the Helm chart to skip the creation of the Fluent Bit DaemonSet resource when containerLogs.enabled is set to false? For example, by adding the following condition at the top of the helm/templates/linux/fluent-bit-daemonset.yaml file (closed by a matching {{- end }} at the end of the file):

{{- $region := .Values.region | required ".Values.region is required." -}}

{{- if .Values.containerLogs.enabled }}

If you have no objections, we are more than happy to create a pull request for the modification.
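One way to check such a change locally would be to render the chart and list the DaemonSets it produces. The sketch below is a plain awk filter over rendered manifests; the function name is made up, and it assumes the usual layout of helm template output (documents separated by ---, with a top-level kind: and a metadata: block whose name: is indented by two spaces).

```shell
# Read rendered Kubernetes manifests on stdin and print the metadata.name of
# every DaemonSet document.
daemonset_names() {
  awk '
    function emit() {
      if (kind == "DaemonSet" && name != "") print name
      kind = ""; name = ""; in_meta = 0
    }
    /^---/       { emit(); next }           # document boundary
    /^kind:/     { kind = $2 }              # top-level kind
    /^metadata:/ { in_meta = 1; next }      # entering top-level metadata
    /^[^ #]/     { in_meta = 0 }            # any other top-level key ends it
    in_meta && name == "" && /^  name:/ { name = $2 }
    END          { emit() }
  '
}

# Possible usage (release name and chart path are placeholders; the chart's
# other required values, such as region, must also be supplied):
#   helm template <release> <chart-dir> --set containerLogs.enabled=false | daemonset_names
```

With the proposed condition in place, fluent-bit should no longer appear in that list when containerLogs.enabled is false.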
