
Comments (6)

jefchien commented on June 10, 2024

We're working on providing an alternative to IMDS. You can track that here: aws/amazon-cloudwatch-agent#1101.

from amazon-cloudwatch-agent-operator.

jefchien commented on June 10, 2024

Hi @AaronFriel,

Have a few questions that would help us look into your issue.


AaronFriel commented on June 10, 2024

Hey @jefchien thanks for getting back to me.

Yeah, this is the TypeScript code used with Pulumi IaC to deploy the EKS CloudWatch Observability addon.

  // This configures IRSA with an `AssumeRoleWithWebIdentity` - confirmed working with other addons including:
  // * AWS EFS CSI Driver
  // * Karpenter Controller
  const role = IamServiceAccountRole(`${clusterPetName}-cloudwatch-observability`, {
    namespaceName: 'amazon-cloudwatch',
    serviceAccountName: 'cloudwatch-agent',
  });

  new aws.iam.RolePolicyAttachment(`${clusterPetName}-cloudwatch-observability-agent`, {
    policyArn: 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy',
    role,
  });
  new aws.iam.RolePolicyAttachment(`${clusterPetName}-cloudwatch-observability-xray`, {
    policyArn: 'arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess',
    role,
  });

  const addonVersion = aws.eks.getAddonVersionOutput({
    addonName: 'amazon-cloudwatch-observability',
    kubernetesVersion: clusterVersion,
    mostRecent: true,
  });

  const addon = new aws.eks.Addon(
    `${clusterPetName}-cloudwatch-observability`,
    {
      clusterName,
      addonName: addonVersion.addonName,
      addonVersion: addonVersion.version,
      serviceAccountRoleArn: role.arn,
      preserve: false,
    },
    { dependsOn },
  );
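
For readers unfamiliar with IRSA, the trust relationship that a helper like `IamServiceAccountRole` sets up can be sketched roughly as follows. This is a minimal sketch, not the helper's actual implementation (which is not shown in the thread): the function name `irsaTrustPolicy`, its argument names, and the example OIDC values are all hypothetical placeholders.

```typescript
// Hypothetical helper: builds the IAM trust policy document that IRSA relies on.
// It lets one Kubernetes service account assume the role through
// sts:AssumeRoleWithWebIdentity via the cluster's OIDC provider.
interface IrsaTrustArgs {
  oidcProviderArn: string; // ARN of the cluster's IAM OIDC provider (placeholder)
  oidcIssuer: string; // issuer host and path, without the https:// prefix (placeholder)
  namespaceName: string;
  serviceAccountName: string;
}

function irsaTrustPolicy(args: IrsaTrustArgs): string {
  const { oidcProviderArn, oidcIssuer, namespaceName, serviceAccountName } = args;
  return JSON.stringify({
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Principal: { Federated: oidcProviderArn },
        Action: 'sts:AssumeRoleWithWebIdentity',
        Condition: {
          StringEquals: {
            // Only tokens issued for this exact service account are accepted.
            [`${oidcIssuer}:sub`]: `system:serviceaccount:${namespaceName}:${serviceAccountName}`,
            [`${oidcIssuer}:aud`]: 'sts.amazonaws.com',
          },
        },
      },
    ],
  });
}

// Example with made-up OIDC values and the namespace/service account from the thread.
const examplePolicy = irsaTrustPolicy({
  oidcProviderArn: 'arn:aws:iam::111122223333:oidc-provider/oidc.example.com/id/EXAMPLE',
  oidcIssuer: 'oidc.example.com/id/EXAMPLE',
  namespaceName: 'amazon-cloudwatch',
  serviceAccountName: 'cloudwatch-agent',
});
console.log(examplePolicy);
```

The `sub` condition is what ties the role to the `cloudwatch-agent` service account in the `amazon-cloudwatch` namespace, so a mismatch in either name would cause the role assumption to be rejected.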

Yes, here are the agent logs. I've formatted these for readability:

2024-04-09T01:27:56Z E! {
  "caller": "[email protected]/cwlog_client.go:135",
  "msg": "cwlog_client: Error occurs in PutLogEvents",
  "kind": "exporter",
  "data_type": "metrics",
  "name": "awsemf/containerinsights",
  "error": "SharedCredsLoad: failed to load shared credentials file
  caused by: FailedRead: unable to open file
  caused by: open /root/.aws/credentials: no such file or directory",
  "stacktrace": "github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/cwlogs.(*Client).PutLogEvents
  \tgithub.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/[email protected]/cwlog_client.go:135
  github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/cwlogs.(*logPusher).pushEventBatch
  \tgithub.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/[email protected]/pusher.go:264
  github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/cwlogs.(*logPusher).AddLogEntry
  \tgithub.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/[email protected]/pusher.go:238
  github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter.(*emfExporter).pushMetricsData
  \tgithub.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/emf_exporter.go:153
  go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export
  \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:58
  go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send
  \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/timeout_sender.go:38
  go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send
  \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:33
  go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
  \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:173
  go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send
  \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:33
  go.opentelemetry.io/collector/exporter/exporterhelper.(*baseExporter).send
  \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:189
  go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func1
  \tgo.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:98
  go.opentelemetry.io/collector/consumer.ConsumeMetricsFunc.ConsumeMetrics
  \tgo.opentelemetry.io/collector/[email protected]/metrics.go:25
  github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry.(*wrapperMetricsExporter).ConsumeMetrics
  \tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/resource_to_telemetry.go:32
  go.opentelemetry.io/collector/processor/batchprocessor.(*batchMetrics).export
  \tgo.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:442
  go.opentelemetry.io/collector/processor/batchprocessor.(*shard).sendItems
  \tgo.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:256
  go.opentelemetry.io/collector/processor/batchprocessor.(*shard).start
  \tgo.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:218"
}
2024-04-09T01:27:56Z W! {
  "caller": "[email protected]/batch_processor.go:258",
  "msg": "Sender failed",
  "kind": "processor",
  "name": "batch/containerinsights",
  "pipeline": "metrics/containerinsights",
  "error": "SharedCredsLoad: failed to load shared credentials file
  caused by: FailedRead: unable to open file
  caused by: open /root/.aws/credentials: no such file or directory"
}

The only way to set RUN_WITH_IRSA to true is to edit the amazoncloudwatchagents.cloudwatch.aws.amazon.com resource, because the operator reconciles the pods against that resource and would revert a change made directly to the pods. Editing the resource like so:

 apiVersion: v1
 items:
 - apiVersion: cloudwatch.aws.amazon.com/v1alpha1
   kind: AmazonCloudWatchAgent
   metadata:
     annotations:
       pulumi.com/patchForce: "true"
     creationTimestamp: "2024-04-01T08:21:38Z"
     generation: 5
     labels:
       app.kubernetes.io/managed-by: amazon-cloudwatch-agent-operator
     name: cloudwatch-agent
     namespace: amazon-cloudwatch
     resourceVersion: "3839446"
     uid: 542fecd4-0368-4ab1-8d8b-e7e5ad47c538
   spec:
     config: '{"agent":{"region":"us-west-2"},"logs":{"metrics_collected":{"app_signals":{"hosted_in":"opal-quokka-6860d02"},"kubernetes":{"cluster_name":"opal-quokka-6860d02","enhanced_container_insights":true}}},"traces":{"traces_collected":{"app_signals":{}}}}'
     env:
+    - name: RUN_WITH_IRSA
+      value: "true"
     - name: K8S_NODE_NAME
       valueFrom:
         fieldRef:
           fieldPath: spec.nodeName

This edit enables the CloudWatch Agent to succeed. However, there is no way to guarantee that this change is persistent, because the cloudwatch-agent resource is owned by the addon and could be overwritten during an upgrade.
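
The edit itself amounts to an idempotent upsert on the `env` list, which could be sketched as below. This is only an illustration of the list transformation: the `EnvVar` type and the `withRunWithIrsa` function are hypothetical names, and nothing here addresses the persistence problem, since the addon may still overwrite the resource. Note that Kubernetes expects env var values to be strings, so the value is the string "true" rather than a boolean.

```typescript
// Hypothetical shape of one entry in the resource's spec.env list.
interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: unknown;
}

// Returns a copy of `env` with RUN_WITH_IRSA set to the string "true".
// Any existing RUN_WITH_IRSA entry is dropped first, so applying the
// function twice yields the same list as applying it once.
function withRunWithIrsa(env: readonly EnvVar[]): EnvVar[] {
  const rest = env.filter((entry) => entry.name !== 'RUN_WITH_IRSA');
  return [{ name: 'RUN_WITH_IRSA', value: 'true' }, ...rest];
}

const patchedEnv = withRunWithIrsa([
  { name: 'K8S_NODE_NAME', valueFrom: { fieldRef: { fieldPath: 'spec.nodeName' } } },
]);
console.log(JSON.stringify(patchedEnv, null, 2));
```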


jefchien commented on June 10, 2024

Do you have IMDS disabled or a hop limit set to 1? This seems like a similar issue to aws/amazon-cloudwatch-agent#1101 where the agent thinks it is onPrem because it cannot reach IMDS, which results in it trying to read the /root/.aws/credentials file.
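
The behavior described in this thread can be summarized as a simplified decision model. This is not the agent's actual code: the type and function names are hypothetical, and the branch order is an assumption inferred from the comments here (setting RUN_WITH_IRSA made the agent succeed, and an unreachable IMDS made it fall back to the shared credentials file).

```typescript
// Simplified, hypothetical model of how the agent appears to pick credentials.
type CredentialSource = 'web-identity' | 'instance-profile' | 'shared-credentials-file';

interface AgentEnvironment {
  runWithIrsa: boolean; // RUN_WITH_IRSA env var is set to "true"
  imdsReachable: boolean; // false when IMDS is disabled or, per the thread, the hop limit is 1
}

function credentialSource(env: AgentEnvironment): CredentialSource {
  // Assumption: the IRSA flag takes priority and bypasses IMDS detection.
  if (env.runWithIrsa) return 'web-identity';
  // With IMDS reachable, the agent runs with the node's identity.
  if (env.imdsReachable) return 'instance-profile';
  // Otherwise the agent treats the host as on-premises and looks for
  // /root/.aws/credentials, which produces the error in the logs above.
  return 'shared-credentials-file';
}

console.log(credentialSource({ runWithIrsa: false, imdsReachable: false }));
```

Under this model, the failing configuration in the thread is the last branch: no IRSA flag and no reachable IMDS.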


AaronFriel commented on June 10, 2024

Yes, because that is considered best practice, but also because if node IMDS is enabled, the agent is not using pod identity, it's using node identity.

If the CloudWatch Agent does not work with IMDS hop limit set to 1, what is this section doing?

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html#install-CloudWatch-Observability-EKS-addon-serviceaccountrole

I think the answer is "nothing"?


charlierm commented on June 10, 2024

Any updates on this? It's preventing us from using it.

