Comments (6)
We're working on providing an alternative to IMDS. You can track that here: aws/amazon-cloudwatch-agent#1101.
from amazon-cloudwatch-agent-operator.
Hi @AaronFriel,
We have a few questions that would help us look into your issue.
- Did you follow https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html#install-CloudWatch-Observability-EKS-addon-serviceaccountrole when setting up the EKS Addon with IRSA?
- Can you provide a sample of the failure you're seeing in the agent logs?
- Does the agent work as expected when you set RUN_WITH_IRSA=true on the pod?
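One quick way to check the first question from inside the pod is to look for the two environment variables that the EKS pod identity webhook injects into pods using an IRSA-enabled service account, `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`. The sketch below is only an illustration; the helper name `missingIrsaEnv` is hypothetical and is not part of the agent or the operator.

```typescript
// Environment variables injected by the EKS pod identity webhook when the
// pod's service account is annotated with an IAM role (IRSA).
const IRSA_ENV_VARS = ["AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"] as const;

// Hypothetical helper: returns the names of the IRSA variables that are
// absent or empty in the given environment map.
function missingIrsaEnv(env: Record<string, string | undefined>): string[] {
  return IRSA_ENV_VARS.filter((name) => !env[name]);
}
```

An empty result suggests the webhook mutated the pod; a non-empty result points at the service account annotation or the role setup rather than at the agent.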
from amazon-cloudwatch-agent-operator.
Hey @jefchien thanks for getting back to me.
Yeah, this is the TypeScript code used with Pulumi IaC to deploy EKS CloudWatch.
// This configures IRSA with an `AssumeRoleWithWebIdentity` - confirmed working with other addons including:
// * AWS EFS CSI Driver
// * Karpenter Controller
const role = IamServiceAccountRole(`${clusterPetName}-cloudwatch-observability`, {
  namespaceName: 'amazon-cloudwatch',
  serviceAccountName: 'cloudwatch-agent',
});

new aws.iam.RolePolicyAttachment(`${clusterPetName}-cloudwatch-observability-agent`, {
  policyArn: 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy',
  role,
});

new aws.iam.RolePolicyAttachment(`${clusterPetName}-cloudwatch-observability-xray`, {
  policyArn: 'arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess',
  role,
});

const addonVersion = aws.eks.getAddonVersionOutput({
  addonName: 'amazon-cloudwatch-observability',
  kubernetesVersion: clusterVersion,
  mostRecent: true,
});

const addon = new aws.eks.Addon(
  `${clusterPetName}-cloudwatch-observability`,
  {
    clusterName,
    addonName: addonVersion.addonName,
    addonVersion: addonVersion.version,
    serviceAccountRoleArn: role.arn,
    preserve: false,
  },
  { dependsOn },
);
Yes, here are the agent logs. I've formatted these for readability:
2024-04-09T01:27:56Z E! {
"caller": "[email protected]/cwlog_client.go:135",
"msg": "cwlog_client: Error occurs in PutLogEvents",
"kind": "exporter",
"data_type": "metrics",
"name": "awsemf/containerinsights",
"error": "SharedCredsLoad: failed to load shared credentials file
caused by: FailedRead: unable to open file
caused by: open /root/.aws/credentials: no such file or directory",
"stacktrace": "github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/cwlogs.(*Client).PutLogEvents
\tgithub.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/[email protected]/cwlog_client.go:135
github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/cwlogs.(*logPusher).pushEventBatch
\tgithub.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/[email protected]/pusher.go:264
github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/cwlogs.(*logPusher).AddLogEntry
\tgithub.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/[email protected]/pusher.go:238
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter.(*emfExporter).pushMetricsData
\tgithub.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/emf_exporter.go:153
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export
\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:58
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send
\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/timeout_sender.go:38
go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send
\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:33
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:173
go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send
\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:33
go.opentelemetry.io/collector/exporter/exporterhelper.(*baseExporter).send
\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:189
go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func1
\tgo.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:98
go.opentelemetry.io/collector/consumer.ConsumeMetricsFunc.ConsumeMetrics
\tgo.opentelemetry.io/collector/[email protected]/metrics.go:25
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry.(*wrapperMetricsExporter).ConsumeMetrics
\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/resource_to_telemetry.go:32
go.opentelemetry.io/collector/processor/batchprocessor.(*batchMetrics).export
\tgo.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:442
go.opentelemetry.io/collector/processor/batchprocessor.(*shard).sendItems
\tgo.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:256
go.opentelemetry.io/collector/processor/batchprocessor.(*shard).start
\tgo.opentelemetry.io/collector/processor/[email protected]/batch_processor.go:218"
}
2024-04-09T01:27:56Z W! {
"caller": "[email protected]/batch_processor.go:258",
"msg": "Sender failed",
"kind": "processor",
"name": "batch/containerinsights",
"pipeline": "metrics/containerinsights",
"error": "SharedCredsLoad: failed to load shared credentials file
caused by: FailedRead: unable to open file
caused by: open /root/.aws/credentials: no such file or directory"
}
The only way to set RUN_WITH_IRSA to true is to edit the amazoncloudwatchagents.cloudwatch.aws.amazon.com resource, because the operator reconciles the pods against that resource. I edited the resource like so:
apiVersion: v1
items:
- apiVersion: cloudwatch.aws.amazon.com/v1alpha1
  kind: AmazonCloudWatchAgent
  metadata:
    annotations:
      pulumi.com/patchForce: "true"
    creationTimestamp: "2024-04-01T08:21:38Z"
    generation: 5
    labels:
      app.kubernetes.io/managed-by: amazon-cloudwatch-agent-operator
    name: cloudwatch-agent
    namespace: amazon-cloudwatch
    resourceVersion: "3839446"
    uid: 542fecd4-0368-4ab1-8d8b-e7e5ad47c538
  spec:
    config: '{"agent":{"region":"us-west-2"},"logs":{"metrics_collected":{"app_signals":{"hosted_in":"opal-quokka-6860d02"},"kubernetes":{"cluster_name":"opal-quokka-6860d02","enhanced_container_insights":true}}},"traces":{"traces_collected":{"app_signals":{}}}}'
    env:
+   - name: RUN_WITH_IRSA
+     value: "true"
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
This edit enables the CloudWatch Agent to succeed. However, there is no way to guarantee that this change is persistent, because the cloudwatch-agent resource is owned by the addon and could be overwritten during an upgrade.
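Because the resource can be overwritten, any mechanism that re-applies the edit needs to be safe to run repeatedly. A minimal sketch of that idea, assuming the env list is available as a plain array: the names `EnvVar` and `withRunWithIrsa` are hypothetical and do not come from the operator or from Pulumi, and the function only transforms data, it does not talk to the cluster.

```typescript
// Shape of one entry in spec.env (simplified; valueFrom is left opaque).
interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: unknown;
}

// Hypothetical helper: returns a copy of the env list with RUN_WITH_IRSA set
// to the string "true". Any existing RUN_WITH_IRSA entry is replaced, so
// applying the function twice gives the same result as applying it once.
function withRunWithIrsa(env: readonly EnvVar[]): EnvVar[] {
  const rest = env.filter((entry) => entry.name !== "RUN_WITH_IRSA");
  // Kubernetes env values are strings, hence "true" rather than a boolean.
  return [{ name: "RUN_WITH_IRSA", value: "true" }, ...rest];
}
```

This does not solve the ownership problem described above; it only keeps a repeated patch from duplicating the entry.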
from amazon-cloudwatch-agent-operator.
Do you have IMDS disabled or a hop limit set to 1? This seems similar to aws/amazon-cloudwatch-agent#1101, where the agent thinks it is onPrem because it cannot reach IMDS, which results in it trying to read the /root/.aws/credentials file.
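The behavior described in this thread can be summarized as a small decision function. This is an illustrative model of what the comments report, not the agent's actual code; the type and function names are hypothetical, and the real detection logic may consider more inputs.

```typescript
// Hypothetical labels for where the agent ends up looking for credentials.
type CredentialMode = "irsa" | "ec2-instance-role" | "shared-credentials-file";

interface AgentEnvironment {
  // RUN_WITH_IRSA=true is set on the pod.
  runWithIrsa: boolean;
  // False when IMDS is disabled, or when the hop limit is 1 and the agent
  // runs in a pod without host networking.
  imdsReachable: boolean;
}

// Model of the reported behavior: the explicit flag wins, a reachable IMDS
// leads to node credentials, and otherwise the agent treats itself as
// onPrem and tries /root/.aws/credentials.
function resolveCredentialMode(env: AgentEnvironment): CredentialMode {
  if (env.runWithIrsa) return "irsa";
  if (env.imdsReachable) return "ec2-instance-role";
  return "shared-credentials-file";
}
```

Under this model, the reported failure corresponds to `resolveCredentialMode({ runWithIrsa: false, imdsReachable: false })`, which lands on the shared credentials file that does not exist in the container.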
from amazon-cloudwatch-agent-operator.
Yes, because that is considered best practice - but also because if node IMDS is enabled, it is not using pod identity, it's using node identity.
If the CloudWatch Agent does not work with IMDS hop limit set to 1, what is this section doing?
I think the answer is "nothing"?
from amazon-cloudwatch-agent-operator.
Any updates on this? It's preventing us from using it.
from amazon-cloudwatch-agent-operator.