Comments (5)
Thanks for the report @Shershebnev !
Are any errors reported on the instance via the console using "Get system log", "Get instance screenshot", or "EC2 serial console"? I suspect that some additional user data and/or roles may be needed for the instance. I'm looking back over the changelogs to confirm this suspicion and explain the differences between k8s versions.
from bottlerocket.
The log is empty, but the screenshot shows an encryption error.
That's on Intel-based instances (m6i.4xlarge).
I've also tried an AMD-based instance (m6a.4xlarge); it seems to get stuck while booting.
I've also tried the oldest AMI I can see, 1.13.0 (bottlerocket-aws-k8s-1.26-x86_64-v1.13.0-f7a2e3cc), and it works fine: it shows the same encryption error, but it proceeds further and appears in SSM almost immediately.
Yet 1.14.0 gets stuck.
At this point I realized that I actually have nodes in EKS that I've switched to Bottlerocket, and they work fine on the latest AMI for 1.26, although on the NVIDIA variant (bottlerocket-aws-k8s-1.26-nvidia-x86_64-v1.15.1-264e294c); they appear in SSM as well. The only differences I could see are the /dev/xvda root volume size (4 GB vs 2 GB) and the EKS nodes being on the NVIDIA variant. I changed both, and with that setup it seems to get past the encryption error, but then it still got stuck.
After some more waiting, I got a system log ending with:
[ 305.391718] sundog[1858]: Setting generator 'pluto private-dns-name' failed with exit code 1 - stderr: Timed out retrieving private DNS name from EC2: deadline has elapsed
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Send signal to CloudFormation Stack.
[DEPEND] Dependency failed for Sets the hostname.
i-0ce93f121a3bf8b3a.log
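When scanning a longer system log for this kind of failure, a small script can pull out which setting generator failed and why. The following is only a sketch: the regular expression is inferred from the single sundog line quoted above, and the function name `find_generator_failures` is hypothetical rather than part of any Bottlerocket tooling.

```python
import re

# Pattern for the sundog failure line quoted above. The layout is inferred
# from that one sample and is an assumption, not a documented log format.
SUNDOG_FAILURE = re.compile(
    r"sundog\[\d+\]: Setting generator '(?P<generator>[^']+)' "
    r"failed with exit code (?P<code>\d+) - stderr: (?P<stderr>.*)"
)


def find_generator_failures(log_text):
    """Return (generator, exit_code, stderr) for each matching sundog line."""
    failures = []
    for line in log_text.splitlines():
        match = SUNDOG_FAILURE.search(line)
        if match:
            failures.append(
                (match["generator"], int(match["code"]), match["stderr"].strip())
            )
    return failures


# Sample taken from the log excerpt in this comment.
SAMPLE_LOG = (
    "[ 305.391718] sundog[1858]: Setting generator 'pluto private-dns-name' "
    "failed with exit code 1 - stderr: Timed out retrieving private DNS name "
    "from EC2: deadline has elapsed\n"
    "[FAILED] Failed to start User-specified setting generators.\n"
)

if __name__ == "__main__":
    for generator, code, stderr in find_generator_failures(SAMPLE_LOG):
        print(f"{generator} (exit {code}): {stderr}")
```

Saving the text from "Get system log" to a file and passing its contents to `find_generator_failures` would list every generator failure in one pass.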
I can confirm that DNS resolution is enabled on this VPC.
There seems to be a related issue, #3064; however, my failing instances are in a public subnet, so the problem doesn't seem to be caused by what was going on in that issue. My EKS nodes, which seem to work fine, are in private subnets.
This turned into quite a long post, sorry about that. In a nutshell:
- When starting standalone instances in a public subnet:
  - AMI version 1.13.0 seems to work fine and appears in SSM almost immediately, even with a 2 GB root volume.
  - AMI version 1.14.0 and later (including the latest version) gets stuck, either on the encryption error or, once the root volume is increased to 4 GB, for several minutes before ending with the DNS resolution error from the log above.
- In EKS, when starting nodes in private subnets, everything seems to work fine (though the encryption error is still visible). There the nodes start with a 4 GB root volume. I also find it strange that the default root volume size seems to differ, since I don't specify the root volume size in EKS explicitly.
Hope this is helpful :)
Related to #3525 (comment), I think we might need to add EC2 Describe Images access to the IAM role policies attached in https://github.com/aws-samples/containers-blog-maelstrom/blob/ee8e18c0bb170f625b86a59dfc0605e9c98cdee3/bottlerocket-images-cache/ebs-snapshot-instance.yaml#L44. For example, I have AmazonEKSWorkerNodePolicy attached with:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVolumes",
                "ec2:DescribeVolumesModifications",
                "ec2:DescribeVpcs",
                "eks:DescribeCluster"
            ],
            "Resource": "*"
        }
    ]
}
as the policy. This might be the missing piece. Can you try this and see if it resolves the issues with 1.26 coming up? If so, we can try and get this other repo updated to cover this permissions addition.
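To check quickly whether a given action is covered by a policy document like the one above, a rough local check can help before editing the role. This is a minimal sketch under loud assumptions: the helper `allows` is hypothetical, it only looks at `Allow` statements and their `Action` lists, and it ignores `Deny`, `NotAction`, `Resource`, and `Condition`, so it is not a substitute for IAM's real evaluation logic.

```python
import json
from fnmatch import fnmatchcase

# The policy document quoted in the comment above.
POLICY_JSON = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVolumes",
                "ec2:DescribeVolumesModifications",
                "ec2:DescribeVpcs",
                "eks:DescribeCluster"
            ],
            "Resource": "*"
        }
    ]
}
"""


def _as_list(value):
    """IAM allows a single item or a list for Statement and Action."""
    return value if isinstance(value, list) else [value]


def allows(policy, action):
    """Rough check: does any Allow statement list a matching action?

    Simplification (assumption): Deny, NotAction, Resource, and Condition
    are ignored. Matching is case-insensitive with * and ? wildcards.
    """
    wanted = action.lower()
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            if fnmatchcase(wanted, pattern.lower()):
                return True
    return False


if __name__ == "__main__":
    policy = json.loads(POLICY_JSON)
    for name in ("ec2:DescribeInstances", "ec2:DescribeImages"):
        print(name, "allowed" if allows(policy, name) else "not listed")
```

Running the same check against a policy whose action list contains a wildcard such as `ec2:Describe*` would report both actions as allowed, since the wildcard pattern covers them.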
I've tried with the AmazonEC2ReadOnlyAccess AWS managed policy, and everything works now on the latest 1.26 🎉
Sounds great! Glad we got you sorted!