Comments (3)
Testing shows that this issue is intermittent. The logs show networkd being reloaded multiple times during boot of an instance running v2.5.0. This can cause failures if a cloud-init userdata script performs network actions (such as fetching from S3) during the reloads.
However, repeated testing does not reliably reproduce this behaviour, which leads me to think the issue is one of the following: timing related; related to the removal of cleanup from the PostExec step; or related to the change to start the timers via systemd Wants=, which may need a Before=|After=.
from amazon-ec2-net-utils.
I know that cloud-init is configured differently on EL vs AL, but we encountered a version of this issue on EL (in v2.4.0). The solution I came up with was to make cloud-init configure its network in fallback mode (i.e., without using networkd/networkmanager/etc. and only using dhcpcd). Obviously, this is not something that you can implement here, but I will paste the relevant lines from our build script:
sed -i "s/renderers: .*/renderers: []\n activators: ['networkd']/" /etc/cloud/cloud.cfg
echo "network: {config: disabled}" >> /etc/cloud/cloud.cfg
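To see what those two lines do before touching a real image, here is a minimal sketch that applies them to a scratch copy. The sample file content is hypothetical (it is not a real cloud.cfg), and the `\n` in the sed replacement assumes GNU sed.

```shell
#!/bin/sh
# Sketch only: run the two build-script edits against a temporary file
# instead of /etc/cloud/cloud.cfg. The sample content is made up.
set -eu

cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
system_info:
  network:
    renderers: ['networkd', 'netplan']
EOF

# Same substitution as in the build script; "\n" in the replacement
# is a GNU sed extension.
sed -i "s/renderers: .*/renderers: []\n activators: ['networkd']/" "$cfg"

# Same append as in the build script.
echo "network: {config: disabled}" >> "$cfg"

# Print the result so the indentation of the inserted line can be checked.
cat "$cfg"
```

Check the printed file by eye: YAML is indentation-sensitive, so the inserted activators line has to line up with the surrounding keys in your actual cloud.cfg.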
This works because the network stack doesn't meaningfully change configuration once amazon-ec2-net-utils takes over, and cloud-init does nothing special to try to persist its "bootleg" IP configuration when configured this way. You will DHCP twice, however, and I don't think this is resolvable with this solution.
The correct solution is either for amazon-ec2-net-utils to use the same files/naming conventions as cloud-init (these are not configurable in cloud-init, sadly), or to change cloud-init to make these configurable enough that this package could drop in a configuration file that switched to and configured cloud-init to use the same network units that amazon-ec2-net-utils manages. If you used the same unit names and were careful that the amazon-ec2-net-utils units were placed in a directory with higher precedence than cloud-init, then cloud-init's initial unit file would just become superseded by amazon-ec2-net-utils, and networkd wouldn't end up dropping any packets (theoretically) because the fields that change wouldn't require down/up'ing the interface.
Symlinks unfortunately don't seem to work because cloud-init writes the unit files, and as the unit files differ slightly in naming and configuration, the system ends up in an undefined state (as you see with this issue), except now networkd knows about the undefined state (and that's worse because now it is conflicting).
I also couldn't get a combination of Wants/After that I was happy with. If you make amazon-ec2-net-utils entirely dependent on cloud-init-local.service, you still have the timing problem with cloud-init.service, and that's arguably worse because cloud-init-local only needs a network to fetch userdata but cloud-init needs it for any user-defined behavior. So you could then make amazon-ec2-net-utils depend on cloud-init and cloud-init-final, but then the userdata configuration cannot benefit from the network stack having been brought up to the standard that amazon-ec2-net-utils provides. If you make cloud-init-local dependent on amazon-ec2-net-utils, you really need it to be dependent on a target due to the dynamic unit instances, and now you need a target that can somehow represent "amazon-ec2-net-utils is done", which makes late-attached ENIs (potentially done in user data) have to work differently than designed, and that sucks too.
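For anyone who wants to experiment with the first option (ordering amazon-ec2-net-utils after cloud-init-local.service), a drop-in might look roughly like the sketch below. It writes into a scratch directory rather than the live system. The unit name policy-routes@.service and the drop-in file name are assumptions for illustration, so check which units your installed version actually ships. As described above, this does not solve the timing problem with cloud-init.service.

```shell
#!/bin/sh
# Sketch only: generate a systemd drop-in under a temporary root.
# "policy-routes@.service" is an assumed unit name; the drop-in file
# name is hypothetical.
set -eu

root="$(mktemp -d)"
dropin_dir="$root/etc/systemd/system/policy-routes@.service.d"
dropin="$dropin_dir/10-after-cloud-init-local.conf"

mkdir -p "$dropin_dir"
cat > "$dropin" <<'EOF'
[Unit]
# Pull in cloud-init-local.service and start only after it.
Wants=cloud-init-local.service
After=cloud-init-local.service
EOF

# Show what would be installed.
cat "$dropin"
```

On a real system the file would go under /etc/systemd/system/ and need a `systemctl daemon-reload` to take effect.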
from amazon-ec2-net-utils.
That is interesting to know. I've been trying to set up some sort of reproducer so that I can catch this bug in the wild. Still investigating. Some code is parked in the trim-changes branch for now.
Definitely worth investigating orchestration between net-utils and cloud-init, since we have seen similar timing issues between cloud-init and IMDS (which net-utils depends on).
from amazon-ec2-net-utils.