Comments (3)

vigh-m commented on June 7, 2024

Testing shows that this issue is intermittent. The logs show that networkd is reloaded multiple times while an instance running v2.5.0 boots. This can cause failures if a cloud-init userdata script performs network actions (such as fetching from S3) during the reloads.
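To illustrate the failure mode, here is a minimal sketch of a retry wrapper that a userdata script could put around a network action so that a transient failure during a networkd reload is retried rather than fatal. It does not address the reloads themselves. The function name `retry`, its arguments, and the bucket and key in the usage comment are all assumptions for illustration, not part of amazon-ec2-net-utils or cloud-init.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Hypothetical helper; not provided by cloud-init or amazon-ec2-net-utils.
retry() {
    max="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical usage in a userdata script (bucket and key are placeholders):
# retry 5 2 aws s3 cp s3://example-bucket/example-key /tmp/example-key
```

A fixed delay keeps the sketch short; a real script might prefer a growing delay.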

However, repeated testing does not reliably reproduce this behaviour, which leads me to think that the issue is timing related, is related to the removal of cleanup from the PostExec step, or has something to do with the change of the timers to start via systemd Wants= and maybe needs a Before=|After=.

from amazon-ec2-net-utils.

ziggythehamster commented on June 7, 2024

I know that cloud-init is configured differently on EL vs AL, but we encountered a version of this issue on EL (in v2.4.0). The solution I came up with was to make cloud-init configure its network in fallback mode (i.e., without using networkd/networkmanager/etc. and only using dhcpcd). Obviously, this is not something that you can implement here, but I will paste the relevant lines from our build script:

sed -i "s/renderers: .*/renderers: []\n      activators: ['networkd']/" /etc/cloud/cloud.cfg
echo "network: {config: disabled}" >> /etc/cloud/cloud.cfg

This works because the network stack doesn't meaningfully change configuration once amazon-ec2-net-utils takes over, and cloud-init does nothing special to try to persist its "bootleg" IP configuration when configured this way. You will DHCP twice, however, and I don't think this is resolvable with this solution.
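To see what the two build-script lines above do without touching a real system, the same edits can be applied to a scratch file. This is a sketch under assumptions: the sample file contents are hypothetical and much shorter than a real cloud.cfg, the function name `apply_fallback_edits` is made up for illustration, and the `\n` in the replacement relies on GNU sed.

```shell
# Apply the same two edits as the build script, but to a file passed as $1
# rather than /etc/cloud/cloud.cfg. Hypothetical helper name.
apply_fallback_edits() {
    cfg="$1"
    # GNU sed: "\n" in the replacement inserts a newline, so the renderers
    # line is emptied and an activators line is added directly below it.
    sed -i "s/renderers: .*/renderers: []\n      activators: ['networkd']/" "$cfg"
    # Appended at top level, as in the original build script.
    echo "network: {config: disabled}" >> "$cfg"
}

# Hypothetical sample contents, not a real cloud.cfg.
scratch="$(mktemp)"
cat > "$scratch" <<'EOF'
system_info:
  network:
    renderers: ['networkd', 'eni']
EOF

apply_fallback_edits "$scratch"
cat "$scratch"
```

Printing the scratch file afterwards makes it easy to check the resulting indentation before running the real commands in an image build.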

The correct solution is either for amazon-ec2-net-utils to use the same files/naming conventions as cloud-init (these are not configurable in cloud-init, sadly), or to change cloud-init to make these configurable enough that this package could drop in a configuration file telling cloud-init to use the same network units that amazon-ec2-net-utils manages. If you used the same unit names and were careful to place the amazon-ec2-net-utils units in a directory with higher precedence than cloud-init's, then cloud-init's initial unit file would simply be superseded by the amazon-ec2-net-utils one, and networkd wouldn't end up dropping any packets (theoretically) because the fields that change wouldn't require down/up'ing the interface.

Symlinks unfortunately don't seem to work because cloud-init writes the unit files, and as the unit files differ slightly in naming and configuration, the system ends up in an undefined state (as you see with this issue), except now networkd knows about the undefined state (and that's worse because now it is conflicting).

I also couldn't get a combination of Wants/After that I was happy with. If you make amazon-ec2-net-utils entirely dependent on cloud-init-local.service, you still have the timing problem with cloud-init.service, and that's arguably worse because cloud-init-local only needs a network to fetch userdata but cloud-init needs it for any user-defined behavior. So you could then make amazon-ec2-net-utils depend on cloud-init and cloud-init-final, but then the userdata configuration cannot benefit from the network stack being configured to the standard that amazon-ec2-net-utils provides. If you make cloud-init-local dependent on amazon-ec2-net-utils, you really need it to be dependent on a target due to the dynamic unit instances, and now you need a target that can somehow represent "amazon-ec2-net-utils is done", which makes late-attached ENIs (potentially attached in user data) have to work differently than designed, and that sucks too.
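For readers unfamiliar with the mechanics, the last option above (ordering cloud-init-local after a "done" target) would take roughly the shape of the drop-in below. This is only a sketch: `ec2-net-ready.target` is a hypothetical target name that no package is known to ship, the drop-in file name is made up, and the file is written to a scratch directory rather than /etc/systemd/system. Whether such ordering would behave acceptably is exactly the open problem described above.

```shell
# Write an illustrative drop-in for cloud-init-local.service into a scratch
# directory. Nothing here is installed or loaded by systemd.
dropin_dir="$(mktemp -d)/cloud-init-local.service.d"
mkdir -p "$dropin_dir"

# Wants= pulls the target into the transaction; After= only orders against it.
# The two are independent, which is why both appear.
# "ec2-net-ready.target" is a hypothetical name used for illustration.
cat > "$dropin_dir/10-wait-for-net-utils.conf" <<'EOF'
[Unit]
Wants=ec2-net-ready.target
After=ec2-net-ready.target
EOF

cat "$dropin_dir/10-wait-for-net-utils.conf"
```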

vigh-m commented on June 7, 2024

That is interesting to know. I've been trying to set up some sort of reproducer so that I can catch this bug in the wild. Still investigating. Some code is parked in the trim-changes branch for now.

It is definitely worth investigating the orchestration between net-utils and cloud-init, since we have seen similar timing issues between cloud-init and IMDS (which net-utils depends on).
