euank commented on August 12, 2024

Thanks to @pieterlange for the PR to fix this problem on master. That change should be built/published now, so this issue should be fixed.

Ideally the master/edge/latest mutable tags wouldn't be the way these images are consumed, but I think discussion of these tags belongs in a different issue.
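The difference between a mutable tag and a pinned reference can be checked mechanically. The following is a hypothetical POSIX shell helper (not part of kube-aws or the awscli image) that flags image references relying on master, edge, latest, or no tag at all; the version tag used in the example is illustrative only and may not exist on quay.io.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if an image reference relies on a
# mutable tag (master, edge, latest, or no tag), fail (exit 1) if it is
# pinned by digest or by some other explicit tag.
is_mutable_ref() {
  ref="$1"
  case "$ref" in
    *@sha256:*) return 1 ;;      # pinned by content digest
  esac
  # Only look after the last '/', so a registry port (host:5000/...) is not
  # mistaken for a tag.
  last="${ref##*/}"
  case "$last" in
    *:*) tag="${last##*:}" ;;
    *)   tag="latest" ;;          # no tag means the default "latest"
  esac
  case "$tag" in
    master|edge|latest) return 0 ;;
    *) return 1 ;;
  esac
}

# Illustrative references; the 1.11.129 tag is hypothetical.
for ref in quay.io/coreos/awscli:master quay.io/coreos/awscli:1.11.129; do
  if is_mutable_ref "$ref"; then
    echo "$ref: mutable"
  else
    echo "$ref: pinned"
  fi
done
```

Such a check could run in CI against rendered cluster templates, so a new mutable reference is caught before it reaches a cluster.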

Thanks for reporting the issue.

from awscli.

Fsero commented on August 12, 2024

Thanks for pointing this out, @mumoshu. We pinned versions and set up an internal Docker registry after quay.io was down, and now we fetch new versions from quay or gcr to keep in sync.

pieterlange commented on August 12, 2024

So, I know it's bad to depend on "latest"/"master" unversioned image tags, but this has been a kube-aws dependency since its inception and could potentially hit every cluster deployed with kube-aws. Can we restore the previous behaviour on master for now, @euank, and give users some time to update their clusters?

whereisaaron commented on August 12, 2024

Maybe, @ankon, but I suggest changing it to 'off' to be sure. I had an old 0.9.3 etcd node reboot even with the 'etcd-lock' errors. It appeared to be for updates, but maybe it was for a different reason.

ankon commented on August 12, 2024

I noticed on quay.io/coreos/awscli that the 'master' image now has a 'CMD' in the Dockerfile, but I cannot see where this actually comes from.

Master:
[Screenshot: quay.io/coreos/awscli 'master' image details]

Latest:
[Screenshot: quay.io/coreos/awscli 'latest' image details]

euank commented on August 12, 2024

The CMD /bin/sh comes from this change in alpine: gliderlabs/docker-alpine#199

The FROM alpine line implicitly picks up that CMD whenever a new enough alpine image is referenced.

We can fix this by explicitly overriding CMD to be /bin/sh -c to match the previous behavior.

It can also be worked around by using either of the following commands:

/usr/bin/rkt run  ..... quay.io/coreos/awscli:latest --exec=aws -- s3 ....

# or

/usr/bin/rkt run  ..... quay.io/coreos/awscli:latest --exec=sh -- -c "aws s3 ...."

ankon commented on August 12, 2024

For now I worked around this by using :latest, which right now seems to be "previous". It would be great to have versions on this repository, so that one can point to the specific images.

Right now I need to recover my entire Kubernetes cluster, as the attempt to fix this problem ("cannot bring up new nodes") created a major mess because some other critical commands failed. I think it would be best to restore the previous behavior for now, on the assumption that this may have affected more people, and since the change is effectively an "API change" for awscli:master, if you will :)

pieterlange commented on August 12, 2024

I'm in the same boat with an old cluster that needs upgrading, but it looks like the current kube-aws behaviour would also trigger this.

https://github.com/kubernetes-incubator/kube-aws/blob/master/core/controlplane/config/templates/cluster.yaml#L964

mumoshu commented on August 12, 2024

cc'ing @Fsero @danielfm @c-knowles @camilb @whereisaaron

If your nodes start failing while fetching cloud-configs from S3, this issue should help: set awsCliImage.tag to something older, or override the aws command in cloud-config-* as @ankon suggested.

whereisaaron commented on August 12, 2024

Thanks @mumoshu. We should warn kube-aws users to check their etcd nodes ASAP. Versions of kube-aws from at least 0.9.3 0.9.5-rc.5 and earlier deployed etcd with the etcd-lock reboot strategy. Given this issue, it appears the next automatic update reboot could bring down everyone's k8s etcd cluster without notice? The current 0.9.5 and later versions appear safe ('reboot-strategy: "off"') by default. Not sure about 0.9.4/5/6/7/8?

It looks like this was fixed on March 20 for version v0.9.5-rc.5 and later, changing the etcd node strategy to 'off'. So any cluster older than ~9 months is ripe for sudden etcd failure.
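Checking a node comes down to reading its reboot strategy. A minimal sketch, assuming the usual Container Linux layout where locksmithd reads a REBOOT_STRATEGY= line from /etc/coreos/update.conf; the helper names are hypothetical, and the example runs against a temporary file rather than a live node.

```shell
#!/bin/sh
# Hypothetical helper: print the reboot strategy from an update.conf file
# (KEY=VALUE lines, e.g. REBOOT_STRATEGY=etcd-lock). Defaults to the usual
# Container Linux path; quotes around the value are stripped.
reboot_strategy() {
  conf="${1:-/etc/coreos/update.conf}"
  # Keep only the last REBOOT_STRATEGY= line if there are several.
  sed -n 's/^REBOOT_STRATEGY=//p' "$conf" | tr -d '"' | tail -n 1
}

# Hypothetical helper: classify the strategy found in the given file.
check_node() {
  strategy="$(reboot_strategy "$1")"
  case "$strategy" in
    off)       echo "ok: reboot-strategy is off" ;;
    etcd-lock) echo "warning: etcd-lock may reboot this etcd node automatically" ;;
    *)         echo "check manually: reboot-strategy='${strategy}'" ;;
  esac
}

# Example against a throwaway file instead of a real node.
tmp="$(mktemp)"
printf 'GROUP=stable\nREBOOT_STRATEGY=etcd-lock\n' > "$tmp"
check_node "$tmp"
rm -f "$tmp"
```

As ankon reports, an etcd-lock node may still not reboot if locksmithd has no reboot window configured, so a warning here means "may reboot", not "will reboot".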

@ankon sorry you got to be the canary!

This is obviously a hard lesson in using versioned tags! To be fair though, coreos/awscli doesn't actually have any tagged versions. Only master, latest, and edge (or is that big hex string a 'version'?). Could we now tag the old ('latest') version and this new version with a versioned tag?

Given the purpose of this container is purely to provide aws, could we tag the old and new versions with the awscli version each contains? Then current and future clusters can be pinned to one of these versions, and we won't have this 'cluster apocalypse' again with the next breaking change :-).
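Deriving such a tag could be automated from the CLI's own version string. A minimal sketch, assuming the usual `aws --version` output format (aws-cli/X.Y.Z followed by Python and platform details); the helper name, sample string, and version number are illustrative, not taken from the actual image.

```shell
#!/bin/sh
# Hypothetical helper: extract "X.Y.Z" from output such as
# "aws-cli/1.11.129 Python/2.7.13 Linux/4.9.0 botocore/1.5.92".
awscli_version() {
  # Take the first whitespace-separated field, then drop the "aws-cli/" prefix.
  printf '%s\n' "$1" | awk '{print $1}' | sed 's|^aws-cli/||'
}

# In practice the input would come from the image itself, for example:
#   docker run --rm --entrypoint aws quay.io/coreos/awscli:latest --version 2>&1
# (2>&1 because older awscli releases may print the version to stderr).
sample="aws-cli/1.11.129 Python/2.7.13 Linux/4.9.0 botocore/1.5.92"
version="$(awscli_version "$sample")"
echo "quay.io/coreos/awscli:${version}"
```

The resulting reference would then be applied with docker tag and pushed, giving clusters an immutable-by-convention tag to use instead of :master.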

ankon commented on August 12, 2024

> Versions of kube-aws from at least 0.9.7 and earlier deployed etcd with the etcd-lock reboot strategy.

Just had a minor heart-attack, that was a good pointer to check!

At least on my (horribly old, kube-aws 0.9.4-ish) cluster, this etcd node does seem to be configured with the 'etcd-lock' reboot strategy, BUT locksmithd complains about a missing "reboot window", so the reboot doesn't actually happen. I'm in the process of replacing this cluster, so right now things seem to be OK ... for me.

> Given the purpose of this container is purely to provide aws could we tag the old and new versions with the awscli version it contains?

This seems reasonable to me: there doesn't seem to be much point in using an arbitrary AWS CLI version when the parts actually needed are known in advance (copy stuff from S3, something else? :D).
