kirs commented on August 28, 2024

I agree that it would be great to have this kind of check, but it might be problematic: for security reasons, the host that runs kubernetes-deploy might not have access to the container registry. The registry auth may also be project-specific (VPN or token).
Maybe we should leave it up to users?

from krane.

KnVerey commented on August 28, 2024

I agree with Kir that we shouldn't assume the deploy host has access to the registry. Conversely, it would be possible (if stranger) for the deploy host to have access but the production cluster not to. We should definitely improve how we handle this situation, though.

Did you see warnings logged about the ImagePullBackOff? This code is supposed to log one, giving you the chance to realize the deploy is unrecoverable and abort it. At a minimum, we could improve that message to suggest an action.
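To make the "suggest action" idea concrete, here is a minimal sketch, assuming pods are read as the JSON that `kubectl get pod NAME -o json` returns. `ErrImagePull` and `ImagePullBackOff` are the real waiting reasons Kubernetes reports for image pull failures; the function name `image_pull_warnings` and the message wording are hypothetical, not krane's actual code or log output.

```python
from typing import Dict, List

# Waiting reasons the kubelet reports when it cannot pull a container image.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_warnings(pod: Dict) -> List[str]:
    """Return one actionable warning per container stuck pulling its image.

    `pod` is a Pod object in the shape returned by `kubectl get pod NAME -o json`.
    Hypothetical helper: the name and message wording are illustrative only.
    """
    pod_name = pod["metadata"]["name"]
    status = pod.get("status", {})
    warnings = []
    # Check init containers too: a stuck init container blocks the whole pod.
    for key in ("initContainerStatuses", "containerStatuses"):
        for cs in status.get(key, []):
            waiting = cs.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in IMAGE_PULL_REASONS:
                warnings.append(
                    f"{pod_name}/{cs['name']}: {reason} for image {cs['image']!r}. "
                    "Check that the tag exists and that the cluster can reach and "
                    "authenticate to the registry; if not, abort the deploy."
                )
    return warnings


if __name__ == "__main__":
    stuck_pod = {
        "metadata": {"name": "web-7d9f"},
        "status": {
            "containerStatuses": [
                {
                    "name": "app",
                    "image": "registry.example.com/app:abc123",
                    "state": {"waiting": {"reason": "ImagePullBackOff"}},
                },
                {"name": "sidecar", "image": "busybox", "state": {"running": {}}},
            ]
        },
    }
    for line in image_pull_warnings(stuck_pod):
        print("WARNING:", line)
```

Only the `app` container produces a warning here; the running `sidecar` is ignored.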

#54 proposes aborting the deploy at some point based on observing this condition. We'd have to think carefully about when we make the "it's doomed" call, though... Last week we saw a case where issues with the production env's link to the registry caused a huge deployment to roll out super slowly, with many (but not all) containers in ImagePullBackOff. Ideally that case wouldn't fail the deploy. Maybe we could flag this condition to the parent deployment object, which could make the call when all its children are persistently in that state... still a timeout, but hopefully smarter?

On the other hand, maybe we should keep the log-warning approach and expect deployments to specify progressDeadlineSeconds to actually fail the deploy on the Kubernetes side, not just from our gem's perspective. AFAICT that field is the Kubernetes answer to permanent failure conditions, including this one.
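For reference, here is a minimal sketch of how a deploy tool could detect that Kubernetes itself has given up, assuming the Deployment is read as the JSON from `kubectl get deployment NAME -o json`. When `spec.progressDeadlineSeconds` (600 seconds by default in `apps/v1`) elapses without progress, the Deployment controller sets the `Progressing` condition to status `"False"` with reason `"ProgressDeadlineExceeded"`; it reports the failure but does not roll back automatically. The helper name `progress_deadline_exceeded` is hypothetical.

```python
from typing import Dict


def progress_deadline_exceeded(deployment: Dict) -> bool:
    """Return True once Kubernetes has marked the rollout as failed.

    Hypothetical helper. Looks for the Progressing condition that the
    Deployment controller sets after spec.progressDeadlineSeconds elapses
    without progress.
    """
    for cond in deployment.get("status", {}).get("conditions", []):
        if (
            cond.get("type") == "Progressing"
            and cond.get("status") == "False"
            and cond.get("reason") == "ProgressDeadlineExceeded"
        ):
            return True
    return False


if __name__ == "__main__":
    stalled = {
        "spec": {"progressDeadlineSeconds": 600},
        "status": {
            "conditions": [
                {"type": "Available", "status": "True", "reason": "MinimumReplicasAvailable"},
                {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"},
            ]
        },
    }
    print(progress_deadline_exceeded(stalled))  # True
```

The appeal of this approach is that the failure verdict lives in the cluster, so any tool (or `kubectl rollout status`) sees the same answer.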

KnVerey commented on August 28, 2024

IMO #116 resolved this. We now inspect all pods in the new ReplicaSet, and if they are all failing for a recognized reason (including because of failure to pull image), the deploy will fail immediately. Please reopen if you disagree.
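The rule described above can be sketched roughly as follows, assuming pods of the new ReplicaSet are available as Pod JSON objects. This is an illustration of the "all pods failing for a recognized reason" idea, not the code from #116; the names `DOOMED_REASONS`, `failure_reason`, and `rollout_doomed` are hypothetical, and the reason set is an assumed subset (the three values are real Kubernetes waiting reasons).

```python
from typing import Dict, List, Optional

# Waiting reasons treated as unrecoverable in this sketch. Assumed list for
# illustration; krane's actual set of recognized reasons may differ.
DOOMED_REASONS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName"}


def failure_reason(pod: Dict) -> Optional[str]:
    """Return the first recognized failure reason among the pod's containers."""
    for cs in pod.get("status", {}).get("containerStatuses", []):
        reason = (cs.get("state", {}).get("waiting") or {}).get("reason")
        if reason in DOOMED_REASONS:
            return reason
    return None


def rollout_doomed(new_replica_set_pods: List[Dict]) -> bool:
    """Fail fast only if every pod in the new ReplicaSet has a recognized failure.

    A partial failure (some pods pulling fine) keeps the deploy alive, so a
    flaky registry link that affects only some pods does not abort the rollout.
    """
    if not new_replica_set_pods:
        # No pods yet means no evidence either way; keep waiting.
        return False
    return all(failure_reason(pod) is not None for pod in new_replica_set_pods)
```

Requiring *all* pods to fail is what keeps the slow-registry case from the earlier comment (many, but not all, containers in ImagePullBackOff) from failing the deploy.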