
Comments (6)

AuroreM commented on August 11, 2024

We managed to understand the issue: calico-node starts too quickly, before kube-proxy has fully programmed its iptables rules, so the kubernetes service IP is not yet reachable.


caseydavenport commented on August 11, 2024

Yeah, calico/node relies on kube-proxy to program the kubernetes service IP in order to access the API, unless it is running in eBPF mode, in which case an explicit IP can be given.

Might just need to wait until kube-proxy is ready before Calico is installed.

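For context, in eBPF mode Calico bypasses kube-proxy entirely, so it has to be told where the API server is directly. A minimal sketch of how that explicit address is typically supplied in operator-based installs, via a `kubernetes-services-endpoint` ConfigMap (the host and port values here are placeholders, not real addresses):

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: tigera-operator
data:
  # Placeholder values: use a stable address for the API server that does not
  # depend on the cluster service IP, e.g. a load balancer in front of the
  # control plane nodes.
  KUBERNETES_SERVICE_HOST: "192.0.2.10"
  KUBERNETES_SERVICE_PORT: "6443"
```

With this in place, calico/node can reach the API server without depending on kube-proxy having programmed the `kubernetes` service cluster IP first.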

tmjd commented on August 11, 2024

@caseydavenport I don't think this is something that we or anyone else can control. They're both daemonsets, and there isn't anything that kube-proxy creates or sets that could delay calico-node startup, is there? I'm thinking of the case of adding new nodes to a cluster.
Should we consider a startup probe to cover this case? (I just learned about them and see they became available in v1.20.)
Here are a few links where I was reading about them:

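A startup probe defers the liveness probe until it has succeeded once, which would give calico-node extra time on a fresh node without loosening the steady-state liveness check. A minimal sketch of what that might look like on the calico-node container, assuming the startup probe reuses calico-node's existing liveness exec command; the thresholds are illustrative assumptions, not tested values:

```yaml
# Hypothetical addition to the calico-node container spec.
startupProbe:
  exec:
    command: ["/bin/calico-node", "-felix-live"]
  periodSeconds: 10
  failureThreshold: 30   # up to ~300s for kube-proxy to program the service IP on a new node
livenessProbe:
  exec:
    command: ["/bin/calico-node", "-felix-live"]
  periodSeconds: 10
  failureThreshold: 3    # unchanged: once started, liveness stays strict
```

Note that the kubelet only honors `startupProbe` on clusters where the feature is available (it went GA in Kubernetes v1.20), so older clusters would not benefit from this field.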

wiikip commented on August 11, 2024

After further investigation, we found that the failed request to the kube API server takes 30s to time out. During this time the liveness probe has time to fail three times, and the pod is then restarted. I think a quick-win solution would be to decrease this timeout to 5s: even if the first call to the API fails, 5s later the API will be up for sure and the probe will not kill the pod.


caseydavenport commented on August 11, 2024

@tmjd yeah, I think it's a hard one to do safely. Most likely we could get something that works for 90% of cases, but not all.

@wiikip yeah, I think we could decrease the connection timeout, but we need to be careful. I don't think we should drop it below 10s; too short and we risk destabilizing things unnecessarily. Maybe we can drop the connection timeout to 10s and increase the liveness probe failure threshold to allow 40s? That should be plenty.


wiikip commented on August 11, 2024

@caseydavenport Yeah, I agree with you. Decreasing the connection timeout to 10s and allowing the liveness probe to fail a 4th time (40s before it kills the pod) will do the job!

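For reference, a sketch of what the agreed tuning might look like on the probe side. The connection timeout itself lives inside calico-node's health reporting rather than in the probe spec; the numbers below only encode the 4-failures-in-40s budget discussed above, and the probe command is the one calico-node manifests conventionally use:

```yaml
# Hypothetical calico-node liveness probe reflecting the numbers agreed above.
livenessProbe:
  exec:
    command: ["/bin/calico-node", "-felix-live", "-bird-live"]
  periodSeconds: 10     # probe every 10s
  timeoutSeconds: 10    # each probe run gets 10s, matching the lowered connection timeout
  failureThreshold: 4   # 4 consecutive failures x 10s period = ~40s before the kubelet restarts the pod
```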
