Comments (6)
We managed to understand the issue: calico-node starts too quickly, before kube-proxy has fully programmed iptables, so the Kubernetes service IP is not set yet.
from operator.
yeah, calico/node relies on kube-proxy to program the kubernetes service IP in order to access the API, unless running in eBPF mode in which case an explicit IP can be given.
Might just need to wait until kube-proxy is ready before Calico is installed.
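The idea of waiting until the service IP is actually reachable can be sketched as a small retry loop. This is only an illustration, not how calico-node or the operator implements anything: `wait_for_tcp` and all of its parameters are hypothetical names, and a successful TCP connect is used here as a rough stand-in for "kube-proxy has programmed the service IP".

```python
import socket
import time


def wait_for_tcp(host, port, connect_timeout_s=10.0, retry_delay_s=1.0, max_attempts=30):
    """Retry a TCP connect until it succeeds; return the attempt number that worked.

    Hypothetical helper: a short per-attempt timeout keeps one blocked
    connect from consuming the whole startup window.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # create_connection raises OSError on refusal, unreachable host, or timeout.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return attempt
        except OSError:
            if attempt < max_attempts:
                time.sleep(retry_delay_s)
    raise TimeoutError(f"{host}:{port} not reachable after {max_attempts} attempts")
```

Inside a pod, the host and port would typically come from the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables, for example `wait_for_tcp(os.environ["KUBERNETES_SERVICE_HOST"], int(os.environ["KUBERNETES_SERVICE_PORT"]))`.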
@caseydavenport I don't think this is something that we or anyone can control. They're both daemonsets, and there isn't anything that kube-proxy would be creating/setting that could delay calico-node startup, is there? I'm thinking of the case of adding new nodes to a cluster.
Should we consider a startup probe to cover this case? (I just learned about them and see they became stable in v1.20.)
Here are a few links where I was reading about them:
- https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
- https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
After further investigation, we found that the failed request to the Kubernetes API server takes 30s to time out. During this time the liveness probe has time to fail 3 times, and then the pod restarts. I think a quick-win solution would be to decrease this timeout to 5s: even if the first call to the API fails, 5s later the API will be up for sure and the probe will not kill the pod.
@tmjd yeah, I think it's a hard one to do safely. We could get something to work for 90% of cases but not all, most likely.
@wiikip yeah, I think we could decrease the connection timeout, but we need to be careful. I don't think we should drop it below 10s - too short and we risk destabilizing unnecessarily. Maybe we can drop the connection timeout to 10s and increase the liveness probe failure threshold to 40s? That should be plenty.
@caseydavenport Yeah, I agree with you. Decreasing the connection timeout to 10s and allowing the liveness probe to fail a 4th time (40s before it kills the pod) will do the job!
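The timing argument in this thread can be checked with a few lines of arithmetic. This is a deliberately simplified model, not the kubelet's exact behavior: it assumes a 10s probe period (inferred from "3 failures in 30s" and "4th failure at 40s" above), ignores any initial delay, and assumes the probe keeps failing for as long as the first API request is blocked. The function names are hypothetical.

```python
def seconds_until_restart(period_s, failure_threshold):
    """Time before the kubelet restarts the pod if every probe fails (simplified)."""
    return period_s * failure_threshold


def survives_first_timeout(api_timeout_s, period_s, failure_threshold):
    """True if the blocked API request times out before the probe budget runs out."""
    return api_timeout_s < seconds_until_restart(period_s, failure_threshold)


# Current behavior described above: 30s timeout, 3 failures at an assumed 10s period.
print(survives_first_timeout(api_timeout_s=30, period_s=10, failure_threshold=3))  # False

# Proposed settings: 10s connection timeout, 4 failures (40s budget).
print(survives_first_timeout(api_timeout_s=10, period_s=10, failure_threshold=4))  # True
```

Under these assumptions the proposed settings leave room for the first request to time out and for retries to succeed before the pod is killed.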