Comments (18)
@caseydavenport I agree that "Add component-specific fields for Typha and calico/node." is the best approach from an end-user perspective. Is it worth considering whether calico/node needs affinity or customisable selectors? Also, the calico/node configuration could use the daemonset prefix, e.g. daemonsetTolerations.
Slightly related, could you point me at the documentation that defines the data plane and control plane? I incorrectly assumed that Typha was part of the control plane; mainly because it's usually only the daemonsets that are considered data plane, but also because my understanding of Typha is that it caches the K8s API and I'd consider the K8s API to be more control plane than data plane (possibly incorrectly).
from operator.
I think we should fill this out:
- typhaAffinity
- typhaNodeSelector
- controlPlaneAffinity
- controlPlaneSelector
- daemonsetAffinity
- daemonsetNodeSelector
Consistent and covers all of the bases 😅
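For concreteness, a sketch of how that scheme might look on the operator's Installation resource - all six field names here are the proposal above, not shipped API, and the label values are made up:

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Proposed fields - not (yet) part of the shipped Installation schema.
  controlPlaneSelector:
    node-role.example.com/system: "true"   # hypothetical system-pool label
  controlPlaneAffinity: {}                 # standard k8s NodeAffinity would go here
  typhaNodeSelector:
    node-role.example.com/system: "true"
  typhaAffinity: {}
  daemonsetNodeSelector: {}                # calico-node runs everywhere by default
  daemonsetAffinity: {}
```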
@tmjd I've configured typhaAffinity, but as Typha uses the controlPlaneTolerations value it really doesn't make sense that it doesn't use the controlPlaneNodeSelector value. Also, for a simple node selector, typhaAffinity is overkill and adds significant cognitive load.
When building Kubernetes platforms it's really important to have control over scheduling decisions for central components; this is where a lack of flexibility in operators can make them unusable. It's a common pattern to run system node pools that host all the central components, leaving user-provisioned nodes to run only daemonsets and user workloads.
Yeah that was a bad decision to have the two controlPlane* configs not apply to the same set of components.
@caseydavenport WDYT - should we create a typhaTolerations so the controlPlane* ones are consistent in where they apply?
Also for a simple node selector typhaAffinity is overkill and adds significant cognitive load.
I expect the user to take on that cognitive load because it should be a specific decision if they need to use affinity for typha. We are talking about a component that if it cannot be deployed then pod networking will not function in a cluster, so if someone wants Node Selector type behavior for typha, it should not be an easy or quick decision.
Yeah that was a bad decision to have the two controlPlane* configs not apply to the same set of components
Agreed
should we create a typhaTolerations so the controlPlane ones are consistent in where they apply?
I think the options here are:
- Consider Typha to be controlPlane, and thus have all the controlPlaneX fields apply to it.
- Consider it to be dataPlane, add new dataPlaneX fields that apply to Typha and calico/node.
- Add component-specific fields for Typha and calico/node.
I think the latter is probably the right path forward. controlPlane makes sense for controllers and such that are not on the critical path for applications functioning (kube-controllers, apiserver, etc.). Those can be bunched up.
calico/node and calico/typha are, unfortunately but necessarily, special system components that require fine-tuning.
So, for typha I think we should have:
- typhaNodeSelector
- typhaAffinity
- typhaTolerations
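Assuming that trio shipped with the standard Kubernetes types, Typha scheduling on an Installation might look like this sketch (the label key and taint are illustrative, not from the operator docs, and typhaTolerations is the proposed field, not shipped API):

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  typhaNodeSelector:                # plain map[string]string, as on a Pod spec
    node-role.example.com/system: "true"
  typhaTolerations:                 # proposed: standard corev1.Toleration list
    - key: node-role.example.com/system
      operator: Exists
      effect: NoSchedule
  typhaAffinity: {}                 # standard NodeAffinity, for anything richer
```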
I'm not a huge fan of encoding component names into the API - I think it leaks implementation details, but in this case the implementation is part of the feature that is relevant to the end user, so there might be no way around that.
calico/node is even more awkward, because it is named pretty vaguely...
- calicoNodeNodeSelector
- calicoNodeAffinity
- calicoNodeTolerations
^ These all seem non-obvious for the new user - e.g., is it "CalicoNode affinity" or "Calico NodeAffinity"? I think better names are needed for those.
That would work too! The one caveat we've heard of around affinity is that it is much more expensive for the scheduler to enforce compared to nodeSelectors, which could limit the size of the Kubernetes cluster in terms of number of pods (around 10k pods per cluster). But we haven't hit that limit in our use-case yet.
Hey @stevehipwell @aquam8 @sarthakjain271095 and @aarondav,
We've put up an outline of proposed changes to operator component configuration. Among other things, this will allow overriding tolerations and node affinity/nodeSelectors. Please take a look if you can. We'd appreciate your input on the proposed changes: #1990
Yeah we should state that Typha is an exception because it is a critical component and should be considered part of the dataPlane.
@tmjd isn't that more reason to allow the nodes it runs on to be selected? I've currently got Typha pods running on nodes with no guaranteed lifecycle; if they respected controlPlaneNodeSelector they'd be on the system nodes. If Typha isn't using controlPlaneNodeSelector then there either needs to be a typhaNodeSelector or a dataPlaneNodeSelector to control this.
@tmjd isn't that more reason to allow the nodes it runs on to be selected?
I don't think so; it isn't critical that individual Typhas are not destroyed/recreated. It is critical that a sufficient number exist, which is why there is a PodDisruptionBudget defined for it and why we suggest only preferred (not required) typhaAffinity. This allows Typha to always be present at the correct scale, since it is not forced onto any specific nodes. So as long as any node scaling properly follows PodDisruptionBudgets, Typha is fine.
If you still would like to prefer system nodes for typha you can set that with spec.typhaAffinity.
If Typha isn't using controlPlaneNodeSelector then there either needs to be a typhaNodeSelector or dataPlaneNodeSelector to control this.
I do not believe this is necessary because typha's scaling and PodDisruptionBudget are set to ensure HA when nodes are removed or fail.
If you still believe you need to force Typha onto certain nodes, check out typhaAffinity, which is of type NodeAffinity. But please take note of the warning on requiredDuringSchedulingIgnoredDuringExecution, which is:
WARNING: Please note that if the affinity requirements specified by this field are not met at scheduling time, the pod will NOT be scheduled onto the node. There is no fallback to other affinity rules with this setting. This may cause networking disruption or even catastrophic failure! PreferredDuringSchedulingIgnoredDuringExecution should be used for affinity unless there is a specific, well-understood reason to use RequiredDuringSchedulingIgnoredDuringExecution and you can guarantee that RequiredDuringSchedulingIgnoredDuringExecution will always have sufficient nodes to satisfy the requirement.
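As a concrete illustration of that advice, here is a sketch of a preferred (soft) typhaAffinity on the Installation resource; the nodeAffinity nesting follows the operator's TyphaAffinity shape as I understand it, and the label key is a placeholder for your own system-pool label:

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  typhaAffinity:
    nodeAffinity:
      # Soft preference: Typha lands on system nodes when possible,
      # but can still schedule elsewhere if that pool is full or unavailable.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: node-role.example.com/system   # placeholder label
                operator: In
                values: ["true"]
```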
Now that Typha is respecting controlPlaneTolerations , we can either:
- Make a clean break, and release note the change in field
- Have Typha prefer typhaTolerations, and fall back to controlPlaneTolerations, with a warning explaining that it will be removed in a future release and users should migrate to typhaTolerations
- Have Typha prefer typhaTolerations, and fall back to controlPlaneTolerations, no future change.
Any plans to allow the calico-typha to be configured analogously to calico-kube-controllers? We consider typha to be more "control plane-y" and want to control which set of nodes it lands on in part due to security concerns.
Would be great to have typha part of the controlPlaneNodeSelector or having a separate typhaNodeSelector, as mentioned above. typhaAffinity is available, but requires us to configure calico in an inconsistent manner between the controllers and typha.
but requires us to configure calico in an inconsistent manner between the controllers and typha.
Agree this is annoying, although my preference would be to adjust controllers to respect an affinity field since it's a more flexible syntax than adding a selector for typha.
The one caveat we've heard of around affinity is that it is much more expensive for the scheduler to enforce compared to nodeSelectors
Yeah, it's true that nodeSelectors are evaluated continuously whereas, to date, I believe affinities are evaluated at schedule time only. So perhaps there is a case for supporting both.
@stevehipwell which version of Calico are you using where you are seeing controlPlaneTolerations being applied to Typha pods? I ask because I recently upgraded to Calico v3.20.0, installing it via tigera-operator, and controlPlaneTolerations are not being applied to the Typha pods for me. 😅
@sarthakjain271095 I'm currently running Tigera Operator v1.23.5 to install Calico v3.21.4 and the tolerations are applied to Typha correctly; I've also used one of the v1.24 versions and that also worked.
I think the behaviour was added in v1.22.0 (#1507), so it depends on which Tigera Operator version you're using to install Calico v3.20.0.
I think we should fill this out:
- typhaAffinity
- typhaNodeSelector
- controlPlaneAffinity
- controlPlaneSelector
- daemonsetAffinity
- daemonsetNodeSelector
Consistent and covers all of the bases 😅
You dropped support for specifying tolerations for the calico-node DaemonSet, which is very important if we need calico-node on every node - even tainted ones.
I can confirm that in v3.22.2 controlPlaneTolerations gets applied to Typha, but not to calico-node.
Thank you
Now that #2063 is merged, it will now be possible to specify a nodeSelector/affinity on the core Calico components (including Typha).
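If the overrides landed in the shape the proposal (#1990) described, usage would presumably look something like the sketch below - treat the exact field names (typhaDeployment, calicoNodeDaemonSet, and the nested template path) as assumptions to verify against the operator's API reference for your version:

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  typhaDeployment:               # assumed override field name
    spec:
      template:
        spec:
          nodeSelector:
            node-role.example.com/system: "true"   # placeholder label
  calicoNodeDaemonSet:           # assumed override field name
    spec:
      template:
        spec:
          tolerations:
            - operator: Exists   # tolerate everything so calico-node runs on all nodes
```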