Comments (18)

stevehipwell commented on August 11, 2024

@caseydavenport I agree that "Add component-specific fields for Typha and calico/node." is the best approach from an end-user perspective. Is it worth considering whether calico/node also needs affinity or customisable selectors? The calico/node configuration could also use a daemonset prefix, e.g. daemonsetTolerations.

Slightly related, could you point me at the documentation that defines the data plane and control plane? I incorrectly assumed that Typha was part of the control plane; mainly because it's usually only the daemonsets that are considered data plane, but also because my understanding of Typha is that it caches the K8s API and I'd consider the K8s API to be more control plane than data plane (possibly incorrectly).

from operator.

caseydavenport commented on August 11, 2024

I think we should fill this out:

  • typhaAffinity
  • typhaNodeSelector
  • controlPlaneAffinity
  • controlPlaneSelector
  • daemonsetAffinity
  • daemonsetNodeSelector

Consistent and covers all of the bases 😅

stevehipwell commented on August 11, 2024

@tmjd I've configured typhaAffinity but as Typha uses the controlPlaneTolerations value it really doesn't make sense that it doesn't use the controlPlaneNodeSelector value. Also for a simple node selector typhaAffinity is overkill and adds significant cognitive load.

When building Kubernetes platforms it's really important to have control over scheduling decisions for central components; this is where a lack of flexibility in operators can make them unusable. A common pattern is to run all the central components on a dedicated system node pool, leaving user-provisioned nodes to run only daemonsets and user workloads.
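For comparison, a plain node selector is just a set of exact label matches, so emulating one through typhaAffinity means restating each label as a match expression. The following is a minimal Python sketch of that translation, assuming the standard Kubernetes NodeAffinity JSON field names; the helper `node_selector_to_affinity` and the `node-role/system` label are hypothetical, and where the resulting object nests inside the Installation resource should be checked against the operator's API reference. It emits a preferred rule unless asked otherwise, because a hard requirement can leave Typha unschedulable.

```python
import json


def node_selector_to_affinity(node_selector, required=False, weight=100):
    """Translate a flat nodeSelector map into an equivalent NodeAffinity dict.

    Each key/value pair becomes an `In` expression with a single value, which is
    how nodeSelector's exact-match semantics map onto node affinity.
    """
    term = {
        "matchExpressions": [
            {"key": key, "operator": "In", "values": [value]}
            for key, value in sorted(node_selector.items())
        ]
    }
    if required:
        # Hard constraint: pods stay Pending if no node matches the term.
        return {"requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [term]}}
    # Soft constraint: matching nodes are favoured, but any node can still be used.
    return {
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {"weight": weight, "preference": term}
        ]
    }


# Hypothetical label identifying a system node pool.
print(json.dumps(node_selector_to_affinity({"node-role/system": "true"}), indent=2))
```

The extra nesting is the cognitive load being discussed: a one-line selector turns into several levels of affinity structure.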

tmjd commented on August 11, 2024

Yeah that was a bad decision to have the two controlPlane* configs not apply to the same set of components.
@caseydavenport WDYT, should we create a typhaTolerations so the controlPlane ones are consistent in where they apply?

> Also for a simple node selector typhaAffinity is overkill and adds significant cognitive load.

I expect the user to take on that cognitive load, because using affinity for Typha should be a deliberate decision. We are talking about a component without which pod networking will not function in the cluster, so if someone wants nodeSelector-style behavior for Typha, it should not be an easy or quick decision.

caseydavenport commented on August 11, 2024

> Yeah that was a bad decision to have the two controlPlane* configs not apply to the same set of components

Agreed

> should we create a typhaTolerations so the controlPlane ones are consistent in where they apply?

I think the options here are:

  • Consider Typha to be controlPlane, and thus have all the controlPlaneX fields apply to it.
  • Consider it to be dataPlane, add new dataPlaneX fields that apply to Typha and calico/node.
  • Add component-specific fields for Typha and calico/node.

I think the last option is probably the right path forward. controlPlane makes sense for controllers and the like that are not on the critical path for applications to function (kube-controllers, apiserver, etc.). Those can be grouped together.

calico/node and calico/typha are, unfortunately but necessarily, special system components that require fine-tuning.

So, for typha I think we should have:

  • typhaNodeSelector
  • typhaAffinity
  • typhaTolerations

I'm not a huge fan of encoding component names into the API - I think it leaks implementation details, but in this case the implementation is part of the feature that is relevant to the end user, so there might be no way around that.

calico/node is even more awkward, because it is named pretty vaguely...

  • calicoNodeNodeSelector
  • calicoNodeAffinity
  • calicoNodeTolerations

^ These all seem non-obvious for the new user - e.g., is it "CalicoNode affinity or Calico NodeAffinity"? I think better names are needed for those.

aarondav commented on August 11, 2024

That would work too! The one caveat we've heard of around affinity is that it is much more expensive for the scheduler to enforce compared to nodeSelectors, which could limit the size of the Kubernetes cluster in terms of number of pods (around 10k pods per cluster). But we haven't hit that limit in our use-case yet.

lmm commented on August 11, 2024

Hey @stevehipwell @aquam8 @sarthakjain271095 and @aarondav,

We've put up an outline of proposed changes to operator component configuration. Among other things, this will allow overriding tolerations and node affinity/nodeSelectors. Please take a look if you can. We'd appreciate your input on the proposed changes: #1990

tmjd commented on August 11, 2024

Yeah we should state that Typha is an exception because it is a critical component and should be considered part of the dataPlane.

stevehipwell commented on August 11, 2024

@tmjd isn't that more reason to allow the nodes it runs on to be selected? I've currently got Typha pods running on nodes with no guaranteed lifecycle, if they respected controlPlaneNodeSelector they'd be on the system nodes. If Typha isn't using controlPlaneNodeSelector then there either needs to be a typhaNodeSelector or dataPlaneNodeSelector to control this.

tmjd commented on August 11, 2024

> @tmjd isn't that more reason to allow the nodes it runs on to be selected?

I don't think so. It doesn't matter if individual Typha pods are destroyed and recreated; what matters is that a sufficient number exist, which is why there is a PodDisruptionBudget defined for it and why we suggest only a preferred (not required) typhaAffinity. This allows Typha to always be present at the correct scale, since it is not forced onto any specific nodes. As long as node scaling properly respects PodDisruptionBudgets, Typha is fine.

If you still would like to prefer system nodes for typha you can set that with spec.typhaAffinity.

> If Typha isn't using controlPlaneNodeSelector then there either needs to be a typhaNodeSelector or dataPlaneNodeSelector to control this.

I do not believe this is necessary because typha's scaling and PodDisruptionBudget are set to ensure HA when nodes are removed or fail.
If you still believe you need to force Typha onto certain nodes, check out typhaAffinity, which is of type NodeAffinity. But please take note of the warning on requiredDuringSchedulingIgnoredDuringExecution, which is:

> WARNING: Please note that if the affinity requirements specified by this field are not met at scheduling time, the pod will NOT be scheduled onto the node. There is no fallback to another affinity rules with this setting. This may cause networking disruption or even catastrophic failure! PreferredDuringSchedulingIgnoredDuringExecution should be used for affinity unless there is a specific well understood reason to use RequiredDuringSchedulingIgnoredDuringExecution and you can guarantee that the RequiredDuringSchedulingIgnoredDuringExecution will always have sufficient nodes to satisfy the requirement.
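The difference the warning describes can be illustrated with a toy model of how node affinity filters and scores nodes: required terms remove non-matching nodes outright, while preferred terms only add weight. This is a simplified Python sketch, not the kube-scheduler's actual algorithm (it ignores taints, resources and every other scheduling plugin); `matches`, `pick_node` and the `pool` label are hypothetical.

```python
def matches(term, labels):
    """Return True if node labels satisfy every expression in a node selector term."""
    expressions = term.get("matchExpressions", [])
    if not expressions:
        return False  # an empty term matches no nodes
    for expr in expressions:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = labels.get(key) not in values  # a missing label also satisfies NotIn
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"operator not modelled in this sketch: {op}")
        if not ok:
            return False  # expressions within a term are ANDed
    return True


def pick_node(nodes, node_affinity):
    """Pick a node for a pod, or return None if it would stay Pending."""
    candidates = dict(nodes)
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution")
    if required:
        # Hard filter: terms are ORed, and there is no fallback if none match.
        candidates = {
            name: labels
            for name, labels in nodes.items()
            if any(matches(term, labels) for term in required["nodeSelectorTerms"])
        }
    if not candidates:
        return None

    preferred = node_affinity.get("preferredDuringSchedulingIgnoredDuringExecution", [])

    def score(name):
        # Soft preference: sum the weights of every preferred term the node satisfies.
        return sum(p["weight"] for p in preferred if matches(p["preference"], candidates[name]))

    # Highest score wins; ties go to the alphabetically first node for determinism.
    return max(sorted(candidates), key=score)


# Hypothetical cluster whose system nodes are all gone (e.g. during a node pool rotation).
nodes = {"worker-a": {"pool": "user"}, "worker-b": {"pool": "user"}}
system_term = {"matchExpressions": [{"key": "pool", "operator": "In", "values": ["system"]}]}
required_affinity = {"requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [system_term]}}
preferred_affinity = {"preferredDuringSchedulingIgnoredDuringExecution": [{"weight": 100, "preference": system_term}]}

print(pick_node(nodes, required_affinity))   # None: the pod stays Pending
print(pick_node(nodes, preferred_affinity))  # worker-a: falls back to a user node
```

When a system node is present, both variants pick it; the difference only appears when no matching node is available, which is exactly the failure mode the warning is about.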

caseydavenport commented on August 11, 2024

Now that Typha is respecting controlPlaneTolerations, we can either:

  • Make a clean break, and release note the change in field
  • Have Typha prefer typhaTolerations, and fall back to controlPlaneTolerations, with a warning explaining that it will be removed in a future release and users should migrate to typhaTolerations
  • Have Typha prefer typhaTolerations, and fall back to controlPlaneTolerations, no future change.
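The second option amounts to a small resolution rule. The Python sketch below models the Installation spec as a plain dict; `typhaTolerations` is the field name proposed in this thread, and `resolve_typha_tolerations` is a hypothetical helper, not operator code.

```python
import warnings


def resolve_typha_tolerations(spec):
    """Option 2: prefer typhaTolerations, fall back to controlPlaneTolerations with a warning."""
    # Check for None rather than truthiness, so an explicit empty list still
    # overrides the legacy field (meaning "Typha should tolerate nothing").
    if spec.get("typhaTolerations") is not None:
        return spec["typhaTolerations"]
    legacy = spec.get("controlPlaneTolerations")
    if legacy is not None:
        warnings.warn(
            "Typha is using controlPlaneTolerations; this fallback will be removed "
            "in a future release, please migrate to typhaTolerations",
            FutureWarning,
        )
        return legacy
    return []


spec = {"controlPlaneTolerations": [{"key": "CriticalAddonsOnly", "operator": "Exists"}]}
print(resolve_typha_tolerations(spec))  # legacy value, plus a FutureWarning on stderr
```

Dropping the `warnings.warn` call gives option 3, while option 1 would simply stop reading `controlPlaneTolerations` for Typha.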

aarondav commented on August 11, 2024

Any plans to allow the calico-typha to be configured analogously to calico-kube-controllers? We consider typha to be more "control plane-y" and want to control which set of nodes it lands on in part due to security concerns.

It would be great to have Typha respect controlPlaneNodeSelector, or to have a separate typhaNodeSelector as mentioned above. typhaAffinity is available, but requires us to configure calico in an inconsistent manner between the controllers and typha.

caseydavenport commented on August 11, 2024

> but requires us to configure calico in an inconsistent manner between the controllers and typha.

Agreed, this is annoying, although my preference would be to adjust the controllers to respect an affinity field, since affinity is a more flexible syntax than adding a selector for Typha.

caseydavenport commented on August 11, 2024

> The one caveat we've heard of around affinity is that it is much more expensive for the scheduler to enforce compared to nodeSelectors

Yeah, it's true that nodeSelectors are evaluated continuously whereas, to date, I believe affinities are evaluated at schedule time only. So perhaps there is a case for supporting both.

sarthakjain271095 commented on August 11, 2024

@stevehipwell, which version of Calico are you using where you see controlPlaneTolerations applied to Typha pods? I ask because I recently upgraded to Calico 3.20.0, installed via tigera-operator, and controlPlaneTolerations are not being applied to Typha pods for me. 😅

stevehipwell commented on August 11, 2024

@sarthakjain271095 I'm currently running Tigera Operator v1.23.5 to install Calico v3.21.4, and the tolerations are applied to Typha correctly; I've also used one of the v1.24 versions and that also worked.

I think the behaviour was added in v1.22.0 (#1507) so it depends on which Tigera Operator version you're using to install Calico v3.20.0.

aquam8 commented on August 11, 2024

> I think we should fill this out:
>
>   • typhaAffinity
>   • typhaNodeSelector
>   • controlPlaneAffinity
>   • controlPlaneSelector
>   • daemonsetAffinity
>   • daemonsetNodeSelector
>
> Consistent and covers all of the bases 😅

Your list drops the tolerations setting for the calico-node DaemonSet, which is very important if we need calico-node on every node, even tainted ones.

I can confirm that in v3.22.2 controlPlaneTolerations gets applied to Typha, but not to Node (calico-node).

Thank you
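Running a DaemonSet such as calico-node on every node, tainted ones included, usually relies on a toleration with `operator: Exists` and no key, which Kubernetes treats as tolerating every taint. The toy Python model below illustrates those matching rules; it is not scheduler code, `tolerates` and `schedulable_on` are hypothetical helpers, and the `dedicated=gpu` taint is an invented example. It says nothing about which Installation field would carry these tolerations for calico-node.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key with Exists matches every key, and therefore every taint.
        return key == "" or key == taint["key"]
    if operator == "Equal":
        return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")
    raise ValueError(f"unsupported operator: {operator}")


def schedulable_on(tolerations, taints):
    """A pod fits a node only if every NoSchedule/NoExecute taint is tolerated."""
    # PreferNoSchedule is only a soft preference, so it never blocks placement here.
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


node_taints = [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]
tolerate_all = [{"operator": "Exists"}]

print(schedulable_on([], node_taints))            # False
print(schedulable_on(tolerate_all, node_taints))  # True
```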

lmm commented on August 11, 2024

Now that #2063 is merged, it is possible to specify a nodeSelector/affinity on the core Calico components (including Typha).
