Comments (10)
@Omar007 I don't think the documentation is outdated. It is perfectly fine to deploy without Typha for fewer than 50 nodes.
> It looks like the scale count doesn't care about the node type and will operate on both master/control-plane nodes as well as worker nodes. Is that something that should be accounted for?
Any host that will be running calico/node should be counted for the purposes of Typha scaling.
@bodgit The only documentation is in the code 😀: `operator/pkg/common/autoscale.go`, line 17 at commit 4c523e7.
from operator.
Maybe it would be interesting not only to tune the replica count, but also to allow the Installation CRD to completely disable the use of Typha.
Calico currently has a lot of different ways to cover installation/deployment (operator by manifest, operator by Helm, Calico by manifest variant A, Calico by manifest variant B, etc.), and some of these even overlap in their end result.
Providing different solutions isn't inherently bad, but at the moment I wouldn't say it is obvious or logical how to decide on a deployment strategy for setting up and maintaining Calico, since not every option currently covers all of the documented deployment cases either.
Adding this capability would make the operator an alternative for the on-prem "50 nodes or less" deployment variant, which currently relies on a separate manifest. That variant's end result is not currently reproducible with the operator, so adding this option would make it possible to use the operator for essentially all documented deployment variants.
This would mean you're only required to provide and manage different Installation objects for your clusters, instead of also having to manage completely different installation methods.
@bodgit The operator will auto-scale Typha, so if only 1 or 2 nodes are present the appropriate number of Typha replicas will be used. Likewise, if you deploy a large number of nodes, Typha will be scaled up as necessary by the operator. Typha will not be scaled to fewer than 3 replicas if there are at least 3 nodes available, as we consider 3 to be the minimum for high-availability purposes.
If you are looking for Typha to scale to fewer than 3 after scaling a cluster up to 3 or more nodes, then yes, that is something the operator will not do.
@Omar007 The operator is deploying what we believe to be best practice, and that includes Typha at all installation sizes. The operator is a great option for the on-prem deployment of 50 nodes or less. The recommendation that Typha should be deployed on clusters with more than 50 nodes should not be taken as a recommendation that it should not be deployed with fewer than 50 nodes.
@tmjd Ok fair enough. Does that then mean that part of the documentation is basically outdated/deprecated?
The way it's set up and explicitly split into two use cases suggested to me that, with <50 nodes, the deployment variant without Typha was preferred over the one with it.
Small sidenote: it looks like the scale count doesn't care about the node type and will operate on both master/control-plane nodes as well as worker nodes. Is that something that should be accounted for?
@tmjd Thanks for the explanation. Is the scaling algorithm documented somewhere, i.e. how many nodes do I need to have before a fourth typha replica is added, etc.?
@tmjd Maybe that was a bit strongly worded. What I meant was more along the lines of: unlike how it is presented in the documentation, it's not the preferred/recommended deployment method (anymore?).
To me, the Calico documentation currently very much implies it is the first/primary option you should use if you're running a cluster with <50 nodes, both because it is the first option shown and because it is present as an explicit use case in the first place.
Basically, the more exactly the documentation matches the desired use case, the more it suggests that, for that use case, that deployment method is the preferred and best option available, even more so since it doesn't explicitly say anywhere that it isn't.
As such, when I looked at using Calico based purely on the docs (layout, wording, etc), it suggests to me that:
- <50 nodes -> preferred to deploy minimally without typha
- >50 nodes -> preferred to deploy with typha
- Instead of using loose manifests, use of the operator preferred if you can
And when I tried to use the operator for the <50 nodes use case, seemingly documented as preferred, I ran into this issue while trying to figure out how to deploy without Typha to match that documented use case.
In reality it seems/feels like it's more like:
- Deployments with typha are the preferred method and regarded as best practice.
- Use of the operator is preferred if possible.
- Any other use case: possible, but you probably just shouldn't bother.
Which is fine by me, just not at all clear to me until you elaborated here ;)
This information helps a lot with deciding on a deployment strategy, because it basically flattens the whole selection of options down to just one for me: use the operator, which will deploy Calico with Typha, the preferred and best-practice deployment.
@tmjd A pointer to the code is fine; that comment block explains it perfectly. I see I will have to get into the realm of 200+ nodes before a fourth Typha pod appears. I think that will be a nice problem to have!
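The behaviour described in this thread can be sketched roughly as follows. This is an illustrative reimplementation based only on the numbers mentioned here (one replica per node below 3 nodes, a minimum of 3 replicas for HA once 3 nodes exist, and a fourth replica only past roughly 200 nodes); the authoritative logic and exact thresholds live in `operator/pkg/common/autoscale.go`.

```go
package main

import "fmt"

// expectedTyphaReplicas is a simplified sketch of the Typha autoscaling
// behaviour described in this thread. The thresholds are illustrative
// assumptions, not copied from the operator's actual table.
func expectedTyphaReplicas(nodes int) int {
	switch {
	case nodes <= 0:
		return 0
	case nodes < 3:
		// With 1 or 2 nodes, run one Typha per node.
		return nodes
	case nodes <= 200:
		// 3 is treated as the minimum for high availability.
		return 3
	default:
		// The thread reports a fourth replica appearing past ~200 nodes.
		return 4
	}
}

func main() {
	for _, n := range []int{1, 2, 3, 50, 250} {
		fmt.Printf("%d nodes -> %d Typha replicas\n", n, expectedTyphaReplicas(n))
	}
}
```

Note that, per the discussion above, the operator scales up through this table as nodes are added but will not scale back below 3 replicas once the cluster has reached 3 or more nodes.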
I ran into this issue today. When using cluster-autoscaler, once you scale to a number of nodes that increases the typha replicas, the cluster autoscaler will never be able to scale nodes back down to a smaller size that would decrease the number of typha replicas. For example here the cluster-autoscaler is unable to evict typha pods to scale below 3 nodes:
I0603 20:49:59.394882 1 klogx.go:86] Evaluation ip-10-xx-xx-xx.region.compute.internal for calico-system/calico-typha-5557f5df96-96l2x -> node(s) didn't have free ports for the requested pod ports; predicateName=NodePorts; reasons: node(s) didn't have free ports for the requested pod ports; debugInfo=
I0603 20:49:59.394913 1 klogx.go:86] Evaluation ip-10-xx-xx-xy.region.compute.internal for calico-system/calico-typha-5557f5df96-96l2x -> node(s) didn't have free ports for the requested pod ports; predicateName=NodePorts; reasons: node(s) didn't have free ports for the requested pod ports; debugInfo=
I0603 20:49:59.394928 1 cluster.go:190] Fast evaluation: node ip-10-xx-xx-xz.region.compute.internal is not suitable for removal: failed to find place for calico-system/calico-typha-5557f5df96-96l2x
I believe this would work if the tigera-operator supported the annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"` on calico-typha, like the calico Helm chart does, but this annotation isn't configurable in the operator, and it is explicitly stripped off during migration by the tigera-operator (probably for a good reason).
I can understand why running more than 2 Typha replicas is recommended for HA during updates, but in smaller non-prod clusters this can be undesirable.
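For illustration, the annotation check the cluster-autoscaler performs amounts to a string comparison on the pod's metadata: only an explicit `"true"` opts the pod in. This is a minimal sketch of that check, not the autoscaler's actual code.

```go
package main

import "fmt"

// safeToEvictAnnotation is the annotation key the cluster-autoscaler
// consults when deciding whether a pod may block node scale-down.
const safeToEvictAnnotation = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// isSafeToEvict sketches the check: only the exact value "true" marks
// the pod as evictable; any other value (or no annotation) does not.
func isSafeToEvict(annotations map[string]string) bool {
	return annotations[safeToEvictAnnotation] == "true"
}

func main() {
	annotated := map[string]string{safeToEvictAnnotation: "true"}
	fmt.Println(isSafeToEvict(annotated))            // true
	fmt.Println(isSafeToEvict(map[string]string{})) // false
}
```

This is also why stripping the annotation during migration matters: without it, a Typha pod can pin its node even when the replica count could safely drop.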
@itmustbejj Can I suggest opening a new issue, as I think adding `safe-to-evict` is probably a request that makes sense. Though from the log messages you added, I'm not confident it would evict the pod even if it had that annotation. The message says `not suitable for removal: failed to find place for calico-system/calico-typha`, which to me suggests that it does think the pod could be evicted, but there is no place for it to go, so it wouldn't evict it anyway.
I'm going to close this Issue as the original request is not something we're going to expose.