kuadrant / multicluster-gateway-controller
Multi-cluster gateway controller; manages multi-cluster gateways based on Gateway API and policy attachment.
License: Apache License 2.0
What
As a gateway admin, I want to be able to choose to place a gateway on a set of clusters from my available clusters.
Option
Use an exact-match placement label:
kuadrant.io/placement: "key=value"
At a future point we may need more than this simplistic placement logic, but this is lightweight.
When a placement is added, the controller needs to figure out which cluster names (secrets) this is selecting. It then needs to add a new label, something like kuadrant.io/<cluster>/placement.decision: "true", which the sync agent will watch for.
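A minimal sketch of this flow, with illustrative values (the decision-label key format is the issue's own sketch and may change):

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: prod-web
  labels:
    kuadrant.io/placement: "region=us-east"   # exact-match placement label
spec:
  gatewayClassName: kuadrant-multi-cluster-gateway   # illustrative
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
---
# Label the controller would add to each selected cluster secret,
# per the sketch above (exact key format TBD):
apiVersion: v1
kind: Secret
metadata:
  name: cluster-1
  labels:
    kuadrant.io/cluster-1/placement.decision: "true"   # watched by the sync agent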
Done
Deploy a single Redis instance to the local control plane cluster.
For each workload cluster, configure the Limitador instance to connect to this single Redis instance in the control plane.
This can be added to https://github.com/Kuadrant/multi-cluster-traffic-controller/blob/main/hack/local-setup.sh
So that shared counters can be used between Limitador instances.
This will enable development and testing of multi cluster rate limiting scenarios.
Note that the location of redis is not important from an architecture point of view.
The goal of this issue is to have some central shared redis instance, and the control plane cluster makes sense for local development.
In a production environment, Redis will probably be elsewhere (e.g. AWS ElastiCache). It's unlikely to be part of the HCG control plane architecture.
This issue depends on #29 for the installation of kuadrant (and limitador) in workload clusters.
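A minimal sketch of the workload-cluster side, assuming the limitador-operator's Limitador CR exposes Redis storage via a config secret (field names should be verified against the installed CRD version):

apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador
spec:
  storage:
    redis:
      configSecretRef:
        name: redis-config   # secret expected to hold a URL such as redis://<control-plane-host>:6379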
Currently, if you want to deploy the controller to a cluster (local or otherwise), you need to patch the Deployment to either have env vars for AWS access, or pull them in from a ConfigMap and/or Secret.
In GLBC, local aws-credentials.env and controller-config.env files, which are gitignored, are used to store these values and are then pulled in via kustomize: https://github.com/kcp-dev/kcp-glbc/blob/main/config/deploy/local/kcp-glbc/kustomization.yaml#L16-L26
The .env files have a template with some sensible defaults so a developer can copy it.
This model could be used here unless there's a better model we can come up with.
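A minimal sketch of that pattern applied here (file and generator names are illustrative; these files don't exist in this repo yet):

# kustomization.yaml
generatorOptions:
  disableNameSuffixHash: true
secretGenerator:
- name: mctc-aws-credentials
  envs:
  - aws-credentials.env
configMapGenerator:
- name: mctc-controller-config
  envs:
  - controller-config.env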
What
We are adding a sync agent to reflect certain resources defined at the control plane into the data plane. This syncer will also be responsible for reflecting the status of those resources back into the control plane, and for applying transforms added via the controllers in the control plane. This syncer will lean heavily on what we have learned through use of the KCP sync component.
At the end of this work the control plane should no longer talk to the data plane.
Open Questions
Outcome
We should have basic sync capabilities for gateways and TLS secrets, based on access provided by service account secrets. We should capture patterns, concepts, and remaining questions as a design doc.
What
We want to ensure that what we do is compliant with GitOps, and also intuitive for users who are familiar with Kubernetes concepts. Currently we directly modify the Ingresses that we see. This can cause issues with GitOps tools like Argo CD. Additionally, it can make configuring how to manage these Ingresses complex, as there may be other controllers and webhooks intercepting them.
To solve this, we want to explore the idea of creating a custom ingress controller as part of the traffic controller deployment. This ingress controller would define its own ingressClass. While it would be considered an ingress controller, it would effectively ensure that an Ingress was transformed and behaving as expected before delegating it, via a copy, to a chosen ingress controller that would then use that Ingress to configure traffic routing etc. As this copy would be owned by the original, it would be considered a derived resource and so would not cause issues with Argo CD.
With this approach, delegating an Ingress to the traffic controller simply means assigning the right ingressClass to it.
Additionally, as we are looking at defining multi-cluster gateways at the control plane, we could potentially represent this ingress controller via a "virtual gateway" resource at the control plane, where this resource would configure and trigger the deployment of our ingress controller to the data plane.
Ingress Controller
apiVersion: kuadrant.io/v1alpha1
kind: VirtualGateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: ingress-gateway
  listeners:
  - name: https
    hostname: "*.example.com"
    allowedRoutes:
      kinds:
      - kind: Ingress
      namespaces:
        from: Selector
        selector:
          matchLabels:
            shared-gateway-access: "true"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: example-com
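Delegating an Ingress to the traffic controller would then look something like this (the ingressClass name is illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
spec:
  ingressClassName: mctc   # hypothetical class served by our ingress controller
  rules:
  - host: example.custom.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo
            port:
              number: 80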
The GetDNSRecords DNS service method gets called to return existing records for a particular host. This needs to check the namespace of the managed zone for that particular host, instead of just checking the default controller namespace as it does currently and, with later changes, the default and the traffic resource namespaces (see #84).
GetDNSRecords is called from 3 places:
What
Add a new healthcheck controller under a dns package. This controller is responsible for reconciling the DNSHealthCheck resource.
What
By default, the traffic controller will automatically set up new certificates for listener hosts added to a gateway definition. There are use cases where the gateway admin may not want to add a particular subdomain as a listener host, but still wants to have a certificate that they can use for specific hosts they define under that subdomain.
Use Case
As a gateway admin, I want to set up a managed zone with a root domain and be able to choose how certificates are created for subdomains of that root domain. This is so I can request a certificate for *.a.b.com but not allow any use of *.a.b.com in the gateway definition. Instead, I will define 1.a.b.com as a listener host in my gateway and use the certificate for *.a.b.com to cover this listener host.
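A minimal sketch of the resulting gateway listener, assuming the *.a.b.com certificate has been pre-provisioned into a secret (names illustrative):

listeners:
- name: api
  hostname: "1.a.b.com"
  port: 443
  protocol: HTTPS
  tls:
    mode: Terminate
    certificateRefs:
    - kind: Secret
      name: wildcard-a-b-com   # holds the certificate for *.a.b.com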
Follows on from #65
What
By default, zones set up in the root traffic controller namespace will be used as shared zones. The controller will by default assign a subdomain of the default zone to each tenant. We need a way to advertise this subdomain to the tenant namespace. This could potentially be done via a label on the namespace or a new subdomain resource.
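A minimal sketch of the label option (label key and value are illustrative, not a confirmed convention):

apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    kuadrant.io/assigned-subdomain: tenant-a.mctc.io   # written back by the controller (assumed key)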
#43 introduced the sync component and enabled us to connect to the control plane, sync resources, and report status back to the control plane.
This is a good foundation, but there are several pieces needed in order to bring the sync component to a full MCVP.
Use Case
As a gateway admin managing a set of gateways across multiple clusters, I want to define a common health check that governs whether a DNS A record is considered healthy, so that with a single policy I can ensure the DNS response for services exposed via listeners on my gateway only returns healthy endpoints, allowing us to automatically and rapidly mitigate an unexpected outage. I also want to define a strategy for when all of the endpoints become unavailable. I want to define a minimum number of records per hostname so that my health check doesn't remove all records.
What
This is an API that targets a gateway and encapsulates an endpoint check. We should base it on what we provided via annotations in GLBC: https://github.com/kcp-dev/kcp-glbc/blob/main/docs/dns/health-checks.md
Constraint
At this point the endpoint has to be publicly reachable. In the future we may look at other forms of check that do not require this, for example something running at the gateway level that performs the check and reports back via a status condition on the listener's status block.
What
We want to be able to register clusters with the control plane within a given tenant namespace. These clusters will be represented by secrets. These cluster secrets will be the place where additional information and context about the cluster is added.
The setup of this can be done via a script initially.
This should also spit out YAML for the sync agent and webhook configuration.
Constraints
There isn't a requirement to set up the tenant as part of this work; this is just focused on registering the cluster within a tenant.
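A minimal sketch of what such a cluster secret could look like (the shape is illustrative, not a confirmed schema):

apiVersion: v1
kind: Secret
metadata:
  name: workload-cluster-1
  namespace: tenant-a
  labels:
    kuadrant.io/cluster: "true"   # assumed marker label
type: Opaque
stringData:
  name: workload-cluster-1
  server: https://workload-cluster-1.example.com:6443
  token: <service-account-token>
  ca.crt: <ca-bundle>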
What
Beyond regular unit tests, we should add an e2e test for the DNSHealthCheck. This test would not need to directly change the DNS provider, but could use a mock client for that piece. We would want to test that the health check was reconciled correctly, the DNSRecord was updated correctly, and any additional status was added to the gateway, along with the reverse flow for deletion.
The test should also validate that we do not reconcile badly defined health checks.
The current lookup logic that maps a host to a managed zone is simplistic, in that it will only knock off the first label and look for an exactly matching managed zone for the rest. E.g. foo.bar.baz.mydomain.com would look for an MZ for bar.baz.mydomain.com and create a record for foo in it.
If we want to be able to work down the tree and, for example, find an MZ for mydomain.com and create a record for foo.bar.baz, then we would need to update the logic in some way to allow this.
We need to consider how we would do this lookup in an efficient way, since you would need to search for each item down the tree until one matched.
What
The gateway controller in the traffic controller will need to reconcile the listeners defined in the gateway. For each listener, the following requirements have been identified.
So any clusters being targeted by a gateway, and related TLS secrets, can be re-evaluated (based on the gateway cluster selector) and updated.
For example, if a new cluster is created, should the Gateway be synced to that new cluster?
What
Define an API that uses a target ref to target a gateway resource and specify a health check for the DNS provider to implement, in order to decide whether a DNS record should be considered part of a healthy response.
This resource is how we express DNS policy. It is intended to be translated into the DNSRecord.
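A minimal sketch of such an API (kind and field names are illustrative only, loosely modelled on the GLBC health-check annotations referenced earlier):

apiVersion: kuadrant.io/v1alpha1
kind: DNSPolicy
metadata:
  name: prod-web-dnspolicy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: prod-web
  healthCheck:
    endpoint: /healthz
    port: 443
    protocol: HTTPS
    failureThreshold: 5   # consecutive failures before an A record is removed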
What
The local setup should install the CRDs for rate limit policy into the control plane.
Currently, Istio is installed via helm:
This should be changed to install the Istio Operator (possibly via helm?) and use an IstioOperator CR, so that a Kuadrant AuthPolicy and RateLimitPolicy can be used.
When using the Istio Operator, the IstioOperator CR is how Kuadrant configures the Istio extension provider.
Without this CR, the mesh config will have to be configured manually (or by HCG) instead of by Kuadrant.
See https://redhat-internal.slack.com/archives/C047YHJJG2D/p1676895679014959 for additional context
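A minimal sketch of an IstioOperator CR carrying the extension provider Kuadrant needs (the provider name, service, and port are assumptions to be verified against the Kuadrant install docs):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane
  namespace: istio-system
spec:
  meshConfig:
    extensionProviders:
    - name: kuadrant-authorization   # assumed provider name
      envoyExtAuthzGrpc:
        service: authorino-authorino-authorization.kuadrant-system.svc.cluster.local   # assumed
        port: 50051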
NOTE: When creating a gateway-api Gateway resource, the spec.addresses field must include a Hostname value that matches the gateway Service DNS name that was created by the Istio Operator, e.g. [{"type":"Hostname","value":"istio-ingressgateway.istio-system.svc.cluster.local"}]. Otherwise the gateway-api Gateway resource won't be "linked" to the running Istio gateway; instead another Istio gateway will be started.
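For example (listener details illustrative):

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
  namespace: istio-system
spec:
  gatewayClassName: istio
  addresses:
  - type: Hostname
    value: istio-ingressgateway.istio-system.svc.cluster.local   # must match the Service created by the Istio Operator
  listeners:
  - name: http
    port: 80
    protocol: HTTP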
Currently, MCTC has the ability to set an autogenerated managed host for every rule in the Ingress that it reconciles. It does so by duplicating the rule and setting the host of the duplicated rule to an algorithmically generated host. For example, with ZONE_ROOT_DOMAIN being mctc.io, the spec would be reconciled from this:
spec:
  rules:
  - host: example.custom.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo
            port:
              number: 80
To this:
spec:
  rules:
  - host: example.custom.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo
            port:
              number: 80
  - host: <autogenerated>.mctc.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo
            port:
              number: 80
It is possible for there to be a scenario where a user wants to create an Ingress without passing a specific host, only expecting to use the autogenerated managed host. Previously, in the kcp-supported controller, the empty string would be used for this scenario. However, in the new architecture, the Ingress is also subject to being reconciled by the Nginx Ingress Controller. This controller sets a validation on Ingresses that ensures the empty host is only used once across the cluster, which limits its usage.
Investigate alternative approaches to set rules that rely exclusively on the managed host without conflicting with the Nginx controller.
The MCTC watches and reconciles traffic objects (currently limited to Ingresses) that might be reconciled by other controllers (for example, the local setup includes the Nginx Ingress Controller, which reconciles the Ingress at the same time as the MCTC).
As support for other traffic objects (for example, OpenShift Routes) is added, potential conflicts might arise with other controllers.
Investigate restrictions put in place by other traffic controllers (webhooks, reconciliation logic) that might conflict with the reconciliation of the MCTC, and create follow-up issues to ensure that these conflicts are kept in mind when adding support for these objects.
Extend the logic for setting up workload clusters to:
So that a Gateway resource can be created in a local workload cluster.
The intention is for a Gateway to be defined in the control plane, and synced into workload clusters by the traffic controller. This work will be captured in other issues.
This issue is done if a Gateway can be manually created in a workload cluster.
Done in #110.
What
Look into re-using the CodeReady namespace-as-a-service components to provide a way to register tenants and also provide a k8s proxy.
Goals/questions to answer
What
When setting up the traffic controller, it should be possible to set up a default zone that is used by default for hosts in gateways etc.
Use Case
I don't want to have to set up a zone for every tenant namespace, but instead want to define a default zone that can be used by any tenant. Each tenant should receive a subdomain underneath the root domain of this zone.
Option
Allow zones to be added to the same namespace as the traffic controller. By default these zones are considered shared. When a new namespace is added with a specific label, the controller should assign that new tenant a subdomain of the default zone.
secret/admin-user-token created
error: error executing template "{{.data.token | base64decode}}": template: output:1:16: executing "output" at <base64decode>: invalid value; expected string
make: *** [local-setup] Error 1
Seen this once when running make local-setup. Possibly a timing issue.
What
With the global load balancer (GLBC) being replaced by the Multi Cluster Traffic Controller (MCTC), we should look to rename the glbc-deployments repository to something more generic, such as kcp-deployments, while updating the Argo configurations to support this name change.
From there, we should also look at updating or replacing directory names and configurations within the repository to use MCTC configurations instead.
Done
What
Add the base GatewayClass and Gateway controllers. There should be no logic at this point, just the setup of the wiring, events, etc. This will allow us to potentially split up tasks within the controller.
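A minimal sketch of the GatewayClass such a controller would watch (the controllerName is illustrative):

apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: kuadrant-multi-cluster-gateway
spec:
  controllerName: kuadrant.io/mctc-gw-controller   # assumed controller name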
What
Define the ManagedZone CRD. We should keep this very simple for the time being. The spec currently likely only needs the root domain for the zone.
The status will need ready conditions, and a place to report nameservers back to the requester so that they can update their own DNS provider to point at ours.
We may also want to record additional information in the spec/status, such as which provider this zone is hosted in and the number of records currently assigned to the zone.
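A minimal sketch along these lines (field names illustrative, not a confirmed schema):

apiVersion: kuadrant.io/v1alpha1
kind: ManagedZone
metadata:
  name: mctc-io
spec:
  domainName: mctc.io   # root domain for the zone
status:
  conditions:
  - type: Ready
    status: "True"
  nameServers:           # reported back so the requester can delegate to us
  - ns1.dns-provider.example.com
  - ns2.dns-provider.example.com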
Done
The changes introduced in #9 allow the webhook system to be set up only when the controller is running in-cluster. The webhook configuration will not be reconciled when running locally.
In order to continuously verify and develop the webhook system, it would be better to include the ability for the webhooks to work when the controller runs locally.
Investigate an approach to expose the webhook server (which will run on the local host) with a valid CA bundle that can be referenced in the webhook configuration, for example by pointing the configuration at 172.32.0.1 (the IP address of the local host). This would require the serving certificates to be valid for 172.32.0.1. In that case the host of the certificates will need to change depending on whether we're running the webhook server in-cluster or on the localhost.
What
As we are building out multi-cluster traffic capabilities, we want to be able to support Gateway API. More specifically, we want to implement a controller in the traffic controller that will allow us to reconcile and validate Gateways, and be able to define specific GatewayClasses at the control plane level. Additionally, we want to allow gateway admins to place these gateways onto their registered clusters, define listeners within a gateway, and have these listeners be provided with TLS and DNS backing.
Use Cases
As a gateway administrator, managing gateways across multiple clusters within an environment, I want a single view of my gateways so that I can easily understand which gateways are being used in which clusters and what their current configuration is.
As a gateway administrator, I want to be able to simply add new clusters and gateways as my requirements to scale dictate. Traffic should naturally flow and be balanced across all clusters.
As a gateway administrator, I want to place a gateway I am managing via the control plane onto a specific group of data plane / ingress clusters so that these clusters are ready to accept ingress via Gateways I have configured.
As a cluster admin, I want to enforce that only domains I choose can be used with a particular gateway (supported by the Gateway API definition).
What
The gateway controller is responsible for reconciling gateway resources. It should do the following things
Depends on #54
This should introduce the idea of a multi-cluster gateway and then go on to explain how you can create and place one. This doc should appear under the how-to section of https://docs.kuadrant.io/multicluster-gateway-controller/
It should have "getting started" as a pre-requisite.
What
A managed zone represents a zone in a DNS provider that is being managed by the traffic controller. Access to the provider is granted separately via credentials given to the traffic controller. In time these credentials will form the basis for a DNSProvider CR, but initially there will just be one provider we support.
Considerations and Follow Ons
User Story
As a cluster administrator / gateway administrator, I want to register a zone with the traffic controller to be used for my applications and multi-cluster ingress needs. I want this zone to be something I choose and own, so that I know the application hosts are part of my well-known "properties/domains", while still being able to take advantage of all the features offered by the traffic controller.