

multicluster-gateway-controller's People

Contributors

adam-cattermole, alexsnaps, david-martin, dependabot[bot], eoinfennessy, ficap, grzpiotrowski, jasonmadigan, kevfan, laurafitzgerald, makslion, maleck13, mikenairn, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, pehala, philbrookes, pmccarthy, r-lawton, roivaz, sergioifg94, trepel, ygnas


multicluster-gateway-controller's Issues

Allow placement of Gateways

What

As a gateway admin, I want to be able to choose to place a gateway on a set of clusters from my available clusters.

Option

use an exact match placement label

kuadrant.io/placement: "key=value"

At a future point we may need more than this simplistic placement logic, but it is lightweight.
When a placement label is added, the controller needs to figure out which cluster names (secrets) it selects. It then needs to add a new label, something like kuadrant.io/<cluster>/placement.decision: "true", which the sync agent will watch for.
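
A minimal sketch of how this might look on a Gateway, assuming the label keys proposed above (the exact keys, class name, and cluster name are illustrative):

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
  labels:
    # placement requested by the gateway admin, as proposed in this issue
    kuadrant.io/placement: "region=eu-west"
    # decision added by the controller once a matching cluster secret is found
    kuadrant.io/cluster-1/placement.decision: "true"
spec:
  gatewayClassName: kuadrant-multi-cluster-gateway
  listeners:
  - name: https
    protocol: HTTPS
    port: 443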

Done

  • new placement label defined
  • new logic to find the cluster the label matches
  • new logic to add a placement decision to the gateway resource
  • new unit tests

Hook up limitador in local workload clusters to a shared redis instance

What

Deploy a single Redis instance to the local control plane cluster.
For each workload cluster, configure the Limitador instance to connect to this single Redis instance in the control plane.
This can be added to https://github.com/Kuadrant/multi-cluster-traffic-controller/blob/main/hack/local-setup.sh
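
A rough sketch of how this could be wired up, assuming the Limitador CR exposes Redis storage via a secret reference (the field names and namespaces here are assumptions to illustrate the idea, not a confirmed API):

apiVersion: v1
kind: Secret
metadata:
  name: redis-config
  namespace: kuadrant-system
stringData:
  # URL of the shared Redis instance running in the control plane cluster
  URL: redis://redis.mctc-control-plane.svc.cluster.local:6379
---
apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador
  namespace: kuadrant-system
spec:
  storage:
    redis:
      configSecretRef:
        name: redis-config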

Why

So that shared counters can be used between Limitador instances.
This will enable development and testing of multi-cluster rate limiting scenarios.
Note that the location of Redis is not important from an architecture point of view.
The goal of this issue is to have some central shared Redis instance, and the control plane cluster makes sense for local development.
In a production environment, Redis will probably be elsewhere (e.g. ElastiCache). It's unlikely to be part of the HCG control plane architecture.

This issue depends on #29 for the installation of kuadrant (and limitador) in workload clusters.

Add AWS config & credentials to the 'deploy' target

Currently, if you want to deploy the controller to a cluster (local or otherwise), you need to patch the Deployment to either have env vars for AWS access, or pull them in from a ConfigMap and/or Secret.

In glbc, a local aws-credentials.env and controller-config.env file, both of which are in the gitignore, are used to store these values, which are then pulled in via kustomize: https://github.com/kcp-dev/kcp-glbc/blob/main/config/deploy/local/kcp-glbc/kustomization.yaml#L16-L26
The .env files have templates with some sensible defaults that a developer can copy.
This model could be used here unless there's a better one we can come up with.
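
A minimal sketch of the kustomize wiring this would involve, following the glbc approach (file and generator names are illustrative):

# config/deploy/local/kustomization.yaml
resources:
- ../../default
secretGenerator:
- name: mctc-aws-credentials
  envs:
  - aws-credentials.env    # gitignored; copied from aws-credentials.env.template
configMapGenerator:
- name: mctc-controller-config
  envs:
  - controller-config.env  # gitignored; copied from controller-config.env.template
generatorOptions:
  disableNameSuffixHash: true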

Basic Sync Support

What

We are adding a sync agent to reflect certain resources defined at the control plane into the data plane. This syncer will also be responsible for reflecting the status of those resources back into the control plane and applying transforms added via the controllers in the control plane. This syncer will lean heavily on what we have learned through use of the KCP sync component.

At the end of this work the control plane should no longer talk to the data plane.

Open Questions

  • how do we choose which namespace on the cluster to sync to? Do we generate a namespace like KCP did, based on the tenant namespace?
  • Initially we can sync to a known namespace (e.g. istio-system)

Outcome

We should have basic sync capabilities for gateways and TLS secrets, based on access provided by service account secrets. We should capture patterns, concepts, and remaining questions in a design doc.

  • Sync DNSRecords based on Ingress #70
  • Remove cluster watch support #70
  • Remove webhook reconcile #70
  • Add capability to sync gateways and secrets
  • Add capability to report back the status of a given gateway as a multi-cluster status

Investigate custom ingress Gateway and controller

What

We want to ensure that what we do is compliant with GitOps, and also intuitive for users who are familiar with Kubernetes concepts. Currently we modify Ingresses that we see directly. This can cause issues with GitOps tools like Argo CD. Additionally, it can make configuring how to manage these Ingresses complex, as there may be other controllers and webhooks intercepting them.

To solve this, we want to explore the idea of creating a custom ingress controller as part of the traffic controller deployment. This ingress controller would define its own IngressClass. While it would be considered an ingress controller, it would effectively ensure that the Ingress was transformed and behaving as expected before delegating it, via a copy, to a chosen ingress controller that would then use that copy to configure traffic routing etc. As this copy would be owned by the original, it would be considered a derived resource and so not cause issues with Argo CD.

This approach means that delegating an Ingress to the traffic controller amounts to assigning the right IngressClass to your Ingress.

Additionally, as we are looking at defining multi-cluster gateways at the control plane, we could potentially represent this ingress controller via a "virtual gateway" resource at the control plane, where this resource would configure and trigger the deployment of our ingress controller to the data plane.

Ingress Controller

  • Sees Ingress
  • Checks its listeners config (specified at the control plane and synced down)
  • validates that the ingress is good (passes any allowedRoutes rules)
  • finds the needed TLS secret for the listener (also in the same ns as the controller) and copies it to the ingress ns
  • mutates the ingress and creates a copy of that ingress in the same ns as the target ingress
  • The ingress controller can only delegate to one actual ingress controller. The LB for this is what is set in the status of the virtual gateway
apiVersion: kuadrant.io/v1alpha1
kind: VirtualGateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: ingress-gateway
  listeners:
  - name: https
    hostname: "*.example.com"
    allowedRoutes:
      kinds: 
        - kind: Ingress
      namespaces:
        from: Selector
        selector:
          matchLabels:
            shared-gateway-access: "true"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: example-com
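
For illustration, delegating an Ingress to this controller would then look something like the example below; the Ingress's namespace would need the shared-gateway-access: "true" label to satisfy the allowedRoutes selector above (the class name is hypothetical):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo
spec:
  # delegates this Ingress to the custom ingress controller
  ingressClassName: ingress-gateway
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo
            port:
              number: 80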

Update GetDNSRecords logic to query MZ ns only

The GetDNSRecords dns service method gets called to return existing records for a particular host. It needs to check the namespace of the managed zone for that particular host, instead of what it currently does: just checking the default controller namespace (and, with later changes, the default and the traffic resource namespaces; see #84).

GetDNSRecords is called from 3 places:

  • EnsureManagedHost (ingress specific)
  • AddEndPoints
  • RemoveEndpoints

Add DNSHealthCheck Controller

What

Add a new health check controller under a dns package. This controller is responsible for reconciling the DNSHealthCheck:

  • Validate the target and spec of the DNSHealthCheck
  • For each defined listener in the Gateway, add a health check to the DNSRecord (we can modify the spec of the DNSRecord to support health checks)
  • This controller should be triggered by changes to the DNSRecord, Gateway, and DNSHealthCheck resources. The DNSHealthCheck should probably be owned by the gateway it targets, so that it is removed if the gateway is removed
  • The status of the DNSHealthCheck should reflect whether the check is considered active
  • Another option is to add a listener status condition that indicates a health check is active for that listener, as sketched below
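
A sketch of that last option on a gateway's listener status; the condition type and reason are illustrative:

status:
  listeners:
  - name: https
    conditions:
    # hypothetical condition type surfaced by the health check controller
    - type: kuadrant.io/DNSHealthCheckActive
      status: "True"
      reason: HealthCheckCreated
      message: a DNS health check is active for this listener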

Allow a certificate strategy to be defined for ManagedZone

What

By default, the traffic controller will automatically set up new certificates for listener hosts added to a gateway definition. There are use cases where the gateway admin may not want to add a particular subdomain as a listener host, but still wants a certificate that they can use for specific hosts they define under that subdomain.

Use Case

As a gateway admin, I want to set up a managed zone with a root domain and be able to choose how certificates are created for subdomains of that root domain. This is so I can request a certificate for *.a.b.com but not allow any use of *.a.b.com in the gateway definition. Instead, I will define 1.a.b.com as a listener host in my gateway and use the certificate for *.a.b.com to cover this listener host.

bf2fc6cc711aee1a0c2a/architecture#84
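
One possible shape for this, assuming a new certificate strategy field on the ManagedZone spec (entirely illustrative; no such field exists yet):

apiVersion: kuadrant.io/v1alpha1
kind: ManagedZone
metadata:
  name: a-b-com
spec:
  domainName: a.b.com
  # hypothetical field: issue a wildcard certificate per subdomain, while
  # disallowing the wildcard host itself as a gateway listener host
  certificateStrategy:
    wildcard: true
    allowWildcardListeners: false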

Advertise the subdomain of a default zone

Follows on from #65
What

By default, zones set up in the root traffic controller namespace will be used as shared zones. The controller will by default assign a subdomain of the default zone to each tenant. We need a way to advertise this subdomain to the tenant namespace. This could potentially be done via a label on the namespace or a new subdomain resource.
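
A sketch of the label option, with hypothetical keys; an annotation may actually suit better than a label, since a subdomain can exceed the 63-character label value limit:

apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    kuadrant.io/tenant: "true"                        # hypothetical opt-in label
  annotations:
    kuadrant.io/subdomain: tenant-a.mctc.example.com  # advertised by the controller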

Advanced Sync Support

#43 introduced the sync component and enabled us to connect to the control plane, sync resources, and report status back to the control plane.
This is a good foundation, but there are several pieces needed in order to bring the sync component to a full MCVP:

  • Add capability to apply transforms to the gateway declared by the control plane (e.g. the gateway class to use)
  • Remove MCTC annotations from downstream resources
  • Add deletion support based on deletion of resources in the CP
  • Replace hardcoded downstream namespace
  • Unit tests that cover the key functionality of the above sync features
  • Add e2e tests that cover sync functionality

AWS DNSHealthCheck

Use Case

As a gateway admin managing a set of gateways across multiple clusters, I want to define a common health check that governs whether a DNS A record is considered healthy, so that with a single policy I can ensure the DNS responses for services exposing endpoints via listeners on my gateway only return healthy service endpoints, allowing us to automatically and rapidly mitigate an unexpected outage. I also want to define a strategy for the case where all the endpoints become unavailable, and a minimum number of records per hostname so that my health check doesn't remove all records.

What
This is an API that targets a gateway and encapsulates an endpoint check. We should base it on what we provided via annotations in the GLBC: https://github.com/kcp-dev/kcp-glbc/blob/main/docs/dns/health-checks.md

Constraint
At this point the endpoint has to be publicly reachable. In the future we may look at other forms of check that do not require this, for example something running at the gateway level that performs the check and reports back via a status condition on the listener's status block.
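
A rough sketch of what such an API could look like, loosely modelled on the GLBC health check annotations; all field names here are illustrative rather than a settled spec:

apiVersion: kuadrant.io/v1alpha1
kind: DNSHealthCheck
metadata:
  name: prod-web-health
spec:
  # the gateway whose listener hosts this check governs
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: prod-web
  endpoint: /healthz        # path probed on each record's target
  protocol: HTTPS
  port: 443
  failureThreshold: 3
  # hypothetical safety net from the use case above: never remove the last
  # N records for a hostname, even when they are all unhealthy
  minHealthyRecordsPerHost: 1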

Cluster Registration

What

We want to be able to register clusters with the control plane within a given tenant namespace. These clusters will be represented by secrets, and these cluster secrets will be the place where additional information and context about the cluster is added.

The setup of this can be done via a script initially.

This should also output the YAML for the sync agent and webhook configuration.

Constraints

There isn't a requirement to set up the tenant as part of this work; this is just focused on registering the cluster within a tenant.

  • #68
  • Add logic to the registration script that uses a KUBECONFIG to create a new service account and a rolebinding in the tenant ns
  • Add logic that uses the SA and outputs a deployment etc. to set up the syncer, as sketched below
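
For illustration, a registered cluster might be represented by a secret shaped something like the following; the label keys and data keys are assumptions:

apiVersion: v1
kind: Secret
metadata:
  name: cluster-1
  namespace: tenant-a            # the tenant namespace the cluster is registered in
  labels:
    kuadrant.io/cluster: "true"  # hypothetical marker label
    region: eu-west              # extra context usable by placement decisions
type: Opaque
stringData:
  # kubeconfig built from the service account created by the registration script
  # (cluster, context, and SA token details omitted)
  kubeconfig: |
    apiVersion: v1
    kind: Config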

Create an e2e test suite

  • Add an e2e test suite that can be run against the local-setup environment
  • Add GH action that can create a local environment and execute the test suite against it

Add e2e test for DNSHealthCheck

What

Beyond regular unit tests, we should add an e2e test for the DNSHealthCheck. This test would not need to directly change the DNSProvider, but could use a mock client for that piece. We would want to test that the HealthCheck was reconciled correctly, that the DNSRecord was updated correctly, and that any additional status was added to the gateway, along with the reverse flow for deletion.
The test should also validate that we do not reconcile badly defined health checks.

Enhance lookup logic for managed zones

The current lookup logic that maps a host to a managed zone is simplistic, in that it will only knock off the first label and look for an exactly matching managed zone for the rest, e.g. foo.bar.baz.mydomain.com would look for an MZ for bar.baz.mydomain.com and create a record for foo in it.

If we want to be able to work down the tree and, for example, find an MZ for mydomain.com and create a record for foo.bar.baz, then we would need to update the logic in some way to allow this.

We need to consider how to do this lookup efficiently, since it would mean searching for each item down the tree until one matches.
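
A minimal sketch of the walk itself in Go; the in-memory zones map stands in for whatever lookup the controller would actually perform (e.g. listing ManagedZones in the tenant namespace):

package dns

import "strings"

// findManagedZone strips one label at a time from the host and returns the
// first managed zone whose root domain matches the remaining suffix, along
// with the subdomain a record should be created for. zones maps root
// domains to ManagedZone names; a real implementation would query the API
// server instead of a map.
func findManagedZone(host string, zones map[string]string) (zone, subdomain string, found bool) {
	labels := strings.Split(host, ".")
	for i := 1; i < len(labels)-1; i++ {
		suffix := strings.Join(labels[i:], ".")
		if z, ok := zones[suffix]; ok {
			return z, strings.Join(labels[:i], "."), true
		}
	}
	return "", "", false
}

For foo.bar.baz.mydomain.com with an MZ only for mydomain.com, this returns that zone and the subdomain foo.bar.baz, at the cost of one lookup per stripped label.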

Add support for reconciling Gateway Listeners

What

The gateway controller in the traffic controller will need to reconcile the listeners defined in the gateway. For each listener, the following requirements have been identified:

  • Check if the host is part of a managed zone for that tenant
  • If not then it can be ignored
  • ensure there is a certificate for the host specified (if it is a wildcard host the cert should be a wildcard cert)
  • ensure there is a DNSRecord based on the LB status in the gateway, only once the certificate is ready (see the sketch below)
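
For illustration, the resulting DNSRecord might look something like this, assuming an external-dns style endpoint spec; the exact schema is still to be settled:

apiVersion: kuadrant.io/v1alpha1
kind: DNSRecord
metadata:
  name: prod-web-https
spec:
  managedZone:
    name: example-com          # the tenant's managed zone matched above
  endpoints:
  - dnsName: app.example.com   # the listener host
    recordType: A
    recordTTL: 60
    targets:
    - 172.18.200.1             # taken from the gateway's LB status address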

Install istio in local dev environment using the IstioOperator CR

Currently, Istio is installed via helm.

This should be changed to install the Istio Operator (possibly via helm?) and use an IstioOperator CR

Why

So that a Kuadrant AuthPolicy and Rate limit policy can be used.
When using the Istio operator, the IstioOperator CR is how Kuadrant configures the Istio extension provider.
Without this CR, the mesh config will have to be configured manually (or by HCG) instead of by Kuadrant.

See https://redhat-internal.slack.com/archives/C047YHJJG2D/p1676895679014959 for additional context

NOTE When creating a gateway-api Gateway resource, the spec.addresses field must include a Hostname value that matches the gateway Service DNS name created by the Istio Operator, e.g. [{"type":"Hostname","value":"istio-ingressgateway.istio-system.svc.cluster.local"}]. Otherwise the gateway-api Gateway resource won't be "linked" to the running Istio gateway; instead, it will result in another Istio gateway being started.
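
For reference, the kind of IstioOperator CR this change would introduce looks roughly like the following. The extension provider wiring mirrors the pattern Kuadrant documents for Authorino, but the exact service name and port should be verified against the installed operators:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane
  namespace: istio-system
spec:
  profile: minimal
  meshConfig:
    extensionProviders:
    # registers Authorino as the external authorization service that
    # Kuadrant's AuthPolicy relies on
    - name: kuadrant-authorization
      envoyExtAuthzGrpc:
        service: authorino-authorino-authorization.kuadrant-system.svc.cluster.local
        port: 50051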

Investigate approach to exclusively use managed host

What

Currently, MCTC has the ability to set an autogenerated managed host for every rule in the Ingresses that it reconciles. It does so by duplicating the rule and setting the host of the duplicated rule to an algorithmically generated host. For example, with ZONE_ROOT_DOMAIN being mctc.io, the spec would be reconciled from this:

spec:
  rules:
    - host: example.custom.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80

To this

spec:
  rules:
    - host: example.custom.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80
    - host: <autogenerated>.mctc.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80

Why

It is possible for there to be a scenario where a user wants to create an Ingress without passing a specific host, only expecting to use the autogenerated managed host. Previously, in the kcp-supported controller, the empty string would be used for this scenario. However, in the new architecture, the Ingress is also subject to being reconciled by the Nginx Ingress Controller. This controller sets a validation on Ingresses that ensures the empty host is only used once across the cluster, which limits its usage.

How

Investigate alternative approaches to set rules that exclusively rely on the managed host without conflicting with the Nginx controller.

Investigate potential conflicts with third-party traffic controllers

What

The MCTC watches and reconciles traffic objects (currently limited to Ingresses) that might be reconciled by other controllers (for example, the local setup includes the Nginx Ingress Controller, which reconciles the Ingress at the same time as the MCTC).

Why

As support for other traffic objects (for example, OpenShift Routes) is added, potential conflicts might arise with other controllers.

How

Investigate restrictions put in place by other traffic controllers (webhooks, reconciliation logic) that might conflict with the reconciliation of the MCTC, and create follow-up issues to ensure that these conflicts are kept in mind when adding support for these objects.

Add Istio & kuadrant to local dev workload clusters

What

Extend the logic for setting up workload clusters to:

  • Install the Istio Operator, including the Gateway API CRDs, i.e. Gateway, GatewayClass, HTTPRoute, ...
  • Install the Kuadrant Operator, including the Kuadrant API CRDs (i.e. AuthPolicy, RateLimitPolicy), and Authorino & Limitador

Why

So that a Gateway resource can be created in a local workload cluster.
The intention is for a Gateway to be defined in the control plane, and synced into workload clusters by the traffic controller. This work will be captured in other issues.
This issue is done if a Gateway can be manually created in a workload cluster.

Investigate using the namespace as a service registration and proxy

What

Look into reusing the CodeReady namespace-as-a-service components to provide a way to register tenants and also provide a k8s proxy.

Goals/questions to answer

  • Can we get this setup working with k8s?
  • How or what do we want to use in order to provide the authentication?
  • Is this a fully transparent proxy for k8s (does it support things like kubectl watch, etc.)?
  • Can the sync agent act against this proxy?
  • Can we keep this as an optional piece?

Allow use of shared managed zones

What

When setting up the traffic controller, it should be possible to set up a default zone that is used by default for hosts in gateways etc.

Use Case
I don't want to have to set up a zone for every tenant namespace; instead I want to define a default zone that can be used by any tenant. Each tenant should receive a subdomain underneath the root domain of this zone.

Option

Allow zones to be added to the same namespace as the traffic controller. By default, these zones are considered shared. When a new namespace is added with a specific label, the controller should assign that new tenant a subdomain of the default zone.

Refactor glbc-deployments repo

What
With the global load balancer (GLBC) being replaced by the Multi Cluster Traffic Controller (MCTC), we should look to rename the glbc-deployments repository to something more generic, such as kcp-deployments, while updating Argo configurations to support this name change.

From there we should also look at updating or replacing directory names and configurations within the repository to use MCTC configurations instead.

Done

  • The glbc-deployments repository has been renamed to kcp-deployments
  • Sub directories and config files within the glbc-deployments repo are updated to support MCTC

Add Base Gateway and Gateway class controllers

What

Add the base GatewayClass and Gateway controllers. There should be no logic at this point, just the setup of the wiring, events, etc. This will allow us to potentially split up tasks within the controller.

Define ManagedZone CRD and controller

What

Define the ManagedZone CRD. We should keep this very simple for the time being; the spec currently likely only needs the root domain for the zone.
The status will need ready conditions, and a place to report nameservers back to the requester so that they can update their own DNS provider to point at ours.
We may also want to record additional information in the spec/status, such as which provider this zone is hosted in and the number of records currently assigned to the zone.

Done

  • Simple API Resource that specifies a root domain for a new zone to be used by the tenant e.g. apps.mycompany.com
  • New Managed Zone controller within the traffic controller
  • The new zone is assigned to a provider by the traffic controller
    (only 1 provider at this stage)
  • The zone is then created in a provider via the traffic controller using provider credentials
  • The zone status is updated with the nameservers for this new zone and a ready condition
  • Unit tests in place
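
A sketch of the resource as described above; the field names are plausible guesses rather than a settled API:

apiVersion: kuadrant.io/v1alpha1
kind: ManagedZone
metadata:
  name: apps-mycompany-com
  namespace: tenant-a
spec:
  domainName: apps.mycompany.com   # root domain for the new zone
status:
  conditions:
  - type: Ready
    status: "True"
    reason: ZoneCreated
  # reported back so the tenant can delegate to these from their own DNS provider
  nameServers:
  - ns-1.awsdns-00.org
  - ns-2.awsdns-01.com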

Enable webhooks when running the controller locally

What

The changes introduced in #9 only allow the webhook system to be set up when the controller is running in-cluster. The webhook configuration will not be reconciled when running locally.

Why

In order to continuously verify and develop the webhook system, it would be better for the webhooks to also work when the controller runs locally.

How

Investigate an approach to expose the webhook server (which will run on the local host) with a valid CA bundle that can be referenced in the webhook configuration.

Possible implementations

  1. One alternative is to expose a container using the webhook TLS certificates, which proxies requests to 172.32.0.1 (the IP address of the local host). This would require:
    • A container image that can be configured to proxy requests to a given address
    • Configuring the webhook ingress to expose that container image instead of the controller
  2. Another alternative is modifying the controller to receive the TLS certificate location as a parameter and, in that case, run the webhook server with those certificates. Then all that needs to be done is configuring the WebhookConfiguration to point to 172.32.0.1 (see the sketch below). In that case, the host in the certificates will need to change depending on whether the webhook server runs in-cluster or on the local host
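
For the second alternative, the configuration change amounts to pointing the webhook at a URL instead of an in-cluster Service; the webhook name, path, and rules below are illustrative:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: mctc-webhook
webhooks:
- name: ingress.kuadrant.io
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    # point directly at the webhook server running on the local host,
    # instead of the usual in-cluster service reference
    url: https://172.32.0.1:9443/validate
    caBundle: <base64-encoded CA for the locally issued certificate>
  rules:
  - apiGroups: ["networking.k8s.io"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["ingresses"]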

Control Plane Gateway API Support

What

As we are building out multi-cluster traffic capabilities, we want to be able to support the Gateway API. More specifically, we want to implement a controller in the traffic controller that will allow us to reconcile and validate Gateways, and be able to define specific GatewayClasses at the control plane level. Additionally, we want to allow gateway admins to place these gateways onto their registered clusters, define listeners within a gateway, and have these listeners be provided with TLS and DNS backing.

Use Cases

As a gateway administrator managing gateways across multiple clusters within an environment, I want a single view of my gateways, so that I can easily understand which gateways are being used in which clusters and what their current configuration is.
As a gateway administrator, I want to be able to simply add new clusters and gateways as my requirements to scale dictate. Traffic should naturally flow and be balanced across all clusters.
As a gateway administrator, I want to place a gateway I am managing via the control plane onto a specific group of data plane / ingress clusters, so that these clusters are ready to accept ingress via the Gateways I have configured.
As a cluster admin, I want to enforce that only domains I choose can be used with a particular gateway (supported by the Gateway API definition).

  • #42
  • #40
  • #41
  • #46
  • #52
  • Documentation of the control plane status and resource spec

Add new controller logic for reconciling Gateway resources

What

The gateway controller is responsible for reconciling gateway resources. It should do the following things:

  • validate that the gateway class set on the gateway is kuadrant.io/traffic-controller
  • while there is a valid gateway resource present, ensure there is a finalizer set on the gateway class
  • add any transformation annotations to the gateway (initially to set the data plane gateway class), as sketched below
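
For illustration, the resources this controller acts on might look like the following; the annotation key and class names are hypothetical:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: kuadrant-multi-cluster-gateway
spec:
  controllerName: kuadrant.io/traffic-controller
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: prod-web
  annotations:
    # hypothetical transform added by the controller: the gateway class
    # the synced copy should use in the data plane
    kuadrant.io/gateway-class-transform: istio
spec:
  gatewayClassName: kuadrant-multi-cluster-gateway
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: app.example.com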

Managed Zone Support

What

A managed zone represents a zone in a DNS provider that is being managed by the traffic controller. Access to the provider is granted separately via credentials given to the traffic controller. In time, these credentials will form the basis for a DNSProvider CR, but initially there will just be one provider we support.

Considerations and Follow Ons

  • When should a zone be considered safe to delete?
  • How can we improve scalability (multiple providers, migration, shared zones)?
    • As part of the above, how would we migrate a tenant from a shared zone to a dedicated zone?

User Story

As a cluster administrator / gateway administrator, I want to register a zone with the traffic controller to be used for my applications and multi-cluster ingress needs. I want this zone to be something I choose and own, so that I know the application hosts are part of my well-known "properties/domains", while still being able to take advantage of all the features offered by the traffic controller.

  • #35
  • Provide service/api to find which managed zone a host should be assigned
