

multicluster-gateway-controller's People

Contributors

adam-cattermole, alexsnaps, david-martin, dependabot[bot], eoinfennessy, ficap, grzpiotrowski, jasonmadigan, kevfan, laurafitzgerald, makslion, maleck13, mikenairn, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, pehala, philbrookes, pmccarthy, r-lawton, roivaz, sergioifg94, trepel, ygnas


multicluster-gateway-controller's Issues

Allow placement of Gateways

What

As a gateway admin, I want to be able to choose to place a gateway on a set of clusters from my available clusters.

Option

use an exact match placement label

kuadrant.io/placement: "key=value"

At a future point we may need more than this simplistic placement logic, but it is lightweight.
When a placement label is added, the controller needs to figure out which cluster names (secrets) it selects. It then needs to add a new label, something like kuadrant.io/<cluster>/placement.decision: "true", which the sync agent will watch for.
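
A minimal sketch of how this might look on a Gateway, assuming the label keys proposed above (the exact keys, class name, and cluster name are illustrative):

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
  labels:
    # placement requested by the gateway admin, as proposed in this issue
    kuadrant.io/placement: "region=eu-west"
    # decision added by the controller once a matching cluster secret is found
    kuadrant.io/cluster-1/placement.decision: "true"
spec:
  gatewayClassName: kuadrant-multi-cluster-gateway
  listeners:
  - name: https
    protocol: HTTPS
    port: 443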

Done

  • new placement label defined
  • new logic to find the cluster the label matches
  • new logic to add a placement decision to the gateway resource
  • new unit tests

Hook up limitador in local workload clusters to a shared redis instance

What

Deploy a single Redis instance to the local control plane cluster.
For each workload cluster, configure the Limitador instance to connect to this single Redis instance in the control plane.
This can be added to https://github.com/Kuadrant/multi-cluster-traffic-controller/blob/main/hack/local-setup.sh
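
A rough sketch of how this could be wired up, assuming the Limitador CR exposes Redis storage via a secret reference (the field names and namespaces here are assumptions to illustrate the idea, not a confirmed API):

apiVersion: v1
kind: Secret
metadata:
  name: redis-config
  namespace: kuadrant-system
stringData:
  # URL of the shared Redis instance running in the control plane cluster
  URL: redis://redis.mctc-control-plane.svc.cluster.local:6379
---
apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador
  namespace: kuadrant-system
spec:
  storage:
    redis:
      configSecretRef:
        name: redis-config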

Why

So that shared counters can be used between Limitador instances.
This will enable development and testing of multi-cluster rate limiting scenarios.
Note that the location of Redis is not important from an architecture point of view.
The goal of this issue is to have some central shared Redis instance, and the control plane cluster makes sense for local development.
In a production environment, Redis will probably be elsewhere (e.g. ElastiCache). It's unlikely to be part of the HCG control plane architecture.

This issue depends on #29 for the installation of kuadrant (and limitador) in workload clusters.

Add AWS config & credentials to the 'deploy' target

Currently, if you want to deploy the controller to a cluster (local or otherwise), you need to patch the Deployment to either have env vars for AWS access, or pull them in from a ConfigMap and/or Secret.

In glbc, a local aws-credentials.env and controller-config.env file, both of which are in the gitignore, are used to store these values, which are then pulled in via kustomize: https://github.com/kcp-dev/kcp-glbc/blob/main/config/deploy/local/kcp-glbc/kustomization.yaml#L16-L26
The .env files have templates with some sensible defaults that a developer can copy.
This model could be used here unless there's a better one we can come up with.
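
A minimal sketch of the kustomize wiring this would involve, following the glbc approach (file and generator names are illustrative):

# config/deploy/local/kustomization.yaml
resources:
- ../../default
secretGenerator:
- name: mctc-aws-credentials
  envs:
  - aws-credentials.env    # gitignored; copied from aws-credentials.env.template
configMapGenerator:
- name: mctc-controller-config
  envs:
  - controller-config.env  # gitignored; copied from controller-config.env.template
generatorOptions:
  disableNameSuffixHash: true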

Basic Sync Support

What

We are adding a sync agent to reflect certain resources defined at the control plane into the data plane. This syncer will also be responsible for reflecting the status of those resources back into the control plane and applying transforms added via the controllers in the control plane. This syncer will lean heavily on what we have learned through use of the KCP sync component.

At the end of this work the control plane should no longer talk to the data plane.

Open Questions

  • how do we choose which namespace on the cluster to sync to? Do we generate a namespace like KCP did, based on the tenant namespace?
  • Initially we can sync to a known namespace (e.g. istio-system)

Outcome

We should have basic sync capabilities for gateways and TLS secrets, based on access provided by service account secrets. We should capture patterns, concepts, and remaining questions in a design doc.

  • Sync DNSRecords based on Ingress #70
  • Remove cluster watch support #70
  • Remove webhook reconcile #70
  • Add capability to sync gateways and secrets
  • Add capability to report back the status of a given gateway as a multi-cluster status

Investigate custom ingress Gateway and controller

What

We want to ensure that what we do is compliant with GitOps, and also intuitive for users who are familiar with Kubernetes concepts. Currently we modify Ingresses that we see directly. This can cause issues with GitOps tools like Argo CD. Additionally, it can make configuring how to manage these Ingresses complex, as there may be other controllers and webhooks intercepting them.

To solve this, we want to explore the idea of creating a custom ingress controller as part of the traffic controller deployment. This ingress controller would define its own IngressClass. While it would be considered an ingress controller, it would effectively ensure that the Ingress was transformed and behaving as expected before delegating it, via a copy, to a chosen ingress controller that would then use that copy to configure traffic routing etc. As this copy would be owned by the original, it would be considered a derived resource and so not cause issues with Argo CD.

This approach means that delegating an Ingress to the traffic controller amounts to assigning the right IngressClass to your Ingress.

Additionally, as we are looking at defining multi-cluster gateways at the control plane, we could potentially represent this ingress controller via a "virtual gateway" resource at the control plane, where this resource would configure and trigger the deployment of our ingress controller to the data plane.

Ingress Controller

  • Sees Ingress
  • Checks its listeners config (specified at the control plane and synced down)
  • validates that the ingress is good (passes any allowedRoutes rules)
  • finds the needed TLS secret for the listener (also in the same ns as the controller) and copies it to the ingress ns
  • mutates the ingress and creates a copy of that ingress in the same ns as the target ingress
  • The ingress controller can only delegate to one actual ingress controller. The LB for this is what is set in the status of the virtual gateway
apiVersion: kuadrant.io/v1alpha1
kind: VirtualGateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: ingress-gateway
  listeners:
  - name: https
    hostname: "*.example.com"
    allowedRoutes:
      kinds: 
        - kind: Ingress
      namespaces:
        from: Selector
        selector:
          matchLabels:
            shared-gateway-access: "true"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: example-com
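
For illustration, delegating an Ingress to this controller would then look something like the example below; the Ingress's namespace would need the shared-gateway-access: "true" label to satisfy the allowedRoutes selector above (the class name is hypothetical):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo
spec:
  # delegates this Ingress to the custom ingress controller
  ingressClassName: ingress-gateway
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo
            port:
              number: 80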

Update GetDNSRecords logic to query MZ ns only

The GetDNSRecords dns service method gets called to return existing records for a particular host. It needs to check the namespace of the managed zone for that particular host, instead of what it currently does: just checking the default controller namespace (and, with later changes, the default and the traffic resource namespaces; see #84).

GetDNSRecords is called from 3 places:

  • EnsureManagedHost (ingress specific)
  • AddEndPoints
  • RemoveEndpoints

Add DNSHealthCheck Controller

What

Add a new health check controller under a dns package. This controller is responsible for reconciling the DNSHealthCheck:

  • Validate the target and spec of the DNSHealthCheck
  • For each defined listener in the Gateway, add a health check to the DNSRecord (we can modify the spec of the DNSRecord to support health checks)
  • This controller should be triggered by changes to the DNSRecord, Gateway, and DNSHealthCheck resources. The DNSHealthCheck should probably be owned by the gateway it targets, so that it is removed if the gateway is removed
  • The status of the DNSHealthCheck should reflect whether the check is considered active
  • Another option is to add a listener status condition that indicates a health check is active for that listener, as sketched below
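
A sketch of that last option on a gateway's listener status; the condition type and reason are illustrative:

status:
  listeners:
  - name: https
    conditions:
    # hypothetical condition type surfaced by the health check controller
    - type: kuadrant.io/DNSHealthCheckActive
      status: "True"
      reason: HealthCheckCreated
      message: a DNS health check is active for this listener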

Allow a certificate strategy to be defined for ManagedZone

What

By default, the traffic controller will automatically set up new certificates for listener hosts added to a gateway definition. There are use cases where the gateway admin may not want to add a particular subdomain as a listener host, but still wants a certificate that they can use for specific hosts they define under that subdomain.

Use Case

As a gateway admin, I want to set up a managed zone with a root domain and be able to choose how certificates are created for subdomains of that root domain. This is so I can request a certificate for *.a.b.com but not allow any use of *.a.b.com in the gateway definition. Instead, I will define 1.a.b.com as a listener host in my gateway and use the certificate for *.a.b.com to cover this listener host.

bf2fc6cc711aee1a0c2a/architecture#84
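
One possible shape for this, assuming a new certificate strategy field on the ManagedZone spec (entirely illustrative; no such field exists yet):

apiVersion: kuadrant.io/v1alpha1
kind: ManagedZone
metadata:
  name: a-b-com
spec:
  domainName: a.b.com
  # hypothetical field: issue a wildcard certificate per subdomain, while
  # disallowing the wildcard host itself as a gateway listener host
  certificateStrategy:
    wildcard: true
    allowWildcardListeners: false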

Advertise the subdomain of a default zone

Follows on from #65
What

By default, zones set up in the root traffic controller namespace will be used as shared zones. The controller will by default assign a subdomain of the default zone to each tenant. We need a way to advertise this subdomain to the tenant namespace. This could potentially be done via a label on the namespace or a new subdomain resource.
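
A sketch of the label option, with hypothetical keys; an annotation may actually suit better than a label, since a subdomain can exceed the 63-character label value limit:

apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    kuadrant.io/tenant: "true"                        # hypothetical opt-in label
  annotations:
    kuadrant.io/subdomain: tenant-a.mctc.example.com  # advertised by the controller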

Advanced Sync Support

#43 introduced the sync component and enabled us to connect to the control plane, sync resources, and report status back to the control plane.
This is a good foundation, but there are several pieces needed in order to bring the sync component to a full MCVP:

  • Add capability to apply transforms to the gateway declared by the control plane (e.g. the gateway class to use)
  • Remove MCTC annotations from downstream resources
  • Add deletion support based on deletion of resources in the CP
  • Replace hardcoded downstream namespace
  • Unit tests that cover the key functionality of the above sync features
  • Add e2e tests that cover sync functionality

AWS DNSHealthCheck

Use Case

As a gateway admin managing a set of gateways across multiple clusters, I want to define a common health check that governs whether a DNS A record is considered healthy, so that with a single policy I can ensure the DNS responses for services exposing endpoints via listeners on my gateway only return healthy service endpoints, allowing us to automatically and rapidly mitigate an unexpected outage. I also want to define a strategy for the case where all the endpoints become unavailable, and a minimum number of records per hostname so that my health check doesn't remove all records.

What
This is an API that targets a gateway and encapsulates an endpoint check. We should base it on what we provided via annotations in the GLBC: https://github.com/kcp-dev/kcp-glbc/blob/main/docs/dns/health-checks.md

Constraint
At this point the endpoint has to be publicly reachable. In the future we may look at other forms of check that do not require this, for example something running at the gateway level that performs the check and reports back via a status condition on the listener's status block.
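
A rough sketch of what such an API could look like, loosely modelled on the GLBC health check annotations; all field names here are illustrative rather than a settled spec:

apiVersion: kuadrant.io/v1alpha1
kind: DNSHealthCheck
metadata:
  name: prod-web-health
spec:
  # the gateway whose listener hosts this check governs
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: prod-web
  endpoint: /healthz        # path probed on each record's target
  protocol: HTTPS
  port: 443
  failureThreshold: 3
  # hypothetical safety net from the use case above: never remove the last
  # N records for a hostname, even when they are all unhealthy
  minHealthyRecordsPerHost: 1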

Cluster Registration

What

We want to be able to register clusters with the control plane within a given tenant namespace. These clusters will be represented by secrets, and these cluster secrets will be the place where additional information and context about the cluster is added.

The setup of this can be done via a script initially.

This should also output the YAML for the sync agent and webhook configuration.

Constraints

There isn't a requirement to set up the tenant as part of this work; this is just focused on registering the cluster within a tenant.

  • #68
  • Add logic to the registration script that uses a KUBECONFIG to create a new service account and a rolebinding in the tenant ns
  • Add logic that uses the SA and outputs a deployment etc. to set up the syncer, as sketched below
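
For illustration, a registered cluster might be represented by a secret shaped something like the following; the label keys and data keys are assumptions:

apiVersion: v1
kind: Secret
metadata:
  name: cluster-1
  namespace: tenant-a            # the tenant namespace the cluster is registered in
  labels:
    kuadrant.io/cluster: "true"  # hypothetical marker label
    region: eu-west              # extra context usable by placement decisions
type: Opaque
stringData:
  # kubeconfig built from the service account created by the registration script
  # (cluster, context, and SA token details omitted)
  kubeconfig: |
    apiVersion: v1
    kind: Config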

Create an e2e test suite

  • Add an e2e test suite that can be run against the local-setup environment
  • Add GH action that can create a local environment and execute the test suite against it

Add e2e test for DNSHealthCheck

What

Beyond regular unit tests, we should add an e2e test for the DNSHealthCheck. This test would not need to directly change the DNSProvider, but could use a mock client for that piece. We would want to test that the HealthCheck was reconciled correctly, that the DNSRecord was updated correctly, and that any additional status was added to the gateway, along with the reverse flow for deletion.
The test should also validate that we do not reconcile badly defined health checks.

Enhance lookup logic for managed zones

The current lookup logic that maps a host to a managed zone is simplistic, in that it will only knock off the first label and look for an exactly matching managed zone for the rest, e.g. foo.bar.baz.mydomain.com would look for an MZ for bar.baz.mydomain.com and create a record for foo in it.

If we want to be able to work down the tree and, for example, find an MZ for mydomain.com and create a record for foo.bar.baz, then we would need to update the logic in some way to allow this.

We need to consider how to do this lookup efficiently, since it would mean searching for each item down the tree until one matches.
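
A minimal sketch of the walk itself in Go; the in-memory zones map stands in for whatever lookup the controller would actually perform (e.g. listing ManagedZones in the tenant namespace):

package dns

import "strings"

// findManagedZone strips one label at a time from the host and returns the
// first managed zone whose root domain matches the remaining suffix, along
// with the subdomain a record should be created for. zones maps root
// domains to ManagedZone names; a real implementation would query the API
// server instead of a map.
func findManagedZone(host string, zones map[string]string) (zone, subdomain string, found bool) {
	labels := strings.Split(host, ".")
	for i := 1; i < len(labels)-1; i++ {
		suffix := strings.Join(labels[i:], ".")
		if z, ok := zones[suffix]; ok {
			return z, strings.Join(labels[:i], "."), true
		}
	}
	return "", "", false
}

For foo.bar.baz.mydomain.com with an MZ only for mydomain.com, this returns that zone and the subdomain foo.bar.baz, at the cost of one lookup per stripped label.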

Add support for reconciling Gateway Listeners

What

The gateway controller in the traffic controller will need to reconcile the listeners defined in the gateway. For each listener, the following requirements have been identified:

  • Check if the host is part of a managed zone for that tenant
  • If not then it can be ignored
  • ensure there is a certificate for the host specified (if it is a wildcard host the cert should be a wildcard cert)
  • ensure there is a DNSRecord based on the LB status in the gateway, only once the certificate is ready (see the sketch below)
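
For illustration, the resulting DNSRecord might look something like this, assuming an external-dns style endpoint spec; the exact schema is still to be settled:

apiVersion: kuadrant.io/v1alpha1
kind: DNSRecord
metadata:
  name: prod-web-https
spec:
  managedZone:
    name: example-com          # the tenant's managed zone matched above
  endpoints:
  - dnsName: app.example.com   # the listener host
    recordType: A
    recordTTL: 60
    targets:
    - 172.18.200.1             # taken from the gateway's LB status address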

Install istio in local dev environment using the IstioOperator CR

Currently, Istio is installed via helm.

This should be changed to install the Istio Operator (possibly via helm?) and use an IstioOperator CR

Why

So that a Kuadrant AuthPolicy and Rate limit policy can be used.
When using the Istio operator, the IstioOperator CR is how Kuadrant configures the Istio extension provider.
Without this CR, the mesh config will have to be configured manually (or by HCG) instead of by Kuadrant.

See https://redhat-internal.slack.com/archives/C047YHJJG2D/p1676895679014959 for additional context

NOTE When creating a gateway-api Gateway resource, the spec.addresses field must include a Hostname value that matches the gateway Service DNS name created by the Istio Operator, e.g. [{"type":"Hostname","value":"istio-ingressgateway.istio-system.svc.cluster.local"}]. Otherwise the gateway-api Gateway resource won't be "linked" to the running Istio gateway; instead, it will result in another Istio gateway being started.
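
For reference, the kind of IstioOperator CR this change would introduce looks roughly like the following. The extension provider wiring mirrors the pattern Kuadrant documents for Authorino, but the exact service name and port should be verified against the installed operators:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane
  namespace: istio-system
spec:
  profile: minimal
  meshConfig:
    extensionProviders:
    # registers Authorino as the external authorization service that
    # Kuadrant's AuthPolicy relies on
    - name: kuadrant-authorization
      envoyExtAuthzGrpc:
        service: authorino-authorino-authorization.kuadrant-system.svc.cluster.local
        port: 50051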

Investigate approach to exclusively use managed host

What

Currently, MCTC has the ability to set an autogenerated managed host for every rule in the Ingresses that it reconciles. It does so by duplicating the rule and setting the host of the duplicated rule to an algorithmically generated host. For example, with ZONE_ROOT_DOMAIN being mctc.io, the spec would be reconciled from this:

spec:
  rules:
    - host: example.custom.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80

To this

spec:
  rules:
    - host: example.custom.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80
    - host: <autogenerated>.mctc.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80

Why

It is possible for there to be a scenario where a user wants to create an Ingress without passing a specific host, only expecting to use the autogenerated managed host. Previously, in the kcp-supported controller, the empty string would be used for this scenario. However, in the new architecture, the Ingress is also subject to being reconciled by the Nginx Ingress Controller. This controller sets a validation on Ingresses that ensures the empty host is only used once across the cluster, which limits its usage.

How

Investigate alternative approaches to set rules that exclusively rely on the managed host without conflicting with the Nginx controller.

Investigate potential conflicts with third-party traffic controllers

What

The MCTC watches and reconciles traffic objects (currently limited to Ingresses) that might be reconciled by other controllers (for example, the local setup includes the Nginx Ingress Controller, which reconciles the Ingress at the same time as the MCTC).

Why

As support for other traffic objects (for example, OpenShift Routes) is added, potential conflicts might arise with other controllers.

How

Investigate restrictions put in place by other traffic controllers (webhooks, reconciliation logic) that might conflict with the reconciliation of the MCTC, and create follow-up issues to ensure that these conflicts are kept in mind when adding support for these objects.

Add Istio & kuadrant to local dev workload clusters

What

Extend the logic for setting up workload clusters to:

  • Install the Istio Operator, including the Gateway API CRDs, i.e. Gateway, GatewayClass, HTTPRoute, ...
  • Install the Kuadrant Operator, including the Kuadrant API CRDs (i.e. AuthPolicy, RateLimitPolicy), and Authorino & Limitador

Why

So that a Gateway resource can be created in a local workload cluster.
The intention is for a Gateway to be defined in the control plane, and synced into workload clusters by the traffic controller. This work will be captured in other issues.
This issue is done if a Gateway can be manually created in a workload cluster.

Investigate using the namespace as a service registration and proxy

What

Look into reusing the CodeReady namespace-as-a-service components to provide a way to register tenants and also provide a k8s proxy.

Goals/questions to answer

  • Can we get this setup working with k8s?
  • How or what do we want to use in order to provide the authentication?
  • Is this a fully transparent proxy for k8s (does it support things like kubectl watch, etc.)?
  • Can the sync agent act against this proxy?
  • Can we keep this as an optional piece?

Allow use of shared managed zones

What

When setting up the traffic controller, it should be possible to set up a default zone that is used by default for hosts in gateways etc.

Use Case
I don't want to have to set up a zone for every tenant namespace; instead I want to define a default zone that can be used by any tenant. Each tenant should receive a subdomain underneath the root domain of this zone.

Option

Allow zones to be added to the same namespace as the traffic controller. By default, these zones are considered shared. When a new namespace is added with a specific label, the controller should assign that new tenant a subdomain of the default zone.

Refactor glbc-deployments repo

What
With the global load balancer (GLBC) being replaced by the Multi Cluster Traffic Controller (MCTC), we should look to rename the glbc-deployments repository to something more generic, such as kcp-deployments, while updating Argo configurations to support this name change.

From there we should also look at updating or replacing directory names and configurations within the repository to use MCTC configurations instead.

Done

  • The glbc-deployments repository has been renamed to kcp-deployments
  • Sub directories and config files within the glbc-deployments repo are updated to support MCTC

Add Base Gateway and Gateway class controllers

What

Add the base GatewayClass and Gateway controllers. There should be no logic at this point, just the setup of the wiring, events, etc. This will allow us to potentially split up tasks within the controller.

Define ManagedZone CRD and controller

What

Define the ManagedZone CRD. We should keep this very simple for the time being; the spec currently likely only needs the root domain for the zone.
The status will need ready conditions, and a place to report nameservers back to the requester so that they can update their own DNS provider to point at ours.
We may also want to record additional information in the spec/status, such as which provider this zone is hosted in and the number of records currently assigned to the zone.

Done

  • Simple API Resource that specifies a root domain for a new zone to be used by the tenant e.g. apps.mycompany.com
  • New Managed Zone controller within the traffic controller
  • The new zone is assigned to a provider by the traffic controller
    (only 1 provider at this stage)
  • The zone is then created in a provider via the traffic controller using provider credentials
  • The zone status is updated with the nameservers for this new zone and a ready condition
  • Unit tests in place
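
A sketch of the resource as described above; the field names are plausible guesses rather than a settled API:

apiVersion: kuadrant.io/v1alpha1
kind: ManagedZone
metadata:
  name: apps-mycompany-com
  namespace: tenant-a
spec:
  domainName: apps.mycompany.com   # root domain for the new zone
status:
  conditions:
  - type: Ready
    status: "True"
    reason: ZoneCreated
  # reported back so the tenant can delegate to these from their own DNS provider
  nameServers:
  - ns-1.awsdns-00.org
  - ns-2.awsdns-01.com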

Enable webhooks when running the controller locally

What

The changes introduced in #9 only allow the webhook system to be set up when the controller is running in-cluster. The webhook configuration will not be reconciled when running locally.

Why

In order to continuously verify and develop the webhook system, it would be better for the webhooks to also work when the controller runs locally.

How

Investigate an approach to expose the webhook server (which will run on the local host) with a valid CA bundle that can be referenced in the webhook configuration.

Possible implementations

  1. One alternative is to expose a container using the webhook TLS certificates, which proxies requests to 172.32.0.1 (the IP address of the local host). This would require:
    • A container image that can be configured to proxy requests to a given address
    • Configuring the webhook ingress to expose that container image instead of the controller
  2. Another alternative is modifying the controller to receive the TLS certificate location as a parameter and, in that case, run the webhook server with those certificates. Then all that needs to be done is configuring the WebhookConfiguration to point to 172.32.0.1 (see the sketch below). In that case, the host in the certificates will need to change depending on whether the webhook server runs in-cluster or on the local host
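
For the second alternative, the configuration change amounts to pointing the webhook at a URL instead of an in-cluster Service; the webhook name, path, and rules below are illustrative:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: mctc-webhook
webhooks:
- name: ingress.kuadrant.io
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    # point directly at the webhook server running on the local host,
    # instead of the usual in-cluster service reference
    url: https://172.32.0.1:9443/validate
    caBundle: <base64-encoded CA for the locally issued certificate>
  rules:
  - apiGroups: ["networking.k8s.io"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["ingresses"]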

Control Plane Gateway API Support

What

As we are building out multi-cluster traffic capabilities, we want to be able to support the Gateway API. More specifically, we want to implement a controller in the traffic controller that will allow us to reconcile and validate Gateways, and be able to define specific GatewayClasses at the control plane level. Additionally, we want to allow gateway admins to place these gateways onto their registered clusters, define listeners within a gateway, and have these listeners be provided with TLS and DNS backing.

Use Cases

As a gateway administrator managing gateways across multiple clusters within an environment, I want a single view of my gateways, so that I can easily understand which gateways are being used in which clusters and what their current configuration is.
As a gateway administrator, I want to be able to simply add new clusters and gateways as my requirements to scale dictate. Traffic should naturally flow and be balanced across all clusters.
As a gateway administrator, I want to place a gateway I am managing via the control plane onto a specific group of data plane / ingress clusters, so that these clusters are ready to accept ingress via the Gateways I have configured.
As a cluster admin, I want to enforce that only domains I choose can be used with a particular gateway (supported by the Gateway API definition).

  • #42
  • #40
  • #41
  • #46
  • #52
  • Documentation of the control plane status and resource spec

Add new controller logic for reconciling Gateway resources

What

The gateway controller is responsible for reconciling gateway resources. It should do the following things:

  • validate that the gateway class set on the gateway is kuadrant.io/traffic-controller
  • while there is a valid gateway resource present, ensure there is a finalizer set on the gateway class
  • add any transformation annotations to the gateway (initially to set the data plane gateway class), as sketched below
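
For illustration, the resources this controller acts on might look like the following; the annotation key and class names are hypothetical:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: kuadrant-multi-cluster-gateway
spec:
  controllerName: kuadrant.io/traffic-controller
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: prod-web
  annotations:
    # hypothetical transform added by the controller: the gateway class
    # the synced copy should use in the data plane
    kuadrant.io/gateway-class-transform: istio
spec:
  gatewayClassName: kuadrant-multi-cluster-gateway
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: app.example.com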

Managed Zone Support

What

A managed zone represents a zone in a DNS provider that is being managed by the traffic controller. Access to the provider is granted separately via credentials given to the traffic controller. In time, these credentials will form the basis for a DNSProvider CR, but initially there will just be one provider we support.

Considerations and Follow Ons

  • When should a zone be considered safe to delete?
  • How can we improve scalability (multiple providers, migration, shared zones)?
    • As part of the above, how would we migrate a tenant from a shared zone to a dedicated zone?

User Story

As a cluster administrator / gateway administrator, I want to register a zone with the traffic controller to be used for my applications and multi-cluster ingress needs. I want this zone to be something I choose and own, so that I know the application hosts are part of my well-known "properties/domains", while still being able to take advantage of all the features offered by the traffic controller.

  • #35
  • Provide service/api to find which managed zone a host should be assigned
