
applicationset-progressive-sync's Introduction

applicationset-progressive-sync

applicationset-progressive-sync is a controller to allow a progressive sync of ArgoCD Applications generated by an ApplicationSet.

Motivation

Argo ApplicationSet is being developed as the solution to replace the app-of-apps pattern.

While ApplicationSet is great for programmatically generating ArgoCD Applications, we still need to solve how to update those Applications.

If we enable the auto-sync policy, we will update all the generated Applications at once.

This might not be a problem if we have only one production cluster, but organizations with tens or hundreds of clusters need to avoid a global rollout. They need to release new versions of their application in a safer way.

The applicationset-progressive-sync controller allows operators and developers to decide how they want to update their Applications.

Example spec

apiVersion: argoproj.skyscanner.net/v1alpha1
kind: ProgressiveSync
metadata:
  name: myprogressivesync
  namespace: argocd
spec:
  # a reference to the target ApplicationSet in the same namespace
  appSetRef:
    name: myappset
  # app sync options
  syncOptions:
    # enable prune resources during sync, disabled by default
    prune: true
  # the rollout steps
  stages:
      # human friendly name
    - name: two clusters as canary in EMEA
      # how many targets to update in parallel
      # can be an integer or %.
      maxParallel: 2
      # how many targets to update from the selector result
      # can be an integer or %.
      maxTargets: 2
      # which targets to update
      targets:
        clusters:
          selector:
            matchLabels:
              area: emea
    - name: rollout to remaining clusters
      maxParallel: 2
      maxTargets: 4
      targets:
        clusters:
          selector: {}

Status: pre-alpha

Expect a non-functional controller and breaking changes until Milestone 2 is completed.

Contributing

See CONTRIBUTING.md

Configuration

The controller connects to an Argo CD server and requires configuration to do so:

ARGOCD_AUTH_TOKEN: <token of the Argo CD user>
ARGOCD_SERVER_ADDR: <address of the Argo CD server>
ARGOCD_INSECURE: <true/false>

The above configuration is loaded taking into account the following priority order:

  1. Environment Variables.
ARGOCD_AUTH_TOKEN=ey...
ARGOCD_SERVER_ADDR=argocd-server
ARGOCD_INSECURE=true
  2. Files in the Config Directory (/etc/applicationset-progressive-sync/).
/etc/
├── applicationset-progressive-sync/
│   ├── ARGOCD_AUTH_TOKEN  # file content: ey...
│   ├── ARGOCD_SERVER_ADDR # file content: argocd-server
│   ├── ARGOCD_INSECURE    # file content: true

If at least one of the options is missing, the controller will fail to start.
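As an illustration of that priority order, here is a minimal sketch (not the controller's actual loader; the lookup helper is hypothetical): an environment variable wins over a file of the same name in the config directory, and a missing value aborts startup.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

const configDir = "/etc/applicationset-progressive-sync"

// lookup returns the value for key, preferring the environment variable and
// falling back to a file with the same name in the config directory.
func lookup(key string) (string, error) {
	if v, ok := os.LookupEnv(key); ok {
		return v, nil
	}
	b, err := os.ReadFile(filepath.Join(configDir, key))
	if err != nil {
		return "", fmt.Errorf("missing configuration %s: %w", key, err)
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	for _, key := range []string{"ARGOCD_AUTH_TOKEN", "ARGOCD_SERVER_ADDR", "ARGOCD_INSECURE"} {
		if _, err := lookup(key); err != nil {
			fmt.Fprintln(os.Stderr, err) // a missing option means the controller fails to start
			os.Exit(1)
		}
	}
}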

Development

Local development with Kubebuilder

To get the controller running against the configured Kubernetes cluster in ~/.kube/config, run:

make install
make run

Please remember the ARGOCD_AUTH_TOKEN, ARGOCD_SERVER_ADDR and ARGOCD_INSECURE environment variables need to be present in order to run against a Kubernetes cluster with Argo CD. If the cluster was configured using the hack/setup-dev.sh script, these variables are part of the .env.local file.

Deploying to a Kubernetes cluster

To deploy the controller to a Kubernetes cluster, run:

make install
make docker-build
make deploy

In order to do so, the target cluster needs to have a secret named prc-config containing the three necessary variables: ARGOCD_AUTH_TOKEN, ARGOCD_SERVER_ADDR and ARGOCD_INSECURE. If using the dev environment in the following section, this secret has already been created.

If using kind clusters, docker images need to be loaded manually using kind load docker-image <image>:<version> --name <cluster-name>.

Configuration Secret

Here's a sample secret with the necessary configuration. Note that values under data must be base64-encoded, so the example below uses stringData to provide them in plain text:

apiVersion: v1
stringData:
  ARGOCD_AUTH_TOKEN: ey...
  ARGOCD_INSECURE: "true"
  ARGOCD_SERVER_ADDR: argocd-server
kind: Secret
metadata:
  name: prc-config
  namespace: argocd
type: Opaque

Setting up dev environment

To facilitate local debugging and testing against real clusters, you may run:

bash hack/install-dev-deps.sh
bash hack/setup-dev.sh [argocd-version] [appset-version]
make install
make deploy

This will install all the dependencies (pre-commit, kubebuilder, argocd, kind) and the correct version of the ArgoCD Application API package for you. If you omit argocd-version and/or appset-version, it will default to the latest stable/tested versions of ArgoCD and the ApplicationSet controller.

After running the script, you will have 3 kind clusters created locally:

  • kind-argocd-control-plane - the cluster hosting the argocd installation and the progressive sync operator. This cluster is also registered with Argo so that we can simulate using the same process for deploying to the control cluster as well.
  • kind-prc-cluster-1 and kind-prc-cluster-2 - the target clusters the apps are deployed to.

This gives us a total of 3 clusters, allowing us to play with multiple stages of deployment. The script will also log you in to the argocd CLI. You can find additional login details in the .env.local file that is generated for your convenience.

Regenerating your access

If your access to the local argocd breaks, you can regenerate it by running:

bash hack/login-argocd-local.sh

This will create a socat link in the kind docker network, allowing you to access the argocd server UI through your localhost. The exact port will be printed after the command has run. Running this command will also update the values in .env.local.

Registering additional clusters

If you want to create additional clusters, you can do so by running:

bash hack/add-cluster <cluster-name> <recreate> <labels>

This will spin up another kind cluster and register it against the ArgoCD running in kind-argocd-control-plane.

Deploying local test resources

You can deploy a test appset and a progressive sync object to your kind environment via:

bash hack/redeploy-dev-resources.sh

Feel free to extend the cluster generation section of the appset spec if you want to deploy it to clusters that you have created manually.

Debugging

make debug

Invoking the command above should spin up a Delve debugger server in headless mode. You can then use your IDE specific functionality or the delve client itself to attach to the remote process and debug it.

NOTE: On macOS, delve is currently unkillable in headless mode with ^C or any other control signal sent from the same terminal session. Instead, you'd need to run

bash ./hack/kill-debug.sh

or

make debug

from another terminal session to kill the debugger.

Debugging tests

Delve can be used to debug tests as well. See Test launch configuration in .vscode/launch.json. Something similar should be achievable in your IDE of choice as well.

Update ArgoCD Application API package

Because of argoproj/argo-cd#4055 we can't just run go get github.com/argoproj/argo-cd.

Use hack/install-argocd-application.sh to install the correct version of the Application API.

applicationset-progressive-sync's People

Contributors

aravindvnair99, dependabot[bot], dimitarhristov111, diogomcampos, maruina, nebojsa-prodana, riuvshyn, sledigabel, smirl


applicationset-progressive-sync's Issues

Support maxTargets and maxParallel with % values

The ProgressiveSync CRD is designed to support maxTargets and maxParallel as IntOrString.

Consider the following scenario: you have 4 clusters and a ProgressiveSync similar to

[...]
stages:
  - name: one cluster
     maxTargets: 1
     maxParallel: 1
     targets:
       clusters:
         selector:
           name: cluster-one
  - name: everything else
     maxTargets: 100%
     maxParallel: 25%
     targets:
       clusters:
         selector: {}

In the second stage, we would expect maxTargets to be 3, but instead we set it to 4. This is because the scheduler looks at every application matching the label selector - in this case all of them.

We need to change this logic so that when maxTargets and maxParallel are expressed as a %, they are scaled against the remaining clusters (see the sketch below).
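A minimal sketch of the intended behavior (the scaled helper is hypothetical, not the scheduler's code): percentages are resolved against the remaining clusters, so in the scenario above the second stage would get 3 targets rather than 4.

package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// scaled resolves a value that is either an integer ("3") or a percentage
// ("25%") against the number of clusters still to be synced.
func scaled(value string, remaining int) (int, error) {
	if strings.HasSuffix(value, "%") {
		p, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, err
		}
		return int(math.Ceil(float64(remaining) * float64(p) / 100)), nil
	}
	return strconv.Atoi(value)
}

func main() {
	// 4 clusters in total, 1 already synced by the "one cluster" stage.
	remaining := 3
	maxTargets, _ := scaled("100%", remaining) // 3, not 4
	maxParallel, _ := scaled("25%", remaining) // 1 (rounded up)
	fmt.Println(maxTargets, maxParallel)
}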

Add shuffleKey

Consider the following example:

[image: example stage spec]

With this stage, we will update cells-1-eu-west-1a-1 and cells-1-eu-west-1b-2 and no clusters in eu-central-1. A bad deployment would affect 2 of 3 clusters in the same region.

[image: the same stage with a shuffleKey]

The shuffleKey alternates the clusters, so a bad deployment would only affect 1 of 3 clusters in each region. A sketch of the idea follows.
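A minimal sketch of the idea (all names are assumptions, not the final design): group the clusters by the value of the shuffleKey label and interleave the groups, so each region contributes at most one cluster before any region contributes a second.

package main

import (
	"fmt"
	"sort"
)

type cluster struct {
	name   string
	labels map[string]string
}

// interleaveBy orders clusters so that consecutive picks come from different
// values of the shuffleKey label (e.g. "region").
func interleaveBy(clusters []cluster, shuffleKey string) []cluster {
	groups := map[string][]cluster{}
	var keys []string
	for _, c := range clusters {
		k := c.labels[shuffleKey]
		if _, ok := groups[k]; !ok {
			keys = append(keys, k)
		}
		groups[k] = append(groups[k], c)
	}
	sort.Strings(keys) // deterministic group order

	var out []cluster
	for len(out) < len(clusters) {
		for _, k := range keys {
			if len(groups[k]) > 0 {
				out = append(out, groups[k][0])
				groups[k] = groups[k][1:]
			}
		}
	}
	return out
}

func main() {
	cs := []cluster{
		{"cells-1-eu-west-1a-1", map[string]string{"region": "eu-west-1"}},
		{"cells-1-eu-west-1b-2", map[string]string{"region": "eu-west-1"}},
		{"cells-1-eu-central-1a-1", map[string]string{"region": "eu-central-1"}},
	}
	for _, c := range interleaveBy(cs, "region") {
		fmt.Println(c.name)
	}
}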

TODO
[ ] Naming is hard: is shuffleKey a good term? Is alternatebyLabel better?

Rename the controller before cutting the alpha release

There is an ongoing discussion about incubating this project under the official argoproj-labs organization as per argoproj/argoproj#25

One piece of feedback we got from the community is that the argocd-progressive-rollout name might be confusing for Argo users.

We should rename this project before cutting an alpha version:

Pass context in reconciliation methods instead of defining a new one

We have a few methods on our ProgressiveRolloutReconciler struct that need a context.

Instead of using the context defined inside the Reconcile method, we re-define it as a background context. This means we lose all the information about deadlines if we ever need to cancel the context for whatever reason.

We should instead propagate the context to the dependent methods, as in the sketch below.
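A minimal sketch of the intended change (the method bodies are hypothetical, not the repository's code): the ctx received by Reconcile is handed down to the helpers instead of being replaced by context.Background().

package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

type ProgressiveRolloutReconciler struct{}

func (r *ProgressiveRolloutReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Propagate ctx: deadlines and cancellation from the manager then apply
	// to the dependent calls as well.
	if err := r.syncApp(ctx, req.Name); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

// syncApp receives the caller's context instead of creating context.Background().
func (r *ProgressiveRolloutReconciler) syncApp(ctx context.Context, name string) error {
	_ = ctx // hand ctx to the ArgoCD client call here
	_ = name
	return nil
}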

Race condition during testing

Found this during CI from #78

https://github.com/Skyscanner/argocd-progressive-rollout/pull/78/checks?check_run_id=2563926011#step:6:115
The test [It] should forward events for owned applications is failing because (I think) the namespace is global but reset before each test, which could lead to a race condition where the namespace is changed during the reconcile loop and ends up being inconsistent.

The namespace is set here:
https://github.com/Skyscanner/argocd-progressive-rollout/blob/cb03eef8d637218632005b4266b5a4e47a37b55d/controllers/progressiverollout_controller_test.go#L48-L52

and the test is performed here:
https://github.com/Skyscanner/argocd-progressive-rollout/blob/cb03eef8d637218632005b4266b5a4e47a37b55d/controllers/progressiverollout_controller_test.go#L86-L108

Improve logging

In #114 we needed the Scheduler to log some information. To do that, we passed a logr.Logger to the scheduler from the reconciliation loop.

What we need to do:

  • Is that a good idea, or do we prefer to add the information we need to the context, pass the context around and create a new logger from it? We need to agree on the implementation.
  • We only use log.Info, while we should agree on at least 2 different log levels (INFO and DEBUG), their values, and start using log.V().Info to differentiate between the two (see the sketch below).
  • Add a new flag to enable or disable debug logging, or to set a log level.
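A minimal sketch of the two levels with logr (the level convention is an assumption): V(0) Info calls are always emitted, V(1) only when debug verbosity is enabled.

package controllers

import "github.com/go-logr/logr"

// logStage shows the proposed split between the INFO and DEBUG levels.
func logStage(log logr.Logger, stage string, apps []string) {
	log.Info("syncing stage", "stage", stage)         // INFO (level 0)
	log.V(1).Info("scheduler decision", "apps", apps) // DEBUG (level 1)
}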

Allow partial rollout

Users might write their stages in a way that forgets to update some of the clusters they have deployed into.
To allow users to deliberately do a partial rollout while protecting against mistakes, we should add a flag. The spec should look like

spec:
  ...
  allowPartialRollout: true
  ...

Default to false.

We should have a validating webhook that rejects a ProgressiveRollout object if the flag is not set.

Fix test structure

During #25 we agreed on writing the unit tests in the following format:

for _, testCase := range testCases {
	t.Run(testCase.name, func(t *testing.T) {
		got := Scheduler(testCase.apps, testCase.stage)
		g := NewWithT(t)
		g.Expect(got).To(Equal(testCase.expected))
	})
}

This allows running each test individually and gives one output per test (see #25 (comment)).

We need to make sure we use this style for every unit test.

Reset ProgressiveSync object

The ProgressiveSync object needs to be reset when there's a change in stages status. This applies to both stages status - which could be reset/cleaned - and to conditions - which should be set to reflect the new overall status of the object.

An example is a completed progressive sync (with condition Complete set to True) that starts synchronizing again and sets a stage to be Progressing. The latest event's Complete condition should not be True at this point.

Fix data race condition

When running the tests, we sometimes hit a data race condition:

[1620309206] Controller Suite - 8/8 specs ••••••==================
WARNING: DATA RACE
Write at 0x00c000b2c968 by goroutine 50:
  github.com/Skyscanner/argocd-progressive-rollout/controllers.glob..func1.1()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller_test.go:52 +0x450
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xfc
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).run()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0x184
  github.com/onsi/ginkgo/internal/leafnodes.(*SetupNode).Run()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/setup_nodes.go:15 +0xb9
  github.com/onsi/ginkgo/internal/spec.(*Spec).runSample()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:193 +0x2f2
  github.com/onsi/ginkgo/internal/spec.(*Spec).Run()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x187
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x17b
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x235
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x145
  github.com/onsi/ginkgo/internal/suite.(*Suite).Run()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:79 +0x839
  github.com/onsi/ginkgo.RunSpecsWithCustomReporters()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:229 +0x357
  github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters()
      /Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:217 +0x125
  github.com/Skyscanner/argocd-progressive-rollout/controllers.TestController()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/suite_test.go:51 +0x148
  testing.tRunner()
      /usr/local/opt/go/libexec/src/testing/testing.go:1193 +0x202

Previous read at 0x00c000b2c968 by goroutine 67:
  github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).syncApp()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:328 +0x175
  github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:104 +0x76c
  fmt.(*pp).printValue()
      /usr/local/opt/go/libexec/src/fmt/print.go:806 +0x28fa
  fmt.(*pp).printValue()
      /usr/local/opt/go/libexec/src/fmt/print.go:865 +0xf37
  fmt.(*pp).printArg()
      /usr/local/opt/go/libexec/src/fmt/print.go:712 +0x284
  fmt.(*pp).doPrintf()
      /usr/local/opt/go/libexec/src/fmt/print.go:1026 +0x330
  fmt.Sprintf()
      /usr/local/opt/go/libexec/src/fmt/print.go:219 +0x73
  github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:89 +0xed8
  github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:81 +0xbbc
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:293 +0x42a
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:248 +0x368
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x64
  fmt.(*pp).doPrintf()
      /usr/local/opt/go/libexec/src/fmt/print.go:1026 +0x330
  fmt.Sprintf()
      /usr/local/opt/go/libexec/src/fmt/print.go:219 +0x73
  github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:89 +0xed8
  github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
      /Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:81 +0xbbc
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:293 +0x42a
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:248 +0x368
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x64
  k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
      /Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0x4e
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x75
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xba
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x114
  k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext()
      /Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0xb3
  k8s.io/apimachinery/pkg/util/wait.UntilWithContext()
      /Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99 +0x64

Goroutine 50 (running) created at:
  testing.(*T).Run()
      /usr/local/opt/go/libexec/src/testing/testing.go:1238 +0x5d7
  testing.runTests.func1()
      /usr/local/opt/go/libexec/src/testing/testing.go:1511 +0xa6
  testing.tRunner()
      /usr/local/opt/go/libexec/src/testing/testing.go:1193 +0x202
  testing.runTests()
      /usr/local/opt/go/libexec/src/testing/testing.go:1509 +0x612
  testing.(*M).Run()
      /usr/local/opt/go/libexec/src/testing/testing.go:1417 +0x3b3
  main.main()
      _testmain.go:95 +0x356

Goroutine 67 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:208 +0x6e4
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218 +0x264
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1()
      /Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:659 +0xe3
==================
••
 SUCCESS! 23.129771481s --- FAIL: TestController (23.15s)
    testing.go:1092: race detected during execution of test
FAIL
coverage: 73.8% of statements

Ginkgo ran 1 suite in 1m4.201436219s
Test Suite Failed
make: *** [test] Error 1

Add clusters status in stage status

In #8 we introduced the concept of status.stages.

To complete the information about the status of each stage, we should extend status.stages with information about the cluster status.

It should look like

status:
  stages:
    - name: canary in emea
      phase: Completed
      message: Stage completed
      targets: 
        - cluster-1-eu-west-1a-1
        - cluster-2-eu-central-1b-1
      syncing: 
        - cluster-1-eu-west-1a-1
      requeued: []
      failed: []
      completed:
        - cluster-2-eu-central-1b-1
      startedAt: 2019-07-10T08:23:18Z
      finishedAt: 2019-07-10T08:40:18Z

Add scheduling logic

Add the logic that, given the target clusters, returns the list of Applications we should sync.

Add basic controller structure

Create the basic structure of the controller with a basic CRD. A basic CRD looks like

apiVersion: deployment.skyscanner.net/v1alpha1
kind: ProgressiveRollout
metadata:
  name: myservice
  namespace: argocd
spec:
  sourceRef:
    apiGroup: argoproj.io/v1alpha1
    kind: ApplicationSet
    name: file-cache-proxy
  stages:
    - name: one cell as canary in eu-west-1
      maxParallel: 1 # can be omitted as default is 1
      maxTargets: 1
      targets:
        clusters:
          selector:
            matchLabels:
              region: eu-west-1

The cluster selection logic should be on a different PR.
The CRD status looks like

status:
  conditions:
    - lastTransitionTime: "2019-07-10T08:23:18Z"
      lastUpdateTime: "2019-07-10T08:23:18Z"
      message: Progressive rollout completed successfully, rollout finished.
      reason: Succeeded
      status: "True"
      type: RolledOut

When creating a ProgressiveRollout object, the controller should reconcile and just succeed.

TODO

[ ] Tests
[ ] Update README with CRD specs

Fix k8sClient in test

According to the latest kubebuilder book about writing test

Note that we set up both a “live” k8s client, separate from the manager. This is because when making assertions in tests, you generally want to assert against the live state of the API server. If you used the client from the manager (k8sManager.GetClient), you’d end up asserting against the contents of the cache instead, which is slower and can introduce flakiness into your tests. We could use the manager’s APIReader to accomplish the same thing, but that would leave us with two clients in our test assertions and setup (one for reading, one for writing), and it’d be easy to make mistakes.

We should change https://github.com/Skyscanner/argocd-progressive-rollout/blob/main/controllers/suite_test.go#L100 to use a different client.
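A minimal sketch of the change, following the kubebuilder book (the helper name is an assumption): build a client directly from the envtest rest.Config instead of reusing the manager's cached client.

package controllers

import (
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newLiveClient returns a client that talks to the API server directly,
// bypassing the manager's cache, so test assertions see the live state.
// cfg is the rest.Config produced by envtest; remember to register the
// ProgressiveRollout types on the scheme before using it in real tests.
func newLiveClient(cfg *rest.Config) (client.Client, error) {
	return client.New(cfg, client.Options{Scheme: scheme.Scheme})
}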

Discuss pointer vs value semantics, agree and refactor

Investigate flaky `[It] should forward an event for a matching argocd secret `

Sometimes this test fails as in https://github.com/Skyscanner/argocd-progressive-rollout/runs/2366453478?check_suite_focus=true

ProgressiveRollout Controller requestsForSecretChange function [It] should forward an event for a matching argocd secret 
/home/runner/work/argocd-progressive-rollout/argocd-progressive-rollout/controllers/progressiverollout_controller_test.go:146

  Expected
      <int>: 0
  to equal
      <int>: 1

  /home/runner/work/argocd-progressive-rollout/argocd-progressive-rollout/controllers/progressiverollout_controller_test.go:175

[...]

We should investigate and either fix the test or document why this flakiness is OK and expected.

This issue has been reported by @nebojsa-prodana as well.

Add sync Application logic

Add the logic to tell ArgoCD to sync a target Application. We should document on which approach we are taking before implementing it.

Options:

Inconsistent object declaration

We are not consistent when we declare a new object.

Sometimes we do something like

pr := deploymentskyscannernetv1alpha1.ProgressiveRollout{}

and sometimes we do

pr := &deploymentskyscannernetv1alpha1.ProgressiveRollout{}

We agreed on using the first format because then it's very explicit when you need to pass a reference to it. For example

if err := r.Get(ctx, req.NamespacedName, &pr); err != nil {...}

is clearer than

if err := r.Get(ctx, req.NamespacedName, pr); err != nil {...}

Add targets selection logic

Add functions to retrieve the targets selection. We should be able to test it in isolation, outside the reconciliation loop.

Unable to start controller

The latest version doesn't start because of the following error

argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout 2021-04-12T10:06:16.393Z	INFO	controller-runtime.metricsmetrics server is starting to listen	{"addr": ":8080"}
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout 2021-04-12T10:06:16.393Z	ERROR	setup	unable to read configuration	{"error": "strconv.ParseBool: parsing \"\\\"true\\\"\": invalid syntax"}
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout github.com/go-logr/zapr.(*zapLogger).Error
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout 	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout main.main
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout 	/workspace/main.go:74
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout runtime.main
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout 	/usr/local/go/src/runtime/proc.go:204
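The root cause is visible in the error message: the configured value is the literal string "true" including the quotes, which strconv.ParseBool rejects. A small illustration:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	_, err := strconv.ParseBool(`"true"`) // what the controller received: invalid syntax
	fmt.Println(err)

	ok, _ := strconv.ParseBool("true") // what the controller expects
	fmt.Println(ok)
}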

Add stage status

We should provide users with feedback not only about the ProgressiveRollout CRD itself, but also about each of its stages.

A status.stages should look like

status:
  stages:
    - name: canary in emea
      phase: Completed
      message: Stage completed
      startedAt: 2019-07-10T08:23:18Z
      finishedAt: 2019-07-10T08:40:18Z
    - [...]

Add the ability to cancel a rollout

We need a way to signal to the APS that a progressive rollout is cancelled by a user.

That includes a new final state (Cancelled) and some way of notifying the APS object of the cancellation.

Add finalizer

Finalizers allow controllers to implement asynchronous pre-delete hooks. When an object with a finalizer is deleted, Kubernetes sets a deletion timestamp on it instead of removing it immediately, and the presence of that timestamp indicates the object is being deleted. Without finalizers, a delete simply shows up as a reconcile where the object is missing from the cache.

See https://book.kubebuilder.io/reference/using-finalizers.html
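A minimal sketch of the pattern (the helper and the finalizer string are hypothetical, not this controller's code), using controller-runtime's controllerutil helpers:

package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const prFinalizer = "progressiverollout.argoproj.skyscanner.net/finalizer"

// ensureFinalizer adds the finalizer while the object is live, and runs
// cleanup then removes the finalizer once the deletion timestamp is set.
func ensureFinalizer(ctx context.Context, c client.Client, obj client.Object, cleanup func() error) (ctrl.Result, error) {
	if obj.GetDeletionTimestamp().IsZero() {
		if !controllerutil.ContainsFinalizer(obj, prFinalizer) {
			controllerutil.AddFinalizer(obj, prFinalizer)
			return ctrl.Result{}, c.Update(ctx, obj)
		}
		return ctrl.Result{}, nil
	}
	// The object is being deleted.
	if controllerutil.ContainsFinalizer(obj, prFinalizer) {
		if err := cleanup(); err != nil {
			return ctrl.Result{}, err // retried on the next reconcile
		}
		controllerutil.RemoveFinalizer(obj, prFinalizer)
		return ctrl.Result{}, c.Update(ctx, obj)
	}
	return ctrl.Result{}, nil
}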

Add Application and Secret watch

A change to ArgoCD Applications or Clusters (Kubernetes Secrets) needs to trigger a reconciliation loop, but only for the resources owned by the ApplicationSet referenced in the ProgressiveRollout CRD.

TODO

[ ] Add sourceRef in the ProgressiveRollout CRD
[ ] Add watches for Applications and Secrets
[ ] Filter only owned Applications and Clusters
[ ] Add additional rbac rules

Add ProgressiveSync failed status

When a progressive sync is completed, the controller sets the CRD status to Completed.

In the case where a stage fails, the controller doesn't have the logic to set the progressive sync to Failed. We already have a scheduler function to determine if the stage has failed; we need to update the ProgressiveSync object accordingly.

CodeQL fails on dependabot PRs

See for example https://github.com/Skyscanner/argocd-progressive-rollout/pull/78/checks?check_run_id=2563925698

 Error: Workflows triggered by Dependabot on the "push" event run with read-only access. Uploading Code Scanning results requires write access. To use Code Scanning with Dependabot, please ensure you are using the "pull_request" event for this workflow and avoid triggering on the "push" event for Dependabot branches. See https://docs.github.com/en/code-security/secure-coding/configuring-code-scanning#scanning-on-push for more information on how to configure these events.

Scope the controller to a single namespace

We coded the controller under the assumption that it is running in a single namespace, but we've seen instances where multiple Applications or ProgressiveSync objects in different namespaces interfere with each other (#103).
We also want to run multiple controller versions in the same cluster under different namespaces so we can test a new version without impacting every customer.

However, we didn't do the proper setup as per https://book.kubebuilder.io/cronjob-tutorial/empty-main.html#every-journey-needs-a-start-every-program-a-main. In particular, we're not passing the required Namespace to the manager, and we might also want to watch a specific set of namespaces only.

We need two CLI flags to specify two different namespaces:

  • the namespace where the controller is running (defaults to the ArgoCD default namespace, argocd)
  • the namespace where ArgoCD is running, since we need to watch its Applications and Secrets

When we make this change, we also need to update the Helm chart: if the controller is namespace scoped, we don't need a ClusterRole and a ClusterRoleBinding, just a Role and a RoleBinding.
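A minimal sketch of the wiring (flag names are assumptions, and this assumes a controller-runtime version where manager.Options still exposes a Namespace field):

package main

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	var controllerNamespace, argocdNamespace string
	flag.StringVar(&controllerNamespace, "namespace", "argocd", "namespace the controller watches for ProgressiveSync objects")
	flag.StringVar(&argocdNamespace, "argocd-namespace", "argocd", "namespace where ArgoCD Applications and cluster Secrets live")
	flag.Parse()

	// Restrict the manager's cache to a single namespace.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Namespace: controllerNamespace,
	})
	if err != nil {
		panic(err)
	}
	_ = mgr
	_ = argocdNamespace // used when setting up the Application and Secret watches
}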

Add well-known sorting when selecting targets

The target selection currently returns the list in whatever order we get it from the controller-runtime client. We should have a predictable order because it is:

  • easier to test
  • easier to debug
  • if we release bad code, we make sure we always update the same clusters first, reducing MTBF

Note that we don't want the same order for every ProgressiveRollout CRD, as we would end up having the same "canary" cluster for every service. Using the PR CRD name as a seed might be a good idea; see the sketch below.
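A minimal sketch of a predictable but per-CRD ordering (the function name is an assumption): sort the cluster names, then shuffle them deterministically using the ProgressiveRollout name as the seed, so each CRD gets a stable order without every service sharing the same canary cluster.

package scheduler

import (
	"hash/fnv"
	"math/rand"
	"sort"
)

// SortedTargets returns the clusters in a stable, per-ProgressiveRollout order.
func SortedTargets(prName string, clusters []string) []string {
	out := append([]string(nil), clusters...)
	sort.Strings(out)

	// Seed a deterministic shuffle with a hash of the CRD name.
	h := fnv.New64a()
	h.Write([]byte(prName))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	r.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}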

Infinite loop when multiple stage targeting the same clusters

We got an infinite loop with the following ProgressiveSync

spec:
[...]
  stages:
  - maxParallel: 1
    maxTargets: 1
    name: one cluster as canary in eu-west-1
    targets:
      clusters:
        selector:
          matchLabels:
            region: eu-west-1
  - maxParallel: 3
    maxTargets: 3
    name: one cluster as canary in every other region
    targets:
      clusters:
        selector:
          matchExpression:
          - key: region
            operator: NotIn
            values:
            - eu-west-1
  - maxParallel: 25%
    maxTargets: 100%
    name: rollout to remaining clusters
    targets:
      clusters:
        selector: {}

After some investigation, we found the following issues:

Add pause

Users might want to pause a rollout in a break-glass scenario.
The spec looks like

spec:
  ...
  paused: true
  ...

In this case, the desired behavior is to stop reconciling the object until the spec is changed back (or paused is set to false), and to reflect this in the status.
By default, paused should be false.

Remove predicate.GenerationChangedPredicate{}

In #96 we added a predicate to the For clause responsible for sending ProgressiveSync events.
While that helped us move forward, it means the controller is not able to auto-heal if something changes the CRD Status.
We should remove the predicate and make the reconciliation loop idempotent, so that it converges over time to the desired state.
