skyscanner / applicationset-progressive-sync
Progressive sync controller for Argo ApplicationSet
License: Apache License 2.0
In #114 we needed the Scheduler to log some information. To do that, we passed a logr.Logger to the scheduler from the reconciliation loop.
What we need to do: we currently only use log.Info, while we should agree on at least 2 different log levels (INFO and DEBUG), define their values, and start using log.V().Info to differentiate between the two levels.
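A minimal sketch of what the two-level convention could look like with logr (the choice of V(1) for DEBUG is an assumption, not something the issue fixes, and scheduleStage is a hypothetical helper):

import (
	"github.com/go-logr/logr"
)

// Assumed convention: V(0) = INFO (the logr default), V(1) = DEBUG.
// scheduleStage only illustrates how the two levels would sit side by
// side in the Scheduler.
func scheduleStage(log logr.Logger, stage string) {
	log.Info("scheduling stage", "stage", stage)        // INFO
	log.V(1).Info("computed target list", "targets", 3) // DEBUG
}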
We got an infinite loop with the following ProgressiveSync:
spec:
  [...]
  stages:
    - name: one cluster as canary in eu-west-1
      maxParallel: 1
      maxTargets: 1
      targets:
        clusters:
          selector:
            matchLabels:
              region: eu-west-1
    - name: one cluster as canary in every other region
      maxParallel: 3
      maxTargets: 3
      targets:
        clusters:
          selector:
            matchExpressions:
              - key: region
                operator: NotIn
                values:
                  - eu-west-1
    - name: rollout to remaining clusters
      maxParallel: 25%
      maxTargets: 100%
      targets:
        clusters:
          selector: {}
After some investigation, we found the following issues with the StartedAt and FinishedAt timestamps.

The ProgressiveSync object needs to be reset when there's a change in the stages status. This applies both to the stages status - which could be reset/cleaned - and to the conditions - which should be set to reflect the new overall status of the object.
An example is a completed progressive sync (with the Complete condition set to True) that starts synchronizing again and sets a stage to Progressing. The latest event's Complete condition should not be True at this point.
We should provide users with feedback not only about the ProgressiveRollout CRD itself, but also about each of its stages.
A status.stages should look like
status:
  stages:
    - name: canary in emea
      phase: Completed
      message: Stage completed
      startedAt: 2019-07-10T08:23:18Z
      finishedAt: 2019-07-10T08:40:18Z
    - [...]
Create the basic structure of the controller with a basic CRD. A basic CRD looks like
apiVersion: deployment.skyscanner.net/v1alpha1
kind: ProgressiveRollout
metadata:
  name: myservice
  namespace: argocd
spec:
  sourceRef:
    apiGroup: argoproj.io/v1alpha1
    kind: ApplicationSet
    name: file-cache-proxy
  stages:
    - name: one cell as canary in eu-west-1
      maxParallel: 1 # can be omitted as default is 1
      maxTargets: 1
      targets:
        clusters:
          selector:
            matchLabels:
              region: eu-west-1
The cluster selection logic should be in a different PR.
The CRD status looks like
status:
  conditions:
    - lastTransitionTime: "2019-07-10T08:23:18Z"
      lastUpdateTime: "2019-07-10T08:23:18Z"
      message: Progressive rollout completed successfully, rollout finished.
      reason: Succeeded
      status: "True"
      type: RolledOut
When creating a ProgressiveRollout object, the controller should reconcile and just succeed.
[ ] Tests
[ ] Update README with CRD specs

Implement the requeue specification:
[ ] Annotate the ProgressiveRollout CRD to protect against controller restarts
[ ] See https://book.kubebuilder.io/cronjob-tutorial/controller-implementation.html for some hints around adding annotations, timeouts and handling time.
Add the logic that, given the target clusters, returns the list of applications we should sync.
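A possible shape for that function - a sketch only, assuming the clusters come in as ArgoCD cluster secrets and that we match on the destination server URL (the helper name and matching rule are assumptions):

import (
	argov1alpha1 "github.com/argoproj/argo-cd/pkg/apis/application/v1alpha1"
	corev1 "k8s.io/api/core/v1"
)

// getAppsFromClusters returns the Applications targeting one of the
// selected clusters.
func getAppsFromClusters(clusters []corev1.Secret, apps []argov1alpha1.Application) []argov1alpha1.Application {
	servers := make(map[string]bool, len(clusters))
	for _, c := range clusters {
		// ArgoCD cluster secrets store the API server URL in data["server"].
		servers[string(c.Data["server"])] = true
	}
	var result []argov1alpha1.Application
	for _, app := range apps {
		if servers[app.Spec.Destination.Server] {
			result = append(result, app)
		}
	}
	return result
}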
Related to #51, as the manager pod transitions from ready to crashing. I suspect the ct test gets confused by the brief transition to ready.
controller-runtime supports readiness and liveness probes as per https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/manager.go#L376. There is a discussion with some suggestions about how to implement them here: https://kubernetes.slack.com/archives/CAR30FCJZ/p1595422003154000
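A minimal sketch of wiring the probes into the manager, following the standard kubebuilder scaffolding (the endpoint names are the conventional ones, not something already in this repo):

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/healthz"
)

func setupProbes(mgr ctrl.Manager) error {
	// /healthz: liveness - the process is up and responding.
	if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
		return err
	}
	// /readyz: readiness - the manager is ready to serve traffic.
	return mgr.AddReadyzCheck("readyz", healthz.Ping)
}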
Consider the following example.
With this stage, we will update cells-1-eu-west-1a-1 and cells-1-eu-west-1b-2 and no clusters in eu-central-1. A bad deployment will affect 2 of 3 clusters in the same region.
The shuffleKey alternates the clusters, so a bad deployment will only affect 1 of 3 clusters in each region.
TODO
[ ] Naming is hard: is shuffleKey a good term? Is alternatebyLabel better?
We use deprecated Ginkgo functionality. The test output is at https://github.com/Skyscanner/argocd-progressive-rollout/pull/53/checks?check_run_id=2331406128#step:7:246
There is a recommended migration path: https://github.com/onsi/ginkgo/blob/v2/docs/MIGRATING_TO_V2.md#removed-async-testing
The ProgressiveSync CRD is designed to support maxTargets and maxParallel as IntOrString.
Consider the following scenario, where you have 4 clusters and a ProgressiveSync similar to

[...]
stages:
  - name: one cluster
    maxTargets: 1
    maxParallel: 1
    targets:
      clusters:
        selector:
          name: cluster-one
  - name: everything else
    maxTargets: 100%
    maxParallel: 25%
    targets:
      clusters:
        selector: {}
In the second stage, we would expect maxTargets to be 3, but instead we set it to 4. This is because the scheduler looks at every application matching the label selector - in this case, all of them.
We need to change this logic so that when we express maxTargets and maxParallel as a percentage, they are scaled against the remaining clusters.
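A sketch of the proposed fix using the apimachinery intstr helpers; remainingClusters is a hypothetical count of clusters left over after the earlier stages ran:

import (
	"k8s.io/apimachinery/pkg/util/intstr"
)

// scaledMaxTargets resolves maxTargets against the remaining clusters
// instead of every cluster matching the selector. With 4 clusters and 1
// already synced in stage one, maxTargets: 100% now yields 3 instead of 4.
func scaledMaxTargets(maxTargets intstr.IntOrString, remainingClusters int) (int, error) {
	return intstr.GetValueFromIntOrPercent(&maxTargets, remainingClusters, true)
}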
The operator has a mix of pointer semantics and value semantics.
More than the decision about which one, we need to be consistent and not flip-flop between the two.
https://github.com/Skyscanner/argocd-progressive-rollout/pull/25/files#diff-b1a5cf22f2be07477047c7c268c9a2077f75c3fdadc285eb862dab8bc3aaad31R293
https://github.com/Skyscanner/argocd-progressive-rollout/pull/25/files#diff-c3753ff27e79b7da6571efc4fffc45e729b031a39337eec9b22a04425bbd69f0R12
https://tour.golang.org/methods/8
https://www.ardanlabs.com/blog/2017/06/design-philosophy-on-data-and-semantics.html
The latest version doesn't start because of the following error:
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout 2021-04-12T10:06:16.393Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"}
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout 2021-04-12T10:06:16.393Z ERROR setup unable to read configuration {"error": "strconv.ParseBool: parsing \"\\\"true\\\"\": invalid syntax"}
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout github.com/go-logr/zapr.(*zapLogger).Error
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout main.main
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout /workspace/main.go:74
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout runtime.main
argocd-progressive-rollout-5dc46f5d86-x5qb8 argocd-progressive-rollout /usr/local/go/src/runtime/proc.go:204
To make sure the operator doesn't get stuck in an infinite loop, we need a safeguard mechanism. We want this to be eventually consistent, because there might be a delay in ArgoCD (it might be down, or unable to sync for a time window).
We can use the progressDeadlineSeconds concept as seen in https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#deploymentspec-v1-apps and https://docs.flagger.app/usage/how-it-works
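A sketch of how the field could look on the spec; the field name is borrowed from the Deployment API, and this is not the actual CRD definition:

// Hypothetical addition to the spec type.
type ProgressiveSyncSpec struct {
	// ...existing fields...

	// ProgressDeadlineSeconds is the maximum time in seconds for a stage to
	// make progress before the sync is considered to be failed.
	// +optional
	ProgressDeadlineSeconds *int32 `json:"progressDeadlineSeconds,omitempty"`
}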
Users might write their stages in a way that forgets to update some of the clusters they have deployed to.
To allow users to deliberately do a partial rollout while protecting against errors, we should add a flag. The spec should look like

spec:
  ...
  allowPartialRollout: true
  ...

Default to false.
We should have a validating webhook that rejects a partial ProgressiveRollout object if the flag is not set.
Add a validating webhook to make sure the CRD is valid before creating the objects. See https://book.kubebuilder.io/cronjob-tutorial/webhook-implementation.html for implementation details.
Checks (a sketch of the first one follows this list):
- maxParallel can't be 0 if maxTargets is > 0
- maxParallel and maxTargets need to be valid IntOrString values
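A sketch of the first check in a kubebuilder ValidateCreate hook; the Spec/Stages field names are assumptions based on the specs shown in this document, and percentage values would need the intstr percent helpers:

import "fmt"

func (ps *ProgressiveSync) ValidateCreate() error {
	for _, stage := range ps.Spec.Stages {
		// IntValue() resolves plain integers; percentages are skipped here.
		if stage.MaxTargets.IntValue() > 0 && stage.MaxParallel.IntValue() == 0 {
			return fmt.Errorf("stage %q: maxParallel can't be 0 if maxTargets is > 0", stage.Name)
		}
	}
	return nil
}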
As per #44 (comment)
We need defaults for (but possibly not limited to):
Finalizers allow controllers to implement asynchronous pre-delete hooks. They also set a deletion timestamp on the object; its presence indicates that the object is being deleted. Without finalizers, a delete instead shows up as a reconcile where the object is missing from the cache.
See https://book.kubebuilder.io/reference/using-finalizers.html
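A condensed sketch of the pattern from that page, using the controllerutil helpers; the finalizer name is an assumption:

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const finalizerName = "deployment.skyscanner.net/finalizer" // assumed name

func reconcileFinalizer(ctx context.Context, c client.Client, obj client.Object) error {
	if obj.GetDeletionTimestamp().IsZero() {
		// Not being deleted: make sure our finalizer is present.
		if !controllerutil.ContainsFinalizer(obj, finalizerName) {
			controllerutil.AddFinalizer(obj, finalizerName)
			return c.Update(ctx, obj)
		}
		return nil
	}
	// Being deleted: run the pre-delete hook, then release the object.
	if controllerutil.ContainsFinalizer(obj, finalizerName) {
		// ...cleanup of external resources would go here...
		controllerutil.RemoveFinalizer(obj, finalizerName)
		return c.Update(ctx, obj)
	}
	return nil
}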
When running the tests, we sometimes hit a data race:
[1620309206] Controller Suite - 8/8 specs ••••••==================
WARNING: DATA RACE
Write at 0x00c000b2c968 by goroutine 50:
github.com/Skyscanner/argocd-progressive-rollout/controllers.glob..func1.1()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller_test.go:52 +0x450
github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xfc
github.com/onsi/ginkgo/internal/leafnodes.(*runner).run()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0x184
github.com/onsi/ginkgo/internal/leafnodes.(*SetupNode).Run()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/setup_nodes.go:15 +0xb9
github.com/onsi/ginkgo/internal/spec.(*Spec).runSample()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:193 +0x2f2
github.com/onsi/ginkgo/internal/spec.(*Spec).Run()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x187
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x17b
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x235
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x145
github.com/onsi/ginkgo/internal/suite.(*Suite).Run()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:79 +0x839
github.com/onsi/ginkgo.RunSpecsWithCustomReporters()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:229 +0x357
github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters()
/Users/ruio/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:217 +0x125
github.com/Skyscanner/argocd-progressive-rollout/controllers.TestController()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/suite_test.go:51 +0x148
testing.tRunner()
/usr/local/opt/go/libexec/src/testing/testing.go:1193 +0x202
Previous read at 0x00c000b2c968 by goroutine 67:
github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).syncApp()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:328 +0x175
github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:104 +0x76c
fmt.(*pp).printValue()
/usr/local/opt/go/libexec/src/fmt/print.go:806 +0x28fa
fmt.(*pp).printValue()
/usr/local/opt/go/libexec/src/fmt/print.go:865 +0xf37
fmt.(*pp).printArg()
/usr/local/opt/go/libexec/src/fmt/print.go:712 +0x284
fmt.(*pp).doPrintf()
/usr/local/opt/go/libexec/src/fmt/print.go:1026 +0x330
fmt.Sprintf()
/usr/local/opt/go/libexec/src/fmt/print.go:219 +0x73
github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:89 +0xed8
github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:81 +0xbbc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:293 +0x42a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:248 +0x368
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x64
fmt.(*pp).doPrintf()
/usr/local/opt/go/libexec/src/fmt/print.go:1026 +0x330
fmt.Sprintf()
/usr/local/opt/go/libexec/src/fmt/print.go:219 +0x73
github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:89 +0xed8
github.com/Skyscanner/argocd-progressive-rollout/controllers.(*ProgressiveRolloutReconciler).Reconcile()
/Users/ruio/go/src/github.com/skyscanner/argocd-progressive-rollout/controllers/progressiverollout_controller.go:81 +0xbbc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:293 +0x42a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:248 +0x368
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x64
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
/Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0x4e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
/Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x75
k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
/Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xba
k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x114
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext()
/Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185 +0xb3
k8s.io/apimachinery/pkg/util/wait.UntilWithContext()
/Users/ruio/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99 +0x64
Goroutine 50 (running) created at:
testing.(*T).Run()
/usr/local/opt/go/libexec/src/testing/testing.go:1238 +0x5d7
testing.runTests.func1()
/usr/local/opt/go/libexec/src/testing/testing.go:1511 +0xa6
testing.tRunner()
/usr/local/opt/go/libexec/src/testing/testing.go:1193 +0x202
testing.runTests()
/usr/local/opt/go/libexec/src/testing/testing.go:1509 +0x612
testing.(*M).Run()
/usr/local/opt/go/libexec/src/testing/testing.go:1417 +0x3b3
main.main()
_testmain.go:95 +0x356
Goroutine 67 (running) created at:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:208 +0x6e4
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218 +0x264
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1()
/Users/ruio/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:659 +0xe3
==================
••
SUCCESS! 23.129771481s --- FAIL: TestController (23.15s)
testing.go:1092: race detected during execution of test
FAIL
coverage: 73.8% of statements
Ginkgo ran 1 suite in 1m4.201436219s
Test Suite Failed
make: *** [test] Error 1
According to the latest kubebuilder book about writing tests:
Note that we set up both a “live” k8s client, separate from the manager. This is because when making assertions in tests, you generally want to assert against the live state of the API server. If you used the client from the manager (k8sManager.GetClient), you’d end up asserting against the contents of the cache instead, which is slower and can introduce flakiness into your tests. We could use the manager’s APIReader to accomplish the same thing, but that would leave us with two clients in our test assertions and setup (one for reading, one for writing), and it’d be easy to make mistakes.
We should change https://github.com/Skyscanner/argocd-progressive-rollout/blob/main/controllers/suite_test.go#L100 to use a different client.
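The change would look roughly like the kubebuilder book's own suite setup, building a "live" client straight from the rest config instead of from the manager (variable names follow the book's scaffolding):

// In suite_test.go, after envtest has started and the scheme is registered:
k8sClient, err = client.New(cfg, client.Options{Scheme: scheme.Scheme})
Expect(err).ToNot(HaveOccurred())
Expect(k8sClient).ToNot(BeNil())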
An ArgoCD Application can specify a destination server using server or name, as per https://github.com/argoproj/argo-cd/blob/master/pkg/apis/application/v1alpha1/types.go#L524
At the moment we only look at server; we need to support both.
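A sketch of the matching logic; the ApplicationDestination fields are the ones from the types linked above, while the helper itself is hypothetical:

// matchesCluster reports whether an Application targets the given cluster,
// whether the destination is expressed as a server URL or a cluster name.
func matchesCluster(app argov1alpha1.Application, clusterServer, clusterName string) bool {
	dest := app.Spec.Destination
	if dest.Server != "" {
		return dest.Server == clusterServer
	}
	return dest.Name == clusterName
}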
We are not consistent when we declare a new object.
Sometimes we do something like
pr := deploymentskyscannernetv1alpha1.ProgressiveRollout{}
and sometimes we do
pr := &deploymentskyscannernetv1alpha1.ProgressiveRollout{}
We agreed on using the first format because it is then very explicit when you need to pass a reference to it. For example
if err := r.Get(ctx, req.NamespacedName, &pr); err != nil {...}
is clearer than
if err := r.Get(ctx, req.NamespacedName, pr); err != nil {...}
Add functions to retrieve the target selection. We should be able to test them in isolation, outside the reconciliation loop.
Users might want to pause a rollout in a break-glass scenario.
The spec looks like

spec:
  ...
  paused: true
  ...

In this case, the desired behavior is to stop reconciling this object until the spec is changed back (or paused: false is set), and to reflect this in the status.
By default, paused should be false.
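A sketch of the early return in the reconciliation loop; the Paused field and the status handling are assumptions:

// At the top of Reconcile, after fetching the object:
if ps.Spec.Paused {
	log.Info("reconciliation paused", "progressivesync", ps.Name)
	// ...set a Paused condition in the status here...
	return ctrl.Result{}, nil
}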
During #25 we agreed on writing the unit tests in the following format
for _, testCase := range testCases {
t.Run(testCase.name, func(t *testing.T) {
got := Scheduler(testCase.apps, testCase.stage)
g := NewWithT(t)
g.Expect(got).To(Equal(testCase.expected))
})
}
This allows running each test individually, and we get one output per test (see #25 (comment)).
We need to make sure we use this style for every unit test.
A change to ArgoCD Applications or Clusters (Kubernetes secrets) needs to trigger a reconciliation loop, but only for the resources owned by the ApplicationSet defined in the ProgressiveRollout CRD (see the sketch after this list).
[ ] Add sourceRef in the ProgressiveRollout CRD
[ ] Add watches for Applications and Secrets
[ ] Filter only owned Applications and Clusters
[ ] Add additional RBAC rules
See for example https://github.com/Skyscanner/argocd-progressive-rollout/pull/78/checks?check_run_id=2563925698
Error: Workflows triggered by Dependabot on the "push" event run with read-only access. Uploading Code Scanning results requires write access. To use Code Scanning with Dependabot, please ensure you are using the "pull_request" event for this workflow and avoid triggering on the "push" event for Dependabot branches. See https://docs.github.com/en/code-security/secure-coding/configuring-code-scanning#scanning-on-push for more information on how to configure these events.
The current Secret.yaml makes the data field mandatory. This breaks the following workflows:
We should remove the required clause from the chart.
Found this during CI from #78
https://github.com/Skyscanner/argocd-progressive-rollout/pull/78/checks?check_run_id=2563926011#step:6:115
The test [It] should forward events for owned applications is failing because (I think) the namespace is global but reset before each test, which could lead to a race condition where the namespace is changed during the reconcile loop and ends up being inconsistent.
The namespace is set here:
https://github.com/Skyscanner/argocd-progressive-rollout/blob/cb03eef8d637218632005b4266b5a4e47a37b55d/controllers/progressiverollout_controller_test.go#L48-L52
and the test is performed here:
https://github.com/Skyscanner/argocd-progressive-rollout/blob/cb03eef8d637218632005b4266b5a4e47a37b55d/controllers/progressiverollout_controller_test.go#L86-L108
Adding this one in while I can think of it.
Expose rollout metrics to Prometheus.
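A sketch of registering a custom metric on the registry that controller-runtime already serves on its metrics endpoint; the metric name and labels are made up for illustration:

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var stagesCompleted = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "progressive_sync_stages_completed_total",
		Help: "Number of completed progressive sync stages.",
	},
	[]string{"name", "stage"},
)

func init() {
	// Exposed on the same /metrics endpoint the manager already serves.
	metrics.Registry.MustRegister(stagesCompleted)
}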
Kubebuilder 3.0 is released and we should migrate sooner rather than later.
There is a nice guide at https://book.kubebuilder.io/migration/v2vsv3.html where they highlight all the new features.
Sometimes this test fails as in https://github.com/Skyscanner/argocd-progressive-rollout/runs/2366453478?check_suite_focus=true
ProgressiveRollout Controller requestsForSecretChange function [It] should forward an event for a matching argocd secret
/home/runner/work/argocd-progressive-rollout/argocd-progressive-rollout/controllers/progressiverollout_controller_test.go:146
Expected
<int>: 0
to equal
<int>: 1
/home/runner/work/argocd-progressive-rollout/argocd-progressive-rollout/controllers/progressiverollout_controller_test.go:175
[...]
We should investigate and either fix the test or document why this flakiness is OK and expected.
This issue has been reported by @nebojsa-prodana as well.
We coded the controller under the assumption that it is running in a single namespace, but we've seen instances where multiple Applications or ProgressiveSync objects under different namespaces interfere with each other (#103).
We also want to run multiple controller versions in the same cluster under different namespaces, so we can test a new version without impacting every customer.
However, we didn't do the proper setup as per https://book.kubebuilder.io/cronjob-tutorial/empty-main.html#every-journey-needs-a-start-every-program-a-main. In particular, we're not passing the required Namespace to the manager, and we might also want to watch a specific set of namespaces only.
We need two CLI flags to specify two different namespaces (one of them being the ArgoCD namespace, typically argocd).
When we do this change we also need to change the Helm Chart, because if the controller is namespace scoped we don't need a ClusterRole and a ClusterRoleBinding but just a Role and a RoleBinding.
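A sketch of passing the namespace to the manager in main.go; the namespace variable would come from one of the CLI flags discussed above:

// In main.go, when building the manager:
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	Scheme:    scheme,
	Namespace: namespace, // restricts the cache and the watches to one namespace
})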
There is an ongoing discussion about incubating this project under the official argoproj-labs organization, as per argoproj/argoproj#25
One piece of feedback we got from the community is that the argocd-progressive-rollout name might be confusing for Argo users.
We should rename this project before cutting an alpha version:
- applicationset-progressive-sync for the controller, since it works with https://github.com/argoproj-labs/applicationset
- ProgressiveSync instead of ProgressiveRollout for the CRD

We need a way to signal to the APS that a progressive rollout is cancelled by a user.
That includes a new final state (Cancelled) and some way of notifying the APS object of the cancellation.
We have a few methods on our ProgressiveRolloutReconciler struct that need a context.
Instead of using the context defined inside the Reconcile method, we re-define it as a background context. This means we lose all the information about deadlines, and we can't cancel the context if we ever need to for whatever reason.
We should instead propagate the context to the dependent methods.
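A sketch of the fix, threading the reconcile context through instead of creating a background one (syncApp stands in for the dependent methods; its signature and the app name here are assumptions):

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

func (r *ProgressiveRolloutReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Before: err := r.syncApp(context.Background(), appName)
	// After: reuse ctx so deadlines and cancellation propagate.
	if err := r.syncApp(ctx, "myservice-cluster-1"); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}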
By default, we log at the debug level as per https://github.com/Skyscanner/argocd-progressive-rollout/blob/main/main.go#L58
We need to change the default log level to INFO and add a flag to be able to turn the DEBUG level on.
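A sketch using the zap flag binding from controller-runtime, which gives us a --zap-log-level flag with INFO as the production default:

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func setupLogger() {
	// Development: false defaults to the Info level;
	// --zap-log-level=debug turns DEBUG on.
	opts := zap.Options{Development: false}
	opts.BindFlags(flag.CommandLine)
	flag.Parse()
	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
}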
In #96 we added a predicate to the For clause responsible for sending ProgressiveSync events.
While that helped us move forward, it means the controller is not able to auto-heal in case something changes the CRD Status.
We should remove the predicate and make the reconciliation loop idempotent, so that it converges over time to the desired state.
Skyscanner doesn't have an official Docker registry where we can push the controller docker image.
A solution is on its way, so this issue is to move from the temporary solution in #35 to the official one.
Use KIND to spin up a kubernetes environment and:
When a progressive sync is completed, the controller sets the CRD status to Completed.
When a stage fails, the controller doesn't have the logic to set the progressive sync to Failed. We already have a scheduler function to determine if a stage has failed; we need to update the ProgressiveSync object accordingly.
We spent some time making a Helm chart to abstract all the complexity; we might as well use it for our e2e environment...
Add the logic to tell ArgoCD to sync a target Application. We should document which approach we are taking before implementing it.
Options:
- the argocd CLI
- the argocd.argoproj.io/refresh annotation, as per https://github.com/argoproj/argo-cd/blob/13b9b92c991874252e9e01dd8e94e69cb526827b/common/common.go#L121
The target selection returns the list in whatever order we get it from the controller-runtime client. We should have a predictable order because:
Note that we don't want the same order for every ProgressiveRollout CRD, as we would end up having the same "canary" cluster for every service. Using the PR CRD name as a seed might be a good idea (see the sketch below).
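A sketch of that idea: hash the ProgressiveRollout name into a seed, so the order is stable across reconciles but different for every CRD. The helper is hypothetical:

import (
	"hash/fnv"
	"math/rand"
	"sort"
)

// sortClusters orders the cluster names deterministically per CRD.
func sortClusters(prName string, clusters []string) {
	sort.Strings(clusters) // stable base order, independent of the client
	h := fnv.New64a()
	h.Write([]byte(prName))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	r.Shuffle(len(clusters), func(i, j int) {
		clusters[i], clusters[j] = clusters[j], clusters[i]
	})
}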
Keeping this in utils doesn't make sense.
Let's find a better name to refer to those values.
Originally posted by @maruina in #18 (comment)
Depends on #28
With the e2e environment in the Makefile, we can quickly spin up a new environment in GitHub Actions and perform some e2e testing.
In the POC we used the ArgoCD CLI to sync Applications.
PROs:
One piece of feedback we got is to use an annotation instead. See argoproj/argoproj#25 (comment)
[ ] Decide between CLI and annotation, documenting the reasons for the choice
In #8 we introduced the concept of status.stages.
To complete the information about the status of each stage, we should extend status.stages with information about the cluster status.
It should look like
status:
  stages:
    - name: canary in emea
      phase: Completed
      message: Stage completed
      targets:
        - cluster-1-eu-west-1a-1
        - cluster-2-eu-central-1b-1
      syncing:
        - cluster-1-eu-west-1a-1
      requeued: []
      failed: []
      completed:
        - cluster-2-eu-central-1b-1
      startedAt: 2019-07-10T08:23:18Z
      finishedAt: 2019-07-10T08:40:18Z
Events allow users to see what is going on with a particular object, and allow automated processes to see and respond to them.
The recorder package is https://pkg.go.dev/k8s.io/client-go/tools/record
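A sketch of emitting such an event from the reconciler; the recorder would come from mgr.GetEventRecorderFor, and the helper, reason, and message are illustrative:

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// emitStageEvent is a hypothetical helper around the recorder package.
func emitStageEvent(recorder record.EventRecorder, obj runtime.Object, stage string) {
	recorder.Event(obj, corev1.EventTypeNormal, "StageCompleted",
		fmt.Sprintf("stage %q completed", stage))
}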