
cluster-api-operator's Introduction

Cluster API Operator

Home for Cluster API Operator, a subproject of sig-cluster-lifecycle

✨ What is Cluster API Operator?

The Cluster API Operator is a Kubernetes Operator designed to empower cluster administrators to handle the lifecycle of Cluster API providers within a management cluster using a declarative approach. It aims to improve user experience in deploying and managing Cluster API, making it easier to handle day-to-day tasks and automate workflows with GitOps.

This operator leverages a declarative API and extends the capabilities of the clusterctl CLI, allowing greater flexibility and configuration options for cluster administrators.
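For illustration, providers are described with manifests like the following CoreProvider (the same example appears in the issues below); applying it makes the operator install that Cluster API version into the management cluster:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  version: v1.4.2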

📖 Documentation

Please see our book for in-depth documentation.

🌟 Features

  • Offers a declarative API that simplifies the management of Cluster API providers and enables GitOps workflows.
  • Facilitates provider upgrades and downgrades, making it more convenient for distributed teams and CI pipelines.
  • Aims to support air-gapped environments without direct access to GitHub/GitLab.
  • Leverages the controller-runtime configuration API for a more flexible Cluster API provider setup.
  • Provides a transparent and effective way to interact with various Cluster API components on the management cluster.

🤗 Community, discussion, contribution, and support

You can reach the maintainers of this project at:

Pull Requests and feedback on issues are very welcome!

See also our contributor guide and the Kubernetes community page for more details on how to get involved.

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

cluster-api-operator's People

Contributors

adriananeci, alexander-demicev, blackliner, cblecker, charlie-haley, danil-grigorev, dependabot[bot], fedosin, furkatgofurov7, guettli, jackfrancis, joelspeed, k8s-ci-robot, karthik-k-n, kishen-v, kranurag7, ljtill, maxfedotov, miyadav, mrbobbytables, nikparasyr, oprinmarius, zioproto


cluster-api-operator's Issues

Enable cherry-picker for the repository

User Story

As a maintainer of the repo, in the future maintainers might want to backport some PRs (bug fixes, documentation, etc.) to stable branches (i.e. from main => release-0.1/0.2). To do that, we could make use of the k8s community-provided cherry-picker, which can take care of it for us easily.

Detailed Description

Example usage: the following command can be commented on a PR targeting the main branch to request a cherry-pick of that PR to the release-0.2 branch:

/cherrypick release-0.2

Once the PR targeting the main branch is merged, the k8s cherry-pick bot should automatically open a PR towards the targeted branch and assign the commenter to it.

Anything else you would like to add:
We need a PR in https://github.com/kubernetes/test-infra to enable it; I can work on that.

[Miscellaneous information that will assist in solving the issue.]

/kind feature
/area ci

Scope down capi-operator-manager-role ClusterRole Permissions

User Story

As an operator I would like to narrow the {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]} permissions used by capi-operator-manager-role for security reasons.

Detailed Description

The * / * / * permissions requested are too broad for what the operator does. Can we scope these permissions down based on the objects that the operator creates?
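A rough sketch of a narrower role, assuming the operator mainly manages its own custom resources, provider CRDs, controller deployments, and the core resources it creates for providers; the exact rule set would have to be derived from the manifests the operator actually applies:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: capi-operator-manager-role
rules:
# Full access to the operator's own API group.
- apiGroups: ["operator.cluster.x-k8s.io"]
  resources: ["*"]
  verbs: ["*"]
# Provider CRDs installed from the downloaded manifests.
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Provider controller deployments.
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Core resources created alongside providers (illustrative list).
- apiGroups: [""]
  resources: ["namespaces", "serviceaccounts", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]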

Anything else you would like to add:

[Miscellaneous information that will assist in solving the issue.]

/kind feature

E2E for upgrades

Create e2e test for checking that upgrades work properly. If providerSpec.version gets changed, a newer version should be deployed.

Allow for provider CRs in the same helm chart as the operator

User Story

As an operator I would like to define my own chart to manage the CAPI providers and include the operator as a subchart so I don't need to manage multiple charts independently.

Detailed Description

The current helm chart for the operator includes the CRDs as templates, which makes it impossible to bundle CRs for those CRDs in the same or a parent chart. In this scenario, the first install will fail in the validation phase, since the CRDs won't yet be defined in the API server. Moreover, even if the first install is handled separately, any new field added to the CRDs will also fail during upgrade validations, since helm runs validations against the API server before applying any resource, and at that time the installed CRDs will still be missing the new field included in the CRs. With the current setup, it's necessary to manage two charts (one including the operator, another with the provider definitions) independently and orchestrate them in sequence.

Helm supports CRDs as first-class citizens (the crds folder), making sure they are installed before any CR and allowing discovery validations to succeed when the template contains CRs that are instances of those CRDs. However, helm only supports creating CRDs the first time, not updating them. More info about the history of CRDs in helm and their challenges can be found here. This limitation makes using the crds folder for CRD updates a no-go.

I propose to modify the operator helm chart to apply CRDs from a Job invoked as a pre-upgrade and pre-install hook. During chart installation/upgrade, helm would create a Job configured to apply all CRD manifests (using an image that contains kubectl). These CRDs could be injected into the Job pod through ConfigMaps. These Jobs will be executed before any other resource is installed, ensuring the CRDs exist before the operator deployment is installed. Any consumer of this chart can write their own hooks (with a lower precedence than ours) to create/update the provider CRs after the CRDs have been installed/updated. CRDs and CRs won't be run through the initial helm validations before the hooks are created, but helm will stop the installation if any of the Jobs fail. Helm will automatically clean up all these Jobs and ConfigMaps after the install/upgrade operation is done.
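A rough sketch of such a hook Job, assuming a kubectl image and a ConfigMap carrying the rendered CRDs; the names, the image, and the hook weight are illustrative, not a final implementation:

apiVersion: batch/v1
kind: Job
metadata:
  name: capi-operator-apply-crds            # illustrative name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: apply-crds
        image: bitnami/kubectl:1.27          # any image that contains kubectl
        command: ["kubectl", "apply", "-f", "/crds"]
        volumeMounts:
        - name: crds
          mountPath: /crds
      volumes:
      - name: crds
        configMap:
          name: capi-operator-crds           # illustrative; would be rendered by the chart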

[Excalidraw diagram from the original issue omitted]

The drawback of this solution is that the CRDs won't be part of the helm release (they won't be listed as part of the release resources and won't be deleted with helm uninstall). We could circumvent this issue (if we wanted to) by using a post-delete Job that deletes them with kubectl in the same way it created them.

Anything else you would like to add:

I would love to contribute these changes to the project. But opening this issue first to align on the solution and make sure this is a feature the community is interested in. There might be other solutions, happy to discuss if anyone has other ideas.

/kind feature

Create a README

The README.md that we currently have is the default one for kubernetes projects. We should write our own with the relevant information about the operator subproject.

E2E test for fetchConfig field

Create an e2e test for checking that providerSpec.fetchConfig has exactly one of URL or Selector specified; if both of them are specified, the test should fail.
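For reference, these are the two mutually exclusive shapes such a test should exercise; the URL and label values are illustrative:

spec:
  version: v1.4.2
  fetchConfig:
    url: https://example.com/releases/infrastructure-components.yaml   # fetch components from a URL

spec:
  version: v1.4.2
  fetchConfig:
    selector:
      matchLabels:
        cluster.x-k8s.io/provider: cluster-api                         # fetch components from labeled ConfigMaps

Setting both url and selector in the same fetchConfig should be rejected.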

v1alpha2?

Let's discuss ideas that require a change in the API version.

  1. ImageMeta can be presented as an image URL, i.e. <repository>/<image_name>:<tag>, instead of the current object.
  2. SecretName and SecretNamespace can be combined in one object called secret or configSecret, e.g.:
secret:
  name: some_name
  namespace: default
  3. ComponentConfig is deprecated in controller-runtime 0.15.0, so we have to migrate our ManagerSpec to something else. (A combined sketch of ideas 1 and 2 follows this list.)
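A hypothetical fragment combining ideas 1 and 2; the field names are suggestions only, not a settled v1alpha2 API:

spec:
  version: v1.4.2
  image: gcr.io/my-registry/capa-controller:v2.1.4   # idea 1: image as a single URL instead of ImageMeta
  configSecret:                                      # idea 2: replaces secretName/secretNamespace
    name: some_name
    namespace: default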

/kind feature

Rename Suffix of Conditions to "...Succeeded"

User Story

I want to automatically check all conditions.

This works fine if the developers follow the Cluster API Condition Proposal

Detailed Description

I wrote a small tool to check the conditions, and I found that cluster-stack-operator uses other suffixes:

controlplaneproviders kubeadm PreflightCheckPassed=True
controlplaneproviders kubeadm ProviderInstalled=True
bootstrapproviders kubeadm PreflightCheckPassed=True
bootstrapproviders kubeadm ProviderInstalled=True
coreproviders cluster-api PreflightCheckPassed=True
coreproviders cluster-api ProviderInstalled=True

Would you accept a PR which updates the condition to match the above proposal?

For example: ProviderInstalled --> ProviderInstallSucceeded

If you want to check your cluster, you can use:

go run github.com/guettli/check-conditions@latest all

E2E for provider version.

Create e2e test for checking that providerSpec.version contains a valid version. If the version field contains invalid data, tests should fail.

Support clusterctl upgrade plan

User Story

As a user/operator I would like to determine the current CAPI* versions being used on the management cluster, as well as whether an upgrade is available, using clusterctl upgrade plan, so that I don't have to figure this out manually by probing deployments or other means.

Detailed Description
Only support clusterctl upgrade plan to determine the current and upgradeable versions for the management cluster; clusterctl upgrade apply is out of scope for this feature request. Currently, if you run this against a cluster which has CAPI and a provider installed, you get this:

Checking new release availability...
Error: invalid management cluster: there should a core provider, found 0

/kind feature

Support addon providers such as CAAPH

User Story

As an operator I would like to be able to deploy addon providers such as CAAPH using the operator.

Detailed Description

We are looking to slowly move to the operator instead of installing the various CAPI controllers via clusterctl. We will start testing with the next release (and the helm chart improvements). One piece that is missing for us is support for addon providers. Currently there is only CAAPH (still in early alpha releases), and CAPI added support for it in clusterctl just recently, in 1.5.0.
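If an addon provider followed the same pattern as the existing provider kinds, the manifest could look roughly like this; note that the AddonProvider kind, namespace, and version shown here are hypothetical and do not exist in the current API:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: AddonProvider          # hypothetical kind, mirroring CoreProvider and friends
metadata:
  name: helm
  namespace: caaph-system    # illustrative namespace
spec:
  version: v0.1.0            # illustrative CAAPH version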

Anything else you would like to add:

Note 1: not having support for addon providers is not a blocker for us, but it would be very nice if support were added in the future so that everything CAPI-related can be deployed by the operator.
Note 2: I could try to find some time to work on this (I have contributed to some other CAPI projects before), but I would appreciate some initial pointers on what needs to be done in order to support it.

Thank you :)

/kind feature

Unable to run e2e tests locally

What steps did you take and what happened:
When running e2e tests locally using the Makefile targets (make test-e2e or make test-e2e-run), both will fail if you have never run the make-docker-build-e2e target on the repo, which builds an operator image with the dev tag. (If you already have an operator image with the dev tag locally, you would not see this problem, but that is not the case for everyone.)

That is because make test-e2e-run sets E2E_OPERATOR_IMAGE to gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev, spins up a kind cluster and loads that image into the cluster:

Creating cluster "capi-operator-e2e" ...
 • Ensuring node image (kindest/node:v1.27.0) 🖼  ...
 ✓ Ensuring node image (kindest/node:v1.27.0) 🖼
 • Preparing nodes 📦   ...
 ✓ Preparing nodes 📦
 • Writing configuration 📜  ...
 ✓ Writing configuration 📜
 • Starting control-plane 🕹️  ...
 ✓ Starting control-plane 🕹️
 • Installing CNI 🔌  ...
 ✓ Installing CNI 🔌
 • Installing StorageClass 💾  ...
 ✓ Installing StorageClass 💾
  INFO: The kubeconfig file for the kind cluster is /var/folders/cz/q854zvyj34nccdhvq_4cxhd80000gp/T/e2e-kind2691374665
  INFO: Loading image: "gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev"
  INFO: Image gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev not present in local container image cache, will pull
  INFO: [WARNING] Unable to load image "gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev" into the kind cluster "capi-operator-e2e": error pulling image "gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev": failure pulling container image: Error response from daemon: manifest for gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev not found: manifest unknown: Failed to fetch "dev" from request "/v2/k8s-staging-capi-operator/cluster-api-operator/manifests/dev".

Later on in the tests, operator deployment will not come up properly and fail:

state:
      waiting:
        message: Back-off pulling image "gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev"
        reason: ImagePullBackOff

What did you expect to happen:
run make test-e2e and make test-e2e-run successfully

To reproduce:

# in case you have run the `make-docker-build-e2e` before
$docker rmi gcr.io/k8s-staging-capi-operator/cluster-api-operator:dev
$make test-e2e-run

Tests time out waiting for capi-operator-system/capi-operator-controller-manager deployment to be available

Additional information:
I see we have 2 options in this case:

  1. make the test-e2e-run target depend on make-docker-build-e2e so that we always build the image first before running e2e tests locally
  2. leave it to the user and document it properly somewhere, mentioning that running make-docker-build-e2e is a prerequisite for successfully running e2e tests locally

Any other suggestions?

Environment:

  • Cluster-api-operator version: main
  • Cluster-api version: v1.4.2
  • Minikube/KIND version: 1.27.0
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release): macOS

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Update CAPI secret mechanism to supported method

User Story

As a developer/user/operator I would like to ensure that the secret authentication method for CAPI is utilizing the current supported method.

Detailed Description
The current method the operator helm chart uses for installation relies on the deprecated use of environment variables. The currently supported way to authenticate is:

  1. clusterctl init
  2. create a k8s secret with the credentials
  3. refer to the secret via an AzureClusterIdentity ref in the AzureCluster

This should be reflected by default in the installation process; a rough sketch of steps 2 and 3 follows.
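Following the CAPZ identity flow as I understand it; all names and placeholder values are illustrative and should be checked against the CAPZ documentation:

apiVersion: v1
kind: Secret
metadata:
  name: cluster-identity-secret
  namespace: capz-system
type: Opaque
stringData:
  clientSecret: <service-principal-secret>   # illustrative placeholder
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureClusterIdentity
metadata:
  name: cluster-identity
  namespace: capz-system
spec:
  type: ServicePrincipal
  tenantID: <tenant-id>
  clientID: <client-id>
  clientSecret:
    name: cluster-identity-secret
    namespace: capz-system
  allowedNamespaces: {}                      # allow all namespaces; tighten as needed
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
spec:
  identityRef:
    kind: AzureClusterIdentity
    name: cluster-identity
    namespace: capz-system
  # other required AzureCluster fields omitted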

/kind feature

Kustomize deprecation warnings

What steps did you take and what happened:
When I generate release manifests with make release-manifests I get multiple kustomize warnings

➭ make release-manifests
/Users/mfedosin/projects/cluster-api-operator/hack/tools/bin/kustomize-v5.0.1 build ./config/default > out/operator-components.yaml
# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.

What did you expect to happen:
I'd like to see all warnings fixed.
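For context, the fixes are mostly mechanical renames in the kustomization files (kustomize edit fix handles most of them); the file names below are illustrative:

# Before (deprecated fields)
bases:
- ../crd
patchesStrategicMerge:
- manager_patch.yaml

# After (current fields)
resources:
- ../crd
patches:
- path: manager_patch.yaml

The 'vars' to 'replacements' migration is less mechanical, since the references need to be rewritten as replacement source/target pairs.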

Environment:

  • Cluster-api-operator version: master

/kind bug
/area release

CAPI v1.5.0-beta.0 has been released and is ready for testing

CAPI v1.5.0-beta.0 has been released and is ready for testing.
Looking forward to your feedback before the CAPI 1.5.0 release on the 25th July 2023!

For quick reference

Following are the planned dates for the upcoming releases

Release                                            Expected Date
v1.5.0-beta.x                                      Tuesday 5th July 2023
release-1.5 branch created (Begin [Code Freeze])   Tuesday 11th July 2023
v1.5.0-rc.0 released                               Tuesday 11th July 2023
release-1.5 jobs created                           Tuesday 11th July 2023
v1.5.0-rc.x released                               Tuesday 18th July 2023
v1.5.0 released                                    Tuesday 25th July 2023

Helm uninstall on installed operator is not working properly

What steps did you take and what happened:
Following quickstart guide:

  1. install the operator using https://github.com/kubernetes-sigs/cluster-api-operator/blob/main/docs/README.md#method-2-use-helm-charts
  2. Add some CAPI provider in any way, create a CAPI cluster using this provider
  3. helm uninstall capi-operator -n capi-operator-system

This command does not finish with a success code; manual intervention is required. Additionally, if the capi operator deployment is removed, none of the created provider specifications will be removed, leaving only a manual removal process.

What did you expect to happen:
Helm uninstall command to remove CAPI operator and all created resources managed by it, and then finish successfully.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-operator version: v0.5.1
  • Cluster-api version: v1.4.4
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version): v1.26.3
  • OS (e.g. from /etc/os-release):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Cluster API Operator keeps refetching data from the specified URL after successful provider installation

What steps did you take and what happened:

Deploy a provider either from a predefined location or from a custom URL, and make sure it is ready. For example:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  version: v1.4.2

During each subsequent reconciliation (every 10 minutes by default), the operator will repeatedly fetch the configuration from GitHub. If a network issue arises or GitHub is unavailable, the operator will report an error and mark the provider as Unhealthy, even though it is functioning properly.

What did you expect to happen:

I would like the CAPI operator to retrieve the provider configuration solely during the initial installation. The status of my provider should not be impacted by any GitHub-related issues.

Environment:

  • Cluster-api-operator version: main
  • Cluster-api version: 1.4.2
  • Minikube/KIND version: 0.26.2
  • Kubernetes version: (use kubectl version): 1.26
  • OS (e.g. from /etc/os-release):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Helm chart support for overriding images and image pull secrets

User Story

As a cluster fleet operator I would like to be able to use images from private registries in order to deploy the CAPI operator in an air gapped environment.

Detailed Description

In order to be able to allow a fully air gapped installation, the CAPI Operator helm chart should expose ways to override the images and image pull secrets used in the controller manager deployment at a minimum.
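A sketch of what the chart values could expose; these keys are a suggestion for illustration, not the chart's current values:

# Hypothetical values.yaml fragment for air-gapped installs.
image:
  manager:
    repository: registry.internal.example.com/cluster-api-operator
    tag: v0.6.0
    pullPolicy: IfNotPresent
imagePullSecrets:
  - name: private-registry-credentials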

Anything else you would like to add:

Not sure if it is already tracked in #130

/kind feature

E2E for minimal configuration

Create e2e tests for the minimal configuration. Currently we are only testing CoreProvider; similar tests are required for BootstrapProvider, InfrastructureProvider, and ControlPlaneProvider.
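For reference, minimal manifests of the remaining provider kinds that such tests could apply; the namespaces and versions are illustrative:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: BootstrapProvider
metadata:
  name: kubeadm
  namespace: capi-kubeadm-bootstrap-system
spec:
  version: v1.4.2
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: ControlPlaneProvider
metadata:
  name: kubeadm
  namespace: capi-kubeadm-control-plane-system
spec:
  version: v1.4.2
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: docker
  namespace: capd-system
spec:
  version: v1.4.2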

CI: post-cluster-api-operator-push-images jobs failing for 89 days

What steps did you take and what happened:
See kubernetes-sigs/cluster-api#8784

What did you expect to happen:
post-cluster-api-operator-push-images job passing

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
The job was running fine and passing until March 7th, 2023 and started failing after that. Based on a quick look at the older job runs and the latest failing run, the "gcc" executable file is not found in $PATH, which makes the release-staging Makefile target fail:

go: downloading github.com/mailru/easyjson v0.7.7
go: downloading github.com/golang/protobuf v1.5.2
go: downloading github.com/josharian/intern v1.0.0
# sigs.k8s.io/kustomize/kustomize/v5
/usr/local/go/pkg/tool/linux_amd64/link: running gcc failed: exec: "gcc": executable file not found in $PATH
make[2]: Leaving directory '/workspace'
make[2]: *** [Makefile:169: /workspace/hack/tools/bin/kustomize-v5.0.1] Error 2
make[1]: *** [Makefile:329: staging-manifests] Error 2
make[1]: Leaving directory '/workspace'
make: *** [Makefile:421: release-staging] Error 2
ERROR
ERROR: build step 0 "gcr.io/k8s-staging-test-infra/gcb-docker-gcloud:v20220609-2e4c91eb7e" failed: step exited with non-zero status: 2

To reproduce:

$docker run -it --entrypoint /bin/bash gcr.io/k8s-staging-test-infra/gcb-docker-gcloud:v20220609-2e4c91eb7e
bash-5.1# go install sigs.k8s.io/kustomize/kustomize/v5@v5.0.1
# sigs.k8s.io/kustomize/kustomize/v5
/usr/local/go/pkg/tool/linux_amd64/link: running gcc failed: exec: "gcc": executable file not found in $PATH

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Migrate operator prow jobs to EKS (Community) clusters

User Story

This issue is for discussing and tracking efforts to migrate existing operator prow jobs over to the new infrastructure provided by test-infra.

Detailed Description
SIG Testing and k8s infra folks are suggesting migrating all CI jobs to Community Clusters, as per https://groups.google.com/a/kubernetes.io/g/dev/c/H5-G2bQGgds/m/INw7yZs1BQAJ.
There is also an umbrella issue listing all the repos with default-cluster jobs, kubernetes/test-infra#29722, and Cluster API Operator is among them.

Anything else you would like to add:
We can follow the steps described in kubernetes/test-infra#29722 (comment).

/kind feature
/area ci

Bump CAPI to v1.5.0

Describe the solution you'd like
We want to bump CAPI to 1.5.0 (as soon as it's released).

Anything else you would like to add:
We had an issue to bump to the RC which is already done: #183

/kind feature
/help

CAPZ provider does not install Azure Service Operator (ASO)

What steps did you take and what happened:
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=azure --set addon=helm --set cert-manager.enabled=true --wait --timeout 90s

No errors, but the Cluster API provider for Azure didn't install the now-required Azure Service Operator (ASO) prerequisite on the management cluster.

What did you expect to happen:
CAPZ provider is properly installed.

Environment:

  • Cluster-api-operator version: 0.6.0
  • Cluster-api version: 1.5.1
  • Minikube/KIND version:
NAME             STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                       CONTAINER-RUNTIME
docker-desktop   Ready    control-plane   3d23h   v1.27.2   192.168.65.4   <none>        Docker Desktop   5.10.102.1-microsoft-standard-WSL2   docker://24.0.6

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Tasks to improve operator CI infra

User Story

General CI improvements for the operator repo

Detailed Description

/kind feature

Installing the chart creates cert-manager CRDs

What steps did you take and what happened:
helm install --repo https://kubernetes-sigs.github.io/cluster-api-operator capi-operator cluster-api-operator -n capi-system --create-namespace

What did you expect to happen:
That the operator would get installed, nothing more.

Anything else you would like to add:
I understand that the operator might depend on cert-manager, but then it should just depend on it (which is currently not possible because of cert-manager/cert-manager#6179 (comment)); otherwise it shouldn't do anything with the cert-manager deployment and should just require the user to install it manually. Because now I can't install cert-manager my way, as the CRDs already exist 😅, I'd have to skipCRDs in my gitops system.

I guess this isn't immediately fixable, but then this issue is just to track the problem 😁

Environment:

  • Cluster-api-operator version: N/A
  • Cluster-api version: N/A
  • Minikube/KIND version: N/A
  • Kubernetes version: (use kubectl version): N/A
  • OS (e.g. from /etc/os-release): N/A
  • Helm Chart version: 0.6.0 <- missing from the template

/kind bug
/area helm-chart <- I couldn't find any fitting area

Move mgt-cluster into wl-cluster

User Story

As an operator of clusters I would like to move the mgt-cluster into the wl-cluster after the wl-cluster was successfully created.

Up to now we rely on the clusterctl command.

Since we plan to use cluster-api-operator, it would be great if the operator could move the mgt-cluster.

/kind feature

E2E test for CoreProvider

Create an e2e test for checking that only one instance of a core/infrastructure provider is allowed; if more than one is created, an appropriate condition should be set.

Unable to install from helm chart with Docker Desktop K8s

What steps did you take and what happened:
Run this command against local docker desktop kubernetes:
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=azure --set cert-manager.enabled=true --wait --timeout 90s

And produced this error (debug output included)

install.go:200: [debug] Original chart version: ""
install.go:217: [debug] CHART PATH: /home/dtzar/.cache/helm/repository/cluster-api-operator-0.6.0.tgz

Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://kubernetes.docker.internal:6443/version": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
helm.go:84: [debug] Get "https://kubernetes.docker.internal:6443/version": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Kubernetes cluster unreachable
helm.sh/helm/v3/pkg/kube.(*Client).IsReachable
        helm.sh/helm/v3/pkg/kube/client.go:127
helm.sh/helm/v3/pkg/action.(*Install).RunWithContext
        helm.sh/helm/v3/pkg/action/install.go:222
main.runInstall
        helm.sh/helm/v3/cmd/helm/install.go:287
main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:145
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/[email protected]/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/[email protected]/command.go:1044
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/[email protected]/command.go:968
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1598
INSTALLATION FAILED
main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:147
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/[email protected]/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/[email protected]/command.go:1044
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/[email protected]/command.go:968
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1598

What did you expect to happen:
The operator, provider, and cert manager, would install.
When I browse manually to https://kubernetes.docker.internal:6443/version it does work (albeit it does say that the site is insecure / invalid certificate) and gives me this output:

{
    "major": "1",
    "minor": "27",
    "gitVersion": "v1.27.2",
    "gitCommit": "7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647",
    "gitTreeState": "clean",
    "buildDate": "2023-05-17T14:13:28Z",
    "goVersion": "go1.20.4",
    "compiler": "gc",
    "platform": "linux/amd64"
}

Environment:

  • Cluster-api-operator version: 0.6.0
  • Cluster-api version: latest with 0.6.0
  • Minikube/KIND version: Docker Desktop 4.22.1
  • Kubernetes version: (use kubectl version): 1.27.2
  • OS (e.g. from /etc/os-release): Windows 11 with Ubuntu 22.04 WSL

/kind bug

Operator doesn't reconcile providers if their spec values changed

What steps did you take and what happened:
Deploy a provider specifying some deployment arguments. For instance:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
 name: aws
 namespace: capa-system
spec:
 version: v2.1.4
 secretName: aws-variables
 deployment:
   containers:
   - name: manager
     image:
         repository: "gcr.io/myregistry"
         name: "capa-controller"
         tag: "v2.1.4-foo"

Make sure that capa started successfully.

Update image in the provider spec:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
 name: aws
 namespace: capa-system
spec:
 version: v2.1.4
 secretName: aws-variables
 deployment:
   containers:
   - name: manager
     image:
         repository: "gcr.io/myregistry"
         name: "capa-controller"
         tag: "v2.1.4-bar"

What did you expect to happen:

The operator should reconcile the capa provider and update the image in the deployment, but nothing happens.

Environment:

  • Cluster-api-operator version: main

/kind bug

CRDs aren't inside the `crds` folder

Anything else you would like to add:
This goes against helm best practices. I understand that for non-gitops workflows having the CRDs inside the templates folder (with an if!) is necessary, but they should still be inside the crds folder for normal operations, e.g. not needing post-* hooks for creating CRs, and gitops for CRD upgrades.

Environment:

  • Cluster-api-operator version: N/A
  • Cluster-api version: N/A
  • Minikube/KIND version: N/A
  • Kubernetes version: (use kubectl version): N/A
  • OS (e.g. from /etc/os-release): N/A
  • Helm Chart version: 0.6.0

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Cannot add infrastructure provider after the initial installation

What steps did you take and what happened:
I tried adding an infrastructure provider during an upgrade, which didn't work because they are installed via post-install hooks, which of course don't trigger during any phase of an upgrade.

What did you expect to happen:
That the corresponding resources are just created like any other normal resource instead of with some kind of hook.
In the short term that the post-upgrade hook be added as well.

Anything else you would like to add:
The root cause for this problem is that the CRDs aren't inside the crds folder, which I addressed in #282

Environment:

  • Cluster-api-operator version: N/A
  • Cluster-api version: N/A
  • Minikube/KIND version: N/A
  • Kubernetes version: (use kubectl version): N/A
  • OS (e.g. from /etc/os-release): N/A
  • Helm Chart version: 0.6.0

/kind bug

CRDs are not deleted when provider is removed

What steps did you take and what happened:

  1. Install a provider:
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  version: v1.4.2
  2. Delete the provider.
  3. Observe that all previously created CRDs from cluster.x-k8s.io are not deleted together with the provider.

What did you expect to happen:
All created resources should be removed.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-operator version: main
  • Cluster-api version: 1.4.2
  • Minikube/KIND version: 1.27
  • Kubernetes version: (use kubectl version): 1.27
  • OS (e.g. from /etc/os-release):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Provider not getting installed

Provider not getting installed. Here are the logs from the Cluster API Operator pod.

...
I0812 14:58:06.509139       1 controller.go:185]  "msg"="Starting Controller" "controller"="controlplaneprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="ControlPlaneProvider"
I0812 14:58:06.612648       1 controller.go:219]  "msg"="Starting workers" "controller"="controlplaneprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="ControlPlaneProvider" "worker count"=1
I0812 14:58:06.612704       1 controller.go:219]  "msg"="Starting workers" "controller"="bootstrapprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="BootstrapProvider" "worker count"=1
I0812 14:58:06.612655       1 controller.go:219]  "msg"="Starting workers" "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "worker count"=1
I0812 14:58:06.612760       1 controller.go:219]  "msg"="Starting workers" "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "worker count"=1
I0812 14:58:06.613376       1 genericprovider_controller.go:63]  "msg"="Reconciling provider" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.614999       1 preflight_checks.go:56]  "msg"="Performing preflight checks" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.615692       1 preflight_checks.go:205]  "msg"="Preflight checks passed" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.615872       1 phases.go:222]  "msg"="No configuration secret was specified" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.625571       1 manifests_downloader.go:77]  "msg"="Downloading provider manifests" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
Complete log:
Defaulted container "manager" out of: manager, kube-rbac-proxy
I0812 14:57:51.069384       1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="127.0.0.1:8080"
I0812 14:57:51.071038       1 webhook.go:158] controller-runtime/builder "msg"="Registering a mutating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"CoreProvider"} "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-coreprovider"
I0812 14:57:51.071863       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-coreprovider"
I0812 14:57:51.072029       1 webhook.go:188] controller-runtime/builder "msg"="Registering a validating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"CoreProvider"} "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-coreprovider"
I0812 14:57:51.072148       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-coreprovider"
I0812 14:57:51.072343       1 webhook.go:158] controller-runtime/builder "msg"="Registering a mutating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"BootstrapProvider"} "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-bootstrapprovider"
I0812 14:57:51.072459       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-bootstrapprovider"
I0812 14:57:51.072573       1 webhook.go:188] controller-runtime/builder "msg"="Registering a validating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"BootstrapProvider"} "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-bootstrapprovider"
I0812 14:57:51.072693       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-bootstrapprovider"
I0812 14:57:51.072907       1 webhook.go:158] controller-runtime/builder "msg"="Registering a mutating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"ControlPlaneProvider"} "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-controlplaneprovider"
I0812 14:57:51.073036       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-controlplaneprovider"
I0812 14:57:51.073188       1 webhook.go:188] controller-runtime/builder "msg"="Registering a validating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"ControlPlaneProvider"} "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-controlplaneprovider"
I0812 14:57:51.073306       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-controlplaneprovider"
I0812 14:57:51.073478       1 webhook.go:158] controller-runtime/builder "msg"="Registering a mutating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"InfrastructureProvider"} "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-infrastructureprovider"
I0812 14:57:51.073596       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-operator-cluster-x-k8s-io-v1alpha1-infrastructureprovider"
I0812 14:57:51.073739       1 webhook.go:188] controller-runtime/builder "msg"="Registering a validating webhook" "GVK"={"Group":"operator.cluster.x-k8s.io","Version":"v1alpha1","Kind":"InfrastructureProvider"} "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-infrastructureprovider"
I0812 14:57:51.073884       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-operator-cluster-x-k8s-io-v1alpha1-infrastructureprovider"
I0812 14:57:51.074046       1 main.go:164] setup "msg"="starting manager" "version"=""
I0812 14:57:51.074220       1 server.go:216] controller-runtime/webhook/webhooks "msg"="Starting webhook server"
I0812 14:57:51.074854       1 certwatcher.go:161] controller-runtime/certwatcher "msg"="Updated current TLS certificate"
I0812 14:57:51.075114       1 server.go:273] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=9443
I0812 14:57:51.075341       1 internal.go:360]  "msg"="Starting server" "addr"={"IP":"::","Port":8081,"Zone":""} "kind"="health probe"
I0812 14:57:51.075599       1 certwatcher.go:115] controller-runtime/certwatcher "msg"="Starting certificate watcher"
I0812 14:57:51.075783       1 server.go:50]  "msg"="starting server" "addr"={"IP":"127.0.0.1","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0812 14:57:51.075942       1 leaderelection.go:245] attempting to acquire leader lease capi-operator-system/controller-leader-election-capi-operator...
I0812 14:58:06.507611       1 leaderelection.go:255] successfully acquired lease capi-operator-system/controller-leader-election-capi-operator
I0812 14:58:06.508587       1 controller.go:177]  "msg"="Starting EventSource" "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "source"="kind source: *v1alpha1.CoreProvider"
I0812 14:58:06.508622       1 controller.go:185]  "msg"="Starting Controller" "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider"
I0812 14:58:06.508828       1 controller.go:177]  "msg"="Starting EventSource" "controller"="bootstrapprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="BootstrapProvider" "source"="kind source: *v1alpha1.BootstrapProvider"
I0812 14:58:06.508852       1 controller.go:185]  "msg"="Starting Controller" "controller"="bootstrapprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="BootstrapProvider"
I0812 14:58:06.508946       1 controller.go:177]  "msg"="Starting EventSource" "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "source"="kind source: *v1alpha1.InfrastructureProvider"
I0812 14:58:06.508961       1 controller.go:185]  "msg"="Starting Controller" "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider"
I0812 14:58:06.509088       1 controller.go:177]  "msg"="Starting EventSource" "controller"="controlplaneprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="ControlPlaneProvider" "source"="kind source: *v1alpha1.ControlPlaneProvider"
I0812 14:58:06.509139       1 controller.go:185]  "msg"="Starting Controller" "controller"="controlplaneprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="ControlPlaneProvider"
I0812 14:58:06.612648       1 controller.go:219]  "msg"="Starting workers" "controller"="controlplaneprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="ControlPlaneProvider" "worker count"=1
I0812 14:58:06.612704       1 controller.go:219]  "msg"="Starting workers" "controller"="bootstrapprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="BootstrapProvider" "worker count"=1
I0812 14:58:06.612655       1 controller.go:219]  "msg"="Starting workers" "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "worker count"=1
I0812 14:58:06.612760       1 controller.go:219]  "msg"="Starting workers" "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "worker count"=1
I0812 14:58:06.613376       1 genericprovider_controller.go:63]  "msg"="Reconciling provider" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.614999       1 preflight_checks.go:56]  "msg"="Performing preflight checks" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.615692       1 preflight_checks.go:205]  "msg"="Preflight checks passed" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.615872       1 phases.go:222]  "msg"="No configuration secret was specified" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"
I0812 14:58:06.625571       1 manifests_downloader.go:77]  "msg"="Downloading provider manifests" "CoreProvider"={"name":"cluster-api","namespace":"capi-system"} "controller"="coreprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="CoreProvider" "name"="cluster-api" "namespace"="capi-system" "reconcileID"="cf655d07-8cf5-4ea9-87d2-81dbe915f38a"

Add prerequisites to provider initialization

User Story

I would like the operator to execute provider bootstrap operations, e.g. clusterawsadm bootstrap iam create-cloudformation-stack, in order to provision all required roles and policies.

Detailed Description

The operator initializes the provider but omits prerequisites such as the ones indicated in the AWS provider documentation.

Helm chart support more comprehensive installation

User Story

As an operator I would like to have a single helm chart to install not only the operator, but the entire e2e stack to have a functioning management cluster with the provider installed so that it is quick and easy to maintain and upgrade.

Detailed Description
There are many disconnected steps in the quick start guide; the helm chart should be able to support any of the providers instead of manually applying various independent yaml files.

/kind feature

Handle error case for invalid GitHub token or unable to fetch provider repository

What steps did you take and what happened:
Followed the instructions in the getting started guide; the Azure infrastructure provider was not installed and there are no error/warning messages about what went wrong.

Specifically, I did the following steps:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.12.1 \
  --set installCRDs=true
helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system
k create ns capi-system
k apply -f ./capi.yaml
k create ns capz-system
k apply -f ./capz.yaml
where capi.yaml and capz.yaml contain:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  version: v1.4.2
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-variables
  namespace: capz-system
type: Opaque
stringData:
  AZURE_CLIENT_ID_B64: M#
  AZURE_CLIENT_SECRET_B64: W#
  AZURE_SUBSCRIPTION_ID_B64: M#
  AZURE_TENANT_ID_B64: O#
  github-token: Z#
---
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
 name: azure
 namespace: capz-system
spec:
 version: v1.9.2
 secretName: azure-variables

What did you expect to happen:
The CAPZ infrastructure provider was installed.

Anything else you would like to add:
The k apply -f capz.yaml succeeded.

secret/azure-variables created
infrastructureprovider.operator.cluster.x-k8s.io/azure created

The capi-operator-controller-manager pod logs also appear to only show successes.

I0601 17:54:11.609633       1 genericprovider_controller.go:56]  "msg"="Reconciling provider" "InfrastructureProvider"={"name":"azure","namespace":"capz-system"} "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "name"="azure" "namespace"="capz-system" "reconcileID"="b975804e-6e58-4618-9304-47030b5593c2"
I0601 17:54:11.697207       1 genericprovider_controller.go:56]  "msg"="Reconciling provider" "InfrastructureProvider"={"name":"azure","namespace":"capz-system"} "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "name"="azure" "namespace"="capz-system" "reconcileID"="e73ed648-5f89-4b68-ab63-6dd7055707f1"
I0601 17:54:11.697355       1 preflight_checks.go:51]  "msg"="Performing preflight checks" "InfrastructureProvider"={"name":"azure","namespace":"capz-system"} "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "name"="azure" "namespace"="capz-system" "reconcileID"="e73ed648-5f89-4b68-ab63-6dd7055707f1"
I0601 17:54:11.697485       1 preflight_checks.go:153]  "msg"="Preflight checks passed" "InfrastructureProvider"={"name":"azure","namespace":"capz-system"} "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "name"="azure" "namespace"="capz-system" "reconcileID"="e73ed648-5f89-4b68-ab63-6dd7055707f1"
I0601 17:54:11.697525       1 phases.go:115]  "msg"="Loading provider" "InfrastructureProvider"={"name":"azure","namespace":"capz-system"} "controller"="infrastructureprovider" "controllerGroup"="operator.cluster.x-k8s.io" "controllerKind"="InfrastructureProvider" "name"="azure" "namespace"="capz-system" "reconcileID"="e73ed648-5f89-4b68-ab63-6dd7055707f1"

Environment:

  • Cluster-api-operator version: 0.2.0
  • Cluster-api version: 1.4.2
  • Minikube/KIND version: Kind 0.19.0
  • Kubernetes version: (use kubectl version): 1.27.1
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 11 (bullseye)

/kind bug
/area provider/azure

Introduce the webhook for all the Resources

User Story

Add webhooks for the following kinds:

  1. CoreProvider
  2. BootstrapProvider
  3. ControlPlaneProvider
  4. InfrastructureProvider

Detailed Description

Anything else you would like to add:

/kind feature

cert-manager updates and version management

User Story

As an operator I would like the CAPI operator to manage updates to cert-manager for me so it's kept in sync with my providers' dependencies.

Detailed Description

cert-manager support was recently introduced in #144. This greatly facilitates the first CAPI bootstrap, making it almost a one-command task.

I understand that initially it was made a non-goal to manage cert-manager and that was left to the user. However, I do agree that the path initiated by #144 is the right one, given that most users want a simple and streamlined experience, leaving the dependency management to the CAPI tooling. I believe this is especially true for users coming from clusterctl, where cert-manager is automatically managed (installation and updates). And in this new path, I think there are a few places where the experience can be improved:

  • cert-manager CRDs. Currently, our chart includes cert-manager as a subchart and includes the upstream cert-manager CRDs in the crds folder. This works great for the initial installation, but any future cert-manager update that includes changes to the CRDs will fail, since helm won't update them.
  • cert-manager and operator coupled updates. If core CAPI starts requiring a new version of cert-manager, cert-manager will need to be updated before the providers are updated. Assuming that in this case we will release a new version of the chart and bump the cert-manager version, that means the operator now needs to be updated before the providers. This kind of coupling is not ideal, since the operator aims at being independent from the versions of the providers that are installed. Moreover, it requires users to keep track of the upstream CAPI dependencies and cross-check them with our chart.

Ideally, we would handle these two scenarios automatically by updating cert-manager to the required version by the core provider. This eliminates the need for the user to keep track of cert-manager versions, run manual CRD upgrades, check the version required upstream and ultimately streamlines the CAPI management experience in the same way clusterctl does. However, this presents some challenges:

  • Map cert-manager version to a core provider version. Currently, there is not an easy way (and even less an "API friendly" way) to know the cert-manager version pinned by a particular CAPI version.
  • The operator webhooks depend on cert-manager. The operator itself depends on cert-manager, hence it can't wait until the first CoreProvider is created before installing it. I believe the solution here is to maintain something similar to what the chart does today, install a default version but only during helm install. Then let the operator update that version based on the selected CoreProvider.
  • cert-manager updates in air-gapped environments. We would need to implement something similar to what is done today with ConfigMaps for providers. Although for this case, it might be better for users to manage cert-manager themselves, since they would need to keep track of the versions either way.

Alternatively, if we decided that moving this responsibility into the operator code is too much and/or we wanted to address these issues separately, we could manage cert-manager CRD updates using helm hooks and a Kubernetes Job, similar to what I propose in #188.

Anything else you would like to add:

This is not a full fledged proposal and there are definitely some gaps in the design, but looking for feedback before investing more time on this.

/kind feature

PR this subproject in the k/community sigs.yaml

The k/community repository maintains a list of subprojects for each SIG (in this case SIG Cluster Lifecycle) in a sigs.yaml file.
Each new project must be PRed by the newly established maintainers into the sigs.yaml, and the SIG leads must LGTM.

We are currently doing some cleanup in our portion of the sigs.yaml, but once this PR merges, you can follow the existing YAML structure to PR this project into the sigs.yaml:
kubernetes/community#6402

steps:

  • edit sigs.yaml
  • run make
  • commit / PR the changes

E2E for configmap configuration

Create an e2e test for checking that providerSpec.fetchConfig.selector works as expected. If specified, it should fetch components from a ConfigMap and not use GitHub.

Helm chart install - Error INSTALLATION FAILED ... "capi-system" already exists

What steps did you take and what happened:

helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=azure --set addon=helm --set cert-manager.enabled=true --wait --timeout 90s

Error: INSTALLATION FAILED: failed post-install: warning: Hook post-install cluster-api-operator/templates/infra-conditions.yaml failed: 1 error occurred:
        * object is being deleted: namespaces "capi-system" already exists

What did you expect to happen:
Install completed with no errors.

Anything else you would like to add:
This problem does not seem to happen when you install the infrastructure provider separately. For instance, this succeeds:
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set addon=helm --set cert-manager.enabled=true --wait --timeout 90s

Environment:

  • Cluster-api-operator version: 0.6.0
  • Cluster-api version: latest with 0.6.0
  • Minikube/KIND version: Docker Desktop 4.22.1
  • Kubernetes version: (use kubectl version): 1.27.2
  • OS (e.g. from /etc/os-release): Windows 11 with Ubuntu 22.04 WSL

/kind bug

Merge content from ConfigMaps

ProviderSpec.FetchConfiguration supports a label selector for providing a ConfigMap as a source for a provider's components (https://github.com/kubernetes-sigs/cluster-api-operator/blob/main/api/v1alpha1/provider_types.go#L186). If a provider decides to split its components into multiple ConfigMaps, this won't work, because our code doesn't support it (https://github.com/kubernetes-sigs/cluster-api-operator/blob/main/controllers/phases.go#L216). We should merge the content from ConfigMaps of the same provider version into one piece of data before storing it in MemoryRepository (https://github.com/kubernetes-sigs/cluster-api-operator/blob/main/controllers/phases.go#L239).
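For illustration, two ConfigMaps for the same provider version whose components data should be merged into one document; the provider name and ConfigMap names are illustrative, and the labels follow the selector convention shown elsewhere in this tracker:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-provider-components-1
  namespace: capi-system
  labels:
    cluster.x-k8s.io/provider: infrastructure-myprovider
    provider.cluster.x-k8s.io/version: v0.1.0
data:
  components: |
    # first part of the provider components
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-provider-components-2
  namespace: capi-system
  labels:
    cluster.x-k8s.io/provider: infrastructure-myprovider
    provider.cluster.x-k8s.io/version: v0.1.0
data:
  components: |
    # second part of the provider components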

/kind feature

E2E test for deleting provider

Create e2e tests for deleting a provider, if the provider was deleted all components should be removed from the cluster.

Infrastructure provider not added to the clusterctl providers list when fetchconfiguration provided by configmap

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]

I want to install my local OVH infrastructure provider using the Cluster API Operator. The provider is not in a GitHub repo, so I created the following InfrastructureProvider resource:


apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name:  infrastructure-capovh
  namespace: capi-system
spec:
  version: v0.1.0
  secretName: ovh-variables
  secretNamespace: capi-system
  fetchConfig:
    selector:
      matchLabels:
         cluster.x-k8s.io/provider: infrastructure-capovh

and created the ConfigMap containing the metadata and components YAML:


apiVersion: v1
kind: ConfigMap
metadata:
  name: v0.1.0
  namespace: capi-system
  labels:
    cluster.x-k8s.io/provider: infrastructure-capovh
    provider.cluster.x-k8s.io/version: v0.1.0  
data:
  metadata: |
    apiVersion: clusterctl.cluster.x-k8s.io/v1alpha3
    releaseSeries:
    - major: 0
      minor: 1
      contract: v1beta1
  components: |
    apiVersion: v1
    kind: Namespace

I expected the operator to be able to fetch the infrastructure-components.yaml and install the provider, but it returned this error:

msg"="Reconciler error" "error"="failed to get configuration for the InfrastructureProvider with name infrastructure-capov โ”‚
โ”‚ h. Please check the provider name and/or add configuration for new providers using the .clusterctl config file" "InfrastructureProvider"={"name":"infrastructure-capovh","namespace": โ”‚
โ”‚ "capi-system"} "controller"="infrastructureprovider" "controllerGroup"="[operator.cluster.x-k8s.io](http://operator.cluster.x-k8s.io/)" "controllerKind"="InfrastructureProvider" "name"="infrastructure-capovh" "namespac โ”‚
โ”‚ e"="capi-system" "reconcileID"="9fea4b23-53c0-42a5-baa0-613f86c67dac"

The reason for this error is that the provider name is not added to the clusterctl provider list when no URL is provided in fetchConfig but a selector is; the current implementation only handles the URL case.

https://github.com/kubernetes-sigs/cluster-api-operator/blob/main/internal/controller/phases.go#L220-L224

Environment:

  • Cluster-api-operator version: 0.4.0
  • Cluster-api version:
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]

Feature gates are not injected into manager pod

What steps did you take and what happened:

When I want to set custom feature gates for a provider manager, they are ignored

Example:

apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: CoreProvider
metadata:
  name: cluster-api
spec:
  version: v1.4.3
  manager:
    featureGates: 
      MachinePool: true
      ClusterResourceSet: true
      ClusterTopology: true
      RuntimeSDK: false
      LazyRestmapper: false    

For this manifest, the operator doesn't modify the feature gates and leaves the default values, i.e. --feature-gates=MachinePool=false,ClusterResourceSet=false,ClusterTopology=false,RuntimeSDK=false,LazyRestmapper=false

What did you expect to happen:

Feature gates should be updated like --feature-gates=MachinePool=true,ClusterResourceSet=true,ClusterTopology=true,RuntimeSDK=false,LazyRestmapper=false

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-operator version:
  • Cluster-api version:
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-operator/labels?q=area for the list of labels]
