open-cluster-management-io / ocm

Core components in the OCM project. Report here if you found any issues in OCM.

Home Page: https://open-cluster-management.io

License: Apache License 2.0

Shell 0.56% Makefile 0.56% Go 98.72% Python 0.12% Smarty 0.04%

ocm's People

Contributors

mikeshng qiujian16 p4ali nyechiel yue9944882 lonelycz ogaye-ibm apertus-dev xkcoding luobo911 zhm022726 ycyaoxdu jiahaowei-rh haoqing0110 tomerfi morvencao zhujian7 chunhuihu zmberg xuezhaojun ilonashishov chlam4 xajxiang deepsm007 geek3carl maoyangliu ouyangxiaochen sanyu isabella232 dockerymick reetika-vyas elgnay chenz4027 maximsava12 clyang82 eternalerrors keremceliker yanmxa mprahl goga1992 b33pl0g1c lkfadeaway sali2801 youhangwang o-farag skeeey kluster-manager stolostron zhiweiyin318 johsonluo haojue nitishchauhan0022 ldpliu cloudgeek7 anandf serngawy kannon92 rudyares dhaiducek aidenpan0x juner417 step-security-bot zhujian-org haowells grdryn aii-nozomu-oki minatoaquamk2 dongbeiqing91 juaby benzaidfoued emysaiki andreyod imryao giantank nirs ohkinozomu dtclxy64 tamalsaha rokibulhasan7 jaswalkiranavtar clubanderson joaobravecoding jwjwjw3

ocm's Issues

[task] Add managed-serviceaccount addon deploy scenario

Add a managed-serviceaccount addon deploy scenario.
Refer to https://github.com/open-cluster-management-io/managed-serviceaccount

How to apply a workload to many clusters? Do I have to create multiple ManifestWorks?

If I want to apply a deployment to clusters a and b, do I have to create two ManifestWorks, one for cluster a and one for cluster b? And if I want to update the workload, do I have to update every ManifestWork that carries it?

improve the OCM development guide

OCM already has some development documentation, such as the Architecture and Concepts pages on the website (https://open-cluster-management.io/concepts/architecture/),
as well as development guides in each repo.

The feedback we receive from users who want to contribute is:

I want to know the features and functions of the components, and how they relate to each other.
After reading the website docs, I still find it hard to understand the relationships between the components.
There are so many repos. Which one is related to my requirement?
As a new developer who wants to contribute to OCM, is there an easy repo for me to start with?

A better development guide could help to improve the user experience.

Create this issue to further discuss this topic.
More ideas and suggestions are welcomed to comment on this issue.

How to do cluster maintenance and its implication on placement

Subtask of #33

Update `clusteradm install hub-addon` to wrap Helm chart repo

Currently clusteradm install hub-addon uses YAML manifests hosted in the clusteradm repo to deploy hub addon components. This is undesirable for maintainability. With the new https://github.com/open-cluster-management-io/helm-charts repo, the hub-addon command should be updated to deploy helm charts from this repo directly.

Related issue:

open-cluster-management-io/community#136

Hub cluster HA solution proposal

The scenario encountered in a production environment is:

Hub cluster hc1 manages some managed clusters, with many manifest works and addons deployed.
Hub cluster hc2 is installed as a backup.
Hub cluster hc1 encounters a critical error from which it cannot recover in a short time.
We must join the managed clusters to hub cluster hc2 and recover the OCM resources in hub cluster hc2.

[Figure: proposed hub cluster HA architecture]

Of course, OCM resources such as manifest works deployed in managed clusters should save their metadata in annotations or somewhere else.

Is this solution feasible?

Need more documentation for OCM troubleshooting

I have been learning OCM recently, and the features are nice. We are interested in using OCM in our project to support multi-cluster communication. During my experiments and troubleshooting, I found it difficult to understand the underlying meaning of error messages and to find fixes for the issues. Luckily, the contributors and developers of OCM are very knowledgeable and helpful. Still, a well-documented troubleshooting guide would save developers time and provide a smoother user experience.

A brief walk-through of how to write an addon based on addon-framework.

We should have a handbook that tells users how to develop an addon using addon-framework.

/assign @yue9944882
/assign @zhiweiyin318

Provide submariner addon for OCM

Is anyone working on submariner addon for OCM?

I'm interested in submariner right now, maybe I can help contribute.

Not able to disable/delete a ManagedClusterAddon

When a user installs the cluster-proxy, managed-serviceaccount and cluster-gateway addons, the managed cluster addon resources are automatically created on the registered clusters. If the addons are deployed to clusters foo and bar, is it possible to disable or delete the addons for cluster bar only? Currently, when those managed cluster addons are deleted in cluster bar's namespace, they come back.

$ kubectl get managedclusteraddon -n $NSC1
NAME                     AVAILABLE   DEGRADED   PROGRESSING
cluster-gateway          True
cluster-proxy            True
managed-serviceaccount   True
$ kubectl -n $NSC1 delete managedclusteraddon cluster-gateway
managedclusteraddon.addon.open-cluster-management.io "cluster-gateway" deleted

$ kubectl get managedclusteraddon -n $NSC1
NAME                     AVAILABLE   DEGRADED   PROGRESSING
cluster-gateway          True
cluster-proxy            True
managed-serviceaccount   True

Discovered by @yitiangf

CC @qiujian16

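Are any companies already using OCM? Are there production practices?

Are any companies already using OCM, and are there any practices from production environments? Thanks.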

Upgrade clusterset api to v1beta2 in registration-operator

[task]Add cluster-proxy addon deploy scenario

Add a cluster-proxy addon deploy scenario.
Refer to https://github.com/open-cluster-management-io/cluster-proxy

provide a normalize library to generate scores for AddOnPlacementScore

In https://open-cluster-management.io/scenarios/extend-multicluster-scheduling-capabilities/ we talked about how to calculate and normalize the score for AddOnPlacementScore. It would be better to provide a normalization library to do this.

Ref: open-cluster-management-io/open-cluster-management-io.github.io#280 (comment)
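As a rough idea of what such a library could offer, the sketch below maps a raw measurement onto the -100 to 100 range accepted by AddOnPlacementScore using linear min-max scaling. The function name normalize and the choice of linear scaling with clamping are assumptions made for illustration; the issue does not prescribe a formula.

```go
package main

import (
	"fmt"
	"math"
)

const (
	minScore = -100.0 // lowest value an AddOnPlacementScore item may hold
	maxScore = 100.0  // highest value an AddOnPlacementScore item may hold
)

// normalize linearly maps raw from the interval [lo, hi] onto [-100, 100].
// Values outside the interval are clamped, and a degenerate interval
// (hi <= lo) yields the neutral score 0.
func normalize(raw, lo, hi float64) int32 {
	if hi <= lo {
		return 0
	}
	if raw <= lo {
		return int32(minScore)
	}
	if raw >= hi {
		return int32(maxScore)
	}
	scaled := minScore + (raw-lo)/(hi-lo)*(maxScore-minScore)
	return int32(math.Round(scaled))
}

func main() {
	// Hypothetical example: available CPU cores on three clusters, scaled
	// against an assumed range of 0 to 64 cores.
	for _, cores := range []float64{4, 32, 80} {
		fmt.Printf("cores=%v score=%d\n", cores, normalize(cores, 0, 64))
	}
}
```

Clamping keeps outliers from producing values the API would reject, at the cost of treating everything beyond the chosen bounds as equally good or bad.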

How do I run argoCD with OCM

We should have a solution scenario that describes how users can run ArgoCD on OCM to do GitOps across multiple clusters.

/assign @elgnay

Create a solution scenario on how is the placement used in OCM

We should have solution scenarios to describe:

How user can extend placement scheduling.
How user can use placement to schedule their workload.
How to use placement and other opensource tools to do workload or storage disaster recovery.
How to do cluster maintenance and its implication on placement

How do I run a multicluster service mesh with ocm

We should describe a solution scenario for enabling Istio with OCM to build a multicluster service mesh.

/assign @morvencao

Add new placement condition type and reason for configuration error and schedule failure

The requirement comes from open-cluster-management-io/placement#51 (comment) when implementing placement extensible scheduling.

In placement status, need to add new placement condition type and reason for configuration error and schedule failure.

Add condition type PlacementConditionMisconfigured in API open-cluster-management-io/api#132
Add condition type PlacementConditionMisconfigured in placement controller open-cluster-management-io/placement#72
doc how to debug placement in OCM website open-cluster-management-io/open-cluster-management-io.github.io#291

[task] The process tracker of translating document to Chinese

The Chinese translation of the documentation is not finished yet; this issue is used to track its progress.

How to contribute:

If you're interested in translation work, please make a [work-in-process] PR and note this issue in that PR.

This will help others know which part of the document translation is ongoing.

Does this project support k3s?

Update document for clusterset api v1beta2

How user can use placement to schedule their workload.

Subtask of #33

improve the user scenarios in website

We now have some user scenarios under https://open-cluster-management.io/scenarios/, but they may not be enough.

We have received the feedback below from users, which can help improve the user scenarios:

I want to know how OCM is used in the real world.
Can OCM dispatch jobs in a multi-cluster environment? It is not clear what it can do and how.
I need detailed usage examples for the core functions (such as a blog post).
I want an easy approach to deploying an application on multiple clusters.
How do I integrate with applications or other middleware, e.g. kubeflow?

Create this issue to further discuss this topic.
More ideas and suggestions are welcomed to comment on this issue.

Subtasks:

Can I create work by using native K8s API?

I want to create work such as a deployment by using the native K8s API directly, rather than a ManifestWork with a template in it, just like kubefedV2, karmada or clusternet do. Is it possible?

failed to sync "cluster-manager", err: unsupport install mode:

Following the instructions at https://open-cluster-management.io/getting-started/quick-start/#setup-a-local-kind-environment, I started two Kind clusters. In the hub cluster, the following error can be observed in the cluster-manager pod:

I0216 23:45:37.432366       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management", Name:"open-cluster-management", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CustomResourceDefinitionUpdated' Updated CustomResourceDefinition.apiextensions.k8s.io/addonplacementscores.cluster.open-cluster-management.io because it changed
E0217 09:19:01.780126       1 base_controller.go:251] "CRDMigrationController" controller failed to sync "cluster-manager", err: unsupport install mode: 
I0217 09:19:01.780137       1 clustermanager_status_controller.go:62] Reconciling ClusterManager "cluster-manager"
I0217 09:19:01.780160       1 certrotation_controller.go:157] Reconciling ClusterManager "cluster-manager"
I0217 09:19:01.818821       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management", Name:"open-cluster-management", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CustomResourceDefinitionUpdated' Updated CustomResourceDefinition.apiextensions.k8s.io/clustermanagementaddons.addon.open-cluster-management.io because it changed

And on the Hub cluster this happened while accepting cluster joining requests:

>  ~  /usr/local/bin/clusteradm accept --clusters cluster1
Error: no CSR to approve for cluster cluster1

The environment is RHEL 8.5 + kind + podman.

Could you please provide some suggestions?

Return an error if components are not enabled when using `clusteradm proxy xxx`

The clusteradm proxy subcommand depends on two addons: cluster-proxy-addon and managed-serviceaccount.
It should return an error when those two components are not installed.

Feature Request: Cluster API integration - automatic installation of klusterlet on managed clusters

Proposal

Currently you can use ClusterAPI to spin up new Kubernetes clusters. It has different provider implementations; however, most of them (I didn't check all) provide CA secrets to access the managed cluster. Given this, and with OCM running in the admin/hub cluster, it would be useful to automatically install the klusterlet agent on any new cluster and join it to the hub.

Enhance clusterset delete policy

We need to enhance the clusterset deletion policy.
For example, when a clusterset is deleted:

Delete the related resources in the clusterset.
Remove the clusterset label on the managedcluster.
Block the clusterset deletion if there are still resources in the set.
...

adding a time to live (TTL) option for managed-serviceaccount

I have been experimenting with the managed-serviceaccount project from OCM and found it to be incredibly useful.

For my use case, I leverage the managed-serviceaccount controller to create an on-demand service account, then use a ManifestWork to create a role/rolebinding on the managed cluster that grants my service account RBAC permissions.

The managed-serviceaccount and the ManifestWork are short-lived and are deleted when my workflow is done, but sometimes my workflow exits prematurely and the cleanup is not explicitly performed (I know it's my own problem, but...).

It would be nice if the managed-serviceaccount resource could have an expiration time, so that after expiration the managed-serviceaccount is deleted automatically.

Add `Access managed cluster` solution.

This task aims to integrate #18 with #17 to provide a more pragmatic solution for addon usage.

How to use placement and other opensource tools to do workload or storage disaster recovery.

Subtask of #33

How user can extend placement scheduling.

Subtask of #33

put extensible scheduling example repo into addon-contrib

https://github.com/JiahaoWei-RH/resource-usage-collect provides an example implementation of extensible placement scheduling. Consider moving it to placement.

Placement and ManagedClusterSets solution may not be good

I suggest making a selector or clusterSelector part of the resource itself to match clusters for an application, rather than using a standalone placement object. The implementation based on Placement and ManagedClusterSets may not be flexible enough.

Here are some reasons.

Clusters are selected and matched dynamically, depending on the demands of specific applications, so we can't define one placement in advance. For example, we create a placement, and later we create a manifest work that requires GPU resources, but the placement does not include a GPU score configuration. We then modify the placement and recreate the manifest work. A few days later, the manifest work gets a rolling update that adds a large volume requirement, so we modify the placement again and recreate the manifest work. This may end in disaster.
Consider this scenario: a manifest work set is deployed with a placement that sets numberOfClusters to 3. Managed clusters A, B and C are selected in the PlacementDecision. When the manifest work on managed cluster B fails and cannot become healthy again (maybe due to a lack of GPU resources or other problems), the PlacementDecision should remove managed cluster B and select a new cluster. In this process, a lot of objects (ManagedClusterSet, ManagedClusterSetBinding, Placement, PlacementDecision, ManifestWork) are created, watched and updated. With a selector or clusterSelector, the process could be simpler and clearer.

Klog flags disappeared because of component-base updates.

Issue:

We currently use the following pattern to set up logging in many repos:

    pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
    pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)

    logs.InitLogs()
    defer logs.FlushLogs()

But it seems the klog flags are missing in the latest version:

➜  helloworld git:(f3917ca) ✗ ./helloworld controller --help
Start the addon controller

Usage:
  addon controller [flags]

Flags:
      --config string                    Location of the master configuration file to run from.
  -h, --help                             help for controller
      --kubeconfig string                Location of the master configuration file to run from.
      --listen string                    The ip:port to serve on.
      --namespace string                 Namespace where the controller is running. Auto-detected if run in cluster.
      --terminate-on-files stringArray   A list of files. If one of them changes, the process will terminate.

The output above should include klog flags such as --v.

Reason

The reason for this issue may be that the latest version of component-base uses a customized FlagSet:

https://github.com/kubernetes/component-base/blob/30d23418100a70c7ea34979e1fe87da620b84025/logs/logs.go#L53

Solution

A way to solve this issue could be:

    pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
    pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)

    // Register component-base's logging flags (such as --v) on the default FlagSet.
    logs.AddFlags(pflag.CommandLine)

    logs.InitLogs()
    defer logs.FlushLogs()

NOTICE: Renaming planned for similar GitHub organization

This issue serves as a notice that the similarly named open-cluster-management is planning to move to a new GitHub organization in order to remove naming confusion with open-cluster-management-io. The owners of open-cluster-management have agreed to move and have started preparations. We expect the move to be completed by mid 2022 and will keep this issue up to date with any new developments.

Fix: `bitnami` repos have become unavailable.

Here are all the places where we rely on bitnami repos:
https://github.com/search?q=org%3Aopen-cluster-management-io+bitnami&type=code

We should remove all dependencies later.

Addon namespace created without any addon installed.

After deploying the klusterlet, we found that the open-cluster-management-agent-addon namespace had been created even though we had not installed any addons.

Is it possible to create this namespace only if addons exist?

ArgoCD Integration - Pull Model

The current OCM and ArgoCD integration is primarily a push model, which means the ArgoCD controller on the hub cluster needs to communicate directly with the target managed clusters. This approach does not use the secured communication channel that OCM establishes between the hub and spoke clusters after registration.

We propose implementing an ArgoCD integration pull model that leverages the OCM registration, placement and manifestwork APIs. We will introduce a new CRD called MulticlusterApplicationSet that evaluates cluster placements and creates manifestworks wrapping the ArgoCD application template for the target clusters. Once the ArgoCD application template lands on the target clusters, the ArgoCD controller will evaluate the template and deploy the application.

OCM working groups

It's common practice to organize a community into working groups.

The same applies to OCM: organizing into a set of working groups could improve the experience for users who want to join and contribute, as well as the development of the community.

Create this issue to further discuss this topic.
More ideas and suggestions are welcomed to comment on this issue.

Add Test Coverage to improve code quality.

It's better to have test coverage to improve our code quality.

There are several things we could do:

Add a GitHub Actions check that requires new PRs to meet the coverage requirement.
Set a goal (for example 70%) for further code-refactor tasks.
etc.

Add deleteOptions to Managedcluster

I suggest adding a DeleteOption to Managedcluster to implement the following functions:

Set ClearPolicy in DeleteOption: the manifestworks, addons, etc. of the managedcluster are then force-deleted when running kubectl delete managedcluster.
Set RetentionPolicy in DeleteOption: the manifestworks, addons, etc. of the managedcluster are then retained when running kubectl delete managedcluster.

This need came up in a production environment.

Please consider whether this is reasonable. Thanks.

We should have an example app that shows features in ocm

deploy an application.
use placement to change targeted cluster.
use cluster-proxy to check the application.

Multiple Release EPIC: Upgrade clusterset API to v1beta2, use the "exclusiveClusterSetLabel" as default for clusterSetType, and deprecate the "legacyclusterset"

By introducing a new version of clusterset api (v1beta2), we can change the migration path as below.

Release 0.9.0

Add exclusiveClustersetLabel selector type in clusterset api v1beta1
Add clusterset api v1beta2 to remove the support of legacyClustersetLabel selector type
Add conversion webhook to transform clusterset CRs between v1beta1 and v1beta2
Mark clusterset api v1beta1 as deprecated
update documentation for v1beta2

Release 0.10.0

upgrade clusterset api to v1beta2 for each component
- #74
- #75
- #76
Other clusterset consumers (external consumers) must upgrade to clusterset api v1beta2

Release 0.11.0

Migrate storage version of clusterset api to v1beta2

Release 0.12.0

Remove clusterset api v1beta1

Upgrade clusterset api to v1beta2 in placement

Upgrade clusterset api to v1beta2 in registration

Provide Helm charts for OCM components

Thanks to Min Kim's excellent work, there is now a centralized OCM repo containing OCM-related Helm charts: https://github.com/open-cluster-management-io/helm-charts

So far there are only some addon Helm charts. We should have Helm charts for the core components (cluster manager, klusterlet) as well as some other addon charts (app, grc).

This enhancement is related to #40

How should I use placement and ManagedClusterSet

Following the example, I created a ManagedClusterSet and a placement, and the PlacementDecision shows the corresponding result. However, I don't know how to use the placement and the PlacementDecision. I created a manifestwork in the namespace corresponding to the ManagedClusterSet, but nothing happened on the corresponding cluster. The same is true when creating a workload. Can someone tell me how to use ManagedClusterSet and placement?

addon customized configuration

proposal is here open-cluster-management-io/enhancements#58

story:

Users should be able to use an addon configuration API to set up the addon agent:
- nodeSelector/tolerations
- images
- environment variables.

/assign @skeeey

run ocm control plane as a standalone binary

It would be interesting to run the OCM control plane in standalone mode. This would involve integrating the apiserver, controllers and etcd in a single binary. We should also disable unused APIs on the apiserver (e.g. nodes, pods). We can start with a prototype and check how kcp does this.

cc @yue9944882 @ycyaoxdu
/kind feature
