
ocm's People

Contributors

clyang82, deads2k, dependabot[bot], dhaiducek, dongbeiqing91, elgnay, haoqing0110, haowells, ivan-cai, ldpliu, mdelder, mikeshng, morvencao, mprahl, nitishchauhan0022, openshift-ci[bot], openshift-merge-robot, pdettori, pmorie, qiujian16, rokibulhasan7, serngawy, skeeey, suigh, tomerfi, xuezhaojun, ycyaoxdu, yue9944882, zhiweiyin318, zhujian7


ocm's Issues

improve the OCM development guide

OCM already has some development documentation, such as the Architecture and Concepts pages on the website (https://open-cluster-management.io/concepts/architecture/), as well as development guides in each repo.

The feedback we receive from users who want to contribute is that:

  • I want to know the features/functions of the components and how they relate to each other.
  • After reading the website docs, I find it hard to understand the relationship between the components.
  • There are so many repos. Which one is related to my requirement?
  • As a new dev who wants to contribute to OCM, is there an easy one for me to start with?

A better development guide could help to improve the user experience.

This issue was created to discuss the topic further.
More ideas and suggestions are welcome in the comments.

Hub cluster HA solution proposal

The scenario encountered in a production environment is as follows:

  1. Hub cluster hc1 manages several managed clusters, and many manifest works and addons are deployed.
  2. Hub cluster hc2 is installed as a backup.
  3. Hub cluster hc1 encounters a critical error from which it cannot recover in a short time.
  4. We must join the managed clusters to hub cluster hc2 and recover the OCM resources in hub cluster hc2.

The proposed hub cluster HA architecture is shown below:
[image: hub cluster HA solution architecture diagram]

Of course, OCM resources such as manifest works deployed in managed clusters should save their metadata in annotations or somewhere else.

Is this solution feasible?

Need more documentation for OCM troubleshooting

I have been learning OCM recently, and the features are nice. We are interested in using OCM in our project to support multi-cluster communication. During my experiments and troubleshooting, I found it difficult to understand the underlying meaning of error messages and to find fixes for the issues. Luckily, the contributors and developers of OCM are very knowledgeable and helpful. Still, a well-documented troubleshooting guide would save developers time and provide a smoother user experience.

Not able to disable/delete a ManagedClusterAddon

When a user installs the cluster-proxy, managed-serviceaccount and cluster-gateway addons, the managed cluster addon resources are automatically created for the registered clusters. If the addons are deployed to clusters foo and bar, is it possible to disable or delete the addons for cluster bar only? Currently, when the managed cluster addons in the cluster bar namespace are deleted, they come back.

$ kubectl get managedclusteraddon -n $NSC1
NAME                     AVAILABLE   DEGRADED   PROGRESSING
cluster-gateway          True
cluster-proxy            True
managed-serviceaccount   True
$ kubectl -n $NSC1 delete managedclusteraddon cluster-gateway
managedclusteraddon.addon.open-cluster-management.io "cluster-gateway" deleted

$ kubectl get managedclusteraddon -n $NSC1
NAME                     AVAILABLE   DEGRADED   PROGRESSING
cluster-gateway          True
cluster-proxy            True
managed-serviceaccount   True

Discovered by @yitiangf

CC @qiujian16

Create a solution scenario on how placement is used in OCM

We should have solution scenarios to describe:

  • How users can extend placement scheduling.
  • How users can use placement to schedule their workloads.
  • How to use placement and other open source tools for workload or storage disaster recovery.
  • How to do cluster maintenance and its implications for placement.

Add new placement condition type and reason for configuration error and schedule failure

The requirement comes from open-cluster-management-io/placement#51 (comment) when implementing placement extensible scheduling.

The placement status needs a new condition type and reason for configuration errors and scheduling failures.

[task] Progress tracker for translating the documentation into Chinese

The Chinese translation of the documentation is not finished yet, and this issue is used to track the progress.

How to contribute:

If you're interested in translation work, please make a [work-in-process] PR and note this issue in that PR.

This will help others know which part of the document translation is ongoing.

improve the user scenarios on the website

We now have some user scenarios under https://open-cluster-management.io/scenarios/, which may not be enough.

We also have the feedback below from users, which can help improve the user scenarios:

  • I want to know how OCM is used in the real world.
  • Can OCM dispatch jobs in a multi-cluster environment? It is not clear what it can do and how.
  • I need some detailed usage of the core functions (such as a blog for that).
  • To deploy an application on multiple clusters, I would want an easy approach.
  • How does it integrate with apps or other middleware, e.g. kubeflow?

This issue was created to discuss the topic further.
More ideas and suggestions are welcome in the comments.



Can I create work by using native K8s API?

I want to create work such as a Deployment by using the native K8s API rather than a ManifestWork with a template in it, just like KubeFed v2, Karmada or Clusternet do. Is it possible?

failed to sync "cluster-manager", err: unsupport install mode:

Following the instructions at https://open-cluster-management.io/getting-started/quick-start/#setup-a-local-kind-environment, I started two kind clusters. On the hub cluster, the following error can be observed in the cluster-manager pod:

I0216 23:45:37.432366       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management", Name:"open-cluster-management", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CustomResourceDefinitionUpdated' Updated CustomResourceDefinition.apiextensions.k8s.io/addonplacementscores.cluster.open-cluster-management.io because it changed
E0217 09:19:01.780126       1 base_controller.go:251] "CRDMigrationController" controller failed to sync "cluster-manager", err: unsupport install mode: 
I0217 09:19:01.780137       1 clustermanager_status_controller.go:62] Reconciling ClusterManager "cluster-manager"
I0217 09:19:01.780160       1 certrotation_controller.go:157] Reconciling ClusterManager "cluster-manager"
I0217 09:19:01.818821       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management", Name:"open-cluster-management", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CustomResourceDefinitionUpdated' Updated CustomResourceDefinition.apiextensions.k8s.io/clustermanagementaddons.addon.open-cluster-management.io because it changed

On the hub cluster, this happened when accepting the cluster join request:

>  ~  /usr/local/bin/clusteradm accept --clusters cluster1
Error: no CSR to approve for cluster cluster1

The environment is RHEL 8.5 + kind + podman.

Could you please provide some suggestions?

Enhance clusterset delete policy

We need to enhance the clusterset deletion policy.
For example, when a clusterset is deleted:

  1. Delete the related resources in the clusterset.
  2. Remove the clusterset label from the managedclusters.
  3. Block the clusterset deletion if there are still resources in the set.
    ...

adding a time to live (TTL) option for managed-serviceaccount

I have been experimenting with the managed-serviceaccount project from OCM and found it to be incredibly useful.

For my use case, I leverage the managed-serviceaccount controller to create an on-demand service account, then use ManifestWork to create a role/rolebinding on the managed cluster to grant my service account RBAC permissions.

The managed-serviceaccount and the ManifestWork are short-lived and are deleted when my workflow is done, but sometimes my workflow exits prematurely and the cleanup is not performed (I know it's my own problem, but...).

It would be nice if the managed-serviceaccount resource could have an expiration time, so that after expiration the managed-serviceaccount is deleted automatically.

The Placement and ManagedClusterSets solution may not be good

I suggest making selector or clusterSelector part of the resources, to match clusters for applications, rather than using a standalone placement object. The implementation of Placement and ManagedClusterSets may not be flexible enough.

Here are some reasons.

  1. Clusters are resources that are dynamically selected and matched depending on the demands of specific applications. We can't define one placement in advance. For example, we create a placement and later create a manifest work that requires GPU resources, but the placement does not include a GPU score configuration. Then we modify the placement and recreate the manifest work. A few days later, the manifest work goes through a rolling update that adds a large volume requirement, so we modify the placement again and recreate the manifest work. It may be a disaster.
  2. Consider this scenario: a manifest work set is deployed with a placement that sets numberOfClusters to 3. Managed clusters A, B and C are selected in the PlacementDecision. When the manifest work in managed cluster B is in error and cannot become healthy again (maybe because of a lack of GPU resources or other problems), the PlacementDecision should remove managed cluster B and select a new cluster. In the process, a lot of objects (ManagedClusterSet, ManagedClusterSetBinding, Placement, PlacementDecision, ManifestWork) are created, watched and updated. The process could be simpler and clearer with selector or clusterSelector.

Klog flags disappeared because of component-base updates.

Issue:

We currently use the following pattern to set up logs in many repos:

	pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
	pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)

	logs.InitLogs()
	defer logs.FlushLogs()

But it seems the klog flags are missing with the latest version:

➜  helloworld git:(f3917ca) ✗ ./helloworld controller --help
Start the addon controller

Usage:
  addon controller [flags]

Flags:
      --config string                    Location of the master configuration file to run from.
  -h, --help                             help for controller
      --kubeconfig string                Location of the master configuration file to run from.
      --listen string                    The ip:port to serve on.
      --namespace string                 Namespace where the controller is running. Auto-detected if run in cluster.
      --terminate-on-files stringArray   A list of files. If one of them changes, the process will terminate.

The output above should include klog flags such as --v.

Reason

The reason may be that the latest version of component-base uses a customized FlagSet:

https://github.com/kubernetes/component-base/blob/30d23418100a70c7ea34979e1fe87da620b84025/logs/logs.go#L53

Solution

A way to solve this issue could be:

	pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
	pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)

	logs.AddFlags(pflag.CommandLine) // Add the component-base logging flags to the default FlagSet.

	logs.InitLogs()
	defer logs.FlushLogs()

NOTICE: Renaming planned for similar GitHub organization

This issue serves as a notice that the similarly named open-cluster-management organization is planning to move to a new GitHub organization in order to remove naming confusion with open-cluster-management-io. The owners of open-cluster-management have agreed to move and have started preparations. We expect the move to be completed by mid-2022 and will keep this issue up to date with any new developments.

ArgoCD Integration - Pull Model

The current OCM and ArgoCD integration is primarily a push model, which means the ArgoCD controller on the hub cluster needs to communicate directly with the target managed clusters. This approach does not use the secured communication channel that OCM establishes between hub and spoke clusters after registration.

We propose implementing an ArgoCD integration pull model that leverages the OCM registration, placement and manifestwork APIs. We will introduce a new CRD called MulticlusterApplicationSet that will evaluate cluster placements and create manifestworks wrapping the ArgoCD application template for the target clusters. Once the ArgoCD application template lands on the target clusters, the ArgoCD controller will evaluate the template and deploy the application.
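
The wrapping step of the proposed pull model could look roughly like the sketch below. The types are simplified stand-ins rather than the real OCM or ArgoCD API types, and `wrapForClusters` is a hypothetical helper. The only OCM behavior relied on is that a ManifestWork on the hub lives in the namespace of its target cluster.

```go
package main

import "fmt"

// ApplicationTemplate is a simplified, hypothetical stand-in for an ArgoCD
// application template; only the fields needed for the sketch are present.
type ApplicationTemplate struct {
	Name    string
	RepoURL string
}

// ManifestWork is a simplified stand-in for the OCM ManifestWork resource.
// On the hub, a ManifestWork lives in the namespace named after its target cluster.
type ManifestWork struct {
	Namespace string
	Name      string
	Payload   ApplicationTemplate
}

// wrapForClusters builds one ManifestWork per selected cluster, each carrying
// the same application template as its payload.
func wrapForClusters(tpl ApplicationTemplate, clusters []string) []ManifestWork {
	works := make([]ManifestWork, 0, len(clusters))
	for _, cluster := range clusters {
		works = append(works, ManifestWork{
			Namespace: cluster,
			Name:      tpl.Name,
			Payload:   tpl,
		})
	}
	return works
}

func main() {
	tpl := ApplicationTemplate{Name: "guestbook", RepoURL: "https://example.com/repo.git"}
	// The cluster names stand for the result of a placement decision.
	for _, w := range wrapForClusters(tpl, []string{"cluster1", "cluster2"}) {
		fmt.Printf("%s/%s\n", w.Namespace, w.Name)
	}
}
```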

OCM working groups

It's a common practice to organize the community into working groups.

The same applies to OCM: organizing the community into a set of working groups could improve the experience of users who want to join and contribute, and help community development.

This issue was created to discuss the topic further.
More ideas and suggestions are welcome in the comments.

Add Test Coverage to improve code quality.

It would be better to have test coverage checks to improve our code quality.

There are several things we could do:

  1. Add a GitHub Actions check so that new PRs meet the coverage requirement.
  2. Set a goal (for example, 70%) for further code-refactoring tasks.
  3. etc.

Add deleteOptions to Managedcluster

I suggest adding a DeleteOption to ManagedCluster to implement the following functions:

  1. Set ClearPolicy in DeleteOption: the manifestworks, addons, etc. of the managed cluster are then force-deleted when running kubectl delete managedcluster.
  2. Set RetentionPolicy in DeleteOption: the manifestworks, addons, etc. of the managed cluster are then retained when running kubectl delete managedcluster.
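
The two policies above could be modeled roughly as follows. Everything in this sketch is hypothetical: the type names, the constant values and `resourcesToDelete` are illustrations of the proposal, not part of the OCM API. Treating an unset or unknown policy like retention is an assumption chosen here because it is the less destructive default.

```go
package main

import "fmt"

// DeletePolicy is a hypothetical enum mirroring the two policies proposed
// in this issue.
type DeletePolicy string

const (
	ClearPolicy     DeletePolicy = "Clear"
	RetentionPolicy DeletePolicy = "Retention"
)

// DeleteOption is a hypothetical field that a ManagedCluster could carry.
type DeleteOption struct {
	Policy DeletePolicy
}

// resourcesToDelete returns the dependent resources that should be removed
// when the managed cluster is deleted. Only ClearPolicy removes anything;
// any other value retains the dependents.
func resourcesToDelete(opt DeleteOption, dependents []string) []string {
	if opt.Policy == ClearPolicy {
		return dependents
	}
	return nil
}

func main() {
	dependents := []string{"manifestwork/app", "managedclusteraddon/cluster-proxy"}

	fmt.Println(len(resourcesToDelete(DeleteOption{Policy: ClearPolicy}, dependents)))     // 2
	fmt.Println(len(resourcesToDelete(DeleteOption{Policy: RetentionPolicy}, dependents))) // 0
}
```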

This requirement was encountered in a production environment.

Please consider whether it is reasonable. Thanks.

Multiple Release EPIC: Upgrade clusterset API to v1beta2, use the "exclusiveClusterSetLabel" as default for clusterSetType, and deprecate the "legacyclusterset"

By introducing a new version of the clusterset API (v1beta2), we can change the migration path as follows.

Release 0.9.0

Release 0.10.0

  • upgrade clusterset api to v1beta2 for each component

  • Other clusterset consumers (external consumers) must upgrade to clusterset api v1beta2

Release 0.11.0

  • Migrate storage version of clusterset api to v1beta2

Release 0.12.0

  • Remove clusterset api v1beta1

How should I use placement and ManagedClusterSet

Following the example, I have created a ManagedClusterSet and a placement, and the PlacementDecision also got the corresponding result. But I don't know how to use the placement and the PlacementDecision. I created a manifestwork under the namespace corresponding to the ManagedClusterSet, but nothing happened on the corresponding cluster. The same is true for creating a workload. Can someone tell me how to use ManagedClusterSet and placement?

run ocm control plane as a standalone binary

It would be interesting to run the OCM control plane in standalone mode. This would involve integrating the apiserver, controller and etcd into a single binary. We should also disable unused APIs on the apiserver (e.g. nodes, pods). We can start with a prototype and check how kcp does this.

cc @yue9944882 @ycyaoxdu
/kind feature
