clusters-config's People

Contributors

ahmedsa-mir, enekofb, foot, hagarmagdyy, mohamedmsaeed, morancj, taghreed86, waleedhammam, weaveworks-admin-bot

Forkers

microscaler

clusters-config's Issues

Deploy flagger with WGE

  • Deploy flagger with WGE by default
  • Deploy a service mesh (Istio or Linkerd), whichever one is recommended.

Configure access for pentesters to the clusters

The pentesters will need to be able to access gitops on the clusters in order to run their tests. This can either be done via kubectl port-forward or DNS (I'd suggest the former)

  • Provide them an IAM Role (assuming that AWS is used) that they can assume with an external id configured
  • Grant that role permissions to eks:AccessKubernetesApi
    • This page lists other permissions required for using the AWS console. I don't think they're needed for purely kubectl based access
  • Add the role to aws-auth via eksctl create iamidentitymapping
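
The steps above could be sketched roughly as follows. This is only an illustration: the function names, the account ID, external ID, role name, cluster name, and Kubernetes group are all placeholders, and trusting the pentesters' account root as the principal is an assumption rather than something the issue specifies.

```shell
#!/bin/sh
# Sketch only: all names and IDs passed to these functions are placeholders.

# Print a trust policy that lets the pentesters' AWS account assume the role,
# but only when the agreed external ID is supplied.
trust_policy() {
  pentester_account_id="$1"
  external_id="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${pentester_account_id}:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "${external_id}" } }
    }
  ]
}
EOF
}

# Print a permissions policy that grants only eks:AccessKubernetesApi.
access_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "eks:AccessKubernetesApi",
      "Resource": "*"
    }
  ]
}
EOF
}

# Create the role, attach the policy, and map the role into aws-auth.
# The Kubernetes group must already be bound to suitable RBAC permissions.
grant_pentester_access() {
  role_name="$1"; pentester_account_id="$2"; external_id="$3"
  cluster="$4"; region="$5"; k8s_group="$6"

  role_arn="$(aws iam create-role \
    --role-name "$role_name" \
    --assume-role-policy-document "$(trust_policy "$pentester_account_id" "$external_id")" \
    --query 'Role.Arn' --output text)" || return 1

  aws iam put-role-policy \
    --role-name "$role_name" \
    --policy-name eks-access-kubernetes-api \
    --policy-document "$(access_policy)" || return 1

  eksctl create iamidentitymapping \
    --cluster "$cluster" --region "$region" \
    --arn "$role_arn" --username pentester --group "$k8s_group"
}
```

Nothing runs until grant_pentester_access is called, so the two policy documents can be printed and reviewed first.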

Delete extra AWS resources created by cluster resources on deleting the cluster

Some cluster components, such as NGINX, create extra AWS resources (for example, load balancers) when they are deployed on the Kubernetes clusters. We need to delete these resources in the delete-cluster.sh script before deleting the cluster. Otherwise, cluster deletion will either fail or succeed and leave orphaned AWS resources.
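
A minimal sketch of such a cleanup step, assuming kubectl already points at the cluster that is about to be deleted. The function names are hypothetical and are not taken from delete-cluster.sh. It only covers Services of type LoadBalancer; other AWS resources created by cluster components would need their own handling.

```shell
#!/bin/sh
# Sketch only: assumes the current kubectl context is the cluster being deleted.

# Print "namespace name" for every Service of type LoadBalancer.
list_loadbalancer_services() {
  kubectl get services --all-namespaces \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}'
}

# Delete those Services while the cluster is still running, so that the
# controller that created the AWS load balancers gets a chance to remove them.
delete_loadbalancer_services() {
  list_loadbalancer_services | while read -r namespace name; do
    # Skip blank lines in the listing.
    [ -n "$name" ] || continue
    kubectl delete service "$name" --namespace "$namespace" --wait=true
  done
}
```

Calling delete_loadbalancer_services before the cluster deletion command is the intended order, since the controllers that clean up the load balancers disappear with the cluster.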

Add workload components to the Weave GitOps Enterprise pen-testing environment

I would like to request adding the following to the Weave GitOps Enterprise pen-testing environment:

  • A leaf cluster, so we can test management-to-leaf-cluster scenarios. CAPI will not be tested, so creating the cluster manually is good enough. Similar to this
  • Flagger in the leaf cluster, to test progressive delivery journeys. An example of a leaf cluster using flagger can be found here
  • Policy in the leaf cluster, so we can test policy journeys. An example of a leaf cluster using policy can be found here

Notes:

  • The new pen-testing date is October 24th.

Error in accessing leaf clusters (AccessDenied) when calling the AssumeRole

How to reproduce

  • Follow the steps for creating a new cluster.

  • Add the leaf CAPI cluster provisioning template to the cluster and push it.

  • Create a leaf cluster from the frontend; it is provisioned successfully.

  • Download the leaf cluster kubeconfig via eksctl. The following command works, and the cluster is available in the context list:

➜ eksctl utils write-kubeconfig --region eu-north-1 --cluster default_leaf-sm-control-plane --kubeconfig=~/.kube/config
➜  wge k config get-contexts                
CURRENT   NAME                                                                        CLUSTER                                                                     AUTHINFO                                                                    NAMESPACE
*         arn:aws:eks:eu-north-1:894516026745:cluster/default_leaf-sm-control-plane   arn:aws:eks:eu-north-1:894516026745:cluster/default_leaf-sm-control-plane   arn:aws:eks:eu-north-1:894516026745:cluster/default_leaf-sm-control-plane   

  • Trying to access the cluster fails with the following error:

➜  wge k get pod                            

An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::894516026745:assumed-role/WeaveEksEditor/[email protected] is not authorized to perform: sts:AssumeRole on resource: arn:aws:sts::894516026745:assumed-role/WeaveEksEditor/[email protected]

It's trying to assume the same role it already has.

Other Trials

  • Downloaded the kubeconfig manually and tried to authenticate locally with the AWS CLI and aws-iam-authenticator by following this doc; this produces the same error.

Check CloudWatch dashboard for resources utilization

We need to monitor the resource utilization (CPU, memory, etc.) of our clusters to check how many resources they require and whether we should change the instance type.
CloudWatch might have dashboards that can help with monitoring resources.

Enable notifications on slack

Enable a webhook on the "#clusters-config" Slack channel and configure GitHub Actions to send notifications to it.

Default policies cause errors on startup

The default policies cause cluster provisioning to fail because they block the CAPI initialization step. They also block the installation of Helm releases such as flagger and nginx, because those releases violate the default policies. We should make sure the default policies do not cause any errors at startup, either by changing them or by deleting the policies that cause violations.

Modify CAPA template

Currently, the CAPA template we have creates EKS clusters with 3 NAT gateways, which costs a lot. We need to find a way to reduce them to a single NAT gateway, or stop using EKS and use self-managed k8s clusters.

Templates can be found here

Upgrade WGE on demand

As part of WGE upgrade testing, we need to make it easier for engineers to spin up clusters with older versions of WGE and then upgrade to the latest version with a single command to test the upgrade process.

Fix the provisioning script for Mac laptops

Running the scripts on a Mac creates duplicate files. See this commit

This happens because of the sed command. We use sed -i '' "s/old/new/g" file

If the issue is not fixable, we need to delete the duplicate files in the script.
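
One possible way to sidestep the problem is to avoid sed -i altogether, since BSD sed (macOS) and GNU sed (Linux) handle the argument to -i differently. The sketch below is an assumption about a workable fix, not a diagnosis of the duplicate files; replace_in_file is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: in-place replacement without `sed -i`, so the same code
# behaves identically with BSD sed and GNU sed and leaves no backup files.
replace_in_file() {
  expression="$1"
  file="$2"
  tmp="$(mktemp "${TMPDIR:-/tmp}/replace.XXXXXX")" || return 1
  if sed "$expression" "$file" > "$tmp"; then
    # Write back through the original path so its permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would then look like replace_in_file "s/old/new/g" file.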

Check out main before requesting a cluster

Alert the user and exit the script if the current branch is not main.

The idea is to make sure that engineers have the latest updates and prevent having multiple clusters in one branch
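
A minimal sketch of such a guard, assuming it runs at the top of the request script inside the repository checkout. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: refuse to continue unless the checked-out branch is main.
require_main_branch() {
  current_branch="$(git rev-parse --abbrev-ref HEAD)" || return 1
  if [ "$current_branch" != "main" ]; then
    echo "Error: current branch is '$current_branch'; check out main before requesting a cluster." >&2
    return 1
  fi
}
```

In the script this would be used as require_main_branch || exit 1. Note that it only checks the branch name; making sure main is up to date would additionally need a git pull.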

Add CAPI for AWS

Enable CAPI by default for created clusters.
The templates will create leaf clusters on AWS.

Install/configure common components of the pentest clusters

The two pentest clusters (#12) need common configuration installed on them:

Expose WGE/Gitops on a domain

We need to:

  • Expose the WGE/Gitops service on a domain using ingress. This would make it easier for developers to work with the cluster than port-forwarding the services.
  • Estimate the cost for the domain.
  • Think about how Dex would be handled in this scenario.

Deploy policy-agent with WGE

  • Deploy policy agent with WGE by default.
  • Deploy some policies
  • Deploy an app "podinfo" that violates some of the policies.

Change the cluster creation options after provisioning the cluster

Give the engineer the ability to change the cluster creation options after the cluster has been provisioned.

Use case: an engineer provisioned a cluster without the --enable-flagger option and now needs to install flagger. Instead of making them do it manually, the script will copy the needed files to their cluster directory.

Options:

  1. --enable-flagger => to enable flagger
  2. --add-capi-templates => to add CAPI templates to the cluster.
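
The option handling could be sketched as below. Only the two flag names come from the list above; the function and variable names are hypothetical, and the step that copies the files into the cluster directory is left out because its paths are not described here.

```shell
#!/bin/sh
# Sketch only: record which optional components were requested.
parse_options() {
  enable_flagger=false
  add_capi_templates=false
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --enable-flagger) enable_flagger=true ;;
      --add-capi-templates) add_capi_templates=true ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}
```

After parse_options "$@" succeeds, the script can test $enable_flagger and $add_capi_templates to decide which files to copy.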

Enable versioning in WGE installation

We should enable users to specify which version of WGE to install on the cluster. We should also give them the option to specify a branch to install WGE from.

Add WW-roles to aws-auth if flux failed

Sometimes flux fails to reconcile, which causes provision-cluster.sh to fail before the admin role has been granted access to the cluster.

We need to add a job to make sure that access is granted even if flux fails and the whole pipeline fails.

Notifying cluster owner when the cluster is to be deleted

Currently we send Slack notifications about clusters that are due to be deleted in the next few days. We need to notify the cluster owner as well, so that they can either let the cluster be deleted if they no longer need it or extend the cluster TTL using the extend-cluster-ttl.sh script.

To implement this, we would need to:

  • Add a mandatory owner tag on requesting a cluster. The value of the owner should be the owner's slack ID.
  • Tag the owner in the cluster status slack message.
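
A rough sketch of how the owner could be tagged, assuming the notification goes through a Slack incoming webhook whose URL is stored in a SLACK_WEBHOOK_URL variable. That variable, the function names, and the message wording are assumptions. In Slack message text, <@ID> mentions the user with that ID.

```shell
#!/bin/sh
# Sketch only: build and send a message that mentions the cluster owner.
# Inputs are assumed not to contain double quotes or backslashes, because
# no JSON escaping is done here.
cluster_expiry_payload() {
  cluster_name="$1"
  owner_slack_id="$2"
  days_left="$3"
  printf '{"text":"<@%s> cluster %s will be deleted in %s day(s). Run extend-cluster-ttl.sh to keep it."}\n' \
    "$owner_slack_id" "$cluster_name" "$days_left"
}

# Post the payload to the webhook URL held in SLACK_WEBHOOK_URL (assumed).
notify_owner() {
  curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(cluster_expiry_payload "$@")" "$SLACK_WEBHOOK_URL"
}
```

The owner's Slack ID would be read from the mandatory owner tag on the cluster.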

Manage clusters through branches

As we have decided to keep each cluster in a separate branch, we need to:

  • Automatically create a branch for each cluster request as a part of running request-cluster.sh script. The branch name = cluster name.
  • First check whether the branch already exists (i.e. the cluster has already been created).
  • Replace the branch name placeholder in eks-cluster-config.yaml file.
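
These steps might look like the sketch below. The function name, the remote name origin, and the placeholder string BRANCH_NAME_PLACEHOLDER are assumptions; the actual placeholder in eks-cluster-config.yaml is not given here.

```shell
#!/bin/sh
# Sketch only: create a branch named after the cluster and fill in the config.
create_cluster_branch() {
  cluster_name="$1"
  config_file="$2"

  # ls-remote exits non-zero with --exit-code when no matching ref exists.
  if git ls-remote --exit-code origin "refs/heads/${cluster_name}" >/dev/null 2>&1; then
    echo "Error: branch '${cluster_name}' already exists, so this cluster was already requested." >&2
    return 1
  fi

  git checkout -b "$cluster_name" || return 1

  # BRANCH_NAME_PLACEHOLDER is a hypothetical placeholder string.
  # Cluster names are assumed not to contain '|'.
  tmp="$(mktemp "${TMPDIR:-/tmp}/cluster-config.XXXXXX")" || return 1
  sed "s|BRANCH_NAME_PLACEHOLDER|${cluster_name}|g" "$config_file" > "$tmp" \
    && cat "$tmp" > "$config_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The check runs against the remote rather than local branches, on the assumption that another engineer may already have pushed a branch with the same name.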

Check EKS with Spot instances clusters

Spot instances can reduce cost to about 70%. We can build EKS on Spot instances but it has its drawbacks, so we need to figure out pros and cons of this option.

Scale down pentesting clusters

The pentesters have finished using the environments, but they would like them frozen until they finish writing up the report (in case they need to come back to them).

If it is not too costly in terms of time, I think we could just scale down the nodes.

Thanks!

Github action to provision the cluster

The pipeline should run only when the branch is created. It provisions the cluster and installs any components needed to run weave-gitops-enterprise.

Add `--enable-policies` flag on requesting a new cluster

We should add an option to install policies on a newly created cluster for teams who need to work with policies and policy violations. We can add a --enable-policies option in the request-cluster script to install a couple of policies with the cluster.

Update docs

  • AWS auth section needs to include an extra step: export AWS_PROFILE=sts, otherwise it fails to authenticate.
  • Add a note in the request-cluster doc that the user should wait until the cluster is provisioned before getting the kubeconfig.

Add WGE components templates

As we will install WGE, we need to provide templates for the most used items in WGE, such as profiles, policies, cluster templates, etc.

Create two pentesting clusters

We need two dedicated kubernetes clusters for the pentesters to work on.

  • Ideally these clusters should be hosted in AWS (as that's our clients' most common cloud platform).
  • Creation of these clusters should be as automated as possible.
  • Nodes should be private
  • Should have Flux installed
