weaveworks / clusters-config
Configuration for engineering's ephemeral clusters
The pentesters need to be able to access gitops on the clusters in order to run their tests. This can be done either via kubectl port-forward or via DNS (I'd suggest the former).
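Here is a minimal sketch of the port-forward option. It assumes a default Weave GitOps install, where the dashboard is exposed by a Service named ww-gitops-weave-gitops in the flux-system namespace on port 9001. Those names are assumptions, so confirm them with kubectl get svc -n flux-system. The helper name forward_gitops_ui is hypothetical.

```bash
# Hypothetical helper: forward the gitops dashboard Service to localhost.
# Defaults are assumed Service name/namespace/port; override via arguments.
# Usage: forward_gitops_ui [SERVICE] [NAMESPACE] [PORT]
forward_gitops_ui() {
  local svc="${1:-ww-gitops-weave-gitops}"
  local namespace="${2:-flux-system}"
  local port="${3:-9001}"
  # Maps localhost:$port to the Service's $port; blocks until interrupted.
  kubectl port-forward "svc/${svc}" -n "${namespace}" "${port}:${port}"
}
```

With the defaults, the UI would be reachable at http://localhost:9001 while the command runs. A weave-gitops-enterprise install serves its UI from a different Service, so pass that Service's name, namespace, and port explicitly.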
Grant access with the eks:AccessKubernetesApi permission and with kubectl-based access through the aws-auth ConfigMap, via eksctl create iamidentitymapping.
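The following sketch wraps the aws-auth step around eksctl create iamidentitymapping. The helper name and its defaults (group system:masters, username admin) are assumptions. Note that system:masters grants full cluster-admin rights, so for pentesters a narrower group bound through RBAC, such as the readonly role mentioned further down, may be more appropriate.

```bash
# Hypothetical helper: map an IAM role to a Kubernetes group in aws-auth.
# Usage: grant_cluster_access CLUSTER REGION ROLE_ARN [GROUP] [USERNAME]
grant_cluster_access() {
  local cluster="$1" region="$2" role_arn="$3"
  # Assumed defaults: full admin group and a generic username.
  local group="${4:-system:masters}" username="${5:-admin}"
  eksctl create iamidentitymapping \
    --cluster "$cluster" \
    --region "$region" \
    --arn "$role_arn" \
    --group "$group" \
    --username "$username"
}
```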
Some cluster resources, such as NGINX, create extra AWS resources (e.g. a LoadBalancer) when they are deployed on the Kubernetes clusters. The delete-cluster.sh script needs to delete these resources before deleting the cluster. Otherwise, cluster deletion either fails or succeeds but leaves orphaned AWS resources behind.
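The sketch below shows one way delete-cluster.sh could handle this, under stated assumptions. It lists every Service of type LoadBalancer and deletes each one before the cluster itself, so the cloud controller releases the AWS load balancer. The function names are hypothetical. If Flux manages these Services (for example through an NGINX HelmRelease), suspend reconciliation first, or Flux may recreate them before the cluster is gone.

```bash
# Print "namespace/name" for every Service of type LoadBalancer.
list_lb_services() {
  kubectl get svc --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.type}{"\n"}{end}' \
    | awk '$3 == "LoadBalancer" { print $1 "/" $2 }'
}

# Delete those Services (waiting for each object to be removed) so their
# AWS load balancers are released before the cluster is deleted.
delete_lb_services() {
  local ns name
  list_lb_services | while IFS=/ read -r ns name; do
    kubectl delete svc "$name" -n "$ns" --wait=true
  done
}
```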
Make worker nodes private.
Write document for:
We've got the green light from the pentesters to tear down the clusters.
Thanks!
The set-output command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
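Here is a minimal sketch of the Environment Files replacement. Instead of printing a ::set-output workflow command, a step appends name=value to the file whose path the runner provides in $GITHUB_OUTPUT. The helper name and the cluster_name output are hypothetical.

```bash
# Deprecated form, as it might appear in a workflow step:
#   echo "::set-output name=cluster_name::${CLUSTER_NAME}"
# Replacement: append name=value to the file named by $GITHUB_OUTPUT.
# Usage: set_step_output NAME VALUE
set_step_output() {
  local name="$1" value="$2"
  echo "${name}=${value}" >> "$GITHUB_OUTPUT"
}
```

A later step can read the value as steps.<step-id>.outputs.cluster_name. Values that span several lines need the delimiter syntax described in the GitHub Actions documentation instead of a plain name=value line.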
I would like to request adding the following to the weave gitops enterprise pen-testing environment:
- a leaf cluster, so we can test management-to-leaf-cluster journeys. No CAPI would be tested, so the manual way is good enough. Similar to this
- flagger in the leaf cluster to test progressive delivery journeys. An example of a leaf cluster using flagger can be found here
- policy in the leaf cluster so we can test policy journeys. An example of a leaf cluster using policy is here
Notes:
After creating a new cluster:
- Added a leaf CAPI cluster provisioning template to the cluster and pushed it
- Created a leaf cluster from the frontend; it provisioned successfully
- Downloaded the leaf cluster kubeconfig via eksctl; the following command works and the context is available
➜ eksctl utils write-kubeconfig --region eu-north-1 --cluster default_leaf-sm-control-plane --kubeconfig=~/.kube/config
➜ wge k config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* arn:aws:eks:eu-north-1:894516026745:cluster/default_leaf-sm-control-plane arn:aws:eks:eu-north-1:894516026745:cluster/default_leaf-sm-control-plane arn:aws:eks:eu-north-1:894516026745:cluster/default_leaf-sm-control-plane
➜ wge k get pod
An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::894516026745:assumed-role/WeaveEksEditor/[email protected] is not authorized to perform: sts:AssumeRole on resource: arn:aws:sts::894516026745:assumed-role/WeaveEksEditor/[email protected]
It is trying to assume the same role it already has.
Other trials:
- Creating the kubeconfig manually and authenticating locally with the aws CLI and aws-iam-authenticator, following this doc, gives the same result.
Create a makefile to request/update clusters easily.
To give users access to the leaf cluster's components, we should grant them access on WGE. RBAC resources should be created by default for admin users.
One of the two pentest clusters (#12) needs to have weave-gitops-enterprise installed on it.
It should be installed according to the standard documentation
It does not need to have CAPI enabled.
We need to monitor our clusters' resource utilization (CPU, memory, etc.) to determine how many resources our clusters require and whether we should change the instance type.
CloudWatch might have dashboards that can help with monitoring resources.
Sometimes the delete-cluster CI job fails with an error like: Error: checking if cluster implements policy API: Unauthorized
Enable a webhook on the "#clusters-config" Slack channel and configure GitHub Actions to send notifications to it.
Add leaf mode to weave-mode options to give engineers the ability to create leaf clusters. Add an enterprise-leaf folder with a set of common apps for enterprise-leaf clusters.
To make it easy to track which teams are using which resources, for billing and UX purposes.
Default policies cause cluster provisioning to fail: they block the CAPI initialization step, and they block the installation of Helm releases such as flagger and nginx because those releases violate the default policies. We should make sure the default policies do not cause any errors at startup, either by changing them or by deleting the policies that cause violations.
Currently we install weave-policy-agent at a fixed version; see here:
https://github.com/weaveworks/clusters-config/blob/main/eksctl-clusters/apps/enterprise/policy-agent/policy-agent.yaml#L17
We need to install weave-policy-agent at the version specified in the mccp chart:
https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/Chart.yaml#L32
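The following sketch reads the pinned version out of the mccp chart. It assumes weave-policy-agent is declared in Chart.yaml under the standard Helm dependencies: list, with name: and version: keys. It is a line-based awk approximation rather than a YAML parser, so a real YAML tool such as yq would be more robust if available. The function name is hypothetical.

```bash
# Print the version of dependency $2 from the Helm Chart.yaml at path $1.
# Line-based sketch, not a YAML parser: it expects the usual
#   dependencies:
#     - name: <dep>
#       version: <ver>
# layout and strips double quotes around values.
chart_dependency_version() {
  local chart="$1" dep="$2"
  awk -v dep="$dep" '
    # Print the collected entry if it is the requested one, then reset.
    function emit() {
      if (name == dep && ver != "") print ver
      name = ""; ver = ""
    }
    /^dependencies:/ { in_deps = 1; next }
    # A new top-level key ends the dependencies block.
    in_deps && /^[^[:space:]-]/ { emit(); in_deps = 0 }
    !in_deps { next }
    # A "- " line starts a new list entry.
    /^[[:space:]]*-[[:space:]]/ { emit() }
    {
      line = $0
      sub(/^[[:space:]]*-?[[:space:]]*/, "", line)
      if (line ~ /^name:/)    { name = line; sub(/^name:[[:space:]]*/, "", name); gsub(/"/, "", name) }
      if (line ~ /^version:/) { ver = line;  sub(/^version:[[:space:]]*/, "", ver);  gsub(/"/, "", ver) }
    }
    END { emit() }
  ' "$chart"
}
```

The printed version could then be substituted into policy-agent.yaml in place of the fixed one.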
Currently, the CAPA template we have creates EKS clusters with 3 NAT gateways, which costs a lot. We need to find a way to reduce them to only one NAT GW or stop using EKS and use self-managed k8s clusters.
Templates can be found here
As part of WGE upgrade testing, we need to make it easier for engineers to spin up clusters with older versions of WGE and then, with one command, upgrade them to the latest version to test the upgrade process.
Running the scripts on macOS creates duplicate files. See this commit.
This happens because of the sed command. We use sed -i '' "s/old/new/g" file
If the issue is not fixable, the script needs to delete the duplicate files.
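Here is a possible portable replacement for sed -i '', offered as a sketch that has not been checked against the commit referenced above. GNU sed and BSD sed (macOS) both accept a backup suffix attached directly to -i, such as -i.bak. The script can use one and remove the backup immediately. The helper name sed_in_place is hypothetical.

```bash
# Portable in-place substitution for GNU sed (Linux) and BSD sed (macOS).
# Both accept a suffix attached to -i, so use -i.bak and then delete the
# backup instead of relying on the BSD-only -i '' form.
# Usage: sed_in_place 's/old/new/g' FILE
sed_in_place() {
  local expr="$1" file="$2"
  sed -i.bak -e "$expr" "$file" && rm -f "${file}.bak"
}
```

An alternative with the same portability is to write sed's output to a temporary file and mv it over the original.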
Allow users to extend their cluster's TTL tag.
As engineers test the CAPI feature, we will have lots of "create cluster" branches and open PRs; we should delete forgotten PRs and branches.
Alert and exit the script if the current branch is not main.
The idea is to make sure that engineers have the latest updates and to prevent having multiple clusters in one branch.
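The following is a minimal sketch of the branch guard. It uses git rev-parse --abbrev-ref HEAD to find the current branch. The function name is hypothetical, and the optional argument exists only so the check can be exercised without a repository.

```bash
# Fail unless the current branch is main.
# Usage: require_main_branch || exit 1
# An explicit branch name may be passed; otherwise git is asked.
require_main_branch() {
  local branch="${1:-$(git rev-parse --abbrev-ref HEAD)}"
  if [ "$branch" != "main" ]; then
    echo "Error: current branch is '${branch}'. Switch to main, pull the latest changes, and rerun." >&2
    return 1
  fi
}
```

A script would call require_main_branch || exit 1 near the top, before it creates the cluster's branch.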
Enable capi by default for created clusters.
Templates will create leaf clusters on AWS.
The two pentest clusters (#12) need common configuration installed on them:
- readonly cluster role (e.g. https://github.com/weaveworks/weave-gitops-clusters/blob/main/k8s/apps/core/dex/readonly-cluster-role.yaml)
- cluster-admin permissions
- wego-admin-role permissions bound to the app namespace
- readonly permissions
- readonly permissions bound to the app namespace
We need to:
Give the engineer the ability to manipulate the cluster creation options after provisioning the cluster
Use case: an engineer provisioned a cluster without the --enable-flagger option and now needs to install flagger. Instead of having them do it manually, the script will copy the needed files to their cluster dir.
Options:
- --enable-flagger => to enable flagger
- --add-capi-templates => to add CAPI templates to the cluster
The pipeline should be run only when the branch is being deleted, to destroy the cluster and delete all related components.
Create a doc for all use cases that our tool currently supports
We should enable users to specify which version of WGE to install on the cluster. We should also give them the option to specify a branch to install WGE from.
Sometimes flux fails to reconcile, which causes provision-cluster.sh to fail without granting the admin role access to the cluster.
We need to add a job to make sure that access is granted even if flux fails and the whole pipeline fails.
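At the workflow level, a follow-up step or job with if: always() runs even after an earlier failure. Inside provision-cluster.sh, a comparable approach is sketched below. It records the provisioning step's exit status, always runs the access grant, and then returns the original status so the pipeline still reports the failure. The function names are hypothetical; the grant step would typically wrap eksctl create iamidentitymapping.

```bash
# Run a provisioning command, then always run the access-grant function,
# and finally return the provisioning command's own exit status.
# Usage: run_then_grant_access GRANT_FN CMD [ARGS...]
run_then_grant_access() {
  local grant_fn="$1"; shift
  local status=0
  "$@" || status=$?
  if ! "$grant_fn"; then
    echo "Warning: granting admin access failed" >&2
  fi
  return "$status"
}
```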
Calculate the potential cost saving of reducing nodes from t3.large to t3.medium.
Make t3.medium the default.
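Here is a back-of-the-envelope helper for the saving. It assumes on-demand pricing and about 730 hours per month (24 × 365 / 12). Prices are parameters on purpose: look up the current hourly prices for the cluster's region rather than hard-coding them. The function name is hypothetical. Both t3.large and t3.medium have 2 vCPUs, while t3.medium has half the memory (4 GiB vs 8 GiB), so memory utilization is the figure to check before switching the default.

```bash
# Estimated monthly on-demand saving when switching instance types.
# Usage: monthly_savings OLD_PRICE_PER_HOUR NEW_PRICE_PER_HOUR NODE_COUNT [HOURS_PER_MONTH]
monthly_savings() {
  awk -v old_p="$1" -v new_p="$2" -v nodes="$3" -v hours="${4:-730}" \
    'BEGIN { printf "%.2f\n", (old_p - new_p) * nodes * hours }'
}
```

With placeholder prices, monthly_savings 0.08 0.04 3 prints 87.60 for three nodes.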
Create a how-to guide for accessing gitops in the cluster that can be provided to the pentesters.
Currently we send Slack notifications about clusters that will be deleted in the next few days. We also need to notify each cluster's owner so that they can act: either let the cluster be deleted if they no longer need it, or extend the cluster's TTL using the extend-cluster-ttl.sh script.
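The sketch below selects which owners to notify. It assumes each cluster's name, creation time (epoch seconds), TTL in days, and owner are available as one whitespace-separated line per cluster, for example collected from cluster tags. That input format and the function name are hypothetical, and sending the Slack message itself is left out.

```bash
# Read "cluster created_epoch ttl_days owner" lines on stdin and print
# "cluster owner days_left" for clusters that will be deleted within
# WARN_DAYS days (already-expired clusters are skipped).
# Usage: clusters_to_warn WARN_DAYS [NOW_EPOCH] < clusters.txt
clusters_to_warn() {
  local warn_days="$1" now="${2:-$(date +%s)}"
  awk -v warn="$warn_days" -v now="$now" '
    NF >= 4 {
      expires = $2 + $3 * 86400
      left = int((expires - now) / 86400)
      if (expires > now && left <= warn) print $1, $4, left
    }
  '
}
```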
To implement this, we would need to:
As we have decided to keep each cluster in a separate branch, we need to:
- request-cluster.sh script: the branch name = cluster name.
- eks-cluster-config.yaml file.
Spot instances can reduce cost by about 70%. We can build EKS on Spot instances, but this has drawbacks, so we need to figure out the pros and cons of this option.
To manage costs, we need to automatically delete clusters after a period of time (2 weeks by default).
Possible solutions:
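Here is a minimal sketch of the expiry check that automatic deletion would need, with the proposed 2-week TTL as the default. The creation time could come, for example, from the createdAt field returned by aws eks describe-cluster, converted to epoch seconds. That wiring and the function name are assumptions.

```bash
# Succeeds (exit status 0) if a cluster created at CREATED_EPOCH has
# outlived its TTL. TTL_DAYS defaults to 14 (the proposed 2-week default).
# Usage: is_expired CREATED_EPOCH [TTL_DAYS] [NOW_EPOCH]
is_expired() {
  local created="$1" ttl_days="${2:-14}" now="${3:-$(date +%s)}"
  [ "$now" -ge $(( created + ttl_days * 86400 )) ]
}
```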
The pentesters have finished using the environments, but they would like them frozen until they finish writing up the report (in case they need to come back to them).
If it is not too costly in terms of time, I think we could just scale down the nodes.
Thanks!
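The following sketch freezes a cluster by scaling its nodegroup to zero with eksctl scale nodegroup, assuming the nodegroup allows a minimum size of 0. The helper name is hypothetical. The EKS control plane (and any NAT gateways) keep incurring charges while the nodes are at zero, so this reduces cost rather than eliminating it. Scaling back up with a higher --nodes value lets the workloads be rescheduled.

```bash
# Scale a nodegroup to zero so the cluster is kept but idle.
# Usage: freeze_nodegroup CLUSTER NODEGROUP REGION
freeze_nodegroup() {
  local cluster="$1" nodegroup="$2" region="$3"
  eksctl scale nodegroup \
    --cluster "$cluster" \
    --name "$nodegroup" \
    --region "$region" \
    --nodes 0 \
    --nodes-min 0
}
```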
Error => Red
Warning => Yellow
Success => Green
Info => default
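Here is a minimal sketch of this color scheme using ANSI escape codes via bash ANSI-C quoting. Sending errors and warnings to stderr is a design choice made here, not something the issue specifies. The function names are hypothetical.

```bash
# ANSI colors: error = red, warning = yellow, success = green,
# info = terminal default.
RED=$'\033[0;31m'
YELLOW=$'\033[0;33m'
GREEN=$'\033[0;32m'
RESET=$'\033[0m'

log_error()   { printf '%s%s%s\n' "$RED" "$*" "$RESET" >&2; }
log_warning() { printf '%s%s%s\n' "$YELLOW" "$*" "$RESET" >&2; }
log_success() { printf '%s%s%s\n' "$GREEN" "$*" "$RESET"; }
log_info()    { printf '%s\n' "$*"; }
```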
The pipeline should be run only when the branch is created to provision the cluster and install any components needed to run weave-gitops-enterprise.
We should add an option to install policies on a newly created cluster for teams who need to work with policies and policy violations. We can add an --enable-policies option to the request-cluster script to install a couple of policies with the cluster.
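The sketch below shows how request-cluster could parse --enable-policies alongside the existing --enable-flagger flag. The variable and function names are hypothetical, and the step that actually installs the policies is not shown.

```bash
# Parse request-cluster style flags into variables.
# Usage: parse_cluster_options "$@" || exit 1
parse_cluster_options() {
  ENABLE_POLICIES=false
  ENABLE_FLAGGER=false
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --enable-policies) ENABLE_POLICIES=true ;;
      --enable-flagger)  ENABLE_FLAGGER=true ;;
      *)
        echo "Error: unknown option: $1" >&2
        return 1
        ;;
    esac
    shift
  done
}
```

After parsing, the script would check [ "$ENABLE_POLICIES" = true ] before copying the policy manifests into the cluster's directory.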
export AWS_PROFILE=sts, otherwise it fails to authenticate.
request-cluster doc: note that the user should wait until the cluster is provisioned before getting the kubeconfig.
As we will install WGE, we need to provide templates for the most used items in WGE, such as profiles, policies, cluster templates, etc.
Enable SOPS to encrypt secrets and decrypt them using kustomize-controller
One of the two pentest clusters (#12) needs to have weave-gitops installed on it.
It should be installed according to the standard documentation
Install WGE app with minimum configurations
We need two dedicated kubernetes clusters for the pentesters to work on.