helm-charts's Introduction

SAP Converged Charts

This repository contains Helm charts required by SAP Converged Cloud.

Structure

Charts are grouped logically into:

  • common: Reusable charts
  • global: Singletons that only exist once in a global context
  • openstack: OpenStack and dependent or related services
  • prometheus-exporters: A curated collection of Prometheus exporters
  • prometheus-rules: Prometheus alerting and aggregation rules
  • system: Infrastructure required by the control plane

This structure is only a logical grouping; it does not represent deployable units or imply other semantics.

Charts

On the second level we expect a chart. This can be a single chart or a meta-chart that describes a dependent set of components. Meta-charts contain sub-charts or reference charts from other repositories using Helm dependencies.

.
└── system
    ├── dns
    │   └── charts
    │       ├── bind
    │       └── unbound
    ├── kube-system
    │   └── charts
    │       ├── ingress
    │       └── dashboard
    └── prometheus
        └── charts
            ├── kube-state-metrics
            ├── prometheus-collector
            └── prometheus-frontend

By convention, the top-level chart is deployed as a Helm release. In this example, releasing dns will install or update bind and unbound.
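
A meta-chart of this kind can be sketched as follows. This is an illustration only: it assumes the Helm 3 chart layout (`apiVersion: v2` with a `dependencies` list in `Chart.yaml`; Helm 2 kept that list in `requirements.yaml`), and the `example-exporter` chart, its version, and the repository URL are hypothetical placeholders, not taken from this repository.

```shell
# Scaffold a hypothetical "dns" meta-chart in a scratch directory.
workdir="$(mktemp -d)"

# Sub-charts placed under charts/ are packaged together with the meta-chart.
mkdir -p "$workdir/system/dns/charts/bind" "$workdir/system/dns/charts/unbound"

# Chart.yaml of the meta-chart; only the external chart needs a dependency entry.
cat > "$workdir/system/dns/Chart.yaml" <<'EOF'
apiVersion: v2
name: dns
description: Meta-chart bundling the DNS components
version: 0.1.0
dependencies:
  # Placeholder for a chart referenced from another repository.
  - name: example-exporter
    version: 1.0.0
    repository: https://charts.example.com
EOF

echo "meta-chart scaffolded in $workdir/system/dns"
```

Dependencies declared with a `repository` are fetched with `helm dependency update`, while charts already present under `charts/` ship with the parent chart.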

To relate charts to running Kubernetes pods, we also assume that a chart is deployed in a namespace with the same name.

$ kubectl get pods --all-namespaces
NAMESPACE         NAME                                               READY     STATUS    RESTARTS   AGE
dns               bind1-2290429089-joidj                             2/2       Running   0          5d
dns               bind2-3590597799-1vcv0                             2/2       Running   0          5d
dns               unbound1-3007389427-shh2y                          1/1       Running   0          9d
dns               unbound1-3577488147-ld1rd                          1/1       Running   0          5d
kube-system       ingress-controller-d3snv                           1/1       Running   4          13d
kube-system       ingress-controller-j9bpf                           1/1       Running   2          18d

This has the benefits that:

  • Values required for releasing a chart can be found at the same place in cc/regions
  • Cleanup of a failed release is as easy as deleting the namespace.
  • For testing, a chart can be deployed in a separate testing namespace.
  • Pods and other Kubernetes primitives are reflected at a known place in Kubernetes
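
Because namespace and chart share a name, the chart behind a running pod can be looked up from the pod's namespace. A minimal sketch, assuming it runs against a checkout that follows the group/chart layout described above; the function name `chart_for_namespace` is made up for this example.

```shell
# chart_for_namespace NAMESPACE [REPO_ROOT]
# Prints the chart directory (group/chart) whose name equals the namespace.
chart_for_namespace() {
  ns="$1"
  root="${2:-.}"
  # Charts live exactly two levels below the repository root: <group>/<chart>.
  # Sub-charts deeper in the tree (e.g. system/dns/charts/bind) are ignored.
  (cd "$root" && find . -mindepth 2 -maxdepth 2 -type d -name "$ns" | sed 's|^\./||')
}
```

Called as `chart_for_namespace dns` from the repository root, this would print `system/dns` for the layout shown earlier.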

Test a Chart

Opening a PR to this repository triggers the Helm chart tests which are described in detail here.

Install/Update of a Chart/Release

By convention, we use the name of the meta-chart as both the namespace and the release name. Values are pulled in from a secret repository.

helm upgrade dns ./system/dns --namespace dns --values ../secrets/staging/system/dns.yaml --install
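
The naming convention can be captured in a small helper that derives the release name and namespace from the chart path and prints the resulting command instead of running it. This is a sketch under assumptions: the function name `release_cmd` and the optional third argument (a separate namespace for testing) are hypothetical, and the secrets path simply mirrors the example above.

```shell
# release_cmd CHART_PATH ENVIRONMENT [NAMESPACE]
# Prints the helm command for a chart; nothing is executed.
release_cmd() {
  chart_path="${1%/}"                  # e.g. system/dns (trailing slash removed)
  env_name="$2"                        # e.g. staging
  name="$(basename "$chart_path")"     # meta-chart name, e.g. dns
  namespace="${3:-$name}"              # defaults to the meta-chart name
  # Release name and namespace are kept identical, as the convention requires.
  printf 'helm upgrade %s ./%s --namespace %s --values ../secrets/%s/%s.yaml --install\n' \
    "$namespace" "$chart_path" "$namespace" "$env_name" "$chart_path"
}

release_cmd system/dns staging
```

Passing a third argument such as `dns-test` yields a command that installs the same chart into a separate testing namespace, which can later be removed by deleting that namespace.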

helm-charts's People

Contributors

artherd42, auhlig, berndkue, bugroger, carthaca, christopherhans, chuan137, databus23, defo89, developer-abhi, dhalimi, fwiesel, galkindmitrii, ivogoman, jknipper, joker-at-work, kuckkuck, majewsky, notandy, notque, nuckal777, rajivmucheli, reimannf, richardtief, stefanhipfel, supersandro2000, swagner-de, talal, thgrs, viennaa

helm-charts's Issues

Guideline for deploying Swift

Hi team.
I have been operating OpenStack on a k8s cluster deployed by the OpenStack-Helm project (https://opendev.org/openstack/openstack-helm).
This project includes several main OpenStack services, all packaged as Helm charts.
Unfortunately, it does not provide Swift yet, so I am trying to build a Swift container image and package the Helm chart. In OpenStack-Helm, all container images are created by the LOCI project (https://opendev.org/openstack/loci), but I don't think I need to build it that way.
I am interested in your containerization method and would like to discuss the details. I am also willing to contribute.

Regards.

Originally posted by @QuesadaMarvin in #2386 (comment)

Move kubernetes-entrypoint to init-container

Personally, I like the approach of openstack/openstack-helm, which moves the kubernetes-entrypoint to the init-container.

This way, we do not have to add the executable to the image, and we can use a minimal kubernetes-entrypoint container to fulfil the same purpose.

Prometheus relabeling should decide which snmp-exporter module to use

For the baremetal metrics, the service discovery file currently provides that information via __param_module, but it shouldn't.
Instead, Prometheus should make that decision during relabeling, based on information the service discovery provides via labels.
The reasoning is that the service discovery should be reusable for other setups and for scenarios that have nothing to do with the snmp-exporter.
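
What the proposal could look like in a scrape configuration is sketched below. Everything specific here is an assumption for illustration: the job name, the target file path, the exporter address `snmp-exporter:9116`, and in particular the `device_type` label, which stands in for whatever generic label the service discovery would expose.

```shell
# Write a hypothetical scrape config to a scratch file for review.
scrape_file="$(mktemp)"
cat > "$scrape_file" <<'EOF'
scrape_configs:
  - job_name: baremetal-snmp
    metrics_path: /snmp
    file_sd_configs:
      - files: ['/etc/prometheus/targets/baremetal.json']   # placeholder path
    relabel_configs:
      # The discovered address becomes the SNMP target parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Prometheus, not the discovery file, derives the module from a generic label.
      - source_labels: [device_type]          # hypothetical label
        target_label: __param_module
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the exporter itself instead of the device.
      - target_label: __address__
        replacement: snmp-exporter:9116       # placeholder address
EOF

echo "scrape config written to $scrape_file"
```

If the label values do not match the module names one to one, the second rule would need a `regex` and `replacement` to map between them.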

[digicert-issuer]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

In the case of external resources it might be best to pull in an updated version of the crds from upstream.

Happy hacking!
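
To find the manifests that still need migrating, a chart directory can be searched for the deprecated API version. A minimal sketch, assuming the manifests declare `apiVersion` at the start of a line; the function name `find_beta_crds` is made up for this example.

```shell
# find_beta_crds [DIR]
# Lists files that still declare the deprecated v1beta1 CRD API version.
find_beta_crds() {
  # grep exits non-zero when nothing matches; sort keeps the pipeline status at 0.
  grep -rlE '^apiVersion:[[:space:]]*apiextensions\.k8s\.io/v1beta1' "${1:-.}" | sort
}
```

An empty result for a chart means none of its files declare the beta API in this form.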

Bugs in kube-monitoring

  • Remove AlertManager Releases/Trash from Regions. Should only be a single global instance.
  • Node Exporter terminates as “Completed”. Needs to be restarted.
  • Grouping for PodRestart Alerts doesn’t work properly. Too much spam during resolves. Remove or Fix.

Proactively Detect Kubelet Problems - Go-Routine Leaks

Recently we have been seeing some instances of "unresponsive" Kubelets. The symptoms are Pods being scheduled but not starting and related problems. Our current alerting doesn't detect this. This is because the Kubelet is actually still running and responsive.

My suspicion is that it gets stuck in some endless retry loop or the like. A restart fixes the problem. Pending finding out the actual root cause and fixing the bug, we need to have an alert so we can proactively fix the problem.

One possible way to detect this would be to look for abnormal spikes in the Kubelet's goroutine count.

This query shows a recent incident.
https://prometheus.staging.cloud.sap/graph?g0.range_input=1w&g0.expr=go_goroutines%7Bjob%3D%22kube-system%2Fkubelet%22%7D&g0.tab=0

Normal on eu-de-1:
image

Abnormal on staging:
image

Implement an alert that fires when the goroutine count spikes.
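
One possible shape for such an alert is sketched below, assuming the Prometheus 2.x rule file format. The metric and job selector come from the query above; the alert name, the threshold of 5000, and the 15 minute duration are placeholders that would need tuning against the graphs of healthy regions.

```shell
# Write a hypothetical alerting rule to a scratch file.
rules_file="$(mktemp)"
cat > "$rules_file" <<'EOF'
groups:
  - name: kubelet-goroutines
    rules:
      - alert: KubeletGoroutineSpike          # hypothetical alert name
        # Threshold and duration are placeholders, not measured values.
        expr: go_goroutines{job="kube-system/kubelet"} > 5000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Kubelet on {{ $labels.instance }} has an unusually high goroutine count"
EOF

# If promtool is installed, the file can be validated with:
#   promtool check rules "$rules_file"
echo "rule written to $rules_file"
```

A static threshold is the simplest option; comparing the current value against a longer-term average would be an alternative if normal levels differ between regions.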

[cert-manager-crds-scaleout]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

In the case of external resources it might be best to pull in an updated version of the crds from upstream.

Happy hacking!

[vertical-pod-autoscaler]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

In the case of external resources it might be best to pull in an updated version of the crds from upstream.

Happy hacking!

[prometheus-crds]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

Happy hacking!

Add klog_pod_oomkill metrics

Make the klog_pod_oomkill metric provided by the oomkill-exporter available in all Prometheus instances (similar to the kube_* metrics). This would allow pod monitoring and alerting to be implemented in the dedicated Prometheus.

[cinder] Clean out old agents

The migration strategy negotiates the oldest version over all agents.
Having old agents from prior deployments breaks that, as they never renegotiate their version and will never update it.

One way to proceed would be by time:
delete from services where updated_at is null or updated_at < now() - interval '15 minutes';

[utils/mysql] Replace Global Master Password

[Prometheus] Info Inhibitors

Currently severity=critical inhibits severity=warning when the same context is set for alerts.

Add an additional inhibition rule: severity=critical|warning suppresses severity=info
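
In Alertmanager configuration terms, the existing and the proposed rule might look like the sketch below. It assumes that "context" is carried as an alert label named `context`, and it uses the older `source_match` / `target_match` syntax; newer Alertmanager versions express the same thing with `source_matchers` and `target_matchers`.

```shell
# Write a hypothetical inhibit_rules fragment to a scratch file.
inhibit_file="$(mktemp)"
cat > "$inhibit_file" <<'EOF'
inhibit_rules:
  # Current behaviour: critical mutes warning for the same context.
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: ['context']
  # Proposed addition: critical or warning mutes info for the same context.
  - source_match_re:
      severity: critical|warning
    target_match:
      severity: info
    equal: ['context']
EOF

echo "inhibit rules written to $inhibit_file"
```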

[utils/identity] Replace Global Master Password

https://github.com/sapcc/helm-charts/blob/master/openstack/utils/templates/_hosts.tpl#L136

  • Remove all references to Values.global.master_password
  • Prefer to not reference a global value. Instead require a password being passed to the chart

Prior Art:
a4c48f4

Take note that this utility function is being used in many other charts:

Coordinated deployments might be required. :/

Increase Severity for NodeNotReady Alerts

The NodeNotReady alert needs to be treated with urgency.

It indicates that the node is stuck with a hanging kernel. This leads to problems if locks are still being held for persistent applications. Upon rescheduling, those applications will not recover and will stay in CrashLoopBackOff until manual intervention, which could lead to a severe outage if critical databases, like Keystone, are affected.

Increase the severity of the alert to critical. Update Playbook.

Tuning of kube-monitoring

  • PodRestart alerts are too spammy as warnings. Set to INFO level
  • Critical alerts for when regional Prometheus instances are down are too aggressive. Relax timeframe
  • Send Warnings to regional channels
  • Critical Alerts should also send resolved notifications
  • Docker Hang Alert is not sensitive enough

Bash Script vs Executable

Instead of executing the script with an explicit call to bash, I would suggest mounting all container-init scripts with executable mode and calling the script directly.

This way, we encapsulate the execution in the script and can choose another interpreter (e.g. dumb-init bash) if required.
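
The idea can be tried locally with the sketch below. The file name `container-init` and its contents are made up for this example, and `/bin/sh` is used in the shebang only to keep the demo portable. In a pod spec, the executable mode would typically come from the `defaultMode` field of the ConfigMap volume that mounts the scripts.

```shell
# Create a script whose shebang line, not the caller, selects the interpreter.
workdir="$(mktemp -d)"
cat > "$workdir/container-init" <<'EOF'
#!/bin/sh
# Switching interpreters later only requires changing the line above.
echo "container-init started"
EOF

# Executable mode replaces the explicit "bash container-init" call.
chmod 0755 "$workdir/container-init"

# Call the script directly; prints "container-init started".
"$workdir/container-init"
```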

Not able to deploy on the local k8s cluster

Hi team. I am trying to deploy OpenStack services (Swift, for the first time) on my local k8s cluster. When I run helm install, it returns the following error message.

Error: execution error at (swift/templates/workers-daemonset.yaml:33:54): This release should be installed by the deployment pipeline!

It seems likely that the values.yaml file includes some hardcoded constants which will be replaced in the CI/CD pipeline.
How can I use these helm charts? Where are the container images for them?

[cert-manager-crds]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

In the case of external resources it might be best to pull in an updated version of the crds from upstream.

Happy hacking!

[velero]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

In the case of external resources it might be best to pull in an updated version of the crds from upstream.

Happy hacking!

[disco]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

In the case of external resources it might be best to pull in an updated version of the crds from upstream.

Happy hacking!

prometheus-global retention time

$ grep retention global/prometheus/values.yaml
2:retention: 168h0m0s

@BugRoger @auhlig I recall this being a lot more (something like 90 days or so). Was there a copy-paste error, or did we scale it down over storage space concerns?

Region Label Missing

For some alerts the region label seems to be missing:
image

This leads to odd effects in routing, grouping etc... And the rendering of the alerts looks bugged 😄

I think this might be because some alert queries remove all labels. In Prometheus speak it ends up with something like absent(up{job="kube-scheduler"})={}. Somehow these rules don't apply then:

https://github.com/sapcc/helm-charts/blob/master/system/kube-monitoring/charts/prometheus-frontend/templates/config.yaml#L91-L98

According to the documentation this should add a label though, so not sure what's going on.

[jaeger-operator]: Migrate CustomResourceDefinitions to v1

For the upcoming k8s upgrade to 1.22 we need to migrate CustomResourceDefinitions to v1.

For some more details please check the 1.22 Deprecation Guide

There was also a post in the kubernetes blog about upcoming changes in 1.22:

Migrate to use the CustomResourceDefinition apiextensions.k8s.io/v1 API, available since v1.16.
You can use the v1 API to retrieve or update existing objects, even if they were created using an older API version. If you defined any custom resources in your cluster, those are still served after you upgrade.
If you're using external CustomResourceDefinitions, you can use kubectl convert to translate existing manifests to use the newer API. Because there are some functional differences between beta and stable CustomResourceDefinitions, our advice is to test out each one to make sure it works how you expect after the upgrade.

In the case of external resources it might be best to pull in an updated version of the crds from upstream.

Happy hacking!

[Prometheus] Increase of scrape duration

The number of open FDs of Prometheus in staging is significantly higher than in the other regions. Highest in production found in eu-de-1. As we only see an increase in scrape duration in staging we may want to try -storage.local.series-file-shrink-ratio={0.3,..,0.5} there to reduce consumed disk throughput as suggested in the other thread @BugRoger.

sapcc helm repo cannot be added because of certificate signed by unknown authority

Somehow I cannot add the helm repo to my helm list. It says that the certificate is not verified.

$ helm repo add sapcc https://charts.global.cloud.sap
Error: looks like "https://charts.global.cloud.sap" is not a valid chart repository or cannot be reached: Get "https://charts.global.cloud.sap/index.yaml": x509: certificate signed by unknown authority

I tried to skip TLS verify but it says it's "AuthorizedOnly"

$ helm repo add --insecure-skip-tls-verify sapcc https://charts.global.cloud.sap
Error: looks like "https://charts.global.cloud.sap" is not a valid chart repository or cannot be reached: failed to fetch https://charts.global.cloud.sap/index.yaml : 403 AuthorizedOnly

What can I do to add the helm repo?
