
kube-prod-runtime's Introduction

WARNING: Bitnami Kubernetes Production Runtime is no longer actively maintained by VMware.

VMware has made the difficult decision to stop driving this project and therefore we will no longer actively respond to issues or pull requests. If you would like to take over maintaining this project independently from VMware, please let us know so we can add a link to your forked project here.

Thank You.

Description

The Bitnami Kubernetes Production Runtime (BKPR) is a collection of services that makes it easy to run production workloads in Kubernetes.

Think of Bitnami Kubernetes Production Runtime as a curated collection of the services you would need to deploy on top of your Kubernetes cluster to enable logging, monitoring, certificate management, automatic discovery of Kubernetes resources via public DNS servers and other common infrastructure needs.

BKPR is available for Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS) and Amazon Elastic Container Service for Kubernetes (Amazon EKS) clusters.

License

BKPR is licensed under the Apache License Version 2.0.

Requirements

BKPR has been tested to work on a bare-minimum Kubernetes cluster with three kubelet nodes with 2 CPUs and 8GiB of RAM each.

Kubernetes version support matrix

The following matrix shows which Kubernetes versions and platforms are supported:

| BKPR release     | AKS versions | GKE versions | EKS versions |
|------------------|--------------|--------------|--------------|
| 1.3 (deprecated) | 1.11 - 1.12  | 1.11 - 1.12  | 1.11         |
| 1.4 (deprecated) | 1.14 - 1.15  | 1.14 - 1.15  | 1.14         |
| 1.5 (deprecated) | 1.14 - 1.15  | 1.14 - 1.15  | 1.14 - 1.15  |
| 1.6 (deprecated) | 1.15 - 1.16  | 1.15 - 1.16  | 1.15 - 1.16  |
| 1.7              | 1.16 - 1.17  | 1.16 - 1.17  | 1.16 - 1.17  |
| 1.8 (current)    | 1.17 - 1.18  | 1.17 - 1.18  | 1.17 - 1.18  |
| 1.9 (planned)    | 1.18         | 1.18 - 1.19  | 1.18 - 1.19  |

Note that the (experimental) generic platform is e2e tested on GKE.

Quickstart

Please use the installation guide to install the kubeprod binary before installing BKPR to your cluster.
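
For orientation, a typical install invocation on GKE looks roughly like the following. This is an illustrative sketch only; flag names can vary between BKPR releases, so check kubeprod install gke --help for the authoritative list.

# Illustrative only -- values are placeholders and the flag set may differ per release.
kubeprod install gke \
  --email admin@example.com \
  --dns-zone "example.com" \
  --project my-gcp-project \
  --oauth-client-id "<oauth-client-id>" \
  --oauth-client-secret "<oauth-client-secret>"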

Frequently Asked Questions (FAQ)

See the separate FAQ and roadmap documents.

Versioning

The versioning used in BKPR is described here.

Components

BKPR leverages the following components to achieve its mission. For more in-depth documentation about them please read the components documentation.

Logging stack

  • Elasticsearch: A distributed, RESTful search and analytics engine
  • Fluentd: A data collector for unified logging layer
  • Kibana: A visualization tool for Elasticsearch data

Monitoring stack

  • Prometheus: A monitoring system and time series database
  • Alertmanager: An alert manager and router
  • Grafana: An open source metric analytics & visualization suite

Ingress stack

  • NGINX Ingress Controller: A Controller to satisfy requests for Ingress objects
  • cert-manager: A Kubernetes add-on to automate the management and issuance of TLS certificates from various sources
  • OAuth2 Proxy: A reverse proxy and static file server that provides authentication using Providers (Google, GitHub, and others) to validate accounts by email, domain or group
  • ExternalDNS: A component to synchronize exposed Kubernetes Services and Ingresses with DNS providers

Release compatibility

Components version support

The following matrix shows which versions of each component are used and supported in the most recent releases of BKPR:

| Component                | BKPR 1.6 | BKPR 1.7 | BKPR 1.8 |
|--------------------------|----------|----------|----------|
| Alertmanager             | 0.21.x   | 0.21.x   | 0.21.x   |
| cert-manager             | 0.14.x   | 0.16.x   | 0.16.x   |
| configmap-reload         | 0.3.x    | 0.5.x    | 0.5.x    |
| Elasticsearch            | 7.8.x    | 7.12.x   | 7.12.x   |
| Elasticsearch Curator    | 5.8.x    | 5.8.x    | 5.8.x    |
| Elasticsearch Exporter   | 1.1.x    | 1.1.x    | 1.1.x    |
| ExternalDNS              | 0.7.x    | 0.7.x    | 0.7.x    |
| Fluentd                  | 1.11.x   | 1.12.x   | 1.12.x   |
| Grafana                  | 7.0.x    | 7.5.x    | 7.5.x    |
| Kibana                   | 7.8.x    | 7.12.x   | 7.12.x   |
| kube-state-metrics       | 1.9.x    | 1.9.x    | 1.9.x    |
| Node exporter            | 1.0.x    | 1.1.x    | 1.1.x    |
| NGINX Ingress Controller | 0.33.x   | 0.34.x   | 0.34.x   |
| oauth2_proxy             | 5.1.x    | 6.0.x    | 6.0.x    |
| Prometheus               | 2.19.x   | 2.26.x   | 2.26.x   |

Note: BKPR 1.8 is a catch-up release. Patch versions might be updated, but no other significant changes have been applied. This was done so we could catch up with the latest Kubernetes releases in subsequent BKPR releases.

Contributing

If you would like to become an active contributor to this project please follow the instructions provided in contribution guidelines.

kube-prod-runtime's People

Contributors

anguslees, arapulido, atomatt, bitnami-bot, bors[bot], dbarranco, falfaro, georgewgriffith, javsalgar, jbianquetti-nami, jjo, jloramas, juan131, migmartri, munnerz, nomisbeme, pichouk, ppbaena, rberrelleza, sameersbn, surik, vikram-bitnami, vsimon

kube-prod-runtime's Issues

Split Prometheus configuration and rules in two different configmaps

Currently kubeprod deploys one ConfigMap for the Prometheus deployment that includes the following:

  • monitoring.yml -> Rules file
  • basic.yml -> Rules file
  • prometheus.yml -> Prometheus configuration

An approach we've found useful while implementing our GitOps workflow is to split that ConfigMap into two separate ones: one intended for rules and one intended for the Prometheus configuration.

With this, Prometheus could easily be integrated into a GitOps workflow, and any developer could update the Prometheus rules without affecting the application configuration, tying into the ConfigMap reloader the deployment already has.

These two new Prometheus ConfigMaps can be safely mounted in:

  • /opt/bitnami/prometheus/conf for application configuration
  • /opt/bitnami/prometheus/conf/rules.d for Prometheus rules

Prometheus would then load and group them from different folders, just as it works today; a rough sketch follows below.
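
As a rough illustration of the proposed split (the ConfigMap names here are hypothetical, and the real change would live in the kubeprod manifests rather than be applied by hand):

# Hypothetical ConfigMap names; the actual split would be done in the manifests.
kubectl --namespace kubeprod create configmap prometheus-config \
  --from-file=prometheus.yml
kubectl --namespace kubeprod create configmap prometheus-rules \
  --from-file=monitoring.yml --from-file=basic.yml
# prometheus-config would be mounted at /opt/bitnami/prometheus/conf and
# prometheus-rules at /opt/bitnami/prometheus/conf/rules.d, as described above.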

Of course, happy to share with you the work the SRE team did around this :)
cc/ @jjo @jbianquetti-nami

Tons of fluentd logs bring down the ES cluster by disk exhaustion

On a completely idle GKE cluster, and after a week, some Elasticsearch pods have entered a crash loop because the underlying PV is full.

The Elasticsearch manifest uses 100GB by default for each PV. At the moment only one of the three Elasticsearch pods is crashing, but the other two will very soon start crashing as well, since their PVs are already at 100% according to "df -k":

$ kubectl --namespace=kubeprod exec -it elasticsearch-logging-1 bash
Defaulting container name to elasticsearch-logging.
Use 'kubectl describe pod/elasticsearch-logging-1 -n kubeprod' to see all of the containers in this pod.
I have no name!@elasticsearch-logging-1:/$ df -k
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         98868448  6814184  92037880   7% /
tmpfs            3829072        0   3829072   0% /dev
tmpfs            3829072        0   3829072   0% /sys/fs/cgroup
/dev/sda1       98868448  6814184  92037880   7% /etc/hosts
shm                65536        0     65536   0% /dev/shm
/dev/sdb       102687672 97428408         0 100% /bitnami/elasticsearch/data
tmpfs            3829072       12   3829060   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            3829072        0   3829072   0% /sys/firmware
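
For reference, a couple of ad-hoc commands to inspect and reclaim space while a proper retention policy is worked out. The index name pattern is an assumption based on the default fluentd daily indices; adjust it to whatever _cat/indices actually reports.

# List indices sorted by size, largest first:
kubectl --namespace=kubeprod exec elasticsearch-logging-1 -c elasticsearch-logging -- \
  curl -s 'http://localhost:9200/_cat/indices?v&s=store.size:desc'
# Delete the oldest daily indices to free space (index pattern assumed):
kubectl --namespace=kubeprod exec elasticsearch-logging-1 -c elasticsearch-logging -- \
  curl -s -XDELETE 'http://localhost:9200/logstash-2018.06.*'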

Create "optimised" helm stable repo fork

Something that looks like the stable helm charts repo, only customised/optimised to target the standard runtime. Doesn't need to include all the charts, but more is better.

Write basic user docs

To focus on the final UX, we're going to write the user docs first. This should cover:

  • install
  • upgrade
  • customise/republish (within limits)

Use dedicated Kubernetes namespace or namespaces

Running kubeprod inside the kube-system namespace is seen as dangerous by some folks. We should decide whether it makes sense to run all of kubeprod in the kube-system namespace, or to use a different namespace, or even several namespaces. For instance, we could run Elasticsearch in the logging namespace and everything else in the bitnami namespace.

Initial packages

Each of these will be tracked in separate PRs, but here's the place where we get to argue over the mandatory set of components (for initial release).

  • prometheus stack (grafana, node-exporter, etc) (#11, #12)
  • service catalog and relevant broker(s)
  • k8s dashboard/heapster where missing (#15)
  • An ingress controller (nginx? contour? traefik?)
  • Logs sink (fluentd target, broker, and elasticsearch/whatever) (#13)
  • TLS solution (cert-manager) (#14)
  • minikube http/tls solution (pagekite/ngrok)
  • A mariadb broker?

Detailed docs for GKE

We need detailed docs for GKE that include the objects that are created, config.json, etc.

Add end to end tests

The tests will deploy all components and check that they started successfully.

DNS suffix should be a mandatory argument

Right now --dns-zone is an optional argument and if you don't pass it you get a warning:

WARNING DNS suffix was not provided. Some services may not function correctly.

But actually, if you don't pass it, it fails:

ERROR Error: Error updating ingresses kube-system.kibana-logging: Ingress.extensions "kibana-logging" is invalid: [spec.rules[0].host: Invalid value: "kibana.": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.tls[0].hosts: Invalid value: "kibana.": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]

We should make it a mandatory argument if that is indeed the case.

Also, --dns-zone might not be the best name for this. Maybe --dns-suffix would be clearer?

Help UX should be improved

Some issues that I see when I run kubeprod help

Install the Bitnami Production Runtime for Kubernetes

Usage:
  kube-prod-runtime-installer [command]

Available Commands:
  help           Help about any command
  install        Install Bitnami Production Runtime for Kubernetes
  list-platforms List supported platforms
  • The top-level description says "Install the Bitnami Production Runtime for Kubernetes", which is not accurate: that is what the install subcommand does. We should give an introduction to BKPR instead.
  • Usage says kube-prod-runtime-installer; it should be kubeprod.
  • It says that help would give you help about any subcommand, but if I run kubeprod install help I get ERROR Error: unknown command "help" for "kube-prod-runtime-installer install"

Get the latest version of kubeprod releases in the docs programmatically

As I was looking this morning at the GKE quick start docs, I noticed we are pointing users to the releases tab of GitHub to find the latest release of kubeprod in order to create the BKPR_VERSION environment variable.

I just wanted to point out what we did in other repositories, like SealedSecrets:

$ release=$(curl --silent "https://api.github.com/repos/bitnami-labs/sealed-secrets/releases/latest" | sed -n 's/.*"tag_name": *"\([^"]*\)".*/\1/p')

ref

This makes it easier for users to follow the docs without leaving the documentation page :)
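
Adapted for this repository, that could look something like the following (same sed pattern as above; the API URL assumes the bitnami/kube-prod-runtime repository):

BKPR_VERSION=$(curl --silent "https://api.github.com/repos/bitnami/kube-prod-runtime/releases/latest" \
  | sed -n 's/.*"tag_name": *"\([^"]*\)".*/\1/p')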

PR #177 is not backwards compatible

PR #177 is not backwards compatible. It introduces a new field in the JSON configuration file named authz_domain which didn't exist before the PR.

I have a GKE cluster that was deployed with kubeprod manifests before PR #177. This is how the JSON configuration file looks:

{
  "dnsZone": "...",
  "contactEmail": "...",
  "externalDns": {
    "credentials": "...",
    "project": "..."
  },
  "oauthProxy": {
    "client_id": "...",
    "client_secret": "...",
    "cookie_secret": "...",
    "google_groups": [],
    "google_admin_email": "",
    "google_service_account_json": ""
  }
}

Running kubecfg update after PR #177 results in the following error:

 kubecfg update kubeprod-manifest.jsonnet
ERROR Error reading kubeprod-manifest.jsonnet: RUNTIME ERROR: Field does not exist: authz_domain
        file:///Users/falfaro/go/src/github.com/bitnami/kube-prod-runtime/manifests/platforms/gke.jsonnet:110:35-67     object <anonymous>
        file:///Users/falfaro/go/src/github.com/bitnami/kube-prod-runtime/manifests/lib/kube.libsonnet:32:25-29 thunk from <thunk from <function <anonymous>>>
        <std>:659:15-22 thunk <val> from <function <format_codes_arr>>
        <std>:666:27-30 thunk from <thunk <s> from <function <format_codes_arr>>>
        <std>:536:22-25 thunk from <function <format_code>>
        <builtin>       builtin function <toString>
        <std>:666:15-60 thunk <s> from <function <format_codes_arr>>
        <std>:671:24-25 thunk from <thunk <s_padded> from <function <format_codes_arr>>>
        <std>:451:30-33 thunk from <thunk from <function <pad_left>>>
        <builtin>       builtin function <length>
        ...
        <std>:451:7-38  function <pad_left>
        <std>:451:7-44  function <pad_left>
        <std>:671:15-39 thunk <s_padded> from <function <format_codes_arr>>
        <std>:677:55-63 thunk from <function <format_codes_arr>>
        <std>:677:51-63 thunk from <function <format_codes_arr>>
        <std>:677:11-64 function <format_codes_arr>
        <std>:721:7-46  function <anonymous>
        <std>:203:7-23  function <anonymous>

        During manifestation
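
Until the manifests tolerate the missing field, one possible manual migration is to add an empty authz_domain entry to the existing JSON configuration. The placement under oauthProxy and the file name below are assumptions on my part:

# Sketch using jq; adjust the file name to your kubeprod JSON configuration.
jq '.oauthProxy.authz_domain = ""' kubeprod-config.json > kubeprod-config.json.new \
  && mv kubeprod-config.json.new kubeprod-config.json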

HTTP Routing and Kubeprod

AKS offers an add-on called HTTP Routing that, when enabled, creates a DNS zone and manages the Ingress resources for you (given the right annotation in your manifests).

We should test how this conflicts with kubeprod and document it properly.

Workflow documentation (easy v.s. advanced)

It would be good to document the two possible workflows: using kubeprod on an empty cluster with all the defaults, or the advanced use case where users modify the JSON config themselves and use kubecfg to apply it.

DNS entries managed by BKPR (on AKS) aren't refreshed

I created two AKS clusters in the same Resource Group and followed the private beta guide to deploy BKPR and wordpress in each of these clusters. The two clusters were set up correctly and I was able to access the logging and monitoring dashboards as well as the wordpress app.

However after a few hours I noticed that the DNS records for the kibana, prometheus and wordpress subdomains of the first cluster vanished from the DNS zone.

Looking at the logs of the external-dns pod in the two clusters, it appears each one is deleting the entries made by the other, which explains why the records are missing from the zone. Note that the two clusters were set up to use different subdomains of the same top-level domain.

Logs of the external-dns container for the first cluster

time="2018-07-05T04:55:52Z" level=info msg="Deleting A record named 'blog' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T04:55:53Z" level=info msg="Deleting A record named 'kibana' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T04:55:53Z" level=info msg="Deleting A record named 'prometheus' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T04:55:53Z" level=info msg="Deleting TXT record named 'blog' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T04:55:53Z" level=info msg="Deleting TXT record named 'kibana' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T04:55:53Z" level=info msg="Deleting TXT record named 'prometheus' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T04:55:54Z" level=info msg="Updating A record named 'blog' to '40.121.61.194' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T04:55:54Z" level=info msg="Updating A record named 'kibana' to '40.121.61.194' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T04:55:54Z" level=info msg="Updating A record named 'prometheus' to '40.121.61.194' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T04:55:54Z" level=info msg="Updating TXT record named 'blog' to '\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/default/blog.one.my-domain.com-blog\"' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T04:55:55Z" level=info msg="Updating TXT record named 'kibana' to '\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/kube-system/kibana-logging\"' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T04:55:55Z" level=info msg="Updating TXT record named 'prometheus' to '\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/kube-system/prometheus\"' for Azure DNS zone 'one.my-domain.com'."

Logs of the external-dns container for the second cluster

time="2018-07-05T05:00:10Z" level=info msg="Deleting A record named 'blog' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T05:00:10Z" level=info msg="Deleting A record named 'kibana' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T05:00:11Z" level=info msg="Deleting A record named 'prometheus' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T05:00:11Z" level=info msg="Deleting TXT record named 'blog' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T05:00:11Z" level=info msg="Deleting TXT record named 'kibana' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T05:00:11Z" level=info msg="Deleting TXT record named 'prometheus' for Azure DNS zone 'one.my-domain.com'."
time="2018-07-05T05:00:12Z" level=info msg="Updating A record named 'prometheus' to '40.76.205.98' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T05:00:12Z" level=info msg="Updating A record named 'blog' to '40.76.205.98' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T05:00:12Z" level=info msg="Updating A record named 'kibana' to '40.76.205.98' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T05:00:12Z" level=info msg="Updating TXT record named 'prometheus' to '\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/kube-system/prometheus\"' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T05:00:13Z" level=info msg="Updating TXT record named 'blog' to '\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/default/blog.two.my-domain.com-blog\"' for Azure DNS zone 'two.my-domain.com'."
time="2018-07-05T05:00:13Z" level=info msg="Updating TXT record named 'kibana' to '\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/kube-system/kibana-logging\"' for Azure DNS zone 'two.my-domain.com'."

Detailed docs for AKS

We need a detailed doc for AKS that explains what objects are created, the config.json, etc.

Monitor that ExternalDNS is working properly

ExternalDNS depends on OAuth2 being properly set up for AKS and GKE. When that is not the case it "fails silently": it does not update any DNS records and logs the following error to standard output:

time="2018-10-23T08:58:14Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"

It would be nice to alert on this condition if possible, or write some automation that checks for these error messages after deploying BKPR.
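
As a stop-gap, a simple post-deploy check could grep the external-dns logs for errors. The deployment name and namespace below are assumptions; adjust them to wherever BKPR deploys external-dns in your cluster.

kubectl --namespace kubeprod logs deployment/external-dns --since=1h \
  | grep -q 'level=error' && echo "external-dns reported errors in the last hour" >&2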

Kubeprod returns a cryptic error message when you don't have your current-context set

Steps to reproduce:

  • Unset your current-context (kubectl config unset current-context)
  • Run kubeprod. You will get the following error:
ERROR Error: invalid configuration: no configuration has been provided

This error is quite cryptic, as it feels like the configuration you are missing is a Kubeprod configuration, rather than a missing default context.

We should provide a better error message in this case.
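
Until then, the workaround is simply to point kubectl at a context again before running kubeprod:

kubectl config get-contexts                 # list the available contexts
kubectl config use-context <context-name>   # set the current context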

Decide on compatibility/support statement

How long? Covering what? How are security patches going to happen?

Obviously this will be provisional, depending on what users actually want. We also need to tie this into Kubernetes' own release/compatibility schedule.

Write initial "Package policy" guidelines

These are the guidelines/standards we will try to follow across our components.

Follow the example of the Debian Policy Manual, with a versioned doc so we can evolve standards over time, and track which files/components have been updated to which standards version.

Keep the initial version simple, and improve depth/breadth over time.

Cleanup option handling

Currently we pass some options to jsonnet, some other options to the platform-specific setup code (post-jsonnet), and some options to both. This is a mess, and it doesn't make any of it very transparent to power users.

Proposal

Replace it all with:

  1. pre-jsonnet setup code that does platform-specific setup,
  2. write a jsonnet file that has all the generated secrets/options/etc. and imports the standard platform,
  3. run kubecfg (or embedded library equivalent) on this jsonnet file.

This makes the generated file a good example of how power users can further tweak the setup if they want (see the sketch below). They can also skip the auto-installer entirely if required, etc.

In future we can use sealed-secrets (or similar) to protect the secrets that will be included in this file - for a first version I suggest just writing the naive (plaintext) Secrets.
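
A very rough sketch of steps 2-3 (the override structure is illustrative and not a final schema; only the kubecfg update invocation is taken from existing usage):

# Step 2: the installer writes a jsonnet file with the generated values,
# importing the standard platform (structure illustrative only):
cat > kubeprod-manifest.jsonnet <<'EOF'
local platform = import "manifests/platforms/gke.jsonnet";
platform {
  // generated secrets/options would be injected here
  config+:: {
    dnsZone: "example.com",
    contactEmail: "admin@example.com",
  },
}
EOF
# Step 3: apply it with kubecfg (or the embedded library equivalent):
kubecfg update kubeprod-manifest.jsonnet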

Record video

We will need an (unedited) screencast video showing an end-to-end example of a kubeprod run, from a user's perspective.

Starting with an empty AKS cluster:

  • Run kubeprod install
  • Show installed components
  • Show a "kubeprod" ready chart
  • Install the chart
  • Show application logs in Kibana, show monitoring in Grafana

Add AKS as a supported platform

AKS is going to be one of the first supported platforms for the prod runtime.

Tests should run automatically against an AKS cluster

Improve manifests tests and related documentation

Running "make validate" should be documented (what it does, and its requirements). For instance, does it require any particular platform? At the moment we only support AKS, so probably it makes sense to make sure kubeconfig points to an AKS cluster. However, looking into the future, we should have platform-specific tests and that will require having kubeconfig configured properly.

GKE e2e tests

We should fix and enable the end-to-end GKE tests in Jenkins.
