pulsar-helm-chart's Issues

Configuration .Values.tls.proxy.enableTlsWithBroker and .Values.broker.enableForProxyToBroker are conflicting

.Values.tls.proxy.enableTlsWithBroker:

proxy:
  # Applies to connections to standalone function worker, too.
  enableTlsWithBroker: false

.Values.broker.enableForProxyToBroker:

broker:
  enableForProxyToBroker: false

Similar problem seems to exist with .Values.tls.function.enableTlsWithBroker and .Values.tls.broker.enableForFunctionWorkerToBroker.

Wouldn't it make sense to have .Values.tls.broker.enabled instead?

Missing charts

  1. Use the current master branch of the helm charts
  2. Download dev-values.yaml
  3. helm install pulsar-hank -n hank-test -f dev-values.yaml ./pulsar/
    Output: Error: found in Chart.yaml, but missing in charts/ directory: kube-prometheus-stack, cert-manager, keycloak
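A likely workaround (assuming the dependency repositories declared in Chart.yaml are reachable; they may first need to be added with helm repo add) is to vendor the subcharts into charts/ before installing:

helm dependency update ./pulsar
helm install pulsar-hank -n hank-test -f dev-values.yaml ./pulsar/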

Unable to specify loadBalancerIP on Services

I'd like to set an internal static IP on a service of type LoadBalancer, but there is no configuration available to define this.
Could we get a service.loadBalancerIP option for the pulsar services so this could be set via helm?
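For illustration, the requested option could look like the following in values.yaml (the proxy.service.loadBalancerIP key is hypothetical; it does not exist in the chart today, which is the point of this request):

proxy:
  service:
    type: LoadBalancer
    loadBalancerIP: 10.0.0.50   # hypothetical: internal static IP to assign to the load balancer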

externalDNS version doesn't work with k8s 1.22

The version of externalDNS image used by the chart does not work in k8s 1.22.

The following version works: k8s.gcr.io/external-dns/external-dns:v0.10.2

Also, since the older Ingress API versions were removed, the ClusterRole rules need to be updated to this:

rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get","watch","list"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get","watch","list"]
  - apiGroups: ["networking","networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get","watch","list"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get","watch","list"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get","watch","list"]

[Functions] Failed to resolve 'pulsar-broker.mypulsar.svc.cluster.local'

When deploying pulsar with TLS enabled and running a source function, the following error is logged by the spawned function pod:

java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: java.net.UnknownHostException: Failed to resolve 'pulsar-broker.mypulsar.svc.cluster.local' after 2 queries 
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:704) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at org.apache.pulsar.client.impl.ConnectionPool.lambda$createConnection$10(ConnectionPool.java:226) ~[java-instance.jar:?]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[java-instance.jar:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[java-instance.jar:?]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:391) ~[java-instance.jar:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[java-instance.jar:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[java-instance.jar:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[java-instance.jar:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: java.net.UnknownHostException: Failed to resolve 'pulsar-broker.mypulsar.svc.cluster.local' after 2 queries 
	... 8 more

Repro:

  1. Deploy pulsar with TLS and functions enabled (values below)
  2. Create a partitioned topic and subscription
  3. Generate a source function:
                        bin/pulsar-admin sources create \
                          -t data-generator --name data-generator-source \
                          --source-config '{"sleepBetweenMessages":"10"}' \
                          --destination-topic-name persistent://public/default/test

Values:

enableAntiAffinity: no
enableTls: yes
tls:
  function:
    enableTlsWithBroker: true
    enableHostnameVerification: true
cert-manager:
  enabled: true
createCertificates:
  selfSigned:
    enabled: true
enableTokenAuth: yes
autoRecovery:
  enableProvisionContainer: yes
restartOnConfigMapChange:
  enabled: yes
image:
  zookeeper:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
  bookie:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
  bookkeeper:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
  autorecovery:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
  broker:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
  proxy:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
  functions:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
  function:
    repository: datastaxlunastreaming-all
    tag: 2.7.2_1.1.32
extra:
  broker: false
  brokerSts: true
  function: yes
  burnell: yes
  burnellLogCollector: yes
  pulsarHeartbeat: yes
  pulsarAdminConsole: yes
  functionsAsPods: yes
default_storage:
  existingStorageClassName: server-storage
volumes:
  data: #ASF Helm Chart
    storageClassName: existent-storage-class
zookeeper:
  replicaCount: 3
bookkeeper:
  replicaCount: 3
broker:
  component: broker
  replicaCount: 2
  ledger:
    defaultEnsembleSize: 1
    defaultAckQuorum:  1
    defaultWriteQuorum: 1
function:
  replicaCount: 1
  functionReplicaCount: 1
  runtime: "kubernetes"
proxy:
  disableZookeeperDiscovery: true
  useStsBrokersForDiscovery: true
  replicaCount: 2
  autoPortAssign:
    enablePlainTextWithTLS: yes
  service:
    type: ClusterIP
    autoPortAssign:
      enabled: yes
grafanaDashboards:
  enabled: yes
pulsarAdminConsole:
  replicaCount: 0
  service:
    type: ClusterIP
grafana: #ASF Helm Chart
  service:
    type: ClusterIP
pulsar_manager:
  service: #ASF Helm Chart
    type: ClusterIP
kube-prometheus-stack: # Luna Streaming Helm Chart
  enabled: no
  prometheusOperator:
    enabled: no
  grafana:
    enabled: no
    service:
      type: ClusterIP
pulsarSQL:
  service:
    type: ClusterIP

Prepare next release with support for OpenShift deployment

This is an issue to track the tasks for preparing the release with support for OpenShift deployment

Tasks

  • Publish docker image for datastax/burnell
  • Update datastax/burnell docker image tag in values.yaml
  • Publish docker image for datastax/pulsar-heartbeat
  • Update datastax/pulsar-heartbeat docker image tag in values.yaml
  • Publish docker image for datastax/pulsar-admin-console
  • Update pulsar-admin-console docker image tag and make deployment to support rootless image - update helm-chart-sources/pulsar/templates/admin-console/pulsar-admin-console-deployment.yaml: /root/dashboard/dist/config-override.js -> /home/appuser/dashboard/dist/config-override.js
  • Add documentation for OpenShift deployment
  • Add tests for kube-prometheus-stack upgrade to 16.x.x and merge #22 after tests exist
  • Test OpenShift deployment with TLS
  • Add documentation for setting up TLS with OpenShift
  • Test OpenShift monitoring integration where the built-in OpenShift Prometheus operator is used
  • Add documentation for OpenShift monitoring setup

Cannot set existing StorageClass with `existingStorageClassName`.

Problem

I tried setting existingStorageClassName both under the global default_storage and under a specific volume, pointing to an existing StorageClass ebs-pulsar, but the created PVCs still use the default StorageClass ebs-gp3.

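For reference, the storage class actually bound to each PVC can be checked with standard kubectl (the namespace shown is illustrative):

kubectl get pvc -n pulsar -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName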

My custom helm values:

components:
  zookeeper: true
  bookkeeper: true
  autorecovery: true
  broker: true
  functions: true
  proxy: true
  toolset: true
  pulsar_manager: true
monitoring:
  prometheus: false
  grafana: false
volumes:
  persistence: true
default_storage:
  existingStorageClassName: ebs-pulsar
antiAffinity:
  host:
    enabled: true
    mode: required
  zone:
    enabled: true
nodeSelector:
  dedicated: infrastructure
zookeeper:
  volumes:
    data:
      name: data
      size: 40Gi
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure
bookkeeper:
  volumes:
    journal:
      name: journal
      size: 20Gi
    ledgers:
      name: ledgers
      size: 100Gi
    ranges:
      name: ranges
      size: 10Gi
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure
autorecovery:
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure
broker:
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure
functions:
  volumes:
    data:
      name: logs
      size: 10Gi
      existingStorageClassName: ebs-pulsar
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure
proxy:
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure
toolset:
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure
pulsar_manager:
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: infrastructure

Env

Kubernetes: v1.21 in EKS
Helm: v3.3.4
Pulsar Chart: v2.7.6

Add documentation to disable `kube-prometheus-stack` optional features

Documentation should be added to cover the issue described below.

Disabling the conditional dependency via kube-prometheus-stack.enabled does not produce the expected behaviour of disabling all the Prometheus stack components. Each of them must be disabled individually in values.yaml by adding the following:

kube-prometheus-stack:
  enabled: false
  prometheusOperator:
    enabled: false
  grafana:
    enabled: false
    adminPassword: e9JYtk83*4#PM8
  alertmanager:
    enabled: false
  prometheus:
    enabled: false

It should be noted that unless all the Prometheus components are disabled, the helm chart will attempt to install the Prometheus CRDs. This can be an issue when the service account used to deploy the cluster does not have permission to install Custom Resource Definitions.

Pod in crashloop ‘data/zookeeper’: Permission denied

Hey there,

I am currently getting a crash loop on the zookeeper pod:

[conf/zookeeper.conf] Adding config quorumListenOnAllIPs = true
Current server id 1
Creating data/zookeeper/myid with id = 1
mkdir: cannot create directory ‘data/zookeeper’: Permission denied
kubectl get pods -n pulsar   
NAME                                                  READY   STATUS             RESTARTS   AGE
angry-shark-29-grafana-58fbb56d8f-j2ntg               2/2     Running            0          2m34s
angry-shark-29-kube-promet-operator-8bbcdbcf6-8mfpt   1/1     Running            0          2m34s
angry-shark-29-kube-state-metrics-7797486dfd-6xfm6    1/1     Running            0          2m34s
angry-shark-29-prometheus-node-exporter-mbcsk         1/1     Running            0          2m34s
angry-shark-29-prometheus-node-exporter-slxfk         1/1     Running            0          2m34s
angry-shark-29-prometheus-node-exporter-zlwk4         1/1     Running            0          2m34s
prometheus-angry-shark-29-kube-promet-prometheus-0    2/2     Running            0          2m32s
pulsar-autorecovery-59b5cbd74d-xx2qq                  1/1     Running            3          2m34s
pulsar-bastion-7f9656fdd7-czfbn                       1/1     Running            0          2m34s
pulsar-bookkeeper-0                                   0/1     Pending            0          2m59s
pulsar-broker-54cdf98dc9-pqdjd                        0/1     Pending            0          2m59s
pulsar-broker-c97d67cb8-sfd5j                         0/1     Init:0/1           0          2m33s
pulsar-proxy-559687647b-sthqj                         0/2     Pending            0          2m59s
pulsar-proxy-5fcbc8c9bd-h54fx                         0/3     Pending            0          2m33s
pulsar-zookeeper-0                                    0/1     CrashLoopBackOff   4          2m59s

Current HELM Values:

 storageValues = {
        "default_storage": {
          "provisioner": "kubernetes.io/aws-ebs",
          "type": "gp2",
          "fsType": "ext4",
          "extraParams": {
            "iopsPerGB": "10"
          }
        },
      };
const values = 
       {
        "fullnameOverride": "pulsar",
        "dnsName": "pulsar.example.com",
        "enableWaitContainers": "false",
        "rbac": {
          "create": true,
          "clusterRoles": true
        },
        "persistence": true,
        "enableAntiAffinity": false,
        "enableTls": false,
        "enableTokenAuth": false,
        "restartOnConfigMapChange": {
          "enabled": true
        },
        "extra": {
          "function": true,
          "burnell": true,
          "burnellLogCollector": true,
          "pulsarHeartbeat": true,
          "pulsarAdminConsole": true
        },
        "zookeeper": {
          "replicaCount": 1,
          "resources": {
            "requests": {
              "memory": "300Mi",
              "cpu": 0.3
            }
          },
          "configData": {
            "PULSAR_MEM": "\"-Xms300m -Xmx300m -Djute.maxbuffer=10485760 -XX:+ExitOnOutOfMemoryError\""
          }
        },
        "bookkeeper": {
          "replicaCount": 1,
          "resources": {
            "requests": {
              "memory": "512Mi",
              "cpu": 0.3
            }
          },
          "configData": {
            "BOOKIE_MEM": "\"-Xms312m -Xmx312m -XX:MaxDirectMemorySize=200m -XX:+ExitOnOutOfMemoryError\""
          }
        },
        "broker": {
          "component": "broker",
          "replicaCount": 1,
          "ledger": {
            "defaultEnsembleSize": 1,
            "defaultAckQuorum": 1,
            "defaultWriteQuorum": 1
          },
          "resources": {
            "requests": {
              "memory": "600Mi",
              "cpu": 0.3
            }
          },
          "configData": {
            "PULSAR_MEM": "\"-Xms400m -Xmx400m -XX:MaxDirectMemorySize=200m -XX:+ExitOnOutOfMemoryError\""
          }
        },
        "autoRecovery": {
          "resources": {
            "requests": {
              "memory": "300Mi",
              "cpu": 0.3
            }
          }
        },
        "function": {
          "replicaCount": 1,
          "functionReplicaCount": 1,
          "resources": {
            "requests": {
              "memory": "512Mi",
              "cpu": 0.3
            }
          },
          "configData": {
            "PULSAR_MEM": "\"-Xms312m -Xmx312m -XX:MaxDirectMemorySize=200m -XX:+ExitOnOutOfMemoryError\""
          }
        },
        "proxy": {
          "replicaCount": 1,
          "resources": {
            "requests": {
              "memory": "512Mi",
              "cpu": 0.3
            }
          },
          "wsResources": {
            "requests": {
              "memory": "512Mi",
              "cpu": 0.3
            }
          },
          "configData": {
            "PULSAR_MEM": "\"-Xms400m -Xmx400m -XX:MaxDirectMemorySize=112m\""
          },
          "autoPortAssign": {
            "enablePlainTextWithTLS": true
          },
          "service": {
            "autoPortAssign": {
              "enabled": true
            }
          }
        },
        "grafanaDashboards": {
          "enabled": true
        },
        "pulsarAdminConsole": {
          "replicaCount": 1
        },
        "kube-prometheus-stack": {
          "enabled": true,
          "prometheusOperator": {
            "enabled": true
          },
          "grafana": {
            "enabled": true,
            "adminPassword": "***********"
          }
        }
      }

Some help/hints much appreciated

Remove default credentials from values file

There are default credentials for configuring Tardigrade and Grafana in the values file:

tardigrade:
  access: access-key-generated-with-uplink
  accessKey: 2J7EJY4xTK6uHKqnCE4nAhdGfXqy
  secretKey: 4YeYwYdsoFFpvtNFuncWcTVqSTPL
  service:
    port: 7777
    type: ClusterIP

And:

  grafana:
    enabled: true
    # namespaceOverride: "monitoring"
    testFramework:
      enabled: false
    defaultDashboardsEnabled: true
    adminPassword: ZhF9sS8B7PQSTR

These default values should be removed.

Key duplication in configMaps causing errors on some deployment tools

In these two files I found duplicate keys in the rendered ConfigMaps:

https://github.com/datastax/pulsar-helm-chart/blob/master/helm-chart-sources/pulsar/templates/broker-deployment/broker-configmap.yaml#L222
https://github.com/datastax/pulsar-helm-chart/blob/master/helm-chart-sources/pulsar/templates/zookeeper/zookeeper-configmap.yaml#L40

For example, if I run helm template . and view the broker's ConfigMap output:

apiVersion: v1
kind: ConfigMap
metadata:
  name: "pulsar-broker"
  namespace: nuri-test
  labels:
    app: pulsar
    chart: pulsar-2.0.9
    release: RELEASE-NAME
    heritage: Helm
    component: broker
    cluster: pulsar
data:
  zookeeperServers:
    pulsar-zookeeper-ca:2181
  configurationStoreServers:
    pulsar-zookeeper-ca:2181
  clusterName: pulsar
  allowAutoTopicCreationType: "non-partitioned"
  PULSAR_EXTRA_OPTS: -Dpulsar.log.root.level=info
  PULSAR_GC: -XX:+UseG1GC
  PULSAR_LOG_LEVEL: info
  PULSAR_LOG_ROOT_LEVEL: info
  PULSAR_MEM: -Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g -Dio.netty.leakDetectionLevel=disabled
    -Dio.netty.recycler.linkCapacity=1024 -XX:+ExitOnOutOfMemoryError
  backlogQuotaDefaultRetentionPolicy: producer_exception
  brokerDeduplicationEnabled: "false"
  exposeConsumerLevelMetricsInPrometheus: "false"
  exposeTopicLevelMetricsInPrometheus: "true"
  # Workaround for double-quoted values in old values files
  PULSAR_MEM: -Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ExitOnOutOfMemoryError
  PULSAR_GC: -XX:+UseG1GC 

The last two keys (PULSAR_MEM and PULSAR_GC) appear twice. Helm itself handles this without complaint.

However, for deployment automation tools such as kustomize and FluxCD this presents a problem as they depend on kyaml.

This is a known issue: kubernetes-sigs/kustomize#3480
There is even a chance that newer Helm versions will eventually reject it as well.

I haven't noticed this duplication/workaround in Apache's pulsar chart.

Would it be possible to remove the duplication, or handle it in a way that does not produce duplicate keys in the ConfigMaps?
There might be other instances of this duplication in other ConfigMaps; I haven't gone through all of them.
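For anyone reproducing this without kustomize, piping the rendered templates through a strict YAML parser surfaces the duplicates; for example, assuming yamllint is installed:

helm template . | yamllint -d "{rules: {key-duplicates: enable}}" -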

Thank you.

Enable annotations on services

The following services do not support annotations, which means that on AWS you cannot control the load balancer type used:

  • broker
  • pulsarSql
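For illustration, the desired values shape could look like this (the service.annotations keys for broker and pulsarSQL are hypothetical, since they do not exist yet; the AWS annotation is just an example):

broker:
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: nlb   # example AWS annotation
pulsarSQL:
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: nlb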

Add an option to delete PVCs while uninstalling the Helm Chart

While running tests on GKE with Fallout, we found that disks are not released even when the GKE cluster is deleted.
This is because the PVCs created by the Helm chart are not deleted when the chart is uninstalled.

It would be great to have an option to automatically delete the PVCs.
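In the meantime, a manual cleanup along these lines works; the app=pulsar label selector is an assumption, so verify it first with kubectl get pvc --show-labels:

helm uninstall pulsar -n pulsar
kubectl delete pvc -n pulsar -l app=pulsar   # assumes the chart's PVCs carry the app=pulsar label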

PulsarHeartBeat and Bastion components fail to find "rootCaSecretName"

When TLS is enabled and a customized root CA secret name (e.g. tls-ss-ca) is provided (see below), the PulsarHeartBeat and Bastion components fail to initialize with the error "MountVolume.SetUp failed for volume "certs" : secret "tls-ss-ca" not found".

tls:
  ... ... 
   rootCaSecretName: "tls-ss-ca"

Checking the secrets, no CA certificate secret is created with the customized name (e.g. 'tls-ss-ca'). However, a CA certificate secret with the default name 'pulsar-ss-ca' is created:

% kubectl get secrets | grep ss-ca
pulsar-ss-ca                                    kubernetes.io/tls                     3      7m32s

Move bookkeeper metadata initialization to its own job

Currently, bookkeeper shell metaformat --nonInteractive is called every time a bookie is started:

# This initContainer will make sure that the bookkeeper
# metadata is in zookeeper
- name: pulsar-bookkeeper-metaformat
  image: "{{ .Values.image.bookkeeper.repository }}:{{ .Values.image.bookkeeper.tag }}"
  imagePullPolicy: {{ .Values.image.bookkeeper.pullPolicy }}
  command: ["sh", "-c"]
  args:
    - >
      bin/apply-config-from-env.py conf/bookkeeper.conf &&
      bin/apply-config-from-env.py conf/bkenv.sh &&
      bin/bookkeeper shell metaformat --nonInteractive || true;

The bookkeeper shell metaformat command has been deprecated since 4.7.0:
https://bookkeeper.apache.org/docs/latest/reference/cli/#bookkeeper-shell-metaformat

This command is deprecated since 4.7.0, in favor of using initnewcluster for initializing a new cluster and nukeexistingcluster for nuking an existing cluster.

Consider moving the bookkeeper metadata initialization to its own job.
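A minimal sketch of what such a Job could look like (not the chart's actual template; it reuses the same image values and config scripts as the current initContainer and omits the env/ConfigMap wiring for brevity):

apiVersion: batch/v1
kind: Job
metadata:
  name: pulsar-bookkeeper-metadata-init   # illustrative name
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: bookkeeper-metadata-init
          image: "{{ .Values.image.bookkeeper.repository }}:{{ .Values.image.bookkeeper.tag }}"
          imagePullPolicy: {{ .Values.image.bookkeeper.pullPolicy }}
          command: ["sh", "-c"]
          args:
            - >
              bin/apply-config-from-env.py conf/bookkeeper.conf &&
              bin/apply-config-from-env.py conf/bkenv.sh &&
              bin/bookkeeper shell initnewcluster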

Error while installing: " unable to build kubernetes objects from release manifest"

Dears,
After following your instructions:

helm repo add datastax-pulsar https://datastax.github.io/pulsar-helm-chart
helm repo update
curl -LOs https://datastax.github.io/pulsar-helm-chart/examples/dev-values.yaml
helm install pulsar -f dev-values.yaml datastax-pulsar/pulsar

I get this:

$ helm install pulsar -f dev-values.yaml datastax-pulsar/pulsar
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [ValidationError(Prometheus.spec): unknown field "probeNamespaceSelector" in com.coreos.monitoring.v1.Prometheus.spec, ValidationError(Prometheus.spec): unknown field "probeSelector" in com.coreos.monitoring.v1.Prometheus.spec, ValidationError(Prometheus.spec): unknown field "shards" in com.coreos.monitoring.v1.Prometheus.spec]

Is there a bug in your templates?

Regards

Service Annotations in pulsar admin console incorrectly scoped

At this line:

The Helm values name for service annotations is .Values.pulsarAdminConsole.annotations, but it should be scoped to the service, consistent with the other components, as .Values.pulsarAdminConsole.service.annotations.

The workaround is to change your values.yaml to match, but that is then inconsistent with the standard and with the other components of the chart.
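For consistency, the values would then look like this (the annotation shown is just an example):

pulsarAdminConsole:
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"   # example annotation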

DNS resolution errors with broker host names returned by Pulsar lookups

There's currently a conflicting problem with the Pulsar k8s deployment and how Pulsar load balancing works.

When a Pulsar broker starts, it will register itself as a broker in the internal Pulsar load balancer. Pulsar load balancer might immediately assign new namespace bundles to the broker and the topics might immediately get requests.

The conflicting problem is that DNS resolution for the broker's host name will fail with the current settings until the broker's readiness probe succeeds.

Pulsar might already return the hostname of a specific broker to a client, but the client cannot resolve the DNS name since the broker's readiness probe hasn't passed. This causes extra delays and also bugs when connecting to topics after a load balancing event. Pulsar clients usually backoff and retry. For Admin API HTTP requests, clients might not properly handle errors and for example Pulsar Proxy will fail the request when there's a DNS lookup issue.

Solution: the broker StatefulSet's service should use publishNotReadyAddresses: true.
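A sketch of the proposed setting on the broker headless service (names and ports are illustrative, not the chart's exact template):

apiVersion: v1
kind: Service
metadata:
  name: pulsar-broker
spec:
  clusterIP: None                  # headless service for the broker StatefulSet
  publishNotReadyAddresses: true   # publish DNS records before the readiness probe passes
  selector:
    component: broker
  ports:
    - name: pulsar
      port: 6650
    - name: http
      port: 8080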

There's useful information about stateful sets and publishNotReadyAddresses setting:
k8ssandra/cass-operator#18

There's an alternative solution in #198 which is fine for cases where TLS is disabled for brokers. Stable hostnames are required when using TLS to be able to do hostname verification for the certificates.

Review Port Definitions to Ensure Chart Flexibility

Observation

The current chart has hard-coded ports throughout. A good example is the Pulsar Admin Console's nginx configuration, which hard-codes the ports of the pulsar proxy service. However, the pulsar proxy service allows its ports to be defined. If a user were to deploy the chart with non-default ports for the proxy service (and possibly other services), the components might not integrate properly.

Solution

Review all hard-coded ports and use .Values to make them configurable. In the case of services, it can be easier to declare target ports using the pod's port names instead of the port numbers; using names makes the templates more readable and reduces the configuration necessary for a service (see the sketch below). This feature is described here: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service.
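A small example of the named-port approach (generic Kubernetes, not tied to this chart): the Service references the pod's port by name, so renumbering the containerPort does not require touching the Service:

apiVersion: v1
kind: Service
metadata:
  name: pulsar-proxy
spec:
  selector:
    component: proxy
  ports:
    - name: http
      port: 8080
      targetPort: http   # resolves to the containerPort named "http" in the pod spec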

Istio service mesh compatibility

Currently, the way jobs start in this helm chart interferes with the use of an Istio service mesh.

However, Pulsar would otherwise be very compatible with a service mesh, and that would move the complexity of mTLS and ingress routing into a separate aspect of the environment, which is often desirable.

This issue can be reproduced as follows with a basic istio installation. From a new installation:

kubectl create namespace pulsar
kubectl label namespace pulsar istio-injection=enabled
helm upgrade --install  \
   pulsar  datastax-pulsar/pulsar \
  --namespace pulsar \
  --create-namespace 

This will result in a set of initialization jobs that hang.

If instead we disable istio injection in the namespace (kubectl label namespace pulsar istio-injection-), install DataStax Luna Pulsar via the helm chart, then re-enable istio and cycle the various pods that take part in the data plane, the system works as expected.

This indicates that a set of tweaks to the jobs (e.g. setting the pod label sidecar.istio.io/inject=false; see the sketch after the job list below) may make this chart compatible with istio.

Jobs observed:

  • pulsar-kube-prometheus-sta-admission-create
  • pulsar-kube-prometheus-sta-admission-patch
  • pulsar-dev-zookeeper-metadata

These jobs will not reach a completed state unless the associated istio-proxy container exits successfully (i.e. shell into the istio-proxy container and run kill -TERM 1). An alternative is to add a preStop condition to the main container that calls curl -sf -XPOST http://127.0.0.1:15020/quitquitquit to tell the istio-proxy to exit; another idea is a dedicated additional sidecar to manage this. See https://discuss.istio.io/t/best-practices-for-jobs/4968/2 for more.
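A sketch of the label-based approach: the fragment below would be merged into each affected Job's pod template (assuming standard Istio injection controls):

spec:
  template:
    metadata:
      labels:
        sidecar.istio.io/inject: "false"   # do not inject the istio-proxy sidecar into this Job's pods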

PulsarSQL worker node readiness and liveness probes fail

The readiness and liveness probes fail for the pulsarSQL worker nodes.

For example, from the pod in question:

pulsar@pulsar-sql-worker-84b789f6bc-wtztr:/pulsar/conf$ hostname -i
192.168.14.34

Calling the status endpoint as localhost fails:

pulsar@pulsar-sql-worker-84b789f6bc-wtztr:/pulsar/conf$ curl http://localhost:8090/v1/service/presto
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/v1/service/presto</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>org.glassfish.jersey.servlet.ServletContainer-5f6e2ad9</td></tr>
</table>

</body>
</html>

As does calling the status endpoint via the pod IP:

pulsar@pulsar-sql-worker-84b789f6bc-wtztr:/pulsar/conf$ curl http://192.168.14.34:8090/v1/service/presto
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/v1/service/presto</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>org.glassfish.jersey.servlet.ServletContainer-5f6e2ad9</td></tr>
</table>

</body>
</html>

Release request

Hello,

What is the schedule for releases? We'd like to use a previously patched commit but we're just waiting for the next release.
Thank you again for the continuous patches.

This is the patch we're after.
#141

Thanks!

Liveness check fails due to the way the broker is deployed as a StatefulSet

  1. Change the broker deployment from a Deployment to a StatefulSet
  2. Execute the helm install command
  3. The Pulsar heartbeat and function pods cannot start due to connection errors

It works fine after updating broker.component from broker to brokersts in the dev-values.yaml file. Is this the expected way?

Zookeeper service accessed by shortname

It is only possible to access the Zookeeper service via its short name (pulsar-zookeeper-ca:2181).

I understand that this works for DNS resolution since the brokers live in the same namespace and the domain is present in the search option of /etc/resolv.conf. However, we'd like to avoid generating certificates for short names. Is it possible to add an option to append a full domain to the zookeeper service?

For example something like:
{{ template "pulsar.fullname" . }}-{{ .Values.zookeeper.component }}-ca{{- if .Values.zookeeper.domain -}}.{{ .Values.zookeeper.domain }}{{- end -}}:2281

Thanks

Support existing Keycloak instance

Thanks for the OIDC plugin and including a setup for Keycloak in this helm chart!

For those of us who already have existing Keycloak instances, it would be great to be able to leverage those by configuring the Keycloak component to point to our existing instance rather than deploying a new one.

superUserRoles shouldn't need to contain the proxy role

Currently the sample "superUserRoles" is superuser,admin,websocket,proxy.

Why does this include all possible roles?

One reason seems to be that token generation with Burnell uses the "SuperRoles" environment variable, populated from .Values.superUserRoles, to generate the tokens.

- name: SuperRoles
  value: {{ .Values.superUserRoles }}

https://github.com/datastax/burnell/blob/5a7c261e498ff5b34356b0c164d357e9f3a8b81b/src/workflow/keys-jwt.go#L98

Update Ingress API Version

Sample warning when using the Ingress for PulsarSQL:

W0128 16:06:08.502415 22507 warnings.go:70] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
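A minimal sketch of the equivalent resource under the newer API (name, host, and port are illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pulsar-sql
spec:
  rules:
    - host: sql.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pulsar-sql
                port:
                  number: 8090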

Upgrade Circle CI Base Image

[Action Required] Ubuntu 14.04 machine image deprecation & EOL

We are deprecating Ubuntu 14.04-based machine images on CircleCI in preparation for an EOL on Tuesday, May 31, 2022 to ensure your builds remain secure. For a detailed overview of how this will affect your workflow, read the blog article here.

We will also be conducting temporary brownouts on Tuesday, March 29, 2022, and again on Tuesday, April 26, 2022 during which these images will be unavailable.

We are contacting you because one or more of your projects has a job that either:

  • does not specify an image (uses machine: true in config)
  • explicitly uses an Ubuntu 14.04-based image

Jobs that do not specify an image default to using an Ubuntu 14.04-based image.

If you have specified an Ubuntu 14.04-based image or you are using machine: true in your config file, please see our migration guide to upgrade to a newer version of Ubuntu image in order to avoid any service disruption during the brownout & subsequent EOL.

We will also be releasing a CircleCI Ubuntu 22.04 image on April 22nd offering the flexibility to upgrade to the latest LTS version of Ubuntu image before we remove older versions permanently. A beta version of the image will be available March 21st.
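A minimal sketch of pinning a newer machine image in .circleci/config.yml (the image tag is an example; pick the one that fits the project):

version: 2.1
jobs:
  build:
    machine:
      image: ubuntu-2004:current   # replaces machine: true / the old Ubuntu 14.04 default
    steps:
      - checkout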

Allow configuring liveness and readiness probe timeouts

Kubernetes has a default probe timeout of 1 second.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes

timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.

The default 1-second timeout causes the probes to fail intermittently, which can cause undesired restarts.

Make the liveness and readiness probe timeouts configurable and set the default value to 5 seconds to prevent undesired restarts.
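For illustration, this is what the proposal amounts to on a probe definition (the path and port are illustrative, not the chart's exact values):

livenessProbe:
  httpGet:
    path: /status.html
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5   # proposed default instead of the Kubernetes default of 1 second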

In the k8s docs it says

"Before Kubernetes 1.20, the field timeoutSeconds was not respected for exec probes: probes continued running indefinitely, even past their configured deadline, until a result was returned."
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes

Fix org.apache.kafka.common.errors.UnknownServerException on ktool deployment

When deploying the helm chart with the following values:

              enableAntiAffinity: no
              initialize: true  # ASF Helm Chart
              restartOnConfigMapChange:
                enabled: yes
              image:
                zookeeper:
                  repository: pulsar-repo
                  tag: 2.8.x
                bookie:
                  repository: pulsar-repo
                  tag: 2.8.x
                bookkeeper:
                  repository: pulsar-repo
                  tag: 2.8.x
                autorecovery:
                  repository: pulsar-repo
                  tag: 2.8.x
                broker:
                  repository: pulsar-repo
                  tag: 2.8.x
                proxy:
                  repository: pulsar-repo
                  tag: 2.8.x
                functions:
                  repository: pulsar-repo
                  tag: 2.8.x
                function:
                  repository: pulsar-repo
                  tag: 2.8.x
              extra:
                function: yes
                burnell: no
                burnellLogCollector: no
                pulsarHeartbeat: no
                pulsarAdminConsole: no
                autoRecovery: no
                functionsAsPods: yes
              default_storage:
                existingStorageClassName: server-storage
              volumes:
                data: #ASF Helm Chart
                  storageClassName: existent-storage-class
              zookeeper:
                replicaCount: 3
              bookkeeper:
                replicaCount: 3
              broker:
                component: broker
                replicaCount: 2
                ledger:
                  defaultEnsembleSize: 2
                  defaultAckQuorum:  2
                  defaultWriteQuorum: 2
                service:
                  annotations: {}
                  type: ClusterIP
                  headless: false
                  ports:
                  - name: http
                    port: 8080
                  - name: pulsar
                    port: 6650
                  - name: https
                    port: 8443
                  - name: pulsarssl
                    port: 6651
                  - name: kafkaplaintext
                    port: 9092
                  - name: kafkassl
                    port: 9093
                  - name: kafkaschemareg
                    port: 8001
                kafkaOnPulsarEnabled: true
                kafkaOnPulsar:
                  saslAllowedMechanisms: PLAIN
                  brokerEntryMetadataInterceptors: "org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor,org.apache.pulsar.common.intercept.AppendBrokerTimestampMetadataInterceptor"
                  kopSchemaRegistryEnable: true
              function:
                replicaCount: 1
                functionReplicaCount: 1
                runtime: "kubernetes"
              proxy:
                replicaCount: 2
                autoPortAssign:
                  enablePlainTextWithTLS: yes
                service:
                  type: ClusterIP
                  autoPortAssign:
                    enabled: yes
                configData:
                  PULSAR_MEM: "\"-Xms400m -Xmx400m -XX:MaxDirectMemorySize=112m\""
                  PULSAR_PREFIX_kafkaListeners: "SASL_PLAINTEXT://0.0.0.0:9092"
                  PULSAR_PREFIX_kafkaAdvertisedListeners: "SASL_PLAINTEXT://pulsar-proxy:9092"
                  PULSAR_PREFIX_saslAllowedMechanisms: PLAIN
                  PULSAR_PREFIX_kafkaProxySuperUserRole: superuser
                  PULSAR_PREFIX_kopSchemaRegistryProxyEnableTls: "false"
                  PULSAR_PREFIX_kopSchemaRegistryEnable: "true"
                  PULSAR_PREFIX_kopSchemaRegistryProxyPort: "8081"
                extensions:
                  enabled: true
                  extensions: "kafka"
                  containerPorts:
                    - name: kafkaplaintext
                      containerPort: 9092
                    - name: kafkassl
                      containerPort: 9093
                    - name: kafkaschemareg
                      containerPort: 8081
                  servicePorts:
                  - name: kafkaplaintext
                    port: 9092
                    protocol: TCP
                    targetPort: kafkaplaintext
                  - name: kafkassl
                    port: 9093
                    protocol: TCP
                    targetPort: kafkassl
                  - name: kafkaschemareg
                    port: 8081
                    protocol: TCP
                    targetPort: kafkaschemareg
              grafanaDashboards:
                enabled: no
              pulsarAdminConsole:
                replicaCount: 0
                service:
                  type: ClusterIP
              grafana: #ASF Helm Chart
                service:
                  type: ClusterIP
              pulsar_manager:
                service: #ASF Helm Chart
                  type: ClusterIP
              kube-prometheus-stack: # Luna Streaming Helm Chart
                enabled: no
                prometheusOperator:
                  enabled: no
                grafana:
                  enabled: no
                  adminPassword: 123
                  service:
                    type: ClusterIP
              pulsarSQL:
                service:
                  type: ClusterIP
              enableTls: no
              enableTokenAuth: no

The following error is thrown by the ktool pod:

[2022-01-01 12:49:53,587] ERROR Failed to start KSQL (io.confluent.ksql.rest.server.KsqlServerMain:66)
io.confluent.ksql.util.KsqlServerException: Could not get Kafka cluster configuration!
	at io.confluent.ksql.services.KafkaClusterUtil.getConfig(KafkaClusterUtil.java:96)
	at io.confluent.ksql.security.KsqlAuthorizationValidatorFactory.isKafkaAuthorizerEnabled(KsqlAuthorizationValidatorFactory.java:81)
	at io.confluent.ksql.security.KsqlAuthorizationValidatorFactory.create(KsqlAuthorizationValidatorFactory.java:51)
	at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:724)
	at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:637)
	at io.confluent.ksql.rest.server.KsqlServerMain.createExecutable(KsqlServerMain.java:152)
	at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:59)
Caused by: org.apache.kafka.common.errors.UnknownServerException: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /10.244.4.18:9092

This is resolved by reverting ab5216de64e5868a378a2a08f0ff6efa0f0430ef, as can be seen by using that "version" of the helm chart. However, a better solution is needed.

Cannot start CertManager on latest minikube v1.23.0

I am trying to configure Pulsar with TLS on minikube on Mac, but the default/pulsar-cert-manager-cainjector-564d757c9f-hzpps:cert-manager pod errors with:

I0910 06:45:11.336198 1 request.go:645] Throttling request took 1.0469241s, request: GET:https://10.96.0.1:443/apis/authentication.k8s.io/v1?timeout=32s
E0910 06:45:12.194358 1 start.go:151] cert-manager/ca-injector "msg"="Error registering certificate based controllers. Retrying after 5 seconds." "error"="no matches for kind "MutatingWebhookConfigurat
Error: error registering secret controller: no matches for kind "MutatingWebhookConfiguration" in version "admissionregistration.k8s.io/v1beta1"
Usage:
  ca-injector [flags]

I have followed the instructions in the README, and it used to work with a standard k8s installation.

Minikube version:

minikube version
minikube version: v1.23.0
commit: 5931455374810b1bbeb222a9713ae2c756daee10

Pulsar admin in bastion does not work

Today I installed this helm chart on a vanilla k8s environment, but I am having trouble using pulsar-admin:


(ctool-env) enrico.olivelli@eolivelli-rmbp16 pulsar % kubectl exec $(kubectl get pods -l component=bastion -o jsonpath="{.items[*].metadata.name}") -it -- /bin/bash
root@pulsar-bastion-7489594b85-fxjz7:/pulsar# 
root@pulsar-bastion-7489594b85-fxjz7:/pulsar# 
root@pulsar-bastion-7489594b85-fxjz7:/pulsar# 
root@pulsar-bastion-7489594b85-fxjz7:/pulsar# bin/pulsar-admin tenants list
Warning: Nashorn engine is planned to be removed from a future JDK release

null

Reason: java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: Could not complete the operation. Number of retries has been exhausted. Failed reason: Connection refused: pulsar-proxy/10.109.252.132:8080

we have two problems here:

  • it does not work: Connection refused: pulsar-proxy/10.109.252.132:8080
  • I am still seeing the "Nashorn engine is planned to be removed from a future JDK release" warning, which has already been removed by Ming

Pulsar Broker metadata initialization should use the given Broker image instead of Zookeeper image

In the chart, the Pulsar Broker metadata initialization is called "zookeeperMetadata", which is misleading. It uses the Zookeeper image, which is wrong; the Pulsar Broker image should be used for initializing the Pulsar Broker metadata.

image: "{{ .Values.image.zookeeper.repository }}:{{ .Values.image.zookeeper.tag }}"
imagePullPolicy: {{ .Values.image.zookeeper.pullPolicy }}
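The proposed change would look roughly like this (assuming the chart's existing image.broker values, with a pullPolicy key parallel to the other components):

image: "{{ .Values.image.broker.repository }}:{{ .Values.image.broker.tag }}"
imagePullPolicy: {{ .Values.image.broker.pullPolicy }}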
