ibm / event-streams Goto Github PK

IBM Event Streams issues repository. For more details, see the website:

Home Page: https://ibm.github.io/event-streams/

License: Apache License 2.0

Ruby 0.10% HTML 78.45% CSS 2.88% JavaScript 8.05% Shell 2.76% SCSS 7.76%

event-streams's Introduction

Archived repository for Event Streams documentation site

This repository was used to host the source files for the IBM Event Streams end-user documentation site. Event Streams is now part of IBM Event Automation.

event-streams's People

Contributors

Stargazers

Watchers

Forkers

gavin15 pabhinav ajborley jackcarnes tagarr tinaxq bhaskers-blu-org1 alexandrezanetti silaman tmitchell10 hguzha wdrdres3qew5ts21 vigneshibmpub ghas-results

event-streams's Issues

Schema Registry auto-schema registration id generation problem

Issue Description

The Event Streams Schema Registry uses a hash function to generate a consistent ID for new schemas based on the subject name. It produces a 32-bit unsigned int.

The Java Confluent client expects an int
https://github.com/confluentinc/schema-registry/blob/a91a82f795cc5afd1d85545ce14af78d174a52b7/client/src/main/java/io/confluent/kafka/schemaregistry/client/rest/entities/requests/RegisterSchemaResponse.java#L28

This means that for some subject names, the Java client throws an exception when trying to process the API response from the schema registry:

org.apache.kafka.common.errors.SerializationException Error serializing Avro message
Caused by: com.fasterxml.jackson.databind.JsonMappingException Numeric value (2257871337) out of range of int
at [Source: (sun.net.www.protocol.http.HttpURLConnection$HttpInputStream); line: 1, column: 17]
at [Source: (sun.net.www.protocol.http.HttpURLConnection$HttpInputStream); line: 1, column: 7] (through reference chain: io.confluent.kafka.schemaregistry.client.rest.entities.requests.RegisterSchemaResponse["id"])
at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:391)
at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:351)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1711)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:290)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4013)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3077)
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:188)

Environment

IBM Event Streams Version: 2019.4.2

Zookeeper pods restart reporting 137 OOM Errors

Issue Description

When running IBM Event Streams 2018.3.0, the zookeeper pods repeatedly restart with the following shown by a kubectl logs --previous command called against the failing container:

Wrote ZooKeeper configuration file to /var/lib/zookeeper/conf/zoo.cfg
Creating ZooKeeper log4j configuration
Wrote log4j configuration to /var/lib/zookeeper/conf/log4j.properties
Creating JVM configuration file
Wrote JVM configuration to /var/lib/zookeeper/conf/java.env
ZooKeeper JMX enabled by default
Using config: /var/lib/zookeeper/conf/zoo.cfg
[WARN  tini (1)] Reaped zombie process with pid=38

A kubectl describe of the failing pod shows the following for the Zookeeper container:

   Last State:    Terminated
     Reason:      OOMKilled
     Exit Code:   137

Taking a javacore of a running Zookeeper process showed that the amount of memory in use was greater than what had been allocated for use by the container.

Running dmesg on the VM hosting the restarting Zookeeper shows pagefault_out_of_memory errors and the oom-killer being invoked on the Zookeeper JRE.

Issue Resolution

The default memory limits for the Zookeeper pods specified in the Event Streams install charts have been increased so that sufficient resources are available.

Workaround

Update the resources available to the Zookeeper container using the information here

https://ibm.github.io/event-streams/administering/scaling/

Fix details

IBM Internal issue number - 1844
Fix target - 2018.3.1

Placeholder issue

Issue Description

Placeholder for internal issue 2078

Environment

IBM Event Streams Version:
IBM Cloud Private (ICP) Version:
Operating system of ICP install:
Browser (for UI issues):

Is the Kafka REST API available for ICP EventStreams?

I've noticed the Kafka REST API is available in our cloud EventStreams offering:

https://console.bluemix.net/docs/services/EventStreams/eventstreams025.html#rest_using

Is it also available for the ICP offering? Thanks!

LDAP issues-auth-pod "bug"

Issue Description

error with the following pod after install of ICP 3.1.1 and configuration of LDAP
icp-platform-auth

Describe the problem
After the installation of ICP 3.1.1, the cp-platform-auth starts logging several errors; This causes the Authentication to LDAP random and sporadic
Describe the application flow or what steps were being taken when the problem occurred
Configuring LDAP
If the issue is reproducible provide a set of instructions describing how to reproduce it, including screenshots if they might assist
Once the environment is installed and LDAP is configured, random failures with logon
Include any error messages, stack traces and logs associated with the error, in full, in text format (ie not a screenshot)
Pod Logs....
RROR ] CWWKS1617E: A userinfo request was made with an access token that was not recognized. The request URI was /oidc/endpoint/OP/userinfo.
[ERROR ] CWWKS1617E: A userinfo request was made with an access token that was not recognized. The request URI was /oidc/endpoint/OP/userinfo.
[ERROR ] CWWKS1617E: A userinfo request was made with an access token that was not recognized. The request URI was /oidc/endpoint/OP/userinfo.
[ERROR ] CWWKS1617E: A userinfo request was made with an access token that was not recognized. The request URI was /oidc/endpoint/OP/userinfo.
[ERROR ] CWWKS1617E: A userinfo request was made with an access token that was not recognized. The request URI was /oidc/endpoint/OP/userinfo.

Warning Unhealthy 7m kubelet, x.x.x.x Readiness probe failed: Get https://10.1.154.33:4300/: dial tcp 10.1.154.33:4300: connect: connection refused
Warning Unhealthy 6m (x5 over 7m) kubelet, 9.59.150.18 Readiness probe failed: Get https://10.1.154.33:9443/: dial tcp 10.1.154.33:9443: connect: connection refused
Normal Pulled 3m (x2 over 7m) kubelet, x.x.x.x Container image “mycluster.icp:8500/ibmcom/icp-identity-manager:3.1.1” already present on machine
Warning Unhealthy 2m (x4 over 5m) kubelet, x.x.x.x Readiness probe failed: Get https://10.1.154.33:9443/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Run the log collection script available at https://github.com/IBM/charts/blob/master/stable/ibm-eventstreams-dev/additionalFiles/get-logs.sh and attach the output to this issue
Describe the expected behaviour
No Errors and LDAP Auth to work without random failures
Did this used to work? If so what has changed, for example was any maintenance of the system performed?
No

Environment

IBM Event Streams Version:
IBM Cloud Private (ICP) Version: 3.1.1
Operating system of ICP install: Ubuntu 16.04
Browser (for UI issues):

Kafka messages with duplicate header keys prevent the message browser from displaying in the UI

Issue Description

When a producer for a topic creates message headers and more than 1 header has the same key, the messages cannot be displayed in the messages panel for the topic in the Event Streams UI.

The messages panel will display an error page but no notifications will be issued. To diagnose the problem, you need to inspect the network responses to requests in the browser when the Messages tab is opened for a selected topic. The request to look for is a POST https:///api/admin/gql/queries. The status code in the response for that request will be set to 200 ok but the response payload will contain an error similar to the following:

{"errors":[{"message":"Exception while fetching data (/topicData/partitions[0]/records) : Duplicate key dup-key (attempted merging values [B@5cf67132 and [B@fd8fe03a)","locations":[{"line":6,"column":7}],"path":["topicData","partitions",0,"records"],"extensions":{"classification":"DataFetchingException"}}],"data":{"topicData":{"partitions":[{"id":0,"latestOffset":5,"records":null,"__typename":"Partition"}],"__typename":"Topic"}}}

Environment

IBM Event Streams Version: 10.1

Event Streams UI presenting certificate issued by IBM Cloud Private when installed with custom certificates

Issue Description

IBM Event Streams installed with custom certificates is presenting a certificate issued by an IBM Cloud Private organisation when accessing the Event Streams UI.

This can manifest as warnings in the web browser regarding untrusted certificates as the browser does not trust the certificate issuer.

Issue Resolution

If a user supplies custom certificates during the Event Streams installation, the Event Streams UI now uses these certificate.

Workaround

Edit the deployment <release-name>-es-ui-deploy
For the container with name proxy, change the environment variables to the following:
- TLS_CERT : change the key from https.cert to tls.cert
- TLS_KEY : change the key from https.key to tls.key

Fix details

IBM Internal issue number - 2524
Fix target - 2019.1.1

SaslAuthenticationException after installation on ICP 3.1.1

Hi,
I just installed the IBM Event Streams Community Edition-2018.3.1 into a fresh installation of ICP 3.1.1 (CE) and get a org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed, invalid credentials when I'm trying to access the UI.
Checking the logs (as noted in #10) lead me to the fact, that the log of the eventstreams-ibm-es-access-controller is full of errors like this

{"error":"Post https://xxx.yyy.zzz:8443/iam-token/oidc/token: x509: certificate signed by unknown authority","message":"failed to obtain token from IAM","message_template":"failed to obtain token from IAM","mh_file":"renewer.go","mh_line":104,"mh_ts":"Feb  6 07:53:09.287","transaction_id":"a8411f35-bde7-4524-9cf3-73972746908c"}

The installation seems to be correct, since there are no failed jobs or other obvious strange pods.
I installed the ICP with having a valid singed certificate and a custom domain name (not only published with an IP address). I also installed the eventstreams with a provided certificate/ private key (section Kafka external access configuration in the values.yaml).
Is there any chance to import the certificate of the iam service to a keystore, so it is considered as a valid certificate?
Any help is highly appreciated.

Install

Error installing Event Streams 1.0 on ICP 3.1.0. Container config errors reported. kafka and some others get the error message:
Error: secrets "eventsgreen-ibm-es-oauth-secret" not found
and/or
Error: Couldn't find key eventstreams-eventsgreen-api-key in Secret streaming/eventsgreen-ibm-es-iam-secret
Both these secrets exist. The data in the oauth secret looks good. The Iam-secret contains the annotation for the api-key.
The OS is ubuntu
Event Streams is the dev version off the IBM catalog

Unable to override bootstrap and broker routes

Issue Description

When deploying an instance of Event Streams with an external listener and overriding the host by using overrides.bootstrap/broker the host fields do not persist in the CR and the changes do not take effect.

spec:
  strimziOverrides:
    kafka:
      listeners:
        external:
          type: route
            overrides:
              bootstrap:
                host: bootstrap.myhost.com
              brokers:
                - broker: 0
                  host: broker-0.myhost.com

Environment

IBM Event Streams Version: 10.x

Issue reference: ES-6791

Unable to renew Custom CA Certificates in Event Streams cluster

Issue Description

When updating the Custom Cluster CA certificates in an Event Streams cluster, using the following documented steps https://strimzi.io/docs/operators/latest/using.html#renewing-your-own-ca-certificates-str, none of the pods in the cluster roll to pick up the changes, and the zookeeper pods start to issue the following errors:

2021-04-16 15:07:59,715 WARN Exception caught (org.apache.zookeeper.server.NettyServerCnxnFactory) [nioEventLoopGroup-7-1]
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)

Environment

IBM Event Streams Version: 10.0 onwards deployed on Openshift

CWOAU0062E: The OAuth service provider could not redirect the request when opening UI

Issue Description

When attempting to access the Event Streams admin console from the IBM Cloud Private (ICP) UI using the Launch -> admin-ui-https menu option, the Event Streams UI fails to launch and the following error message is seen in the browser window:

CWOAU0062E: The OAuth service provider could not redirect the request because the redirect URI was not valid. Contact your system administrator to resolve the problem.

The ICP topology includes separate master and proxy nodes.

Issue Resolution

The ICP Authentication Manager did not have the proxy node's ip address registered as a valid URL from where the Event Streams UI could be accessed.

Event Streams has been updated to register a redirect with the ICP Authentication Manager which allows the UI to be accessed via both the master and proxy node's ip addresses.

Workaround

When the Event Streams UI is launched from the ICP UI, it may be launched using the proxy node ip address.

As a workaround to this issue, replace the proxy ip address with the master node's ip address in the browser, which will result in the Event Streams UI launching successfully.

Fix details

IBM Internal issue number - 1679
Fix target - 2018.3.1

Test issue - please ignore

SASL authentication errors when connecting to Event Streams installed on an IBM Cloud Private cluster that has custom certificates

Issue Description

The following error is seen after install after logging into the Event Streams UI, or when initiating Kafka connections:
org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed, invalid credentials

Checking the logs for an access-controller pod (kubectl logs <release>-ibm-es-access-controller-<id>) shows:

{"error":"Post https://xxx.yyy.zzz:8443/iam-token/oidc/token: x509: certificate signed by unknown authority","message":"failed to obtain token from IAM","message_template":"failed to obtain token from IAM","mh_file":"renewer.go","mh_line":104,"mh_ts":"Feb  6 07:53:09.287","transaction_id":"a8411f35-bde7-4524-9cf3-73972746908c"}

This can occur when IBM Event Streams is installed on an IBM Cloud Private (ICP) that was installed with custom certificates.

Issue Resolution

Not yet available

Workaround

After installing Event Streams, a Kubernetes secret containing the certificate used to verify connections from Event Streams to ICP must be updated to contain the correct certificate to verify the custom icp-router certificate supplied on install of ICP.

Copy the proxy secret onto your local machine: kubectl -n <event-streams-namespace> get secret <release-name>-ibm-es-proxy-secret --export -o yaml > proxy-secret.yaml
Copy proxy-secret.yaml to proxy-secret.bak so that you have a backup of the secret
Replace the value tls.cluster inside proxy-secret.yaml with the certificate needed to verify the icp-router certificate. The certificate must be in PEM format, encoded as a base64 string. base64 <cert.pem>
- If the icp-router certificate is self signed, provide the certificate itself as the value for <cert.pem>. If the icp-router certificate is signed by a chain of certificate authorities, provide the chain in a single file as the value for <cert.pem>. The chain file will look the following:
```
-----BEGIN CERTIFICATE----- 
(Your Intermediate certificate: CA.crt) 
-----END CERTIFICATE----- 
-----BEGIN CERTIFICATE----- 
(Your Root certificate: TrustedRoot.crt) 
-----END CERTIFICATE-----
```
Apply the secret kubectl -n <event-streams-namespace> apply -f proxy-secret.yaml
Gracefully restart the access-controller and rest pods by scaling the deployments:
- kubectl -n <event-streams-namespace> scale deploy <release-name>-ibm-es-access-controller-deploy --replicas 0
- kubectl -n <event-streams-namespace> scale deploy <release-name>-ibm-es-access-controller-deploy --replicas 2
- kubectl -n <event-streams-namespace> scale deploy <release-name>-ibm-es-rest-deploy --replicas 0
- kubectl -n <event-streams-namespace> scale deploy <release-name>-ibm-es-rest-deploy --replicas 1

Fix details

IBM Internal issue number - 2514
Fix target -

Event Streams- Broker component not available in namespace.

Issue Description

After installing IBM Cloud Private-CE 3.1.1 with 1 master and 3 worker nodes on RHEL 7 operating system, I installed the Eventstreams-dev HELM release 1.1.0 on a new namespace. Then launched the Event Streams UI. The UI was showing that 1 component is not available, which turned out to the the 3 brokers. All other components were enabled. Can't proceed with testing until this component is available.

Further investigation using the kubectl commands showed that only 2/4 containers are starting in es-kafka-0, es-kafka-1 and es-kafka-2 brokers.

I have attached the logs for your further investigation. Please help.

Thanks,
Edwin

describepod_kafka-sts-0.txt
describepod_kafka-sts-1.txt
describepod_kafka-sts-2.txt
getnodes.txt
getpods.txt
logs_kafka-sts-0_container_healthcheck.txt
logs_kafka-sts-0_container_kafka.txt
logs_kafka-sts-0_container_metrics-proxy.txt
logs_kafka-sts-0_container_metrics-reporter.txt

Describe the problem
Describe the application flow or what steps were being taken when the problem occurred
If the issue is reproducible provide a set of instructions describing how to reproduce it, including screenshots if they might assist
Include any error messages, stack traces and logs associated with the error, in full, in text format (ie not a screenshot)
Describe the expected behaviour
Did this used to work? If so what has changed, for example was any maintenance of the system performed?

Environment

IBM Event Streams Version:
IBM Cloud Private (ICP) Version:
Operating system of ICP install:
Browser (for UI issues):

Geo-replication fails to start due to custom certificate IP address/DNS name mismatch

Issue Description

On an IBM Event Streams deployment installed with custom certificates, geo-replication fails when starting a topic replicator, due to SSL handshake errors.
Checking the logs for a replicator pod (kubectl logs <release>-ibm-es-replicator-deploy-<id>) shows the following exceptions:

INFO [AdminClient clientId=adminclient-5] Failed authentication with <origin-server-hostname>/<origin-server-IP-address> (SSL handshake failed) (org.apache.kafka.common.network.Selector)
WARN [AdminClient clientId=adminclient-5] Metadata update failed due to authentication error (org.apache.kafka.clients.admin.internals.AdminMetadataManager)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
    at com.ibm.jsse2.D.z(D.java:518)
    at com.ibm.jsse2.as.b(as.java:264)
    ...
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
    at com.ibm.jsse2.k.a(k.java:42)
    ... 
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching <server-hostname> found.
   ...

Issue Resolution

It is now possible to disable hostname verification in the Geo-replication component by setting the environment variable NO_HOSTNAME_VALIDATION to true in the Geo-replication deployment definition.

Fix details

IBM Internal issue number - 3627
Fix target - 2019.4.1

Cannot add a destination cluster via the UI with long URL for API Address

Issue Description

When setting up a destination cluster for Geo Replication using the Event Streams UI - inputting the connection information will trigger a validation error if the api_address hostname is longer than 100 characters
A workaround is to use the Event Streams CLI to set up the connection: https://ibm.github.io/event-streams/2019.4/georeplication/setting-up/
Resolved in Event Streams 10.0.0

Environment

IBM Event Streams Version: 2019.4.2
IBM Cloud Private (ICP) Version: any
Operating system of ICP install: any
Browser (for UI issues): any

Need for a "Dev mode" TLS free option

Hi there. It would be very handy if EventStreams could be installed in a "Dev mode" that didn't require all the certificates and the TLS and the whatnot. People kicking tires on the product want to be up and running in minutes, and although I realize security is very important, it can slow people down which causes a drop-off in enthusiasm. A switch on the helm chart to enable or disable TLS would be awesome.

Additionally, when using the beta, I had to connect to the brokers via their internal container addresses. I actually really liked this, as I had my own containers which could connect to Kafka without having to go out through the external proxy. It would be very nice to bring this feature back.

Multiple license files

There is a LICENSE (Apache) and LICENSE.txt (MIT) in this repo, that seems wrong. Which one is it?

Close all admin clients in REST

Issue Description

Numerous authentication failures appear in the rest container within the rest-deploy pod. For example:

[3/10/20 15:41:39:811 GMT] 00002d3e id=R 15:41:39.811 [kafka-admin-client-thread | username] ERROR org.apache.kafka.clients.NetworkClient processDisconnection - [AdminClient clientId=username] Connection to node 0 (es-1-4-0-ibm-es-kafka-sts-0.es-1-4-0-ibm-es-kafka-headless-svc.es.svc.cluster.local/10.131.0.219:8084) failed authentication due to: Authentication failed, invalid credentials

Numerous authentication errors appear in the kafka container within the kafka pod. For example:

[2020-03-10 15:41:33,438] INFO [SocketServer brokerId=0] Failed authentication with /127.0.0.1 (Authentication failed, invalid credentials) (org.apache.kafka.common.network.Selector)

These log entries are essentially benign but are flooding these two log files with these entries. This was observed on 2019.4.1

Payloads from NodeJS not appearing on Messages screen

Browser Version: 5.0 (Windows)
IBM Event Streams Version: IBM Event Streams Community Edition-2018.3.1
IBM Cloud Private Version: 3.1.1

Hi there. I have the latest event streams installed on ICP 3.1.1. I have two clients, one Java and one NodeJS that can both successfully attach and post messages.

The problem that I am having is that messages posted from the Java client appear in the messages console, while ones posted from NodeJS don't.

In the following screenshot:

The message at offsets 2 and 3 appear, which were from a Java client. Messages 0 and 1, which were from my NodeJS client, don't appear. However, if I search on the specific offset, then the message is indeed visible:

As far as I can tell, my Java code sends the same stringified payload as NodeJS:

Airgap installation instructions incorrect for Event Streams

Issue Description

The IBM Cloud Pak for Integration documentation that includes instructions for installations in an airgapped environment has the wrong command for Event Streams.
Visit this page: https://www.ibm.com/support/knowledgecenter/SSGT7J_20.4/install/airgap_bastion.html and note that the Event Streams command has export CASE_ARCHIVE=ibm-eventstreams-1.2.1.tgz. This is incorrect as the latest release of Event Streams has a CASE version of 1.2.0. As seen here: https://github.com/IBM/cloud-pak/tree/master/repo/case/ibm-eventstreams/1.2.0
These instructions need to be updated

Environment

IBM Event Streams Version: 10.2.0-eus

Internal issue number: cp4i/icip-issues#2079

Calling eventstreams UI on ICP 3.1.1 with correct dns name fails

Hi,
I installed IBM Event Streams Community Edition-2018.3.1 into a fresh installation of ICP 3.1.1 (CE) and now I am not able to call the UI via my registered hostname for the proxy. The browser states that there is no valid connection possible, since the page is protected by HSTS.
Calling the UI via the IP of the proxy server works - with the limitation that the issued certificate is not valid.
For the installation I used following part of the values.yaml:

proxy:
  # external IP address for access that the proxy should use
  externalEndpoint: "xxx.yyy.zzz"
# Secure connection settings for the proxy
tls:
  type: "provided"
  key: "<key for my certificate | base 64>"
  cert: "<my wildcard certificate without the chain | base 64>"
  cacert: "<the chain parts of my wildcard certificate | base 64>"

The tracing of the curl returns to the folllowing output:
curl.trace.txt

The presented certificate for the endpoint is a self-signed/self created certificate for the endpoint that enumerates all necessary alternative names in the correct way, but is not signed by the CA that was specified during the installation (also see #19). The presented certificate is located at kubectl get secret eventstreams-ibm-es-proxy-secret -o=jsonpath='{.data.https\.cert}' with the following partial output:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1549454351754155924 (0x1580c48607fd5794)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, ST=New York, L=Armonk, O=IBM Cloud Private, CN=www.ibm.com
        Validity
            Not Before: Feb  6 11:59:11 2019 GMT
            Not After : Feb  3 11:59:11 2029 GMT
        Subject: O=IBM, OU=IBM Event Streams
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:b5:eb:3b:54:93:0f:cd:26:98:a1:65:96:19:22:
                    32:c3:09:3b:1c:0a:a4:e6:c9:98:32:1c:17:a2:b9:
                    63:9b:33:9a:e8:e6:72:60:df:7d:23:51:89:5c:a2:
                    63:e9:06:74:0c:8a:05:43:79:6f:02:e2:55:a3:af:
                    f3:d1:bb:07:20:4d:de:db:98:a0:5a:db:e7:09:89:
                    d7:f1:ba:29:93:cd:da:34:8c:a8:60:03:78:d2:80:
                    4e:c1:2e:b5:2b:34:d6:c5:59:b7:cb:ba:61:99:e5:
                    e1:fb:b5:f3:bf:a4:21:7b:fb:45:53:d4:46:1a:67:
                    b4:ad:08:bc:ec:67:c5:db:10:c4:0e:de:05:ac:7e:
                    3c:9e:13:0e:a6:a0:eb:9a:e2:ba:9d:68:3a:05:01:
                    f2:1d:b7:5c:6b:60:ad:c6:6a:d7:93:d1:0f:01:a0:
                    7c:de:9a:b3:6a:9f:0c:49:5f:84:58:ee:51:31:91:
                    7f:e6:84:24:5e:87:50:5d:d8:e1:75:5f:02:66:10:
                    23:a1:e0:73:c3:c4:f8:7f:dd:3b:67:80:88:dd:2f:
                    97:b9:08:aa:1e:30:1c:f3:8a:d8:da:95:af:1b:89:
                    e1:54:62:bc:80:6c:11:ef:1e:c6:60:42:6a:b9:ff:
                    4a:33:9a:ad:c6:59:d5:32:6c:37:ab:d7:35:97:fe:
                    bf:55
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier: 
                keyid:00:EE:E8:1D:2E:C4:52:D4:FC:A3:27:ED:C6:94:64:18:A6:33:86:A5

            X509v3 Subject Alternative Name: 
                DNS:yyy.zzz, DNS:xxx.yyy.zzz, DNS:yyy.zzz.eventstreams-ns.svc.cluster.local, DNS:xxx.yyy.zzz, IP Address:123.456.12.123

The signer corresponds to the previous (see #18) used certificate specified in tls.cluster.

Do you have any ideas to overcome this issue?

Missing Documentation for Schema API

Schema API documentation page is 404

https://ibm.github.io/event-streams/schemas/schema-api-reference/

REST producer API rejecting messages with size > 65KB

Issue Description

When sending messages to the REST producer API the response code is 400 Bad Request. Additionally when the Accept header in the request is set to application/json the response includes the message: Error parsing JSON message: http: request body too large.

The REST producer API has a configured limit for both message and key size. This defaults to 4096 bytes for the key and 65536 bytes for the message. The request is being rejected because the message body sent is larger than the limit.

Issue Resolution

The Event Streams Helm chart has been updated to allow configuration of the max message and key size at install time.

Workaround

The limits can be updated as follows:

If required initialise kubectl configuration:

Ensure you have the Event Streams CLI installed.
Log in to your cluster as an administrator by using the IBM Cloud Private CLI: cloudctl login -a https://<cluster_address>:<cluster_router_https_port>
Run the following command to initialize the Event Streams CLI: cloudctl es init
If you have more than one Event Streams instance installed, select the one where the topic you want to produce to is.
Details of your Event Streams installation are displayed.

Update the size limit(s):

List the Event Streams deployments: kubectl get deployments
Identify the rest-producer deployment from the list (it will be similar to ***-ibm-es-rest-producer-deploy)
Edit the rest-producer deployment: kubectl edit deployment ***-ibm-es-rest-producer-deploy
In the yaml file locate the env section for the rest-producer container under spec.template.spec.containers
Add the following environment variables as required:
a. MAX_KEY_SIZE - maximum key size in bytes (default 4096)
b. MAX_MESSAGE_SIZE - maximum message size in bytes: (default 65536)
Save the change and wait for the REST producer pod to be updated

Note:
Sending larger requests to the REST producer will result in some increase in latency, since it will take the REST producer longer to process the requests. Do not set the limit to higher than the broker max message size, otherwise Kafka will reject the messages. The default for Kafka is 1000012 bytes.

Environment

Affected IBM Event Streams Versions: 2019.1.1, 2019.2.1, 2019.4.1
Internal issue number: 4394, 5386

Long value for Zookeeper and Kafka persistent volume name field causes error

Issue Description

The following event is seen after install:

I0201 09:50:05.715206 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"event-streams", Name:"evtstr-ibm-es-zookeeper-sts", UID:"800de1d1-2587-11e9-9317-0a0c9eb925fe", APIVersion:"apps/v1", ResourceVersion:"15048940", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' create Pod evtstrbai-ibm-es-zookeeper-sts-0 in StatefulSet evtstrbai-ibm-es-zookeeper-sts failed error: Pod "evtstr-ibm-es-zookeeper-sts-0" is invalid: spec.containers[0].volumeMounts[0].name: Not found: "zookeeper-vol-pv"

The value supplied for the Zookeeper persistent volume name during install was too long resulting in an internal mapping of the name to fail.

This issue can also be seen with the Kafka persistent volume name.

Issue Resolution

The internal value sused for both the Zookeeper and Kafka persistent volume claim names are now all now shortened to matching values, and so the mapping no longer fails.

Workaround

Reduce the length of the name of the Zookeeper persistent volume name.

Fix details

IBM Internal issue number - 2449
Fix target - 2019.1.1

[ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient)

Hi,

i have installed ibm-eventstreams-prod 2018.3.1 with helm chart 1.1.0 on icp 3.1.2 Enterprise Edition. After running the helm chart the following pods are CrashLoopBackOff

[admin@localhost ~]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-instance-ibm-es-access-controller-deploy-7f5cc5f89b-nwp6h 2/2 Running 0 15h
test-instance-ibm-es-access-controller-deploy-7f5cc5f89b-xcf7k 2/2 Running 0 15h
test-instance-ibm-es-elastic-sts-0 0/1 CrashLoopBackOff 177 15h
test-instance-ibm-es-elastic-sts-1 0/1 CrashLoopBackOff 176 15h
test-instance-ibm-es-indexmgr-deploy-5ff44dfb9-bnbpb 1/1 Running 0 15h
test-instance-ibm-es-kafka-sts-0 3/4 CrashLoopBackOff 18 75m
test-instance-ibm-es-kafka-sts-1 3/4 CrashLoopBackOff 19 78m
test-instance-ibm-es-kafka-sts-2 3/4 CrashLoopBackOff 38 175m
test-instance-ibm-es-proxy-deploy-657bcbd7b-cclhp 1/1 Running 0 15h
test-instance-ibm-es-proxy-deploy-657bcbd7b-w2fqj 1/1 Running 0 15h
test-instance-ibm-es-rest-deploy-7dd6f57778-xp9pk 3/3 Running 0 15h
test-instance-ibm-es-ui-deploy-bfd7dfd96-qwn4s 3/3 Running 0 15h
test-instance-ibm-es-zook-c4c0-0 1/1 Running 0 15h
test-instance-ibm-es-zook-c4c0-1 1/1 Running 0 15h
test-instance-ibm-es-zook-c4c0-2 1/1 Running 0 15h

helm:

global:
image:
repository: mycluster.icp:8500/event-streams
pullSecret: regcred
pullPolicy: IfNotPresent
imageTags:
kafkaTag: 2018-11-23-09.03.09-f51c694
healthcheckTag: 2018-11-21-16.21.30-9947d87
kafkaMetricsProxyTag: 2018-11-22-13.44.45-932c41b
metricsReporterTag: 2018-11-15-16.03.55-fc56475
zookeeperTag: 2018-11-15-16.04.27-97fcd55
kafkaProxyTag: 2018-11-21-09.10.49-192cd69
proxyTag: 2018-11-21-09.10.54-83bacb7
uiTag: 2018-11-23-14.41.34-5551d4d
codegenTag: 2018-11-21-11.58.59-4a844ff
oauthTag: 2018-11-15-16.06.08-0e82f45
roleMappingsTag: 2018-11-15-16.06.41-a193f91
restTag: 2018-11-23-10.08.32-1d3a9a6
elasticSearchTag: 2018-11-15-16.07.36-41f575c
indexmgrTag: 2018-11-21-12.49.59-6879c17
telemetryTag: 2018-11-15-16.08.19-17b74dc
replicatorTag: 2018-11-15-16.50.56-c9a89c3
accessControllerTag: 2018-11-23-11.54.42-58ce6ac
kubectlTag: 2018-11-15-16.27.22-6a75737
certGenTag: 2018-11-15-16.09.38-317444b
redisTag: 2018-11-22-17.05.03-8dfe09e
busyboxTag: 2018-11-15-16.10.31-12d2260
alpineTag: 2018-11-21-15.56.28-ebba125
fsGroupGid: null
arch: amd64
telemetry:
enabled: false
kafka:
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 1000m
memory: 2Gi
metricsReporterResources:
limits:
memory: 1500Mi
requests:
memory: 1500Mi
brokers: 3
configMapName: ""
interBrokerProtocolVersion: "2.0"
logMessageFormatVersion: "2.0"
heapOpts: -XX:+UseContainerSupport
persistence:
enabled: true
useDynamicProvisioning: true
dataPVC:
name: datadir
storageClassName: "kafka-server"
size: 4Gi
zookeeper:
resources:
limits:
cpu: 100m
memory: 1Gi
requests:
cpu: 100m
memory: 750Mi
persistence:
enabled: true
useDynamicProvisioning: true
dataPVC:
name: datadir
storageClassName: "kafka-zookeeper"
size: 2Gi
proxy:
externalEndpoint: ""
tls:
type: selfsigned
key: null
cert: null
cacert: null
messageIndexing:
messageIndexingEnabled: true
resources:
limits:
memory: 4Gi
replicator:
replicas: 0
metricsReporterResources:
limits:
memory: 1500Mi
requests:
memory: 1500Mi
license: accepted

Log test-instance-ibm-es-kafka-sts-0
[2019-03-08 08:31:30,056] INFO Opening socket connection to server test-instance-ibm-es-zookeeper-fixed-ip-svc-0.event-streams.svc.cluster.local/10.0.122.38:2181 (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:30,056] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient)
[2019-03-08 08:31:30,056] INFO Socket connection established to test-instance-ibm-es-zookeeper-fixed-ip-svc-0.event-streams.svc.cluster.local/10.0.122.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:30,057] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:30,812] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/jaas/event_streams_jaas.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:30,813] INFO Opening socket connection to server test-instance-ibm-es-zookeeper-fixed-ip-svc-1.event-streams.svc.cluster.local/10.0.68.237:2181 (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:30,813] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient)
[2019-03-08 08:31:30,814] INFO Socket connection established to test-instance-ibm-es-zookeeper-fixed-ip-svc-1.event-streams.svc.cluster.local/10.0.68.237:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:30,815] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:32,659] WARN SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/jaas/event_streams_jaas.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:32,660] INFO Opening socket connection to server test-instance-ibm-es-zookeeper-fixed-ip-svc-2.event-streams.svc.cluster.local/10.0.203.71:2181 (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:32,660] ERROR [ZooKeeperClient] Auth failed. (kafka.zookeeper.ZooKeeperClient)
[2019-03-08 08:31:32,661] INFO Socket connection established to test-instance-ibm-es-zookeeper-fixed-ip-svc-2.event-streams.svc.cluster.local/10.0.203.71:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:32,661] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:32,678] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-03-08 08:31:32,765] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2019-03-08 08:31:32,767] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2019-03-08 08:31:32,770] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-03-08 08:31:32,777] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:230)
at kafka.zookeeper.ZooKeeperClient$$Lambda$107.000000001D1E3870.apply$mcV$sp(Unknown Source)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:226)
at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:95)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1588)
at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:348)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:372)
at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2019-03-08 08:31:32,781] INFO shutting down (kafka.server.KafkaServer)
[2019-03-08 08:31:32,784] WARN (kafka.utils.CoreUtils$)
java.lang.NullPointerException
at kafka.server.KafkaServer.$anonfun$shutdown$6(KafkaServer.scala:579)
at kafka.server.KafkaServer$$Lambda$129.000000001D1E1DA0.apply$mcV$sp(Unknown Source)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:579)
at kafka.server.KafkaServer.startup(KafkaServer.scala:329)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2019-03-08 08:31:32,792] INFO shut down completed (kafka.server.KafkaServer)
[2019-03-08 08:31:32,794] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-03-08 08:31:32,798] INFO shutting down (kafka.server.KafkaServer)

Log test-instance-ibm-es-elastic-sts-0
[2019-03-08T08:32:34,648][INFO ][o.e.p.PluginsService ] [node-0] loaded module [aggs-matrix-stats]
[2019-03-08T08:32:34,649][INFO ][o.e.p.PluginsService ] [node-0] loaded module [analysis-common]
[2019-03-08T08:32:34,649][INFO ][o.e.p.PluginsService ] [node-0] loaded module [ingest-common]
[2019-03-08T08:32:34,649][INFO ][o.e.p.PluginsService ] [node-0] loaded module [lang-expression]
[2019-03-08T08:32:34,649][INFO ][o.e.p.PluginsService ] [node-0] loaded module [lang-mustache]
[2019-03-08T08:32:34,649][INFO ][o.e.p.PluginsService ] [node-0] loaded module [lang-painless]
[2019-03-08T08:32:34,650][INFO ][o.e.p.PluginsService ] [node-0] loaded module [mapper-extras]
[2019-03-08T08:32:34,650][INFO ][o.e.p.PluginsService ] [node-0] loaded module [parent-join]
[2019-03-08T08:32:34,650][INFO ][o.e.p.PluginsService ] [node-0] loaded module [percolator]
[2019-03-08T08:32:34,650][INFO ][o.e.p.PluginsService ] [node-0] loaded module [rank-eval]
[2019-03-08T08:32:34,651][INFO ][o.e.p.PluginsService ] [node-0] loaded module [reindex]
[2019-03-08T08:32:34,651][INFO ][o.e.p.PluginsService ] [node-0] loaded module [repository-url]
[2019-03-08T08:32:34,651][INFO ][o.e.p.PluginsService ] [node-0] loaded module [transport-netty4]
[2019-03-08T08:32:34,651][INFO ][o.e.p.PluginsService ] [node-0] loaded module [tribe]
[2019-03-08T08:32:34,652][INFO ][o.e.p.PluginsService ] [node-0] no plugins loaded
--2019-03-08 08:32:35-- http://test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc:9200/
Resolving test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc (test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc)... 10.1.30.126
Connecting to test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc (test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc)|10.1.30.126|:9200... failed: Connection refused.
--2019-03-08 08:32:36-- http://test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc:9200/
Resolving test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc (test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc)... 10.1.30.126
Connecting to test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc (test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc)|10.1.30.126|:9200... failed: Connection refused.
--2019-03-08 08:32:37-- http://test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc:9200/
Resolving test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc (test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc)... 10.1.30.126
Connecting to test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc (test-instance-ibm-es-elastic-sts-0.test-instance-ibm-es-elastic-svc)|10.1.30.126|:9200... failed: Connection refused.
[2019-03-08T08:32:37,331][INFO ][o.e.d.DiscoveryModule ] [node-0] using discovery type [zen]
[2019-03-08T08:32:37,860][INFO ][o.e.n.Node ] [node-0] initialized
[2019-03-08T08:32:37,860][INFO ][o.e.n.Node ] [node-0] starting ...
[2019-03-08T08:32:38,017][INFO ][o.e.t.TransportService ] [node-0] publish_address {10.1.30.126:9300}, bound_addresses {10.1.30.126:9300}
[2019-03-08T08:32:38,027][INFO ][o.e.b.BootstrapChecks ] [node-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2019-03-08T08:32:38,036][INFO ][o.e.n.Node ] [node-0] stopping ...
[2019-03-08T08:32:38,054][INFO ][o.e.n.Node ] [node-0] stopped
[2019-03-08T08:32:38,054][INFO ][o.e.n.Node ] [node-0] closing ...
[2019-03-08T08:32:38,068][INFO ][o.e.n.Node ] [node-0] closed

Please assist.

KafkaExporter reconcile and runtime issues when deployed via operator

Issue Description

If configured to deploy via a CR, Event Streams can produce additional JMX metrics via the KafkaExporter component. While the operator processes the request to deploy the KafkaExporter however, a java.lang.NullPointerException can be encountered, which will cause the reconcile to fail, and the KafkaExporter ultimately to not be deployed.

In addition, if the KafkaExporter does deploy, it may fail to start/run correctly, due to a lack of permissions.

Both issues are caused by the operator misprocessing elements of the CR.

Environment

IBM Event Streams Version: 10.x

Issue reference: ES-6722

504 timeout error when viewing consumer groups in the Event Streams UI

Issue ID: ES-135

Summary: When viewing consumer groups in the UI a 504 timeout error is shown and the consumer groups are not displayed

Environment

IBM Event Streams Version: 10.x

Test issue - please ignore

Download schema registry Java dependencies not working on 2019.4.2

Issue Description

When I'm viewing a schema in the EventStreams UI, click on the Connect to this version blue button, and click on Java Dependencies link in paragraph 2. Download the schema registry dependencies ..., the link fails with a network error.
A workaround is to adjust the link to say 2019.4.1, which produces files that work with 2019.4.2.
Ie:
https://<my-eventstreams-ui-address>/api/files/java-dependencies/dependencies-2019.4.1-java.zip

Environment

IBM Event Streams Version: 2019.4.2
IBM Cloud Private (ICP) Version: any
Operating system of ICP install: any
Browser (for UI issues): any

Authentication issue with ICP UI

Hi,

After installing eventstreams-dev on our ICP installation we are unable to perform any tasks via the UI. For instance attempting to create a topic will result in a dialog box containing the following error:

Topic creation failed
500: An unexpected condition has occurred. org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed, invalid credentials
9/20/2019, 12:19:54 PM

GeoReplicator tasks stop and errors seen at source Kafka

Issue Description

Georeplicator status shows replicators tasks are failing and they have to be manually restarted.

The replicator-deploy logs show:

[2020-02-28 15:05:08,141] ERROR WorkerSourceTask{id=<redacted>} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Unexpected error in commit: The server experienced an unexpected error when processing the request.
	at com.ibm.eventstreams.replicator.ReplicatorTask.poll(ReplicatorTask.java:188)
	at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:245)
	at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:221)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
	at java.util.concurrent.FutureTask.run(FutureTask.java:277)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:812)
[2020-02-28 15:05:08,141] ERROR WorkerSourceTask{id=<redacted>} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)

The source Kafka brokers show errors at the same time:

[2020-04-02 16:56:59,479] ERROR [KafkaApi-2] Error when handling request: clientId=<redacted>, correlationId=<redacted>, api=OFFSET_COMMIT, body={group_id=<redacted>,generation_id=39,member_id=<member-redacted>} (kafka.server.KafkaApis)
java.util.NoSuchElementException: key not found: <member-redacted>
	at scala.collection.MapLike.default(MapLike.scala:235)
	at scala.collection.MapLike.default$(MapLike.scala:234)
	at scala.collection.AbstractMap.default(Map.scala:63)
	at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
	at kafka.coordinator.group.GroupMetadata.get(GroupMetadata.scala:203)
	at kafka.coordinator.group.GroupCoordinator.$anonfun$tryCompleteHeartbeat$1(GroupCoordinator.scala:927)
	at kafka.coordinator.group.GroupCoordinator$$Lambda$1108.00000000600D0690.apply$mcZ$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
	at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:198)
	at kafka.coordinator.group.GroupCoordinator.tryCompleteHeartbeat(GroupCoordinator.scala:920)
	at kafka.coordinator.group.DelayedHeartbeat.tryComplete(DelayedHeartbeat.scala:34)
	at kafka.server.DelayedOperation.maybeTryComplete(DelayedOperation.scala:121)
	at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:388)
	at kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:294)
	at kafka.coordinator.group.GroupCoordinator.completeAndScheduleNextExpiration(GroupCoordinator.scala:737)
	at kafka.coordinator.group.GroupCoordinator.completeAndScheduleNextHeartbeatExpiration(GroupCoordinator.scala:730)
	at kafka.coordinator.group.GroupCoordinator.$anonfun$handleHeartbeat$2(GroupCoordinator.scala:486)
	at kafka.coordinator.group.GroupCoordinator$$Lambda$1115.000000005C091790.apply$mcV$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
	at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:198)
	at kafka.coordinator.group.GroupCoordinator.handleHeartbeat(GroupCoordinator.scala:451)
	at kafka.server.KafkaApis.handleHeartbeatRequest(KafkaApis.scala:1336)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:120)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
	at java.lang.Thread.run(Thread.java:812)

Issue Resolution

https://issues.apache.org/jira/browse/KAFKA-8896 provided a fix.

This is in Kafka 2.3.1 which is available in IBM Event Streams 2019.4.2.

Workaround

Fix details

IBM Internal issue number - 5199
Fix target - 2019.4.2

Unable to connect clients to bootstrap url when on OpenShift

Issue Description

When deploying an instance of Event Streams 2019.4.2 on OpenShift, clients connecting via the external bootstrap URL presented in the UI may encounter connection issues where Kafka is not contactable.

This can occur when OpenShift generates truncated routes for Kafka listeners, but the Kafka listeners are not updated with the truncated values.

Diagnosis

To diagnose, run the following commands:

oc get routes
oc describe cm <release>-ibm-es-proxy-cm

Compare the values for HOST/PORT of the following routes against the externalListeners section of the config map:

<release>-ibm-es-proxy-route-broker-<0..n>
<release>-ibm-es-proxy-route-bootstrap

If any of the routes do not match the listeners, the config map will need to be updated

Resolution

If the bootstrap listener does not match the bootstrap route:
- The route will take the following form: <prefix>-<namespace>.<domain>:443 eg es-1-ibm-es-proxy-route-bootstrap-mynamespace.domain.somewhere:port
- set the bootstrapRoutePrefix in the config map to es-1-ibm-es-proxy-route-bootstrap (do not save changes yet)
If the broker listeners do not match the broker route:
- The route will take the following form: <prefix>-<broker-id>-<namespace>.<domain>:443 eg es-1-ibm-es-proxy-route-broker-1-mynamespace.domain.somewhere:port
- set the brokerRoutePrefix in the config map to es-1-ibm-es-proxy-route-broker- (do not save changes yet)
Change the revision value in the config map to 1 greater than the current value. You can now save the config map changes.

Environment

IBM Event Streams Version: 2019.4.2 deployed on Openshift

Internal issue: 5957
Fixed in: 10.0

Improve the view of records created with Avro schemas. Use the generic avro serializer/deserializer?

Feature / Enhancement

As a <role> I would like <feature_description> because <reason_why>.

How will this improve your use of IBM Event Streams?

Allow me to do something that is not currently possible
Allow me to do something more easily

How important is this to your use of IBM Event Streams?

issue1

Browser Version: 5.0 (Macintosh)
IBM Event Streams Version: IBM Event Streams-2018.3.1
IBM Cloud Private Version:

KafkaProducer nil pointer reference issued when rolling Kafka brokers

Issue Description
Messages fail to be produced using the Rest Producer and the following error is seen in the Rest Producer logs:

http: panic serving n.n.n.n:12345: runtime error: invalid memory address or nil pointer dereference
goroutine 8294 [running]:
net/http.(*conn).serve.func1(0xc00010a000)
/usr/local/go/src/net/http/server.go:1769 +0x139
panic(0x8f0ec0, 0xdf6af0)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.ibm.com/mhub/rest-producer/pkg/producer.(*singleTenant).Produce(0xc0001335c0, 0xc00037e5d0, 0xc00010a1e0, 0x6fc23ac00, 0x0, 0x0, 0x0, 0x0)
/home/jenkins/go/src/github.ibm.com/mhub/rest-producer/pkg/producer/singletenant.go:85 +0x194
github.ibm.com/mhub/rest-producer/pkg/api.(*KafkaRestAPI).Produce(0xc00011ece0, 0xc0003920cd, 0x6, 0xa47da0, 0xc00038a000, 0xc0001d8200)
/home/jenkins/go/src/github.ibm.com/mhub/rest-producer/pkg/api/api.go:210 +0x187f
github.ibm.com/mhub/rest-producer/pkg/api.(*KafkaRestAPI).produceKafkaHandler(0xc00011ece0, 0xa47da0, 0xc00038a000, 0xc0001d8200)
/home/jenkins/go/src/github.ibm.com/mhub/rest-producer/pkg/api/api.go:374 +0x96
net/http.HandlerFunc.ServeHTTP(0xc000133690, 0xa47da0, 0xc00038a000, 0xc0001d8200)
/usr/local/go/src/net/http/server.go:1995 +0x44
github.ibm.com/mhub/rest-producer/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001700c0, 0xa47da0, 0xc00038a000, 0xc0001d8000)
/home/jenkins/go/src/github.ibm.com/mhub/rest-producer/vendor/github.com/gorilla/mux/mux.go:212 +0xe3
net/http.serverHandler.ServeHTTP(0xc000123ba0, 0xa47da0, 0xc00038a000, 0xc0001d8000)
/usr/local/go/src/net/http/server.go:2774 +0xa8
net/http.(*conn).serve(0xc00010a000, 0xa48ee0, 0xc00002e2c0)
/usr/local/go/src/net/http/server.go:1878 +0x851
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2884 +0x2f4
{"error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","ibm_datetime":"Jun 10 12:54:37.832","ibm_fileName":"api.go","ibm_lineNumber":215,"kafka_error_code":null,"loglevel":"DEBUG","message":"error producing to topic1 : 500 kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","message_template":"error producing to [topic] : [status_code] [error]","module":"RestProducer","status_code":500,"topic":"topic1","transaction_id":"e3175c78-1749-4e1e-8223-9b3d83a2396c","type":"IES","verbose":true}

Issue Resolution
The Nil Pointer was seen when the REST Producer code attempted to reuse a failed connection to the broker. The code has been updated to remove failed connections.

Workaround
None available

Fix details
IBM Internal issue number - 3310
Fix target - Release after 2019.4.2

Vulnerability in IBM Event Streams - CVE-2020-4662

Issue Description

A vulnerability exists in the Event Streams 10.0.0 schema registry that allows unauthorised access to create, edit and delete schemas

Issue Resolution

There is a vulnerability in the schema registry image, to fix this issue you will need to have access to the following image in the ibm entitled registry cp.icr.io/cp/ibm-eventstreams-schema-proxy@sha256:7ff23bc286cc4b1557287a8e48af1716150e88b0d2c5eeb0c9faf538ba2806e8

If you have an air gapped install you will need to pull down this image into your private image repository and make a note of the location.

Then you need to modify the ClusterServiceVersion (csv) of the eventstreams operator.

Get the name of the eventstreams csv, run oc get csv -n <NAMESPACE>. You should get an entry named ibm-eventstreams.v2.0.0 or ibm-eventstreams.v2.0.1
Edit the csv and find the following environment variable EVENTSTREAMS_DEFAULT_SCHEMA_REGISTRY_PROXY_IMAGE.
Replace the image tag associated with this env var with either the image mentioned above or the one you've pulled into your local private registry.
Save the csv. This will restart the operator with the updated env var and then update the Schema Registry with the required fix

Workaround

None

Fix details

IBM Internal Issue Number - 5931
Fix target - Not yet available

oauth

Issue Description

Describe the problem
Describe the application flow or what steps were being taken when the problem occurred
If the issue is reproducible provide a set of instructions describing how to reproduce it, including screenshots if they might assist
Include any error messages, stack traces and logs associated with the error, in full, in text format (ie not a screenshot)
Describe the expected behaviour
Did this used to work? If so what has changed, for example was any maintenance of the system performed?

Environment

IBM Event Streams Version:
IBM Cloud Private (ICP) Version:
Operating system of ICP install:
Browser (for UI issues):

kafka-python-console-sample

Hi,

I'm trying to get the python sample script working, it builds ok and attempts to send the messages but I only ever get timeouts on the delivery report.

I'm running this from a Ubuntu 19.04 with Python 3.7.2.

This is more than likely going to be something to do with the input strings I'm specifying but I can't see anything wrong with them, input below;

python app.js "kafka05-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka01-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka02-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka04-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka03-prod02.messagehub.services.eu-gb.bluemix.net:9093" "https://kafka-admin-prod02.messagehub.services.eu-gb.bluemix.net:443" "<My API Key>" "/etc/ssl/certs"

Logging outputs below;

Using command line arguments to find credentials.
Kafka Endpoints: kafka05-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka01-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka02-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka04-prod02.messagehub.services.eu-gb.bluemix.net:9093,kafka03-prod02.messagehub.services.eu-gb.bluemix.net:9093
Admin REST Endpoint: https://kafka-admin-prod02.messagehub.services.eu-gb.bluemix.net:443
Creating the topic start-action-chain-for-api-call with Admin REST API
{"errorCode":42201,"errorMessage":"Topic "my-topic" already exists."}
Admin REST Listing Topics:
[{"name":"my-topic","partitions":1,"retentionMs":86400000,"cleanupPolicy":"delete","markedForDeletion":false}]
This sample app will run until interrupted.
The producer has started
Sending message This is a test message #0
Sending message This is a test message #1
Sending message This is a test message #2
Sending message This is a test message #3
Sending message This is a test message #4
Sending message This is a test message #5
Delivery report: Failed sending message b'This is a test message #0'
KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
Delivery report: Failed sending message b'This is a test message #1'
KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
Delivery report: Failed sending message b'This is a test message #2'
KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
Delivery report: Failed sending message b'This is a test message #3'
KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
Delivery report: Failed sending message b'This is a test message #4'
KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
Delivery report: Failed sending message b'This is a test message #5'
KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}

Unable to complete SSL handshake using provided PEM certificate when Event Streams is installed with CA signed certificates

Issue Description

Clients connecting using the PEM certificate provided in the UI 'connect to this cluster' sidebar are unable to complete an SSL handshake.

The following error may be seen in node-rdkafka applications:

{ severity: 7,
  fac: 'BROKERFAIL',
  message: '[thrd:sasl_ssl://xxxxx:31216/bootstrap]: sasl_ssl://xxxxx:31216/bootstrap: failed: err: Local: SSL error: (errno: Undefined error: 0)' }

This can occur when IBM Event Streams is installed with certificates where the encoded certificate entered in TLS certificate does not contain the full certificate chain to the root CA.

Issue Resolution

The PEM certificate and Java truststore than can be downloaded from the Event Streams UI now include the full certificate chain to the root CA.

Workaround

Supply the certificate chain via a new PEM file when starting client applications, instead of using the PEM provided from the Event Streams UI.

Fix details

IBM Internal issue number - 2539
Fix target - 2019.1.1

The es-proxy-deploy pods periodically stop responding and do not seem to recover

Issue Description

Describe the problem

The es-proxy-deploy pods periodically stop responding and do not seem to recover

Describe the application flow or what steps were being taken when the problem occurred

The deployed system is for functional testing. There is no special setup. The product is deployed and runs fine until the error occurs.

If the issue is reproducible provide a set of instructions describing how to reproduce it, including screenshots if they might assist

The issue occurs after the system has been deployed for a period of time. It looks like it is related to recovery from an unexpected network error. Details follow.

Include any error messages, stack traces and logs associated with the error, in full, in text format (ie not a screenshot)

Symptom
On startup, The es-proxy-deploy pods monitor the ConfigMap es-staging-ibm-es-proxy-cm using a Kubernetes k8s.Client. See detail (1) below. The k8s.Client watches the ConfigMap and periodically receives an unexpected, likely network, error. See detail (2). On perhaps some class of errors the watcher then does not recover. We see a retry attempt fail with a AuthError. See (3) below. The pods remain in a running state but the container never starts and hence no longer responds to client requests.

Details 
(1) ConfigMap monitoring initializes k8sclient.go:166] Attempting to watch for changes to config map (retry set for 5 seconds) : configmap name : es-staging-ibm-es-proxy-cm
(2) The k8s.Client watching the es-staging-ibm-es-proxy-cm ConfigMap receives an error.
{"ibm_datetime":"2020-10-24T03:01:48Z","logLevel":"ERROR","module":"KafkaProxy", "ibm_messageId":"IES","message":"IES: Kube watcher status change : http2: server sent GOAWAY and closed the connection; LastStreamID=187, ErrCode=NO_ERROR, debug="""}
(3) The retry to watch receives an auth error
{"ibm_datetime":"2020-10-24T03:02:14Z","logLevel":"ERROR","module":"KafkaProxy", "ibm_messageId":"IES","message":"IES: Error creating watcher : kubernetes api: Failure 401 Unauthorized"}

Run the log collection script available at https://github.com/IBM/charts/tree/master/stable/ibm-eventstreams-dev/ibm_cloud_pak/pak_extensions/support/get-logs.sh and attach the output to this issue

For those with access, logs have been loaded and the issue discussed here: https://ibm-security.slack.com/archives/CB777ACRM/p1603733272446300. We can repost here if necessary.

Describe the expected behaviour

The es-proxy-deploy pods should recover from unexpected kube issues, again likely network related.

Did this used to work? If so what has changed, for example was any maintenance of the system performed?

Environment

IBM Event Streams Version: 2019.4.2
IBM Cloud Private (ICP) Version: 3.2.4, running Openshift 4.3.35
Operating system of ICP install: Red Hat Enterprise Linux release 8.2
Browser (for UI issues):

Operator is generating constant log output

Issue ID: ES-6230

Summary: The log file for the Event Streams operator pod shows constant looping of reconcilliations when on OpenShift 4.5 or later.

SASL authentication error after installing on ICP 3.1.1

Hi there,
I just installed eventstreams into a fresh install of ICP 3.1.1 on a RHEL cluster. I logged into the eventstreams console, clicked on "Topics", and then got the following error popup:

500: An unexpected condition has occurred. org.apache.kafka.common.errors.SaslAuthenticationException: Authentication failed, invalid credentials

Long release name causes 503 errors

Issue Description

When a helm chart is installed with a release name that is longer than 15 characters, the system starts up normally but any Kafka operations in the user interface or command line interface, e.g. topic create, will result in a 503 error notification being displayed.

The longer release name causes the system to truncate the pod names which introduces an issue where the non-truncated version of the name is used in accessing the Kafka brokers.

Issue resolution

The code is changed to correctly use the truncated names throughout so the lookup no longer fails

Workaround

Re-install the helm chart with a release name that is shorter than 15 characters

Fix details

IBM internal issue number - 3705
Fix target - 2019.4.1

Metrics proxy restarting due to an index out of bounds panic error

The metrics proxy is continuously recycling with index out of Bounds on compressUpdate

panic: runtime error: index out of range

goroutine 2580 [running]:
github.ibm.com/mhub/qp-kafka-metrics-proxy/connector.(*ESearch).compressUpdate(0xc0001ee000, 0x5e26c6c6, 0xc0003d3090, 0xc000671620, 0xc0001e0400)
/home/jenkins/go/src/github.ibm.com/mhub/qp-kafka-metrics-proxy/connector/compresssearch.go:157 +0xf8f
github.ibm.com/mhub/qp-kafka-metrics-proxy/connector.(*ESearch).compressRecords(0xc0001ee000, 0xc0002cc000, 0xfe)
/home/jenkins/go/src/github.ibm.com/mhub/qp-kafka-metrics-proxy/connector/compresssearch.go:113 +0x2af
github.ibm.com/mhub/qp-kafka-metrics-proxy/connector.(*ESearch).waitForEvents.func1(0xc0001ee000, 0xc0002cc000, 0xfe, 0x5e26c6c5)
/home/jenkins/go/src/github.ibm.com/mhub/qp-kafka-metrics-proxy/connector/esearch.go:186 +0x3f
created by github.ibm.com/mhub/qp-kafka-metrics-proxy/connector.(*ESearch).waitForEvents
/home/jenkins/go/src/github.ibm.com/mhub/qp-kafka-metrics-proxy/connector/esearch.go:185 +0x6e2

Authentication failure when using transactions

Issue Description

Kafka clients using transactions fail with a Transactional Id authorization failed error.

Issue resolution

The code has been changed to correctly register transactions as a secured resource, so that transactions no longer fail.

Fix details

IBM internal issue number - 3831
Fix target - 2019.4.1

UI is unable to display consumer groups for a topic

Issue Description

The IBM Event Streams UI is unable to display consumer groups for a particular topic and
the log output for the Admin REST component shows an exception similar to the following:

java.lang.IllegalArgumentException: Invalid negative offset

See Kafka issue 9507{:target="_blank"}.

When processing a list offsets request the Kafka Admin Client does not filter out offsets with value -1 and gets an IllegalArgumentException when it creates the response.

Environment

IBM Event Streams Version: 2019.4.3, 2019.4.4

Issue reference: ES-139

503: The Kafka request did not complete after 60 seconds. Please check that the cluster is responding.

Browser Version: 5.0 (Macintosh)
IBM Event Streams Version: IBM Event Streams Community Edition-2019.2.1
IBM Cloud Private Version: 3.2

When I connecting to IBM Event Streams UI, and clicked on generate the application. it fails with the error 503 - please is the cluster is up and running.
Although all event streams pods are running and it shows system is healthy.

Applications failed to produce and consume after a period of time

Issue Description

After a period of time a connected application which could previously produce messages to a topic writes an error similar to the following:

Connection to node -2 terminated during authentication. This may indicate that authentication failed due to invalid credentials

The API key has not been revoked and is still valid, but the application is no longer able to produce messages.

The Kafka logs show:

[2019-01-03 12:56:12,916] ERROR Access Controller error outcome in transaction 'eventstreams.2.1544346578761.68755' for ApiKey 'xxxxxxxxx...' (com.ibm.eventstreams.security.auth.AuthService

and the transaction id can be found in the access controller logs with the following error:

{"error":"401 UNAUTHORIZED","message":"Unable to check authorization","mh_file":"checker.go","mh_line":243,"mh_ts":"Jan  3 12:56:12.915","transaction_id":"eventstreams.2.1544346578761.68755"}

Issue Resolution

Event Streams was caching an access token, which was not being recreated when it expired. This token is now regenerated at the appropriate time.

The issue affects both producers and consumers.

Workaround

Create a new API key and use this with the application.

Fix details

IBM Internal issue number - 2273
Fix target - 2019.1.1