
origin-aggregated-logging's Introduction

OpenShift Logging

This repo primarily contains the image definitions for the logstore components of the OpenShift Logging stack for releases 4.x and later. These component images, collectively abbreviated as the "EFK" stack, include Elasticsearch, Fluentd, and Kibana. Please refer to the cluster-logging-operator and elasticsearch-operator for information regarding the operators which deploy these images.

The primary features this integration provides:

  • Multitenant support to isolate logs from various project namespaces
  • OpenShift OAuth2 integration
  • Log Forwarding
  • Historical log discovery and visualization
  • Log aggregation of pod and node logs

Information on building the images from GitHub source using an OKD deployment is found here. See the quickstart guide to deploy cluster logging.

Please check the release notes for deprecated features or breaking changes.

Components

The cluster logging subsystem consists of multiple components commonly abbreviated as the "ELK" stack (though modified here to be the "EFK" stack).

Elasticsearch

Elasticsearch is a Lucene-based indexing object store into which logs are fed. Logs for node services and all containers in the cluster are fed into one deployed cluster. The Elasticsearch cluster should be deployed with redundancy and persistent storage for scale and high availability.

Fluentd

Fluentd is responsible for gathering log entries from nodes, enriching them with metadata, and forwarding them to the default logstore or other destinations defined by administrators. The content for this component has moved to https://github.com/viaq/logging-fluentd
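
For the 4.x releases, forwarding to destinations other than the default logstore is configured through the cluster-logging-operator rather than by editing Fluentd directly. The sketch below shows a minimal ClusterLogForwarder resource, assuming the structure documented for the operator; the output name, the remote URL, and the pipeline name are placeholders:

oc apply -f - <<EOF
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: remote-es                              # placeholder output name
    type: elasticsearch
    url: https://elasticsearch.example.com:9200  # placeholder endpoint
  pipelines:
  - name: forward-application-logs
    inputRefs:
    - application
    outputRefs:
    - remote-es
    - default                                    # keep sending to the in-cluster logstore too
EOF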

Kibana

Kibana presents a web UI for browsing and visualizing logs in Elasticsearch.

Cluster Logging Operator

The cluster-logging-operator orchestrates the deployment of the cluster logging stack, including resource definitions, key/cert generation, and component start and stop order.
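
In practice the operator is driven by a ClusterLogging custom resource. A minimal sketch, assuming the resource layout documented for the operator (node count, storage class, size, and redundancy policy are illustrative values, not recommendations):

oc apply -f - <<EOF
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3                  # illustrative sizing
      storage:
        storageClassName: gp2       # illustrative storage class
        size: 200G
      redundancyPolicy: SingleRedundancy
  visualization:
    type: kibana
    kibana:
      replicas: 1
  collection:
    logs:
      type: fluentd
      fluentd: {}
EOF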

Issues

Any issues can be filed at Red Hat JIRA. Please include as many details as possible to assist in issue resolution, along with an attached must-gather output.
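
A hedged sketch of collecting that output (the namespace and deployment name assume a default 4.x install):

# Run must-gather with the cluster-logging-operator's own image to collect logging diagnostics
oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}')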

Contributions

To contribute to the development of origin-aggregated-logging, see REVIEW.md

origin-aggregated-logging's People

Contributors

btaani, danmcp, elyscape, ewolinetz, ibotty, jcantrill, linzhaoming, lukas-vlcek, nhosoi, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, pavolloffay, periklis, pmoogi-redhat, portante, red-gv, richm, ruromero, smarterclayton, sosiouxme, sqtran, stevekuznetsov, sureshgaikwad, syedriko, vimalk78, vladmasarik, vparfonov, wshearn, yselkowitz


origin-aggregated-logging's Issues

Error accessing _all index using Kibana user

This might be an issue in the Search-Guard configuration, but I still think it is worth recording.

When I set up the origin-aggregated-logging stack (see below if details are needed about how I did it), I spotted a difference between the following two calls. In the first call we use the _all index placeholder (or it can be omitted entirely) and it gets rejected by SG, while in the second call we explicitly use * and it is OK.

$ sudo curl -s -k --cert ./cert --key ./key https://localhost:9200/_cat/count
# or
$ sudo curl -s -k --cert ./cert --key ./key https://localhost:9200/_cat/count/_all
{
  "error" : "RuntimeException[java.lang.RuntimeException: Attempt from null to _all indices for indices:data/read/count and User [name=system.logging.kibana, roles=[]]]; nested: RuntimeException[Attempt from null to _all indices for indices:data/read/count and User [name=system.logging.kibana, roles=[]]]; ",
  "status" : 500
}

vs.

$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/_cat/count/*?v'
epoch      timestamp count 
1455109195 12:59:55  6797

Another (probably simpler) example of this issue is the following use case:

$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/_search?size=0'
# or
$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/_all/_search?size=0'
{
  "error" : "RuntimeException[java.lang.RuntimeException: Attempt from null to _all indices for indices:data/read/search and User [name=system.logging.kibana, roles=[]]]; nested: RuntimeException[Attempt from null to _all indices for indices:data/read/search and User [name=system.logging.kibana, roles=[]]]; ",
  "status" : 500
}

vs.

$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/*/_search?size=0'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : {
    "total" : 7374,
    "max_score" : 0.0,
    "hits" : []
  }
}

Internally, these probably result in different calls; functionally, however, they should be equivalent. The question is: should we care that use of the _all index is rejected while * is OK? Maybe we need to review the ACL rules a bit more?

This might be related to #31?

Stack setup

I started OpenShift using Vagrant from the origin repo (I checked out the v1.1.1 tag before building it). After I SSH-ed into the openshiftdev machine, I used @richm's script to build and set up the whole stack. This means I end up using the kibana user certificates when running the above curl commands.

Deploying the EFK Stack FAILS with message: "error: error processing template logging/logging-es-template: [unable to parse quantity's suffix]"

Following the steps from: https://docs.openshift.org/latest/install_config/aggregate_logging.html#deploying-the-efk-stack

All preparation steps were completed; keys were generated using the following commands:

oadm ca create-server-cert --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --hostnames=kibana.oc3.videonext.net --cert=kibana.crt --key=kibana.key
oc secrets new logging-deployer kibana.crt=kibana.crt kibana.key=kibana.key

Deployment was started using the following command:

oc new-app logging-deployer-template \
--param KIBANA_HOSTNAME=kibana.oc3.videonext.net \
--param ES_CLUSTER_SIZE=1 \
--param PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 \
--param ES_INSTANCE_RAM=1Gi

Result:

[root@o3-master logging]# oc get pod/logging-deployer-67zp4 -w
NAME                     READY     STATUS              RESTARTS   AGE
logging-deployer-67zp4   0/1       ContainerCreating   0          1m
logging-deployer-67zp4   1/1       Running   0         1m
logging-deployer-67zp4   0/1       Error     0         1m

Here is full log:

[root@o3-master ~]# oc logs logging-deployer-67zp4
+ project=logging
+ mode=install
+ dir=/etc/deploy
+ secret_dir=/secret
+ master_url=https://kubernetes.default.svc.cluster.local
+ master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
+ '[' -n 1 ']'
+ oc config set-cluster master --api-version=v1 --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --server=https://kubernetes.default.svc.cluster.local
cluster "master" set.
++ cat /var/run/secrets/kubernetes.io/serviceaccount/token
+ oc config set-credentials account --token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJsb2dnaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImxvZ2dpbmctZGVwbG95ZXItdG9rZW4tcWE0bW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibG9nZ2luZy1kZXBsb3llciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjYzNzg5YzQ0LTIyYjEtMTFlNi1iZDM3LTUyNTQwMGQzNmE1YyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpsb2dnaW5nOmxvZ2dpbmctZGVwbG95ZXIifQ.b6rDEjuWXL3tFqfuoKbx4fKIvkeY9Y6R_utRHkwt0ZkWeoClSXquvDYUEZj6ngAIbz7XV3bs0lFm6my-l5_S4X12m84j4Ht-jdeo7n7wqUx2nS3cBSh8EISrueD0uVZZFABZt_xZiThLiHnBxAEN6OclxQ70Ehb96jgoQ4m4brmtlcsTNLogOK9pVGQ3ESfIKHSj0gvkDu3u97fDTLP5ibdstCxBUyhfdhEQRkMy0PZMuKv_giuDKASExWf-2qy-PcbTXTi6IM64Ccn0UHsIoz7_h-1kdxPufij4cIzN8el0BC_ZdnrShVnFOT125OpPo5qEf_WnFstWzOyROj9s6w
user "account" set.
+ oc config set-context current --cluster=master --user=account --namespace=logging
context "current" set.
+ oc config use-context current
switched to context "current".
+ for file in 'scripts/*.sh'
+ source scripts/install.sh
++ set -ex
+ for file in 'scripts/*.sh'
+ source scripts/upgrade.sh
++ set -ex
++ TIMES=300
++ fluentd_nodeselector=logging-infra-fluentd=true
+ for file in 'scripts/*.sh'
+ source scripts/util.sh
+ for file in 'scripts/*.sh'
+ source scripts/uuid_migrate.sh
+ case "${mode}" in
+ install_logging
+ initialize_install_vars
+ image_prefix=docker.io/openshift/origin-
+ image_version=latest
+ insecure_registry=false
+ hostname=kibana.oc3.videonext.net
+ ops_hostname=kibana-ops.example.com
+ public_master_url=https://o3-master.videonext.net:8443
+ es_instance_ram=1Gi
+ es_pvc_size=
+ es_pvc_prefix=logging-es-
+ es_cluster_size=1
+ es_node_quorum=1
+ es_recover_after_nodes=0
+ es_recover_expected_nodes=1
+ es_recover_after_time=5m
+ es_ops_instance_ram=8G
+ es_ops_pvc_size=
+ es_ops_pvc_prefix=logging-es-ops-
+ es_ops_cluster_size=1
+ es_ops_node_quorum=1
+ es_ops_recover_after_nodes=0
+ es_ops_recover_expected_nodes=1
+ es_ops_recover_after_time=5m
+ image_params=IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ generate_secrets
+ '[' '' '!=' true ']'
+ generate_signer_cert_and_conf
+ rm -rf /etc/deploy
rm: cannot remove '/etc/deploy': Permission denied
+ :
+ mkdir -p /secret
+ chmod 700 /secret
chmod: changing permissions of '/secret': Read-only file system
+ :
+ '[' -s /secret/ca.key ']'
++ date +%Y%m%d%H%M%S
+ openshift admin ca create-signer-cert --key=/etc/deploy/ca.key --cert=/etc/deploy/ca.crt --serial=/etc/deploy/ca.serial.txt --name=logging-signer-20160525195414
+ echo Generating signing configuration file
+ cat - conf/signing.conf
Generating signing configuration file
+ procure_server_cert kibana
+ local file=kibana hostnames=
+ '[' -s /secret/kibana.crt ']'
+ cp /secret/kibana.key /etc/deploy/kibana.key
+ cp /secret/kibana.crt /etc/deploy/kibana.crt
+ procure_server_cert kibana-ops
+ local file=kibana-ops hostnames=
+ '[' -s /secret/kibana-ops.crt ']'
+ '[' -n '' ']'
+ procure_server_cert kibana-internal kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
+ local file=kibana-internal hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
+ '[' -s /secret/kibana-internal.crt ']'
+ '[' -n kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com ']'
+ openshift admin ca create-server-cert --key=/etc/deploy/kibana-internal.key --cert=/etc/deploy/kibana-internal.crt --hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com --signer-cert=/etc/deploy/ca.crt --signer-key=/etc/deploy/ca.key --signer-serial=/etc/deploy/ca.serial.txt
+ '[' -s /secret/server-tls.json ']'
+ cp conf/server-tls.json /etc/deploy
+ cat /dev/null
+ cat /dev/null
+ fluentd_user=system.logging.fluentd
+ kibana_user=system.logging.kibana
+ curator_user=system.logging.curator
+ admin_user=system.admin
+ generate_PEM_cert system.logging.fluentd
+ NODE_NAME=system.logging.fluentd
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.logging.fluentd
Generating keystore and certificate for node system.logging.fluentd
+ openssl req -out /etc/deploy/system.logging.fluentd.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.fluentd.key -subj /CN=system.logging.fluentd/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
...........................................................................................+++
................................................................................................+++
writing new private key to '/etc/deploy/system.logging.fluentd.key'
-----
+ echo Sign certificate request with CA
Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.logging.fluentd.csr -notext -out /etc/deploy/system.logging.fluentd.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 2 (0x2)
        Validity
            Not Before: May 25 19:54:18 2016 GMT
            Not After : May 25 19:54:18 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.logging.fluentd
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                E6:16:3D:33:C7:95:FE:F8:2C:66:B6:15:FD:FB:D8:35:DB:E7:7F:7B
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:18 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ generate_PEM_cert system.logging.kibana
+ NODE_NAME=system.logging.kibana
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.logging.kibana
Generating keystore and certificate for node system.logging.kibana
+ openssl req -out /etc/deploy/system.logging.kibana.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.kibana.key -subj /CN=system.logging.kibana/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
...................................................+++
.......................................+++
writing new private key to '/etc/deploy/system.logging.kibana.key'
-----
+ echo Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.logging.kibana.csr -notext -out /etc/deploy/system.logging.kibana.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Sign certificate request with CA
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 3 (0x3)
        Validity
            Not Before: May 25 19:54:18 2016 GMT
            Not After : May 25 19:54:18 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.logging.kibana
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                DE:32:BA:34:4F:D5:34:7C:DA:F4:F2:1B:4C:76:28:E0:D5:46:88:96
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:18 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ generate_PEM_cert system.logging.curator
+ NODE_NAME=system.logging.curator
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.logging.curator
Generating keystore and certificate for node system.logging.curator
+ openssl req -out /etc/deploy/system.logging.curator.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.curator.key -subj /CN=system.logging.curator/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
...............+++
..................................+++
writing new private key to '/etc/deploy/system.logging.curator.key'
-----
+ echo Sign certificate request with CA
Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.logging.curator.csr -notext -out /etc/deploy/system.logging.curator.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 4 (0x4)
        Validity
            Not Before: May 25 19:54:19 2016 GMT
            Not After : May 25 19:54:19 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.logging.curator
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                D5:31:7F:3F:70:BB:60:E1:F8:C2:6D:7B:F1:6C:04:F9:0D:35:D6:F7
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:19 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ generate_PEM_cert system.admin
+ NODE_NAME=system.admin
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.admin
Generating keystore and certificate for node system.admin
+ openssl req -out /etc/deploy/system.admin.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.admin.key -subj /CN=system.admin/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
....+++
..................................+++
writing new private key to '/etc/deploy/system.admin.key'
-----
+ echo Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.admin.csr -notext -out /etc/deploy/system.admin.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Sign certificate request with CA
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 5 (0x5)
        Validity
            Not Before: May 25 19:54:19 2016 GMT
            Not After : May 25 19:54:19 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.admin
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                96:D4:5D:8D:EA:35:50:D5:8F:15:95:11:06:FF:3E:6E:F0:F2:1F:94
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:19 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
++ join , logging-es logging-es.logging.svc.cluster.local logging-es-cluster logging-es-cluster.logging.svc.cluster.local logging-es-ops logging-es-ops.logging.svc.cluster.local logging-es-ops-cluster logging-es-ops-cluster.logging.svc.cluster.local
++ local IFS=,
++ shift
++ echo logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
+ generate_JKS_chain logging-es logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
+ dir=/etc/deploy
+ NODE_NAME=logging-es
+ CERT_NAMES=logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
+ ks_pass=kspass
+ ts_pass=tspass
+ rm -rf logging-es
+ extension_names=
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
+ echo Generating keystore and certificate for node logging-es
Generating keystore and certificate for node logging-es
+ /bin/keytool -genkey -alias logging-es -keystore /etc/deploy/keystore.jks -keypass kspass -storepass kspass -keyalg RSA -keysize 2048 -validity 712 -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
Generating certificate signing request for node logging-es
+ echo Generating certificate signing request for node logging-es
+ /bin/keytool -certreq -alias logging-es -keystore /etc/deploy/keystore.jks -storepass kspass -file /etc/deploy/logging-es.csr -keyalg rsa -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
+ echo Sign certificate request with CA
+ openssl ca -in /etc/deploy/logging-es.csr -notext -out /etc/deploy/logging-es.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Sign certificate request with CA
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 6 (0x6)
        Validity
            Not Before: May 25 19:54:23 2016 GMT
            Not After : May 25 19:54:23 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Test
            organizationalUnitName    = SSL
            commonName                = logging-es
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                BA:18:A4:E2:C1:7E:5F:CF:47:D1:27:E6:EB:F0:8F:76:41:02:CE:BC
            X509v3 Authority Key Identifier:
                0.
            X509v3 Subject Alternative Name:
                DNS:localhost, IP Address:127.0.0.1, DNS:logging-es, DNS:logging-es.logging.svc.cluster.local, DNS:logging-es-cluster, DNS:logging-es-cluster.logging.svc.cluster.local, DNS:logging-es-ops, DNS:logging-es-ops.logging.svc.cluster.local, DNS:logging-es-ops-cluster, DNS:logging-es-ops-cluster.logging.svc.cluster.local
Certificate is to be certified until May 25 19:54:23 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ echo 'Import back to keystore (including CA chain)'
Import back to keystore (including CA chain)
+ /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias sig-ca
Certificate was added to keystore
+ /bin/keytool -import -file /etc/deploy/logging-es.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias logging-es
Certificate reply was installed in keystore
+ echo 'Import CA to truststore for validating client certs'
+ /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/truststore.jks -storepass tspass -noprompt -alias sig-ca
Import CA to truststore for validating client certs
Certificate was added to keystore
+ echo All done for logging-es
All done for logging-es
+ openssl rand 16
+ openssl enc -aes-128-cbc -nosalt -out /etc/deploy/searchguard_node_key.key -pass pass:pass
+ cat /dev/urandom
+ tr -dc a-zA-Z0-9
+ fold -w 200
+ head -n 1
+ cat /dev/urandom
+ tr -dc a-zA-Z0-9
+ fold -w 64
+ head -n 1
Deleting secrets
+ echo 'Deleting secrets'
+ oc delete secret logging-fluentd logging-elasticsearch logging-kibana logging-kibana-proxy logging-kibana-ops-proxy logging-curator logging-curator-ops
Error from server: secrets "logging-fluentd" not found
Error from server: secrets "logging-elasticsearch" not found
Error from server: secrets "logging-kibana" not found
Error from server: secrets "logging-kibana-proxy" not found
Error from server: secrets "logging-kibana-ops-proxy" not found
Error from server: secrets "logging-curator" not found
Error from server: secrets "logging-curator-ops" not found
+ :
+ echo 'Creating secrets'
+ oc secrets new logging-elasticsearch key=/etc/deploy/keystore.jks truststore=/etc/deploy/truststore.jks searchguard.key=/etc/deploy/searchguard_node_key.key admin-key=/etc/deploy/system.admin.key admin-cert=/etc/deploy/system.admin.crt admin-ca=/etc/deploy/ca.crt
Creating secrets
secret/logging-elasticsearch
+ oc secrets new logging-kibana ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.kibana.key cert=/etc/deploy/system.logging.kibana.crt
secret/logging-kibana
+ oc secrets new logging-kibana-proxy oauth-secret=/etc/deploy/oauth-secret session-secret=/etc/deploy/session-secret server-key=/etc/deploy/kibana-internal.key server-cert=/etc/deploy/kibana-internal.crt server-tls.json=/etc/deploy/server-tls.json
secret/logging-kibana-proxy
+ oc secrets new logging-fluentd ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.fluentd.key cert=/etc/deploy/system.logging.fluentd.crt
secret/logging-fluentd
+ oc secrets new logging-curator ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
secret/logging-curator
+ oc secrets new logging-curator-ops ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
secret/logging-curator-ops
+ echo 'Attaching secrets to service accounts'
+ oc secrets add serviceaccount/aggregated-logging-kibana logging-kibana logging-kibana-proxy
Attaching secrets to service accounts
+ oc secrets add serviceaccount/aggregated-logging-elasticsearch logging-elasticsearch
+ oc secrets add serviceaccount/aggregated-logging-fluentd logging-fluentd
+ oc secrets add serviceaccount/aggregated-logging-curator logging-curator
+ '[' -n '' ']'
+ generate_support_objects
++ cat /etc/deploy/oauth-secret
+ oc new-app -f templates/support.yaml --param OAUTH_SECRET=sZl5tjf5SOdvGwZ9RAlfCzPdvzplkMVOREr0KrYlMgLvjBPSw1GwjfGg5rrMJPWb --param KIBANA_HOSTNAME=kibana.oc3.videonext.net --param KIBANA_OPS_HOSTNAME=kibana-ops.example.com --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin- --param INSECURE_REGISTRY=false
--> Deploying template logging-support-template-maker for "templates/support.yaml"
     With parameters:
      OAUTH_SECRET=sZl5tjf5SOdvGwZ9RAlfCzPdvzplkMVOREr0KrYlMgLvjBPSw1GwjfGg5rrMJPWb
      KIBANA_HOSTNAME=kibana.oc3.videonext.net
      KIBANA_OPS_HOSTNAME=kibana-ops.example.com
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
      INSECURE_REGISTRY=false
--> Creating resources ...
    template "logging-support-template" created
    template "logging-imagestream-template" created
    template "logging-pvc-template" created
--> Success
    Run 'oc status' to view your app.
+ oc new-app logging-support-template
--> Deploying template logging-support-template for "logging-support-template"
--> Creating resources ...
    service "logging-es" created
    service "logging-es-cluster" created
    service "logging-es-ops" created
    service "logging-es-ops-cluster" created
    service "logging-kibana" created
    service "logging-kibana-ops" created
    oauthclient "kibana-proxy" created
--> Success
    Run 'oc status' to view your app.
+ kibana_keys=
+ '[' -e /etc/deploy/kibana.crt ']'
+ kibana_keys='--cert=/etc/deploy/kibana.crt --key=/etc/deploy/kibana.key'
+ oc create route reencrypt --service=logging-kibana --hostname=kibana.oc3.videonext.net --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt --cert=/etc/deploy/kibana.crt --key=/etc/deploy/kibana.key
route "logging-kibana" created
+ kibana_keys=
+ '[' -e /etc/deploy/kibana-ops.crt ']'
+ oc create route reencrypt --service=logging-kibana-ops --hostname=kibana-ops.example.com --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt
route "logging-kibana-ops" created
+ generate_templates
+ echo '(Re-)Creating templates'
+ generate_es_template
+ create_template_optional_nodeselector '' es --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1Gi --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=
+ shift
+ local template=es
+ shift
+ cp templates/es.yaml /etc/deploy/es.yaml
(Re-)Creating templates
+ [[ -n '' ]]
+ oc new-app -f /etc/deploy/es.yaml --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1Gi --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-elasticsearch-template-maker for "/etc/deploy/es.yaml"
     With parameters:
      ES_CLUSTER_NAME=es
      ES_INSTANCE_RAM=1Gi
      ES_NODE_QUORUM=1
      ES_RECOVER_AFTER_NODES=0
      ES_RECOVER_EXPECTED_NODES=1
      ES_RECOVER_AFTER_TIME=5m
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
      IMAGE_VERSION_DEFAULT=latest
--> Creating resources ...
    template "logging-es-template" created
--> Success
    Run 'oc status' to view your app.
+ '[' false == true ']'
+ generate_kibana_template
+ create_template_optional_nodeselector '' kibana --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=
+ shift
+ local template=kibana
+ shift
+ cp templates/kibana.yaml /etc/deploy/kibana.yaml
+ [[ -n '' ]]
+ oc new-app -f /etc/deploy/kibana.yaml --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-kibana-template-maker for "/etc/deploy/kibana.yaml"
     With parameters:
      KIBANA_DEPLOY_NAME=kibana
      OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local
      OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443
      ES_HOST=logging-es
      ES_PORT=9200
      OAP_DEBUG=false
      IMAGE_VERSION_DEFAULT=latest
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Creating resources ...
    template "logging-kibana-template" created
--> Success
    Run 'oc status' to view your app.
+ '[' false == true ']'
+ generate_curator_template
+ create_template_optional_nodeselector '' curator --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=
+ shift
+ local template=curator
+ shift
+ cp templates/curator.yaml /etc/deploy/curator.yaml
+ [[ -n '' ]]
+ oc new-app -f /etc/deploy/curator.yaml --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-curator-template-maker for "/etc/deploy/curator.yaml"
     With parameters:
      CURATOR_DEPLOY_NAME=curator
      MASTER_URL=https://kubernetes.default.svc.cluster.local
      ES_HOST=logging-es
      ES_PORT=9200
      ES_CLIENT_CERT=/etc/curator/keys/cert
      ES_CLIENT_KEY=/etc/curator/keys/key
      ES_CA=/etc/curator/keys/ca
      CURATOR_DEFAULT_DAYS=30
      CURATOR_CONF_LOCATION=/etc/curator
      CURATOR_RUN_HOUR=0
      CURATOR_RUN_MINUTE=0
      IMAGE_VERSION_DEFAULT=latest
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Creating resources ...
    template "logging-curator-template" created
--> Success
    Run 'oc status' to view your app.
+ '[' false == true ']'
+ generate_fluentd_template
+ es_host=logging-es
+ es_ops_host=logging-es
+ '[' false == true ']'
+ create_template_optional_nodeselector logging-infra-fluentd=true fluentd --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=logging-infra-fluentd=true
+ shift
+ local template=fluentd
+ shift
+ cp templates/fluentd.yaml /etc/deploy/fluentd.yaml
+ [[ -n logging-infra-fluentd=true ]]
++ extract_nodeselector logging-infra-fluentd=true
++ local inputstring=logging-infra-fluentd=true
++ selectors=()
++ local selectors
++ for keyvalstr in '${inputstring//\,/ }'
++ keyval=(${keyvalstr//=/ })
++ [[ -n logging-infra-fluentd ]]
++ [[ -n true ]]
++ selectors+=("\"${keyval[0]}\": \"${keyval[1]}\"")
++ [[ 1 -gt 0 ]]
+++ join , '"logging-infra-fluentd": "true"'
+++ local IFS=,
+++ shift
+++ echo '"logging-infra-fluentd": "true"'
++ echo nodeSelector: '{' '"logging-infra-fluentd":' '"true"' '}'
+ sed '/serviceAccountName/ i\          nodeSelector: { "logging-infra-fluentd": "true" }' templates/fluentd.yaml
+ oc new-app -f /etc/deploy/fluentd.yaml --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-fluentd-template-maker for "/etc/deploy/fluentd.yaml"
     With parameters:
      MASTER_URL=https://kubernetes.default.svc.cluster.local
      ES_HOST=logging-es
      ES_PORT=9200
      ES_CLIENT_CERT=/etc/fluent/keys/cert
      ES_CLIENT_KEY=/etc/fluent/keys/key
      ES_CA=/etc/fluent/keys/ca
      OPS_HOST=logging-es
      OPS_PORT=9200
      OPS_CLIENT_CERT=/etc/fluent/keys/cert
      OPS_CLIENT_KEY=/etc/fluent/keys/key
      OPS_CA=/etc/fluent/keys/ca
      ES_COPY=false
      ES_COPY_HOST=
      ES_COPY_PORT=
      ES_COPY_SCHEME=https
      ES_COPY_CLIENT_CERT=
      ES_COPY_CLIENT_KEY=
      ES_COPY_CA=
      ES_COPY_USERNAME=
      ES_COPY_PASSWORD=
      OPS_COPY_HOST=
      OPS_COPY_PORT=
      OPS_COPY_SCHEME=https
      OPS_COPY_CLIENT_CERT=
      OPS_COPY_CLIENT_KEY=
      OPS_COPY_CA=
      OPS_COPY_USERNAME=
      OPS_COPY_PASSWORD=
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
      IMAGE_VERSION_DEFAULT=latest
--> Creating resources ...
    template "logging-fluentd-template" created
--> Success
    Run 'oc status' to view your app.
+ generate_objects
+ echo '(Re-)Creating deployed objects'
+ oc new-app logging-imagestream-template
(Re-)Creating deployed objects
--> Deploying template logging-imagestream-template for "logging-imagestream-template"
     With parameters:
      IMAGE_PREFIX=docker.io/openshift/origin-
--> Creating resources ...
    imagestream "logging-auth-proxy" created
    imagestream "logging-elasticsearch" created
    imagestream "logging-fluentd" created
    imagestream "logging-kibana" created
    imagestream "logging-curator" created
--> Success
    Run 'oc status' to view your app.
+ generate_es
+ pvcs=()
+ declare -A pvcs
++ oc get persistentvolumeclaim '--template={{range .items}}{{.metadata.name}} {{end}}'
+ (( n=1 ))
+ (( n<=1 ))
+ pvc=logging-es-1
+ '[' '' '!=' 1 -a '' '!=' '' ']'
+ '[' '' = 1 ']'
+ oc new-app logging-es-template
error: error processing template logging/logging-es-template: [unable to parse quantity's suffix]
[root@o3-master ~]#

need docs on using "custom" ssl certificates for fluentd, elasticsearch, etc

Some people will want to use their own SSL certs for the "internal" communication between fluentd and Elasticsearch.

Currently the docs clearly illustrate how to do this for Kibana itself, but not for the internal communications. Talking with @sosiouxme on the phone, we determined this is "as easy as" replacing the secrets after they are generated, but this needs to be documented somehow.
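
Until that is documented, a hedged sketch of the secret-replacement approach (the secret and key names match the deployer output further down this page; the certificate paths and the fluentd pod label are assumptions):

# Replace the generated fluentd client certificate with a custom one, then restart the collectors
oc delete secret logging-fluentd
oc secrets new logging-fluentd \
  ca=/path/to/my-ca.crt \
  key=/path/to/my-fluentd.key \
  cert=/path/to/my-fluentd.crt
oc delete pods -l component=fluentd   # assumed label; pods are recreated and pick up the new secret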

ImagePullBackOff while deploying EFK

Hi, I am following the README to deploy EFK on Origin 1.1.0.1, but it always fails with an ImagePullBackOff status on the pods. In the pod events I could see the following:

[screenshot of the pod events omitted]

I tried pulling the image on my machine using "docker pull", which pulls the image without any issue. Could anyone help me with this?

Note: I am able to create apps from other Docker images, and the metrics setup is also working fine.

Thanks,
Yash
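
A hedged troubleshooting sketch for this kind of failure (the project and imagestream names assume the default deployer templates shown elsewhere on this page):

oc describe pod <failing-pod> -n logging            # shows the exact image reference and pull error
oc get imagestreams -n logging                      # imagestreams created by the deployer
oc import-image logging-fluentd:latest -n logging   # retry the import if a tag failed to resolve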

Elastic Search per-replica storage is awkward

This is a tracking issue for future improvements to the current deployment implementation.

The issue is that each instance of Elasticsearch requires its own storage volume, but there is no mechanism in current Kubernetes/OpenShift to set up a replication controller that varies parameters or volumes per replica, so multiple replicas could only reuse the same storage volume. Thus, for multiple instances of Elasticsearch we are required to create multiple deployments, each with a single "replica", so that each can have its own storage. This violates expectations from the rest of the platform and is difficult to manage.

This problem is not specific to aggregated logging; actually just about any clustered storage mechanism is likely to run into it. There is an upstream proposal to solve this generically. The discussion goes beyond simple storage concerns to cluster parameters or other parameters that may need to vary per instance, specialized deployment hooks, and so forth.

At this time the proposal and design itself is still under heavy debate, with spinoff issues to investigate the requirements of specific cluster implementations. We will add comments to this issue when there are substantive developments in the implementation and implications for logging specifically.
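
For reference, the current per-deployment workaround looks roughly like this, a hedged sketch in which the DeploymentConfig suffix is generated by the deployer and the volume and claim names are assumptions:

# Attach one PVC per Elasticsearch DeploymentConfig, since each "cluster node" is its own deployment
oc set volume dc/logging-es-<generated-suffix> \
  --add --overwrite --name=elasticsearch-storage \
  --type=persistentVolumeClaim --claim-name=logging-es-1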

deployer fails on network blips

(I could be wrong about this, it is just a first thought; I am running version 3.1.)

Problem

I think the pod deployer can fail on a network blip, and if so (I think) that might break the deployment entirely. I saw an ES node go down today, and it never came back up (after waiting about 5 minutes).

When I looked, I saw that there was an Error, and the deployer logs said this:

[cloud-user@support ~]$ oc logs logging-es-1lfg5ess-1-deploy
F0511 18:24:07.389551       1 deployer.go:70] couldn't get deployment logging/logging-es-1lfg5ess-1: Get https://172.24.0.1:443/api/v1/namespaces/logging/replicationcontrollers/logging-es-1lfg5ess-1: net/http: TLS handshake timeout

Solution

Not sure what it is, but I think maybe deployer.go line 70 could be made a little more robust, retrying in the event of a handshake timeout.

Thoughts

If this is really a bug, it affects the ability to predictably generate precise plots of ES node scaling performance and to ensure their number in the cluster, because it means the deployment controllers themselves are unstable.

aggregated logging / elasticsearch maintenance

We've installed the aggregated-logging stack from https://github.com/openshift/origin-aggregated-logging/tree/master/deployment
Collecting and displaying logs works fine, but the collected data grows, and we can't invoke Elasticsearch's REST API to run cleanup/maintenance jobs. Even when I connect to the Elasticsearch pod and call
curl -X GET http://127.0.0.1:9200
the response is always
curl: (52) Empty reply from server

How do you maintain the Elasticsearch data? Is there a special token/secret that must be used to connect to Elasticsearch?
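
For what it's worth, the deployed Elasticsearch only listens for TLS connections with client-certificate authentication, which is why a plain HTTP call returns an empty reply. A hedged sketch of a working call from inside the Elasticsearch pod (the secret mount path is an assumption; the key names follow the logging-elasticsearch secret created by the deployer):

oc exec <elasticsearch-pod> -- curl -s -k \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  "https://localhost:9200/_cat/indices?v"

Old indices can then be deleted the same way, or via the curator component that the deployer sets up.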

[RFE] Enable fluentd to correctly parse multi-line json log entries such as java stacktraces.

If an exception is thrown from a Java application, fluentd will send a new event to Elasticsearch for each line in the stacktrace. This is undesired and produces a lot of noise in ES; it makes it hard to find the real errors.

The parser plugin should be enabled and preconfigured for the above case (I'm sure there are others):
http://docs.fluentd.org/articles/parser-plugin-overview

From the docs:

One more example: you can parse Java-like stacktrace logs with multiline. Here is a configuration example.

format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
If you have the following log:

2013-3-03 14:27:33 [main] INFO Main - Start
2013-3-03 14:27:33 [main] ERROR Main - Exception
javax.management.RuntimeErrorException: null
at Main.main(Main.java:16) ~[bin/:na]
2013-3-03 14:27:33 [main] INFO Main - End
It will be parsed as:

2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":" Main - Start"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"ERROR","message":" Main - Exception\njavax.management.RuntimeErrorException: null\n at Main.main(Main.java:16) ~[bin/:na]"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":" Main - End"}

Deploying the EFK Stack FAILS with message: "error: open '/etc/deploy/kibana.crt': no such file or directory"

Following the instructions at: https://docs.openshift.org/latest/install_config/aggregate_logging.html#deploying-the-efk-stack

When executing the following commands:

oc process logging-deployer-template -n openshift -v \
PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443,KIBANA_HOSTNAME=kibana.oc3.videonext.net,ES_CLUSTER_SIZE=1,ES_INSTANCE_RAM=1G \
> logging-deployer.json

oc create -f logging-deployer.json

The deployer pod gets created, runs for 20-25 seconds, and eventually fails:

  • kibana_keys=
  • '[' -e /etc/deploy/kibana.crt ']'
  • kibana_keys='--cert='''/etc/deploy/kibana.crt''' --key='''/etc/deploy/kibana.key''''
  • oc create route reencrypt --service=logging-kibana --hostname=kibana.oc3.videonext.net --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt '--cert='''/etc/deploy/kibana.crt'''' '--key='''/etc/deploy/kibana.key''''
    error: open '/etc/deploy/kibana.crt': no such file or directory

As I was trying to troubleshoot, I opened an additional terminal into the deployer pod and kept checking for the '/etc/deploy/kibana.crt' file: it was present all along, until the container crashed.

I strongly suspect that an extra set of quotes around the cert file name is the root cause of the failure.
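
A minimal reproduction of that quoting pitfall (an illustration only, not the deployer's actual code):

# Embedding literal quotes in a string makes them part of the argument after expansion:
kibana_keys="--cert='/etc/deploy/kibana.crt' --key='/etc/deploy/kibana.key'"
oc create route reencrypt --service=logging-kibana $kibana_keys
# -> oc tries to open the file "'/etc/deploy/kibana.crt'" (quotes included) and fails.

# Safer pattern: build the arguments as an array so each one survives word-splitting intact:
kibana_keys=(--cert=/etc/deploy/kibana.crt --key=/etc/deploy/kibana.key)
oc create route reencrypt --service=logging-kibana "${kibana_keys[@]}"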

Here is a full log from deployer pod:

[root@o3-master master]# oc logs logging-deployer-9zann

  • project=logging
  • mode=install
  • dir=/etc/deploy
  • secret_dir=/secret
  • master_url=https://kubernetes.default.svc.cluster.local
  • master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  • token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
  • '[' -n 1 ']'
  • oc config set-cluster master --api-version=v1 --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --server=https://kubernetes.default.svc.cluster.local
    cluster "master" set.
    ++ cat /var/run/secrets/kubernetes.io/serviceaccount/token
  • oc config set-credentials account --token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJsb2dnaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImxvZ2dpbmctZGVwbG95ZXItdG9rZW4tZjN6YnQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibG9nZ2luZy1kZXBsb3llciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjY0ZjRiY2VmLTA4YjktMTFlNi05ZmNhLTUyNTQwMDBlYjgyNyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpsb2dnaW5nOmxvZ2dpbmctZGVwbG95ZXIifQ.Yjikf_XhShgkVPocuKBUmobB47JqYpXn1k3nWReilGK-XXE4rMOgSUKYmzREUJVj8JWLdrkS_o4NgcPC1WiOR2BwxsIkWbKj0d6ZYMxzegOm4P5CID2_jgcy5GHFTNt63LRKawIi4bp_fMVQODIT2kL2VjWs_3EGoPACeaYR1tm0rFH9R8vHcPf8sggvpmXHnKpwFuZNbbwbW-ra8Efdfa0J6qMqq7ngwzPuIngDDIlOXLkpaYw1LCSLrBHaX12uME8Vtu7iM1GdZyRh35MUtfDRqf87y0v7BQnsPcdUy48_zuKK_Bd1BnM9uJsoEPa1sVFODU0TWnc2ein7BNIJLg
    user "account" set.
  • oc config set-context current --cluster=master --user=account --namespace=logging
    context "current" set.
  • oc config use-context current
    switched to context "current".
  • for file in 'scripts/*.sh'
  • source scripts/install.sh
    ++ set -ex
  • for file in 'scripts/*.sh'
  • source scripts/util.sh
  • for file in 'scripts/*.sh'
  • source scripts/uuid_migrate.sh
  • case "${mode}" in
  • install_logging
  • initialize_install_vars
  • image_prefix=docker.io/openshift/origin-
  • image_version=latest
  • hostname=kibana.oc3.videonext.net
  • ops_hostname=kibana-ops.example.com
  • public_master_url=https://o3-master.videonext.net:8443
  • es_instance_ram=1G
  • es_pvc_size=
  • es_pvc_prefix=logging-es-
  • es_cluster_size=1
  • es_node_quorum=1
  • es_recover_after_nodes=0
  • es_recover_expected_nodes=1
  • es_recover_after_time=5m
  • es_ops_instance_ram=8G
  • es_ops_pvc_size=
  • es_ops_pvc_prefix=logging-es-ops-
  • es_ops_cluster_size=1
  • es_ops_node_quorum=1
  • es_ops_recover_after_nodes=0
  • es_ops_recover_expected_nodes=1
  • es_ops_recover_after_time=5m
  • generate_secrets
  • '[' '' '!=' true ']'
  • rm -rf /etc/deploy
    rm: cannot remove '/etc/deploy': Permission denied
  • :
  • mkdir -p /secret
  • chmod 700 /secret
    chmod: changing permissions of '/secret': Read-only file system
  • :
  • '[' -s /secret/ca.key ']'
    ++ date +%Y%m%d%H%M%S
  • openshift admin ca create-signer-cert --key=/etc/deploy/ca.key --cert=/etc/deploy/ca.crt --serial=/etc/deploy/ca.serial.txt --name=logging-signer-20160422201646
  • procure_server_cert kibana
  • local file=kibana hostnames=
  • '[' -s /secret/kibana.crt ']'
  • cp /secret/kibana.key /etc/deploy/kibana.key
  • cp /secret/kibana.crt /etc/deploy/kibana.crt
  • procure_server_cert kibana-ops
  • local file=kibana-ops hostnames=
  • '[' -s /secret/kibana-ops.crt ']'
  • '[' -n '' ']'
  • procure_server_cert kibana-internal kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
  • local file=kibana-internal hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
  • '[' -s /secret/kibana-internal.crt ']'
  • '[' -n kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com ']'
  • openshift admin ca create-server-cert --key=/etc/deploy/kibana-internal.key --cert=/etc/deploy/kibana-internal.crt --hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com --signer-cert=/etc/deploy/ca.crt --signer-key=/etc/deploy/ca.key --signer-serial=/etc/deploy/ca.serial.txt
  • echo Generating signing configuration file
  • cat - conf/signing.conf
    Generating signing configuration file
  • '[' -s /secret/server-tls.json ']'
  • cp conf/server-tls.json /etc/deploy
  • cat /dev/null
  • cat /dev/null
  • fluentd_user=system.logging.fluentd
  • kibana_user=system.logging.kibana
  • curator_user=system.logging.curator
  • admin_user=system.admin
  • generate_PEM_cert system.logging.fluentd
  • NODE_NAME=system.logging.fluentd
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.logging.fluentd
  • openssl req -out /etc/deploy/system.logging.fluentd.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.fluentd.key -subj /CN=system.logging.fluentd/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating keystore and certificate for node system.logging.fluentd
    Generating a 2048 bit RSA private key
    .....................+++
    ....................................+++
    writing new private key to '/etc/deploy/system.logging.fluentd.key'

  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.logging.fluentd.csr -notext -out /etc/deploy/system.logging.fluentd.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 2 (0x2)
    Validity
    Not Before: Apr 22 20:16:50 2016 GMT
    Not After : Apr 22 20:16:50 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.logging.fluentd
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    B1:04:92:2D:7D:D1:7C:A4:1A:03:55:6F:6B:8B:2D:CD:28:38:87:D3
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:50 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • generate_PEM_cert system.logging.kibana
  • NODE_NAME=system.logging.kibana
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.logging.kibana
    Generating keystore and certificate for node system.logging.kibana
  • openssl req -out /etc/deploy/system.logging.kibana.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.kibana.key -subj /CN=system.logging.kibana/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating a 2048 bit RSA private key
    .....................................................................+++
    ......................+++
    writing new private key to '/etc/deploy/system.logging.kibana.key'

  • echo Sign certificate request with CA
    Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.logging.kibana.csr -notext -out /etc/deploy/system.logging.kibana.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 3 (0x3)
    Validity
    Not Before: Apr 22 20:16:51 2016 GMT
    Not After : Apr 22 20:16:51 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.logging.kibana
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    C5:F9:16:53:B0:19:01:35:82:C4:F3:B2:4A:1F:A8:0A:03:97:92:D9
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:51 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • generate_PEM_cert system.logging.curator
  • NODE_NAME=system.logging.curator
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.logging.curator
  • openssl req -out /etc/deploy/system.logging.curator.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.curator.key -subj /CN=system.logging.curator/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating keystore and certificate for node system.logging.curator
    Generating a 2048 bit RSA private key
    ....+++
    .........................................................+++
    writing new private key to '/etc/deploy/system.logging.curator.key'

  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.logging.curator.csr -notext -out /etc/deploy/system.logging.curator.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 4 (0x4)
    Validity
    Not Before: Apr 22 20:16:51 2016 GMT
    Not After : Apr 22 20:16:51 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.logging.curator
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    B4:22:14:DA:0A:F6:7C:64:00:41:A9:0C:A7:95:88:1D:E2:61:7D:C8
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:51 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • generate_PEM_cert system.admin
  • NODE_NAME=system.admin
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.admin
  • openssl req -out /etc/deploy/system.admin.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.admin.key -subj /CN=system.admin/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating keystore and certificate for node system.admin
    Generating a 2048 bit RSA private key
    ....+++
    .............................................+++
    writing new private key to '/etc/deploy/system.admin.key'

  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.admin.csr -notext -out /etc/deploy/system.admin.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 5 (0x5)
    Validity
    Not Before: Apr 22 20:16:51 2016 GMT
    Not After : Apr 22 20:16:51 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.admin
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    CC:03:4D:07:D0:F0:3B:38:0C:E7:E6:30:EA:59:50:38:8D:2E:DB:1F
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:51 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
++ join , logging-es logging-es.logging.svc.cluster.local logging-es-cluster logging-es-cluster.logging.svc.cluster.local logging-es-ops logging-es-ops.logging.svc.cluster.local logging-es-ops-cluster logging-es-ops-cluster.logging.svc.cluster.local
++ local IFS=,
++ shift
++ echo logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local

  • generate_JKS_chain logging-es logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
  • dir=/etc/deploy
  • NODE_NAME=logging-es
  • CERT_NAMES=logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
  • ks_pass=kspass
  • ts_pass=tspass
  • rm -rf logging-es
  • extension_names=
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
  • echo Generating keystore and certificate for node logging-es
  • /bin/keytool -genkey -alias logging-es -keystore /etc/deploy/keystore.jks -keypass kspass -storepass kspass -keyalg RSA -keysize 2048 -validity 712 -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
    Generating keystore and certificate for node logging-es
  • echo Generating certificate signing request for node logging-es
  • /bin/keytool -certreq -alias logging-es -keystore /etc/deploy/keystore.jks -storepass kspass -file /etc/deploy/logging-es.csr -keyalg rsa -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
    Generating certificate signing request for node logging-es
  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/logging-es.csr -notext -out /etc/deploy/logging-es.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 6 (0x6)
    Validity
    Not Before: Apr 22 20:16:53 2016 GMT
    Not After : Apr 22 20:16:53 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Test
    organizationalUnitName = SSL
    commonName = logging-es
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    79:F7:CD:26:B8:84:5B:19:08:87:53:1F:2B:C8:25:A7:8A:73:70:EC
    X509v3 Authority Key Identifier:
    0.
    X509v3 Subject Alternative Name:
    DNS:localhost, IP Address:127.0.0.1, DNS:logging-es, DNS:logging-es.logging.svc.cluster.local, DNS:logging-es-cluster, DNS:logging-es-cluster.logging.svc.cluster.local, DNS:logging-es-ops, DNS:logging-es-ops.logging.svc.cluster.local, DNS:logging-es-ops-cluster, DNS:logging-es-ops-cluster.logging.svc.cluster.local
    Certificate is to be certified until Apr 22 20:16:53 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • echo 'Import back to keystore (including CA chain)'
  • /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias sig-ca
    Import back to keystore (including CA chain)
    Certificate was added to keystore
  • /bin/keytool -import -file /etc/deploy/logging-es.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias logging-es
    Certificate reply was installed in keystore
    Import CA to truststore for validating client certs
  • echo 'Import CA to truststore for validating client certs'
  • /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/truststore.jks -storepass tspass -noprompt -alias sig-ca
    Certificate was added to keystore
    All done for logging-es
  • echo All done for logging-es
  • openssl rand 16
  • openssl enc -aes-128-cbc -nosalt -out /etc/deploy/searchguard_node_key.key -pass pass:pass
  • cat /dev/urandom
  • tr -dc a-zA-Z0-9
  • fold -w 200
  • head -n 1
  • cat /dev/urandom
  • tr -dc a-zA-Z0-9
  • fold -w 64
  • head -n 1
  • echo 'Deleting existing secrets'
  • oc delete secret logging-fluentd logging-elasticsearch logging-kibana logging-kibana-proxy logging-kibana-ops-proxy logging-curator logging-curator-ops
    Deleting existing secrets
    secret "logging-fluentd" deleted
    secret "logging-elasticsearch" deleted
    secret "logging-kibana" deleted
    secret "logging-kibana-proxy" deleted
    secret "logging-curator" deleted
    secret "logging-curator-ops" deleted
    Error from server: secrets "logging-kibana-ops-proxy" not found
  • :
  • echo 'Creating secrets'
    Creating secrets
  • oc secrets new logging-elasticsearch key=/etc/deploy/keystore.jks truststore=/etc/deploy/truststore.jks searchguard.key=/etc/deploy/searchguard_node_key.key admin-key=/etc/deploy/system.admin.key admin-cert=/etc/deploy/system.admin.crt admin-ca=/etc/deploy/ca.crt
    secret/logging-elasticsearch
  • oc secrets new logging-kibana ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.kibana.key cert=/etc/deploy/system.logging.kibana.crt
    secret/logging-kibana
  • oc secrets new logging-kibana-proxy oauth-secret=/etc/deploy/oauth-secret session-secret=/etc/deploy/session-secret server-key=/etc/deploy/kibana-internal.key server-cert=/etc/deploy/kibana-internal.crt server-tls.json=/etc/deploy/server-tls.json
    secret/logging-kibana-proxy
  • oc secrets new logging-fluentd ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.fluentd.key cert=/etc/deploy/system.logging.fluentd.crt
    secret/logging-fluentd
  • oc secrets new logging-curator ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
    secret/logging-curator
  • oc secrets new logging-curator-ops ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
    secret/logging-curator-ops
  • echo 'Attaching secrets to service accounts'
  • oc secrets add serviceaccount/aggregated-logging-kibana logging-kibana logging-kibana-proxy
    Attaching secrets to service accounts
  • oc secrets add serviceaccount/aggregated-logging-elasticsearch logging-elasticsearch
  • oc secrets add serviceaccount/aggregated-logging-fluentd logging-fluentd
  • oc secrets add serviceaccount/aggregated-logging-curator logging-curator
  • generate_templates
  • echo '(Re-)Creating templates'
  • oc delete template --selector logging-infra=curator
    (Re-)Creating templates
    template "logging-curator-template" deleted
  • oc delete template --selector logging-infra=kibana
    template "logging-kibana-template" deleted
  • oc delete template --selector logging-infra=fluentd
    template "logging-fluentd-template" deleted
  • oc delete template --selector logging-infra=elasticsearch
    template "logging-es-template" deleted
  • create_template_optional_nodeselector '' es --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1G --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=
  • shift
  • local template=es
  • shift
  • cp templates/es.yaml /etc/deploy/es.yaml
  • [[ -n '' ]]
  • oc new-app -f /etc/deploy/es.yaml --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1G --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-elasticsearch-template-maker for "/etc/deploy/es.yaml"
    With parameters:
    ES_CLUSTER_NAME=es
    ES_INSTANCE_RAM=1G
    ES_NODE_QUORUM=1
    ES_RECOVER_AFTER_NODES=0
    ES_RECOVER_EXPECTED_NODES=1
    ES_RECOVER_AFTER_TIME=5m
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-es-template" created
    --> Success
    Run 'oc status' to view your app.
  • es_host=logging-es
  • create_template_optional_nodeselector '' kibana --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=
  • shift
  • local template=kibana
  • shift
  • cp templates/kibana.yaml /etc/deploy/kibana.yaml
  • [[ -n '' ]]
  • oc new-app -f /etc/deploy/kibana.yaml --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-kibana-template-maker for "/etc/deploy/kibana.yaml"
    With parameters:
    KIBANA_DEPLOY_NAME=kibana
    OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local
    OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443
    ES_HOST=logging-es
    ES_PORT=9200
    OAP_DEBUG=false
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-kibana-template" created
    --> Success
    Run 'oc status' to view your app.
  • create_template_optional_nodeselector '' curator --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=
  • shift
  • local template=curator
  • shift
  • cp templates/curator.yaml /etc/deploy/curator.yaml
  • [[ -n '' ]]
  • oc new-app -f /etc/deploy/curator.yaml --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-curator-template-maker for "/etc/deploy/curator.yaml"
    With parameters:
    CURATOR_DEPLOY_NAME=curator
    MASTER_URL=https://kubernetes.default.svc.cluster.local
    ES_HOST=logging-es
    ES_PORT=9200
    ES_CLIENT_CERT=/etc/curator/keys/cert
    ES_CLIENT_KEY=/etc/curator/keys/key
    ES_CA=/etc/curator/keys/ca
    CURATOR_DEFAULT_DAYS=30
    CURATOR_CONF_LOCATION=/etc/curator
    CURATOR_RUN_HOUR=0
    CURATOR_RUN_MINUTE=0
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-curator-template" created
    --> Success
    Run 'oc status' to view your app.
  • es_ops_host=logging-es
  • '[' false == true ']'
  • create_template_optional_nodeselector logging-infra-fluentd=true fluentd --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin- --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=logging-infra-fluentd=true
  • shift
  • local template=fluentd
  • shift
  • cp templates/fluentd.yaml /etc/deploy/fluentd.yaml
  • [[ -n logging-infra-fluentd=true ]]
    ++ extract_nodeselector logging-infra-fluentd=true
    ++ local inputstring=logging-infra-fluentd=true
    ++ selectors=()
    ++ local selectors
    ++ for keyvalstr in '${inputstring//,/ }'
    ++ keyval=(${keyvalstr//=/ })
    ++ [[ -n logging-infra-fluentd ]]
    ++ [[ -n true ]]
    ++ selectors=("${selectors[@]}" ""${keyval[0]}": "${keyval[1]}"")
    ++ [[ 1 -gt 0 ]]
    +++ join , '"logging-infra-fluentd": "true"'
    +++ local IFS=,
    +++ shift
    +++ echo '"logging-infra-fluentd": "true"'
    ++ echo nodeSelector: '{' '"logging-infra-fluentd":' '"true"' '}'
  • sed '/serviceAccountName/ i\ nodeSelector: { "logging-infra-fluentd": "true" }' templates/fluentd.yaml
  • oc new-app -f /etc/deploy/fluentd.yaml --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin- --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-fluentd-template-maker for "/etc/deploy/fluentd.yaml"
    With parameters:
    MASTER_URL=https://kubernetes.default.svc.cluster.local
    ES_HOST=logging-es
    ES_PORT=9200
    ES_CLIENT_CERT=/etc/fluent/keys/cert
    ES_CLIENT_KEY=/etc/fluent/keys/key
    ES_CA=/etc/fluent/keys/ca
    OPS_HOST=logging-es
    OPS_PORT=9200
    OPS_CLIENT_CERT=/etc/fluent/keys/cert
    OPS_CLIENT_KEY=/etc/fluent/keys/key
    OPS_CA=/etc/fluent/keys/ca
    ES_COPY=false
    ES_COPY_HOST=
    ES_COPY_PORT=
    ES_COPY_SCHEME=https
    ES_COPY_CLIENT_CERT=
    ES_COPY_CLIENT_KEY=
    ES_COPY_CA=
    ES_COPY_USERNAME=
    ES_COPY_PASSWORD=
    OPS_COPY_HOST=
    OPS_COPY_PORT=
    OPS_COPY_SCHEME=https
    OPS_COPY_CLIENT_CERT=
    OPS_COPY_CLIENT_KEY=
    OPS_COPY_CA=
    OPS_COPY_USERNAME=
    OPS_COPY_PASSWORD=
    IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-fluentd-template" created
    --> Success
    Run 'oc status' to view your app.
  • '[' '' '!=' true ']'
  • oc delete template --selector logging-infra=support
    template "logging-imagestream-template" deleted
    template "logging-pvc-template" deleted
    template "logging-support-template" deleted
    ++ cat /etc/deploy/oauth-secret
  • oc new-app -f templates/support.yaml --param OAUTH_SECRET=WyJiJ6bpeu0J225RCnhg1uUTS1F8PO9ViiaE2DMPHKwGS5OusNxePhOUSHboW6KD --param KIBANA_HOSTNAME=kibana.oc3.videonext.net --param KIBANA_OPS_HOSTNAME=kibana-ops.example.com --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
    --> Deploying template logging-support-template-maker for "templates/support.yaml"
    With parameters:
    OAUTH_SECRET=WyJiJ6bpeu0J225RCnhg1uUTS1F8PO9ViiaE2DMPHKwGS5OusNxePhOUSHboW6KD
    KIBANA_HOSTNAME=kibana.oc3.videonext.net
    KIBANA_OPS_HOSTNAME=kibana-ops.example.com
    IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
    --> Creating resources ...
    template "logging-support-template" created
    template "logging-imagestream-template" created
    template "logging-pvc-template" created
    --> Success
    Run 'oc status' to view your app.
    (Re-)Creating deployed objects
  • generate_objects
  • echo '(Re-)Creating deployed objects'
  • '[' '' '!=' true ']'
  • oc process logging-support-template
  • oc delete -f -
    service "logging-es" deleted
    service "logging-es-cluster" deleted
    service "logging-es-ops" deleted
    service "logging-es-ops-cluster" deleted
    service "logging-kibana" deleted
    service "logging-kibana-ops" deleted
    oauthclient "kibana-proxy" deleted
  • oc delete imagestream,service,route --selector logging-infra=support
    No resources found
  • oc process logging-support-template
  • oc create -f -
    service "logging-es" created
    service "logging-es-cluster" created
    service "logging-es-ops" created
    service "logging-es-ops-cluster" created
    service "logging-kibana" created
    service "logging-kibana-ops" created
    oauthclient "kibana-proxy" created
  • kibana_keys=
  • '[' -e /etc/deploy/kibana.crt ']'
  • kibana_keys='--cert='''/etc/deploy/kibana.crt''' --key='''/etc/deploy/kibana.key''''
  • oc create route reencrypt --service=logging-kibana --hostname=kibana.oc3.videonext.net --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt '--cert='''/etc/deploy/kibana.crt'''' '--key='''/etc/deploy/kibana.key''''
    error: open '/etc/deploy/kibana.crt': no such file or directory

curator starts before elasticsearch during deploy

The curator pod starts before the Elasticsearch pod during deploy, which causes errors like this in the curator logs:

logging-curator running [1] jobs
2016-03-11 18:33:13,831 ERROR     Connection failure.
logging-curator run finish

This is because Elasticsearch hasn't started yet:

[2016-03-11 18:33:18,372][INFO ][node                     ] [Outrage] version[1.5.2], pid[8], build[62ff986/2015-04-27T09:21:06Z]
[2016-03-11 18:33:18,377][INFO ][node                     ] [Outrage] initializing ...
[2016-03-11 18:33:19,601][INFO ][plugins                  ] [Outrage] loaded [searchguard, openshift-elasticsearch-plugin, cloud-kubernetes], sites []

Is there some way we can orchestrate the pods in the deployer?
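Until the deployer can sequence this, a small readiness gate in front of the first curator run would at least avoid the noise. A minimal sketch, assuming the client cert/key/CA paths the curator template already defaults to and the logging-es service:

# Sketch only: wait until Elasticsearch answers before starting curator.
until curl -s --cacert /etc/curator/keys/ca \
           --cert /etc/curator/keys/cert \
           --key /etc/curator/keys/key \
           https://logging-es:9200/_cluster/health > /dev/null; do
    echo "waiting for elasticsearch..."
    sleep 5
done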

Question, regarding fluentd volume configuration

Guys,

I've dismantled this because we want to borrow some bits of it but make it work for our specific environment. In the process, I've been unable to understand how the master branch can be working: either some magic is going on, or what is built does not match what is on master.

The volume configuration for fluentd as noted below:
https://github.com/openshift/origin-aggregated-logging/blob/master/deployment/templates/fluentd.yaml#L79

Does not match where fluentd is configured to look for the log files:
https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/fluent.conf#L147
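For what it's worth, a quick way to compare the two on a live deployment (the pod name is a placeholder for any running fluentd pod):

# Show the volumes/mounts defined on the dc, then where fluentd actually tails from.
oc volume dc/logging-fluentd --list
oc exec <fluentd-pod> -- grep -n 'path ' /etc/fluent/fluent.conf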

Update search-guard ACL

To address the following error:

[ERROR][com.floragunn.searchguard.filter.SearchGuardActionFilter] Error while apply() due to com.floragunn.searchguard.tokeneval.MalformedConfigurationException: no bypass or execute filters at all for action indices:admin/mappings/fields/get

I believe this occurs when people are navigating to the settings page in Kibana, and Kibana tries to pull the fields for the logstash-* index.

Deployer pod fails on timeout: How to debug?

I'm finding that on a cluster, after deploying some ES nodes, I get this:

logging-es-wucg4yv5-1-deploy   0/1       Error       0          2m
logging-es-wvn347bb-1-deploy   0/1       Error       0          4m
logging-es-xf75ods2-1-deploy   0/1       Error       0          2m
logging-es-xtb58wpb-1-deploy   0/1       Error       0          4m
logging-es-y2kmwhda-1-deploy   0/1       Error       0          2m
logging-es-ybwfcgo1-1-deploy   0/1       Error       0          2m

Basically, all the ES deploy tasks fail. Since no logging-es pod is ever created, all I see is a timeout.

root@support: /opt/jay/team8/projects/enterprise_logging # oc logs logging-es-q0pf7vnt-1-deploy
I0520 17:44:24.403372       1 deployer.go:200] Deploying logging/logging-es-q0pf7vnt-1 for the first time (replicas: 1)
I0520 17:44:24.447335       1 recreate.go:126] Scaling logging/logging-es-q0pf7vnt-1 to 1 before performing acceptance check
F0520 17:46:25.543245       1 deployer.go:70] couldn't scale logging/logging-es-q0pf7vnt-1 to 1: timed out waiting for the condition
  1. Any idea how to fix this?
  2. How can we get more information about where the logging deployer is falling down?

cc @rflorenc
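In the meantime, a few places that usually explain a timed-out deploy (resource names are taken from the log above; the namespace from its logging/ prefix):

# The replication controller's events show why no pod ever became ready.
oc describe rc logging-es-q0pf7vnt-1 -n logging
oc get events -n logging | grep logging-es-q0pf7vnt
oc describe pod logging-es-q0pf7vnt-1-deploy -n logging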

Elasticsearch's run.sh does not set heap size at all

ES_JAVA_OPTS is not exported, so the Elasticsearch start script does not even see the options set there.
Also, it configures Elasticsearch to use a variable amount of RAM (-Xms512m -Xmx$HALF_THE_RAM). Elasticsearch's heap sizing guide recommends setting both to the same value, or better: setting ES_HEAP_SIZE.

When this is fixed, it may conflict with options set in elasticsearch.in.sh. That should be checked.
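A rough sketch of the direction a fix in run.sh could take; ES_HEAP_SIZE is honored by the stock Elasticsearch start script and sets -Xms and -Xmx to the same value (the variable and path below are illustrative, not the actual script):

# Sketch: derive one heap value and export it before exec'ing Elasticsearch.
ES_HEAP_SIZE=${INSTANCE_RAM:-512m}
export ES_HEAP_SIZE
exec /usr/share/elasticsearch/bin/elasticsearch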

Improve ES volume documentation

It's become clear the official docs for dealing with ES volumes are inadequate. A discussion of how multiple instances attach separate volumes and the tradeoffs of hostmount vs NAS is in order.
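As a starting point, the docs could show attaching a dedicated claim to each ES deployment, along these lines (all names and the mount path are placeholders):

# Repeat per logging-es-* deployment config, one claim per instance.
oc volume dc/logging-es-abc123 --add --overwrite \
   --name=es-storage --type=persistentVolumeClaim \
   --claim-name=logging-es-pvc-1 --mount-path=/elasticsearch/persistent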

"error: couldn't read version from server: Get https://kubernetes.default.svc.cluster.local/api: dial tcp: lookup kubernetes.default.svc.cluster.local: no such host" in logging-deployer Pod

Met "error: couldn't read version from server: Get https://kubernetes.default.svc.cluster.local/api: dial tcp: lookup kubernetes.default.svc.cluster.local: no such host" in logging-deployer Pod.

Steps to Reproduce:

  1. Log into OSE env
  2. Create a project named "chunpj"
  3. Create the Deployer Secret
    oc secrets new logging-deployer nothing=/dev/null
  4. Create the Deployer ServiceAccount
    oc create -f - <<API
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: logging-deployer
    secrets:
    - name: logging-deployer
    API

    oc policy add-role-to-user edit system:serviceaccount:chunpj:logging-deployer
  5. Run the Deployer
    oc process -f https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/deployment/deployer.yaml -v IMAGE_PREFIX=<rcm-img-docker01_REGISTRY>/openshift3/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://<OSE_MASTER>:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 | oc create -f -
  6. Check the logging-deployer's logs; see the full logs in this gist: https://gist.github.com/chunyunchen/19122b09b62cc178af2d
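This usually points at cluster DNS rather than the deployer itself. Two hedged sanity checks from any running pod ("busybox" stands in for whatever debug pod is available):

oc exec busybox -- nslookup kubernetes.default.svc.cluster.local
oc exec busybox -- cat /etc/resolv.conf   # the SkyDNS address (e.g. 172.30.0.1) should be the first nameserver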

Fluentd will blindly use hostname from syslog

Currently Fluentd will just use the value of the third field of the syslog line to populate 'hostname', even if it is something like 'localhost'. We should probably replace this with the FQDN.

Handle kibana-proxy SSL/TLS termination in the router

As far as I can tell, there's no real reason to have SSL/TLS termination be handled by the Kibana auth proxy instead of in the OpenShift router. This is especially true since the router image is updated far more often than that of the Kibana auth proxy, which is running on Node.js v0.10.36 with an old version of OpenSSL.

Improve documentation surrounding restricting deployments of different EFK components

If we want to restrict where ES and Kibana are deployed, we would still want to ensure that Fluentd can be deployed to every node. We need to describe the means of doing this, such as leveraging the Default namespace and using https://docs.openshift.org/latest/admin_guide/pod_network.html#joining-project-networks when the multitenant SDN plugin is available.
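For the Fluentd-everywhere part, the logging-infra-fluentd=true nodeSelector used by the fluentd template already gives a handle; for example:

# Label every node that should run Fluentd so the template's nodeSelector matches it.
oc label node <node-name> logging-infra-fluentd=true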

We should also describe how to create these components in different namespaces after install.

fluentd container does not work with Project Atomic

This appears to be a known issue, but I thought it would be nice to have an open bug about it, because it's extremely hard to google for.

The Fluentd container generates SELinux exceptions on Project Atomic. Dan Walsh has written about how to create an SELinux context to allow Fluentd to run: http://www.projectatomic.io/blog/2016/03/selinux-and-docker-part-2/

However, it's not at all clear to me how to generate such a policy on Atomic (i.e. via openshift-ansible) and then make use of it in the fluentd template.

EFK doubt

Hi, I have deployed EFK in OpenShift 3.1. The docs mention: "Unfortunately there is no way to stream logs as they are created at this time."

How much of a time gap is there between log generation and display in Kibana, and why?

Clean up Fluentd Dockerfile

The image should no longer provide out_*.rb files and should not ADD them either; elasticsearch_dynamic is now available from fluent-plugin-elasticsearch releases.
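The plugin can come from the gem instead; the image build would need something along these lines (version pin is illustrative):

# out_elasticsearch_dynamic ships with the gem, so no local out_*.rb copies are needed.
gem install fluent-plugin-elasticsearch --version '~> 1.0'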

[RFE] Expand deployer to have upgrade option

  • Safely scale down components
  • Pull in updated images
  • Create missing or remove deprecated api objects (e.g.: secrets, dc, ds)
  • Maintain previous configurations (volumes and nodeSelectors)
  • Then scale back up
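In oc terms the bracket around an upgrade might look roughly like this (component list and replica counts are illustrative):

# Sketch only: scale down, upgrade, scale back up.
oc scale dc/logging-kibana --replicas=0
oc scale dc/logging-curator --replicas=0
# ... pull updated images, create missing / remove deprecated API objects,
# ... keeping volumes and nodeSelectors from the previous configuration ...
oc scale dc/logging-curator --replicas=1
oc scale dc/logging-kibana --replicas=1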

Incorrect year used for syslog based message operations index

When log files from the previous year are read in during the current year, the index the logs are first written to uses the current year.

E.g. logs from 12/27/2015 will be created in the index ".operations.2016.12.27" if they are first read in during 2016.
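The underlying problem is that classic syslog timestamps carry no year, so the reader has to assume one; GNU date shows the same behaviour:

$ date -d 'Dec 27 23:59:59' '+%Y.%m.%d'
2016.12.27    # run in 2016: the year is inferred from "now", not from the log file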

logging-fluentd connection refused to ElasticSearch

We are running into an issue following the documentation. We have basically followed the docs, apart from calling our project mbaas-logging instead of using the default.

The issue we are seeing is with the fluentd pod:

$ oc logs -f logging-fluentd-1-3925v
2016-01-25 08:02:35 -0500 [info]: reading config file path="/etc/fluent/fluent.conf"
2016-01-25 08:04:06 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-01-25 08:02:48 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es.mbaas-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})! Connection refused - connect(2) (Errno::ECONNREFUSED)" plugin_id="object:1421250"
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:61:in `rescue in client'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:58:in `client'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:184:in `rescue in send'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:182:in `send'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:174:in `block in write'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:173:in `each'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:173:in `write'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/buffer.rb:325:in `write_chunk'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/buffer.rb:304:in `pop'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/output.rb:321:in `try_flush'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/output.rb:140:in `run'
2016-01-25 08:05:39 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-01-25 08:02:50 -0500
error_class="Fluent::ElasticsearchOutput::ConnectionFailure" 
error="Can not reach Elasticsearch cluster 
({:host=>\"logging-es.mbaas-logging.svc.cluster.local\", 
:port=>9200, 
:scheme=>\"https\", 
:user=>\"fluentd\",  
:password=>\"obfuscated\"})! 
Connection refused - connect(2) (Errno::ECONNREFUSED)" plugin_id="object:1421250"
2016-01-25 08:05:39 -0500 [warn]: suppressed same stacktrace

The ES_HOST is defined in the run.sh:

es_host=logging-es.${project}.svc.cluster.local

If we inspect /etc/resolv.conf in the fluentd pod:

[root@logging-fluentd-8-9ukoc /]# cat /etc/resolv.conf
nameserver 172.30.0.1
nameserver 10.0.2.3
search mbaas-logging.svc.cluster.local svc.cluster.local cluster.local feedhenry.io
options ndots:5

The original host we were dealing with was logging-es.mbaas-logging.svc.cluster.local. That name has 4 dots, fewer than ndots:5, so the resolver first tries it with each of the search domains above appended before trying it as an absolute name.
The order would be:

  1. logging-es.mbaas-logging.svc.cluster.local.mbaas-logging.svc.cluster.local
  2. logging-es.mbaas-logging.svc.cluster.local.svc.cluster.local
  3. logging-es.mbaas-logging.svc.cluster.local.cluster.local
  4. logging-es.mbaas-logging.svc.cluster.local.feedhenry.io

None of the above would be resolved by any of the DNS servers.
That turned out not to be correct: the last entry in the above list could in fact be resolved by the second nameserver (10.0.2.3):

[vagrant@local ~]$ dig @10.0.2.3 logging-es.mbaas-logging.svc.cluster.local.feedhenry.io

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7 <<>> @10.0.2.3 logging-es.mbaas-logging.svc.cluster.local.feedhenry.io
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50317
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;logging-es.mbaas-logging.svc.cluster.local.feedhenry.io. IN A

;; ANSWER SECTION:
logging-es.mbaas-logging.svc.cluster.local.feedhenry.io. 6 IN A 192.168.33.12

;; Query time: 606 msec
;; SERVER: 10.0.2.3#53(10.0.2.3)
;; WHEN: Wed Jan 27 08:31:49 UTC 2016
;; MSG SIZE  rcvd: 100

We can in fact see this when we use curl:

curl -s -S -I -k --verbose https://logging-es.mbaas-logging.svc.cluster.local:9200
* About to connect() to logging-es.mbaas-logging.svc.cluster.local port 9200 (#0)
* Trying 192.168.33.12...
* Connection refused
* Failed connect to logging-es.mbaas-logging.svc.cluster.local:9200; Connection refused
* Closing connection 0
curl: (7) Failed connect to logging-es.mbaas-logging.svc.cluster.local:9200; Connection refused

Note that it is trying to connect to 192.168.33.12 and not the Kubernetes service cluster IP 172.30.177.137.

If we bypass DNS resolution we can in fact connect to the internal cluster IP:

$ curl -s -S -I -k -H "Host: logging-es.mbaas-logging.svc.cluster.local" --resolve logging-es.mbaas-logging.svc.cluster.local:9200:172.30.177.137 --verbose https://logging-es.mbaas-logging.svc.cluster.local:9200

* Added logging-es.mbaas-logging.svc.cluster.local:9200:172.30.177.137 to DNS cache
* About to connect() to logging-es.mbaas-logging.svc.cluster.local port 9200 (#0)
*   Trying 172.30.177.137...
* Connected to logging-es.mbaas-logging.svc.cluster.local (172.30.177.137) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* NSS error -12271 (SSL_ERROR_BAD_CERT_ALERT)
* SSL peer cannot verify your certificate.
* Closing connection 0
curl: (58) NSS: client certificate not found (nickname not specified)

If we simply use logging-es, the resolver appends .mbaas-logging.svc.cluster.local, which can be resolved:

[vagrant@local ~]$  kubectl exec busybox -- nslookup logging-es.mbaas-logging.svc.cluster.local
Server:    172.30.0.1
Address 1: 172.30.0.1 kubernetes.default.svc.cluster.local

Name:      logging-es.mbaas-logging.svc.cluster.local
Address 1: 172.30.177.137 logging-es.mbaas-logging.svc.cluster.local

One solution would be to not use the full service name (my-svc.my-namespace.svc.cluster.local). Since these pods all live in the same namespace/project, they can refer to other services in that namespace using the simple service name (without the namespace).

As mentioned earlier the value for ES_HOST is logging-es.mbaas-logging.svc.cluster.local in our case. This is then used when processing fluentd.yaml.

We are currently working around this issue by editing the logging-fluentd deployment configuration:

$ oc edit deploymentconfig logging-fluentd

and changing the ES_HOST value to logging-es (the default value in fluentd.yaml). This works for us and logs are showing up in Kibana.
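The same workaround can be applied without opening an editor, e.g.:

# Equivalent to the oc edit above: point fluentd at the short service name.
oc env dc/logging-fluentd ES_HOST=logging-es OPS_HOST=logging-es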

I'm trying to figure out whether this is something specific to our environment or whether it might also be an issue for others.

EFK: Error checking ACL when seeding

Issue:
After doing a basic installation of EFK logging on a lab cluster, the following errors come up in all of the Elasticsearch pods:

[2016-05-23 15:26:50,982][ERROR][io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter] [Stained Glass Scarlet] Error checking ACL when seeding
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
...

For further details see also:
https://gist.github.com/rflorenc/18644314624ff0876b9f62a8e30ac25c

Version info:
openshift v3.2.0.44
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

oc get pods -o wide
NAME READY STATUS RESTARTS AGE NODE
logging-deployer-0ihvw 0/1 Completed 0 2d 192.1.12.26
logging-deployer-8vlkb 0/1 Completed 0 2d 192.1.12.26
logging-deployer-98bpu 0/1 Completed 0 2d 192.1.12.26
logging-deployer-dwyno 0/1 Completed 0 2d 192.1.12.26
logging-deployer-jd6so 0/1 Completed 0 2d 192.1.12.26
logging-deployer-rpri1 0/1 Completed 0 2d 192.1.12.26
logging-deployer-xtgde 0/1 Completed 0 2d 192.1.12.26
logging-es-0egyxsqv-1-7yo6f 1/1 Running 0 2d 192.1.13.21
logging-es-0tkimh6h-1-52ejb 1/1 Running 0 2d 192.1.13.44
logging-es-1jnuwamn-1-2qgsu 1/1 Running 0 2d 192.1.12.91
logging-es-1wutikij-1-q63ed 1/1 Running 0 2d 192.1.13.30
logging-es-6mdp2lg4-1-qaqda 1/1 Running 0 2d 192.1.13.35
logging-es-6qvmrgr3-1-66jhn 1/1 Running 0 2d 192.1.12.80
logging-es-agd26le3-1-tc59s 1/1 Running 0 2d 192.1.13.16
logging-es-blvicocl-1-jygw6 1/1 Running 0 2d 192.1.13.26
logging-es-crg80ae2-1-cxzgd 1/1 Running 0 2d 192.1.13.33
logging-es-cu4hcxij-1-st9og 1/1 Running 0 2d 192.1.12.81
logging-es-ewaf2h33-1-snp2v 1/1 Running 0 2d 192.1.12.86
logging-es-ki533gnt-1-5hx7x 1/1 Running 0 2d 192.1.13.1
logging-es-m62n8qre-1-comug 1/1 Running 0 2d 192.1.13.14
logging-es-qb6hswff-1-5cosr 1/1 Running 0 2d 192.1.13.4
logging-es-qdbdftok-1-v2foz 1/1 Running 0 2d 192.1.13.51
logging-es-shm6lj2r-1-wxs1f 1/1 Running 0 2d 192.1.13.28
logging-es-sq0arlq1-1-flv71 1/1 Running 0 2d 192.1.12.96
logging-es-uvcyndsx-1-6njbd 1/1 Running 0 2d 192.1.13.17
logging-es-ve4k3a1i-1-nhuot 1/1 Running 0 2d 192.1.13.20
logging-es-y4fts5fj-1-7n9zw 1/1 Running 0 2d 192.1.12.78
logging-kibana-1-5915n 2/2 Running 0 2d 192.1.12.26
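A quick way to confirm whether the cluster ever recovers is to query the health endpoint from inside one of the ES pods, using the admin cert/key/CA from the logging-elasticsearch secret (the pod name comes from the listing above; the mount path is an assumption):

oc exec logging-es-0egyxsqv-1-7yo6f -- \
  curl -s --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          'https://localhost:9200/_cluster/health?pretty'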

Add support for image pull secrets

The deployer supports arbitrary prefixes for the logging component images. It would be nice if you could specify an image pull secret, in order to allow them to be hosted on a locked-down external Docker registry.
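A sketch of how that could plug into the existing secret handling (registry and credentials are placeholders):

oc secrets new-dockercfg logging-pull-secret \
   --docker-server=registry.example.com --docker-username=me \
   --docker-password=secret --docker-email=me@example.com
oc secrets add serviceaccount/aggregated-logging-fluentd logging-pull-secret --for=pull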

Support standalone external ElasticSearch deployment

Since #60, it is possible to have fluentd send logs to an external ElasticSearch instance in addition to the embedded one. It would be nice to be able to send logs to an external ElasticSearch instance exclusively, without even running ElasticSearch (or its associated Kibana and curator) inside OpenShift.
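The fluentd template already exposes the parameters this would need, so an external-only setup is largely a matter of pointing them at the remote cluster (the host and CA path below are placeholders):

oc process logging-fluentd-template \
   -v ES_HOST=es.example.com,ES_PORT=9200,ES_CA=/etc/fluent/keys/external-ca \
   | oc create -f -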

[exploratory] client-node based optimizations for finer grained scaling tasks

Problem

Scaling Elasticsearch today is coarse-grained: you create more servers. However, there are many different types of scaling - write throughput, capacity, etc. So when we need to scale just one aspect (e.g. write throughput), we may have to take up extra resources on the cluster (storage) if we only have one way to scale Elasticsearch.

Solution

To prevent wasting resources, we can implement finer-grained scaling of the different Elasticsearch node roles (writers, readers, clients).

Details

@portante has suggested we leverage ES client nodes to decouple scaling requirements, cc @jeremyeder @timothysc @rflorenc

The Elasticsearch "client node" is a special node that is really just there to do smart, ES-aware load balancing: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

This diagram summarizes the architecture change we might implement if we use ES client nodes.

This is just a first pass but the main idea is:

  • separate writers from readers
  • have the Kubernetes service abstraction connect to "client nodes"

The benefits:

Writers never have to field read requests, clients minimize the number of "hops" they make when connecting to a shard, and you decouple scaling of logging capacity (bottlenecked by servers) from scaling of other operations (reads/writes/lookups).
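For reference, a "client node" in this Elasticsearch version is simply one with both the master and data roles disabled; roughly (config path is illustrative):

# Sketch: turn an ES instance into a pure client/coordinating node.
cat >> /usr/share/elasticsearch/config/elasticsearch.yml <<'EOF'
node.master: false   # never eligible to be elected master
node.data: false     # holds no shards; only routes and aggregates requests
EOF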
