
origin-aggregated-logging's Introduction

OpenShift Logging

This repo primarily contains the image definitions for the logstore components of the OpenShift Logging stack for releases 4.x and later. These component images, collectively abbreviated as the "EFK" stack, include Elasticsearch, Fluentd, and Kibana. Please refer to the cluster-logging-operator and elasticsearch-operator for information regarding the operators which deploy these images.

The primary features this integration provides:

  • Multitenant support to isolate logs from various project namespaces
  • OpenShift OAuth2 integration
  • Log Forwarding
  • Historical log discovery and visualization
  • Log aggregation of pod and node logs

Information on building the images from GitHub source using an OKD deployment is found here. See the quickstart guide to deploy cluster logging.

Please check the release notes for deprecated features or breaking changes.

Components

The cluster logging subsystem consists of multiple components commonly abbreviated as the "ELK" stack (though modified here to be the "EFK" stack).

Elasticsearch

Elasticsearch is a Lucene-based indexing object store into which logs are fed. Logs for node services and all containers in the cluster are fed into one deployed cluster. The Elasticsearch cluster should be deployed with redundancy and persistent storage for scale and high availability.

Fluentd

Fluentd is responsible for gathering log entries from nodes, enriching them with metadata, and forwarding them to the default logstore or other destinations defined by administrators. The content for this component has moved to https://github.com/viaq/logging-fluentd
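
For the 4.x releases, forwarding to destinations other than the default logstore is configured through the cluster-logging-operator rather than by editing Fluentd directly. The sketch below shows a minimal ClusterLogForwarder resource, assuming the structure documented for the operator; the output name, the remote URL, and the pipeline name are placeholders:

oc apply -f - <<EOF
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: remote-es                              # placeholder output name
    type: elasticsearch
    url: https://elasticsearch.example.com:9200  # placeholder endpoint
  pipelines:
  - name: forward-application-logs
    inputRefs:
    - application
    outputRefs:
    - remote-es
    - default                                    # keep sending to the in-cluster logstore too
EOF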

Kibana

Kibana presents a web UI for browsing and visualizing logs in Elasticsearch.

Cluster Logging Operator

The cluster-logging-operator orchestrates the deployment of the cluster logging stack, including resource definitions, key/cert generation, and component start and stop order.
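
In practice the operator is driven by a ClusterLogging custom resource. A minimal sketch, assuming the resource layout documented for the operator (node count, storage class, size, and redundancy policy are illustrative values, not recommendations):

oc apply -f - <<EOF
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3                  # illustrative sizing
      storage:
        storageClassName: gp2       # illustrative storage class
        size: 200G
      redundancyPolicy: SingleRedundancy
  visualization:
    type: kibana
    kibana:
      replicas: 1
  collection:
    logs:
      type: fluentd
      fluentd: {}
EOF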

Issues

Any issues can be filed at Red Hat JIRA. Please include as many details as possible to assist in issue resolution, along with an attached must-gather output.
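
A hedged sketch of collecting that output (the namespace and deployment name assume a default 4.x install):

# Run must-gather with the cluster-logging-operator's own image to collect logging diagnostics
oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}')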

Contributions

To contribute to the development of origin-aggregated-logging, see REVIEW.md

origin-aggregated-logging's People

Contributors

btaani, danmcp, elyscape, ewolinetz, ibotty, jcantrill, linzhaoming, lukas-vlcek, nhosoi, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, pavolloffay, periklis, pmoogi-redhat, portante, red-gv, richm, ruromero, smarterclayton, sosiouxme, sqtran, stevekuznetsov, sureshgaikwad, syedriko, vimalk78, vladmasarik, vparfonov, wshearn, yselkowitz


origin-aggregated-logging's Issues

Error accessing _all index using Kibana user

This might be an issue in the Search-Guard configuration, but I still think it is worth recording.

When I set up the origin-aggregated-logging stack (see below if details are needed about how I did it), I spotted a difference between the following two calls. In the first call we use the _all index placeholder (or it can be omitted entirely) and it gets rejected by SG, while in the second call we explicitly use * and it is OK.

$ sudo curl -s -k --cert ./cert --key ./key https://localhost:9200/_cat/count
# or
$ sudo curl -s -k --cert ./cert --key ./key https://localhost:9200/_cat/count/_all
{
  "error" : "RuntimeException[java.lang.RuntimeException: Attempt from null to _all indices for indices:data/read/count and User [name=system.logging.kibana, roles=[]]]; nested: RuntimeException[Attempt from null to _all indices for indices:data/read/count and User [name=system.logging.kibana, roles=[]]]; ",
  "status" : 500
}

vs.

$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/_cat/count/*?v'
epoch      timestamp count 
1455109195 12:59:55  6797

Another (probably simpler) example of this issue is the following use case:

$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/_search?size=0'
# or
$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/_all/_search?size=0'
{
  "error" : "RuntimeException[java.lang.RuntimeException: Attempt from null to _all indices for indices:data/read/search and User [name=system.logging.kibana, roles=[]]]; nested: RuntimeException[Attempt from null to _all indices for indices:data/read/search and User [name=system.logging.kibana, roles=[]]]; ",
  "status" : 500
}

vs.

$ sudo curl -s -k --cert ./cert --key ./key 'https://localhost:9200/*/_search?size=0'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : {
    "total" : 7374,
    "max_score" : 0.0,
    "hits" : []
  }
}

Internally, these probably result in different calls; functionally, however, they should be equivalent. The question is: should we care that use of the _all index is rejected while * is OK? Maybe we need to review the ACL rules a bit more?

This might be related to #31?

Stack setup

I started OpenShift using Vagrant from the origin repo (I checked out the v1.1.1 tag before building it). After I SSH-ed into the openshiftdev machine, I used @richm's script to build and set up the whole stack. This means I end up using the kibana user certificates when running the above curl commands.

Deploying the EFK Stack FAILS with message: "error: error processing template logging/logging-es-template: [unable to parse quantity's suffix]"

Following the steps from: https://docs.openshift.org/latest/install_config/aggregate_logging.html#deploying-the-efk-stack

All preparation steps were completed; keys were generated using the following commands:

oadm ca create-server-cert --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --hostnames=kibana.oc3.videonext.net --cert=kibana.crt --key=kibana.key
oc secrets new logging-deployer kibana.crt=kibana.crt kibana.key=kibana.key

Deployment was started using the following command:

oc new-app logging-deployer-template \
--param KIBANA_HOSTNAME=kibana.oc3.videonext.net \
--param ES_CLUSTER_SIZE=1 \
--param PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 \
--param ES_INSTANCE_RAM=1Gi

Result:

[root@o3-master logging]# oc get pod/logging-deployer-67zp4 -w
NAME                     READY     STATUS              RESTARTS   AGE
logging-deployer-67zp4   0/1       ContainerCreating   0          1m
logging-deployer-67zp4   1/1       Running   0         1m
logging-deployer-67zp4   0/1       Error     0         1m

Here is full log:

[root@o3-master ~]# oc logs logging-deployer-67zp4
+ project=logging
+ mode=install
+ dir=/etc/deploy
+ secret_dir=/secret
+ master_url=https://kubernetes.default.svc.cluster.local
+ master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
+ '[' -n 1 ']'
+ oc config set-cluster master --api-version=v1 --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --server=https://kubernetes.default.svc.cluster.local
cluster "master" set.
++ cat /var/run/secrets/kubernetes.io/serviceaccount/token
+ oc config set-credentials account --token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJsb2dnaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImxvZ2dpbmctZGVwbG95ZXItdG9rZW4tcWE0bW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibG9nZ2luZy1kZXBsb3llciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjYzNzg5YzQ0LTIyYjEtMTFlNi1iZDM3LTUyNTQwMGQzNmE1YyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpsb2dnaW5nOmxvZ2dpbmctZGVwbG95ZXIifQ.b6rDEjuWXL3tFqfuoKbx4fKIvkeY9Y6R_utRHkwt0ZkWeoClSXquvDYUEZj6ngAIbz7XV3bs0lFm6my-l5_S4X12m84j4Ht-jdeo7n7wqUx2nS3cBSh8EISrueD0uVZZFABZt_xZiThLiHnBxAEN6OclxQ70Ehb96jgoQ4m4brmtlcsTNLogOK9pVGQ3ESfIKHSj0gvkDu3u97fDTLP5ibdstCxBUyhfdhEQRkMy0PZMuKv_giuDKASExWf-2qy-PcbTXTi6IM64Ccn0UHsIoz7_h-1kdxPufij4cIzN8el0BC_ZdnrShVnFOT125OpPo5qEf_WnFstWzOyROj9s6w
user "account" set.
+ oc config set-context current --cluster=master --user=account --namespace=logging
context "current" set.
+ oc config use-context current
switched to context "current".
+ for file in 'scripts/*.sh'
+ source scripts/install.sh
++ set -ex
+ for file in 'scripts/*.sh'
+ source scripts/upgrade.sh
++ set -ex
++ TIMES=300
++ fluentd_nodeselector=logging-infra-fluentd=true
+ for file in 'scripts/*.sh'
+ source scripts/util.sh
+ for file in 'scripts/*.sh'
+ source scripts/uuid_migrate.sh
+ case "${mode}" in
+ install_logging
+ initialize_install_vars
+ image_prefix=docker.io/openshift/origin-
+ image_version=latest
+ insecure_registry=false
+ hostname=kibana.oc3.videonext.net
+ ops_hostname=kibana-ops.example.com
+ public_master_url=https://o3-master.videonext.net:8443
+ es_instance_ram=1Gi
+ es_pvc_size=
+ es_pvc_prefix=logging-es-
+ es_cluster_size=1
+ es_node_quorum=1
+ es_recover_after_nodes=0
+ es_recover_expected_nodes=1
+ es_recover_after_time=5m
+ es_ops_instance_ram=8G
+ es_ops_pvc_size=
+ es_ops_pvc_prefix=logging-es-ops-
+ es_ops_cluster_size=1
+ es_ops_node_quorum=1
+ es_ops_recover_after_nodes=0
+ es_ops_recover_expected_nodes=1
+ es_ops_recover_after_time=5m
+ image_params=IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ generate_secrets
+ '[' '' '!=' true ']'
+ generate_signer_cert_and_conf
+ rm -rf /etc/deploy
rm: cannot remove '/etc/deploy': Permission denied
+ :
+ mkdir -p /secret
+ chmod 700 /secret
chmod: changing permissions of '/secret': Read-only file system
+ :
+ '[' -s /secret/ca.key ']'
++ date +%Y%m%d%H%M%S
+ openshift admin ca create-signer-cert --key=/etc/deploy/ca.key --cert=/etc/deploy/ca.crt --serial=/etc/deploy/ca.serial.txt --name=logging-signer-20160525195414
+ echo Generating signing configuration file
+ cat - conf/signing.conf
Generating signing configuration file
+ procure_server_cert kibana
+ local file=kibana hostnames=
+ '[' -s /secret/kibana.crt ']'
+ cp /secret/kibana.key /etc/deploy/kibana.key
+ cp /secret/kibana.crt /etc/deploy/kibana.crt
+ procure_server_cert kibana-ops
+ local file=kibana-ops hostnames=
+ '[' -s /secret/kibana-ops.crt ']'
+ '[' -n '' ']'
+ procure_server_cert kibana-internal kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
+ local file=kibana-internal hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
+ '[' -s /secret/kibana-internal.crt ']'
+ '[' -n kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com ']'
+ openshift admin ca create-server-cert --key=/etc/deploy/kibana-internal.key --cert=/etc/deploy/kibana-internal.crt --hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com --signer-cert=/etc/deploy/ca.crt --signer-key=/etc/deploy/ca.key --signer-serial=/etc/deploy/ca.serial.txt
+ '[' -s /secret/server-tls.json ']'
+ cp conf/server-tls.json /etc/deploy
+ cat /dev/null
+ cat /dev/null
+ fluentd_user=system.logging.fluentd
+ kibana_user=system.logging.kibana
+ curator_user=system.logging.curator
+ admin_user=system.admin
+ generate_PEM_cert system.logging.fluentd
+ NODE_NAME=system.logging.fluentd
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.logging.fluentd
Generating keystore and certificate for node system.logging.fluentd
+ openssl req -out /etc/deploy/system.logging.fluentd.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.fluentd.key -subj /CN=system.logging.fluentd/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
...........................................................................................+++
................................................................................................+++
writing new private key to '/etc/deploy/system.logging.fluentd.key'
-----
+ echo Sign certificate request with CA
Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.logging.fluentd.csr -notext -out /etc/deploy/system.logging.fluentd.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 2 (0x2)
        Validity
            Not Before: May 25 19:54:18 2016 GMT
            Not After : May 25 19:54:18 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.logging.fluentd
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                E6:16:3D:33:C7:95:FE:F8:2C:66:B6:15:FD:FB:D8:35:DB:E7:7F:7B
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:18 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ generate_PEM_cert system.logging.kibana
+ NODE_NAME=system.logging.kibana
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.logging.kibana
Generating keystore and certificate for node system.logging.kibana
+ openssl req -out /etc/deploy/system.logging.kibana.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.kibana.key -subj /CN=system.logging.kibana/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
...................................................+++
.......................................+++
writing new private key to '/etc/deploy/system.logging.kibana.key'
-----
+ echo Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.logging.kibana.csr -notext -out /etc/deploy/system.logging.kibana.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Sign certificate request with CA
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 3 (0x3)
        Validity
            Not Before: May 25 19:54:18 2016 GMT
            Not After : May 25 19:54:18 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.logging.kibana
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                DE:32:BA:34:4F:D5:34:7C:DA:F4:F2:1B:4C:76:28:E0:D5:46:88:96
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:18 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ generate_PEM_cert system.logging.curator
+ NODE_NAME=system.logging.curator
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.logging.curator
Generating keystore and certificate for node system.logging.curator
+ openssl req -out /etc/deploy/system.logging.curator.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.curator.key -subj /CN=system.logging.curator/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
...............+++
..................................+++
writing new private key to '/etc/deploy/system.logging.curator.key'
-----
+ echo Sign certificate request with CA
Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.logging.curator.csr -notext -out /etc/deploy/system.logging.curator.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 4 (0x4)
        Validity
            Not Before: May 25 19:54:19 2016 GMT
            Not After : May 25 19:54:19 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.logging.curator
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                D5:31:7F:3F:70:BB:60:E1:F8:C2:6D:7B:F1:6C:04:F9:0D:35:D6:F7
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:19 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ generate_PEM_cert system.admin
+ NODE_NAME=system.admin
+ dir=/etc/deploy
+ echo Generating keystore and certificate for node system.admin
Generating keystore and certificate for node system.admin
+ openssl req -out /etc/deploy/system.admin.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.admin.key -subj /CN=system.admin/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
Generating a 2048 bit RSA private key
....+++
..................................+++
writing new private key to '/etc/deploy/system.admin.key'
-----
+ echo Sign certificate request with CA
+ openssl ca -in /etc/deploy/system.admin.csr -notext -out /etc/deploy/system.admin.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Sign certificate request with CA
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 5 (0x5)
        Validity
            Not Before: May 25 19:54:19 2016 GMT
            Not After : May 25 19:54:19 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Logging
            organizationalUnitName    = OpenShift
            commonName                = system.admin
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                96:D4:5D:8D:EA:35:50:D5:8F:15:95:11:06:FF:3E:6E:F0:F2:1F:94
            X509v3 Authority Key Identifier:
                0.
Certificate is to be certified until May 25 19:54:19 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
++ join , logging-es logging-es.logging.svc.cluster.local logging-es-cluster logging-es-cluster.logging.svc.cluster.local logging-es-ops logging-es-ops.logging.svc.cluster.local logging-es-ops-cluster logging-es-ops-cluster.logging.svc.cluster.local
++ local IFS=,
++ shift
++ echo logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
+ generate_JKS_chain logging-es logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
+ dir=/etc/deploy
+ NODE_NAME=logging-es
+ CERT_NAMES=logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
+ ks_pass=kspass
+ ts_pass=tspass
+ rm -rf logging-es
+ extension_names=
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster
+ for name in '${CERT_NAMES//,/ }'
+ extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
+ echo Generating keystore and certificate for node logging-es
Generating keystore and certificate for node logging-es
+ /bin/keytool -genkey -alias logging-es -keystore /etc/deploy/keystore.jks -keypass kspass -storepass kspass -keyalg RSA -keysize 2048 -validity 712 -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
Generating certificate signing request for node logging-es
+ echo Generating certificate signing request for node logging-es
+ /bin/keytool -certreq -alias logging-es -keystore /etc/deploy/keystore.jks -storepass kspass -file /etc/deploy/logging-es.csr -keyalg rsa -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
+ echo Sign certificate request with CA
+ openssl ca -in /etc/deploy/logging-es.csr -notext -out /etc/deploy/logging-es.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
Sign certificate request with CA
Using configuration from /etc/deploy/signing.conf
Check that the request matches the signature
Signature ok
Certificate Details:
        Serial Number: 6 (0x6)
        Validity
            Not Before: May 25 19:54:23 2016 GMT
            Not After : May 25 19:54:23 2018 GMT
        Subject:
            countryName               = DE
            localityName              = Test
            organizationName          = Test
            organizationalUnitName    = SSL
            commonName                = logging-es
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                BA:18:A4:E2:C1:7E:5F:CF:47:D1:27:E6:EB:F0:8F:76:41:02:CE:BC
            X509v3 Authority Key Identifier:
                0.
            X509v3 Subject Alternative Name:
                DNS:localhost, IP Address:127.0.0.1, DNS:logging-es, DNS:logging-es.logging.svc.cluster.local, DNS:logging-es-cluster, DNS:logging-es-cluster.logging.svc.cluster.local, DNS:logging-es-ops, DNS:logging-es-ops.logging.svc.cluster.local, DNS:logging-es-ops-cluster, DNS:logging-es-ops-cluster.logging.svc.cluster.local
Certificate is to be certified until May 25 19:54:23 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
+ echo 'Import back to keystore (including CA chain)'
Import back to keystore (including CA chain)
+ /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias sig-ca
Certificate was added to keystore
+ /bin/keytool -import -file /etc/deploy/logging-es.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias logging-es
Certificate reply was installed in keystore
+ echo 'Import CA to truststore for validating client certs'
+ /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/truststore.jks -storepass tspass -noprompt -alias sig-ca
Import CA to truststore for validating client certs
Certificate was added to keystore
+ echo All done for logging-es
All done for logging-es
+ openssl rand 16
+ openssl enc -aes-128-cbc -nosalt -out /etc/deploy/searchguard_node_key.key -pass pass:pass
+ cat /dev/urandom
+ tr -dc a-zA-Z0-9
+ fold -w 200
+ head -n 1
+ cat /dev/urandom
+ tr -dc a-zA-Z0-9
+ fold -w 64
+ head -n 1
Deleting secrets
+ echo 'Deleting secrets'
+ oc delete secret logging-fluentd logging-elasticsearch logging-kibana logging-kibana-proxy logging-kibana-ops-proxy logging-curator logging-curator-ops
Error from server: secrets "logging-fluentd" not found
Error from server: secrets "logging-elasticsearch" not found
Error from server: secrets "logging-kibana" not found
Error from server: secrets "logging-kibana-proxy" not found
Error from server: secrets "logging-kibana-ops-proxy" not found
Error from server: secrets "logging-curator" not found
Error from server: secrets "logging-curator-ops" not found
+ :
+ echo 'Creating secrets'
+ oc secrets new logging-elasticsearch key=/etc/deploy/keystore.jks truststore=/etc/deploy/truststore.jks searchguard.key=/etc/deploy/searchguard_node_key.key admin-key=/etc/deploy/system.admin.key admin-cert=/etc/deploy/system.admin.crt admin-ca=/etc/deploy/ca.crt
Creating secrets
secret/logging-elasticsearch
+ oc secrets new logging-kibana ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.kibana.key cert=/etc/deploy/system.logging.kibana.crt
secret/logging-kibana
+ oc secrets new logging-kibana-proxy oauth-secret=/etc/deploy/oauth-secret session-secret=/etc/deploy/session-secret server-key=/etc/deploy/kibana-internal.key server-cert=/etc/deploy/kibana-internal.crt server-tls.json=/etc/deploy/server-tls.json
secret/logging-kibana-proxy
+ oc secrets new logging-fluentd ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.fluentd.key cert=/etc/deploy/system.logging.fluentd.crt
secret/logging-fluentd
+ oc secrets new logging-curator ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
secret/logging-curator
+ oc secrets new logging-curator-ops ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
secret/logging-curator-ops
+ echo 'Attaching secrets to service accounts'
+ oc secrets add serviceaccount/aggregated-logging-kibana logging-kibana logging-kibana-proxy
Attaching secrets to service accounts
+ oc secrets add serviceaccount/aggregated-logging-elasticsearch logging-elasticsearch
+ oc secrets add serviceaccount/aggregated-logging-fluentd logging-fluentd
+ oc secrets add serviceaccount/aggregated-logging-curator logging-curator
+ '[' -n '' ']'
+ generate_support_objects
++ cat /etc/deploy/oauth-secret
+ oc new-app -f templates/support.yaml --param OAUTH_SECRET=sZl5tjf5SOdvGwZ9RAlfCzPdvzplkMVOREr0KrYlMgLvjBPSw1GwjfGg5rrMJPWb --param KIBANA_HOSTNAME=kibana.oc3.videonext.net --param KIBANA_OPS_HOSTNAME=kibana-ops.example.com --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin- --param INSECURE_REGISTRY=false
--> Deploying template logging-support-template-maker for "templates/support.yaml"
     With parameters:
      OAUTH_SECRET=sZl5tjf5SOdvGwZ9RAlfCzPdvzplkMVOREr0KrYlMgLvjBPSw1GwjfGg5rrMJPWb
      KIBANA_HOSTNAME=kibana.oc3.videonext.net
      KIBANA_OPS_HOSTNAME=kibana-ops.example.com
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
      INSECURE_REGISTRY=false
--> Creating resources ...
    template "logging-support-template" created
    template "logging-imagestream-template" created
    template "logging-pvc-template" created
--> Success
    Run 'oc status' to view your app.
+ oc new-app logging-support-template
--> Deploying template logging-support-template for "logging-support-template"
--> Creating resources ...
    service "logging-es" created
    service "logging-es-cluster" created
    service "logging-es-ops" created
    service "logging-es-ops-cluster" created
    service "logging-kibana" created
    service "logging-kibana-ops" created
    oauthclient "kibana-proxy" created
--> Success
    Run 'oc status' to view your app.
+ kibana_keys=
+ '[' -e /etc/deploy/kibana.crt ']'
+ kibana_keys='--cert=/etc/deploy/kibana.crt --key=/etc/deploy/kibana.key'
+ oc create route reencrypt --service=logging-kibana --hostname=kibana.oc3.videonext.net --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt --cert=/etc/deploy/kibana.crt --key=/etc/deploy/kibana.key
route "logging-kibana" created
+ kibana_keys=
+ '[' -e /etc/deploy/kibana-ops.crt ']'
+ oc create route reencrypt --service=logging-kibana-ops --hostname=kibana-ops.example.com --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt
route "logging-kibana-ops" created
+ generate_templates
+ echo '(Re-)Creating templates'
+ generate_es_template
+ create_template_optional_nodeselector '' es --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1Gi --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=
+ shift
+ local template=es
+ shift
+ cp templates/es.yaml /etc/deploy/es.yaml
(Re-)Creating templates
+ [[ -n '' ]]
+ oc new-app -f /etc/deploy/es.yaml --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1Gi --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-elasticsearch-template-maker for "/etc/deploy/es.yaml"
     With parameters:
      ES_CLUSTER_NAME=es
      ES_INSTANCE_RAM=1Gi
      ES_NODE_QUORUM=1
      ES_RECOVER_AFTER_NODES=0
      ES_RECOVER_EXPECTED_NODES=1
      ES_RECOVER_AFTER_TIME=5m
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
      IMAGE_VERSION_DEFAULT=latest
--> Creating resources ...
    template "logging-es-template" created
--> Success
    Run 'oc status' to view your app.
+ '[' false == true ']'
+ generate_kibana_template
+ create_template_optional_nodeselector '' kibana --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=
+ shift
+ local template=kibana
+ shift
+ cp templates/kibana.yaml /etc/deploy/kibana.yaml
+ [[ -n '' ]]
+ oc new-app -f /etc/deploy/kibana.yaml --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-kibana-template-maker for "/etc/deploy/kibana.yaml"
     With parameters:
      KIBANA_DEPLOY_NAME=kibana
      OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local
      OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443
      ES_HOST=logging-es
      ES_PORT=9200
      OAP_DEBUG=false
      IMAGE_VERSION_DEFAULT=latest
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Creating resources ...
    template "logging-kibana-template" created
--> Success
    Run 'oc status' to view your app.
+ '[' false == true ']'
+ generate_curator_template
+ create_template_optional_nodeselector '' curator --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=
+ shift
+ local template=curator
+ shift
+ cp templates/curator.yaml /etc/deploy/curator.yaml
+ [[ -n '' ]]
+ oc new-app -f /etc/deploy/curator.yaml --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-curator-template-maker for "/etc/deploy/curator.yaml"
     With parameters:
      CURATOR_DEPLOY_NAME=curator
      MASTER_URL=https://kubernetes.default.svc.cluster.local
      ES_HOST=logging-es
      ES_PORT=9200
      ES_CLIENT_CERT=/etc/curator/keys/cert
      ES_CLIENT_KEY=/etc/curator/keys/key
      ES_CA=/etc/curator/keys/ca
      CURATOR_DEFAULT_DAYS=30
      CURATOR_CONF_LOCATION=/etc/curator
      CURATOR_RUN_HOUR=0
      CURATOR_RUN_MINUTE=0
      IMAGE_VERSION_DEFAULT=latest
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Creating resources ...
    template "logging-curator-template" created
--> Success
    Run 'oc status' to view your app.
+ '[' false == true ']'
+ generate_fluentd_template
+ es_host=logging-es
+ es_ops_host=logging-es
+ '[' false == true ']'
+ create_template_optional_nodeselector logging-infra-fluentd=true fluentd --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
+ local nodeselector=logging-infra-fluentd=true
+ shift
+ local template=fluentd
+ shift
+ cp templates/fluentd.yaml /etc/deploy/fluentd.yaml
+ [[ -n logging-infra-fluentd=true ]]
++ extract_nodeselector logging-infra-fluentd=true
++ local inputstring=logging-infra-fluentd=true
++ selectors=()
++ local selectors
++ for keyvalstr in '${inputstring//\,/ }'
++ keyval=(${keyvalstr//=/ })
++ [[ -n logging-infra-fluentd ]]
++ [[ -n true ]]
++ selectors+=("\"${keyval[0]}\": \"${keyval[1]}\"")
++ [[ 1 -gt 0 ]]
+++ join , '"logging-infra-fluentd": "true"'
+++ local IFS=,
+++ shift
+++ echo '"logging-infra-fluentd": "true"'
++ echo nodeSelector: '{' '"logging-infra-fluentd":' '"true"' '}'
+ sed '/serviceAccountName/ i\          nodeSelector: { "logging-infra-fluentd": "true" }' templates/fluentd.yaml
+ oc new-app -f /etc/deploy/fluentd.yaml --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest,IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
--> Deploying template logging-fluentd-template-maker for "/etc/deploy/fluentd.yaml"
     With parameters:
      MASTER_URL=https://kubernetes.default.svc.cluster.local
      ES_HOST=logging-es
      ES_PORT=9200
      ES_CLIENT_CERT=/etc/fluent/keys/cert
      ES_CLIENT_KEY=/etc/fluent/keys/key
      ES_CA=/etc/fluent/keys/ca
      OPS_HOST=logging-es
      OPS_PORT=9200
      OPS_CLIENT_CERT=/etc/fluent/keys/cert
      OPS_CLIENT_KEY=/etc/fluent/keys/key
      OPS_CA=/etc/fluent/keys/ca
      ES_COPY=false
      ES_COPY_HOST=
      ES_COPY_PORT=
      ES_COPY_SCHEME=https
      ES_COPY_CLIENT_CERT=
      ES_COPY_CLIENT_KEY=
      ES_COPY_CA=
      ES_COPY_USERNAME=
      ES_COPY_PASSWORD=
      OPS_COPY_HOST=
      OPS_COPY_PORT=
      OPS_COPY_SCHEME=https
      OPS_COPY_CLIENT_CERT=
      OPS_COPY_CLIENT_KEY=
      OPS_COPY_CA=
      OPS_COPY_USERNAME=
      OPS_COPY_PASSWORD=
      IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
      IMAGE_VERSION_DEFAULT=latest
--> Creating resources ...
    template "logging-fluentd-template" created
--> Success
    Run 'oc status' to view your app.
+ generate_objects
+ echo '(Re-)Creating deployed objects'
+ oc new-app logging-imagestream-template
(Re-)Creating deployed objects
--> Deploying template logging-imagestream-template for "logging-imagestream-template"
     With parameters:
      IMAGE_PREFIX=docker.io/openshift/origin-
--> Creating resources ...
    imagestream "logging-auth-proxy" created
    imagestream "logging-elasticsearch" created
    imagestream "logging-fluentd" created
    imagestream "logging-kibana" created
    imagestream "logging-curator" created
--> Success
    Run 'oc status' to view your app.
+ generate_es
+ pvcs=()
+ declare -A pvcs
++ oc get persistentvolumeclaim '--template={{range .items}}{{.metadata.name}} {{end}}'
+ (( n=1 ))
+ (( n<=1 ))
+ pvc=logging-es-1
+ '[' '' '!=' 1 -a '' '!=' '' ']'
+ '[' '' = 1 ']'
+ oc new-app logging-es-template
error: error processing template logging/logging-es-template: [unable to parse quantity's suffix]
[root@o3-master ~]#

need docs on using "custom" ssl certificates for fluentd, elasticsearch, etc

Some people will want to use their own SSL certs for the "internal" communication between fluentd and Elasticsearch.

Currently the docs clearly illustrate how to do this for Kibana itself, but not for the internal communications. Talking with @sosiouxme on the phone, we determined this is "as easy as" replacing the secrets after they are generated, but this needs to be documented somehow.
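
Until that is documented, a hedged sketch of the secret-replacement approach (the secret and key names match the deployer output further down this page; the certificate paths and the fluentd pod label are assumptions):

# Replace the generated fluentd client certificate with a custom one, then restart the collectors
oc delete secret logging-fluentd
oc secrets new logging-fluentd \
  ca=/path/to/my-ca.crt \
  key=/path/to/my-fluentd.key \
  cert=/path/to/my-fluentd.crt
oc delete pods -l component=fluentd   # assumed label; pods are recreated and pick up the new secret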

ImagePullBackOff while deploying EFK

Hi, I am following the README to deploy EFK on Origin 1.1.0.1, but it always fails with an ImagePullBackOff status on the pods. In the pod events I could see the following:

[screenshot of the pod events omitted]

I tried pulling the image on my machine using "docker pull", which pulls the image without any issue. Could anyone help me with this?

Note: I am able to create apps from other Docker images, and the metrics setup is also working fine.

Thanks,
Yash
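
A hedged troubleshooting sketch for this kind of failure (the project and imagestream names assume the default deployer templates shown elsewhere on this page):

oc describe pod <failing-pod> -n logging            # shows the exact image reference and pull error
oc get imagestreams -n logging                      # imagestreams created by the deployer
oc import-image logging-fluentd:latest -n logging   # retry the import if a tag failed to resolve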

Elastic Search per-replica storage is awkward

This is a tracking issue for future improvements to the current deployment implementation.

The issue is that each instance of Elasticsearch requires its own storage volume, but there is no mechanism in current Kubernetes/OpenShift to set up a replication controller that varies parameters or volumes per replica, so multiple replicas could only reuse the same storage volume. Thus, for multiple instances of Elasticsearch we are required to create multiple deployments, each with a single "replica", so that each can have its own storage. This violates expectations from the rest of the platform and is difficult to manage.

This problem is not specific to aggregated logging; actually just about any clustered storage mechanism is likely to run into it. There is an upstream proposal to solve this generically. The discussion goes beyond simple storage concerns to cluster parameters or other parameters that may need to vary per instance, specialized deployment hooks, and so forth.

At this time the proposal and design itself is still under heavy debate, with spinoff issues to investigate the requirements of specific cluster implementations. We will add comments to this issue when there are substantive developments in the implementation and implications for logging specifically.
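
For reference, the current per-deployment workaround looks roughly like this, a hedged sketch in which the DeploymentConfig suffix is generated by the deployer and the volume and claim names are assumptions:

# Attach one PVC per Elasticsearch DeploymentConfig, since each "cluster node" is its own deployment
oc set volume dc/logging-es-<generated-suffix> \
  --add --overwrite --name=elasticsearch-storage \
  --type=persistentVolumeClaim --claim-name=logging-es-1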

deployer fails on network blips

(I could be wrong about this, it is just a first thought; I am running version 3.1.)

Problem

I think the pod deployer can fail on a network blip, and if so (I think) that might break the deployment entirely. I saw an ES node go down today, and it never came back up (after waiting about 5 minutes).

When I looked, I saw that there was an Error, and the deployer logs said this:

[cloud-user@support ~]$ oc logs logging-es-1lfg5ess-1-deploy
F0511 18:24:07.389551       1 deployer.go:70] couldn't get deployment logging/logging-es-1lfg5ess-1: Get https://172.24.0.1:443/api/v1/namespaces/logging/replicationcontrollers/logging-es-1lfg5ess-1: net/http: TLS handshake timeout

Solution

Not sure what it is, but I think maybe deployer.go line 70 could be made a little more robust, retrying in the event of a handshake timeout.

Thoughts

If this is really a bug, it affects the ability to predictably generate precise plots of ES node scaling performance and to ensure their number in the cluster, because it means the deployment controllers themselves are unstable.

aggregated logging / elasticsearch maintenance

We've installed the aggregated-logging stack from https://github.com/openshift/origin-aggregated-logging/tree/master/deployment
Collecting and displaying logs works fine, but the collected data grows, and we can't invoke Elasticsearch's REST API to run cleanup/maintenance jobs. Even when I connect to the Elasticsearch pod and call
curl -X GET http://127.0.0.1:9200
the response is always
curl: (52) Empty reply from server

How do you maintain the Elasticsearch data? Is there a special token/secret that must be used to connect to Elasticsearch?
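
For what it's worth, the deployed Elasticsearch only listens for TLS connections with client-certificate authentication, which is why a plain HTTP call returns an empty reply. A hedged sketch of a working call from inside the Elasticsearch pod (the secret mount path is an assumption; the key names follow the logging-elasticsearch secret created by the deployer):

oc exec <elasticsearch-pod> -- curl -s -k \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  "https://localhost:9200/_cat/indices?v"

Old indices can then be deleted the same way, or via the curator component that the deployer sets up.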

[RFE] Enable fluentd to correctly parse multi-line json log entries such as java stacktraces.

If an exception is thrown from a Java application, fluentd will send a new event to Elasticsearch for each line in the stacktrace. This is undesired and produces a lot of noise in ES; it makes it hard to find the real errors.

The parser plugin should be enabled and preconfigured for the above case (I'm sure there are others):
http://docs.fluentd.org/articles/parser-plugin-overview

From the docs:

One more example: you can parse Java-like stacktrace logs with multiline. Here is a configuration example.

format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
If you have the following log:

2013-3-03 14:27:33 [main] INFO Main - Start
2013-3-03 14:27:33 [main] ERROR Main - Exception
javax.management.RuntimeErrorException: null
at Main.main(Main.java:16) ~[bin/:na]
2013-3-03 14:27:33 [main] INFO Main - End
It will be parsed as:

2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":" Main - Start"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"ERROR","message":" Main - Exception\njavax.management.RuntimeErrorException: null\n at Main.main(Main.java:16) ~[bin/:na]"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":" Main - End"}

Deploying the EFK Stack FAILS with message: "error: open '/etc/deploy/kibana.crt': no such file or directory"

Following the instructions at: https://docs.openshift.org/latest/install_config/aggregate_logging.html#deploying-the-efk-stack

When executing the following commands:

oc process logging-deployer-template -n openshift -v \
PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443,KIBANA_HOSTNAME=kibana.oc3.videonext.net,ES_CLUSTER_SIZE=1,ES_INSTANCE_RAM=1G \
> logging-deployer.json

oc create -f logging-deployer.json

The deployer pod gets created, runs for 20-25 seconds, and eventually fails:

  • kibana_keys=
  • '[' -e /etc/deploy/kibana.crt ']'
  • kibana_keys='--cert='''/etc/deploy/kibana.crt''' --key='''/etc/deploy/kibana.key''''
  • oc create route reencrypt --service=logging-kibana --hostname=kibana.oc3.videonext.net --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt '--cert='''/etc/deploy/kibana.crt'''' '--key='''/etc/deploy/kibana.key''''
    error: open '/etc/deploy/kibana.crt': no such file or directory

As I was trying to troubleshoot, I opened an additional terminal into the deployer pod and kept checking for the '/etc/deploy/kibana.crt' file: it was present all along, until the container crashed.

I strongly suspect that an extra set of quotes around the cert file name is the root cause of the failure.
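
A minimal reproduction of that quoting pitfall (an illustration only, not the deployer's actual code):

# Embedding literal quotes in a string makes them part of the argument after expansion:
kibana_keys="--cert='/etc/deploy/kibana.crt' --key='/etc/deploy/kibana.key'"
oc create route reencrypt --service=logging-kibana $kibana_keys
# -> oc tries to open the file "'/etc/deploy/kibana.crt'" (quotes included) and fails.

# Safer pattern: build the arguments as an array so each one survives word-splitting intact:
kibana_keys=(--cert=/etc/deploy/kibana.crt --key=/etc/deploy/kibana.key)
oc create route reencrypt --service=logging-kibana "${kibana_keys[@]}"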

Here is a full log from deployer pod:

[root@o3-master master]# oc logs logging-deployer-9zann

  • project=logging
  • mode=install
  • dir=/etc/deploy
  • secret_dir=/secret
  • master_url=https://kubernetes.default.svc.cluster.local
  • master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  • token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
  • '[' -n 1 ']'
  • oc config set-cluster master --api-version=v1 --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --server=https://kubernetes.default.svc.cluster.local
    cluster "master" set.
    ++ cat /var/run/secrets/kubernetes.io/serviceaccount/token
  • oc config set-credentials account --token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJsb2dnaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImxvZ2dpbmctZGVwbG95ZXItdG9rZW4tZjN6YnQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibG9nZ2luZy1kZXBsb3llciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjY0ZjRiY2VmLTA4YjktMTFlNi05ZmNhLTUyNTQwMDBlYjgyNyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpsb2dnaW5nOmxvZ2dpbmctZGVwbG95ZXIifQ.Yjikf_XhShgkVPocuKBUmobB47JqYpXn1k3nWReilGK-XXE4rMOgSUKYmzREUJVj8JWLdrkS_o4NgcPC1WiOR2BwxsIkWbKj0d6ZYMxzegOm4P5CID2_jgcy5GHFTNt63LRKawIi4bp_fMVQODIT2kL2VjWs_3EGoPACeaYR1tm0rFH9R8vHcPf8sggvpmXHnKpwFuZNbbwbW-ra8Efdfa0J6qMqq7ngwzPuIngDDIlOXLkpaYw1LCSLrBHaX12uME8Vtu7iM1GdZyRh35MUtfDRqf87y0v7BQnsPcdUy48_zuKK_Bd1BnM9uJsoEPa1sVFODU0TWnc2ein7BNIJLg
    user "account" set.
  • oc config set-context current --cluster=master --user=account --namespace=logging
    context "current" set.
  • oc config use-context current
    switched to context "current".
  • for file in 'scripts/*.sh'
  • source scripts/install.sh
    ++ set -ex
  • for file in 'scripts/*.sh'
  • source scripts/util.sh
  • for file in 'scripts/*.sh'
  • source scripts/uuid_migrate.sh
  • case "${mode}" in
  • install_logging
  • initialize_install_vars
  • image_prefix=docker.io/openshift/origin-
  • image_version=latest
  • hostname=kibana.oc3.videonext.net
  • ops_hostname=kibana-ops.example.com
  • public_master_url=https://o3-master.videonext.net:8443
  • es_instance_ram=1G
  • es_pvc_size=
  • es_pvc_prefix=logging-es-
  • es_cluster_size=1
  • es_node_quorum=1
  • es_recover_after_nodes=0
  • es_recover_expected_nodes=1
  • es_recover_after_time=5m
  • es_ops_instance_ram=8G
  • es_ops_pvc_size=
  • es_ops_pvc_prefix=logging-es-ops-
  • es_ops_cluster_size=1
  • es_ops_node_quorum=1
  • es_ops_recover_after_nodes=0
  • es_ops_recover_expected_nodes=1
  • es_ops_recover_after_time=5m
  • generate_secrets
  • '[' '' '!=' true ']'
  • rm -rf /etc/deploy
    rm: cannot remove '/etc/deploy': Permission denied
  • :
  • mkdir -p /secret
  • chmod 700 /secret
    chmod: changing permissions of '/secret': Read-only file system
  • :
  • '[' -s /secret/ca.key ']'
    ++ date +%Y%m%d%H%M%S
  • openshift admin ca create-signer-cert --key=/etc/deploy/ca.key --cert=/etc/deploy/ca.crt --serial=/etc/deploy/ca.serial.txt --name=logging-signer-20160422201646
  • procure_server_cert kibana
  • local file=kibana hostnames=
  • '[' -s /secret/kibana.crt ']'
  • cp /secret/kibana.key /etc/deploy/kibana.key
  • cp /secret/kibana.crt /etc/deploy/kibana.crt
  • procure_server_cert kibana-ops
  • local file=kibana-ops hostnames=
  • '[' -s /secret/kibana-ops.crt ']'
  • '[' -n '' ']'
  • procure_server_cert kibana-internal kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
  • local file=kibana-internal hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com
  • '[' -s /secret/kibana-internal.crt ']'
  • '[' -n kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com ']'
  • openshift admin ca create-server-cert --key=/etc/deploy/kibana-internal.key --cert=/etc/deploy/kibana-internal.crt --hostnames=kibana,kibana-ops,kibana.oc3.videonext.net,kibana-ops.example.com --signer-cert=/etc/deploy/ca.crt --signer-key=/etc/deploy/ca.key --signer-serial=/etc/deploy/ca.serial.txt
  • echo Generating signing configuration file
  • cat - conf/signing.conf
    Generating signing configuration file
  • '[' -s /secret/server-tls.json ']'
  • cp conf/server-tls.json /etc/deploy
  • cat /dev/null
  • cat /dev/null
  • fluentd_user=system.logging.fluentd
  • kibana_user=system.logging.kibana
  • curator_user=system.logging.curator
  • admin_user=system.admin
  • generate_PEM_cert system.logging.fluentd
  • NODE_NAME=system.logging.fluentd
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.logging.fluentd
  • openssl req -out /etc/deploy/system.logging.fluentd.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.fluentd.key -subj /CN=system.logging.fluentd/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating keystore and certificate for node system.logging.fluentd
    Generating a 2048 bit RSA private key
    .....................+++
    ....................................+++
    writing new private key to '/etc/deploy/system.logging.fluentd.key'

  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.logging.fluentd.csr -notext -out /etc/deploy/system.logging.fluentd.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 2 (0x2)
    Validity
    Not Before: Apr 22 20:16:50 2016 GMT
    Not After : Apr 22 20:16:50 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.logging.fluentd
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    B1:04:92:2D:7D:D1:7C:A4:1A:03:55:6F:6B:8B:2D:CD:28:38:87:D3
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:50 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • generate_PEM_cert system.logging.kibana
  • NODE_NAME=system.logging.kibana
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.logging.kibana
    Generating keystore and certificate for node system.logging.kibana
  • openssl req -out /etc/deploy/system.logging.kibana.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.kibana.key -subj /CN=system.logging.kibana/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating a 2048 bit RSA private key
    .....................................................................+++
    ......................+++
    writing new private key to '/etc/deploy/system.logging.kibana.key'

  • echo Sign certificate request with CA
    Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.logging.kibana.csr -notext -out /etc/deploy/system.logging.kibana.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 3 (0x3)
    Validity
    Not Before: Apr 22 20:16:51 2016 GMT
    Not After : Apr 22 20:16:51 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.logging.kibana
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    C5:F9:16:53:B0:19:01:35:82:C4:F3:B2:4A:1F:A8:0A:03:97:92:D9
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:51 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • generate_PEM_cert system.logging.curator
  • NODE_NAME=system.logging.curator
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.logging.curator
  • openssl req -out /etc/deploy/system.logging.curator.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.logging.curator.key -subj /CN=system.logging.curator/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating keystore and certificate for node system.logging.curator
    Generating a 2048 bit RSA private key
    ....+++
    .........................................................+++
    writing new private key to '/etc/deploy/system.logging.curator.key'

  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.logging.curator.csr -notext -out /etc/deploy/system.logging.curator.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 4 (0x4)
    Validity
    Not Before: Apr 22 20:16:51 2016 GMT
    Not After : Apr 22 20:16:51 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.logging.curator
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    B4:22:14:DA:0A:F6:7C:64:00:41:A9:0C:A7:95:88:1D:E2:61:7D:C8
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:51 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • generate_PEM_cert system.admin
  • NODE_NAME=system.admin
  • dir=/etc/deploy
  • echo Generating keystore and certificate for node system.admin
  • openssl req -out /etc/deploy/system.admin.csr -new -newkey rsa:2048 -keyout /etc/deploy/system.admin.key -subj /CN=system.admin/OU=OpenShift/O=Logging/L=Test/C=DE -days 712 -nodes
    Generating keystore and certificate for node system.admin
    Generating a 2048 bit RSA private key
    ....+++
    .............................................+++
    writing new private key to '/etc/deploy/system.admin.key'

  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/system.admin.csr -notext -out /etc/deploy/system.admin.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 5 (0x5)
    Validity
    Not Before: Apr 22 20:16:51 2016 GMT
    Not After : Apr 22 20:16:51 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Logging
    organizationalUnitName = OpenShift
    commonName = system.admin
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    CC:03:4D:07:D0:F0:3B:38:0C:E7:E6:30:EA:59:50:38:8D:2E:DB:1F
    X509v3 Authority Key Identifier:
    0.
    Certificate is to be certified until Apr 22 20:16:51 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated
++ join , logging-es logging-es.logging.svc.cluster.local logging-es-cluster logging-es-cluster.logging.svc.cluster.local logging-es-ops logging-es-ops.logging.svc.cluster.local logging-es-ops-cluster logging-es-ops-cluster.logging.svc.cluster.local
++ local IFS=,
++ shift
++ echo logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local

  • generate_JKS_chain logging-es logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
  • dir=/etc/deploy
  • NODE_NAME=logging-es
  • CERT_NAMES=logging-es,logging-es.logging.svc.cluster.local,logging-es-cluster,logging-es-cluster.logging.svc.cluster.local,logging-es-ops,logging-es-ops.logging.svc.cluster.local,logging-es-ops-cluster,logging-es-ops-cluster.logging.svc.cluster.local
  • ks_pass=kspass
  • ts_pass=tspass
  • rm -rf logging-es
  • extension_names=
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster
  • for name in '${CERT_NAMES//,/ }'
  • extension_names=,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
  • echo Generating keystore and certificate for node logging-es
  • /bin/keytool -genkey -alias logging-es -keystore /etc/deploy/keystore.jks -keypass kspass -storepass kspass -keyalg RSA -keysize 2048 -validity 712 -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
    Generating keystore and certificate for node logging-es
  • echo Generating certificate signing request for node logging-es
  • /bin/keytool -certreq -alias logging-es -keystore /etc/deploy/keystore.jks -storepass kspass -file /etc/deploy/logging-es.csr -keyalg rsa -dname 'CN=logging-es, OU=SSL, O=Test, L=Test, C=DE' -ext san=dns:localhost,ip:127.0.0.1,dns:logging-es,dns:logging-es.logging.svc.cluster.local,dns:logging-es-cluster,dns:logging-es-cluster.logging.svc.cluster.local,dns:logging-es-ops,dns:logging-es-ops.logging.svc.cluster.local,dns:logging-es-ops-cluster,dns:logging-es-ops-cluster.logging.svc.cluster.local
    Generating certificate signing request for node logging-es
  • echo Sign certificate request with CA
  • openssl ca -in /etc/deploy/logging-es.csr -notext -out /etc/deploy/logging-es.crt -config /etc/deploy/signing.conf -extensions v3_req -batch -extensions server_ext
    Sign certificate request with CA
    Using configuration from /etc/deploy/signing.conf
    Check that the request matches the signature
    Signature ok
    Certificate Details:
    Serial Number: 6 (0x6)
    Validity
    Not Before: Apr 22 20:16:53 2016 GMT
    Not After : Apr 22 20:16:53 2018 GMT
    Subject:
    countryName = DE
    localityName = Test
    organizationName = Test
    organizationalUnitName = SSL
    commonName = logging-es
    X509v3 extensions:
    X509v3 Key Usage: critical
    Digital Signature, Key Encipherment
    X509v3 Basic Constraints:
    CA:FALSE
    X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Key Identifier:
    79:F7:CD:26:B8:84:5B:19:08:87:53:1F:2B:C8:25:A7:8A:73:70:EC
    X509v3 Authority Key Identifier:
    0.
    X509v3 Subject Alternative Name:
    DNS:localhost, IP Address:127.0.0.1, DNS:logging-es, DNS:logging-es.logging.svc.cluster.local, DNS:logging-es-cluster, DNS:logging-es-cluster.logging.svc.cluster.local, DNS:logging-es-ops, DNS:logging-es-ops.logging.svc.cluster.local, DNS:logging-es-ops-cluster, DNS:logging-es-ops-cluster.logging.svc.cluster.local
    Certificate is to be certified until Apr 22 20:16:53 2018 GMT (730 days)

Write out database with 1 new entries
Data Base Updated

  • echo 'Import back to keystore (including CA chain)'
  • /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias sig-ca
    Import back to keystore (including CA chain)
    Certificate was added to keystore
  • /bin/keytool -import -file /etc/deploy/logging-es.crt -keystore /etc/deploy/keystore.jks -storepass kspass -noprompt -alias logging-es
    Certificate reply was installed in keystore
    Import CA to truststore for validating client certs
  • echo 'Import CA to truststore for validating client certs'
  • /bin/keytool -import -file /etc/deploy/ca.crt -keystore /etc/deploy/truststore.jks -storepass tspass -noprompt -alias sig-ca
    Certificate was added to keystore
    All done for logging-es
  • echo All done for logging-es
  • openssl rand 16
  • openssl enc -aes-128-cbc -nosalt -out /etc/deploy/searchguard_node_key.key -pass pass:pass
  • cat /dev/urandom
  • tr -dc a-zA-Z0-9
  • fold -w 200
  • head -n 1
  • cat /dev/urandom
  • tr -dc a-zA-Z0-9
  • fold -w 64
  • head -n 1
  • echo 'Deleting existing secrets'
  • oc delete secret logging-fluentd logging-elasticsearch logging-kibana logging-kibana-proxy logging-kibana-ops-proxy logging-curator logging-curator-ops
    Deleting existing secrets
    secret "logging-fluentd" deleted
    secret "logging-elasticsearch" deleted
    secret "logging-kibana" deleted
    secret "logging-kibana-proxy" deleted
    secret "logging-curator" deleted
    secret "logging-curator-ops" deleted
    Error from server: secrets "logging-kibana-ops-proxy" not found
  • :
  • echo 'Creating secrets'
    Creating secrets
  • oc secrets new logging-elasticsearch key=/etc/deploy/keystore.jks truststore=/etc/deploy/truststore.jks searchguard.key=/etc/deploy/searchguard_node_key.key admin-key=/etc/deploy/system.admin.key admin-cert=/etc/deploy/system.admin.crt admin-ca=/etc/deploy/ca.crt
    secret/logging-elasticsearch
  • oc secrets new logging-kibana ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.kibana.key cert=/etc/deploy/system.logging.kibana.crt
    secret/logging-kibana
  • oc secrets new logging-kibana-proxy oauth-secret=/etc/deploy/oauth-secret session-secret=/etc/deploy/session-secret server-key=/etc/deploy/kibana-internal.key server-cert=/etc/deploy/kibana-internal.crt server-tls.json=/etc/deploy/server-tls.json
    secret/logging-kibana-proxy
  • oc secrets new logging-fluentd ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.fluentd.key cert=/etc/deploy/system.logging.fluentd.crt
    secret/logging-fluentd
  • oc secrets new logging-curator ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
    secret/logging-curator
  • oc secrets new logging-curator-ops ca=/etc/deploy/ca.crt key=/etc/deploy/system.logging.curator.key cert=/etc/deploy/system.logging.curator.crt
    secret/logging-curator-ops
  • echo 'Attaching secrets to service accounts'
  • oc secrets add serviceaccount/aggregated-logging-kibana logging-kibana logging-kibana-proxy
    Attaching secrets to service accounts
  • oc secrets add serviceaccount/aggregated-logging-elasticsearch logging-elasticsearch
  • oc secrets add serviceaccount/aggregated-logging-fluentd logging-fluentd
  • oc secrets add serviceaccount/aggregated-logging-curator logging-curator
  • generate_templates
  • echo '(Re-)Creating templates'
  • oc delete template --selector logging-infra=curator
    (Re-)Creating templates
    template "logging-curator-template" deleted
  • oc delete template --selector logging-infra=kibana
    template "logging-kibana-template" deleted
  • oc delete template --selector logging-infra=fluentd
    template "logging-fluentd-template" deleted
  • oc delete template --selector logging-infra=elasticsearch
    template "logging-es-template" deleted
  • create_template_optional_nodeselector '' es --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1G --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=
  • shift
  • local template=es
  • shift
  • cp templates/es.yaml /etc/deploy/es.yaml
  • [[ -n '' ]]
  • oc new-app -f /etc/deploy/es.yaml --param ES_CLUSTER_NAME=es --param ES_INSTANCE_RAM=1G --param ES_NODE_QUORUM=1 --param ES_RECOVER_AFTER_NODES=0 --param ES_RECOVER_EXPECTED_NODES=1 --param ES_RECOVER_AFTER_TIME=5m --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-elasticsearch-template-maker for "/etc/deploy/es.yaml"
    With parameters:
    ES_CLUSTER_NAME=es
    ES_INSTANCE_RAM=1G
    ES_NODE_QUORUM=1
    ES_RECOVER_AFTER_NODES=0
    ES_RECOVER_EXPECTED_NODES=1
    ES_RECOVER_AFTER_TIME=5m
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-es-template" created
    --> Success
    Run 'oc status' to view your app.
  • es_host=logging-es
  • create_template_optional_nodeselector '' kibana --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=
  • shift
  • local template=kibana
  • shift
  • cp templates/kibana.yaml /etc/deploy/kibana.yaml
  • [[ -n '' ]]
  • oc new-app -f /etc/deploy/kibana.yaml --param OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443 --param OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-kibana-template-maker for "/etc/deploy/kibana.yaml"
    With parameters:
    KIBANA_DEPLOY_NAME=kibana
    OAP_MASTER_URL=https://kubernetes.default.svc.cluster.local
    OAP_PUBLIC_MASTER_URL=https://o3-master.videonext.net:8443
    ES_HOST=logging-es
    ES_PORT=9200
    OAP_DEBUG=false
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-kibana-template" created
    --> Success
    Run 'oc status' to view your app.
  • create_template_optional_nodeselector '' curator --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=
  • shift
  • local template=curator
  • shift
  • cp templates/curator.yaml /etc/deploy/curator.yaml
  • [[ -n '' ]]
  • oc new-app -f /etc/deploy/curator.yaml --param ES_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param CURATOR_DEPLOY_NAME=curator --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-curator-template-maker for "/etc/deploy/curator.yaml"
    With parameters:
    CURATOR_DEPLOY_NAME=curator
    MASTER_URL=https://kubernetes.default.svc.cluster.local
    ES_HOST=logging-es
    ES_PORT=9200
    ES_CLIENT_CERT=/etc/curator/keys/cert
    ES_CLIENT_KEY=/etc/curator/keys/key
    ES_CA=/etc/curator/keys/ca
    CURATOR_DEFAULT_DAYS=30
    CURATOR_CONF_LOCATION=/etc/curator
    CURATOR_RUN_HOUR=0
    CURATOR_RUN_MINUTE=0
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-curator-template" created
    --> Success
    Run 'oc status' to view your app.
  • es_ops_host=logging-es
  • '[' false == true ']'
  • create_template_optional_nodeselector logging-infra-fluentd=true fluentd --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin- --param IMAGE_VERSION_DEFAULT=latest
  • local nodeselector=logging-infra-fluentd=true
  • shift
  • local template=fluentd
  • shift
  • cp templates/fluentd.yaml /etc/deploy/fluentd.yaml
  • [[ -n logging-infra-fluentd=true ]]
    ++ extract_nodeselector logging-infra-fluentd=true
    ++ local inputstring=logging-infra-fluentd=true
    ++ selectors=()
    ++ local selectors
    ++ for keyvalstr in '${inputstring//,/ }'
    ++ keyval=(${keyvalstr//=/ })
    ++ [[ -n logging-infra-fluentd ]]
    ++ [[ -n true ]]
    ++ selectors=("${selectors[@]}" ""${keyval[0]}": "${keyval[1]}"")
    ++ [[ 1 -gt 0 ]]
    +++ join , '"logging-infra-fluentd": "true"'
    +++ local IFS=,
    +++ shift
    +++ echo '"logging-infra-fluentd": "true"'
    ++ echo nodeSelector: '{' '"logging-infra-fluentd":' '"true"' '}'
  • sed '/serviceAccountName/ i\ nodeSelector: { "logging-infra-fluentd": "true" }' templates/fluentd.yaml
  • oc new-app -f /etc/deploy/fluentd.yaml --param ES_HOST=logging-es --param OPS_HOST=logging-es --param MASTER_URL=https://kubernetes.default.svc.cluster.local --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin- --param IMAGE_VERSION_DEFAULT=latest
    --> Deploying template logging-fluentd-template-maker for "/etc/deploy/fluentd.yaml"
    With parameters:
    MASTER_URL=https://kubernetes.default.svc.cluster.local
    ES_HOST=logging-es
    ES_PORT=9200
    ES_CLIENT_CERT=/etc/fluent/keys/cert
    ES_CLIENT_KEY=/etc/fluent/keys/key
    ES_CA=/etc/fluent/keys/ca
    OPS_HOST=logging-es
    OPS_PORT=9200
    OPS_CLIENT_CERT=/etc/fluent/keys/cert
    OPS_CLIENT_KEY=/etc/fluent/keys/key
    OPS_CA=/etc/fluent/keys/ca
    ES_COPY=false
    ES_COPY_HOST=
    ES_COPY_PORT=
    ES_COPY_SCHEME=https
    ES_COPY_CLIENT_CERT=
    ES_COPY_CLIENT_KEY=
    ES_COPY_CA=
    ES_COPY_USERNAME=
    ES_COPY_PASSWORD=
    OPS_COPY_HOST=
    OPS_COPY_PORT=
    OPS_COPY_SCHEME=https
    OPS_COPY_CLIENT_CERT=
    OPS_COPY_CLIENT_KEY=
    OPS_COPY_CA=
    OPS_COPY_USERNAME=
    OPS_COPY_PASSWORD=
    IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
    IMAGE_VERSION_DEFAULT=latest
    --> Creating resources ...
    template "logging-fluentd-template" created
    --> Success
    Run 'oc status' to view your app.
  • '[' '' '!=' true ']'
  • oc delete template --selector logging-infra=support
    template "logging-imagestream-template" deleted
    template "logging-pvc-template" deleted
    template "logging-support-template" deleted
    ++ cat /etc/deploy/oauth-secret
  • oc new-app -f templates/support.yaml --param OAUTH_SECRET=WyJiJ6bpeu0J225RCnhg1uUTS1F8PO9ViiaE2DMPHKwGS5OusNxePhOUSHboW6KD --param KIBANA_HOSTNAME=kibana.oc3.videonext.net --param KIBANA_OPS_HOSTNAME=kibana-ops.example.com --param IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
    --> Deploying template logging-support-template-maker for "templates/support.yaml"
    With parameters:
    OAUTH_SECRET=WyJiJ6bpeu0J225RCnhg1uUTS1F8PO9ViiaE2DMPHKwGS5OusNxePhOUSHboW6KD
    KIBANA_HOSTNAME=kibana.oc3.videonext.net
    KIBANA_OPS_HOSTNAME=kibana-ops.example.com
    IMAGE_PREFIX_DEFAULT=docker.io/openshift/origin-
    --> Creating resources ...
    template "logging-support-template" created
    template "logging-imagestream-template" created
    template "logging-pvc-template" created
    --> Success
    Run 'oc status' to view your app.
    (Re-)Creating deployed objects
  • generate_objects
  • echo '(Re-)Creating deployed objects'
  • '[' '' '!=' true ']'
  • oc process logging-support-template
  • oc delete -f -
    service "logging-es" deleted
    service "logging-es-cluster" deleted
    service "logging-es-ops" deleted
    service "logging-es-ops-cluster" deleted
    service "logging-kibana" deleted
    service "logging-kibana-ops" deleted
    oauthclient "kibana-proxy" deleted
  • oc delete imagestream,service,route --selector logging-infra=support
    No resources found
  • oc process logging-support-template
  • oc create -f -
    service "logging-es" created
    service "logging-es-cluster" created
    service "logging-es-ops" created
    service "logging-es-ops-cluster" created
    service "logging-kibana" created
    service "logging-kibana-ops" created
    oauthclient "kibana-proxy" created
  • kibana_keys=
  • '[' -e /etc/deploy/kibana.crt ']'
  • kibana_keys='--cert='''/etc/deploy/kibana.crt''' --key='''/etc/deploy/kibana.key''''
  • oc create route reencrypt --service=logging-kibana --hostname=kibana.oc3.videonext.net --dest-ca-cert=/etc/deploy/ca.crt --ca-cert=/etc/deploy/ca.crt '--cert='''/etc/deploy/kibana.crt'''' '--key='''/etc/deploy/kibana.key''''
    error: open '/etc/deploy/kibana.crt': no such file or directory

curator starts before elasticsearch during deploy

The curator pod starts before the Elasticsearch pod during deploy, which causes errors like this in the curator logs:

logging-curator running [1] jobs
2016-03-11 18:33:13,831 ERROR     Connection failure.
logging-curator run finish

This is because Elasticsearch hasn't started yet:

[2016-03-11 18:33:18,372][INFO ][node                     ] [Outrage] version[1.5.2], pid[8], build[62ff986/2015-04-27T09:21:06Z]
[2016-03-11 18:33:18,377][INFO ][node                     ] [Outrage] initializing ...
[2016-03-11 18:33:19,601][INFO ][plugins                  ] [Outrage] loaded [searchguard, openshift-elasticsearch-plugin, cloud-kubernetes], sites []

Is there some way we can orchestrate the pods in the deployer?
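Until the deployer can sequence this, a small readiness gate in front of the first curator run would at least avoid the noise. A minimal sketch, assuming the client cert/key/CA paths the curator template already defaults to and the logging-es service:

# Sketch only: wait until Elasticsearch answers before starting curator.
until curl -s --cacert /etc/curator/keys/ca \
           --cert /etc/curator/keys/cert \
           --key /etc/curator/keys/key \
           https://logging-es:9200/_cluster/health > /dev/null; do
    echo "waiting for elasticsearch..."
    sleep 5
done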

Question, regarding fluentd volume configuration

Guys,

I've dismantled this because we want to borrow some bits of it but make it work for our specific environment. In the process, I've been unable to understand how the master branch can be working: either some magic is going on, or what is built does not match what is on master.

The volume configuration for fluentd as noted below:
https://github.com/openshift/origin-aggregated-logging/blob/master/deployment/templates/fluentd.yaml#L79

Does not match where fluentd is configured to look for the log files:
https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/fluent.conf#L147
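For what it's worth, a quick way to compare the two on a live deployment (the pod name is a placeholder for any running fluentd pod):

# Show the volumes/mounts defined on the dc, then where fluentd actually tails from.
oc volume dc/logging-fluentd --list
oc exec <fluentd-pod> -- grep -n 'path ' /etc/fluent/fluent.conf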

Update search-guard ACL

To address the following error:

[ERROR][com.floragunn.searchguard.filter.SearchGuardActionFilter] Error while apply() due to com.floragunn.searchguard.tokeneval.MalformedConfigurationException: no bypass or execute filters at all for action indices:admin/mappings/fields/get

I believe this occurs when people are navigating to the settings page in Kibana, and Kibana tries to pull the fields for the logstash-* index.

Deployer pod fails on timeout: How to debug?

I'm finding that on a cluster, after deploying some ES nodes, I get this:

logging-es-wucg4yv5-1-deploy   0/1       Error       0          2m
logging-es-wvn347bb-1-deploy   0/1       Error       0          4m
logging-es-xf75ods2-1-deploy   0/1       Error       0          2m
logging-es-xtb58wpb-1-deploy   0/1       Error       0          4m
logging-es-y2kmwhda-1-deploy   0/1       Error       0          2m
logging-es-ybwfcgo1-1-deploy   0/1       Error       0          2m

Basically, all the ES deploy tasks fail. Since no logging-es pod is ever created, all I see is a timeout.

root@support: /opt/jay/team8/projects/enterprise_logging # oc logs logging-es-q0pf7vnt-1-deploy
I0520 17:44:24.403372       1 deployer.go:200] Deploying logging/logging-es-q0pf7vnt-1 for the first time (replicas: 1)
I0520 17:44:24.447335       1 recreate.go:126] Scaling logging/logging-es-q0pf7vnt-1 to 1 before performing acceptance check
F0520 17:46:25.543245       1 deployer.go:70] couldn't scale logging/logging-es-q0pf7vnt-1 to 1: timed out waiting for the condition
  1. Any idea how to fix this?
  2. How can we get more information about where the logging deployer is falling down?

cc @rflorenc
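In the meantime, a few places that usually explain a timed-out deploy (resource names are taken from the log above; the namespace from its logging/ prefix):

# The replication controller's events show why no pod ever became ready.
oc describe rc logging-es-q0pf7vnt-1 -n logging
oc get events -n logging | grep logging-es-q0pf7vnt
oc describe pod logging-es-q0pf7vnt-1-deploy -n logging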

Elasticsearch's run.sh does not set heap size at all

ES_JAVA_OPTS is not exported, so the Elasticsearch start script does not even see the options set there.
Also, it configures Elasticsearch to use a variable amount of RAM (-Xms512m -Xmx$HALF_THE_RAM). Elasticsearch's heap sizing guide recommends setting both to the same value, or better: setting ES_HEAP_SIZE.

When this is fixed, it may conflict with options set in elasticsearch.in.sh. That should be checked.
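A rough sketch of the direction a fix in run.sh could take; ES_HEAP_SIZE is honored by the stock Elasticsearch start script and sets -Xms and -Xmx to the same value (the variable and path below are illustrative, not the actual script):

# Sketch: derive one heap value and export it before exec'ing Elasticsearch.
ES_HEAP_SIZE=${INSTANCE_RAM:-512m}
export ES_HEAP_SIZE
exec /usr/share/elasticsearch/bin/elasticsearch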

Improve ES volume documentation

It's become clear the official docs for dealing with ES volumes are inadequate. A discussion of how multiple instances attach separate volumes and the tradeoffs of hostmount vs NAS is in order.
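As a starting point, the docs could show attaching a dedicated claim to each ES deployment, along these lines (all names and the mount path are placeholders):

# Repeat per logging-es-* deployment config, one claim per instance.
oc volume dc/logging-es-abc123 --add --overwrite \
   --name=es-storage --type=persistentVolumeClaim \
   --claim-name=logging-es-pvc-1 --mount-path=/elasticsearch/persistent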

"error: couldn't read version from server: Get https://kubernetes.default.svc.cluster.local/api: dial tcp: lookup kubernetes.default.svc.cluster.local: no such host" in logging-deployer Pod

Met "error: couldn't read version from server: Get https://kubernetes.default.svc.cluster.local/api: dial tcp: lookup kubernetes.default.svc.cluster.local: no such host" in logging-deployer Pod.

Steps to Reproduce:

  1. Log into OSE env
  2. Create a project named "chunpj"
  3. Create the Deployer Secret
    oc secrets new logging-deployer nothing=/dev/null
  4. Create the Deployer ServiceAccount
    oc create -f - <<API
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: logging-deployer
    secrets:
    - name: logging-deployer
    API

    oc policy add-role-to-user edit system:serviceaccount:chunpj:logging-deployer
  5. Run the Deployer
    oc process -f https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/deployment/deployer.yaml -v IMAGE_PREFIX=<rcm-img-docker01_REGISTRY>/openshift3/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://<OSE_MASTER>:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 | oc create -f -
  6. Check the logging-deployer's logs; see the full logs in this gist: https://gist.github.com/chunyunchen/19122b09b62cc178af2d
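This usually points at cluster DNS rather than the deployer itself. Two hedged sanity checks from any running pod ("busybox" stands in for whatever debug pod is available):

oc exec busybox -- nslookup kubernetes.default.svc.cluster.local
oc exec busybox -- cat /etc/resolv.conf   # the SkyDNS address (e.g. 172.30.0.1) should be the first nameserver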

Fluentd will blindly use hostname from syslog

Currently Fluentd will just use the value of the third field of the syslog line to populate 'hostname', even if it is something like 'localhost'. We should probably replace this with the FQDN.

Handle kibana-proxy SSL/TLS termination in the router

As far as I can tell, there's no real reason to have SSL/TLS termination be handled by the Kibana auth proxy instead of in the OpenShift router. This is especially true since the router image is updated far more often than that of the Kibana auth proxy, which is running on Node.js v0.10.36 with an old version of OpenSSL.

Improve documentation surrounding restricting deployments of different EFK components

If we want to restrict where ES and Kibana are deployed, we would still want to ensure that Fluentd can be deployed to every node. We need to describe the means of doing this, such as leveraging the Default namespace and using https://docs.openshift.org/latest/admin_guide/pod_network.html#joining-project-networks when the multitenant SDN plugin is available.
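For the Fluentd-everywhere part, the logging-infra-fluentd=true nodeSelector used by the fluentd template already gives a handle; for example:

# Label every node that should run Fluentd so the template's nodeSelector matches it.
oc label node <node-name> logging-infra-fluentd=true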

We should also describe how to create these components in different namespaces after install.

fluentd container does not work with Project Atomic

This appears to be a known issue, but I thought it would be nice to have an open bug about it, because it's extremely hard to google for.

The Fluentd container generates SELinux exceptions on Project Atomic. Dan Walsh has written about how to create an SELinux context to allow Fluentd to run: http://www.projectatomic.io/blog/2016/03/selinux-and-docker-part-2/

However, it's not at all clear to me how to generate such a policy on Atomic (i.e. via openshift-ansible) and then make use of it in the fluentd template.

EFK doubt

Hi, I have deployed EFK in OpenShift 3.1. The docs mention: "Unfortunately there is no way to stream logs as they are created at this time."

How much of a time gap is there between log generation and display in Kibana, and why?

Clean up Fluentd Dockerfile

The image should no longer provide out_*.rb files and should not ADD them either; elasticsearch_dynamic is now available from fluent-plugin-elasticsearch releases.
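The plugin can come from the gem instead; the image build would need something along these lines (version pin is illustrative):

# out_elasticsearch_dynamic ships with the gem, so no local out_*.rb copies are needed.
gem install fluent-plugin-elasticsearch --version '~> 1.0'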

[RFE] Expand deployer to have upgrade option

  • Safely scale down components
  • Pull in updated images
  • Create missing or remove deprecated api objects (e.g.: secrets, dc, ds)
  • Maintain previous configurations (volumes and nodeSelectors)
  • Then scale back up
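In oc terms the bracket around an upgrade might look roughly like this (component list and replica counts are illustrative):

# Sketch only: scale down, upgrade, scale back up.
oc scale dc/logging-kibana --replicas=0
oc scale dc/logging-curator --replicas=0
# ... pull updated images, create missing / remove deprecated API objects,
# ... keeping volumes and nodeSelectors from the previous configuration ...
oc scale dc/logging-curator --replicas=1
oc scale dc/logging-kibana --replicas=1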

Incorrect year used for syslog based message operations index

When log files from the previous year are read in during the current year, the index the logs are first written to uses the current year.

E.g. logs from 12/27/2015 will be created in the index ".operations.2016.12.27" if they are first read in during 2016.
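The underlying problem is that classic syslog timestamps carry no year, so the reader has to assume one; GNU date shows the same behaviour:

$ date -d 'Dec 27 23:59:59' '+%Y.%m.%d'
2016.12.27    # run in 2016: the year is inferred from "now", not from the log file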

logging-fluentd connection refused to ElasticSearch

We are running into an issue following the documentation. We have basically followed the docs, apart from calling our project mbaas-logging instead of using the default.

The issue we are seeing is with the fluentd pod:

$ oc logs -f logging-fluentd-1-3925v
2016-01-25 08:02:35 -0500 [info]: reading config file path="/etc/fluent/fluent.conf"
2016-01-25 08:04:06 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-01-25 08:02:48 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es.mbaas-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})! Connection refused - connect(2) (Errno::ECONNREFUSED)" plugin_id="object:1421250"
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:61:in `rescue in client'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:58:in `client'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:184:in `rescue in send'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:182:in `send'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:174:in `block in write'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:173:in `each'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.0.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:173:in `write'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/buffer.rb:325:in `write_chunk'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/buffer.rb:304:in `pop'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/output.rb:321:in `try_flush'
  2016-01-25 08:04:06 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.16/lib/fluent/output.rb:140:in `run'
2016-01-25 08:05:39 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-01-25 08:02:50 -0500
error_class="Fluent::ElasticsearchOutput::ConnectionFailure" 
error="Can not reach Elasticsearch cluster 
({:host=>\"logging-es.mbaas-logging.svc.cluster.local\", 
:port=>9200, 
:scheme=>\"https\", 
:user=>\"fluentd\",  
:password=>\"obfuscated\"})! 
Connection refused - connect(2) (Errno::ECONNREFUSED)" plugin_id="object:1421250"
2016-01-25 08:05:39 -0500 [warn]: suppressed same stacktrace

The ES_HOST is defined in the run.sh:

es_host=logging-es.${project}.svc.cluster.local

If we inspect /etc/resolv.conf in the fluentd pod:

[root@logging-fluentd-8-9ukoc /]# cat /etc/resolv.conf
nameserver 172.30.0.1
nameserver 10.0.2.3
search mbaas-logging.svc.cluster.local svc.cluster.local cluster.local feedhenry.io
options ndots:5

The original host we were dealing with was logging-es.mbaas-logging.svc.cluster.local. That name has 4 dots, fewer than ndots:5, so the resolver first tries it with each of the search domains above appended before trying it as an absolute name.
The order would be:

  1. logging-es.mbaas-logging.svc.cluster.local.mbaas-logging.svc.cluster.local
  2. logging-es.mbaas-logging.svc.cluster.local.svc.cluster.local
  3. logging-es.mbaas-logging.svc.cluster.local.cluster.local
  4. logging-es.mbaas-logging.svc.cluster.local.feedhenry.io

None of the above would be resolved by any of the DNS servers.
That turned out not to be correct: the last entry in the above list could in fact be resolved by the second nameserver (10.0.2.3):

[vagrant@local ~]$ dig @10.0.2.3 logging-es.mbaas-logging.svc.cluster.local.feedhenry.io

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7 <<>> @10.0.2.3 logging-es.mbaas-logging.svc.cluster.local.feedhenry.io
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50317
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;logging-es.mbaas-logging.svc.cluster.local.feedhenry.io. IN A

;; ANSWER SECTION:
logging-es.mbaas-logging.svc.cluster.local.feedhenry.io. 6 IN A 192.168.33.12

;; Query time: 606 msec
;; SERVER: 10.0.2.3#53(10.0.2.3)
;; WHEN: Wed Jan 27 08:31:49 UTC 2016
;; MSG SIZE  rcvd: 100

We can in fact see this when we use curl:

curl -s -S -I -k --verbose https://logging-es.mbaas-logging.svc.cluster.local:9200
* About to connect() to logging-es.mbaas-logging.svc.cluster.local port 9200 (#0)
* Trying 192.168.33.12...
* Connection refused
* Failed connect to logging-es.mbaas-logging.svc.cluster.local:9200; Connection refused
* Closing connection 0
curl: (7) Failed connect to logging-es.mbaas-logging.svc.cluster.local:9200; Connection refused

Note that it is trying to connect to 192.168.33.12 and not the Kubernetes service cluster IP 172.30.177.137.

If we bypass DNS resolution we can in fact connect to the internal cluster IP:

$ curl -s -S -I -k -H "Host: logging-es.mbaas-logging.svc.cluster.local" --resolve logging-es.mbaas-logging.svc.cluster.local:9200:172.30.177.137 --verbose https://logging-es.mbaas-logging.svc.cluster.local:9200

* Added logging-es.mbaas-logging.svc.cluster.local:9200:172.30.177.137 to DNS cache
* About to connect() to logging-es.mbaas-logging.svc.cluster.local port 9200 (#0)
*   Trying 172.30.177.137...
* Connected to logging-es.mbaas-logging.svc.cluster.local (172.30.177.137) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* NSS error -12271 (SSL_ERROR_BAD_CERT_ALERT)
* SSL peer cannot verify your certificate.
* Closing connection 0
curl: (58) NSS: client certificate not found (nickname not specified)

If we simply use logging-es, the resolver appends .mbaas-logging.svc.cluster.local, which can be resolved:

[vagrant@local ~]$  kubectl exec busybox -- nslookup logging-es.mbaas-logging.svc.cluster.local
Server:    172.30.0.1
Address 1: 172.30.0.1 kubernetes.default.svc.cluster.local

Name:      logging-es.mbaas-logging.svc.cluster.local
Address 1: 172.30.177.137 logging-es.mbaas-logging.svc.cluster.local

One solution would be to not use the full service name (my-svc.my-namespace.svc.cluster.local). Since these pods all live in the same namespace/project, they can refer to other services in that namespace using the simple service name (without the namespace).

As mentioned earlier the value for ES_HOST is logging-es.mbaas-logging.svc.cluster.local in our case. This is then used when processing fluentd.yaml.

We are currently working around this issue by editing the logging-fluentd deployment configuration:

$ oc edit deploymentconfig logging-fluentd

and changing the ES_HOST value to logging-es (the default value in fluentd.yaml). This works for us and logs are showing up in Kibana.
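The same workaround can be applied without opening an editor, e.g.:

# Equivalent to the oc edit above: point fluentd at the short service name.
oc env dc/logging-fluentd ES_HOST=logging-es OPS_HOST=logging-es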

I'm trying to figure out whether this is something specific to our environment or whether it might also be an issue for others.

EFK: Error checking ACL when seeding

Issue:
After doing a basic installation of EFK logging on a lab cluster, the following errors come up in all of the Elasticsearch pods:

[2016-05-23 15:26:50,982][ERROR][io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter] [Stained Glass Scarlet] Error checking ACL when seeding
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
...

For further details see also:
https://gist.github.com/rflorenc/18644314624ff0876b9f62a8e30ac25c

Version info:
openshift v3.2.0.44
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

oc get pods -o wide
NAME READY STATUS RESTARTS AGE NODE
logging-deployer-0ihvw 0/1 Completed 0 2d 192.1.12.26
logging-deployer-8vlkb 0/1 Completed 0 2d 192.1.12.26
logging-deployer-98bpu 0/1 Completed 0 2d 192.1.12.26
logging-deployer-dwyno 0/1 Completed 0 2d 192.1.12.26
logging-deployer-jd6so 0/1 Completed 0 2d 192.1.12.26
logging-deployer-rpri1 0/1 Completed 0 2d 192.1.12.26
logging-deployer-xtgde 0/1 Completed 0 2d 192.1.12.26
logging-es-0egyxsqv-1-7yo6f 1/1 Running 0 2d 192.1.13.21
logging-es-0tkimh6h-1-52ejb 1/1 Running 0 2d 192.1.13.44
logging-es-1jnuwamn-1-2qgsu 1/1 Running 0 2d 192.1.12.91
logging-es-1wutikij-1-q63ed 1/1 Running 0 2d 192.1.13.30
logging-es-6mdp2lg4-1-qaqda 1/1 Running 0 2d 192.1.13.35
logging-es-6qvmrgr3-1-66jhn 1/1 Running 0 2d 192.1.12.80
logging-es-agd26le3-1-tc59s 1/1 Running 0 2d 192.1.13.16
logging-es-blvicocl-1-jygw6 1/1 Running 0 2d 192.1.13.26
logging-es-crg80ae2-1-cxzgd 1/1 Running 0 2d 192.1.13.33
logging-es-cu4hcxij-1-st9og 1/1 Running 0 2d 192.1.12.81
logging-es-ewaf2h33-1-snp2v 1/1 Running 0 2d 192.1.12.86
logging-es-ki533gnt-1-5hx7x 1/1 Running 0 2d 192.1.13.1
logging-es-m62n8qre-1-comug 1/1 Running 0 2d 192.1.13.14
logging-es-qb6hswff-1-5cosr 1/1 Running 0 2d 192.1.13.4
logging-es-qdbdftok-1-v2foz 1/1 Running 0 2d 192.1.13.51
logging-es-shm6lj2r-1-wxs1f 1/1 Running 0 2d 192.1.13.28
logging-es-sq0arlq1-1-flv71 1/1 Running 0 2d 192.1.12.96
logging-es-uvcyndsx-1-6njbd 1/1 Running 0 2d 192.1.13.17
logging-es-ve4k3a1i-1-nhuot 1/1 Running 0 2d 192.1.13.20
logging-es-y4fts5fj-1-7n9zw 1/1 Running 0 2d 192.1.12.78
logging-kibana-1-5915n 2/2 Running 0 2d 192.1.12.26
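A quick way to confirm whether the cluster ever recovers is to query the health endpoint from inside one of the ES pods, using the admin cert/key/CA from the logging-elasticsearch secret (the pod name comes from the listing above; the mount path is an assumption):

oc exec logging-es-0egyxsqv-1-7yo6f -- \
  curl -s --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          'https://localhost:9200/_cluster/health?pretty'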

Add support for image pull secrets

The deployer supports arbitrary prefixes for the logging component images. It would be nice if you could specify an image pull secret, in order to allow them to be hosted on a locked-down external Docker registry.
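A sketch of how that could plug into the existing secret handling (registry and credentials are placeholders):

oc secrets new-dockercfg logging-pull-secret \
   --docker-server=registry.example.com --docker-username=me \
   --docker-password=secret --docker-email=me@example.com
oc secrets add serviceaccount/aggregated-logging-fluentd logging-pull-secret --for=pull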

Support standalone external ElasticSearch deployment

Since #60, it is possible to have fluentd send logs to an external ElasticSearch instance in addition to the embedded one. It would be nice to be able to send logs to an external ElasticSearch instance exclusively, without even running ElasticSearch (or its associated Kibana and curator) inside OpenShift.
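The fluentd template already exposes the parameters this would need, so an external-only setup is largely a matter of pointing them at the remote cluster (the host and CA path below are placeholders):

oc process logging-fluentd-template \
   -v ES_HOST=es.example.com,ES_PORT=9200,ES_CA=/etc/fluent/keys/external-ca \
   | oc create -f -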

[exploratory] client-node based optimizations for finer grained scaling tasks

Problem

Scaling Elasticsearch today is coarse-grained: you create more servers. However, there are many different types of scaling - write throughput, capacity, etc. So when we need to scale just one aspect (e.g. write throughput), we may have to take up extra resources on the cluster (storage) if we only have one way to scale Elasticsearch.

Solution

To prevent wasting resources, we can implement finer-grained scaling of the different Elasticsearch node roles (writers, readers, clients).

Details

@portante has suggested we leverage ES client nodes to decouple scaling requirements, cc @jeremyeder @timothysc @rflorenc

The Elasticsearch "client node" is a special node that is really just there to do smart, ES-aware load balancing: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

This diagram summarizes the architecture change we might implement if we use ES client nodes.

This is just a first pass but the main idea is:

  • separate writers from readers
  • have the Kubernetes service abstraction connect to "client nodes"

The benefits:

Writers never have to field read requests, clients minimize the number of "hops" they make when connecting to a shard, and you decouple scaling of logging capacity (bottlenecked by servers) from scaling of other operations (reads/writes/lookups).
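For reference, a "client node" in this Elasticsearch version is simply one with both the master and data roles disabled; roughly (config path is illustrative):

# Sketch: turn an ES instance into a pure client/coordinating node.
cat >> /usr/share/elasticsearch/config/elasticsearch.yml <<'EOF'
node.master: false   # never eligible to be elected master
node.data: false     # holds no shards; only routes and aggregates requests
EOF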
