kube-fluentd-operator (KFO)

Overview

Kubernetes Fluentd Operator (KFO) is a Fluentd config manager with batteries included: config validation, no need to restart, sensible defaults and best practices built in. Use Kubernetes labels to filter/route logs per namespace!

kube-fluentd-operator configures Fluentd in a Kubernetes environment. It compiles a Fluentd configuration from configmaps (one per namespace), similar to how an Ingress controller compiles an nginx configuration from several Ingress resources. This way a single Fluentd instance can handle all log shipping for an entire cluster and the cluster admin does NOT need to coordinate with namespace admins.

Cluster administrators set up Fluentd only once and namespace owners can configure log routing as they wish. KFO will re-configure Fluentd accordingly and make sure logs originating from a namespace will not be accessible by other tenants/namespaces.

KFO also extends the Fluentd configuration language, making it possible to refer to pods based on their labels and on a container name pattern. This enables very fine-grained targeting of log streams for pre-processing before shipping. Writing a custom processor, adding a new Fluentd plugin, or writing a custom Fluentd plugin allows KFO to be extended for any use case and any external log ingestion system.

Finally, it is possible to ingest logs from a file on the container filesystem. While this is not recommended, there are still legacy or misconfigured apps that insist on logging to the local filesystem.

Try it out

The easiest way to get started is using the Helm chart. Official images are not published yet, so you need to pass image.repository and image.tag manually:

git clone git@github.com:vmware/kube-fluentd-operator.git
helm install kfo ./kube-fluentd-operator/charts/log-router \
  --set rbac.create=true \
  --set image.tag=v1.18.1 \
  --set image.repository=vmware/kube-fluentd-operator

Alternatively, deploy the Helm chart from a GitHub release:

CHART_URL='https://github.com/vmware/kube-fluentd-operator/releases/download/v1.18.1/log-router-0.4.0.tgz'

helm install kfo ${CHART_URL} \
  --set rbac.create=true \
  --set image.tag=v1.18.1 \
  --set image.repository=vmware/kube-fluentd-operator

Then create a namespace demo and a configmap describing where all logs from demo should go. The configmap must contain an entry called "fluent.conf". Finally, point kube-fluentd-operator to this configmap using annotations.

kubectl create ns demo

cat > fluent.conf << EOF
<match **>
  @type null
</match>
EOF

# Create the configmap with a single entry "fluent.conf"
kubectl create configmap fluentd-config --namespace demo --from-file=fluent.conf=fluent.conf


# The following step is optional: the fluentd-config is the default configmap name.
# kubectl annotate namespace demo logging.csp.vmware.com/fluentd-configmap=fluentd-config

Within a minute, this configuration will be translated to something like this:

<match demo.**>
  @type null
</match>

Even though the tag ** was used in the <match> directive, kube-fluentd-operator correctly expands this to demo.**. Indeed, if another tag which does not start with demo. had been used, it would have failed validation. Namespace admins can safely assume that they have a dedicated Fluentd for themselves.

All configuration errors are stored in the annotation logging.csp.vmware.com/fluentd-status. Try replacing ** with an invalid tag like 'hello-world'. After a minute, verify that the error message looks like this:

# extract just the value of logging.csp.vmware.com/fluentd-status
kubectl get ns demo -o jsonpath='{.metadata.annotations.logging\.csp\.vmware\.com/fluentd-status}'
bad tag for <match>: hello-world. Tag must start with **, $thisns or demo

When the configuration is made valid again, the fluentd-status annotation is set to "".

To see kube-fluentd-operator in action you need a cloud log collector like logz.io, papertrail or ELK accessible from the K8S cluster. A simple logz.io configuration looks like this (replace TOKEN with your customer token):

<match **>
   @type logzio_buffered
   endpoint_url https://listener.logz.io:8071?token=$TOKEN
</match>

Build

Get the code using go get or git clone this repo:

go get -u github.com/vmware/kube-fluentd-operator/config-reloader
cd $GOPATH/src/github.com/vmware/kube-fluentd-operator

# build a base-image
cd base-image && make build-image

# build helm chart
cd charts/log-router && make helm-package

# build the daemon
cd config-reloader
make install
make build-image

# run with mock data (loaded from the examples/ folder)
make run-once-fs

# run with mock data in a loop (may need to ctrl+z to exit)
make run-loop-fs

# inspect what is generated from the above command
ls -l tmp/

Project structure

  • charts/log-router: Builds the Helm chart
  • base-image: Builds the Fluentd image (currently based on Fluentd 1.16.x) with a curated list of plugins
  • config-reloader: Builds the daemon that generates fluentd configuration files

Config-reloader

This is where interesting work happens. The dependency graph shows the high-level package interaction and general dataflow.

  • config: handles startup configuration, reading and validation
  • datasource: fetches Pods, Namespaces, ConfigMaps from Kubernetes
  • fluentd: parses Fluentd config files into an object graph
  • processors: walks this object graph doing validations and modifications. All features are implemented as chained Processor subtypes (see the sketch after this list)
  • generator: serializes the processed object graph to the filesystem for Fluentd to read
  • controller: orchestrates the high-level datasource -> processor -> generator pipeline.
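
The chained-processor design can be pictured with a minimal Go sketch. This is illustrative only - the type and function names below are simplified and hypothetical, and do not match the actual types in config-reloader:

package processors

// Directive is a node in the parsed Fluentd config tree (illustrative).
type Directive struct {
	Name   string            // e.g. "match" or "filter"
	Tag    string            // e.g. "**" or "$labels(app=foo)"
	Params map[string]string // body parameters such as @type
	Nested []*Directive      // nested directives such as <parse>
}

// Processor is one link in the chain: it validates or rewrites the parsed
// directive tree and passes the result on to the next processor.
type Processor interface {
	Process(cfg []*Directive) ([]*Directive, error)
}

// applyChain runs the processors in order, stopping at the first error.
func applyChain(cfg []*Directive, chain ...Processor) ([]*Directive, error) {
	var err error
	for _, p := range chain {
		if cfg, err = p.Process(cfg); err != nil {
			return nil, err
		}
	}
	return cfg, nil
}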

How does it work

It works by rewriting the user-provided configuration. This is possible because kube-fluentd-operator knows about the Kubernetes cluster and the current namespace, and also has some sensible defaults built in. To get a quick idea what happens behind the scenes, consider this configuration deployed in a namespace called monitoring:

<filter $labels(server=apache)>
  @type parser
  <parse>
    @type apache2
  </parse>
</filter>

<filter $labels(app=django)>
  @type detect_exceptions
  language python
</filter>

<match **>
  @type es
</match>

It gets processed into the following configuration which is then fed to Fluentd:

<filter kube.monitoring.*.*>
  @type record_transformer
  enable_ruby true

  <record>
    kubernetes_pod_label_values ${record["kubernetes"]["labels"]["app"]&.gsub(/[.-]/, '_') || '_'}.${record["kubernetes"]["labels"]["server"]&.gsub(/[.-]/, '_') || '_'}
  </record>
</filter>

<match kube.monitoring.*.*>
  @type rewrite_tag_filter

  <rule>
    key kubernetes_pod_label_values
    pattern ^(.+)$
    tag ${tag}._labels.$1
  </rule>
</match>

<filter kube.monitoring.*.*.**>
  @type record_modifier
  remove_keys kubernetes_pod_label_values
</filter>

<filter kube.monitoring.*.*._labels.*.apache _proc.kube.monitoring.*.*._labels.*.apache>
  @type parser
  <parse>
    @type apache2
  </parse>
</filter>

<match kube.monitoring.*.*._labels.django.*>
  @type rewrite_tag_filter

  <rule>
    invert true
    key _dummy
    pattern /ZZ/
    tag 3bfd045d94ce15036a8e3ff77fcb470e0e02ebee._proc.${tag}
  </rule>
</match>

<match 3bfd045d94ce15036a8e3ff77fcb470e0e02ebee._proc.kube.monitoring.*.*._labels.django.*>
  @type detect_exceptions
  remove_tag_prefix 3bfd045d94ce15036a8e3ff77fcb470e0e02ebee
  stream container_info
</match>

<match kube.monitoring.*.*._labels.*.* _proc.kube.monitoring.*.*._labels.*.*>
  @type es
</match>

Configuration

Basic usage

To give the illusion that every namespace runs a dedicated Fluentd, the user-provided configuration is post-processed. In general, expressions starting with $ are macros that are expanded. These two directives are equivalent: <match **>, <match $thisns>. Almost always, using ** is the preferred way to match logs: this way you can reuse the same configuration for multiple namespaces.
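
For example, when deployed in a namespace called demo, both of the following directives expand to <match demo.**>:

# equivalent when deployed in namespace "demo"
<match **>
  @type null
</match>

<match $thisns>
  @type null
</match>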

The admin namespace

Kube-fluentd-operator defines one namespace to be the admin namespace. By default this is kube-system. The admin namespace is treated differently: its configuration is not processed further, as it is assumed only the cluster admin can manipulate resources in this namespace. If you don't plan to use any of the advanced features described below, you can just route all logs from all namespaces using this snippet in the admin namespace:

<match **>
 @type ...
 # destination configuration omitted
</match>

In this context, ** is not processed and literally means everything.

Fluentd assumes it is running in a distro with systemd and generates logs with these Fluentd tags:

  • systemd.{unit}: the journal of a systemd unit, for example systemd.docker.service
  • docker: all logs from the Docker daemon itself, not from containers. If systemd is used, the Docker daemon logs are in systemd.docker.service
  • k8s.{component}: logs from a K8S component, for example k8s.kube-apiserver
  • kube.{namespace}.{pod_name}.{container_name}: a log originating from (namespace, pod, container)

As the admin namespace is processed first, a match-all directive would consume all logs and any other namespace configuration would become irrelevant (unless <copy> is used). A recommended configuration for the admin namespace (assuming it is set to kube-system) is the following - it captures all but the user namespaces' logs:

<match systemd.** kube.kube-system.** k8s.** docker>
  # all k8s-internal and OS-level logs

  # destination config omitted...
</match>

Note the <match systemd.**> syntax. A single * would not work, as the tag is the full unit name - including the unit type, for example systemd.nginx.service.

Using the $labels macro

A very useful feature is combining <filter> with the $labels macro to define parsing at the namespace level. For example, the config-reloader container uses the logfmt format. This makes it easy to use structured logging and ingest json data into a remote log ingestion service.

<filter $labels(app=log-router, _container=reloader)>
  @type parser
  reserve_data true
  <parse>
    @type logfmt
  </parse>
</filter>

<match **>
  @type logzio_buffered
  # destination config omitted
</match>

The above config will pipe all logs from the pods labelled with app=log-router through a logfmt parser before sending them to logz.io. Again, this configuration is valid in any namespace. If the namespace doesn't contain any log-router components, the <filter> directive is simply never activated. _container is a "meta" label that allows targeting the log stream of a specific container in a multi-container pod.

If you use the Kubernetes recommended labels for pods and deployments, then KFO will rewrite the . characters in label names into _.

For example, let's assume the following $labels expressions are used in the fluentd-config of the testing namespace:

The expression $labels(_container=nginx-ingress-controller) filters by container name pattern and converts to, for example, kube.testing.*.nginx-ingress-controller._labels.*.*.

The expression $labels(app.kubernetes.io/name=nginx-ingress, _container=nginx-ingress-controller) converts to kube.testing.*.nginx-ingress-controller._labels.*.nginx_ingress.

The expression $labels(app.kubernetes.io/name=nginx-ingress) converts to kube.testing.*.*._labels.*.nginx_ingress.

This fluentd configmap in the testing namespace:

<filter **>
  @type concat
  timeout_label @DISTILLERY_TYPES
  key message
  stream_identity_key cont_id
  multiline_start_regexp /^(\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}|\[\w+\]\s|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|=\w+ REPORT====|\d{2}\:\d{2}\:\d{2}\.\d{3})/
  flush_interval 10
</filter>

<match **>
  @type relabel
  @label @DISTILLERY_TYPES
</match>

<label @DISTILLERY_TYPES>
  <filter $labels(app_kubernetes_io/name=kafka)>
    @type parser
    key_name log
    format json
    reserve_data true
    suppress_parse_error_log true
  </filter>

  <filter $labels(app.kubernetes.io/name=nginx-ingress, _container=controller)>
    @type parser
    key_name log

    <parse>
      @type json
      reserve_data true
      time_format %FT%T%:z
      emit_invalid_record_to_error false
    </parse>
  </filter>

  <match $labels(tag=noisy)>
    @type null
  </match>
</label>

will be rewritten inside the KFO pods as this:

<filter kube.testing.**>
  @type concat
  flush_interval 10
  key message
  multiline_start_regexp /^(\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}|\[\w+\]\s|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|=\w+ REPORT====|\d{2}\:\d{2}\:\d{2}\.\d{3})/
  stream_identity_key cont_id
  timeout_label @-DISTILLERY_TYPES-0e93f964a5b5f1760278744f1adf55d58d0e78ba
</filter>

<match kube.testing.**>
  @label @-DISTILLERY_TYPES-0e93f964a5b5f1760278744f1adf55d58d0e78ba
  @type relabel
</match>

<match kube.testing.**>
  @label @-DISTILLERY_TYPES-0e93f964a5b5f1760278744f1adf55d58d0e78ba
  @type null
</match>

<label @-DISTILLERY_TYPES-0e93f964a5b5f1760278744f1adf55d58d0e78ba>
  <filter kube.testing.*.*._labels.*.kafka.*>
    @type parser
    format json
    key_name log
    reserve_data true
    suppress_parse_error_log true
  </filter>
  <filter kube.testing.*.controller._labels.nginx_ingress.*.*>
    @type parser
    key_name log

    <parse>
      @type json
      emit_invalid_record_to_error false
      reserve_data true
      time_format %FT%T%:z
    </parse>
  </filter>
  <match kube.testing.*.*._labels.*.*.noisy>
    @type null
  </match>
</label>

All plugins that change the fluentd tag are disabled for security reasons. Otherwise a rogue configuration could divert another namespace's logs to itself by prepending its own name to the tag.

Ingest logs from a file in the container

The only allowed <source> directive is of type mounted-file. It is used to ingest a log file from a container on an emptyDir-mounted volume:

<source>
  @type mounted-file
  path /var/log/welcome.log
  labels app=grafana, _container=test-container
  <parse>
    @type none
  </parse>
</source>

The labels parameter is similar to the $labels macro and helps the daemon locate all pods that might log to the given file path. The <parse> directive is optional; if omitted, the default @type none is used. If you know the format of the log file you can specify it explicitly, for example @type apache2 or @type json.

The above configuration would translate at runtime to something similar to this:

<source>
  @type tail
  path /var/lib/kubelet/pods/723dd34a-4ac0-11e8-8a81-0a930dd884b0/volumes/kubernetes.io~empty-dir/logs/welcome.log
  pos_file /var/log/kfotail-7020a0b821b0d230d89283ba47d9088d9b58f97d.pos
  read_from_head true
  tag kube.kfo-test.welcome-logger.test-container

  <parse>
    @type none
  </parse>
</source>
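
If the file format is known, the <parse> block can specify it explicitly. A sketch ingesting a JSON-formatted file (the path and labels are illustrative, not from the test suite):

<source>
  @type mounted-file
  # hypothetical file produced by pods labeled app=auditor
  path /var/log/audit.json
  labels app=auditor
  <parse>
    @type json
  </parse>
</source>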

Dealing with multi-line exception stacktraces (since v1.3.0)

Most log streams are line-oriented. However, stacktraces always span multiple lines. kube-fluentd-operator integrates stacktrace processing using fluent-plugin-detect-exceptions. If a Java-based pod produces stacktraces in the logs, the stacktraces can be collapsed into a single log event like this:

<filter $labels(app=jpetstore)>
  @type detect_exceptions
  # you can skip language in which case all possible languages will be tried: go, java, python, ruby, etc...
  language java
</filter>

# The rest of the configuration stays the same even though quite a lot of tag rewriting takes place

<match **>
 @type es
</match>

Notice how filter is used, instead of match as described in the fluent-plugin-detect-exceptions documentation. Internally, this filter is translated into several match directives so that the end user doesn't need to bother with rewriting the Fluentd tag.

Also, users don't need to bother with setting the correct stream parameter. kube-fluentd-operator generates one internally based on the container id and the stream.

Reusing output plugin definitions (since v1.6.0)

Sometimes you only have a few valid options for log sinks: a dedicated S3 bucket, the ELK stack you manage, etc. The only flexibility you're after is letting namespace owners filter and parse their logs. In such cases you can abstract over an output plugin configuration - basically reducing it to a simple name which can be referenced from any namespace. For example, let's assume you have an S3 bucket for a "test" environment and you use logz.io for a "staging" environment. The first thing you do is define these two outputs in the admin namespace:

admin-ns.conf:
<match systemd.** docker kube.kube-system.** k8s.**>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=$TOKEN
</match>

<plugin test>
  @type s3
  aws_key_id  YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket   YOUR_S3_BUCKET_NAME
  s3_region   AWS_REGION
</plugin>

<plugin staging>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=$TOKEN
</plugin>

In the above admin configuration, the match directive is defined first to direct where logs for systemd, docker, kube-system, and the Kubernetes control plane components are sent. Below the match directive, the plugin directives define the log sinks that can be reused by namespace configurations.

A namespace can refer to the staging and test plugins, oblivious to where exactly the logs end up:

acme-test.conf
<match **>
  @type test
</match>


acme-staging.conf
<match **>
  @type staging
</match>

kube-fluentd-operator will insert the content of the plugin directive into the match directive. From then on, regular validation and postprocessing take place.
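
For example, acme-test.conf above would end up roughly like this before the usual postprocessing (a sketch based on the plugin definition in the admin namespace; the tag is then expanded as for any other namespace config):

<match **>
  @type s3
  aws_key_id  YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket   YOUR_S3_BUCKET_NAME
  s3_region   AWS_REGION
</match>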

Retagging based on log contents (since v1.12.0)

Sometimes you might need to split a single log stream to perform different processing based on the contents of one of the fields. To achieve this you can use the retag plugin, which lets you specify a set of rules that match regular expressions against the specified fields. If one of the rules matches, the log is re-emitted with a new namespace-unique tag based on the specified tag.

Logs that are emitted by this plugin can subsequently be filtered and processed by using the $tag macro when specifying the tag:

<match $labels(app=apache)>
  @type retag
  <rule>
    key message
    pattern /^(ERROR) .*$/
    tag notifications.$1 # refer to a capturing group using $number
  </rule>
  <rule>
    key message
    pattern /^(FATAL) .*$/
    tag notifications.$1
  </rule>
  <rule>
    key message
    pattern /^(ERROR)|(FATAL) .*$/
    tag notifications.other
    invert true # rewrite the tag when the pattern does NOT match
  </rule>
</match>

<filter $tag(notifications.ERROR)>
  # perform some extra processing
</filter>

<filter $tag(notifications.FATAL)>
  # perform different processing
</filter>

<match $tag(notifications.**)>
  # send to common output plugin
</match>

kube-fluentd-operator ensures that tags specified using the $tag macro never conflict with tags from other namespaces, even if the tag itself is equivalent.

Sharing logs between namespaces

By default, you can consume logs only from your own namespaces. Often it is useful for multiple namespaces (tenants) to get access to the log streams of a shared resource (pod, namespace). kube-fluentd-operator makes this possible using two constructs: the source namespace expresses its intent to share logs with a destination namespace, and the destination namespace expresses its desire to consume logs from a source. As a result, logs are streamed only when both sides agree.

A source namespace can share with another namespace using the @type share macro:

producer namespace configuration:

<match $labels(msg=nginx-ingress)>
  @type copy
  <store>
    @type share
    # share all logs matching the labels with the namespace "consumer"
    with_namespace consumer
  </store>
</match>

consumer namespace configuration:

# use $from(producer) to get all shared logs from a namespace called "producer"
<label @$from(producer)>
  <match **>
    # process all shared logs here as usual
  </match>
</label>

The consuming namespace can use the usual syntax inside the <label @$from...> directive. The fluentd tag is rewritten as if the logs originated in the consuming namespace itself.

The producing namespace needs to wrap @type share within a <store> directive. This is done on purpose, as it is very easy to otherwise redirect the logs to the destination namespace and lose them. The @type copy clones the whole stream.

Log metadata

Often you run multiple Kubernetes clusters but you need to aggregate all logs to a single destination. To distinguish between different sources, kube-fluentd-operator can attach arbitrary metadata to every log event. The metadata is nested under a key chosen with --meta-key. Using the helm chart, metadata can be enabled like this:

helm install ... \
  --set meta.key=metadata \
  --set meta.values.region=us-east-1 \
  --set meta.values.env=staging \
  --set meta.values.cluster=legacy

Every log event, be it from a pod, mounted-file or a systemd unit, will now carry this metadata:

{
  "metadata": {
    "region": "us-east-1",
    "env": "staging",
    "cluster": "legacy"
  }
}

All logs originating from a file look exactly as all other Kubernetes logs. However, their stream field is not set to stdout but to the path to the source file:

{
  "message": "Some message from the welcome-logger pod",
  "stream": "/var/log/welcome.log",
  "kubernetes": {
    "container_name": "test-container",
    "host": "ip-11-11-11-11.us-east-2.compute.internal",
    "namespace_name": "kfo-test",
    "pod_id": "723dd34a-4ac0-11e8-8a81-0a930dd884b0",
    "pod_name": "welcome-logger",
    "labels": {
      "msg": "welcome",
      "test-case": "b"
    },
    "namespace_labels": {}
  },
  "metadata": {
    "region": "us-east-2",
    "cluster": "legacy",
    "env": "staging"
  }
}

Go templating

The ConfigMap holding the fluentd configuration can be templated using Go templating. You can use this, for example, to read a value from another Kubernetes resource, such as a Secret:

kind: ConfigMap
apiVersion: v1
metadata:
  annotations: {}
  name: fluentd-config
  namespace: my-namespace
data:
  fluent.conf: |
    {{- $s := k8sLookup "Secret.v1" "my-namespace" "my-secret" -}}
    <match **>
      @type logzio_buffered
      endpoint_url https://listener.logz.io:8071?token={{ $s.data.token }}&type=log-router
      output_include_time true
      output_include_tags false
      http_idle_timeout 10

      <buffer>
        @type file
        path /var/log/my_namespace.log.buf
        flush_thread_count 4
        flush_interval 10s
        chunk_limit_size 16m
        queue_limit_length 4096
      </buffer>
    </match>

You can limit which k8s objects can be looked up by the templating functionality by passing the --allow-label flag, for example --allow-label=logs.vmware.com/allow. You can also override which label to use on specific Namespaces by passing the --allow-label-annotation flag and then setting the label to use in that annotation on the Namespace, for example --allow-label-annotation=logs.vmware.com/allow-label, and in the Namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  annotations:
    logs.vmware.com/allow-label: "logs.vmware.com/my_namespace"
spec:
  finalizers:
    - kubernetes

The templated config in this Namespace will then only be allowed to look up resources labeled with logs.vmware.com/my_namespace="true".
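
For example, with the configuration above, the my-secret lookup from the earlier template only succeeds if the Secret carries that label:

apiVersion: v1
kind: Secret
metadata:
  name: my-secret
  namespace: my-namespace
  labels:
    logs.vmware.com/my_namespace: "true"
data:
  token: bXktdG9rZW4=  # base64 for "my-token"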

Custom resource definition (CRD) support (since v1.13.0)

Custom resources are supported from the v1.13.0 release onwards. They provide a dedicated resource type for Fluentd configurations, which makes it possible to manage them more consistently and move away from generic ConfigMaps. Configs for a new application can be created simply by attaching a FluentdConfig resource to the application manifests, rather than using a ConfigMap with specific names and/or labels.

apiVersion: logs.vdp.vmware.com/v1beta1
kind: FluentdConfig
metadata:
  name: fd-config
spec:
  fluentconf: |
    <match kube.ns.**>
      @type relabel
      @label @NOTIFICATIONS
    </match>

    <label @NOTIFICATIONS>
     <match **>
       @type null
     </match>
    </label>

The "crd" has been introduced as a new datasource, configurable through the helm chart values, to allow users that are currently set up with ConfigMaps and do not want to perform the switchover to FluentdConfigs, to be able to keep on using them. The config-reloader has been equipped with the capability of installing the CRD at startup if requested, so no manual actions to enable it on the cluster are needed. The existing configurations though ConfigMaps can be migrated to CRDs through the following migration flow

  • A new user, who is installing kube-fluentd-operator for the first time, should set the datasource: crd option in the chart. This enables crd support
  • A user who is already using kube-fluentd-operator with either datasource: default or datasource: multimap will have to update to the new chart and set the crdMigrationMode property to true. This makes the config-reloader launch with both the crd datasource and the legacy datasource (either default or multimap, depending on what was configured in the datasource property). The user can then migrate all configmap resources one by one to the corresponding fluentdconfig resources. When the migration is complete, the Helm release can be upgraded by changing the crdMigrationMode property to false and switching the datasource property to crd. This effectively disables the legacy datasource and sets the config-reloader to only watch fluentdconfig resources (see the values sketch below)
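
A minimal chart values sketch for this flow, assuming datasource and crdMigrationMode are exposed as top-level values as described above:

# during the migration: run the crd datasource alongside the legacy one
datasource: default      # or multimap
crdMigrationMode: true

# after all configmaps are migrated to FluentdConfigs:
# datasource: crd
# crdMigrationMode: false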

Tracking Fluentd version

This project tries to keep up with major releases of the Fluentd Docker image.

Fluentd version   Operator version
0.12.x            1.0.0
1.15.3            1.17.1
1.16.1            1.17.6
1.16.1            1.18.0
1.16.1            1.18.1

Plugins in latest release (1.18.1)

kube-fluentd-operator aims to be easy to use and flexible. It also favors sending logs to multiple destinations using <copy> and as such comes with many plugins pre-installed:

  • fluentd (1.16.1)
  • fluent-plugin-amqp (0.14.0)
  • fluent-plugin-azure-loganalytics (0.7.0)
  • fluent-plugin-cloudwatch-logs (0.14.3)
  • fluent-plugin-concat (2.5.0)
  • fluent-plugin-datadog (0.14.2)
  • fluent-plugin-elasticsearch (5.3.0)
  • fluent-plugin-opensearch (1.1.0)
  • fluent-plugin-gelf-hs (1.0.8)
  • fluent-plugin-google-cloud (0.13.0) - forked to allow fluentd v1.14.x
  • fluent-plugin-grafana-loki (1.2.20)
  • fluent-plugin-grok-parser (2.6.2)
  • fluent-plugin-json-in-json-2 (1.0.2)
  • fluent-plugin-kafka (0.18.1)
  • fluent-plugin-kinesis (3.4.2)
  • fluent-plugin-kubernetes_metadata_filter (3.2.0)
  • fluent-plugin-kubernetes_sumologic (2.4.2)
  • fluent-plugin-kubernetes (0.3.1)
  • fluent-plugin-logentries (0.2.10)
  • fluent-plugin-logzio (0.0.22)
  • fluent-plugin-mail (0.3.0)
  • fluent-plugin-mongo (1.5.0)
  • fluent-plugin-multi-format-parser (1.0.0)
  • fluent-plugin-papertrail (0.2.8)
  • fluent-plugin-prometheus (2.1.0)
  • fluent-plugin-record-modifier (2.1.0)
  • fluent-plugin-record-reformer (0.9.1)
  • fluent-plugin-redis (0.3.5)
  • fluent-plugin-remote_syslog (1.0.0)
  • fluent-plugin-rewrite-tag-filter (2.4.0)
  • fluent-plugin-route (1.0.0)
  • fluent-plugin-s3 (1.7.2)
  • fluent-plugin-splunk-hec (1.3.1)
  • fluent-plugin-splunkhec (2.3)
  • fluent-plugin-sumologic_output (1.7.3)
  • fluent-plugin-systemd (1.0.5)
  • fluent-plugin-uri-parser (0.3.0)
  • fluent-plugin-verticajson (0.0.6)
  • fluent-plugin-vmware-loginsight (1.4.1)
  • fluent-plugin-vmware-log-intelligence (2.0.8)
  • fluent-plugin-mysqlslowquery (0.0.9)
  • fluent-plugin-throttle (0.0.5)
  • fluent-plugin-webhdfs (1.5.0)
  • fluent-plugin-detect-exceptions (0.0.15)

When customizing the image be careful not to uninstall plugins that are used internally to implement the macros.

If you need other destination plugins you are welcome to contribute a patch or just create an issue.

Synopsis

The config-reloader binary is the one that listens for changes in K8S and generates Fluentd files. It runs as a daemonset and is not intended to be interacted with directly. The synopsis is useful when trying to understand the Helm chart or just hacking.

usage: config-reloader [<flags>]

Regenerates Fluentd configs based on Kubernetes namespace annotations against templates,
reloading Fluentd if necessary

Flags:
  --help                        Show context-sensitive help (also try --help-long and
                                --help-man).
  --version                     Show application version.
  --master=""                   The Kubernetes API server to connect to (default: auto-detect)
  --kubeconfig=""               Retrieve target cluster configuration from a Kubernetes
                                configuration file (default: auto-detect)
  --datasource=default          Datasource to use (default|fake|fs|multimap|crd)
  --crd-migration-mode          Enable the crd datasource together with the current datasource to facilitate the migration (used only with --datasource=default|multimap)
  --fs-dir=FS-DIR               If datasource=fs is used, configure the dir hosting the files
  --interval=60                 Run every x seconds
  --allow-file                  Allow @type file for namespace configuration
  --id="default"                The id of this deployment. It is used internally so that two
                                deployments don't overwrite each other's data
  --fluentd-rpc-port=24444      RPC port of Fluentd
  --log-level="info"            Control verbosity of config-reloader logs
  --fluentd-loglevel="info"     Control verbosity of fluentd logs
  --buffer-mount-folder=""      Folder in /var/log/{} where to create all fluentd buffers
  --annotation="logging.csp.vmware.com/fluentd-configmap"
                                Which annotation on the namespace stores the configmap name?
  --default-configmap="fluentd-config"
                                Read the configmap by this name if namespace is not annotated.
                                Use empty string to suppress the default.
  --status-annotation="logging.csp.vmware.com/fluentd-status"
                                Store configuration errors in this annotation, leave empty to
                                turn off
  --kubelet-root="/var/lib/kubelet/"
                                Kubelet root dir, configured using --root-dir on the kubelet
                                service
  --namespaces=NAMESPACES ...   List of namespaces to process. If empty, processes all namespaces
  --templates-dir="/templates"  Where to find templates
  --output-dir="/fluentd/etc"   Where to output config files
  --meta-key=META-KEY           Attach metadata under this key
  --meta-values=META-VALUES     Metadata in the k=v,k2=v2 format
  --fluentd-binary=FLUENTD-BINARY
                                Path to fluentd binary used to validate configuration
  --prometheus-enabled          Prometheus metrics enabled (default: false)
  --admin-namespace="kube-system"
                                The namespace to be treated as admin namespace

Helm chart

Parameter Description Default
rbac.create Create a serviceaccount+role, use if K8s is using RBAC false
serviceAccountName Reuse an existing service account ""
defaultConfigmap Read the configmap by this name if the namespace is not annotated "fluentd-config"
image.repository Repository vmware/kube-fluentd-operator
image.tag Image tag latest
image.pullPolicy Pull policy Always
image.pullSecret Optional pull secret name ""
logLevel Default log level for config-reloader info
fluentdLogLevel Default log level for fluentd info
bufferMountFolder Folder in /var/log/{} where to create all fluentd buffers ""
kubeletRoot The home dir of the kubelet, usually set using --root-dir on the kubelet /var/lib/kubelet
namespaces List of namespaces to operate on. Empty means all namespaces []
interval How often to check for config changes (seconds) 45
meta.key The metadata key (optional) ""
meta.values Metadata to use for the key {}
extraVolumes Extra volumes
fluentd.extraVolumeMounts Mount extra volumes for the fluentd container, required to mount ssl certificates when elasticsearch has tls enabled
fluentd.resources Resource definitions for the fluentd container {}
fluentd.extraEnv Extra env vars to pass to the fluentd container {}
reloader.extraVolumeMounts Mount extra volumes for the reloader container
reloader.resources Resource definitions for the reloader container {}
reloader.extraEnv Extra env vars to pass to the reloader container {}
tolerations Pod tolerations []
updateStrategy UpdateStrategy for the daemonset. Leave empty to get the K8S' default (probably the safest choice) {}
podAnnotations Pod annotations for the daemonset
adminNamespace The namespace to be treated as admin namespace kube-system

Cookbook

I want to use one destination for everything

Simple, define configuration only for the admin namespace (by default kube-system):

kube-system.conf:
<match **>
  # configure destination here
</match>

I don't care about systemd and docker logs

Simple, exclude them at the admin namespace level (by default kube-system):

kube-system.conf:
<match systemd.** docker>
  @type null
</match>

<match **>
  # everything but systemd.** and docker is still around
  # configure destination
</match>

I want to use one destination but also want to just exclude a few pods

It is not possible to handle this globally. Instead, provide this config for the noisy namespace and configure other namespaces at the cost of some code duplication:

noisy-namespace.conf:
<match $labels(app=verbose-logger)>
  @type null
</match>

# all other logs are captured here
<match **>
  @type ...
</match>

On the bright side, the configuration of noisy-namespace contains nothing specific to noisy-namespace and the same content can be used for all namespaces whose logs we need collected.

I am getting errors "namespaces is forbidden: ... cannot list namespaces at the cluster scope"

Your cluster is running under RBAC. You need to enable a serviceaccount for the log-router pods. It's easy when using the Helm chart:

helm install ./charts/log-router --set rbac.create=true ...

I have a legacy container that logs to /var/log/httpd/access.log

First you need version 1.1.0 or later. At the namespace level you need to add a source directive of type mounted-file:

<source>
  @type mounted-file
  path /var/log/httpd/access.log
  labels app=apache2
  <parse>
    @type apache2
  </parse>
</source>

<match **>
  # destination config omitted
</match>

The type mounted-file is again a macro that is expanded to a tail plugin. The <parse> directive is optional; if not set, @type none will be used instead.

In order for this to work, the pod must define a mount of type emptyDir at /var/log/httpd or any of its parent folders. For example, this pod definition is part of the test suite (it logs to /var/log/hello.log):

apiVersion: v1
kind: Pod
metadata:
  name: hello-logger
  namespace: kfo-test
  labels:
    msg: hello
spec:
  containers:
    - image: ubuntu
      name: greeter
      command:
        - bash
        - -c
        - while true; do echo `date -R` [INFO] "Random hello number $((var++)) to file"; sleep 2; [[ $(($var % 100)) == 0 ]] && :> /var/log/hello.log ;done > /var/log/hello.log
      volumeMounts:
        - mountPath: /var/log
          name: logs
  volumes:
    - name: logs
      emptyDir: {}

To get the hello.log ingested by Fluentd you need at least this in the configuration for kfo-test namespace:

<source>
  @type mounted-file
  # need to specify the path on the container filesystem
  path /var/log/hello.log

  # only look at pods labeled this way
  labels msg=hello
  <parse>
    @type none
  </parse>
</source>

<match $labels(msg=hello)>
  # store the hello.log somewhere
  @type ...
</match>

I want to push logs from namespace demo to logz.io

demo.conf:
<match **>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=TOKEN&type=log-router
  output_include_time true
  output_include_tags true
  <buffer>
    @type memory
    flush_thread_count 4
    flush_interval 3s
    queue_limit_length 4096
  </buffer>
</match>

For details you should consult the plugin documentation.

I want to push logs to a remote syslog server

The built-in remote_syslog plugin cannot be used as the fluentd tag may be longer than 32 bytes. For this reason there is a truncating_remote_syslog plugin that shortens the tag to the allowed limit. If you are currently using the remote_syslog output plugin you only need to change a single line:

<match **>
  # instead of "remote_syslog"
  @type truncating_remote_syslog

  # the usual config for remote_syslog
</match>

To get a general idea of how truncation works, consider this table:

Original tag                                 Truncated tag
kube.demo.test.test                          demo.test.test
kube.demo.nginx-65899c769f-5zj6d.nginx       demo.nginx-65899c769f-5zj*.nginx
kube.demo.test.nginx11111111._labels.hello   demo.test.nginx11111111

I want to push logs to Humio

Humio speaks the elasticsearch protocol, so configuration is pretty similar to Elasticsearch. The example below is based on https://github.com/humio/kubernetes2humio/blob/master/fluentd/docker-image/fluent.conf.

<match **>
  @type elasticsearch
  include_tag_key false

  host "YOUR_HOST"
  path "/api/v1/dataspaces/YOUR_NAMESPACE/ingest/elasticsearch/"
  scheme "https"
  port "443"

  user "YOUR_KEY"
  password ""

  logstash_format true

  reload_connections "true"
  logstash_prefix "fluentd:kubernetes2humio"
  buffer_chunk_limit 1M
  buffer_queue_limit 32
  flush_interval 1s
  max_retry_wait 30
  disable_retry_limit
  num_threads 8
</match>

I want to push logs to papertrail

test.conf:
<match **>
    @type papertrail
    papertrail_host YOUR_HOST.papertrailapp.com
    papertrail_port YOUR_PORT
    flush_interval 30
</match>

I want to push logs to an ELK cluster

<match **>
  @type elasticsearch
  host ...
  port ...
  index_name ...
  # many options available
</match>

For details you should consult the plugin documentation.

I want to validate my config file before using it as a configmap

The container comes with a file validation command. To use it, put all your *.conf files in a directory, using the namespace name for each filename. Then use this one-liner, bind-mounting the folder and feeding it as a DATASOURCE_DIR env var:

docker run --entrypoint=/bin/validate-from-dir.sh \
    --net=host --rm \
    -v /path/to/config-folder:/workspace \
    -e DATASOURCE_DIR=/workspace \
    vmware/kube-fluentd-operator:latest

It will run fluentd in dry-run mode and even catch incorrect plug-in usage. This is so common that it's already captured in the validate-logging-config.sh script. The preferred way to use it is to copy it to your project and invoke it like this:

validate-logging-config.sh path/to/folder

All path/to/folder/*.conf files will be validated. Check stderr and the exit code for errors.

I want to use Fluentd @label to simplify processing

Use <label> as usual, the daemon ensures that label names are unique cluster-wide. For example to route several pods' logs to destination X, and ignore a few others you can use this:

<match $labels(app=foo)>
  @type relabel
  @label @blackhole
</match>

<match $labels(app=bar)>
  @type relabel
  @label @blackhole
</match>

<label @blackhole>
  <match **>
    @type null
  </match>
</label>

# at this point, foo and bar's logs are being handled in the @blackhole chain,
# the rest are still available for processing
<match **>
  @type ..
</match>

I want to parse ingress-nginx access logs and send them to a different log aggregator

The ingress controller uses a log format different from plain Nginx's. You can use this fragment to configure the namespace hosting the ingress-nginx controller:

<filter $labels(app=nginx-ingress, _container=nginx-ingress-controller)>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type regexp
    expression /(?<remote_addr>[^ ]*) - \[(?<proxy_protocol_addr>[^ ]*)\] - (?<remote_user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<request>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^ ]*)\] (?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*)/
    time_format %d/%b/%Y:%H:%M:%S %z
  </parse>
</filter>

<match **>
  # send the parsed access logs here
</match>

The above configuration assumes you're using the Helm charts for Nginx ingress. If not, make sure to change the app and _container labels accordingly. Given the horrendous regex above, you really should be outputting access logs in json format and just specifying @type json.

I want to send logs to different sinks based on log contents

The retag plugin lets you split a log stream based on whether the contents of certain fields match the given regular expressions.

<match $labels(app=apache)>
  @type retag
  <rule>
    key message
    pattern ^ERR
    tag notifications.error
  </rule>
  <rule>
    key message
    pattern ^ERR
    invert true
    tag notifications.other
  </rule>
</match>

<match $tag(notifications.error)>
  # manage log stream with error severity
</match>

<match $tag(notifications.**)>
  # manage log stream with non-error severity
</match>

I have my kubectl configured and my configmaps ready. I want to see the generated files before deploying the Helm chart

You need to run make like this:

make run-once

This will build the code; config-reloader will then connect to the K8S cluster, fetch the data, and generate *.conf files in the ./tmp directory. If there are errors, the namespaces will be annotated.

I want to build a custom image with my own fluentd plugin

Use vmware/kube-fluentd-operator:TAG as a base and make any modifications as usual. If the plugin is not top-secret, consider sending us a patch :)

I run two clusters - in us-east-2 and eu-west-2. How to differentiate between them when pushing logs to a single location?

When deploying the daemonset using Helm, make sure to pass some metadata:

For the cluster in USA:

helm install ... \
  --set=meta.key=cluster_info \
  --set=meta.values.region=us-east-2

For the cluster in Europe:

helm install ... \
  --set=meta.key=cluster_info \
  --set=meta.values.region=eu-west-2

If you are using ELK you can easily get only the logs from Europe using cluster_info.region: +eu-west-2. In this example the metadata key is cluster_info but you can use any key you like.

I don't want to annotate all my namespaces at all

It is possible to reduce the configuration burden by using a default configmap name. The default value is fluentd-config - kube-fluentd-operator will read the configmap by that name if the namespace is not annotated. If you don't like this default name, or happen to use this configmap for other purposes, then override the default with --default-configmap=my-default.
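
When deploying with the Helm chart, the same override is exposed as the defaultConfigmap value:

helm install kfo ./charts/log-router \
  --set rbac.create=true \
  --set defaultConfigmap=my-default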

How can I be sure to use a valid path for the .pos and .buf files

.pos files store the progress of the upload process and .buf files are used for local buffering. Colliding .pos/.buf paths can lead to races in Fluentd. As such, kube-fluentd-operator tries hard to rewrite such path-based parameters in a predictable way. You only need to make sure they are unique within your namespace; config-reloader takes care of making them unique cluster-wide.
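
For example, a namespace config can pick a buffer path that is only unique within the namespace (a sketch; the rewritten cluster-unique path is internal to KFO):

<match **>
  @type logzio_buffered
  # destination config omitted

  <buffer>
    @type file
    # unique within this namespace is enough; config-reloader rewrites it
    # to a cluster-unique path
    path /var/log/demo.log.buf
  </buffer>
</match>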

I don't like the annotation name logging.csp.vmware.com/fluentd-configmap

Use --annotation=acme.com/fancy-config to use acme.com/fancy-config as the annotation name. However, you'd also need to customize the Helm chart. Patches are welcome!
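
For example, the flag passed to config-reloader, plus the matching namespace annotation (the configmap name "my-config" is illustrative):

# daemon flag
config-reloader --annotation=acme.com/fancy-config ...

# namespace annotation pointing at a configmap named "my-config"
kubectl annotate namespace demo acme.com/fancy-config=my-config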

Known Issues

Currently, space-delimited tags are not supported. For example, instead of <filter a b>, you need to use <filter a> and <filter b>. This limitation will be addressed in a later version.
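
As a workaround, repeat the directive body once per tag (the tags and filter body are illustrative):

# not yet supported:
# <filter a b>
#   ...
# </filter>

# use two directives with identical bodies instead:
<filter a>
  @type stdout
</filter>

<filter b>
  @type stdout
</filter>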

Releases

Resources

Contributing

The kube-fluentd-operator project team welcomes contributions from the community. If you wish to contribute code and you have not signed our contributor license agreement (CLA), our bot will update the issue when you open a Pull Request. For any questions about the CLA process, please refer to our FAQ. For more detailed information, refer to CONTRIBUTING.md.

kube-fluentd-operator's Issues

'add_labels' in mounted-file does not work

Sorry for my late confirmation. I found a bug in add_labels (#26).

Looking at the fluent.conf generated using add_labels, it seems that the same tag is attached to the tail directive for separate add_labels entries.
As a result, record_modifier is executed twice: remove_keys in record_modifier deletes the first dummy_, and only the result of applying the second record_modifier remains.

configmap

<source>
  @type mounted-file
  path /var/log/hello1.log
  labels msg=hello,_container=greeter-hoge
  add_labels fluent1=app1
</source>

<source>
  @type mounted-file
  path /var/log/hello2.log
  labels msg=hello,_container=greeter-hoge
  add_labels fluent2=app2
</source>

<source>
  @type mounted-file
  path /var/log/hello3.log
  labels msg=hello,_container=greeter-foo
  add_labels fluent3=app3
</source>

<match $labels(fluent1=app1)>
  @type s3a
</match>

<match $labels(fluent2=app2)>
  @type gcp
</match>

<match $labels(fluent3=app3)>
  @type s3b
</match>
# fluent.conf
# dont modify this file when building on top of the image

<system>
  log_level info

  # needed to enable /api/config.reload
  rpc_endpoint 127.0.0.1:24444
</system>

# you can turn this on for debug
# <match fluent.**>
#   @type stdout
# </match>


# OS-level services
@include systemd.conf

# docker container logs
@include kubernetes.conf

# enrich docker logs with k8s metadata
@include kubernetes-postprocess.conf


#################
# Namespace pre-processing
#################
<source>
  @type tail
  path /var/lib/kubelet/pods/dde1207b-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello1.log
  pos_file /var/log/kfotail-15c6831538ad3f5b2acb634a88e34f00a22e72e7.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-hoge

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-hoge>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello1.log'; record['kubernetes']={'container_name'=>'greeter-hoge','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'dde1207b-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-9nh9n'}; record['container_info']='deac11d9b446470dd58bf76521a9196531b4d4b1'; record['kubernetes']['labels']={'app'=>'hello-logger','fluent1'=>'app1','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/e208e61a-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello1.log
  pos_file /var/log/kfotail-cc3928c6702394d76dafb56beed91fed759faa4e.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-hoge

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-hoge>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello1.log'; record['kubernetes']={'container_name'=>'greeter-hoge','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'e208e61a-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-dqngq'}; record['container_info']='aa33c2ab3e30423fbfe283fc7a11af676d1aac8a'; record['kubernetes']['labels']={'app'=>'hello-logger','fluent1'=>'app1','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/dde1207b-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello2.log
  pos_file /var/log/kfotail-55503e847b3e3e55f897e1cf2359489383a3501b.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-hoge

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-hoge>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello2.log'; record['kubernetes']={'container_name'=>'greeter-hoge','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'dde1207b-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-9nh9n'}; record['container_info']='3c64193f7c2c23a64887dcc5c431d91801365da1'; record['kubernetes']['labels']={'app'=>'hello-logger','fluent2'=>'app2','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/e208e61a-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello2.log
  pos_file /var/log/kfotail-2d274eb87f7c3f678e41319f9821772aec13d596.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-hoge

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-hoge>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello2.log'; record['kubernetes']={'container_name'=>'greeter-hoge','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'e208e61a-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-dqngq'}; record['container_info']='82d2a2e341b03b849218e382f7ecfc4f7c4d8fe9'; record['kubernetes']['labels']={'app'=>'hello-logger','fluent2'=>'app2','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/dde1207b-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello3.log
  pos_file /var/log/kfotail-a49942a7f58a82aab8a5dd5f1e49f12fd734482a.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-foo

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-foo>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello3.log'; record['kubernetes']={'container_name'=>'greeter-foo','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'dde1207b-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-9nh9n'}; record['container_info']='b694b49070e3e86272081f1253187f8f29c8e04c'; record['kubernetes']['labels']={'app'=>'hello-logger','fluent3'=>'app3','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/e208e61a-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello3.log
  pos_file /var/log/kfotail-0b4ab6ff645f6fffc5c0544f448efa4761de1c54.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-foo

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-foo>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello3.log'; record['kubernetes']={'container_name'=>'greeter-foo','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'e208e61a-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-dqngq'}; record['container_info']='28e431cf93a3c39aa3060cd833d668121143300a'; record['kubernetes']['labels']={'app'=>'hello-logger','fluent3'=>'app3','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>




#################
# Generated based on annotated kube-system namespace
#################
@include kube-system.conf
#################


#################
# Generated based on namespace annotations
#################
@include ns-test.conf

#################


<match **>
  # prevent fluentd from reporting every unmatched tag
  @type null
</match>
# ns-test.conf
<filter kube.test.*.*>
  @type record_transformer
  enable_ruby true

  <record>
    kubernetes_pod_label_values ${record.dig('kubernetes','labels','fluent1')&.gsub(/[.-]/, '_') || '_'}.${record.dig('kubernetes','labels','fluent2')&.gsub(/[.-]/, '_') || '_'}.${record.dig('kubernetes','labels','fluent3')&.gsub(/[.-]/, '_') || '_'}
  </record>
</filter>

<match kube.test.*.*>
  @type rewrite_tag_filter

  <rule>
    key kubernetes_pod_label_values
    pattern ^(.+)$
    tag ${tag}._labels.$1
  </rule>
</match>

<filter kube.test.*.*.**>
  @type record_transformer
  remove_keys kubernetes_pod_label_values
</filter>

<match kube.test.*.*._labels.app1.*.*>
  @type s3
  path logs/a/%Y/%m/%d/
  s3_bucket 2018-03-16-hslave4-log-skmt
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  s3_region ap-northeast-1
</match>

<match kube.test.*.*._labels.*.app2.*>
  @type google_cloud
</match>

<match kube.test.*.*._labels.*.*.app3>
  @type s3
  path logs/b/%Y/%m/%d/
  s3_bucket 2018-03-16-hslave4-log-skmt
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  s3_region ap-northeast-1
</match>

I think there is a problem with tag generation - how about a fix like this (5bce248)?

Allow labels in copy output

Currently there is a restriction that the relabel output can be used only in match. But when using the copy output, it is positioned in a store directive. This is a valid use case if you want to split the output and perform additional modifications.

How to use the kube-fluentd-operator to run separate instance of Fluentd - one for Infrastructure and another for Applications

Is your feature request related to a problem? Please describe.
We need a solution for Fluentd in a multitenant environment; the specific concern is whether we can restrict a separate Fluentd instance to each tenant in its own namespace.
A simple scenario: keep Infrastructure and Application pods separate and run separate instances of Fluentd - one for Infrastructure and another for Applications.

Describe the solution you'd like
Enhance the logging-operator to allow creating a separate instance of Fluentd per namespace.

Support generating a ServiceMonitor

Please add a feature to generate a ServiceMonitor in the helm chart.
For example, in values.yaml:

prometheusEnabled: true
ServiceMonitorEnabled: true

Allow log ingestion for log files on disk

Very often in legacy setups a container logs to the local filesystem as well as to stderr/stdout.
While Docker captures the output, it is not possible to get to those log files easily (for example an access log or a Java GC log).

For example this

<source>
  @type pod-file
  labels app=nginx, container=main
  path /var/log/nginx-access.log
</source>

should expand to something like this:

<source>
  @type tail
  path /var/lib/kubelet/pods/8e0f9442-41b5-11e8-a138-02b2be114bba/volumes/kubernetes.io~empty-dir/log/nginx-access.log

  tag {namespace}.{pod_name}.{container_name}
  # also, the stream field of the event must be set to "/var/log/nginx-access.log"
</source>

A few notes:

  • probably only emptyDir volumes will be supported
  • need to specify the --root-dir of the kubelet
  • need to extend the serviceaccount to also be able to read pods

Some logs of the router should be displayed on stdout, no?

Hi!

I'm trying to set some alerts from my ElasticSearch cluster when something happens on the stderr stream.

I just noticed that your Fluentd routers were writing to this stream for this kind of message:

{
  "_index": "xxxxxxx:2018.08.09",
  "_type": "fluentd",
  "_id": "zjHEIGUBfMVtb4xVhlCo",
  "_version": 1,
  "_score": null,
  "_source": {
    "log": "time=\"2018-08-09T22:17:45Z\" level=debug msg=\"Will not process namespace 'monitoring': configmaps \\\"fluentd-config\\\" not found\"\n",
    "stream": "stderr",
    "docker": {
      "container_id": "aa13f4ee0ccdaf70b7b3e0cd247caf27ae41618ae3e60970f1c91eec3d014878"
    },
    "kubernetes": {
      "container_name": "reloader",
      "namespace_name": "logging",
      "pod_name": "fluentd-log-router-qsd54",
      "container_image": "jvassev/kube-fluentd-operator:latest",
      "container_image_id": "docker-pullable://jvassev/kube-fluentd-operator@sha256:684d41b219573d27d7029ab3a031beb63c92f412aa7520134086f0e2a03d64e8",
      "pod_id": "3b32f90e-9c1a-11e8-9255-42010a800003",
      "labels": {
        "app": "log-router",
        "release": "fluentd"
      },
      "host": "XXXXXXXXXXXX"
    },
    "container_info": "aa13f4ee0ccdaf70b7b3e0cd247caf27ae41618ae3e60970f1c91eec3d014878-stderr",
    "@timestamp": "2018-08-09T22:17:45.692996268+00:00"
  },
  "fields": {
    "@timestamp": [
      "2018-08-09T22:17:45.692Z"
    ]
  },
  "highlight": {
    "stream": [
      "@kibana-highlighted-field@stderr@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1533853065692
  ]
}

Indeed, the router logs it at "debug" level, so I guess it should go to the stdout stream. What are your thoughts?

Thanks!

Tail arbitrary host file in /var/log ?

Having enabled the kube-apiserver audit log, I would like to send it to a specific index, but I can't find a working way to specify an arbitrary absolute host file, and I don't want to go back to plain ol' fluentd.
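For reference, in plain fluentd this would be a source along these lines (the audit log path, pos file, and tag are illustrative):

<source>
  @type tail
  path /var/log/kube-apiserver/audit.log
  pos_file /var/log/kfo-audit.pos
  tag audit.kube-apiserver
  <parse>
    @type none
  </parse>
</source>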
Maybe it's a human or doc issue :)
Thanks!

Duplicate pos file path in fluent.conf with mounted-file

I tried the settings below and the fluentd container went into CrashLoopBackOff.

Calculating the hash in https://github.com/vmware/kube-fluentd-operator/blob/master/config-reloader/processors/mounted_file.go#L110 does not seem to use the container id.

fluentd log

2018-08-27 06:58:06 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2018-08-27 06:58:06 +0000 [info]: adding rewrite_tag_filter rule: unit [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x007f8d06dd2878 @keys="unit">, /^(.+)$/, "", "systemd.$1"]
2018-08-27 06:58:06 +0000 [info]: adding rewrite_tag_filter rule: kubernetes_namespace_container_name [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x007f8d023005e8 @keys="kubernetes_namespace_container_name">, /^(.+)$/, "", "kube.$1"]
2018-08-27 06:58:07 +0000 [warn]: both of Plugin @id and path for <storage> are not specified. Using on-memory store.
2018-08-27 06:58:07 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error_class=Fluent::ConfigError error="Other 'in_tail' plugin already use same pos_file path: plugin_id = object:3fc680ff6dfc, pos_file path = /var/log/kfotail-e2ed04fd00276fdd0b2583231b9044c0c12c45b2.pos"

Deployment:

apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: hello-logger-deployment
  labels:
    app: hello-logger
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-logger
  template:
    metadata:
      labels:
        app: hello-logger
        msg: hello
    spec:
      containers:
      - image: ubuntu
        name: greeter-hoge
        command: [ "bash", "-c"]
        args:
        - |
          output1=/var/log/hello1.log
          output2=/var/log/hello2.log
          while true; do
            echo `date -R` [INFO] "Random hello1 number $((var1++)) to file" > ${output1}
            echo `date -R` [INFO] "Random hello2 number $((var2++)) to file" > ${output2}
            sleep 2;
            [[ $(($var1 % 100)) == 0 ]] && :> ${output1} ;
            [[ $(($var2 % 100)) == 0 ]] && :> ${output2} ;
          done
        volumeMounts:
        - mountPath: /var/log
          name: logs
      - image: ubuntu
        name: greeter-foo
        command: [ "bash", "-c"]
        args:
        - |
          output3=/var/log/hello3.log
          while true; do
            echo `date -R` [INFO] "Random hello3 number $((var3++)) to file" > ${output3}
            sleep 2;
            [[ $(($var3 % 100)) == 0 ]] && :> ${output3} ;
          done
        volumeMounts:
        - mountPath: /var/log
          name: logs
      volumes:
      - name: logs
        emptyDir: {}

configmap for fluent.conf

<source>
  @type mounted-file
  path /var/log/hello1.log
  labels msg=hello
</source>
 
<match **>
  @type null
</match>

/fluentd/etc/fluent.conf in the reloader (fluentd):

# dont modify this file when building on top of the image

<system>
  log_level info

  # needed to enable /api/config.reload
  rpc_endpoint 127.0.0.1:24444
</system>

# you can turn this on for debug
# <match fluent.**>
#   @type stdout
# </match>


# OS-level services
@include systemd.conf

# docker container logs
@include kubernetes.conf

# enrich docker logs with k8s metadata
@include kubernetes-postprocess.conf


#################
# Namespace pre-processing
#################
<source>
  @type tail
  path /var/lib/kubelet/pods/dde1207b-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello1.log
  pos_file /var/log/kfotail-e2ed04fd00276fdd0b2583231b9044c0c12c45b2.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-hoge

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-hoge>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello1.log'; record['kubernetes']={'container_name'=>'greeter-hoge','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'dde1207b-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-9nh9n'}; record['container_info']='deac11d9b446470dd58bf76521a9196531b4d4b1'; record['kubernetes']['labels']={'app'=>'hello-logger','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/dde1207b-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello1.log
  pos_file /var/log/kfotail-e2ed04fd00276fdd0b2583231b9044c0c12c45b2.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-foo

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-9nh9n.greeter-foo>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello1.log'; record['kubernetes']={'container_name'=>'greeter-foo','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'dde1207b-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-9nh9n'}; record['container_info']='deac11d9b446470dd58bf76521a9196531b4d4b1'; record['kubernetes']['labels']={'app'=>'hello-logger','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/e208e61a-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello1.log
  pos_file /var/log/kfotail-f2edf6c3f456e55f2a38a79fc1fa21e351e52603.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-hoge

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-hoge>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello1.log'; record['kubernetes']={'container_name'=>'greeter-hoge','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'e208e61a-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-dqngq'}; record['container_info']='aa33c2ab3e30423fbfe283fc7a11af676d1aac8a'; record['kubernetes']['labels']={'app'=>'hello-logger','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

<source>
  @type tail
  path /var/lib/kubelet/pods/e208e61a-a9bf-11e8-a89a-0a391dd99bb8/volumes/kubernetes.io~empty-dir/logs/hello1.log
  pos_file /var/log/kfotail-f2edf6c3f456e55f2a38a79fc1fa21e351e52603.pos
  read_from_head true
  tag kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-foo

  <parse>
    @type none
  </parse>
</source>

<filter kube.test.hello-logger-deployment-7c5c7759fb-dqngq.greeter-foo>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/hello1.log'; record['kubernetes']={'container_name'=>'greeter-foo','host'=>'ip-10-0-4-93.ap-northeast-1.compute.internal','namespace_name'=>'test','pod_id'=>'e208e61a-a9bf-11e8-a89a-0a391dd99bb8','pod_name'=>'hello-logger-deployment-7c5c7759fb-dqngq'}; record['container_info']='aa33c2ab3e30423fbfe283fc7a11af676d1aac8a'; record['kubernetes']['labels']={'app'=>'hello-logger','msg'=>'hello','pod-template-hash'=>'3717331596'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>




#################
# Generated based on annotated kube-system namespace
#################
@include kube-system.conf
#################


#################
# Generated based on namespace annotations
#################
@include ns-test.conf

#################


<match **>
  # prevent fluentd from reporting every unmatched tag
  @type null
</match>

Support externalized secrets

In pure Fluentd one can use #{ENV['LOGZIO_TOKEN']} to get a value from the environment.

This is not usable in a multi-tenant setup like kube-fluentd-operator.

Instead, kube-fluentd-operator would need to support a similar syntax, for example:
#{SECRET['my-secret']['token']}. This would presumably get a token from a secret named my-secret in the same namespace.

The SECRET syntax would be expanded as early as possible in the processor chain.
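For example, a namespace config could then look like this (the logzio output is used only for illustration):

<match **>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=#{SECRET['my-secret']['token']}&type=log-router
</match>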

Fluentd-reloader operator eats all inodes on kubernetes worker node

We have jvassev/kube-fluentd-operator:v1.11.0

The reloader creates too many SERVERENGINE_SOCKETMANAGER files in /tmp.
For example, it made 14 files in 10 seconds:

root@kfo-log-router-9b2zr:/tmp# for x in {1..10}; do ls -laht | grep SERVERENGINE_SOCKETMANAGER| wc -l; sleep 1; done
97
99
100
102
104
105
107
108
110
111

After a month we ran out of inodes on the kubernetes workers.

The top inode users were the overlay directories on the node (screenshot omitted).

It is the reloader:

[root@worker3 tmp]# docker ps | grep bcfa98ff44f9
bcfa98ff44f9        0bd8e776310c                                                         "/bin/config-reloade…"   8 weeks ago         Up 8 weeks                              k8s_reloader_kfo-log-router-56qn8_fluentd-operator_f2579948-3c51-11ea-8f5a-0050562c0157_0
[root@worker3 tmp]# docker inspect bcfa98ff44f9 | grep d140
                "LowerDir": "/var/lib/docker/overlay2/d140fa521fb344693fab9b795501ebaedcb677be4b0561d24373fa3b5fdb7f16-init/diff:/var/lib/docker/overlay2/655a53211a6a91c2e916417f0cbb88e273ec4c46b4f492cd86a8821a5421f317/diff:/var/lib/docker/overlay2/81342be5da4ee5d002b04661e7f0166742503a2e5fb9e8982b2bb574f298b4e7/diff:/var/lib/docker/overlay2/0c80e7fb9f84fdc2e661653d7f91b95736b90c10992fa730c7a56943b01c8640/diff:/var/lib/docker/overlay2/58fe7768413da32d52ac281e2d22abd42cce9aef517144a3dd17f789c6b807d3/diff:/var/lib/docker/overlay2/069c42d0c72a06cf21bdb140897927bc2fb336120820cafca6157222a545c184/diff:/var/lib/docker/overlay2/a4db1743b43ef1fa49e34be244f476faeacaaadf60375cc5ef2613f11ceef520/diff:/var/lib/docker/overlay2/c0bef6ffcdb724a3a778687059d73306fa5af7c9f11b005b5572f378711da53a/diff:/var/lib/docker/overlay2/108a82e633f29dea13b8bf91088b390660ef72f602198108639b8f4cc5b10d15/diff:/var/lib/docker/overlay2/84781d76100510a42903da4837ea3d90ca4170af8723f62dbf1f396a3579701f/diff:/var/lib/docker/overlay2/859be7a49ce3587672f9b3aae52cae133c1e270e7bc111eafa8145ce126d52a8/diff:/var/lib/docker/overlay2/8896e654f7f5b8231801ecf45ee8361ea69195e382b0e6e131d12cbba27439ce/diff:/var/lib/docker/overlay2/a22e0150f4d417d530c10461d35117868249394fb1fd53999a72cffa463790fe/diff:/var/lib/docker/overlay2/ec456f05b2fb9a8544392d1bd16afc8a8f27accb8db31697ce8ee6b53251198a/diff:/var/lib/docker/overlay2/50a2a0515f554ee4f6eb7ab4b472627ad864df1e69f264aeccd41334c70505fd/diff:/var/lib/docker/overlay2/9cb174d21b12f740525457f493ffd17a4bd156297745000eff291be75f703030/diff:/var/lib/docker/overlay2/85e01a55c547ad2f75cb60c4bbe87d511582852705773cb364a074719f6e1129/diff:/var/lib/docker/overlay2/b7a39a5abaf989a8b60eac4ba4c6b6bc8c70c7eb5019ec260a58ee2cf4896144/diff:/var/lib/docker/overlay2/93e4b2bf22f0ef6715c3fddfd1964823a68e203be4c0b2c4cc34d73a16d5cdb1/diff:/var/lib/docker/overlay2/24bef762aa1f88b6431d8208565a34f13bc3f7103d7f28e9f6bffea419859504/diff",
                "MergedDir": "/var/lib/docker/overlay2/d140fa521fb344693fab9b795501ebaedcb677be4b0561d24373fa3b5fdb7f16/merged",
                "UpperDir": "/var/lib/docker/overlay2/d140fa521fb344693fab9b795501ebaedcb677be4b0561d24373fa3b5fdb7f16/diff",
                "WorkDir": "/var/lib/docker/overlay2/d140fa521fb344693fab9b795501ebaedcb677be4b0561d24373fa3b5fdb7f16/work"

journal source by default

When installing with

CHART_URL='https://github.com/vmware/kube-fluentd-operator/releases/download/v1.8.0/log-router-0.3.0.tgz'

helm install --name kfo ${CHART_URL} \
  --set rbac.create=true \
  --set image.tag=v1.8.0 \
  --set image.repository=jvassev/kube-fluentd-operator

the kfo-log-router pods produce a lot of

2019-03-13 08:43:02 +0000 [warn]: #0 Systemd::JournalError: No such file or directory retrying in 1s
2019-03-13 08:43:03 +0000 [warn]: #0 Systemd::JournalError: No such file or directory retrying in 1s
2019-03-13 08:43:04 +0000 [warn]: #0 Systemd::JournalError: No such file or directory retrying in 1s
2019-03-13 08:43:05 +0000 [warn]: #0 Systemd::JournalError: No such file or directory retrying in 1s

because (?) the default /fluentd/etc/systemd.conf has:

# sync with
# https://raw.githubusercontent.com/fluent/fluentd-kubernetes-daemonset/master/docker-image/v0.12/debian-loggly/conf/systemd.conf

# Logs from systemd-journal for interesting services.
# https://github.com/reevoo/fluent-plugin-systemd/tree/v0.3.1
<source>
  @type systemd
  pos_file /var/log/kfo-log-router-fluentd-journald.pos
  path /var/log/journal
...

I don't understand why kube-fluentd-operator installed with helm has this kind of setting by default.
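One likely cause is that the node keeps its journal in /run/log/journal (volatile storage) rather than /var/log/journal, so the source finds nothing to read. A sketch of a corrected source, assuming that journal layout:

<source>
  @type systemd
  pos_file /var/log/kfo-log-router-fluentd-journald.pos
  path /run/log/journal
</source>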

Failed to expand log record

While fluentd-operator works great for most of my pods, the logs of some pods are lost. Instead, I receive the following error messages in the fluentd logs. So far I have not found a pattern for which pods are affected.

2018-08-07 08:41:30 +0000 [warn]: #0 dump an error event: error_class=RuntimeError error="failed to expand `record[\"kubernetes\"][\"labels\"][\"parselogs\"]&.gsub(/[.-]/, '_') || '_'` : error = undefined method `[]' for nil:NilClass" location="/var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/filter_record_transformer.rb:310:in `rescue in expand'" tag="kube.site-delta.oneshot-5c7hc.test" time=2018-08-07 08:40:50.846153013 +0000 record={"log"=>"Ciao Delta!\n", "stream"=>"stdout", "docker"=>{"container_id"=>"5229dd14d74e921a72c7f294c1b2766d6e8b57f68cedaf46b192c2be7d44ac2b"}, "kubernetes"=>{"container_name"=>"test", "namespace_name"=>"site-delta", "pod_name"=>"oneshot-5c7hc", "pod_id"=>"5229dd14d74e921a72c7f294c1b2766d6e8b57f68cedaf46b192c2be7d44ac2b", "namespace_labels"=>{"name"=>"site-delta"}}, "container_info"=>"5229dd14d74e921a72c7f294c1b2766d6e8b57f68cedaf46b192c2be7d44ac2b-stdout"}

In fact, the "kubernetes" section does not contain any labels, although the pod is labelled.

The namespace's fluentd-config is as simple as:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |-
    <filter $labels(parselogs=python)>
      @type parser
      key_name log
      format /^(?<time>[^\[]*) \[(?<level>[^\]]*)\] (?<logger>[^:]*):(?<message>.*)$/
    </filter>

    <match **>
      @type loggly
      loggly_url <my-loggly-url>
    </match>

With the following simple job I am sometimes able to reproduce the issue:

apiVersion: batch/v1
kind: Job
metadata:
  name: oneshot
  labels:
    component: logging-test
spec:
  parallelism: 5  # Not necessary
  completions: 5  # but might help to reproduce the issue
  template:
    metadata:
      labels:
        component: logging-test
    spec:
      restartPolicy: Never
      containers:
      - name: test
        image: <some_image>
        imagePullPolicy: Always
        command: ['bash']
        args:
          - '-c'
          - |
            set -eu
            echo "Hallo Delta!"
            sleep 10s
            echo "Ciao Delta!"

I use version 1.6.0 from log-router-0.2.3.tgz.

Error on startup v1.10.0

When I deploy the fluentd-operator to my cluster, I receive this error:

time="2019-09-13T15:57:57Z" level=fatal msg="Bad validate command used: '/usr/local/bin/fluentd -p /fluentd/plugins', either use correct one or none at all: invalid fluentd binary used /usr/local/bin/fluentd: fork/exec /usr/local/bin/fluentd: no such file or directory"

Not entirely sure of the source of this problem.

libjemalloc Arch Issue and Installed Already

I am working on porting this Helm Chart and the respective Docker containers to s390x. I noticed that in the Docker container you set ENV LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libjemalloc.so.1". This would not work for s390x since the path is slightly different: ENV LD_PRELOAD="/usr/lib/s390x-linux-gnu/libjemalloc.so.1". It is a small change to make and not a huge issue, but I am a little confused why you are installing libjemalloc again.

Looking at the Dockerfile you are pulling from, fluent/fluentd:v1.2.6-debian, you can see that they are already installing libjemalloc.

Here is the snippet:

 && wget -O /tmp/jemalloc-4.5.0.tar.bz2 https://github.com/jemalloc/jemalloc/releases/download/4.5.0/jemalloc-4.5.0.tar.bz2 \
 && cd /tmp && tar -xjf jemalloc-4.5.0.tar.bz2 && cd jemalloc-4.5.0/ \
 && ./configure && make \
 && mv lib/libjemalloc.so.2 /usr/lib \

From the looks of it, they are actually pulling in a newer version of libjemalloc than the one you can find via apt.

root@cfe3103b08f8:/usr/lib# apt search libjemalloc1
Sorting... Done
Full Text Search... Done
libjemalloc1/oldstable 3.6.0-9.1 s390x
  general-purpose scalable concurrent malloc(3) implementation

libjemalloc1-dbg/oldstable 3.6.0-9.1 s390x
  debug symbols for jemalloc

I am still pretty new to this project and still trying to figure things out, so please correct me if I am wrong. Is it that your application needs the version of libjemalloc you can get via apt?

Additionally, if the apt version is required, could you add a small fix to make it arch agnostic? The Docker container you're pulling from uses this snippet to achieve that: it moves libjemalloc to /usr/lib instead of /usr/lib/ARCH/ and then exports ENV LD_PRELOAD="/usr/lib/libjemalloc.so.1".

helm install fails

[centos@ip-172-31-4-30 ~]$

helm install --name kfo ./kube-fluentd-operator/log-router \
>   --set rbac.create=true \
>   --set image.tag=v1.11.0 \
>   --set image.repository=jvassev/kube-fluentd-operator

Error: release kfo failed: DaemonSet.apps "kfo-log-router" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app":"log-router", "release":"kfo"}: selector does not match template labels

Any reason why I get this error? How can I fix it?

Thanks, any help would be appreciated.

detect_exceptions plugin is missing

I installed using the Helm chart and the copy/paste instructions in the README. I have the following config file:

<filter **>
  @type detect_exceptions
  # you can skip language in which case all possible languages will be tried: go, java, python, ruby, etc...
  language java
</filter>

Which produces this error:

[fluentd-operator-log-router-rnwrj] 2018-10-24 20:57:07 +0000 [error]: #0 config error file="/fluentd/etc/fluent.conf" error_class=Fluent::ConfigError error="Unknown filter plugin 'detect_exceptions'. Run 'gem search -rd fluent-plugin' to find plugins"

Move docker image to org account

Artefacts shouldn't be hosted in an individual's account if the repo is owned by the org.

If using a vmware docker account is problematic, using the github docker registry with github actions might be a good alternative.

Update of fluent-plugin-google-cloud?

Hi,

fluent-plugin-google-cloud seems really outdated and I was wondering if you could update it? I ran into some trouble using it (GoogleCloudPlatform/fluent-plugin-google-cloud#364). I see the Gemfile in the repository, but I guess the version is specified inside the Gemfile.lock that was not committed, right?

Your README.md indeed says the version of this plugin is 0.4.10; note the current latest version is 0.7.27.

Thank you,

Splitting control plane from DaemonSet

Hi, this is some great stuff that I was missing for a long time. Have you considered splitting the "control plane", e.g. the config-reloader, out of the DaemonSet? It would be really powerful if one could use any compatible fluentd image with whatever plugins in a DaemonSet. The config reloader would take care of the configuration (and could run as a Deployment) while the DaemonSet does the heavy lifting; sync could be done through a ConfigMap/Secret (like prometheus-operator does). If that is a direction you would like to follow, I am happy to work on a PR.

Unknown filter plugin `parse`

For any of the examples in the documentation with @type parse, the reloader logs show this message:

error_class=Fluent::ConfigError error="Unknown filter plugin 'parse'. Run 'gem search -rd fluent-plugin' to find plugins"

One of the examples from the documentation:

<filter $labels(server=apache)>
  @type parse
  <parse>
    @type apache2
  </parse>
</filter>

<filter $labels(app=django)>
  @type detect_exceptions
  language python
</filter>

<match **>
  @type es
</match>

Any idea why that error hasn't shown up for you all?

Carriage return & New line

Hi,
I am using kfo to get container stdout logs from the file system. When KFO reads the log it also appends "\r\n" (sometimes only "\n") at the end of each "log" JSON field. Can you please fix the template to remove it, or suggest an interim fix?

{
"log": "2019-08-01 22:31:34.913 [INFO][19524] health.go 150 \n",
"stream": "stdout"
.....
}
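As an interim fix, a record_transformer filter can strip the trailing whitespace (a sketch, assuming the field is always named log):

<filter **>
  @type record_transformer
  enable_ruby true

  <record>
    log ${record['log'].to_s.rstrip}
  </record>
</filter>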

Thanks

Support multi-valued tags

Currently, to keep the parser code simple, this is not possible:

<match a b>
</match>

Users need to split their tag into two directives:

<match a>
</match>

<match b>
</match>

The usual space-delimited multi-valued tags should be supported too.

Support custom SSL certificates

Hi!

It would be great if we could define volumes through the Helm chart.

For example I'm using this fluentd configuration to send logs to Elasticsearch cluster over HTTPS connection:

...
  ssl_version TLSv1_2
  ca_file "/.../ca.crt"
  client_key "/.../es01.key"
  client_cert "/.../es01.crt"

These files are stored in a ConfigMap; maybe the chart could allow specifying a ConfigMap name and, if it exists, mount it as a volume?
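For context, here is a fuller sketch of such an output; the host, port, and mount path are illustrative:

<match **>
  @type elasticsearch
  host es.example.internal
  port 9243
  scheme https
  ssl_version TLSv1_2
  ca_file /etc/fluent/certs/ca.crt
  client_key /etc/fluent/certs/es01.key
  client_cert /etc/fluent/certs/es01.crt
</match>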

Thanks

Kube-system logs filtered out due to incorrect config

This is the config defined in the templates (config-reloader/templates/kubernetes-postprocess.conf):

# Parse logs in the kube-system namespace using the kubernetes formatter.
<filter kube.kube-system.**>
  @type parser
  reserve_data true
  key_name log
  emit_invalid_record_to_error false
  <parse>
    @type kubernetes
    time_format %FT%T%:z
  </parse>
</filter>

But this is the config generated by the fluentd-operator:

<filter kube.kube-system.**>
  @type parser
  reserve_data true
  key_name "log"
  emit_invalid_record_to_error false
  <parse>
    @type "kubernetes"
    expression /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/m
    time_format "%m%d %H:%M:%S.%N"
  </parse>
</filter>

If I remove the above config from the templates, build, and deploy it on the k8s cluster, logs from kube-system stream to the remote LogInsight. If the config is present in the template, kube-system logs aren't streaming.

Sample log (from kube-apiserver):

I0327 14:07:14.412834       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.system.antrea.tanzu.vmware.com

The expression is matching, but I doubt the time_format. Does the time_format require an update in the template?

Kube system namespace logging

Dear Team,

I am not sure if this is the right channel to raise an issue about kube-system namespace logging to CloudWatch; however, we are stuck in an environment where we want only the kube-system namespace logs to go to CloudWatch. Currently, as per the documentation, if we define the logging at the kube-system namespace it will also consider other namespaces for logging.
We have more than 50 namespaces and each has its own unique requirements for log configuration.

We want to collect only the kube-system namespace logs for administrative and auditing purposes; the namespace also runs additional admin apps besides the k8s components.

Could you please let us know if this is possible and, if yes, point to relevant documentation with an example?

Thanks in advance.

Extreme validation

Running fluentd with --dry-run only checks the general config format. If a plugin is misconfigured, it will still cause the daemon to exit.

Find a way to validate a config by actually running fluentd with the config file being validated and catching any errors.

jvassev/kube-fluentd-operator:v1.7.0 is missing

The documentation for installing the operator points to v1.7.0 but there is no docker image available. I have to choose latest or v1.6.0.

Can you push the image please? There is no documentation on Docker Hub either about how the image got built. A link to the Dockerfile would also be good.

default output plugin

As far as I can see, it's not possible to define a default output plugin for everything.
What I'm looking for is to define something like this centrally, after all the namespace-specific configuration:

<match **>
  @type elasticsearch
  host ...
</match>

In our use case, namespace or application owners shouldn't need to know about elasticsearch endpoints, credentials, or connection tuning parameters.
By default everything should be shipped as-is to a central elasticsearch (or any other output plugin).
Namespace owners can then optionally modify, discard, or augment events from their application using <filter> and <match> rules.

Is my assumption correct that this is not possible at the moment?
Bear with me, I'm new to fluentd and the configuration language is giving me headaches.

So I think what would help is a special annotation for configmaps/secrets in kube-system that signals that this config is to be included after the namespace-specific configuration in the main fluentd.conf file.

What do you think?

Attach original tag for 'mounted-file'

When using mounted-file the tag will be kube.{namespace}.{pod_name}.{container_name}, but I'd like to attach the original tag (kube.{namespace}.{pod_name}.{container_name}.mytag).

example:

<source>
  @type mounted-file
  path /var/log/hello.log
  labels msg=hello
  tag hello-log
</source>

-> kube.{namespace}.{pod_name}.{container_name}.hello-log

Why?

I'd like to separate the output processing in the container by tag.

example:

namespace: test
<source>
  @type mounted-file
  path /var/log/hello1.log
  labels msg=hello
  tag hello1-log
</source>
<source>
  @type mounted-file
  path /var/log/hello2.log
  labels msg=hello
  tag hello2-log
</source>

<match **.hello1-log>
  @type test123
</match>

<match **.hello2-log>
  @type test456
</match>
namespace: kube-system
<plugin test123>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=TOKEN&type=log-router
  buffer_path /some/path
</plugin>

<plugin test456>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=TOKEN&type=log-router
  buffer_path /some/path456
  buffer_size 1m
</plugin>

If this can be done with the current specification, please let me know.

Support of custom priorityClassName

It would be nice to allow defining a custom priorityClassName in the kfo DaemonSet in the helm chart.
For example, in daemonset.yaml:

{{- if .Values.priorityClassName }}
      priorityClassName: "{{ .Values.priorityClassName }}"
{{- end }}

values.yaml:

priorityClassName: "customCriticalClass"

no hostname/nodename information logged

I noticed that the events do not contain any hint about which node they were shipped from. This makes it very hard to reason about systemd logs (e.g. which kubelet emitted this error?).
I think it would be generally good to have the node name attached to every log line emitted. This should be doable using the downwardAPI and just handing the nodename via an env var/flag to the config generator.
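As a workaround sketch, assuming the DaemonSet injects the node name into a NODE_NAME env var via the downwardAPI, a filter in the kube-system config could stamp every record:

<filter **>
  @type record_transformer
  <record>
    node_name "#{ENV['NODE_NAME']}"
  </record>
</filter>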

Support for CRI-O container output format

When using any runtime other than docker, the kfo log router spams this error message:

00 [warn]: #0 [in_tail_container_logs] pattern not matched: \"2019-11-08T15:44:42.665533197Z stdout F 2019-11-08 15:44:42 +0000 [warn]: #0 [in_tail_container_logs] pattern not matched: \\\"2019-11-08T15:44:41.247410734Z stdout F ...

(each unmatched warning is itself re-ingested and re-escaped, so the remainder of the line degenerates into an ever-growing run of backslashes; truncated here)

This has been resolved by the fluentd community, see:
cri-o/cri-o#897

How can we apply this patch to this operator?

Many thanks,

Config Reloading Process

Hi guys, thanks for this great project. I want to know whether config-reloader restarts the fluentd process or only reloads the configuration without restarting it.

Thanks

Plugin macros does not preserve defined buffer configuration

I have defined this macro in kube-system:

<plugin elastic>
  @type elasticsearch
  host elastic-nodes
  port 9200
  <buffer>
     @type file
     path /var/log/fluentd.*.buffer
  </buffer>      
</plugin>

and in a namespace:

<match **>
   @type elastic
</match>

The generated config is:

<match kube.namespace.*.*._labels.>
  @type elasticsearch
  host elastic-nodes
  port 9200
</match>

I expect the macro to keep the buffer configuration.
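That is, I expect the generated config to keep the <buffer> section (sketch):

<match kube.namespace.*.*._labels.>
  @type elasticsearch
  host elastic-nodes
  port 9200
  <buffer>
    @type file
    path /var/log/fluentd.*.buffer
  </buffer>
</match>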

Plugin macros

Following the discussion in #17, a macro for output plugins is needed. Consider this example:

kube-system.conf:
<plugin default_output>
  @type es
  username admin
  password s3cret
  buffer_size 1m
</plugin>

demo.conf:
<match **>
  @type default_output
  buffer_size 5m
</match>

When processing the config for the demo namespace, the @type default_output should be replaced with the definition of the plugin in the kube-system namespace:

demo.conf (after processing):
<match **>
  @type es
  username admin
  password s3cret
  buffer_size 5m
</match>

All params defined at the call site override the parameters in the plugin definition. Also, all post-processing rules should be applied: .pos file path rewriting, tag validation, etc.

Syslog-based plugins fail because tag length is too long

We've tried both the papertrail and the remote_syslog plugins; both use the syslog_protocol gem, which follows RFC 3164 and limits tag length to 32 characters. We were getting this stack trace in our fluentd container logs:

2018-06-01 16:48:36 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2018-06-01 16:48:37 +0000 chunk="56d975adcff66f1a1e87fce8bdd3305a" error_class=ArgumentError error="Tag must not be longer than 32 characters"
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/syslog_protocol-0.9.2/lib/syslog_protocol/packet.rb:49:in `tag='
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-papertrail-0.2.4/lib/fluent/plugin/out_papertrail.rb:64:in `create_packet'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-papertrail-0.2.4/lib/fluent/plugin/out_papertrail.rb:36:in `block in write'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/event.rb:323:in `each'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/event.rb:323:in `block in each'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/event.rb:322:in `each'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-papertrail-0.2.4/lib/fluent/plugin/out_papertrail.rb:34:in `write'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/compat/output.rb:131:in `write'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/output.rb:1096:in `try_flush'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/output.rb:1329:in `flush_thread_run'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/output.rb:439:in `block (2 levels) in start'
  2018-06-01 16:48:36 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

When we log out the tag it is:

kube.demo.nginx-65899c769f-5zj6d.nginx

which is indeed longer than 32 characters. This seems like a bug in the operator since it is creating these extra long tags. Perhaps they can be truncated? An alternative solution might be to support an output plugin for syslog that uses RFC 5424 and structured data for this info?

cc/ @wfernandes @ahevenor @st3v

Enable cross-namespace log sharing

Very often logs from a shared component need to be available to many consumers, for example access logs from an ingress controller.

The owner of the ingress controller should be able to share a portion of the log stream to any namespace that deploys ingress resources.

The log receiver must be able to ignore or persist/process this peer traffic.
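A sketch of what the producer side might look like; the share store and its with_namespace parameter are hypothetical, not an existing KFO feature:

<match $labels(app=nginx-ingress)>
  @type copy
  <store>
    # hypothetical output that mirrors the stream into another namespace
    @type share
    with_namespace team-a
  </store>
  <store>
    @type null
  </store>
</match>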
