zalando-incubator / kubernetes-log-watcher

Kubernetes log watcher for Scalyr and AppDynamics

License: MIT License

Python 90.22% Dockerfile 0.17% Jinja 9.61%

cloud kubernetes logging monitoring scalyr

kubernetes-log-watcher's People

open-source-archive femueller linki eicnix dryewo christianberg pheanex boomskats syllogy

kubernetes-log-watcher's Issues

Support custom parsers for log targets (Scalyr)

An application often needs a specific parser other than the default JSON parser.

The parser could be declared on the pod and read by the watcher via labels or annotations.

labels:
    scalyr-parser: "my-app-my-custom-parser"  # meh!
annotations:
    <namespace>/scalyr-parser: '[{"container": "my-app", "parser": "my-custom-parser"}]'
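A minimal sketch of how the annotation form above might be read, assuming the annotation value is a JSON list of `{"container": ..., "parser": ...}` objects as in the example. The function name `parser_for_container`, its parameters, and the `example.org/` annotation prefix are hypothetical and not taken from the project:

```python
import json


def parser_for_container(annotations, container_name, annotation_key, default="json"):
    """Return the parser configured for a container, or the default.

    `annotations` is the pod's annotation dict; `annotation_key` is the
    full annotation name (its namespace prefix is left to the caller).
    """
    raw = annotations.get(annotation_key)
    if not raw:
        return default
    try:
        entries = json.loads(raw)
    except ValueError:
        # Malformed annotation: fall back rather than break log shipping.
        return default
    if not isinstance(entries, list):
        return default
    for entry in entries:
        if isinstance(entry, dict) and entry.get("container") == container_name:
            return entry.get("parser", default)
    return default


# Hypothetical usage with a made-up annotation prefix.
annotations = {
    "example.org/scalyr-parser": '[{"container": "my-app", "parser": "my-custom-parser"}]'
}
print(parser_for_container(annotations, "my-app", "example.org/scalyr-parser"))
```

Falling back to the default on malformed input is a design choice of this sketch; a real implementation might prefer to log a warning as well.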

Dynamic loading of log processing agents

Implement the ability to load any agent processor (plugin).

Persist checkpoint file

Scalyr agent uses a checkpoint file to avoid log duplication. This should be highlighted/used in a Kubernetes cluster since the file should survive agent container restarts.

Add support for custom Scalyr server

In Scalyr config:

"scalyr_server": "https://custom-scalyr-server"

Scalyr: use rename_logfile instead of symlinks

Will enhance the overview of log files (no useless container IDs in log file names)

https://github.com/scalyr/scalyr-agent-2/releases/tag/v2.024

Make Journald monitor write rate limit configurable

Current settings:

"monitor_log_write_rate": 10000,
"monitor_log_max_write_burst": 200000
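One way this could be made configurable is to read the two values from the environment and fall back to the current settings. This is only a sketch: the environment variable names `WATCHER_SCALYR_JOURNALD_WRITE_RATE` and `WATCHER_SCALYR_JOURNALD_WRITE_BURST` and the function name are assumptions, not existing options of the watcher.

```python
import os

# Current hardcoded settings, used as defaults.
DEFAULT_WRITE_RATE = 10000
DEFAULT_MAX_WRITE_BURST = 200000


def journald_rate_limits(env=None):
    """Build the rate-limit part of the journald monitor config.

    The environment variable names below are hypothetical.
    """
    env = os.environ if env is None else env
    return {
        "monitor_log_write_rate": int(
            env.get("WATCHER_SCALYR_JOURNALD_WRITE_RATE", DEFAULT_WRITE_RATE)
        ),
        "monitor_log_max_write_burst": int(
            env.get("WATCHER_SCALYR_JOURNALD_WRITE_BURST", DEFAULT_MAX_WRITE_BURST)
        ),
    }


print(journald_rate_limits({"WATCHER_SCALYR_JOURNALD_WRITE_RATE": "50000"}))
```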

Scalyr: Use parse_lines_as_json for docker containers

As described here: https://github.com/scalyr/scalyr-agent-2/releases/tag/v2.0.29

Also suggested here: zalando-incubator/kubernetes-on-aws#620

Add node to Journald attributes

The cluster node would be useful for filtering.

Adjust rate limits of journald monitor

Suggested fix:

{
    "monitor_log_write_rate": 10000,
    "monitor_log_max_write_burst": 200000
}

Scalyr: pod labels and annotations extraction fail

The issue is intermittent, but for some pods created by cronjobs, log-watcher fails to get the labels and annotations. This breaks custom parsing and log search based on the application or version fields.

Add support to run standalone

Push to pypi

Add docs

Doc strings + README

Gaps in shipped logs to Scalyr

At times I've found logs for pods through the Kubernetes dashboard UI or kubectl logs that have not been shipped to Scalyr, even though more recent log lines of the same instance made it up without an issue.

There seem to be no problems while pods are idle, but when things get busy, log lines do get missed.

Missing Logs from Containers in a CrashLoop

I've noticed that logs from containers that have been in a crash loop do not get collected. The observed behavior is that, when the crash loop is detected, a new "pause" Docker container (whose logs get ignored) is created under a different container ID, and the logs of the prior crashing container are somehow abandoned.

Test crash-looping container:

apiVersion: v1
kind: Pod
metadata:
  name: crashloop
  labels:
    application: crashloop
    version: '1.0'
spec:
  containers:
  - name: container
    image: python:3.5
    command: ["/bin/sh"]
    args: ["-c", "sleep 3; echo 'hello'; exit 1;"]

We currently have our logs going up to Scalyr.

Switch to pykube-ng

As the pykube module is no longer maintained (archived), we should switch to @hjacobs's fork, pykube-ng.

It also includes changes that we hardcoded into this project, such as HTTP retries (#72), which are already resolved in pykube-ng.

Scalyr: use group_logs_by in configuration

Allow grouping logs in UI by specific field (e.g. application, cluster, etc...)

Automate docker build

We need to automate the Docker build of agent master.

Allow updating Scalyr API key

If the Scalyr API key is changed while an agent.json already exists, the initContainer doesn't update the API key. This causes the scalyr-agent container to fail continuously.

initContainer should update the key, while keeping agent.json log targets intact.
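The intended behavior could be sketched as follows, assuming agent.json is strict JSON (no comments) and stores the key under a top-level `api_key` field, as in the init script used by this project. The function name `update_api_key` is hypothetical:

```python
import json


def update_api_key(config_path, api_key):
    """Set `api_key` in an existing agent.json, leaving all other
    fields (including the `logs` targets) untouched.

    Returns True if the file was rewritten, False if the key was
    already up to date.
    """
    with open(config_path) as f:
        config = json.load(f)

    if config.get("api_key") == api_key:
        return False

    config["api_key"] = api_key
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return True
```

Rewriting only when the key differs keeps the file's modification time stable, which may matter if the agent watches the file for changes.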

Ref issue: zalando-incubator/kubernetes-on-aws#415

Add support to load config file

A config file would provide better and more flexible structure for describing configuration for the watcher and its agents (compared to env variables)

Making required labels configurable

We are evaluating the kubernetes-log-watcher but our label structure doesn't match application and version.
Therefore I would like to tackle this todo: Support extending (overriding) constraints (e.g. require application, version and build labels to monitor the container)

Is backwards compatibility necessary? Otherwise I would reuse the existing WATCHER_STRICT_LABELS environment variable to accept a list of labels that must be set for the pod to be monitored.

example: WATCHER_STRICT_LABELS: application,application
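A rough sketch of the proposed behavior, assuming the variable holds a comma-separated list of label names. The helper names `parse_required_labels` and `should_monitor` are hypothetical and not part of the watcher:

```python
def parse_required_labels(value):
    """Split a comma-separated list of label names.

    Blank entries are dropped and duplicates are removed while
    preserving the original order.
    """
    names = (name.strip() for name in value.split(","))
    return list(dict.fromkeys(name for name in names if name))


def should_monitor(pod_labels, required_labels):
    """A pod is monitored only if every required label is set on it."""
    return all(label in pod_labels for label in required_labels)


required = parse_required_labels("team, component")
print(should_monitor({"team": "a", "component": "api"}, required))
print(should_monitor({"team": "a"}, required))
```

With an empty list every pod is monitored, which would correspond to not enforcing any label constraint.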

Support multiple namespaces

Right now only pods in the default namespace are queried.

Add Contributing guidelines

Add cluster ID to log fields

The cluster ID must be set as a config variable.

Scalyr: Allow sampling rules for containers

Add support for Scalyr sampling rules to Pod containers. The implementation should be similar to Scalyr parsers support.

Example:

kubernetes-log-watcher/scalyr-sampling-rules: '[{"container": "my-container", "sampling-rules": [{"match_expression": "<expression here>", "sampling_rate": 0}]}]'

Init-container for scalyr daemon set broken

Looks like this was introduced in #32, around here: bf8d34d#diff-88b99bb28683bd5b7e3a204826ead112R125

This was a little tricky to debug, but here's a cleaned up version of that block with fixed yaml/json character escapes that worked for me and @chen-anders:

          pod.beta.kubernetes.io/init-containers: |
            [
              {
                "name": "init-scalyr-config",
                "image": "busybox",
                "imagePullPolicy": "IfNotPresent",
                "command": ["sh", "-c"],
                "args": [
                  "set -xe; if [ ! -f /mnt/scalyr/agent.json ]; then\n                    echo '{\n                      \"import_vars\": [\"WATCHER_SCALYR_API_KEY\", \"WATCHER_CLUSTER_ID\"],\n                      \"server_attributes\": {\"serverHost\": \"$WATCHER_CLUSTER_ID\"},\n                      \"implicit_agent_process_metrics_monitor\": false,\n                      \"implicit_metric_monitor\": false,\n                      \"api_key\": \"$WATCHER_SCALYR_API_KEY\",\n                      \"monitors\": [],\n                      \"logs\": []\n                      }' > /mnt/scalyr/agent.json;\n                      echo Updated agent.json to inital configuration;\n                  fi &&\n                  cat /mnt/scalyr/agent.json;\n                  test -f /mnt/scalyr-checkpoint/checkpoints.json && ls -lah /mnt/scalyr-checkpoint/checkpoints.json && cat /mnt/scalyr-checkpoint/checkpoints.json || true"
                ],
                "volumeMounts": [
                  {
                    "name": "scalyr-config",
                    "mountPath": "/mnt/scalyr"
                  },
                  {
                    "name": "scalyr-checkpoint",
                    "mountPath": "/mnt/scalyr-checkpoint"
                  }
                ]
              }
            ]

Some of those long spans of whitespace can be collapsed, and the \ns in that long shell script can probably be replaced with ;s, but this is the first version that didn't error out for us.

Make application and version optional

As long as there is no real constraint on deployments, I do not see a reason why one should not simply send all logs to Scalyr.

application should default to, e.g., the Docker image.

version should just be the Docker image version.

This would be a pretty good default, imho.
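The proposed defaults could be derived from the container image reference roughly like this. It is a sketch under assumptions: the function name `defaults_from_image` is hypothetical, the application is taken to be the last path component of the image name, and an image without a tag is treated as `latest`.

```python
def defaults_from_image(image):
    """Derive (application, version) defaults from an image reference.

    Handles an optional registry host with port and an optional digest.
    """
    # Drop a digest suffix such as "@sha256:...".
    name = image.partition("@")[0]

    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry host:port.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        repository, version = name[:colon], name[colon + 1:]
    else:
        repository, version = name, "latest"

    application = repository.rsplit("/", 1)[-1]
    return application, version


print(defaults_from_image("registry.example.org:5000/team/my-app:1.2"))
```

Explicit application and version labels on the pod would still take precedence over these derived values.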