zalando-incubator / kubernetes-log-watcher
Kubernetes log watcher for Scalyr and AppDynamics
License: MIT License
An application often needs a specific parser other than the default json.
The required parser could be read from labels or annotations.
labels:
  scalyr-parser: "my-app-my-custom-parser" # meh!
annotations:
  <namespace>/scalyr-parser: '[{"container": "my-app", "parser": "my-custom-parser"}]'
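A minimal sketch of how the annotation form could be read, assuming the annotation value is a JSON list of container/parser pairs as in the example above. The function names are hypothetical, and the annotation key is passed in by the caller because the namespace prefix is not fixed here.

```python
import json

DEFAULT_PARSER = "json"  # the default parser mentioned above


def parsers_from_annotation(annotations, key):
    """Return {container_name: parser} from a JSON-list annotation.

    Malformed or missing annotations yield an empty mapping so that
    callers fall back to the default parser.
    """
    raw = annotations.get(key)
    if not raw:
        return {}
    try:
        entries = json.loads(raw)
    except ValueError:
        return {}
    if not isinstance(entries, list):
        return {}
    result = {}
    for entry in entries:
        if isinstance(entry, dict) and "container" in entry and "parser" in entry:
            result[entry["container"]] = entry["parser"]
    return result


def parser_for(container_name, annotations, key):
    """Pick the parser for one container, defaulting to DEFAULT_PARSER."""
    return parsers_from_annotation(annotations, key).get(container_name, DEFAULT_PARSER)
```

Keying the mapping by container name lets a pod with several containers use a different parser for each, which a single label value cannot express.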
Implement ability to load any agent processor (plugin)
Scalyr agent uses a checkpoint file to avoid log duplication. This should be highlighted/used in a Kubernetes cluster since the file should survive agent container restarts.
In Scalyr config:
"scalyr_server": "https://custom-scalyr-server"
Timeouts are not properly set.
Will enhance the overview of log files (no useless container IDs in log file names)
https://github.com/scalyr/scalyr-agent-2/releases/tag/v2.024
Current settings:
"monitor_log_write_rate": 10000,
"monitor_log_max_write_burst": 200000
Add support for scalyr agent.
As described here: https://github.com/scalyr/scalyr-agent-2/releases/tag/v2.0.29
Also suggested here: zalando-incubator/kubernetes-on-aws#620
Cluster node
will be useful in filtering
kubernetes
iso k8s
suggested fix:
{
"monitor_log_write_rate": 10000,
"monitor_log_max_write_burst": 200000,
}
The issue occurs inconsistently, but in some pods created by cronjobs, log-watcher fails to get the labels and annotations, which breaks custom parsing and log search based on the application or version fields.
Push to pypi
Doc strings + README
There have been times where I've found logs for pods through the kubernetes dashboard UI or kubectl logs that have not been shipped to Scalyr, even though more recent log lines of that instance made it up without an issue.
There seem to be no problems while pods are idle, but when things get busy, some log lines appear to be missed.
I've noticed that logs from containers that may have been in a crash-loop do not get collected. The observed behavior is that a new "pause" docker container (whose logs get ignored) gets created under a different container ID (in docker) when the crashloop is detected and that somehow the logs of the prior crashing container are abandoned.
Test crash-looping container:
apiVersion: v1
kind: Pod
metadata:
  name: crashloop
  labels:
    application: crashloop
    version: '1.0'
spec:
  containers:
  - name: container
    image: python:3.5
    command: ["/bin/sh"]
    args: ["-c", "sleep 3; echo 'hello'; exit 1;"]
We currently have our logs going up to Scalyr.
Allow grouping logs in UI by specific field (e.g. application, cluster, etc...)
We need to automate docker build of agent master
If the Scalyr API key is changed while an agent.json already exists, the initContainer doesn't update the API key. This causes the scalyr-agent container to fail continuously.
The initContainer should update the key while keeping the agent.json log targets intact.
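A minimal sketch of the requested behaviour, assuming agent.json is strict JSON that Python's json module can parse. The function name refresh_api_key is hypothetical; only the api_key field is rewritten and every other field, including logs, is left as it was.

```python
import json


def refresh_api_key(config_text, api_key):
    """Return agent.json content with api_key replaced.

    All other fields (logs, monitors, server_attributes, ...) are
    carried over unchanged.
    """
    config = json.loads(config_text)
    config["api_key"] = api_key
    return json.dumps(config, indent=2)
```

An init step could read the existing file, pass its content through this function together with the current key, and write the result back, instead of skipping the file whenever it already exists.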
Ref issue: zalando-incubator/kubernetes-on-aws#415
A config file would provide better and more flexible structure for describing configuration for the watcher and its agents (compared to env variables)
We are evaluating the kubernetes-log-watcher but our label structure doesn't match application and version.
Therefore I would like to tackle this todo: Support extending (overriding) constraints (e.g. require application, version and build labels to monitor the container)
Is backwards compatibility necessary? Otherwise I would reuse the existing WATCHER_STRICT_LABELS environment variable to accept a list of labels that must be set for the pod to be monitored.
example: WATCHER_STRICT_LABELS: application,application
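A rough sketch of how such a comma-separated value could be interpreted. This is an illustration of the proposal rather than the watcher's current behaviour, and both function names are hypothetical.

```python
def required_labels(env_value):
    """Split a comma-separated label list.

    Blank entries and duplicates are dropped; order is preserved.
    """
    names = []
    for name in (env_value or "").split(","):
        name = name.strip()
        if name and name not in names:
            names.append(name)
    return names


def should_monitor(pod_labels, required):
    """A pod qualifies only if every required label is present and non-empty."""
    return all(pod_labels.get(name) for name in required)
```

With an empty list every pod qualifies, so an unset variable would behave like "no constraints" under this sketch.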
Right now only pods in the default namespace are queried.
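For illustration, the Kubernetes core API exposes one pod list endpoint per namespace and one across all namespaces. The helper below is a hypothetical sketch that only builds the request path; how the watcher actually issues the request is not shown here.

```python
def pods_path(namespace=None):
    """Return the core-API path for listing pods.

    With a namespace, only that namespace is listed; without one,
    pods from all namespaces are returned by the API server.
    """
    if namespace:
        return "/api/v1/namespaces/{}/pods".format(namespace)
    return "/api/v1/pods"
```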
Add Contributing guidelines
Must be set as config var
Add cluster ID to log fields
Add support for Scalyr sampling rules to Pod containers. The implementation should be similar to Scalyr parsers support.
Example:
kubernetes-log-watcher/scalyr-sampling-rules: '[{"container": "my-container", "sampling-rules":[{ match_expression: "<expression here>", sampling_rate: 0 }]}]'
Looks like this was introduced in #32, around here: bf8d34d#diff-88b99bb28683bd5b7e3a204826ead112R125
This was a little tricky to debug, but here's a cleaned up version of that block with fixed yaml/json character escapes that worked for me and @chen-anders:
pod.beta.kubernetes.io/init-containers: |
  [
    {
      "name": "init-scalyr-config",
      "image": "busybox",
      "imagePullPolicy": "IfNotPresent",
      "command": ["sh", "-c"],
      "args": [
        "set -xe; if [ ! -f /mnt/scalyr/agent.json ]; then\n echo '{\n \"import_vars\": [\"WATCHER_SCALYR_API_KEY\", \"WATCHER_CLUSTER_ID\"],\n \"server_attributes\": {\"serverHost\": \"$WATCHER_CLUSTER_ID\"},\n \"implicit_agent_process_metrics_monitor\": false,\n \"implicit_metric_monitor\": false,\n \"api_key\": \"$WATCHER_SCALYR_API_KEY\",\n \"monitors\": [],\n \"logs\": []\n }' > /mnt/scalyr/agent.json;\n echo Updated agent.json to inital configuration;\n fi &&\n cat /mnt/scalyr/agent.json;\n test -f /mnt/scalyr-checkpoint/checkpoints.json && ls -lah /mnt/scalyr-checkpoint/checkpoints.json && cat /mnt/scalyr-checkpoint/checkpoints.json || true"
      ],
      "volumeMounts": [
        {
          "name": "scalyr-config",
          "mountPath": "/mnt/scalyr"
        },
        {
          "name": "scalyr-checkpoint",
          "mountPath": "/mnt/scalyr-checkpoint"
        }
      ]
    }
  ]
Some of those long spans of whitespace can be collapsed, and the \n sequences in that long shell script can probably be replaced with ; separators, but this is the first thing that didn't error out for us.
As long as there is no real constraint on deployments I do not see a reason why one should not simply send all logs to scalyr.
application should default to e.g. docker image
version should just be docker image version
this will be a pretty good default imho
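A small sketch of that default, assuming the container's image reference is available as a string. The function name defaults_from_image is hypothetical; an untagged image falls back to "latest", which is Docker's own convention.

```python
def defaults_from_image(image):
    """Derive (application, version) from an image reference.

    Handles registries with ports (host:5000/name:tag) by only treating
    a colon after the last slash as the tag separator.
    """
    name = image.split("@", 1)[0]  # drop a digest suffix, if any
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        name, tag = name[:last_colon], name[last_colon + 1:]
    else:
        tag = "latest"
    application = name.rsplit("/", 1)[-1]
    return application, tag
```

Explicit application and version labels would still take precedence; the image is only consulted when they are missing.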