
dashboard-linter's Introduction

Grafana Dashboard Linter

This tool is a command-line application that lints Grafana dashboards for common mistakes and suggests best practices. To install the linter and lint a dashboard, run:

$ go install github.com/grafana/dashboard-linter@latest
$ dashboard-linter lint dashboard.json

This tool is a work in progress and it's still very early days. The current capabilities are focused exclusively on dashboards that use a Prometheus data source.

See the docs for more detail.

dashboard-linter's People

Contributors

arajkumar, bentonam, clyang82, cristiangreco, dasomeone, dependabot[bot], dimitarvdimitrov, gaantunes, gotjosh, hdost, mrfreezeex, mshahzeb, reavessm, rgeyer, roidelapluie, suntala, tomwilkie, wuzuf, xorima, zeitlinger


dashboard-linter's Issues

Variable substitution in promql

It is perfectly valid for PromQL queries in a dashboard to use global variables and dashboard template variables throughout, including in range vector selectors.

Currently, PromQL that uses them fails to parse. The Kubernetes mixin, for example, uses them heavily on its networking panels in the form [$interval:$resolution], which results in errors like the one shown below.

[panel-promql-rule] 'Kubernetes / Networking / Cluster': Dashboard 'Kubernetes / Networking / Cluster', panel 'Rate of TCP Retransmits out of all sent segments' invalid PromQL query 'sort_desc(sum(rate(node_netstat_Tcp_RetransSegs{cluster="$cluster"}[$interval:$resolution]) / rate(node_netstat_Tcp_OutSegs{cluster="$cluster"}[$interval:$resolution])) by (instance))': 1:69: parse error: bad duration syntax: ""

The linter should replace all global variables with a sane, unique value that can be validated.

The linter should also replace all dashboard template variables with a sane placeholder value, so that the query can be parsed and evaluated as PromQL.

This was solved in the closed-source iteration of this dashboard linter in https://github.com/grafana/cloud-onboarding/pull/412, and something similar should be implemented here.
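A minimal sketch of the substitution idea in Go, assuming a two-pass approach: variables inside a [...] range or subquery selector must become a duration, and variables anywhere else can become a plain label value. The function name expandVariables and the placeholder values "5m" and "placeholder" are hypothetical choices, not the linter's API; the deprecated [[var]] syntax is not handled, and the result would still need to be handed to a real PromQL parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// varRef matches $name, ${name} and ${name:format} references,
// which also covers globals such as $__interval or ${__rate_interval}.
var varRef = regexp.MustCompile(`\$\{(\w+)(?::\w+)?\}|\$(\w+)`)

// rangeSel matches a single [...] range or subquery selector.
var rangeSel = regexp.MustCompile(`\[[^\[\]]*\]`)

// expandVariables swaps Grafana variables for placeholders so the result can
// be parsed as PromQL. Inside [...] a variable must be a duration, so it
// becomes "5m"; everywhere else it becomes "placeholder".
func expandVariables(expr string) string {
	// Pass 1: durations inside range and subquery selectors.
	expr = rangeSel.ReplaceAllStringFunc(expr, func(sel string) string {
		return varRef.ReplaceAllString(sel, "5m")
	})
	// Pass 2: every remaining variable, e.g. label matcher values.
	return varRef.ReplaceAllString(expr, "placeholder")
}

func main() {
	in := `sort_desc(sum(rate(node_netstat_Tcp_RetransSegs{cluster="$cluster"}[$interval:$resolution])) by (instance))`
	fmt.Println(expandVariables(in))
	// prints: sort_desc(sum(rate(node_netstat_Tcp_RetransSegs{cluster="placeholder"}[5m:5m])) by (instance))
}
```

A fixed duration keeps [$interval:$resolution] syntactically valid as a subquery; a real implementation might instead pick values from each variable's options.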

Constant template should not be flagged by linter

I have a constant variable in a dashboard, and it looks like it gets flagged by the linter:

{
  "hide": 2,
  "name": "instance",
  "query": ".*",
  "skipUrlSync": false,
  "type": "constant"
}
$HOME/go/bin/dashboard-linter lint mydashboard.json
[❌] Dashboard 'My dashboard' instance template should use datasource '$datasource', is currently ''

Should it be?
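One possible fix, sketched in Go under the assumption that only variable types which actually query a datasource ("query" and "adhoc") should be checked, while constant, custom, interval and textbox variables are skipped. The Template struct and both functions are hypothetical simplifications, not the linter's code; the message text mirrors the output above.

```go
package main

import "fmt"

// Template is a hypothetical, trimmed-down dashboard template variable.
type Template struct {
	Name       string
	Type       string
	Datasource string
}

// needsDatasource reports whether a variable type queries a datasource.
// Assumption: only "query" and "adhoc" variables do.
func needsDatasource(t Template) bool {
	switch t.Type {
	case "query", "adhoc":
		return true
	default:
		return false
	}
}

// checkTemplateDatasource returns a lint message, or "" when the variable passes.
func checkTemplateDatasource(t Template) string {
	if !needsDatasource(t) {
		return "" // e.g. constant variables have no datasource to check
	}
	if t.Datasource != "$datasource" {
		return fmt.Sprintf("%s template should use datasource '$datasource', is currently '%s'", t.Name, t.Datasource)
	}
	return ""
}

func main() {
	constant := Template{Name: "instance", Type: "constant"}
	fmt.Printf("%q\n", checkTemplateDatasource(constant)) // prints ""
}
```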

Linter throws an error when dashboard has multiple choice variables

Summary of the issue

When a dashboard has a multiple choice variable, the linter throws an error.

Cause of the issue

The TemplateValue struct expects both of its fields to be strings (https://github.com/grafana/dashboard-linter/blob/main/lint/lint.go#L35), but for multiple choice variables they are arrays:

  "templating": {
    "list": [
      {
        "current": {
          "selected": true,
          "text": [
            "All"
          ],
          "value": [
            "$__all"
          ]
        },
...

Expected behaviour

Linter shouldn't throw an error when the dashboard has multiple choice variables.

How to replicate

Add a multiple choice variable to your dashboard, for example:

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "iteration": 1649669111850,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "title": "Panel Title",
      "type": "timeseries"
    }
  ],
  "refresh": "",
  "schemaVersion": 35,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": true,
          "text": [
            "All"
          ],
          "value": [
            "$__all"
          ]
        },
        "hide": 0,
        "includeAll": true,
        "label": "Variable",
        "multi": true,
        "name": "query0",
        "options": [
          {
            "selected": true,
            "text": "All",
            "value": "$__all"
          },
          {
            "selected": false,
            "text": "foo",
            "value": "foo"
          },
          {
            "selected": false,
            "text": "bar",
            "value": "bar"
          }
        ],
        "query": "foo,bar",
        "queryValue": "",
        "skipUrlSync": false,
        "type": "custom"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "New dashboard",
  "version": 0,
  "weekStart": ""
}

Value validation

Hi,

Before building this feature, I wanted to ask others whether it belongs in this tool and whether it would even be valuable to them.

Problem I am trying to solve

There are values I want set in a specific way in my dashboards, for example:

  1. editable must always be false
  2. tags must come from a given list of approved tags (this avoids a lot of spelling and singular vs. plural issues)
  3. some variables must always be set to All
  etc.

Proposed solution

Add config options or CLI flags that enable additional checks against desired values, with appropriate rules. For example, we might have a rule called "dashboard is not editable" that is only enabled by a flag.

Alternatively, provide some way of saying "this JSON path should always look like this", because that is how things need to look in my setup. For example:

cfg:

valueMatchers:
  - path: "foo.bar"
    value:
      type: "exact"
      value: "hello world"
  - path: "tags"
    value:
      type: "in"
      list:
        - foo
        - bar

I'm sure I'm not explaining this 100% clearly, but feedback would be welcome.

FEATURE REQUEST: Support linting multiple dashboards

Request to support linting multiple dashboards by providing a directory path instead of a filename. This could be based on a -d flag or merely checking if the provided arg is a directory or not.

The code should loop through all the .json files in the directory and lint them. Perhaps a -r flag could enable recursive linting as well, picking up any .lint files along the way.

Linter will throw an error when an "Ad hoc filters" variable is used

Bug

When a dashboard contains an "Ad hoc filters" variable, the linter exits with the following error:

Error: failed to parse dashboard: invalid type for field 'query': <nil>

Expected behaviour

The code should be able to handle "ad hoc filters" and not throw an error when they exist.

Steps to reproduce

Create an "Ad hoc filters" variable in your dashboard and run the linter.
For example:

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "prometheus"
          },
          "exemplar": true,
          "expr": "",
          "interval": "",
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "title": "Panel Title",
      "type": "timeseries"
    }
  ],
  "schemaVersion": 35,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "filters": [],
        "hide": 0,
        "name": "query0",
        "skipUrlSync": false,
        "type": "adhoc"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "New dashboard",
  "version": 0,
  "weekStart": ""
}
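The error suggests the query field is type-checked as a string, while an adhoc variable has no query at all. A tolerant normalizer could look like the Go sketch below. It assumes the decoded field can be absent/null (adhoc variables), a plain string, or an object carrying the text under a "query" key, as newer Grafana Prometheus variable editors store it; queryText is a hypothetical helper, not the linter's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryText normalizes a template variable's decoded "query" field.
func queryText(raw any) (string, error) {
	switch q := raw.(type) {
	case nil:
		return "", nil // e.g. adhoc variables: nothing to lint
	case string:
		return q, nil
	case map[string]any:
		if s, ok := q["query"].(string); ok {
			return s, nil
		}
		return "", nil // object without a textual query
	default:
		return "", fmt.Errorf("invalid type for field 'query': %T", raw)
	}
}

func main() {
	var templating struct {
		List []map[string]any `json:"list"`
	}
	raw := `{"list":[{"name":"query0","type":"adhoc","filters":[]},{"name":"job","type":"query","query":"label_values(up, job)"}]}`
	if err := json.Unmarshal([]byte(raw), &templating); err != nil {
		panic(err)
	}
	for _, v := range templating.List {
		q, err := queryText(v["query"]) // a missing key yields nil, not a panic
		fmt.Printf("%v: %q %v\n", v["name"], q, err)
	}
}
```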

Golang CI Deprecations

Hey,

When running golangci-lint, I got the following output:

WARN [runner] The linter 'deadcode' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused. 
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused. 
WARN [runner] The linter 'ifshort' is deprecated (since v1.48.0) due to: The repository of the linter has been deprecated by the owner.  
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused. 

Non-Prometheus support (or at least exclusion)

When linting a dashboard that does not use Prometheus targets, the linter returns several errors.

These are not actually errors, they're just prometheus based rules being applied to non-prometheus targets and dashboards.

The linter should allow rules to opt in to the panel and target types they can effectively evaluate for best practices, and simply ignore the rest.

This would mean that running the dashboard linter against a dashboard for which it has no best practices (for example, query-specific rules for a Loki panel) would emit no errors.

In time, this would allow for rules specific to other datasources to be added.
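The opt-in idea could be sketched in Go as a rule interface in which each rule declares the datasource types it supports, and the runner skips everything else. The Rule interface, the Target struct and the example-promql-rule check are all hypothetical and deliberately trivial.

```go
package main

import "fmt"

// Target is a hypothetical, trimmed-down panel target.
type Target struct {
	DatasourceType string // e.g. "prometheus", "loki"
	Expr           string
}

// Rule is a hypothetical interface: each rule declares what it understands.
type Rule interface {
	Name() string
	Supports(datasourceType string) bool
	Check(t Target) error
}

// examplePromQLRule is a toy Prometheus-only rule.
type examplePromQLRule struct{}

func (examplePromQLRule) Name() string            { return "example-promql-rule" }
func (examplePromQLRule) Supports(ds string) bool { return ds == "prometheus" }
func (examplePromQLRule) Check(t Target) error {
	if t.Expr == "" {
		return fmt.Errorf("empty PromQL expression")
	}
	return nil
}

// lint runs only the rules that opt in to each target's datasource type.
func lint(rules []Rule, targets []Target) []string {
	var problems []string
	for _, t := range targets {
		for _, r := range rules {
			if !r.Supports(t.DatasourceType) {
				continue // silently skip rules that cannot evaluate this target
			}
			if err := r.Check(t); err != nil {
				problems = append(problems, fmt.Sprintf("[%s] %v", r.Name(), err))
			}
		}
	}
	return problems
}

func main() {
	targets := []Target{
		{DatasourceType: "loki"},       // skipped: the rule does not support Loki
		{DatasourceType: "prometheus"}, // checked and flagged
	}
	for _, p := range lint([]Rule{examplePromQLRule{}}, targets) {
		fmt.Println(p) // prints: [example-promql-rule] empty PromQL expression
	}
}
```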

Latest merge breaks CI builds

We have a build that pulls this repo, and the latest merge broke it because a go.mod replace directive was added here: https://github.com/grafana/dashboard-linter/pull/128/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6R77

The error in question:

go: github.com/grafana/dashboard-linter@latest (in github.com/grafana/[email protected]):
The go.mod file for the module providing named packages contains one or
more replace directives. It must not contain directives that would cause
it to be interpreted differently than if it were the main module.

Datasource selection priority

In reviewing korfuri/django-prometheus#318 I noticed a couple of things.

  1. The panel-datasource-rule is very specific about the panel types it checks (graph, singlestat, and table) and therefore excludes timeseries. I've created #13 for this.
  2. The django dashboard actually sets the datasource on each target of the panel, rather than once on the panel itself. Both are "technically" valid. Should the panel-datasource-rule accommodate this case, or should we be opinionated and treat different datasources on targets of the same panel as an anti-pattern? I believe we should consider it an anti-pattern, and that there should be a target-specific rule to check for it.

Thoughts @tomwilkie ?
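A target-specific rule along the lines proposed in point 2 might look like this Go sketch. It assumes a target without its own datasource inherits the panel's, so a panel that sets the same datasource on every target (the django case) passes, while targets resolving to different datasources are flagged. The types and checkMixedTargets are hypothetical; panels deliberately using Grafana's Mixed datasource would presumably need an exemption.

```go
package main

import "fmt"

// Datasource mirrors the {"type": ..., "uid": ...} object on panels and targets.
type Datasource struct {
	Type string
	UID  string
}

// Target and Panel are hypothetical, trimmed-down dashboard types.
type Target struct {
	RefID      string
	Datasource *Datasource // nil means "inherit the panel's datasource"
}

type Panel struct {
	Title      string
	Datasource *Datasource
	Targets    []Target
}

// effective resolves the datasource a target uses: its own if set, else the panel's.
func effective(p Panel, t Target) *Datasource {
	if t.Datasource != nil {
		return t.Datasource
	}
	return p.Datasource
}

// checkMixedTargets flags panels whose targets resolve to different datasources.
func checkMixedTargets(p Panel) error {
	var first *Datasource
	for _, t := range p.Targets {
		ds := effective(p, t)
		if ds == nil {
			continue // nothing resolved; another rule can report this
		}
		if first == nil {
			first = ds
			continue
		}
		if *ds != *first {
			return fmt.Errorf("panel %q: target %s uses %s/%s, expected %s/%s",
				p.Title, t.RefID, ds.Type, ds.UID, first.Type, first.UID)
		}
	}
	return nil
}

func main() {
	gmp := &Datasource{Type: "prometheus", UID: "${DS_GMP}"}
	other := &Datasource{Type: "prometheus", UID: "${DS_PROMETHEUS}"}
	p := Panel{Title: "Example", Datasource: gmp, Targets: []Target{{RefID: "A"}, {RefID: "B", Datasource: other}}}
	fmt.Println(checkMixedTargets(p))
}
```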

Bug - Strict reports errors found when quiet is in use

Bug:

When you run the linter with --strict, it reports failures even though nothing is shown as a failure in the output.

Analysis:

All of the Severity levels are iota-based, with Success being 0 and, importantly, Quiet being 4.

https://github.com/grafana/dashboard-linter/blob/main/lint/lint.go#L11

The Results object is processed by MaximumSeverity, which looks for the highest iota value: https://github.com/grafana/dashboard-linter/blob/main/lint/results.go#L160. This makes Quiet higher than the default Success, so it is returned with the value 4.

Finally, when strict is enabled, the linter reports errors for any level above Warning (https://github.com/grafana/dashboard-linter/blob/main/main.go#L91). Quiet is above Warning, so errors are reported even though they are not visible anywhere.

Possible fixes:

  1. Update MaximumSeverity to ignore Quiet. This may surprise people later on.
  2. Move Quiet to iota 1 so the order becomes: Success, Quiet, Warning ...
  3. In strict mode, don't fail if the result is Quiet. This has a flaw: today Quiet is considered a higher level than Error, so real errors could be hidden from the strict response.

Recommended fix: 2. Change the iota and check that no other tests fail.

Why wasn't this caught:

It appears that tests for MaximumSeverity haven't been written, so unfortunately this wasn't caught.
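A Go sketch of recommended fix 2 together with the kind of test that was missing. Only the levels named in this issue are shown (the real enum in lint/lint.go may contain others), and strictFails assumes strict mode fails on Warning or anything worse; both are illustrative, not the linter's code.

```go
package main

import "fmt"

// Severity with Quiet moved directly after Success, as in fix option 2.
type Severity int

const (
	Success Severity = iota
	Quiet
	Warning
	Error
)

// MaximumSeverity returns the highest severity among the results.
func MaximumSeverity(results []Severity) Severity {
	highest := Success
	for _, s := range results {
		if s > highest {
			highest = s
		}
	}
	return highest
}

// strictFails assumes strict mode fails on Warning or anything worse.
func strictFails(s Severity) bool { return s >= Warning }

func main() {
	fmt.Println(strictFails(MaximumSeverity([]Severity{Success, Quiet})))        // false
	fmt.Println(strictFails(MaximumSeverity([]Severity{Quiet, Error, Success}))) // true
}
```

With this ordering, Quiet can no longer outrank Warning or Error, so option 3's risk of masking real errors does not arise.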

Feature Request: Output results as JSON

Hey,

As a possible feature, it would be great to have results output as JSON, so that CI systems can make decisions based on them without text parsing. This could enable, for example, GitHub Actions checks with annotations, allowing the CI system to annotate the file where the issues are.

Check Idea - Editable: false

Hi 👋 ,

As an idea for a check: verify that editable is false. If we are creating our dashboards as code, surely the code should be the source of truth and not be overridden?

After all, it can always be disabled in Grafana when making deliberate amendments.

Multi-select best practices

When a template variable has multi set to true, it should also define allValue as .+. This check is performed on job and instance, but it always requires those template variables to be multi-select.

When a query uses a multi-select template variable, it must always use regex matching, i.e. some_label=~"$some_template_var".
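Both checks could apply to any multi-select variable rather than only job and instance. The Go sketch below is a hypothetical rule: it only recognizes label matchers of the form label="$var" or label="${var}" (not ${var:format}), and flags the exact-match operators = and != in favor of =~ and !~.

```go
package main

import (
	"fmt"
	"regexp"
)

// Variable is a hypothetical, trimmed-down template variable.
type Variable struct {
	Name     string
	Multi    bool
	AllValue string
}

// checkMultiSelect applies both proposed checks to one variable.
func checkMultiSelect(v Variable, exprs []string) []string {
	var problems []string
	if !v.Multi {
		return problems
	}
	if v.AllValue != ".+" {
		problems = append(problems, fmt.Sprintf("variable %s: multi-select should set allValue to .+, got %q", v.Name, v.AllValue))
	}
	// Matches label OP "$name" or "${name}"; group 1 captures the operator.
	matcher := regexp.MustCompile(`\w+\s*(=~|!~|!=|=)\s*"\$\{?` + regexp.QuoteMeta(v.Name) + `\}?"`)
	for _, e := range exprs {
		for _, m := range matcher.FindAllStringSubmatch(e, -1) {
			if op := m[1]; op == "=" || op == "!=" {
				problems = append(problems, fmt.Sprintf("variable %s: use =~ or !~ instead of %s in %s", v.Name, op, e))
			}
		}
	}
	return problems
}

func main() {
	job := Variable{Name: "job", Multi: true, AllValue: ".+"}
	for _, p := range checkMultiSelect(job, []string{`up{job="$job"}`, `up{job=~"$job"}`}) {
		fmt.Println(p) // only the first expression is flagged
	}
}
```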

grafana panel

Architecture: telegraf + influxdb + grafana
Requirements: Is there a statistics panel that can display server information, e.g. 20 servers in total, 4 abnormal and the rest normal?


Create releases for this repo

Hi 👋 ,

Would it be possible to publish releases for this repo? That way I could track changes over time instead of assuming I'm on the latest version or pulling latest all the time, which is making it painful to keep track.

Thanks

Linting results in "panic: interface conversion: interface {} is nil, not string"

Hi,

I can't figure out why linting this file results in a panic:

dashboard-linter lint dashboards/sre/sre-kubernetes-events.json -c grafana-dashboard.lint

panic: interface conversion: interface {} is nil, not string

goroutine 1 [running]:
github.com/grafana/dashboard-linter/lint.(*Template).UnmarshalJSON(0x14000796720, {0x140007b96d8, 0x321, 0x3f2})
        /Users/robin/go/pkg/mod/github.com/grafana/[email protected]/lint/lint.go:80 +0x37c
encoding/json.(*decodeState).object(0x14000110090, {0x1055503e0?, 0x14000796720?, 0x104b38a28?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:613 +0x650
encoding/json.(*decodeState).value(0x14000110090, {0x1055503e0?, 0x14000796720?, 0x6?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:374 +0x40
encoding/json.(*decodeState).array(0x14000110090, {0x1053ca420?, 0x14000112790?, 0x1?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:562 +0x58c
encoding/json.(*decodeState).value(0x14000110090, {0x1053ca420?, 0x14000112790?, 0x4?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:364 +0x70
encoding/json.(*decodeState).object(0x14000110090, {0x105454820?, 0x14000112790?, 0x104b3a004?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:775 +0xb34
encoding/json.(*decodeState).value(0x14000110090, {0x105454820?, 0x14000112790?, 0xa?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:374 +0x40
encoding/json.(*decodeState).object(0x14000110090, {0x1054626c0?, 0x14000112780?, 0x104b464fc?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:775 +0xb34
encoding/json.(*decodeState).value(0x14000110090, {0x1054626c0?, 0x14000112780?, 0x104b453fc?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:374 +0x40
encoding/json.(*decodeState).unmarshal(0x14000110090, {0x1054626c0?, 0x14000112780?})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:181 +0x184
encoding/json.Unmarshal({0x140007b4000, 0x5ac9, 0x5aca}, {0x1054626c0, 0x14000112780})
        /opt/homebrew/Cellar/go/1.20.5/libexec/src/encoding/json/decode.go:108 +0xf4
github.com/grafana/dashboard-linter/lint.NewDashboard({0x140007b4000, 0x5ac9, 0x5aca})
        /Users/robin/go/pkg/mod/github.com/grafana/[email protected]/lint/lint.go:287 +0x70
main.glob..func2(0x105c28080?, {0x1400010f4d0?, 0x1?, 0x3?})
        /Users/robin/go/pkg/mod/github.com/grafana/[email protected]/main.go:55 +0x12c
github.com/spf13/cobra.(*Command).execute(0x105c28080, {0x1400010f470, 0x3, 0x3})
        /Users/robin/go/pkg/mod/github.com/spf13/[email protected]/command.go:940 +0x5c8
github.com/spf13/cobra.(*Command).ExecuteC(0x105c28640)
        /Users/robin/go/pkg/mod/github.com/spf13/[email protected]/command.go:1068 +0x35c
github.com/spf13/cobra.(*Command).Execute(...)
        /Users/robin/go/pkg/mod/github.com/spf13/[email protected]/command.go:992
main.main()
        /Users/robin/go/pkg/mod/github.com/grafana/[email protected]/main.go:176 +0x28

grafana-dashboard.lint:

exclusions:
  template-job-rule:
  template-instance-rule:
  template-label-promql-rule:
  template-datasource-rule:
  panel-title-description-rule:
  panel-units-rule:
  target-promql-rule:
  target-rate-interval-rule:
  target-job-rule:
  target-instance-rule:

dashboards/sre/sre-kubernetes-events.json:

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "links": [
    {
      "asDropdown": true,
      "icon": "external link",
      "includeVars": false,
      "keepTime": true,
      "tags": [
        "sre"
      ],
      "targetBlank": false,
      "title": "SRE dashboards",
      "tooltip": "",
      "type": "dashboards",
      "url": ""
    }
  ],
  "liveNow": false,
  "panels": [
    {
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 13,
      "title": "Kubernetes Cluster",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_GMP}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "bars",
            "fillOpacity": 100,
            "gradientMode": "hue",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 1,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "yellow",
                "value": null
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 1
      },
      "id": 15,
      "interval": "$interval",
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_GMP}"
          },
          "editorMode": "code",
          "expr": "sum by(severity)(increase(logging_googleapis_com:byte_count{monitored_resource=\"k8s_cluster\",severity!=\"INFO\",log=\"events\"}[1m]))",
          "legendFormat": "__auto",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Kubernetes Cluster Events Count",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "googlecloud-logging-datasource",
        "uid": "${DS_GCP_LOGGING}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "custom": {
            "align": "auto",
            "cellOptions": {
              "type": "auto"
            },
            "inspect": false
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              },
              {
                "color": "#EAB839",
                "value": 90
              }
            ]
          }
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "Time"
            },
            "properties": [
              {
                "id": "custom.width",
                "value": 190
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "jsonPayload.involvedObject.apiVersion"
            },
            "properties": [
              {
                "id": "custom.width",
                "value": 371
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 1
      },
      "id": 14,
      "options": {
        "cellHeight": "sm",
        "footer": {
          "countRows": false,
          "fields": "",
          "reducer": [
            "sum"
          ],
          "show": false
        },
        "frameIndex": 2,
        "showHeader": true,
        "sortBy": []
      },
      "pluginVersion": "9.5.1",
      "targets": [
        {
          "datasource": {
            "type": "googlecloud-logging-datasource",
            "uid": "${DS_GCP_LOGGING}"
          },
          "projectId": "${GCP_PROJECT_ID}",
          "queryText": "resource.type=\"k8s_cluster\"\nlog_name=~\"projects/gcp-dev-.*/logs/events\"",
          "refId": "A"
        }
      ],
      "title": "Kubernetes Cluster Event Logs",
      "transformations": [
        {
          "id": "seriesToRows",
          "options": {}
        },
        {
          "id": "organize",
          "options": {
            "excludeByName": {
              "Metric": true
            },
            "indexByName": {},
            "renameByName": {}
          }
        }
      ],
      "type": "table"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 9
      },
      "id": 10,
      "panels": [],
      "title": "Kubernetes Container",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_GMP}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "bars",
            "fillOpacity": 100,
            "gradientMode": "hue",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 0,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": [
          {
            "matcher": {
              "id": "byValue",
              "options": {
                "op": "gte",
                "reducer": "allIsZero",
                "value": 0
              }
            },
            "properties": [
              {
                "id": "custom.hideFrom",
                "value": {
                  "legend": true,
                  "tooltip": true,
                  "viz": true
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 7,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 9,
      "interval": "$interval",
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "max"
          ],
          "displayMode": "table",
          "placement": "right",
          "showLegend": true,
          "sortBy": "Mean",
          "sortDesc": true
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "pluginVersion": "9.5.1",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_GMP}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "sum by(namespace_name,container_name,severity)(increase(logging_googleapis_com:byte_count{monitored_resource=\"k8s_container\",severity!=\"INFO\",namespace_name=~\"${namespace}\"}[1m]))",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "key": "Q-74fa07a5-79da-4e61-8ed4-5e23a5e9a935-0",
          "legendFormat": "{{namespace_name}}/{{container_name}}/{{severity}}",
          "range": true,
          "refId": "D"
        }
      ],
      "title": "Kubernetes Container Events Count",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "googlecloud-logging-datasource",
        "uid": "${DS_GCP_LOGGING}"
      },
      "gridPos": {
        "h": 13,
        "w": 12,
        "x": 0,
        "y": 17
      },
      "id": 11,
      "options": {
        "dedupStrategy": "none",
        "enableLogDetails": true,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": false
      },
      "targets": [
        {
          "bucketId": "",
          "datasource": {
            "type": "googlecloud-logging-datasource",
            "uid": "${DS_GCP_LOGGING}"
          },
          "projectId": "${GCP_PROJECT_ID}",
          "queryText": "severity=ERROR\nresource.type=\"k8s_container\"",
          "refId": "A",
          "viewId": ""
        }
      ],
      "title": "Kubernetes Container Error Logs",
      "type": "logs"
    },
    {
      "datasource": {
        "type": "googlecloud-logging-datasource",
        "uid": "${DS_GCP_LOGGING}"
      },
      "gridPos": {
        "h": 13,
        "w": 12,
        "x": 12,
        "y": 17
      },
      "id": 12,
      "options": {
        "dedupStrategy": "none",
        "enableLogDetails": true,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": false
      },
      "targets": [
        {
          "bucketId": "",
          "datasource": {
            "type": "googlecloud-logging-datasource",
            "uid": "${DS_GCP_LOGGING}"
          },
          "projectId": "${GCP_PROJECT_ID}",
          "queryText": "severity=WARNING\nresource.type=\"k8s_container\"",
          "refId": "A",
          "viewId": ""
        }
      ],
      "title": "Kubernetes Container Warning Logs",
      "type": "logs"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 30
      },
      "id": 8,
      "panels": [],
      "title": "Kubernetes Cluster Autoscaling",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 100,
            "gradientMode": "hue",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "decimals": 0,
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green"
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 0,
        "y": 31
      },
      "id": 6,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(kube_node_info)",
          "legendFormat": "Nodes",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Kubernetes Cluster Node Scaling",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "googlecloud-logging-datasource",
        "uid": "${DS_GCP_LOGGING}"
      },
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 12,
        "y": 31
      },
      "id": 7,
      "options": {
        "dedupStrategy": "none",
        "enableLogDetails": true,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": false
      },
      "targets": [
        {
          "datasource": {
            "type": "googlecloud-logging-datasource",
            "uid": "${DS_GCP_LOGGING}"
          },
          "projectId": "${GCP_PROJECT_ID}",
          "queryText": "resource.type=\"k8s_cluster\"\nlogName=~\"projects/gcp-dev-.*/logs/container.googleapis.com%2Fcluster-autoscaler-visibility\" severity>=DEFAULT\n( \"decision\" NOT \"noDecisionStatus\" )",
          "refId": "A"
        }
      ],
      "title": "Kubernetes Cluster Autoscaler Logs",
      "transformations": [
        {
          "id": "extractFields",
          "options": {
            "format": "kvp",
            "keepTime": false,
            "replace": false,
            "source": "content"
          }
        }
      ],
      "type": "logs"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 38
      },
      "id": 4,
      "panels": [],
      "title": "Kubernetes Cluster Upgrades",
      "type": "row"
    },
    {
      "datasource": {
        "type": "googlecloud-logging-datasource",
        "uid": "${DS_GCP_LOGGING}"
      },
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 0,
        "y": 39
      },
      "id": 2,
      "maxDataPoints": 9999999999,
      "options": {
        "dedupStrategy": "none",
        "enableLogDetails": true,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": false
      },
      "targets": [
        {
          "bucketId": "",
          "datasource": {
            "type": "googlecloud-logging-datasource",
            "uid": "${DS_GCP_LOGGING}"
          },
          "projectId": "${GCP_PROJECT_ID}",
          "queryText": "protoPayload.methodName=\"google.container.internal.ClusterManagerInternal.UpdateClusterInternal\"\nresource.type=\"gke_cluster\"",
          "refId": "A",
          "viewId": ""
        }
      ],
      "title": "GKE upgrades - Master/Controlplane",
      "type": "logs"
    },
    {
      "datasource": {
        "type": "googlecloud-logging-datasource",
        "uid": "${DS_GCP_LOGGING}"
      },
      "gridPos": {
        "h": 7,
        "w": 12,
        "x": 12,
        "y": 39
      },
      "id": 3,
      "options": {
        "dedupStrategy": "none",
        "enableLogDetails": true,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": false
      },
      "targets": [
        {
          "bucketId": "",
          "datasource": {
            "type": "googlecloud-logging-datasource",
            "uid": "${DS_GCP_LOGGING}"
          },
          "projectId": "${GCP_PROJECT_ID}",
          "queryText": "protoPayload.methodName=\"google.container.internal.ClusterManagerInternal.UpdateClusterInternal\"\nresource.type=\"gke_nodepool\"",
          "refId": "A",
          "viewId": ""
        }
      ],
      "title": "GKE upgrades - NodePool",
      "type": "logs"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "custom": {
            "fillOpacity": 70,
            "lineWidth": 0,
            "spanNulls": false
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green"
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 12,
        "x": 0,
        "y": 46
      },
      "id": 5,
      "options": {
        "alignValue": "left",
        "legend": {
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "mergeValues": true,
        "rowHeight": 0.9,
        "showValue": "auto",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(kube_node_info) by (kubelet_version)",
          "legendFormat": "__auto",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Kubernetes versions",
      "type": "state-timeline"
    }
  ],
  "refresh": "",
  "schemaVersion": 38,
  "style": "dark",
  "tags": [
    "sre",
    "kubernetes"
  ],
  "templating": {
    "list": [
      {
        "allValue": "",
        "current": {
          "selected": true,
          "text": [
            "All"
          ],
          "value": [
            "$__all"
          ]
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(namespace)",
        "hide": 0,
        "includeAll": true,
        "label": "Namespace",
        "multi": true,
        "name": "namespace",
        "options": [],
        "query": {
          "query": "label_values(namespace)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 5,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "1h",
          "value": "1h"
        },
        "hide": 0,
        "includeAll": false,
        "label": "Interval",
        "multi": false,
        "name": "interval",
        "options": [
          {
            "selected": false,
            "text": "30s",
            "value": "30s"
          },
          {
            "selected": false,
            "text": "60s",
            "value": "60s"
          },
          {
            "selected": false,
            "text": "10m",
            "value": "10m"
          },
          {
            "selected": true,
            "text": "1h",
            "value": "1h"
          },
          {
            "selected": false,
            "text": "12h",
            "value": "12h"
          },
          {
            "selected": false,
            "text": "1d",
            "value": "1d"
          }
        ],
        "query": "30s,60s,10m,1h,12h,1d",
        "queryValue": "",
        "skipUrlSync": false,
        "type": "custom"
      },
      {
        "current": {
          "selected": false,
          "text": "victoria-metrics",
          "value": "victoria-metrics"
        },
        "hide": 2,
        "includeAll": false,
        "multi": false,
        "name": "DS_PROMETHEUS",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "/victoria-metrics/",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "google-managed-prometheus",
          "value": "google-managed-prometheus"
        },
        "hide": 2,
        "includeAll": false,
        "label": "",
        "multi": false,
        "name": "DS_GMP",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "/google/",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "Google Cloud Logging",
          "value": "Google Cloud Logging"
        },
        "hide": 2,
        "includeAll": false,
        "multi": false,
        "name": "DS_GCP_LOGGING",
        "options": [],
        "query": "googlecloud-logging-datasource",
        "refresh": 1,
        "regex": "/Google Cloud Logging/",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "gcp-dev",
          "value": "gcp-dev"
        },
        "datasource": {
          "type": "googlecloud-logging-datasource",
          "uid": "${DS_GCP_LOGGING}"
        },
        "definition": "",
        "hide": 2,
        "includeAll": false,
        "multi": false,
        "name": "GCP_PROJECT_ID",
        "options": [],
        "query": {
          "bucketId": "",
          "loading": false,
          "projectId": "${GCP_PROJECT_ID}",
          "refId": "CloudLoggingVariableQueryEditor-VariableQuery",
          "selectedQueryType": "projects",
          "viewId": ""
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      }
    ]
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Kubernetes Events",
  "uid": "sre-kubernetes-events",
  "version": 1,
  "weekStart": ""
}

Create documentation with details on each rule

Currently, each rule has only a terse description. Each rule should have a longer one-page write-up explaining why the rule exists and which best practice it enforces or encourages.

This information will help consumers decide whether and when to exclude a given rule, and will serve as the canonical manual of best practices.

Dashboard variable with type prometheus without $dashboard is not flagged by linter

A dashboard variable like the one below is not flagged by the linter. Its datasource UID is hard-coded to the specific value pjNv0fYGz, so the dashboard is not portable across different Grafana environments. It would be nice for the linter to flag such cases.

      {
        "current": {
          "selected": false,
          "text": "X",
          "value": "X"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "pjNv0fYGz"
        },
        "definition": "definition",
        "hide": 0,
        "includeAll": false,
        "label": "Customer ID",
        "multi": false,
        "name": "customerid",
        "options": [],
        "query": {
          "query": "PromQL",
          "refId": "StandardVariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      }
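One possible shape for such a check, sketched in Go under the assumption that a datasource UID counts as "templated" exactly when it starts with $ (as in ${DS_PROMETHEUS}). The templateVariable type and the hardcodedPrometheusUID function are hypothetical and not part of dashboard-linter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// templateVariable decodes only the fields of a Grafana templating entry
// that this check needs; encoding/json ignores all other fields.
type templateVariable struct {
	Type       string `json:"type"`
	Datasource *struct {
		Type string `json:"type"`
		UID  string `json:"uid"`
	} `json:"datasource"`
}

// hardcodedPrometheusUID reports whether a query variable backed by a
// Prometheus datasource pins it to a literal UID (such as "pjNv0fYGz")
// instead of a template reference such as "${DS_PROMETHEUS}".
// Hypothetical helper; assumes templated UIDs always start with "$".
func hardcodedPrometheusUID(raw []byte) (bool, error) {
	var v templateVariable
	if err := json.Unmarshal(raw, &v); err != nil {
		return false, err
	}
	if v.Type != "query" || v.Datasource == nil || v.Datasource.Type != "prometheus" {
		return false, nil
	}
	uid := v.Datasource.UID
	return uid != "" && !strings.HasPrefix(uid, "$"), nil
}

func main() {
	variable := []byte(`{
		"name": "customerid",
		"type": "query",
		"datasource": {"type": "prometheus", "uid": "pjNv0fYGz"}
	}`)
	flagged, err := hardcodedPrometheusUID(variable)
	if err != nil {
		panic(err)
	}
	// prints "customerid: hard-coded datasource UID = true"
	fmt.Printf("customerid: hard-coded datasource UID = %v\n", flagged)
}
```

Variables of type datasource (like DS_PROMETHEUS) carry no datasource field, so this sketch leaves them alone.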

Any query having a multi=true template variable should be aggregated

All queries that use a template variable allowing multi-selection should be aggregated:

  • Either by using an aggregator function in PromQL or LogQL
  • Or by using an aggregation option from the Grafana panel options

This avoids unintentionally showing multiple repeated values in a panel such as "Single stat".

This rule should also be limited to stat panels, or other panels used to show aggregated stats.
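A rough sketch of the PromQL half of this rule, assuming a regular-expression heuristic rather than a real PromQL parser: a stat panel query that references a multi-value variable is flagged unless it starts with an aggregation operator. The function names are hypothetical, and aggregation done through Grafana panel options is not modelled.

```go
package main

import (
	"fmt"
	"regexp"
)

// topLevelAgg matches expressions that begin with a PromQL aggregation
// operator, optionally followed by a by/without clause.
var topLevelAgg = regexp.MustCompile(
	`^\s*(sum|avg|min|max|count_values|count|group|stddev|stdvar|topk|bottomk|quantile)\s*((by|without)\s*\([^)]*\)\s*)?\(`)

// varRef builds a pattern for the Grafana variable syntaxes
// $name, ${name}, ${name:format} and [[name]].
func varRef(name string) *regexp.Regexp {
	q := regexp.QuoteMeta(name)
	return regexp.MustCompile(`\$` + q + `\b|\$\{` + q + `(:[^}]*)?\}|\[\[` + q + `\]\]`)
}

// needsAggregation (hypothetical) reports whether a stat panel query
// references a multi-value variable without a top-level aggregation,
// which may render one value per selected item.
func needsAggregation(panelType, expr string, multiVars []string) bool {
	if panelType != "stat" {
		return false // the rule is scoped to stat-like panels
	}
	for _, name := range multiVars {
		if varRef(name).MatchString(expr) {
			return !topLevelAgg.MatchString(expr)
		}
	}
	return false
}

func main() {
	multi := []string{"namespace"}
	raw := `rate(http_requests_total{namespace=~"$namespace"}[5m])`
	agg := `sum(rate(http_requests_total{namespace=~"$namespace"}[5m]))`
	fmt.Println(needsAggregation("stat", raw, multi)) // true
	fmt.Println(needsAggregation("stat", agg, multi)) // false
}
```

A binary expression such as sum(a) / sum(b) passes because it starts with an aggregation; a parser-based implementation would be needed to check every operand reliably.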

Target idx not set for target-rate-interval-rule

When the rate-interval rule is triggered, it does not include the target idx, making it impossible to selectively exclude queries in the .lint file.

Example from grafana/jsonnet-libs/consul-mixin:

[target-rate-interval-rule] 'Consul Overview': Dashboard 'Consul Overview', panel 'Latency' invalid PromQL query 'sum(rate(consul_http_request{job=~"$job"}[5m])) / sum(rate(consul_http_request{job=~"$job"}[5m]))': should use $__rate_interval
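A minimal sketch of what the issue asks for: keep each target's index when reporting, so that an exclusion can name a single query rather than the whole panel. The Result and Exclusion types, the regex used to spot fixed ranges, and the "negative index means every target" convention are assumptions for illustration, not dashboard-linter's actual types or .lint schema.

```go
package main

import (
	"fmt"
	"regexp"
)

// Result is a hypothetical lint result that records which target
// (query) inside the panel triggered it.
type Result struct {
	Dashboard, Panel string
	TargetIdx        int
	Message          string
}

// Exclusion is a hypothetical exclusion entry. A negative TargetIdx
// means "every target in the panel".
type Exclusion struct {
	Dashboard, Panel string
	TargetIdx        int
}

// rangeSelector captures the range passed to rate/irate/increase calls.
var rangeSelector = regexp.MustCompile(`(rate|irate|increase)\([^\[]*\[([^\]]+)\]`)

// checkRateInterval flags each target whose rate-style call does not use
// $__rate_interval, keeping the target's index in the result.
func checkRateInterval(dashboard, panel string, exprs []string) []Result {
	var out []Result
	for idx, expr := range exprs {
		for _, m := range rangeSelector.FindAllStringSubmatch(expr, -1) {
			if m[2] != "$__rate_interval" {
				out = append(out, Result{
					Dashboard: dashboard,
					Panel:     panel,
					TargetIdx: idx,
					Message:   fmt.Sprintf("invalid PromQL query '%s': should use $__rate_interval", expr),
				})
				break // one result per target is enough
			}
		}
	}
	return out
}

// excluded reports whether an exclusion matches the result, down to the
// target index when one is given.
func excluded(r Result, excl []Exclusion) bool {
	for _, e := range excl {
		if e.Dashboard == r.Dashboard && e.Panel == r.Panel &&
			(e.TargetIdx < 0 || e.TargetIdx == r.TargetIdx) {
			return true
		}
	}
	return false
}

func main() {
	exprs := []string{
		`sum(rate(consul_http_request{job=~"$job"}[$__rate_interval]))`,
		`sum(rate(consul_http_request{job=~"$job"}[5m])) / sum(rate(consul_http_request{job=~"$job"}[5m]))`,
	}
	results := checkRateInterval("Consul Overview", "Latency", exprs)
	onlySecond := []Exclusion{{Dashboard: "Consul Overview", Panel: "Latency", TargetIdx: 1}}
	for _, r := range results {
		fmt.Printf("target %d excluded=%v: %s\n", r.TargetIdx, excluded(r, onlySecond), r.Message)
	}
}
```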

[Feature Request] Filter Output To Display Linter Errors

Currently, dashboard-linter prints an OK line for every passing check alongside any linter errors. The errors end up sparse in the output, so the user has to scroll through it to find them.

An ideal feature would be a --show-errors flag that shows only the linter errors.

Current Output:

Checks that each panel uses the templated datasource.
[✔️] OK
[✔️] OK
[❌] Dashboard '<dashboard>', panel '<panel>' does not use templates datasource, uses '<source>'
[❌] Dashboard '<dashboard>', panel '<panel>' does not use templates datasource, uses '<source>'
[✔️] OK

Proposed Feature's Output:

Checks that each panel uses the templated datasource.
[❌] Dashboard '<dashboard>', panel '<panel>' does not use templates datasource, uses '<source>'
[❌] Dashboard '<dashboard>', panel '<panel>' does not use templates datasource, uses '<source>'
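A sketch of the proposed filtering in Go, assuming a simplified result type that only records pass or fail. The --show-errors flag is the one proposed in this issue, not an existing dashboard-linter option, and printResults and render are hypothetical helpers.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"strings"
)

// Result is a simplified, hypothetical lint result: Err is false for a
// passing check and true for a linter error.
type Result struct {
	Err     bool
	Message string
}

// RuleResults groups the results of one rule under its description.
type RuleResults struct {
	Description string
	Results     []Result
}

// printResults writes each rule's description and results. With
// errorsOnly set, OK lines are dropped, and a rule left with nothing
// to report is skipped entirely.
func printResults(w io.Writer, rules []RuleResults, errorsOnly bool) {
	for _, rule := range rules {
		var lines []string
		for _, r := range rule.Results {
			if r.Err {
				lines = append(lines, "[❌] "+r.Message)
			} else if !errorsOnly {
				lines = append(lines, "[✔️] OK")
			}
		}
		if len(lines) == 0 {
			continue
		}
		fmt.Fprintln(w, rule.Description)
		for _, l := range lines {
			fmt.Fprintln(w, l)
		}
	}
}

// render returns the same output as a string (handy for tests).
func render(rules []RuleResults, errorsOnly bool) string {
	var b strings.Builder
	printResults(&b, rules, errorsOnly)
	return b.String()
}

func main() {
	// Hypothetical flag; Go's flag package accepts both -show-errors
	// and --show-errors.
	showErrors := flag.Bool("show-errors", false, "only print linter errors")
	flag.Parse()

	rules := []RuleResults{{
		Description: "Checks that each panel uses the templated datasource.",
		Results: []Result{
			{Err: false},
			{Err: true, Message: "Dashboard '<dashboard>', panel '<panel>' does not use templates datasource, uses '<source>'"},
			{Err: false},
		},
	}}
	printResults(os.Stdout, rules, *showErrors)
}
```

In this sketch a rule with no errors is omitted entirely under --show-errors; whether its description should still be printed is a design choice the issue leaves open.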
