napsty / check_zpools

Monitor the usage and status of ZFS Pools (zpools)

Home Page: https://www.claudiokuenzler.com/monitoring-plugins/check_zpools.php

License: GNU General Public License v2.0

Shell 100.00%

nagios-plugins monitoring-plugins solaris zpool zfs bsd smartos opensolaris

check_zpools's Introduction

check_zpools

A Nagios/Icinga plugin to monitor ZFS Pools (zpools). It is based on "Check Solaris ZFS Pools" but has been completely rewritten.

My environment runs ZFS on several operating systems (Solaris, OpenSolaris, SmartOS, FreeBSD), so I needed a Nagios plugin that runs on all of them.

Based on my research (http://www.claudiokuenzler.com/blog/345/monitor-zfs-disk-pools-nagios-plugin-comparison) I finally decided to take an existing plugin and rewrite it.

Full documentation with examples is available at: http://www.claudiokuenzler.com/monitoring-plugins/check_zpools.php

check_zpools's People

Contributors

Stargazers

Watchers

Forkers

carolinebeauchamp alexs77 cilahn hakong pv2b waoki brd xinqu32 kresike arakmar

check_zpools's Issues

License

Nice plugin. This is not a bug report, only a request to add a license to the plugin.

Bug with Warn/Crit detection

These are the original code lines:
elif [[ $CAPACITY -gt $crit ]]; then echo "ZFS POOL $pool usage is CRITICAL (${CAPACITY}%|$pool=${CAPACITY}%)"; exit ${STATE_CRITICAL}
elif [[ $CAPACITY -gt $warn && $CAPACITY -lt $crit ]]; then echo "ZFS POOL $pool usage is WARNING (${CAPACITY}%)|$pool=${CAPACITY}%"; exit ${STATE_WARNING}

With a critical threshold of 95% and a warning threshold of 90%, a pool at exactly 95% is reported as OK: -gt excludes the critical threshold itself, and the warning branch only matches values below it.

These are the corrected lines:

elif [[ $CAPACITY -ge $crit ]]; then echo "ZFS POOL $pool usage is CRITICAL (${CAPACITY}%|$pool=${CAPACITY}%)"; exit ${STATE_CRITICAL}
elif [[ $CAPACITY -ge $warn && $CAPACITY -lt $crit ]]; then echo "ZFS POOL $pool usage is WARNING (${CAPACITY}%)|$pool=${CAPACITY}%"; exit ${STATE_WARNING}
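
The boundary behaviour of the corrected lines can be checked in isolation. The following is a minimal sketch, not code from the plugin: classify_usage is a hypothetical helper that applies the same -ge comparisons to a capacity, a warning threshold, and a critical threshold.

```shell
# Hypothetical helper (not part of check_zpools.sh).
# Usage: classify_usage CAPACITY WARN CRIT
# Prints the Nagios state name for a capacity percentage.
classify_usage() {
  capacity=$1
  warn=$2
  crit=$3
  if [ "$capacity" -ge "$crit" ]; then
    # A value equal to the critical threshold is already CRITICAL.
    echo CRITICAL
  elif [ "$capacity" -ge "$warn" ]; then
    # Reaching this branch implies capacity < crit, so no extra -lt test is needed.
    echo WARNING
  else
    echo OK
  fi
}
```

With thresholds 90 and 95, a call such as classify_usage 95 90 95 lands in the first branch, which is the case that fell through to OK with the original -gt comparisons.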

NRPE: Unable to read output

Hello!

I'm trying to use this plugin on OpenIndiana (an OpenSolaris fork) and cannot get NRPE to read the output. I'm able to execute the check locally but not remotely. NRPE keeps giving me "Unable to read output" errors. I'm already using NRPE to execute a different plugin on this host, so I don't think it's a problem with how I configured NRPE.

LOCAL OS:

OpenIndiana (powered by illumos) SunOS 5.11 oi_151a9 November 2013

LOCAL EXECUTION OUTPUT EXAMPLES:

-bash-4.0$ ./check_zpools.sh -p BigD -w 90 -c 95
ZFS POOL BigD health is DEGRADED|BigD=57%

-bash-4.0$ ./check_zpools.sh -p rpool -w 90 -c 95
ALL ZFS POOLS OK (rpool)|rpool=80%

NRPE COMMAND EXAMPLE:

-bash-4.0$ ./check_nrpe e -H example.host -c check_zpools -a '-p BigD -w 90 -c 95'
NRPE: Unable to read output

any ideas?

Thank you,
Vince

Output truncated when shown in Nagios' Status Information field

Output of plugin when executed in terminal of FreeBSD system:

./check_zpools.sh -p zroot -w 70 -c 80
ZFS POOL zroot usage is CRITICAL (88%|zroot=88%)

Output of plugin as shown in Nagios' Status Information field:

ZFS POOL zroot usage is CRITICAL (88%
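
The truncation is consistent with how Nagios treats plugin output: everything before the first "|" is status text and everything after it is performance data. In the CRITICAL message the pipe sits inside the parentheses, so the closing parenthesis ends up in the performance data. The sketch below imitates that split with two hypothetical helpers (status_text and perf_data are illustration names, not part of Nagios or the plugin).

```shell
# Hypothetical helpers that mimic the split at the first "|".

# Print the part of the plugin output shown as status text.
status_text() {
  printf '%s\n' "${1%%|*}"
}

# Print the part of the plugin output treated as performance data
# (an empty line if the output contains no "|").
perf_data() {
  case $1 in
    *"|"*) printf '%s\n' "${1#*|}" ;;
    *)     printf '\n' ;;
  esac
}

# Output as reported in this issue: the pipe is inside the parentheses.
broken='ZFS POOL zroot usage is CRITICAL (88%|zroot=88%)'
# Same message with the parenthesis closed before the pipe,
# matching the layout of the plugin's WARNING message.
fixed='ZFS POOL zroot usage is CRITICAL (88%)|zroot=88%'

status_text "$broken"
status_text "$fixed"
```

Closing the parenthesis before the pipe keeps the status text intact and leaves a clean zroot=88% as performance data.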

Inconsistent error messages and only one issue reported per pool

Command output is inconsistent when thresholds are given and using different arguments
for pools, for example:

crash@tesla:~/work/check_zpools$ sudo zpool status
  pool: mail
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Wed Feb 22 14:24:16 2023
config:

	NAME        STATE     READ WRITE CKSUM
	mail        DEGRADED     0     0     0
	  mirror-0  DEGRADED     0     0     0
	    nbd0    UNAVAIL      0     0    40  corrupted data
	    nbd1    ONLINE       0     0     0

errors: No known data errors
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p mail -c 45 -w 40 || echo $?
ZFS POOL mail health is DEGRADED|mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p ALL -c 45 -w 40 || echo $?
ZFS POOL ALARM: POOL mail usage is CRITICAL (50%)|mail=50%
2

Also the output only considers the first error and reports only that. Here we have two
possible issues, one is that the pool is DEGRADED, the other is that the usage is too high.

After my changes the output is more consistent and hopefully contains all issues:

crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p mail -c 45 -w 40 || echo $?
ZFS POOL ALARM: mail health is DEGRADED mail usage is CRITICAL (50%) |mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p mail -c 65 -w 60 || echo $?
ZFS POOL ALARM: mail health is DEGRADED |mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p ALL -c 45 -w 40 || echo $?
ZFS POOL ALARM: mail health is DEGRADED POOL mail usage is CRITICAL (50%) |mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p ALL -c 65 -w 60 || echo $?
ZFS POOL ALARM: mail health is DEGRADED |mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p ALL || echo $?
ZFS POOL ALARM: mail health is DEGRADED|mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p mail || echo $?
ZFS POOL ALARM: mail health is DEGRADED|mail=50%
2

It also works with a healthy ONLINE pool:

crash@tesla:~/work/check_zpools$ sudo zpool status
  pool: mail
 state: ONLINE
  scan: resilvered 244M in 00:00:07 with 0 errors on Wed Feb 22 15:28:24 2023
config:

	NAME        STATE     READ WRITE CKSUM
	mail        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     0
	    nbd1    ONLINE       0     0     0

errors: No known data errors
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p mail && echo $?
ALL ZFS POOLS OK (mail)|mail=50%
0
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p ALL && echo $?
ALL ZFS POOLS OK (mail)|mail=50%
0
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p mail -w 40 -c 45 || echo $?
ZFS POOL ALARM: mail usage is CRITICAL (50%) |mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p ALL -w 40 -c 45 || echo $?
ZFS POOL ALARM: POOL mail usage is CRITICAL (50%) |mail=50%
2
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p ALL -w 60 -c 65 && echo $?
ALL ZFS POOLS OK (mail)|mail=50% 
0
crash@tesla:~/work/check_zpools$ ./check_zpools.sh -p mail -w 60 -c 65 && echo $?
ALL ZFS POOLS OK (mail)|mail=50%
0

I will add a PR shortly.
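
The idea of reporting every problem for a pool, instead of stopping at the first one, can be sketched as follows. This is not the code from the pull request: pool_report is a hypothetical function, the comma separator is an illustration choice, and treating every non-ONLINE health state as CRITICAL is an assumption (the transcripts above only show DEGRADED returning 2).

```shell
# Hypothetical sketch; names and message layout are assumptions.
# Usage: pool_report POOL HEALTH CAPACITY WARN CRIT
# Prints one status line with all detected issues and returns the Nagios state.
pool_report() {
  pool=$1
  health=$2
  capacity=$3
  warn=$4
  crit=$5
  msg=""
  rc=0

  # Health problem: assumed CRITICAL for anything other than ONLINE.
  if [ "$health" != "ONLINE" ]; then
    msg="$pool health is $health"
    rc=2
  fi

  # Usage problem: appended to the message rather than replacing it.
  if [ "$capacity" -ge "$crit" ]; then
    msg="${msg:+$msg, }$pool usage is CRITICAL (${capacity}%)"
    rc=2
  elif [ "$capacity" -ge "$warn" ]; then
    msg="${msg:+$msg, }$pool usage is WARNING (${capacity}%)"
    # Only raise the state to WARNING; never lower an existing CRITICAL.
    if [ "$rc" -lt 1 ]; then
      rc=1
    fi
  fi

  if [ "$rc" -eq 0 ]; then
    echo "ALL ZFS POOLS OK ($pool)|$pool=${capacity}%"
  else
    echo "ZFS POOL ALARM: $msg|$pool=${capacity}%"
  fi
  return "$rc"
}
```

Called as pool_report mail DEGRADED 50 40 45, the sketch reports both the degraded health and the critical usage in a single line, followed by the performance data after the pipe.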

When given nonexistent pool name incorrectly reports OK

root@diskmaskin01:~ # /usr/local/libexec/nagios/check_zpools.sh -p thispooldoesnotexist ; echo $?
cannot open 'thispooldoesnotexist': no such pool
cannot open 'thispooldoesnotexist': no such pool
/usr/local/libexec/nagios/check_zpools.sh: line 125: [: !=: unary operator expected
ALL ZFS POOLS OK (thispooldoesnotexist)|thispooldoesnotexist=%
0
root@diskmaskin01:~ #
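
One possible guard against this is to verify that the pool exists before reading its health and capacity, and to return UNKNOWN (exit code 3) otherwise. The sketch below is an assumption about how such a check could look, not the fix applied in the plugin; pool_exists and check_pool_or_unknown are hypothetical names, and the message wording is invented for illustration.

```shell
# Hypothetical guard; function names and message text are assumptions.
STATE_UNKNOWN=3

# Return 0 if the named pool exists. "zpool list" exits non-zero for
# an unknown pool; -H and -o name keep the output minimal.
pool_exists() {
  zpool list -H -o name "$1" >/dev/null 2>&1
}

# Print an UNKNOWN message and return 3 if the pool is missing,
# otherwise return 0 so the regular checks can continue.
check_pool_or_unknown() {
  if ! pool_exists "$1"; then
    echo "ZFS POOL UNKNOWN: pool $1 does not exist"
    return "$STATE_UNKNOWN"
  fi
  return 0
}
```

In a script this would typically run right after argument parsing, for example as check_pool_or_unknown "$pool" || exit $?, so that the empty capacity value seen in the transcript above never reaches the numeric comparisons.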
