prometheus / node_exporter

Exporter for machine metrics

License: Apache License 2.0

Go 96.84% Makefile 0.52% Shell 1.92% C 0.67% Dockerfile 0.04%

prometheus-exporter prometheus node-metrics machine-metrics host-metrics metrics procfs system-information system-metrics

node_exporter's Introduction

Node exporter

Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.

The Windows exporter is recommended for Windows users. To expose NVIDIA GPU metrics, prometheus-dcgm can be used.

Installation and Usage

If you are new to Prometheus and node_exporter there is a simple step-by-step guide.

The node_exporter listens on HTTP port 9100 by default. See the --help output for more options.

Ansible

For automated installs with Ansible, there is the Prometheus Community role.

Docker

The node_exporter is designed to monitor the host system. Deploying in containers requires extra care in order to avoid monitoring the container itself.

For situations where containerized deployment is needed, some extra flags must be used to allow the node_exporter access to the host namespaces.

Be aware that any non-root mount points you want to monitor will need to be bind-mounted into the container.

When you start a container for host monitoring, specify the path.rootfs argument. Its value must match the path used in the bind mount of the host root. The node_exporter uses path.rootfs as a prefix to access the host filesystem.

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host

For Docker compose, similar flag changes are needed.

---
version: '3.8'

services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

On some systems, the timex collector requires an additional Docker flag, --cap-add=SYS_TIME, in order to access the required syscalls.

Collectors

There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.

Collectors are enabled by providing a --collector.<name> flag. Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag. To enable only some specific collector(s), use --collector.disable-defaults --collector.<name> ....

Include & Exclude flags

A few collectors can be configured to include or exclude certain patterns using dedicated flags. The exclude flags are used to indicate "all except", while the include flags are used to say "none except". Note that these flags are mutually exclusive on collectors that support both.

Example:

--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
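To see which mount points an exclude pattern like the one above would cover, it can help to try the regular expression on its own. The following is a minimal sketch using Go's standard regexp package; the helper name isExcluded and the sample mount points are illustrative assumptions, and this is not node_exporter's own filtering code.

```go
package main

import (
	"fmt"
	"regexp"
)

// excludePattern is the example value of
// --collector.filesystem.mount-points-exclude shown in the text above.
const excludePattern = `^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)`

var excludeRE = regexp.MustCompile(excludePattern)

// isExcluded reports whether a mount point matches the exclude pattern.
// Hypothetical helper for illustration only.
func isExcluded(mountPoint string) bool {
	return excludeRE.MatchString(mountPoint)
}

func main() {
	// Sample mount points chosen for illustration.
	samples := []string{
		"/",
		"/home",
		"/proc",
		"/dev/shm",
		"/device",
		"/var/lib/docker/overlay2/abc",
	}
	for _, mp := range samples {
		fmt.Printf("%-30s excluded=%t\n", mp, isExcluded(mp))
	}
}
```

Note that the trailing ($|/) group keeps a path such as /device from being matched by the dev alternative.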

List:

Collector	Scope	Include Flag	Exclude Flag
arp	device	--collector.arp.device-include	--collector.arp.device-exclude
cpu	bugs	--collector.cpu.info.bugs-include	N/A
cpu	flags	--collector.cpu.info.flags-include	N/A
diskstats	device	--collector.diskstats.device-include	--collector.diskstats.device-exclude
ethtool	device	--collector.ethtool.device-include	--collector.ethtool.device-exclude
ethtool	metrics	--collector.ethtool.metrics-include	N/A
filesystem	fs-types	N/A	--collector.filesystem.fs-types-exclude
filesystem	mount-points	N/A	--collector.filesystem.mount-points-exclude
hwmon	chip	--collector.hwmon.chip-include	--collector.hwmon.chip-exclude
netdev	device	--collector.netdev.device-include	--collector.netdev.device-exclude
qdisk	device	--collector.qdisk.device-include	--collector.qdisk.device-exclude
sysctl	all	--collector.sysctl.include	N/A
systemd	unit	--collector.systemd.unit-include	--collector.systemd.unit-exclude

Enabled by default

Name	Description	OS
arp	Exposes ARP statistics from `/proc/net/arp`.	Linux
bcache	Exposes bcache statistics from `/sys/fs/bcache/`.	Linux
bonding	Exposes the number of configured and active slaves of Linux bonding interfaces.	Linux
btrfs	Exposes btrfs statistics	Linux
boottime	Exposes system boot time derived from the `kern.boottime` sysctl.	Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris
conntrack	Shows conntrack statistics (does nothing if no `/proc/sys/net/netfilter/` present).	Linux
cpu	Exposes CPU statistics	Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD
cpufreq	Exposes CPU frequency statistics	Linux, Solaris
diskstats	Exposes disk I/O statistics.	Darwin, Linux, OpenBSD
dmi	Expose Desktop Management Interface (DMI) info from `/sys/class/dmi/id/`	Linux
edac	Exposes error detection and correction statistics.	Linux
entropy	Exposes available entropy.	Linux
exec	Exposes execution statistics.	Dragonfly, FreeBSD
fibrechannel	Exposes fibre channel information and statistics from `/sys/class/fc_host/`.	Linux
filefd	Exposes file descriptor statistics from `/proc/sys/fs/file-nr`.	Linux
filesystem	Exposes filesystem statistics, such as disk space used.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon	Expose hardware monitoring and sensor data from `/sys/class/hwmon/`.	Linux
infiniband	Exposes network statistics specific to InfiniBand and Intel OmniPath configurations.	Linux
ipvs	Exposes IPVS status from `/proc/net/ip_vs` and stats from `/proc/net/ip_vs_stats`.	Linux
loadavg	Exposes load average.	Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm	Exposes statistics about devices in `/proc/mdstat` (does nothing if no `/proc/mdstat` present).	Linux
meminfo	Exposes memory statistics.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netclass	Exposes network interface info from `/sys/class/net/`	Linux
netdev	Exposes network interface statistics such as bytes transferred.	Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netisr	Exposes netisr statistics	FreeBSD
netstat	Exposes network statistics from `/proc/net/netstat`. This is the same information as `netstat -s`.	Linux
nfs	Exposes NFS client statistics from `/proc/net/rpc/nfs`. This is the same information as `nfsstat -c`.	Linux
nfsd	Exposes NFS kernel server statistics from `/proc/net/rpc/nfsd`. This is the same information as `nfsstat -s`.	Linux
nvme	Exposes NVMe info from `/sys/class/nvme/`	Linux
os	Expose OS release info from `/etc/os-release` or `/usr/lib/os-release`	any
powersupplyclass	Exposes Power Supply statistics from `/sys/class/power_supply`	Linux
pressure	Exposes pressure stall statistics from `/proc/pressure/`.	Linux (kernel 4.20+ and/or CONFIG_PSI)
rapl	Exposes various statistics from `/sys/class/powercap`.	Linux
schedstat	Exposes task scheduler statistics from `/proc/schedstat`.	Linux
selinux	Exposes SELinux statistics.	Linux
sockstat	Exposes various statistics from `/proc/net/sockstat`.	Linux
softnet	Exposes statistics from `/proc/net/softnet_stat`.	Linux
stat	Exposes various statistics from `/proc/stat`. This includes boot time, forks and interrupts.	Linux
tapestats	Exposes statistics from `/sys/class/scsi_tape`.	Linux
textfile	Exposes statistics read from local disk. The `--collector.textfile.directory` flag must be set.	any
thermal	Exposes thermal statistics like `pmset -g therm`.	Darwin
thermal_zone	Exposes thermal zone & cooling device statistics from `/sys/class/thermal`.	Linux
time	Exposes the current system time.	any
timex	Exposes selected adjtimex(2) system call stats.	Linux
udp_queues	Exposes UDP total lengths of the rx_queue and tx_queue from `/proc/net/udp` and `/proc/net/udp6`.	Linux
uname	Exposes system information as provided by the uname system call.	Darwin, FreeBSD, Linux, OpenBSD
vmstat	Exposes statistics from `/proc/vmstat`.	Linux
watchdog	Exposes statistics from `/sys/class/watchdog`	Linux
xfs	Exposes XFS runtime statistics.	Linux (kernel 4.4+)
zfs	Exposes ZFS performance statistics.	FreeBSD, Linux, Solaris

Disabled by default

node_exporter also implements a number of collectors that are disabled by default. Reasons for this vary by collector, and may include:

High cardinality
Prolonged runtime that exceeds the Prometheus scrape_interval or scrape_timeout
Significant resource demands on the host

You can enable additional collectors as desired by adding them to your init system's or service supervisor's startup configuration for node_exporter, but caution is advised. Enable at most one at a time, testing first on a non-production system, then by hand on a single production node. When enabling additional collectors, carefully monitor the change by observing the scrape_duration_seconds metric to ensure that collection completes and does not time out. In addition, monitor the scrape_samples_post_metric_relabeling metric to see the changes in cardinality.

Name	Description	OS
buddyinfo	Exposes statistics of memory fragments as reported by /proc/buddyinfo.	Linux
cgroups	A summary of the number of active and enabled cgroups	Linux
cpu_vulnerabilities	Exposes CPU vulnerability information from sysfs.	Linux
devstat	Exposes device statistics	Dragonfly, FreeBSD
drm	Expose GPU metrics using sysfs / DRM, `amdgpu` is the only driver which exposes this information through DRM	Linux
drbd	Exposes Distributed Replicated Block Device statistics (to version 8.4)	Linux
ethtool	Exposes network interface information and network driver statistics equivalent to `ethtool`, `ethtool -S`, and `ethtool -i`.	Linux
interrupts	Exposes detailed interrupts statistics.	Linux, OpenBSD
ksmd	Exposes kernel and system statistics from `/sys/kernel/mm/ksm`.	Linux
lnstat	Exposes stats from `/proc/net/stat/`.	Linux
logind	Exposes session counts from logind.	Linux
meminfo_numa	Exposes memory statistics from `/sys/devices/system/node/node[0-9]/meminfo`, `/sys/devices/system/node/node[0-9]/numastat`.	Linux
mountstats	Exposes filesystem statistics from `/proc/self/mountstats`. Exposes detailed NFS client statistics.	Linux
network_route	Exposes the routing table as metrics	Linux
perf	Exposes perf based metrics (Warning: Metrics are dependent on kernel configuration and settings).	Linux
processes	Exposes aggregate process statistics from `/proc`.	Linux
qdisc	Exposes queuing discipline statistics	Linux
slabinfo	Exposes slab statistics from `/proc/slabinfo`. Note that permission of `/proc/slabinfo` is usually 0400, so set it appropriately.	Linux
softirqs	Exposes detailed softirq statistics from `/proc/softirqs`.	Linux
sysctl	Expose sysctl values from `/proc/sys`. Use `--collector.sysctl.include(-info)` to configure.	Linux
systemd	Exposes service and system status from systemd.	Linux
tcpstat	Exposes TCP connection status information from `/proc/net/tcp` and `/proc/net/tcp6`. (Warning: the current version has potential performance issues in high load situations.)	Linux
wifi	Exposes WiFi device and station statistics.	Linux
xfrm	Exposes statistics from `/proc/net/xfrm_stat`	Linux
zoneinfo	Exposes NUMA memory zone metrics.	Linux

Deprecated

These collectors are deprecated and will be removed in the next major release.

Name	Description	OS
ntp	Exposes local NTP daemon health to check time	any
runit	Exposes service status from runit.	any
supervisord	Exposes service status from supervisord.	any

Perf Collector

The perf collector may not work out of the box on some Linux systems due to kernel configuration and security settings. To allow access, set the following sysctl parameter:

sysctl -w kernel.perf_event_paranoid=X

2 allow only user-space measurements (default since Linux 4.6).
1 allow both kernel and user measurements (default before Linux 4.6).
0 allow access to CPU-specific data but not raw tracepoint samples.
-1 no restrictions.

Depending on the configured value different metrics will be available, for most cases 0 will provide the most complete set. For more information see man 2 perf_event_open.

By default, the perf collector will only collect metrics of the CPUs that node_exporter is running on (i.e. runtime.NumCPU). If this is insufficient (e.g. if you run node_exporter with its CPU affinity set to specific CPUs), you can specify a list of alternate CPUs by using the --collector.perf.cpus flag. For example, to collect metrics on CPUs 2-6, you would specify: --collector.perf --collector.perf.cpus=2-6. The CPU configuration is zero indexed and can also take a stride value; e.g. --collector.perf --collector.perf.cpus=1-10:5 would collect on CPUs 1, 5, and 10.

The perf collector is also able to collect tracepoint counts when using the --collector.perf.tracepoint flag. Tracepoints can be found using perf list or from debugfs. An example usage of this would be --collector.perf.tracepoint="sched:sched_process_exec".

Sysctl Collector

The sysctl collector can be enabled with --collector.sysctl. It supports exposing numeric sysctl values as metrics using the --collector.sysctl.include flag and string values as info metrics by using the --collector.sysctl.include-info flag. The flags can be repeated. For sysctl with multiple numeric values, an optional mapping can be given to expose each value as its own metric. Otherwise an index label is used to identify the different fields.

Examples

Numeric values

Single values

Using --collector.sysctl.include=vm.user_reserve_kbytes: vm.user_reserve_kbytes = 131072 -> node_sysctl_vm_user_reserve_kbytes 131072

Multiple values

A sysctl can contain multiple values, for example:

net.ipv4.tcp_rmem = 4096	131072	6291456

Using --collector.sysctl.include=net.ipv4.tcp_rmem the collector will expose:

node_sysctl_net_ipv4_tcp_rmem{index="0"} 4096
node_sysctl_net_ipv4_tcp_rmem{index="1"} 131072
node_sysctl_net_ipv4_tcp_rmem{index="2"} 6291456

If the indexes have defined meaning like in this case, the values can be mapped to multiple metrics by appending the mapping to the --collector.sysctl.include flag: Using --collector.sysctl.include=net.ipv4.tcp_rmem:min,default,max the collector will expose:

node_sysctl_net_ipv4_tcp_rmem_min 4096
node_sysctl_net_ipv4_tcp_rmem_default 131072
node_sysctl_net_ipv4_tcp_rmem_max 6291456
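The naming rules shown in these examples can be sketched as a small function. This is only an illustration of the documented output, not the collector's implementation: the function name sysctlMetrics is hypothetical, and falling back to index labels when the mapping length does not match the number of values is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// sysctlMetrics renders metric lines for a numeric sysctl, given the value
// passed to --collector.sysctl.include (optionally "name:a,b,c") and the
// values read for that sysctl. Hypothetical helper for illustration.
func sysctlMetrics(include string, values []string) []string {
	name, mapping, hasMapping := strings.Cut(include, ":")
	base := "node_sysctl_" + strings.ReplaceAll(name, ".", "_")

	// A single value without a mapping is exposed as a plain metric.
	if len(values) == 1 && !hasMapping {
		return []string{base + " " + values[0]}
	}

	var suffixes []string
	if hasMapping {
		suffixes = strings.Split(mapping, ",")
	}

	lines := make([]string, 0, len(values))
	for i, v := range values {
		if len(suffixes) == len(values) {
			// Mapped: one metric per value, named after the mapping entry.
			lines = append(lines, fmt.Sprintf("%s_%s %s", base, suffixes[i], v))
		} else {
			// Unmapped (or mismatched mapping, an assumption): index label.
			lines = append(lines, fmt.Sprintf("%s{index=\"%d\"} %s", base, i, v))
		}
	}
	return lines
}

func main() {
	values := []string{"4096", "131072", "6291456"}
	for _, l := range sysctlMetrics("net.ipv4.tcp_rmem", values) {
		fmt.Println(l)
	}
	for _, l := range sysctlMetrics("net.ipv4.tcp_rmem:min,default,max", values) {
		fmt.Println(l)
	}
}
```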

String values

String values need to be exposed as an info metric. The user selects them by using the --collector.sysctl.include-info flag.

Single values

kernel.core_pattern = core -> node_sysctl_info{key="kernel.core_pattern_info", value="core"} 1

Multiple values

Given the following sysctl:

kernel.seccomp.actions_avail = kill_process kill_thread trap errno trace log allow

Setting --collector.sysctl.include-info=kernel.seccomp.actions_avail will yield:

node_sysctl_info{key="kernel.seccomp.actions_avail", index="0", value="kill_process"} 1
node_sysctl_info{key="kernel.seccomp.actions_avail", index="1", value="kill_thread"} 1
...

Textfile Collector

The textfile collector is similar to the Pushgateway, in that it allows exporting of statistics from batch jobs. It can also be used to export static metrics, such as what role a machine has. The Pushgateway should be used for service-level metrics. The textfile module is for metrics that are tied to a machine.

To use it, set the --collector.textfile.directory flag on the node_exporter command line. The collector will parse all files in that directory matching the glob *.prom using the text format. Note: Timestamps are not supported.

To atomically push completion time for a cron job:

echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom

To statically set roles for a machine using labels:

echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
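The same write-then-rename pattern can be used from a program instead of a shell script. The sketch below assumes a Go batch job; the function names writePromFile and roundTrip are hypothetical, and making the file world-readable is an assumption about how node_exporter is run on your system.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writePromFile writes content to dir/name atomically: it writes a temporary
// file in the same directory, then renames it over the target so the
// collector never sees a half-written file. Hypothetical helper.
func writePromFile(dir, name, content string) error {
	// The temporary name ends in a random suffix, so it does not match *.prom.
	tmp, err := os.CreateTemp(dir, name+".*")
	if err != nil {
		return err
	}
	// Clean up the temporary file on failure; after a successful rename this
	// call finds nothing to remove and its error is ignored.
	defer os.Remove(tmp.Name())

	if _, err := tmp.WriteString(content); err != nil {
		tmp.Close()
		return err
	}
	// CreateTemp uses mode 0600; widen it so another user can read the file
	// (an assumption about how node_exporter is deployed).
	if err := tmp.Chmod(0o644); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, name))
}

// roundTrip writes a sample metric into a scratch directory and reads it back.
func roundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "textfile")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	if err := writePromFile(dir, "role.prom", "role{role=\"application_server\"} 1\n"); err != nil {
		return "", err
	}
	b, err := os.ReadFile(filepath.Join(dir, "role.prom"))
	return string(b), err
}

func main() {
	content, err := roundTrip()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(content)
}
```

The temporary file is created in the target directory on purpose: a rename is only atomic when source and destination are on the same filesystem.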

Filtering enabled collectors

The node_exporter will expose all metrics from enabled collectors by default. This is the recommended way to collect metrics to avoid errors when comparing metrics of different families.

For advanced use the node_exporter can be passed an optional list of collectors to filter metrics. The collect[] parameter may be used multiple times. In Prometheus configuration you can use this syntax under the scrape config.

  params:
    collect[]:
      - foo
      - bar

This can be useful for having different Prometheus servers collect specific metrics from nodes.
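For ad-hoc checks outside Prometheus, the same filter can be expressed as repeated collect[] query parameters on the metrics URL. The following sketch only builds such a URL; the function name metricsURL, the host localhost:9100, and the /metrics path are assumptions for illustration, while cpu and meminfo are collector names from the tables above.

```go
package main

import (
	"fmt"
	"net/url"
)

// metricsURL builds a scrape URL that repeats the collect[] parameter once
// per requested collector. Hypothetical helper for illustration.
func metricsURL(host string, collectors ...string) string {
	u := url.URL{Scheme: "http", Host: host, Path: "/metrics"}
	q := url.Values{}
	for _, c := range collectors {
		q.Add("collect[]", c)
	}
	// Encode percent-escapes the brackets in the parameter name.
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(metricsURL("localhost:9100", "cpu", "meminfo"))
}
```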

Development building and running

Prerequisites:

Go compiler
RHEL/CentOS: glibc-static package.

Building:

git clone https://github.com/prometheus/node_exporter.git
cd node_exporter
make build
./node_exporter <flags>

To see all available configuration flags:

./node_exporter -h

Running tests

make test

TLS endpoint

** EXPERIMENTAL **

The exporter supports TLS via a new web configuration file.

./node_exporter --web.config.file=web-config.yml

See the exporter-toolkit web-configuration for more details.

node_exporter's Issues

Megacli collector should move to textfile collector

We don't want to shell out (particularly to things that require root), so these should move to the textfile collector. We should provide scripts or binaries that can be run from cron

Docker container versioning not implemented

Hi,
I am using node_exporter and I would like to use the official prom/node_exporter container from docker hub.
The versioning system is not implemented and currently prom/node_exporter:latest contains node_exporter v0.12.0-rc1. Is it possible to push containers with a sane versioning system?

Does not work so great inside a docker container

Currently it reports /etc/hostname and /etc/hosts and /etc/resolv.conf as mount points inside the docker container, and then reports / as just the filesystem that the docker container is mounted on.

It would be nice to be able to report the mounts from the host machine, or at least have some configuration option where you can do that similar to cAdvisor.

Undeprecate attributes module

I'm monitoring a set of aurora clusters. Because of how we move slaves between them, the only source of truth for which cluster a given slave belongs to is a puppet configuration run. I need a way to annotate the metrics exposed from node exporter with that info -- it's not part of the metric uri that Prometheus saves for instance.

Is this module actually problematic for technical reasons? If so, is there a better way of doing this?

Improve default collector list selection

As some collectors are not available on all operating systems, they should not be part of the collectors.enabled flag by default. This list needs to be dynamic and remove any collectors which are not available/compiled.

go 1.5.2 C source files not allowed when not using cgo or SWIG:

I tried to build the node_exporter master branch and got this error. Any hints?
go build node_exporter.go

package runtime: C source files not allowed when not using cgo or SWIG: atomic_amd64x.c defs.c float.c heapdump.c lfstack.c malloc.c mcache.c mcentral.c mem_linux.c mfixalloc.c mgc0.c mheap.c msize.c os_linux.c panic.c parfor.c proc.c runtime.c signal.c signal_amd64x.c signal_unix.c stack.c string.c sys_x86.c vdso_linux_amd64.c

also with make

make
mkdir -p /home/fk/work/src/go/src/github.com/prometheus/node_exporter/.build/gopath/src/github.com/prometheus/
ln -s /home/fk/work/src/go/src/github.com/prometheus/node_exporter /home/fk/work/src/go/src/github.com/prometheus/node_exporter/.build/gopath/src/github.com/prometheus/node_exporter
GOPATH=/home/fk/work/src/go/src/github.com/prometheus/node_exporter/.build/gopath /usr/local/go/bin/go get -d
package .
    imports runtime: C source files not allowed when not using cgo or SWIG: atomic_amd64x.c defs.c float.c heapdump.c lfstack.c malloc.c mcache.c mcentral.c mem_linux.c mfixalloc.c mgc0.c mheap.c msize.c os_linux.c panic.c parfor.c proc.c runtime.c signal.c signal_amd64x.c signal_unix.c stack.c string.c sys_x86.c vdso_linux_amd64.c
Makefile.COMMON:93: recipe for target 'dependencies-stamp' failed
make: *** [dependencies-stamp] Error 1

regards f0

HTTPS

It would be good to be able to expose the node_exporter metrics over HTTPS with basic auth. Bonus points for client certificate verification.

For now will probably use a reverse proxy for this but it does seem like something that makes sense to have baked in.

stat_linux should move to ConstMetric

Multiple simultaneous scrapes can result in bad data.

Remove basic auth from node exporter

As we've now clarified that our general stance is that exporter auth should be done via a reverse proxy, we should remove the basic auth support from node exporter.

/proc/stat: absolute values vs normalization by USER_HZ

Would you consider reporting the CPU-related info from /proc/stat in ticks / sysconf(_SC_CLK_TCK) instead of ticks? It would make comparisons between VMs and across kernels more robust.
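The normalization asked for here is a single division. A minimal sketch, assuming a tick rate of 100 (a common value, but it should be read from sysconf(_SC_CLK_TCK) on the target system); the function name ticksToSeconds is hypothetical.

```go
package main

import "fmt"

// assumedUserHZ is a commonly seen tick rate. It is an assumption here; the
// real value comes from sysconf(_SC_CLK_TCK) on the machine being measured.
const assumedUserHZ = 100.0

// ticksToSeconds converts a /proc/stat CPU counter from ticks to seconds.
func ticksToSeconds(ticks uint64, hz float64) float64 {
	return float64(ticks) / hz
}

func main() {
	// 250 ticks at 100 ticks per second.
	fmt.Println(ticksToSeconds(250, assumedUserHZ))
}
```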

The ntp package has no license

Hi,

I noticed that the ntp package that is used by node_exporter has no license, and so technically, it is not legal to use it at all. I have already filed a bug with the author: beevik/ntp#1

Thanks

high io load after running for a while

I have been running the node_exporter version 0.8.0 for a while now. Maybe 2 weeks, and I am now seeing a pretty high IO load. If I run iotop then I am seeing it reading megabytes of data in a matter of seconds. It is basically using 99% of the io on the system.

If I run lsof -p $PID then I see that there are a lot of open file descriptors to /proc/stat and /proc/$PID/limits and /proc/meminfo and /proc/$PID/stat

if this helps
sudo lsof -p 25359 | awk '{print $9}' | sort | uniq -c | sort -n

      1 *:7301
      1 /dev/null
      1 /proc/diskstats
      1 /tmp/node_exporter.ip-XXXX.root.log.INFO.20150501-131008.25359
      1 /usr/lib64/ld-2.19.so
      1 /usr/lib64/libc-2.19.so
      1 /usr/lib64/libnss_files-2.19.so
      1 /usr/lib64/libpthread-2.19.so
      1 /var/lib/prometheus/node_exporter
      1 NAME
      1 anon_inode
      1 ip-XXXX.compute.internal:7301->ip-XXXX.compute.internal:49115
      2 /
      2 /proc/25359/mounts
      2 /proc/25359/net/dev
      2 socket
     12 /proc/25359/limits
     17 /proc/stat
     19 /proc/25359/stat
     20 /proc/meminfo
     47 can't identify protocol

Build failure on CentOS 7

The Makefile doesn't find go on CentOS 7.

I worked around the issue by hacking Makefile.COMMON:

-GOCC       ?= $(GOROOT)/bin/go
+GOCC       ?= $(GOROOT)/bin/linux_amd64/go

Not sure how to do this generically.

Add OSX metrics

This is most likely due to lack of /proc under OSX system. The only collectors that worked for me were textfile, time and mdadm (hehe).

Quiet mode for node_exporter

Is there any option for quiet mode?

Node exporter sends a lot of data to syslog...

Vendor all dependencies

In order to have reproducible builds, we need to vendor all dependencies.

sar collector

IIUC, sar can give you a lot of metrics.

Proposal: Consolidate configuration options

There are currently two ways collectors can be configured: by using CLI parameters or by reading from a config file. I'd propose to settle on a single option to make it easier for users to use node_exporter.

The collectors using the config files are:

attributes
megacli

Collectors using parameters:

ntp
filesystem
diskstats

Additionally, the selection of collectors itself also uses parameters.

Given the majority of configuration options already happens using parameters, I propose to replace the config file options with parameters for now. Additionally, in order to make it easier to read all config options, all parameters get namespaced with their collector name. The namespacing scheme with a dot will be used to consolidate node_exporter with other prometheus components like pushgateway or the server which also use a dot.

Example:

Usage of node_exporter:
  -attributes.list="": comma separated list of static attributes
  -filesystem.ignoredMountPoints="^/(sys|proc|dev)($|/)": Regexp of mount points to ignore for filesystem collector.
  -ntp.server="": NTP server to use for ntp collector.
  -megacli.command="megacli": Command to execute to retrieve information.

@u-c-l @brian-brazil @discordianfish

runit: add service uptime

Right now node exporter only tracks desired state, normal state and state (btw, what's the difference?). It's possible for a service to be restarted between scrapes, and no one will know about it from node exporter.

Export service statuses from systemd

Like the supervisord and runit collectors, it would be great to export service status from systemd. This shouldn't be too hard, there is already a go-systemd library here:
https://github.com/coreos/go-systemd

Which can be used to get service information/status over dbus.

export /proc/net/nf_conntrack

This would allow us to monitor the number of TCP and UDP connections and bandwidth

thanks

Test Cases on OSX will always pass when test case should fail

Whenever I run make test on node_exporter on OSX, the test cases will always pass, even if I purposely break a test case.

Example

$ cd /tmp
$ git clone https://github.com/prometheus/node_exporter.git
$ cd node_exporter

Change the test case loadavg_linux_test.go from 0.21 to 0.25

func TestLoad(t *testing.T) {
    load, err := parseLoad("0.21 0.37 0.39 1/719 19737")
    if err != nil {
        t.Fatal(err)
    }
    if want := 0.25; want != load {
        t.Fatalf("want load %f, got %f", want, load)
    }
}

Run the test cases using make:

$ make test
mkdir -p /private/tmp/node_exporter/.build/gopath/src/github.com/prometheus/
ln -s /private/tmp/node_exporter /private/tmp/node_exporter/.build/gopath/src/github.com/prometheus/node_exporter
GOPATH=/private/tmp/node_exporter/.build/gopath /usr/local/go/bin/go get -d
touch dependencies-stamp
GOPATH=/private/tmp/node_exporter/.build/gopath /usr/local/go/bin/go test ./...
?       _/tmp/node_exporter [no test files]
ok      _/tmp/node_exporter/collector   0.082s
?       _/tmp/node_exporter/collector/ganglia   [no test files]

My System Settings

$ which go
/usr/local/go/bin/go

$ go version
go version go1.4.2 darwin/amd64

$ uname -a
Darwin XXX.local 14.5.0 Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64

Disk stats for bytes_read and bytes_written

Currently, node_disk_sectors_read and node_disk_sectors_written are exported. However, what use are these, considering that the sector size can differ between disks?

Would it perhaps not be a better idea to export node_disk_read_bytes and node_disk_written_bytes?
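For context, the Linux kernel documentation describes the sector counters in /proc/diskstats as standard 512-byte sectors, independent of the device's own sector size, so a conversion to bytes is a fixed multiplication. The sketch below illustrates that arithmetic only; the function name sectorsToBytes is hypothetical and this is not the exporter's code.

```go
package main

import "fmt"

// diskstatsSectorSize is the unit used by the sector counters in
// /proc/diskstats on Linux, per the kernel's I/O statistics documentation.
const diskstatsSectorSize = 512

// sectorsToBytes converts a /proc/diskstats sector count to bytes.
func sectorsToBytes(sectors uint64) uint64 {
	return sectors * diskstatsSectorSize
}

func main() {
	fmt.Println(sectorsToBytes(8))
}
```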

Windows support

Are there any plans to support windows nodes?

Network exporter should drop metrics for interfaces which don't exist anymore.

See #60 (comment)

Collect filesystem read-only status

The filesystem exporter should collect a boolean indicating if the filesystem is read-only or read-write.

This is a placeholder issue since I'll implement the feature soon.

Collect Linux vmstat

/proc/vmstat has useful metrics to find out what the kernel memory system is up to.

See: http://www.linuxinsight.com/proc_vmstat.html

The runit package has no license

Like the ntp package, the runit package has no explicit license. Although I presume you can fix that one easily :-)

Prefix Ganglia-exported metrics

Prometheus metrics should be appropriately namespaced. Native node_exporter metrics are exported with a node_ prefix, but Ganglia ones are exported as-is. We should probably prefix them with ganglia_.

filter for node_filesystem_* and node_network_*

We're running node_exporter on machines on which we use Docker. There are a lot of docker containers that get spawned and destroyed. Over time, node_exporter accumulates hundreds of entries for node_filesystem_*{filesystem="/var/lib/docker/devicemapper/mnt/*"} and node_network_*{device="veth*"}. I think that could be a reason why our prometheus is using a lot of memory. I see two possible solutions: One is to filter those values completely, the other is to forget about old entries after some time. What do you think?

Crash while parsing /proc/stat

panic: runtime error: slice bounds out of range

goroutine 35 [running]:
github.com/prometheus/node_exporter/collector.(*statCollector).Update(0xc20806e200, 0xc2080526c0, 0x0, 0x0)
        /usr/local/src/node_exporter/.deps/gopath/src/github.com/prometheus/node_exporter/collector/stat.go:101 +0x74a
main.Execute(0x8aef80, 0x4, 0x7f716c18a4d0, 0xc20806e200, 0xc2080526c0)
        /usr/local/src/node_exporter/node_exporter.go:89 +0x75
main.func·001(0x8aef80, 0x4, 0x7f716c18a4d0, 0xc20806e200)
        /usr/local/src/node_exporter/node_exporter.go:62 +0x5b
created by main.NodeCollector.Collect
        /usr/local/src/node_exporter/node_exporter.go:64 +0x1c7

Kernel 2.6.32-25-pve provides 8 instead of 9 values for each CPU.
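A parser that tolerates a varying number of columns avoids this kind of out-of-range slice. The following is a minimal sketch of that idea, not the actual fix in stat.go: the function name parseCPULine and the choice to silently skip missing trailing columns are assumptions, and the column order follows proc(5).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuFieldNames lists the leading per-CPU columns of /proc/stat in order.
// Older kernels emit fewer trailing columns.
var cpuFieldNames = []string{
	"user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal", "guest",
}

// parseCPULine parses one "cpu..." line and returns only the columns that are
// actually present, instead of slicing a fixed width. Hypothetical helper.
func parseCPULine(line string) (string, map[string]uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 || !strings.HasPrefix(fields[0], "cpu") {
		return "", nil, fmt.Errorf("not a cpu line: %q", line)
	}
	values := fields[1:]

	// Clamp to the smaller of "columns present" and "columns known".
	n := len(values)
	if n > len(cpuFieldNames) {
		n = len(cpuFieldNames)
	}

	out := make(map[string]uint64, n)
	for i := 0; i < n; i++ {
		v, err := strconv.ParseUint(values[i], 10, 64)
		if err != nil {
			return "", nil, fmt.Errorf("field %s: %w", cpuFieldNames[i], err)
		}
		out[cpuFieldNames[i]] = v
	}
	return fields[0], out, nil
}

func main() {
	// Eight values, as reported for the kernel in this issue (sample numbers).
	name, vals, err := parseCPULine("cpu0 1 2 3 4 5 6 7 8")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, len(vals), vals["steal"])
}
```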

runit exporter does not expose service status

After activating the runit exporter, only the node_exporter_scrape_duration_seconds* metric gets populated with runit values.
No service statuses are exposed.
This happens in version 94d2259, other versions were not tested.
System is a debian testing with runit 2.1.2-3.
Also tested was a debian wheezy with runit 2.1.1-6.2

node_network_receive_bytes not collected on Centos 5.9 machines

Our production is a mix of Ubuntu 12.04 and Centos 5.9.
The kernel version on the Centos is 2.6.18.

Please advise

Thanks

Consolidate metric names to adhere to our guidelines

There are currently many metrics which don't follow our naming guidelines, e.g. they're missing _total suffixes or units like _bytes.

This will be a breaking change, but for the better. As node_exporter is one of the most popular exporters, it should also lead by example.

@juliusv @brian-brazil

Is --net=host mandatory ? iptables problem

Hello,

As defined here debops/ansible-ferm#63 I use ferm to manage iptables.
As soon as I start a container with the --net=host parameter, I can not update my iptables rules anymore. I get the error :

stderr: iptables v1.4.21: host/network `' not found

Is there a way to start the node_exporter container without --net=host ?

Thanks

Document how to control where metrics are stored on disk

It looks like data is written to /tmp by default.

Is there a way to change this? I did not see any relevant flag in node_exporter -h.

Not compiling on Freebsd, problem with filesystem_freebsd.go

I'm trying to cross-compile to the freebsd archs, and they're all failing out. All the code and filenames seems to be correct, so I'm not sure what's going wrong. I'm using the official Go 1.5.1 package.

export GOOS=freebsd
export GOARCH=amd64
export GO15VENDOREXPERIMENT=1
go build github.com/prometheus/node_exporter

# github.com/prometheus/node_exporter/collector
../gopath/src/github.com/prometheus/node_exporter/collector/filesystem_common.go:34: undefined: defIgnoredMountPoints
../gopath/src/github.com/prometheus/node_exporter/collector/filesystem_common.go:62: undefined: filesystemLabelNames
../gopath/src/github.com/prometheus/node_exporter/collector/filesystem_common.go:68: undefined: filesystemLabelNames
../gopath/src/github.com/prometheus/node_exporter/collector/filesystem_common.go:74: undefined: filesystemLabelNames
../gopath/src/github.com/prometheus/node_exporter/collector/filesystem_common.go:80: undefined: filesystemLabelNames
../gopath/src/github.com/prometheus/node_exporter/collector/filesystem_common.go:86: undefined: filesystemLabelNames
../gopath/src/github.com/prometheus/node_exporter/collector/filesystem_common.go:100: c.GetStats undefined (type *filesystemCollector has no field or method GetStats)

node-exporter doesn't export metrics for all filesystems

Hi,

having this filesystem here:

$ mount|grep docker
/dev/mapper/ubuntu--vg-docker on /var/lib/docker type btrfs (rw)

$ grep docker /proc/mounts 
/dev/mapper/ubuntu--vg-docker /var/lib/docker btrfs rw,relatime,nospace_cache 0 0
/dev/mapper/ubuntu--vg-docker /var/lib/docker/btrfs btrfs rw,relatime,nospace_cache 0 0

But the node_exporter doesn't expose this mountpoint:

node_filesystem_free{env="prod",filesystem="/.dockerinit",instance="http://1.2.3.4:9080/metrics",job="node_exporter"}  350100799488.000000
node_filesystem_free{env="prod",filesystem="/etc/hosts",instance="http://1.2.3.4:9080/metrics",job="node_exporter"}  350100799488.000000
node_filesystem_free{env="prod",filesystem="/",instance="http://1.2.3.4:9080/metrics",job="node_exporter"} 350100799488.000000
node_filesystem_free{env="prod",filesystem="/etc/resolv.conf",instance="http://1.2.3.4:9080/metrics",job="node_exporter"}  120811970560.000000
node_filesystem_free{env="prod",filesystem="/etc/hostname",instance="http://1.2.3.4:9080/metrics",job="node_exporter"} 350100799488.000000

node-overview.html : parse error at char 72: unknown escape sequence U+0064 'd'

Hello,

I don't really plan to use the default node.html to monitor my system (I'll rely on promdash and custom dashboards instead), but I wanted to let you know that when I browse to http://server_ip:9090/consoles/node.html and click on the node link, I get the following error:

error executing template __console_/node-overview.html: template: __console_/node-overview.html:38:109: executing "__console_/node-overview.html" at <query>: error calling query: parse error at char 72: unknown escape sequence U+0064 'd'

NTP collector sometimes reports high drift

Hi,

Sometimes the NTP collector returns a drift of around -3643548636.539542 seconds, which seems to refer to ~1900.
We're using https://github.com/beevik/ntp/blob/master/ntp.go which apparently depends on the UDP stack to make sure all udp packets are complete. The working theory right now is that some udp responses are truncated, causing the drift to be off.

web.telemetry-path not correctly set when starting via systemd

I want to create a setup where I guard node_exporter behind an nginx reverse proxy. Since this proxy should be directory-based rather than vhost-based, I created the following proxy rule in nginx:

location /tirn-node {
           proxy_pass http://192.168.1.9:9100;
           include /etc/nginx/auth-basic.conf;
        }

So when you access /tirn-node, you should get the landing page of node_exporter, and when you access /tirn-node/metrics, you get the metrics, which are then scraped by Prometheus.
To achieve this, I start node_exporter with the following CLI flag:
-web.telemetry-path="/tirn-node/metrics", which works as expected when I start node_exporter by hand. The result is:

I call host.tld/tirn-node in my browser
The landing page shows up
The link leads to host.tld/tirn-node/metrics
I click on the link and the metrics are shown

But now I want to start it with systemd.
The corresponding line in the unit definition is:
ExecStart=/home/simonszu/go/src/github.com/prometheus/node_exporter/node_exporter -web.telemetry-path="/tirn-node/metrics"
I have checked with ps that this exact line is called by systemd.

But the behaviour is now:

I call host.tld/tirn-node in my browser.
The landing page shows up.
The link leads to host.tld/tirn-node (notice: no trailing /metrics)
I try to manually access host.tld/tirn-node/metrics
I am presented with the landing page, and the address bar of my browser shows "host.tld/tirn-node", without /metrics.

So I am not able to access the metrics when I start node_exporter via systemd, even though it has exactly the same CLI flags as when I started it manually and could access the metrics.

Note: When I bypass the proxy and access node_exporter directly, I see the same behaviour, except that all occurrences of host.tld/tirn-node are replaced by the bare ip:port URLs (which is what I would expect without the proxy), so I really think this has something to do with node_exporter itself.

I am a little clueless about what to do now, so I'm filing this issue in the hope that you have an idea.

Error on make

I'm getting the following error while trying to build:

GOROOT=/usr/local/go GOPATH=/home/f8366545/Projects/node_exporter-0.8.0/.build/gopath /usr/local/go/bin/go build  -o node_exporter
# github.com/prometheus/client_golang/text
.build/gopath/src/github.com/prometheus/client_golang/text/proto.go:30: undefined: ext.WriteDelimited
make: *** [node_exporter] Error 2

I'm not a gopher myself, so it may be related to my env.

Filesystem exporter should drop metrics for filesystems which don't exist anymore.

See #60 (comment)

node_exporter cannot parse net stats with big numbers in them

I see node_exporter choking on one host:

ERROR: netdev failed after 0.000096s: Couldn't get netstats: Invalid line in /proc/net/dev:     lo:181005683161 224490607    0    0    0     0          0         0 181005683161 224490607    0    0    0     0       0          0

Looking at the code, I think it is because there is no space between the colon and the number. I presume this happens when the counters have more than 10 digits. This is /proc/net/dev on the same host:

$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:181033261002 224517415    0    0    0     0          0         0 181033261002 224517415    0    0    0     0       0          0
  eth0:68210035552 520993275    0    0    0     0          0         0 9315587528 43451486    0    0    0     0       0          0
  eth1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  sit0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

Add /proc/net/rpc stats

Just had some interesting times diagnosing a problem where this data would prove helpful to alert off of in the future. This issue is a note to myself to implement it.

Be smarter about ignoring certain special filesystems

When there's a Docker container running on a host and the node exporter is not run as root, the filesystem collector fails with the following error:

INFO[0320] ERROR: filesystem failed after 0.002522s: Statfs on /var/lib/docker/aufs returned permission denied  file=node_exporter.go line=97

df is smarter about which filesystems to show usage for. It first does a stat, then only a statfs for some of them that are actually "relevant". Determine what it does exactly and perhaps use the same strategy.

Make it golint and go vet clean.

The current code contains many golint violations and even a few go vet violations.

Since new collectors share boilerplate code with existing ones, contributors copy those violations into new code as well. The badness spreads.

Missing docs for config file

Hi,

The configuration file is not really documented anywhere I could find, and a quick look at the code did not really help. A small reference to it would be very helpful.

report values of /proc/net/snmp

netstat -s reports on "active connection openings" and "passive connection openings", which is interesting to us since we just had issues because one of our apps wasn't properly using HTTP keep-alive and thus it created a ton of new TCP connections each second. We could see that with those values, but it seems like they are not reported atm. The source that netstat uses is /proc/net/snmp.

Default to not logging to /tmp

A trap for new users is that node_exporter defaults (due to glog) to logging to /tmp but doesn't seem to set any sensible rotation limits. Since /tmp can be quite small and/or a tmpfs, this can be a sneaky problem, because log files end up in an unexpected place.

I think we should default -logtostderr to true, since the vast majority of users are going to want to (1) run it and see if it accepts requests and then (2) run it and forget about it (if it goes down, Prometheus will tell us).
