cloudalchemy / ansible-node-exporter
Provision basic metrics exporter for prometheus monitoring tool
License: MIT License
The dash ('-') no longer seems to be allowed in collection names, although it was apparently allowed previously:
❯ ansible-galaxy collection install cloudalchemy.node-exporter
Process install dependency map
ERROR! Invalid collection name 'cloudalchemy.node-exporter', name must be in the format <namespace>.<collection>. Please make sure namespace and collection name contains characters from [a-zA-Z0-9_] only.
Tested with Ansible version 2.9.16, with ansible_python_interpreter set to both python2 and python3.
What happened?
It seems this task does not work on a Debian 9 host with SELinux enabled.
- name: Allow node_exporter port in SELinux on RedHat OS family
  seport:
    ports: "{{ node_exporter_web_listen_address.split(':')[-1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version_compare('2.4', '>=')
    - ansible_selinux.status == "enabled"
TASK [cloudalchemy.node-exporter : Allow node_exporter port in SELinux on RedHat OS family] **************************************************************************************
Monday 28 December 2020 12:09:27 +0100 (0:00:01.755) 0:00:24.342 *******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: No module named seobject
fatal: [debian8-server]: FAILED! => {"changed": false, "msg": "Failed to import the required Python library (policycoreutils-python) on debian8-server's Python /usr/bin/python. Please read module documentation and install in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"}
However, the seport module is not tested against Debian (https://docs.ansible.com/ansible/2.9/modules/seport_module.html#notes).
I'm not sure, but probably the best fix is to skip the task on Debian by adding:
- not ansible_distribution | lower == "debian"
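With the proposed extra condition added, the task would look roughly like this (everything except the last when clause mirrors the existing task):

```yaml
- name: Allow node_exporter port in SELinux on RedHat OS family
  seport:
    ports: "{{ node_exporter_web_listen_address.split(':')[-1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version_compare('2.4', '>=')
    - ansible_selinux.status == "enabled"
    - not ansible_distribution | lower == "debian"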
If you agree I can do a PR.
Environment
Role version:
cloudalchemy.node-exporter (0.22.0)
Ansible version information:
ansible 2.9.14
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/my/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.18 (default, Aug 4 2020, 11:16:42) [GCC 9.3.0]
# dpkg -l | grep -i selinux
ii checkpolicy 2.6-2 amd64 SELinux policy compiler
ii libselinux1:amd64 2.6-3+b3 amd64 SELinux runtime shared libraries
ii libsemanage-common 2.6-2 all Common files for SELinux policy management libraries
ii libsemanage1:amd64 2.6-2 amd64 SELinux policy management library
ii libsepol1:amd64 2.6-2 amd64 SELinux library for manipulating binary security policies
ii policycoreutils 2.6-3 amd64 SELinux core policy utilities
ii policycoreutils-dev 2.6-3 amd64 SELinux core policy utilities (development utilities)
ii policycoreutils-python-utils 2.6-3 amd64 SELinux core policy utilities (Python utilities)
ii python-selinux 2.6-3+b3 amd64 Python bindings to SELinux shared libraries
ii python3-selinux 2.6-3+b3 amd64 Python3 bindings to SELinux shared libraries
ii python3-semanage 2.6-2 amd64 Python3 bindings for SELinux policy management
ii python3-sepolgen 2.6-3 all Python3 module used in SELinux policy generation
ii python3-sepolicy 2.6-3 amd64 Python binding for SELinux Policy Analyses
ii selinux-basics 0.5.6 all SELinux basic support
ii selinux-policy-default 2:2.20161023.1-9 all Strict and Targeted variants of the SELinux policy
ii selinux-policy-dev 2:2.20161023.1-9 all Headers from the SELinux reference policy for building modules
ii selinux-utils 2.6-3+b3 amd64 SELinux utility programs
When node_exporter_textfile_dir is changed to a different directory, but node_exporter_enabled_collectors is left at its default, node_exporter ends up with the wrong configuration in the systemd unit file. This is probably because the default node_exporter_enabled_collectors configuration takes the default node_exporter_textfile_dir value and not the overwritten one.
Possible fixes:
- Hard-code node_exporter_textfile_dir and don't allow any changes.
- Detect when node_exporter_textfile_dir has a different value than the default one and change node_exporter_enabled_collectors accordingly.
- Detect when node_exporter_textfile_dir has a different value than the default one and fail the role with a user notification to accommodate this change in a custom node_exporter_enabled_collectors.
Either way, we should probably check for this issue in tasks/preflight.yml, since that's what this file is for.
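A sketch of what such a preflight check could look like; the default directory value and the crude string-containment test are my assumptions, not the role's actual code:

```yaml
# tasks/preflight.yml (sketch only)
- name: Fail when textfile dir is customized but collectors were left at default
  fail:
    msg: >-
      node_exporter_textfile_dir is set to {{ node_exporter_textfile_dir }},
      but no entry in node_exporter_enabled_collectors points at it. Add a
      textfile collector entry with a matching directory.
  when:
    - node_exporter_textfile_dir != "/var/lib/node_exporter"
    - node_exporter_textfile_dir not in (node_exporter_enabled_collectors | string)
```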
cc: @SuperQ
What is missing?
A way to pass arbitrary command line flags to node_exporter
Why do we need it?
node_exporter has more parameters than those exposed by the variables in this role. An "escape hatch" to pass any flags would be very useful. Something like:
node_exporter_flags: ["--web.telemetry-path=/foo", "--web.max-requests=99"]
In fact, with this variable in place, node_exporter_web_listen_address could be deprecated IMO, as it would be adequately covered by the more general node_exporter_flags.
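For illustration, a sketch of how the systemd unit template could render such a variable (the template fragment and filename are hypothetical, not the role's current template):

```jinja
{# templates/node_exporter.service.j2, fragment (hypothetical) #}
ExecStart=/usr/local/bin/node_exporter \
{% for flag in node_exporter_flags | default([]) %}
    {{ flag }} \
{% endfor %}
    --web.listen-address={{ node_exporter_web_listen_address }}
```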
Hi,
--check is not working with this playbook
fatal: [vpnssl-d01-mon]: FAILED! => {"msg": "The conditional check '(not __node_exporter_is_installed.stat.exists) or (__node_exporter_current_version_output.stderr_lines[0].split(\" \")[2] != node_exporter_version)' failed. The error was: error while evaluating conditional ((not __node_exporter_is_installed.stat.exists) or (__node_exporter_current_version_output.stderr_lines[0].split(\" \")[2] != node_exporter_version)): 'dict object' has no attribute 'stderr_lines'\n\nThe error appears to be in '/home/bdupuis/git/vpnssl/ansible/roles/cloudalchemy.node-exporter/tasks/install.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Install dependencies\n ^ here\n"}
I think we should add check_mode: no to "Check if node_exporter is installed" and the other tasks that register results.
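For example (a sketch; the register name comes from the error above, the path and the rest of the task body are assumed):

```yaml
- name: Check if node_exporter is installed
  stat:
    path: /usr/local/bin/node_exporter
  register: __node_exporter_is_installed
  # Run even under --check so the version conditional has data to work with.
  check_mode: false
```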
Best regards
With a large number of hosts, the task "Get checksum for amd64 architecture" usually fails for at least some of them with the following error:
fatal: [HOST1]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}
fatal: [HOST2]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}
fatal: [HOST3]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}
fatal: [HOST4]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}
This appears to be because the checksum is fetched for each node (which makes sense, they don't all share the architecture), but is fetched by the controller node, so for 40 amd64 nodes, the controller will fetch the amd64 checksums 40 times.
For now I have locally set run_once: true on that task in my copy of the role under ~/.ansible/..., but I think this needs a proper solution.
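One possible direction, sketched here as an assumption rather than the role's actual code: fetch the checksum list once per play. Note that set_fact with run_once only defines the fact on the host that ran it, so other hosts would need to read it via hostvars (and per-architecture handling would still be needed for mixed fleets):

```yaml
# Sketch only: task and variable names are illustrative.
- name: Get checksum list from github (once per play)
  set_fact:
    __checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
  run_once: true

- name: Use the shared checksum list on every host
  debug:
    msg: "{{ hostvars[ansible_play_hosts[0]]['__checksums'] | length }} checksum lines fetched"
```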
Installation fails in Ansible AWX 9.1.0
Environment
/tmp/node_exporter-0.18.1.linux-amd64.tar.gz => exists on the target node after run
TASK [cloudalchemy.node-exporter : Download node_exporter binary to local folder] ***
ok: [elab-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-spine2 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-spine1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-leaf3 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-leaf4 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-extleaf1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-egress-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-egress-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]
TASK [cloudalchemy.node-exporter : Unpack node_exporter binary] ****************
fatal: [elab-spine2 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-spine1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf3 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf4 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-extleaf1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-egress-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-egress-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
{
  "group": "ansible-cumulus",
  "uid": 999,
  "url": "https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz",
  "changed": false,
  "elapsed": 0,
  "dest": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",
  "state": "file",
  "gid": 997,
  "mode": "0644",
  "invocation": {
    "module_args": {
      "directory_mode": null,
      "force": false,
      "remote_src": null,
      "path": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",
      "owner": null,
      "follow": false,
      "client_key": null,
      "group": null,
      "use_proxy": true,
      "unsafe_writes": null,
      "serole": null,
      "content": null,
      "validate_certs": true,
      "setype": null,
      "client_cert": null,
      "timeout": 10,
      "url_password": null,
      "dest": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",
      "selevel": null,
      "force_basic_auth": false,
      "sha256sum": "",
      "http_agent": "ansible-httpget",
      "regexp": null,
      "src": null,
      "url": "https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz",
      "checksum": "sha256:b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b51b72424",
      "seuser": null,
      "headers": null,
      "delimiter": null,
      "mode": null,
      "url_username": null,
      "attributes": null,
      "backup": null,
      "tmp_dest": null
    }
  },
  "owner": "ansible-cumulus",
  "checksum_src": null,
  "size": 8083296,
  "checksum_dest": null,
  "msg": "file already exists",
  "_ansible_no_log": false,
  "attempts": 1,
  "_ansible_delegated_vars": {
    "ansible_host": "{{ inventory_hostname }}.{{ host_domain }}"
  }
}
Fail when at least one collector appears in both lists, node_exporter_enabled_collectors and node_exporter_disabled_collectors.
What if we used the built-in systemd feature called "socket activation" for starting node-exporter?
That way node_exporter would not be started at boot; systemd would invoke it on the first Prometheus scrape. The end result is one process fewer when it is not used.
@SuperQ what do you think about it?
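A minimal sketch of the socket unit this would require (the unit name and port are assumptions; note that node_exporter itself would also need to support systemd socket activation, which should be verified first):

```ini
# node_exporter.socket (hypothetical)
[Socket]
ListenStream=9100

[Install]
WantedBy=sockets.target
```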
$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
$ cat /etc/systemd/system/node_exporter.service
#
# DO NOT EDIT THIS FILE It is automatically generated by Ansible.
#
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/data/application/node_exporter/node_exporter \
    --log.level=error \
    --web.telemetry-path=/metrics \
    --collector.tcpstat \
    --collector.processes \
    --collector.netdev.ignored-devices=^(tap|cali|docker|veth|tun).*$ \
    --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs|rootfs|nfs)$ \
    --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|boot|run.*|var/lib/kubelet/.+|var/lib/docker/.+|data/docker/overlay)($|/) \
    --collector.netstat.fields=^(.*_(InErrors|InErrs)|Ip_Forwarding|Ip(6|Ext)_(InOctets|OutOctets)|Icmp6?_(InMsgs|OutMsgs)|TcpExt_(Listen.*|Syncookies.*)|Tcp_(ActiveOpens|PassiveOpens|RetransSegs|CurrEstab)|Udp6?_(InDatagrams|OutDatagrams|NoPorts))$ \
    --collector.diskstats.ignored-devices=^(ram|loop|fd|nvme\d+n\d+p|tmpfs|md|up-|sr|rootfs)(\d*)$ \
    --collector.netclass.ignored-devices=^(tap|cali|docker|veth|tun).*$ \
    --collector.textfile.directory=/data/application/node_exporter/text_metrics \
    --no-collector.mdadm \
    --web.listen-address=0.0.0.0:9100
SyslogIdentifier=node_exporter
Restart=always
PrivateTmp=yes
ProtectHome=yes
NoNewPrivileges=yes
ProtectSystem=full
Nice=0
[Install]
WantedBy=multi-user.target
$ systemctl status node_exporter.service -l
● node_exporter.service - Prometheus Node Exporter
Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
Active: inactive (dead)
Feb 25 16:56:33 systemd[1]: Cannot add dependency job for unit node_exporter.service, ignoring: Unit not found.
Commit 0d9f503 changed the role name on Ansible Galaxy from node-exporter to node_exporter. I understand that role names with dashes are deprecated, but this change is disruptive, and I don't even see it mentioned in the commit message! It is not only breaking projects using the latest version: even those pinned to a release tag are now failing to download the role. The documentation is now out of sync as well. What is the maintainers' opinion on this?
What happened?
As per this part of the README, the textfile directory should default to /var/lib/node_exporter.
However, when running the following playbook:
- hosts: ubuntu
  vars:
    node_exporter_port: 9100
    node_exporter_web_listen_address: "0.0.0.0:{{ node_exporter_port }}"
    node_exporter_enabled_collectors: [textfile]
  tasks:
    - name: Set up node exporter
      import_role:
        name: cloudalchemy.node-exporter
      tags: node_exporter
the systemd unit file ends up looking like this:
#
# Ansible managed
#
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
[Service]
Type=simple
User=node-exp
Group=node-exp
ExecStart=/usr/local/bin/node_exporter \
    --collector.textfile \
    --web.listen-address=0.0.0.0:9100
SyslogIdentifier=node_exporter
Restart=always
PrivateTmp=yes
ProtectHome=yes
NoNewPrivileges=yes
ProtectSystem=strict
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes
[Install]
WantedBy=multi-user.target
which is missing the --collector.textfile.directory option, making Node Exporter default to "", which is not what the README of this role specifies.
Did you expect to see something different?
Yes. I expected the Node Exporter to be set up to have textfile look in /var/lib/node_exporter.
How to reproduce it (as minimally and precisely as possible):
See above.
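As a workaround until the role handles this, the directory can be passed explicitly in the collector list (this dict form is the same one that appears in other reports against this role):

```yaml
node_exporter_enabled_collectors:
  - textfile:
      directory: /var/lib/node_exporter
```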
Environment
- src: cloudalchemy.node-exporter
  version: 0.14.0
Ansible version information:
ansible --version
ansible 2.9.13
...
python version = 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0]
See above
Currently, when running node_exporter as a non-root user, it cannot access some files (for example, some filesystems such as /var/lib/docker). Adding cap_dac_read_search=+ep to the binary should solve this problem.
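A sketch of how the role could apply this with Ansible's capabilities module (the binary path is the role's conventional install location, assumed here):

```yaml
- name: Allow node_exporter to read restricted files as a non-root user
  capabilities:
    path: /usr/local/bin/node_exporter
    capability: cap_dac_read_search+ep
    state: present
```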
We should support multiple architectures, not only one.
Example is here: https://gitlab.com/superq/prometheus-ansible/tree/master/roles/node_exporter/defaults
What happened?
Upgraded the role from 0.15 to 0.19. The role asserts that node-exporter is already installed because /usr/local/bin/node_exporter exists, and therefore skips the install.yml tasks. The user is now different from what it was before, and creating/chowning the textfile collector dir fails because the new user/group does not exist.
TASK [node-exporter : Create textfile collector dir] ********
fatal: [focal-dev]: FAILED! => changed=false
gid: 998
group: _node-exporter
mode: '0775'
msg: 'chown failed: failed to look up user node-exp'
owner: _node-exporter
path: /var/run/node_exporter
size: 40
state: directory
uid: 997
Did you expect to see something different?
How to reproduce it (as minimally and precisely as possible):
Run the role with version 0.19 on hosts where node-exporter was previously installed using version 0.18 or older.
Environment
Role version:
- cloudalchemy.node-exporter, 0.19.0
Ansible version information:
ansible 2.8.5
config file = /var/lib/ansible/ansible.cfg
configured module search path = ['/home/vos/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.7/dist-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.7.3 (default, Apr 3 2019, 05:39:12) [GCC 8.3.0]
---
node_exporter_version: 0.18.1
node_exporter_system_group: "_node-exporter"
node_exporter_textfile_dir: "/var/run/node_exporter"
node_exporter_enabled_collectors:
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"
node_exporter_disabled_collectors:
  - systemd
PLAY [Manage Node Exporter] ***************************************************************************************************************************
TASK [Gathering Facts] ********************************************************************************************************************************
ok: [focal-dev]
TASK [node-exporter : Assert usage of systemd as an init system] **************************************************************************************
ok: [focal-dev] => changed=false
msg: All assertions passed
TASK [node-exporter : Get systemd version] ************************************************************************************************************
ok: [focal-dev]
TASK [node-exporter : Set systemd version fact] *******************************************************************************************************
ok: [focal-dev]
TASK [node-exporter : Naive assertion of proper listen address] ***************************************************************************************
ok: [focal-dev] => changed=false
msg: All assertions passed
TASK [node-exporter : Assert collectors are not both disabled and enabled at the same time] ***********************************************************
ok: [focal-dev] => (item=systemd) => changed=false
ansible_loop_var: item
item: systemd
msg: All assertions passed
TASK [node-exporter : Check if node_exporter is installed] ********************************************************************************************
ok: [focal-dev]
TASK [node-exporter : Gather currently installed node_exporter version (if any)] **********************************************************************
ok: [focal-dev]
TASK [node-exporter : Get checksum list from github] **************************************************************************************************
ok: [focal-dev]
TASK [node-exporter : Get checksum for amd64 architecture] ********************************************************************************************
ok: [focal-dev] => (item=b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b51b72424 node_exporter-0.18.1.linux-amd64.tar.gz)
TASK [node-exporter : Copy the Node Exporter systemd service file] ************************************************************************************
ok: [focal-dev]
TASK [node-exporter : Create textfile collector dir] **************************************************************************************************
fatal: [focal-dev]: FAILED! => changed=false
gid: 998
group: _node-exporter
mode: '0775'
msg: 'chown failed: failed to look up user node-exp'
owner: _node-exporter
path: /var/run/node_exporter
size: 40
state: directory
uid: 997
PLAY RECAP ********************************************************************************************************************************************
focal-dev : ok=11 changed=0 unreachable=0 failed=1 skipped=11 rescued=0 ignored=0
Anything else we need to know?:
I can work around this by removing /usr/local/bin/node_exporter on all my hosts so that install.yml gets called, but that is not a "clean" upgrade path.
(Sorry for the lack of a better headline/title...)
I had a question about this role, specifically about:
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/install.yml#L24-L35
Is there a specific reason this is delegated to localhost? It seems downloading and unpacking on the target node would be much better in most cases?
Then, this part:
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/install.yml#L52-L60
This seems to happen almost every time. Can we add a "creates" argument and an initial version check? Any thoughts? I am asking because this setup consumes a substantial amount of time on my Ansible runs, and I am trying to avoid building special-purpose playbooks right now; I would prefer to be able to re-run this continuously/whenever with the same predictable outcome.
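A sketch of the kind of guard meant here, reusing the registered variable names that appear in error messages elsewhere in this tracker; the task body itself is illustrative, not the role's actual code:

```yaml
- name: Propagate node_exporter binary
  copy:
    src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
    dest: /usr/local/bin/node_exporter
    mode: "0755"
  # Skip the copy when the correct version is already installed,
  # so repeated runs become cheap no-ops.
  when: >-
    (not __node_exporter_is_installed.stat.exists) or
    (__node_exporter_current_version_output.stderr_lines[0].split(' ')[2] != node_exporter_version)
  notify: restart node_exporter
```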
What happened?
TASK [ansible-node-exporter : Create node_exporter config directory] ***********
fatal: [centos8]: UNREACHABLE! => {"changed": false, "msg": "Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"` echo OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 28027 to cgroups caused \\\"failed to write 28027 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/6bffe942ab289f6018c910fcb7b67957333e050d435ffbe3bbfedb72effead8d/cgroup.procs: invalid argument\\\"\": unknown/.ansible/tmp `\"&& mkdir OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 28027 to cgroups caused \\\"failed to write 28027 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/6bffe942ab289f6018c910fcb7b67957333e050d435ffbe3bbfedb72effead8d/cgroup.procs: invalid argument\\\"\": unknown/.ansible/tmp/ansible-tmp-1595086095.7307746-27669-8276321652003 && echo ansible-tmp-1595086095.7307746-27669-8276321652003=\"` echo OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 28027 to cgroups caused \\\"failed to write 28027 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/6bffe942ab289f6018c910fcb7b67957333e050d435ffbe3bbfedb72effead8d/cgroup.procs: invalid argument\\\"\": unknown/.ansible/tmp/ansible-tmp-1595086095.7307746-27669-8276321652003 `\" ), exited with result 1", "unreachable": true}
Did you expect to see something different?
A working test.
Maybe fix?
I think I had similar issues with GitHub Actions. Could be multiple things:
Should we try to fix this?
You have some default variables commented out here that I'd like to use. Is there another way to do this?
Hello!
Been trying to use this role to install the node exporter on a few servers (VMs).
Ultimately, the Ansible run completes, but the node exporter is not started.
When inspecting systemctl status I see the job has reached its start limit; the journal has this:
Oct 25 11:11:03 prometheus.test systemd[14341]: Failed at step NICE spawning /usr/local/bin/node_exporter: Permission denied
Oct 25 11:11:03 prometheus.test systemd[1]: node_exporter.service: main process exited, code=exited, status=201/NICE
Oct 25 11:11:03 prometheus.test systemd[1]: Unit node_exporter.service entered failed state.
Oct 25 11:11:03 prometheus.test systemd[1]: node_exporter.service failed.
Oct 25 11:11:03 prometheus.test systemd[1]: node_exporter.service holdoff time over, scheduling restart.
Oct 25 11:11:03 prometheus.test systemd[1]: Started Prometheus Node Exporter.
Oct 25 11:11:03 prometheus.test systemd[1]: Starting Prometheus Node Exporter...
...
I tried starting the node exporter as root, which works; it just fails at nice'ing the process via systemd when using the user node-exp (which is also set up by this role).
I have tried to google how to check which permissions are needed by a user, but I am stuck checking systemd internals to see what is happening.
I think I am not doing something very custom here — running this role, a few configuration settings:
vars:
  node_exporter_web_listen_address: "0.0.0.0:9100"
  node_exporter_textfile_dir: "/var/lib/node_exporter"
  node_exporter_disabled_collectors:
    - diskstats
    - mdadm
    - nfs
    - nfsd
    - wifi
    - xfs
    - zfs
I noticed the node_exporter.service installed by this role is the only one nice'ing the service. All other examples/tutorials etc. don't show that. I suspect there is a good reason to do this, hence raising the issue here.
What happened?
When running ansible-node-exporter, upon reaching this task:
- name: Get checksum list from github
  set_fact:
    _checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
  run_once: true
I get the following issue.
[19:01:16] cloudalchemy.node-exporter : Get checksum list from github | host | FAILED | 1742ms
{
It seemed that the result returned was a 302 redirect, which the url lookup did not follow. The option to follow redirects seemed to be added only in Ansible devel. Switching to the uri module worked.
Did you expect to see something different?
Expect this to succeed.
How to reproduce it (as minimally and precisely as possible):
Attempt to run/use the role.
Environment
Ubuntu 18.04
https://github.com/cloudalchemy/ansible-node-exporter/releases/tag/0.19.0
root@c86f70a56fc7:/work# ansible --version
ansible 2.7.1
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.6/dist-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L78
[19:01:16] cloudalchemy.node-exporter : Get checksum list from github | host | FAILED | 1742ms
{
- msg: An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt : HTTP Error 400: Bad Request
Anything else we need to know?:
The workaround at the moment is to patch the task with:
- name: Download checksum list from github
  uri:
    url: "{{ 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt' }}"
    method: GET
    return_content: true
    status_code: 200
    body_format: json
  register: _raw_checksum
  until: _raw_checksum.status == 200
  retries: 5
  run_once: true
- name: "Get checksum list from github results"
  set_fact:
    _checksums: "{{ _raw_checksum.content.split('\n') }}"
  run_once: true
What happened?
After changing
node_exporter_binary_local_dir: node_exporter-1.0.0.linux-amd64
to
node_exporter_binary_local_dir: node_exporter-1.0.1.linux-amd64
I somewhat expected the role to copy the new version to the servers, but install.yml is ignored after the first playbook run and the version on the servers is not updated.
Moving the task "propagate locally distributed node_exporter binary" from install.yml to main.yml right after the install.yml import would solve this.
How to reproduce it (as minimally and precisely as possible):
run this playbook:
- hosts: server
  roles:
    - ansible-node-exporter
  vars:
    node_exporter_binary_local_dir: /home/user/prometheus_bin/node_exporter-1.0.0.linux-amd64
then run this playbook:
- hosts: server
  roles:
    - ansible-node-exporter
  vars:
    node_exporter_binary_local_dir: /home/user/prometheus_bin/node_exporter-1.0.1.linux-amd64
Environment
0.21.0
Task is skipped on second run since the node_exporter file exists on the server already:
TASK [ansible-node-exporter : Propagate node_exporter binaries] ********************************************************************************************
skipping: [server]
The following piece of code is actually broken when using IPv6 because it assumes an IPv4 address:
ansible-node-exporter/tasks/configure.yml, lines 21 to 29 in c8e2796
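To illustrate (a standalone sketch, not the role's actual code): naive colon-splitting breaks for the host part of a bracketed IPv6 listen address, while splitting on the last colon keeps both address forms working.

```python
def split_listen_address(addr: str):
    """Split 'host:port' on the LAST colon so bracketed IPv6 hosts survive."""
    host, _, port = addr.rpartition(":")
    return host, port

# Naive split assumes IPv4: the host part of an IPv6 address is mangled.
assert "[::1]:9100".split(":")[0] == "["          # broken for IPv6
assert split_listen_address("[::1]:9100") == ("[::1]", "9100")
assert split_listen_address("0.0.0.0:9100") == ("0.0.0.0", "9100")
```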
What happened?
Cannot fetch binaries over proxy
How to reproduce it (as minimally and precisely as possible):
We would like to export a proxy to the machine and use that proxy
export https_proxy=https://proxy-server:8080; ansible-playbook -i hosts playbook.yml
Environment
Linux
ansible 2.8.0
fatal: [host01]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Failed lookup url for https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt : <urlopen error [Errno 111] Connection refused>"}
Anything else we need to know?:
Please add
validate_certs: False
here:
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L78
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/install.yml#L23
Same as
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L58
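For reference, a sketch of the patched lookup; validate_certs is an option the url lookup accepts, and since lookups run on the controller, the https_proxy from the command above must be set in the controller's environment:

```yaml
- name: Get checksum list from github
  set_fact:
    _checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', validate_certs=False, wantlist=True) | list }}"
  run_once: true
```

Disabling certificate validation is a trade-off; an intercepting proxy with a trusted CA bundle would avoid it.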
When I tried to apply the role:
# ------------------------------------------------------------------------------
# Node Exporter Service
# ------------------------------------------------------------------------------
- hosts:
    - node-exporters
  roles:
    - role: cloudalchemy.node-exporter
      node_exporter_version: 0.17.0
      node_exporter_enabled_collectors:
        - supervisord
        - systemd
        - tcpstat
        - textfile

# ------------------------------------------------------------------------------
# Firewall Rules
# ------------------------------------------------------------------------------
- hosts:
    - node-exporters
  tasks:
    - name: Open 9100/tcp port in the firewall
      firewalld:
        port: 9100/tcp
        permanent: yes
        immediate: yes
        state: enabled
      when: ansible_os_family == 'RedHat'
I got this error:
<localhost> EXEC /bin/sh -c 'chmod u+x /Users/ppadial/.ansible/tmp/ansible-tmp-1554487030.581331-239264549754786/ /Users/ppadial/.ansible/tmp/ansible-tmp-1554487030.581331-239264549754786/AnsiballZ_get_url.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S -p "[sudo via ansible, key=scrsjkfwaegntxilsjtpmzdkousxbdah] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS
/usr/local/Cellar/ansible/2.7.9/libexec/bin/python3.7 /Users/ppadial/.ansible/tmp/ansible-tmp-1554487030.581331-239264549754786/AnsiballZ_get_url.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c 'echo ~ppadial && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /Users/ppadial/.ansible/tmp/ansible-tmp-1554487028.519246-130314029194652/ > /dev/null 2>&1 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /Users/ppadial/.ansible/tmp/ansible-tmp-1554487032.687249-126806156839976 `" && echo ansible-tmp-1554487032.687249-126806156839976="` echo /Users/ppadial/.ansible/tmp/ansible-tmp-1554487032.687249-126806156839976 `" ) && sleep 0'
FAILED - RETRYING: Download node_exporter binary to local folder (3 retries left).Result was: {
"attempts": 3,
"changed": false,
"module_stderr": "Sorry, try again.\n[sudo via ansible, key=akey] password: \nsudo: 1 incorrect password attempt\n",
"module_stdout": "",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 1,
"retries": 6
}
Any ideas?
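The failing download task is delegated to localhost, so the inherited become triggers sudo on the control machine, where the password differs from the remote one. One sketch of a fix, assuming a YAML inventory, is to disable privilege escalation for localhost:

```yaml
# inventory.yml (sketch): the role's download tasks don't need root locally.
all:
  hosts:
    localhost:
      ansible_connection: local
      ansible_become: false
```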
What happened?
Can't download node_exporter binary due to invalid checksum
Did you expect to see some different?
Install ok
How to reproduce it (as minimally and precisely as possible):
Environment
Role version:
0.19.0
Ansible version information:
ansible --version
ansible 2.7.10
config file = /home/nicolas/Work/campings/sites/rundeck/ansible/ansible.cfg
configured module search path = ['/home/nicolas/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/nicolas/Applications/azure-cli/venv/lib/python3.8/site-packages/ansible
executable location = /home/nicolas/Applications/azure-cli/venv/bin/ansible
python version = 3.8.1 (default, Jan 22 2020, 06:38:00) [GCC 9.2.0]
---
node_exporter_version: 0.18.1
TASK [cloudalchemy_node_exporter : Download node_exporter binary to local folder] **********************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/install.yml:21
Using module file /home/nicolas/Applications/azure-cli/venv/lib/python3.8/site-packages/ansible/modules/net_tools/basics/get_url.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: nicolas
<localhost> EXEC /bin/sh -c '/home/nicolas/Applications/azure-cli/venv/bin/python3 && sleep 0'
FAILED - RETRYING: Download node_exporter binary to local folder (5 retries left).Result was: {
"attempts": 1,
"changed": false,
"invocation": {
"module_args": {
"attributes": null,
"backup": null,
"checksum": "sha256:['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460",
"client_cert": null,
"client_key": null,
"content": null,
"delimiter": null,
"dest": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",
"directory_mode": null,
"follow": false,
"force": false,
"force_basic_auth": false,
"group": null,
"headers": null,
"http_agent": "ansible-httpget",
"mode": null,
"owner": null,
"regexp": null,
"remote_src": null,
"selevel": null,
"serole": null,
"setype": null,
"seuser": null,
"sha256sum": "",
"src": null,
"timeout": 10,
"tmp_dest": null,
"unsafe_writes": null,
"url": "https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz",
"url_password": null,
"url_username": null,
"use_proxy": true,
"validate_certs": true
}
},
"msg": "The checksum for /tmp/node_exporter-0.18.1.linux-amd64.tar.gz did not match 61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460; it was b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b51b72424.",
"retries": 6
}
Anything else we need to know?:
Debug code :
- block:
- name: Get checksum list from github
set_fact:
_checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
run_once: true
- name: "Get checksum for {{ go_arch }} architecture"
set_fact:
node_exporter_checksum: "{{ item.split(' ')[0] }}"
with_items: "{{ _checksums }}"
when:
- "('linux-' + go_arch + '.tar.gz') in item"
- name: tgz archive
debug:
var: "('linux-' + go_arch + '.tar.gz')"
- name: checksum
debug:
var: "node_exporter_checksum"
Logs :
TASK [cloudalchemy_node_exporter : Get checksum for amd64 architecture] ********************************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/preflight.yml:81
ok: [campings-sites-qa-rundeck] => (item=['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460 node_exporter-0.18.1.darwin-386.tar.gz', '20fadb3108de0a9cc70a1333394e5be90416b4f91025f9fc66f5736335e94
398 node_exporter-0.18.1.darwin-amd64.tar.gz', 'a6c7eb64bb5f27a5567d545a1b93780f3aa72d0627751fd9f054626bb542a4b5 node_exporter-0.18.1.linux-386.tar.gz', 'b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b
51b72424 node_exporter-0.18.1.linux-amd64.tar.gz', 'd5a28c46e74f45b9f2158f793a6064fd9fe8fd8da6e0d1e548835ceb7beb1982 node_exporter-0.18.1.linux-arm64.tar.gz', '1eecbaa2a7e58dc2a5b18e960c48842e5e158c9e2eea4d8a4b
a32b98ca2f638a node_exporter-0.18.1.linux-armv5.tar.gz', '6f3cb593c15c12cdfaef20d7e1c61d28ef822af6fc8c85d670cb3f0a1212778a node_exporter-0.18.1.linux-armv6.tar.gz', '5de85067f44b42b22d62b2789cb1a379ff5559649b99795cd1ba0c144b512ad0 node_exporter-0.18.1.linux-armv7.tar.gz', '9ef7c932970bc823a63347c3cdd8a34a4ef9d327cd5513600435dfd74d046755 node_exporter-0.18.1.linux-mips.tar.gz', 'c2721c1b85e3024e61f37fb2dc44a57f6d4eed8
cc0576185a1dedea20e36fb31 node_exporter-0.18.1.linux-mips64.tar.gz', 'ae262af96dd7409aeefe28f8ea6cb1b00377444837057ed67694d8fa1b75b848 node_exporter-0.18.1.linux-mips64le.tar.gz', '40860be242f563e3e10972685f1d1
654c9b5ca9686b26bde4a422f57a1ebdd18 node_exporter-0.18.1.linux-mipsle.tar.gz', 'b41f860dbe23b72cf2ae939dd6bb43ea3ddde268f5a964cf6f8d490fed1ed034 node_exporter-0.18.1.linux-ppc64.tar.gz', '27996a62327e07041b5dd2
f09d6054c7c21244e39358da5d9b44b96daf6a2bc0 node_exporter-0.18.1.linux-ppc64le.tar.gz', '0bc212b9db6c2201b2b38d46de2d4cc75b7f4648d7616a87d7616e85f0d6cba4 node_exporter-0.18.1.linux-s390x.tar.gz', 'c831801b573075
0177893a9866416ebb68977a8fd5a7b5305e39ef1162e146a9 node_exporter-0.18.1.netbsd-386.tar.gz', '4772c8e2d13935d2bcfa8ad1fd64b8ca5d2cc5d71bbee6dd4ef04306017c6368 node_exporter-0.18.1.netbsd-amd64.tar.gz']) => {
"ansible_facts": {
"node_exporter_checksum": "['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460"
},
"changed": false,
"item": "['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460 node_exporter-0.18.1.darwin-386.tar.gz', '20fadb3108de0a9cc70a1333394e5be90416b4f91025f9fc66f5736335e94398 node_exporter-0.18.1.da
rwin-amd64.tar.gz', 'a6c7eb64bb5f27a5567d545a1b93780f3aa72d0627751fd9f054626bb542a4b5 node_exporter-0.18.1.linux-386.tar.gz', 'b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b51b72424 node_exporter-0.18
.1.linux-amd64.tar.gz', 'd5a28c46e74f45b9f2158f793a6064fd9fe8fd8da6e0d1e548835ceb7beb1982 node_exporter-0.18.1.linux-arm64.tar.gz', '1eecbaa2a7e58dc2a5b18e960c48842e5e158c9e2eea4d8a4ba32b98ca2f638a node_exporte
r-0.18.1.linux-armv5.tar.gz', '6f3cb593c15c12cdfaef20d7e1c61d28ef822af6fc8c85d670cb3f0a1212778a node_exporter-0.18.1.linux-armv6.tar.gz', '5de85067f44b42b22d62b2789cb1a379ff5559649b99795cd1ba0c144b512ad0 node_e
xporter-0.18.1.linux-armv7.tar.gz', '9ef7c932970bc823a63347c3cdd8a34a4ef9d327cd5513600435dfd74d046755 node_exporter-0.18.1.linux-mips.tar.gz', 'c2721c1b85e3024e61f37fb2dc44a57f6d4eed8cc0576185a1dedea20e36fb31 n
ode_exporter-0.18.1.linux-mips64.tar.gz', 'ae262af96dd7409aeefe28f8ea6cb1b00377444837057ed67694d8fa1b75b848 node_exporter-0.18.1.linux-mips64le.tar.gz', '40860be242f563e3e10972685f1d1654c9b5ca9686b26bde4a422f57a
1ebdd18 node_exporter-0.18.1.linux-mipsle.tar.gz', 'b41f860dbe23b72cf2ae939dd6bb43ea3ddde268f5a964cf6f8d490fed1ed034 node_exporter-0.18.1.linux-ppc64.tar.gz', '27996a62327e07041b5dd2f09d6054c7c21244e39358da5d9b
44b96daf6a2bc0 node_exporter-0.18.1.linux-ppc64le.tar.gz', '0bc212b9db6c2201b2b38d46de2d4cc75b7f4648d7616a87d7616e85f0d6cba4 node_exporter-0.18.1.linux-s390x.tar.gz', 'c831801b5730750177893a9866416ebb68977a8fd5
a7b5305e39ef1162e146a9 node_exporter-0.18.1.netbsd-386.tar.gz', '4772c8e2d13935d2bcfa8ad1fd64b8ca5d2cc5d71bbee6dd4ef04306017c6368 node_exporter-0.18.1.netbsd-amd64.tar.gz']"
}
Read vars_file '../env_vars/{{ env }}.yml'
TASK [cloudalchemy_node_exporter : tgz archive] ********************************************************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/preflight.yml:88
ok: [campings-sites-qa-rundeck] => {
"('linux-' + go_arch + '.tar.gz')": "linux-amd64.tar.gz"
}
Read vars_file '../env_vars/{{ env }}.yml'
TASK [cloudalchemy_node_exporter : checksum] ***********************************************************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/preflight.yml:92
ok: [campings-sites-qa-rundeck] => {
"node_exporter_checksum": "['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460"
}
Read vars_file '../env_vars/{{ env }}.yml'
Role seems to set the Darwin checksum.
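The leading `['` suggests the whole sha256sums.txt arrived as a single stringified list element, so with_items iterated once over the entire blob and split(' ')[0] returned the first (Darwin) digest with list syntax attached. A hedged alternative that selects the matching line first and only then takes the leading hex digest, assuming _checksums is a proper list with one entry per line:

```yaml
# Sketch: pick the line for this architecture, then strip everything after
# the first whitespace, leaving only the sha256 digest.
- name: "Get checksum for {{ go_arch }} architecture"
  set_fact:
    node_exporter_checksum: >-
      {{ _checksums
         | select('search', 'linux-' + go_arch + '.tar.gz')
         | first
         | regex_replace('\s.*$', '') }}
```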
I'm not sure if this is related to this particular role, or perhaps Python, but I thought it would be a good idea to document the bug in the event anyone else runs into this, and perhaps the maintainers of this repository have recommendations on how to debug this.
I installed this role by doing the following:
requirements.yml
ansible-galaxy install -r requirements.yml
site.yml
- name: Install node_exporter on all bare-metal hosts
hosts: bare-metal
roles:
- cloudalchemy.node-exporter
And then attempted to do a dry run with:
ansible-playbook -i ansible/hosts.yml --ask-become-pass --check ansible/site.yml
The output looks like this:
PLAY [Install node_exporter on all bare-metal hosts] ******************************************************************************************************************
TASK [Gathering Facts] ************************************************************************************************************************************************
ok: [nuc7i5b.brooks.network]
ok: [nuc7i5a.brooks.network]
ok: [pi.brooks.network]
TASK [cloudalchemy.node-exporter : Gather variables for each operating system] ****************************************************************************************
fatal: [nuc7i5a.brooks.network]: FAILED! => {"msg": "No file was found when using with_first_found. Use the 'skip: true' option to allow this task to be skipped if no files are found"}
fatal: [nuc7i5b.brooks.network]: FAILED! => {"msg": "No file was found when using with_first_found. Use the 'skip: true' option to allow this task to be skipped if no files are found"}
ok: [pi.brooks.network] => (item=/Users/brooks/.ansible/roles/cloudalchemy.node-exporter/vars/debian.yml)
TASK [cloudalchemy.node-exporter : Naive assertion of proper listen address] ******************************************************************************************
ok: [pi.brooks.network] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [cloudalchemy.node-exporter : Fail on unsupported init systems] **************************************************************************************************
skipping: [pi.brooks.network]
TASK [cloudalchemy.node-exporter : Check collectors] ******************************************************************************************************************
TASK [cloudalchemy.node-exporter : Get checksum list from github] *****************************************************************************************************
objc[93522]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[93522]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork()
child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
And I get this from OSX:
I'm running this from macOS 10.13.6, with Python version 2.7.15 installed from brew.
Trying to install node-exporter on two servers:
roles:
- cloudalchemy.node-exporter
and get a strange error:
TASK [cloudalchemy.node-exporter : Download Node exporter binary to local folder] ****************************************************************************************************
FAILED - RETRYING: Download Node exporter binary to local folder (5 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (5 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (4 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (4 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (3 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (3 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (2 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (2 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (1 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (1 retries left).
fatal: [runner-1 -> localhost]: FAILED! => changed=false
attempts: 5
module_stderr: |-
/bin/sh: /usr/bin/python2: No such file or directory
module_stdout: ''
msg: |-
The module failed to execute correctly, you probably need to set the interpreter.
See stdout/stderr for the exact error
rc: 127
fatal: [runner-2 -> localhost]: FAILED! => changed=false
attempts: 5
module_stderr: |-
/bin/sh: /usr/bin/python2: No such file or directory
module_stdout: ''
msg: |-
The module failed to execute correctly, you probably need to set the interpreter.
See stdout/stderr for the exact error
rc: 127
Server 1: Linux runner-2 4.9.0-7-amd64 #1 SMP Debian 4.9.110-1 (2018-07-05) x86_64 GNU/Linux
Server 2: Linux runner 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:21:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Localhost: Linux localhost.localdomain 4.18.14-200.fc28.x86_64 #1 SMP Mon Oct 15 13:16:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
There is a /tmp folder and a /usr/bin/python2 on both servers and on the localhost.
What's wrong?
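Since the failures come from tasks delegated to localhost (`[runner-1 -> localhost]`), the interpreter that matters is the controller's, not the servers'. Pinning it explicitly in inventory is one sketch of a fix:

```yaml
# inventory.yml (sketch): use the same Python that runs ansible-playbook
all:
  hosts:
    localhost:
      ansible_connection: local
      ansible_python_interpreter: "{{ ansible_playbook_python }}"
```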
What happened?
I'm trying to use the role to install node-exporter on a CentOS host from my MacOS machine. All other tasks from the playbook, involving various other roles such as docker work fine. The task "Get Checksum" fails with a strange error, see below. In parallel, a system dialogue shows up mentioning that Python quit unexpectedly.
Did you expect to see some different?
I'd expect the role to work :)
How to reproduce it (as minimally and precisely as possible):
I'm just using the following code snippet in my playbook:
- hosts: prometheus_monitoring
remote_user: ansible
become: yes
roles:
- role: cloudalchemy.node-exporter
vars:
node_exporter_basic_auth_users:
my_username: "{{ node_exporter_basic_auth_password }}"
Environment
ansible-playbook --version
WARNING: Executing a script that is loading libcrypto in an unsafe way. This will fail in a future version of macOS. Set the LIBRESSL_REDIRECT_STUB_ABORT=1 in the environment to force this into an error.
ansible-playbook 2.9.7
config file = /Users/dherrman/AnsibleProjects/makerspace-ansible/ansible.cfg
configured module search path = [u'/Users/dherrman/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /Library/Python/2.7/site-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 2.7.16 (default, Feb 29 2020, 01:55:37) [GCC 4.2.1 Compatible Apple LLVM 11.0.3 (clang-1103.0.29.20) (-macos10.15-objc-
Role version:
0.21.0
Ansible version information:
See environment
Variables:
None relevant to the issue I believe
TASK [cloudalchemy.node-exporter : Get checksum list from github] **************
task path: /Users/<username>/AnsibleProjects/<project>-ansible/roles/cloudalchemy.node-exporter/tasks/preflight.yml:99
ERROR! A worker was found in a dead state
Anything else we need to know?:
Currently there are only ways to enable exporters. There should also be a way to disable them.
Hi,
would you accept PRs to add FreeBSD support to your role?
thanks!
Replacing lookup() with uri:
- name: Get checksum list from github
uri:
url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
method: GET
return_content: true
register: _checksum_result
until: _checksum_result.status == 200
retries: 5
- name: Set _checksums
set_fact:
    _checksums: "{{ _checksum_result.content.split('\n') }}"
run_once: true
Yields the 400, but I am not sure where the Authorization header is introduced?
<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
--
2888 | <localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126 && echo ansible-tmp-1595343177.2519362-233-56170830947126="` echo /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126 `" ) && sleep 0'
2889 | Using module file /usr/local/lib/python3.7/site-packages/ansible/modules/net_tools/basics/uri.py
2890 | Pipelining is enabled.
2891 | <localhost> EXEC /bin/sh -c '/usr/local/bin/python && sleep 0'
2892 | <localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126/ > /dev/null 2>&1 && sleep 0'
2893 | FAILED - RETRYING: Get checksum list from github (1 retries left).Result was: {
2894 | "attempts": 5,
2895 | "changed": false,
2896 | "connection": "close",
2897 | "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>Basic **redacted**</ArgumentValue><RequestId>C27DCB3881334C01</RequestId><HostId>FIqOQQeYf2vmrxT1CEeevSvccLgS7KYSLaduFK8FjCsbkHoUDMtcQfE5RLzPE8HWKvjJIj9ozzA=</HostId></Error>",
2898 | "content_type": "application/xml",
2899 | "date": "Tue, 21 Jul 2020 14:52:58 GMT",
2900 | "elapsed": 0,
2901 | "invocation": {
2902 | "module_args": {
2903 | "attributes": null,
2904 | "backup": null,
2905 | "body": null,
2906 | "body_format": "raw",
2907 | "client_cert": null,
2908 | "client_key": null,
2909 | "content": null,
2910 | "creates": null,
2911 | "delimiter": null,
2912 | "dest": null,
2913 | "directory_mode": null,
2914 | "follow": false,
2915 | "follow_redirects": "safe",
2916 | "force": false,
2917 | "force_basic_auth": false,
2918 | "group": null,
2919 | "headers": {},
2920 | "http_agent": "ansible-httpget",
2921 | "method": "GET",
2922 | "mode": null,
2923 | "owner": null,
2924 | "regexp": null,
2925 | "remote_src": null,
2926 | "removes": null,
2927 | "return_content": true,
2928 | "selevel": null,
2929 | "serole": null,
2930 | "setype": null,
2931 | "seuser": null,
2932 | "src": null,
2933 | "status_code": [
2934 | 200
2935 | ],
2936 | "timeout": 30,
2937 | "unix_socket": null,
2938 | "unsafe_writes": null,
2939 | "url": "https://github.com/prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt",
2940 | "url_password": null,
2941 | "url_username": null,
2942 | "use_proxy": true,
2943 | "validate_certs": true
2944 | }
2945 | },
2946 | "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request",
2947 | "redirected": false,
2948 | "retries": 6,
2949 | "server": "AmazonS3",
2950 | "status": 400,
2951 | "transfer_encoding": "chunked",
2952 | "url": "https://github.com/prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt",
2953 | "x_amz_id_2": "FIqOQQeYf2vmrxT1CEeevSvccLgS7KYSLaduFK8FjCsbkHoUDMtcQfE5RLzPE8HWKvjJIj9ozzA=",
2954 | "x_amz_request_id": "C27DCB3881334C01"
2955 | }
Originally posted by @till in #165 (comment)
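One known way an unexpected Authorization header sneaks in is a ~/.netrc entry for github.com: GitHub redirects the download to S3, and S3 rejects requests that carry both its pre-signed query parameters and an Authorization header. If that is the cause here, a sketch of a guard (use_netrc requires ansible-core 2.14 or later; on older versions the github.com entry would have to be removed from ~/.netrc instead):

```yaml
- name: Get checksum list from github
  uri:
    url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
    return_content: true
    use_netrc: false  # keep ~/.netrc credentials off the redirect to S3
  register: _checksum_result
  until: _checksum_result.status == 200
  retries: 5
```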
Hello
In all other cloudalchemy roles the username and group are hard-coded.
In this role the username and group are freely definable.
Is there a reason for this?
I think it should be the same style in all roles, for consistency:
either the username and group are hard-coded everywhere, or they are freely definable everywhere.
Do we have an opinion on this?
Greetings
What is missing?
I think the role should support the use of a Github Token to avoid rate limiting.
Currently, I keep running into 400 (Bad Request) errors when I deploy lots of nodes in parallel. That seems to be rate limiting on GitHub's end. In my environment, we continuously deploy to about 20 customer setups, each of which may have multiple nodes. The part we are continuously deploying involves a basic monitoring setup on each node (including the node-exporter).
Why do we need it?
We run Ansible in a container, shared-nothing so to speak. The container gets invoked during CI (merge to main branch).
There is no caching between builds, and it seems to work well. There are no side effects, except for this part: each run makes requests against GitHub resources and eventually seems to run into the rate limit.
I was digging around, and it seems that both of these blocks happen every time (unless I download the binary myself):
Checksum gathering:
ansible-node-exporter/tasks/preflight.yml
Lines 98 to 110 in b9cb0ee
Downloading:
ansible-node-exporter/tasks/install.yml
Lines 20 to 52 in c6ffcfd
I would think I wouldn't download anything — unless I really needed? Do you have any thoughts on changing that?
Environment
Role version:
0.21.3
Ansible version information:
root@5c649b77871a:/ansible-all-the-things# ansible --version
ansible 2.9.8
config file = /ansible-all-the-things/ansible.cfg
configured module search path = ['/ansible-all-the-things/library']
ansible python module location = /usr/local/lib/python3.7/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.7.8 (default, Jun 30 2020, 18:36:05) [GCC 8.3.0]
Anything else we need to know?:
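Until token support exists, one way to keep CI runs from touching GitHub at all is the node_exporter_binary_local_dir variable shown earlier: pre-fetch the release into the container image or a cached path, and the role skips both the checksum lookup and the download. The cache path below is an example:

```yaml
- hosts: all
  roles:
    - role: cloudalchemy.node-exporter
      vars:
        # path baked into the CI image; adjust version/arch as needed
        node_exporter_binary_local_dir: /opt/cache/node_exporter-1.0.1.linux-amd64
```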
#114 renamed redhat.yml to redhat-7.yml.
This has broken Amazon EC2 installs with:
{"msg": "No file was found when using first_found. Use the 'skip: true' option to allow this task to be skipped if no files are found"}
Amazon returns "ansible_os_family": "RedHat",
and here is where I assume we used to match it:
ansible-node-exporter/tasks/main.yml
Line 11 in 0a59431
It just needs a symlink from the old redhat.yml to redhat-7.yml, I reckon (or 8, I'm not sure).
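A defensive variant of the vars lookup: append the plain OS-family file as a fallback so distributions like Amazon Linux, which report ansible_os_family RedHat but match no versioned file, don't abort the play. The file names below are assumptions about the role's vars layout:

```yaml
- name: Gather variables for each operating system
  include_vars: "{{ lookup('first_found', params) }}"
  vars:
    params:
      files:
        - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_os_family | lower }}-{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_os_family | lower }}.yml"  # generic fallback
      paths:
        - vars
```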
It looks like, under certain circumstances, this role wreaks havoc where it is provisioned: it changes the ownership of the entire root folder (!?).
Example output:
TASK [cloudalchemy.node-exporter : Install dependencies] *********************************************************************************************************************
TASK [cloudalchemy.node-exporter : Create the node_exporter group] ***********************************************************************************************************
changed: [the_vm_ip]
TASK [cloudalchemy.node-exporter : Create the node_exporter user] ************************************************************************************************************
fatal: [the_vm_ip]: FAILED! => {"changed": false, "msg": "[Errno 1] Operation not permitted: '/proc/sys'"}
After this failure, I SSH'd into the system and found this....
system-username@vm-hostname:~$ ls -la /
total 88
drwxr-xr-x 23 node-exp users 4096 Oct 16 12:43 .
drwxr-xr-x 23 node-exp users 4096 Oct 16 12:43 ..
drwxr-xr-x 2 node-exp users 4096 Oct 10 19:31 bin
drwxr-xr-x 3 node-exp users 4096 Oct 10 19:32 boot
drwxr-xr-x 16 node-exp users 3580 Oct 16 12:43 dev
drwxr-xr-x 103 node-exp users 4096 Oct 16 13:01 etc
drwxr-xr-x 18 node-exp users 4096 Oct 16 12:59 home
lrwxrwxrwx 1 root root 31 Oct 10 19:32 initrd.img -> boot/initrd.img-4.15.0-1046-gcp
lrwxrwxrwx 1 root root 31 Oct 10 19:32 initrd.img.old -> boot/initrd.img-4.15.0-1046-gcp
drwxr-xr-x 20 node-exp users 4096 Oct 16 12:51 lib
drwxr-xr-x 2 node-exp users 4096 Oct 10 19:29 lib64
drwx------ 2 node-exp users 16384 Oct 10 19:31 lost+found
drwxr-xr-x 2 node-exp users 4096 Oct 10 19:29 media
drwxr-xr-x 2 node-exp users 4096 Oct 10 19:29 mnt
drwxr-xr-x 2 node-exp users 4096 Oct 10 19:29 opt
dr-xr-xr-x 717 node-exp users 0 Oct 16 12:43 proc
drwx------ 4 node-exp users 4096 Oct 16 12:51 root
drwxr-xr-x 23 node-exp users 940 Oct 16 13:02 run
drwxr-xr-x 2 node-exp users 4096 Oct 10 19:31 sbin
drwxr-xr-x 2 node-exp users 4096 Oct 16 12:43 snap
drwxr-xr-x 2 node-exp users 4096 Oct 10 19:29 srv
dr-xr-xr-x 13 node-exp users 0 Oct 16 12:48 sys
drwxrwxrwt 8 node-exp users 4096 Oct 16 13:01 tmp
drwxr-xr-x 10 node-exp users 4096 Oct 10 19:29 usr
drwxr-xr-x 14 node-exp users 4096 Oct 16 12:51 var
lrwxrwxrwx 1 root root 28 Oct 10 19:32 vmlinuz -> boot/vmlinuz-4.15.0-1046-gcp
lrwxrwxrwx 1 root root 28 Oct 10 19:32 vmlinuz.old -> boot/vmlinuz-4.15.0-1046-gcp
The above is a VM running in GCP. I have another VM running in GCP where I have run the same version of the role against it and this did not happen.
This is the playbook where this is happening:
# Playbook where the issue happens
---
- hosts: "{{ hosts_group }}"
gather_facts: true
become: true
roles:
- role: lifeofguenter.oracle-java
become: yes
- role: jobscore.beats
become: yes
- role: torian.logstash
become: yes
- role: cloudalchemy.node-exporter
And this is the playbook where this does not happen:
# Playbook where problem does not occur
---
- hosts: "{{ hosts_group }}"
gather_facts: yes
roles:
- role: jobscore.beats
become: yes
- role: torian.logstash
become: yes
- role: cloudalchemy.node-exporter
And this is the requirements.yml file used in both projects:
---
- src: https://github.com/jobscore/ansible-role-beats/archive/v0.1.1.tar.gz
name: jobscore.beats
- src: https://github.com/torian/ansible-role-logstash/archive/1.2.0.tar.gz
name: torian.logstash
- src: https://github.com/lifeofguenter/ansible-role-oracle-java/archive/1.0.2.tar.gz
name: lifeofguenter.oracle-java
- src: https://github.com/cloudalchemy/ansible-node-exporter/archive/0.15.0.tar.gz
name: cloudalchemy.node-exporter
The only visible difference is the become: true defined in the playbook where this happens. But still, why would the role change the ownership of the entire system? 🤔
What happened?
Setting the filesystem collector parameter ignored-mount-points writes the value to the service file in quotes, which are fed verbatim into the regexp parser and make the expression useless.
Is there some way to work around this obstacle?
(detailed description below)
Did you expect to see some different?
As per these two node exporter issues, quoting the regexp in the service file is the wrong thing to do:
prometheus/node_exporter#911 (comment)
prometheus/node_exporter#1000 (comment)
How to reproduce it (as minimally and precisely as possible):
Use the example from the defaults/main.yml file:
node_exporter_enabled_collectors:
- filesystem:
ignored-mount-points: "^/(sys|proc|dev)($|/)"
Produces the .service file with execstart line:
ExecStart=/usr/local/bin/node_exporter \
--collector.textfile \
--collector.textfile.directory=/opt/prometheus_data \
--collector.filesystem \
--collector.filesystem.ignored-mount-points='^/(sys|proc|dev)($|/)' \
--web.listen-address=0.0.0.0:9100
Result:
# curl -s localhost:9100/metrics|grep 'mountpoint="/sys'
node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 9.0760503296e+10
node_filesystem_device_error{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 0
node_filesystem_files{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 2.2158326e+07
node_filesystem_files_free{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 2.215831e+07
node_filesystem_free_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 9.0760503296e+10
node_filesystem_readonly{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 1
node_filesystem_size_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 9.0760503296e+10
If the quotes are manually removed from the switch line:
--collector.filesystem.ignored-mount-points=^/(sys|proc|dev)($|/) \
then the result is as expected (mountpoint is ignored):
# systemctl daemon-reload
# systemctl restart node_exporter.service
# curl -s localhost:9100/metrics|grep 'mountpoint="/sys'
#
Role version:
version: 0.21.5
Ansible version information:
ansible --version
ansible 2.8.2
config file = /root/src/clip-hpc/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /root/src/clip-hpc-venv/lib/python3.6/site-packages/ansible
executable location = /root/src/clip-hpc-venv/bin/ansible
python version = 3.6.8 (default, Dec 5 2019, 15:45:45) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
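Since systemd performs no shell word-splitting, the single quotes in ExecStart are passed to the regexp parser verbatim; the fix is to emit the option value unquoted in the unit template. A sketch of the relevant template line (the loop variable names are assumptions about the role's node_exporter.service.j2):

```jinja
{# emit collector options without quoting; systemd does not word-split,
   so any quotes would become part of the option value #}
    --collector.{{ name }}.{{ option }}={{ value }} \
```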
TASK [cloudalchemy.node-exporter : Assert collectors are not both disabled and enabled at the same time] ***
failed: [host] (item=diskstats) => {
"assertion": "item in node_exporter_enabled_collectors",
"changed": false,
"evaluated_to": false,
"item": "diskstats",
"msg": "Assertion failed"
}
failed: [host] (item=mdadm) => {
"assertion": "item in node_exporter_enabled_collectors",
"changed": false,
"evaluated_to": false,
"item": "mdadm",
"msg": "Assertion failed"
}
I recently updated the role locally and it started failing with the above message. I think this check was introduced in #22. How do you disable what this role enables by default? So far, I had been using the role and disabling the collectors that I didn't need, and it worked: node-exporter didn't use collectors such as mdadm or diskstats.
With #22, my deploy stalls as the collector is enabled by default, but I can't disable it anymore.
Do I need to now explicitly set all enabled and disabled?
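One reading of the new assertion: a collector may not appear in both lists, so to turn off a collector that the role enables by default you take it out of node_exporter_enabled_collectors and list it only under node_exporter_disabled_collectors. A sketch, assuming these are the role's variable names:

```yaml
# Enabled and disabled lists must not overlap.
node_exporter_enabled_collectors:
  - systemd
  - textfile
node_exporter_disabled_collectors:
  - diskstats
  - mdadm
```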
I used this role to install and run node_exporter on a fresh new VM; everything is installed, but the service does not run by default.
ansible 2.7.7
cloudalchemy.node-exporter 0.12.0
Operating System Debian 9
It installs node-exporter, the service, and everything needed for it to run, BUT it does not start the service.
Install everything and start the service.
Build a fresh instance of Debian 9 and run the playbook with default values on it.
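As a workaround until the role handles this, a follow-up task can ensure the unit is running and enabled at boot. This is plain Ansible, assuming the unit is named node_exporter:

```yaml
- name: Ensure node_exporter is started and enabled at boot
  become: true
  systemd:
    name: node_exporter
    state: started
    enabled: true
    daemon_reload: true
```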
Would it be possible to add support for darwin/macOS?
I just switched to using this role for our Linux-based hosts, but we have quite a substantial number of nodes running darwin as well.
What happened?
Role apply over a fresh centos7 vagrant box fails with:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: No module named seobject
fatal: [centos7]: FAILED! => {"changed": false, "msg": "Failed to import the required Python library (policycoreutils-python) on localhost.localdomain's Python /usr/bin/python. Please read module documentation and install in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"}
Did you expect to see some different?
The role should install the missing Python dependencies (policycoreutils-python). The role works without problems on CentOS 8.
How to reproduce it (as minimally and precisely as possible):
Environment
Role version:
commit: bde46a6273283b22d1ef7277ebbcd390b3680825
Ansible version information:
ansible 2.9.7
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.17 (default, Apr 15 2020, 17:20:14) [GCC 7.5.0]
Hello, I want to deploy node_exporter on our CentOS 6.x nodes, but there is no systemd there. How can I deploy it on those hosts?
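The role only ships a systemd unit, so sysvinit hosts are not supported out of the box. As a rough idea of what a manual fallback could look like, here is a minimal SysV wrapper; this is entirely my own sketch, and the user name, binary path, and log path are assumptions:

```shell
#!/bin/sh
# /etc/init.d/node_exporter -- minimal SysV sketch (paths and user assumed)
# chkconfig: 2345 80 20
case "$1" in
  start)
    # run the exporter in the background as an unprivileged user
    su -s /bin/sh node-exp -c '/usr/local/bin/node_exporter >/var/log/node_exporter.log 2>&1 &'
    ;;
  stop)
    pkill -f /usr/local/bin/node_exporter
    ;;
  *)
    echo "Usage: $0 {start|stop}"; exit 1 ;;
esac
```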
It would be nice if this role can automate the deployment of those useful scripts:
https://github.com/prometheus-community/node-exporter-textfile-collector-scripts
On current RHEL, the dependency packages are python3-libselinux and python3-policycoreutils.
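A hedged sketch of what that dependency task could look like, using the package names from the comment above; the task name and the major-version guard are my assumptions, not the role's current code:

```yaml
- name: Install SELinux Python dependencies on RedHat 8+
  package:
    name:
      - python3-libselinux
      - python3-policycoreutils
    state: present
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version | int >= 8
```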
Hello,
maybe I am wrong, but shouldn't your role first deploy the node exporter and only then restart the service via a handler?
PLAY [Deploy node_exporter] ********************************************************************************************************************************
TASK [Gathering Facts] *************************************************************************************************************************************
ok: [192.168.33.11]
ok: [192.168.33.10]
TASK [cloudalchemy.node-exporter : check collectors] *******************************************************************************************************
TASK [cloudalchemy.node-exporter : Get checksum for amd64 architecture] ************************************************************************************
skipping: [192.168.33.10] => (item=747ee549c1010947a8b162b1434976fe6cb8445540521d2fcc283765e4be1a79 node_exporter-0.16.0.darwin-386.tar.gz)
skipping: [192.168.33.10] => (item=73a8c451bd14dea587ebf2fd1258471fe97bddbae6f44b6a9d3ce7e2327bc91d node_exporter-0.16.0.darwin-amd64.tar.gz)
skipping: [192.168.33.10] => (item=2f18a32a7bb1c91307ed776cce50559bbcd66af90a61ea0a22a661ebe79e4fda node_exporter-0.16.0.linux-386.tar.gz)
skipping: [192.168.33.11] => (item=747ee549c1010947a8b162b1434976fe6cb8445540521d2fcc283765e4be1a79 node_exporter-0.16.0.darwin-386.tar.gz)
skipping: [192.168.33.11] => (item=73a8c451bd14dea587ebf2fd1258471fe97bddbae6f44b6a9d3ce7e2327bc91d node_exporter-0.16.0.darwin-amd64.tar.gz)
skipping: [192.168.33.11] => (item=2f18a32a7bb1c91307ed776cce50559bbcd66af90a61ea0a22a661ebe79e4fda node_exporter-0.16.0.linux-386.tar.gz)
ok: [192.168.33.10] => (item=e92a601a5ef4f77cce967266b488a978711dabc527a720bea26505cba426c029 node_exporter-0.16.0.linux-amd64.tar.gz)
skipping: [192.168.33.10] => (item=c793e8278ec6a167a49518d72dd928361a045bd4c8b155a22d5b158dd3aea2ac node_exporter-0.16.0.linux-arm64.tar.gz)
skipping: [192.168.33.10] => (item=18c91a0247f4bc97fb7cdd96502cd8a804a96f42a16357b39f43e28b3d2ac864 node_exporter-0.16.0.linux-armv5.tar.gz)
ok: [192.168.33.11] => (item=e92a601a5ef4f77cce967266b488a978711dabc527a720bea26505cba426c029 node_exporter-0.16.0.linux-amd64.tar.gz)
skipping: [192.168.33.10] => (item=f9518aea4fa7127122a6bf384ba8f70120deaaef75532749f1765cf6e25fd820 node_exporter-0.16.0.linux-armv6.tar.gz)
skipping: [192.168.33.11] => (item=c793e8278ec6a167a49518d72dd928361a045bd4c8b155a22d5b158dd3aea2ac node_exporter-0.16.0.linux-arm64.tar.gz)
skipping: [192.168.33.10] => (item=b8bf44c025ec2c5210bdda185f8e72b29ccd3eb9be339b8dbf96835d4fc1965d node_exporter-0.16.0.linux-armv7.tar.gz)
skipping: [192.168.33.11] => (item=18c91a0247f4bc97fb7cdd96502cd8a804a96f42a16357b39f43e28b3d2ac864 node_exporter-0.16.0.linux-armv5.tar.gz)
skipping: [192.168.33.10] => (item=e0561e421deb02f343e2dd5a75ad322bf6960de56c0fa965d9708f6b237f02b0 node_exporter-0.16.0.netbsd-386.tar.gz)
skipping: [192.168.33.11] => (item=f9518aea4fa7127122a6bf384ba8f70120deaaef75532749f1765cf6e25fd820 node_exporter-0.16.0.linux-armv6.tar.gz)
skipping: [192.168.33.10] => (item=293451f83ace3f25e36466fe34024827ac03dee6bf3c3694efdbc0c732959033 node_exporter-0.16.0.netbsd-amd64.tar.gz)
skipping: [192.168.33.11] => (item=b8bf44c025ec2c5210bdda185f8e72b29ccd3eb9be339b8dbf96835d4fc1965d node_exporter-0.16.0.linux-armv7.tar.gz)
skipping: [192.168.33.11] => (item=e0561e421deb02f343e2dd5a75ad322bf6960de56c0fa965d9708f6b237f02b0 node_exporter-0.16.0.netbsd-386.tar.gz)
skipping: [192.168.33.11] => (item=293451f83ace3f25e36466fe34024827ac03dee6bf3c3694efdbc0c732959033 node_exporter-0.16.0.netbsd-amd64.tar.gz)
TASK [cloudalchemy.node-exporter : Create the Node Exporter group] *****************************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]
TASK [cloudalchemy.node-exporter : Create the Node Exporter user] ******************************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]
TASK [cloudalchemy.node-exporter : Download node_exporter binary to local folder] **************************************************************************
ok: [192.168.33.10 -> localhost]
ok: [192.168.33.11 -> localhost]
TASK [cloudalchemy.node-exporter : Unpack node_exporter binary] ********************************************************************************************
skipping: [192.168.33.10]
skipping: [192.168.33.11]
TASK [cloudalchemy.node-exporter : Propagate Node Exporter binaries] ***************************************************************************************
changed: [192.168.33.10]
changed: [192.168.33.11]
TASK [cloudalchemy.node-exporter : Create texfile collector dir] *******************************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]
TASK [cloudalchemy.node-exporter : Install libcap on Debian systems] ***************************************************************************************
ok: [192.168.33.11]
ok: [192.168.33.10]
TASK [cloudalchemy.node-exporter : Node exporter can read anything (omit file permissions)] ****************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]
TASK [cloudalchemy.node-exporter : Copy the Node Exporter systemd service file] ****************************************************************************
changed: [192.168.33.10]
changed: [192.168.33.11]
TASK [cloudalchemy.node-exporter : Install dependencies on RedHat OS family] *******************************************************************************
skipping: [192.168.33.10] => (item=libselinux-python)
skipping: [192.168.33.10] => (item=policycoreutils-python)
skipping: [192.168.33.11] => (item=libselinux-python)
skipping: [192.168.33.11] => (item=policycoreutils-python)
TASK [cloudalchemy.node-exporter : Allow Node Exporter port in SELinux on RedHat OS family] ****************************************************************
skipping: [192.168.33.10]
skipping: [192.168.33.11]
TASK [cloudalchemy.node-exporter : Ensure Node Exporter is enabled on boot] ********************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]
RUNNING HANDLER [cloudalchemy.node-exporter : restart node exporter] ***************************************************************************************
changed: [192.168.33.10]
changed: [192.168.33.11]
PLAY [Deploy blackbox_exporter] ****************************************************************************************************************************
TASK [Gathering Facts] *************************************************************************************************************************************
ok: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : create blackbox_exporter system group] ******************************************************************************
changed: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : create blackbox_exporter system user] *******************************************************************************
changed: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : create blackbox_exporter directories] *******************************************************************************
changed: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : download blackbox exporter binary to local folder] ******************************************************************
skipping: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : propagate blackbox exporter binary] *********************************************************************************
changed: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : Install libcap on Debian systems] ***********************************************************************************
ok: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : Ensure blackbox exporter binary has cap_net_raw capability] *********************************************************
changed: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : create systemd service unit] ****************************************************************************************
changed: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : configure blackbox exporter] ****************************************************************************************
changed: [192.168.33.11]
TASK [cloudalchemy.blackbox-exporter : ensure blackbox_exporter service is enabled] ************************************************************************
changed: [192.168.33.11]
RUNNING HANDLER [cloudalchemy.blackbox-exporter : restart blackbox exporter] *******************************************************************************
changed: [192.168.33.11]
RUNNING HANDLER [cloudalchemy.blackbox-exporter : reload blackbox exporter] ********************************************************************************
changed: [192.168.33.11]
What happened?
Role fails to unpack node_exporter binary because of file permissions.
Maybe related to this security fix in ansible: ansible/ansible#67794
Did you expect to see some different?
I was expecting the installation to succeed as with previous versions of ansible.
How to reproduce it (as minimally and precisely as possible):
Use version of ansible = 2.9.12.
Environment
Role version:
0.21.3
Ansible version information:
2.9.12
Variables:
insert role variables relevant to the issue
qemu: TASK [node-exporter : Download node_exporter binary to local folder] ***********
qemu: changed: [localhost]
==> qemu: [WARNING]: File '/tmp/node_exporter-1.0.1.linux-amd64.tar.gz' created with
==> qemu: default permissions '600'. The previous default was '666'. Specify 'mode' to
==> qemu: avoid this warning.
qemu:
qemu: TASK [node-exporter : Unpack node_exporter binary] *****************************
qemu: fatal: [localhost]: FAILED! => {"changed": false, "msg": "an error occurred while trying to read the file '/tmp/node_exporter-1.0.1.linux-amd64.tar.gz': [Errno 13] Permission denied: '/tmp/node_exporter-1.0.1.linux-amd64.tar.gz'"}
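A possible workaround, sketched under the assumption that the permissions warning above is the culprit: give the locally downloaded archive an explicit mode so the later read succeeds. The URL and paths are copied from the log; the exact task shape in the role may differ:

```yaml
- name: Download node_exporter binary to local folder
  get_url:
    url: "https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz"
    dest: "/tmp/node_exporter-1.0.1.linux-amd64.tar.gz"
    mode: "0644"  # explicit mode; the new 0600 default from ansible/ansible#67794 broke the unarchive step
  delegate_to: localhost
```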
Hi,
I got the following error message when trying to run the Ansible playbook.
ansible-node-exporter version: 0.15.0
to be installed node-exporter version: 0.18.1
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L71
TASK [node-exporter : Get checksum list from github] ***********************************************************************************************************************************
fatal: [host]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Error validating the server's certificate for https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt: Failed to validate the SSL certificate for github.com:443. Make sure your managed systems have a valid CA certificate installed. You can use validate_certs=False if you do not need to confirm the servers identity but this is unsafe and not recommended. Paths checked for this platform: /etc/ssl/certs, /etc/ansible, /usr/local/etc/openssl. The exception msg was: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)."}
Could you help have a look?
Thanks,
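The proper fix is a valid CA bundle on the control node (e.g. via the ca-certificates package). If that is not possible, the url lookup does accept validate_certs=False, which the error message itself flags as unsafe. A sketch, assuming the preflight task looks roughly like this (the fact name here is illustrative):

```yaml
# Unsafe fallback only; prefer fixing the controller's CA certificates.
- name: Get checksum list from github
  set_fact:
    __node_exporter_checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt', validate_certs=False, wantlist=True) }}"
```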
What happened?
Installed on my arm64 instance using ansible 2.7.10: no problems.
Installed on my amd64 instance using ansible 2.7.18: it fails to find the correct checksum.
with logging:
TASK [node-exporter : Get checksum list from github] **************************************************************************************************************************************************************************************************************************************************************************
task path: /home/demo/zinfra/cailleach/environments/avs-test/ansible/.galaxy/node-exporter/tasks/preflight.yml:99
ok: [avs-test-sft01 -> localhost] => {
"ansible_facts": {
"_checksums": "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791 node_exporter-1.0.1.darwin-386.tar.gz', 'e51d39ef14f5c6accee158e94b5e324fa6eb647444234a4be3491fbc3983df47 node_exporter-1.0.1.darwin-amd64.tar.gz', '734e036a849152b185da2080eb8656c36cde862095a464cb17705ca723ea3929 node_exporter-1.0.1.linux-386.tar.gz', '3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae node_exporter-1.0.1.linux-amd64.tar.gz', '017514906922fcc4b7d727655690787faed0562bc7a17aa9f72b0651cb1b47fb node_exporter-1.0.1.linux-arm64.tar.gz', '38413100bfb935c59aea088a0af792134b75972eb90ab2bc6cf1c09ad3b08aea node_exporter-1.0.1.linux-armv5.tar.gz', 'c1d7affbc7762c478c169830c43b4c6177a761bf1d2dd715dbffa55ca772655a node_exporter-1.0.1.linux-armv6.tar.gz', 'e7f4427a25f1870103588e4968c7dc8c1426c00a0c029d0183a9a7afdd61357b node_exporter-1.0.1.linux-armv7.tar.gz', '43335ccab5728b3c61ea7a0977143719c392ce13a90fa0d14169b5c10e8babd0 node_exporter-1.0.1.linux-mips.tar.gz', 'c0109f2f76628d2e25ea78e39d4b95100079ee859863be1471519b5e85a2fe78 node_exporter-1.0.1.linux-mips64.tar.gz', 'bcba02058b9ce171b5c3b077f78f371eb7685239f113200d15787c55fb204857 node_exporter-1.0.1.linux-mips64le.tar.gz', '85f0a24c07c5d8237caf36a5c68a63958280dab802b5056ff36d75563d5e5241 node_exporter-1.0.1.linux-mipsle.tar.gz', '43aa5e72f5068d16eb8d33f6b729186bf558d40ec0c734746b40a16902864808 node_exporter-1.0.1.linux-ppc64.tar.gz', '5ae6c772108c877038cd66a761e4ad93edcc8c446120478499412b24e7953146 node_exporter-1.0.1.linux-ppc64le.tar.gz', '2f22d1ce18969017fb32dbd285a264adf3da6252eec05f03f105cf638ec0bb06 node_exporter-1.0.1.linux-s390x.tar.gz', '7766d78638c2f84d1084a79d8cb5d8f036b7ce375390870d5e709673118d1260 node_exporter-1.0.1.netbsd-386.tar.gz', '41cc54f77f860ed19a7b74f132269f810e3c01fbac5320c3fa2e244fa2247d56 node_exporter-1.0.1.netbsd-amd64.tar.gz']"
},
"changed": false
}
TASK [node-exporter : Get checksum for amd64 architecture] ********************************************************************************************************************************************************************************************************************************************************************
task path: /home/demo/zinfra/cailleach/environments/avs-test/ansible/.galaxy/node-exporter/tasks/preflight.yml:104
ok: [avs-test-sft01 -> localhost] => (item=['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791 node_exporter-1.0.1.darwin-386.tar.gz', 'e51d39ef14f5c6accee158e94b5e324fa6eb647444234a4be3491fbc3983df47 node_exporter-1.0.1.darwin-amd64.tar.gz', '734e036a849152b185da2080eb8656c36cde862095a464cb17705ca723ea3929 node_exporter-1.0.1.linux-386.tar.gz', '3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae node_exporter-1.0.1.linux-amd64.tar.gz', '017514906922fcc4b7d727655690787faed0562bc7a17aa9f72b0651cb1b47fb node_exporter-1.0.1.linux-arm64.tar.gz', '38413100bfb935c59aea088a0af792134b75972eb90ab2bc6cf1c09ad3b08aea node_exporter-1.0.1.linux-armv5.tar.gz', 'c1d7affbc7762c478c169830c43b4c6177a761bf1d2dd715dbffa55ca772655a node_exporter-1.0.1.linux-armv6.tar.gz', 'e7f4427a25f1870103588e4968c7dc8c1426c00a0c029d0183a9a7afdd61357b node_exporter-1.0.1.linux-armv7.tar.gz', '43335ccab5728b3c61ea7a0977143719c392ce13a90fa0d14169b5c10e8babd0 node_exporter-1.0.1.linux-mips.tar.gz', 'c0109f2f76628d2e25ea78e39d4b95100079ee859863be1471519b5e85a2fe78 node_exporter-1.0.1.linux-mips64.tar.gz', 'bcba02058b9ce171b5c3b077f78f371eb7685239f113200d15787c55fb204857 node_exporter-1.0.1.linux-mips64le.tar.gz', '85f0a24c07c5d8237caf36a5c68a63958280dab802b5056ff36d75563d5e5241 node_exporter-1.0.1.linux-mipsle.tar.gz', '43aa5e72f5068d16eb8d33f6b729186bf558d40ec0c734746b40a16902864808 node_exporter-1.0.1.linux-ppc64.tar.gz', '5ae6c772108c877038cd66a761e4ad93edcc8c446120478499412b24e7953146 node_exporter-1.0.1.linux-ppc64le.tar.gz', '2f22d1ce18969017fb32dbd285a264adf3da6252eec05f03f105cf638ec0bb06 node_exporter-1.0.1.linux-s390x.tar.gz', '7766d78638c2f84d1084a79d8cb5d8f036b7ce375390870d5e709673118d1260 node_exporter-1.0.1.netbsd-386.tar.gz', '41cc54f77f860ed19a7b74f132269f810e3c01fbac5320c3fa2e244fa2247d56 node_exporter-1.0.1.netbsd-amd64.tar.gz']) => {
"ansible_facts": {
"node_exporter_checksum": "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791"
},
"changed": false,
"item": "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791 node_exporter-1.0.1.darwin-386.tar.gz', 'e51d39ef14f5c6accee158e94b5e324fa6eb647444234a4be3491fbc3983df47 node_exporter-1.0.1.darwin-amd64.tar.gz', '734e036a849152b185da2080eb8656c36cde862095a464cb17705ca723ea3929 node_exporter-1.0.1.linux-386.tar.gz', '3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae node_exporter-1.0.1.linux-amd64.tar.gz', '017514906922fcc4b7d727655690787faed0562bc7a17aa9f72b0651cb1b47fb node_exporter-1.0.1.linux-arm64.tar.gz', '38413100bfb935c59aea088a0af792134b75972eb90ab2bc6cf1c09ad3b08aea node_exporter-1.0.1.linux-armv5.tar.gz', 'c1d7affbc7762c478c169830c43b4c6177a761bf1d2dd715dbffa55ca772655a node_exporter-1.0.1.linux-armv6.tar.gz', 'e7f4427a25f1870103588e4968c7dc8c1426c00a0c029d0183a9a7afdd61357b node_exporter-1.0.1.linux-armv7.tar.gz', '43335ccab5728b3c61ea7a0977143719c392ce13a90fa0d14169b5c10e8babd0 node_exporter-1.0.1.linux-mips.tar.gz', 'c0109f2f76628d2e25ea78e39d4b95100079ee859863be1471519b5e85a2fe78 node_exporter-1.0.1.linux-mips64.tar.gz', 'bcba02058b9ce171b5c3b077f78f371eb7685239f113200d15787c55fb204857 node_exporter-1.0.1.linux-mips64le.tar.gz', '85f0a24c07c5d8237caf36a5c68a63958280dab802b5056ff36d75563d5e5241 node_exporter-1.0.1.linux-mipsle.tar.gz', '43aa5e72f5068d16eb8d33f6b729186bf558d40ec0c734746b40a16902864808 node_exporter-1.0.1.linux-ppc64.tar.gz', '5ae6c772108c877038cd66a761e4ad93edcc8c446120478499412b24e7953146 node_exporter-1.0.1.linux-ppc64le.tar.gz', '2f22d1ce18969017fb32dbd285a264adf3da6252eec05f03f105cf638ec0bb06 node_exporter-1.0.1.linux-s390x.tar.gz', '7766d78638c2f84d1084a79d8cb5d8f036b7ce375390870d5e709673118d1260 node_exporter-1.0.1.netbsd-386.tar.gz', '41cc54f77f860ed19a7b74f132269f810e3c01fbac5320c3fa2e244fa2247d56 node_exporter-1.0.1.netbsd-amd64.tar.gz']"
}
Did you expect to see some different?
Rather than finding "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791", I expected node_exporter_checksum to be "3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae".
How to reproduce it (as minimally and precisely as possible):
Run with ansible 2.7.18, instead of 2.7.10
Environment
Role version:
0.21.5
Ansible version information:
ansible 2.7.18 config file = /etc/ansible/ansible.cfg configured module search path = ['/home/demo/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules'] ansible python module location = /home/demo/zinfra/cailleach/third_party/.poetry/venvs/third-party-jL9HBFdt-py3.8/lib/python3.8/site-packages/ansible executable location = /home/demo/zinfra/cailleach/third_party/.poetry/venvs/third-party-jL9HBFdt-py3.8/bin/ansible python version = 3.8.4 (default, Jul 13 2020, 21:16:07) [GCC 9.3.0]
Variables:
insert role variables relevant to the issue
insert Ansible logs relevant to the issue here
Anything else we need to know?:
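For context, the intended selection logic amounts to picking one line of sha256sums.txt; the log above suggests 2.7.18 handed the whole file to the loop as a single string, which then got split on the first whitespace. Sketched in Python below; the function name and error handling are mine, not the role's code:

```python
def pick_checksum(sha256sums: str, go_arch: str, version: str) -> str:
    """Return the checksum for node_exporter-<version>.linux-<go_arch>.tar.gz."""
    wanted = f"node_exporter-{version}.linux-{go_arch}.tar.gz"
    for line in sha256sums.splitlines():
        parts = line.split()  # each line is "<checksum>  <filename>"
        if len(parts) == 2 and parts[1] == wanted:
            return parts[0]
    raise ValueError(f"no checksum found for {wanted}")

# Shortened example lines (real hashes are 64 hex chars):
sums = "eb7f...  node_exporter-1.0.1.darwin-386.tar.gz\n3369...  node_exporter-1.0.1.linux-amd64.tar.gz"
print(pick_checksum(sums, "amd64", "1.0.1"))  # prints "3369..."
```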
What did you do?
I'm currently trying to set up a node_exporter
on a test server using both TLS and basic authentication.
I have set up a simple user in my playbook as mentioned in the README.
node_exporter_basic_auth_users:
user: password
But this results in the following config file on the remote machine:
basic_auth_users:
user: *0
Changing the password to something else does not affect the *0
field.
The following error is also raised when the node_exporter
is starting up:
level=error ts=2020-10-14T13:31:20.226Z caller=node_exporter.go:194 err="yaml: unknown anchor '0' referenced"
The exporter works well if no users are specified.
Any idea where I could have made a mistake?
Did you expect to see some different?
No crash on start up and a correctly hashed password in the config file.
Environment
Role version:
0.22.0
Ansible version information:
ansible 2.9.12
config file = None
configured module search path = ['/home/tdh/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/tdh/.local/lib/python3.8/site-packages/ansible
executable location = /home/tdh/.local/bin/ansible
python version = 3.8.2 (default, Apr 27 2020, 15:53:34) [GCC 9.3.0]
node_exporter_basic_auth_users:
user: password
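A likely explanation for the *0 above, offered as an assumption: when passlib is missing on the control node, password_hash can fall back to the system crypt(3), which returns the failure token "*0" for hash methods it does not support (bcrypt, on many glibc builds). YAML then parses the unquoted *0 as an alias to a nonexistent anchor 0, which matches the node_exporter startup error. If that is the case, installing passlib on the control node should fix it:

```shell
# Control-node fix (assumption: a crypt(3) fallback is producing "*0"):
pip install passlib bcrypt
```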
It would be nice to have an option to configure collectors.
My proposal is to rely on the collectors that node_exporter enables by default and extend the node_exporter_enabled_collectors variable so it can also carry per-collector options, with values like:
node_exporter_enabled_collectors:
- timex
- filesystem:
option_a: value
option_b: val
What happened?
Task "Copy the node_exporter config file" fails with the following error:
"msg": "AnsibleError: Unexpected templating type error occurred on (---\n{{ ansible_managed | comment }}\n{% if node_exporter_tls_server_config | length > 0 %}\ntls_server_config:\n{{ node_exporter_tls_server_config | to_nice_yaml | indent(2, true) }}\n{% endif %}\n\n{% if node_exporter_http_server_config | length > 0 %}\nhttp_server_config:\n{{ node_exporter_http_server_config | to_nice_yaml | indent(2, true) }}\n{% endif %}\n\n{% if node_exporter_basic_auth_users | length > 0 %}\nbasic_auth_users:\n{% for k, v in node_exporter_basic_auth_users.items() %}\n {{ k }}: {{ v | password_hash('bcrypt', ('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' | shuffle(seed=inventory_hostname) | join)[:22], rounds=9) }}\n{% endfor %}\n{% endif %}\n): value must be a string"
Removing
| to_nice_yaml | indent(2, true)
from the config.yaml.j2 template
avoids the error. Resulting config:
tls_server_config:
{'cert_file': '/etc/node_exporter/tls.cert', 'key_file': '/etc/node_exporter/tls.key'}
I suspect this is related to:
ansible/ansible#66916
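If this is the same Jinja2 version skew as in ansible/ansible#66916, one hedged workaround (untested here) is to force the filter chain to operate on a plain string before indent runs, e.g. in config.yaml.j2:

```yaml
tls_server_config:
{{ node_exporter_tls_server_config | to_nice_yaml | string | indent(2, true) }}
```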
Did you expect to see some different?
config.yaml is generated without errors.
How to reproduce it (as minimally and precisely as possible):
used playbook:
- hosts: server
roles:
- ansible-node-exporter
vars:
node_exporter_binary_local_dir: /home/user/prometheus_bin/node_exporter
node_exporter_tls_server_config:
cert_file: '/etc/node_exporter/tls.cert'
key_file: '/etc/node_exporter/tls.key'
ensure your ansible version matches the one below
Environment
0.21.0
ansible 2.9.9
config file = /home/user/.ansible.cfg
configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.7.7 (default, Mar 13 2020, 10:23:39) [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
python3-pyyaml-5.3.1-1.fc31.x86_64